Amplitude estimation via maximum likelihood on noisy quantum computer

Recently we find several candidates of quantum algorithms that may be implementable in near-term devices for estimating the amplitude of a given quantum state, which is a core sub- routine in various computing tasks such as the Monte Carlo methods. One of those algorithms is based on the maximum likelihood estimate with parallelized quantum circuits. In this paper, we extend this method so that it incorporates the realistic noise effect, and then give an experimental demonstration on a superconducting IBM Quantum device. The maximum likelihood estimator is constructed based on the model assuming the depolarization noise. We then formulate the problem as a two-parameters estimation problem with respect to the target amplitude parameter and the noise parameter. In particular we show that there exist anomalous target values, where the Fisher information matrix becomes degenerate and consequently the estimation error cannot be improved even by increasing the number of amplitude amplifications. The experimental demonstration shows that the proposed maximum likelihood estimator achieves quantum speedup in the number of queries, though the estimation error saturates due to the noise. This saturated value of estimation error is consistent to the theory, which implies the validity of the depolarization noise model and thereby enables us to predict the basic requirement on the hardware components (particularly the gate error) in quantum computers to realize the quantum speedup in the amplitude estimation task.


Introduction
Ideal and fault-tolerant quantum computers will provide us with game-changing platforms in various area such as security [1], chemical [2,3], and financial engineering [4][5][6][7][8][9][10]. Although quantum devices available now are all small and noisy [11], the recent progress in hardware [12] is certainly going to bridge this gap. At the same time various software methods, which are expected to lower the hurdle for realizing quantum computing, are now being developed as well [13][14][15]. With careful evaluation of both of these approaches, it is important to have a right prediction on what applications of quantum computing can be realized and when they might emerge in reality; for instance [16,17]. The quantum volume [18] is one reasonable measure along with discussing such predictions.
This paper focuses on the quantum amplitude estimation algorithm [19,20], which can be typically applied to speedup the classical Monte Carlo methods [4,5], etc; more precisely, in an ideal setup the quantum amplitude estimation algorithm can quadratically reduce the number of samples and thereby the computation time for Monte Carlo methods. Because the standard amplitude estimation algorithm is demanding to implement, due to several technical reasons including the use of many ancilla qubits for quantum Fourier transform and the use of many controlled gate operations, recently some new techniques that circumvent these challenges have been developed [21][22][23][24]. In particular, our approach [21] takes the maximum likelihood (ML) method to estimate the amplitude without both the ancilla qubits and the controlled operation conditioned on those qubits, thereby drastically reducing the number of quantum gates (particularly the controlled NOT gate) involved in this algorithm compared to the standard one; also the near quadratic speedup was demonstrated in a numerical simulation in the ideal setup.
This paper extends the ML method [21] so that it can be used for actual noisy quantum computers, and then gives an experimental demonstration on a superconducting quantum device. The point of the ML method is that, while the true probability distribution generating the data is in general unknown, the ML estimator is constructed based on a suitable model distribution. In our case, the true distribution is determined from the output state of a noisy quantum computer operated under several imperfections that are impossible to perfectly characterize. To approximate such unknown noise source for the purpose of constructing the ML estimator, in this work, we consider the depolarizing noise model; this is often chosen as the minimal or the worst-case noise model [25][26][27], which is also used for quantifying the gate fidelity of a given quantum computing device, i.e., the randomized benchmarking [28][29][30]. We also add a fact that the depolarization noise process has a clear mathematical merit in that it commutes with an arbitrary unitary operation, which thereby enables analytic treatment for our estimation problem. We then formulate the problem as a two-parameters estimation problem with respect to the target amplitude parameter and the noise parameter. This problem formulation introduces a new important aspect to the near-term quantum computing field, in the sense that the unavoidable noise coming from the realistic imperfection has also to be estimated as a nuisance parameter. In our problem, thanks to the property of depolarizing noise, we have an explicit form of the Fisher information matrix for discussing the accuracy of estimation and thereby derive a formula for specifying the noise level so that near-quadratic speedup is achieved to reach a given estimation accuracy. Furthermore, the explicit formula of Fisher information matrix reveals the existence of anomalous case, where the target parameter cannot be efficiently estimated because the Fisher information matrix becomes degenerate. Note that, hence, such anomalous target parameter appears only in the multi-parameters estimation problem. Fortunately we can provide a simple way to circumvent this difficulty.
Below we show the organization of the paper, together with the summary of the results obtained in this paper Section 2: We take the depolarizing noise model and then formulate the two-parameters estimation problem. With this noise model we can have the analytic expression of the Fisher information matrix, which gives an asymptotically achievable lower bound of the estimation error. This result is further used to derive a condition of the noise level required to have nearly quadratic speedup to reach a given estimation error. Furthermore, we show the anomalous case where the Fisher information matrix is degenerate and consequently the target parameter cannot be efficiently estimated; a simple strategy to circumvent this issue is provided. Section 3: We give an experimental demonstration of our ML method for a simple Monte Carlo integration problem, using 2-and 3-qubits IBM Quantum devices [31,32]. The result is that, for the former case, a quantum speedup in the number of queries over the classical one is observed, while the estimation error saturates as the query becomes large. On the other hand, for the 3-qubit case, we found that the estimation error is always bigger than that via the classical method. For both cases the saturated value is consistent to the theoretical prediction, implying the validity of the depolarization noise model. Section 4: We give the following two discussions. First, based on the condition obtained in Section 2 together with the experimental result shown in Section 3, we discuss the hardware requirements, e.g., the error rates of single-qubit and CNOT gates, to achieve a given precision for estimating the value of multidimensional integration. Second, the computational complexity of our proposed algorithm is discussed.
2 Analysis of estimation error under depolarizing noise

Preliminary
We briefly review the amplitude estimation algorithm that uses the ML method on parallelized circuits [21]. This algorithm can estimate the parameter (i.e., the unknown amplitude) quadratically faster than any classical sampling method to attain a given precision, even without the standard phase estimation subroutine. The algorithm mainly consists of two components: one is the amplitude amplification process [19,20], which is the generalized version of Grover's quantum search algorithm, and the other is the classical ML estimation part for the data obtained by the amplitude amplification. The essential idea of the algorithm is sketched below. The amplitude amplification algorithm initially prepares the state given by |Ψ n+1 = A |0 n+1 , where A is a unitary operator acting on the (n + 1) qubits. The operator A is designed to satisfy is an unknown parameter to be estimated, and |Ψ 1 n and |Ψ 0 n are the n-qubit normalized "good" and "bad" states. In terms of θ a ∈ [0, π/2] satisfying sin 2 θ a = a, the prepared state |Ψ n+1 can also be expressed as The probability to measure the good state can be amplified by applying the unitary operator is an identity operator on k qubits. By applying Q on |Ψ n+1 for m times, we have This equation tells that the probability for measuring the good state is amplified depending on the number of repetitions of Q on |Ψ n+1 . Note that Eq. (2) is valid only in the absence of noise. Next, we describe the ML part. The idea is to estimate the amplitude a using the number of measuring the good state after performing m k (for 0 ≤ k ≤ M ) repetitions of Q. Now for the ideal state Q m k |Ψ n+1 , the probability measuring the good state is P (m k ; a) = sin 2 ((2m k + 1)θ a ); then the probability to have the good state h k times out of the total N k measurement shots is proportional to [P (m k ; a)] h k [1 − P (m k ; a)] N k −h k and accordingly the likelihood function is where h = (h 0 , h 1 , · · · , h M ). The ML estimate of a is then given by argmax a L(h; a), which was proven to achieve the theoretical lower bound in the estimation error (the detail is described just below). Note that, of course, the estimated precision depends on the choice of a sequence of integers {m k }. In this paper, we consider the following Linearly Incremental Sequence (LIS) and Exponential Incremental Sequence (EIS): where in the hereafter we omit the floor notation for simplicity. To asymptotically achieve a given error ǫ, LIS and EIS need O(1/ǫ 4/3 ) and O(1/ǫ) total query calls, respectively, while the classical case (i.e., ∀k : m k = 0) needs O(1/ǫ 2 ) to achieve the same precision [21]. Lastly we give a general framework of the above-described method for the vector of multiple parameters θ = [θ 1 , · · · , θ K ] ⊤ under noisy environment, which is indeed the scenario studied in this paper (• ⊤ denotes the transpose). Let P (m k ; θ) be the probability of measuring the good state for a density matrix obtained by applying Q m k on the state (1) under noisy environment. Then the likelihood function corresponding to the probability having the good state h k times in total N k measurement shots is, similar to the above, given by where again h = (h 0 , h 1 , · · · , h M ). The ML estimateθ ML is defined aŝ In general, the estimation error covariance matrix Cov(θ) = E[(θ −θ)(θ −θ) T ], withθ any unbiased estimate, satisfies the Cramér-Rao inequality: where I(θ) is the Fisher information matrix defined as Amplitude estimation via maximum likelihood on noisy quantum computer 5 The expectation value E [•] in Eq. (8) is defined as where X(h) is a function of random variables h. It is known that the ML estimate attains the lower bound of Cov(θ) in Eq. (7), asymptotically in the limit of large samples. The elements of the Fisher information matrix are given as In the case of the multi-parameter estimation problem, the ML estimation of the i-th and j-th (i = j) parameters can be performed independently, if the (i, j) element of the Fisher information matrix is zero. However, otherwise, the i-th and j-th parameters are correlated, and the estimation of parameter of interest may be adversely affected by other nuisance parameters, which is indeed the case in this work as shown later.

Fisher information in the presence of depolarizing noise
Quantum states are usually disturbed by noise that comes from the system-environment interaction. Then the ideal pure state (2) is replace by a mixed state, which is yet impossible to perfectly identify. On the other hand, the ML estimator is based on a model state which may well approximate such an unknown true mixed state. In this work, we assume the depolarization channel defined by [33,34]: Here ρ is a density matrix, D(ρ) is a completely positive trace-preserving (CPTP) map which represents the depolarization of qubits, 1 − p is the error probability that qubits are depolarized, and d is the dimension of the quantum system, i.e., d = 2 n+1 . Note that p should also be treated as an unknown parameter, in addition to a; hence we are studying the two-parameter estimation problem, which is essentially harder compared to the one-parameter problem studied in [21]. Now, the CPTP map of the ideal amplitude amplification channel, in terms of the density matrix, is represented as From Eqs. (11) and (12), the amplitude amplification process in the presence of noise is thus given by Moreover, it is easy to see that m times repetition of this noisy amplitude amplification process end up with Now the initial state is chosen as ρ = |Ψ n+1 Ψ |, where |Ψ n+1 is given in Eq. (1). As in the ideal case, we are interested in the probability of measuring the good state with which the last qubit is found to be |1 , i.e., P (m; θ) = Tr(ρ (m) noise E 1 ), where E 1 = I n ⊗ |1 1|; also θ = [a, κ] ⊤ is the vector of unknown parameters, where κ is defined as κ = − ln p which we refer as the noise level of the amplitude amplification process. By using Eq. (2) and Tr(I n+1 E 1 ) = 2 n , this probability is calculated as Then the likelihood function (5) is represented as Our task is to estimate θ = [a, κ] ⊤ via the ML estimate (6), i.e., θ ML = argmax θ L(h; θ), where h = (h 0 , h 1 , . . . , h M ) is the set of data obtained in the experiment. As mentioned in the previous subsection, θ ML asymptotically achieves the lower bound in the Cramér-Rao inequality (7) if the data is generated from the model distribution; the Fisher information matrix (8) in this case can now be calculated as , sin (4 (2m k + 1) θ a ) e 2κm k − cos 2 (2 (2m k + 1) θ a ) , Recall that a is the parameter of our true interest; from Eq. (7), the estimation error of a is lower bounded by the (1, 1) element of the inverse of the above Fisher information matrix as We use ǫ min (a, κ) to discuss the condition on the noise level κ to satisfy a required estimation precision. These topics are studied in detail in the next subsection. Lastly we remark on other noise models. The most general (Markovian) noise model, with multiple parameters, can be represented by the Kraus superoperator. However, in general this does not commute with Grover operator, and as a result we cannot obtain an analytic expression of Cramér-Rao lower bound. To discuss the estimation accuracy in such a case, several numerical methods have been developed in the field of quantum metrology [34][35][36]. Note that even for a special type of noise channel other than depolarization, it is still difficult to have an analytic expression of Cramér-Rao lower bound. For instance, in Ref. [37] considering the amplitude damping and dephasing noise model in the same amplitude estimation problem yet with known noise strength, the estimation error was evaluated numerically. Figure 1 shows the Cramér-Rao lower bound of the estimation error, ǫ min (a, κ), versus the total number of query calls, N q = M k=0 N k (2m k + 1). In particular in the classical case, it is given by

Achievable estimation error and required depolarizing noise level
The target value is chosen to be a = sin 2 θ a = 0.375, and N k = 100 for all k. The (red) thin solid line represents the lower bound in the classical case. The other lines represent the lower bounds obtained when using the amplitude amplification with the EIS for several noise level κ; the (yellow) thick solid line is the bound without noise (κ = 0); the (green) dashed line with triangles, the (blue) dash-dotted line with squares, and the (light blue) dotted line with crosses are the lower bound under depolarizing noise κ = 10 −1 , 10 −2 , and 10 −3 , respectively. Recall that, in the ideal case κ = 0, the number of query calls needed to reach a specified value of ǫ is N q ∼ O(1/ǫ), i.e., the Heisenberg-scaling. We note that a similar dependence of ǫ min (a, κ) on the magnitude of κ is observed for many cases of a, but there exist cases such that ǫ min (a, κ) takes much bigger values than those shown in Fig. 1 even when κ is sufficiently small; see Section 2.4 about this special case.
A notable point observed in Fig. 1 is that, even under the depolarizing noise model, the estimation error ǫ min (a, κ) decreases in nearly the Heisenberg-scaling law up to N q ∼ 10 4 and N q ∼ 10 5 for the cases κ = 0.01 and κ = 0.001, respectively. However, the error does not decrease anymore, even by using more queries. In other words, ǫ min (a, κ) gets saturated at those points of N q , and thus the algorithm has to be stopped 1 . The maximum number of queries within which the Heisenberg-scaling is guaranteed can be formally characterized as follows. That is, even under depolarizing noise model with κ, the Heisenberg-scaling is nearly preserved if the number of operations of Q, m k , is smaller thanm defined as the maximum integer satisfying the following inequality [34]: This condition is derived as a sufficient condition to guarantee that the probability to measure the good state is not affected by the noise in the limit of large samples. The star marks in Fig. 1 are the total query callsN q corresponding tom for given κ, showing that in fact the estimation error does not obey the Heisenberg-scaling law even by calling more queries thanN q . Note thatm given in Eq. (19) was originally derived in the one-parameter setting (that is, the case where κ is known), while ǫ min (a, κ) is a function of the two-parameter Fisher information matrix; despite of this gap m certainly captures the point of maximum number of query calls up to which the estimation error decreases according to the Heisenberg-scaling. Based on the above-described fact, we obtain the condition on the noise level κ so that the ML algorithm reaches a specified estimation error ǫ with the number of query calls of the order O(1/ǫ), i.e., the Heisenberg-scaling, even under the noisy environment. Fig. 2 yields such condition; this is 1 This saturation always occurs if κ = 0, which can be proven using the d'Alembert's ratio test together with the Amplitude estimation via maximum likelihood on noisy quantum computer 9 the relation between κ and the error atN q , for the case a = sin 2 θ a = 0.375, ∀k : N k = 100, and EIS. For instance, if we want to reach the estimation error ǫ = 10 −4 using O(1/ǫ) query calls, then we need κ to be smaller than ∼ 10 −3 . Importantly, Fig. 2 indicates the "quasi linear relation" between a specified ǫ and the required value of κ; that is, if we need to decrease ǫ to ǫ/10 then κ should be improved to simply κ/10. Note that this quasi linear relation is expected to hold from Eq. (19), which leads to 2m + 1 = 1/κ when κ is small, together with the Heisenberg-scalingN q = O(1/ǫ), although Eq. (19) was proven only in the one-parameter setting. The condition on κ can be further converted to that on the gate fidelity of elementary gates constructing the quantum circuit. That is, Fig. 2 represents the minimum hardware specification required to apply the quantum amplitude estimation method to solve a concrete problem such as a Monte Carlo integration task demonstrated later; a more detailed discussion on the hardware requirement will be given in Section 4.1.
The above discussion as well as Figs. 1 and 2 surely depend completely on the mathematical model described in Section 2.2. The real quantum system must not perfectly coincide with this model. The resulting ML estimate is then not guaranteed to reach the Cramér-Rao lower bound discussed here; also the quasi linear relation observed in Fig. 2 could be changed. That is, only with the materials posed up to now, we still could not say that Fig. 2 serves as a guide for discussing the condition on κ to reach a specified estimation error with Heisenberg-scaling law in the real world. This fully motivates us to execute some detailed experiment to see if the theory described above is consistent to the experimental result and thereby verify its usability to the real quantum computing applications; this topic will be discussed in Section 3.
Note that there is still a room for improvement on update strategy of m k . For instance, if we employ m k = ⌊r k−1 ⌋ where r is a real number which satisfies r > 1, the value ofN q changes depending on the value of r. This indicates that the achievable estimation error in the presence of noise can be reduced by changing the update strategy. More importantly, especially for the two-parameters problem considered in this paper, the estimation precision can be severely limited depending on the choice of r, which will be discussed in the next subsection.

Anomalous target value that induces a large estimation error
The previous discussion is based on the following two conditions; that is, the quasi linear relation between κ and ǫ min (a, κ), and the Heisenberg scaling of the total number of queries N q = M k=0 N k (2m k + 1) with respect to a given estimation error ǫ in the range m k m. However, these conditions does not always hold. For instance, when the target value a is a = sin 2 (π/8), the estimation error ǫ min (a, κ) does not obey the quasi linear relation with respect to κ, as shown by the blue solid line in Fig. 3. That is, even when the noise κ is sufficiently small, a precise estimate of a is not possible. This is due to the multi-parameter estimation setting, where in general the estimation error covariance matrix of the parameters θ satisfies the Cramér-Rao inequality (7), i.e., Cov(θ) ≥ I −1 (θ), where I(θ) is the Fisher information matrix. Clearly, if det I(θ) is nearly zero at a certain point of θ, then the estimation of those parameters has to be inefficient. Actually in the above-described example, our Fisher information matrix I(a, κ) is nearly degenerate at around a = sin 2 (π/8); this is the reason why the quasi linear decrease of ǫ with respect to κ does not hold in this case. In this section, we investigate this "anomalous target" point of a in detail. But before moving forward, we would like to emphasize that this analysis never happen in the 1-parameter setting. That is, as stated in Section 1, to achieve quantum advantage on available noisy quan- tum devices, the noise parameter has to be incorporated into the parameters to be estimated and accordingly such a singular point analysis needs to be carried out. First, to see how likely such an anomalous target of a exists, we plot the lower bound of the estimation error ǫ min = (I −1 ) 1,1 , as a function of a and κ in Fig. 4. This contour plot shows that, for almost all target values, the estimation error ǫ min almost linearly decreases with respect to κ; that is, ǫ min decreases approximately by one order of magnitude, as κ decreases by one order of magnitude. However, there exist anomalous target values of which estimation errors are insensitive to the value of κ; for instance, at around a = sin 2 (π/8) = 0.146, we observe a long spike where ǫ min takes almost the same value in the range [1 × 10 −2 , 1 × 10 −3 ] regardless of κ.
This insensitivity of ǫ min at a certain point of a is originated from the fact that, as mentioned in the first paragraph of this section, the Fisher information matrix I(a, κ) is nearly degenerate at around a. To closely look into the relation between the estimation error and the degeneracy of I(a, κ), in Fig. 5 we plot ǫ min with the solid blue line and the "anomality" β = I 2 1,2 /I 1,1 I 2,2 with the dotted red line, as a function of the target value a. The noise parameter κ is fixed to (A) κ = 10 −2 and (B) κ = 10 −3 . Note that 0 < β ≤ 1 and β = 1 is equivalent to det I(a, κ) = 0; hence, β ∼ 1 means that such (a, κ) are difficult to estimate precisely. Both of (A) and (B) of Fig. 5 indeed show that, at around the anomalous target of a where β = 1, the estimation error takes a relatively large value.
The existence ratio of anomalous target values can be quantitatively analyzed in terms of the linear density defined as follows. Here, N = 10 5 samples of a are randomly chosen from the uniform distribution on [0, 1], and and the ratio of a satisfying β > 0.9 is computed; the linear density is given by this ratio. Table 1  density is almost independent of the value of κ; it is about 1% ∼ 2% regardless of κ. This is due to the composite of the following two properties of the anomalous targets: (i) the number of anomalous segments satisfying β > 0.9 is inversely proportional to κ, and (ii) the length of each anomalous segment is proportional to κ. Since the linear density of anomalous targets is approximately the product of the number of the anomalous regions with the average length of anomalous regions, it is almost insensitive to κ.
It should be noted that the linear density of anomalous targets takes finite value in the limit of κ → 0, while the anomalous targets themselves are not present in the case of κ = 0. This is essentially originated from whether one has the information of the noise or not, i.e. the Fisher information I 1,1 can be employed to obtain the lower bound of the estimation error in the absence of the noise, however (I −1 ) 1,1 should be used if the noise level is unknown.
Finally, we propose two approaches to avoid the anomalous case. The first one is based on the observation that the determinant of the Fisher information matrix changes depending on the choice of the sequence {m k } of the amplitude amplification. Therefore, if the underlying target value is detected to be anomalous, then we can try another sequence {m k } to avoid the anomaly. For instance, when {m k } = {0, ⌊2.5 0 ⌋, ⌊2.5 1 ⌋, ⌊2.5 2 ⌋, · · · }, the quasi linear relation between κ and ǫ min is recovered even when a = sin 2 (π/8), as shown with the dotted red line in Fig. 3. This is a clear evidence showing that by suitably choosing {m k } the determinant of the Fisher information matrix does not get smaller. Our view is that it might be possible to detect the anomalous target  0.00 ± 0.00 % by calculating the empirical Fisher information, which eventually allows us to tune the sequence and thereby avoid the anomality. The second approach is by modifying the amplitude to be estimated. After detecting the anomalous target, we introduce an extra ancilla qubit; then R y (φ) rotation (i.e., the single qubit rotation around the y-axis with a fixed parameter φ) is applied to the ancilla qubit as follows: By estimating the probability that the last two-qubit state is |1 |1 , we could avoid the anomalous target problem, because the amplitude to be estimated is modified from a to a sin 2 (φ).

Experiment with a real quantum computing device
This section is devoted to show experimental result using the real-backend device of IBM Quantum Systems called ibmq_valencia [31], to evaluate the estimation performance of the proposed ML estimate and thereby the validity of the employed depolarization model. In particular we consider the Monte Carlo integration problem, whose computational (sampling) cost can be quadratically reduced via the amplitude estimation algorithm [4][5][6][7][8][9][10]. In this section, we begin with a brief explanation on the target integration problem, followed by showing the execution results of the real device for two-qubit and three-qubit cases.

Monte Carlo integration via amplitude estimation
where x j is defined as x j = (j + 1/2)/2 n and p is the probability mass function defined as p(x j ) = xj +1/2 n+1 xj −1/2 n+1 q(x)dx. It should be noted that there is an approximation error due to the discretization of f (x), i.e., S(f ) = E[f (x)]. In our analysis, however, we evaluate the error between S(f ) and the value obtained by Monte Carlo integration in order to assess the performance of our algorithm on a real quantum device. The amplitude estimation algorithm is run via the following operators: By applying these operators to the (n + 1)-qubit initial state, |0 n |0 , we obtain where |Ψ 1 and |Ψ 0 are defined as This is exactly the state of the form (1). Thus, the ideal amplitude estimation algorithm gives an approximation of S(f ) with the precision ǫ, with only O(1/ǫ) queries.
In this paper, we consider the simple integration 1 0 sin 2 (bx) dx with b a constant, which is approximated as In this case the operators P and R in Eq. (22) can be implemented only with Hadamard gates and controlled Y -rotation gates, as shown in Fig. 6.

Experimental result for the two-qubits case
We now show the experimental result of applying the ML algorithm in the real device, to compute Eq. (25). In this subsection, we consider the simple case n = 1, meaning that the integration is approximated via the discrete sum S(f ) having only two domain values x = 0 or x = 1, in which case Eq. (25) takes b = π/20, that is, the value S(f ) = a = sin 2 θ a = 0.0077. Also this setting means that we need only two qubits; in the experiment we chose the 0-th and 1-st qubit of ibmq_valencia. First, we show the experimental result of the quantum algorithm to compute the probability P (m k ; a, κ) given in Eq. (15); recall that and this is used to construct the likelihood function (16). The quantum circuit to execute the unitary operators Q and R(P ⊗ I) is shown in Fig. 6. Here |Ψ 2 = R(P ⊗ I) |0 |0 is given by Eq. (23). In Fig 7, the green round points are plotted by computing the hitting rate of ancilla qubit being 1 (i.e., the hitting rate of measuring the good state, which corresponds to P (m k ; a, κ)), for the LIS setting. Note that these points cover the points of the EIS setting which are marked with the blue square points depicted in Fig. 7. Importantly, the figures show that the hitting rate has a trend of exponentially-decaying oscillation and approaches to 0.5 as the number of Q in amplitude amplification, m k , increases. As a minimal model, we take the depolarizing noise (11) to model this decayed oscillation; actually the resulting probability distribution (26) well interpolates all the points obtained in the experiment, as shown in Fig. 7. Also, for reference, the analytic result of the ideal probability in the absence of noise, i.e., the case κ = 0, is depicted with the orange line in the Fig. 7.
We next experimentally executed the ML algorithm based on the model (26), for estimating a = S(f ) and κ. Recall that the best ML estimate of (a, κ) is given by the maximum of the likelihood function (16) with P (m k ; a, κ) the model (26) and h k the experimental result of the number of hit for a fixed number of Q in amplitude amplification, m k (k = 0, . . . , M ); here we tested 6 patterns M = 1, . . . , 6. In particular, we used EIS, fixed N k = 100 and b = 2π/5 therefore a = S(f ) = 0.375. The result is given in Fig. 8, which shows the relation between the estimation error of S(f ) and the total number of query calls, N q = M k=0 N k (2m k + 1). The solid thin red and thick yellow lines are the theoretical Cramér-Rao lower bound ǫ min (a, κ) given in Eq. (18), obtained via the classical method and the ideal quantum ML method without noise (κ = 0), respectively. The blue cross marks are the standard deviation of the estimated values of S(f ) obtained via the ML method, from the true value S(f ) = 0.375. Note that, for instance to mark the blue point at M = 4 (or equivalently N q ∼ 3.5 × 10 3 ), in which case the estimation error is about 0.65 × 10 −2 , the ML algorithm uses the likelihood function constructed from the amplitude amplification processes with different operation length m k (k = 0, . . . , 4). Further, to reduce the fluctuation of those points, we repeated the same experiment 1, 064 times and averaged out to determine each point; the three-times standard errors are indicated by the error bars. The green dotted line shows the Cramér-Rao lower bound with noise level κ = 0.067, which is the single ML estimate of κ based on the 1, 064 × N k = 1, 064 × 100 data at M = 6 (that is, roughly speaking, the best estimate of κ over the whole execution of the algorithm).
As seen from Fig. 8, the estimation error experimentally obtained using the ML method (the blue points) is in good agreement with the theoretical Cramér-Rao lower bound (the green dotted line). A few slight deviation, particularly the points where the ML estimate is below the Cramér-Rao lower bound, might be due to some imperfections other than depolarizing noise, such as the rotation error of the gate operation and unnecessary coupling to neighboring qubits on the device. We would also like to emphasize that, in this example, there are several points where the experiment achieves the more precise estimate than that obtained via the classical method (the red line). This is an important evidence that even a noisy quantum computer can be beneficial over the classical one, in the measure of query complexity. Finally we remark that a similar behavior was observed, in other settings that use different two qubits in the device and a different target value S(f ); see Appendix C: .

Experimental result for the three-qubit case
Here we present the experimental result for the case where the target integration 1 0 sin 2 (bx) dx with b = 2π/5 is to be approximated by S(f ) in Eq. (25) with 2 n = 2 2 segments. In this case, S(f ) = a = sin 2 θ a = 0.381. Also then we need n + 1 = 3 qubits to implement the amplitude amplification operation.
The quantum circuit to execute the ML algorithm is shown in Fig. 9. Because ibmq_valencia does not allow direct coupling for arbitrary pair of qubits, we chose three qubits to form a chain  structure. In the experiment, the qubit in the middle of chain was chosen as the ancilla qubit, and it is placed as the third qubit from the top in the circuit at Fig. 9. Fig. 10 is the three-qubit version of that at Fig. 8, and shows the relationship between the number of queries and the estimation error of a = S(f ). The Cramér-Rao lower bound under noise (the green dotted line with triangle) was plotted with κ = 0.331. Due to such a high noise level, as expected from Fig. 1, this Cramér-Rao lower bound is above the classical one (the red line), meaning that the quantum computation has no advantage in the amplitude estimation task. However, what is important in our context is that the estimation error obtained from the experiment (the blue points) lie near the green line; that is, the ML estimate computed with the 3-qubit real quantum device asymptotically reaches the theoretical Cramér-Rao lower bound.
Therefore, together with the result for the 2-qubit case, we now would like to conclude that Eq. (11) is a good model of the noise process, at least for a small size qubit device. This means that the theoretical predictions illustrated in Fig. 1 and thereby Fig. 2 might be usable as a practical guide for discussing the condition on the noise level κ to realize the Heisenberg-scaling in the quantum amplitude estimation task.

Discussion
This section is divided to two topics. In Section 4.1, based on the result obtained in Section 2.3, we show a procedure to assess the gate errors required to achieve a given task as well as the expected execution time of the algorithm, and discuss the limitations such as the gate error and the coherence time when using IBM Quantum devices. Next in Section 4.2, we discuss the computational complexity that takes into account the classical optimization procedure to compute the ML estimate.

Hardware specification for the amplitude estimation task
Recall that Fig. 2 is used to predict the maximum allowed noise levelκ for achieving a given estimation error ǫ with the query calls obeying the Heisenberg-scaling law. Here we connect this value ofκ to the error of elementary gates in the quantum circuit; also the execution time of the algorithm is assessed. The entire procedure to compute these quantities is composed of the following four steps.
1. An amplitude estimation problem, together with a target estimation error ǫ, is given to us.
Then the amplitude amplification operator Q (plus P and R in the Monte Carlo case), and accordingly the elementary gates constituting those operators are identified; for instance, see Fig. 6 or Fig. 9. 2. Given the target estimation error ǫ, we use Fig. 2 to compute the maximum allowed noise level κ when N k for all k is fixed. We can then compareκ to the ML estimateκ obtained through the experiment with a real device, to see whether the device can produce the desired estimatê a. Also Eq. (19) enables us to have the maximum number of operations of Q, i.e.,m, within which the Heisenberg-scaling nearly holds in estimating the parameter. More precisely it is given bym = 0.5/(eκ − 1); for example,m can take onlym = 5 whenκ = 0.1, butm = 500 when κ = 0.001. 3. Furthermore, fromm and the number of gates constituting the operator Q, denoted by L, we can have a rough estimate of the execution time of each quantum circuit with length m 1 , m 2 , . . . ,m. Then the execution time inm and the total execution time in m 1 , m 2 , . . . ,m can also be estimated, which are then compared to the coherence time of the available real device, and then the feasibility of this algorithm can be assessed respectively. 4. Under the assumption that all qubits are subjected to only depolarizing noise, we approximate the p = e −κ , the success probability of the operator Q, as where L is the number of gates that constitute the operator Q and ǫ i is the depolarizing error probability of the ith qubit. Using this equation we may also have a rough estimate of the gate error, usingκ and L. The gate error is compared to that of real device (which might be identified via the standard randomized benchmarking test) to see the feasibility of the algorithm.
A detailed procedure for computing the above quantities, especially the gate errors and the total execution time, is given in Appendix B: .
As an example, let us consider a multiple integration over 5 variables, which assumes direct correlations between any of two variables, e.g., (x 1 x 2 + x 2 x 3 + x 3 x 4 + x 4 x 5 )dx. The reason of this choice is that, to compute such a multiple integration with more than five variables, the Monte-Carlo method is usually employed rather than grid-based numerical integration approaches. Here we follow the above four steps one by one. (Step 1) Suppose that we are required to achieve the target estimation error ǫ = 0.001 with the Heisenberg-scaling query calls. (Step 2) Then Fig. 2 tells us that we needκ = 0.005 when N k = 100 for all k and 99 qubits wherein 48 qubits are used for the ancilla to gate decomposition. Also we havem = 99. (Step 3) For some quantum devices, the operating time of single gates and the CNOT gate are publicly available; in the case of ibmq_valencia, they are 7.1 × 10 −8 s and 2.8 × 10 −7 s, respectively. Also the measurement time is 3.5 × 10 −6 s. Moreover we assume that the interval time between the measurement and the next execution of the algorithm is 10 times longer than the execution time of the algorithm with lengthm. Then, we find that the execution time of the algorithm with lengthm is 0.54 s and the total computation time of the entire algorithm is about 1, 082 s; see Table 2 in Appendix B: for the detailed calculation. (Step 4) We can also assess the gate error required for the algorithm to follow the Heisenberg-scaling to reach the given estimation error. Under the assumption that the error of CNOT gate, ǫ d , is 10 times bigger than that of any single gate error, ǫ s , we end up with ǫ d = 2.8 × 10 −7 and ǫ s = 2.8 × 10 −8 . Now, in the case of ibmq_valencia, they are given by ǫ d ∼ 1.0 × 10 −2 and ǫ s ∼ 1.0 × 10 −3 .
An important finding is in quantifying the difference between the gate error of the currently available devices and that of the near ideal machine which realizes the Heisenberg-scaling to reach the given estimation error. The execution time (0.54 s) is also clarified, which may be much longer than the coherence time of the currently available devices. In fact, even though these gaps seem to be large, they are informative for us because we now know how much improvement in the hardware development is necessary to fill these gaps. In addition, algorithm improvements would help close these gaps. For example, recent research [38] suggested a method for reducing the circuit depth and the number of qubits.
Finally, we note that the estimated noise level κ is comparable with the value calculated based on the publicly available information on the gate error of the IBM Quantum device. In our case, the 2-qubits Grover circuit contains 5 CNOT gates (the Q operator part of Fig. 6), and the 3qubits one contains 16 CNOT gates (the Q operator part of Fig. 9); note that R y gate is composed of 2 CNOT gates. Then, for the 2 qubits case, the CNOT gate error was 0.00565, meaning that κ ∼ −ln((1−0.00565) 5 ) ∼ 0.0283. For the 3 qubits case, we used two types of CNOT gate with error rate 0.008923 and 0.01119, and they were used 8 times; hence we have κ ∼ −ln((1 − 0.008923) 8 * (1 − 0.01119) 8 ) ∼ 0.162. By taking into account some other impact of noise such as single gate error, we consider that these values becomes closer to the estimated values κ = 0.067 and κ = 0.331.

Computational time complexity of maximum likelihood estimation
We have argued about the possibility of less number of queries achieved by the ML method against classical cases under noise influences, and how to deal with anomalous target values. Here, assuming anomalous targets are detectable, we show the classical post-processing for maximum likelihood estimation of the target valueâ and noise levelκ can be performed in O(ln 5/2 (1/ǫ)), which is much less than that of the quantum part.
Because under depolarizing noise model the success probability now has two parameters, the target valueâ and noise levelκ as shown in Section 2.2, it is necessary to perform two-dimensional maximum likelihood to obtain the estimated values of the parameters. A straightforward random search or grid search requires O(1/ǫ 2 ) evaluations of likelihood function to achieve the error ǫ [39]. This completely ruins the advantage obtained from the quantum method because the total time complexity now becomes O(1/ǫ 2 ) which is at least the same as the classical Monte-Carlo approach.
In order to avoid the problem of computational complexity, we employed an adaptive constant grid search at each stage k for 0 ≤ k ≤ M in the experiment of Section 3. Namely, at the k-th stage, we performed the grid search with a constant number of divisions only in the range of confidence interval defined as C ǫ times larger than the error estimated from the Fisher information at the (k − 1)-th stage. By the Chebysev inequality, the error estimated at each stage is guaranteed to be within C ǫ times the confidence interval with probability at least 1 − 1/C 2 ǫ . To obtain an estimation within ǫ error, the number of stages M is O(ln(1/ǫ)). Thus, the total probability of obtaining good parameter estimation is at least (1 − 1/C 2 ǫ ) M , which is Ω(1) when C ǫ = Θ(ln 1/2 (1/ǫ)). The total time complexity of the maximum likelihood computation is thus C ǫ · M · O(ln(1/ǫ)), where the last term is that for evaluating the likelihood function. This gives the bound O(ln 5/2 (1/ǫ)) time complexity for the classical post processing.
Except for the anomalous cases, we should note the estimation error of the target value a is not affected even if the estimation error of the noise level κ is large. The details are shown in Appendix A: . This may allow us to reduce the two-dimensional ML estimation to a one-dimensional one. In this case, the estimated values can be obtained with lower computational cost; roughly estimate the noise level κ with large grid size, and then perform one-dimensional ML of the target value a with high accuracy. Furthermore, for sufficiently large number of shots the two-dimensional ML estimation can be approximated with the weighted least square estimation. The latter can be solved much more efficiently in practice thanks to the specific algorithms, such as the Levenberg-Marquardt's [40,41].

Conclusion
All quantum algorithms running on currently available quantum devices must consider the effects of noise. Hence, to perform an appropriate evaluation of such algorithms, particularly those aimed at possible quantum advantage, it is necessary to carefully model the noise, analyze its effects to the algorithm, and make a validation via experiments. This paper provides a demonstration of this evaluation, yielding Fig. 2, a relationship between the target estimation error and the required noise threshold for the amplitude estimation problem, which is validated by experiments on quantum devices.
Lastly we again point out the importance of multi-parameters estimation problem in the quantum computing framework. In our case, the nuisance noise parameter in addition to the target amplitude parameter must be estimated. We have seen that the Fisher information matrix can be degenerate, in which case the target amplitude parameter cannot be efficiently estimated even by increasing the number of amplitude amplifications. This singular problem arises only in multiparameters estimation problems, and a similar challenge may easily appear in other scenarios such as the phase estimation problem under noisy environment. In this sense, while this paper employed the ML estimate approach, we can have various options to design an efficient estimator for such multi-parameter estimation problems in noisy quantum devices. For instance Ref. [42] showed the method to tune the likelihood function for Bayesian inference of the amplitude parameter, and they have derived the run time estimation to achieve a target accuracy under noisy environment.
the true values and the estimation errors are denoted as δa =â − a 0 and δκ =κ − κ 0 , respectively. By definition, the one-dimensional ML estimateâ is given bŷ where L is the likelihood function introduced in Eq. (5). Assuming the smoothness of the likelihood function with respect to a, we have the following equation: Hence, the first order Taylor expansion of the left-hand side of this equation, at around the true parameters a 0 and κ 0 , yields As a consequence of the central limit theorem, the left-hand side of this equation asymptotically follows N(0, I 1,1 ), i.e., the normal distribution with mean 0 and variance I 1,1 , in the limit with a large number of shots. Also, the first term on the right-hand side asymptotically converges to I 1,1 δa and the second term converges to I 1,2 δκ. Thus in the asymptotic regime, Eq. (29) implies that δa obeys the normal distribution with mean −I 1,2 δκ/I 1,1 and variance 1/I 1,1 , i.e., δa ∼ N − I 1,2 I 1,1 δκ, Then the mean squared error ofâ can be calculated as E δa 2 = δκ 2 I 2 1,2 I 2 1,1 If the squared error ofκ is c times bigger than the Cramér-Rao lower bound of the variance, I 1,1 /(I 1,1 I 2,2 − I 2 1,2 ), this equation yields E δa 2 = c I 1,1 (I 1,1 I 2,2 − I 2 1,2 ) The first term of the rightmost side of this equation is just the Cramér-Rao lower bound and the second term is an additional term due to the effect from the error of noise parameter κ. This expression indicates that the estimation error of κ does not significantly affect on the estimation error of the target value a except for the anomalous target values, because I 2 1,2 /I 1,1 I 2,2 is not large for such typical target values.

Appendix B:
Detailed procedure for determining the hardware components In Section 4.1 we have discussed the method to evaluate the hardware components required to achieve the Heisenberg-scaling estimation under the influence of noise. Here we show a detailed procedure to determine those quantities for an N int -dimensional integration problem, which is summarized in Table 2. Our task is to use the quantum ML method to estimate the value of integral, within the target estimation error ǫ. Note that then the amplitude amplification (AA) operator Q as well as R and P are determined. Also we set the following assumptions; the gate error of CNOT, ǫ d , is 10 times bigger than that of any single gate error, ǫ s ; the qubits are fully connected in the device; the operating time of single and CNOT gates are 7.1 × 10 −8 s and 2.8 × 10 −7 s, respectively; the measurement duration time is 3.5 × 10 −6 s; the interval time between the measurements is 10 times bigger than the execution time; the number measurement of N k = 100 and the sequence is EIS.
The main focused quantities are the gate errors (ǫ d , ǫ s ) and the executing time of the algorithm. ((tAAm k + tm + ti) * N k ) 1, 082 N is from mN =m. ti is an interval time between shots calculated as 10 times the execution time. N k = 100 is the number of shots, for all k Appendix C: Two qubit experimental result with other settings In Section 3.2, we demonstrated the results using the real quantum computer with specific parameters. Here we present additional results in different experimental settings, to see how the basic quantities such as the estimation error would be affected by those changes. Fig. 11(a) shows the result when we used the third and the fourth qubits on ibmq_valencia to represent the target value S(f ) = a = 0.375, whereas the other settings are the same as before; e.g., EIS (m 0 = 0, m k = 2 k−1 ) is taken and the number of shots is N k = 100 ∀k. Next, Fig. 11(b) shows the case where the target value is chosen as S(f ) = a = sin 2 π/40 + sin 2 3π/40 ∼ 0.03, meaning that b = π/10; the other settings are the same as in Fig. 11(a).
These figures show that the estimation error obtained with the experiments (the blue cross marks) lies near the green dotted line which is the Cramér-Rao lower bound with depolarizing noise, supporting our depolarizing noise model discussed in Section 2.3. We note that the noise level κ ended up with different values, 0.055 and 0.079 respectively, depending on the different target values a; hence it is important to perform the multi-parameter estimation each time even when using the same qubits. In addition, the experimental results of Fig. 8 and Fig. 11(a) show that the noise level κ differs depending on the qubits; hence, a careful qubit mapping (i.e., assigning the location of qubits) is important to get to lower noise level.
Note also that, in Figs. 11(a) and (b), the error bars (the three-times standard errors) seem to have different values, but this is merely because the vertical axis is in the log scale and each difference is only within a factor of two.