1 Introduction

Quantum computers are expected to outperform classical computers for problems in a wide range of scientific and technological fields. Environments in which quantum algorithms can be executed on real quantum devices are already available [1,2,3]. Real quantum devices with several tens of qubits will be realized in the near future, although these are so-called noisy intermediate-scale quantum (NISQ) devices that impose several practical limitations on their use [4], both in the number of gate operations and in the number of available qubits. Hence, several custom subroutines that take these constraints into account have been proposed, a typical example being the variational quantum eigensolver [5, 6].

In this paper, we focus on the amplitude estimation algorithm, which is a core subroutine in quantum computation for various applications, e.g., in chemistry [7, 8], finance [9, 10], and machine learning [11,12,13,14]. In particular, the quantum speedup of Monte Carlo sampling via amplitude estimation [15] lies at the heart of these applications. Therefore, in light of its importance, we followed the aforementioned direction and developed a new amplitude estimation algorithm that can be executed on NISQ devices.

Note that Ref. [16] demonstrated that the amplitude estimation problem can be formulated as a phase estimation problem [17], where the amplitude to be estimated is inferred from the eigenvalue of the corresponding amplification operator. Owing to the ubiquitous nature of the eigenvalue estimation problem, several versions of the phase estimation algorithm suitable for NISQ devices [18,19,20,21,22] have been proposed (with the last one appearing slightly after ours), and they all rely on classical post-processing statistics such as Bayesian methods. However, these modified phase estimation algorithms, as well as the original scheme [17], still involve many controlled operations (e.g., the controlled amplification operation in the case of Ref. [16]) that can be difficult to implement on NISQ devices. Therefore, a new algorithm specialized to the amplitude estimation problem is required, one that does not use expensive controlled operations.

The goal of amplitude estimation is, in its simplest form, to estimate the unknown parameter \(\theta \) contained in the state \(\left| {\varPsi }\right\rangle = \sin \theta \left| \mathrm{good}\right\rangle + \cos \theta \left| \mathrm{bad}\right\rangle \), where \(\left| \mathrm{good}\right\rangle \) and \(\left| \mathrm{bad}\right\rangle \) are given orthogonal state vectors. Our scheme is composed of the amplitude amplification process and maximum likelihood (ML) estimation; controlled operations and the subsequent quantum Fourier transform (QFT) are not involved. The amplification process transforms the coefficient of \(\left| \mathrm{good}\right\rangle \) to \(\sin ((2m+1) \theta )\), with m being the number of operations; if \(\theta \) were known, then, by suitably choosing m, we could enhance the probability of hitting “good” quadratically beyond the classical case, where no amplitude amplification is utilized [23]. However, because \(\theta \) is unknown in this problem, the function \(\sin ((2m+1) \theta )\) does not always take a relatively large value for a given m, meaning that an effective quantum speedup is not always available; also, the ML estimate is not uniquely determined due to the periodicity of this function. Our strategy is first to make measurements on the transformed quantum state and construct likelihood functions for several values of m, say \(\{m_0,\ldots ,m_M\}\), and then to combine them into a single likelihood function that uniquely produces the ML estimate; see Fig. 1. The broad concept behind this scheme is to combine the data produced from different quantum circuits, and the scheme might be performed even on NISQ devices to compute a target value faster than classical algorithms via some post-processing.
Indeed, a numerical simulation demonstrates that, by appropriately designing \(\{m_k\}\), we can achieve nearly a square-root reduction in the total number of queries, compared with classical sampling, to reach a specified estimation precision; notably, only relatively short-depth circuits are required to achieve this quantum speedup. We also show that, in an application of amplitude estimation to Monte Carlo integration, our algorithm requires many fewer controlled NOT (CNOT) gates than the conventional phase-estimation-based approach, so it is suitable for obtaining quantum advantages with NISQ devices. Note that Ref. [24] also took an approach without the phase estimation method, but it needed to change the query in each iteration, which is highly demanding in practice. Also, Ref. [25] gave an amplitude estimation scheme that employs Bayes’ rule together with random unitary operations (sampled from the Haar measure) to ideally realize the quadratic speedup without a controlled unitary operation; this scheme is applicable only to low-dimensional quantum circuits, owing to the difficulty of implementing the random unitaries.

2 Preliminary

We herein briefly describe the quantum amplitude amplification, which is the basis of our approach for the amplitude estimation problem.

Our proposed algorithm mainly consists of two parts: quantum amplitude amplification and amplitude estimation based on likelihood analysis. Amplitude amplification [26, 27] is a generalization of Grover’s quantum searching algorithm [23]. Similar to quantum searching, amplitude amplification achieves a quadratic speedup over the corresponding classical algorithm.

We assume a unitary operator \({{\mathscr {A}}}\) that acts on \((n+1)\) qubits, such that \(|{\varPsi }\rangle = {{\mathscr {A}}} |0\rangle _{n+1} = \sqrt{a}|{\tilde{{\varPsi }}}_1\rangle |1\rangle +\sqrt{1-a}|{\tilde{{\varPsi }}}_0\rangle |0\rangle \), where \(a \in \left[ 0,1\right] \) is the unknown parameter to be estimated, while \(|{\tilde{{\varPsi }}}_1\rangle \) and \(|{\tilde{{\varPsi }}}_0\rangle \) are the n-qubit normalized good and bad states. The query complexity of estimating a is counted by the number of applications of \({{\mathscr {A}}}\), often simply called the number of queries. By performing measurements on \(|{\varPsi }\rangle \) repeatedly, we can infer a from the ratio of good and bad outcomes, but in this case the number of queries is exactly the same as in the classical setting.

The advantage offered by the quantum amplitude amplification is that, instead of measuring right after the single operation of \({{\mathscr {A}}}\), we can amplify the probability of obtaining the good state by applying the following operator.

$$\begin{aligned} {\mathbf {Q}}=-{{\mathscr {A}}}{\mathbf {S}}_0{{\mathscr {A}}}^{-1}{\mathbf {S}}_{\chi }, \end{aligned}$$
(1)

where the operator \({\mathbf {S}}_\chi \) puts a negative sign to the good state, i.e., \({\mathbf {S}}_\chi |{\tilde{{\varPsi }}}_1\rangle |1\rangle = - |{\tilde{{\varPsi }}}_1\rangle |1\rangle \), and does nothing to the bad state. Similarly, \({\mathbf {S}}_0\) puts a negative sign to the all-zero state \(|0\rangle _{n+1}\) and does nothing to the other states. \({{\mathscr {A}}}^{-1}\) is the inverse of \({{\mathscr {A}}}\), the operation of which requires the same query complexity as \({{\mathscr {A}}}\).

By defining a parameter \(\theta _a \in \left[ 0,\pi /2\right] \) such that \(\sin ^2{\theta _a} = a\), we have

$$\begin{aligned} {{\mathscr {A}}} |0\rangle _{n+1} = \sin {\theta _a}|{\tilde{{\varPsi }}}_1\rangle |1\rangle + \cos {\theta _a}|{\tilde{{\varPsi }}}_0\rangle |0\rangle . \end{aligned}$$
(2)

Brassard et al. [16] showed that repeatedly applying \({\mathbf {Q}}\) for m times on \(|{\varPsi }\rangle \) results in

$$\begin{aligned} {\mathbf {Q}}^m|{\varPsi }\rangle = \sin ((2m+1)\theta _a)|{\tilde{{\varPsi }}}_1\rangle |1\rangle +\cos ((2 m+1)\theta _a)|{\tilde{{\varPsi }}}_0\rangle |0\rangle . \end{aligned}$$
(3)

This equation shows that, after applying \({\mathbf {Q}}\) m times (with 2m queries), we obtain the good state with a probability at least \(4m^2\) times larger than that obtained from \({{\mathscr {A}}}|0\rangle _{n+1}\), for sufficiently small a. This is in contrast to performing 2m measurements on \({{\mathscr {A}}}|0\rangle _{n+1}\), which increases the chance of observing the good state by only a factor of 2m. This intuitively explains the quadratic speedup obtained from amplitude amplification: if we can infer the ratio of good states after the amplification, we can estimate the value of a from the number of queries required to obtain that ratio.
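The statement above can be checked directly: in the two-dimensional subspace spanned by the good and bad states, \({\mathbf {Q}}\) acts as a rotation by \(2\theta_a\). A minimal sketch in Python (the value of theta below is an illustrative assumption, not taken from the text):

```python
import numpy as np

theta = 0.01                 # illustrative small value of the unknown parameter
a = np.sin(theta) ** 2       # probability of the good state before amplification

# In the basis (|good>, |bad>), Q rotates (sin t, cos t) to
# (sin(t + 2*theta), cos(t + 2*theta)).
Q = np.array([[ np.cos(2 * theta), np.sin(2 * theta)],
              [-np.sin(2 * theta), np.cos(2 * theta)]])
state = np.array([np.sin(theta), np.cos(theta)])   # A|0> in this basis

m = 3
amplified = np.linalg.matrix_power(Q, m) @ state
p_good = amplified[0] ** 2   # equals sin^2((2m+1)*theta), as in Eq. (3)
```

For this small theta, `p_good` exceeds `4 * m**2 * a`, illustrating the quadratic gain over repeated sampling.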

The conventional amplitude estimation [16] utilizes quantum phase estimation, which requires a quantum circuit implementing the multiple controlled \({\mathbf {Q}}\) operations, namely, \(\text{ Controlled- }{\mathbf {Q}}:~|m\rangle |{\varPsi }\rangle \rightarrow |m\rangle {\mathbf {Q}}^m|{\varPsi }\rangle \). Applying the controlled operations for many values of m in superposition and gathering the amplitudes via the inverse QFT enables an accurate estimation of a [16]. However, this approach suffers from the need for many controlled gates (and thus CNOT gates) and additional ancilla qubits (the number of which is dictated by the required accuracy). Such an approach can be problematic for NISQ devices.

3 Amplitude estimation without phase estimation

3.1 Algorithm

This section presents the quantum algorithm for estimating \(\theta _a\) in Eq. (3) without using the conventional phase-estimation-based method [16]. The first stage of the algorithm is to perform good/bad measurements on the quantum state \({{\mathbf {Q}}}^{m_k}|{\varPsi }\rangle \) for a chosen set of \(\{m_k\}\). Let \(N_k\) be the number of measurements (shots) and \(h_k\) the number of good-state outcomes for the state \({{\mathbf {Q}}}^{m_k}|{\varPsi }\rangle \); then, because the probability of measuring the good state is \(\sin ^2((2m_k+1)\theta _a)\), the likelihood function representing this probabilistic event is given by

$$\begin{aligned} L_k(h_k;\theta _a) = \left[ \sin ^2((2 m_k+1)\theta _a)\right] ^{h_k} \left[ \cos ^2((2 m_k+1)\theta _a)\right] ^{N_k-h_k}. \end{aligned}$$
(4)

The second stage of the algorithm is to combine the likelihood functions \(L_k(h_k;\theta _a)\) for several \(\{m_0, \ldots , m_M\}\) to construct a single likelihood function \(L(\mathbf {h};\theta _a)\):

$$\begin{aligned} L(\mathbf {h};\theta _a) = \prod _{k=0}^{M}L_k(h_k;\theta _a), \end{aligned}$$
(5)

where \(\mathbf {h}=(h_0,h_1,\ldots ,h_M)\). The ML estimate is defined as the value that maximizes \(L(\mathbf {h};\theta _a)\):

$$\begin{aligned} {\hat{\theta }}_a = \mathop {\mathrm{arg~max}}\limits _{\theta _a} ~ L(\mathbf {h};\theta _a) = \mathop {\mathrm{arg~max}}\limits _{\theta _a} ~ \ln L(\mathbf {h};\theta _a) \end{aligned}$$
(6)

The whole procedure is summarized in Fig. 1. Now a and \(\theta _a\) are uniquely related through \(a=\sin ^2\theta _a\) in the range \(0\le \theta _a\le \pi /2\), and \({\hat{a}}:=\sin ^2{\hat{\theta }}_a\) is the ML estimate for a; thus, in what follows, \(L(\mathbf {h}; \theta _a)\) is also denoted as \(L(\mathbf {h}; a)\). Note that the random variables \(h_0, h_1, \ldots , h_M\) are independent but not identically distributed, because the probability distribution for obtaining \(h_k\), i.e., \(p_k(h_k;\theta _a)\propto L_k(h_k;\theta _a)\), is different for each k; however, the multidimensional random variable \(\mathbf {h}=(h_0,h_1,\ldots ,h_M)\) is generated from the single joint probability distribution \(p(\mathbf {h};\theta _a)\propto L(\mathbf {h};\theta _a)\).
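The two stages can be sketched numerically: generate the counts \(h_k\) by Bernoulli sampling, then maximize the combined log-likelihood on a grid over \([0,\pi /2]\). The schedule \(\{m_k\}\), the target value \(a=0.3\), and the shot count below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.arcsin(np.sqrt(0.3))        # illustrative target, a = 0.3
ms = [0, 1, 2, 4, 8]                        # example schedule {m_k}
N_shot = 100

# First stage: measuring Q^{m_k}|Psi> is Bernoulli sampling with
# success probability sin^2((2 m_k + 1) theta), Eq. (4).
h = [rng.binomial(N_shot, np.sin((2 * m + 1) * theta_true) ** 2) for m in ms]

# Second stage: combined log-likelihood ln L(h; theta) = sum_k ln L_k(h_k; theta)
def log_likelihood(theta):
    ll = 0.0
    for m, hk in zip(ms, h):
        p = np.sin((2 * m + 1) * theta) ** 2
        p = np.clip(p, 1e-12, 1 - 1e-12)    # avoid log(0) at the grid endpoints
        ll += hk * np.log(p) + (N_shot - hk) * np.log(1 - p)
    return ll

grid = np.linspace(0.0, np.pi / 2, 100000)
theta_hat = grid[np.argmax(log_likelihood(grid))]
a_hat = np.sin(theta_hat) ** 2
```

Because the \(m_0=0\) factor has a unique maximum in \([0,\pi /2]\), it suppresses the spurious peaks of the higher-\(m_k\) factors, so the combined likelihood has a single global maximum near the true value.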

Fig. 1
figure 1

Schematic picture of our amplitude estimation algorithm using the ML estimation. After preparing the states \({\mathbf {Q}}^{m_k} |{\varPsi }\rangle \), the numbers of good-state outcomes, \(h_k\), are obtained (left). Based on the obtained \(h_k\), the likelihood functions \(L_k(h_k;\theta _a)\) are constructed (center). Finally, a single likelihood function \(L({\mathbf {h}};\theta _a)\) is constructed by combining the likelihood functions \(L_k(h_k;\theta _a)\) (right). The ML estimate is the value that maximizes the likelihood function \(L({\mathbf {h}};\theta _a)\)

This algorithm has two caveats: (i) if only a single amplitude amplification circuit is used, as in the Grover search algorithm, i.e., the case \(M=0\) and \(m_0\ne 0\), the ML estimate \({\hat{\theta }}_a\) cannot be uniquely determined due to the periodicity of \(L_0(h_0;\theta _a)\); and (ii) if no amplification operator is applied, i.e., \(m_k=0~\forall k\), then the ML estimate is unique, but it offers no quantum advantage, as shown later. Hence, the heart of our algorithm can be regarded as a quantum-circuit fusion technique that combines several quantum circuits so that the target value is determined uniquely while a quantum advantage is still guaranteed.

3.2 Statistics: Cramér–Rao bound and Fisher information

What remains to be determined in our algorithm is the design of the sequences \(\{m_k, N_k\}\) such that the resulting ML estimate \({\hat{\theta }}_a\) has a distinct quantum advantage over the classical one. Here, we introduce a basic statistical method to carry out this task and, based on that method, give specific choices of \(\{m_k, N_k\}\).

First, in general, the Fisher information \({{\mathscr {I}}}(a)\) is defined as

$$\begin{aligned} {{\mathscr {I}}}(a) = {\mathbb {E}}\left[ \left( \frac{\partial }{\partial a}\ln L(x;a) \right) ^2 \right] , \end{aligned}$$
(7)

where the expectation is taken over a random variable x subject to a given probability distribution \(p(x;a)\) with an unknown parameter a. The importance of the Fisher information can be clearly seen from the fact that any estimate \({{\hat{a}}}\) satisfies the following Cramér–Rao inequality.

$$\begin{aligned} {\mathrm{var}}({\hat{a}}) = {\mathbb {E}}[({\hat{a}}-{\mathbb {E}}[{\hat{a}}])^2] \ge \frac{[1+b'(a)]^2}{{{{\mathscr {I}}}}(a)}, \end{aligned}$$
(8)

where b(a) represents the bias defined by \(b(a)={\mathbb {E}}[{\hat{a}}-a]\) and \(b'(a)\) indicates the derivative of b(a) with respect to a. It is easy to see that the mean squared estimation error satisfies

$$\begin{aligned} {\mathbb {E}}[({\hat{a}}-a)^2]\ge \frac{[1+b'(a)]^2}{{{{\mathscr {I}}}}(a)}+b(a)^2. \end{aligned}$$
(9)

An especially important property of the ML estimate, which maximizes the likelihood function \(\prod _{k}p({x}_k;a)\) for measurement data \({x}_k\), is that it becomes unbiased, i.e., \(b(a)=0\), and moreover achieves the equality in Eq. (9), in the limit of a large number of measurements [28]; that is, the ML estimate is asymptotically optimal.

In our case, by substituting Eqs. (4) and (5) into Eq. (7), together with the straightforward calculation \({\mathbb {E}}[h_k]=N_k\sin ^2((2m_k+1)\theta _a)\), we find that

$$\begin{aligned} {{{\mathscr {I}}}(a)}=\frac{1}{a(1-a)}\sum _{k=0}^{M}{N_k}(2 m_k+1)^2. \end{aligned}$$
(10)

Also, for any sequences \(\{m_k, N_k\}\), the total number of queries is given as

$$\begin{aligned} N_{\mathrm{q}}=\sum _{k=0}^M N_k (2m_k+1). \end{aligned}$$
(11)

As stated before, the coefficient 2 multiplying \(m_k\) in Eq. (11) originates from the fact that the operator \({\mathbf {Q}}\) uses \({{\mathscr {A}}}\) and \({{\mathscr {A}}}^{-1}\), and the constant \(+1\) is due to the initial state preparation of \(|{\varPsi }\rangle ={{\mathscr {A}}}|0\rangle _{n+1}\). If \({\mathbf {Q}}\) is not applied to \(|{\varPsi }\rangle \) and if only the final measurements are performed for \(|{\varPsi }\rangle \), i.e., \(m_k=0\) for all k, the total number of queries is identical to that of classical random sampling. Because \(N_k\) and \((2 m_k + 1)\) are positive integers, the Fisher information in Eq. (10) satisfies the following relation.

$$\begin{aligned} {{{\mathscr {I}}}}(a) ~\le ~ \frac{1}{a(1-a)}\left( \sum _{k=0}^M N_k(2m_k+1)\right) ^2 ~=~ \frac{1}{a(1-a)}N_{\mathrm{q}}^2. \end{aligned}$$
(12)
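Eqs. (10)–(12) are straightforward to evaluate for any candidate schedule; a small sketch (the schedule and the value of a below are arbitrary examples):

```python
import numpy as np

def fisher_info(a, ms, Ns):
    # Eq. (10): I(a) = (1/(a(1-a))) * sum_k N_k (2 m_k + 1)^2
    return sum(N * (2 * m + 1) ** 2 for m, N in zip(ms, Ns)) / (a * (1 - a))

def n_queries(ms, Ns):
    # Eq. (11): N_q = sum_k N_k (2 m_k + 1)
    return sum(N * (2 * m + 1) for m, N in zip(ms, Ns))

a = 0.25                       # illustrative value of the unknown parameter
ms = [0, 1, 2, 4, 8]           # example schedule {m_k}
Ns = [100] * len(ms)           # N_k = 100 shots for each circuit

I = fisher_info(a, ms, Ns)
Nq = n_queries(ms, Ns)
bound = np.sqrt(a * (1 - a)) / Nq   # Heisenberg-limit lower bound, Eq. (13)
```

The inequality (12) then guarantees that the Cramér–Rao bound `1/sqrt(I)` can never fall below `bound` for any choice of schedule.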

Here, \({{\hat{a}}}\) is taken to be the ML estimate (6), and the estimation error is defined as \({\hat{\varepsilon }}=\sqrt{{\mathbb {E}}[({\hat{a}}-a)^2]}\). The total number of measurements \(\sum _{k=0}^M N_k\) is assumed to be sufficiently large, in which case the ML estimate asymptotically converges to an unbiased estimate and achieves the lower bound of the Cramér–Rao inequality (8), as noted above. Hence, from Eqs. (8) and (12), the error \({\hat{\varepsilon }}\) satisfies

$$\begin{aligned} {\hat{\varepsilon }} \rightarrow \frac{1}{{{{\mathscr {I}}}}(a)^{1/2}} \ge \frac{\sqrt{a(1-a)}}{N_{\mathrm{q}}}. \end{aligned}$$
(13)

(More precisely, \({\hat{\varepsilon }} \, {{\mathscr {I}}}(a)^{1/2} \rightarrow 1\).) That is, the lower bound of the estimation error is on the order of \({{\mathscr {O}}}(N_{\mathrm{q}}^{-1})\), which is referred to as the Heisenberg limit. This is in stark contrast to the classical sampling method, the estimation error of which is lower bounded by \(\sqrt{a(1-a)}/N_{\mathrm{q}}^{1/2}\), obtained by setting \(m_k=0~\forall k\) (i.e., a case with no amplitude amplification) in Eqs. (10) and (11); that is, the lower bound is at best on the order of \({{\mathscr {O}}}(N_{\mathrm{q}}^{-1/2})\) in the classical case.

Now, we can consider the problem posed at the beginning of this subsection: designing the sequences \(\{m_k, N_k \}\) so that the resulting ML estimate \({\hat{\theta }}_a\) outperforms the classical limit \({{\mathscr {O}}}(N_{\mathrm{q}}^{-1/2})\) and hopefully achieves the Heisenberg limit \({{\mathscr {O}}}(N_{\mathrm{q}}^{-1})\), i.e., the quantum quadratic speedup. Although the problem can be formulated as a maximization problem of Fisher information (10) with respect to \(\{ m_k, N_k \}\) under some constraints on these variables, here we fix \(N_k\)’s to a constant and provide just two examples of the sequence \(\{ m_k\}\):

  • Linearly incremental sequence (LIS): \(N_k=N_{\mathrm{shot}}\) for all k, and \(m_k=k\), i.e., it increases as \(m_0=0, m_1=1, m_2=2, \ldots , m_M=M\).

  • Exponentially incremental sequence (EIS): \(N_k=N_{\mathrm{shot}}\) for all k, and \(m_k\) increases as \(m_0=0, m_1=2^0, m_2=2^1,\ldots , m_M=2^{(M-1)}\).

In the case of LIS, the Fisher information (10) and the number of queries (11) are calculated as \({{{\mathscr {I}}}}(a)=N_{\mathrm{shot}}(2M+3)(2M+1)(M+1)/(3a(1-a))\) and \(N_{\mathrm{q}}=N_{\mathrm{shot}}(M+1)^2\), respectively. Because \(N_{\mathrm{q}}\sim N_{\mathrm{shot}}M^2\) and \({{{\mathscr {I}}}}(a)\sim N_{\mathrm{shot}}M^3/(3 a(1-a))\) when \(M\gg 1\), the lower bound of the estimation error is evaluated as \({\hat{\varepsilon }}=1/{{{\mathscr {I}}}}(a)^{1/2}\sim N_{\mathrm{q}}^{-3/4}\); hence, a distinct quantum advantage arises, although it does not reach the Heisenberg limit. Next, for the case of EIS, we find \(N_{\mathrm{q}}\sim N_{\mathrm{shot}} 2^{M+1}\) and \({{\mathscr {I}}}(a)\sim N_{\mathrm{shot}} 2^{2(M+1)}/(3a(1-a))\), which lead to \({\hat{\varepsilon }}\sim N_{\mathrm{q}}^{-1}\). Therefore, this choice is asymptotically optimal; we again emphasize that the statistical method serves as a guide for finding a sequence \(\{m_k\}\) that achieves an optimal quantum amplitude estimation algorithm. Note, however, that these quantum advantages are guaranteed only in the asymptotic regime, and the realistic performance with finite (or rather short) circuit depth should be analyzed. We carry out a numerical simulation of this realistic case in the following.
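The asymptotic exponents \(-3/4\) (LIS) and \(-1\) (EIS) can be verified by fitting the log-log slope of the Cramér–Rao bound \(1/{{\mathscr {I}}}(a)^{1/2}\) against \(N_{\mathrm{q}}\); a sketch under the stated definitions of LIS and EIS (the value of a and the ranges of M are arbitrary choices):

```python
import numpy as np

a, N_shot = 0.5, 100   # illustrative parameter values

def crb(ms):
    # Cramér–Rao lower bound 1/sqrt(I(a)) for a schedule {m_k}, Eqs. (10)/(13)
    I = N_shot * sum((2 * m + 1) ** 2 for m in ms) / (a * (1 - a))
    Nq = N_shot * sum(2 * m + 1 for m in ms)
    return Nq, 1 / np.sqrt(I)

def slope(schedules):
    # log-log slope of the bound versus the number of queries
    pts = np.array([crb(ms) for ms in schedules])
    return np.polyfit(np.log(pts[:, 0]), np.log(pts[:, 1]), 1)[0]

lis = [list(range(M + 1)) for M in range(5, 40)]            # m_k = k
eis = [[0] + [2 ** k for k in range(M)] for M in range(5, 20)]  # m_k = 2^{k-1}
s_lis, s_eis = slope(lis), slope(eis)
```

The fitted slopes come out close to \(-0.75\) and \(-1\), matching the \(N_{\mathrm{q}}^{-3/4}\) and Heisenberg-limit scalings derived above.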

3.3 Numerical simulation

In this section, the ML estimates \({\hat{\theta }}_a\) and errors \({\hat{\varepsilon }}\) are evaluated numerically for several fixed target probabilities \(a=\sin ^2 \theta _a\). Based on the sequences \(\{N_k\}\) and \(\{m_k\}\) given in the previous subsection, the \(h_k\) in Eq. (5) are generated by Bernoulli sampling with probability \(\sin ^2((2m_k+1)\theta _a)\) for each k. The global maximum of the likelihood function is obtained by a modified brute-force search algorithm: the global maximum of \(\prod _{k=0}^{m}L_k(h_k;\theta _a)\) is determined by searching in the vicinity of the estimated global maximum of \(\prod _{k=0}^{m-1}L_k(h_k;\theta _a)\). The errors \({\hat{\varepsilon }}\) are evaluated by repeating this procedure 1000 times for each \(N_{\mathrm{q}}\).
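One possible realization of the modified brute-force search is to halve the search window around the previous maximum each time a likelihood factor is added; the window-halving rule, the schedule, and the noiseless counts below are our illustrative assumptions, not details fixed by the text:

```python
import numpy as np

ms = [0, 1, 2, 4, 8, 16]        # example EIS schedule
N_shot = 100
theta_true = 0.41               # illustrative target value
# deterministic "measurement" counts for illustration (no sampling noise)
h = [round(N_shot * np.sin((2 * m + 1) * theta_true) ** 2) for m in ms]

def log_lk(theta, j):
    # combined log-likelihood using the first j+1 factors L_0 ... L_j
    ll = 0.0
    for m, hk in zip(ms[: j + 1], h):
        p = np.clip(np.sin((2 * m + 1) * theta) ** 2, 1e-12, 1 - 1e-12)
        ll += hk * np.log(p) + (N_shot - hk) * np.log(1 - p)
    return ll

# coarse global search for k = 0, then local refinement as each L_k is added
theta_hat, width = np.pi / 4, np.pi / 4
for j in range(len(ms)):
    grid = np.linspace(max(theta_hat - width, 0.0),
                       min(theta_hat + width, np.pi / 2), 2000)
    theta_hat = grid[np.argmax(log_lk(grid, j))]
    width /= 2                  # the likelihood peak narrows as m_j grows
```

Each refinement stage needs only a fixed-size grid because the earlier factors have already localized the estimate, which is what keeps the post-processing cost low.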

In Fig. 2, the relationship between the number of queries and errors is plotted for the target probabilities \(a=\sin ^2 \theta _a=2/3\), 1/3, 1/6, 1/12, 1/24, and 1/48 with \(N_{\text {shot}}=100\). The (red) triangles and (black) circles in Fig. 2 are errors that are obtained using LIS and EIS, respectively. For comparison, numerical simulations with \(m_k=0\) for all k are also performed, and the results are plotted as (blue) squares in Fig. 2. In addition, the lower bounds of errors (13) when the estimate is not biased are also plotted as (red) dotted and (black) solid lines for LIS and EIS, respectively. The (blue) dashed lines in Fig. 2 are the lower bounds for classical random sampling, i.e., \(\sqrt{a(1-a)/N_{\mathrm{q}}}\).

The slopes of the simulated results with the target probability \(a=\sin ^2\theta _a=1/48\), ranging from \(N_{\mathrm{q}}\simeq 10^3\) to \(N_{\mathrm{q}}\simeq 10^5\) in Fig. 2, are fitted by \(\log {\hat{\varepsilon }}=\gamma \cdot \log N_{\mathrm{q}}+\delta \), and the fitted slopes are \(\gamma =-0.76\), \(\gamma =-0.95\), and \(\gamma =-0.50\) for LIS, EIS, and classical random sampling, respectively. Similar slopes are obtained for the other target probabilities. The fitted values of \(\gamma \) for LIS and EIS are consistent with the slopes obtained from the Fisher information, although \(\gamma \) deviates slightly from the theoretical values. This slight deviation indicates that \({\hat{a}}\) is a biased estimate; in fact, the deviation decreases as \(N_{\text {shot}}\) increases, which is consistent with the fact that, in general, the ML estimate becomes unbiased asymptotically as the number of samples increases. The efficiency of the ML estimate can also be observed in the numerical simulation: the estimation error approaches the Cramér–Rao lower bound (13). In Appendix A, we compare the error of the conventional phase-estimation-based approach with that of EIS; their estimation errors are found to be comparable.

Fig. 2
figure 2

Relationships between the number of queries and the estimation error for several target probabilities \(a=\sin ^2\theta _a\). The lower bounds of the estimation error based on the Cramér–Rao inequality are depicted as lines: the (blue) dashed line is for \(m_k=0\) for all k (classical random sampling), the (red) dotted line is for \(m_0=0, m_1=1, \ldots , m_M=M\) (LIS), and the (black) solid line is for \(m_0=0, m_1=2^0, \ldots , m_M=2^{M-1}\) (EIS). The estimation errors obtained by numerical simulations are plotted as symbols: the (blue) squares are for classical random sampling, the (red) triangles are for LIS, and the (black) circles are for EIS (Color figure online)

Finally, we remark that the computational complexity for naively finding the maximum of the likelihood function is on the order of \({{\mathscr {O}}}((1/\varepsilon )\ln (1/\varepsilon ))\) if \(m_k\) grows exponentially, as in EIS. This is because the computational complexity of evaluating the log-likelihood function \(\ln L(\mathbf {h};\theta _a)\) is \({{\mathscr {O}}}(M)\) in this case. The order of the error \(\varepsilon \) is \({{\mathscr {O}}}(N_{\mathrm{q}}^{-1})\) by the Cramér–Rao bound. Because \(N_{\mathrm{q}}\sim 2^M N_{\mathrm{shot}}\), the complexity of evaluating the likelihood function is \({{\mathscr {O}}}(\ln (1/\varepsilon ))\). Assuming that a brute-force search over \(1/\varepsilon \) segments is performed to find the global maximum of the likelihood function, the complexity of finding the maximum is \({\mathscr {{O}}}((1/\varepsilon )\ln (1/\varepsilon ))\). In the case of LIS, the computational complexity can be evaluated as \({{\mathscr {O}}}(\varepsilon ^{-5/3})\) in the same manner. It should be noted that the brute-force search for the global maximum of \(\prod _{k=0}^{M}L_k(h_k;\theta _a)\) is not necessary if \(m_k\) is zero for all k (the classical case), since the target value is simply obtained as \({\hat{a}}=\sum _{k=0}^{M}h_k/\sum _{k=0}^{M}N_{\mathrm{shot}}\). The error is then \({{\mathscr {O}}}(N_{\mathrm{q}}^{-1/2})\) by the Cramér–Rao bound. Because \(N_{\mathrm{q}}=N_{\mathrm{shot}}M\), the computational complexity in the classical case is \({{\mathscr {O}}}(\varepsilon ^{-2})\). The computational complexities of post-processing for the different update rules of \(m_k\) are summarized together with the query complexities in Table 1.

Table 1 Summary of the complexities for estimating target value with given error \(\varepsilon \). The query complexity and computational complexity of post-processing for different update rules of \(m_k\) are listed

4 Application to the Monte Carlo integration

We conduct a Monte Carlo integration as an example application of our algorithm. In this section, we first review the quantum algorithm for calculating a Monte Carlo integral by amplitude estimation [15] and then explain the amplitude amplification operator used in our algorithm. Next, we present the integral of the sine function as a simple example of Monte Carlo integration. Using this example, we discuss the number of CNOT gates and qubits required for our algorithm and for the conventional amplitude estimation [16].

4.1 The Monte Carlo integration as an amplitude estimation

One purpose of the Monte Carlo integration is to calculate the expected value of a real-valued function \( 0 \le f(x) \le 1\) defined for n-bit input \(x \in \{0,1\}^n\) with probability p(x):

$$\begin{aligned} {\mathbb {E}}[f(x)]=\sum _{x=0}^{2^n-1}p(x)f(x). \end{aligned}$$
(14)

In the quantum algorithm for the Monte Carlo integration, an additional (ancilla) qubit is introduced and assumed to be rotated as

$$\begin{aligned} {{\mathscr {R}}}|x\rangle _n|0\rangle =|x\rangle _n\left( \sqrt{f(x)}|1\rangle + \sqrt{1-f(x)}|0\rangle \right) , \end{aligned}$$
(15)

where \({{\mathscr {R}}}\) is a unitary operator acting on \(n+1\) qubits. In addition, an algorithm \({{\mathscr {P}}}\) is introduced, and operating \({{\mathscr {P}}}\) on the n-qubit register \(|0\rangle _n\) yields

$$\begin{aligned} {{\mathscr {P}}}|0\rangle _n=\sum _{x=0}^{2^n-1}\sqrt{p(x)}|x\rangle _n, \end{aligned}$$
(16)

where all qubits in \(|0\rangle _n\) are in the state \(|0\rangle \). Operating \({{\mathscr {R}}}({{\mathscr {P}}}\otimes {\mathbf {I}}_{\mathrm{1}})\) on the state \(|0\rangle _n|0\rangle \) generates \(|{\varPsi }\rangle \):

$$\begin{aligned} |{\varPsi }\rangle= & {} {{\mathscr {R}}}({{\mathscr {P}}}\otimes {\mathbf {I}}_{\mathrm{1}})|0\rangle _n|0\rangle \end{aligned}$$
(17)
$$\begin{aligned}= & {} \sum _{x=0}^{2^n-1}\sqrt{p(x)}|x\rangle _n\left( \sqrt{f(x)}|1\rangle + \sqrt{1-f(x)}|0\rangle \right) , \end{aligned}$$
(18)

where \({\mathbf {I}}_{\mathrm{1}}\) is the identity operator acting on an ancilla qubit. For convenience, we put \(a=\sum _{x=0}^{2^n-1}p(x)f(x)\) and introduce two orthonormal bases:

$$\begin{aligned} |{\tilde{{\varPsi }}}_1\rangle= & {} \frac{1}{\sqrt{a}}\sum _{x=0}^{2^n-1}\sqrt{p(x)} \sqrt{f(x)}|x\rangle _n|1\rangle , \end{aligned}$$
(19)
$$\begin{aligned} |{\tilde{{\varPsi }}}_0\rangle= & {} \frac{1}{\sqrt{1-a}}\sum _{x=0}^{2^n-1}\sqrt{p(x)} \sqrt{1-f(x)}|x\rangle _n|0\rangle . \end{aligned}$$
(20)

By using these bases, the state \(|{\varPsi }\rangle \) can be rewritten as

$$\begin{aligned} |{\varPsi }\rangle =\sqrt{a} |{\tilde{{\varPsi }}}_1\rangle +\sqrt{1-a} |{\tilde{{\varPsi }}}_0\rangle . \end{aligned}$$
(21)
Fig. 3
figure 3

Quantum circuit of amplitude amplification for the Monte Carlo integration

Thus, the square root of the expected value \(a={\mathbb {E}}[f(x)]\) appears as the amplitude of \(|{\tilde{{\varPsi }}}_1\rangle \), and the Monte Carlo integration can be regarded as an amplitude estimation of \(|{\tilde{{\varPsi }}}_1\rangle \). The operator \({\mathbf {Q}}\) defined in Eq. (1) can be implemented as \({\mathbf {U}}_{{\varPsi }}{\mathbf {U}}_{{\tilde{{\varPsi }}}_0}\), where \({\mathbf {U}}_{{\tilde{{\varPsi }}}_0}={\mathbf {I}}-2|{\tilde{{\varPsi }}}_0\rangle \langle {\tilde{{\varPsi }}}_0|\), \({\mathbf {U}}_{{\varPsi }}={\mathbf {I}}-2|{\varPsi }\rangle \langle {\varPsi }|\), and \({\mathbf {I}}\) is the identity acting on \(n+1\) qubits [16]. From a practical point of view, we use \({\mathbf {U}}_{{\tilde{{\varPsi }}}_0}={\mathbf {I}}_{n+1}-2{\mathbf {I}}_n\otimes |0\rangle \langle 0|\), where \({\mathbf {I}}={\mathbf {I}}_{n+1}={\mathbf {I}}_n\otimes (|0\rangle \langle 0|+|1\rangle \langle 1|)\). By putting \(a=\sin ^2\theta _a\) and using Eq. (3), we can apply our algorithm to the Monte Carlo integration. The circuit diagram of the amplitude amplification used in our algorithm is shown in Fig. 3. Note that the multi-qubit gate consisting of \({{\mathscr {P}}}\) and \({{\mathscr {R}}}\) in Fig. 3 corresponds to the quantum algorithm \({{\mathscr {A}}}\) of Sect. 2, and only the ancilla qubit is measured for each k when our algorithm is applied to the Monte Carlo integration. Similarly, the circuit of the conventional amplitude estimation [16] is shown in Fig. 4. In the following, we apply our algorithm to a very simple integral of the sine function and compare the number of CNOT gates and qubits with those of the conventional amplitude estimation.

Fig. 4
figure 4

Quantum circuit of conventional amplitude estimation for the Monte Carlo integration. \(F_{m}^{-1}\) represents the inverse QFT of m qubits

4.2 Simple example: integral of the sine function

As a simple example of the Monte Carlo integration, the following integral is considered.

$$\begin{aligned} I = \frac{1}{b_{\text {max}}} \int _0^{b_{\text {max}}} \sin (x)^2 dx, \end{aligned}$$
(22)

where \(b_{\text {max}}\) is a constant that determines the upper limit of the integral. By discretizing this integral with n qubits, we obtain

$$\begin{aligned} S = \sum _{x=0}^{2^n-1} p\left( x\right) \sin ^2\left( \frac{\left( x+\frac{1}{2}\right) b_{\text {max}}}{2^n}\right) , \end{aligned}$$
(23)

where \(p(x)=\frac{1}{2^n}\) is a discrete uniform probability distribution. We now explicitly describe the operators \({{\mathscr {P}}}\) and \({{\mathscr {R}}}\) for applying our algorithm to calculate sum (23). The operator \({{\mathscr {P}}}\) acting on the n-qubit initial state can be defined as

$$\begin{aligned} {{\mathscr {P}}}: |0\rangle _n |0\rangle \rightarrow \frac{1}{\sqrt{2^n}} \sum _{x=0}^{2^n-1}|x\rangle _n|0\rangle . \end{aligned}$$
(24)

The operator \({{\mathscr {P}}}\) can be constructed using n Hadamard gates. The operator \({{\mathscr {R}}}\) acting on the \((n+1)\)-qubit state \(|x\rangle _n|0\rangle \) can be defined as

$$\begin{aligned} {{\mathscr {R}}}: |x\rangle _n |0\rangle \rightarrow |x\rangle _n\left( \sin \left( \frac{\left( x+\frac{1}{2}\right) b_{\text {max}}}{2^n} \right) |1\rangle +\cos \left( \frac{\left( x+\frac{1}{2}\right) b_{\text {max}}}{2^n} \right) |0\rangle \right) . \end{aligned}$$
(25)

The operator \({{\mathscr {R}}}\) can be constructed using controlled Y-rotations as illustrated in Fig. 5.
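Putting the pieces together, the state \(|{\varPsi }\rangle \) and the operator \({\mathbf {Q}}={\mathbf {U}}_{{\varPsi }}{\mathbf {U}}_{{\tilde{{\varPsi }}}_0}\) for this example can be simulated directly as real state vectors. A sketch for \(n=2\) and \(b_{\text {max}}=\pi /4\) (the same values as in the circuit examples of this section), checking that the good-state probability equals the sum S of Eq. (23) and that \({\mathbf {Q}}^m\) reproduces Eq. (3):

```python
import numpy as np

n, b_max = 2, np.pi / 4                   # parameters from the example in the text
N = 2 ** n
phi = (np.arange(N) + 0.5) * b_max / N    # R rotates the ancilla by phi_x, Eq. (25)

# |Psi> = R (P ⊗ I_1)|0>_{n+1}; the ancilla is the least significant index bit
psi = np.zeros(2 * N)
psi[0::2] = np.sqrt(1.0 / N) * np.cos(phi)   # |x>|0> components
psi[1::2] = np.sqrt(1.0 / N) * np.sin(phi)   # |x>|1> components

S = np.mean(np.sin(phi) ** 2)             # discretized integral, Eq. (23)
theta_a = np.arcsin(np.sqrt(S))           # a = sin^2(theta_a) = S

# Q = U_Psi U_Psi0, with U_Psi = I - 2|Psi><Psi|, U_Psi0 = I - 2 I_n ⊗ |0><0|
U_Psi = np.eye(2 * N) - 2 * np.outer(psi, psi)
U_Psi0 = np.diag(np.where(np.arange(2 * N) % 2 == 1, 1.0, -1.0))
Q = U_Psi @ U_Psi0

m = 2
state = np.linalg.matrix_power(Q, m) @ psi
p_good = np.sum(state[1::2] ** 2)         # probability of measuring ancilla |1>
```

The good-state probability after m applications of `Q` matches \(\sin ^2((2m+1)\theta _a)\) to machine precision, and S agrees with the closed-form integral \(I = 1/2 - \sin (2 b_{\text {max}})/(4 b_{\text {max}})\) up to the midpoint-rule discretization error.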

Fig. 5
figure 5

Quantum circuit achieving the operator \({{\mathscr {R}}}\) in Eq. (25). In this circuit, \(|x\rangle \) in Eq. (25) is represented by n qubits, denoted by \(|q\rangle ^{(1)}\), \(|q\rangle ^{(2)},\ldots ,|q\rangle ^{(n)}\). \(R_y(\theta )\) represents a Y-rotation with angle \(\theta \)

We now give an explicit example of the circuits that calculate the sum (23): the amplitude amplification circuit used in our algorithm and the conventional amplitude estimation circuit, each with a single \({\mathbf {Q}}\) operation. For simplicity, the circuits for \(b_{\text {max}} = \pi /4 \) and \(n=2\) are shown here. The quantum circuits for amplitude amplification and conventional amplitude estimation are shown in Figs. 6 and 7, respectively; in both circuits, all-to-all qubit connectivity is assumed. These figures show that the circuit for conventional amplitude estimation tends to require more gates and qubits than that of our algorithm. Furthermore, the multi-controlled operation in the conventional amplitude estimation circuit of Fig. 7 may require several ancilla qubits.
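The effect of the amplitude amplification performed by these circuits can likewise be checked numerically. The sketch below (a classical NumPy simulation, not a gate-level implementation; the basis ordering \(2x+q\) is an assumption of this sketch) builds the Grover operator \({\mathbf {Q}}\) as a product of two reflections and verifies that after m applications the good state is measured with probability \(\sin ^2\left( (2m+1)\theta _a\right) \), where \(\sin ^2\theta _a = S\).

```python
import numpy as np

n, b_max = 2, np.pi / 4
N = 2 ** n
theta_x = (np.arange(N) + 0.5) * b_max / N

# |psi> = A|0>: uniform superposition over x with the ancilla rotated by theta_x.
psi = np.zeros(2 * N)
psi[0::2] = np.cos(theta_x) / np.sqrt(N)   # ancilla |0> ("bad")
psi[1::2] = np.sin(theta_x) / np.sqrt(N)   # ancilla |1> ("good")

good = np.zeros(2 * N)
good[1::2] = 1.0                           # diagonal of the projector onto ancilla |1>
a = np.sum(psi ** 2 * good)                # a = S, the discretized sum (23)
theta = np.arcsin(np.sqrt(a))

# Q = S_psi S_chi: S_chi flips the sign of the good states,
# S_psi reflects about |psi> (equivalently, -A S_0 A^dagger).
S_chi = np.diag(1 - 2 * good)
S_psi = 2 * np.outer(psi, psi) - np.eye(2 * N)
Q = S_psi @ S_chi

state = psi.copy()
for m in range(1, 4):
    state = Q @ state
    p_good = np.sum(state ** 2 * good)
    # After m amplifications the success probability is sin^2((2m+1)theta).
    assert np.isclose(p_good, np.sin((2 * m + 1) * theta) ** 2)
```

Each application of \({\mathbf {Q}}\) rotates the state by \(2\theta _a\) in the two-dimensional subspace spanned by the good and bad components, which is exactly the behavior the estimation exploits.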

Table 2 shows the number of CNOT gates and qubits as a function of the number of \({\mathbf {Q}}\) operators required for conventional amplitude estimation and for our algorithm. Here, we assume the gate set supported by Qiskit ver. 0.7 [29]. Because the number of CNOT gate operations in NISQ devices is restricted by error accumulation, for our algorithm we evaluate only the CNOT counts of the circuit with the largest \(m_k\). The number of CNOT gates in our algorithm is about 7–18 times smaller than that of conventional amplitude estimation. Moreover, the number of qubits required for conventional amplitude estimation increases with the number of \({\mathbf {Q}}\) operations, whereas that for our algorithm remains constant. The source code for Monte Carlo integration based on our proposed algorithm is available at [30].

Fig. 6
figure 6

Quantum circuit of amplitude amplification in the case of \(n=2\) with single \({\mathbf {Q}}\) operation

Fig. 7
figure 7

Quantum circuit of conventional amplitude estimation in the case of \(n=2\) with a single \({\mathbf {Q}}\) operation

Table 2 Number of CNOT gates and qubits to calculate (23) as a function of \({\mathbf {Q}}\) operations

5 Conclusion

We proposed a quantum amplitude estimation algorithm that achieves quantum speedup without costly controlled gates by using ML estimation. The essential idea of the proposed algorithm is to construct a likelihood function from the outcomes of measurements on several quantum states transformed by the amplitude amplification process. Although the probability of measuring good or bad states depends on the number of amplitude amplification operations, the outcomes are correlated because each amplified probability is a function of a single parameter. To test the efficiency of the proposed algorithm, we performed numerical simulations and analyzed the relationship between the number of queries and the estimation error. Empirical evidence showed that the algorithm can estimate the target value with fewer queries than the classical algorithm. We also derived the lower bound of the estimation error in terms of the Fisher information and found that the estimation error observed in a numerical simulation was sufficiently close to the Heisenberg limit. In addition, we applied the proposed algorithm to a Monte Carlo integration and found that it requires fewer CNOT gates and qubits than conventional amplitude estimation. These facts indicate that our algorithm could work well even on noisy intermediate-scale quantum devices.
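The ML estimation summarized above can be illustrated with a small classical simulation. The sketch below (the amplification sequence \(m_k\), the shot count, and the target value are illustrative choices of this sketch, not prescriptions from this paper) draws binomial outcomes with probabilities \(\sin ^2\left( (2m_k+1)\theta \right) \) for each amplified circuit and recovers the amplitude by maximizing the combined log-likelihood on a grid.

```python
import numpy as np

rng = np.random.default_rng(0)
a_true = 0.3                         # hypothetical target amplitude
theta_true = np.arcsin(np.sqrt(a_true))
m_list = [0, 1, 2, 4, 8]             # numbers of amplifications (illustrative)
shots = 1000

# Simulated counts of the good state: after m amplifications it is
# observed with probability sin^2((2m+1)theta).
hits = [rng.binomial(shots, np.sin((2 * m + 1) * theta_true) ** 2)
        for m in m_list]

# Combined log-likelihood over all circuits, maximized on a theta grid.
grid = np.linspace(1e-4, np.pi / 2 - 1e-4, 100000)
logL = np.zeros_like(grid)
for m, h in zip(m_list, hits):
    p = np.clip(np.sin((2 * m + 1) * grid) ** 2, 1e-12, 1 - 1e-12)
    logL += h * np.log(p) + (shots - h) * np.log(1 - p)

a_hat = np.sin(grid[np.argmax(logL)]) ** 2
print(a_hat)   # close to a_true
```

The small-m circuits remove the periodic ambiguity of the large-m likelihoods, while the large-m circuits sharpen the peak, which is the mechanism behind the observed near-Heisenberg scaling.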

Shortly after the publication of our results, simplified quantum counting and amplitude estimation without QFT, with rigorous proofs, were presented [31]. In contrast to our approach, which can be run in parallel on multiple quantum devices, the simplified algorithms are adaptive and must be run sequentially. They also require a large constant-factor overhead, e.g., millions of measurement samples, which could be expensive in practice. Nevertheless, there are several interesting directions for future work, as pointed out in [31], such as obtaining rigorous proofs for our parallel approach and achieving quantum speedups with depth-limited quantum circuits.