## Introduction

Quantum computing, which utilizes the quantum entanglement and quantum superposition inherent to quantum mechanics, is rapidly gaining ground as a way to overcome the limitations of classical computing. Shor’s algorithm [1], which solves the integer factorization problem in polynomial time, and Grover’s algorithm [2], which substantially speeds up the search in unstructured databases, are among the best-known examples of the astounding properties of quantum computing (see [6], for example, for various applications of quantum computing).

An implementation of the Fourier transform as a quantum circuit sometimes plays a crucial role in quantum computing. Indeed, the quantum Fourier transform (QFT) [7] is a key ingredient of many important quantum algorithms, including Shor’s factoring algorithm and the quantum phase estimation algorithm to estimate the eigenvalues of a unitary operator. Here, the QFT is the Fourier transform for the amplitudes of a quantum state:

\begin{aligned} \sum _{j=0}^{N-1} x_j |j\rangle \longmapsto \sum _{k=0}^{N-1} X_k |k\rangle , \end{aligned}
(1.1)

where we set $$N=2^n$$, and the amplitudes $$\{X_k\}$$ are the classical discrete Fourier transform of the amplitudes $$\{x_j\}$$

\begin{aligned} X_k=\sum _{j=0}^{N-1}W_N^{j k}x_j, \qquad x_j=\frac{1}{N}\sum _{k=0}^{N-1} W_N^{-jk}X_k, \end{aligned}
(1.2)

where $$W_N:=\exp (-2\pi i/N)$$. Due to the superposition of the state (1.1) and quantum parallelism, the QFT can be implemented in a quantum circuit consisting of $$O(n^2)$$ quantum gates, which is much more efficient than the fast Fourier transform (FFT) [8], whose computational complexity is $$O(n 2^n)$$.
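As a classical reference point, the transform pair (1.2) can be evaluated directly. The following is a minimal sketch in plain Python; the function names `dft` and `idft` are illustrative and do not come from the paper. This direct evaluation costs $$O(N^2)$$ operations, which is the cost that the FFT reduces.

```python
import cmath

def dft(x):
    """Direct evaluation of X_k = sum_j W_N^{jk} x_j with W_N = exp(-2*pi*i/N)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(W ** (j * k) * x[j] for j in range(N)) for k in range(N)]

def idft(X):
    """Inverse transform x_j = (1/N) sum_k W_N^{-jk} X_k."""
    N = len(X)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(W ** (-j * k) * X[k] for k in range(N)) / N for j in range(N)]

x = [1, 2, 3, 4]          # sample data with N = 4
X = dft(x)
x_back = idft(X)          # recovers x up to rounding errors
```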

The Fourier transform that we consider in this paper is somewhat different from the QFT: We propose a quantum implementation of the algorithm of the FFT rather than the QFT. In our procedure, a data sequence is expressed in terms of a tensor product of vector spaces: $$\bigotimes _{j=0}^{N-1}|x_j\rangle$$. Namely, the state vectors representing the given classical information are prepared via so-called basis encoding [9]. (On the other hand, the QFT (1.1) is based on the amplitude encoding.) Based on the basis encoding, the Fourier transform is defined as

\begin{aligned} \bigotimes _{j=0}^{N-1}|x_j\rangle \longmapsto \bigotimes _{k=0}^{N-1}|X_k\rangle , \end{aligned}
(1.3)

where the data sequence $$\{X_k\}$$ is the Fourier transform of $$\{x_j\}$$ as expressed in (1.2). We adopt the reversible FFT [10] as an algorithm for the above Fourier transform and implement it as a quantum circuit whose computational complexity is $$O(n 2^n)$$. From this point of view, the processing speed is the same as the classical one, as long as we consider only a single data sequence. Nevertheless, there are the following advantages compared to the classical FFT, and even compared to the QFT. The first is due to quantum parallelism. Namely, utilizing a quantum superposition of multiple data sets, we can process them simultaneously. Note here that there are several problems, peculiar to quantum computing, concerning how to encode classical data in quantum states (and also how to read out the resultant superposed quantum data). To take advantage of quantum computing, a qRAM suitable for quantum computation [3,4,5] is necessary [see Sect. 5 for a comparison of computational costs between the classical FFT and our quantum version of the FFT (let us denote it as QFFT) including data encoding]. The second is due to its high versatility: The method is always applicable to data sets that can be processed by the conventional FFT. The third advantage is its data storage efficiency in terms, for instance, of the quantum image (see [11,12,13,14] for some applications of the QFT to quantum data sets).

Let us illustrate the third advantage above with a simple example: an $$L \times L$$ pixel image with a grayscale value ranging from 0 to $$M-1$$ ($$M=2^m$$) (see Fig. 1 for $$L=2$$). (This problem is equivalent to a lattice quantum many-body problem on an $$L\times L$$ square lattice with each site occupied by a particle with M degrees of freedom.) This quantum image $$|\psi ^{(\alpha )}\rangle$$ ($$\alpha$$ denotes the label of the image) can be represented by a tensor product of vector spaces [15,16,17,18]:

\begin{aligned} |\psi ^{(\alpha )}\rangle =\bigotimes _{(j,k)=(0,0)}^{(L-1,L-1)} |x^{(\alpha )}_{j,k}\rangle , \qquad 0\le x^{(\alpha )}_{j,k}\le M-1. \end{aligned}
(1.4)

Since $$|\psi ^{(\alpha )}\rangle \in (\mathbb {C}^2)^{\otimes m L^2}$$, it uses $$mL^2$$ qubits. By use of the quantum superposition, the QFFT can simultaneously process at most $$2^{mL^2}$$ quantum images:

\begin{aligned} |\Psi \rangle =\sum _{\alpha =1}^{2^{mL^2}} c_{\alpha } |\psi ^{(\alpha )}\rangle \in (\mathbb {C}^2)^{\otimes m L^2} \qquad (c_{\alpha }\in \mathbb {C}). \end{aligned}
(1.5)

On the other hand, to apply the QFT to the above image processing, we need to prepare the quantum image in the form of

\begin{aligned} |\widetilde{\psi }^{(\alpha )}\rangle =\sum _{(j,k)=(0,0)}^{(L-1,L-1)} x^{(\alpha )}_{j,k}|j,k\rangle , \end{aligned}
(1.6)

where $$|\widetilde{\psi }^{(\alpha )}\rangle \in (\mathbb {C}^2)^{\otimes 2\log _2L}$$ which uses only $$2\log _2L$$ qubits [cf. (1.4) for the QFFT] [19,20,21]. However, since the Fourier coefficients [see (1.1) and (1.2)] are expressed as the amplitudes of the superposition, it takes an exponentially long time to extract all of them completely. Furthermore, to properly perform the Fourier transform for the multiple $$2^{m L^2}$$ quantum images, they must be represented as

\begin{aligned} |\widetilde{\Psi }\rangle = \bigotimes _{\alpha =1}^{2^{mL^2}}|\widetilde{\psi }^{(\alpha )}\rangle \in (\mathbb {C}^2)^{\otimes (2^{mL^2+1})\log _2L}. \end{aligned}
(1.7)

Namely, for the QFT, $$(2^{mL^2+1})\log _2L$$ qubits are required to process the $$2^{m L^2}$$ quantum images, which is far more than the $$m L^2$$ qubits required for the QFFT. Moreover, the QFT must be applied to each image individually, since the data set (1.7) is not a superposition of images but a tensor product of the individual images. As a result, the total processing time for the QFFT is shorter than that for the QFT, when the number of the quantum images is sufficiently large.
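To make the comparison of qubit counts concrete, the two expressions can be tabulated for small images. This is only an arithmetic sketch; the function names are hypothetical, and $$L$$ is assumed to be a power of two so that $$\log _2 L$$ is an integer.

```python
def qubits_qfft(L, m):
    """Qubits for one basis-encoded L x L image with m-bit pixels, cf. (1.4)."""
    return m * L * L

def qubits_qft_all_images(L, m):
    """Qubits for the tensor product (1.7) of all 2^(m L^2) amplitude-encoded images."""
    log2_L = L.bit_length() - 1            # assumes L is a power of two
    return 2 ** (m * L * L) * 2 * log2_L

for L, m in [(2, 1), (2, 2), (4, 1)]:
    print(L, m, qubits_qfft(L, m), qubits_qft_all_images(L, m))
```

For $$L=2$$ and $$m=1$$, the basis-encoded superposition (1.5) needs 4 qubits, whereas the tensor product (1.7) needs 32.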

In this paper, we construct a quantum circuit of the above explained QFFT, by implementing some elementary arithmetic operations such as a quantum adder [22,23,24,25,26,27], subtractor [28,29,30,31] and newly developed shift-type operations, as efficiently as possible: Our quantum circuit does not generate any garbage bits.

The outline of the paper is as follows. In the subsequent section, introducing the algorithm of a quantum version of the FFT, we show the elementary arithmetic operations required for the implementation of the QFFT as a quantum circuit. In Sect. 3, we actually implement these elementary arithmetic operations into quantum circuits. In Sect. 4, combining these elementary circuits efficiently, we construct a quantum circuit for the QFFT. The number of quantum gates required for the implementation of the QFFT is estimated in Sect. 5. The computational costs of the classical FFT and the QFFT including data encoding are also compared in this section. In Sect. 6, we illustrate a concrete example of an application of the QFFT. Section 7 is devoted to a summary and discussion. Some technical details are deferred to the Appendix.

## Elementary operations required for the QFFT

In this section, we introduce the algorithm of a quantum version of the FFT and pictorially represent several arithmetic operations required for the implementation of the QFFT as a quantum circuit. (See [32], for instance, for the detailed algorithm of the FFT.) We only use the basis encoding method to obtain the quantum states. The matrix-like notations introduced here are helpful for the implementation of quantum algorithms.

### Algorithm of the QFFT

Let us start from the formulas (1.2) and (1.3) of the Fourier transform. Setting $$W_{N}=\exp (-2\pi i/N)$$ $$(N=2^n)$$ and decomposing the summation in (1.2) into the even and odd parts, we have

\begin{aligned} \left| X_k\right\rangle =\left| G^{(n-1,0)}_k+W_{N}^{k} G^{(n-1,1)}_k\right\rangle , \,\, \left| X_{k+N/2}\right\rangle =\left| G^{(n-1,0)}_k-W_{N}^k G^{(n-1,1)}_k\right\rangle , \end{aligned}
(2.1)

where $$0\le k \le N/2-1$$, and $${G_k^{(n-1,p)}}$$ ($$p=0,1$$) are the Fourier coefficients for $$\{x_{2r+p}\}$$ $$(0\le r \le N/2-1)$$:

\begin{aligned} {G^{(n-1,p)}_k}={\sum _{r=0}^{N/2-1}W_{N}^{2rk} x_{2r+p}} \quad (p=0,1). \end{aligned}
(2.2)

Note that $$G^{(n-1,p)}_{k+N/2}=G^{(n-1,p)}_k$$ and $$W^{N/2}_N=-1$$ hold. In general, $$X_k$$ is a complex number and the notation $$\left| X_k\right\rangle$$ stands for $$\left| (X_k)_r\right\rangle \otimes \left| (X_k)_i\right\rangle$$, where $$(X_k)_r$$ and $$(X_k)_i$$ are the real and imaginary parts of $$X_k$$, respectively. Pictorially, (2.1) can be represented as a so-called butterfly diagram:

(2.3)

where $$0\le k \le N/2-1$$. Here, the broken line means the multiplication by $$-1$$. For convenience, we also denote it as a matrix-like notation:

\begin{aligned} \begin{bmatrix} |X_k\rangle \\ |X_{k+N/2}\rangle \end{bmatrix} =\begin{bmatrix} 1 &{} 1 \\ 1 &{}-1 \end{bmatrix} \begin{bmatrix} 1 &{} 0 \\ 0 &{}W_N^k \end{bmatrix} \begin{bmatrix} \left| G^{(n-1,0)}_k\right\rangle \\ \left| G^{(n-1,1)}_k\right\rangle \end{bmatrix} =\begin{bmatrix} \left| G^{(n-1,0)}_k+W_{N}^{k} G^{(n-1,1)}_k\right\rangle \\ \left| G^{(n-1,0)}_k-W_{N}^{k} G^{(n-1,1)}_k\right\rangle \end{bmatrix}. \end{aligned}
(2.4)

Here, the matrix-like operation is defined as

\begin{aligned} \begin{bmatrix} A &{} B \\ C&{} D \end{bmatrix} \begin{bmatrix} |a\rangle \\ |b\rangle \end{bmatrix} =\begin{bmatrix} |A a+B b\rangle \\ |C a+D b\rangle \end{bmatrix} \quad (A,B,C, D\in \mathbb {C}). \end{aligned}
(2.5)

Do not confuse the above manipulation with conventional matrix operations: The results are not linear combinations of $$|a\rangle$$ and $$|b\rangle$$. The matrix-like notations are useful to implement quantum algorithms as quantum circuits.

Again we decompose the Fourier transform for $$\{x_{2r}\}$$ (resp. $$\{x_{2r+1}\}$$) into that for $$\{x_{4s}\}$$ and $$\{x_{4s+2}\}$$ (resp. $$\{x_{4s+1}\}$$ and $$\{x_{4s+3}\}$$) ($$0\le s \le N/4-1$$). The result reads

\begin{aligned} \begin{bmatrix} \left| G_k^{(n-1,p)}\right\rangle \\ \left| G_{k+N/4}^{(n-1,p)}\right\rangle \end{bmatrix} =\begin{bmatrix} \left| G_k^{(n-2,p)}+W_{N/2}^k G_k^{(n-2,p+2)}\right\rangle \\ \left| G_k^{(n-2,p)}-W_{N/2}^k G_k^{(n-2,p+2)}\right\rangle \end{bmatrix}\quad (p=0,1;0\le k \le N/4-1), \end{aligned}
(2.6)

where

\begin{aligned} G_k^{(n-2,q)}=\sum _{s=0}^{N/4-1} W_N^{4 sk} x_{4s+q} \quad (0\le q \le 3; \,\, 0\le s \le N/4-1). \end{aligned}
(2.7)

Repeating this procedure, one obtains the following recursion relation:

\begin{aligned} \begin{bmatrix} \left| G_k^{(n-m,p)}\right\rangle \\ \left| G_{k+N/2^{m+1}}^{(n-m,p)}\right\rangle \end{bmatrix}= \begin{bmatrix} \left| G_k^{(n-m-1,p)}+W_{N/2^m}^k G_k^{(n-m-1,p+2^{m})}\right\rangle \\ \left| G_k^{(n-m-1,p)}-W_{N/2^m}^k G_k^{(n-m-1,p+2^{m})}\right\rangle \end{bmatrix}, \end{aligned}
(2.8)

where $$0\le p\le 2^{m}-1$$, $$0\le k \le N/2^{m+1}-1$$. The initial states are given by

\begin{aligned} \left| G_0^{(0,p)}\right\rangle =|x_p\rangle \,\, (0\le p\le N-1). \end{aligned}
(2.9)

This is the algorithm of the QFFT. The classical version is reproduced by just interpreting the state vectors as scalars.

Most importantly, the QFFT/FFT is decomposed into $$\log _2N$$ “layers,” where each layer consists of N/2 butterfly diagrams (see Fig. 2 for $$N=8$$): In total, $$(N\log _2 N)/2$$ diagrams are used in the QFFT/FFT. As a result, the total computational complexity of the Fourier transform (1.2) is reduced from $$O(N^2)$$ to $$O(N\log _2 N)$$ by the above procedure.
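The classical version of the recursion, obtained by reading the state vectors as scalars, can be sketched as follows. This is an illustrative radix-2 implementation of the butterfly (2.1) in plain Python, not the quantum circuit itself; the function name `fft` and the returned butterfly counter are assumptions made for this example.

```python
import cmath

def fft(x):
    """Radix-2 FFT following (2.1); returns (X, number of butterflies used).

    len(x) must be a power of two.
    """
    N = len(x)
    if N == 1:
        return list(x), 0
    G0, count0 = fft(x[0::2])          # coefficients of the even-indexed data
    G1, count1 = fft(x[1::2])          # coefficients of the odd-indexed data
    X = [0] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * G1[k]   # W_N^k G_k^{(n-1,1)}
        X[k] = G0[k] + t
        X[k + N // 2] = G0[k] - t
    return X, count0 + count1 + N // 2

x = [1, 2, 3, 4, 5, 6, 7, 8]
X, n_butterflies = fft(x)              # N = 8 uses (8 * 3) / 2 = 12 butterflies
```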

### Elementary operations in the QFFT

As seen in (2.4), to implement the QFFT in a quantum circuit, the multiplication of the matrices

\begin{aligned} \left[ \begin{matrix}1&{}1\\ 1&{}-1\end{matrix} \right] , \quad \left[ \begin{matrix}1&{}0 \\ 0&{}W_N^k\end{matrix}\right] \end{aligned}
(2.10)

should be carried out in terms of quantum computation. The first one is separated into an adder, a subtractor and shift operators by the LDU decomposition

\begin{aligned} \left[ \begin{matrix}1&{}1\\ 1&{}-1\end{matrix}\right] = \left[ \begin{matrix}1&{}0\\ 1&{}-1\end{matrix}\right] \left[ \begin{matrix}1&{}0\\ 0&{}2\end{matrix}\right] \left[ \begin{matrix}1&{}1\\ 0&{}1\end{matrix}\right] . \end{aligned}
(2.11)
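The decomposition (2.11) can be checked by applying the three factors one after another with the matrix-like operation (2.5), which acts on the labels of the basis states. The sketch below uses plain integers as labels; the helper name `matrix_like` is illustrative.

```python
def matrix_like(M, a, b):
    """Matrix-like operation (2.5) acting on the labels a, b of two basis states."""
    (A, B), (C, D) = M
    return A * a + B * b, C * a + D * b

LOWER = [[1, 0], [1, -1]]   # subtractor: (a, b) -> (a, a - b)
DIAG = [[1, 0], [0, 2]]     # left shift of the second entry: (a, b) -> (a, 2b)
UPPER = [[1, 1], [0, 1]]    # adder: (a, b) -> (a + b, b)

def sum_and_difference(a, b):
    """Apply the factors of (2.11) from right to left."""
    a, b = matrix_like(UPPER, a, b)   # (a + b, b)
    a, b = matrix_like(DIAG, a, b)    # (a + b, 2b)
    a, b = matrix_like(LOWER, a, b)   # (a + b, a - b)
    return a, b

print(sum_and_difference(5, 3))       # same as applying [[1, 1], [1, -1]] directly
```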

Utilizing the matrix-like notation as in (2.4), the action of the first matrix defined in (2.4) on states $$|a\rangle$$ and $$|b\rangle$$ can be graphically interpreted as

(2.12)

Note again that the above operation differs from the conventional matrix operations. On the other hand, the second matrix is simply expressed as

(2.13)

Thus, the butterfly diagram as in (2.3) or (2.4) can be written as

(2.14)

Consequently, the QFFT can be implemented into a quantum circuit consisting of adders, subtractors and shift operators. In the next section, we explain these arithmetic operators. An actual implementation of these operators into the butterfly operations (2.14) is deferred to Sect. 4.

## Quantum circuits for arithmetic operations

In this section, we pictorially present a concept of some quantum arithmetic operations such as a quantum adder, subtractor and shift operators, which are required to implement the QFFT as a quantum circuit.

Here, we adopt two’s complement notation to represent a negative number. Let us write a state $$|a\rangle$$ $$(a\ge 0)$$ using the binary representation $$|a\rangle =|a_{n-1} \cdots a_0\rangle :=|a_{n-1}\rangle \otimes \cdots \otimes |a_0\rangle$$ ($$a_j\in \{0,1\}$$). Let m ($$m>n$$) be the total number of qubits of the system. Let us express $$|a\rangle$$ as

\begin{aligned} |a\rangle =|\underbrace{a_{+}a_{+}\cdots a_{+}}_{m-n}a_{n-1}\cdots a_0\rangle , \end{aligned}
(3.1)

where $$a_+=0$$. Then, a negative number $$|b\rangle$$ ($$=|-a-1\rangle$$) can be represented by the complement of $$|a\rangle$$:

\begin{aligned} |b\rangle =|\underbrace{a_{-} a_- \cdots a_-}_{m-n}\bar{a}_{n-1}\cdots \bar{a}_{0}\rangle , \end{aligned}
(3.2)

where $$a_-=\bar{a}_+=1$$, $$\bar{0}=1$$ and $$\bar{1}=0$$. Namely, for the m-qubit system, the numbers $$\{-2^{m-1},-2^{m-1}+1,\dots ,2^{m-1}-1\}$$ can be expressed by this notation. For instance, for $$m=3$$,

\begin{aligned} \begin{array}{llll} |0\rangle =|000\rangle ,&{}\quad |1\rangle =|001\rangle ,&{}\quad |2\rangle =|010\rangle ,&{} |3\rangle =|011\rangle ,\\ |-4\rangle =|100\rangle ,&{} \quad |-3\rangle =|101\rangle ,&{}\quad |-2\rangle =|110\rangle ,&{}\quad |-1\rangle =|111\rangle . \end{array} \end{aligned}
(3.3)
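The table (3.3) can be reproduced with a short classical sketch of the two's complement encoding. The helper names `encode` and `decode` are illustrative; bit strings are written with the most significant bit first, as in the kets above.

```python
def encode(value, m):
    """Return the m-bit two's complement bit string of value."""
    if not -(1 << (m - 1)) <= value < (1 << (m - 1)):
        raise ValueError("value does not fit into m bits")
    return format(value & ((1 << m) - 1), f"0{m}b")

def decode(bits):
    """Return the integer represented by a two's complement bit string."""
    unsigned = int(bits, 2)
    return unsigned - (1 << len(bits)) if bits[0] == "1" else unsigned

for value in range(-4, 4):
    print(value, encode(value, 3))
```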

### Sign extension

In the actual computation, to avoid overflow, we sometimes need to increase the number of bits (a so-called sign extension). This operation can be achieved by just inserting $$a_{\pm }$$’s to the representation: For instance, the representation for the m-qubit system can be extended to that for the l-qubit system ($$l>m$$):

\begin{aligned} |\underbrace{a_{\pm }\cdots a_{\pm }}_{m-n}a_{n-1}\cdots a_0\rangle \longmapsto |\underbrace{a_{\pm }\cdots a_{\pm }}_{l-n}a_{n-1}\cdots a_0\rangle . \end{aligned}
(3.4)

In Fig. 3, we show a quantum circuit to increase the number of digits from 4 qubits to 6 qubits.

In the Appendix, the number of extra qubits $$a_{\pm }$$ required for the QFFT is discussed.
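On classical bit strings, the sign extension (3.4) amounts to copying the leading bit. The following sketch mirrors the 4-to-6 digit case; `sign_extend` and `decode` are hypothetical helper names, and the quantum circuit of Fig. 3 is not reproduced here.

```python
def sign_extend(bits, l):
    """Extend a two's complement bit string (MSB first) to l bits by copying a_pm."""
    if l < len(bits):
        raise ValueError("target width is smaller than the input width")
    return bits[0] * (l - len(bits)) + bits

def decode(bits):
    """Integer value of a two's complement bit string."""
    unsigned = int(bits, 2)
    return unsigned - (1 << len(bits)) if bits[0] == "1" else unsigned

print(sign_extend("1011", 6), sign_extend("0101", 6))
```

The represented value is unchanged by the extension, which is what allows later additions and shifts to proceed without overflow.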

Let us consider an adder and a subtractor, by slightly modifying the arguments developed in [27, 31].

The addition of two n-bit numbers with the binary representation $$a=a_{n-1} \cdots a_0$$ and $$b=b_{n-1} \cdots b_0$$ ($$a_j, b_j\in \{0,1\}$$) is calculated as

(3.5)

where the carry bit $$c_j$$ ($$j=1, \dots , n$$) and the sum bit $$s_j$$ ($$j=0, \dots , n$$) are defined by

\begin{aligned} c_j&={\left\{ \begin{array}{ll} a_0 b_0 &{}(j=1)\\ a_{j-1}b_{j-1} \oplus b_{j-1}c_{j-1} \oplus c_{j-1}a_{j-1} &{}(2 \le j \le n)\end{array}\right. }, \nonumber \\ s_j&={\left\{ \begin{array}{ll} a_0\oplus b_0 &{}(j=0)\\ a_j \oplus b_j \oplus c_j &{}(1 \le j \le n-1)\\ a_{\pm }\oplus b_{\pm }\oplus c_n &{}(j=n) \end{array}\right. }. \end{aligned}
(3.6)

Note that the symbol $$\oplus$$ denotes exclusive disjunction. In terms of a quantum circuit, this addition is implemented as the transformation of the state

\begin{aligned} |a\rangle \otimes |b\rangle \longmapsto |a\rangle \otimes |a+b\rangle , \end{aligned}
(3.7)

(3.8)
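The carry and sum rules (3.6) can be traced bit by bit on classical data. The sketch below is a plain ripple-carry evaluation, not the reversible circuit of Fig. 4; the helper names are illustrative, and each operand is assumed to be given by its n low bits plus one sign bit $$a_{\pm }$$.

```python
def split(value, n):
    """Return ([a_0, ..., a_{n-1}], a_pm) for -2^n <= value < 2^n."""
    u = value & ((1 << (n + 1)) - 1)
    return [(u >> j) & 1 for j in range(n)], (u >> n) & 1

def add_bits(a, a_pm, b, b_pm):
    """Sum bits s_0, ..., s_n computed from the carry and sum rules (3.6)."""
    n = len(a)
    c = [0] * (n + 1)                       # c[j] holds c_j for j = 1..n
    c[1] = a[0] & b[0]
    for j in range(2, n + 1):
        c[j] = (a[j - 1] & b[j - 1]) ^ (b[j - 1] & c[j - 1]) ^ (c[j - 1] & a[j - 1])
    s = [a[0] ^ b[0]]
    s += [a[j] ^ b[j] ^ c[j] for j in range(1, n)]
    s.append(a_pm ^ b_pm ^ c[n])
    return s

def join(s):
    """Interpret s_n ... s_0 as an (n+1)-bit two's complement number."""
    n = len(s) - 1
    u = sum(bit << j for j, bit in enumerate(s))
    return u - (1 << (n + 1)) if s[n] else u

def add(x, y, n):
    return join(add_bits(*split(x, n), *split(y, n)))

print(add(3, -5, 3))
```

The result is correct whenever the sum itself fits into the available $$n+1$$ bits, which is the reason for the sign extension discussed above.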

Figure 4 shows the actual circuit which is a slightly modified version of a quantum adder originally developed in [27]. The adder circuit consists of the Toffoli gate [33] and the Peres gate [34] defined as in Fig. 5, where V and $$V^{\dagger }$$ are, respectively, given by

\begin{aligned} V=\frac{1+i}{2}\begin{pmatrix}1 &{} -i \\ -i &{} 1\end{pmatrix}, \quad V^{\dagger }=\frac{1-i}{2} \begin{pmatrix}1 &{} i \\ i &{} 1\end{pmatrix}. \end{aligned}
(3.9)
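A quick numerical check confirms that the matrix $$V$$ in (3.9) is unitary and squares to the NOT gate. The sketch uses plain nested lists rather than a linear algebra library; `matmul2` and `close` are helper names introduced only for this example.

```python
def matmul2(A, B):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

V = [[(1 + 1j) / 2 * e for e in row] for row in [[1, -1j], [-1j, 1]]]
V_dag = [[(1 - 1j) / 2 * e for e in row] for row in [[1, 1j], [1j, 1]]]

NOT = [[0, 1], [1, 0]]
IDENTITY = [[1, 0], [0, 1]]

print(close(matmul2(V, V), NOT), close(matmul2(V, V_dag), IDENTITY))
```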

On the other hand, using the identity $$\overline{\overline{a}+b} = a-b$$, we define a quantum subtractor as

\begin{aligned} |a\rangle \otimes |b\rangle \longmapsto {\left\{ \begin{array}{ll} |a\rangle \otimes |\overline{ \overline{a}+b }\rangle = |a\rangle \otimes |a-b\rangle \\ |a\rangle \otimes |\overline{ a+\overline{b} }\rangle = |a\rangle \otimes |-a+b\rangle \end{array}\right. }, \end{aligned}
(3.10)

which can be implemented by just inserting NOT gates (denoting it by the symbol $$\bigoplus$$) to the above defined adder (3.8) [31]:

(3.11)

The quantum circuit of the adder for $$n_{\mathrm{in}}$$-qubit input data consists of 6 “layers” as in Fig. 4. (Note here that the number of the layers does not depend on $$n_{\mathrm{in}}$$.) The first, second, fifth and sixth layers, respectively, contain $$n_{\mathrm{in}}-1$$, $$n_{\mathrm{in}}-2$$, $$n_{\mathrm{in}}-2$$ and $$n_{\mathrm{in}}-1$$ CNOT gates. The third layer consists of $$n_{\mathrm{in}}-1$$ Toffoli gates: $$5(n_{\mathrm{in}}-1)$$ CNOT gates are required. The fourth layer contains $$n_{\mathrm{in}}-1$$ Peres gates and one CNOT gate: $$4 (n_{\mathrm{in}}-1)+1$$ CNOT gates are required. Note that the Toffoli (resp. Peres) gate requires 5 (resp. 4) CNOT gates as shown in Fig. 5. Thus, in total, $$13 n_{\mathrm{in}}-14$$ quantum gates are required for the adder circuit for $$n_{\mathrm{in}}$$-qubit data. On the other hand, the subtractor defined by (3.11) requires at most $$3 n_{\mathrm{in}}$$ additional CNOT gates, and hence, at most $$16 n_{\mathrm{in}}-14$$ quantum gates in total are required for the subtractor.
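The layer-by-layer count can be summed explicitly to confirm the closed forms. The sketch below only reproduces the arithmetic stated in the text; the function names are hypothetical.

```python
def adder_gate_count(n_in):
    """Sum of the per-layer gate counts quoted for the six-layer adder."""
    layers = [
        n_in - 1,            # layer 1: CNOT gates
        n_in - 2,            # layer 2: CNOT gates
        5 * (n_in - 1),      # layer 3: Toffoli gates, 5 gates each
        4 * (n_in - 1) + 1,  # layer 4: Peres gates, 4 gates each, plus one CNOT
        n_in - 2,            # layer 5: CNOT gates
        n_in - 1,            # layer 6: CNOT gates
    ]
    return sum(layers)

def subtractor_gate_count_max(n_in):
    """Adder count plus at most 3 * n_in additional gates."""
    return adder_gate_count(n_in) + 3 * n_in

print(adder_gate_count(8), subtractor_gate_count_max(8))
```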

### Sign changing operation

Due to the identity

\begin{aligned} -a=(-1)\times a=\overline{a}+1, \end{aligned}
(3.12)

we can change the sign of the input number by an adder with NOT gates:

(3.13)
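The complement identities behind the subtractor (3.10) and the sign change (3.12) hold in m-bit two's complement arithmetic and can be checked exhaustively for a small word size. In this sketch, `complement` plays the role of the overline and all helper names are illustrative.

```python
def complement(x, m):
    """Bitwise NOT of an m-bit word (the overline operation)."""
    return ~x & ((1 << m) - 1)

def to_signed(u, m):
    """Interpret an m-bit word as a two's complement integer."""
    return u - (1 << m) if u >> (m - 1) else u

def subtract(a, b, m):
    """a - b computed as complement(complement(a) + b), cf. (3.10)."""
    mask = (1 << m) - 1
    return complement((complement(a & mask, m) + (b & mask)) & mask, m)

def negate(a, m):
    """-a computed as complement(a) + 1, cf. (3.12)."""
    mask = (1 << m) - 1
    return (complement(a & mask, m) + 1) & mask

print(to_signed(subtract(3, 5, 4), 4), to_signed(negate(6, 4), 4))
```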

### Arithmetic shift operations

Let us implement an operation to multiply by $$2^p$$ ($$p\in \mathbb {N}$$):

\begin{aligned} |a\rangle \longmapsto |2^p a\rangle . \end{aligned}
(3.14)

This operation is carried out by shifting the digits to the left (arithmetic left shift):

\begin{aligned} |a\rangle =|\underbrace{a_{\pm }\dots a_{\pm }}_{m-n} a_{n-1}\dots a_0\rangle \longmapsto |\underbrace{a_{\pm }\dots a_{\pm }}_{m-n-p} a_{n-1}\dots a_0 \underbrace{0\cdots 0}_p\rangle =|2^p a\rangle . \end{aligned}
(3.15)

Let us pictorially express this operation as

(3.16)

In a similar manner, we can define an arithmetic right shift which is an operation to multiply by $$2^{-p}$$:

\begin{aligned} |a\rangle =&|\underbrace{a_{\pm }\dots a_{\pm }}_{m-n} a_{n-j-1} \dots a_{0}a_{-1}\cdots a_{-j}\rangle \nonumber \\&\longmapsto |\underbrace{a_{\pm }\dots a_{\pm }}_{m-n+p} a_{n-j-1}\cdots a_0a_{-1}\cdots a_{-j+p}\rangle =|2^{-p} a\rangle , \end{aligned}
(3.17)

where $$a_{-k}$$ $$(1\le k \le j-p)$$ form the fractional part of a. Note that, in the above operation, p significant digits are lost. We also graphically denote this operation as

(3.18)

The actual implementation of these shift operations as quantum circuits can be accomplished by certain combinations of SWAP and CNOT gates: $$3n_{\mathrm{in}}-5$$ quantum gates (one CNOT gate and $$n_{\mathrm{in}}-2$$ SWAP gates, each consisting of 3 CNOT gates) are required for the shift operation of $$n_{\mathrm{in}}$$-qubit input data. In Fig. 6, we show a quantum circuit for the left (right) shift operation for $$m=3$$, $$n=3$$ and $$p=1$$.
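On classical bit strings, the two shifts (3.15) and (3.17) reduce to moving digits and adjusting the sign-extension bits. The sketch below treats the strings as integers (an assumption made for simplicity; the text also allows fractional digits) and uses illustrative helper names.

```python
def to_bits(value, m):
    """m-bit two's complement bit string, most significant bit first."""
    return format(value & ((1 << m) - 1), f"0{m}b")

def from_bits(bits):
    u = int(bits, 2)
    return u - (1 << len(bits)) if bits[0] == "1" else u

def shift_left(bits, p):
    """Multiply by 2^p: drop p redundant sign bits and append p zeros, cf. (3.15)."""
    if len(set(bits[: p + 1])) != 1:
        raise OverflowError("not enough sign-extension bits for this shift")
    return bits[p:] + "0" * p

def shift_right(bits, p):
    """Multiply by 2^-p: copy the sign bit p times and drop the p lowest digits, cf. (3.17)."""
    return bits[0] * p + bits[: len(bits) - p]

print(shift_left(to_bits(-3, 6), 2), shift_right(to_bits(-12, 6), 2))
```

When the dropped digits are nonzero, the right shift rounds toward minus infinity, in the same way as Python's `>>` operator on integers.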

Combining the adder, the subtractor and the shift operations, we can arithmetically manipulate an arbitrary number.

## Decomposition of the butterfly operation

Now, we decompose the butterfly operation (2.14) [see also (2.3)], which plays a central role in the QFFT, into the elementary arithmetic operations shown in the previous section. First, we decompose the butterfly operation into elementary operations:

(4.1)

The above operation (4.1) is implemented as a quantum circuit consisting of one adder, one subtractor [see (3.8), (3.11) and Fig. 4 in Sect. 3.2 for a quantum circuit for the adder/subtractor] and one shift operation (3.18) (see also Fig. 6), which, respectively, require $$13 n_{\mathrm{in}}-14$$, $$16n_{\mathrm{in}}-14$$ and $$3n_{\mathrm{in}}-5$$ quantum gates for $$n_{\mathrm{in}}$$-qubit input. Therefore, the number of quantum gates used in the implementation of (4.1) is totally $$32 n_{\mathrm{in}}-33$$.

In the butterfly operation, the input states consist of $$|(W_N^k) a\rangle$$. Let us abbreviate the component $$W_N^k = \exp \left( -i \frac{2\pi }{N}k \right)$$ to $$\exp \left( i\theta \right)$$ for simplicity. The calculation of $$|\exp \left( i\theta \right) a\rangle$$ is decomposed into

\begin{aligned}&|\exp (i\theta )a\rangle = |(\cos \theta +i\sin \theta )(a_r+ia_i)\rangle = \begin{bmatrix} 1&i \end{bmatrix} \begin{bmatrix} \cos \theta &{}-\sin \theta \\ \sin \theta &{}\cos \theta \end{bmatrix} \begin{bmatrix} |a_r\rangle \\ |a_i\rangle \end{bmatrix}, \end{aligned}
(4.2)

where $$a_r$$ and $$a_i$$ are, respectively, the real and imaginary part of a. The rotation matrix is further decomposed into adding (resp. subtracting) operators (3.8) [resp. (3.11)] with arithmetic right shift operations (3.18) [10, 32, 35]:

Because

\begin{aligned} \left| \frac{\cos \theta -1}{\sin \theta } \right| \le 1 \,\, \text {for }\, \theta \in \left[ -\frac{\pi }{2},\frac{\pi }{2}\right] ,\quad \left| \frac{\cos \theta +1}{\sin \theta } \right| < 1 \,\, \text {for }\, \theta \in \left[ -\pi ,-\frac{\pi }{2}\right) \cup \left( \frac{\pi }{2},\pi \right] , \end{aligned}
(4.5)

to apply the right shift operator, we use (4.3) for $$\theta \in [-\frac{\pi }{2},\frac{\pi }{2}]$$, while for $$\theta \in [-\pi ,-\frac{\pi }{2})\cup (\frac{\pi }{2},\pi ]$$, we use (4.4). Each matrix on the RHS of the above decomposition is schematically given by

(4.6)
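The explicit forms of (4.3) and (4.4) are not reproduced in this text. As an illustration only, the sketch below uses a standard three-shear (lifting) factorization of a rotation whose off-diagonal coefficients are exactly the quantities bounded in (4.5); the placement of signs is an assumption and may differ from the authors' convention. For $$|\theta |\le \pi /2$$ it uses the coefficient $$(\cos \theta -1)/\sin \theta =-\tan (\theta /2)$$, and otherwise $$(\cos \theta +1)/\sin \theta =\cot (\theta /2)$$ together with an overall sign change.

```python
import math

def shear_upper(t, v):
    """Apply [[1, t], [0, 1]] to v = (x, y)."""
    x, y = v
    return (x + t * y, y)

def shear_lower(s, v):
    """Apply [[1, 0], [s, 1]] to v = (x, y)."""
    x, y = v
    return (x, y + s * x)

def rotate_by_shears(theta, v):
    """Rotate v by theta (-pi <= theta <= pi) using three shears."""
    if abs(theta) <= math.pi / 2:
        t = -math.tan(theta / 2)                        # (cos(theta) - 1) / sin(theta)
        s = math.sin(theta)
        sign = 1.0
    else:
        t = math.cos(theta / 2) / math.sin(theta / 2)   # (cos(theta) + 1) / sin(theta)
        s = -math.sin(theta)
        sign = -1.0                                     # overall factor -1 (a sign change)
    x, y = shear_upper(t, shear_lower(s, shear_upper(t, v)))
    return (sign * x, sign * y)

def rotate_exact(theta, v):
    x, y = v
    return (math.cos(theta) * x - math.sin(theta) * y,
            math.sin(theta) * x + math.cos(theta) * y)

print(rotate_by_shears(math.pi / 2, (1.0, 0.0)))
```

Each shear adds a multiple of one component to the other, and since every coefficient has absolute value at most 1, the multiplication can be approximated by a sum of right-shifted copies of the operand.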

In fact, this decomposition makes it possible to efficiently implement the elementary arithmetic operations required for the QFFT. Namely, we develop an implementation of the above procedure as a quantum circuit so as not to generate any garbage bits. A quantum circuit for the operation (4.6) can be constructed from the adder circuit given in Sect. 3.2 (see Fig. 4). For $$p=2$$, the circuit is shown in Fig. 7. Except for some sign extension, the implementation of this operation requires $$13 n_{\mathrm{in}}-14$$ quantum gates, which is the number of the quantum gates in the adder for $$n_{\mathrm{in}}$$-qubit input data (see Sect. 3.2 for details).

Thanks to the circuit (4.6), we can construct quantum circuits of adding and subtracting for an arbitrary number:

(4.7)

For instance, to add $$A=7\times a$$ to b, we just apply the add operation (4.6) three times, since

\begin{aligned} 7\times a=a\ll 2+a\ll 1+a. \end{aligned}
(4.8)
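The same idea works for any non-negative integer multiplier: one shifted addition per nonzero binary digit. The sketch below is a classical illustration with hypothetical names; it returns both the result and the number of additions used.

```python
def add_multiple(b, a, c):
    """Return (b + c * a, number of additions) using only left shifts and additions.

    c is a non-negative integer; one shifted copy of a is added per set bit of c.
    """
    result = b
    additions = 0
    shift = 0
    while c:
        if c & 1:
            result += a << shift      # add (a << shift) to the accumulator
            additions += 1
        c >>= 1
        shift += 1
    return result, additions

print(add_multiple(10, 5, 7))         # 10 + 7 * 5 with three additions
```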

Thus, the decompositions (4.3) and (4.4) are summarized graphically:

(4.9)

Note that the quantum circuit of the first (resp. second) operation in (4.9) consists of three adders (resp. one adder and two subtractors). If we require an accuracy of $$2^{-A}$$ for the rotation $$|W_N^k|$$, then the quantum circuit needs $$A\times 3\times (13n_{\mathrm{in}}-14)$$ (resp. at most $$A \left\{ 2(16n_{\mathrm{in}}-14)+13n_{\mathrm{in}}-14\right\}$$) quantum gates for the first (resp. second) operation on $$n_{\mathrm{in}}$$-qubit input.

In summary, the butterfly operation (2.3), which plays a central role in the QFFT (Fig. 2), is decomposed into two operations in (2.14). The first operation in (2.14) is further divided as shown in (4.1), which requires $$32 n_{\mathrm{in}}-33$$ quantum gates for $$n_{\mathrm{in}}$$-qubit input. The second operation in (2.14) can be reduced to (4.9), consisting of at most $$A(45n_{\mathrm{in}}-42)$$ quantum gates. As a consequence, the butterfly operation (2.3) consists of at most $$32 n_{\mathrm{in }}-33+A(45 n_{\mathrm{in }}-42)$$ quantum gates. The number of quantum gates of the main operations necessary for the QFFT is summarized in Table 1.

Considering that a quantum circuit is reversible, the calculation of the inverse QFFT requires the two matrices

\begin{aligned} \left\{ \left[ \begin{matrix}1&{}1\\ 1&{}-1\end{matrix}\right] \left[ \begin{matrix}1&{}0\\ 0&{}(W_N)^k\end{matrix}\right] \right\} ^{-1} =\left[ \begin{matrix}1&{}0\\ 0&{}(W_N)^{-k}\end{matrix}\right] \left[ \begin{matrix}1/2&{}1/2\\ 1/2&{}-1/2\end{matrix}\right] , \end{aligned}
(4.10)

which are also decomposed into an adder, a subtractor and shift operators.
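The inverse relation (4.10) can be checked numerically on scalars. The sketch below applies the forward butterfly followed by the inverse one and recovers the input; the function names are illustrative and the arguments are ordinary complex numbers rather than quantum states.

```python
import cmath

def butterfly(g0, g1, k, N):
    """Forward butterfly: [[1, 1], [1, -1]] applied after diag(1, W_N^k)."""
    w = cmath.exp(-2j * cmath.pi * k / N)
    return g0 + w * g1, g0 - w * g1

def inverse_butterfly(x0, x1, k, N):
    """Inverse butterfly (4.10): diag(1, W_N^{-k}) applied after [[1/2, 1/2], [1/2, -1/2]]."""
    w_inv = cmath.exp(2j * cmath.pi * k / N)
    return (x0 + x1) / 2, w_inv * (x0 - x1) / 2

g0, g1 = 3 + 1j, -2 + 4j
x0, x1 = butterfly(g0, g1, k=1, N=8)
r0, r1 = inverse_butterfly(x0, x1, k=1, N=8)   # recovers (g0, g1) up to rounding
```

The factors of 1/2 correspond to arithmetic right shifts, so the inverse uses the same kinds of elementary operations as the forward butterfly.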

## Computational complexities

### Total number of quantum gates

The QFFT algorithm described in Sect. 2 is decomposed into several arithmetic operations, which are implemented in quantum circuits as in Sects. 3 and 4. Here, we estimate the total number of the quantum gates required for the implementation. As described in Sect. 2, the QFFT consists of $$(N\log _2 N)/2$$ butterfly operations, where each operation consists of at most $$32 n_{\mathrm{in}}-33+A(45 n_{\mathrm{in }}-42)$$ quantum gates as explained in Sect. 4. Here, $$n_{\mathrm{in}}$$ and A denote the number of input qubits and the accuracy of rotation, respectively. See Table 1 for the number of quantum gates of some elementary operations. As a result, the total number of the gates $$n_{g}$$ required for the QFFT is estimated to be at most

\begin{aligned} n_g=\left\{ 32 n_{\mathrm{in }}-33+A(45 n_{\mathrm{in }}-42)\right\} \times \frac{N}{2}\log _2 N. \end{aligned}
(5.1)
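The estimate (5.1) is easy to evaluate for given parameters. The sketch below simply composes the counts quoted in Sects. 3 and 4; the function names and the sample values $$n_{\mathrm{in}}=8$$, $$A=4$$, $$N=8$$ are chosen for illustration only.

```python
def butterfly_gates_max(n_in, A):
    """Upper bound for one butterfly: sum/difference part plus rotation part."""
    adder = 13 * n_in - 14
    subtractor = 16 * n_in - 14
    shift = 3 * n_in - 5
    sum_difference = adder + subtractor + shift          # 32 n_in - 33
    rotation = A * (2 * subtractor + adder)              # A (45 n_in - 42)
    return sum_difference + rotation

def qfft_gates_max(n_in, A, N):
    """Upper bound (5.1); N must be a power of two."""
    log2_N = N.bit_length() - 1
    return butterfly_gates_max(n_in, A) * (N // 2) * log2_N

print(butterfly_gates_max(8, 4), qfft_gates_max(8, 4, 8))
```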

### Computational costs including data encoding

To take advantage of quantum computing, some efficient method to encode classical data in quantum states, such as a qRAM [3,4,5], is necessary. Here, we briefly comment on computational costs for the QFFT including data encoding, taking a simple example as illustrated in the introduction.

Let us process N images of $$L\times L$$ pixels each (see Fig. 1 for $$L=2$$, for instance). For comparison, first, we analyze the computational complexity of the classical FFT. As described in the introduction, the complexity to process each image is $$O(L^2 \log _2 L^2)$$. Namely, the total cost for the FFT is $$O(N L^2 \log _2 L^2)$$.

For the QFFT, one must encode the classical data stored in the classical RAM one by one in quantum states: $$O(N L^2)$$ processes are required to encode them. The computational complexity of the QFFT to process the quantum images is $$O(L^2 \log _2 L^2)$$. Consequently, the total complexity including data encoding is $$O(NL^2)+O(L^2\log _2 L^2)$$. Thus, as long as we use the classical RAM, there is not so much advantage.

Recently, a concept of quantum random access memory (qRAM), which makes it possible to drastically reduce the computational cost to encode classical data, has been developed in [3,4,5]: The complexity to encode the data can be reduced from $$O(N L^2)$$ to $$O(L^2)$$. Namely, the total computational complexity for the QFFT with the qRAM is $$O(L^2\log _2 L^2)$$, which is much less than that of the conventional method.

In Table 2, the complexities discussed here are summarized.

## Quantum information processing based on the QFFT

As mentioned previously, one of the advantages of the QFFT is its wide utility: The method is applicable to all the problems processed by the conventional FFT. Moreover, the QFFT can simultaneously process multiple data sets which can be generated by U(N) transformations realized by quantum gates as in [36].

As a concrete example, in Fig. 8, we illustrate a quantum circuit for a high/low pass filter applied to multiple data sets. A single n-qubit data set labeled $$\alpha$$ is described as $$\bigotimes _{j=0}^{N-1}|x_j^{(\alpha )}\rangle$$ ($$N=2^n$$) with an auxiliary state $$\bigotimes |0\rangle$$. A sequence of operations, the QFFT, the SWAP gate acting on multiple qubits, and the inverse QFFT (IQFFT), generates both the high and low pass filtered data sets separated at some cutoff frequency $$\Lambda$$ through a single circuit. Multiple data sets can be processed simultaneously when the corresponding states are stored in a superposition state with some probability amplitudes $$\{c_\alpha \}$$. If a sufficiently large number of data sets is given, this information processing system outperforms the one using the QFT. Replacing the QFFT with the two-dimensional QFFT, we can also use this system as an edge detector for multiple quantum images (see Fig. 9 as a conceptual image).
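A classical analogue of this filter for a single data set can be sketched as follows: transform, split the coefficients at the cutoff, and transform both parts back. The splitting step stands in for the SWAP that moves components into the auxiliary register. The rule that frequency index $$k$$ counts as low when $$\min (k,N-k)\le \Lambda$$ is an assumption made for this example, as are all function names.

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / N) for j in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * j * k / N) for k in range(N)) / N
            for j in range(N)]

def split_bands(x, cutoff):
    """Return (low-pass part, high-pass part) of x for the given cutoff index."""
    N = len(x)
    X = dft(x)
    low = [X[k] if min(k, N - k) <= cutoff else 0 for k in range(N)]
    high = [X[k] - low[k] for k in range(N)]     # the components moved aside
    return idft(low), idft(high)

x = [1, 3, 2, 5, 4, 6, 5, 8]
low, high = split_bands(x, cutoff=1)             # low + high reproduces x
```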

## Summary and discussion

In this paper, we have discussed an implementation of the FFT as a quantum circuit. The quantum version of the FFT (QFFT) is defined as a transformation of a tensor product of quantum states. The QFFT has been constructed by a combination of several fundamental arithmetic operators such as an adder, subtractor and shift operators which have been implemented into the quantum circuit of the QFFT without generating any garbage bits.

One of the advantages of the QFFT is due to its high versatility: The QFFT is applicable to all the problems that can be solved by the conventional FFT. For instance, the frequency domain filtering of digital images is one of the possible applications of the QFFT. A major advantage of using the QFFT lies in its quantum superposition: Multiple images are processed simultaneously. It is even superior to the QFT when the number of images is sufficiently large.

Utilization of the resultant multiple data sets obtained after performing the QFFT is also interesting. The QFFT retains all the information of the Fourier coefficients until the moment the quantum state is measured. If the quantum state that contains the Fourier coefficients of multiple data sets were passed on to some quantum device directly and there were some proper techniques to handle it, it would play a key role in the field of quantum machine learning.