Quantum Poisson Solver without Arithmetic

Solving differential equations is one of the most promising applications of quantum computing. Most existing quantum algorithms addressing general ordinary and partial differential equations are thought to be too expensive to execute successfully on Noisy Intermediate-Scale Quantum (NISQ) devices. Here we propose an efficient quantum algorithm for solving one-dimensional Poisson equation based on the simple R y rotations. Our quantum Poisson solver (QPS) avoids the need to perform the expensive routines such as phase estimation, quantum arithmetic or Hamiltonian simulation, etc. The complexity of our QPS is 3 n in qubits and 5/3 n 3 in one- and two-qubit gates, where n is the logarithmic of the number of discrete points. An overwhelming reduction of constant factors of big-O complexity is achieved, which is critical to evaluating the practicality of actually implementing the algorithm on a quantum computer. In terms of the error ε , the complexity is log(1/ ε ) in qubits and poly(log(1/ ε )) in operations. The algorithms are demonstrated using a quantum virtual computing system, and the circuits are executed successfully on the IBM real quantum computers. The present QPS may represent a potential real application for solving differential equations on NISQ devices.


Introduction
Quantum computing is capable of solving problems efficiently which are intractable for classical computing.Since the discovery of Shor's factoring algorithm [1], quantum algorithms demonstrating advantages over classical methods have been developed for a substantial variety of tasks [2].One of the most promising applications is for solving differential equations which is the main task in high-performance scientific computing.
A series of quantum algorithms for solving ordinary and partial differential equations (ODEs and PDEs) have been developed, which promise an exponential speed-up over classical algorithms in the dimension of equations and solution errors [3][4][5][6][7][8][9].The main idea of these algorithms is that encoding the differential equations as a linear system, and solving it by Quantum Linear Systems Algorithm (QLSA) [10][11][12] or Hamiltonian simulation [13,14].These algorithms are general methods for relatively general differential equations, but they are considered to be out of reach of large-scale faulttolerant quantum computers.
In this paper, we focus on a relatively particular problem that is solving the onedimensional Poisson equation with much lower cost.The Poisson equation, usually expressed as 2 f   , is a widely used PDE across many areas of physics and engineering [15].Many quantum algorithms are proposed to solve the d-dimensional Poisson equations [7,9,[16][17][18].Compared with them, the present quantum Poisson solver (QPS) has a much lower cost in qubits and quantum operations, and it aims to be implementable on Noisy Intermediate-Scale Quantum (NISQ) devices for real applications.
The inspiration of our method is from the following observation: the eigenvalues of the discretized matrix of the Poisson equation are square of sine values and sine values can be prepared as probability amplitudes easily by single-qubit Ry rotation.The QLSA is actually to produce a quantum state with probability amplitudes being the reciprocals of eigenvalues.There should exist a simple way of accomplishing it by operating Ry rotation only.That is, all the transformations are performed on the probability amplitude.With such a way, we could eliminate the need for phase estimation, Hamiltonian simulation [13,14] and quantum arithmetic [19][20][21], which are the major contributors to the complexity of the algorithm.
It is intriguing that the above inspiration can be realized naturally from the fact that the product of all eigenvalues of the discretized matrix equals constant.That is referred to as sine formula which is derived from the study of Cartan matrix in Lie algebra [22].Combining the sine formula and single-qubit Ry rotation, we establish our efficient QPS with a complexity of 3n in qubits and 5/3n 3 in one-and two-qubit gates, where n is the logarithmic of the dimension of the linear system.The present QPS is executed successfully on IBM near-term superconducting quantum computers and it would make our QPS the first rudiment of differential equation solver for NISQ devices.With the rapid developments of quantum hardware and techniques [23][24][25][26], our QPS is expected to be practical on NISQ devices.

The problem considered
We start with a high-level overview of the problem we address and the general idea of our method, before delving into details of the algorithm.The problem is solving the one-dimensional Poisson equation with Dirichlet boundary conditions, which can be described as The b(x) is a given smooth function representing, say, velocity distribution in fluid dynamics problems, and v(x) is the unknown to solve.We choose the most simple central difference approximation to discretize the secondorder derivative, then equation (1) turns to be The number of discrete points is N+1 and the mesh size h equals to 1/N.
Matrix A is a well-studied tridiagonal Toeplitz matrix.The eigenvectors and corresponding eigenvalues are ( ) 2 sin( ) , respectively [27].Such a linear system of equations can be, of course, solved using the standard QLSA as done in refs.[16,18].The solution state|v would be expressed as where C is a normalizing constant.In order to create such a state, the HHL algorithm first estimates eigenvalues λj through phase estimation process, then calculates the reciprocals of λj by arithmetic, and then transduce the reciprocals into the probability amplitudes through controlled rotation process [10].
Here we propose a much simple way to create the solution state based on the Ry rotation.The fundamental observation is that matrix A is a Cartan matrix, and there exists a beautiful sine formula as follows [22], where n is the logarithmic of the matrix dimension, namely n = log(N).Note that each term of sine square in the equation is actually the eigenvalue λj of matrix A up to a constant.Therefore, the reciprocal of an eigenvalue equals to the product of all the other eigenvalues.
However, equation ( 5) cannot be used directly to inverse eigenvalues, for the number of sine-square terms equals the dimension of matrix that is exponential with n.Fortunately, we find that by exploiting the distribution characteristic of the terms of sine-square, equation ( 5) can be simplified to be where n≥3, and m is such an integer that lies in [0, n-2] and makes 2 -m j to be an odd number.The numerator 8 represents the lower bound of eigenvalues for n≥2.Details about the derivation of this equation are described in Appendix A. With this equation, the reciprocal of each eigenvalue can be regarded as the product of (n-1) terms of sinesquare values.And note that sine values can be prepared as probability amplitudes easily through the controlled Ry rotation in quantum computing.This provides the possibility that the amplitude 1/λj in the solution state as shown in equation ( 4) can be prepared directly by just implementing probability amplitudes all through.The algorithm can thus avoid the need to do phase estimation, Hamiltonian simulation and arithmetic for solving the linear equations of equation ( 3), and thus for solving the onedimensional Poisson equation.Furthermore, the complexity of the algorithm is low and the solution error comes only from the initial discretizing process of the Poisson equation.

Circuits
The present algorithm proceeds in the following three steps: first with a change of basis, followed by a group of controlled Ry rotations and finally uncomputation.The overall circuit is shown in figure 1.
More specifically, the state | n b of register B can be expressed as If the result is|1 , then the state in register B is 1 | j j j j u     which encodes the solution of the Poisson equation.Typically, the amplitude amplification technique [28] can be applied before measurement to increase the success probability.For the module CB, a unitary operator U is required to accomplish the following diagonalization transformation, That is, the j th row of the operator U is uj, Apparently, U is the sine transform [16,29], and it can be implemented based on the discrete sine transform (DST) of type I in [29].The DST is related to discrete Fourier transform (DFT) as follows, † ).
The TN is a basis change matrix, and FT2N,  |1 is used to pick out sine transform from the direct sum in equation ( 9), ignoring the global phase.
The circuit costs n qubits and O(n 2 ) elementary gates after decomposition [30].Details about this circuit can be found in [18].Now we turn to the group of controlled Ry rotations to transduce the reciprocals of eigenvalues into a probability amplitude.The reciprocals of eigenvalues are calculated using equation (6).As can be seen, the sine-square terms are indexed by two variables, namely m and k.So the overall circuit of this part contains two levels.For the first level, the circuit consists of n-1 modules corresponding to m being from 0 to n-2.Each module is denoted as RYm as shown in figure 3. Note that in each module 2 -m j is an odd number.For the second level, each RYm module actually implements equation ( 6) with a certain m, thus it contains the (n-1) terms of sine-square values.figure 4 takes the RY0 and RY1 modules as the example to show the circuit design of RYm module.Since sine values can be prepared easily using a single-qubit Ry rotation, the sine-square values are obtained using two qubits as follows, Furthermore, it is very simple to implement the controlled 2 y R  operation in each RYm module.For a state | j in register B, the binary representation can be written as 12 1 2 . Then the Ry rotation can be expressed as where jk is the local control qubit.figure 5 shows the circuit designed for    After implementing the group of controlled Ry rotations, the state evolves into where . Combining this equation and equation (6) where |  represents the states of register E except 22  |1 n   .
Finally, the basis is changed back into eigenstates, and the system state evolves into Now, in NISQ devices, we repeatedly measure the ancillary qubit until obtaining state |1 to collapse the system state into 2 2 . Alternatively, in the future fault-tolerant quantum devices, before measurement, amplitude amplification can be applied to increase the success probability of obtaining the expected state.That is, when the number of measurement reduced to nearly one, the algorithm become a deterministic one.

Complexity, depth and error
Now we analyze the complexity, depth and error of our algorithm.As can be seen from figure 1, the algorithm needs 3n qubits, where n is the logarithmic of the dimension of the linear system of equations.Considering the ancillary qubits, the number of qubits required is at least 3n+1.More ancillary qubits can be, of course, added to reduce the complexity of decomposition of gates [30].The total number of gates required is 5/3n 3 in one-and two-qubit gates.Furthermore, if the new technology of multi-controlled one-qubit and one-controlled multi-qubit gates can be applied, like the i-Toffoli/CNOT n operations [31], the number of basic gates required can be reduced dramatically.
The depth of the quantum circuit mainly depends on the way of implementing the group of Ry rotations as shown in figure 1.If the RYm modules are executed serially as shown in figure 3, the depth of the circuit is 5/3n 3 .But the depth can be reduced to 10n 2 by parallelizing the RYm modules.The parallelization is achieved by simply adding another n-2 qubits.Details are discussed in Appendix B.
Compared to the previous arithmetic-based quantum Poisson solver whose qubit complexity is 22n 2 and gate 20000n 3 (only count the highest order) [18], the number of qubits is reduced by one order, and constant factor of the gate complexity is reduced by ten thousand times.This reduction is critical to make the algorithm executable on NISQ computers, and furthermore it reduces dramatically the consumption of the expensive non-Clifford gate on the future fault-tolerant quantum computers with error correction.
Since our algorithm is mainly based on Ry rotations, the major operations are performed on probability amplitudes, which in theory do not cause errors, i.e. without the truncation errors in the phase estimation and arithmetic algorithms.The solution error comes only from the initial finite-difference approximation of the Poisson equation.Recalling that the truncation error is ε=1/N 2 , and n = logN, we have n = 1/2×log(1/ε).Therefore, the complexity of our algorithm is log(1/ε) in qubits and poly(log(1/ε)) in quantum operations.
Since the cost of our circuit is low and the depth can be shallow, the present QPS has the potential to be applied on NISQ devices for solving a non-trivial problem.For example, if solving the one-dimensional Poisson equation by discretizing it with 2 15  points, our QPS can be implemented using 46 qubits and tens of thousands of elementary gates.After parallelization, the depth of the circuit is several thousand with a need of 60 qubits.This cost is acceptable for the NISQ devices with fidelity of 99.99% [33].The output state encodes the solutions of more than 32 thousand points into the probability amplitudes.

Demonstration and execution
We demonstrate our algorithm using a quantum virtual computing system installed on the Sunway TaihuLight Supercomputer in Wuxi, China [34].The detailed circuits for the cases of n=2 and 3 are shown in figures 6 and 8, respectively.
The circuit costs only 6 qubits and 70 one-and two-qubit gates without counting the initialization of b.When demonstrating the circuit on the virtual system, the state of 2 |b is initialized as 1 2 |01 1 2|10 1 2|11      , and the result is [0.552987, 0.674065, 0.489736] T after normalized.The expected result calculated using python is [0.552988, 0.674065, 0.489736] T , so the difference is less than 2 -21 .The circuit of figure 6 is optimized further and executed successfully on the nearterm small-scale superconducting quantum computers through the cloud of IBMQ [35]; for more details see Appendix C. figure 7 shows the comparison between the experimental and theoretical results.The successful execution makes our QPS the first rudiment of practical differential equation solver for NISQ computers.Starting from n=3, the circuit is designed based on equation ( 6) in a very direct way, and it scales in a modular way with respect to n.We demonstrate the circuits for n=3 up to n=10 on the virtual system.For the cases of n≥8, the amplitudes of states corresponding to large eigenvalues cannot be calculated accurately due to the limitation of the virtual system's precision.

Conclurions and outlook
We present a compact quantum Poisson solver (QPS) for solving the one-dimensional Poisson equation.The major operations in the QPS are performed on the probability amplitudes through single-qubit Ry rotation.This is the essential advance of the present QPS compared to the ones based on expensive quantum arithmetic, as well as the phase estimation and Hamiltonian simulation routines.The error of the solution state comes only from the finite difference approximation of the Poisson equation.The idea of implementing function computation on probability amplitude by single-qubit rotation should be inspirational in many ways for optimizing other quantum algorithms.
The costs of our QPS are nearly optimal, that is, 3n in qubits and 5/3n 3 in one-and two-qubit gates, where n is the logarithmic of the number of discrete points.Moreover, the dependence of the complexity on solution error is log(1/ε) in qubits and poly(log(1/ε)) in quantum operations.When solving the Poisson equation by discretizing it into 2 15 points, our algorithm needs only 46 qubits and tens of thousands of one-and two-qubit gates, and the error of the solution state is, in theory, 2 -30 .So the present QPS would be capable of running on near-future Noisy Intermediate-Scale Quantum (NISQ) devices for real applications.
An open question is how to extend the present algorithm to d-dimensional Poisson equations.For higher dimensions, the eigenvalues are the sum of that of onedimensional case.We think there should be an intuitive way to inverse the eigenvalues on probability amplitudes, like the technique of quantum signal processing [36], and achieve a complexity with nearly linear dependence on dimensions.
Firstly, we study the distribution characteristics of the terms on the left hand side of the following equation derived from equation ( 5), The angular coefficient j/2 n+1 of each term is listed in the way as shown in figure A1.Apparently, each layer contains 2 n-1 odd terms that j is an odd number, and 2 n-1 -1 even terms that j is an even number.All of the even terms in one layer are actually the ones of the adjacent upper layer, and each even term always corresponds to an odd one of some upper layer.That is, all the terms of one layer, say n=4 layer, are actually the odd terms of the present and all upper layers, i.e. layers of n=4, 3, 2, 1.So all the terms are divided into n groups and each group contains only the odd terms of that layer.Using the above distribution characteristic, the product can be re-organized with only the odd terms of each layer as follows, Secondly, we analyze the reciprocal of each term sin 2 (jπ/2 n+1 ).The following discussions are for two conditions, one for j being an odd number and the other for j being an even number.
When j is an odd number belonging to [1, 2 n -1], starting from equation (A.1) we have The above derivation contains a recursive process that cut the number of terms of the product by half utilizing the symmetry of distribution in each layer as shown in figure A1.The two trigonometric formulas, i.e. sin2θ = 2sinθcosθ and sinθ = cos(π/2-θ), are used repeatedly in the derivation.So the reciprocal of eigenvalues with j being an odd number can be calculated by the (n-1) terms as follows, When j is an even number, it can be transformed to an odd one in some upper layer.Suppose j = 2 m i, where i is an odd number and m is an integer belonging to [1, n-2], then the even term j corresponds to the odd term i in the upper m th layer.In such case, sin -2 (jπ/2 n+1 ) can be calculated as In fact, equation (A.3) for odd j can be seen as a particular case of equation (A.5) with m = 0. So, in summary, the reciprocal of eigenvalues for any j can be calculated by The lower bound of eigenvalues equals to 8 when n≥2.So after transposing 2 -3 to the left side of equation (A.6) and transforming the factor 4 -m into a sine form, equation (A.6) turns to This is pretty intriguing.The reciprocal of 8/λj is the product of (n-1) terms and each term is a square of sine value.The first m terms are constants, namely sin 2 (/6).Put it another way, we can inverse the eigenvalues through (n-1) sine-square terms.The number of terms of the product is reduced exponentially from (2 n -2) to (n-1), and this guarantees that the complexity of our algorithm is 3n in qubits.

Appendix B. Parallelization of RYm modules
In figure 3, the RYm modules are executed serially.The gate-complexity of the circuit is low, but the depth is high to run efficiently on NISQ devices.Here we propose a feasible way of paralleling the RYm modules to reduce the depth from O(n 3 ) to O(n 2 ).The parallelization consists of three aspects as follows.
Firstly, parallelize the control qubits in figure 3. Another register C with (n-2) qubits is allocated to store the global control qubits.The control qubits are parallelized by the CP (Control-qubits Parallelization) module as shown in figure B1.
Secondly, break up each RYm module in figure 4    As can be seen, the depth of the parallel-version QPS mainly depends on the RY0 and CP modules.The gate-complexity of RYm modules turns to 4n in qubits and the depth reduces from 5/3n 3 to 10n 2 .When n=15, the depth of the circuit inversing eigenvalues based on serial and parallel RYm modules is about 8000 and 1800, respectively.Since the decomposition of the multi-controlled one-qubit and one-controlled multi-qubit gates constitute the major cost, the depth of TN in CB module (in figure 2) and CP in RYm can be reduced to O(n) using the technology of iToffoli/CNOT n gates [31].Additionally, the depth of FT2N in CB can also be reduced to O(n) even O(logn) [37,38].

Appendix C. Execution on IBM quantum computers
Today's gate-based non fault-tolerant quantum computers that has noise and limited resources eclipse the realization of certain quantum algorithms because it can only provide limited width and depth for the reliable execution of the algorithms on it, among which is the famous quantum linear system problem (QLSP) [33,39].Here, with several deeper optimizations, we show that the basic case of n=2 of our QPS can be executed successfully on the near-term superconducting quantum computers publicly provided by IBMQ.The width and depth of the circuit is lowed to 3 and 21, respectively.The optimization mainly comprise the following two aspects: (1) decomposition of the CB module, (2) calculation of the eigenvalue reciprocal (the controlled Ry rotations).

C.2. Calculation of the eigenvalue reciprocals
The main text shows that the calculation of each 8/λj is carried out by two Ry gates in figure 6, namely two Ry gates produce a term of sine square.This circuit has well scalability, but the complexity is slightly high.To simplify the execution of the circuit, the two controlled Ry gates is shrunk to single one, namely sinθj=sin 2 ϕj=8/λj.The optimized circuit is shown in figure C2.Both the number of controlled Ry gates and qubits is reduced to 3. Up to now, the logical circuit is optimal.However the tranpiled circuit need to be taken care for their diverse architectures, namely the different connectivity types between qubits in different real backends [41,42].The 2-qubit operations can merely perform on the two qubits with direct connectivity, while the indirectly connected qubits need to move to be adjacent through SWAP gates first.For the circuit of our QPS, after transpiled on different backends, the depth is 23 on ibmq_santiago and ibmq_vigo with limited connectivity and 21 on ibmqx2 with fully connected qubits (qubits 0,1,2 or 2,3,4 as shown in figure C4).The circuits are shown in figure C3.Note that although the circuit for ibmq_santiago has 3 more CNOT gates than that for ibmqx2, ibmq_santiago is still a better choice for us because of its much smaller error rate relative to ibmqx2 as shown in figure C4.
The CNOT error rate ε of ibmq_santiago is about 0.006 among 2, 3 and 4 while ibmqx2 is 0.02.In fact, ibmq_santiago is far from reaching the critically executable relation of w•d≪1/ε, but it is sufficient for us to obtain a solution that is consistent with the theoretical results as shown in figure 7 in the main text.
Furthermore, the error of the probability of the target state is reduced to less than 10% by the technique of leakage elimination [44] while the depth of the circuit remains unchanged.As aforementioned, the logical circuit is optimal in each module, so the depth of the entire circuit can be reduced to less than 20 based on the methods for twoeven three-qubit operator decompositions [40,43,45].

Figure 1 .
Figure 1.The overall circuit of the present QPS.CB represents the change of basis from eigenstates | j u  to computational basis | j .The controlled Ry rotation and NOT operations are used to inverse the eigenvalues on the probability amplitudes, where s denotes the sine values sin(jπ/2 n+1 ).It is assumed that

S
 represents the Fourier, cosine and sine transform, respectively.The circuit to implement the sine transform 1 I N S  is shown in figure 2.

Figure 2 .
Figure 2. The circuits for the sine transform (a), and for the transform TN (b).An ancillary qubit set

2 yR
 operation based on this equation.

Figure 3 .
Figure 3.The overall circuit to calculate the reciprocal of eigenvalues.In general, the circuit consists of (n-1) controlled RYm modules numbered with m from 0 to n-2.Each RYm module is controlled by two kinds of control qubits called global and local control qubits.The global control qubits are the lower (m+1) qubits in Reg.B which are represented by the red dots and circles.The global control qubits are actually used to pick out such j that 2 -m j is an odd number.The local ones are the higher (n-m) qubits of Reg.B inside the X modules which will be described below.Note that the (n-1) RYm modules are computed serially.They can turn to be parallel by adding n-2 qubits, and then the depth of the circuit is reduced from O(n 3 ) to O(n 2 ).A feasible circuit design for such

Figure 4 .
Figure 4.The circuit design for RY0 module (a), and RY1 module (b).Each RY module consists of n-1 terms of sine-square values.Each 2 y R  operator produce one term of sine-square.The 2 y R 

Figure 5 .
Figure 5.The circuit for the first 2 y R  operation in RY0 module in figure 4. The black dots are the

Figure 6 .
Figure 6.The circuit for the case of n=2.It needs only 6 qubits and 70 one-and two-qubit gates.

Figure 7 .
Figure 7.The probability distribution of the final states of theory (blue, left bar) and experiment (orange, right bar).The experimental results were obtained by the IBM quantum computer of ibmq_santiago, 27 depth, 4096 shots, calibrated on Nov 29, 2020.The q0 is the ancillary qubit in figure 6.More details about the experiment are provided in Appendix C (the codes are provided in the supplementary material).The error of the probability of the target state is lower than 15%.

Figure 8 .
Figure 8.The circuit for the case of n=3.Being different with that of n=2, this circuit is designed in a direct way according to equation (6).Totally, 12 qubits and 200 one-and two-qubit gates are used.

Figure A1 .
Figure A1.The distribution of the angular coefficient j/2 n+1 of each term in equation (5) for n=1, 2, 3, 4. The characteristic of distribution is that the terms of one layer consist of all the odd terms of the present and all upper layers.

2 yR
figure B1 shows the circuit for RYP0 and RYP1.In such way, all the 2 y R  are rearranged to be parallel in each RYP module.Thirdly, as shown in figure 5, the controlled 2 ( 2 ) i y R  

Figure B1 .
Figure B1.A feasible way of parallelizing the RY modules in figure 3. The CP module is used to

Figure B2 .
Figure B2.The way of parallelizing the controlled 2( 2 )   i y R  

Figure C1 .
Figure C1.The decomposition and execution of the 4×4 CB module on IBM quantum computers.

Figure C3 .
Figure C3.The tranpiled circuits on ibmq_santiago (a) and ibmqx2 (b).A SWAP is introduced, as shown in the red grid in (a), to connect the physical qubits of q0 and q2 through the middle q1 while the fully connected one shown in (b) needs no SWAP.

Figure C4 .
Figure C4.The qubit architecture and error rates of ibmq_santiago (a) and ibmqx2 (b).The calibration time is Nov 27 and 24, 2020, respectively.