On the importance of data encoding in quantum Boltzmann methods

In recent years, quantum Boltzmann methods have gained increasing interest as they might provide a viable path towards solving fluid dynamics problems on quantum computers once this emerging compute technology has matured and fault-tolerant many-qubit systems become available. The major challenge in developing a start-to-end quantum algorithm for the Boltzmann equation consists in encoding relevant data efficiently in quantum bits (qubits) and formulating the streaming, collision and reflection steps as one comprehensive unitary operation. The current literature on quantum Boltzmann methods mostly proposes data encodings and quantum primitives for individual phases of the pipeline, assuming that they can be combined into a full algorithm. In this paper we disprove this assumption by showing that for encodings commonly discussed in the literature either the collision or the streaming step cannot be unitary. Building on this landmark result we propose a novel encoding in which the number of qubits used to encode the velocity depends on the number of time steps one wishes to simulate, with the upper bound depending on the total number of grid points. In light of the non-unitarity result established for existing encodings, our encoding method is to the best of our knowledge the only one currently known that can be used for a start-to-end quantum Boltzmann solver where both the collision and the streaming step are implemented as a unitary operation. Furthermore, our theoretical unitarity result can serve as a guideline on which types of encodings to consider or whether a 'stop-and-go' method with repeated measurements and re-initializations is the method of choice.


Introduction
Since the first quantum computing boom in the 1990s, quantum computational fluid dynamics (QCFD) has been a field of interest to researchers worldwide. Due to the high computational demands of classical CFD, the exponential potential of quantum computers in combination with quantum parallelism and quantum indeterminacy has sparked interest in this application. The first QCFD algorithms were proposed by Yepez and his co-workers around the turn of the century [Yep98; Yep01; YB01; Yep02; Pra+03]. These algorithms are based on a quantum distributed computing approach, assuming that many small-scale quantum computers are more realistic than one large many-qubit system. The core idea of the so-called quantum lattice-gas model is that each grid point of position space gets its own 6-qubit quantum computer associated with it (which can also be groups of 6 qubits of a future many-qubit quantum computer). The benefit of this approach is that the required quantum circuit depth and stable entanglement remain very low, making it a realistic and relatively near-term approach given the capabilities of current quantum devices. Its downside is that encoding a grid of size N requires a total of 6N qubits, which means that the number of qubits grows linearly with the size of the grid. Given the limited number of quantum devices available and the large number of grid points required for solving practical problems with modern Boltzmann methods, this qubit requirement of the distributed approach proves a significant drawback. Furthermore, as we will show below, the computational basis state encoding of the velocity vector adopted in the aforementioned papers does not allow for implementing the streaming step as a unitary operator, so that measurement and state re-initialization is mandatory after each time step.
After these early results by Yepez et al., the QCFD field became stagnant for about a decade until its recent resurgence, in particular in the form of quantum Boltzmann methods. Most recent are the methods presented in [TS20; Bud20; Bud21; MVS22; SM22; Ste23], which all have their own strengths and weaknesses. Some papers include a streaming and specular reflection mechanism, but no collision methods yet [TS20; Bud20; SM22]. Other approaches have implemented a collision method using the linear combination of unitaries approach [CW12], causing the algorithm to require a measurement-and-restart strategy after each time step [Bud21]. Due to the high costs of quantum state preparation and the chance of measurement errors, this 'stop-and-go' strategy is hardly usable in practice. Other algorithms have managed to create a unitary collision operator, but have not yet been able to combine this with a streaming step into one start-to-end algorithm [MVS22; Ste23].
What remained an open problem is the development of a full-fledged quantum Boltzmann method (QBM) that implements both the streaming and the collision step as unitary operations. In this paper we present the first-of-its-kind full-fledged QBM building on a novel encoding scheme of the velocity vector that scales with the number of time steps. Furthermore, we prove rigorously that for the encoding schemes considered for universal quantum computers in all previous publications it is impossible to implement both streaming and collision as a unitary, downgrading them as candidates for any practical QBM. Taking both contributions of this paper together, our new encoding and the theoretical (negative) result on existing encodings, we hope to stimulate a paradigm shift in QBM research from focusing on encodings and algorithms for individual steps of the pipeline to developing full-fledged QBM algorithms.

Lattice Boltzmann method
In the Boltzmann method the macroscopic behavior of a fluid is simulated by considering the microscopic behavior of the fluid particles as they move through space and deriving the macroscopic quantities via averaging-based postprocessing, instead of encoding the macroscopic variables directly, as is commonly done in other CFD methods like the finite volume method.
In this paper we consider the discrete lattice Boltzmann method, where a particle can only move with specific velocities taken from a finite set of discrete velocities. We define the structure of the method using the DnQm system, where n represents the number of spatial dimensions and m the number of discrete velocities considered. Figure 1 gives examples of the commonly used D1Q2, D1Q3, D2Q5 and D2Q9 systems, respectively, in standard Boltzmann convention. For an in-depth review of the lattice Boltzmann method we refer to the book [Krü+17].
Boltzmann methods simulate the macroscopic behavior of a fluid or gas by implementing a streaming step followed by particle collision on the microscopic level in each time step. When obstacles are present an additional reflection step is performed in each time step. For brevity we omit a detailed description of the latter and refer the interested reader to our recent work [SM22] on this topic.
The streaming step is implemented by letting the particles move by one grid point per time step in the direction they are currently traveling. Figure 2 illustrates how the particles travel in one time step from the point x to x ± 1, respectively, for the D1Q2 case. Similar illustrations can be constructed for two- and three-dimensional cases but are omitted here for brevity.
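As a purely classical illustration of the streaming step described above, the following minimal sketch moves right- and left-moving populations on a periodic D1Q2 lattice by one grid point. The function name `stream_d1q2`, the array layout and the periodic boundary are assumptions made for this example only.

```python
import numpy as np

def stream_d1q2(right, left):
    """One streaming step on a periodic D1Q2 lattice (illustrative helper).

    right[x] / left[x] hold the particle count at grid point x that moves
    in the +x / -x direction.
    """
    # Right-movers shift from x to x + 1, left-movers from x to x - 1.
    return np.roll(right, 1), np.roll(left, -1)

# One right-mover and one left-mover, both sitting at grid point x = 1.
right = np.array([0, 1, 0, 0])
left = np.array([0, 1, 0, 0])

new_right, new_left = stream_d1q2(right, left)
print(new_right)  # [0 0 1 0]
print(new_left)   # [1 0 0 0]
```

Both particles leave x = 1 and arrive at x + 1 and x − 1, respectively, while the total particle count is unchanged.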
To implement the collision step we define so-called equivalence classes of streaming patterns which have the same total mass and momentum and are thus considered to be equivalent. A combination of colliding particles can therefore be transformed into any combination from the same equivalence class upon collision without changing the total mass and momentum. Figure 3 shows an example of two equivalent velocity combinations for the D2Q5 (and D2Q4) case.
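The grouping into equivalence classes can be checked by brute force for D2Q4. The sketch below enumerates all 16 occupation patterns and groups them by total mass and momentum. The direction convention (bit 0 = +x, bit 1 = +y, bit 2 = −x, bit 3 = −y) is an assumption chosen so that opposite directions sit two bits apart; all names are illustrative.

```python
from collections import defaultdict
from itertools import product

# Assumed direction convention: q0 = +x, q1 = +y, q2 = -x, q3 = -y.
DIRECTIONS = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def mass_and_momentum(pattern):
    """Return (mass, (px, py)) of an occupation pattern (q0, q1, q2, q3)."""
    mass = sum(pattern)
    px = sum(bit * d[0] for bit, d in zip(pattern, DIRECTIONS))
    py = sum(bit * d[1] for bit, d in zip(pattern, DIRECTIONS))
    return mass, (px, py)

# Group all 2^4 patterns by their conserved quantities.
classes = defaultdict(list)
for pattern in product((0, 1), repeat=4):
    classes[mass_and_momentum(pattern)].append("".join(map(str, pattern)))

# Only classes with more than one member allow a non-trivial collision.
nontrivial = {key: members for key, members in classes.items() if len(members) > 1}
print(nontrivial)
```

Under this convention the only class with more than one representative is the head-on pair of mass 2 and momentum 0, which is the case illustrated in Figure 3.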
In Section 4, we present a lattice Boltzmann encoding for which both the collision and the streaming step can be performed through unitary operations and thus admit a straightforward implementation on a sufficiently large fault-tolerant quantum computer. Before that, in Section 3, we provide rigorous mathematical proofs that show that such a unitary treatment of both streaming and collision is impossible with the encodings adopted in current literature, thereby underpinning the uniqueness of our proposed encoding.
Figure 3: Illustration of two velocity combinations of the D2Q5 (and D2Q4) velocity spectrum that belong to the same equivalence class with total momentum 0 and mass 2: (a) particles streaming in the $q_1$ and $q_3$ direction, and (b) particles streaming in the $q_2$ and $q_4$ direction.

Data encoding
As in any computational field, data encoding is pivotal for reaching a good result. More than five decades of classical CFD research and application have established 'good practices' for storing field data such as densities and velocities at, e.g., the grid points or cell centers as floating-point numbers following the IEEE-754 standard. Every now and then new hardware developments stimulate research into non-standard formats, like reduced or mixed precision [Fre+22], but, in general, data encoding is not considered to be an open problem. Not so in QCFD and, in particular, quantum Boltzmann methods. In this section we will review the main data encodings currently used for QBM and show that in all of them either the streaming step or the collision step cannot be unitary. This result, though discouraging at first sight, should be interpreted as a wake-up call that novel quantum encodings for CFD states are imperative for devising full-fledged QCFD applications in the future. We propose one such novel encoding in Section 4 and discuss its potential and limitations.
The two mainstream encodings of the velocity vector are the amplitude based encoding and the computational basis state encoding. In what follows, we will consider both approaches separately and show how they both lead to a contradiction in the unitarity of either the collision or the streaming operation.

Amplitude based encoding
The first type of encoding we consider is the so-called amplitude based encoding, used for several quantum Boltzmann methods [TS20; Bud20; Bud21; SM22]. The amplitude based encoding of the velocity vector is such that at each location $|x\rangle$ there can be multiple particles with different velocities, for instance $|v_0\rangle$, $|v_1\rangle$, $|v_2\rangle$ and $|v_3\rangle$ for D2Q4. Here and below, $|i\rangle$ denotes the representation of $i$ as a bit string. The state of the system at this point $x$ can then be encoded as
$$|x\rangle \left( \alpha_0 |v_0\rangle + \alpha_1 |v_1\rangle + \alpha_2 |v_2\rangle + \alpha_3 |v_3\rangle \right), \tag{1}$$
where $\alpha_0$, $\alpha_1$, $\alpha_2$ and $\alpha_3$ are complex numbers that simply represent the relative weight or amount of particles traveling at the given velocity at grid point $x$. For simplicity we will assume that $|\alpha_0|^2 + |\alpha_1|^2 + |\alpha_2|^2 + |\alpha_3|^2 = 1$, so that in this example there are only particles at grid point $x$, but the proof extends trivially to the general case with particles spread around the grid.
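To make the amplitude based encoding concrete, the following sketch builds the state vector of a single occupied grid point as a dense NumPy array. The helper names, the choice of four grid points and the explicit normalization are assumptions made for illustration only.

```python
import numpy as np

def basis_state(index, dim):
    """Computational basis state |index> in a dim-dimensional space."""
    state = np.zeros(dim, dtype=complex)
    state[index] = 1.0
    return state

def amplitude_encode(x, alphas, n_grid):
    """Return |x> (tensor) sum_i alpha_i |v_i> as a dense state vector."""
    alphas = np.asarray(alphas, dtype=complex)
    velocity = alphas / np.linalg.norm(alphas)  # enforce sum |alpha_i|^2 = 1
    return np.kron(basis_state(x, n_grid), velocity)

# Equal weights on |v_0> and |v_1> at grid point x = 2 of a 4-point grid.
psi = amplitude_encode(x=2, alphas=[1, 1, 0, 0], n_grid=4)
print(np.flatnonzero(psi))  # indices of the non-zero amplitudes
```

With the position register as the leading tensor factor, the two non-zero amplitudes sit at indices 4·x + 0 and 4·x + 1.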
In order to show that this encoding of the velocity vector inevitably leads to non-unitary collision operators, let us assume a system in a specific state $|\psi_1\rangle$, with only particles traveling with velocities $|v_0\rangle$ and $|v_1\rangle$, meaning that $|\alpha_0|, |\alpha_1| > 0$ and $\alpha_2 = \alpha_3 = 0$. Then we can write the state of the system as
$$|\psi_1\rangle = |x\rangle \left( \alpha_0 |v_0\rangle + \alpha_1 |v_1\rangle \right). \tag{2}$$
Now assume that an equivalent velocity combination exists consisting of particles traveling with velocities $\beta_2 |v_2\rangle + \beta_3 |v_3\rangle$, where we have $|\beta_2|, |\beta_3| > 0$. To realize this potential outcome of a collision as a quantum algorithm, we need to implement the transformation between both equivalent states as a unitary operation $U_{col}$ which changes the states of the velocity encodings as follows:
$$U_{col} |\psi_1\rangle = |x\rangle \left( \gamma_0 \left( \alpha_0 |v_0\rangle + \alpha_1 |v_1\rangle \right) + \gamma_1 \left( \beta_2 |v_2\rangle + \beta_3 |v_3\rangle \right) \right). \tag{3}$$
Here, if $\gamma_0 = 1$ and $\gamma_1 = 0$ no collision is taking place (and we simply implement an identity operation), and if $\gamma_1 = 1$ we fully change from the original velocities to their alternative representative from the same equivalence class.² Note that to preserve unitarity we need $|\gamma_0|^2 + |\gamma_1|^2 = 1$.
Let us now consider another system in state $|\psi_2\rangle = |x\rangle |v_2\rangle$. Applying the unitary operation $U_{col}$ should not affect the state at all, as a single speed is only in an equivalence class with itself, and so the required behavior for $U_{col}$ is
$$U_{col} |\psi_2\rangle = e^{i\theta} |x\rangle |v_2\rangle, \tag{4}$$
with $\theta \in (0, 2\pi]$. That is, the collision operator must preserve the single-velocity state except for changes in the phase factor $e^{i\theta}$ that can be neglected. Now that we have identified the required behavior for $U_{col}$ to implement a collision operation, we can prove that any $U_{col}$ that meets both requirements simultaneously cannot be unitary. Here, we resort to the characterization $U_{col}^\dagger U_{col} = I$ of unitary operators, with superscript $\dagger$ denoting the adjoint operator.
Proof. To reach a contradiction, assume that $U_{col}$ is a unitary operator. Then it must preserve the inner product for all possible states $|\varphi_1\rangle$ and $|\varphi_2\rangle$:
$$\langle \varphi_1 | \varphi_2 \rangle = \langle \varphi_1 | U_{col}^\dagger U_{col} | \varphi_2 \rangle.$$
However, for a collision operation $U_{col}$ that behaves as expected on the system states described in Equations (2) to (4), it follows that
$$0 = \langle \psi_1 | \psi_2 \rangle = \langle \psi_1 | U_{col}^\dagger U_{col} | \psi_2 \rangle = e^{i\theta}\, \overline{\gamma_1}\, \overline{\beta_2}.$$
The first equality follows from the fact that $|\psi_1\rangle$ and $|\psi_2\rangle$ are orthogonal by construction. The second one holds under the assumption of $U_{col}$ being unitary, which is disproved by the fact that the entire equality chain only holds for the trivial case $\gamma_1 = 0$ (as $|\beta_2| > 0$ by definition of the state $|\psi_1\rangle$), that is, when $U_{col}$ does not implement the collision operation. From this we can conclude that an amplitude based encoding of the velocity does not allow for a unitary implementation of the collision operation.
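The contradiction in the proof above can also be checked numerically. The sketch below does not construct $U_{col}$ (which cannot exist as a unitary); it only writes down the two output states that the collision operator would have to produce and compares inner products before and after. All numerical values of the weights, the collision strength and the phase are hypothetical.

```python
import numpy as np

# Velocity basis states |v0>, ..., |v3> of D2Q4 (position register omitted).
v = np.eye(4, dtype=complex)

# Hypothetical weights; any values with |beta2| > 0 and gamma1 != 0 work.
alpha0, alpha1 = 1 / np.sqrt(2), 1 / np.sqrt(2)
beta2, beta3 = 1 / np.sqrt(2), 1 / np.sqrt(2)
gamma1 = 0.6                       # collision strength
gamma0 = np.sqrt(1 - gamma1**2)    # keeps the output state normalized
theta = 0.3                        # irrelevant phase on the single-velocity state

psi1 = alpha0 * v[0] + alpha1 * v[1]   # two velocities present
psi2 = v[2]                            # a single velocity present

# Outputs that a collision operator would be required to produce.
out1 = gamma0 * psi1 + gamma1 * (beta2 * v[2] + beta3 * v[3])
out2 = np.exp(1j * theta) * psi2

# np.vdot conjugates its first argument, i.e. it computes <a|b>.
overlap_before = np.vdot(psi1, psi2)
overlap_after = np.vdot(out1, out2)
print(abs(overlap_before), abs(overlap_after))
```

The overlap is zero before the collision and non-zero afterwards, so no inner-product-preserving map can realize both required behaviors unless $\gamma_1 = 0$.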
Notice that this proof works for any amplitude based encoding of $v$ where the different possible velocities at a position are all represented by their own basis state. There will always be a case with only a single incoming velocity, for which an identity operation up to a phase shift should take place, while at the same time there will be combinations of velocities for which we want some weight of the system to change from one combination of velocities to another combination of velocities in the same equivalence class. These two conflicting requirements will always lead to the same contradiction of unitarity proven above, and we further expand on this intuition in Section 3.3.

Computational basis state encoding
The second type of encoding of a quantum state considered is the computational basis encoding, used in several quantum lattice Boltzmann papers such as [Yep98; Yep01; YB01; Yep02; Pra+03; MVS22; Ste23]. Using this encoding the contradiction of unitarity in the collision operation can be avoided by encoding the velocities of the particles at a position $|x\rangle$ in space by identifying each direction particles could be streamed from with its own qubit, which will be set to one if and only if there is a particle streaming from that direction.
As an example consider the D2Q4 lattice depicted in Figure 4. In this case the velocity can be encoded using four qubits $q_0$, $q_1$, $q_2$ and $q_3$, where the state $|v\rangle = |q_0 q_1 q_2 q_3\rangle = |0110\rangle$ is such that from the center point (1, 1) there is a particle streaming to (1, 2) and a particle streaming to (0, 1), but not to (2, 1) or (1, 0).
Figure 4: Illustration of the computational basis state encoding for the D2Q4 lattice. For each grid point $x$ we set the respective qubit $q_j$ to one if and only if there is a particle streaming in that direction, i.e. $|v\rangle = |q_0 q_1 q_2 q_3\rangle = |0110\rangle$.
Using this encoding the collision step can be defined quite naturally as a unitary operation. However, we run into trouble when attempting to define a unitary streaming step $U_{str}$, as we demonstrate in what follows.
To simplify notation let us restrict ourselves to the D1Q2 lattice and consider the two settings at time $t$ from Figures 5 and 6, which can be encoded as the states $|\psi_1\rangle$ (9) and $|\psi_2\rangle$ (10), respectively. It then follows directly that
$$\langle \psi_1 | \psi_2 \rangle = 0. \tag{11}$$
Upon streaming, the systems from Figures 5 and 6 change from their state at time $t$ (top lattice) to that at time $t + 1$ (bottom lattice), i.e. to the states $|\psi_1'\rangle$ (12) and $|\psi_2'\rangle$ (13), respectively. As in the previous section, we will show by contradiction that any operation $U_{str}$ for which $U_{str} |\psi_1\rangle = |\psi_1'\rangle$ and $U_{str} |\psi_2\rangle = |\psi_2'\rangle$ cannot be unitary.
Figure 5: D1Q2 example setting 1. The binary encoding above the arrows indicates whether or not a particle is flowing there in that time step: 1 indicates that there is a particle streaming there and 0 indicates that there is no particle. In the example setting we consider periodic boundary conditions. The top figure shows the state of the system at time $t$. The figure below shows the state of the system at time $t + 1$.
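Proof. Let us assume that $U_{str}$ is unitary, i.e. it preserves the inner product for all states $|\varphi_1\rangle$, $|\varphi_2\rangle$:
$$\langle \varphi_1 | \varphi_2 \rangle = \langle \varphi_1 | U_{str}^\dagger U_{str} | \varphi_2 \rangle.$$
Substituting the states (9) and (10) on the left side, and (12) and (13) into the right inner product, we arrive at the contradiction
$$0 = \langle \psi_1 | \psi_2 \rangle = \langle \psi_1' | \psi_2' \rangle \neq 0.$$
The first equality follows from the orthogonality property (11), and the second one from the assumption that $U_{str}$ is a unitary operator, which we just disproved.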
As in Section 3.1 this proof extends to any computational basis encoding where each possible combination of velocities at a specific lattice point is encoded using its own basis state, as one can always construct two situations with no overlap at time t that will have non-zero overlap after streaming at time t + 1.
Figure 6: D1Q2 example setting 2. The binary encoding above the arrows indicates whether or not a particle is flowing there in that time step: 1 indicates that there is a particle streaming there and 0 indicates that there is no particle. In the example setting we consider periodic boundary conditions. The top figure shows the state of the system at time $t$. The figure below shows the state of the system at time $t + 1$.
This proof also extends trivially to any other DnQm setting as the streaming possibilities of D1Q2 are essentially a subset of any other system and thus the same example can be used by setting the other streaming directions to 0.
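The streaming argument can be reproduced on a small example. The two configurations below are hypothetical and are not the settings of Figures 5 and 6; they live on a periodic D1Q2 lattice with three grid points. The sketch further assumes that a lattice configuration is stored as a uniform superposition over the position register with one two-bit velocity basis state per grid point.

```python
import numpy as np

def encode(right, left):
    """Uniform superposition sum_x |x>|r_x l_x> / sqrt(N) as a dense vector."""
    n = len(right)
    state = np.zeros(4 * n)
    for x in range(n):
        # Basis index: position x, followed by the two occupation bits.
        state[4 * x + 2 * right[x] + left[x]] = 1 / np.sqrt(n)
    return state

def stream(right, left):
    """Periodic D1Q2 streaming: right-movers to x + 1, left-movers to x - 1."""
    return np.roll(right, 1), np.roll(left, -1)

# Two hypothetical settings that differ at every grid point at time t.
a = (np.array([1, 0, 0]), np.array([0, 0, 1]))   # velocity bits: 10, 00, 01
b = (np.array([0, 1, 0]), np.array([0, 0, 0]))   # velocity bits: 00, 10, 00

overlap_before = encode(*a) @ encode(*b)
overlap_after = encode(*stream(*a)) @ encode(*stream(*b))
print(overlap_before, overlap_after)
```

The two states are orthogonal at time t, but after streaming both settings have an empty site at x = 0, so the overlap becomes 1/3. A map with this behavior cannot preserve inner products.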

Intuition and extension of non-unitarity proofs
In this section we expand on our non-unitarity proofs by providing physical intuition behind the proofs presented above. It is intended to give insight into what types of encodings our non-unitarity proof extends to, and what physical features of the system necessarily lead to the non-unitarity for these encodings.
Consider the proof from Section 3.1 that shows that the amplitude based encoding, where each velocity direction is identified through its own basis state, leaving the total velocity at a position $x$ to be a superposition of such basis states, prevents the collision operator $U_{col}$ from being unitary. Since it encodes each streaming direction as a different basis state, the quantum encodings of the velocity directions are all orthogonal to one another. This is also necessary, since if the basis states of the possible streaming directions are not orthogonal, we cannot fully distinguish between them. However, this orthogonality of the different velocity directions leads directly to the non-unitarity of $U_{col}$. A collision operator that rotates a given linear combination of basis states into a linear combination of other basis states, in such a way that the represented streaming patterns belong to the same equivalence class, will also rotate 'pure' velocities represented by a single basis state into another basis state, leading to a nonphysical and undesired change of velocities.
Following this line of argumentation it can be seen that the non-unitarity of $U_{col}$ is not so much a result of a specific choice of encoding but an inherent non-unitarity of the collision step itself. This directly leads to the idea of computational basis state encoding, where each velocity pattern (i.e. the combination of velocities) at a grid point is encoded as its own basis state, and not as a linear combination of all the basis states representing a non-zero contribution.
When encoding the velocity pattern at each grid point as a basis state, naturally, the non-unitarity of collision falls away and we can find a straightforward unitary operator to implement the collision step. However, such an encoding will always lead to non-unitarity of streaming due to the non-local nature of a streaming operation. Consider an arbitrary point in space $x$ and imagine two different scenarios with two different combinations of speeds $|v_1\rangle$ and $|v_2\rangle$ at this point. Then the inner product between $|x\rangle |v_1\rangle$ and $|x\rangle |v_2\rangle$ must be 0, as these are different basis states. However, the velocity states of the systems at position $x$ in the next time step do not depend on the current velocity states in the lattice point. In fact, they only depend on the velocity states of the neighboring lattice points. Since the inner product of the states at the point $x$ at the next time step does not depend on the current states at the point $x$, in the next time step the velocity at the point $x$ of the two systems could be identical, and hence the inner product could be one. There is no way of ensuring that this can only happen when the inner product at some other point $x'$ of the systems was non-zero before, as each grid point has velocity vectors in multiple directions determining its associated velocity basis state.
This shows that any quantum encoding that successfully implements both streaming and collision as a unitary operation must belong to one of the following three types. The first type is an amplitude based type encoding, where the different velocities are not orthogonal and thus not entirely distinguishable. The second type is a computational basis state encoding where the non-locality of streaming is somehow avoided. The last type is a completely novel encoding method that avoids both non-unitarity problems entirely. In the next section we will present precisely one such idea.

Space-time data encoding
In this section we propose a novel space-time data encoding that enables unitary collision and streaming at the same time. To the best of our knowledge, this is the first-of-its-kind start-to-end quantum Boltzmann algorithm that does not require measurement and quantum-state re-initialization after each time step.
In what follows, we adopt an extended computational basis state encoding, where at each location $x$ we take into account the velocities at all grid points in the vicinity of $x$. Here, 'in the vicinity of x' means that a particle can theoretically reach the grid point $x$ within the number of time steps still to be performed before measurement. Mathematically speaking, being 'in the vicinity of x' means being, respectively, in the so-called extended von Neumann, Moore or hexagonal neighborhood of the point $x$, depending on the lattice structure.³ This leads to a trade-off between the number of time steps that can be performed between measurements and the number of qubits required to encode the velocity at each grid point $x$. The more time steps one wishes to take between measurement-and-re-initialization cycles, the more qubits are required for our space-time encoding. Obviously the maximum number of qubits required to implement the velocity without any in-between measurements must be such that the entire grid is spanned. For a DnQm lattice this will be $m N_g$, where $N_g$ is the total number of grid points. When encoding the proposed method on a classical computer $m N_g$ bits would also be required, so when encoding the full domain there is no quantum benefit in terms of (qu)bit numbers. The quantum improvement comes from exploiting quantum parallelism, which is possible as long as we do not encode the whole space.
In what follows, let $N_t$ denote the number of streaming steps to be performed between (re-)initialization and measurement. We extend the computational basis state encoding of velocity directions from Section 3.2 to take into account all the speed states from grid points in the neighborhood of $x$ that can (at least theoretically) reach $x$ within $N_t$ streaming steps. This takes away the non-locality of the streaming operator, which led to the non-unitarity of $U_{str}$ for the 'regular' computational basis state encoding, at the cost of increasing the number of qubits required to encode all required velocity data.
We will give a detailed description of this encoding for the D2Q4 lattice, but note that it can be extended naturally to any other choice of DnQm. Consider the D2Q4 lattice given in Figure 7 with qubit $q_j$ set to one if and only if there is a particle traveling with velocity direction $j$ from grid point $x$ into a neighboring grid point in the current time step. We now extend this encoding to include all possible velocities at positions 'in the vicinity of x' for the total of $N_t$ time steps in order to obtain a unitarily streamable encoding. This is illustrated in Figure 8 for a single time step, i.e. $N_t = 1$, yielding the encoding
$$|x\rangle |q_{19} q_{18} \ldots q_0\rangle. \tag{16}$$
For D2Q4, the number of qubits encoding the possible velocity states per grid location $x$ grows with the number of time steps (still) to be taken as
$$n_v(N_t) = 4 \left( 2 N_t^2 + 2 N_t + 1 \right) = 8 N_t^2 + 8 N_t + 4, \tag{17}$$
where the maximum number of qubits required to encode all velocity directions over the entire grid equals $4 N_g$ as stated before.⁴ Similarly it can be shown that for $d$ dimensions the growth rate is of the order $O(N_t^d)$.
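The growth of the qubit count with the number of time steps can be tabulated by counting the points of the extended von Neumann neighborhood, i.e. all grid points within Manhattan distance $N_t$ of $x$, and assigning four qubits to each. The sketch below is an illustration under that reading; the function names and the grid size are hypothetical.

```python
def von_neumann_points(radius):
    """Count 2D grid points with Manhattan distance <= radius from the origin."""
    return sum(
        1
        for dx in range(-radius, radius + 1)
        for dy in range(-radius, radius + 1)
        if abs(dx) + abs(dy) <= radius
    )

def velocity_qubits_d2q4(n_steps, n_grid_points):
    """Velocity qubits per grid point: four per reachable point, capped at 4 * N_g."""
    return min(4 * von_neumann_points(n_steps), 4 * n_grid_points)

# Hypothetical grid with 10,000 points.
for n_steps in range(4):
    print(n_steps, velocity_qubits_d2q4(n_steps, n_grid_points=10_000))
```

For a single time step this gives 20 qubits, matching the qubits $q_0$ to $q_{19}$ of the single-step encoding, and the brute-force count agrees with the closed form $2r^2 + 2r + 1$ for the von Neumann neighborhood of radius $r$.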
We can now encode the collision step by first identifying the equivalence class for the D2Q4 lattice. We note that at each grid point $x$ as represented in Figure 7 the states $|q_0 q_1 q_2 q_3\rangle = |1010\rangle$ and $|q_0 q_1 q_2 q_3\rangle = |0101\rangle$ belong to the same equivalence class (cf. Figure 3), as they have the same total mass and momentum.⁵ We implement the collision step by defining a unitary operator $U_{col}$ which maps each of the two states $|1010\rangle$ and $|0101\rangle$ to a superposition of both, with coefficients $\alpha, \beta \in \mathbb{C}$ satisfying $|\alpha|^2 + |\beta|^2 = 1$, while acting as the identity operation on any other basis state. It can easily be verified that this operation is unitary.
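With the so-defined $U_{col} \in \mathbb{C}^{2^4 \times 2^4}$, we can write the total collision operation for an encoding of the velocity states $v$ consisting of $n_v = 4k$ qubits as the $k$-fold Kronecker product of $U_{col}$ operations, i.e.
$$U^{tot}_{col} = U_{col}^{\otimes k} = \underbrace{U_{col} \otimes U_{col} \otimes \cdots \otimes U_{col}}_{k \text{ times}}.$$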
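A minimal numerical sketch of such a collision operator and its Kronecker product is given below. The phases of the second column (the image of $|0101\rangle$) are an assumption: they are one possible choice that makes the matrix unitary, not necessarily the mapping intended in the text. The bit ordering (with $q_0$ as the left-most bit) and the parameter values are likewise illustrative.

```python
import numpy as np

def collision_operator(alpha, beta):
    """16x16 matrix mixing |1010> and |0101>; identity on all other basis states."""
    assert np.isclose(abs(alpha) ** 2 + abs(beta) ** 2, 1.0)
    u = np.eye(16, dtype=complex)
    i, j = int("1010", 2), int("0101", 2)   # q0 is the left-most bit
    # |1010> -> alpha |1010> + beta |0101>
    u[i, i], u[j, i] = alpha, beta
    # Assumed orthogonal partner: |0101> -> -conj(beta) |1010> + conj(alpha) |0101>
    u[i, j], u[j, j] = -np.conj(beta), np.conj(alpha)
    return u

u_col = collision_operator(alpha=np.cos(0.4), beta=1j * np.sin(0.4))

# Total collision operator for k = 2 grid points, i.e. n_v = 8 velocity qubits.
u_tot = np.kron(u_col, u_col)

print(np.allclose(u_col.conj().T @ u_col, np.eye(16)))
print(np.allclose(u_tot.conj().T @ u_tot, np.eye(256)))
```

Since the Kronecker product of unitary matrices is unitary, the same check passes for any number of factors; the dense matrix is only practical for very small k.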
Since each $U_{col}$ requires a few CNOT gates and a single triple controlled rotation gate, the total collision operator can be efficiently implemented even on near-term devices.⁶ In practice the total collision operator $U^{tot}_{col}$ differs per time step, since its local counterpart $U_{col}$ only needs to be applied to velocity states 'in the vicinity of x'. In the first out of the $N_t$ time steps it is important for all qubits representing velocity states 'in the vicinity of x' to be updated correctly. In the very last time step, however, it is only important for the qubits $q_0$, $q_1$, $q_2$ and $q_3$ to end up in the correct state. The more time steps $t$ have been taken, the fewer time steps $N_t - t$ are still to be taken, and so the 4-qubit local collision operator $U_{col}$ only needs to be applied to the remaining qubits relevant for encoding the 'directly connected' velocity states as given in Equation (17).
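The gate-level construction (three CNOT gates, one triple controlled rotation, three CNOT gates in reverse) can be verified with small dense matrices. In the sketch below the specific CNOT control/target pairs and the use of an $R_y$ rotation are assumptions: they are one choice that sends $|1010\rangle \to |1110\rangle$ and $|0101\rangle \to |1111\rangle$, and the rotation gate is left open in the text.

```python
import numpy as np

N_QUBITS = 4  # bit string q0 q1 q2 q3, with q0 the most significant bit

def bit(index, q):
    """Value of qubit q in the basis state with the given index."""
    return (index >> (N_QUBITS - 1 - q)) & 1

def cnot(control, target):
    """Permutation matrix of a CNOT gate on the 4-qubit register."""
    m = np.zeros((16, 16))
    for i in range(16):
        j = i ^ (bit(i, control) << (N_QUBITS - 1 - target))
        m[j, i] = 1.0
    return m

def triple_controlled_ry(angle):
    """R_y(angle) on q3, applied only when q0 = q1 = q2 = 1."""
    m = np.eye(16)
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    m[14:16, 14:16] = [[c, -s], [s, c]]   # acts on span{|1110>, |1111>}
    return m

# Assumed CNOT choice, applied right to left: (q2 -> q1), (q3 -> q0), (q3 -> q2).
prep = cnot(3, 2) @ cnot(3, 0) @ cnot(2, 1)

# Rotate, then undo the CNOTs in reverse order (prep is a real permutation matrix).
u_col = prep.T @ triple_controlled_ry(0.8) @ prep
print(np.round(u_col[[10, 5]][:, [10, 5]], 3))
```

The printed 2x2 block shows the rotation acting on the pair $|1010\rangle$, $|0101\rangle$; every other basis state is left unchanged because the CNOT layer never maps it onto $|1110\rangle$ or $|1111\rangle$.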
With this logic we can define a collision operator per time step $t$ as
$$U^{t}_{col} = U_{col}^{\otimes c} \otimes I^{\otimes (n_v - 4c)},$$
where $c = 2 (N_t - t)^2 + 2 (N_t - t) + 1$, $I$ denotes the single-qubit identity, and the identity operations are added to avoid dimensionality issues. In practice no operation will be applied on the qubits encoding velocity states not 'in the vicinity of x' within $N_t - t$ time steps.
Our space-time encoding enables different manners of implementing the streaming step. It can easily be seen that the way the streaming step should be implemented differs per time step $t$, depending on which positions will be 'in the vicinity of x' in the next time step as well. At the first time step it is important for (almost) all qubits to be streamed to a very specific position, whereas in the last time step it is only important for the qubits $q_0$, $q_1$, $q_2$ and $q_3$ to end up in the correct state. For the example shown below we only consider a total of one step to be taken (i.e. $N_t = 1$), and so we only need to consider the speeds that will stream to location $x$ in one time step. In this case streaming consists of performing a swap operation between the following qubit pairs: $q_0$ and $q_{12}$, $q_1$ and $q_{17}$, $q_2$ and $q_6$, as well as $q_3$ and $q_{11}$. Also in general (i.e. $N_t > 1$), the streaming step can be implemented by a combination of swap gates. Following the same in-the-vicinity-of-x argument as was used for the collision step, a total of
$$n_{swap}(t) = 4 + \sum_{i=1}^{N_t - t} 16 i = 8 (N_t - t)^2 + 8 (N_t - t) + 4 \tag{21}$$
swap gates are required to update as many velocity-encoding qubits in time step $t$, whereby these swap operations can be performed largely in parallel.⁷ The depth of the streaming circuit at time $t$ will amount to $\log_2 (N_t - t)$ swap operations.
Figure 7: Illustration of the computational basis state encoding for D2Q4.
⁴ The number of qubits required is equal to the number of points in the extended von Neumann, Moore or hexagonal neighborhood, depending on the choice of n and m considered.
⁵ The other equivalence classes are $|q_0 q_1 q_2 q_3\rangle = |1000\rangle$ and $|q_0 q_1 q_2 q_3\rangle = |1100\rangle$ and all cyclic shifts of these patterns, and $|q_0 q_1 q_2 q_3\rangle = |1111\rangle$. However, they all have just a single representative, so that we define the collision operator based on the ambiguous case.
⁶ We can implement the described collision operator by first applying three CNOT operations to the system, turning the states into $|1010\rangle \to |1110\rangle$ and $|0101\rangle \to |1111\rangle$. Subsequently a triple controlled rotation operation of choice is applied to the right-most qubit (controlled on the three left-most qubits). Finally the initial three CNOT operations are applied in reverse order to reset all velocity states correctly.
⁷ In each time step the swap operations in the 4 (or generally speaking m) different directions can be performed in parallel. Furthermore, the swap operations for the velocities in the same direction but not in the same 'line of streaming' can all be performed in parallel. Therefore we only need to take into account the velocities in the same line of streaming, and the depth of the circuit is determined by the longest 'line of streaming', which is equal to $N_t - t$. In each layer of the swap operations at least half of the $N_t - t$ velocities can be swapped to the correct position. Therefore a total of $\log_2 (N_t - t)$ swap operations needs to be performed in the $t$-th time step.
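The single-step swap pattern and the swap count of Equation (21) can be sketched classically on a list of bits. The function names are hypothetical; the swap pairs are the ones listed in the text for $N_t = 1$, and the count is summed ring by ring and compared against the closed form.

```python
# Swap pairs listed in the text for a single time step (N_t = 1).
SWAP_PAIRS = [(0, 12), (1, 17), (2, 6), (3, 11)]

def stream_single_step(qubits):
    """Apply the four swaps to a list of 20 classical bits q_0 ... q_19."""
    out = list(qubits)
    for a, b in SWAP_PAIRS:
        out[a], out[b] = out[b], out[a]
    return out

def n_swap(n_steps, t):
    """Number of swap gates in time step t, summed ring by ring."""
    return 4 + sum(16 * i for i in range(1, n_steps - t + 1))

# Two neighboring particles that are about to arrive at x.
register = [0] * 20
register[12] = 1   # particle in direction 0 at a neighbor of x
register[6] = 1    # particle in direction 2 at another neighbor of x
streamed = stream_single_step(register)
print(streamed[:4])     # velocity qubits q_0 ... q_3 at x after streaming
print(n_swap(5, 2))     # swap gates with three time steps still to go
```

For the last time step the count reduces to the four swaps listed above, and for every remaining-step count the sum agrees with $8 (N_t - t)^2 + 8 (N_t - t) + 4$.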

Conclusion
In this paper we have shown that current data encoding methods considered for quantum Boltzmann methods do not allow for treating both streaming and collision as unitary quantum operations. We have provided both a mathematical proof of its impossibility, and insight into the physical properties of the system and encodings that lead to this behavior. Using this insight we subsequently developed a new space-time data encoding method that does allow for both streaming and collision to be implemented as a unitary operation. This paper should serve as a guideline on where (not) to look for successful quantum encodings of the lattice Boltzmann and other QCFD methods.
Figure 8: Illustration of the space-time encoding for D2Q4 for a single time step. The grid point $x$ carries the qubits $q_0$ to $q_3$, and its four neighboring grid points carry the qubits $q_4$ to $q_{19}$.

Figure 1: Four examples of different types of DnQm possible. The top picture on the left portrays the D1Q2 setting and the top picture on the right portrays the D1Q3 setting (where a stationary particle can be included). The picture on the bottom left portrays the D2Q5 setting and the picture on the bottom right shows the D2Q9 setting.

Figure 2: Illustration of the streaming step for the D1Q2 case. Figure (a) shows the velocity vectors at position x at time t. Figure (b) shows the configuration at time t+1, after the particles have moved to positions x−1 and x+1, respectively. Red and blue colors identify the different streaming directions and their propagation pattern.