Forward and Backward Constrained Bisimulations for Quantum Circuits

. Eﬃcient methods for the simulation of quantum circuits on classic computers are crucial for their analysis due to the exponential growth of the problem size with the number of qubits. Here we study lumping methods based on bisimulation, an established class of techniques that has been proven successful for (classic) stochastic and deterministic systems such as Markov chains and ordinary diﬀerential equations. Forward constrained bisimulation yields a lower-dimensional model which exactly preserves quantum measurements projected on a linear subspace of interest. Backward constrained bisimulation gives a reduction that is valid on a subspace containing the circuit input, from which the circuit result can be fully recovered. We provide an algorithm to compute the constraint bisimulations yielding coarsest reductions in both cases, using a duality result relating the two notions. As applications, we provide theoretical bounds on the size of the reduced state space for well-known quantum algorithms for search, optimization, and factorization. Using a prototype implementation, we report signiﬁcant reductions on a set of benchmarks. Furthermore, we show that constraint bisimulation complements state-of-the-art methods for the simulation of quantum circuits based on decision diagrams.


INTRODUCTION
Quantum computers can solve certain problems more efficiently than classic computers.Earlier instances are Grover's quantum search [28] and Shor's factorization [49]; more recent work addresses the efficient solution of linear equations [30] and the simulation of differential equations [37].Despite its potential and increasing interest from a commercial point of view, quantum computing is still in its infancy.The number of qubits of current quantum computers is prohibitively small; furthermore, low coherence times and quantum noise lead to high error rates.Further research and improvement of quantum circuits thus hinge on the availability of efficient simulation algorithms on classic computers.Being described by a unitary complex matrix, any quantum circuit can be simulated by means of array structures and the respective matrix operations [33,39,52].Unfortunately, direct array approaches are subject to the curse of dimensionality [42] because the size of the matrix is exponential in the number of qubits.This motivated the introduction of techniques that try to overcome the exponential growth, resting, for instance, on the stabilizer formalism [1], tensor networks [57,58], path sum reductions [3] and decision diagrams [29,42,59].
Here we study bisimulation relations for quantum circuits.Bisimulation has a long tradition in computer science [48].
For the purpose of this paper, the most relevant branch of research on this topic regards bisimulations for quantitative models such as probabilistic bisimulation [9,35].This is closely related to ordinary lumpability for Markov chains [13], also known as forward bisimulation [25], which yields an aggregated Markov chain by means of partitioning the original state space, such that the probability of being in each macro-state/block is equal to the sum of the probabilities of each state in that block.Exact lumpability [13], also known as backward bisimulation [51], exploits a specific linear invariant induced by a partition of the state space such that states in the same partition block have the same probability at all time points [13].In an analogous fashion, forward and backward bisimulations have been developed for chemical reaction networks [14,15,53], rule-based systems [24,25], and ordinary differential equations [18,19].
In all these cases, lumping can be mathematically expressed as a specific linear transformation of the original state space into a reduced space that is induced by a partition.However, in general, lumping allows for arbitrary linear transformations [12,51].Since this may introduce loss of information, constrained lumping allows one to specify a subspace of interest that should be preserved in the reduction [12,31,43,55].In partition-based bisimulations, constraints can be specified as suitable user-defined initial partitions of states for which lumping is computed as their (coarsest) refinement [16,20].Bisimulation relations for dynamical systems [11,44] and the notions of constrained linear lumping in [31,43,55], instead, can be understood as linear projections (also known as "lumping schemes") into a lower-dimensional system that preserves an arbitrary linear constraint subspace.
The aim of this paper is to boost the simulation of quantum circuits via forward-and backward-type bisimulations that can be constrained to subspaces. 1nalogously to the cited literature, with forward constrained bisimulation (FCB) the aim is to obtain a lower-dimensional circuit which (exactly) preserves the behavior of the original circuit on the subset of interest.In backward constrained bisimulation (BCB), the reduction is valid on the constraint subspace; in this manner, the original quantum state can be fully recovered from the reduced circuit.Overall, this setting has complementary interpretation with respect to the analysis of a quantum circuit.It is known that a quantum state can only be accessed by means of a quantum measurement, mathematically expressed as a projection onto a given subspace.FCB, in general, preserves any projection onto the constraint subspace.That is, if the constraint subspace contains the measurement subspace, FCB will exactly preserve the quantum measurement, but the full quantum state cannot be recovered in general.Instead, constraining the invariant set of BCB to contain the input of the circuit ensures that the full circuit result can be recovered from the reduced circuit.We show that FCB and BCB are related by a duality property stating that a lumping scheme is an FCB if and only if its complex conjugate transpose is a BCB.Interestingly, this is analogous to the duality established between ordinary (forward) and exact (backward) lumpability for Markov chains [17,20], although it does not carry over to other models in general [8,54].A relevant implication of this result is that one only needs one algorithm to compute both FCB and BCB.As a further contribution of this paper, we present such an algorithm, developed as an adaptation of the CLUE method for the constrained lumping of systems of ordinary differential equations with polynomial right-hand sides [43] of which it inherits the polynomial-time complexity in matrix size.
To show the applicability of our constrained bisimulations, we analyze several case studies for which we report both theoretical and experimental results.Specifically, we first study three classic quantum circuits for search (Grover's algorithm [41, Section 6.2]), optimization [21], and factorization [41, Section 5.3.2],respectively.In Grover's algorithm, the cardinality of the search domain is exponential in the number of qubits; we prove that BCB can always reduce the circuit matrix to a 2 × 2 matrix while exactly preserving the output of interest.Next, we consider the quantum approximate optimization (QAOA) algorithm for solving SAT and MaxCut instances [21]; in this setting, our main theoretical result is an upper bound on the size of the reduced (circuit) matrix by the number of clauses (SAT) or edges (MaxCut).Finally, for quantum factorization we prove that the size of the reduced matrix gives the multiplicative order, that is, it solves the order finding problem to which the factorization problem can be reduced [41].
From an experimental viewpoint, using a prototype based on a publicly available implementation of CLUE, we compare the aforementioned theoretical bounds against the actual reductions on a set of randomly generated instances.
Moreover, we conducted a large-scale evaluation on common quantum algorithms collected from the MQT Bench repository [45], showing considerable reductions in all cases.Finally, we demonstrate that constrained bisimulation complements state-of-the-art methods for quantum circuit simulation based on decision diagrams [42], as implemented in the MQT DDSIM tool [59].
Relation to conference version.The present work extends the conference version [32] by computing quantum bisimulations using decision diagrams.Specifically, the exponential time and space complexity of Theorem 3.4 is improved in the new Theorem 3.11.The latter states that the worst-case complexity for computing the quantum bisimulation, the computational bottleneck of [32], is at most polynomial in the size of the largest decision diagram appearing during the computation of the quantum bisimulation.In a similar vein, the Python implementation from [32] has been re-implemented in C++ and uses decision diagrams instead of vectors.Using the improved framework, the quantum benchmarks from [32] have been revisited.Apart from reducing the simulation times from [32] by several orders of magnitude, it is demonstrated that the combination of quantum bisimulation and decision diagrams outperforms each approach in isolation.
Further related work.Probabilistic bisimulations [6,7,9] have been considered for quantum extensions of process calculi, see [23,26] and references therein.Similarly to their classic counterparts, these seek to identify concurrent (quantum) processes with similar behavior.The current work, instead, is about boosting the simulation of quantum circuits and is in line with [5,17,56].Specifically, it operates directly over quantum circuits rather than processes and exploits general linear invariants in the (complex) state space.In engineering, invariant-based reductions of linear systems are known under the names of proper orthogonal decomposition [4,40], Krylov methods [4], and dynamic mode decomposition [27,34,46].Linear invariants also describe safety properties [10] in quantum model checking [60,61], without being used for reduction though.L-bisimulation [12] and [27,34] yield the same reductions, with the difference being that the former obtains the smallest reduction up to a given initial constraint, while the latter computes the smallest reduction up to an initial condition.While similarly to us relying on reduction techniques, [27,34] focus on the reduction of quantum Hamiltonian dynamics, with applications mostly in quantum physics and chemistry.Instead, we study the reduction of quantum circuits which are the prime citizens of quantum computing.Moreover, we provide a prototype implementation of our approach and perform a large-scale numerical evaluation.
Paper outline.The paper is structured as follows.After a review of core concepts, Section 3 introduces forward and backward constrained bisimulation of (quantum) circuits.There, we also provide an algorithm for the computation of constrained bisimulations by extending [36,43] to circuits.Section 4 then derives bounds on the reduction sizes of quantum search [28], quantum optimization [21] and quantum order finding [41].Section 5, instead, conducts a largescale evaluation on published quantum benchmarks [45] and compares constrained bisimulations with MQT DDSIM with respect to the possibility of speeding up circuit simulations.The paper concludes in Section 6.

Linear Algebra and Dynamical Systems
We begin by introducing core concepts from linear algebra and quantum computing [38,41].
• The column space of a matrix are all linear combinations of its columns and is denoted by .One says, the columns of span .
• The row space of a matrix is the set of all linear combinations of its rows and is denoted by .One says, the rows of span .
• A (quantum) circuit over qubits is described by a unitary map ∈ C × , that is, −1 = † .
• A (quantum) state | ∈ C is a vector with (Euclidian) norm one.
Throughout the paper, we do not work at the higher level where quantum circuits are defined by means of quantum gate compositions [41].Instead, we work directly at the level of the unitary maps that are induced by such compositions.
With this in mind, we use the terms "unitary map" and "quantum circuit" interchangeably.
We distinguish between one-and multi-step applications of a quantum circuit [41].For an input state | 0 ∈ C , the full quantum state after one-step application is | 0 .Instead, the full quantum state after a multi-step application is given by | 0 , where > 1 is the number of steps.These definitions justify interpreting a quantum circuit as a discrete-time dynamical system as follows.
, with ≥ 0. We call | the full quantum state at step .
Example 2.3.The one-qubit circuit = 0 1 1 0 is known as the Pauli -gate [41].In the case of ≥ 1 steps and input The result of a quantum computation is not directly accessible and is usually queried using quantum measurements [41].These can be described by projective measurements [41], formally given by a family of orthogonal projections { 1 , . . ., } that satisfy 1 + . . .+ = .When a quantum state | ∈ C is measured, the probability of outcome 1 ≤ ≤ is = | | .In case of outcome , the quantum state after measurement is | / √ .We will be primarily concerned with the case { , − } for a given orthogonal projection .
Often, one is interested in querying states from a specific subspace .For example, the result of the HHL algorithm [30], considered in Section 5, is stored in a subset of all qubits, i.e., in a subspace.To this end, it suffices to use a projective measurement that identifies .
Definition 2.4.Given an orthogonal projection , we call | the -measurement of A particularly simple but useful class of projective measurements is the one that measures a single state | , i.e., that identifies the space | spanned by | .This is given by the orthogonal projection Example 2.5.Assume that we are interested in measuring the result of Example 2.3 using measurement | that identifies | .Then, for

Linear Algebra with Decision Diagrams
In the following we briefly review decision diagrams and outline how these can be used to conduct linear algebra operations efficiently; see [47] for a detailed discussion.We start by noting that a qubit | = 0 |0 + 1 |1 can be conveniently represented as a decision diagram .
It consists of a single node with one incoming edge that represents the entry point into the decision diagram as well as two successors that represent the split between the basis states of the single qubit, ending in a terminal node denoted by the black box.The state's amplitudes are annotated to the respective edges.Edges without annotations correspond to an edge weight of 1.
Building on the intuition from a single-qubit state, we can move to larger systems.
Example 2.6.Consider the following state vector of a three-qubit system: Then, | can be recursively split into equally-sized parts, essentially making a case distinction over the qubit value ∈ {0, 1}, for each = 2, 1, 0: This directly translates to the decision diagram: Each level of the decision diagram consists of decision nodes with corresponding left and right successor edges.These successors represent the path that leads to an amplitude where the local quantum system (corresponding to the level of the node, annotated here with the labels) is in the |0 (left successor) or the |1 state (right successor).
At this point, this has been just a one-to-one translation between the state vector and a graphical representation.
The unique feature of decision diagrams is that their graph structure allows redundant parts to be merged in the representation instead of being repeated.
Example 2.7.Observe how, in the previous example, the left and right successors of the top-level ( 0 ) node lead to exactly the same structure.As a result, the whole sub-diagram does not need to be represented twice, i.e., 2 1 0 0 From a memory perspective, this reduction compressed the memory required to represent the state by 50%.
Intuitively, the idea is to represent state vectors compactly as decision diagrams by halving the vector in a recursive fashion until all state vector redundancies have been exploited.Although substantial compression is often possible, we mention that there exist state vectors yielding decision diagrams of exponential size in the number of qubits [62].
Merely defining means for compactly representing states is not sufficient, and it is crucial also to define efficient means to work with or manipulate the resulting representations.In [47] it has been outlined how decision diagrams can be used to perform linear algebra operations efficiently.There, it has been argued that decision diagrams support addition, scalar multiplication, and the scalar product.The main idea behind the efficient implementation is to recursively break the respective operations down into sub-computations.
We shall sketch the addition of decision diagrams next.The idea is to recursively rewrite it by noting that where and ′ are common factors of the terms in | and | ′ , respectively.In the decision-diagram formalism, this corresponds to a simultaneous traversal of both decision diagrams from their roots to the terminal (multiplying edge weights along the way until the individual amplitudes are reached) and back again (accumulating the results of the recursive computations).More precisely, where the dashed nodes represent the respective successor decision diagrams.Overall, this results in a complexity that is linear in the size of the larger decision diagram.Scalar multiplication and scalar product of decision diagrams can be treated in a similar fashion and yield the same complexity [47].
3 CONSTRAINED BISIMULATIONS FOR QUANTUM CIRCUITS

Forward Constrained Bisimulation
We next introduce FCB.Before commenting on the definition, we establish the following.

P
. See proof of Theorem 3.9.
We note that the reduction holds for any choice of initial state | 0 , analogously to the aforementioned forward-type bisimulations [17,18] for (real-valued) dynamical systems.The assumption of orthonormality of rows of implies that ≤ , i.e., is a transformation onto a possibly smaller-dimensional state space.Although it can be dropped without loss of generality [43], it allows a more immediate relation to projective measurements.In fact, matrix induces the orthogonal projection defined as = † .This projective measurement identifies because ⊆ † .Moreover, | is preserved in the reduced system for any .To see this, it suffices to multiply | = | ˆ by † from the left and to note that this yields | = † | ˆ .end for 10: until no columns have been appended to † 11: return matrix † † .Algorithm 1 adapts the algorithm for (real-valued) systems of ordinary differential equations with polynomial derivatives developed in [36,43] to the complex domain and yields the minimal FCB wrt subspace , i.e., it returns an orthonormal ∈ C × whose dimension is minimal.

T 3.4 (M FCB)
. Algorithm 1 computes a minimal FCB ∈ C × w.r.t.subspace , that is, the rowspace of any FCB ′ w.r.t.contains that of .The complexity of Algorithm 1 is polynomial in .

P
. See proof of Theorem 3.9.
We briefly comment on Algorithm 1.The idea behind it exploits the fact that can be shown to be an FCB whenever † is an invariant set of the map , that is, if the column space of † is contained in the column space of † .
The algorithm begins by initializing † with a basis of in line 1.This ensures that is contained in the column space of the final result.For every column | of † , the main loop in line 2 checks whether | is in the column space of † (line 5) by computing its projection | onto the column space of † (line 4).If it is not in the column space, the projection will differ from | and the residual | must be added to † .This shows correctness, while the minimality of FCB follows from the fact that only the necessary residuals are added to † .The complexity of the algorithm, instead, follows by noting that at most columns can be added to † and that all computations of the main loop require, similarly to the computation in line 1, at most O ( 3 ) operations.
Remark 3.5.As can be noticed in Algorithm 1, e.g., line 4, the computation of an FCB subsumes the computation of a single step of the circuit.For practical applications to single-step circuits where the modeler is interested in only a single input, FCB may be as expensive as simulating the original circuit directly.Hence, it is obvious that the effectiveness of constrained bisimulations is particularly relevant when simulating the circuit with respect to several inputs, or when considering multi-step applications.Examples of this are provided in Section 4 and a numerical evaluation is carried out in Section 5. is a BCB wrt .

P
. Let 0 ⊆ be a basis of some fixed ⊆ C .We first note that the discussion of [31,36,43] and [4,46] can be directly extended to the complex field.With this, we obtain: (1) ∈ C × is an FCB wrt if and only if ⊆ with † 0 ⊆ .
(2) ∈ C × is a BCB wrt if and only if ⊆ with 0 ⊆ .
Moreover, we observe the following: This yields Theorem 3.9, i.e., ∈ C × is an FCB of wrt constraint if and only if † ∈ C × is a BCB of wrt (because † † 0 = 0 ).Moreover, if † is computed by Algorithm 1, then † is a BCB wrt , while † † is an FCB wrt .This follows by noting that in such a case The complexity follows from the discussion after Theorem 3.4.A detailed complexity discussion can be obtained in [43].
Exploiting that an FCB satisfies † = by [55], we obtain showing that ˆ is unitary.
In light of the above result, we often speak of a (constrained bisimulation) reduction.Moreover, we note that Theorem 3.9 ensures that a BCB reduction up to input yields an FCB reduction up result, which is discussed next.
Remark 3.10.Let † be the BCB of w.r.t.| 0 , where | 0 is the input.Then, = † † is an FCB wrt | 0 , implying that = † identifies the column space of † , see discussion after Definition 3.1.At the same time, the result We end the section by pointing out that, thanks to Theorem 3.9, Algorithm 1 can be used to compute a minimal BCB † w.r.t.subspace .In fact, the only difference is that one should return † instead of † † in the last line of the algorithm.With this change, we notice that Algorithm 1 coincides, in the case of a one-dimensional subspace ⊆ C , with the Krylov subspace [4] that can be obtained by the Arnoldi iteration [46].

Lumpings with Decision Diagrams
The next result ensures that decision diagrams [42] can be used to boost the computation of quantum lumpings.(1) Without assuming knowledge of , the quantum lumping and the reduced unitary map ˆ can be computed in time O ( 2 ) and space O ( + 2 ).
(2) Provided that a vector | ∈ C is given in terms of a decision diagram of size O ( ), vector † ˆ | can be computed, for any ≥ 1, in O ( + 2 ) steps and in O ( ) space.

P
. Assuming that two vectors of C are represented by decision diagrams of size O ( ), it can be shown [47] that their sum, scalar product and scalar multiplication can be computed in time O ( ) and stored as decision diagrams of size O ( ).Next, we prove the special case = 1.
Hence, assuming (as inductive hypothesis) that 0 , . . ., −1 can be stored as decision diagrams of size O ( ), it follows that (1) , (2) , . . ., ( −1) can be also stored as decision diagrams of size O ( ).This, in turn, implies that can be computed in O ( ) time and space.Since this has to be done for all = 0, . . ., − 1, we obtain a time complexity of O ( 2) and a space complexity of O ( + 2 ) because we need to store ˆ and the computation of each reuses the previously computed 0 , . . ., −1 .Once 0 , . . ., −1 have been obtained, the reduced matrix ˆ can be computed in time O ( 2) and space O ( + 2 ) because ˆ , = , .We next turn to claim 2. To this end, we first note that the vector | ∈ C can be obtained by computing scalar products between decision diagrams of size O ( ), thus giving rise to a time and space complexity of O ( ).Once Provided that the sequence 1 | , 2 | , . . .can be computed using decision diagrams whose size is polynomial in the number of qubits, Theorem 3.11 ensures that the quantum lumping of w.r.t.| can be computed with time and space requirements that are polynomial in the number of qubits.
At the expense of higher complexity, Theorem 3.11 can be extended to quantum lumpings w.r.

APPLICATIONS
In this section, we demonstrate that established quantum algorithms enjoy substantial bisimulation reductions.For each application, we provide a brief description of the quantum algorithm and a theoretical bound on its reduction.

antum Search
Let us assume that we are given a non-zero function : {0, 1} → {0, 1} and that we are asked to find some ∈ {0, 1} such that ( ) = 1.Grover's seminal algorithm describes how this can be achieved in O ( √ ) steps on a quantum computer [41, Section 6.2], thus yielding a quadratic speed-up over a classic computer.For any ∈ {0, 1} , the Grover map is given by The Grover map yields the following celebrated result.The next result allows one to compute the result | from Theorem 4.1 using a map over a single qubit.
where is as above, while | is the superposition (i.e., sum) of all solution states and | is the superposition of all non-solution states.
Remark 4.3.Although BCB † wrt | always has dimension 2, its column space depends on the oracle function .
This is because appears in , see (3).

antum Optimization
The Quantum Approximation Optimization Algorithm (QAOA) [21] is a computational model that has the same expressive power as the common quantum circuit model [2,21,22].It is described by two matrices.The first is the problem Hamiltonian for which we are interested in computing a maximal eigenstate, i.e., an eigenvector for a maximal eigenvalue of .The second is the begin Hamiltonian for which a maximal eigenstate | is known already.With this, a maximal eigenstate of can be obtained by performing the QAOA introduced next.
Definition 4.4 (QAOA [21]).For a problem Hamiltonian and a begin Hamiltonian , fix the unitary matrices where > 0 is a sufficiently small time step and exp( ) is the matrix exponential.For a sequence of natural numbers The QAOA with ≥ 1 stages is then given by max{ While the problem Hamiltonian depends on the task or problem we are solving, the choice of the begin Hamiltonian is informed by the so-called adiabatic theorem, a result that identifies conditions under which QAOA returns a global optimum.A common heuristic is to pick such that and do not diagonalize over a common basis [21,22] and to assume without loss of generality that | = | / √ is the unique maximal eigenvector of .
We next demonstrate that bisimulation can reduce QAOA when applied to SAT and MaxCut, two NP-complete problems [50].We start by introducing the problem Hamiltonians for both cases.

Definition 4.5 (SAT and MaxCut Problem Hamiltonians).
• For a boolean formula = =1 , where is a clause over boolean variables, the problem Hamiltonian is given by = , where for any ∈ {0, 1} that represents a Boolean assignment.
Following this definition, it can be shown that the QAOA | | from Definition 4.4 corresponds to a quantum measurement reporting either the expected number of satisfied clauses or the expected size of the cut.It is possible to guarantee that QAOA finds a global optimum for a sufficiently high [21,22].
The next result ensures that has BCB † w.r.t.| whose reduced map is provably small.Moreover, for any such , it ensures that there exists a begin Hamiltonian for which † is a BCB too, thus ensuring substantial reductions of the entire QAOA calculation (4).| .Then (1) The column space of † is spanned by where is the number of clauses (SAT) or edges (MaxCUT).Specifically, the dimension of the BCB is bounded by .
(2) Then, for any Hamiltonian ˆ ∈ C × (i.e., Hermitian matrix), there is a Hamiltonian The QAOA in C thus corresponds to a QAOA in the reduced space C .

P
. We begin by proving 1.For SAT, it can be seen that | = | , where 0 ≤ ≤ is the number of clauses that are satisfied by assignment .A similar formula holds for MaxCut, the difference being that is the size of the cut .
It should be noted that is in diagonal form for both SAT and MaxCut.If denotes the number of distinct eigenvalues of , then ≤ , where is in the case of SAT or MaxCUT, respectively, the number of clauses or edges.The same can be said regarding its matrix exponential ( ) which, being unitary, enjoys an eigendecomposition, allowing us to write | = =1 | , where | is an eigenvector for eigenvalue of ( ).This yields for all ≥ 0. Without loss of generality, consider ≤ such that = 0 for all > and ≠ 0 otherwise.Writing gives rise to a regular Vandermonde matrix [46] in follows from the definition of BCB and Lemma 4.7 from below.
The following auxiliary result is needed in the proof of Theorem 4.6.
Then, is unitary, is an FCB of it wrt | , and ˆ is its reduced map.In addition, there exists a Hamiltonian ∈ C × that satisfies = exp(− ).

P
. We first show that = † as this implies that is an FCB of by [55].To see this, we note that where we have used that † = 0 and † = 0, which follows from the choice of .From the calculation, we can also infer that ˆ = † , i.e., ˆ is indeed the reduced map.In a similar fashion, one can note that is also an FCB of and that ˜ is the respective reduced map.Since both ˆ and ˜ are unitary, we infer that also is unitary (alternatively, a direct calculation yields = † ).Since any unitary matrix can be written as a matrix exponential of a Hamiltonian, there exists a Hamiltonian satisfying = exp(− ).

antum Factorization and Order Finding
Let us assume that we are given a composite number which we seek to factorize.As argued in [41,Section 5.3.2], this problem can be solved in randomized polynomial time, provided that the same holds for the order finding problem.
The next result allows us to relate the order of to the dimension of the BCB wrt |1 .This fact is exploited in Shor's factorization algorithm [41].
).The dimension of the BCB of wrt |1 is equal to the order of modulo .

P .
If can be shown [41] that the from above is unitary and that | =2 / | for all 0 ≤ ≤ − 1, where for any ≥ 0. Hence, the minimal BCB with respect to |1 is contained in the span of 0 , . . ., −1 .To see that the dimension is exactly , we note that the vectors written in the basis 0 , . . ., −1 , constitute a regular Vandermonde matrix [46] in C × .

NUMERICAL EXPERIMENTS
We evaluated our approach on the applications from Section 4 and the MQT Bench quantum circuit benchmark suite [45].To this end, we extended the publicly available implementation of the CLUE algorithm from [36,43] which used Python, Gaussian elimination and sparse vectors and matrices.The new version was re-implemented in C++ and uses the state-of-the-art decision diagram package for quantum computing provided as part of the Munich Quantum Toolkit's MQT Core library (available at https://github.com/cda-tum/mqt-core).A version that uses sparse vectors and matrices for representing state vectors and quantum circuits, respectively, is offered for completeness.The respective prototype is publicly accessible at the GitHub repository CLUE, branch quantum+cpp 2 , all results reported were executed on a machine with an i7-8665U CPU, 32GB RAM and a 1024GB SSD.
As anticipated, the evaluation demonstrates that a combination of quantum bisimulation with decision diagrams outperforms each individual approach.

Applications from Section 4
Here we report the results of numerical experiments on the applications discussed in Section 4. Starting with five qubits, we increased the number until a timeout of 500 seconds has been reached.To allow for a representative evaluation, we averaged runtimes over 50 independent runs for each case study.Specifically, the instances were generated as follows: • Grover algorithm (Sec.4.1): for a fixed numbers of qubits, we pick randomly an element ∈ {0, . . ., 2 − 1}.
We create the corresponding circuit for Grover's algorithms with qubits (1 of them is ancillary) that looks for the element .
• Quantum Optimization for SAT (Sec.4.2): for each number of qubits , we generate a random formula with clauses ( is randomly selected between and 3 ), where each clause has 3 variables at most.We guarantee that every formula contains all variables.
We considered three different regimes: Table 1.Simulation times of quantum applications from Section 4 in case of three different regimes.CLUE: the circuit is reduced using vectors and simulated; DDSIM+CLUE: the circuit is reduced using decision diagrams and then simulated; DDSIM: the original circuit is simulated using decision diagrams without any reduction.
• CLUE: quantum bisimulation is computed using CLUE that relies on vector matrix representations; the simulation is conducted using the reduced system; • DDSIM+CLUE: quantum bisimulation is computed using CLUE that relies on decision diagrams; the simulation is conducted using the reduced system; • DDSIM: using decision diagrams, we simulate the full circuit.
The number of steps was set to the smallest integer greater than or equal to √ .The choice of is motivated by Theorem 4.1 and the discussion around the so-called adiabatic theorem in [21,22].
For quantum optimization, Table 1 reports logarithmic CLUE reductions, reducing in particular 2 15 × 2 15 matrices to 15 × 15 matrices in less than 15 s.Moreover, we observe that a combination of CLUE with DDSIM outperforms each individual approach.

Benchmark Circuits
Similarly to Table 1, we next compare the scalability of vector-matrix operations versus that of decision diagrams, this time on quantum benchmarks from [45], available at https://www.cda.cit.tum.de/mqtbench/.To this end, we computed the constrained bisimulation w.r.t.|0 for all circuits in each benchmark family until a timeout of 500 seconds has been reached.We show the reduction ratio (RR) for the constrained bisimulation w.r.t.|0 and the times (in seconds) spent under regimes CLUE and DDSIM+CLUE, averaged over 5 executions.We note that some circuits were only available for a specific number of qubits (e.g., HHL and price calls).For the benefit of presentation, we omit qubit numbers whose simulation times were below one second for both CLUE and DDSIM+CLUE.
In general, it can be observed that substantial reductions could be obtained for a number of benchmark families.
Concerning scalability, we note that DDSIM+CLUE outperforms CLUE in most cases.An exception to this rule is when a circuit cannot be reduced.In such a case, an extension of CLUE by decision diagrams tends to perform worse.

CONCLUSION
We introduced forward and backward constrained bisimulations for quantum circuits which allow by means of reduction to preserve an invariant subspace of interest.The applicability of the approach was demonstrated by obtaining substantial reductions of common quantum algorithms, including, in particular, quantum search, quantum approximate optimization algorithms for SAT and MaxCut, as well as a number of benchmark circuits.Overall, the results suggest that constrained bisimulation can be used as a tool for speeding up the simulation of quantum circuits on classic computers, especially when the circuit is to be simulated under several initial conditions or for multi-step applications.
In line with the relevant literature on bisimulations for dynamical systems, constrained bisimulations introduce loss of information due to their underlying projection onto a smaller-dimensional state space; the information that is preserved, however, is exact.A relevant issue for future work is to consider approximate variants of bisimulation for quantum circuits, in order to find more aggressive reductions or to capture quantum-specific phenomena such as quantum noise.

L 3 . 2 .
The reduced map ˆ in Definition 3.1 is unitary.

Example 3 . 6 .Definition 3 . 7 (Example 3 . 8 .
Consider the FCB = | † w.r.t.subspace | from Example 3.3.Then, noting that ( − ) | = 0, we infer that Algorithm 1 terminates in line 5. Hence, is a minimal FCB w.r.t.| .3.2 Backward Constrained BisimulationBCB yields a reduced system through the identification of an invariant set, i.e., a subspace such that | ∈ for any | ∈ and ≥ 1.Whereas in FCB the reduced model can recover projective measurements onto the constraint set for any initial set, here one can recover the full quantum state, so long as the initial states belong to the invariant set.Backward Constrained Bisimulation, BCB).Let , and ˆ be as in Definition 3.1.Then, † is a BCB of the dynamical system | +1 = | w.r.t. a subspace of inputs ⊆ C when ⊆ † and whenever| 0 = † | ˆ 0 implies | = † | ˆ for all ≥ 1.Similarly to FCB, we assume without loss of generality that ∈ C × has orthonormal rows.As anticipated above, FCB and BCB are not comparable in general.In fact, an FCB does not make any assumptions on the initial condition| 0 ,while a BCB † does so by requiring † | 0 = | 0 .Conversely, a BCB † allows one to obtain | , while an FCB allows one to obtain | instead of | itself.Fix | = (1, −1) / √ 2 from Example 3.6 and recall that = | † , | = − | and ˆ = −1.Then, † is a BCB of w.r.t.| .In fact, † | 0 = | 0 implies | 0 = | , while † | ˆ = (−1) † | ˆ 0 = (−1) † | 0 = (−1) | 0 = | = | .Example 3.8 anticipates the next result that states FCB and BCB are dual notions.This generalizes the known duality of ordinary and exact lumpability of Markov chains [17, 20].Fix a unitary map ∈ C × and a subspace ⊆ C . is an FCB wrt if and only if † Fix a quantum circuit that induces the unitary map ∈ C × and let | ⊆ C be the subspace spanned by | ∈ C .Assume that the quantum lumping of w.r.t.| has dimension and that 1 | , . . ., | can be computed and stored as decision diagrams in time and space O ( ) using circuit .
the vector | is known, computing ˆ | can be done in O ( 2 ) time and O ( 2 ) space using common matrix operations over C .Finally, for any vector | ˆ ∈ C , vector † | ˆ can be represented as a decision diagram of size O ( ) by performing scalar multiplications and additions over decision diagrams of size O ( ), yielding a time and space complexity O ( ).
The BCB † ∈ C × of w.r.t.| has dimension = 2 and a column space spanned by | and | .P .The claim follows by noting that the column space of an BCB w.r.t.| is spanned by | , | , 2 | , . . ., −1 |and so on.This, in turn, is known to have as basis[41, Section 6.2] Fix as in Definition 4.5, any > 0 and let † ∈ C × be a BCB of ( ) w.r.t.

L 4 . 7 .
Pick any ∈ C × and ∈ C ( − ) × so that the rows of and comprise an orthonormal basis of C , and define

Table 2 .
Reduction Ratio (RR) and Time (s) comparison between CLUE and DDSIM+CLUE for computing a lumping w.r.t.|0 .