1 Introduction

Current Status in Quantum Computing. Quantum computers have seen rapid improvements in recent years; in particular, the capability of their physical realizations has increased significantly [5]. There are applications where quantum computers promise an advantage over classical machines [8, 10, 19, 23]. Currently, the number of available qubits does not reach the order of magnitude required to put the majority of quantum algorithms into practical use. Moreover, current hardware suffers from noise that perturbs the computed results and makes them harder to use as circuits grow deeper [15]. Consequently, the number of gates in a quantum circuit should be reduced as much as possible beforehand. For this purpose, several tools have been developed, e. g., \(\mathrm {T|ket\rangle }\) [24], pyzx [14], Qiskit [20], staq [2], and QGo [27]. More details about existing optimization techniques can be found in Sect. 7.

Considering Initial Configuration. A quantum program is usually executed starting from a state where all qubits are \(\left| 0\right\rangle \). Surprisingly, the tools listed above take this information into account only slightly; they rely heavily on gate cancellation rules and pattern matching to simplify circuits. When it comes to the initial state, quantum circuit designers provide ad-hoc arguments for why particular controls or gates can be omitted based on the initial configuration, e. g., in the context of Shor’s algorithm [17, 26]. Jang et al. [13] propose to use the knowledge of the initial state to automatically remove superfluous controls of controlled gates. For their optimization, they need to execute the quantum circuit on a quantum machine many times. In our view, executing a circuit on a quantum computer several thousand times to obtain an optimized version of the same circuit is laborious. More in the spirit of our approach is the work of Liu et al. [16], who propose a Relaxed Peephole Optimization (RPO) approach that leverages information on single-qubit states that can be determined efficiently at compile time. However, their idea of treating qubits as independent systems has the drawback that information on single qubits is lost when a multi-qubit gate is applied, with few exceptions. Our approach avoids this issue by tracing entangled qubits’ states up to a given complexity.

Restricted Polynomial-Time Simulation. Since the full simulation of a quantum circuit in general takes exponential time in the number of qubits [9], simulating the entire quantum circuit is not a viable solution for efficient optimization. For this reason, we propose a restricted simulation of a quantum circuit in Sect. 3, which simulates the circuit only up to a given complexity. The complexity of an entanglement group (the group of qubits that are entangled) corresponds to the number of basis states in its quantum state. For example, the complexity of the three-qubit state \(\frac{1}{2}\left( \left| 000\right\rangle + \left| 001\right\rangle + \left| 010\right\rangle + \left| 111\right\rangle \right) \) is 4, since there are 4 basis states in the entanglement group, namely \(\left| 000\right\rangle \), \(\left| 001\right\rangle \), \(\left| 010\right\rangle \), and \(\left| 111\right\rangle \). The complexity up to which circuits are simulated is chosen beforehand and thus does not depend on the number of qubits. This restriction on complexity ensures that our approach runs in polynomial time, which we will prove in Sect. 5. Using the idea of restricted simulation, we propose to perform a quantum equivalent of constant propagation [22], called quantum constant propagation (QCP). We have implemented our idea in a publicly available tool (see Footnote 1).

Objective of QCP. With QCP, we aim to reduce the number of controls and eliminate superfluous controlled gates, following the same objective as Jang et al. [13]. Overall, QCP reduces the cost of executing quantum circuits on target platforms. Nevertheless, the circuit processed by QCP still produces the same desired outcome, as we prove in Sect. 4.

Fig. 1. Our proposed quantum constant propagation identifies the doubly controlled not-gate in the middle as superfluous and hence the circuit reduces to the empty circuit on the right.

Effects of QCP. Our proposed optimization technique is capable of identifying the doubly controlled not-gate in the middle of the circuit shown in Fig. 1 as superfluous and, hence, the circuit reduces to the empty circuit. Our evaluation in Sect. 6 on MQT Bench [21] demonstrates the impact of our novel optimization technique. Applying our optimization followed by Qiskit at its highest optimization level, we can remove up to 26k more gates compared to using Qiskit [20] alone, which corresponds to 0.5% of all gates evaluated. When we compare our approach with a similar existing optimization called Relaxed Peephole Optimization (RPO) [16] by running ours after RPO, we can remove 17.2% more gates on the evaluated circuits than with RPO alone. This shows that this existing optimization even benefits our optimization. We believe that, especially in the future, QCP will become more important when larger circuits are built based on building blocks controlled by one or multiple controls. We comment on this in more detail in Sect. 8. In the next section, Sect. 2, we give a brief introduction to the aspects of quantum computing that are important to this article.

2 Preliminaries

In the following, we give a brief introduction to quantum computing in order to make this article as self-contained as possible; for a more in-depth explanation, the interested reader is referred to the textbooks [12, 18].

Quantum Bits. Instead of bits, quantum computers operate on qubits (quantum bits). These can assume not only the two basis states \(\left| 0\right\rangle \) and \(\left| 1\right\rangle \) but also every state that can be expressed as their linear combination, \(\left| \Psi \right\rangle = \alpha \left| 0\right\rangle + \beta \left| 1\right\rangle \), where \(\alpha , \beta \in \mathbb {C}\), often called amplitudes, satisfy \(\left| \alpha \right| ^2+\left| \beta \right| ^2 = 1\). Hence, a qubit can be in a so-called superposition of both basis states. Upon measurement, it collapses into either \(\left| 0\right\rangle \) or \(\left| 1\right\rangle \) with probability \(\left| \alpha \right| ^2\) or \(\left| \beta \right| ^2\), respectively.

Multiple Quantum Bits. The state of a multi-qubit quantum system is denoted by a vector in \(\mathbb {C}^2 \otimes \cdots \otimes \mathbb {C}^2 = \mathbb {C}^{2^n}\). The basis vectors are written as \(\left| b_1\right\rangle \otimes \dots \otimes \left| b_n\right\rangle \) or \(\left| b_1\dots b_n\right\rangle \) for short, with \(b_i\in \left\{ 0,1\right\} \). Sometimes the abbreviated notation \(\left| x\right\rangle \) is used, where \(x = \sum _{i=1}^{n}b_i\cdot 2^{n-i}\) is the number with binary representation \(b_1\dots b_n\).

Gates. Operations on qubits are expressed as gates, each of which is denoted by a unitary matrix. A quantum computer only offers a discrete set of basis (parameterized) gates that can be applied to the qubits. Similar to conditioned branches in classical programs, gates can be controlled on the state of one or multiple other qubits. Then the controlled gate is applied to the target qubit if and only if all controlling qubits are in the \(\left| 1\right\rangle \) state.

Entanglement. Entanglement refers to the situation in which the measurement result of one qubit depends on the rest of the quantum system. For example, the circuit in Fig. 2 creates an entanglement among three qubits: The Hadamard-gate brings the first qubit into maximal superposition \((\left| 0\right\rangle +\left| 1\right\rangle )/\sqrt{2}\); then, two controlled not-gates are applied in sequence; when measuring the resulting quantum state \((\left| 000\right\rangle +\left| 111\right\rangle )/\sqrt{2}\), there are only two possible outcomes, “000” and “111”; no other combination of results, e. g., “010” or “110”, can be obtained, even if the three qubits are not measured simultaneously.

Fig. 2. This circuit creates a GHZ state of three qubits.

Curse of Dimensionality. Entanglement is the root cause of why it is so hard to simulate a quantum computer on a classical machine. As long as all the qubits are separable, i. e., not entangled, one needs to store two complex numbers for each qubit. As soon as some \(k\) qubits are entangled with each other, one needs to store potentially \(\mathcal {O}(2^k)\) complex numbers, which would immediately lead to exponential running time [1, 11].

Concrete Semantics. A quantum program consists of a sequence of quantum gates that are applied on an initial configuration of a quantum system, where usually all qubits are in the \(\left| 0\right\rangle \) state. The application of a gate transforms the concrete quantum state, represented as a state vector, according to the unitary matrix associated with the gate via matrix multiplication.

Interface of an Optimization. In our setting, a quantum circuit is represented as a list of gates. The optimization is expected to accept a list of gates and output an optimized version. For this purpose, an optimization provides a function that maps a list of gates to an optimized list of gates. In the next section, we explain how we implement QCP utilizing a restricted simulation.

3 Methodology

As mentioned in Sect. 1, up to now, many quantum circuit designers have argued in complex manners about the superfluousness of specific controls and gates. Our approach aims to automate those reasonings and apply them automatically as an optimization pass to a quantum circuit. For that, we simulate the circuit and identify controls and gates that can be dropped without changing the semantics of the circuit. To make our optimization efficient in terms of a polynomial time complexity, we propose a restricted simulation that simulates the circuit only partially but satisfies the required time-bound, with the help of our specially tailored data structures that efficiently represent quantum states.

3.1 Union-Table

Efficient Union-Find Customization. One central idea of our approach to achieve polynomial running time is to keep groups of qubits that are not entangled with each other separated as long as possible. For that, we need a data structure that stores a collection of sequences of elements, i. e., qubits, supports a union operation on these sequences, and can retrieve the position of an element in its sequence. Additionally, it needs to maintain extra information associated with each sequence. The required functionality suggests augmenting a union-find data structure. However, we have the advantage that the total number of elements stored in our structure is constant and known a priori, namely the number of qubits. For this reason, we use a table-like approach, hence the name union-table. To store \(n\) elements in a union-table, we use an array of length \(n\), where each field denotes one element. Each field contains a pointer to an entry that stores all indices also pointing to this entry, their number, and the value associated with this entry. This leads to the following type definitions for the union-table as an OCaml module.

[Listing: OCaml type definitions of the union-table module]

Use of Permutation. Note that the union-table has an extra attribute that stores a permutation serving as a view onto the underlying data structure. If one calls a function that accesses an entry at a given index, this index is first looked up in the permutation, which returns a potentially different index; the returned index is then used for the actual access to the union-table structure. More information about the functions that operate on a union-table, especially their running times, can be found in Table 1.
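The structure described above can be sketched as follows. This is a minimal Python illustration, not the OCaml module of the tool: the names `Entry`, `UnionTable`, `entry`, `same`, `position`, `swap`, and `union` are hypothetical, and `position` uses a linear search for brevity, so the running times of Table 1 are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class Entry(Generic[T]):
    """One group: its sorted (internal) member indices and the attached value."""
    members: List[int]
    value: T


class UnionTable(Generic[T]):
    """Hypothetical sketch of a union-table over a fixed number of elements."""

    def __init__(self, values: List[T]) -> None:
        # Initially every element forms its own group.
        self.table: List[Entry[T]] = [Entry([i], v) for i, v in enumerate(values)]
        # The permutation is a view: external index -> internal index.
        self.perm: List[int] = list(range(len(values)))

    def entry(self, i: int) -> Entry[T]:
        return self.table[self.perm[i]]

    def same(self, i: int, j: int) -> bool:
        return self.entry(i) is self.entry(j)

    def position(self, i: int) -> int:
        """Position of element i within the sorted sequence of its group."""
        return self.entry(i).members.index(self.perm[i])

    def swap(self, i: int, j: int) -> None:
        """Exchange two elements by updating only the permutation."""
        self.perm[i], self.perm[j] = self.perm[j], self.perm[i]

    def union(self, i: int, j: int,
              combine: Callable[[T, T, List[int]], T]) -> None:
        a, b = self.entry(i), self.entry(j)
        if a is b:
            return
        merged: List[int] = []
        order: List[int] = []  # 0: taken from group a, 1: taken from group b
        x = y = 0
        # One merge step as in merge sort; both member lists are sorted.
        while x < len(a.members) or y < len(b.members):
            take_a = y == len(b.members) or (
                x < len(a.members) and a.members[x] < b.members[y])
            if take_a:
                merged.append(a.members[x]); order.append(0); x += 1
            else:
                merged.append(b.members[y]); order.append(1); y += 1
        new_entry = Entry(merged, combine(a.value, b.value, order))
        for idx in merged:
            self.table[idx] = new_entry
```

In this sketch, all fields of a group point to one shared `Entry` object, which mirrors the pointer-based layout described above; a swap only touches the permutation and leaves the groups untouched.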

Table 1. The table lists all functions provided by the union-table module to access and modify the stored data. The type definition of each function is given in script size below it. All functions in the lower block are implemented to run in \(\mathcal {O}(1)\) time. Two of the remaining functions take \(\mathcal {O}(n)\) time, and the function that merges two groups needs \(\mathcal {O}(n + f(n))\), where \(n\) is the size of the union-table and \(f\) the running time of the supplied combine function.

3.2 Representation of a Quantum State

Bitwise Representation. The union-table is polymorphic in the type used to store the extra information for each set. For that, we introduce another module that is a hash table with bit combinations as keys and complex numbers as values, inspired by the data structure used in [7]. The bit combinations correspond to basis states, e. g., if three qubits are stored in a group in the union-table, the binary number \(11_{bin}\) corresponds to the quantum state \(\left| 011\right\rangle \), where the first of the three qubits is in the state \(\left| 0\right\rangle \) and the other two are in \(\left| 1\right\rangle \). The values denote the amplitudes for each state. For this to work correctly, the length of keys must not be limited by the number of bits used for an integer, e. g., 32 or 64 bits; instead, we use arbitrarily large integers as keys for the hash table. The indices of qubits in each group in the union-table are ordered; this way, one gets a mapping from the global index of a qubit to its position within the state. Figure 3 shows the representation of the state \(\left( \left| 10000\right\rangle -\left| 10101\right\rangle +\left| 11000\right\rangle -\left| 11101\right\rangle \right) /2\otimes (1+i)/\sqrt{2}\left| 0\right\rangle \) using a union-table and the bitwise representation of the quantum states.

Fig. 3. The representation of a quantum system with six qubits using the union-table data structure (left), where the quantum state of each entanglement group is a hash table with the basis states as keys and their complex amplitudes as values (right). The qubits in each state are indexed from left to right. The represented quantum state is \(\left( \left| 10000\right\rangle -\left| 10101\right\rangle +\left| 11000\right\rangle -\left| 11101\right\rangle \right) /2\otimes (1+i)/\sqrt{2}\left| 0\right\rangle \). Using the usual tensor-product notation, it is not obvious that qubits \(\left\{ 0,1,3\right\} \) and \(\left\{ 2,4\right\} \) are separable.
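The bitwise representation can be illustrated with a plain Python dictionary; this is a sketch under the assumption that the leftmost qubit of a group is the most significant bit of the key, as in the \(11_{bin}\) example above. The names `State`, `bit`, and `complexity` are hypothetical. Python integers are arbitrarily large, so the keys are not limited to 32 or 64 bits.

```python
from typing import Dict

# Basis state (bit pattern as an integer) -> amplitude.
State = Dict[int, complex]


def bit(key: int, pos: int, n: int) -> int:
    """Value of the qubit at position pos (0 = leftmost) in an n-qubit key."""
    return (key >> (n - 1 - pos)) & 1


def complexity(state: State) -> int:
    """Number of basis states with a non-zero amplitude."""
    return sum(1 for amplitude in state.values() if amplitude != 0)


# The GHZ state (|000> + |111>)/sqrt(2) needs two entries instead of eight.
ghz: State = {0b000: 2 ** -0.5, 0b111: 2 ** -0.5}

# The state (|000> + |001> + |010> + |111>)/2 from Sect. 1 has complexity 4.
example: State = {0b000: 0.5, 0b001: 0.5, 0b010: 0.5, 0b111: 0.5}
```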

Merging Entanglement Groups. The correct ordering of indices becomes particularly tricky when two entanglement groups are merged and, consequently, their quantum states must be merged as well. For this purpose, the union operation of the union-table requires a function to combine the two entries, i. e., in our case, the two quantum states. The union-table merges the sequences of two groups in one merge step, as known from the merge-sort algorithm. The combine function receives the order in which the elements from the two former groups were merged, in order to apply the same merging behavior to the quantum states. When the quantum state for \(n\) qubits contains \(k\) basis states, the combine function requires \(\mathcal {O}(k\cdot n)\) steps.
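A combine function of this kind could look as follows. The sketch assumes that states are dictionaries from integer keys to amplitudes with the leftmost qubit as the most significant bit, and that the merge order is passed as a list of 0/1 flags (0: next qubit comes from the first group, 1: from the second); `interleave` and `combine` are hypothetical names.

```python
from typing import Dict, List

State = Dict[int, complex]


def interleave(ka: int, na: int, kb: int, nb: int, order: List[int]) -> int:
    """Build the key of the merged basis state bit by bit, following order."""
    key, ia, ib = 0, 0, 0
    for source in order:
        if source == 0:
            b = (ka >> (na - 1 - ia)) & 1  # next bit of the first key
            ia += 1
        else:
            b = (kb >> (nb - 1 - ib)) & 1  # next bit of the second key
            ib += 1
        key = (key << 1) | b
    return key


def combine(a: State, na: int, b: State, nb: int, order: List[int]) -> State:
    """Tensor product of two separable states with interleaved qubit order."""
    return {
        interleave(ka, na, kb, nb, order): va * vb
        for ka, va in a.items()
        for kb, vb in b.items()
    }
```

For instance, merging a group holding qubits 0 and 2 in state \(\left| 10\right\rangle \) with a group holding qubit 1 in state \(\left| 1\right\rangle \) uses the order `[0, 1, 0]` and yields the single basis state \(\left| 110\right\rangle \).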

Application of Gates. Here, we only describe the application of a single-qubit gate to a state; the approach generalizes to gates operating on multiple qubits. Let \(U=\left( u_{rs}\right) _{r,s\in \left\{ 1,2\right\} }\) be the matrix representation of some gate that is to be applied to qubit \(i\) in a given quantum state. We iterate over the keys in the quantum state: For keys with the \(i\)-th bit equal to 0, we map the old key-value pair \((\left| \Psi \right\rangle , \alpha )\) to the two new pairs \((\left| \Psi \right\rangle , \alpha \cdot u_{11})\) and \((\left| \Psi '\right\rangle , \alpha \cdot u_{21})\), where \(\left| \Psi '\right\rangle \) emerges from \(\left| \Psi \right\rangle \) by flipping the \(i\)-th bit; for keys with the \(i\)-th bit equal to 1, we map the old key-value pair \((\left| \Phi \right\rangle , \beta )\) to the two new pairs \((\left| \Phi '\right\rangle , \beta \cdot u_{12})\) and \((\left| \Phi \right\rangle , \beta \cdot u_{22})\), where, again, \(\left| \Phi '\right\rangle \) emerges from \(\left| \Phi \right\rangle \) by flipping the \(i\)-th bit. Generated pairs with the same key (basis state) are merged by adding their values (amplitudes). For a matrix of dimension \(2^d\) and \(k\) basis states in the quantum state of \(n\) qubits, this procedure takes \(\mathcal {O}(2^d\cdot k \cdot n)\) steps, where \(d\) is the number of affected qubits.
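The single-qubit case can be sketched in a few lines of Python. The function name `apply_1q`, the dictionary-based state, and the numerical threshold for dropping vanishing amplitudes are assumptions of this sketch; the matrix is indexed from zero here, so `u[0][0]` plays the role of \(u_{11}\).

```python
from typing import Dict, List

State = Dict[int, complex]


def apply_1q(state: State, n: int, pos: int, u: List[List[complex]]) -> State:
    """Apply the 2x2 matrix u to the qubit at position pos (0 = leftmost)."""
    mask = 1 << (n - 1 - pos)
    out: State = {}
    for key, amplitude in state.items():
        col = 1 if key & mask else 0  # current value of the affected bit
        for row in (0, 1):            # value of the bit in the new key
            coeff = u[row][col]
            if coeff == 0:
                continue
            new_key = (key & ~mask) | (mask if row else 0)
            # Pairs with the same key are merged by adding their amplitudes.
            out[new_key] = out.get(new_key, 0) + amplitude * coeff
    # Drop basis states whose amplitudes cancelled out (assumed tolerance).
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


s = 2 ** -0.5
H = [[s, s], [s, -s]]
X = [[0, 1], [1, 0]]
```

Applying `H` twice to \(\left| 0\right\rangle \) returns a single basis state again, which shows how merging pairs with equal keys keeps the table small when amplitudes cancel.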

3.3 Restricted Simulation

Restrict Complexity. The state in Fig. 2 contains only two basis states as opposed to \(2^3=8\) possible ones for which a complex number needs to be stored each. Exploiting this fact, the key features of the restricted simulation are

  (i) to keep track of the quantum state as separable entanglement groups of qubits, where qubits are included in the same entanglement group if and only if they are entangled, and

  (ii) to limit the number of basis states representing the quantum state of an entanglement group by a chosen constant.

Note that the number of basis states allowed in the quantum state of one entanglement group corresponds to the number of amplitudes required to be stored; all other amplitudes are assumed to be zero.

Reaching Maximum Complexity. The careful reader may ask how to proceed when the maximum number of allowed basis states is reached. We set the state of the entanglement group whose limit is exceeded to \(\top \), meaning that we no longer track any information about this group of qubits. By doing so, we can continue simulating the remaining entanglement groups until they may also end up in \(\top \). For this, we utilize a flat lattice that consists of either an element representing a concrete quantum state of an entanglement group, \(\top \), or \(\bot \) (not used), satisfying the partial order in Fig. 4. The following definitions establish the relation between the concrete quantum states and their abstract description.

Fig. 4. Lattice for the abstract description of quantum states.

Definition 1 (Abstract state)

The abstract state \(s\) is an abstract description of a concrete quantum state if and only if \(s = \top {}\) or \(s = \left| \psi \right\rangle \) where \(\left| \psi \right\rangle \) is a concrete quantum state consisting of at most \(n_{max}\) many non-zero amplitudes.

Definition 2 (Abstract description relation)

Let \(\Delta \) denote the description relation between quantum states and their abstract description. Furthermore, let \(\left| \psi \right\rangle \) be a quantum state and \(s\) be an abstract description. The quantum state \(\left| \psi \right\rangle \) is described by \(s\), formally \(\left| \psi \right\rangle \, \Delta \ s\), if and only if \(s = \top {}\) or \(s = \left| \psi \right\rangle \).

Consequently, the entry in the union-table is not the quantum state itself but an element of the flat lattice, an abstract description of the quantum state. Definition 3 defines a concretization operator for abstract states.

Definition 3 (Concretization operator)

Let \(\gamma \) be the concretization operator, and \(s\) be an abstract description. Then \(\gamma \ s = \{ \left| \psi \right\rangle \mid \left| \psi \right\rangle \, \Delta \ s \} \).

Next, we define the abstract effect for gates acting on quantum states.

Definition 4 (Abstract gate)

Let \([\![ U ]\!]^{\sharp }\) denote the abstract effect of the quantum gate \(U\). For an abstract description \(s\), the abstract effect of \(U\) is:

$$\begin{aligned}{}[\![ U ]\!]^{\sharp } s = {\left\{ \begin{array}{ll} U\left| \psi \right\rangle & \text {if } s = \left| \psi \right\rangle \\ \top {} & \text {if } s = \top {} \end{array}\right. } \end{aligned}$$

Theorem 1 justifies the above-defined abstract denotation of quantum states. It follows directly from Definition 1, Definition 2, and Definition 4.

Theorem 1 (Correctness of abstract denotation)

For any quantum state \(\left| \psi \right\rangle \) and abstract description \(s\) satisfying \(\left| \psi \right\rangle \ \Delta \ s\), \(U\left| \psi \right\rangle \ \Delta \ [\![ U ]\!]^{\sharp }s\) holds.
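The abstract domain can be sketched as follows. The sentinel `TOP` and the functions `describes` and `abstract_apply` are hypothetical names; in addition to Definition 4, the sketch assumes that a result with more than \(n_{max}\) basis states is replaced by \(\top \), as described under "Reaching Maximum Complexity". Since \(\top \) describes every quantum state, this widening preserves the statement of Theorem 1.

```python
from typing import Callable, Dict, Union

State = Dict[int, complex]


class _Top:
    """Sentinel for the abstract state that carries no information."""
    def __repr__(self) -> str:
        return "TOP"


TOP = _Top()
Abstract = Union[State, _Top]


def describes(psi: State, s: Abstract) -> bool:
    """Description relation: psi is described by s iff s is TOP or s equals psi."""
    return s is TOP or s == psi


def abstract_apply(gate: Callable[[State], State], s: Abstract,
                   n_max: int) -> Abstract:
    """Abstract effect of a gate, widened to TOP above n_max basis states."""
    if s is TOP:
        return TOP
    result = gate(s)
    return result if len(result) <= n_max else TOP
```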

Operating with Separated States. In the beginning, every qubit constitutes its own entanglement group. Single-qubit gates can be applied without further ado by modifying the corresponding amplitudes accordingly; the underlying procedure is matrix multiplication, which can be implemented in constant time given the constant size of the matrices. The case where multi-qubit gates are applied is split into two subcases. When the multi-qubit gate is applied only to qubits within one entanglement group, the same argument as for applying single-qubit gates still holds. Applying a multi-qubit gate across several entanglement groups will most likely entangle those; hence, we need to merge the affected groups into one.

Applying Multi-qubit Gates. In the case of an uncontrolled multi-qubit gate, such as an echoed cross-resonance (ecr) gate, we first merge all affected entanglement groups into one. If one of those groups is already in the \(\top \) state, we must set the entire newly formed entanglement group to \(\top \). Otherwise, we can apply the merging strategy of the involved quantum states as described in Sect. 3.2. Afterward, matrix multiplication is performed to reflect the expected transformation of the state. A special case is the swap gate: we leave the entanglement groups as is and keep track of the effect of the swap gate in the permutation embedded in the quantum state structure. Before we apply a controlled gate, we perform control reduction—the central part of the optimization—which we outline in the next section, to remove superfluous controls.

3.4 Control Reduction

Classically Determined Qubits. The central task of quantum constant propagation is to remove superfluous controls from controlled gates. First, we identify and remove all classically determined control qubits, i. e., those that are either in \(\left| 0\right\rangle \) or in \(\left| 1\right\rangle \). If we find a control qubit that is always in \(\left| 0\right\rangle \), the controlled gate can be removed entirely. If we find control qubits that are always in \(\left| 1\right\rangle \), those controls can be removed since they are always satisfied.

Satisfiable Combination. By filtering out classically determined qubits as described above, a set of qubits may remain in some superposition. Even then, for the target operation to be applied, there must be a basis-state where each of the controls is satisfied, i. e., each is in \(\left| 1\right\rangle \). If no such combination exists, the gate can be removed entirely.

Implied Qubits. When a combination with all controls in the \(\left| 1\right\rangle \) state was found, there can still be some superfluous controls among the remaining qubits. Consider the situation in Fig. 5. Here, the upper two qubits are both in \(\left| 1\right\rangle \) state when the third qubit is as well; hence, the third qubit implies the first and second one. The semantics of the controlled gate remains unchanged when we remove the two upper controls. To generalize this idea, we consider every group of entangled qubits separately since there can not be any implications among different entanglement groups. Within each entanglement group, we look for implications, i. e., whether one qubit being in \(\left| 1\right\rangle \) state implies that other qubits are also in the \(\left| 1\right\rangle \) state. Those implied qubits can be removed from the list of controls.

Fig. 5. For the rightmost gate, the third qubit implies the first and second qubit, hence the first and second control qubits can be removed from it.
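The three steps of control reduction (classically determined qubits, satisfiable combination, implied qubits) can be sketched for the simplified case in which all controls lie in one entanglement group with a known state. The function `reduce_controls` is a hypothetical name, and groups in \(\top \) as well as controls spread over several groups are not handled in this sketch.

```python
from typing import Dict, List, Optional

State = Dict[int, complex]


def reduce_controls(state: State, n: int,
                    controls: List[int]) -> Optional[List[int]]:
    """Return the controls that remain, or None if the gate can be removed.

    controls holds positions (0 = leftmost) within an n-qubit group.
    """
    def bit(key: int, pos: int) -> int:
        return (key >> (n - 1 - pos)) & 1

    keys = [k for k, amplitude in state.items() if amplitude != 0]

    # Step 1: classically determined controls.
    remaining: List[int] = []
    for c in controls:
        values = {bit(k, c) for k in keys}
        if values == {0}:
            return None          # control never satisfied: drop the gate
        if values != {1}:
            remaining.append(c)  # control in superposition: keep for now

    # Step 2: some basis state must satisfy all remaining controls.
    if not any(all(bit(k, c) for c in remaining) for k in keys):
        return None

    # Step 3: drop controls implied by another remaining control d, i.e.
    # every basis state with d = 1 also has c = 1.
    for c in list(remaining):
        others = [d for d in remaining if d != c]
        if any(all(bit(k, c) for k in keys if bit(k, d)) for d in others):
            remaining.remove(c)
    return remaining
```

For a state with basis states \(\left| 000\right\rangle \), \(\left| 110\right\rangle \), and \(\left| 111\right\rangle \), the third qubit implies the other two, so only the control at position 2 remains. For two equivalent qubits, the sketch keeps whichever one is examined last, which corresponds to an arbitrary choice.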

Further Optimization Potential. In some cases, there might be an equivalence relation between two qubits; here, either one or the other qubit can be removed. This choice is made arbitrarily right now; by considering the circuit to the left or right of the gate, more optimization potential could be exploited. Moreover, the information of more than one qubit might be needed to imply another qubit. Here, we limit ourselves to the described approach because of two reasons: First, in currently common circuits [21] multi-controlled gates with more than two controls rarely occur, and for two controls, our approach finds all possible implications; second, to find the minimal set of controlling qubits is a computationally expensive task that is to the best of our knowledge exponential in the number of controls.

Handle the Abstract State \(\top \). If some of the entanglement groups covered by the controls are in \(\top \), the optimization techniques can be applied nevertheless. Within groups that are in \(\top \) no classically determined qubits or implications between qubits can be identified; however, this is still possible in all other groups. To check whether a satisfiable combination exists across all entanglement groups, we assume one within each group that is \(\top \). This is a safe choice: It does not lead to any unsound optimizations since there could be a satisfiable combination in such groups.

Application of Controlled Gates. Before applying a controlled gate, we assume that all superfluous controls have already been removed according to the approach described above. Like the application of an uncontrolled multi-qubit gate explained in Sect. 3.3, all involved entanglement groups must be merged. Then, all basis states that satisfy all controls remaining after the control reduction are filtered. To those, the gate is applied on the target qubits via matrix multiplication, whereas the amplitudes of all other basis states remain unchanged. However, if one of the controls belongs to an entanglement group in \(\top \), the resulting state cannot be determined, and we set the merged entanglement group to \(\top \).
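For a single target qubit inside an already merged group, the filtering step can be sketched as below. The function `apply_controlled` is a hypothetical name; the dictionary-based state, the zero-based matrix indexing, and the cancellation threshold are assumptions of this sketch.

```python
from typing import Dict, List

State = Dict[int, complex]


def apply_controlled(state: State, n: int, controls: List[int],
                     target: int, u: List[List[complex]]) -> State:
    """Apply the 2x2 matrix u to target where all controls are 1."""
    def bit(key: int, pos: int) -> int:
        return (key >> (n - 1 - pos)) & 1

    mask = 1 << (n - 1 - target)
    out: State = {}
    for key, amplitude in state.items():
        if not all(bit(key, c) for c in controls):
            # Controls not satisfied: the amplitude stays unchanged.
            out[key] = out.get(key, 0) + amplitude
            continue
        col = 1 if key & mask else 0
        for row in (0, 1):
            if u[row][col] != 0:
                new_key = (key & ~mask) | (mask if row else 0)
                out[new_key] = out.get(new_key, 0) + amplitude * u[row][col]
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


X = [[0, 1], [1, 0]]
```

Starting from \((\left| 000\right\rangle +\left| 100\right\rangle )/\sqrt{2}\), two controlled not-gates with the first qubit as control produce the GHZ state with only two stored basis states.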

Fig. 6. Quantum constant propagation removes the control from the gate (3) and the gates (6), (10), and (12) entirely. For an explanation, see Example 1.

Example 1

We demonstrate the effect of our optimization on an example taken from [26]. Vandersypen et al. perform Shor’s algorithm on 7 qubits to factor the number 15. In this process, they design the circuit from Fig. 6. From the gate labeled (3), the optimization removes the control because the state of the controlling qubit is known to be \(\left| 1\right\rangle \) at this point. Gate (6) is removed entirely because the controlling qubit is known to be in the \(\left| 0\right\rangle \) state. Gates (10) and (12) are removed as well, since their controlling qubit is in the \(\left| 0\right\rangle \) state. These optimizations seem to be trivial. However, the difficult part when automating this process is to scale it to larger and larger circuits without sacrificing efficient running time. Here, we provide the right tool for that with our proposed restricted simulation (see Footnote 2).

4 Correctness of Control Reduction

In this section, we complement the intuitive justification for the correctness of the optimization with a rigorous proof. We first establish the required definitions to characterize the concrete semantics of controlled operations. Similar reasoning about the correctness is contained in [13]; we see our style as more comprehensible since it argues only over the superfluousness of one qubit at a time but is still sufficient to show the correctness of the optimization.

Definition 5 (Controlled gate)

Let \(U\in \mathbb {C}^{2^n\times 2^n}\) for \(n\in \mathbb {N}\) be a unitary matrix of a gate. Let \(C^m(U)\) denote the matrix representing the \(m\)-controlled version of this gate (the application of gate \(U\) is controlled on \(m\) qubits).

Example 2

Consider the X-gate. The corresponding matrix is given by

$$X=\left( {\begin{matrix} 0&{}1\\ 1&{}0 \end{matrix}}\right) .$$

The doubly-controlled version \(C^2(X)\) (the Toffoli-gate), amounts to

$$\left( {\begin{matrix} 1&{}&{}&{}&{}&{}&{}&{}\\ {} &{}1&{}&{}&{}&{}&{}&{}\\ {} &{}&{}1&{}&{}&{}&{}&{}\\ {} &{}&{}&{}1&{}&{}&{}&{}\\ {} &{}&{}&{}&{}1&{}&{}&{}\\ {} &{}&{}&{}&{}&{}1&{}&{}\\ {} &{}&{}&{}&{}&{}&{}0&{}1\\ {} &{}&{}&{}&{}&{}&{}1&{}0 \end{matrix}}\right) \in \mathbb {C}^{8\times 8}.$$

 

Definition 6 (Superfluousness of controls)

Let \(\left| \Psi \right\rangle \in \mathbb {C}^{2^{m+n}}\) be a state and \(U\in \mathbb {C}^{2^n\times 2^n}\) a unitary, and let \(\mathbb {I}\in \mathbb {C}^{2\times 2}\) denote the identity matrix. The first one of \(m\) controls is superfluous with respect to \(\left| \Psi \right\rangle \) if

$$\begin{aligned} C^m(U)\left| \Psi \right\rangle = \left( \mathbb {I}\otimes C^{m-1}(U)\right) \left| \Psi \right\rangle . \end{aligned}\tag{1}$$

For the following, we assume without loss of generality that the first \(m\) qubits are the controlling ones for a gate applied to the following \(n\) qubits.

Theorem 2 (Superfluousness of controls)

With the notation from Definition 6 and \(\left| \Psi \right\rangle = \sum _{i=0}^{2^m-1}\sum _{j=0}^{2^n-1}\lambda _{i,j}\left| i\right\rangle \otimes \left| j\right\rangle \), the condition from Definition 6 is equivalent to

$$\begin{aligned} \left. \begin{pmatrix}\lambda _{i,0}\\ \vdots \\ \lambda _{i,2^n-1}\end{pmatrix}\right| _{i=2^{m-1}-1} \end{aligned}$$

being an eigenvector of \(U\) for the eigenvalue \(1\) or the \(0\) vector (see Footnote 3).

Proof

When we write out the left-hand side of Eq. (1) in Definition 6 using the definition of \(\left| \Psi \right\rangle \), we get the following equation (see Footnote 4):

$$\begin{aligned} C^m(U)\left| \Psi \right\rangle = \left. \sum _{j=0}^{2^n-1} \left( \sum _{k=0}^{2^n-1}u_{j,k} \lambda _{i,k} \left| i\right\rangle \left| j\right\rangle \right) \right| _{i=2^m-1} + \sum _{i=0}^{2^m-2} \sum _{j=0}^{2^n-1} \lambda _{i,j} \left| i\right\rangle \left| j\right\rangle \end{aligned}\tag{2}$$

We do the same with the right-hand side of Eq. (1), which results in:

$$\begin{aligned} \begin{aligned} \left( \mathbb {I}\otimes C^{m-1}(U)\right) \left| \Psi \right\rangle =& \sum _{i\in \left\{ 2^{m-1}-1,2^m-1\right\} }\sum _{j=0}^{2^n-1} \left( \sum _{k=0}^{2^n-1}u_{j,k} \lambda _{i,k} \left| i\right\rangle \left| j\right\rangle \right) \\ {} & + \sum _{\begin{array}{c} i=0\\ i\notin \left\{ 2^{m-1}-1,2^m-1\right\} \end{array}}^{2^m-1} \sum _{j=0}^{2^n-1} \lambda _{i,j} \left| i\right\rangle \left| j\right\rangle \end{aligned} \end{aligned}\tag{3}$$

For Eq. (1) in Definition 6 to be satisfied, the right-hand sides of Eqs. (2) and (3) must be equal. They differ only in the terms with \(i=2^{m-1}-1\), which gives us:

$$\begin{aligned} \left. \sum _{j=0}^{2^n-1}\left( \sum _{k=0}^{2^n-1}u_{j,k}\lambda _{i,k}\right) \left| i\right\rangle \left| j\right\rangle \overset{!}{=}\ \sum _{j=0}^{2^n-1} \lambda _{i,j} \left| i\right\rangle \left| j\right\rangle \right| _{i=2^{m-1}-1} \end{aligned}$$

By performing a summand-wise comparison, this reduces to:

$$\begin{aligned} \left. \sum _{k=0}^{2^n-1}u_{j,k}\lambda _{i,k}=\lambda _{i,j}\right| _{i=2^{m-1}-1} \quad \forall j\in \left\{ 0, \dots , 2^n-1\right\} \end{aligned}$$

This is equivalent to \(\left( \lambda _{i,0}, \dots , \lambda _{i,2^n-1}\right) ^\top \) with \(i=2^{m-1}-1\) being an eigenvector to the eigenvalue 1 or being the zero-vector, concluding the proof.    \(\square \)
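As an illustration of Theorem 2 (a worked example that is not part of the proof), consider a single control, \(m=1\), and \(U=X\) acting on one target qubit, \(n=1\). The relevant index is \(i=2^{m-1}-1=0\), so the condition concerns the vector \(\left( \lambda _{0,0}, \lambda _{0,1}\right) ^\top \) of amplitudes whose control bit is 0. Three representative states behave as follows:

$$\begin{aligned} \left| \Psi \right\rangle = \left| 1\right\rangle \otimes \left| \varphi \right\rangle :&\quad \begin{pmatrix}\lambda _{0,0}\\ \lambda _{0,1}\end{pmatrix} = \begin{pmatrix}0\\ 0\end{pmatrix} \quad \Rightarrow \quad C^1(X)\left| \Psi \right\rangle = \left| 1\right\rangle \otimes X\left| \varphi \right\rangle = \left( \mathbb {I}\otimes X\right) \left| \Psi \right\rangle \\ \left| \Psi \right\rangle = \left| 0\right\rangle \otimes \tfrac{1}{\sqrt{2}}\left( \left| 0\right\rangle +\left| 1\right\rangle \right) :&\quad X\begin{pmatrix}\tfrac{1}{\sqrt{2}}\\ \tfrac{1}{\sqrt{2}}\end{pmatrix} = \begin{pmatrix}\tfrac{1}{\sqrt{2}}\\ \tfrac{1}{\sqrt{2}}\end{pmatrix} \quad \Rightarrow \quad C^1(X)\left| \Psi \right\rangle = \left| \Psi \right\rangle = \left( \mathbb {I}\otimes X\right) \left| \Psi \right\rangle \\ \left| \Psi \right\rangle = \left| 0\right\rangle \otimes \left| 0\right\rangle :&\quad X\begin{pmatrix}1\\ 0\end{pmatrix} = \begin{pmatrix}0\\ 1\end{pmatrix} \ne \begin{pmatrix}1\\ 0\end{pmatrix} \quad \Rightarrow \quad C^1(X)\left| \Psi \right\rangle = \left| 00\right\rangle \ne \left| 01\right\rangle = \left( \mathbb {I}\otimes X\right) \left| \Psi \right\rangle \end{aligned}$$

In the first case the vector is the zero vector, in the second it is an eigenvector of \(X\) for the eigenvalue 1, and in both cases the control is superfluous. In the third case the control is not superfluous in the sense of Definition 6; here the control is always in \(\left| 0\right\rangle \), so the whole gate, rather than only its control, can be removed.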

From Theorem 2 we can derive a corollary that brings this result in a closer relationship with our optimization using the following definition.

Definition 7 (Implied control)

Using the notation from Theorem 2, we say the first control is implied by the other controls if \(\lambda _{i,j} = 0\) for \(i = 2^{m-1}-1\) and all \(j \in \left\{ 0, \dots , 2^n-1\right\} \).

If one interprets the basis states as equal-length bitstrings representing variable assignments of \(m+n\) truth variables, then this condition intuitively states that the implication \(x_1\wedge \dots \wedge x_{m-1}\implies x_0\) holds.

Corollary 1 (Sufficient condition for a control to be superfluous)

If the first control is implied by the other controls, it is superfluous.

The following main theorem shows that each of the three possible modifications, as described in Sect. 3.4, does not change the semantics of the circuit.

Theorem 3

Quantum constant propagation does not change the semantics relative to the initial configuration with all qubits in the \(\left| 0\right\rangle \) state.

Proof

Without loss of generality, we can assume that the optimization pass detects the first one of \(m\) controlling qubits as superfluous. Depending on the state of the first qubit, the optimization continues in three different ways.

  1. (i)

    The first qubit is in \(\left| 0\right\rangle \): Here, in contrast to the other two cases, not only the controlling qubit is removed from the controlled gate; rather, the entire gate is removed. Thus, we need to show

    $$\begin{aligned} C^m(U)\left| \Psi \right\rangle = \left| \Psi \right\rangle . \end{aligned}$$

    Since all \(\lambda _{i,j} = 0\) where \(i=2^m-1\) the sum on the right of Eq. (2) reduces to 0 and the claim follows.

  2. (ii)

    The first qubit is in \(\left| 1\right\rangle \): Then the amplitudes of all basis states with the first qubit in \(\left| 0\right\rangle \) are equal to 0, i. e., \(\lambda _{i,j} = 0\) where \(i=2^{m-1}-1\) and for all \(j\in \left\{ 0,\dots ,2^n-1\right\} \). Consequently, the condition in Theorem 2 is satisfied, and the first control qubit can safely be removed.

  3. (iii)

    Otherwise: This case can only occur if the optimization found another qubit \(j\) among the controlling ones such that the first qubit is in \(\left| 1\right\rangle \) whenever the \(j\)-th qubit is in \(\left| 1\right\rangle \). Hence, the sufficient condition from Corollary 1 is satisfied, and here, too, the first control qubit can be safely removed.

Altogether, this proves the correctness of the QCP optimization pass.    \(\square \)
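The three cases of the proof can be illustrated on small sparse states. The sketch below is only an illustration under simplifying assumptions: it restricts the target gate to X (so that the gate merely permutes basis states), stores a state as a dictionary from bit tuples to amplitudes, and uses the hypothetical helper name `apply_mcx`. Qubit 0 plays the role of the first control, qubit 1 is the second control, and qubit 2 is the target.

```python
import math


def apply_mcx(state, controls, target):
    """Apply a multi-controlled X gate to a sparse state.

    state: dict mapping tuples of bits to amplitudes.
    """
    result = {}
    for bits, amp in state.items():
        if all(bits[c] == 1 for c in controls):
            flipped = list(bits)
            flipped[target] ^= 1        # X flips the target bit
            bits = tuple(flipped)
        result[bits] = result.get(bits, 0) + amp
    return result


s = 1 / math.sqrt(2)

# Case (i): first qubit in |0>, the whole gate acts as the identity.
state_i = {(0, 0, 0): s, (0, 1, 0): s}
case_i_ok = apply_mcx(state_i, [0, 1], 2) == state_i

# Case (ii): first qubit in |1>, the first control can be dropped.
state_ii = {(1, 0, 0): s, (1, 1, 0): s}
case_ii_ok = apply_mcx(state_ii, [0, 1], 2) == apply_mcx(state_ii, [1], 2)

# Case (iii): qubit 1 in |1> implies qubit 0 in |1>, so control 0 is implied.
state_iii = {(0, 0, 0): s, (1, 1, 0): s}
case_iii_ok = apply_mcx(state_iii, [0, 1], 2) == apply_mcx(state_iii, [1], 2)
```

All three comparisons hold exactly here because X only relabels basis states and never changes an amplitude.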

We continue with an analysis to show that QCP runs in polynomial time.

5 Running Time Analysis

Variable Definition. For the rest of this section, let \(m\) be the number of gates in the input circuit and \(n\) the number of qubits. Furthermore, let \(k\) be the maximum number of controls attached to any gate in the circuit. Each entanglement group is limited in the number of basis states by the custom constant \(n_{max}\). The achieved asymptotic running time of our QCP is then established by the following lemmas and the main theorem of this section.

Lemma 1

Control reduction runs in \(\mathcal {O}(k^2 \cdot n)\) time.

Proof

As described in Sect. 3.4, the control reduction procedure consists of three steps. First, scanning for classically determined qubits takes \(\mathcal {O}(n \cdot k)\) time since the state of all controlling qubits needs to be determined and the entanglement group contains at most \(n_{max}\) basis states, which is constant. The factor of \(n\) comes from retrieving the position and later the state of a specific qubit within the entanglement group which comprises \(\mathcal {O}(n)\) qubits, see also Table 1.

Second, the check for a combination where every controlling qubit is in \(\left| 1\right\rangle \) requires splitting the controlling qubits according to their entanglement groups and then checking within each such group whether a combination of all controlling qubits in \(\left| 1\right\rangle \) exists. There can be \(\mathcal {O}(k)\) groups, each containing \(\mathcal {O}(k)\) qubits, in the worst case. For each such group, a basis state among the at most \(n_{max}\) basis states in which all contained controlling qubits are in \(\left| 1\right\rangle \) needs to be found. This requires retrieving the position and then the state of the individual controlling qubits, which takes \(\mathcal {O}(n)\) time for each of those. Together, this step runs in \(\mathcal {O}(k^2 \cdot n)\).

For the third step of finding implications between qubits, we need to consider every pair of qubits in each group already calculated for the previous step. For each pair, we need to retrieve the position and state of the corresponding qubits again, which takes \(\mathcal {O}(n)\) time. Since there are \(\mathcal {O}(k^2)\) pairs to consider, this gives us a running time of \(\mathcal {O}(k^2 \cdot n)\) for this step.

Combined, the running time of the entire control reduction is \(\mathcal {O}(k^2 \cdot n)\).    \(\square \)
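The three steps of the control reduction can be sketched as follows. This is a simplified illustration rather than the procedure analyzed above: it assumes that all qubits live in a single entanglement group given as a set of bit tuples with non-zero amplitude, it makes no attempt to match the stated running time, and the name `reduce_controls` and its return format are hypothetical.

```python
def reduce_controls(basis_states, controls):
    """Return ("remove_gate", []) or ("keep", remaining_controls).

    basis_states: set of bit tuples with non-zero amplitude.
    controls:     list of qubit positions that control the gate.
    """
    remaining = []
    # Step 1: classically determined controls.
    for c in controls:
        values = {bits[c] for bits in basis_states}
        if values == {0}:
            return ("remove_gate", [])      # control can never be satisfied
        if values != {1}:
            remaining.append(c)             # controls always in |1> are dropped

    # Step 2: is there a basis state where all remaining controls are 1?
    if not any(all(bits[c] == 1 for c in remaining) for bits in basis_states):
        return ("remove_gate", [])

    # Step 3: drop a control b if another remaining control a implies it.
    def implies(a, b):
        return all(bits[b] == 1 for bits in basis_states if bits[a] == 1)

    for b in list(remaining):
        if any(a != b and implies(a, b) for a in remaining):
            remaining.remove(b)
    return ("keep", remaining)
```

For example, on the basis states `{(0, 0, 0), (1, 1, 0)}` with controls `[0, 1]`, the two controls imply each other, so the sketch drops control 0 and keeps control 1. Removing a control immediately in step 3 ensures that two equivalent controls are never both removed.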

To perform the control reduction, the current quantum state needs to be tracked. The running time required for that is given by the next lemma.

Lemma 2

The application of one gate requires \(\mathcal {O}(n)\) time.

Proof

For multi-qubit gates (uncontrolled and controlled), first the affected entanglement groups need to be merged. With the results mentioned in Sect. 3, this requires \(\mathcal {O}(n)\) time considering that \(n_{max}\) is constant.

For uncontrolled gates, there are only single-qubit and two-qubit gates available in current quantum programming tools, hence, we consider the size of the unitary that defines the transformation of those as constant. We first check whether the number of basis states would exceed \(n_{max}\) after the application of the gate; this can be done in \(\mathcal {O}(n)\) by iterating over the basis states in the entanglement group and counting the states relevant for the matrix multiplication.

For the application of the associated unitary, one must iterate over the states in the entanglement group and add for each the corresponding states with their modified amplitudes as described in Sect. 3.2. Checking the state of a specific qubit in a basis state within the entanglement group comprising \(\mathcal {O}(n)\) qubits requires \(\mathcal {O}(n)\) time. Since the number of states in an entanglement group is bounded by \(n_{max}\) and the unitary is constant in size, the application as a whole requires \(\mathcal {O}(n)\) time.

For the controlled case, the procedure is slightly more complicated, since the unitary transformation shall only be applied to basis states where all controlling qubits are satisfied. This can be done by filtering out the right states and then applying the same procedure as above. Hence, since there are at most \(n_{max}\) states, this does not change the overall running time. Consequently, the whole application of one gate can be performed in \(\mathcal {O}(n)\) time.    \(\square \)
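The gate application on a bounded sparse state can be sketched as below. This is a minimal illustration under several assumptions: a single entanglement group stored as a dictionary from bit tuples to amplitudes, a single-qubit target unitary, and a hypothetical marker `TOP` for a state that became too complex. Unlike the procedure described in the proof, the sketch checks the bound \(n_{max}\) after computing the new state instead of before.

```python
import math

TOP = None  # hypothetical marker: no information about the state is kept


def apply_gate(state, u, target, controls=(), n_max=1024, eps=1e-8):
    """Apply a (controlled) single-qubit unitary u to a sparse state.

    state: dict mapping bit tuples to amplitudes, or TOP.
    u:     2x2 matrix as nested lists, u[out][in].
    """
    if state is TOP:
        return TOP
    new_state = {}
    for bits, amp in state.items():
        # Basis states that do not satisfy all controls are left untouched.
        if not all(bits[c] == 1 for c in controls):
            new_state[bits] = new_state.get(bits, 0) + amp
            continue
        for out in (0, 1):
            coeff = u[out][bits[target]]
            if abs(coeff) < eps:
                continue
            new_bits = bits[:target] + (out,) + bits[target + 1:]
            new_state[new_bits] = new_state.get(new_bits, 0) + coeff * amp
    # Drop amplitudes that cancelled out, then enforce the bound n_max.
    new_state = {b: a for b, a in new_state.items() if abs(a) >= eps}
    return TOP if len(new_state) > n_max else new_state


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
X = [[0, 1], [1, 0]]
```

With these definitions, applying `H` twice to `{(0,): 1.0}` returns a state whose only key is `(0,)`, while a single application with `n_max=1` yields `TOP` because the intermediate state needs two basis states.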

Theorem 4

QCP runs in \(\mathcal {O}(m \cdot k^2 \cdot n)\).

Proof

Lemmas 1 and 2 together show that processing one gate takes \(\mathcal {O}(k^2 \cdot n + n) = \mathcal {O}(k^2 \cdot n)\) time. With \(m\) the number of gates present in the input circuit, this gives us the claimed result.    \(\square \)

In particular, this shows that the entire QCP runs in polynomial time which we consider important for an efficient optimization. This is due to the restriction of the number of states in each entanglement group since this number could otherwise grow exponentially in the number of qubits, i. e., would be in \(\mathcal {O}(2^n)\).

6 Evaluation

The QCP we propose only applies control reduction and gate cancellation because of unsatisfiable controls. This may facilitate the elimination of duplicate gates or rotation folding afterward, optimizations which we leave to existing tools capable of this task. In more detail, with the evaluation presented here, we pursue three objectives:

  1. (i)

    Measure the effectiveness of QCP in terms of its ability to facilitate widely used quantum circuit optimizers.

  2. (ii)

    Show that QCP extends existing optimizations that also use the idea of constant propagation, namely the Relaxed Peephole Optimization (RPO) [16].

  3. (iii)

    Demonstrate the efficiency (polynomial running time) of QCP even when processing circuits of large scale.

In the following, we describe the experiments performed to validate our objectives, and afterward, we show and interpret their results. The corresponding artifact [4] provides the means to reproduce the results reported here.

6.1 Experiments

The Benchmark Suite. To provide realistic performance numbers for our optimization, we evaluate it on the comprehensive benchmark suite MQTBench [21]. This benchmark contains circuit representations of 28 algorithms at different abstraction levels; most are scalable in the number of qubits ranging from 2 to 129 qubits. We use the set of circuits at the target-independent level compiled with Qiskit using optimization level 1. This results in a total number of 1761 circuits of varying sizes.

Representation of Numeric Parameters. Due to considerations of practicability and to avoid dealing with symbolic representations of numeric parameters of gates, we convert the parameters to floats and introduce a threshold of \(\varepsilon = 10^{-8}\); numbers that differ by less than this threshold are treated as equal, in particular, numbers less than \(\varepsilon \) are treated as equal to zero. Consequently, some gates in the input circuits reduce to the identity gate; we remove those from the benchmark circuit in a preprocessing step.
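The thresholded comparison and the removal of identity gates can be sketched as follows. This is an illustrative sketch only: the gate representation as `(name, angle, qubit)` tuples, the set `ROTATIONS`, and the function names are hypothetical, and comparing the absolute value against \(\varepsilon \) is an assumption. Only rotation gates with an angle treated as zero are recognized as identities here.

```python
EPS = 1e-8
ROTATIONS = {"rx", "ry", "rz"}  # hypothetical set of rotation gate names


def approx_equal(a, b, eps=EPS):
    """Two numbers are treated as equal if they differ by less than eps."""
    return abs(a - b) < eps


def is_zero(x, eps=EPS):
    """A number is treated as zero if its magnitude is below eps."""
    return abs(x) < eps


def drop_identity_rotations(gates, eps=EPS):
    """Remove rotation gates whose angle is treated as zero.

    gates: list of (name, angle, qubit) tuples; angle is None for
    gates without a numeric parameter.
    """
    return [
        g for g in gates
        if not (g[0] in ROTATIONS and is_zero(g[1], eps))
    ]
```

For instance, a gate `("rz", 1e-10, 0)` would be removed by this sketch, whereas `("rx", 0.5, 1)` and the parameterless `("h", None, 0)` would be kept.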

Test Settings. For objective (i), we evaluate the influence of QCP with different values for \(n_{max}\) on optimization passes provided by three well-established and widely accepted circuit optimizers: PyZX [14], Qiskit [20], and \(\mathrm {T|ket\rangle }\) [24]. To this end, we run those passes on all benchmark circuits without QCP to obtain a baseline; these numbers are compared with those obtained by first processing the circuits with QCP for different \(n_{max}\) values and then applying those passes. For objective (ii), we compare the results of the optimization composed of RPO and Qiskit with those obtained when placing QCP before or after RPO in this pipeline. Both comparisons are conducted for two metrics, namely gate count and control count. For objective (iii), we record the running times of QCP alone on each input circuit. All experiments are executed on a server running Ubuntu 18.04.6 LTS with two Intel® Xeon® Platinum 8260 CPU @ 2.40 GHz processors offering 48 physical cores in total.

Pre-processing to Fit Circuit Optimizer. Each circuit optimizer supports only a specific gate set. Therefore, certain pre-processing is required to adapt the circuits to the circuit optimizer. This pre-processing includes parameter formatting, gate substitution, and gate decomposition. The latter modification leads to a larger gate count than the input circuit. However, this larger gate count will already be included in our baseline for each circuit optimizer and hence, will not lead to a deterioration of the gate count through the optimization.

6.2 Results

Statistics of the Benchmark Suite. As mentioned in the previous section, we evaluate the QCP on 1761 circuits using between 2 and 129 qubits. The smallest circuits comprise only two gates, whereas the largest circuit contains almost 4.9 million gates. However, all but 16 circuits contain fewer than 50 thousand gates. The entire benchmark comprises 23.3 million gates and 22.5 million controls, of which approximately 17 thousand belong to a doubly controlled X-gate and the rest to single-controlled gates. The preprocessing of the circuits to make them suitable for the different circuit optimizers must be considered a best-effort approach. Consequently, some circuits still could not be parsed by the corresponding circuit optimizer. As an example, Fig. 7 shows how many of the 1761 circuits failed either due to a timeout of one minute or another error, remained unchanged regarding their gate count, or changed when first applying QCP for \(n_{max} = 1024\) and then the corresponding optimizer.

Fig. 7. Number of circuits that remained unchanged, changed, or failed due to a timeout (of one minute) or another error when first applying QCP with \(n_{max}=1024\) and then the corresponding optimization tool.

Fig. 8. Aggregated reduction in gate count (top) and control count (bottom) relative to the baseline when applying QCP with different values for \(n_{max}\) (x-axis) and then the corresponding optimizer. Note that a y-value greater than 0 corresponds to an improvement over the baseline of performing only the corresponding optimization (i. e., PyZX, Qiskit, or \(\mathrm {T|ket\rangle }\)).

Improvement of Standard Optimizers. Figure 8 summarizes the first experiment; the plots show how many more gates and controls, respectively, could be removed in total over the entire benchmark when utilizing QCP than when using the corresponding optimizer alone. The plots for Qiskit and \(\mathrm {T|ket\rangle }\) show that the reduction of gates and controls increases gradually with the value of \(n_{max}\). Note that the absolute numbers for PyZX are smaller since PyZX fails on many more circuits than the other two optimization tools. In any case, it is evident from the plot that QCP improves the result of each optimizer.

Distribution of Relative Improvement. To show the impact of QCP in more detail, we calculate the relative gate reduction for each circuit by dividing the absolute gate reduction by the total gate count before optimization; analogously, we calculate the relative control reduction for every circuit. We plot the respective distribution of the relative gate and control count reduction only for those circuits that fall into the category changed in Fig. 7. Figure 9 shows the histograms when applying QCP with \(n_{max} = 1024\) before each circuit optimizer. In those plots, the width of each bin amounts to 0.02. We show these plots only for \(n_{max} = 1024\) because they look almost identical for other values of \(n_{max}\). These plots show that the impact of QCP is small on the majority of circuits. However, some circuits benefit considerably, especially when applying the optimizer \(\mathrm {T|ket\rangle }\) afterward, which looks for patterns to replace with fewer gates; apparently, QCP modifies the circuit such that more of those patterns occur in the circuit.
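The computation behind such a histogram can be sketched in a few lines. This is a hedged illustration, not the evaluation script from the artifact: the function names are hypothetical, and the absolute reduction is taken here simply as the difference between the count before and after optimization.

```python
from collections import Counter


def relative_reduction(count_before, count_after):
    """Absolute reduction divided by the count before optimization."""
    return (count_before - count_after) / count_before


def histogram(values, bin_width=0.02):
    """Count values per bin; bin b covers [b * bin_width, (b + 1) * bin_width)."""
    return Counter(int(v // bin_width) for v in values)
```

For example, a circuit that shrinks from 200 to 190 gates has a relative reduction of 0.05 and falls into bin 2, i. e., the interval from 0.04 to 0.06.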

Fig. 9. The relative reduction of gates (top) and controls (bottom) of the circuits that appear in the category changed in the plot from Fig. 7.

Interaction with RPO. RPO [16] propagates the initial state through the circuit as long as the single qubits are in a pure state, see also Sect. 7. To achieve this type of state propagation in our framework, a value for \(n_{max}\) of two suffices. Still, QCP with \(n_{max} = 2\) can track more information than RPO since two basis states can also suffice to express multiple qubits that are in a superposition of two basis states. Figure 10 and Fig. 11 depict the mutual influence of RPO and QCP. For the values 1 and 2 of the parameter \(n_{max}\), QCP deteriorates the results of RPO when applied before RPO. This is because RPO also implements some circuit pattern matching together with circuit synthesis; when QCP destroys such a pattern, this optimization cannot be applied at this position anymore. However, for larger values of \(n_{max}\), those plots show that QCP finds additional optimization potential and is therefore not subsumed by RPO. When looking at Fig. 10, one can see that RPO even benefits QCP: In this setting, approximately 10 times more gates can be removed compared to only using QCP with Qiskit afterward. These remarkable results are mainly due to two circuit families, namely qpeexact and qpeinexact, where RPO first removes some controlled gates with its technique and thereby enables QCP to remove even more controlled gates.

Fig. 10. These two plots show the reduction of gates and controls, respectively, when applying QCP with different values for \(n_{max}\) (x-axis) after RPO and finally Qiskit.

Fig. 11. These two plots show the reduction of gates and controls, respectively, when applying RPO after QCP with different values for \(n_{max}\) (x-axis) and finally Qiskit.

Analysis of QCP Alone. QCP only fails on six circuits, of which one is a timeout, and five produce an error because of an unsupported gate. QCP needs the most time on the grover and qwalk circuits; on all other circuits, it finishes processing after at most 3.6 s. In general, the running time of QCP is high if it must track high entanglement for many gates. Accordingly, Fig. 12 shows the running time of QCP on the circuits that belong to the Quantum Fourier Transform family. Those produce maximum entanglement among the qubits where all possible basis states are represented at the end of the circuit. The plot displays the running time against the number of qubits. Note that the number of gates, and therefore the size of the circuit, grows quadratically with the number of qubits. A full simulation of those circuits would result in exponential running time. The plots indicate that QCP circumvents the exponential running time by limiting the number of basis states used to express the state of an entanglement group to \(n_{max}\).

Explanation of Outliers. The plot in Fig. 12 shows outliers, especially for larger values for \(n_{max}\). Those outliers indicate an exponential running time cut-off at a specific qubit count depending on the value of \(n_{max}\). Considering the circuits reveals that due to the generation pattern of those circuits, the chunk of gates executed on the maximal possible entanglement gradually increases in size until the qubit count where the running time drops again. For example, the maximum outlier in the plot for \(n_{max} = 4096\) is reached for 112 qubits. In this circuit, 271 gates are executed on the maximum entanglement comprising 4096 basis states without increasing the entanglement before the gate that increases the entanglement above the limit is processed. In the circuit for one more qubit, i. e., 113 qubits, just 13 gates are executed on the largest possible entanglement. This is due to the order in which the gates in the input file are arranged. In summary, those practical running time measurements underpin our theoretical statements from Sect. 5 since the exponential growth would continue unrestrained otherwise. The results provided indicate differences from existing optimizations. In the next section, we compare our proposed optimization with those and other optimization techniques on a broader basis.

7 Related Work

Other (Peephole) Optimizations. Existing optimization tools [2, 20, 24] mostly look for known patterns consisting of several gates that can be reduced to a smaller number of gates with the same effect. A special case of those optimizations is gate cancellation, which removes redundant gates: Many of the common gates are Hermitian, i. e., they are self-inverse; when they appear twice directly after each other, both can be dropped without influencing the semantics of the program. When we applied the optimization tools mentioned at the beginning of this paragraph to the circuit shown in Fig. 1, none of them could reduce the circuit to the equivalent empty circuit.

Fig. 12. Running time of QCP for different values of \(n_{max}\) against the number of qubits (x-axis). The outliers occur due to the structure in which the circuits are generated; more details can be found in the text.

Bitwise Simulation. As already mentioned in Sect. 3.2, the idea to use a hash table to store the quantum state goes back to a simulator developed by Da Rosa et al. [7]. They use a hash table in the same way as we described in Sect. 3.2 with the basis states as keys and their associated amplitudes as values. However, our approach improves upon theirs by keeping qubits separated as long as they are not entangled, following the idea in [3], and is hence able to store some quantum states even more efficiently. In contrast, Da Rosa et al. use one single hash table for the entire quantum state. Since they want to simulate the circuit and not optimize it as we aim for, they do not change to \(\top \) if the computed quantum state becomes too complex. Consequently, their simulation still runs in exponential time, even though it is not exponential in the number of qubits but rather in the degree of entanglement [7].

Initial-State Optimizations. Circuit optimization tools developed by Liu et al. [16] and Jang et al. [13] both take advantage of the initial state. Liu et al. leverage the information on the single-qubit state which could be efficiently determined at compile time [16]. They implement state automata to keep track of the single-qubit information on each pure state for circuit simplifications. Single-qubit information is lost, though, when a multi-qubit gate is applied, except for a few special cases, since a pure state could then turn into a mixed state. To tackle this issue, users are allowed to insert annotations from which some single-qubit information can be recovered. Our approach, however, avoids treating qubits as independent of each other and tries to trace the global state of the quantum system, which enables us to retain information on qubits even after a multi-qubit gate has been applied to them. The circuit optimizer proposed by Jang et al. aims to remove redundant control signals from controlled gates based on the state information [13]. Instead of classical simulation, they repeatedly perform quantum measurements at truncation points to determine state information. Besides, in order to account for the noise of quantum computers, they set thresholds depending on gate errors and the number of gates and drop observations that are below the thresholds. Although their approach is lower in computational cost compared to classical simulation, the need for quantum measurements prevents their tool from running purely at compile time, since shipping circuits to the quantum runtime is necessary for performing measurements. Additionally, their scheme assumes that each controlled gate in the circuit is either a Toffoli gate or a singly-controlled unitary operation to avoid computations growing exponentially; therefore, gate decompositions are needed to guarantee that the assumption holds. In contrast, our approach runs statically at compile time and no prior assumption or pre-processing is required for the success of the analysis. In addition, Markov et al. [17] and Vandersypen et al. [26] optimize their circuits manually using arguments based on initial-state information.

Quantum Abstract Interpretation. Another point of view within the static analysis of quantum programs was established by Yu and Palsberg [28]. They introduce a way of abstract interpretation for quantum programs to verify assumptions on the final state reached after the execution of the program; hence, their focus is not on the optimization of the circuit but rather on verifying its correctness. Interestingly, their approach of focusing on a particular set of qubits mimics our separation of entanglement groups. Put the other way around, our separation can be seen as one instantiation of their abstract domain, except that we allow the groups to be altered during the simulation of the circuit instead of keeping them fixed over the entire computation as they do. Consequently, our approach dynamically adapts to the current circuit whereas Yu and Palsberg need to fix the set of qubits to focus on statically for their quantum abstract interpretation.

Classical Constant Propagation. When designing our optimization we were inspired by constant propagation known from classical compiler optimizations for interprocedural programs such as C/C++ programs [22]. However, our QCP differs fundamentally from classical constant propagation: In our case, we just need to pass the information along a linear list of instructions (the gates); the problem here is the sheer mass of information that needs to be tracked. In the classical case, the challenge is to deal with structural program elements such as loops and conditional branches that prevent linearly passing information about values. Here, a constraint system consisting of equations over an abstract domain is derived from the program which then needs to be solved.

8 Conclusions

Summary. In our work, we take the idea of utilizing the most common execution condition of quantum circuits, where all qubits are initially in \(\left| 0\right\rangle \), and propose our optimization, QCP, which simulates circuits in a restricted but computationally efficient way and has demonstrated its power in one of the circuit optimization tasks, namely control reduction. In addition, QCP works in harmony with quantum computers: QCP runs in polynomial time and hence can be executed efficiently on classical computers. The output of QCP, i. e., the optimized circuits, which cannot be efficiently simulated on classical computers, is submitted to quantum computers for execution. That is, we let the classical computer do everything it is good at and leave only the rest for the quantum computer. The success of QCP not only proves the value that resides within initial state information but also contributes to the research on quantum circuit optimization based on methods of static analysis running on classical computers. It is already clear that quantum circuits are expected to grow larger and larger, where building blocks containing multi-controlled gates will be heavily used. For example, OpenQASM 3.0, a widely accepted quantum assembly language for circuit description, allows users to write arbitrarily many controls for gates [6]. Therefore, it is likely that our QCP will play to its strengths even more in the future.

Future Work. It is worthwhile to consider other abstract domains, e. g., the one used by Yu and Palsberg [28], that keep partial information about the state and still maintain the efficiency we desire. Additionally, it could be useful for QCP to consider an abstract state of meta-superposition which stores possible states after the measurement in a probability distribution. The use of meta-superposition would allow QCP to simulate circuits with intermediate measurements, i. e., measurements that do not happen at the end of the circuit. We also plan to incorporate and evaluate the idea of the threshold from [13], so that QCP will be able to discard basis states that are not significant to the simulation and will be indistinguishable from noise on a real quantum computer. Besides, QCP is currently not able to detect when qubits become separable again after they were entangled. Implementing such a detection would allow QCP to keep more state information and thus to perform better optimizations. Another direction is to increase the capability of the control reduction itself: For this, we want to generalize the ideas proposed in [16], which use only pure state information of single qubits, to our setting. This includes replacing fully simulated parts of the circuit by means of circuit synthesis methods, such as KAK decomposition [25]. It is possible that performing QCP causes a loss of opportunities for other optimizations. So, one might also be interested in studying how to determine the optimal order in which to perform different optimization passes.