In this section, we will apply the zero-noise extrapolation method to our circuit. The basic assumption of the method is that the expectation value of an observable depends smoothly on a small noise parameter \(\lambda \ll 1\) and admits the following power series,
$$\begin{aligned} \langle A \rangle _{|\phi \rangle }(\lambda ) = \langle A \rangle _{|\phi \rangle }^* + \sum _{i=1}^n a_i \lambda ^i + \mathcal {O}(\lambda ^{n+1}), \end{aligned}$$
(8)
where \(\langle A \rangle _\phi ^*\) is the zero-noise value we are trying to recover. Richardson’s deferred approach to the limit [15] can then be applied to get a better estimate of the zero-noise value. The method requires generating n estimates of the expectation value, i.e., \(\langle A \rangle _\phi (r_i\lambda )\) for \(r_1<r_2<\dots <r_n\). A better estimate of \(\langle A \rangle _\phi ^*\) is then constructed by combining these values in such a way that the lowest-order terms in the power series cancel. As an example, we can get a second-order approximation of the expectation value by combining the results for \(r_1=1\) and \(r_2=2\) through
$$\begin{aligned} 2 \langle A \rangle _{|\phi \rangle }(\lambda ) - \langle A \rangle _{|\phi \rangle }(2\lambda ) = \langle A \rangle _{|\phi \rangle }^* + \mathcal {O}(\lambda ^2). \end{aligned}$$
(9)
Clearly, using \(r_1=1\) generates the expectation value with the least noise. Amplification of noise by the factors \(r_i>1\) can be achieved either directly through pulse control or by modifying the circuit through the addition of certain extra gates. For IBM’s QX devices, pulse control on devices with more than one qubit is accessible only to IBM’s customers, which leaves us with the second possibility.
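The combination in Eq. (9) is the two-point case of a general rule: for scale factors \(r_1,\dots ,r_m\), one looks for weights \(c_k\) with \(\sum _k c_k = 1\) and \(\sum _k c_k r_k^j = 0\) for \(j=1,\dots ,m-1\). The following is a minimal sketch of that rule, not the implementation used in this work; the function names and the polynomial toy model are illustrative assumptions.

```python
import numpy as np


def richardson_coefficients(scales):
    """Weights c_k with sum(c_k) = 1 and sum(c_k * r_k**j) = 0 for j = 1..m-1."""
    r = np.asarray(scales, dtype=float)
    m = len(r)
    # Row j of the transposed Vandermonde matrix holds r_k**j for all k.
    system = np.vander(r, N=m, increasing=True).T
    rhs = np.zeros(m)
    rhs[0] = 1.0  # keep the order-0 term, cancel orders 1..m-1
    return np.linalg.solve(system, rhs)


def richardson_extrapolate(scales, values):
    """Combine noisy estimates E(r_k * lambda) into a zero-noise estimate."""
    return float(np.dot(richardson_coefficients(scales), values))


# Toy model (an assumption for illustration): E(lam) = 0.5 + 0.3*lam - 0.2*lam**2.
def toy_expectation(lam):
    return 0.5 + 0.3 * lam - 0.2 * lam**2


lam = 0.1
scales = [1, 2, 3]
estimate = richardson_extrapolate(scales, [toy_expectation(s * lam) for s in scales])
print(richardson_coefficients([1, 2]), estimate)
```

For the scale factors \(r_1=1\), \(r_2=2\), the weights reduce to \((2,-1)\), which reproduces Eq. (9). Because the toy model is a quadratic polynomial, three scale factors recover its zero-noise value up to floating-point error; measured data carry statistical noise that the weights amplify.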
Pauli-twirling
Before we apply the noise amplification, we convert the non-stochastic errors of CX gates into stochastic errors, see, e.g., [9, section VII] for a detailed description. One way to achieve this is to apply Pauli-twirling. Given a finite group G of quantum operations and a quantum channel \(\Lambda \), the average
$$\begin{aligned} \frac{1}{|G|} \sum _{U\in G} U^\dagger \Lambda U, \end{aligned}$$
(10)
is called a twirl of the channel \(\Lambda \). In our case, gates \(\sigma ^a, \sigma ^b, \sigma ^c, \sigma ^d\) are inserted before and after each CX gate \(\Lambda \), where \(\sigma ^i\) is chosen from the twirling set consisting of the Pauli gates \(\{ \mathbb {1}, \sigma ^x, \sigma ^y, \sigma ^z \}\). After randomly (with uniform probability) choosing \(\sigma ^a, \sigma ^b\), the gates \(\sigma ^c,\sigma ^d\) are then chosen to satisfy
$$\begin{aligned} \sigma ^c \otimes \sigma ^d = \mathrm {e}^{i \theta } \Lambda (\sigma ^a \otimes \sigma ^b) \Lambda ^\dagger . \end{aligned}$$
(11)
This ensures that the overall effect is only a phase change, which does not alter the measurement outcome. The circuit constructed with Pauli-twirling applied to all CX gates is therefore equivalent to the original circuit. Figure 4 shows a schematic depiction of Pauli-twirling as well as all valid combinations for the CX gate. In practice, this method is applicable if the error rates of single-qubit gates are an order of magnitude smaller than those of two-qubit gates. Twirling should then have only a negligible effect on the fidelity of the expectation value on NISQ devices. Figure 2 indicates that noise manifests itself in an increase in the variance of the distribution. There is no effect for the ideal simulator.
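The condition in Eq. (11) can be checked numerically. The sketch below is an illustration rather than the code used in this work: it takes the first tensor factor as the control qubit of the CX gate, and the names `PAULIS`, `CX` and `twirl_partner` are hypothetical. For each choice of \(\sigma ^a, \sigma ^b\) it searches the 16 Pauli pairs for the matching \(\sigma ^c, \sigma ^d\) and the phase \(\mathrm {e}^{i\theta }\).

```python
import itertools

import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

# CX with the first tensor factor as control (assumed convention).
CX = np.array(
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex
)


def twirl_partner(a, b):
    """Return (c, d, phase) with sigma^c (x) sigma^d = phase * CX (sigma^a (x) sigma^b) CX^dagger."""
    target = CX @ np.kron(PAULIS[a], PAULIS[b]) @ CX.conj().T
    for c, d in itertools.product(PAULIS, repeat=2):
        candidate = np.kron(PAULIS[c], PAULIS[d])
        # Normalized Hilbert-Schmidt overlap; it has modulus 1 only for the match.
        overlap = np.trace(candidate.conj().T @ target) / 4
        if np.isclose(abs(overlap), 1.0):
            return c, d, np.conj(overlap)
    raise ValueError("CX conjugation did not return a Pauli pair")


for a, b in itertools.product(PAULIS, repeat=2):
    c, d, phase = twirl_partner(a, b)
    twirled = np.kron(PAULIS[c], PAULIS[d]) @ CX @ np.kron(PAULIS[a], PAULIS[b])
    # The twirled gate equals CX up to the global phase.
    assert np.allclose(twirled, phase * CX)
    print(f"before: {a}{b}  after: {c}{d}  phase: {phase:.0f}")
```

The loop confirms for all 16 input pairs that applying \(\sigma ^a \otimes \sigma ^b\) before and \(\sigma ^c \otimes \sigma ^d\) after the CX gate reproduces the CX gate up to a global phase.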
Noise amplification
In order to amplify the strength of the noise, we apply random Pauli gates with a probability proportional to the error rate of the CX gate between a given pair of qubits. More precisely, this means applying gates \(\sigma ^e, \sigma ^f\), randomly chosen from the set of Pauli gates \(\{ \mathbb {1}, \sigma ^x, \sigma ^y, \sigma ^z \}\), after the twirled CX gates with probability \((r-1)\epsilon _{i,j}\); see the depiction in Fig. 4a. Note that there are only 15 possible choices for \(\sigma ^e \otimes \sigma ^f\), since \(\mathbb {1}\otimes \mathbb {1}\) must be excluded because it does not increase the error. Here, \(\epsilon _{i,j}\) is the two-qubit gate error rate between qubits \(q_i\) and \(q_j\). On average, this increases the error rate to the desired value \(\epsilon _\text {new} = \epsilon _{i,j} + (r-1)\epsilon _{i,j} = r\epsilon _{i,j}\).
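The sampling step described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in this work: the helper `sample_extra_paulis`, the uniform choice among the 15 pairs, and the numerical values of r and \(\epsilon _{i,j}\) are hypothetical.

```python
import itertools
import random

PAULI_LABELS = ("I", "X", "Y", "Z")

# The 15 non-trivial two-qubit Pauli pairs; ("I", "I") adds no error and is excluded.
NONTRIVIAL_PAIRS = [
    pair for pair in itertools.product(PAULI_LABELS, repeat=2) if pair != ("I", "I")
]


def sample_extra_paulis(r, epsilon, rng):
    """With probability (r - 1) * epsilon return a random Pauli pair, else None."""
    p_insert = (r - 1) * epsilon
    if not 0.0 <= p_insert <= 1.0:
        raise ValueError("(r - 1) * epsilon must lie in [0, 1]")
    if rng.random() < p_insert:
        return rng.choice(NONTRIVIAL_PAIRS)
    return None


# Illustrative values only: amplification factor 4, CX error rate 2%.
rng = random.Random(1234)
draws = [sample_extra_paulis(4, 0.02, rng) for _ in range(50_000)]
inserted_fraction = sum(d is not None for d in draws) / len(draws)
print(f"fraction of CX gates with an inserted Pauli pair: {inserted_fraction:.4f}")
```

In this sketch the fraction of CX gates that receive an extra Pauli pair approaches \((r-1)\epsilon _{i,j}\), so together with the intrinsic error rate \(\epsilon _{i,j}\) the effective error rate is approximately \(r\epsilon _{i,j}\).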
Figure 5 shows the results for both the simulated error models and the real quantum devices. The assumption that the expectation value of an observable depends smoothly on r seems to hold for the simulator with the IBM QX2 noise model and for the IBM QX2 device, but less so for the IBM ourense device, see Fig. 5d. This is likely because some of the underlying assumptions of the method are violated for the ourense device, e.g., by the presence of non-Markovian noise or of spatially or temporally correlated noise. The result shown in Fig. 5d seems to justify the assumption of the exponential variant of the extrapolation method presented in [3].
Additional insight is provided by looking at the distribution for \(r\in \{1,4,32\}\). Since we increase the noise of CX gates artificially by adding Pauli gates, this means that other outcome strings are becoming more likely. Figure 5a-c shows that expectation values of 1, 2, 3 become increasingly likely. The result is a multi-peaked distribution. The noisy results show the same basic behavior as the ideal circuit. In general, the noise models seem to lead to better estimates of the expectation values than the real quantum devices, limiting their usefulness somewhat.
Error mitigation of measurement noise
Measurement noise is another major source of error. Here, we use a model that assumes spatially uncorrelated bit-flip errors. We compute the probability that the state \(|i\rangle \) is observed if the state \(|j\rangle \) is prepared, i.e., the conditional probability \(P_{i,j}:=P(|i\rangle \mid |j\rangle )\). The matrix
$$\begin{aligned} P=\left[ \begin{matrix} P_{1,1}&{}\dots &{}P_{1,2^n}\\ \vdots &{}\ddots &{}\vdots \\ P_{2^n,1}&{}\dots &{}P_{2^n,2^n}\\ \end{matrix}\right] , \end{aligned}$$
(12)
is a (left) stochastic matrix, as \(\sum _i P_{i,j} = 1\) for every prepared state \(|j\rangle \). In the absence of noise \(P_{i,j}=\delta _{i,j}\), but measurement (and other) noise leads to nonzero off-diagonal entries. The resulting probabilities for IBM’s QX2 are shown in Fig. 6a.
Let us assume that, for a quantum computer, we are given P and a probability distribution \(D_\text {noisy}\) induced by measurement of a quantum state \(|\Psi \rangle \). Using the equation
$$\begin{aligned} D_\text {noisy} = P D_\text {ideal}, \end{aligned}$$
(13)
we can retrieve the ideal distribution \(D_\text {ideal}\) of \(|\Psi \rangle \). As an example, suppose we prepare the Bell state \(|\Psi \rangle =\frac{1}{\sqrt{2}}(|00\rangle +|11\rangle )\), but the measured distribution is \(D_\text {noisy}=(13,2,2,13)/30\). In addition, suppose that \(P_{i,j}\) is 0.8 if \(i=j\) and 0.2/3 otherwise. By solving (13), we can then retrieve the noiseless distribution \(D_\text {ideal}=(1/2,0,0,1/2)\).
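The Bell-state example can be reproduced by solving the linear system (13) directly. The following is a minimal sketch with NumPy, not the Qiskit filter used in this work; the function name `mitigate_measurement` is hypothetical.

```python
import numpy as np


def mitigate_measurement(P, d_noisy):
    """Solve D_noisy = P @ D_ideal for D_ideal."""
    return np.linalg.solve(P, d_noisy)


# Calibration matrix of the example: 0.8 on the diagonal, 0.2/3 elsewhere.
P = np.full((4, 4), 0.2 / 3)
np.fill_diagonal(P, 0.8)

# Measured distribution over the outcomes 00, 01, 10, 11.
d_noisy = np.array([13, 2, 2, 13]) / 30

d_ideal = mitigate_measurement(P, d_noisy)
print(np.round(d_ideal, 6))
```

The solution agrees with \(D_\text {ideal}=(1/2,0,0,1/2)\) up to floating-point error. With distributions estimated from a finite number of shots, a direct solve can return small negative entries; a constrained least-squares fit is a common alternative in that case.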
In order for the method to work, measurement errors must be at least one order of magnitude larger than the errors of state preparation and of the X gate. This condition is satisfied for IBM’s QX2 and ourense devices, compare Fig. 3. In addition, building the matrix P requires preparing and measuring a number of states that is exponential in the number of qubits. In this work, we use the implementation provided by Qiskit [1]. Fig. 6b and the column for \(E_1\) in Fig. 6c show a clear improvement from applying the measurement filter for Max’s circuit.
Overall results
In all of our experiments, we generate N circuits randomly with Pauli-twirling and a noise amplification factor r. Each of these circuits is called a “repetition” and uses 8192 shots. The number of these random circuits (repetitions) has to be large enough to cover the whole sample space. Max’s circuits have 9 CX operations, which is why we used \(N=1024\) repetitions. We can see in Fig. 7a, b that this number is sufficient for convergence. The results for other circuits are similar.
Figure 7 shows the convergence of the circuits and the effect of error mitigation techniques on the expectation value. With \(E_r=E(r)\), we denote the expectation value achieved with amplification factor r, and with \(R(\cdot )\) the Richardson extrapolation. Without error mitigation, the expectation value for the X2 device is closer to the theoretical value than that for the ourense device. Execution on the real devices leads to worse results than on the simulators; compare also Fig. 2b.
Applying Richardson extrapolation clearly improves the resulting expectation value in all cases. With an increasing number of terms, the estimate of the expectation value seems to converge. For the X2 device, 2–3 terms are already sufficient to achieve a very good approximation. For the ourense device, the results are not as good. This is most likely due to the longer circuit depth and higher measurement errors.
Applying the measurement error filter alone helps to improve the expectation value as well, particularly when using a quantum simulator. However, on the real devices, the results are not as good as for the Richardson extrapolation. Combining Richardson extrapolation and the measurement error filter seems to work only for the ourense model and device.