Nondestructive classification of quantum states using an algorithmic quantum computer


Methods of processing quantum data become more important as quantum computing devices improve in quality on the way to fault-tolerant universal quantum computers. These methods include discrimination and filtering of quantum states given as input to the device, which may find numerous applications in quantum information technologies. In the present paper, we address a scheme for the classification of input states that is nondestructive and deterministic for certain inputs and probabilistic in the general case. This can be achieved by incorporating the phase estimation algorithm into a hybrid quantum-classical computation scheme in which the quantum block is trained classically. We perform a proof-of-principle implementation of this idea using a superconducting quantum processor of IBM Quantum Experience. Another aspect we are interested in is the mitigation of errors caused by imperfections of the quantum device. We apply a series of heuristic tricks at the stage of classical postprocessing in order to improve raw experimental data and to recognize patterns in them. These ideas may find applications in other realizations of hybrid quantum-classical computations with noisy quantum machines.


Machine learning is a computing paradigm, where recognition of patterns in available data plays a central role, but the computing system is not explicitly programmed; many examples indeed demonstrate success of this approach to real-world problems. Quantum machine learning is an emergent technology based on the assumption that quantum resources can be useful in the pattern analysis, see, e.g., Wiebe et al. (2012), Schuld et al. (2015a), Biamonte et al. (2017), Amin et al. (2018), Preskill (2018), and Adcock et al. (2015). Quantum algorithms within such applications can be used as a part of a larger computation scheme which also incorporates classical blocks.

There are two major approaches for the construction of a quantum block in such schemes—it can be represented either by quantum annealer or by algorithmic quantum computer (Biamonte et al. 2017). Most of the proposals unfortunately are characterized by input/output bottlenecks occurring at stages of encoding classical data into quantum states and decoding them back (Aaronson 2015; Arunachalam et al. 2015). However, these bottlenecks seem to be not severe in the case input states are quantum (Granade et al. 2012; Wiebe et al. 2014). The role of the quantum machine is to recognize their underlying patterns, which may have no classical counterpart (for example, characteristics of quantum entanglement), and then to classify these states or filter them. Let us stress that the classification of quantum states is supposed to play a crucial role in quantum metrology and sensing (Degen et al. 2017). For instance, in quantum illumination problem, one has to operate with the entangled photonic states and to reveal their characteristics (Lloyd 2008; Tan et al. 2008). Another possible source of quantum data can be a quantum simulator or another quantum computer (for example, more noisy and/or of a larger size) (Biamonte et al. 2017).

Machine learning tasks can be roughly divided into supervised and unsupervised. In the present paper, we address a hybrid quantum-classical approach to the problem of classification of input quantum states, where quantum block is trained classically with the set of labeled input vectors (supervised learning). An essential ingredient of the model we consider is a phase estimation algorithm embedded into the quantum part of the computational scheme. Using ancilla qubits, it is possible to extract information about quantum state without doing a direct measurement of the qubits encoding this state. It is thus possible to make a classification of certain input quantum states both nondestructively and deterministically. For general input states, the classification is probabilistic. This idea is motivated by the recent suggestion on simulation of perceptron on a quantum computer (Schuld et al. 2015b).

We also perform proof-of-principle realization of our scheme with real superconducting quantum computer of IBM Quantum Experience available through the cloud service. Its performance, as well as performances of existing quantum computers based on other physical realizations, is limited by imperfections of quantum hardware, which include effects of decoherence and quantum gate errors. This limitation restricts possible realizations of quantum machine learning algorithms to few-qubit examples, see, e.g., Cai et al. (2015) and Li et al. (2015). We therefore address a rather simple toy model, which is associated with the classification of maximally entangled two-qubit states. In order to obtain a valuable information from raw experimental data affected by noise, we apply a series of tricks based on classical postprocessing which are also associated with pattern recognition. These ideas can be of interest in a general context of hybrid quantum-classical computation, which attracts a lot of attention now, see, e.g., Farhi et al. (2014), McClean et al. (2016), Peruzzo et al. (2014), Preskill (2018), Kandala et al. (2017), and Ristè et al. (2017).

This paper is organized as follows. In Section 2, we explain basic ideas behind the approach used. In Section 3, we present an explicit treatment of a toy model dealing with the classification of two-qubit maximally entangled states. In Section 4, we describe the realization of this toy model on superconducting quantum computer of IBM Quantum Experience and apply different approaches to mitigate the effect of errors. We conclude in Section 5.

Phase estimation algorithm in classification problems

Programmable quantum computers operate with data encoded into quantum states. One potential application of quantum computers is the classification of states given as input according to some criterion or criteria. To accomplish this task, one has to construct a circuit that signals whether a state belongs to one of the predefined classes. Another example is the filtering problem: a quantum device must nondestructively pass a state that belongs to a predefined class and should also signal this event. How to construct such a circuit is obvious only in trivial cases and is far from simple for more complex quantum states.

One of the possible solutions is to use ideas from the machine learning field. For example, it is reasonable to construct a quantum circuit with some limited number of free parameters which enter certain blocks of the algorithm. Then, the quantum algorithm can be “trained” by sending training states to the input, tuning the parameters and finding their optimal values allowing for the desirable classification, which can include multiple groups. It is difficult to implement the training as a purely quantum procedure, so that this part of the whole scheme might be accomplished classically, i.e., through the classical computer. The scheme, in this case, represents one of the numerous examples of a hybrid quantum-classical computations. The classical training procedure can be based on various methods, such as grid search, Monte Carlo method, or gradient descent method.

In the present paper, for the quantum part of the scheme, we adopt ideas based on the phase estimation algorithm, which makes it possible to obtain information about an input state without a direct measurement of the qubits encoding this state, exploiting ancilla qubits instead. For certain input states, the classification can be made both nondestructive and deterministic. The quantum block of this circuit is shown schematically in Fig. 1, where U(ω) is a unitary operator parametrized by a set of tunable parameters ω to be adjusted during the training procedure. If the input state is an eigenstate of U(ω), the measurements of ancilla qubits do not destroy it, so |ψ〉 is passed nondestructively through the scheme (apart from a global phase it acquires). Moreover, in this case, the outcomes of the ancilla measurements are deterministic, provided the eigenvalue of |ψ〉 is \(\exp (2i \pi n/2^{N_{a}})\), where \(N_{a}\) is the number of ancilla qubits and n is an integer ranging from 0 to \(2^{N_{a}} - 1\). The converse is also true: deterministic results of the ancilla measurements are possible only if the input state is one of the eigenstates of U(ω) and its eigenvalue is of the form \(\exp (2i \pi n/2^{N_{a}})\).

Fig. 1

A schematic view of the quantum circuit. |ψin〉 is an input state and U(ω) is a parametrized unitary, where ω is a set of tunable parameters to be adjusted during the training procedure

Hence, if there are two input states, each being an eigenstate of U(ω) with different eigenvalues of the above type, it is possible to classify these states both nondestructively and deterministically by measuring the ancillas. Otherwise, the classification is probabilistic: the probability to get the string of 0s and 1s corresponding to the eigenvalue \(\exp (2i \pi n/2^{N_{a}})\) of U(ω) is the sum of squared overlaps between |ψ〉 and all mutually orthogonal eigenstates of U(ω) characterized by this particular eigenvalue. If |ψ〉 is an eigenstate of U(ω) whose eigenvalue is not of this form, the classification is nondestructive but, in general, probabilistic. Notice that the nondestructive character of the state transfer through the circuit can be probed by the SWAP test.
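As a worked illustration of the statement above, consider the simplest case of a single ancilla (\(N_{a}=1\)) in a Hadamard, controlled-U, Hadamard arrangement. The symbols \(c_{j}\), \(\lambda_{j}\), and \(|e_{j}\rangle\) are introduced here for illustration only and denote the expansion coefficients of the input, the eigenvalues of U(ω), and its orthonormal eigenstates. Under these assumptions, the ancilla outcome probabilities read

$$ \begin{array}{@{}rcl@{}} |\psi\rangle &=& \sum\limits_{j} c_{j} |e_{j}\rangle, \qquad U(\omega)|e_{j}\rangle = \lambda_{j} |e_{j}\rangle,\\ P_{0} &=& \sum\limits_{j} |c_{j}|^{2} \frac{|1+\lambda_{j}|^{2}}{4}, \qquad P_{1} = \sum\limits_{j} |c_{j}|^{2} \frac{|1-\lambda_{j}|^{2}}{4}. \end{array} $$

If every \(\lambda_{j}\) equals +1 or −1, then \(P_{0}\) reduces to the total weight \(\sum |c_{j}|^{2}\) of the eigenstates with \(\lambda_{j}=+1\), and \(P_{1}\) to the weight of those with \(\lambda_{j}=-1\). The outcome is deterministic only when all of the weight sits in one of the two eigenspaces.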

We now discuss the same problem from another perspective. Let us assume that we have M orthogonal input states. We may try to perform an ideal classification of these states, i.e., to construct an operator U(ω) for which these states are eigenstates and, moreover, the results of the ancilla measurements allow for an unambiguous deterministic discrimination between them. Let us stress that such a circuit provides a nondestructive and deterministic classification within the given set of M input states, while a general input state is classified probabilistically. In the latter case, through repeated measurements, we may recognize which of the M states of the training set the input state is closest to. It is clear that the minimum necessary number of ancilla qubits is determined by the condition \(2^{N_{a}}\geqslant M\). Apparently, the requirements on the operator U(ω) for such a classification are quite restrictive. Alternatively, it is possible to find a U(ω) that yields a nondestructive but probabilistic classification of M orthogonal training states. Again, the nondestructive character of the input state transfer through the circuit can be verified by the SWAP test.

The problem of efficiently constructing a desirable U(ω) is far from trivial. In principle, it is possible to try a brute-force strategy, which seems rather universal: one may use a fixed entangler of all qubits of the register and apply it multiple times, inserting a set of single-qubit rotations between successive applications of the entangler; the rotation angles can be treated as variational parameters. A similar approach was utilized in Kandala et al. (2017) for the preparation of variational many-body states for the modelling of molecules. It is then possible to optimize some error function in order to minimize the level of “destructiveness” or “non-determinism” of the classification. Another possible strategy is to rely on heuristics when finding a suitable form of U(ω), which depends on the characteristics of the vectors from the training set. Below, we discuss a toy model that contains all essential ingredients of the scheme and can be tested with existing quantum machines. Within this simple example, we follow the heuristic approach for the construction of a proper operator U(ω).

Toy model: classification of maximally entangled two-qubit states

Let us consider four possible input states defined as two-qubit maximally entangled states. In other words, we assume that there are four training vectors, which are Bell states |Φ±〉 and |Ψ±〉, given by

$$ \begin{array}{@{}rcl@{}} |{\varPhi}_{\pm}\rangle = \frac{1}{\sqrt{2}}(|00\rangle \pm |11\rangle),\\ |{\varPsi}_{\pm}\rangle = \frac{1}{\sqrt{2}}(|10\rangle \pm |01\rangle). \end{array} $$

Our aim is to construct an ideal classification scheme allowing for the nondestructive and deterministic classification of these four states into two classes |Φ±〉 and |Ψ±〉.

The states of these two classes differ from each other by their “internal structure” reflected in the probabilities to be found in the orthogonal states of the computational basis, which are not sensitive to the phases. Therefore, it is promising to construct U on the basis of rotations around the z axis. We thus parametrize U as U = Uz1(ω1)Uz2(ω2), where indices 1 and 2 refer to the qubit number and \(U_{z}(\omega )=\left [\begin {array}{cc} e^{-i\pi \omega /2} & 0 \\ 0 & e^{i\pi \omega /2} \end {array}\right ]\) is a single-qubit rotation around the z axis.

We first show explicitly that such a parametrization for U gives a desirable result and also determine optimal values of ω1 and ω2 yielding nondestructive and deterministic classification. We then do the same work using the real quantum computer by finding such optimal parameters through the grid search that can be treated as a learning procedure.

It is easy to see that |Φ±〉 are eigenstates of U provided ω1 + ω2 = 2k, where k is an integer. The eigenvalue of U is the same for |Φ+〉 and |Φ−〉, \(U_{{\varPhi }} = e^{i\pi k}\). Similarly, |Ψ±〉 are eigenstates of U provided ω1 − ω2 = 2q, where q is an integer, and the eigenvalue of U is the same for |Ψ+〉 and |Ψ−〉, \(U_{{\varPsi }} = e^{i\pi q}\). Let us choose k and q in such a way as to make UΨ and UΦ different from each other, which is necessary for the classification to work. Obviously, the parities of k and q must be opposite. We may choose, for instance, k = 0 and q = 1, which leads to UΦ = −UΨ = 1 and ω1 = −ω2 = 1. Fortunately, for our simplistic toy model, both eigenvalues fall automatically into the discrete set \(\exp (2i \pi n/2^{N_{a}})\) with \(N_{a} = 1\), which allows for a deterministic classification using a single ancilla. The whole quantum scheme for this case is shown in Fig. 2. For the input state |Φ±〉⊗|0〉, the output state at the end of the circuit is \(|{\varPhi }_{\pm }\rangle \otimes \frac {1}{2} ((1+U_{{\varPhi }}) |0\rangle + (1-U_{{\varPhi }}) |1\rangle ) = |{\varPhi }_{\pm }\rangle \otimes |0\rangle \). For the input state |Ψ±〉⊗|0〉, the output is \(|{\varPsi }_{\pm }\rangle \otimes \frac {1}{2} ((1+U_{{\varPsi }}) |0\rangle + (1-U_{{\varPsi }}) |1\rangle ) = |{\varPsi }_{\pm }\rangle \otimes |1\rangle \). Thus, we see that a nondestructive and deterministic classification of the two groups of input states is indeed possible, since for |Φ±〉 the probability P0(|Φ±〉) to find the ancilla in the state |0〉 is exactly 1, while for |Ψ±〉 the probability P0(|Ψ±〉) to find the ancilla in the state |0〉 is exactly 0. The scheme basically performs a parity check, and the parity is to be considered as a “quantum pattern”.
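The deterministic outcome for the four Bell states can be checked numerically. The following is a minimal NumPy sketch, not the circuit run on hardware: it assumes the basis ordering |00〉, |01〉, |10〉, |11〉 with qubit 1 as the left index, and the helper names `uz` and `ancilla_p0` are hypothetical.

```python
import numpy as np

def uz(omega):
    """Single-qubit z rotation diag(exp(-i*pi*omega/2), exp(+i*pi*omega/2))."""
    return np.diag([np.exp(-1j * np.pi * omega / 2),
                    np.exp(1j * np.pi * omega / 2)])

def ancilla_p0(psi, w1, w2):
    """Probability of ancilla outcome 0 for a Hadamard, controlled-U, Hadamard circuit."""
    u = np.kron(uz(w1), uz(w2))      # U = U_z1(w1) U_z2(w2) in the |q1 q2> basis
    branch0 = 0.5 * (psi + u @ psi)  # register amplitudes attached to ancilla |0>
    return float(np.vdot(branch0, branch0).real)

s = 1 / np.sqrt(2)
bell = {
    "Phi+": s * np.array([1, 0, 0, 1], dtype=complex),
    "Phi-": s * np.array([1, 0, 0, -1], dtype=complex),
    "Psi+": s * np.array([0, 1, 1, 0], dtype=complex),
    "Psi-": s * np.array([0, -1, 1, 0], dtype=complex),
}

# Optimal parameters from the text: omega1 = -omega2 = 1.
p0 = {name: ancilla_p0(state, 1.0, -1.0) for name, state in bell.items()}
for name, value in p0.items():
    print(f"P0({name}) = {value:.3f}")
```

With these parameters, the sketch assigns outcome 0 to both |Φ±〉 and outcome 1 to both |Ψ±〉, in line with the parity-check interpretation.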

Fig. 2

A quantum circuit for the case of two-qubit input states (see in the text)

For the input two-qubit state of a general form

$$ |{\varPsi}\rangle = \alpha|00\rangle + \beta|01\rangle + \gamma|10\rangle + \delta|11\rangle $$

after some straightforward calculations, we obtain the expression for the probability P0(|Ψ〉) to find the ancilla in the state |0〉, provided the optimal values ω1 = −ω2 = 1 are incorporated into the circuit

$$ P_{0}(|{\varPsi}\rangle) = \frac{1}{2} + \frac{|\alpha|^{2} + |\delta|^{2}}{2} - \frac{|\beta|^{2} + |\gamma|^{2}}{2}. $$

It can be rewritten as

$$ \begin{array}{@{}rcl@{}} P_{0}(|{\varPsi}\rangle) &=& \frac{1}{2}+\frac{1}{2}\left( |\langle {\varPhi}_{+}|{\varPsi} \rangle|^{2} + |\langle {\varPhi}_{-}|{\varPsi} \rangle|^{2}\right.\\ && \qquad\quad \left. - |\langle {\varPsi}_{+}|{\varPsi} \rangle|^{2}-|\langle {\varPsi}_{-}|{\varPsi} \rangle|^{2} \right). \end{array} $$

In this general case, the scheme works as a probabilistic classifier, and the classification occurs according to the distance between the input state and the two subspaces in which |Φ±〉 and |Ψ±〉 form local bases. We stress that P0(|Ψ〉) is no longer exactly 0 or 1, and a measurement of the ancilla can no longer be treated as nondestructive. A nondestructive classification is possible between quantum states of the two classes α|00〉 + δ|11〉 and β|01〉 + γ|10〉.

Now let us come back to the previous stage and consider the learning procedure. If explicit treatment is impossible, optimal values of ω1 and ω2 have to be determined from the results of measurements of ancillas. Let us introduce probability P0(|Ψ〉;ω1, ω2) to find the ancilla in the state |0〉 for general (ω1, ω2) and for the input state |Ψ〉. This quantity is a generalization of P0(|Ψ〉) given by Eq. 3 and it can be written as

$$ \begin{array}{@{}rcl@{}} P_{0}(|{\varPsi}\rangle; \omega_{1},\omega_{2}) &=& \frac{1}{2} + \frac{|\langle {\varPhi}_{+}|{\varPsi} \rangle|^{2} + |\langle {\varPhi}_{-}|{\varPsi} \rangle|^{2}}{2}\\ && \times \cos \left( \frac{\pi}{2}(\omega_{1} + \omega_{2})\right) \\ &&+ \frac{|\langle {\varPsi}_{+}|{\varPsi} \rangle|^{2} + |\langle {\varPsi}_{-}|{\varPsi} \rangle|^{2}}{2}\\ && \times \cos \left( \frac{\pi}{2}(\omega_{1} - \omega_{2})\right). \end{array} $$
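As a consistency check, the closed form above can be compared with a direct evaluation of the single-ancilla circuit, for which P0 = 1/2 + Re〈Ψ|U|Ψ〉/2. The sketch below is only illustrative: the function names are hypothetical, the basis ordering |00〉, |01〉, |10〉, |11〉 is an assumption, and the random test state is not taken from the paper.

```python
import numpy as np

def p0_closed_form(psi, w1, w2):
    """Closed-form P0 for a general two-qubit state (amplitudes of 00, 01, 10, 11)."""
    a, b, c, d = psi
    w_phi = abs(a) ** 2 + abs(d) ** 2  # weight in the span of Phi+ and Phi-
    w_psi = abs(b) ** 2 + abs(c) ** 2  # weight in the span of Psi+ and Psi-
    return (0.5
            + 0.5 * w_phi * np.cos(np.pi / 2 * (w1 + w2))
            + 0.5 * w_psi * np.cos(np.pi / 2 * (w1 - w2)))

def p0_direct(psi, w1, w2):
    """P0 = 1/2 + Re<psi|U|psi>/2 with the diagonal U = U_z1(w1) U_z2(w2)."""
    diag_u = np.exp(1j * np.pi / 2 * np.array(
        [-(w1 + w2), -(w1 - w2), (w1 - w2), (w1 + w2)]))
    return 0.5 + 0.5 * float(np.real(np.vdot(psi, diag_u * psi)))

rng = np.random.default_rng(0)                 # arbitrary seed, for reproducibility
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi = psi / np.linalg.norm(psi)                # random normalized test state

max_diff = max(abs(p0_closed_form(psi, w1, w2) - p0_direct(psi, w1, w2))
               for w1 in (-1.3, 0.0, 0.7) for w2 in (-0.4, 1.0, 2.0))
print(max_diff)
```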

The training procedure consists in finding optimal (ω1, ω2) by evaluating both P0(|Φ±〉;ω1, ω2) and P0(|Ψ±〉;ω1, ω2) and extracting points in the (ω1, ω2) space, where the first quantity is exactly 1, while the second quantity is exactly 0 (or vice versa). Values of (ω1, ω2) can be tuned by the classical computer, while quantum algorithm is implemented with the quantum computer. The brute-force method to determine optimal (ω1, ω2) is a grid search. In the next section, we perform such a search using the real quantum computer. The experimental results will be compared with the explicit treatment. In order to facilitate this comparison, in Fig. 3, we show the results of our calculations for P0(|Φ±〉;ω1, ω2) and P0(|Ψ±〉;ω1, ω2) based on Eq. 5. From this figure, we again see that there are values of ω1 and ω2, supporting a discrimination between two pairs of Bell states in a single measurement.
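The grid search described above can be sketched as follows. Here the analytic probabilities stand in for the measured frequencies that a real run would supply, and the parameter range [−2, 2] with 41 points per axis is an assumption chosen so that integer values of ω1 and ω2 lie on the grid; it is not the grid used in the experiment.

```python
import numpy as np

def p0_phi(w1, w2):
    """Ideal P0 for the inputs Phi+ and Phi-."""
    return 0.5 + 0.5 * np.cos(np.pi / 2 * (w1 + w2))

def p0_psi(w1, w2):
    """Ideal P0 for the inputs Psi+ and Psi-."""
    return 0.5 + 0.5 * np.cos(np.pi / 2 * (w1 - w2))

grid = np.linspace(-2.0, 2.0, 41)              # assumed range and resolution
w1, w2 = np.meshgrid(grid, grid, indexing="ij")

# The score equals 1 only where P0(Phi) = 1 and P0(Psi) = 0 simultaneously;
# the "vice versa" assignment would correspond to a score of -1.
score = p0_phi(w1, w2) - p0_psi(w1, w2)
best = np.argwhere(score > 1 - 1e-9)
optimal = sorted((round(float(grid[i]), 6), round(float(grid[j]), 6))
                 for i, j in best)
print(optimal)
```

On this grid, the search recovers the two points with ω1 = −ω2 = ±1, matching the analytic choice k = 0, q = ±1.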

Fig. 3

Probability patterns P0(|Ψ〉;ω1, ω2) for |Ψ〉 = |Φ±〉 (a) and |Ψ〉 = |Ψ±〉 (b). Points where the two pairs of Bell states can be discriminated nondestructively in a single measurement of the ancilla qubit are marked in red

Fig. 4

Qubit connectivity map of the 16-qubit quantum chip IBMqx5. Qubits used for the scheme implementation are marked by the red color

Implementation on a noisy quantum device

Quantum circuit

Having a simple algorithm at hand, we perform proof-of-principle realization on a currently available quantum device. An additional important issue we are interested in is an error mitigation in hybrid quantum-classical computation schemes, so we consider the realization of a given algorithm as a playground for this quite general problem.

We use 16-qubit IBMqx5 superconducting quantum chip, which is available through the cloud service within the IBM Quantum Experience project. The realization of our scheme is illustrated in Figs. 4 and 5. Figure 4 shows the schematic image of the chip. The qubits utilized in our quantum algorithm are shown by the red color. The quantum circuit itself is presented in Fig. 5. Due to the limitations in connectivity, the quantum circuit includes an additional SWAP gate required to interchange quantum states of two physical qubits. Note that this gate is composed of three CNOT gates and it therefore provides an additional significant contribution to the total error rate.

Fig. 5

The quantum circuit implemented in real quantum processor IBMqx5. Pairs of symbols × denote SWAP operation on corresponding pair of qubits used to circumvent the limitations of connectivity of the chip

Raw data

State-of-the-art quantum computers still suffer from decoherence, as well as from imperfections of quantum gates and readouts. In order to use such devices for the realization of quantum algorithms, one has to deal with the accumulation of errors. It is worth discussing the sources of errors for quantum circuits of different lengths realized on available superconducting quantum devices. Roughly, they can be divided into readout errors, quantum gate errors, and the bare influence of decoherence, which are characterized as follows:

  (i) The readout error is typically of the order of \(10^{-2}\).

  (ii) The average gate error is of the order of \(10^{-3}\). It is also known that errors of two-qubit gates are nearly one order of magnitude larger than those of single-qubit gates.

  (iii) Longitudinal and transverse relaxation times of individual qubits are typically tens of microseconds. They must be compared to typical timescales of individual quantum gates. The duration of single-qubit gates is nearly 80 ns and the duration of two-qubit gates is about 300 ns; there is also a 10 ns buffer between two gates.

To partially suppress or mitigate the errors, different tricks have been suggested (Temme et al. 2017; Li and Benjamin 2017; McClean et al. 2017; Endo et al. 2018). These tricks are usually efficient in the regime of low error rates, which is achieved provided shallow quantum circuits are used within the schemes of quantum-classical computation. In contrast, the implementation of our toy model already requires a quantum circuit that is not so shallow. Therefore, the error rates in our experiments are relatively high, with the dominant contribution coming from CNOT errors. We therefore use a series of tricks based on classical postprocessing techniques applied to the output of the noisy quantum device. Since our final goal is to find experimentally probability patterns similar to the theoretical ones depicted in Fig. 3, in our treatment we also use certain analogies with the problem of image denoising. Thus, we again address the problem of pattern recognition, but now classically.

Figure 6 shows the results for P0(|Ψ〉;ω1, ω2) obtained from the IBM classical simulator (left panel), which does not take into account device imperfections, and from the real quantum machine (right panel). The results from the classical simulator are, of course, the same, within the computational accuracy and disregarding discretization, as the ones obtained analytically (see Fig. 3b). Both experimental and theoretical maps contain 40 × 40 points. There were 8192 measurements for each point. We have chosen the state |Ψ−〉 among the four possibilities in order to illustrate our results and ideas on error mitigation; the results for the remaining three states are rather similar. The comparison of the experimental and theoretical data shows that the agreement is not satisfactory: even for our toy classification model, the experimental data are heavily damaged by noise. In particular, the experimental probabilities tend to approach 0.5 instead of being distributed from 0 to 1. Moreover, the experimental probability pattern also lacks “connecting bridges between islands”: the exact pattern contains diagonal areas with high values of P0(|Ψ〉;ω1, ω2), while in the experimental data these diagonal areas are dissociated into five separate islands with suppressed values of P0(|Ψ〉;ω1, ω2) between them. Nevertheless, in the next subsections, we apply a combination of tricks in order to extract valuable information from such noisy raw data.

Fig. 6

P0(|Ψ〉;ω1, ω2) obtained from IBM classical simulator and the real quantum device

The poor quality of the experimental data is the reason why we restricted ourselves to an oversimplified classification problem that uses only a few of the 16 qubits of the device. Indeed, classification of quantum states involving a larger number of qubits implies the application of a much larger number of two-qubit gates, which provide the main contribution to the total error rate.

As a measure of difference between ideal (theoretical) results and the experimental data, we have chosen several standard metrics:

  (i) A signal-to-noise measure, defined as

    $$ SNR(M, M^{\prime}) = 10\log_{10} \left( \frac{\sigma^{2}(M)}{MSE(M,M^{\prime})}\right), $$

    where M and \(M^{\prime }\) are arrays of data obtained from the quantum chip and the ideal classical simulator, respectively; \(\sigma^{2}(M)\) is the variance; and \(MSE(M,M^{\prime })\) is the mean square error.

  (ii) L1 distance (Manhattan distance), defined as

    $$ d_{L_{1}}(M, M^{\prime}) = \underset{m \in M, m^{\prime} \in M^{\prime}}{\sum} |m - m^{\prime}|, $$

    where m and \(m^{\prime }\) are elements of matrices M and \(M^{\prime }\), respectively.

  (iii) Pearson correlation, defined as

    $$ \rho_{M, M^{\prime}} = \frac{E[(M - E[M])(M^{\prime} - E[M^{\prime}])]}{\sigma(M)\sigma(M^{\prime})}, $$

    where E[M] is the expectation value of M.

We are going to trace the evolution of these three quantities after each step of our denoising procedure.
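The three metrics can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the L1 sum is taken over corresponding elements of the two maps, the variance and standard deviation are population (not sample) quantities, and the "noisy" map is synthetic data invented for the example.

```python
import numpy as np

def snr_db(m, m_ref):
    """Signal-to-noise measure: 10*log10(var(M) / MSE(M, M'))."""
    mse = np.mean((m - m_ref) ** 2)
    return float(10 * np.log10(np.var(m) / mse))

def l1_distance(m, m_ref):
    """Manhattan distance, summed over corresponding elements (assumed pairing)."""
    return float(np.sum(np.abs(m - m_ref)))

def pearson(m, m_ref):
    """Pearson correlation between the two flattened maps."""
    dm = m - m.mean()
    dr = m_ref - m_ref.mean()
    return float(np.mean(dm * dr) / (m.std() * m_ref.std()))

# Synthetic example: ideal pattern for Psi inputs and a squashed, noisy copy of it.
grid = np.linspace(-2.0, 2.0, 40)
w1, w2 = np.meshgrid(grid, grid, indexing="ij")
ideal = 0.5 + 0.5 * np.cos(np.pi / 2 * (w1 - w2))
rng = np.random.default_rng(0)
noisy = 0.5 + 0.2 * (ideal - 0.5) + rng.normal(0.0, 0.02, ideal.shape)

print(snr_db(noisy, ideal), l1_distance(noisy, ideal), pearson(noisy, ideal))
```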


The first step of our procedure is associated with the postselection of experimental data. The underlying idea is the following: suppose we run some quantum circuit on a noisy quantum device and there are certain constraints on possible outputs. These constraints can originate from, e.g., symmetry considerations, and the knowledge of the constraints does not necessarily require the solution of the full problem (otherwise, the quantum computer would be useless). For example, in simulations of many-body systems, there may be certain conditions dictated by an electron-hole symmetry or particle-number conservation. Thus, in computations with noisy quantum devices, we may discard wrong outputs that explicitly violate such requirements. Note that some of us have recently used this idea in Zhukov et al. (2019), which deals with benchmarking of quantum computers using quantum communication protocols.

In the situation we consider here, similar constraints can be deduced from the explicit derivation of the circuit’s output. Since the main goal of this part of our paper is linked to error mitigation, it is legitimate to use some information from the explicit treatment. Namely, under proper operation of the quantum machine, if the input state is |Ψ−〉, the output must be a superposition of |Ψ−〉 and |Ψ+〉 irrespective of (ω1, ω2), because rotations around the z axis do not change the populations in the computational basis. Thus, if the result of measuring the two register qubits in the computational basis is 00 or 11, this result can be discarded. In order to perform such a postselection, we need to measure not only the ancilla, but also the data qubits.
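The postselection step can be sketched as a filter over measured bitstring counts. The key format (ancilla bit first, followed by the two register bits) and the function name are assumptions made for illustration, and the counts below are invented numbers, not experimental data.

```python
def postselect_p0(counts, allowed_register=("01", "10")):
    """Estimate P0 after discarding shots whose register bits violate the constraint.

    counts: dict mapping 3-character strings "a r1 r2" (hypothetical format:
    ancilla bit, then the two register bits) to numbers of shots.
    Returns (postselected P0, fraction of shots kept).
    """
    kept = {bits: n for bits, n in counts.items() if bits[1:] in allowed_register}
    total_kept = sum(kept.values())
    if total_kept == 0:
        raise ValueError("no shots survive the postselection")
    zeros = sum(n for bits, n in kept.items() if bits[0] == "0")
    return zeros / total_kept, total_kept / sum(counts.values())

# Made-up counts for an input from the Psi class: outcomes with register
# bits 00 or 11 can only arise from errors and are dropped.
counts = {"001": 300, "010": 300, "101": 200, "110": 200, "000": 600, "111": 400}
p0, kept_fraction = postselect_p0(counts)
print(p0, kept_fraction)
```

For inputs from the |Φ±〉 class, the same filter would be called with `allowed_register=("00", "11")`.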

The approach we use is not completely universal, since it relies on constraints or symmetries that do not exist for an arbitrary problem. However, we would like to stress that, under certain conditions, it may be efficient to use a redundant coding, i.e., to encode a single logical qubit into a larger number of physical qubits and thus to create constraints artificially. Automatic error correction or classical postselection of results can then be applied to discard part of the wrong outputs associated with certain quantum errors. Of course, a redundant coding increases the number of noisy gates in the algorithm, but the advantages due to the postselection can nevertheless outweigh the disadvantages due to the increased gate number. The success of this strategy depends on the details of the algorithm as well as on the error mechanisms and error rates. For example, in Zhukov et al. (2019), redundant encoding supplemented by postselection was utilized and a certain improvement of the results was achieved.

The results of the postselection for the problem we address here are shown in Fig. 7, while Table 1 provides the metric values before and after the postselection. All three quantities indicate a certain improvement of the data after the procedure. However, there are also some qualitative changes in the overall distribution of the probability, which can be noticed by comparing the experimental data after postselection with the raw experimental data (Fig. 7). Namely, postselection leads to the emergence of the correct pattern structure of the probability distribution: separate “islands” now tend to be connected by “bridges.” This fact is crucial for the subsequent analysis, since it allows for the partial reconstruction of correct data at the end of our procedure.

Fig. 7

P0(|Ψ〉;ω1, ω2) before and after the postselection procedure

Table 1 Metrics values before the postselection and after it

The fraction of discarded data after this step is approximately 1/2, and it depends only weakly on ω1 and ω2. Of course, the additional measurement of two qubits increases the total readout error rate. However, these extra errors are definitely much smaller than the total error accumulated by the whole algorithm. This conclusion is evident from a comparison of the fraction of discarded results, which is as high as 1/2, with the known readout error rates, which are typically only several percent.

Image denoising

Let us now discuss another series of heuristic tricks we use to partially suppress the effects of noises. They are associated with the image denoising. However, before that, let us stress that postselection should be used before this step; otherwise, the reconstruction will completely fail. Particularly, without the postselection, the probability pattern lacks connecting “bridges” between “islands” we mentioned before. These features are, of course, crucial for the reconstruction of a correct pattern.

We start with the observation that the experimentally determined values of probability are generally close to 0.5 instead of being distributed between 0 and 1. Nevertheless, the spatial variations of probability as well as its pattern structure in (ω1, ω2) plane are reproduced much more adequately. Notice that both controlling parameters (ω1, ω2) enter the circuit only through single-qubit rotations. The obtained results imply that, in our noisy experiments with real hardware, the output results can be roughly divided into two classes: (i) wrong outputs, which are due to the single or multiple errors occurring during the algorithm executions and (ii) correct results corresponding to zero number of errors occurred. The first contribution is apparently dominant. An important observation is that it is nearly independent of controlling parameters (ω1, ω2). A similar behavior has been recently observed by some of us in Zhukov et al. (2018) dealing with the simulation of unitary evolution of spin clusters using programmable quantum hardware, where a similar controlling parameter was associated with the dimensionless time. The uniformity of the wrong part of the output data with respect to this parameter was attributed to the fact that the circuit was not so shallow and contained a reasonable number of noisy quantum gates. An error occurring at particular gate produces its own dependence of the corresponding output on the controlling parameter. However, such dependencies for errors occurring at different gates of the circuit are also different, so that they finally average out into a nearly uniform dependence on the controlling parameter. Hence, this nearly uniform “background” can be simply eliminated by considering properly normalized differences instead of absolute values of quantities of interest. 
Let us stress that this situation is a direct consequence of a relatively large number of noisy gates in the circuit—noise in this regime, in some sense, can help extracting valuable information from imperfect data. Of course, as the number of noisy gates grows, the fraction of correct outputs lowers down exponentially—as a result, the trick we discuss can be utilized only in the regime of “intermediate-depth” circuits.

In order to get rid of background, we apply the following transform:

$$ {P}_{0}^{\prime} = \frac{P_{0} - \min P_{0}}{\max P_{0} - \min P_{0}}, $$

where we introduced the notation P0 = P0(|Ψ〉;ω1, ω2). This transform linearly rescales the measured quantity in such a way that the lowest value is mapped to 0 and the highest value is mapped to 1. We point out that this trick does not amount to fitting to an already known result. In our reconstruction, we use only partial information on the correct and unknown probability distribution, which in this case is just the minimum and maximum values of the quantity of interest. In many cases, such additional parameters can be deduced from quite general considerations and do not require full knowledge of the output from the quantum computer. The result of the procedure is shown in Fig. 8, and Table 2 gives the evolution of the metric values. We see that both the SNR and the Manhattan distance improved. However, the Pearson coefficient did not change. This latter result is natural, since the Pearson coefficient is insensitive to linear transformations.
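A minimal sketch of this rescaling is given below. The "measured" map is synthetic: it is the ideal pattern squashed toward 0.5 without shot noise, an idealization chosen only to make the effect of the transform visible. The grid range and size are assumptions.

```python
import numpy as np

def minmax_rescale(p0_map):
    """Linear rescaling that maps the smallest entry to 0 and the largest to 1."""
    lo, hi = p0_map.min(), p0_map.max()
    return (p0_map - lo) / (hi - lo)

grid = np.linspace(-2.0, 2.0, 41)                    # assumed grid
w1, w2 = np.meshgrid(grid, grid, indexing="ij")
ideal = 0.5 + 0.5 * np.cos(np.pi / 2 * (w1 - w2))    # ideal pattern for Psi inputs
measured = 0.5 + 0.15 * (ideal - 0.5)                # synthetic, contrast-suppressed data

rescaled = minmax_rescale(measured)
print(measured.min(), measured.max(), rescaled.min(), rescaled.max())
```

In this noiseless idealization the rescaled map coincides with the ideal one. With real data, point-to-point fluctuations survive the linear rescaling.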

Fig. 8 Probability patterns after the last three steps of the postprocessing procedure (see the text)

Table 2 Evolution of the metric values during the postprocessing procedure: postselection (step 0) → normalization (step 1) → sigmoid transform (step 2) → mean filtering (step 3) → normalization (step 4) → sigmoid transform (step 5)

Although the transformation defined by Eq. 9 enables us to partially get rid of the nearly constant background, it has a serious drawback. The problem is that only a single value of the probability, corresponding to one particular point of the map, is brought to 1, while the probability generally fluctuates significantly from one discrete point in the (ω1, ω2) plane to another. These fluctuations originate from imperfections of the quantum gates.

The particular point of maximum probability resides nearly at the center of the map shown in Fig. 6, i.e., at ω1, ω2 ≈ 0. The same problem, of course, exists for the particular point of the map at which the measured probability is lowest and hence is set to 0 by the rescaling (7). In order to circumvent this problem, we apply the well-known sigmoid transformation. It maps \({P}_{0}^{\prime }\) to the new value \({P}_{0}^{\prime \prime }\) according to

$$ {P}_{0}^{\prime\prime} = \frac{1}{1 + \exp(-a({P}_{0}^{\prime} - b))}, $$

where a and b are free parameters. The value of b is fixed by the requirement that the point \({P}_{0}^{\prime } = b\), which the sigmoid maps to 1/2, must stay invariant under the transformation, so that b = 0.5 in our case.

Again, from general considerations, we can deduce partial information about the true probability pattern, which includes not only the minimum and maximum values of this quantity, but also a typical length scale of its variation in the space of parameters (ω1, ω2). For the two-qubit input state and the problem we consider here, this length scale can be roughly estimated as ≈ 1/2. Next, we can define another length scale, which is much smaller, and evaluate the mean value of the probability over the corresponding area. It is clear that the probability must be essentially constant within this area. Thus, we choose the parameter a of the sigmoid transformation in such a way as to map the mean value of the probability within this area in the vicinity of its maximum, \(\langle f \rangle_{\max}\), to some number that is slightly lower than 1 (or, alternatively, slightly higher than 0 in the vicinity of its minimum). We choose this number as 0.9. This leads us to equate \({P}_{0}^{\prime \prime } (\langle f \rangle_{\max})\) to 0.9. We thus find \(a \approx 5/(2\langle f \rangle_{\max}-1)\). For our set of data, \(\langle f \rangle_{\max}\) is nearly 0.65 in the close vicinity of the point ω1, ω2 ≈ 0 (the averaging has been performed over an area of 5 × 5 points), and hence a ≈ 15. Let us stress that the quality of the reconstruction is nearly the same as long as a ranges from 10 to 20; thus, the choice of the characteristic number 0.9, as well as of the area of the region used for averaging, is not critical.
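The choice of the slope can be checked with a short sketch. The function names `sigmoid` and `slope_for_target` are hypothetical helpers introduced here for illustration. Solving the condition that the sigmoid maps the local mean to 0.9 exactly gives a = ln 9 / (⟨f⟩max − 1/2), i.e., a prefactor of about 4.4 instead of the rounded value 5 quoted above; for ⟨f⟩max = 0.65, both lead to a ≈ 15 and lie well within the range 10 to 20.

```python
import math

def sigmoid(p, a, b=0.5):
    """Sigmoid transform of a normalized probability p."""
    return 1.0 / (1.0 + math.exp(-a * (p - b)))

def slope_for_target(f_max, target=0.9, b=0.5):
    """Solve sigmoid(f_max, a, b) = target for the slope a."""
    return math.log(target / (1.0 - target)) / (f_max - b)

# Local mean near the maximum, as quoted in the text.
f_max = 0.65
a = slope_for_target(f_max)
print(f"a = {a:.2f}")

# The mapped value depends only moderately on a in the range 10 to 20.
for trial in (10.0, 15.0, 20.0):
    print(trial, round(sigmoid(f_max, trial), 3))
```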

Table 2 provides the evolution of the metric values. The sigmoid transformation with a = 15, applied after the normalization, gives a further improvement of data quality according to the SNR metric. However, the L1 distance and the Pearson coefficient indicate a certain decrease of the agreement between experiment and theory. The reason is that the sigmoid transformation, at this stage, produces artifacts: it enhances fluctuations at some points of the plane by bringing the values of the probability close to either 0 or 1. L1 is a point-wise local metric and is therefore rather sensitive to the enhancement of such local fluctuations. The Pearson coefficient is also more sensitive to local fluctuations than SNR, which is consistent with the fact that it is invariant under the rescaling of the probability pattern as a whole.

The procedures we have used do not completely suppress fluctuations of the probability between neighboring discrete points of the map; moreover, the sigmoid transformation even enhances them to a certain extent. A natural idea is to use mean filtering, i.e., to average the discrete data over the small areas discussed in relation to the sigmoid transformation. However, this shifts the probability towards 0.5 again. As seen from Table 2, this is accompanied by a decrease of SNR, although the other metrics improve because the filtering partially suppresses the artifacts of the sigmoid transformation discussed above. In order to compensate for the decrease of SNR at this stage, we afterwards re-apply the normalization and sigmoid transforms with the same parameters a and b and achieve a further improvement of data quality according to all three metrics we used.
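Mean filtering over a small square window can be sketched as follows. This is only an illustration under stated assumptions: the function name `mean_filter` is hypothetical, the 5 × 5 window follows the averaging area mentioned above, and the treatment of the map boundary (repeating the edge values) is an assumption, since the text does not specify it.

```python
import numpy as np

def mean_filter(p, size=5):
    """Replace each point by the mean over a size x size neighborhood.

    The map is padded by repeating its edge values (an assumption),
    so that the output has the same shape as the input.
    """
    p = np.asarray(p, dtype=float)
    padded = np.pad(p, size // 2, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))

# Extreme example of point-to-point fluctuations: a checkerboard of 0 and 1.
checker = (np.indices((9, 9)).sum(axis=0) % 2).astype(float)
filtered = mean_filter(checker)
print(filtered.min(), filtered.max())
```

The checkerboard example shows the side effect described above in its most pronounced form: the fluctuations are removed, but all values are pulled towards 0.5, which is why the normalization and the sigmoid transform are applied once more afterwards.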

The whole postprocessing procedure is the following: postselection (step 0) → normalization (step 1) → sigmoid transform (step 2) → mean filtering (step 3) → normalization (step 4) → sigmoid transform (step 5). The results of the last three steps of this sequence are shown in Fig. 8. Table 2 provides the evolution of the metric values, which shows that all of them have been significantly improved, although the details of their evolution at different steps of our procedure are not identical, since these quantities are sensitive to different types of correlations. The comparison between the final pattern and the exact pattern shows that the agreement is good, although certain discrepancies are still present. As a whole, the improvement compared to the raw data is significant. Thus, our procedure provides a case study illustrating that it is possible to extract valuable information from the data of a noisy quantum computer even if they are heavily damaged by decoherence and gate errors.
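The sequence of steps 1 to 5 can be composed as in the following sketch, which takes already postselected data (step 0) as input. All names are hypothetical, the edge handling of the filter is an assumption, and the disk-shaped pattern with a flat background and Gaussian fluctuations is a synthetic stand-in, not the experimental data or the exact pattern of this paper. The mean absolute deviation is used here as a simple stand-in for the L1 metric.

```python
import numpy as np

def normalize(p):
    """Min-max rescaling to the interval [0, 1]."""
    return (p - p.min()) / (p.max() - p.min())

def sigmoid(p, a=15.0, b=0.5):
    """Sigmoid transform with slope a and midpoint b."""
    return 1.0 / (1.0 + np.exp(-a * (p - b)))

def mean_filter(p, size=5):
    """Mean over size x size windows; edge values are repeated (assumption)."""
    padded = np.pad(p, size // 2, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return win.mean(axis=(-2, -1))

def postprocess(p_postselected, a=15.0, b=0.5, size=5):
    """Return the maps after steps 0..5 of the sequence described above."""
    steps = [np.asarray(p_postselected, dtype=float)]   # step 0
    steps.append(normalize(steps[-1]))                   # step 1
    steps.append(sigmoid(steps[-1], a, b))               # step 2
    steps.append(mean_filter(steps[-1], size))           # step 3
    steps.append(normalize(steps[-1]))                   # step 4
    steps.append(sigmoid(steps[-1], a, b))               # step 5
    return steps

# Synthetic stand-in data: a disk-shaped pattern hidden under a flat background.
rng = np.random.default_rng(1)
w = np.linspace(-np.pi, np.pi, 21)
w1, w2 = np.meshgrid(w, w)
p_exact = (w1 ** 2 + w2 ** 2 < 1.5 ** 2).astype(float)
p_raw = 0.2 * p_exact + 0.4 + 0.02 * rng.standard_normal(p_exact.shape)

steps = postprocess(p_raw)
errors = [np.abs(s - p_exact).mean() for s in steps]
for k, err in enumerate(errors):
    print(f"step {k}: mean absolute deviation = {err:.3f}")
```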


In this paper, we have addressed a hybrid quantum-classical scheme for the classification of input quantum states, where the quantum part is represented by the phase estimation algorithm. It is based on a tunable unitary operator which can be adjusted to accomplish a desired classification of input quantum states from the training set. Because the measurements are performed on ancilla qubits, the classification of these states can be made nondestructive and deterministic. For a general input quantum state, the scheme works as a probabilistic classifier and can be used to classify underlying patterns in quantum data.

We demonstrated a proof-of-principle implementation of this idea using a superconducting quantum computer of IBM Quantum Experience and a specific simple example of the hybrid scheme we suggested. This scheme is able to classify maximally entangled two-qubit states into two groups depending on their parity. Real quantum hardware is characterized by various imperfections, which lead to the accumulation of errors during the algorithm execution. Error mitigation within our realization was another issue addressed in this paper. We have applied a series of tricks associated with classical postprocessing to improve the raw experimental data and to recognize the patterns contained in them. These ideas may be used in other realizations of hybrid quantum-classical computation schemes. Our results also demonstrate that pattern recognition can be an important ingredient of classical postprocessing of data from noisy quantum hardware.


  1. Aaronson S (2015) Read the fine print. Nat Phys 11:291

  2. Adcock J, Allen E, Day M, Frick S, Hinchliff J, Johnson M, Morley-Short S, Pallister S, Price A, Stanisic S (2015) Advances in quantum machine learning. arXiv:1512.02900

  3. Amin MH, Andriyash E, Rolfe J, Kulchytskyy B, Melko R (2018) Quantum Boltzmann machine. Phys Rev X 8:021050

  4. Arunachalam S, Gheorghiu V, Jochym-O’Connor T, Mosca M, Srinivasan PV (2015) On the robustness of bucket brigade quantum RAM. New J Phys 17:123010

  5. Biamonte J, Wittek P, Pancotti N, Rebentrost P, Wiebe N, Lloyd S (2017) Quantum machine learning. Nature 549:19

  6. Cai X-D, Wu D, Su Z-E, Chen M-C, Wang X-L, Li L, Liu N-L, Lu C-Y, Pan J-W (2015) Entanglement-based machine learning on a quantum computer. Phys Rev Lett 114:110504

  7. Degen CL, Reinhard F, Cappellaro P (2017) Quantum sensing. Rev Mod Phys 89:035002

  8. Endo S, Benjamin SC, Li Y (2018) Practical quantum error mitigation for near-future applications. Phys Rev X 8:031027

  9. Farhi E, Goldstone J, Gutmann S (2014) A quantum approximate optimization algorithm. arXiv:1411.4028

  10. Granade CE, Ferrie C, Wiebe N, Cory DG (2012) Robust online Hamiltonian learning. New J Phys 14:103013

  11. Kandala A, Mezzacapo A, Temme K, Takita M, Brink M, Chow JM, Gambetta JM (2017) Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature 549:242

  12. Li Y, Benjamin SC (2017) Efficient variational quantum simulator incorporating active error minimization. Phys Rev X 7:021050

  13. Li Z, Liu X, Xu N, Du J (2015) Experimental realization of a quantum support vector machine. Phys Rev Lett 114:140504

  14. Lloyd S (2008) Enhanced sensitivity of photodetection via quantum illumination. Science 321:1463

  15. McClean JR, Romero J, Babbush R, Aspuru-Guzik A (2016) The theory of variational hybrid quantum-classical algorithms. New J Phys 18:023023

  16. McClean JR, Kimchi-Schwartz ME, Carter J, de Jong WA (2017) Hybrid quantum-classical hierarchy for mitigation of decoherence and determination of excited states. Phys Rev A 95:042308

  17. Peruzzo A, McClean J, Shadbolt P, Yung M-H, Zhou X-Q, Love PJ, Aspuru-Guzik A, O’Brien JL (2014) A variational eigenvalue solver on a photonic quantum processor. Nat Comm 5:4213

  18. Preskill J (2018) Quantum computing in the NISQ era and beyond. Quantum 2:79

  19. Ristè D, da Silva MP, Ryan CA, Cross AW, Smolin JA, Gambetta JM, Chow JM, Johnson BR (2017) Demonstration of quantum advantage in machine learning. npj Quantum Information 3:16

  20. Schuld M, Sinayskiy I, Petruccione F (2015a) An introduction to quantum machine learning. Contemp Phys 56(2):1034

  21. Schuld M, Sinayskiy I, Petruccione F (2015b) Simulating a perceptron on a quantum computer. Phys Lett A 379:660

  22. Tan S-H, Erkmen BI, Giovannetti V, Guha S, Lloyd S, Maccone L, Pirandola S, Shapiro JH (2008) Quantum illumination with Gaussian states. Phys Rev Lett 101:253601

  23. Temme K, Bravyi S, Gambetta JM (2017) Error mitigation for short-depth quantum circuits. Phys Rev Lett 119:180509

  24. Wiebe N, Braun D, Lloyd S (2012) Quantum algorithm for data fitting. Phys Rev Lett 109:050505

  25. Wiebe N, Granade C, Ferrie C, Cory DG (2014) Hamiltonian learning and certification using quantum resources. Phys Rev Lett 112:190501

  26. Zhukov AA, Remizov SV, Pogosov WV, Lozovik YE (2018) Algorithmic simulation of far-from-equilibrium dynamics using quantum computer. Quantum Inf Process 17:223

  27. Zhukov AA, Kiktenko EO, Elistratov AA, Pogosov WV, Lozovik YE (2019) Quantum communication protocols as a benchmark for programmable quantum computers. Quantum Inf Process 18:31


We acknowledge the use of the IBM Quantum Experience for this work. The viewpoints expressed are those of the authors and do not reflect the official policy or position of IBM or the IBM Quantum Experience team. W. V. P. acknowledges support from RFBR (project no. 19-02-00421).

Author information



Corresponding author

Correspondence to W. V. Pogosov.

Cite this article

Babukhin, D.V., Zhukov, A.A. & Pogosov, W.V. Nondestructive classification of quantum states using an algorithmic quantum computer. Quantum Mach. Intell. 1, 87–96 (2019).


  • Quantum computing
  • Quantum data processing
  • Postprocessing
  • Quantum error correction
  • Error mitigation