Abstract
Quantum noise currently limits efficient quantum information processing and computation, degrading the fidelity and reliability of quantum states. In this work, we consider the tasks of reconstructing and classifying quantum states corrupted by the action of an unknown noisy channel using classical feed-forward neural networks. By framing reconstruction as a regression problem, we show how such an approach can be used to recover, with fidelities exceeding 99%, the noiseless density matrices of quantum states of up to three qubits undergoing noisy evolution, and we test its performance with both single-qubit (bit-flip, phase-flip, depolarizing, and amplitude damping) and two-qubit quantum channels (correlated amplitude damping). Furthermore, a critical aspect of our investigation is a comprehensive comparison between mean squared error and infidelity as loss functions. Our findings reveal that these two metrics yield comparable results in the context of state reconstruction. Moreover, we also consider the task of distinguishing between different quantum noisy channels, and show that a neural network-based classifier is able to solve such a classification problem with perfect accuracy.
1 Introduction
One of the main problems in quantum information processing and computation is that quantum systems can be corrupted by unwanted interactions with the environment. Therefore, the incorporation of robust quantum error correction and mitigation strategies is of paramount importance to realize the full potential of quantum information processing.
Despite the effectiveness of quantum error correction protocols in preserving information, they often require significant overhead and resources. Quantum error mitigation techniques, on the other hand, focus on reducing the impact of noise without fully correcting it, making them more feasible for near-term quantum devices (Cai et al. 2023; Kandala et al. 2019). Examples include readout mitigation techniques to correct measurement errors (Van Den Berg et al. 2022; Smith et al. 2021; Bravyi et al. 2021), noise deconvolution methods to retrieve ideal expectation values of generic observables evaluated on a system subject to a known noise before measurement (Mangini et al. 2022; Roncallo et al. 2023), probabilistic error cancellation (Van Den Berg et al. 2023), and data-driven approaches such as zero-noise extrapolation (Kim et al. 2023) and Clifford data regression (Czarnik et al. 2021; Giurgica-Tiron et al. 2020; Lowe et al. 2021) to mitigate noise occurring during a quantum computation.
Another area of great interest is deep learning, which has achieved impressive successes over the past years, with generative pre-trained large language models now leading the way (Vaswani et al. 2017; Brown et al. 2020; Deng and Lin 2022). Deep learning models have excelled in diverse areas, from image and speech recognition models (He et al. 2016; Kamath et al. 2019) to playing games (Schrittwieser et al. 2020), reaching and often surpassing human-level performances. These advancements highlight the vast potential of deep learning in revolutionizing numerous fields, including quantum computation and information.
Indeed, deep learning techniques have shown great promise also for quantum information processing applications, as they have been leveraged successfully in, e.g., experimental phase estimation tasks (Lumino et al. 2018), automating the development of QCVV (quantum characterization, validation, and verification) protocols (Scholten et al. 2019), learning quantum hardware-specific noise models (Zlokapa and Gheorghiu 2020), increasing the measurement precision of quantum observables with neural networks (Torlai et al. 2020), quantum error mitigation (Kim et al. 2022, 2020; Sack and Egger 2023), identifying quantum protocols such as teleportation or entanglement purification (Wallnöfer et al. 2020), classification and reconstruction of optical quantum states (Ahmed et al. 2021), and quantum state estimation (Lohani et al. 2020).
In this work, we leverage machine learning techniques based on feed-forward neural networks to deal with the task of recovering noise-free quantum states when they undergo an undesired noisy evolution. In fact, while it is well known that quantum noisy channels cannot, in general, be physically inverted, this may be achieved by means of classical post-processing methods (Mangini et al. 2022; Roncallo et al. 2023; Van Den Berg et al. 2023). In particular, since neural networks are universal approximators (Hornik et al. 1989), they can be used to learn a mapping that effectively inverts the effect of noise, and hence to reconstruct noiseless quantum states. Specifically, letting \(\varvec{\tilde{r}}\) denote the (generalized) Bloch components of a noisy quantum state, our goal is to train a neural network \(h_{\varvec{w}}(\cdot )\) to output the Bloch vector of the ideal noiseless state, \(\varvec{\tilde{r}}\rightarrow h_{\varvec{w}}(\varvec{\tilde{r}}) = \varvec{r}\), where \(\varvec{r}\) is the Bloch vector of the state before it undergoes the noise process. We explore several combinations of single- and two-qubit noisy channels acting on systems of up to three qubits, study the effect of using different loss functions for training, and show that our neural network-based method can reach quantum state reconstruction fidelities higher than 99.9%. The main idea of the proposed method is summarized in Fig. 1.
In addition to regression tasks, we also show how feed-forward neural networks can be used for classifying different quantum channels based on the effect they have on quantum states. In particular, using as inputs Bloch vectors \([\varvec{\tilde{r}}, \varvec{r}]\) obtained with different channels, the network will output a label corresponding to the quantum channel that has been applied to \(\varvec{r}\) in order to produce \(\varvec{\tilde{r}}\). Also in this case, we achieve almost perfect channel classification accuracy.
Fig. 1 Outline of the neural network-based noise reconstruction and classification protocols. (a) Noisy Bloch vectors, representing quantum states affected by noise, serve as input to a feed-forward neural network. The quantum state reconstruction protocol aims to recover the original noiseless quantum states from the observed noisy Bloch vectors, utilizing the neural network model. (b) Both noisy and noiseless Bloch vectors are fed into the neural network as input. The network is specifically designed for a classification task, where the output provides a label representing the type of noise acting on the noiseless quantum state.
The rest of the manuscript is organized as follows. In Section 2 we formally introduce the problem and the neural network used to obtain the quantum state reconstruction. In Section 3 we present the results obtained for the reconstruction of pure and mixed states, and we introduce the noise classification problems that can be solved similarly with neural networks. In Section 4 we summarize all our results and possible improvements of our method.
2 Methods
In this section, we formalize the quantum communication problem we want to tackle, that is, the recovery of noiseless quantum states undergoing an undesired noisy evolution. We first start by introducing the notation for describing an n-qubit quantum state in terms of its Bloch components, and then move on to discussing the neural network approach used in this work, including details on the optimization procedure and the construction of the training and test datasets.
2.1 Reconstruction of noisy Bloch vectors
The state of an n-qubit quantum system is described by its density matrix \(\rho \in \mathbb {C}^{2^n \times 2^n}\), which can be expressed in the Pauli basis as follows (Nielsen and Chuang 2010; Ozols and Mančinska 2007)

\(\rho = \frac{1}{2^n}\left( \mathbb {I}^{\otimes n} + \varvec{r}\cdot \varvec{P}\right) , \qquad (1)\)
where \(\varvec{r} \in \mathbb {R}^{4^n-1}\) is the generalized Bloch vector, and \(\varvec{P} = (P_1,...,\, P_{4^n-1})\) is a vector containing the multi-qubit Pauli basis, obtained by considering tensor products of single-qubit Pauli matrices, that is \(P_i = \sigma _1^{(i)} \otimes \cdots \otimes \sigma _n^{(i)}\), with \(\sigma _k \in \{\mathbb {I}, X, Y, Z\}\).
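As a concrete illustration of the single-qubit case, the Bloch components \(r_i = \textrm{Tr}[\rho \, P_i]\) can be computed directly from a density matrix. The plain-Python 2x2 matrix helpers below are our own minimal sketch, not code from the paper:

```python
# Illustrative sketch (not from the paper): single-qubit Bloch components
# r_i = Tr[rho sigma_i], computed with plain-Python 2x2 matrix helpers.
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

def bloch_vector(rho):
    """Return (Tr[rho X], Tr[rho Y], Tr[rho Z])."""
    return [trace(matmul(rho, P)).real for P in (X, Y, Z)]

rho0 = [[1, 0], [0, 0]]        # |0><0|
print(bloch_vector(rho0))      # the north pole of the Bloch sphere: (0, 0, 1)
```

The same construction extends to n qubits by replacing the three Pauli matrices with the \(4^n-1\) non-trivial tensor products \(P_i\).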
Quantum channels are completely positive trace preserving (CPTP) maps whose action on a state \(\rho \) can be expressed in Kraus form as (Nielsen and Chuang 2010)

\(\mathcal {N}(\rho ) = \sum _i E_i\, \rho \, E_i^{\dagger }, \qquad (2)\)
where \(\{E_i\}\) are the Kraus operators of the channel \(\mathcal {N}(\cdot )\), satisfying the trace preserving condition \(\sum _iE_i^{\dagger }E_i=\mathbb {I}\). In our experiments, we consider various single-qubit noisy channels (bit-flip \(\mathcal {X}_p\), phase-flip \(\mathcal {Z}_p\), bit-phase-flip \(\mathcal {Y}_p\), general Pauli \(\mathcal {P}_{\varvec{p}}\), depolarizing \(\mathcal {D}_p\) and amplitude damping \(\mathcal {A}_{p\gamma }\)), as well as a correlated two-qubit amplitude damping channel. We refer to Appendix A for an extended discussion on the quantum noise models used in this work.
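To make the Kraus formalism concrete, a bit-flip channel \(\mathcal {X}_p\), whose Kraus operators are \(\sqrt{1-p}\,\mathbb {I}\) and \(\sqrt{p}\,X\), can be applied to a density matrix as follows. This is our own illustrative sketch of the standard construction, not the authors' code:

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def apply_channel(kraus_ops, rho):
    """N(rho) = sum_i E_i rho E_i^dagger."""
    out = [[0, 0], [0, 0]]
    for E in kraus_ops:
        term = matmul(matmul(E, rho), dagger(E))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out

p = 0.2                                              # flip probability
E0 = [[math.sqrt(1 - p), 0], [0, math.sqrt(1 - p)]]  # sqrt(1-p) * I
E1 = [[0, math.sqrt(p)], [math.sqrt(p), 0]]          # sqrt(p)   * X
noisy = apply_channel([E0, E1], [[1, 0], [0, 0]])    # acts on |0><0|
# The populations become (1-p, p): the qubit is flipped with probability p
```

Note that the trace-preserving condition \(\sum _i E_i^\dagger E_i = \mathbb {I}\) guarantees that the output still has unit trace.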
Given a noisy channel \(\mathcal {N}(\cdot )\), our goal is to obtain through a learning procedure an optimized neural network that receives noisy Bloch vectors \(\{\varvec{\tilde{r}}_k\}\), and outputs the corresponding noiseless vectors \(\{\varvec{r}_k\}\). In other words, we are looking for the function \(h(\cdot )\) which inverts the action of the noise on the Bloch components of the quantum states, namely

\(h\big (\varvec{\tilde{r}}_k\big ) = \varvec{r}_k \quad \forall \, k. \qquad (3)\)
2.2 Reconstruction with neural networks
We provide a concise overview of the fundamental aspects of neural networks, discussing their relevance in addressing the task of quantum state reconstruction.
2.2.1 Generation of the training set
The initial phase of addressing a regression problem involves constructing a valid dataset. In our specific case, the training (and validation) set consists of pairs of noisy and noiseless Bloch vectors

\(\mathcal {T} = \big \{ \big (\varvec{\tilde{r}}_m,\, \varvec{r}_m\big ) \big \}_{m=1}^{M}, \qquad (4)\)

which are obtained by evolving some input quantum states \(\rho _m\) through the noisy channel under investigation, thus obtaining the noisy states \(\tilde{\rho }_m\). The Bloch components of these density matrices are then computed as \(r_i = \textrm{Tr}[\rho \, P_i]\) for \(i=1,..., 4^n-1\) (see (1)).
The choice of the (generalized) Bloch vector as the dataset element is motivated by two reasons: first, each quantum state is characterized by its own vector, which grants a unique representation of the state; and second, a vector as input fits naturally in the processing structure of a feed-forward neural network.
The input quantum states we consider are uniformly distributed in the space of quantum states. For the case of pure states \(\rho _m = |{\psi _m}\rangle \langle {\psi _m}|\), these are obtained by sampling states \(|{\psi _m}\rangle \) from the Haar distribution (Edelman and Rao 2005; Meckes 2019), while for uniformly distributed mixed states, these can be generated either starting from uniformly distributed pure states by means of an appropriate rescaling (Harman and Lacko 2010; Rubinstein and Kroese 2016) or by using the Ginibre ensemble (Życzkowski et al. 2011; Ginibre 1965; Gulshen et al. 2019).
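A simple way to draw Haar-uniform single-qubit pure states is to normalize a vector of i.i.d. complex Gaussian amplitudes. The snippet below, our own sketch, also maps the sampled state to its Bloch vector, which for pure states lies on the surface of the unit sphere:

```python
import math
import random

def haar_random_qubit(rng):
    """Haar-random pure state (a, b): normalize i.i.d. complex Gaussians."""
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]

def bloch_of_pure(psi):
    """Bloch vector of rho = |psi><psi| for |psi> = (a, b)."""
    a, b = psi
    return [2 * (a.conjugate() * b).real,   # r_x = Tr[rho X]
            2 * (a.conjugate() * b).imag,   # r_y = Tr[rho Y]
            abs(a) ** 2 - abs(b) ** 2]      # r_z = Tr[rho Z]

rng = random.Random(42)
r = bloch_of_pure(haar_random_qubit(rng))
print(math.sqrt(sum(c * c for c in r)))   # 1.0 up to floating point: pure state
```

Pairing each such \(\varvec{r}_m\) with its noisy counterpart \(\varvec{\tilde{r}}_m\) yields one element of the training set.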
The cardinality \(|\mathcal {T}| = M\) of the dataset is contingent upon the specific problem under consideration and, as demonstrated in Section 3, has a direct impact on the performance of the network. As the quantum computational resources needed to generate the training set \(\mathcal {T}\) may be demanding experimentally, one generally has to find a compromise between achieving high reconstruction accuracies and the number of samples (i.e., quantum states) included in the dataset.
Note that, as currently formulated, our approach for state reconstruction involves training machine learning models separately for each type of noise map. To address the more general problem of reconstructing noisy states coming from various noise channels at the same time, one would need to incorporate additional input information into the network to make the problem unambiguous (the same noisy Bloch vector can be the result of different noise channels). This could include a numerical encoding representing the type of channel applied, thereby merging the two approaches proposed in this manuscript, namely the regression problem (Section 3.1) and the classification one (Section 3.2). We leave more general machine learning-based reconstruction methods as a subject of future studies.
2.2.2 Feed-forward neural networks
In this work, we analyze data using deep feed-forward neural networks, which are parametric models that process information in layer-wise fashion through a repeated application of similar operations, as shown in Fig. 1(a).
These neural networks consist of an input layer responsible for data loading, followed by multiple hidden layers to process the information, and finally, an output layer to obtain the computation’s result. Each layer consists of a set of individual nodes known as neurons, and while input and output layers have a number of neurons matching the dimension of the input and the output respectively, the number of neurons in the hidden layers is an architectural hyperparameter to be chosen in advance by the designer. For example, the action of a feed-forward neural network with two hidden layers and trainable parameters \(\varvec{\theta }\), can be expressed as

\(\hat{\varvec{y}} = \text {NN}_{\varvec{\theta }}(\varvec{x}) = \varvec{w}\, g\big (\varvec{W}^{[2]}\, g\big (\varvec{W}^{[1]} \varvec{x} + \varvec{b}^{[1]}\big ) + \varvec{b}^{[2]}\big ) + b, \qquad (5)\)
where \(\varvec{x} \in \mathbb {R}^d\) and \(\hat{\varvec{y}} \in \mathbb {R}^{p}\) are the input and output vectors, \(\varvec{W}^{[1]} \in \mathbb {R}^{h_1 \times d}\) and \(\varvec{b}^{[1]} \in \mathbb {R}^{h_1}\) are the trainable parameters for the first hidden layer, \(\varvec{W}^{[2]} \in \mathbb {R}^{h_2 \times h_1}\) and \(\varvec{b}^{[2]}\in \mathbb {R}^{h_2}\) are the trainable parameters for the second hidden layer, \(\varvec{w} \in \mathbb {R}^{p}\) and \(b \in \mathbb {R}\) are the trainable parameters for the output layer, and \(g(\cdot )\) is a non-linear activation function which is applied element-wise to the entries of the vectors. As previously mentioned, \(h_1\) and \(h_2\) are hyperparameters that represent the number of hidden neurons for each respective layer.
In our simulations, we explore different architectures, using networks with 2 or 3 hidden layers and \(h_i \in \{64, 128\}\) hidden neurons per layer, while the input and output layers have dimension \(d=p=4^n-1\), as they are employed to represent the components of the Bloch vectors. For the activation function, as customary in machine learning, we adopt the Rectified Linear Unit (ReLU), defined as \(g(x) = \text {ReLU}(x):= \max (0,x)\).
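For concreteness, a pure-Python forward pass of such a network (ReLU hidden layers, and a matrix-valued linear output layer with one neuron per Bloch component) could look as follows. All names, sizes, and the uniform initialization are our own illustrative choices, not the paper's setup:

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(W, b, x):
    """Affine layer: W x + b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(params, x):
    """Two ReLU hidden layers followed by a linear output layer."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(dense(W1, b1, x))
    h2 = relu(dense(W2, b2, h1))
    return dense(W3, b3, h2)          # predicted Bloch components

def init_layer(rows, cols, rng):
    return ([[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)],
            [0.0] * rows)

rng = random.Random(0)
d = 4 ** 1 - 1                        # n = 1 qubit: 3 Bloch components
h1, h2 = 64, 64
params = [init_layer(h1, d, rng), init_layer(h2, h1, rng), init_layer(d, h2, rng)]
y_hat = forward(params, [0.1, -0.2, 0.7])
print(len(y_hat))                     # 3: same dimension as the input
```

In practice one would use an automatic-differentiation framework such as TensorFlow, as the authors do, rather than hand-rolled layers.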
2.2.3 Performance metrics
Given the dataset and the trainable model, we discuss the figures of merit employed for training and evaluating the neural network’s performance in quantum state reconstruction and noise classification tasks. In the context of quantum state reconstruction, we have tested two possible alternatives coming from the classical and quantum information domains, respectively: the Mean Squared Error (MSE) between the reconstructed and ideal Bloch vectors, and the quantum infidelity between the reconstructed and original quantum states.
In noise classification problems, we used both categorical cross-entropy (15) and accuracy metrics (16) in order to assess how effectively the neural network can distinguish between different types of noisy channels.
MSE
The mean squared error is the most common measure of performance for regression problems in machine learning and consists of the squared Euclidean distance between vectors, which in our case becomes

\(\text {MSE}\big (\varvec{r}_i,\, \hat{\varvec{r}}_i(\varvec{\theta })\big ) = \big \Vert \varvec{r}_i - \hat{\varvec{r}}_i(\varvec{\theta })\big \Vert ^2, \qquad (6)\)

where \(\varvec{r}_i\) is the noiseless Bloch vector (see (4)), and \(\hat{\varvec{r}}_i(\varvec{\theta }) =\text {NN}_{\varvec{\theta }}(\varvec{\tilde{r}}_i)\) is the one predicted by the neural network, with trainable parameters \(\varvec{\theta }\), when receiving as input the noisy Bloch vector \(\varvec{\tilde{r}}_i\). Then, the mean squared error loss over the entire dataset \(\mathcal {T}\) of size \(|\mathcal {T}|=M\) is

\(\mathcal {L}_{\text {MSE}}(\varvec{\theta }) = \frac{1}{M}\sum _{i=1}^{M} \big \Vert \varvec{r}_i - \hat{\varvec{r}}_i(\varvec{\theta })\big \Vert ^2, \qquad (7)\)

with \((\varvec{\tilde{r}}_i, \varvec{r}_i) \in \mathcal {T}\).
Infidelity
The quantum fidelity is a measure of distance for quantum states, and given the quantum nature of the data under investigation, it is particularly suited to assess the reconstruction performances of the neural network. Given two density matrices \(\rho \) and \(\sigma \), their fidelity is defined as (Nielsen and Chuang 2010)

\(F(\rho , \sigma ) = \Big (\textrm{Tr}\sqrt{\sqrt{\rho }\, \sigma \sqrt{\rho }}\Big )^{2}, \qquad (8)\)

with \(0\le F(\rho , \sigma )\le 1\), where the second equality holds if and only if the states are equal, \(\rho = \sigma \). The infidelity between two quantum states is then defined as \(I(\rho , \sigma ):= 1 - F(\rho , \sigma )\).
Despite being suited to measuring the distance between quantum states, the complex functional dependence of the infidelity on the Bloch vectors of the density matrices — and hence on the parameters of the neural network — often leads to numerical instabilities when it is used as the loss function driving the training of the neural network, eventually impairing the optimization process. For this reason, when \(\rho \) and \(\sigma \) are pure states, we instead directly use the simplified but equivalent expression for the fidelity

\(F(\rho , \sigma ) = \textrm{Tr}[\rho \, \sigma ], \qquad (9)\)
while if the states are mixed, we use an alternative measure of distance proposed in Wang et al. (2008)
The fidelity in (10) reaches the maximum 1 if and only if \(\rho = \sigma \). However, it differs from the standard fidelity reported in (8), since it is not monotonic under quantum operations, meaning it is neither concave nor convex, and it includes a normalization factor, resulting in a scaled fidelity measure, when one of the two states is pure, i.e. if \(\sigma =|{\psi }\rangle \langle {\psi }|\), then \(F(\rho , |{\psi }\rangle \langle {\psi }|) = \langle \psi |\rho | \psi \rangle / \text {Tr}\rho ^{2}\).
From the fidelity expressions in (9) and (10), we calculate the corresponding infidelity and perform an average over the entire dataset, yielding the loss

\(\mathcal {L}_{I}(\varvec{\theta }) = \frac{1}{M}\sum _{i=1}^{M} \big [ 1 - F(\rho _i, \sigma _i) \big ], \qquad (11)\)

where \(\rho _i\) and \(\sigma _i\) are the density matrices computed respectively from the Bloch vectors \(\varvec{r}_i\) and \(\hat{\varvec{r}}_i(\varvec{\theta }) =\text {NN}_{\varvec{\theta }}(\varvec{\tilde{r}}_i)\), with \((\varvec{\tilde{r}}_i, \varvec{r}_i) \in \mathcal {T}\).
Moreover, it is worth noticing that for single-qubit density matrices the fidelity (8) can be further simplified and expressed in terms of the Bloch vectors as (Jozsa 1994)

\(F(\rho , \sigma ) = \frac{1}{2}\Big (1 + \varvec{r}\cdot \varvec{s} + \sqrt{\big (1-\Vert \varvec{r}\Vert ^2\big )\big (1-\Vert \varvec{s}\Vert ^2\big )}\Big ), \qquad (12)\)

where \(\varvec{r}, \varvec{s} \in \mathbb {R}^3\) are the Bloch vectors of \(\rho \) and \(\sigma \), respectively. In the particular case when both states are pure, \(\Vert {\varvec{r}}\Vert ^{2}=\Vert {\varvec{s}}\Vert ^{2} = 1\), the infidelity corresponds to the mean squared error up to a prefactor,

\(I(\rho , \sigma ) = \frac{1}{2}\big (1 - \varvec{r}\cdot \varvec{s}\big ) = \frac{1}{4}\Vert \varvec{r} - \varvec{s}\Vert ^2. \qquad (13)\)
We refer to Appendix B for more details on the derivation of (12) and (13). Notably, to the best of our knowledge, apart from the single-qubit case, there is no straightforward connection between the fidelity of quantum states and the Euclidean distance of their generalized Bloch vectors.
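The relation between infidelity and mean squared error for pure single-qubit states is easy to check numerically. The sketch below, our own, uses Jozsa's Bloch-vector fidelity formula \(F = \frac{1}{2}\big (1 + \varvec{r}\cdot \varvec{s} + \sqrt{(1-\Vert \varvec{r}\Vert ^2)(1-\Vert \varvec{s}\Vert ^2)}\big )\):

```python
import math
import random

def fidelity_bloch(r, s):
    """Single-qubit fidelity from Bloch vectors (Jozsa's formula)."""
    dot = sum(ri * si for ri, si in zip(r, s))
    nr = sum(ri * ri for ri in r)
    ns = sum(si * si for si in s)
    return 0.5 * (1 + dot + math.sqrt(max(0.0, (1 - nr) * (1 - ns))))

def random_unit_vector(rng):
    """A random point on the Bloch sphere surface (a pure state)."""
    v = [rng.gauss(0, 1) for _ in range(3)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

rng = random.Random(7)
r, s = random_unit_vector(rng), random_unit_vector(rng)
infidelity = 1 - fidelity_bloch(r, s)
mse = sum((ri - si) ** 2 for ri, si in zip(r, s))
print(abs(infidelity - mse / 4) < 1e-9)   # True: I = MSE / 4 for pure states
```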
Finally, in order to assess the performance of the optimized neural network, we introduce the average test fidelity (ATF), defined as the mean fidelity between the predicted quantum states and their ideal counterparts, averaged over a test dataset \(\mathcal {\tilde{T}}\) which was not used during training. The ATF is calculated as

\(\text {ATF} = \frac{1}{N}\sum _{i=1}^{N} F(\rho _i, \sigma _i), \qquad (14)\)

where N is the cardinality of the test set, \(F(\cdot ,\,\cdot )\) is the fidelity in (8), and \(\rho _i\) and \(\sigma _i\) are the density matrices computed respectively from the Bloch vectors \(\varvec{r}_i\) and \(\hat{\varvec{r}}_i=\text {NN}_{\varvec{\theta }}(\varvec{\tilde{r}}_i)\), with \((\varvec{\tilde{r}}_i, \varvec{r}_i) \in \mathcal {\tilde{T}}\). A high average test fidelity, typically exceeding 99.9%, indicates that our neural network is capable of accurately reconstructing the corrupted quantum states.
Categorical cross-entropy
The categorical cross-entropy is one of the most common measures to evaluate the performance of classification models (Goodfellow et al. 2016). Given a classification task with C different classes, the categorical cross-entropy quantifies the disparity between the predicted probability distribution of the classes and the true class labels, and it is mathematically defined as

\(\mathcal {L}_{\text {CE}} = -\frac{1}{M}\sum _{i=1}^{M}\sum _{j=1}^{C} y_{ij} \log p_{ij}, \qquad (15)\)
where M is the number of samples to be classified, \(y_{i} = (0,0,..., 1_{c(i)},..., 0) \in \{0,1\}^C\) is the true probability distribution for the samples indicating that the i-th sample belongs to class c(i), and \(p_{ij} \in [0,\,1]\) is the model’s predicted probability distribution for the i-th sample to belong to the j-th class. This metric penalizes the deviation between predicted and actual distributions, with lower values indicating better alignment between the model’s predictions and the true labels.
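A direct implementation of this loss for one-hot labels reads as follows; the sample predictions are made up for illustration:

```python
import math

def categorical_cross_entropy(y_true, y_pred):
    """Mean over samples of -sum_j y_ij * log(p_ij)."""
    per_sample = [-sum(yj * math.log(pj) for yj, pj in zip(y, p) if yj > 0)
                  for y, p in zip(y_true, y_pred)]
    return sum(per_sample) / len(per_sample)

# Two samples, C = 2 classes, one-hot true labels
y_true = [[1, 0], [0, 1]]
y_pred = [[0.9, 0.1], [0.2, 0.8]]
loss = categorical_cross_entropy(y_true, y_pred)
print(round(loss, 4))   # 0.1643 = -(ln 0.9 + ln 0.8) / 2
```

Only the predicted probability assigned to the true class contributes to the sum, so confident correct predictions drive the loss toward zero.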
Accuracy
For classification tasks the evaluation of performance commonly revolves around the metric of accuracy, which quantifies the effectiveness of a model in correctly categorizing samples within a dataset. Mathematically, the accuracy is defined as the ratio of the number of correctly classified samples to the total number of samples, namely

\(\text {ACC} = \frac{1}{M}\sum _{i=1}^{M} \mathbb {1}\big [\hat{y}_i = y_i\big ], \qquad (16)\)

where \(\mathbb {1}[\cdot ]\) is the indicator function, \(\hat{y}_i\) is the predicted class for the i-th sample, and a value of \(\text {ACC}=1\) indicates a perfect classification.
2.2.4 Optimization
Training the neural network means solving the minimization problem

\(\varvec{\theta }^{*} = \mathop {\text {arg min}}\limits _{\varvec{\theta }}\ \mathcal {L}(\varvec{\theta }; \mathcal {T}), \qquad (17)\)
where \(\varvec{\theta }\) are the trainable parameters of the neural network, \(\mathcal {T}\) is the training dataset as defined in (4), and \(\mathcal {L}\) is the loss function driving the learning process, which, as discussed in the previous section, in our case is either the mean squared error (7) or the average infidelity (11).
3 Results
We now present the results obtained for quantum state reconstruction and quantum noise classification using the proposed neural network methods. We start by discussing the performance of the neural network in reconstructing noise-free quantum states corrupted by various single- and two-qubit noisy channels, and then proceed to showcase the network’s capability to accurately classify different quantum noisy channels. Our results demonstrate high-fidelity state reconstruction and robust channel classification, thus revealing the potential of machine learning techniques for quantum information processing and computation.
All simulations performed in this work are run with Qiskit (Qiskit contributors 2023) and TensorFlow (Abadi et al. 2015).
3.1 Quantum state reconstruction
In order to explore the reconstruction capabilities of the neural network approach, we first study its performance in the simpler case of learning noisy single-qubit states, and then move on to the more complex case of multi-qubit systems. In both scenarios, we see a clear dependence of the performance on the amount of data available to train the model (the size of the training set), and, provided that the data is sufficient, the network is always able to restore noiseless quantum states from their noisy counterparts.
Whenever we consider the task of reconstructing initially pure states, an auxiliary normalization layer is added at the end of the neural network (5) so that the outputs always consist of (Bloch) vectors with unit norm (for single-qubit states), as required for pure states. Such a normalization constraint enforces the generation of physically consistent output states, which in turn effectively constrains the value of the infidelity loss function to remain in the physical regime. It is worth noting that even when the MSE is employed as the loss function, the same normalization layer is used to ensure that the predicted states maintain their physical integrity and adhere to the desired constraints. The effect of the normalization layer is simply to rescale each output as

\(\hat{\varvec{r}} \rightarrow \frac{\hat{\varvec{r}}}{\Vert \hat{\varvec{r}}\Vert }. \qquad (18)\)
In addition, whenever we consider the case of reconstructing already mixed states \(\rho \) of given purity \(\text {Tr}[\rho ^2]\), the output of the neural network is additionally rescaled by the square root of the purity of the initial mixed state, \(\hat{\varvec{r}} \rightarrow \sqrt{\text {Tr}[\rho ^2]}\, \hat{\varvec{r}} \).
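The normalization layer and the purity rescaling described above amount to the following post-processing of the raw network output; this is our own sketch of the operation, with made-up output values:

```python
import math

def normalize_output(r_hat, target_norm=1.0):
    """Rescale the raw output so its Bloch vector has the desired norm
    (1 for single-qubit pure states)."""
    norm = math.sqrt(sum(x * x for x in r_hat))
    return [target_norm * x / norm for x in r_hat]

out = normalize_output([0.3, -0.6, 0.2])            # made-up raw network output
print(math.sqrt(sum(x * x for x in out)))           # 1.0: a pure-state Bloch vector

# For mixed targets of known purity Tr[rho^2], the normalized output is
# additionally shrunk by sqrt(Tr[rho^2]), as described in the text:
purity = 0.8
out_mixed = [math.sqrt(purity) * x for x in out]
```

Passing a different `target_norm` covers the multi-qubit pure-state case, where the Bloch vectors lie on a sphere of radius larger than one.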
3.1.1 Single-qubit states
In Table 1 we summarize the results of the reconstruction process for various noisy channels, using different loss functions to drive the training process, and using pure or mixed states as inputs to the neural network. In all cases we observe a very good average test fidelity (ATF) (14) at the end of training, exceeding 99.6%, thus showing the effectiveness of the proposed approach for reconstructing noisy states. As we show next, equally good performances are obtained in the more complex task of inverting noise in multi-qubit systems.
3.1.2 Multi-qubit states
We tested the reconstruction procedure also on systems of \(n=2\) and \(n=3\) qubits undergoing several noisy evolutions with both uncorrelated and correlated noisy channels, and we summarize the results in Table 2. For two-qubit systems, we considered the following noise maps: (i) a phase-flip channel with \(p=0.2\) applied to the first qubit and the identity on the second, indicated as \(\mathcal {Z}_p \otimes \mathcal {I}\); (ii) a phase-flip channel applied to both qubits with \(p=0.2\), denoted as \(\mathcal {Z}_p \otimes \mathcal {Z}_p\); (iii) a phase-flip channel on the first qubit and a bit-flip channel on the second qubit, both with \(p=0.2\), denoted as \(\mathcal {Z}_p \otimes \mathcal {X}_p\); and (iv) a correlated two-qubit amplitude damping channel \(\mathcal {C}_{AD}\) (\(\eta = 0.1\), \(\mu = 0.2\)) applied to the system (see (29) in Appendix A for a definition of the channel).
For \(n=3\) qubit systems instead, we tested the reconstruction performance with states subject to the composite channel \(\mathcal {X}_p \otimes \mathcal {Z}_p \otimes \mathcal {Y}_p\). In this configuration, each qubit experiences a distinct quantum channel, including a bit-flip, phase-flip, and bit-phase-flip channel with a common noise parameter of \(p=0.2\).
The results reported in Table 2 again reveal a successful reconstruction of the ideal density matrices through the use of a relatively simple feed-forward neural network, and thus confirm the effectiveness of the proposed method. As mentioned previously, we stress again that, to ensure the production of pure quantum states as output, a normalization layer has been incorporated. Specifically, by appropriately rescaling the normalization layer in (18), the norm of the output Bloch vectors is constrained to \(\sqrt{3}\) for two-qubit states, while it is set to \(\sqrt{7}\) (Avron and Kenneth 2020) for three-qubit states. Comparing the cardinalities of the training sets \(|{\mathcal {T}}|\) used for the simulations in Tables 1 and 2, we see that more samples are generally needed to ensure a good reconstruction for larger system sizes, as one would expect. A discussion of the impact of the available information on the reconstruction performances is the topic of the next section.
In Fig. 2 we report the evolution of the MSE (7) and infidelity (11) loss functions during training, for the case of \(\mathcal {Z}_p \otimes \mathcal {X}_p\) applied to a two-qubit system. As is clear from the figure, the optimization of both cost functions is straightforward, and interestingly they follow a similar minimization behavior, even though, unlike the single-qubit case (see (13)), there is no simple relation between the two.
We conclude by noting that, since an increase in the number of qubits requires a corresponding increase in the dataset cardinality and in the neural network complexity, the reconstruction approach proposed in this work can be straightforwardly applied to any n-qubit quantum state.
3.1.3 Impact of the dataset size on the reconstruction performances
So far we have focused on assessing the reconstruction performances of the neural network assuming that enough data is available; here we instead analyze how such performances depend on the size of the training set.
Fig. 3 Average test fidelity (ATF) obtained at the end of training when optimizing the neural networks with training sets of different cardinality. For each cardinality, we repeat the training process 5 times using different initializations of the parameters and different training data, and report the mean value and the standard deviation of the resulting ATFs. (a) Reconstruction of single-qubit states undergoing a phase-flip channel \(\mathcal {Z}_{p}(p=0.2)\). (b) Reconstruction of two-qubit states undergoing the uncorrelated phase-flip channel \(\mathcal {Z}_{p} \otimes \mathcal {Z}_{p}(p=0.2)\).
In Fig. 3 we report the average test fidelity obtained at the end of training, for neural networks optimized using training sets of different cardinality. In Fig. 3a we show the data for the reconstruction of single-qubit states undergoing a phase-flip channel \(\mathcal {Z}_{0.2}\), and in Fig. 3b the reconstruction of two-qubit states undergoing an uncorrelated two-qubit phase-flip channel \(\mathcal {Z}_{0.2} \otimes \mathcal {Z}_{0.2}\).
In both cases — and for both the considered loss functions — we observe that the use of a larger training set yields better reconstruction performances until a plateau is reached, but importantly also that satisfactory results can be achieved even with a limited number of samples.
3.2 Classification of noisy channels
The application of neural networks to quantum information processing can be extended beyond quantum state reconstruction to the classification of quantum channels. In particular, in this section we show how a neural network can be trained to discriminate between noisy channels based on the effect they have on input states, a scenario which is graphically depicted in Fig. 1. Our exploration encompasses a series of classification scenarios, including binary and multi-class classification.
In such classification problems, each data item in the training set is constructed by considering as input the noiseless Bloch vector \(\varvec{r}_m\) appended to the noisy one \(\varvec{\tilde{r}}_m\), and as output an integer label \(y_m\) encoding which type of error was applied to \(\varvec{r}_m\) to obtain \(\varvec{\tilde{r}}_m\). Formally, given a classification task with C possible classes, the training dataset is then defined as

\(\mathcal {T}_{IN} = \big \{ \big (\big [\varvec{\tilde{r}}_m,\, \varvec{r}_m\big ],\, y_m\big ) \big \}_{m=1}^{M},\)
where \(\big [\varvec{\tilde{r}}_m, \varvec{r}_m\big ] \in \mathbb {R}^{2 \times 3}\) is the input to the neural network, and \(y_m \in \{1,...,C\}\) is the desired output. We refer to this training dataset with \(\mathcal {T}_{IN}\), where the subscript indicates that we use as input the extended vector obtained by merging the ideal and noisy Bloch vector.
Furthermore, we consider the more complex scenario where every training data point comprises only the noisy vector \(\varvec{\tilde{r}}_m\), accompanied by its associated noise label \(y_m\). As the neural network is now provided with less information, this scenario presents a higher level of complexity and, as will become clear from the following results, generally necessitates a larger number of data points. We refer to this type of training dataset with \(\mathcal {T}_{N}\), where the subscript indicates that we use as input exclusively the noisy Bloch vector.
As standard with classification tasks in machine learning, for both scenarios we use a one-hot encoding of the labels and train the neural networks using the categorical cross-entropy (15) as loss function (Goodfellow et al. 2016), and then measure the final performances with the accuracy metric (16).
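Putting these conventions together, a single training pair can be assembled as below. This is our own sketch, and the Bloch vectors shown are made-up illustrative values, not data from the paper:

```python
def one_hot(label, num_classes):
    """One-hot encoding of an integer class label."""
    v = [0] * num_classes
    v[label] = 1
    return v

r_noisy = [0.0, 0.0, 0.6]          # made-up noisy Bloch vector
r_ideal = [0.0, 0.0, 1.0]          # made-up noiseless Bloch vector

x_in = r_noisy + r_ideal           # T_IN input: concatenated vectors (dim 6)
x_n = r_noisy                      # T_N input: noisy vector only (dim 3)
y = one_hot(0, 2)                  # channel class 0 out of C = 2
print(len(x_in), len(x_n), y)      # 6 3 [1, 0]
```

The network's softmax output is then compared against `y` with the categorical cross-entropy during training.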
3.2.1 Binary classification
We consider the binary classification problem (\(C=2\)) of discriminating single-qubit states subject either to phase-flip channel \(\mathcal {Z}_p\), or amplitude damping channel \(\mathcal {A}_{p\gamma }\). The training set is generated by sampling a number \(|{\mathcal {T}}|\) of random uniform single-qubit states and evolving half of them with the phase-flip channel, and the remaining half with the amplitude damping one. Classification accuracies at the end of the training procedure are reported in Table 3.
Fig. 4 Deformed Bloch spheres obtained by applying a phase-flip channel \(\mathcal {Z}_p\, (p=0.2)\) (light blue) and a generalized amplitude damping channel \(\mathcal {A}_{p\gamma }\, (p=0.5,\, \gamma =0.3)\) to a set of uniformly distributed pure states. Note the non-trivial intersection between the two ellipsoids
Remarkably, with a training set containing \(|{\mathcal {T}_{IN}}|=300\) samples, our model exhibits very good classification performance, reaching perfect accuracy \(\text {ACC}=1\). In this case, we noticed that even with a reduced dataset comprising just a few dozen samples, the model is able to achieve almost perfect accuracy, even though the learning process becomes perceptibly less stable.
On the other hand, with the noisy training set \(\mathcal {T}_N\) with \(|{\mathcal {T}_N}|=300\) samples, the best accuracy obtained was \(\text {ACC}=0.92\). As expected, as the noisy dataset \(\mathcal {T}_N\) contains only information about the noisy Bloch vectors (and a label for the noisy channel that created them), more data is needed to reach good classification performances. Indeed, higher accuracies can then be obtained by using larger training datasets: for example, an accuracy of \(\text {ACC}=0.98\) can be achieved with a training set containing 800 samples.
Note that when using dataset \(\mathcal {T}_N\), learning the relation between noisy Bloch vectors belonging to the same channel implicitly amounts to reconstructing the shape of the deformed Bloch sphere generated by each of the two channels, shown in Fig. 4. However, as is clear from the graphical representation, the two ellipsoids have a non-trivial intersection, which implies that certain samples could reasonably be assigned to either class. The imperfect accuracy, and more generally the decrease in classification accuracy for the noise-only training dataset \(\mathcal {T}_N\) shown in Table 3, can then be explained by this inherent ambiguity in the dataset, which makes the classification task harder, if not impossible, to solve exactly.
3.2.2 Multi-class classification
As a straightforward extension of the previous analysis, we also report results for a multi-class classification task, where the network is asked to classify states generated by three different channels (phase-flip, amplitude damping, and depolarizing), using a dataset of type \(\mathcal {T}_{IN}\), that is, containing both ideal and noisy Bloch vectors. We find again that the network is able to perfectly classify all the states, reaching a perfect final accuracy of \(\text {ACC}=1\) on the test set, as reported in Table 3.
4 Conclusion
In conclusion, our research underscores the remarkable effectiveness of deep neural networks in quantum information processing tasks, specifically in reconstructing and classifying quantum states undergoing unknown noisy evolutions.
As exemplified by our results, deep networks successfully recover the ideal (generalized) Bloch vectors with fidelities exceeding 0.99, even for quantum states of up to three qubits, under different correlated and uncorrelated noisy channels, and using both a classical (mean squared error) and a quantum-inspired (infidelity) loss function for training.
Furthermore, our investigation demonstrates the versatility of the neural network approach in classification problems, which handles a wide range of noise patterns and consistently achieves high classification accuracy across all test samples. Notably, when discriminating between phase-flip and amplitude damping channels, our model reaches a classification accuracy of 98% even when presented with the noisy vectors alone, highlighting its capacity to discern the relationships between states affected by similar noise sources.
Several interesting research directions are available to further extend the results obtained in our work. For instance, one could evaluate the performance of the proposed machine learning method using noise models found on real hardware, or directly with data coming from a real quantum device. Additionally, due to the exponential growth of the input size (i.e., the Bloch vectors) with the number of qubits, this method is currently practical only for qubit systems of moderate size, typical of the current NISQ era. Addressing and assessing the scalability of our method to larger system sizes, as well as testing it on realistic noise channels, is essential for advancing its applicability. We consider these important topics for future research.
As we look ahead, an intriguing avenue for further exploration lies in examining the intricate connections between various fidelity measures (Liang et al. 2019) used as training loss functions and their impact on the resulting test fidelities. This pursuit aims to identify the fidelity metric best suited to the specific characteristics of the problem at hand. Such investigations promise to advance our understanding of quantum information processing and open new horizons for practical applications in quantum technology.
Data Availability
The code that supports the findings of this study is available on a public repository on GitHub, accessible at https://github.com/MorgilloR/QuantumStateReconstruction-DL.
Notes
The fidelity (8) can be larger than 1 if the Bloch vectors have norms exceeding one, which is not possible for density matrices of quantum systems. If no constraint on the Bloch vectors is present, then the training process will simply push the neural network to output states of larger and larger norm to maximize the overlap.
References
Abadi M et al (2015) TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. https://www.tensorflow.org/
Ahmed S, Muñoz CS, Nori F, Kockum AF (2021) Classification and reconstruction of optical quantum states with deep neural networks. Physical Review Research. 3(3):033278. https://doi.org/10.1103/PhysRevResearch.3.033278
Avron J, Kenneth O (2020) An elementary introduction to the geometry of quantum states with pictures. Rev Math Phys 32(02):2030001. https://doi.org/10.1142/S0129055X20300010
Bravyi S, Sheldon S, Kandala A, Mckay DC, Gambetta JM (2021) Mitigating measurement errors in multiqubit experiments. Phys Rev A 103(4):042605. https://doi.org/10.1103/PhysRevA.103.042605
Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds.) Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., pp. 1877–1901. https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
Cai Z, Babbush R, Benjamin SC, Endo S, Huggins WJ, Li Y, McClean JR, O’Brien TE (2023) Quantum error mitigation. Rev Mod Phys 95(4):045005. https://doi.org/10.1103/RevModPhys.95.045005
Czarnik P, Arrasmith A, Coles PJ, Cincio L (2021) Error mitigation with Clifford quantum-circuit data. Quantum. 5:592 https://doi.org/10.22331/q-2021-11-26-592
D’Arrigo A, Benenti G, Falci G, Macchiavello C (2013) Classical and quantum capacities of a fully correlated amplitude damping channel. Phys Rev A 88:042337. https://doi.org/10.1103/PhysRevA.88.042337
Deng J, Lin Y (2022) The benefits and challenges of chatgpt: An overview. Frontiers in Computing and Intelligent Systems 2(2):81–83 https://doi.org/10.54097/fcis.v2i2.4465
Edelman A, Rao NR (2005) Random matrix theory. Acta Numer 14:233–297. https://doi.org/10.1017/S0962492904000236
Ginibre J (1965) Statistical ensembles of complex, quaternion, and real matrices. J Math Phys 6(3):440–449. https://doi.org/10.1063/1.1704292
Giurgica-Tiron T, Hindy Y, LaRose R, Mari A, Zeng WJ (2020) Digital zero noise extrapolation for quantum error mitigation. In: 2020 IEEE International Conference on Quantum Computing and Engineering (QCE), pp. 306–316. IEEE. https://doi.org/10.1109/QCE49297.2020.00045
Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. MIT Press, Cambridge, MA. http://www.deeplearningbook.org
Gulshen K, Combes J, Harrigan MP, Karalekas PJ, Silva MP, Alam MS, Brown A, Caldwell S, Capelluto L, Crooks G, Girshovich D, Johnson BR, Peterson EC, Polloreno A, Rubin NC, Ryan CA, Staley A, Tezak NA, Valery J (2019). Forest Benchmarking: QCVV using PyQuil. https://doi.org/10.5281/zenodo.3455847
Harman R, Lacko V (2010) On decompositional algorithms for uniform sampling from n-spheres and n-balls. J Multivar Anal 101(10):2297–2304. https://doi.org/10.1016/j.jmva.2010.06.002
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. https://doi.org/10.1109/CVPR.2016.90
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366. https://doi.org/10.1016/0893-6080(89)90020-8
Jozsa R (1994) Fidelity for mixed quantum states. J Mod Opt 41(12):2315–2323. https://doi.org/10.1080/09500349414552171
Kamath U, Liu J, Whitaker J (2019) Deep Learning for NLP and Speech Recognition, vol. 84. Springer, Cham, Switzerland. https://doi.org/10.1007/978-3-030-14596-5
Kandala A, Temme K, Córcoles AD, Mezzacapo A, Chow JM, Gambetta JM (2019) Error mitigation extends the computational reach of a noisy quantum processor. Nature 567(7749):491–495. https://doi.org/10.1038/s41586-019-1040-7
Kim C, Park KD, Rhee J-K (2020) Quantum error mitigation with artificial neural network. IEEE Access. 8:188853–188860. https://doi.org/10.1109/ACCESS.2020.3031607
Kim J, Oh B, Chong Y, Hwang E, Park DK (2022) Quantum readout error mitigation via deep learning. New J Phys 24(7):073009. https://doi.org/10.1088/1367-2630/ac7b3d
Kim Y, Eddins A, Anand S, Wei KX, Berg E, Rosenblatt S, Nayfeh H, Wu Y, Zaletel M, Temme K, Kandala A (2023) Evidence for the utility of quantum computing before fault tolerance. Nature 618(7965):500–505. https://doi.org/10.1038/s41586-023-06096-3
Liang Y-C, Yeh Y-H, Mendonça PE, Teh RY, Reid MD, Drummond PD (2019) Quantum fidelity measures for mixed states. Rep Prog Phys 82(7):076001. https://doi.org/10.1088/1361-6633/ab1ca4
Lohani S, Kirby BT, Brodsky M, Danaci O, Glasser RT (2020) Machine learning assisted quantum state estimation. Machine Learning: Science and Technology. 1(3):035007. https://doi.org/10.1088/2632-2153/ab9a21
Lowe A, Gordon MH, Czarnik P, Arrasmith A, Coles PJ, Cincio L (2021) Unified approach to data-driven quantum error mitigation. Physical Review Research. 3(3):033098. https://doi.org/10.1103/PhysRevResearch.3.033098
Lumino A, Polino E, Rab AS, Milani G, Spagnolo N, Wiebe N, Sciarrino F (2018) Experimental phase estimation enhanced by machine learning. Phys Rev Appl 10(4):044033. https://doi.org/10.1103/PhysRevApplied.10.044033
Mangini S, Maccone L, Macchiavello C (2022) Qubit noise deconvolution. EPJ Quantum. Technology 9(1):29. https://doi.org/10.1140/epjqt/s40507-022-00151-0
Meckes ES (2019) The Random Matrix Theory of the Classical Compact Groups. Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge. https://doi.org/10.1017/9781108303453
Morgillo AR, Mangini S (2024) QuantumStateReconstruction-DL. https://github.com/MorgilloR/QuantumStateReconstruction-DL
Nielsen MA, Chuang IL (2010) Quantum Computation and Quantum Information. Cambridge University Press, Cambridge, UK. https://doi.org/10.1017/CBO9780511976667
Ozols M, Mančinska L (2007) Generalized Bloch Vector and the Eigenvalues of a Density Matrix. Available online at: http://home.lu.lv/~sd20008/papers/essays.html
Qiskit contributors (2023) Qiskit: An open-source framework for quantum computing. https://doi.org/10.5281/zenodo.2573505
Roncallo S, Maccone L, Macchiavello C (2023) Multiqubit noise deconvolution and characterization. Phys Rev A 107:022419. https://doi.org/10.1103/PhysRevA.107.022419
Rubinstein RY, Kroese DP (2016) Simulation and the Monte Carlo Method. John Wiley & Sons, Hoboken, NJ, United States. https://doi.org/10.1002/9781118631980
Sack SH, Egger DJ (2023) Large-scale quantum approximate optimization on non-planar graphs with machine learning noise mitigation. Preprint at https://arxiv.org/abs/2307.14427
Scholten T, Liu Y-K, Young K, Blume-Kohout R (2019) Classifying single-qubit noise using machine learning. Preprint at https://arxiv.org/abs/1908.11762
Schrittwieser J, Antonoglou I, Hubert T, Simonyan K, Sifre L, Schmitt S, Guez A, Lockhart E, Hassabis D, Graepel T, Lillicrap T, Silver D (2020) Mastering atari, go, chess and shogi by planning with a learned model. Nature 588(7839):604–609. https://doi.org/10.1038/s41586-020-03051-4
Smith AWR, Khosla KE, Self CN, Kim MS (2021) Qubit readout error mitigation with bit-flip averaging. Sci Adv 7(47):8009. https://doi.org/10.1126/sciadv.abi8009
Torlai G, Mazzola G, Carleo G, Mezzacapo A (2020) Precise measurement of quantum observables with neural-network estimators. Physical Review Research. 2(2):022060. https://doi.org/10.1103/PhysRevResearch.2.022060
Van Den Berg E, Minev ZK, Temme K (2022) Model-free readout-error mitigation for quantum expectation values. Phys Rev A 105:032620. https://doi.org/10.1103/PhysRevA.105.032620
Van Den Berg E, Minev ZK, Kandala A, Temme K (2023) Probabilistic error cancellation with sparse Pauli-Lindblad models on noisy quantum processors. Nat Phys 1–6. https://doi.org/10.1038/s41567-023-02042-2
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Lu, Polosukhin I (2017) Attention is all you need. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Wallnöfer J, Melnikov AA, Dür W, Briegel HJ (2020) Machine learning for long-distance quantum communication. PRX Quantum. 1(1):010301. https://doi.org/10.1103/PRXQuantum.1.010301
Wang X, Yu C-S, Yi XX (2008) An alternative quantum fidelity for mixed states of qudits. Phys Lett A 373(1):58–60. https://doi.org/10.1016/j.physleta.2008.10.083
Zlokapa A, Gheorghiu A (2020) A deep learning model for noise prediction on near-term quantum devices. Preprint at https://arxiv.org/abs/2005.10811
Życzkowski K, Penson KA, Nechita I, Collins B (2011) Generating random density matrices. J Math Phys 52(6). https://doi.org/10.1063/1.3595693
Funding
Open access funding provided by Universitá degli Studi di Pavia within the CRUI-CARE Agreement. A.R.M. acknowledges support from the PNRR MUR Project PE0000023-NQSTI. C.M. acknowledges support from the EU2020 QuantERA project QuICHE and from the MUR PRIN project 2022SW3RPY.
Author information
Authors and Affiliations
Contributions
C.M. and S.M. conceived the presented idea. A.R.M. performed the computations, and wrote the majority of the paper. S.M. contributed to the conceptualization, verified analytical methods, and provided significant contributions to the writing. M.P. assisted in developing simulation models and supervised findings. All authors discussed the results and contributed to the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Noise channels
In this appendix we present the quantum noise channels used in the simulations to generate the datasets of noisy Bloch vectors.
1.1 A.1 Bit-flip channel
The bit-flip channel flips the qubit state from \(|0\rangle \) to \(|1\rangle \) and vice versa with probability p. Given the operator-sum representation in (2), its Kraus operators are
\[ E_0 = \sqrt{1-p}\,\mathbb {I}, \qquad E_1 = \sqrt{p}\, X. \]
It is possible to visualize the deformation of the Bloch sphere induced by the bit-flip noise: the states on the \(\hat{x}\) axis are left untouched, while the components in the \(\hat{y}-\hat{z}\) plane are uniformly contracted by a factor \(1-2p\).
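The Bloch-sphere action of the bit-flip channel can be checked numerically with the standard Kraus operators \(E_0 = \sqrt{1-p}\,\mathbb {I}\) and \(E_1 = \sqrt{p}\,X\); this sketch is illustrative and not taken from the paper's code.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def apply_channel(rho, kraus):
    """Operator-sum representation: rho -> sum_k E_k rho E_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def bloch(rho):
    """Bloch vector r_i = Tr(rho P_i)."""
    return np.real([np.trace(rho @ P) for P in (X, Y, Z)])

p = 0.1
kraus = [np.sqrt(1 - p) * I2, np.sqrt(p) * X]
rho = 0.5 * (I2 + 0.2*X + 0.3*Y + 0.4*Z)   # Bloch vector (0.2, 0.3, 0.4)
rx, ry, rz = bloch(apply_channel(rho, kraus))
# the x component is preserved; y and z are multiplied by 1 - 2p
```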
1.2 A.2 Phase-flip channel
This channel changes the sign of the component associated with the computational basis element \(|1\rangle \) of the qubit state. The channel is represented by the operation elements
\[ E_0 = \sqrt{1-p}\,\mathbb {I}, \qquad E_1 = \sqrt{p}\, Z. \]
On the Bloch sphere, the \(\hat{x}-\hat{y}\) plane is contracted by a factor \(1-2p\), while the states on the \(\hat{z}\) axis are left untouched.
1.3 A.3 Bit-phase-flip channel
This channel is a combination of a bit-flip and a phase-flip channel. Recalling that \(Y=iXZ\), the Kraus operators of this channel are
\[ E_0 = \sqrt{1-p}\,\mathbb {I}, \qquad E_1 = \sqrt{p}\, Y. \]
This noise acts on the Bloch sphere by leaving untouched the states on the \(\hat{y}\) axis and contracting the \(\hat{x}-\hat{z}\) plane by a factor \(1-2p\).
1.4 A.4 General Pauli channel
The general Pauli channel is a combination of bit-, phase- and bit-phase-flip channels, each one with its own intensity on the corresponding axis (Mangini et al. 2022). In this case, the operators are
\[ E_0 = \sqrt{p_0}\,\mathbb {I}, \quad E_1 = \sqrt{p_1}\, X, \quad E_2 = \sqrt{p_2}\, Y, \quad E_3 = \sqrt{p_3}\, Z, \]
where \(p_1\), \(p_2\), and \(p_3\) are the probabilities of each error, with \(p_0 + p_1 + p_2 + p_3 = 1\).
1.5 A.5 Depolarizing channel
The depolarizing channel acts on a quantum state by leaving it untouched with probability \(1-p\), or replacing it with the completely mixed state \(\mathbb {I}/2\) with probability p. Its Kraus operators are
\[ E_0 = \sqrt{1-\tfrac{3p}{4}}\,\mathbb {I}, \quad E_1 = \frac{\sqrt{p}}{2}\, X, \quad E_2 = \frac{\sqrt{p}}{2}\, Y, \quad E_3 = \frac{\sqrt{p}}{2}\, Z. \]
In this case the entire Bloch sphere is uniformly contracted by a factor \(1-p\). The channel can be generalized to d-dimensional quantum systems (\(d=2^n\) for n qubits) as
\[ \mathcal {D}_p(\rho ) = (1-p)\,\rho + p\,\frac{\mathbb {I}}{d}. \]
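On a Bloch vector, the depolarizing map \(\rho \mapsto (1-p)\rho + p\,\mathbb {I}/2\) is simply a uniform contraction; a one-line illustrative sketch:

```python
import numpy as np

def depolarize_bloch(r, p):
    """Depolarizing channel on a single-qubit Bloch vector:
    uniform contraction toward the origin by a factor 1 - p."""
    return (1 - p) * np.asarray(r, dtype=float)

r = np.array([0.6, 0.0, 0.8])        # a pure state on the sphere
r_noisy = depolarize_bloch(r, 0.25)  # shrunk to 3/4 of its length
```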
1.6 A.6 Generalized amplitude damping channel
This channel can be used to describe the energy dissipation of a quantum system into an environment at finite temperature (Nielsen and Chuang 2010), and can be described with Kraus operators
\[ E_0 = \sqrt{p} \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma } \end{pmatrix}, \quad E_1 = \sqrt{p} \begin{pmatrix} 0 & \sqrt{\gamma } \\ 0 & 0 \end{pmatrix}, \quad E_2 = \sqrt{1-p} \begin{pmatrix} \sqrt{1-\gamma } & 0 \\ 0 & 1 \end{pmatrix}, \quad E_3 = \sqrt{1-p} \begin{pmatrix} 0 & 0 \\ \sqrt{\gamma } & 0 \end{pmatrix}, \]
where it is possible to define the stationary state
\[ \rho _{\infty } = \begin{pmatrix} p & 0 \\ 0 & 1-p \end{pmatrix}. \]
This channel deforms the Bloch sphere: \(\gamma \) regulates the shrinking of each component of the Bloch vector, while p determines the location of the fixed point.
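Assuming the standard Nielsen-Chuang convention for the generalized amplitude damping Kraus operators, both trace preservation (\(\sum _k E_k^\dagger E_k = \mathbb {I}\)) and the stationary state \(\rho _\infty = \text {diag}(p, 1-p)\) can be verified numerically; this sketch is illustrative and not from the paper's code.

```python
import numpy as np

def gad_kraus(p, gamma):
    """Kraus operators of the generalized amplitude damping channel
    (Nielsen & Chuang convention)."""
    E0 = np.sqrt(p) * np.array([[1, 0], [0, np.sqrt(1 - gamma)]])
    E1 = np.sqrt(p) * np.array([[0, np.sqrt(gamma)], [0, 0]])
    E2 = np.sqrt(1 - p) * np.array([[np.sqrt(1 - gamma), 0], [0, 1]])
    E3 = np.sqrt(1 - p) * np.array([[0, 0], [np.sqrt(gamma), 0]])
    return [E0, E1, E2, E3]

def apply_channel(rho, kraus):
    return sum(K @ rho @ K.conj().T for K in kraus)

p, gamma = 0.5, 0.3
kraus = gad_kraus(p, gamma)
completeness = sum(K.conj().T @ K for K in kraus)  # should equal the identity
rho_inf = np.diag([p, 1 - p])                      # candidate stationary state
rho_out = apply_channel(rho_inf, kraus)            # should reproduce rho_inf
```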
1.7 A.7 Correlated amplitude damping channel
This is a two-qubit noise channel defined as the convex combination of two channels \(\mathcal {N}_0\) and \(\mathcal {N}_1\) (D’Arrigo et al. 2013)
\[ \mathcal {N}(\rho ) = (1-\mu )\, \mathcal {N}_0(\rho ) + \mu \, \mathcal {N}_1(\rho ), \]
where \(\mu \in [0,1]\) is a correlation parameter between the qubits, and \(\mathcal {N}_0(\rho ) = \sum _{j=0}^{3} A_j \rho A_j^\dagger \) and \(\mathcal {N}_1(\rho ) = \sum _{j=0}^{1} B_j \rho B_j^\dagger \) are noisy channels defined by the Kraus operators \(A_0 = E_0 \otimes E_0\), \(A_1 = E_0 \otimes E_1\), \(A_2 = E_1 \otimes E_0\), and \(A_3 = E_1 \otimes E_1\), with \(E_0\) and \(E_1\) being the operators in (26)–(27) with \(p=1\), and
\[ B_0 = |00\rangle \langle 00| + |01\rangle \langle 01| + |10\rangle \langle 10| + \sqrt{1-\gamma }\, |11\rangle \langle 11|, \qquad B_1 = \sqrt{\gamma }\, |00\rangle \langle 11|. \]
Appendix B: Fidelity of single-qubit states
As proved in Jozsa (1994), the fidelity formula (8) can be greatly simplified for single-qubit states and expressed in terms of Bloch vectors. For a Hermitian \(2 \times 2\) matrix M with positive eigenvalues, it holds that
\[ \text {Tr}\sqrt{M} = \sqrt{\text {Tr}\, M + 2\sqrt{\det M}}, \]
and with \(M = \sqrt{\rho }\,\sigma \,\sqrt{\rho }\) as in the definition of the fidelity (8), one obtains
\[ F(\rho , \sigma ) = \text {Tr}\,\rho \sigma + 2\sqrt{\det \rho \, \det \sigma }. \]
Finally, expressing the density matrices in terms of their Bloch vectors, namely \(\rho = (\mathbb {I} + \varvec{r} \cdot \varvec{P}) / 2\) and \(\sigma = (\mathbb {I} + \varvec{s} \cdot \varvec{P}) / 2\) with \(\varvec{P} = (X, Y, Z)\), by explicit calculation one arrives at
\[ F(\rho , \sigma ) = \frac{1}{2}\left( 1 + \varvec{r} \cdot \varvec{s} + \sqrt{\big (1-\Vert \varvec{r}\Vert ^2\big )\big (1-\Vert \varvec{s}\Vert ^2\big )} \right) . \]
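The closed-form Bloch expression can be cross-checked against the matrix formula \(F = \text {Tr}\,\rho \sigma + 2\sqrt{\det \rho \det \sigma }\); a minimal NumPy sketch, assuming the Jozsa (squared) fidelity convention used throughout the paper:

```python
import numpy as np

PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def rho_from_bloch(r):
    """Density matrix rho = (I + r . P) / 2 from a Bloch vector r."""
    rho = 0.5 * np.eye(2, dtype=complex)
    for ri, P in zip(r, PAULIS):
        rho = rho + 0.5 * ri * P
    return rho

def fidelity_bloch(r, s):
    """Closed-form single-qubit fidelity in terms of Bloch vectors."""
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=float)
    return 0.5 * (1 + r @ s + np.sqrt((1 - r @ r) * (1 - s @ s)))

def fidelity_matrix(r, s):
    """F = Tr(rho sigma) + 2 sqrt(det rho det sigma), valid for 2x2 states."""
    rho, sigma = rho_from_bloch(r), rho_from_bloch(s)
    prod = rho @ sigma
    det = np.linalg.det(prod).real
    return np.trace(prod).real + 2 * np.sqrt(max(det, 0.0))
```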
With such an expression at hand, it is then possible to prove that for pure single-qubit states minimizing the mean squared error loss is equivalent to minimizing the infidelity. In fact, consider the squared Euclidean distance between the Bloch vectors
\[ \Vert \varvec{r} - \varvec{s}\Vert ^2 = \Vert \varvec{r}\Vert ^2 + \Vert \varvec{s}\Vert ^2 - 2\, \varvec{r} \cdot \varvec{s}, \]
and the infidelity
\[ 1 - F(\rho , \sigma ) = \frac{1}{2}\left( 1 - \varvec{r} \cdot \varvec{s} - \sqrt{\big (1-\Vert \varvec{r}\Vert ^2\big )\big (1-\Vert \varvec{s}\Vert ^2\big )} \right) . \]
Then, since we are computing the loss functions with Bloch vectors \(\varvec{r}\) and \(\varvec{s}\) of pure states, namely the ideal noise-free Bloch vector and the one predicted by the neural network, see (4), it holds that \(\Vert {\varvec{r}}\Vert = \Vert {\varvec{s}}\Vert = 1\), thus obtaining
\[ 1 - F(\rho , \sigma ) = \frac{1 - \varvec{r} \cdot \varvec{s}}{2}, \]
and substituting in (34) one finally arrives at
\[ 1 - F(\rho , \sigma ) = \frac{1}{4}\, \Vert \varvec{r} - \varvec{s}\Vert ^2 . \]
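The equivalence between infidelity and (a quarter of) the squared Euclidean distance for unit Bloch vectors can be verified numerically; an illustrative sketch:

```python
import numpy as np

def infidelity_pure(r, s):
    """1 - F for unit Bloch vectors reduces to (1 - r . s) / 2."""
    return 0.5 * (1.0 - np.dot(r, s))

rng = np.random.default_rng(7)
for _ in range(1000):
    r = rng.normal(size=3); r /= np.linalg.norm(r)
    s = rng.normal(size=3); s /= np.linalg.norm(s)
    # the infidelity equals one quarter of the squared Euclidean distance
    assert np.isclose(infidelity_pure(r, s), 0.25 * np.sum((r - s) ** 2))
```

This confirms that, on pure states, training with mean squared error and with infidelity drives the network toward the same minima, consistent with the comparable reconstruction results reported in the main text.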
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Morgillo, A.R., Mangini, S., Piastra, M. et al. Quantum state reconstruction in a noisy environment via deep learning. Quantum Mach. Intell. 6, 39 (2024). https://doi.org/10.1007/s42484-024-00168-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s42484-024-00168-x