1 Introduction

One of the main problems in quantum information processing and computation is that quantum systems can be corrupted by unwanted interactions with the environment. Therefore, the incorporation of robust quantum error correction and mitigation strategies is of paramount importance to realize the full potential of quantum information processing.

Despite the effectiveness of quantum error correction protocols in preserving information, they often require significant overhead and resources. Quantum error mitigation techniques, on the other hand, focus on reducing the impact of noise without fully correcting it, making them more feasible for near-term quantum devices (Cai et al. 2023; Kandala et al. 2019). Examples include readout mitigation techniques to correct measurement errors (Van Den Berg et al. 2022; Smith et al. 2021; Bravyi et al. 2021), noise deconvolution methods to retrieve ideal expectation values of generic observables evaluated on a system subject to a known noise before measurement (Mangini et al. 2022; Roncallo et al. 2023), probabilistic error cancellation (Van Den Berg et al. 2023), and data-driven approaches such as zero-noise extrapolation (Kim et al. 2023) and Clifford data regression (Czarnik et al. 2021; Giurgica-Tiron et al. 2020; Lowe et al. 2021) to mitigate noise occurring during a quantum computation.

Another area of great interest is deep learning, which has achieved impressive successes over the past years, with generative pre-trained large language models now leading the way (Vaswani et al. 2017; Brown et al. 2020; Deng and Lin 2022). Deep learning models have excelled in diverse areas, from image and speech recognition (He et al. 2016; Kamath et al. 2019) to playing games (Schrittwieser et al. 2020), reaching and often surpassing human-level performance. These advancements highlight the vast potential of deep learning in revolutionizing numerous fields, including quantum computation and information.

Indeed, deep learning techniques have also shown great promise for quantum information processing applications, as they have been leveraged successfully in, e.g., experimental phase estimation tasks (Lumino et al. 2018), automating the development of QCVV (quantum characterization, verification, and validation) protocols (Scholten et al. 2019), learning quantum hardware-specific noise models (Zlokapa and Gheorghiu 2020), increasing the measurement precision of quantum observables with neural networks (Torlai et al. 2020), quantum error mitigation (Kim et al. 2022, 2020; Sack and Egger 2023), identifying quantum protocols such as teleportation or entanglement purification (Wallnöfer et al. 2020), classification and reconstruction of optical quantum states (Ahmed et al. 2021), and quantum state estimation (Lohani et al. 2020).

In this work, we leverage machine learning techniques based on feed-forward neural networks to deal with the task of recovering noise-free quantum states when they undergo an undesired noisy evolution. While it is well known that quantum noisy channels cannot, in general, be physically inverted, this may be achieved by means of classical post-processing methods (Mangini et al. 2022; Roncallo et al. 2023; Van Den Berg et al. 2023). In particular, since neural networks are universal approximators (Hornik et al. 1989), they can be used to learn a mapping that effectively inverts the effect of noise, and hence to reconstruct noiseless quantum states. Specifically, letting \(\varvec{\tilde{r}}\) denote the (generalized) Bloch components of a noisy quantum state, our goal is to train a neural network \(h_{\varvec{w}}(\cdot )\) to output the Bloch vector of the ideal noiseless state, \(\varvec{\tilde{r}}\rightarrow h_{\varvec{w}}(\varvec{\tilde{r}}) = \varvec{r}\), where \(\varvec{r}\) is the Bloch vector of the state before it undergoes the noise process. We explore several combinations of single- and two-qubit noisy channels acting on systems of up to three qubits, study the effect of using different loss functions for training, and show that our neural network-based method can reach quantum state reconstruction fidelities higher than 99.9%. The main idea of the proposed method is summarized in Fig. 1.

In addition to regression tasks, we also show how feed-forward neural networks can be used to classify different quantum channels based on the effect they have on quantum states. In particular, using as inputs the Bloch vectors \([\varvec{\tilde{r}}, \varvec{r}]\) obtained with different channels, the network outputs a label corresponding to the quantum channel that was applied to \(\varvec{r}\) in order to produce \(\varvec{\tilde{r}}\). In this case as well, we achieve almost perfect channel classification accuracy.

Fig. 1

Outline of the neural network-based noise reconstruction and classification protocols. (a) Noisy Bloch vectors, representing quantum states affected by noise, serve as input to a feed-forward neural network. The quantum state reconstruction protocol aims to recover the original noiseless quantum states from the observed noisy Bloch vectors, utilizing the neural network model. (b) Both noisy and noiseless Bloch vectors are fed into the neural network as input. The network is specifically designed for a classification task, where the output provides a label representing the type of noise acting on the noiseless quantum state

The rest of the manuscript is organized as follows. In Section 2 we formally introduce the problem and the neural network used to obtain the quantum state reconstruction. In Section 3 we present the results obtained for the reconstruction of pure and mixed states, and we introduce the noise classification problems that can be solved similarly with neural networks. In Section 4 we summarize our results and discuss possible improvements of our method.

2 Methods

In this section, we formalize the quantum communication problem we want to tackle, that is, the recovery of noiseless quantum states undergoing an undesired noisy evolution. We start by introducing the notation for describing an n-qubit quantum state in terms of its Bloch components, and then move on to discuss the neural network approach used in this work, including details on the optimization procedure and the construction of the training and test datasets.

2.1 Reconstruction of noisy Bloch vectors

The state of an n-qubit quantum system is described by its density matrix \(\rho \in \mathbb {C}^{2^n \times 2^n}\), which can be expressed in the Pauli basis as follows (Nielsen and Chuang 2010; Ozols and Mančinska 2007)

$$\begin{aligned} \rho = \frac{1}{2^n} (\mathbb {I}_{2^n} + \varvec{r} \cdot \varvec{P}) \end{aligned}$$
(1)

where \(\varvec{r} \in \mathbb {R}^{4^n-1}\) is the generalized Bloch vector, and \(\varvec{P} = (P_1,...,\, P_{4^n-1})\) is a vector containing the multi-qubit Pauli basis, obtained by considering tensor products of single-qubit Pauli matrices, that is \(P_i = \sigma _1^{(i)} \otimes \cdots \otimes \sigma _n^{(i)}\), with \(\sigma _k^{(i)} \in \{\mathbb {I}, X, Y, Z\}\).

Quantum channels are completely positive trace preserving (CPTP) maps whose action on a state \(\rho \) can be expressed in Kraus form as (Nielsen and Chuang 2010)

$$\begin{aligned} \rho \longrightarrow \tilde{\rho }= \mathcal {N}(\rho ) = \sum _{i} E_i\,\rho \,E_i^\dagger \,, \end{aligned}$$
(2)

where \(\{E_i\}\) are the Kraus operators of the channel \(\mathcal {N}(\cdot )\), satisfying the trace preserving condition \(\sum _iE_i^{\dagger }E_i=\mathbb {I}\). In our experiments, we consider various single-qubit noisy channels (bit-flip \(\mathcal {X}_p\), phase-flip \(\mathcal {Z}_p\), bit-phase-flip \(\mathcal {Y}_p\), general Pauli \(\mathcal {P}_{\varvec{p}}\), depolarizing \(\mathcal {D}_p\) and amplitude damping \(\mathcal {A}_{p\gamma }\)), as well as a correlated two-qubit amplitude damping channel. We refer to Appendix A for an extended discussion on the quantum noise models used in this work.
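As an illustration, the following minimal Python sketch applies a channel in the Kraus form of (2) to a density matrix. The bit-flip Kraus operators shown are the standard ones; the snippet is only a sketch of the numerics, not the exact code used in our experiments.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def bit_flip_kraus(p):
    """Standard Kraus operators {E_i} of the bit-flip channel X_p."""
    return [np.sqrt(1 - p) * I2, np.sqrt(p) * X]

def apply_channel(rho, kraus_ops):
    """rho -> sum_i E_i rho E_i^dagger, as in Eq. (2)."""
    return sum(E @ rho @ E.conj().T for E in kraus_ops)

rho = np.array([[1, 0], [0, 0]], dtype=complex)        # |0><0|
rho_noisy = apply_channel(rho, bit_flip_kraus(p=0.2))
assert np.isclose(np.trace(rho_noisy).real, 1.0)       # trace is preserved
```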

Given a noisy channel \(\mathcal {N}(\cdot )\), our goal is to obtain through a learning procedure an optimized neural network that receives noisy Bloch vectors \(\{\varvec{\tilde{r}}_k\}\), and outputs the corresponding noiseless vectors \(\{\varvec{r}_k\}\). In other words, we are looking for the function \(h(\cdot )\) which inverts the action of the noise on the Bloch components of the quantum states, namely

$$\begin{aligned}&h(\varvec{\tilde{r}})= \varvec{r}\,, \quad \text {with} \nonumber \\&\rho = \frac{1}{2^n} (\mathbb {I}_{2^n} + \varvec{r} \cdot \varvec{P}) \\&\tilde{\rho }= \mathcal {N}(\rho ) = \frac{1}{2^n} (\mathbb {I}_{2^n} + \varvec{\tilde{r}} \cdot \varvec{P})\nonumber \,. \end{aligned}$$
(3)

2.2 Reconstruction with neural networks

We provide a concise overview of the fundamental aspects of neural networks, discussing their relevance in addressing the task of quantum state reconstruction.

2.2.1 Generation of the training set

The initial phase of addressing a regression problem involves constructing a valid dataset. In our specific case, the training (and validation) set consists of pairs of noisy and noiseless Bloch vectors

$$\begin{aligned} \mathcal {T} = \{(\varvec{\tilde{r}}_m,\, \varvec{r}_m)\}_{m=1}^{M}\,, \end{aligned}$$
(4)

which are obtained by evolving some input quantum states \(\rho _m\) through the noisy channel under investigation, thus obtaining the noisy states \(\tilde{\rho }_m\). The Bloch components of these density matrices are then computed as \(r_i = \textrm{Tr}[\rho \, P_i]\) for \(i=1,..., 4^n-1\), see (1).

The motivation for choosing the (generalized) Bloch vector as the dataset element is twofold: first, each quantum state is characterized by its own vector, which grants a unique representation of the state; second, a vector input fits naturally into the processing structure of a feed-forward neural network.

The input quantum states we consider are uniformly distributed in the space of quantum states. For the case of pure states \(\rho _m = |{\psi _m}\rangle \langle {\psi _m}|\), these are obtained by sampling states \(|{\psi _m}\rangle \) from the Haar distribution (Edelman and Rao 2005; Meckes 2019), while for uniformly distributed mixed states, these can be generated either starting from uniformly distributed pure states by means of an appropriate rescaling (Harman and Lacko 2010; Rubinstein and Kroese 2016) or by using the Ginibre ensemble (Życzkowski et al. 2011; Ginibre 1965; Gulshen et al. 2019).
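A minimal sketch of this dataset-generation pipeline for pure states is shown below; the helper names are our own, and the channel application reuses the Kraus sketch above. Haar-random pure states are obtained by normalizing complex Gaussian vectors.

```python
from itertools import product
import numpy as np

PAULIS = {"I": np.eye(2, dtype=complex),
          "X": np.array([[0, 1], [1, 0]], dtype=complex),
          "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
          "Z": np.array([[1, 0], [0, -1]], dtype=complex)}

def pauli_basis(n):
    """The 4^n - 1 non-identity n-qubit Pauli strings P_i."""
    ops = []
    for labels in product("IXYZ", repeat=n):
        if set(labels) == {"I"}:
            continue                      # skip the identity string
        P = np.array([[1.0 + 0j]])
        for l in labels:
            P = np.kron(P, PAULIS[l])
        ops.append(P)
    return ops

def haar_pure_state(n, rng):
    """Haar-random n-qubit pure state, returned as a density matrix."""
    psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
    psi /= np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

def bloch_vector(rho, basis):
    """Bloch components r_i = Tr[rho P_i], as in Eq. (1)."""
    return np.array([np.trace(rho @ P).real for P in basis])

rng = np.random.default_rng(seed=0)
basis = pauli_basis(n=1)
rho = haar_pure_state(1, rng)
rho_noisy = apply_channel(rho, bit_flip_kraus(0.2))    # sketch above
pair = (bloch_vector(rho_noisy, basis), bloch_vector(rho, basis))
```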

The cardinality \(|\mathcal {T}| = M\) of the dataset is contingent upon the specific problem under consideration and, as demonstrated in Section 3, has a direct impact on the performance of the network. As the quantum computational resources needed to generate the training set \(\mathcal {T}\) may be experimentally demanding, one generally has to find a compromise between achieving high reconstruction accuracies and the number of samples (i.e., quantum states) included in the dataset.

Note that, as currently formulated, our approach for state reconstruction involves training machine learning models separately for each type of noise map. To address the more general problem of reconstructing noisy states coming from various noise channels at the same time, one would need to incorporate additional input information into the network to make the problem unambiguous (the same noisy Bloch vector can be the result of different noise channels). This could include a numerical encoding representing the type of channel applied, thereby merging the two approaches proposed in this manuscript, namely the regression problem (Section 3.1) and the classification one (Section 3.2). We leave more general machine learning-based reconstruction methods as a subject of future studies.

2.2.2 Feed-forward neural networks

In this work, we analyze data using deep feed-forward neural networks, which are parametric models that process information in a layer-wise fashion through the repeated application of similar operations, as shown in Fig. 1(a).

These neural networks consist of an input layer responsible for data loading, followed by multiple hidden layers to process the information, and finally, an output layer to obtain the computation’s result. Each layer consists of a set of individual nodes known as neurons, and while input and output layers have a number of neurons matching the dimension of the input and the output respectively, the number of neurons in the hidden layers is an architectural hyperparameter to be chosen in advance by the designer. For example, the action of a feed-forward neural network with two hidden layers and trainable parameters \(\varvec{\theta }\), can be expressed as

$$\begin{aligned} \varvec{\hat{y}} = \text {NN}_{\varvec{\theta }}(\varvec{x}) = \varvec{w} \cdot g(\varvec{W}^{[2]}~g(\varvec{W}^{[1]}\varvec{x} + \varvec{b}^{[1]}) + \varvec{b}^{[2]}) + b \end{aligned}$$
(5)

where \(\varvec{x} \in \mathbb {R}^d\) and \(\hat{\varvec{y}} \in \mathbb {R}^{p}\) are the input and output vectors, \(\varvec{W}^{[1]} \in \mathbb {R}^{h_1 \times d}\) and \(\varvec{b}^{[1]} \in \mathbb {R}^{h_1}\) are the trainable parameters for the first hidden layer, \(\varvec{W}^{[2]} \in \mathbb {R}^{h_2 \times h_1}\) and \(\varvec{b}^{[2]}\in \mathbb {R}^{h_2}\) are the trainable parameters for the second hidden layer, \(\varvec{w} \in \mathbb {R}^{p}\) and \(b \in \mathbb {R}\) are the trainable parameters for the output layer, and \(g(\cdot )\) is a non-linear activation function which is applied element-wise to the entries of the vectors. As previously mentioned, \(h_1\) and \(h_2\) are hyperparameters that represent the number of hidden neurons for each respective layer.

In our simulations, we explore different architectures using networks with 2 or 3 hidden layers using \(h_i \in \{64, 128\}\) hidden neurons per layer, while the input and output layers have dimension \(d=p=4^n-1\), as they are employed to represent the components of the Bloch vectors. For the activation function, as customary in machine learning, we adopt the Rectified Linear Unit (ReLU), defined as \(g(x) = \text {ReLU}(x):= \max (0,x)\).
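For concreteness, a minimal TensorFlow sketch of one such architecture follows (single-qubit case with two hidden layers of 64 neurons, i.e., one of the configurations listed above):

```python
import tensorflow as tf

n = 1                       # number of qubits (single-qubit case)
dim = 4**n - 1              # input/output dimension d = p = 4^n - 1

# Feed-forward network of Eq. (5): two hidden ReLU layers and a
# linear output layer returning the reconstructed Bloch vector.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(dim,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(dim),
])
```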

2.2.3 Performance metrics

Given the dataset and the trainable model, we discuss the figures of merit employed for training and evaluating the neural network’s performance in quantum state reconstruction and noise classification tasks. In the context of quantum state reconstruction, we have tested two possible alternatives coming from the classical and quantum information domain respectively, namely the Mean Squared Error (MSE) between the reconstructed and ideal Bloch vectors, and the quantum infidelity between the reconstructed and original quantum states.

In noise classification problems, we used both categorical cross-entropy (15) and accuracy metrics (16) in order to assess how effectively the neural network can distinguish between different types of noisy channels.

MSE

The mean squared error is the most common performance measure for regression problems in machine learning and consists of the squared Euclidean distance between vectors, which in our case becomes

$$\begin{aligned} \ell (\varvec{\theta }, \varvec{r}_i) = \Vert {\varvec{r}_i - \hat{\varvec{r}}_i(\varvec{\theta })}\Vert ^{2} = \Vert {\varvec{r}_i - \text {NN}_{\varvec{\theta }} (\varvec{\tilde{r}}_i)}\Vert ^{2}, \end{aligned}$$
(6)

where \(\varvec{r}_i\) is the noiseless Bloch vector (see (4)), and \(\varvec{\hat{r}}_i(\varvec{\theta }) =\text {NN}_{\varvec{\theta }}(\varvec{\tilde{r}}_i)\) is the one predicted by the neural network, with trainable parameters \(\varvec{\theta }\), when receiving as input the noisy Bloch vector \(\varvec{\tilde{r}}_i\). Then, the mean squared error function over the entire dataset \(\mathcal {T}\) of size \(|\mathcal {T}|=M\) is

$$\begin{aligned} \mathcal {L}_\text {MSE}(\varvec{\theta };\, \mathcal {T}) = \frac{1}{M}\sum _{i=1}^{M}\ell (\varvec{\theta }, \varvec{r}_i) = \frac{1}{M}\sum _{i=1}^{M}\Vert \varvec{r}_i - \hat{\varvec{r}}_i\Vert ^2 , \end{aligned}$$
(7)

with \((\varvec{\tilde{r}}_i, \varvec{r}_i) \in \mathcal {T}\).

Infidelity

The quantum fidelity is a measure of closeness between quantum states, and given the quantum nature of the data under investigation, it is particularly suited to assess the reconstruction performance of the neural network. Given two density matrices \(\rho \) and \(\sigma \), their fidelity is defined as (Nielsen and Chuang 2010)

$$\begin{aligned} F(\rho , \sigma ) := \text {Tr}[\sqrt{\sqrt{\rho }\,\sigma \,\sqrt{\rho }}]^2\,, \end{aligned}$$
(8)

with \(0\le F(\rho , \sigma )\le 1\), where the upper bound is saturated, \(F(\rho , \sigma )=1\), if and only if the two states are equal, \(\rho = \sigma \). The infidelity between two quantum states is then defined as \(I(\rho , \sigma ):= 1 - F(\rho , \sigma )\).

Despite being well suited to measure the closeness of quantum states, the complex functional dependence of the infidelity on the Bloch vectors of the density matrices — and hence on the parameters of the neural network — often leads to numerical instabilities when it is used as the loss function driving the training, eventually impairing the optimization process. For this reason, when \(\rho \) and \(\sigma \) are pure states, we instead directly use the simplified but equivalent expression for the fidelity

$$\begin{aligned} F(\rho ,\sigma ) = \text {Tr}[\rho \,\sigma ]\,,\quad \text {for } \rho , \sigma \text { pure}\,, \end{aligned}$$
(9)

while if the states are mixed, we use an alternative measure of distance proposed in Wang et al. (2008)

$$\begin{aligned} F(\rho , \sigma ) = \frac{|{\text {Tr}[\rho \,\sigma ]}|}{\sqrt{\text {Tr}[\rho ^2] \text {Tr}[\sigma ^2]}}\,,\quad \text {for } \rho , \sigma \text { mixed}\,. \end{aligned}$$
(10)

The fidelity in (10) reaches its maximum of 1 if and only if \(\rho = \sigma \). However, it differs from the standard fidelity reported in (8): it is not monotonic under quantum operations, it is neither concave nor convex, and it includes a normalization factor, resulting in a scaled fidelity measure when one of the two states is pure, i.e. if \(\sigma =|{\psi }\rangle \langle {\psi }|\), then \(F(\rho , |{\psi }\rangle \langle {\psi }|) = \langle \psi |\rho | \psi \rangle / \sqrt{\text {Tr}[\rho ^{2}]}\).

From the fidelity expressions in (9) and (10), we calculate the corresponding infidelity and perform an average over the entire dataset, yielding

$$\begin{aligned} \mathcal {L}_\text {INF}(\varvec{\theta };\, \mathcal {T}) = \frac{1}{M}\sum _{i=1}^{M}\big [1 - F(\rho _i, \sigma _i)\big ], \end{aligned}$$
(11)

where \(\rho _i\) and \(\sigma _i\) are the density matrices computed respectively from the Bloch vectors \(\varvec{r}_i\) and \(\hat{\varvec{r}}_i = \varvec{\hat{r}}_i(\varvec{\theta }) =\text {NN}_{\varvec{\theta }}(\varvec{\tilde{r}}_i)\), with \((\varvec{\tilde{r}}_i, \varvec{r}_i) \in \mathcal {T}\).

Moreover, it is worth noticing that for single-qubit density matrices the fidelity (8) can be further simplified and expressed in terms of the Bloch vectors as (Jozsa 1994)

$$\begin{aligned} F(\varvec{r}, \varvec{s}) = \frac{1}{2} \Big (1 + \varvec{r} \cdot \varvec{s} + \sqrt{(1 - \Vert \varvec{r}\Vert ^2) (1 - \Vert \varvec{s}\Vert ^2)} \Big ), \end{aligned}$$
(12)

where \(\varvec{r}, \varvec{s} \in \mathbb {R}^3\) are the Bloch vectors of \(\rho \) and \(\sigma \) respectively. In the particular case when both states are pure, \(\Vert {\varvec{r}}\Vert ^{2}=\Vert {\varvec{s}}\Vert ^{2} = 1\), the infidelity reduces to the mean squared error up to a prefactor

$$\begin{aligned} I(\varvec{r}, \varvec{s}) = 1 - \frac{1}{2}(1+\varvec{r}\cdot \varvec{s}) = \frac{\Vert {\varvec{s}-\varvec{r}}\Vert ^2}{4}\,. \end{aligned}$$
(13)

We refer to Appendix B for more details on the derivation of (12) and (13). Notably, to the best of our knowledge, except for single-qubit states there is no straightforward connection between the fidelity of quantum states and the Euclidean distance of their generalized Bloch vectors.
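Nonetheless, for pure states of any number of qubits, (1) together with the orthogonality relation \(\text {Tr}[P_i P_j] = 2^n \delta _{ij}\) gives \(\text {Tr}[\rho \, \sigma ] = (1 + \varvec{r}\cdot \varvec{s})/2^n\), so the pure-state fidelity (9) can be evaluated directly in Bloch coordinates. A minimal sketch of the corresponding loss (our own implementation, which reduces to (13) for \(n=1\)):

```python
import tensorflow as tf

def infidelity_loss(n):
    """Pure-state infidelity loss, Eq. (11) with the fidelity of Eq. (9)
    written in Bloch coordinates: Tr[rho sigma] = (1 + r.s) / 2^n."""
    def loss(r_true, r_pred):
        overlap = (1.0 + tf.reduce_sum(r_true * r_pred, axis=-1)) / 2.0**n
        return tf.reduce_mean(1.0 - overlap)
    return loss
```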

Finally, in order to assess the performance of the optimized neural network we introduce the average test fidelity (ATF), defined as the mean fidelity between the predicted quantum states and their corresponding ideal counterparts averaged over a test dataset \(\mathcal {\tilde{T}}\) which was not used during training. The ATF is calculated as

$$\begin{aligned} \text {ATF}(\varvec{\theta }; \mathcal {\tilde{T}}) = \frac{1}{N}\sum _{i=1}^{N} F(\rho _i, \sigma _i), \end{aligned}$$
(14)

where N is the cardinality of the test set, \(F(\cdot ,\,\cdot )\) is the fidelity in (8), and \(\rho _i\) and \(\sigma _i\) are the density matrices computed respectively from the Bloch vectors \(\varvec{r}_i\) and \(\varvec{\hat{r}}_i=\text {NN}_{\varvec{\theta }}(\varvec{\tilde{r}}_i)\), with \((\varvec{\varvec{\tilde{r}}}_i, \varvec{\varvec{r}}_i) \in \mathcal {\tilde{T}}\). A high average test fidelity, typically exceeding 99.9%, indicates that our neural network is capable of accurately reconstructing the corrupted quantum states.
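A minimal NumPy/SciPy sketch of the ATF computation (the helper names are ours; the fidelity follows (8) directly):

```python
import numpy as np
from scipy.linalg import sqrtm

def fidelity(rho, sigma):
    """Fidelity of Eq. (8): F = Tr[sqrt(sqrt(rho) sigma sqrt(rho))]^2."""
    s = sqrtm(rho)
    return float(np.real(np.trace(sqrtm(s @ sigma @ s)))) ** 2

def average_test_fidelity(pairs):
    """ATF of Eq. (14); `pairs` holds (ideal, reconstructed) density matrices."""
    return np.mean([fidelity(rho, sigma) for rho, sigma in pairs])
```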

Categorical cross-entropy

The categorical cross-entropy is one of the most common measures to evaluate the performance of classification models (Goodfellow et al. 2016). Given a classification task with C different classes, the categorical cross-entropy quantifies the disparity between the predicted probability distribution of the classes and the true class labels, and it is mathematically defined as

$$\begin{aligned} \text {CCE} := -\sum _{i=1}^{M} \sum _{j=1}^{C} y_{ij} \log (p_{ij}), \end{aligned}$$
(15)

where M is the number of samples to be classified, \(y_{i} = (0,0,..., 1_{c(i)},..., 0) \in \{0,1\}^C\) is the one-hot encoding of the true label, indicating that the i-th sample belongs to class c(i), and \(p_{ij} \in [0,\,1]\) is the model’s predicted probability that the i-th sample belongs to the j-th class. This metric penalizes the deviation between predicted and actual distributions, with lower values indicating better alignment between the model’s predictions and the true labels.

Accuracy

For classification tasks, performance is commonly evaluated with the accuracy metric, which quantifies the effectiveness of a model in correctly categorizing samples within a dataset. Mathematically, the accuracy is defined as the ratio of the number of correctly classified samples to the total number of samples, namely

$$\begin{aligned} \text {ACC} := \frac{\text {number of correct predictions}}{\text {total number of predictions}}, \end{aligned}$$
(16)

where a value of 1 indicates a perfect classification.

2.2.4 Optimization

Training the neural network means solving the minimization problem

$$\begin{aligned} \varvec{\theta }_{\text {opt}} = \underset{\varvec{\theta }}{\arg \min }\; \mathcal {L}(\varvec{\theta };\, \mathcal {T}), \end{aligned}$$
(17)

where \(\varvec{\theta }\) are the trainable parameters of the neural network, \(\mathcal {T}\) is the training dataset as defined in (4), and \(\mathcal {L}\) is the loss function driving the learning process, which, as discussed in the previous section, in our case is either the mean squared error (7) or the average infidelity (11).
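Putting the previous sketches together, the minimization in (17) can be carried out with a standard stochastic-gradient-descent pipeline. In the minimal sketch below, the optimizer, batch size, and number of epochs are illustrative assumptions rather than the exact settings used for the reported results:

```python
import numpy as np

# Build a toy training set with the helpers sketched in Section 2.2.1
# (single-qubit bit-flip channel with p = 0.2).
M = 2000
states = [haar_pure_state(1, rng) for _ in range(M)]
r_ideal = np.array([bloch_vector(s, basis) for s in states])
r_noisy = np.array([bloch_vector(apply_channel(s, bit_flip_kraus(0.2)), basis)
                    for s in states])

# Train the network of Section 2.2.2 with the infidelity loss of Eq. (11);
# loss="mse" would use Eq. (7) instead.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss=infidelity_loss(n=1))
model.fit(r_noisy, r_ideal, validation_split=0.1,
          epochs=200, batch_size=32)
```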

3 Results

We now present the results obtained for quantum state reconstruction and quantum noise classification using the proposed neural network methods. We start by discussing the performance of the neural network in reconstructing noise-free quantum states corrupted by various single- and two-qubit noisy channels, and then proceed to showcase the network’s capability to accurately classify different quantum noisy channels. Our results demonstrate high-fidelity state reconstruction and robust channel classification, thus revealing the potential of machine learning techniques for quantum information processing and computation.

All simulations performed in this work are run with Qiskit (Qiskit contributors 2023) and TensorFlow (Abadi et al. 2015).

3.1 Quantum state reconstruction

In order to explore the reconstruction capabilities of the neural network approach, we first study its performance in the simpler case of learning noisy single-qubit states, and then move on to the more complex case of multi-qubit systems. In both scenarios, we see a clear dependence of the performance on the amount of data available to train the model (the size of the training set); provided that sufficient data is available, the network is always able to restore noiseless quantum states from their noisy counterparts.

Whenever we consider the task of reconstructing initially pure states, an auxiliary normalization layer is added at the end of the neural network (5) so that the outputs always consist of (Bloch) vectors with unit norm (for single-qubit states), as required for pure states. This normalization constraint enforces the generation of physically consistent output states, which in turn effectively constrains the value of the infidelity loss function to remain in the physical regime. It is worth noting that even when the MSE is employed as the loss function, the same normalization layer is used to ensure that the predicted states maintain their physical integrity and adhere to the desired constraints. The effect of the normalization layer is simply to rescale each output as follows

$$\begin{aligned} \text {NN}_{\varvec{\theta }}(\varvec{\tilde{r}}) = \hat{\varvec{r}} \longrightarrow \hat{\varvec{r}} / \Vert {\hat{\varvec{r}}}\Vert _{2}\,. \end{aligned}$$
(18)

In addition, whenever we consider the reconstruction of initially mixed states \(\rho \) of given purity \(\text {Tr}[\rho ^2]\), the output of the neural network is additionally rescaled by the purity of the initial mixed state, \(\hat{\varvec{r}} \rightarrow \sqrt{\text {Tr}[\rho ^2]}\, \hat{\varvec{r}} \).
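A minimal sketch of this output rescaling, appended as a final layer to the network of Section 2.2.2 (the helper is our own; target norms other than 1 are used for multi-qubit pure states, see Section 3.1.2):

```python
# Normalization layer implementing Eq. (18): rescale the predicted Bloch
# vector to a fixed target norm (1 for single-qubit pure states); for
# mixed states the output is additionally rescaled by the known purity,
# as described above.
def normalization_layer(target_norm=1.0):
    return tf.keras.layers.Lambda(
        lambda v: target_norm * tf.math.l2_normalize(v, axis=-1))

model_pure = tf.keras.Sequential([model, normalization_layer()])
```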

3.1.1 Single-qubit states

In Table 1 we summarize the results of the reconstruction process for various noisy channels, using different loss functions to drive the training process, and using pure or mixed states as inputs to the neural network. In all cases we observe a very good average test fidelity (ATF) (14) at the end of training, exceeding 99.6%, thus showing the effectiveness of the proposed approach for reconstructing noisy states. As we show next, equally good performances are obtained in the more complex task of inverting noise in multi-qubit systems.

Table 1 Reconstruction of single-qubit quantum states. Training with both the MSE and infidelity as loss functions yields good average test fidelities (ATF) at the end of the optimization process. Results are reported for different noisy channels (bit-flip \(\mathcal {X}_p\), phase-flip \(\mathcal {Z}_p\), bit-phase-flip \(\mathcal {Y}_p\), general Pauli \(\mathcal {P}_{\varvec{p}}\), depolarizing \(\mathcal {D}_p\), and general amplitude damping channels \(\mathcal {A}_{p\gamma }\)), and for both pure and mixed initial states
Table 2 Results of pure multi-qubit states reconstruction using MSE and infidelity as loss functions. For two-qubit states, we studied scenarios involving: a phase-flip channel on the first qubit \(\mathcal {Z}_p \otimes \mathcal {I}\), phase-flip channel on both qubits \(\mathcal {Z}_p \otimes \mathcal {Z}_p\), phase-flip and bit-flip channels respectively on the first and second qubit \(\mathcal {Z}_p \otimes \mathcal {X}_p\), and correlated amplitude damping \(\mathcal {C}_{\text {AD}}(\eta ,\, \mu )\). For three-qubit states, we considered the scenario characterized by bit-, phase-, and bit-phase-flip channels applied distinctly to all three qubits \(\mathcal {X}_p \otimes \mathcal {Z}_p \otimes \mathcal {Y}_p\)

3.1.2 Multi-qubit states

We also tested the reconstruction procedure on systems of \(n=2\) and \(n=3\) qubits undergoing several noisy evolutions with both uncorrelated and correlated noisy channels, and we summarize the results in Table 2. For two-qubit systems, we considered the following noise maps: (i) a phase-flip channel with \(p=0.2\) applied to the first qubit and the identity to the second, indicated as \(\mathcal {Z}_p \otimes \mathcal {I}\); (ii) a phase-flip channel applied to both qubits with \(p=0.2\), denoted as \(\mathcal {Z}_p \otimes \mathcal {Z}_p\); (iii) a phase-flip channel on the first qubit and a bit-flip channel on the second qubit, both with \(p=0.2\), denoted as \(\mathcal {Z}_p \otimes \mathcal {X}_p\); and (iv) a correlated two-qubit amplitude damping channel \(\mathcal {C}_{AD}\) (\(\eta = 0.1\), \(\mu = 0.2\)) applied to the system (see (29) in Appendix A for a definition of the channel).

For \(n=3\) qubit systems, we tested the reconstruction performance on states subject to the composite channel \(\mathcal {X}_p \otimes \mathcal {Z}_p \otimes \mathcal {Y}_p\). In this configuration, each qubit experiences a distinct quantum channel, namely a bit-flip, phase-flip, and bit-phase-flip channel respectively, with a common noise parameter of \(p=0.2\).

The results reported in Table 2 again reveal a successful reconstruction of the ideal density matrices through the use of a relatively simple feed-forward neural network, thus confirming the effectiveness of the proposed method. As mentioned previously, to ensure the production of pure quantum states as output, a normalization layer has been incorporated. Specifically, by appropriately rescaling the output of the normalization layer in (18), the norm of the output Bloch vectors is constrained to \(\sqrt{3}\) for two-qubit states, and to \(\sqrt{7}\) (Avron and Kenneth 2020) for three-qubit states. Comparing the cardinality of the training set \(|{\mathcal {T}}|\) used for the simulations in Tables 1 and 2, we see that more samples are generally needed to ensure a good reconstruction for larger system sizes, as one would expect. A discussion of the impact of the available information on the reconstruction performance is the topic of the next section.

In Fig. 2 we report the evolution of the MSE (7) and infidelity (11) loss functions during training for the case of \(\mathcal {Z}_p \otimes \mathcal {X}_p\) applied to a two-qubit system. As is clear from the figure, the optimization of both cost functions is straightforward and, interestingly, they follow a similar minimization behavior, even though, unlike in the single-qubit case (see (13)), there is no trivial relation between the two.

Fig. 2

Optimization process of the neural network to reconstruct a two-qubit state under application of the noisy channel \(\mathcal {Z}_p \otimes \mathcal {X}_p\), using the MSE (7) and the infidelity (11) as loss functions. Both metrics display a similar minimization behavior

We conclude by noting that, provided the dataset cardinality and the neural network complexity are scaled up accordingly with the number of qubits, the reconstruction approach proposed in this work can be straightforwardly applied to any n-qubit quantum state.

3.1.3 Impact of the dataset size on the reconstruction performances

So far we have focused on assessing the reconstruction performance of the neural network assuming that enough data is available; here, instead, we analyze how this performance depends on the size of the training set.

Fig. 3

Average test fidelity (ATF) obtained at the end of training when optimizing the neural networks with training sets of different cardinality. For each cardinality, we repeat the training process 5 times using different initializations of the parameters and different training data, and report the mean value and the standard deviation of the resulting ATFs. (a) Reconstruction of single-qubit states undergoing a phase-flip channel \(\mathcal {Z}_{p}(p=0.2)\). (b) Reconstruction of two-qubit states undergoing the uncorrelated phase-flip channel \(\mathcal {Z}_{p} \otimes \mathcal {Z}_{p}(p=0.2)\)

In Fig. 3 we report the average test fidelity obtained at the end of training, for neural networks optimized using training sets of different cardinality. In Fig. 3a we show the data for the reconstruction of single-qubit states undergoing a phase-flip channel \(\mathcal {Z}_{0.2}\), and in Fig. 3b the reconstruction of two-qubit states undergoing an uncorrelated two-qubit phase-flip channel \(\mathcal {Z}_{0.2} \otimes \mathcal {Z}_{0.2}\).

In both cases — and for both the considered loss functions — we observe that using a larger training set yields better reconstruction performance until a plateau is reached but, importantly, also that satisfactory results can be achieved even with a limited number of samples.

3.2 Classification of noisy channels

The application of neural networks to quantum information processing can be extended beyond quantum state reconstruction to the classification of quantum channels. In particular, in this section we show how a neural network can be trained to discriminate between noisy channels based on the effect they have on input states, a scenario which is graphically depicted in Fig. 1(b). Our exploration encompasses a series of classification scenarios, including binary and multi-class classification.

In such classification problems, each data item in the training set is constructed by considering as input the noiseless Bloch vector \(\varvec{r}_m\) appended to the noisy one \(\varvec{\tilde{r}}_m\), and as output an integer label \(y_m\) encoding which type of error was applied to \(\varvec{r}_m\) to obtain \(\varvec{\tilde{r}}_m\). Formally, given a classification task with C possible classes, the training dataset is then defined as

$$\begin{aligned} \mathcal {T}_{IN} = \big \{\big ([\varvec{\tilde{r}}_m, \varvec{r}_m], y_m\big )\big \}_{m=1}^M\,, \end{aligned}$$
(19)

where \(\big [\varvec{\tilde{r}}_m, \varvec{r}_m\big ] \in \mathbb {R}^{2 \times 3}\) is the input to the neural network, and \(y_m \in \{1,...,C\}\) is the desired output. We refer to this training dataset as \(\mathcal {T}_{IN}\), where the subscript indicates that the input is the extended vector obtained by merging the ideal and noisy Bloch vectors.

Furthermore, we consider the more complex scenario where each training data point comprises only the noisy vector \(\varvec{\tilde{r}}_m\), accompanied by its associated noise label \(y_m\). As the neural network is now provided with less information, this scenario presents a higher level of complexity and, as will become clear from the following results, it generally necessitates a larger corpus of data points. We refer to this type of training dataset as \(\mathcal {T}_{N}\), where the subscript indicates that the input is exclusively the noisy Bloch vector.

As standard with classification tasks in machine learning, for both scenarios we use a one-hot encoding of the labels and train the neural networks using the categorical cross-entropy (15) as loss function (Goodfellow et al. 2016), and then measure the final performances with the accuracy metric (16).
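A minimal TensorFlow sketch of this classification setup follows; the architecture mirrors the regression network, and the input arrays and integer labels are assumed to be prepared as described above:

```python
C = 2                      # number of classes, e.g. Z_p vs. A_{p,gamma}

# Classifier for the T_IN scenario: inputs are the concatenated noisy
# and ideal single-qubit Bloch vectors, flattened to R^6; the softmax
# output layer gives the class probabilities p_ij entering Eq. (15).
clf = tf.keras.Sequential([
    tf.keras.Input(shape=(6,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(C, activation="softmax"),
])
clf.compile(optimizer="adam", loss="categorical_crossentropy",
            metrics=["accuracy"])    # Eqs. (15) and (16)

# With inputs x of shape (M, 6) and integer labels y in {0, ..., C-1}:
# clf.fit(x, tf.keras.utils.to_categorical(y, num_classes=C), epochs=100)
```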

3.2.1 Binary classification

We consider the binary classification problem (\(C=2\)) of discriminating single-qubit states subject either to a phase-flip channel \(\mathcal {Z}_p\) or to an amplitude damping channel \(\mathcal {A}_{p\gamma }\). The training set is generated by sampling a number \(|{\mathcal {T}}|\) of uniformly random single-qubit states and evolving half of them with the phase-flip channel, and the remaining half with the amplitude damping one. Classification accuracies at the end of the training procedure are reported in Table 3.

Table 3 Results for quantum channel classification tasks, with corresponding accuracies (16) obtained at the end of training and evaluated on test sets \(\tilde{\mathcal {T}}\). We studied the binary classification task of distinguishing channels \(\mathcal {Z}_p\) (\(p=0.2\)) vs. \(\mathcal {A}_{p\gamma }\) (\(p=0.5, \gamma =0.3\)) in both variations “IN” and “N” for the training dataset (19), and the three-class classification problem for channels \(\mathcal {Z}_p\) (\(p=0.2\)) vs. \(\mathcal {A}_{p\gamma }\) (\(p=0.5, \gamma =0.3\)) vs. \(\mathcal {D}_p\) (\(p=0.3\)) in the variant “IN”
Fig. 4

Deformed Bloch spheres obtained by applying a phase-flip channel \(\mathcal {Z}_p(p=0.2)\) (light blue) and a generalized amplitude damping channel \(\mathcal {A}_{p\gamma }(p=0.5,\, \gamma =0.3)\) to a set of uniformly distributed pure states. Note the non-trivial intersection between the two ellipsoids

Remarkably, with a training set containing \(|{\mathcal {T}_{IN}}|=300\) samples, our model exhibits very good classification performance, reaching perfect accuracy \(\text {ACC}=1\). In this case, we noticed that even with a reduced dataset comprising just a few dozen samples, the model is able to achieve almost perfect accuracy, although the learning process becomes perceptibly less stable.

On the other hand, with the noisy training set \(\mathcal {T}_N\) containing \(|{\mathcal {T}_N}|=300\) samples, the best accuracy obtained was \(\text {ACC}=0.92\). As expected, since the noisy dataset \(\mathcal {T}_N\) contains only information about the noisy Bloch vectors (and a label for the noisy channel that created them), more data is needed to reach good classification performance. Indeed, higher accuracies can be obtained by using larger training datasets: for example, an accuracy of \(\text {ACC}=0.98\) can be achieved with a training set containing 800 samples.

Note that when using dataset \(\mathcal {T}_N\), learning the relation between noisy Bloch vectors belonging to the same channel implicitly translates into reconstructing the shapes of the deformed Bloch spheres generated by the two channels, which are reported in Fig. 4. However, as is clear from the graphical representation, there is a non-trivial intersection between the two deformed Bloch spheres, which implies that certain samples could reasonably be assigned to either class. The imperfect accuracy, and the general decrease in classification accuracy for the noise-only training dataset \(\mathcal {T}_N\) shown in Table 3, can then be explained by this inherent ambiguity in the dataset, which makes the classification task more difficult — if not impossible — to solve exactly.

3.2.2 Multi-class classification

As a straightforward extension of the previous analysis, we also report results for a multi-class classification task, where the network is asked to classify states generated by three different channels (phase-flip, amplitude damping, and depolarizing), using a dataset of type \(\mathcal {T}_{IN}\), that is, one containing both ideal and noisy Bloch vectors. We find again that the network is able to perfectly classify all the states, reaching a perfect final accuracy \(\text {ACC}=1\) on a test set, as reported in Table 3.

4 Conclusion

In conclusion, our research underscores the remarkable effectiveness of deep neural networks in quantum information processing tasks, specifically in reconstructing and classifying quantum states undergoing unknown noisy evolutions.

As exemplified by our results, this study showcases the successful recovery of ideal (generalized) Bloch vectors with fidelities exceeding 0.99, even for quantum states of up to three qubits, under different correlated and uncorrelated noisy channels, and using both classical (mean squared error) and quantum-inspired (infidelity) loss functions for training.

Furthermore, our investigation demonstrates the versatility of our neural network approach in classification problems, adeptly handling a wide range of noise patterns and consistently achieving remarkable classification accuracy across all test samples. Notably, in the context of discriminating between phase-flip and amplitude damping channels, our model achieves an outstanding classification accuracy of 98%, highlighting its remarkable capacity to discern the relationships between states affected by similar noise sources, even when presented with the noisy vectors alone.

Several interesting research directions are available to further extend the results obtained in our work. For instance, one could evaluate the performance of the proposed machine learning method using noise models found on real hardware, or directly with data coming from a real quantum device. Additionally, due to the exponential growth of the input size (i.e., the Bloch vectors) with the number of qubits, this method is currently practical only for qubit systems of moderate size, typical of the current NISQ era. Addressing and assessing the scalability of our method to larger system sizes, as well as testing it on realistic noise channels, is essential for advancing its applicability. We consider these important topics for future research.

As we look ahead, an intriguing avenue for further exploration lies in examining the intricate connections between various fidelity measures (Liang et al. 2019) used as training loss functions and their impact on the resulting test fidelities. This pursuit aims to identify the fidelity metric best suited to the specific characteristics of the problem at hand. Such investigations promise to advance our understanding of quantum information processing and open new horizons for practical applications in quantum technology.