Approximation of quantum control correction scheme using deep neural networks

We study the functional relationship between quantum control pulses in the idealized case and the pulses in the presence of an unwanted drift. We show that a class of artificial neural networks called LSTM is able to model this functional relationship with high efficiency, and hence to learn the correction scheme required to counterbalance the effect of the drift. Our solution allows one to study the mapping from quantum control pulses to system dynamics and to analyse its behaviour with respect to local variations in the control profile.


I. INTRODUCTION
The main objective and motivation of quantum information processing is the development of new technologies based on principles of quantum mechanics such as superposition and entanglement [1]. Quantum technologies require the development of methods and principles of quantum control, i.e. control theory applied to quantum mechanical systems [2]. Such methods have to be developed by taking into account the behaviour of quantum systems [3,4]. In particular, as quantum systems are very susceptible to noise, which may influence the results of the computation, the methods of quantum control have to include means for counteracting decoherence [5].
The presented work is focused on the development of tools suitable for analysing the relation between the control pulses used for idealized quantum systems, and the control pulses required to execute the quantum computation in the presence of undesirable dynamics. Modelling this relation is important to better understand the manifold of control pulses in the presence of noise, a case which is still poorly understood. We focus on quantum dynamics described by a quantum spin chain. We are interested in a method of approximating the correction function of normal control pulses (NCP), i.e. the function accepting control pulses corresponding to the system without the drift Hamiltonian and generating the de-noising control pulses (DCP) for the system with the drift Hamiltonian. The existence of this function is non-trivial, since there are infinitely many pulses that produce the same evolution. We propose an approximation that not only incorporates the global features of the function but also describes its local properties. This feature is in contrast with available methods based on optimisation which do not take into account the continuous behaviour of the map from NCP to the de-noising control pulses. Indeed, without further assumptions, in view of the non-injectivity of quantum control, we may expect no relation at all between control pulses obtained from the optimization of different problems. On the other hand, we show that machine learning methods can be used for this purpose.
Recently, significant research effort has been invested in the application of machine learning methods in quantum information processing [6][7][8]. In particular, optimization techniques borrowed from machine learning have been used to optimize the dynamics of quantum systems [9], both for quantum control [10,11] and simulation [12], and for implementing quantum gates with suitable time-independent Hamiltonians [13]. These techniques also include quantum control techniques from dynamic optimization [14] and reinforcement learning [15,16]. In the presence of noise, neural networks provide tools for optimizing dynamical decoupling, which can be seen as a quantum control correction scheme of the kind considered by us, in the special case of the target operation being the identity [17]. On the level of gate decomposition, neural networks have also been applied to the problem of decomposing arbitrary operations into sequences from elementary gate sets [18,19].
In this paper, we propose a method, based on an artificial neural network (ANN), to study the correction scheme between control pulses obtained in the ideal case, and those obtained when the system is subject to undesired dynamics. Moreover, we demonstrate that the utilized network has high efficiency and can be used to analyse the properties of the model. This paper is organized as follows. In Section II we introduce the model and describe the architecture of a deep neural network which will be used as an approximation function. In Section III we demonstrate that the proposed method can be used for generating control pulses without explicit information about the model of the quantum system. We also utilize it for the purpose of analysing the properties of the correction scheme. In Section IV we discuss the limitations of the approach, and Section V contains a summary of the presented results.

II. METHODS: MODEL AND SOLUTION
In this section, we provide the necessary notation and background information. We start by introducing a spin-chain model and describe the problem of generating quantum control pulses that counteract the undesired dynamics present in the system. We also introduce the architecture of the artificial neural network used to approximate the correction scheme.

A. Model of quantum system
Let us consider a system of two interacting qubits. The evolution of the system is described by the GKSL master equation

dρ(t)/dt = −i[H(t), ρ(t)] + Σ_j γ_j ( L_j ρ(t) L_j† − ½ {L_j† L_j, ρ(t)} ),   (1)

where the Hamiltonian has three components,

H(t) = H_0 + H_c(t) + γ H_γ.   (2)

We consider the control Hamiltonian of the form

H_c(t) = h_x(t) σ_x ⊗ 1l + h_z(t) σ_z ⊗ 1l,   (3)

acting on the first qubit, and the base Hamiltonian H_0 given by a fixed spin-chain coupling between the qubits. The last element in Eq. (2) is the drift Hamiltonian, which can be an arbitrary two-qubit Hamiltonian multiplied by a real parameter γ > 0. The incoherent part of Eq. (1) models the interaction with an environment with strengths γ_j. It should be noted that in this paper we never consider the case when γ and the γ_j are both different from zero.

Quantum optimal control refers to the task of executing a given unitary operation via the evolution of the system, in our case described by Eq. (2). To achieve this, one has to properly choose the coefficients h(t) = (h_x(t), h_z(t)) in Eq. (3). The set of reachable unitaries can be characterized [2] by studying the Lie algebra generated by the terms in Eq. (2). For H_γ = 0 our system is fully controllable, so any target can be obtained with a suitable choice of h(t), with no restriction on the control pulses.

We assume that the function h(t) is piecewise constant in the time slots ∆t_i = [t_i, t_{i+1}], which form an equal partition of the evolution time interval [0, T] = ∪_{i=0}^{n−1} ∆t_i. We also assume that h_x(t) and h_z(t) take values in the interval [−1, 1]. The function h(t) will be represented by the vectors of values of h_x(t) and h_z(t). For the case γ = 0, we say that h(t) represents NCP (normal control pulses); for γ ≠ 0, we say that h(t) represents DCP (de-noising control pulses). Since h(t) is piecewise constant, both NCP and DCP carry two indices, the first corresponding to the time slots {0, . . . , n − 1} and the second to the direction {x, z}. It is worth noting that the mapping from the set of control operations to the unitary operator is not injective: the same unitary can be obtained using different choices of h(t).
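The piecewise-constant evolution described above can be sketched numerically as follows. This is a minimal numpy sketch for the coherent case (γ_j = 0 and γ = 0); the spin-chain base Hamiltonian used in the usage example below is illustrative, and all function names are ours, not the paper's.

```python
import numpy as np

# Pauli matrices and the 2x2 identity
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def step_unitary(h_x, h_z, H0, dt):
    """Unitary for a single time slot; H is Hermitian, so we
    exponentiate through its eigendecomposition."""
    H = H0 + h_x * np.kron(SX, I2) + h_z * np.kron(SZ, I2)
    w, v = np.linalg.eigh(H)
    return v @ np.diag(np.exp(-1j * w * dt)) @ v.conj().T

def evolve(pulses, H0, T):
    """Compose slot unitaries for piecewise-constant controls.
    pulses has shape (n, 2) with columns (h_x, h_z) in [-1, 1]."""
    n = len(pulses)
    dt = T / n
    U = np.eye(4, dtype=complex)
    for h_x, h_z in pulses:
        U = step_unitary(h_x, h_z, H0, dt) @ U  # later slots act on the left
    return U
```

Whatever pulse profile is chosen, the composed `U` stays unitary, since every slot unitary is.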
To study the relationship between NCP and DCP we need to select the DCP which is most closely related to the NCP. Because of continuity, we do that numerically by using the NCP as the starting guess for the DCP. The final optimal DCP is then found by a local optimisation around the initial NCP. The figure of merit in our problem is the fidelity distance between superoperators, defined as [20]

F_err(X(T), Y) = 1 − F(X(T), Y),   (8)

with

F(X(T), Y) = (1/N²) |Tr(Y† X(T))|,

where N is the dimension of the system in question, Y is the superoperator of the fixed target operator, and X(T) is the evolution superoperator resulting from the numerical integration of Eq. (1) with the given controls. In particular, for a target unitary operator U, its superoperator Y is given by the formula

Y = Ū ⊗ U,

where Ū denotes the entry-wise complex conjugate of U. The superoperator X(T) is obtained from the unitary operator resulting from the integration of Eq. (1).
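The superoperator construction and the fidelity of Eq. (8) can be sketched in a few lines (a numpy sketch under the standard column-stacking vectorization convention; function names are ours):

```python
import numpy as np

def superop(U):
    """Superoperator of a unitary U in the column-stacking convention:
    vec(U rho U^dag) = (conj(U) kron U) vec(rho)."""
    return np.kron(U.conj(), U)

def fidelity(X, Y):
    """F(X, Y) = |Tr(Y^dag X)| / N^2, where X and Y are N^2 x N^2."""
    return abs(np.trace(Y.conj().T @ X)) / X.shape[0]

def fidelity_error(X, Y):
    """Fidelity distance of Eq. (8)."""
    return 1.0 - fidelity(X, Y)
```

The normalization follows from Tr(Y†Y) = N² for a unitary superoperator, so that F(Y, Y) = 1 and F_err(Y, Y) = 0.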

B. Architecture of artificial neural network
The control pulses used to drive the quantum system with the Hamiltonian from Eq. (3) are formally a time series. This suggests that one may study their properties using methods from pattern recognition and machine learning [21,22] that have been successfully applied to data with similar characteristics. The mapping from NCP to DCP shares mathematical properties with statistical machine translation [23], a problem which is successfully modelled with artificial neural networks (ANN) [24]. Because of this analogy, we use an ANN as the approximation function to learn the correction scheme for control pulses. A trained artificial neural network will be used as a map from NCP to DCP,

nnDCP = ANN(NCP),   (10)

where nnDCP, the neural network DCP, is an approximation of the DCP obtained by using the neural network.

Because of the time-series character of the control sequences, we utilize bidirectional long short-term memory (LSTM) networks [25]. The long short-term memory block is a special kind of recurrent neural network (RNN), a type of neural network with directed cycles between units. These cycles allow the RNN to keep track of the inputs received at former times: the output at a given time depends not only on the current input, but also on the history of earlier inputs. This kind of neural network is applicable to time series with long-range correlations, such as in natural language processing, where the next word depends on the previous sentence but also on the context.

The LSTM block consists of two cells, the cell state c_t and the hidden state s_t,

c_t = f_t • c_{t−1} + i_t • g_t,
s_t = o_t • tanh(c_t),

which are constructed from the following gates:

forget gate: f_t = σ(W_f x_t + V_f s_{t−1}),
input gate: i_t = σ(W_i x_t + V_i s_{t−1}),
candidate gate: g_t = tanh(W_g x_t + V_g s_{t−1}),
output gate: o_t = σ(W_o x_t + V_o s_{t−1}),

where • is the element-wise multiplication of two vectors, s_{t−1} is the previous hidden state, and x_t is the current input. The matrices V and W are the weight matrices of each gate. As one can see, there is only one gate with the hyperbolic tangent as activation function, the candidate gate. The remaining gates have the sigmoid activation function σ, whose values lie in the interval [0, 1].
Using these functions, the neural network decides which values are worth keeping and which should be forgotten; the gates maintain the memory of the network. Basic RNN architectures are not suitable for maintaining long-time dependencies, due to the so-called vanishing/exploding gradient problem: the gradient of the cost function may either exponentially decay or explode through the hidden units. Thanks to the structure of the LSTM, the vanishing gradient problem is mitigated. The bidirectional version of the LSTM is characterized by the fact that it analyses the input sequence/time series both forwards and backwards, and thus it uses not only information from the past but also from the future [26,27]. For this purpose the bidirectional LSTM unit consists of two LSTM units: one takes as input the vector [h(t_1), h(t_2), . . . , h(t_n)], and the other takes [h(t_n), h(t_{n−1}), . . . , h(t_1)].
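The gate equations and the bidirectional pass can be written out directly (a minimal numpy sketch; for brevity the two directions share weights here, whereas in an actual bidirectional LSTM each direction has its own):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, s_prev, c_prev, W, V):
    """One LSTM step following the gate equations above.
    W and V map gate names to input and recurrent weight matrices."""
    f = sigmoid(W['f'] @ x_t + V['f'] @ s_prev)  # forget gate
    i = sigmoid(W['i'] @ x_t + V['i'] @ s_prev)  # input gate
    g = np.tanh(W['g'] @ x_t + V['g'] @ s_prev)  # candidate gate
    o = sigmoid(W['o'] @ x_t + V['o'] @ s_prev)  # output gate
    c = f * c_prev + i * g                       # cell state
    s = o * np.tanh(c)                           # hidden state
    return s, c

def bidirectional_pass(xs, W, V, d):
    """Run the cell forwards and backwards over the sequence and
    concatenate the hidden states, as in a bidirectional LSTM layer."""
    def run(seq):
        s, c = np.zeros(d), np.zeros(d)
        out = []
        for x_t in seq:
            s, c = lstm_step(x_t, s, c, W, V)
            out.append(s)
        return np.array(out)
    return np.concatenate([run(xs), run(xs[::-1])[::-1]], axis=1)
```

Since o_t ∈ (0, 1) and tanh(c_t) ∈ (−1, 1), every hidden state component stays in (−1, 1), which matches the range required of the control pulses.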
As in Eq. (10), the result of the network is a vector of control pulses nnDCP. To evaluate the quality of nnDCP we apply the loss function described in Eq. (8): we integrate a new superoperator X(T) from nnDCP and measure how far it is from the target superoperator Y.
For two-qubit systems, we found that three stacked bidirectional LSTM layers are sufficient for obtaining high fidelity. Moreover, at the end of the network we use one dense layer which processes the output of the stacked LSTM to obtain our nnDCP. The experiments are performed using the TensorFlow library [28,29].

III. RESULTS: EXPERIMENTS
For the purpose of testing the proposed method we use a sample of Haar-random unitary matrices [30,31]. Pseudo-uniform random unitaries can also be obtained in the quantum control setting via random functions h(t), provided that the control time T is long enough [32]. The exact implementation of the sampling of Haar-random unitary matrices is available at [33]. In our experiments we use QuTiP [34][35][36] to generate the control pulses for the training and testing data. First, we construct the target operators U_target from random matrices U ∈ U(2). Next, using QuTiP we generate the NCP corresponding to the target operators. In our setup, the parameters are fixed as follows:
• time of evolution T = 6,
• number of intervals n = 32,
• control pulses in [−1, 1].
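Haar-random unitaries can be drawn with the standard QR-based construction (a numpy sketch; the implementation actually used in the experiments is the one available at [33]):

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample from the Haar measure on U(n): QR-decompose a complex
    Ginibre matrix and fix the phases using the diagonal of R."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))  # multiply column k of q by the phase of r[k, k]
```

Without the phase-correction step the distribution of Q from a plain QR decomposition is not exactly Haar, which is why the diagonal of R is used.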
We train the network using a subset of the generated pairs {(NCP, Y_target)}, where Y_target is the target superoperator obtained from U_target. In the training process the network takes an NCP as input and generates the nnDCP. This nnDCP is then passed to the loss function: we construct the superoperator X(T) resulting from the integration of Eq. (1) using the nnDCP, and calculate the error between X(T) and Y_target as in Eq. (8). In this section we analyse only coherent drift, i.e. γ_j = 0 in Eq. (1). The source code implementing the experiments described in this paper is available at [33].

A. Performance of the neural network
The first experiment is designed to analyse the efficiency of the trained network in terms of the fidelity error of the generated nnDCP control pulses. The trained LSTM networks achieve the mean fidelities on the test set presented in Table I. It should be noted that, despite the fact that the mean fidelity on the test set is high, the trained network sometimes produces outlier results, i.e. it returns nnDCP corresponding to an operator with a high fidelity error. The performed experiments show that it is possible to train LSTM networks for a given system with high efficiency. The results from Table I are obtained by training artificial neural networks on different kinds of drift, i.e. ασ_x ⊗ 1l + (1 − α)σ_y ⊗ 1l for α ∈ {0, 0.2, 0.5, 0.8}, with different values of γ. The experiment with this kind of drift is representative because σ_y is orthogonal to the controls σ_x, σ_z. The performed experiment shows that it is possible to train the LSTM to achieve an efficiency on the test set above 90% for the chosen values of γ. Some of the results from Table I have an average score lower than 90%; this is caused by the outlier cases, in which the network performs with very low efficiency. Such efficiency for the chosen values of γ is sufficient for our goal, which is to study the relations between the system and the control pulses for relatively small disturbances.
In Table II we present the average score for the model with spin-chain drift. As one can see, these results are much better than those in Table I. This can be explained by the fact that this type of drift is similar to the base Hamiltonian H_0, which is also a spin chain.
The results from the above tables may suggest an answer to a non-trivial question: for what kind of drift is the map between NCP and DCP easier to learn? Is training on a model whose drift acts on the same subsystem as the control easier than training on a model whose drift is similar to the base Hamiltonian? The obtained results suggest that, when the drift is asymmetric with respect to the base Hamiltonian, the neural network requires larger mini-batch sizes to achieve similar efficiency.
As one can see, the efficiency of the artificial neural networks depends on the choice of the hyperparameters. In our case, for some values of the parameter γ one needs to use a bigger batch size and a larger training set to obtain satisfactory results. This behaviour is difficult to explain: for γ = 0.6 and a batch size equal to 5, the RNN has problems with convergence, yet by increasing the batch size we were able to improve the performance. We would like to stress that our aim was to show that it is possible to obtain a mean efficiency greater than 90%, not to determine the best possible efficiency of the network.

B. Utilization of the approximation
In this section, we present an experiment which checks the behaviour of the ANN(·) function when we apply local disturbances to a set of random NCP, and measures the deviation of the new nnDCP from the original DCP. The original DCP is obtained from the GRAPE algorithm initialized with the NCP.
Let us suppose that we have a trained LSTM. The procedure for checking its sensitivity to variations of h_j(t), for j ∈ {x, z} and t in the i-th time slot, is as follows.
Step 1 Select an NCP vector from the testing set and generate the corresponding nnDCP.
Step 2 If the fidelity between the target operator and the operator resulting from the nnDCP is lower than 90%, return to Step 1.
Step 3 Select i ∈ {0, . . . , n − 1} and j ∈ {x, z}, change the (i, j) coordinate of the NCP by a fixed ε, and calculate the new nnDCP vector, denoted nnDCP_{i,j}, obtained by perturbing the j-th component at the i-th time slot.
Step 4 If the fidelity between the target operator and the operator resulting from nnDCP_{i,j} is lower than 90%, return to Step 3.
Step 5 Calculate the l_2 (Euclidean) norm of the difference between nnDCP and nnDCP_{i,j}.
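The steps above can be sketched as follows (a numpy sketch in which `ann` stands in for the trained network; the 90% fidelity checks of Steps 2 and 4 are omitted for brevity):

```python
import numpy as np

def perturbation_norms(ncp, ann, eps=0.1):
    """l2 distance between the nnDCP of the original NCP and the nnDCP of
    an NCP with one (time slot, direction) coordinate shifted by eps."""
    base = ann(ncp)                       # Step 1: nnDCP of the unperturbed NCP
    norms = {}
    for i in range(ncp.shape[0]):         # time slots (Step 3)
        for j in (0, 1):                  # 0 -> x direction, 1 -> z direction
            perturbed = ncp.copy()
            perturbed[i, j] += eps
            norms[(i, j)] = np.linalg.norm(ann(perturbed) - base)  # Step 5
    return norms
```

Collecting these norms over many NCP from the testing set yields the empirical distributions of variations analysed below.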
Because of the outlier results of the network, the above algorithm contains additional conditions on the generated pulses (Steps 2 and 4). Applying the algorithm to each NCP from the testing set, we obtain a sample of variations; we can then analyse the empirical distributions of these variations.
As an example of using the above method, let us suppose that the drift operator acts on the first qubit only. Then, for ε = 0.1 and γ = 0.2, exemplary variation histograms are presented in Fig. 2. Thanks to the trained network we are able to analyse the impact of small changes of the input on the output (Step 5). One can consider the median of the distribution of variations as a quantitative measure of the influence of changes in the input signal on the resulting DCP. To check whether the medians of the distributions of changes are similar, we perform the Kruskal-Wallis statistical test for each pair of the changed coordinates (see Fig. 3(a)). The results presented in Fig. 3(a) show that most of the obtained distributions are statistically different with respect to the median, i.e. most of the p-values are less than 0.05. The values on the main diagonals, where we compare disturbances on the same controls in the same time slots, are equal to 1; this observation confirms that the test behaves appropriately in this case. On the other hand, one can observe that for time slots 10 ≤ i ≤ 30 the Kruskal-Wallis test for the distributions obtained for NCP_{i+1,x} ± ε and NCP_{i,z} ± ε gives p-values greater than 0.05. Thus, the disturbances introduced in these time slots on the h_x and h_z coordinates of the NCP result in similar variations of the resulting DCP. The situation is different for time slots i ≤ 10, where one can see that the variation in the DCP signal depends on which coordinate of the NCP signal is disturbed. A similar effect can be observed if the drift Hamiltonian acts on the second qubit only: for this case, with ε = 0.1 and γ = 0.2, the results of the Kruskal-Wallis test are presented in Fig. 3(b). One can see that our approach suggests that there are similarities in the distributions of variations implied by disturbances on different controls and nearby time slots.
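In practice the test can be run with scipy.stats.kruskal, which also supplies the p-value from the chi-squared approximation; the statistic itself (without tie correction) reduces to a few lines (a numpy sketch, applied here to groups of variation samples):

```python
import numpy as np

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic without tie correction:
    H = 12 / (N (N + 1)) * sum_i n_i (rbar_i - (N + 1) / 2)^2."""
    data = np.concatenate(groups)
    N = data.size
    ranks = np.empty(N)
    ranks[np.argsort(data, kind='stable')] = np.arange(1, N + 1)  # ranks 1..N
    h, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + g.size]      # ranks of this group's samples
        h += g.size * (r.mean() - (N + 1) / 2.0) ** 2
        start += g.size
    return 12.0 / (N * (N + 1)) * h
```

A large H (equivalently, a small p-value) indicates that the compared distributions of variations differ in their medians.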
This suggests that local disturbances in the control signals have a similar effect in the case of drift on the target system and drift on the auxiliary system. Moreover, one can see that the constructed approximation exhibits the symmetry of the model. This effect can be observed by analysing the disturbances of the h_x and h_z controls in the same time slot: the performed experiments show that they give similar variations, as quantified by the distribution of DCP changes.
One should note that in the plots in Fig. 3, where we compare the variations of nnDCP_{i,x} and nnDCP_{i,z}, the p-values greater than 0.05 are concentrated along the diagonal. This symmetry is not perfect, but one should note that the training data are generated from random unitary matrices and, moreover, we do not impose any restrictions on the training set to ensure the uniqueness of the correction scheme.

IV. DISCUSSIONS
In the case where the Hamiltonian drift is absent (γ = 0) and we want to find proper DCP for a system with incoherent noise, things get more complicated. In our experiments we tested three kinds of Lindblad noise, namely:
1. one Lindblad operator L = |0⟩⟨1| ⊗ 1l,
2. two Lindblad operators L_1 = σ_z ⊗ 1l and L_2 = 1l ⊗ σ_z,
3. mixed Lindblad operators L_1 = σ_z ⊗ 1l and L_2 = |0⟩⟨1| ⊗ 1l.
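Each of these noise models enters the evolution through the incoherent part of Eq. (1). A minimal numpy sketch of the dissipator term (the Lindblad operator used in the test below is the first model above):

```python
import numpy as np

def dissipator(rho, L, gamma):
    """Incoherent GKSL term: gamma (L rho L^dag - {L^dag L, rho} / 2)."""
    LdL = L.conj().T @ L
    return gamma * (L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL))
```

The dissipator is traceless for any state, which is what guarantees that the master equation preserves the trace of ρ.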
For all these open-system noise models, the nnDCP achieves an efficiency above 90% for γ_1 = γ_2 ≈ 0.01. However, this noise strength is so small that the NCP achieves a similar efficiency. Therefore, the proposed neural network model does not correct the NCP effectively for incoherent types of noise.
Another important case in the context of quantum control is robustness to random fluctuations. In our experiments, we tried to generate nnDCP which would be robust to Gaussian fluctuations, i.e. for all i ∈ {0, . . . , n − 1} and j ∈ {x, z} we consider nnDCP_{i,j} + δ_{i,j}, where δ_{i,j} ∼ N(0, σ). We tested two standard deviations, σ = 0.1 and σ = 0.2, while the Hamiltonian drift was 0.4(σ_y ⊗ 1l). During the training process, the control pulses returned by the ANN were copied ten times and Gaussian fluctuations were added to each copy. Such disturbed nnDCP were fed into the cost function of Eq. (8), and the average fidelity was calculated. Unfortunately, after this training process the artificial neural network does not produce nnDCP which are more robust to random fluctuations. This failure might be caused by the fact that the training of the artificial neural network is based on gradient descent; as we know from [16], gradient-based algorithms do not give very good results in this setting. On the other hand, the task we face is slightly different from the one in [16], because we want to generalize the correction scheme over all target unitaries.
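The noise-averaged objective described above can be sketched as follows (a numpy sketch; `fidelity_of` is a placeholder standing for the integration of Eq. (1) with the given pulses followed by the evaluation of Eq. (8)):

```python
import numpy as np

def noise_averaged_fidelity(pulses, fidelity_of, sigma, copies=10, rng=None):
    """Average the fidelity over `copies` noisy replicas of the pulses,
    each perturbed by i.i.d. Gaussian fluctuations N(0, sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    fids = [fidelity_of(pulses + rng.normal(0.0, sigma, size=pulses.shape))
            for _ in range(copies)]
    return float(np.mean(fids))
```

Maximizing this averaged quantity, rather than the fidelity of the nominal pulses, is what the robustness-oriented training attempted.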

V. CONCLUDING REMARKS
The primary objective of the presented work is to use artificial neural networks for the purpose of approximating the correction scheme for quantum control pulses. We propose to use a bidirectional LSTM neural network and argue that this type of artificial neural network is suitable for capturing the time dependencies present in quantum control pulses. We have developed a method of reconstructing the relation between the control pulses in the idealised case and the control pulses required to implement quantum computation in the presence of undesirable dynamics. We argue that the proposed method can be a useful tool for studying the manifold of quantum control pulses in the noisy regime, and for defining new theoretical approaches to the control of noisy quantum systems.