Training Gaussian boson sampling by quantum machine learning

We use neural networks to represent the characteristic function of many-body Gaussian states in the quantum phase space. By a pullback mechanism, we model transformations due to unitary operators as linear layers that can be cascaded to simulate complex multi-particle processes. We use the layered neural networks for non-classical light propagation in random interferometers, and compute boson pattern probabilities by automatic differentiation. This provides a viable strategy for training Gaussian boson sampling. We demonstrate that multi-particle events in Gaussian boson sampling can be optimized by a proper design and training of the neural network weights. The results are potentially useful for the creation of new sources and complex circuits for quantum technologies.


I. INTRODUCTION
The development of new models and tools for machine learning (ML) is profoundly affecting the study of many-body quantum systems and quantum optics [1]. Neural networks (NNs) enable representations of high-dimensional systems and furnish a universal ansatz for many purposes, such as finding the ground state of many-body Hamiltonians [2], including dissipative systems [3,4].
The impact of ML in quantum optics and many-body physics stems from the versatile representation that NN models furnish for functions of an arbitrary number of variables. Also, powerful application programming interfaces (APIs), such as TensorFlow, enable many new features and tools to compute and design many-body Hamiltonians or large-scale quantum gates [18].
Here we show that NN models are also useful when considering representations in the phase space, such as the characteristic function χ or the Q-representation [19]. Unitary operators, such as squeezers or displacers, act on the phase space as variable transformations that correspond to layers in the NN model. Hence, a multilayer NN may encode phase-space representations of complex many-body states. This encoding has two main advantages: on the one hand, one can quickly build complex quantum states by combining NN layers; on the other hand, one can use automatic graph building and API differentiation technology to compute observables. Also, graphical and tensor processing units (GPUs and TPUs) may speed up the computation.
In the following, we show how to compute the probability of multi-particle patterns when Gaussian states propagate in a system made of squeezers and interferometers. This problem corresponds to the renowned Gaussian boson sampling [20,21], which recently demonstrated quantum advantage at an impressive scale [22], following earlier realizations [23][24][25][26][27][28] of the original proposal by Aaronson and Arkhipov [29]. The theory of Gaussian boson sampling (GBS) heavily relies on phase-space methods [30], making it an exciting NN test-bed supported by recently reported trainable hardware [31][32][33].
A notable outcome of adopting NN models in the phase space is the possibility of training multi-particle statistics [34] and other features, such as the degree of entanglement. Indeed, most of the reported investigations in quantum ML focus either on using NN models as a variational ansatz or on tailoring the input/output response of a quantum gate. On the contrary, ML in the phase space permits optimizing many-particle features, for example, to increase the probability of multi-photon events. NNs may open new strategies to generate non-classical light or enhance the probability of observing large-scale entanglement, with relevance in many applications. Here, we derive the NN representing the characteristic function of the Gaussian boson sampling setup. Proper NN training increases the photon-pair probability by orders of magnitude.
Fig. 1 shows the general workflow of the proposed methodology, whose steps define a trainable model for optimizing Gaussian boson sampling. In Section II, we introduce the way we adopt a neural network to compute the characteristic function. In Sec. III, we detail how to compute observables as derivatives of the characteristic-function neural network. In Sec. IV, we show how to compute the Gaussian boson sampling patterns. In Sec. V, we introduce the loss function and describe the training of the model to optimize specific patterns. Conclusions are drawn in Sec. VI.

II. CHARACTERISTIC FUNCTION AS A NEURAL NETWORK
In the phase space, we represent an n-body state by the complex characteristic function χ(x) = χ_R(x) + ıχ_I(x) of a real vector x [19,35]. x has dimension 1 × N with N = 2n. For Gaussian states [36],

χ(x) = exp(−¼ x g x⊤ + ı x d),   (1)

with g the real N × N covariance matrix and d the real N × 1 displacement vector. In our notation, we omit the symbols of the dot product, such that xd and xgx⊤ are scalars. One has (j, k = 0, 1, 2, ..., N − 1)

g_jk = ⟨R̂_j R̂_k + R̂_k R̂_j⟩ − 2 d_j d_k,  d_j = ⟨R̂_j⟩   (2)

[36]. In Eq. (2), the canonical variables, q̂_j = R̂_2j and p̂_j = R̂_2j+1, with j = 0, 1, ..., n − 1, are organized in the N × 1 operator array R̂. As shown in Fig. 2a, the characteristic function is a NN layer with two real outputs χ_R and χ_I. The χ layer has two inputs: x, and an auxiliary N × 1 bias vector a, for later convenience.
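The χ layer can be sketched in a few lines of numpy (a minimal illustration, assuming the Gaussian form χ(x) = exp(−¼ x g x⊤ + ı x d) and the bias convention χ(x)e^{ıxa} of Fig. 2a; the function names are ours):

```python
import numpy as np

def chi_layer(x, g, d, a=None):
    """Real and imaginary parts of chi(x) * exp(i x a) for a Gaussian state.

    x : (N,) real phase-space point (row vector)
    g : (N, N) real covariance matrix (identity for the vacuum)
    d : (N,) real displacement vector
    a : (N,) auxiliary bias vector (zero by default)
    """
    if a is None:
        a = np.zeros_like(x)
    # chi(x) = exp(-x g x^T / 4 + i x d), evaluated as amplitude and phase
    amplitude = np.exp(-0.25 * x @ g @ x)
    phase = x @ (d + a)
    return amplitude * np.cos(phase), amplitude * np.sin(phase)

# Vacuum (g = 1, d = 0): chi is real and equals exp(-|x|^2 / 4)
N = 2
x = np.array([1.0, 0.5])
chi_re, chi_im = chi_layer(x, np.eye(N), np.zeros(N))
```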
The vacuum state is a Gaussian state with g = 1 and d = 0. From the vacuum, one can generate specific states by unitary operators, such as displacement or squeezing operators. These transform the canonical variables as R̂′ = M R̂ + d′, where the symplectic matrix M and the vector d′ depend on the specific operator (detailed, e.g., in [36]). The characteristic function changes as

χ′(x) = χ(xM) e^{ı x d′}.   (4)

We represent the linear transformation as a NN layer with two inputs x and a and two outputs xM and M⁻¹(d′ + a) (Fig. 2b). By this definition, Eq. (4) is a two-layer NN. Figure 2c shows χ′ as the "pullback" of the linear layer from the χ layer. The two layers form a NN that can be implemented with common APIs. Given the vacuum state with characteristic function χ, one can build the NN model of an arbitrary state by multiple pullbacks. Indeed, we defined the linear layers so that they can be cascaded. Figure 4a below shows an n-mode squeezed vacuum as a multiple pullback of single-mode squeezers, each acting on a different mode.
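Cascading pullback layers can be sketched as follows (a numpy sketch under our conventions: the layer maps (x, a) to (xM, M⁻¹(d′ + a)) so that feeding the outputs into the χ layer yields the transformed characteristic function; the squeezer parameterization below is illustrative and d′ = 0 for simplicity):

```python
import numpy as np

def linear_layer(x, a, M, dp):
    """Pullback layer of a symplectic transformation R' = M R + d'.

    Maps the inputs (x, a) to (x M, M^{-1} (d' + a)); composing two such
    layers is equivalent to a single layer with matrix M1 M2.
    """
    return x @ M, np.linalg.solve(M, dp + a)

def squeezer(r, phi=0.0):
    """2x2 symplectic matrix of a single-mode squeezer (illustrative convention)."""
    c, s = np.cos(phi), np.sin(phi)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([np.exp(-r), np.exp(r)]) @ R.T

# cascade two layers: the composed map equals the single map with M1 M2
x, a = np.array([0.3, -0.7]), np.zeros(2)
M1, M2 = squeezer(0.5), squeezer(0.2, np.pi / 4)
x1, a1 = linear_layer(x, a, M1, np.zeros(2))
x2, a2 = linear_layer(x1, a1, M2, np.zeros(2))
```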

III. OBSERVABLES
Observables are computed as derivatives of the NN model. For example, the mean photon number per mode is related to the derivatives of the characteristic function. The mean photon number for mode j is

⟨n̂_j⟩ = −½ ∇²_j χ(x)|_{x=0} − ½,

being ∇²_j = ∂²_{q_j} + ∂²_{p_j}, with q_j = x_2j and p_j = x_2j+1. The differential photon number of modes j and k is

⟨n̂_j − n̂_k⟩ = −½ (∇²_j − ∇²_k) χ(x)|_{x=0}.

Automatic differentiation packages enable efficient computation of the derivatives of the NN model.
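The mean photon number can be checked numerically from the characteristic function (a numpy sketch assuming the relation ⟨n̂_j⟩ = −½ ∇²_j χ|₀ − ½ in our convention, with central finite differences standing in for the automatic differentiation used in the paper; for a single-mode squeezed vacuum it reproduces ⟨n̂⟩ = sinh²r):

```python
import numpy as np

def mean_photon_number(chi, j, dim, h=1e-4):
    """<n_j> = -1/2 (d^2/dq_j^2 + d^2/dp_j^2) chi(0) - 1/2 by central differences."""
    def lap(idx):
        e = np.zeros(dim)
        e[idx] = h
        return (chi(e) - 2.0 * chi(np.zeros(dim)) + chi(-e)) / h**2
    return -0.5 * (lap(2 * j) + lap(2 * j + 1)) - 0.5

# single-mode squeezed vacuum: covariance diag(e^{-2r}, e^{2r}), so <n> = sinh(r)^2
r = 0.8
g = np.diag([np.exp(-2 * r), np.exp(2 * r)])
chi = lambda x: np.exp(-0.25 * x @ g @ x)
nbar = mean_photon_number(chi, 0, 2)
```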

IV. GAUSSIAN BOSON SAMPLING WITH THE NEURAL NETWORK MODEL
In the GBS protocol, one considers a many-body squeezed vacuum state propagating in a Haar-random interferometer, which distributes the photons in the output modes. For modelling GBS, we hence need squeezing layers and a layer representing the transmission through random interferometers. The squeezing layers are realized by a proper design of the corresponding symplectic matrices M with d′ = 0. We implement the Haar matrix operator by the QuTiP software [37]. Figure 3 shows a pseudo-code to build the neural network model by composing different layers.
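A numpy-only sketch of these layers at the covariance level (we draw the Haar unitary by the standard QR trick instead of QuTiP, and the variable ordering (q0, p0, q1, p1, ...) is our assumption):

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-random n x n unitary via QR of a complex Gaussian matrix
    (a numpy stand-in for the QuTiP generator used in the paper)."""
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

def interferometer_symplectic(U):
    """2n x 2n symplectic (orthogonal) matrix of a passive interferometer,
    acting on variables ordered as (q0, p0, q1, p1, ...)."""
    n = U.shape[0]
    M = np.zeros((2 * n, 2 * n))
    X, Y = U.real, U.imag
    for j in range(n):
        for k in range(n):
            M[2 * j, 2 * k], M[2 * j, 2 * k + 1] = X[j, k], -Y[j, k]
            M[2 * j + 1, 2 * k], M[2 * j + 1, 2 * k + 1] = Y[j, k], X[j, k]
    return M

# covariance after n identical squeezers and the Haar layer: g -> M g M^T
n, r = 3, 0.88
rng = np.random.default_rng(0)
g_sq = np.diag(sum([[np.exp(-2 * r), np.exp(2 * r)] for _ in range(n)], []))
M = interferometer_symplectic(haar_unitary(n, rng))
g_out = M @ g_sq @ M.T
```

Since the interferometer is passive, its symplectic matrix is also orthogonal, so it preserves the total mean photon number.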
We introduce the N × 1 real vector k, with α_j = k_2j + ı k_2j+1 the coherent amplitude of mode j, and we have

Pr(n̄) = π^n / (∏_j n_j!) ∏_{j=0}^{n−1} (∇̃²_j / 4)^{n_j} [e^{k k⊤} Q_ρ(k)]|_{k=0},   (8)

with ∇̃²_j = ∂²/∂k²_2j + ∂²/∂k²_2j+1 and n_T = Σ_{j=0}^{n−1} n_j the total photon number. Q_ρ in Eq. (8) can be evaluated explicitly as a multidimensional Gaussian integral:

Q_ρ(k) = exp[−(k − k̄) A⁻¹ (k − k̄)⊤] / (π^n √det A),   (9)

with k̄ the displacement expressed in the k variables and (p, q = 0, 1, ..., N − 1)

A_pq = ½ (g_pq + δ_pq).   (10)

Eqs. (9) and (10) can be implemented as further layers of the NN, and the probability of a given pattern computed by running the model. Figure 5a shows an example of the pattern probability distribution with n = 6, obtained by using the NN model in Fig. 4b with squeezing parameters r_j = 0.88 and φ_j = π/4, such that all the single-mode squeezers are identical, each with mean photon number sinh(r_j)² ≃ 1. As in [20], we consider patterns with n_j ∈ {0, 1}.
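For patterns with n_j ∈ {0, 1} and zero displacement, Eqs. (8)–(10) can be sketched directly in numpy (finite differences stand in for running the NN model; the prefactor conventions are our reconstruction, checked here against the single-mode squeezed vacuum, Pr(0) = 1/cosh r, and the thermal state, Pr(1) = n̄/(n̄+1)²):

```python
import numpy as np

def pattern_probability(pattern, g, h=1e-3):
    """Probability of a boson pattern with n_j in {0, 1} from the Q function:
    Pr(n) = pi^n * prod_j (lap_j / 4)^{n_j} [exp(k k^T) Q(k)] at k = 0,
    with Q(k) = exp(-k A^{-1} k^T) / (pi^n sqrt(det A)) and A = (g + 1)/2
    (zero displacement; derivatives taken by central finite differences)."""
    n = len(pattern)
    A = 0.5 * (g + np.eye(2 * n))
    Ainv = np.linalg.inv(A)
    norm = np.pi**n * np.sqrt(np.linalg.det(A))
    f = lambda k: np.exp(k @ k) * np.exp(-k @ Ainv @ k) / norm

    def apply_lap(fun, idx):
        # (d^2/dk_{2j}^2 + d^2/dk_{2j+1}^2) / 4 for one occupied mode
        def lap(k):
            out = 0.0
            for i in idx:
                e = np.zeros(2 * n)
                e[i] = h
                out += (fun(k + e) - 2 * fun(k) + fun(k - e)) / h**2
            return out / 4
        return lap

    for j, nj in enumerate(pattern):
        if nj == 1:
            f = apply_lap(f, (2 * j, 2 * j + 1))
    return np.pi**n * f(np.zeros(2 * n))
```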

V. TRAINING GAUSSIAN BOSON SAMPLING
Our interest is in understanding whether we can train the model to maximize the generation of specific patterns, e.g., a photon pair in modes 0 and 1. Using complex media to tailor linear systems is a well-known technique, for example, to synthesize specific gates [38,39] or to tame entanglement [40]. Here, we use the NN model in the phase space to optimize multi-particle events.
One could use the squeezing parameters in the model in Fig. 4b as training parameters. However, the degree of squeezing affects the number of particles per mode, and we want to alter the statistical properties of the states without changing the average number of particles. We hence consider a GBS setup with an additional trainable interferometer, as in Fig. 4c, which is typically realized by amplitude or phase modulators.
In Fig. 4c, n squeezed vacuum modes impinge on a trainable interferometer and then travel through a Haar interferometer. Instead of two distinct interferometers, one could use a single device (i.e., combine the Haar interferometer with the trainable interferometer), but we prefer to distinguish the trainable part from the mode-mixing Haar unitary operator.
Given n modes, our goal is to maximize the probability of patterns that contain a pair of photons in modes 0 and 1. For example, for n = 6, this means maximizing the probability of n̄ = (1, 1, 0, 0, 0, 0) with respect to n̄ = (1, 0, 0, 1, 0, 0). We use as loss function

L = ⟨(n̂_0 − n̂_1)²⟩,   (11)

which is minimal when the expected differential number of photons in mode 0 and mode 1 vanishes. This is the case when the state has a particle pair in mode 0 and mode 1. We stress the difference with respect to other cost functions, which involve the expected number of photons per mode, e.g.,

L_0 = (⟨n̂_0⟩ − ⟨n̂_1⟩)².   (12)

The linear interferometer does not affect the average number of photons (which are mixed by the Haar layer). Correspondingly, training with L_0 in Eq. (12) is not effective to generate entangled pairs. On the contrary, L in Eq. (11) contains ⟨n̂_0 n̂_1⟩, which is maximal with a photon pair in modes 0 and 1. Fig. 5a shows the computed probabilities of pairs for the model in Fig. 4c, with a random instance of the Haar and the linear interferometers. Training strongly alters this statistical distribution, as shown in Fig. 5b. Fig. 5c shows the trend during the training epochs of ⟨(n̂_0 − n̂_1)²⟩, which goes to zero while the mean photon numbers ⟨n̂_0⟩ and ⟨n̂_1⟩ remain unaltered.
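The training step can be illustrated on a toy two-mode version of the setup (a numpy sketch, not the paper's full TensorFlow model: the loss ⟨(n̂_0 − n̂_1)²⟩ is evaluated from the covariance by Wick's theorem with σ = g/2, the trainable interferometer is a single beamsplitter angle θ, and finite-difference gradients replace backpropagation; training drives θ to π/4, where the two squeezed modes merge into a two-mode squeezed vacuum with perfectly correlated pairs):

```python
import numpy as np

def beamsplitter_M(theta):
    """Symplectic matrix of a real beamsplitter on two modes, variables (q0, p0, q1, p1)."""
    c, s = np.cos(theta), np.sin(theta)
    M = np.zeros((4, 4))
    for a, b in ((0, 2), (1, 3)):  # the (q0, q1) and (p0, p1) pairs
        M[a, a], M[a, b] = c, s
        M[b, a], M[b, b] = -s, c
    return M

def loss(theta, g_in):
    """L = <(n0 - n1)^2> for a zero-mean Gaussian state, by Wick's theorem
    with sigma = g/2 and Omega the symplectic form."""
    sigma = 0.5 * beamsplitter_M(theta) @ g_in @ beamsplitter_M(theta).T
    Omega = np.zeros((4, 4))
    for j in (0, 2):
        Omega[j, j + 1], Omega[j + 1, j] = 1.0, -1.0
    s = np.array([1.0, 1.0, -1.0, -1.0])  # signs of q0^2, p0^2, -q1^2, -p1^2
    L = 0.0
    for a in range(4):
        for b in range(4):
            L += 0.25 * s[a] * s[b] * (sigma[a, a] * sigma[b, b]
                                       + 2 * sigma[a, b] ** 2 - 0.5 * Omega[a, b] ** 2)
    return L

# two identical squeezers with orthogonal squeezing axes
r = 0.88
g_in = np.diag([np.exp(-2 * r), np.exp(2 * r), np.exp(2 * r), np.exp(-2 * r)])

theta, lr, h = 0.1, 0.01, 1e-5
for epoch in range(300):  # plain gradient descent on the beamsplitter angle
    grad = (loss(theta + h, g_in) - loss(theta - h, g_in)) / (2 * h)
    theta -= lr * grad
```

Before training (θ = 0) the loss equals the sum of the two independent photon-number variances, sinh²(2r); after training it vanishes.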

Training also maximizes higher-order photon events, as in the pattern n̄ = (1, 1, 1, 1, 0, 0) with 4 photons and n = 6. Fig. 6a shows the pattern probability with 4 photons. After training with the loss function in Eq. (11), Pr(n̄) substantially increases for the patterns with four photons containing a pair in modes 0 and 1 (Fig. 6b).

VI. CONCLUSIONS
We have shown that a many-body characteristic function may be reformulated as a layered neural network. This approach enables building complex states for various applications, such as gate design or boson sampling.
A common argument in criticizing quantum neural networks is that linear quantum mechanics does not match the nonlinearity-eager NN models. However, recent investigations show that nonlinearity may be introduced in quantum neural networks [43]. Our remark is that, if we formulate quantum mechanics in the phase space, nonlinearity arises in the characteristic function (or in other representations). We analyzed this strategy in the simplest case of Gaussian states. The resulting model is universal and may be trained for different purposes. For this reason, phase-space models naturally deal with non-classical states and compute observables by derivatives. This formulation opens many opportunities. For example, the optimization of multi-particle events can be extended to fermionic fields. As a drawback, computing boson pattern probabilities by NN APIs is not expected to be competitive with highly optimized algorithms running on large-scale clusters [41,42]. Still, it appears to be a versatile and straightforward methodology.
Here, we have shown many-body quantum state design and engineering by TensorFlow. We have demonstrated how to enhance multi-particle generation, with many potential applications in quantum technologies. In addition, the proposed method enables training boson sampling without explicitly computing derivatives of the Hafnian [18,34], by resorting to automatic computational packages. We have tested the algorithm on a conventional workstation with a single commercial GPU (NVIDIA QUADRO RTX 4000), with a computational time of the order of a few minutes for 6 modes.
The method can be generalized to other boson sampling setups, for example by including Glauber (displacement) layers and multi-mode squeezers. Also, it readily allows testing different loss functions for tailoring the boson sampling patterns. Extensions beyond Gaussian states can be envisaged by using general machine learning networks with an arbitrary number of layers and different nonlinearities.

FIG. 1. Workflow of the proposed methodology to train Boson sampling by representing the characteristic function as a neural network.

FIG. 2. (a) A neural network model for the characteristic function. Two inputs, a data vector x with shape 1 × N and a bias vector a with shape N × 1, seed the model that computes χ and returns the real and imaginary parts of χ(x)e^{ıxa}. (b) A layer representing a linear transformation of the state by a unitary operator, described by a symplectic N × N matrix M and a displacement N × 1 vector d′. With such a definition, layers can be cascaded, and one can represent single-mode squeezers, interferometers, and other unitary operators. (c) A model representing a state with characteristic function χ, subject to a unitary transformation. This is a pullback of a linear transform from the original state, which produces a new state with characteristic function χ′ [see Eq. (4)].
FIG. 3. Pseudo-code for the creation of a neural network representing a Gaussian boson sampling experiment.

FIG. 5. (a) Probability distribution of patterns with two photons for n = 6 in the model in Fig. 4c, before training. The insets detail the particle distribution in the patterns. (b) As in (a), after training; the probability of finding a pair in modes 0 and 1 is enhanced by more than one order of magnitude. (c) Mean photon numbers in modes 0 and 1 during the training epochs (green), and expected differential photon number ⟨(n̂_0 − n̂_1)²⟩ in the two modes, which vanishes after thousands of epochs. The statistical distribution of pairs changes at a constant photon number per mode. Data generated by the code in https://github.com/nonlinearxwaves/BosonSampling.

FIG. 6. (a) Probability distribution of patterns with 4 photons (n = 6) in the model in Fig. 4c before training. The insets detail the particles in each pattern. (b) As in (a), after training; the probability of patterns with two photons in modes 0 and 1 is maximized. Data generated by the code in https://github.com/nonlinearxwaves/BosonSampling.