Quantum statistical learning via Quantum Wasserstein natural gradient

In this article, we introduce a new approach towards the statistical learning problem $\operatorname{argmin}_{\rho(\theta) \in \mathcal P_{\theta}} W_{Q}^2 (\rho_{\star},\rho(\theta))$ to approximate a target quantum state $\rho_{\star}$ by a set of parametrized quantum states $\rho(\theta)$ in a quantum $L^2$-Wasserstein metric. We solve this estimation problem by considering Wasserstein natural gradient flows for density operators on finite-dimensional $C^*$-algebras. For continuous parametric models of density operators, we pull back the quantum Wasserstein metric such that the parameter space becomes a Riemannian manifold with quantum Wasserstein information matrix. Using a quantum analogue of the Benamou-Brenier formula, we derive a natural gradient flow on the parameter space. We also discuss certain continuous-variable quantum states by studying the transport of the associated Wigner probability distributions.


Introduction
The learning problem of quantum states, i.e. positive semi-definite trace-class operators of unit trace, is central in modern quantum theory and is commonly called quantum state tomography. The problem of quantum state estimation is ubiquitous in quantum mechanics and has a wide range of applications, including the analysis of optical devices [DLPPS02] and the reliable estimation of qubit states in quantum computing [BK10,LR09]. In recent years, many computationally efficient approaches to the quantum state estimation problem have been proposed, based on compressed sensing and machine learning methods, such as [GLFBE10,TMC18]. For a review of the most common classical approaches to quantum state estimation, such as maximum likelihood estimation (MLE), we refer to [PR04].
However, both in physics and in non-commutative geometry, many problems are quantum state estimation problems in disguise: over the past years, finding suitable physical descriptors for molecular structures from data has become a vast and growing area of research, cf. the review article [SMB19] and references therein. Recently, such quantum machine learning approaches have also been based on optimization problems in Wasserstein distances, see for example [CLB20], where a kernel ridge regression-based model relying on the Coulomb matrix is studied. The advantage of using the Wasserstein distance is that it leads to a continuous dependence on the positions of the nuclei.
In that article, it was found to be key to use a suitable parametrization of the Coulomb matrix. This parametrization ought to be invariant under 3D translations and rotations of the molecule and is therefore related to the low-dimensional parametrization problem considered in this and previous articles, cf. [CL18]. First attempts towards quantum Wasserstein generative adversarial networks have also been considered in [CHLFW19]. The quantum Wasserstein distance and its generalizations considered in [CGT18,CGT18b] also have far-reaching applications beyond quantum mechanics, in the field of non-commutative probability theory, which includes multivariable time series and vector-valued random variables [NGT15]. Hence, solving the quantum state estimation problem in Wasserstein distance has become an important and widely applicable problem.
The analysis of geometric properties of the space of quantum states is called quantum information geometry and is central in the field of quantum information. The asymptotic theory of quantum state estimation and quantum information geometry was developed in the second half of the 1980s by Nagaoka [N95]. A comprehensive review of the modern field of quantum information geometry and its connection to quantum estimation can be found in [H17]. In this article, we develop a new connection between these two fields based on the quantum Wasserstein metric.
It has been discovered, among others by Otto [O01], that various PDEs evolve according to the gradient flow with respect to the $L^2$-Wasserstein metric [L1998]. Later, in a series of articles [CM14,CM18,CM20], Carlen and Maas also introduced quantum Wasserstein metrics for open quantum systems satisfying a detailed balance condition. In these articles, they showed that such open quantum systems also evolve according to the $L^2$-Wasserstein gradient flow. Moreover, they showed that their metric admits a dynamical formulation extending the classical Benamou-Brenier formula [BB00] to the quantum setting. Here, we also mention the work by Datta and Rouzé [RD19,RD17] for additional links to a quantum version of Ricci curvature and the Fisher information functional. This analysis has been complemented by the articles [CGT18,CGT18b], where different types of non-commutative multiplication operators with favorable properties from a computational point of view are considered. Besides, Carlen and Maas showed that for certain open quantum systems the gradient flow of the relative entropy with respect to an invariant state in the quantum Wasserstein metric coincides with the quantum evolution governed by the Lindblad equation. For continuous-variable states, a quantum transport framework with desirable physical features has been proposed in [DPT19]; however, a dynamical formulation of this approach does not yet seem to exist. Results on the entropy flow for open quantum systems have also been obtained in [MM17]. Another relevant definition of the Wasserstein distance is due to Golse, Mouhot, and Paul [GMP16]; it was proposed in the study of mean-field limits of quantum systems that are uniform in the semiclassical parameter.
Recently, optimal transport gradient flows have been applied to estimation problems in classical probability theory. In particular, the parameter estimation problem for probability measures using parameterized Wasserstein gradient flows of either the Kullback-Leibler (KL) divergence, also referred to as relative entropy, or the $L^2$-Wasserstein distance has been addressed by the second author [CL18,LM18,LM20]. This leads to a joint study of optimal transport [OT] and information geometry [IG,IG2], namely transport information geometry [Li1,Li2]. There, the natural gradient induced by optimal transport was applied to statistical learning problems for the first time. Meanwhile, this approach also introduces a new estimation theory based on the Wasserstein information matrix [LZ20]. It has also led to new scientific computing algorithms, based on generative adversarial networks, for solving classical Fokker-Planck equations in data-poor situations [LLZZ19].
In this article, we present a new approach to quantum state estimation based on $L^2$-quantum Wasserstein gradients, extending the study of the previous paragraph to quantum systems. We start by studying the problem of minimizing the distance, with respect to a quantum Wasserstein metric $d$, to some fixed target density operator $\rho_\star$ over a parametrized manifold of states $\mathcal P_\theta \subset \mathcal D(\mathcal H)$, i.e. we aim to identify $\operatorname{argmin}_{\rho \in \mathcal P_\theta} d(\rho_\star, \rho)$. We address the corresponding estimation problem for particular finite- and infinite-dimensional quantum states. In the case of infinite-dimensional states, our approach to statistical learning is based on the Wigner transform of continuous-variable quantum states. This makes the approach particularly well suited to experimental quantum state estimation in continuous-variable systems, where the Wigner distribution of the quantum state is approximately recovered [VR89]. A classical choice of distance between probability measures is the Kullback-Leibler (KL) divergence. In classical probability, the metric induced by the $L^2$ Hessian of the KL divergence is the Fisher-Rao metric, which provides a natural gradient descent method. The analogous concepts of relative entropy for faithful states $\rho$ and $\sigma$, $S(\rho\|\sigma) = \operatorname{tr}(\rho(\log\rho - \log\sigma))$, and of Fisher information are well established in quantum information theory, too. For finite-dimensional quantum states, our aim is then to establish low-dimensional parameterized quantum Wasserstein gradient flows based on quantum Wasserstein distances. This means that we aim to find a low-dimensional representation of the minimization problem in parameter space by applying quantum Wasserstein dynamics. Our study starts by pulling back the quantum Wasserstein metric to a finite-dimensional parameter manifold using the quantum transport (Wasserstein) information matrix. This leads to a natural gradient descent method for quantum states.
We also introduce and study a quantum analogue of the Schrödinger bridge problem. As we show in this article, this problem can be solved by a quantum Benamou-Brenier formula with a quantum Fisher information regularization.
Summary of novel results.
• We introduce the quantum transport information matrix and develop the related quantum transport (Wasserstein) statistical manifold. This can be viewed as a first step towards quantum transport information geometry.
• We formulate the quantum transport natural gradient flow on the quantum Wasserstein statistical manifold and apply this flow to the quantum statistical learning problem.
• We also formulate the quantum Schrödinger bridge problem by controlling the quantum transport natural gradient flows.
• We study the quantum Wasserstein statistical manifold for various finite-dimensional systems, such as the quantum fermionic Fokker-Planck dynamics and more general finite-dimensional open quantum systems satisfying the detailed balance condition, as well as for continuous-variable systems with positive Wigner functions, such as (mixtures of) Gaussian states.
• We illustrate our results on some simple examples and also discuss how they apply to the parameter estimation problem for quantum channels.
Outline of the article. In Section 2 we provide a brief review of classical optimal transport theory and quantum optimal transport. In Section 3 we then introduce the quantum Wasserstein natural gradient in Section 3.1, the Schrödinger bridge problem for finite-dimensional quantum systems in Section 3.2, and the same two concepts for certain continuous-variable systems, including Gaussian systems, in Section 3.3. In Section 4 we discuss examples of our theory. These include the transport problem for two Gaussian states and a fully explicit case of the fermionic Fokker-Planck equation. We finish our collection of examples by illustrating how the quantum transport information matrix can also be used to perform parameter estimation for quantum channels.
Notation. We denote by $|n\rangle$, for $n \in \mathbb N_0$, the canonical eigenbasis of the number operator $N = a^*a$, where $a$ is the standard annihilation operator. The continuous linear operators on a normed space $X$ are denoted by $\mathcal L(X)$, and the space of trace-class operators on a Hilbert space $\mathcal H$ by $\mathrm{TC}(\mathcal H)$. For a set $\Omega$ we denote by $\operatorname{int}(\Omega)$ its interior. The set of quantum states (positive semi-definite operators of unit trace) on a Hilbert space $\mathcal H$ is denoted by $\mathcal D(\mathcal H)$. We denote the Riemannian manifold of faithful states by $\mathcal D_+(\mathcal H)$.
We recall that, for finite-dimensional $\mathcal H$, the boundary $\partial\mathcal D(\mathcal H)$ consists of the states with zero determinant and $\operatorname{int}(\mathcal D(\mathcal H)) = \mathcal D_+(\mathcal H)$. We also write $\{X, Y\} = XY + YX$ for the anti-commutator and $[X, Y] = XY - YX$ for the commutator. We denote the spectrum of a linear operator $T$ by $\operatorname{Spec}(T)$.

Review of classical and quantum optimal transport
Our goal is to study the problem of minimizing the $L^2$-quantum Wasserstein distance $W_Q$ to some fixed target density operator $\rho_\star$ over a parametrized manifold of states $\mathcal P_\theta \subset \mathcal D(\mathcal H)$, i.e. we aim to identify $\operatorname{argmin}_{\rho \in \mathcal P_\theta} W_Q^2(\rho_\star, \rho)$. For this purpose, we start this section with a review of the classical framework and highlight similarities and differences that appear in the quantum setting. In addition, we will also employ the classical framework to study Wigner distributions in the continuous-variable setting.
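2.1. Classical optimal transport. The optimal transport problem dates back to 1781, when Monge asked how to find, for two probability measures $f_0, f_1$ on $\Omega \subset \mathbb R^n$ with finite second moment, an optimal transport map $T : \Omega \to \Omega$ pushing $f_0$ to $f_1$ such that the transportation cost is minimized:
$$\inf_T \left\{ \int_\Omega \|x - T(x)\|^2 f_0(x)\,dx : T_* f_0 = f_1 \right\},$$
where $T_* f_0 = f_1$ means that $\int_{T^{-1}(A)} f_0(x)\,dx = \int_A f_1(x)\,dx$ for all measurable $A \subset \Omega$. For two probability measures with densities $f_0, f_1$ on $\Omega \subset \mathbb R^n$, the square of the classical $L^2$-Wasserstein distance is defined as
$$W_{cl}(f_0, f_1)^2 = \inf_{\pi \in \Pi(f_0, f_1)} \int_{\Omega \times \Omega} \|x - y\|^2 \, d\pi(x, y), \tag{2.1}$$
where $\Pi(f_0, f_1)$ is the set of all couplings of the two measures $f_0(x)\,dx$ and $f_1(x)\,dx$.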
Equivalent to (2.1), and particularly relevant for our purposes, is a dynamical formulation given by the Benamou-Brenier formula, which states that
$$W_{cl}(f_0, f_1)^2 = \inf_{(\mu_t, v_t)} \int_0^1 \int_\Omega \|v_t(x)\|^2 \, d\mu_t(x)\, dt, \tag{2.2}$$
where the infimum is taken over all pairs $(\mu_t, v_t)$ such that $\mu_t$ is a curve of measures with $\mu_0 = f_0$ and $\mu_1 = f_1$ and $v_t$ is a time-dependent vector field satisfying the continuity equation
$$\partial_t \mu_t + \nabla \cdot (\mu_t v_t) = 0.$$
On a bounded domain $\Omega$, this formulation is replaced by the corresponding Neumann problem.
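As a minimal numerical illustration of (2.1), the sketch below assumes two uniform empirical measures on $\mathbb R$ with the same number of samples. In one dimension the optimal coupling is the monotone one, so $W_{cl}$ reduces to pairing sorted samples. The helper names are hypothetical, and the one-dimensional Gaussian closed form $W_{cl}^2 = (m_0 - m_1)^2 + (s_0 - s_1)^2$ is used only as a check.

```python
import numpy as np


def w2_empirical_1d(x, y):
    """L^2-Wasserstein distance between two uniform empirical measures on R.

    In one dimension the optimal coupling in (2.1) is monotone, so it suffices
    to pair the sorted samples (equal sample sizes are assumed).
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes the same number of samples")
    return float(np.sqrt(np.mean((x - y) ** 2)))


def w2_gaussian_1d(m0, s0, m1, s1):
    """Closed-form W_cl between N(m0, s0^2) and N(m1, s1^2)."""
    return float(np.hypot(m0 - m1, s0 - s1))


rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)
approx = w2_empirical_1d(z, 2.0 + 3.0 * z)   # samples of N(0, 1) and N(2, 9)
exact = w2_gaussian_1d(0.0, 1.0, 2.0, 3.0)   # sqrt(8), roughly 2.83
print(approx, exact)
```

Sorting makes the cost $O(n \log n)$; in higher dimensions no such shortcut exists and a linear program or an entropic solver would be needed instead.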
The dynamical formulation above is closely connected to a Riemannian structure on the Wasserstein space. To fix ideas, we consider the space of strictly positive densities $\mathcal D_+(\Omega) = \{ f \in C^\infty(\Omega, (0,\infty)) : \|f\|_{L^1} = 1 \}$.
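The tangent space of $\mathcal D_+(\Omega)$ is then given by
$$T_f \mathcal D_+(\Omega) = \left\{ \sigma \in C^\infty(\Omega) : \int_\Omega \sigma(x)\,dx = 0 \right\}.$$
For any $\Phi \in C^\infty(\Omega)$ we can then set $\sigma_\Phi := -\nabla \cdot (f \nabla \Phi)$. This map provides an isomorphism between $C^\infty(\Omega)/\mathbb R$ and $T_f \mathcal D_+(\Omega)$, at least if $\Omega$ is compact. We can therefore define the $L^2$-Wasserstein metric tensor by introducing:
Definition 2.1 ($L^2$-Wasserstein metric tensor). We define the metric tensor $g_f : T_f\mathcal D_+(\Omega) \times T_f\mathcal D_+(\Omega) \to \mathbb R$ by
$$g_f(\sigma_{\Phi_1}, \sigma_{\Phi_2}) := \int_\Omega \langle \nabla\Phi_1(x), \nabla\Phi_2(x)\rangle f(x)\,dx. \tag{2.3}$$
2.2. Natural gradient flow. We continue with a review of the main results of [CL18, Sec. 3] and explain how to minimize an objective function efficiently in parameter space.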
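We define the statistical parameter space as a $d$-dimensional Riemannian manifold $\Theta$ with connection $D_\theta$ and metric tensor $\langle \xi, \eta\rangle_\theta = \xi^T G_\theta \eta$. We then take a continuous parametrization $\Theta \ni \theta \mapsto \rho(\cdot, \theta) \in \mathcal D_+(\Omega)$ and introduce a natural metric tensor on the statistical manifold by pulling back (2.3),
$$g_\theta(\xi, \eta) = \xi^T G_W(\theta)\, \eta, \qquad G_W(\theta)_{ij} = \int_\Omega \langle \nabla\Phi_i(x), \nabla\Phi_j(x)\rangle\, \rho(x, \theta)\,dx,$$
where $-\nabla \cdot (\rho(\cdot,\theta)\nabla\Phi_i) = \partial_{\theta_i}\rho(\cdot,\theta)$. For an objective function $R(\theta)$, the Wasserstein natural gradient flow is then defined by $\dot\theta(t) = -\nabla_g R(\theta(t))$, where $\nabla_g$ is the unique gradient vector satisfying $g_\theta(\nabla_g R(\theta), \xi) = \langle D_\theta R(\theta), \xi\rangle_\theta$.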
In particular, we have the identification $\nabla_g R(\theta) = G_W(\theta)^{-1} G_\theta D_\theta R(\theta)$.
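The identification above, combined with a forward Euler step, can be sketched for a hypothetical one-dimensional model $\rho(\cdot, \theta) = \mathcal N(\mu, \sigma^2)$ with $\theta = (\mu, s)$ and $\sigma = e^s$. Since $W_{cl}^2 = (\mu_0 - \mu_1)^2 + (\sigma_0 - \sigma_1)^2$ for one-dimensional Gaussians, the pulled-back matrix in these coordinates is $G_W(\theta) = \operatorname{diag}(1, \sigma^2)$. We assume a Euclidean parameter metric $G_\theta = I$ and the objective $R(\theta) = W_{cl}(\rho(\cdot, \theta), \mathcal N(1, 4))^2$; all names and values are illustrative.

```python
import numpy as np


def natural_gradient_step(theta, grad_r, metric_w, metric_theta=None, tau=0.1):
    """Forward Euler step theta - tau * G_W(theta)^{-1} G_theta D_theta R(theta)."""
    g_theta = np.eye(theta.size) if metric_theta is None else metric_theta
    return theta - tau * np.linalg.solve(metric_w, g_theta @ grad_r)


# Hypothetical model N(mu, sigma^2) with theta = (mu, s) and sigma = exp(s).
# W_cl^2 = (mu0 - mu1)^2 + (sigma0 - sigma1)^2 for 1D Gaussians, so the
# pulled-back matrix in these coordinates is G_W(theta) = diag(1, sigma^2).
def wasserstein_information_matrix(theta):
    return np.diag([1.0, np.exp(2.0 * theta[1])])


MU_STAR, SIGMA_STAR = 1.0, 2.0          # target N(1, 4)


def objective_grad(theta):
    """D_theta R for R(theta) = W_cl(N(mu, sigma^2), N(1, 4))^2, Euclidean G_theta."""
    mu, sigma = theta[0], np.exp(theta[1])
    return np.array([2.0 * (mu - MU_STAR), 2.0 * (sigma - SIGMA_STAR) * sigma])


theta = np.array([-1.0, 0.0])           # start from N(-1, 1)
for _ in range(200):
    theta = natural_gradient_step(theta, objective_grad(theta),
                                  wasserstein_information_matrix(theta))
print(theta[0], np.exp(theta[1]))       # approaches (1.0, 2.0)
```

Solving with `np.linalg.solve` instead of forming $G_W^{-1}$ explicitly is the usual choice; the preconditioning by $G_W^{-1}$ turns the $s$-update into $2(\sigma - \sigma_\star)/\sigma$, which is independent of the chosen coordinate scaling.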
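The Wasserstein gradient descent can then be numerically implemented using a standard forward Euler method with step size $\tau > 0$,
$$\theta_{n+1} = \theta_n - \tau\, G_W(\theta_n)^{-1} G_{\theta_n} D_\theta R(\theta_n).$$
This gradient flow method can be interpreted as an approximate solution to the minimization problem
$$\operatorname{argmin}_{\theta \in \Theta} \; R(\rho(\cdot, \theta)) + \frac{W_{cl}(\rho(\cdot, \theta_n), \rho(\cdot, \theta))^2}{2\tau},$$
which is obvious from considering the linearized expressions $R(\rho(\cdot, \theta)) \approx R(\rho(\cdot, \theta_n)) + \langle D_\theta R(\theta_n), \theta - \theta_n\rangle_{\theta_n}$ and $W_{cl}(\rho(\cdot, \theta_n), \rho(\cdot, \theta))^2 \approx (\theta - \theta_n)^T G_W(\theta_n)(\theta - \theta_n)$.
In the Schrödinger bridge problem (SBP) (2.5), the infimum is taken over all $m$ and $f$ satisfying the PDE (2.6), with the boundary condition $\langle m(t,x) - \nabla f(t,x), n(x)\rangle = 0$ for all $x \in \partial\Omega$, where $n(x)$ is the normal vector of the boundary. We emphasize that the difference between the SBP and the $L^2$-Wasserstein metric minimization (2.2) is the presence of the diffusion term $\beta\Delta$ in the PDE (2.6). A discussion of the viscosity limit $\beta \downarrow 0$ and of the convergence of solutions of the SBP can be found in [L13].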
As has been shown in [EG99,CGP16], the minimization problem (2.5) with PDE (2.6) is equivalent to minimizing the functional (2.7) over $m$ and $f$, up to a constant term representing the difference of entropies $D(f_1|f_0) = \int_\Omega \big(f_1(x)\log f_1(x) - f_0(x)\log f_0(x)\big)\,dx$, where $f$ and $m$ are linked by the transport equation $\partial_t f + \nabla \cdot m = 0$. The advantage of studying the functional (2.7) instead of (2.2) lies in the additional positivity and strict convexity enforced by the Fisher information contribution to the objective functional. Numerical aspects of this minimization problem are discussed in detail in [LYO18].
2.4. Quantum optimal transport. Before introducing quantum analogues of the $L^2$-Wasserstein distance (2.1), we first define a notion of coupling of quantum states: for two density operators $\rho_{in}, \rho_{fi} \in \mathcal D(\mathcal H)$, the set of all couplings $\Pi(\rho_{in}, \rho_{fi})$ is defined as the set of density-operator-valued maps that smoothly (up to the endpoints) connect the two states,
$$\Pi(\rho_{in}, \rho_{fi}) := \{ \rho \in C([0,1]; \mathcal D(\mathcal H)) \cap C^\infty((0,1); \mathcal D(\mathcal H)) : \rho(0) = \rho_{in},\ \rho(1) = \rho_{fi} \}.$$
To define the 2-Wasserstein distance for finite-dimensional quantum systems satisfying the detailed balance condition, we employ the differential calculus introduced in [CM20, Def. 4.7]. This framework allows us, in particular, to reformulate the evolution of finite-dimensional open quantum systems satisfying the detailed balance condition as a gradient flow, with respect to the Wasserstein metric, of the relative entropy $S(\rho\|\sigma)$, where $\sigma$ is the invariant state. Before discussing this in the context of open quantum systems satisfying the detailed balance condition, we introduce the necessary differential structure.
2.4.1. Differential calculus for quantum systems. Let $\mathcal A$ be a finite-dimensional von Neumann algebra with a faithful positive tracial linear functional $\tau$, and let $\mathcal D_+(\mathcal A)$ be the set of faithful states.
Definition 2.2. A differential structure on $\mathcal A$ is defined as follows:
• There exists a finite index set $J$ and, for each $j \in J$, a finite-dimensional von Neumann algebra $\mathcal B_j$ with a faithful positive tracial linear functional $\tau_j$.
• For each $j \in J$ there exists a pair $(l_j, r_j)$ of unital $*$-homomorphisms from $\mathcal A$ to $\mathcal B_j$ such that $\tau_j(l_j(A)) = \tau_j(r_j(A)) = \tau(A)$.
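• For each $j \in J$ there are $0 \neq V_j \in \mathcal B_j$ and $j^* \in J$ such that $V_j^* = V_{j^*}$. Moreover, for $j \in J$ and $A_1, A_2 \in \mathcal A$,
$$\tau_j\big(V_j^* l_j(A_1) V_j r_j(A_2)\big) = \tau_{j^*}\big(V_{j^*}^* r_{j^*}(A_1) V_{j^*} l_{j^*}(A_2)\big).$$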
• There is a faithful state $\sigma \in \mathcal D_+(\mathcal A)$ such that, for each $j \in J$, $V_j$ is an eigenvector of the modular operator,
$$M_{l_j(\sigma), r_j(\sigma)}(V_j) := l_j(\sigma) V_j r_j(\sigma)^{-1} = e^{-\omega_j} V_j$$
for some $\omega_j \in \mathbb R$.
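Then the derivatives $\nabla_j : \mathcal A \to \mathcal B_j$ are defined by $\nabla_j(A) := V_j r_j(A) - l_j(A) V_j$, with gradient $\nabla A := (\nabla_1 A, \ldots, \nabla_{|J|} A)$ and divergence operator $\operatorname{div}(\mathbf A) = -\sum_{j \in J} \nabla_j^\dagger A_j$, where $\nabla_j^\dagger$ denotes the adjoint of $\nabla_j$ with respect to the inner products of $L^2(\tau)$ and $L^2(\tau_j)$.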
2.4.2. Wasserstein distance. Logarithmic case. The quantum $L^2$-Wasserstein distance for the above differential structure has been defined in [CM20, (9.1)] in terms of the norm $\|Z\|_\rho^2 = \langle Z, L_\rho(Z)\rangle_{L^2(\tau)}$. In analogy to the classical Benamou-Brenier formula (2.2) for the classical $L^2$-Wasserstein distance, the quantum $L^2$-Wasserstein distance can then be expressed as a variational problem over pairs $(\rho, \Phi)$, where $\Phi$ is coupled to $\rho$ by a continuity equation. The physical interpretation of the Riemannian metric $g_\rho$ is the following. For two faithful states $\rho, \sigma \in \mathcal D_+(\mathcal H)$, consider the quantum relative entropy $S_\sigma(\rho) = \tau(\rho(\log\rho - \log\sigma))$. With $D$ denoting the derivative, [CM20, Prop. 2.7] shows that the gradient of $S_\sigma$ with respect to $g_\rho$ is $(\operatorname{grad} S_\sigma)(\rho) = (-\Delta_\rho) DS_\sigma(\rho)$, where $DS_\sigma(\rho) = \log\rho - \log\sigma$. This implies that the gradient flow of the entropy $S_\sigma$ with respect to the metric $g_\rho$ is the dynamics of the Liouville-von Neumann equation $\partial_t \rho = \mathcal L^* \rho$, where $\sigma$ is the invariant state of the dynamics defined by $\mathcal L^*$.
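The relative entropy and its derivative $DS_\sigma(\rho) = \log\rho - \log\sigma$ can be checked numerically. The sketch below assumes $\tau = \operatorname{tr}$ on matrices and faithful states, computes matrix logarithms through an eigendecomposition, and compares a central finite difference of $S_\sigma$ along a traceless Hermitian direction $X$ with $\tau(X\,DS_\sigma(\rho))$. The helper names are hypothetical.

```python
import numpy as np


def logm_pd(a):
    """Matrix logarithm of a Hermitian positive-definite matrix via eigh."""
    w, v = np.linalg.eigh(a)
    if np.min(w) <= 0:
        raise ValueError("state must be faithful (positive definite)")
    return (v * np.log(w)) @ v.conj().T


def relative_entropy(rho, sigma, tau=np.trace):
    """S_sigma(rho) = tau(rho (log rho - log sigma)); tau defaults to tr."""
    return float(np.real(tau(rho @ (logm_pd(rho) - logm_pd(sigma)))))


def entropy_derivative(rho, sigma):
    """DS_sigma(rho) = log rho - log sigma."""
    return logm_pd(rho) - logm_pd(sigma)


def random_faithful_state(d, rng):
    a = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = a @ a.conj().T + 0.1 * np.eye(d)
    return rho / np.trace(rho).real


rng = np.random.default_rng(1)
rho, sigma = random_faithful_state(3, rng), random_faithful_state(3, rng)

# Traceless Hermitian direction x: d/dt S_sigma(rho + t x) at t = 0 equals tr(x DS_sigma(rho)).
b = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
x = b + b.conj().T
x -= np.trace(x) / 3 * np.eye(3)
h = 1e-6
fd = (relative_entropy(rho + h * x, sigma) - relative_entropy(rho - h * x, sigma)) / (2 * h)
exact = float(np.real(np.trace(x @ entropy_derivative(rho, sigma))))
print(relative_entropy(rho, sigma), fd, exact)
```

The direction is taken traceless because tangent vectors to the state space have zero trace; for such directions the extra term $\operatorname{tr}(X)$ in the derivative of $\operatorname{tr}(\rho\log\rho)$ vanishes.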
In particular, the operator $L^{ac}_\rho$ is invertible for $\rho > 0$ by standard results on the solvability of Lyapunov equations, which imply that its inverse is explicitly given by $(L^{ac}_\rho)^{-1}(S) = \int_0^\infty e^{-\rho s} S e^{-\rho s}\,ds$.
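Assuming the convention $L^{ac}_\rho(T) = \{\rho, T\}$, the inverse can be computed either in the eigenbasis of $\rho$, where the Lyapunov equation $\rho X + X\rho = S$ decouples into $(w_i + w_j)X_{ij} = S_{ij}$, or through the integral formula above. The sketch compares both on a random faithful three-level state; the truncation `t_max` and the grid size of the trapezoidal rule are arbitrary choices, and the helper names are hypothetical.

```python
import numpy as np


def anticommutator_inverse(rho, s):
    """Solve rho X + X rho = S for Hermitian positive-definite rho.

    In the eigenbasis rho = V diag(w) V^*, the equation reads (w_i + w_j) X_ij = S_ij.
    """
    w, v = np.linalg.eigh(rho)
    s_eig = v.conj().T @ s @ v
    return v @ (s_eig / (w[:, None] + w[None, :])) @ v.conj().T


def anticommutator_inverse_integral(rho, s, t_max=60.0, n=60001):
    """Trapezoidal approximation of int_0^inf e^{-rho t} S e^{-rho t} dt."""
    w, v = np.linalg.eigh(rho)
    s_eig = v.conj().T @ s @ v
    t = np.linspace(0.0, t_max, n)
    vals = np.exp(-(w[:, None] + w[None, :])[..., None] * t)   # shape (d, d, n)
    dt = t[1] - t[0]
    weights = dt * (vals.sum(axis=-1) - 0.5 * (vals[..., 0] + vals[..., -1]))
    return v @ (s_eig * weights) @ v.conj().T


rng = np.random.default_rng(2)
q, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
rho = q @ np.diag([0.5, 0.3, 0.2]) @ q.conj().T          # faithful state
a = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
s = a + a.conj().T                                        # Hermitian right-hand side

x = anticommutator_inverse(rho, s)
print(np.abs(rho @ x + x @ rho - s).max())                # residual, close to 0
print(np.abs(x - anticommutator_inverse_integral(rho, s)).max())
```

The eigenbasis solver is exact up to rounding and costs one eigendecomposition; the integral is shown only to confirm the formula, since its accuracy depends on the quadrature step and on the smallest eigenvalue of $\rho$.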

Fermionic Fokker-Planck equation.
Due to its analogy to classical probability theory and classical gradient flows, we start by discussing the quantum fermionic Fokker-Planck equation. Instead of just stating it within the abstract differential calculus introduced in the previous section, we will provide full details to fix ideas.
The quantum fermionic Fokker-Planck equation is the canonical gradient flow associated with the quantum Wasserstein metric and corresponds to the classical Fokker-Planck equation $\partial_t f = \beta^{-1}\Delta f + \nabla \cdot (f \nabla V)$. Under suitable growth conditions on $V$, this equation has a unique invariant measure $d\mu(x) \propto e^{-\beta V(x)}\,dx$. Carlen and Maas introduced in [CM14] a Riemannian metric on density operators which extends the classical $L^2$-Wasserstein metric to the quantum setting and with respect to which the quantum evolution of the fermionic Fokker-Planck equation is a gradient flow. We will explain in Section 3 how to use this metric to define a natural gradient flow for parametric models of density operators.
2.5.1. Clifford algebra. Let $\mathcal C$ be the Clifford algebra on $\mathbb R^n$ generated by $n$ self-adjoint operators $Q_j$, $j = 1, \ldots, n$, satisfying the canonical anti-commutation relations $\{Q_i, Q_j\} = 2\delta_{ij}$. The operators $Q_j$ are also called the fermionic degrees of freedom. Moreover, $\mathcal C$ becomes a $2^n$-dimensional Hilbert space $\mathcal H = L^2(\tau)$ with inner product $\langle A, B\rangle_{L^2(\tau)} := \tau(A^*B)$, where we introduce the normalized trace $\tau(A) = 2^{-n}\operatorname{tr}_{\mathbb C^{2^n}}(A)$.
The set of density operators $\mathcal D(\mathcal H)$ in this setting is the closed convex set of positive operators $\rho \in \mathcal C$ of unit normalized trace.
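We can explicitly construct the matrices $Q_j$ solely from the Pauli matrices
$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{2.11}$$
One realization of the fermionic operators $Q_j$ is given by tensor products $Q_j := \bigotimes_{i=1}^n X_i$ of suitable factors $X_i$ built from these Pauli matrices and the identity. The grading operator $\Gamma : \mathcal C \to \mathcal C$ is the linear operator defined, for $\alpha \in \{0,1\}^n$, by $\Gamma(Q_\alpha) := (-1)^{|\alpha|} Q_\alpha$, where $Q_\alpha := \prod_{i=1}^n Q_i^{\alpha_i}$. The index set $\{0,1\}^n$ is called the fermionic multi-index set. The $2^n$ matrices $Q_\alpha$, $\alpha \in \{0,1\}^n$, form an orthonormal system spanning $\mathcal C$ which satisfies $\tau(Q_\alpha) = \delta_{0,|\alpha|}$.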
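The sketch below uses one possible Jordan-Wigner-type choice, assumed here purely for illustration: $Q_j = \sigma_z \otimes \cdots \otimes \sigma_z \otimes \sigma_x \otimes \mathrm{id} \otimes \cdots \otimes \mathrm{id}$, with $\sigma_x$ in the $j$-th slot. It checks the anti-commutation relations, the orthonormality of $(Q_\alpha)$ with respect to $\tau(A^*B)$, and that, for this particular realization, $\Gamma$ acts as conjugation by the parity operator $\sigma_z^{\otimes n}$. The helper names are hypothetical.

```python
import numpy as np
from functools import reduce
from itertools import product

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def clifford_generators(n):
    """Hypothetical choice Q_j = Z (x) ... (x) Z (x) X (x) I (x) ... (x) I, X in slot j."""
    return [reduce(np.kron, [SZ] * j + [SX] + [I2] * (n - j - 1)) for j in range(n)]


def tau(a):
    """Normalized trace 2^{-n} tr on 2^n x 2^n matrices."""
    return np.trace(a) / a.shape[0]


def q_alpha(gens, alpha):
    """Q_alpha = Q_1^{alpha_1} ... Q_n^{alpha_n} for a fermionic multi-index alpha."""
    out = np.eye(gens[0].shape[0], dtype=complex)
    for q, a in zip(gens, alpha):
        if a:
            out = out @ q
    return out


def grading(a, n):
    """Gamma realized as conjugation by the parity operator Z (x) ... (x) Z."""
    parity = reduce(np.kron, [SZ] * n)
    return parity @ a @ parity


n = 3
gens = clifford_generators(n)
alphas = list(product([0, 1], repeat=n))
basis = [q_alpha(gens, a) for a in alphas]
gram = np.array([[tau(b1.conj().T @ b2) for b2 in basis] for b1 in basis])
print(np.allclose(gram, np.eye(2 ** n)))   # orthonormal system w.r.t. tau(A^* B)
```

The $\sigma_z$ strings are what make distinct generators anti-commute: $Q_i$ and $Q_j$ with $i < j$ differ by a single anti-commuting pair $\sigma_x, \sigma_z$ in slot $i$.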

The fermionic Dirichlet form on C is defined by
$$\mathcal F(A, A) = \tau\big((\nabla A)^* \nabla A\big) = \sum_{j=1}^n \tau\big((\nabla_j A)^* \nabla_j A\big)$$
with derivatives
$$\nabla_j(A) = \tfrac12\big(Q_j A - \Gamma(A) Q_j\big) \in \mathcal C, \qquad j \in \{1, \ldots, n\},\ A \in \mathcal C. \tag{2.13}$$
The gradient $\nabla : \mathcal C \to \mathcal C^n$ is then defined as $\nabla(A) := (\nabla_1(A), \ldots, \nabla_n(A)) \in \mathcal C^n$, with null space $\ker(\nabla) = \operatorname{span}(\mathrm{id})$. The $L^2(\tau)$-adjoint of the derivative $\nabla_j$ is given by $\nabla_j^*(B) = \tfrac12\big(Q_j B + \Gamma(B) Q_j\big)$. The divergence operator is defined, for $\mathbf A = (A_j)_j \in \mathcal C^n$, by $\operatorname{div}(\mathbf A) = -\sum_{j=1}^n \nabla_j^*(A_j)$. We define the fermionic number operator $N$ as the self-adjoint operator associated with the Dirichlet form, $\mathcal F(B, A) =: \langle B, N A\rangle_{L^2(\tau)}$, where $N A = -\operatorname{div}(\nabla A)$ for all $A \in \mathcal C$ and $\ker(N) = \operatorname{span}(\mathrm{id})$. The dynamical semigroup generated by $-N$ is the quantum fermionic Fokker-Planck semigroup $P_t = e^{-tN}$, which relaxes exponentially fast to its unique invariant state, the completely mixed state. In particular, $-N$ is the generator of an ergodic quantum Markov semigroup satisfying the detailed balance condition with respect to the completely mixed state.
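Under the same hypothetical Jordan-Wigner-type realization of the $Q_j$ (an illustrative assumption), the matrix of $N$ in the orthonormal basis $(Q_\alpha)$ can be assembled directly from the Dirichlet form, $\langle Q_\beta, N Q_\alpha\rangle = \sum_j \tau\big((\nabla_j Q_\beta)^* \nabla_j Q_\alpha\big)$, using the derivatives (2.13). One finds that $N$ is diagonal, with eigenvalue $|\alpha|$ on $Q_\alpha$.

```python
import numpy as np
from functools import reduce
from itertools import product

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def fermionic_number_operator(n):
    """Matrix of N in the orthonormal basis (Q_alpha), built from the Dirichlet form."""
    # Hypothetical Jordan-Wigner-type realization of Q_1, ..., Q_n.
    qs = [reduce(np.kron, [SZ] * j + [SX] + [I2] * (n - j - 1)) for j in range(n)]
    parity = reduce(np.kron, [SZ] * n)      # Gamma(A) = parity @ A @ parity here
    dim = 2 ** n

    def tau(a):
        return np.trace(a) / dim            # normalized trace 2^{-n} tr

    alphas = list(product([0, 1], repeat=n))
    basis = []
    for alpha in alphas:
        qa = np.eye(dim, dtype=complex)
        for q, a in zip(qs, alpha):
            if a:
                qa = qa @ q
        basis.append(qa)

    # Derivatives (2.13): nabla_j A = (Q_j A - Gamma(A) Q_j) / 2.
    grads = [[0.5 * (q @ b - parity @ b @ parity @ q) for q in qs] for b in basis]

    # <Q_beta, N Q_alpha> = F(Q_beta, Q_alpha) = sum_j tau((nabla_j Q_beta)^* nabla_j Q_alpha)
    nmat = np.array([[sum(tau(gb.conj().T @ ga) for gb, ga in zip(g_beta, g_alpha))
                      for g_alpha in grads] for g_beta in grads])
    return alphas, nmat


alphas, nmat = fermionic_number_operator(3)
print(np.diag(nmat).real)   # |alpha| for alpha = (0,0,0), (0,0,1), ..., (1,1,1)
```

Since $N Q_\alpha = |\alpha| Q_\alpha$, the semigroup $P_t = e^{-tN}$ damps the coefficient of $Q_\alpha$ by $e^{-t|\alpha|}$, so every state $\mathrm{id} + \sum_{\alpha \neq 0} c_\alpha Q_\alpha$ relaxes to $\mathrm{id}$, the completely mixed state for the normalized trace, at exponential rate at least $1$.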
This model can be cast in the differential calculus introduced in Section 2.4.1 by setting $\mathcal A := \mathcal B_j := \mathcal C$, $V_j := Q_j$, $\omega_j := 0$, $l_j := \Gamma$ and $r_j := \mathrm{id}$, with derivatives as defined in (2.13) and generator $\mathcal L A = 2\sum_{j=1}^n \big(Q_j \Gamma(A) Q_j - A\big) = -4 N A$.
2.6. Quantum Markov semigroups with detailed balance condition. In the rest of this section, we illustrate the ideas of the differential calculus of Subsection 2.4.1 in the case of quantum Markov semigroups $(P_t)$ with Lindblad generator $\mathcal L$, in the Heisenberg picture, acting on a finite-dimensional $C^*$-algebra $\mathcal A$ and satisfying the detailed balance condition (DBC). This means that, for all times $t > 0$, the operator $P_t$ is self-adjoint with respect to the inner product $\langle X, Y\rangle_{1,\sigma} := \tau(X^*\sigma Y)$ for some state $\sigma$. In particular, the DBC implies that $\sigma$ is invariant, i.e. $P_t^*(\sigma) = \sigma$ for all times $t > 0$; it is the unique invariant state if the semigroup is ergodic. Other possible applications of the differential calculus in Subsection 2.4.1, and thus also of the parameter estimation techniques studied in this paper, are discussed in [CM20, Sec. 5] and include popular quantum channels such as the depolarizing channel.
The generators $\mathcal L$ of quantum Markov semigroups in the Heisenberg representation are characterized by [CM20, Thm. 2.4]:
$$\mathcal L = \sum_{j \in J} e^{-\omega_j/2} \mathcal L_j, \qquad \mathcal L_j(A) = V_j^*[A, V_j] + [V_j^*, A] V_j, \tag{2.14}$$
with $J$ a finite set, a family of operators $(V_j)_{j \in J}$ closed under taking adjoints, and real numbers $\omega_j$ such that the modular operator $M_\sigma(A) := M_{\sigma,\sigma}(A) := \sigma A \sigma^{-1}$ satisfies $M_\sigma(V_j) = e^{-\omega_j} V_j$ and $\omega_{j^*} = -\omega_j$.
We then define $\mathcal A = \mathcal B_j = \mathcal L(\mathcal H)$, where $\mathcal H$ is a finite-dimensional Hilbert space, write $\mathcal B := \prod_j \mathcal B_j$, and set $l_j = r_j = \mathrm{id}_{\mathcal A}$. The partial derivatives are then simply $\nabla_j A = [V_j, A]$, with adjoints $\nabla_j^* = \nabla_{j^*}$, where $j^*$ is such that $V_j^* = V_{j^*}$. The gradient is thus $\nabla = (\nabla_1, \ldots, \nabla_{|J|})$. It follows from [CM20, Prop. 2.5] that the Lindblad generator induces a Dirichlet form with respect to the Kubo-Martin-Schwinger inner product $\langle A, B\rangle_{KMS} := \tau(A^* B \sigma)$, i.e. $\langle \nabla A, \nabla B\rangle = -\langle A, \mathcal L B\rangle_{L^2_{KMS}(\sigma)}$ for all $A, B \in \mathcal A$.
In terms of the contraction operator $\#$, uniquely defined as the linear extension of the map $(A \otimes B)\#C := ACB$ for $A, B, C \in \mathcal A$, and the Feynman-Kubo-Mori operator
$$L_\rho(C) := \hat\rho_j \# C, \tag{2.15}$$
we may then introduce a positive-definite operator $-\Delta_\rho$ on $L^2(\mathcal A, \tau)$,
$$\Delta_\rho(A) := -\sum_{j \in J} \nabla_j^*\big(L_\rho(\nabla_j A)\big). \tag{2.16}$$
This way, the $L^2$-quantum Wasserstein metric can be expressed as a variational problem in which $\Phi$ is coupled to $\rho$ by a continuity equation.
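For a single index with $\omega_j = 0$, we assume for illustration that (2.15) reduces to the standard logarithmic-mean multiplication operator $L_\rho(C) = \int_0^1 \rho^s C \rho^{1-s}\,ds$; the weighted case $\omega_j \neq 0$ is not covered by this sketch. In the eigenbasis of $\rho$ this operator acts entrywise by the logarithmic mean $\Lambda(\lambda_i, \lambda_j) = (\lambda_i - \lambda_j)/(\log\lambda_i - \log\lambda_j)$. The sketch compares this with a midpoint quadrature of the integral; helper names are hypothetical.

```python
import numpy as np


def log_mean(a, b):
    """Logarithmic mean (a - b) / (log a - log b), extended by continuity to a = b."""
    a, b = np.broadcast_arrays(np.asarray(a, dtype=float), np.asarray(b, dtype=float))
    out = np.array(0.5 * (a + b), dtype=float)         # accurate when a is close to b
    mask = ~np.isclose(a, b, rtol=1e-6, atol=0.0)
    out[mask] = (a[mask] - b[mask]) / (np.log(a[mask]) - np.log(b[mask]))
    return out


def log_mean_operator(rho, c):
    """L_rho(C) = int_0^1 rho^s C rho^(1-s) ds, evaluated in the eigenbasis of rho."""
    w, v = np.linalg.eigh(rho)
    c_eig = v.conj().T @ c @ v
    return v @ (c_eig * log_mean(w[:, None], w[None, :])) @ v.conj().T


def log_mean_operator_quadrature(rho, c, n=2000):
    """Midpoint-rule evaluation of the same integral, used as a cross-check."""
    w, v = np.linalg.eigh(rho)
    c_eig = v.conj().T @ c @ v
    s = (np.arange(n) + 0.5) / n
    kernel = np.mean(w[:, None, None] ** s * w[None, :, None] ** (1.0 - s), axis=-1)
    return v @ (c_eig * kernel) @ v.conj().T


rng = np.random.default_rng(3)
q, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
rho = q @ np.diag([0.5, 0.3, 0.2]) @ q.conj().T      # faithful state
a = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
c = a + a.conj().T
print(np.abs(log_mean_operator(rho, c) - log_mean_operator_quadrature(rho, c)).max())
```

Two quick sanity properties follow from $\Lambda(\lambda, \lambda) = \lambda$ and $\Lambda > 0$: $L_\rho(\mathrm{id}) = \rho$, and $\langle C, L_\rho(C)\rangle_{L^2(\tau)} > 0$ for $C \neq 0$, which is what makes $\|\cdot\|_\rho$ a norm.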

Quantum natural gradient and open quantum systems
In the following, we impose a condition on the generators of the finite-dimensional open quantum systems we consider: Assumption 1. We assume that $\mathcal L$ is ergodic, i.e. $\ker(\mathcal L) = \operatorname{span}\{\mathrm{id}\}$, and satisfies the detailed balance condition with invariant state $\sigma$.
3.1. Gradient flow for finite-dimensional OQSs with DBC. By the ergodicity assumption, we are able to pull back the metric from the state space to the parameter space. In particular, the above assumptions are satisfied for the fermionic Fokker-Planck equation with the completely mixed state as the unique invariant state.
The statistical parameter space is, as in the classical setting, defined as a $d$-dimensional Riemannian manifold $\Theta$ with connection $D_\theta$ and metric tensor $\langle \xi, \eta\rangle_\theta = \xi^T G_\theta \eta$. We then take a continuous parametrization $\Theta \ni \theta \mapsto \rho(\theta) \in \mathcal D_+(\mathcal A)$ of density operators.
We then define a norm $\|Z\|_\rho^2 = \langle Z, L_\rho(Z)\rangle_{L^2(\tau)}$, where $L_\rho$ has been defined in (2.12) for the fermionic Fokker-Planck equation and in (2.15) for general open quantum systems satisfying the DBC. In addition, we allow $L_\rho$ to be the anti-commutation operator defined in (2.10).
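In the case where $L_\rho$ is the anti-commutator, the gradient field $\nabla\Phi_X$ can be found by solving the Lyapunov equation [CGT18, (21)]
$$\nabla\big(\operatorname{div}\operatorname{grad}|_{\operatorname{span}(\mathrm{id})^\perp}\big)^{-1} X = L_\rho(\nabla\Phi_X) \in \mathcal L(\mathcal H^n).$$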
The gradient descent method in parameter space naturally corresponds to a gradient descent method on the parametrized manifold of states:
Proposition 3.1. Consider an immersion $\Theta \ni \theta \mapsto \rho(\theta) \in \mathcal D(\mathcal H)$ and an objective function $\mathcal R$ on the set of states. We can then define an objective function $R(\theta) = \mathcal R(\rho(\theta))$, and the gradient evolution $\dot\theta(t) = -\nabla_g R(\theta(t))$ induces the gradient evolution $\rho'(t) = -\operatorname{grad}\mathcal R(\rho(t))$ on the parametrized manifold of states, where $\rho(t) = \rho(\theta(t))$ and $\operatorname{grad}\mathcal R(\rho(t_0)) = \langle D_\theta\rho(\theta), \nabla_g R(\theta(t_0))\rangle_\theta$.

Schrödinger bridge problem for finite-dimensional OQSs with DBC.
We may now introduce a generalization of the quantum Benamou-Brenier formula (2.9) to study a quantum version of the Schrödinger bridge problem, by adding a Fisher information regularizer to the dynamics. For this derivation, we restrict ourselves to the case where the operator $L_\rho$ is the Feynman-Kubo-Mori operator since, in this case, one obtains direct links to quantum entropies and quantum dynamics.
The computational advantages of the Fisher information regularization are twofold. Firstly, it induces additional convexity in the minimization problem. Secondly, it forces the density operator to remain strictly positive. In the regularized functional, we use the inner product $\langle X, Y\rangle_{\rho(t)^{-1}} := \langle X, L_\rho^{-1}(Y)\rangle_{L^2(\tau)}$.
Here, $m$ is connected to $\rho(t)$ by an inhomogeneous heat equation for some fixed parameter $\beta \ge 0$, where $T = \mathcal L^*$ for OQSs satisfying the DBC and $T = -N$ in the case of the fermionic Fokker-Planck equation.
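3.3. Continuous-variable systems. As in classical probability theory, there exists a close analogue of Gaussian distributions, the quantum Gaussian states $\mathcal G(\mathcal H_m)$ on $\mathcal H_m := L^2(\mathbb R^m)$, defined as follows (cf. [BDLR19] and references therein for more details): Gaussian states are states $\rho \in \mathcal D(\mathcal H_m)$ whose characteristic function $\chi_\rho : \mathbb C^m \to \mathbb C$, $\chi_\rho(z) := \operatorname{tr}(\rho D(z))$, is the characteristic function of a Gaussian random variable over $\mathbb C^m$, i.e.
$$\chi(\xi) = \exp\Big(-\tfrac14 \langle \xi, \gamma\xi\rangle + i\langle d, \xi\rangle\Big),$$
where $\gamma > 0$ is a positive-definite matrix satisfying $\gamma + i\nu \ge 0$, for $\nu := \bigoplus_{i=1}^m \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, and $d \in \mathbb R^{2m}$. Here, $D(z)$ is the displacement operator
$$D(z) = \exp\Big(\sum_{j=1}^m \big(z_j a_j^* - \bar z_j a_j\big)\Big).$$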
Conversely, the density operator $\rho \in \mathcal D(\mathcal H_m)$ can be recovered from its characteristic function by $\rho = \int_{\mathbb C^m} \chi_\rho(z) D(-z)\,\frac{dz}{\pi^m}$. We can associate a canonical random variable with any Gaussian state in terms of its Wigner function, which has unit $L^1$ norm and is a Gaussian distribution on $\mathbb R^{2m}$ as well.
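A particularly simple and relevant example of Gaussian states are the thermal states with mean photon number $N \in [0, \infty)$,
$$\rho_N = \frac{1}{N+1}\sum_{n=0}^\infty \Big(\frac{N}{N+1}\Big)^n |n\rangle\langle n|.$$
Thermal states have the special property that they are the maximum entropy states for a fixed average energy, $\rho_N = \operatorname{argmax}_{\rho\,:\,\operatorname{tr}(\rho a^*a) \le N} \big(-\operatorname{tr}(\rho\log\rho)\big)$.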
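The covariance constraint $\gamma + i\nu \ge 0$ and the thermal populations can be checked numerically. The sketch assumes the convention in which the vacuum has $\gamma = I_2$, so that a single-mode thermal state has $\gamma = (2N+1) I_2$ and $\gamma + i\nu$ has eigenvalues $2N + 1 \pm 1$; this normalization is an assumption, since conventions for $\gamma$ differ by constant factors. Helper names are hypothetical.

```python
import numpy as np


def symplectic_form(m):
    """nu = direct sum of m copies of [[0, 1], [-1, 0]]."""
    return np.kron(np.eye(m), np.array([[0.0, 1.0], [-1.0, 0.0]]))


def is_gaussian_covariance(gamma, tol=1e-12):
    """Check gamma > 0 and gamma + i nu >= 0 (gamma + i nu is Hermitian)."""
    m = gamma.shape[0] // 2
    if np.linalg.eigvalsh(gamma).min() <= tol:
        return False
    return bool(np.linalg.eigvalsh(gamma + 1j * symplectic_form(m)).min() >= -tol)


def thermal_populations(n_mean, cutoff):
    """<n| rho_N |n> = N^n / (N + 1)^(n + 1), truncated to n < cutoff."""
    n = np.arange(cutoff)
    return n_mean ** n / (n_mean + 1.0) ** (n + 1)


n_mean = 0.7
p = thermal_populations(n_mean, 200)
print(p.sum(), (np.arange(200) * p).sum())                   # close to 1 and to N
print(is_gaussian_covariance((2 * n_mean + 1) * np.eye(2)))  # thermal covariance (assumed convention)
print(is_gaussian_covariance(0.5 * np.eye(2)))               # violates gamma + i nu >= 0
```

The vacuum ($N = 0$) saturates the constraint, since $I_2 + i\nu$ then has the eigenvalue $0$; this reflects that it is a pure state.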
We finally mention that, although Wigner distribution functions are positive as operators on $L^2(\mathbb R^{2m})$, they are not pointwise positive in general and are therefore not always genuine probability distributions (cf. the Wigner distribution function associated with $|1\rangle\langle 1|$).
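In addition, the Wigner distribution function of a state $\rho$ satisfies the energy identity
$$\int_{\mathbb R^{2n}} |z|^2 \rho(z)\,dz = \operatorname{tr}(\rho x^2) + \operatorname{tr}(\rho p^2) = \operatorname{tr}\big((2a^*a + 1)\rho\big),$$
where $x$ and $p$ are the position and momentum operators.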
Proposition 3.3 (Separability). Let $\rho^{(i)}_\theta$ be families of Gaussian states on Hilbert spaces $L^2(\mathbb R^{2n(i)})$ and $\rho_\theta := \bigotimes_{i=1}^N \rho^{(i)}_\theta$; then the Wasserstein information matrix satisfies
Proof. It follows directly from (3.16) that the characteristic function of a tensor product is the product of the individual characteristic functions. Using the Fourier transform and (3.17), this immediately translates into the Wigner function being a product of Wigner functions. The result then follows from [LZ20, Prop. 5].

Examples
In this section, we demonstrate the quantum transport information matrix and its related gradient and Hamiltonian flows in some well-known probability models.

Examples for the quantum Wigner distribution.
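4.1.1. Gaussian mixture model. For Gaussian states $\rho_i$ we consider the Gaussian Wigner probability distributions $P_{\rho_i}$ associated with them. Let $X_i \sim \mathcal N(\mu_i, \Sigma_i)$ be normal random variables and let $X$ be the Gaussian mixture with components $X_i$ and weights $\lambda_i \ge 0$ summing up to one. Then clearly $\mu_X := \mathbb E(X) = \sum_{i=1}^N \lambda_i \mu_i$, and for the second moments $m_{X_i} := \mathbb E(X_i X_i^*) = \Sigma_i + \mu_i\mu_i^*$ we find $m_X = \sum_{i=1}^N \lambda_i m_{X_i}$. Thus, the covariance matrix of $X$ is given by
$$\Sigma_X = \sum_{i=1}^N \lambda_i \Sigma_i + \Big(\sum_{i=1}^N \lambda_i \mu_i\mu_i^* - \mathbb E(X)\mathbb E(X)^*\Big) \ge \sum_{i=1}^N \lambda_i \Sigma_i,$$
where the term in brackets is positive semi-definite by Jensen's inequality. Hence, since the covariance of a mixture dominates the mixture of the covariances, the condition $\Sigma_X + i\nu \ge 0$ follows from $\Sigma_i + i\nu \ge 0$ for the extremal states, and the state associated with the mixture $X$ is $\rho = \sum_{i=1}^N \lambda_i \rho_i$. To parametrize multivariate Gaussian distributions $\mathcal N(\mu, \Sigma)$ that are Wigner functions of Gaussian states, it is natural to consider the parameter space $\theta = (\mu, \Sigma) \in \Theta := \mathbb R^{2m} \times \{\gamma \in \mathbb R^{2m \times 2m} : \gamma > 0 \text{ and } \gamma + i\nu > 0\}$. The Wasserstein metric tensor for the multivariate Gaussian model is
$$g_\theta(\xi, \eta) = \langle \mu_\xi, \mu_\eta\rangle + \operatorname{tr}(S_\xi \Sigma S_\eta)$$
for $\xi = (\mu_\xi, \Sigma_\xi)$ and $\eta = (\mu_\eta, \Sigma_\eta)$, where $S_\xi$ and $S_\eta$ solve the Lyapunov equations $\Sigma_\xi = \{S_\xi, \Sigma\}$ and $\Sigma_\eta = \{S_\eta, \Sigma\}$.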
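The metric tensor above can be evaluated by solving the Lyapunov equations in the eigenbasis of $\Sigma$. As a consistency check, the sketch compares $g_\theta(\xi, \xi)$ with $W_{cl}^2$ between $\mathcal N(\mu, \Sigma)$ and $\mathcal N(\mu + t\mu_\xi, \Sigma + t\Sigma_\xi)$ divided by $t^2$, using the well-known closed form $W_{cl}^2 = |\mu_0 - \mu_1|^2 + \operatorname{tr}\big(\Sigma_0 + \Sigma_1 - 2(\Sigma_1^{1/2}\Sigma_0\Sigma_1^{1/2})^{1/2}\big)$ for Gaussians, which is not stated in the text. It also computes the mixture moments. Names and test values are illustrative.

```python
import numpy as np


def sym_lyapunov(sigma, sigma_dot):
    """Solve sigma_dot = {S, sigma} = S sigma + sigma S for symmetric S (sigma > 0)."""
    w, v = np.linalg.eigh(sigma)
    d = v.T @ sigma_dot @ v
    return v @ (d / (w[:, None] + w[None, :])) @ v.T


def gaussian_w2_metric(sigma, xi, eta):
    """g_theta(xi, eta) = <mu_xi, mu_eta> + tr(S_xi Sigma S_eta) at theta = (mu, Sigma)."""
    (mu_xi, sig_xi), (mu_eta, sig_eta) = xi, eta
    s_xi, s_eta = sym_lyapunov(sigma, sig_xi), sym_lyapunov(sigma, sig_eta)
    return float(mu_xi @ mu_eta + np.trace(s_xi @ sigma @ s_eta))


def psd_sqrt(a):
    """Symmetric square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def gaussian_w2_sq(mu0, sig0, mu1, sig1):
    """Closed-form W_cl^2 between N(mu0, sig0) and N(mu1, sig1)."""
    r = psd_sqrt(sig1)
    cross = psd_sqrt(r @ sig0 @ r)
    return float(np.sum((mu0 - mu1) ** 2) + np.trace(sig0 + sig1 - 2.0 * cross))


def mixture_moments(weights, mus, sigmas):
    """Mean and covariance of the mixture with weights lambda_i and components N(mu_i, Sigma_i)."""
    mean = sum(lam * m for lam, m in zip(weights, mus))
    second = sum(lam * (s + np.outer(m, m)) for lam, m, s in zip(weights, mus, sigmas))
    return mean, second - np.outer(mean, mean)


mu = np.zeros(2)
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
xi = (np.array([0.4, -0.2]), np.array([[0.5, 0.1], [0.1, -0.3]]))
t = 1e-4
w2_sq = gaussian_w2_sq(mu, sigma, mu + t * xi[0], sigma + t * xi[1])
print(w2_sq / t ** 2, gaussian_w2_metric(sigma, xi, xi))   # nearly equal for small t
```

The eigenbasis solve is exact because $\Sigma$ is symmetric positive definite, so $w_i + w_j > 0$; for large models a dedicated Lyapunov solver could be used instead.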

with boundary conditions $\theta_0 = \theta^0$ and $\theta_N = \theta^1$. This minimization problem can easily be solved using a simple Monte-Carlo algorithm minimizing (2.5) that only accepts transitions to states satisfying the two constraints $\Sigma_i \ge 0$ and $\Sigma_i + i\nu \ge 0$.
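The Monte-Carlo scheme is not specified in detail, so the following is only a hypothetical sketch for a single mode. The path $\theta_0, \ldots, \theta_K$ is initialized by linear interpolation, the discrete action $K\sum_k g_{\theta_k}(\theta_{k+1} - \theta_k, \theta_{k+1} - \theta_k)$ serves as the energy, and random perturbations of interior points are accepted only if they satisfy $\Sigma > 0$ and $\Sigma + i\nu \ge 0$ and also lower the energy (a zero-temperature, greedy variant). Step size, number of steps, and endpoints are arbitrary choices.

```python
import numpy as np

NU = np.array([[0.0, 1.0], [-1.0, 0.0]])      # single-mode symplectic form


def admissible(sigma, tol=1e-12):
    """Constraints Sigma > 0 and Sigma + i nu >= 0 for one mode."""
    return bool(np.linalg.eigvalsh(sigma).min() > tol
                and np.linalg.eigvalsh(sigma + 1j * NU).min() >= -tol)


def metric(sigma, d_mu, d_sigma):
    """g_theta(xi, xi) = |mu_xi|^2 + tr(S Sigma S), where d_sigma = S Sigma + Sigma S."""
    w, v = np.linalg.eigh(sigma)
    s = v @ ((v.T @ d_sigma @ v) / (w[:, None] + w[None, :])) @ v.T
    return float(d_mu @ d_mu + np.trace(s @ sigma @ s))


def path_energy(mus, sigmas):
    """Discrete action K * sum_k g_{theta_k}(theta_{k+1} - theta_k, theta_{k+1} - theta_k)."""
    k = len(mus) - 1
    return k * sum(metric(sigmas[i], mus[i + 1] - mus[i], sigmas[i + 1] - sigmas[i])
                   for i in range(k))


def linear_path(mu0, sig0, mu1, sig1, k):
    ts = np.linspace(0.0, 1.0, k + 1)
    return [(1 - t) * mu0 + t * mu1 for t in ts], [(1 - t) * sig0 + t * sig1 for t in ts]


def monte_carlo_path(mu0, sig0, mu1, sig1, k=10, steps=2000, step=0.05, seed=0):
    """Greedy random search over interior points; the endpoints stay fixed."""
    rng = np.random.default_rng(seed)
    mus, sigmas = linear_path(mu0, sig0, mu1, sig1, k)
    energy = path_energy(mus, sigmas)
    for _ in range(steps):
        i = int(rng.integers(1, k))                 # interior index in 1, ..., k - 1
        a = rng.standard_normal((2, 2))
        cand_mu = mus[i] + step * rng.standard_normal(2)
        cand_sig = sigmas[i] + 0.5 * step * (a + a.T)
        if not admissible(cand_sig):
            continue                                # reject states violating the constraints
        old = (mus[i], sigmas[i])
        mus[i], sigmas[i] = cand_mu, cand_sig
        cand_energy = path_energy(mus, sigmas)
        if cand_energy < energy:
            energy = cand_energy                    # accept the improvement
        else:
            mus[i], sigmas[i] = old                 # undo the move
    return mus, sigmas, energy


mu0, mu1 = np.zeros(2), np.array([1.0, -0.5])
sig0, sig1 = np.eye(2), np.diag([3.0, 2.0])
e_linear = path_energy(*linear_path(mu0, sig0, mu1, sig1, 10))
mus, sigmas, e_final = monte_carlo_path(mu0, sig0, mu1, sig1)
print(e_linear, e_final)
```

Linear interpolation is a valid starting path because the set $\{\Sigma : \Sigma + i\nu \ge 0\}$ is convex; a Metropolis rule with a positive temperature could replace the greedy acceptance if escaping local minima matters.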
The numerical solution to the quantum transport problem of the two parametrized Gaussian states is illustrated in Figure 3.

Examples involving the quantum fermionic Fokker-Planck equation.
Example 2 (Fermionic Fokker-Planck equation; analytic solution). We consider the fermionic Fokker-Planck equation as introduced in Subsection 2.5 for the simplest case $n = 1$, i.e. $\mathcal C$ can be identified with the two-dimensional Hilbert space $\operatorname{span}\{\mathrm{id}_{\mathbb C^{2\times 2}}, \sigma_x\}$, in which case we can solve the problem analytically.

The grading operator is defined by
$\Gamma(\mathrm{id}) = \mathrm{id}$ and $\Gamma(\sigma_x) = -\sigma_x$.
The faithful states in $\mathcal C$ are then parametrized by $(-1, 1) \ni \theta \mapsto \rho(\theta) := \mathrm{id} + \theta\sigma_x$.
In particular, since the Lagrangian is just $L(\dot\theta(t)) = \dot\theta(t)^2$, the geodesics in parameter space are straight lines, as the Euler-Lagrange equation $\ddot\theta(t) = 0$ immediately shows.

Wasserstein natural gradient.
We now also illustrate the Wasserstein natural gradient for the quantum Fokker-Planck equation as in Example 2 by minimizing the negative von Neumann entropy $R(\theta) = \tau(\rho(\theta)\log\rho(\theta))$ as objective function.
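For $\rho(\theta) = \mathrm{id} + \theta\sigma_x$ with $\tau = \tfrac12\operatorname{tr}$, the eigenvalues of $\rho(\theta)$ are $1 \pm \theta$, so $R(\theta) = \tfrac12\big((1+\theta)\log(1+\theta) + (1-\theta)\log(1-\theta)\big)$ and $R'(\theta) = \operatorname{artanh}(\theta)$. The sketch checks this closed form against an eigendecomposition. It then runs a forward Euler natural gradient iteration under the hypothetical assumption of a constant Wasserstein information matrix, set to $1$, as the quadratic Lagrangian $\dot\theta^2$ of Example 2 suggests; any other constant only rescales time. The iteration drives $\theta$ to $0$, i.e. to the completely mixed state.

```python
import numpy as np

SX = np.array([[0.0, 1.0], [1.0, 0.0]])


def rho(theta):
    """rho(theta) = id + theta sigma_x, faithful for theta in (-1, 1)."""
    return np.eye(2) + theta * SX


def tau(a):
    """Normalized trace for n = 1."""
    return np.trace(a) / 2.0


def neg_entropy(theta):
    """R(theta) = tau(rho log rho), computed through an eigendecomposition."""
    w, v = np.linalg.eigh(rho(theta))
    return float(tau((v * (w * np.log(w))) @ v.T))


def neg_entropy_closed(theta):
    """Closed form from the eigenvalues 1 +- theta."""
    return 0.5 * ((1 + theta) * np.log1p(theta) + (1 - theta) * np.log1p(-theta))


# Hypothetical constant Wasserstein information matrix G_W = 1, so each
# forward Euler natural gradient step is theta <- theta - step * R'(theta).
theta, step = 0.9, 0.1
for _ in range(300):
    theta -= step * np.arctanh(theta)       # R'(theta) = artanh(theta)
print(theta)                                 # approaches 0, the completely mixed state id
```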

4.3. Channel parameter estimation: pushforward of quantum states. The idea of parameter estimation of probability densities constructed from the pushforward of possibly nonlinear activation functions, relevant for neural networks, has been investigated by the second author in [LZ20].
In quantum theory the framework is somewhat different, since quantum operations on a physical system are described by linear (super-)operators, so-called quantum channels, rather than by nonlinear one-dimensional functions. A quantum channel is a completely positive and trace-preserving (CPTP) map. Thus, it is natural to consider the situation where a state is parametrized as the output of a quantum channel $\Phi_\theta$ depending on some parameter $\theta$, which is the quantum analogue of the pushforward of probability measures by parametrized functions.
We illustrate how such problems can be studied in our framework by considering the quantum depolarizing channel (4.13) together with the quantum fermionic Fokker-Planck equation, introduced in Section 2.5, for $n = 2$.
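Since the explicit form (4.13) is not restated in this section, the sketch assumes the standard depolarizing channel $\Phi_p(\rho) = (1-p)\rho + p\operatorname{tr}(\rho)\,\mathrm{id}/d$ with the ordinary trace; with the normalized trace $\tau$ the completely mixed state is $\mathrm{id}$ instead of $\mathrm{id}/d$. It verifies complete positivity through the Choi matrix and trace preservation through its partial trace, and builds the pushforward family $\theta \mapsto \Phi_\theta(\rho_0)$ for $d = 4$, matching the dimension $2^n$ for $n = 2$. Helper names are hypothetical.

```python
import numpy as np


def depolarizing(rho, p):
    """Assumed form Phi_p(rho) = (1 - p) rho + p tr(rho) id / d (ordinary trace)."""
    d = rho.shape[0]
    return (1.0 - p) * rho + p * np.trace(rho) * np.eye(d) / d


def choi_matrix(channel, d):
    """Choi matrix sum_{ij} E_ij (x) Phi(E_ij); Phi is completely positive iff it is PSD."""
    c = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            e = np.zeros((d, d), dtype=complex)
            e[i, j] = 1.0
            c += np.kron(e, channel(e))
    return c


def is_cptp(channel, d, tol=1e-10):
    c = choi_matrix(channel, d)
    completely_positive = np.linalg.eigvalsh(c).min() >= -tol
    # Trace preservation: tracing out the output factor of the Choi matrix gives id.
    trace_preserving = np.allclose(np.einsum("ikjk->ij", c.reshape(d, d, d, d)), np.eye(d))
    return bool(completely_positive and trace_preserving)


d = 4                                            # 2^n with n = 2
rho0 = np.zeros((d, d), dtype=complex)
rho0[0, 0] = 1.0                                 # a pure initial state
family = [depolarizing(rho0, th) for th in np.linspace(0.0, 1.0, 5)]   # theta -> Phi_theta(rho0)
print([is_cptp(lambda r, p=p: depolarizing(r, p), d) for p in (0.0, 0.5, 1.0)])
```

The Choi test is the standard finite-dimensional criterion; for this channel the Choi eigenvalues are $p/d$ and $(1-p)d + p/d$, so complete positivity fails once $p > d^2/(d^2-1)$.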

Discussion
In this paper, we pull back the quantum Wasserstein-2 metric to parameterized quantum statistical models. This allows us to develop a quantum Wasserstein (transport) information matrix. Using this matrix, we develop quantum transport natural gradient methods and apply them to quantum statistical learning problems. In addition, we consider the optimal control problem of quantum transport natural gradient flows, which leads to the derivation of a quantum Schrödinger bridge problem. Several analytical examples are provided, such as the transport of Gaussian states on the statistical manifold in Example 1, the transport of states for the gradient flow induced by the quantum fermionic Fokker-Planck equation in Section 4.2 on the statistical manifold, and the parameter estimation problem for channels in Subsection 4.3.
Our results initiate the joint study of quantum information geometry and quantum optimal transport. We pull back quantum system dynamics to a finite-dimensional parameter space generated by statistical and machine learning models, and we call this area quantum transport information geometry. Here, the interplay between the quantum Fisher and quantum Wasserstein information matrices becomes essential. We expect that this joint study will be useful for developing a transport estimation theory in quantum information theory and for designing AI-driven quantum computing algorithms for quantum systems. In the future, we will continue this line of study following transport information geometry [Li1,Li2].