Entropy-regularized 2-Wasserstein distance between Gaussian measures

Gaussian distributions are plentiful in applications dealing with uncertainty quantification and diffusivity. They furthermore stand as important special cases for frameworks providing geometries for probability measures, as the resulting geometry on Gaussians is often expressible in closed form. In this work, we study the Gaussian geometry under the entropy-regularized 2-Wasserstein distance by providing closed-form solutions for the distance and for interpolations between elements. Furthermore, we provide a fixed-point characterization of a population barycenter when restricted to the manifold of Gaussians, which allows computations through the fixed-point iteration algorithm. As a consequence, the results yield closed-form expressions for the 2-Sinkhorn divergence. As the geometries change with the regularization magnitude, we study the limiting cases of vanishing and infinite magnitudes, reconfirming well-known results on the limits of the Sinkhorn divergence. Finally, we illustrate the resulting geometries with a numerical study.


Introduction
Optimal transport (OT) [77] studies the geometry of probability measures through the lifting of a cost function between samples. This is carried out by devising a coupling between two probability measures via a transport plan, so that one measure is transported to another with minimal total cost. The resulting geometry offers a favorable way of comparing probability measures with one another, which has led to considerable success in machine learning, especially in generative modelling [6,23,27,52], where one aims at training a model distribution to sample from a given data distribution, and computer vision, where OT provides intuitive metrics between images [66]. Notably, OT can be used to derive not only divergences, but also metrics between probability distributions, referred to as the p-Wasserstein metrics.
To ease the computational aspects of OT, entropic relaxation was introduced, which transforms the constrained convex problem of transportation into an unconstrained strictly convex problem [20]. This is carried out by considering the sum of the total cost and the Kullback-Leibler (KL) divergence, between the transport plan and the independent joint distribution, scaled by some regularization magnitude. In addition to easing computation, the entropic regularization also improves statistical properties [71], specifically, the complexity of estimating the OT quantity between measures through sampling [33,56,78]. Theoretical properties of the entropic regularization have been studied in e.g. metric geometry, machine learning and statistics [29,34,35,39,53,64,65]. It has also been applied in a variety of fields, including computer vision, density functional theory in chemistry, and inverse problems (e.g. [35,37,49,62]).
The resulting problem has close relations to the Schrödinger problem [70], which considers the most likely flow of a cloud of gas from an initial position to an observed position after a certain amount of time, under a prior assumption on the evolution of the position, given by e.g. a Brownian motion. The resulting problem has found applications in fields such as mathematical physics, economics, optimization and probability [12,19,22,30,31,67,79]. Connections to OT have been considered in e.g. [20,32,47,68,69].
OT is not the only instance of a geometric framework for probability measures. Other popular choices include information geometric divergences [3,9] and integral probability metrics [59]. In contrast to these methods, OT and entropic OT have the advantage of metrizing the weak*-convergence of probability measures, which results in non-singular behavior when comparing measures of disjoint supports. On top of this, the freedom to choose the lifted cost function is important in applications, as the cost function can be used to incorporate modelling choices, determining which differences in samples are deemed most important. For example, the standard Euclidean metric is a poor choice for comparing images.
Gaussian distributions provide a meaningful testing ground for such frameworks since, in many cases, they result in closed-form expressions. In addition, the study of Gaussians under the OT framework results in useful divergences. In particular, divergences between centered Gaussians result in divergences between their corresponding covariance matrices. Both instances enjoy many applications in a plethora of fields, such as medical imaging [26], computer vision [74,76,75], brain computer interfaces [18], natural language processing [60], and assessing the quality of generative models [42]. Notably, the 2-Wasserstein metric between Gaussians is known as the Bures metric in quantum physics, where it is used to compare quantum states. Other popular divergences for Gaussians include the affine-invariant Riemannian metric [63], corresponding to the Fisher-Rao distance between centered Gaussians, the Alpha Log-Determinant divergences [14], corresponding to Rényi divergences between centered Gaussians, and the log-Euclidean metric [8]. A survey of some of the most common divergences and their resulting geometry on Gaussians can be found in [28]. More recently, applications have driven research into determining optimal divergences for the task at hand, which has raised interest in studying interpolations between different divergences [4,17,73]. Generalizations of these divergences to the infinite-dimensional setting of Gaussian processes and covariance operators have also been considered [46,51,54,58,57].
The Sinkhorn divergence has been proposed in OT, applying the entropic regularization to define a parametric family of divergences, interpolating from the OT quantity to a maximum mean discrepancy (MMD), whose kernel is determined by the cost. In the present work, we provide a closed-form solution to the entropy-regularized 2-Wasserstein distance between multivariate Gaussians, which can then be applied in the computation of the corresponding Sinkhorn divergence between Gaussians. In addition, we study the task of interpolating between two Gaussians under the entropy-regularized 2-Wasserstein distance, and confirm known limiting properties of the divergences with respect to the regularization strength. Finally, we provide fixed-point expressions for the barycenter of a population of Gaussians restricted to the Gaussian manifold, which can be employed in fixed-point iteration for computing the barycenter. The one-dimensional setting has been studied in [4,36,37]. The Schrödinger bridge between multivariate Gaussians has been considered in [15], including the study, in [16], of the limiting case of bringing the noise of the driving Brownian motion to 0, resulting in the 2-Wasserstein case.
During the review process of the article at hand, analogous results to those of this paper were obtained independently by Janati et al. [43]. For the barycenter problem, the authors of [43] show that the barycenter of Gaussians under the Sinkhorn divergence is a Gaussian, when restricted to the space of sub-Gaussian measures. This extends our Theorem 4, where we explicitly restrict to Gaussian instead of sub-Gaussian measures. Furthermore, the authors consider the setting of unbalanced Gaussian measures. The paper is divided as follows: in Section 2, we briefly introduce the necessary background to develop the entropic OT theory of Gaussians, including the formulation of OT, entropic OT, and the corresponding dual and dynamical formulations. In Section 3, we compute explicit solutions to the entropy-relaxed 2-Wasserstein distance between Gaussians, including the dynamical formulation that allows for interpolation. As a consequence, we derive a closed-form solution for the corresponding Sinkhorn divergence. In Section 4, we study the barycenters of populations of Gaussians, restricted to the Gaussian manifold, and derive fixed-point expressions for the entropic 2-Wasserstein distance and the 2-Sinkhorn divergence. Finally, in Section 5, we illustrate the resulting interpolative and barycentric schemes. Especially, we consider varying the regularization magnitude, visualizing the interpolation between the OT and MMD problems in the Sinkhorn case [29,35,64].

Background
In this section, we start by recalling the essential background for optimal transport (OT) and its entropy-relaxed version. A more in-depth exposition of OT can be found in [77], and of computational aspects and entropic OT in [22].
Optimal transport. Let (X, d) be a metric space equipped with a lower semi-continuous cost function c : X × X → R ≥0 . Then, the optimal transport problem between two probability measures µ, ν ∈ P(X) is given by

OT c (µ, ν) = min γ∈ADM(µ,ν) E γ [c],   (1)

where ADM(µ, ν) is the set of joint probabilities with marginals µ and ν, and E γ [c] denotes the expectation of c under γ. Additionally, by E[µ] we denote the expectation of µ. A minimizer of (1) is denoted by γ opt and called a transport plan.
The OT problem admits the following Kantorovich (dual) formulation

OT c (µ, ν) = sup (ϕ,ψ)∈ADM(c) E µ [ϕ] + E ν [ψ],   (3)

where (ϕ, ψ) ∈ ADM(c) is required to satisfy ϕ(x) + ψ(y) ≤ c(x, y) for all x, y ∈ X. Potentials ϕ opt , ψ opt achieving the maximum in (3) are called Kantorovich potentials.
Wasserstein distances. The p-Wasserstein distance W p between µ and ν is defined as

W p (µ, ν) = OT d p (µ, ν) 1/p ,

where d is a metric on X and p ≥ 1. The case p = 2 is particularly interesting, as the resulting metric is then induced by a pseudo-Riemannian metric structure [5,50].
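As a concrete special case worth keeping in mind: on X = R with cost d p , the optimal coupling is monotone, so the p-Wasserstein distance between two empirical measures with equally many, uniformly weighted samples reduces to matching sorted samples. The following is a minimal numerical sketch; the function name and setup are ours, for illustration only.

```python
import numpy as np

def wasserstein_p_1d(x, y, p=2):
    """p-Wasserstein distance between two empirical measures on R with
    equally many, uniformly weighted samples: the optimal coupling is
    monotone, so it suffices to match sorted samples."""
    x, y = np.sort(x), np.sort(y)
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

# Shifting a point cloud by t moves it by exactly t in W_p.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
d = wasserstein_p_1d(x, x + 3.0, p=2)  # equals 3.0 up to float error
```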
2-Wasserstein distance between Gaussians. One of the rare cases where the 2-Wasserstein distance admits a closed-form solution is between two multivariate Gaussian distributions, for which [25,41,44,61]

W 2 2 (N (m 0 , K 0 ), N (m 1 , K 1 )) = ‖m 0 − m 1 ‖ 2 + Tr(K 0 ) + Tr(K 1 ) − 2 Tr (K 0 1/2 K 1 K 0 1/2 ) 1/2 .   (6)

It can be shown that (6) is induced by a Riemannian metric on the space of n-dimensional Gaussians N (R n ), with the metric g K expressed [72] in terms of v (K,V ) , the unique symmetric matrix solving the Sylvester equation V = K v (K,V ) + v (K,V ) K. Moreover, given N (m 0 , K 0 ), N (m 1 , K 1 ) ∈ N (R n ), the geodesics under the metric (6) are given by N (m t , K t ), with [55]

m t = (1 − t)m 0 + tm 1 , K t = ((1 − t)I + tT )K 0 ((1 − t)I + tT ),   (9)

where T = K 0 −1/2 (K 0 1/2 K 1 K 0 1/2 ) 1/2 K 0 −1/2 is the optimal transport map between the centered Gaussians. We remark that Eq. (6) is valid for all Gaussian distributions, including the case when K 0 , K 1 are positive semi-definite. This is in contrast to the affine-invariant Riemannian distance ‖log(K 0 −1/2 K 1 K 0 −1/2 )‖ F , the Log-Euclidean distance ‖log(K 0 ) − log(K 1 )‖ F , and the Kullback-Leibler divergence (see below), which require that K 0 , K 1 be strictly positive definite.
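The closed form (6) is straightforward to evaluate numerically. Below is a minimal sketch (our own illustrative code, not from the paper), using SciPy's matrix square root.

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(m0, K0, m1, K1):
    """Closed-form 2-Wasserstein distance (6) between N(m0, K0) and
    N(m1, K1): the squared distance is
    |m0 - m1|^2 + Tr(K0) + Tr(K1) - 2 Tr((K0^{1/2} K1 K0^{1/2})^{1/2})."""
    s0 = np.real(sqrtm(K0))                 # K0^{1/2}
    cross = np.real(sqrtm(s0 @ K1 @ s0))    # (K0^{1/2} K1 K0^{1/2})^{1/2}
    w2_sq = np.sum((m0 - m1) ** 2) + np.trace(K0 + K1 - 2.0 * cross)
    return np.sqrt(max(w2_sq, 0.0))
```

With equal covariances the distance reduces to the Euclidean distance between the means, and with equal means to the Bures metric between the covariance matrices mentioned above.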
Finally, the 2-Wasserstein barycenter µ̄ of a population of probability measures µ i with weights λ i ≥ 0, i = 1, 2, ..., N and Σ N i=1 λ i = 1, is defined as the minimizer

µ̄ = argmin µ Σ N i=1 λ i W 2 2 (µ, µ i ).   (11)

When the population consists of Gaussians µ i = N (m i , K i ), one can show that the barycenter is Gaussian, given by µ̄ = N (m̄, K̄), where m̄ = Σ N i=1 λ i m i and K̄ satisfies the fixed-point equation [1, Thm. 6.1]

K̄ = Σ N i=1 λ i (K̄ 1/2 K i K̄ 1/2 ) 1/2 .
Entropic relaxation. Let µ, ν ∈ P(X) with densities p µ and p ν . Then, we denote by KL(µ‖ν) = ∫ p µ log(p µ /p ν ) the Kullback-Leibler divergence (KL-divergence) between µ and ν. The differential entropy of µ is given by H(µ) = −∫ p µ log p µ . For a product measure, we have the identity H(µ ⊗ ν) = H(µ) + H(ν).   (14)

A special case that will be used later in this work is the KL-divergence between two non-degenerate multivariate Gaussian distributions µ 0 = N (m 0 , K 0 ) and µ 1 = N (m 1 , K 1 ) when X = R n , which is given by

KL(µ 0 ‖µ 1 ) = (1/2) ( Tr(K 1 −1 K 0 ) + (m 1 − m 0 ) T K 1 −1 (m 1 − m 0 ) − n + log(det K 1 / det K 0 ) ),

and for the entropy we have H(µ 0 ) = (1/2) log ((2πe) n det K 0 ). Given ε > 0, we relax (1) with a KL-divergence term between the transport plan and the independent joint distribution, yielding the entropic OT problem [20]

OT ε c (µ, ν) = min γ∈ADM(µ,ν) E γ [c] + ε KL(γ‖µ ⊗ ν),   (17)

which is strictly convex with respect to γ. Moreover, this problem is numerically more favorable to solve than (1), compared, for instance, to the Hungarian and the auction algorithms, due to the Sinkhorn-Knopp algorithm. As shown, for instance, in [12,19,24,39,67], the above problem has a unique minimizer γ ε , characterized by the existence of functions α ε and β ε such that γ ε (x, y) = α ε (x) k(x, y) β ε (y), where k(x, y) = exp(−c(x, y)/ε) denotes the Gibbs kernel. We call γ ε an entropic transport plan. Moreover, when ε → 0, γ ε converges to γ opt , a solution of the OT problem (1) [22,38,47]; while when ε → ∞, γ ε converges to the independent coupling γ ∞ = µ ⊗ ν [35,64]. The latter property shows in particular that, for large ε, the entropy-regularized OT behaves like an inner product and not like a norm. In linear algebra, the polarization formula is the usual way of defining a norm from an inner product. This is the main idea of the Sinkhorn divergence.
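For discrete measures, the Sinkhorn-Knopp algorithm mentioned above amounts to alternating diagonal rescalings of the Gibbs kernel. Below is a minimal sketch (our own illustrative code, not the paper's implementation).

```python
import numpy as np

def sinkhorn_knopp(mu, nu, C, eps, iters=500):
    """Entropic OT (17) for discrete measures: alternately rescale the
    Gibbs kernel K = exp(-C/eps) so that the plan's marginals match mu
    and nu. Returns the entropic transport plan gamma."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(iters):
        v = nu / (K.T @ u)  # enforce the second marginal
        u = mu / (K @ v)    # enforce the first marginal
    return u[:, None] * K * v[None, :]

# Toy example: uniform measures on three points of the real line.
mu = np.ones(3) / 3
nu = np.ones(3) / 3
C = (np.arange(3)[:, None] - np.arange(3)[None, :]) ** 2.0
gamma = sinkhorn_knopp(mu, nu, C, eps=1.0)
```

For small ε the plan concentrates near an optimal coupling, while for large ε it approaches the independent coupling µ ⊗ ν, in line with the limits discussed above.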
Sinkhorn divergence. The KL-divergence term in OT ε c acts as a bias, as discussed in [29]. This can be removed by defining the p-Sinkhorn divergence as

S ε p (µ, ν) = OT ε d p (µ, ν) − (1/2) OT ε d p (µ, µ) − (1/2) OT ε d p (ν, ν).

As shown in [29], if, for example, c = d p with p ≥ 1, the Sinkhorn divergence metrizes the convergence in law in the space of probability measures.
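In the discrete setting, the debiasing in the definition above can be sketched as follows; the code and its function names are our own illustration, with the entropic cost computed by Sinkhorn-Knopp iterations.

```python
import numpy as np

def entropic_cost(mu, nu, C, eps, iters=2000):
    """Discrete entropic OT cost (17): <gamma, C> + eps * KL(gamma | mu x nu),
    with the plan gamma computed by Sinkhorn-Knopp iterations."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    g = u[:, None] * K * v[None, :]
    kl = np.sum(g * np.log(g / (mu[:, None] * nu[None, :])))
    return np.sum(g * C) + eps * kl

def sinkhorn_divergence(mu, nu, Cxy, Cxx, Cyy, eps):
    """Debiased divergence: OT(mu, nu) - (OT(mu, mu) + OT(nu, nu)) / 2."""
    return (entropic_cost(mu, nu, Cxy, eps)
            - 0.5 * entropic_cost(mu, mu, Cxx, eps)
            - 0.5 * entropic_cost(nu, nu, Cyy, eps))
```

By construction the divergence vanishes for identical measures, removing the entropic bias of the raw cost.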
Entropy-Kantorovich duality. In this subsection we summarize well-known results on Entropy-Kantorovich duality. For further details and proofs, we refer the reader to [24].
Given a probability measure µ, the class of Entropy-Kantorovich potentials is defined as the set of measurable functions ϕ on R n satisfying a suitable integrability condition. Then, given c = d 2 , where d(x, y) = |x − y|, the dual problem (22) is formulated over pairs of such potentials [24,29,35,40,47]. Finally, we are able to state the full duality theorem between the primal (17) and the dual problem (22). The theorem below is a particular case of Theorem 2.8 and Proposition 2.11 in [24], in the case of the Euclidean space with squared distance cost function.
Elements of the pair (ϕ ε , ψ ε ) attaining the maximum in (22) are called entropic Kantorovich potentials. Finally, a relationship between α ε , β ε in (19) and the entropic Kantorovich potentials ϕ ε , ψ ε above is, according to Theorem 1, given by ϕ ε = ε log α ε and ψ ε = ε log β ε . Using the dual formulation, we can show the following.
Dynamical formulation of entropy-relaxed optimal transport. Analogously to unregularized OT theory, the entropic regularization of OT with distance cost admits a dynamical (a.k.a. Benamou-Brenier) formulation.
In the following, we again consider the particular case when the cost function is given by c(x, y) = |x − y| 2 . Then, we can write (17) as a dynamical minimization problem [40,47] over curves of measures, where t ∈ [0, 1], the endpoints satisfy µ t=0 = µ 0 and µ t=1 = µ 1 , and the minimum must be understood as taken among all couples (µ t , v t ) solving the continuity equation in the distributional sense (see Appendix A); moreover, the minimum is attained if and only if (µ t , v t ) = (µ ε t , ∇φ ε t ), for a potential φ ε t : R n → R, which is defined in the following via the entropic potentials. The resulting µ ε t is called the entropic interpolation between µ 0 and µ 1 .
The solution can be characterized by the potentials α ε , β ε in (19) of the static problem (17) (while abusing the notation and writing µ(x) for the density of µ, which will be done throughout this work): in conjunction with the heat flow, they allow us to compute the entropic interpolation from µ 0 to µ 1 [40,47,65], where α ε , β ε are the Entropy-Kantorovich potentials solving the system (19). In particular, when we send the regularization parameter ε → 0, the curves of measures µ ε t converge to the 2-Wasserstein geodesic between µ 0 and µ 1 [39,47]. Moreover, we can also write the entropic interpolation µ ε t in terms of the dynamic entropic Kantorovich potentials (ϕ ε t , ψ ε t ), and it is easy to check that by imposing v ε t = ∇φ ε t , the pair (µ ε t , v ε t ) solves the Fokker-Planck equation (31).

Entropy-Regularized 2-Wasserstein Distance between Gaussians

In this section we consider the special case of (17) and (20) when c(x, y) = d 2 (x, y) = |x − y| 2 is the squared Euclidean distance on R n and µ 0 = N (m 0 , K 0 ), µ 1 = N (m 1 , K 1 ) are multivariate Gaussian distributions. We are interested in obtaining explicit formulas for the optimal coupling γ ε solving (17), the Entropy-Kantorovich maximizers (ϕ ε , ψ ε ) in (22), and the entropic displacement interpolation µ ε t in (29).
We start by showing that we can assume, without loss of generality, that µ 0 and µ 1 are centered Gaussian distributions. The general case is obtained by a shift depending on the L 2 -distance between the means of the two Gaussians.
Proof Recall the definition given in (17). Then, as c = d 2 , the first term can be written as the integral of ‖x − y‖ 2 dγ(x + m 0 , y + m 1 ). We now verify that the requirement γ ∈ ADM(µ 0 , µ 1 ) is equivalent to γ(· + m 0 , · + m 1 ) ∈ ADM(µ̄ 0 , µ̄ 1 ), which results from the change of variables in the first margin, and similarly for the other margin. Finally, for the entropy term, we use the identity (14). Now, as the entropy of a distribution does not depend on the expected value, we have H(µ i ) = H(µ̄ i ). Putting everything together, we get the claim.

Proposition 3 Let µ i = N (0, K i ) ∈ N (R n ) for i = 0, 1. Then, the unique optimal plan γ ε in OT ε d 2 (µ 0 , µ 1 ) is a centered Gaussian distribution.
Proof Note that E γ [d 2 ] depends only on the mean and covariance of γ, and therefore remains constant if γ is replaced with a Gaussian with the corresponding mean and covariance (which we can do, as the marginals are Gaussians). Then, for the other term, we use the identity (14). It is readily seen that, among plans with a fixed covariance matrix, the γ minimizing this expression is Gaussian, as Gaussians achieve maximal entropy over distributions sharing a fixed covariance matrix. Therefore, we can deduce that γ ε is Gaussian. Finally, as both of the marginals µ 0 and µ 1 are centered, so is γ ε .
We now arrive at the main theorem of this work, detailing the entropic 2-Wasserstein geometry between multivariate Gaussians. The proof is based on studying the Schrödinger system given in (19). We give an alternative proof of statement a. of Theorem 2 in Appendix B, by finding the minimizer of the OT problem directly.
Recall that a noteworthy property of the entropic interpolant is that, even when interpolating from µ to itself, the trajectory does not constantly stay at µ.
Theorem 2 Let µ i = N (0, K i ), for i = 0, 1, be two centered multivariate Gaussians. Then:

a. The density of the optimal entropy-relaxed plan γ ε is given in terms of α ε (x) = exp(x T Ax + a) and β ε (y) = exp(y T By + b), with A, B given in (40).

b. The entropic optimal transport quantity OT ε d 2 (µ 0 , µ 1 ) admits a closed-form expression.

c. The entropic displacement interpolation µ ε t , t ∈ [0, 1], between µ 0 and µ 1 is given by µ ε t = N (0, K ε t ), with K ε t given in (42).

Proof Part a. Recall that α ε , β ε are the unique functions that give the density of the optimal plan γ ε . The optimal plan is required to have the right marginals (19). Assuming α ε (x) = exp(x T Ax + a) and β ε (y) = exp(y T By + b), substituting in µ 0 and µ 1 , and after some simplifications, the system reads as (45). Using the Gaussian integral identity, the system (45) results in (47). Let us solve for A and B first. From the system (47), we get that A and B can be written as in (48). Then, one can show that the A, B given in (40) solve this system. Plugging A, B into the expressions for exp(a + b) in (47), we get an equation admitting an explicit solution for a and b. Now, we show that A solves the equation given in (48). Manipulating (48), we see that it suffices to show a single matrix equality. Substituting in the A given in (40), we compute the left-hand side and the right-hand side separately; the required equality can then be derived step by step, where the first step results from writing out the definitions.

Part b. Plugging ϕ ε and ψ ε into (22) yields the claimed expression, using an elementary trace identity.

Part c. As we have solved for α ε and β ε in the optimal plan, the entropic interpolant µ ε t between µ 0 and µ 1 is given by (29), which we rewrite here. A direct computation on the first factor, together with a similar computation on the second, yields an expression of the form (62), where N is a normalizing constant. We can simplify the matrix T 0 (A) + T 1 (B) in (62). Consider first the term T 0 (A): the second equality follows from (47), the third from (48), the fourth from the Woodbury matrix inverse identity, and the last one from substituting in the B given in (40).
Likewise, we can substitute B into the second term T 1 (B). Putting the two terms together, we obtain a simplified form of T 0 (A) + T 1 (B). Note that we can write (62) as a Gaussian with covariance matrix K ε t , and the claim follows, where for the last step we use the formula for the matrix inverse. Above we only considered centered Gaussians. Now we combine the results obtained in Proposition 2 and Theorem 2 to deduce the general case. As a consequence, we also derive the corresponding formulas for the Sinkhorn divergence between two Gaussians. b. The entropic interpolant between µ 0 and µ 1 is N (m t , K ε t ), where m t = (1 − t)m 0 + tm 1 and K ε t is given in (42).
We will now emphasize an identity that can be derived from the calculations of Theorem 2, which we find useful.
Lemma 1 Let C, D be symmetric positive-definite matrices. Then, the identity below holds.

Proof Similarly to (40), let A and B be defined with C, D in place of K 0 , K 1 . Then, substituting B into the first equation of (47) (while remembering to make the same replacement there), the result follows from substituting in A, multiplying both sides by a suitable scalar, and moving −I from the right-hand side to the left-hand side.
Next, we study the limiting cases of ε going to 0 and ∞, reconfirming that the Sinkhorn divergence interpolates between the 2-Wasserstein distance and MMD [29,35,64].
b. The corresponding limits hold for the Sinkhorn divergence in (72). c. For t ∈ [0, 1], denote by µ t the 2-Wasserstein geodesic given in (9), and by µ ε t the entropic 2-Wasserstein interpolant between µ 0 and µ 1 given by (42). Then, µ ε t converges to µ t as ε → 0.

Proof Part a. The ε → 0 case is a straightforward computation: since ε log ε → 0 when ε → 0, the entropic terms vanish in the limit. We now compute the limit when ε → ∞. It is enough to show that the remaining trace term goes to 0 when ε → ∞. To this end, denote by {λ i } n i=1 the eigenvalues of K 1 K 2 . First, notice that for any λ > 0, the corresponding summand vanishes as ε → ∞. Second, the remaining factors stay bounded, and so the result follows.
Part b. This is a straightforward application of the above result to (72).
Part c. This follows by a straightforward computation on (42).
Next, let us focus on the Gaussian case. We lack a proof that such a barycenter is indeed Gaussian, so note that the following statement requires restricting the candidate barycenters to Gaussians.
Theorem 3 (Entropic Barycenter of Gaussians) Let µ i = N (m i , K i ), i = 1, 2, ..., N, be a population of multivariate Gaussians. Then, their entropic barycenter (86) with weights λ i ≥ 0 such that Σ N i=1 λ i = 1, restricted to the manifold of Gaussians N (R n ), is given by µ̄ = N (m̄, K̄), where m̄ and K̄ satisfy (87).

Proof Proposition 2 allows us to split the geometry into the L 2 -geometry between the means and the entropic 2-Wasserstein geometry between the centered Gaussians (or their covariances). It then immediately follows that m̄ = Σ N i=1 λ i m i . Therefore, we restrict our analysis to the case of centered distributions. Remark again that, in general, the minimizer of (86) might not be Gaussian, even when the population consists of Gaussians. However, here we look for the barycenter on the manifold of Gaussian measures.
We begin with a straightforward computation of the gradient of the objective given in (86), where we use the closed-form solution obtained in part b. of Theorem 2. Now, recall that ∇ K Tr K = I. For the second term, it holds that (90). Finally, for the third term, we have (91), where Log(M) denotes the matrix logarithm, and we use the results for differentiating matrix functions f given by a Taylor series, such as the matrix square root or the matrix logarithm.
Substituting (90) and (91) into (89), and using the Woodbury matrix inverse identity (65), we get (93), where the last equality follows from Lemma 1 with the substitutions C ← K̄ and D ← K i . Finally, setting (93) to zero, we get that the optimal K̄ satisfies the expression given in (87).
Sinkhorn barycenter. Now, we compute the barycenter of a population of Gaussians under the Sinkhorn divergence, defined as the minimizer of the weighted sum of Sinkhorn divergences (94). Note that S 2 (µ, ν) is convex in both µ and ν [29, Thm. 1], and so (94) is convex in µ. Now, similarly to the entropic barycenter case, we look for the barycenter of a population of Gaussians in the space of Gaussians N (R n ).
Theorem 4 (Sinkhorn Barycenter of Gaussians) Let µ i = N (m i , K i ), i = 1, 2, ..., N, be a population of multivariate Gaussians. Then, their Sinkhorn barycenter (94) with weights λ i ≥ 0 such that Σ N i=1 λ i = 1, restricted to the manifold of Gaussians N (R n ), is given by µ̄ = N (m̄, K̄), where m̄ and K̄ satisfy (95).

Proof As in the entropic 2-Wasserstein case, we take µ = N (0, K) to be of Gaussian form. Then, we can compute the gradient (96), where the last term vanishes. For the first term, we use the gradient computed in (93). A very similar computation yields (97). Substituting (93) and (97) into (96) yields (98). When (98) is set to zero, we find that the optimal K satisfies the relation given in (95).
Fixed-point iteration. The fixed-point iteration algorithm is defined by the update

x k+1 = F (x k ),   (99)

where the initial point x 0 is hand-picked by the user. The Banach fixed-point theorem is a well-known result stating that such an iteration converges to a fixed point, i.e. an element x satisfying x = F (x), if F is a contraction mapping.
In the case of the 2-Wasserstein barycenter given in (11), the fixed-point iteration can be shown to converge to the unique barycenter [2]. For the entropic 2-Wasserstein and the 2-Sinkhorn cases, we leave such a proof as future work. However, while computing the numerical results in Section 5, the fixed-point iteration always converged.
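To make the scheme concrete, the sketch below applies iteration (99) to the unregularized 2-Wasserstein barycenter map of (11), the case whose convergence is cited above; the entropic and Sinkhorn variants (87) and (95) can be iterated analogously. The code is our own illustration, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_barycenter_cov(Ks, lambdas, iters=50):
    """Fixed-point iteration for the 2-Wasserstein barycenter covariance
    of centered Gaussians N(0, K_i) (cf. (11)): iterate
    K <- K^{-1/2} ( sum_i lambda_i (K^{1/2} K_i K^{1/2})^{1/2} )^2 K^{-1/2}."""
    K = np.mean(Ks, axis=0)  # hand-picked initial guess x_0
    for _ in range(iters):
        s = np.real(sqrtm(K))
        s_inv = np.linalg.inv(s)
        M = sum(l * np.real(sqrtm(s @ Ki @ s)) for l, Ki in zip(lambdas, Ks))
        K = s_inv @ (M @ M) @ s_inv
    return K

# For commuting covariances the barycenter satisfies
# K^{1/2} = sum_i lambda_i K_i^{1/2}.
K_bar = w2_barycenter_cov([np.eye(2), 4.0 * np.eye(2)], [0.5, 0.5])
```

At a fixed point of this map, K satisfies the equation given after (11), so the iteration targets exactly the Gaussian barycenter covariance.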

Numerical Illustrations
We will now illustrate the resulting entropic 2-Wasserstein distance and 2-Sinkhorn divergence for Gaussians by employing the closed-form solutions to visualize entropic interpolations between endpoint Gaussians. Furthermore, we employ the fixed-point iteration (99), in conjunction with the fixed-point expressions of the barycenters, for their visualization.
First, we consider the interpolant between one-dimensional Gaussians given in Fig. 1, where the densities of the interpolants are plotted. As one can see, increasing ε causes the middle of the interpolation to flatten out. This results from the Fokker-Planck equation (31), which governs the diffusive evolution of processes subjected to Brownian noise. In the limit ε → ∞, we would witness a heat death of the distribution.
The same can be seen in the three-dimensional case, depicted in Fig. 2, visualized using the code accompanying [28]. Here, the ellipsoids are determined by the eigenvectors and eigenvalues of the covariance matrix of the corresponding Gaussian, and the colors visualize the level sets of the ellipsoids. Note that a large ellipsoid corresponds to high variance in each direction, and does not actually increase the mass of the distribution. Such visualizations are common in diffusion tensor imaging (DTI), where the tensors (covariance matrices) define Gaussian diffusion of water at voxels of images produced by magnetic resonance imaging (MRI) [7].
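For reference, the ellipsoid glyphs described above are obtained from the eigendecomposition of each covariance matrix; the following is a minimal sketch of that computation (our own code, independent of the visualization package of [28]).

```python
import numpy as np

def covariance_ellipsoid(K, scale=1.0):
    """Principal axes of the ellipsoid glyph of a covariance matrix K:
    the directions are the eigenvectors, and the semi-axis lengths are
    the square roots of the eigenvalues (standard deviations along the
    principal axes)."""
    evals, evecs = np.linalg.eigh(K)
    radii = scale * np.sqrt(np.maximum(evals, 0.0))
    return radii, evecs
```

A large ellipsoid thus reflects large variance along every principal direction, as noted above.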
Finally, we consider the entropic 2-Wasserstein and Sinkhorn barycenters in Fig. 3. We consider four different Gaussians, placed at the corners of the square fields in the figure, and plot the barycenters for varying weights, resulting in the barycentric span of the four Gaussians. As the results show, the barycenters are very similar under the two frameworks for small ε. However, as ε is increased, the Sinkhorn barycenter seems to be more resilient against the fattening of the barycenters that can be seen in the 2-Wasserstein case.
The covariance matrix Γ should be a symmetric positive-definite matrix, which is equivalent to its Schur complement S(C) being positive definite. If S(C) fails to be strictly positive definite, F (C) explodes to infinity, and so it suffices to consider C such that S(C) ≻ 0.
This leaves us with the task of minimizing (106) with respect to S. Note that we could maximize (105) independently with respect to C, as det(Γ) is constant over the fiber {C : S(C) = S}.
As F is strictly convex with respect to S, a solution to (101) can be found where the gradient of the expression with respect to S is zero, leading to (107). Moving the second term to the right-hand side and multiplying (107) by (K 1 − S) yields a continuous algebraic Riccati equation (CARE) (108). In general, CAREs do not admit an analytical solution. However, we are in luck, as one can check that (108) is solved by an explicit Ŝ. Finally, it is straightforward to check that the solution Ŝ is indeed symmetric and positive-definite, and therefore satisfies (103). Plugging Ŝ into (106), noticing that the resulting product has the same eigenvalues as K 1 K 2 , and performing some simplifications concludes the proof. Now, we compute the OT quantity given Ŝ. We first compute the trace term in (106), which gives (111).

2 DC has the same eigenvalues, and thus the same trace, as CD for any square matrices C and D.

Fig. 3 Barycentric spans of the four corner tensors under the entropic 2-Wasserstein metric and the 2-Sinkhorn divergence for varying ε.

from the right, multiplying each side by its corresponding transpose, and performing some elementary manipulations of the equation, we arrive at a continuous algebraic Riccati equation (CARE)

ε 2 K 1 − ε 2 S − 4SK 2 S = 0.