Coordinate-wise Transformation of Probability Distributions to Achieve a Stein-type Identity

It is shown that for any given multi-dimensional probability distribution, there exists a unique coordinate-wise transformation such that the transformed distribution satisfies a Stein-type identity. The proof is based on an energy minimization problem over a subset of the Wasserstein space. The result is interpreted as a generalization of the diagonal scaling theorem established by Marshall and Olkin (1968).


Introduction
In their seminal paper [20], Marshall and Olkin proved the following diagonal scaling theorem. Let S be a d × d positive semi-definite matrix and assume that S is strictly copositive in the sense that

inf_{w_1,…,w_d>0} (∑_{i,j} S_{ij} w_i w_j) / (∑_i w_i^2) > 0.   (1)

Then, there exists a unique positive diagonal matrix D such that each row of DSD sums to unity. Note that (1) is satisfied if S is positive definite. The theorem has a probabilistic interpretation. Let X be a random column vector with mean zero and covariance matrix S. Then, since ∑_{j=1}^d (DSD)_{ij} = 1 for each i, the distribution µ of the transformed random vector DX satisfies the identity

∑_{j=1}^d ∫ x_i x_j dµ = 1,  i = 1, …, d.   (2)
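As an aside not contained in [20], the scaling matrix D can be computed numerically. The following Python sketch (our illustration; the function name and the damped fixed-point scheme are our own choices, not from the paper) solves d_i (Sd)_i = 1 for a small positive definite S.

```python
import math

def marshall_olkin_scaling(S, iters=500):
    """Find positive d_i with d_i * sum_j S_ij d_j = 1, so that every row of
    D S D sums to one, by a geometrically damped fixed-point iteration."""
    d = [1.0] * len(S)
    for _ in range(iters):
        Sd = [sum(S[i][j] * d[j] for j in range(len(S))) for i in range(len(S))]
        # damped update: d_i <- sqrt(d_i / (S d)_i); a fixed point satisfies
        # d_i (S d)_i = 1 exactly
        d = [math.sqrt(d[i] / Sd[i]) for i in range(len(S))]
    return d

S = [[2.0, 0.5], [0.5, 1.5]]  # positive definite, hence strictly copositive
d = marshall_olkin_scaling(S)
row_sums = [d[i] * sum(S[i][j] * d[j] for j in range(2)) for i in range(2)]
```

For this S the iteration converges rapidly and each row of DSD sums to one up to floating-point error.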
This property is applied to summarize multivariate data. Refer to [28] for details.
In the present paper, we provide a nonlinear analogue of this result. We admit a nonlinear coordinate-wise transformation of a random vector in order to achieve a condition stronger than (2). This condition will be referred to as the Stein-type identity. Under some mild conditions on µ, it is shown that such a transformation exists and is unique. The proof is based on a variational formulation; the Marshall-Olkin theorem is, in fact, derived in a similar manner [20,15]. The space we use in the proof is the Wasserstein space, a metric space induced by optimal transportation. Refer to [25,32] for comprehensive studies of optimal transportation and its applications. Another generalization of the Marshall-Olkin theorem is considered in [3], where the dimension d is infinite but the transformation is linear.
As is well known, Sklar's theorem (see, e.g., [22]) states that any multi-dimensional distribution is transformed by the probability integral transformation into a distribution with uniform marginals. The resultant distribution is called a copula. Our result is considered as an alternative to Sklar's theorem.
The remainder of the present paper is organized as follows. In Section 2, we describe the existence and uniqueness theorem as well as a variational characterization theorem. In Section 3, we clarify the regularity properties of Stein-type distributions. In Section 4, we prove the main results using the theory of optimal transportation. In Section 5, tractable conditions for existence are considered. In Section 6, a numerical method to find the transformation for piecewise uniform distributions is proposed. Finally, we discuss open problems in Section 7.

Main results
We first define a class of distributions that satisfy a stronger condition than (2). Let P 2 = P 2 (R d ) be the set of probability distributions µ on R d with mean zero and finite variance such that each marginal distribution µ i of µ is absolutely continuous with respect to the Lebesgue measure on R. Note that µ itself is not assumed to be absolutely continuous.
The mean-zero condition is imposed only for simplicity. We say that a function f : R → R is absolutely continuous if there exists a locally integrable function f′ such that f(x) = f(0) + ∫_0^x f′(y) dy in Lebesgue's sense.

Definition 1.
We say that a distribution µ ∈ P_2 is Stein-type if it satisfies

∫ f(x_i) ∑_{j=1}^d x_j dµ(x) = ∫ f′(x_i) dµ(x),  i = 1, …, d,   (3)

for any absolutely continuous function f : R → R with essentially bounded derivative f′.
Note that the equation (2) is a special case of (3), where f (x i ) = x i .
We refer to equation (3) as the Stein-type identity. Indeed, if d = 1, it reduces to the Stein identity ∫ f(x_1) x_1 dµ = ∫ f′(x_1) dµ, which implies that µ is the standard normal distribution (see [29] and [5]). Similarly, if µ is completely independent in the sense that µ is the direct product of its marginals µ_i, then only the d-dimensional standard normal distribution satisfies (3). We therefore focus on dependent cases.
For Gaussian random variables, we obtain the following lemma, where the expectation is denoted by E.
Lemma 1 (Theorem 5 of [28]). Let µ denote the d-dimensional normal distribution with mean zero and covariance matrix S. Then, µ is Stein-type if and only if ∑_j S_ij = 1 for each i.
Proof. Let (X_1, …, X_d) be distributed according to µ. Then, E[X_j | X_i] = S_ij X_i / S_ii and

E[ f(X_i) ∑_j X_j ] = (∑_j S_ij / S_ii) E[ f(X_i) X_i ] = (∑_j S_ij) E[ f′(X_i) ].

The last equality follows from the Stein identity for the univariate normal distributions.
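As a numerical sanity check of Lemma 1 (ours, not part of [28]), the following sketch estimates both sides of the Stein-type identity by Monte Carlo for a bivariate normal whose covariance rows sum to one, with the test function f = tanh.

```python
import math, random

random.seed(0)
# Covariance with unit row sums: S = [[0.6, 0.4], [0.4, 0.6]].
# Cholesky factor L with S = L L^T:
a = math.sqrt(0.6)
b = 0.4 / a
c = math.sqrt(0.6 - b * b)

n = 200_000
lhs = rhs = 0.0
f = math.tanh                              # absolutely continuous test function
fprime = lambda x: 1.0 - math.tanh(x) ** 2
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x1, x2 = a * z1, b * z1 + c * z2       # (X_1, X_2) ~ N(0, S)
    lhs += f(x1) * (x1 + x2)               # estimates E[f(X_1)(X_1 + X_2)]
    rhs += fprime(x1)                      # estimates E[f'(X_1)]
lhs /= n
rhs /= n
```

Both averages agree up to Monte Carlo error, as the identity predicts since the rows of S sum to one.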
The following example gives a rich class of Stein-type distributions.  If a random vector (X 1 , . . . , X d ) has a Stein-type distribution, then the sum ∑ j X j is positively correlated with any monotone transformation of X i due to (3). Refer to Section 8 of [28] for an application of this property.
For each µ ∈ P_2, let T_cw(µ) be the set of coordinate-wise transformations T(x) = (T_1(x_1), …, T_d(x_d)) such that each T_i : R → R ∪ {−∞, ∞} is non-decreasing and Tµ belongs to P_2. Here, Tµ is the push-forward measure defined by (Tµ)(A) = µ(T^{−1}(A)) for any measurable set A. The set T_cw(µ) depends only on the marginal distributions of µ. Two maps T and U in T_cw(µ) are identified if µ(T = U) = 1. Note that T_i has points of discontinuity if the support of (Tµ)_i is not connected.
We consider the problem of finding a map T ∈ T_cw(µ) such that Tµ is Stein-type. Let us call such a map a Stein-type transformation of µ. For example, if µ is the direct product of its marginals, then the map transforming each marginal into the standard normal is a Stein-type transformation. The following lemma is immediate.

Lemma 2.
Let µ be the normal distribution with a covariance matrix S. Then, µ has a Stein-type transformation if S is strictly copositive in the sense of (1).
Proof. By the Marshall-Olkin theorem, there exists a unique positive diagonal matrix D such that each row of DSD sums to unity. The linear map T(x) = Dx is a coordinate-wise transformation, and Tµ is Stein-type due to Lemma 1.
Denote the set of coordinate-wise transformed distributions of µ by F_µ = {Tµ | T ∈ T_cw(µ)}. We refer to F_µ as a fiber. The following lemma is a direct consequence of one-dimensional optimal transportation. See Appendix A.

Lemma 3. For given µ ∈ P_2 and ν ∈ F_µ, the map T ∈ T_cw(µ) satisfying ν = Tµ is uniquely determined µ-almost everywhere. Furthermore, the relation ν ∈ F_µ between two measures µ and ν is an equivalence relation. In particular, P_2 is partitioned into mutually disjoint fibers.

Now, we state our three main theorems. All proofs are presented in Section 4.
The first theorem characterizes Stein-type distributions in terms of a variational principle. Define an energy functional E(µ) of µ by

E(µ) = ∫ (∑_{j=1}^d x_j)^2 / 2 dµ(x) + ∑_{i=1}^d ∫ p_i(x_i) log p_i(x_i) dx_i,   (4)

where p_i denotes the density of the marginal µ_i, and E(µ) is defined to be +∞ if the integrals do not exist. Let dom E = {µ ∈ P_2 | E(µ) < ∞}.

Theorem 1. A distribution µ ∈ dom E is Stein-type if and only if it minimizes E over its fiber F_µ.

We say that µ has a regular support if the support of µ is the direct product of the supports of its marginal distributions.

Theorem 2. Suppose that µ ∈ P_2 has a regular support. Then, a Stein-type transformation of µ is unique whenever it exists.

For the existence result, we say that µ ∈ P_2 is copositive if

β(µ) = inf ∫ (∑_{i=1}^d T_i(x_i))^2 dµ / ∑_{i=1}^d ∫ T_i(x_i)^2 dµ_i > 0,   (5)

where the infimum is taken over non-decreasing functions T_i with ∫ T_i dµ_i = 0, not all zero. This condition is a nonlinear analogue of the strict copositivity (1).

Theorem 3. If µ ∈ P_2 is copositive, then µ has a Stein-type transformation.

We conjecture that the uniqueness follows without the regular support condition. See Section 7 for more details.
We now present a few remarks before proceeding to the following section.
The uniqueness and existence results in Theorem 2 and Theorem 3 are consequences of the variational characterization in Theorem 1, as will be shown in Section 4. For d = 1, the functional E(µ) is the Kullback-Leibler divergence from µ to the standard normal density up to a constant term. For d ≥ 2, however, E is not even bounded from below. Indeed, for each t > 0, let µ_t be the multivariate normal distribution with mean zero and covariance matrix Σ_t = P + t(I − P), where I is the identity matrix and P denotes the orthogonal projection onto the direction (1, …, 1) ∈ R^d. Then, each marginal distribution of µ_t is normal with variance 1/d + t(1 − 1/d), so that E(µ_t) = d/2 − (d/2) log(2πe(1/d + t(1 − 1/d))), which tends to −∞ as t → ∞. Therefore, it is not trivial whether there is a minimizer of E over the fiber. Nevertheless, the existence and uniqueness theorems are obtained.
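The divergence E(µ_t) → −∞ can be checked by direct computation. The following sketch (ours, not from the paper) evaluates, for d = 3, the Gaussian form of E: the potential term (1/2) 1⊤Σ_t 1 plus the negative entropies of the marginals N(0, (Σ_t)_ii).

```python
import math

def energy_gaussian(Sigma):
    """E(mu) for a mean-zero Gaussian: potential term (1/2) 1^T Sigma 1
    plus the negative entropies of the marginals N(0, Sigma_ii)."""
    d = len(Sigma)
    potential = 0.5 * sum(Sigma[i][j] for i in range(d) for j in range(d))
    neg_entropy = -sum(0.5 * math.log(2 * math.pi * math.e * Sigma[i][i])
                       for i in range(d))
    return potential + neg_entropy

def sigma_t(t, d=3):
    """Sigma_t = P + t (I - P), with P the projection onto (1, ..., 1)."""
    P = [[1.0 / d] * d for _ in range(d)]
    return [[P[i][j] + t * ((1.0 if i == j else 0.0) - P[i][j])
             for j in range(d)] for i in range(d)]

values = [energy_gaussian(sigma_t(t)) for t in (1.0, 10.0, 100.0, 1000.0)]
```

The potential term 1⊤Σ_t1 = d stays constant along this family, while the entropy term decreases without bound, so the computed values decrease monotonically.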
If µ has the joint density function p(x), then the negative joint entropy is defined by ∫ p(x) log p(x) dx. In most cases, we can replace the marginal entropy term in E(µ) with the joint entropy because the difference ∫ p(x) log( p(x) / ∏_{i=1}^d p_i(x_i) ) dx, which is referred to as the multi-information function or the measure of multivariate dependence, is invariant in each fiber (e.g., [11] and [30]). However, in some pathological cases, the difference diverges. Therefore, it is more appropriate to adopt the marginal entropy.
According to Sklar's theorem (e.g., [22]), any d-dimensional distribution µ is transformed by the probability integral transformation T_i(x_i) = ∫_{−∞}^{x_i} dµ_i into a distribution Tµ with uniform marginals unless some µ_i has an atom. The resultant distribution Tµ is called a copula. The Stein-type distribution we defined can be considered an alternative representation of the copula. Copulas are also characterized by an energy minimization problem, in which the potential term in (4) is replaced with a different one [4]; there, in contrast to the present paper, the marginals are fixed to be uniform.

Regularity of Stein-type distributions
The Stein-type identity forces regularity of the marginal density functions. We first characterize this by an integral equation.

Theorem 4. A distribution µ ∈ P_2 is Stein-type if and only if each marginal density p_i satisfies

p_i(x_i) = ∫_{x_i}^{∞} m_i(y) p_i(y) dy   (6)

for almost every x_i, where m_i(x_i) denotes the conditional expectation of ∑_{j=1}^d x_j given x_i with respect to µ.

Proof. Assume that µ is Stein-type. For a < b, let h_ab be the absolutely continuous function whose derivative is the indicator function of (a, b). The Stein-type identity for h_ab implies

∫_a^b p_i(y) dy = ∫ h_ab(x_i) ∑_{j=1}^d x_j dµ(x).   (7)

Dividing both sides by b − a and letting b → a in (7), we obtain (6).
Conversely, assume (6). The right-hand side of (6) converges to zero as a → ±∞ because ∫ x_j dµ_j = 0 for all j. Then, for any bounded and absolutely continuous function f with essentially bounded derivative f′,

∫ f′(x_i) dµ = ∫ f′(y) p_i(y) dy = ∫ f(x_i) ∑_{j=1}^d x_j dµ(x),

where the second equality follows from the integration-by-parts formula together with (6). If f is not bounded, a truncation argument yields the same conclusion. As a corollary, the regularity of the marginal density functions is established.

Corollary 1.
Let µ be Stein-type. Then, its marginal density functions p i (x i ) are bounded, absolutely continuous, and converge to zero as x i → ±∞.
Proof. From the formula (6), it is obvious that p_i is absolutely continuous, and it is bounded because |p_i(x_i)| ≤ ∫ |∑_j x_j| dµ < ∞. We also have p_i(x_i) → 0 as x_i → ±∞ because the right-hand side of (6) vanishes in these limits.
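For d = 1, the integral equation reads p(x) = ∫_x^∞ y p(y) dy, and the standard normal density satisfies it exactly since ∫_x^∞ y φ(y) dy = φ(x). The following sketch (ours) verifies this numerically by trapezoidal quadrature.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def tail_integral(x, upper=10.0, steps=20000):
    """Trapezoidal approximation of  int_x^upper  y * phi(y) dy;
    the truncation error at upper=10 is negligible."""
    h = (upper - x) / steps
    total = 0.5 * (x * phi(x) + upper * phi(upper))
    for k in range(1, steps):
        y = x + k * h
        total += y * phi(y)
    return total * h

errs = [abs(tail_integral(x) - phi(x)) for x in (-1.0, 0.0, 0.5, 2.0)]
```

The quadrature reproduces φ(x) at every test point up to discretization error.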
Although the marginal density function of any Stein-type distribution is absolutely continuous, it can have non-differentiable points, as shown in an example in Section 6. Continuous differentiability of p_i(x_i) would follow from regularity of the pair-wise copulas of µ via formula (6); we do not pursue this line of investigation here. On the other hand, we conjecture that the marginal density of any Stein-type distribution is positive everywhere. See Section 7 for more details.
The following corollary will be used in Section 4.

Corollary 2.
Let µ be Stein-type. Then, its negative marginal entropy ∫ p_i(x_i) log p_i(x_i) dx_i is finite for each i. In particular, µ belongs to dom E.

As a remark, we also show that Stein-type distributions have finite Fisher information.
The Fisher information of a density function q on R is defined by I(q) = ∫ (q′(x)/q(x))^2 q(x) dx, where q is assumed to be absolutely continuous, and q′(x)/q(x) is set to 0 if q is not differentiable or not positive at x. See [13] for properties implied by finite Fisher information.
Note that the Fisher information we defined is that of location family {q(x − θ) | θ ∈ R} in statistics (e.g., [18]).

Corollary 3.
For any Stein-type distribution µ, the Fisher information I(p i ) of each marginal density p i is bounded by the dimension d. In particular, p i has bounded variation.
Proof. From (6), the score function is p_i′(x_i)/p_i(x_i) = −m_i(x_i). Hence

I(p_i) = ∫ m_i(x_i)^2 dµ ≤ ∫ (∑_j x_j)^2 dµ = d,

where the inequality is Jensen's inequality for the conditional expectation and the last equality follows from the Stein-type identity with f(x_i) = x_i. By the Cauchy-Schwarz inequality, we also have ∫ |p_i′(x_i)| dx_i = ∫ |m_i(x_i)| p_i(x_i) dx_i ≤ I(p_i)^{1/2} < ∞, and therefore p_i has bounded variation. Other properties are given in Appendix C.

Proofs based on the theory of optimal transportation
In this section, we prove the three main theorems stated in Section 2. The proof is based on the theory of optimal transportation. Necessary facts about one-dimensional optimal transportation are summarized in Appendix A.

Variational problem over a fiber of Wasserstein space
Let F be a fiber of P_2 (see Section 2 for the definition) and choose two measures µ and ν = Tµ in F, where T ∈ T_cw(µ). Define the geodesic, which is also referred to as the displacement interpolation [21], from µ to ν by µ_t = ((1 − t) Id + t T)µ, t ∈ [0, 1], where Id denotes the identity map. Based on one-dimensional optimal transportation, µ_t belongs to F for every t ∈ [0, 1]. Although a geodesic between any pair of distributions in P_2 is defined similarly, we need only geodesics within a common fiber. It is known that a geodesic actually attains the minimum length of a path between two measures with respect to the L^2-Wasserstein distance (see, e.g., [2] and [32]). Here, the L^2-Wasserstein distance is the infimum of (∫ ‖x − y‖^2 dγ(x, y))^{1/2} over joint distributions γ on R^{2d} with marginal distributions µ and ν. Note that each fiber F is totally geodesic in the sense of [31].
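In one dimension the optimal map is the composition of the quantile function of ν with the distribution function of µ, and the displacement interpolation has quantile function (1 − t)F⁻ + tG⁻. The following sketch (ours; it uses the standard-library `statistics.NormalDist` class) checks the geodesic property W_2(µ_0, µ_t) = t · W_2(µ_0, µ_1) on a quantile grid.

```python
import math
from statistics import NormalDist

# One-dimensional optimal transport between mu = N(0,1) and nu = N(1,4).
F_inv = NormalDist(0, 1).inv_cdf          # quantile function of mu
G_inv = NormalDist(1, 2).inv_cdf          # quantile function of nu

us = [(k + 0.5) / 1000 for k in range(1000)]   # quantile grid

def w2(q1, q2):
    """L2-Wasserstein distance computed from quantile functions (1-d formula)."""
    return math.sqrt(sum((q1(u) - q2(u)) ** 2 for u in us) / len(us))

def interp(t):
    # quantile function of the displacement interpolation mu_t
    return lambda u: (1 - t) * F_inv(u) + t * G_inv(u)

full = w2(F_inv, G_inv)
ratios = [w2(F_inv, interp(t)) / (t * full) for t in (0.25, 0.5, 0.75)]
```

On the common grid the identity holds exactly, so every ratio equals 1 up to rounding.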
From a different perspective from ours, optimal transportation between two distributions sharing the same copula is considered in [1], where the various cost functions are the center of discussion.
Recall that µ is said to have a regular support if its support is the direct product of the supports of marginal distributions.

Lemma 4.
Let F be a fiber and choose any two distributions µ and ν in F with µ ≠ ν. Then, the function t ↦ E(µ_t) along the geodesic µ_t from µ to ν is convex on [0, 1]. It is strictly convex if one of the following conditions is satisfied: (i) µ (and therefore ν) has a regular support, or (ii) the supports of µ_i and ν_i are connected, respectively, for each i.

Proof. Let ν = Tµ with T ∈ T_cw(µ), and let p_i be the marginal density of µ_i. By the change-of-variable formula (Lemma 11 in Appendix A), we obtain

E(µ_t) = ∫ ( ∑_i ((1 − t)x_i + t T_i(x_i)) )^2 / 2 dµ(x) + ∑_i ∫ p_i log p_i dx_i − ∑_i ∫ p_i(x_i) log((1 − t) + t T_i′(x_i)) dx_i,   (8)

where each term is convex in t. Suppose that E(µ_t) is not strictly convex. Then, all terms are affine in t; in particular, T_i′ = 1 µ_i-almost everywhere for each i, and ∑_i (T_i(x_i) − x_i) = 0 µ-almost everywhere. Then, under the regular support condition (i), we have µ(T = Id) = 1, contradicting µ ≠ ν; the case (ii) is treated similarly. Convexity along a geodesic is referred to as displacement convexity [21]. Lemma 4 shows that E is displacement convex over each fiber. Refer to [2] for further details on displacement convexity.
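Displacement convexity can be observed concretely for one-dimensional Gaussians, where the geodesic between N(0, 1) and N(0, σ²) interpolates the standard deviation linearly. The following sketch (ours) evaluates the energy along the geodesic and checks discrete midpoint convexity.

```python
import math

def E_1d(s):
    """Energy of N(0, s^2) in one dimension: potential term s^2/2
    plus the negative entropy of the Gaussian."""
    return s * s / 2 - 0.5 * math.log(2 * math.pi * math.e * s * s)

sigma = 3.0

def E_geodesic(t):
    # along the displacement interpolation, the std dev interpolates linearly
    return E_1d((1 - t) * 1.0 + t * sigma)

ts = [k / 10 for k in range(11)]
vals = [E_geodesic(t) for t in ts]
midpoint_gaps = [vals[i - 1] + vals[i + 1] - 2 * vals[i] for i in range(1, 10)]
```

All second differences are strictly positive, reflecting the strict displacement convexity of E in this case.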

Proof of Theorem 1
Let µ be a Stein-type distribution. Corollary 2 implies that µ belongs to dom E. From the convexity (Lemma 4), it is sufficient to show that (d/dt_+) E(µ_t)|_{t=0} ≥ 0 for any ν = Tµ ∈ F, where d/dt_+ denotes the right derivative. It follows from formula (8) that

(d/dt_+) E(µ_t)|_{t=0} = ∑_i ( ∫ (T_i(x_i) − x_i) ∑_j x_j dµ − ∫ (T_i′(x_i) − 1) dµ ).   (9)

If each T_i is absolutely continuous, the right-hand side vanishes by the Stein-type identity, where the boundedness of the derivatives T_i′ can be assumed by a standard approximation argument, as in the proof of Theorem 4. If T_i is not absolutely continuous, T_i can be decomposed into an absolutely continuous part and a discontinuous part. Then, by Lebesgue's dominated convergence theorem and the Stein-type identity, we obtain (d/dt_+) E(µ_t)|_{t=0} ≥ 0.
Conversely, assume that E(Tµ) is minimized at T = Id. Let f be an absolutely continuous function with bounded derivative f′. Then, for sufficiently small ε > 0, both Id + εf and Id − εf, applied to the i-th coordinate, belong to T_cw(µ). Since the right derivative is nonnegative in both directions, the derivative (9) is zero, and µ satisfies the Stein-type identity.

Proof of Theorem 2
Assume that µ has a regular support and admits a Stein-type transformation T . Then, Theorem 1 implies that T µ minimizes E over the fiber F µ . However, it is deduced from Lemma 4 that E is strictly convex over F µ . Thus, the minimizer is unique.

Proof of Theorem 3
Assume that µ is copositive. Denote the functional E restricted to the fiber F µ by E µ .
From Theorem 1, it is sufficient to show that E_µ has a minimum point. We first show that E_µ is bounded from below and that the level set {ν | E_µ(ν) ≤ c} is tight for each c ∈ R. For any ν ∈ F_µ, the copositivity condition implies

∫ (∑_i x_i)^2 dν ≥ β(µ) ∑_i ∫ x_i^2 dν_i,

and, for each marginal density q_i of ν,

∫ q_i log q_i dx_i ≥ −(1/2) log(2πe ∫ x_i^2 dν_i),

where the last inequality follows from the nonnegativity of the Kullback-Leibler divergence from ν_i to the normal distribution with the same variance.
Then, E_µ is bounded from below as

E_µ(ν) ≥ ∑_i ( (β(µ)/2) ∫ x_i^2 dν_i − (1/2) log(2πe ∫ x_i^2 dν_i) ) ≥ C,

where C is a constant independent of ν. This inequality also implies that the level set {ν | E_µ(ν) ≤ c} has uniformly bounded second moments and is therefore tight. Now, there exists a weakly converging sequence ν_k such that E_µ(ν_k) converges to inf_ν E_µ(ν), and lower semi-continuity of E_µ along such sequences shows that the weak limit attains the infimum.

Sufficient conditions for copositivity
We now present sufficient conditions for copositivity of a given distribution µ. In Subsection 5.1, we first extend the definition to measures with a non-zero mean as well as to coordinate-wise transformations that are constant over an interval, and we present a lower bound of the quantity β(µ) in (5). Subsequent subsections are devoted to finding sufficient conditions for copositivity.

Extension of the definition and a lower bound
Let P_2^* be the set of measures on R^d such that each marginal µ_i is absolutely continuous and ∫ ‖x‖^2 dµ < ∞. The set T_cw^*(µ) for µ ∈ P_2^* is defined as the set of coordinate-wise non-decreasing maps T such that Tµ belongs to P_2^*. The following lemma is useful for investigating copositivity. Denote the inner product and norm of L^2(µ) by ⟨f, g⟩ = ∫ f(x) g(x) dµ and ‖f‖ = ⟨f, f⟩^{1/2}, respectively.
It is shown that β(µ) and β L (µ) are invariant under coordinate-wise transformations.
Thus, β(µ) and β L (µ) depend only on the copula of µ. Furthermore, they depend only on the set of two-dimensional marginal copulas of µ.

Gaussian case
We obtain an explicit expression of β L (µ) if µ is a multivariate normal distribution.

Lemma 7.
Let µ be the multivariate normal distribution with mean vector 0 and covariance matrix S. Then, β L (µ) is the minimum eigenvalue of the correlation matrix of S. In particular, µ is copositive if S is non-singular.
Proof. The case of d = 2 has been proven by [17].
Assume that each marginal density of µ is the standard normal φ(x) = (2π)^{−1/2} e^{−x^2/2} without loss of generality. Then, the covariance matrix coincides with the correlation matrix R = (ρ_ij). We prove that β_L(µ) = λ_min(R), where the minimum eigenvalue of a positive definite matrix A is denoted by λ_min(A). Note that λ_min(R) ≤ 1 because tr(R) = d.
Denote the Hermite polynomial of order k by η_k, normalized so that E[η_k(X) η_l(X)] = δ_kl for a standard normal X, and expand each T_i ∈ L_0^2(µ_i) as T_i = ∑_{k≥1} a_ik η_k. Since E[η_k(X_i) η_k(X_j)] = ρ_ij^k, we have ∫ (∑_i T_i)^2 dµ = ∑_{k≥1} a_k^⊤ R^(k) a_k, where a_k = (a_1k, …, a_dk) and R^(k) = (ρ_ij^k). For any k ≥ 1, we can show that λ_min(R^(k)) ≥ λ_min(R) by induction with the Schur product theorem, using the inequality tr(AB) ≥ λ_min(A) tr(B) for positive definite matrices A and B. Thus, we have ∫ (∑_i T_i)^2 dµ ≥ λ_min(R) ∑_i ‖T_i‖^2. Conversely, let (v_1, …, v_d) be the eigenvector corresponding to λ_min(R) and T_i(x_i) = v_i x_i; then equality is attained, and therefore β_L(µ) = λ_min(R).
We conjecture that β(µ) coincides with the quantity in (1) if µ is Gaussian and S is its covariance matrix. See Section 7.
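For d = 2 with correlation ρ > 0, λ_min(R) = 1 − ρ is attained by the linear choice T_1(x) = x, T_2(x) = −x (the eigen-direction of λ_min; note that T_2 is decreasing, which is permitted in β_L but not in β). The following Monte Carlo sketch (ours) checks the ratio ‖T_1 + T_2‖² / (‖T_1‖² + ‖T_2‖²).

```python
import math, random

random.seed(1)
rho = 0.6
lam_min = 1 - abs(rho)   # smallest eigenvalue of [[1, rho], [rho, 1]]

n = 100_000
num = den = 0.0
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x1 = z1
    x2 = rho * z1 + math.sqrt(1 - rho * rho) * z2   # Corr(x1, x2) = rho
    t1, t2 = x1, -x2                                # eigen-direction (1, -1)
    num += (t1 + t2) ** 2
    den += t1 ** 2 + t2 ** 2
ratio = num / den
```

The estimated ratio matches λ_min(R) = 1 − ρ = 0.4 up to Monte Carlo error.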

Rényi's condition of positive copula densities
The following theorem, which has been proven by [26] for d = 2, provides a checkable condition for copositivity.

Theorem 5 ([26] for d = 2).
Assume that µ has a regular support (see Section 2 for the definition) and that, for each pair i ≠ j, the two-dimensional marginal copula density function c_ij of µ is square integrable. Then, β_L(µ) > 0. In particular, µ is copositive.
Proof. We first prove that if T ∈ ∏_{i=1}^d L_0^2(µ_i) satisfies the equation ∑_{i=1}^d T_i = 0 µ-almost everywhere, then T = 0. Let I ⊂ {1, …, d} be the set of indices i such that µ(T_i ≠ 0) > 0. By contradiction, assume that I is not empty. Since ∫ T_i dµ_i = 0, the set A_i = {x | T_i(x_i) > 0} satisfies µ(A_i) > 0 for each i ∈ I.
However, based on the assumption about the support, we obtain µ(∩_{i∈I} A_i) > 0, which implies that µ(∑_i T_i > 0) > 0, a contradiction. Thus, I is empty, and T = 0. Now, we prove that β_L(µ) > 0 using elementary concepts of functional analysis (refer to [33]). Assume that each µ_i is uniform over [0, 1], i.e., µ is a copula distribution. Let H = ∏_{i=1}^d L_0^2(µ_i) be a Hilbert space of R^d-valued functions and define the inner product of H as ⟨T, U⟩_H = ∑_{i=1}^d ⟨T_i, U_i⟩. Let c_ij be the pairwise copula density and define an operator C on H by (CT)_i(x_i) = ∑_{j≠i} ∫ c_ij(x_i, x_j) T_j(x_j) dx_j. Based on the assumption that ∫∫ c_ij^2 dx_i dx_j < ∞, we deduce that C is a Hilbert-Schmidt operator. It is easy to see that C is self-adjoint. Now, we can write

∫ (∑_i T_i)^2 dµ = ⟨(I + C)T, T⟩_H.   (11)

If (I + C)T = 0, then (11) implies ∑_i T_i = 0 and, therefore, T = 0. Thus, I + C is injective. Since the operator I + C is an injective Fredholm operator, it is surjective. By the continuous inverse theorem, we deduce that the inverse operator (I + C)^{−1} is bounded. Therefore, we have β_L(µ) ≥ ‖(I + C)^{−1}‖^{−1} > 0.

Corollary 4.
If µ has a positive and bounded copula density function, then µ is copositive.
By Theorem 5, we obtain an alternative proof of Lemma 7 without evaluating β L (µ) (details omitted). In Section 6, we deal with positive and piecewise uniform copula density functions.
Note that the support of µ is not determined by the supports of its two-dimensional marginal distributions. Refer to [27] for related topics.

A condition without regular supports
Theorem 5 assumes regularity of the support. Here, we present a result without assuming the regular support condition.
Proof. We first prove the case d = 2. For each i ∈ {1, 2}, let T_i ∈ L_0^2(µ_i) with T_i ≠ 0, and let T_i^± be the positive and negative parts of T_i, so that T_i = T_i^+ − T_i^−. Assume that T_1 is non-decreasing. Then, we have a_1 < b_1 for any a_1 in the support I_1^− of T_1^− and any b_1 in the support I_1^+ of T_1^+. Combining this ordering with condition (12) bounds the inner products from below, and the result follows. Note that we did not use the monotonicity of T_2. Now, we prove the case d ≥ 3. Since the condition (12) is invariant under marginalization, the claim follows inductively from the two-dimensional case. For example, if µ is the uniform distribution over the region [−1, 1]^2 \ [−1, 0]^2, then µ does not have a regular support but is copositive, where the constant δ in (12) is 1/2.
Proof. The Cauchy-Schwarz inequality implies that C(δ, δ)/δ ≤ (∫∫_{[0,δ]^2} c^2 du dv)^{1/2}. If c is square-integrable, then the right-hand side converges to 0 as δ → 0, whereas the left-hand side converges to the tail-dependence coefficient, which is impossible. Thus, c is not square-integrable.
We conjecture that many copulas with tail dependence are copositive. On the other hand, there is a non-copositive measure with positive copula density, as follows.
Example 4 (Tail counter-comonotonic copula). It is known that there is a positive copula density function with the lower tail comonotonicity property, which is equivalent to λ = 1 in Lemma 8. Such a copula is referred to as a lower tail comonotonic copula (see Section 2.21 of [12]). Let (X_1, X_2) be distributed according to such a copula, and let µ be the induced measure of Y_1 = X_1 and Y_2 = 1 − X_2. Then, µ is not copositive. Indeed, define a map T ∈ T_cw^*(µ) by T_1(y_1) = −I_{[0,δ]}(y_1) + δ and T_2(y_2) = I_{(1−δ,1]}(y_2) − δ for small δ > 0, where I_A denotes the indicator function of a set A. Then, ‖T_1‖ = ‖T_2‖ = √(δ(1 − δ)), while ‖T_1 + T_2‖ becomes negligible relative to ‖T_1‖ as δ → 0 by the tail comonotonicity. In a similar manner to Lemma 6, we deduce that β(µ) = 0.

Piecewise uniform densities
In this section, it is shown that if µ has a piecewise uniform density function, then the Stein-type transformation of µ is obtained by finite-dimensional optimization. Here, as in the preceding section, we do not impose the zero-mean condition on the measures µ.
We say that a probability density function c(u) on [0, 1]^d is piecewise uniform if its two-dimensional marginal densities are written as

c_ij(u_i, u_j) = n^2 π_ab^ij,  (u_i, u_j) ∈ ((a−1)/n, a/n] × ((b−1)/n, b/n],   (13)

for some n, where π_ab^ij is a positive number such that ∑_{a=1}^n ∑_{b=1}^n π_ab^ij = 1.
Let π_a^i = ∑_{b=1}^n π_ab^ij. Note that c is not necessarily a copula density. However, it is transformed by a piecewise linear transformation into a copula density. Then, Corollary 4 guarantees the existence of a Stein-type transformation.
By solving Equation (6), we obtain an expression of the Stein-type transformation of c as follows. Denote the cumulative distribution function and density function of the standard normal distribution by Φ and φ, respectively.

Lemma 9.
Let c satisfy (13), and let p be the Stein-type density corresponding to c.
Then, there exist real constants α_1i, …, α_ni and ξ_1i < · · · < ξ_{n−1,i} such that

p_i(x_i) = π_a^i φ(x_i − α_ai) / Z_ai,  x_i ∈ (ξ_{a−1,i}, ξ_ai],   (14)

where ξ_0i = −∞, ξ_ni = ∞, and Z_ai = Φ(ξ_ai − α_ai) − Φ(ξ_{a−1,i} − α_ai). The Stein-type transformation is

T_i(u_i) = α_ai + Φ^{−1}( Φ(ξ_{a−1,i} − α_ai) + Z_ai (n u_i − a + 1) ),  u_i ∈ ((a−1)/n, a/n],   (15)

and the two-dimensional marginal density is

p_ij(x_i, x_j) = π_ab^ij φ(x_i − α_ai) φ(x_j − α_bj) / (Z_ai Z_bj),  (x_i, x_j) ∈ (ξ_{a−1,i}, ξ_ai] × (ξ_{b−1,j}, ξ_bj].   (16)

Furthermore, the following identity is satisfied:

α_ai + ∑_{j≠i} ∑_{b=1}^n (π_ab^ij / π_a^i) M_bj = 0,  where M_bj = ∫_{ξ_{b−1,j}}^{ξ_bj} x φ(x − α_bj) dx / Z_bj.   (17)

Proof. The solution of Equation (6) is piecewise Gaussian up to normalizing constants. Since the mass of each piece is preserved under a coordinate-wise transformation, we obtain the form (14). Then, the unique monotone transformation (15) is derived from c_i(u_i) du_i = p_i(x_i) dx_i. Equation (16) results from the transformation of c_ij(u_i, u_j). Finally, Equation (17) is obtained from the relation m_i(x_i) = x_i − α_ai on each piece.
The parameters α_ai and ξ_ai are determined by the continuity of (14) at x_i = ξ_ai and the identity (17). However, instead of solving the simultaneous equations directly, we adopt an optimization approach.
Assume that the density of a distribution µ obeys the parametric form given by Equation (14). Then, the energy function E(µ) defined in Section 4 is a function of α and ξ, which is denoted by F(α, ξ). Since Z_ai and M_ai are functions of the three parameters α_ai, ξ_ai, and ξ_{a−1,i}, we denote the corresponding partial derivatives by D_1, D_2, and D_3. The partial derivatives ∂F/∂α_ai and ∂F/∂ξ_ai are obtained in closed form, which we refer to as (18) and (19), respectively. By using these formulas, we obtain the following: the stationary points of F correspond to the Stein-type density.

Proof. Since M_ai is the expectation parameter of the exponential family φ(x_i − α_ai)/Z_ai supported on (ξ_{a−1,i}, ξ_ai], it is an increasing function of α_ai (e.g., [18]). Therefore, D_1 M_ai > 0.
Thus, the stationary condition ∂F/∂α_ai = 0 is equivalent to (17), which solves the integral equation (6) except at the boundary points ξ_ai. Furthermore, substituting this relation into (19), we find that ∂F/∂ξ_ai = 0 is equivalent to the continuity of p_i at ξ_ai. Then, the density p is the Stein-type density, which is unique due to Theorem 2.
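The monotonicity D_1 M_ai > 0 used in the proof is the standard exponential-family fact that the truncated Gaussian mean increases in the location parameter. The following sketch (ours; it uses the standard-library `statistics.NormalDist` class and midpoint quadrature) checks this numerically on a fixed interval.

```python
from statistics import NormalDist

nd = NormalDist()

def trunc_mean(alpha, lo=-1.0, hi=2.0, steps=4000):
    """Mean M(alpha) of the density phi(x - alpha)/Z restricted to (lo, hi],
    with Z = Phi(hi - alpha) - Phi(lo - alpha), by midpoint quadrature."""
    Z = nd.cdf(hi - alpha) - nd.cdf(lo - alpha)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += x * nd.pdf(x - alpha) * h
    return total / Z

Ms = [trunc_mean(a) for a in (-2.0, -1.0, 0.0, 1.0, 2.0)]
```

The computed means increase strictly with alpha, as the exponential-family argument predicts.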

Example 5.
We numerically obtain the Stein-type densities of discretized copulas. The result is shown in Figure 1. The copula used here is the Clayton copula C_θ(u_1, u_2) = (u_1^{−θ} + u_2^{−θ} − 1)^{−1/θ}, θ > 0. The discretized copula density of n × n cells is given by (13) with π_ab = C_θ(a/n, b/n) − C_θ((a−1)/n, b/n) − C_θ(a/n, (b−1)/n) + C_θ((a−1)/n, (b−1)/n).
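The cell masses of the discretization can be computed directly from the copula volumes. The following sketch (ours, not the paper's implementation) builds the π_ab for the Clayton copula and checks that they form a valid discretized copula: positive cells, total mass one, and uniform marginal masses 1/n.

```python
def clayton_cdf(u, v, theta=2.0):
    """Clayton copula C_theta(u, v) = (u^-theta + v^-theta - 1)^(-1/theta)."""
    if u == 0.0 or v == 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def discretize(n=8, theta=2.0):
    """Cell masses pi_ab = C-volume of ((a-1)/n, a/n] x ((b-1)/n, b/n]."""
    C = clayton_cdf
    return [[C(a / n, b / n, theta) - C((a - 1) / n, b / n, theta)
             - C(a / n, (b - 1) / n, theta) + C((a - 1) / n, (b - 1) / n, theta)
             for b in range(1, n + 1)] for a in range(1, n + 1)]

pi = discretize()
total = sum(sum(row) for row in pi)
row_sums = [sum(row) for row in pi]   # marginal masses, each should be 1/n
```

Since C_θ(u, 1) = u, each row mass is exactly 1/n, and positivity of the Clayton density makes every cell mass positive, so Corollary 4 applies.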

Discussion
In the present paper, we showed that a class of multi-dimensional distributions has a unique representation via the Stein-type identity. Now, we describe areas for future study and some open problems.
In Section 3, we derived some properties of Stein-type distributions. The author could not find any counter-example to the following conjecture.

Conjecture 1. The marginal densities of any Stein-type distribution are positive everywhere.

A partial answer to Conjecture 1 is given in the following lemma.

Lemma 10.
Let µ be a Stein-type distribution. If the copula of µ has pair-wise marginal densities c_ij such that the conditional means are uniformly bounded, i.e., |m_i(x_i) − x_i| ≤ D^* for some constant D^* < ∞ and all i, then each marginal density p_i of µ is positive everywhere. In particular, if the copula density of µ is bounded, then the same consequence follows.
Proof. Let a ∈ R be a point at which p_i(a) > 0; such a point exists because p_i is a density. Since p_i′ = −m_i p_i ≥ −(x_i + D^*) p_i, Gronwall's lemma shows that p_i(x_i) ≥ p_i(a) e^{−(x_i + D^*)^2/2 + (a + D^*)^2/2} > 0 for x_i > a, and similarly p_i(x_i) > 0 for x_i < a.
If Conjecture 1 is positively solved, then the following conjecture, which is based on Theorem 2, is also positive according to Lemma 4 (ii).

Conjecture 2. A Stein-type transformation is unique if it exists.
We state a relevant conjecture that is the converse of Theorem 3.

Conjecture 3.
A distribution is copositive if it has a Stein-type transformation.
In Section 4, we showed that a Stein-type distribution is characterized by the stationary points of an energy functional E over a fiber F. From the perspective of optimal transportation, we can construct the gradient flow of the energy functional with respect to the L^2-Wasserstein metric ([14], [23] and [32]). The formal equation is as follows:

∂p_i/∂t = ∂/∂x_i ( m_i(x_i) p_i(x_i) + ∂p_i/∂x_i ),  i = 1, …, d,   (20)

where m_i(x_i) is the conditional expectation of ∑_{j=1}^d x_j given x_i. Although this appears to be an independent system of one-dimensional Fokker-Planck equations, the equations interact with each other via m_i(x_i).
Moreover, the physical meaning of the equation is not clear. From Theorem 4, it follows that each Stein-type density is a stationary point of (20). The time evolution will be of theoretical interest.
In Section 5, we presented sufficient conditions for copositivity of distributions. In particular, a Gaussian distribution is copositive if its covariance matrix is not degenerated.
Conversely, if a Gaussian distribution is copositive, then the covariance matrix must, by definition, be strictly copositive (see Equation (1)). The following conjecture naturally arises but is not proven. This is positively solved if Conjecture 3 is correct, due to Lemma 2.

Conjecture 4.
A Gaussian distribution is copositive if the covariance matrix is strictly copositive.
As stated in Subsection 5.5, tail-dependent copulas do not satisfy the sufficient condition in Theorem 5. The copositivity of tail-dependent copulas remains unclear.
In the present paper, we did not consider statistical models that explain a given data set. A statistical model involving a Stein-type distribution is essentially equivalent to a copula model because such models correspond to each other through coordinate-wise transformations, whereas the marginal distributions are not of much interest in copula modelling. The class given in Example 1 provides a flexible model because the distribution of U i 's in the construction can be selected arbitrarily.
Finally, it is expected that there is a coordinate-wise transformation that satisfies a stronger identity (21) for any monotone increasing functions f and g. Although the condition (21) appears to be too strong, how to deal with this problem remains unclear.
A One-dimensional optimal transportation

Let P_2(R) be the set of absolutely continuous probability distributions µ on R such that ∫ x dµ = 0 and ∫ x^2 dµ < ∞. For given µ ∈ P_2(R), let T(µ) be the set of non-decreasing functions T : R → R ∪ {−∞, ∞}. For given µ and ν in P_2(R), there exists T ∈ T(µ) such that ν = Tµ. The map is uniquely determined µ-almost everywhere. More explicitly, T is given by T = G^− ∘ F, where F denotes the cumulative distribution function of µ and G^−(u) = inf{x | G(x) ≥ u} denotes the quantile function of ν. The map T is called the optimal transportation from µ to ν because it minimizes the transport cost ∫ (x − U(x))^2 dµ(x) over all maps U with ν = Uµ. Since µ and ν are absolutely continuous, T is decomposed into an absolutely continuous part, T_ac, and a discontinuous part, T_d, without a singular continuous part. This is because G^− constructed above has the same property. The decomposition is unique up to a µ-negligible set.
The following lemmas are used in Section 4 and Section 5. These lemmas were originally proven for multi-dimensional measures but here we simplify them for the one-dimensional case.
Lemma 11 (Theorem 4.4 of [21]). For given µ and ν in P_2(R), let T be the unique monotone map such that ν = Tµ. Let p and q be the density functions of µ and ν, respectively. Let X ⊂ R denote the set of points where the derivative T′ is defined and positive. Then,

∫ A(q(y)) dy = ∫_X A( p(x)/T′(x) ) T′(x) dx

for any measurable function A on [0, ∞) with A(0) = 0.
Lemma 13 (Proposition 4.2 of [21]). Let µ ∈ P_2. If T : R → R is a non-decreasing function written as T = T_ac + T_d and the derivative (T_ac)′ of the absolutely continuous part is strictly positive µ-almost everywhere, then Tµ is absolutely continuous.

B Explicit expression of Stein-type distributions
We formally derive an explicit expression of the Stein-type distributions.
Assume that µ ∈ P_2 has a smooth density function p with decay at infinity. Then, the Stein-type identity is equivalent to

∫ ( ∂p(x)/∂x_i + (∑_{j=1}^d x_j) p(x) ) dx_{−i} = 0,  i = 1, …, d,   (22)

where dx_{−i} means dx_1 · · · dx_{i−1} dx_{i+1} · · · dx_d. In fact, formula (22) is rewritten as ∂p_i/∂x_i + p_i(x_i) m_i(x_i) = 0, where m_i(x_i) is the conditional expectation of ∑_j x_j given x_i, and this equation is equivalent to (6).
Equation (22) is explicitly solved if r(x) is given. Let Q be a fixed orthogonal matrix such that (Q⊤x)_1 = ∑_i x_i/√d, where Q⊤ denotes the matrix transpose of Q. Then, (22) is written as an equation for ∂p(Qw)/∂w_1 in the rotated coordinates w = Q⊤x. The general solution is expressed in terms of an arbitrary probability density function q on R^{d−1} for the remaining coordinates (w_2, …, w_d).
In particular, if r(x) = 0, we obtain a simple formula (23), in which the density factorizes into a Gaussian function of ∑_i x_i and a density of the orthogonal coordinates. Example 1 in Section 2 is this solution. The class of densities (23) is characterized by a condition stronger than the Stein-type identity.

C Closedness properties of Stein-type distributions
Let S be the set of Stein-type distributions on R^d. We prove that S is closed under mixture, normalized convolution, and weak limit.

Lemma 14. The set S is closed under mixture: if µ, ν ∈ S and 0 ≤ λ ≤ 1, then λµ + (1 − λ)ν ∈ S.
Proof. This follows from the linearity of the Stein-type identity (3) with respect to µ.
Lemma 15. Let X and Y be independent random vectors, each with a Stein-type distribution, and let a, b > 0 with a^2 + b^2 = 1. Then, the distribution of aX + bY is Stein-type.
Proof. The Stein-type identity with respect to X implies that, for each i,

E[ f(aX_i + bY_i) ∑_j aX_j ] = a^2 E[ f′(aX_i + bY_i) ],

because X and Y are independent. By changing the roles of X and Y, we have E[ f(aX_i + bY_i) ∑_j bY_j ] = b^2 E[ f′(aX_i + bY_i) ]. Summing the two equalities gives E[ f(aX_i + bY_i) ∑_j (aX_j + bY_j) ] = (a^2 + b^2) E[ f′(aX_i + bY_i) ]. Thus, the Stein-type identity for aX + bY holds if and only if a^2 + b^2 = 1.
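In the Gaussian case of Lemma 1, closure under normalized convolution reduces to a covariance computation: if Cov X = S_1 and Cov Y = S_2 both have unit row sums, then Cov(aX + bY) = a²S_1 + b²S_2 has row sums a² + b². The following sketch (ours) checks this arithmetic.

```python
import math

S1 = [[0.6, 0.4], [0.4, 0.6]]   # unit row sums
S2 = [[0.9, 0.1], [0.1, 0.9]]   # unit row sums

a = b = 1 / math.sqrt(2)        # a^2 + b^2 = 1
# covariance of aX + bY for independent X, Y
S = [[a * a * S1[i][j] + b * b * S2[i][j] for j in range(2)] for i in range(2)]
row_sums = [sum(S[i]) for i in range(2)]
```

Each row of the combined covariance sums to one, so aX + bY is again Stein-type by Lemma 1, consistent with Lemma 15.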
The set S is also closed under weak limits in the following sense. Denote the Euclidean norm on R^d by ‖x‖ for x ∈ R^d.

Lemma 16. Let µ^(n) ∈ S converge weakly to a distribution µ with ∫ ‖x‖^2 dµ < ∞, and assume that ∫ ‖x‖^2 dµ^(n) → ∫ ‖x‖^2 dµ. Then, µ ∈ S.
Proof. These conditions imply that ∫ ϕ dµ^(n) → ∫ ϕ dµ for any continuous function ϕ such that |ϕ(x)| ≤ C(1 + ‖x‖^2) for some C > 0 (refer to Theorem 7.12 of [32]). Letting ϕ(x) be f(x_i) ∑_j x_j and f′(x_i), respectively, we obtain the Stein-type identity for µ. Absolute continuity of µ_i is shown in the same manner as in the proof of Theorem 4.
The condition regarding moment convergence in Lemma 16 is necessary. Indeed, we can construct a sequence (W, U (n) ) of Stein-type random variables in the same manner as in Example 1 of Section 2 such that U (n) converges in law to a random variable U with E[U 2 ] = ∞.
By Lemma 15 and Lemma 16 together with the central limit theorem, if we have independent and identically distributed samples X 1 , . . . , X n according to a Stein-type distribution µ, then the limit distribution of (X 1 + · · · + X n )/ √ n is a Stein-type normal distribution that is characterized by Lemma 1.
Note that the set of copulas satisfies the same consequence as Lemma 14 and Lemma 16.
If we modify the definition of the copulas in such a way that the marginal distribution is standard normal, then the same consequence as Lemma 15 also follows.