A general version of Price's theorem

Assume that $X_{\Sigma}\in\mathbb{R}^{n}$ is a random vector following a multivariate normal distribution with zero mean and positive definite covariance matrix $\Sigma$. Let $g:\mathbb{R}^{n}\to\mathbb{C}$ be measurable and of moderate growth, e.g., $|g(x)| \lesssim (1+|x|)^{N}$. We show that the map $\Sigma\mapsto\mathbb{E}\left[g(X_{\Sigma})\right]$ is smooth, and we derive convenient expressions for its partial derivatives, in terms of certain expectations $\mathbb{E}\left[(\partial^{\alpha}g)(X_{\Sigma})\right]$ of partial (distributional) derivatives of $g$. As we discuss, this result can be used to derive bounds for the expectation $\mathbb{E}\left[g(X_{\Sigma})\right]$ of a nonlinear function $g(X_{\Sigma})$ of a Gaussian random vector $X_{\Sigma}$ with possibly correlated entries. For the case when $g(x) =g_{1}(x_{1})\cdots g_{n}(x_{n})$ has tensor-product structure, the above result is known in the engineering literature as Price's theorem, originally published in 1958. For dimension $n=2$, it was generalized in 1964 by McMahon to the general case $g:\mathbb{R}^{2}\to\mathbb{C}$. Our contribution is to unify these results, and to give a mathematically fully rigorous proof. Precisely, we consider a normally distributed random vector $X_{\Sigma}\in\mathbb{R}^{n}$ of arbitrary dimension $n\in\mathbb{N}$, and we allow the nonlinearity $g$ to be a general tempered distribution. To this end, we replace the expectation $\mathbb{E}\left[g(X_{\Sigma})\right]$ by the dual pairing $\left\langle g,\,\phi_{\Sigma}\right\rangle_{\mathcal{S}',\mathcal{S}}$, where $\phi_{\Sigma}$ denotes the probability density function of $X_{\Sigma}$.


Introduction
In this section, we first present a precise formulation of Price's theorem, the proof of which we defer to Section 4. We then briefly discuss the relevance of this theorem: in a nutshell, it is a useful tool for estimating the expectation of a nonlinear function $g(X_\Sigma)$ of a Gaussian random vector $X_\Sigma \in \mathbb{R}^n$ with possibly correlated entries. In Section 3, we consider a specific example application which illustrates this. The relation of our result to the classical versions [6,5] of Price's theorem is discussed in Section 2.

Our version of Price's theorem
Let us denote by $\mathrm{Sym}_n := \{A \in \mathbb{R}^{n \times n} : A^T = A\}$ the set of symmetric matrices, and by $\mathrm{Sym}_n^+ := \{A \in \mathrm{Sym}_n : \forall x \in \mathbb{R}^n \setminus \{0\} : \langle x, Ax \rangle > 0\}$ the set of (symmetric) positive definite matrices, where we write $\langle x, y \rangle := x^T y$ for the standard scalar product of $x, y \in \mathbb{R}^n$. For $\Sigma \in \mathrm{Sym}_n^+$, let
$$\phi_\Sigma(x) := (2\pi)^{-n/2} \cdot (\det \Sigma)^{-1/2} \cdot e^{-\frac{1}{2} \langle x, \Sigma^{-1} x \rangle},$$
and note that $\phi_\Sigma$ is the density function of a random vector $X_\Sigma \in \mathbb{R}^n$ which follows a joint normal distribution with covariance matrix $\Sigma$, i.e., $X_\Sigma \sim N(0, \Sigma)$; see e.g. [4, Chapter 5, Theorem 5.1].
As an important special case, note that if $g : \mathbb{R}^n \to \mathbb{C}$ is measurable and of moderate growth, in the sense that $|g(x)| \leq C \cdot (1 + |x|)^N$ for certain $C > 0$ and $N \in \mathbb{N}_0$, then
$$\Phi_g(\Sigma) := \langle g, \phi_\Sigma \rangle_{\mathcal{S}',\mathcal{S}} = \int_{\mathbb{R}^n} g(x)\,\phi_\Sigma(x)\,dx = \mathbb{E}\left[g(X_\Sigma)\right] \qquad (3)$$
is just the expectation of $g(X_\Sigma)$, where $X_\Sigma \sim N(0, \Sigma)$.
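For moderate-growth $g$, the dual pairing in Equation (3) is an ordinary integral, so it can be checked against exact Gaussian moments. The following sketch (not part of the paper; the example $g(x) = x_1^2 + x_2$, the grid $[-8,8]^2$, and the tolerance are ad-hoc choices, and NumPy is assumed) compares a two-dimensional quadrature of $\int g\,\phi_\Sigma$ with the exact value $\mathbb{E}[X_1^2 + X_2] = \Sigma_{1,1}$:

```python
import numpy as np

# Covariance with unit variances and correlation rho.
rho = 0.25
Sigma = np.array([[1.0, rho], [rho, 1.0]])
Sinv = np.linalg.inv(Sigma)

# Tensor grid on [-8, 8]^2; the Gaussian tails beyond 8 standard deviations
# are negligible at the tolerance used below.
t = np.linspace(-8.0, 8.0, 401)
x1, x2 = np.meshgrid(t, t, indexing="ij")

# Density phi_Sigma(x) = exp(-<x, Sigma^{-1} x>/2) / (2 pi sqrt(det Sigma)).
quad_form = Sinv[0, 0] * x1**2 + 2 * Sinv[0, 1] * x1 * x2 + Sinv[1, 1] * x2**2
phi = np.exp(-0.5 * quad_form) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

# g has moderate growth, so <g, phi_Sigma> = E[g(X_Sigma)] = Sigma_11 + 0.
g = x1**2 + x2
dx = t[1] - t[0]
pairing = float(np.sum(g * phi) * dx * dx)  # equispaced quadrature
```

Here `pairing` should agree with $\Sigma_{1,1} = 1$ to high accuracy, since the equispaced rule is extremely accurate for smooth, rapidly decaying integrands.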
The main goal of this short note is to show that the function $\Phi_g$ is smooth, and to derive an explicit formula for its partial derivatives. Thus, at least in the case of Equation (3), our goal is to calculate the partial derivatives of the expectation of a nonlinear function $g$ of a Gaussian random vector $X_\Sigma \sim N(0, \Sigma)$, as a function of the covariance matrix $\Sigma$ of the vector $X_\Sigma$.
In order to achieve a convenient statement of this result, we first introduce a bit more notation. Write $\underline{n} := \{1, \dots, n\}$, and let
$$I := \{(i,j) \in \underline{n} \times \underline{n} : i \leq j\}, \qquad I_= := \{(i,i) : i \in \underline{n}\}, \qquad I_< := \{(i,j) \in \underline{n} \times \underline{n} : i < j\}, \qquad (4)$$
so that $I = I_= \uplus I_<$.

Since for $n > 1$ the sets $\mathrm{Sym}_n$ and $\mathrm{Sym}_n^+$ have empty interior in $\mathbb{R}^{n \times n}$ (because they only consist of symmetric matrices), it does not make sense to talk about partial derivatives of a function $\Phi : \mathrm{Sym}_n^+ \to \mathbb{C}$, unless one interprets $\mathrm{Sym}_n^+$ as an open subset of the vector space $\mathrm{Sym}_n$, rather than of $\mathbb{R}^{n \times n}$. As a means of fixing a coordinate system on $\mathrm{Sym}_n$, we therefore consider the following isomorphism between $\mathbb{R}^I$ and $\mathrm{Sym}_n$:
$$\Omega : \mathbb{R}^I \to \mathrm{Sym}_n, \quad A \mapsto \sum_{i \in \underline{n}} A_{i,i}\,E_{i,i} + \sum_{(i,j) \in I_<} A_{i,j}\,(E_{i,j} + E_{j,i}).$$
Here, we denote by $(E_{i,j})_{i,j \in \underline{n}}$ the standard basis of $\mathbb{R}^{n \times n}$, i.e., $(E_{i,j})_{k,\ell} = \delta_{i,k} \cdot \delta_{j,\ell}$ with the usual Kronecker delta $\delta_{i,k}$. Below, instead of calculating the partial derivatives of $\Phi_g$, we will consider the function $\Phi_g \circ \Omega|_U$, where $U := \Omega^{-1}(\mathrm{Sym}_n^+) \subset \mathbb{R}^I$ is open.

Finally, we introduce some notation concerning multiindices. For $\alpha \in \mathbb{N}_0^n$, we use the usual notations $|\alpha| = \alpha_1 + \cdots + \alpha_n$, $z^\alpha = z_1^{\alpha_1} \cdots z_n^{\alpha_n}$ for $z \in \mathbb{C}^n$, and $\partial^\alpha = \frac{\partial^{\alpha_1}}{\partial x_1^{\alpha_1}} \cdots \frac{\partial^{\alpha_n}}{\partial x_n^{\alpha_n}}$. For multiindices $\beta = (\beta(i,j))_{(i,j) \in I} \in \mathbb{N}_0^I$, we introduce a few nonstandard notations: we define the flattened version of $\beta$ as
$$\beta^\flat := \sum_{(i,j) \in I} \beta(i,j) \cdot (e_i + e_j) \in \mathbb{N}_0^n$$
with the standard basis $(e_1, \dots, e_n)$ of $\mathbb{R}^n$, and in addition to $|\beta| = \sum_{(i,j) \in I} \beta(i,j)$, we will also use $\|\beta\| := \sum_{i \in \underline{n}} \beta(i,i)$.

Using these notations, our main result reads as follows:

Theorem 1 (Generalized Price Theorem). Let $g \in \mathcal{S}'(\mathbb{R}^n)$ be arbitrary. Then the function $\Phi_g \circ \Omega|_U$ is smooth and its partial derivatives are given by
$$\partial^\beta \left(\Phi_g \circ \Omega|_U\right)(A) = 2^{-\|\beta\|} \cdot \left\langle \partial^{\beta^\flat} g,\ \phi_{\Omega(A)} \right\rangle_{\mathcal{S}',\mathcal{S}} \quad \text{for all } A \in U \text{ and } \beta \in \mathbb{N}_0^I. \qquad (8)$$
Here $\partial^{\beta^\flat} g$ denotes the usual distributional derivative of $g$. ◭
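To make the statement concrete, the following numerical sketch (not part of the formal development; it assumes NumPy, and the hard-limiter example and seed are illustrative choices) checks the theorem for $n = 2$ and $g(x) = \operatorname{sign}(x_1)\operatorname{sign}(x_2)$. Distributionally, $\partial^{(1,1)} g = 4\,\delta_0 \otimes \delta_0$, so for unit variances and off-diagonal entry $\rho$, the predicted derivative of $\mathbb{E}[g(X_\Sigma)]$ in $\rho$ is $4\,\phi_\Sigma(0)$, which should match the derivative of the classical closed form $(2/\pi)\arcsin\rho$:

```python
import numpy as np

rho = 0.3
Sigma = np.array([[1.0, rho], [rho, 1.0]])

# Monte Carlo estimate of E[g(X_Sigma)] for g(x) = sign(x1) * sign(x2);
# the closed form (2/pi) * arcsin(rho) is the classical arcsine law.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(2), Sigma, size=1_000_000)
mc = float(np.mean(np.sign(X[:, 0]) * np.sign(X[:, 1])))
exact = (2 / np.pi) * np.arcsin(rho)

# Price's formula: d/drho E[g] = <d1 d2 g, phi_Sigma> = 4 * phi_Sigma(0),
# since d1 d2 g = 4 delta_0(x1) delta_0(x2) in the distributional sense.
price_rhs = 4 / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))
closed_form_deriv = (2 / np.pi) / np.sqrt(1 - rho**2)
```

The two derivative expressions agree exactly, and the Monte Carlo estimate matches the closed form up to sampling error, illustrating both the identity (8) and the need for distributional derivatives: $\partial_1\partial_2 g$ is not a function here.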

Remark.
Note that even if one is in the setting of Equation (3), where $g : \mathbb{R}^n \to \mathbb{C}$ is of moderate growth, so that $\Phi_g(\Sigma) = \mathbb{E}[g(X_\Sigma)]$ is a "classical" expectation, it need not be the case that the derivative $\partial^{\beta^\flat} g$ is given by a function, let alone one of moderate growth. Therefore, it really is useful to consider the formalism of (tempered) distributions.

Relevance of Price's theorem
An important application of Price's theorem is as follows: for certain values of the covariance matrix $\Sigma$, it is usually easy to calculate the expectation $\mathbb{E}[g(X_\Sigma)]$ precisely, for example by using the independence of the entries of $X_\Sigma$ if the respective covariances vanish, or conversely by using the linear dependence between the entries of $X_\Sigma$ if the covariances are maximal. In addition, Price's theorem can be used to obtain (bounds for) the partial derivatives of the map $\Sigma \mapsto \mathbb{E}[g(X_\Sigma)]$. In combination with standard results from multivariable calculus, one can then obtain bounds for $\mathbb{E}[g(X_\Sigma)]$ for general covariance matrices $\Sigma$. Thus, Price's theorem is a tool for estimating the expectation of a nonlinear function $g(X_\Sigma)$ of a Gaussian random vector $X_\Sigma$, even if the entries of $X_\Sigma$ are correlated. An example of this type of reasoning will be given in Section 3. The result which we derive there will be an important ingredient for the upcoming paper [3].
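This "anchor at an easy covariance, then integrate the derivative" strategy can be sketched numerically (a sketch only, assuming NumPy; the hard-limiter $g$, the target correlation, and the step count are illustrative choices). At $\rho = 0$ the entries are independent, so $\mathbb{E}[\operatorname{sign}(X_1)\operatorname{sign}(X_2)] = 0$; Price's theorem gives the derivative in $\rho$ as $4\,\phi_\Sigma(0) = (2/\pi)/\sqrt{1-\rho^2}$, and integrating it recovers the expectation for general $\rho$:

```python
import numpy as np

rho_target = 0.5

# Derivative supplied by Price's theorem along the path rho in [0, rho_target].
rhos = np.linspace(0.0, rho_target, 10_001)
deriv = (2 / np.pi) / np.sqrt(1 - rhos**2)

# Fundamental theorem of calculus: E[g] at rho_target equals the value at
# rho = 0 (which is 0 by independence) plus the integral of the derivative.
# Trapezoidal rule, written out explicitly:
estimate = float(np.sum((deriv[1:] + deriv[:-1]) / 2 * np.diff(rhos)))

exact = (2 / np.pi) * np.arcsin(rho_target)  # = 1/3 for rho_target = 0.5
```

In applications one typically only has *bounds* on the derivative rather than a closed form, which then yield bounds on $\mathbb{E}[g(X_\Sigma)]$ by the same argument.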

Comparison with the classical results
The original form of Price's theorem as stated in [6] only considers the case in which the nonlinearity $g(x) = g_1(x_1) \cdots g_n(x_n)$ has a tensor-product structure. Apart from this restriction, and up to notational differences, the formula derived in [6] is identical to the one obtained from Theorem 1 in the special case $g(x) = g_1(x_1) \cdots g_n(x_n)$.
The tensor-product structure assumption concerning $g$ was removed by McMahon [5]. Note though that McMahon only considers the case $n = 2$, where the covariance matrix $\Sigma$ satisfies
$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \qquad \text{for some } \rho \in (-1, 1).$$

Finally, we mention the recent paper [7] in which a quantum-mechanical version of Price's theorem is established. In Section II of that paper, the author reviews the "classical" case of Price's theorem, and essentially derives the same formulas as in Theorem 1. Note though that in [7], it is assumed for calculating the $k$-th order derivatives of $\Sigma \mapsto \mathbb{E}[g(X_\Sigma)]$ that the nonlinearity $g$ is $C^{2k}$, with a certain decay condition on the derivatives. This required classical smoothness of $g$ is not fulfilled in many applications; see e.g. Section 3.
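As an aside, McMahon's $n = 2$ identity $\frac{d}{d\rho}\,\mathbb{E}[f(X_\Sigma)] = \mathbb{E}\big[\frac{\partial^2 f}{\partial x_1 \partial x_2}(X_\Sigma)\big]$ is easy to test numerically for a smooth nonlinearity (a sketch assuming NumPy; the polynomial $f(x) = x_1^2 x_2^2$, the sample size, and the moment formulas via Isserlis' theorem are our choices, not taken from [5]):

```python
import numpy as np

rho = 0.4
Sigma = np.array([[1.0, rho], [rho, 1.0]])
rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(2), Sigma, size=2_000_000)

# f(x) = x1^2 * x2^2.  By Isserlis' theorem, E[X1^2 X2^2] = 1 + 2 rho^2,
# so the left-hand side of McMahon's identity is d/drho (1 + 2 rho^2) = 4 rho.
lhs = 4 * rho

# Right-hand side: E[(d^2 f / dx1 dx2)(X)] = E[4 X1 X2], estimated by
# Monte Carlo (its exact value is 4 rho, since E[X1 X2] = rho).
rhs = float(np.mean(4 * X[:, 0] * X[:, 1]))

# Sanity check on the moment formula itself.
theta = float(np.mean(X[:, 0]**2 * X[:, 1]**2))  # should be ~ 1 + 2 rho^2
```

The tolerances below are generous relative to the Monte Carlo standard error at two million samples.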
Despite their great utility, these three versions of Price's theorem have some shortcomings, at least from a mathematical perspective:

• In [6,5], the assumptions regarding the functions $g_1, \dots, g_n$ or $g$ are never made explicit. The same holds for the nature of the derivatives of these functions. This is reflected in the proofs, where it is assumed without justification that $g_1, \dots, g_n$ or $g$ can be represented as the sum of certain Laplace transforms.
• In contrast, [7] imposes explicit assumptions concerning the nonlinearity $g$ which ensure that the derivatives of $g$ are defined in a classical sense; but these assumptions are rather strict, and in fact not satisfied in many applications; see Section 3.
Differently from [6,5,7], our version of Price's theorem imposes precise, rather mild assumptions concerning the nonlinearity $g$ (namely $g \in \mathcal{S}'(\mathbb{R}^n)$), and precisely explains the nature of the derivative $\partial^{\beta^\flat} g$ that appears in the theorem statement: it is just a distributional derivative.
Furthermore, perhaps as a consequence of the preceding points, it seems that Price's theorem is not as well known in the mathematical community as it deserves to be. It is my hope that the present paper may promote awareness of this result.
Before closing this section, we prove that, assuming $g$ to be a tempered distribution, the result of [5] is indeed a special case of Theorem 1. For the "classical" form of Price's theorem considered in [6,7], this is clear.

Remark.
In particular, if both $f$ and the (distributional) derivative $\frac{\partial^{2n} f}{\partial x_1^n \,\partial x_2^n}$ are given by functions of moderate growth, then Equation (9) holds, i.e.,
$$\frac{d^n}{d\alpha^n}\,\mathbb{E}\left[f(X_{\Sigma_\alpha})\right] = \mathbb{E}\left[\frac{\partial^{2n} f}{\partial x_1^n\,\partial x_2^n}(X_{\Sigma_\alpha})\right].$$

Proof of Corollary 2. In the notation of Theorem 1, we have $\Theta_f(\alpha) = \Phi_f(\Sigma_\alpha) = (\Phi_f \circ \Omega|_U)(A(\alpha))$, where $A(\alpha) := \Omega^{-1}(\Sigma_\alpha)$. Since $\Omega(A(\alpha)) = \Sigma_\alpha$ is easily seen to be positive definite, we have $A(\alpha) \in U$. Thus, setting $\beta := n \cdot e_{(1,2)} \in \mathbb{N}_0^I$ (with the standard basis $e_{(1,1)}, e_{(1,2)}, e_{(2,2)}$ of $\mathbb{R}^I$), the chain rule shows that $\Theta_f$ is smooth, with
$$\Theta_f^{(n)}(\alpha) = \partial^\beta\left(\Phi_f \circ \Omega|_U\right)(A(\alpha)) = \left\langle \partial^{(n,n)} f,\ \phi_{\Sigma_\alpha}\right\rangle_{\mathcal{S}',\mathcal{S}},$$
where the last step used Theorem 1 and $\beta^\flat = n \cdot (e_1 + e_2) = (n,n)$; no factor of $\tfrac{1}{2}$ arises, since $\beta$ has no diagonal entries.
To finish the proof, we only need to show that $F_\tau$ is continuous with $F_\tau(0) = 0$. To see this, let $(X, Z) \sim N(0, I_2)$, with $I_2$ the 2-dimensional identity matrix, which shows that $F_\tau$ is indeed continuous. Furthermore, we see by the independence of $X$ and $Z$ that

The proof of Theorem 1
The main idea of the proof is to use Fourier analysis, since the Fourier transform $\mathcal{F}\phi_\Sigma$ of the density function $\phi_\Sigma$ will turn out to be much easier to handle than $\phi_\Sigma$ itself. This is similar to, but slightly different from, the approach in [6,5], where the Laplace transform is used instead. For the Fourier transform, we will use the normalization
$$\mathcal{F}\phi(\xi) := \int_{\mathbb{R}^n} \phi(x)\, e^{-i \langle x, \xi \rangle}\, dx \qquad \text{for } \phi \in \mathcal{S}(\mathbb{R}^n).$$
It is well-known that the restriction $\mathcal{F} : \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}(\mathbb{R}^n)$ of $\mathcal{F}$ is a well-defined homeomorphism, with inverse $\mathcal{F}^{-1} : \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}(\mathbb{R}^n)$, where $\mathcal{F}^{-1}\phi(x) = (2\pi)^{-n} \cdot \mathcal{F}\phi(-x)$. By duality, the Fourier transform also extends to a bijection $\mathcal{F} : \mathcal{S}'(\mathbb{R}^n) \to \mathcal{S}'(\mathbb{R}^n)$, defined by $\langle \mathcal{F}g, \phi \rangle_{\mathcal{S}',\mathcal{S}} := \langle g, \mathcal{F}\phi \rangle_{\mathcal{S}',\mathcal{S}}$ for $g \in \mathcal{S}'(\mathbb{R}^n)$ and $\phi \in \mathcal{S}(\mathbb{R}^n)$. Further, it is well-known for the distributional derivatives $\partial^\alpha g$ of $g \in \mathcal{S}'(\mathbb{R}^n)$, defined by $\langle \partial^\alpha g, \phi \rangle_{\mathcal{S}',\mathcal{S}} = (-1)^{|\alpha|} \cdot \langle g, \partial^\alpha \phi \rangle_{\mathcal{S}',\mathcal{S}}$, that if we set $\langle X^\alpha \cdot g, \phi \rangle_{\mathcal{S}',\mathcal{S}} = \langle g, X^\alpha \cdot \phi \rangle_{\mathcal{S}',\mathcal{S}}$ for $g \in \mathcal{S}'(\mathbb{R}^n)$ and $\phi \in \mathcal{S}(\mathbb{R}^n)$, then we have
$$\mathcal{F}(\partial^\alpha g) = i^{|\alpha|} \cdot X^\alpha \cdot \mathcal{F}g \qquad \text{and} \qquad \partial^\alpha (\mathcal{F}g) = (-i)^{|\alpha|} \cdot \mathcal{F}(X^\alpha \cdot g).$$
These results can be found e.g. in [1, Chapter 14], or (with a slightly different normalization of the Fourier transform) in [2, Sections 8.3 and 9.2]. Finally, we will use the formula
$$\mathcal{F}\phi_\Sigma(\xi) = e^{-\frac{1}{2} \langle \xi, \Sigma \xi \rangle},$$
which is proved in [4, Chapter 5, Theorem 4.1]; in probabilistic terms, this is a statement about the characteristic function of the random vector $X_\Sigma \sim N(0, \Sigma)$.
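The key formula $\mathcal{F}\phi_\Sigma(\xi) = e^{-\frac{1}{2}\langle \xi, \Sigma\xi\rangle}$ can be checked directly in dimension one (a numerical sketch assuming NumPy; the variance, evaluation point, grid, and tolerance are ad-hoc choices) by computing $\int \phi_{\sigma^2}(x)\, e^{-i x \xi}\, dx$ with quadrature:

```python
import numpy as np

# With the normalization (F phi)(xi) = int phi(x) exp(-i x xi) dx, the
# N(0, sigma^2) density should satisfy F phi(xi) = exp(-sigma^2 xi^2 / 2).
sigma2, xi = 2.0, 0.7

# Fine grid on [-40, 40]; the Gaussian tails beyond that are negligible.
x = np.linspace(-40.0, 40.0, 400_001)
phi = np.exp(-x**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
integrand = phi * np.exp(-1j * x * xi)

# Trapezoidal rule for the (complex-valued) Fourier integral.
dx = x[1] - x[0]
numeric = np.sum((integrand[1:] + integrand[:-1]) / 2) * dx

expected = np.exp(-sigma2 * xi**2 / 2)
```

The imaginary part of `numeric` vanishes up to rounding, reflecting the symmetry $\phi_{\sigma^2}(-x) = \phi_{\sigma^2}(x)$.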
Next, by assumption of Theorem 1, we have $g \in \mathcal{S}'(\mathbb{R}^n)$ and hence $\mathcal{F}g \in \mathcal{S}'(\mathbb{R}^n)$. Thus, by the structure theorem for tempered distributions (see e.g. [1, Theorem 17.10]), there are $L \in \mathbb{N}$, certain $\alpha_1, \dots, \alpha_L \in \mathbb{N}_0^n$, and certain polynomially bounded, continuous functions $f_1, \dots, f_L$ satisfying $\mathcal{F}g = \sum_{\ell=1}^{L} \partial^{\alpha_\ell} f_\ell$, i.e., $g = \sum_{\ell=1}^{L} \mathcal{F}^{-1}(\partial^{\alpha_\ell} f_\ell)$. Since both sides of the target identity (8) are linear in $g$, we may thus assume without loss of generality that $g = \mathcal{F}^{-1}(\partial^\alpha f)$ for some $\alpha \in \mathbb{N}_0^n$ and some continuous $f : \mathbb{R}^n \to \mathbb{C}$ which is polynomially bounded, say $|f(x)| \leq C \cdot (1 + |x|)^N$ for all $x \in \mathbb{R}^n$ and certain $C > 0$, $N \in \mathbb{N}_0$. We thus have
$$\Phi_g(\Sigma) = \langle g, \phi_\Sigma \rangle_{\mathcal{S}',\mathcal{S}} = \langle \partial^\alpha f, \mathcal{F}^{-1}\phi_\Sigma \rangle_{\mathcal{S}',\mathcal{S}} = \frac{(-1)^{|\alpha|}}{(2\pi)^n} \int_{\mathbb{R}^n} f(\xi)\, \partial_\xi^\alpha \left[ e^{-\frac{1}{2} \langle \xi, \Sigma \xi \rangle} \right] d\xi. \qquad (12)$$
Our first goal in the remainder of the proof is to show that one can justify "differentiation under the integral" with respect to $A_{i,j}$, with $\Sigma = \Omega(A)$, in the last integral in Equation (12).
In combination, this shows that $\Phi_g \circ \Omega$ is smooth on $B_\varepsilon(A_0)$, with partial derivatives given by Equation (8), as claimed. Since $A_0 \in U$ was arbitrary, the proof is complete.