1 Introduction

In this introduction, we first present a precise formulation of our version of Price’s theorem, the proof of which we defer to Sect. 4. We then briefly discuss the relevance of this theorem: In a nutshell, it is a useful tool for estimating the expectation of a nonlinear function \(g(X_{\Sigma })\) of a Gaussian random vector \(X_{\Sigma } \in \mathbb {R}^{n}\) with possibly correlated entries. In Sect. 3, we consider a specific example application which illustrates this. The relation of our result to the classical versions [6, 8] of Price’s theorem is discussed in Sect. 2.

1.1 Our Version of Price’s Theorem

Let us denote by \(\mathrm {Sym}_{n}:= \left\{ A \in \mathbb {R}^{n \times n} \,:\, A^{T} = A \right\} \) the set of symmetric matrices, and by

$$\begin{aligned} \mathrm {Sym}_{n}^{+}:= \left\{ A \in \mathrm {Sym}_{n}\,:\, \forall x \in \mathbb {R}^{n} \setminus \{ 0 \} : \, \langle x, A x \rangle > 0 \right\} \end{aligned}$$

the set of (symmetric) positive definite matrices, where we write \(\langle x, y\rangle :=x^{T} y\) for the standard scalar product of \(x, y \in \mathbb {R}^{n}\) and \(|x| := \sqrt{\langle x,x \rangle }\) for the usual Euclidean norm. For \(\Sigma \in \mathrm {Sym}_{n}^{+}\), let

$$\begin{aligned} \phi _{\Sigma } : \mathbb {R}^{n}\rightarrow (0, \infty ), x \mapsto \big [ (2\pi )^{n} \cdot \det \Sigma \big ]^{-\frac{1}{2}} \cdot e^{-\frac{1}{2}\langle x, \Sigma ^{-1} x\rangle }, \end{aligned}$$
(1.1)

and note that \(\phi _{\Sigma }\) is the density function of a centered random vector \(X_{\Sigma }\in \mathbb {R}^{n}\) which follows a joint normal distribution with covariance matrix \(\Sigma \)— that is, \(X_{\Sigma } \sim N(0,\Sigma )\); see for instance [5, Chapter 5, Theorem 5.1].
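For readers who wish to experiment numerically, the following minimal Python sketch (not part of the mathematical development; NumPy and SciPy are assumed, and the names phi, Sigma, x are ad hoc) evaluates the density from Equation (1.1) and cross-checks it against SciPy's implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def phi(x, Sigma):
    """Evaluate the Gaussian density phi_Sigma(x) from Equation (1.1)."""
    n = Sigma.shape[0]
    quad = x @ np.linalg.solve(Sigma, x)                 # <x, Sigma^{-1} x>
    return ((2 * np.pi) ** n * np.linalg.det(Sigma)) ** -0.5 * np.exp(-quad / 2)

Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])               # some element of Sym_2^+
x = np.array([0.3, -1.2])
assert np.isclose(phi(x, Sigma), multivariate_normal(cov=Sigma).pdf(x))
```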

Let us briefly recall the notion of Schwartz functions and tempered distributions, which will play an important role in what follows. First, with \(\mathbb {N}= \{ 1,2,\dots \}\) and \(\mathbb {N}_0 = \{0\} \cup \mathbb {N}\), any \(\alpha \in \mathbb {N}_0^n\) will be called a multiindex, and we write \(|\alpha | = \alpha _1 + \dots + \alpha _n\) as well as \(\partial ^\alpha = \frac{\partial ^{\alpha _1}}{\partial x_1^{\alpha _1}} \cdots \frac{\partial ^{\alpha _n}}{\partial x_n^{\alpha _n}}, \) and \(z^{\alpha } = z_1^{\alpha _1} \cdots z_n^{\alpha _n}\) for \(z \in \mathbb {C}^n\). Finally, given \(\alpha ,\beta \in \mathbb {N}_0^n\), we write \(\beta \le \alpha \) if \(\beta _j \le \alpha _j\) for all \(j \in \{ 1,\dots ,n \}\). With this notation, it is not hard to see that the density function \(\phi _\Sigma \) from above belongs to the Schwartz class

$$\begin{aligned} \mathcal {S}(\mathbb {R}^{n})= & {} \left\{ g \in C^{\infty }(\mathbb {R}^{n};\mathbb {C}) \,:\, \forall \, \alpha \in \mathbb {N}_{0}^{n} \, \forall \, N \in \mathbb {N}\, \exists \, C > 0 \, \forall \, x \in \mathbb {R}^{n}: \right. \\&\qquad \left. \, | \partial ^{\alpha } g(x)| \le C \cdot (1 \! + \! |x| )^{-N} \right\} \end{aligned}$$

of smooth, rapidly decaying functions; see for instance [3, Chapter 8] for more details on this space. In fact, \( \phi _{\Sigma }(x) = c_{\Sigma } \cdot e^{-\frac{1}{2} \langle \Sigma ^{-1/2}x, \Sigma ^{-1/2}x \rangle } =c_{\Sigma } \cdot \Phi (\Sigma ^{-1/2} x) \), where \(\Phi \) is the usual Gaussian function \(\Phi (x) = e^{-\frac{1}{2} |x|^{2}}\), which is well-known to belong to \(\mathcal {S}(\mathbb {R}^{n})\).

The space \(\mathcal {S}'(\mathbb {R}^n)\) of tempered distributions consists of all linear functionals \(g : \mathcal {S}(\mathbb {R}^n) \rightarrow \mathbb {C}\) which are continuous with respect to the usual topology on \(\mathcal {S}(\mathbb {R}^n)\); see [3, Sections 8.1 and 9.2] for the details. Since \(\phi _\Sigma \in \mathcal {S}(\mathbb {R}^n)\), given any tempered distribution \(g \in \mathcal {S}' (\mathbb {R}^{n})\), the function

$$\begin{aligned} \Phi _{g}: \mathrm {Sym}_{n}^{+}\rightarrow \mathbb {C}, \Sigma \mapsto \langle g, \, \phi _{\Sigma } \rangle _{\mathcal {S}',\mathcal {S}} \end{aligned}$$
(1.2)

is well-defined, where \(\langle \cdot , \cdot \rangle _{\mathcal {S}',\mathcal {S}}\) denotes the (bilinear) dual pairing between \(\mathcal {S}'(\mathbb {R}^{n})\) and \(\mathcal {S}(\mathbb {R}^{n})\). As an important special case, note that if \(g : \mathbb {R}^{n} \rightarrow \mathbb {C}\) is measurable and of moderate growth, in the sense that \(x \mapsto (1 + |x|)^{-N} \cdot g(x) \in L^{1} (\mathbb {R}^{n})\) for some \(N\in \mathbb {N}\), then

$$\begin{aligned} \Phi _{g}(\Sigma ) = \mathbb {E} \big [ g(X_{\Sigma }) \big ] \end{aligned}$$
(1.3)

is just the expectation of \(g(X_{\Sigma })\), where \(X_{\Sigma } \sim N(0,\Sigma )\). Here, we identify as usual the function g with the tempered distribution \(\mathcal {S}(\mathbb {R}^n) \rightarrow \mathbb {C}, \varphi \mapsto \int g(x) \varphi (x) \, d x\).
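In this moderate-growth setting, the expectation in Equation (1.3) can be approximated by plain Monte Carlo sampling. A small sketch (NumPy assumed; the nonlinearity g and the sample size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
g = lambda x: np.sin(x[:, 0]) * x[:, 1] ** 2    # a nonlinearity of moderate growth

# Draw samples X_Sigma ~ N(0, Sigma) and average:
# this estimates Phi_g(Sigma) = E[g(X_Sigma)].
X = rng.multivariate_normal(np.zeros(2), Sigma, size=10**6)
print(g(X).mean())
```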

The main goal of this note is to show for each \(g \in \mathcal {S}'(\mathbb {R}^n)\) that the function \(\Phi _{g}\) is smooth, and to derive an explicit formula for its partial derivatives. Thus, at least in the case of Equation (1.3), our goal is to calculate the partial derivatives of the expectation of a nonlinear function g of a Gaussian random vector \(X_{\Sigma } \sim N(0,\Sigma )\), as a function of the covariance matrix \(\Sigma \) of the vector \(X_{\Sigma }\).

In order to achieve a convenient statement of this result, we first introduce a bit more notation: Write \(\underline{n} := \left\{ 1,\dots ,n\right\} \), and let

$$\begin{aligned} I:= & {} \left\{ (i,j) \in \underline{n} \times \underline{n} \,:\, i \le j \right\} , \quad I_{\parallel }:=\left\{ (i,i) \,:\, i \in \underline{n}\right\} ,\nonumber \\ I_{<}:= & {} \left\{ (i,j) \in \underline{n} \times \underline{n} \,:\, i < j \right\} , \end{aligned}$$
(1.4)

so that \(I = I_{\parallel } \uplus I_{<}\). Since for \(n > 1\), the sets \(\mathrm {Sym}_{n}\) and \(\mathrm {Sym}_{n}^{+}\) have empty interior in \(\mathbb {R}^{n\times n}\) (because they only consist of symmetric matrices), it does not make sense to talk about partial derivatives of a function \(\Phi : \mathrm {Sym}_{n}^{+}\rightarrow \mathbb {C}\), unless one interprets \(\mathrm {Sym}_{n}^{+}\) as an open subset of the vector space \(\mathrm {Sym}_{n}\), rather than of \(\mathbb {R}^{n\times n}\). As a means of fixing a coordinate system on \(\mathrm {Sym}_{n}\), we therefore parameterize the set of symmetric matrices by their “upper half”; precisely, we consider the following isomorphism between \(\mathbb {R}^{I}\) and \(\mathrm {Sym}_{n}\):

$$\begin{aligned} \Omega : \mathbb {R}^{I} \rightarrow \mathrm {Sym}_{n},\ \left( A_{i,j} \right) _{1\le i\le j\le n} \mapsto \sum _{i\le j} A_{i,j} E_{i,j} + \sum _{i>j} A_{j,i} E_{i,j}. \end{aligned}$$
(1.5)

Here, we denote by \((E_{i,j})_{i,j\in \underline{n}}\) the standard basis of \(\mathbb {R}^{n\times n}\), meaning that \((E_{i,j})_{k,\ell } = \delta _{i,k} \cdot \delta _{j,\ell }\) with the usual Kronecker delta \(\delta _{i,k}\). Below, instead of calculating the partial derivatives of \(\Phi _{g}\), we will consider the function \(\Phi _{g}\circ \Omega |_{U}\), where \(U:=\Omega ^{-1}\left( \mathrm {Sym}_{n}^{+}\right) \subset \mathbb {R}^{I}\) is open.
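Concretely, \(\Omega \) populates a symmetric matrix from its upper-triangular entries; the following NumPy sketch (names ad hoc) implements Equation (1.5):

```python
import numpy as np

def Omega(A_upper, n):
    """Map the upper-triangular coordinates (A_{i,j})_{i<=j} to the
    corresponding symmetric matrix, as in Equation (1.5)."""
    S = np.zeros((n, n))
    iu = np.triu_indices(n)   # index pairs (i, j) with i <= j, i.e. the set I
    S[iu] = A_upper           # fill the upper triangle (and the diagonal)
    S.T[iu] = A_upper         # mirror the same values onto the lower triangle
    return S

# For n = 2 the coordinates are (A_{1,1}, A_{1,2}, A_{2,2}):
print(Omega(np.array([1.0, 0.3, 2.0]), 2))   # [[1.  0.3], [0.3 2. ]]
```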

In order to achieve a concise formulation of our version of Price’s theorem, we need two non-standard notions regarding multiindices \(\beta = \big ( \beta (i,j) \big )_{(i,j) \in I} \in \mathbb {N}_0^I\). Namely, we define the flattened version of \(\beta \) as

$$\begin{aligned} \beta _{\flat } := \sum _{(i,j) \in I} \beta (i,j) \, (e_{i} + e_{j}) \in \mathbb {N}_{0}^{n} \quad \text { with the standard basis } (e_{1},\dots ,e_{n}) \text { of } \mathbb {R}^{n}, \end{aligned}$$
(1.6)

and in addition to \(|\beta | = \sum _{(i,j) \in I} \beta (i,j)\), we will also use

$$\begin{aligned} |\beta |_{\parallel } := \sum _{(i,j) \in I_{\parallel }} \beta (i,j) = \sum _{i \in \underline{n}} \, \beta (i,i). \end{aligned}$$
(1.7)
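Both quantities are straightforward to compute; a small Python sketch (names ad hoc; the paper's 1-based indices are written 0-based):

```python
n = 2
I = [(i, j) for i in range(n) for j in range(i, n)]      # the index set I, 0-based

def flatten(beta):
    """Compute beta_flat from Equation (1.6) for beta = (beta(i,j))_{(i,j) in I}."""
    out = [0] * n
    for (i, j), b in zip(I, beta):
        out[i] += b                                       # contribution of e_i
        out[j] += b                                       # contribution of e_j
    return out

beta = [0, 3, 1]                    # beta(1,1) = 0, beta(1,2) = 3, beta(2,2) = 1
print(flatten(beta))                # [3, 5], i.e. beta_flat = 3 e_1 + 5 e_2
print(sum(b for (i, j), b in zip(I, beta) if i == j))    # |beta|_parallel = 1
```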

With this notation, our main result reads as follows:

Theorem 1

(Generalized version of Price’s theorem) Let \(g \in \mathcal {S}'(\mathbb {R}^{n})\) be arbitrary. Then the function \(\Phi _{g} \circ \Omega |_{U} : U \rightarrow \mathbb {C}\) is smooth and its partial derivatives are given by

$$\begin{aligned} \partial ^{\beta } \left( \Phi _{g} \circ \Omega \right) (A) = (1/2)^{|\beta |_{\parallel }} \cdot \left\langle \partial ^{\beta _{\flat }}g\,,\phi _{\Omega (A)}\right\rangle _{\mathcal {S}',\mathcal {S}} \qquad \forall \, A \in U = \Omega ^{-1}(\mathrm {Sym}_{n}^{+}) \, \forall \, \beta \in \mathbb {N}_{0}^{I}. \end{aligned}$$
(1.8)

Here \(\partial ^{\beta _{\flat }}g\) denotes the usual distributional derivative of g.

Remark 1

Note that even if one is in the setting of Equation (1.3) where \(g : \mathbb {R}^{n} \rightarrow \mathbb {C}\) is of moderate growth, so that \(\Phi _{g}(\Sigma ) = \mathbb {E} \left[ g(X_{\Sigma })\right] \) is a “classical” expectation, it need not be the case that the derivative \(\partial ^{\beta _{\flat }}g\) is given by a function, let alone one of moderate growth. Therefore, it really is useful to consider the formalism of (tempered) distributions.
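For a first sanity check of Theorem 1, one can compare both sides of Equation (1.8) numerically in a case where everything is classical. The sketch below (NumPy assumed; the test function, step size, and sample size are arbitrary choices, and common random numbers are used so that the finite difference is not drowned in Monte Carlo noise) takes \(n = 2\), \(g(x) = x_1^2 x_2^2\), and \(\beta = e_{(1,1)}\), so that \(|\beta |_{\parallel } = 1\) and \(\beta _{\flat } = 2 e_1\):

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((10**6, 2))            # fixed: common random numbers

def E(h, Sigma):
    """Monte Carlo estimate of E[h(X_Sigma)], via X_Sigma = L Z, Sigma = L L^T."""
    X = Z @ np.linalg.cholesky(Sigma).T
    return h(X).mean()

g = lambda x: x[:, 0] ** 2 * x[:, 1] ** 2      # smooth test nonlinearity
d2g = lambda x: 2 * x[:, 1] ** 2               # its classical derivative d^2g/dx_1^2

A = np.array([[1.0, 0.3], [0.3, 1.5]])
eps = 1e-3
dA = np.array([[eps, 0.0], [0.0, 0.0]])        # perturb the coordinate A_{1,1}

# Equation (1.8) predicts: d/dA_{1,1} E[g(X_{Omega(A)})] = (1/2) E[(d^2g/dx_1^2)(X)].
lhs = (E(g, A + dA) - E(g, A - dA)) / (2 * eps)
rhs = 0.5 * E(d2g, A)
print(lhs, rhs)                                # both approx. A_{2,2} = 1.5
```

(For this particular g, Isserlis' theorem gives \(\mathbb {E}[X_1^2 X_2^2] = \Sigma _{1,1}\Sigma _{2,2} + 2\Sigma _{1,2}^2\), so both sides equal \(\Sigma _{2,2}\).)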

1.2 Relevance of Price’s Theorem

An important application of Price’s theorem is as follows: For certain values of the covariance matrix \(\Sigma \), it is usually easy to precisely calculate the expectation \(\mathbb {E}\left[ g(X_{\Sigma }) \right] \)—for example if \(\Sigma \) is a diagonal matrix, in which case the entries of \(X_\Sigma \) are independent. As a complement to such special cases where explicit calculations are possible, Price’s theorem can be used to obtain (bounds for) the partial derivatives of the map \(\Sigma \mapsto \mathbb {E}\left[ g(X_{\Sigma }) \right] \). In combination with standard results from multivariable calculus, one can then obtain bounds for \(\mathbb {E} \left[ g(X_{\Sigma })\right] \) for general covariance matrices \(\Sigma \). Thus, Price’s theorem is a tool for estimating the expectation of a nonlinear function \(g(X_{\Sigma })\) of a Gaussian random vector \(X_{\Sigma }\), even if the entries of \(X_{\Sigma }\) are correlated.

An example for this type of reasoning will be given in Sect. 3. There, we apply our version of Price’s theorem to show that if \(f (x) = f_\tau (x)\) “clips” \(x \in \mathbb {R}\) to the interval \([-\tau , \tau ]\) and if \((X_\alpha , Y_\alpha ) \sim N(0, \Sigma _\alpha )\) for \(\Sigma _\alpha = \left( {\begin{matrix} 1 &{} \alpha \\ \alpha &{} 1 \end{matrix}} \right) \), then the map \(F_\tau : [0,1] \rightarrow \mathbb {R}, \alpha \mapsto \mathbb {E}[f_\tau (X_\alpha ) f_\tau (Y_\alpha )]\) is convex and satisfies \(F_\tau (0) = 0\). Thus, \(F_\tau (\alpha ) \le \alpha \cdot F_\tau (1)\), where \(F_\tau (1)\) is easy to bound since \(X_1 = Y_1\) almost surely. These facts constitute important ingredients in [4]; see Theorem A.4 and the proof of Lemma A.3 in that paper.

2 Comparison with the Classical Results

The original form of Price’s theorem as stated in [8] only concerns the case when the nonlinearity \(g(x) = g_{1}(x_{1}) \cdots g_{n} (x_{n})\) has a tensor-product structure. In this special case, the formula derived in [8] is identical to the one given by Theorem 1, up to notational differences.

This tensor-product structure assumption concerning g was removed by McMahon [6] and Papoulis [7] in the case of Gaussian random vectors of dimension \(n = 2\) with covariance matrix of the form \({ \Sigma = \Sigma _{\alpha } = \left( {\begin{matrix} 1 &{} \alpha \\ \alpha &{} 1 \end{matrix}} \right) }\) with \(\alpha \in ( -1,1)\). Precisely, if \(X_{\alpha } \sim N(0,\Sigma _{\alpha })\), then [6] states for \(g : \mathbb {R}^{2} \rightarrow \mathbb {C}\) that

$$\begin{aligned} \Theta _{g} : (-1,1) \rightarrow \mathbb {C}, \alpha \mapsto \mathbb {E}\left[ g(X_{\alpha })\right] \quad \text { is smooth with } \quad \Theta _{g}^{(n)}(\alpha ) = \mathbb {E}\left[ \frac{\partial ^{2n}g}{\partial x_{1}^{n}\partial x_{2}^{n}} (X_{\alpha }) \right] . \end{aligned}$$
(2.1)

Building on the work of Papoulis, Brown [1] showed that Price’s theorem holds for Gaussian random vectors \(X_\Sigma \) of arbitrary dimension with unit variances \(\Sigma _{i,i} = \mathbb {E}[(X_\Sigma )_i^2] = 1\), provided one takes derivatives with respect to the covariances \(\Sigma _{i,j} = \mathbb {E}[(X_\Sigma )_i \, (X_\Sigma )_j]\) with \(i \ne j\). In this setting, Brown also showed that Price’s theorem characterizes the normal distribution; more precisely, if \((X_\Sigma )_{\Sigma }\) is a (sufficiently nice) family of random vectors with \(\mathrm {Cov}(X_\Sigma ) = \Sigma \) which satisfies the conclusion of Price’s theorem, then necessarily \(X_\Sigma \sim N(0,\Sigma )\). This extends and corrects the original work of Price [8], where a similar claim was made.

Finally, we mention the article [9] in which a quantum-mechanical version of Price’s theorem is established. In Sect. 2 of that paper, the author reviews the “classical” case of Price’s theorem, and essentially derives the same formulas as in Theorem 1.

Despite their great utility, the existing versions of Price’s theorem have some shortcomings—at least from a mathematical perspective:

  • In [1, 6, 8], the assumptions regarding the functions \(g_{1},\dots ,g_{n}\) or g are never made explicit. In particular, it is assumed in [6, 8] without justification that \(g_{1},\dots ,g_{n}\) or g can be represented as the sum of certain Laplace transforms. Likewise, Papoulis [7] assumes that g satisfies the growth condition \(|g(x,y)| \lesssim e^{|(x,y)|^\beta }\) for some \(\beta < 2\), but does not impose any restrictions on the regularity of g. Finally, [1] is mainly concerned with showing that Price’s theorem only holds for normally distributed random vectors, and simply refers to [7] for the proof that Price’s theorem does indeed hold for normal random vectors.

    None of the papers [1, 6, 7, 8] explains the nature of the derivative of g (classical, distributional, etc.) which appears in the derived formula.

  • In contrast, for calculating the k-th order derivatives of \(\Sigma \mapsto \mathbb {E}[g(X_\Sigma )]\), it is assumed in [9] that the nonlinearity g is \(C^{2k}\), with a certain decay condition concerning the derivatives. This classical smoothness of g, however, does not hold in many applications; see Sect. 3.

In contrast to [1, 6, 7, 8, 9], our version of Price’s theorem imposes precise, rather mild assumptions on the nonlinearity g (namely \(g \in \mathcal {S}' (\mathbb {R}^{n})\)) and makes explicit the nature of the derivative \(\partial ^{\beta _{\flat }}g\) that appears in the theorem statement: it is simply a distributional derivative.

Furthermore, perhaps as a consequence of the preceding points, Price’s theorem seems not to be as well known in the mathematical community as it deserves to be. It is my hope that the present paper will help to promote this result.

Before closing this section, we prove that—assuming g to be a tempered distribution—the result of [6, 7] is indeed a special case of Theorem 1. With similar arguments, one can show that the forms of Price’s theorem considered in [1, 8, 9] are covered by Theorem 1 as well.

Corollary 1

Let \(g \in \mathcal {S}' (\mathbb {R}^{2})\). For \(\alpha \in (-1,1)\), let \(\Sigma _{\alpha } := \left( {\begin{matrix} 1 &{} \alpha \\ \alpha &{} 1 \end{matrix}} \right) \). Let

$$\begin{aligned} \Theta _{g} : (-1,1) \rightarrow \mathbb {C}, \quad \alpha \mapsto \langle g,\, \phi _{\Sigma _{\alpha }} \rangle _{\mathcal {S}',\mathcal {S}}, \end{aligned}$$

where \(\phi _{\Sigma _{\alpha }} : \mathbb {R}^2 \rightarrow (0,\infty )\) denotes the probability density function of \(X_{\alpha }\sim N(0,\Sigma _{\alpha })\).

Then \(\Theta _{g}\) is smooth with n-th derivative \(\Theta _{g}^{(n)} (\alpha ) = \left\langle \frac{\partial ^{2n}g}{\partial x_{1}^{n}\partial x_{2}^{n}},\, \phi _{\Sigma _{\alpha }} \right\rangle _{\mathcal {S}',\mathcal {S}} \) for \(\alpha \in (-1,1)\).

Remark 2

In particular, if both g and the (distributional) derivative \(\frac{\partial ^{2n}g}{\partial x_{1}^{n}\partial x_{2}^{n}}\) are given by functions of moderate growth, then Equation (2.1) holds, i.e.,

$$\begin{aligned} \frac{d^{n}}{d\alpha ^{n}} \mathbb {E}\bigl [\, g(X_{\alpha }) \,\bigr ] =\mathbb {E} \left[ \frac{\partial ^{2n}g}{\partial x_{1}^{n}\partial x_{2}^{n}}(X_{\alpha }) \right] . \end{aligned}$$
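As a concrete illustration, which also demonstrates the point of Remark 1, take \(g(x_1, x_2) = \mathrm {sign}(x_1) \, \mathrm {sign}(x_2)\). This g is bounded, but \(\frac{\partial ^2 g}{\partial x_1 \partial x_2} = 4 \, \delta _{(0,0)}\) is not a function, so Corollary 1 yields \(\Theta _g'(\alpha ) = 4 \, \phi _{\Sigma _\alpha }(0,0) = \frac{2}{\pi \sqrt{1-\alpha ^2}}\), in agreement with the classical identity \(\mathbb {E}[g(X_\alpha )] = \frac{2}{\pi } \arcsin \alpha \). A quick numerical sketch (NumPy assumed; the value of \(\alpha \) and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
Z = rng.standard_normal((10**6, 2))
alpha = 0.4

# Sample (X, Y) ~ N(0, Sigma_alpha) and check Theta_g(alpha) = (2/pi) arcsin(alpha).
X, Y = Z[:, 0], alpha * Z[:, 0] + np.sqrt(1 - alpha**2) * Z[:, 1]
print(np.mean(np.sign(X) * np.sign(Y)), 2 / np.pi * np.arcsin(alpha))

# Corollary 1: Theta_g'(alpha) = <4 delta_(0,0), phi_Sigma_alpha> = 4 phi_Sigma_alpha(0,0).
phi_at_0 = 1 / (2 * np.pi * np.sqrt(1 - alpha**2))      # Equation (1.1) at x = 0
print(4 * phi_at_0, 2 / np.pi / np.sqrt(1 - alpha**2))  # identical by construction
```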

Proof of Corollary 1

In the notation of Theorem 1, we have

$$\begin{aligned} \Theta _{g}(\alpha )= & {} \left( \Phi _{g} \circ \Omega \right) \bigl (A^{(\alpha )}\bigr ) \,\,\,\,\text {with}\,\,\,\, A_{i,j}^{(\alpha )} = {\left\{ \begin{array}{ll} 1, &{} \text {if } i = j, \\ \alpha , &{} \text {if } i \ne j \end{array}\right. }\\&\,\,\,\,\text {for}\,\,\,\, (i,j) \in I = \left\{ (1,1), (1,2), (2,2) \right\} \! . \end{aligned}$$

Since \(\Omega \left( A^{(\alpha )}\right) = \Sigma _{\alpha }\) is easily seen to be positive definite, we have \(A^{(\alpha )} \in U\). Now, setting \({\beta := n \cdot e_{(1,2)} \in \mathbb {N}_{0}^{I}}\) (with the standard basis \(e_{(1,1)},e_{(1,2)},e_{(2,2)}\) of \(\mathbb {R}^{I}\)), the flattened version \(\beta _{\flat }\) of \(\beta \) satisfies \(\beta _{\flat } = n e_1 + n e_2 = (n,n)\). Thus, Theorem 1 and the chain rule show that \(\Theta _{g}\) is smooth, with

$$\begin{aligned} \Theta _{g}^{(n)}(\alpha )&= \frac{d^{n}}{d\alpha ^{n}} \left[ (\Phi _{g}\circ \Omega )\bigl (A^{(\alpha )}\bigr ) \right] = \left[ \partial ^{\beta }(\Phi _{g} \circ \Omega ) \right] \bigl (A^{(\alpha )}\bigr ) \\&= \left\langle \partial ^{\beta _{\flat }}g\,, \phi _{\Omega (A^{(\alpha )})} \right\rangle _{\mathcal {S}',\mathcal {S}} = \left\langle \frac{\partial ^{2n}g}{\partial x_{1}^{n}\partial x_{2}^{n}},\, \phi _{\Sigma _{\alpha }} \right\rangle _{\mathcal {S}',\mathcal {S}}. \end{aligned}$$

\(\square \)

3 An Example of an Application of Price’s Theorem

In this section, we derive bounds for the expectation \(\mathbb {E} [f_\tau (X_\alpha ) f_\tau (Y_\alpha )]\), where \(X_{\alpha },Y_{\alpha }\) follow a joint normal distribution with covariance matrix \( \left( {\begin{matrix} 1 &{} \alpha \\ \alpha &{} 1 \end{matrix}} \right) ,\) and where the nonlinearity \(f_\tau \) is just a truncation (or clipping) to the interval \([-\tau ,\tau ]\). We remark that this example has already been considered by Price [8] himself, but that his arguments are not completely mathematically rigorous, as explained in Sect. 2. Precisely, we obtain the following result:

Lemma 1

Let \(\tau > 0\) be arbitrary, and define

$$\begin{aligned} f_{\tau }: \mathbb {R}\rightarrow \mathbb {R}, x \mapsto {\left\{ \begin{array}{ll} \tau , &{} \text {if } x \ge \tau ,\\ x, &{} \text {if } x \in [-\tau ,\tau ],\\ -\tau , &{} \text {if } x \le -\tau . \end{array}\right. } \end{aligned}$$

For \(\alpha \in [-1,1]\), set \( \Sigma _{\alpha } := \left( {\begin{matrix} 1 &{} \alpha \\ \alpha &{} 1 \end{matrix}} \right) , \) and let \((X_{\alpha },Y_{\alpha }) \sim N(0,\Sigma _{\alpha })\). Finally, define

$$\begin{aligned} F_{\tau } : [-1,1] \rightarrow \mathbb {R}, \qquad \alpha \mapsto \mathbb {E} \big [f_{\tau }(X_{\alpha }) \cdot f_{\tau }(Y_{\alpha }) \big ]. \end{aligned}$$

Then \(F_{\tau }\) is continuous and \(F_{\tau }|_{[0,1]}\) is convex with \(F_{\tau }(0) = 0\). In particular, since convexity gives \(F_{\tau }(\alpha ) \le (1-\alpha ) \cdot F_{\tau }(0) + \alpha \cdot F_{\tau }(1)\), we have \(F_{\tau }(\alpha ) \le \alpha \cdot F_{\tau }(1)\) for all \(\alpha \in [0,1]\).

Proof

It is easy to see that \(f_{\tau }\) is bounded and Lipschitz continuous, so that \(f_{\tau }\in W^{1,\infty }(\mathbb {R})\) with weak derivative \(f_{\tau }' = {\mathbb {1}}_{(-\tau ,\tau )}\). Therefore, using the notation \((g \otimes h)(x,y) = g(x) \, h(y)\), we see that \(g_{\tau } := f_{\tau } \otimes f_{\tau }\in W^{1,\infty }(\mathbb {R}^{2}) \subset \mathcal {S}'(\mathbb {R}^{2})\), with weak derivative \( \frac{\partial ^{2}g_{\tau }}{\partial x_{1}\partial x_{2}} = {\mathbb {1}}_{(-\tau ,\tau )} \otimes {\mathbb {1}}_{(-\tau ,\tau )} = {\mathbb {1}}_{(-\tau ,\tau )^{2}} . \) Directly from the definition of the weak derivative, in combination with Fubini’s theorem and the fundamental theorem of calculus, we thus see for each \(\phi \in \mathcal {S}(\mathbb {R}^2)\) that

$$\begin{aligned} \left\langle \frac{\partial ^4 \, g_\tau }{\partial x_1^2 \partial x_2^2} ,\,\, \phi \right\rangle _{\mathcal {S}',\mathcal {S}}&= \left\langle \frac{\partial ^2 \, g_\tau }{\partial x_1 \partial x_2} ,\,\, \frac{\partial ^2 \phi }{\partial x_1 \partial x_2} \right\rangle _{\mathcal {S}',\mathcal {S}} = \int _{-\tau }^\tau \int _{-\tau }^\tau \Big ( \frac{\partial ^2 \phi }{\partial x_1 \partial x_2} \Big )(t_1, t_2) \, d t_1 \, d t_2 \\&= \int _{-\tau }^\tau \Big ( \frac{\partial \phi }{\partial x_2} \Big ) (\tau , t_2) - \Big ( \frac{\partial \phi }{\partial x_2} \Big ) (-\tau , t_2) \, d t_2 \\&= \phi (\tau ,\tau ) - \phi (-\tau ,\tau ) - \phi (\tau ,-\tau ) + \phi (-\tau ,-\tau ) . \end{aligned}$$

Now, Corollary 1 shows that \(F_{\tau }|_{(-1,1)} = \Theta _{g_{\tau }}\) is smooth with

$$\begin{aligned}&F_{\tau }'' (\alpha ) = \left\langle \frac{\partial ^{4} g_{\tau }}{\partial x_{1}^{2}\partial x_{2}^{2}} ,\, \phi _{\Sigma _{\alpha }} \right\rangle _{\mathcal {S}',\mathcal {S}}\nonumber \\&\quad = \phi _{\Sigma _{\alpha }}(\tau ,\tau ) - \phi _{\Sigma _{\alpha }}(-\tau ,\tau ) - \phi _{\Sigma _{\alpha }}(\tau ,-\tau ) + \phi _{\Sigma _{\alpha }}(-\tau ,-\tau ) \end{aligned}$$

for \(\alpha \in (-1,1)\). We want to show \(F_{\tau }''(\alpha ) \ge 0\) for \(\alpha \in [0,1 )\). Since \(\phi _{\Sigma _{\alpha }}\) is symmetric, in the sense that \(\phi _{\Sigma _{\alpha }}(-x) = \phi _{\Sigma _{\alpha }}(x)\), it suffices to show \(\phi _{\Sigma _{\alpha }}(\tau ,\tau )-\phi _{\Sigma _{\alpha }}(-\tau ,\tau ) \ge 0\), which is easily seen to be equivalent to

$$\begin{aligned}&\quad \exp \left( -\frac{1}{2(1-\alpha ^{2})}(2\tau ^{2} - 2\alpha \tau ^{2}) \right) \overset{!}{\ge } \exp \left( -\frac{1}{2(1-\alpha ^{2})}(2\tau ^{2} + 2\alpha \tau ^{2}) \right) \\ \Longleftrightarrow&\quad 2\tau ^{2} + 2\alpha \tau ^{2} \overset{!}{\ge } 2\tau ^{2} - 2\alpha \tau ^{2} \qquad \Longleftrightarrow \qquad 4 \alpha \tau ^{2} \overset{!}{\ge } 0, \end{aligned}$$

which clearly holds for \(\alpha \in [0,1)\).

To finish the proof, we only need to show that \(F_{\tau }\) is continuous with \(F_{\tau }(0) = 0\). To see this, let \((X,Z)\sim N(0,I_{2})\), with the 2-dimensional identity matrix \(I_{2}\). For \(\alpha \in [-1,1]\), it is then not hard to see that \(Y_{\alpha } := \alpha X + \sqrt{1-\alpha ^{2}}Z\) satisfies \((X,Y_{\alpha }) \sim N(0,\Sigma _{\alpha })\). Therefore, we see for \(\alpha , \beta \in [-1,1]\) that

$$\begin{aligned} \left| F_{\tau }(\alpha ) - F_{\tau }(\beta ) \right|&= \big | \mathbb {E}\left[ g_{\tau }(X,Y_{\alpha })\right] - \mathbb {E}\left[ g_{\tau }(X,Y_{\beta })\right] \big |\\&= \big | \mathbb {E}\big [ f_\tau (X) \cdot \bigl ( f_\tau (Y_\alpha ) - f_\tau (Y_\beta ) \bigr ) \big ] \big | \\ \left( {\scriptstyle \text {since }\left| f_{\tau }\left( X\right) \right| \le \tau }\right)&\le \tau \cdot \mathbb {E} \left| f_{\tau }(Y_{\alpha }) - f_{\tau }(Y_{\beta }) \right| \\ \left( {\scriptstyle \text {since }f_{\tau }\text { is }1\text {-Lipschitz}}\right)&\le \tau \cdot \mathbb {E} \left| Y_{\alpha } - Y_{\beta } \right| \\&\le \tau \cdot | \alpha - \beta | \cdot \mathbb {E}|X| + \tau \cdot \left| \sqrt{1-\alpha ^{2}} - \sqrt{1-\beta ^{2}} \right| \cdot \mathbb {E} |Z| \xrightarrow [\beta \rightarrow \alpha ]{} 0, \end{aligned}$$

which shows that \(F_{\tau }\) is indeed continuous. Furthermore, we see by independence of XZ that

$$\begin{aligned} F_{\tau }(0) =\mathbb {E}[f_{\tau }(X)\cdot f_{\tau }(Z)] =\mathbb {E}[f_{\tau }(X)] \cdot \mathbb {E}[f_{\tau }(Z)] =0, \end{aligned}$$

since \( \mathbb {E}[f_{\tau }(X)] = -\mathbb {E}[f_{\tau }(-X)] = -\mathbb {E}[f_{\tau }(X)] , \) because of \(X \sim -X\) and \(f_{\tau }(x) = - f_{\tau }(-x)\) for \(x \in \mathbb {R}\). \(\square \)
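The conclusion of Lemma 1 is easy to observe numerically; a minimal Monte Carlo sketch (NumPy assumed; \(\tau \) and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
tau = 0.8
Z = rng.standard_normal((10**6, 2))
f = lambda x: np.clip(x, -tau, tau)          # the clipping nonlinearity f_tau

def F(a):
    """Monte Carlo estimate of F_tau(a) = E[f_tau(X_a) f_tau(Y_a)]."""
    X, Y = Z[:, 0], a * Z[:, 0] + np.sqrt(1 - a * a) * Z[:, 1]
    return np.mean(f(X) * f(Y))

F1 = F(1.0)                                  # F_tau(1) = E[f_tau(X)^2]
for a in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(a, F(a), a * F1)                   # observe F_tau(a) <= a * F_tau(1)
```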

4 The Proof of Theorem 1

The main idea of the proof is to use Fourier analysis, since the Fourier transform \(\mathcal {F}\phi _{\Sigma }\) of the density function \(\phi _{\Sigma }\) will turn out to be much easier to handle than \(\phi _{\Sigma }\) itself. This is similar to the approach in [1, 7] but slightly different from the approach in [6, 8], where the Laplace transform is used instead.

For the Fourier transform, we will use the normalization

$$\begin{aligned} \mathcal {F}\varphi (\xi ) :=\widehat{\varphi }\left( \xi \right) := \int _{\mathbb {R}^{n}} \varphi (x) \cdot e^{-i \langle x,\xi \rangle } \,dx \quad \text { for } \xi \in \mathbb {R}^{n} \text { and } \varphi \in L^{1}(\mathbb {R}^{n}). \end{aligned}$$

It is well-known that the restriction \(\mathcal {F}: \mathcal {S}(\mathbb {R}^{n}) \rightarrow \mathcal {S}(\mathbb {R}^{n})\) of \(\mathcal {F}\) is a well-defined homeomorphism, with inverse \(\mathcal {F}^{-1} : \mathcal {S}(\mathbb {R}^{n}) \rightarrow \mathcal {S}(\mathbb {R}^{n})\), where \(\mathcal {F}^{-1}\varphi (x) = (2\pi )^{-n} \cdot \mathcal {F}\varphi (-x)\). By duality, the Fourier transform also extends to a bijection \(\mathcal {F}: \mathcal {S}'(\mathbb {R}^{n}) \rightarrow \mathcal {S}'(\mathbb {R}^{n})\) defined by \( \langle \mathcal {F}g,\,\varphi \rangle _{\mathcal {S}',\mathcal {S}} := \langle g,\,\mathcal {F}\varphi \rangle _{\mathcal {S}',\mathcal {S}} \) for \(g \in \mathcal {S}'(\mathbb {R}^{n})\) and \(\varphi \in \mathcal {S}(\mathbb {R}^{n})\). Further, it is well-known for the distributional derivatives \(\partial ^{\alpha } g\) of \(g \in \mathcal {S}'(\mathbb {R}^{n})\), defined by \( \left\langle \partial ^{\alpha } g,\, \varphi \right\rangle _{\mathcal {S}',\mathcal {S}} = (-1)^{|\alpha |} \cdot \left\langle g,\, \partial ^{\alpha } \varphi \right\rangle _{\mathcal {S}',\mathcal {S}} \), that if we set

$$\begin{aligned} X^{\alpha } \cdot \varphi : \mathbb {R}^{n} \rightarrow \mathbb {C}, x\mapsto x^{\alpha }\cdot \varphi (x) \quad \text { and } \quad \langle X^{\alpha }\cdot g ,\, \varphi \rangle _{\mathcal {S}',\mathcal {S}} = \langle g,\, X^{\alpha } \cdot \varphi \rangle _{\mathcal {S}',\mathcal {S}} \end{aligned}$$

for \(g \in \mathcal {S}'(\mathbb {R}^{n})\) and \(\varphi \in \mathcal {S}(\mathbb {R}^{n})\), then we have

$$\begin{aligned} \mathcal {F}\left[ \partial ^{\alpha } g \right] = i^{|\alpha |} \cdot X^{\alpha } \cdot \mathcal {F}g \qquad \forall \, g \in \mathcal {S}'(\mathbb {R}^{n}), \,\, \alpha \in \mathbb {N}_0^n . \end{aligned}$$
(4.1)

These results can be found, e.g., in [2, Chapter 14], or (with a slightly different normalization of the Fourier transform) in [3, Sections 8.3 and 9.2].

Finally, we will use the formula

$$\begin{aligned} (2\pi )^{n} \cdot \mathcal {F}^{-1}\phi _{\Sigma }(\xi )= & {} \int _{\mathbb {R}^{n}} \!\!\! e^{i\left\langle x, \xi \right\rangle } \, \phi _{\Sigma }(x) \,dx = \mathbb {E}\left[ e^{i\left\langle \xi ,X_{\Sigma }\right\rangle }\right] \nonumber \\= & {} e^{-\frac{1}{2}\left\langle \xi ,\Sigma \xi \right\rangle } =:\psi _{\Sigma }(\xi ) \,\,\,\text {for}\,\,\, \xi \in \mathbb {R}^{n}, \end{aligned}$$
(4.2)

which is proved in [5, Chapter 5, Theorem 4.1]; in probabilistic terms, this is a statement about the characteristic function of the random vector \(X_{\Sigma } \sim N(0,\Sigma )\).
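As a quick numerical confirmation of Equation (4.2) (NumPy assumed; \(\Sigma \), \(\xi \), and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
xi = np.array([0.7, -0.4])

X = rng.multivariate_normal(np.zeros(2), Sigma, size=10**6)
emp = np.mean(np.exp(1j * X @ xi))           # E[exp(i <xi, X_Sigma>)]
psi = np.exp(-0.5 * xi @ Sigma @ xi)         # psi_Sigma(xi) from Equation (4.2)
print(emp, psi)                              # agree up to Monte Carlo error
```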

Next, by the assumption of Theorem 1, we have \(g \in \mathcal {S}'(\mathbb {R}^{n})\) and hence \(\mathcal {F}g \in \mathcal {S}'(\mathbb {R}^{n})\). Thus, by the structure theorem for tempered distributions (see for instance [2, Theorem 17.10]), there are \(L \in \mathbb {N}\), multiindices \(\alpha _{1}, \dots , \alpha _{L} \in \mathbb {N}_{0}^{n}\), and polynomially bounded, continuous functions \(f_{1}, \dots , f_{L} : \mathbb {R}^n \rightarrow \mathbb {C}\) satisfying \(\mathcal {F}g = \sum _{\ell =1}^{L} \partial ^{\alpha _{\ell }} f_{\ell }\), i.e., \(g = \sum _{\ell =1}^{L} \mathcal {F}^{-1} ( \partial ^{\alpha _{\ell }} f_{\ell } )\). Since both sides of the target identity (1.8) are linear in g, we can assume without loss of generality that \(g = \mathcal {F}^{-1}(\partial ^{\alpha } f)\) for some \(\alpha \in \mathbb {N}_{0}^{n}\) and some continuous, polynomially bounded \(f:\mathbb {R}^{n} \rightarrow \mathbb {C}\), say \(|f(\xi )| \le C \cdot (1+|\xi |)^{N}\) for all \(\xi \in \mathbb {R}^{n}\) and certain \(C > 0\), \(N \in \mathbb {N}_{0}\). We thus have

$$\begin{aligned} \begin{aligned} \Phi _{g}(\Sigma )&= \left\langle g,\,\phi _{\Sigma }\right\rangle _{\mathcal {S}',\mathcal {S}} = \left\langle g,\, \mathcal {F}\mathcal {F}^{-1} \phi _{\Sigma } \right\rangle _{\mathcal {S}',\mathcal {S}} = \left\langle \mathcal {F}g,\, \mathcal {F}^{-1} \phi _{\Sigma } \right\rangle _{\mathcal {S}',\mathcal {S}} \\ \left( {\scriptstyle \text {Eq. (4.2)}}\right)&= (2\pi )^{-n} \cdot \left\langle \partial ^{\alpha }f,\, \psi _{\Sigma } \right\rangle _{\mathcal {S}',\mathcal {S}}\\&= (-1)^{|\alpha |} \cdot (2\pi )^{-n} \cdot \left\langle f,\, \partial ^{\alpha } \psi _{\Sigma } \right\rangle _{\mathcal {S}',\mathcal {S}} \\&= (-1)^{|\alpha |} \cdot (2\pi )^{-n} \cdot \int _{\mathbb {R}^{n}} f(\xi ) \cdot \left( \partial ^{\alpha }\psi _{\Sigma }\right) (\xi ) \,d\xi \quad \text { for all } \Sigma \in \mathrm {Sym}_{n}^{+}. \end{aligned} \end{aligned}$$
(4.3)

Our first goal in the remainder of the proof is to justify “differentiation under the integral” with respect to the coordinates \(A_{i,j}\), where \(\Sigma = \Omega (A)\), in the last integral in Equation (4.3).

It is easy to see that \(A \mapsto \psi _{\Omega (A)}(\xi )\) is smooth, with partial derivative

$$\begin{aligned} \partial _{A_{i,j}}\,\psi _{\Omega (A)}(\xi )&= e^{-\frac{1}{2} \left\langle \xi , \Omega (A)\xi \right\rangle } \cdot \partial _{A_{i,j}} \bigg ( -\frac{1}{2} \cdot \sum _{k,\ell =1}^{n} \big [ \Omega (A) \big ]_{k,\ell } \cdot \xi _{k} \xi _{\ell } \bigg ) \\&= {\left\{ \begin{array}{ll} - \frac{1}{2} \cdot \xi _{i} \, \xi _{j} \cdot \psi _{\Omega (A)}(\xi ), &{} \text {if } i = j, \\ - \xi _{i} \, \xi _{j} \, \cdot \psi _{\Omega (A)}(\xi ), &{} \text {if } i < j \end{array}\right. } \end{aligned}$$

for all \(\xi \in \mathbb {R}^{n}\) and arbitrary \((i,j) \in I\) and \(A \in U\). Given \(\beta \in \mathbb {N}_0^I\), let us write \(\partial _A^\beta \) for the partial derivative of order \(\beta \) with respect to \(A \in \mathbb {R}^I\). Then, a straightforward induction using the preceding identity shows (with \(|\beta |_{\parallel }\) and \(\beta _{\flat }\) as in (1.7) and (1.6)) that

$$\begin{aligned} \partial _{A}^{\beta }\,\psi _{\Omega \left( A\right) }(\xi ) = (-1)^{|\beta |} \cdot \left( \frac{1}{2}\right) ^{|\beta |_{\parallel }} \cdot \xi ^{\beta _{\flat }} \cdot \psi _{\Omega (A)}(\xi ) \qquad \forall \,\, \beta \in \mathbb {N}_{0}^{I},\, \xi \in \mathbb {R}^{n},\, A \in U. \end{aligned}$$
(4.4)
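Equation (4.4) can also be verified symbolically; here is a sketch using SymPy for \(n = 2\) (names ad hoc), with \(\beta = (1,2,1)\) in the coordinates \((A_{1,1}, A_{1,2}, A_{2,2})\), so that \(|\beta | = 4\), \(|\beta |_{\parallel } = 2\), and \(\beta _{\flat } = (4,4)\):

```python
import sympy as sp

x1, x2, a11, a12, a22 = sp.symbols('xi1 xi2 a11 a12 a22', real=True)
# psi_{Omega(A)}(xi) = exp(-<xi, Omega(A) xi> / 2) for n = 2:
psi = sp.exp(-(a11 * x1**2 + 2 * a12 * x1 * x2 + a22 * x2**2) / 2)

lhs = sp.diff(psi, a11, 1, a12, 2, a22, 1)                  # d_A^beta psi
rhs = (-1)**4 * sp.Rational(1, 2)**2 * x1**4 * x2**4 * psi  # right side of (4.4)
print(sp.simplify(lhs - rhs))                               # -> 0
```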

Next, we show for arbitrary \(\gamma \in \mathbb {N}_{0}^{n}\) that there is a polynomial \(p_{\alpha ,\gamma } = p_{\alpha ,\gamma }(\Xi ,B)\) in the variables \(\Xi \in \mathbb {R}^{n}\) and \(B \in \mathbb {R}^{n\times n}\) that satisfies

$$\begin{aligned} \partial _{\xi }^{\alpha } \big [ \xi ^{\gamma } \cdot \psi _{\Sigma }(\xi ) \big ] = p_{\alpha ,\gamma }(\xi ,\Sigma ) \cdot \psi _{\Sigma }(\xi ) \qquad \forall \, \Sigma \in \mathrm {Sym}_{n}^{+},\, \xi \in \mathbb {R}^{n}. \end{aligned}$$
(4.5)

To see this, we first note that a direct computation using the identity \(\partial _{\xi _i} (\xi _k \, \xi _\ell ) = \delta _{i,k} \, \xi _\ell + \delta _{i,\ell } \, \xi _k\) and the symmetry of \(\Sigma \) shows that \(\partial _{\xi _i} \psi _{\Sigma } (\xi ) = - (\Sigma \, \xi )_i \cdot \psi _\Sigma (\xi )\). By induction, and since \((\Sigma \, \xi )_i\) is a polynomial in \(\xi ,\Sigma \), we therefore see that for each \(\beta \in \mathbb {N}_0^n\) there is a polynomial \(p_\beta = p_\beta (\Xi ,B)\) in the variables \(\Xi \in \mathbb {R}^n\) and \(B \in \mathbb {R}^{n \times n}\) satisfying \(\partial _\xi ^\beta \psi _\Sigma (\xi ) = \psi _\Sigma (\xi ) \cdot p_\beta (\xi ,\Sigma )\). Therefore, the Leibniz rule shows

$$\begin{aligned} \partial _\xi ^\alpha \big [ \xi ^\gamma \psi _\Sigma (\xi ) \big ]= & {} \sum _{\beta \in \mathbb {N}_0^n \text { with } \beta \le \alpha } \! \left( {\begin{array}{c}\alpha \\ \beta \end{array}}\right) \, \partial ^\beta \psi _\Sigma (\xi ) \cdot \partial ^{\alpha -\beta } \xi ^\gamma \\= & {} \psi _\Sigma (\xi ) \sum _{\beta \in \mathbb {N}_0^n \text { with } \beta \le \alpha } \! \left( {\begin{array}{c}\alpha \\ \beta \end{array}}\right) \, p_\beta (\xi , \Sigma ) \, \partial ^{\alpha -\beta } \xi ^\gamma , \end{aligned}$$

which proves Equation (4.5).
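The same symbolic setup confirms Equation (4.5): after dividing by \(\psi _{\Sigma }\), no exponential survives. For instance, with \(\alpha = (2,1)\) and \(\gamma = (1,1)\):

```python
import sympy as sp

x1, x2, s11, s12, s22 = sp.symbols('xi1 xi2 s11 s12 s22', real=True)
psi = sp.exp(-(s11 * x1**2 + 2 * s12 * x1 * x2 + s22 * x2**2) / 2)

# Eq. (4.5) with alpha = (2,1), gamma = (1,1): the ratio below is the polynomial
# p_{alpha,gamma}(xi, Sigma); simplification cancels the Gaussian factor.
ratio = sp.simplify(sp.diff(x1 * x2 * psi, x1, 2, x2, 1) / psi)
print(sp.expand(ratio))
```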

Now we are ready to justify differentiation under the integral (as in [3, Theorem 2.27]) for the last integral appearing in Equation (4.3), with \(\Sigma = \Omega (A)\), that is, for the function

$$\begin{aligned} U \rightarrow \mathbb {C}, \qquad A \mapsto \int _{\mathbb {R}^{n}} f(\xi ) \cdot \left( \partial ^{\alpha } \psi _{\Omega (A)} \right) \! (\xi ) \,d\xi . \end{aligned}$$

Indeed, let \(A_{0} \in U\) be arbitrary. Since U is open, there is some \(\varepsilon > 0\) satisfying \(\overline{B_{\varepsilon }}(A_{0})\subset U\), for the closed ball \( \overline{B_{\varepsilon }}(A_{0}) = \left\{ A \in \mathbb {R}^{I} \,:\, \left| A - A_{0} \right| \le \varepsilon \right\} \), with the Euclidean norm \(\left| \,\cdot \,\right| \) on \(\mathbb {R}^{I}\). The open ball \(B_{\varepsilon }(A_{0})\) is defined similarly.

Now, with

$$\begin{aligned} \sigma _{\min }(A) := \inf _{\begin{array}{c} x \in \mathbb {R}^{n} \\ |x| =1 \end{array} } \left\langle x,\, A x \right\rangle \quad \text { for } A \in \mathbb {R}^{n \times n} \end{aligned}$$

we have for \(A, B \in \mathbb {R}^{n \times n}\) and arbitrary \(x \in \mathbb {R}^{n}\) with \(|x| = 1\) that

$$\begin{aligned} \sigma _{\min }(A) \le \left\langle x, A x \right\rangle = \left\langle x, B x \right\rangle + \left\langle x,(A-B) x \right\rangle \le \left\langle x, B x \right\rangle + \left\| A - B\right\| . \end{aligned}$$

Since this holds for all \(|x| = 1\), we get \(\sigma _{\min }(A) \le \sigma _{\min }(B) + \left\| A - B \right\| \), and by symmetry \( \left| \sigma _{\min }(A) - \sigma _{\min }(B) \right| \le \left\| A - B\right\| . \) Therefore, the continuous function \(A \mapsto \sigma _{\min } \big (\Omega (A)\big )\) has a positive(!) minimum on the compact set \(\overline{B_{\varepsilon }}(A_{0})\), so that \(\left\langle \xi , \Omega (A) \xi \right\rangle \ge c \cdot |\xi |^{2}\) for all \(\xi \in \mathbb {R}^{n}\) and \(A \in \overline{B_{\varepsilon }}(A_{0})\), for some constant \(c > 0\). Furthermore, there is some \(K = K(A_{0}) > 0\) with \(\left\| \Omega (A) \right\| \le K\) for all \(A \in \overline{B_{\varepsilon }}(A_{0})\).
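For symmetric A, the quantity \(\sigma _{\min }(A)\) is just the smallest eigenvalue, and the Lipschitz estimate just derived is easy to test numerically (NumPy assumed; the dimension and the random matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
sym = lambda M: (M + M.T) / 2                      # project onto Sym_n

# For A in Sym_n, sigma_min(A) = inf_{|x|=1} <x, Ax> is the smallest eigenvalue.
sigma_min = lambda A: np.linalg.eigvalsh(A).min()

A, B = sym(rng.standard_normal((4, 4))), sym(rng.standard_normal((4, 4)))
print(abs(sigma_min(A) - sigma_min(B)) <= np.linalg.norm(A - B, 2))   # True
```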

Now, since the map \(U \times \mathbb {R}^{n} \ni (A,\xi ) \mapsto \psi _{\Omega (A)}(\xi ) \in \mathbb {C}\) is smooth, we have (in view of Equations (4.4) and (4.5)) for arbitrary \(\beta \in \mathbb {N}_{0}^{I}\), \(A \in U\) and \(\xi \in \mathbb {R}^{n}\) that

$$\begin{aligned} \begin{aligned} \partial _{A}^{\beta } \left[ f(\xi ) \cdot \left( \partial ^{\alpha }\psi _{\Omega (A)} \right) (\xi ) \right]&= f(\xi ) \cdot \partial _{\xi }^{\alpha }\left[ \partial _{A}^{\beta } \, \psi _{\Omega (A)}(\xi ) \right] \\ \left( {\scriptstyle \text {Eq. (4.4)}}\right)&= f(\xi ) \cdot (-1)^{|\beta |} \cdot \left( \frac{1}{2}\right) ^{|\beta |_{\parallel }} \cdot \partial _{\xi }^{\alpha } \left[ \xi ^{\beta _{\flat }} \cdot \psi _{\Omega (A)}(\xi ) \right] \\ \left( {\scriptstyle \text {Eq. (4.5)}}\right)&= f(\xi ) \cdot (-1)^{|\beta |} \cdot \left( \frac{1}{2}\right) ^{|\beta |_{\parallel }} \cdot p_{\alpha ,\beta _{\flat }} \left( \xi , \Omega (A) \right) \cdot \psi _{\Omega (A)}(\xi ). \end{aligned} \end{aligned}$$
(4.6)

Using the polynomial growth restriction concerning f, we thus see that there is a constant \(C_{\alpha ,\beta } > 0\) and some \(M_{\alpha ,\beta } \in \mathbb {N}\) with

$$\begin{aligned} \left| \partial _{A}^{\beta } \left[ f(\xi ) \cdot \left( \partial ^{\alpha } \psi _{\Omega (A)} \right) (\xi ) \right] \right|&= \left| f(\xi ) \cdot \left( \frac{1}{2}\right) ^{|\beta |_{\parallel }} \cdot p_{\alpha ,\beta _{\flat }}\left( \xi , \Omega (A) \right) \cdot \psi _{\Omega (A)}(\xi ) \right| \\&\le C \cdot \left( 1 \!+\! |\xi |\right) ^{N} \cdot C_{\alpha ,\beta } \cdot \left( 1 + |\xi | + \left\| \Omega (A) \right\| \right) ^{M_{\alpha ,\beta }} \cdot \\&\quad e^{-\frac{1}{2} \left\langle \xi , \Omega (A) \xi \right\rangle } \\&\le C_{\alpha ,\beta } C \cdot \left( 1 + |\xi | \right) ^{N} \cdot \left( 1 + |\xi | + K \right) ^{M_{\alpha ,\beta }} \cdot e^{-\frac{c}{2} |\xi |^{2}} \\&=: h_{\alpha ,\beta ,A_{0},f}(\xi ), \end{aligned}$$

for all \(\xi \in \mathbb {R}^{n}\) and all \(A\in B_{\varepsilon }(A_{0})\). Since \(h_{\alpha ,\beta ,A_{0},f}\) is independent of \(A \in B_{\varepsilon }(A_{0})\) and since we clearly have \(h_{\alpha ,\beta ,A_{0},f} \in L^{1}(\mathbb {R}^{n})\), [3, Theorem 2.27] and Equation (4.3) show that the function

$$\begin{aligned} B_{\varepsilon }(A_{0}) \rightarrow \mathbb {C}, \quad A \mapsto (-1)^{|\alpha |} \cdot (2\pi )^n \cdot \Phi _g (\Omega (A)) = \int _{\mathbb {R}^{n}} f(\xi ) \cdot \left( \partial ^{\alpha } \psi _{\Omega (A)} \right) (\xi ) \,d\xi \end{aligned}$$

is smooth, with partial derivative of order \(\beta \in \mathbb {N}_0^I\) given by

$$\begin{aligned}&\partial _{A}^{\beta } \left[ (-1)^{|\alpha |} \cdot (2\pi )^n \cdot \Phi _g (\Omega (A)) \right] = \int _{\mathbb {R}^{n}} \partial _A^\beta \big [ f(\xi ) \, (\partial _\xi ^\alpha \psi _{\Omega (A)}) (\xi ) \big ] \,d\xi \\&\left( {\scriptstyle \text {Eq. (4.6)}}\right) = (-1)^{|\beta |} \cdot \left( \frac{1}{2}\right) ^{|\beta |_{\parallel }} \cdot \int _{\mathbb {R}^{n}} f(\xi ) \cdot \partial _{\xi }^{\alpha } \left( \xi ^{\beta _{\flat }} \cdot \psi _{\Omega (A)}(\xi ) \right) \,d\xi \\&\quad = (-1)^{|\beta |} \cdot \left( \frac{1}{2}\right) ^{|\beta |_{\parallel }} \cdot \left\langle f ,\, \partial ^{\alpha } \left[ X^{\beta _{\flat }} \cdot \psi _{\Omega (A)} \right] \right\rangle _{\mathcal {S}',\mathcal {S}}\\&\left( {\scriptstyle \text {Eq. (4.2)}}\right) = (-1)^{|\beta | + |\alpha |} \cdot \left( \frac{1}{2}\right) ^{|\beta |_{\parallel }} \! \cdot (2\pi )^{n} \cdot \left\langle X^{\beta _{\flat }} \cdot \partial ^{\alpha } f ,\, \mathcal {F}^{-1}\phi _{\Omega (A)} \right\rangle _{\mathcal {S}',\mathcal {S}}\\&\left( {\scriptstyle g = \mathcal {F}^{-1}(\partial ^\alpha f), \text { Eq. (4.1)} , \text { and } (-1)^{|\beta |} = \, i^{|\beta _{\flat }|} } \right) \\&\quad = \left( \frac{1}{2}\right) ^{|\beta |_{\parallel }} \cdot (2\pi )^{n} \cdot (-1)^{|\alpha |} \cdot \left\langle \mathcal {F}[\partial ^{\beta _{\flat }} \, g] ,\, \mathcal {F}^{-1} \phi _{\Omega (A)} \right\rangle _{\mathcal {S}',\mathcal {S}}. \end{aligned}$$

In combination, this shows that \(\Phi _{g} \circ \Omega \) is smooth on \(B_{\varepsilon }(A_{0})\), with partial derivatives given by

$$\begin{aligned} \partial ^{\beta }\left[ \Phi _{g}\circ \Omega \right] (A)= & {} \left( \frac{1}{2} \right) ^{|\beta |_{\parallel }} \cdot \left\langle \mathcal {F}\big [ \partial ^{\beta _{\flat }} g \big ] ,\, \mathcal {F}^{-1}\phi _{\Omega (A)} \right\rangle _{\mathcal {S}',\mathcal {S}}\\= & {} \left( \frac{1}{2} \right) ^{|\beta |_{\parallel }} \cdot \left\langle \partial ^{\beta _{\flat }}g ,\, \phi _{\Omega (A)} \right\rangle _{\mathcal {S}',\mathcal {S}}, \end{aligned}$$

as claimed. Since \(A_{0} \in U\) was arbitrary, the proof is complete. \(\square \)