Efficient recursive algorithms for functionals based on higher order derivatives of the multivariate Gaussian density

Abstract

Many developments in mathematics involve the computation of higher order derivatives of Gaussian density functions. The analysis of univariate Gaussian random variables is a well-established field, whereas the analysis of their multivariate counterparts consists of a more dispersed body of results. These latter results generally fall into two main categories: theoretical expressions which reveal the deep structure of the problem, or computational algorithms which can mask the connections with closely related problems. In this paper, we unify existing results and develop new ones in a framework which is both conceptually cogent and computationally efficient. We focus on the underlying connections between higher order derivatives of Gaussian density functions, the expected value of products of quadratic forms in Gaussian random variables, and \(V\)-statistics of degree two based on Gaussian density functions. These three sets of results are combined into an analysis of non-parametric data smoothers.

References

  • Chacón, J.E., Duong, T.: Multivariate plug-in bandwidth selection with unconstrained pilot bandwidth matrices. Test 19, 375–398 (2010)

  • Chacón, J.E., Duong, T.: Unconstrained pilot selectors for smoothed cross validation. Aust. N. Z. J. Stat. 53, 331–335 (2011)

  • Chacón, J.E., Duong, T.: Bandwidth selection for multivariate density derivative estimation, with applications to clustering and bump hunting. Electron. J. Stat. 7, 499–532 (2013)

  • Chacón, J.E., Duong, T., Wand, M.P.: Asymptotics for general multivariate kernel density derivative estimators. Stat. Sinica 21, 807–840 (2011)

  • Duong, T.: ks: Kernel density estimation and kernel discriminant analysis for multivariate data. J. Stat. Softw. 21(7), 1–16 (2007)

  • Erdélyi, A.: Higher Transcendental Functions, vol. 2. McGraw-Hill, New York (1953)

  • Ghazal, G.A.: Recurrence formula for expectations of products of quadratic forms. Stat. Probab. Lett. 27, 101–109 (1996)

  • Henderson, H.V., Searle, S.R.: Vec and vech operators for matrices, with some uses in Jacobians and multivariate statistics. Can. J. Stat. 7, 65–81 (1979)

  • Holmquist, B.: The direct product permuting matrices. Linear Multilinear Algebra 17, 117–141 (1985)

  • Holmquist, B.: Moments and cumulants of the multivariate normal distribution. Stoch. Anal. Appl. 6, 273–278 (1988)

  • Holmquist, B.: The \(d\)-variate vector Hermite polynomial of order \(k\). Linear Algebra Appl. 237–238, 155–190 (1996a)

  • Holmquist, B.: Expectations of products of quadratic forms in normal variables. Stoch. Anal. Appl. 14, 149–164 (1996b)

  • Isserlis, L.: On a formula for the product–moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika 12, 134–139 (1918)

  • Kan, R.: From moments of sum to moments of product. J. Multivar. Anal. 99, 542–554 (2008)

  • Kumar, A.: Expectation of product of quadratic forms. Sankhyā Ser. B 35, 359–362 (1973)

  • Lin, N., Xi, R.: Fast surrogates of \(U\)-statistics. Comput. Stat. Data Anal. 54, 16–24 (2010)

  • Magnus, J.R.: The expectation of products of quadratic forms in normal variables: the practice. Stat. Neerl. 33, 131–136 (1979)

  • Magnus, J.R.: The exact moments of a ratio of quadratic forms in normal variables. Ann. Econ. Stat. 4, 95–109 (1986)

  • Magnus, J.R., Neudecker, H.: The commutation matrix: some properties and applications. Ann. Stat. 7, 381–394 (1979)

  • Magnus, J.R., Neudecker, H.: Matrix Differential Calculus with Applications in Statistics and Econometrics, Revised Edition. Wiley, Chichester (1999)

  • Mathai, A.M., Provost, S.B.: Quadratic Forms in Random Variables: Theory and Applications. Marcel Dekker, New York (1992)

  • Meijer, E.: Matrix algebra for higher order moments. Linear Algebra Appl. 410, 112–134 (2005)

  • Phillips, K.: R functions to symbolically compute the central moments of the multivariate normal distribution. J. Stat. Softw. 33, Code Snippet 1 (2010)

  • R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2013)

  • Raykar, V.C., Duraiswami, R., Zhao, L.H.: Fast computation of kernel estimators. J. Comput. Graph. Stat. 19, 205–220 (2010)

  • Savits, T.H.: Some statistical applications of Faa di Bruno. J. Multivar. Anal. 97, 2131–2140 (2006)

  • Schott, J.R.: Kronecker product permutation matrices and their application to moment matrices of the normal distribution. J. Multivar. Anal. 87, 177–190 (2003)

  • Simonoff, J.S.: Smoothing Methods in Statistics. Springer, Berlin (1996)

  • Smith, P.J.: A recursive formulation of the old problem of obtaining moments from cumulants and vice versa. Am. Stat. 49, 217–218 (1995)

  • Triantafyllopoulos, K.: On the central moments of the multidimensional Gaussian distribution. Math. Sci. 28, 125–128 (2003)

  • Wand, M.P.: Fast computation of multivariate kernel estimators. J. Comput. Graph. Stat. 3, 433–445 (1994)

Acknowledgments

We thank two anonymous referees for a careful reading of the paper. This work has been partially supported by grants MTM2010-16660 (both authors) and MTM2010-17366 (first author) from the Spanish Ministerio de Ciencia e Innovación. The second author also received funding from the program “Investissements d’avenir” ANR-10-IAIHU-06.

Author information

Correspondence to José E. Chacón.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (R 30 KB)

Appendices

Appendix 1: Proofs

1.1 Proofs of the results in Section 3

The key elements to prove Theorem 1 are the following two lemmas.

Lemma 1

For every \(j\in \mathbb N_{r+1}:=\{1,2,\ldots ,r+1\}\) denote by \(\tau _j\in \mathcal P_{r+1}\) the permutation defined by \(\tau _j(j)=r+1\), \(\tau _j(r+1)=j\) and \(\tau _j(i)=i\) for \(i\notin \{j,r+1\}\). Then we can express

$$\begin{aligned} \mathcal P_{r+1}=\big \{\sigma \circ \tau _j:\sigma \in \mathcal P_r,\ j\in \mathbb N_{r+1}\big \}. \end{aligned}$$

Proof

As any \(\sigma \in \mathcal P_r\) can be regarded as an element of \(\mathcal P_{r+1}\) by defining \(\sigma (r+1)=r+1\), consider the map \(\varphi :\mathcal P_r\times \mathbb N_{r+1}\rightarrow \mathcal P_{r+1}\) given by \(\varphi (\sigma ,j)=\sigma \circ \tau _j\). We conclude by noting that this map is bijective, with inverse given by \(\varphi ^{-1}(\tilde{\sigma })=(\sigma ,j)\), where \(j=\tilde{\sigma }^{-1}(r+1)\) is such that \(\tilde{\sigma }(j)=r+1\) and, for \(i\in \mathbb N_r\), \(\sigma (i)=\tilde{\sigma }(i)\) if \(\tilde{\sigma }(i)\ne r+1\) and \(\sigma (i)=\tilde{\sigma }(r+1)\) if \(\tilde{\sigma }(i)=r+1\). \(\square \)
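
As an aside, the bijection behind Lemma 1 is easy to check numerically. The following R sketch builds \(\mathcal P_r\) recursively through the compositions \(\sigma \circ \tau _j\) and confirms that each permutation is produced exactly once (the helper \(\mathtt{perms}\) is our own illustration, not part of the paper's supplementary code):

```r
# Build P_r recursively as {sigma o tau_j}, one row per permutation
perms <- function(r) {
  if (r == 1) return(matrix(1, 1, 1))
  P <- perms(r - 1)
  out <- matrix(0, 0, r)
  for (j in 1:r) for (m in seq_len(nrow(P))) {
    sigma <- c(P[m, ], r)                 # view sigma in P_{r-1} as fixing r
    tau <- 1:r; tau[c(j, r)] <- c(r, j)   # the transposition tau_j
    out <- rbind(out, sigma[tau])         # (sigma o tau_j)(l) = sigma(tau_j(l))
  }
  out
}
nrow(unique(perms(4))) == factorial(4)    # TRUE: each permutation appears once
```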

Lemma 2

If \(\mathbf {A}\in \mathcal M_{m\times n}\), \(\mathbf {B}\in \mathcal M_{p\times q}\) and \(\varvec{a},\varvec{b}\in \mathbb R^d\), then

$$\begin{aligned} \mathbf {A}\otimes \varvec{a}^\top \otimes \mathbf {B}\otimes \varvec{b}^\top = (\mathbf {A}\otimes \varvec{b}^\top \otimes \mathbf {B}\otimes \varvec{a}^\top )\cdot (\mathbf{I}_{dn}\otimes \mathbf {K}_{q,d})\cdot (\mathbf{I}_n\otimes \mathbf {K}_{d,dq}). \end{aligned}$$

Proof

Use the properties of the commutation matrix to first permute \(\varvec{a}^\top \otimes \mathbf {B}\) with \(\varvec{b}^\top \), keeping \(\mathbf {A}\) in the same place, and then to permute \(\varvec{a}^\top \) with \(\mathbf {B}\) keeping \(\mathbf {A}\otimes \varvec{b}^\top \) in the same place. \(\square \)

The previous lemmas are helpful to manipulate the original definition of \(\varvec{\mathcal S}_{d,r}\) and thus obtain the proof of Theorem 1.

Proof of Theorem 1

Note that for any two vectors \(\varvec{v},\varvec{w}\in \mathbb R^d\) we have \(\varvec{v}\varvec{w}^\top =\varvec{v}\otimes \varvec{w}^\top \). Then, with the identification \(\mathcal P_r\subset \mathcal P_{r+1}\) and the notation \(\tau _j\) as in Lemma 1, for any \(\sigma \in \mathcal P_r\) and \(j\in \mathbb N_{r+1}\),

$$\begin{aligned} \bigotimes _{\ell =1}^{r+1}{\varvec{e}}_{i_\ell }{\varvec{e}}_{i_{\sigma (\tau _j(\ell ))}}^\top&= \bigotimes _{\ell =1}^{j-1}{\varvec{e}}_{i_\ell }{\varvec{e}}_{i_{\sigma (\ell )}}^\top \otimes {\varvec{e}}_{i_j}{\varvec{e}}_{i_{r+1}}^\top \otimes \bigotimes _{\ell =j+1}^{r}{\varvec{e}}_{i_\ell }{\varvec{e}}_{i_{\sigma (\ell )}}^\top \nonumber \\&\otimes {\varvec{e}}_{i_{r+1}}{\varvec{e}}_{i_{\sigma (j)}}^\top \nonumber \\&= \Big \{\bigotimes _{\ell =1}^r{\varvec{e}}_{i_\ell }{\varvec{e}}_{i_{\sigma (\ell )}}^\top \otimes {\varvec{e}}_{i_{r+1}}{\varvec{e}}_{i_{r+1}}^\top \Big \}\nonumber \\&\cdot (\mathbf{I}_{d^j}\otimes \mathbf {K}_{d^{r-j},d})(\mathbf{I}_{d^{j-1}}\otimes \mathbf {K}_{d,d^{r-j+1}}) \end{aligned}$$
(13)

where for the second equality we have applied Lemma 2 with \(\varvec{a}={\varvec{e}}_{i_{r+1}}\), \(\varvec{b}={\varvec{e}}_{i_{\sigma (j)}}\),

$$\begin{aligned} \mathbf {A}&= \bigotimes _{\ell =1}^{j-1}{\varvec{e}}_{i_\ell }{\varvec{e}}_{i_{\sigma (\ell )}}^\top \otimes {\varvec{e}}_{i_j}\in \mathcal {M}_{d^j,d^{j-1}}\quad \text {and}\quad \\ \mathbf {B}&= \bigotimes _{\ell =j+1}^{r}{\varvec{e}}_{i_\ell }{\varvec{e}}_{i_{\sigma (\ell )}}^\top \otimes {\varvec{e}}_{i_{r+1}}\in \mathcal {M}_{d^{r-j+1},d^{r-j}}. \end{aligned}$$

Taking Lemma 1, (13) and the definition of \(\mathbf {T}_{d,r+1}\) into account,

$$\begin{aligned}&\varvec{\mathcal S}_{d,r+1}\nonumber \\&~=\frac{1}{(r+1)!}\sum _{i_1,i_2,\ldots ,i_{r+1}=1}^d\; \sum _{\sigma \in \mathcal P_{r+1}}\bigotimes _{\ell =1}^{r+1}{\varvec{e}}_{i_\ell }{\varvec{e}}_{i_{\sigma (\ell )}}^\top \\&~=\frac{1}{(r+1)!}\sum _{i_1,i_2,\ldots ,i_{r+1}=1}^d\; \sum _{\sigma \in \mathcal P_r}\sum _{j=1}^{r+1}\bigotimes _{\ell =1}^{r+1}{\varvec{e}}_{i_\ell }{\varvec{e}}_{i_{\sigma (\tau _j(\ell ))}}^\top \\&~=\frac{1}{r!}\sum _{i_1,i_2,\ldots ,i_{r+1}=1}^d\; \sum _{\sigma \in \mathcal P_r}\Big \{\bigotimes _{\ell =1}^r{\varvec{e}}_{i_\ell }{\varvec{e}}_{i_{\sigma (\ell )}}^\top \otimes {\varvec{e}}_{i_{r+1}}{\varvec{e}}_{i_{r+1}}^\top \Big \}\mathbf {T}_{d,r+1}\\&~=\Big \{\varvec{\mathcal S}_{d,r}\otimes \Big (\textstyle \sum _{i_{r+1}=1}^d{\varvec{e}}_{i_{r+1}}{\varvec{e}}_{i_{r+1}}^\top \Big )\Big \}\mathbf {T}_{d,r+1}\\&~=(\varvec{\mathcal S}_{d,r}\otimes \mathbf{I}_d) \mathbf {T}_{d,r+1}, \end{aligned}$$

as \(\mathbf{I}_d=\sum _{i=1}^d{\varvec{e}}_i{\varvec{e}}_i^\top \). \(\square \)

To obtain a recursive formula for the matrix \(\mathbf {T}_{d,r}\) we first need to express the matrices \(\mathbf {K}_{d^{p+1},d}\) and \(\mathbf {K}_{d,d^{p+1}}\) in terms of \(\mathbf {K}_{d^p,d}\) and \(\mathbf {K}_{d,d^p}\), respectively.

Lemma 3

For any \(p\ge 0\)

$$\begin{aligned} \mathbf {K}_{d^{p+1},d}&= (\mathbf{I}_{d^p}\otimes \mathbf {K}_{d,d})(\mathbf {K}_{d^p,d}\otimes \mathbf{I}_d)\\&= (\mathbf{I}_d\otimes \mathbf {K}_{d^p,d})(\mathbf {K}_{d,d}\otimes \mathbf{I}_{d^p})\\ \mathbf {K}_{d,d^{p+1}}&= (\mathbf {K}_{d,d^p}\otimes \mathbf{I}_d)(\mathbf{I}_{d^p}\otimes \mathbf {K}_{d,d})\\&= (\mathbf {K}_{d,d}\otimes \mathbf{I}_{d^p})(\mathbf{I}_d\otimes \mathbf {K}_{d,d^p}). \end{aligned}$$

Proof

Using part (i) of Theorem 3.1 in Magnus and Neudecker (1979), we can write

$$\begin{aligned} \mathbf {K}_{d^{p+1},d}&= \sum _{j=1}^d({\varvec{e}}_j^\top \otimes \mathbf{I}_{d^{p+1}}\otimes {\varvec{e}}_j)=\sum _{j=1}^d({\varvec{e}}_j^\top \otimes \mathbf{I}_{d^p}\otimes \mathbf{I}_d\otimes {\varvec{e}}_j)\\&= (\mathbf{I}_{d^p}\otimes \mathbf {K}_{d,d})\sum _{j=1}^d({\varvec{e}}_j^\top \otimes \mathbf{I}_{d^p}\otimes {\varvec{e}}_j\otimes \mathbf{I}_d)\\&= (\mathbf{I}_{d^p}\otimes \mathbf {K}_{d,d})(\mathbf {K}_{d^p,d}\otimes \mathbf{I}_d). \end{aligned}$$

The second equality for \(\mathbf {K}_{d^{p+1},d}\) follows similarly and the formulas for \(\mathbf {K}_{d,d^{p+1}}\) can be derived from the previous ones by noting that \(\mathbf {K}_{d,d^{p+1}}=\mathbf {K}_{d^{p+1},d}^\top \). \(\square \)
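
The identities in Lemma 3 are also easy to verify numerically. In the R sketch below, \(\mathtt{K.mn}\) is our own direct construction of the commutation matrix from its defining property \(\mathbf {K}_{m,n}\,\mathrm{vec}\,\mathbf {A}=\mathrm{vec}\,\mathbf {A}^\top \) (not the paper's code); the check covers the first formula for small \(d\) and \(p\):

```r
# Commutation matrix K_{m,n}: K_{m,n} vec(A) = vec(A^T) for any m x n matrix A
K.mn <- function(m, n) {
  K <- matrix(0, m * n, m * n)
  for (i in 1:m) for (j in 1:n) K[(i - 1) * n + j, (j - 1) * m + i] <- 1
  K
}
d <- 2; p <- 2
lhs <- K.mn(d^(p + 1), d)
rhs <- (diag(d^p) %x% K.mn(d, d)) %*% (K.mn(d^p, d) %x% diag(d))
all.equal(lhs, rhs)   # TRUE: first identity of Lemma 3
```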

Using the previous lemma we obtain a straightforward proof of Theorem 2.

Proof of Theorem 2

Using Lemma 3 for the first \(r-1\) terms in the definition of \(\mathbf {T}_{d,r+1}\), and the property that \((\mathbf {A}\mathbf {C})\otimes (\mathbf {B}\mathbf {D})=(\mathbf {A}\otimes \mathbf {B})(\mathbf {C}\otimes \mathbf {D})\), it follows that

$$\begin{aligned}&\!\!\!(r+1)\mathbf {T}_{d,r+1}\\&\quad =\sum _{j=1}^{r-1}(\mathbf{I}_{d^j}\otimes \mathbf {K}_{d^{r-j},d})(\mathbf{I}_{d^{j-1}}\otimes \mathbf {K}_{d,d^{r-j+1}})\\&\qquad +(\mathbf{I}_{d^{r-1}}\otimes \mathbf {K}_{d,d})+\mathbf{I}_{d^{r+1}}\\&\quad =\sum _{j=1}^{r-1}\big [\mathbf{I}_{d^j}\otimes \{(\mathbf{I}_{d^{r-j-1}}\otimes \mathbf {K}_{d,d})(\mathbf {K}_{d^{r-j-1},d}\otimes \mathbf{I}_d)\}\big ]\\&\qquad \times \big [\mathbf{I}_{d^{j-1}}\otimes \{(\mathbf {K}_{d,d^{r-j}}\otimes \mathbf{I}_d)(\mathbf{I}_{d^{r-j}}\otimes \mathbf {K}_{d,d})\}\big ]\\&\qquad +(\mathbf{I}_{d^{r-1}}\otimes \mathbf {K}_{d,d})+\mathbf{I}_{d^{r+1}}\\&\quad =(\mathbf{I}_{d^{r-1}}\otimes \mathbf {K}_{d,d})\\&\qquad \times \Big [\Big \{\sum _{j=1}^{r}(\mathbf{I}_{d^j}\otimes \mathbf {K}_{d^{r-j-1},d})(\mathbf{I}_{d^{j-1}}\otimes \mathbf {K}_{d,d^{r-j}})\Big \}\otimes \mathbf{I}_d\Big ]\\&\qquad \times (\mathbf{I}_{d^{r-1}}\otimes \mathbf {K}_{d,d})+(\mathbf{I}_{d^{r-1}}\otimes \mathbf {K}_{d,d})\\&\quad =(\mathbf{I}_{d^{r-1}}\otimes \mathbf {K}_{d,d})(r\mathbf {T}_{d,r}\otimes \mathbf{I}_d)(\mathbf{I}_{d^{r-1}}\otimes \mathbf {K}_{d,d})\\&\qquad +(\mathbf{I}_{d^{r-1}}\otimes \mathbf {K}_{d,d}), \end{aligned}$$

where the third equality makes use of \(\mathbf{I}_{d^p}\otimes \mathbf{I}_{d^q}=\mathbf{I}_{d^{p+q}}\). \(\square \)
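
Theorems 1 and 2 translate directly into a recursive computation of the symmetrizer matrix. The following sketch, which reuses \(\mathtt{K.mn}\) from the previous snippet (the function names are ours), computes \(\mathbf {T}_{d,r}\) and \(\varvec{\mathcal S}_{d,r}\) and checks the characteristic averaging action of \(\varvec{\mathcal S}_{d,3}\) on a Kronecker product of vectors:

```r
# T_{d,1} = I_d; Theorem 2 gives T_{d,r} from T_{d,r-1} (K.mn as defined above)
T.dr <- function(d, r) {
  if (r == 1) return(diag(d))
  M <- diag(d^(r - 2)) %x% K.mn(d, d)   # I_{d^{r-2}} (x) K_{d,d}
  (M %*% (((r - 1) * T.dr(d, r - 1)) %x% diag(d)) %*% M + M) / r
}
# Theorem 1: S_{d,r} = (S_{d,r-1} (x) I_d) T_{d,r}, with S_{d,1} = I_d
S.dr <- function(d, r) {
  if (r == 1) return(diag(d))
  (S.dr(d, r - 1) %x% diag(d)) %*% T.dr(d, r)
}
# Check: S_{d,3}(a (x) b (x) c) is the average of the permuted products
d <- 2; a <- c(1, 2); b <- c(3, 5); cc <- c(7, 11)
lhs <- S.dr(d, 3) %*% (a %x% b %x% cc)
vs <- list(a, b, cc)
ps <- list(c(1,2,3), c(1,3,2), c(2,1,3), c(2,3,1), c(3,1,2), c(3,2,1))
rhs <- Reduce(`+`, lapply(ps, function(s)
  vs[[s[1]]] %x% vs[[s[2]]] %x% vs[[s[3]]])) / 6
all.equal(as.vector(lhs), as.vector(rhs))   # TRUE
```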

1.2 Proofs of the results in Section 4

As noted in the text, the proof of Corollary 1 follows by induction on \(r\).

Proof of Corollary 1

For \(r=1\) the formula follows immediately, since \(\varvec{\mathcal S}_{d,1}=\mathbf{I}_d=\mathbf {T}_{d,1}\). The induction step follows by applying the formula \(\varvec{\mathcal S}_{d,r+1}=(\varvec{\mathcal S}_{d,r}\otimes \mathbf{I}_d)\mathbf {T}_{d,r+1}\) from Theorem 1, using the same tools as before and taking into account that \(\mathbf{I}_{d^p}\otimes \mathbf{I}_d=\mathbf{I}_{d^{p+1}}\) and that \((\mathbf {A}\mathbf {C})\otimes (\mathbf {B}\mathbf {D})=(\mathbf {A}\otimes \mathbf {B})(\mathbf {C}\otimes \mathbf {D})\). \(\square \)

Corollary 2 is deduced from Corollary 1 as follows.

Proof of Corollary 2

Clearly, the Kronecker product \(\bigotimes _{\ell =1}^r{\varvec{e}}_{i_\ell }\) of \(r\) vectors \({\varvec{e}}_{i_1},\ldots ,{\varvec{e}}_{i_r}\) of the canonical basis of \(\mathbb R^d\) gives the \(p(i_1,\ldots ,i_r)\)-th vector of the canonical basis in \(\mathbb R^{d^r}\) (i.e., the \(p(i_1,\ldots ,i_r)\)-th column of \(\mathbf{I}_{d^r}\)). Therefore, any vector \(\varvec{v}=(v_1,\ldots ,v_{d^r})\in \mathbb R^{d^r}\) can be written as \(\varvec{v}=\sum _{i=1}^{d^r}v_i\bigotimes _{\ell =1}^r{\varvec{e}}_{(p^{-1}(i))_\ell }\) and so, by linearity, it suffices to obtain a simple formula for expressions of the type \((\mathbf {T}_{d,k}\otimes \mathbf{I}_{d^{r-k}})(\bigotimes _{\ell =1}^r{\varvec{e}}_{i_\ell })\). Further, since \((\mathbf {T}_{d,k}\otimes \mathbf{I}_{d^{r-k}})(\bigotimes _{\ell =1}^r{\varvec{e}}_{i_\ell })=\big \{\mathbf {T}_{d,k}\big (\bigotimes _{\ell =1}^k{\varvec{e}}_{i_\ell }\big )\big \}\otimes \bigotimes _{\ell =k+1}^r{\varvec{e}}_{i_\ell }\), it follows that it is enough to provide a simple interpretation for the multiplications \(\mathbf {T}_{d,k}\big (\bigotimes _{\ell =1}^k{\varvec{e}}_{i_\ell }\big )\) for \(k=2,\ldots ,r\).

Finally, using the properties of the commutation matrix (Magnus and Neudecker 1979), it can be checked that

$$\begin{aligned} \mathbf {T}_{d,k}\bigg (\bigotimes _{\ell =1}^k{\varvec{e}}_{i_\ell }\bigg ) \!=\! \frac{1}{k}\sum _{j=1}^k\bigg \{\bigotimes _{\ell =1}^{j-1}{\varvec{e}}_{i_\ell }\otimes {\varvec{e}}_{i_k}\otimes \bigotimes _{\ell =j+1}^{k-1}{\varvec{e}}_{i_\ell }\otimes {\varvec{e}}_{i_j}\bigg \}\nonumber \\ \end{aligned}$$
(14)

with the convention that \(\bigotimes _{\ell =j}^k{\varvec{e}}_{i_\ell }=1\) if \(j>k\). In words, \(k\mathbf {T}_{d,k}\big (\bigotimes _{\ell =1}^k{\varvec{e}}_{i_\ell }\big )\) consists of adding up all the possible \(k\)-fold Kronecker products in which the last factor is interchanged with the \(j\)-th factor, for \(j=1,2,\ldots ,k\). \(\square \)
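
Formula (14) is what makes multiplication by \(\mathbf {T}_{d,k}\) cheap in practice: on a basis vector it reduces to swapping the \(j\)-th and \(k\)-th Kronecker factors, so the \(d^k\times d^k\) matrix never needs to be formed. A quick numerical confirmation, reusing \(\mathtt{T.dr}\) from the sketch above (again, our own naming):

```r
# Verify (14): T_{d,k} applied to e_{i_1} (x) ... (x) e_{i_k} equals the
# average over j of the products with factors j and k interchanged
d <- 2; k <- 3
e <- function(i) { v <- numeric(d); v[i] <- 1; v }
kron <- function(idx) Reduce(`%x%`, lapply(idx, e))
idx <- c(1, 2, 2)                        # the indices i_1, ..., i_k
lhs <- T.dr(d, k) %*% kron(idx)
rhs <- Reduce(`+`, lapply(1:k, function(j) {
  s <- idx; s[c(j, k)] <- s[c(k, j)]     # swap the j-th and k-th factors
  kron(s)
})) / k
all.equal(as.vector(lhs), as.vector(rhs))   # TRUE
```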

1.3 Proofs of the results in Section 6

First, let us point out why the formula for the joint cumulant in Corollary 3.3.1 of Mathai and Provost (1992) is not always correct. Using the notation of Theorem 3 above, their formula reads as follows: for \(r\ge 1,\,s\ge 1\),

$$\begin{aligned} \kappa _{r,s}(\mathbf {A}, \mathbf {B})&= 2^{r+s-1} (r+s-1)!\,{{\mathrm{tr}}}\big (\mathbf {F}_1^r\mathbf {F}_2^s\big )\\&\quad +2^{r+s-1}(r+s-2)!\big \{ r(r-1)\,{{\mathrm{tr}}}\big (\mathbf {F}_1^{r-1}\mathbf {F}_2^s\mathbf {F}_1{\varvec{\Sigma }}^{-1}{\varvec{\mu }}{\varvec{\mu }}^\top \big )\\&\quad +s(s-1)\,{{\mathrm{tr}}}\big (\mathbf {F}_2^{s-1}\mathbf {F}_1^r\mathbf {F}_2{\varvec{\Sigma }}^{-1}{\varvec{\mu }}{\varvec{\mu }}^\top \big )\\&\quad +2rs\,{{\mathrm{tr}}}\big (\mathbf {F}_1^r\mathbf {F}_2^s{\varvec{\Sigma }}^{-1}{\varvec{\mu }}{\varvec{\mu }}^\top \big ) \big \}. \end{aligned}$$
(15)

To further simplify our comparison, consider for example the case \({\varvec{\mu }}=0\) and \(r=s=2\), so that (15) simply reads \(2^3\cdot 6\,{{\mathrm{tr}}}\big (\mathbf {F}_1^2\mathbf {F}_2^2\big )\). Writing down explicitly the six elements in \(\mathcal {MP}_{2,2}\) and applying the cyclic property of the trace, the correct form from Theorem 3 has

$$\begin{aligned}&2^3\, 2!2!\sum _{\varvec{i}\in \mathcal {MP}_{2,2}}{{\mathrm{tr}}}\big (\mathbf {F}_{i_1}\mathbf {F}_{i_2}\mathbf {F}_{i_3}\mathbf {F}_{i_4}\big )/4\\&\quad =2^3\, \big \{4{{\mathrm{tr}}}\big (\mathbf {F}_1^2\mathbf {F}_2^2\big )+2{{\mathrm{tr}}}\big (\mathbf {F}_1\mathbf {F}_2\mathbf {F}_1\mathbf {F}_2\big )\big \} \end{aligned}$$

instead. Both formulas involve six traces of matrices, each having two factors \(\mathbf {F}_1\) and two factors \(\mathbf {F}_2\). However, despite the aforementioned cyclic property of the trace, it is not true in general that \({{\mathrm{tr}}}\big (\mathbf {F}_1\mathbf {F}_2\mathbf {F}_1\mathbf {F}_2\big )={{\mathrm{tr}}}\big (\mathbf {F}_1^2\mathbf {F}_2^2\big )\), and this causes an error in formula (15). A similar argument shows why some of the terms involving \({\varvec{\mu }}\) in (15) are also wrong.
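
This failure is easy to exhibit numerically. A minimal sketch, taking \({\varvec{\Sigma }}=\mathbf{I}_d\) and random symmetric matrices of our own choosing (so that \(\mathbf {F}_1=\mathbf {A}\) and \(\mathbf {F}_2=\mathbf {B}\)):

```r
set.seed(42)
d <- 3
F1 <- crossprod(matrix(rnorm(d^2), d))   # random symmetric matrix
F2 <- crossprod(matrix(rnorm(d^2), d))   # (Sigma = I_d, so F_i = A, B)
tr <- function(M) sum(diag(M))
tr(F1 %*% F2 %*% F1 %*% F2) - tr(F1 %*% F1 %*% F2 %*% F2)   # nonzero in general
```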

A sufficient condition for formula (15) to be correct is that \(\mathbf {F}_1\mathbf {F}_2=\mathbf {F}_2\mathbf {F}_1\). If that condition holds, then the correct formula for the joint cumulant further simplifies to

$$\begin{aligned} \kappa _{r,s}(\mathbf {A},\mathbf {B})&= 2^{r+s-1}(r+s-1)!\big \{{{\mathrm{tr}}}\big (\mathbf {F}_1^r\mathbf {F}_2^s\big )\\&+(r+s){{\mathrm{tr}}}\big (\mathbf {F}_1^r\mathbf {F}_2^s{\varvec{\Sigma }}^{-1}{\varvec{\mu }}{\varvec{\mu }}^\top \big )\big \}. \end{aligned}$$

The proof of Theorem 3 is based on matrix calculus. Let us introduce some further notation to simplify the calculations. For \(i=1,2\), denote

$$\begin{aligned} \mathbf {C}_i\equiv \mathbf {C}_i(t_1,t_2)=(\mathbf{I}_d-2t_1\mathbf {F}_1-2t_2\mathbf {F}_2)^{-1}\mathbf {F}_i \end{aligned}$$

and, similarly, \(\mathbf {C}_3\equiv \mathbf {C}_3(t_1,t_2)=(\mathbf{I}_d-2t_1\mathbf {F}_1-2t_2\mathbf {F}_2)^{-1}{\varvec{\Sigma }}^{-1}{\varvec{\mu }}{\varvec{\mu }}^\top \). Taking into account the formula for the differential of the inverse of a matrix given in Magnus and Neudecker (1999, Chapter 8), notice that the introduced notation allows for simple expressions for the following differentials: for any \(i\in \{1,2,3\}\) and \(j\in \{1,2\}\), \(d\mathbf {C}_i=2\mathbf {C}_j\mathbf {C}_idt_j.\) In words, differentiating any of these matrix functions with respect to \(t_j\) amounts to pre-multiplying by \(2\mathbf {C}_j\).
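
This differentiation rule can be sanity-checked by finite differences. The sketch below uses arbitrary illustrative matrices of our choosing and verifies \(d\mathbf {C}_2/dt_1=2\mathbf {C}_1\mathbf {C}_2\); the same check applies to \(\mathbf {C}_3\):

```r
# Finite-difference check of dC_i = 2 C_j C_i dt_j (illustrative values)
d <- 2
F1 <- matrix(c(1, 0.3, 0.3, 2), d); F2 <- matrix(c(1.5, -0.2, -0.2, 1), d)
Cmat <- function(t1, t2, i)
  solve(diag(d) - 2 * t1 * F1 - 2 * t2 * F2) %*% list(F1, F2)[[i]]
h <- 1e-7; t1 <- 0.05; t2 <- 0.02
num <- (Cmat(t1 + h, t2, 2) - Cmat(t1, t2, 2)) / h   # numerical dC_2/dt_1
exact <- 2 * Cmat(t1, t2, 1) %*% Cmat(t1, t2, 2)     # 2 C_1 C_2
max(abs(num - exact))                                # ~0 up to O(h) error
```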

More generally, for \(i_1,\ldots ,i_r\in \{1,2\}\), \(j\in \{1,2\}\) and \(m\in \{1,2,3\}\) we have

$$\begin{aligned}&\!\!\!d(\mathbf {C}_{i_1}\cdots \mathbf {C}_{i_r}\mathbf {C}_m)\nonumber \\&\quad =\{d(\mathbf {C}_{i_1}\cdots \mathbf {C}_{i_r})\}\mathbf {C}_m+\mathbf {C}_{i_1}\cdots \mathbf {C}_{i_r}d\mathbf {C}_m\nonumber \\&\quad =2\left\{ \sum _{\ell =1}^r\left( \prod \limits _{k=1}^{\ell -1}\mathbf {C}_{i_k}\right) (\mathbf {C}_j\mathbf {C}_{i_\ell })\left( \prod \limits _{k=\ell +1}^{r}\mathbf {C}_{i_k}\right) \mathbf {C}_m\right. \nonumber \\&\left. \qquad +\mathbf {C}_{i_1}\cdots \mathbf {C}_{i_r}\mathbf {C}_j\mathbf {C}_m\right\} dt_j\nonumber \\&\quad =2\sum _{\ell =1}^{r+1}\left( \prod \limits _{k=1}^{\ell -1}\mathbf {C}_{i_k}\right) \mathbf {C}_j\left( \prod \limits _{k=\ell }^{r}\mathbf {C}_{i_k}\right) \mathbf {C}_m\, dt_j, \end{aligned}$$
(16)

where \(\prod _{k=a}^b\mathbf {C}_{i_k}\) is to be understood as \(\mathbf{I}_d\) if \(a>b\).

The key tool for the proof of Theorem 3 is the following lemma, which is indeed valid for any matrix function having the properties of \(\mathbf {C}_m\) exhibited above.

Lemma 4

For any \(m\in \{1,2,3\}\), consider the function \(w(t_1,t_2)={{\mathrm{tr}}}\mathbf {C}_m\). Then,

$$\begin{aligned} \frac{\partial ^{r+s}}{\partial t_1^{r}\partial t_2^{s}}w(t_1,t_2)=2^{r+s}\,r!s!\sum _{\varvec{i}\in \mathcal {MP}_{r,s}}{{\mathrm{tr}}}\big (\mathbf {C}_{i_1}\cdots \mathbf {C}_{i_{r+s}}\mathbf {C}_m\big ). \end{aligned}$$

Proof

From (16) it easily follows that \(d^r\mathbf {C}_m{=}2^rr!\,\mathbf {C}_1^r\mathbf {C}_m\, dt_1^r\), so that

$$\begin{aligned} \frac{\partial ^r}{\partial t_1^r}w(t_1,t_2)=2^r\,r!\,{{\mathrm{tr}}}\big (\mathbf {C}_1^r\mathbf {C}_m\big ). \end{aligned}$$

Hence, to conclude, it remains to show that, for \(s=0,1,2,\ldots \),

$$\begin{aligned} \frac{\partial ^{s}}{\partial t_2^{s}}{{\mathrm{tr}}}\big (\mathbf {C}_1^{r}\mathbf {C}_m\big )=2^{s}\,s!\sum _{\varvec{i}\in \mathcal {MP}_{r,s}}{{\mathrm{tr}}}\big (\mathbf {C}_{i_1}\cdots \mathbf {C}_{i_{r+s}}\mathbf {C}_m\big ) \end{aligned}$$
(17)

To prove (17) we proceed by induction on \(s\); the initial step, corresponding to \(s=0\), is clear. Assuming that (17) holds for the \((s-1)\)-th derivative, the induction step consists of showing that the formula also holds for the \(s\)-th derivative; that is,

$$\begin{aligned}&\sum _{\varvec{i}\in \mathcal {MP}_{r,s-1}}\frac{\partial }{\partial t_2}{{\mathrm{tr}}}\big (\mathbf {C}_{i_1}\cdots \mathbf {C}_{i_{r+s-1}}\mathbf {C}_m\big )\nonumber \\&\quad =2s\sum _{\varvec{i}\in \mathcal {MP}_{r,s}}{{\mathrm{tr}}}\big (\mathbf {C}_{i_1}\cdots \mathbf {C}_{i_{r+s}}\mathbf {C}_m\big ). \end{aligned}$$
(18)

Taking into account (16), to prove (18) it suffices to show that the set

$$\begin{aligned} \mathcal {A}_{r,s}&= \bigcup _{\ell =1}^{r+s}\{(i_1,\ldots ,i_{\ell -1},2,i_{\ell },\ldots ,i_{r+s-1}):\varvec{i}\in \mathcal {MP}_{r,s-1}\}\\&= \{(2,i_1,\ldots ,i_{r+s-1}):\varvec{i}\in \mathcal {MP}_{r,s-1}\}\\&\quad \cup \{(i_1,2,\ldots ,i_{r+s-1}):\varvec{i}\in \mathcal {MP}_{r,s-1}\}\\&\quad \cup \cdots \cup \{(i_1,\ldots ,i_{r+s-1},2):\varvec{i}\in \mathcal {MP}_{r,s-1}\} \end{aligned}$$

coincides precisely with the multiset that contains \(s\) copies of each of the elements of \(\mathcal {MP}_{r,s}\). This can be shown as follows: it is clear that all the elements in \(\mathcal A_{r,s}\) belong to \(\mathcal {MP}_{r,s}\). On the other hand, notice that any vector \(\varvec{i}=(i_1,\ldots ,i_{r+s})\in \mathcal {MP}_{r,s}\) contains the number 2 in exactly \(s\) of its coordinates, which can be distributed along any of the \(r+s\) positions. If one of these coordinates equal to 2 is deleted from \(\varvec{i}\), the resulting vector belongs to \(\mathcal {MP}_{r,s-1}\), and repeating this process for all the \(s\) coordinates equal to 2 shows that \(s\) copies of \(\varvec{i}\) are found in \(\mathcal A_{r,s}\). \(\square \)

Making use of Lemma 4, we next prove Theorem 3.

Proof of Theorem 3

Magnus (1986) showed that the joint cumulant generating function of \(\mathbf{X}^\top \mathbf {A}\mathbf{X}\) and \(\mathbf{X}^\top \mathbf {B}\mathbf{X}\) can be written as \(\psi (t_1,t_2)=u(t_1,t_2)-\tfrac{1}{2}{\varvec{\mu }}^\top {\varvec{\Sigma }}^{-1}{\varvec{\mu }}+v(t_1,t_2)\), where

$$\begin{aligned} u(t_1,t_2)&=-\tfrac{1}{2}\log |\mathbf{I}_d-2t_1\mathbf {F}_1-2t_2\mathbf {F}_2|\qquad \text {and}\\ v(t_1,t_2)&=\tfrac{1}{2}{{\mathrm{tr}}}\big \{(\mathbf{I}_d-2t_1\mathbf {F}_1-2t_2\mathbf {F}_2)^{-1}{\varvec{\Sigma }}^{-1}{\varvec{\mu }}{\varvec{\mu }}^\top \big \}, \end{aligned}$$

with \(\mathbf {F}_1=\mathbf {A}{\varvec{\Sigma }}\) and \(\mathbf {F}_2=\mathbf {B}{\varvec{\Sigma }}\). Since for \(r+s\ge 1\) the \((r,s)\)-th joint cumulant is defined as \(\kappa _{r,s}(\mathbf {A},\mathbf {B})=\frac{\partial ^{r+s}}{\partial t_1^{r}\partial t_2^s}\psi (0,0)\), it suffices to show that

$$\begin{aligned} \frac{\partial ^{r+s}}{\partial t_1^{r}\partial t_2^s}\psi (t_1,t_2)&= 2^{r+s-1}r!s!\sum _{\varvec{i}\in \mathcal {MP}_{r,s}}\\&{{\mathrm{tr}}}\big [\mathbf {C}_{i_1}\cdots \mathbf {C}_{i_{r+s}}\big \{\mathbf{I}_d/(r+s)+\mathbf {C}_3\big \}\big ]. \end{aligned}$$

With the previous notations, \(v(t_1,t_2)=\frac{1}{2}{{\mathrm{tr}}}\mathbf {C}_3\), so Lemma 4 immediately yields the desired formula for the second summand.

For the first one, combining the chain rule with the formula for the differential of a determinant given in Magnus and Neudecker (1999, Chapter 8), it follows that \(\frac{\partial }{\partial t_1}u(t_1,t_2)={{\mathrm{tr}}}\mathbf {C}_1\). So, applying Lemma 4 to \(\frac{\partial }{\partial t_1}u(t_1,t_2)\), we obtain

$$\begin{aligned} \frac{\partial ^{r+s}}{\partial t_1^{r}\partial t_2^{s}}u(t_1,t_2)&= 2^{r+s-1}\,(r-1)!s\\&\times \sum _{\varvec{i}\in \mathcal {MP}_{r-1,s}}{{\mathrm{tr}}}\big (\mathbf {C}_{i_1}\cdots \mathbf {C}_{i_{r+s-1}}\mathbf {C}_1\big ). \end{aligned}$$

By the symmetry of the preceding argument in \((t_1,t_2)\) and \((r,s)\), we arrive at

$$\begin{aligned}&(r+s)\times \frac{\partial ^{r+s}}{\partial t_1^{r}\partial t_2^{s}}u(t_1,t_2)\\&\quad =2^{r+s-1}\,r!s!\Big \{\sum _{\varvec{i}\in \mathcal {MP}_{r-1,s}}{{\mathrm{tr}}}\big (\mathbf {C}_{i_1}\cdots \mathbf {C}_{i_{r+s-1}}\mathbf {C}_1\big )\\&\qquad +\sum _{\varvec{i}\in \mathcal {MP}_{r,s-1}}{{\mathrm{tr}}}\big (\mathbf {C}_{i_1}\cdots \mathbf {C}_{i_{r+s-1}}\mathbf {C}_2\big )\Big \}. \end{aligned}$$

The proof is finished by noting that, clearly,

$$\begin{aligned}&\mathcal {MP}_{r,s}=\{(i_1,\ldots ,i_{r+s-1},1):\varvec{i}\in \mathcal {MP}_{r-1,s}\}\\&\cup \{(i_1,\ldots ,i_{r+s-1},2):\varvec{i}\in \mathcal {MP}_{r,s-1}\}. \end{aligned}$$

\(\square \)

Although Theorem 4 suffices to obtain a fast recursive implementation of the CV, PI and SCV criteria, here we show a slightly more general version of this result. Let us denote \(\widetilde{\eta }_{r,s}({\varvec{x}}; \mathbf {A}, \mathbf {B}, {\varvec{\Sigma }}) = [(\mathrm{vec }^\top \mathbf {A})^{\otimes r} \otimes (\mathrm{vec }^\top \mathbf {B})^{\otimes s}] \mathsf {D}^{\otimes 2r+2s} \phi _{{\varvec{\Sigma }}} ({\varvec{x}})\) for \(d \times d\) symmetric matrices \(\mathbf {A}, \mathbf {B}\), and also \(\widetilde{\eta }_{r}({\varvec{x}}; \mathbf {A}, {\varvec{\Sigma }}) \equiv \widetilde{\eta }_{r,0}({\varvec{x}}; \mathbf {A}, \mathbf{I}_d, {\varvec{\Sigma }}) = (\mathrm{vec }^\top \mathbf {A})^{\otimes r} \mathsf {D}^{\otimes 2r} \phi _{{\varvec{\Sigma }}} ({\varvec{x}})\). Notice that the \(\eta \) functionals are particular cases of the \(\widetilde{\eta }\) functionals, obtained by setting \(\mathbf {A}=\mathbf{I}_d\).

Theorem 5

For a fixed \({\varvec{x}}\), the previous \(\widetilde{\eta }\) functionals are related to the \(\nu \) functionals as follows

$$\begin{aligned}&\widetilde{\eta }_{r} ({\varvec{x}}; \mathbf {A}, {\varvec{\Sigma }}) = \phi _{{\varvec{\Sigma }}} ({\varvec{x}}) \nu _r\big (\mathbf {A}; {\varvec{\Sigma }}^{-1} {\varvec{x}}, -{\varvec{\Sigma }}^{-1}\big ) \\&\widetilde{\eta }_{r,s} ({\varvec{x}}; \mathbf {A}, \mathbf {B}, {\varvec{\Sigma }}) = \phi _{{\varvec{\Sigma }}} ({\varvec{x}}) \nu _{r,s}\big (\mathbf {A}, \mathbf {B}; {\varvec{\Sigma }}^{-1} {\varvec{x}}, -{\varvec{\Sigma }}^{-1}\big ). \end{aligned}$$

Proof

Notice that (9) entails \(\nu _r(\mathbf {A};{\varvec{\mu }},{\varvec{\Sigma }})=(\mathrm{vec }^\top \mathbf {A})^{\otimes r}\varvec{\mathcal H}_{2r}({\varvec{\mu }};-{\varvec{\Sigma }})\). Moreover, from Theorem 3.1 in Holmquist (1996a), \(({\varvec{\Sigma }}^{-1})^{\otimes 2r}\varvec{\mathcal H}_{2r}({\varvec{x}};{\varvec{\Sigma }})=\varvec{\mathcal H}_{2r}({\varvec{\Sigma }}^{-1}{\varvec{x}};{\varvec{\Sigma }}^{-1})\). Therefore,

$$\begin{aligned} \widetilde{\eta }_r({\varvec{x}};\mathbf {A},{\varvec{\Sigma }})&= (\mathrm{vec }^\top \mathbf {A})^{\otimes r} \mathsf {D}^{\otimes 2r} \phi _{{\varvec{\Sigma }}}({\varvec{x}})\\&= \phi _{\varvec{\Sigma }}({\varvec{x}})(\mathrm{vec }^\top \mathbf {A})^{\otimes r}({\varvec{\Sigma }}^{-1})^{\otimes 2r}\varvec{\mathcal H}_{2r}({\varvec{x}};{\varvec{\Sigma }})\\&= \phi _{\varvec{\Sigma }}({\varvec{x}})(\mathrm{vec }^\top \mathbf {A})^{\otimes r}\varvec{\mathcal H}_{2r}({\varvec{\Sigma }}^{-1}{\varvec{x}};{\varvec{\Sigma }}^{-1})\\&= \phi _{\varvec{\Sigma }}({\varvec{x}})\nu _r\big (\mathbf {A}; {\varvec{\Sigma }}^{-1} {\varvec{x}}, -{\varvec{\Sigma }}^{-1}\big ), \end{aligned}$$

as desired. The proof for \(\widetilde{\eta }_{r,s}\) follows analogously. \(\square \)

Appendix 2: Generation of all the permutations with repetitions

A preliminary step to the methods described in Sects. 3, 4 and 5 involves generating the set of all the permutations with repetitions \(\mathcal {PR}_{d,r}\). This set can be portrayed as a matrix \(\mathbf {P}\) of order \(d^r\times r\), whose \((i,j)\)-th entry represents the \(j\)-th coordinate of the \(i\)-th permutation in \(\mathcal {PR}_{d,r}\).

Moreover, in view of Sect. 5 it seems convenient to keep the natural order of these permutations induced by the formulation \(\mathcal {PR}_{d,r}=\{p^{-1}(i):i=1,\ldots ,d^r\}\). Hence, in our construction the vector \(p^{-1}(i)\) will constitute the \(i\)-th row of \(\mathbf {P}\).

Let \(\lfloor x\rfloor \) denote the integer part of a real number \(x\), that is, the largest integer not greater than \(x\). Then, if \(i=p(i_1,\ldots ,i_r)=1+\sum _{j=1}^r(i_j-1)d^{j-1}\) with \(i_1,\ldots ,i_r\in \{1,\ldots ,d\}\), it is not hard to show that \(\lfloor (i-1)/d^{k-1}\rfloor =\sum _{j=k}^r(i_j-1)d^{j-k}\) for \(k=1,\ldots ,r\), so that the \(j\)-th coordinate of the vector \(\varvec{i}=(i_1,\ldots ,i_r)=p^{-1}(i)\) can be expressed as \(i_j=\lfloor (i-1)/d^{j-1}\rfloor -d\lfloor (i-1)/d^j\rfloor +1\).

Thus, assuming that a \(\mathtt{floor}()\) function is available that can be applied element-wise to a matrix and returns the integer part of each of its entries, the set \(\mathcal {PR}_{d,r}\) is efficiently obtained as the matrix \(\mathbf {P}=\mathtt{floor}(\mathbf {Q}_{-(r+1)})-d\cdot \mathtt{floor}(\mathbf {Q}_{-1})+1\), where \(\mathbf {Q}\) is the \(d^r\times (r+1)\) matrix whose \((i,j)\)-th entry is \((i-1)/d^{j-1}\) for \(i=1,\ldots ,d^r\) and \(j=1,\ldots ,r+1\), and \(\mathbf {Q}_{-k}\) denotes the sub-matrix obtained from \(\mathbf {Q}\) by deleting its \(k\)-th column.
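
In R, for instance, the whole construction takes a few lines. This is our own sketch of the recipe just described (the supplementary material may organize it differently, and \(\mathtt{perm.rep}\) is a name we introduce here):

```r
# Matrix P of Appendix 2: row i is p^{-1}(i), for i = 1, ..., d^r
perm.rep <- function(d, r) {
  i <- 1:(d^r)
  Q <- outer(i - 1, 0:r, function(x, j) x / d^j)  # column j+1 holds (i-1)/d^j
  floor(Q[, 1:r, drop = FALSE]) - d * floor(Q[, 2:(r + 1), drop = FALSE]) + 1
}
perm.rep(2, 3)   # the 8 rows (1,1,1), (2,1,1), (1,2,1), ..., (2,2,2)
```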


Cite this article

Chacón, J.E., Duong, T. Efficient recursive algorithms for functionals based on higher order derivatives of the multivariate Gaussian density. Stat Comput 25, 959–974 (2015). https://doi.org/10.1007/s11222-014-9465-1
