1 Introduction

Testing for multivariate normality (MVN, for short) is a topic of ongoing interest. A survey of dozens of MVN tests, including graphical procedures for assessing multivariate normality, is provided by Mecklin and Mundfrom (2004). The review of Henze (2002) concentrates on affine invariant and consistent procedures, and the book of Thode (2002) contains a chapter on testing for MVN.

In a standard setting, let \(X,X_1,X_2,\ldots \) be independent identically distributed (i.i.d.) d-variate random (column) vectors, which are defined on a common probability space \((\varOmega ,{{{\mathcal {A}}}},{{\mathbb {P}}})\). The distribution of X will be denoted by \({{\mathbb {P}}}^X\). We write \(\hbox {N}_d(\mu ,\varSigma )\) for the d-variate normal distribution with expectation \(\mu \) and covariance matrix \(\varSigma \), and we let

$$\begin{aligned} {{{\mathcal {N}}}}_d := \{{\mathrm{N}}_d(\mu ,\varSigma ): \mu \in {{\mathbb {R}}}^d, \varSigma \ \text {positive definite}\} \end{aligned}$$

denote the class of all non-degenerate d-variate normal distributions. Testing for d-variate normality means testing the hypothesis

$$\begin{aligned} H_0: {{\mathbb {P}}}^X \in {{{\mathcal {N}}}}_d, \end{aligned}$$

against general alternatives, on the basis of \(X_1,\ldots ,X_n\). At the outset, it should be stressed that, in practice, any model can hold only approximately. In particular, there can only be approximate normality, in whatever sense. Consequently, there is the following basic drawback inherent in any goodness-of-fit test, not only of \(H_0\), but also of other families of distributions: If a level-\(\alpha \) test of \(H_0\) does not lead to a rejection of \(H_0\), the null hypothesis is by no means ‘validated’ or ‘confirmed.’ Presumably, there is merely not enough evidence to reject it! A further fundamental point is that there cannot be an optimal test of \(H_0\) if one really wants to detect general alternatives. In this respect, Janssen (2000) shows that the global power function of any nonparametric test is flat on balls of alternatives, except for alternatives coming from a finite-dimensional subspace. Thus, loosely speaking, each test of \(H_0\) has its own ‘non-centrality.’

Regarding the task of reviewing MVN tests here in 2020, we cite Mecklin and Mundfrom (2004), who write ‘the continuing proliferation of papers with new methods of assessing MVN makes it virtually impossible for any single survey article to cover all available tests.’ And they continue: ‘When compared to the amount of work that has been done in developing these tests, relatively little work has been done in evaluating the quality and power of the procedures.’

This review, too, can only be partial. We will take the above testing problem seriously and concentrate on genuine tests of \(H_0\) that have been proposed since the review of Henze (2002), and we will judge each of these according to the following points of view:

  • affine invariance

  • theoretical properties (limit distributions under \(H_0\) and under fixed and contiguous alternatives to \(H_0\), consistency)

  • feasibility with respect to sample size and dimension.

Thus, e.g., we will not deal with tests for \(H_0\) that allow for \(n \le d\) (see Tan et al. 2005 or Yamada and Himeno 2019), since the condition \(n \ge d+1\) is necessary to decide whether the underlying covariance matrix is non-degenerate. Moreover, unlike the review of Mecklin and Mundfrom (2004), we will not discuss purely graphical procedures, as proposed in Holgersson (2006). We will also not embark upon a review of tests for normality in non-i.i.d. settings, like testing for Gaussianity of the innovations in MGARCH processes (see, e.g., Lee and Ng 2011 or Lee et al. 2014), or situations with incomplete data (see, e.g., Yamada et al. 2015), since such a task would go beyond the scope of this review. Finally, we will not review tests for Gaussianity in infinite-dimensional Hilbert spaces; see, e.g., Górecki et al. (2020) or Kellner and Celisse (2019).

Regarding affine invariance, notice that the class \({{{\mathcal {N}}}}_d\) is closed with respect to full rank affine transformations. Hence, any ‘genuine’ statistic \(T_n=T_n(X_1,\ldots ,X_n)\) (say) for testing \(H_0\) should satisfy \(T_n(AX_1+b,\ldots ,AX_n+b) = T_n(X_1,\ldots ,X_n)\) for each regular \((d\times d)\)-matrix A and each \(b \in {{\mathbb {R}}}^d\). Otherwise, it would be possible to reject \(H_0\) on given data but to raise no objection against \(H_0\) on the same data after performing a rotation, which makes little, if any, sense. In the sequel, let

$$\begin{aligned} Y_{n,j} = S_n^{-1/2}(X_j-{\overline{X}}_n), \quad j=1,\ldots ,n, \end{aligned}$$
(1.1)

denote the so-called scaled residuals. Here, \({\overline{X}}_n = n^{-1}\sum _{j=1}^n X_j\) is the sample mean, \(S_n = n^{-1}\sum _{j=1}^n (X_j-{\overline{X}}_n)(X_j-{\overline{X}}_n)^\top \) stands for the sample covariance matrix of \(X_1,\ldots ,X_n\), and the superscript \(\top \) denotes transposition of column vectors. The matrix \(S_n^{-1/2}\) is the unique symmetric square root of \(S_n^{-1}\). The latter matrix exists almost surely if \(n \ge d+1\) and \({{\mathbb {P}}}^X\) is absolutely continuous with respect to d-dimensional Lebesgue measure, see Eaton and Perlman (1973). These assumptions will be in force in what follows. We remark that \(S_n\) is sometimes defined with the factor \((n-1)^{-1}\) instead of \(n^{-1}\), but this difference does not have implications for asymptotic considerations. A good account of finite-sample distribution theory of \(Y_{n,1}, \ldots , Y_{n,n}\) under \(H_0\) is provided by Takeuchi (2020).
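As an illustration (our own sketch, not code from the paper), the scaled residuals (1.1) can be computed in a few lines; the unique symmetric square root of \(S_n^{-1}\) is obtained from an eigendecomposition of \(S_n\).

```python
# Minimal sketch (an assumption on our part, not from the paper) of computing
# the scaled residuals Y_{n,j} = S_n^{-1/2}(X_j - Xbar_n) of (1.1).
import numpy as np

def scaled_residuals(X):
    """X: (n, d) data matrix with rows X_j; returns the (n, d) matrix of Y_{n,j}."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)                 # subtract the sample mean
    S = Xc.T @ Xc / n                       # sample covariance (factor 1/n)
    vals, vecs = np.linalg.eigh(S)          # S is symmetric positive definite a.s.
    S_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric square root of S^{-1}
    return Xc @ S_inv_sqrt                  # row j equals Y_{n,j}^T

rng = np.random.default_rng(0)
Y = scaled_residuals(rng.normal(size=(50, 3)))
```

By construction the residuals are empirically standardized: their sample mean is the origin and their sample covariance matrix (with factor \(n^{-1}\)) is the identity.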

Affine invariance is achieved if the test statistic \(T_n\) is a function of \(Y_{n,i}^\top Y_{n,j}, \ i,j \in \{1,\ldots ,n\}\), or if \(T_n\) is a function of (only) \(Y_{n,1},\ldots ,Y_{n,n}\), and \(T_n(OY_{n,1}, \ldots ,OY_{n,n})= T_n(Y_{n,1},\ldots ,Y_{n,n})\) for each orthogonal \((d\times d)\)-matrix O. If a statistic \(T_n\) is affine invariant (henceforth invariant for the sake of brevity), the distribution of \(T_n\) under the null hypothesis \(H_0\) does not depend on the parameters \(\mu \) and \(\varSigma \) of the underlying normal distribution. Thus, regarding distribution theory under \(H_0\), we can without loss of generality assume that \({{\mathbb {P}}}^X = \text {N}_d(0,\text {I}_d)\). Here, 0 is the origin in \({{\mathbb {R}}}^d\), and \(\hbox {I}_d\) is the unit matrix of order d. But invariance of a statistic \(T_n\) also entails that it is no restriction to assume \({{\mathbb {E}}}X =0\) and \({{\mathbb {E}}}XX^\top = {\mathrm{I}}_d\) when studying the distribution of \(T_n\) under an alternative to \(H_0\) that satisfies \({{\mathbb {E}}}\Vert X\Vert ^2 < \infty \), where \(\Vert \cdot \Vert \) denotes the Euclidean norm in \({{\mathbb {R}}}^d\).

As for the second point, i.e., properties of a test of \(H_0\) based on a statistic \(T_n\) that go beyond mere simulation results, there should be a sound rationale for the test, which means that there should be good knowledge of what is estimated by \(T_n\) if the underlying distribution is not normal. This rationale is intimately connected to the property of consistency. If \(T_n\) is some invariant statistic, it must be regarded—perhaps after some suitable normalization—as an estimator of some invariant functional \({{{\mathcal {T}}}}(P)\) of the unknown underlying distribution P, where \(P={{\mathbb {P}}}^X\). This means that \({{{\mathcal {T}}}}(P)= {{{\mathcal {T}}}}({\widetilde{P}})\) if \({\widetilde{P}}\) is a full rank affine image of P, whence \({{{\mathcal {T}}}}(\cdot )\) is constant over the class \({{{\mathcal {N}}}}_d\). For such a functional, consistency of a test based on \({{{\mathcal {T}}}}\) against general alternatives cannot be expected if \({{{\mathcal {T}}}}\) does not characterize the class \({{{\mathcal {N}}}}_d\), i.e., if there are \(P_1 \in {{{\mathcal {N}}}}_d\) and \(P_2 \notin {{{\mathcal {N}}}}_d\) such that \({{{\mathcal {T}}}}(P_1) = {{{\mathcal {T}}}}(P_2)\). Examples of non-characterizing functionals are time-honored measures of multivariate skewness and kurtosis, see Sect. 8. The most prominent example is Mardia’s invariant nonnegative skewness functional

$$\begin{aligned} {{{\mathcal {T}}}}(P) = \beta _d^{(1)}(P) = {{\mathbb {E}}}\Big [ \Big ((X_1-\mu )^\top \varSigma ^{-1}(X_2-\mu )\Big )^3\Big ]. \end{aligned}$$
(1.2)

Here, \(X_1,X_2\) are i.i.d. with distribution P, mean \(\mu \) and non-singular covariance matrix \(\varSigma \). The functional \(\beta ^{(1)}_d\) does not characterize the class \({{{\mathcal {N}}}}_d\), since it vanishes not only on \({{{\mathcal {N}}}}_d\), but in particular also for each non-normal elliptically symmetric distribution for which the expectation figuring in (1.2) exists. This fact has striking consequences for a standard test of \(H_0\) that rejects \(H_0\) for large values of the sample counterpart of \(\beta ^{(1)}_d\), see Sect. 8.
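To make the functional concrete, the following sketch computes the standard sample counterpart \(n^{-2}\sum_{j,k}(Y_{n,j}^\top Y_{n,k})^3\) of (1.2) from the scaled residuals; this formula is our own assumption here, since the precise sample version \(b_{n,d}^{(1)}\) used by the paper is only given in Sect. 8.

```python
# Hedged sketch of the standard sample counterpart of Mardia's skewness
# functional (1.2): b = n^{-2} * sum_{j,k} (Y_{n,j}^T Y_{n,k})^3.
# (The paper's own definition b_{n,d}^{(1)} appears in Sect. 8.)
import numpy as np

def mardia_sample_skewness(X):
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    vals, vecs = np.linalg.eigh(S)
    Y = Xc @ (vecs @ np.diag(vals ** -0.5) @ vecs.T)   # scaled residuals (1.1)
    G = Y @ Y.T                                        # Gram matrix of Y_j^T Y_k
    return (G ** 3).sum() / n ** 2

rng = np.random.default_rng(0)
b = mardia_sample_skewness(rng.normal(size=(2000, 2)))
```

The statistic is nonnegative and close to zero for large normal samples; as the text stresses, it also tends to zero under non-normal elliptically symmetric alternatives, which is exactly the non-characterizing behavior described above.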

The paper is organized as follows: Sect. 2 gives a thorough account of general aspects of weighted \(L^2\)-statistics for testing \(H_0\), and besides the class of BHEP tests, it reviews five recently proposed tests for multivariate normality that are based on either the characteristic function, the moment generating function, or a combination thereof. Section 3 reviews the Henze–Zirkler test with bandwidth depending on sample size and dimension, which is not a weighted \(L^2\)-statistic in the sense of Sect. 2. In Sect. 4, we summarize the most important features of the by now well-established energy test of Székely and Rizzo (2005), and Section 5 deals with the test of Pudelko (2005). Section 6 reviews new theoretical results on a time-honored test of Cox and Small (1978), while Sect. 7 considers the test of Manzotti and Quiroz (2001), which is based on functions of spherical harmonics. In Sect. 8, we review tests based on skewness and kurtosis, and in Sect. 9, we try to give a brief account of further work on the subject. Section 10 presents the results of a large-scale simulation study that comprises each of the tests treated in Sects. 2–8. The final Sect. 11 draws some conclusions, and it gives an outlook for further research.

We conclude this section by pointing out some general notation. Throughout the paper, \({{{\mathcal {B}}}}^d\) stands for the \(\sigma \)-field of Borel sets in \({{\mathbb {R}}}^d\), \({{\mathcal {S}}}^{d-1} := \{x \in {{\mathbb {R}}}^d: \Vert x\Vert =1\}\) is the surface of the unit sphere in \({{\mathbb {R}}}^d\), and \(\varPhi (\cdot )\) denotes the distribution function of the standard normal distribution. The symbol \({\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}\) stands for convergence in distribution of random elements (variables, vectors and processes), and \({\mathop {\longrightarrow }\limits ^{{{{\mathbb {P}}}}}}\), \({\mathop {\longrightarrow }\limits ^{\text{ a.s. }}}\) denote convergence in probability and almost sure convergence, respectively. Each limit refers to the setting \(n \rightarrow \infty \). The symbol \({\mathop {=}\limits ^{{{{{\mathcal {D}}}}}}}\) denotes equality in distribution. Throughout the paper, each unspecified integral will be over \({{\mathbb {R}}}^d\). The acronyms (E)MGF and (E)CF stand for the (empirical) moment generating function and the (empirical) characteristic function, respectively. Finally, we write \(\mathbf{1}\{A\}\) for the indicator function of an event A.

2 Weighted \(L^2\)-statistics

In this section, we review the state of the art of weighted \(L^2\)-statistics for testing \(H_0\). These statistics have a long history, and they are also in widespread use for goodness-of-fit problems with many other distributions, see, e.g., Baringhaus et al. (2017). A weighted \(L^2\)-statistic for testing \(H_0\) takes the form

$$\begin{aligned} T_n = \int Z_n^2(t) w(t)\, {\mathrm{d}}t. \end{aligned}$$
(2.1)

Here, \(Z_n(t) = z_n(X_1,\ldots ,X_n,t)\), where \(z_n\) is a real-valued measurable function defined on the (\(n+1\))-fold Cartesian product of \({{\mathbb {R}}}^d\), and \(w:{{\mathbb {R}}}^d \rightarrow {{\mathbb {R}}}\) is a nonnegative weight function satisfying

$$\begin{aligned} \int z_n^2(x_1,\ldots ,x_n,t) w(t)\, {\mathrm{d}}t < \infty \quad {\text { for each }} (x_1,\ldots ,x_n) \in ({{\mathbb {R}}}^d)^n. \end{aligned}$$

The function \(z_n\) can also be vector-valued; then \(Z_n^2(t)\) in (2.1) is replaced with \(\Vert Z_n(t)\Vert ^2\). Typically, \(Z_n(t)\) takes the form

$$\begin{aligned} Z_n(t) = \frac{1}{\sqrt{n}} \sum _{j=1}^n \ell \left( t^\top Y_{n,j}\right) , \quad t \in {{\mathbb {R}}}^d, \end{aligned}$$
(2.2)

where \(\ell (\cdot )\) is some measurable function satisfying \(\int {{\mathbb {E}}}\big [ \ell ^2(t^\top X)\big ]w(t) \, {\mathrm{d}}t < \infty \), and \({{\mathbb {E}}}\big [ \ell (t^\top X)\big ] =0\), \(t \in {{\mathbb {R}}}^d\), if \(X {\mathop {=}\limits ^{{{{{\mathcal {D}}}}}}}{\mathrm{N}}_d(0,{\mathrm{I}}_d)\).

In view of (2.1), a natural setting to study asymptotic properties of \(T_n\) is the separable Hilbert space \({\mathbb {H}} := {\mathrm{L}}^2({{\mathbb {R}}}^d,{{{\mathcal {B}}}}^d,w(t){\mathrm{d}}t)\) of (equivalence classes of) measurable functions on \({{\mathbb {R}}}^d\) that are square-integrable with respect to \(w(t)\text {d}t\). If \(\Vert f\Vert _{\mathbb {H}}:= \left( \int f^2(t) w(t)\, {\mathrm{d}}t\right) ^{1/2}\) denotes the norm of \(f \in {\mathbb {H}}\), then \(T_n = \Vert Z_n\Vert ^2_{\mathbb {H}}\). The general approach to derive the limit distribution of \(T_n\) under \(H_0\) is to prove \(Z_n {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}Z\) for some centered Gaussian random element of \({\mathbb {H}}\), whence \(T_n {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}\Vert Z\Vert ^2_{\mathbb {H}}\) by the continuous mapping theorem. To this end, it is indispensable to approximate \(Z_n\) figuring in (2.2) by a suitable random element \(Z_{n,0}\) of \({\mathbb {H}}\) of the form

$$\begin{aligned} Z_{n,0}(t) = \frac{1}{\sqrt{n}} \sum _{j=1}^n \ell _0(t^\top X_j), \end{aligned}$$
(2.3)

where \({{\mathbb {E}}}[\ell _0(t^\top X)] =0\), \(t \in {\mathbb {R}}^d\), \(\int {{\mathbb {E}}}[\ell _0^2(t^\top X)]w(t) {\mathrm{d}}t < \infty \), and \(\Vert Z_n - Z_{n,0}\Vert _{\mathbb {H}} {\mathop {\longrightarrow }\limits ^{{{{\mathbb {P}}}}}}0\). The central limit theorem in Hilbert spaces (see, e.g., Theorem 2.7 in Bosq 2000) then yields \(Z_{n,0} {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}Z\) for some centered Gaussian element of \({\mathbb {H}}\) having covariance kernel

$$\begin{aligned} K(s,t) = {{\mathbb {E}}}\big [ \ell _0(s^\top X)\ell _0(t^\top X)\big ], \quad s,t \in {{\mathbb {R}}}^d. \end{aligned}$$

The distribution of Z is uniquely determined by the kernel \(K(\cdot ,\cdot )\), and the distribution of \(\Vert Z\Vert ^2_{{\mathbb {H}}}\) is that of \(\sum _{j=1}^\infty \lambda _j N_j^2\), where the \(N_j\) are i.i.d. standard normal random variables, and \(\lambda _j\), \(j=1,2,\ldots \), are the positive eigenvalues corresponding to eigenfunctions \(f_j\) of the (linear second-order homogeneous Fredholm) integral equation

$$\begin{aligned} \lambda f(s) = \int K(s,t) f(t) w(t)\,{\mathrm{d}}t,\quad s\in {\mathbb {R}}^d, \end{aligned}$$
(2.4)

see, e.g., Kac and Siegert (1947). The problem of finding the eigenvalues and associated eigenfunctions of (2.4) is called the kernel eigenproblem. In this respect, hitherto none of the integral equations corresponding to the tests presented in this section has been solved explicitly. Notice that knowledge of the largest eigenvalue \(\lambda _{\max }\) (say) opens the ground for the calculation of the approximate Bahadur slope and hence for statements on the Bahadur efficiency which, for asymptotically normal statistics, typically coincides with the Pitman efficiency, for details see Bahadur (1960) and Nikitin (1995).
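Although none of the kernel eigenproblems of this section is known in closed form, the eigenvalues of (2.4) can be approximated numerically by the Nyström method: discretize the integral on a quadrature grid and solve the resulting symmetric matrix eigenproblem. The following sketch uses the Brownian-bridge kernel \(K(s,t)=\min(s,t)-st\) on \([0,1]\) with \(w\equiv 1\) purely as a hypothetical test case, since its eigenvalues \(1/(k\pi)^2\) are known exactly and serve as a check.

```python
# Hypothetical illustration of the Nystroem method for the kernel eigenproblem
# (2.4). The Brownian-bridge kernel K(s,t) = min(s,t) - s*t on [0,1] (w = 1) is
# used only because its eigenvalues 1/(k*pi)^2 are known in closed form.
import numpy as np

def nystroem_eigenvalues(kernel, grid, weights):
    """Approximate the leading eigenvalues of lambda*f(s) = int K(s,t) f(t) w(t) dt."""
    K = kernel(grid[:, None], grid[None, :])
    # Symmetrized discretization: eigenvalues of D^{1/2} K D^{1/2}, D = diag(weights)
    D_sqrt = np.sqrt(weights)
    M = D_sqrt[:, None] * K * D_sqrt[None, :]
    return np.sort(np.linalg.eigvalsh(M))[::-1]   # descending order

m = 400
grid = (np.arange(m) + 0.5) / m            # midpoint rule on [0, 1]
weights = np.full(m, 1.0 / m)
lam = nystroem_eigenvalues(lambda s, t: np.minimum(s, t) - s * t, grid, weights)
# lam[0] is close to 1/pi^2, lam[1] close to 1/(2*pi)^2
```

For an actual test statistic, one would replace the toy kernel by the covariance kernel \(K(s,t)\) of the limiting Gaussian element and include the weight function \(w\) in the quadrature weights.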

To find a random element \(Z_{n,0}\) of the form (2.3) that approximates \(Z_n\), one has to evaluate the effect of replacing \(Y_{n,j}\) in (2.2) with \(X_j\). Putting \(\varDelta _{n,j} = Y_{n,j}-X_j\), \(j=1,\ldots ,n\), the following result, taken from Dörr et al. (2020), is helpful.

Proposition 1

Let \(X, X_1,X_2, \ldots \) be i.i.d. random vectors satisfying \({{\mathbb {E}}}\Vert X\Vert ^4 < \infty \), \({{\mathbb {E}}}(X) =0\) and \({{\mathbb {E}}}(X X^\top ) = {\mathrm{I}}_d\). We then have

$$\begin{aligned} \sum _{j=1}^n \Vert \varDelta _{n,j}\Vert ^2 = O_{{{\mathbb {P}}}}(1), \quad \frac{1}{n} \sum _{j=1}^n \Vert \varDelta _{n,j}\Vert ^2 {\mathop {\longrightarrow }\limits ^{\text{ a.s. }}}0, \quad \max _{j=1,\ldots ,n} \Vert \varDelta _{n,j}\Vert = o_{{\mathbb {P}}}\left( n^{-1/4}\right) . \end{aligned}$$

Since \(\ell (t^\top Y_{n,j}) = \ell (t^\top X_j + t^\top \varDelta _{n,j})\), the function \(\ell (\cdot )\) must be smooth enough to allow for a Taylor expansion. To tackle the linear part in this expansion, it is crucial to have some information on \(\varDelta _{n,j} = (S_n^{-1/2}-{\mathrm{I}}_d)X_j - S_n^{-1/2}{\overline{X}}_n\). Such information is provided by display (2.13) of Henze and Wagner (1997), according to which

$$\begin{aligned} \sqrt{n}(S_n^{-1/2}- {\mathrm{I}}_d) = - \frac{1}{2\sqrt{n}} \sum _{j=1}^n \left( X_jX_j^\top - {\mathrm{I}}_d \right) + O_{{\mathbb {P}}}\left( n^{-1/2}\right) . \end{aligned}$$

Since Proposition 1 holds under general assumptions, one may often obtain asymptotic normality of weighted \(L^2\)-statistics under fixed alternatives. To this end, notice that

$$\begin{aligned} \frac{T_n}{n} = \int \bigg ( \frac{1}{n}\sum _{j=1}^n \ell (t^\top Y_{n,j})\bigg )^2 w(t)\, {\mathrm{d}}t. \end{aligned}$$

Under suitable conditions, we will have \(T_n/n {\mathop {\longrightarrow }\limits ^{{{{\mathbb {P}}}}}}\varDelta \), where \(\varDelta = \Vert z\Vert ^2_{\mathbb {H}}\), and \(z(t) = {{\mathbb {E}}}\big [ \ell (t^\top X)\big ]\), \(t \in {{\mathbb {R}}}^d\). An immediate consequence of this stochastic convergence is the consistency of a test for \(H_0\) based on \(T_n\) against each alternative distribution that satisfies \(\varDelta >0\). But we have more! Writing \(\langle u,v \rangle _{\mathbb {H}} = \int u(t)v(t) w(t) \, \text {d}t\) for the inner product in \({\mathbb {H}}\), there is the decomposition

$$\begin{aligned} \sqrt{n}\left( \frac{T_n}{n} - \varDelta \right)= & {} \sqrt{n} \left( \Vert Z_n\Vert ^2_{{\mathbb {H}}} - \Vert z\Vert ^2_{{\mathbb {H}}} \right) \\= & {} \sqrt{n}\big \langle Z_n-z,Z_n+z \big \rangle _{\mathbb {H}}\\= & {} \sqrt{n}\big \langle Z_n-z,2z + Z_n-z \big \rangle _{\mathbb {H}}\\= & {} 2\big \langle \sqrt{n}(Z_n-z),z\big \rangle _{\mathbb {H}} + \frac{1}{\sqrt{n}}\Vert \sqrt{n}(Z_n-z)\Vert ^2_{{\mathbb {H}}}. \end{aligned}$$

These lines carve out the quintessence of asymptotic normality of weighted \(L^2\)-statistics under fixed alternatives. Namely, if one can show that the sequence \(V_n := \sqrt{n}(Z_n-z)\) of random elements of \({\mathbb {H}}\) converges in distribution to some centered Gaussian random element V of \({\mathbb {H}}\), then, by the continuous mapping theorem and Slutsky's lemma, we have

$$\begin{aligned} \sqrt{n}\left( \frac{T_n}{n} - \varDelta \right) \ {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}\ \mathrm{{N}}(0,\sigma ^2), \end{aligned}$$
(2.5)

where

$$\begin{aligned} \sigma ^2 = 4 \iint K(s,t) z(s)z(t) w(s)w(t) \ \text {d}s\text {d}t, \end{aligned}$$

and \(K(\cdot ,\cdot )\) is the covariance kernel of V, see Theorem 1 of Baringhaus et al. (2017). As a consequence, if \({\widehat{\sigma }}_n^2\) is a consistent estimator of \(\sigma ^2\) based on \(X_1,\ldots ,X_n\), then, for given \(\alpha \in (0,1)\),

$$\begin{aligned} I_{n,1-\alpha } = \bigg [ \frac{T_n}{n} - \varPhi ^{-1}\left( 1 - \frac{\alpha }{2}\right) \frac{{\widehat{\sigma }}_{n}}{\sqrt{n}}, \frac{T_n}{n} + \varPhi ^{-1}\left( 1 - \frac{\alpha }{2}\right) \frac{{\widehat{\sigma }}_{n}}{\sqrt{n}}\bigg ] \end{aligned}$$
(2.6)

is an asymptotic confidence interval for \(\varDelta \) of level \(1-\alpha \). Moreover, from (2.5) and Slutsky's lemma, we have

$$\begin{aligned} \frac{\sqrt{n}}{{\widehat{\sigma }}_n}\left( \frac{T_n}{n} - \varDelta \right) \ {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}\ \mathrm{{N}}(0,1), \end{aligned}$$
(2.7)

which opens the ground for a validation of a certain neighborhood of \(H_0\). Namely, suppose that we want to tolerate a given ‘distance’ \(\varDelta _0\) to the class \({{{\mathcal {N}}}}_d\). We may then consider the ‘inverse’ testing problem

$$\begin{aligned} H_{\varDelta _0}: \varDelta ({{\mathbb {P}}}^X) \ge \varDelta _0 \ \text { against } \ K_{\varDelta _0}: \varDelta ({{\mathbb {P}}}^X) < \varDelta _0. \end{aligned}$$

Here, the dependence of \(\varDelta \) on the underlying distribution \({{\mathbb {P}}}^X\) has been made explicit.

From (2.7), the test which rejects \(H_{\varDelta _0}\) if

$$\begin{aligned} \frac{T_n}{n} \ \le \ \varDelta _0 - \frac{{\widehat{\sigma }}_n}{\sqrt{n}} \varPhi ^{-1}(1-\alpha ), \end{aligned}$$

has asymptotic level \(\alpha \), and it is consistent against general alternatives, see Section 3.3 of Baringhaus et al. (2017). Notice that this test is in the spirit of bioequivalence testing (see, e.g., Czado et al. 2007; Dette and Munk 2003 or Wellek 2010), since it aims at validating a certain neighborhood of a hypothesized model.
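A sketch of how the confidence interval (2.6) and the above rejection rule translate into code; \(T_n/n\) and the estimator \(\widehat{\sigma }_n\) are taken as given inputs here, since both are specific to the statistic at hand.

```python
# Sketch (not from the paper) of the asymptotic confidence interval (2.6) and
# of the 'inverse' test that rejects H_{Delta_0} at asymptotic level alpha.
# Tn_over_n and sigma_hat are assumed to be computed elsewhere from the data.
from math import sqrt
from statistics import NormalDist

def confidence_interval(Tn_over_n, sigma_hat, n, alpha=0.05):
    """Asymptotic level-(1-alpha) confidence interval (2.6) for Delta."""
    q = NormalDist().inv_cdf(1 - alpha / 2)
    half = q * sigma_hat / sqrt(n)
    return (Tn_over_n - half, Tn_over_n + half)

def reject_H_Delta0(Tn_over_n, sigma_hat, n, Delta0, alpha=0.05):
    """Reject H_{Delta_0} if T_n/n <= Delta_0 - (sigma_hat/sqrt(n)) * z_{1-alpha}."""
    q = NormalDist().inv_cdf(1 - alpha)
    return Tn_over_n <= Delta0 - sigma_hat / sqrt(n) * q
```

For instance, with \(T_n/n = 0.01\), \(\widehat{\sigma }_n = 0.1\), \(n=100\) and \(\varDelta_0 = 0.05\), the rule rejects \(H_{\varDelta_0}\) and thereby validates the \(\varDelta_0\)-neighborhood of \(H_0\).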

We now review the time-honored class of BHEP tests and several recently suggested \(L^2\)-statistics for testing \(H_0\). Each of these statistics has an upper rejection region, and it is invariant, because it is a function of \(Y_{n,j}^\top Y_{n,k}\), where \(j,k \in \{1,\ldots ,n\}\).

2.1 The BHEP tests

Generalizing a test for univariate normality based on the ECF due to Epps and Pulley (1983), the first proposals for weighted \(L^2\)-statistics for testing \(H_0\) are due to Baringhaus and Henze (1988) and Henze and Zirkler (1990), who considered the statistic

$$\begin{aligned} \text {BHEP}_{n,\beta } = n \int \big | \varPsi _n(t) - \varPsi _0(t)\big |^2 w_\beta (t) \, \text {d}t. \end{aligned}$$
(2.8)

Here,

$$\begin{aligned} \varPsi _n(t) = \frac{1}{n} \sum _{j=1}^n \exp (\text {i}t^\top Y_{n,j}), \qquad t \in {{\mathbb {R}}}^d, \end{aligned}$$
(2.9)

denotes the ECF of \(Y_{n,1}, \ldots , Y_{n,n}\), \(\varPsi _0(t) = \exp (-\Vert t\Vert ^2/2)\) is the CF of the distribution \({\mathrm{N}}_d(0,{\mathrm{I}}_d)\), and the weight function \(w_\beta \) is given by

$$\begin{aligned} w_\beta (t) = \left( 2\pi \beta ^2\right) ^{-d/2} \exp \left( - \frac{\Vert t\Vert ^2}{2\beta ^2}\right) , \end{aligned}$$
(2.10)

where \(\beta >0\) is a fixed constant. That \(\text {BHEP}_{n,\beta }\) is indeed of the type (2.1) will become clear from the representation (2.13).

Whereas Baringhaus and Henze (1988) studied the special case \(\beta =1\), the general case was treated by Henze and Zirkler (1990). An extremely appealing feature of the weight function \(w_\beta \) in (2.10) is that \(\hbox {BHEP}_{n,\beta }\) takes the feasible form

$$\begin{aligned} \text {BHEP}_{n,\beta }= & {} \frac{1}{n} \sum _{j,k=1}^n \exp \left( -\frac{\beta ^2\Vert Y_{n,j}-Y_{n,k}\Vert ^2}{2}\right) \nonumber \\&- \frac{2}{(1+\beta ^2)^{d/2}} \sum _{j=1}^n \exp \left( - \frac{\beta ^2\Vert Y_{n,j}\Vert ^2}{2(1+\beta ^2)} \right) + \frac{n}{(1+2\beta ^2)^{d/2}}. \end{aligned}$$
(2.11)
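The closed form (2.11) is straightforward to implement; the following is a direct sketch of our own (not reference code from the paper), with the scaled residuals computed as in (1.1).

```python
# Sketch implementation of BHEP_{n,beta} via the closed form (2.11).
import numpy as np

def bhep(X, beta=1.0):
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    vals, vecs = np.linalg.eigh(S)
    Y = Xc @ (vecs @ np.diag(vals ** -0.5) @ vecs.T)          # scaled residuals (1.1)
    D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)   # ||Y_j - Y_k||^2
    norms2 = (Y ** 2).sum(axis=1)                             # ||Y_j||^2
    t1 = np.exp(-beta ** 2 * D2 / 2).sum() / n
    t2 = 2 * (1 + beta ** 2) ** (-d / 2) * np.exp(
        -beta ** 2 * norms2 / (2 * (1 + beta ** 2))).sum()
    t3 = n * (1 + 2 * beta ** 2) ** (-d / 2)
    return t1 - t2 + t3

rng = np.random.default_rng(0)
t_normal = bhep(rng.normal(size=(200, 2)))        # normal data: moderate value
t_expon = bhep(rng.exponential(size=(200, 2)))    # skewed alternative: large value
```

Since (2.11) equals \(n\int |\varPsi_n(t)-\varPsi_0(t)|^2 w_\beta(t)\,\mathrm{d}t\), the statistic is nonnegative, and it grows at rate \(n\) under a fixed alternative such as the exponential distribution.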

The BHEP test is the most thoroughly studied class of tests for multivariate normality. Csörgő (1989) coined the acronym BHEP for this class of tests for \(H_0\), after early developers of the idea, and he proved that \(\liminf _{n\rightarrow \infty } n^{-1} {\mathrm{BHEP}}_{n,\beta } \ge C({{\mathbb {P}}}^X,\beta ) >0\) almost surely for some constant \(C({{\mathbb {P}}}^X,\beta )\) if \({{\mathbb {P}}}^X\) does not belong to \({{{\mathcal {N}}}}_d\). As a consequence, a test for normality based on \({\mathrm{BHEP}}_{n,\beta }\) is consistent against any alternative.

If \({{\mathbb {E}}}\Vert X\Vert ^2 < \infty \) and \({{\mathbb {E}}}X =0\), \({{\mathbb {E}}}XX^\top = {\mathrm{I}}_d\) (the last two assumptions entail no loss of generality in view of invariance), then

$$\begin{aligned} \frac{1}{n} {\mathrm{BHEP}}_{n,\beta } {\mathop {\longrightarrow }\limits ^{\text{ a.s. }}}\varDelta _\beta := \int \big | \varPsi (t) - \varPsi _0(t) \big |^2 w_\beta (t)\, {\mathrm{d}}t \end{aligned}$$
(2.12)

(Baringhaus and Henze 1988), where \(\varPsi (t) = {{\mathbb {E}}}\exp ({\mathrm{i}}t^\top X)\), \(t \in {\mathbb {R}}^d\), is the CF of X. Hence, \(\varDelta _\beta = \varDelta _\beta ({{\mathbb {P}}}^X)\) is the functional associated with the BHEP test. Using a Hilbert space setting, Gürtler (2000) proved (2.5) for \(T_n = {\mathrm{BHEP}}_{n,\beta }\), where \(\varDelta = \varDelta _\beta \) and \(\sigma ^2 = \sigma _\beta ^2\) depend on \(\beta \), under each alternative distribution satisfying \({{\mathbb {E}}}\Vert X\Vert ^4 < \infty \). Moreover, Gürtler (2000) obtained a sequence \({\widehat{\sigma }}^2_{n,\beta }\) of consistent estimators of \(\sigma _\beta ^2\) and thus an asymptotic confidence interval of the type (2.6).

In view of the representation (2.11), Baringhaus and Henze (1988) and Henze and Zirkler (1990) obtained the limit null distribution of \(\hbox {BHEP}_{n,\beta }\) as \(n \rightarrow \infty \) by means of the theory of V-statistics with estimated parameters. Upon observing that

$$\begin{aligned} \text {BHEP}_{n,\beta } = \int Z_n^2(t) \, w_\beta (t)\, \text {d}t, \end{aligned}$$
(2.13)

where \(Z_n(t) = n^{-1/2} \sum _{j=1}^n \left( \cos (t^\top Y_{n,j}) + \sin (t^\top Y_{n,j}) - \varPsi _0(t)\right) \), Henze and Wagner (1997) considered \(Z_n(\cdot )\) as a random element in a certain Fréchet space of random functions, and they showed that \(Z_n\) converges in distribution in that space to some centered Gaussian random element Z, see Theorem 2.1 of Henze and Wagner (1997). Moreover, \(\hbox {BHEP}_{n,\beta } {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}\int Z^2(t)\, w_\beta (t) \, \text {d} t\), and the test is able to detect a sequence of contiguous alternatives that approach \(H_0\) at the rate \(n^{-1/2}\). Henze and Wagner (1997) also obtained the first three moments of the limit null distribution of \(\hbox {BHEP}_{n,\beta }\). Finally, the class of BHEP tests is ‘closed at the boundaries’ \(\beta \rightarrow 0\) and \(\beta \rightarrow \infty \) since, elementwise on the underlying probability space, we have

$$\begin{aligned} \lim _{\beta \rightarrow 0} \frac{{\mathrm{BHEP}}_{n,\beta }}{\beta ^6} = \frac{n}{6} \cdot b_{n,d}^{(1)} + \frac{n}{4} \cdot {\widetilde{b}}_{n,d}^{(1)}, \end{aligned}$$
(2.14)

where \(b_{n,d}^{(1)}\) and \({\widetilde{b}}_{n,d}^{(1)}\) are given in (8.1) and (8.3), respectively, see Henze (1997b). Thus, as \(\beta \rightarrow 0\), a scaled version of \({\mathrm{BHEP}}_{n,\beta }\) is approximately a linear combination of two measures of multivariate skewness. The limit distribution of the right-hand side of (2.14) under general distributional assumptions on X has been studied by Henze (1997b). Last but not least, we have

$$\begin{aligned} \lim _{\beta \rightarrow \infty } \beta ^d \left( {\mathrm{BHEP}}_{n,\beta }-1 \right) = \frac{n}{2^{d/2}} - 2 \sum _{j=1}^n \exp \left( - \frac{\Vert Y_{n,j}\Vert ^2}{2}\right) , \end{aligned}$$
(2.15)

see Henze (1997b). Hence, as \(\beta \rightarrow \infty \), rejection of \(H_0\) for large values of \({\mathrm{BHEP}}_{n,\beta }\) means rejection of \(H_0\) for small values of \(\sum _{j=1}^n \exp (-\Vert Y_{n,j}\Vert ^2/2)\). The latter statistic, like Mardia’s measure of multivariate kurtosis \(b_{n,d}^{(2)}\) (see (8.1)), merely investigates an aspect of the ‘radial part’ of the underlying distribution.

Guided by theoretical and simulation-based results in the univariate case, Tenreiro (2009) performed an extensive simulation study on the power of the BHEP test for dimensions \(d \in \{2,3,\ldots ,10,12,15 \}\) and sample sizes \(n \in \{20,40,60,80,100\}\). He concluded that the choice \(\beta =0.5\) gives ‘the best results for long tailed or moderately skewed alternatives, but it also produces very poor results for short tailed alternatives.’ If no relevant information about the tail of the alternatives is available, he strongly recommends the use of \(\beta = \sqrt{2}/(1.376+0.075d)\) (in fact, his recommendation is in terms of \(h = 1/(\beta \sqrt{2})\)), and there are similar recommendations for short-tailed as well as for long-tailed or moderately skewed alternatives, respectively.
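For convenience, Tenreiro's default choice and the equivalent bandwidth \(h\) can be wrapped as follows (a trivial helper of ours, not code from the paper).

```python
# Tenreiro's (2009) recommended default smoothing parameter for the BHEP test,
# beta = sqrt(2) / (1.376 + 0.075 d), and the equivalent bandwidth h = 1/(beta*sqrt(2)).
def tenreiro_beta(d):
    return 2 ** 0.5 / (1.376 + 0.075 * d)

def tenreiro_h(d):
    # Algebraically, 1/(beta * sqrt(2)) simplifies to (1.376 + 0.075 d) / 2.
    return (1.376 + 0.075 * d) / 2
```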

2.2 A weighted \(L^2\)-statistic via the moment generating function

Henze and Jiménez-Gamero (2019) generalized results of Henze and Koch (2020) to the multivariate case and considered an MGF analogue of the BHEP test statistic. Letting

$$\begin{aligned} M_n(t) = \frac{1}{n} \sum _{j=1}^n \exp \left( t^\top Y_{n,j}\right) , \quad t\in {{\mathbb {R}}}^d, \end{aligned}$$
(2.16)

denote the EMGF of \(Y_{n,1}, \ldots ,Y_{n,n}\), and writing \( M_0(t)=\exp (\Vert t\Vert ^2/2)\), \(t \in {{\mathbb {R}}}^d\), for the MGF of the standard normal distribution \(\hbox {N}_d(0,\text {I}_d)\), the test statistic is

$$\begin{aligned} {\mathrm{HJ}}_{n,\gamma } = n \int \left( M_n(t) - M_0(t) \right) ^2 \, {\widetilde{w}}_\gamma (t) \, \text {d}t, \end{aligned}$$
(2.17)

where

$$\begin{aligned} {\widetilde{w}}_\gamma (t) = \exp \left( -\gamma \Vert t\Vert ^2 \right) , \end{aligned}$$
(2.18)

and \(\gamma >2\) is some fixed parameter. Notice that the condition \(\gamma >1\) is necessary for the integral in (2.17) to be finite, and the more stringent condition \(\gamma >2\) is needed for asymptotics under \(H_0\).

The test statistic \({\mathrm{HJ}}_{n,\gamma }\) has a representation analogous to (2.11) (see display (1.4) of Henze and Jiménez-Gamero 2019). Elementwise on the underlying probability space, we have

$$\begin{aligned} \lim _{\gamma \rightarrow \infty } \gamma ^{3+d/2} \, \frac{6 {\mathrm{HJ }}_{n,\gamma }}{\pi ^{d/2}} = \frac{n}{6} \cdot b_{n,d}^{(1)} + \frac{n}{4} {\widetilde{b}}_{n,d}^{(1)} \end{aligned}$$
(2.19)

which, interestingly, is the same limit as in (2.14). By working in the Hilbert space \({\mathrm{L}}^2({{\mathbb {R}}}^d,{{{\mathcal {B}}}}^d,{\widetilde{w}}_\gamma (t){\mathrm{d}}t)\) of (equivalence classes of) measurable functions on \({{\mathbb {R}}}^d\) that are square-integrable with respect to \({\widetilde{w}}_\gamma (t)\text {d}t\), Henze and Jiménez-Gamero (2019) derived the limit null distribution of \({\mathrm{HJ}}_{n,\gamma }\), which is that of \({\mathrm{HJ}}_{\infty ,\gamma } := \int W^2(t) {\widetilde{w}}_\gamma (t) \, {\mathrm{d}}t\), where W is some centered Gaussian random element of that space. Henze and Jiménez-Gamero (2019) also obtained the expectation and the variance of \({\mathrm{HJ}}_{\infty ,\gamma }\). Moreover, if X is a (standardized) alternative distribution with the property \(M(t) :=\) \({{\mathbb {E}}}(\exp (t^\top X)) < \infty \), \(t \in {{\mathbb {R}}}^d\), then

$$\begin{aligned} \liminf _{n \rightarrow \infty } \frac{{\mathrm{HJ}}_{n,\gamma }}{n} \ \ge \ \int \left( M(t) -M_0(t) \right) ^2 \, {\widetilde{w}}_\gamma (t) \, \mathrm {d} t \qquad {{\mathbb {P}}}\text {-almost surely}. \end{aligned}$$
(2.20)

This inequality implies the consistency of the MVN test based on \({\mathrm{HJ}}_{n,\gamma }\) against those alternatives that have a finite MGF. One may even conjecture that this test is consistent against any alternative to \(H_0\).
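The closed form of \({\mathrm{HJ}}_{n,\gamma }\) from display (1.4) of Henze and Jiménez-Gamero (2019) is not reproduced here. As a hypothetical alternative route of our own, note that \({\widetilde{w}}_\gamma (t) = (\pi /\gamma )^{d/2}\) times the \({\mathrm{N}}_d(0,(2\gamma )^{-1}{\mathrm{I}}_d)\) density, so (2.17) can be approximated by Monte Carlo integration over that Gaussian.

```python
# Monte Carlo sketch (not the paper's closed form) of HJ_{n,gamma} in (2.17):
# exp(-gamma*||t||^2) = (pi/gamma)^{d/2} * density of N_d(0, (2*gamma)^{-1} I_d).
import numpy as np

def hj_monte_carlo(X, gamma=3.0, n_mc=50_000, seed=1):
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    vals, vecs = np.linalg.eigh(S)
    Y = Xc @ (vecs @ np.diag(vals ** -0.5) @ vecs.T)   # scaled residuals (1.1)
    rng = np.random.default_rng(seed)
    T = rng.normal(scale=(2 * gamma) ** -0.5, size=(n_mc, d))
    Mn = np.exp(T @ Y.T).mean(axis=1)                  # EMGF (2.16) at each t
    M0 = np.exp((T ** 2).sum(axis=1) / 2)              # MGF of N_d(0, I_d)
    return n * (np.pi / gamma) ** (d / 2) * ((Mn - M0) ** 2).mean()

rng = np.random.default_rng(0)
stat = hj_monte_carlo(rng.normal(size=(40, 2)))
```

The Monte Carlo error decreases like \(n_{\mathrm{mc}}^{-1/2}\); for production use, the exact closed-form representation from Henze and Jiménez-Gamero (2019) is preferable.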

2.3 A test based on a characterization involving the MGF and the CF

Volkmer (2014) proved a characterization of the univariate centered normal distribution, which involves both the CF and the MGF. Henze et al. (2019) generalized this result as follows: If X is a centered d-variate non-degenerate random vector with MGF \(M(t) = {{\mathbb {E}}}[\exp (t^\top X)]<\infty \), \(t \in {{\mathbb {R}}}^d\), and \(R(t) := {{\mathbb {E}}}[\cos (t^\top X)]\) denotes the real part of the CF of X, then

$$\begin{aligned} R(t) \, M(t)-1 \ =\ 0 \quad \text {for each } t \in {{\mathbb {R}}}^d \end{aligned}$$
(2.21)

holds true if and only if X follows some zero-mean normal distribution.

Since \(Y_{n,1}, \ldots , Y_{n,n}\) provide an empirical standardization of \(X_1,\ldots ,X_n\), a natural test statistic based on (2.21) is

$$\begin{aligned} {\mathrm{HJM}}_{n,\gamma } \ := \ n \int \left( R_n(t)\, M_n(t) - 1 \right) ^2\, {\widetilde{w}}_\gamma (t) \, \text {d} t, \end{aligned}$$

where

$$\begin{aligned} R_n(t) \ := \ \frac{1}{n} \sum _{j=1}^n \cos \left( t^\top Y_{n,j}\right) , \quad t \in {{\mathbb {R}}}^d, \end{aligned}$$

is the empirical cosine transform of the scaled residuals, and \(M_n(t)\) and \({\widetilde{w}}_\gamma (t)\) are given in (2.16) and (2.18), respectively. There is a representation of \({\mathrm{HJM}}_{n,\gamma }\) similar to (2.11), but involving a fourfold sum (see display (3.7) of Henze et al. 2019). The main results about \({\mathrm{HJM}}_{n,\gamma }\) are as follows: Elementwise on the underlying probability space, we have

$$\begin{aligned} \lim _{\gamma \rightarrow \infty } \gamma ^{3+d/2} \, \frac{8 {\mathrm{HJM}}_{n,\gamma }}{\pi ^{d/2}} = \frac{n}{6} \cdot b_{n,d}^{(1)} + \frac{n}{4} \cdot {\widetilde{b}}_{n,d}^{(1)}. \end{aligned}$$

Interestingly, this is the same linear combination of two measures of skewness as in (2.14) and (2.19). If \(\gamma >1\), then the limit null distribution of \({\mathrm{HJM}}_{n,\gamma }\) is that of \({\mathrm{HJM}}_{\infty ,\gamma } := \int W^2(t) {\widetilde{w}}_\gamma (t) \, {\mathrm{d}}t\), where W is a centered Gaussian random element of the Hilbert space \({\mathrm{L}}^2({{\mathbb {R}}}^d,{{{\mathcal {B}}}}^d,{\widetilde{w}}_\gamma (t){\mathrm{d}}t)\) with a covariance kernel given in Theorem 5.1 of Henze et al. (2019). Moreover, that paper also states a formula for \({{\mathbb {E}}}[{\mathrm{HJM}}_{\infty ,\gamma }]\) and, under the assumption \(M(t) < \infty \), \(t \in {\mathbb {R}}^d\), obtains the inequality

$$\begin{aligned} \liminf _{n \rightarrow \infty } \frac{{\mathrm{HJM}}_{n,\gamma }}{n} \ \ge \ \int \left( R(t)M(t) -1 \right) ^2 \, {\widetilde{w}}_\gamma (t) \, \mathrm {d} t \quad {{\mathbb {P}}}\text {-almost surely}, \end{aligned}$$
(2.22)

which is analogous to (2.20), see Theorem 6.1 of Henze et al. (2019). We conjecture that the MVN test based on \({\mathrm{HJM}}_{n,\gamma }\) is also consistent against any non-normal alternative distribution.
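As an illustrative numerical sketch, \({\mathrm{HJM}}_{n,\gamma }\) can be approximated by direct quadrature in dimension \(d=1\); the practical route in higher dimensions is the fourfold-sum representation of Henze et al. (2019). The sketch below assumes the Gaussian weight \({\widetilde{w}}_\gamma (t)=\exp (-\gamma t^2)\) from (2.18); the function name and grid parameters are ours.

```python
import numpy as np

def hjm_statistic_1d(x, gamma=2.0, t_max=8.0, n_grid=4001):
    """Riemann-sum sketch of HJM_{n,gamma} in dimension d = 1.

    Assumes the weight w_gamma(t) = exp(-gamma * t^2); x is the raw
    sample, standardized here by its mean and (1/n-)standard deviation."""
    x = np.asarray(x, dtype=float)
    n = x.size
    y = (x - x.mean()) / x.std()              # scaled residuals for d = 1
    t = np.linspace(-t_max, t_max, n_grid)
    ty = np.outer(t, y)
    Rn = np.cos(ty).mean(axis=1)              # empirical cosine transform R_n(t)
    Mn = np.exp(ty).mean(axis=1)              # empirical MGF M_n(t)
    integrand = (Rn * Mn - 1.0) ** 2 * np.exp(-gamma * t ** 2)
    return n * np.sum(integrand) * (t[1] - t[0])
```

Since the integrand is a squared deviation from the characterization (2.21), the statistic is nonnegative by construction, and large values speak against \(H_0\).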

2.4 A test based on a system of partial differential equations for the MGF

The novel idea of Henze and Visagie (2020) for constructing a test of \(H_0\) is the following: Suppose that the MGF \(M(t) = {\mathbb {E}} [\exp (t^\top X)]\) of a random vector X exists for each \(t \in {\mathbb {R}}^d\) and satisfies the system of partial differential equations

$$\begin{aligned} \frac{\partial M(t)}{\partial t_j} = t_j M(t), \quad t=(t_1,\ldots ,t_d)^\top \in {\mathbb {R}}^d, \quad j=1,\ldots ,d. \end{aligned}$$
(2.23)

Since \(M(0) =1\), it is easily seen that the only solution to (2.23) is \(M_0(t) = \exp (\Vert t\Vert ^2/2)\), \(t \in {\mathbb {R}}^d\), which is the MGF of \(\hbox {N}_d(0,\text {I}_d)\).
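As a quick numerical sanity check, one can verify by central differences that \(M_0\) solves the system (2.23); a minimal sketch (function names are ours):

```python
import numpy as np

def M0(t):
    """MGF of N_d(0, I_d): M_0(t) = exp(||t||^2 / 2)."""
    return np.exp(0.5 * np.dot(t, t))

def pde_residuals(t, h=1e-6):
    """Central-difference residuals of dM/dt_j - t_j M(t), display (2.23)."""
    d = len(t)
    res = np.empty(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = h
        res[j] = (M0(t + e) - M0(t - e)) / (2 * h) - t[j] * M0(t)
    return res
```

All residuals vanish up to discretization error, in line with \(M_0\) being the unique solution of (2.23) with \(M(0)=1\).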

If \(H_0\) holds, the scaled residuals \(Y_{n,1},\ldots ,Y_{n,n}\) should be approximately independent, with a distribution close to \(\hbox {N}_d(0,\text {I}_d)\), at least for large n. Hence, a natural approach for testing \(H_0\) is to consider the EMGF \(M_n\) of \(Y_{n,1},\ldots ,Y_{n,n}\), defined in (2.16), and to employ the weighted \(L^2\)-statistic

$$\begin{aligned} {\mathrm{HV}}_{n,\gamma } := n \int \Vert \nabla M_n(t) - t M_n(t)\Vert ^2 \, {\widetilde{w}}_\gamma (t) \, \text {d}t, \end{aligned}$$

where \(\nabla f\) stands for the gradient of a function \(f:{{\mathbb {R}}}^d \rightarrow {{\mathbb {R}}}\), and \({\widetilde{w}}_\gamma \) is given in (2.18). Putting \(Y_{n,j,k}^+ = Y_{n,j} + Y_{n,k}\), \({\mathrm{HV}}_{n,\gamma }\) takes the feasible form

$$\begin{aligned}&{\mathrm{HV}}_{n,\gamma } = \frac{1}{n} \left( \frac{\pi }{\gamma }\right) ^{d/2} \sum _{j,k=1}^n \exp \left( \frac{\Vert Y_{n,j,k}^+\Vert ^2}{4 \gamma } \right) \\&\left( Y_{n,j}^\top Y_{n,k} - \frac{\Vert Y_{n,j,k}^+\Vert ^2}{2\gamma } + \frac{d}{2\gamma } + \frac{\Vert Y_{n,j,k}^+\Vert ^2}{4\gamma ^2} \right) . \end{aligned}$$
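The double sum displayed above translates directly into code. The following is a minimal sketch (function name and defaults are ours), assuming the Gaussian weight \({\widetilde{w}}_\gamma (t)=\exp (-\gamma \Vert t\Vert ^2)\) and that the rows of `y` hold the scaled residuals \(Y_{n,1},\ldots ,Y_{n,n}\):

```python
import numpy as np

def hv_statistic(y, gamma=3.0):
    """HV_{n,gamma} via the displayed double-sum; y has shape (n, d)."""
    n, d = y.shape
    G = y @ y.T                                   # Gram matrix, G[j,k] = Y_j^T Y_k
    sq = np.diag(G)                               # ||Y_j||^2
    plus = sq[:, None] + 2 * G + sq[None, :]      # ||Y_{n,j,k}^+||^2 = ||Y_j + Y_k||^2
    bracket = (G - plus / (2 * gamma) + d / (2 * gamma)
               + plus / (4 * gamma ** 2))
    total = np.sum(np.exp(plus / (4 * gamma)) * bracket)
    return (np.pi / gamma) ** (d / 2) * total / n
```

As a check, for \(Y_{n,j} \equiv 0\) the double sum collapses to \(n(\pi /\gamma )^{d/2}d/(2\gamma )\), which agrees with evaluating \(n\int \Vert t\Vert ^2{\widetilde{w}}_\gamma (t)\,\mathrm{d}t\) directly.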

To derive the limit null distribution of \({\mathrm{HV}}_{n,\gamma }\), put \(W_n(t) := \sqrt{n} \left( \nabla M_n(t) - t M_n(t)\right) \). Since \(W_n(t)\) is \({{\mathbb {R}}}^d\)-valued, Henze and Visagie (2020) consider the Hilbert space \({\mathbb {H}}\), which is the d-fold (orthogonal) direct sum \({\mathbb {H}} := {\text {L}}^2\oplus \cdots \oplus {\text {L}}^2\), where \({\text {L}}^2= {\mathrm{L}}^2({{\mathbb {R}}}^d,{{{\mathcal {B}}}}^d,{\widetilde{w}}_\gamma (t){\mathrm{d}}t)\). If \(\gamma >2\), there is some centered Gaussian random element W of \({\mathbb {H}}\) with a covariance (matrix) kernel given in display (11) of Henze and Visagie (2020), so that \(W_n {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}W\) as \(n \rightarrow \infty \). By the continuous mapping theorem, we then have \({\mathrm{HV}}_{n,\gamma } {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}{\mathrm{HV}}_{\infty ,\gamma } := \int \Vert W(t)\Vert ^2 \, {\widetilde{w}}_\gamma (t) \, \mathrm{{d}}t\). Henze and Visagie (2020) also obtain a closed form expression for \({{\mathbb {E}}}[{\mathrm{HV}}_{\infty ,\gamma }]\). Moreover, if the MGF M(t) of X exists for each \(t \in {{\mathbb {R}}}^d\) and X is standardized, we have

$$\begin{aligned} \liminf _{n\rightarrow \infty } \frac{{{\mathrm{HV}}}_{n,\gamma }}{n} \ge \int \Vert \nabla M(t) - tM(t)\Vert ^2 \, {\widetilde{w}}_\gamma (t) \, \mathrm{{d}}t \quad {{\mathbb {P}}}\text {-almost surely}, \end{aligned}$$

which parallels (2.20) and (2.22).

We remark in passing that a differential equation involving the moment generating function has been employed by Meintanis and Hlávka (2010) in connection with testing for bivariate and multivariate skew-normality.

2.5 A test based on the harmonic oscillator in characteristic function spaces

Dörr et al. (2020) noticed that the CF \(\varPsi _0(t) = \exp (-\Vert t\Vert ^2/2)\) of the distribution \({\mathrm{N}}(0,{\mathrm{I}}_d)\) is the unique solution of the partial differential equation

$$\begin{aligned} \varDelta f(x) - (\Vert x\Vert ^2-d) f(x)=0 \end{aligned}$$
(2.24)

subject to \(f(0) =1\), where \(\varDelta \) is the Laplace operator, see Theorem 1 of Dörr et al. (2020). The operator \(-\varDelta + \Vert x\Vert ^2 - d\) is called the harmonic oscillator, which is a special case of a Schrödinger operator. A suitable statistic for testing \(H_0\) that reflects this characterization is

$$\begin{aligned} {\mathrm{DEH}}_{n,\gamma }= & {} n\int _{{\mathbb {R}}^d}\left| \varDelta \varPsi _n(t)- \varDelta \varPsi _0(t)\right| ^2 {\widetilde{w}}_\gamma (t)\text{ d }t\nonumber \\= & {} n\int \bigg | \frac{1}{n} \sum _{j=1}^n \Vert Y_{n,j}\Vert ^2 \exp ({\mathrm{i}}t^\top Y_{n,j})+(\Vert t\Vert ^2-d)\varPsi _0(t)\bigg |^2{\widetilde{w}}_\gamma (t)\, \text{ d }t,\qquad \end{aligned}$$
(2.25)

where \({\widetilde{w}}_\gamma \) is given in (2.18) and \(\gamma >0\). The test statistic has the feasible form

$$\begin{aligned} {\mathrm{DEH}}_{n,\gamma }= & {} \left( \frac{\pi }{\gamma }\right) ^\frac{d}{2} \frac{1}{n}\sum _{j,k=1}^n\Vert Y_{n,j}\Vert ^2\Vert Y_{n,k}\Vert ^2\exp \left( -\frac{1}{4\gamma } \Vert Y_{n,j}-Y_{n,k}\Vert ^2\right) \\&-\,\frac{2(2\pi )^{\frac{d}{2}}}{(2\gamma +1)^{2+\frac{d}{2}}} \sum _{j=1}^n\Vert Y_{n,j}\Vert ^2\left( \Vert Y_{n,j}\Vert ^2+2d\gamma (2\gamma +1)\right) \exp \left( -\frac{1}{2}\frac{\Vert Y_{n,j}\Vert ^2}{2\gamma +1}\right) \\&+\,n\frac{\pi ^{\frac{d}{2}}}{(\gamma +1)^{2+\frac{d}{2}}} \left( \gamma (\gamma +1)d^2+\frac{d(d+2)}{4}\right) . \end{aligned}$$
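The three-part formula above is straightforward to implement. A minimal sketch (function name is ours), with the rows of `y` holding the scaled residuals:

```python
import numpy as np

def deh_statistic(y, gamma=1.0):
    """DEH_{n,gamma} via the displayed three-part formula; y: (n, d)."""
    n, d = y.shape
    sq = np.sum(y ** 2, axis=1)                      # ||Y_j||^2
    D2 = sq[:, None] + sq[None, :] - 2 * (y @ y.T)   # ||Y_j - Y_k||^2
    term1 = ((np.pi / gamma) ** (d / 2) / n
             * np.sum(np.outer(sq, sq) * np.exp(-D2 / (4 * gamma))))
    term2 = (2 * (2 * np.pi) ** (d / 2) / (2 * gamma + 1) ** (2 + d / 2)
             * np.sum(sq * (sq + 2 * d * gamma * (2 * gamma + 1))
                      * np.exp(-0.5 * sq / (2 * gamma + 1))))
    term3 = (n * np.pi ** (d / 2) / (gamma + 1) ** (2 + d / 2)
             * (gamma * (gamma + 1) * d ** 2 + d * (d + 2) / 4))
    return term1 - term2 + term3
```

Being an evaluation of the weighted squared-modulus integral (2.25), the result is nonnegative for any input; e.g., for \(n=1\), \(d=1\), \(Y_{1,1}=0\), only the third term survives and equals \(\sqrt{\pi }\,(\gamma (\gamma +1)+3/4)/(\gamma +1)^{5/2}\).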

Like the class of BHEP tests, the class of tests based on \({\mathrm{DEH}}_{n,\gamma }\) is closed at the boundaries \(\gamma \rightarrow 0\) and \(\gamma \rightarrow \infty \), since—elementwise on the underlying probability space—we have

$$\begin{aligned} \lim _{\gamma \rightarrow 0} \left( \frac{\gamma }{\pi }\right) ^{d/2} {\mathrm{DEH}}_{n,\gamma } = b_{n,d}^{(2)}, \quad \lim _{\gamma \rightarrow \infty } \frac{2}{n\pi ^{\frac{d}{2}}}\gamma ^{\frac{d}{2}+1} {\mathrm{DEH}}_{n,\gamma } = {\widetilde{b}}_{n,d}^{(1)}. \end{aligned}$$

Here, \(b_{n,d}^{(2)}\) is multivariate kurtosis in the sense of Mardia (1970), defined in (8.1), and \({\widetilde{b}}_{n,d}^{(1)}\) is skewness in the sense of Móri et al. (1993), see (8.3). Dörr et al. (2020) proved a Hilbert space central limit theorem for the sequence of random elements

$$\begin{aligned} V_n(t) = \frac{1}{\sqrt{n}} \sum _{j=1}^n \left( \Vert Y_{n,j}\Vert ^2\big \{ \cos (t^\top Y_{n,j}) + \sin (t^\top Y_{n,j})\big \} - \mu (t)\right) , \quad t \in {{\mathbb {R}}}^d, \end{aligned}$$

where \(\mu (t) = {{\mathbb {E}}}[\Vert X\Vert ^2(\cos (t^\top X) + \sin (t^\top X))]\) and X is a standardized random vector satisfying \({{\mathbb {E}}}\Vert X\Vert ^4 < \infty \). Since \(\mu (t) = -\varDelta \varPsi _0(t)\) if \(X {\mathop {=}\limits ^{{{{{\mathcal {D}}}}}}}{\mathrm{N}}_d(0,{\mathrm{I}}_d)\) and \({\mathrm{DEH}}_{n,\gamma } = \int V_n^2(t) {\widetilde{w}}_\gamma (t) \, {\mathrm{d}}t\) for that choice of \(\mu (t)\), the authors obtained the limit distribution of \({\mathrm{DEH}}_{n,\gamma }\) under \(H_0\) as well as under contiguous and fixed alternatives to \(H_0\). Under \(H_0\), we have \({\mathrm{DEH}}_{n,\gamma } {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}\int V^2(t) {\widetilde{w}}_\gamma (t) \, {\mathrm{d}}t\), where V is the centered limit Gaussian random element of the sequence \((V_n)\) (with \(\mu (t) = - \varDelta \varPsi _0(t)\)). Under contiguous alternatives that approach \(H_0\) at the rate \(n^{-1/2}\), the limit distribution of \({\mathrm{DEH}}_{n,\gamma }\) is that of \(\int (V(t)+c(t))^2 {\widetilde{w}}_\gamma (t) \, {\mathrm{d}}t\), where \(c(\cdot )\) is a shift function (see Section 6 of Dörr et al. 2020). Under a fixed (and because of invariance without loss of generality standardized) alternative distribution satisfying \({{\mathbb {E}}}\Vert X\Vert ^4 < \infty \), we have

$$\begin{aligned} \frac{{\mathrm{DEH}}_{n,\gamma }}{n} \rightarrow {\mathrm{D}}_\gamma := \int \big |\varDelta \varPsi (t) - \varDelta \varPsi _0(t)\big |^2 {\widetilde{w}}_\gamma (t) \, {\mathrm{d}}t \quad {{\mathbb {P}}}\text {-almost surely,} \end{aligned}$$

where \(\varPsi \) is the CF of X. Moreover, the limit distribution of \(\sqrt{n}({\mathrm{DEH}}_{n,\gamma }/n - {\mathrm{D}}_\gamma )\) is a centered normal distribution with a variance that, under the stronger condition \({{\mathbb {E}}}\Vert X\Vert ^6 < \infty \), can be consistently estimated from the data. Thus, by analogy with (2.6), an asymptotic confidence interval for \({\mathrm{D}}_\gamma \) is available. Notice that the almost sure limits above are ‘Laplacian analogues’ of (2.12).

2.6 A test based on a double estimation in a characterizing PDE

Dörr et al. (2020a) suggested replacing both of the functions f occurring in (2.24) by the ECF \(\varPsi _n\). Since, under \(H_0\), \(\varDelta \varPsi _n(t)\) and \((\Vert t\Vert ^2 -d)\varPsi _n(t)\) should be close to each other for large n, it is tempting to see what happens if, instead of \({\mathrm{DEH}}_{n,\gamma }\) defined in (2.25), we base a test of \(H_0\) on the weighted \(L^2\)-statistic

$$\begin{aligned} {\mathrm{DEH}}^*_{n,\gamma } = n\int \left| \varDelta \varPsi _n(t)-\left( \Vert t\Vert ^2-d\right) \varPsi _n(t)\right| ^2{\widetilde{w}}_\gamma (t)\, \text{ d }t \end{aligned}$$

and reject \(H_0\) for large values of \({\mathrm{DEH}}^*_{n,\gamma }\). Putting \(D^2_{n,j,k} := \Vert Y_{n,j}-Y_{n,k}\Vert ^2\), \(E_{n,j,k} = \exp (-D^2_{n,j,k}/(4\gamma ))\), \(a_{d,\gamma } =2\gamma d(2\gamma - 1)\), \(b_{d,\gamma } = 16d^2\gamma ^3(\gamma - 1) + 4d(d + 2)\gamma ^2\), \(c_{d,\gamma } = (\pi /\gamma )^{d/2}\), and \(e_{d,\gamma } = 8d\gamma ^2 - 4(d + 2)\gamma \), the statistic \({\mathrm{DEH}}^*_{n,\gamma }\) has the feasible representation

$$\begin{aligned} {\mathrm{DEH}}^*_{n,\gamma }= & {} \frac{c_{d,\gamma }}{n} \sum _{j,k=1}^n \! \Biggl [ \Vert Y_{n,j}\Vert ^2\Vert Y_{n,k}\Vert ^2 E_{n,j,k} - \frac{\Vert Y_{n,j}\Vert ^2 + \Vert Y_{n,k}\Vert ^2}{4\gamma ^2}\bigl (D^2_{n,j,k} + a_{d,\gamma }\bigr )E_{n,j,k} \\&+\frac{E_{n,j,k}}{16\gamma ^4}\Bigl (b_{d,\gamma } + (D^2_{n,j,k})^2 + e_{d,\gamma } D^2_{n,j,k}\Bigr )\Biggr ]. \end{aligned}$$
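The representation above, with the abbreviations \(D^2_{n,j,k}\), \(E_{n,j,k}\), and the constants \(a_{d,\gamma },\ldots ,e_{d,\gamma }\), can be coded in a few lines; a minimal sketch (function name is ours):

```python
import numpy as np

def deh_star_statistic(y, gamma=1.0):
    """DEH*_{n,gamma} via the displayed representation; y: (n, d)."""
    n, d = y.shape
    sq = np.sum(y ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * (y @ y.T)    # D^2_{n,j,k}
    E = np.exp(-D2 / (4 * gamma))                      # E_{n,j,k}
    a = 2 * gamma * d * (2 * gamma - 1)                # a_{d,gamma}
    b = 16 * d ** 2 * gamma ** 3 * (gamma - 1) + 4 * d * (d + 2) * gamma ** 2
    c = (np.pi / gamma) ** (d / 2)                     # c_{d,gamma}
    e = 8 * d * gamma ** 2 - 4 * (d + 2) * gamma       # e_{d,gamma}
    S = sq[:, None] + sq[None, :]                      # ||Y_j||^2 + ||Y_k||^2
    summand = (np.outer(sq, sq) * E
               - S / (4 * gamma ** 2) * (D2 + a) * E
               + E / (16 * gamma ** 4) * (b + D2 ** 2 + e * D2))
    return c / n * np.sum(summand)
```

For \(n=1\), \(d=1\), \(Y_{1,1}=0\), the formula reduces to \(\sqrt{\pi }\,\gamma ^{-5/2}(\gamma ^2-\gamma +3/4)\), which agrees with evaluating the defining integral directly; the statistic is nonnegative for any input, being a weighted squared-modulus integral.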

The class of tests based on \({\mathrm{DEH}}^*_{n,\gamma }\) is also ‘closed at the boundaries \(\gamma \rightarrow 0\) and \(\gamma \rightarrow \infty \)’ since, elementwise on the underlying probability space, we have

$$\begin{aligned} \lim _{\gamma \rightarrow 0} \left[ \left( \frac{\gamma }{\pi }\right) ^{d/2} {\mathrm{DEH}}^*_{n,\gamma } - \frac{d(d + 2)}{4\gamma ^2}\right] = b_{n,d}^{(2)} - d^2, \ \ \lim _{\gamma \rightarrow \infty } \frac{2\gamma ^{d/2+1}}{n\pi ^{d/2}}{\mathrm{DEH}}^*_{n,\gamma } = {\widetilde{b}}_{n,d}^{(1)},\nonumber \\ \end{aligned}$$
(2.26)

where \(b_{n,d}^{(2)}\) and \({\widetilde{b}}_{n,d}^{(1)}\) are given in (8.1) and (8.3), respectively. Under \(H_0\), we have \({\mathrm{DEH}}^*_{n,\gamma } {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}{\mathrm{DEH}}^*_{\infty ,\gamma } := \int {{{\mathcal {S}}}}^2(t) {\widetilde{w}}_\gamma (t) \, {\mathrm{d}}t\), where \({{{\mathcal {S}}}}\) is some centered Gaussian random element of \({\mathrm{L}}^2({{\mathbb {R}}}^d,{{{\mathcal {B}}}}^d,{\widetilde{w}}_\gamma (t){\mathrm{d}}t)\). Dörr et al. (2020a) also obtain a closed-form expression for \({{\mathbb {E}}}[{\mathrm{DEH}}^*_{\infty ,\gamma }]\).

If X has a standardized alternative distribution satisfying \({{\mathbb {E}}}\Vert X\Vert ^4 < \infty \), we have

$$\begin{aligned} \frac{{\mathrm{DEH}}^*_{n,\gamma }}{n} {\mathop {\longrightarrow }\limits ^{\text{ a.s. }}}D^*_\gamma := \int |-\varDelta \varPsi ^+(t) + (\Vert t\Vert ^2 - d)\varPsi ^+(t)|^2 {\widetilde{w}}_\gamma (t) \, {\mathrm{d}}t, \end{aligned}$$

where \(\varPsi ^+(t) = {\mathbb {E}}[\cos (t^\top X)] + {\mathbb {E}}[\sin (t^\top X)]\). Hence, \(D^*_\gamma \) is the measure of distance from \(H_0\) associated with \({\mathrm{DEH}}^*_{n,\gamma }\). Interestingly, under the stronger condition \({{\mathbb {E}}}\Vert X\Vert ^6 < \infty \), we have

$$\begin{aligned} \lim _{\gamma \rightarrow \infty } \frac{2\gamma ^{d/2+1}}{\pi ^{d/2}}D^*_\gamma = \left\| {\mathbb {E}}\left( \Vert X\Vert ^2X\right) \right\| ^2. \end{aligned}$$

Since the right-hand side is population skewness in the sense of Móri et al. (1993) (see Sect. 8), this result complements the second limit in (2.26). Dörr et al. (2020a) also show that, under a fixed alternative distribution satisfying \({{\mathbb {E}}}\Vert X\Vert ^4 < \infty \), \(\sqrt{n}\big ({\mathrm{DEH}}^*_{n,\gamma }/n- D^*_\gamma \big )\) has a centered limit normal distribution with a variance that can be consistently estimated from \(X_1,\ldots ,X_n\).

3 The Henze–Zirkler test

Henze and Zirkler (1990) observed that the BHEP-statistic defined in (2.8) may be written in the form

$$\begin{aligned} {\mathrm{BHEP}}_{n,\beta } = (2\pi )^{d/2} \beta ^{-d} \int _{{\mathbb {R}}^d} \left( g_{n,\beta }(x) - \frac{1}{(2\pi \tau ^2)^{d/2}} \exp \left( - \frac{\Vert x\Vert ^2}{2 \tau ^2} \right) \right) ^2 {\mathrm{d}}x, \end{aligned}$$

where \(\tau ^2 = (2\beta ^2+1)/(2\beta ^2)\), and

$$\begin{aligned} g_{n,\beta }(x) = \frac{1}{nh^d} \sum _{j=1}^n \frac{1}{(2\pi )^{d/2}} \exp \left( - \frac{\Vert x- Y_{n,j}\Vert ^2}{2h^2} \right) , \end{aligned}$$

where \(h^2= 1/(2\beta ^2)\). The function \(g_{n,\beta }\) is a nonparametric kernel density estimator with Gaussian kernel \(w_1\) (recall \(w_\beta \) from (2.10)) and bandwidth h, applied to \(Y_{n,1}, \ldots , Y_{n,n}\). The choice \(h= h_n = (4/((2d+1)n))^{1/(d+4)}\), taken from Silverman (1986), p. 87, yields \(\beta = \beta _n\), where

$$\begin{aligned} \beta _n = 2^{-1/2} ((2d+1)n/4)^{1/(d+4)}. \end{aligned}$$
(3.1)

The Henze–Zirkler test statistic is given by \({\mathrm{HZ}}_n = {\mathrm{BHEP}}_{n,\beta _{n}}\). Notice that the optimal bandwidth that minimizes the asymptotic MISE of the kernel density estimator, when both the kernel and the underlying density are the standard d-variate normal density, is not \(h_n\) given above, but \({\widetilde{h}}_n = (4/((d+2)n))^{1/(d+4)}\).
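The relation between the bandwidth \(h_n\) and the smoothing parameter \(\beta _n\) of (3.1) is a one-liner to verify; a small sketch (function names are ours):

```python
import numpy as np

def hz_beta(n, d):
    """Smoothing parameter beta_n of the Henze-Zirkler test, display (3.1)."""
    return ((2 * d + 1) * n / 4) ** (1 / (d + 4)) / np.sqrt(2)

def silverman_h(n, d):
    """Bandwidth h_n = (4 / ((2d+1) n))^{1/(d+4)} underlying beta_n."""
    return (4 / ((2 * d + 1) * n)) ** (1 / (d + 4))
```

By construction, \(h_n^2 = 1/(2\beta _n^2)\) holds exactly, mirroring the relation \(h^2 = 1/(2\beta ^2)\) stated above.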

Apparently unaware of the work of Henze and Zirkler (1990), Bowman and Foster (1993) proposed a test statistic \({\mathrm{BF}}_n\) that turned out to satisfy \({\mathrm{BF}}_n = \beta _n^d (2 \pi )^{d/2} {\mathrm{BHEP}}_{n,\beta _n}\) (see Section 7 of Henze 2002). Thus, \({\mathrm{BF}}_n\) is equivalent to a BHEP-statistic with a smoothing parameter that depends on n. Gürtler (2000) proved that

$$\begin{aligned} \frac{nh^d2^d\pi ^{d/2}{\mathrm{BF}}_n -1}{2^{1/2-d/4}h^{d/2}} {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}{\mathrm{N}}(0,1) \end{aligned}$$
(3.2)

as \(n \rightarrow \infty \) under \(H_0\). Under a fixed standardized alternative distribution with density f, Gürtler (2000) showed that

$$\begin{aligned} \frac{\sqrt{n}}{2} \left( {\mathrm{BF}}_n - \frac{1}{nh_n^d2^d\pi ^{d/2}} - C(f,h_n) \right) {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}{\mathrm{N}}(0,\sigma ^2(f)) \end{aligned}$$
(3.3)

for constants \(\sigma ^2(f)\) and \(C(f,h_n)\), where \(\lim _{n \rightarrow \infty } C(f,h_n) = \int (f(x)-w_1(x))^2 {\mathrm{d}}x\). In view of \(nh_n^d \rightarrow \infty \), (3.3) entails \({\mathrm{BF}}_n {\mathop {\longrightarrow }\limits ^{{{{\mathbb {P}}}}}}\int (f(x)-w_1(x))^2 \, {\mathrm{d}}x\) under f. Hence, the test of \(H_0\) based on \({\mathrm{BF}}_n\) (or \({\mathrm{HZ}}_n\)) is consistent against general alternatives. However, since (3.2) remains true under contiguous alternatives that approach \(H_0\) at the rate \(n^{-1/2}\), the Henze–Zirkler (Bowman–Foster) test is not able to detect such alternatives, see also Tenreiro (2007) for more general results on Bickel–Rosenblatt-type statistics.

4 The energy test

For nearly 20 years now, the energy test has emerged as a strong genuine test for multivariate normality. It is based on the notion of energy distance between multivariate distributions. The name energy stems from a close analogy with Newton’s gravitational potential energy, see, e.g., Székely and Rizzo (2013). Besides goodness-of-fit testing, the concept of energy distance has found applications in many other fields, such as testing for equality of distributions, nonparametric extensions of analysis of variance, clustering, or testing for independence via distance covariance and distance correlation, see, e.g., Székely and Rizzo (2016).

If X and Y are independent random vectors with distributions \({{\mathbb {P}}}^X\) and \({{\mathbb {P}}}^Y\), and \(X'\) and \(Y'\) denote independent copies of X and Y, respectively, then the squared energy distance between \({{\mathbb {P}}}^X\) and \({{\mathbb {P}}}^Y\) is defined as

$$\begin{aligned} D^2({{\mathbb {P}}}^X,{{\mathbb {P}}}^Y) := 2 {{\mathbb {E}}}\Vert X-Y\Vert - {{\mathbb {E}}}\Vert X-X'\Vert - {{\mathbb {E}}}\Vert Y-Y'\Vert , \end{aligned}$$

provided these expectations exist (which is tacitly assumed). The energy distance \(D({{\mathbb {P}}}^X,{{\mathbb {P}}}^Y)\) satisfies all axioms of a metric. A proof of the fundamental inequality \(D({{\mathbb {P}}}^X,{{\mathbb {P}}}^Y) \ge 0\), with equality if and only if \({{\mathbb {P}}}^X={{\mathbb {P}}}^Y\), follows from Zinger et al. (1992) or Mattner (1997), see also Székely and Rizzo (2005) for a different proof related to a result of Morgenstern (2001).

The energy test statistic for testing \(H_0\) is

$$\begin{aligned} {{{\mathcal {E}}}}_n := n \left( \frac{2}{n}\sum _{j=1}^n {{\mathbb {E}}}\Vert {\widetilde{Y}}_{n,j}-N_1\Vert - {{\mathbb {E}}}\Vert N_1-N_2\Vert - \frac{1}{n^2}\sum _{j,k=1}^n \Vert {\widetilde{Y}}_{n,j}-{\widetilde{Y}}_{n,k}\Vert \right) . \end{aligned}$$

Here, \({\widetilde{Y}}_{n,j} = \sqrt{n/(n-1)}Y_{n,j}\) with \(Y_{n,j}\) given in (1.1) and \(N_1\) and \(N_2\) are independent random vectors with the normal distribution \(\hbox {N}_d(0,\text {I}_d)\), which are independent of \(X_1,\ldots ,X_n\). The first expectation is with respect to \(N_1\). Notice that \({{\mathbb {E}}}\Vert N_1-N_2\Vert = 2\Gamma ((d+1)/2)/\Gamma (d/2)\), where \(\Gamma (\cdot )\) is the gamma function. Since, for \(a \in {{\mathbb {R}}}^d\), the distribution of \(\Vert a-N_1\Vert ^2\) depends only on \(\Vert a\Vert ^2\), the statistic \({{{\mathcal {E}}}}_n\) is seen to be invariant. The energy test for multivariate normality rejects \(H_0\) for large values of \({{{\mathcal {E}}}}_n\). It is consistent against each fixed non-normal alternative, see Székely and Rizzo (2005), and it is fully implemented in the energy package for R, see Rizzo and Székely (2014). To the authors’ knowledge, there are hitherto no results on the behavior of \({{{\mathcal {E}}}}_n\) with respect to contiguous alternatives to \(H_0\). Since the intrinsic (quadratic) measure of distance between an alternative distribution \({{\mathbb {P}}}^X\) (which, because of invariance, may be taken as having zero mean and unit covariance matrix) and the standard d-variate normal distribution \(\hbox {N}_d(0,\text {I}_d)\) is \(\varDelta _E({{\mathbb {P}}}^X) := D^2({{\mathbb {P}}}^X,{\text {N}}_d(0,{\text {I}}_d))\), it would be interesting to see whether \(\sqrt{n}({{{\mathcal {E}}}}_n/n - \varDelta _E({{\mathbb {P}}}^X))\) has a non-degenerate normal limit as \(n \rightarrow \infty \), with a variance that can consistently be estimated from the data \(X_1,\ldots ,X_n\). Such a result would pave the way for an asymptotic confidence interval for \(\varDelta _E({{\mathbb {P}}}^X)\).
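The only non-elementary ingredient of \({{{\mathcal {E}}}}_n\) is \({{\mathbb {E}}}\Vert a-N_1\Vert \), the mean of a noncentral chi distribution, which admits a closed form via Kummer's confluent hypergeometric function \({}_1F_1\). The sketch below (function names are ours) uses this closed form via SciPy; at \(a=0\) it reduces to the mean of a \(\chi _d\) distribution, \(\sqrt{2}\,\Gamma ((d+1)/2)/\Gamma (d/2)\).

```python
import numpy as np
from scipy.special import gamma as Gamma, hyp1f1

def expected_dist_to_normal(a):
    """E||a - N|| for N ~ N_d(0, I_d): mean of a noncentral chi
    distribution, written via Kummer's function 1F1."""
    d = len(a)
    c = np.sqrt(2) * Gamma((d + 1) / 2) / Gamma(d / 2)
    return c * hyp1f1(-0.5, d / 2, -np.dot(a, a) / 2)

def energy_statistic(y):
    """E_n for scaled residuals y of shape (n, d); uses Y~ = sqrt(n/(n-1)) Y."""
    n, d = y.shape
    yt = np.sqrt(n / (n - 1)) * y
    t1 = 2 * np.mean([expected_dist_to_normal(v) for v in yt])
    t2 = 2 * Gamma((d + 1) / 2) / Gamma(d / 2)       # E||N_1 - N_2||
    diffs = yt[:, None, :] - yt[None, :, :]
    t3 = np.sqrt(np.sum(diffs ** 2, axis=-1)).mean() # n^-2 sum ||Y~_j - Y~_k||
    return n * (t1 - t2 - t3)
```

Since \({{{\mathcal {E}}}}_n/n\) is the squared energy distance between the empirical distribution of \({\widetilde{Y}}_{n,1},\ldots ,{\widetilde{Y}}_{n,n}\) and \(\hbox {N}_d(0,\text {I}_d)\), the statistic is nonnegative.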

5 The test of Pudelko

For a fixed \(r>0\), Pudelko (2005) suggested to reject \(H_0\) for large values of the weighted supremum distance

$$\begin{aligned} {\mathrm{PU}}_{n,r} = \sqrt{n} \sup _{0< \Vert t\Vert \le r} \frac{|\varPsi _n(t)- \varPsi _0(t)|}{\Vert t\Vert }, \end{aligned}$$

where \(\varPsi _n(t)\) is given in (2.9), and \(\varPsi _0(t) = \exp (-\Vert t\Vert ^2/2)\). The test statistic \({\mathrm{PU}}_{n,r}\) is invariant, since it is a function of the scaled residuals \(Y_{n,1},\ldots ,Y_{n,n}\) and rotation invariant. This statistic is similar in spirit to the statistic studied by Csörgő (1986), which is \(\sup _{\Vert t\Vert \le r}\big ||\varPsi _n(t)|^2 - \varPsi _0^2(t)\big |\). Under \(H_0\), \({\mathrm{PU}}_{n,r}\) converges in distribution to \(\sup _{0<\Vert t\Vert \le r} |{{{\mathcal {P}}}}(t)|/\Vert t\Vert \), where \({{{\mathcal {P}}}}(\cdot )\) is a centered Gaussian random element of the Banach space \(C(B_r)\) of complex-valued continuous functions, defined on \(B_r:= \{x \in {{\mathbb {R}}}^d: \Vert x \Vert \le r\}\), equipped with the supremum norm \(\Vert f\Vert _{C(B_r)} := \sup _{x \in B_r}|f(x)|\). Pudelko (2005) also showed that the test is able to detect contiguous alternatives that approach \(H_0\) at the rate \(n^{-1/2}\). The consistency of the test based on \({\mathrm{PU}}_{n,r}\) follows easily from Csörgő (1989). A drawback of this test is its lack of feasibility, since one has to calculate the supremum of a function over a d-dimensional ball.
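The feasibility issue can be illustrated by a Monte Carlo approximation from below: the supremum over \(0 < \Vert t\Vert \le r\) is replaced by a maximum over random points in the ball. The sketch below (function name and defaults are ours) assumes the rows of `y` are the scaled residuals:

```python
import numpy as np

def pudelko_statistic(y, r=2.0, n_points=20000, seed=0):
    """Monte Carlo lower approximation of PU_{n,r}: maximize the weighted
    ECF deviation over random points of the ball {0 < ||t|| <= r}."""
    n, d = y.shape
    rng = np.random.default_rng(seed)
    direc = rng.normal(size=(n_points, d))
    direc /= np.linalg.norm(direc, axis=1, keepdims=True)      # uniform directions
    radii = r * (1.0 - rng.uniform(size=n_points)) ** (1.0 / d)  # radii in (0, r]
    t = direc * radii[:, None]                                  # uniform in the ball
    psi_n = np.exp(1j * (t @ y.T)).mean(axis=1)   # ECF of the scaled residuals
    psi_0 = np.exp(-0.5 * radii ** 2)             # Psi_0 depends on ||t|| only
    return np.sqrt(n) * np.max(np.abs(psi_n - psi_0) / radii)
```

Any such discretization only bounds the supremum from below, which is precisely the practical drawback noted above.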

6 The test of Cox and Small

According to Cox and Small (1978), a main objective of tests of \(H_0\) is ’to see whether an estimated covariance matrix provides an adequate summary of the interrelationships among a set of variables,’ and that departure from multivariate normality ’is often the occurrence of appreciable nonlinearity of dependence.’ To obtain an affine invariant test that assesses the degree of nonlinearity, they propose to find the pair of linear combinations of the original variables such that one has maximum curvature in its regression on the other. The population functional which underlies the test of Cox and Small is \(T_{CS}({{\mathbb {P}}}^X)=\max _{b\in {\mathcal {S}}^{d-1}}\eta ^2(b)\), where

$$\begin{aligned} \eta ^2(b)=\frac{\left\| {{\mathbb {E}}}\left( X(b^\top X)^2\right) \right\| ^2-\left( {{\mathbb {E}}}\left( b^\top X\right) ^3\right) ^2}{{{\mathbb {E}}}\left( b^\top X\right) ^4-1-\left( {{\mathbb {E}}}\left( b^\top X\right) ^3\right) ^2}, \end{aligned}$$

see Cox and Small (1978), p. 268. The test statistic is \( T_{n,CS}=\max _{b\in {\mathcal {S}}^{d-1}}\eta _n^2(b)\), where

$$\begin{aligned} \eta _n^2(b)=\frac{\left\| n^{-1}\sum _{j=1}^nY_{n,j}(b^\top Y_{n,j})^2\right\| ^2-\left( n^{-1}\sum _{j=1}^n(b^\top Y_{n,j})^3\right) ^2}{n^{-1}\sum _{j=1}^n(b^\top Y_{n,j})^4-1-\left( n^{-1}\sum _{j=1}^n(b^\top Y_{n,j})^3\right) ^2} \end{aligned}$$

is the empirical counterpart of \(\eta ^2(b)\). The hypothesis \(H_0\) is rejected for large values of \(T_{n,CS}\). The statistic \(T_{n,CS}\) is affine invariant, since it is both a function of \(Y_{n,1},\ldots ,Y_{n,n}\) and rotation invariant. Notice that the functional \(T_{CS}\) vanishes on the set \({{{\mathcal {N}}}}_d\), but \(T_{CS}({{\mathbb {P}}}^X) =0\) does not necessarily imply that \({{\mathbb {P}}}^X \in {{{\mathcal {N}}}}_d\). Several hitherto missing distributional properties of the statistic \(T_{n,CS}\) were provided by Ebner (2012). If \({{\mathbb {P}}}^X\) is elliptically symmetric and satisfies \({{\mathbb {E}}}\Vert X\Vert ^6 < \infty \), then

$$\begin{aligned} nT_{n,CS}{\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}\frac{d(d+2)}{3m_4-d(d+2)}\max _{b\in {\mathcal {S}}^{d-1}}W(b)^\top B W(b), \end{aligned}$$

where \(m_4={{\mathbb {E}}}\Vert X\Vert ^4\), B is the \((d+1)\times (d+1)\)-matrix \(\text{ diag }(1,\ldots ,1,-1)\), and \(W(\cdot )\) is a centered \((d+1)\)-variate Gaussian process in \(C({\mathcal {S}}^{d-1},{\mathbb {R}}^{d+1})\), the space of continuous functions from \({\mathcal {S}}^{d-1}\) to \({\mathbb {R}}^{d+1}\) (see Theorem 2.4 of Ebner 2012, where the covariance matrix kernel of W is given explicitly). As a consequence, the test of Cox and Small is not able to detect such elliptical alternatives to normality. Next, writing \(\mu (b)={{\mathbb {E}}}((b^\top X)^2(X,(b^\top X))^\top )\), we have

$$\begin{aligned} T_{n,CS}{\mathop {\longrightarrow }\limits ^{{{{\mathbb {P}}}}}}\max _{b\in {\mathcal {S}}^{d-1}}\frac{\mu (b)^\top B\mu (b)}{{{\mathbb {E}}}(b^\top X)^4-1-({{\mathbb {E}}}(b^\top X)^3)^2} \end{aligned}$$

if \({{\mathbb {E}}}\Vert X\Vert ^6 < \infty \). Thus, the test based on \(T_{n,CS}\) is consistent against each alternative distribution for which the above stochastic limit \(\delta ({{\mathbb {P}}}^X)\) (say) is positive. Ebner (2012) also provides the limit distribution of \(T_{n,CS}\) under contiguous alternatives to \(H_0\), but it is still an open problem whether \(\sqrt{n}(T_{n,CS}- \delta ({{\mathbb {P}}}^X))\) has a non-degenerate limit distribution as \(n \rightarrow \infty \). From a practical point of view, the test of Cox and Small has the drawback that finding the maximum of \(\eta _n^2(b)\) over \(b \in {\mathcal {S}}^{d-1}\) is a computationally expensive task.
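A crude way to approximate the maximization is random search over the unit sphere; a proper implementation would refine the best direction with a numerical optimizer. The sketch below (function names and defaults are ours) assumes the rows of `y` are the scaled residuals, so that \(n^{-1}\sum _j Y_{n,j}Y_{n,j}^\top = {\mathrm{I}}_d\):

```python
import numpy as np

def eta_n_sq(b, y):
    """Empirical eta_n^2(b) for scaled residuals y of shape (n, d)."""
    u = y @ b                                        # b^T Y_{n,j}
    num_vec = (y * (u ** 2)[:, None]).mean(axis=0)   # n^-1 sum Y_j (b^T Y_j)^2
    m3 = np.mean(u ** 3)
    num = np.dot(num_vec, num_vec) - m3 ** 2
    den = np.mean(u ** 4) - 1.0 - m3 ** 2
    return num / den

def cox_small_statistic(y, n_dirs=5000, seed=0):
    """Random-search approximation of T_{n,CS} = max over the unit sphere."""
    rng = np.random.default_rng(seed)
    b = rng.normal(size=(n_dirs, y.shape[1]))
    b /= np.linalg.norm(b, axis=1, keepdims=True)    # uniform directions on S^{d-1}
    return max(eta_n_sq(bb, y) for bb in b)
```

For empirically standardized data both numerator and denominator of \(\eta _n^2(b)\) are nonnegative (by the Cauchy–Schwarz inequality), so the approximation returns a nonnegative value.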

7 The test of Manzotti and Quiroz

Manzotti and Quiroz (2001) propose to test \(H_0\) by means of averages, over the standardized sample, of multivariate spherical harmonics, radial functions and their products. For \(k\in {\mathbb {N}}\) let \(f_1,\ldots ,f_k:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) be such that \({{\mathbb {E}}}f_j^2(X) <\infty \) if \(X{\mathop {=}\limits ^{{{{{\mathcal {D}}}}}}}\text{ N}_d(0,{\mathrm{I}}_d)\), \(j=1,\ldots ,k\). Let \(V=(v_{ij})\) be the (\(k\times k\))-matrix with entries

$$\begin{aligned} v_{ij}={{\mathbb {E}}}[f_i(X)f_j(X)]-{{\mathbb {E}}}f_i(X) \, {{\mathbb {E}}}f_j(X),\quad X{\mathop {=}\limits ^{{{{{\mathcal {D}}}}}}}\text{ N}_d(0,I_d), \end{aligned}$$

where V is assumed to be invertible. For \({\mathbf {f}}=(f_1,\ldots ,f_k)^\top \), let

$$\begin{aligned} \nu _n(f_j)=\frac{1}{\sqrt{n}}\sum _{\ell =1}^n \big \{ f_j(Y_{n,\ell })-{{\mathbb {E}}}f_j(X) \big \} \quad \text{ and }\quad \nu _n({\mathbf {f}})=(\nu _n(f_1),\ldots ,\nu _n(f_k))^\top . \end{aligned}$$

The general type of test statistic of Manzotti and Quiroz (2001) is the quadratic form

$$\begin{aligned} T_{n,MQ}({\mathbf {f}})=\nu _n({\mathbf {f}})^\top V^{-1}\nu _n({\mathbf {f}}). \end{aligned}$$

To be more specific, let \({\mathcal {H}}_j\), \(j \ge 0\), be the set of spherical harmonics of degree j in the orthonormal basis of spherical harmonics in d dimensions with respect to the uniform measure on \({\mathcal {S}}^{d-1}\), and put \({\mathcal {G}}_j=\bigcup _{i=0}^j{\mathcal {H}}_i\). The number of linearly independent spherical harmonics of degree j in dimension d is \({d+j-1 \atopwithdelims ()j} - {d+j-3 \atopwithdelims ()j-2}\). A suitable orthonormal basis can be found using Theorem 5.25 in Axler et al. (2001) or Manzotti and Quiroz (2001), see also Groemer (1996) or Müller (1998) for details on spherical harmonics. Manzotti and Quiroz (2001) suggest two different choices for \({\mathbf {f}}\). Putting \(r_j(x)=\Vert x\Vert ^j\), \(x \in {\mathbb {R}}^d\), and \(u(x)=x/\Vert x\Vert \), \(x \ne 0\), the first statistic \(T_{n,MQ}({\mathbf {f}}_1)\) uses \(f_j\) of the form \(g\circ u\) for \(g\in {\mathcal {G}}_4\setminus {\mathcal {H}}_0\), giving a total of \(k={d+3 \atopwithdelims ()4} - {d+2 \atopwithdelims ()3}-1\) functions. Due to orthonormality we have \(V={\mathrm{I}}_k\), and since no radial functions are considered, \(T_{n,MQ}({\mathbf {f}}_1)\) only tests for aspects of spherical symmetry. The second statistic \(T_{n,MQ}({\mathbf {f}}_2)\) uses the functions \(r_1\) and \(r_3 (g \circ u)\), where \(g \in {\mathcal {G}}_2\), for a total of \(k={d+1 \atopwithdelims ()2} +d + 1\) functions.

Both statistics are affine invariant, and Manzotti and Quiroz (2001) derive their limit null distributions, which are sums of weighted independent \(\chi ^2_1\) random variables. Although the authors do not deal with the question of consistency of their tests, it is easily seen that, under an alternative distribution \({{\mathbb {P}}}^X\) (which, in view of invariance, is assumed to be standardized), and suitable conditions on \(f_1,\ldots ,f_k\), we have

$$\begin{aligned} \frac{1}{n} T_{n,MQ} {\mathop {\longrightarrow }\limits ^{{{{\mathbb {P}}}}}}\delta (f)^\top V^{-1} \delta (f) \end{aligned}$$

as \(n \rightarrow \infty \), where \(\delta (f) = ({{\mathbb {E}}}f_1(X) - {{\mathbb {E}}}_0 f_1, \ldots ,{{\mathbb {E}}}f_k(X) - {{\mathbb {E}}}_0 f_k)^\top \), and \({{\mathbb {E}}}_0 f_j\) is the expectation \({{\mathbb {E}}}f_j(N)\), where \(N {\mathop {=}\limits ^{{{{{\mathcal {D}}}}}}}{\mathrm{N}}_d(0,{\mathrm{I}}_d)\). Since there are non-normal distributions for which the above (nonnegative) stochastic limit vanishes, the tests of Manzotti and Quiroz (2001) are not consistent against general alternatives. To the best of our knowledge, there are no further asymptotic properties of \(T_{n,MQ}\) under alternatives to \(H_0\).

8 Tests based on skewness and kurtosis

A still very popular group of tests for \(H_0\) employs measures of multivariate skewness and kurtosis. The popularity of these tests stems from the widespread belief that, in case of rejection of \(H_0\), there is some evidence regarding the kind of departure from normality of the underlying distribution.

The state of the art regarding this group of tests has been reviewed in Henze (2002), but for the sake of completeness, we revisit the most important facts. The classical invariant measures of multivariate sample skewness and kurtosis due to Mardia (1970) are defined by

$$\begin{aligned} b_{n,d}^{(1)} = \frac{1}{n^2} \sum _{j,k=1}^n \left( Y_{n,j}^\top Y_{n,k}\right) ^3, \qquad b_{n,d}^{(2)} = \frac{1}{n} \sum _{j=1}^n \Vert Y_{n,j}\Vert ^4, \end{aligned}$$
(8.1)

respectively. The functional (population counterpart) corresponding to \(b_{n,d}^{(1)}\) is \(\beta _d^{(1)} = \beta _d^{(1)}({{\mathbb {P}}}^X) = {{\mathbb {E}}}(X_1^\top X_2)^3\), where X is standardized, \(X_1,X_2\) are i.i.d. copies of X, and \({{\mathbb {E}}}\Vert X\Vert ^6 < \infty \). The functional accompanying kurtosis is \(\beta _d^{(2)} = \beta _d^{(2)}({{\mathbb {P}}}^X) = {{\mathbb {E}}}\Vert X\Vert ^4\), where, as above, \({{\mathbb {E}}}(X) =0\) and \({{\mathbb {E}}}(XX^\top ) = {\mathrm{I}}_d.\) When used as statistics to test \(H_0\), \(b_{n,d}^{(1)}\) has an upper rejection region, whereas the test based on \(b_{n,d}^{(2)}\) is two-sided. If the distribution of X is elliptically symmetric, we have

$$\begin{aligned} n b_{n,d}^{(1)} {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}\alpha _1 \chi _d^2 + \alpha _2\chi _{d(d-1)(d+4)/6}^2, \end{aligned}$$
(8.2)

where

$$\begin{aligned} \alpha _1 = \frac{3}{d}\bigg [ \frac{{{\mathbb {E}}}\Vert X\Vert ^6}{d+2} - 2 {{\mathbb {E}}}\Vert X\Vert ^4 + d(d+2)\bigg ], \qquad \alpha _2 = \frac{6{{\mathbb {E}}}\Vert X\Vert ^6}{d(d+2)(d+4)}, \end{aligned}$$

where \(\chi _d^2\) and \(\chi _{d(d-1)(d+4)/6}^2\) are independent \(\chi ^2\)-variables with d and \(d(d-1)(d+4)/6\) degrees of freedom, respectively, see Baringhaus and Henze (1992) and Klar (2002). Notice that \(\alpha _1=\alpha _2 =6\) under \(H_0\), whence \(n b_{n,d}^{(1)} {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}6 \chi ^2_{d(d+1)(d+2)/6}\) under normality, see Mardia (1970). From (8.2), it follows that the test of \(H_0\) based on \(b_{n,d}^{(1)}\) is not consistent against spherically symmetric alternatives satisfying \({{\mathbb {E}}}\Vert X\Vert ^6 < \infty \). If \(\beta _d^{(1)} > 0\), then \(\sqrt{n}(b_{n,d}^{(1)} - \beta _d^{(1)})\) has a centered non-degenerate limit normal distribution as \(n \rightarrow \infty \), see Theorem 3.2 of Baringhaus and Henze (1992). The skewness functional \(\beta _d^{(1)}(\cdot )\) does not characterize the class \({{{\mathcal {N}}}}_d\) of normal distributions: although \(\beta _d^{(1)}(\cdot )\) vanishes on \({{{\mathcal {N}}}}_d\), there are (notably elliptically symmetric) non-normal distributions that share this property. Since the critical value of \(b_{n,d}^{(1)}\), when used as a test statistic for assessing multivariate normality, is computed under the very assumption of normality, it is not justified, at least not in terms of statistical significance, to impute supposedly diagnostic properties to \(b_{n,d}^{(1)}\) in case of rejection of \(H_0\), in the sense that ’there is evidence that the underlying distribution is skewed.’ In fact, the limit distribution of \(n b_{n,d}^{(1)}\) under certain classes of elliptically symmetric distributions is stochastically much larger than the limit null distribution of \(n b_{n,d}^{(1)}\) (see Baringhaus and Henze 1992), and so rejection of \(H_0\) based on \(b_{n,d}^{(1)}\) may be due to an underlying long-tailed elliptically symmetric distribution.

Regarding kurtosis, we have \(\sqrt{n}(b_{n,d}^{(2)} - \beta _d^{(2)}) {\mathop {\longrightarrow }\limits ^{{{{{\mathcal {D}}}}}}}{\mathrm{N}}(0,\sigma ^2)\) as \(n \rightarrow \infty \), where \(\sigma ^2\) depends on mixed moments of X up to order 8, see Henze (1994a). Under \(H_0\), we have \(\beta _d^{(2)} = d(d+2)\) and \(\sigma ^2 = 8d(d+2)\), and the limit distribution was already obtained by Mardia (1970), see also Klar (2002) for the case that \({{\mathbb {P}}}^X\) is elliptically symmetric. It follows that, under the condition \({{\mathbb {E}}}\Vert X\Vert ^8 < \infty \), Mardia’s kurtosis test for normality is consistent if and only if \(\beta _d^{(2)} \ne d(d+2)\). The critical remarks made above on alleged diagnostic capabilities of tests for \(H_0\) based on measures of skewness apply mutatis mutandis to a test for normality based on \(b_{n,d}^{(2)}\) or any other measure of multivariate kurtosis.
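Although the simulations in this survey use R (package mnt), the two statistics in (8.1) are straightforward to compute. The following is a minimal NumPy sketch, assuming that the scaled residuals are \(Y_{n,j} = S_n^{-1/2}(X_j - {\overline{X}}_n)\) with the sample covariance matrix \(S_n\), as defined earlier in the paper; the function names are ours:

```python
import numpy as np

def scaled_residuals(X):
    """Y_{n,j} = S_n^{-1/2} (X_j - X_bar), with S_n the sample covariance matrix."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                      # sample covariance S_n (1/n normalization)
    vals, vecs = np.linalg.eigh(S)
    S_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T  # symmetric inverse square root
    return Xc @ S_inv_sqrt

def mardia_skewness(X):
    """b_{n,d}^{(1)} = n^{-2} sum_{j,k} (Y_j' Y_k)^3, cf. (8.1)."""
    Y = scaled_residuals(X)
    G = Y @ Y.T                            # Gram matrix of the inner products Y_j' Y_k
    return (G ** 3).sum() / len(X) ** 2

def mardia_kurtosis(X):
    """b_{n,d}^{(2)} = n^{-1} sum_j ||Y_j||^4, cf. (8.1)."""
    Y = scaled_residuals(X)
    return (np.sum(Y ** 2, axis=1) ** 2).mean()
```

Under \(H_0\), \(b_{n,d}^{(2)}\) should be close to \(d(d+2)\), and \(b_{n,d}^{(1)}\), which is nonnegative by construction, close to zero.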

Among the many measures of multivariate skewness, we highlight skewness in the sense of Móri et al. (1993), because it emerges in connection with several weighted \(L^2\)-statistics for testing \(H_0\). This measure is defined by

$$\begin{aligned} {\widetilde{b}}_{n,d}^{(1)} := \frac{1}{n^2} \sum _{j,k=1}^n \Vert Y_{n,j}\Vert ^2 \Vert Y_{n,k}\Vert ^2 Y_{n,j}^\top Y_{n,k}. \end{aligned}$$
(8.3)

The corresponding functional (population counterpart) is \({\widetilde{\beta }}_d^{(1)}= \big \Vert {{\mathbb {E}}}(\Vert X\Vert ^2 X)\big \Vert ^2\), where X is assumed to be standardized and \({{\mathbb {E}}}\Vert X\Vert ^6 < \infty \). Limit distributions for \({\widetilde{b}}_{n,d}^{(1)}\) have been obtained by Henze (1997a) both for the case that \({{\mathbb {P}}}^X\) is elliptically symmetric (which implies \({\widetilde{\beta }}_d^{(1)}=0\)) and the case that \({\widetilde{\beta }}_d^{(1)}>0\), see also Klar (2002).
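The double sum in (8.3) collapses, since \({\widetilde{b}}_{n,d}^{(1)} = \big \Vert n^{-1}\sum _{j=1}^n \Vert Y_{n,j}\Vert ^2 Y_{n,j}\big \Vert ^2\), which allows an O(nd) computation. A short NumPy sketch of this identity (the function name is ours; Y is assumed to hold the scaled residuals \(Y_{n,1},\ldots ,Y_{n,n}\) as rows):

```python
import numpy as np

def mori_skewness(Y):
    """Skewness (8.3) in the sense of Mori et al. (1993), computed via the
    identity b~ = || n^{-1} sum_j ||Y_j||^2 Y_j ||^2 instead of the double sum."""
    v = (np.sum(Y ** 2, axis=1)[:, None] * Y).mean(axis=0)  # n^{-1} sum_j ||Y_j||^2 Y_j
    return float(v @ v)
```

The identity follows by writing the double sum as an inner product of the vector \(n^{-1}\sum _j \Vert Y_{n,j}\Vert ^2 Y_{n,j}\) with itself.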

A further measure of multivariate skewness that has been reviewed in Henze (2002) is skewness in the sense of Malkovich and Afifi (1973), which is defined as

$$\begin{aligned} b_{n,d,M}^{(1)} = \max _{u \in {\mathcal {S}}^{d-1}} \frac{\big \{n^{-1}\sum _{j=1}^n (u^\top X_j- u^\top {\overline{X}}_n)^3 \big \}^2}{(u^\top S_n u)^3}. \end{aligned}$$

General limit distribution theory for \(b_{n,d,M}^{(1)}\) is given in Baringhaus and Henze (1991).

As for further measures of multivariate kurtosis, we mention the measure

$$\begin{aligned} {\widetilde{b}}_{n,d}^{(2)} = \frac{1}{n^2} \sum _{j,k=1}^n \left( Y_{n,j}^\top Y_{n,k}\right) ^4, \end{aligned}$$

introduced by Koziol (1989). The corresponding functional is \({\widetilde{\beta }}_d^{(2)} = {{\mathbb {E}}}(X_1^\top X_2)^4\), where \(X_1,X_2\) are i.i.d. copies of the standardized vector X, and \({{\mathbb {E}}}\Vert X\Vert ^8 < \infty \). General asymptotic distribution theory for \({\widetilde{b}}_{n,d}^{(2)}\) is provided by Henze (1994b) and Klar (2002). Henze (2002) also reviewed kurtosis in the sense of Malkovich and Afifi (1973), which is defined as

$$\begin{aligned} b_{n,d,M}^{(2)} = \max _{u \in {\mathcal {S}}^{d-1}} \frac{n^{-1}\sum _{j=1}^n (u^\top X_j- u^\top {\overline{X}}_n)^4}{(u^\top S_n u)^2}. \end{aligned}$$

Limit distribution theory for \(b_{n,d,M}^{(2)}\) has been obtained by Baringhaus and Henze (1991) and Naito (1998).
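The maxima over the unit sphere in the Malkovich–Afifi statistics have no closed form; in practice they are often approximated, e.g., by searching over a finite set of random directions (cf. the remark on such approximations in Sect. 10.2). The following NumPy sketch is only a crude illustrative approximation of \(b_{n,d,M}^{(1)}\) and \(b_{n,d,M}^{(2)}\), not the exact statistics; the function name and the number of directions are ours:

```python
import numpy as np

def malkovich_afifi_approx(X, n_dir=2000, seed=0):
    """Approximate b_M^(1), b_M^(2) by maximizing the squared univariate skewness
    and the univariate kurtosis of the projections u'X over n_dir random
    directions u on the unit sphere (a Monte Carlo search, not the exact max)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.standard_normal((n_dir, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform directions on S^{d-1}
    P = (X - X.mean(axis=0)) @ U.T                  # centered projections, shape (n, n_dir)
    s2 = (P ** 2).mean(axis=0)                      # u' S_n u for each direction
    skew = (P ** 3).mean(axis=0) ** 2 / s2 ** 3     # squared skewness of projections
    kurt = (P ** 4).mean(axis=0) / s2 ** 2          # kurtosis of projections
    return skew.max(), kurt.max()
```

With more directions, the approximation of the supremum improves, at a linear cost in `n_dir`.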

Since the review of Henze (2002), the following suggestions for testing \(H_0\) by means of measures of multivariate skewness and kurtosis have been made (none of which, however, leads to a consistent test, and all of which share the drawback stated at the beginning of this section): Kankainen et al. (2007) consider invariant tests of multivariate normality that are based on the Mahalanobis distance between two multivariate location vector estimates (as a measure of skewness) and on the (matrix) distance between two scatter matrix estimates (as a measure of kurtosis). Special choices of these estimates yield generalizations of Mardia’s skewness and kurtosis. The authors obtain asymptotic distribution theory for their test statistics both under normality and under certain contiguous alternatives to \(H_0\), and they compare the limiting Pitman efficiencies to those of Mardia’s tests based on \(b_{n,d}^{(1)}\) and \(b_{n,d}^{(2)}\).

Doornik and Hansen (2008) propose a non-invariant test based on skewness and kurtosis.

Enomoto et al. (2020) consider a transformation of Mardia’s kurtosis statistic, with the aim of improving the finite-sample approximation with respect to a normal limit distribution.

9 Miscellaneous results

Arcones (2007) proposes two invariant test statistics that are based on the following characterizations, see, e.g., Cramér (1936). Let \(m \ge 2\) be a fixed integer, and let \(X_1,\ldots ,X_m\) be i.i.d. d-dimensional vectors satisfying \({{\mathbb {E}}}(X_1) =0\) and \({{\mathbb {E}}}(X_1X_1^\top ) = {\mathrm{I}}_d\). Then, \(m^{-1/2}\sum _{j=1}^m X_j {\mathop {=}\limits ^{{{{{\mathcal {D}}}}}}}{\mathrm{N}}_d(0,{\mathrm{I}}_d)\) if and only if \(X_1 {\mathop {=}\limits ^{{{{{\mathcal {D}}}}}}}{\mathrm{N}}_d(0,{\mathrm{I}}_d)\). Furthermore, \(m^{-1/2}\sum _{j=1}^m X_j {\mathop {=}\limits ^{{{{{\mathcal {D}}}}}}}X_1\) if and only if \(X_1 {\mathop {=}\limits ^{{{{{\mathcal {D}}}}}}}{\mathrm{N}}_d(0,{\mathrm{I}}_d)\). A statistic that corresponds to the first characterization is \( {\widehat{D}}_{n,m} = \textstyle {\int } \big | {\widehat{\varPsi }}_{n,m}(t) - \varPsi _0(t)\big |^2 w_\beta (t) \, {\mathrm{d}}t \), where \( {\widehat{\varPsi }}_{n,m}(t) = \frac{(n-m)!}{n!} \textstyle {\sum _{\ne }} \exp \big ({\mathrm{i}} t^\top m^{-1/2} \sum _{p=1}^m Y_{n,j_p} \big )\), and \(\sum _{\ne }\) denotes summation over all \(j_1,\ldots ,j_m \in \{1,\ldots ,n\}\) such that \(j_p \ne j_q\) if \(p \ne q\). Notice that this approach is a generalization of the BHEP-statistic given in (2.8). The statistic which is tailored to the second characterization is \( {\widehat{E}}_{n,m} = \textstyle {\int } \big | {\widehat{\varPsi }}_{n,m}(t) - {\widehat{\varPsi }}_{n,1}(t)\big |^2 w_\beta (t) \, {\mathrm{d}}t\). Both statistics have representations in the form of multiple sums. By using the theory of U-statistics with estimated parameters, Arcones (2007) derives almost sure limits of \({\widehat{D}}_{n,m}\) and \({\widehat{E}}_{n,m}\) as well as the limit distributions of \(n{\widehat{D}}_{n,m}\) and \(n{\widehat{E}}_{n,m}\) under \(H_0\). Some very limited simulations, performed for \(n \le 15\) and \(d=2\), indicate that the power of these tests is comparable to that of the BHEP test.
However, the computational burden involved increases rapidly with m. Bakshaev and Rudzkis (2017) propose a test for multivariate normality that is based on the supremum of \(\big | \varPsi _n(t) - \varPsi _0(t)\big |^2\) figuring in (2.8), where the supremum is taken over a fixed d-dimensional cube that does not contain the origin.

Without providing any distribution theory, Hwu et al. (2002) suggest an invariant two-stage test procedure for testing \(H_0\). This procedure combines a modified correlation coefficient related to a Q–Q plot of the ordered values of \(\Vert Y_{n,j}\Vert ^2\), \(j=1,\ldots ,n\), against ordered quantiles of the \(\chi ^2_d\)-distribution, and a test based on Mardia’s nonnegative invariant measure of skewness \(b_{n,d}^{(1)}\) given in (8.1).

Liang et al. (2004) deal with Q–Q plots based on functions of \((j(j+1))^{-1/2}(X_1+ \ldots + X_j - jX_{j+1})\), \(j=1,\ldots ,n-1\), and hence recommend procedures that are not even invariant with respect to permutations of \(X_1,\ldots ,X_n\). The latter objection also holds for the procedures suggested by Fang et al. (1998) and Liang and Bentler (1999).

Tan et al. (2005) extend the projection procedure of Liang et al. (2000) to test for multivariate normality with incomplete longitudinal data with small sample size, including cases when the sample size n is smaller than d.

Hanusz and Tarasińska (2008) correct an inaccuracy of the (non-invariant) test of Srivastava and Hui (1987), and Maruyama (2007) derives approximations of expectations and variances related to that test under alternative distributions.

Without providing any theoretical results, Hanusz and Tarasińska (2012) aim at transforming two graphical methods for assessing \(H_0\) into formal statistical tests. A variant of this approach was considered by Madukaife and Okafor (2018).

Cardoso de Oliveira and Ferreira (2010) suggest performing a chi-square test based on \(\Vert Y_{n,1}\Vert ^2, \ldots , \Vert Y_{n,n}\Vert ^2\) (see also Moore and Stubblebine 1981), and Batsidis et al. (2013) extend this approach to more general power-divergence-type test statistics. Madukaife and Okafor (2019) consider \(\ell _1\)- and \(\ell _2\)-type measures of deviation between \(\Vert Y_{n,j}\Vert ^2\) and corresponding approximate expected order statistics of a \(\chi ^2_d\)-distribution (for tests based on \(\Vert Y_{n,1}\Vert ^2, \ldots , \Vert Y_{n,n}\Vert ^2\), see also Section 5.2 of Henze 2002). Voinov et al. (2016) compare several test statistics that, for fixed \(r \ge 2\), are quadratic forms in the vector \((V_{n,1}, \ldots , V_{n,r})^\top \). Here, \(V_{n,j} = (N_{n,j}-n/r)/(\sqrt{n/r})\), \(N_{n,j} = \sum _{k=1}^n \mathbf{1}\{c_{j-1} < \Vert Y_{n,k} \Vert ^2 \le c_j\}\), and \(0< c_1< \ldots< c_{r-1} < c_r = \infty \), where \(c_j\) is the (j/r)-quantile of the \(\chi ^2_d\)-distribution, \(j=1,\ldots ,r-1\).
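As an illustration of the cell-count approach, the sketch below computes the simplest Pearson-type quadratic form \(\sum _{j=1}^r V_{n,j}^2\) in the cell frequencies of \(\Vert Y_{n,k}\Vert ^2\); Voinov et al. (2016) study more general quadratic forms. It assumes SciPy for the \(\chi ^2_d\)-quantiles, that Y holds the scaled residuals as rows, and the function name is ours:

```python
import numpy as np
from scipy.stats import chi2

def cell_counts_statistic(Y, r=5):
    """Pearson-type quadratic form sum_j V_{n,j}^2, with cells bounded by the
    (j/r)-quantiles c_j of the chi^2_d-distribution and c_r = infinity.

    Y: (n, d) array of scaled residuals Y_{n,1}, ..., Y_{n,n}."""
    n, d = Y.shape
    c = chi2.ppf(np.arange(1, r) / r, df=d)                 # c_1 < ... < c_{r-1}
    edges = np.concatenate(([0.0], c, [np.inf]))
    N = np.histogram(np.sum(Y ** 2, axis=1), bins=edges)[0]  # N_{n,1}, ..., N_{n,r}
    V = (N - n / r) / np.sqrt(n / r)
    return float(V @ V)
```

Under \(H_0\) and for n large, \(\Vert Y_{n,k}\Vert ^2\) is approximately \(\chi ^2_d\)-distributed, so the r cells are roughly equiprobable.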

Jönsson (2011) investigates the finite-sample performance of the Jarque–Bera test for \(H_0\), with the aim of improving the size of the test. Koizumi et al. (2014) improve upon multivariate Jarque–Bera-type tests by means of transformations. Simulations show that such transformations essentially improve the accuracy of the tests when d is close to n. Kim (2016) generalizes the univariate Jarque–Bera test and its modifications to multivariate versions by means of an orthogonalization of the data and compares these with competitors in a simulation study.

Kim and Park (2018) propose a non-invariant test based on univariate Anderson–Darling-type statistics that are averaged over the d coordinates. Villasenor Alva and González Estrada (2009) suggest a non-invariant test that is based on the average of Shapiro–Wilk statistics, applied to each of the components of \(Y_{n,1},\ldots ,Y_{n,n}\).

By using an idea of Fromont and Laurent (2006), Tenreiro (2011) proposes an invariant consistent multiple test procedure that combines Mardia’s measures of skewness and kurtosis and two members of the family of BHEP tests. The combined procedure rejects \(H_0\) if one of the statistics is larger than its \((1-u_{n,\alpha })\)-quantile under \(H_0\), where \(u_{n,\alpha }\) is calibrated so that the combined test has a desired level of significance \(\alpha \). In the same spirit, Tenreiro (2017) combines two BHEP tests and the ’extreme’ BHEP tests, the statistics of which are given by the right-hand sides of (2.14) and (2.15).

Majerski and Szkutnik (2010) consider the problem of testing \(H_0\) against some alternatives that are invariant with respect to a subgroup of the full group of affine transformations and obtain approximations to the most powerful invariant tests. Special emphasis is given to exponential and uniform alternatives in the case \(d=2\), whereas the case \(d \ge 3\) is only sketched.

In the spirit of projection pursuit tests (see Section 8.1 of Henze 2002), which are based on Roy’s union-intersection principle (Roy 1953), Zhou and Shao (2014) propose a non-invariant test that combines the Shapiro–Wilk test and Mardia’s kurtosis test. In the same spirit, Wang and Hwang (2011) suggest a procedure that employs solely the Shapiro–Wilk statistic.

Wang (2014) provides a MATLAB package for testing \(H_0\), which is implemented as an interactive and graphical tool. The package comprises 12 different tests, among which are the energy test, the Henze–Zirkler test, and the tests based on Mardia’s skewness and kurtosis.

Thulin (2014) proposes six invariant tests for \(H_0\), the common basis of which are characterizations of the multivariate normal distribution via independence properties of sample moments.

10 Comparative simulation studies

10.1 Available simulation studies

Mecklin and Mundfrom (2005) perform an extensive simulation study with 13 tests for multivariate normality. From this study, they conclude that ’if one is going to rely on one and only one procedure, the Henze–Zirkler test is recommended. This recommendation is based on the relative ease of use (the test statistic has an approximately lognormal asymptotic distribution), good Monte Carlo simulation results, and mathematically proven consistency against all alternatives.’

Farrell et al. (2007) compare four tests of multivariate normality and conclude: ’The results of our simulation suggest that, relative to the other two tests considered, the Henze and Zirkler test generally possesses good power across the alternative distributions investigated, in particular for \(n \ge 75\).’

Hanusz et al. (2018) compare four tests of \(H_0\) that are based on combinations of measures of multivariate skewness and kurtosis, as well as the Henze–Zirkler test. They conclude that ’the Henze–Zirkler test best preserves the nominal significance level’ and that ’for the number of traits and sample sizes considered, it is not possible to indicate the most powerful test for all kinds of alternative distributions considered in the paper.’

Joenssen and Vogel (2014) investigate 15 tests of \(H_0\), all of which are freely available as R functions. They find that some tests are unreliable and should either be corrected or removed, or their deficits should be commented upon in the documentation by the package maintainer. Moreover, they summarize: ’On the question of whether or not multivariate tests offer an advantage over simply testing each marginal distribution with a univariate test, the answer is a resounding yes. Not only are some multivariate tests able to detect deviations from normality that are not reflected in the marginals of the distribution, but these tests are also, in part, more powerful for distributions that do display the deviations in the marginals.’

10.2 New simulation study

This subsection compares the finite-sample power performance of the tests presented in this survey by means of a Monte Carlo simulation study. All simulations are performed using the statistical computing environment R, see R Core Team (2018). The tests were implemented in the accompanying R package mnt, see Butsch and Ebner (2020).

We consider the sample sizes \(n=20\), \(n=50\) and \(n=100\), the dimensions \(d=2\), \(d=3\) and \(d=5\), and the nominal level of significance is set to 0.05. Throughout, critical values for the tests have been simulated with 100 000 replications under \(H_0\), see Table 1. Note that, in order to ease the comparison with the original articles, we state the empirical quantiles of \(\left( 16\gamma ^{2+d/2}/\pi ^{d/2}\right) {\mathrm{HV}}_{n,\gamma }\), \(\pi ^{-d/2}{\mathrm{HJ}}_{n,\gamma }\), \((\gamma /\pi )^{d/2}{\mathrm{HJM}}_{n,\gamma }\), \((\gamma /\pi )^{d/2}d^{-2}{\mathrm{DEH}}_{n,\gamma }\), and \((\gamma /\pi )^{d/2}d^{-2}{\mathrm{DEH}}^*_{n,\gamma }\), and, whenever available, we chose the tuning parameter \(\gamma \) according to the suggestions of the respective authors. For the sake of readability, we suppress the index n for all tests in the tables. Note that \(\overline{\text{ BHEP }}\) denotes the BHEP test with tuning parameter \(\beta = \sqrt{2}/(1.376+0.075d)\), as suggested in Tenreiro (2009). The values given in Table 1 are also reported in package mnt in the data frame Quantile095 for easy access. Each entry in a table of empirical rejection rates, which serve as estimates of the power of the tests, is based on 10 000 replications; the only exception is the HJM test, for which 1 000 replications were used, owing to the heavy computation time of the procedure.
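The critical-value simulation described above can be sketched as follows; here in NumPy rather than the R package mnt used in the paper, with far fewer replications than the 100 000 behind Table 1, and shown for Mardia's kurtosis statistic only (function names are ours):

```python
import numpy as np

def mardia_kurtosis(X):
    """b_{n,d}^{(2)} = n^{-1} sum_j ||Y_{n,j}||^4 from the scaled residuals."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    vals, vecs = np.linalg.eigh(S)
    Y = Xc @ vecs @ np.diag(vals ** -0.5) @ vecs.T    # scaled residuals Y_{n,j}
    return (np.sum(Y ** 2, axis=1) ** 2).mean()

def simulated_critical_value(n, d, reps=2000, alpha=0.05, seed=0):
    """Empirical (1 - alpha)-quantile of the statistic under H_0, obtained by
    simulating N_d(0, I_d) samples (affine invariance makes this choice WLOG)."""
    rng = np.random.default_rng(seed)
    stats = [mardia_kurtosis(rng.standard_normal((n, d))) for _ in range(reps)]
    return float(np.quantile(stats, 1 - alpha))
```

The same loop, applied to any of the affine invariant statistics of this survey, yields the entries of Table 1 up to Monte Carlo error.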

Table 1 Empirical \(95\%\) quantiles of the test statistics under \(H_0\) (100 000 replications)

We consider a total of 32 alternatives as well as a representative of the multivariate normal distribution. By NMix\((p,\mu ,\varSigma )\) we denote the normal mixture distribution generated by

$$\begin{aligned} (1 - p) \, {\mathrm{N}}_d(0, {\mathrm{I}}_d) + p \, {\mathrm{N}}_d(\mu , \varSigma ), \quad p \in (0, 1), \, \mu \in {\mathbb {R}}^d, \, \varSigma > 0, \end{aligned}$$

where \(\varSigma > 0\) stands for a positive definite matrix. In the above notation, \(\mu =3\) stands for a d-variate vector of 3’s and \(\varSigma ={\mathrm{B}}_d\) for a \((d \times d)\)-matrix containing 1’s on the main diagonal and 0.9’s for each off-diagonal entry. We write \(t_\nu (0,{{\mathrm{I}}}_d)\) for the multivariate t-distribution with \(\nu \) degrees of freedom, see Genz and Bretz (2009). By \(\hbox {DIST}^d(\vartheta )\) we denote the d-variate random vector generated by independently simulated components of the distribution DIST with parameter vector \(\vartheta \), where DIST is taken to be the uniform distribution U, the lognormal distribution LN, the beta distribution B, as well as the Pearson type II distribution \(\hbox {P}_{{II}}\) and the Pearson type VII distribution \(\hbox {P}_{{VII}}\). For the latter distribution, we used the R package PearsonDS, see Becker and Klößner (2017). The spherically symmetric distributions were simulated using the R package distrEllipse, see Ruckdeschel et al. (2006), and they are denoted by \({\mathcal {S}}^d(\text{ DIST})\), where DIST stands for the distribution of the radii, which was chosen to be the exponential, the beta, the \(\chi ^2\)-distribution and the lognormal distribution. With \(\hbox {MAR}_d\)(DIST) we denote \({\mathrm{N}}_d(0, {\mathrm{I}}_d)\)-distributed random vectors, where the dth component is independently replaced by a random variable following the distribution DIST. Here, we chose the exponential, the \(\chi ^2\), Student’s t and the gamma distribution. With \(\hbox {NM}_d(\vartheta )\) we denote the normal mixture distributions generated by

$$\begin{aligned} 0.5 \, {\mathrm{N}}_d(0, \varSigma _\vartheta ) + 0.5 \, {\mathrm{N}}_d(0, \varSigma _{-\vartheta }), \end{aligned}$$

where \(\varSigma _\vartheta \) is a positive definite (\(d\times d\))-matrix with 1’s on the diagonal and the constant \(\vartheta \) for each off-diagonal entry. In this family of non-normal distributions, each component follows a normal law. The symbol S\(|\text{ N}_d|\) stands for the distribution of \(\pm |X|\), where \(X{\mathop {=}\limits ^{{{{{\mathcal {D}}}}}}}{\mathrm{N}}_d(0,\text{ I}_d)\), the absolute value \(|\cdot |\) is applied componentwise, and ± assigns a random sign to each component of |X|, independently and with probability 0.5 each. Finally, we consider the distribution \({\mathrm{N}}_d(\mu _d,\varSigma _{0.5})\), with \(\mu _d=(1,2,\ldots ,d)^\top \) and the same covariance structure as reported for the NM-alternatives, in order to show that all tests under consideration are invariant and indeed have a type I error equal to the significance level of \(5\%\).
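Two of the alternatives above can be sampled as follows. This is a hedged NumPy sketch (the simulations in the paper were done in R; the function names are ours, and for \(\hbox {NM}_d(\vartheta )\) the parameter must satisfy \(|\vartheta | < 1/(d-1)\) so that \(\varSigma _{\pm \vartheta }\) is positive definite):

```python
import numpy as np

def rnmix(n, d, p, mu_val=3.0, rho=0.9, seed=0):
    """NMix(p, mu, Sigma): (1-p) N_d(0, I_d) + p N_d(mu, B_d), where mu is a
    vector of mu_val's and B_d has 1's on the diagonal and rho off-diagonal."""
    rng = np.random.default_rng(seed)
    B = np.full((d, d), rho)
    np.fill_diagonal(B, 1.0)
    L = np.linalg.cholesky(B)
    Z = rng.standard_normal((n, d))
    mix = rng.random(n) < p                 # indicator of the N_d(mu, B_d) component
    X = Z.copy()
    X[mix] = mu_val + Z[mix] @ L.T
    return X

def rnm(n, d, theta, seed=0):
    """NM_d(theta): 0.5 N_d(0, Sigma_theta) + 0.5 N_d(0, Sigma_{-theta});
    every marginal is standard normal, but the joint law is non-normal.
    Requires |theta| < 1/(d-1) for positive definiteness."""
    rng = np.random.default_rng(seed)
    comp = np.where(rng.random(n) < 0.5, theta, -theta)  # off-diagonal per draw
    X = np.empty((n, d))
    for s in (theta, -theta):
        idx = comp == s
        Sig = np.full((d, d), s)
        np.fill_diagonal(Sig, 1.0)
        L = np.linalg.cholesky(Sig)
        X[idx] = rng.standard_normal((int(idx.sum()), d)) @ L.T
    return X
```

The other alternatives (elliptical laws via radial distributions, marginally perturbed normals, S\(|\text{ N}_d|\)) can be generated along the same lines.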

Table 2 Empirical rejection rates of the considered tests (\(d=2\), \(\alpha =0.05\))
Table 3 Empirical rejection rates of the considered tests (\(d=3\), \(\alpha =0.05\))
Table 4 Empirical rejection rates of the considered tests (\(d=5\), \(\alpha =0.05\))

The results of the weighted \(L^2\)-type tests in Tables 2, 3 and 4 refer to the same tuning parameters as in Table 1; in order to keep the tables concise, these values are omitted there.

First, we evaluate the results for \(d=2\). A close look at Table 2 reveals that, for the family of normal mixture distributions, the HZ test and the PU test perform best when the shifted standard normal distributions are mixed, whereas for different covariance matrices, the strongest procedure is HJM. The HJM test also performs best for the multivariate t-distributions. For the independently simulated components, \(T_{MQ}({\mathbf {f}}_2)\) is strong, especially for marginal distributions with bounded support. Interestingly, each of the tests based on measures of skewness and kurtosis, as well as the HV and HJ tests, completely fails to detect these alternatives. For the Pearson type VII alternatives, HJM again has the strongest power, while \(\overline{\text{ BHEP }}\) shows the strongest performance for \(\hbox {LN}^2(0,0.5)\) and BHEP for \(\hbox {B}^2(1,2)\). The spherically symmetric alternatives with bounded support of the radial distributions are well detected by the HZ and the \({{{\mathcal {E}}}}\) tests. For the case of unbounded support of the radial distribution, the strongest test is again HJM. This test is also strongest for the marginally perturbed alternatives \(\hbox {MAR}_2\)(DIST), where it is outperformed only by the PU test for the perturbations by Exp(1)- and \(\chi ^2\)-random variables. The \(\hbox {NM}_d(\vartheta )\)-distributions are uniformly best detected by HJM, although the power is not very strong, whereas all other tests almost completely fail to detect these alternatives. Notably, the S\(|\text{ N}_2|\) alternatives are best detected by \(T_{MQ}({\mathbf {f}}_1)\). Overall, HJM performs best for the chosen alternatives, but it lacks power when the support of the distribution is bounded. From a robustness point of view, the weighted \(L^2\)-procedures like \(\hbox {DEH}^*\) and the HZ test, as well as the energy test \({{{\mathcal {E}}}}\), perform very well, especially if the focus is on consistency.

In dimensions \(d=3\) and \(d=5\), the same picture emerges regarding which procedures perform best against which alternatives. Interestingly, the power of the procedures increases compared to the lower-dimensional setting, which appears counterintuitive in view of the curse of dimensionality. Some noticeable phenomena arise: For the \({\mathcal {S}}^d(\text{ B }(2,2))\) distribution, some of the tests, like HV, HJ, \(T_{CS}\), \(b_M^{(1)}\) and \(b_M^{(2)}\), seem to lose power as the sample size increases. An explanation for this behavior of the latter tests might be that these procedures use an approximation of the maximum over the unit sphere, which might be harder to compute accurately for larger samples. In the case \(d=3\), we also observe this behavior for the HJM test. Interestingly, the HJM test and the PU test gain power against the \(\hbox {NM}_d(\vartheta )\)-alternatives in comparison with the case \(d=2\), whereas the other procedures fail nearly uniformly to distinguish these alternatives from the null hypothesis in each dimension considered.

11 Conclusions and outlook

From a practical point of view, we recommend using the computationally efficient weighted \(L^2\)-type procedures, such as BHEP (or variants of it like HZ) and \(\hbox {DEH}^*\), or the energy test \({\mathcal {E}}\), since they strike a good balance between fast computation and robust power against many alternatives, and they do not exhibit any particular weakness. If computation time is not an issue, we suggest employing the HJM test, as it outperforms most of the other procedures. Note that, by choosing other tuning parameters, the weighted \(L^2\)-procedures can be expected to gain power against specific alternatives, especially if the tuning parameter can be chosen in a data-dependent way. For a first step in this direction for univariate goodness-of-fit tests, see Tenreiro (2019). In general, it would be desirable to have explicit solutions of the Fredholm integral equation (2.4). For some recent cases in which such integral equations have been solved explicitly in the context of goodness-of-fit testing, see, e.g., Theorem 3.2 of Baringhaus and Taherizadeh (2010) or Theorems 3 and 5 of Hadjicosta and Richards (2019). High-dimensional \(L^2\)-statistics for testing normality have not yet been considered in the literature. The efficient implementation of the tests in the package mnt allows for first simulations, which indicate that interesting new phenomena arise.