Tests for multivariate normality -- a critical review with emphasis on weighted $L^2$-statistics

This article gives a synopsis on new developments in affine invariant tests for multivariate normality in an i.i.d.-setting, with special emphasis on asymptotic properties of several classes of weighted $L^2$-statistics. Since weighted $L^2$-statistics typically have limit normal distributions under fixed alternatives to normality, they open ground for a neighborhood of model validation for normality. The paper also reviews several other invariant tests for this problem, notably the energy test, and it presents the results of a large-scale simulation study. All tests under study are implemented in the accompanying R-package mnt.


Introduction
Testing for multivariate normality (for short: MVN) is a topic of ongoing interest. A survey of dozens of MVNtests, including graphical procedures for assessing multivariate normality, provide Mecklin and Mundfrom (2004). The review of Henze (2002) concentrates on affine invariant and consistent procedures, and the book of Thode (2002) contains a chapter on testing for MVN.
In a standard setting, let X, X 1 , X 2 , . . . be independent identically distributed (i.i.d.) d-variate random (column) vectors, which are defined on a common probability space (Ω, A, P). The distribution of X will be denoted by P X . We write N d (µ, Σ) for the d-variate normal distribution with expectation µ and covariance matrix Σ, and we let N d := {N d (µ, Σ) : µ ∈ R d , Σ positive definite} denote the class of all non-degenerate d-variate normal distributions. Testing for d-variate normality means testing the hypothesis H 0 : P X ∈ N d , against general alternatives, on the basis of X 1 , . . . , X n . At the outset, it should be stressed that each model can merely hold approximately in practice. In particular, there can only be approximate normality, in whatever sense. Consequently, there is the following basic drawback inherent in any goodness-of-fit test, not only of H 0 , but also of other families of distributions: If a level-α-test of H 0 does not lead to a rejection of H 0 , the null hypothesis is by no means validated or confirmed. Presumably, there is merely not enough evidence to reject it! A further fundamental point is that there cannot be an optimal test of H 0 , if one really wants to detect general alternatives. In this respect, Janssen (2000) shows that the global power function of any nonparametric test is flat on balls of alternatives, except for alternatives coming from a finite-dimensional subspace. Thus, loosely speaking, each test of H 0 has its own 'non-centrality'.
Regarding the task of reviewing MVN-tests here in 2020, we cite Mecklin and Mundfrom (2004), who write 'the continuing proliferation of papers with new methods of assessing MVN makes it virtually impossible for any single MSC 2010 subject classifications. Primary 62H15 Secondary 62G20 survey article to cover all available tests'. And they continue: 'When compared to the amount of work that has been done in developing these tests, relatively little work has been done in evaluating the quality and power of the procedures'.
This review can also only be partial. We will take the above testing problem seriously and concentrate on genuine tests of H 0 that have been proposed since the review Henze (2002), and we will judge each of these according to the following points of view: • affine invariance • theoretical properties (limit distributions under H 0 and under fixed and contiguous alternatives to H 0 , consistency) • feasibility with respect to sample size and dimension.
Thus, e.g., we will not deal with tests for H 0 that allow for n ≤ d (see Tan et al. (2005) or Yamada and Himeno (2019)), since the condition n ≥ d + 1 is necessary to decide whether the underlying covariance matrix is nondegenerate or not. Moreover, unlike the review of Mecklin and Mundfrom (2004), we will not discuss purely graphical procedures, as proposed in Holgersson (2006). We will also not embark upon a review of tests for normality in noni.i.d.-settings, like testing for Gaussianity of the innovations in MGARCH processes (see, e.g., Lee and Ng (2011) or Lee, Lee and Park (2014)), or situations with incomplete data (see, e.g., Yamada, Romer and Richards (2015)), since such a task would go beyond the scope of this review. We will also not review tests for Gaussianity in infinitedimensional Hilbert spaces, see, e.g., Górecki, Horváth and Kokoszka (2020) or Kellner and Celisse (2019).
Regarding affine invariance, notice that the class N d is closed with respect to full rank affine transformations. Hence, any 'genuine' statistic T n = T n (X 1 , . . . , X n ) (say) for testing H 0 should satisfy T n (AX 1 + b, . . . , AX n + b) = T n (X 1 , . . . , X n ) for each regular (d × d)-matrix A and each b ∈ R d . Otherwise, it would be possible to reject H 0 on given data and do not object against H 0 on the same data, after performing a rotation, which makes little, if any, sense. In the sequel, let Y n,j = S −1/2 n (X j − X n ), j = 1, . . . , n, (1.1) denote the so-called scaled residuals. Here, X n = n −1 n j=1 X j is the sample mean, S n = n −1 n j=1 (X j − X n )(X j − X n ) ⊤ stands for the sample covariance matrix of X 1 , . . . , X n , and the superscript ⊤ denotes transposition of column vectors. The matrix S −1/2 n is the unique symmetric square root of S −1 n . The latter matrix exists almost surely if n ≥ d+ 1 and P X is absolutely continuous with respect to d-dimensional Lebesgue measure, see Eaton and Perlman (1973). These assumptions will be standing in what follows. We remark that S n is sometimes defined with the factor (n−1) −1 instead of n −1 , but this difference does not have implications for asymptotic considerations. A good account on finite-sample distribution theory of Y n,1 , . . . , Y n,n under H 0 is provided by Takeuchi (2020).
Affine invariance is achieved if the test statistic T n is a function of Y ⊤ n,i Y n,j , i, j ∈ {1, . . . , n}, or if T n is a function of (only) Y n,1 , . . . , Y n,n , and T n (OY n,1 , . . . , OY n,n ) = T n (Y n,1 , . . . , Y n,n ) for each orthogonal (d × d)-matrix O. If a statistic T n is affine invariant (henceforth invariant for the sake of brevity), the distribution of T n under the null hypothesis H 0 does not depend on the parameters µ and Σ of the underlying normal distribution. Thus, regarding distribution theory under H 0 , we can without loss of generality assume that P X = N d (0, I d ). Here, 0 is the origin in R d , and I d is the unit matrix of order d. But invariance of a statistic T n also entails that it is no restriction to assume EX = 0 and EXX ⊤ = I d when studying the distribution of T n under an alternative to H 0 that satisfies E X 2 < ∞, where · denotes the Euclidean norm in R d .
As for the second point, i.e., properties of a test of H 0 based on a statistic T n that go beyond mere simulation results, there should be a sound rationale for the test, which means that there should be good knowledge of what is estimated by T n if the underlying distribution is not normal. This rationale is intimately connected to the property of consistency. If T n is some invariant statistic, it must be regarded -perhaps after some suitable normalization -as an estimator of some invariant functional T (P ) of the unknown underlying distribution P , where P = P X . This means that T (P ) = T ( P ) if P is a full rank affine image of P , whence T (·) is constant over the class N d . For such a functional, consistency of a test based on T against general alternatives can not be expected if T does not characterize the class N d , in the sense that there are P 1 ∈ N d and P 2 / ∈ N d such that T (P 1 ) = T (P 2 ). Examples of non-characterizing functionals are time-honored measures of multivariate skewness and kurtosis, see Section 8. The most prominent of this group of tests is Mardia's invariant non-negative skewness functional (1.2) Here, X 1 , X 2 are i.i.d. with distribution P , mean µ and nonsingular covariance matrix Σ. The functional β (1) d does not characterize the class N d since it does not only vanish on N d , but in particular also for each non-normal elliptically symmetric distribution for which the expectation figuring in (1.2) exists. This fact has striking consequences for a standard test of H 0 that rejects H 0 for large values of the sample counterpart of β (1) d , see Section 8. The paper is organized as follows: Section 2 gives a thorough account on general aspects of weighted L 2 -statistics for testing H 0 , and besides the class of BHEP-tests, it reviews five recently proposed tests for multivariate normality that are based on either the characteristic function, the moment generating function, or a combination thereof. Section 3 reviews the Henze-Zirkler test with bandwidth depending on sample size and dimension, which is not a weighted L 2 -statistic in the sense of Section 2. In Section 4, we summarize the most important features of the meanwhile well established energy test of Székely and Rizzo (2005), and Section 5 deals with the test of Pudelko (2005). Section 6 reviews new theoretical results on a time-honored test of Cox and Small (1978), while Section 7 considers the test of Manzotti and Quiroz (2001), which is based on functions of spherical harmonics. In Section 8 we review tests based on skewness and kurtosis, and in Section 9 we try to give a brief account on further work on the subject. Section 10 presents the results of a large scale simulation study that comprises each of the tests treated in Sections 2 -8. The final Section 11 draws some conclusions, and it gives an outlook for further research.
We conclude this section by pointing out some general notation. Throughout the paper, B d stands for the σ-field of Borel sets in R d , S d−1 := {x ∈ R d : x = 1} is the surface of the unit sphere in R d , and Φ(·) denotes the distribution function of the standard normal distribution. The symbol D −→ stands for convergence in distribution of random elements (variables, vectors and processes), and P −→, a.s. −→ denote convergence in probability and almost sure convergence, respectively. Each limit refers to the setting n → ∞. The symbol D = denotes equality in distribution. Throughout the paper, each unspecified integral will be over R d . The acronyms (E)MGF and (E)CF stand for the (empirical) moment generating function and the (empirical) characteristic function, respectively. Finally, we write 1{A} for the indicator function of an event A.

Weighted L -statistics
In this chapter, we review the state of the art of weighted L 2 -statistics for testing H 0 . These statistics have a long history, and they are also in widespread use for goodness-of-fit problems with many other distributions, see, e.g., Baringhaus, Ebner and Henze (2017). A weighted L 2 -statistic for testing H 0 takes the form Here, Z n (t) = z n (X 1 , . . . , X n , t), z n is a real-valued measurable function defined on the (n + 1)-fold cartesian product of R d , and w : R d → R is a non-negative weight function satisfying The function z n can also be vector-valued; then Z 2 n (t) in (2.1) is replaced with Z n (t) 2 . Typically, Z n (t) takes the form In view of (2.1), a natural setting to study asymptotic properties of T n is the separable Hilbert space H := L 2 (R d , B d , w(t)dt) of (equivalence classes) of measurabe functions on R d that are square-integrable with respect to w(t)dt. If f H := f 2 (t)w(t) dt 1/2 denotes the norm of f ∈ H, then T n = Z n 2 H . The general approach to derive the limit distribution of T n under H 0 is to prove Z n D −→ Z for some centred Gaussian random element of H, whence T n D −→ Z 2 H by the continuous mapping theorem. To this end, it is indispensable to approximate Z n figuring in (2.2) by a suitable random element Z n,0 of H of the form Hilbert spaces (see, e.g., Theorem 2.7 in Bosq (2000)), then yields Z n,0 D −→ Z for some centred Gaussian element of The distribution of Z is uniquely determined by the kernel K(·, ·), and the distribution of Z 2 H is that of ∞ j=1 λ j N 2 j , where the N j are i.i.d. standard normal random variables, and λ j , j = 1, 2, . . ., are the positive eigenvalues corresponding to eigenfunctions f j of the (linear second-order homogeneous Fredholm) integral equation see, e.g., Kac and Siegert (1947). The problem of finding the eigenvalues and associated eigenfunctions of (2.4) is called the kernel eigenproblem. In this respect, hitherto none of the integral equations corresponding to the test presented in this section has been solved explicitly. Notice that knowledge of the largest eigenvalue λ max (say) opens ground for the calculation of the approximate Bahadur slope and hence for statements on the Bahadur efficiency which, for asymptotically normal statistics, typically coincides with the Pitman efficiency, for details see Bahadur (1960) and Nikitin (1995).
To find a random element Z n,0 of the form (2.3) that approximates Z n , one has to evaluate the effect of replacing Y n,j in (2.2) with X j . Putting ∆ n,j = Y n,j − X j , j = 1, . . . , n, the following result, taken from Dörr, Ebner and Henze (2019), is helpful.
, the function ℓ(·) must be smooth enough to allow for a Taylor expansion. To tackle the linear part in this expansion, it is crucial to have some information on ∆ n,j = (S −1/2 n − I d )X j − S −1/2 n X n . Such information is provided by display (2.13) of Henze and Wagner (1997), according to which Since Proposition 1 holds under general assumptions, one may often obtain asymptotic normality of weighted L 2statistics under fixed alternatives. To this end, notice that Under suitable conditions, we will have T n /n P −→ ∆, where ∆ = z 2 H , and z(t) = E ℓ(t ⊤ X) , z ∈ R d . An immediate consequence of this stochastic convergence is the consistency of a test for H 0 based on T n against each alternative distribution that satisfies ∆ > 0. But we have more! Writing u, v H = u(t)v(t)w(t) dt for the inner product in H, there is the decomposition These lines carve out the quintessence of asymptotic normality of weighted L 2 -statistics under fixed alternatives. Namely, if one can show that the sequence V n := √ n(Z n − z) of random elements of H converges in distribution to some centred Gaussian random element V of H, then, by the continuous mapping theorem and Slutski's lemma, we have where σ 2 = 4 K(s, t)z(s)z(t)w(s)w(t) dsdt, and K(·, ·) is the covariance kernel of V , see Theorem 1 of Baringhaus, Ebner and Henze (2017). As a consequence, if σ 2 n is a consistent estimator of σ 2 based on X 1 , . . . , X n , then, for given α ∈ (0, 1), is an asymptotic confidence interval for ∆ of level 1 − α. Moreover, from (2.5) and Slutski's lemma, we have which opens the ground for a validation of a certain neighborhood of H 0 . Namely, suppose that we want to tolerate a given 'distance' ∆ 0 to the class N d . We may then consider the 'inverse' testing problem Here, the dependence of ∆ on the underlying distribution P X has been made explicit.
has asymptotic level α, and it is consistent against general alternatives, see Section 3.3 of Baringhaus, Ebner and Henze (2017). Notice that this test is in the spirit of bioequivalence testing (see, e.g., Czado et al. (2007), Dette and Munk (2003) or Wellek (2010)), since it aims at validating a certain neighborhood of a hypothesized model.
We now review the time-honored class of BHEP-tests and several recently suggested L 2 -statistics for testing H 0 . Each of these statistics has an upper rejection region, and it is invariant, because it is a function of Y ⊤ n,j Y n,k , where j, k ∈ {1, . . . , n}.

The BHEP-tests
Generalizing a test for univariate normality based on the ECF due to Epps and Pulley (1983), the first proposals for weighted L 2 -statistics for testing H 0 are due to Baringhaus and Henze (1988) and Henze and Zirkler (1990), who considered the statistic BHEP n,β = n Ψ n (t) − Ψ 0 (t) 2 w β (t) dt. (2.8) Here, denotes the ECF of Y n,1 , . . . , Y n,n , Ψ 0 (t) = exp(− t 2 /2) is the CF of the distribution N d (0, I d ), and the weight function w β is given by where β > 0 is a fixed constant. That BHEP n,β is indeed of the type (2.1) will become clear from the representation (2.13).
Whereas Baringhaus and Henze (1988) studied the special case β = 1, the general case was treated by Henze and Zirkler (1990). An extremely appealing feature of the weight function w β in (2.10) is that BHEP n,β takes the feasible form The BHEP-test is the most thoroughly studied class of tests for multivariate normality. S. Csörgő (1989) coined the acronym BHEP for this class of tests for H 0 , after early developers of the idea, and he proved that lim inf n→∞ n −1 BHEP n,β ≥ C(P X , β) > 0 almost surely for some constant C(P X , β) if P X does not belong to N d . As a consequence, a test for normality based on BHEP n,β is consistent against any alternative.
If E X 2 < ∞ and EX = 0, EXX ⊤ = I d (the last two assumptions entail no loss of generality in view of invariance), then 1 n BHEP n,β a.s.
In view of the representation (2.11), Baringhaus and Henze (1988) and Henze and Zirkler (1990) obtained the limit null distribution of BHEP n,β as n → ∞ my means of the theory of V-statistics with estimated parameters. Upon observing that Henze and Wagner (1997) considered Z n (·) as a random element in a certain Fréchet space of random functions, and they showed that Z n converges in distribution in that space to some centred Gaussian random element Z, see Theorem 2.1 of Henze and Wagner (1997). Moreover, BHEP n,β D −→ Z 2 (t) w β (t) dt, and the test is able to detect a sequence of contiguous alternatives that approach H d at the rate n −1/2 . Henze and Wagner (1997) also obtained the first three moments of the limit null distribution of BHEP n,β . Finally, the class of BHEP-tests is 'closed at the boundaries' β → 0 and β → ∞ since, elementwise on the underlying probability space, we have (1) n,d , (2.14) where b (1) n,d and b (1) n,d are given in (8.1) and (8.3), respectively, see Henze (1997b). Thus, as β → 0, a scaled version of BHEP n,β is approximately a linear combination of two measures of multivariate skewness. The limit distribution of the right-hand side of (2.14) under general distributional assumptions on X has been studied by Henze (1997b). Last but not least, we have see Henze (1997b). Hence, as β → ∞, rejection of H 0 for large values of BHEP n,β means rejection of H 0 for small values of n j=1 exp(− Y n,j 2 /2). The latter statistic, like Mardia's measure of multivariate kurtosis b (2) n,d (see (8.1)), merely investigates an aspect of the 'radial part' of the underyling distribution.
Guided by theoretical and simulation based results in the univariate case, Tenreiro (2009) performed an extensive simulation study on the power of the BHEP test for dimensions d ∈ {2, 3, . . . , 10, 12, 15} and sample sizes n ∈ {20, 40, 60, 80, 100}. He concluded that the choice β n given in (3.1) gives 'the best results for long tailed or moderately skewed alternatives, but it also produces very poor results for short tailed alternatives'. If no relevant information about the tail of the alternatives is available, he strongly recommends the use of β = √ 2/(1.376 + 0.075d) (in fact,his recommendation is in terms of h = 1/(β √ 2))), and there are similar recommendations for short tailed alternatives and long tailed or moderately skewed alternatives, respectively.

A weighted L 2 -statistic via the moment generating function
Henze and Jiménez-Gamero (2019) generalized results of Henze and Koch (2020) to the multivariate case and considered a MGF analogue to the BHEP-test statistic. Letting denote the EMGF of Y n,1 , . . . , Y n,n , and writing M 0 (t) = exp( t 2 /2), t ∈ R d , for the MGF of the standard normal distribution N d (0, I d ), the test statistic is (2.18) and γ > 2 is some fixed parameter. Notice that the condition γ > 1 is necessary for the integral in (2.17) to be finite, and the more stringent condition γ > 2 is needed for asymptotics under H 0 . The test statistic HJ n,γ has a representation analogous to (2.11) (see display (1.4) of Henze and Jiménez-Gamero (2019)). Elementwise on the underlying probability space, we have which, interestingly, is the same limit as in (2.14). By working in the Hilbert space of (equivalence classes) of measurabe functions on R d that are square-integrable with respect to w γ (t)dt, Henze and Jiménez-Gamero (2019) derived the limit null distribution of HJ n,γ , which is that of HJ ∞,γ := W 2 (t) w γ (t) dt, where W is some centred Gaussian random element of that space. Henze and Jiménez-Gamero (2019) also obtained the expectation and the variance of This inequality implies the consistency of the MVN test based on HJ n,γ against those alternatives that have a finite MGF. Indeed, one may conjecture that this test is consistent against any alternative to H 0 .

A test based on a characterization involving the MGF and the CF
Volkmer (2014) proved a characterization of the univariate centred normal distribution, which involves both the CF and the MGF. Henze, Jiménez-Gamero and Meintanis (2019) generalized this result as follows: denotes the real part of the CF of X, then holds true if and only if X follows some zero-mean normal distribution.
Since Y n,1 , . . . , Y n,n provide an empirical standardization of X 1 , . . . , X n , a natural test statistic based on (2.21) is is the empirical cosine transform of the scaled residuals, and M n (t) and w γ (t) are given in (2.16) and (2.18), respectively. There is a representation of HJM n,γ similar to (2.11), but involving a fourfold sum (see display (3.7) of Henze, Jiménez-Gamero and Meintanis (2019)). The main results about HJM n,γ are as follows: Elementwise on the underlying probability space, we have Interestingly, this is the same linear combination of two measures of skewness as in (2.14) and (2.19). If γ > 1, then the limit null distribution of HJM n,γ is that of HJM ∞,γ := W 2 (t) w γ (t) dt, where W is a centred random element of the Hilbert space L 2 (R d , B d , w(t)dt) with a covariance kernel given in Theorem 5.1 of Henze, Jiménez-Gamero and Meintanis (2019). Moreover, that paper also states a formula for E[HJM ∞,γ ] and obtains the inequality which is analogous to (2.20). We conjecture that also the MVN test based on HJM n,γ is consistent against any non-normal alternative distribution.

A test based on a system of partial differential equations for the MGF
The novel idea of Henze and Visagie (2019) for constructing a test of H 0 is the following: Suppose that the MGF M (t) = E[exp(t ⊤ X)] of a random vector X exists for each t ∈ R d and satisfies the system of partial differential equations If H 0 holds, the scaled residuals Y n,1 , . . . , Y n,n should be approximately independent, with a distribution close to N d (0, I d ), at least for large n. Hence, a natural approach for testing H 0 is to consider the EMGF M n of Y n,1 , . . . , Y n,n , defined in (2.16), and to employ the weighted L 2 -statistic where ∇f stands for the gradient of a function f : R d → R, and w γ is given in (2.18). Putting Y + n,j,k = Y n,j + Y n,k , HV n,γ takes the feasible form To derive the limit null distribution of HV n,γ , put W n (t) : there is some centred Gaussian random element W of H with a covariance (matrix) kernel given in display (11) of Henze and Visagie (2019), so that W n D −→ W as n → ∞. By the continuous mapping theorem, we then have HV n,γ which parallels (2.20) and (2.22).

A test based on the harmonic oscillator in characteristic function spaces
Dörr, Ebner and Henze (2019) noticed that the CF Ψ 0 (t) = exp(− t 2 /2) of the distribution N(0, I d ) is the unique solution of the partial differential equation subject to f (0) = 1, where ∆ is the Laplace operator, see Theorem 1 of Dörr, Ebner and Henze (2019). The operator −∆ + x 2 − d is called the harmonic oscillator, which is a special case of a Schrödinger operator. A suitable statistic for testing H 0 that reflects this characterization is where w γ is given in (2.18) and γ > 0. The test statistic has the feasible form Like the class of BHEP-tests, also the class of tests based on DEH n,γ is closed at the boundaries γ → 0 and γ → ∞, since -elementwise on the underlying probability space -we have Here, b (2) n,d is multivariate kurtosis in the sense of Mardia (1970), defined in (8.1), and b (1) n,d is skewness in the sense of Móri, Rohatgi and Székely (1993), see (8.3). Dörr, Ebner and Henze (2019) proved a Hilbert space central limit theorem for the sequence of random elements , and X is a standardized random vector satisfying E X 4 < ∞. Since for that choice of µ(t), the authors obtained the limit distribution of DEH n,γ under H 0 as well as under contiguous and fixed alternatives to . Under contiguous alternatives that approach H 0 at the rate n −1/2 , the limit distribution of DEH n,γ is that of (V (t) + c(t)) 2 W γ (t) dt, where c(·) is a shift function (see Section 6 of Dörr, Ebner and Henze (2019)). Under a fixed (and because of invariance without loss of generality standardized) alternative distribution satisfying E X 4 < ∞, we have where Ψ is the CF of X. Moreover, the limit distribution of √ n(DEH n,γ /n − D γ ) is a centred normal distribution with a variance that, under the stronger condition E X 6 < ∞, can be consistently estimated from the data. Thus, by analogy with (2.6), an asymptotic confidence interval for D γ is available. Notice that, when compared with (2.12), the almost sure limits above are 'Laplacian analogues' of (2.12).
If X has a standardized alternative distribution satisfying E X 4 < ∞, we have Since the right hand side is population skewness in the sense of Móri, Rohatgi and Székely (1993) (see Section 8), this result complements the second limit in (2.26). Dörr, Ebner and Henze (2019b) also show that, under a fixed alternative distribution satisfying E X 4 < ∞, √ n DEH * n,γ /n − D * γ ) has a centred limit normal distribution with a variance that can be consistently estimated from X 1 , . . . , X n .
3 The Henze-Zirkler test Henze and Zirkler (1990) observed that the BHEP-statistic defined in (2.8) may be written in the form where h 2 = 1/(2β 2 ). The function g n,β is a nonparametric kernel density estimator with Gaussian kernel w 1 (recall w β from (2.10)) and bandwidth h, applied to Y n,1 , . . . , Y n,n . A choice of the bandwidth h in oder to minimize the mean integrated square error when estimating w 1 yields h = h n = (4/(2d + 1)n) −1/(d+4) and thus β = β n , where β n = 2 −1/2 ((2d + 1)n/4) 1/(d+4) . (3.1) The Henze-Zirkler test statistic is given by HZ n = BHEP n,βn . Apparently unaware of the work of Henze and Zirkler (1990), Bowman and Foster (1993) proposed a test statistic BF n that turned out to satisfy BF n = β d n (2π) d/2 BHEP n,βn (see Section 7 of Henze (2002). Thus, BF n is equivalent to a BHEP-statistic with a smoothing parameter that depends on n. Gürtler (2000) proved that as n → ∞ under H 0 . Under a fixed standardized alternative distribution with density f , Gürtler (2000) showed that √ n Hence, the test of H 0 based on BF n (or HZ n ) is consistent against general alternatives. However, since (3.2) remains true under contiguous alternatives that approach H 0 at the rate n −1/2 , the Henze-Zirkler (Bowman-Foster) test is not able to detect such alternatives, see also Tenreiro (2007) for more general results on Bickel-Rosenblatt-type statistics.

The energy test
For nearly 20 years now, the energy test has emerged as a strong genuine test for multivariate normality. It is based on the notion of energy distance between multivariate distributions. The naming energy stems from a close analogy with Newton's gravitational potential energy, see, e.g., Székely and Rizzo (2013). Besides goodness-of-fit testing, the concept of energy distance has found applications in many other fields, such as testing for equality of distributions, nonparametric extensions of analysis of variance, clustering, or testing for independence via distance covariance and distance correlation, see e.g., Székely and Rizzo (2016).
If X and Y are independent random vectors with distributions P X and P Y ,and X ′ and Y ′ denote independent copies of X and Y , respectively, then the squared energy distance between P X and P Y is defined as provided these expectations exist (which is tacitly assumed). The energy distance D(P X , P Y ) satisfies all axioms of a metric. A proof of the fundamental inequality D(P X , P Y ) ≥ 0, with equality if and only if P X = P Y , follows from Zinger, Kakosyan and Klebanov (1992) or Mattner (1997), see also Székely and Rizzo (2005) for a different proof related to a result of Morgenstern (2001).
The energy test statistic for testing H 0 is Here, Y n,j = n/(n − 1)Y n,j with Y n,j given in (1.1), and N 1 and N 2 are independent random vectors with the normal distribution N d (0, I d ), which are independent of X 1 , . . . , X n . The first expectation is with respect to N 1 . Notice that E N 1 − N 2 = 2Γ((d + 1)/2)/Γ(d/2), where Γ(·) is the gamma function. Since, for a ∈ R d , the distribution of a − N 1 2 does only depend on a 2 , the statistic E n is seen to be invariant. The energy test for multivariate normality rejects H 0 for large values of E n . It is consistent against each fixed non-normal alternative, see Székely and Rizzo (2005), and it is fully implemented in the energy package for R, see Rizzo and Székely (2014). To the authors' knowledge, there are hitherto no results on the behavior of E n with respect to contiguous alternatives to H 0 . Since the intrinsic (quadratic) measure of distance between an alternative distribution P X (which, because of invariance, may be taken as having zero mean and unit covariance matrix) and the standard d-variate normal distribution N d (0, I d ) is given by ∆ E (P X ) := D 2 (P X , N d (0, I d )), say, it would be interesting to see whether √ n(E n − ∆ E (P X )) has a non-degenerate normal limit as n → ∞, with a variance that can consistently be estimated from the data X 1 , . . . , X n . Such a result would pave the way for an asymptotic confidence interval for ∆ E (P X ).

The test of Pudelko
For a fixed r > 0, Pudelko (2005) suggested to reject H 0 for large values of the weighted supremum distance where Ψ n (t) is given in (2.9), and Ψ 0 (t) = exp(− t /2). The test statistic PU n,r is invariant, since it is a function of the scaled residuals Y n,1 , . . . , Y n,n and rotation invariant. This statistic is similar in spirit as the statistic studied by S. Csörgő (1986), which is sup t ≤r |Ψ n (t)| 2 − Ψ 2 0 (t) . Under H 0 , PU n,r converges in distribution to sup 0< t ≤r |P(t)|/ t , where P(·) is a centred Gaussian random element of the Banach space C(B r ) of complex-valued continuous functions, defined on B r := {x ∈ R d : x ≤ r}, equipped with the supremum norm f C(Br) := sup x∈Br |f (x)|. Pudelko (2005) also showed that the test is able to detect contiguous alternatives that approach H 0 at the rate n −1/2 . The consistency of the test based on PU n,r follows easily from S. Csörgő (1989). A drawback of this test is its lack of feasibility, since one has to calculate the supremum of a function inside a ddimensional sphere.

The test of Cox and Small
According to Cox and Small (1978), a main objective of tests of H 0 is 'to see whether an estimated covariance matrix provides an adequate summary of the interrelationships among a set of variables', and that departure from multivariate normality 'is often the occurrence of appreciable nonlinearity of dependence'. To obtain an affine invariant test that assesses the degree of nonlinearity, they propose to find that pair of linear combinations of the original variables, such that one has maximum curvature in its regression on the other. The population functional which underlies the test of Cox and Small is T CS (P X ) = max b∈S d−1 η 2 (b), where , see Cox and Small (1978), p. 268. The test statistic is T n,CS = max b∈S d−1 η 2 n (b), where is the empirical counterpart of η 2 (b). Rejection of H 0 will be for large values of T n,CS . The statistic T n,CS is affine invariant, since it is both a function of Y n,1 , . . . , Y n,n and rotation invariant. Notice that the functional T CS vanishes on the set N d , but T CS (P X ) = 0 does not necessarily imply that P X ∈ N d . Some missing distributional properties of the statistic T n,CS were provided by Ebner (2012). If P X is elliptically symmetric and satisfies E X 6 < ∞, then where m 4 = E X 4 , B is the (d + 1) × (d + 1)-matrix diag(1, . . . , 1, −1), and W (·) is a centred (d + 1)-variate Gaussian process in C(S d−1 , R d+1 ), the space of continuous functions from S d−1 to R d+1 (see Theorem 2.4 of Ebner (2012), where the covariance matrix kernel of W is given explicitly). As a consequence, the test of Cox and Small is not able to detect such elliptical alternatives to normality. Next, writing µ if E X 6 < ∞. Thus, the test based on T n,CS is consistent against each alternative distribution for which the above stochastic limit δ(P X ) (say) is positive. Ebner (2012) also provides the limit distribution of T n,CS under contiguous alternatives to H 0 , but it is still an open problem whether √ n(T n,CS − δ(P X )) has a nondegenerate limit distribution as n → ∞. From a practical point of view, the test of Cox and Small has the drawback that finding the maximum of η 2 n (b) over b ∈ S d−1 is a computationally extensive task. Manzotti and Quiroz (2001) propose to test H 0 by means of averages over the standardized sample of multivariate spherical harmonics, radial functions and their products. For k ∈ N let f 1 , . . . , f k :

The general type of test statistic of Manzotti and Quiroz
To be more specific, let H j , j ≥ 0, be the set of spherical harmonics of degree j in the orthonormal basis of spherical harmonics in d dimensions with respect to the uniform measure on S d−1 , and put G j = j i=0 H i . The number of linear independent spherical harmonics of degree j in dimension d is d+j−1 j − d+j−3 j−2 . A suitable orthonormal basis can be found using Theorem 5.25 in Axler, Bourdon and Ramey (2001) or Manzotti and Quiroz (2001), see also Groemer (1996) or Müller (1998) for details on spherical harmonics. Manzotti and Quiroz (2001) suggest two different choices for f . Putting r j (x) = x j , x ∈ R d , and u(x) = x/ x , x = 0, the first statistic T n,MQ (f 1 ) uses f j of the form g • u for g ∈ G 4 \ H 0 , giving a total of k = d+3 4 − d+2 3 − 1 functions. Due to orthonormality we have V = I k , and since no radial functions are considered, T n,MQ (f 1 ) only tests for aspects of spherical symmetry. The second statistic T n,MQ (f 2 ) uses the functions r 1 and r 3 (g • u), where g ∈ G 2 , which comprise a totality of k = d+1 2 + d + 1 functions.
Both statistics are affine invariant, and Manzotti and Quiroz (2001) derive their limit null distributions, which are sums of weighted independent χ 2 1 random variables. Although the authors do not deal with the question of consistency of their tests, it is easily seen that, under an alternative distribution P X (which, in view of invariance, is assumed to be standardized), and suitable conditions on f 1 , . . . , f k , we have Since there are non-normal distributions for which the above (non-negative) stochastic limit vanishes, the tests of Manzotti and Quiroz (2001) are not consistent against general alternatives. To the best of our knowledge, there are no further asymptotic properties of T n,MQ under alternatives to H 0 .

Tests based on skewness and kurtosis
A still very popular group of tests for H 0 employ measures of multivariate skewness and kurtosis. The popularity of these tests stems from the widespread belief that, in case of rejection of H 0 , there is some evidence regarding the kind of departure from normality of the underlying distribution. The then state of the art regarding this group of tests has been reviewed in Henze (2002), but for the sake of completeness, we revisit the most important facts. The classical invariant measures of multivariate sample skewness and kurtosis due to Mardia (1970) respectively. The functional (population counterpart) corresponding to b (1) n,d is β , where X is standardized, X 1 , X 2 are i.i.d. copies of X, and E X 6 < ∞. The functional accompanying kurtosis is β n,d has an upper rejection region, whereas the test based on b (2) n,d is two-sided. If the distribution of X is elliptically symmetric, we have nb where χ 2 d , χ 2 d(d−1)(d+4) are independent χ 2 -variables with d and d(d − 1)(d + 4) degrees of freedom, respectively, see Baringhaus and Henze (1992), and Klar (2002). Notice that α 1 = α 2 = 6 under H 0 , whence nb (1) n,d D −→ 6χ 2 d(d+1)(d+2)/6 under normality, see Mardia (1970). From (8.2), it follows that the test of H 0 based on b (1) n,d is not consistent against spherically symmetric alternatives satisfying E X 6 < ∞. If β (1) has a centred non-degenerate limit normal distribution as n → ∞, see Theorem 3.2 of Baringhaus and Henze (1992). The skewness functional β (1) n,d as a test statistic for assessing multivariate normality is computed under the very assumption of normality, the inclination to impute supposedly diagnostic properties to b (1) n,d in case of rejection of H 0 in the sense that 'there is evidence that the underlying distribution is skewed' is not justified, at least not in terms of statistical significance. In fact, the limit distribution of nb (1) n,d under certain classes of elliptically symmetric distributions is stochastically much larger than the limit null distribution of nb (1) n,d (see Baringhaus and Henze (1992)), and so rejection of H 0 based on b (1) n,d may be due to an underlying long-tailed elliptically symmetric distribution.

Regarding kurtosis, we have
where σ 2 depends on mixed moments of X up to order 8, see Henze (1994). Under H 0 , we have β (2) d = d(d + 2) and σ 2 = 8d(d + 2), and the limit distribution was already obtained by Mardia (1970), see also Klar (2002) for the case that P X is elliptically symmetric. It follows that, under the condition E X 8 < ∞, Mardia's kurtosis test for normality is consistent if and only if β The critical remarks made above on alleged diagnostic capabilies of tests for H 0 based on measures of skewness apply mutatis mutandis to a test for normality based on b (2) n,d or any other measure of multivariate kurtosis.
Among the many measures of multivariate skewness, we highlight skewness in the sense of Móri, Rohatgi and Székely (1993), because it emerges in connection with several weighted L 2 -statistics for testing H 0 . This measure is defined by The corresponding functional (population counterpart) is β where X is assumed to be standardized and E X 6 < ∞. Limit distributions for b (1) n,d have been obtained by Henze (1997) both for the case that P X is elliptically symmetric (which implies β (1) d = 0) and the case that β (1) d > 0, see also Klar (2002). A further measure of multivariate skewness that has been reviewed in Henze (2002) is skewness in the sense of Malkovich and Afifi (1973), which is defined as

General limit distribution theory for b
(1) n,d,M is given in Baringhaus and Henze (1991). As for further measures of multivariate kurtosis, we mention the measure introduced by Koziol (1989). The corresponding functional is β copies of the standardized vector X, and E X 8 < ∞. General asymptotic distribution theory for b (2) n,d is provided by Henze (1994b) and Klar (2002). Henze (2002) also reviewed kurtosis in the sense of Malkovich and Afifi (1973), which is defined as Limit distribution theory for b (2) n,d,M has been obtained by Baringhaus and Henze (1991) and Naito (1998). Since the review Henze (2002), there have been the following suggestions to test H 0 by means of measures of multivariate skewness and kurtosis (which, however, do not lead to consistent tests and share the drawback stated at the beginning of this section): Kankainen, Taskinen and Oja (2007) consider invariant tests of multivariate normality that are based on the Mahalanobis distance between two multivariate location vector estimates (as a measure of skewness) and on the (matrix) distance between two scatter matrix estimates (as a measure of kurtosis). Special choices of these estimates yield generalizations of Mardia's skewness an kurtosis. The authors obtain asymptotic distribution theory of their test statistics both under normality and certain contiguous alternatives to H 0 , and they compare the limiting Pitman efficiencies to those of Mardia's tests based on b (1) n,d and b (2) n,d . Doornik and Hansen (2008) propose a noninvariant test based on skewness and kurtosis. Enomoto, Hanusz, Hara and Seo (2020) consider a transformation of Mardia's kurtosis statistic, with the aim of improving the finite-sample approximation with respect to a normal limit distribution. Arcones (2007) proposed two invariant test statistics that are based on the following characterizations, see, e.g., Cramér (1936). Let m ≥ 2 be a fixed integer, and let X 1 , . . . , X m be i.i.d. d-dimensional vectors satisfying E(X 1 ) = 0 and E(

Miscellaneous results
. A statistic that corresponds to the first characterization is D n,m = Ψ n,m (t) − Ψ 0 (t) 2 w β (t) dt, where Ψ n,m (t) = n! −1 (n − m)! = exp it ⊤ m −1/2 m p=1 Y n,jp , and Σ = means summation over all j 1 , . . . , j m ∈ {1, . . . , n} such that j p = j q if p = q. Notice that this approach is a generalization of the BHEP-statistic given in (2.8). The statistic which is tailored to the second characterization is E n,m = Ψ n,m (1) − Ψ n,1 (t) 2 w β (t) dt. Both statistics have representations in form of multiple sums. By using the theory of U -statistics with estimated parameters, Arcones (2007) derives almost sure limits of D n,m and E n,m as well as the limit distributions of n D n,m and n E n,m under H 0 . Some very limited simulations, performed for n ≤ 15 and d = 2, indicate that the power of these tests is comparable to that of the BHEP-test. However, the computational burden involved increases rapidly with m.
Without providing any distribution theory, Hwu, Han and Rogers (2002) suggest an invariant two-stage test procedure for testing H 0 . This procedure combines a modified correlation coefficient related to a Q-Q-plot of the ordered values of Y n,j 2 , j = 1, . . . , n, against ordered quantiles of the χ 2 d -distribution, and a test based on Mardia's nonnegative invariant measure of skewness b (1) n,d given in (8.1). Liang, Pan and Yang (2004) deal with Q-Q-plots based on functions of (j(j + 1)) −1/2 (X 1 + . . . + X j − jX j+1 ), j = 1, . . . , n − 1, and hence recommend procedures that are not even invariant with respect to permutations of X 1 , . . . , X n . The latter objection also holds for the procedure suggested by Liang and Bentler (1999). Tan et al. (2005) extend the projection procedure of Liang et al. (2000) to test for multivariate normality with incomplete longitudinal data with small sample size, including cases when the sample size n is smaller than d. Hanusz and Tarasińska (2008) correct an inaccuracy of the (non-invariant) test of Srivastava and Hui (1987), and Maruyama (2007) derives approximations of expectations and variances related to that test under alternative distributions. Without providing any theoretical results, Hanusz and Tarasińska (2008) aim at transforming two graphical methods for assessing H 0 into formal statistical tests. A variant of this approach was considered by Madukaife and Okafor (2018). Cardoso de Oliveira and Ferreira (2010) suggest to perform a chiquare test based on Y n,1 2 , . . . , Y n,n 2 (see also Moore and Stubblebine (1981)), and Batsidis et al. (2013) extend this approach to include more general power divergence type of test statistics. Madukaife and Okafor (2019) consider ℓ 1 -and ℓ 2 -type measures of deviation between Y n,j 2 and corresponding approximate expected order statistics of a χ 2 d -distribution (for tests based on Y n,1 2 , . . . , Y n,n 2 , see also Section 5.2 of Henze (2002)). Voinov et al. (2016) compare several test statistics that, for fixed r ≥ 2, are quadratic forms in the vector (V n,1 , . . . , V n,r ) ⊤ . Here, V n,j = (N n,j − n/r)/( n/r), N n,j = n k=1 1{c j−1 < Y n,k 2 ≤ c j }, and 0 < c 1 < . . . < c r−1 < c r = ∞, where c j is the (j/r)-quantile of the χ 2 d -distribution, j = 1, . . . , r − 1. Jönsson (2011) investigates the finite-sample performance of of the Jarque-Bera test for H 0 in order to improve the size of the test. Koizumi, Hyodo and Pavlenko (2014) improve upon multivariate Jarque-Bera type tests by means of transformations. Simulations show that such transformatinos essentially improves test accuracy when d is close to n. Kim (2016) generalizes the univariate Jarque-Bera test and its modifications to the multivariate versions using an orthogonalization of data and compares it with competitors in a simulation study. Kim and Park (2018) propose a non-invariant test based on univariate Anderson-Darling type statistics that are averaged out over the d coordinates. Villasenor Alva and González Estrada (2009) suggest a non-invariant test that is based on the average of Shapiro-Wilk statistics, applied to each of the components of Y n,1 , . . . , Y n,n . By using an idea of Fromont and Laurent (2006), Tenreiro (2011) proposes an invariant consistent multiple test procedure that combines Mardia's measures of skewness and kurtosis and two members of the family of BHEP tests. The combined procedure rejects H 0 if one of the statistics is larger than its (1 − u n,α )-quantile under H 0 , where u n,α is calibrated so that the combined test has a desired level of significance α. In the same spirit, Tenreiro (2017) combines two BHEP-tests and the 'extreme' BHEP-tests, the statistics of which are given by the right hand sides of (2.14) and (2.15). Majerski and Szkutnik (2010) consider the problem of testing H 0 against some alternatives that are invariant with respect to a subgroup of the full group of affine transformations and obtain approximations to the most powerful invariant tests. Special emphasis is given to exponential and uniform alternatives in the case d = 2, whereas the case d ≥ 3 is only sketched. In the spirit of projection pursuit tests (see Section 8.1 of Henze (2002)), which are based on Roy's union-intersection principle (Roy, S.N. (1953)), Zhou and Shao (2014) propose a non-invariant test that combines the Shapiro-Wilk test and Mardia's kurtotis test. In the same spirit, Wang and Hwang (2011) suggest a statistic that considers solely the Shapiro-Wilk statistic.
Wang (2014) provides a MATLAB package for testing H 0 , which is implemented as an interactive and graphical tool. The package comprises 12 different tests, among which are the energy test, the Henze-Zirkler test, and the tests based on Mardia's skewness and kurtosis. Thulin (2014) proposes six invariant tests for H 0 , the common basis of which are characterizations of independence of sample moments of the multivariate normal distribution.
10 Comparative simulation studies 10.1 Available simulation studies Mecklin and Mundfrom (2005) perform an extensive simulation study with 13 tests for multivariate normality. From this study, they conclude that 'if one is going to rely on one and only one procedure, the Henze-Zirkler test is recommended. This recommendation is based on the relative ease of use (the test statistic has an approximately lognormal asymptotic distribution), good Monte Carlo simulation results, and mathematically proven consistency against all alternatives'. Farrell et al. (2007) compare four tests of multivariate normality and conclude: 'The results of our simulation suggest that, relative to the other two tests considered, the Henze and Zirkler test generally possesses good power across the alternative distributions investigated, in particular for n ≥ 75'. Hanusz et al. (2018) compare four test of H 0 that are based on a combination of measures of multivariate skewness and kurtosis, and the Henze-Zirkler test. They concluded that 'the Henze-Zirkler test best preserves the nominal significance level', and that 'for the  number of traits and sample sizes considered, it is not possible to indicate the most powerful test for all kinds of alternative distributions considered in the paper'. Joenssen and Vogel (2014) investigate 15 tests of H 0 , all of which freely available as R-functions. They find that some tests are unreliable and should either be corrected or removed, or their deficits should be commented upon in the documentation by the package maintainer. Moreover, they summarize: 'On the question of whether or not multivariate tests offer an advantage over simply testing each marginal distribution with a univariate test, the answer is a resounding yes. Not only are some multivariate tests able to detect deviations from normality that are not reflected in the marginals of the distribution, but these tests are also, in part, more powerful for distributions that do display the deviations in the marginals'.

New simulation study
This subsection compares the finite-sample power performance of the tests presented in this survey by means of a Monte Carlo simulation study. All simulations are performed using the statistical computing environment R, see R Core Team (2020). The tests were implemented in the accompanying R package mnt, see Butsch and Ebner (2020).
We consider the sample sizes n = 20, n = 50 and n = 100, the dimensions d = 2, d = 3 and d = 5, and the nominal level of significance is set to 0.05. Throughout, critical values for the tests have been simulated with 100 000 replications under H 0 , see Table 1. Note that, in order to ease the comparison with the original articles, we state the empirical quantiles of 16γ 2+d/2 /π d/2 HV n,γ , π −d/2 HJ n,γ , (γ/π) d/2 HJM n,γ , (γ/π) d/2 d −2 DEH n,γ , and (γ/π) d/2 d −2 DEH * n,γ and chose whenever available the tuning parameter γ according to the suggestions of the authors, respectively. For the sake of readability, we subduct the index n for all tests in the tables. The values of Table  1 are also reported in package mnt in the data frame Quantile095 for easy access. Each entry in a table that refers to empirical rejection rates as estimates of the power of the test is based on 10 000 replications, with the exception of the HJM test, where 1 000 replications have been considered, due to the heavy computation time of the procedure.
We consider a total of 29 alternatives as well as a representative of the multivariate normal distribution. By NMix(p, µ, Σ) we denote the normal mixture distribution generated by where Σ > 0 stands for a positive definite matrix. In the notation of above, µ = 3 stands for a d-variate vector of 3's and Σ = B d for a (d × d)-matrix containing 1's on the main diagonal and 0.9's for each off-diagonal entry. We write t ν (0, I d ) for the multivariate t-distribution with ν degrees of freedom, see Genz and Bretz (2009). By DIST d (ϑ) we denote the d-variate random vector generated by independently simulated components of the distribution DIST with parameter vector ϑ, where DIST is taken to be the uniform distribution U, the lognormal distribution LN, the beta distribution B, as well as the Pearson Type II P II and Pearson Type VII distribution P V II . For the latter distribution, we used the R package PearsonDS, see Becker and Klößner (2017). The spherical symmetric distributions were simulated using the R package distrEllipse, see Ruckdeschel et al. (2006), and they are denoted by S d (DIST), where DIST stands for the distribution of the radii, which was chosen to be the exponential, the beta, the χ 2 -distribution and the lognormal distribution. With MAR d (DIST) we denote N d (0, I d )-distributed random vectors, where the dth component is independently replaced by a random variable following the distribution DIST. Here, we chose the exponential, the χ 2 , student's t and the gamma distribution. With NM d (ϑ) we denote the normal mixture distributions generated by where Σ ϑ is a positive definite (d × d)-matrix with 1's on the diagonal and the constant ϑ for each off diagonal entry. In this family of non-normal distributions each component follows a normal law. The symbol S|N d | stands for the distribution of ±|X|, where X D = N d (0, I d ), the absolute value | · | is applied componentwise, and ± assigns, independently of each other and with equal probability 0.5, a random sign to each component of |X|. Finally, we consider the distribution N d (µ d , Σ 0.5 ), with µ d = (1, 2, . . . , d) ⊤ and the same covariance structure as reported for the NM-alternatives, in order to show that all tests under consideration are invariant and indeed have a type I error equal to the significance level of 5%.
The results of the weighted L 2 -type tests in Tables 2 -4 are presented for the same tuning parameters as in Table 1, and in order to keep the tables concise the values are omitted.
First, we evaluate the results for d = 2. A close look at Table 2 reveals that, for the family of normal mixture distributions, the HZ-test and the PU-test perform best when the shifted standard normal distributions are mixed, whereas for different covariance matrices, the strongest procedure is HJM. The HJM-test performs also best throughout the multivariate t-distributions. For the independently simulated components, T MQ (f 2 ) is strong, especially for marginal distributions with bounded support. Interestingly, each of the tests that are based on measures of skewness and kurtosis, as well as the HV-and the HJ-test, completely fail to detect these alternatives. For the Pearson-Type VII alternatives, HJM again has the strongest power, while BHEP shows the strongest performance for LN 2 (0, 0.5) and B 2 (1, 2). The spherically symmetric alternatives with bounded support of the radial distributions are well detected by the HZ-and the E-test. For the case of unbounded support of the radial distribution, the strongest test is again HJM. This test is also strongest for the marginally disturbed alternatives MAR 2 (DIST), where it is just outperformed by the PU-test for the disturbance by Exp(1)-and χ 2 -random variables. The NM d (ϑ)-distributions are uniformly best detected by HJM, although the power is not very strong, whereas all other tests almost completely fail to detect these alternatives. Notably, the S|N 2 | alternatives are best detected by T MQ (f 1 ). Overall, for the chosen alternatives HJM performs best, but it also lacks power especially when the support of the distribution is bounded. From a robust point of view, the weighted L 2 procedures, like DEH * , the HZ-test as well as the energy test E perform very well, especially if the focus is on consistency.
In dimensions d = 3 and d = 5, one can paint the same picture for the allocation of the best procedures to the alternatives. Interestingly, the power of the procedures increases compared to the lower-dimensional setting, which appears to be counterintuitive in view of the curse of dimensionality. Some noticeable phenomena arise: For the S d (B(2, 2)) distribution, some of the tests, like HV, HJ and T CS , b (1) M seem to loose power when the sample size is increased. An explanation for this behaviour for the latter tests might be that these procedures use an approximation of the maximum on the unit sphere, which might be harder to approximate for larger samples. In the case d = 3, we also observe this behaviour for the HJM-test. Interestingly, the HJM-test as well as the PU-test increase the power against NM d (ϑ)-alternatives in comparison to the case d = 2, whereas the other procedures nearly uniformly fail to distinguish them from the null hypothesis in each dimension considered.

Conclusions and outlook
From a practical point of view, we recommend to use the computationally efficient weighted L 2 -type procedures. like HZ and DEH * , or the energy test E, since they show a good balance between fast computation time and robust power against many alternatives, and they do not exhibit any particular weakness. If computation time is not an issue we suggest to employ the HJM-test, as it outperforms most of the other procedures. Note that by choosing other tuning parameters, the weighted L 2 -procedures are expected to benefit in terms of power against specific alternatives, especially if one is able to choose the tuning parameter in a data dependent way. For a first step in this direction for univariate goodness-of-fit tests, see Tenreiro (2019). In general, it would be nice to have explicit solutions of the Fredholm integral equation (2.4). For some recent cases in which such integral equations have witnessed explicit solutions in the context of goodness-of-fit testing, see, e.g., Theorem 3.2 of Baringhaus and Taherizadeh (2010) or Theorems 3 and 5 of Hadjicosta and Richards (2019). High-dimensional L 2 -statistics for testing normality have not been considered so far in the literature. The efficient implementation of the tests in the package mnt admit first simulations, which indicate that new interesting phenomena arise.