A new test of multivariate normality by a double estimation in a characterizing PDE

This paper deals with testing for nondegenerate normality of a $d$-variate random vector $X$ based on a random sample $X_1,\ldots,X_n$ of $X$. The rationale of the test is that the characteristic function $\psi(t) = \exp(-\|t\|^2/2)$ of the standard normal distribution in $\mathbb{R}^d$ is the only solution of the partial differential equation $\Delta f(t) = (\|t\|^2-d)f(t)$, $t \in \mathbb{R}^d$, subject to the condition $f(0) = 1$. By contrast with a recent approach that bases a test for multivariate normality on the difference $\Delta \psi_n(t)-(\|t\|^2-d)\psi(t)$, where $\psi_n(t)$ is the empirical characteristic function of suitably scaled residuals of $X_1,\ldots,X_n$, we consider a weighted $L^2$-statistic that employs $\Delta \psi_n(t)-(\|t\|^2-d)\psi_n(t)$. We derive asymptotic properties of the test under the null hypothesis and alternatives. The test is affine invariant and consistent against general alternatives, and it exhibits high power when compared with prominent competitors.


Introduction
A useful tool for assessing the fit of data to a family of distributions are empirical counterparts of distributional characterizations. Such characterizations often emerge as solutions of an equation of the type ρ(Df, f ) = 0. Here, f may be the moment generating function, the Laplace transform, or the characteristic function, and D denotes a differential operator, i.e., this operator can be regarded as ordinary differentiation if f is a function of only one variable or, for instance, the Laplace operator in the multivariate case. Such (partial) differential equations have been used to test for multivariate normality, see [8,19], exponentiality, see [3], the gamma distribution, see [18], the inverse Gaussian distribution, see [17], the beta distribution, see [27], the univariate and multivariate skew-normal distribution, see [24] and [25], and the Rayleigh distribution, see [23]. In all these references, the authors propose a goodness-of-fit test by plugging in an empirical counterpart f n for f into ρ(Df, f ), and by measuring the deviation from the zero function in a suitable function space. If, under the hypothesis to be tested, the function f has a closed form and is known, there are two options for obtaining an empirical counterpart to the characterizing equation, namely ρ(Df n , f ) = 0, or ρ(Df n , f n ) = 0. To the best of our knowledge, the effect of considering both options for the same testing problem and to study the consequences on the performance of the resulting test statistics has not yet been considered, neither from a theoretical point of view, nor in a simulation study. In this spirit, the purpose of this paper is to investigate the effect on the power of a recent test for multivariate normality based on a characterization of the multivariate normal law in connection with the harmonic oscillator, see [8].
In what follows, let d ≥ 1 be a fixed integer, and let X, X 1 , . . . , X n , . . . be independent and identically distributed (i.i.d.) d-dimensional random (column) vectors, that are defined on a common probability space (Ω, A, P). We write P X for the distribution of X, and we denote the d-variate normal law with expectation µ and nonsingular covariance matrix Σ by N d (µ, Σ). Moreover, N d = {N d (µ, Σ) : µ ∈ R d , Σ ∈ R d×d positive definite} stands for the class of all nondegenerate d-variate normal distributions. To check the assumption of multivariate normality means to test the hypothesis H 0 : P X ∈ N d , (1) against general alternatives. The starting point of this paper is Theorem 1.1 of [8]. To state this result, let ∆ denote the Laplace operator, · the Euclidean norm in R d , and I d the identity matrix of size d. Then Theorem 1.1 of [8] states that the characteristic function is the unique solution of the partial differential equation Writing X n = n −1 n j=1 X j for the sample mean and S n = n −1 n j=1 (X j − X n )(X j − X n ) for the sample covariance matrix of X 1 , . . . , X n , respectively, where the superscript means transposition, the standing tacit assumptions that P X is absolutely continuous with respect to Lebesgue measure and n ≥ d + 1 guarantee that S n is invertible almost surely, see [9]. The test statistic is based on the so-called scaled residuals Here, S −1/2 n is the unique symmetric positive definite square root of S −1 n . Letting ψ n (t) = n −1 n j=1 exp(it Y n,j ), t ∈ R d , denote the empirical characteristic function (ecf) of Y n,1 , . . . , Y n,n , the test statistic proposed in [8] is where and a > 0 is a fixed constant. The statistic T n,a has a nice closed-form expression as a function of Y n,i Y n,j , i, j ∈ {1, . . . , n} (see display (10)-(12) of [8]) and is thus invariant with respect to full-rank affine transformations of X 1 , . . . , X n . Theorems 2.2 and 2.3 of [8] show that, elementwise on the underlying probability space, suitably rescaled versions of T n,a have limits as a → ∞ and a → 0, respectively. In the former case, the limit is a measure of multivariate skewness, introduced in [26], whereas Mardia's time-honored measure of multivariate kurtosis (see [22]) shows up as a → 0. As n → ∞, the statistic T n,a has a nondegenerate limit null distribution (Theorem 4.1 of [8]), and a test of (1) that rejects H 0 for large values of T n,a is able to detect alternatives that approach H 0 at the rate n −1/2 , irrespective of the dimension d (Corollary 5.2 of [8]). Under an alternative distribution satisfying E X 4 < ∞, n −1 T n,a converges almost surely to a measure of distance ∆ a between P X and the class N d (Theorem 6.1 of [8]). As a consequence, the test for multinormality based on T n,a is consistent against any such alternative. By Theorem 6.5 of [8], the sequence √ n(n −1 T n,a − ∆ a ) converges in distribution to a centered normal law. Since the variance of this limit distribution can be estimated consistently from X 1 , . . . , X n (Theorem 6.7 of [8]), we have an asymptotic confidence interval for ∆ a .
The novel approach taken in this paper is to replace both of the functions f occurring in (2) by the ecf ψ n . Since, under H 0 , ∆ψ n (t) and ( t 2 − d)ψ n (t) should be close to each other for large n, it is tempting to see what happens if, instead of T n,a defined in (3), we base a test of H 0 on the weighted L 2 -statistic and reject H 0 for large values of U n,a .
Since ∆ψ n (t) = −n −1 n j=1 Y n,j 2 exp(it Y n,j ), the relation valid for c ∈ R d and a > 0, and tedious but straightforward calculations yield the representation which is amenable to computational purposes. Moreover, U n,a turns out to be affine invariant.
The rest of the paper is organized as follows. In Section 2, we derive the elementwise limits of U n,a , after suitable transformations, as a → 0 and a → ∞. Section 3 deals with the limit null distribution of U n,a as n → ∞. In Section 4, we show that, under the condition E X 4 < ∞, n −1 U n,a has an almost sure limit as n → ∞ under a fixed alternative to normality. As a consequence, the test based on U n,a is consistent against any such alternative. Moreover, we prove that the asymptotic distribution of U n,a , after a suitable transformation, is a centered normal distribution. In Section 5, we present the results of a simulation study that compares the power of the test for normality based on U n,a with that of prominent competitors. Section 6 shows a real data example, and Section 7 contains some conclusions and gives an outlook on potential further work.
2 The limits a → 0 and a → ∞ This section considers the (elementwise) limits of U n,a as a → 0 and a → ∞. The results shed some light on the role of the parameter a that figures in the weight function w a in (4). Notice that, from the definition of U n,a given in (5), we have lim a→∞ U n,a = 0 and lim a→0 U n,a = ∞, since ∆ψ n (t) − t 2 − d ψ n (t) 2 dt = ∞. Suitable transformations of U n,a , however, yield well-known limit statistics as a → 0 and a → ∞.
Proof. Starting with (7), (a/π) d/2 U n,a is, apart from the factor 1/n, a double sum over j and k. Since each summand for which j = k vanishes asymptotically as a → 0, we have as a → 0, and the result follows from the fact that n j=1 Y n,j 2 = nd.
Theorem 2.1 means that a suitable affine transformation of U n,a has a limit as a → 0, and that this limit is -apart from the additive constant d 2 -the time-honored measure of multivariate kurtosis in the sense of Mardia, see [22]. The same measure -without the subtrahend d 2 -shows up as a limit of (a/π) d/2 T n,a as a → 0, see Theorem 2.3 of [8]. The next result shows that U n,a and T n,a , after multiplication with the same scaling factor, converge to the same limit as a → ∞, cf. Theorem 2.1 of [8].
Theorem 2.2. Elementwise on the underlying probability space, we have Proof. The proof follows the lines of the proof of Theorem 2.1 of [8] and is thus omitted.
The limit figuring on the right hand side of (9) is a measure of multivariate skewness, introduced by Móri, Rohatgi and Székely, see [26]. Theorem 2.1 and Theorem 2.2 show that the class of tests for H 0 are in a certain sense "closed at the boundaries" a → 0 and a → ∞. However, in contrast to the test for multivariate normality based on U n,a for fixed a ∈ (0, ∞), tests for H 0 based on measures of multivariate skewness and kurtosis lack consistency against general alternatives, see, e.g., [4,5,13].
3 The limit null distribution of U n,a In this section, we assume that the distribution of X is some nondegenerate d-variate normal law. In view of affine invariance, we may further assume that E(X) = 0 and E(XX ) = I d . By symmetry, it is readily seen that U n,a defined in (5) takes the form where In view of (10), our setting for asymptotics will be the separable Hilbert space H of (equivalence classes of) measurable Here and in the sequel, each unspecified integral will be over R d . The scalar product and the norm in H are given by f, H , respectively. Notice that, in this notation, (10) takes the form U n,a = S n 2 H , where S n is given in (11). Putting ψ(t) = exp(− t 2 /2) as before, and writing D −→ for convergence in distribution, the main result of this section is as follows.
Theorem 3.1. If X has some nondegenerate normal distribution, we have the following: a) There is a centered Gaussian random element S of H having covariance kernel Proof. Since the proof is analogous to the proof of Proposition 3.2 of [8], it will only be sketched. If S 0 n (t) stands for the modification of S n (t) that results if we replace Y n,j with X j , then a Hilbert space central limit theorem holds for S 0 n , since the summands of S 0 n are square-integrable centered random elements of H. The idea is thus to find a random element S n of H such that S n D −→ S and S n − S n H = o P (1). Putting Y n,j = X j + ∆ n,j in (11) and using the fact . . , X n and t and satisfy |Θ j − t X j | ≤ |t ∆ n,j |, |Γ j − t X j | ≤ |t ∆ n,j |, some algebra and Proposition A.1 of [8] show that a choice of S n is given by Tedious calculations then show that the covariance kernel of S, which is E[h(X, s)h(X, t)], is equal to K(s, t) given above.
Let U ∞,a denote a random variable having the limit distribution of U n,a given in (12). Since the distribution of U ∞,a is that of S 2 H , where S is the Gaussian random element of H figuring in Theorem 3.1, it is the distribution of j≥1 λ j N 2 j , where N 1 , N 2 , . . . is a sequence of i.i.d. standard normal random variables, and λ 1 , λ 2 , . . . are the positive eigenvalues corresponding to normalized eigenfunctions of the integral operator f → Af on H, where (Af )(s) = K(s, t)f (t) w a (t) dt. It seems to be hopeless to obtain closed-form expressions of these eigenvalues. However, in view of Fubini's theorem, we have and thus straightforward manipulations of integrals yield the following result.
From this result, one readily obtains It is interesting to compare this limit relation with (8). If the underlying distribution is standard normal, i.e., if Now, writing Y n,j = X j + ∆ n,j and using Proposition A.1 of [8], the right hand side of (8) turns out to converge in probability to E X 4 − d 2 as n → ∞, and this expectation is the right hand side of (13). Regarding the case a → ∞, the representation of E[U ∞,a ] easily yields This result corresponds to (9), since, by Theorem 2.2 of [14], the right hand side of (9), after multiplication with n, converges in distribution to 2(d + 2)χ 2 d as n → ∞ if P X = N d (0, I d ). Here, χ 2 d is a random variable having a chi square distribution with d degrees of freedom.

Limits of U n,a under alternatives
In this section we assume that X, X 1 , X 2 , . . . are i.i.d., and that E X 4 < ∞. Moreover, let E(X) = 0 and E(XX ) = I d in view of affine invariance, and recall the Laplace operator ∆ from Section 1. The characteristic function of X will be denoted by we first present an almost sure limit for n −1 U n,a .
Theorem 4.1. We have U n,a n a.s.
The squared norm in H of the second summand on the right hand side converges to zero almost surely, see the treatment of U n,1 in the proof of Theorem 6.1 of [8]. The same holds for the first summand, since its modulus is bounded from above by 4 t n −1 n j=1 ∆ j + 2n −1 n j=1 ∆ j 2 , and the inequality n −1 n j=1 ∆ j ≤ (n −1 n j=1 ∆ j 2 ) 1/2 , together with Proposition A.1 b) of [8], yield the assertion.
Since, under the conditions of Theorem 4.1, Γ a is strictly positive if the underlying distribution does not belong to N d , U n,a converges almost surely to ∞ under such an alternative, and we have the following result. The next result, which corresponds to Theorem 6.4 of [8], shows that the (population) measure of multivariate skewness in the sense of Móri, Rohatgi and Székely emerges as the limit of Γ a , after a suitable scaling, as a → ∞.
In what follows, let Y, Z be independent copies of X. Since ψ + (t) 2 = E[CS + (t Y )CS + (t Z)], the addition theorems for the cosine and the sine function and symmetry yield Putting c = Y − Z, display (6) then gives Likewise, it follows that ψ + (t)∆ψ Finally, and it follows that and the assertion follows.
We close this section with a result on the asymptotic normality of U n,a under fixed alternatives. That such a result holds in principle follows from Theorem 1 of [2]. To state the main idea, write again CS ± (ξ) = cos(ξ) ± sin(ξ) and notice that, by (10), U n,a = S n 2 H , where S n (t) is given in (11). Putting  (14) show that √ n U n,a n − Γ a = √ n S * where V * n (t) = √ n(S * n (t) − z(t)), t ∈ R d . In the sequel, let ∇(f )(t) denote the gradient of a differentiable function f : R d → R, evaluated at t, and write Hf (t) for the Hessian matrix of f at t if f is twice continuously differentiable. By proceeding as in the proof of Theorem 3.3 of [8], there is a centered Gaussian element V * of H having covariance kernel such that V * n D −→ V * as n → ∞. In view of (15) and the fact that the distribution of 2 V * , z H is centered normal, we have the following result. where We remark that a consistent estimator of σ 2 a can be obtained by analogy with the reasoning given in [8], see Lemma 6.6, Theorem 6.7 and Remark 6.11 of that paper.

Simulations
In this section, we present the results of a Monte Carlo simulation study on the finite-sample power of the tests based on U n,a . This study is twofold in the sense that we consider testing for both univariate and multivariate normality, where the latter case is restricted to dimensions d ∈ {2, 3, 5}. Moreover, the study is designed to match and complement the counterparts in [8], Section 7, and [19], since we take exactly the same setting with regard to sample size, nominal level of significance and selected alternative distributions. In this way, we facilitate an easy comparison with existing procedures. In the univariate case, we consider sample sizes n ∈ {20, 50, 100} and restrict the simulations to n ∈ {20, 50} in the multivariate setting. The nominal level of significance is fixed throughout all simulations to 0.05. We simulated empirical critical values under H 0 for d −2 (a/π) d/2 U n,a with 100 000 replications, see Table 1.
The rows entitled '∞' give approximations of the quantiles of the limit random element U ∞,a = S 2 (t)w a (t) dt in Theorem 3.1 b). The entries have been calculated by the method presented in [8], Section 7, setting = 100 000 and m = 2 000 for d ∈ {2, 3, 5, 10}. Note that this approach only relies on the structure of the covariance kernel given in Theorem 3.1 a), the multivariate normal distribution, and the weight function.
We contrast the results in Table 3 with those of Table 4 in [8], which exhibits powers of the related test statistic T n,a . First we oppose the tests T n,a and U n,a . Remarkably, the test based on U n,a shows a better performance for the NMix-alternatives, especially for the choice of the tuning parameter a ∈ {0.25, 0.5}. On the other hand, U n,a is almost uniformly dominated by T n,a for the t ν -distribution. If the underlying distribution is χ 2 , beta, gamma, Weibull, Gumbel or lognormal, both procedures have a comparable power. Table 4 in [8] also provides finite-sample powers of strong either time-honored or recent tests for normality, like the Shapiro-Wilk test, the Shapiro-Francia test, the Anderson-Darling test, the Baringhaus-Henze-Epps-Pulley test (BHEP), see [20], the del Barrio-Cuesta-Albertos-Mátran-Rodríguez-Rodríguez test (BCMR), see [7], and the Betsch-Ebner test, see [6]. For a description of the test statistics and critical values, see [8] and the references therein. A comparison shows that, for suitable choice of the tuning parameter, U n,a can compete with each of these tests, sometimes outperforming them, for example in case of the uniform distribution, n = 20, and a = 0.25, and the χ 2 15 -distribution for all sample sizes and a = 5, but mostly being on the same power level. It is interesting to see that the finite-sample power of U n,a depends heavily on the choice of a. This observation is in contrast to the behavior of T n,a , the power of which depends much less on a.
In the multivariate case, the alternative distributions are selected to match those employed in the simulation studies in [8,19], and are given as follows. Let NMix(p, µ, Σ) be the normal mixture distribution generated by (1 − p) N d (0, I d ) + p N d (µ, Σ), where p ∈ (0, 1), µ ∈ R d , and Σ is a positive definite matrix. In this notation, µ = 3 stands for a d-variate vector of 3's, and Σ = B d is a (d × d)-matrix containing 1's on the main diagonal, and each off-diagonal entry has the value 0.9. We denote by t ν (0, I d ) the multivariate t ν -distribution with ν degrees of freedom, see [11]. The acronym DIST d (ϑ) stands for a d-variate random vector with i.i.d. marginal laws that belong to the distribution DIST with parameter ϑ. In the sequel, DIST is either the Cauchy distribution C, the logistic distribution L, the gamma distribution Γ, or the Pearson Type VII distribution P V II . For the latter distribution, ϑ denotes the number of degrees of freedom. The spherical symmetric distributions have been simulated using the R package distrEllipse, see [29]. These are denoted by S d (DIST), where DIST stands for the distribution of the radii, and was chosen to be the exponential, the beta and the χ 2 -distribution.
Tables 4 -6 can be contrasted to Tables 5 -7 in [8], and for n = 50, with Tables 3 -5 in [19]. Again, we start with a comparison of T n,a and U n,a . For d = 2 (see Table 4 and Table 5 in [8]), T n,a is outperformed by U n,a for NMix(0.1, 3, I 2 ) and NMix(0.9, 3, B 2 ), but shows a stronger performance for NMix(0.5, 3, B 2 ). In case of the multivariate t ν -distributions, both procedures have a similar performance, as well as for the DIST d (ϑ) distributions. The spherical symmetric distributions are dominated by U n,a for a suitable choice of the tuning parameter, except for the S d (χ 2 5 ) distribution, where a similar behaviour is asserted. Again, U n,a seems to be much more sensitive to the choice of a proper tuning parameter than T n,a . Competing tests of multivariate normality are the Henze-Visagie test, see [19], the Henze-Jiménez-Gamero test, see [15], the BHEP-test, the Henze-Jiménez-Gamero-Meintanis test, see [16], and the energy test, see [31]. A description of the test statistics, as well as procedures for computing critical values is found in [19]. The BHEP-test performs best for the NMix(0.1, 3, I 2 )-distribution (NMIX1 in [19]) but is outperformed by T n,a for NMix(0.5, 0, B 2 ), and by U n,a for the NMix(0.9, 3, B 2 ) (NMIX2 in [19]), where these procedures show the best performance of all tests considered. A similar behavior is observed for the t ν -and the spherical symmetric distributions, where again U n,a and T n,a are strong competitors to all procedures considered.

A real data example
As a real data example, we examine the meteorological data set weather provided in the R package RandomFields, see [30], which consists of differences between forecasts and observations (forecasts minus observations) of temperature and pressure at n = 157 locations in the North American Pacific Northwest. The data are pointwise realizations of a bivariate (d = 2) error Gaussian random field, see Figure 1. The forecasts are from the GFS member of the University of Washington regional numerical weather prediction ensemble, see [10], and they were valid on December 18, 2003 at 4 p.m. local time, at a forecast horizon of 48 hours. We ignore the given location of measurements in this evaluation and test the hypothesis that each pair of differences can be modeled as an i.i.d. copies from a bivariate normal distribution. In Table 2, we calculate empirical p-values based on 10 000 replications for U n,a for the univariate differences of temperature and pressure, as well as for the bivariate data for the whole data set, n = 157, and for a random selection of n = 50 points (selected in R with function sample() and seed fixed to '0721'). Regarding the complete data set, we reject the hypothesis of normality in nearly all cases on a 5% level of significance, while on a 1% level of significance we are not able to reject H 0 for the differences in temperature. However, for the pressure and the bivariate data the hypothesis of normality is nearly always rejected. These results are not surprising, since the weather data set is an example of influence of spatial correlation, which has to be carefully modeled. In [12], a bivariate Gaussian random field is fitted to the data, taking the mentioned spatial correlation into account, for a visualization of the locations see Figure 3 in [12]. For the subsample of points we see that the structure vanishes, and we throughout do not reject the hypotheses. Here, we have only applied our method as a proof of principle.

Conclusions and outlook
We have introduced and studied a new affine invariant class of tests for multivariate normality that is easy to apply and consistent against general alternatives. Although consistency has only been proved under the condition E X 4 < ∞, the test should be "all the more consistent" if E X 4 = ∞, and we conjecture that, as is the case for the BHEP-tests, also the test based on U n,a is consistent against each nonnormal alternative distribution. A further topic of research would be to choose the tuning parameter a in an adaptive way, similar to the bootstrap based univariate approaches in [1] and [32]. It would also be of interest to obtain more information on the limit null distribution of U n,a . We finish the outlook by pointing out that, with respect to the references in the introduction regarding other procedures and distributions, a similar analysis can be performed, and it is of theoretical and practical relevance to study the resulting statistics in order to assess the influence of the options of estimating or not estimating certain of the pertaining functions.
After a comparison of U n,a and T n,a from [8], and in view of the results of the simulation study, we recommend to use T n,a , since it seems to be more robust with respect to the choice of the tuning parameter a. Nevertheless, U n,a is a strong competitor, and with a suitable data driven procedure for the choice of a at hand, U n,a may turn out to be a favorable choice over the most classical and recent tests of uni-and multivariate normality.