1 Introduction

Goodness-of-fit and independence testing are among the classical and most important problems in statistics. The traditional approach to testing for independence is based on Pearson's correlation coefficient, but its lack of robustness to outliers and to departures from normality eventually led researchers to consider alternative nonparametric procedures, such as the Savage, Spearman and van der Waerden tests, which rely on linear rank statistics. The traditional approach to testing for normality includes classical omnibus tests such as Kolmogorov–Smirnov and Anderson–Darling, and quantile-based tests such as Shapiro–Wilk and Shapiro–Francia.

In many circumstances, such as checking model assumptions, one needs to apply both types of tests to the same sample. This is usually done separately, and then one needs to be careful about multiple testing issues. In this paper we propose a way to test for independence and normality simultaneously.

A theoretical framework for this approach was given by Ejsmont (2016), who proved a characterization of the normal law through a certain invariance of the noncentral chi-square distribution. Namely, in Ejsmont (2016, Corollary 3.3) it was shown that if the random vectors \(({\mathrm X}_1,\dots , {\mathrm X}_m, A)\) and \(({\mathrm Y}_{1},\dots ,{\mathrm Y}_n,B)\) are independent and the distribution of \(\sum _{i=1}^m{\mathrm X}_ia_i+A+\sum _{j=1}^n{\mathrm Y}_jb_j+B\) depends only on \(\sum _{i=1}^ma_i^2+\sum _{j=1}^nb_j^2\), then \({\mathrm X}_1,\dots , {\mathrm X}_m,{\mathrm Y}_{1},\dots ,{\mathrm Y}_n\) are independent and have the same normal distribution. This result was obtained under the assumption that all moments exist. In the current paper we weaken this condition considerably and propose a new test of independence and normality.

The paper is organized as follows. In Sect. 2 we state and prove the main result of Ejsmont (2016) under the weakened assumption. Next, in Sect. 3, we use this result to propose a new test for normality and explore its asymptotic properties. In Sect. 4 we present the results of a wide empirical study in which we compare the powers of our test and of competing tests. Finally, in Sect. 5 we provide a real-data example and list potential applications of the proposed tests. All proofs are deferred to the Appendix.

2 The theoretical base for the construction of a test

Notation. The scalar product of vectors \(t,s\in {\mathbb {R}}^p\) is denoted by \(\langle t, s \rangle \) and the Euclidean norm of t is \(\Vert t\Vert =\sqrt{\langle t, t \rangle }\). Throughout this paper \({\mathrm X}:=({\mathrm X}_1,\dots , {\mathrm X}_m) \in {\mathbb {R}}^m\) and \({\mathrm Y}:=({\mathrm Y}_{1},\dots ,{\mathrm Y}_n)\in {\mathbb {R}}^n\) are random vectors, where m and n are positive integers. The characteristic functions of \({\mathrm X}\) and \({\mathrm Y}\) are denoted by \(\varphi _{\mathrm X}(\cdot )=Ee^{i\langle \cdot ,{\mathrm X}\rangle }\) and \(\varphi _{\mathrm Y}(\cdot ) =Ee^{i\langle \cdot ,{\mathrm Y}\rangle }\), respectively. For complex-valued functions \(f(\cdot )\), the complex conjugate of f is denoted by \(\overline{f}\) and \(|f|^2 = f \overline{f}\). In order to simplify the notation, we will denote \([n]=\{1,\dots ,n\}\). The concatenation of the vectors \(a\in {\mathbb {R}}^m\) and \(b\in {\mathbb {R}}^n\) is denoted by \((a,b) \in {\mathbb {R}}^{m+n}\).

Our construction of a new test of normality is based on the following result. It is a generalization of the main result of Ejsmont (2016) with the moment assumptions removed (in Ejsmont (2016) the random variables are assumed to have all moments; the proof given here is also different).

Theorem 2.1

Let \(({\mathrm X}_1,\dots , {\mathrm X}_m,A) \text { and } ({\mathrm Y}_{1},\dots ,{\mathrm Y}_n,B)\) be independent random vectors, where \({\mathrm X}_i\) and \({\mathrm Y}_j \) are nondegenerate for \(i\in [m],j\in [n]\), and let the statistic

$$\langle a,{\mathrm X}\rangle +\langle b,{\mathrm Y}\rangle +A+B=\sum _{i=1}^ma_i{\mathrm X}_i+\sum _{j=1}^n b_j{\mathrm Y}_j+A+B,$$

have a distribution which depends only on \(\Vert a\Vert ^2+\Vert b\Vert ^2\), where \(a\in \mathbb {R}^m\) and \(b\in \mathbb {R}^n\). Then the random variables \({\mathrm X}_1,\dots ,{\mathrm X}_m,{\mathrm Y}_1,\dots ,{\mathrm Y}_n\) are independent and have the same normal distribution with zero mean.

The construction of the new test is based directly on the proposition below, which follows from Theorem 2.1; namely, if \(A=B=0\), then Theorem 2.1 can be restated as follows.

Proposition 2.2

Let \(({\mathrm X}_1,\dots , {\mathrm X}_m) \text { and } ({\mathrm Y}_{1},\dots ,{\mathrm Y}_n)\) be independent random vectors, where \({\mathrm X}_i\) and \({\mathrm Y}_j\) are nondegenerate and \(E({\mathrm X}_i^2)=1\), \(E({\mathrm Y}_j^2)=1 \) for \(i\in [m]\), \(j\in [n]\). Then the following statements are equivalent:

  (i)

    the statistic \(\langle a,{\mathrm X}\rangle +\langle b,{\mathrm Y}\rangle \) has a distribution which does not depend on

    $$(a_1,\dots , a_m,b_1,\dots ,b_n),$$

    whenever \(\Vert a\Vert ^2+\Vert b\Vert ^2=1\);

  (ii)

    the random variables \({\mathrm X}_1,\dots ,{\mathrm X}_m,{\mathrm Y}_1,\dots ,{\mathrm Y}_n\) are independent and have the same normal distribution \(N(0,1)\).

3 The test statistic

In this section we propose a new class of test statistics for testing the null hypothesis that the sample comes from a multivariate normal distribution with independent components. In the univariate case this reduces to the null hypothesis of normality.

Our methodology is based on distances between empirical and theoretical quantities. In the theory of hypothesis testing many types of distances between statistical objects can be defined. One of the best known and most widely applied is the \(L^2\) distance, used e.g. in the construction of the Cramér (1928) and Anderson and Darling (1952) tests. For testing multivariate normality, one can also use the \(L^2\) distance between the empirical and theoretical characteristic functions; see Baringhaus and Henze (1988) and Epps and Lawrence (1983). More recently, a characterization-based test for multivariate independence was given in Székely et al. (2007) and Székely and Rizzo (2009). Suppose that \({\mathrm X}\in {\mathbb {R}}^m,{\mathrm Y}\in {\mathbb {R}}^n\) are random vectors with characteristic functions \(\varphi _{\mathrm X}\) and \(\varphi _{\mathrm Y}\), respectively. Then, to measure independence, one can use the distance \(\int _{{\mathbb {R}}^{m+n}}|{\varphi _{{\mathrm X},{\mathrm Y}}}(t,s)-{\varphi _{\mathrm X}}(t){\varphi _{\mathrm Y}}(s)|^2w(t,s)\,dt\,ds\), where \(w(t,s)\) is an arbitrary positive weight function for which the integral exists. We put forward a test that is also based on the distance between a function of the empirical characteristic function and a constant; it was inspired by the articles (Baringhaus and Henze 1988; Székely et al. 2007; Székely and Rizzo 2009; Epps and Lawrence 1983).
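For concreteness, the empirical version of this distance with the weight function of Székely et al. (2007) reduces to the squared sample distance covariance, which admits a simple double-centring formula. A minimal R sketch (the function name dcov_sq and the toy data are ours):

```r
# Squared sample distance covariance of Szekely et al. (2007): a sketch.
# x: N x m matrix, y: N x n matrix (rows are observations).
dcov_sq <- function(x, y) {
  dc <- function(d) {  # double-centre a Euclidean distance matrix
    d - rowMeans(d)[row(d)] - colMeans(d)[col(d)] + mean(d)
  }
  A <- dc(as.matrix(dist(as.matrix(x))))
  B <- dc(as.matrix(dist(as.matrix(y))))
  mean(A * B)          # ~ 0 for large N if and only if x and y are independent
}

set.seed(1)
dcov_sq(rnorm(100), rnorm(100))  # small: the samples are independent
```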

Our approach is based on the following reasoning. Condition (i) of Proposition 2.2 simply means that we obtain statement (ii) if the distribution of the statistic \(\langle a,{\mathrm X}\rangle +\langle b,{\mathrm Y}\rangle \) is constant on the unit sphere in \({\mathbb {R}}^{n+m}\). This requirement can be rewritten using the characteristic function; namely, we obtain statement (ii) if and only if the function

$$\begin{aligned} Ee^{i\langle a,{\mathrm X}\rangle +i\langle b,{\mathrm Y}\rangle } =\varphi _{\mathrm X}(a)\varphi _{\mathrm Y}(b) \end{aligned}$$

is constant on the unit sphere \(\Vert a\Vert ^2+\Vert b\Vert ^2=1\), where \(a\in \mathbb {R}^m\) and \(b\in \mathbb {R}^n\). We also know from the proof of Proposition 2.2 that this constant function must equal \(e^{-\frac{1}{2}}\), namely

$$\begin{aligned} \varphi _{\mathrm X}(a)\varphi _{\mathrm Y}(b)-e^{-\frac{1}{2}}=0 \end{aligned}$$

for all \(\Vert a\Vert ^2+\Vert b\Vert ^2=1\) or equivalently,

$$\begin{aligned}&\int _{S_{n+m}}|\varphi _{\mathrm X}(a)\varphi _{\mathrm Y}(b)-e^{-\frac{1}{2}}|^2dS_{n+m}=0, \end{aligned}$$
(1)

where \(\int _{S_{n+m}} \cdot dS_{n+m}\) is the surface integral over \(S_{n+m}=\{t\in {\mathbb {R}}^{n+m}\mid \Vert t\Vert =1\}\). Finiteness of the integral above follows directly from \(|\varphi _{\mathrm X}(a)\varphi _{\mathrm Y}(b)|\le 1\) and \(e^{-\frac{1}{2}}<1\), namely we see that

$$\begin{aligned} \int _{S_{n+m}}|\varphi _{\mathrm X}(a)\varphi _{\mathrm Y}(b)-e^{-\frac{1}{2}}|^2dS_{n+m} \le (1-e^{-\frac{1}{2}})^2|S_{n+m}|. \end{aligned}$$

Let us assume that we have a simple random sample \({{\mathbf {X}}}=(\varvec{X}_1,\ldots , \varvec{X}_N)\) from a multivariate distribution with m components, i.e. the data have the following structure:

$$\begin{aligned} {{\mathbf {X}}}=\begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,m}\\ x_{2,1} & x_{2,2} & \cdots & x_{2,m}\\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,m} \end{pmatrix}. \end{aligned}$$
(2)

We want to test the null hypothesis

$$\begin{aligned} \mathcal {H}^{(m)}_0: \text {the components of } \varvec{X}_1 \text { are independent and normally distributed.} \end{aligned}$$

For this purpose we use Proposition 2.2 for \(m=n\). Let \(\widetilde{{{\mathbf {X}}}}\) denote the matrix obtained from \({{\mathbf {X}}}\) by column-wise standardization, i.e.

$$\begin{aligned} \widetilde{X}_{j,k}=\frac{x_{j,k}-\hat{\mu }_k}{\hat{\sigma }_k}, \;\;k=1,\ldots ,m,\;\; j=1,\ldots ,N, \end{aligned}$$
(3)

where

$$\begin{aligned} \hat{\mu }_k=\frac{1}{N}\sum _{j=1}^N x_{j,k} \text { and } \hat{\sigma }^2_k=\frac{1}{N-1}\sum _{j=1}^N(x_{j,k}-\hat{\mu }_k)^2. \end{aligned}$$
(4)
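In R, the column-wise standardization (3)–(4) is exactly what the base function scale performs, since scale divides by the sample standard deviation with divisor \(N-1\); a minimal sketch with an arbitrary example matrix:

```r
# Column-wise standardization (3)-(4) of an N x m data matrix.
set.seed(1)
X <- matrix(rnorm(100 * 3), nrow = 100, ncol = 3)  # N = 100, m = 3
X_tilde <- scale(X)      # subtract column means, divide by column sd (N - 1)
colMeans(X_tilde)        # ~ 0 in every column
apply(X_tilde, 2, sd)    # exactly 1 in every column
```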

Let \(\varphi _{\widetilde{{{\mathbf {X}}}}}(a)\) be the empirical characteristic function of \(\widetilde{{{\mathbf {X}}}}\) defined by

$$\begin{aligned} \varphi _{\widetilde{{{\mathbf {X}}}}}(a)=\frac{1}{N}\sum _{k=1}^N e^{i \langle a,\widetilde{{{\mathbf {X}}}}_k\rangle }, \end{aligned}$$

where \(\widetilde{{{\mathbf {X}}}}_k\) is the kth row of the matrix \(\widetilde{{{\mathbf {X}}}}\). Similarly, the empirical counterpart of the characteristic function of the random variable \(\langle a,{\mathrm X}\rangle +\langle b,{\mathrm Y}\rangle \) is

$$\begin{aligned} \frac{1}{N^2}\sum _{j,k}e^{i(\langle a,{\varvec{X}}_j\rangle +\langle b,{\varvec{X}}_k\rangle )}, \end{aligned}$$

where \(a=(a_1,...,a_m)^{T}\) and \(b=(b_1,...,b_m)^{T}\). Assuming that \({\mathrm X}\) and \({\mathrm Y}\) are both distributed as \(\varvec{X}_1\), the natural test statistic based on (1) is

$$\begin{aligned} M_{m}&=N\int \left| \frac{1}{N^2}\sum _{j,k}e^{i(\langle a,\widetilde{\varvec{X}}_j\rangle +\langle b,\widetilde{\varvec{X}}_k\rangle )}-e^{-\frac{1}{2}}\right| ^2dS_{2m}(a,b), \end{aligned}$$
(5)

which can be further expressed as

$$\begin{aligned} \begin{aligned} M_{m}&=N\int \Bigg (\frac{1}{N^2}\sum _{j,k}\cos \big (\langle a,\widetilde{\varvec{X}}_j\rangle +\langle b,\widetilde{\varvec{X}}_k\rangle \big )-e^{-\frac{1}{2}}\Bigg )^2\\&\quad +\Bigg (\frac{1}{N^2}\sum _{j,k}\sin \big (\langle a,\widetilde{\varvec{X}}_j\rangle +\langle b,\widetilde{\varvec{X}}_k\rangle \big )\Bigg )^2dS_{2m}(a,b), \end{aligned} \end{aligned}$$
(6)

where \(a=(a_1,...,a_m)^{T }\) and \(b=(b_1,...,b_m)^{T }\) such that \(\langle a,a\rangle +\langle b,b\rangle =1\).
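Since the double sum in (5) factorizes into the product of two empirical characteristic functions, and a uniform point on \(S_{2m}\) is obtained by normalizing a standard Gaussian vector, the statistic can be approximated by Monte Carlo integration over the sphere. A minimal R sketch (the function name M_stat and the number of sphere points are our choices):

```r
# Monte Carlo approximation of M_m in (5): average the squared modulus
# |phi(a)phi(b) - exp(-1/2)|^2 over random points (a, b) on the unit
# sphere in R^{2m}, then multiply by N and the surface area of the sphere.
M_stat <- function(X, n_sphere = 2000) {
  X <- scale(as.matrix(X))                   # standardization (3)-(4)
  N <- nrow(X); m <- ncol(X); d <- 2 * m
  area <- 2 * pi^(d / 2) / gamma(d / 2)      # surface area of S_{2m}
  vals <- replicate(n_sphere, {
    u <- rnorm(d); u <- u / sqrt(sum(u^2))   # uniform point on the sphere
    a <- u[1:m]; b <- u[(m + 1):d]
    phi_a <- mean(exp(1i * (X %*% a)))       # empirical char. function at a
    phi_b <- mean(exp(1i * (X %*% b)))
    Mod(phi_a * phi_b - exp(-0.5))^2
  })
  N * area * mean(vals)
}

set.seed(1)
M_stat(matrix(rnorm(50 * 2), 50, 2))         # small under the null
```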

We are interested in the one-sided, right-tailed test, because the construction above shows that we reject the null hypothesis for large values of \(M_m\). Under the null hypothesis the distribution of the test statistic does not depend on the location and scale parameters \(\mu _k\) and \(\sigma _k\), \(k=1,...,m\), of the null distribution; hence we may derive critical values using the Monte Carlo approach.
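A sketch of the Monte Carlo derivation of critical values, reusing the M_stat function from the sketch above (the number of replications is arbitrary; the computation is slow but embarrassingly parallel):

```r
# Critical value at level alpha: simulate M_m under the null hypothesis,
# i.e. on samples with independent standard normal components.
critical_value <- function(N, m, reps = 1000, alpha = 0.05) {
  null_stats <- replicate(reps, M_stat(matrix(rnorm(N * m), N, m)))
  quantile(null_stats, 1 - alpha)
}
```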

When the sample is univariate, the test statistic \(M_1\) can be expressed in a closed form.

Proposition 3.1

Let \(\varvec{\tilde{X}}=(\tilde{X}_1,\ldots ,\tilde{X}_N)\) be the standardized sample. The statistic \(M_1\) has the form

$$\begin{aligned} M_{1}&=2\pi N\Bigg [\frac{1}{N^4}\sum _{n,j,k,l=1}^N J(d(\tilde{X}_{n}-\tilde{X}_k, \tilde{X}_j-\tilde{X}_l))-e^{-\frac{1}{2}}\frac{2}{N^2}\sum _{n,j=1}^N J(d(\tilde{X}_{n},\tilde{X}_j))+e^{-1}\Bigg ], \end{aligned}$$
(7)

where J is the Bessel function of the first kind of order zero, namely \(J(z)=\sum _{k=0}^\infty (-1)^k\frac{(z^2/4)^k}{(k!)^2}\), and d is the distance from the origin to the point \((x,y)\), i.e. \(d(x,y)=\sqrt{x^2+y^2}\).
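A direct R implementation of the closed form (7) is sketched below; J is available in base R as besselJ(z, nu = 0). The quadruple sum is vectorized via outer(), which costs \(O(N^4)\) memory, so the sketch is meant for small samples only.

```r
# Statistic M_1 via the closed form (7).
M1_stat <- function(x) {
  x <- as.vector(scale(x))             # standardized sample, length N
  N <- length(x)
  D2 <- as.vector(outer(x, x, "-")^2)  # all squared differences (N^2 values)
  S1 <- sum(besselJ(sqrt(outer(D2, D2, "+")), nu = 0)) / N^4    # quadruple sum
  S2 <- sum(besselJ(sqrt(outer(x^2, x^2, "+")), nu = 0)) / N^2  # double sum
  2 * pi * N * (S1 - exp(-0.5) * 2 * S2 + exp(-1))
}

set.seed(1)
M1_stat(rnorm(30))   # small for a standard normal sample
```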

3.1 Asymptotic properties

In this section we discuss some asymptotic properties of the proposed tests. First, we show their consistency against fixed alternatives.

Theorem 3.2

Let \(X_1,...,X_N\) be an i.i.d. sample of m-variate random vectors with finite second moments. Denote the vectors of means and variances by \(\varvec{\mu }=(\mu _1,...,\mu _m)^T\) and \(\varvec{\sigma }=(\sigma ^2_1,...,\sigma ^2_m)^T\). Define \(\varphi _Z(\cdot )\) as the characteristic function of the column-wise standardized random vector \(Z=\mathrm {Diag}((\sigma ^{-1}_1,...,\sigma ^{-1}_m))(X_1-\varvec{\mu })\). Then

$$\begin{aligned} \frac{M_{m}}{N}\overset{a.s.}{\rightarrow }\Delta =\int |\varphi _{Z}(a)\varphi _{Z}(b)-e^{-\frac{1}{2}}|^2dS_{2m}(a,b)\ge 0,\end{aligned}$$

and \(\Delta =0\) if and only if the null hypothesis holds.

Next we examine the asymptotic distribution under the null hypothesis. The test statistic (6) is an integral of a sum of squares of two empirical processes. Denote them by \(U^{(m)}_{1,N}\) and \(U^{(m)}_{2,N},\) respectively. Then (6) can be expressed as

$$\begin{aligned} M_{m}&=N\int \Big |U^{(m)}_{1,N}(a,b)+i\cdot U^{(m)}_{2,N}(a,b)\Big |^2dS_{2m}(a,b), \end{aligned}$$
(8)

which is convenient for exploring the asymptotic properties.

Here we focus on the case \(m=1\). The generalization to the multivariate case can be obtained analogously. Before formulating results about the limit null distribution we introduce the following notation.

Let \(L^2[0,2\pi ]\) denote the Hilbert space of all complex-valued functions g on \([0,2\pi ]\) such that \(\int _{0}^{2\pi }|g(t)|^2dt<\infty \), with the inner product defined as

$$\begin{aligned} \langle g_1,g_2\rangle =\int _{0}^{2\pi }g_1(t)\overline{g_2(t)}dt. \end{aligned}$$

Let also \(||\cdot ||_{L^2}\) denote the norm in this space. Following the idea of Jammalamadaka et al. (2019), and passing to polar coordinates, our statistic can be expressed as

$$\begin{aligned} M_1=||Z_N||_{L^2}^2, \end{aligned}$$

where \(Z_N(\alpha )=\sqrt{N}\big (U_{1,N}^{(1)}(\cos \alpha ,\sin \alpha )+i\cdot U_{2,N}^{(1)}(\cos \alpha ,\sin \alpha )\big )\), \(\alpha \in [0,2\pi ]\).

Theorem 3.3

Let \(X_1,...,X_N\) be an i.i.d. sample from the normal \(N(\mu ,\sigma ^2)\) distribution. Then \(M_1\overset{w}{\rightarrow }||Z||^2_{L^2}\), where Z is a zero-mean Gaussian random element of \(L^2[0,2\pi ]\) with the covariance function

$$\begin{aligned} K(\alpha _1,\alpha _2)=\lim _{N\rightarrow \infty }EZ_N(\alpha _1)\overline{Z_N(\alpha _2)}=E\Xi (X;\alpha _1)\overline{\Xi (X;\alpha _2)}, \end{aligned}$$
(9)

where

$$\begin{aligned} \begin{aligned} \Xi (x;\alpha )&= e^{-\frac{1}{2} \cos ^2\alpha } \cos (x \sin \alpha ) + e^{-\frac{1}{2} \sin ^2\alpha } \cos (x \cos \alpha )-2e^{-\frac{1}{2}} +\frac{1}{2}e^{-\frac{1}{2}}(x^2-1)\\&\quad +i\cdot \big (e^{-\frac{1}{2} \sin ^2\alpha } \sin (x \cos \alpha ) + e^{-\frac{1}{2} \cos ^2\alpha } \sin (x \sin \alpha ) -e^{-\frac{1}{2}}x\big ). \end{aligned} \end{aligned}$$
(10)

From the Karhunen–Loève expansion of the Gaussian process, the distribution of \(||Z||_{L^2}^2\) can be further expressed as \(\sum _{i=1}^{\infty }\lambda _iW_i^2\), where \(\{\lambda _i\}\) is the sequence of positive eigenvalues of the integral operator with kernel (9) and \(\{W_i\}\) is an i.i.d. sequence of standard normal random variables.
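Numerically, the limit null distribution can be approximated by discretizing the kernel (9) on a grid of \([0,2\pi ]\) (a Nyström-type approximation), estimating the expectation in (9) by Monte Carlo over \(X\sim N(0,1)\), and simulating \(\sum _i\lambda _iW_i^2\) from the resulting eigenvalues. A sketch under these assumptions (grid and sample sizes are our choices):

```r
# Xi(x; alpha) of (10), vectorized in x.
Xi <- function(x, a) {
  ca <- cos(a); sa <- sin(a)
  re <- exp(-ca^2 / 2) * cos(x * sa) + exp(-sa^2 / 2) * cos(x * ca) -
    2 * exp(-0.5) + 0.5 * exp(-0.5) * (x^2 - 1)
  im <- exp(-sa^2 / 2) * sin(x * ca) + exp(-ca^2 / 2) * sin(x * sa) -
    exp(-0.5) * x
  complex(real = re, imaginary = im)
}

set.seed(1)
alpha <- seq(0, 2 * pi, length.out = 100)      # grid on [0, 2*pi]
x     <- rnorm(5000)                           # Monte Carlo sample for E in (9)
Ximat <- sapply(alpha, function(a) Xi(x, a))   # 5000 x 100 complex matrix
K     <- (t(Conj(Ximat)) %*% Ximat) / length(x)   # Hermitian estimate of (9)
lam   <- Re(eigen(K, symmetric = TRUE, only.values = TRUE)$values)
lam   <- lam[lam > 0] * (2 * pi / length(alpha))  # scale by the grid spacing

# Limit distribution of M_1: sum of lambda_i * W_i^2 with W_i iid N(0, 1).
sim <- replicate(10000, sum(lam * rnorm(length(lam))^2))
quantile(sim, 0.95)   # approximate asymptotic critical value
```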

4 Simulation study

In this section we calculate empirical powers of the proposed tests and compare them with several competitors.

4.1 Testing for univariate normality

In Tables 1 and 2 we present the power study results for the test based on \(M_1\), with sample sizes \(n=20,50\) and 100. The empirical sizes and powers are presented as percentages, with '*' signifying 100%. The results are obtained using the Monte Carlo method with 5000 replications.

In order to evaluate the performance of our test against the most popular normality tests, we selected the Shapiro–Wilk test (SW), see Shapiro and Wilk (1965), the Shapiro–Francia test (SF), see Shapiro and Francia (1972), and the Anderson–Darling test (AD), see Anderson and Darling (1952). These tests are implemented in the R package nortest by Gross and Ligges (2015). Additionally, we consider recent powerful tests: the test based on the empirical characteristic function (BHEP), see Henze and Wagner (1997), the quantile correlation test based on the \(L_2\)-Wasserstein distance, see Del Barrio et al. (1999), the test based on the moment generating function (HJG\(_{\beta }\)) proposed in Henze and Jiménez-Gamero (2019), and a test based on Stein's fixed point characterization proposed in Betsch and Ebner (2020).

The alternatives we consider are normal mixtures \(MixN (p,\mu ,\sigma ^2)=(1-p)N(0,1)+pN(\mu ,\sigma ^2)\), the Student \(t_\nu \) distribution, the uniform \(U(a,b)\) distribution, chi-squared \(\chi ^2_\nu \), beta \(B (a,b)\), gamma \(\Gamma (a,b)\), Gumbel \(Gum (\mu ,\sigma )\) and lognormal \(LN (\mu ,\sigma )\), where all parameters are the standard distribution parameters. This set of alternatives was also used in Betsch and Ebner (2020).

Tables 1 and 2 show that the powers are reasonably high in comparison with the other tests for all alternatives except the uniform distribution and the normal mixtures. In the case of the Gumbel distribution our test outperforms the competitors, and for the gamma and chi-squared alternatives it is among the best.

Table 1 Power comparison for tests of univariate normality—Part I
Table 2 Power comparison for tests of univariate normality—Part II

4.2 Testing for normality and independence

Consider first a bivariate simple random sample \({{\mathbf {X}}}=(\varvec{X}_1,\ldots , \varvec{X}_N)\), where \(\varvec{X}_j=(X_{j,1},X_{j,2})\), \(j=1,\ldots ,N\). Let \({\widetilde{\mathbf {X}}}=({\widetilde{\varvec{X}}}_1,\ldots , {\widetilde{\varvec{X}}}_N)\) be its column-wise standardization (3). Here, the test statistic (5) becomes

$$\begin{aligned} M_{2}&= N\int \Bigg (\frac{1}{N^2}\sum _{j,k}\cos \big (a_1\tilde{X}_{j,1}+a_2\tilde{X}_{j,2} +b_1\tilde{X}_{k,1}+b_2\tilde{X}_{k,2}\big )-e^{-\frac{1}{2}}\Bigg )^2\\&\quad +\Bigg (\frac{1}{N^2}\sum _{j,k}\sin \big (a_1\tilde{X}_{j,1}+a_2\tilde{X}_{j,2} +b_1\tilde{X}_{k,1}+b_2\tilde{X}_{k,2}\big )\Bigg )^2dS_{4}(a,b), \end{aligned}$$

where \(a=(a_1,a_2)^{T }\) and \(b=(b_1,b_2)^{T }\) such that \(\langle a,a\rangle +\langle b,b\rangle =1\).

It is possible to elaborate the expression above further, as in the univariate case; however, the resulting integrals do not have a convenient form, and from the computational point of view the double sums are significantly faster than the quadruple sums that would otherwise be obtained. The integral over the sphere can be computed efficiently using the SphericalCubature package for R (Nolan 2021).
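Alternatively, the Monte Carlo sphere sampling sketched in Sect. 3 applies verbatim, e.g.:

```r
set.seed(1)
X <- matrix(rnorm(100 * 2), 100, 2)  # N = 100 bivariate observations
M_stat(X)                            # approximates M_2 for testing H_0^(2)
```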

In Tables 3 and 4 we present the powers of the new test. As competitors we chose the KS2 test, initially proposed in Koziol (1979), with the data-driven parameter selection introduced in Kallenberg et al. (1997), as well as the bivariate versions of the BHEP and HJG\(_\beta \) tests.

The reason for choosing the KS2 test is that it is, so far, the only test in the literature designed to test for bivariate normality and independence. In Kallenberg et al. (1997) it is shown to outperform the Kolmogorov–Smirnov and Hoeffding tests in most cases. The other tests are chosen because they are powerful tests for bivariate normality based on the characteristic and moment generating functions; when applied to the column-wise standardized sample, they are suitable for testing normality and independence. The set of alternatives is taken from Kallenberg et al. (1997) for some choices of distribution parameters, and is given below. Unless stated otherwise, all distributions are defined for \(x_i\in \mathbb {R}\), \(i=1,2\). Distributions derived from the bivariate normal distribution inherit its parameter space \(\mu _i\in \mathbb {R},\sigma _i>0\), \(i=1,2,\) \(\rho \in [-1,1]\).

  • a bivariate normal distribution BivN(\(\mu _1,\mu _2,\sigma _1,\sigma _2,\rho \)) with density

    $$\begin{aligned}&g_1(x_1,x_2;\mu _1,\mu _2,\sigma _1,\sigma _2,\rho )\\&\quad =\frac{1}{2\pi \sigma _1\sigma _2\sqrt{1-\rho ^2}}\;\;e^{-\frac{1}{2(1-\rho ^2)}\big (\frac{(x_1-\mu _1)^2}{\sigma _1^2}+\frac{(x_2-\mu _2)^2}{\sigma _2^2}-\frac{2\rho (x_1-\mu _1)(x_2-\mu _2)}{\sigma _1\sigma _2}\big )}, \end{aligned}$$
  • a mixture of bivariate normal distributions NMixA(\(\rho \)) with density

    $$\begin{aligned} g_2(x_1,x_2,\rho )=\frac{1}{2} g_1(x_1,x_2;0,0,1,1,\rho )+\frac{1}{2} g_1(x_1,x_2;1,1,1,1,0.9); \end{aligned}$$
  • a mixture of bivariate normal distributions NMixB(\(\rho \)) with density

    $$\begin{aligned} g_3(x_1,x_2,\rho )=\frac{1}{2} g_1(x_1,x_2;0,0,1,1,\rho )+\frac{1}{2} g_1(x_1,x_2;0,0,1,1,-\rho ); \end{aligned}$$
  • a bivariate lognormal distribution LogN(\(\sigma _1,\sigma _2,\rho \)) with density

    $$\begin{aligned} g_4(x_1,x_2;\sigma _1,\sigma _2,\rho )=\frac{b_1b_2}{(b_1x_1+a_1)(b_2x_2+a_2)}g_1(l_1,l_2;0,0,\sigma _1,\sigma _2,\rho ),\;\;x_i>-\frac{a_i}{b_i}, \end{aligned}$$

    where \(l_i=\log (b_ix_i+a_i)\), \(a_i=e^{\sigma _i^2/2}\), \(b_i=\sqrt{e^{2\sigma _i^2}-e^{\sigma _i^2}}\), \(i=1,2\).

  • a sinh\(^{-1}\)-normal distribution sinh\(^{-1}\)N(\(\mu _1,\mu _2,\sigma _1,\sigma _2,\rho \)) with density

    $$\begin{aligned} g_5(x_1,x_2;\mu _1,\mu _2,\sigma _1,\sigma _2,\rho )=&\frac{b_1b_2(w_1+\sqrt{1+w_1^2})(w_2+\sqrt{1+w_2^2})}{(1+w_1^2+w_1\sqrt{1+w_1^2})(1+w_2^2+w_2\sqrt{1+w_2^2})}\\ {}&\times g_1(\sinh ^{-1}(w_1),\sinh ^{-1}(w_2);\mu _1,\mu _2,\sigma _1,\sigma _2,\rho ), \end{aligned}$$

    where \(w_i=b_ix_i+a_i\), \(a_i=e^{\sigma _i^2/2}\sinh (\mu _i)\), \(b_i=\sqrt{(e^{\sigma _i^2}-1)(e^{\sigma _i^2}\cosh (2\mu _i)+1)}\), \(i=1,2\).

  • a generalized Burr-Pareto-logistic distribution GBPL(\(\alpha ,\beta \)) with standard normal marginals, with density

    $$\begin{aligned} g_6(x_1,x_2;\alpha ,\beta )=&\frac{(\alpha +1)\varphi (x_1)\varphi (x_2)}{\alpha \Phi (x_1)\Phi (x_2)}\bigg (\frac{1+\beta }{\big ((\Phi (x_1))^{-\frac{1}{\alpha }}+(\Phi (x_2))^{-\frac{1}{\alpha }}-1\big )^{\alpha +2}}\\&+\frac{4\beta }{\big (2(\Phi (x_1))^{-\frac{1}{\alpha }}+2(\Phi (x_2))^{-\frac{1}{\alpha }}-3\big )^{\alpha +2}}\\&-\frac{2\beta }{\big (2(\Phi (x_1))^{-\frac{1}{\alpha }}+(\Phi (x_2))^{-\frac{1}{\alpha }}-2\big )^{\alpha +2}}\\&-\frac{2\beta }{\big ((\Phi (x_1))^{-\frac{1}{\alpha }}+2(\Phi (x_2))^{-\frac{1}{\alpha }}-2\big )^{\alpha +2}}\bigg ),\; \alpha >0, \beta \in [-1,1], \end{aligned}$$

    where \(\Phi (x)\) and \(\varphi (x)\) are the standard normal distribution function and density;

  • a Morgenstern distribution Morg(\(\alpha \)), with standard normal marginals, with density

    $$\begin{aligned} g_7(x_1,x_2;\alpha )=\varphi (x_1)\varphi (x_2)\Big (1+\alpha \big (2\Phi (x_1)-1\big )\big (2\Phi (x_2)-1\big )\Big ), \;\alpha \in [-1,1]; \end{aligned}$$
  • a Pearson type VII distribution PearVII(\(\alpha \)) with density

    $$\begin{aligned} g_8(x_1,x_2;\alpha )=\frac{\alpha }{2\pi } \Big (1 + \frac{1}{2}(x_1^2+x_2^2)\Big )^{-(\alpha + 1)},\;\; \alpha >0. \end{aligned}$$

Methods of generating random variates from these distributions are available in Johnson (1987) and Cook and Johnson (1986).

Table 3 Powers of tests for bivariate normality and independence—Part I
Table 4 Powers of tests for bivariate normality and independence—Part II

From Tables 3 and 4 we can see that our new test is the most powerful against the bivariate normal, normal mixture (except for \(\rho =\pm 0.5\) in the case of mixture B), GBPL and Morgenstern alternatives. Against the sinh\(^{-1}\) alternative it is the best for \(\rho =\pm 0.5\), while for the lognormal alternative it is second best after BHEP. It is interesting to note that in the case of normal mixture B our test performs better relative to the others as \(\rho \) gets closer to zero, while in the case of the sinh\(^{-1}\) alternative it is the other way around, i.e. it performs better when \(\rho \) is far from zero.

Consider now testing for trivariate normality and independence.

In Table 5 we present the powers of our tests, as well as those of the BHEP and HJG\(_\beta \) tests. We performed testing against trivariate variants of some of the alternatives considered in the bivariate case. The labels are self-explanatory and the densities can easily be derived from their bivariate counterparts, with the possible exception of the trivariate Morgenstern distribution, which can be found in Ota and Kimura (2021).

Table 5 Powers of tests for trivariate normality and independence

The conclusions are broadly similar to the bivariate case for the alternatives considered. It is worth mentioning that in the case of the Morgenstern alternative our test significantly outperforms all competitors.

4.3 Testing for multivariate normality

Statistical tests for multivariate normality are usually applied to the following standardization of the multivariate simple random sample \(\mathbf {X}\). Let \({\check{\mathbf{X}}}=\widehat{\Sigma }^{-\frac{1}{2}}(\mathbf {X}-\hat{\mu }),\) where \(\widehat{\Sigma }^{-\frac{1}{2}}\) is the unique symmetric positive definite square root of the inverse of the sample covariance matrix \(\widehat{\Sigma }\).

Our test is no exception: if we apply it to \({\check{\mathbf{X}}}\) instead of \({\widetilde{\mathbf {X}}}\), we can use it to test for multivariate normality.

It is easy to see that, as in the original case, the test statistic is affine invariant under the null hypothesis of multivariate normality, and hence the null distribution can be approximated using Monte Carlo methods.
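A sketch of this standardization via the spectral decomposition of \(\widehat{\Sigma }\), which yields the symmetric square root (the function name is ours):

```r
# Multivariate standardization: centre the rows of X and multiply by the
# symmetric square root of the inverse sample covariance matrix.
standardize_mv <- function(X) {
  X  <- as.matrix(X)
  Xc <- sweep(X, 2, colMeans(X))       # centre the columns
  e  <- eigen(cov(X), symmetric = TRUE)
  S_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  Xc %*% S_inv_sqrt                    # rows now have identity sample covariance
}

set.seed(1)
round(cov(standardize_mv(matrix(rnorm(200 * 3), 200, 3))), 10)  # ~ identity
```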

In Tables 6 and 7 we present the powers of our tests for bivariate and trivariate normality, respectively. We also present the powers of the corresponding BHEP and \(\hbox {HJG}_{\beta }\) tests as competitors. The alternatives are chosen from the lists in Madukaife (2021) and Ebner and Henze (2020). The labels are self-explanatory and '\(\times \)' signifies that the components are independent.

From the tables we can see that no test is uniformly the best. Our test is never the least powerful; for most alternatives it is in the middle, and for a couple of alternatives it is the most powerful. All the other tests have cases in which they are the best as well as cases in which they are the worst. It is interesting to note that in the case of the bivariate beta(0.5,0.5) alternative both our test and the HJG tests have powers below the level of significance for \(n=50\). However, the power of our test exceeds this level for \(n=100\), while the HJG powers remain the same.

Table 6 Power comparison of tests for bivariate normality
Table 7 Power comparison of tests for trivariate normality

5 Applications

In this section we consider applications of our tests. We put emphasis on the normality and independence null hypothesis, as situations where one tests only for univariate or multivariate normality are quite common.

5.1 Model specification tests

In multivariate regression and time series analysis it is often the case that model errors and innovations are assumed to have a multivariate normal distribution with independent components. Such models are proposed and/or discussed in Breiman and Friedman (1997), Zivot and Wang (2006), and Francq and Zakoian (2019). Our test for normality and independence can then be used as a model specification test. The errors are unobservable, and hence the test must be applied to the model residuals, which do not form an i.i.d. sample. Demonstrating the testing procedure would thus require some modification of the test in order to obtain p-values, which is beyond the scope of this paper but worth considering for future research.

Other important applications, worth considering in the future, include novelty detection methods based on orthogonal expansion representation of random elements from a certain Hilbert space (see Rafajłowicz 2021); cluster analysis with finite Gaussian mixtures (see Pan and Shen 2007; Yeung et al. 2001); and image processing (see Rafajłowicz and Rafajłowicz 2010; Rafajłowicz and Wietrzych 2010).

5.2 Random matrix

In probability theory and mathematical physics, a random matrix is a matrix-valued random variable, i.e., a matrix in which some or all elements are random variables. Many important properties of physical systems can be represented mathematically as matrix problems. For example, the thermal conductivity of a lattice can be computed from the dynamical matrix of the particle–particle interactions within the lattice. One of the applications is to approximate a covariance matrix, provided the underlying distribution is normal. Below we give two different representations connected with the \(\mathcal {H}^{(m)}_0\) hypothesis in the context of the matrix \({{\mathbf {X}}}^T {{\mathbf {X}}}\).

Let \(\Sigma \) be a positive definite, symmetric \(m \times m\) matrix, and let \({{\mathbf {X}}}^T\) be a random \(m \times N\) matrix as in Eq. (2), whose columns are independent, identically distributed random vectors, each with the multivariate Gaussian distribution \(N(0,\Sigma )\).

5.2.1 Wishart distribution

Assume that \(N \ge m\). Then the random matrix \( S = {{\mathbf {X}}}^T{{\mathbf {X}}}\) has the Wishart distribution \(W(\Sigma ,m,N)\), with density

$$\begin{aligned} {\nu }(dS)=\omega (m,N)\det (\Sigma ^{-1}S)^{N/2}\exp \Big (-\frac{1}{2}tr(\Sigma ^{-1}S)\Big ), \end{aligned}$$

for an appropriate normalizing constant \(\omega (m,N)\). Under the null hypothesis \(\mathcal {H}^{(m)}_0\) the matrix \({{\mathbf {X}}}^T {{\mathbf {X}}}\) has the Wishart distribution \(W(I,m,N)\), where I is the \(m \times m\) identity matrix.
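Under \(\mathcal {H}^{(m)}_0\) (after standardization, so that \(\Sigma =I\)) the matrix \({{\mathbf {X}}}^T{{\mathbf {X}}}\) can be simulated either directly or with the base R generator rWishart; a small sketch comparing the two (dimensions are our choice):

```r
set.seed(1)
m <- 3; N <- 50
X  <- matrix(rnorm(N * m), N, m)      # columns of X^T are iid N(0, I)
S1 <- crossprod(X)                    # X^T X, distributed as W(I, m, N)
S2 <- rWishart(1, df = N, Sigma = diag(m))[, , 1]   # draw from W(I, m, N)
c(mean(diag(S1)), mean(diag(S2)))     # both diagonals have expectation N
```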

5.2.2 Marchenko–Pastur distribution

The limit of the empirical spectral measure of Wishart matrices under the \(\mathcal {H}^{(m)}_0\) assumption was found by Vladimir Marchenko and Leonid Pastur in Marchenko and Pastur (1967). Assume that \(\Sigma =I\), i.e. the entries of \({{\mathbf {X}}}\) are i.i.d. \(N(0,1)\). If \(m,N \rightarrow \infty \) in such a way that \(m/N \rightarrow \lambda \in (0, 1),\) then the empirical spectral distribution of \(N^{-1}{{\mathbf {X}}}^T {{\mathbf {X}}}\) converges weakly to the Marchenko–Pastur distribution with density (with respect to the Lebesgue measure)

$$\begin{aligned} {\nu }(dx)=\frac{1}{2\pi \lambda x}\,\sqrt{4\lambda -(x-(1+\lambda ))^2}\,dx, \end{aligned}$$

supported on the interval \(((1-\sqrt{\lambda })^2,\,(1+\sqrt{\lambda })^2)\).
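A sketch visualizing the convergence: the eigenvalues of \(N^{-1}{{\mathbf {X}}}^T {{\mathbf {X}}}\) for i.i.d. \(N(0,1)\) entries, plotted against the Marchenko–Pastur density (the dimensions are our choice):

```r
set.seed(1)
N <- 2000; m <- 500; lambda <- m / N   # aspect ratio lambda = 0.25
X  <- matrix(rnorm(N * m), N, m)
ev <- eigen(crossprod(X) / N, symmetric = TRUE, only.values = TRUE)$values

mp_density <- function(x, l)           # Marchenko-Pastur density on its support
  sqrt(pmax(4 * l - (x - (1 + l))^2, 0)) / (2 * pi * l * x)

hist(ev, breaks = 50, freq = FALSE, main = "Marchenko-Pastur law")
curve(mp_density(x, lambda), add = TRUE, col = "red", lwd = 2)
```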

5.3 Real data example

As an illustrative example we consider data from Stevens (2012) that originally appeared in Ambrose (1985). The data in Table 8 represent average ratings on three performance aspects (rhythm, intonation and tempo) of two groups of elementary school children who received clarinet instruction. The experimental group (1) had programmed instruction, while the control group (2) had traditional classroom instruction.

Table 8 Performance aspects ratings

Suppose a researcher wants to examine the effects of instruction on these performance aspects. If the aspects are normally distributed and independent random variables, a separate univariate analysis of variance for each variable is recommended. However, if this is not the case, the researcher should either use a nonparametric procedure (in the case of non-normality) or a multivariate analysis of variance (in the presence of correlation). Hence, our test is handy for making a decision about the appropriate procedure.

Applying the test to the data from Table 8, we get p-values of 0.12 (group 1) and 0.19 (group 2), and therefore the recommendation for the researcher is to examine the effects using univariate analyses of variance.