1 Introduction

Goodness-of-fit and independence testing are among the classical and most important problems in statistics. The traditional approach to testing for independence is based on Pearson's correlation coefficient, but its lack of robustness to outliers and to departures from normality eventually led researchers to consider alternative nonparametric procedures, such as the Savage, Spearman and van der Waerden tests, which rely on linear rank statistics. The traditional approach to testing for normality includes classical omnibus tests such as Kolmogorov–Smirnov and Anderson–Darling, and quantile-based tests such as Shapiro–Wilk and Shapiro–Francia.

In many circumstances, such as checking model assumptions, one needs to apply both types of tests to the same sample. This is usually done separately, and then one needs to be careful about multiple testing issues. In this paper we propose a way to test for independence and normality simultaneously.

A theoretical framework for this approach was given by Ejsmont (2016), who proved a characterization of the normal law through a certain invariance of the noncentral chi-square distribution. Namely, in Ejsmont (2016, Corollary 3.3) it was shown that if the random vectors \(({\mathrm X}_1,\dots , {\mathrm X}_m, A)\) and \(({\mathrm Y}_{1},\dots ,{\mathrm Y}_n,B)\) are independent and the distribution of \(\sum _{i=1}^m{\mathrm X}_ia_i+A+\sum _{j=1}^n{\mathrm Y}_jb_j+B\) depends only on \(\sum _{i=1}^ma_i^2+\sum _{j=1}^nb_j^2\), then \({\mathrm X}_1,\dots , {\mathrm X}_m,{\mathrm Y}_{1},\dots ,{\mathrm Y}_n\) are independent and have the same normal distribution. This result was obtained under the assumption that all moments exist. In the current paper we weaken this condition considerably and propose a new test of independence and normality.

The paper is organized as follows. In Sect. 2 we state and prove the main result of Ejsmont (2016) under the weakened assumption. Next, in Sect. 3, we use this result to propose a new test for normality and explore its asymptotic properties. In Sect. 4 we present the results of a wide empirical study in which we compare the powers of our test and of competing tests. Finally, in Sect. 5 we provide a real-data example and list potential applications of the proposed tests. All proofs are deferred to the Appendix.

2 The theoretical base for the construction of a test

Notation. The scalar product of vectors \(t,s\in {\mathbb {R}}^p\) is denoted by \(\langle t, s \rangle \) and the Euclidean norm of t is \(\Vert t\Vert =\sqrt{\langle t, t \rangle }\). Throughout this paper \({\mathrm X}:=({\mathrm X}_1,\dots , {\mathrm X}_m) \in {\mathbb {R}}^m\) and \({\mathrm Y}:=({\mathrm Y}_{1},\dots ,{\mathrm Y}_n)\in {\mathbb {R}}^n\) are random vectors, where m and n are positive integers. The characteristic functions of \({\mathrm X}\) and \({\mathrm Y}\) are denoted by \(\varphi _{\mathrm X}(\cdot )=Ee^{i\langle \cdot ,{\mathrm X}\rangle }\) and \(\varphi _{\mathrm Y}(\cdot ) =Ee^{i\langle \cdot ,{\mathrm Y}\rangle }\), respectively. For complex-valued functions \(f(\cdot )\), the complex conjugate of f is denoted by \(\overline{f}\) and \(|f|^2 = f \overline{f}\). In order to simplify the notation, we will denote \([n]=\{1,\dots ,n\}\). The concatenation of the vectors \(a\in {\mathbb {R}}^m\) and \(b\in {\mathbb {R}}^n\) is denoted by \((a,b) \in {\mathbb {R}}^{m+n}\).

Our construction of a new test of normality is based on the following result. It is a generalization of the main result of Ejsmont (2016) with the moment assumptions removed (in Ejsmont (2016) the random variables are assumed to have all moments; the proof given here is also different).

Theorem 2.1

Let \(({\mathrm X}_1,\dots , {\mathrm X}_m,A) \text { and } ({\mathrm Y}_{1},\dots ,{\mathrm Y}_n,B)\) be independent random vectors, where \({\mathrm X}_i\) and \({\mathrm Y}_j \) are nondegenerate for \(i\in [m],j\in [n]\), and let the statistic

$$\langle a,{\mathrm X}\rangle +\langle b,{\mathrm Y}\rangle +A+B=\sum _{i=1}^ma_i{\mathrm X}_i+\sum _{j=1}^n b_j{\mathrm Y}_j+A+B,$$

have a distribution which depends only on \(\Vert a\Vert ^2+\Vert b\Vert ^2\), where \(a\in \mathbb {R}^m\) and \(b\in \mathbb {R}^n\). Then the random variables \({\mathrm X}_1,\dots ,{\mathrm X}_m,{\mathrm Y}_1,\dots ,{\mathrm Y}_n\) are independent and have the same normal distribution with zero mean.

The construction of the new test is based directly on the proposition below, which follows from Theorem 2.1; namely, if \(A=B=0\), then Theorem 2.1 can be restated as follows.

Proposition 2.2

Let \(({\mathrm X}_1,\dots , {\mathrm X}_m) \text { and } ({\mathrm Y}_{1},\dots ,{\mathrm Y}_n)\) be independent random vectors, where \({\mathrm X}_i\) and \({\mathrm Y}_j\) are nondegenerate and \(E({\mathrm X}_i^2)=1\), \(E({\mathrm Y}_j^2)=1 \) for \(i\in [m]\), \(j\in [n]\). Then the following statements are equivalent:

  (i)

    the statistic \(\langle a,{\mathrm X}\rangle +\langle b,{\mathrm Y}\rangle \) has a distribution which does not depend on

    $$(a_1,\dots , a_m,b_1,\dots ,b_n),$$

    whenever \(\Vert a\Vert ^2+\Vert b\Vert ^2=1\);

  (ii)

    the random variables \({\mathrm X}_1,\dots ,{\mathrm X}_m,{\mathrm Y}_1,\dots ,{\mathrm Y}_n\) are independent and have the same normal distribution \(N(0,1)\).

3 The test statistic

In this section we propose a new class of test statistics for testing the null hypothesis that the sample comes from a multivariate normal distribution with independent components. In the univariate case this reduces to the null hypothesis of normality.

Our methodology is based on distances between empirical and theoretical quantities. In the theory of hypothesis testing many types of distances between statistical objects can be defined. One of the best known and most widely applied is the \(L^2\) distance, used e.g. in the construction of the Cramér (1928) and Anderson and Darling (1952) tests. For testing multivariate normality, one can also use the \(L^2\) distance between the empirical and theoretical characteristic functions; see Baringhaus and Henze (1988) and Epps and Lawrence (1983). More recently, a characterization-based test for multivariate independence was given in Székely et al. (2007) and Székely and Rizzo (2009). Suppose that \({\mathrm X}\in {\mathbb {R}}^m,{\mathrm Y}\in {\mathbb {R}}^n\) are random vectors with characteristic functions \(\varphi _{\mathrm X}\) and \(\varphi _{\mathrm Y}\), respectively. Then, to measure independence, one can use the distance \(\int _{{\mathbb {R}}^{m+n}}|{\varphi _{{\mathrm X},{\mathrm Y}}}(t,s)-{\varphi _{\mathrm X}}(t){\varphi _{\mathrm Y}}(s)|^2w(t,s)\,dt\,ds\), where \(w(t,s)\) is an arbitrary positive weight function for which the integral exists. We put forward a test that is also based on the distance between a function of the empirical characteristic function and a constant; it was inspired by the articles (Baringhaus and Henze 1988; Székely et al. 2007; Székely and Rizzo 2009; Epps and Lawrence 1983).
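For concreteness, the empirical version of this distance with the weight function of Székely et al. (2007) reduces to the squared sample distance covariance, which admits a simple double-centring formula. A minimal R sketch (the function name dcov_sq and the toy data are ours):

```r
# Squared sample distance covariance of Szekely et al. (2007): a sketch.
# x: N x m matrix, y: N x n matrix (rows are observations).
dcov_sq <- function(x, y) {
  dc <- function(d) {  # double-centre a Euclidean distance matrix
    d - rowMeans(d)[row(d)] - colMeans(d)[col(d)] + mean(d)
  }
  A <- dc(as.matrix(dist(as.matrix(x))))
  B <- dc(as.matrix(dist(as.matrix(y))))
  mean(A * B)          # ~ 0 for large N if and only if x and y are independent
}

set.seed(1)
dcov_sq(rnorm(100), rnorm(100))  # small: the samples are independent
```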

Our approach is based on the following reasoning. Condition (i) of Proposition 2.2 simply means that we obtain statement (ii) if the distribution of the statistic \(\langle a,{\mathrm X}\rangle +\langle b,{\mathrm Y}\rangle \) is constant on the unit sphere in \({\mathbb {R}}^{n+m}\). This requirement can be rewritten using the characteristic function; namely, we obtain statement (ii) if and only if the function

$$\begin{aligned} Ee^{i\langle a,{\mathrm X}\rangle +i\langle b,{\mathrm Y}\rangle } =\varphi _{\mathrm X}(a)\varphi _{\mathrm Y}(b) \end{aligned}$$

is constant on the unit sphere \(\Vert a\Vert ^2+\Vert b\Vert ^2=1\), where \(a\in \mathbb {R}^m\) and \(b\in \mathbb {R}^n\). We also know from the proof of Proposition 2.2 that this constant function must equal \(e^{-\frac{1}{2}}\), namely

$$\begin{aligned} \varphi _{\mathrm X}(a)\varphi _{\mathrm Y}(b)-e^{-\frac{1}{2}}=0 \end{aligned}$$

for all \(\Vert a\Vert ^2+\Vert b\Vert ^2=1\) or equivalently,

$$\begin{aligned}&\int _{S_{n+m}}|\varphi _{\mathrm X}(a)\varphi _{\mathrm Y}(b)-e^{-\frac{1}{2}}|^2dS_{n+m}=0, \end{aligned}$$
(1)

where \(\int _{S_{n+m}} \cdot dS_{n+m}\) is the surface integral over \(S_{n+m}=\{t\in {\mathbb {R}}^{n+m}\mid \Vert t\Vert =1\}\). Finiteness of the integral above follows directly from \(|\varphi _{\mathrm X}(a)\varphi _{\mathrm Y}(b)|\le 1\) and \(e^{-\frac{1}{2}}<1\), namely we see that

$$\begin{aligned} \int _{S_{n+m}}|\varphi _{\mathrm X}(a)\varphi _{\mathrm Y}(b)-e^{-\frac{1}{2}}|^2dS_{n+m} \le (1-e^{-\frac{1}{2}})^2|S_{n+m}|. \end{aligned}$$

Let us assume that we have a simple random sample \({{\mathbf {X}}}=(\varvec{X}_1,\ldots , \varvec{X}_N)\) from a multivariate distribution with m components, i.e. the data have the following structure:

$$\begin{aligned} {{\mathbf {X}}}=\begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,m}\\ x_{2,1} & x_{2,2} & \cdots & x_{2,m}\\ \vdots & \vdots & \ddots & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,m} \end{pmatrix}. \end{aligned}$$
(2)

We want to test the null hypothesis

$$\begin{aligned} \mathcal {H}^{(m)}_0: \text {the components of } \varvec{X}_1 \text { are independent and normally distributed.} \end{aligned}$$

For this purpose we use Proposition 2.2 for \(m=n\). Let \(\widetilde{{{\mathbf {X}}}}\) denote the matrix obtained from \({{\mathbf {X}}}\) by column-wise standardization, i.e.

$$\begin{aligned} \widetilde{X}_{j,k}=\frac{x_{j,k}-\hat{\mu }_k}{\hat{\sigma }_k}, \;\;k=1,\ldots ,m,\;\; j=1,\ldots ,N, \end{aligned}$$
(3)

where

$$\begin{aligned} \hat{\mu }_k=\frac{1}{N}\sum _{j=1}^N x_{j,k} \text { and } \hat{\sigma }^2_k=\frac{1}{N-1}\sum _{j=1}^N(x_{j,k}-\hat{\mu }_k)^2. \end{aligned}$$
(4)
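In R, the column-wise standardization (3)–(4) is exactly what the base function scale performs, since scale divides by the sample standard deviation with divisor \(N-1\); a minimal sketch with an arbitrary example matrix:

```r
# Column-wise standardization (3)-(4) of an N x m data matrix.
set.seed(1)
X <- matrix(rnorm(100 * 3), nrow = 100, ncol = 3)  # N = 100, m = 3
X_tilde <- scale(X)      # subtract column means, divide by column sd (N - 1)
colMeans(X_tilde)        # ~ 0 in every column
apply(X_tilde, 2, sd)    # exactly 1 in every column
```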

Let \(\varphi _{\widetilde{{{\mathbf {X}}}}}(a)\) be the empirical characteristic function of \(\widetilde{{{\mathbf {X}}}}\) defined by

$$\begin{aligned} \varphi _{\widetilde{{{\mathbf {X}}}}}(a)=\frac{1}{N}\sum _{k=1}^N e^{i \langle a,\widetilde{{{\mathbf {X}}}}_k\rangle }, \end{aligned}$$

where \(\widetilde{{{\mathbf {X}}}}_k\) is the kth row of the matrix \(\widetilde{{{\mathbf {X}}}}\). Similarly, the empirical counterpart of the characteristic function of the random variable \(\langle a,{\mathrm X}\rangle +\langle b,{\mathrm Y}\rangle \) is

$$\begin{aligned} \frac{1}{N^2}\sum _{j,k}e^{i(\langle a,{\varvec{X}}_j\rangle +\langle b,{\varvec{X}}_k\rangle )}, \end{aligned}$$

where \(a=(a_1,...,a_m)^{T}\) and \(b=(b_1,...,b_m)^{T}\). Assuming that \({\mathrm X}\) and \({\mathrm Y}\) are both distributed as \(\varvec{X}_1\), the natural test statistic based on (1) is

$$\begin{aligned} M_{m}&=N\int \left| \frac{1}{N^2}\sum _{j,k}e^{i(\langle a,\widetilde{\varvec{X}}_j\rangle +\langle b,\widetilde{\varvec{X}}_k\rangle )}-e^{-\frac{1}{2}}\right| ^2dS_{2m}(a,b), \end{aligned}$$
(5)

which can be further expressed as

$$\begin{aligned} \begin{aligned} M_{m}&=N\int \Bigg (\frac{1}{N^2}\sum _{j,k}\cos \big (\langle a,\widetilde{\varvec{X}}_j\rangle +\langle b,\widetilde{\varvec{X}}_k\rangle \big )-e^{-\frac{1}{2}}\Bigg )^2\\&\quad +\Bigg (\frac{1}{N^2}\sum _{j,k}\sin \big (\langle a,\widetilde{\varvec{X}}_j\rangle +\langle b,\widetilde{\varvec{X}}_k\rangle \big )\Bigg )^2dS_{2m}(a,b), \end{aligned} \end{aligned}$$
(6)

where \(a=(a_1,...,a_m)^{T }\) and \(b=(b_1,...,b_m)^{T }\) such that \(\langle a,a\rangle +\langle b,b\rangle =1\).
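Since the double sum in (5) factorizes into the product of two empirical characteristic functions, and a uniform point on \(S_{2m}\) is obtained by normalizing a standard Gaussian vector, the statistic can be approximated by Monte Carlo integration over the sphere. A minimal R sketch (the function name M_stat and the number of sphere points are our choices):

```r
# Monte Carlo approximation of M_m in (5): average the squared modulus
# |phi(a)phi(b) - exp(-1/2)|^2 over random points (a, b) on the unit
# sphere in R^{2m}, then multiply by N and the surface area of the sphere.
M_stat <- function(X, n_sphere = 2000) {
  X <- scale(as.matrix(X))                   # standardization (3)-(4)
  N <- nrow(X); m <- ncol(X); d <- 2 * m
  area <- 2 * pi^(d / 2) / gamma(d / 2)      # surface area of S_{2m}
  vals <- replicate(n_sphere, {
    u <- rnorm(d); u <- u / sqrt(sum(u^2))   # uniform point on the sphere
    a <- u[1:m]; b <- u[(m + 1):d]
    phi_a <- mean(exp(1i * (X %*% a)))       # empirical char. function at a
    phi_b <- mean(exp(1i * (X %*% b)))
    Mod(phi_a * phi_b - exp(-0.5))^2
  })
  N * area * mean(vals)
}

set.seed(1)
M_stat(matrix(rnorm(50 * 2), 50, 2))         # small under the null
```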

We are interested in the one-sided, right-tailed test, because the construction above shows that we reject the null hypothesis for large values of \(M_m\). Under the null hypothesis the distribution of the test statistic does not depend on the location and scale parameters \(\mu _k\) and \(\sigma _k\), \(k=1,...,m\), of the null distribution; hence we may derive critical values using the Monte Carlo approach.
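A sketch of the Monte Carlo derivation of critical values, reusing the M_stat function from the sketch above (the number of replications is arbitrary; the computation is slow but embarrassingly parallel):

```r
# Critical value at level alpha: simulate M_m under the null hypothesis,
# i.e. on samples with independent standard normal components.
critical_value <- function(N, m, reps = 1000, alpha = 0.05) {
  null_stats <- replicate(reps, M_stat(matrix(rnorm(N * m), N, m)))
  quantile(null_stats, 1 - alpha)
}
```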

When the sample is univariate, the test statistic \(M_1\) can be expressed in a closed form.

Proposition 3.1

Let \(\varvec{\tilde{X}}=(\tilde{X}_1,\ldots ,\tilde{X}_N)\) be the standardized sample. The statistic \(M_1\) has the form

$$\begin{aligned} M_{1}&=2\pi N\Bigg [\frac{1}{N^4}\sum _{n,j,k,l=1}^N J(d(\tilde{X}_{n}-\tilde{X}_k, \tilde{X}_j-\tilde{X}_l))-e^{-\frac{1}{2}}\frac{2}{N^2}\sum _{n,j=1}^N J(d(\tilde{X}_{n},\tilde{X}_j))+e^{-1}\Bigg ], \end{aligned}$$
(7)

where J is the Bessel function of the first kind of order zero, namely \(J(z)=\sum _{k=0}^\infty (-1)^k\frac{(z^2/4)^k}{(k!)^2}\), and d is the distance from the origin to the point \((x,y)\), i.e. \(d(x,y)=\sqrt{x^2+y^2}\).
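A direct R implementation of the closed form (7) is sketched below; J is available in base R as besselJ(z, nu = 0). The quadruple sum is vectorized via outer(), which costs \(O(N^4)\) memory, so the sketch is meant for small samples only.

```r
# Statistic M_1 via the closed form (7).
M1_stat <- function(x) {
  x <- as.vector(scale(x))             # standardized sample, length N
  N <- length(x)
  D2 <- as.vector(outer(x, x, "-")^2)  # all squared differences (N^2 values)
  S1 <- sum(besselJ(sqrt(outer(D2, D2, "+")), nu = 0)) / N^4    # quadruple sum
  S2 <- sum(besselJ(sqrt(outer(x^2, x^2, "+")), nu = 0)) / N^2  # double sum
  2 * pi * N * (S1 - exp(-0.5) * 2 * S2 + exp(-1))
}

set.seed(1)
M1_stat(rnorm(30))   # small for a standard normal sample
```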

3.1 Asymptotic properties

In this section we discuss some asymptotic properties of the proposed tests. First, we show their consistency against fixed alternatives.

Theorem 3.2

Let \(X_1,...,X_N\) be an i.i.d. sample of m-variate random vectors with finite second moments. Denote the vectors of means and variances by \(\varvec{\mu }=(\mu _1,...,\mu _m)^T\) and \(\varvec{\sigma }=(\sigma ^2_1,...,\sigma ^2_m)^T\). Define \(\varphi _Z(\cdot )\) as the characteristic function of the column-wise standardized random vector \(Z=\mathrm {Diag}((\sigma ^{-1}_1,...,\sigma ^{-1}_m))(X_1-\varvec{\mu })\). Then

$$\begin{aligned} \frac{M_{m}}{N}\overset{a.s.}{\rightarrow }\Delta =\int |\varphi _{Z}(a)\varphi _{Z}(b)-e^{-\frac{1}{2}}|^2dS_{2m}(a,b)\ge 0,\end{aligned}$$

and \(\Delta =0\) if and only if the null hypothesis holds.

Next we examine the asymptotic distribution under the null hypothesis. The test statistic (6) is an integral of a sum of squares of two empirical processes. Denote them by \(U^{(m)}_{1,N}\) and \(U^{(m)}_{2,N},\) respectively. Then (6) can be expressed as

$$\begin{aligned} M_{m}&=N\int \Big |U^{(m)}_{1,N}(a,b)+i\cdot U^{(m)}_{2,N}(a,b)\Big |^2dS_{2m}(a,b), \end{aligned}$$
(8)

which is convenient for exploring the asymptotic properties.

Here we focus on the case \(m=1\). The generalization to the multivariate case can be obtained analogously. Before formulating results about the limit null distribution we introduce the following notation.

Let \(L^2[0,2\pi ]\) denote the Hilbert space of all complex-valued functions g on \([0,2\pi ]\) such that \(\int _{0}^{2\pi }|g(t)|^2dt<\infty \), with the inner product defined as

$$\begin{aligned} \langle g_1,g_2\rangle =\int _{0}^{2\pi }g_1(t)\overline{g_2(t)}dt. \end{aligned}$$

Let also \(||\cdot ||_{L^2}\) denote the norm in this space. Following the idea of Jammalamadaka et al. (2019), and passing to polar coordinates, our statistic can be expressed as

$$\begin{aligned} M_1=||Z_N||_{L^2}^2, \end{aligned}$$

where \(Z_N(\alpha )=\sqrt{N}\big (U_{1,N}^{(1)}(\cos \alpha ,\sin \alpha )+i\cdot U_{2,N}^{(1)}(\cos \alpha ,\sin \alpha )\big )\), \(\alpha \in [0,2\pi ]\).

Theorem 3.3

Let \(X_1,...,X_N\) be an i.i.d. sample from the normal \(N(\mu ,\sigma ^2)\) distribution. Then \(M_1\overset{w}{\rightarrow }||Z||^2_{L^2}\), where Z is a zero-mean Gaussian random element of \(L^2[0,2\pi ]\) with the covariance function

$$\begin{aligned} K(\alpha _1,\alpha _2)=\lim _{N\rightarrow \infty }EZ_N(\alpha _1)\overline{Z_N(\alpha _2)}=E\Xi (X;\alpha _1)\overline{\Xi (X;\alpha _2)}, \end{aligned}$$
(9)

where

$$\begin{aligned} \begin{aligned} \Xi (x;\alpha )&= e^{-\frac{1}{2} \cos ^2\alpha } \cos (x \sin \alpha ) + e^{-\frac{1}{2} \sin ^2\alpha } \cos (x \cos \alpha )-2e^{-\frac{1}{2}} +\frac{1}{2}e^{-\frac{1}{2}}(x^2-1)\\&\quad +i\cdot \big (e^{-\frac{1}{2} \sin ^2\alpha } \sin (x \cos \alpha ) + e^{-\frac{1}{2} \cos ^2\alpha } \sin (x \sin \alpha ) -e^{-\frac{1}{2}}x\big ). \end{aligned} \end{aligned}$$
(10)

From the Karhunen–Loève expansion of the Gaussian process, the distribution of \(||Z||_{L^2}^2\) can be further expressed as \(\sum _{i=1}^{\infty }\lambda _iW_i^2\), where \(\{\lambda _i\}\) is the sequence of positive eigenvalues of the integral operator with kernel (9) and \(\{W_i\}\) is an i.i.d. sequence of standard normal random variables.
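Numerically, the limit null distribution can be approximated by discretizing the kernel (9) on a grid of \([0,2\pi ]\) (a Nyström-type approximation), estimating the expectation in (9) by Monte Carlo over \(X\sim N(0,1)\), and simulating \(\sum _i\lambda _iW_i^2\) from the resulting eigenvalues. A sketch under these assumptions (grid and sample sizes are our choices):

```r
# Xi(x; alpha) of (10), vectorized in x.
Xi <- function(x, a) {
  ca <- cos(a); sa <- sin(a)
  re <- exp(-ca^2 / 2) * cos(x * sa) + exp(-sa^2 / 2) * cos(x * ca) -
    2 * exp(-0.5) + 0.5 * exp(-0.5) * (x^2 - 1)
  im <- exp(-sa^2 / 2) * sin(x * ca) + exp(-ca^2 / 2) * sin(x * sa) -
    exp(-0.5) * x
  complex(real = re, imaginary = im)
}

set.seed(1)
alpha <- seq(0, 2 * pi, length.out = 100)      # grid on [0, 2*pi]
x     <- rnorm(5000)                           # Monte Carlo sample for E in (9)
Ximat <- sapply(alpha, function(a) Xi(x, a))   # 5000 x 100 complex matrix
K     <- (t(Conj(Ximat)) %*% Ximat) / length(x)   # Hermitian estimate of (9)
lam   <- Re(eigen(K, symmetric = TRUE, only.values = TRUE)$values)
lam   <- lam[lam > 0] * (2 * pi / length(alpha))  # scale by the grid spacing

# Limit distribution of M_1: sum of lambda_i * W_i^2 with W_i iid N(0, 1).
sim <- replicate(10000, sum(lam * rnorm(length(lam))^2))
quantile(sim, 0.95)   # approximate asymptotic critical value
```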

4 Simulation study

In this section we calculate empirical powers of the proposed tests and compare them with several competitors.

4.1 Testing for univariate normality

In Tables 1 and 2 we present the power study results for the test based on \(M_1\), with sample sizes \(n=20,50\) and 100. The empirical sizes and powers are presented as percentages, with '*' signifying 100%. The results are obtained using the Monte Carlo method with 5000 replications.

In order to evaluate the performance of our test against the most popular normality tests, we selected the Shapiro–Wilk test (SW), see Shapiro and Wilk (1965), the Shapiro–Francia test (SF), see Shapiro and Francia (1972), and the Anderson–Darling test (AD), see Anderson and Darling (1952). These tests are implemented in the R package nortest by Gross and Ligges (2015). Additionally, we consider recent powerful tests: the test based on the empirical characteristic function (BHEP), see Henze and Wagner (1997), the quantile correlation test based on the \(L_2\)-Wasserstein distance, see Del Barrio et al. (1999), the test based on the moment generating function (HJG\(_{\beta }\)) proposed in Henze and Jiménez-Gamero (2019), and a test based on Stein's fixed point characterization proposed in Betsch and Ebner (2020).

The alternatives we consider are normal mixtures \(MixN (p,\mu ,\sigma ^2)=(1-p)N(0,1)+pN(\mu ,\sigma ^2)\), the Student \(t_\nu \) distribution, the uniform \(U(a,b)\) distribution, chi-squared \(\chi ^2_\nu \), beta \(B (a,b)\), gamma \(\Gamma (a,b)\), Gumbel \(Gum (\mu ,\sigma )\) and lognormal \(LN (\mu ,\sigma )\), where all parameters are the standard distribution parameters. This set of alternatives was also used in Betsch and Ebner (2020).

Tables 1 and 2 show that the powers are reasonably high in comparison with the other tests for all alternatives except the uniform distribution and the normal mixtures. In the case of the Gumbel distribution our test outperforms the competitors, and for the gamma and chi-squared alternatives it is among the best.

Table 1 Power comparison for tests of univariate normality—Part I
Table 2 Power comparison for tests of univariate normality—Part II

4.2 Testing for normality and independence

Consider first a bivariate simple random sample \({{\mathbf {X}}}=(\varvec{X}_1,\ldots , \varvec{X}_N)\), where \(\varvec{X}_j=(X_{j,1},X_{j,2})\), \(j=1,\ldots ,N\). Let \({\widetilde{\mathbf {X}}}=({\widetilde{\varvec{X}}}_1,\ldots , {\widetilde{\varvec{X}}}_N)\) be its column-wise standardization (3). Here, the test statistic (5) becomes

$$\begin{aligned} M_{2}&= N\int \Bigg (\frac{1}{N^2}\sum _{j,k}\cos \big (a_1\tilde{X}_{j,1}+a_2\tilde{X}_{j,2} +b_1\tilde{X}_{k,1}+b_2\tilde{X}_{k,2}\big )-e^{-\frac{1}{2}}\Bigg )^2\\&\quad +\Bigg (\frac{1}{N^2}\sum _{j,k}\sin \big (a_1\tilde{X}_{j,1}+a_2\tilde{X}_{j,2} +b_1\tilde{X}_{k,1}+b_2\tilde{X}_{k,2}\big )\Bigg )^2dS_{4}(a,b), \end{aligned}$$

where \(a=(a_1,a_2)^{T }\) and \(b=(b_1,b_2)^{T }\) such that \(\langle a,a\rangle +\langle b,b\rangle =1\).

It is possible to elaborate the expression above further, as in the univariate case; however, the resulting integrals do not have a convenient form, and from the computational point of view the double sums are significantly faster than the quadruple sums that would otherwise be obtained. The integral over the sphere can be computed efficiently using the SphericalCubature package for R (Nolan 2021).
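Alternatively, the Monte Carlo sphere sampling sketched in Sect. 3 applies verbatim, e.g.:

```r
set.seed(1)
X <- matrix(rnorm(100 * 2), 100, 2)  # N = 100 bivariate observations
M_stat(X)                            # approximates M_2 for testing H_0^(2)
```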

In Tables 3 and 4 we present the powers of the new test. As competitors we chose the KS2 test, initially proposed in Koziol (1979), with the data-driven parameter selection introduced in Kallenberg et al. (1997), as well as the bivariate versions of the BHEP and HJG\(_\beta \) tests.

The reason for choosing the KS2 test is that it is, so far, the only test in the literature designed to test for bivariate normality and independence. In Kallenberg et al. (1997) it is shown to outperform the Kolmogorov–Smirnov and Hoeffding tests in most cases. The other tests are chosen because they are powerful tests for bivariate normality based on the characteristic and moment generating functions; when applied to the column-wise standardized sample, they are suitable for testing normality and independence. The set of alternatives is taken from Kallenberg et al. (1997) for some choices of distribution parameters, and is given below. Unless stated otherwise, all distributions are defined for \(x_i\in \mathbb {R}\), \(i=1,2\). Distributions derived from the bivariate normal distribution inherit its parameter space \(\mu _i\in \mathbb {R},\sigma _i>0\), \(i=1,2,\) \(\rho \in [-1,1]\).

  • a bivariate normal distribution BivN(\(\mu _1,\mu _2,\sigma _1,\sigma _2,\rho \)) with density

    $$\begin{aligned}&g_1(x_1,x_2;\mu _1,\mu _2,\sigma _1,\sigma _2,\rho )\\&\quad =\frac{1}{2\pi \sigma _1\sigma _2\sqrt{1-\rho ^2}}\;\;e^{-\frac{1}{2(1-\rho ^2)}\big (\frac{(x_1-\mu _1)^2}{\sigma _1^2}+\frac{(x_2-\mu _2)^2}{\sigma _2^2}-\frac{2\rho (x_1-\mu _1)(x_2-\mu _2)}{\sigma _1\sigma _2}\big )}, \end{aligned}$$
  • a mixture of bivariate normal distributions NMixA(\(\rho \)) with density

    $$\begin{aligned} g_2(x_1,x_2,\rho )=\frac{1}{2} g_1(x_1,x_2;0,0,1,1,\rho )+\frac{1}{2} g_1(x_1,x_2;1,1,1,1,0.9); \end{aligned}$$
  • a mixture of bivariate normal distributions NMixB(\(\rho \)) with density

    $$\begin{aligned} g_3(x_1,x_2,\rho )=\frac{1}{2} g_1(x_1,x_2;0,0,1,1,\rho )+\frac{1}{2} g_1(x_1,x_2;0,0,1,1,-\rho ); \end{aligned}$$
  • a bivariate lognormal distribution LogN(\(\sigma _1,\sigma _2,\rho \)) with density

    $$\begin{aligned} g_4(x_1,x_2;\sigma _1,\sigma _2,\rho )=\frac{b_1b_2}{(b_1x_1+a_1)(b_2x_2+a_2)}g_1(l_1,l_2;0,0,\sigma _1,\sigma _2,\rho ),\;\;x_i>-\frac{a_i}{b_i}, \end{aligned}$$

    where \(l_i=\log (b_ix_i+a_i)\), \(a_i=e^{\sigma _i^2/2}\), \(b_i=\sqrt{e^{2\sigma _i^2}-e^{\sigma _i^2}}\), \(i=1,2\).

  • a sinh\(^{-1}\)-normal distribution sinh\(^{-1}\)N(\(\mu _1,\mu _2,\sigma _1,\sigma _2,\rho \)) with density

    $$\begin{aligned} g_5(x_1,x_2;\mu _1,\mu _2,\sigma _1,\sigma _2,\rho )=&\frac{b_1b_2(w_1+\sqrt{1+w_1^2})(w_2+\sqrt{1+w_2^2})}{(1+w_1^2+w_1\sqrt{1+w_1^2})(1+w_2^2+w_2\sqrt{1+w_2^2})}\\ {}&\times g_1(\sinh ^{-1}(w_1),\sinh ^{-1}(w_2);\mu _1,\mu _2,\sigma _1,\sigma _2,\rho ), \end{aligned}$$

    where \(w_i=b_ix_i+a_i\), \(a_i=e^{\sigma _i^2/2}\sinh (\mu _i)\), \(b_i=\sqrt{(e^{\sigma _i^2}-1)(e^{\sigma _i^2}\cosh (2\mu _i)+1)}\), \(i=1,2\).

  • a generalized Burr-Pareto-logistic distribution GBPL(\(\alpha ,\beta \)) with standard normal marginals, with density

    $$\begin{aligned} g_6(x_1,x_2;\alpha ,\beta )=&\frac{(\alpha +1)\varphi (x_1)\varphi (x_2)}{\alpha \Phi (x_1)\Phi (x_2)}\bigg (\frac{1+\beta }{\big ((\Phi (x_1))^{-\frac{1}{\alpha }}+(\Phi (x_2))^{-\frac{1}{\alpha }}-1\big )^{\alpha +2}}\\&+\frac{4\beta }{\big (2(\Phi (x_1))^{-\frac{1}{\alpha }}+2(\Phi (x_2))^{-\frac{1}{\alpha }}-3\big )^{\alpha +2}}\\&-\frac{2\beta }{\big (2(\Phi (x_1))^{-\frac{1}{\alpha }}+(\Phi (x_2))^{-\frac{1}{\alpha }}-2\big )^{\alpha +2}}\\&-\frac{2\beta }{\big ((\Phi (x_1))^{-\frac{1}{\alpha }}+2(\Phi (x_2))^{-\frac{1}{\alpha }}-2\big )^{\alpha +2}}\bigg ),\; \alpha >0, \beta \in [-1,1], \end{aligned}$$

    where \(\Phi (x)\) and \(\varphi (x)\) are the standard normal distribution function and density;

  • a Morgenstern distribution Morg(\(\alpha \)), with standard normal marginals, with density

    $$\begin{aligned} g_7(x_1,x_2;\alpha )=\varphi (x_1)\varphi (x_2)\Big (1+\alpha \big (2\Phi (x_1)-1\big )\big (2\Phi (x_2)-1\big )\Big ), \;\alpha \in [-1,1]; \end{aligned}$$
  • a Pearson type VII distribution PearVII(\(\alpha \)) with density

    $$\begin{aligned} g_8(x_1,x_2;\alpha )=\frac{\alpha }{2\pi } \Big (1 + \frac{1}{2}(x_1^2+x_2^2)\Big )^{-(\alpha + 1)},\;\; \alpha >0. \end{aligned}$$

Methods of generating random variates from these distributions are available in Johnson (1987) and Cook and Johnson (1986).

Table 3 Powers of tests for bivariate normality and independence—Part I
Table 4 Powers of tests for bivariate normality and independence—Part II

From Tables 3 and 4 we can see that our new test is the most powerful against the bivariate normal, normal mixture (except for \(\rho =\pm 0.5\) in the case of mixture B), GBPL and Morgenstern alternatives. Against the sinh\(^{-1}\) alternative it is the best for \(\rho =\pm 0.5\), while for the lognormal alternative it is second best after BHEP. It is interesting to note that in the case of normal mixture B our test performs better relative to the others as \(\rho \) gets closer to zero, while in the case of the sinh\(^{-1}\) alternative it is the other way around, i.e. it performs better when \(\rho \) is far from zero.

Consider now testing for trivariate normality and independence.

In Table 5 we present the powers of our tests, as well as those of the BHEP and HJG\(_\beta \) tests. We performed testing against trivariate variants of some of the alternatives considered in the bivariate case. The labels are self-explanatory and the densities can easily be derived from their bivariate counterparts, with the possible exception of the trivariate Morgenstern distribution, which can be found in Ota and Kimura (2021).

Table 5 Powers of tests for trivariate normality and independence

The conclusions are broadly similar to the bivariate case for the alternatives considered. It is worth mentioning that in the case of the Morgenstern alternative our test significantly outperforms all competitors.

4.3 Testing for multivariate normality

Statistical tests for multivariate normality are usually applied to the following standardization of the multivariate simple random sample \(\mathbf {X}\). Let \({\check{\mathbf{X}}}=\widehat{\Sigma }^{-\frac{1}{2}}(\mathbf {X}-\hat{\mu }),\) where \(\widehat{\Sigma }^{-\frac{1}{2}}\) is the unique symmetric positive definite square root of the inverse of the sample covariance matrix \(\widehat{\Sigma }\).

Our test is no exception: if we apply it to \({\check{\mathbf{X}}}\) instead of \({\widetilde{\mathbf {X}}}\), we can use it to test for multivariate normality.

It is easy to see that, as in the original case, the test statistic is affine invariant under the null hypothesis of multivariate normality, and hence the null distribution can be approximated using Monte Carlo methods.
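A sketch of this standardization via the spectral decomposition of \(\widehat{\Sigma }\), which yields the symmetric square root (the function name is ours):

```r
# Multivariate standardization: centre the rows of X and multiply by the
# symmetric square root of the inverse sample covariance matrix.
standardize_mv <- function(X) {
  X  <- as.matrix(X)
  Xc <- sweep(X, 2, colMeans(X))       # centre the columns
  e  <- eigen(cov(X), symmetric = TRUE)
  S_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
  Xc %*% S_inv_sqrt                    # rows now have identity sample covariance
}

set.seed(1)
round(cov(standardize_mv(matrix(rnorm(200 * 3), 200, 3))), 10)  # ~ identity
```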

In Tables 6 and 7 we present the powers of our tests for bivariate and trivariate normality, respectively. We also present the powers of the corresponding BHEP and \(\hbox {HJG}_{\beta }\) tests as competitors. The alternatives are chosen from the lists in Madukaife (2021) and Ebner and Henze (2020). The labels are self-explanatory and '\(\times \)' signifies that the components are independent.

From the tables we can see that no test is uniformly the best. Our test is never the least powerful; for most alternatives it is in the middle, and for a couple of alternatives it is the most powerful. All the other tests have cases in which they are the best as well as cases in which they are the worst. It is interesting to note that in the case of the bivariate beta(0.5,0.5) alternative both our test and the HJG tests have powers below the level of significance for \(n=50\). However, the power of our test exceeds this level for \(n=100\), while the HJG powers remain the same.

Table 6 Power comparison of tests for bivariate normality
Table 7 Power comparison of tests for trivariate normality

5 Applications

In this section we consider applications of our tests. We put emphasis on the normality and independence null hypothesis, as situations where one tests only for univariate or multivariate normality are quite common.

5.1 Model specification tests

In multivariate regression and time series analysis it is often the case that model errors and innovations are assumed to have a multivariate normal distribution with independent components. Such models are proposed and/or discussed in Breiman and Friedman (1997), Zivot and Wang (2006), and Francq and Zakoian (2019). Our test for normality and independence can then be used as a model specification test. The errors are unobservable, and hence the test must be applied to the model residuals, which do not form an i.i.d. sample. Demonstrating the testing procedure would thus require some modification of the test in order to obtain p-values, which is beyond the scope of this paper but worth considering for future research.

Other important applications, worth considering in the future, include novelty detection methods based on orthogonal expansion representation of random elements from a certain Hilbert space (see Rafajłowicz 2021); cluster analysis with finite Gaussian mixtures (see Pan and Shen 2007; Yeung et al. 2001); and image processing (see Rafajłowicz and Rafajłowicz 2010; Rafajłowicz and Wietrzych 2010).

5.2 Random matrix

In probability theory and mathematical physics, a random matrix is a matrix-valued random variable, i.e., a matrix in which some or all elements are random variables. Many important properties of physical systems can be represented mathematically as matrix problems. For example, the thermal conductivity of a lattice can be computed from the dynamical matrix of the particle–particle interactions within the lattice. One of the applications is to approximate a covariance matrix, provided the underlying distribution is normal. Below we give two different representations connected with the \(\mathcal {H}^{(m)}_0\) hypothesis in the context of the matrix \({{\mathbf {X}}}^T {{\mathbf {X}}}\).

Let \(\Sigma \) be a positive definite, symmetric \(m \times m\) matrix, and let \({{\mathbf {X}}}^T\) be a random \(m \times N\) matrix as in Eq. (2), whose columns are independent, identically distributed random vectors, each with the multivariate Gaussian distribution \(N(0,\Sigma )\).

5.2.1 Wishart distribution

Assume that \(N \ge m\). Then the random matrix \( S = {{\mathbf {X}}}^T{{\mathbf {X}}}\) has the Wishart distribution \(W(\Sigma ,m,N)\), with density

$$\begin{aligned} {\nu }(dS)=\omega (m,N)\det (\Sigma ^{-1}S)^{N/2}\exp \Big (-\frac{1}{2}tr(\Sigma ^{-1}S)\Big ), \end{aligned}$$

for an appropriate normalizing constant \(\omega (m,N)\). Under the null hypothesis \(\mathcal {H}^{(m)}_0\) the matrix \({{\mathbf {X}}}^T {{\mathbf {X}}}\) has the Wishart distribution \(W(I,m,N)\), where I is the \(m \times m\) identity matrix.
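Under \(\mathcal {H}^{(m)}_0\) (after standardization, so that \(\Sigma =I\)) the matrix \({{\mathbf {X}}}^T{{\mathbf {X}}}\) can be simulated either directly or with the base R generator rWishart; a small sketch comparing the two (dimensions are our choice):

```r
set.seed(1)
m <- 3; N <- 50
X  <- matrix(rnorm(N * m), N, m)      # columns of X^T are iid N(0, I)
S1 <- crossprod(X)                    # X^T X, distributed as W(I, m, N)
S2 <- rWishart(1, df = N, Sigma = diag(m))[, , 1]   # draw from W(I, m, N)
c(mean(diag(S1)), mean(diag(S2)))     # both diagonals have expectation N
```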

5.2.2 Marchenko–Pastur distribution

The limit of the empirical spectral measure of Wishart matrices under the \(\mathcal {H}^{(m)}_0\) assumption was found by Vladimir Marchenko and Leonid Pastur in Marchenko and Pastur (1967). Assume that \(\Sigma =I\), i.e. the entries of \({{\mathbf {X}}}\) are i.i.d. \(N(0,1)\). If \(m,N \rightarrow \infty \) in such a way that \(m/N \rightarrow \lambda \in (0, 1),\) then the empirical spectral distribution of \(N^{-1}{{\mathbf {X}}}^T {{\mathbf {X}}}\) converges weakly to the Marchenko–Pastur distribution with density (with respect to the Lebesgue measure)

$$\begin{aligned} {\nu }(dx)=\frac{1}{2\pi \lambda x}\,\sqrt{4\lambda -(x-(1+\lambda ))^2}\,dx, \end{aligned}$$

supported on the interval \(((1-\sqrt{\lambda })^2,\,(1+\sqrt{\lambda })^2)\).
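A sketch visualizing the convergence: the eigenvalues of \(N^{-1}{{\mathbf {X}}}^T {{\mathbf {X}}}\) for i.i.d. \(N(0,1)\) entries, plotted against the Marchenko–Pastur density (the dimensions are our choice):

```r
set.seed(1)
N <- 2000; m <- 500; lambda <- m / N   # aspect ratio lambda = 0.25
X  <- matrix(rnorm(N * m), N, m)
ev <- eigen(crossprod(X) / N, symmetric = TRUE, only.values = TRUE)$values

mp_density <- function(x, l)           # Marchenko-Pastur density on its support
  sqrt(pmax(4 * l - (x - (1 + l))^2, 0)) / (2 * pi * l * x)

hist(ev, breaks = 50, freq = FALSE, main = "Marchenko-Pastur law")
curve(mp_density(x, lambda), add = TRUE, col = "red", lwd = 2)
```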

5.3 Real data example

As an illustrative example we consider data from Stevens (2012) that originally appeared in Ambrose (1985). The data in Table 8 represent average ratings on three performance aspects (rhythm, intonation and tempo) of two groups of elementary school children who received clarinet instruction. The experimental group (1) had programmed instruction, while the control group (2) had traditional classroom instruction.

Table 8 Performance aspects ratings

Suppose a researcher wants to examine the effects of instruction on these performance aspects. If the aspects are normally distributed and independent random variables, a separate univariate analysis of variance for each variable is recommended. However, if this is not the case, the researcher should either use a nonparametric procedure (in the case of non-normality) or a multivariate analysis of variance (in the presence of correlation). Hence, our test is handy for making a decision about the appropriate procedure.

Applying the test to the data from Table 8, we get p-values of 0.12 (group 1) and 0.19 (group 2), and therefore the recommendation for the researcher is to examine the effects using univariate analyses of variance.