Tests for multivariate normality based on canonical correlations

Abstract

We propose new affine invariant tests for multivariate normality, based on independence characterizations of the sample moments of the normal distribution. The test statistics are obtained using canonical correlations between sets of sample moments in a way that resembles the construction of Mardia’s skewness measure and generalizes the Lin–Mudholkar test for univariate normality. The tests are compared to some popular tests based on Mardia’s skewness and kurtosis measures in an extensive simulation power study and are found to offer higher power against many of the alternatives.

References

  • Bartlett MS (1939) A note on tests of significance in multivariate analysis. Math Proc Camb Philos Soc 35:180–185

  • Cerioli A, Farcomeni A, Riani M (2013) Robust distances for outlier-free goodness-of-fit testing. Comput Stat Data Anal 65:29–45

  • Doornik JA, Hansen H (2008) An omnibus test for univariate and multivariate normality. Oxf Bull Econ Stat 70:927–939

  • Dubkov AA, Malakhov AN (1976) Properties and interdependence of the cumulants of a random variable. Radiophys Quantum Electron 19:833–839

  • Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188

  • Henderson HV, Searle SR (1979) Vec and vech operators for matrices, with some uses in Jacobians and multivariate statistics. Can J Stat 7:65–81

  • Henze N (2002) Invariant tests for multivariate normality: a critical review. Stat Pap 43:467–506

  • Kankainen A, Taskinen S, Oja H (2007) Tests of multinormality based on location vectors and scatter matrices. Stat Methods Appl 16:357–359

  • Kaplan EL (1952) Tensor notation and the sampling cumulants of k-statistics. Biometrika 39:319–323

  • Kollo T (2002) Multivariate skewness and kurtosis measures with an application in ICA. J Multivar Anal 99:2328–2338

  • Kollo T, von Rosen D (2005) Advanced multivariate statistics with matrices. Springer, Berlin. ISBN 978-1-4020-3418-3

  • Kotz S, Kozubowski TJ, Podgórski K (2000) An asymmetric multivariate Laplace distribution, Technical Report No. 367, Department of Statistics and Applied Probability, University of California at Santa Barbara

  • Kshirsagar AM (1972) Multivariate analysis. Marcel Dekker, ISBN 0-8247-1386-9

  • Lin C-C, Mudholkar GS (1980) A simple test for normality against asymmetric alternatives. Biometrika 67:455–461

  • Mardia KV (1970) Measures of multivariate skewness and kurtosis with applications. Biometrika 57:519–530

  • Mardia KV (1974) Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya Indian J Stat 36:115–128

  • Mardia KV, Kent JT (1991) Rao score tests for goodness of fit and independence. Biometrika 78:355–363

  • Mardia KV, Kent JT, Bibby JM (1979) Multivariate analysis. Academic Press, ISBN 0-12-471250-9

  • McCullagh P (1987) Tensor methods in statistics. University Press, ISBN 0-412-27480-9

  • Mecklin CJ, Mundfrom DJ (2004) An appraisal and bibliography of tests for multivariate normality. Int Stat Rev 72:123–128

  • Mecklin CJ, Mundfrom DJ (2005) A Monte Carlo comparison of the type I and type II error rates of tests of multivariate normality. J Stat Comput Simul 75:93–107

  • Mudholkar GS, Marchetti CE, Lin CT (2002) Independence characterizations and testing normality against restricted skewness–kurtosis alternatives. J Stat Plan Inference 104:485–501

  • Stehlík M, Fabián Z, Střelec L (2012) Small sample robust testing for normality against Pareto tails. Commun Stat Simul Comput 41:1167–1194

  • Stehlík M, Střelec L, Thulin M (2014) On robust testing for normality in chemometrics. Chemom Intell Lab Syst 130:98–109

  • Thulin M (2010) On two simple tests for normality with high power. Pre-print, arXiv:1008.5319

Acknowledgments

The author wishes to thank the editor and two anonymous referees for comments that helped improve the paper, and Silvelyn Zwanzig for several helpful suggestions.

Author information

Corresponding author

Correspondence to Måns Thulin.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 131 KB)

Supplementary material 2 (pdf 736 KB)

Appendix: proofs and tables

For the proofs of Theorems 3 and 4 we need some basic properties of the Kronecker product \(\otimes \) and the \(\mathrm{vech}\) and \(\mathrm{vec}\) operators from Henderson and Searle (1979). See also Kollo and von Rosen (2005) and Kollo (2002) for more on these tools from matrix algebra.

For a \(p\times q\) matrix \({\varvec{{A}}}=\{a_{ij}\}\) and an \(r\times s\) matrix \({\varvec{{B}}}\), the Kronecker product \({\varvec{{A}}}\otimes {\varvec{{B}}}\) is the \(pr\times qs\) matrix \(\{a_{ij}{\varvec{{B}}}\}\), \(i=1,\ldots ,p\), \(j=1,\ldots ,q\). The \(\mathrm{vec}\) operator stacks the columns of a matrix underneath each other, forming a single vector. If the columns of the \(p\times q\) matrix \({\varvec{{A}}}\) are denoted \({\varvec{{a_1}}},\ldots ,{\varvec{{a_q}}}\), then \(\mathrm{vec}({\varvec{{A}}})=({\varvec{{a_1'}}},\ldots ,{\varvec{{a_q'}}})'\) is a vector of length \(pq\).

We will use that

$$\begin{aligned} ({\varvec{{A}}}\otimes {\varvec{{B}}})({\varvec{{C}}}\otimes {\varvec{{D}}})={\varvec{{AC}}}\otimes {\varvec{{BD}}},\qquad ({\varvec{{A}}}\otimes {\varvec{{B}}})'={\varvec{{A'}}}\otimes {\varvec{{B'}}} \end{aligned}$$

and that if \({\varvec{{A}}}\) is a \(p\times p\) matrix and \({\varvec{{B}}}\) a \(q\times q\) matrix,

$$\begin{aligned} \det ({\varvec{{A}}}\otimes {\varvec{{B}}})=\det ({\varvec{{A}}})^q\det ({\varvec{{B}}})^p. \end{aligned}$$
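The Kronecker product rules above are easy to check numerically. The following is a minimal sketch using NumPy's `np.kron` on random matrices; the dimensions, the seed and all variable names are illustrative choices and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 2
# Random square matrices; A and C are p x p, B and D are q x q.
A = rng.standard_normal((p, p))
B = rng.standard_normal((q, q))
C = rng.standard_normal((p, p))
D = rng.standard_normal((q, q))

# Mixed-product rule: (A kron B)(C kron D) = (AC) kron (BD).
mixed_lhs = np.kron(A, B) @ np.kron(C, D)
mixed_rhs = np.kron(A @ C, B @ D)

# Transpose rule: (A kron B)' = A' kron B'.
transpose_lhs = np.kron(A, B).T
transpose_rhs = np.kron(A.T, B.T)

# Determinant rule for A (p x p) and B (q x q):
# det(A kron B) = det(A)^q * det(B)^p.
det_lhs = np.linalg.det(np.kron(A, B))
det_rhs = np.linalg.det(A) ** q * np.linalg.det(B) ** p

print(np.allclose(mixed_lhs, mixed_rhs))
print(np.allclose(transpose_lhs, transpose_rhs))
print(np.isclose(det_lhs, det_rhs))
```

Note that the exponents in the determinant rule are swapped relative to the dimensions: the determinant of the \(p\times p\) factor is raised to the power \(q\) and vice versa.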

The \(\mathrm{vech}\) operator works like the \(\mathrm{vec}\) operator, except that it contains each distinct element of a symmetric matrix only once. For a symmetric matrix \({\varvec{{A}}}\), \(\mathrm{vech}({\varvec{{A}}})\) thus contains only the diagonal and the elements above the diagonal, whereas \(\mathrm{vec}({\varvec{{A}}})\) contains each diagonal element once and each off-diagonal element twice.

We have the following relationship between the \(\mathrm{vec}\) operator and the Kronecker product:

$$\begin{aligned} \mathrm{vec}({\varvec{{ABC}}})=({\varvec{{C'}}}\otimes {\varvec{{A}}})\mathrm{vec}({\varvec{{B}}}). \end{aligned}$$
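The identity can be illustrated with conformable rectangular matrices. In this sketch the helper `vec` is a hypothetical name for column stacking (column-major reshaping in NumPy), and the matrix sizes are arbitrary.

```python
import numpy as np

def vec(M):
    """Stack the columns of M underneath each other (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

lhs = vec(A @ B @ C)            # vec(ABC), length 2 * 5 = 10
rhs = np.kron(C.T, A) @ vec(B)  # (C' kron A) vec(B), a (10 x 12) times (12,) product

print(np.allclose(lhs, rhs))
```

The column-major order matters here: with NumPy's default row-major flattening the roles of the two Kronecker factors would be interchanged.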

Furthermore, for a given symmetric \(p\times p\) matrix \({\varvec{{A}}}\) there exist a \(p(p+1)/2\times p^2\) matrix \({\varvec{{H}}}\) and a \(p^2\times p(p+1)/2\) matrix \({\varvec{{G}}}\) such that

$$\begin{aligned} \mathrm{vech}({\varvec{{A}}})={\varvec{{H}}}\mathrm{vec}({\varvec{{A}}})\qquad \text{ and } \qquad \mathrm{vec}({\varvec{{A}}})={\varvec{{G}}}\mathrm{vech}({\varvec{{A}}}). \end{aligned}$$
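One concrete way to build such matrices is sketched below. The function names `elimination_matrix` (for \({\varvec{{H}}}\)) and `duplication_matrix` (for \({\varvec{{G}}}\)) are hypothetical, and the \({\varvec{{H}}}\) constructed here is only one of several valid choices. The helper `vech` stacks the columns of the lower triangle, which for a symmetric matrix gives the same ordering as reading the upper triangle row by row.

```python
import numpy as np

def vec(M):
    """Stack the columns of M (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

def vech(M):
    """Stack the on-and-below-diagonal part of each column of M."""
    M = np.asarray(M)
    return np.concatenate([M[j:, j] for j in range(M.shape[0])])

def vech_index(i, j, p):
    """Position of element (i, j) of a symmetric p x p matrix in vech."""
    i, j = max(i, j), min(i, j)
    return j * p - j * (j - 1) // 2 + (i - j)

def duplication_matrix(p):
    """G of size p^2 x p(p+1)/2 with vec(A) = G vech(A) for symmetric A."""
    G = np.zeros((p * p, p * (p + 1) // 2))
    for j in range(p):
        for i in range(p):
            G[j * p + i, vech_index(i, j, p)] = 1.0
    return G

def elimination_matrix(p):
    """H of size p(p+1)/2 x p^2 with vech(A) = H vec(A)."""
    H = np.zeros((p * (p + 1) // 2, p * p))
    for j in range(p):
        for i in range(j, p):
            H[vech_index(i, j, p), j * p + i] = 1.0
    return H

p = 3
rng = np.random.default_rng(2)
M = rng.standard_normal((p, p))
S = M @ M.T  # an arbitrary symmetric matrix
H, G = elimination_matrix(p), duplication_matrix(p)

print(H.shape, G.shape)                  # (6, 9) (9, 6)
print(np.allclose(H @ vec(S), vech(S)))
print(np.allclose(G @ vech(S), vec(S)))
```

Both matrices are rectangular, so neither is invertible in the usual sense, but with this construction \({\varvec{{HG}}}\) is the identity matrix of order \(p(p+1)/2\).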

As a preparation for the proof of Theorem 3, we prove the following auxiliary lemma.

Lemma 1

Assume that \({\varvec{{X}}},{\varvec{{X_1}}}, \ldots , {\varvec{{X_n}}}\) are i.i.d. \(p\)-variate random variables fulfilling the conditions of Theorem 1. Let \(S_{ij}=(n-1)^{-1}\sum _{k=1}^n(X_{k,i}-\bar{X}_i)(X_{k,j}-\bar{X}_j)\) be the elements of the sample covariance matrix \({\varvec{{S}}}\). Then

$$\begin{aligned} {\varvec{{u_X}}}=(S_{11},S_{12},\ldots ,S_{1p},S_{22},S_{23},\ldots ,S_{2p},S_{33},\ldots ,S_{p-1,p},S_{pp})'=\mathrm{vech}({\varvec{{S}}}) \end{aligned}$$

is a vector with \(q=p(p+1)/2\) distinct elements. Denote its covariance matrix \(\mathrm{Cov}({\varvec{{u_X}}})={\varvec{{\Lambda _{22}}}}\).

Let \({\varvec{{A}}}\) be a nonsingular \(p\times p\) matrix and let \({\varvec{{b}}}\) be a \(p\)-dimensional vector. Then there exists a nonsingular \(q\times q\) matrix \({\varvec{{D}}}\) such that

  1. (i)

    the sample variances and covariances of \({\varvec{{Y}}}={\varvec{{AX}}}+{\varvec{{b}}}\) are given by \({\varvec{{u_Y}}}={\varvec{{Du_X}}}\),

  2. (ii)

    \(\mathrm{Cov}({\varvec{{u_Y}}})={\varvec{{D\Lambda _{22}D'}}}\) and

  3. (iii)

    \(\det ({\varvec{{D}}})=\det ({\varvec{{A}}})^{p+1}\).

Proof

The transformed sample \({\varvec{{AX}}}+{\varvec{{b}}}\) has sample covariance matrix \({\varvec{{ASA'}}}\), so we wish to study \(\mathrm{vech}({\varvec{{ASA'}}})\). We have

$$\begin{aligned} \mathrm{vec}({\varvec{{ASA'}}})=({\varvec{{A}}}\otimes {\varvec{{A}}})\mathrm{vec}({\varvec{{S}}}). \end{aligned}$$

Moreover, since \({\varvec{{S}}}\) is symmetric there exist matrices \({\varvec{{G}}}\) and \({\varvec{{H}}}\), of dimensions \(p^2\times q\) and \(q\times p^2\) respectively, such that

$$\begin{aligned} \mathrm{vec}({\varvec{{S}}})={\varvec{{G}}}\mathrm{vech}({\varvec{{S}}})\qquad \text{ and } \qquad \mathrm{vech}({\varvec{{S}}})={\varvec{{H}}}\mathrm{vec}({\varvec{{S}}}). \end{aligned}$$

Thus

$$\begin{aligned} {\varvec{{u_Y}}}=\mathrm{vech}({\varvec{{ASA'}}})={\varvec{{H}}}({\varvec{{A}}}\otimes {\varvec{{A}}}){\varvec{{G}}}\mathrm{vech}({{\varvec{{S}}}})=:{\varvec{{D}}}{\varvec{{u_X}}}, \end{aligned}$$

which establishes the existence of \({\varvec{{D}}}\). From Section 4.2 of Henderson and Searle (1979) we have

$$\begin{aligned} \det ({\varvec{{D}}})=\det ({\varvec{{H}}}({\varvec{{A}}}\otimes {\varvec{{A}}}){\varvec{{G}}})=\det ({\varvec{{A}}})^{p+1} \end{aligned}$$

which is nonzero, since \({\varvec{{A}}}\) is nonsingular. \({\varvec{{D}}}\) is hence also nonsingular. In conclusion, we have established the existence and nonsingularity of \({\varvec{{D}}}\) as well as (i) and (iii). Finally, (ii) follows immediately from (i). \(\square \)
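The statements of Lemma 1 can be checked on simulated data. The sketch below builds \({\varvec{{D}}}={\varvec{{H}}}({\varvec{{A}}}\otimes {\varvec{{A}}}){\varvec{{G}}}\) with one particular (assumed) choice of \({\varvec{{H}}}\) and \({\varvec{{G}}}\); the helper names, the sample size, the seed and the way \({\varvec{{A}}}\) is generated are illustrative assumptions.

```python
import numpy as np

def vech(M):
    """Stack the on-and-below-diagonal part of each column of M."""
    M = np.asarray(M)
    return np.concatenate([M[j:, j] for j in range(M.shape[0])])

def vech_index(i, j, p):
    """Position of element (i, j) of a symmetric p x p matrix in vech."""
    i, j = max(i, j), min(i, j)
    return j * p - j * (j - 1) // 2 + (i - j)

def duplication_matrix(p):
    """G with vec(S) = G vech(S) for symmetric S."""
    G = np.zeros((p * p, p * (p + 1) // 2))
    for j in range(p):
        for i in range(p):
            G[j * p + i, vech_index(i, j, p)] = 1.0
    return G

def elimination_matrix(p):
    """H with vech(S) = H vec(S)."""
    H = np.zeros((p * (p + 1) // 2, p * p))
    for j in range(p):
        for i in range(j, p):
            H[vech_index(i, j, p), j * p + i] = 1.0
    return H

rng = np.random.default_rng(3)
n, p = 50, 3
X = rng.standard_normal((n, p))                   # rows are observations
A = rng.standard_normal((p, p)) + 3 * np.eye(p)   # shifted to keep A well away from singular
b = rng.standard_normal(p)
Y = X @ A.T + b                                   # row k is A x_k + b

# Sample covariance matrices with divisor n - 1, as in the lemma.
u_X = vech(np.cov(X, rowvar=False))
u_Y = vech(np.cov(Y, rowvar=False))

D = elimination_matrix(p) @ np.kron(A, A) @ duplication_matrix(p)

print(np.allclose(u_Y, D @ u_X))                                      # part (i)
print(np.isclose(np.linalg.det(D), np.linalg.det(A) ** (p + 1)))      # part (iii)
```

For \(p=2\) the same construction gives a \(3\times 3\) matrix \({\varvec{{D}}}\) whose determinant is \(\det ({\varvec{{A}}})^3\).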

We now have the tools necessary to tackle Theorem 3.

Proof of Theorem 3

  1. (i)

    From Theorem 10.2.4 in Mardia et al. (1979) we have that the canonical correlations between the random vectors \({\varvec{{Y}}}\) and \({\varvec{{Z}}}\) are invariant under the nonsingular linear transformations \({\varvec{{AY}}}+{\varvec{{b}}}\) and \({\varvec{{CZ}}}+{\varvec{{d}}}\). Clearly all five statistics are invariant under changes in location, since \({\varvec{{S_{11}}}}\), \({\varvec{{S_{22}}}}\), \({\varvec{{S_{12}}}}\) and \({\varvec{{S_{21}}}}\) all share that invariance property. It therefore suffices to show that the nonsingular linear transformation \({\varvec{{AX}}}\) induces nonsingular linear transformations \({\varvec{{C\bar{X}}}}\) and \({\varvec{{Du}}}\). \({\varvec{{C}}}={\varvec{{A}}}\) is immediate and the existence of \({\varvec{{D}}}\) is given by Lemma 1.

  2. (ii)

    By part (ii) of Theorem 1, \(\mu _{ijk}=0\) for all \(i,j,k\) implies that \({\varvec{{\Lambda }}}_{12}={\varvec{{0}}}\). But then \({\varvec{{\Lambda _{11}}}}^{-1}{\varvec{{\Lambda _{12}}}}{\varvec{{\Lambda _{22}}}}^{-1}{\varvec{{\Lambda _{21}}}}={\varvec{{0}}}\) and all canonical correlations are 0. If \(\mu _{ijk}\ne 0\) then \(\rho (\bar{X}_i,S_{jk})\ne 0\). Thus the linear combinations \({\varvec{{a'\bar{X}}}}=\bar{X}_i\) and \({\varvec{{b'u}}}=S_{jk}\) have nonzero correlation. \(\lambda _1\) must therefore be greater than 0.

  3. (iii)

    Follows from the fact that the statistics are continuous functions of sample moments that converge almost surely.\(\square \)
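The invariance property used in part (i) can be illustrated for generic data. The sketch below computes squared sample canonical correlations between two arbitrary sets of variables as the eigenvalues of \({\varvec{{S_{11}}}}^{-1}{\varvec{{S_{12}}}}{\varvec{{S_{22}}}}^{-1}{\varvec{{S_{21}}}}\) and compares them before and after nonsingular affine transformations. It is not an implementation of the test statistics themselves, which are built from \({\varvec{{\bar{X}}}}\) and \({\varvec{{u}}}\); the function name, the simulated data and the transformation matrices are assumptions made for illustration.

```python
import numpy as np

def squared_canonical_correlations(Y, Z):
    """Eigenvalues of S11^{-1} S12 S22^{-1} S21, in decreasing order."""
    k = Y.shape[1]
    S = np.cov(np.hstack([Y, Z]), rowvar=False)
    S11, S12 = S[:k, :k], S[:k, k:]
    S21, S22 = S[k:, :k], S[k:, k:]
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
    # The eigenvalues are real in exact arithmetic; drop rounding noise.
    return np.sort(np.linalg.eigvals(M).real)[::-1]

rng = np.random.default_rng(4)
n = 200
Y = rng.standard_normal((n, 2))
Z = Y @ rng.standard_normal((2, 3)) + rng.standard_normal((n, 3))  # Z depends on Y

# Nonsingular affine transformations AY + b and CZ + d.
A = rng.standard_normal((2, 2)) + 3 * np.eye(2)
C = rng.standard_normal((3, 3)) + 3 * np.eye(3)
b = rng.standard_normal(2)
d = rng.standard_normal(3)

rho2_before = squared_canonical_correlations(Y, Z)
rho2_after = squared_canonical_correlations(Y @ A.T + b, Z @ C.T + d)

print(rho2_before)
print(rho2_after)
print(np.allclose(rho2_before, rho2_after))
```

The two printed vectors should agree up to rounding error, since the transformation only changes the matrix product above by a similarity transformation.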

The proofs of parts (ii) and (iii) of Theorem 4 are analogous to the previous proof. The proof of part (i) is, however, slightly different, since we do not explicitly give a matrix that yields a nonsingular linear transformation of \({\varvec{{v_X}}}\).

Proof of Theorem 4

(i) Let the third order central moment of a multivariate random variable \({\varvec{{Z}}}\) be

$$\begin{aligned} \bar{m}_3({\varvec{{Z}}})&= { E }\left[ ({\varvec{{Z}}}-{ E }{\varvec{{Z}}})\otimes ({\varvec{{Z}}}-{ E }{\varvec{{Z}}})'\otimes ({\varvec{{Z}}}-{ E }{\varvec{{Z}}}) \right] '\nonumber \\&= { E }\left[ ({\varvec{{Z}}}-{ E }{\varvec{{Z}}})\left( ({\varvec{{Z}}}-{ E }{\varvec{{Z}}})\otimes ({\varvec{{Z}}}-{ E }{\varvec{{Z}}})\right) '\right] . \end{aligned}$$

Given a sample \({\varvec{{X}}}_1,\ldots ,{\varvec{{X}}}_n\), let \(S_{ijk}=\frac{n}{(n-1)(n-2)}\sum _{r=1}^n(X_{r,i}-\bar{X}_i)(X_{r,j}-\bar{X}_j)(X_{r,k}-\bar{X}_k)\). When the distribution of \({\varvec{{Z}}}\) is the empirical distribution of said sample,

$$\begin{aligned} {\varvec{{v_X}}}=(S_{111},S_{112},\ldots ,S_{pp(p-1)},S_{ppp})'=\frac{n^2}{(n-1)(n-2)}\mathrm{vech}\left( \bar{m}_3({\varvec{{Z}}})\right) . \end{aligned}$$

Similarly \(\mathrm{vec}\left( \bar{m}_3({\varvec{{Z}}})\right) \) stacks the elements of \(\bar{m}_3({\varvec{{Z}}})\) in a vector that simply is \(\mathrm{vech}\left( \bar{m}_3({\varvec{{Z}}})\right) \) with a few repetitions:

$$\begin{aligned} {\varvec{{w_X}}}=(S_{111},S_{112},\ldots ,S_{112}\ldots ,S_{pp(p-1)},S_{ppp})'=\frac{n^2}{(n-1)(n-2)}\mathrm{vec}\left( \bar{m}_3({\varvec{{Z}}})\right) . \end{aligned}$$

Thus, for each linear combination \({\varvec{{a'}}}{\varvec{{w_X}}}\) there exists a \({\varvec{{b}}}\) so that \({\varvec{{b'}}}{\varvec{{v_X}}}={\varvec{{a'}}}{\varvec{{w_X}}}\) and therefore, by the definition of canonical correlations, the (sample) canonical correlations between \({\varvec{{\bar{X}}}}\) and \({\varvec{{v_X}}}\) are the same as those between \({\varvec{{\bar{X}}}}\) and \({\varvec{{w_X}}}\).

Writing \({\varvec{{Y}}}={\varvec{{Z}}}-{ E }{\varvec{{Z}}}\), we have \(\bar{m}_3({\varvec{{Z}}})={ E }\left( {\varvec{{Y}}}({\varvec{{Y}}}\otimes {\varvec{{Y}}})'\right) \) and

$$\begin{aligned} \bar{m}_3({\varvec{{AZ}}})&= { E }\left( {\varvec{{AY}}}({\varvec{{AY}}}\otimes {\varvec{{AY}}})'\right) ={ E }\left( {\varvec{{AY}}}({\varvec{{Y}}}\otimes {\varvec{{Y}}})'({\varvec{{A}}}\otimes {\varvec{{A}}})'\right) \nonumber \\&= {\varvec{{A}}}\bar{m}_3({\varvec{{Z}}})({\varvec{{A}}}\otimes {\varvec{{A}}})'. \end{aligned}$$

Hence

$$\begin{aligned} \mathrm{vec}\left( \bar{m}_3({\varvec{{AZ}}})\right) =({\varvec{{A}}}\otimes {\varvec{{A}}}\otimes {\varvec{{A}}})\mathrm{vec}\left( \bar{m}_3({\varvec{{Z}}})\right) . \end{aligned}$$

Now, \(\det ({\varvec{{A}}}\otimes {\varvec{{A}}}\otimes {\varvec{{A}}})=\det ({\varvec{{A}}}\otimes {\varvec{{A}}})^p\det ({\varvec{{A}}})^{p^2}=\det ({\varvec{{A}}})^{3p^2}\ne 0\), so \({\varvec{{E}}}:=({\varvec{{A}}}\otimes {\varvec{{A}}}\otimes {\varvec{{A}}})\) is a nonsingular matrix such that \(\mathrm{vec}\left( \bar{m}_3({\varvec{{AZ}}})\right) ={\varvec{{E}}}\,\mathrm{vec}\left( \bar{m}_3({\varvec{{Z}}})\right) \). Since canonical correlations are invariant under nonsingular linear transformations of the two sets of variables, this means that the canonical correlations between \({\varvec{{\bar{X}}}}\) and \({\varvec{{w_X}}}\) remain unchanged under the transformation \({\varvec{{AX}}}+{\varvec{{b}}}\). Thus the canonical correlations between \({\varvec{{\bar{X}}}}\) and \({\varvec{{v_X}}}\) must also remain unchanged. This proves the affine invariance of the statistics. \(\square \)
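The transformation rule for the third order central moment can also be verified numerically. The sketch below computes the empirical version of \(\bar{m}_3\) for a simulated sample and checks both the matrix form and the vectorized form of the rule, together with the determinant of the triple Kronecker product (on the log scale, to avoid overflow or underflow). Function names, the sample and the matrix \({\varvec{{A}}}\) are illustrative assumptions.

```python
import numpy as np

def vec(M):
    """Stack the columns of M (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

def third_central_moment(Z):
    """Empirical E[Y (Y kron Y)'] with Y = Z - mean(Z); a p x p^2 matrix."""
    Y = Z - Z.mean(axis=0)
    n, p = Y.shape
    # Row r of YY holds (Y_r kron Y_r)', i.e. all products Y_ri * Y_rj.
    YY = np.einsum("ri,rj->rij", Y, Y).reshape(n, p * p)
    return Y.T @ YY / n

rng = np.random.default_rng(5)
n, p = 60, 3
Z = rng.exponential(size=(n, p))                  # a skewed sample
A = rng.standard_normal((p, p)) + 3 * np.eye(p)   # shifted to keep A well away from singular
b = rng.standard_normal(p)

m3_Z = third_central_moment(Z)
m3_AZ = third_central_moment(Z @ A.T + b)         # the shift b drops out after centering

E = np.kron(np.kron(A, A), A)                     # p^3 x p^3

print(np.allclose(m3_AZ, A @ m3_Z @ np.kron(A, A).T))   # matrix form
print(np.allclose(vec(m3_AZ), E @ vec(m3_Z)))           # vectorized form

# det(A kron A kron A) = det(A)^(3 p^2), compared via sign and log|det|.
sign_E, logdet_E = np.linalg.slogdet(E)
sign_A, logdet_A = np.linalg.slogdet(A)
print(np.isclose(logdet_E, 3 * p**2 * logdet_A), sign_E == sign_A ** (3 * p**2))
```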

About this article

Cite this article

Thulin, M. Tests for multivariate normality based on canonical correlations. Stat Methods Appl 23, 189–208 (2014). https://doi.org/10.1007/s10260-013-0252-5

  • DOI: https://doi.org/10.1007/s10260-013-0252-5

Keywords

  • Goodness-of-fit
  • Kurtosis
  • Multivariate normality
  • Skewness
  • Test for normality