1 Introduction

Classical multivariate statistics is based on a normally distributed population. In practice the normality assumption is often violated and more general classes of distributions are of interest. Continuous elliptical distributions and skew-elliptical distributions are usually the first choices for data modeling. The multivariate t-distribution, \(t_{p,\nu }\), is of special interest because of two important properties: tail dependence and heavier tails than those of the normal distribution. When the number of degrees of freedom \(\nu \) tends to infinity, the \(t_{p,\nu }\)-distribution converges to the normal distribution with the same parameters. Thus, we have a direct generalization of the normal population. A wide range of applications of the multivariate \(t_{p,\nu }\)-distribution can be found in (Kotz and Nadarajah 2004, Ch. 12), for instance. Among many recent reports we refer to Osorio et al. (2023) for applications in meteorology, Finegold and Drton (2011) for gene analysis, and Kan and Zhou (2017) and Galea et al. (2020) for economic and financial applications. For risk estimation in portfolio theory, the multivariate t-distribution is especially advocated in Lauprete et al. (2002) with reference to the t-copula.

Before applying specific methods of multivariate analysis it is important first to examine the structure of the covariance matrix \(\varvec{\Sigma }\). Further analyses will be much simpler if a specific covariance structure can be taken into account. Traditionally, tests have been developed for simple structures: the identity \(\varvec{\Sigma }={\textbf{I}}_p\), sphericity \(\varvec{\Sigma }=\sigma ^2{\textbf{I}}_p\), uncorrelatedness \(\varvec{\Sigma }=\varvec{\Lambda }\) with \(\varvec{\Lambda }\) diagonal, and the intraclass correlation structure \(\varvec{\Sigma }=\sigma ^2[(1-\rho ){\textbf{I}}_p+\rho {\textbf{1}}_p{\textbf{1}}_p']\), where \({\textbf{1}}_p'=(1,1,\ldots ,1)\) is a p-dimensional row vector of ones and \({\textbf{I}}_p\) is the \(p\times p\) identity matrix. In more complex situations, e.g. when analysing spatio-temporal data, a Kronecker product structure is present (Srivastava et al. 2008, 2009; Filipiak and Klein 2017, for example). When the population distribution is elliptical, the covariance matrix is a product of a univariate multiplier characterizing the distribution and a scale matrix, also denoted by \(\varvec{\Sigma }\). We shall derive test statistics for the parameter \(\varvec{\Sigma }\) under the null hypothesis \(\varvec{\Sigma }=\varvec{\Sigma }_0\), where \(\varvec{\Sigma }_0\) is a specified matrix.

Probably the most commonly used test is the likelihood ratio test (LRT) under the assumption of normality of the population (Anderson 2003; Bilodeau and Brenner 1999, for example). A profound study of basic LRTs concerning covariance structures under normality is presented in Muirhead (2005, Ch. 8). However, it is known that when the number of parameters to be tested is large, the LRT will almost always reject the null hypothesis. To overcome this problem, corrections to the test have been made so that it can also be used in a high-dimensional setup; see Bai et al. (2009), for example. In practice the uncorrected test is still used. In Kollo et al. (2016) it is shown that, instead of the LRT or the Wald test (WT), the more consistent Rao score test (RST) should be used to test a particular covariance structure under normality. In this paper we examine these three tests for a multivariate \(t_{p,\nu }\)-distributed population. Since one of our goals is to investigate possible differences in the maximum likelihood estimates (MLEs) between t-distributed and normal populations, we focus on cases where the number of degrees of freedom, \(\nu \), is known and as small as possible. Our second goal is to examine the speed of convergence of the considered test statistics when \(\nu \) grows and to study the power of the tests (by simulations); the assumption of fixed \(\nu \) allows us to speed up the simulations, especially for higher values of the dimension and sample size. Nevertheless, the formulas for the test statistics under unknown \(\nu \) will also be presented, together with their distributions, which are compared to the respective distributions under known \(\nu \).

In Sect. 2 notation and required notions are explained. In Sect. 3, for fixed \(\nu \), we derive equations for finding the MLEs and present, in Proposition 1, the LRT statistic. In Sect. 4 we find the score vector and information matrix for a \(t_{p,\nu }\)-distributed population, and in Proposition 2 the RST statistic is presented using trace functions. In Sect. 5 the WT statistic is derived and presented in Proposition 3. A modification, \(\hbox {WT}^*\), of the Wald statistic is also introduced. In Sect. 6 convergence to the limiting chi-square distribution is examined in simulation experiments for all test statistics. Convergence is also examined in the situation when the MLEs of a \(t_{p,\nu }\)-distribution are replaced by their approximations, the MLEs of the corresponding normal population. In the simulation studies, empirical type I errors and powers of the tests are also calculated. In Sect. 7, assuming unknown \(\nu \), we present the relevant test statistics and show that their distributions are similar to those of the test statistics derived in Sects. 3–5. Finally, we summarize the results in Sect. 8.

2 Notation and notions

Derivations in this paper utilize a matrix technique based on vectorization, the Kronecker product, and the matrix derivative. For deeper insight into this technique the interested reader is referred to Magnus and Neudecker (2019), Harville (1997), or Kollo and von Rosen (2005). The following properties of the “vec” operator, which transforms a matrix into a vector by stacking the columns one under the other, are frequently used:

$$\begin{aligned} \begin{array}{ccc} \mathrm {vec\,}({\textbf{A}}{\textbf{B}}{\textbf{C}})=({\textbf{C}}^\prime \otimes {\textbf{A}})\mathrm {vec\,}{\textbf{B}},\\ \mathrm {vec\,}^\prime {\textbf{A}}\mathrm {vec\,}{\textbf{B}}=\textrm{tr}({\textbf{A}}^\prime {\textbf{B}}),\\ \mathrm {vec\,}({\textbf{a}}{\textbf{b}}^\prime )={\textbf{b}}\otimes {\textbf{a}},\\ \end{array} \end{aligned}$$

where “\(\textrm{tr}\)” denotes the trace operator. Matrices are denoted with capital letters and vectors with lowercase letters in bold.
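
These identities are easy to verify numerically. The following minimal sketch (Python with NumPy, our illustrative choice rather than the software used in the paper) checks all three on randomly generated matrices; column-major ("F") reshaping matches the column-stacking definition of vec.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((3, 4)), rng.standard_normal((4, 2)), rng.standard_normal((2, 5))

def vec(M):
    # stack the columns of M one under the other
    return M.reshape(-1, 1, order="F")

# vec(ABC) = (C' (x) A) vec B
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# vec'(A) vec(B) = tr(A'B) for equally sized A and B
A2, B2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
assert np.allclose(vec(A2).T @ vec(B2), np.trace(A2.T @ B2))

# vec(ab') = b (x) a
a, b = rng.standard_normal((3, 1)), rng.standard_normal((4, 1))
assert np.allclose(vec(a @ b.T), np.kron(b, a))
```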

Later we shall use matrix derivatives repeatedly, and the definition of Neudecker (1969) is applied.

Definition 1

Let the elements of \({\textbf{Y}}\in {\mathbb {R}}^{r \times s}\) be functions of \({\textbf{X}}\in {\mathbb {R}}^{p\times q}\) with non-constant and functionally independent elements \(x_{ij}\). The matrix \(\frac{d{\textbf{Y}}}{d{\textbf{X}}} \in {\mathbb {R}}^{rs \times pq}\) is called the matrix derivative of \({\textbf{Y}}\) with respect to \({\textbf{X}}\) in a set A if the partial derivatives \(\frac{\partial y_{kl}}{\partial x_{ij}}\) exist and are continuous in A, and

$$\begin{aligned} \frac{d{\textbf{Y}}}{d{\textbf{X}}}=\frac{d }{d \mathrm {vec\,}^\prime {\textbf{X}}}\otimes \mathrm {vec\,}{\textbf{Y}}\end{aligned}$$

where

$$\begin{aligned} \frac{d }{d \mathrm {vec\,}^\prime {\textbf{X}}}=\left( \frac{\partial }{\partial x_{11}},\dots ,\frac{\partial }{\partial x_{p1}},\frac{\partial }{\partial x_{12}},\dots ,\frac{\partial }{\partial x_{p2}},\dots ,\frac{\partial }{\partial x_{1q}},\dots ,\frac{\partial }{\partial x_{pq}}\right) \end{aligned}$$

and \(\mathrm {vec\,}(\cdot )\) is the vectorization operator.

Definitions of the derivative of a scalar, vector, and matrix function with respect to a matrix \({\textbf{X}}\) can also be found in a recent paper by Liu et al. (2023), where an insightful overview of matrix-oriented results and their applications in statistics is presented.

From the basic properties of the matrix derivative (Magnus and Neudecker 2019) we obtain the differentiation rule for a composite function:

$$\begin{aligned} \textrm{when} \quad {\textbf{Z}}= {\textbf{Z}}({\textbf{Y}}),\quad {\textbf{Y}}={\textbf{Y}}({\textbf{X}})\quad \textrm{then} \quad \frac{d {\textbf{Z}}}{d {\textbf{X}}} = \frac{d {\textbf{Z}}}{d {\textbf{Y}}} \frac{d {\textbf{Y}}}{d {\textbf{X}}}, \end{aligned}$$

and the derivatives of the determinant and the inverse of a square matrix \({\textbf{X}}\):

$$\begin{aligned} \frac{d | {\textbf{X}}|}{d {\textbf{X}}}=| {\textbf{X}}| \mathrm {vec\,}^\prime ({\textbf{X}}^{-1})^\prime \quad \textrm{and }\quad \frac{d {\textbf{X}}^{-1}}{d {\textbf{X}}}= -({\textbf{X}}^{-1})^\prime \otimes {\textbf{X}}^{-1}, \end{aligned}$$

where \(|\cdot |\) denotes the determinant.
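
Both derivative formulas can be checked by forward finite differences, laid out as in Definition 1 (rows index \(\mathrm {vec\,}{\textbf{Y}}\), columns index \(\mathrm {vec\,}^\prime {\textbf{X}}\)). A minimal sketch; the step size and the well-conditioned test matrix are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
p, h = 3, 1e-6
X = rng.standard_normal((p, p)) + 5 * np.eye(p)   # keep X well-conditioned

vec = lambda M: M.reshape(-1, order="F")

def num_deriv(F, X):
    # finite-difference matrix derivative of F at X, as in Definition 1
    f0 = vec(F(X))
    D = np.empty((f0.size, X.size))
    for j in range(X.size):
        E = np.zeros(X.size); E[j] = h                     # perturb x_{ij} in vec order
        D[:, j] = (vec(F(X + E.reshape(X.shape, order="F"))) - f0) / h
    return D

Xinv = np.linalg.inv(X)
# d|X|/dX = |X| vec'((X^{-1})')
assert np.allclose(num_deriv(lambda M: np.linalg.det(M) * np.ones((1, 1)), X),
                   np.linalg.det(X) * vec(Xinv.T)[None, :], atol=1e-3)
# dX^{-1}/dX = -(X^{-1})' (x) X^{-1}
assert np.allclose(num_deriv(np.linalg.inv, X), -np.kron(Xinv.T, Xinv), atol=1e-3)
```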

Note that for a symmetric matrix \({\textbf{X}}={\textbf{X}}^\prime :p\times p\), the derivative is computed with respect to the vectorized lower triangle of \({\textbf{X}}\), denoted by \(\mathrm {vech\,}{\textbf{X}}\), instead of \(\mathrm {vec\,}{\textbf{X}}\), which results in an \(rs\times p(p+1)/2\) matrix. To avoid misunderstanding, we denote the derivative with respect to \(\mathrm {vech\,}{\textbf{X}}\) by adding a superscript to \({\textbf{X}}\), i.e., \(\frac{d{\textbf{Y}}}{d{\textbf{X}}^{\Delta }}=\frac{d\mathrm {vec\,}{\textbf{Y}}}{d\mathrm {vech\,}^\prime {\textbf{X}}}\). Then, using the chain rule described above,

$$\begin{aligned} \frac{d{\textbf{Y}}}{d{\textbf{X}}^{\Delta }}=\frac{d\mathrm {vec\,}{\textbf{Y}}}{d\mathrm {vec\,}^\prime {\textbf{X}}}\cdot \frac{d\mathrm {vec\,}{\textbf{X}}}{d\mathrm {vech\,}^\prime {\textbf{X}}}= \frac{d\mathrm {vec\,}{\textbf{Y}}}{d\mathrm {vec\,}^\prime {\textbf{X}}} \cdot {\textbf{D}}_p, \end{aligned}$$

where \({\textbf{D}}_p:p^2\times \frac{1}{2}p(p+1)\) is a duplication matrix that transforms \(\mathrm {vech\,}{\textbf{A}}\) into \(\mathrm {vec\,}{\textbf{A}}\), i.e.,

$$\begin{aligned} {\textbf{D}}_p\mathrm {vech\,}{\textbf{A}}=\mathrm {vec\,}{\textbf{A}}; \end{aligned}$$

cf. Magnus and Neudecker (1986), Filipiak et al. (2016).
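
A minimal construction of \({\textbf{D}}_p\), assuming the usual column-wise ordering of \(\mathrm {vech\,}\); the final assertion checks the defining relation \({\textbf{D}}_p\mathrm {vech\,}{\textbf{A}}=\mathrm {vec\,}{\textbf{A}}\).

```python
import numpy as np

def duplication_matrix(p):
    # D_p maps vech A (lower triangle, column-wise) to vec A for symmetric A
    D = np.zeros((p * p, p * (p + 1) // 2))
    col = 0
    for j in range(p):           # column index of A
        for i in range(j, p):    # row index, lower triangle
            D[i + j * p, col] = 1.0
            D[j + i * p, col] = 1.0   # mirror entry (same position when i == j)
            col += 1
    return D

def vech(A):
    # stack the columns of the lower triangle of A
    p = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(p)])

p = 3
A = np.arange(1., 10.).reshape(p, p); A = (A + A.T) / 2   # symmetric test matrix
D = duplication_matrix(p)
assert np.allclose(D @ vech(A), A.reshape(-1, order="F"))  # D_p vech A = vec A
```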

Let \({\textbf{x}}\) be a continuous random vector with distribution \(P_{{\textbf{x}}}(\varvec{\theta })\) and density function \(f_{{\textbf{x}}}({\textbf{x}},\varvec{\theta })\), where \(\varvec{\theta }\) is a vector of unknown parameters.

Definition 2

The score vector of a random vector \({\textbf{x}}\) is given by the matrix derivative

$$\begin{aligned} {\textbf{u}}({\textbf{x}},\varvec{\theta })=\left( \frac{d}{d\varvec{\theta }} \ln f({\textbf{x}},\varvec{\theta })\right) ^\prime . \end{aligned}$$

Definition 3

The information matrix of a random vector \({\textbf{x}}\) is the covariance matrix of the score vector \({\textbf{u}}({\textbf{x}},\varvec{\theta })\):

$$\begin{aligned} \textrm{I}({\textbf{x}},\varvec{\theta })={\mathbb {D}}({\textbf{u}}({\textbf{x}},\varvec{\theta }))={\mathbb {E}}({\textbf{u}}({\textbf{x}},\varvec{\theta }){\textbf{u}}^\prime ({\textbf{x}},\varvec{\theta })). \end{aligned}$$

Let \({\textbf{X}}=({\textbf{x}}_1,\dots ,{\textbf{x}}_n)\) denote a random sample from the distribution \(P_{{\textbf{x}}}(\varvec{\theta })\). Then the log-likelihood function of the sample is given by \(\ell (\varvec{\theta },{\textbf{X}})=\sum \limits _{i=1}^n \ln f({\textbf{x}}_i,\varvec{\theta })\) and the score function of the sample by \({\textbf{u}}({\textbf{X}},\varvec{\theta })=\sum \limits _{i=1}^n {\textbf{u}}({\textbf{x}}_i,\varvec{\theta })\). The information matrix of the sample is given by \(\textrm{I}({\textbf{X}},\varvec{\theta })=n \cdot \textrm{I}({\textbf{x}},\varvec{\theta })\).

Definition 4

Let a random p-vector \({\textbf{y}}\) be normally distributed, \({\textbf{y}}\sim N_p({\textbf{0}},\varvec{\Sigma })\), and let \(Z^2\sim \chi ^2_{\nu }\) be independent of \({\textbf{y}}\). Then

$$\begin{aligned} {\textbf{x}}=\frac{\sqrt{\nu }}{Z}{\textbf{y}}+\varvec{\mu }\end{aligned}$$

is multivariate \(t_{p,\nu }\)-distributed with parameters \(\varvec{\mu }\) and \(\varvec{\Sigma }\), i.e., \({\textbf{x}}\sim t_{p,\nu }(\varvec{\mu }, \varvec{\Sigma })\).
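
Definition 4 translates directly into a sampling routine: divide a normal vector by an independent \(\sqrt{Z^2/\nu }\) and shift by \(\varvec{\mu }\). A minimal sketch; the printed empirical covariance should be close to \(\tfrac{\nu }{\nu -2}\varvec{\Sigma }\), in line with the moment formulas given below.

```python
import numpy as np

def rmvt(n, mu, Sigma, nu, rng):
    # draw n observations x = sqrt(nu)/Z * y + mu, y ~ N_p(0, Sigma), Z^2 ~ chi^2_nu
    p = len(mu)
    y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    z2 = rng.chisquare(nu, size=n)                 # independent of y
    return mu + y * np.sqrt(nu / z2)[:, None]

rng = np.random.default_rng(2)
nu, mu, Sigma = 5, np.zeros(3), np.eye(3)
X = rmvt(10_000, mu, Sigma, nu, rng)
print(np.cov(X, rowvar=False).round(2))            # ~ nu/(nu-2) * Sigma for nu > 2
```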

Remark 1

When \(\varvec{\mu }={\textbf{0}}\) and, instead of \(\varvec{\Sigma }\), a correlation matrix \({\textbf{R}}\) is the parameter of the normal distribution, i.e., \(N_p({\textbf{0}}, {\textbf{R}})\), the univariate case yields the standard t-distribution. In general, Definition 4 yields a member of the location-scale family of the univariate t-distribution.

The density function of \({\textbf{x}}\sim t_{p,\nu }(\varvec{\mu },\varvec{\Sigma })\) is given by

$$\begin{aligned} f_{\nu }({\textbf{x}},\varvec{\mu },\varvec{\Sigma })=c_p|\varvec{\Sigma }|^{-\frac{1}{2}} \big [1+\tfrac{1}{\nu } ({\textbf{x}}-\varvec{\mu })^\prime \varvec{\Sigma }^{-1}({\textbf{x}}-\varvec{\mu }) \big ]^{-\frac{\nu +p}{2}}, \end{aligned}$$
(1)

where \(c_p=\frac{\Gamma \left( (\nu +p)/2\right) }{(\pi \nu )^{p/2}\Gamma \left( \nu /2\right) }\) (cf. Kotz and Nadarajah 2004, Ch. 5), and the first two moments of \({\textbf{x}}\) are

$$\begin{aligned} \begin{array}{rcl} {\mathbb {E}}{\textbf{x}}&{}=&{}\varvec{\mu },\\ {\mathbb {D}}{\textbf{x}}&{}=&{}\tfrac{\nu }{\nu -2}\varvec{\Sigma }, \quad \nu >2; \end{array} \end{aligned}$$

cf. (Muirhead 2005, p. 48).

Recall that for \(\nu \rightarrow \infty \) the multivariate \(t_{p,\nu }\)-distribution tends to the multivariate normal distribution, for which obviously

$$\begin{aligned} {\mathbb {E}}{\textbf{x}}=\varvec{\mu }, \qquad {\mathbb {D}}{\textbf{x}}=\varvec{\Sigma }, \end{aligned}$$

and the maximum likelihood estimators (MLEs) of \(\varvec{\mu }\) and \(\varvec{\Sigma }\) are

$$\begin{aligned} \begin{array}{l} \displaystyle \widehat{\varvec{\mu }}= \tfrac{1}{n}\sum _{i=1}^n{\textbf{x}}_i=\tfrac{1}{n}{\textbf{X}}{\textbf{1}}_n=\overline{{\textbf{x}}}, \\ \displaystyle \widehat{\varvec{\Sigma }}=\tfrac{1}{n}\sum _{i=1}^n({\textbf{x}}_i-\overline{{\textbf{x}}})({\textbf{x}}_i-\overline{{\textbf{x}}}) ^\prime =\tfrac{1}{n}{\textbf{X}}({\textbf{I}}_n-\tfrac{1}{n}{\textbf{1}}_n{\textbf{1}}^\prime _n){\textbf{X}}^\prime ={\textbf{S}}\end{array} \end{aligned}$$
(2)

with \({\textbf{X}}= ({\textbf{x}}_1, \dots , {\textbf{x}}_n)\) being a sample from \(N_p(\varvec{\mu },\varvec{\Sigma })\) and \({\textbf{1}}_n\) being an n-dimensional vector of ones.
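
In the sketches below we store a sample as a \(p\times n\) matrix with observations in columns. Under this layout, a minimal implementation of the closed-form normal MLEs (2), used later as plug-in approximations, is:

```python
import numpy as np

def normal_mle(X):
    # X: p x n sample matrix, observations in columns
    n = X.shape[1]
    xbar = X.mean(axis=1, keepdims=True)
    S = (X - xbar) @ (X - xbar).T / n   # note the 1/n convention of (2), not 1/(n-1)
    return xbar, S
```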

It is worth noting that the results presented in Sects. 3–6 are obtained under the assumption that \(\nu \) is known, while in Sect. 7 \(\nu \) is assumed to be unknown.

3 Likelihood ratio test

For testing the hypothesis

$$\begin{aligned} {\left\{ \begin{array}{ll} \textrm{H}_0: \ \varvec{\theta }= \varvec{\theta }_0 \\ \textrm{H}_1: \ \varvec{\theta }\ne \varvec{\theta }_0 \end{array}\right. } \end{aligned}$$
(3)

the LRT statistic in logarithmic form is

$$\begin{aligned} \begin{array}{rcl} \textrm{LRT}({\textbf{X}},\varvec{\theta }_0) &{}&{}= \displaystyle - 2 \ln \left( \frac{L(\varvec{\theta }_0,{\textbf{X}})}{\max _{\varvec{\theta }} L(\varvec{\theta },{\textbf{X}})} \right) =-2 [\ln L(\varvec{\theta }_0,{\textbf{X}})-\ln L(\widehat{\varvec{\theta }},{\textbf{X}})]\\ &{}&{} =-2[\ell (\varvec{\theta }_0,{\textbf{X}})-\ell (\widehat{\varvec{\theta }},{\textbf{X}})], \end{array} \end{aligned}$$

where \({\textbf{X}}=({\textbf{x}}_1,\dots ,{\textbf{x}}_n)\) is a random sample from \(P_{{\textbf{x}}}(\varvec{\theta })\), \(\varvec{\theta }=(\theta _1,\dots ,\theta _r)^\prime \) is a vector of unknown parameters, \(\widehat{\varvec{\theta }}\) is the MLE of \(\varvec{\theta }\), and \(L(\varvec{\theta }_0,{\textbf{X}})\) and \(L(\widehat{\varvec{\theta }},{\textbf{X}})\) (\(\ell (\varvec{\theta }_0,{\textbf{X}})\) and \(\ell (\widehat{\varvec{\theta }},{\textbf{X}})\)) are the likelihood (log-likelihood) functions of the vector of parameters under the null and alternative hypotheses, respectively. When the sample size \(n\rightarrow \infty \) and \(\textrm{H}_0\) holds, the distribution of \(\textrm{LRT}({\textbf{X}},\varvec{\theta }_0)\) converges to \(\chi ^2_r\).

Assume now that \(\varvec{\theta }=(\varvec{\theta }_{r-k},\varvec{\theta }_k)^\prime \), where \(\varvec{\theta }_{r-k}\) is the set of \((r-k)\) fixed parameters under the null hypothesis, i.e., \(\textrm{H}_0: \ \varvec{\theta }_{r-k}=\varvec{\theta }_0\). Then the LRT in logarithmic form has the representation

$$\begin{aligned} \begin{array}{rcl} \textrm{LRT}({\textbf{X}},\varvec{\theta }_0) &{}=&{} \displaystyle - 2 \ln \left( \frac{\max _{\varvec{\theta }_k}L((\varvec{\theta }_0,\varvec{\theta }_k)^\prime ,{\textbf{X}})}{\max _{\varvec{\theta }} L(\varvec{\theta },{\textbf{X}})} \right) \\ &{}=&{}-2 [\ell ((\varvec{\theta }_0,\widehat{\varvec{\theta }}_k)^\prime ,{\textbf{X}})-\ell (\widehat{\varvec{\theta }}, {\textbf{X}})], \\ \end{array} \end{aligned}$$

where \(\widehat{\varvec{\theta }}_k\) is the MLE of the k non-fixed parameters under the null hypothesis. The distribution of \(\textrm{LRT}({\textbf{X}},\varvec{\theta }_0)\) converges to \(\chi ^2_{r-k}\) when the sample size \(n\rightarrow \infty \) and \(\textrm{H}_0\) holds; cf. (Rao 1973, Sect. 6e).

Let \({\textbf{x}}\sim t_{p,\nu }(\varvec{\mu },\varvec{\Sigma })\), with \(\varvec{\mu }=(\mu _1,\dots ,\mu _p)^\prime \), the scale parameter \(\varvec{\Sigma }> 0 : p \times p\) and let \(\varvec{\theta }=(\varvec{\mu }^\prime ,\mathrm {vech\,}^\prime \varvec{\Sigma })^\prime \). Due to (1), the likelihood function \(L(\varvec{\theta },{\textbf{X}})\) is

$$\begin{aligned} L(\varvec{\theta }, {\textbf{X}})=(c_p)^n|\varvec{\Sigma }|^{-\frac{n}{2}}\prod _{i=1}^n\big [1+\tfrac{1}{\nu }({\textbf{x}}_i-\varvec{\mu })^\prime \varvec{\Sigma }^{-1}({\textbf{x}}_i-\varvec{\mu })\big ]^{-\frac{\nu +p}{2}}, \end{aligned}$$

and the log-likelihood function can be presented as

$$\begin{aligned} \ell (\varvec{\theta },{\textbf{X}})=n\ln c_p-\tfrac{n}{2}\ln |\varvec{\Sigma }|-\tfrac{\nu +p}{2}\sum _{i=1}^n\ln \left[ 1+\tfrac{1}{\nu }({\textbf{x}}_i-\varvec{\mu })^\prime \varvec{\Sigma }^{-1}({\textbf{x}}_i-\varvec{\mu })\right] . \end{aligned}$$
(4)

Differentiating (4) with respect to \(\varvec{\mu }\) and \(\varvec{\Sigma }\) gives the following partial derivatives:

$$\begin{aligned} \begin{array}{lcl} \displaystyle \frac{\partial \ell }{\partial \varvec{\mu }}&{}=&{} \displaystyle \tfrac{\nu +p}{\nu }\sum _{i=1}^n\frac{({\textbf{x}}_i-\varvec{\mu })^\prime \varvec{\Sigma }^{-1}}{1+\frac{1}{\nu }({\textbf{x}}_i-\varvec{\mu })^\prime \varvec{\Sigma }^{-1}({\textbf{x}}_i-\varvec{\mu })}, \\ \displaystyle \frac{\partial \ell }{\partial \varvec{\Sigma }}&{}=&{} \displaystyle \left\{ -\tfrac{n}{2}\mathrm {vec\,}^\prime \varvec{\Sigma }^{-1}+\tfrac{\nu +p}{2\nu }\sum _{i=1}^{n}\frac{({\textbf{x}}_i-\varvec{\mu })^\prime \otimes ({\textbf{x}}_i-\varvec{\mu })^\prime }{1+\frac{1}{\nu }({\textbf{x}}_i-\varvec{\mu })^\prime \varvec{\Sigma }^{-1}({\textbf{x}}_i-\varvec{\mu })}\left( \varvec{\Sigma }^{-1}\otimes \varvec{\Sigma }^{-1}\right) \right\} {\textbf{D}}_p \end{array} \end{aligned}$$
(5)

with \({\textbf{D}}_{p}\) being the respective duplication matrix. Setting the derivatives in (5) equal to zero, we find that the MLEs of \(\varvec{\mu }\) and \(\varvec{\Sigma }\) must satisfy the following system of equations:

$$\begin{aligned} \left\{ \begin{array}{lcl} \displaystyle \varvec{\mu }&{}=&{} \displaystyle \sum _{i=1}^n\frac{{\textbf{x}}_i}{t_i} \Big / \sum _{i=1}^n\frac{1}{t_i} \\ \displaystyle \varvec{\Sigma }&{}=&{} \displaystyle \frac{\nu +p}{n\nu }\sum _{i=1}^n\frac{({\textbf{x}}_i-\varvec{\mu })({\textbf{x}}_i-\varvec{\mu })^\prime }{t_i} \end{array} \right. \end{aligned}$$

with \(t_i=1+\frac{1}{\nu }({\textbf{x}}_i-\varvec{\mu })^\prime \varvec{\Sigma }^{-1}({\textbf{x}}_i-\varvec{\mu })\). Since \(t_i\) depends on \(\varvec{\mu }\) and \(\varvec{\Sigma }\), the system has no explicit solution, but it can be solved numerically in the following way:

  • fix starting values \(\varvec{\mu }^{(0)}\) and \(\varvec{\Sigma }^{(0)}\),

  • for \(k=1,2,\dots \), update the following system

    $$\begin{aligned} \begin{array}{lcl} t_i^{(k-1)} &{}=&{} 1+\frac{1}{\nu }({\textbf{x}}_i-\varvec{\mu }^{(k-1)})^\prime \left( \varvec{\Sigma }^{(k-1)}\right) ^{-1}({\textbf{x}}_i-\varvec{\mu }^{(k-1)}), \quad i=1,\dots ,n \\ \varvec{\mu }^{(k)} &{}=&{} \displaystyle \sum _{i=1}^n\frac{{\textbf{x}}_i}{t_i^{(k-1)}} \Big / \sum _{i=1}^n\frac{1}{t_i^{(k-1)}} \\ \varvec{\Sigma }^{(k)} &{}=&{} \displaystyle \frac{\nu +p}{n\nu }\sum _{i=1}^n\frac{({\textbf{x}}_i-\varvec{\mu }^{(k)})({\textbf{x}}_i-\varvec{\mu }^{(k)})^\prime }{t_i^{(k-1)}} \end{array} \end{aligned}$$
    (6)

    until the convergence criterion is satisfied.

In our considerations the algorithm stops when \(||\varvec{\mu }^{(k)}-\varvec{\mu }^{(k-1)}||\le 10^{-6}\) and \(||\varvec{\Sigma }^{(k)}-\varvec{\Sigma }^{(k-1)}||_F^2\le 10^{-6}\), where \(||\cdot ||_F\) denotes the Frobenius norm. To the best of our knowledge there are no recommendations regarding the choice of the starting point. In our research we choose \(\varvec{\mu }^{(0)}={\textbf{0}}\) and \(\varvec{\Sigma }^{(0)}={\textbf{I}}_p\). Throughout the paper we denote the solutions obtained by \(\widehat{\varvec{\mu }}\) and \(\widehat{\varvec{\Sigma }}\).
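
For illustration, a minimal Python sketch of iteration (6) with the starting values and stopping rule described above; the function name t_mle and the cap on the number of iterations are our choices, not part of the algorithm.

```python
import numpy as np

def t_mle(X, nu, tol=1e-6, max_iter=500):
    # X: p x n matrix of observations in columns; nu > 2 known
    p, n = X.shape
    mu, Sigma = np.zeros((p, 1)), np.eye(p)    # starting values used in the paper
    for _ in range(max_iter):
        R = X - mu                             # centred data
        # t_i = 1 + (x_i - mu)' Sigma^{-1} (x_i - mu) / nu
        t = 1 + np.einsum('ij,ij->j', R, np.linalg.solve(Sigma, R)) / nu
        w = 1.0 / t                            # weights 1/t_i^{(k-1)}
        mu_new = (X @ w)[:, None] / w.sum()    # mu^{(k)}
        R_new = X - mu_new
        Sigma_new = (nu + p) / (n * nu) * (R_new * w) @ R_new.T   # Sigma^{(k)}
        if (np.linalg.norm(mu_new - mu) <= tol
                and np.linalg.norm(Sigma_new - Sigma, 'fro')**2 <= tol):
            return mu_new, Sigma_new
        mu, Sigma = mu_new, Sigma_new
    return mu, Sigma
```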

Remark 2

In applications with missing data, the likelihood equations derived from (5) are solved using the EM algorithm; cf. Liu and Rubin (1995), McLachlan and Krishnan (1997), Finegold and Drton (2011). The iterative procedure (6) coincides with the EM algorithm.

In order to determine the LRT statistic to test

$$\begin{aligned} \left\{ \begin{array}{l} \textrm{H}_0: \ \varvec{\Sigma }=\varvec{\Sigma }_0\\ \textrm{H}_1: \ \varvec{\Sigma }\not =\varvec{\Sigma }_0\\ \end{array} \right. \end{aligned}$$
(7)

when no constraints are imposed on \(\varvec{\mu }\), it is necessary to compute the MLE of \(\varvec{\mu }\) under \(\textrm{H}_{0}\), which will be denoted by \(\widehat{\varvec{\mu }}_0\). Using the same approach as above, \(\widehat{\varvec{\mu }}_0\) is obtained as the solution of

$$\begin{aligned} \varvec{\mu }_0 = \displaystyle \sum _{i=1}^n\frac{{\textbf{x}}_i}{t_{i0}} \Big / \sum _{i=1}^n\frac{1}{t_{i0}}, \end{aligned}$$

where \(t_{i0}=1+\frac{1}{\nu }({\textbf{x}}_i-\varvec{\mu }_0)^\prime \varvec{\Sigma }_0^{-1}({\textbf{x}}_i-\varvec{\mu }_0)\). Applying the same algorithm as before, for a fixed \(\varvec{\Sigma }_0\) we choose a starting value \(\varvec{\mu }^{(0)}\), and then, for \(k=1,2,\dots \), we solve the system

$$\begin{aligned} \left\{ \begin{array}{lcl} t_{i0}^{(k-1)} &{}=&{} 1+\frac{1}{\nu }({\textbf{x}}_i-\varvec{\mu }^{(k-1)})^\prime \varvec{\Sigma }_0^{-1}({\textbf{x}}_i-\varvec{\mu }^{(k-1)}), \quad i=1,\dots ,n \\ \varvec{\mu }_0^{(k)} &{} = &{} \displaystyle \sum _{i=1}^n\frac{{\textbf{x}}_i}{t_{i0}^{(k-1)}} \Big / \sum _{i=1}^n\frac{1}{t_{i0}^{(k-1)}} \end{array} \right. \end{aligned}$$
(8)

until the stopping rule holds.
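
Analogously, iteration (8) updates only \(\varvec{\mu }\). A minimal sketch, using the same conventions as the hypothetical t_mle sketch above:

```python
import numpy as np

def t_mle_mu0(X, Sigma0, nu, tol=1e-6, max_iter=500):
    # MLE of mu under H_0 with the scale matrix fixed at Sigma_0
    p, n = X.shape
    mu = np.zeros((p, 1))
    Sigma0_inv = np.linalg.inv(Sigma0)
    for _ in range(max_iter):
        R = X - mu
        t = 1 + np.einsum('ij,ij->j', R, Sigma0_inv @ R) / nu   # t_{i0}^{(k-1)}
        w = 1.0 / t
        mu_new = (X @ w)[:, None] / w.sum()                     # mu_0^{(k)}
        if np.linalg.norm(mu_new - mu) <= tol:
            return mu_new
        mu = mu_new
    return mu
```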

The LRT statistic for testing (7) when no constraints are imposed on \(\varvec{\mu }\) has the following form:

$$\begin{aligned} \textrm{LRT}({\textbf{X}},\varvec{\Sigma }_0)= -2\ln \left[ \left( \frac{|\widehat{\varvec{\Sigma }}|}{|\varvec{\Sigma }_0|}\right) ^{n/2}\prod _{i=1}^n \left( \frac{1+\frac{1}{\nu }({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)^\prime \varvec{\Sigma }_0^{-1}({\textbf{x}}_i-\widehat{\varvec{\mu }}_0) }{1+\frac{1}{\nu }({\textbf{x}}_i-\widehat{\varvec{\mu }})^\prime \widehat{\varvec{\Sigma }}^{-1}({\textbf{x}}_i-\widehat{\varvec{\mu }})}\right) ^{ -(\nu +p)/2}\right] , \end{aligned}$$

where \(\widehat{\varvec{\mu }}\) and \(\widehat{\varvec{\Sigma }}\) are the numerically determined MLEs of \(\varvec{\mu }\) and \(\varvec{\Sigma }\) (solutions of (6)) and \(\widehat{\varvec{\mu }}_0\) is the solution of (8).

In view of Kollo and Valge (2020, formula (12.3)), their Proposition 12.1 should be rewritten as follows.

Proposition 1

The LRT statistic for testing (7) when no constraints are imposed on \(\varvec{\mu }\) is given by

$$\begin{aligned} \begin{array}{l} \textrm{LRT}({\textbf{X}}, \varvec{\Sigma }_0) \\ = -n \left[ \ln |\widehat{\varvec{\Sigma }}|- \ln |\varvec{\Sigma }_0| \right] \\ \displaystyle + (\nu +p)\\ \quad \times \displaystyle \sum _{i=1}^n \left\{ \ln \left[ 1+\tfrac{1}{\nu } ({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)^\prime \varvec{\Sigma }_0^{-1}({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)\right] -\ln \left[ 1+\tfrac{1}{\nu } ({\textbf{x}}_i-\widehat{\varvec{\mu }})^\prime \widehat{\varvec{\Sigma }}^{-1}({\textbf{x}}_i-\widehat{\varvec{\mu }})\right] \right\} , \end{array} \end{aligned}$$

where \({\textbf{X}}= ({\textbf{x}}_1, \dots , {\textbf{x}}_n)\) is a random sample from \(t_{p,\nu }(\varvec{\mu },\varvec{\Sigma })\), \(\nu >2\) is known, \(\widehat{\varvec{\mu }}, \widehat{\varvec{\Sigma }}\) are the solutions of (6), and \(\widehat{\varvec{\mu }}_0\) is the solution of (8).

When \(n\rightarrow \infty \) and \(\textrm{H}_{0} \) holds, the distribution of \(\textrm{LRT}({\textbf{X}},\varvec{\Sigma }_0)\) tends to the chi-square distribution with \(p(p+1)/2\) degrees of freedom.

Note that the convergence to the chi-square distribution follows from e.g. Wilks (1938).
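
For illustration, a minimal sketch of the statistic of Proposition 1, reusing the hypothetical helper functions t_mle and t_mle_mu0 from the sketches above; the observed value is compared with the \(\chi ^2_{p(p+1)/2}\) quantile.

```python
import numpy as np
from scipy.stats import chi2

def lrt_stat(X, Sigma0, nu):
    # X: p x n sample matrix; t_mle and t_mle_mu0 are the sketches above
    p, n = X.shape
    mu_hat, Sigma_hat = t_mle(X, nu)
    mu0_hat = t_mle_mu0(X, Sigma0, nu)
    # q(R, S)_i = 1 + (x_i - m)' S^{-1} (x_i - m) / nu for centred columns R
    q = lambda R, S: 1 + np.einsum('ij,ij->j', R, np.linalg.solve(S, R)) / nu
    ld_hat = np.linalg.slogdet(Sigma_hat)[1]
    ld0 = np.linalg.slogdet(Sigma0)[1]
    return (-n * (ld_hat - ld0)
            + (nu + p) * (np.log(q(X - mu0_hat, Sigma0)).sum()
                          - np.log(q(X - mu_hat, Sigma_hat)).sum()))

# reject H_0 at level 0.05 when the statistic exceeds the chi-square quantile:
# reject = lrt_stat(X, Sigma0, nu) > chi2.ppf(0.95, p * (p + 1) // 2)
```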

Remark 3

The LRT statistic in Proposition 1 contains elements of the form

$$\begin{aligned} (\nu +p)\ln \left( 1+\frac{u_i}{\nu }\right) , \end{aligned}$$

where \(u_i\) stands for \(({\textbf{x}}_i-\overline{{\textbf{x}}})^\prime \varvec{\Sigma }_0^{-1}({\textbf{x}}_i-\overline{{\textbf{x}}})\) and/or \(({\textbf{x}}_i-\overline{{\textbf{x}}})^\prime \widehat{\varvec{\Sigma }}^{-1}({\textbf{x}}_i-\overline{{\textbf{x}}})\).

Then, if \(\nu \rightarrow \infty \), using the l’Hôpital rule we obtain

$$\begin{aligned} \lim _{\nu \rightarrow \infty }(\nu +p)\ln \left( 1+\tfrac{u_i}{\nu }\right) = \lim _{\nu \rightarrow \infty }\frac{\ln \left( 1+\frac{u_i}{\nu }\right) }{\frac{1}{\nu +p}}= \lim _{\nu \rightarrow \infty }\frac{u_i(\nu +p)^2}{\nu (\nu +u_i)}=u_i. \end{aligned}$$

This means that the LRT converges to

$$\begin{aligned} \begin{array}{l} -n \left[ \ln |\widehat{\varvec{\Sigma }}|- \ln |\varvec{\Sigma }_0| \right] + \sum \limits _{i=1}^n ({\textbf{x}}_i-\overline{{\textbf{x}}})^\prime \left( \varvec{\Sigma }_0^{-1}-\widehat{\varvec{\Sigma }}^{-1}\right) ({\textbf{x}}_i-\overline{{\textbf{x}}})\\ \quad =-n \ln |\widehat{\varvec{\Sigma }}\varvec{\Sigma }_0^{-1}| + \textrm{tr}\left[ \left( \varvec{\Sigma }_0^{-1}-\widehat{\varvec{\Sigma }}^{-1}\right) \sum \limits _{i=1}^n ({\textbf{x}}_i-\overline{{\textbf{x}}})({\textbf{x}}_i-\overline{{\textbf{x}}})^\prime \right] . \end{array} \end{aligned}$$

From the definition of \({\textbf{S}}\) given in (2) we have

$$\begin{aligned} \textrm{LRT}\ \rightarrow \ n \left[ \textrm{tr}\left( {\textbf{S}}\varvec{\Sigma }_0^{-1}\right) - \ln |{\textbf{S}}\varvec{\Sigma }_0^{-1}| - p \right] , \end{aligned}$$

which is the LRT statistic for testing (7) under normality; cf. Kollo et al. (2016).

4 Rao score test

The RST statistic is a function of the score vector and the Fisher information matrix.

Definition 5

The RST statistic for testing hypothesis (3) is of the form

$$\begin{aligned} \textrm{RST}({{\textbf{X}}},\varvec{\theta }_0) = {\textbf{u}}^\prime ({{\textbf{X}}},\varvec{\theta }_0) {\textrm{I}({{\textbf{X}}},\varvec{\theta }_0)}^{-1} {\textbf{u}}({{\textbf{X}}},\varvec{\theta }_0), \end{aligned}$$

where \({\textbf{X}}=({\textbf{x}}_1,\dots ,{\textbf{x}}_n)\) is a random sample from \(P_{{\textbf{x}}}(\varvec{\theta })\), \({\textbf{u}}({{\textbf{X}}},\varvec{\theta }_0)\) is the score function, \({\textrm{I}({{\textbf{X}}},\varvec{\theta }_0)}\) is the information matrix, and \(\varvec{\theta }=(\theta _1,\dots ,\theta _r)^\prime \).

Following Rao (1948), when the sample size \(n \rightarrow \infty \) and \(\textrm{H}_0\) holds, the distribution of \(\textrm{RST}({\textbf{X}},\varvec{\theta }_0)\) converges to \(\chi _r^2\).

Assume now that \(\varvec{\theta }=(\varvec{\theta }_{r-k},\varvec{\theta }_k)^\prime \), where \(\varvec{\theta }_{r-k}\) is the set of \((r-k)\) fixed parameters under the null hypothesis, i.e., \(\textrm{H}_0: \ \varvec{\theta }_{r-k}=\varvec{\theta }_0\). Then the RST has the representation

$$\begin{aligned} \textrm{RST}({\textbf{X}},\varvec{\theta }_0) = {\textbf{u}}^\prime _1({\textbf{X}},\varvec{\theta }_0,\widehat{\varvec{\theta }}_k) \textrm{I}_{1.2}^{-1}({\textbf{X}},\varvec{\theta }_0,\widehat{\varvec{\theta }}_k) {\textbf{u}}_1({\textbf{X}},\varvec{\theta }_0,\widehat{\varvec{\theta }}_k), \end{aligned}$$
(9)

where the score vector \({\textbf{u}}({{\textbf{X}}},\varvec{\theta }_0,\varvec{\theta }_k)\) has been divided into two parts, related to the partial derivatives of \(\ell (\varvec{\theta },{\textbf{X}})\) with respect to \(\varvec{\theta }_{r-k}\) (\(={\textbf{u}}_1({\textbf{X}},\varvec{\theta }_0,\varvec{\theta }_k)\)) and \(\varvec{\theta }_k\) (\(={\textbf{u}}_2({\textbf{X}},\varvec{\theta }_0,\varvec{\theta }_k)\)), \(\textrm{I}_{1.2}({\textbf{X}},\varvec{\theta }_0,\varvec{\theta }_k)\) is the Schur complement of \(\textrm{I}_{22}\) in the Fisher information matrix \(\textrm{I}({\textbf{X}},\varvec{\theta }_0,\varvec{\theta }_k)=(\textrm{I}_{ij})_{i,j=1,2}\), i.e., \(\textrm{I}_{1.2}=\textrm{I}_{11}-\textrm{I}_{12}\textrm{I}_{22}^{-1}\textrm{I}_{21}\), and \(\widehat{\varvec{\theta }}_k\) is the MLE of the k non-fixed parameters under the null hypothesis; cf. Rao (2005). Moreover, when under the null hypothesis there are k parameters not fixed, the sample size \(n\rightarrow \infty \) and \(\textrm{H}_0\) holds, the distribution of \(\textrm{RST}({\textbf{X}},\varvec{\theta }_0)\) tends to \(\chi ^2_{r-k}\); cf. (Rao 1973, Sect. 6e).

In our case \(\varvec{\theta }=(\varvec{\mu }^\prime ,\mathrm {vech\,}^\prime \varvec{\Sigma })^\prime \) and the \((p+\frac{p(p+1)}{2})\)-dimensional score vector of partial derivatives has the form

$$\begin{aligned} {\textbf{u}}({{\textbf{X}}},\varvec{\mu },\varvec{\Sigma })= \left( {\textbf{u}}_1({\textbf{X}},\varvec{\mu },\varvec{\Sigma }), \ {\textbf{u}}_2({\textbf{X}},\varvec{\mu },\varvec{\Sigma }) \right) ^\prime = \left( \frac{\partial \ell (\varvec{\mu },\varvec{\Sigma },{\textbf{X}})}{\partial \varvec{\mu }}, \ \frac{\partial \ell (\varvec{\mu },\varvec{\Sigma },{\textbf{X}})}{\partial \varvec{\Sigma }^{\Delta }} \right) ^\prime .\nonumber \\ \end{aligned}$$
(10)

The components of the score vector are given by (5). Expressions for the elements of the \((p+p(p+1)/2) \times (p+p(p+1)/2)\) Fisher information matrix of a \(t_{p,\nu }\)-distributed random vector can be computed using the formulae from Mitchell (1989), which in the matrix representation have the form

$$\begin{aligned} \begin{array}{l} \hspace{-.3cm}{\textbf{I}}({\textbf{X}},\varvec{\mu }, \varvec{\Sigma })=\\ \left( \begin{array}{cc} \frac{(\nu +p)n}{\nu +p+2}\varvec{\Sigma }^{-1} &{} {\textbf{0}}\\ {\textbf{0}}&{} \frac{n}{2(\nu +p+2)}{\textbf{D}}_p^\prime \left[ (\nu +p)(\varvec{\Sigma }^{-1}\otimes \varvec{\Sigma }^{-1})-\mathrm {vec\,}\varvec{\Sigma }^{-1}\mathrm {vec\,}^\prime \varvec{\Sigma }^{-1}\right] {\textbf{D}}_p \end{array} \right) . \end{array}\nonumber \\ \end{aligned}$$
(11)

By inserting \(\widehat{\varvec{\mu }}_0\), the MLE of \(\varvec{\mu }\) under the null hypothesis, and \(\varvec{\Sigma }_0\) into the score vector (10) and the information matrix (11), it can be seen that, since there are no restrictions on \(\varvec{\mu }\) in the hypothesis (7), it is enough to take into account the second component of the score vector and the lower diagonal block of the Fisher information matrix.

Proposition 2

The RST statistic for testing (7) when no constraints are imposed on \(\varvec{\mu }\) is given by

$$\begin{aligned} \begin{array}{rcl} \textrm{RST}({\textbf{X}},\varvec{\Sigma }_0)= & {} \frac{n(\nu +p+2)}{2\nu (\nu +p)} \left\{ \nu \cdot \textrm{tr}[({\textbf{V}}\varvec{\Sigma }_0^{-1})^2]+\textrm{tr}^2({\textbf{V}}\varvec{\Sigma }_0^{-1}) \right\} \end{array} \end{aligned}$$

with

$$\begin{aligned} {\textbf{V}}={\textbf{V}}({\textbf{X}},\varvec{\Sigma }_0)= \varvec{\Sigma }_0-\tfrac{\nu +p}{n\nu }\sum \limits _{i=1}^n\frac{({\textbf{x}}_i-\widehat{\varvec{\mu }}_0) ({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)^\prime }{1+\frac{1}{\nu }({\textbf{x}}_i-\widehat{\varvec{\mu }}_0) ^\prime \varvec{\Sigma }_0^{-1}({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)}, \end{aligned}$$
(12)

where \({\textbf{X}}= ({\textbf{x}}_1, \dots , {\textbf{x}}_n)\) is a sample from \(t_{p,\nu }(\varvec{\mu },\varvec{\Sigma })\), \(\nu >2\) is known, and \(\widehat{\varvec{\mu }}_0\) is the solution of (8).

When \(n\rightarrow \infty \) and \(\textrm{H}_{0}\) holds, the distribution of \(\textrm{RST}({\textbf{X}},\varvec{\Sigma }_0)\) tends to the chi-square distribution with \(p(p+1)/2\) degrees of freedom.

Proof

Since there are no restrictions on \(\varvec{\mu }\) in (7), to determine the form of the RST we use (9), with the role of fixed/non-fixed parameters interchanged, i.e.,

$$\begin{aligned} \textrm{RST}({\textbf{X}},\varvec{\Sigma }_0) = {\textbf{u}}^\prime _2({\textbf{X}},\widehat{\varvec{\mu }}_0,\varvec{\Sigma }_0)\textrm{I}_{2.1}^{-1}({\textbf{X}},\widehat{\varvec{\mu }}_0,\varvec{\Sigma }_0) {\textbf{u}}_2({\textbf{X}},\widehat{\varvec{\mu }}_0,\varvec{\Sigma }_0). \end{aligned}$$

From (11) it follows that \(\textrm{I}_{2.1}({\textbf{X}},\widehat{\varvec{\mu }}_0,\varvec{\Sigma }_0)\) reduces to \(\textrm{I}_{22}({\textbf{X}},\widehat{\varvec{\mu }}_0,\varvec{\Sigma }_0)\), as the off-diagonal blocks of the information matrix vanish.

Due to the second equality of (5) and Definition 2,

$$\begin{aligned} {\textbf{u}}_2({\textbf{X}},\widehat{\varvec{\mu }}_0,\varvec{\Sigma }_0) =-\tfrac{n}{2}{\textbf{D}}_p^\prime \Big [\mathrm {vec\,}\varvec{\Sigma }_0^{-1}-\tfrac{\nu +p}{n\nu }(\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1})\sum _{i=1}^n\frac{({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)\otimes ({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)}{1+\frac{1}{\nu }({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)^\prime \varvec{\Sigma }_0^{-1} ({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)}\Big ]. \end{aligned}$$

Using the properties of the vec-operator we can rewrite the above as

$$\begin{aligned} {\textbf{u}}_2({\textbf{X}},\widehat{\varvec{\mu }}_0,\varvec{\Sigma }_0) =-\tfrac{n}{2}{\textbf{D}}_p^\prime \Big [(\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1})\Big (\mathrm {vec\,}\varvec{\Sigma }_0-\tfrac{\nu +p}{n\nu }\sum _{i=1}^n\frac{\mathrm {vec\,}(({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)^\prime )}{1+\frac{1}{\nu }({\textbf{x}}_i-\widehat{\varvec{\mu }}_0) ^\prime \varvec{\Sigma }_0^{-1} ({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)}\Big )\Big ], \end{aligned}$$

and, using the expression for \({\textbf{V}}\) from (12) we obtain

$$\begin{aligned} {\textbf{u}}_2({\textbf{X}},\widehat{\varvec{\mu }}_0,\varvec{\Sigma }_0)=-\tfrac{n}{2}{\textbf{D}}_p^\prime (\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1})\mathrm {vec\,}{\textbf{V}}. \end{aligned}$$
(13)

Observe now that the diagonal block \({\textbf{I}}_{22}\) of (11) can be represented as

$$\begin{aligned} \begin{array}{rcl} {\textbf{I}}_{22}(\varvec{\Sigma }_0)&{}=&{}\frac{n}{2(\nu +p+2)}{\textbf{D}}_p^\prime (\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1})\\ &{}&{} \times \left[ (\nu +p)(\varvec{\Sigma }_0\otimes \varvec{\Sigma }_0)-\mathrm {vec\,}\varvec{\Sigma }_0\mathrm {vec\,}^\prime \varvec{\Sigma }_0 \right] (\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1}){\textbf{D}}_p, \end{array} \end{aligned}$$

and since the duplication matrix \({\textbf{D}}_p\) is of full column rank, \({\textbf{I}}_{22}^{-1}\) can be expressed using the Moore-Penrose inverse \({\textbf{D}}_p^+\):

$$\begin{aligned} \begin{array}{rcl} {\textbf{I}}_{22}^{-1}(\varvec{\Sigma }_0)&{}=&{}\frac{2(\nu +p+2)}{n}{\textbf{D}}_p^+(\varvec{\Sigma }_0\otimes \varvec{\Sigma }_0)\\ &{}&{} \times \left[ (\nu +p)(\varvec{\Sigma }_0\otimes \varvec{\Sigma }_0)-\mathrm {vec\,}\varvec{\Sigma }_0\mathrm {vec\,}^\prime \varvec{\Sigma }_0 \right] ^{-1} (\varvec{\Sigma }_0\otimes \varvec{\Sigma }_0){\textbf{D}}_p^{+'}. \end{array} \end{aligned}$$

The inverse of the matrix in square brackets is obtained using the inverse binomial theorem, i.e., assuming that all the inverses involved exist,

$$\begin{aligned} ({\textbf{A}}+{\textbf{B}}{\textbf{C}}{\textbf{D}})^{-1}={\textbf{A}}^{-1}-{\textbf{A}}^{-1}{\textbf{B}}({\textbf{D}}{\textbf{A}}^{-1}{\textbf{B}}+{\textbf{C}}^{-1})^{-1}{\textbf{D}}{\textbf{A}}^{-1}; \end{aligned}$$

cf. (Kollo and von Rosen 2005, p. 75). Setting \({\textbf{A}}=(\nu +p)(\varvec{\Sigma }_0\otimes \varvec{\Sigma }_0)\), \({\textbf{B}}=\mathrm {vec\,}\varvec{\Sigma }_0\), \({\textbf{C}}=-1\) and \({\textbf{D}}=\mathrm {vec\,}^\prime \varvec{\Sigma }_0\), we have

$$\begin{aligned} \begin{array}{l} \hspace{-.5cm}\left[ (\nu +p)(\varvec{\Sigma }_0\otimes \varvec{\Sigma }_0)-\mathrm {vec\,}\varvec{\Sigma }_0\mathrm {vec\,}^\prime \varvec{\Sigma }_0\right] ^{-1}\\ =\frac{1}{\nu +p}(\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1})\\ -\frac{1}{(\nu +p)^2}(\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1})\mathrm {vec\,}\varvec{\Sigma }_0\big [\mathrm {vec\,}^\prime \varvec{\Sigma }_0\frac{1}{\nu +p}(\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1})\mathrm {vec\,}\varvec{\Sigma }_0-1\big ]^{-1} \\ \times \mathrm {vec\,}^\prime \varvec{\Sigma }_0(\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1}), \end{array} \end{aligned}$$

and since the expression in the square brackets on the right-hand side equals \(\frac{p}{\nu +p}-1\) we have

$$\begin{aligned} \left[ (\nu +p)(\varvec{\Sigma }_0\otimes \varvec{\Sigma }_0)-\mathrm {vec\,}\varvec{\Sigma }_0\mathrm {vec\,}^\prime \varvec{\Sigma }_0 \right] ^{-1}=\tfrac{1}{\nu +p}\Big ((\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1}) +\tfrac{1}{\nu }\mathrm {vec\,}\varvec{\Sigma }_0^{-1}\mathrm {vec\,}^\prime \varvec{\Sigma }_0^{-1}\Big ). \end{aligned}$$

Hence

$$\begin{aligned} {\textbf{I}}_{22}^{-1}(\varvec{\Sigma }_0){} & {} = \tfrac{2(\nu +p+2)}{n(\nu +p)} \nonumber \\{} & {} \qquad \times {\textbf{D}}_p^+\left( \varvec{\Sigma }_0\otimes \varvec{\Sigma }_0\right) \left[ \left( \varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1}\right) +\tfrac{1}{\nu }\mathrm {vec\,}\varvec{\Sigma }_0^{-1}\mathrm {vec\,}^\prime \varvec{\Sigma }_0^{-1} \right] (\varvec{\Sigma }_0\otimes \varvec{\Sigma }_0) {\textbf{D}}_p^{+'} \nonumber \\{} & {} = \tfrac{2(\nu +p+2)}{n\nu (\nu +p)}{\textbf{D}}_p^+ \left[ \nu (\varvec{\Sigma }_0\otimes \varvec{\Sigma }_0)+\mathrm {vec\,}\varvec{\Sigma }_0\mathrm {vec\,}^\prime \varvec{\Sigma }_0 \right] {\textbf{D}}_p^{+'}. \end{aligned}$$
(14)

Combining (13) and (14),

$$\begin{aligned} \textrm{RST}({\textbf{X}},\varvec{\Sigma }_0){} & {} ={\textbf{u}}^\prime _2(\widehat{\varvec{\mu }}_0,\varvec{\Sigma }_0){\textbf{I}}_{22}^{-1}(\varvec{\Sigma }_0){\textbf{u}}_2(\widehat{\varvec{\mu }}_0,\varvec{\Sigma }_0)\\{} & {} = \frac{n(\nu +p+2)}{2\nu (\nu +p)}\mathrm {vec\,}^\prime {\textbf{V}}(\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1}){\textbf{D}}_p{\textbf{D}}_p^+\\{} & {} \quad \times \left[ \nu (\varvec{\Sigma }_0\otimes \varvec{\Sigma }_0)+\mathrm {vec\,}\varvec{\Sigma }_0\mathrm {vec\,}^\prime \varvec{\Sigma }_0 \right] {\textbf{D}}_p^{+'}{\textbf{D}}_p^\prime (\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1})\mathrm {vec\,}{\textbf{V}}. \end{aligned}$$

From (Magnus and Neudecker 1986, formula (54)) it is known that \({\textbf{D}}_p{\textbf{D}}_p^{+}={\textbf{N}}_p\), where \({\textbf{N}}_p=\frac{1}{2}({\textbf{I}}_{p^2}+{\textbf{K}}_{p,p})\) with \({\textbf{K}}_{p,p}\) being the commutation matrix, i.e., the matrix for which \({\textbf{K}}_{p,p}\mathrm {vec\,}{\textbf{A}}=\mathrm {vec\,}{\textbf{A}}^\prime \). Thus, for a symmetric matrix \({\textbf{V}}\) we have \({\textbf{N}}_p\mathrm {vec\,}{\textbf{V}}=\mathrm {vec\,}{\textbf{V}}\). Moreover, since \((\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1}){\textbf{N}}_p={\textbf{N}}_p(\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1})\), we obtain

$$\begin{aligned} \begin{array}{rcl} \textrm{RST}({\textbf{X}},\varvec{\Sigma }_0)&{} =&{} \frac{n(\nu +p+2)}{2\nu (\nu +p)}\mathrm {vec\,}^\prime {\textbf{V}}\left[ \nu (\varvec{\Sigma }_0^{-1}\otimes \varvec{\Sigma }_0^{-1})+\mathrm {vec\,}\varvec{\Sigma }_0^{-1}\mathrm {vec\,}^\prime \varvec{\Sigma }_0^{-1} \right] \mathrm {vec\,}{\textbf{V}}\\ &{}=&{}\frac{n(\nu +p+2)}{2\nu (\nu +p)}\left[ \nu \cdot \mathrm {vec\,}^\prime (\varvec{\Sigma }_0^{-1}{\textbf{V}}\varvec{\Sigma }_0^{-1})\mathrm {vec\,}{\textbf{V}}+\mathrm {vec\,}^\prime {\textbf{V}}\mathrm {vec\,}\varvec{\Sigma }_0^{-1}\mathrm {vec\,}^\prime \varvec{\Sigma }_0^{-1}\mathrm {vec\,}{\textbf{V}}\right] . \end{array} \end{aligned}$$

Finally, again using the relation between the vec-operator and the trace function, we obtain

$$\begin{aligned} \textrm{RST}({\textbf{X}},\varvec{\Sigma }_0) = \tfrac{n(\nu +p+2)}{2\nu (\nu +p)} \left\{ \nu \cdot \textrm{tr}[({\textbf{V}}\varvec{\Sigma }_0^{-1})^2]+\textrm{tr}^2({\textbf{V}}\varvec{\Sigma }_0^{-1}) \right\} . \end{aligned}$$

The limiting distribution follows from e.g. Rao (2005).\(\square \)

Note that if \(\nu \rightarrow \infty \), then \(\widehat{\varvec{\mu }}_0\) becomes the sample mean \(\overline{{\textbf{x}}}\), and

$$\begin{aligned} {\textbf{V}}=\varvec{\Sigma }_0-\tfrac{1}{n}{\textbf{X}}{\textbf{Q}}_n{\textbf{X}}^\prime =\varvec{\Sigma }_0-{\textbf{S}}, \end{aligned}$$

where \({\textbf{Q}}_n={\textbf{I}}_n-\frac{1}{n}{\textbf{1}}_n{\textbf{1}}_n^\prime \) is the centering matrix and \({\textbf{S}}=\widehat{\varvec{\Sigma }}\) under normality, as given in (2). Thus, the formula for RST reduces to the formula for RST under normality,

$$\begin{aligned} \textrm{RST}\ \rightarrow \ \tfrac{n}{2}\textrm{tr}\left[ ({\textbf{I}}_p-{\textbf{S}}\varvec{\Sigma }_0^{-1})^2\right] ; \end{aligned}$$

cf. Kollo et al. (2016).
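
For completeness, a minimal sketch of the RST statistic of Proposition 2 via the matrix \({\textbf{V}}\) of (12), again reusing the hypothetical t_mle_mu0 helper from Sect. 3:

```python
import numpy as np

def rst_stat(X, Sigma0, nu):
    # RST of Proposition 2; X: p x n sample matrix, nu > 2 known
    p, n = X.shape
    mu0 = t_mle_mu0(X, Sigma0, nu)          # MLE of mu under H_0, iteration (8)
    R = X - mu0
    t = 1 + np.einsum('ij,ij->j', R, np.linalg.solve(Sigma0, R)) / nu
    V = Sigma0 - (nu + p) / (n * nu) * (R / t) @ R.T   # matrix V of (12)
    M = V @ np.linalg.inv(Sigma0)
    return (n * (nu + p + 2) / (2 * nu * (nu + p))
            * (nu * np.trace(M @ M) + np.trace(M) ** 2))
```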

Finally, we mention that the RST statistic can also be determined using the formulae stated in Sutradhar (1993), where Neyman's score test (Neyman 1959) is proposed for testing a covariance structure when n is large and \(\varvec{\mu }\) and \(\nu \) are unknown.

5 Wald test statistic

The WT statistic is a function of the MLE and the Fisher information matrix.

Definition 6

The WT statistic for testing hypothesis (3) is given by

$$\begin{aligned} \textrm{WT}({\textbf{X}},\varvec{\theta }_0) = (\widehat{\varvec{\theta }}-\varvec{\theta }_0)^\prime {\textbf{I}}({\textbf{X}},\widehat{\varvec{\theta }}) (\widehat{\varvec{\theta }}-\varvec{\theta }_0), \end{aligned}$$

where \(\varvec{\theta }=(\theta _1,\dots ,\theta _r)^\prime \), \(\widehat{\varvec{\theta }}\) is the MLE of \(\varvec{\theta }\) and \({{\textbf{X}}}\) is a random sample from a \(P_{\varvec{\theta }}\)-distributed population.

When the sample size \(n\rightarrow \infty \) and \(\textrm{H}_0\) holds then, similarly to the previous cases, the distribution of \(\textrm{WT}({{\textbf{X}}},\varvec{\theta }_0)\) converges to \(\chi ^2_r\); cf. Wald (1943). As in the case of the RST, if the restrictions in the null hypothesis are imposed only on \(r-k\) parameters, the WT can be represented as

$$\begin{aligned} \textrm{WT}({\textbf{X}},\varvec{\theta }_0) = (\widehat{\varvec{\theta }}_{r-k}-\varvec{\theta }_0)^\prime {\textbf{I}}_{1.2}({\textbf{X}},\widehat{\varvec{\theta }}) (\widehat{\varvec{\theta }}_{r-k}-\varvec{\theta }_0), \end{aligned}$$
(15)

where \(\widehat{\varvec{\theta }}_{r-k}\) is the MLE of \(\varvec{\theta }_{r-k}\) under the alternative and \({\textbf{I}}_{1.2}({\textbf{X}},\widehat{\varvec{\theta }})\) is the Schur complement of \(\textrm{I}_{22}\) in the information matrix \(\textrm{I}({\textbf{X}},\widehat{\varvec{\theta }})\); cf. Rao (2005). When the null hypothesis \(\textrm{H}_0: \varvec{\theta }_{r-k}=\varvec{\theta }_0\) holds, the distribution of the WT statistic converges to \(\chi ^2_{r-k}\).

Let us rewrite the hypothesis (7) in the equivalent form

$$\begin{aligned} {\left\{ \begin{array}{ll} \textrm{H}_0: \mathrm {vech\,}\varvec{\Sigma }= \mathrm {vech\,}\varvec{\Sigma }_0, \\ \textrm{H}_1: \mathrm {vech\,}\varvec{\Sigma }\ne \mathrm {vech\,}\varvec{\Sigma }_0, \end{array}\right. } \end{aligned}$$
(16)

since the vector of unknown parameters consists of \(\varvec{\mu }\) and \(\mathrm {vech\,}\varvec{\Sigma }\). Then the following proposition for the WT statistic can be formulated.

Proposition 3

The WT statistic for testing (7) when no constraints are imposed on \(\varvec{\mu }\) is given by

$$\begin{aligned} \textrm{WT}({\textbf{X}},\varvec{\Sigma }_0)= \tfrac{n}{2(\nu +p+2)} \left\{ (\nu +p)\cdot \textrm{tr}\left[ \left( {\textbf{I}}_p-\varvec{\Sigma }_0\widehat{\varvec{\Sigma }}^{-1}\right) ^2\right] - \textrm{tr}^2\left( {\textbf{I}}_p-\varvec{\Sigma }_0\widehat{\varvec{\Sigma }}^{-1}\right) \right\} , \end{aligned}$$

where \({\textbf{X}}= ({\textbf{x}}_1, \ldots , {\textbf{x}}_n)\) is a sample from \(t_{p,\nu }(\varvec{\mu },\varvec{\Sigma })\), \(\nu >2\) is known, and \(\widehat{\varvec{\Sigma }}\) is the solution of (6).

When \(n\rightarrow \infty \) and \(\textrm{H}_0\) holds, the distribution of \(\textrm{WT}({\textbf{X}},\varvec{\Sigma }_0)\) tends to the chi-square distribution with \(p(p+1)/2\) degrees of freedom.

Proof

For testing (16), due to (15) the WT statistic can be expressed as

$$\begin{aligned} \textrm{WT}({\textbf{X}},\varvec{\Sigma }_0)=\mathrm {vech\,}^\prime (\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0){\textbf{I}}_{22}(\widehat{\varvec{\Sigma }}) \mathrm {vech\,}(\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0) \end{aligned}$$

since \(\textrm{I}_{2.1}({\textbf{X}},\widehat{\varvec{\mu }},\widehat{\varvec{\Sigma }})=\textrm{I}_{22}(\widehat{\varvec{\Sigma }})\) is the lower diagonal block of the Fisher information matrix in (11) with \(\varvec{\Sigma }\) replaced by its MLE, that is

$$\begin{aligned} \begin{array}{rcl} {\textbf{I}}_{22}(\widehat{\varvec{\Sigma }})= & {} \frac{n}{2(\nu +p+2)}{\textbf{D}}_p^\prime \left[ (\nu +p)(\widehat{\varvec{\Sigma }}^{-1}\otimes \widehat{\varvec{\Sigma }}^{-1})-\mathrm {vec\,}\widehat{\varvec{\Sigma }}^{-1} \mathrm {vec\,}^\prime \widehat{\varvec{\Sigma }}^{-1} \right] {\textbf{D}}_p. \end{array} \end{aligned}$$

We obtain

$$\begin{aligned} \begin{array}{rcl} \textrm{WT}({\textbf{X}},\varvec{\Sigma }_0)&{}&{}=\frac{n}{2(\nu +p+2)}\mathrm {vech\,}^\prime (\widehat{\varvec{\Sigma }} -\varvec{\Sigma }_0){\textbf{D}}_p^\prime \\ &{}&{}\quad \times \left[ (\nu +p)(\widehat{\varvec{\Sigma }}^{-1}\otimes \widehat{\varvec{\Sigma }}^{-1})-\mathrm {vec\,}\widehat{\varvec{\Sigma }} ^{-1}\mathrm {vec\,}^\prime \widehat{\varvec{\Sigma }}^{-1} \right] {\textbf{D}}_p\mathrm {vech\,}(\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0), \end{array} \end{aligned}$$

and since \({\textbf{D}}_p\mathrm {vech\,}(\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0)=\mathrm {vec\,}(\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0)\) (cf. Magnus and Neudecker 1986, formula (49)) we obtain

$$\begin{aligned} \begin{array}{rcl} \textrm{WT}({\textbf{X}},\varvec{\Sigma }_0)&{}&{}= \frac{n}{2(\nu +p+2)}\mathrm {vec\,}^\prime (\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0) \\ &{}&{}\quad \times \left[ (\nu +p)(\widehat{\varvec{\Sigma }}^{-1}\otimes \widehat{\varvec{\Sigma }}^{-1})-\mathrm {vec\,}\widehat{\varvec{\Sigma }} ^{-1}\mathrm {vec\,}^\prime \widehat{\varvec{\Sigma }}^{-1} \right] \mathrm {vec\,}(\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0)\\ &{}&{}= \frac{n}{2(\nu +p+2)}\left[ \mathrm {vec\,}^\prime (\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0) (\nu +p)(\widehat{\varvec{\Sigma }}^{-1}\otimes \widehat{\varvec{\Sigma }}^{-1})\mathrm {vec\,}(\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0)\right. \\ &{}&{}\quad -\left. \mathrm {vec\,}^\prime (\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0) \mathrm {vec\,}\widehat{\varvec{\Sigma }}^{-1}\mathrm {vec\,}^\prime \widehat{\varvec{\Sigma }}^{-1}\mathrm {vec\,}(\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0)\right] \\ &{}&{}= \frac{n(\nu +p)}{2(\nu +p+2)}\mathrm {vec\,}^\prime (\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0) \mathrm {vec\,}(\widehat{\varvec{\Sigma }}^{-1}(\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0)\widehat{\varvec{\Sigma }}^{-1})\\ &{}&{}\quad -\frac{n}{2(\nu +p+2)}\mathrm {vec\,}^\prime (\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0)\mathrm {vec\,}\widehat{\varvec{\Sigma }}^{-1} \mathrm {vec\,}^\prime \widehat{\varvec{\Sigma }}^{-1}\mathrm {vec\,}(\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0). \end{array} \end{aligned}$$

Using the relation between the vec-operator and the trace function, we obtain

$$\begin{aligned} \textrm{WT}({\textbf{X}},\varvec{\Sigma }_0)=\tfrac{n(\nu +p)}{2(\nu +p+2)}\textrm{tr}\left\{ \left[ (\widehat{\varvec{\Sigma }}-\varvec{\Sigma }_0) \widehat{\varvec{\Sigma }}^{-1}\right] ^2\right\} -\tfrac{n}{2(\nu +p+2)}\textrm{tr}^2\left[ (\widehat{\varvec{\Sigma }} -\varvec{\Sigma }_0)\widehat{\varvec{\Sigma }}^{-1}\right] . \end{aligned}$$

This coincides with the statement of the proposition. The limiting distribution follows from e.g. Rao (2005).\(\square \)

Note that when \(\nu \rightarrow \infty \), \(\widehat{\varvec{\Sigma }}\) tends to the MLE of \(\varvec{\Sigma }\) under normality, \({\textbf{S}}\), and WT converges to the corresponding expression of the WT statistic under normality

$$\begin{aligned} \textrm{WT}({\textbf{X}},\varvec{\Sigma }_0)\ \rightarrow \ \tfrac{n}{2}\textrm{tr}\left[ \left( {\textbf{I}}_p-\varvec{\Sigma }_0{\textbf{S}}^{-1}\right) ^2\right] ; \end{aligned}$$

cf. Kollo et al. (2016).

In addition, let us consider the test statistic \(\textrm{WT}^*\) associated with WT in such a way that in the information matrix in Definition 6, \(\widehat{\varvec{\theta }}\) is replaced by \(\varvec{\theta }_0\). Then, using the same arguments as in the proof of Proposition 3, we obtain

$$\begin{aligned} \textrm{WT}^*({\textbf{X}},\varvec{\Sigma }_0)= \tfrac{n}{2(\nu +p+2)} \left\{ (\nu +p)\cdot \textrm{tr}\left[ \left( {\textbf{I}}_p-\widehat{\varvec{\Sigma }}\varvec{\Sigma }_0^{-1}\right) ^2\right] - \textrm{tr}^2\left( {\textbf{I}}_p-\widehat{\varvec{\Sigma }}\varvec{\Sigma }_0^{-1}\right) \right\} , \end{aligned}$$

which is simply WT with the roles of \(\widehat{\varvec{\Sigma }}\) and \(\varvec{\Sigma }_0\) exchanged. In Sect. 6 we show using Monte Carlo simulations that under the null hypothesis and increasing sample size (\(n\rightarrow \infty \)), the distribution of \(\textrm{WT}^*\) tends to the chi-square distribution with \(p(p+1)/2\) degrees of freedom.

Finally, note that when \(\nu \rightarrow \infty \), i.e., when the multivariate t-distribution converges to the multivariate normal distribution, \(\textrm{WT}^*\) becomes the \(\textrm{RST}\) under normality.
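
The two Wald-type statistics differ only in which matrix enters the ratio; a minimal sketch of both, reusing the hypothetical t_mle helper from Sect. 3:

```python
import numpy as np

def wt_stat(X, Sigma0, nu, star=False):
    # star=False gives WT of Proposition 3; star=True gives WT*
    p, n = X.shape
    Sigma_hat = t_mle(X, nu)[1]               # MLE of Sigma, iteration (6)
    A, B = (Sigma_hat, Sigma0) if star else (Sigma0, Sigma_hat)
    M = np.eye(p) - A @ np.linalg.inv(B)      # I_p - Sigma_0 Sigma_hat^{-1}, or swapped
    return n / (2 * (nu + p + 2)) * ((nu + p) * np.trace(M @ M) - np.trace(M) ** 2)
```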

6 Simulations

In the simulation studies we examine the convergence of the test statistics given in Propositions 1–3, as well as of the \(\textrm{WT}^*\) statistic, to their asymptotic chi-square distributions. We are interested in the following questions:

  1. How does the speed of convergence depend on the number of degrees of freedom \(\nu \) and the sample size n?

  2. How does the behavior of the statistics change when \(\nu \) is growing?

  3. What happens when we replace the MLEs of the \(t_{p,\nu }(\varvec{\mu },\varvec{\Sigma })\)-distribution by the corresponding MLEs of the normal distribution \(N_{p}(\varvec{\mu },\varvec{\Sigma })\)? From a practical point of view it is important to know whether we need to calculate the MLEs of \(\varvec{\mu }\) and \(\varvec{\Sigma }\) numerically for a \(t_{p,\nu }(\varvec{\mu },\varvec{\Sigma })\)-distribution or whether we can simply plug in the sample mean \(\overline{{\textbf{x}}}\) and the sample covariance matrix \({{\textbf{S}}}\) instead.

  4. Which of the derived statistics, LRT, RST, WT or \(\textrm{WT}^*\), behaves best and can be recommended for use in data analysis?

  5. What is the empirical type I error of the considered statistics?

  6. Is it possible to indicate the test statistic with the highest power?

Note that a linear transformation \(\varvec{\Sigma }_0^{-1/2}{\textbf{x}}\) of a multivariate \(t_{p,\nu }\)-distributed vector, where \(\varvec{\Sigma }_0^{-1/2}\) is the positive definite (p.d.) square root satisfying \(\varvec{\Sigma }_0^{-1/2}\varvec{\Sigma }_0^{-1/2}=\varvec{\Sigma }_0^{-1}\), is still multivariate \(t_{p,\nu }\)-distributed with the same number of degrees of freedom, location parameter \(\varvec{\Sigma }_0^{-1/2}\varvec{\mu }\) and scale matrix \(\varvec{\Sigma }_0^{-1/2}\varvec{\Sigma }\varvec{\Sigma }_0^{-1/2}\); cf. Kotz and Nadarajah (2004). Such a transformation allows us to simplify the null hypothesis and test \(\textrm{H}_0:\,\varvec{\Sigma }={\textbf{I}}_p\) instead of \(\textrm{H}_0:\,\varvec{\Sigma }=\varvec{\Sigma }_0\), as illustrated in the sketch below. Thus, without loss of generality, in the simulation study we use the simplest possible values of the parameters, \(\varvec{\mu }={\textbf{0}}\) and \(\varvec{\Sigma }={\textbf{I}}_p\). We fix the dimension at \(p=3, 9\), the sample size at \(n=10,25,50\), and the number of degrees of freedom at \(\nu =3,10\); however, for \(p=9\) we increase the smallest sample size to \(n=12\) to avoid possibly ill-conditioned matrices, which may appear in the algorithm and perturb the empirical distribution of the test statistics. The number of simulation runs is 10,000. The results are obtained using Mathematica software.
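
A minimal sketch of this whitening step, using a symmetric square root computed from the eigendecomposition (one of several valid choices of \(\varvec{\Sigma }_0^{-1/2}\)):

```python
import numpy as np

def whiten(X, Sigma0):
    # columns of X are observations; returns Sigma_0^{-1/2} X
    w, U = np.linalg.eigh(Sigma0)                # Sigma0 assumed p.d.
    Sigma0_inv_sqrt = (U * w ** -0.5) @ U.T      # symmetric inverse square root
    return Sigma0_inv_sqrt @ X

# testing H_0: Sigma = I_p on whiten(X, Sigma0) is then equivalent to
# testing H_0: Sigma = Sigma_0 on X
```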

6.1 Convergence of test statistics to the limiting distribution

In the first row of Figs. 1, 2 and 3 we can see the empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\), together with the limiting \(\chi ^2\) distribution, for 3-dimensional data generated from a multivariate t-distribution with \(\nu =3\) degrees of freedom and, respectively, sample sizes \(n=10,25,50\). The distribution of RST matches the limiting distribution best, while the fit of LRT and \(\hbox {WT}^*\) is somewhat worse, especially when the 95th quantiles are compared; cf. Table 1. In addition, it can be observed that the convergence of the distribution of WT is the slowest; even for \(n=50\) its 95th quantile is far from the corresponding chi-square quantile. It should also be noted that the quantile values of the WT statistic are much higher than those of the other test statistics. This can be observed in the graphs as well as in Table 1. The same conclusions can be drawn from Figs. 4, 5 and 6, where the respective distributions for \(\nu =10\) are presented.

Fig. 1

Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (first row) as well as MLRT, MRST, MWT and \(\hbox {MWT}^*\) (second row) for \(\nu =p=3\), \(n=10\), with limiting \(\chi ^2_{6}\)

Fig. 2

Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (first row) as well as MLRT, MRST, MWT and \(\hbox {MWT}^*\) (second row) for \(\nu =p=3\), \(n=25\), with limiting \(\chi ^2_{6}\)

Since the MLEs under a \(t_{p,\nu }\)-distribution are not available in explicit form, we also simulated the distributions of all four test statistics when these MLEs are replaced by the MLEs of \(\varvec{\mu }\) and \(\varvec{\Sigma }\) under the normal distribution \(N_p(\varvec{\mu },\varvec{\Sigma })\). If the solutions of (6), \(\widehat{\varvec{\mu }}\) and \(\widehat{\varvec{\Sigma }}\), are replaced by \(\overline{{\textbf{x}}}\) and \({\textbf{S}}\), respectively, the modified test statistics are denoted accordingly by MLRT, MRST, MWT and \(\hbox {MWT}^*\). The empirical null distributions of these modified test statistics are presented in the second rows of Figs. 1, 2 and 3 for \(\nu =3\), and Figs. 4, 5 and 6 for \(\nu =10\). In all these figures and in Table 1 we see that in all considered cases the MRST distribution does not differ significantly from the distribution of the original RST. Moreover, for larger \(\nu \) the differences are almost imperceptible. This does not hold in the case of MLRT, as an increasing sample size destroys the convergence to the chi-square distribution (a larger sample size causes a shift to the left). Surprisingly, for (M)WT the convergence improved in most of the cases, while the opposite occurred for (M)\(\hbox {WT}^*\). Observe, however, that for increasing \(\nu \) the convergence of the distributions of the modified test statistics improves compared with \(\nu =3\), since the \(t_{p,\nu }\)-distribution is closer to normal. The discrepancy between the normal and \(t_{p,\nu }\) distributions can be measured by, for example, negentropy (cf. Osorio et al. (2023)), which for fixed degrees of freedom can be calculated as

$$\begin{aligned} H(p,\nu )=\tfrac{p}{2}(1+\ln (2\pi ))+\ln K_p(\nu )-\tfrac{\nu +p}{2}(\psi (\tfrac{\nu +p}{2})-\psi (\tfrac{\nu }{2})), \end{aligned}$$
Fig. 3

Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (first row) as well as MLRT, MRST, MWT and \(\hbox {MWT}^*\) (second row) for \(\nu =p=3\), \(n=50\), with limiting \(\chi ^2_{6}\)

Fig. 4

Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (first row) as well as MLRT, MRST, MWT and \(\hbox {MWT}^*\) (second row) for \(\nu =10\), \(p=3\), \(n=10\), with limiting \(\chi ^2_{6}\)

Fig. 5
figure 5

Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (first row) as well as MLRT, MRST, MWT and \(\hbox {MWT}^*\) (second row) for \(\nu =10\), \(p=3\), \(n=25\), with limiting \(\chi ^2_{6}\)

where \(K_p(\nu )=((\nu -2)\pi )^{(-p/2)}\Gamma ((\nu +p)/2)/\Gamma (\nu /2)\) and \(\psi (z)\) is the digamma function. It can be seen in Fig. 13 that for \(\nu =10\) and \(p=3\) the negentropy is already very low, around 0.041, while for \(p=9\) it is around 0.20.
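For readers wishing to reproduce the negentropy values quoted above and plotted in Fig. 13, a minimal Python sketch of \(H(p,\nu )\) is given below; the function name negentropy is ours, and only gammaln and psi from SciPy are used.

```python
import numpy as np
from scipy.special import gammaln, psi

def negentropy(p, nu):
    """H(p, nu) of the formula above; requires nu > 2 so that K_p(nu) is defined."""
    # ln K_p(nu) = -(p/2) ln((nu-2) pi) + ln Gamma((nu+p)/2) - ln Gamma(nu/2)
    ln_K = -p / 2 * np.log((nu - 2) * np.pi) + gammaln((nu + p) / 2) - gammaln(nu / 2)
    return (p / 2 * (1 + np.log(2 * np.pi)) + ln_K
            - (nu + p) / 2 * (psi((nu + p) / 2) - psi(nu / 2)))

print(round(negentropy(3, 10), 3))  # 0.041, the value quoted in the text
print(round(negentropy(9, 10), 2))  # approximately 0.20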

Figures 7, 8, 9, 10, 11 and 12 present the empirical null distributions of the considered statistics for \(p=9\) together with the limiting \(\chi ^2_{45}\)-distribution; in addition, Table 2 gives the 95th empirical quantiles of the respective distributions. The null distribution of RST still matches the limiting distribution well, even for very small sample sizes. The convergence of the remaining distributions is now slower and larger differences between them can be observed; in this case LRT outperforms \(\hbox {WT}^*\), and the values of WT are huge, especially for small sample sizes. The conclusions concerning the modified versions of the test statistics are the same as in the case \(p=3\); the distribution of MRST still outperforms those of all other modified test statistics.

Fig. 6 Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (first row) as well as MLRT, MRST, MWT and \(\hbox {MWT}^*\) (second row) for \(\nu =10\), \(p=3\), \(n=50\), with limiting \(\chi ^2_{6}\)

Table 1 The 95th quantiles of the empirical null distributions of LRT, RST, WT, \(\hbox {WT}^*\) and their modified versions for \(\nu =3,10\), \(p=3\) and \(n=10,25,50\), as well as the 95th quantile of \(\chi ^2_{6}\)
Fig. 7 Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (first row) as well as MLRT, MRST, MWT and \(\hbox {MWT}^*\) (second row) for \(\nu =3\), \(p=9\), \(n=12\), with limiting \(\chi ^2_{45}\)

Fig. 8 Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (first row) as well as MLRT, MRST, MWT and \(\hbox {MWT}^*\) (second row) for \(\nu =3\), \(p=9\), \(n=25\), with limiting \(\chi ^2_{45}\)

Fig. 9 Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (first row) as well as MLRT, MRST, MWT and \(\hbox {MWT}^*\) (second row) for \(\nu =3\), \(p=9\), \(n=50\), with limiting \(\chi ^2_{45}\)

Fig. 10 Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (first row) as well as MLRT, MRST, MWT and \(\hbox {MWT}^*\) (second row) for \(\nu =10\), \(p=9\), \(n=12\), with limiting \(\chi ^2_{45}\)

Fig. 11 Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (first row) as well as MLRT, MRST, MWT and \(\hbox {MWT}^*\) (second row) for \(\nu =10\), \(p=9\), \(n=25\), with limiting \(\chi ^2_{45}\)

Fig. 12 Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (first row) as well as MLRT, MRST, MWT and \(\hbox {MWT}^*\) (second row) for \(\nu =10\), \(p=9\), \(n=50\), with limiting \(\chi ^2_{45}\)

Table 2 The 95th quantiles of the empirical null distributions of LRT, RST, WT, \(\hbox {WT}^*\) and their modified versions for \(\nu =3,10\), \(p=9\) and \(n=12,25,50\), as well as the 95th quantile of \(\chi ^2_{45}\)
Fig. 13 Values of the negentropy between the normal and t-distribution for \(p=3\) and \(p=9\)

Summing up, we can formulate the following conclusions:

  • for LRT, RST, WT and \(\hbox {WT}^*\):

    • the convergence of RST to the limiting chi-square distribution is quicker than that of the remaining test statistics; moreover, the distribution of WT does not fit the limiting distribution even for large sample sizes;

    • usually there is no significant difference between the empirical distributions of all four test statistics for \(\nu =3\) and \(\nu =10\);

  • for MLRT, MRST, MWT and \(\hbox {MWT}^*\):

    • except for MRST, the distributions of the modified test statistics do not fit the theoretical chi-square distribution well; moreover, for MLRT and MWT the fit becomes worse with increasing sample size, and \(\hbox {MWT}^*\) produces extremely high values, especially for small numbers of degrees of freedom;

    • comparing MWT and WT, it seems that the replacement of the original MLEs by the MLEs of parameters of a normal distribution improves the fit of the test statistic distribution to the theoretical chi-square; this is not the case for the remaining pairs of statistics.

It is also worth noting that similar conclusions were reached by Kollo et al. (2016) under the assumption of multivariate normality of the observation matrix: the convergence of RST to the limiting chi-square distribution is quicker than that of LRT and WT.

6.2 Type I error

In this section we start with a comparison of the convergence of the empirical Type I errors to the nominal significance level \(\alpha =0.05\). To reject the true null hypothesis we use the quantiles of the limiting chi-square distribution. The results are given in Table 3.

Table 3 The empirical Type I error for LRT, RST, WT and \(\hbox {WT}^*\) for \(\nu =3,10\), \(p=3,9\) and various values of n

We can observe that the empirical Type I errors converge to the nominal significance level with increasing sample size for all statistics; however, the convergence of RST is very quick and, even for small sample sizes, its Type I error is close to 0.05. We may also notice that in this comparison LRT usually outperforms \(\hbox {WT}^*\), except for small sample sizes, especially when \(\nu =10\) and p is growing. Finally, note that the convergence of the Type I error of the WT statistic is extremely slow, and even a sample size of \(n=500\) is not enough to reach the nominal significance level. This behavior follows from the poor convergence of the distribution of WT to the limiting distribution, indicated in Sect. 6.1.
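In code, each entry of Table 3 is simply a rejection rate against the limiting chi-square critical value. A minimal Python sketch follows; the function name is ours, and stats stands for a hypothetical array of simulated null values of a given statistic.

```python
import numpy as np
from scipy.stats import chi2

def empirical_type1_error(stats, p, alpha=0.05):
    """Fraction of simulated null statistic values exceeding the
    (1 - alpha)-quantile of the limiting chi^2 with p(p+1)/2 df."""
    crit = chi2.ppf(1 - alpha, p * (p + 1) // 2)
    return np.mean(np.asarray(stats) > crit)
```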

6.3 Power comparison

In this section we use a Monte Carlo simulation study to examine the behavior of the power functions of RST, LRT, WT and \(\hbox {WT}^*\) with respect to the discrepancy between the null and alternative hypotheses in (7), and with respect to the sample size. As a measure of discrepancy between two distributions with different covariance matrices, we use Stein’s loss function (cf. Stein (1956)) in the form

$$\begin{aligned} \zeta (\varvec{\Sigma },\varvec{\Sigma }_0) =\textrm{tr}\left( \varvec{\Sigma }^{-1} \varvec{\Sigma }_0 \right) -\ln \left| \varvec{\Sigma }^{-1} \varvec{\Sigma }_0 \right| - p, \end{aligned}$$

where \(\varvec{\Sigma }\) is the scale matrix under the alternative hypothesis and \(\varvec{\Sigma }_0\) is the scale matrix under the null hypothesis. Since for the simulations we assumed \(\varvec{\Sigma }_0={\textbf{I}}_p\) in (7), the above function reduces to

$$\begin{aligned} \zeta (\varvec{\Sigma }) =\textrm{tr}\,\varvec{\Sigma }^{-1} -\ln \left| \varvec{\Sigma }^{-1} \right| - p. \end{aligned}$$
(17)

Observing that Stein's loss function is not bounded above, we use as a discrepancy measure \(\eta (\varvec{\Sigma }) = 1 - 1/(1+\zeta (\varvec{\Sigma }))\), which maps the range \([0,\infty )\) of \(\zeta \) onto the interval [0,1).
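A direct Python transcription of (17) and of the bounded discrepancy is given below; the function names are ours.

```python
import numpy as np

def stein_loss(Sigma):
    """zeta(Sigma) of (17), i.e. Stein's loss against Sigma_0 = I_p."""
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet_inv = np.linalg.slogdet(Sigma_inv)  # ln|Sigma^{-1}|, computed stably
    return np.trace(Sigma_inv) - logdet_inv - Sigma.shape[0]

def discrepancy(Sigma):
    """eta(Sigma) = 1 - 1/(1 + zeta(Sigma)), taking values in [0, 1)."""
    return 1 - 1 / (1 + stein_loss(Sigma))
```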

In the power comparison we first set the parameters of the experiment as \(p=3\), \(\nu =3,10\) and \(n =10,25,50\). We generate 100 positive definite matrices \(\varvec{\Sigma }\), for which the discrepancies \(\eta (\varvec{\Sigma })\) are computed. Note that since \(\eta (\varvec{\Sigma })\) is a function of \(\varvec{\Sigma }^{-1}\), and since obviously \(\varvec{\Sigma }_0^{-1}=\varvec{\Sigma }_0\) is a diagonal matrix, we randomly choose tridiagonal Toeplitz matrices whose positive diagonal entries dominate the off-diagonal elements strongly enough to guarantee positive definiteness, and we obtain \(\varvec{\Sigma }\) by inverting the generated matrices; see the sketch below. Then, for each \(\varvec{\Sigma }\), we generate 10,000 observation matrices from \(t_{p,\nu }(\varvec{\mu },\varvec{\Sigma })\), and for every generated matrix we test the hypothesis (7) using the quantiles of the empirical null distribution of the relevant statistic. In all comparisons the significance level 0.05 is used. In this way we obtain 100 values of the power of each test, computed as the ratio of the number of rejected null hypotheses to the number of simulation runs (10,000). The results are presented in Figs. 14 and 15. Since there are no significant differences between the powers of the tests for \(\nu =3\) and \(\nu =10\), we repeat the above procedure for \(p=9\) with \(\nu =3\) only. The results for \(n=12,25,50\) are presented in Fig. 16.
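The two generating steps of this design can be sketched in Python as follows. The function names and the particular ranges for the Toeplitz entries are our assumptions; the condition a > 2b is one sufficient choice for positive definiteness, since a tridiagonal Toeplitz matrix with diagonal a and off-diagonal b has eigenvalues \(a + 2b\cos (k\pi /(p+1))\).

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(2024)

def random_sigma(p):
    """One alternative Sigma: the inverse of a random positive definite
    tridiagonal Toeplitz matrix (a > 2b > 0 guarantees positive
    definiteness); requires p >= 2."""
    a = rng.uniform(1.0, 2.0)      # diagonal entry
    b = rng.uniform(0.0, a / 2)    # off-diagonal entry, b < a/2
    first_col = np.zeros(p)
    first_col[0], first_col[1] = a, b
    return np.linalg.inv(toeplitz(first_col))

def rmvt(n, mu, Sigma, nu):
    """n draws from t_{p,nu}(mu, Sigma) with Sigma as scale matrix:
    a normal vector divided by an independent sqrt(chi^2_nu / nu)."""
    p = len(mu)
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    g = rng.chisquare(nu, size=n) / nu
    return mu + Z / np.sqrt(g)[:, None]
```

For each generated \(\varvec{\Sigma }\) the power is then estimated as the fraction of the 10,000 simulated samples for which the statistic exceeds the empirical null critical value.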

In all these graphs the power shows an upward trend as the discrepancy increases. Comparing LRT and RST, the powers of RST vary slightly more across alternatives of equal discrepancy than those of LRT. A similar phenomenon was observed by Filipiak et al. (2024), where various discrepancy measures were studied in the context of testing separability under doubly multivariate models. Moreover, in Fig. 17 RST has lower power than LRT, especially for small sample sizes; nevertheless, there are still alternatives for which the power of RST exceeds that of LRT.

In the case of WT and \(\hbox {WT}^*\) it can be noted that for two equally distant matrices \(\varvec{\Sigma }\) the power differs significantly and is often below the nominal significance level, even for large samples, which means that both tests are biased. A similar observation for testing independence under a block compound symmetry structure in a doubly multivariate normal model was made in Filipiak et al. (2023). It should also be mentioned that the power of WT increases very slowly with the discrepancy, and for small sample sizes it stays below 0.5 even when the discrepancy becomes large.

Fig. 14 Empirical power of LRT, RST, WT and \(\hbox {WT}^*\) for \(\nu =3\), \(p=3\), \(n=10,25,50\) (in columns)

Fig. 15 Empirical power of LRT, RST, WT and \(\hbox {WT}^*\) for \(\nu =10\), \(p=3\), \(n=10,25,50\) (in columns)

Fig. 16 Empirical power of LRT, RST, WT and \(\hbox {WT}^*\) for \(\nu =3\), \(p=9\), \(n=12,25,50\) (in columns)

Fig. 17 Empirical power of LRT (blue) and RST (red) for \(p=3\), \(\nu =3\) (first row), \(p=3\), \(\nu =10\) (second row), \(p=9\), \(\nu =3\) (third row) for various values of sample size

7 Test statistics under unknown degrees of freedom

Following McLachlan and Krishnan (1997, Sect. 5.8), to estimate \(\nu \) in the multivariate t-distribution it is enough to compute \(\widehat{\varvec{\mu }}\) and \(\widehat{\varvec{\Sigma }}\) as the solutions of (6), with \(\nu \) replaced in each step by \(\nu ^{(k)}\), obtained as the solution of

$$\begin{aligned} \Psi \left( \tfrac{\nu }{2}\right) -\ln \tfrac{\nu }{2} = 1+\tfrac{1}{n} \sum \limits _{i=1}^n \left[ \ln \frac{\nu ^{(k-1)}+p}{\nu ^{(k-1)}+\delta _i^{(k)}} -\frac{\nu ^{(k-1)}+p}{\nu ^{(k-1)}+\delta _i^{(k)}}\right] +\Psi \left( \frac{\nu ^{(k-1)}+p}{2}\right) -\ln \frac{\nu ^{(k-1)}+p}{2}, \end{aligned}$$
(18)

with \(\Psi (\cdot )\) being the digamma function and \(\delta _i^{(k)}=({\textbf{x}}_i-\varvec{\mu }^{(k)})'\varvec{\Sigma }^{(k)^{-1}}({\textbf{x}}_i-\varvec{\mu }^{(k)})\). Similarly, the ML estimator of \(\nu \) under the null hypothesis is simply the solution of (8) with \(\nu \) replaced by \(\nu ^{(k)}\) obtained as the solution of (18).
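One update of \(\nu \) via (18) amounts to a one-dimensional root search. A minimal Python sketch is given below; the function name is ours, and the bracket (2, 1000) reflects our assumption that the solution stays in the range where the covariance matrix exists.

```python
import numpy as np
from scipy.special import psi
from scipy.optimize import brentq

def update_nu(nu_prev, delta, p, lo=2.0001, hi=1000.0):
    """Solve Eq. (18) for nu^{(k)}: nu_prev is nu^{(k-1)} and delta is the
    array of squared Mahalanobis distances delta_i^{(k)}."""
    w = (nu_prev + p) / (nu_prev + delta)      # (nu+p)/(nu+delta_i)
    rhs = (1 + np.mean(np.log(w) - w)
           + psi((nu_prev + p) / 2) - np.log((nu_prev + p) / 2))
    # left-hand side psi(nu/2) - ln(nu/2) is increasing in nu, so the
    # root in the bracket (if it exists there) is unique
    return brentq(lambda nu: psi(nu / 2) - np.log(nu / 2) - rhs, lo, hi)
```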

Denoting by \(\widehat{\varvec{\mu }}\), \(\widehat{\varvec{\Sigma }}\) and \(\widehat{\nu }\) the ML estimators under \(\textrm{H}_1\), and by \(\widehat{\varvec{\mu }}_0\) and \(\widehat{\nu }_0\) the respective ML estimators under \(\textrm{H}_0\), we can formulate the following test statistics for testing (7) with no constraints imposed on \(\varvec{\mu }\):

  1. (a)
    $$\begin{aligned} \begin{array}{rcl} \textrm{LRT}({\textbf{X}}, \varvec{\Sigma }_0) &{}=&{} n\left[ p\ln \frac{\widehat{\nu }_0}{\widehat{\nu }} +2\ln \frac{\Gamma (\frac{\widehat{\nu }+p}{2})\Gamma (\frac{\widehat{\nu }_0}{2})}{\Gamma (\frac{\widehat{\nu }_0+p}{2})\Gamma (\frac{\widehat{\nu }}{2})} + \ln \frac{|\varvec{\Sigma }_0|}{|\widehat{\varvec{\Sigma }}|} \right] \\ &{} &{} \displaystyle + (\widehat{\nu }_0+p)\displaystyle \sum _{i=1}^n \ln \left[ 1+\tfrac{1}{\widehat{\nu }_0} ({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)^\prime \varvec{\Sigma }_0^{-1}({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)\right] \\ &{}&{} \displaystyle -(\widehat{\nu }+p)\sum _{i=1}^n \ln \left[ 1+\tfrac{1}{\widehat{\nu }} ({\textbf{x}}_i-\widehat{\varvec{\mu }})^\prime \widehat{\varvec{\Sigma }}^{-1}({\textbf{x}}_i-\widehat{\varvec{\mu }})\right] , \end{array} \end{aligned}$$
  2. (b)
    $$\begin{aligned} \textrm{RST}({\textbf{X}},\varvec{\Sigma }_0)= \frac{n(\widehat{\nu }_0+p+2)}{2\widehat{\nu }_0(\widehat{\nu }_0+p)} \left\{ \widehat{\nu }_0\cdot \textrm{tr}[({\textbf{V}}\varvec{\Sigma }_0^{-1})^2]+\textrm{tr}^2({\textbf{V}}\varvec{\Sigma }_0^{-1}) \right\} \end{aligned}$$

    with

    $$\begin{aligned} {\textbf{V}}={\textbf{V}}({\textbf{X}},\varvec{\Sigma }_0)= \varvec{\Sigma }_0-\tfrac{\widehat{\nu }_0+p}{n\widehat{\nu }_0}\sum \limits _{i=1}^n \frac{({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)^\prime }{1+\frac{1}{\widehat{\nu }_0}({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)^\prime \varvec{\Sigma }_0^{-1}({\textbf{x}}_i-\widehat{\varvec{\mu }}_0)}, \end{aligned}$$
  3. (c)
    $$\begin{aligned} \textrm{WT}({\textbf{X}},\varvec{\Sigma }_0)= \tfrac{n}{2(\widehat{\nu }+p+2)} \left\{ (\widehat{\nu }+p)\cdot \textrm{tr}\left[ \left( {\textbf{I}}_p-\varvec{\Sigma }_0\widehat{\varvec{\Sigma }}^{-1}\right) ^2\right] - \textrm{tr}^2\left( {\textbf{I}}_p-\varvec{\Sigma }_0\widehat{\varvec{\Sigma }}^{-1}\right) \right\} , \end{aligned}$$
  4. (d)
    $$\begin{aligned} \textrm{WT}^*({\textbf{X}},\varvec{\Sigma }_0)= \tfrac{n}{2(\widehat{\nu }_0+p+2)} \left\{ (\widehat{\nu }_0+p)\cdot \textrm{tr}\left[ \left( {\textbf{I}}_p-\widehat{\varvec{\Sigma }}\varvec{\Sigma }_0^{-1}\right) ^2\right] - \textrm{tr}^2\left( {\textbf{I}}_p-\widehat{\varvec{\Sigma }}\varvec{\Sigma }_0^{-1}\right) \right\} , \end{aligned}$$

where \({\textbf{X}}= ({\textbf{x}}_1, \dots , {\textbf{x}}_n)\) is a random sample from \(t_{p,\nu }(\varvec{\mu },\varvec{\Sigma })\) with \(\nu >2\) unknown.
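For instance, the RST statistic in (b) can be sketched in Python as follows, under the assumption that the \(\textrm{H}_0\) estimates mu0_hat and nu0_hat have already been computed (e.g., by the iteration sketched above); the function name is ours.

```python
import numpy as np

def rst_unknown_nu(X, Sigma0, mu0_hat, nu0_hat):
    """RST of (b): plug the H0 estimates into V(X, Sigma0) and combine
    tr[(V Sigma0^{-1})^2] and tr^2(V Sigma0^{-1}). X is n x p."""
    n, p = X.shape
    R = X - mu0_hat                                # rows: x_i - mu0_hat
    S0_inv = np.linalg.inv(Sigma0)
    d = np.einsum('ij,jk,ik->i', R, S0_inv, R)     # quadratic forms under H0
    w = (nu0_hat + p) / (n * nu0_hat * (1 + d / nu0_hat))
    V = Sigma0 - (R * w[:, None]).T @ R
    M = V @ S0_inv
    c = n * (nu0_hat + p + 2) / (2 * nu0_hat * (nu0_hat + p))
    return c * (nu0_hat * np.trace(M @ M) + np.trace(M) ** 2)
```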

Similarly to the previous cases, due to Wilks (1938), when \(n\rightarrow \infty \) and \(\textrm{H}_{0}\) holds, the distributions of \(\textrm{LRT}({\textbf{X}},\varvec{\Sigma }_0)\), \(\textrm{RST}({\textbf{X}},\varvec{\Sigma }_0)\) and \(\textrm{WT}({\textbf{X}},\varvec{\Sigma }_0)\) tend to the chi-square distribution with \(p(p+1)/2\) degrees of freedom; the same convergence holds for \(\textrm{WT}^*({\textbf{X}},\varvec{\Sigma }_0)\). The distributions of the above test statistics for data generated from \(t_{3,3}({\textbf{0}},{\textbf{I}}_3)\), together with the limiting chi-square distribution, are presented in Fig. 18. Compared with the corresponding distributions under known \(\nu \), the test statistics behave similarly.

Finally, note that all simulations were performed in Mathematica; however, an algorithm for the ML estimation of the multivariate t-distribution with unknown degrees of freedom is also available in the R package fitHeavyTail.

Fig. 18 Empirical null distributions of LRT, RST, WT and \(\hbox {WT}^*\) (in columns) for \(p=3\), \(n=10,25,50\) (in rows), along with limiting \(\chi ^2_6\), under the assumption of unknown \(\nu \)

8 Discussion and conclusions

In this paper we have derived the LRT, RST, WT and \(\hbox {WT}^*\) statistics for testing the covariance structure under the null hypothesis \(\textrm{H}_{0}:\varvec{\Sigma }=\varvec{\Sigma }_0\) for a multivariate \(t_{p,\nu }\)-distribution. Our main interest was in the situation when the number of degrees of freedom \(\nu \) is as small as possible (\(\nu >2\), to guarantee the existence of the covariance matrix), in order to examine possible differences between the ML estimates under the \(t_{p,\nu }\)-distribution and the corresponding normal distribution.

In the definition of the density of the \(t_{p,\nu }\)-distribution we followed the classical convention in which \(\varvec{\Sigma }\) is the scale matrix (cf. Kotz and Nadarajah (2004)). In some papers the density of the \(t_{p,\nu }\)-distribution is defined in a different form, in which \(\varvec{\Sigma }\) is the covariance matrix (Sutradhar (1993), Osorio et al. (2023), for example). Note, however, that when testing covariance structures the behavior of the test statistics depends on the scale matrix \(\varvec{\Sigma }\), and the univariate multiplier appearing in the expression for the covariance matrix under our definition does not influence the convergence properties of the statistics. Note also that for a fixed number of degrees of freedom there is no need to estimate \(\nu \), and thus the reparameterization \(\eta =1/\nu \) does not influence our results at all; for unknown \(\nu \) we followed the EM algorithm given in McLachlan and Krishnan (1997), where direct estimation of the degrees of freedom is presented.

From the simulation studies concerning the convergence of the test statistics to the limiting distribution, we note that all four test statistics converge to the limiting chi-square distribution; however, RST outperforms all the remaining tests. Moreover, the forms of all considered test statistics for a \(t_{p,\nu }\)-distribution differ significantly from the corresponding expressions in the multivariate normal case, which is caused by the different form of the likelihood function and the different MLEs of the unknown parameters. Thus, one has to be careful with assumptions about the population distribution when testing the structure of the covariance matrix. Nevertheless, despite the difference between RST and MRST, their empirical distributions are quite similar and close to the theoretical distribution even for small sample sizes. Thus, even if one were mistaken about the density of the observation matrix, MRST can be seen as another test, more conservative than RST; indeed, MRST can be viewed as a special case of Neyman's score test statistic (Neyman 1959; Sutradhar 1993). Taking into account the quick convergence of the empirical Type I error of RST to the nominal significance level, we recommend its use by practitioners, pointing out that the power of RST does not seem to be significantly lower than that of LRT.

In our considerations presented in Sect. 6 we assumed a fixed number of degrees of freedom; however, simulation studies show that increasing the degrees of freedom does not have much influence on the speed of convergence to the limiting chi-square distribution, or on power. This allows us to conclude that our findings remain valid even for unknown \(\nu \).

Finally, recall that we assumed a fixed dimension of the data, p, while the sample size tends to infinity. Studying the behavior of the test statistics under a high-dimensional setup, when p and n both tend to infinity, may also be of interest. Note that this problem has been addressed by many authors, usually under normality (see, e.g. Bai and Silverstein 2004; Ledoit and Wolf 2002; Srivastava 2005; Yao et al. 2015), but some references can also be found for non-Gaussian variables (Bai et al. 2009; Jiang 2016, for example). Usually in such cases the central limit theorem is involved and additional assumptions related to the moments of the distributions must be taken into account. Nevertheless, the problem of growing dimension under the multivariate t-distribution will be the topic of future research.