6.1. Introduction

It is assumed that the readers are familiar with the concept of testing statistical hypotheses on the parameters of a real scalar normal density or independent real scalar normal densities. Those who are unfamiliar with these concepts or require a refresher may consult the textbook by Mathai and Haubold (2017) on basic “Probability and Statistics” [De Gruyter, Germany, 2017, free download]. Initially, we will only employ the likelihood ratio criterion for testing hypotheses on the parameters of one or more real multivariate Gaussian (or normal) distributions. All of our tests will be based on a simple random sample of size n from a p-variate nonsingular Gaussian distribution, that is, the p × 1 vectors X 1, …, X n constituting the sample are iid (independently and identically distributed) as X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n, when a single real Gaussian population is involved. The corresponding test criterion for the complex Gaussian case will also be mentioned in each section.

In this chapter, we will utilize the following notations. Lower-case letters such as x, y will be used to denote real scalar mathematical or random variables. No distinction will be made between mathematical and random variables. Capital letters such as X, Y  will denote real vector/matrix-variate variables, whether mathematical or random. A tilde placed on a letter as for instance \(\tilde {x}, \tilde {y}, \tilde {X}\) and \( \tilde {Y}\) will indicate that the variables are in the complex domain. No tilde will be used for constant matrices unless the point is to be stressed that the matrix concerned is in the complex domain. The other notations will be identical to those utilized in the previous chapters.

First, we consider certain problems related to testing hypotheses on the parameters of a p-variate real Gaussian population. Only the likelihood ratio criterion, also referred to as λ-criterion, will be utilized. Let L denote the joint density of the sample values in a simple random sample of size n, namely, X 1, …, X n, which are iid N p(μ, Σ), Σ > O. Then, as was previously established,

$$\displaystyle \begin{aligned}L=\prod_{j=1}^n\frac{{\mathrm{e}}^{-\frac{1}{2}(X_j-\mu)'\varSigma^{-1}(X_j-\mu)}}{(2\pi)^{\frac{p}{2}}|\varSigma|{}^{\frac{1}{2}}}=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu)'\varSigma^{-1}(\bar{X}-\mu)}}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}},{}\end{aligned} $$
(6.1.1)

where \(S=\sum _{j=1}^n(X_j-\bar {X})(X_j-\bar {X})'\) is the sample sum of products matrix and \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) is the sample average, n being the sample size. As well, we have already determined that the maximum likelihood estimators (MLE’s) of μ and Σ are \(\hat {\mu }=\bar {X}\) and \(\hat {\varSigma }=\frac {1}{n}S,\) the sample covariance matrix. Consider the parameter space

$$\displaystyle \begin{aligned}\Omega=\{(\mu,~\varSigma)|\, \varSigma>O,~\mu'=(\mu_1,\ldots,\mu_p),~-\infty<\mu_j<\infty,~ j=1,\ldots,p\}. \end{aligned}$$

The maximum value of L within Ω is obtained by substituting the MLE’s of the parameters into L, and since \((\bar {X}-\hat {\mu })=(\bar {X}-\bar {X})=O\) and \({\mathrm{tr}}(\hat {\varSigma }^{-1}S)={\mathrm{tr}}(nI_p)=np,\)

$$\displaystyle \begin{aligned}\max_{\Omega}L=\frac{{\mathrm{e}}^{-\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|\frac{1}{n}S|{}^{\frac{n}{2}}}=\frac{{\mathrm{e}}^{-\frac{np}{2}}n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S|{}^{\frac{n}{2}}}\,.{}\end{aligned} $$
(6.1.2)

Under any given hypothesis on μ or Σ, the parameter space is reduced to a subspace ω in Ω or ω ⊂ Ω. For example, if H o : μ = μ o where μ o is a given vector, then the parameter space under this null hypothesis reduces to ω = {(μ, Σ)| μ = μ o, Σ > O}⊂ Ω, “null hypothesis” being a technical term used to refer to the hypothesis being tested. The alternative hypothesis against which the null hypothesis is tested is usually denoted by H 1. If μ = μ o specifies H o, then a natural alternative is H 1 : μ ≠ μ o. One of two things can happen when considering the maximum of the likelihood function under H o. The overall maximum may occur in ω or it may be attained outside of ω but inside Ω. If the null hypothesis H o is actually true, the maximum attained in ω will essentially agree with the overall maximum attained in Ω. If there are several local maxima, then the overall maximum or supremum is taken. The λ-criterion is defined as follows:

$$\displaystyle \begin{aligned}\lambda=\frac{{\mathrm{sup}}_{\omega}L}{{\mathrm{sup}}_{\Omega}L},~~ 0<\lambda\le 1.{}\end{aligned} $$
(6.1.3)

If the null hypothesis is true, then λ = 1. Accordingly, an observed value of λ that is close to 0 in a testing situation indicates that the null hypothesis H o is incorrect and should then be rejected. Hence, the test criterion under the likelihood ratio test is to “reject H o for 0 < λ ≤ λ o”, that is, for small values of λ, so that, under H o, the coverage probability over this interval is equal to the significance level α or the probability of rejecting H o when H o is true, that is, Pr{0 < λ ≤ λ o | H o} = α for a pre-assigned α, which is also known as the size of the critical region or the size of the type-1 error. However, rejecting H o when it is not actually true or when the alternative H 1 is true is a correct decision whose probability is known as the power of the test and written as 1 − β where β is the probability of committing a type-2 error or the error of not rejecting H o when H o is not true. Thus we have

$$\displaystyle \begin{aligned}Pr\{0<\lambda\le \lambda_o\,|\,H_o\}=\alpha \mbox{ and } Pr\{0<\lambda\le \lambda_o\,|\,H_1\}=1-\beta.{}\end{aligned} $$
(6.1.4)

When we preassign α = 0.05, we are allowing a tolerance of 5% for the probability of committing the error of rejecting H o when it is actually true and we say that we have a test at the 5% significance level. Usually, we set α as 0.05 or 0.01. Alternatively, we can allow α to vary and calculate what is known as the p-value when carrying out a test. Such is the principle underlying the likelihood ratio test, the resulting test criterion being referred to as the λ-criterion.

In the complex case, a tilde will be placed above λ and L, (6.1.3) and (6.1.4) remaining essentially the same:

$$\displaystyle \begin{aligned}\tilde{\lambda}=\frac{{\mathrm{sup}}_{\omega}\tilde{L}}{{\mathrm{sup}}_{\Omega}\tilde{L}},~ 0<|\tilde{\lambda}|\le 1,\end{aligned}$$
(6.1a.1)

and

$$\displaystyle \begin{aligned}Pr\{0<|\tilde{\lambda}|\le \lambda_o\,|\, H_o\}=\alpha,~ Pr\{0<|\tilde{\lambda}|\le \lambda_o\,|\, H_1\}=1-\beta\end{aligned}$$
(6.1a.2)

where α is the size or significance level of the test and 1 − β, the power of the test.

6.2. Testing H o : μ = μ o (Given) When Σ is Known, the Real N p(μ, Σ) Case

When Σ is known, the only parameter to estimate is μ, its MLE being \(\bar {X}\). Hence, the maximum in Ω is the following:

$$\displaystyle \begin{aligned}{\mathrm{sup}}_{\Omega}L=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)}}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}}.{}\end{aligned} $$
(6.2.1)

In this case, μ is also specified under the null hypothesis H o, so that there is no parameter to estimate. Accordingly,

$$\displaystyle \begin{aligned} {\mathrm{sup}}_{\omega}L&=\frac{{\mathrm{e}}^{-\frac{1}{2}\sum_{j=1}^n(X_j-\mu_o)'\varSigma^{-1}(X_j-\mu_o)}}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}}\\ &=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu_o)'\varSigma^{-1}(\bar{X}-\mu_o)}}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}}. \end{aligned} $$
(6.2.2)

Thus,

$$\displaystyle \begin{aligned}\lambda=\frac{{\mathrm{sup}}_{\omega}L}{{\mathrm{sup}}_{\Omega}L}={\mathrm{e}}^{-\frac{n}{2}(\bar{X}-\mu_o)'\varSigma^{-1}(\bar{X}-\mu_o)},{}\end{aligned} $$
(6.2.3)

and small values of λ correspond to large values of \(\frac {n}{2}(\bar {X}-\mu _o)'\varSigma ^{-1}(\bar {X}-\mu _o)\). When X j ∼ N p(μ, Σ), Σ > O, it has already been established that \(\bar {X}\sim N_p(\mu ,~\frac {1}{n}\varSigma ),~\varSigma >O\). As well, \(n(\bar {X}-\mu _o)'\varSigma ^{-1}(\bar {X}-\mu _o)\) is the exponent in a p-variate real normal density under H o, which has already been shown to have a real chisquare distribution with p degrees of freedom or

$$\displaystyle \begin{aligned}n(\bar{X}-\mu_o)'\varSigma^{-1}(\bar{X}-\mu_o)\sim \chi_p^2. \end{aligned}$$

Hence, the test criterion is

$$\displaystyle \begin{aligned}\mbox{ Reject}\ H_o\ \mbox{if }n(\bar{X}-\mu_o)'\varSigma^{-1}(\bar{X}-\mu_o)\ge \chi_{p,\alpha}^2,\mbox{ with } Pr\{\chi_p^2\ge \chi_{p,~\alpha}^2\}=\alpha.{}\end{aligned} $$
(6.2.4)

Under the alternative hypothesis, the distribution of the test statistic is a noncentral chisquare with p degrees of freedom and non-centrality parameter \(\lambda =\frac {n}{2}(\mu -\mu _o)'\varSigma ^{-1}(\mu -\mu _o)\).
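
As a computational aid (not part of the original derivation), the criterion (6.2.4) may be carried out numerically as sketched below in Python; the numpy and scipy libraries are assumed to be available, and the function name and the arrays X, mu_o and Sigma are illustrative placeholders to be supplied by the user.

```python
# A minimal sketch of the criterion in (6.2.4): reject H_o: mu = mu_o (Sigma known)
# when n (Xbar - mu_o)' Sigma^{-1} (Xbar - mu_o) >= chi^2_{p, alpha}.
import numpy as np
from scipy.stats import chi2

def test_mean_known_sigma(X, mu_o, Sigma, alpha=0.05):
    """X is a p x n matrix whose columns are the observed sample vectors."""
    p, n = X.shape
    Xbar = X.mean(axis=1)                      # sample average
    d = Xbar - mu_o
    u = n * d @ np.linalg.solve(Sigma, d)      # n (Xbar - mu_o)' Sigma^{-1} (Xbar - mu_o)
    crit = chi2.ppf(1 - alpha, df=p)           # chi^2_{p, alpha}
    return {"statistic": u, "critical_value": crit,
            "p_value": chi2.sf(u, df=p), "reject": u >= crit}
```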

Example 6.2.1

For example, suppose that we have a sample of size 5 from a population that has a trivariate normal distribution and let the significance level α be 0.05. Let μ o, the hypothesized mean value vector specified by the null hypothesis, the known covariance matrix Σ, and the five observation vectors X 1, …, X 5 be the following:

the inverse of Σ having been evaluated via elementary transformations. The sample average, \(\frac {1}{5}(X_1+\cdots +X_5)\) denoted by \(\bar {X}\), is

and

For testing H o, the following test statistic has to be evaluated:

As per our criterion, H o should be rejected if \(8\ge \chi ^2_{p,\alpha }\). Since \(\chi ^2_{p,\alpha }=\chi ^2_{3,~0.05}=7.81,\) this critical value being available from a chisquare table, H o : μ = μ o should be rejected at the specified significance level. Moreover, in this case, the p-value is \(Pr\{\chi ^2_3\ge 8\}\approx 0.046,\) which can be evaluated by interpolation from the percentiles provided in a chisquare table or by making use of statistical packages such as R.

6.2.1. Paired variables and linear functions

Let Y 1, …, Y k be p × 1 vectors having their own p-variate distributions which are not known. However, suppose that a certain linear function X = a 1 Y 1 + ⋯ + a k Y k is known to have a p-variate real Gaussian distribution with mean value vector E[X] = μ and covariance matrix Cov(X) = Σ, Σ > O, that is, X = a 1 Y 1 + ⋯ + a k Y k ∼ N p(μ, Σ), Σ > O, where a 1, …, a k are fixed known scalar constants. An example of this type is X = Y 1 − Y 2 where Y 1 consists of measurements on p attributes before subjecting those attributes to a certain process, such as administering a drug to a patient, and Y 2 consists of the measurements on the same attributes after the process is completed. We would like to examine the difference Y 1 − Y 2 to study the effect of the process on these characteristics. If it is reasonable to assume that this difference X = Y 1 − Y 2 is N p(μ, Σ), Σ > O, then we could test hypotheses on E[X] = μ. When Σ is known, the general problem reduces to that discussed in Sect. 6.2. Assuming that we have iid variables on Y 1, …, Y k, we would evaluate the corresponding values of X, which produces iid variables on X, that is, a simple random sample of size n from X = a 1 Y 1 + ⋯ + a k Y k. Thus, when Σ is known, letting \(u=n(\bar {X}-\mu _o)'\varSigma ^{-1}(\bar {X}-\mu _o)\sim \chi _p^2\,\) where \(\bar {X}\) denotes the sample average, the test would be carried out as follows at significance level α:

$$\displaystyle \begin{aligned} \mbox{Reject}\ H_o: \mu=\mu_o\ \mbox{(specified) when}\ u\ge \chi_{p,~\alpha}^2,\mbox{ with } Pr\{\chi_p^2\ge \chi_{p,~\alpha}^2\}=\alpha, \end{aligned} $$
(6.2.5)

the non-null distribution of the test statistic u being a non-central chisquare.

Example 6.2.2

Three variables x 1 =  systolic pressure, x 2 =  diastolic pressure and x 3 =  weight are monitored after administering a drug for the reduction of all these p = 3 variables. Suppose that a sample of n = 5 randomly selected individuals are given the medication for one week. The following five pairs of observations on each of the three variables were obtained before and after the administration of the medication:

Let X denote the difference, that is, X is equal to the reading before the medication was administered minus the reading after the medication could take effect. The observation vectors on X are then

In this case, X 1, …, X 5 are observations on iid variables. We are going to assume that these iid variables are coming from a population whose distribution is N 3(μ, Σ), Σ > O, where Σ is known. Let the sample average \(\bar {X}=\frac {1}{5}(X_1+\cdots +X_5)\), the hypothesized mean value vector specified by the null hypothesis H o : μ = μ o, and the known covariance matrix Σ be as follows:

Let us evaluate \(\bar {X}-\mu _o\) and \( n(\bar {X}-\mu _o)'\varSigma ^{-1}(\bar {X}-\mu _o)\) which are needed for testing the hypothesis H o : μ = μ o:

Let us test H o at the significance level α = 0.05. The critical value which can readily be found in a chisquare table is \(\chi ^2_{p,~\alpha }=\chi ^2_{3,~0.05}=7.81\). As per our criterion, we reject H o if \(8.4\ge \chi ^2_{p,~\alpha }\); since 8.4 > 7.81, we reject H o. The p-value in this case is \(Pr\{\chi ^2_p\ge 8.4\}=Pr\{\chi ^2_3\ge 8.4\}\approx 0.04\).

6.2.2. Independent Gaussian populations

Let Y j ∼ N p(μ (j), Σ j), Σ j > O, j = 1, …, k, and let these k populations be independently distributed. Assume that a simple random sample of size n j from Y j is available for j = 1, …, k; then these samples can be represented by the p-vectors Y jq, q = 1, …, n j, which are iid as Y j1, for j = 1, …, k. Consider a given linear function X = a 1 Y 1 + ⋯ + a k Y k where X is p × 1 and the Y j’s are taken in a given order. Let \(U=a_1\bar {Y}_1+\cdots +a_k\bar {Y}_k\) where \(\bar {Y}_j=\frac {1}{n_j}\sum _{q=1}^{n_j}Y_{jq}\) for j = 1, …, k. Then E[U] = a 1 μ (1) + ⋯ + a k μ (k) = μ (say), where a 1, …, a k are given real scalar constants. The covariance matrix of U is \({\mathrm{Cov}}(U)=\frac {a_1^2}{n_1}\varSigma _1+\cdots +\frac {a_k^2}{n_k}\varSigma _k=\frac {1}{n}\varSigma \) (say), where n is merely a symbol defined by this relation. Consider the problem of testing hypotheses on μ when Σ is known or when a j, Σ j, j = 1, …, k, are known. Let H o : μ = μ o (specified), in the sense that μ (j) is a known vector for j = 1, …, k, when Σ is known. Then, under H o, all the parameters are known and the standardized U is observable, the test statistic being

$$\displaystyle \begin{aligned}\sum_{j=1}^ka_jn_j(\bar{Y}_j-\mu_{(j)})'\varSigma_j^{-1}(\bar{Y}_j-\mu_{(j)})\sim\sum_{j=1}^ka_j\chi_p^{2(j)}{}\end{aligned} $$
(6.2.6)

where \(\chi _p^{2(j)},~ j=1,\ldots ,k,\) denote independent chisquare random variables, each having p degrees of freedom. However, since this is a linear function of independent chisquare variables, even the null distribution is complicated. Thus, only the case of two independent populations will be examined.

Consider the problem of testing the hypothesis μ (1) − μ (2) = δ (a given vector) when there are two independent normal populations sharing a common covariance matrix Σ (known). Then \(U=\bar {Y}_1-\bar {Y}_2\) with E[U] = μ (1) − μ (2) = δ (given) under H o and \({\mathrm{Cov}}(U)=(\frac {1}{n_1}+\frac {1}{n_2})\varSigma =\frac {n_1+n_2}{n_1n_2}\varSigma ,\) the test statistic, denoted by v, being

$$\displaystyle \begin{aligned} v=\frac{n_1n_2}{n_1+n_2}(U-\delta)'\varSigma^{-1}(U-\delta)=\frac{n_1n_2}{n_1+n_2}(\bar{Y}_1-\bar{Y}_2-\delta)'\varSigma^{-1}(\bar{Y}_1-\bar{Y}_2-\delta) \sim \chi_p^2. \end{aligned} $$
(6.2.7)

The resulting test criterion is

$$\displaystyle \begin{aligned}\mbox{Reject}\ H_o\ \mbox{if the observed value of } v\ge \chi_{p,~\alpha}^2\ \mbox{with}\ Pr\{\chi_p^2\ge \chi_{p,\alpha}^2\}=\alpha.{}\end{aligned} $$
(6.2.8)
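
A parallel numerical sketch, under the same assumptions as before (Python with numpy and scipy; all names and inputs are placeholders), applies the two-sample criterion (6.2.7)–(6.2.8) when the common covariance matrix is known.

```python
# A minimal sketch of (6.2.7)-(6.2.8) for H_o: mu_(1) - mu_(2) = delta, Sigma known.
import numpy as np
from scipy.stats import chi2

def test_mean_difference_known_sigma(Y1, Y2, delta, Sigma, alpha=0.05):
    """Y1 (p x n1) and Y2 (p x n2) hold the two samples as columns."""
    p, n1 = Y1.shape
    _, n2 = Y2.shape
    d = Y1.mean(axis=1) - Y2.mean(axis=1) - delta
    v = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(Sigma, d)   # statistic in (6.2.7)
    crit = chi2.ppf(1 - alpha, df=p)
    return {"statistic": v, "critical_value": crit,
            "p_value": chi2.sf(v, df=p), "reject": v >= crit}
```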

Example 6.2.3

Let Y 1 ∼ N 3(μ (1), Σ) and Y 2 ∼ N 3(μ (2), Σ) represent independently distributed normal populations having a known common covariance matrix Σ. The null hypothesis is H o : μ (1) − μ (2) = δ where δ is specified. Denote the observation vectors on Y 1 and Y 2 by Y 1j, j = 1, …, n 1 and Y 2j, j = 1, …, n 2, respectively, and let the sample sizes be n 1 = 4 and n 2 = 5. Let those observation vectors be

and the common covariance matrix Σ be

Let the hypothesized vector under H o : μ (1) − μ (2) = δ be δ′ = (1, 1, 2). In order to test this null hypothesis, the following quantities must be evaluated:

$$\displaystyle \begin{aligned} \bar{Y}_1&=\frac{1}{n_1}(Y_{11}+\cdots+Y_{1n_1})=\frac{1}{4}(Y_{11}+Y_{12}+Y_{13}+Y_{14}),\\ \bar{Y}_2&=\frac{1}{n_2}(Y_{21}+\cdots+Y_{2n_2})=\frac{1}{5}(Y_{21}+\cdots+Y_{25}),\\ U&=\bar{Y}_1-\bar{Y}_2, \ \,v=\frac{n_1n_2}{n_1+n_2}(U-\delta)'\varSigma^{-1}(U-\delta).\end{aligned} $$

They are

Then,

Let us test H o at the significance level α = 0.05. The critical value which is available from a chisquare table is \(\chi ^2_{p,~\alpha }=\chi ^2_{3,~0.05}=7.81\). As per our criterion, we reject H o if \(2.95\ge \chi ^2_{p,~\alpha }\); however, since 2.95 < 7.81, we cannot reject H o. The p-value in this case is \(Pr\{\chi ^2_p\ge 2.95\}=Pr\{\chi ^2_3\ge 2.95\}\approx 0.40,\) which can be determined by interpolation.

6.2a. Testing H o : μ = μ o (given) When Σ is Known, Complex Gaussian Case

The derivation of the λ-criterion in the complex domain is parallel to that provided for the real case. In the parameter space,

$$\displaystyle \begin{aligned}{\mathrm{sup}}_{\Omega}\tilde{L}=\frac{{\mathrm{e}}^{-{\mathrm{tr}}(\varSigma^{-1}\tilde{S})}}{\pi^{np}|{\mathrm{det}}(\varSigma)|{}^n}\end{aligned}$$
(6.2a.1)

and under H o : μ = μ o, a given vector,

$$\displaystyle \begin{aligned}{\mathrm{sup}}_{\omega}\tilde{L}=\frac{{\mathrm{e}}^{-{\mathrm{tr}}(\varSigma^{-1}\tilde{S})-n(\bar{\tilde{X}}-\mu_o)^{*}\varSigma^{-1}(\bar{\tilde{X}}-\mu_o)}}{\pi^{np}|{\mathrm{det}}(\varSigma)|{}^n}.\end{aligned}$$
(6.2a.2)

Accordingly,

$$\displaystyle \begin{aligned}\tilde{\lambda}=\frac{{\mathrm{sup}}_{\omega}\tilde{L}}{{\mathrm{sup}}_{\Omega}\tilde{L}}={\mathrm{e}}^{-n(\bar{\tilde{X}}-\mu_o)^{*}\varSigma^{-1}(\bar{\tilde{X}}-\mu_o)}.\end{aligned}$$
(6.2a.3)

Here as well, small values of \(\tilde {\lambda }\) correspond to large values of \(\tilde {y}\equiv n(\bar {\tilde {X}}-\mu _o)^{*}\varSigma ^{-1}(\bar {\tilde {X}}-\mu _o),\) which has a real gamma distribution with the parameters (α = p, β = 1) or a chisquare distribution with p degrees of freedom in the complex domain as described earlier so that \(2\tilde {y}\) has a real chisquare distribution having 2p degrees of freedom. Thus, a real chisquare table can be utilized for testing the null hypothesis H o, the criterion being

$$\displaystyle \begin{aligned}\mbox{Reject}\ H_o\ \mbox{if }2n(\bar{\tilde{X}}-\mu_o)^{*}\varSigma^{-1}(\bar{\tilde{X}}-\mu_o)\ge \chi_{2p,\alpha}^2,\ \mbox{with} \ Pr\{\chi_{2p}^2\ge \chi_{2p,\alpha}^2\}=\alpha.\end{aligned}$$
(6.2a.4)

The test criteria as well as the decisions are parallel to those obtained for the real case in the situations of paired values and in the case of independent populations. Accordingly, such test criteria and associated decisions will not be further discussed.
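
For completeness, a minimal sketch of the complex-case criterion (6.2a.4) follows, again assuming numpy and scipy and placeholder inputs; the Hermitian form is real, so only a numerically negligible imaginary part is discarded.

```python
# A minimal sketch of (6.2a.4): with the Hermitian positive definite Sigma known,
# 2 n (Xbar - mu_o)^* Sigma^{-1} (Xbar - mu_o) is referred to chi^2_{2p}.
import numpy as np
from scipy.stats import chi2

def test_mean_complex_known_sigma(X, mu_o, Sigma, alpha=0.05):
    """X is a p x n complex matrix whose columns are the observed sample vectors."""
    p, n = X.shape
    d = X.mean(axis=1) - mu_o
    y = np.real(np.conj(d) @ np.linalg.solve(Sigma, d))   # the Hermitian form is real
    stat = 2 * n * y
    crit = chi2.ppf(1 - alpha, df=2 * p)
    return {"statistic": stat, "critical_value": crit,
            "p_value": chi2.sf(stat, df=2 * p), "reject": stat >= crit}
```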

Example 6.2a.1

Let p = 2 and the 2 × 1 complex vector \(\tilde {X}\sim \tilde {N}_2(\tilde {\mu },~\tilde {\varSigma }),~ \tilde {\varSigma }=\tilde {\varSigma }^{*}>O\), with \(\tilde {\varSigma }\) assumed to be known. Consider the null hypothesis \(H_o:\tilde {\mu }=\tilde {\mu }_o\) where \(\tilde {\mu }_o\) is specified. Let the known \(\tilde {\varSigma }\) and the specified \(\tilde {\mu }_o\) be the following where \(i=\sqrt {(-1)}\):

Let the general \(\tilde {\mu }\) and general \(\tilde {X}\) be represented as follows for p = 2:

so that, for the given \(\tilde {\varSigma }\),

$$\displaystyle \begin{aligned}{\mathrm{det}}(\tilde{\varSigma})=(2)(3)-(1+i)(1-i)=6-(1^2+1^2)=4={\mathrm{det}}(\tilde{\varSigma}^{*})=|{\mathrm{det}}(\tilde{\varSigma})|. \end{aligned}$$

The exponent of the general density for p = 2, excluding − 1, is of the form \((\tilde {X}-\tilde {\mu })^{*}\tilde {\varSigma }^{-1}(\tilde {X}-\tilde {\mu })\). Further,

$$\displaystyle \begin{aligned}{}[(\tilde{X}-\tilde{\mu})^{*}\tilde{\varSigma}^{-1}(\tilde{X}-\tilde{\mu})]^{*}=(\tilde{X}-\tilde{\mu})^{*}\tilde{\varSigma}^{-1}(\tilde{X}-\tilde{\mu}) \end{aligned}$$

since both \(\tilde {\varSigma }\) and \(\tilde {\varSigma }^{-1}\) are Hermitian. Thus, the exponent, which is 1 × 1, is real and negative definite. The explicit form, excluding − 1, for p = 2 and the given covariance matrix \(\tilde {\varSigma }\), is the following:

$$\displaystyle \begin{aligned} Q&=\frac{1}{4}\{3[(x_1-\mu_1)^2+(y_1-\nu_1)^2]+2[(x_2-\mu_2)^2+(y_2-\nu_2)^2]\\ &\ \ \ \ \ \ \, +2[(x_1-\mu_1)(x_2-\mu_2)+(y_1-\nu_1)(y_2-\nu_2)]\},\end{aligned} $$

and the general density for p = 2 and this \(\tilde {\varSigma }\) is of the following form:

$$\displaystyle \begin{aligned}f(\tilde{X})=\frac{1}{4\pi^2}{\mathrm{e}}^{-Q} \end{aligned}$$

where the Q is as previously given. Let the following be an observed sample of size n = 4 from a \(\tilde {N}_2(\tilde {\mu }_o,~\tilde {\varSigma })\) population whose associated covariance matrix \(\tilde {\varSigma }\) is as previously specified:

Then,

Let us test the stated null hypothesis at the significance level α = 0.05. Since \(\chi ^2_{2p,\,\alpha }=\chi ^2_{4,\,0.05}=9.49\) and 46.5 > 9.49, we reject H o. In this case, the p-value is \(Pr\{\chi ^2_{2p}\ge 46.5\}=Pr\{\chi ^2_4\ge 46.5\}\approx 0\).

6.2.3. Test involving a subvector of a mean value vector when Σ is known

Let the p × 1 vector X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n, and the X j’s be independently distributed. Let the joint density of X j, j = 1, …, n, be denoted by L. Then, as was previously established,

$$\displaystyle \begin{aligned}L=\prod_{j=1}^n\frac{{\mathrm{e}}^{-\frac{1}{2}(X_j-\mu)'\varSigma^{-1}(X_j-\mu)}}{(2\pi)^{\frac{p}{2}}|\varSigma|{}^{\frac{1}{2}}}=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu)'\varSigma^{-1}(\bar{X}-\mu)}}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}} \end{aligned}$$
(i)

where \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) and, letting X = (X 1, …, X n) of dimension p × n and \(\bar {\mathbf {X}}=(\bar {X},\ldots ,\bar {X})\), \(S=(\mathbf {X}-\bar {\mathbf {X}})(\mathbf {X}-\bar {\mathbf {X}})'\). Let \(\bar {X}\), Σ −1 and μ be partitioned as follows:

where \(\bar {X}^{(1)}\) and μ (1) are r × 1, r < p, and Σ 11 is r × r. Consider the hypothesis \(\mu ^{(1)}=\mu ^{(1)}_o\) (specified) with Σ known. Thus, this hypothesis concerns only a subvector of the mean value vector, the population covariance matrix being assumed known. In the entire parameter space Ω, μ is estimated by \(\bar {X}\) where \(\bar {X}\) is the maximum likelihood estimator (MLE) of μ. The maximum of the likelihood function in the entire parameter space is then

$$\displaystyle \begin{aligned}\max_{\Omega}L=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)}}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}}. \end{aligned}$$
(ii)

Let us now determine the MLE of μ (2), which is the only unknown quantity under the null hypothesis. To this end, we consider the following expansion:

(iii)

Noting that there are only two terms involving μ (2) in (iii), we have

$$\displaystyle \begin{aligned} \frac{\partial}{\partial \mu^{(2)}}\ln L=O&\Rightarrow O-2\varSigma^{22}(\bar{X}^{(2)}-\mu^{(2)})-2\varSigma^{21}(\bar{X}^{(1)}-\mu^{(1)}_o)=O\\ &\Rightarrow \hat{\mu}^{(2)}=\bar{X}^{(2)}+(\varSigma^{22})^{-1}\varSigma^{21}(\bar{X}^{(1)}-\mu^{(1)}_o).\end{aligned} $$

Then, substituting this MLE \(\hat {\mu }^{(2)}\) in the various terms in (iii), we have the following:

$$\displaystyle \begin{aligned} (\bar{X}^{(2)}-\hat{\mu}^{(2)})'\varSigma^{22}(\bar{X}^{(2)}-\hat{\mu}^{(2)})&=(\bar{X}^{(1)}-\mu^{(1)}_o)'\varSigma^{12}(\varSigma^{22})^{-1}\varSigma^{21}(\bar{X}^{(1)}-\mu^{(1)}_o)\\ 2(\bar{X}^{(2)}-\hat{\mu}^{(2)})'\varSigma^{21}(\bar{X}^{(1)}-\mu^{(1)}_o)&=-2(\bar{X}^{(1)}-\mu^{(1)}_o)'\varSigma^{12}(\varSigma^{22})^{-1}\varSigma^{21}(\bar{X}^{(1)}-\mu^{(1)}_o)\Rightarrow\\ (\bar{X}-\mu)'\varSigma^{-1}(\bar{X}-\mu)&=(\bar{X}^{(1)}-\mu^{(1)}_o)'[\varSigma^{11}-\varSigma^{12}(\varSigma^{22})^{-1}\varSigma^{21}](\bar{X}^{(1)}-\mu^{(1)}_o)\\ &=(\bar{X}^{(1)}-\mu^{(1)}_o)'\varSigma_{11}^{-1}(\bar{X}^{(1)}-\mu^{(1)}_o),\end{aligned} $$

since, as established in Sect. 1.3, \(\varSigma _{11}^{-1}=\varSigma ^{11}-\varSigma ^{12}(\varSigma ^{22})^{-1}\varSigma ^{21}\). Thus, the maximum of L under the null hypothesis is given by

$$\displaystyle \begin{aligned}\max_{H_o}L=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-\frac{n}{2}(\bar{X}^{(1)}-\mu^{(1)}_o)'\varSigma_{11}^{-1}(\bar{X}^{(1)}-\mu^{(1)}_o)}} {(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}}, \end{aligned}$$

and the λ-criterion is then

$$\displaystyle \begin{aligned}\lambda=\frac{\max_{H_o}L}{\max_{\Omega}L}={\mathrm{e}}^{-\frac{n}{2}(\bar{X}^{(1)}-\mu^{(1)}_o)'\varSigma_{11}^{-1}(\bar{X}^{(1)}-\mu^{(1)}_o)}.{}\end{aligned} $$
(6.2.9)

Hence, we reject H o for small values of λ or for large values of \(n(\bar {X}^{(1)}-\mu ^{(1)}_o)'\varSigma _{11}^{-1}(\bar {X}^{(1)}-\mu ^{(1)}_o)\sim \chi ^2_{r}\) since the expected value and covariance matrix of \(\bar {X}^{(1)}\) are respectively \(\mu ^{(1)}_o\) and Σ 11n. Accordingly, the criterion can be enunciated as follows:

$$\displaystyle \begin{aligned} \mbox{Reject}\ H_o:\mu^{(1)}=\mu^{(1)}_o\ \mbox{(given) if }u&\equiv n(\bar{X}^{(1)}-\mu^{(1)}_o)'\varSigma_{11}^{-1}(\bar{X}^{(1)}-\mu^{(1)}_o)\ge \chi^2_{r,\,\alpha} \end{aligned} $$
(6.2.10)

\(\mbox{with}\ Pr\{\chi ^2_r\ge \chi ^2_{r,~\alpha }\}=\alpha \). In the complex Gaussian case, the corresponding \(2\tilde {u}\) will be distributed as a real chisquare random variable having 2r degrees of freedom; thus, the criterion will consist of rejecting the corresponding null hypothesis whenever the observed value of \(2\tilde {u}\ge \chi ^2_{2r,\,\alpha }\).
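
The subvector criterion just stated can be evaluated numerically along the following lines; this is a sketch assuming numpy and scipy, with placeholder names and inputs.

```python
# A minimal sketch of the subvector criterion: with Sigma known, reject H_o: mu^(1) = mu_o^(1)
# when n (Xbar^(1) - mu_o^(1))' Sigma_11^{-1} (Xbar^(1) - mu_o^(1)) >= chi^2_{r, alpha}.
import numpy as np
from scipy.stats import chi2

def test_subvector_mean_known_sigma(X, mu1_o, Sigma, alpha=0.05):
    """X is p x n; mu1_o specifies the first r components of the mean value vector."""
    p, n = X.shape
    r = len(mu1_o)
    d = X[:r].mean(axis=1) - mu1_o                  # Xbar^(1) - mu_o^(1)
    Sigma11 = Sigma[:r, :r]                         # leading r x r block of Sigma
    u = n * d @ np.linalg.solve(Sigma11, d)
    crit = chi2.ppf(1 - alpha, df=r)
    return {"statistic": u, "critical_value": crit,
            "p_value": chi2.sf(u, df=r), "reject": u >= crit}
```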

Example 6.2.4

Let the 4 × 1 vector X have a real normal distribution N 4(μ, Σ), Σ > O. Consider the hypothesis that part of μ is specified. For example, let the hypothesis H o and Σ be the following:

Since we are specifying the first two parameters in μ, the hypothesis can be tested by means of the statistic u defined above. Observe that X (1) ∼ N 2(μ (1), Σ 11), Σ 11 > O where

Let the observed vectors from the original N 4(μ, Σ) population be

Then the observations corresponding to the subvector X (1), denoted by \(X_j^{(1)}\), are the following:

In this case, the sample size n = 5 and the sample mean, denoted by \(\bar {X}^{(1)}\), is

Therefore

If \(9.73>\chi ^2_{2,\,\alpha },\) then we would reject \(H_o^{(1)}\,: \mu ^{(1)}=\mu _o^{(1)}\). Let us test this hypothesis at the significance level α = 0.01. Since \(\chi ^2_{2,\,0.01}=9.21,\) we reject the null hypothesis. In this instance, the p-value, which can be determined from a chisquare table, is \(Pr\{\chi ^2_2\ge 9.73\}\approx 0.007\).

6.2.4. Testing μ 1 = ⋯ = μ p, with Σ known, real Gaussian case

Let X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n, and the X j be independently distributed. Letting μ′ = (μ 1, …, μ p), consider the hypothesis

$$\displaystyle \begin{aligned}H_o:\mu_1=\mu_2=\cdots=\mu_p=\nu, \end{aligned}$$

where ν, the common μ j is unknown. This implies that μ i − μ j = 0 for all i and j. Consider the p × 1 vector J of unities, J′ = (1, …, 1) and then take any non-null vector that is orthogonal to J. Let A be such a vector so that A′J = 0. Actually, p − 1 linearly independent such vectors are available. For example, if p is even, then take 1, −1, …, 1, −1 as the elements of A and, when p is odd, one can start with 1, −1, …, 1, −1 and take the last three elements as 1, −2, 1, or the last element as 0, that is,

When the last element of the vector A is zero, we are simply ignoring the last element in X j. Let the p × 1 vector X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n, and the X j’s be independently distributed. Let the scalar y j = A′X j and the 1 × n vector Y = (y 1, …, y n) = (A′X 1, …, A′X n) = A′(X 1, …, X n) = A′ X, where the p × n matrix X = (X 1, …, X n). Let \(\bar {y}=\frac {1}{n}(y_1+\cdots +y_n)=A'\frac {1}{n}(X_1+\cdots +X_n)=A'\bar {X}\). Then, \(\sum _{j=1}^n(y_j-\bar {y})(y_j-\bar {y})' =A'\sum _{j=1}^n(X_j-\bar {X})(X_j-\bar {X})'A\) where \(\sum _{j=1}^n(X_j-\bar {X})(X_j-\bar {X})'=(\mathbf {X}-\bar {\mathbf {X}})(\mathbf {X}-\bar {\mathbf {X}})'=S=\) the sample sum of products matrix in the X j’s, \(\bar {\mathbf {X}}=(\bar {X},\ldots ,\bar {X})\) being the p × n matrix whose columns are all equal to \(\bar {X}\). Thus, one has \(\sum _{j=1}^n(y_j-\bar {y})^2=A'SA\). Consider the hypothesis μ 1 = ⋯ = μ p = ν. Then, A′μ = νA′J = 0 under H o. Since X j ∼ N p(μ, Σ), Σ > O, we have y j ∼ N 1(A′μ, A′ΣA), A′ΣA > 0. Under H o, y j ∼ N 1(0, A′ΣA), j = 1, …, n, the y j’s being independently distributed. Consider the joint density of y 1, …, y n, denoted by L:

$$\displaystyle \begin{aligned}L=\prod_{j=1}^n\frac{{\mathrm{e}}^{-\frac{1}{2A'\varSigma A}(y_j-A'\mu)^2 }}{(2\pi)^{\frac{1}{2}}[A'\varSigma A]^{\frac{1}{2}}}. \end{aligned}$$
(i)

Since Σ is known, the only unknown quantity in L is μ. Differentiating \(\ln L\) with respect to μ and equating the result to a null vector, we have

$$\displaystyle \begin{aligned}\sum_{j=1}^n(y_j-A'\hat{\mu})=0\Rightarrow \sum_{j=1}^ny_j-nA'\hat{\mu}=0\Rightarrow \bar{y}-A'\hat{\mu}=0\Rightarrow A'(\bar{X}-\hat{\mu})=0. \end{aligned}$$

However, since A is a fixed known vector and the equation holds for arbitrary \(\bar {X}\), \(\hat {\mu }=\bar {X}\). Hence the maximum of L in the entire parameter space Ω, which here involves only μ since Σ is known, is the following:

$$\displaystyle \begin{aligned}\max_{\Omega}L=\frac{{\mathrm{e}}^{-\frac{1}{2A'\varSigma A}\sum_{j=1}^n[A'(X_j-\bar{X})]^2}}{(2\pi)^{\frac{n}{2}}[A'\varSigma A]^{\frac{n}{2}}}=\frac{{\mathrm{e}}^{-\frac{1}{2A'\varSigma A}A'SA}}{(2\pi)^{\frac{n}{2}}[A'\varSigma A]^{\frac{n}{2}}}. \end{aligned}$$
(ii)

Now, noting that under H o, A′μ = 0, we have

$$\displaystyle \begin{aligned}\max_{H_o}L=\frac{{\mathrm{e}}^{-\frac{1}{2A'\varSigma A}\sum_{j=1}^nA'X_jX_j^{\prime}A}}{(2\pi)^{\frac{n}{2}}[A'\varSigma A]^{\frac{n}{2}}}. \end{aligned}$$
(iii)

From (i) to (iii), the λ-criterion is as follows, observing that \(A'(\sum _{j=1}^nX_jX_j^{\prime })A=\sum _{j=1}^nA'(X_j-\bar {X})(X_j-\bar {X})'A+nA'\bar {X}\bar {X}'A=A'SA+nA'\bar {X}\bar {X}'A\):

$$\displaystyle \begin{aligned} \lambda={\mathrm{e}}^{-\frac{n}{2A'\varSigma A}A'\bar{X}\bar{X}'A}. \end{aligned} $$
(6.2.11)

But since \(\sqrt {\frac {n}{A'\varSigma A}}A'\bar {X}\sim N_1(0,~1)\) under H o, we may test this null hypothesis either by using the standard normal variable or a chisquare variable as \(\frac {n}{A'\varSigma A}A'\bar {X}\bar {X}'A\sim \chi ^2_1\) under H o. Accordingly, the criterion consists of rejecting H o

$$\displaystyle \begin{aligned} \mbox{when}\ \left\vert\sqrt{\frac{n}{A'\varSigma A}}A'\bar{X}\right\vert \ge z_{\frac{\alpha}{2}},\ \mbox{with} \ Pr\{z\ge z_{\beta}\}=\beta,~ z\sim N_1(0,~1)\end{aligned} $$

or

$$\displaystyle \begin{aligned} \mbox{when}\ u\equiv \frac{n}{A'\varSigma A}(A'\bar{X}\bar{X}'A)\ge \chi^2_{1,\,\alpha},\ \mbox{with} \ Pr\{\chi^2_1\ge \chi^2_{1,\,\alpha}\}=\alpha,~ u\sim \chi^2_1\,.\end{aligned} $$
(6.2.12)
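
A minimal numerical sketch of this single-contrast test follows (numpy and scipy assumed; the function name and inputs are placeholders); it returns both the standard normal form and the equivalent chisquare form of the criterion.

```python
# A is any fixed non-null vector with A'J = 0; under H_o: mu_1 = ... = mu_p,
# n (A' Xbar)^2 / (A' Sigma A) is a chi^2_1 variable (Sigma known).
import numpy as np
from scipy.stats import chi2

def contrast_test_equal_means(X, A, Sigma, alpha=0.05):
    """X is p x n; A is a p-vector orthogonal to (1, ..., 1)'; Sigma is known."""
    p, n = X.shape
    Xbar = X.mean(axis=1)
    z = np.sqrt(n / (A @ Sigma @ A)) * (A @ Xbar)   # standard normal under H_o
    u = z ** 2                                      # chi^2_1 under H_o
    crit = chi2.ppf(1 - alpha, df=1)
    return {"z": z, "statistic": u, "critical_value": crit,
            "p_value": chi2.sf(u, df=1), "reject": u >= crit}
```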

Example 6.2.5

Consider a 4-variate real Gaussian vector X ∼ N 4(μ, Σ), Σ > O with Σ as specified in Example 6.2.4 and the null hypothesis that the individual components of the mean value vector μ are all equal, that is,

Let L be a 4 × 1 constant vector such that L′ = (1, −1, 1, −1). Then, under H o, L′μ = 0 and u = L′X is univariate normal; more specifically, u ∼ N 1(0, L′ΣL) where

Let the observation vectors be the same as those used in Example 6.2.4 and let u j = L′X j, j = 1, …, 5. Then, the five independent observations from u ∼ N 1(0, 7) are the following:

the average \(\bar {u}=\frac {1}{5}(u_1+\cdots +u_5)=\frac {1}{5}(-1-3-3-3-1)\) being equal to \(-\frac {11}{5}.\) Then, the standardized sample mean \(z=\frac {\sqrt {n}}{{\sigma _u}}(\bar {u}-0)\sim N_1(0,~1)\). Let us test the null hypothesis at the significance level α = 0.05. Referring to a N 1(0, 1) table, the required critical value, denoted by \(z_{\frac {\alpha }{2}}=z_{0.025}\) is 1.96. Therefore, we reject H o in favor of the alternative hypothesis that at least two components of μ are unequal at significance level α if the observed value of

$$\displaystyle \begin{aligned}|z|=\Big|\frac{\sqrt{n}}{{\sigma_u}}(\bar{u}-0)\Big|\ge 1.96. \end{aligned}$$

Since the observed value of |z|, namely \(|\frac {\sqrt {5}}{\sqrt {7}}(-\frac {7}{5}-0)|=\sqrt {1.4}=1.18,\) is less than 1.96, we do not reject H o at the 5% significance level. Letting z ∼ N 1(0, 1), the p-value in this case is Pr{|z|≥ 1.18} = 0.238, this probability being available from a standard normal table.

In the complex case, proceeding in a parallel manner to the real case, the lambda criterion will be the following:

$$\displaystyle \begin{aligned} \tilde{\lambda}={\mathrm{e}}^{-\frac{n}{A^{*}\varSigma A}A^{*}\bar{\tilde{X}}\bar{\tilde{X}}^{*}A} \end{aligned}$$
(6.2a.5)

where an asterisk indicates the conjugate transpose. Letting \(\tilde {u}=\frac {2n}{A^{*}\varSigma A}(A^{*}\bar {\tilde {X}}\bar {\tilde {X}}^{*}A),\) it can be shown that under H o, \(\tilde {u}\) is distributed as a real chisquare random variable having 2 degrees of freedom. Accordingly, the criterion will be as follows:

$$\displaystyle \begin{aligned} \mbox{Reject}\ H_o\ \mbox{if the observed }\tilde{u}\ge\chi^2_{2,\,\alpha} \mbox{ with}\ Pr\{\chi^2_2\ge\chi^2_{2,\,\alpha}\}=\alpha. \end{aligned}$$
(6.2a.6)

Example 6.2a.2

When p > 2, the computations become quite involved in the complex case. Thus, we will let p = 2 and consider the bivariate complex \(\tilde {N}_2(\tilde {\mu },\tilde {\varSigma })\) distribution that was specified in Example 6.2a.1, assuming that \(\tilde {\varSigma }\) is as given therein, the same set of observations being utilized as well. In this case, the null hypothesis is \(H_o: \tilde {\mu }_1=\tilde {\mu }_2\), the parameters and sample average being

Letting L′ = (1, −1), \(L'\tilde {\mu }=0\) under H o, and

The criterion consists of rejecting H o if the observed value of \(v\ge \chi ^2_{2,\,\alpha }\). Letting the significance level of the test be α = 0.05, the critical value is \(\chi ^2_{2,\,0.05}=5.99\), which is readily available from a chisquare table. The observed value of v being \(\frac {5}{6}<5.99\), we do not reject H o. In this case, the p-value is \(Pr\{\chi ^2_2\ge \frac {5}{6}\}\approx 0.66\).

6.2.5. Likelihood ratio criterion for testing H o : μ 1 = ⋯ = μ p , Σ known

Consider again X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n, with the X j’s being independently distributed and Σ assumed known. Letting the joint density of X 1, …, X n be denoted by L, we have, as determined earlier,

$$\displaystyle \begin{aligned}L=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu)'\varSigma^{-1}(\bar{X}-\mu)}}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}} \end{aligned}$$
(i)

where n is the sample size and S is the sample sum of products matrix. In the entire parameter space

$$\displaystyle \begin{aligned}\Omega=\{(\mu,\varSigma)\,|\,\varSigma>O\mbox{ known},~ \mu'=(\mu_1,\ldots,\mu_p)\}, \end{aligned}$$

the MLE of μ is \(\bar {X}= \) the sample average. Then

$$\displaystyle \begin{aligned}\max_{\Omega}L=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)}}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}}\ . \end{aligned}$$
(ii)

Consider the following hypothesis on μ′ = (μ 1, …, μ p):

$$\displaystyle \begin{aligned}H_o: \mu_1=\cdots=\mu_p=\nu,~ \nu \mbox{ is unknown. } \end{aligned}$$

Then, the MLE of μ under H o is \(\hat {\mu }=J\hat {\nu }=J\frac {1}{p}J'\bar {X},~ J'=(1,\ldots ,1)\). This \(\hat {\nu }\) is in fact the sum of all observations on all components of X j, j = 1, …, n, divided by np, which is identical to the sum of all the coordinates of \(\bar {X}\) divided by p or \(\hat {\mu }=\frac {1}{p}JJ'\bar {X}\). In order to evaluate the maximum of L under H o, it suffices to substitute \(\hat {\mu }\) for μ in (i). Accordingly, the λ-criterion is

$$\displaystyle \begin{aligned}\lambda=\frac{\max_{H_o}L}{\max_{\Omega}L}={\mathrm{e}}^{-\frac{n}{2}(\bar{X}-\hat{\mu})'\varSigma^{-1}(\bar{X}-\hat{\mu})}.{}\end{aligned} $$
(6.2.13)

Thus, we reject H o for small values of λ or for large values of \(w\equiv n(\bar {X}-\hat {\mu })'\varSigma ^{-1}(\bar {X}-\hat {\mu })\). Let us determine the distribution of w. First, note that

$$\displaystyle \begin{aligned}\bar{X}-\hat{\mu}=\bar{X}-\frac{1}{p}JJ'\bar{X}=\big(I_p-\frac{1}{p}JJ'\big)\bar{X}, \end{aligned}$$

and let

$$\displaystyle \begin{aligned} w&=n(\bar{X}-\hat{\mu})'\varSigma^{-1}(\bar{X}-\hat{\mu})=n\bar{X}'(I-\frac{1}{p}JJ')\varSigma^{-1}(I-\frac{1}{p}JJ')\bar{X}\\ &=n(\bar{X}-\mu)'(I-\frac{1}{p}JJ')\varSigma^{-1}(I-\frac{1}{p}JJ')(\bar{X}-\mu)\end{aligned} $$
(iii)

since \(J'(I-\frac {1}{p}JJ')=O\), μ = νJ being the true mean value of the N p(μ, Σ) distribution. Observe that \(\sqrt {n}(\bar {X}-\mu )\sim N_p(O,~\varSigma ),~ \varSigma >O\), and that \(\frac {1}{p}JJ'\) is idempotent. Since \(I-\frac {1}{p}JJ'\) is also idempotent and its rank is p − 1, there exists an orthonormal matrix P, PP′ = I, P′P = I, such that

Letting \(U=P\sqrt {n}(\bar {X}-\hat {\mu }),\) with U′ = (u 1, …, u p−1, u p), U ∼ N p(O, PΣP′). Now, on noting that

we have

B being the covariance matrix associated with U 1, so that U 1 ∼ N p−1(O, B), B > O. Thus, \(U_1^{\prime }B^{-1}U_1\sim \chi ^2_{p-1}\), a real scalar chisquare random variable having p − 1 degrees of freedom. Hence, upon evaluating

$$\displaystyle \begin{aligned}w=n\bar{X}'(I-\frac{1}{p}JJ')\varSigma^{-1}(I-\frac{1}{p}JJ')\bar{X}, \end{aligned}$$

one would reject H o : μ 1 = ⋯ = μ p = ν, ν unknown, whenever the observed value of

$$\displaystyle \begin{aligned}w\ge \chi^2_{p-1,~\alpha}, \ \mbox{with}\ Pr\{\chi^2_{p-1}\ge \chi^2_{p-1,\,\alpha}\}=\alpha.{}\end{aligned} $$
(6.2.14)

Observe that the degrees of freedom of this chisquare variable, that is, p − 1, coincides with the number of parameters being restricted by H o.
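
The criterion based on w can be evaluated as sketched below (Python with numpy and scipy; placeholder names), mirroring the statistic w and the estimator \(\hat {\mu }=\frac {1}{p}JJ'\bar {X}\) described above.

```python
# A minimal sketch of the likelihood ratio criterion for H_o: mu_1 = ... = mu_p, Sigma known:
# w = n (Xbar - mu_hat)' Sigma^{-1} (Xbar - mu_hat) is referred to chi^2_{p-1}.
import numpy as np
from scipy.stats import chi2

def lr_test_equal_means_known_sigma(X, Sigma, alpha=0.05):
    """X is p x n; Sigma is the known covariance matrix."""
    p, n = X.shape
    Xbar = X.mean(axis=1)
    J = np.ones(p)
    d = Xbar - (J @ Xbar / p) * J                   # Xbar - mu_hat, mu_hat = (J'Xbar/p) J
    w = n * d @ np.linalg.solve(Sigma, d)
    crit = chi2.ppf(1 - alpha, df=p - 1)
    return {"statistic": w, "critical_value": crit,
            "p_value": chi2.sf(w, df=p - 1), "reject": w >= crit}
```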

Example 6.2.6

Consider the trivariate real Gaussian population X ∼ N 3(μ, Σ), Σ > O, as already specified in Example 6.2.1 with the same Σ and the same observed sample vectors for testing H o : μ′ = (ν, ν, ν), namely,

The following test statistic has to be evaluated for p = 3:

$$\displaystyle \begin{aligned}w=n\bar{X}'(I-\frac{1}{p}JJ')\varSigma^{-1}(I-\frac{1}{p}JJ')\bar{X},~J'=(1,1,1). \end{aligned}$$

We have to evaluate the following quantities in order to determine the value of w:

Thus,

We reject H o whenever \(w\ge \chi ^2_{p-1,\alpha }\). Letting the significance level be α = 0.05, the tabulated critical value is \(\chi ^2_{p-1,\,\alpha }=\chi ^2_{2,\,0.05}=5.99\), and since 0.38 < 5.99, we do not reject the null hypothesis. In this instance, the p-value is \(Pr\{\chi ^2_2\ge 0.38\}\approx 0.83\).

6.3. Testing H o : μ = μ o (given) When Σ is Unknown, Real Gaussian Case

In this case, both μ and Σ are unknown in the entire parameter space Ω; however, μ = μ o is known while Σ is still unknown in the subspace ω. The maximum of L under Ω is the same as that given in (6.1.2), that is,

$$\displaystyle \begin{aligned} {\mathrm{sup}}_{\Omega}L=\frac{{\mathrm{e}}^{-\frac{np}{2}}n^{\frac{np}{2}}}{(2 \pi)^{\frac{np}{2}}|S|{}^{\frac{n}{2}}}. \end{aligned} $$
(6.3.1)

When μ = μ o, Σ is estimated by \(\hat {\varSigma }=\frac {1}{n}\sum _{j=1}^n(X_j-\mu _o)(X_j-\mu _o)'\). As shown in Sect. 3.5, \(\hat \varSigma \) can be reexpressed as follows:

$$\displaystyle \begin{aligned} \hat{\varSigma}&=\frac{1}{n}\sum_{j=1}^n(X_j-\mu_o)(X_j-\mu_o)'=\frac{1}{n}S+(\bar{X}-\mu_o)(\bar{X}-\mu_o)'\\ &=\frac{1}{n}[S+n(\bar{X}-\mu_o)(\bar{X}-\mu_o)'].\end{aligned} $$

Then, under the null hypothesis, we have

$$\displaystyle \begin{aligned}{\mathrm{sup}}_{\omega}L=\frac{{\mathrm{e}}^{-\frac{np}{2}}n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S+n(\bar{X}-\mu_o)(\bar{X}-\mu_o)^{\prime}|{}^{\frac{n}{2}}}.{}\end{aligned} $$
(6.3.2)

Thus,

$$\displaystyle \begin{aligned}\lambda=\frac{{\mathrm{sup}}_{\omega}L}{{\mathrm{sup}}_{\Omega}L}=\frac{|S|{}^{\frac{n}{2}}}{|S+n(\bar{X}-\mu_o)(\bar{X}-\mu_o)^{\prime}|{}^{\frac{n}{2}}}. \end{aligned}$$

On applying results on the determinants of partitioned matrices which were obtained in Sect. 1.3, we have the following equivalent representations of the denominator:

that is,

$$\displaystyle \begin{aligned}|S+n(\bar{X}-\mu_o)(\bar{X}-\mu_o)'|=|S|[1+n(\bar{X}-\mu_o)'S^{-1}(\bar{X}-\mu_o)],\end{aligned}$$

which yields the following simplified representation of the likelihood ratio statistic:

$$\displaystyle \begin{aligned}\lambda=\frac{1}{[1+n(\bar{X}-\mu_o)'S^{-1}(\bar{X}-\mu_o)]^{\frac{n}{2}}}.{}\end{aligned} $$
(6.3.3)

Small values of λ correspond to large values of \(u\equiv n(\bar {X}-\mu _o)'S^{-1}(\bar {X}-\mu _o)\), which is connected to Hotelling’s \(T_n^2\) statistic. Hence the criterion is the following: “Reject H o for large values of u”. The distribution of u can be derived by making use of the independence of the sample mean and sample sum of products matrix and the densities of these quantities. An outline of the derivation is provided in the next subsection.

6.3.1. The distribution of the test statistic

Let us examine the distribution of \(u=n(\bar {X}-\mu )'S^{-1}(\bar {X}-\mu )\). We have already established in Theorem 3.5.3 that S and \(\bar {X}\) are independently distributed in the case of a real p-variate nonsingular Gaussian N p(μ, Σ) population. It was also determined in Corollary 3.5.2 that the distribution of the sample average \(\bar {X}\) is a p-variate real Gaussian vector with the parameters μ and \(\frac {1}{n}\varSigma ,~ \varSigma >O\), and, in the continuing discussion, it is shown that the distribution of S is a matrix-variate Wishart with m = n − 1 degrees of freedom, where n is the sample size and parameter matrix Σ > O. Hence the joint density of S and \(\bar {X}\), denoted by \(f(S,\bar {X})\), is the product of the marginal densities. Letting Σ = I, this joint density is given by

$$\displaystyle \begin{aligned} f(S,\bar{X})=\frac{n^{\frac{p}{2}}}{(2\pi)^{\frac{p}{2}}2^{\frac{mp}{2}}\varGamma_p(\frac{m}{2})}|S|{}^{\frac{m}{2}-\frac{p+1}{2}} {\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(S)-\frac{n}{2}{\mathrm{tr}}((\bar{X}-\mu)(\bar{X}-\mu)')},~m=n-1.\end{aligned}$$
(i)

Note that it is sufficient to consider the case Σ = I. Due to the presence of S −1 in \(u=(\bar {X}-\mu )'S^{-1}(\bar {X}-\mu )\), the effect of any scaling matrix on X j will disappear. If X j goes to \(A^{\frac {1}{2}}X_j\) for any constant positive definite matrix A then S −1 will go to \(A^{-\frac {1}{2}}S^{-1}A^{-\frac {1}{2}}\) and thus u will be free of A. Letting \(Y=S^{-\frac {1}{2}}(\bar {X}-\mu )\) for fixed S, Y ∼ N p(O, S −1n), so that the conditional density of Y , given S, is

$$\displaystyle \begin{aligned}g(Y|S)=\frac{n^{\frac{p}{2}}|S|{}^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}}{\mathrm{e}}^{-\frac{n}{2}{\mathrm{tr}}(SYY')}. \end{aligned}$$

Thus, the joint density of S and Y , denoted by f 1(S, Y ), is

$$\displaystyle \begin{aligned} f_1(S,Y)=\frac{n^{\frac{p}{2}}}{(2\pi)^{\frac{p}{2}}2^{\frac{mp}{2}}\varGamma_p(\frac{m}{2})}|S|{}^{\frac{m+1}{2}-\frac{p+1}{2}} {\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(S[I+n YY'])},\quad m=n-1. \end{aligned}$$
(ii)

On integrating out S from (ii) by making use of a matrix-variate gamma integral, we obtain the following marginal density of Y , denoted by f 2(Y ):

$$\displaystyle \begin{aligned}f_2(Y){\mathrm{d}}Y=\frac{n^{\frac{p}{2}}}{(\pi)^{\frac{p}{2}}}\frac{\varGamma_p(\frac{m+1}{2})}{\varGamma_p(\frac{m}{2})}|I+nYY^{\prime}|{}^{-(\frac{m+1}{2})}{\mathrm{d}}Y,\quad m=n-1. \end{aligned}$$
(iii)

However, |I + nYY′| = 1 + nY′Y, which can be established by considering two representations of the determinant

similarly to what was done in Sect. 6.3 to obtain the likelihood ratio statistic given in (6.3.3). As well, it can easily be shown that

$$\displaystyle \begin{aligned}\frac{\varGamma_p(\frac{m+1}{2})}{\varGamma_p(\frac{m}{2})}=\frac{\varGamma(\frac{m+1}{2})}{\varGamma(\frac{m+1}{2}-\frac{p}{2})}\end{aligned}$$

by expanding the matrix-variate gamma functions. Now, letting s = Y′Y , it follows from Theorem 4.2.3 that \({\mathrm{d}}Y=\frac {\pi ^{\frac {p}{2}}}{\varGamma (\frac {p}{2})}s^{\frac {p}{2}-1}{\mathrm{d}}s\). Thus, the density of s, denoted by f 3(s), is

$$\displaystyle \begin{aligned} f_3(s){\mathrm{d}}s&=\frac{n^{\frac{p}{2}}\varGamma(\frac{m+1}{2})}{\varGamma(\frac{m+1}{2}-\frac{p}{2})\varGamma(\frac{p}{2})}s^{\frac{p}{2}-1}(1+ns)^{-(\frac{m+1}{2})}{\mathrm{d}}s \end{aligned} $$
(6.3.4)
$$\displaystyle \begin{aligned} &=\frac{n^{\frac{p}{2}}\varGamma(\frac{n}{2})}{\varGamma(\frac{n}{2}-\frac{p}{2})\varGamma(\frac{p}{2})}s^{\frac{p}{2}-1}(1+ns)^{-(\frac{n}{2})}{\mathrm{d}}s,~m=n-1,{} \end{aligned} $$
(6.3.5)

for n = p + 1, p + 2, …, 0 ≤ s < ∞, and zero elsewhere. It can then readily be seen from (6.3.5) that \(ns=nY'Y=n(\bar {X}-\mu )'S^{-1}(\bar {X}-\mu )=u\) is distributed as a real scalar type-2 beta random variable whose parameters are \((\frac {p}{2},~ \frac {n}{2}-\frac {p}{2}),~ n=p+1,\ldots \ \). Thus, the following result:

Theorem 6.3.1

Consider a real p-variate normal population N p(μ, Σ), Σ > O, and a simple random sample of size n from this normal population, X j ∼ N p(μ, Σ), j = 1, …, n, the X j ’s being independently distributed. Let the p × n matrix X = (X 1, …, X n) be the sample matrix and the p-vector \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) denote the sample average. Let \(\bar {\mathbf {X}}=(\bar {X},\ldots ,\bar {X})\) be a p × n matrix whose columns are all equal to \(\bar {X},\) and \(S=(\mathbf {X}-\bar {\mathbf {X}})(\mathbf {X}-\bar {\mathbf {X}})'\) be the sample sum of products matrix. Then, \(u=n(\bar {X}-\mu )'S^{-1}(\bar {X}-\mu )\) has a real scalar type-2 beta distribution with the parameters \((\frac {p}{2},~ \frac {n}{2}-\frac {p}{2})\), so that \(u\sim \frac {p}{n-p}F_{p,\,n-p}\) where F p,n−p denotes a real F random variable whose degrees of freedom are p and n − p.

Hence, in order to test the hypothesis H o : μ = μ o, the likelihood ratio statistic gives the test criterion: Reject H o for large values of \(u=n(\bar {X}-\mu _o)'S^{-1}(\bar {X}-\mu _o)\), which is equivalent to rejecting H o for large values of an F-random variable having p and n − p degrees of freedom where \(F_{p,\,n-p}=\frac {n-p}{p}\,u=\frac {n-p}{p}\,n(\bar {X}-\mu _o)'S^{-1}(\bar {X}-\mu _o)\), that is,

$$\displaystyle \begin{aligned} \mbox{reject}\ H_o\ \mbox{if }\, &\frac{n-p}{p}\,u=F_{p,\,n-p}\ge F_{p,\,n-p,~\alpha}\,,\\ \mbox{with } \, \alpha&=Pr\{F_{p,\,n-p}\ge F_{p,\,n-p,~\alpha}\}{} \end{aligned} $$
(6.3.6)

at a given significance level α where \(u=n(\bar {X}-\mu _o)'S^{-1}(\bar {X}-\mu _o)\sim \frac {p}{n-p}F_{p,\,n-p}\,\), n being the sample size.
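
As a computational aid (not part of the original text), the criterion (6.3.6) may be applied as sketched below in Python; numpy and scipy are assumed to be available and the function name and inputs are placeholders.

```python
# A minimal sketch of (6.3.6): with Sigma unknown, compute u = n (Xbar - mu_o)' S^{-1} (Xbar - mu_o)
# and refer ((n - p)/p) u to an F distribution with p and n - p degrees of freedom.
import numpy as np
from scipy.stats import f

def test_mean_unknown_sigma(X, mu_o, alpha=0.05):
    """X is a p x n matrix whose columns are the observed sample vectors."""
    p, n = X.shape
    Xbar = X.mean(axis=1)
    D = X - Xbar[:, None]
    S = D @ D.T                                     # sample sum of products matrix
    d = Xbar - mu_o
    u = n * d @ np.linalg.solve(S, d)
    F_obs = (n - p) / p * u
    crit = f.ppf(1 - alpha, dfn=p, dfd=n - p)
    return {"F": F_obs, "critical_value": crit,
            "p_value": f.sf(F_obs, dfn=p, dfd=n - p), "reject": F_obs >= crit}
```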

Example 6.3.1

Consider a trivariate real Gaussian vector X ∼ N 3(μ, Σ), Σ > O, where Σ is unknown. We would like to test the following hypothesis on μ: H o : μ = μ o, with \( \mu _o^{\prime }=(1,1,1)\). Consider the following simple random sample of size n = 5 from this N 3(μ, Σ) population:

Let X = [X 1, …, X 5] be the 3 × 5 sample matrix and \({\bar {\mathbf {X}}}=[\bar {X},\bar {X},\ldots ,\bar {X}]\) be the 3 × 5 matrix of sample means. Then,

$$\displaystyle \begin{aligned} \mathbf{X}-{\bar{\mathbf{X}}}&=[X_1-\bar{X},\ldots,X_5-\bar{X}] =\frac{1}{5}\left[\begin{array}{rrrrr}4&4&-6&-11&9\\ 3&-2&3&3&-7\\ 1&-9&6&6&-4\end{array}\right],\\ S&=(\mathbf{X}-{\bar{\mathbf{X}}})(\mathbf{X}-{\bar{\mathbf{X}}})' =\frac{1}{5^2}\left[\begin{array}{rrr}270&-110&-170\\ -110&80&85\\ -170&85&170\end{array}\right].\end{aligned} $$

Let \(S=\frac {1}{5^2}A\). In order to evaluate the test statistic, we need S −1 = 25A −1. To obtain the correct inverse without any approximation, we will use the transpose of the cofactor matrix divided by the determinant. The determinant of A, |A|, as obtained in terms of the elements of the first row and the corresponding cofactors is equal to 531250. The matrix of cofactors, denoted by Cof(A), which is symmetric in this case, is the following:

The null hypothesis is H o : μ = μ o = (1, 1, 1), so that

the observed value of the test statistic being

The test statistic w under the null hypothesis is F-distributed, that is, w ∼ F p,np. Let us test H o at the significance level α = 0.05. Since the critical value as obtained from an F-table is F p,np,α = F 3,2,0.05 = 19.2 and 2.35 < 19.2, we do not reject H o.

Note 6.3.1

If S is replaced by \(\frac {1}{n-1}S\), an unbiased estimator for Σ, then the test statistic can be written as \(u=\frac {1}{n-1}\,n(\bar {X}-\mu _o)'[\frac {1}{n-1}S]^{-1}(\bar {X}-\mu _o)=\frac {T_n^2}{n-1}\) where \(T_n^2\) denotes Hotelling’s T 2 statistic, which for p = 1 corresponds to the square of a Student-t statistic having n − 1 degrees of freedom.

Since u as defined in Theorem 6.3.1 is distributed as a type-2 beta random variable with the parameters \((\frac {p}{2},~ \frac {n-p}{2}),\) we have the following results: \(\frac {1}{u}\) is type-2 beta distributed with the parameters \((\frac {n-p}{2},~ \frac {p}{2})\), \(\frac {u}{1+u}\) is type-1 beta distributed with the parameters \((\frac {p}{2},~ \frac {n-p}{2})\), and \(\frac {1}{1+u}\) is type-1 beta distributed with the parameters \((\frac {n-p}{2},~ \frac {p}{2})\), n being the sample size.
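
The distributional statement of Theorem 6.3.1 can also be checked empirically. The following simulation sketch (Python; numpy and scipy assumed; all parameter values are arbitrary placeholders) compares the empirical upper 5% point of \(\frac {n-p}{p}u\) with the corresponding F percentile.

```python
# Monte Carlo check of Theorem 6.3.1: u = n (Xbar - mu)' S^{-1} (Xbar - mu) should behave
# like (p/(n-p)) F_{p, n-p}.  Parameter values below are arbitrary placeholders.
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(0)
p, n, reps = 3, 10, 20000
mu = np.zeros(p)
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)          # an arbitrary positive definite covariance matrix

u_vals = np.empty(reps)
for i in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n).T     # p x n sample matrix
    Xbar = X.mean(axis=1)
    D = X - Xbar[:, None]
    d = Xbar - mu
    u_vals[i] = n * d @ np.linalg.solve(D @ D.T, d)

# Empirical versus theoretical upper 5% point of ((n - p)/p) u, which should be F_{p, n-p}
print(np.quantile((n - p) / p * u_vals, 0.95), f.ppf(0.95, dfn=p, dfd=n - p))
```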

6.3.2. Paired values or linear functions when Σ is unknown

Let Y 1, …, Y k be p × 1 vectors having their own distributions which are unknown. However, suppose that it is known that a certain linear function X = a 1 Y 1 + ⋯ + a k Y k has a p-variate real Gaussian N p(μ, Σ) distribution with Σ > O. We would like to test hypotheses of the type \(E[X]=a_1\mu _{(1)}^{(o)}+\cdots +a_k\mu _{(k)}^{(o)}\) where the \(\mu _{(j)}^{(o)}\)’s, j = 1, …, k, are specified. Since we do not know the distributions of Y 1, …, Y k, let us convert the iid variables on Y j, j = 1, …, k, to iid variables on X j, say X 1, …, X n, X j ∼ N p(μ, Σ), Σ > O, where Σ is unknown. First, the observations on Y 1, …, Y k are transformed into observations on the X j’s. The problem then involves a single normal population whose covariance matrix is unknown. An example of this type is Y 1 representing a p × 1 vector before a certain process, such as administering a drug to a patient; in this instance, Y 1 could consist of measurements on p characteristics observed in a patient. Observations on Y 2 will then be the measurements on the same p characteristics after the process such as after administering the drug to the patient. Then Y 2q − Y 1q = X q will represent the vector of differences in the measurements for the q-th observation. Let the hypothesis be H o : μ = μ o (given), Σ being unknown. Note that once the observations on the X j’s are taken, the individual μ (j)’s are no longer of any use. Once the X j’s are determined, one can compute the sum of products matrix S from the X j’s. In this case, the test statistic is \(u=n(\bar {X}-\mu _o)'S^{-1}(\bar {X}-\mu _o),\) which is distributed as a type-2 beta with parameters \((\frac {p}{2},\, \frac {n-p}{2})\). Then, \(u\sim \frac {p}{n-p}F\) where F is an F random variable having p and n − p degrees of freedom, that is, an F p,n−p random variable, n being the sample size. Thus, the test criterion is applied as follows: Determine the observed value of u and the corresponding observed value of F p,n−p, that is, \( \frac {n-p}{p} \, u\), and then

$$\displaystyle \begin{aligned} \mbox{reject}\ H_o\ \mbox{if} \ \, \frac{n-p}{p} \, u\ge F_{p,\,n-p,~\alpha},\ \mbox{with}\ Pr\{F_{p,\, n-p}\ge F_{p,\,n-p,~\alpha}\}=\alpha. \end{aligned} $$
(6.3.7)

Example 6.3.2

Five motivated individuals were randomly selected and subjected to an exercise regimen for a month. The exercise program promoters claim that the subjects can expect a weight loss of 5 kg as well as a 2-in. reduction in lower stomach girth by the end of the month period. Let Y 1 and Y 2 denote the two component vectors representing weight and girth before starting the routine and at the end of the exercise program, respectively. The following are the observations on the five individuals:

Obviously, Y 1 and Y 2 are dependent variables having a joint distribution. We will assume that the difference X = Y 1 − Y 2 has a real Gaussian distribution, that is, X ∼ N 2(μ, Σ), Σ > O. Under this assumption, the observations on X are

Let X = [X 1, X 2, …, X 5] and \(\mathbf {X}-\bar {\mathbf {X}}=[X_1-\bar {X},\ldots ,X_5-\bar {X}]\), both being 2 × 5 matrices. The observed sample average \(\bar {X}\), the claim of the exercise routine promoters μ = μ o, as well as other relevant quantities are as follows:

The test statistic being \(w=\frac {(n-p)}{p}n(\bar {X}-\mu _o)'S^{-1}(\bar {X}-\mu _o)\), its observed value is

Letting the significance level of the test be α = 0.05, the critical value is F p,np,α = F 2,3,0.05 = 9.55. Since 22.84 > 9.55, H o is thus rejected.

6.3.3. Independent Gaussian populations

Consider k independent p-variate real Gaussian populations whose individual distribution is N p(μ (j), Σ j), Σ j > O, j = 1, …, k. Given simple random samples of sizes n 1, …, n k from these k populations, we may wish to test a hypothesis on a given linear function of the mean values, that is, H o : a 1 μ (1) + ⋯ + a k μ (k) = μ o where a 1, …, a k are known constants and μ o is a given quantity under the null hypothesis. We have already discussed this problem for the case of known covariance matrices. When the Σ j’s are all unknown or some of them are known while others are not, the MLE’s of the unknown covariance matrices turn out to be the respective sample sums of products matrices divided by the corresponding sample sizes. This will result in a linear function of independent Wishart matrices whose distribution proves challenging to determine, even for the null case.

Special case of two independent Gaussian populations

Consider the special case of two independent real Gaussian populations having identical covariance matrices, that is, let the populations be Y 1q ∼ N p(μ (1), Σ), Σ > O, the Y 1q’s, q = 1, …, n 1, being iid, and Y 2q ∼ N p(μ (2), Σ), Σ > O, the Y 2q’s, q = 1, …, n 2, being iid. Let the sample p × n 1 and p × n 2 matrices be denoted as \({\mathbf {Y}}_1=(Y_{11},\ldots ,Y_{1n_1})\) and \({\mathbf {Y}}_2=(Y_{21},\ldots ,Y_{2n_2})\) and let the sample averages be \(\bar {Y}_j=\frac {1}{n_j}(Y_{j1}+\cdots +Y_{jn_j}),\ j=1,2\). Let \(\bar {\mathbf {Y}}_j=(\bar {Y}_j,\ldots ,\bar {Y}_j)\), a p × n j matrix whose columns are equal to \(\bar {Y}_j\), j = 1, 2, and let

$$\displaystyle \begin{aligned}S_j=({\mathbf{Y}}_j-\bar{\mathbf{Y}}_j)({\mathbf{Y}}_j-\bar{\mathbf{Y}}_j)',~ j=1,2, \end{aligned}$$

be the corresponding sample sum of products matrices. Then, S 1 and S 2 are independently distributed as Wishart matrices having n 1 − 1 and n 2 − 1 degrees of freedom, respectively. As the sum of two independent p × p real or complex matrices having matrix-variate gamma distributions with the same scale parameter matrix is again gamma distributed with the shape parameters summed up and the same scale parameter matrix, we observe that since the two populations are independently distributed, S 1 + S 2 ≡ S has a Wishart distribution having n 1 + n 2 − 2 degrees of freedom. We now consider a hypothesis of the type μ (1) = μ (2). In order to do away with the unknown common mean value, we may consider the real p-vector \(U= \bar {Y}_1-\bar {Y}_2,\) so that E(U) = O and \({\mathrm{Cov}}(U)=\frac {1}{n_1}\varSigma +\frac {1}{n_2}\varSigma =(\frac {1}{n_1}+\frac {1}{n_2})\varSigma =\frac {n_1+n_2}{n_1n_2}\varSigma \). The MLE of this pooled covariance matrix is \(\frac {1}{n_1+n_2}S\) where S is Wishart distributed with n 1 + n 2 − 2 degrees of freedom. Then, following through the steps included in Sect. 6.3.1 with the parameter m now being n 1 + n 2 − 2, the power of S will become \(\frac {(n_1+n_2-2+1)}{2}-\frac {p+1}{2}\) when integrating out S. Letting the null hypothesis be H o : E[Y 1] − E[Y 2] = δ (specified), such as δ = 0, the function resulting from integrating out S is

$$\displaystyle \begin{aligned}c \,\Big[\frac{n_1n_2}{n_1+n_2}u\Big]^{\frac{p}{2}-1}\Big[1+\frac{n_1n_2}{n_1+n_2}u\Big]^{-\frac{1}{2}(n_1+n_2-1)}{}\end{aligned} $$
(6.3.8)

where c is the normalizing constant, so that \(w=\frac {n_1n_2}{n_1+n_2}(\bar {Y}_1-\bar {Y}_2-\delta )'S^{-1}(\bar {Y}_1-\bar {Y}_2-\delta )\) is distributed as a type-2 beta with the parameters \((\frac {p}{2},~ \frac {(n_1+n_2-1-p)}{2})\). Writing \(w=\frac {p}{n_1+n_2-1-p}F_{p,n_1+n_2-1-p}\), this F is seen to be an F statistic having p and n 1 + n 2 − 1 − p degrees of freedom. We will state these results as theorems.

Theorem 6.3.2

Let the p × p real positive definite matrices X 1 and X 2 be independently distributed as real matrix-variate gamma random variables with densities

$$\displaystyle \begin{aligned}f_j(X_j)=\frac{|B|{}^{\alpha_j}}{\varGamma_p(\alpha_j)}|X_j|{}^{\alpha_j-\frac{p+1}{2}}{\mathrm{e}}^{-{\mathrm{tr}}(BX_j)},~ B>O,~ X_j>O,~ \Re(\alpha_j)>\frac{p-1}{2},{}\end{aligned} $$
(6.3.9)

j=1,2, and zero elsewhere. Then, as can be seen from ( 5.2.6 ), the Laplace transform associated with X j or that of f j , denoted as \(L_{X_j}({{ }_{*}T})\) , is

$$\displaystyle \begin{aligned}L_{X_j}({{}_{*}T})=|I+B^{-1}{{}_{*}T}|{}^{-\alpha_j}, ~I+B^{-1}{{}_{*}T}>O,~j=1,2. \end{aligned}$$
(i)

Accordingly, U 1 = X 1 + X 2 has a real matrix-variate gamma density with the parameters (α 1 + α 2, B) whose associated Laplace transform is

$$\displaystyle \begin{aligned}L_{U_1}({{}_{*}T})=|I+B^{-1}{{}_{*}T}|{}^{-(\alpha_1+\alpha_2)}, \end{aligned}$$
(ii)

and U 2 = a 1 X 1 + a 2 X 2 has the Laplace transform

$$\displaystyle \begin{aligned}L_{U_2}({{}_{*}T})=|I+a_1B^{-1}{{}_{*}T}|{}^{-\alpha_1}|I+a_2B^{-1}{{}_{*}T}|{}^{-\alpha_2}, \end{aligned}$$
(iii)

whenever \(I+a_jB^{-1}{{}_{*}T}>O,~ j=1,2\), where a 1 and a 2 are real scalar constants.

It follows from (ii) that X 1 + X 2 is also real matrix-variate gamma distributed. When a 1a 2, it is very difficult to invert (iii) in order to obtain the corresponding density. This can be achieved by expanding one of the determinants in (iii) in terms of zonal polynomials, say the second one, after having first taken \(|I+a_1B^{-1}{{ }_{*}T}|{ }^{-(\alpha _1+\alpha _2)}\) out as a factor in this instance.

Theorem 6.3.3

Let Y j ∼ N p(μ (j), Σ), Σ > O, j = 1, 2, be independent p-variate real Gaussian distributions sharing the same covariance matrix. Given a simple random sample of size n 1 from Y 1 and a simple random sample of size n 2 from Y 2 , let the sample averages be denoted by \(\bar {Y}_1\) and \(\bar {Y}_2\) and the sample sums of products matrices, by S 1 and S 2 , respectively. Consider the hypothesis H o : μ (1) − μ (2) = δ (given). Letting S = S 1 + S 2 and

$$\displaystyle \begin{aligned}w=\frac{n_1n_2}{(n_1+n_2)}(\bar{Y}_1-\bar{Y}_2-\delta)'S^{-1}(\bar{Y}_1-\bar{Y}_2-\delta), \ \,w\sim\frac{p}{n_1+n_2-1-p}F_{p,\, n_1+n_2-1-p} \end{aligned}$$
(iv)

where \(F_{p,\, n_1+n_2-1-p}\) denotes an F distribution with p and n 1 + n 2 − 1 − p degrees of freedom, or equivalently, w is distributed as a type-2 beta variable with the parameters \((\frac {p}{2}, \frac {(n_1+n_2-1-p)}{2})\) . We reject the null hypothesis H o if \(\frac {n_1+n_2-1-p}{p}w\ge F_{p,\, n_1+n_2-1-p,\ \alpha }\) with

$$\displaystyle \begin{aligned}Pr\{F_{p,\,n_1+n_2-1-p}\ge F_{p,\, n_1+n_2-1-p,\ \alpha}\}=\alpha \end{aligned}$$
(vi)

at a given significance level α.

Theorem 6.3.4

Let w be as defined in Theorem 6.3.3 . Then \(w_1=\frac {1}{w}\) is a real scalar type-2 beta variable with the parameters \((\frac {n_1+n_2-1-p}{2},\, \frac {p}{2})\); \(w_2=\frac {w}{1+w}\) is a real scalar type-1 beta variable with the parameters \((\frac {p}{2},\, \frac {(n_1+n_2-1-p)}{2})\); \(w_3=\frac {1}{1+w}\) is a real scalar type-1 beta variable with the parameters \((\frac {n_1+n_2-1-p}{2},\, \frac {p}{2})\).

These last results follow from the connections between real scalar type-1 and type-2 beta random variables. Results parallel to those appearing in (i) to (vi) and stated in Theorems 6.3.1–6.3.4 can similarly be obtained for the complex case.
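
As a numerical illustration of Theorem 6.3.3, the following Python sketch computes w and the corresponding F statistic from two simulated samples; the dimension, sample sizes, seed and data are illustrative placeholders rather than values taken from the text.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p, n1, n2 = 3, 12, 15                              # dimension and sample sizes (illustrative)
Y1 = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n1)   # rows are the Y_{1q}'
Y2 = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n2)   # rows are the Y_{2q}'
delta = np.zeros(p)                                # hypothesized value of mu_(1) - mu_(2)

ybar1, ybar2 = Y1.mean(axis=0), Y2.mean(axis=0)
S1 = (Y1 - ybar1).T @ (Y1 - ybar1)                 # sample sum of products matrices
S2 = (Y2 - ybar2).T @ (Y2 - ybar2)
S = S1 + S2                                        # Wishart with n1 + n2 - 2 degrees of freedom

d = ybar1 - ybar2 - delta
w = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(S, d)   # type-2 beta(p/2, (n1+n2-1-p)/2) under H_o
F = (n1 + n2 - 1 - p) / p * w                            # observed F_{p, n1+n2-1-p}
print(F, stats.f.sf(F, p, n1 + n2 - 1 - p))              # statistic and p-value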

Example 6.3.3

Consider two independent populations whose respective distributions are N 2(μ (1), Σ 1) and N 2(μ (2), Σ 2), Σ j > O, j = 1, 2, and samples of sizes n 1 = 4 and n 2 = 5 from these two populations, respectively. Let the population covariance matrices be identical with Σ 1 = Σ 2 = Σ, the common covariance matrix being unknown, and let the observed sample vectors from the first population, X j ∼ N 2(μ (1), Σ), be

Denoting the sample mean from the first population by \(\bar {X}\) and the sample sum of products matrix by S 1, we have

the observations on these quantities being the following:

Let the sample vectors from the second population denoted as Y 1, …, Y 5 be

Then, the sample average and the deviation vectors are the following:

Letting the null hypothesis be

Thus, the test statistic is \(u \sim F_{p,n_1+n_2-1-p}\) where

Let us test H o at the 5% significance level. Since the required critical value is \(F_{p,\,n_1+n_2-1-p,~\alpha }=F_{2,\,6,~0.05}=5.14\) and 0.91 < 5.14, the null hypothesis is not rejected.

6.3.4. Testing μ 1 = ⋯ = μ p when Σ is unknown in the real Gaussian case

Let the p × 1 vector X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n, the X j’s being independently distributed. Let the p × 1 vector of unities be denoted by J or J′ = (1, …, 1), and let A be a vector that is orthogonal to J so that A′J = 0. For example, we can take

If the last component of A is zero, we are then ignoring the last component of X j. Let y j = A′X j, j = 1, …, n, and the y j’s be independently distributed. Then y j ∼ N 1(A′μ, A′ΣA), A′ΣA > O, is a univariate normal variable with mean value A′μ and variance A′ΣA. Consider the p × n sample matrix comprising the X j’s, that is, X = (X 1, …, X n). Let the sample average of the X j’s be \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) and \(\bar {\mathbf {X}}=(\bar {X},\ldots ,\bar {X})\). Then, the sample sum of products matrix \(S=(\mathbf {X}-\bar {\mathbf {X}})(\mathbf {X}-\bar {\mathbf {X}})'\). Consider the 1 × n vector \(Y=(y_1,\ldots ,y_n)=(A'X_1,\ldots ,A'X_n)=A'\mathbf {X},~ \bar {y}=\frac {1}{n}(y_1+\cdots +y_n)=A'\bar {X}\), \(\sum _{j=1}^n(y_j-\bar {y})^2=A'(\mathbf {X}-\bar {\mathbf {X}})(\mathbf {X}-\bar {\mathbf {X}})'A=A'SA\). Let the null hypothesis be H o : μ 1 = ⋯ = μ p = ν, where ν is unknown, μ′ = (μ 1, …, μ p). Thus, H o is A′μ = νA′J = 0. The joint density of y 1, …, y n, denoted by L, is then

$$\displaystyle \begin{aligned} L&=\prod_{j=1}^n\frac{{\mathrm{e}}^{-\frac{1}{2A'\varSigma A}(y_j-A'\mu)^2}}{(2\pi)^{\frac{1}{2}}[A'\varSigma A]^{\frac{1}{2}}}=\frac{{\mathrm{e}}^{-\frac{1}{2(A'\varSigma A)}\sum_{j=1}^n(y_j-A'\mu)^2}}{(2\pi)^{\frac{n}{2}}[A'\varSigma A]^{\frac{n}{2}}}\\ &=\frac{{\mathrm{e}}^{-\frac{1}{2A'\varSigma A}\{A'SA+nA'(\bar{X}-\mu)(\bar{X}-\mu)'A\}}}{(2\pi)^{\frac{n}{2}}[A'\varSigma A]^{\frac{n}{2}}}\end{aligned} $$
(i)

where

$$\displaystyle \begin{aligned} \sum_{j=1}^n(y_j-A'\mu)^2&=\sum_{j=1}^n(y_j-\bar{y}+\bar{y}-A'\mu)^2=\sum_{j=1}^n(y_j-\bar{y})^2+n(\bar{y}-A'\mu)^2\\ &=\sum_{j=1}^nA'(X_j-\bar{X})(X_j-\bar{X})'A+nA'(\bar{X}-\mu)(\bar{X}-\mu)'A\\ &=A'SA+nA'(\bar{X}-\mu)(\bar{X}-\mu)'A.\end{aligned} $$

Let us determine the MLE’s of μ and Σ. We have

$$\displaystyle \begin{aligned}\ln L=-\frac{n}{2}\ln(2\pi)-\frac{n}{2}\ln A'\varSigma A-\frac{1}{2A'\varSigma A}[A'SA+n(A'(\bar{X}-\mu)(\bar{X}-\mu)'A)]. \end{aligned}$$

On differentiating \(\ln L\) with respect to μ 1 and equating the result to zero, we have

(ii)

since the equation holds for each μ j, j = 1, …, p, and A′ = (a 1, …, a p), a j≠0, j = 1, …, p, A being fixed. As well, \((\bar {X}-\mu )(\bar {X}-\mu )'=\bar {X}\bar {X}'-\bar {X}\mu '-\mu \bar {X}'+\mu \mu '\). Now, consider differentiating \(\ln L\) with respect to an element of Σ, say, σ 11, at \(\hat {\mu }=\bar {X}\):

$$\displaystyle \begin{aligned} \frac{\partial}{\partial \sigma_{11}}\ln L&=0\\ &\Rightarrow -\frac{2n~a_1^2\sigma_{11}}{2A'\varSigma A}+\frac{A'SA}{2(A'\varSigma A)^2}(2a_1^2\sigma_{11})=0\\ &\Rightarrow A'\hat{\varSigma}A=\frac{1}{n}A'SA\end{aligned} $$

for each element in Σ and hence \(\hat {\varSigma }=\frac {1}{n}S\). Thus,

$$\displaystyle \begin{aligned}\max_{\Omega}L=\frac{{\mathrm{e}}^{-\frac{n}{2}}n^{\frac{n}{2}}}{(2\pi)^{\frac{n}{2}}(A'SA)^{\frac{n}{2}}}. \end{aligned}$$
(iii)

Under H o, A′μ = 0 and consequently the maximum under H o is the following:

$$\displaystyle \begin{aligned}\max_{H_o}L=\frac{{\mathrm{e}}^{-\frac{n}{2}}n^{\frac{n}{2}}}{(2\pi)^{\frac{n}{2}}[A'(S+n\bar{X}\bar{X}')A]^{\frac{n}{2}}}. \end{aligned}$$
(iv)

Accordingly, the λ-criterion is

$$\displaystyle \begin{aligned} \lambda=\frac{(A'SA)^{\frac{n}{2}}}{[A'(S+n\bar{X}\bar{X}')A]^{\frac{n}{2}}}=\frac{1}{[1+n\frac{A'\bar{X}\bar{X}'A}{A'SA}]^{\frac{n}{2}}}. \end{aligned} $$
(6.3.10)

We would reject H o for small values of λ or for large values of \(u\equiv nA'\bar {X}\bar {X}'A/A'SA\) where \(\bar {X}\) and S are independently distributed. Observing that S ∼ W p(n − 1, Σ), Σ > O and \(\bar {X}\sim N_p(\mu ,~\frac {1}{n}\varSigma ),~ \varSigma >O\), we have

$$\displaystyle \begin{aligned}\frac{n}{A'\varSigma A}A'\bar{X}\bar{X}'A\sim\chi^2_1\mbox{ and }\frac{A'SA}{A'\varSigma A}\sim \chi^2_{n-1}. \end{aligned}$$

Hence, (n − 1)u is an F statistic with 1 and n − 1 degrees of freedom, and the null hypothesis is to be rejected whenever

$$\displaystyle \begin{aligned}v\equiv n(n-1)\frac{A'\bar{X}\bar{X}'A}{A'SA}\ge F_{1,\,n-1,~\alpha}, \ \mbox{with } Pr\{F_{1,\,n-1}\ge F_{1,\,n-1,~\alpha}\}=\alpha.{}\end{aligned} $$
(6.3.11)
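
The test in (6.3.11) is straightforward to carry out numerically. The sketch below, in Python with simulated data (the choice of A, the dimension and the sample size are illustrative assumptions), computes v and the corresponding p-value.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p, n = 3, 20
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)   # rows are the X_j'
A = np.array([1.0, -1.0, 0.0])                     # any fixed vector with A'J = 0 (illustrative)

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar)                      # sample sum of products matrix
v = n * (n - 1) * (A @ xbar) ** 2 / (A @ S @ A)    # statistic of (6.3.11), F_{1, n-1} under H_o
print(v, stats.f.sf(v, 1, n - 1))                  # statistic and p-value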

Example 6.3.4

Consider a real bivariate Gaussian N 2(μ, Σ) population where Σ > O is unknown. We would like to test the hypothesis H o : μ 1 = μ 2, μ′ = (μ 1, μ 2), so that μ 1 − μ 2 = 0 under this null hypothesis. Let the sample be X 1, X 2, X 3, X 4, as specified in Example 6.3.3. Let A′ = (1, −1) so that A′μ = 0 under H o. With the same observation vectors as those comprising the first sample in Example 6.3.3, A′X 1 = (1), A′X 2 = (−2), A′X 3 = (−1), A′X 4 = (0). Letting y j = A′X j, the observations on y j are (1, −2, −1, 0) or A′ X = A′[X 1, X 2, X 3, X 4] = [1, −2, −1, 0]. The sample sum of products matrix as evaluated in the first part of Example 6.3.3 is

Our test statistic is

$$\displaystyle \begin{aligned}v=n(n-1)\frac{A'\bar{X}\bar{X}'A}{A'S_1A}\sim F_{1,n-1},~ n=4. \end{aligned}$$

Let the significance level be α = 0.05. The observed values of \(A'\bar {X}\bar {X}'A,~A'S_1A\), v, and the tabulated critical value of F 1,n−1,α are the following:

$$\displaystyle \begin{aligned} A'\bar{X}\bar{X}'A&=\frac{1}{4};~ A'S_1A=\frac{80}{16}=5;\\ v&=4(3)\Big(\frac{1}{5\times 4}\Big)=0.6;~ F_{1,~n-1,~\alpha}=F_{1,\,3,\ 0.05}=10.13.\end{aligned} $$

As 0.6 < 10.13, H o is not rejected.
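
The computation in this example can be reproduced directly from the transformed observations y j = A′X j = (1, −2, −1, 0); the following short Python check (the variable names are ours) recovers the observed values 1∕4, 5 and 0.6, together with the critical value 10.13.

import numpy as np
from scipy import stats

y = np.array([1.0, -2.0, -1.0, 0.0])    # observed y_j = A'X_j in this example
n = y.size
quad_mean = y.mean() ** 2               # A' xbar xbar' A = (ybar)^2 = 1/4
quad_S = np.sum((y - y.mean()) ** 2)    # A' S_1 A = sum_j (y_j - ybar)^2 = 5
v = n * (n - 1) * quad_mean / quad_S    # = 4(3)(1/4)/5 = 0.6
print(quad_mean, quad_S, v, stats.f.ppf(0.95, 1, n - 1))   # 0.25, 5, 0.6, 10.13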

6.3.5. Likelihood ratio test for testing H o : μ 1 = ⋯ = μ p when Σ is unknown

In the entire parameter space Ω of a N p(μ, Σ) population, μ is estimated by the sample average \(\bar {X}\) and, as previously determined, the maximum of the likelihood function is

$$\displaystyle \begin{aligned}\max_{\Omega}L=\frac{{\mathrm{e}}^{-\frac{np}{2}}n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S|{}^{\frac{n}{2}}} \end{aligned}$$
(i)

where S is the sample sum of products matrix and n is the sample size. Under the hypothesis H o :  μ 1 = ⋯ =  μ p = ν, where ν is unknown, this ν is estimated by \(\hat {\nu }=\frac {1}{np}\sum _{i,j}x_{ij}=\frac {1}{p}J'\bar {X},\) J′ = (1, …, 1), the p × 1 sample vectors \(X_j^{\prime }=(x_{1j},\ldots ,x_{pj}),~ j=1,\ldots ,n\), being independently distributed. Thus, under the null hypothesis H o, the population covariance matrix is estimated by \(\frac {1}{n}(S+n(\bar {X}-\hat {\mu })(\bar {X}-\hat {\mu })')\), and, proceeding as was done to obtain Eq. (6.3.3), the λ-criterion reduces to

$$\displaystyle \begin{aligned} \lambda&=\frac{|S|{}^{\frac{n}{2}}}{|S+n(\bar{X}-\hat{\mu})(\bar{X}-\hat{\mu})^{\prime}|{}^{\frac{n}{2}}}{} \end{aligned} $$
(6.3.12)
$$\displaystyle \begin{aligned} &=\frac{1}{(1+u)^{\frac{n}{2}}},~ u=n(\bar{X}-\hat{\mu})'S^{-1}(\bar{X}-\hat{\mu}).{}\end{aligned} $$
(6.3.13)

Given the structure of u in (6.3.13), we can take the Gaussian population covariance matrix Σ to be the identity matrix I, as was explained in Sect. 6.3.1. Observe that

$$\displaystyle \begin{aligned}(\bar{X}-\hat{\mu})'=(\bar{X}-\frac{1}{p}JJ'\bar{X})'=\bar{X}'[I-\frac{1}{p}JJ'] \end{aligned}$$
(ii)

where \(I-\frac {1}{p}JJ'\) is idempotent of rank p − 1; hence there exists an orthonormal matrix P, PP′ = I, P′P = I, such that

where V 1 is the subvector of the first p − 1 components of V . Then the quadratic form u, which is our test statistic, reduces to the following:

We note that the test statistic u has the same structure as that of u in Theorem 6.3.1 with p replaced by p − 1. Accordingly, \(u=n(\bar {X}-\hat {\mu })'S^{-1}(\bar {X}-\hat {\mu })\) is distributed as a real scalar type-2 beta variable with the parameters \(\frac {p-1}{2}\) and \(\frac {n-(p-1)}{2}\), so that \(\frac {n-p+1}{p-1}u\sim F_{p-1,\,n-p+1}\). Thus, the test criterion consists of

$$\displaystyle \begin{aligned} \mbox{rejecting}\ H_o\ \mbox{if the observed value of }\frac{n-p+1}{p-1}u&\ge F_{p-1,~n-p+1,~\alpha},\\ \mbox{with}\ Pr\{F_{p-1,\,n-p+1}\ge F_{p-1,\,n-p+1,~\alpha}\}&=\alpha.{}\end{aligned} $$
(6.3.14)
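
A minimal Python sketch of the test in (6.3.14) follows; the data are simulated and the dimension and sample size are arbitrary, so the sketch only illustrates the sequence of computations: estimate ν under H o, form u, convert it to an F statistic and compare it with the critical value.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p, n = 3, 25
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar)
nu_hat = xbar.mean()                     # (1/p) J' xbar, estimate of the common mean under H_o
d = xbar - nu_hat                        # xbar - mu_hat with mu_hat = nu_hat * J
u = n * d @ np.linalg.solve(S, d)        # type-2 beta((p-1)/2, (n-p+1)/2) under H_o
F = (n - p + 1) / (p - 1) * u            # F_{p-1, n-p+1}
print(F, stats.f.sf(F, p - 1, n - p + 1))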

Example 6.3.5

Let the population be N 2(μ, Σ), Σ > O, μ′ = (μ 1, μ 2) and the null hypothesis be H o : μ 1 = μ 2 = ν where ν and Σ are unknown. The sample values, as specified in Example 6.3.3, are

The maximum likelihood estimate of μ under H o, is

and

As previously calculated, the sample sum of products matrix is

The test statistic v and its observed value are

At significance level α = 0.05, the tabulated critical value F 1,3,0.05 is 10.13, and since the observed value 0.61 is less than 10.13, H o is not rejected.

6.4. Testing Hypotheses on the Population Covariance Matrix

Let the p × 1 independent vectors X j, j = 1, …, n, have a p-variate real nonsingular N p(μ, Σ) distribution and the p × n matrix X = (X 1, …, X n) be the sample matrix. Denoting the sample average by \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) and letting \(\bar {\mathbf {X}}=(\bar {X},\ldots ,\bar {X})\), each column of \(\bar {\mathbf {X}}\) being equal to \(\bar {X}\), the sample sum of products matrix is \(S=(\mathbf {X}-\bar {\mathbf {X}})(\mathbf {X}-\bar {\mathbf {X}})'\). We have already established that S is Wishart distributed with m = n − 1 degrees of freedom, that is, S ∼ W p(m, Σ), Σ > O. Letting S μ = (X − M)(X − M)′ where M = (μ, …, μ), each of its columns being the p × 1 vector μ, S μ ∼ W p(n, Σ), Σ > O, where the number of degrees of freedom is n itself whereas the number of degrees of freedom associated with S is m = n − 1. Let us consider the hypothesis H o : Σ = Σ o where Σ o is a given known matrix and μ is unspecified. Then, the MLE’s of μ and Σ in the entire parameter space are \(\hat {\mu }=\bar {X}\) and \( \hat {\varSigma }=\frac {1}{n}S\), and the joint density of the sample values X 1, …, X n, denoted by L, is given by

$$\displaystyle \begin{aligned}L=\frac{1}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu)'\varSigma^{-1}(\bar{X}-\mu)}.{}\end{aligned} $$
(6.4.1)

Thus, as previously determined, the maximum of L in the parameter space Ω = {(μ, Σ)|Σ > O} is

$$\displaystyle \begin{aligned}\max _{\Omega}L=\frac{n^{\frac{np}{2}}{\mathrm{e}}^{-\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S|{}^{\frac{n}{2}}}\, , \end{aligned}$$
(i)

the maximum of L under the null hypothesis H o : Σ = Σ o being given by

$$\displaystyle \begin{aligned}\max_{H_o}L=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma_o^{-1}S)}}{(2\pi)^{\frac{np}{2}}|\varSigma_o|{}^{\frac{n}{2}}}\,.\end{aligned}$$
(ii)

Then, the λ-criterion is the following:

$$\displaystyle \begin{aligned}\lambda=\frac{{\mathrm{e}}^{\frac{np}{2}}}{n^{\frac{np}{2}}}\,|\varSigma_o^{-1}S\,|{}^{\frac{n}{2}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma_o^{-1}S)}.{}\end{aligned} $$
(6.4.2)

Letting \(u=\lambda ^{\frac {2}{n}}\),

$$\displaystyle \begin{aligned}u=\frac{{\mathrm{e}}^p}{n^p}\,|\varSigma_o^{-1}S|\,{\mathrm{e}}^{-\frac{1}{n}{\mathrm{tr}}(\varSigma_o^{-1}S)},{}\end{aligned} $$
(6.4.3)

and we would reject H o for small values of u since it is a monotonically increasing function of λ, which means that the null hypothesis ought to be rejected for large values of \({\mathrm{tr}}(\varSigma _o^{-1}S)\) as the exponential function dominates the polynomial function for large values of the argument. Let us determine the distribution of \(w={\mathrm{tr}}(\varSigma _o^{-1}S)\) whose Laplace transform with parameter s is

$$\displaystyle \begin{aligned}L_w(s)=E[{\mathrm{e}}^{-sw}]=E[{\mathrm{e}}^{-s\,{\mathrm{tr}}(\varSigma_o^{-1}S)}]. \end{aligned}$$
(iii)

This can be evaluated by integrating out over the density of S which has a Wishart distribution with m = n − 1 degrees of freedom when μ is estimated:

$$\displaystyle \begin{aligned}L_w(s)=\frac{1}{2^{\frac{mp}{2}}\varGamma_p(\frac{m}{2})|\varSigma|{}^{\frac{m}{2}}}\int_{S>O}|S|{}^{\frac{m}{2}-\frac{p+1}{2}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-s\,{\mathrm{tr}}(\varSigma_o^{-1}S)}{\mathrm{d}}S. \end{aligned}$$
(iv)

The exponential part is \(-\frac {1}{2}{\mathrm{tr}}(\varSigma ^{-1}S)-s\,{\mathrm{tr}}(\varSigma _o^{-1}S)=-\frac {1}{2}{\mathrm{tr}}[(\varSigma ^{-\frac {1}{2}}S\varSigma ^{-\frac {1}{2}})(I+2s\varSigma ^{\frac {1}{2}}\varSigma _o^{-1}\varSigma ^{\frac {1}{2}})]\) and hence,

$$\displaystyle \begin{aligned}L_w(s)=|I+2s\varSigma^{\frac{1}{2}}\varSigma_o^{-1}\varSigma^{\frac{1}{2}}|{}^{-\frac{m}{2}}.{}\end{aligned} $$
(6.4.4)

The null case, Σ  =  Σ o

In this case, \(\varSigma ^{\frac {1}{2}}\varSigma _o^{-1}\varSigma ^{\frac {1}{2}}=I,\) so that

$$\displaystyle \begin{aligned}L_w(s)=|I+2sI|{}^{-\frac{m}{2}}=(1+2s)^{-\frac{mp}{2}}\Rightarrow w\sim \chi_{mp}^2.{}\end{aligned} $$
(6.4.5)

Thus, the test criterion is the following:

$$\displaystyle \begin{aligned}\mbox{Reject}\ H_o\ \mbox{if } w\ge \chi_{mp,\,\alpha}^2,\ \mbox{with}\ Pr\{\chi_{mp}^2\ge \chi_{mp,\,\alpha}^2\}=\alpha.{}\end{aligned} $$
(6.4.6)

When μ is known, it is used instead of its MLE to determine S μ, and the resulting criterion consists of rejecting H o whenever the observed \(w_{\mu }={\mathrm{tr}}(\varSigma _o^{-1}S_{\mu })\ge \chi _{np,\,\alpha }^2\) where n is the sample size. These results are summarized in the following theorem.

Theorem 6.4.1

Let the null hypothesis be H o : Σ = Σ o (given) and \(w={\mathrm{tr}}(\varSigma _o^{-1}S)\) where S is the sample sum of products matrix. Then, under the null hypothesis, w has a real scalar chisquare distribution with (n − 1)p degrees of freedom when the estimate of μ, namely \(\hat {\mu }=\bar {X}\) , is utilized to compute S; when μ is specified, w has a chisquare distribution having np degrees of freedom where n is the sample size.
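
Theorem 6.4.1 translates into a one-line test statistic. The following Python sketch, with simulated data and an illustrative choice of Σ o, computes w = tr(Σ o −1 S) and the corresponding chisquare p-value.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
p, n = 3, 30
Sigma_o = np.eye(p)                          # hypothesized covariance matrix (illustrative)
X = rng.multivariate_normal(np.zeros(p), Sigma_o, size=n)

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar)                # sample sum of products matrix
w = np.trace(np.linalg.solve(Sigma_o, S))    # tr(Sigma_o^{-1} S), chi^2_{(n-1)p} under H_o
print(w, stats.chi2.sf(w, (n - 1) * p))      # reject H_o for large w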

The non-null density of w

The non-null density of w is available from (6.4.4). Let λ 1, …, λ p be the eigenvalues of \(\varSigma ^{\frac {1}{2}}\varSigma _o^{-1}\varSigma ^{\frac {1}{2}}\). Then L w(s) in (6.4.4) can be re-expressed as follows:

$$\displaystyle \begin{aligned}L_w(s)=\prod_{j=1}^p(1+2\lambda_js)^{-\frac{m}{2}}.{}\end{aligned} $$
(6.4.7)

This is the Laplace transform of a variable of the form w = λ 1 w 1 + ⋯ + λ p w p where w 1, …, w p are independently distributed real scalar chisquare random variables, each having m = n − 1 degrees of freedom, where λ j > 0, j = 1, …, p. The distribution of linear combinations of chisquare random variables corresponds to the distribution of quadratic forms; the reader may refer to Mathai and Provost (1992) for explicit representations of their density functions.
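
The representation w = λ 1 w 1 + ⋯ + λ p w p can be explored by simulation. The sketch below (Python; the non-null Σ is an arbitrary illustrative choice) draws the independent chisquare components and checks the empirical mean of w against E[w] = m(λ 1 + ⋯ + λ p).

import numpy as np

rng = np.random.default_rng(4)
p, n = 3, 30
m = n - 1
Sigma_o = np.eye(p)
Sigma = np.diag([1.0, 2.0, 0.5])             # a non-null covariance matrix (illustrative)

# eigenvalues of Sigma^{1/2} Sigma_o^{-1} Sigma^{1/2}, i.e., of Sigma_o^{-1} Sigma
lam = np.linalg.eigvals(np.linalg.solve(Sigma_o, Sigma)).real

# Monte Carlo version of w = lambda_1 w_1 + ... + lambda_p w_p, w_j iid chi^2_m
w_sim = rng.chisquare(m, size=(100_000, p)) @ lam
print(w_sim.mean(), m * lam.sum())           # empirical mean versus E[w] = m(lambda_1+...+lambda_p)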

Note 6.4.1

If the population mean value μ is known, then one can proceed by making use of μ instead of the sample mean to determine S μ, in which case n, the sample size, ought to be used instead of m = n − 1 in the above discussion.

6.4.1. Arbitrary moments of λ

From (6.4.2), the h-th moment of the λ-criterion for testing H o : Σ = Σ o (given) in a real nonsingular N p(μ, Σ) population, is obtained as follows:

$$\displaystyle \begin{aligned}\lambda^h=\frac{{\mathrm{e}}^{\frac{nph}{2}}}{n^{\frac{nph}{2}}}|\varSigma_o^{-1}|{}^{\frac{nh}{2}}|S|{}^{\frac{nh}{2}}{\mathrm{e}}^{-\frac{h}{2}{\mathrm{tr}}(\varSigma_o^{-1}S)}\ \Rightarrow\end{aligned}$$
$$\displaystyle \begin{aligned} E[\lambda^h]&=\frac{{\mathrm{e}}^{\frac{nph}{2}}}{n^{\frac{nph}{2}}2^{\frac{(n-1)p}{2}}|\varSigma_o|{}^{\frac{nh}{2}}|\varSigma|{}^{\frac{n-1}{2}}\varGamma_p(\frac{n-1}{2})}\int_{S>O}|S|{}^{\frac{nh}{2}+\frac{n-1}{2}-\frac{p+1}{2}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-\frac{h}{2}{\mathrm{tr}}(\varSigma_o^{-1}S)}{\mathrm{d}}S\\ &=\frac{{\mathrm{e}}^{\frac{nph}{2}}2^{p(\frac{nh}{2}+\frac{n-1}{2})}\varGamma_p(\frac{nh}{2}+\frac{n-1}{2})}{n^{\frac{nph}{2}}2^{\frac{(n-1)p}{2}}|\varSigma_o|{}^{\frac{nh}{2}}|\varSigma|{}^{\frac{n-1}{2}} \varGamma_p(\frac{n-1}{2})}|\varSigma^{-1}+h\varSigma_o^{-1}|{}^{-(\frac{nh}{2}+\frac{n-1}{2})}\\ &={\mathrm{e}}^{\frac{nph}{2}}\Big(\frac{2}{n}\Big)^{\frac{nph}{2}}\frac{|\varSigma|{}^{\frac{nh}{2}}}{|\varSigma_o|{}^{\frac{nh}{2}}}\frac{\varGamma_p(\frac{nh}{2}+\frac{n-1}{2})} {\varGamma_p(\frac{n-1}{2})}|I+h\varSigma\varSigma_o^{-1}|{}^{-(\frac{nh}{2}+\frac{n-1}{2})}{}\end{aligned} $$
(6.4.8)

for \(\Re (\frac {nh}{2}+\frac {n-1}{2})>\frac {p-1}{2},~ I+h\varSigma \varSigma _o^{-1}>O\). Under H o : Σ = Σ o, we have \(|I+h\varSigma \varSigma _o^{-1}|{ }^{-(\frac {nh}{2}+\frac {n-1}{2})}=(1+h)^{-p(\frac {nh}{2}+\frac {n-1}{2})}\). Thus, the h-th null moment is given by

$$\displaystyle \begin{aligned} E[\lambda^h|H_o]={\mathrm{e}}^{\frac{nph}{2}}\Big(\frac{2}{n}\Big)^{\frac{nph}{2}}\frac{\varGamma_p(\frac{nh}{2}+\frac{n-1}{2})}{\varGamma_p(\frac{n-1}{2})} (1+h)^{-p(\frac{nh}{2}+\frac{n-1}{2})} \end{aligned} $$
(6.4.9)

for \(\Re (\frac {nh}{2}+\frac {n-1}{2})>\frac {p-1}{2}\).

6.4.2. The asymptotic distribution of \(-2\ln \lambda \) when testing H o : Σ = Σ o

Let us determine the asymptotic distribution of \(-2\ln \lambda \) where λ is the likelihood ratio statistic for testing H o : Σ = Σ o (specified) in a real nonsingular N p(μ, Σ) population, as n →∞, n being the sample size. This distribution can be determined by expanding both real matrix-variate gamma functions appearing in (6.4.9) and applying Stirling’s approximation formula as given in (6.5.14) by letting \(\frac {n}{2}(1+h)\to \infty \) in the numerator gamma functions and \(\frac {n}{2}\to \infty \) in the denominator gamma functions. Then, we have

$$\displaystyle \begin{aligned} \frac{\varGamma_p(\frac{nh}{2}+\frac{n-1}{2})}{\varGamma_p(\frac{n-1}{2})}&=\prod_{j=1}^p\frac{\varGamma(\frac{n}{2}(1+h)-\frac{1}{2}-\frac{j-1}{2})} {\varGamma(\frac{n}{2}-\frac{1}{2}-\frac{j-1}{2})} \\ &\to \prod_{j=1}^p\frac{(2\pi)^{\frac{1}{2}}}{(2\pi)^{\frac{1}{2}}}\frac{[\frac{n}{2}(1+h)]^{\frac{n}{2}(1+h)-\frac{1}{2}-\frac{j}{2}}} {[\frac{n}{2}]^{\frac{n}{2}-\frac{1}{2}-\frac{j}{2}}}\frac{{\mathrm{e}}^{-\frac{n}{2}(1+h)}}{{\mathrm{e}}^{-\frac{n}{2}}} \\ &=\Big(\frac{n}{2}\Big)^{\frac{nph}{2}}{\mathrm{e}}^{-\frac{nph}{2}}(1+h)^{\frac{np}{2}(1+h)-\frac{p}{2}-\frac{p(p+1)}{4}}.\end{aligned} $$

Hence, from (6.4.9)

$$\displaystyle \begin{aligned} E[\lambda^h|H_o]\to (1+h)^{-\frac{p(p+1)}{4}}\ \ \mbox{as}\ n \to\infty, \end{aligned} $$
(6.4.10)

where \((1+h)^{-\frac {p(p+1)}{4}}\) is the h-th moment of the distribution of \({\mathrm{e}}^{-y/2}\) when \(y\sim \chi ^2_{\frac {p(p+1)}{2}}\). Thus, under H o, \(-2\ln \lambda \to \chi ^2_{\frac {p(p+1)}{2}}\) as n →∞. For general procedures leading to asymptotic normality, see Mathai (1982).

Theorem 6.4.2

Letting λ be the likelihood ratio statistic for testing H o : Σ = Σ o (given) on the covariance matrix of a real nonsingular N p(μ, Σ) distribution, the null distribution of \(-2\ln \lambda \) is asymptotically (as the sample size n tends to ∞) that of a real scalar chisquare random variable having \(\frac {p(p+1)}{2}\) degrees of freedom. This number of degrees of freedom is also equal to the number of parameters restricted by the null hypothesis.
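
The asymptotic test of Theorem 6.4.2 can be sketched as follows in Python; λ is evaluated on the log scale directly from (6.4.2), the data are simulated and the hypothesized Σ o is an illustrative choice.

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
p, n = 3, 100
Sigma_o = np.eye(p)                                   # hypothesized covariance matrix (illustrative)
X = rng.multivariate_normal(np.zeros(p), Sigma_o, size=n)

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar)
A = np.linalg.solve(Sigma_o, S)                       # Sigma_o^{-1} S
# ln(lambda) from (6.4.2), evaluated on the log scale for numerical stability
log_lam = n * p / 2 - (n * p / 2) * np.log(n) \
          + (n / 2) * np.log(np.linalg.det(A)) - 0.5 * np.trace(A)
stat = -2 * log_lam                                   # approximately chi^2_{p(p+1)/2} for large n
print(stat, stats.chi2.sf(stat, p * (p + 1) // 2))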

Note 6.4.2

Sugiura and Nagao (1968) have shown that the test based on the statistic λ as specified in (6.4.2) is biased whereas it becomes unbiased upon replacing n, the sample size, by the degrees of freedom n − 1 in (6.4.2). Accordingly, percentage points are then computed for \(-2\ln \lambda _1\), where λ 1 is the statistic λ given in (6.4.2) wherein n − 1 is substituted for n. Korin (1968), Davis (1971), and Nagarsenker and Pillai (1973) computed 5% and 1% percentage points for this test statistic. Davis and Field (1971) evaluated the percentage points for p = 2(1)10 and n = 6(1)30(5)50, 60, 120 and Korin (1968), for p = 2(1)10.

Example 6.4.1

Let us take the same 3-variate real Gaussian population N 3(μ, Σ), Σ > O and the same data as in Example 6.3.1, so that intermediate calculations could be utilized. The sample size is 5 and the sample values are the following:

the sample average and the sample sum of products matrix being

Let us consider the hypothesis Σ = Σ o where

Let us test the null hypothesis at the significance level α = 0.05. The distribution of the test statistic w and the tabulated critical value are as follows:

$$\displaystyle \begin{aligned}w={\mathrm{tr}}(\varSigma_o^{-1}S)\sim \chi^2_{(n-1)p}\simeq \chi^2_{12}\,;~ \chi^2_{12,~0.05}=21.03.\end{aligned}$$

As the observed value 12.12 < 21.03, H o is not rejected. The asymptotic distribution of \(-2\ln \lambda \), as n →∞, is \(\chi ^2_{p(p+1)/2}\simeq \chi ^2_6\) where λ is the likelihood ratio criterion statistic. Since \(\chi ^2_{6,~0.05}=12.59\) and 12.59 > 12.12, we still do not reject H o as n →∞.

6.4.3. Tests on Wilks’ concept of generalized variance

The concept of generalized variance was explained in Chap. 5. The sample generalized variance is simply the determinant of S, the sample sum of products matrix. When the population is p-variate Gaussian, it has already been shown in Chap. 5 that S is Wishart distributed with m = n − 1 degrees of freedom, n being the sample size, and parameter matrix Σ > O, which is the population covariance matrix. When the population is multivariate normal, several types of tests of hypotheses involve the sample generalized variance. The first author has given the exact distributions of such tests, see Mathai (1972a,b) and Mathai and Rathie (1971).

6.5. The Sphericity Test or Testing if H o : Σ = σ 2 I, Given a N p(μ, Σ) Sample

When the covariance matrix Σ = σ 2 I, where σ 2 > 0 is a real scalar quantity, the ellipsoid (X − μ)′Σ −1(X − μ) = c > 0, which represents a specific contour of constant density for a nonsingular N p(μ, Σ) distribution, becomes the sphere defined by the equation \(\frac {1}{\sigma ^2}(X-\mu )'(X-\mu ) =c\) or \(\frac {1}{\sigma ^2}((x_1-\mu _1)^2+\cdots +(x_p-\mu _p)^2)=c>0, \) whose center is located at the point μ; hence the test’s name, the sphericity test. Given a N p(μ, Σ) sample of size n, the maximum of the likelihood function in the entire parameter space is

$$\displaystyle \begin{aligned}{\mathrm{sup}}_{\Omega}L=\frac{n^{\frac{np}{2}}{\mathrm{e}}^{-\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S|{}^{\frac{n}{2}}}, \end{aligned}$$

as was previously established. However, under the null hypothesis H o : Σ = σ 2 I, tr(Σ −1 S) = (σ 2)−1(tr(S)) and |Σ| = (σ 2)p. Thus, if we let θ = σ 2 and substitute \(\hat {\mu }=\bar {X}\) in L, under H o the loglikelihood function will be \(\ln L_{\omega }=-\frac {np}{2}\ln (2\pi )-\frac {np}{2}\ln \theta -\frac {1}{2\theta }{\mathrm{tr}}(S). \) Differentiating this function with respect to θ and equating the result to zero produces the following estimator for θ:

$$\displaystyle \begin{aligned}\hat{\theta}=\hat{\sigma}^2=\frac{{\mathrm{tr}}(S)}{np}.{}\end{aligned} $$
(6.5.1)

Accordingly, the maximum of the likelihood function under H o is the following:

$$\displaystyle \begin{aligned}\max_{\omega}L=\frac{n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}}\frac{{\mathrm{e}}^{-\frac{np}{2}}}{[\frac{{\mathrm{tr}}(S)}{p}]^{\frac{np}{2}}}.\end{aligned}$$

Thus, the λ-criterion for testing

$$\displaystyle \begin{aligned}H_o: \varSigma=\sigma^2 I,~ \sigma^2>0\mbox{ (unknown)}\end{aligned}$$

is

$$\displaystyle \begin{aligned}\lambda=\frac{{\mathrm{sup}}_{\omega}L}{{\mathrm{sup}}_{\Omega}L}=\frac{|S|{}^{\frac{n}{2}}}{[\frac{{\mathrm{tr}}(S)}{p}]^{\frac{np}{2}}}.{}\end{aligned} $$
(6.5.2)

In the complex Gaussian case when \(\tilde {X}_j\sim \tilde {N}_p(\tilde {\mu },\varSigma ), ~\varSigma =\varSigma ^{*}>O\) where an asterisk indicates the conjugate transpose, \(\tilde {X}_j=X_{j1}+iX_{j2}\) where X j1 and X j2 are real p × 1 vectors and \(i=\sqrt {(-1)}\). The covariance matrix associated with \(\tilde {X}_j\) is then defined as

$$\displaystyle \begin{aligned} {\mathrm{Cov}}(\tilde{X}_j)&=E[(\tilde{X}_j-\tilde{\mu})(\tilde{X}_j-\tilde{\mu})^{*}]\\ &=E[(X_{j1}-\mu_{(1)})+i(X_{j2}-\mu_{(2)})][(X_{j1}-\mu_{(1)})'-i(X_{j2}-\mu_{(2)})']\\ &=\varSigma_{11}+\varSigma_{22}+ i(\varSigma_{21}-\varSigma_{12})\equiv\varSigma, \ \ \mbox{with } \mu=\mu_{(1)}+i\mu_{(2)},\end{aligned} $$

where Σ is assumed to be Hermitian positive definite, with Σ 11 = Cov(X j1), Σ 22 = Cov(X j2), Σ 12 = Cov(X j1, X j2) and Σ 21 = Cov(X j2, X j1). Thus, the hypothesis of sphericity in the complex Gaussian case is Σ = σ 2 I where σ is real and positive. Then, under the null hypothesis \(\tilde {H}_o: \varSigma =\sigma ^2 I\), the Hermitian form \(\tilde {Y}^{*}\varSigma \tilde {Y}=c>0\) where c is real and positive, becomes \(\sigma ^2\tilde {Y}^{*}\tilde {Y}=c\Rightarrow |\tilde {y}_1|{ }^2+\cdots +|\tilde {y}_p|{ }^2=\frac {c}{\sigma ^2}>0\), which defines a sphere in the complex space, where \(|\tilde {y}_j|\) denotes the absolute value or modulus of \(\tilde {y}_j\). If \(\tilde {y}_j=y_{j1}+iy_{j2}\) with \(i=\sqrt {(-1)},~ y_{j1},y_{j2}\) being real, then \(|\tilde {y}_j|{ }^2=y_{j1}^2+y_{j2}^2\).

The joint density of the sample values in the real Gaussian case is the following:

$$\displaystyle \begin{aligned} L=\prod_{j=1}^n\frac{{\mathrm{e}}^{-\frac{1}{2}(X_j-\mu)'\varSigma^{-1}(X_j-\mu)}}{(2\pi)^{\frac{p}{2}}|\varSigma|{}^{\frac{1}{2}}} =\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu)'\varSigma^{-1}(\bar{X}-\mu)}}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}}\end{aligned}$$

where \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\), X j, j = 1, …, n are iid N p(μ, Σ), Σ > O. We have already derived the maximum of L in the entire parameter space Ω, which, in the real case, is

$$\displaystyle \begin{aligned}{\mathrm{sup}}_{\Omega}L=\frac{{\mathrm{e}}^{-\frac{np}{2}}n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S|{}^{\frac{n}{2}}},{}\end{aligned} $$
(6.5.3)

where S is the sample sum of products matrix. Under H o, \(|\varSigma |{ }^{\frac {n}{2}}=(\sigma ^2)^{\frac {np}{2}}\) and \({\mathrm{tr}}(\varSigma ^{-1}S)=\frac {1}{\sigma ^2}(s_{11}+\cdots +s_{pp})=\frac {1}{\sigma ^2}{\mathrm{tr}}(S)\). Thus, the maximum likelihood estimator of σ 2 is \(\frac {1}{np}{\mathrm{tr}}(S)\). Accordingly, the λ-criterion is

$$\displaystyle \begin{aligned}\lambda=|S|{}^{\frac{n}{2}}/\Big(\frac{{\mathrm{tr}}(S)}{p}\Big)^{\frac{np}{2}}\Rightarrow\ u_1=\lambda^{\frac{2}{n}}=\frac{p^p|S|}{[{\mathrm{tr}}(S)]^p},{}\end{aligned} $$
(6.5.4)

in the real case. Interestingly, (u 1)1∕p is the ratio of the geometric mean of the eigenvalues of S to their arithmetic mean. The structure remains the same in the complex domain, in which case \({\mathrm{det}}(\tilde {S})\) is replaced by the absolute value \(|{\mathrm{det}}(\tilde {S})|\) so that

$$\displaystyle \begin{aligned} \tilde{u}_1=\frac{p^p|{\mathrm{det}}(\tilde{S})|}{[{\mathrm{tr}}(\tilde{S})]^p}. \end{aligned} $$
(6.5a.1)
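
In practice, u 1 is computed directly from S, or equivalently from the eigenvalues of S. The following Python sketch (simulated data, illustrative dimension and sample size) verifies numerically that the two computations coincide.

import numpy as np

rng = np.random.default_rng(6)
p, n = 4, 40
X = rng.multivariate_normal(np.zeros(p), 2.0 * np.eye(p), size=n)   # H_o holds with sigma^2 = 2

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar)
u1 = p ** p * np.linalg.det(S) / np.trace(S) ** p    # statistic of (6.5.4)

# u1^(1/p) is the geometric-to-arithmetic mean ratio of the eigenvalues of S
eig = np.linalg.eigvalsh(S)
gm_am = np.exp(np.mean(np.log(eig))) / eig.mean()
print(u1, gm_am ** p)                                # the two computations agree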

For arbitrary h, the h-th moment of u 1 in the real case can be obtained by integrating out over the density of S, which, as explained in Sect. 5.5, 5.5a, is a Wishart density with n − 1 = m degrees of freedom and parameter matrix Σ > O. However, when the null hypothesis H o holds, Σ = σ 2 I p, so that the h-th moment in the real case is

$$\displaystyle \begin{aligned} E[u_1^h|H_o]=\frac{p^{ph}}{2^{\frac{mp}{2}}\varGamma_p(\frac{m}{2})(\sigma^2)^{\frac{mp}{2}}}\int_{S>O}|S|{}^{\frac{m}{2}+h-\frac{p+1}{2}}{\mathrm{e}}^{-\frac{1}{2\sigma^2}{\mathrm{tr}}(S)}({\mathrm{tr}}(S))^{-ph}{\mathrm{d}}S. \end{aligned}$$
(i)

In order to evaluate this integral, we replace [tr(S)]ph by an equivalent integral:

$$\displaystyle \begin{aligned}{}[{\mathrm{tr}}(S)]^{-ph}=\frac{1}{\varGamma(ph)}\int_{x=0}^{\infty}x^{ph-1}{\mathrm{e}}^{-x({\mathrm{tr}}(S))}{\mathrm{d}}x,~ \Re(h)>0. \end{aligned}$$
(ii)

Then, substituting (ii) in (i), the exponent becomes \(-\frac {1}{2\sigma ^2}(1+2\sigma ^2 x)({\mathrm{tr}}(S))\). Now, letting \(S_1=\frac {1}{2\sigma ^2}(1+2\sigma ^2 x)S\Rightarrow {\mathrm{d}}S=(2\sigma ^2)^{\frac {p(p+1)}{2}}(1+2\sigma ^2 x)^{-\frac {p(p+1)}{2}}{\mathrm{d}}S_1\), and we have

$$\displaystyle \begin{aligned} E[u_1^h|H_o]&=\frac{(2\sigma^2)^{ph}}{\varGamma_p( \frac{m}{2})}\frac{p^{ph}}{\varGamma(ph)}\int_0^{\infty}x^{ph-1}(1+2\sigma^2 x)^{-(\frac{m}{2}+h)p}{\mathrm{d}}x\\ &\qquad \qquad \qquad \times \int_{S_1>O}|S_1|{}^{\frac{m}{2}+h-\frac{p+1}{2}}{\mathrm{e}}^{-{\mathrm{tr}}(S_1)}{\mathrm{d}}S_1\\ &=\frac{\varGamma_p(\frac{m}{2}+h)}{\varGamma_p(\frac{m}{2})}\frac{p^{ph}}{\varGamma(ph)}\int_0^{\infty}y^{ph-1}(1+y)^{-(\frac{m}{2}+h)p}{\mathrm{d}}y, ~ y=2\sigma^2 x\\ &=\frac{\varGamma_p(\frac{m}{2}+h)}{\varGamma_p(\frac{m}{2})}p^{ph}\frac{\varGamma(\frac{mp}{2})}{\varGamma(\frac{mp}{2}+ph)},~ \Re(h)>0, \ m=n-1.{}\end{aligned} $$
(6.5.5)

The corresponding h-th moment in the complex case is the following:

$$\displaystyle \begin{aligned} E[\tilde{u}_1^h|H_o]=\frac{\tilde{\varGamma}_p(m+h)}{\tilde{\varGamma}_p(m)}p^{ph}\frac{\tilde{\varGamma}(mp)}{\tilde{\varGamma}(mp+ph)},~\Re(h)>0,~m=n-1. \end{aligned}$$
(6.5a.2)

By making use of the multiplication formula for gamma functions, one can expand the real gamma function Γ(mz) as follows:

$$\displaystyle \begin{aligned}\varGamma(mz)=(2\pi)^{\frac{1-m}{2}}m^{mz-\frac{1}{2}}\varGamma(z)\varGamma\big(z+\frac{1}{m}\big)\cdots\varGamma\big(z+\frac{m-1}{m}\big),~ m=1,2,\ldots\, ,{}\end{aligned} $$
(6.5.6)

and for m = 2, we have the duplication formula

$$\displaystyle \begin{aligned}\varGamma(2z)=\pi^{-\frac{1}{2}}2^{2z-1}\varGamma(z)\varGamma\big(z+\frac{1}{2}\big).{}\end{aligned} $$
(6.5.7)

Then on applying (6.5.6),

$$\displaystyle \begin{aligned}\frac{p^{ph}\varGamma(\frac{mp}{2})}{\varGamma(\frac{mp}{2}+ph)}=\frac{\varGamma(\frac{m}{2})}{\varGamma(\frac{m}{2}+h)}\prod_{j=1}^{p-1}\frac{\varGamma(\frac{m}{2}+\frac{j}{p})} {\varGamma(\frac{m}{2}+h+\frac{j}{p})}. \end{aligned}$$
(iii)

Moreover, it follows from the definition of the real matrix-variate gamma functions that

$$\displaystyle \begin{aligned}\frac{\varGamma_p(\frac{m}{2}+h)}{\varGamma_p(\frac{m}{2})}=\prod_{j=1}^p\frac{\varGamma(\frac{m}{2}-\frac{j-1}{2}+h)}{\varGamma(\frac{m}{2}-\frac{j-1}{2})}. \end{aligned}$$
(iv)

On canceling \(\varGamma (\frac {m}{2}+h)/\varGamma (\frac {m}{2})\) when multiplying (iii) by (iv), we are left with

$$\displaystyle \begin{aligned} E[u_1^h|H_o]=\Big\{\prod_{j=1}^{p-1}\frac{\varGamma(\frac{m}{2}+\frac{j}{p})}{\varGamma(\frac{m}{2}-\frac{j}{2})}\Big\}\Big\{\prod_{j=1}^{p-1} \frac{\varGamma(\frac{m}{2}-\frac{j}{2}+h)} {\varGamma(\frac{m}{2}+\frac{j}{p}+h)}\Big\},~ m=n-1. \end{aligned} $$
(6.5.8)

The corresponding h-th moment in the complex case is the following:

$$\displaystyle \begin{aligned} E[\tilde{u}_1^h|H_o]=\Big\{\prod_{j=1}^{p-1}\frac{\tilde{\varGamma}(m+\frac{j}{p})}{\tilde{\varGamma}(m-j)}\Big\}\Big\{\prod_{j=1}^{p-1}\frac{\tilde{\varGamma}(m-j+h)} {\tilde{\varGamma}(m+\frac{j}{p}+h)}\Big\},~m=n-1. \end{aligned} $$
(6.5a.3)

For h = s − 1, one can treat \(E[u_1^{s-1}|H_o]\) as the Mellin transform of the density of u 1 in the real case. Letting this density be denoted by \(f_{u_1}(u_1)\), it can be expressed in terms of a G-function as follows:

$$\displaystyle \begin{aligned} f_{u_1}(u_1|H_o)=c_1G_{p-1,p-1}^{p-1,0}\left[u_1\big\vert_{\frac{m}{2}-\frac{j}{2}-1,\ j=1,\ldots,p-1}^{\frac{m}{2}+\frac{j}{p}-1,\ j=1,\ldots,p-1}\right],~0\le u_1\le 1, \end{aligned} $$
(6.5.9)

and \(f_{u_1}(u_1|H_o)=0\) elsewhere, where

$$\displaystyle \begin{aligned}c_1=\Big\{\prod_{j=1}^{p-1}\frac{\varGamma(\frac{m}{2}+\frac{j}{p})}{\varGamma(\frac{m}{2}-\frac{j}{2})}\Big\},\end{aligned}$$

the corresponding density in the complex case being the following:

$$\displaystyle \begin{aligned} \tilde{f}_{\tilde{u}_1|H_o}(\tilde{u}_1)=\tilde{c}_1\tilde{G}_{p-1,p-1}^{p-1,0}\left[\tilde{u}_1\big\vert_{m-j-1,\ j=1,\ldots,p-1}^{m+\frac{j}{p}-1,\ j=1,\ldots,p-1}\right],~0\le |\tilde{u}_1|\le 1, \end{aligned} $$
(6.5a.4)

and \(\tilde {f}_{\tilde {u}_1}(\tilde {u}_1)=0\) elsewhere, where \(\tilde {G}\) is a real G-function whose parameters are different from those appearing in (6.5.9), and

$$\displaystyle \begin{aligned}\tilde{c}_1=\Big\{\prod_{j=1}^{p-1}\frac{\tilde{\varGamma}(m+\frac{j}{p})}{\tilde{\varGamma}(m-j)}\Big\}.\end{aligned}$$

For computable series representation of a G-function with general parameters, the reader may refer to Mathai (1970a, 1993). Observe that u 1 in the real case is structurally a product of p − 1 mutually independently distributed real scalar type-1 beta random variables with the parameters \((\alpha _j=\frac {m}{2}-\frac {j}{2},~\beta _j= \frac {j}{2}+\frac {j}{p}),~j=1,\ldots ,p-1\). In the complex case, \(\tilde {u}_1\) is structurally a product of p − 1 mutually independently distributed real scalar type-1 beta random variables with the parameters \((\alpha _j=m-j,~\beta _j=j+\frac {j}{p}),~j=1,\ldots ,p-1\). This observation is stated as a result.

Theorem 6.5.1

Consider the sphericity test statistic for testing the hypothesis H o : Σ = σ 2 I where σ 2 > 0 is an unknown real scalar. Let u 1 and the corresponding complex quantity \(\tilde {u}_1\) be as defined in (6.5.4) and (6.5a.1) respectively. Then, in the real case, u 1 is structurally a product of p − 1 independently distributed real scalar type-1 beta random variables with the parameters \((\alpha _j=\frac {m}{2}-\frac {j}{2},~ \beta _j=\frac {j}{2}+\frac {j}{p}),~j=1,\ldots ,p-1,\) and, in the complex case, \(\tilde {u}_1\) is structurally a product of p − 1 independently distributed real scalar type-1 beta random variables with the parameters \((\alpha _j=m-j,~ \beta _j=j+\frac {j}{p}),~j=1,\ldots ,p-1,\) where m = n − 1, n =  the sample size.
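
The structural representation in Theorem 6.5.1 can be checked by a small Monte Carlo experiment. The Python sketch below (illustrative p, n and number of replications) compares the empirical mean of u 1, simulated under H o, with the first moment implied by the product-of-betas representation, namely the product of the means α j∕(α j + β j).

import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 3, 15, 10_000
m = n - 1

# simulate u1 = p^p |S| / [tr(S)]^p under H_o : Sigma = sigma^2 I (sigma^2 = 1 here)
u1 = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, p))
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc
    u1[r] = p ** p * np.linalg.det(S) / np.trace(S) ** p

# first moment implied by Theorem 6.5.1: product of the type-1 beta means
j = np.arange(1, p)
alpha = m / 2 - j / 2                                # alpha_j = m/2 - j/2
beta = j / 2 + j / p                                 # beta_j  = j/2 + j/p
print(u1.mean(), np.prod(alpha / (alpha + beta)))    # the two values should be close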

For certain special cases, one can represent (6.5.9) and (6.5a.4) in terms of known elementary functions. Some such cases are now being considered.

Real case: p = 2

In the real case, for p = 2

$$\displaystyle \begin{aligned}E[u_1^h|H_o]=\frac{\varGamma(\frac{m}{2}+\frac{1}{2})}{\varGamma(\frac{m}{2}-\frac{1}{2})}\frac{\varGamma(\frac{m}{2}-\frac{1}{2}+h)}{\varGamma(\frac{m}{2}+\frac{1}{2}+h)} =\frac{\frac{m}{2}-\frac{1}{2}}{\frac{m}{2}-\frac{1}{2}+h}.\end{aligned}$$

This means u 1 is a real type-1 beta variable with the parameters \((\alpha =\frac {m}{2}-\frac {1}{2},~\beta =1)\). The corresponding result in the complex case is that \(\tilde {u}_1\) is a real type-1 beta variable with the parameters (α = m − 1, β = 3∕2).

Real case: p = 3

In the real case

$$\displaystyle \begin{aligned}E[u_1^h|H_o]=\frac{\varGamma(\frac{m}{2}+\frac{1}{3})\varGamma(\frac{m}{2}+\frac{2}{3})}{\varGamma(\frac{m}{2}-\frac{1}{2})\varGamma(\frac{m}{2}-1)}\frac{\varGamma(\frac{m}{2}-\frac{1}{2}+h)\varGamma(\frac{m}{2}-1+h)} {\varGamma(\frac{m}{2}+\frac{1}{3}+h)\varGamma(\frac{m}{2}+\frac{2}{3}+h)},\end{aligned}$$

so that u 1 is equivalent to the product of two independently distributed real type-1 beta random variables with the parameters \((\alpha _j,\beta _j)=(\frac {m}{2}-\frac {j}{2},~ \frac {j}{2}+\frac {j}{3}),~ j=1,2\). This density can be obtained by treating \(E[u_1^h|H_o]\) for h = s − 1 as the Mellin transform of the density of u 1. The density is then available by taking the inverse Mellin transform. Thus, again denoting it by \(f_{u_1}(u_1)\), we have

$$\displaystyle \begin{aligned} f_{u_1}(u_1|H_o)&=c_3\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\phi_3(s){\mathrm{d}}s, ~c>\frac{1}{2}\\ &=c_3\Big[\sum_{\nu=0}^{\infty}R_{\nu}+ \sum_{\nu=0}^{\infty}R_{\nu}^{\prime}\Big],\\ c_3&=\frac{\varGamma(\frac{m}{2}+\frac{1}{3})\varGamma(\frac{m}{2}+\frac{2}{3})}{\varGamma(\frac{m}{2}-\frac{1}{2})\varGamma(\frac{m}{2}-1)},\\ \phi_3(s)&= \frac{\varGamma(\frac{m}{2}-\frac{1}{2}-1+s)\varGamma(\frac{m}{2}-1-1+s)}{\varGamma(\frac{m}{2}+\frac{1}{3}-1+s)\varGamma(\frac{m}{2}+\frac{2}{3}-1+s)}u_1^{-s},\end{aligned} $$

where R ν is the residue of the integrand ϕ 3(s) at the poles of \(\varGamma (\frac {m}{2}-\frac {3}{2}+s)\) and \(R_{\nu }^{\prime }\) is the residue of the integrand ϕ 3(s) at the pole of \(\varGamma (\frac {m}{2}-2+s)\). Letting \(s_1=\frac {m}{2}-\frac {3}{2}+s\),

$$\displaystyle \begin{aligned} R_{\nu}&=\lim_{s\to -\nu+\frac{3}{2}-\frac{m}{2}}\Big(s+\nu-\frac{3}{2}+\frac{m}{2}\Big)\phi_3(s)=\lim_{s_1\to -\nu}\Big[(s_1+\nu)u_1^{\frac{m}{2}-\frac{3}{2}}\frac{\varGamma(s_1)\varGamma(-\frac{1}{2}+s_1)u_1^{-s_1}}{\varGamma(\frac{1}{3}+\frac{1}{2}+s_1)\varGamma(\frac{2}{3}+\frac{1}{2}+s_1)}\Big]\\ &=u_1^{\frac{m}{2}-\frac{3}{2}}\frac{(-1)^{\nu}}{\nu!}\frac{\varGamma(-\frac{1}{2}-\nu)}{\varGamma(\frac{1}{3}+\frac{1}{2}-\nu)\varGamma(\frac{2}{3}+\frac{1}{2}-\nu)}u_1^{\nu}.\end{aligned} $$

We can replace negative ν in the arguments of the gamma functions with positive ν by making use of the following formula:

$$\displaystyle \begin{aligned}\varGamma(a-\nu)=\frac{(-1)^{\nu}\varGamma(a)}{(-a+1)_{\nu}},~a\ne 1,2,\ldots,~\nu=0,1,\ldots,{}\end{aligned} $$
(6.5.10)

where for example, (b)ν is the Pochhammer symbol

$$\displaystyle \begin{aligned}(b)_{\nu}=b(b+1)\cdots(b+\nu-1), ~b\ne 0, ~(b)_0=1,{}\end{aligned} $$
(6.5.11)

so that

$$\displaystyle \begin{aligned} \varGamma \Big(-\frac{1}{2}-\nu\Big)&=\frac{(-1)^{\nu}\varGamma(-\frac{1}{2})}{(\frac{3}{2})_{\nu}},~ \varGamma\Big(\frac{1}{3}+\frac{1}{2}-\nu\Big)=\frac{(-1)^{\nu}\varGamma(\frac{1}{3}+\frac{1}{2})}{(-\frac{1}{3}+\frac{1}{2})_{\nu}},\\ \varGamma\Big(\frac{2}{3}+\frac{1}{2}-\nu\Big)&=\frac{(-1)^{\nu}\varGamma(\frac{2}{3}+\frac{1}{2})}{(-\frac{2}{3}+\frac{1}{2})_{\nu}}.\end{aligned} $$

The sum of the residues then becomes

$$\displaystyle \begin{aligned}\sum_{\nu=0}^{\infty}R_{\nu}=\frac{\varGamma(-\frac{1}{2})}{\varGamma(\frac{1}{3}+\frac{1}{2})\varGamma(\frac{2}{3}+\frac{1}{2})}u_1^{\frac{m}{2}-\frac{3}{2}} {{}_2F_1}\Big(-\frac{1}{3}+\frac{1}{2};-\frac{2}{3}+\frac{1}{2};\frac{3}{2};u_1\Big),~0\le u_1\le 1. \end{aligned}$$

It can be similarly shown that

$$\displaystyle \begin{aligned}\sum_{\nu=0}^{\infty}R_{\nu}^{\prime}=\frac{\varGamma(\frac{1}{2})}{\varGamma(\frac{1}{3}+1)\varGamma(\frac{2}{3}+1)}u_1^{\frac{m}{2}-2}{{}_2F_1} \Big(-\frac{1}{3},-\frac{2}{3};\frac{1}{2};u_1\Big),~0\le u_1\le 1. \end{aligned}$$

Accordingly, the density of u 1 for p = 3 is the following:

$$\displaystyle \begin{aligned} f_1(u_1|H_o)&=c_3\Big\{\frac{\varGamma(-\frac{1}{2})}{\varGamma(\frac{5}{6})\varGamma(\frac{7}{6})}u_1^{\frac{m}{2}-\frac{3}{2}}{{}_2F_1}\Big(\frac{1}{6},-\frac{1}{6};\frac{3}{2};u_1\Big)\\ &\ \ \ \ \ \ \ \ +\frac{\varGamma(\frac{1}{2})}{\varGamma(\frac{4}{3})\varGamma(\frac{5}{3})}u_1^{\frac{m}{2}-2}{{}_2F_1}\Big(-\frac{1}{3},-\frac{2}{3};\frac{1}{2};u_1\Big)\Big\},~0\le u_1\le 1{}\end{aligned} $$
(6.5.12)

and \(f_{u_1}(u_1|H_o)=0\) elsewhere.

Real case: p = 4

In this case,

$$\displaystyle \begin{aligned}E[u_1^{s-1}|H_o]=c_4\frac{\varGamma(\frac{m}{2}-\frac{3}{2}+s)\varGamma(\frac{m}{2}-2+s)\varGamma(\frac{m}{2}-\frac{5}{2}+s)} {\varGamma(\frac{m}{2}-\frac{3}{4}+s)\varGamma(\frac{m}{2}-\frac{2}{4}+s)\varGamma(\frac{m}{2}-\frac{1}{4}+s)}, \end{aligned}$$

where c 4 is the normalizing constant. However, noting that

$$\displaystyle \begin{aligned}\frac{\varGamma(\frac{m}{2}-\frac{3}{2}+s)}{\varGamma(\frac{m}{2}-\frac{1}{2}+s)}=\frac{1}{\frac{m}{2}-\frac{3}{2}+s}, \end{aligned}$$

there is one pole at \(s=-\frac {m}{2}+\frac {3}{2}\). The poles of \(\varGamma (\frac {m}{2}-\frac {5}{2}+s)\) occur at \(s=-\frac {m}{2}+\frac {5}{2}-\nu ,~\nu =0,1,\ldots ,\) and hence at ν = 1, the pole coincides with the earlier pole and there is a pole of order 2 at \(s=-\frac {m}{2}+\frac {3}{2}\). Each one of the other poles of the integrand is simple, that is, of order 1. The second order pole will bring in a logarithmic function. As all the cases for which p ≥ 4 will bring in poles of higher orders, they will not be herein discussed. The general expansion of a G-function of the type \(G_{m,m}^{m,0}(\cdot )\) is provided in Mathai (1970a, 1993). In the complex case, starting from p ≥ 3, poles of higher orders come in, so that the densities can only be written in terms of logarithms, psi and zeta functions; hence, these will not be considered. Observe that \(\tilde {u}_1\) corresponds to a product of independently distributed real type-1 beta random variables, even though the densities are available only in terms of logarithms, psi and zeta functions for p ≥ 3. The null and non-null densities of the λ-criterion in the general case were derived by the first author and some results obtained under the null distribution can also be found in Mathai and Saxena (1973). Several researchers have contributed to various aspects of the sphericity and multi-sample sphericity tests; for some of the first author’s contributions, the reader may refer to Mathai and Rathie (1970) and Mathai (1977, 1984, 1986).

Gamma products such as those appearing in (6.5.8) and (6.5a.3) are frequently encountered when considering various types of tests on the parameters of a real or complex Gaussian or certain other types of distributions. Structural representations in the form of product of independently distributed real scalar type-1 beta random variables occur in numerous situations. Thus, a general asymptotic result on the h-th moment of such products of type-1 beta random variables will be derived. This is now stated as a result.

Theorem 6.5.2

Let u be a real scalar random variable whose h-th moment is of the form

$$\displaystyle \begin{aligned}E[u^h]=\frac{\varGamma_p(\alpha+\alpha h+\gamma)}{\varGamma_p(\alpha+\gamma)}\frac{\varGamma_p(\alpha+\gamma+\delta)}{\varGamma_p(\alpha+\alpha h+\gamma+\delta)}{}\end{aligned} $$
(6.5.13)

where Γ p(⋅) is a real matrix-variate gamma function on p × p real positive definite matrices, α is real, γ is bounded, δ is real, 0 < δ < ∞ and h is arbitrary. Then, as \(\alpha \to \infty ,\ -2\ln u\to \chi _{2\,p\,\delta }^2\) , a real chisquare random variable having 2 p δ degrees of freedom, that is, a real gamma random variable with the parameters (α = p δ, β = 2).

Proof

On expanding the real matrix-variate gamma functions, we have the following:

$$\displaystyle \begin{aligned}\frac{\varGamma_p(\alpha+\gamma+\delta)}{\varGamma_p(\alpha+\gamma)}=\prod_{j=1}^p\frac{\varGamma(\alpha+\gamma+\delta-\frac{j-1}{2})}{\varGamma(\alpha+\gamma-\frac{j-1}{2})}. \end{aligned}$$
(i)
$$\displaystyle \begin{aligned}\frac{\varGamma_p(\alpha(1+h)+\gamma)}{\varGamma_p(\alpha(1+h)+\gamma+\delta)}=\prod_{j=1}^p\frac{\varGamma(\alpha(1+h)+\gamma-\frac{j-1}{2})}{\varGamma(\alpha(1+h)+\gamma+\delta-\frac{j-1}{2})}. \end{aligned}$$
(ii)

Consider the following form of Stirling’s asymptotic approximation formula for gamma functions, namely,

$$\displaystyle \begin{aligned}\varGamma(z+\eta)\approx \sqrt{2\pi}z^{z+\eta-\frac{1}{2}}{\mathrm{e}}^{-z}\mbox{ for }|z|\to\infty\mbox{ and }\eta\mbox{ bounded. }{}\end{aligned} $$
(6.5.14)

On applying this asymptotic formula to the gamma functions appearing in (i) and (ii) for α →∞, we have

$$\displaystyle \begin{aligned}\prod_{j=1}^p\frac{\varGamma(\alpha+\gamma+\delta-\frac{j-1}{2})}{\varGamma(\alpha+\gamma-\frac{j-1}{2})}\to \alpha^{p\,\delta} \end{aligned}$$

and

$$\displaystyle \begin{aligned}\prod_{j=1}^p\frac{\varGamma(\alpha(1+h)+\gamma-\frac{j-1}{2})}{\varGamma(\alpha(1+h)+\gamma+\delta-\frac{j-1}{2})}\to [\alpha(1+h)]^{-p\,\delta}, \end{aligned}$$
(iii)

so that

$$\displaystyle \begin{aligned}E[u^h]\to (1+h)^{-p\,\delta}. \end{aligned}$$
(iv)

On noting that \(E[u^h]=E[{\mathrm{e}}^{h\ln u}]\to (1+h)^{-p\,\delta }\), it is seen that \(\ln u\) has the mgf \((1+h)^{-p\,\delta }\) for 1 + h > 0 or \(-2\ln u\) has mgf \((1-2h)^{-p\,\delta }\) for 1 − 2h > 0, which happens to be the mgf of a real scalar chisquare variable with 2 p δ degrees of freedom if 2 p δ is a positive integer or a real gamma variable with the parameters (α =  p δ, β = 2). Hence the following result.

Corollary 6.5.1

Consider a slightly more general case than that considered in Theorem 6.5.2 . Let the h-th moment of u be of the form

$$\displaystyle \begin{aligned}E[u^h]=\Big\{\prod_{j=1}^p\frac{\varGamma(\alpha(1+h)+\gamma_j)}{\varGamma(\alpha +\gamma_j)}\Big\}\Big\{\prod_{j=1}^p\frac{\varGamma(\alpha+\gamma_j+\delta_j)}{\varGamma(\alpha(1+h)+\gamma_j+\delta_j)}\Big\}.{}\end{aligned} $$
(6.5.15)

Then as \(\alpha \to \infty ,~ E[u^h]\to (1+h)^{-(\delta _1+\cdots +\delta _p)}\) , which implies that \(-2\ln u\to \chi _{2(\delta _1+\cdots +\delta _p)}^2\) whenever 2(δ 1 + ⋯ + δ p) is a positive integer or, equivalently, \(-2\ln u\) tends to a real gamma variable with the parameters (α = δ 1 + ⋯ + δ p, β = 2) .
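
Corollary 6.5.1 can be verified numerically by evaluating the gamma-ratio product in (6.5.15) on the log scale for increasing α. In the following Python sketch, the constants γ j and δ j are arbitrary illustrative values; the computed quantity approaches −(δ 1 + ⋯ + δ p) ln(1 + h) as α grows.

import numpy as np
from scipy.special import gammaln

h = 0.7
gamma_j = np.array([0.3, -0.5, 1.0])     # bounded constants gamma_j (illustrative)
delta_j = np.array([1.5, 2.0, 0.5])      # positive constants delta_j (illustrative)

def log_moment(alpha):
    # log of the gamma-ratio product in (6.5.15)
    return np.sum(gammaln(alpha * (1 + h) + gamma_j) - gammaln(alpha + gamma_j)
                  + gammaln(alpha + gamma_j + delta_j)
                  - gammaln(alpha * (1 + h) + gamma_j + delta_j))

target = -delta_j.sum() * np.log(1 + h)  # limiting value -(delta_1+...+delta_p) ln(1+h)
for alpha in (5.0, 50.0, 500.0):
    print(alpha, log_moment(alpha), target)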

Let us examine the asymptotic distribution of the test statistic for the sphericity test in the light of Theorem 6.5.2. It is seen from (6.5.4) that \(\lambda ^h=u^{h\frac {n}{2}}\). Thus, by replacing h by \(\frac {n}{2}h\) in (6.5.8) with m = n − 1, we have

$$\displaystyle \begin{aligned} E[\lambda^h|H_o]=\Big\{\prod_{j=1}^{p-1}\frac{\varGamma(\frac{n-1}{2}+\frac{j}{p})}{\varGamma(\frac{n-1}{2}-\frac{j}{2})}\Big\} \Big\{\prod_{j=1}^{p-1}\frac{\varGamma(\frac{n}{2}(1+h) -\frac{1}{2}-\frac{j}{2})}{\varGamma(\frac{n}{2}(1+h)-\frac{1}{2}+\frac{j}{p})}\Big\}. \end{aligned} $$
(6.5.16)

Then, it follows from Corollary 6.5.1 that \(-2\ln \lambda \to \chi ^2_{2\sum _{j=1}^{p-1}(\frac {j}{2}+\frac {j}{p})}\), a chi-square random variable having \(2\sum _{j=1}^{p-1}(\frac {j}{2}+\frac {j}{p})=\frac {p(p-1)}{2}+(p-1)=\frac {(p-1)(p+2)}{2}\) degrees of freedom. Hence the following result:

Theorem 6.5.3

Consider the λ-criterion for testing the hypothesis of sphericity. Then, under the null hypothesis, \(-2\ln \lambda \to \chi ^2_{\frac {(p-1)(p+2)}{2}}\) as the sample size n →∞. In the complex case, as n →∞, \(-2\ln \lambda \to \chi ^2_{(p-1)(p+1)}\) , a real scalar chisquare variable with \(2[\frac {p(p-1)}{2}+\frac {(p-1)}{2}]=p(p-1)+(p-1)=(p-1)(p+1)\) degrees of freedom.
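
A sketch of the resulting large-sample sphericity test in Python follows; the data are simulated under H o with an arbitrary σ 2, and −2 ln λ is obtained directly from (6.5.4).

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
p, n = 4, 200
X = rng.multivariate_normal(np.zeros(p), 3.0 * np.eye(p), size=n)   # H_o holds with sigma^2 = 3

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar)
# -2 ln(lambda) with lambda = |S|^{n/2} / [tr(S)/p]^{np/2}, cf. (6.5.4)
stat = n * (p * np.log(np.trace(S) / p) - np.log(np.linalg.det(S)))
df = (p - 1) * (p + 2) // 2                          # = 9 for p = 4
print(stat, stats.chi2.sf(stat, df))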

Note 6.5.1

We observe that the degrees of freedom of the real chisquare variable in the real scalar case is \(\frac {(p-1)(p+2)}{2}\), which is also equal to the number of parameters restricted by the null hypothesis. Indeed, when Σ = σ 2 I, we have σ ij = 0, ij, which produces \(\frac {p(p-1)}{2}\) restrictions and, since σ 2 is unknown, requiring that the diagonal elements are such that σ 11 = ⋯ = σ pp produces p − 1 additional restrictions for a total of \(\frac {(p-1)(p+2)}{2}\) restrictions being imposed. Thus, the degrees of freedom of the asymptotic chisquare variable corresponds to the number of restrictions imposed by H o, which, actually, is a general result.

6.6. Testing the Hypothesis that the Covariance Matrix is Diagonal

Consider the null hypothesis that Σ, the nonsingular covariance matrix of a p-variate real normal distribution, is diagonal, that is,

$$\displaystyle \begin{aligned}H_o: \varSigma={\mathrm{diag}}(\sigma_{11},\ldots,\sigma_{pp}).\end{aligned}$$

Since the population is assumed to be normally distributed, this implies that the components of the p-variate Gaussian vector are mutually independently distributed as univariate normal random variables whose respective variances are σ jj, j = 1, …, p. Consider a simple random sample of size n from a nonsingular N p(μ, Σ) population or, equivalently, let X 1, …, X n be independently distributed as N p(μ, Σ) vectors, Σ > O. Under H o, σ jj is estimated by its MLE which is \(\hat {\sigma }_{jj}=\frac {1}{n}s_{jj}\) where s jj is the j-th diagonal element of S = (s ij), the sample sum of products matrix. The maximum of the likelihood function under the null hypothesis is then

$$\displaystyle \begin{aligned}\max_{H_o}L=\prod_{j=1}^p\max_{H_o}L_j=\frac{1}{(2\pi)^{\frac{np}{2}}\prod_{j=1}^p[s_{jj}]^{\frac{n}{2}}}n^{\frac{np}{2}}{\mathrm{e}}^{-\frac{1}{2}(np)},\end{aligned}$$

the likelihood function being the joint density evaluated at an observed value of the sample. Observe that the overall maximum or the maximum in the entire parameter space remains the same as that given in (6.1.2). Thus, the λ-criterion is given by

$$\displaystyle \begin{aligned}\lambda=\frac{{\mathrm{sup}}_{\omega}L}{{\mathrm{sup}}_{\Omega}L}=\frac{|S|{}^{\frac{n}{2}}}{\prod_{j=1}^ps_{jj}^{\frac{n}{2}}}\Rightarrow u_2=\lambda^{\frac{2}{n}}=\frac{|S|}{\prod_{j=1}^ps_{jj}}{}\end{aligned} $$
(6.6.1)

where S ∼ W p(m, Σ), Σ > O, and m = n − 1, n being the sample size. Under H o, Σ = diag(σ 11, …, σ pp). Then for an arbitrary h, the h-th moment of u 2 is available by taking the expected value of \(\lambda ^{\frac {2}{n}}\) with respect to the density of S, that is,

$$\displaystyle \begin{aligned} E[u_2^h|H_o]=\int_{S>O}\frac{|S|{}^{\frac{m}{2}+h-\frac{p+1}{2}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)}(\prod_{j=1}^ps_{jj})^{-h}}{2^{\frac{mp}{2}}|\varSigma|{}^{\frac{m}{2}}\varGamma_p(\frac{m}{2})}{\mathrm{d}}S \end{aligned} $$
(6.6.2)

where, under H o, |Σ| = σ 11σ pp. As was done in Sect. 6.1.1, we may replace \(s_{jj}^{-h}\) by the equivalent integral,

$$\displaystyle \begin{aligned}s_{jj}^{-h}=\frac{1}{\varGamma(h)}\int_0^{\infty}x_j^{h-1}{\mathrm{e}}^{-x_j(s_{jj})}{\mathrm{d}}x_j,~ \Re(h)>0. \end{aligned}$$

Thus,

$$\displaystyle \begin{aligned}\prod_{j=1}^ps_{jj}^{-h}=\frac{1}{[\varGamma(h)]^p}\int_0^{\infty}\cdots\int_0^{\infty}x_1^{h-1}\cdots x_p^{h-1}{\mathrm{e}}^{-{\mathrm{tr}}(YS)}{\mathrm{d}}x_1\wedge\ldots\wedge{\mathrm{d}}x_p \end{aligned}$$
(i)

where Y = diag(x 1, …, x p), so that tr(YS) = x 1 s 11 + ⋯ + x p s pp. Then, (6.6.2) can be re-expressed as follows:

$$\displaystyle \begin{aligned} E[u_2^h|H_o]&=\frac{1}{2^{\frac{mp}{2}}\varGamma_p(\frac{m}{2})(\prod_{j=1}^p\sigma_{jj})^{\frac{m}{2}}[\varGamma(h)]^p}\int_0^{\infty}\cdots \int_0^{\infty}x_1^{h-1}\cdots x_p^{h-1}\\ &\qquad \qquad \qquad \qquad \times\int_{S>O}|S|{}^{\frac{m}{2}+h-\frac{p+1}{2}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}((\varSigma^{-1}+2Y)S)}{\mathrm{d}}S\,{\mathrm{d}}x_1\wedge\ldots\wedge{\mathrm{d}}x_p\\ &=\frac{\varGamma_p(\frac{m}{2}+h)}{2^{\frac{mp}{2}}\varGamma_p(\frac{m}{2})(\prod_{j=1}^p\sigma_{jj})^{\frac{m}{2}}[\varGamma(h)]^p}\int_0^{\infty}\cdots\int_0^{\infty}x_1^{h-1} \cdots x_p^{h-1}\\ &\qquad \qquad \qquad \qquad \times \Big|\frac{(\varSigma^{-1}+2Y)}{2}\Big|{}^{-(\frac{m}{2}+h)}{\mathrm{d}}x_1\wedge\ldots\wedge{\mathrm{d}}x_p,\end{aligned} $$

and observing that, under H o,

$$\displaystyle \begin{aligned} \Big|\frac{(\varSigma^{-1}+2Y)}{2}\Big|&=\Big|\frac{\varSigma^{-1}}{2}\Big|\,|I+2\varSigma Y| \mbox{ with}\\ |I+2\varSigma Y|&=(1+2\sigma_{11}y_1)\cdots (1+2\sigma_{pp}y_p),\end{aligned} $$
$$\displaystyle \begin{aligned} E[u_2^h|H_o]&=\frac{\varGamma_p(\frac{m}{2}+h)}{\varGamma_p(\frac{m}{2})}\prod_{j=1}^p\frac{1}{\varGamma(h)}\int_0^{\infty}y_j^{h-1}(1+y_j)^{-(\frac{m}{2}+h)}{\mathrm{d}}y_j,~ y_j=2x_j\sigma_{jj}\\ &=\frac{\varGamma_p(\frac{m}{2}+h)}{\varGamma_p(\frac{m}{2})}\Big[\frac{\varGamma(\frac{m}{2})}{\varGamma(\frac{m}{2}+h)}\Big]^p, ~ \Re(\frac{m}{2}+h)>\frac{p-1}{2}.{}\end{aligned} $$
(6.6.3)

Thus,

$$\displaystyle \begin{aligned} E[u_2^h|H_o]&=\Big[\frac{\varGamma(\frac{m}{2})}{\varGamma(\frac{m}{2}+h)}\Big]^p\prod_{j=1}^p\frac{\varGamma(\frac{m}{2}-\frac{j-1}{2}+h)}{\varGamma(\frac{m}{2}-\frac{j-1}{2})}\\ &=\frac{[\varGamma(\frac{m}{2})]^{p-1}}{\{\prod_{j=1}^{p-1}\varGamma(\frac{m}{2}-\frac{j}{2})\}}\frac{\{\prod_{j=1}^{p-1}\varGamma(\frac{m}{2}-\frac{j}{2}+h)\}}{[\varGamma(\frac{m}{2}+h)]^{p-1}}.\end{aligned} $$

Denoting the density of u 2 as \(f_{u_2}(u_2|H_o)\), we can express it as an inverse Mellin transform by taking h = s − 1. Then,

$$\displaystyle \begin{aligned}f_{u_2}(u_2|H_o)=c_{2,p-1}\, G_{p-1,p-1}^{p-1,0}\left[u_2\Big\vert_{\frac{m}{2}-\frac{j}{2}-1,~j=1,\ldots,p-1}^{\frac{m}{2}-1,\ldots,\frac{m}{2}-1}\right],~0\le u_2\le 1,{}\end{aligned} $$
(6.6.4)

and zero elsewhere, where

$$\displaystyle \begin{aligned}c_{2,p-1}=\frac{[\varGamma(\frac{m}{2})]^{p-1}}{\{\prod_{j=1}^{p-1}\varGamma(\frac{m}{2}-\frac{j}{2})\}}.\end{aligned}$$

Some special cases of this density are expounded below.

Real and complex cases: p = 2

When p = 2, u 2 has a real type-1 beta density with the parameters \((\alpha =\frac {m}{2}-\frac {1}{2},~\beta =\frac {1}{2})\) in the real case. In the complex case, it has a real type-1 beta density with the parameters (α = m − 1, β = 1).
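
For p = 2, the exact null distribution just stated yields a simple test of diagonality (equivalently, of zero correlation). The Python sketch below uses simulated bivariate data; note that u 2 = |S|∕(s 11 s 22) equals 1 − r 2, where r is the sample correlation coefficient, and that H o is rejected for small values of u 2.

import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 25
m = n - 1
X = rng.multivariate_normal(np.zeros(2), np.diag([1.0, 4.0]), size=n)   # H_o holds

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar)
u2 = np.linalg.det(S) / (S[0, 0] * S[1, 1])       # equals 1 - r^2, r = sample correlation

# exact null distribution for p = 2: real type-1 beta with parameters (m/2 - 1/2, 1/2)
p_value = stats.beta.cdf(u2, (m - 1) / 2.0, 0.5)  # reject H_o for small u2
print(u2, p_value)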

Real and complex cases: p = 3

In this case, \(f_{u_2}(u_2|H_o)\) is given by

$$\displaystyle \begin{aligned}f_{u_2}(u_2|H_o)=c_{2,2}\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{\varGamma(\frac{m}{2}-\frac{3}{2}+s)\varGamma(\frac{m}{2}-2+s)}{[\varGamma(\frac{m}{2}-1+s)]^2}u_2^{-s}{\mathrm{d}}s. \end{aligned}$$

The poles of the integrand are simple. Those coming from \(\varGamma (\frac {m}{2}-\frac {3}{2}+s)\) occur at \(s=-\frac {m}{2}+\frac {3}{2}-\nu ,~ \nu =0,1,\ldots \) . The residue R ν is the following:

$$\displaystyle \begin{aligned}R_{\nu}=u_2^{\frac{m}{2}-\frac{3}{2}+\nu}\frac{(-1)^{\nu}}{\nu!}\frac{\varGamma(-\frac{1}{2}-\nu)}{[\varGamma(\frac{1}{2}-\nu)]^2} =\frac{\varGamma(-\frac{1}{2})}{[\varGamma(\frac{1}{2})]^2}\frac{(\frac{1}{2})_{\nu}(\frac{1}{2})_{\nu}}{(\frac{3}{2})_{\nu}}{\frac{u_2^{\frac{m}{2}-\frac{3}{2}+\nu}}{\nu !}}. \end{aligned}$$

Summing the residues, we have

$$\displaystyle \begin{aligned}\sum_{\nu=0}^{\infty}R_{\nu}=\frac{\varGamma(-\frac{1}{2})}{[\varGamma(\frac{1}{2})]^2}u_2^{\frac{m}{2}-\frac{3}{2}}{{}_2F_1}\Big(\frac{1}{2},\frac{1}{2}; \frac{3}{2};u_2\Big),~0\le u_2\le 1. \end{aligned}$$

Now, consider the sum of the residues at the poles of \(\varGamma (\frac {m}{2}-2+s)\). Observing that \(\varGamma (\frac {m}{2}-2+s)\) cancels out one of the gamma functions in the denominator, namely \(\varGamma (\frac {m}{2}-1+s)=(\frac {m}{2}-2+s)\varGamma (\frac {m}{2}-2+s)\), the integrand becomes

$$\displaystyle \begin{aligned}\frac{\varGamma(\frac{m}{2}-\frac{3}{2}+s)u_2^{-s}}{(\frac{m}{2}-2+s)\varGamma(\frac{m}{2}-1+s)},\end{aligned}$$

the residue at the pole \(s=-\frac {m}{2}+2\) being \(\frac {\varGamma (\frac {1}{2})\,u_2^{\frac {m}{2}-2}}{\varGamma (1)}\). Then, noting that \(\varGamma (-\frac {1}{2})=-2\varGamma (\frac {1}{2})=-2\sqrt {\pi }\), the density is the following:

$$\displaystyle \begin{aligned} f_{u_2}(u_2|H_o)=c_{2,2}\Big\{\sqrt{\pi}u_2^{\frac{m}{2}-2}-\frac{2}{\sqrt{\pi}}u_2^{\frac{m}{2}-\frac{3}{2}}{{}_2F_1}\Big(\frac{1}{2},\frac{1}{2}; \frac{3}{2};u_2\Big)\Big\},~0\le u_2\le 1, \end{aligned} $$
(6.6.5)

and zero elsewhere.

In the complex case, the integrand is

$$\displaystyle \begin{aligned}\frac{\varGamma(m-2+s)\varGamma(m-3+s)}{[\varGamma(m-1+s)]^2}u_2^{-s}=\frac{1}{(m-2+s)^2(m-3+s)}u_2^{-s},\end{aligned}$$

and hence there is a pole of order 1 at s = −m + 3 and a pole of order 2 at s = −m + 2. The residue at s = −m + 3 is \(\frac {u_2^{m-3}}{(1)^2}=u_2^{m-3}\) and the residue at s = −m + 2 is given by

$$\displaystyle \begin{aligned}\lim_{s\to -m+2}\frac{\partial}{\partial s}(m-2+s)^2\Big[\frac{1}{(m-2+s)^2(m-3+s)}u_2^{-s}\Big]=\lim_{s\to -m+2}\frac{\partial}{\partial s}\Big[\frac{u_2^{-s}}{(m-3+s)}\Big],\end{aligned}$$

which gives the residue as \(u_2^{m-2}\ln u_2-u_2^{m-2}\). Thus, the sum of the residues is \(u_2^{m-3}+u_2^{m-2}\ln u_2-u_2^{m-2}\) and the constant part is

$$\displaystyle \begin{aligned}\frac{[\varGamma(m)]^2}{\varGamma(m-1)\varGamma(m-2)}=(m-1)^2(m-2), ~m >2,\end{aligned}$$

so that the density is

$$\displaystyle \begin{aligned}{f}_{{u}_2}({u}_2)=(m-1)^2(m-2)[u_2^{m-3}+u_2^{m-2}\ln u_2 -u_2^{m-2}],~0<u_2\le 1, ~m\ge 3,\end{aligned}$$

and zero elsewhere. Note that as u 2 → 0, the limit of \(\,\frac {u_2^{m-1}}{m-1}\ln u_2\,\) is zero. By integrating over 0 < u 2 ≤ 1 while m ≥ 3, it can be verified that \({f}_{{u}_2}(\cdot )\) is indeed a density function.

Real and complex cases: p ≥ 4

As poles of higher orders are present when p ≥ 4, both in the real and complex cases, the exact density function of the test statistic will not be herein explicitly given for those cases. Actually, the resulting densities would involve G-functions for which general expansions are for instance provided in Mathai (1993). The exact null and non-null densities of \(u=\lambda ^{\frac {2}{n}}\) have been previously derived by the first author. Percentage points accurate to the 11th decimal place are available from Mathai and Katiyar (1979a, 1980) for the null case; as well, various aspects of the distribution of the test statistic are discussed in Mathai and Rathie (1971) and Mathai (1973, 1984, 1985).

Let us now consider the asymptotic distribution of the λ-criterion under the null hypothesis,

$$\displaystyle \begin{aligned}H_o: \varSigma={\mathrm{diag}}(\sigma_{11},\ldots,\sigma_{pp}). \end{aligned}$$

Given the representation of the h-th moment of u 2 provided in (6.6.3) and referring to Corollary 6.5.1, it is seen that the sum of the δ j’s is \(\sum _{j=1}^{p-1}\delta _j=\sum _{j=1}^{p-1}\frac {j}{2}=\frac {p(p-1)}{4}\), so that the number of degrees of freedom of the asymptotic chisquare distribution is \(2[\frac {p(p-1)}{4}]=\frac {p(p-1)}{2}\) which, as it should be, is the number of restrictions imposed by H o, noting that when Σ is diagonal, σ ij = 0, ij, which produces \(\frac {p(p-1)}{2}\) restrictions. Hence, the following result:

Theorem 6.6.1

Let λ be the likelihood ratio criterion for testing the hypothesis that the covariance matrix Σ of a nonsingular N p(μ, Σ) distribution is diagonal. Then, as n → ∞, \(-2\ln \lambda \to \chi ^2_{\frac {p(p-1)}{2}}\) in the real case. In the corresponding complex case, as n → ∞, \(-2\ln \lambda \to \chi ^2_{p(p-1)}\) , a real scalar chisquare variable having p(p − 1) degrees of freedom.
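For readers who wish to carry out this test numerically, the following minimal Python sketch (our own illustration, using numpy and scipy, with simulated data standing in for an observed sample; all variable names are ours) evaluates u 2 = |S|∕(s 11⋯s pp), the statistic −2 ln λ = −n ln u 2, and the chisquare critical point of Theorem 6.6.1.

```python
import numpy as np
from scipy.stats import chi2

# Simulated sample standing in for observed data (any n x p data matrix would do).
rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)   # rows are X_1', ..., X_n'

Xbar = X.mean(axis=0)
S = (X - Xbar).T @ (X - Xbar)                # sample sum of products matrix

# u_2 = |S| / (s_11 ... s_pp); lambda = u_2^(n/2), so -2 ln(lambda) = -n ln(u_2)
sign, logdetS = np.linalg.slogdet(S)
log_u2 = logdetS - np.sum(np.log(np.diag(S)))
minus2lnlam = -n * log_u2

df = p * (p - 1) // 2                        # restrictions imposed by H_o: sigma_ij = 0, i != j
crit = chi2.ppf(0.95, df)
print(minus2lnlam, crit, minus2lnlam >= crit)   # reject H_o when -2 ln(lambda) >= critical value
```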

6.7. Equality of Diagonal Elements, Given that Σ is Diagonal, Real Case

In the case of a p-variate real nonsingular N p(μ, Σ) population, whenever Σ is diagonal, the individual components are independently distributed as univariate normal random variables. Consider a simple random sample of size n, that is, a set of p × 1 vectors X 1, …, X n, that are iid as X j ∼ N p(μ, Σ) where it is assumed that \(\varSigma ={\mathrm{diag}}(\sigma _1^2,\ldots ,\sigma _p^2)\). Letting \(X_j^{\prime }=(x_{1j},\ldots ,x_{pj})\), the joint density of the x rj’s, j = 1, …, n, in the above sample, which is denoted by L r, is given by

$$\displaystyle \begin{aligned}L_r=\prod_{j=1}^n\frac{{\mathrm{e}}^{-\frac{1}{2\sigma_r^2}(x_{rj}-\mu_{r})^2}}{(2\pi )^{\frac{1}{2}}(\sigma_r^2)^{\frac{1}{2}}}=\frac{{\mathrm{e}}^{-\frac{1}{2\sigma_r^2}\sum_{j=1}^n(x_{rj}-\mu_r)^2}}{(2\pi)^{\frac{n}{2}}(\sigma_r^2)^{\frac{n}{2}}}.\end{aligned}$$

Then, on substituting the maximum likelihood estimators of μ r and \(\sigma _r^2\) in L r, its maximum is

$$\displaystyle \begin{aligned}\max L_r=\frac{n^{\frac{n}{2}}{\mathrm{e}}^{-\frac{n}{2}}}{(2\pi)^{\frac{n}{2}}(s_{rr})^{\frac{n}{2}}}, ~s_{rr}=\sum_{j=1}^n(x_{rj}-\bar{x_r})^2. \end{aligned}$$

Under the null hypothesis H o, \(\sigma _1^2=\sigma_2^2=\cdots =\sigma _p^2\equiv \sigma ^2\) and the MLE of σ 2 is a pooled estimate which is equal to \(\frac {1}{np}(s_{11}+\cdots +s_{pp})\). Thus, the λ-criterion is the following in this case:

$$\displaystyle \begin{aligned}\lambda=\frac{{\mathrm{sup}}_{\omega}L}{{\mathrm{sup}}_{\Omega}L}=\frac{[s_{11}s_{22}\cdots s_{pp}]^{\frac{n}{2}}}{(\frac{s_{11}+\cdots+s_{pp}}{p})^{\frac{np}{2}}}.{}\end{aligned} $$
(6.7.1)

If we let

$$\displaystyle \begin{aligned}u_3=\lambda^{\frac{2}{n}}=\frac{p^p(\prod_{j=1}^ps_{jj})}{(\sum_{j=1}^ps_{jj})^p},{}\end{aligned} $$
(6.7.2)

then, for arbitrary h, the h-th moment of u 3 is the following:

$$\displaystyle \begin{aligned} E[u_3^h|H_o]=E\Big[\frac{p^{ph}(\prod_{j=1}^ps_{jj}^h)}{(s_{11}+\cdots+s_{pp})^{ph}}\Big]=E \Big[p^{ph}\Big(\prod_{j=1}^ps_{jj}^h\Big)(s_{11}+\cdots+s_{pp})^{-ph}\Big]. \end{aligned} $$
(6.7.3)

Observe that \(\frac {s_{jj}}{\sigma ^2}\overset {iid}{\sim } \chi _{n-1}^2=\chi _m^2,~ m=n-1,\) for j = 1, …, p, the density of s jj being of the form

$$\displaystyle \begin{aligned}f_{s_{jj}}(s_{jj})=\frac{s_{jj}^{\frac{m}{2}-1}{\mathrm{e}}^{-\frac{s_{jj}}{2\sigma^2}}}{(2\sigma^2)^{\frac{m}{2}}\varGamma(\frac{m}{2})}, ~0\le s_{jj}<\infty, ~m=n-1=1,2,\ldots, \end{aligned}$$
(i)

under H o. Note that (s 11 + ⋯ + s pp)^{−ph} can be replaced by an equivalent integral as

$$\displaystyle \begin{aligned}(s_{11}+\cdots+s_{pp})^{-ph}=\frac{1}{\varGamma(ph)}\int_0^{\infty}x^{ph-1}{\mathrm{e}}^{-x(s_{11}+\cdots+s_{pp})}{\mathrm{d}}x, ~\Re(h)>0. \end{aligned}$$
(ii)

Due to independence of the s jj’s, the joint density of s 11, …, s pp, is the product of the densities appearing in (i), and on integrating out s 11, …, s pp, we end up with the following:

$$\displaystyle \begin{aligned}\frac{1}{(2\sigma^2)^{\frac{mp}{2}}}\Big\{\prod_{j=1}^p\frac{\varGamma(\frac{m}{2}+h)}{\varGamma(\frac{m}{2})}\Big\} \Big\{\prod_{j=1}^p\Big[\frac{(1+2\sigma^2 x)}{2\sigma^2}\Big]^{-(\frac{m}{2}+h)}\Big\}=\frac{[\varGamma(\frac{m}{2}+h)]^p}{[\varGamma(\frac{m}{2})]^p}\Big[\frac{(1+2\sigma^2 x)^{-p(\frac{m}{2}+h)}}{(2\sigma^2)^{-ph}}\Big]. \end{aligned}$$

Now, the integral over x can be evaluated as follows:

$$\displaystyle \begin{aligned}\frac{(2\sigma^2)^{ph}}{\varGamma(ph)}\int_0^{\infty}x^{ph-1}(1+2\sigma^2 x)^{-p(\frac{m}{2}+h)}{\mathrm{d}}x=\frac{\varGamma(\frac{mp}{2})}{\varGamma(\frac{mp}{2}+ph)},~\Re(h)>0. \end{aligned}$$

Thus,

$$\displaystyle \begin{aligned} E[u_3^h|H_o]=p^{ph}\frac{\varGamma^p(\frac{m}{2}+h)}{\varGamma^p(\frac{m}{2})}\frac{\varGamma(\frac{mp}{2})} {\varGamma(\frac{mp}{2}+ph)},~\Re(h)>0. \end{aligned} $$
(6.7.4)

The density of u 3 can be written in terms of an H-function. Since p is a positive integer, we can expand one gamma ratio using Gauss’ multiplication formula:

$$\displaystyle \begin{aligned}\frac{\varGamma(\frac{mp}{2})}{\varGamma(\frac{mp}{2}+ph)}=\frac{(2\pi)^{\frac{1-p}{2}}p^{\frac{pm}{2}-\frac{1}{2}}\varGamma(\frac{m}{2})\varGamma(\frac{m}{2}+\frac{1}{p}) \cdots\varGamma(\frac{m}{2}+\frac{p-1}{p})}{(2\pi)^{\frac{1-p}{2}}p^{\frac{mp}{2}-\frac{1}{2}+ph}\varGamma(\frac{m}{2}+h)\cdots\varGamma(\frac{m}{2} +\frac{p-1}{p}+h)} \end{aligned}$$

for p = 1, 2, …, m ≥ p. Accordingly,

$$\displaystyle \begin{aligned} E[u_3^h|H_o]&=\frac{[\varGamma(\frac{m}{2}+h)]^p}{[\varGamma(\frac{m}{2})]^p}\prod_{j=0}^{p-1}\frac{\varGamma(\frac{m}{2}+\frac{j}{p})}{\varGamma(\frac{m}{2}+\frac{j}{p}+h)}\\ &=\frac{[\varGamma(\frac{m}{2}+h)]^{p-1}}{[\varGamma(\frac{m}{2})]^{p-1}}\prod_{j=1}^{p-1}\frac{\varGamma(\frac{m}{2}+\frac{j}{p})}{\varGamma(\frac{m}{2}+\frac{j}{p}+h)}=c_{3,p-1} \frac{[\varGamma(\frac{m}{2}+h)]^{p-1}} {\prod_{j=1}^{p-1}\varGamma(\frac{m}{2}+\frac{j}{p}+h)},{} \end{aligned} $$
(6.7.5)
$$\displaystyle \begin{aligned} c_{3,p-1}&=\frac{\prod_{j=1}^{p-1}\varGamma(\frac{m}{2}+\frac{j}{p})}{[\varGamma(\frac{m}{2})]^{p-1}}, ~\Re\big(\frac{m}{2}+h\big)>0.{}\end{aligned} $$
(6.7.6)

Hence, for h = s − 1, (6.7.5) is the Mellin transform of the density of u 3. Thus, denoting the density by \(f_{u_3}(u_3)\), we have

$$\displaystyle \begin{aligned}f_{u_3}(u_3|H_o)=c_{3,p-1}G_{p-1,p-1}^{p-1,0}\left[u_3\big\vert^{\frac{m}{2}-1+\frac{j}{p},~j=1,\ldots,p-1}_{\frac{m}{2}-1,\ldots,\frac{m}{2}-1}\right], ~ 0\le u_3\le 1,{}\end{aligned} $$
(6.7.7)

and zero elsewhere.

In the complex case, the h-th moment is the following:

$$\displaystyle \begin{aligned} E[\tilde{u}_3^h|H_o]&=\tilde{c}_{3,p-1}\frac{[\tilde{\varGamma}(m+h)]^{p-1}}{\prod_{j=1}^{p-1}\tilde{\varGamma}(m+\frac{j}{p}+h)}, \end{aligned} $$
(6.7a.1)
$$\displaystyle \begin{aligned} \tilde{c}_{3,p-1}&=\frac{\prod_{j=1}^{p-1}\tilde{\varGamma}(m+\frac{j}{p})}{[\tilde{\varGamma}(m)]^{p-1}},\end{aligned} $$
(6.7a.2)

and the corresponding density is given by

$$\displaystyle \begin{aligned} \tilde{f}_{\tilde{u}_3}(\tilde{u}_3|H_o)=\tilde{c}_{3,p-1}G_{p-1,p-1}^{p-1,0}\left[\tilde{u}_3\big\vert^{m-1+\frac{j}{p},\ j=1,\ldots,p-1}_{m-1,\ldots,m-1}\right],\ 0\le |\tilde{u}_3|\le 1, \end{aligned}$$
(6.7a.3)

and zero elsewhere, G denoting a real G-function.

Real and complex cases: p = 2

It is seen from (6.7.5) that for p = 2, u 3 has a real type-1 beta distribution with the parameters \((\alpha =\frac {m}{2},~\beta =\frac {1}{p})\) in the real case. Whenever p ≥ 3, poles of order 2 or higher occur, and the resulting density functions, which are expressible in terms of generalized hypergeometric functions, will not be explicitly provided. For a general series expansion of the G-function, the reader may refer to Mathai (1970a, 1993).

In the complex case, when p = 2, \(\tilde {u}_3\) has a real type-1 beta density with the parameters \((\alpha =m, ~\beta =\frac {1}{p})\). In this instance as well, poles of higher orders will be present when p ≥ 3, and hence explicit forms of the corresponding densities will not be herein provided. The exact null and non-null distributions of the test statistic are derived for the general case in Mathai and Saxena (1973), and highly accurate percentage points are provided in Mathai (1979a,b).

An asymptotic result can also be obtained as n → ∞. Consider the h-th moment of λ, which is available from (6.7.5) in the real case and from (6.7a.1) in the complex case. Then, referring to Corollary 6.5.2, \(\delta _j=\frac {j}{p}\) whether in the real or in the complex situations. Hence, \(2[\sum _{j=1}^{p-1}\delta _j]=2\sum _{j=1}^{p-1}\frac {j}{p}=(p-1)\) in both the real and the complex cases. As well, observe that in the complex case, the diagonal elements are real since \(\tilde {\varSigma }\) is Hermitian positive definite. Accordingly, the number of restrictions imposed by H o in either the real or complex cases is p − 1. Thus, the following result:

Theorem 6.7.1

Consider the λ-criterion for testing the equality of the diagonal elements, given that the covariance matrix is already diagonal. Then, as n → ∞, the null distribution of \(-2\ln \lambda \to \chi ^2_{p-1}\) in both the real and the complex cases.
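As a numerical illustration of Theorem 6.7.1, the sketch below (our own, with simulated observations standing in for real data) computes u 3 as in (6.7.2) and refers −2 ln λ = −n ln u 3 to a chisquare distribution with p − 1 degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical sample of iid N_p(mu, Sigma) vectors with Sigma assumed diagonal.
rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)

Xbar = X.mean(axis=0)
s = np.sum((X - Xbar) ** 2, axis=0)          # s_11, ..., s_pp

# u_3 = p^p (prod s_jj) / (sum s_jj)^p as in (6.7.2); lambda = u_3^(n/2)
log_u3 = p * np.log(p) + np.sum(np.log(s)) - p * np.log(np.sum(s))
minus2lnlam = -n * log_u3

crit = chi2.ppf(0.95, p - 1)                 # Theorem 6.7.1: p - 1 degrees of freedom
print(minus2lnlam, crit, minus2lnlam >= crit)
```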

6.8. Hypothesis that the Covariance Matrix is Block Diagonal, Real Case

We will discuss a generalization of the problem examined in Sect. 6.6, considering again the case of real Gaussian vectors. Let X 1, …, X n be iid as X j ∼ N p(μ, Σ), Σ > O, and let X j, μ and Σ be partitioned into k corresponding components.

In this case, the p × 1 real Gaussian vector is subdivided into subvectors of orders p 1, …, p k, so that p 1 + ⋯ + p k = p, and, under the null hypothesis H o, Σ is assumed to be a block diagonal matrix, which means that the subvectors are mutually independently distributed p j-variate real Gaussian vectors with corresponding mean value vector μ (j) and covariance matrix Σ jj, j = 1, …, k. Then, the joint density of the sample values under the null hypothesis can be written as \(L=\prod _{r=1}^kL_r\) where L r is the joint density of the sample values corresponding to the subvector X (rj), j = 1, …, n, r = 1, …, k. Letting the p × n general sample matrix be X = (X 1, …, X n), we note that the sample representing the first p 1 rows of X corresponds to the sample from the first subvector \(X_{(1j)}\overset {iid}{\sim } N_{p_1}(\mu _{(1)},~\varSigma _{11}),~\varSigma _{11}>O\), j = 1, …, n. The MLE’s of μ (r) and Σ rr are the corresponding sample mean and sample covariance matrix. Thus, the maximum of L r is available as

$$\displaystyle \begin{aligned}\max L_r=\frac{{\mathrm{e}}^{-\frac{np_r}{2}}n^{\frac{np_r}{2}}}{(2\pi)^{\frac{np_r}{2}}|S_{rr}|{}^{\frac{n}{2}}}\,\overset{ind}{\Rightarrow}\, \prod_{r=1}^k\max L_r=\frac{{\mathrm{e}}^{-\frac{np}{2}}n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}\prod_{r=1}^k|S_{rr}|{}^{\frac{n}{2}}}.\end{aligned}$$

Hence,

$$\displaystyle \begin{aligned}\lambda=\frac{{\mathrm{sup}}_{\omega}L}{{\mathrm{sup}}_{\Omega}L}=\frac{|S|{}^{\frac{n}{2}}}{\prod_{r=1}^k|S_{rr}|{}^{\frac{n}{2}}},{}\end{aligned} $$
(6.8.1)

and

$$\displaystyle \begin{aligned}u_4\equiv\lambda^{\frac{2}{n}}=\frac{|S|}{\prod_{r=1}^k|S_{rr}|}.{}\end{aligned} $$
(6.8.2)

Observe that the covariance matrix Σ = (σ ij) can be written in terms of the matrix of population correlations. If we let D = diag(σ 1, …, σ p) where \(\sigma _t^2=\sigma _{tt}\) denotes the variance associated with the component x tj in \(X_j^{\prime }=(x_{1j},\ldots ,x_{pj})\), Cov(X j) = Σ, and R = (ρ rs) be the population correlation matrix, where ρ rs is the population correlation between the components x rj and x sj, then Σ = DRD. Consider a partitioning of Σ into k × k blocks as well as the corresponding partitioning of D and R:

where, for example, Σ jj is p j × p j, p 1 + ⋯ + p k = p, D and R being partitioned correspondingly. Consider as well the corresponding partitioning of the sample sum of products matrix S = (S ij), of D (s) and of R (s), where R (s) is the sample correlation matrix, \(D^{(s)}={\mathrm{diag}}(\sqrt {s_{11}},\ldots ,\sqrt {s_{pp}})\), and \(S_{jj},~ D_j^{(s)},~ R_{jj}^{(s)}\) are p j × p j. Then,

$$\displaystyle \begin{aligned}\frac{|\varSigma|}{\prod_{j=1}^k|\varSigma_{jj}|}=\frac{|R|}{\prod_{j=1}^k|R_{jj}|}{}\end{aligned} $$
(6.8.3)

and

$$\displaystyle \begin{aligned}u_4\equiv\lambda^{\frac{2}{n}}=\frac{|S|}{\prod_{j=1}^k|S_{jj}|}=\frac{|R^{(s)}|}{\prod_{j=1}^k|R_{jj}^{(s)}|}.{}\end{aligned} $$
(6.8.4)

An additional interesting property is now pointed out. Consider a linear function of the original p × 1 vector X j ∼ N p(μ, Σ), Σ > O, in the form CX j where C is a nonsingular diagonal matrix, diag(c 1, …, c p), c r ≠ 0, r = 1, …, p. In this case, the product CX j is such that the r-th component of X j is weighted or multiplied by c r. Regard C as a block diagonal matrix that is partitioned similarly to D, so that its j-th diagonal block is the p j × p j diagonal submatrix C j. Then,

$$\displaystyle \begin{aligned}u_c=\frac{|CSC'|}{\prod_{j=1}^k|C_jS_{jj}C_j^{\prime}|}=\frac{|S|}{\prod_{j=1}^k|S_{jj}|}=u_4.{}\end{aligned} $$
(6.8.5)

In other words, u 4 is invariant under linear transformations on \(X_j \overset {iid}{\sim }N_p(\mu ,\varSigma ), ~\varSigma >O,\ j=1,\ldots ,n\). That is, if Y j = CX j + d where d is a constant column vector, then the p × n sample matrix on Y j, namely, Y = (Y 1, …, Y n) = (CX 1 + d, …, CX n + d),

$$\displaystyle \begin{aligned}\mathbf{Y}-\bar{\mathbf{Y}}=C(\mathbf{X}-\bar{\mathbf{X}})\Rightarrow S_y=(\mathbf{Y}-\bar{\mathbf{Y}})(\mathbf{Y}-\bar{\mathbf{Y}})'=C(\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})'C'=CSC'. \end{aligned}$$

Letting S y be partitioned as S into k × k blocks and S y = (S ijy), we have

$$\displaystyle \begin{aligned}u_y\equiv\frac{|S_y|}{\prod_{j=1}^k|S_{jjy}|}=\frac{|CSC'|}{\prod_{j=1}^k|C_jS_{jj}C_j^{\prime}|}=\frac{|S|}{\prod_{j=1}^k|S_{jj}|}=u_4.{}\end{aligned} $$
(6.8.6)
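The invariance stated in (6.8.5) and (6.8.6) is easily verified numerically. The following sketch (our own illustration; the data, the block sizes, the scaling matrix C and the shift d are all made up) computes u 4 before and after the transformation Y j = CX j + d.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 5
blocks = [2, 3]                                   # p_1 = 2, p_2 = 3
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)

def u4(data, blocks):
    # u_4 = |S| / (|S_11| ... |S_kk|) computed from the sample sum of products matrix
    S = (data - data.mean(axis=0)).T @ (data - data.mean(axis=0))
    val = np.linalg.det(S)
    start = 0
    for b in blocks:
        val /= np.linalg.det(S[start:start + b, start:start + b])
        start += b
    return val

C = np.diag(rng.uniform(0.5, 2.0, size=p))        # nonsingular (block) diagonal scaling
d = rng.normal(size=p)                            # arbitrary shift vector
Y = X @ C.T + d                                   # rows are (C X_j + d)'

print(u4(X, blocks), u4(Y, blocks))               # the two values agree up to rounding
```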

Arbitrary moments of u 4 can be derived by proceeding as in Sect. 6.6. The h-th null moment, that is, the h-th moment under the null hypothesis H o, is then

$$\displaystyle \begin{aligned}E[u_4^h|H_o]=\frac{1}{2^{\frac{mp}{2}}\varGamma_p(\frac{m}{2})|\varSigma_o|{}^{\frac{m}{2}}}\int_{S>O}|S|{}^{\frac{m}{2}+h-\frac{p+1}{2}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma_o^{-1}S)}\big\{\prod_{r=1}^k|S_{rr}|{}^{-h}\big\}{\mathrm{d}}S \end{aligned}$$
(i)

where m = n − 1, n being the sample size, and

(ii)

where S rr is the r-th diagonal block of S, corresponding to the block Σ rr of Σ, whose order is p r × p r, r = 1, …, k, p 1 + ⋯ + p k = p. On noting that

$$\displaystyle \begin{aligned}|S_{rr}|{}^{-h}=\frac{1}{\varGamma_{p_r}(h)}\int_{Y_r>O}|Y_r|{}^{h-\frac{p_r+1}{2}}{\mathrm{e}}^{-{\mathrm{tr}}(Y_rS_{rr})}{\mathrm{d}}Y_r,\ r=1,\ldots,k, \end{aligned}$$
(iii)

where Y r > O is a p r × p r real positive definite matrix. On replacing each |S rr|^{−h} by its integral representation as given in (iii), the exponent of e in (i) becomes

The right-hand side of equation (i) then becomes

$$\displaystyle \begin{aligned} E[u_4^h|H_o]&=\frac{\varGamma_p(\frac{m}{2}+h)}{\varGamma_p(\frac{m}{2})|\varSigma_o|{}^{\frac{m}{2}}}2^{ph}\Big\{\prod_{r=1}^k\frac{1}{\varGamma_{p_r}(h)} \int_{Y_r>O}|Y_r|{}^{h-\frac{p_r+1}{2}}\Big\}\\ &\ \ \ \ \ \ \times |\varSigma_o^{-1}+2Y|{}^{-(\frac{m}{2}+h)}{\mathrm{d}}Y_1\wedge\ldots\wedge{\mathrm{d}}Y_k,~\Re(h)>-\frac{m}{2}+\frac{p-1}{2}.\end{aligned} $$
(iv)

It should be pointed out that the non-null moments of u 4 can be obtained by substituting a general Σ for Σ o in (iv). Note that if we replace 2Y by Y, the factor containing 2, namely 2^{ph}, will disappear. Further, under H o,

$$\displaystyle \begin{aligned}|\varSigma_o^{-1}+2Y|{}^{-(\frac{m}{2}+h)}=\Big\{\prod_{r=1}^k|\varSigma_{rr}|{}^{\frac{m}{2}+h}\Big\}\Big\{\prod_{r=1}^k|I+2\varSigma_{rr}Y_r|{}^{-(\frac{m}{2}+h)}\Big\}. \end{aligned}$$
(v)

Then, each Y r-integral can be evaluated as follows:

$$\displaystyle \begin{aligned} \frac{1}{\varGamma_{p_r}(h)}&\int_{Y_r>O}|Y_r|{}^{h-\frac{p_r+1}{2}}|I+2\varSigma_{rr}Y_r|{}^{-(\frac{m}{2}+h)}{\mathrm{d}}Y_r\\ &=2^{-p_rh}|\varSigma_{rr}|{}^{-h} \frac{1}{\varGamma_{p_r}(h)}\int_{Z_r>O}|Z_r|{}^{h-\frac{p_r+1}{2}}|I+Z_r|{}^{-(\frac{m}{2}+h)}{\mathrm{d}}Z_r,\ Z_r=2\varSigma_{rr}^{\frac{1}{2}}Y_r\varSigma_{rr}^{\frac{1}{2}}\\ &=2^{-p_rh}|\varSigma_{rr}|{}^{-h}\frac{\varGamma_{p_r}(h)}{\varGamma_{p_r}(h)}\frac{\varGamma_{p_r}(\frac{m}{2})}{\varGamma_{p_r}(\frac{m}{2}+h)}.\end{aligned} $$
(vi)

On combining equations (i) to (vi), we have

$$\displaystyle \begin{aligned} E[u_4^h|H_o]&=\frac{\varGamma_p(\frac{m}{2}+h)}{\varGamma_p(\frac{m}{2})}\prod_{r=1}^k\frac{\varGamma_{p_r}(\frac{m}{2})}{\varGamma_{p_r}(\frac{m}{2}+h)},~ \Re(\frac{m}{2}+h)>\frac{p-1}{2},{} \end{aligned} $$
(6.8.7)
$$\displaystyle \begin{aligned} &=c_{4,p}\frac{\varGamma_p(\frac{m}{2}+h)}{\prod_{r=1}^k\varGamma_{p_r}(\frac{m}{2}+h)}=c_{4,p}c^{*}\frac{\prod_{j=1}^p\varGamma(\frac{m}{2}-\frac{j-1}{2}+h)}{\prod_{r=1}^k[\prod_{i=1}^{p_r}\varGamma (\frac{m}{2}-\frac{i-1}{2}+h)]},{} \end{aligned} $$
(6.8.8)
$$\displaystyle \begin{aligned} c_{4,p}&=\frac{\prod_{r=1}^k\varGamma_{p_r}(\frac{m}{2})}{\varGamma_p(\frac{m}{2})}=\frac{[\prod_{r=1}^k\pi^{\frac{p_r(p_r-1)}{4}}]}{\pi^{\frac{p(p-1)}{4}}}\frac{\prod_{r=1}^k[\prod_{i=1}^{p_r} \varGamma(\frac{m}{2}-\frac{i-1}{2})]}{\prod_{j=1}^p \varGamma(\frac{m}{2}-\frac{j-1}{2})},{}\\ c^{*}&=\frac{\pi^{\frac{p(p-1)}{4}}}{\prod_{r=1}^k\pi^{\frac{p_r(p_r-1)}{4}}} \end{aligned} $$
(6.8.9)

so that when h = 0, \(E[u_4^h|H_o]=1\). Observe that one set of gamma products can be canceled in (6.8.8) and (6.8.9). When that set is the product of the first p 1 gamma functions, the h-th moment of u 4 is given by

$$\displaystyle \begin{aligned} E[u_4^h|H_o]=c_{4,p-p_1}\frac{\prod_{j=p_1+1}^p\varGamma(\frac{m}{2}-\frac{j-1}{2}+h)}{\prod_{r=2}^k [\prod_{i=1}^{p_r}\varGamma(\frac{m}{2}-\frac{i-1}{2}+h)]}, \end{aligned} $$
(6.8.10)

where \(c_{4,p-p_1}\) is such that \(E[u_4^h|H_o]=1\) when h = 0. Since the structure of the expression given in (6.8.10) is that of the h-th moment of a product of p − p 1 independently distributed real scalar type-1 beta random variables, it can be inferred that the distribution of u 4|H o is also that of a product of p − p 1 independently distributed real scalar type-1 beta random variables whose parameters can be determined from the arguments of the gamma functions appearing in (6.8.10).

Some of the gamma functions appearing in (6.8.10) will cancel out for certain values of p 1, …, p k, thereby simplifying the representation of the moments and enabling one to express the density of u 4 in terms of elementary functions in such instances. The exact null density in the general case was derived by the first author. For interesting representations of the exact density, the reader is referred to Mathai and Rathie (1971) and Mathai and Saxena (1973), some exact percentage points of the null distribution being included in Mathai and Katiyar (1979a). As it turns out, explicit forms are available in terms of elementary functions for the following special cases, see also Anderson (2003): p 1 = p 2 = p 3 = 1; p 1 = p 2 = p 3 = 2; p 1 = p 2 = 1, p 3 = p − 2; p 1 = 1, p 2 = p 3 = 2; p 1 = 1, p 2 = 2, p 3 = 3; p 1 = 2, p 2 = 2, p 3 = 4; p 1 = p 2 = 2, p 3 = 3; p 1 = 2, p 2 = 3, p is even.

6.8.1. Special case: k = 2

Let us consider a certain 2 × 2 partitioning of S, which corresponds to the special case k = 2. When p 1 = 1 and p 2 = p − 1 so that p 1 + p 2 = p, the test statistic is

$$\displaystyle \begin{aligned} u_4&=\frac{|S|}{|S_{11}|~|S_{22}|}=\frac{|S_{11}-S_{12}S_{22}^{-1}S_{21}|}{|S_{11}|}\\ &=\frac{s_{11}-S_{12}S_{22}^{-1}S_{21}}{s_{11}}=1-r^2_{1.(2\ldots p)}{}\end{aligned} $$
(6.8.11)

where r 1.(2…p) is the multiple correlation between x 1 and (x 2, …, x p). As stated in Theorem 5.6.3, \(1-r^2_{1.(2\ldots p)}\) is distributed as a real scalar type-1 beta variable with the parameters \((\frac {n-1}{2}-\frac {p-1}{2},~ \frac {p-1}{2})\). The simplifications in (6.8.11) are achieved by making use of the properties of determinants of partitioned matrices, which are discussed in Sect. 1.3. Since s 11 is 1 × 1 in this case, the numerator determinant is a real scalar quantity. Thus, this yields a type-2 beta distribution for \(w=\frac {u_4}{1-u_4}\) and thereby \(\frac {p-1}{n-p}w\) has an F-distribution, so that the test can be based on an F statistic having (n − 1) − (p − 1) = n − p and p − 1 degrees of freedom, small values of which are significant since H o is rejected for small values of u 4.
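The computations of this subsection can be sketched in Python as follows (our own illustration with simulated data; all names are hypothetical). The statistic u 4 = 1 − r²_{1.(2…p)} is obtained from the partitioned S, and (p − 1)∕(n − p) times u 4∕(1 − u 4) is referred to the F distribution with n − p and p − 1 degrees of freedom, small values being significant.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(3)
n, p = 25, 4
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)   # hypothetical sample

Xc = X - X.mean(axis=0)
S = Xc.T @ Xc                                    # sample sum of products matrix
s11, S12, S22 = S[0, 0], S[0:1, 1:], S[1:, 1:]
u4 = (s11 - (S12 @ np.linalg.solve(S22, S12.T))[0, 0]) / s11   # 1 - r^2_{1.(2...p)}

Fstat = (p - 1) / (n - p) * u4 / (1 - u4)        # ~ F_{n-p, p-1} under H_o
p_value = f.cdf(Fstat, n - p, p - 1)             # lower tail: small u4 is evidence against H_o
print(u4, Fstat, p_value)
```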

6.8.2. General case: k = 2

Suppose that in a 2 × 2 partitioning of S, S 11 is of order p 1 × p 1 and S 22 is of order p 2 × p 2 with p 2 = p − p 1. Then u 4 can be expressed as

$$\displaystyle \begin{aligned} u_4&=\frac{|S|}{|S_{11}|~|S_{22}|}=\frac{|S_{11}-S_{12}S_{22}^{-1}S_{21}|}{|S_{11}|}\\ &=|I-S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}}|=|I-U|,\ U=S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}}{}\end{aligned} $$
(6.8.12)

where U is called the multiple correlation matrix. It will be shown that U has a real matrix-variate type-1 beta distribution when S 11 is of general order rather than being a scalar.

Theorem 6.8.1

Consider u 4 for k = 2. Let S 11 be p 1 × p 1 and S 22 be p 2 × p 2 so that p 1 + p 2 = p. Without any loss of generality, let us assume that p 1 ≤ p 2. Then, under H o : Σ 12 = O, the multiple correlation matrix U has a real matrix-variate type-1 beta distribution with the parameters \((\frac {p_2}{2}, ~\frac {m}{2}-\frac {p_2}{2}),\) with m = n − 1, n being the sample size, and thereby \((I-U)\sim \mathit{\mbox{ type-1 beta }}(\frac {m}{2}-\frac {p_2}{2},~ \frac {p_2}{2})\) , the determinant of I − U being u 4 under the null hypothesis when k = 2.

Proof

Since Σ under H o can readily be eliminated from a structure such as u 4, we will take a Wishart matrix S having m = n − 1 degrees of freedom, n denoting the sample size, and parameter matrix I, the identity matrix. At first, assume that Σ is a block diagonal matrix and make the transformation \(S_1=\varSigma ^{-\frac {1}{2}}S\varSigma ^{-\frac {1}{2}}\). As a result, u 4 will be free of Σ 11 and Σ 22, and so, we may take S ∼ W p(m, I). Now, consider the submatrices S 11, S 22, S 12 so that dS = dS 11 ∧dS 22 ∧dS 12. Let f(S) denote the W p(m, I) density. Then,

However, appealing to a result stated in Sect. 1.3, we have

$$\displaystyle \begin{aligned} |S|&=|S_{22}|~|S_{11}-S_{12}S_{22}^{-1}S_{21}|\\ &=|S_{22}|~|S_{11}|~|I-S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}}|.\end{aligned} $$

The joint density of S 11, S 22, S 12 denoted by f 1(S 11, S 22, S 12) is then

$$\displaystyle \begin{aligned} f_1(S_{11},S_{22},S_{12})&=|S_{11}|{}^{\frac{m}{2}-\frac{p+1}{2}}|S_{22}|{}^{\frac{m}{2}-\frac{p+1}{2}}|I-U|{}^{\frac{m}{2}-\frac{p+1}{2}}\\ &\ \ \ \ \times \frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(S_{11})-\frac{1}{2}{\mathrm{tr}}(S_{22})}}{2^{\frac{mp}{2}}\varGamma_p(\frac{m}{2})},~ U=S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}}.\end{aligned} $$

Letting \(Y=S_{11}^{-\frac {1}{2}}S_{12}S_{22}^{-\frac {1}{2}},\) it follows from a result on Jacobian of matrix transformation, previously established in Chap. 1, that \({\mathrm{d}}Y=|S_{11}|{ }^{-\frac {p_2}{2}}|S_{22}|{ }^{-\frac {p_1}{2}}{\mathrm{d}}S_{12}\). Thus, the joint density of S 11, S 22, Y , denoted by f 2(S 11, S 22, Y ), is given by

$$\displaystyle \begin{aligned} f_2(S_{11},S_{22},Y)&=|S_{11}|{}^{\frac{m}{2}+\frac{p_2}{2}-\frac{p+1}{2}}|S_{22}|{}^{\frac{m}{2}+\frac{p_1}{2}-\frac{p+1}{2}} |I-YY^{\prime}|{}^{\frac{m}{2}-\frac{p+1}{2}}\\ &\ \ \ \ \times \frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(S_{11})-\frac{1}{2}{\mathrm{tr}}(S_{22})}}{2^{\frac{pm}{2}}\varGamma_p(\frac{m}{2})},\end{aligned} $$

Note that S 11, S 22, Y are independently distributed since f 2(⋅) can be factorized into functions of S 11, S 22, Y. Now, letting U = YY′, it follows from Theorem 4.2.3 that

$$\displaystyle \begin{aligned}{\mathrm{d}}Y=\frac{\pi^{\frac{p_1p_2}{2}}}{\varGamma_{p_1}(\frac{p_2}{2})}|U|{}^{\frac{p_2}{2}-\frac{p_1+1}{2}}{\mathrm{d}}U,\end{aligned}$$

and the density of U, denoted by f 3(U), can then be expressed as follows:

$$\displaystyle \begin{aligned}f_3(U)=c~|U|{}^{\frac{p_2}{2}-\frac{p_1+1}{2}}|I-U|{}^{\frac{m}{2}-\frac{p_2}{2}-\frac{p_1+1}{2}},~O<U<I, {}\end{aligned} $$
(6.8.13)

which is a real matrix-variate type-1 beta density with the parameters \((\frac {p_2}{2},~ \frac {m}{2}-\frac {p_2}{2}),\) where c is the normalizing constant. As a result, I − U has a real matrix-variate type-1 beta distribution with the parameters \((\frac {m}{2}-\frac {p_2}{2}, ~\frac {p_2}{2})\). Finally, observe that u 4 is the determinant of I − U.

Corollary 6.8.1

Consider u 4 as given in (6.8.12) and the determinant |I − U| where U and I − U are defined in Theorem 6.8.1 . Then for k = 2 and an arbitrary h, \(E[u_4^h|H_o]=E[\,|I-U|{ }^{h}\,]\).

Proof

On letting k = 2 in (6.8.8), we obtain the h-th moment of u 4|H o as

$$\displaystyle \begin{aligned}E[u_4^h|H_o]=c_{4,p}\frac{\prod_{j=1}^p\varGamma(\frac{m}{2}-\frac{j-1}{2}+h)}{\{\prod_{j=1}^{p_1}\varGamma(\frac{m}{2}-\frac{j-1}{2}+h)\} \{\prod_{j=1}^{p_2}\varGamma(\frac{m}{2}-\frac{j-1}{2}+h)\}}. \end{aligned}$$
(i)

After canceling p 2 of the gamma functions, the remaining gamma product in the numerator of (i) is

$$\displaystyle \begin{aligned}\varGamma\Big(\alpha-\frac{p_2}{2}\Big)\varGamma\Big(\alpha-\frac{p_2+1}{2}\Big)\cdots\varGamma\Big(\alpha-\frac{p-1}{2}\Big)=\varGamma_{p_1}\Big(\alpha-\frac{p_2}{2}\Big),\ \alpha=\frac{m}{2}+h,\end{aligned}$$

excluding \(\pi ^{\frac {p_1(p_1-1)}{4}}\). The remainder of the gamma product present in the denominator consists of the gamma functions coming from \(\varGamma _{p_1}(\frac {m}{2}+h)\), excluding \(\pi ^{\frac {p_1(p_1-1)}{4}}\). The normalizing constant will automatically take care of the factors containing π. Now, the resulting part containing h is \(\varGamma _{p_1}(\frac {m}{2}-\frac {p_2}{2}+h)/\varGamma _{p_1}(\frac {m}{2}+h)\), which is the gamma ratio in the h-th moment of a p 1 × p 1 real matrix-variate type-1 beta distribution with the parameters \((\frac {m}{2}-\frac {p_2}{2}, ~\frac {p_2}{2})\).

Since this happens to be \(E[\,|I-U|{ }^{h}\,]\) for I − U distributed as specified in Theorem 6.8.1, the Corollary is established.

An asymptotic result can be established from Corollary 6.5.1 and the λ-criterion for testing block-diagonality or equivalently the independence of subvectors in a p-variate Gaussian population. The resulting chisquare variable will have 2∑j δ j degrees of freedom where δ j is as defined in Corollary 6.5.1 for the second parameter of the real scalar type-1 beta distribution. Referring to (6.8.10), we have

$$\displaystyle \begin{aligned} \sum_j\delta_j&=\sum_{j=p_1+1}^{p}\frac{j-1}{2}-\sum_{j=2}^k\sum_{i=1}^{p_j}\frac{i-1}{2}=\sum_{j=p_1+1}^p\frac{j-1}{2} -\sum_{j=2}^k\frac{p_j(p_j-1)}{4}\\ &=\sum_{j=1}^p\frac{j-1}{2}-\sum_{j=1}^k\frac{p_j(p_j-1)}{4}=\frac{p(p-1)}{4}-\sum_{j=1}^k\frac{p_j(p_j-1)}{4}=\sum_{j=1}^k\frac{p_j(p-p_j)}{4}.\end{aligned} $$

Accordingly, the degrees of freedom of the resulting chisquare is \(2[\sum _{j=1}^k\frac {p_j(p-p_j)}{4}]=\sum _{j=1}^k\frac {p_j(p-p_j)}{2}\) in the real case. It can also be observed that the number of restrictions imposed by the null hypothesis H o is obtained by first letting all the off-diagonal elements of Σ = Σ′ equal to zero and subtracting the off-diagonal elements of the k diagonal blocks which produces \(\frac {p(p-1)}{2}-\sum _{j=1}^k\frac {p_j(p_j-1)}{2}=\sum _{j=1}^k\frac {p_j(p-p_j)}{2}\). In the complex case, the number of degrees of freedom will be twice that obtained for the real case, the chisquare variable remaining a real scalar chisquare random variable. This is now stated as a theorem.

Theorem 6.8.2

Consider the λ-criterion given in (6.8.1) in the real case and let the corresponding λ in the complex case be \(\tilde {\lambda }\) . Then \(-2\ln \lambda \to \chi ^2_{\delta }\) as n → ∞ where n is the sample size and \(\delta = \sum _{j=1}^k\frac {p_j(p-p_j)}{2},\) which is also the number of restrictions imposed by H o . Analogously, in the complex case, \(-2\ln \tilde {\lambda }\to \chi ^2_{\tilde {\delta }}\) as n → ∞, where the chisquare variable remains a real scalar chisquare random variable, \(\tilde {\delta }=\sum _{j=1}^kp_j(p-p_j)\) and n denotes the sample size.
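A minimal sketch of the large-sample test of Theorem 6.8.2 follows (our own illustration; the sample and the block structure p 1, …, p k are assumed for the purpose of the example): u 4 is computed as in (6.8.2) and −2 ln λ = −n ln u 4 is referred to a chisquare distribution with δ degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
n, blocks = 60, [2, 2, 3]                      # assumed block sizes p_1, p_2, p_3
p = sum(blocks)
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)

Xc = X - X.mean(axis=0)
S = Xc.T @ Xc
sign, log_u4 = np.linalg.slogdet(S)            # start with log|S|
start = 0
for b in blocks:                               # subtract log|S_rr| for each diagonal block
    log_u4 -= np.linalg.slogdet(S[start:start + b, start:start + b])[1]
    start += b

minus2lnlam = -n * log_u4                      # lambda = u_4^(n/2)
delta = sum(b * (p - b) for b in blocks) // 2  # degrees of freedom in Theorem 6.8.2
print(minus2lnlam, chi2.ppf(0.95, delta))
```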

6.9. Hypothesis that the Mean Value and Covariance Matrix are Given

Consider a real p-variate Gaussian population X j ∼ N p(μ, Σ), Σ > O, and a simple random sample, X 1, …, X n, from this population, the X j’s being iid. Let the sample mean and the sample sum of products matrix be denoted by \(\bar {X}\) and S, respectively. Consider the hypothesis H o : μ = μ o, Σ = Σ o where μ o and Σ o are specified. Let us examine the likelihood ratio test for testing H o and obtain the resulting λ-criterion. Let the parameter space be Ω = {(μ, Σ)|Σ > O,  −∞ < μ j < ∞, j = 1, …, p, μ′ = (μ 1, …, μ p)}. Let the joint density of X 1, …, X n be denoted by L. Then, as previously obtained, the maximum value of L is

$$\displaystyle \begin{aligned}\max_{\Omega}L=\frac{{\mathrm{e}}^{-\frac{np}{2}}n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S|{}^{\frac{n}{2}}}{}\end{aligned} $$
(6.9.1)

and the maximum under H o is

$$\displaystyle \begin{aligned}\max_{H_o}L=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\sum_{j=1}^n(X_j-\mu_o)'\varSigma_o^{-1}(X_j-\mu_o))}}{(2\pi)^{\frac{np}{2}}|\varSigma_o|{}^{\frac{n}{2}}}.{}\end{aligned} $$
(6.9.2)

Thus,

$$\displaystyle \begin{aligned}\lambda=\frac{\max_{H_o}L}{\max_{\Omega}L}=\frac{{\mathrm{e}}^{\frac{np}{2}}|S|{}^{\frac{n}{2}}}{n^{\frac{np}{2}}|\varSigma_o|{}^{\frac{n}{2}}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\sum_{j=1}^n(X_j-\mu_o)'\varSigma_o^{-1}(X_j-\mu_o))}.{}\end{aligned} $$
(6.9.3)

We reject H o for small values of λ. Since, for large values of the quadratic form appearing in the exponent, the exponential factor dominates the factor \(|S|{}^{\frac{n}{2}}\), λ will be small when the exponent (apart from the factor −1∕2) is large, that is, for large values of \(\sum _{j=1}^n(X_j-\mu _o)'\varSigma _o^{-1}(X_j-\mu _o)\) which, under H o, is distributed as \(\chi ^2_{np}\) since \((X_j-\mu _o)'\varSigma _o^{-1}(X_j-\mu _o)\overset {iid}{\sim } \chi ^2_p\) for each j. Hence the criterion consists of

$$\displaystyle \begin{aligned}\mbox{rejecting}\ H_o\ \mbox{if the observed values of }\sum_{j=1}^n(X_j-\mu_o)'\varSigma_o^{-1}(X_j-\mu_o)\ge \chi^2_{np,\alpha}\end{aligned}$$

with

$$\displaystyle \begin{aligned}Pr\{\chi^2_{np}\ge \chi^2_{np,\alpha}\}=\alpha.{}\end{aligned} $$
(6.9.4)
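The criterion in (6.9.4) is straightforward to apply numerically; the sketch below (our own, with made-up values of μ o and Σ o and simulated observations) computes the sum of quadratic forms and compares it with the upper α percentage point of a chisquare having np degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
n, p, alpha = 5, 3, 0.05
mu_o = np.zeros(p)                               # hypothesized mean vector (assumed values)
Sigma_o = np.eye(p)                              # hypothesized covariance matrix
X = rng.multivariate_normal(mu_o, Sigma_o, size=n)   # observed sample (simulated here)

Sig_inv = np.linalg.inv(Sigma_o)
quad = sum((x - mu_o) @ Sig_inv @ (x - mu_o) for x in X)   # sum of quadratic forms

crit = chi2.ppf(1 - alpha, n * p)                # chi^2_{np, alpha}
print(quad, crit, quad >= crit)                  # reject H_o when the sum >= critical value
```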

Let us determine the h-th moment of λ for an arbitrary h. Note that

$$\displaystyle \begin{aligned}\lambda^h=\frac{{\mathrm{e}}^{\frac{nph}{2}}}{n^{\frac{nph}{2}}|\varSigma_o|{}^{\frac{nh}{2}}}|S|{}^{\frac{nh}{2}}{\mathrm{e}}^{-\frac{h}{2}{\mathrm{tr}}(\varSigma_o^{-1}S)-\frac{hn}{2}(\bar{X}-\mu_o)'\varSigma_o^{-1}(\bar{X}-\mu_o)}.{}\end{aligned} $$
(6.9.5)

Since λ contains S and \(\bar {X}\) and these quantities are independently distributed, we can integrate out the part containing S over a Wishart density having m = n − 1 degrees of freedom and the part containing \(\bar {X}\) over the density of \(\bar {X}\). Thus, for m = n − 1,

$$\displaystyle \begin{aligned} E\ \Big[({|S|{}^{\frac{nh}{2}}}/{|\varSigma_o|{}^{\frac{nh}{2}}})\,{\mathrm{e}}^{-\frac{h}{2}{\mathrm{tr}}(\varSigma_o^{-1}S)}|H_o\Big]&=\frac{\int_{S>O}|S|{}^{\frac{m}{2}+\frac{nh}{2}-\frac{p+1}{2}}{\mathrm{e}}^{-\frac{(1+h)}{2}{\mathrm{tr}}(\varSigma_o^{-1}S)}}{2^{\frac{mp}{2}}\varGamma_p(\frac{m}{2})|\varSigma_o|{}^{\frac{n}{2}(1+h)-\frac{1}{2}}}{\mathrm{d}}S\\ &=2^{\frac{nph}{2}}\frac{\varGamma_p(\frac{n}{2}(1+h)-\frac{1}{2})}{\varGamma_p(\frac{n}{2}-\frac{1}{2})}(1+h)^{-[\frac{n}{2}(1+h)-\frac{1}{2}]p}.\end{aligned} $$
(i)

Under H o, the integral over \(\bar {X}\) gives

$$\displaystyle \begin{aligned}\int_{\bar{X}}\frac{n^{\frac{p}{2}}}{(2\pi)^{\frac{p}{2}}|\varSigma_o|{}^{\frac{1}{2}}}{\mathrm{e}}^{-(1+h)\frac{n}{2}(\bar{X}-\mu_o)'\varSigma_o^{-1}(\bar{X}-\mu_o)}{\mathrm{d}}\bar{X}=(1+h)^{-\frac{p}{2}}. \end{aligned}$$
(ii)

From (i) and (ii), we have

$$\displaystyle \begin{aligned} E[\lambda^h|H_o]=\frac{{\mathrm{e}}^{\frac{nph}{2}}2^{\frac{nph}{2}}}{n^{\frac{nph}{2}}}\frac{\varGamma_p(\frac{n}{2}(1+h)-\frac{1}{2})}{\varGamma_p(\frac{n-1}{2})} (1+h)^{-[\frac{n}{2}(1+h)]p}. \end{aligned} $$
(6.9.6)

The inversion of this expression is quite involved due to branch points. Let us examine the asymptotic case as n → ∞. On expanding the gamma functions by making use of the version of Stirling’s asymptotic approximation formula for gamma functions given in (6.5.14), namely \(\varGamma (z+\eta )\approx \sqrt {2\pi }z^{z+\eta -\frac {1}{2}}{\mathrm{e}}^{-z}\) for |z|→ ∞ and η bounded, we have

$$\displaystyle \begin{aligned} \frac{\varGamma_p(\frac{n}{2}(1+h)-\frac{1}{2})}{\varGamma_p(\frac{n-1}{2})}&=\prod_{j=1}^p\frac{\varGamma(\frac{n}{2}(1+h)-\frac{1}{2}-\frac{j-1}{2})} {\varGamma(\frac{n-1}{2}-\frac{j-1}{2})}\\ &=\prod_{j=1}^p\frac{\sqrt{2\pi}[\frac{n}{2}(1+h)]^{\frac{n}{2}(1+h)-\frac{1}{2}-\frac{1}{2}-\frac{j-1}{2}}{\mathrm{e}}^{-\frac{n}{2}(1+h)}}{\sqrt{2\pi}[\frac{n}{2}]^{\frac{n}{2}-\frac{1}{2}-\frac{1}{2}-\frac{j-1}{2}}{\mathrm{e}}^{-\frac{n}{2}}}\\ &=\Big[\frac{n}{2}\Big]^{\frac{nph}{2}}{\mathrm{e}}^{-\frac{nph}{2}}(1+h)^{\frac{n}{2}(1+h)p-\frac{p}{2}-\frac{p(p+1)}{4}}.\end{aligned} $$
(iii)

Thus, as n →, it follows from (6.9.6) and (iii) that

$$\displaystyle \begin{aligned}E[\lambda^h|H_o]=(1+h)^{-\frac{1}{2}(p+\frac{p(p+1)}{2})},{}\end{aligned} $$
(6.9.7)

which implies that, asymptotically, \(-2\ln \lambda \) has a real scalar chisquare distribution with \(p+\frac {p(p+1)}{2}\) degrees of freedom in the real Gaussian case. Hence the following result:

Theorem 6.9.1

Given a N p(μ, Σ), Σ > O, population, consider the hypothesis H o : μ = μ o, Σ = Σ o where μ o and Σ o are specified. Let λ denote the λ-criterion for testing this hypothesis. Then, in the real case, \(-2\ln \lambda \to \chi ^2_{\delta }\) as n → ∞ where \(\delta =p+\frac {p(p+1)}{2}\) and, in the corresponding complex case, \(-2\ln \lambda \to \chi ^2_{\delta _1}\) as n → ∞ where δ 1 = 2p + p(p + 1), the chisquare variable remaining a real scalar chisquare random variable.

Note 6.9.1

In the real case, observe that the hypothesis H o : μ = μ o, Σ = Σ o imposes p restrictions on the μ parameters and \(\frac {p(p+1)}{2}\) restrictions on the Σ parameters, for a total of \(p+\frac {p(p+1)}{2}\) restrictions, which corresponds to the degrees of freedom for the asymptotic chisquare distribution in the real case. In the complex case, there are twice as many restrictions.

Example 6.9.1

Consider the real trivariate Gaussian distribution N 3(μ, Σ), Σ > O and the hypothesis H o : μ = μ o, Σ = Σ o where μ o, Σ o and an observed sample of size 5 are as follows:

Now,

$$\displaystyle \begin{aligned} (X_1-\mu_o)'\varSigma_o^{-1}(X_1-\mu_o)&=\frac{36}{24},~ (X_2-\mu_o)'\varSigma_o^{-1}(X_2-\mu_o)=\frac{33}{24},\\ (X_3-\mu_o)'\varSigma_o^{-1}(X_3-\mu_o)&=\frac{104}{24},~ (X_4-\mu_o)'\varSigma_o^{-1}(X_4-\mu_o)=\frac{144}{24},\\ (X_5-\mu_o)'\varSigma_o^{-1}(X_5-\mu_o)&=\frac{20}{24},\end{aligned} $$

and

$$\displaystyle \begin{aligned} \sum_{j=1}^5(X_j-\mu_o)'\varSigma_o^{-1}(X_j-\mu_o)=\frac{1}{24}[36+33+104+144+20]=\frac{337}{24}=14.04.\end{aligned}$$

Note that, in this example, n = 5, p = 3 and np = 15. Letting the significance level of the test be α = 0.05, H o is not rejected since \(14.04< \chi ^2_{15,\,0.05}=25\).

6.10. Testing Hypotheses on Linear Regression Models or Linear Hypotheses

Let the p × 1 real vector X j have an expected value μ and a covariance matrix Σ > O for j = 1, …, n, and the X j’s be independently distributed. Let X j, μ, Σ be partitioned as follows where x 1j, μ 1 and σ 11 are 1 × 1, μ (2), Σ 21 are (p − 1) × 1, \(\varSigma _{12}=\varSigma _{21}^{\prime }\) and Σ 22 is (p − 1) × (p − 1):

(i)

If the conditional expectation of x 1j, given X (2)j is linear in X (2)j, then omitting the subscript j since the X j’s are iid, it was established in Eq. (3.3.5) that

$$\displaystyle \begin{aligned}E[x_1|X_{(2)}]=\mu_1+\varSigma_{12}\varSigma_{22}^{-1}(X_{(2)}-\mu_{(2)}).{}\end{aligned} $$
(6.10.1)

When the regression is linear, the best linear predictor of x 1 in terms of X (2) will be of the form

$$\displaystyle \begin{aligned}E[x_1|X_{(2)}]-E(x_1)=\beta'(X_{(2)}-E(X_{(2)})), ~\beta'=(\beta_2,\ldots,\beta_p).{}\end{aligned} $$
(6.10.2)

Then, by appealing to properties of the conditional expectation and conditional variance, it was shown in Chap. 3 that \(\beta '=\varSigma _{12}\varSigma _{22}^{-1}\). Hypothesizing that X (2) is not random, or equivalently that the predictor function is a function of the preassigned values of X (2), amounts to testing whether \(\varSigma _{12}\varSigma _{22}^{-1}=O\). Noting that Σ 22 > O since Σ > O, the null hypothesis thus reduces to H o : Σ 12 = O. If the original population X is p-variate real Gaussian, this hypothesis is then equivalent to testing the independence of x 1 and X (2). Actually, this has already been discussed in Sect. 6.8.2 for the case of k = 2, and is also tantamount to testing whether the population multiple correlation ρ 1.(2…p) = 0. Assuming that the population is Gaussian and letting \(u=\lambda ^{\frac {2}{n}}\) where λ is the lambda criterion, \(u\sim \mbox{type-1 beta}(\frac {n-p}{2},~\frac {p-1}{2})\) under the null hypothesis; this was established in Theorem 6.8.1 for p 1 = 1 and p 2 = p − 1. Then, \(v=\frac {u}{1-u}\sim \mbox{ type-2 beta }(\frac {n-p}{2}, ~\frac {p-1}{2})\), that is, \(v\sim \frac {n-p}{p-1}F_{n-p,~p-1}\) or \(\frac {(p-1)}{(n-p)}\frac {u}{1-u}\sim F_{n-p,~p-1}\). Hence, recalling that H o is rejected for small values of λ, and thus for small values of u and of this F statistic, the criterion for testing H o : β = O is to

$$\displaystyle \begin{aligned}\mbox{reject}\ H_o\ \mbox{if the observed value of }F_{n-p,\,p-1}\le F_{n-p,\,p-1,~1-\alpha}, \mbox{ with } Pr\{F_{n-p,\,p-1}\le F_{n-p,\,p-1,~1-\alpha}\}=\alpha{}\end{aligned} $$
(6.10.3)

The test statistic u is of the form

$$\displaystyle \begin{aligned}u=\frac{|S|}{s_{11}|S_{22}|},~ S\sim W_p(n-1,\varSigma),~ \varSigma>O,~\frac{(p-1)}{(n-p)}\frac{u}{1-u}\sim F_{n-p,~p-1},\end{aligned}$$

where, among the submatrices of S, s 11 is 1 × 1 and S 22 is (p − 1) × (p − 1). Observe that the number of parameters being restricted by the hypothesis Σ 12 = O is p 1 p 2 = 1(p − 1) = p − 1. Hence, as n → ∞, the null distribution of \(-2\ln \lambda \) is a real scalar chisquare having p − 1 degrees of freedom. Thus, the following result:

Theorem 6.10.1

Let the p × 1 vector X j be partitioned into the subvectors x 1j of order 1 and X (2)j of order p − 1. Let the regression of x 1j on X (2)j be linear in X (2)j , that is, E[x 1j|X (2)j] − E(x 1j) = β′(X (2)j − E(X (2)j)). Consider the hypothesis H o : β = O. Let X j ∼ N p(μ, Σ), Σ > O, for j = 1, …, n, the X j ’s being independently distributed, and let λ be the λ-criterion for testing this hypothesis. Then, as n → ∞, \(-2\ln \lambda \to \chi ^2_{p-1}\).

Example 6.10.1

Let the population be N 3(μ, Σ), Σ > O, and the observed sample of size n = 5 be

The resulting sample average \(\bar {X}\) and deviation vectors are then

Letting

$$\displaystyle \begin{aligned}\mathbf{X}=[X_1,\ldots,X_5], ~ \bar{\mathbf{X}}=[\bar{X},\ldots,\bar{X}] \mbox{ and } S=(\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})',\end{aligned}$$

so that the test statistic is

$$\displaystyle \begin{aligned} u&=\frac{s_{11}-S_{12}S_{22}^{-1}S_{21}}{s_{11}}=1-\frac{S_{12}S_{22}^{-1}S_{21}}{s_{11}}=1-r^2_{1.(2,3)}\\ &=1-\frac{760000}{(120)(7375)}=1-0.859=0.141 \Rightarrow\\ v&=\frac{(p-1)}{(n-p)}\frac{u}{1-u}=\Big(\frac{2}{2}\Big)\Big(\frac{0.141}{0.859}\Big)=0.164,\ v\sim F_{n-p,p-1}.\end{aligned} $$

Let us test H o at the significance level α = 0.05. Since H o is rejected for small values of v, the relevant critical value is the lower 5% point F n−p, p−1, 1−α = F 2,2, 0.95 = 1/F 2,2, 0.05 = 1/19 ≈ 0.053, the tabulated value F 2,2, 0.05 being equal to 19. Since the observed value of v is 0.164 > 0.053, H o is not rejected.

Note 6.10.1

Observe that

$$\displaystyle \begin{aligned}u=1-r^2_{1.(2,3)},~~~ r^2_{1.(2,3)}=\frac{S_{12}S_{22}^{-1}S_{21}}{s_{11}}\end{aligned}$$

where r 1.(2,3) is the sample multiple correlation between the first component of X j ∼ N 3(μ, Σ), Σ > O, and the other two components of X j. If the population covariance matrix Σ is similarly partitioned, that is,

then, the population multiple correlation coefficient is ρ 1.(2,3) where

$$\displaystyle \begin{aligned}\rho^2_{1.(2,3)}=\frac{\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}}{\sigma_{11}}.\end{aligned}$$

Thus, if \(\varSigma _{12}=\varSigma _{21}^{\prime }=O\), ρ 1.(2,3) = 0 and conversely since σ 11 > 0 and \(\varSigma _{22}>O\Rightarrow \varSigma _{22}^{-1}>O\). The regression coefficient β being equal to the transpose of \(\varSigma _{12}\varSigma _{22}^{-1}\), Σ 12 = O also implies that the regression coefficient β = O and conversely. Accordingly, the hypothesis that the regression coefficient vector β = O is equivalent to hypothesizing that the population multiple correlation ρ 1.(2,…,p) = 0, which, in the multivariate normal case, also amounts to the hypothesis that the two subvectors are independently distributed or that the covariance matrix Σ 12 = O. The only difference is that the test on the regression coefficients is carried out in the conditional space whereas testing the independence of the subvectors or whether the population multiple correlation equals zero is carried out in the entire space. The numerical example included in this section also illustrates the main result presented in Sect. 6.8.1 in connection with testing whether a population multiple correlation coefficient is equal to zero.

6.10.1. A simple linear model

Consider a linear model of the following form where a real scalar variable y is estimated by a linear function of pre-assigned real scalar variables z 1, …, z q:

$$\displaystyle \begin{aligned}y_j=\beta_o+\beta_1z_{1j}+\cdots+\beta_qz_{qj}+e_j, ~j=1,\ldots,n \end{aligned}$$
(i)

where y 1, …, y n are n observations on y, z i1, z i2, …, z in, i = 1, …, q, are preassigned values on z 1, …, z q, and β o, β 1, …, β q are unknown parameters. The random components e j, j = 1, …, n, are the corresponding sum total contributions coming from all unknown factors. There are two possibilities with respect to this model: β o = 0 or β o≠0. If β o = 0, β o is omitted in model (i) and we let y j = x j. If β o≠0, the model is modified by taking \(x_j=y_j-\bar {y}\) where \( \bar {y}=\frac {1}{n}(y_1+\cdots +y_n)\), then becoming

$$\displaystyle \begin{aligned}x_j=y_j-\bar{y}=\beta_1(z_{1j}-\bar{z}_1)+\cdots+\beta_q(z_{qj}-\bar{z}_q)+\epsilon_j \end{aligned}$$
(ii)

for some error term 𝜖 j, where \(\bar {z}_i=\frac {1}{n}(z_{i1}+\cdots +z_{in}), \ i=1,\ldots ,q\). Letting \(Z_j^{\prime }=(z_{1j},\ldots ,z_{qj})\) if β o = 0 and \(Z_j^{\prime }=(z_{1j}-\bar {z}_1,\ldots ,z_{qj}-\bar {z}_q)\), otherwise, equation (ii) can be written in vector/matrix notation as follows:

$$\displaystyle \begin{aligned} \epsilon_j&=(x_j-\beta_1z_{1j}-\cdots-\beta_qz_{qj})= (x_j-\beta'Z_j),\ \beta'=(\beta_1,\ldots,\beta_q),\\ \sum_{j=1}^n\epsilon_j^2&=\sum_{j=1}^n(x_j-\beta_1z_{1j}-\cdots-\beta_qz_{qj})^2=\sum_{j=1}^n(x_j-\beta'Z_j)^2.\end{aligned} $$

Letting

(iii)
$$\displaystyle \begin{aligned} \epsilon'&=(X'-\beta'Z)\Rightarrow \sum_{j=1}^n\epsilon_j^2=\epsilon'\epsilon=(X'-\beta'Z)(X-Z'\beta).\end{aligned} $$
(iv)

The least squares minimum is thus available by differentiating 𝜖′𝜖 with respect to β, equating the resulting expression to a null vector and solving, which will produce a single critical point that corresponds to the minimum as the maximum occurs at +∞:

$$\displaystyle \begin{aligned} \frac{\partial}{\partial\beta}\Big(\sum_{j=1}^n\epsilon_j^2\Big)&=O\Rightarrow \sum_{j=1}^nx_jZ_j^{\prime}-\hat{\beta}'\sum_{j=1}^nZ_jZ_j^{\prime}=O' \\ &\qquad \ \Rightarrow \hat{\beta}=\Big(\sum_{j=1}^nZ_jZ_j^{\prime}\Big)^{-1} \Big(\sum_{j=1}^nx_jZ_j\Big); \end{aligned} $$
(v)
$$\displaystyle \begin{aligned} \frac{\partial}{\partial \beta}(\epsilon'\epsilon)&=O\Rightarrow \hat{\beta}=(ZZ')^{-1}ZX,\end{aligned} $$
(vi)

that is,

$$\displaystyle \begin{aligned} \hat{\beta} =\Big(\sum_{j=1}^nZ_jZ_j^{\prime}\Big)^{-1}\Big(\sum_{j=1}^nx_jZ_j\Big). \end{aligned} $$
(6.10.4)

Since the z ij’s are preassigned quantities, it can be assumed without any loss of generality that (ZZ′) is nonsingular, and thereby that (ZZ′)−1 exists, so that the least squares minimum, usually denoted by s 2, is available by substituting \(\hat {\beta }\) for β in 𝜖′𝜖. Then, at \(\beta =\hat {\beta }\),

$$\displaystyle \begin{aligned} \epsilon'|{}_{\beta=\hat{\beta}}&=X'-X'Z'(ZZ')^{-1}Z=X'[I-Z'(ZZ')^{-1}Z]\ \mbox{so that}\\ s^2&=\epsilon'\epsilon|{}_{\beta=\hat{\beta}}=X'[I-Z'(ZZ')^{-1}Z]X{}\end{aligned} $$
(6.10.5)

where I − Z′(ZZ′)−1 Z is idempotent and of rank (n − 1) − q. Observe that if β o ≠ 0 in (i) and we had proceeded without eliminating β o, then β would have been of order (q + 1) × 1 and I − Z′(ZZ′)−1 Z, of rank n − (q + 1) = n − 1 − q, whereas if β o≠0 and we had eliminated β o from the model, then the rank of I − Z′(ZZ′)−1 Z would have been (n − 1) − q, that is, unchanged, since \(\sum _{j=1}^n(x_j-\bar {x})^2=X'[I-\frac {1}{n}JJ']X,~ J'=(1,\ldots ,1)\) and the rank of \(I-\frac {1}{n}JJ'\) is n − 1.

Some distributional assumptions on 𝜖 j are required in order to test hypotheses on β. Let 𝜖 j ∼ N 1(0, σ 2), σ 2 > 0, j = 1, …, n, be independently distributed. Then x j ∼ N 1(β′Z j, σ 2), j = 1, …, n are independently distributed but not identically distributed as the mean value depends on j. Under the normality assumption for the 𝜖 j’s, it can readily be seen that the least squares estimators of β and σ 2 coincide with the maximum likelihood estimators. It can also be observed that σ 2 is estimated by \(\frac {s^2}{n}\) where n is the sample size. In this simple linear regression context, the parameter space Ω = {(β, σ 2)|σ 2 > 0}. Thus, under the normality assumption, the maximum of the likelihood function L is given by

$$\displaystyle \begin{aligned}\max_{\Omega}L=\frac{{\mathrm{e}}^{-\frac{n}{2}}n^{\frac{n}{2}}}{(2\pi)^{\frac{n}{2}}[s^2]^{\frac{n}{2}}}. \end{aligned}$$
(vi)

Under the hypothesis H o : β = O or β 1 = β 2 = ⋯ = β q = 0, the least squares minimum, usually denoted as \(s_o^2\), is X′X and, assuming normality, the maximum of the likelihood function under H o is the following:

$$\displaystyle \begin{aligned}\max_{H_o}L=\frac{{\mathrm{e}}^{-\frac{n}{2}}n^{\frac{n}{2}}}{(2\pi)^{\frac{n}{2}}[s_o^2]^{\frac{n}{2}}}. \end{aligned}$$
(vii)

Thus, the λ-criterion is

$$\displaystyle \begin{aligned} \lambda&=\frac{\max_{H_o}L}{\max_{\Omega}L}=\Big[\frac{s^2}{s_o^2}\Big]^{\frac{n}{2}}\\ \Rightarrow\ u&=\lambda^{\frac{2}{n}}=\frac{X'[I-Z'(ZZ')^{-1}Z]X}{X'X}\\ &=\frac{X'[I-Z'(ZZ')^{-1}Z]X}{X'[I-Z'(ZZ')^{-1}Z]X+X'Z'(ZZ')^{-1}ZX}\\ &=\frac{1}{1+u_1}{}\end{aligned} $$
(6.10.6)

where

$$\displaystyle \begin{aligned}u_1=\frac{X'Z'(ZZ')^{-1}ZX}{X'[I-Z'(ZZ')^{-1}Z]X}=\frac{s_o^2-s^2}{s^2} \end{aligned}$$
(viii)

with the matrices Z′(ZZ′)−1 Z and I − Z′(ZZ′)−1 Z being idempotent, mutually orthogonal, and of ranks q and (n − 1) − q, respectively. We can interpret \(s_o^2-s^2\) as the sum of squares due to the hypothesis and s 2 as the residual part. Under the normality assumption, \(s_o^2-s^2\) and s 2 are independently distributed in light of the independence of quadratic forms discussed in Sect. 3.4.1; moreover, their representations as quadratic forms in idempotent matrices of ranks q and (n − 1) − q imply that \(\frac {s_o^2-s^2}{\sigma ^2}\sim \chi _q^2\) and \(\frac {s^2}{\sigma ^2}\sim \chi _{(n-1)-q}^2\). Accordingly, under the null hypothesis,

$$\displaystyle \begin{aligned}u_2=\frac{(s_o^2-s^2)/q}{s^2/[(n-1)-q] }\sim F_{q,\, n-1-q},{}\end{aligned} $$
(6.10.7)

that is, an F-statistic having q and n − 1 − q degrees of freedom. Thus, we reject H o for small values of λ or equivalently for large values of u 2 or large values of F q,n−1−q. Hence, the following criterion:

$$\displaystyle \begin{aligned}\mbox{Reject}\ H_o\ \mbox{if the observed value of }u_2\ge F_{q,\,n-1-q,\ \alpha},\, Pr\{F_{q,\, n-1-q}\ge\ F_{q,\,n-1-q,~\alpha}\}=\alpha.{}\end{aligned} $$
(6.10.8)

A detailed discussion of the real scalar variable case is provided in Mathai and Haubold (2017).
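The quantities s 2, s o 2 and the statistic u 2 of (6.10.7) can be computed directly from the centered data, as in the following sketch (our own illustration with simulated x j’s and preassigned z-values; all names are hypothetical).

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(6)
n, q = 20, 2
Z = rng.normal(size=(q, n))                      # preassigned values, one row per regressor
Z = Z - Z.mean(axis=1, keepdims=True)            # centered, as when beta_o is eliminated
x = rng.normal(size=n)
X = x - x.mean()                                 # x_j = y_j - ybar

P = Z.T @ np.linalg.inv(Z @ Z.T) @ Z             # projection matrix Z'(ZZ')^{-1}Z
s2  = X @ (np.eye(n) - P) @ X                    # residual sum of squares, (6.10.5)
so2 = X @ X                                      # least squares minimum under H_o: beta = O

u2 = (so2 - s2) / q / (s2 / (n - 1 - q))         # ~ F_{q, n-1-q} under H_o, (6.10.7)
crit = f.ppf(0.95, q, n - 1 - q)
print(u2, crit, u2 >= crit)                      # reject H_o for large u2, (6.10.8)
```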

6.10.2. Hypotheses on individual parameters

Denoting the expected value of (⋅) by E[(⋅)], it follows from (6.10.4) that

$$\displaystyle \begin{aligned} E[\hat{\beta}]&=(ZZ')^{-1}ZE(X)=(ZZ')^{-1}ZZ'\beta=\beta,~ E[X]=Z'\beta,\ \mbox{and}\\ {\mathrm{Cov}}(\hat{\beta})&=(ZZ')^{-1}Z[{\mathrm{Cov}}(X)]Z'(ZZ')^{-1}=(ZZ')^{-1}Z(\sigma^2 I)Z'(ZZ')^{-1}\\ &=\sigma^2(ZZ')^{-1}.\end{aligned} $$

Under the normality assumption on x j, we have \(\hat {\beta }\sim N_q(\beta ,\sigma ^2(ZZ')^{-1})\). Letting the (r, r)-th diagonal element of (ZZ′)−1 be b rr, then \(\hat {\beta }_r\), the estimator of the r-th component of the parameter vector β, is distributed as \(\hat {\beta }_r\sim N_1(\beta _r,\sigma ^2b_{rr})\), so that

$$\displaystyle \begin{aligned}\frac{\hat{\beta}_r-\beta_r}{\hat{\sigma}\sqrt{b_{rr}}}\sim t_{n-1-q}{}\end{aligned} $$
(6.10.9)

where t n−1−q denotes a Student-t distribution having n − 1 − q degrees of freedom and \(\hat {\sigma }^2=\frac {s^2}{n-1-q}\) is an unbiased estimator for σ 2. On writing s 2 in terms of 𝜖, it is easily seen that E[s 2] = (n − 1 − q)σ 2 where s 2 is the least squares minimum in the entire parameter space Ω. Thus, one can test hypotheses on β r and construct confidence intervals for that parameter by means of the Student-t statistic specified in (6.10.9) or its square \(t^2_{n-1-q}\) which has an F distribution having 1 and n − 1 − q degrees of freedom, that is, \(t^2_{n-1-q}\sim F_{1,n-1-q}\).
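A sketch of the Student-t statistic in (6.10.9) is given next (our own illustration, with simulated data and hypothetical names): \(\hat {\beta }\), the unbiased estimator \(\hat {\sigma }^2=s^2/(n-1-q)\) and the diagonal elements b rr of (ZZ′)−1 are computed first, and each \(\hat {\beta }_r\) is then tested against zero.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(7)
n, q = 20, 2
Z = rng.normal(size=(q, n))
Z = Z - Z.mean(axis=1, keepdims=True)       # centered preassigned values
x = rng.normal(size=n)
X = x - x.mean()                            # centered observations

ZZt_inv = np.linalg.inv(Z @ Z.T)
beta_hat = ZZt_inv @ Z @ X                  # least squares estimate, (6.10.4)
s2 = X @ X - X @ Z.T @ ZZt_inv @ Z @ X      # least squares minimum s^2
sigma2_hat = s2 / (n - 1 - q)               # unbiased estimator of sigma^2

# Student-t statistics of (6.10.9) for H_o: beta_r = 0, r = 1, ..., q
t_stats = beta_hat / np.sqrt(sigma2_hat * np.diag(ZZt_inv))
crit = t.ppf(0.975, n - 1 - q)              # two-sided 5% critical value
print(t_stats, crit)
```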

Example 6.10.2

Let us consider a linear model of the following form:

$$\displaystyle \begin{aligned}y_j=\beta_o+\beta_1z_{1j}+\cdots+\beta_qz_{qj}+e_j,\ j=1,\ldots,n, \end{aligned}$$

where the z ij’s are preassigned numbers of the variable z i. Let us take n = 5 and q = 2, so that the sample is of size 5 and, excluding β o, the model has two parameters. Let the observations on y and the preassigned values on the z i’s be the following:

$$\displaystyle \begin{aligned} 1&=\beta_o+\beta_1(0)+\beta_2(1)+e_1\\ 2&=\beta_o+\beta_1(1)+\beta_2(-1)+e_2\\ 4&=\beta_o+\beta_1(-1)+\beta_2(2)+e_3\\ 6&=\beta_o+\beta_1(2)+\beta_2(-2)+e_4\\ 7&=\beta_o+\beta_1(-2)+\beta_2(5)+e_5\,.\end{aligned} $$

The averages on y, z 1 and z 2 are then

$$\displaystyle \begin{aligned} \bar{y}&=\frac{1}{5}[1+2+4+6+7]=4,~ \bar{z}_1=\frac{1}{5}[0+1+(-1)+2+(-2)]=0 \ \ \mbox{and}\\ \bar{z}_2&=\frac{1}{5}[1+(-1)+2+(-2)+5]=1,\end{aligned} $$

and, in terms of deviations, the model becomes

$$\displaystyle \begin{aligned}x_j=y_j-\bar{y}=\beta_1(z_{1j}-\bar{z}_1)+\beta_2(z_{2j}-\bar{z}_2)+\epsilon_j, \epsilon_j=e_j-\bar{e}\,.\end{aligned}$$

That is,

When minimizing 𝜖′𝜖 = (X′ − β′Z)(X − Z′β), we determined that \(\hat {\beta }\), the least squares estimate of β, the least squares minimum s 2 and \(s_o^2-s^2\), the sum of squares due to β, could be expressed as

$$\displaystyle \begin{aligned} \hat{\beta}&=(ZZ')^{-1}ZX,~ s_o^2-s^2=X'Z'(ZZ')^{-1}ZX,\\ s^2&=X'[I-Z'(ZZ')^{-1}Z]X\ \, \mbox{and that}\\ Z'(ZZ')^{-1}Z&=[Z'(ZZ')^{-1}Z]^2, I-Z'(ZZ')^{-1}Z=[I-Z'(ZZ')^{-1}Z]^2,\\ &[I-Z'(ZZ')^{-1}Z][Z'(ZZ')^{-1}Z]=O.\end{aligned} $$

Let us evaluate those quantities:

Then,

$$\displaystyle \begin{aligned}X'[I-Z'(ZZ')^{-1}Z]X=26-\frac{120}{11}=\frac{166}{11},~~\mbox{with}\ n=5\ \mbox{and}\ q=2.\end{aligned}$$

The test statistics u 1 and u 2 and their observed values are the following:

$$\displaystyle \begin{aligned} u_1&=\frac{X'Z'(ZZ')^{-1}ZX}{X'[I-Z'(ZZ')^{-1}Z]X}=\frac{s_o^2-s^2}{s^2}\\ &=\frac{120}{166}=0.72;\\ u_2&=\frac{(s_o^2-s^2)/q}{s^2/(n-1-q)}\sim F_{q,~n-1-q}\\ &=\frac{(120/11)/2}{(166/11)/2}=\frac{120}{166}=0.72.\end{aligned} $$

Letting the significance level be α = 0.05, the required tabulated critical value is F q,n−1−q,α = F 2,2,0.05 = 19. Since 0.72 < 19, the hypothesis H o : β = O is not rejected. Thus, we will not proceed to test individual hypotheses on the regression coefficients β 1 and β 2. For tests on general linear models, refer for instance to Mathai (1971).

6.11. Problem Involving Two or More Independent Gaussian Populations

Consider k independent p-variate real normal populations X j ∼ N p(μ (j), Σ), Σ > O, j = 1, …, k, having the same nonsingular covariance matrix Σ but possibly different mean values. We consider the problem of testing hypotheses on linear functions of the mean values. Let b = a 1 μ (1) + ⋯ + a k μ (k) where a 1, …, a k are known real scalar constants, and let the null hypothesis be H o : b = b o (given), where b o is a specified vector. It is also assumed that Σ is known. Suppose that simple random samples of sizes n 1, …, n k from these k independent normal populations can be secured, and let the sample values be X jq, q = 1, …, n j, where \(X_{j1},\ldots ,X_{jn_j}\) are iid as N p(μ (j), Σ), Σ > O. Let the sample averages be denoted by \(\bar {X}_j=\frac {1}{n_j}\sum _{q=1}^{n_j}X_{jq},~ j=1,\ldots ,k\). Consider the test statistic \(U_k=a_1\bar {X}_1+\cdots +a_k\bar {X}_k\). Since the populations are independent and U k is a linear function of independent vector normal variables, U k is normally distributed with the mean value b = a 1 μ (1) + ⋯ + a k μ (k) and covariance matrix \(\frac {1}{n}\varSigma \), where \(\frac {1}{n}=(\frac {a_1^2}{n_1}+\cdots +\frac {a_k^2}{n_k})\) and so, \(\sqrt {n}\,\varSigma ^{-\frac {1}{2}}(U_k-b)\sim N_p(O,I)\). Then, under the hypothesis H o : b = b o (given), which is being tested against the alternative H 1 : b ≠ b o, the test criterion is obtained by proceeding as was done in the single population case. Thus, the test statistic is \(z=n(U_k-b_o)'\varSigma ^{-1}(U_k-b_o)\sim \chi _{p}^2\) and the criterion will be to reject the null hypothesis for large values of z. Accordingly, the criterion is

$$\displaystyle \begin{aligned} \mbox{Reject}\ H_o: b=b_o\ \mbox{if the observed value of } &n(U_k-b_o)'\varSigma^{-1}(U_k-b_o)\ge \chi_{p,~\alpha}^2,\\ \mbox{with}\ Pr\{\chi_{p}^2\ge \chi_{p,~\alpha}^2\}&=\alpha.{}\end{aligned} $$
(6.11.1)

In particular, suppose that we wish to test the hypothesis H o : δ = μ (1) − μ (2) = δ o, such as δ o = 0 as is often the case, against the natural alternative. In this case, when δ o = 0, the null hypothesis is that the mean value vectors are equal, that is, μ (1) = μ (2), and the test statistic is \( z=n(\bar {X}_1-\bar {X}_2)'\varSigma ^{-1}(\bar {X}_1-\bar {X}_2)\sim \chi _{p}^2\) with \(\frac {1}{n}=\frac {1}{n_1}+\frac {1}{n_2}\), the test criterion being

$$\displaystyle \begin{aligned} \mbox{Reject}\ H_o:\mu_{(1)}-\mu_{(2)}=0\ \mbox{if the observed value of } &z\ge \chi_{p,~\alpha}^2,\\ \mbox{with}\ Pr\{\chi_{p}^2\ge \chi_{p,~\alpha}^2\}=\alpha.\qquad \quad \quad \ & {}\end{aligned} $$
(6.11.2)

For a numerical example, the reader is referred to Example 6.2.3. One can also determine the power of the test, that is, the probability of rejecting H o : δ = μ (1) − μ (2) = δ o, under an alternative hypothesis, in which case z follows a noncentral chisquare distribution with p degrees of freedom and non-centrality parameter \(\lambda =\frac {1}{2}\frac {n_1n_2}{n_1+n_2}(\delta -\delta _o)'\varSigma ^{-1}(\delta -\delta _o),~ \delta =\mu _{(1)}-\mu _{(2)}\), where n 1 and n 2 are the sample sizes. Under the null hypothesis, the non-centrality parameter λ is equal to zero. The power is given by

$$\displaystyle \begin{aligned}\mbox{Power }= Pr\{\mbox{reject }H_o|H_1\}=Pr\{\chi_{p}^2(\lambda)\ge \chi_{p,~\alpha}^2\}.{}\end{aligned} $$
(6.11.3)
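
As a numerical illustration of (6.11.2) and (6.11.3), the following Python sketch evaluates the observed value of z and the power under a specified alternative; the covariance matrix, sample sizes and sample averages appearing in it are hypothetical, and note that scipy parameterizes the noncentral chisquare by 2λ in the notation used above.

```python
import numpy as np
from scipy.stats import chi2, ncx2

# Hypothetical two-sample test of H_o: mu_(1) - mu_(2) = 0 with known Sigma (p = 2)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n1, n2, p, alpha = 20, 25, 2, 0.05
xbar1 = np.array([1.10, 0.40])     # observed sample averages (hypothetical)
xbar2 = np.array([0.70, 0.10])

n = 1.0 / (1.0 / n1 + 1.0 / n2)    # Cov(Xbar_1 - Xbar_2) = (1/n) Sigma
d = xbar1 - xbar2
z = n * d @ np.linalg.solve(Sigma, d)            # chi-square with p d.f. under H_o
crit = chi2.ppf(1 - alpha, p)
print(z, crit, z >= crit)                        # reject H_o when z >= crit, as in (6.11.2)

# Power under the alternative delta = mu_(1) - mu_(2), as in (6.11.3)
delta = np.array([0.5, 0.3])
lam = 0.5 * (n1 * n2 / (n1 + n2)) * delta @ np.linalg.solve(Sigma, delta)
print(ncx2.sf(crit, p, 2 * lam))                 # scipy's noncentrality equals 2*lambda here
```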

When the population covariance matrices are identical and the common covariance matrix is unknown, one can also construct a statistic for testing hypotheses on linear functions of the mean value vectors by making use of steps parallel to those employed in the single population case, with the resulting criterion being based on Hotelling’s T 2 statistic for testing H o : μ (1) = μ (2).

6.11.1. Equal but unknown covariance matrices

Let us consider the same procedure as in Sect. 6.11 to test a hypothesis on a linear function b = a 1 μ (1) + ⋯ + a k μ (k) where a 1, …, a k are known real scalar constants and μ (j), j = 1, …, k, are the population mean values. We wish to test the hypothesis H o : b = b o (given) in the sense that all the mean values μ (j), j = 1, …, k, and a 1, …, a k, are specified. Let \(U_k=a_1\bar {X}_1+\cdots +a_k\bar {X}_k\) be as previously defined. Then, E[U k] = b and \({\mathrm{Cov}}(U_k)=(\frac {a_1^2}{n_1}+\cdots +\frac {a_k^2}{n_k})\varSigma \), where \((\frac {a_1^2}{n_1}+\cdots +\frac {a_k^2}{n_k})\equiv \frac {1}{n}\), which defines the symbol n. The common covariance matrix Σ has the MLE \(\frac {1}{n_1+\cdots +n_k}(S_1+\cdots +S_k)\) where S j is the sample sum of products matrix for the j-th Gaussian population. It has been established that S = S 1 + ⋯ + S k has a Wishart distribution with (n 1 − 1) + ⋯ + (n k − 1) = N − k, N = n 1 + ⋯ + n k, degrees of freedom, that is,

$$\displaystyle \begin{aligned}S\sim W_p(N-k,\varSigma),~ \varSigma>O,~ N=n_1+\cdots+n_k.{}\end{aligned} $$
(6.11.4)

Then, when Σ is unknown, it follows from a derivation parallel to that provided in Sect. 6.3 for the single population case that

$$\displaystyle \begin{aligned}w\equiv n(U_k-b)'S^{-1}(U_k-b)\sim \mbox{ type-2 beta } \Big(\frac{p}{2},~ \frac{N-k-p}{2}\Big),{}\end{aligned} $$
(6.11.5)

or, w has a real scalar type-2 beta distribution with the parameters \((\frac {p}{2},~ \frac {N-k-p}{2})\). Letting \(w=\frac {p}{N-k-p}F\), this F is an F-statistic with p and N − k − p degrees of freedom.

Theorem 6.11.1

Let U k, n, N, b, S be as defined above. Then w = n(U k − b)′S −1 (U k − b) has a real scalar type-2 beta distribution with the parameters \((\frac {p}{2}, ~\frac {N-k-p}{2})\). Letting \(w=\frac {p}{N-k-p}F\), this F is an F-statistic with p and N − k − p degrees of freedom.

Hence for testing the hypothesis H o : b = b o (given), the criterion is the following:

$$\displaystyle \begin{aligned} \mbox{Reject}\ H_o\ \mbox{if the observed value of }F&=\frac{N-k-p}{p}w\ge F_{p,\,N-k-p,~\alpha},{} \end{aligned} $$
(6.11.6)
$$\displaystyle \begin{aligned} \mbox{with}\ Pr\{F_{p,\,N-k-p}&\ge F_{p,\,N-k-p,~\alpha}\}=\alpha.{}\end{aligned} $$
(6.11.7)

Note that by exploiting the connection between type-1 and type-2 real scalar beta random variables, one can obtain a number of properties of this F-statistic.

This situation has already been covered in Theorem 6.3.4 for the case k = 2.
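
The criterion specified in (6.11.6) and (6.11.7) is easily applied numerically. The following Python sketch is purely illustrative: the settings are hypothetical and the data are simulated; it pools the sample sum of products matrices into S and carries out the F-test of Theorem 6.11.1 for H o : b = b o.

```python
import numpy as np
from scipy.stats import f

# Illustrative F-test of H_o: b = a_1 mu_(1) + a_2 mu_(2) = b_o with unknown common Sigma
rng = np.random.default_rng(0)
p, a, sizes, alpha = 2, [1.0, -1.0], [15, 20], 0.05
b_o = np.zeros(p)
samples = [rng.multivariate_normal(np.zeros(p), np.eye(p), size=m) for m in sizes]

k, N = len(samples), sum(sizes)
xbars = [s.mean(axis=0) for s in samples]
S = sum((s - xb).T @ (s - xb) for s, xb in zip(samples, xbars))  # S = S_1 + ... + S_k ~ W_p(N-k, Sigma)
U = sum(ai * xb for ai, xb in zip(a, xbars))                     # U_k = a_1 Xbar_1 + ... + a_k Xbar_k
n = 1.0 / sum(ai**2 / m for ai, m in zip(a, sizes))              # 1/n = a_1^2/n_1 + ... + a_k^2/n_k

w = n * (U - b_o) @ np.linalg.solve(S, U - b_o)                  # type-2 beta(p/2, (N-k-p)/2) under H_o
F_obs = (N - k - p) / p * w                                      # F with p and N-k-p degrees of freedom
print(F_obs, f.ppf(1 - alpha, p, N - k - p))                     # reject H_o when F_obs exceeds the critical value
```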

6.12. Equality of Covariance Matrices in Independent Gaussian Populations

Let X j ∼ N p(μ (j), Σ j), Σ j > O, j = 1, …, k, be independently distributed real p-variate Gaussian populations. Consider simple random samples of sizes n 1, …, n k from these k populations, whose sample values, denoted by X jq, q = 1, …, n j, are iid as X j1, j = 1, …, k. The sample sums of products matrices, denoted by S 1, …, S k, respectively, are independently distributed as Wishart matrix random variables with n j − 1, j = 1, …, k, degrees of freedom. The joint density of all the sample values is then given by

$$\displaystyle \begin{aligned} L=\prod_{j=1}^kL_j,~~ L_j=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma_j^{-1}S_j)-\frac{n_j}{2}(\bar{X}_j-\mu_{(j)})'\varSigma_j^{-1}(\bar{X}_j-\mu_{(j)})}}{(2\pi)^{\frac{n_jp}{2}}|\varSigma_j|{}^{\frac{n_j}{2}}}, \end{aligned} $$
(6.12.1)

the MLE’s of μ (j) and Σ j being \(\hat {\mu _{(j)}}=\bar {X}_j\) and \( \hat {\varSigma }_j=\frac {1}{n_j}S_j\). The maximum of L in the entire parameter space Ω is

$$\displaystyle \begin{aligned}\max_{\Omega}L=\prod_{j=1}^k\max_{\Omega}L_j=\frac{\Big[\prod_{j=1}^kn_j^{\frac{n_jp}{2}}\Big]{\mathrm{e}}^{-\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}}\prod_{j=1}^k|S_j|{}^{\frac{n_j}{2}}}, ~N=n_1+\cdots+n_k. \end{aligned}$$
(i)

Let us test the hypothesis of equality of covariance matrices:

$$\displaystyle \begin{aligned}H_o: \varSigma_1=\varSigma_2=\cdots=\varSigma_k=\varSigma \end{aligned}$$

where Σ is unknown. Under this null hypothesis, the MLE of μ (j) is \(\bar {X}_j\) and the MLE of the common Σ is \(\frac {1}{N}(S_1+\cdots +S_k)=\frac {1}{N}S, ~N=n_1+\cdots +n_k,~ S=S_1+\cdots +S_k\). Thus, the maximum of L under H o is

$$\displaystyle \begin{aligned}\max_{H_o}L=\frac{N^{\frac{Np}{2}}{\mathrm{e}}^{-\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}}\prod_{j=1}^k|S|{}^{\frac{n_j}{2}}}=\frac{N^{\frac{Np}{2}}{\mathrm{e}}^{-\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}}|S|{}^{\frac{N}{2}}},\end{aligned}$$
(ii)

and the λ-criterion is the following:

$$\displaystyle \begin{aligned}\lambda=\frac{N^{\frac{Np}{2}}\Big\{\prod_{j=1}^k|S_j|{}^{\frac{n_j}{2}}\Big\}}{|S|{}^{\frac{N}{2}}\Big\{\prod_{j=1}^kn_j^{\frac{n_jp}{2}}\Big\}}.{}\end{aligned} $$
(6.12.2)

Let us consider the h-th moment of λ for an arbitrary h. Letting \(c=\frac {N^{\frac {Np}{2}}}{\Big \{\prod _{j=1}^kn_j^{\frac {n_jp}{2}}\Big \}},\)

$$\displaystyle \begin{aligned} \lambda^h=c^h\frac{\Big\{\prod_{j=1}^k|S_j|{}^{\frac{n_jh}{2}}\Big\}}{|S|{}^{\frac{Nh}{2}}}=c^h\Big\{\prod_{j=1}^k|S_j|{}^{\frac{n_jh}{2}}\Big\}|S|{}^{-\frac{Nh}{2}}. \end{aligned} $$
(6.12.3)

The factor causing a difficulty, namely \(|S|{ }^{-\frac {Nh}{2}}\), will be replaced by an equivalent integral. Letting Y > O be a real p × p positive definite matrix, we have the identity

$$\displaystyle \begin{aligned}|S|{}^{-\frac{Nh}{2}}=\frac{1}{\varGamma_p(\frac{Nh}{2})}\int_{Y>O}|Y|{}^{\frac{Nh}{2}-\frac{p+1}{2}}{\mathrm{e}}^{-{\mathrm{tr}}(YS)}{\mathrm{d}}Y,~ \Re(\frac{Nh}{2})>\frac{p-1}{2},~S>O,{}\end{aligned} $$
(6.12.4)

where

$$\displaystyle \begin{aligned}{\mathrm{tr}}(YS)={\mathrm{tr}}(YS_1)+\cdots+{\mathrm{tr}}(YS_k). \end{aligned}$$
(iii)

Thus, once (6.12.4) is substituted in (6.12.3), λ h splits into products involving S j, j = 1, …, k; this enables one to integrate out over the densities of S j, which are Wishart densities with m j = n j − 1 degrees of freedom. Noting that the exponent involving S j is \(-\frac {1}{2}{\mathrm{tr}}(\varSigma _j^{-1}S_j)-{\mathrm{tr}}(YS_j)=-\frac {1}{2}{\mathrm{tr}}[S_j(\varSigma _j^{-1}+2Y)]\), the integral over the Wishart density of S j gives the following:

$$\displaystyle \begin{aligned} \prod_{j=1}^k&\frac{1}{2^{\frac{m_jp}{2}}\varGamma_p(\frac{m_j}{2})|\varSigma_j|{}^{\frac{m_j}{2}}}\int_{S_j>O}|S_j|{}^{\frac{m_j}{2}+\frac{n_jh}{2} -\frac{p+1}{2}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}[S_j(\varSigma_j^{-1}+2Y)]}{\mathrm{d}}S_j\\ &=\prod_{j=1}^k\frac{2^{\frac{n_jhp}{2}}\varGamma_p(\frac{m_j}{2}+\frac{n_jh}{2})|\varSigma_j^{-1}+2Y|{}^{-(\frac{m_j}{2}+\frac{n_jh}{2})}} {\varGamma_p(\frac{m_j}{2})|\varSigma_j|{}^{\frac{m_j}{2}}}\\ &=2^{\frac{Nph}{2}}\prod_{j=1}^k\frac{|\varSigma_j|{}^{\frac{n_jh}{2}}|I+2\varSigma_j Y|{}^{-(\frac{m_j}{2}+\frac{n_jh}{2})}\varGamma_p(\frac{m_j}{2}+\frac{n_jh}{2})}{\varGamma_p(\frac{m_j}{2})}.\end{aligned} $$
(iv)

Thus, on substituting (iv) in E[λ h], we have

$$\displaystyle \begin{aligned} E[\lambda^h]&=\frac{c^h2^{\frac{Nph}{2}}}{\varGamma_p(\frac{Nh}{2})}\int_{Y>O}|Y|{}^{\frac{Nh}{2}-\frac{p+1}{2}}\\ &\qquad \qquad \times \Big\{\prod_{j=1}^k\frac{|\varSigma_j|{}^{\frac{n_jh}{2}}|I+2\varSigma_j Y|{}^{-(\frac{m_j}{2}+\frac{n_jh}{2})}\varGamma_p(\frac{m_j}{2}+\frac{n_jh}{2})}{\varGamma_p(\frac{m_j}{2})}\Big\}{\mathrm{d}}Y,{}\end{aligned} $$
(6.12.5)

which is the non-null h-th moment of λ. The h-th null moment is available when Σ 1 = ⋯ = Σ k = Σ. In the null case,

$$\displaystyle \begin{aligned} \prod_{j=1}^k&|I+2Y\varSigma_j|{}^{-(\frac{m_j}{2}+\frac{n_jh}{2})}|\varSigma_j|{}^{\frac{n_jh}{2}}\frac{\varGamma_p(\frac{m_j}{2}+\frac{n_jh}{2})} {\varGamma_p(\frac{m_j}{2})}\\ &=|\varSigma|{}^{\frac{Nh}{2}}|I+2Y\varSigma|{}^{-(\frac{N-k}{2}+\frac{Nh}{2})}\Big\{\prod_{j=1}^k\frac{\varGamma_p(\frac{m_j}{2}+\frac{n_jh}{2})} {\varGamma_p(\frac{m_j}{2})}\Big\}.\end{aligned} $$
(v)

Then, substituting (v) in (6.12.5) and integrating out over Y  produces

$$\displaystyle \begin{aligned} E[\lambda^h|H_o]=c^h\frac{\varGamma_p(\frac{N-k}{2})}{\varGamma_p(\frac{N-k}{2}+\frac{Nh}{2})}\Big\{\prod_{j=1}^k\frac{\varGamma_p(\frac{n_j-1}{2} +\frac{n_jh}{2})}{\varGamma_p(\frac{n_j-1}{2})}\Big\}. \end{aligned} $$
(6.12.6)

Observe that when h = 0, E[λ h|H o] = 1. For h = s − 1 where s is a complex parameter, E[λ s−1|H o] is the Mellin transform of the density of λ, denoted by f(λ); on inverting this transform, f(λ) can be expressed as follows in terms of an H-function:

$$\displaystyle \begin{aligned} f(\lambda)=\frac{1}{c}\frac{\varGamma_p(\frac{N-k}{2})}{\Big\{\prod_{j=1}^k\varGamma_p(\frac{n_j-1}{2})\Big\}}H_{1,k}^{k,0}\left[\frac{\lambda}{c} \Big\vert_{(-\frac{1}{2},~\frac{n_j}{2}),~j=1,\ldots,k}^{(-\frac{k}{2},~\frac{N}{2})}\right],~0<\lambda <1, \end{aligned} $$
(6.12.7)

and zero elsewhere, where the H-function is defined in Sect. 5.4.3, more details being available from Mathai and Saxena (1978) and Mathai et al. (2010). Since the coefficients of \(\frac {h}{2}\) in the gammas, that is, n 1, …, n k and N, are all positive integers, one can expand all gammas by using the multiplication formula for gamma functions, and then, f(λ) can be expressed in terms of a G-function as well. It may be noted from (6.12.5) that for obtaining the non-null moments, and thereby the non-null density, one has to integrate out Y  in (6.12.5). This has not yet been worked out for a general k. For k = 2, one can obtain a series form in terms of zonal polynomials for the integral in (6.12.5). The rather intricate derivations are omitted.
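
The null moments in (6.12.6) can also be evaluated numerically since ln Γ p(⋅) is available, for instance, as scipy.special.multigammaln. The following Python sketch, in which the values of p and n 1, …, n k are hypothetical, evaluates E[λ h|H o] for a few values of h and confirms that it equals 1 at h = 0.

```python
import numpy as np
from scipy.special import multigammaln

def log_null_moment(h, p, sizes):
    """ln E[lambda^h | H_o] from (6.12.6); sizes = (n_1, ..., n_k)."""
    n = np.asarray(sizes, dtype=float)
    N, k = n.sum(), n.size
    log_c = (N * p / 2) * np.log(N) - (p / 2) * np.sum(n * np.log(n))   # the constant c of Sect. 6.12
    val = h * log_c
    val += multigammaln((N - k) / 2, p) - multigammaln((N - k) / 2 + N * h / 2, p)
    val += sum(multigammaln((nj - 1) / 2 + nj * h / 2, p) - multigammaln((nj - 1) / 2, p)
               for nj in n)
    return val

# Hypothetical configuration: p = 3 variables, k = 3 groups of sizes 10, 12, 15
for h in [0.0, 0.5, 1.0, 2.0]:
    print(h, np.exp(log_null_moment(h, p=3, sizes=[10, 12, 15])))   # h = 0 gives 1
```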

6.12.1. Asymptotic behavior

We now investigate the asymptotic behavior of \(-2\ln \lambda \) as n j →∞, j = 1, …, k, N = n 1 + ⋯ + n k. On expanding the real matrix-variate gamma functions in the gamma ratio involving h in (6.12.6), we have

$$\displaystyle \begin{aligned}\frac{\prod_{j=1}^k\varGamma_p(\frac{n_j}{2}(1+h)-\frac{1}{2})}{\varGamma_p(\frac{N}{2}(1+h)-\frac{k}{2})}\to \frac{\prod_{j=1}^k\Big\{\prod_{i=1}^p\varGamma(\frac{n_j}{2}(1+h)-\frac{1}{2}-\frac{i-1}{2})\Big\}}{\prod_{i=1}^p\varGamma(\frac{N}{2}(1+h)-\frac{k}{2}-\frac{i-1}{2})}, \end{aligned}$$
(i)

excluding the factor containing π. Letting \(\frac {n_j}{2}(1+h)\to \infty ,~ j=1,\ldots ,k,\) and \(\frac {N}{2}(1+h)\to \infty \) as n j →∞, j = 1, …, k, with N →∞, we now express all the gamma functions in (i) in terms of Stirling’s asymptotic formula. For the numerator, we have

$$\displaystyle \begin{aligned} &\prod_{j=1}^k\Big\{\prod_{i=1}^p\varGamma\Big(\frac{n_j}{2}(1+h)-\frac{1}{2}-\frac{i-1}{2}\Big)\Big\}\\ &\to\prod_{j=1}^k\Big\{\prod_{i=1}^p\sqrt{(2\pi)}\Big[\frac{n_j}{2}(1+h)\Big]^{\frac{n_j}{2}(1+h)-\frac{1}{2}-\frac{i}{2}}{\mathrm{e}}^{-\frac{n_j}{2}(1+h)}\Big\}\\ &=\prod_{j=1}^k(\sqrt{2\pi})^p\Big[\frac{n_j}{2}(1+h)\Big]^{\frac{n_j}{2}(1+h)p-\frac{p}{2}-\frac{p(p+1)}{4}}{\mathrm{e}}^{-\frac{n_j}{2}(1+h)p}\\ &=(\sqrt{2\pi})^{kp}\Big[\prod_{j=1}^k\Big(\frac{n_j}{2}\Big)^{\frac{n_j}{2}(1+h)p-\frac{p}{2}-\frac{p(p+1)}{4}}\Big]{\mathrm{e}}^{-\frac{N}{2}(1+h)p}\\ &\qquad \ \ \ \ \ \ \ \ \ \ \quad \times(1+h)^{\frac{N}{2}(1+h)p-\frac{kp}{2}-k\frac{p(p+1)}{4}},\end{aligned} $$
(ii)

and the denominator in (i) has the following asymptotic representation:

$$\displaystyle \begin{aligned} \prod_{i=1}^p\varGamma\Big(\frac{N}{2}(1+h)-\frac{k}{2}-\frac{i-1}{2}\Big)&\to\prod_{i=1}^p(\sqrt{2\pi})\Big[\frac{N}{2}(1+h)\Big]^{\frac{N}{2}(1+h)-\frac{k}{2}-\frac{i}{2}}{\mathrm{e}}^{-\frac{N}{2}(1+h)}\\ &=(\sqrt{2\pi})^{p}\Big[\frac{N}{2}\Big]^{\frac{N}{2}(1+h)p-\frac{kp}{2}-\frac{p(p+1)}{4}}{\mathrm{e}}^{-\frac{N}{2}(1+h)p}\\ &\ \ \ \ \ \ \times (1+h)^{\frac{N}{2}(1+h)p-\frac{kp}{2}-\frac{p(p+1)}{4}}.\end{aligned} $$
(iii)

Now, expanding the gammas in the constant part \(\varGamma _p(\frac {N-k}{2})/\prod _{j=1}^k\varGamma _p(\frac {n_j-1}{2})\) and then taking care of c h, we see that the factors containing π, the n j’s and N disappear leaving

$$\displaystyle \begin{aligned}(1+h)^{-(k-1)\frac{p(p+1)}{4}}.\end{aligned}$$

Hence \(-2\ln \lambda \to \chi ^2_{(k-1)\frac {p(p+1)}{2}}\) asymptotically, and we have the following result:

Theorem 6.12.1

Consider the λ-criterion in (6.12.2) or the null density in (6.12.7). When n j →∞, j = 1, …, k, the asymptotic null density of \(-2\ln \lambda \) is a real scalar chisquare with \((k-1)\frac {p(p+1)}{2}\) degrees of freedom.

Observe that the number of parameters restricted by the null hypothesis H o : Σ 1 = ⋯ = Σ k = Σ where Σ is unknown, is (k − 1) times the number of distinct parameters in Σ, that is, \((k-1)\frac {p(p+1)}{2}\), which coincides with the number of degrees of freedom of the asymptotic chisquare distribution under H o.
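
As a numerical illustration of the criterion (6.12.2) and of Theorem 6.12.1, the following Python sketch computes \(-2\ln \lambda \) from simulated data (the dimension, number of groups and sample sizes being hypothetical) and compares it with the corresponding chisquare critical value; log-determinants are used for numerical stability.

```python
import numpy as np
from scipy.stats import chi2

# Testing H_o: Sigma_1 = ... = Sigma_k with -2 ln(lambda) of (6.12.2); data simulated under H_o
rng = np.random.default_rng(1)
p, sizes, alpha = 3, [30, 40, 50], 0.05
samples = [rng.multivariate_normal(np.zeros(p), np.eye(p), size=m) for m in sizes]

k, N = len(samples), sum(sizes)
S_list = [(s - s.mean(axis=0)).T @ (s - s.mean(axis=0)) for s in samples]
S = sum(S_list)

log_lam = (N * p / 2) * np.log(N) - (p / 2) * sum(m * np.log(m) for m in sizes)
log_lam += sum((m / 2) * np.linalg.slogdet(Sj)[1] for m, Sj in zip(sizes, S_list))
log_lam -= (N / 2) * np.linalg.slogdet(S)[1]

stat = -2 * log_lam
df = (k - 1) * p * (p + 1) // 2
print(stat, chi2.ppf(1 - alpha, df))   # reject H_o for large values of -2 ln(lambda)
```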

6.13. Testing the Hypothesis that k Independent p-variate Real Gaussian Populations are Identical and Multivariate Analysis of Variance

Consider k independent p-variate real Gaussian populations X ij ∼ N p(μ (i), Σ i), Σ i > O, i = 1, …, k, and j = 1, …, n i, where the p × 1 vector X ij is the j-th sample value belonging to the i-th population, these samples (iid variables) being of sizes n 1, …, n k from these k populations. The joint density of all the sample values, denoted by L, can be expressed as follows:

$$\displaystyle \begin{aligned} L&=\prod_{i=1}^k L_i,~~ L_i=\prod_{j=1}^{n_i}\frac{{\mathrm{e}}^{-\frac{1}{2}(X_{ij}-\mu^{(i)})'\varSigma_i^{-1}(X_{ij}-\mu^{(i)})}}{(2\pi)^{\frac{p}{2}}|\varSigma_i|{}^{\frac{1}{2}}}\\ &=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma_i^{-1}S_i)-\frac{n_i}{2}(\bar{X}_i-\mu^{(i)})'\varSigma_i^{-1}(\bar{X}_i-\mu^{(i)})}}{(2\pi)^{\frac{n_ip}{2}}|\varSigma_i|{}^{\frac{n_i}{2}}}\end{aligned} $$

where \(\bar {X}_i=\frac {1}{n_i}(X_{i1}+\cdots +X_{in_i}),~ i=1,\ldots ,k,\) and E[X ij] = μ (i), j = 1, …, n i. Then, letting N = n 1 + ⋯ + n k,

$$\displaystyle \begin{aligned} \max_{\Omega}L&=\prod_{i=1}^k\max_{\Omega}L_i=\prod_{i=1}^k\frac{n_i^{\frac{n_ip}{2}}{\mathrm{e}}^{-\frac{n_ip}{2}}}{(2\pi)^{\frac{n_ip}{2}}|S_i|{}^{\frac{n_i}{2}}}\\ &=\frac{\{\prod_{i=1}^kn_i^{\frac{n_ip}{2}}\}{\mathrm{e}}^{-\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}}\{\prod_{i=1}^k|S_i|{}^{\frac{n_i}{2}}\}}.\end{aligned} $$

Consider the hypothesis H o : μ (1) = ⋯ = μ (k) = μ, Σ 1 = ⋯ = Σ k = Σ, where μ and Σ are unknown. This corresponds to the hypothesis of equality of these k populations. Under H o, the maximum likelihood estimator (MLE) of μ, denoted by \(\hat {\mu }\), is given by \(\hat {\mu }=\frac {1}{N}[n_1\bar {X}_1+\cdots +n_k\bar {X}_k]\) where N and \(\bar {X}_i\) are as defined above. As for the common Σ, its MLE is

$$\displaystyle \begin{aligned} \hat{\varSigma}&=\frac{1}{N}\sum_{i=1}^k\sum_{j=1}^{n_i}(X_{ij}-\hat{\mu})(X_{ij}-\hat{\mu})'\\ &=\frac{1}{N}[S_1+\cdots+S_k+\sum_{i=1}^kn_i(\bar{X}_i-\hat{\mu})(\bar{X}_i-\hat{\mu})']\end{aligned} $$

where S i is the sample sum of products matrix for the i-th sample, observing that

$$\displaystyle \begin{aligned} \sum_{j=1}^{n_i}(X_{ij}-\hat{\mu})(X_{ij}-\hat{\mu})'&=\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_i+\bar{X}_i-\hat{\mu})(X_{ij}-\bar{X}_i+\bar{X}_i-\hat{\mu})'\\ &=\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_i)(X_{ij}-\bar{X}_i)'+\sum_{j=1}^{n_i}(\bar{X}_i-\hat{\mu})(\bar{X}_i-\hat{\mu})'\\ &=S_i+n_i(\bar{X}_i-\hat{\mu})(\bar{X}_i-\hat{\mu})'.\end{aligned} $$

Hence the maximum of the likelihood function under H o is the following:

$$\displaystyle \begin{aligned}\max_{H_o}L=\frac{{\mathrm{e}}^{-\frac{Np}{2}}N^{\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}}|S+\sum_{i=1}^kn_i(\bar{X}_i-\hat{\mu})(\bar{X}_i-\hat{\mu})^{\prime}|{}^{\frac{N}{2}}} \end{aligned}$$

where S = S 1 + ⋯ + S k. Therefore the λ-criterion is given by

$$\displaystyle \begin{aligned}\lambda=\frac{\max_{H_o}L}{\max_{\Omega}L}=\frac{\{\prod_{i=1}^k|S_i|{}^{\frac{n_i}{2}}\}N^{\frac{Np}{2}}}{\{\prod_{i=1}^kn_i^{\frac{n_ip}{2}}\}|S+\sum_{i=1}^kn_i(\bar{X}_i-\hat{\mu})(\bar{X}_i-\hat{\mu})^{\prime}|{}^{\frac{N}{2}}}.{}\end{aligned} $$
(6.13.1)

6.13.1. Conditional and marginal hypotheses

For convenience, we may split λ into the product λ 1 λ 2 where λ 1 is the λ-criterion for the conditional hypothesis H o1 :  μ (1) = ⋯ = μ (k) = μ given that Σ 1 = ⋯ = Σ k = Σ and λ 2 is the λ-criterion for the marginal hypothesis H o2 :  Σ 1 = ⋯ = Σ k = Σ where μ and Σ are unknown. The conditional hypothesis H o1 is actually the null hypothesis usually being made when establishing the multivariate analysis of variance (MANOVA) procedure. We will only consider H o1 since the marginal hypothesis H o2 has already been discussed in Sect. 6.12. When the Σ i’s are assumed to be equal, the common Σ is estimated by the MLE \(\frac {1}{N}(S_1+\cdots +S_k)\) where S i is the sample sum of products matrix in the i-th population. The common μ is estimated by \(\frac {1}{N}(n_1\bar {X}_1+\cdots +n_k\bar {X}_k)\). Accordingly, the λ-criterion for this conditional hypothesis is the following:

$$\displaystyle \begin{aligned}\lambda_1=\frac{|S|{}^{\frac{N}{2}}}{|S+\sum_{i=1}^kn_i(\bar{X}_i-\hat{\mu})(\bar{X}_i-\hat{\mu})^{\prime}|{}^{\frac{N}{2}}}{}\end{aligned} $$
(6.13.2)

where \(S=S_1+\cdots +S_k,~ \hat {\mu }=\frac {1}{N}(n_1\bar {X}_1+\cdots +n_k\bar {X}_k),~ N=n_1+\cdots +n_k\). Note that the S i’s are independently Wishart distributed with n i − 1 degrees of freedom, that is, \(S_i\overset {ind}{\sim } W_p(n_i-1,~\varSigma )\), i = 1, …, k, and hence S ∼ W p(N − k, Σ). Let

$$\displaystyle \begin{aligned}Q=\sum_{i=1}^kn_i(\bar{X}_i-\hat{\mu})(\bar{X}_i-\hat{\mu})'. \end{aligned}$$

Since Q only contains sample averages and the sample averages and the sample sum of products matrices are independently distributed, Q and S are independently distributed. Moreover, since we can write \(\bar {X}_i-\hat {\mu }\) as \((\bar {X}_i-\mu )-(\hat {\mu }-\mu )\), where μ is the common true mean value vector, without any loss of generality we can deem the \(\bar {X}_i\)’s to be independently \(N_p(O,\frac {1}{n_i}\varSigma )\) distributed, i = 1, …, k, and letting \(Y_i=\sqrt {n_i}\bar {X}_i\), one has \(Y_i\overset {iid}{\sim } N_p(O,\varSigma )\) under the hypothesis H o1. Now, observe that

$$\displaystyle \begin{aligned} \bar{X}_i-\hat{\mu}&=\bar{X}_i-\frac{1}{N}(n_1\bar{X}_1+\cdots+n_k\bar{X}_k)\\ &=-\frac{n_1}{N}\bar{X}_1-\cdots -\frac{n_{i-1}}{N}\bar{X}_{i-1}+\Big(1-\frac{n_i}{N}\Big)\bar{X}_i-\frac{n_{i+1}}{N}\bar{X}_{i+1}-\cdots-\frac{n_k}{N}\bar{X}_k\,, \end{aligned} $$
(i)
$$\displaystyle \begin{aligned} Q&=\sum_{i=1}^kn_i(\bar{X}_i-\hat{\mu})(\bar{X}_i-\hat{\mu})'\\ &=\sum_{i=1}^kn_i\bar{X}_i\bar{X}_i^{\prime}-N\hat{\mu}\hat{\mu}'\\ &=\sum_{i=1}^kY_iY_i^{\prime}-\frac{1}{N}(\sqrt{n_1}Y_1+\cdots+\sqrt{n_k}Y_k)(\sqrt{n_1}Y_1+\cdots+\sqrt{n_k}Y_k)', \end{aligned} $$
(ii)

where \(\sqrt {n_1}Y_1+\cdots +\sqrt {n_k}Y_k=(Y_1,\ldots ,Y_k)DJ\) with J being the k × 1 vector of unities, J′ = (1, …, 1), and \(D={\mathrm{diag}}(\sqrt {n_1},\ldots ,\sqrt {n_k})\). Thus, we can express Q as follows:

$$\displaystyle \begin{aligned}Q=(Y_1,\ldots,Y_k)[I-\frac{1}{N}DJJ'D](Y_1,\ldots,Y_k)'. \end{aligned}$$
(iii)

Let \(B=\frac {1}{N}DJJ'D\) and A = I − B. Then, observing that J′D 2 J = N, both B and A are idempotent matrices, where B is of rank 1 since the trace of B, or equivalently the trace of \(\frac {1}{N}J'D^2J\), is equal to one, so that the trace of A, which is also its rank, is k − 1. Then, there exists an orthonormal matrix P, PP′ = I k, P′P = I k, such that

$$\displaystyle \begin{aligned}PAP'=\left[\begin{matrix}I_{k-1}&O\\ O'&0\end{matrix}\right]\end{aligned}$$

where O is a (k − 1) × 1 null vector, O′ being its transpose. Letting (U 1, …, U k) = (Y 1, …, Y k)P′, the U i’s are still independently N p(O, Σ) distributed under H o1, so that

$$\displaystyle \begin{aligned}Q=(U_1,\ldots,U_k)\,PAP'\,(U_1,\ldots,U_k)'=\sum_{i=1}^{k-1}U_iU_i^{\prime}\sim W_p(k-1,~\varSigma). \end{aligned}$$
(iv)

Thus, \(S+\sum _{i=1}^kn_i(\bar {X}_i-\hat {\mu })(\bar {X}_i-\hat {\mu })'\sim W_p(N-1,~\varSigma )\); clearly, referring to the ratio in (6.13.2), this sum is not distributed independently of S.

6.13.2. Arbitrary moments of λ 1

Given (6.13.2), we have

$$\displaystyle \begin{aligned}\lambda_1=\frac{|S|{}^{\frac{N}{2}}}{|S+Q|{}^{\frac{N}{2}}}\Rightarrow \lambda_1^h=|S|{}^{\frac{Nh}{2}}|S+Q|{}^{-\frac{Nh}{2}} \end{aligned}$$
(v)

where \(|S+Q|{ }^{-\frac {Nh}{2}}\) will be replaced by the equivalent integral

$$\displaystyle \begin{aligned}|S+Q|{}^{-\frac{Nh}{2}}=\frac{1}{\varGamma_p(\frac{Nh}{2})}\int_{T>O}|T|{}^{\frac{Nh}{2}-\frac{p+1}{2}}{\mathrm{e}}^{-{\mathrm{tr}}((S+Q)T)}{\mathrm{d}}T \end{aligned}$$
(vi)

with the p × p matrix T > O. Hence, the h-th moment of λ 1, for arbitrary h, is the following expected value:

$$\displaystyle \begin{aligned}E[\lambda_1^h|H_{o1}]=E\{|S|{}^{\frac{Nh}{2}}|S+Q|{}^{-\frac{Nh}{2}}\}. \end{aligned}$$
(vii)

We now evaluate (vii) by integrating out over the Wishart density of S and over the joint multinormal density for U 1, …, U k−1:

$$\displaystyle \begin{aligned} E[\lambda_1^h|H_{o1}]&=\frac{1}{\varGamma_p(\frac{Nh}{2})}\int_{T>O}|T|{}^{\frac{Nh}{2}-\frac{p+1}{2}}\\ &\qquad \quad \times \frac{\int_{S>O}|S|{}^{\frac{Nh}{2}+\frac{N-k}{2}-\frac{p+1}{2}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-{\mathrm{tr}}(ST)}}{2^{\frac{(N-k)p}{2}}\varGamma_p(\frac{N-k}{2})|\varSigma|{}^{\frac{N-k}{2}}}{\mathrm{d}}S\\ &\qquad \quad \times \int_{U_1,\ldots,U_{k-1}}\frac{{\mathrm{e}}^{-\frac{1}{2}\sum_{i=1}^{k-1}U_i^{\prime}\varSigma^{-1}U_i-\sum_{i=1}^{k-1}{\mathrm{tr}}(TU_iU_i^{\prime})}}{(2\pi)^{\frac{(k-1)p}{2}}|\varSigma|{}^{\frac{k-1}{2}}}{\mathrm{d}}U_1\wedge\ldots\wedge{\mathrm{d}}U_{k-1}\wedge{\mathrm{d}}T.\end{aligned} $$

The integral over S is evaluated as follows:

$$\displaystyle \begin{aligned} &\frac{\int_{S>O}|S|{}^{\frac{Nh}{2}+\frac{N-k}{2}-\frac{p+1}{2}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}[S(\varSigma^{-1}+2T)]}}{2^{\frac{(N-k)p}{2}}\varGamma_p(\frac{N-k}{2})|\varSigma|{}^{\frac{N-k}{2}}}{\mathrm{d}}S\\ &=2^{\frac{Nhp}{2}}|\varSigma|{}^{\frac{Nh}{2}}\frac{\varGamma_p(\frac{N-k}{2}+\frac{Nh}{2})}{\varGamma_p(\frac{N-k}{2})}|I+2\varSigma T|{}^{-(\frac{Nh}{2}+\frac{N-k}{2})}\end{aligned} $$
(viii)

for I + 2ΣT > O, \(\Re (\frac {N-k}{2}+\frac {Nh}{2})>\frac {p-1}{2}\). The integral over U 1, …, U k−1 is the following, denoted by δ:

$$\displaystyle \begin{aligned}\delta=\int_{U_1,\ldots,U_{k-1}}\frac{{\mathrm{e}}^{-\frac{1}{2}\sum_{i=1}^{k-1}U_i^{\prime}\varSigma^{-1}U_i-{\mathrm{tr}}(\sum_{i=1}^{k-1}TU_iU_i^{\prime})}}{(2\pi)^{\frac{(k-1)p}{2}}|\varSigma|{}^{\frac{k-1}{2}}}{\mathrm{d}}U_1\wedge\ldots\wedge{\mathrm{d}}U_{k-1} \end{aligned}$$

where

$$\displaystyle \begin{aligned}{\mathrm{tr}}\Big(\sum_{i=1}^{k-1}TU_iU_i^{\prime}\Big)={\mathrm{tr}}\Big(\sum_{i=1}^{k-1}U_i^{\prime}TU_i\Big)=\sum_{i=1}^{k-1}U_i^{\prime}TU_i \end{aligned}$$

since \(U_i^{\prime }TU_i\) is scalar; thus, the exponent becomes \(-\frac {1}{2}\sum _{i=1}^{k-1}U_i^{\prime }[\varSigma ^{-1}+2T]U_i\) and the integral simplifies to

$$\displaystyle \begin{aligned}\delta =|I+2\varSigma T|{}^{-\frac{k-1}{2}},~ I+2\varSigma T>O. \end{aligned}$$
(ix)

Now the integral over T is the following:

$$\displaystyle \begin{aligned} &\frac{1}{\varGamma_p(\frac{Nh}{2})}\int_{T>O}|T|{}^{\frac{Nh}{2}-\frac{p+1}{2}}|I+2\varSigma T|{}^{-(\frac{Nh}{2}+\frac{N-k}{2}+\frac{k-1}{2})}{\mathrm{d}}T\\ &=\frac{\varGamma_p(\frac{N-k}{2}+\frac{k-1}{2})}{\varGamma_p(\frac{Nh}{2}+\frac{N-k}{2}+\frac{k-1}{2})}2^{-\frac{Nhp}{2}}|\varSigma|{}^{-\frac{Nh}{2}},~\Re(\frac{N-k}{2}+\frac{Nh}{2})>\frac{p-1}{2}.\end{aligned} $$
(x)

Therefore,

$$\displaystyle \begin{aligned} E[\lambda_1^h|H_{o1}]=\frac{\varGamma_p(\frac{N-k}{2}+\frac{Nh}{2})}{\varGamma_p(\frac{N-k}{2})}\frac{\varGamma_p(\frac{N-k}{2}+\frac{k-1}{2})} {\varGamma_p(\frac{Nh}{2}+\frac{N-k}{2}+\frac{k-1}{2})} \end{aligned} $$
(6.13.3)

for \(\Re (\frac {N-k}{2}+\frac {Nh}{2})>\frac {p-1}{2}\).

6.13.3. The asymptotic distribution of \(-2\ln \lambda _1\)

An asymptotic distribution of \(-2\ln \lambda _1\) as N →∞ can be derived from (6.13.3). First, on expanding the real matrix-variate gamma functions in (6.13.3), we obtain the following representation of the h-th null moment of λ 1:

$$\displaystyle \begin{aligned} E[\lambda_1^h|H_{o1}]&=\Big\{\prod_{j=1}^p\frac{\varGamma(\frac{N-k}{2}+\frac{k-1}{2}-\frac{j-1}{2})}{\varGamma(\frac{N-k}{2}-\frac{j-1}{2})}\Big\}\\ &\ \ \ \ \times \Big\{\prod_{j=1}^p\frac{\varGamma(\frac{N}{2}(1+h)-\frac{k}{2}-\frac{j-1}{2})}{\varGamma(\frac{N}{2}(1+h)-\frac{k}{2}+\frac{k-1}{2}-\frac{j-1}{2})}\Big\}. \end{aligned} $$
(xi)

Let us now express all the gamma functions in terms of Stirling’s asymptotic formula by taking \(\frac {N}{2}\to \infty \) in the constant part and \(\frac {N}{2}(1+h)\to \infty \) in the part containing h. Then,

$$\displaystyle \begin{aligned} \frac{\varGamma(\frac{N}{2}(1+h)-\frac{k}{2}-\frac{j-1}{2})}{\varGamma(\frac{N}{2}(1+h)-\frac{k}{2}+\frac{k-1}{2}-\frac{j-1}{2})}&\to \frac{(2\pi)^{\frac{1}{2}}}{(2\pi)^{\frac{1}{2}}}\frac{[\frac{N}{2}(1+h)]^{\frac{N}{2}(1+h)-\frac{k}{2}-\frac{j}{2}}} {[\frac{N}{2}(1+h)]^{\frac{N}{2}(1+h)-\frac{k}{2}+\frac{k-1}{2}-\frac{j}{2}}}\frac{{\mathrm{e}}^{\frac{N}{2}(1+h)}}{{\mathrm{e}}^{\frac{N}{2}(1+h)}}\\ &=\frac{1}{[\frac{N}{2}(1+h)]^{\frac{k-1}{2}}}, \end{aligned} $$
(xii)
$$\displaystyle \begin{aligned} \frac{\varGamma(\frac{N}{2}-\frac{k}{2}+\frac{k-1}{2}-\frac{j-1}{2})}{\varGamma(\frac{N}{2}-\frac{k}{2}-\frac{j-1}{2})}&\to \Big(\frac{N}{2}\Big)^{\frac{(k-1)}{2}}. \end{aligned} $$
(xiii)

Hence,

$$\displaystyle \begin{aligned}E[\lambda_1^{h}|H_{o1}]\to (1+h)^{-\frac{(k-1)p}{2}}\mbox{ as }N\to\infty.{}\end{aligned} $$
(6.13.4)

Thus, the following result:

Theorem 6.13.1

For the test statistic λ 1 given in (6.13.2), \(-2\ln \lambda _1\to \chi ^2_{(k-1)p}\), that is, \(-2\ln \lambda _1 \) tends to a real scalar chisquare variable having (k − 1)p degrees of freedom as N →∞ with N = n 1 + ⋯ + n k, n j being the sample size of the j-th p-variate real Gaussian population.
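
A numerical illustration of λ 1 and of Theorem 6.13.1 may be set up as follows. The Python sketch below, with hypothetical dimensions and sample sizes and data simulated under H o1, computes S, Q and \(-2\ln \lambda _1\) and compares the latter with the chisquare critical value.

```python
import numpy as np
from scipy.stats import chi2

# MANOVA-type test of H_o1: mu_(1) = ... = mu_(k), given equal covariance matrices, via (6.13.2)
rng = np.random.default_rng(2)
p, sizes, alpha = 2, [25, 30, 35], 0.05
samples = [rng.multivariate_normal(np.zeros(p), np.eye(p), size=m) for m in sizes]

k, N = len(samples), sum(sizes)
xbars = [s.mean(axis=0) for s in samples]
S = sum((s - xb).T @ (s - xb) for s, xb in zip(samples, xbars))                  # within: S_1 + ... + S_k
mu_hat = sum(m * xb for m, xb in zip(sizes, xbars)) / N
Q = sum(m * np.outer(xb - mu_hat, xb - mu_hat) for m, xb in zip(sizes, xbars))   # between groups

log_lam1 = (N / 2) * (np.linalg.slogdet(S)[1] - np.linalg.slogdet(S + Q)[1])
stat, df = -2 * log_lam1, (k - 1) * p
print(stat, chi2.ppf(1 - alpha, df))   # reject H_o1 for large values of -2 ln(lambda_1)
```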

Under the marginal hypothesis H o2 :  Σ 1 = ⋯ = Σ k = Σ where Σ is unknown, the λ-criterion is denoted by λ 2, and its h-th moment, which is available from (6.12.6) of Sect. 6.12, is given by

$$\displaystyle \begin{aligned} E[\lambda_2^h|H_{o2}]=c^h\frac{\varGamma_p(\frac{N-k}{2})}{\varGamma_p(\frac{N-k}{2}+\frac{Nh}{2})} \Big\{\prod_{j=1}^k\frac{\varGamma_p(\frac{n_j-1}{2}+\frac{n_jh}{2})}{\varGamma_p(\frac{n_j-1}{2})}\Big\} \end{aligned} $$
(6.13.5)

for \(\Re (\frac {n_j-1}{2}+\frac {n_jh}{2})>\frac {p-1}{2}, ~j=1,\ldots ,k\). Hence the h-th null moment of the λ criterion for testing the hypothesis H o of equality of the k independent p-variate real Gaussian populations is the following:

$$\displaystyle \begin{aligned} E[\lambda^h|H_o]&=E[\lambda_1^h|H_{o1}]E[\lambda_2^h|H_{o2}]\\ &=\frac{\varGamma_p(\frac{N-k}{2}+\frac{k-1}{2})}{\varGamma_p(\frac{Nh}{2}+\frac{N-k}{2}+\frac{k-1}{2})} c^h\Big\{\prod_{j=1}^k\frac{\varGamma_p(\frac{n_j-1}{2} +\frac{n_jh}{2})}{\varGamma_p(\frac{n_j-1}{2})}\Big\}{}\end{aligned} $$
(6.13.6)

for \(\Re (\frac {n_j-1}{2}+\frac {n_jh}{2})>\frac {p-1}{2}, ~j=1,\ldots ,k, ~N=n_1+\cdots +n_k\), where c is the constant associated with the h-th moment of λ 2. Combining Theorems 6.13.1 and 6.12.1, the asymptotic distribution of \(-2\ln \lambda \), where λ has the moments given in (6.13.6), is a real scalar chisquare with \((k-1)p+(k-1)\frac {p(p+1)}{2}\) degrees of freedom. Thus, the following result:

Theorem 6.13.2

For the λ-criterion for testing the hypothesis of equality of k independent p-variate real Gaussian populations, \(-2\ln \lambda \to \chi ^2_{\nu }, ~\nu =(k-1)p+(k-1)\frac {p(p+1)}{2}\) as n j →∞, j = 1, …, k.

Note 6.13.1

Observe that for the conditional hypothesis H o1 in (6.13.2), the number of degrees of freedom of the asymptotic chisquare distribution of \(-2\ln \lambda _1\) is (k − 1)p, which is also the number of parameters restricted by the hypothesis H o1. For the hypothesis H o2, the corresponding number of degrees of freedom of the asymptotic chisquare distribution of \(-2\ln \lambda _2\) is \((k-1)\frac {p(p+1)}{2}\), which as well is the number of parameters restricted by the hypothesis H o2. For the asymptotic chisquare distribution of \(-2\ln \lambda \) associated with the hypothesis H o of equality of k independent p-variate Gaussian populations, the number of degrees of freedom is the sum of these two quantities, that is, \((k-1)p+(k-1)\frac {p(p+1)}{2}=p(k-1)\frac {(p+3)}{2}\), which also coincides with the number of parameters restricted under the hypothesis H o.

Exercises

6.1.

Derive the λ-criteria for the following tests in a real univariate Gaussian population N 1(μ, σ 2), assuming that a simple random sample of size n, namely x 1, …, x n, which are iid as N 1(μ, σ 2), is available: (1): μ = μ o (given), σ 2 is known; (2): μ = μ o, σ 2 unknown; (3): \(\sigma ^2=\sigma _o^2\) (given). You may also refer to Mathai and Haubold (2017).

In all the following problems, it is assumed that a simple random sample of size n is available. The alternative hypotheses are the natural alternatives.

6.2.

Repeat Exercise 6.1. for the corresponding complex Gaussian.

6.3.

Construct the λ-criteria in the complex case for the tests discussed in Sects. 6.2–6.4.

6.4.

In the real p-variate Gaussian case, consider the hypotheses (1): Σ is diagonal or the individual components are independently distributed; (2): The diagonal elements are equal, given that Σ is diagonal (which is a conditional test). Construct the λ-criterion in each case.

6.5.

Repeat Exercise 6.4. for the complex Gaussian case.

6.6.

Let the population be real p-variate Gaussian N p(μ, Σ), Σ = (σ ij) > O, μ′ = (μ 1, …, μ p). Consider the following tests and compute the λ-criteria: (1): σ 11 = ⋯ = σ pp = σ 2, σ ij = ν for all i and j, i≠j. That is, all the variances are equal and all the covariances are equal; (2): In addition to (1), μ 1 = μ 2 = ⋯ = μ or all the mean values are equal. Construct the λ-criterion in each case. The first one is known as the L vc criterion and the second one as the L mvc criterion. Repeat the same exercise for the complex case. Some distributional aspects are examined in Mathai (1970b) and numerical tables are available in Mathai and Katiyar (1979b).

6.7.

Let the population be real p-variate Gaussian N p(μ, Σ), Σ > O. Consider the hypothesis (1):

(2):

where a 1≠0, a 2≠0, b 1≠0, b 2≠0, a 1b 1, a 2b 2, \(\varSigma _{12}=\varSigma _{21}^{\prime }\) and all the elements in Σ 12 and Σ 21 are each equal to c≠0. Construct the λ-criterion in each case. (These are hypotheses on patterned matrices).

6.8.

Repeat Exercise 6.7. for the complex case.

6.9.

Consider k independent real p-variate Gaussian populations with different parameters, distributed as \(N_p(M_j,\varSigma _j),~\varSigma _j>O,~ M_j^{\prime }=(\mu _{1j},\ldots ,\mu _{pj}),~ j=1,\ldots ,k\). Construct the λ-criterion for testing the hypothesis Σ 1 = ⋯ = Σ k or the covariance matrices are equal. Assume that simple random samples of sizes n 1, …, n k are available from these k populations.

6.10.

Repeat Exercise 6.9. for the complex case.

6.11.

For the second part of Exercise 6.7., which is also known as Wilks’ L mvc criterion, show that if \(u=\lambda ^{\frac {2}{n}}\) where λ is the likelihood ratio criterion and n is the sample size, then

$$\displaystyle \begin{aligned}u=\frac{|S|}{[s+(p-1)s_1][s-s_1+\frac{n}{p-1}\sum_{j=1}^p(\bar{x}_j-\bar{x})^2]} \end{aligned}$$
(i)

where S = (s ij) is the sample sum of products matrix, \(s=\frac {1}{p}\sum _{i=1}^ps_{ii},\) \( s_1=\frac {1}{p(p-1)}\sum _{i\ne j=1}^ps_{ij}\), \(\bar {x}=\frac {1}{p}\sum _{i=1}^p\bar {x}_i,\) \(\bar {x}_i=\frac {1}{n}\sum _{k=1}^nx_{ik}\). For the statistic u in (i), show that the h-th null moment or the h-th moment when the null hypothesis is true, is given by the following:

$$\displaystyle \begin{aligned}E[u^h|H_o]=\prod_{j=0}^{p-2}\frac{\varGamma(\frac{n-1}{2}+h-\frac{j}{2})}{\varGamma(\frac{n-1}{2}-\frac{j}{2})}\frac{\varGamma(\frac{n+1}{2}+\frac{j}{p-1})}{\varGamma(\frac{n+1}{2}+h+\frac{j}{p-1})}. \end{aligned}$$
(ii)

Write down the conditions for the existence of the moment in (ii). [For the null and non-null distributions of Wilks’ L mvc criterion, see Mathai (1978).]

6.12.

Let the (p + q) × 1 vector X have a (p + q)-variate nonsingular real Gaussian distribution, X ∼ N p+q(μ, Σ), Σ > O. Let

$$\displaystyle \begin{aligned}\varSigma=\left[\begin{matrix}\varSigma_1&\varSigma_2\\ \varSigma_2^{\prime}&\varSigma_3\end{matrix}\right]\end{aligned}$$

where Σ 1 is p × p with all its diagonal elements equal to σ aa and all other elements equal to \(\sigma _{aa'}\), Σ 2 has all elements equal to σ ab, Σ 3 has all diagonal elements equal to σ bb and all other elements equal to \(\sigma _{bb'}\) where \(\sigma _{aa},~ \sigma _{aa'},~ \sigma _{bb},~ \sigma _{bb'}\) are unknown. Then, Σ is known as bipolar. Let λ be the likelihood ratio criterion for testing the hypothesis that Σ is bipolar. Then show that the h-th null moment is the following:

$$\displaystyle \begin{aligned} E[\lambda^h|H_o]&=[(p-1)^{p-1}(q-1)^{q-1}]\frac{\varGamma[\frac{(q-1)(n-1)}{2}]\varGamma[\frac{(p-1)(n-1)}{2}]}{\varGamma[(p-1)(h+\frac{n-1}{2})]\varGamma[(q-1)(h+\frac{n-1}{2})]}\\ &\ \ \ \ \times \prod_{j=0}^{p+q-3}\frac{\varGamma[h+\frac{n-3}{2}-\frac{j}{2}]}{\varGamma[\frac{n-3}{2}-\frac{j}{2}]}\end{aligned} $$

where n is the sample size. Write down the conditions for the existence of this h-th null moment.

6.13.

Let X be an m × n real matrix having the matrix-variate Gaussian density

$$\displaystyle \begin{aligned}f(X)=\frac{1}{|2\pi \varSigma|{}^{\frac{n}{2}}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}[\varSigma^{-1}(X-M)(X-M)']},~\varSigma>O. \end{aligned}$$

Letting S = XX′, S is a non-central Wishart matrix. Derive the density of S and show that this density, denoted by f s(S), is the following:

$$\displaystyle \begin{aligned} f_s(S)&=\frac{1}{\varGamma_m(\frac{n}{2})|2\varSigma|{}^{\frac{n}{2}}}|S|{}^{\frac{n}{2}-\frac{m+1}{2}}{\mathrm{e}}^{-{\mathrm{tr}}(\Omega)-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)}\\ &\ \ \ \ \times {{}_0F_1}(~;\frac{n}{2};\frac{1}{2}\varSigma^{-1}\Omega S)\end{aligned} $$

where \(\Omega =\frac {1}{2}MM'\varSigma ^{-1}\) is the non-centrality parameter and 0 F 1 is a Bessel function of matrix argument.

6.14.

Show that the h-th moment of the determinant of S, the non-central Wishart matrix specified in Exercise 6.13., is given by

$$\displaystyle \begin{aligned}E[|S|{}^h]=\frac{\varGamma_m(h+\frac{n}{2})|2\varSigma|{}^h}{\varGamma_m(\frac{n}{2})}{\mathrm{e}}^{-{\mathrm{tr}}(\Omega)}{{}_1F_1}(h+\frac{n}{2};\frac{n}{2};\Omega) \end{aligned}$$

where 1 F 1 is a hypergeometric function of matrix argument and Ω is the non-centrality parameter defined in Exercise 6.13..

6.15.

Letting \(v=\lambda ^{\frac {2}{n}}\) in Eq. (6.3.10), show that, under the null hypothesis H o, v is distributed as a real scalar type-1 beta with the parameters \((\frac {n-1}{2},~\frac {1}{2})\) and that \(\frac {n A'\bar {X}\bar {X}'A}{A'SA}\) is real scalar type-2 beta distributed with the parameters \((\frac {1}{2}, ~\frac {n-1}{2})\).

6.16.

Show that for an arbitrary h, the h-th null moment of the test statistic λ specified in Eq. (6.3.10) is

$$\displaystyle \begin{aligned}E[\lambda^h|H_o]=\frac{\varGamma(\frac{n-1}{2}+\frac{nh}{2})}{\varGamma(\frac{n-1}{2})}\frac{\varGamma(\frac{n}{2})}{\varGamma(\frac{n}{2}+\frac{nh}{2})},~ \Re(h)>-\frac{n-1}{n}.\end{aligned}$$