3.1. Introduction

Real scalar mathematical as well as random variables will be denoted by lower-case letters such as x, y, z, and vector/matrix variables, whether mathematical or random, will be denoted by capital letters such as X, Y, Z, in the real case. Complex variables will be denoted with a tilde: \(\tilde {x},\tilde {y}, \tilde {X},\tilde {Y}, \) for instance. Constant matrices will be denoted by A, B, C, and so on. A tilde will be placed above constant matrices only if one wishes to stress the point that the matrix is in the complex domain. Equations will be numbered chapter and section-wise. Local numbering will be done subsection-wise. The determinant of a square matrix A will be denoted by |A| or det(A) and, in the complex case, the absolute value of the determinant of A will be denoted as |det(A)|. Observe that in the complex domain, det(A) = a + ib where a and b are real scalar quantities, and then \(|{\mathrm{det}}(A)|^2=a^2+b^2\).

Multivariate usually refers to a collection of scalar variables. Vector/matrix variable situations are also of the multivariate type but, in addition, the positions of the variables must also be taken into account. In a function involving a matrix, one cannot permute its elements since each permutation will produce a different matrix. For example,

are all multivariate cases but the elements or the individual variables must remain at the set positions in the matrices.

The definiteness of matrices will be needed in our discussion. Definiteness is defined and discussed only for symmetric matrices in the real domain and Hermitian matrices in the complex domain. Let A = A′ be a real p × p matrix and Y be a p × 1 real vector, Y′ denoting its transpose. Consider the quadratic form Y′AY, A = A′, for all possible Y excluding the null vector, that is, Y ≠ O. We say that the real quadratic form Y′AY as well as the real matrix A = A′ are positive definite, which is denoted A > O, if Y′AY > 0 for all possible non-null Y. Letting A = A′ be a real p × p matrix, if for all real p × 1 vectors Y ≠ O,

$$\displaystyle \begin{aligned} Y^{\prime}AY>0,&\ A>O\ (\mbox{positive definite})\\ Y^{\prime}AY\ge 0,& \ A\ge O \ (\mbox{positive semi-definite}){}\\ Y^{\prime}AY<0,&\ A<O \ (\mbox{negative definite})\\ Y^{\prime}AY\le 0,&\ A\le O \ (\mbox{negative semi-definite}). \end{aligned} $$
(3.1.1)
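As a quick numerical aside (not part of the text's development), the definiteness of a given real symmetric matrix can be checked by inspecting the signs of its eigenvalues. The following minimal sketch uses the numpy library and an arbitrary illustrative matrix.

```python
import numpy as np

# Arbitrary real symmetric matrix chosen for illustration
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# For a real symmetric A, A > O if and only if all eigenvalues are positive
print(np.linalg.eigvalsh(A))        # both eigenvalues positive, so A > O

# Equivalently, Y'AY > 0 for every non-null Y; spot-check a few random Y
rng = np.random.default_rng(0)
for _ in range(3):
    Y = rng.standard_normal(2)
    print(Y @ A @ Y > 0)            # True each time
```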

All the matrices that do not belong to any one of the above categories are said to be indefinite matrices, in which case A will have both positive and negative eigenvalues. For example, for some Y, Y′AY may be positive and for some other values of Y, Y′AY may be negative. The definiteness of Hermitian matrices can be defined in a similar manner. A square matrix A in the complex domain is called Hermitian if A = A* where A* denotes the conjugate transpose of A. Either the conjugates of all the elements of A are taken and the matrix is then transposed, or the matrix A is first transposed and the conjugate of each of its elements is then taken. If \(\tilde {z}=a+ib,\ i=\sqrt {(-1)} \) and a, b are real scalars, then the conjugate of \(\tilde {z}\), the conjugate being denoted by a bar, is \(\bar {\tilde {z}}=a-ib\), that is, i is replaced by − i. For instance, since

B = B*, and thus the matrix B is Hermitian. In general, if \(\tilde {X}\) is a p × p matrix, then \(\tilde {X}\) can be written as \(\tilde {X}=X_1+iX_2\) where X 1 and X 2 are real matrices and \(i=\sqrt {(-1)}\). And if \(\tilde {X}=\tilde{X}^{*}\), then \(X_1+iX_2=X_1^{\prime }-iX_2^{\prime }\), that is, X 1 is symmetric and X 2 is skew symmetric, so that all the diagonal elements of a Hermitian matrix are real. The definiteness of a Hermitian matrix can be defined parallel to that in the real case. In the complex domain, definiteness is defined only for Hermitian matrices. Let A = A* be a Hermitian matrix, let Y ≠ O be a p × 1 non-null vector and let Y* be its conjugate transpose. Then, consider the Hermitian form Y*AY, A = A*. If Y*AY > 0 for all possible non-null Y, the Hermitian form Y*AY, A = A*, as well as the Hermitian matrix A are said to be positive definite, which is denoted A > O. Letting A = A*, if for all non-null Y,

$$\displaystyle \begin{aligned} Y^{*}AY>0,\ & A>O\ (\mbox{Hermitian positive definite})\\ Y^{*}AY\ge 0,\ & A\ge O \ (\mbox{Hermitian positive semi-definite})\\ Y^{*}AY<0,\ & A<O \ (\mbox{Hermitian negative definite}){}\\ Y^{*}AY\le 0,\ & A\le O\ (\mbox{Hermitian negative semi-definite}), \end{aligned} $$
(3.1.2)

and when none of the above cases applies, we have indefinite matrices or indefinite Hermitian forms.

We will also make use of properties of the square root of matrices. If we were to define a square root of A as a matrix B such that \(B^2=A\), there would then be several candidates for B. Since a multiplication of A with A is involved, A has to be a square matrix. Consider the following matrices

whose squares are all equal to \(I_2\). Thus, there are clearly several candidates for the square root of this identity matrix. However, if we restrict ourselves to the class of positive definite matrices in the real domain and Hermitian positive definite matrices in the complex domain, then we can define a unique square root, denoted by \(A^{\frac {1}{2}}>O.\)
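For a symmetric positive definite A, this unique square root can be computed from the spectral decomposition A = PΛP′ as \(A^{\frac{1}{2}}=P\varLambda^{\frac{1}{2}}P^{\prime}\). A minimal numerical sketch, reusing the arbitrary illustrative matrix from the previous snippet, follows.

```python
import numpy as np

# Positive definite matrix from the previous sketch
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Spectral decomposition A = P diag(lam) P'
lam, P = np.linalg.eigh(A)
A_half = P @ np.diag(np.sqrt(lam)) @ P.T   # the unique symmetric PD square root A^(1/2)

print(np.allclose(A_half @ A_half, A))            # True: (A^(1/2))^2 = A
print(np.all(np.linalg.eigvalsh(A_half) > 0))     # True: A^(1/2) > O
```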

For the various Jacobians used in this chapter, the reader may refer to Chap. 1, further details being available from Mathai (1997).

3.1a. The Multivariate Gaussian Density in the Complex Domain

Consider the complex scalar random variables \(\tilde {x}_1,\ldots , \tilde {x}_p\). Let \(\tilde {x}_j=x_{j1}+ix_{j2}\) where x j1, x j2 are real and \(i=\sqrt {(-1)}\). Let E[x j1] = μ j1, E[x j2] = μ j2 and \(\ E[\tilde {x}_j]=\mu _{j1}+i\mu _{j2}\equiv \tilde {\mu }_j\). Let the variances be as follows: \({\mathrm {Var}}(x_{j1})=\sigma _{j1}^2, {\mathrm {Var}}(x_{j2})=\sigma _{j2}^2\). For a complex variable, the variance is defined as follows:

$$\displaystyle \begin{aligned} {\mathrm{Var}}(\tilde{x}_j)&=E[\tilde{x}_j-E(\tilde{x}_j)][\tilde{x}_j-E(\tilde{x}_j)]^{*}\\ &=E[(x_{j1}-\mu_{j1})+i(x_{j2}-\mu_{j2})][(x_{j1}-\mu_{j1})-i(x_{j2}-\mu_{j2})]\\ &=E[(x_{j1}-\mu_{j1})^2+(x_{j2}-\mu_{j2})^2]={\mathrm{Var}}(x_{j1})+{\mathrm{Var}}(x_{j2})=\sigma_{j1}^2+\sigma_{j2}^2\\ &\equiv\sigma_j^2\, .\end{aligned} $$

A covariance matrix associated with the p × 1 vector \(\tilde {X}=(\tilde {x}_1,\ldots , \tilde {x}_p)^{\prime }\) in the complex domain is defined as \({\mathrm {Cov}}(\tilde {X})=E[\tilde {X}-E(\tilde {X})][\tilde {X}-E(\tilde {X})]^{*}\equiv \varSigma \) with \(E(\tilde {X})\equiv \tilde {\mu }=(\tilde {\mu }_1,\ldots , \tilde {\mu }_p)^{\prime }\). Then we have

where the covariance between \(\tilde {x}_r\) and \(\tilde {x}_s\), two distinct elements in \(\tilde {X}\), requires explanation. Let \(\tilde {x}_r=x_{r1}+ix_{r2}\) and \(\tilde {x}_s=x_{s1}+ix_{s2}\) where x r1, x r2, x s1, x s2 are all real. Then, the covariance between \(\tilde {x}_r\) and \(\tilde {x}_s\) is

$$\displaystyle \begin{aligned} {\mathrm{Cov}}(\tilde{x}_r,\tilde{x}_s)&=E[\tilde{x}_r-E(\tilde{x}_r)][\tilde{x}_s-E(\tilde{x}_s)]^{*}={\mathrm{Cov}}[(x_{r1}+ix_{r2}),(x_{s1}-ix_{s2})]\\ &={\mathrm{Cov}}(x_{r1},x_{s1})+{\mathrm{Cov}}(x_{r2},x_{s2})+i[{\mathrm{Cov}}(x_{r2},x_{s1})-{\mathrm{Cov}}(x_{r1},x_{s2})]\equiv\sigma_{rs}.\end{aligned} $$

Note that none of the individual covariances on the right-hand side need be equal to each other. Hence, σ rs need not be equal to σ sr. In terms of vectors, we have the following: Let \(\tilde {X}=X_1+iX_2\) where X 1 and X 2 are real vectors. The covariance matrix associated with \(\tilde {X}\), which is denoted by \({\mathrm {Cov}}(\tilde {X})\), is

$$\displaystyle \begin{aligned} {\mathrm{Cov}}(\tilde{X})&=E([\tilde{X}-E(\tilde{X})][\tilde{X}-E(\tilde{X})]^{*})\\ &=E([(X_1-E(X_1))+i(X_2-E(X_2))][(X_1^{\prime}-E(X_1^{\prime}))-i(X_2^{\prime}-E(X_2^{\prime}))])\\ &={\mathrm{Cov}}(X_1,X_1)+{\mathrm{Cov}}(X_2,X_2)+i[{\mathrm{Cov}}(X_2,X_1)-{\mathrm{Cov}}(X_1,X_2)]\\ & \equiv \varSigma_{11}+\varSigma_{22}+i[\varSigma_{21}-\varSigma_{12}]\end{aligned} $$

where Σ 12 need not be equal to Σ 21. Hence, in general, Cov(X 1, X 2) need not be equal to Cov(X 2, X 1). We will denote the whole configuration as \({\mathrm {Cov}}(\tilde {X})=\varSigma \) and assume it to be Hermitian positive definite. We will define the p-variate Gaussian density in the complex domain as the following real-valued function:

$$\displaystyle \begin{aligned} f(\tilde{X})=\frac{1}{\pi^p|{\mathrm{det}}(\varSigma)|}{\mathrm{e}}^{-(\tilde{X}-\tilde{\mu})^{*}\varSigma^{-1}(\tilde{X}-\tilde{\mu})} {} \end{aligned} $$
(3.1a.1)

where |det(Σ)| denotes the absolute value of the determinant of Σ. Let us verify that the normalizing constant is indeed \(\frac {1}{\pi ^p|{\mathrm {det}}(\varSigma )|}\). Consider the transformation \(\tilde {Y}=\varSigma ^{-\frac {1}{2}}(\tilde {X}-\tilde {\mu })\) which gives \({\mathrm {d}}\tilde {X}=[{\mathrm {det}}(\varSigma \varSigma ^{*})]^{\frac {1}{2}}{\mathrm {d}}\tilde {Y}=|{\mathrm {det}}(\varSigma )|{\mathrm {d}}\tilde {Y}\) in light of (1.6a.1). Then |det(Σ)| is canceled and the exponent becomes \(-\tilde {Y}^{*}\tilde {Y}=-[|\tilde {y}_1|{ }^2+\cdots +|\tilde {y}_p|{ }^2]\). But

$$\displaystyle \begin{aligned}\int_{\tilde{y}_j}{\mathrm{e}}^{-|\tilde{y}_j|{}^2}{\mathrm{d}}\tilde{y}_j =\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}{\mathrm{e}}^{-(y_{j1}^2+y_{j2}^2)}{\mathrm{d}}y_{j1}\wedge{\mathrm{d}}y_{j2}=\pi, \ \tilde{y}_j=y_{j1}+iy_{j2}, \end{aligned} $$
(i)

which establishes the normalizing constant. Let us examine the mean value and the covariance matrix of \(\tilde {X}\) in the complex case. Let us utilize the same transformation, \(\varSigma ^{-\frac {1}{2}}(\tilde {X}-\tilde {\mu })\). Accordingly,

$$\displaystyle \begin{aligned}E[\tilde{X}]=\tilde{\mu}+E[(\tilde{X}-\tilde{\mu})]=\tilde{\mu}+\varSigma^{\frac{1}{2}}E[\tilde{Y}]. \end{aligned}$$

However,

$$\displaystyle \begin{aligned}E[\tilde{Y}]=\frac{1}{\pi^p}\int_{\tilde{Y}}\tilde{Y}{\mathrm{e}}^{-\tilde{Y}^{*}\tilde{Y}}{\mathrm{d}}\tilde{Y}, \end{aligned}$$

and the integrand has each element in \(\tilde {Y}\) producing an odd function whose integral converges, so that the integral over \(\tilde {Y}\) is null. Thus, \(E[\tilde {X}]=\tilde {\mu }\), the first parameter appearing in the exponent of the density (3.1a.1). Now the covariance matrix in \(\tilde {X}\) is the following:

$$\displaystyle \begin{aligned}{\mathrm{Cov}}(\tilde{X})=E([\tilde{X}-E(\tilde{X})][\tilde{X}-E(\tilde{X})]^{*})=\varSigma^{\frac{1}{2}}E[\tilde{Y}\tilde{Y}^{*}]\varSigma^{\frac{1}{2}}. \end{aligned}$$

We consider the integrand in \(E[\tilde {Y}\tilde {Y}^{*}]\) and follow steps parallel to those used in the real case. It is a p × p matrix where the non-diagonal elements are odd functions whose integrals converge and hence each of these elements will integrate out to zero. The first diagonal element in \(\tilde {Y}\tilde {Y}^{*}\) is \(|\tilde {y}_1|{ }^2\). Its associated integral is

$$\displaystyle \begin{aligned} &\int\ldots\int |\tilde{y}_1|{}^2{\mathrm{e}}^{-(|\tilde{y}_1|{}^2+\cdots+|\tilde{y}_p|{}^2)}{\mathrm{d}}\tilde{y}_1\wedge\ldots \wedge{\mathrm{d}}\tilde{y}_p\\ &=\Big\{\prod_{j=2}^p\int_{\tilde{y}_j}{\mathrm{e}}^{-|\tilde{y}_j|{}^2}{\mathrm{d}}\tilde{y}_j\Big\}\int_{\tilde{y}_1}|\tilde{y}_1|{}^2{\mathrm{e}}^{-|\tilde{y}_1|{}^2}{\mathrm{d}}\tilde{y}_1.\end{aligned} $$

From (i),

$$\displaystyle \begin{aligned}\int_{\tilde{y}_1}|\tilde{y}_1|{}^2{\mathrm{e}}^{-|\tilde{y}_1|{}^2}{\mathrm{d}}\tilde{y}_1=\pi;\ \prod_{j=2}^p\int_{\tilde{y}_j}{\mathrm{e}}^{-|\tilde{y}_j|{}^2}{\mathrm{d}}\tilde{y}_j=\pi^{p-1}, \end{aligned}$$

where \(|\tilde {y}_1|{ }^2=y_{11}^2+y_{12}^2,\ \tilde {y}_1=y_{11}+iy_{12},\ i=\sqrt {(-1)}, \) and y 11, y 12 real. Let \(y_{11}=r\cos \theta , \ y_{12}=r\sin \theta \Rightarrow {\mathrm {d}}y_{11}\wedge {\mathrm {d}}y_{12}=r\,{\mathrm {d}}r\wedge {\mathrm {d}}\theta \) and

$$\displaystyle \begin{aligned} \int_{\tilde{y}_1}|\tilde{y}_1|{}^2{\mathrm{e}}^{-|\tilde{y}_1|{}^2}{\mathrm{d}}\tilde{y}_1&=\Big(\int_{r=0}^{\infty}r(r^2){\mathrm{e}}^{-r^2}{\mathrm{d}}r\Big)\Big(\int_{\theta=0}^{2\pi}{\mathrm{d}}\theta\Big),\ ({\mathrm{letting}}\ u=r^2)\\ &=(2\pi)\Big(\frac{1}{2}\int_0^{\infty}u{\mathrm{e}}^{-u}{\mathrm{d}}u\Big)=(2\pi)\Big(\frac{1}{2}\Big)=\pi. \end{aligned} $$

Thus the first diagonal element in \(\tilde {Y}\tilde {Y}^{*}\) integrates out to π p and, similarly, each diagonal element will integrate out to π p, which is canceled by the term π p present in the normalizing constant. Hence the integral over \(\tilde {Y}\tilde {Y}^{*}\) gives an identity matrix and the covariance matrix of \(\tilde {X}\) is Σ, the other parameter appearing in the density (3.1a.1). Hence the two parameters therein are the mean value vector and the covariance matrix of \(\tilde {X}\).
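These computations can be corroborated by simulation: generate \(\tilde{Y}\) with independent components having density \(\pi^{-1}{\mathrm{e}}^{-|\tilde{y}_j|^2}\) (real and imaginary parts independent N(0, 1/2)), set \(\tilde{X}=\tilde{\mu}+\varSigma^{\frac{1}{2}}\tilde{Y}\), and compare the sample mean and sample covariance with \(\tilde{\mu}\) and Σ. The Σ and \(\tilde{\mu}\) below are arbitrary illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary Hermitian positive definite Sigma and mean vector (illustrative only)
Sigma = np.array([[2.0, 1 - 1j],
                  [1 + 1j, 3.0]])
mu = np.array([1 + 2j, 2 - 1j])

# Hermitian positive definite square root Sigma^(1/2)
lam, P = np.linalg.eigh(Sigma)
Sigma_half = P @ np.diag(np.sqrt(lam)) @ P.conj().T

# Each component of Y has density (1/pi) exp(-|y|^2): real and imaginary
# parts are independent N(0, 1/2), so E[Y] = 0 and E[Y Y*] = I
n = 200_000
Y = rng.normal(0.0, np.sqrt(0.5), (n, 2)) + 1j * rng.normal(0.0, np.sqrt(0.5), (n, 2))

X = mu + Y @ Sigma_half.T             # row-wise version of X = mu + Sigma^(1/2) Y

print(np.round(X.mean(axis=0), 2))    # approximately mu
D = X - mu
print(np.round(D.T @ D.conj() / n, 2))   # approximately Sigma = Cov(X)
```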

Example 3.1a.1

Consider the matrix Σ and the vector \(\tilde {X}\) with expected value \(E[\tilde {X}]=\tilde {\mu }\) as follows:

Show that Σ is Hermitian positive definite so that it can be a covariance matrix of \(\tilde {X}\), that is, \({\mathrm {Cov}}(\tilde {X})=\varSigma \). If \(\tilde {X}\) has a bivariate Gaussian distribution in the complex domain; \(\tilde {X}\sim \tilde {N}_2(\tilde {\mu },\varSigma ),\ \varSigma >O\), then write down (1) the exponent in the density explicitly; (2) the density explicitly.

Solution 3.1a.1

The transpose and conjugate transpose of Σ are

and hence Σ is Hermitian. The eigenvalues of Σ are available from the equation

$$\displaystyle \begin{aligned} (2-\lambda)(3-\lambda)-(1-i)(1+i)=0&\Rightarrow \lambda^2-5\lambda+4=0\\ &\Rightarrow (\lambda-4)(\lambda-1)=0\mbox{ or }\lambda_1=4,\ \lambda_2=1.\end{aligned} $$

Thus, the eigenvalues are positive [the eigenvalues of a Hermitian matrix will always be real]. This property of eigenvalues being positive, combined with the property that Σ is Hermitian proves that Σ is Hermitian positive definite. This can also be established from the leading minors of Σ. The leading minors are det((2)) = 2 > 0 and det(Σ) = (2)(3) − (1 − i)(1 + i) = 4 > 0. Since Σ is Hermitian and its leading minors are all positive, Σ is positive definite. Let us evaluate the inverse by making use of the formula \(\varSigma ^{-1}=\frac {1}{{\mathrm {det}}(\varSigma )}({\mathrm {Cof}}(\varSigma ))^{\prime }\) where Cof(Σ) represents the matrix of cofactors of the elements in Σ. [These formulae hold whether the elements in the matrix are real or complex]. That is,

(ii)

The exponent in a bivariate complex Gaussian density being \(-(\tilde {X}-\tilde {\mu })^{*}\,\varSigma ^{-1}(\tilde {X}-\tilde {\mu })\), we have

$$\displaystyle \begin{aligned} -(\tilde{X}-\tilde{\mu})^{*}\,\varSigma^{-1}(\tilde{X}-\tilde{\mu})&=-\frac{1}{4}\{3\,[\tilde{x}_1-(1+2i)]^{*}[\tilde{x}_1-(1+2i)]\\ &\ \ \ \ -(1+i)\,[\tilde{x}_1-(1+2i)]^{*}[\tilde{x}_2-(2-i)]\\ &\ \ \ \ -(1-i)\,[\tilde{x}_2-(2-i)]^{*}[\tilde{x}_1-(1+2i)]\\ &\ \ \ \ +2\,[\tilde{x}_2-(2-i)]^{*}[\tilde{x}_2-(2-i)]\}.{} \end{aligned} $$
(iii)

Thus, the density of the \(\tilde {N}_2(\tilde {\mu },\varSigma )\) vector whose components can assume any complex value is

$$\displaystyle \begin{aligned} f(\tilde{X})=\frac{{\mathrm{e}}^{-(\tilde{X}-\tilde{\mu})^{*}\,\varSigma^{-1}(\tilde{X}-\tilde{\mu})}}{4\,\pi^2}{} \end{aligned} $$
(3.1a.2)

where Σ −1 is given in (ii) and the exponent, in (iii).
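The quantities obtained in this solution are easily cross-checked numerically. The entries of Σ and \(\tilde{\mu}\) used below are those that can be read off from the solution (diagonal entries 2 and 3, off-diagonal entries 1 ± i, mean values 1 + 2i and 2 − i), so they should be read as a reconstruction rather than a restatement of the omitted display.

```python
import numpy as np

# Sigma and mu as reconstructed from the solution above
Sigma = np.array([[2.0, 1 + 1j],
                  [1 - 1j, 3.0]])
mu = np.array([1 + 2j, 2 - 1j])

print(np.allclose(Sigma, Sigma.conj().T))        # True: Sigma is Hermitian
print(np.linalg.eigvalsh(Sigma))                 # eigenvalues 1 and 4, both positive
print(np.linalg.det(Sigma).real)                 # det(Sigma) = 4
print(np.round(4 * np.linalg.inv(Sigma), 6))     # [[3, -(1+i)], [-(1-i), 2]], as in (ii)
print(np.pi**2 * abs(np.linalg.det(Sigma)))      # normalizing denominator 4*pi^2 of (3.1a.2)
```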

Exercises 3.1

3.1.1

Construct a 2 × 2 Hermitian positive definite matrix A and write down a Hermitian form with this A as its matrix.

3.1.2

Construct a 2 × 2 Hermitian matrix B whose determinant is 4, whose trace is 5 and whose first row is (2, 1 + i). Then write down explicitly the Hermitian form X*BX.

3.1.3

Is B in Exercise 3.1.2 positive definite? Is the Hermitian form X*BX positive definite? Establish the results.

3.1.4

Construct two 2 × 2 Hermitian matrices A and B such that AB = O (null), if that is possible.

3.1.5

Specify the eigenvalues of the matrix B in Exercise 3.1.2, obtain a unitary matrix Q, QQ* = I, Q*Q = I, such that Q*BQ is diagonal, and write down the canonical form of the Hermitian form \(X^{*}BX=\lambda_1|y_1|^2+\lambda_2|y_2|^2\).

3.2. The Multivariate Normal or Gaussian Distribution, Real Case

We may define a real p-variate Gaussian density via the following characterization: Let x 1, …, x p be real scalar variables and X be a p × 1 vector with x 1, …, x p as its elements, that is, X′ = (x 1, …, x p). Let L′ = (a 1, …, a p) where a 1, …, a p are arbitrary real scalar constants. Consider the linear function u = L′X = X′L = a 1 x 1 + ⋯ + a p x p. If, for all possible L, u = L′X has a real univariate Gaussian distribution, then the vector X is said to have a multivariate Gaussian distribution. For any linear function u = L′X, E[u] = L′E[X] = L′μ, μ′ = (μ 1, …, μ p), μ j = E[x j], j = 1, …, p, and Var(u) = L′ΣL, Σ = Cov(X) = E[X − E(X)][X − E(X)]′ in the real case. If u is univariate normal, then its mgf, with parameter t, is the following:

$$\displaystyle \begin{aligned}M_u(t)=E[{\mathrm{e}}^{tu}]={\mathrm{e}}^{tE(u)+\frac{t^2}{2}{\mathrm{Var}}(u)}={\mathrm{e}}^{tL^{\prime}\mu+\frac{t^2}{2}L^{\prime}\varSigma L}. \end{aligned}$$

Note that \(tL^{\prime }\mu +\frac {t^2}{2}L^{\prime }\varSigma L=(tL)^{\prime }\mu +\frac {1}{2}(tL)^{\prime }\varSigma (tL)\) where there are p parameters a 1, …, a p when the a j’s are arbitrary. As well, tL contains only p parameters as, for example, ta j is a single parameter when both t and a j are arbitrary. Then,

$$\displaystyle \begin{aligned} M_u(t)=M_X(tL)={\mathrm{e}}^{(tL)^{\prime}\mu+\frac{1}{2}(tL)^{\prime}\varSigma (tL)}={\mathrm{e}}^{T^{\prime}\mu+\frac{1}{2}T^{\prime}\varSigma T}=M_X(T),\ T=tL.{} \end{aligned} $$
(3.2.1)

Thus, when L is arbitrary, the mgf of u qualifies to be the mgf of a p-vector X. The density corresponding to (3.2.1) is the following, when Σ > O:

$$\displaystyle \begin{aligned}f(X)=c~{\mathrm{e}}^{-\frac{1}{2}(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)},\ -\infty<x_j<\infty,\ -\infty<\mu_j<\infty,\ \varSigma>O \end{aligned}$$

for j = 1, …, p. We can evaluate the normalizing constant c when f(X) is a density, in which case the total integral is unity. That is,

$$\displaystyle \begin{aligned}1=\int_Xf(X){\mathrm{d}}X=\int_Xc~{\mathrm{e}}^{-\frac{1}{2}(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)}{\mathrm{d}}X. \end{aligned}$$

Let \(\varSigma ^{-\frac {1}{2}}(X-\mu )=Y\Rightarrow {\mathrm {d}}Y=|\varSigma |{ }^{-\frac {1}{2}}{\mathrm {d}}(X-\mu )=|\varSigma |{ }^{-\frac {1}{2}}{\mathrm {d} }X \) since μ is a constant. The Jacobian of the transformation may be obtained from Theorem 1.6.1. Now,

$$\displaystyle \begin{aligned}1=c|\varSigma|{}^{\frac{1}{2}}\int_Y{\mathrm{e}}^{-\frac{1}{2}Y^{\prime}Y}{\mathrm{d}}Y. \end{aligned}$$

But \(Y^{\prime }Y=y_1^2+\cdots +y_p^2\,\) where y 1, …, y p are the real elements in Y  and \(\int _{-\infty }^{\infty }{\mathrm {e}}^{-\frac {1}{2}y_j^2}{\mathrm {d}}y_j=\sqrt {2\pi }\). Hence \(\int _Y{\mathrm {e}}^{-\frac {1}{2}Y^{\prime }Y}{\mathrm {d}}Y=(\sqrt {2\pi })^p\). Then \(c=[|\varSigma |{ }^{\frac {1}{2}}(2\pi )^{\frac {p}{2}}]^{-1}\) and the p-variate real Gaussian or normal density is given by

$$\displaystyle \begin{aligned} f(X)=\frac{1}{|\varSigma|{}^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}}{\mathrm{e}}^{-\frac{1}{2}(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)}{} \end{aligned} $$
(3.2.2)

for Σ > O, −∞ < x j < ∞, −∞ < μ j < ∞, j = 1, …, p. The density (3.2.2) is called the nonsingular normal density in the real case—nonsingular in the sense that Σ is nonsingular. In fact, Σ is also real positive definite in the nonsingular case. When Σ is singular, we have a singular normal distribution which does not have a density function. However, in the singular case, all the properties can be studied with the help of the associated mgf which is of the form in (3.2.1), as the mgf exists whether Σ is nonsingular or singular.

We will use the standard notation X ∼ N p(μ, Σ) to denote a p-variate real normal or Gaussian distribution with mean value vector μ and covariance matrix Σ. If it is nonsingular real Gaussian, we write Σ > O; if it is singular normal, then we specify |Σ| = 0. If we wish to combine the singular and nonsingular cases, we write X ∼ N p(μ, Σ), Σ ≥ O.
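For readers who wish to experiment, the density (3.2.2) can be evaluated directly from the formula and compared with scipy's multivariate normal density as an independent check of the normalizing constant. The μ, Σ and evaluation point below are arbitrary illustrative values.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary illustrative parameters (p = 2) and evaluation point
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, -1.0])

# Density (3.2.2) evaluated directly from the formula
p = len(mu)
d = x - mu
Q = d @ np.linalg.inv(Sigma) @ d
f = np.exp(-0.5 * Q) / (np.sqrt(np.linalg.det(Sigma)) * (2 * np.pi) ** (p / 2))

print(f)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # same value from scipy
```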

What are the mean value vector and the covariance matrix of a real p-variate Gaussian vector X?

$$\displaystyle \begin{aligned} E[X]&=E[X-\mu]+E[\mu]=\mu+\int_X(X-\mu)f(X){\mathrm{d}}X\\ &=\mu+\frac{1}{|\varSigma|{}^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}}\int_X(X-\mu)\,{\mathrm{e}}^{-\frac{1}{2}(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)}{\mathrm{d}}X\\ &=\mu+\frac{\varSigma^{ \frac{1}{2}}}{(2\pi)^{\frac{p}{2}}}\int_YY\,{\mathrm{e}}^{-\frac{1}{2}Y^{\prime}Y}{\mathrm{d}}Y,\ Y=\varSigma^{-\frac{1}{2}}(X-\mu).\end{aligned} $$

The expected value of a matrix is the matrix of the expected values of its elements. The expected value of the component y j of Y′ = (y 1, …, y p) is

$$\displaystyle \begin{aligned}E[y_j]=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}y_j{\mathrm{e}}^{-\frac{1}{2}y_j^2}{\mathrm{d}}y_j\Big\{\prod_{i\ne j=1}^p\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}{\mathrm{e}}^{-\frac{1}{2}y_i^2}{\mathrm{d}}y_i\Big\}. \end{aligned}$$

The product is equal to 1 and, the first integrand being an odd function of y j whose integral is convergent, the first factor is equal to 0. Thus, E[Y ] = O (a null vector) and E[X] = μ, the first parameter appearing in the exponent of the density. Now, consider the covariance matrix of X. For a real vector X,

$$\displaystyle \begin{aligned} {\mathrm{Cov}}(X)&=E[X-E(X)][X-E(X)]^{\prime}=E[(X-\mu)(X-\mu)^{\prime}]\\ &=\frac{1}{|\varSigma|{}^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}}\int_X(X-\mu)(X-\mu)^{\prime}{\mathrm{e}}^{-\frac{1}{2}(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)}{\mathrm{d}}X\\ &=\frac{1}{(2\pi)^{\frac{p}{2}}}\varSigma^{\frac{1}{2}}\Big[\int_YYY^{\prime}{\mathrm{e}}^{-\frac{1}{2}Y^{\prime}Y}{\mathrm{d}}Y\Big]\varSigma^{\frac{1}{2}},\ Y=\varSigma^{-\frac{1}{2}}(X-\mu).\end{aligned} $$

But

The non-diagonal elements are linear in each of the variables y i and y j, i ≠ j, and hence the integrals over the non-diagonal elements will be equal to zero due to a property of convergent integrals over odd functions. Hence we only need to consider the diagonal elements. When considering y 1, the integrals over y 2, …, y p will give the following:

$$\displaystyle \begin{aligned}\int_{-\infty}^{\infty}{\mathrm{e}}^{-\frac{1}{2}y_j^2}{\mathrm{d}}y_j=\sqrt{2\pi},\ j=2,\ldots,p\ \Rightarrow (2\pi)^{\frac{p-1}{2}}\end{aligned}$$

and hence we are left with

$$\displaystyle \begin{aligned}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}y_1^2{\mathrm{e}}^{-\frac{1}{2}y_1^2}{\mathrm{d}}y_1=\frac{2}{\sqrt{2\pi}}\int_0^{\infty}y_1^2{\mathrm{e}}^{-\frac{1}{2}y_1^2}{\mathrm{d}}y_1 \end{aligned}$$

due to evenness of the integrand, the integral being convergent. Let \(u=y_1^2\) so that \( y_1=u^{\frac {1}{2}}\) since y 1 > 0. Then \({\mathrm {d}}y_1=\frac {1}{2}u^{\frac {1}{2}-1}{\mathrm {d}}u\). The integral is available as \(\varGamma (\frac {3}{2})2^{\frac {3}{2}}=\frac {1}{2}\varGamma (\frac {1}{2})2^{\frac {3}{2}}=\sqrt {2\pi }\) since \(\varGamma (\frac {1}{2})=\sqrt {\pi }\), and the constant is canceled leaving 1. This shows that each diagonal element integrates out to 1 and hence the integral over YY is the identity matrix after absorbing \((2\pi )^{-\frac {p}{2}}\). Thus \({\mathrm {Cov}}(X)=\varSigma ^{\frac {1}{2}}\varSigma ^{\frac {1}{2}}=\varSigma \) the inverse of which is the other parameter appearing in the exponent of the density. Hence the two parameters are

$$\displaystyle \begin{aligned} \mu=&E[X]\ {\mathrm{and}}\ \varSigma={\mathrm{Cov}}(X).{} \end{aligned} $$
(3.2.3)
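A quick Monte Carlo check of (3.2.3): draw a large sample from N p(μ, Σ) and compare the sample mean and sample covariance with μ and Σ. The parameter values below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative parameters
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])      # symmetric positive definite

X = rng.multivariate_normal(mu, Sigma, size=500_000)

print(np.round(X.mean(axis=0), 2))            # approximately mu = E[X]
print(np.round(np.cov(X, rowvar=False), 2))   # approximately Sigma = Cov(X)
```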

The bivariate case

When p = 2, we obtain the bivariate real normal density from (3.2.2), which is denoted by f(x 1, x 2). Note that when p = 2,

where \(\sigma _1^2={\mathrm {Var}}(x_1)=\sigma _{11},\ \sigma _2^2={\mathrm {Var}}(x_2)=\sigma _{22},\ \sigma _{12}={\mathrm {Cov}}(x_1,x_2)=\sigma _1\sigma _2\rho \) where ρ is the correlation between x 1 and x 2, and ρ, in general, is defined as

$$\displaystyle \begin{aligned}\rho=\frac{{\mathrm{Cov}}(x_1,x_2)}{\sqrt{{\mathrm{Var}}(x_1){\mathrm{Var}}(x_2)}}=\frac{\sigma_{12}}{\sigma_1\sigma_2},\ \sigma_1\ne 0,\ \sigma_2\ne 0, \end{aligned}$$

which means that ρ is defined only for non-degenerate random variables or, equivalently, that the probability mass of either variable should not lie at a single point. This ρ is a scale-free covariance, the covariance measuring the joint variation in (x 1, x 2) corresponding to the square of scatter, Var(x), in a real scalar random variable x. The covariance, in general, depends upon the units of measurement of x 1 and x 2, whereas ρ is a scale-free pure coefficient. For − 1 < ρ < 1, ρ does not measure a general relationship between x 1 and x 2; only for ρ = ±1 does it indicate an exact linear relationship. Oftentimes, ρ is misinterpreted as measuring any relationship between x 1 and x 2, which is not the case, as can be seen from the counterexamples pointed out in Mathai and Haubold (2017). If ρ x,y is the correlation between two real scalar random variables x and y and if u = a 1 x + b 1 and v = a 2 y + b 2 where a 1≠0, a 2≠0 and b 1, b 2 are constants, then ρ u,v = ±ρ x,y. It is positive when a 1 > 0, a 2 > 0 or a 1 < 0, a 2 < 0 and negative otherwise. Thus, ρ is both location and scale invariant.

The determinant of Σ in the bivariate case is

The inverse is as follows, taking the inverse as the transpose of the matrix of cofactors divided by the determinant:

(3.2.4)

Then,

$$\displaystyle \begin{aligned} (X-\mu)^{\prime}\varSigma^{-1}(X-\mu)=\Big(\frac{x_1-\mu_1}{\sigma_1}\Big)^2+\Big(\frac{x_2-\mu_2}{\sigma_2}\Big)^2-2\rho\Big(\frac{x_1-\mu_1}{\sigma_1}\Big)\Big(\frac{x_2-\mu_2}{\sigma_2}\Big)\equiv Q. {} \end{aligned} $$
(3.2.5)

Hence, the real bivariate normal density is

$$\displaystyle \begin{aligned} f(x_1,x_2)=\frac{1}{2\pi \sigma_1\sigma_2\sqrt{(1-\rho^2)}}{\mathrm{e}}^{-\frac{Q}{2(1-\rho^2)}}{} \end{aligned} $$
(3.2.6)

where Q is given in (3.2.5). Observe that Q is a positive definite quadratic form and hence Q > 0 for all X and μ. We can also obtain an interesting result on the standardized variables of x 1 and x 2. Let the standardized x j be \(y_j=\frac {x_j-\mu _j}{\sigma _j},\ j=1,2\) and u = y 1 − y 2. Then

$$\displaystyle \begin{aligned} {\mathrm{Var}}(u)={\mathrm{Var}}(y_1)+{\mathrm{Var}}(y_2)-2{\mathrm{Cov}}(y_1,y_2)=1+1-2\rho=2(1-\rho).{} \end{aligned} $$
(3.2.7)

Since Var(u) = 2(1 − ρ), the closer ρ is to 1, the smaller the variance of u, and the closer ρ is to −1, the larger the variance of u, noting that − 1 < ρ < 1 in the bivariate real normal case whereas, in general, − 1 ≤ ρ ≤ 1. Observe that if ρ = 0 in the bivariate normal density given in (3.2.6), this joint density factorizes into the product of the marginal densities of x 1 and x 2, which implies that x 1 and x 2 are independently distributed when ρ = 0. In general, for real scalar random variables x and y, ρ = 0 need not imply independence; however, in the bivariate normal case, ρ = 0 if and only if x 1 and x 2 are independently distributed. As well, the exponent in (3.2.6) has the following feature:

$$\displaystyle \begin{aligned} Q=(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)=\Big(\frac{x_1-\mu_1}{\sigma_1}\Big)^2+\Big(\frac{x_2-\mu_2}{\sigma_2}\Big)^2-2\rho\,\Big(\frac{x_1-\mu_1}{\sigma_1}\Big)\Big(\frac{x_2-\mu_2} {\sigma_2}\Big)=c {} \end{aligned} $$
(3.2.8)

where c is a positive constant, describes an ellipse in the two-dimensional Euclidean space; for a general p,

$$\displaystyle \begin{aligned} (X-\mu)^{\prime}\varSigma^{-1}(X-\mu)=c>0,\ \varSigma>O,{} \end{aligned} $$
(3.2.9)

describes the surface of an ellipsoid in the p-dimensional Euclidean space, observing that Σ −1 > O when Σ > O.
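A brief simulation, again with arbitrary illustrative parameter values, corroborates (3.2.7): after standardizing the components of a bivariate normal vector, the variance of their difference is approximately 2(1 − ρ).

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary illustrative values
mu1, mu2, s1, s2, rho = 1.0, -0.5, 2.0, 1.5, 0.6
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])

x = rng.multivariate_normal([mu1, mu2], Sigma, size=500_000)
y1 = (x[:, 0] - mu1) / s1          # standardized x1
y2 = (x[:, 1] - mu2) / s2          # standardized x2

print(np.var(y1 - y2))             # approximately 2*(1 - rho) = 0.8
print(2 * (1 - rho))
```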

Example 3.2.1

Let

Show that Σ > O and that Σ can be a covariance matrix for X. Taking E[X] = μ and Cov(X) = Σ, construct the exponent of a trivariate real Gaussian density explicitly and write down the density.

Solution 3.2.1

Let us verify the definiteness of Σ. Note that Σ = Σ (symmetric). The leading minors are \(|(3)|=3>0,\ \left \vert \begin {array}{cc}3&0\\ 0&3\end {array}\right \vert =9>0,\ |\varSigma |=12>0\), and hence Σ > O. The matrix of cofactors of Σ, that is, Cof(Σ) and the inverse of Σ are the following:

$$\displaystyle \begin{aligned}{\mathrm{Cof}}(\varSigma)=\left[\begin{array}{ccc}5&-1\ \ \ &3\\ -1\ \ \ &5&-3\ \ \ \\ 3&-3\ \ \ &9\end{array}\right], \ \varSigma^{-1}=\frac{1}{12}\left[\begin{array}{ccc}5&-1\ \ \ &3\\ -1\ \ \ &5&-3\ \ \ \\ 3&-3\ \ \ &9\end{array}\right].{} \end{aligned} $$
(i)

Thus the exponent of the trivariate real Gaussian density is \(-\frac {1}{2}Q\) where

(ii)

The normalizing constant of the density being

$$\displaystyle \begin{aligned}(2\pi)^{\frac{p}{2}}|\varSigma|{}^{\frac{1}{2}}=(2\pi)^{\frac{3}{2}}[12]^{\frac{1}{2}}=2^{\frac{5}{2}}\sqrt{3}\pi^{\frac{3}{2}},\end{aligned}$$

the resulting trivariate Gaussian density is

$$\displaystyle \begin{aligned}f(X)=[2^{\frac{5}{2}}\sqrt{3}\pi^{\frac{3}{2}}]^{-1}{\mathrm{e}}^{-\frac{1}{2}Q} \end{aligned}$$

for −∞ < x j < ∞, j = 1, 2, 3, where Q is specified in (ii).
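The determinant, inverse and normalizing constant obtained in this solution can be verified numerically. The Σ used below is inferred from the leading minors and the cofactor matrix reported in the solution (the original display of Σ is not reproduced here), so it should be read as a reconstruction.

```python
import numpy as np

# Sigma inferred from the leading minors and Cof(Sigma) given in the solution
Sigma = np.array([[3.0, 0.0, -1.0],
                  [0.0, 3.0,  1.0],
                  [-1.0, 1.0, 2.0]])

print(np.linalg.det(Sigma))                    # 12
print(np.round(12 * np.linalg.inv(Sigma), 6))  # matches Cof(Sigma)' as used in (i)
print((2 * np.pi) ** 1.5 * np.sqrt(12))        # normalizing constant
print(2 ** 2.5 * np.sqrt(3) * np.pi ** 1.5)    # same value: 2^(5/2) sqrt(3) pi^(3/2)
```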

3.2.1. The moment generating function in the real case

We have defined the multivariate Gaussian distribution via the following characterization whose proof relies on its moment generating function: if all the possible linear combinations of the components of a random vector are real univariate normal, then this vector must follow a real multivariate Gaussian distribution. We are now looking into the derivation of the mgf given the density. For a parameter vector T, with T  = (t 1, …, t p), we have

$$\displaystyle \begin{aligned} M_X(T)&=E[{\mathrm{e}}^{T^{\prime}X}]=\int_X{\mathrm{e}}^{T^{\prime}X}f(X){\mathrm{d}}X={\mathrm{e}}^{T^{\prime}\mu}E[{\mathrm{e}}^{T^{\prime}(X-\mu)}]\\ &=\frac{{\mathrm{e}}^{T^{\prime}\mu}}{|\varSigma|{}^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}}\int_X{\mathrm{e}}^{T^{\prime}(X-\mu)-\frac{1}{2}(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)}{\mathrm{d}}X.\end{aligned} $$

Observe that the moment generating function (mgf) in the real multivariate case is the expected value of e raised to a linear function of the real scalar variables. We now make the transformation \(Y=\varSigma ^{-\frac {1}{2}}(X-\mu )\Rightarrow {\mathrm {d}}Y=|\varSigma |{ }^{-\frac {1}{2}}{\mathrm {d}}X\). The exponent can be simplified as follows:

$$\displaystyle \begin{aligned} T^{\prime}(X-\mu)&-\frac{1}{2}(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)=-\frac{1}{2}\{-2T^{\prime}\varSigma^{\frac{1}{2}}Y+Y^{\prime}Y\}\\ &=-\frac{1}{2}\{(Y-\varSigma^{\frac{1}{2}}T)^{\prime}(Y-\varSigma^{\frac{1}{2}}T)-T^{\prime}\varSigma T\}.\end{aligned} $$

Hence

$$\displaystyle \begin{aligned}M_X(T)={\mathrm{e}}^{T^{\prime}\mu+\frac{1}{2}T^{\prime}\varSigma T}\frac{1}{(2\pi)^{\frac{p}{2}}}\int_Y{\mathrm{e}}^{-\frac{1}{2}(Y-\varSigma^{\frac{1}{2}}T)^{\prime}(Y-\varSigma^{\frac{1}{2}}T)}{\mathrm{d}}Y.\end{aligned}$$

The integral over Y  is 1 since this is the total integral of a multivariate normal density whose mean value vector is \(\varSigma ^{\frac {1}{2}}T\) and covariance matrix is the identity matrix. Thus the mgf of a multivariate real Gaussian vector is

$$\displaystyle \begin{aligned} M_X(T)={\mathrm{e}}^{T^{\prime}\mu+\frac{1}{2}T^{\prime}\varSigma T}.{} \end{aligned} $$
(3.2.10)

In the singular normal case, we can still take (3.2.10) as the mgf for Σ ≥ O (non-negative definite), which encompasses the singular and nonsingular cases. Then, one can study properties of the normal distribution whether singular or nonsingular via (3.2.10).
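As a sanity check on (3.2.10), the sketch below compares a Monte Carlo estimate of \(E[{\mathrm{e}}^{T^{\prime}X}]\) with the closed form \({\mathrm{e}}^{T^{\prime}\mu+\frac{1}{2}T^{\prime}\varSigma T}\); the μ, Σ and T are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(3)

# Arbitrary illustrative parameters; keep T small so the estimate is stable
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
T = np.array([0.2, -0.1])

X = rng.multivariate_normal(mu, Sigma, size=1_000_000)
mgf_mc = np.mean(np.exp(X @ T))                       # Monte Carlo estimate of E[exp(T'X)]
mgf_formula = np.exp(T @ mu + 0.5 * T @ Sigma @ T)    # closed form (3.2.10)

print(mgf_mc, mgf_formula)                            # close agreement
```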

We will now apply the differential operator \(\frac {\partial }{\partial T}\) defined in Sect. 1.7 on the moment generating function of a p × 1 real normal random vector X and evaluate the result at T = O to obtain the mean value vector of this distribution, that is, μ = E[X]. As well, E[XX′] is available by applying the operator \(\frac {\partial }{\partial T}\frac {\partial }{\partial T^{\prime }}\) on the mgf, and so on. From the mgf in (3.2.10), we have

$$\displaystyle \begin{aligned} \frac{\partial}{\partial T}M_X(T)|{}_{T=O}&=\frac{\partial}{\partial T}{\mathrm{e}}^{T^{\prime}\mu+\frac{1}{2}T^{\prime}\varSigma T}|{}_{T=O}\\ &=[{\mathrm{e}}^{T^{\prime}\mu+\frac{1}{2}T^{\prime}\varSigma T}[\mu+\varSigma T]|{}_{T=O}]\Rightarrow \mu=E[X]. {}\end{aligned} $$
(i)

Then,

$$\displaystyle \begin{aligned} \frac{\partial}{\partial T^{\prime}}M_X(T)={\mathrm{e}}^{T^{\prime}\mu+\frac{1}{2}T^{\prime}\varSigma T}[\mu^{\prime}+T^{\prime}\varSigma].{}\end{aligned} $$
(ii)

Remember to write the scalar quantity, M X(T), on the left for scalar multiplication of matrices. Now,

$$\displaystyle \begin{aligned} \frac{\partial}{\partial T}\frac{\partial}{\partial T^{\prime}}M_X(T)&=\frac{\partial}{\partial T}{\mathrm{e}}^{T^{\prime}\mu+\frac{1}{2}T^{\prime}\varSigma T}[\mu^{\prime}+T^{\prime}\varSigma]\\ &=M_X(T)[\mu+\varSigma T][\mu^{\prime}+T^{\prime}\varSigma]+M_X(T)[\varSigma].\end{aligned} $$

Hence,

$$\displaystyle \begin{aligned} E[XX^{\prime}]=\Big[\frac{\partial}{\partial T}\frac{\partial}{\partial T^{\prime}}M_X(T)|{}_{T=O}\Big]=\varSigma +\mu\mu^{\prime}.{} \end{aligned} $$
(iii)

But

$$\displaystyle \begin{aligned} {\mathrm{Cov}}(X)=E[XX^{\prime}]-E[X]E[X^{\prime}]=(\varSigma+\mu\mu^{\prime})-\mu\mu^{\prime}=\varSigma. \end{aligned} $$
(iv)

In the multivariate real Gaussian case, we have only two parameters μ and Σ and both of these are available from the above equations. In the general case, we can evaluate higher moments as follows:

$$\displaystyle \begin{aligned} E[\ \cdots X^{\prime}XX^{\prime}]=\ \cdots \frac{\partial}{\partial T^{\prime}}\frac{\partial}{\partial T}\frac{\partial}{\partial T^{\prime}}M_X(T)|{}_{T=O}\,. {} \end{aligned} $$
(v)

If the characteristic function ϕ X(T), which is available from the mgf by replacing T by \(iT,\ i=\sqrt {(-1)},\) is utilized, then the left-hand side of (v) is to be multiplied by \(i=\sqrt {(-1)}\) for each operator acting on ϕ X(T), because ϕ X(T) = M X(iT). The corresponding differential operators can also be developed for the complex case.

Given a real p-vector X ∼ N p(μ, Σ), Σ > O, what will be the distribution of a linear function of X? Let u = L X, X ∼ N p(μ, Σ), Σ > O, L  = (a 1, …, a p) where a 1, …, a p are real scalar constants. Let us examine its mgf whose argument is a real scalar parameter t. The mgf of u is available by integrating out over the density of X. We have

$$\displaystyle \begin{aligned}M_u(t)=E[{\mathrm{e}}^{tu}]=E[{\mathrm{e}}^{tL^{\prime}X}]=E[{\mathrm{e}}^{(tL^{\prime})X}]. \end{aligned}$$

This is of the same form as in (3.2.10) and hence, M u(t) is available from (3.2.10) by replacing T by (tL ), that is,

$$\displaystyle \begin{aligned} M_u(t)={\mathrm{e}}^{t(L^{\prime}\mu)+\frac{t^2}{2}L^{\prime}\varSigma L}\Rightarrow u\sim N_1(L^{\prime}\mu,L^{\prime}\varSigma L). {} \end{aligned} $$
(3.2.11)

This means that u is a univariate normal with mean value L′μ = E[u] and variance L′ΣL = Var(u). Now, let us consider a set of linearly independent linear functions of X. Let A be a real q × p, q ≤ p, matrix of full rank q and consider the linear functions U = AX where U is q × 1. Then E[U] = AE[X] = Aμ and the covariance matrix of U is

$$\displaystyle \begin{aligned} {\mathrm{Cov}}(U)&=E[U-E(U)][U-E(U)]^{\prime}=E[A(X-\mu)(X-\mu)^{\prime}A^{\prime}]\\ &=AE[(X-\mu)(X-\mu)^{\prime}]A^{\prime}=A\,\varSigma A^{\prime}.\end{aligned} $$

Observe that since Σ > O, we can write \(\varSigma =\varSigma _1\varSigma _1^{\prime }\) so that \(A\varSigma A^{\prime }=(A\varSigma _1)(A\varSigma _1)^{\prime }\) where \(A\varSigma _1\) is of full rank, which means that AΣA′ > O. Therefore, letting T be a q × 1 parameter vector, we have

$$\displaystyle \begin{aligned}M_U(T)=E[{\mathrm{e}}^{T^{\prime}U}]=E[{\mathrm{e}}^{T^{\prime}AX}]=E[{\mathrm{e}}^{(T^{\prime}A)X}], \end{aligned}$$

which is available from (3.2.10). That is,

$$\displaystyle \begin{aligned}M_U(T)={\mathrm{e}}^{T^{\prime}A\mu+\frac{1}{2}(T^{\prime}A\varSigma A^{\prime}T)}\Rightarrow U\sim N_q(A\mu,A\varSigma A^{\prime}). \end{aligned}$$

Thus, U is a q-variate multivariate normal vector with parameters Aμ and AΣA′, and we have the following result:

Theorem 3.2.1

Let the vector random variable X have a real p-variate nonsingular N p(μ, Σ) distribution and the q × p matrix A, with q ≤ p, be a full rank constant matrix. Then

$$\displaystyle \begin{aligned} U=AX\sim N_q(A\mu, A\varSigma A^{\prime}), \ A\varSigma A^{\prime}>O.{} \end{aligned} $$
(3.2.12)

Corollary 3.2.1

Let the vector random variable X have a real p-variate nonsingular N p(μ, Σ) distribution and B be a 1 × p constant vector. Then U 1 = BX has a univariate normal distribution with parameters Bμ and BΣB′.

Example 3.2.2

Let X, μ = E[X], Σ = Cov(X), Y, and A be as follows:

Let y 1 = x 1 + x 2 + x 3 and y 2 = x 1 − x 2 + x 3 and write Y = AX. If Σ > O and if X ∼ N 3(μ, Σ), derive the density of (1) Y ; (2) y 1 directly as well as from (1).

Solution 3.2.2

The leading minors of Σ are \(|(4)|=4>0, \left \vert \begin {array}{cc}4&-2\ \ \ \\ -2\ \ \ &3\end {array}\right \vert =8>0, \ |\varSigma |=12>0\) and Σ = Σ . Being symmetric and positive definite, Σ is a bona fide covariance matrix. Now, Y = AX where

(i)
(ii)

Since A is of full rank (rank 2) and y 1 and y 2 are linear functions of the real Gaussian vector X, Y  has a bivariate nonsingular real Gaussian distribution with parameters E(Y ) and Cov(Y ). Since

the density of Y  has the exponent \(-\frac {1}{2}Q\) where

(iii)

The normalizing constant being \((2\pi )^{\frac {p}{2}}|\varSigma |{ }^{\frac {1}{2}}=2\pi \sqrt {68}=4\sqrt {17}\pi \), the density of Y , denoted by f(Y ), is given by

$$\displaystyle \begin{aligned} f(Y)=\frac{1}{4\sqrt{17}\pi}{\mathrm{e}}^{-\frac{1}{2}Q} \end{aligned} $$
(iv)

where Q is specified in (iii). This establishes (1). For establishing (2), we first proceed from the general formula. Let \(y_1=A_1X\Rightarrow A_1=[1,1,1],\ E[y_1]=A_1E[X]=[1,1,1]\left [\begin {array}{r}2\\ 0\\ -1\end {array}\right ]=1\) and

Hence y 1 ∼ N 1(1, 7). For establishing this result directly, observe that y 1 is a linear function of real normal variables and hence, it is univariate real normal with the parameters E[y 1] and Var(y 1). We may also obtain the marginal distribution of y 1 directly from the parameters of the joint density of y 1 and y 2, which are given in (i) and (ii). Thus, (2) is also established.
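The distributional claims of this example can be checked by simulation. The μ, Σ and A below are reconstructed from the quantities reported in the solution (leading minors 4, 8, 12 of Σ, Var(y 1) = 7 and det(Cov(Y)) = 68), so they are stated here as an inferred reconstruction rather than a restatement of the omitted displays.

```python
import numpy as np

rng = np.random.default_rng(4)

# mu, Sigma and A inferred from the quantities reported in this solution
mu = np.array([2.0, 0.0, -1.0])
Sigma = np.array([[4.0, -2.0, 0.0],
                  [-2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 1.0]])     # y1 = x1 + x2 + x3, y2 = x1 - x2 + x3

print(A @ mu)                            # E[Y] = A mu
print(A @ Sigma @ A.T)                   # Cov(Y) = A Sigma A'
print(np.linalg.det(A @ Sigma @ A.T))    # 68, consistent with the constant 4*sqrt(17)*pi

# Simulation check of Theorem 3.2.1: Y = AX ~ N_2(A mu, A Sigma A')
X = rng.multivariate_normal(mu, Sigma, size=300_000)
Y = X @ A.T
print(np.round(Y.mean(axis=0), 2))
print(np.round(np.cov(Y, rowvar=False), 2))
print(np.round(np.var(Y[:, 0]), 2))      # approximately 7, consistent with y1 ~ N_1(1, 7)
```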

The marginal distributions can also be determined from the mgf. Let us partition T, μ and Σ as follows:

(v)

where T 1, μ (1), X 1 are r × 1 and Σ 11 is r × r. Letting T 2 = O (the null vector), we have

which is the structure of the mgf of a real Gaussian distribution with mean value vector E[X 1] = μ (1) and covariance matrix Cov(X 1) = Σ 11. Therefore X 1 is an r-variate real Gaussian vector and similarly, X 2 is (p − r)-variate real Gaussian vector. The standard notation used for a p-variate normal distribution is X ∼ N p(μ, Σ), Σ ≥ O, which includes the nonsingular and singular cases. In the nonsingular case, Σ > O, whereas |Σ| = 0 in the singular case.

From the mgf in (3.2.10) and (i) above, if we have Σ 12 = O with \( \varSigma _{21}=\varSigma _{12}^{\prime }\), then the mgf of X becomes \({\mathrm {e}}^{T_1^{\prime }\,\mu _{(1)}+T_2^{\prime }\,\mu _{(2)}+\frac {1}{2}T_1^{\prime }\,\varSigma _{11}T_1+\frac {1}{2}T_2^{\prime }\,\varSigma _{22}T_2}\). That is,

$$\displaystyle \begin{aligned}M_X(T)=M_{X_1}(T_1)M_{X_2}(T_2),\end{aligned}$$

which implies that X 1 and X 2 are independently distributed. Hence the following result:

Theorem 3.2.2

Let the real p × 1 vector X ∼ N p(μ, Σ), Σ > O, and let X be partitioned into subvectors X 1 and X 2 , with the corresponding partitioning of μ and Σ, that is,

Then, X 1 and X 2 are independently distributed if and only if \(\varSigma _{12}=\varSigma _{21}^{\prime }=O\).

Observe that a covariance matrix being null need not imply independence of the subvectors; however, in the case of subvectors having a joint normal distribution, it suffices to have a null covariance matrix to conclude that the subvectors are independently distributed.

3.2a. The Moment Generating Function in the Complex Case

The determination of the mgf in the complex case is somewhat different. Take a p-variate complex Gaussian \(\tilde {X}\sim \tilde {N}_p(\tilde {\mu },\tilde {\varSigma }),\ \tilde {\varSigma }=\tilde {\varSigma }^{*}>O\). Let \(\tilde {T}^{\prime }=(\tilde {t}_1,\ldots ,\tilde {t}_p)\) be a parameter vector. Let \(\tilde {T}=T_1+iT_2\), where T 1 and T 2 are p × 1 real vectors and \(i=\sqrt {(-1)}\). Let \(\tilde {X}=X_1+iX_2\) with X 1 and X 2 being real. Then consider \(\tilde {T}^{*}\tilde {X}=(T_1^{\prime }-iT_2^{\prime })(X_1+iX_2)=T_1^{\prime }X_1+T_2^{\prime }X_2+i(T_1^{\prime }X_2-T_2^{\prime }X_1)\). But \(T_1^{\prime }X_1+T_2^{\prime }X_2\) already contains the necessary number of parameters and all the corresponding real variables and hence to be consistent with the definition of the mgf in the real case one must take only the real part in \(\tilde {T}^{*}\tilde {X}\). Hence the mgf in the complex case, denoted by \(M_{\tilde {X}}(\tilde {T})\), is defined as \(E[{\mathrm {e}}^{\Re (\tilde {T}^{*}\tilde {X})}]\). For convenience, we may take \(\tilde {X}=\tilde {X}-\tilde {\mu }+\tilde {\mu }\). Then \(E[{\mathrm {e}}^{\Re (\tilde {T}^{*}\tilde {X})}]={\mathrm {e}}^{\Re (\tilde {T}^{*}\tilde {\mu })}E[{\mathrm {e}}^{\Re (\tilde {T}^{*}(\tilde {X}-\tilde {\mu }))}]\). On making the transformation \(\tilde {Y}=\varSigma ^{-\frac {1}{2}}(\tilde {X}-\tilde {\mu })\), |det(Σ)| appearing in the denominator of the density of \(\tilde {X}\) is canceled due to the Jacobian of the transformation and we have \((\tilde {X}-\tilde {\mu })=\varSigma ^{\frac {1}{2}}\tilde {Y}\). Thus,

$$\displaystyle \begin{aligned} E[{\mathrm{e}}^{\Re(\tilde{T}^{*}\tilde{Y})}]=\frac{1}{\pi^p}\int_{\tilde{Y}}{\mathrm{e}}^{\Re(\tilde{T}^{*}\varSigma^{\frac{1}{2}}\tilde{Y})\,-\,\tilde{Y}^{*}\tilde{Y}}{\mathrm{d}}\tilde{Y}. \end{aligned} $$
(i)

For evaluating the integral in (i), we can utilize the following result which will be stated here as a lemma.

Lemma 3.2a.1

Let \(\tilde {U}\) and \(\tilde {V}\) be two p × 1 vectors in the complex domain. Then

$$\displaystyle \begin{aligned}2\,\Re(\tilde{U}^{*}\tilde{V})=\tilde{U}^{*}\tilde{V}+\tilde{V}^{*}\tilde{U}=2\,\Re(\tilde{V}^{*}\tilde{U}). \end{aligned}$$

Proof

Let \(\tilde {U}=U_1+iU_2,\ \tilde {V}=V_1+iV_2\) where U 1, U 2, V 1, V 2 are real vectors and \(i=\sqrt {(-1)}\). Then \(\tilde {U}^{*}\tilde {V}=[U_1^{\prime }-iU_2^{\prime }][V_1+iV_2]=U_1^{\prime }V_1+U_2^{\prime }V_2+i[U_1^{\prime }V_2-U_2^{\prime }V_1]\). Similarly \(\tilde {V}^{*}\tilde {U}=V_1^{\prime }U_1+V_2^{\prime }U_2+i[V_1^{\prime }U_2-V_2^{\prime }U_1]\). Observe that since U 1, U 2, V 1, V 2 are real, we have \(U_i^{\prime }V_j=V_j^{\prime }U_i\) for all i and j. Hence, the sum \(\tilde {U}^{*}\tilde {V}+\tilde {V}^{*}\tilde {U}=2[U_1^{\prime }V_1+ U_2^{\prime }V_2]=2\,\Re (\tilde {V}^{*}\tilde {U})\). This completes the proof.

Now, the exponent in (i) can be written as

$$\displaystyle \begin{aligned}\Re(\tilde{T}^{*}\varSigma^{\frac{1}{2}}\tilde{Y})=\frac{1}{2}\tilde{T}^{*}\varSigma^{\frac{1}{2}}\tilde{Y} +\frac{1}{2}\tilde{Y}^{*}\varSigma^{\frac{1}{2}}\tilde{T} \end{aligned}$$

by using Lemma 3.2a.1, observing that Σ = Σ*. Let us expand \((\tilde {Y}-C)^{*}(\tilde {Y}-C)\) as \(\tilde {Y}^{*}\tilde {Y}-\tilde {Y}^{*}C-C^{*}\tilde {Y}+C^{*}C\) for some C. Comparing with the exponent in (i), we may take \(C^{*}=\frac {1}{2}\tilde {T}^{*}\varSigma ^{\frac {1}{2}}\) so that \(C^{*}C=\frac {1}{4}\tilde {T}^{*}\varSigma \tilde {T}\). Therefore, in the complex Gaussian case, the mgf is

$$\displaystyle \begin{aligned} M_{\tilde{X}}(\tilde{T})={\mathrm{e}}^{\Re(\tilde{T}^{*}\tilde{\mu})+\frac{1}{4}\tilde{T}^{*}\varSigma \tilde{T}}.{} \end{aligned} $$
(3.2a.1)
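The 1/4 factor in (3.2a.1) can be corroborated numerically: simulate \(\tilde{X}\) as in Sect. 3.1a and compare a Monte Carlo estimate of \(E[{\mathrm{e}}^{\Re(\tilde{T}^{*}\tilde{X})}]\) with the closed form. The Σ, \(\tilde{\mu}\) and \(\tilde{T}\) below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(5)

# Arbitrary illustrative parameters
Sigma = np.array([[2.0, 1 - 1j],
                  [1 + 1j, 3.0]])
mu = np.array([1 + 1j, -0.5 + 0.5j])
T = np.array([0.3 - 0.2j, 0.1 + 0.4j])

# Simulate X = mu + Sigma^(1/2) Y with E[Y] = 0 and E[Y Y*] = I, as in Sect. 3.1a
lam, P = np.linalg.eigh(Sigma)
Sigma_half = P @ np.diag(np.sqrt(lam)) @ P.conj().T
n = 500_000
Y = rng.normal(0, np.sqrt(0.5), (n, 2)) + 1j * rng.normal(0, np.sqrt(0.5), (n, 2))
X = mu + Y @ Sigma_half.T

# Monte Carlo estimate of E[exp(Re(T* X))] versus the closed form (3.2a.1)
mgf_mc = np.mean(np.exp((X @ T.conj()).real))
mgf_formula = np.exp((T.conj() @ mu).real + 0.25 * (T.conj() @ Sigma @ T).real)

print(mgf_mc, mgf_formula)               # close agreement
```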

Example 3.2a.1

Let \(\tilde {X},\ E[\tilde {X}]=\tilde {\mu }, \ {\mathrm {Cov}}(\tilde {X})=\varSigma \) be the following where \(\tilde {X}\sim \tilde {N}_2(\tilde {\mu },\varSigma ),\ \varSigma >O\),

Compute the mgf of \(\tilde {X}\) explicitly.

Solution 3.2a.1

Let \(\tilde{T}^{\prime }=(\tilde{t}_1,\tilde{t}_2)\) where \(\tilde {t}_1=t_{11}+it_{12}, \tilde {t}_2=t_{21}+it_{22}\) with t 11, t 12, t 21, t 22 being real scalar parameters. The mgf of \(\tilde {X}\) is

$$\displaystyle \begin{aligned}M_{\tilde{X}}(\tilde{T})={\mathrm{e}}^{\Re(\tilde{T}^{*}\tilde{\mu})+\frac{1}{4}\tilde{T}^{*}\varSigma \tilde{T}}. \end{aligned}$$

Consider the first term in the exponent of the mgf:

The second term in the exponent is the following:

Note that since the parameters are scalar quantities, the conjugate transpose means only the conjugate or \(\tilde {t}_j^{*}=\bar {\tilde {t}}_j,~j=1,2\). Let us look at the non-diagonal terms. Note that \([(1+i)\tilde {t}_1^{*}\tilde {t}_2]+[(1-i)\tilde {t}_2^{*}\tilde {t}_1]\) gives 2(t 11 t 21 + t 12 t 22 + t 12 t 21 − t 11 t 22). However, \(\tilde {t}_1^{*}\tilde {t}_1=t_{11}^2+t_{12}^2, \tilde {t}_2^{*}\tilde {t}_2=t_{21}^2+t_{22}^2\). Hence if the exponent of \(M_{\tilde {X}}(\tilde {t})\) is denoted by ϕ,

$$\displaystyle \begin{aligned} \phi&=[t_{11}-t_{12}+2t_{21}-3t_{22}]+\frac{1}{4}\{3(t_{11}^2+t_{12}^2)+2(t_{21}^2+t_{22}^2)\\ &\ \ \ \ +2(t_{11}t_{21}+t_{12}t_{22}+t_{12}t_{21}-t_{11}t_{22})\}. \end{aligned} $$
(i)

Thus the mgf is

$$\displaystyle \begin{aligned}M_{\tilde{X}}(\tilde{T})={\mathrm{e}}^{\phi} \end{aligned}$$

where ϕ is given in (i).

3.2a.1. Moments from the moment generating function

We can also derive the moments from the mgf of (3.2a.1) by operating with the differential operator of Sect. 1.7 of Chap. 1. For the complex case, the operator \(\frac {\partial }{\partial X_1}\) in the real case has to be modified. Let \(\tilde {X}=X_1+iX_2\) be a p × 1 vector in the complex domain where X 1 and X 2 are real and p × 1 and \(i=\sqrt {(-1)}\). Then in the complex domain the differential operator is

$$\displaystyle \begin{aligned} \frac{\partial}{\partial \tilde{X}}=\frac{\partial}{\partial X_1}+i\frac{\partial}{\partial X_2}. \end{aligned} $$
(ii)

Let \(\tilde {T}=T_1+iT_2, \ \tilde {\mu }=\mu _{(1)}+i\mu _{(2)},\ \varSigma =\varSigma _1+i\varSigma _2\) where T 1, T 2, μ (1), μ (2), Σ 1, Σ 2 are all real and \(i=\sqrt {(-1)}\), \(\varSigma _1=\varSigma _1^{\prime }, \) and \( \varSigma _2^{\prime }=-\varSigma _2\) because Σ is Hermitian. Note that \(\tilde {T}^{*}\varSigma \tilde {T}=(T_1^{\prime }-iT_2^{\prime })\varSigma (T_1+iT_2)=T_1^{\prime }\varSigma T_1+T_2^{\prime }\varSigma T_2+i(T_1^{\prime }\varSigma T_2-T_2^{\prime }\varSigma T_1)\), and observe that

$$\displaystyle \begin{aligned} T_j^{\prime}\,\varSigma T_j=T_j^{\prime}(\varSigma_1+i\varSigma_2) T_j=T_j^{\prime}\,\varSigma_1T_j+0 \end{aligned} $$
(iii)

for j = 1, 2 since Σ 2 is skew symmetric. The exponent in the mgf in (3.2a.1) can be simplified as follows: Letting u denote the exponent in the mgf and observing that \([\tilde {T}^{*}\varSigma \tilde {T}]^{*}=\tilde {T}^{*}\varSigma \tilde {T}\) is real,

$$\displaystyle \begin{aligned} u&=\Re(\tilde{T}^{*}\tilde{\mu})+\frac{1}{4}\tilde{T}^{*}\varSigma \tilde{T}=\Re(T_1^{\prime}-iT_2^{\prime})(\mu_{(1)}+i\mu_{(2)})+\frac{1}{4}(T_1^{\prime}-iT_2^{\prime})\varSigma(T_1+iT_2)\\ &=T_1^{\prime}\mu_{(1)}+T_2^{\prime}\mu_{(2)}+\frac{1}{4}[T_1^{\prime}\,\varSigma T_1+T_2^{\prime}\,\varSigma T_2]+\frac{1}{4}u_1,\ u_1=i(T_1^{\prime}\,\varSigma T_2-T_2^{\prime}\,\varSigma T_1)\\ &=T_1^{\prime}\mu_{(1)}+T_2^{\prime}\mu_{(2)}+\frac{1}{4}[T_1^{\prime}\,\varSigma_1T_1+T_2^{\prime}\,\varSigma_1T_2]+\frac{1}{4}u_1. \end{aligned} $$
(iv)

In this last line, we have made use of the result in (iii). The following lemma will enable us to simplify u 1.

Lemma 3.2a.2

Let T 1 and T 2 be real p × 1 vectors. Let the p × p matrix Σ be Hermitian, Σ = Σ  = Σ 1 + iΣ 2 , with \(\varSigma _1=\varSigma _1^{\prime }\) and \(\varSigma _2=-\varSigma _2^{\prime }\) . Then

$$\displaystyle \begin{aligned} u_1&=i(T_1^{\prime}\,\varSigma T_2-T_2^{\prime}\,\varSigma T_1)=-2T_1^{\prime}\,\varSigma_2 T_2=2T_2^{\prime}\,\varSigma_2 T_1\\ &\Rightarrow \frac{\partial}{\partial T_1}u_1=-2\varSigma_2 T_2\ \mathit{\mbox{ and }}\ \frac{\partial}{\partial T_2}u_1=2\varSigma_2T_1. \end{aligned} $$
(v)

Proof

This result will be established by making use of the following general properties: For a 1 × 1 matrix, the transpose is itself whereas the conjugate transpose is the conjugate of the same quantity. That is, (a + ib)′ = a + ib and (a + ib)* = a − ib; moreover, if the conjugate transpose is equal to the quantity itself, that is, if (a + ib)* = a − ib = a + ib, then b = 0 and the quantity is real. Thus,

$$\displaystyle \begin{aligned} u_1&=i(T_1^{\prime}\varSigma T_2-T_2^{\prime}\varSigma T_1)=i[T_1^{\prime}(\varSigma_1+i\varSigma_2)T_2- T_2^{\prime}(\varSigma_1+i\varSigma_2)T_1],\\ &=iT_1^{\prime}\varSigma_1 T_2-T_1^{\prime}\varSigma_2T_2-iT_2^{\prime}\varSigma_1 T_1+T_2^{\prime}\varSigma_2 T_1=-T_1^{\prime}\varSigma_2 T_2+T_2^{\prime}\varSigma_2 T_1\\ &=-2T_1^{\prime}\varSigma_2T_2=2T_2^{\prime}\varSigma_2 T_1. \end{aligned} $$
(vi)

The following properties were utilized: \(T_i^{\prime }\varSigma _1T_j=T_j^{\prime }\varSigma _1 T_i\) for all i and j since Σ 1 is a symmetric matrix and the quantity is 1 × 1 and real, so that it equals its own transpose; \(T_i^{\prime }\varSigma _2 T_j=-T_j^{\prime }\varSigma _2 T_i\) for all i and j because the quantities are 1 × 1, so that \(T_i^{\prime}\varSigma_2T_j=(T_i^{\prime}\varSigma_2T_j)^{\prime}=T_j^{\prime}\varSigma_2^{\prime}T_i=-T_j^{\prime}\varSigma_2T_i\) since \(\varSigma _2^{\prime }=-\varSigma _2\). This completes the proof.

Now, let us apply the operator \((\frac {\partial }{\partial T_1}+i\frac {\partial }{\partial T_2})\) to the mgf in (3.2a.1) and determine the various quantities. Note that in light of results stated in Chap. 1, we have

$$\displaystyle \begin{aligned} \frac{\partial}{\partial T_1}(T_1^{\prime}\varSigma_1 T_1)&=2\varSigma_1T_1,\ \frac{\partial}{\partial T_1}(-2T_1^{\prime}\varSigma_2 T_2)=-2\varSigma_2T_2,\ \frac{\partial}{\partial T_1}\Re(\tilde{T}^{*}\tilde{\mu})=\mu_{(1)},\\ \frac{\partial}{\partial T_2}(T_2^{\prime}\varSigma_1 T_2)&=2\varSigma_1T_2,\ \frac{\partial}{\partial T_2}(2T_2^{\prime}\varSigma_2 T_1)=2\varSigma_2 T_1,\ \frac{\partial}{\partial T_2}\Re(\tilde{T}^{*}\tilde{\mu})=\mu_{(2)}.\end{aligned} $$

Thus, in view of (ii)–(vi), the operator applied to the exponent of the mgf gives the following result:

$$\displaystyle \begin{aligned} \Big(\frac{\partial}{\partial T_1}+i\frac{\partial}{\partial T_2}\Big)u&=\mu_{(1)}+i\mu_{(2)}+\frac{1}{4}[2\varSigma_1T_1-2\varSigma_2T_2+2\varSigma_1iT_2+2\varSigma_2iT_1]\\ &=\tilde{\mu}+\frac{1}{4}[2(\varSigma_1+i\varSigma_2)T_1+2(\varSigma_1+i\varSigma_2)iT_2]=\tilde{\mu}+\frac{1}{4}[2\varSigma\tilde{T}]=\tilde{\mu}+\frac{1}{2}\varSigma \tilde{T},\end{aligned} $$

so that

$$\displaystyle \begin{aligned} \frac{\partial}{\partial \tilde{T}}M_{\tilde{X}}(\tilde{T})|{}_{\tilde{T}=O}&=\Big(\frac{\partial}{\partial T_1}+i\frac{\partial}{\partial T_2}\Big)M_{\tilde{X}}(\tilde{T})|{}_{\tilde{T}=O}\\ &=M_{\tilde{X}}(\tilde{T})\Big[\tilde{\mu}+\frac{1}{2}\varSigma\tilde{T}\Big]\Big|{}_{T_1=O,T_2=O}=\tilde{\mu}, \end{aligned} $$
(vii)

noting that \(\tilde {T}=O\) implies that T 1 = O and T 2 = O. For convenience, let us denote the operator by

$$\displaystyle \begin{aligned}\frac{\partial}{\partial \tilde{T}}=\Big(\frac{\partial}{\partial T_1}+i\frac{\partial}{\partial T_2}\Big). \end{aligned}$$

From (vii), we have

$$\displaystyle \begin{aligned} \frac{\partial}{\partial\tilde{T}}M_{\tilde{X}}(\tilde{T})&=M_{\tilde{X}}(\tilde{T})[\tilde{\mu}+\frac{1}{2}\varSigma \tilde{T}],\\ \frac{\partial}{\partial\tilde{T}^{*}}M_{\tilde{X}}(\tilde{T})&=[\tilde{\mu}^{*}+\frac{1}{2}\tilde{T}^{*}\tilde{\varSigma}]M_{\tilde{X}}(\tilde{T}). \end{aligned} $$

Now, observe that

$$\displaystyle \begin{aligned} \tilde{T}^{*}\varSigma&= (T_1^{\prime}-iT_2^{\prime})\varSigma=T_1^{\prime}\varSigma -iT_2^{\prime}\varSigma\Rightarrow\\ \frac{\partial}{\partial T_1}(\tilde{T}^{*}\varSigma)&=\varSigma,\ \frac{\partial}{\partial T_2}(\tilde{T}^{*}\varSigma) = -i\varSigma,\\ \Big(\frac{\partial}{\partial T_1}+i\frac{\partial}{\partial T_2}\Big)(\tilde{T}^{*}\varSigma)&=\varSigma-i(i)\varSigma=2\varSigma,\end{aligned} $$

and

$$\displaystyle \begin{aligned}\frac{\partial}{\partial \tilde{T}}\frac{\partial}{\partial\tilde{T}^{*}}M_{\tilde{X}}(\tilde{T})|{}_{\tilde{T}=O}=\tilde{\mu}\tilde{\mu}^{*}+\tilde{\varSigma}. \end{aligned}$$

Thus,

$$\displaystyle \begin{aligned}\frac{\partial}{\partial \tilde{T}}M_{\tilde{X}}(\tilde{T})|{}_{\tilde{T}=O}=\tilde{\mu}\ \mbox{ and }\ \frac{\partial}{\partial \tilde{T}}\frac{\partial}{\partial\tilde{T}^{*}}M_{\tilde{X}}(\tilde{T})|{}_{\tilde{T}=O}=\tilde{\varSigma}+\tilde{\mu}\tilde{\mu}^{*}, \end{aligned}$$

and then \({\mathrm {Cov}}(\tilde {X})=\tilde {\varSigma }\). In general, for higher order moments, one would have

$$\displaystyle \begin{aligned}E[\ \cdots \tilde{X}^{*}\tilde{X}\tilde{X}^{*}]=\ \cdots\frac{\partial}{\partial\tilde{T}^{*}}\frac{\partial}{\partial\tilde{T}} \frac{\partial}{\partial\tilde{T}^{*}}M_{\tilde{X}}(\tilde{T})|{}_{\tilde{T}=O}\,. \end{aligned}$$

3.2a.2. Linear functions

Let \(\tilde {w}=L^{*}\tilde {X}\) where L′ = (a 1, …, a p) and a 1, …, a p are scalar constants, real or complex. Then the mgf of \(\tilde {w}\) can be evaluated by integrating out over the p-variate complex Gaussian density of \(\tilde {X}\). That is,

$$\displaystyle \begin{aligned} M_{\tilde{w}}(\tilde{t})=E[{\mathrm{e}}^{\Re(\tilde{t}\tilde{w})}]=E[{\mathrm{e}}^{(\Re(\tilde{t}L^{*}\tilde{X}))}].{} \end{aligned} $$
(3.2a.2)

Note that this expected value is available from (3.2a.1) by replacing \(\tilde {T}^{*}\) by \(\tilde {t}L^{*}\). Hence

$$\displaystyle \begin{aligned} M_{\tilde{w}}(\tilde{t})={\mathrm{e}}^{\Re(\tilde{t}(L^{*}\tilde{\mu}))+\frac{1}{4}\tilde{t}\tilde{t}^{*}(L^{*}\varSigma L)}.{} \end{aligned} $$
(3.2a.3)

Then from (2.1a.1), \(\tilde {w}=L^{*}\tilde {X}\) is univariate complex Gaussian with the parameters \(L^{*}\tilde {\mu }\) and L ΣL. We now consider several such linear functions: Let \(\tilde {Y}=A\tilde {X}\) where A is q × p, q ≤ p and of full rank q. The distribution of \(\tilde {Y}\) can be determined as follows. Since \(\tilde {Y}\) is a function of \(\tilde {X}\), we can evaluate the mgf of \(\tilde {Y}\) by integrating out over the density of \(\tilde {X}\). Since \(\tilde {Y}\) is q × 1, let us take a q × 1 parameter vector \(\tilde {U}\). Then,

$$\displaystyle \begin{aligned} M_{\tilde{Y}}(\tilde{U})=E[{\mathrm{e}}^{\Re(\tilde{U}^{*}\tilde{Y})}]=E[{\mathrm{e}}^{\Re(\tilde{U}^{*}A\tilde{X})}]=E[{\mathrm{e}}^{\Re[(\tilde{U}^{*}A)\tilde{X}]}].{} \end{aligned} $$
(3.2a.4)

On comparing this expected value with (3.2a.1), we can write down the mgf of \(\tilde {Y}\) as the following:

$$\displaystyle \begin{aligned} M_{\tilde{Y}}(\tilde{U})={\mathrm{e}}^{\Re(\tilde{U}^{*}A\tilde{\mu})+\frac{1}{4}(\tilde{U}^{*}A)\varSigma(A^{*}\tilde{U})}={\mathrm{e}}^{\Re(\tilde{U}^{*}(A\tilde{\mu}))+\frac{1}{4}\tilde{U}^{*}(A\varSigma A^{*})\tilde{U}},{} \end{aligned} $$
(3.2a.5)

which means that \(\tilde {Y}\) has a q-variate complex Gaussian distribution with the parameters \(A\,\tilde {\mu }\) and AΣA . Thus, we have the following result:

Theorem 3.2a.1

Let \(\tilde {X}\sim \tilde {N}_p(\tilde {\mu }, \varSigma ), \ \varSigma >O\) be a p-variate nonsingular complex normal vector. Let A be a q × p, q ≤ p, constant real or complex matrix of full rank q. Let \(\tilde {Y}=A\tilde {X}\) . Then,

$$\displaystyle \begin{aligned} \tilde{Y}\sim\tilde{N}_q(A\,\tilde{\mu}, A\varSigma A^{*}),\ A\varSigma A^{*}>O.{} \end{aligned} $$
(3.2a.6)

Let us consider the following partitioning of \(\tilde {T},\ \tilde {X},\ \varSigma \) where \(\tilde {T}\) is p × 1, \(\tilde {T}_1\) is r × 1, r ≤ p, \(\tilde {X}_1\) is r × 1, Σ 11 is r × r, \(\tilde {\mu }_{(1)}\) is r × 1:

Let \(\tilde {T}_2=O\). Then the mgf of \(\tilde {X}\) becomes that of \(\tilde {X}_1\) as

Thus the mgf of \(\tilde {X}_1\) becomes

$$\displaystyle \begin{aligned} M_{\tilde{X}_1}(\tilde{T}_1)={\mathrm{e}}^{\Re(\tilde{T}_1^{*}\tilde{\mu}_{(1)})+\frac{1}{4}\tilde{T}_1^{*}\varSigma_{11}\tilde{T}_1}.{} \end{aligned} $$
(3.2a.7)

This is the mgf of the r × 1 subvector \(\tilde {X}_1\) and hence \(\tilde {X}_1\) has an r-variate complex Gaussian density with the mean value vector \(\tilde {\mu }_{(1)}\) and the covariance matrix Σ 11. In a real or complex Gaussian vector, the individual variables can be permuted among themselves with the corresponding permutations in the mean value vector and the covariance matrix. Hence, all subsets of components of \(\tilde {X}\) are Gaussian distributed. Thus, any set of r components of \(\tilde {X}\) is again a complex Gaussian for r = 1, 2, …, p when \(\tilde {X}\) is a p-variate complex Gaussian.

Suppose that, in the mgf of (3.2a.1), Σ 12 = O where \(\tilde {X}\sim \tilde {N}_p(\tilde {\mu },\varSigma ),\ \varSigma >O\) and

When Σ 12 is null, so is Σ 21 since \(\varSigma _{21}=\varSigma _{12}^{*}\). Then Σ is block-diagonal. As well, \(\Re (\tilde {T}^{*}\tilde {\mu })=\Re (\tilde {T}_1^{*}\tilde {\mu }_{(1)})+\Re (\tilde {T}_2^{*}\tilde {\mu }_{(2)})\) and

(i)

In other words, \(M_{\tilde {X}}(\tilde {T})\) becomes the product of the mgf of \(\tilde {X}_1\) and the mgf of \(\tilde {X}_2\), that is, \(\tilde {X}_1\) and \(\tilde {X}_2\) are independently distributed whenever Σ 12 = O.

Theorem 3.2a.2

Let \(\tilde {X}\sim \tilde {N}_p(\tilde {\mu },\varSigma ),\ \varSigma >O\) , be a nonsingular complex Gaussian vector. Consider the partitioning of \(\tilde {X},\ \tilde {\mu },\ \tilde {T},\ \varSigma \) as in (i) above. Then, the subvectors \(\tilde {X}_1\) and \(\tilde {X}_2\) are independently distributed as complex Gaussian vectors if and only if Σ 12 = O or equivalently, Σ 21 = O.

Exercises 3.2

3.2.1

Construct a 2 × 2 real positive definite matrix A. Then write down a bivariate real Gaussian density where the covariance matrix is this A.

3.2.2

Construct a 2 × 2 Hermitian positive definite matrix B and then construct a complex bivariate Gaussian density. Write the exponent and normalizing constant explicitly.

3.2.3

Construct a 3 × 3 real positive definite matrix A. Then create a real trivariate Gaussian density with this A being the covariance matrix. Write down the exponent and the normalizing constant explicitly.

3.2.4

Repeat Exercise 3.2.3 for the complex Gaussian case.

3.2.5

Let the p × 1 real vector random variable have a p-variate real nonsingular Gaussian density X ∼ N p(μ, Σ), Σ > O. Let L be a p × 1 constant vector. Let u = L X = X L, a linear function of X. Show that E[u] = L μ, Var(u) = L ΣL and that u is a univariate Gaussian with the parameters L μ and L ΣL.

3.2.6

Show that the mgf of u in Exercise 3.2.5 is

$$\displaystyle \begin{aligned}M_u(t)={\mathrm{e}}^{t(L^{\prime}\mu)+\frac{t^2}{2}L^{\prime}\varSigma L}.\end{aligned}$$

3.2.7

What are the corresponding results in Exercises 3.2.5 and 3.2.6 for the nonsingular complex Gaussian case?

3.2.8

Let X ∼ N p(O, Σ), Σ > O, be a real p-variate nonsingular Gaussian vector. Let u 1 = X Σ −1 X, and u 2 = X X. Derive the densities of u 1 and u 2.

3.2.9

Establish Theorem 3.2.1 by using transformation of variables [Hint: Augment the matrix A with a matrix B such that \(C=\left[\begin{array}{c}A\\ B\end{array}\right]\) is p × p and nonsingular. Derive the density of Y = CX, and therefrom, the marginal density of AX.]

3.2.10

By constructing counter examples or otherwise, show the following: Let the real scalar random variables x 1 and x 2 be such that \(x_1\sim N_1(\mu _1,\sigma _1^2), \ \sigma _1>0, x_2\sim N_1(\mu _2,\sigma _2^2), \ \sigma _2>0\) and Cov(x 1, x 2) = 0. Then, the joint density need not be bivariate normal.

3.2.11

Generalize Exercise 3.2.10 to p-vectors X 1 and X 2.

3.2.12

Extend Exercises 3.2.10 and 3.2.11 to the complex domain.

3.3. Marginal and Conditional Densities, Real Case

Let the p × 1 vector X have a real p-variate Gaussian distribution X ∼ N p(μ, Σ), Σ > O. Let X, μ and Σ be partitioned as follows:

$$\displaystyle \begin{aligned}X=\left[\begin{array}{c}X_1\\ X_2\end{array}\right],\ \mu=\left[\begin{array}{c}\mu_{(1)}\\ \mu_{(2)}\end{array}\right],\ \varSigma=\left[\begin{array}{cc}\varSigma_{11}&\varSigma_{12}\\ \varSigma_{21}&\varSigma_{22}\end{array}\right],\ \varSigma^{-1}=\left[\begin{array}{cc}\varSigma^{11}&\varSigma^{12}\\ \varSigma^{21}&\varSigma^{22}\end{array}\right], \end{aligned}$$

where X 1 and μ (1) are r × 1, X 2 and μ (2) are (p − r) × 1, \(\varSigma_{11}\) and \(\varSigma^{11}\) are r × r, and so on. Then

$$\displaystyle \begin{aligned} (X-\mu)^{\prime}\varSigma^{-1}(X-\mu)&=(X_1-\mu_{(1)})^{\prime}\varSigma^{11}(X_1-\mu_{(1)})+(X_1-\mu_{(1)})^{\prime}\varSigma^{12}(X_2-\mu_{(2)})\\ &\ \ \ \ +(X_2-\mu_{(2)})^{\prime}\varSigma^{21}(X_1-\mu_{(1)})+(X_2-\mu_{(2)})^{\prime}\varSigma^{22}(X_2-\mu_{(2)}). \end{aligned} $$
(i)

But

$$\displaystyle \begin{aligned}{}[(X_1-\mu_{(1)})^{\prime}\varSigma^{12}(X_2-\mu_{(2)})]^{\prime}=(X_2-\mu_{(2)})^{\prime}\varSigma^{21}(X_1-\mu_{(1)}) \end{aligned}$$

and both are real 1 × 1. Thus they are equal and we may write their sum as twice either one of them. Collecting the terms containing X 2 − μ (2), we have

$$\displaystyle \begin{aligned} (X_2-\mu_{(2)})^{\prime}\varSigma^{22}(X_2-\mu_{(2)})+2(X_2-\mu_{(2)})^{\prime}\varSigma^{21}(X_1-\mu_{(1)}). \end{aligned} $$
(ii)

If we expand a quadratic form of the type \((X_2-\mu_{(2)}+C)^{\prime}\varSigma^{22}(X_2-\mu_{(2)}+C)\), we have

$$\displaystyle \begin{aligned} (X_2-\mu_{(2)}+C)^{\prime}&\varSigma^{22}(X_2-\mu_{(2)}+C)=(X_2-\mu_{(2)})^{\prime}\varSigma^{22}(X_2-\mu_{(2)})\\ &+(X_2-\mu_{(2)})^{\prime}\varSigma^{22}C+C^{\prime}\varSigma^{22}(X_2-\mu_{(2)})+C^{\prime}\varSigma^{22}C. \end{aligned} $$
(iii)

Comparing (ii) and (iii), let

$$\displaystyle \begin{aligned}\varSigma^{22}C=\varSigma^{21}(X_1-\mu_{(1)})\Rightarrow C=(\varSigma^{22})^{-1}\varSigma^{21}(X_1-\mu_{(1)}).\end{aligned}$$

Then,

$$\displaystyle \begin{aligned}C^{\prime}\varSigma^{22}C=(X_1-\mu_{(1)})^{\prime}\varSigma^{12}(\varSigma^{22})^{-1}\varSigma^{21}(X_1-\mu_{(1)}). \end{aligned}$$

Hence,

$$\displaystyle \begin{aligned} (X-\mu)^{\prime}\varSigma^{-1}(X-\mu)&=(X_1-\mu_{(1)})^{\prime}[\varSigma^{11}-\varSigma^{12}(\varSigma^{22})^{-1}\varSigma^{21}](X_1-\mu_{(1)})\\ &\ \ \ \ \ +(X_2-\mu_{(2)}+C)^{\prime}\varSigma^{22}(X_2-\mu_{(2)}+C),\end{aligned} $$

and after integrating out X 2, the balance of the exponent is \((X_1-\mu _{(1)})^{\prime }\varSigma _{11}^{-1}(X_1-\mu _{(1)})\), where Σ 11 is the r × r leading submatrix in Σ; the reader may refer to Sect. 1.3 for results on the inversion of partitioned matrices. Observe that \(\varSigma _{11}^{-1}=\varSigma ^{11}-\varSigma ^{12}(\varSigma ^{22})^{-1}\varSigma ^{21}\). The integral over X 2 only gives a constant and hence the marginal density of X 1 is

$$\displaystyle \begin{aligned}f_1(X_1)=c_1\ {\mathrm{e}}^{-\frac{1}{2}(X_1-\mu_{(1)})^{\prime}\varSigma_{11}^{-1}(X_1-\mu_{(1)})}. \end{aligned}$$

On noting that it has the same structure as the real multivariate Gaussian density, its normalizing constant can easily be determined and the resulting density is as follows:

$$\displaystyle \begin{aligned} f_1(X_1)=\frac{1}{|\varSigma_{11}|{}^{\frac{1}{2}}(2\pi)^{\frac{r}{2}}}{\mathrm{e}}^{-\frac{1}{2}(X_1-\mu_{(1)})^{\prime}\varSigma_{11}^{-1}(X_1-\mu_{(1)})}, \ \varSigma_{11}>O, {} \end{aligned} $$
(3.3.1)

for \(-\infty <x_j<\infty,\ -\infty <\mu_j<\infty,\ j=1,\ldots,r\), where \(\mu_{(1)}=E[X_1]\) and \(\varSigma_{11}={\mathrm{Cov}}(X_1)\) is the covariance matrix associated with X 1. From symmetry, we obtain the following marginal density of X 2 in the real Gaussian case:

$$\displaystyle \begin{aligned} f_2(X_2)=\frac{1}{|\varSigma_{22}|{}^{\frac{1}{2}}(2\pi)^{\frac{p-r}{2}}}{\mathrm{e}}^{-\frac{1}{2}(X_2-\mu_{(2)})^{\prime}\varSigma_{22}^{-1}(X_2-\mu_{(2)})},\ \varSigma_{22}>O, {} \end{aligned} $$
(3.3.2)

for \(-\infty <x_j<\infty,\ -\infty <\mu_j<\infty,\ j=r+1,\ldots,p\).

Observe that we can permute the elements in X as we please with the corresponding permutations in μ and the covariance matrix Σ. Hence the real Gaussian density in the p-variate case is a multivariate density and not a vector/matrix-variate density. From this property, it follows that every subset of the elements from X has a real multivariate Gaussian distribution and the individual variables have univariate real normal or Gaussian distribution. Hence our derivation of the marginal density of X 1 is a general density for a subset of r elements in X because those r elements can be brought to the first r positions through permutations of the elements in X with the corresponding permutations in μ and Σ.
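Since every subset of the components of X is again Gaussian, with mean vector and covariance matrix obtained by selecting the corresponding entries of μ and Σ, the marginal parameters can be read off by simple index selection. The following is a minimal sketch; the numerical values and the use of numpy/scipy are illustrative assumptions, not part of the text.

```python
import numpy as np
from scipy.stats import multivariate_normal

# illustrative parameters (not from the text)
mu = np.array([1.0, -2.0, 0.5, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5, 0.0],
                  [1.0, 3.0, 0.2, 0.7],
                  [0.5, 0.2, 2.0, 0.3],
                  [0.0, 0.7, 0.3, 1.5]])

idx = [0, 2]                        # keep x_1 and x_3 (any subset works)
mu_1 = mu[idx]                      # the corresponding sub-vector of mu
Sigma_11 = Sigma[np.ix_(idx, idx)]  # the corresponding submatrix of Sigma

# the marginal of (x_1, x_3) is N_2(mu_1, Sigma_11); evaluate its density at a point
x = np.array([0.8, 0.1])
print(multivariate_normal(mean=mu_1, cov=Sigma_11).pdf(x))
```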

The bivariate case

Let us look at the explicit form of the real Gaussian density for p = 2. In the bivariate case,

$$\displaystyle \begin{aligned}\varSigma=\left[\begin{array}{cc}\sigma_{11}&\sigma_{12}\\ \sigma_{21}&\sigma_{22}\end{array}\right],\ \sigma_{21}=\sigma_{12}. \end{aligned}$$

For convenience, let us denote σ 11 by \(\sigma _1^2\ \) and σ 22 by \(\sigma _2^2\). Then σ 12 = σ 1 σ 2 ρ where ρ is the correlation between x 1 and x 2, and for p = 2,

$$\displaystyle \begin{aligned}|\varSigma|=\sigma_1^2\sigma_2^2-(\sigma_1\sigma_2\rho)^2=\sigma_1^2\sigma_2^2(1-\rho^2). \end{aligned}$$

Thus, in that case,

$$\displaystyle \begin{aligned}\varSigma^{-1}=\frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)}\left[\begin{array}{cc}\sigma_2^2&-\sigma_1\sigma_2\rho\\ -\sigma_1\sigma_2\rho&\sigma_1^2\end{array}\right]. \end{aligned}$$

Hence, substituting these into the general expression for the real Gaussian density and denoting the real bivariate density as f(x 1, x 2), we have the following:

$$\displaystyle \begin{aligned} f(x_1,x_2)=\frac{1}{(2\pi)\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\Big\{-\frac{1}{2(1-\rho^2)}Q\Big\}{} \end{aligned} $$
(3.3.3)

where Q is the real positive definite quadratic form

$$\displaystyle \begin{aligned}Q=\Big(\frac{x_1-\mu_1}{\sigma_1}\Big)^2-2\rho\Big(\frac{x_1-\mu_1}{\sigma_1}\Big)\Big(\frac{x_2-\mu_2}{\sigma_2}\Big) +\Big(\frac{x_2-\mu_2}{\sigma_2}\Big)^2 \end{aligned}$$

for \(\sigma_1>0,\ \sigma_2>0,\ -1<\rho<1,\ -\infty <x_j<\infty,\ -\infty <\mu_j<\infty,\ j=1,2\).
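As a sanity check, the bivariate form (3.3.3) can be evaluated and compared with the general p-variate density for p = 2. The sketch below assumes numpy and scipy are available; the numerical values of μ1, μ2, σ1, σ2, ρ are arbitrary illustrations.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2, s1, s2, rho = 1.0, -0.5, 2.0, 1.5, 0.6   # illustrative values
x1, x2 = 0.3, 0.4

# density via (3.3.3)
z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
Q = z1**2 - 2 * rho * z1 * z2 + z2**2
f = np.exp(-Q / (2 * (1 - rho**2))) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))

# density via the general form with sigma_12 = sigma_1 sigma_2 rho
Sigma = np.array([[s1**2, s1 * s2 * rho], [s1 * s2 * rho, s2**2]])
g = multivariate_normal(mean=[mu1, mu2], cov=Sigma).pdf([x1, x2])
print(f, g)   # the two values agree
```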

The conditional density of X 1 given X 2, denoted by g 1(X 1|X 2), is the following:

$$\displaystyle \begin{aligned} g_1(X_1|X_2)&=\frac{f(X)}{f_2(X_2)}=\frac{|\varSigma_{22}|{}^{\frac{1}{2}}}{(2\pi)^{\frac{r}{2}}|\varSigma|{}^{\frac{1}{2}}}\\ &\ \ \ \ \ \ \ \ \ \times \exp\Big\{\!\!-\frac{1}{2}[(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)-(X_2-\mu_{(2)})^{\prime}\varSigma_{22}^{-1}(X_2-\mu_{(2)})]\Big\}.\end{aligned} $$

We can simplify the exponent, excluding \(-\frac {1}{2}\), as follows:

$$\displaystyle \begin{aligned} (X-\mu)^{\prime}\varSigma^{-1}(X-\mu)&-(X_2-\mu_{(2)})^{\prime}\varSigma_{22}^{-1}(X_2-\mu_{(2)})\\ &=(X_1-\mu_{(1)})^{\prime}\varSigma^{11}(X_1-\mu_{(1)})+2(X_1-\mu_{(1)})^{\prime}\varSigma^{12}(X_2-\mu_{(2)})\\ &\ \ \ \ +(X_2-\mu_{(2)})^{\prime}\varSigma^{22}(X_2-\mu_{(2)})-(X_2-\mu_{(2)})^{\prime}\varSigma_{22}^{-1}(X_2-\mu_{(2)}).\end{aligned} $$

But \(\varSigma _{22}^{-1}=\varSigma ^{22}-\varSigma ^{21}(\varSigma ^{11})^{-1}\varSigma ^{12}\). Hence the terms containing Σ 22 are canceled. The remaining terms containing X 2 − μ (2) are

$$\displaystyle \begin{aligned}2(X_1-\mu_{(1)})^{\prime}\varSigma^{12}(X_2-\mu_{(2)})+(X_2-\mu_{(2)})^{\prime}\varSigma^{21}(\varSigma^{11})^{-1}\varSigma^{12}(X_2-\mu_{(2)}). \end{aligned}$$

Combining these two terms with \((X_1-\mu_{(1)})^{\prime}\varSigma^{11}(X_1-\mu_{(1)})\) results in the quadratic form \((X_1-\mu_{(1)}+C)^{\prime}\varSigma^{11}(X_1-\mu_{(1)}+C)\) where \(C=(\varSigma^{11})^{-1}\varSigma^{12}(X_2-\mu_{(2)})\). Now, noting that

$$\displaystyle \begin{aligned}\frac{|\varSigma_{22}|{}^{\frac{1}{2}}}{|\varSigma|{}^{\frac{1}{2}}}=\left[\frac{|\varSigma_{22}|}{|\varSigma_{22}|~|\varSigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}|}\right]^{\frac{1}{2}} =\frac{1}{|\varSigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}|{}^{\frac{1}{2}}}\,, \end{aligned}$$

the conditional density of X 1 given X 2, which is denoted by g 1(X 1|X 2), can be expressed as follows:

$$\displaystyle \begin{aligned} g_1(X_1|X_2)&=\frac{1}{(2\pi)^{\frac{r}{2}}|\varSigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}|{}^{\frac{1}{2}}}\\ &\ \ \ \times\exp\Big\{-\frac{1}{2}(X_1-\mu_{(1)}+C)^{\prime}(\varSigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21})^{-1}(X_1-\mu_{(1)}+C)\Big\}{} \end{aligned} $$
(3.3.4)

where C = (Σ 11)−1 Σ 12(X 2 − μ (2)). Hence, the conditional expectation and covariance of X 1 given X 2 are

$$\displaystyle \begin{aligned} E[X_1|X_2]&=\mu_{(1)}-C=\mu_{(1)}-(\varSigma^{11})^{-1}\varSigma^{12}(X_2-\mu_{(2)})\\ &=\mu_{(1)}+\varSigma_{12}\varSigma_{22}^{-1}(X_2-\mu_{(2)})\mbox{,}\ \ \mbox{which is linear in }X_2.\\ {\mathrm{Cov}}(X_1|X_2)&=\varSigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21},\ \mbox{which is free of }X_2.{} \end{aligned} $$
(3.3.5)

From the inverses of partitioned matrices obtained in Sect. 1.3, we have − (Σ 11)−1 Σ 12 \(=\varSigma _{12}\varSigma _{22}^{-1}\), which yields the representation of the conditional expectation appearing in Eq. (3.3.5). The matrix \(\varSigma _{12}\varSigma _{22}^{-1}\) is often called the matrix of regression coefficients. From symmetry, it follows that the conditional density of X 2, given X 1, denoted by g 2(X 2|X 1), is given by

$$\displaystyle \begin{aligned} g_2(X_2|X_1)&=\frac{1}{(2\pi)^{\frac{p-r}{2}}|\varSigma_{22}-\varSigma_{21}\varSigma_{11}^{-1}\varSigma_{12}|{}^{\frac{1}{2}}}\\ &\ \ \ \ \times \exp\Big\{\!\!-\frac{1}{2}(X_2-\mu_{(2)}+C_1)^{\prime}(\varSigma_{22}-\varSigma_{21}\varSigma_{11}^{-1}\varSigma_{12})^{-1}(X_2-\mu_{(2)}+C_1)\Big\}{} \end{aligned} $$
(3.3.6)

where C 1 = (Σ 22)−1 Σ 21(X 1 − μ (1)), and the conditional expectation and conditional variance of X 2 given X 1 are

$$\displaystyle \begin{aligned} E[X_2|X_1]&=\mu_{(2)}-C_1=\mu_{(2)}-(\varSigma^{22})^{-1}\varSigma^{21}(X_1-\mu_{(1)})\\ &=\mu_{(2)}+\varSigma_{21}\varSigma_{11}^{-1}(X_1-\mu_{(1)})\mbox{,}\ \ \mbox{which is linear in }X_1\\ {\mathrm{Cov}}(X_2|X_1)&=\varSigma_{22}-\varSigma_{21}\varSigma_{11}^{-1}\varSigma_{12} \mbox{,}\ \ \mbox{which is free of }X_1,{} \end{aligned} $$
(3.3.7)

the matrix \(\varSigma _{21}\varSigma _{11}^{-1}\) being often called the matrix of regression coefficients.
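The conditional quantities in (3.3.5) and (3.3.7) involve only the blocks of μ and Σ, so they are straightforward to compute. A minimal sketch follows (numpy assumed; the numbers are illustrative, with r = 1 and p = 3):

```python
import numpy as np

# illustrative mean vector and covariance matrix, partitioned with r = 1
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
r = 1
S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]

X2 = np.array([2.0, 0.0])                       # an observed value of X_2
reg = S12 @ np.linalg.inv(S22)                  # matrix of regression coefficients

E_X1_given_X2 = mu[:r] + reg @ (X2 - mu[r:])    # (3.3.5), linear in X_2
Cov_X1_given_X2 = S11 - reg @ S21               # free of X_2
print(E_X1_given_X2, Cov_X1_given_X2)
```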

What is then the conditional expectation of x 1 given x 2 in the bivariate normal case? From formula (3.3.5) for p = 2, we have

$$\displaystyle \begin{aligned} E[X_1|X_2]&=\mu_{(1)}+\varSigma_{12}\varSigma_{22}^{-1}(X_2-\mu_{(2)})=\mu_1+\frac{\sigma_{12}}{\sigma_2^2}(x_2-\mu_2)\\ &=\mu_1+\frac{\sigma_1\sigma_2\rho}{\sigma_2^2}(x_2-\mu_2)=\mu_1+\frac{\sigma_1}{\sigma_2}\rho (x_2-\mu_2)=E[x_1|x_2],{} \end{aligned} $$
(3.3.8)

which is linear in x 2. The coefficient \(\frac {\sigma _1}{\sigma _2}\rho \) is often referred to as the regression coefficient. Then, from (3.3.7) we have

$$\displaystyle \begin{aligned} E[x_2|x_1]=\mu_2+\frac{\sigma_2}{\sigma_1}\rho(x_1-\mu_1)\mbox{,}\ \ \mbox{which is linear in }x_1{} \end{aligned} $$
(3.3.9)

and \(\frac {\sigma _2}{\sigma _1}\rho \) is the regression coefficient. Thus, (3.3.8) gives the best predictor of x 1 based on x 2 and (3.3.9), the best predictor of x 2 based on x 1, both being linear in the case of a multivariate real normal distribution; in this case, we have a bivariate normal distribution.

Example 3.3.1

Let X, x 1, x 2, x 3, E[X] = μ, Cov(X) = Σ be specified as follows where X ∼ N 3(μ, Σ), Σ > O:

$$\displaystyle \begin{aligned}X=\left[\begin{array}{c}x_1\\ x_2\\ x_3\end{array}\right],\ X_1=x_1,\ X_2=\left[\begin{array}{c}x_2\\ x_3\end{array}\right],\ \mu=\left[\begin{array}{r}-1\\ 0\\ -2\end{array}\right],\ \varSigma=\left[\begin{array}{rrr}3&-2&0\\ -2&2&1\\ 0&1&3\end{array}\right]. \end{aligned}$$

Compute (1) the marginal densities of x 1 and X 2; (2) the conditional density of x 1 given X 2 and the conditional density of X 2 given x 1; (3) conditional expectations or regressions of x 1 on X 2 and X 2 on x 1.

Solution 3.3.1

Let us partition Σ accordingly, that is,

Let us compute the following quantities:

As well,

Then we have the following:

(i)

and

$$\displaystyle \begin{aligned} {\mathrm{Cov}}(X_1|X_2)=\varSigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}=\frac{3}{5}; \end{aligned} $$
(ii)
(iii)

and

(iv)

The distributions of x 1 and X 2 are respectively x 1 ∼ N 1(−1, 3) and X 2 ∼ N 2(μ (2), Σ 22), the corresponding densities denoted by f 1(x 1) and f 2(X 2) being

$$\displaystyle \begin{aligned} f_1(x_1)&=\frac{1}{\sqrt{(2\pi)}\sqrt{3}}{\mathrm{e}}^{-\frac{1}{6}(x_1+1)^2},\ -\infty<x_1<\infty,\\ f_2(X_2)&=\frac{1}{(2\pi)\sqrt{5}}{\mathrm{e}}^{-\frac{1}{2}Q_1},\ Q_1=\frac{1}{5}[3(x_2)^2-2(x_2)(x_3+2)+2(x_3+2)^2]\end{aligned} $$

for \(-\infty <x_j<\infty,\ j=2,3\). The conditional distributions are X 1|X 2 ∼ N 1(E(X 1|X 2), Var(X 1|X 2)) and X 2|X 1 ∼ N 2(E(X 2|X 1), Cov(X 2|X 1)), the associated densities denoted by g 1(X 1|X 2) and g 2(X 2|X 1) being given by

$$\displaystyle \begin{aligned} g_1(X_1|X_2)&=\frac{1}{\sqrt{(2\pi)}(3/5)^{\frac{1}{2}}}{\mathrm{e}}^{-\frac{5}{6}[x_1+1+\frac{6}{5}x_2-\frac{2}{5}(x_3+2)]^2},\\ g_2(X_2|X_1)&=\frac{1}{2\pi \times 1}{\mathrm{e}}^{-\frac{1}{2}Q_2},\\ Q_2&=3\Big[x_2+\frac{2}{3}(x_1+1)\Big]^2-2\Big[x_2+\frac{2}{3}(x_1+1)\Big](x_3+2)+\frac{2}{3}(x_3+2)^2\end{aligned} $$

for \(-\infty <x_j<\infty,\ j=1,2,3\). This completes the computations.
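The numerical quantities appearing in Solution 3.3.1 can be reproduced directly from the block formulas; the following sketch (numpy assumed) restates the μ and Σ of Example 3.3.1 explicitly and prints |Σ 22|, Cov(x 1|X 2), the regression coefficients and Cov(X 2|x 1).

```python
import numpy as np

mu = np.array([-1.0, 0.0, -2.0])
Sigma = np.array([[3.0, -2.0, 0.0],
                  [-2.0, 2.0, 1.0],
                  [0.0, 1.0, 3.0]])

S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]
reg = S12 @ np.linalg.inv(S22)                     # Sigma_12 Sigma_22^{-1}

print(np.linalg.det(S22))                          # |Sigma_22| = 5
print(S11 - reg @ S21)                             # Cov(x_1 | X_2) = 3/5
print(reg)                                         # regression coefficients [-6/5, 2/5]
print(S22 - S21 @ np.linalg.inv(S11) @ S12)        # Cov(X_2 | x_1), determinant 1
```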

3.3a. Conditional and Marginal Densities in the Complex Case

Let the p × 1 complex vector \(\tilde {X}\) have the p-variate complex normal distribution, \(\tilde {X}\sim \tilde {N}_p(\tilde {\mu },\tilde {\varSigma }),\ \tilde {\varSigma }>O\). As can be seen from the corresponding mgf which was derived in Sect. 3.2a, all subsets of the variables \(\tilde {x}_1,\ldots , \tilde {x}_p\) are again complex Gaussian distributed. This result can be obtained by integrating out the remaining variables from the p-variate complex Gaussian density. Let \(\tilde {U}=\tilde {X}-\tilde {\mu }\) for convenience. Partition \(\tilde {X}, \tilde {\mu }, \ \tilde {U}\) into subvectors and Σ into submatrices as follows:

$$\displaystyle \begin{aligned}\tilde{X}=\left[\begin{array}{c}\tilde{X}_1\\ \tilde{X}_2\end{array}\right],\ \tilde{\mu}=\left[\begin{array}{c}\tilde{\mu}_{(1)}\\ \tilde{\mu}_{(2)}\end{array}\right],\ \tilde{U}=\left[\begin{array}{c}\tilde{U}_1\\ \tilde{U}_2\end{array}\right],\ \varSigma=\left[\begin{array}{cc}\varSigma_{11}&\varSigma_{12}\\ \varSigma_{21}&\varSigma_{22}\end{array}\right],\ \varSigma^{-1}=\left[\begin{array}{cc}\varSigma^{11}&\varSigma^{12}\\ \varSigma^{21}&\varSigma^{22}\end{array}\right], \end{aligned}$$

where \(\tilde {X}_1,\ \tilde {\mu }_{(1)},\ \tilde {U}_1\) are r × 1 and \(\varSigma_{11}\) is r × r. Consider

$$\displaystyle \begin{aligned}\tilde{U}^{*}\varSigma^{-1}\tilde{U}=\tilde{U}_1^{*}\varSigma^{11}\tilde{U}_1+\tilde{U}_1^{*}\varSigma^{12}\tilde{U}_2+\tilde{U}_2^{*}\varSigma^{21}\tilde{U}_1+\tilde{U}_2^{*}\varSigma^{22}\tilde{U}_2 \end{aligned}$$
(i)

and suppose that we wish to integrate out \(\tilde {U}_2\) to obtain the marginal density of \(\tilde {U}_1\). The terms containing \(\tilde {U}_2\) are \(\tilde {U}_2^{*}\varSigma ^{22}\tilde {U}_2+\tilde {U}_1^{*}\varSigma ^{12}\tilde {U}_2+\tilde {U}_2^{*}\varSigma ^{21}\tilde {U}_1\). On expanding the Hermitian form

$$\displaystyle \begin{aligned} (\tilde{U}_2+C)^{*}\varSigma^{22}(\tilde{U}_2+C)&=\tilde{U}_2^{*}\varSigma^{22}\tilde{U}_2+\tilde{U}_2^{*}\varSigma^{22}C\\ &\ \ \ \ +C^{*}\varSigma^{22}\tilde{U}_2+C^{*}\varSigma^{22}C, \end{aligned} $$
(ii)

for some C and comparing (i) and (ii), we may let \(\varSigma ^{21}\tilde {U}_1=\varSigma ^{22}C\Rightarrow C=(\varSigma ^{22})^{-1}\varSigma ^{21}\tilde {U}_1\). Then \(C^{*}\varSigma ^{22}C=\tilde {U}_1^{*}\varSigma ^{12}(\varSigma ^{22})^{-1}\varSigma ^{21}\tilde {U}_1\) and (i) may thus be written as

$$\displaystyle \begin{aligned}\tilde{U}_1^{*}(\varSigma^{11}-\varSigma^{12}(\varSigma^{22})^{-1}\varSigma^{21})\tilde{U}_1+(\tilde{U}_2+C)^{*}\varSigma^{22}(\tilde{U}_2+C),\ C=(\varSigma^{22})^{-1}\varSigma^{21}\tilde{U}_1. \end{aligned}$$

However, from Sect. 1.3 on partitioned matrices, we have

$$\displaystyle \begin{aligned}\varSigma^{11}-\varSigma^{12}(\varSigma^{22})^{-1}\varSigma^{21}=\varSigma_{11}^{-1}. \end{aligned}$$

As well,

$$\displaystyle \begin{aligned} {\mathrm{det}}(\varSigma)&=[{\mathrm{det}}(\varSigma_{11})][{\mathrm{det}}(\varSigma_{22}-\varSigma_{21}\varSigma_{11}^{-1}\varSigma_{12})]\\ &=[{\mathrm{det}}(\varSigma_{11})][{\mathrm{det}}((\varSigma^{22})^{-1})].\end{aligned} $$

Note that the integral of \(\exp \{-(\tilde {U}_2+C)^{*}\varSigma ^{22}(\tilde {U}_2+C)\}\) over \(\tilde {U}_2\) gives \(\pi ^{p-r}|{\mathrm {det}}(\varSigma ^{22})^{-1}|=\pi ^{p-r}|{\mathrm {det}}(\varSigma _{22}-\varSigma _{21}\varSigma _{11}^{-1}\varSigma _{12})|\). Hence the marginal density of \(\tilde {X}_1\) is

$$\displaystyle \begin{aligned} \tilde{f}_1(\tilde{X}_1)=\frac{1}{\pi^r|{\mathrm{det}}(\varSigma_{11})|}{\mathrm{e}}^{-(\tilde{X}_1-\tilde{\mu}_{(1)})^{*}\varSigma_{11}^{-1}(\tilde{X}_1-\tilde{\mu}_{(1)})},\ \varSigma_{11}>O.{} \end{aligned} $$
(3.3a.1)

It is an r-variate complex Gaussian density. Similarly \(\tilde {X}_2\) has the (p − r)-variate complex Gaussian density

$$\displaystyle \begin{aligned} \tilde{f}_2(\tilde{X}_2)=\frac{1}{\pi^{p-r}|{\mathrm{det}}(\varSigma_{22})|}{\mathrm{e}}^{-(\tilde{X}_2-\tilde{\mu}_{(2)})^{*}\varSigma_{22}^{-1}(\tilde{X}_2-\tilde{\mu}_{(2)})},\ \varSigma_{22}>O.{} \end{aligned} $$
(3.3a.2)

Hence, the conditional density of \(\tilde {X}_1\) given \(\tilde {X}_2\), is

$$\displaystyle \begin{aligned} \tilde{g}_1(\tilde{X}_1|\tilde{X}_2)&=\frac{\tilde{f}(\tilde{X}_1,\tilde{X}_2)}{\tilde{f}_2(\tilde{X}_2)}=\frac{\pi^{p-r}|{\mathrm{det}}(\varSigma_{22})|}{\pi^p|{\mathrm{det}}(\varSigma)|}\\ &\ \ \ \times {\mathrm{e}}^{-(\tilde{X}-\tilde{\mu})^{*}\varSigma^{-1}(\tilde{X}-\tilde{\mu}) +(\tilde{X}_2-\tilde{\mu}_{(2)})^{*}\varSigma_{22}^{-1}(\tilde{X}_2-\tilde{\mu}_{(2)})}.\end{aligned} $$

From Sect. 1.3, we have

$$\displaystyle \begin{aligned}|{\mathrm{det}}(\varSigma)|=|{\mathrm{det}}(\varSigma_{22})|~|{\mathrm{det}}(\varSigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21})|\end{aligned}$$

and then the normalizing constant is \([\pi ^r|{\mathrm {det}}(\varSigma _{11}-\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21})|]^{-1}\). The exponential part reduces to the following by taking \(\tilde {U}=\tilde {X}-\tilde {\mu },\ \tilde {U}_1=\tilde {X}_1-\tilde {\mu }_{(1)},\ \tilde {U}_2=\tilde {X}_2-\tilde {\mu }_{(2)}\):

$$\displaystyle \begin{aligned} (\tilde{X}-\tilde{\mu})^{*}\varSigma^{-1}(\tilde{X}-\tilde{\mu})&-(\tilde{X}_2-\tilde{\mu}_{(2)})^{*}\varSigma_{22}^{-1}(\tilde{X}_2-\tilde{\mu}_{(2)})\\ &=\tilde{U}_1^{*}\varSigma^{11}\tilde{U}_1 +\tilde{U}_2^{*}\varSigma^{22}\tilde{U}_2+\tilde{U}_1^{*}\varSigma^{12}\tilde{U}_2\\ &\ \ \ +\tilde{U}_2^{*}\varSigma^{21}\tilde{U}_1 -\tilde{U}_2^{*}(\varSigma^{22}-\varSigma^{21}(\varSigma^{11})^{-1}\varSigma^{12})\tilde{U}_2\\ &=\tilde{U}_1^{*}\varSigma^{11}\tilde{U}_1+\tilde{U}_2^{*}\varSigma^{21}(\varSigma^{11})^{-1}\varSigma^{12}\tilde{U}_2+\tilde{U}_1^{*}\varSigma^{12}\tilde{U}_2+\tilde{U}_2^{*}\varSigma^{21}\tilde{U}_1\\ &=[\tilde{U}_1+(\varSigma^{11})^{-1}\varSigma^{12}\tilde{U}_2]^{*}\varSigma^{11}[\tilde{U}_1+(\varSigma^{11})^{-1}\varSigma^{12}\tilde{U}_2].{} \end{aligned} $$
(3.3a.3)

This exponent has the same structure as that of a complex Gaussian density with \(E[\tilde {U}_1|\tilde {U}_2]=-(\varSigma ^{11})^{-1}\varSigma ^{12}\tilde {U}_2\) and \({\mathrm {Cov}}(\tilde {X}_1|\tilde {X}_2)=(\varSigma ^{11})^{-1}=\varSigma _{11}-\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}.\) Therefore the conditional density of \(\tilde {X}_1\) given \(\tilde {X}_2\) is given by

$$\displaystyle \begin{aligned} \tilde{g}_1(\tilde{X}_1|\tilde{X}_2)&=\frac{1}{\pi^r|{\mathrm{det}}(\varSigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21})|} {\mathrm{e}}^{-(\tilde{X}_1-\tilde{\mu}_{(1)}+C)^{*}\varSigma^{11}(\tilde{X}_1-\tilde{\mu}_{(1)}+C)},\\ &\ \ \ \ \ \ \ \ \ \ \ \ \ \ \qquad \qquad \qquad \qquad C=-(\varSigma^{11})^{-1}\varSigma^{12}(\tilde{X}_2-\tilde{\mu}_{(2)}).{} \end{aligned} $$
(3.3a.4)

The conditional expectation of \(\tilde {X}_1\) given \(\tilde {X}_2\) is then

$$\displaystyle \begin{aligned} E[\tilde{X}_1|\tilde{X}_2]&=\tilde{\mu}_{(1)}-(\varSigma^{11})^{-1}\varSigma^{12}(\tilde{X}_2-\tilde{\mu}_{(2)})\\ &=\tilde{\mu}_{(1)}+\varSigma_{12}\varSigma_{22}^{-1}(\tilde{X}_2-\tilde{\mu}_{(2)}) \ \ \mbox{(linear in }\tilde{X}_2\text{)}{} \end{aligned} $$
(3.3a.5)

which follows from a result on partitioning of matrices obtained in Sect. 1.3. The matrix \(\varSigma _{12}\varSigma _{22}^{-1}\) is referred to as the matrix of regression coefficients. The conditional covariance matrix is

$$\displaystyle \begin{aligned}{\mathrm{Cov}}(\tilde{X}_1|\tilde{X}_2)=\varSigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}\ \mbox{(free of }\tilde{X}_2\text{)}. \end{aligned}$$

From symmetry, the conditional density of \(\tilde {X}_2\) given \(\tilde {X}_1\) is given by

$$\displaystyle \begin{aligned} \tilde{g}_2(\tilde{X}_2|\tilde{X}_1)&=\frac{1}{\pi^{p-r}|{\mathrm{det}}(\varSigma_{22}-\varSigma_{21}\varSigma_{11}^{-1}\varSigma_{12})|}\\ &\ \ \ \ \times{\mathrm{e}}^{-(\tilde{X}_2-\tilde{\mu}_{(2)}+C_1)^{*}\varSigma^{22}(\tilde{X}_2-\tilde{\mu}_{(2)}+C_1)},{}\\ &\qquad C_1=-(\varSigma^{22})^{-1}\varSigma^{21}(\tilde{X}_1-\tilde{\mu}_{(1)}),\ \varSigma^{22}>O. \end{aligned} $$
(3.3a.6)

Then the conditional expectation and the conditional covariance of \(\tilde {X}_2\) given \(\tilde {X}_1\) are the following:

$$\displaystyle \begin{aligned} E[\tilde{X}_2|\tilde{X}_1]&=\tilde{\mu}_{(2)}-(\varSigma^{22})^{-1}\varSigma^{21}(\tilde{X}_1-\tilde{\mu}_{(1)})\\ &=\tilde{\mu}_{(2)}+\varSigma_{21}\varSigma_{11}^{-1}(\tilde{X}_1-\tilde{\mu}_{(1)})\ \ \mbox{(linear in }\tilde{X}_1\mbox{)}{}\\ {\mathrm{Cov}}(\tilde{X}_2|\tilde{X}_1)&=(\varSigma^{22})^{-1}=\varSigma_{22}-\varSigma_{21}\varSigma_{11}^{-1} \varSigma_{12}\ \ \mbox{(free of }\tilde{X}_1\text{)}, \end{aligned} $$
(3.3a.7)

where, in this case, the matrix \(\varSigma _{21}\varSigma _{11}^{-1}\) is referred to as the matrix of regression coefficients.
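The same block computations carry over to the complex case, with conjugate transposes in place of transposes. A minimal sketch (numpy assumed; the Hermitian positive definite Σ and the mean vector below are arbitrary illustrations, not the data of any example in the text):

```python
import numpy as np

# illustrative Hermitian positive definite covariance matrix and mean vector
Sigma = np.array([[2.0, 0.5 + 0.5j, 0.2 - 0.1j],
                  [0.5 - 0.5j, 1.5, 0.3 + 0.2j],
                  [0.2 + 0.1j, 0.3 - 0.2j, 1.0]])
mu = np.array([1 + 1j, -1j, 2.0])
r = 1
S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]

X2 = np.array([0.5 + 1j, -0.5j])                    # an observed value of X_2-tilde
reg = S12 @ np.linalg.inv(S22)                      # Sigma_12 Sigma_22^{-1}

E_X1_given_X2 = mu[:r] + reg @ (X2 - mu[r:])        # (3.3a.5), linear in X_2-tilde
Cov_X1_given_X2 = S11 - reg @ S21                   # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
print(E_X1_given_X2, Cov_X1_given_X2)
```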

Example 3.3a.1

Let \(\tilde {X},\ \tilde {\mu }=E[\tilde {X}],\ \varSigma ={\mathrm {Cov}}(\tilde {X})\) be as follows:

$$\displaystyle \begin{aligned}\tilde{X}=\left[\begin{array}{c}\tilde{x}_1\\ \tilde{x}_2\\ \tilde{x}_3\end{array}\right],\ \tilde{\mu}=\left[\begin{array}{c}1+i\\ 2-i\\ 3i\end{array}\right],\ \varSigma=\left[\begin{array}{ccc}3&1+i&0\\ 1-i&2&i\\ 0&-i&3\end{array}\right]. \end{aligned}$$

Consider the partitioning

$$\displaystyle \begin{aligned}\tilde{X}_1=\left[\begin{array}{c}\tilde{x}_1\\ \tilde{x}_2\end{array}\right],\ \tilde{X}_2=\tilde{x}_3,\ \tilde{\mu}_{(1)}=\left[\begin{array}{c}1+i\\ 2-i\end{array}\right],\ \tilde{\mu}_{(2)}=3i,\ \varSigma_{11}=\left[\begin{array}{cc}3&1+i\\ 1-i&2\end{array}\right],\ \varSigma_{12}=\left[\begin{array}{c}0\\ i\end{array}\right],\ \varSigma_{21}=\varSigma_{12}^{*},\ \varSigma_{22}=(3), \end{aligned}$$

where \(\tilde {x}_j,\ j=1,2,3\) are scalar complex variables and \(\tilde {X}\sim \tilde {N}_3(\tilde {\mu },\varSigma )\). Determine (1) the marginal densities of \(\tilde {X}_1\) and \(\tilde {X}_2\); (2) the conditional expectation of \(\tilde {X}_1|\tilde {X}_2\) or \(E[\tilde {X}_1|\tilde {X}_2]\) and the conditional expectation of \(\tilde {X}_2|\tilde {X}_1\) or \(E[\tilde {X}_2|\tilde {X}_1]\); (3) the conditional densities of \(\tilde {X}_1|\tilde {X}_2\) and \(\tilde {X}_2|\tilde {X}_1\).

Solution 3.3a.1

Note that Σ = Σ and hence Σ is Hermitian. Let us compute the leading minors of Σ: \(|(3)|=3>0\), \({\mathrm{det}}\left[\begin{array}{cc}3&1+i\\ 1-i&2\end{array}\right]=6-(1+i)(1-i)=4>0\), \({\mathrm{det}}(\varSigma)=9>0\).

Hence Σ is Hermitian positive definite. Note that the cofactor expansion for determinants holds whether the elements present in the determinant are real or complex. Let us compute the inverses of the submatrices by taking the transpose of the matrix of cofactors divided by the determinant. This formula applies whether the elements comprising the matrix are real or complex. Then

(i)
(ii)
(iii)
(iv)
$$\displaystyle \begin{aligned}{}[\varSigma_{22}-\varSigma_{21}\varSigma_{11}^{-1}\varSigma_{12}]^{-1}&=\frac{4}{9}. \end{aligned} $$
(v)

As well,

(vi)
(vii)

With these computations, all the questions can be answered. We have

and \(\tilde {X}_2=\tilde {x}_3\sim \tilde {N}_1(3i,3)\). Let the densities of \(\tilde {X}_1\) and \(\tilde {X}_2=\tilde {x}_3\) be denoted by \(\tilde {f}_1(\tilde {X}_1)\) and \(\tilde {f}_2(\tilde {x}_3)\), respectively. Then

$$\displaystyle \begin{aligned} \tilde{f}_2(\tilde{x}_3)&=\frac{1}{(\pi)(3)}{\mathrm{e}}^{-\frac{1}{3}(\tilde{x}_3 -3i)^{*}(\tilde{x}_3-3i)};\\ \tilde{f}_1(\tilde{X}_1)&=\frac{1}{(\pi^2)(4)}{\mathrm{e}}^{-Q_1},\\ Q_1&=\frac{1}{4}[2(\tilde{x}_1-(1+i))^{*}(\tilde{x}_1-(1+i))\\ &\ \ \ \ -(1+i)(\tilde{x}_1-(1+i))^{*}(\tilde{x}_2-(2-i))-(1-i)(\tilde{x}_2\\ &\ \ \ \ -(2-i))^{*}(\tilde{x}_1-(1+i))+3(\tilde{x}_2-(2-i))^{*}(\tilde{x}_2-(2-i))].\end{aligned} $$

The conditional densities, denoted by \(\tilde {g}_1(\tilde {X}_1|\tilde {X}_2)\) and \(\tilde {g}_2(\tilde {X}_2|\tilde {X}_1)\), are the following:

$$\displaystyle \begin{aligned} \tilde{g}_1(\tilde{X}_1|\tilde{X}_2)&=\frac{1}{(\pi^2)(3)}{\mathrm{e}}^{-Q_2},\\ Q_2&=\frac{1}{3}\Big[\frac{5}{3}(\tilde{x}_1-(1+i))^{*}(\tilde{x}_1-(1+i))\\ &\ \ \ \ \ \ -(1+i)(\tilde{x}_1-(1+i))^{*}(\tilde{x}_2-(2-i)-\frac{i}{3}(\tilde{x}_3-3i))\\ &\ \ \ \ \ \ -(1-i)(\tilde{x}_2-(2-i)-\frac{i}{3}(\tilde{x}_3-3i))^{*}(\tilde{x}_1-(1+i))\\ &\ \ \ \ \ \ +3(\tilde{x}_2-(2-i)-\frac{i}{3}(\tilde{x}_3-3i))^{*}(\tilde{x}_2-(2-i)-\frac{i}{3}(\tilde{x}_3-3i))\Big];\end{aligned} $$
$$\displaystyle \begin{aligned} \tilde{g}_2(\tilde{X}_2|\tilde{X}_1)&=\frac{1}{(\pi)(9/4)}{\mathrm{e}}^{-Q_3},\\ Q_3&=\frac{4}{9}(\tilde{x}_3-M_3)^{*}(\tilde{x}_3-M_3),\mbox{ where }\\ M_3&=3i+\frac{1}{4}\{(1+i)[\tilde{x}_1-(1+i)]-3i[\tilde{x}_2-(2-i)]\}\\ &=3i+\frac{1}{4}\{(1+i)\tilde{x}_1-3i\tilde{x}_2+3+4i\}.\end{aligned} $$

The bivariate complex Gaussian case

Letting ρ denote the correlation between \(\tilde {x}_1\) and \(\tilde {x}_2\), it is seen from (3.3a.5) that for p = 2,

$$\displaystyle \begin{aligned} E[\tilde{X}_1|\tilde{X}_2]&=\tilde{\mu}_{(1)}+\varSigma_{12}\varSigma_{22}^{-1}(\tilde{X}_2-\tilde{\mu}_{(2)})=\tilde{\mu}_1+\frac{\sigma_{12}}{\sigma_2^2}(\tilde{x}_2-\tilde{\mu}_2)\\ &=\tilde{\mu}_1+\frac{\sigma_1\sigma_2\rho}{\sigma_2^2}(\tilde{x}_2-\tilde{\mu}_2)=\tilde{\mu}_1+\frac{\sigma_1} {\sigma_2}\rho\,(\tilde{x}_2-\tilde{\mu}_2)=E[\tilde{x}_1|\tilde{x}_2]\ \ \mbox{(linear in }\tilde{x}_2\text{)}.{} \end{aligned} $$
(3.3a.8)

Similarly,

$$\displaystyle \begin{aligned} E[\tilde{x}_2|\tilde{x}_1]=\tilde{\mu}_2+\frac{\sigma_2}{\sigma_1}\rho\,(\tilde{x}_1-\tilde{\mu}_1)\ \ \mbox{(linear in }\tilde{x}_1\text{)}.{} \end{aligned} $$
(3.3a.9)

Incidentally, \({\sigma _{12}}/{\sigma _2^2}\) and \({\sigma _{12}}/{\sigma _1^2}\) are referred to as the regression coefficients.

Exercises 3.3

3.2.13

Let the real p × 1 vector X have a p-variate nonsingular normal density X ∼ N p(μ, Σ), Σ > O. Let u = X Σ −1 X. Make use of the mgf to derive the density of u for (1) μ = O, (2) μ ≠ O.

3.2.14

Repeat Exercise 3.2.13 for μ ≠ O for the complex nonsingular Gaussian case.

3.2.15

Observing that the density resulting from Exercise 3.2.13 is a noncentral chisquare density arising from the real p-variate Gaussian, derive the noncentral F density with m and n degrees of freedom, where the numerator chisquare is noncentral, the denominator chisquare is central, and the two chisquares are independently distributed.

3.2.16

Repeat Exercise 3.2.15 for the complex Gaussian case.

3.2.17

Taking the density of u in Exercise 3.2.13 as a real noncentral chisquare density, derive the density of a real doubly noncentral F.

3.2.18

Repeat Exercise 3.2.17 for the corresponding complex case.

3.2.19

Construct a 3 × 3 Hermitian positive definite matrix V . Let this be the covariance matrix of a 3 × 1 vector variable \(\tilde {X}\). Compute V −1. Then construct a Gaussian density for this \(\tilde {X}\). Derive the marginal joint densities of (1) \(\tilde {x}_1\) and \(\tilde {x}_2\), (2) \(\tilde {x}_1\) and \(\tilde {x}_3\), (3) \(\tilde {x}_2\) and \(\tilde {x}_3\), where \(\tilde {x}_1,\tilde {x}_2,\tilde {x}_3\) are the components of \(\tilde {X}\). Take \(E[\tilde {X}]=O\).

3.2.20

In Exercise 3.2.19, compute (1) \(E[\tilde {x}_1|\tilde {x}_2]\), (2) the conditional joint density of \(\tilde {x}_1,\tilde {x}_2\), given \(\tilde {x}_3\). Take \(E[\tilde {X}]=\tilde {\mu }\ne O\).

3.2.21

In Exercise 3.2.20, compute the mgf in the conditional space of \(\tilde {x}_1\) given \(\tilde {x}_2,\tilde {x}_3\), that is, \(E[{\mathrm {e}}^{\Re (t_1\tilde {x}_1)}|\tilde {x}_2,\tilde {x}_3]\).

3.2.22

In Exercise 3.2.21, compute the mgf in the marginal space of \(\tilde {x}_2,\tilde {x}_3\). What is the connection of the results obtained in Exercises 3.2.21 and 3.2.22 with the mgf of \(\tilde {X}\)?

3.4. Chisquaredness and Independence of Quadratic Forms in the Real Case

Let the p × 1 vector X have a p-variate real Gaussian density with a null vector as its mean value and the identity matrix as its covariance matrix, that is, X ∼ N p(O, I), that is, the components of X are mutually independently distributed real scalar standard normal variables. Let u = X AX, A = A be a real quadratic form in this X. The chisquaredness of a quadratic form such as u has already been discussed in Chap. 2. In this section, we will start with such a u and then consider its generalizations. When A = A , there exists an orthonormal matrix P, that is, PP  = I, P P = I, such that P AP = diag(λ 1, …, λ p) where λ 1, …, λ p are the eigenvalues of A. Letting Y = P X, E[Y ] = P O = O and Cov(Y ) = P IP = I. But Y  is a linear function of X and hence, Y  is also real Gaussian distributed; thus, Y ∼ N p(O, I). Then, \(y_j^2\overset {iid}{\sim }\chi _1^2,\ j=1,\ldots , p,\) or the \(y_j^2\)’s are independently distributed chisquares, each having one degree of freedom. Note that

$$\displaystyle \begin{aligned} u=X^{\prime}AX=Y^{\prime}P^{\prime}APY=\lambda_1y_1^2+\cdots+\lambda_py_p^2.{} \end{aligned} $$
(3.4.1)

We have the following result on the chisquaredness of quadratic forms in the real p-variate Gaussian case, which corresponds to Theorem 2.2.1.

Theorem 3.4.1

Let the p × 1 vector X be real Gaussian with the parameters μ = O and Σ = I, or X ∼ N p(O, I). Let u = X AX, A = A be a quadratic form in this X. Then \(u=X^{\prime }AX\sim \chi _r^2\) , that is, a real chisquare with r degrees of freedom, if and only if A = A 2 and the rank of A is r.

Proof

When A = A , we have the representation of the quadratic form given in (3.4.1). When A = A 2, all the eigenvalues of A are 1’s and 0’s. Then r of the λ j’s are unities and the remaining ones are zeros and then (3.4.1) becomes the sum of r independently distributed real chisquares having one degree of freedom each, and hence the sum is a real chisquare with r degrees of freedom. For proving the second part, we will assume that \(u=X^{\prime }AX\sim \chi _r^2\). Then the mgf of u is \(M_u(t)=(1-2t)^{-\frac {r}{2}}\) for 1 − 2t > 0. The representation in (3.4.1) holds in general. The mgf of \(y_j^2, \ \lambda _jy_j^2\) and the sum of \(\lambda _jy_j^2\) are the following:

$$\displaystyle \begin{aligned}M_{y_j^2}(t)=(1-2t)^{-\frac{1}{2}},\ M_{\lambda_jy_j^2}(t)=(1-2\lambda_jt)^{-\frac{1}{2}},\ M_u(t)=\prod_{j=1}^p(1-2\lambda_jt)^{-\frac{1}{2}} \end{aligned}$$

for 1 − 2λ j t > 0, j = 1, …, p. Hence, we have the following identity:

$$\displaystyle \begin{aligned} (1-2t)^{-\frac{r}{2}}=\prod_{j=1}^p(1-2\lambda_jt)^{-\frac{1}{2}},1-2t>0, 1-2\lambda_jt>0,\ j=1,\ldots, p.{} \end{aligned} $$
(3.4.2)

Taking natural logarithm on both sides of (3.4.2), expanding and then comparing the coefficients of \(2t, \frac {(2t)^2}{2},\ldots ,\) we have

$$\displaystyle \begin{aligned} r=\sum_{j=1}^p\lambda_j=\sum_{j=1}^p\lambda_j^2=\sum_{j=1}^p\lambda_j^3=\cdots{} \end{aligned} $$
(3.4.3)

The only solution (3.4.3) can have is that r of the λ j’s are unities and the remaining ones zeros. This property alone will not guarantee that A is idempotent. However, having eigenvalues that are equal to zero or one combined with the property that A = A will ensure that A = A 2. This completes the proof.
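A quick numerical check of Theorem 3.4.1 (a sketch, not part of the text): construct a symmetric idempotent A of rank r, simulate X ∼ N p(O, I), and compare the first two moments of u = X AX with those of a chisquare having r degrees of freedom (mean r, variance 2r). numpy is assumed, and the particular A below is an arbitrary projection matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 200_000

# an illustrative symmetric idempotent matrix: projection onto a 2-dimensional column space
B = rng.standard_normal((p, 2))
A = B @ np.linalg.inv(B.T @ B) @ B.T          # A = A', A = A^2
r = np.linalg.matrix_rank(A)                  # r = 2 here

X = rng.standard_normal((n, p))               # rows are N_p(O, I) samples
u = np.einsum('ij,jk,ik->i', X, A, X)         # u_i = X_i' A X_i

print(r, u.mean(), u.var())                   # mean ~ r, variance ~ 2r
```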

Let us look into some generalizations of Theorem 3.4.1. Let the p × 1 vector X have a real Gaussian distribution X ∼ N p(O, Σ), Σ > O, that is, X is a Gaussian vector with the null vector as its mean value and a real positive definite matrix as its covariance matrix. When Σ is positive definite, we can define \(\varSigma ^{\frac {1}{2}}\). Letting \(Z=\varSigma ^{-\frac {1}{2}}X\), Z will be distributed as a standard Gaussian vector, that is, Z ∼ N p(O, I), since Z is a linear function of X with E[Z] = O and Cov(Z) = I. Now, Theorem 3.4.1 is applicable to Z. Then u = X AX, A = A , becomes

$$\displaystyle \begin{aligned}u=Z^{\prime}\varSigma^{\frac{1}{2}}A\varSigma^{\frac{1}{2}}Z, \ \varSigma^{\frac{1}{2}}A\varSigma^{\frac{1}{2}}=(\varSigma^{\frac{1}{2}}A\varSigma^{\frac{1}{2}})^{\prime},\end{aligned}$$

and it follows from Theorem 3.4.1 that the next result holds:

Theorem 3.4.2

Let the p × 1 vector X have a real p-variate Gaussian density X ∼ N p(O, Σ), Σ > O. Then q = X AX, A = A , is a real chisquare with r degrees of freedom if and only if \(\varSigma ^{\frac {1}{2}}A\varSigma ^{\frac {1}{2}}\) is idempotent and of rank r or, equivalently, if and only if A = AΣA and the rank of A is r.

Now, let us consider the general case. Let X ∼ N p(μ, Σ), Σ > O. Let q = X AX, A = A . Then, referring to representation (2.2.1), we can express q as

$$\displaystyle \begin{aligned} \lambda_1(u_1+b_1)^2+\cdots+\lambda_p(u_p+b_p)^2\equiv\lambda_1w_1^2+\cdots+\lambda_pw_p^2 {} \end{aligned} $$
(3.4.4)

where \(U=(u_1,\ldots,u_p)^{\prime}\sim N_p(O,I)\), the λ j’s, j = 1, …, p, are the eigenvalues of \(\varSigma ^{\frac {1}{2}}\!A\varSigma ^{\frac {1}{2}}\) and b i is the i-th component of \(P^{\prime }\varSigma ^{-\frac {1}{2}}\mu \), P being a p × p orthonormal matrix whose j-th column is the normalized eigenvector corresponding to λ j, j = 1, …, p. When μ = O, \(w_j^2\) is a real central chisquare random variable having one degree of freedom; otherwise, it is a real noncentral chisquare random variable with one degree of freedom and noncentrality parameter \(\frac {1}{2}b_j^2\). Thus, in general, (3.4.4) is a linear function of independently distributed real noncentral chisquare random variables having one degree of freedom each.
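The ingredients of the representation (3.4.4) are easy to compute numerically: the λ j are the eigenvalues of \(\varSigma^{\frac{1}{2}}A\varSigma^{\frac{1}{2}}\) and the b j are the components of \(P^{\prime}\varSigma^{-\frac{1}{2}}\mu\). A minimal sketch (numpy assumed; μ, Σ and A below are illustrative choices only):

```python
import numpy as np

# illustrative parameters
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.5]])
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])               # A = A'

# symmetric square root of Sigma via its spectral decomposition
d, V = np.linalg.eigh(Sigma)
S_half = V @ np.diag(np.sqrt(d)) @ V.T
S_half_inv = V @ np.diag(1.0 / np.sqrt(d)) @ V.T

lam, P = np.linalg.eigh(S_half @ A @ S_half)  # lambda_j and the columns of P
b = P.T @ S_half_inv @ mu                     # b_j; the noncentrality of w_j^2 is b_j^2 / 2

print(lam)
print(b)
```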

Example 3.4.1

Let X ∼ N 3(O, Σ), q = X AX where

(1) Show that \(q\sim \chi ^2_1\) by applying Theorem 3.4.2 as well as independently; (2) If the mean value vector μ  = [−1, 1, −2], what is then the distribution of q?

Solution 3.4.1

In (1) μ = O and

$$\displaystyle \begin{aligned}X^{\prime}AX=x_1^2+x_2^2+x_3^2+2(x_1x_2+x_1x_3+x_2x_3)=(x_1+x_2+x_3)^2.\end{aligned}$$

Let y 1 = x 1 + x 2 + x 3. Then E[y 1] = 0 and

$$\displaystyle \begin{aligned} {\mathrm{Var}}(y_1)&={\mathrm{Var}}(x_1)+{\mathrm{Var}}(x_2)+{\mathrm{Var}}(x_3)+2[{\mathrm{Cov}}(x_1,x_2)+{\mathrm{Cov}}(x_1,x_3)+{\mathrm{Cov}}(x_2,x_3)]\\ &=\frac{1}{3}[2+2+3+0-2-2]=\frac{3}{3}=1.\end{aligned} $$

Hence, y 1 = x 1 + x 2 + x 3 has E[y 1] = 0 and Var(y 1) = 1, and since it is a linear function of the real normal vector X, y 1 is a standard normal. Accordingly, \(q=y_1^2\sim \) \(\chi ^2_1\). In order to apply Theorem 3.4.2, consider AΣA:

Then, by Theorem 3.4.2, \(q=X^{\prime }AX\sim \chi ^2_r\) where r is the rank of A. In this case, the rank of A is 1 and hence \(q\sim \chi ^2_1\). This completes the calculations in connection with (1). When μ ≠ O, \(q\sim \chi ^2_1(\lambda )\), a noncentral chisquare with noncentrality parameter \(\lambda =\frac {1}{2}\mu ^{\prime }\,\varSigma ^{-1}\mu \). Let us compute Σ −1 by making use of the formula \(\varSigma ^{-1}=\frac {1}{|\varSigma |}[{\mathrm {Cof}}(\varSigma )]^{\prime }\) where Cof(Σ) is the matrix obtained from Σ by replacing each of its elements by its cofactor. Now,

Then,

This completes the computations for the second part.

3.4.1. Independence of quadratic forms

Another relevant result in the real case pertains to the independence of quadratic forms. The concept of chisquaredness and the independence of quadratic forms are prominently encountered in the theoretical underpinnings of statistical techniques such as the Analysis of Variance, Regression and Model Building when it is assumed that the errors are normally distributed. First, we state a result on the independence of quadratic forms in Gaussian vectors whose components are independently distributed.

Theorem 3.4.3

Let u 1 = X AX, A = A , and u 2 = X BX, B = B , be two quadratic forms in X ∼ N p(μ, I). Then u 1 and u 2 are independently distributed if and only if AB = O.

Note that the independence property holds whether μ = O or μ ≠ O. The result will still be valid if the covariance matrix is σ 2 I where σ 2 is a positive real scalar quantity. If the covariance matrix is Σ > O, the statement of Theorem 3.4.3 needs modification.

Proof

Since AB = O, we have \(AB=O=O^{\prime}=(AB)^{\prime}=B^{\prime}A^{\prime}=BA\), which means that A and B commute. Then there exists an orthonormal matrix P, PP  = I, P P = I, that diagonalizes both A and B, and

$$\displaystyle \begin{aligned} AB&=O\Rightarrow P^{\prime}ABP=O\Rightarrow P^{\prime}APP^{\prime}BP=D_1D_2=O,\\ D_1&={\mathrm{diag}}(\lambda_1,\ldots, \lambda_p), \ D_2={\mathrm{diag}}(\nu_1,\ldots, \nu_p),{} \end{aligned} $$
(3.4.5)

where λ 1, …, λ p are the eigenvalues of A and ν 1, …, ν p are the eigenvalues of B. Let Y = P X, then the canonical representations of u 1 and u 2 are the following:

$$\displaystyle \begin{aligned} u_1&=\lambda_1y_1^2+\cdots+\lambda_py_p^2{} \end{aligned} $$
(3.4.6)
$$\displaystyle \begin{aligned} u_2&=\nu_1y_1^2+\cdots+\nu_py_p^2{} \end{aligned} $$
(3.4.7)

where y j’s are real and independently distributed. But D 1 D 2 = O means that whenever a λ j≠0 then the corresponding ν j = 0 and vice versa. In other words, whenever a y j is present in (3.4.6), it is absent in (3.4.7) and vice versa, or the independent variables y j’s are separated in (3.4.6) and (3.4.7), which implies that u 1 and u 2 are independently distributed.

The necessity part of the proof, which consists in showing that AB = O given that A = A , B = B and u 1 and u 2 are independently distributed, cannot be established by retracing the steps utilized for proving the sufficiency, as it requires more involved matrix manipulations. We note that there are several incorrect or incomplete proofs of Theorem 3.4.3 in the statistical literature. A correct proof for the central case is given in Mathai and Provost (1992).

If X ∼ N p(μ, Σ), Σ > O, consider the transformation \(Y=\varSigma ^{-\frac {1}{2}}X\sim N_p(\varSigma ^{-\frac {1}{2}}\mu ,I)\). Then, \(u_1=X^{\prime }AX=Y^{\prime }\varSigma ^{\frac {1}{2}}A\varSigma ^{\frac {1}{2}}Y,\ u_2=X^{\prime }BX=Y^{\prime }\varSigma ^{\frac {1}{2}}B\varSigma ^{\frac {1}{2}}Y\), and we can apply Theorem 3.4.3. In that case, the requirement that the product of the two transformed matrices be null means

$$\displaystyle \begin{aligned}\varSigma^{\frac{1}{2}}A\varSigma^{\frac{1}{2}}\varSigma^{\frac{1}{2}}B\varSigma^{\frac{1}{2}}=O\Rightarrow A\varSigma B=O.\end{aligned}$$

Thus we have the following result:

Theorem 3.4.4

Let u 1 = X AX, A = A and u 2 = X BX, B = B where X ∼ N p(μ, Σ), Σ > O. Then u 1 and u 2 are independently distributed if and only if AΣB = O.
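As a numerical illustration of Theorem 3.4.4 (a sketch only, with arbitrary illustrative matrices; numpy assumed), one can take A = aa and B = bb with a Σb = 0, confirm that AΣB = O, and observe that the simulated quadratic forms are uncorrelated, which is a consequence of their independence.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 200_000

Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])             # illustrative covariance matrix
mu = np.array([1.0, 0.0, -1.0])

a = np.array([1.0, 1.0, 1.0])
b = np.array([1.0, -1.0, 0.0])
b = b - (a @ Sigma @ b) / (a @ Sigma @ a) * a   # force a' Sigma b = 0
A = np.outer(a, a)                              # A = A'
B = np.outer(b, b)                              # B = B'
print(np.allclose(A @ Sigma @ B, 0))            # the condition A Sigma B = O holds

X = rng.multivariate_normal(mu, Sigma, size=n)
u1 = np.einsum('ij,jk,ik->i', X, A, X)          # u1_i = X_i' A X_i
u2 = np.einsum('ij,jk,ik->i', X, B, X)          # u2_i = X_i' B X_i
print(np.corrcoef(u1, u2)[0, 1])                # near 0
```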

What about the distribution of the quadratic form \(y=(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)\) that is present in the exponent of the p-variate real Gaussian density? Let us first determine the mgf of y, that is,

$$\displaystyle \begin{aligned} M_y(t)&=E[{\mathrm{e}}^{ty}]=\frac{1}{(2\pi)^{\frac{p}{2}}|\varSigma|{}^{\frac{1}{2}}}\int_X{\mathrm{e}}^{t(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)-\frac{1}{2}(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)}{\mathrm{d}}X\\ &=\frac{1}{(2\pi)^{\frac{p}{2}}|\varSigma|{}^{\frac{1}{2}}}\int_X{\mathrm{e}}^{-\frac{1}{2}(1-2t)(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)}{\mathrm{d}}X\\ &=(1-2t)^{-\frac{p}{2}}\mbox{ for }(1-2t)>0.{} \end{aligned} $$
(3.4.8)

This is the mgf of a real chisquare random variable having p degrees of freedom. Hence we have the following result:

Theorem 3.4.5

When X ∼ N p(μ, Σ), Σ > O,

$$\displaystyle \begin{aligned} y=(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)\sim\chi_p^2,{} \end{aligned} $$
(3.4.9)

and if y 1 = X Σ −1 X, then \(y_1\sim \chi _p^2(\lambda )\) , that is, a real non-central chisquare with p degrees of freedom and noncentrality parameter \(\lambda =\frac {1}{2}\mu ^{\prime }\varSigma ^{-1}\mu \).
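Theorem 3.4.5 is easy to verify by simulation; the sketch below (numpy and scipy assumed, illustrative parameters) compares simulated values of \((X-\mu)^{\prime}\varSigma^{-1}(X-\mu)\) with a chisquare distribution having p degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
p, n = 3, 100_000
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.1],
                  [0.5, 1.0, 0.3],
                  [0.1, 0.3, 1.5]])

X = rng.multivariate_normal(mu, Sigma, size=n)
D = X - mu
y = np.einsum('ij,jk,ik->i', D, np.linalg.inv(Sigma), D)   # (X-mu)' Sigma^{-1} (X-mu)

print(y.mean(), y.var())                          # ~ p and ~ 2p
print(stats.kstest(y, 'chi2', args=(p,)).pvalue)  # typically not small
```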

Example 3.4.2

Let X ∼ N 3(μ, Σ) and consider the quadratic forms u 1 = X AX and u 2 = X BX where

Show that u 1 and u 2 are independently distributed.

Solution 3.4.2

Let J be a 3 × 1 column vector with unities, that is 1’s, as its elements. Then observe that \(A=JJ^{\prime}\) and \(B=I-\frac {1}{3}JJ^{\prime }\). Further, \(J^{\prime}J=3\), \(J^{\prime}\varSigma=J^{\prime}\) and hence \(A\varSigma=JJ^{\prime}\varSigma=JJ^{\prime}\). Then \(A\varSigma B=JJ^{\prime }[I-\frac {1}{3}JJ^{\prime }]=JJ^{\prime }-JJ^{\prime }=O\). It then follows from Theorem 3.4.4 that u 1 and u 2 are independently distributed. Now, let us prove the result independently without resorting to Theorem 3.4.4. Note that u 3 = x 1 + x 2 + x 3 = J X has a standard normal distribution as shown in Example 3.4.1. Consider the B in BX, namely \(I-\frac {1}{3}JJ^{\prime }\). The first component of BX is of the form \(\frac {1}{3}[2,-1,-1]X=\frac {1}{3}[2x_1-x_2-x_3]\), which shall be denoted by u 4. Then u 3 and u 4 are linear functions of the same real normal vector X, and hence u 3 and u 4 are real normal variables. Let us compute the covariance between u 3 and u 4, observing that \(J^{\prime}\varSigma=J^{\prime}{\mathrm{Cov}}(X)=J^{\prime}\):

$$\displaystyle \begin{aligned}{\mathrm{Cov}}(u_3,u_4)=\frac{1}{3}\,[1,1,1]{\mathrm{Cov}}(X)\left[\begin{array}{r}2\\ -1\\ -1\end{array}\right]=\frac{1}{3}\,[1,1,1]\left[\begin{array}{r}2\\ -1\\ -1\end{array}\right]=0.\end{aligned}$$

Thus, u 3 and u 4 are independently distributed. As a similar result can be established with respect to the second and third component of BX, u 3 and BX are indeed independently distributed. This implies that \(u_3^2=(J^{\prime }X)^2=X^{\prime }JJ^{\prime }X=X^{\prime }AX\) and (BX)(BX) = X B BX = X BX are independently distributed. Observe that since B is symmetric and idempotent, B B = B. This solution makes use of the following property: if Y 1 and Y 2 are real vectors or matrices that are independently distributed, then \(Y_1^{\prime }Y_1\) and \(Y_2^{\prime }Y_2\) are also independently distributed. It should be noted that the converse does not necessarily hold.

3.4a. Chisquaredness and Independence in the Complex Gaussian Case

Let the p × 1 vector \(\tilde {X}\) in the complex domain have a p-variate complex Gaussian density \(\tilde {X}\sim \tilde {N}_p(O,I)\). Let \(\tilde {u}=\tilde {X}^{*}A\tilde {X}\) be a Hermitian form, A = A where A denotes the conjugate transpose of A. Then there exists a unitary matrix Q, QQ  = I, Q Q = I, such that Q AQ = diag(λ 1, …, λ p) where λ 1, …, λ p are the eigenvalues of A. It can be shown that when A is Hermitian, which means in the real case that A = A (symmetric), all the eigenvalues of A are real. Let \(\tilde {Y}=Q^{*}\tilde {X}\) then

$$\displaystyle \begin{aligned} \tilde{u}=\tilde{X}^{*}A\tilde{X}=\tilde{Y}^{*}Q^{*}AQ\tilde{Y}=\lambda_1|\tilde{y}_1|{}^2+\cdots+\lambda_p|\tilde{y}_p|{}^2{} \end{aligned} $$
(3.4a.1)

where \(|\tilde {y}_j|\) denotes the absolute value or modulus of \(\tilde {y}_j\). If \(\tilde {y}_j=y_{j1}+iy_{j2}\) where y j1 and y j2 are real, \(i=\sqrt {(-1)}\), then \(|\tilde {y}_j|{ }^2=y_{j1}^2+y_{j2}^2\). We can obtain the following result which is the counterpart of Theorem 3.4.1:

Theorem 3.4a.1

Let \(\tilde {X}\sim \tilde {N}_p(O,I)\) and \(\tilde {u}=\tilde {X}^{*}A\tilde {X},\ A=A^{*}\) . Then \(\tilde {u}\sim \tilde {\chi }_r^2\) , a chisquare random variable having r degrees of freedom in the complex domain, if and only if A = A 2 (idempotent) and A is of rank r.

Proof

The definition of an idempotent matrix A as A = A 2 holds whether the elements of A are real or complex. Let A be idempotent and of rank r. Then r of the eigenvalues of A are unities and the remaining ones are zeros. Then the representation given in (3.4a.1) becomes

$$\displaystyle \begin{aligned}\tilde{u}=|\tilde{y}_1|{}^2+\cdots+|\tilde{y}_r|{}^2\sim\tilde{\chi}_r^2, \end{aligned}$$

a chisquare with r degrees of freedom in the complex domain, that is, a real gamma with the parameters (α = r, β = 1) whose mgf is \((1-t)^{-r}\), 1 − t > 0. For proving the necessity, let us assume that \(\tilde {u}\sim \tilde {\chi }_r^2\), its mgf being \(M_{\tilde {u}}(t)=(1-t)^{-r}\) for 1 − t > 0. But from (3.4a.1), \(|\tilde {y}_j|{ }^2\sim \tilde {\chi }_1^2\) and its mgf is \((1-t)^{-1}\) for 1 − t > 0. Hence the mgf of \(\lambda _j|\tilde {y}_j|{ }^2\) is \(M_{\lambda _j|\tilde {y}_j|{ }^2}(t)=(1-\lambda _jt)^{-1}\) for 1 − λ j t > 0, and we have the following identity:

$$\displaystyle \begin{aligned} (1-t)^{-r}=\prod_{j=1}^p(1-\lambda_jt)^{-1}.{} \end{aligned} $$
(3.4a.2)

Take the natural logarithm on both sides of (3.4a.2), expand and compare the coefficients of \(t,\frac {t^2}{2},\ldots \) to obtain

$$\displaystyle \begin{aligned} r=\sum_{j=1}^p\lambda_j=\sum_{j=1}^p\lambda_j^2=\cdots {} \end{aligned} $$
(3.4a.3)

The only possibility for the λ j’s in (3.4a.3) is that r of them are unities and the remaining ones, zeros. This property, combined with A = A guarantees that A = A 2 and A is of rank r. This completes the proof.

An extension of Theorem 3.4a.1 which is the counterpart of Theorem 3.4.2 can also be obtained. We will simply state it as the proof is parallel to that provided in the real case.

Theorem 3.4a.2

Let \(\tilde {X}\sim \tilde {N}_p(O,\varSigma ),\ \varSigma >O\) and \(\tilde {u}=\tilde {X}^{*}A\tilde {X},\ A=A^{*},\) be a Hermitian form. Then \(\tilde {u}\sim \tilde {\chi }_r^2\) , a chisquare random variable having r degrees of freedom in the complex domain, if and only if A = AΣA and A is of rank r.

Example 3.4a.1

Let \(\tilde {X}\sim \tilde {N}_3(\tilde {\mu },\varSigma ), \ \tilde {u}=\tilde {X}^{*}A\tilde {X}\) where

First determine whether Σ can be a covariance matrix. Then determine the distribution of \(\tilde {u}\) by making use of Theorem 3.4a.2 as well as independently, that is, without using Theorem 3.4a.2, for the cases (1) \(\tilde {\mu }=O\); (2) \(\tilde {\mu }\) as given above.

Solution 3.4a.1

Note that Σ = Σ , that is, Σ is Hermitian. Let us verify that Σ is a Hermitian positive definite matrix. Note that Σ must be either positive definite or positive semi-definite to be a covariance matrix. In the semi-definite case, the density of \(\tilde {X}\) does not exist. Let us check the leading minors: det((3)) = 3 > 0, the second leading minor is positive, and \({\mathrm {det}}(\varSigma )=\frac {13}{3^3}>0\) [evaluated by using the cofactor expansion, which is the same in the complex case]. Hence Σ is Hermitian positive definite. In order to apply Theorem 3.4a.2, we must now verify that AΣA = A when \(\tilde {\mu }=O\). Observe the following: A = JJ , J A = 3J , J J = 3 where J  = [1, 1, 1]. Hence \(A\varSigma A=(JJ^{\prime })\varSigma (JJ^{\prime })=J(J^{\prime }\varSigma )JJ^{\prime }=\frac {1}{3}(JJ^{\prime })(JJ^{\prime })=\frac {1}{3}J(J^{\prime }J)J^{\prime }=\frac {1}{3}J(3)J^{\prime }=JJ^{\prime }=A\). Thus the condition holds and by Theorem 3.4a.2, \(\tilde {u}\sim \tilde {\chi }_1^2\) in the complex domain, that is, \(\tilde {u}\) is a real gamma random variable with parameters (α = 1, β = 1) when \(\tilde {\mu }=O\). Now, let us derive this result without using Theorem 3.4a.2. Let \(\tilde {u}_1=\tilde {x}_1+\tilde {x}_2+\tilde {x}_3\) and \(A_1=(1,1,1)^{\prime}\). Note that \(A_1^{\prime }\tilde {X}=\tilde {u}_1\), the sum of the components of \(\tilde {X}\). Hence \(\tilde {u}_1^{*}\tilde {u}_1=\tilde {X}^{*}A_1A_1^{\prime } \tilde {X}=\tilde {X}^{*}A\tilde {X}\). For \(\tilde {\mu }=O\), we have \(E[\tilde {u}_1]=0\) and

$$\displaystyle \begin{aligned} {\mathrm{Var}}(\tilde{u}_1)&={\mathrm{Var}}(\tilde{x}_1)+{\mathrm{Var}}(\tilde{x}_2)+{\mathrm{Var}}(\tilde{x}_3)+[{\mathrm{Cov}}(\tilde{x}_1,\tilde{x}_2)+{\mathrm{Cov}}(\tilde{x}_2,\tilde{x}_1)]\\ &\ \ \ \ +[{\mathrm{Cov}}(\tilde{x}_1,\tilde{x}_3)+{\mathrm{Cov}}(\tilde{x}_3,\tilde{x}_1)]+[{\mathrm{Cov}}(\tilde{x}_2,\tilde{x}_3)+{\mathrm{Cov}}(\tilde{x}_3,\tilde{x}_2)]\\ &=\frac{1}{3}\{3+3+3+[-(1+i)-(1-i)]+[-(1-i)-(1+i)]\\ &\ \ \ \ +[-(1+i)-(1-i)]\}=\frac{1}{3}[9-6]=1.\end{aligned} $$

Thus, \(\tilde {u}_1\) is a standard normal random variable in the complex domain and \(\tilde {u}_1^{*}\tilde {u}_1\sim \tilde {\chi }^2_1\), a chisquare random variable with one degree of freedom in the complex domain, that is, a real gamma random variable with parameters (α = 1, β = 1).

For \(\tilde {\mu }=(2+i,-i,2i)^{\prime }\), this chisquare random variable is noncentral with noncentrality parameter \(\lambda =\tilde {\mu }^{*}\varSigma ^{-1}\tilde {\mu }\). Hence, the inverse of Σ has to be evaluated. To do so, we will employ the formula \(\varSigma ^{-1}=\frac {1}{|\varSigma |}[{\mathrm {Cof}}(\varSigma )]^{\prime }\), which also holds for the complex case. Earlier, the determinant was found to be equal to \(\frac {13}{3^3}\) and

and

This completes the computations.

3.4a.1. Independence of Hermitian forms

We shall mainly state certain results in connection with Hermitian forms in this section since they parallel those pertaining to the real case.

Theorem 3.4a.3

Let \(\tilde {u}_1=\tilde {X}^{*}A\tilde {X}, \ A=A^{*}, \) and \( \tilde {u}_2=\tilde {X}^{*}B\tilde {X}, \ B=B^{*}\) , where \(\tilde {X}\sim \tilde {N}(\mu ,I)\) . Then, \(\tilde {u}_1\) and \(\tilde {u}_2\) are independently distributed if and only if AB = O.

Proof

Let us assume that AB = O. Then

$$\displaystyle \begin{aligned} AB=O=O^{*}=(AB)^{*}=B^{*}A^{*}=BA.{}\end{aligned} $$
(3.4a.4)

This means that there exists a unitary matrix Q, QQ  = I, Q Q = I, that will diagonalize both A and B. That is, Q AQ = diag(λ 1, …, λ p) = D 1, Q BQ = diag(ν 1, …, ν p) = D 2 where λ 1, …, λ p are the eigenvalues of A and ν 1, …, ν p are the eigenvalues of B. But AB = O implies that D 1 D 2 = O. As well,

$$\displaystyle \begin{aligned} \tilde{u}_1&=\tilde{X}^{*}A\tilde{X}=\tilde{Y}^{*}Q^{*}AQ\tilde{Y}=\lambda_1|\tilde{y}_1|{}^2+\cdots+\lambda_p|\tilde{y}_p|{}^2,{} \end{aligned} $$
(3.4a.5)
$$\displaystyle \begin{aligned} \tilde{u}_2&=\tilde{X}^{*}B\tilde{X}=\tilde{Y}^{*}Q^{*}BQ\tilde{Y}=\nu_1|\tilde{y}_1|{}^2+\cdots+\nu_p|\tilde{y}_p|{}^2.{} \end{aligned} $$
(3.4a.6)

Since D 1 D 2 = O, whenever a λ j≠0, the corresponding ν j = 0 and vice versa. Thus the independent variables \(\tilde {y}_j\)’s are separated in (3.4a.5) and (3.4a.6) and accordingly, \(\tilde {u}_1\) and \(\tilde {u}_2\) are independently distributed. The proof of the necessity, which requires more matrix algebra, will not be provided herein. The general result can be stated as follows:

Theorem 3.4a.4

Letting \(\tilde {X}\sim \tilde {N}_p(\mu ,\varSigma ),\ \varSigma >O,\) the Hermitian forms \(\tilde {u}_1=\tilde {X}^{*}A\tilde {X},\) A = A , and \(\tilde {u}_2=\tilde {X}^{*}B\tilde {X},\ B=B^{*},\) are independently distributed if and only if AΣB = O.

Now, consider the exponent in the p-variate complex Gaussian density. What will then be the density of \(\tilde {y}=(\tilde {X}-\tilde {\mu })^{*}\varSigma ^{-1}(\tilde {X}-\tilde {\mu })\)? Let us evaluate the mgf of \(\tilde {y}\). Observing that \(\tilde {y}\) is real, so that we may take \(E[{\mathrm {e}}^{t\tilde {y}}]\) where t is a real parameter, we have

$$\displaystyle \begin{aligned} M_{\tilde{y}}(t)&=E[{\mathrm{e}}^{t\tilde{y}}]=\frac{1}{\pi^p|{\mathrm{det}}(\varSigma)|}\int_{\tilde{X}}{\mathrm{e}}^{t(\tilde{X}-\tilde{\mu})^{*}\varSigma^{-1}(\tilde{X}-\tilde{\mu})-(\tilde{X}-\tilde{\mu})^{*}\varSigma^{-1}(\tilde{X}-\tilde{\mu})}{\mathrm{d}}\tilde{X}\\ &=\frac{1}{\pi^p|{\mathrm{det}}(\varSigma)|}\int_{\tilde{X}}{\mathrm{e}}^{-(1-t)(\tilde{X}-\tilde{\mu})^{*}\varSigma^{-1}(\tilde{X}-\tilde{\mu})}{\mathrm{d}}\tilde{X}\\ &=(1-t)^{-p}\mbox{ for }1-t>0.{} \end{aligned} $$
(3.4a.7)

This is the mgf of a real gamma random variable with the parameters (α = p, β = 1) or a chisquare random variable in the complex domain with p degrees of freedom. Hence we have the following result:

Theorem 3.4a.5

When \(\tilde {X}\sim \tilde {N}_p(\tilde {\mu },\varSigma ),\varSigma >O\) then \(\tilde {y}=(\tilde {X}-\tilde {\mu })^{*}\varSigma ^{-1}(\tilde {X}-\tilde {\mu })\) is distributed as a real gamma random variable with the parameters (α = p, β = 1) or a chisquare random variable in the complex domain with p degrees of freedom, that is,

$$\displaystyle \begin{aligned} \tilde{y}\sim {\mathrm{gamma}}(\alpha=p,\beta=1)\mathit{\mbox{ or }}\tilde{y}\sim\tilde{\chi}_p^2.{} \end{aligned} $$
(3.4a.8)
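As in the real case, Theorem 3.4a.5 can be checked by simulation: generate X̃ from the complex Gaussian and compare \((\tilde{X}-\tilde{\mu})^{*}\varSigma^{-1}(\tilde{X}-\tilde{\mu})\) with a real gamma(α = p, β = 1) distribution. The sketch below assumes numpy and scipy, with arbitrary illustrative parameters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
p, n = 3, 100_000

B = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Sigma = B @ B.conj().T + p * np.eye(p)           # Hermitian positive definite
mu = np.array([1 + 1j, -1j, 2.0])

L = np.linalg.cholesky(Sigma)
w = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2)
X = mu + w @ L.T                                 # rows ~ complex N_p(mu, Sigma)

D = X - mu
y = np.einsum('ij,jk,ik->i', D.conj(), np.linalg.inv(Sigma), D).real

print(y.mean(), y.var())                          # ~ p and ~ p for gamma(p, 1)
print(stats.kstest(y, 'gamma', args=(p,)).pvalue) # typically not small
```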

Example 3.4a.2

Let \(\tilde {X}\sim \tilde {N}_3(\tilde {\mu },\varSigma ),\ \tilde {u}_1=\tilde {X}^{*}A\tilde {X},\ \tilde {u}_2=\tilde {X}^{*}B\tilde {X}\) where

(1) By making use of Theorem 3.4a.4, show that \(\tilde {u}_1\) and \(\tilde {u}_2\) are independently distributed. (2) Show the independence of \(\tilde {u}_1\) and \(\tilde {u}_2\) without using Theorem 3.4a.4.

Solution 3.4a.2

In order to use Theorem 3.4a.4, we have to show that AΣB = O irrespective of \(\tilde {\mu }\). Note that \(A=JJ^{\prime }, J^{\prime }=[1,1,1], J^{\prime }J=3,J^{\prime }\varSigma =\frac {1}{3}J^{\prime },J^{\prime }B=O\). Hence \(A\varSigma =JJ^{\prime }\varSigma =J(J^{\prime }\varSigma )=\frac {1}{3}JJ^{\prime }\Rightarrow A\varSigma B=\frac {1}{3}JJ^{\prime }B=\frac {1}{3}J(J^{\prime }B)=O.\) This proves the result that \(\tilde {u}_1\) and \(\tilde {u}_2\) are independently distributed through Theorem 3.4a.4. This will now be established without resorting to Theorem 3.4a.4. Let \(\tilde {u}_3=\tilde {x}_1+\tilde {x}_2+\tilde {x}_3=J^{\prime }X\) and \(\tilde {u}_4=\frac {1}{3}[2\tilde {x}_1-\tilde {x}_2-\tilde {x}_3]\) or the first row of \(B\tilde {X}\). Since independence is not affected by the relocation of the variables, we may assume, without any loss of generality, that \(\tilde {\mu }=O\) when considering the independence of \(\tilde {u}_3\) and \(\tilde {u}_4\). Let us compute the covariance between \(\tilde {u}_3\) and \(\tilde {u}_4\):

Thus, \(\tilde {u}_3\) and \(\tilde {u}_4\) are uncorrelated and hence independently distributed since both are linear functions of the normal vector \(\tilde {X}\). This property holds for each row of \(B\tilde {X}\) and therefore \(\tilde {u}_3\) and \(B\tilde {X}\) are independently distributed. However, \(\tilde {u}_1=\tilde {X}^{*}A\tilde {X}=\tilde {u}_3^{*}\tilde {u}_3\) and hence \(\tilde {u}_1\) and \((B\tilde {X})^{*}(B\tilde {X})=\tilde {X}^{*}B^{*}B\tilde {X}=\tilde {X}^{*}B\tilde {X}=\tilde {u}_2\) are independently distributed. This completes the computations. The following property was utilized: Let \(\tilde {U}\) and \(\tilde {V}\) be vectors or matrices that are independently distributed. Then, all the pairs \((\tilde {U},\tilde {V}^{*}),(\tilde {U},\tilde {V}\tilde {V}^{*}), (\tilde {U},\tilde {V}^{*}\tilde {V}),\ldots , (\tilde {U}\tilde {U}^{*},\tilde {V}\tilde {V}^{*}) \), are independently distributed whenever the quantities are defined. The converses need not hold when quadratic terms are involved; for instance, \((\tilde {U}\tilde {U}^{*},\tilde {V}\tilde {V}^{*})\) being independently distributed need not imply that \((\tilde {U},\tilde {V})\) are independently distributed.

Exercises 3.4

3.2.23

In the real case on the right side of (3.4.4), compute the densities of the following items: (i) \(w_1^2\), (ii) \(\lambda _1w_1^2\), (iii) \(\lambda _1w_1^2+\lambda _2w_2^2\), (iv) \(\lambda _1w_1^2+\cdots +\lambda _4w_4^2\) if λ 1 = λ 2, λ 3 = λ 4 for μ = O.

3.2.24

Compute the density of u = X AX, A = A in the real case when (i) X ∼ N p(O, Σ), Σ > O, (ii) X ∼ N p(μ, Σ), Σ > O.

3.2.25

Modify the statement in Theorem 3.4.1 if (i) X ∼ N p(O, σ 2 I), σ 2 > 0, (ii) X ∼ N p(μ, σ 2 I), μ ≠ O.

3.2.26

Prove the only if part in Theorem 3.4.3.

3.2.27

Establish the cases (i), (ii), (iii) of Exercise 3.2.23 in the corresponding complex domain.

3.2.28

Supply the proof for the only if part in Theorem 3.4a.3.

3.2.29

Can a matrix A having at least one complex element be Hermitian and idempotent at the same time? Prove your statement.

3.2.30

Let the p × 1 vector X have a real Gaussian density N p(O, Σ), Σ > O. Let u = X AX, A = A . Evaluate the density of u for p = 2 and show that this density can be written in terms of a hypergeometric series of the 1 F 1 type.

3.2.31

Repeat Exercise 3.2.30 if \(\tilde {X}\) is in the complex domain, \(\tilde {X}\sim \tilde {N}_p(O,\varSigma ),\ \varSigma >O\).

3.2.32

Supply the proofs for the only if part in Theorems 3.4.4 and 3.4a.4.

3.5. Samples from a p-variate Real Gaussian Population

Let the p × 1 real vectors X 1, …, X n be iid as N p(μ, Σ), Σ > O. The collection X 1, …, X n is then called a simple random sample of size n from this N p(μ, Σ), Σ > O, and the joint density of X 1, …, X n is the following:

$$\displaystyle \begin{aligned} L&=\prod_{j=1}^nf(X_j)=\prod_{j=1}^n\frac{{\mathrm{e}}^{-\frac{1}{2}(X_j-\mu)^{\prime}\varSigma^{-1}(X_j-\mu)}}{(2\pi)^{\frac{p}{2}}|\varSigma|{}^{\frac{1}{2}}}\\ &=[(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}]^{-1}{\mathrm{e}}^{-\frac{1}{2}\sum_{j=1}^n(X_j-\mu)^{\prime}\varSigma^{-1}(X_j-\mu)}.{} \end{aligned} $$
(3.5.1)

This L, at an observed set of X 1, …, X n, is called the likelihood function. Let the sample matrix, which is p × n, be denoted by a bold-faced X. In order to avoid too many symbols, we will use X to denote the p × n matrix in this section. In earlier sections, we had used X to denote a p × 1 vector. Then

$$\displaystyle \begin{aligned}{\mathbf{X}}=[X_1,\ldots, X_n]=\left[\begin{array}{cccc}x_{11}&x_{12}&\cdots&x_{1n}\\ x_{21}&x_{22}&\cdots&x_{2n}\\ \vdots&\vdots& &\vdots\\ x_{p1}&x_{p2}&\cdots&x_{pn}\end{array}\right]\end{aligned} $$

(i)

where x ik denotes the i-th component of the k-th sample vector X k. Let the sample average be denoted by \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\). Then \(\bar {X}\) will be of the following form:

$$\displaystyle \begin{aligned}\bar{X}=\left[\begin{array}{c}\bar{x}_1\\ \vdots\\ \bar{x}_p\end{array}\right],\ \bar{x}_i=\frac{1}{n}\sum_{k=1}^nx_{ik},\ i=1,\ldots,p.\end{aligned} $$

(ii)

Let the bold-faced \({\bar {\mathbf {X}}}\) be defined as follows:

$$\displaystyle \begin{aligned}{\bar{\mathbf{X}}}=(\bar{X},\ldots, \bar{X}),\end{aligned} $$

a p × n matrix each of whose columns is \(\bar {X}\). Then,

$$\displaystyle \begin{aligned}{\mathbf{X}}-{\bar{\mathbf{X}}}=(X_1-\bar{X},\ldots, X_n-\bar{X})\end{aligned} $$

and

$$\displaystyle \begin{aligned} S=({\mathbf{X}}-{\bar{\mathbf{X}}})({\mathbf{X}}-{\bar{\mathbf{X}}})^{\prime}=(s_{ij}),\end{aligned} $$

so that

$$\displaystyle \begin{aligned}s_{ij}=\sum_{k=1}^n(x_{ik}-\bar{x}_i)(x_{jk}-\bar{x}_j).\end{aligned} $$

S is called the sample sum of products matrix or the corrected sample sum of products matrix, corrected in the sense that the averages are deducted from the observations. As well, \(\frac {1}{n}s_{ii}\) is called the sample variance on the component x i of any vector X k, referring to (i) above, and \(\frac {1}{n}s_{ij},\ i\ne j,\) is called the sample covariance on the components x i and x j of any X k, \(\frac {1}{n}S\) being referred to as the sample covariance matrix. The exponent in L can be simplified by making use of the following properties: (1) When u is a 1 × 1 matrix or a scalar quantity, then \({\mathrm{tr}}(u)={\mathrm{tr}}(u^{\prime})=u=u^{\prime}\). (2) For two matrices A and B, whenever AB and BA are defined, tr(AB) = tr(BA) where AB need not be equal to BA. Observe that the following quantity is a real scalar and hence, it is equal to its trace:

$$\displaystyle \begin{aligned} \sum_{j=1}^n(X_j-\mu)^{\prime}\varSigma^{-1}(X_j-\mu)&={\mathrm{tr}}\Big[\sum_{j=1}^n(X_j-\mu)^{\prime}\varSigma^{-1}(X_j-\mu)\Big]\\ &={\mathrm{tr}}[\varSigma^{-1}\sum_{j=1}^n(X_j-\mu)(X_j-\mu)^{\prime}]\\ &={\mathrm{tr}}\Big[\varSigma^{-1}\sum_{j=1}^n(X_j-\bar{X}+\bar{X}-\mu)(X_j-\bar{X}+\bar{X}-\mu)^{\prime}\Big]\\ &={\mathrm{tr}}[\varSigma^{-1}({\mathbf{X}}-{\bar{\mathbf{X}}})({\mathbf{X}}-{\bar{\mathbf{X}}})^{\prime}]+n{\mathrm{tr}}[\varSigma^{-1}(\bar{X}-\mu)(\bar{X}-\mu)^{\prime}]\\ &={\mathrm{tr}}(\varSigma^{-1}S)+n(\bar{X}-\mu)^{\prime}\varSigma^{-1}(\bar{X}-\mu)\end{aligned} $$

because the cross-product terms vanish since \(\sum_{j=1}^n(X_j-\bar{X})=O\), and

$$\displaystyle \begin{aligned}{\mathrm{tr}}(\varSigma^{-1}(\bar{X}-\mu)(\bar{X}-\mu)^{\prime})={\mathrm{tr}}((\bar{X}-\mu)^{\prime}\varSigma^{-1}(\bar{X}-\mu))=(\bar{X}-\mu)^{\prime}\varSigma^{-1}(\bar{X}-\mu), \end{aligned}$$

the right-hand side expression being 1 × 1 and hence equal to its trace. Accordingly, L can be written as

$$\displaystyle \begin{aligned} L=\frac{1}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu)^{\prime}\varSigma^{-1}(\bar{X}-\mu)}.{} \end{aligned} $$
(3.5.2)
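As a quick numerical check of the trace decomposition used to obtain (3.5.2), the following numpy sketch (an added illustration, not part of the original derivation; all numerical values are arbitrary) evaluates both sides of the identity for a randomly generated sample matrix, mean vector and positive definite Σ.

```python
import numpy as np

# Check: sum_j (X_j - mu)' Sigma^{-1} (X_j - mu) = tr(Sigma^{-1} S) + n (xbar - mu)' Sigma^{-1} (xbar - mu)
rng = np.random.default_rng(1)
p, n = 3, 6
mu = np.array([1.0, -2.0, 0.5])
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)          # an arbitrary positive definite Sigma
Sinv = np.linalg.inv(Sigma)

X = rng.normal(size=(p, n))              # any p x n sample matrix
xbar = X.mean(axis=1)
S = (X - xbar[:, None]) @ (X - xbar[:, None]).T   # sample sum of products matrix

lhs = sum((X[:, j] - mu) @ Sinv @ (X[:, j] - mu) for j in range(n))
rhs = np.trace(Sinv @ S) + n * (xbar - mu) @ Sinv @ (xbar - mu)
print(np.isclose(lhs, rhs))              # True
```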

If we wish to estimate the parameters μ and Σ from a set of observation vectors corresponding to X 1, …, X n, one method consists in maximizing L with respect to μ and Σ given those observations and estimating the parameters. By resorting to calculus, L is differentiated partially with respect to μ and Σ, the resulting expressions are equated to null vectors and matrices, respectively, and these equations are then solved to obtain the solutions for μ and Σ. Those estimates will be called the maximum likelihood estimates or MLE’s. We will explore this aspect later.

Example 3.5.1

Let the 3 × 1 vector X 1 be real Gaussian, X 1 ∼ N 3(μ, Σ), Σ > O. Let X j, j = 1, 2, 3, 4, be iid as X 1. Compute the 3 × 4 sample matrix X, the sample average \(\bar {X}\), the matrix of sample means \({\bar {\mathbf {X}}}\), the sample sum of products matrix S, and the maximum likelihood estimates of μ and Σ, based on the following set of observations on X j, j = 1, 2, 3, 4:

Solution 3.5.1

The 3 × 4 sample matrix and the sample average are

Then \({\bar {\mathbf {X}}}\) and \({\mathbf {X}}-{\bar {\mathbf {X}}}\) are the following:

and the sample sum of products matrix S is the following:

$$\displaystyle \begin{aligned}S=[{\mathbf{X}}-{\bar{\mathbf{X}}}][{\mathbf{X}}-{\bar{\mathbf{X}}}]^{\prime}=\left[\begin{array}{rrrr}1&0&0&-1\\ 0&-1&0&1\\ -3&0&2&1\end{array}\right]\left[\begin{array}{rrr}1&0&-3\\ 0&-1&0\\ 0&0&2\\ -1&1&1\end{array}\right]=\left[\begin{array}{rrr}2&-1&-4\\ -1&2&1\\ -4&1&14\end{array}\right].\end{aligned}$$

Then, the maximum likelihood estimates of μ and Σ, denoted with a hat, are \(\hat{\mu}=\bar{X}\) and \(\hat{\varSigma}=\frac{1}{n}S=\frac{1}{4}S\), with \(\bar{X}\) and S as computed above. This completes the computations.
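The arithmetic in this example can be verified with a few lines of numpy (an added sketch; the deviation matrix \({\mathbf{X}}-{\bar{\mathbf{X}}}\) is the one displayed in the solution above):

```python
import numpy as np

# Re-compute S = (X - Xbar)(X - Xbar)' from the deviation matrix of Example 3.5.1
D = np.array([[ 1,  0, 0, -1],      # X - Xbar, 3 x 4, as given in the solution
              [ 0, -1, 0,  1],
              [-3,  0, 2,  1]])
S = D @ D.T
print(S)                             # [[ 2 -1 -4] [-1  2  1] [-4  1 14]]

n = 4
Sigma_hat = S / n                    # maximum likelihood estimate of Sigma
print(Sigma_hat)
```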

3.5a. Simple Random Sample from a p-variate Complex Gaussian Population

Our population density is given by the following:

$$\displaystyle \begin{aligned}\tilde{f}(\tilde{X}_j)=\frac{{\mathrm{e}}^{-(\tilde{X}_j-\tilde{\mu})^{*}\tilde{\varSigma}^{-1}(\tilde{X}_j-\tilde{\mu})}}{\pi^p|{\mathrm{det}}(\varSigma)|},\ \tilde{X}_j\sim\tilde{N}_p(\tilde{\mu},\tilde{\varSigma}),\ \tilde{\varSigma} =\tilde{\varSigma}^{*}>O. \end{aligned}$$

Let \(\tilde {X}_1,\ldots , \tilde {X}_n\) be a collection of complex vector random variables iid as \(\tilde {X}_j\sim \tilde {N}_p(\tilde {\mu },\tilde {\varSigma }),\ \tilde {\varSigma }>O\). This collection is called a simple random sample of size n from this complex Gaussian population \(\tilde {f}(\tilde {X}_j)\). We will use notations parallel to those utilized in the real case. Let \({\tilde {\mathbf {X}}}=[\tilde {X}_1,\ldots ,\ \tilde {X}_n]\), \(\bar {\tilde {X}}=\frac {1}{n}(\tilde {X}_1+\cdots +\tilde {X}_n)\), \({\bar {\tilde {\mathbf {X}}}}=(\bar {\tilde {X}},\ldots , \bar {\tilde {X}}),\) and \(\tilde {S}=({\tilde {\mathbf {X}}}-{\bar {\tilde {\mathbf {X}}}})({\tilde {\mathbf {X}}}-{\bar {\tilde {\mathbf {X}}}})^{*}=(\tilde {s}_{ij})\). Then

$$\displaystyle \begin{aligned}\tilde{s}_{ij}=\sum_{k=1}^n(\tilde{x}_{ik}-\bar{\tilde{x}}_i)(\tilde{x}_{jk}-\bar{\tilde{x}}_j)^{*}\end{aligned}$$

with \(\frac {1}{n}\tilde {s}_{ij}\) being the sample covariance between the components \(\tilde {x}_i\) and \(\tilde {x}_j\), ij, of any \(\tilde {X}_k,\ k=1,\ldots , n\), \(\frac {1}{n}\tilde {s}_{ii}\) being the sample variance on the component \(\tilde {x}_i\). The joint density of \(\tilde {X}_1,\ldots , \tilde {X}_n\), denoted by \(\tilde {L}\), is given by

$$\displaystyle \begin{aligned} \tilde{L}=\prod_{j=1}^n\frac{{\mathrm{e}}^{-(\tilde{X}_j-\mu)^{*}\tilde{\varSigma}^{-1}(\tilde{X}_j-\mu)}}{\pi^p|{\mathrm{det}}(\tilde{\varSigma})|}=\frac{{\mathrm{e}}^{-\sum_{j=1}^n(\tilde{X}_j-\mu)^{*}\tilde{\varSigma}^{-1}(\tilde{X}_j-\mu)}}{\pi^{np}|{\mathrm{det}}(\tilde{\varSigma})|{}^n},{} \end{aligned} $$
(3.5a.1)

which can be simplified to the following expression by making use of steps parallel to those utilized in the real case:

$$\displaystyle \begin{aligned} \tilde{L}=\frac{{\mathrm{e}}^{-{\mathrm{tr}}(\tilde{\varSigma}^{-1}\tilde{S})-n(\bar{\tilde{X}}-\tilde{\mu})^{*}\tilde{\varSigma}^{-1}(\bar{\tilde{X}}-\tilde{\mu})}}{\pi^{np}|{\mathrm{det}}(\tilde{\varSigma})|{}^n}.{} \end{aligned} $$
(3.5a.2)

Example 3.5a.1

Let the 3 × 1 vector \(\tilde {X}_1\) in the complex domain have a complex trivariate Gaussian distribution \(\tilde {X}_1\sim \tilde {N}_3(\tilde {\mu },\ \tilde {\varSigma }),\ \tilde {\varSigma }>O\). Let \(\tilde {X}_j,\ j=1,2,3,4\) be iid as \(\tilde {X}_1\). With our usual notations, compute the 3 × 4 sample matrix \({\tilde {\mathbf {X}}}\), the sample average \(\bar {\tilde {X}}\), the 3 × 4 matrix of sample averages \({\bar {\tilde {\mathbf {X}}}}\), the sample sum of products matrix \(\tilde {S}\) and the maximum likelihood estimates of \(\tilde {\mu }\) and \(\tilde {\varSigma }\) based on the following set of observations on \(\tilde {X}_j,\ j=1,2,3,4\):

Solution 3.5a.1

The sample matrix \(\tilde {X}\) and the sample average \(\bar {\tilde {X}}\) are

Then, with our usual notations, \({\bar {\tilde {\mathbf {X}}}}\) and \({\tilde {\mathbf {X}}}-{\bar {\tilde {\mathbf {X}}}}\) are the following:

Thus, the sample sum of products matrix \(\tilde {S}\) is

The maximum likelihood estimates are as follows:

where \(\tilde {S}\) is given above. This completes the computations.

3.5.1. Some simplifications of the sample matrix in the real Gaussian case

The p × n sample matrix is

$$\displaystyle \begin{aligned}{\mathbf{X}}=[X_1,\ldots, X_n]=\left[\begin{array}{cccc}x_{11}&x_{12}&\cdots&x_{1n}\\ \vdots&\vdots& &\vdots\\ x_{p1}&x_{p2}&\cdots&x_{pn}\end{array}\right]\end{aligned} $$

where each row consists of iid variables corresponding to one component of the p-vector X 1. For example, (x 11, x 12, …, x 1n) are iid variables distributed as the first component of X 1. Let J be the n × 1 vector of unities, \(J^{\prime}=(1,\ldots,1)\). Consider the matrix

$$\displaystyle \begin{aligned}{\bar{\mathbf{X}}}=(\bar{X},\ldots, \bar{X})=\frac{1}{n}{\mathbf{X}}JJ^{\prime}={\mathbf{X}}B,B=\frac{1}{n}JJ^{\prime}. \end{aligned}$$

Then,

$$\displaystyle \begin{aligned}{\mathbf{X}}-{\bar{\mathbf{X}}}={\mathbf{X}}A, \ A=I-B=I-\frac{1}{n}JJ^{\prime}.\end{aligned}$$

Observe that \(A=A^2\), \(B=B^2\), \(AB=O\), \(A=A^{\prime}\) and \(B=B^{\prime}\) where both A and B are n × n matrices. Then X A and X B are p × n and, in order to determine the mgf, we will take the p × n parameter matrices T 1 and T 2. Accordingly, the mgf of X A is \(M_{{\mathbf {X}}A}(T_1)=E[{\mathrm {e}}^{{\mathrm {tr}}(T_1^{\prime }{\mathbf {X}}A)}], \) that of X B is \( M_{{\mathbf {X}}B}(T_2)=E[{\mathrm {e}}^{{\mathrm {tr}}(T_2^{\prime }{\mathbf {X}}B)}]\) and the joint mgf is \(E[{\mathrm {e}}^{{\mathrm{tr}}(T_1^{\prime }{\mathbf {X}}A)+{\mathrm {tr}}(T_2^{\prime }{\mathbf {X}}B)}]\). Let us evaluate the joint mgf for X j ∼ N p(O, I),

$$\displaystyle \begin{aligned}E[{\mathrm{e}}^{{\mathrm{tr}}(T_1^{\prime}{\mathbf{X}}A)+{\mathrm{tr}}(T_2^{\prime}{\mathbf{X}}B)}]=\int_{{\mathbf{X}}}\frac{1}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}}{\mathrm{e}}^{{\mathrm{tr}}(T_1^{\prime}{\mathbf{X}}A)+{\mathrm{tr}}(T_2^{\prime}{\mathbf{X}}B)-\frac{1}{2}{\mathrm{tr}}({\mathbf{XX}^{\prime}})}{\mathrm{d}}{\mathbf{X}}. \end{aligned}$$

Let us simplify the exponent,

$$\displaystyle \begin{aligned} -\frac{1}{2}\{{\mathrm{tr}}({\mathbf{XX}^{\prime}})-2{\mathrm{tr}}[{\mathbf{X}}(AT_1^{\prime}+BT_2^{\prime})]\}. \end{aligned} $$
(i)

If we expand \({\mathrm{tr}}[({\mathbf{X}}-C)({\mathbf{X}}-C)^{\prime}]\) for some C, we have

$$\displaystyle \begin{aligned} {\mathrm{tr}}({\mathbf{XX}^{\prime}})&-{\mathrm{tr}}(C{{\mathbf{X}}^{\prime}})-{\mathrm{tr}}({\mathbf{X}}C^{\prime})+{\mathrm{tr}}(CC^{\prime})\\ &={\mathrm{tr}}({\mathbf{XX}^{\prime}})-2{\mathrm{tr}}({\mathbf{X}}C^{\prime})+{\mathrm{tr}}(CC^{\prime}) \end{aligned} $$
(ii)

as \({\mathrm{tr}}({\mathbf{X}}C^{\prime})={\mathrm{tr}}(C{\mathbf{X}}^{\prime})\) even though \(C{\mathbf{X}}^{\prime}\ne {\mathbf{X}}C^{\prime}\). On comparing (i) and (ii), we have \(C^{\prime }=AT_1^{\prime }+BT_2^{\prime }\), and then

$$\displaystyle \begin{aligned} {\mathrm{tr}}(CC^{\prime})&={\mathrm{tr}}[(T_1A^{\prime}+T_2B^{\prime})(AT_1^{\prime}+BT_2^{\prime})]\\ &={\mathrm{tr}}(T_1A^{\prime}AT_1^{\prime})+{\mathrm{tr}}(T_2B^{\prime}BT_2^{\prime})+{\mathrm{tr}}(T_1A^{\prime}BT_2^{\prime})+{\mathrm{tr}}(T_2B^{\prime}AT_1^{\prime}). \end{aligned} $$
(iii)

Since the integral over X − C will absorb the normalizing constant and give 1, the joint mgf is \({\mathrm {e}}^{\frac{1}{2}{\mathrm {tr}}(CC^{\prime })}\). Proceeding exactly the same way, it is seen that the mgf's of X A and X B are respectively

$$\displaystyle \begin{aligned}M_{{\mathbf{X}}A}(T_1)={\mathrm{e}}^{\frac{1}{2}{\mathrm{tr}}(T_1A^{\prime}AT_1^{\prime})}\ \ {\mathrm{and} }\ \ M_{{\mathbf{X}}B}(T_2)={\mathrm{e}}^{\frac{1}{2}{\mathrm{tr}}(T_2B^{\prime}BT_2^{\prime})}.\end{aligned}$$

The independence of X A and X B implies that the joint mgf should be equal to the product of the individual mgf's. In this instance, this is indeed the case since \(A^{\prime}B=O\) and \(B^{\prime}A=O\), so that the last two terms in (iii) vanish. Hence, the following result:

Theorem 3.5.1

Assuming that X 1, …, X n are iid as X j ∼ N p(O, I), let the p × n matrix X = (X 1, …, X n) and \(\bar {X}=\frac {1}{n}{\mathbf {X}}J,\ J^{\prime }=(1,1,\ldots,1)\) . Let \({\bar {\mathbf {X}}}={\mathbf {X}}B \) and \( {\mathbf {X}}-{\bar {\mathbf {X}}}={\mathbf {X}}A\) so that \(A=A^{\prime}\), \(B=B^{\prime}\), \(A^2=A\), \(B^2=B\), AB = O. Letting U 1 = X B and U 2 = X A, it follows that U 1 and U 2 are independently distributed.
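The matrix algebra underlying Theorem 3.5.1 is easy to check numerically; the following numpy sketch (an added illustration with arbitrary data) verifies that A and B are symmetric idempotent matrices with AB = O and that XB and XA reproduce \({\bar{\mathbf{X}}}\) and \({\mathbf{X}}-{\bar{\mathbf{X}}}\).

```python
import numpy as np

# B = (1/n) J J', A = I - B, as in Theorem 3.5.1
rng = np.random.default_rng(2)
p, n = 3, 5
J = np.ones((n, 1))
B = (J @ J.T) / n
A = np.eye(n) - B

print(np.allclose(A, A.T), np.allclose(A @ A, A))     # A symmetric, idempotent
print(np.allclose(B, B.T), np.allclose(B @ B, B))     # B symmetric, idempotent
print(np.allclose(A @ B, 0))                          # AB = O

X = rng.normal(size=(p, n))                           # any p x n sample matrix
xbar = X.mean(axis=1, keepdims=True)
print(np.allclose(X @ B, np.repeat(xbar, n, axis=1)))          # XB has columns xbar
print(np.allclose(X @ A, X - np.repeat(xbar, n, axis=1)))      # XA = X - Xbar
```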

Now, appealing to a general result to the effect that if U and V are independently distributed, then U and \(VV^{\prime}\) as well as U and \(V^{\prime}V\) are independently distributed whenever \(VV^{\prime}\) and \(V^{\prime}V\) are defined, the next result follows.

Theorem 3.5.2

For the p × n matrix X , let X A and X B be as defined in Theorem 3.5.1 . Then X B and \({\mathbf{X}}AA^{\prime}{\mathbf{X}}^{\prime}={\mathbf{X}}A{\mathbf{X}}^{\prime}=S\) are independently distributed and, consequently, the sample mean \(\bar {X}\) and the sample sum of products matrix S are independently distributed.

As μ is absent from the previous derivations, the results hold for a N p(μ, I) population. If the population is N p(μ, Σ), Σ > O, it suffices to make the transformation \(Y_j=\varSigma ^{-\frac {1}{2}}X_j\) or \({\mathbf {Y}}=\varSigma ^{-\frac {1}{2}}{\mathbf {X}}\), in which case \({\mathbf {X}}=\varSigma ^{\frac {1}{2}}{\mathbf {Y}}\). Then, \({\mathrm {tr}}(T_1^{\prime }{\mathbf {X}}A)={\mathrm {tr}}(T_1^{\prime }\varSigma ^{\frac {1}{2}}{\mathbf {Y}}A)={\mathrm {tr}}[(T_1^{\prime }\varSigma ^{\frac {1}{2}}){\mathbf {Y}}A]\) so that \(\varSigma ^{\frac {1}{2}}\) is combined with \(T_1^{\prime }\), which does not affect Y A. Thus, we have the general result that is stated next.

Theorem 3.5.3

Letting the population be N p(μ, Σ), Σ > O, and X, A, B, S, and \( \bar {X}\) be as defined in Theorem 3.5.1 , it then follows that U 1 = X A and U 2 = X B are independently distributed and thereby, that the sample mean \(\bar {X}\) and the sample sum of products matrix S are independently distributed.

3.5.2. Linear functions of the sample vectors

Let the X j’s, j = 1, …, n, be iid as X j ∼ N p(μ, Σ), Σ > O. Let us consider a linear function a 1 X 1 + a 2 X 2 + ⋯ + a n X n where a 1, …, a n are real scalar constants. Then the mgf’s of \(X_j, \ a_jX_j,\ U=\sum _{j=1}^na_jX_j\) are obtained as follows:

$$\displaystyle \begin{aligned} M_{X_j}(T)&=E[{\mathrm{e}}^{T^{\prime}X_j}]={\mathrm{e}}^{T^{\prime}\mu+\frac{1}{2}T^{\prime}\varSigma T},\ ~M_{a_jX_j}(T)={\mathrm{e}}^{T^{\prime}(a_j\mu)+\frac{1}{2}a_j^2T^{\prime}\varSigma T}\\ M_{\sum_{j=1}^na_jX_j}(T)&=\prod_{j=1}^nM_{a_jX_j}(T)={\mathrm{e}}^{T^{\prime}\mu(\sum_{j=1}^na_j)+\frac{1}{2}(\sum_{j=1}^na_j^2)T^{\prime}\varSigma T},\end{aligned} $$

which implies that \(U=\sum _{j=1}^na_jX_j\) is distributed as a real normal vector random variable with parameters \((\sum _{j=1}^na_j)\mu \,\) and \(\,(\sum _{j=1}^na_j^2)\varSigma \), that is, \(U\sim N_p(\mu (\sum _{j=1}^na_j), (\sum _{j=1}^na_j^2)\varSigma )\). Thus, the following result:

Theorem 3.5.4

Let the X j ’s be iid N p(μ, Σ), Σ > O, j = 1, …, n, and U = a 1 X 1 + ⋯ + a n X n be a linear function of the X j ’s, j = 1, …, n, where a 1, …, a n are real scalar constants. Then U is distributed as a p-variate real Gaussian vector random variable with parameters \([(\sum _{j=1}^na_j)\mu , (\sum _{j=1}^na_j^2)\varSigma ],\) that is, \(U\sim N_p((\sum _{j=1}^n a_j)\mu ,\) \( (\sum _{j=1}^na_j^2)\varSigma ),\) Σ > O.

If, in Theorem 3.5.4, \(a_j=\frac {1}{n}\), j = 1, …, n, then \(\sum _{j=1}^na_j=\sum _{j=1}^n\frac {1}{n}=1 \) and \( \sum _{j=1}^na_j^2=\sum _{j=1}^n(\frac {1}{n})^2=\frac {1}{n}\). Moreover, when \(a_j=\frac {1}{n},\ j=1,\ldots , n,\) \(U=\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\). Hence, we have the following corollary.

Corollary 3.5.2

Let the X j ’s be iid N p(μ, Σ), Σ > O, j = 1, …, n. Then, the sample mean \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) is distributed as a p-variate real Gaussian with the parameters μ and \(\frac {1}{n}\varSigma \) , that is, \(\bar {X}\sim N_p(\mu , \frac {1}{n}\varSigma ), \ \varSigma >O\).
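Corollary 3.5.2 is easy to illustrate by simulation; the following numpy sketch (an added illustration, with arbitrary parameter values) draws repeated samples and checks that the empirical mean and covariance of \(\bar{X}\) are close to μ and Σ∕n.

```python
import numpy as np

# Monte Carlo check of Corollary 3.5.2: xbar has mean mu and covariance Sigma/n
rng = np.random.default_rng(3)
p, n, reps = 3, 10, 100_000
mu = np.array([1.0, 0.0, -2.0])
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)

samples = rng.multivariate_normal(mu, Sigma, size=(reps, n))   # reps x n x p
xbar = samples.mean(axis=1)                                    # reps x p

print(np.round(xbar.mean(axis=0) - mu, 3))                     # close to the zero vector
emp_cov = np.cov(xbar, rowvar=False)
print(np.round(emp_cov - Sigma / n, 2))                        # close to the zero matrix
```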

From the representation given in Sect. 3.5.1, let X be the sample matrix, \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) the sample average, and \({\bar {\mathbf {X}}}=(\bar {X},\ldots , \bar {X})\) the p × n matrix of sample means, so that \({\mathbf {X}}-{\bar {\mathbf {X}}}={\mathbf {X}}(I-\frac {1}{n}JJ^{\prime })={\mathbf {X}}A,\ J^{\prime }=(1,\ldots , 1)\). Since A is idempotent of rank n − 1, there exists an orthonormal matrix P, \(PP^{\prime}=I,\ P^{\prime}P=I\), such that \(P^{\prime}AP={\mathrm{diag}}(1,\ldots,1,0)\equiv D\), \(A=PDP^{\prime}\) and \({\mathbf{X}}A={\mathbf{X}}PDP^{\prime}={\mathbf{Z}}DP^{\prime}\) where \({\mathbf{Z}}={\mathbf{X}}P\). Note that \(A=A^{\prime}\), \(A^2=A\) and \(D^2=D\). Thus, the sample sum of products matrix has the following representations:

$$\displaystyle \begin{aligned} S=({\mathbf{X}}-{\bar{\mathbf{X}}})({\mathbf{X}}-{\bar{\mathbf{X}}})^{\prime}={\mathbf{X}}AA^{\prime}{\mathbf{X}}={\mathbf{X}}A{{\mathbf{X}}^{\prime}}={\mathbf{Z}}DD^{\prime}{{\mathbf{Z}}^{\prime}}={\mathbf{Z}}_{n-1}{\mathbf{Z}}_{n-1}^{\prime} {} \end{aligned} $$
(3.5.3)

where Z n−1 is a p × (n − 1) matrix consisting of the first n − 1 columns of Z = X P. Indeed, writing

$$\displaystyle \begin{aligned}D=\left[\begin{array}{cc}I_{n-1}&O\\ O&0\end{array}\right]\ \ {\mathrm{and}}\ \ {\mathbf{Z}}=({\mathbf{Z}}_{n-1},\ Z_{(n)}), \end{aligned}$$

where Z (n) denotes the last column of Z, we have \({\mathbf{Z}}D{{\mathbf{Z}}^{\prime}}={\mathbf{Z}}_{n-1}{\mathbf{Z}}_{n-1}^{\prime}\). For a p-variate real normal population wherein the X j's are iid N p(μ, Σ), Σ > O, j = 1, …, n, \(X_j-\bar {X}=(X_j-\mu )-(\bar {X}-\mu )\) and hence the population can be taken to be distributed as N p(O, Σ), Σ > O, without any loss of generality. Then the n − 1 columns of Z n−1 will be iid N p(O, Σ). After discussing the real matrix-variate gamma distribution in the next chapter, we will show that whenever (n − 1) ≥ p, \({\mathbf {Z}}_{n-1}{\mathbf {Z}}_{n-1}^{\prime }\) has a real matrix-variate gamma distribution, or equivalently, that it is Wishart distributed with n − 1 degrees of freedom.
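The representation (3.5.3) can also be checked numerically; in the sketch below (an added illustration with arbitrary data), P is obtained from an eigendecomposition of A and \({\mathbf{X}}A{\mathbf{X}}^{\prime}\) is compared with \({\mathbf{Z}}_{n-1}{\mathbf{Z}}_{n-1}^{\prime}\).

```python
import numpy as np

# Numerical check of S = X A X' = Z_{n-1} Z_{n-1}' with Z = X P, A = P D P'
rng = np.random.default_rng(4)
p, n = 3, 6
J = np.ones((n, 1))
A = np.eye(n) - (J @ J.T) / n

eigval, P = np.linalg.eigh(A)            # orthonormal eigenvectors of A
order = np.argsort(-eigval)              # put the n-1 unit eigenvalues first
P = P[:, order]

X = rng.normal(size=(p, n))
Z = X @ P
S = X @ A @ X.T
print(np.allclose(S, Z[:, :n-1] @ Z[:, :n-1].T))   # True
```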

3.5a.1. Some simplifications of the sample matrix in the complex Gaussian case

Let the p × 1 vector \(\tilde {X}_1\) in the complex domain have a complex Gaussian density \(\tilde {N}_p(\tilde {\mu },\ \varSigma ),\ \varSigma >O\). Let \(\tilde {X}_1,\ldots , \tilde {X}_n\) be iid as \(\tilde {X}_j\sim \tilde {N}_p(\tilde {\mu },\varSigma ),\ \varSigma >O\) or \({\tilde {\mathbf {X}}}=[\tilde {X}_1,\ldots , \tilde {X}_n]\) is the sample matrix of a simple random sample of size n from a \(\tilde {N}_p(\tilde {\mu },\varSigma ),\ \varSigma >O\). Let the sample mean vector or the sample average be \(\bar {\tilde {X}}=\frac {1}{n}(\tilde {X}_1+\cdots +\tilde {X}_n)\) and the matrix of sample means be the bold-faced p × n matrix \({\bar {\tilde {\mathbf {X}}}}\). Let \(\tilde {S}=({\tilde {\mathbf {X}}}-{\bar {\tilde {\mathbf {X}}}})({\tilde {\mathbf {X}}}-{\bar {\tilde {\mathbf {X}}}})^{*}\). Then \({\bar {\tilde {\mathbf {X}}}}=\frac {1}{n}{\tilde {\mathbf {X}}}JJ^{\prime }={\tilde {\mathbf {X}}}B,\ {\tilde {\mathbf {X}}}-{\bar {\tilde {\mathbf {X}}}}={\tilde {\mathbf {X}}}(I-\frac {1}{n}JJ^{\prime })={\tilde {\mathbf {X}}}A\). Then, \(A=A^2\), \(A=A^{\prime}=A^{*}\), \(B=B^{\prime}=B^{*}\), \(B=B^2\), AB = O, BA = O. Thus, results parallel to Theorems 3.5.1 and 3.5.2 hold in the complex domain, and we now state the general result.

Theorem 3.5a.1

Let the population be complex p-variate Gaussian \(\tilde {N}_p(\tilde {\mu },\varSigma ),\ \varSigma >O\) . Let the p × n sample matrix be \({\tilde {\mathbf {X}}}=(\tilde {X}_1,\ldots , \tilde {X}_n)\) where \(\tilde {X}_1,\ldots , \tilde {X}_n\) are iid as \(\tilde {N}_p(\tilde {\mu },\varSigma ),\ \varSigma >O\) . Let \({\tilde {\mathbf {X}}},\ \bar {\tilde {X}},\ \tilde {S},\ {\tilde {\mathbf {X}}}A,\ {\tilde {\mathbf {X}}}B\) be as defined above. Then, \({\tilde {\mathbf {X}}}A\) and \({\tilde {\mathbf {X}}}B\) are independently distributed, and thereby the sample mean \(\bar {\tilde {X}}\) and the sample sum of products matrix \(\tilde {S}\) are independently distributed.

3.5a.2. Linear functions of the sample vectors in the complex domain

Let \(\tilde {X}_j\sim \tilde {N}_p(\tilde {\mu },\tilde {\varSigma }), \ \tilde {\varSigma }=\tilde {\varSigma }^{*}>O\) be a p-variate complex Gaussian vector random variable. Consider a simple random sample of size n from this population, in which case the \(\tilde {X}_j\)’s, j = 1, …, n, are iid as \(\tilde {N}_p(\tilde {\mu },\tilde {\varSigma }),\ \tilde {\varSigma }>O\). Let the linear function \(\tilde {U}=a_1\tilde {X}_1+\cdots +a_n\tilde {X}_n\) where a 1, …, a n are real or complex scalar constants. Then, following through steps parallel to those provided in Sect. 3.5.2, we obtain the following mgf:

$$\displaystyle \begin{aligned}\tilde{M}_{\tilde{U}}(\tilde{T})={\mathrm{e}}^{\Re(\tilde{T}^{*}\tilde{\mu}(\sum_{j=1}^na_j))+\frac{1}{4}(\sum_{j=1}^na_ja_j^{*})\tilde{T}^{*}\tilde{\varSigma}\tilde{T}} \end{aligned}$$

where \(\sum _{j=1}^na_ja_j^{*}=|\tilde {a}_1|{ }^2+\cdots +|\tilde {a}_n|{ }^2\). For example, if \(a_j=\frac {1}{n},\ j=1,\ldots , n\), then \(\sum _{j=1}^na_j=1\) and \(\sum _{j=1}^na_ja_j^{*}=\frac {1}{n}\). Hence, we have the following result and the resulting corollary.

Theorem 3.5a.2

Let the p × 1 complex vector \(\tilde{X}_1\) have a p-variate complex Gaussian distribution \(\tilde {N}_p(\tilde {\mu }, \tilde {\varSigma }),\ \tilde {\varSigma }=\tilde {\varSigma }^{*}>O\) . Consider a simple random sample of size n from this population, with the \(\tilde {X}_j\) ’s, j = 1, …, n, being iid as this p-variate complex Gaussian. Let a 1, …, a n be scalar constants, real or complex. Consider the linear function \(\tilde {U}=a_1\tilde {X}_1+\cdots +a_n\tilde {X}_n\) . Then \(\tilde {U}\sim \tilde {N}_p(\tilde {\mu }(\sum _{j=1}^na_j), (\sum _{j=1}^na_ja_j^{*})\tilde {\varSigma })\) , that is, \(\tilde {U}\) has a p-variate complex Gaussian distribution with the parameters \((\sum _{j=1}^na_j)\tilde {\mu }\) and \((\sum _{j=1}^na_ja_j^{*})\tilde {\varSigma }\).

Corollary 3.5a.1

Let the population and sample be as defined in Theorem 3.5a.2 . Then the sample mean \(\bar {\tilde {X}}=\frac {1}{n}(\tilde {X}_1+\cdots +\tilde {X}_n)\) is distributed as a p-variate complex Gaussian with the parameters \(\tilde {\mu }\) and \(\frac {1}{n}\tilde {\varSigma }\).
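Corollary 3.5a.1 can also be illustrated by simulation. The sketch below (an added illustration; the parameter values are arbitrary) samples under the convention implicit in the density used here, namely \(E[(\tilde{X}_j-\tilde{\mu})(\tilde{X}_j-\tilde{\mu})^{*}]=\tilde{\varSigma}\), and checks that \(\bar{\tilde{X}}\) has mean \(\tilde{\mu}\) and covariance \(\tilde{\varSigma}/n\).

```python
import numpy as np

# Monte Carlo check of Corollary 3.5a.1 for the complex Gaussian sample mean
rng = np.random.default_rng(5)
p, n, reps = 2, 5, 50_000
mu = np.array([1.0 + 2.0j, -1.0j])
A = rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p))
Sigma = A @ A.conj().T + p * np.eye(p)          # Hermitian positive definite
L = np.linalg.cholesky(Sigma)

xbars = np.empty((reps, p), dtype=complex)
for r in range(reps):
    Z = (rng.normal(size=(p, n)) + 1j * rng.normal(size=(p, n))) / np.sqrt(2.0)
    X = mu[:, None] + L @ Z                     # columns iid with covariance Sigma
    xbars[r] = X.mean(axis=1)

D = xbars - mu
print(np.round(D.mean(axis=0), 3))              # close to the zero vector
emp_cov = (D.T @ D.conj()) / reps               # estimate of E[(xbar-mu)(xbar-mu)*]
print(np.round(emp_cov - Sigma / n, 2))         # close to the zero matrix
```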

Proceeding as in the real case, we can show that the sample sum of products matrix \(\tilde {S}\) can have a representation of the form

$$\displaystyle \begin{aligned} \tilde{S}={\tilde{\mathbf{Z}}}_{n-1}{\tilde{\mathbf{Z}}}_{n-1}^{*}{} \end{aligned} $$
(3.5a.3)

where the columns of \({\tilde {\mathbf {Z}}}_{n-1}\) are iid \(\tilde{N}_p(O,\tilde{\varSigma})\) vectors in the complex domain when the population is a p-variate Gaussian in the complex domain. In this case, it will be shown later that \(\tilde {S}\) is distributed as a complex Wishart matrix with (n − 1) ≥ p degrees of freedom.

3.5.3. Maximum likelihood estimators of the p-variate real Gaussian distribution

Letting L denote the joint density of the sample values X 1, …, X n, which are p × 1 iid Gaussian vectors constituting a simple random sample of size n, we have

$$\displaystyle \begin{aligned} L=\prod_{j=1}^n\frac{{\mathrm{e}}^{-\frac{1}{2}(X_j-\mu)^{\prime}\varSigma^{-1}(X_j-\mu)}}{(2 \pi)^{\frac{p}{2}}|\varSigma|{}^{\frac{1}{2}}}=\frac{{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-\frac{1}{2}n(\bar{X}-\mu)^{\prime}\varSigma^{-1}(\bar{X}-\mu)}}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}} {} \end{aligned} $$
(3.5.4)

where, as previously denoted, X is the p × n matrix

$$\displaystyle \begin{aligned} {\mathbf{X}}&=(X_1,\ldots, X_n),\ \bar{X}=\frac{1}{n}(X_1+\cdots+X_n),\ {\bar{\mathbf{X}}}=(\bar{X},\ldots, \bar{X}),\\ S&=({\mathbf{X}}-{\bar{\mathbf{X}}})({\mathbf{X}}-{\bar{\mathbf{X}}})^{\prime}=(s_{ij}),\ s_{ij}=\sum_{k=1}^n(x_{ik}-\bar{x_i})(x_{jk}-\bar{x_j}). \end{aligned} $$

In this case, the parameters are the p × 1 vector μ and the p × p real positive definite matrix Σ. If we resort to Calculus to maximize L, then we would like to differentiate L, or the one-to-one function \(\ln L\), with respect to μ and Σ directly, rather than differentiating with respect to each element comprising μ and Σ. For achieving this, we need to further develop the differential operators introduced in Chap. 1.

Definition 3.5.1

Derivative with respect to a matrix. Let Y = (y ij) be a p × q matrix where the elements y ij’s are distinct real scalar variables. The operator \(\frac {\partial }{\partial Y}\) will be defined as \(\frac {\partial }{\partial Y}=(\frac {\partial }{\partial y_{ij}})\) and this operator applied to a real scalar quantity f will be defined as

$$\displaystyle \begin{aligned}\frac{\partial}{\partial Y}f=\Big(\frac{\partial f}{\partial y_{ij}}\Big).\end{aligned}$$

For example, if \(f=y_{11}^2+y_{12}^2+y_{13}^2-y_{11}y_{12}+y_{21}+y_{22}^2+y_{23}\) and Y = (y ij) is the 2 × 3 matrix of distinct real scalar variables, then

$$\displaystyle \begin{aligned}\frac{\partial f}{\partial Y}=\left[\begin{array}{ccc}2y_{11}-y_{12}&2y_{12}-y_{11}&2y_{13}\\ 1&2y_{22}&1\end{array}\right].\end{aligned}$$

There are numerous examples of real-valued scalar functions of matrix argument. The determinant and the trace are two scalar functions of a square matrix A. The derivative with respect to a vector has already been defined in Chap. 1. The loglikelihood function \(\ln L\) which is available from (3.5.4) has to be differentiated with respect to μ and with respect to Σ and the resulting expressions have to be respectively equated to a null vector and a null matrix. These equations are then solved to obtain the critical points where the L as well as \(\ln L\) may have a local maximum, a local minimum or a saddle point. However, \(\ln L\) contains a determinant and a trace. Hence we need to develop some results on differentiating a determinant and a trace with respect to a matrix, and the following results will be helpful in this regard.

Theorem 3.5.5

Let the p × p matrix Y = (y ij) be nonsingular, the y ij ’s being distinct real scalar variables. Let f = |Y |, the determinant of Y . Then,

$$\displaystyle \begin{aligned}\frac{\partial}{\partial Y}|Y|=\begin{cases}|Y|(Y^{-1})^{\prime}\mathit{\mbox{ for a general }}Y\\ |Y|[2Y^{-1}-{\mathrm{diag}}(Y^{-1})]\mathit{\mbox{ for }}Y=Y^{\prime}\end{cases} \end{aligned}$$

where diag(Y −1) is a diagonal matrix whose diagonal elements coincide with those of Y −1.

Proof

A determinant can be obtained by expansions along any row (or column), the resulting sums involving the corresponding elements and their associated cofactors. More specifically, |Y | = y i1 C i1 + ⋯ + y ip C ip for each i = 1, …, p, where C ij is the cofactor of y ij. This expansion holds whether the elements in the matrix are real or complex. Then,

$$\displaystyle \begin{aligned}\frac{\partial}{\partial y_{ij}}|Y|=\begin{cases}C_{ij}\mbox{ for a general }Y\\ 2C_{ij}\mbox{ for }Y=Y^{\prime}, i\ne j\\ C_{jj}\mbox{ for }Y=Y^{\prime}, i=j.\end{cases} \end{aligned}$$

Thus, \(\frac {\partial }{\partial Y}|Y|=\) the matrix of cofactors \(=|Y|(Y^{-1})^{\prime}\) for a general Y . When \(Y=Y^{\prime}\), the off-diagonal cofactors are doubled while the diagonal ones are left as they are, so that

$$\displaystyle \begin{aligned}\frac{\partial}{\partial Y}|Y|=2|Y|Y^{-1}-{\mathrm{diag}}(|Y|Y^{-1})=|Y|[2Y^{-1}-{\mathrm{diag}}(Y^{-1})].\end{aligned}$$

Hence the result.

Theorem 3.5.6

Let A and Y = (y ij) be p × p matrices where A is a constant matrix and the y ij ’s are distinct real scalar variables. Then,

$$\displaystyle \begin{aligned}\frac{\partial}{\partial Y}[{\mathrm{tr}}(AY)]=\begin{cases}A^{\prime}\mathit{\mbox{ for a general }}Y\\ A+A^{\prime}-{\mathrm{diag}}(A)\mathit{\mbox{ for }} Y=Y^{\prime}.\end{cases}\end{aligned}$$

Proof

\({\mathrm{tr}}(AY)=\sum_{i,j}a_{ji}y_{ij}\) for a general Y , so that \(\frac {\partial }{\partial Y}[{\mathrm {tr}}(AY)]=A^{\prime }\) for a general Y . When \(Y=Y^{\prime}\), \(\frac {\partial }{\partial y_{jj}}[{\mathrm {tr}}(AY)]=a_{jj} \) and \( \frac {\partial }{\partial y_{ij}}[{\mathrm {tr}}(AY)]=a_{ij}+a_{ji}\) for i ≠ j. Hence, \(\frac {\partial }{\partial Y}[{\mathrm {tr}}(AY)]=A+A^{\prime }-{\mathrm {diag}}(A)\) for \(Y=Y^{\prime}\). Thus, the result is established.
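Both derivative formulas, in the general (not necessarily symmetric) case, can be spot-checked by finite differences; the following numpy sketch is an added illustration with arbitrary matrices, not part of the proof.

```python
import numpy as np

# Finite-difference check of d|Y|/dY = |Y|(Y^{-1})' and d tr(AY)/dY = A' (general Y)
rng = np.random.default_rng(6)
p, h = 3, 1e-6
Y = rng.normal(size=(p, p))
A = rng.normal(size=(p, p))

def num_grad(f, Y):
    """Central-difference gradient of a scalar function f of a p x p matrix."""
    G = np.zeros_like(Y)
    for i in range(p):
        for j in range(p):
            Yp, Ym = Y.copy(), Y.copy()
            Yp[i, j] += h
            Ym[i, j] -= h
            G[i, j] = (f(Yp) - f(Ym)) / (2 * h)
    return G

G_det = num_grad(np.linalg.det, Y)
print(np.allclose(G_det, np.linalg.det(Y) * np.linalg.inv(Y).T, atol=1e-4))   # True

G_tr = num_grad(lambda M: np.trace(A @ M), Y)
print(np.allclose(G_tr, A.T, atol=1e-6))                                      # True
```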

With the help of Theorems 3.5.5 and 3.5.6, we can optimize L or \(\ln L\) with L as specified in Eq. (3.5.4). For convenience, we take \(\ln L\) which is given by

$$\displaystyle \begin{aligned} \ln L=-\frac{np}{2}\ln (2\pi)-\frac{n}{2}\ln|\varSigma|-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu)^{\prime}\varSigma^{-1}(\bar{X}-\mu).{} \end{aligned} $$
(3.5.5)

Then,

$$\displaystyle \begin{aligned} \frac{\partial}{\partial \mu}\ln L=O&\Rightarrow 0-\frac{n}{2}\frac{\partial}{\partial \mu}(\bar{X}-\mu)^{\prime}\varSigma^{-1}(\bar{X}-\mu)=O\\ &\Rightarrow n\,\varSigma^{-1}(\bar{X}-\mu)=O\Rightarrow \bar{X}-\mu=O\\ &\Rightarrow \mu=\bar{X},\end{aligned} $$

referring to the vector derivatives defined in Chap. 1. The extremal value, denoted with a hat, is \(\hat {\mu }=\bar {X}\). When differentiating with respect to Σ, we may take B = Σ −1 for convenience and differentiate with respect to B. We may also substitute \(\bar {X}\) for μ because the critical point for Σ must correspond to \(\hat {\mu }=\bar {X}\). Accordingly, \(\ln L\) at \(\mu =\bar {X}\) is

$$\displaystyle \begin{aligned}\ln L(\hat{\mu},B)=-\frac{np}{2}\ln(2\pi)+\frac{n}{2}\ln|B|-\frac{1}{2}{\mathrm{tr}}(BS). \end{aligned}$$

Noting that \(B=B^{\prime}\),

$$\displaystyle \begin{aligned} \frac{\partial}{\partial B}\ln L(\hat{\mu},B)=O&\Rightarrow \frac{n}{2}[2B^{-1}-{\mathrm{diag}}(B^{-1})]-\frac{1}{2}[2S-{\mathrm{diag}}(S)]=O\\ &\Rightarrow n[2\varSigma-{\mathrm{diag}}(\varSigma)]=2S-{\mathrm{diag}}(S)\\ &\Rightarrow \hat{\sigma}_{jj}=\frac{1}{n}s_{jj},\ \hat{\sigma}_{ij}=\frac{1}{n}s_{ij},i\ne j\\ &\Rightarrow (\hat{\mu}=\bar{X}, \hat{\varSigma}=\frac{1}{n}S).\end{aligned} $$

Hence, the only critical point is \((\hat {\mu },\hat {\varSigma })=(\bar {X},\frac {1}{n}S)\). Does this critical point correspond to a local maximum, a local minimum or something else? For \(\hat {\mu }=\bar {X}\), consider the behavior of \(\ln L\). For convenience, we may convert the problem in terms of the eigenvalues of B. Letting λ 1, …, λ p be the eigenvalues of B, observe that λ j > 0, j = 1, …, p, that the determinant is the product of the eigenvalues and that the trace is the sum of the eigenvalues. Examining the behavior of \(\ln L\) for all possible values of λ 1 when λ 2, …, λ p are fixed, we see that, as λ 1 moves from 0 to ∞, \(\ln L\) at \(\hat {\mu }\) goes from −∞ to −∞ through finite values. For each λ j, the behavior of \(\ln L\) is the same. Hence the only critical point must correspond to a local maximum. Therefore \(\hat {\mu }=\bar {X}\) and \(\hat {\varSigma }=\frac {1}{n}S\) are the maximum likelihood estimators (MLE’s) of μ and Σ respectively. The observed values of \(\hat {\mu }\) and \(\hat {\varSigma }\) are the maximum likelihood estimates of μ and Σ, for which the same abbreviation MLE is utilized. While maximum likelihood estimators are random variables, maximum likelihood estimates are numerical values. Observe that, in order to have an estimate for Σ, we must have that the sample size n ≥ p.

In the derivation of the MLE of Σ, we have differentiated with respect to B = Σ −1 instead of differentiating with respect to the parameter Σ. Could this affect the final result? Given any θ and any non-trivial differentiable function of θ, ϕ(θ), whose derivative is not identically zero, that is, \(\frac {{\mathrm {d}}}{{\mathrm {d}}\theta }\phi (\theta )\ne 0\) for any θ, it follows from basic calculus that for any differentiable function g(θ), the equations \(\frac {{\mathrm {d}}}{{\mathrm {d}}\theta }g(\theta )=0\) and \(\frac {{\mathrm {d}}}{{\mathrm {d}}\phi }g(\theta )=0\) will lead to the same solution for θ. Hence, whether we differentiate with respect to B = Σ −1 or Σ, the procedures will lead to the same estimator of Σ. As well, if \(\hat {\theta }\) is the MLE of θ, then \(g(\hat {\theta })\) will also be the MLE of g(θ) whenever g(θ) is a one-to-one function of θ. The numerical evaluation of maximum likelihood estimates for μ and Σ has been illustrated in Example 3.5.1.
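As a further check that the critical point \((\bar{X},\frac{1}{n}S)\) is indeed a maximum, the following numpy sketch (an added illustration on simulated data) evaluates the log-likelihood (3.5.5) at the critical point and at randomly perturbed parameter values.

```python
import numpy as np

# Numerical check: ln L is maximized at (xbar, S/n)
rng = np.random.default_rng(7)
p, n = 3, 20
X = rng.normal(size=(p, n))                      # simulated sample, arbitrary population

xbar = X.mean(axis=1)
S = (X - xbar[:, None]) @ (X - xbar[:, None]).T
Sigma_hat = S / n

def loglik(mu, Sigma):
    Sinv = np.linalg.inv(Sigma)
    val = -0.5 * n * p * np.log(2 * np.pi) - 0.5 * n * np.log(np.linalg.det(Sigma))
    val += -0.5 * np.trace(Sinv @ S) - 0.5 * n * (xbar - mu) @ Sinv @ (xbar - mu)
    return val

best = loglik(xbar, Sigma_hat)
for _ in range(1000):
    dmu = 0.1 * rng.normal(size=p)
    E = 0.1 * rng.normal(size=(p, p))
    dSig = E @ E.T                               # keeps the perturbed matrix positive definite
    assert loglik(xbar + dmu, Sigma_hat + dSig) <= best + 1e-9
print("log-likelihood at the critical point was never exceeded in 1000 perturbations")
```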

3.5a.3. MLE’s in the complex p-variate Gaussian case

Let the p × 1 vectors in the complex domain \(\tilde {X}_1,\ldots , \tilde {X}_n\) be iid as \(\tilde {N}_p(\tilde {\mu },\varSigma ),\ \varSigma >O\), and let the joint density of the \(\tilde {X}_j\)’s, j = 1, …, n, be denoted by \(\tilde {L}\). Then

$$\displaystyle \begin{aligned}\tilde{L}&=\prod_{j=1}^n\frac{{\mathrm{e}}^{-(\tilde{X}_j-\tilde{\mu})^{*}\varSigma^{-1}(\tilde{X}_j-\tilde{\mu})}}{\pi^p|{\mathrm{det}}(\varSigma)|}=\frac{{\mathrm{e}}^{-\sum_{j=1}^n(\tilde{X}_j-\tilde{\mu})^{*}\varSigma^{-1}(\tilde{X}_j-\tilde{\mu})}}{\pi^{np}|{\mathrm{det}}(\varSigma)|{}^n}\\ &=\frac{{\mathrm{e}}^{-{\mathrm{tr}}(\varSigma^{-1}\tilde{S})-n(\bar{\tilde{X}}-\tilde{\mu})^{*}\varSigma^{-1}(\bar{\tilde{X}}-\tilde{\mu})}}{\pi^{np}|{ \mathrm{det}}(\varSigma)|{}^n}\end{aligned} $$

where |det(Σ)| denotes the absolute value of the determinant of Σ,

$$\displaystyle \begin{aligned} \tilde{S}&=({\tilde{\mathbf{X}}}-{\bar{\tilde{\mathbf{X}}}})({\tilde{\mathbf{X}}}-{{\bar{\tilde{\mathbf{X}}}}})^{*}=(\tilde{s}_{ij}),\ \tilde{s}_{ij}=\sum_{k=1}^n(\tilde{x}_{ik}-\bar{\tilde{x}}_i)(\tilde{x}_{jk}-\bar{\tilde{x}}_j)^{*},\\ {\tilde{\mathbf{X}}}&=[\tilde{X}_1,\ldots, \tilde{X}_n],\ \bar{\tilde{X}}=\frac{1}{n}(\tilde{X}_1+\cdots+\tilde{X}_n),\ {\bar{\tilde{\mathbf{X}}}}=[\bar{\tilde{X}},\ldots, \bar{\tilde{X}}],\ \end{aligned} $$

where \({\tilde {\mathbf {X}}}\) and \({\bar {\tilde {\mathbf {X}}}}\) are p × n. Hence,

$$\displaystyle \begin{aligned} \ln\tilde{L}=-np\ln\pi-n\ln|{\mathrm{det}}(\varSigma)|-{\mathrm{tr}}(\varSigma^{-1}\tilde{S})-n(\bar{\tilde{X}}-\tilde{\mu})^{*}\varSigma^{-1}(\bar{\tilde{X}}-\tilde{\mu}).{} \end{aligned} $$
(3.5a.4)

3.5a.4. Matrix derivatives in the complex domain

Consider \({\mathrm {tr}}(\tilde {B}\tilde {S}^{*}),\ \tilde {B}=\tilde {B}^{*}>O,\ \tilde {S}=\tilde {S}^{*}>O\). Let \(\tilde {B}=B_1+iB_2,\ \tilde {S}=S_1+iS_2,\ i=\sqrt {(-1)}\). Then B 1 and S 1 are real symmetric and B 2 and S 2 are real skew symmetric since \(\tilde {B}\) and \(\tilde {S}\) are Hermitian. What is then \(\frac {\partial }{\partial \tilde {B}}[{\mathrm {tr}}(\tilde {B}\tilde {S}^{*})]\)? Consider

$$\displaystyle \begin{aligned} \tilde{B}\tilde{S}^{*}&=(B_1+iB_2)(S_1^{\prime}-iS_2^{\prime})=B_1S_1^{\prime}+B_2S_2^{\prime}+i(B_2S_1^{\prime}-B_1S_2^{\prime}),\\ {\mathrm{tr}}(\tilde{B}\tilde{S}^{*})&={\mathrm{tr}}(B_1S_1^{\prime}+B_2S_2^{\prime})+i[{\mathrm{tr}}(B_2S_1^{\prime})-{\mathrm{tr}}(B_1S_2^{\prime})].\end{aligned} $$

It can be shown that when B 2 and S 2 are real skew symmetric and B 1 and S 1 are real symmetric, then \({\mathrm {tr}}(B_2S_1^{\prime })=0,\ {\mathrm {tr}}(B_1S_2^{\prime })=0 \). This will be stated as a lemma.

Lemma 3.5a.3

Consider two p × p real matrices A and B where \(A=A^{\prime}\) (symmetric) and \(B=-B^{\prime}\) (skew symmetric). Then, tr(AB) = 0.

Proof

\({\mathrm{tr}}(AB)={\mathrm{tr}}((AB)^{\prime})={\mathrm{tr}}(B^{\prime}A^{\prime})=-{\mathrm{tr}}(BA)=-{\mathrm{tr}}(AB)\), which implies that tr(AB) = 0.

Thus, \({\mathrm {tr}}(\tilde {B}\tilde {S}^{*})={\mathrm {tr}}(B_1S_1^{\prime }+B_2S_2^{\prime })\). The diagonal elements of S 1 in \({\mathrm {tr}}(B_1S_1^{\prime })\) are multiplied once by the diagonal elements of B 1 and the non-diagonal elements in S 1 are multiplied twice each by the corresponding elements in B 1. Hence,

$$\displaystyle \begin{aligned}\frac{\partial}{\partial B_1}{\mathrm{tr}}(B_1S_1^{\prime})=2S_1-{\mathrm{diag}}(S_1).\end{aligned}$$

In B 2 and S 2, the diagonal elements are zeros and hence

$$\displaystyle \begin{aligned}\frac{\partial}{\partial B_2}{\mathrm{tr}}(B_2S_2^{\prime})=2S_2.\end{aligned}$$

Therefore

$$\displaystyle \begin{aligned}\Big(\frac{\partial}{\partial B_1}+i\frac{\partial}{\partial B_2}\Big){\mathrm{tr}}(B_1S_1^{\prime}+B_2S_2^{\prime})=2(S_1+iS_2)-{\mathrm{diag}}(S_1)=2\tilde{S}-{\mathrm{diag}}(\tilde{S}).\end{aligned}$$

Thus, the following result:

Theorem 3.5a.3

Let \(\tilde {S}=\tilde {S}^{*}>O\) and \(\tilde {B}=\tilde {B}^{*}>O\) be p × p Hermitian matrices. Let \(\tilde {B}=B_1+iB_2\) and \(\tilde {S}=S_1+iS_2\) where the p × p matrices B 1 and S 1 are symmetric and B 2 and S 2 are skew symmetric real matrices. Letting \(\frac {\partial }{\partial \tilde {B}}=\frac {\partial }{\partial B_1}+i\frac {\partial }{\partial B_2}\) , we have

$$\displaystyle \begin{aligned}\frac{\partial}{\partial\tilde{B}}{\mathrm{tr}}(\tilde{B}\tilde{S}^{*})=2\tilde{S}-{\mathrm{diag}}(\tilde{S}).\end{aligned}$$

Theorem 3.5a.4

Let \(\tilde {\varSigma }=(\tilde {\sigma }_{ij})=\tilde {\varSigma }^{*}>O\) be a Hermitian positive definite p × p matrix. Let \({\mathrm{det}}(\tilde{\varSigma})\) be the determinant and \(|{\mathrm{det}}(\tilde{\varSigma})|\) be the absolute value of the determinant, respectively. Let \(\frac {\partial }{\partial \tilde {\varSigma }}=\frac {\partial }{\partial \varSigma _1}+i\frac {\partial }{\partial \varSigma _2}\) be the differential operator, where \(\tilde {\varSigma }=\varSigma _1+i\varSigma _2\), \(i=\sqrt {(-1)}\) , Σ 1 being real symmetric and Σ 2 , real skew symmetric. Then,

$$\displaystyle \begin{aligned}\frac{\partial}{\partial\tilde{\varSigma}}\ln|{\mathrm{det}}(\tilde{\varSigma})|=2\tilde{\varSigma}^{-1}-{\mathrm{diag}}(\tilde{\varSigma}^{-1}).\end{aligned}$$

Proof

Note that for two scalar complex quantities, \(\tilde {x}=x_1+ix_2\) and \(\tilde {y}=y_1+iy_2\) where \(i=\sqrt {(-1)}\) and x 1, x 2, y 1, y 2 are real, and for the operator \(\frac {\partial }{\partial \tilde {x}}=\frac {\partial }{\partial x_1}+i\frac {\partial }{\partial x_2}\), the following results hold, which will be stated as a lemma.

Lemma 3.5a.4

Given \(\tilde {x}\), \(\tilde {y}\) and the operator \(\frac {\partial }{\partial x_1}+i\frac {\partial }{\partial x_2}\) defined above,

$$\displaystyle \begin{aligned} \frac{\partial}{\partial\tilde{x}}(\tilde{x}\tilde{y})&=0,~\frac{\partial}{\partial\tilde{x}}(\tilde{x}\tilde{y}^{*})=0,~\frac{\partial}{\partial\tilde{x}}(\tilde{x}^{*}\tilde{y})=2\tilde{y}, ~\frac{\partial}{\partial\tilde{x}}(\tilde{x}^{*}\tilde{y}^{*})=2\tilde{y}^{*},\\ \frac{\partial}{\partial\tilde{x}}(\tilde{x}\tilde{x}^{*})&=\frac{\partial}{\partial\tilde{x}}(\tilde{x}^{*}\tilde{x})=2\tilde{x},\ ~\frac{\partial}{\partial\tilde{x}^{*}}(\tilde{x}^{*}\tilde{x}) =\frac{\partial}{\partial\tilde{x}^{*}}(\tilde{x}\tilde{x}^{*})=2\tilde{x}^{*}\end{aligned} $$

where, for example, \(\tilde {x}^{*}\) which, in general, is the conjugate transpose of \(\tilde {x},\) is only the conjugate in this case since \(\tilde {x}\) is a scalar quantity.

Observe that for a p × p Hermitian positive definite matrix \(\tilde {X}\), the absolute value of the determinant, namely, \(|{\mathrm {det}}(\tilde {X})|=\sqrt {{\mathrm {det}}(\tilde {X}){\mathrm {det}}(\tilde {X}^{*})}={\mathrm {det}}(\tilde {X})={\mathrm {det}}(\tilde {X}^{*})\) since \(\tilde {X}=\tilde {X}^{*}\). Consider the following cofactor expansion of \({\mathrm {det}}(\tilde {X})\) (in general, a cofactor expansion is valid whether the elements of the matrix are real or complex). Letting C ij denote the cofactor of x ij in \(\tilde {X}=(x_{ij})\) when x ij is real or complex,

$$\displaystyle \begin{aligned} {\mathrm{det}}(X)&=x_{11}C_{11}+x_{12}C_{12}+\cdots+x_{1p}C_{1p} \end{aligned} $$
(1)
$$\displaystyle \begin{aligned} &=x_{21}C_{21}+x_{22}C_{22}+\cdots+x_{2p}C_{2p} \end{aligned} $$
(2)
$$\displaystyle \begin{aligned} &\ \ \! \vdots\\ &=x_{p1}C_{p1}+x_{p2}C_{p2}+\cdots+x_{pp}C_{pp}\,. \end{aligned} $$
(p)

When \(\tilde {X}=\tilde {X}^{*}\), the diagonal elements x jj’s are all real. From Lemma 3.5a.4 and equation (1), we have

$$\displaystyle \begin{aligned}\frac{\partial}{\partial x_{11}}(x_{11}C_{11})=C_{11},\ \frac{\partial}{\partial x_{1j}}(x_{1j}C_{1j})=0,\ j=2,\ldots,p. \end{aligned}$$

From Eq. (2), note that \(x_{21}=x_{12}^{*},\ C_{21}=C_{12}^{*}\) since \(\tilde {X}=\tilde {X}^{*}\). Then from Lemma 3.5a.4 and (2), we have

$$\displaystyle \begin{aligned}\frac{\partial}{\partial x_{12}}(x_{12}^{*}C_{12}^{*})=C_{12}^{*},\ \frac{\partial}{\partial x_{22}}(x_{22}^{*}C_{22}^{*})=C_{22}^{*},\ \frac{\partial}{\partial x_{2j}}(x_{2j}C_{2j})=0,\ j=3,\ldots,p, \end{aligned}$$

observing that \(x_{22}^{*}=x_{22}\) and \(C_{22}^{*}=C_{22}\). Now, continuing the process with Eqs. (3), (4), …, (p), we have the following result:

$$\displaystyle \begin{aligned}\frac{\partial}{\partial{x_{ij}}}[{\mathrm{det}}(\tilde{X})]=\begin{cases}C_{jj}^{*},\ j=1,\ldots,p\\ 2C_{ij}^{*}\mbox{ for all }i\ne j.\end{cases} \end{aligned}$$

Observe that for \(\tilde {\varSigma }^{-1}=\tilde {B}=\tilde {B}^{*}\),

$$\displaystyle \begin{aligned}\frac{\partial}{\partial\tilde{B}}\ln|{\mathrm{det}}(\tilde{B})|=\frac{1}{{\mathrm{det}}(\tilde{B})}\big[2(B_{rs}^{*})-{\mathrm{diag}}(B_{rs}^{*})\big]=2\tilde{B}^{-1}-{\mathrm{diag}}(\tilde{B}^{-1})\end{aligned}$$

where B rs is the cofactor of \(\tilde {b}_{rs}\), \(\tilde {B}=(\tilde {b}_{rs})\), and \((B_{rs}^{*})\) denotes the matrix whose (r, s)-th element is \(B_{rs}^{*}\). Therefore, at \(\hat {\mu }=\bar {\tilde {X}}\), for \(\tilde {\varSigma }^{-1}=\tilde {B}\), and from Theorems 3.5a.3 and 3.5a.4, we have

$$\displaystyle \begin{aligned} \frac{\partial}{\partial \tilde{B}}[\ln \tilde{L}]=O&\Rightarrow n[2\tilde{\varSigma}-{\mathrm{diag}}(\tilde{\varSigma})]-[2\tilde{S}-{\mathrm{diag}}(\tilde{S})]=O\\ &\Rightarrow \tilde{\varSigma}=\frac{1}{n}\tilde{S}\Rightarrow \hat{\tilde{\varSigma}}=\frac{1}{n}\tilde{S} \mbox{ for }n\ge p,\end{aligned} $$

where a hat denotes the estimate/estimator.

Again, from Lemma 3.5a.4 we have the following:

$$\displaystyle \begin{aligned} \frac{\partial}{\partial\tilde{\mu}}[(\bar{\tilde{X}}-\tilde{\mu})^{*}\tilde{\varSigma}^{-1}(\bar{\tilde{X}}-\tilde{\mu})]&=\frac{\partial}{\partial\tilde{\mu}}\{\bar{\tilde{X}}^{*}\tilde{\varSigma}^{-1}\bar{\tilde{X}} +\tilde{\mu}^{*}\tilde{\varSigma}^{-1}\tilde{\mu} -\bar{\tilde{X}}^{*}\tilde{\varSigma}^{-1}\tilde{\mu}-\tilde{\mu}^{*}\tilde{\varSigma}^{-1}\bar{\tilde{X}}\}=O\\ &\Rightarrow O+2\tilde{\varSigma}^{-1}\tilde{\mu}-O -2\tilde{\varSigma}^{-1}\bar{\tilde{X}}=O\\ &\Rightarrow \hat{\tilde{\mu}}=\bar{\tilde{X}}.\end{aligned} $$

Thus, the MLE of \(\tilde {\mu }\) and \(\tilde {\varSigma }\) are respectively \(\hat {\tilde {\mu }}=\bar {\tilde {X}}\) and \(\hat {\tilde {\varSigma }}=\frac {1}{n}\tilde {S}\) for n ≥ p. It is not difficult to show that the only critical point \((\hat {\tilde {\mu }},\hat {\tilde {\varSigma }})=(\bar {\tilde {X}},\frac {1}{n}\tilde {S})\) corresponds to a local maximum for \(\tilde {L}\). Consider \(\ln \tilde {L}\) at \(\hat {\tilde {\mu }}=\bar {\tilde {X}}\). Let λ 1, …, λ p be the eigenvalues of \(\tilde {B}=\tilde {\varSigma }^{-1}\) where the λ j’s are real as \(\tilde {B}\) is Hermitian. Examine the behavior of \(\ln \tilde {L}\) when a λ j is increasing from 0 to ∞. Then \(\ln \tilde {L}\) goes from −∞ back to −∞ through finite values. Hence, the only critical point corresponds to a local maximum. Thus, \(\bar {\tilde {X}}\) and \(\frac {1}{n}\tilde {S}\) are the MLE’s of \(\tilde {\mu }\) and \(\tilde {\varSigma },\) respectively.

Theorems 3.5.7, 3.5a.5

For the p-variate real Gaussian with the parameters μ and Σ > O and the p-variate complex Gaussian with the parameters \(\tilde {\mu }\) and \(\tilde {\varSigma }>O\) , the maximum likelihood estimators (MLE’s) are \(\hat {\mu }=\bar {X},\ \hat {\varSigma }=\frac {1}{n}S,\ \hat {\tilde {\mu }}=\bar {\tilde {X}},\ \hat {\tilde {\varSigma }}=\frac {1}{n}\tilde {S}\) where n is the sample size, \(\bar {X}\) and S are the sample mean and sample sum of products matrix in the real case, and \(\bar {\tilde {X}}\) and \(\tilde {S}\) are the sample mean and the sample sum of products matrix in the complex case, respectively.

A numerical illustration of the maximum likelihood estimates of \(\tilde {\mu }\) and \(\tilde {\varSigma }\) in the complex domain has already been given in Example 3.5a.1.

It can be shown that the MLE’s of μ and Σ in the real and complex p-variate Gaussian cases are such that \(E[\bar {X}]=\mu ,\ E[\bar{\tilde {X}}]=\tilde {\mu },\ E[\frac{1}{n}S]=\frac {n-1}{n}\varSigma ,\ E[\frac{1}{n}\tilde {S}]=\frac {n-1}{n}\tilde {\varSigma }\). For these results to hold, the population need not be Gaussian. Any population for which the covariance matrix exists will have these properties. This will be stated as a result.

Theorems 3.5.8, 3.5a.6

Let X 1, …, X n be a simple random sample from any p-variate population with mean value vector μ and covariance matrix \(\varSigma=\varSigma^{\prime}>O\) in the real case and mean value vector \(\tilde {\mu }\) and covariance matrix \(\tilde {\varSigma }=\tilde {\varSigma }^{*}>O\) in the complex case, respectively, and let Σ and \(\tilde {\varSigma }\) exist in the sense that all the elements therein exist. Let \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n),\ \bar {\tilde {X}}=\frac {1}{n}(\tilde {X}_1+\cdots +\tilde {X}_n)\) and let S and \(\tilde {S}\) be the sample sum of products matrices in the real and complex cases, respectively. Then \(E[\bar {X}]=\mu ,\ E[\bar {\tilde {X}}]=\tilde {\mu }, \ E[\hat {\varSigma }]=E[\frac {1}{n}S]=\frac {n-1}{n}\varSigma \to \varSigma \) as n → ∞, and \(E[\hat {\tilde {\varSigma }}]=E[\frac {1}{n}\tilde {S}]=\frac {n-1}{n}\tilde {\varSigma }\to \tilde {\varSigma }\) as n → ∞.

Proof

\(E[\bar {X}]=\frac {1}{n}\{E[X_1]+\cdots +E[X_n]\}=\frac {1}{n}\{\mu +\cdots +\mu \}=\mu \). Similarly, \(E[\bar {\tilde {X}}]=\tilde {\mu }\). Let M = (μ, μ, …, μ), that is, M is a p × n matrix wherein every column is the p × 1 vector μ. Let \({\bar {\mathbf {X}}}=(\bar {X},\ldots , \bar {X})\), that is, \({\bar {\mathbf {X}}}\) is a p × n matrix wherein every column is \(\bar {X}\). Now, consider

$$\displaystyle \begin{aligned}E[({\mathbf{X}}-{\mathbf{M}})({\mathbf{X}}-{\mathbf{M}})^{\prime}]=E\Big[\sum_{j=1}^n(X_j-\mu)(X_j-\mu)^{\prime}\Big]=\varSigma+\cdots+\varSigma=n\varSigma.\end{aligned}$$

As well,

$$\displaystyle \begin{aligned} ({\mathbf{X}}-{\mathbf{M}})({\mathbf{X}}-{\mathbf{M}})^{\prime}&= ({\mathbf{X}}-{\bar{\mathbf{X}}}+{\bar{\mathbf{X}}}-{\mathbf{M}})({\mathbf{X}}-{\bar{\mathbf{X}}}+{\bar{\mathbf{X}}}-{\mathbf{M}})^{\prime}\\ &= ({\mathbf{X}}-{\bar{\mathbf{X}}})({\mathbf{X}}-{\bar{\mathbf{X}}})^{\prime}+ ({\mathbf{X}}-{\bar{\mathbf{X}}})({\bar{\mathbf{X}}}-{\mathbf{M}})^{\prime}\\ &\ \ \ +({\bar{\mathbf{X}}}-{\mathbf{M}})({\mathbf{X}}-{\bar{\mathbf{X}}})^{\prime}+({\bar{\mathbf{X}}}-{\mathbf{M}})({\bar{\mathbf{X}}}-{\mathbf{M}})^{\prime}\Rightarrow\\ ({\mathbf{X}}-{\mathbf{M}})({\mathbf{X}}-{\mathbf{M}})^{\prime}&=S+\sum_{j=1}^n(X_j-\bar{X})(\bar{X}-\mu)^{\prime}+\sum_{j=1}^n(\bar{X}-\mu)(X_j-\bar{X})^{\prime}\end{aligned} $$
$$\displaystyle \begin{aligned} &\ \ \ \ \ +\sum_{j=1}^n(\bar{X}-\mu)(\bar{X}-\mu)^{\prime}\\ &=S+O+O+n(\bar{X}-\mu)(\bar{X}-\mu)^{\prime}\Rightarrow\end{aligned} $$
$$\displaystyle \begin{aligned} n\varSigma&=E[S]+O+O+n{\mathrm{Cov}}(\bar{X})=E[S]+n\Big[\frac{1}{n}\varSigma\Big]=E[S]+\varSigma\Rightarrow\\ E[S]&=(n-1)\varSigma\Rightarrow E[\hat{\varSigma}]=E\Big[\frac{1}{n}S\Big]=\frac{n-1}{n}\varSigma\to\varSigma\mbox{ as }n\to\infty.\end{aligned} $$

Observe that \(\sum _{j=1}^n(X_j-\bar {X})=O\), this result having been utilized twice in the above derivations. The complex case can be established in a similar manner. This completes the proof.
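A simulation makes the claim concrete. The following numpy sketch (an added illustration) uses a non-Gaussian population built from centered exponential variables, an arbitrary choice, and checks that the average of S over many samples is close to (n − 1)Σ.

```python
import numpy as np

# Monte Carlo check of E[S] = (n - 1) Sigma for a non-Gaussian population
rng = np.random.default_rng(8)
p, n, reps = 2, 5, 100_000
A = np.array([[1.0, 0.5],
              [0.2, 1.5]])
Sigma = A @ A.T                       # covariance of X_j = A W_j, where W_j has iid
                                      # centered exponential components with variance 1
S_sum = np.zeros((p, p))
for _ in range(reps):
    W = rng.exponential(scale=1.0, size=(p, n)) - 1.0   # mean 0, variance 1 entries
    X = A @ W
    xbar = X.mean(axis=1, keepdims=True)
    S_sum += (X - xbar) @ (X - xbar).T

print(np.round(S_sum / reps, 3))      # approximately (n - 1) * Sigma
print((n - 1) * Sigma)
```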

3.5.4. Properties of maximum likelihood estimators

Definition 3.5.2 (Unbiasedness)

Let g(θ) be a function of the parameter θ, where θ stands for all the parameters associated with a population’s distribution. Let the independently distributed random variables x 1, …, x n constitute a simple random sample of size n from a univariate population. Let T(x 1, …, x n) be an observable function of the sample values x 1, …, x n; then T is called a statistic (the plural form, statistics, is not to be confused with the subject of Statistics). This definition of a statistic holds when the iid variables are scalar, vector or matrix variables, whether in the real or complex domain. If E[T] = g(θ) for all θ in the parameter space, then T is said to be unbiased for g(θ) or an unbiased estimator of g(θ).

We will look at some properties of the MLE of the parameter or parameters represented by θ in a given population specified by its density/probability function f(x, θ). Consider a simple random sample of size n from this population. The sample will be of the form x 1, …, x n if the population is univariate or of the form X 1, …, X n if the population is multivariate or matrix-variate. Some properties of estimators in the scalar variable case will be illustrated first. Then the properties will be extended to the vector/matrix-variate cases. The joint density of the sample values will be denoted by L. Thus, in the univariate case,

$$\displaystyle \begin{aligned}L=L(x_1,\ldots, x_n,\theta)=\prod_{j=1}^nf(x_j,\theta)\Rightarrow \ln L=\sum_{j=1}^n\ln f(x_j,\theta). \end{aligned}$$

Since the total probability is 1, we have the following, taking for example the variable to be continuous and a scalar parameter θ:

$$\displaystyle \begin{aligned}\int_X L ~{\mathrm{d}}X=1\Rightarrow \frac{\partial}{\partial \theta}\int_X L ~{\mathrm{d}}X=0, \ X^{\prime}=(x_1,\ldots, x_n). \end{aligned}$$

We are going to assume that the support of x is free of θ and that the differentiation can be carried out inside the integral sign. Then,

$$\displaystyle \begin{aligned}0=\int_X\frac{\partial}{\partial \theta}L~{\mathrm{d}}X=\int_X\frac{1}{L}\Big(\frac{\partial}{\partial \theta}L\Big)~L~{\mathrm{d}}X=\int_X\Big[\frac{\partial}{\partial \theta}\ln L\Big]L~{\mathrm{d}}X.\end{aligned}$$

Noting that ∫X(⋅)L dX = E[(⋅)], we have

$$\displaystyle \begin{aligned} E\Big[\frac{\partial}{\partial \theta}\ln L\Big]=0\Rightarrow E\Big[ \sum_{j=1}^n\frac{\partial}{\partial \theta}\ln f(x_j,\theta)\Big]=0.{} \end{aligned} $$
(3.5.6)

Let \(\hat {\theta }\) be the MLE of θ. Then

$$\displaystyle \begin{aligned} \frac{\partial}{\partial \theta}L|{}_{\theta=\hat{\theta}}=0&\Rightarrow \frac{\partial}{\partial \theta}\ln L|{}_{\theta=\hat{\theta}}=0\\ &\Rightarrow \sum_{j=1}^n\frac{\partial}{\partial \theta}\ln f(x_j,\theta)|{}_{\theta=\hat{\theta}}=0.{} \end{aligned} $$
(3.5.7)

If θ is a scalar, then the above are single equations; otherwise, they represent a system of equations, as the derivatives are then vector or matrix derivatives. Here (3.5.7) is the likelihood equation giving rise to the maximum likelihood estimator (MLE) of θ. However, by the weak law of large numbers (see Sect. 2.6),

$$\displaystyle \begin{aligned} \frac{1}{n}\sum_{j=1}^n\frac{\partial}{\partial \theta}\ln f(x_j,\theta)|{}_{\theta=\hat{\theta}}\to E\Big[\frac{\partial}{\partial \theta}\ln f(x_j,\theta_o)\Big]\mbox{ as }n\to\infty{} \end{aligned} $$
(3.5.8)

where θ o is the true value of θ. Noting that \(E[\frac {\partial }{\partial \theta }\ln f(x_j,\theta _o)]=0\) owing to the fact that \(\int _{-\infty }^{\infty }f(x){\mathrm {d}}x=1\), we have the following results:

$$\displaystyle \begin{aligned}\sum_{j=1}^n\frac{\partial}{\partial \theta}\ln f(x_j,\theta)|{}_{\theta=\hat{\theta}}=0, \ E\Big[\frac{\partial}{\partial \theta}\ln f(x_j,\theta)|{}_{\theta=\theta_o}\Big]=0. \end{aligned}$$

This means that \(E[\hat {\theta }]=\theta _o\) or \(E[\hat {\theta }]\to \theta _o\) as n → ∞, that is, \(\hat {\theta }\) is asymptotically unbiased for the true value θ o of θ. As well, \(\hat {\theta }\to \theta _o\) as n → ∞ almost surely or with probability 1, except on a set having probability measure zero. Thus, the MLE of θ is asymptotically unbiased and consistent for the true value θ o, which is stated next as a theorem:

Theorem 3.5.9

In a given population’s distribution whose parameter or set of parameters is denoted by θ, the MLE of θ, denoted by \(\hat {\theta }\) , is asymptotically unbiased and consistent for the true value θ o.

Definition 3.5.3

Consistency of an estimator If \(\hat {\theta }\) converges to θ o in probability as n → ∞, that is, if \(Pr\{\Vert\hat {\theta }-\theta _o\Vert>\epsilon\}\to 0\) for every ϵ > 0, then we say that \(\hat {\theta }\) is consistent for θ o, where \(\hat {\theta }\) is an estimator of θ.

Example 3.5.2

Consider a real p-variate Gaussian population N p(μ, Σ), Σ > O. Show that the MLE of μ is unbiased and consistent for μ and that the MLE of Σ is asymptotically unbiased for Σ.

Solution 3.5.2

We have \(\hat {\mu }=\bar {X}=\) the sample mean or sample average and \(\hat {\varSigma }=\frac {1}{n}S\) where S is the sample sum of products matrix. From Theorem 3.5.4, \(E[\bar {X}]=\mu \) and \({\mathrm {Cov}}(\bar {X})=\frac {1}{n}\varSigma \to O\) as n → ∞. Therefore, \(\hat {\mu }=\bar {X}\) is unbiased and consistent for μ. From Theorem 3.5.8, \(E[\hat {\varSigma }]=\frac {n-1}{n}\varSigma \to \varSigma \) as n → ∞ and hence \(\hat {\varSigma }\) is asymptotically unbiased for Σ.

Another desirable property for point estimators is referred to as sufficiency. If T is a statistic used to estimate a real scalar, vector or matrix parameter θ and if the conditional distribution of the sample values, given this statistic T, is free of θ, then no more information about θ can be secured from that sample once the statistic T is known. Accordingly, all the information that can be obtained from the sample is contained in T or, in this sense, T is sufficient or a sufficient estimator for θ.

Definition 3.5.4

Sufficiency of estimators Let θ be a scalar, vector or matrix parameter associated with a given population’s distribution. Let T = T(X 1, …, X n) be an estimator of θ, where X 1, …, X n are iid as the given population. If the conditional distribution of the sample values X 1, …, X n, given T, is free of θ, then we say that this T is a sufficient estimator for θ. If there are several scalar, vector or matrix parameters θ 1, …, θ k associated with a given population, if T 1(X 1, …, X n), …, T r(X 1, …, X n) are r statistics, where r may be greater than, smaller than or equal to k, and if the conditional distribution of X 1, …, X n, given T 1, …, T r, is free of θ 1, …, θ k, then we say that T 1, …, T r are jointly sufficient for θ 1, …, θ k. If there are several sets of statistics, where each set is sufficient for θ 1, …, θ k, then that set of statistics which allows for the maximal reduction of the data is called the minimal sufficient set of statistics for θ 1, …, θ k.

Example 3.5.3

Show that the MLE of μ in a N p(μ, Σ), Σ > O, is sufficient for μ.

Solution 3.5.3

Let X 1, …, X n be a simple random sample from a N p(μ, Σ). Then the joint density of X 1, …, X n can be written as

$$\displaystyle \begin{aligned} L=\frac{1}{(2\pi)^{\frac{np}{2}}|\varSigma|{}^{\frac{n}{2}}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu)^{\prime}\varSigma^{-1}(\bar{X}-\mu)}, \end{aligned} $$
(i)

referring to (3.5.2). Since \(\bar {X}\) is a function of X 1, …, X n, the joint density of X 1, …, X n and \(\bar {X}\) is L itself. Hence, the conditional density of X 1, …, X n, given \(\bar {X}\), is \(L/f_1(\bar {X})\) where \(f_1(\bar {X})\) is the marginal density of \(\bar {X}\). However, appealing to Corollary 3.5.2, \(f_1(\bar {X})\) is \(N_p(\mu , \frac {1}{n}\varSigma )\). Hence

$$\displaystyle \begin{aligned} \frac{L}{f_1(\bar{X})}=\frac{1}{(2\pi)^{\frac{(n-1)p}{2}}n^{\frac{p}{2}}|\varSigma|{}^{\frac{n-1}{2}}}{\mathrm{e}}^{-\frac{1}{2}{\mathrm{tr}}(\varSigma^{-1}S)}, \end{aligned} $$
(ii)

which is free of μ so that \(\hat {\mu }\) is sufficient for μ.

Note 3.5.1

We can also show that \(\hat {\mu }=\bar {X}\) and \(\hat {\varSigma }=\frac {1}{n}S\) are jointly sufficient for μ and Σ in a N p(μ, Σ), Σ > O, population. This result requires the density of S, which will be discussed in Chap. 5.

An additional property of interest for a point estimator is that of relative efficiency. If g(θ) is a function of θ and if T = T(x 1, …, x n) is an estimator of g(θ), then E|T − g(θ)|2 is a squared mathematical distance between T and g(θ). We can consider the following criterion: the smaller the distance, the more efficient the estimator is, as we would like this distance to be as small as possible when we are estimating g(θ) by making use of T. If E[T] = g(θ), then T is unbiased for g(θ) and, in this case, E|T − g(θ)|2 = Var(T), the variance of T. In the class of unbiased estimators, we seek that particular estimator which has the smallest variance.

Definition 3.5.5

Relative efficiency of estimators If T 1 and T 2 are two estimators of the same function g(θ) of θ and if E[|T 1 − g(θ)|2] < E[|T 2 − g(θ)|2], then T 1 is said to be relatively more efficient than T 2 for estimating g(θ). If T 1 and T 2 are unbiased for g(θ), the criterion becomes Var(T 1) < Var(T 2).

Let u be an unbiased estimator of g(θ), a function of the parameter θ associated with any population, and let T be a sufficient statistic for θ. Let the conditional expectation of u, given T, be denoted by h(T), that is, E[u|T] ≡ h(T). We have the following two general properties of conditional expectations; refer, for instance, to Mathai and Haubold (2017). For any two real scalar random variables x and y having a joint density/probability function,

$$\displaystyle \begin{aligned} E[y]=E[E(y|x)]{} \end{aligned} $$
(3.5.9)

and

$$\displaystyle \begin{aligned} {\mathrm{Var}}(y)={\mathrm{Var}}(E[y|x])+E[{\mathrm{Var}}(y|x)]{} \end{aligned} $$
(3.5.10)

whenever the expected values exist. From (3.5.9),

$$\displaystyle \begin{aligned} g(\theta)=E[u]=E[E(u|T)]=E[h(T)]\Rightarrow E[h(T)]=g(\theta).{} \end{aligned} $$
(3.5.11)

Then,

$$\displaystyle \begin{aligned} {\mathrm{Var}}(u)&=E[u-g(\theta)]^2={\mathrm{Var}}(E[u|T])+E[{\mathrm{Var}}(u|T)]={\mathrm{Var}}(h(T))+\delta,\ \delta\ge 0\ \ \ \ \\ &\Rightarrow {\mathrm{Var}}(u)\ge {\mathrm{Var}}(h(T)), {} \end{aligned} $$
(3.5.12)

which means that if we have a sufficient statistic T for θ, then the variance of h(T), with h(T) = E[u|T] where u is any unbiased estimator of g(θ), is smaller than or equal to the variance of any unbiased estimator of g(θ). Accordingly, we should restrict ourselves to the class of h(T) when seeking minimum variance estimators. Observe that since δ in (3.5.12) is the expected value of the variance of a real variable, it is nonnegative. The inequality in (3.5.12) is known in the literature as the Rao-Blackwell Theorem.
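The Rao-Blackwell inequality is easy to see in action in a simple assumed setting (this illustration is not taken from the text): for iid N(θ, 1) observations, u = x 1 is unbiased for θ, T = \(\bar{x}\) is sufficient, and h(T) = E[u|T] = \(\bar{x}\). The sketch below compares the two variances by simulation.

```python
import numpy as np

# Rao-Blackwell illustration: conditioning x_1 on the sufficient statistic xbar
rng = np.random.default_rng(9)
theta, n, reps = 2.0, 10, 200_000
x = rng.normal(loc=theta, scale=1.0, size=(reps, n))

u = x[:, 0]                  # unbiased estimator based on a single observation
h = x.mean(axis=1)           # Rao-Blackwellized estimator E[u | xbar] = xbar

print(round(u.mean(), 3), round(h.mean(), 3))    # both close to theta = 2.0
print(round(u.var(), 3), round(h.var(), 3))      # about 1.0 versus about 1/n = 0.1
```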

It follows from (3.5.6) that \(E[\frac {\partial }{\partial \theta }\ln L] =\int _X(\frac {\partial }{\partial \theta }\ln L)L~{\mathrm {d}}X=0\). Differentiating once again with respect to θ, we have

$$\displaystyle \begin{aligned} 0&=\int_X\frac{\partial}{\partial \theta}\Big[\Big(\frac{\partial}{\partial \theta}\ln L\Big)L\Big]{\mathrm{d}} X=0\\ &\Rightarrow \int_X\Big\{\Big(\frac{\partial^2}{\partial \theta^2}\ln L\Big)L+\Big(\frac{\partial}{\partial \theta}\ln L\Big)^2\Big\}{\mathrm{d}}X=0\\ &\Rightarrow \int_X\Big(\frac{\partial}{\partial \theta}\ln L\Big)^2L~{\mathrm{d}}X=-\int_X\Big(\frac{\partial^2}{\partial \theta^2}\ln L\Big)L~{\mathrm{d}}X,\end{aligned} $$

so that

$$\displaystyle \begin{aligned} {\mathrm{Var}}\Big(\frac{\partial}{\partial \theta}\ln L\Big)&=E\Big[\frac{\partial}{\partial \theta}\ln L\Big]^2=-E\Big[\frac{\partial^2}{\partial \theta^2}\ln L\Big]\\ &=nE\Big[\frac{\partial}{\partial \theta}\ln f(x_j,\theta)\Big]^2=-nE\Big[\frac{\partial^2}{\partial \theta^2}\ln f(x_j,\theta)\Big].{} \end{aligned} $$
(3.5.13)

Let T be any estimator for θ, where θ is a real scalar parameter. If T is unbiased for θ, then E[T] = θ; otherwise, let E[T] = θ + b(θ) where b(θ) is some function of θ, which is called the bias. Then, differentiating both sides with respect to θ,

$$\displaystyle \begin{aligned} \int_XTL~{\mathrm{d}}X&=\theta+b(\theta)\Rightarrow \\ 1+b^{\prime}(\theta)&=\int_XT\frac{\partial}{\partial \theta}L~{\mathrm{d}}X,\ \ \ b^{\prime}(\theta)=\frac{{\mathrm{d}}}{{\mathrm{d}}\theta}b(\theta)\\ &\ \ \ \ \Rightarrow E[T(\frac{\partial}{\partial \theta}\ln L)]=1+b^{\prime}(\theta)\\ &={\mathrm{Cov}}(T, \frac{\partial}{\partial \theta}\ln L)\end{aligned} $$

because \(E[\frac {\partial }{\partial \theta }\ln L]=0\). Hence,

$$\displaystyle \begin{aligned}{}[{\mathrm{Cov}}(T,\frac{\partial}{\partial \theta}\ln L)]^2&=[1+b^{\prime}(\theta)]^2\le {\mathrm{Var}}(T){\mathrm{Var}}\Big(\frac{\partial}{\partial \theta}\ln L\Big)\Rightarrow\\ {\mathrm{Var}}(T)&\ge \frac{[1+b^{\prime}(\theta)]^2}{{\mathrm{Var}}(\frac{\partial}{\partial \theta}\ln L)} =\frac{[1+b^{\prime}(\theta)]^2}{n{\mathrm{Var}}(\frac{\partial}{\partial \theta}\ln f(x_j,\theta))}\\ &=\frac{[1+b^{\prime}(\theta)]^2}{E[\frac{\partial}{\partial \theta}\ln L]^2} =\frac{[1+b^{\prime}(\theta)]^2}{nE[\frac{\partial}{\partial \theta}\ln f(x_j,\theta)]^2}\ ,{} \end{aligned} $$
(3.5.14)

which is a lower bound for the variance of any estimator for θ. This inequality is known as the Cramér-Rao inequality in the literature. When T is unbiased for θ, \(b^{\prime}(\theta)=0\), and then

$$\displaystyle \begin{aligned} {\mathrm{Var}}(T)\ge \frac{1}{I_n(\theta)}=\frac{1}{nI_1(\theta)}{} \end{aligned} $$
(3.5.15)

where

$$\displaystyle \begin{aligned} I_n(\theta)&={\mathrm{Var}}\Big(\frac{\partial}{\partial \theta}\ln L\Big)=E\Big[\frac{\partial}{\partial \theta}\ln L\Big]^2=nE\Big[\frac{\partial}{\partial \theta}\ln f(x_j,\theta)\Big]^2\\ &=-E\Big[\frac{\partial^2}{\partial \theta^2}\ln L\Big]= -nE\Big[\frac{\partial^2}{\partial \theta^2}\ln f(x_j,\theta)\Big]=nI_1(\theta){} \end{aligned} $$
(3.5.16)

is known as Fisher’s information about θ which can be obtained from a sample of size n, I 1(θ) being Fisher’s information in one observation or a sample of size 1. Observe that Fisher’s information is different from the information in Information Theory. For instance, some aspects of Information Theory are discussed in Mathai and Rathie (1975).
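
As a numerical illustration of the two equivalent expressions for I 1(θ) in (3.5.16), the following Monte Carlo sketch assumes an exponential population with mean θ, for which I 1(θ) = 1∕θ 2; the parameter value, sample size and seed are arbitrary choices.

```python
# A Monte Carlo sketch of the two expressions for Fisher's information in (3.5.16),
# assuming an exponential population with mean theta, for which I_1(theta) = 1/theta^2.
import numpy as np

rng = np.random.default_rng(2)
theta, reps = 2.0, 200000
x = rng.exponential(theta, size=reps)

score = -1.0 / theta + x / theta**2           # (d/dtheta) ln f(x, theta)
second = 1.0 / theta**2 - 2.0 * x / theta**3  # (d^2/dtheta^2) ln f(x, theta)

print(np.mean(score**2))   # E[(d ln f/dtheta)^2]   ~ 1/theta^2 = 0.25
print(-np.mean(second))    # -E[d^2 ln f/dtheta^2]  ~ 1/theta^2 = 0.25
```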

Asymptotic efficiency and normality of MLE’s

We have already established that

$$\displaystyle \begin{aligned} 0=\frac{\partial}{\partial \theta}\ln L(X,\theta)|{}_{\theta=\hat{\theta}}\ , \end{aligned} $$
(i)

which is the likelihood equation giving rise to the MLE. Let us expand (i) in a neighborhood of the true parameter value θ o :

$$\displaystyle \begin{aligned} 0&=\frac{\partial}{\partial \theta}\ln L(X,\theta)|{}_{\theta=\theta_o}+(\hat{\theta}-\theta_o)\frac{\partial^2}{\partial \theta^2}\ln L(X,\theta)|{}_{\theta=\theta_o}\\ &\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ +\frac{(\hat{\theta}-\theta_o)^2}{2}\frac{\partial^3}{\partial\theta^3}\ln L(X,\theta)|{}_{\theta=\theta_1} \end{aligned} $$
(ii)

where \(|\hat {\theta }-\theta _1|<|\hat {\theta }-\theta _o|\). Multiplying both sides by \(\sqrt {n}\) and rearranging terms, we have the following:

$$\displaystyle \begin{aligned} \sqrt{n}(\hat{\theta}-\theta_0)=\frac{-\frac{1}{\sqrt{n}}\frac{\partial}{\partial \theta}\ln L(X,\theta)|{}_{\theta=\theta_o}}{\frac{1}{n}\frac{\partial^2}{\partial \theta^2}\ln L(X,\theta)|{}_{\theta=\theta_o}+\frac{1}{n}\frac{(\hat{\theta}-\theta_o)}{2}\frac{\partial^3}{\partial \theta^3}\ln L(X,\theta)|{}_{\theta=\theta_1}}. \end{aligned} $$
(iii)

The second term in the denominator of (iii) goes to zero because \(\hat {\theta }\to \theta _o\) as \(n\to\infty\) and the third derivative is assumed to be bounded. Then the first term in the denominator is such that

$$\displaystyle \begin{aligned} \frac{1}{n}\frac{\partial^2}{\partial \theta^2}\ln L(X,\theta)|{}_{\theta=\theta_o}&=\frac{1}{n}\sum_{j=1}^n\frac{\partial^2}{\partial \theta^2}\ln f(x_j,\theta)|{}_{\theta=\theta_o}\\ &\ \ \ \ \to E\Big[\frac{\partial^2}{\partial\theta^2}\ln f(x_j,\theta)\Big]=-I_1(\theta)|{}_{\theta_o}\\ &=- {\mathrm{Var}}\Big[\frac{\partial}{\partial \theta}\ln f(x_j,\theta)\Big]\Big|{}_{\theta=\theta_o}, \\ I_1(\theta)|{}_{\theta=\theta_o}&={\mathrm{Var}}\Big[\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\Big]\Big|{}_{\theta=\theta_o},\end{aligned} $$

which is the information bound I 1(θ o). Thus,

$$\displaystyle \begin{aligned} \frac{1}{n}\frac{\partial^2}{\partial \theta^2}\ln L(X,\theta)|{}_{\theta=\theta_o}\to -I_1(\theta_o), \end{aligned} $$
(iv)

and we may write (iii) as follows:

$$\displaystyle \begin{aligned} \sqrt{I_1(\theta_o)}\sqrt{n}(\hat{\theta}-\theta_o)\approx \frac{\sqrt{n}}{\sqrt{I_1(\theta_o)}}\frac{1}{n}\sum_{j=1}^n\frac{\partial}{\partial \theta}\ln f(x_j,\theta)|{}_{\theta=\theta_o}\,, \end{aligned} $$
(v)

where \(\frac {\partial }{\partial \theta }\ln f(x_j,\theta )\), evaluated at θ = θ o, has zero as its expected value and I 1(θ o) as its variance. Further, \(\frac {\partial }{\partial \theta }\ln f(x_j,\theta )\), j = 1, …, n, are iid random variables. Hence, by the central limit theorem which is stated in Sect. 2.6,

$$\displaystyle \begin{aligned} \frac{\sqrt{n}}{\sqrt{I_1(\theta_o)}}\frac{1}{n}\sum_{j=1}^n\frac{\partial}{\partial \theta}\ln f(x_j,\theta)|{}_{\theta=\theta_o}\to N_1(0,1)\mbox{ as }n\to\infty,{} \end{aligned} $$
(3.5.17)

where N 1(0, 1) is a univariate standard normal random variable. This may also be re-expressed as follows since I 1(θ o) is free of n:

$$\displaystyle \begin{aligned}\frac{1}{\sqrt{n}}\sum_{j=1}^n\frac{\partial}{\partial \theta}\ln f(x_j,\theta)|{}_{\theta=\theta_o}\to N_1(0, I_1(\theta_o)) \end{aligned}$$

or

$$\displaystyle \begin{aligned} \sqrt{I_1(\theta_o)}\sqrt{n}(\hat{\theta}-\theta_o)\to N_1(0,1)\mbox{ as }n\to\infty.{} \end{aligned} $$
(3.5.18)

Since I 1(θ o) is free of n, this result can also be written as

$$\displaystyle \begin{aligned} \sqrt{n}(\hat{\theta}-\theta_o)\to N_1\Big(0,\frac{1}{I_1(\theta_o)}\Big).{} \end{aligned} $$
(3.5.19)

Thus, the MLE \(\hat {\theta }\) is asymptotically unbiased, consistent and asymptotically normal, referring to (3.5.18) or (3.5.19).

Example 3.5.4

Show that the MLE of the parameter θ in a real scalar exponential population is unbiased, consistent, efficient and that asymptotic normality holds as in (3.5.18).

Solution 3.5.4

As per the notations introduced in this section,

$$\displaystyle \begin{aligned} f(x_j,\theta)&=\frac{1}{\theta}{\mathrm{e}}^{-\frac{x_j}{\theta}},\ 0\le x_j<\infty,\ \theta>0,\\ L&=\frac{1}{\theta^n}{\mathrm{e}}^{-\frac{1}{\theta}\sum_{j=1}^nx_j}.\end{aligned} $$

In the exponential population, E[x j] = θ and Var(x j) = θ 2, j = 1, …, n; the MLE of θ is \(\hat {\theta }=\bar {x},\ \bar {x}=\frac {1}{n}(x_1+\cdots +x_n)\), and \( {\mathrm {Var}}(\hat {\theta })=\frac {\theta ^2}{n}\to 0\) as \(n\to\infty\). Thus, \(E[\hat {\theta }]=\theta \) and \( {\mathrm {Var}}(\hat {\theta })\to 0\) as \(n\to\infty\). Hence, \(\hat {\theta }\) is unbiased and consistent for θ. Note that

$$\displaystyle \begin{aligned}\ln f(x_j,\theta)=-\ln \theta -\frac{1}{\theta}x_j\Rightarrow -E\Big[\frac{\partial^2}{\partial \theta^2}\ln f(x_j,\theta)\Big]=-\frac{1}{\theta^2}+2\frac{E[x_j]}{\theta^3}=\frac{1}{\theta^2}=\frac{1}{{\mathrm{Var}}(x_j)}\ .\end{aligned}$$

Accordingly, \({\mathrm{Var}}(\hat{\theta})=\frac{\theta^2}{n}=\frac{1}{nI_1(\theta)}\), so the information bound is attained; that is, \(\hat {\theta }\) is a minimum variance unbiased, or most efficient, estimator. Letting the true value of θ be θ o, by the central limit theorem, we have

$$\displaystyle \begin{aligned}\frac{\bar{x}-\theta_o}{\sqrt{{\mathrm{Var}}(\bar{x})}}=\frac{\sqrt{n}(\hat{\theta}-\theta_o)}{\theta_o}\to N_1(0,1) \mbox{ as }n\to\infty, \end{aligned}$$

and hence the asymptotic normality is also verified. Is \(\hat {\theta }\) sufficient for θ? Let us consider the statistic u = x 1 + ⋯ + x n, the sample sum. If u is sufficient, then \(\bar {x}=\hat {\theta }\) is also sufficient. The mgf of u is given by

$$\displaystyle \begin{aligned}M_u(t)=\prod_{j=1}^n(1-\theta t)^{-1}=(1-\theta t)^{-n},\ 1-\theta t>0\ \Rightarrow u\ \sim \mbox{ gamma} (\alpha=n, \beta=\theta) \end{aligned}$$

whose density is \(f_1(u)=\frac {u^{n-1}}{\theta ^n\varGamma (n)}{\mathrm {e}}^{-\frac {u}{\theta }}, \ u=x_1+\cdots +x_n\). However, the joint density of x 1, …, x n is \(L=\frac {1}{\theta ^n}{\mathrm {e}}^{-\frac {1}{\theta }(x_1+\cdots +x_n)}.\) Accordingly, the conditional density of x 1, …, x n given \(\hat {\theta }=\bar {x}\) is

$$\displaystyle \begin{aligned}\frac{L}{f_1(u)}=\frac{\varGamma(n)}{u^{n-1}}\ , \end{aligned}$$

which is free of θ, and hence \(\hat {\theta }\) is also sufficient.
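
The conclusions of this example can also be checked by simulation. The following sketch assumes the true value θ o = 2; the sample size, number of replications and seed are arbitrary choices.

```python
# A simulation sketch of Example 3.5.4, assuming theta_o = 2: it checks that the MLE
# (the sample mean) is unbiased, that its variance is close to the Cramér-Rao bound
# theta_o^2/n, and that sqrt(n)(theta_hat - theta_o)/theta_o is close to N(0, 1).
import numpy as np

rng = np.random.default_rng(3)
theta_o, n, reps = 2.0, 200, 20000

x = rng.exponential(theta_o, size=(reps, n))
theta_hat = x.mean(axis=1)                        # the MLE in each replication

print(theta_hat.mean())                           # ~ theta_o (unbiasedness)
print(theta_hat.var(), theta_o**2 / n)            # ~ the Cramér-Rao bound theta_o^2/n

z = np.sqrt(n) * (theta_hat - theta_o) / theta_o  # the standardized MLE of (3.5.18)
print(z.mean(), z.var())                          # ~ 0 and ~ 1
```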

3.5.5. Some limiting properties in the p-variate case

The p-variate extension of the central limit theorem is now being considered. Let the p × 1 real vectors X 1, …, X n be iid with common mean value vector μ and the common covariance matrix Σ > O, that is, E(X j) = μ and Cov(X j) = Σ > O, j = 1, …, n. Assume that \(\Vert\varSigma\Vert<\infty\) where ∥(⋅)∥ denotes a norm of (⋅). Letting \(Y_j=\varSigma ^{-\frac {1}{2}}X_j\), \(E[Y_j]=\varSigma ^{-\frac {1}{2}}\mu \) and Cov(Y j) = I, j = 1, …, n, and letting \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\), \(\bar {Y}=\varSigma ^{-\frac {1}{2}}\bar {X}\), \(E(\bar {X})=\mu \) and \( E(\bar {Y})=\varSigma ^{-\frac {1}{2}}\mu \). If we let

$$\displaystyle \begin{aligned} U=\sqrt{n}\,\varSigma^{-\frac{1}{2}}(\bar{X}-\mu),{} \end{aligned} $$
(3.5.20)

the following result holds:

Theorem 3.5.10

Let the p × 1 vector U be as defined in (3.5.20). Then, as n →∞, U → N p(O, I).

Proof

Let L  = (a 1, …, a p) be an arbitrary constant vector such that L L = 1. Then, L X j, j = 1, …, n, are iid with common mean L μ and common variance Var(L X j) = L ΣL. Let \(Y_j=\varSigma ^{-\frac {1}{2}}X_j\) and \(u_j=L^{\prime }Y_j=L^{\prime }\varSigma ^{-\frac {1}{2}}X_j\). Then, the common mean of the u j’s is \(L^{\prime }\varSigma ^{-\frac {1}{2}}\mu \) and their common variance is \({\mathrm {Var}}(u_j)=L^{\prime }\varSigma ^{-\frac {1}{2}}\varSigma \varSigma ^{-\frac {1}{2}}L=L^{\prime }L=1,\ j=1,\ldots , n\). Note that \(\bar {u}=\frac {1}{n}(u_1+\cdots +u_n)=L^{\prime }\bar {Y}=L^{\prime }\varSigma ^{-\frac {1}{2}}\bar {X}\) and that \({\mathrm {Var}}(\bar {u})=\frac {1}{n}L^{\prime }L=\frac {1}{n}\). Then, in light of the univariate central limit theorem as stated in Sect. 2.6, we have \(\sqrt {n}L^{\prime }\varSigma ^{-\frac {1}{2}}(\bar {X}-\mu )\to N_1(0,1)\) as \(n\to\infty\). If, for some p-variate vector W, L W is univariate normal for arbitrary L, it follows from a characterization of the multivariate normal distribution that W is a p-variate normal vector. Thus,

$$\displaystyle \begin{aligned} U=\sqrt{n}\varSigma^{-\frac{1}{2}}(\bar{X}-\mu)\to N_p(O,I)\mbox{ as }n\to\infty,{} \end{aligned} $$
(3.5.21)

which completes the proof.
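
Theorem 3.5.10 can be illustrated by simulation. The sketch below assumes a particular non-Gaussian population, namely vectors with iid centred exponential components rescaled to have mean μ and covariance matrix Σ; the dimension, parameter values and seed are illustrative choices.

```python
# A simulation sketch of Theorem 3.5.10: the X_j have iid centered exponential
# components rescaled by a symmetric square root of Sigma, so E[X_j] = mu and
# Cov(X_j) = Sigma, and U = sqrt(n) Sigma^(-1/2)(Xbar - mu) should be close to N_p(O, I).
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 3, 400, 5000
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.5]])

# symmetric square roots of Sigma and Sigma^(-1) via the spectral decomposition
w, P = np.linalg.eigh(Sigma)
S_half = P @ np.diag(np.sqrt(w)) @ P.T
S_inv_half = P @ np.diag(1.0 / np.sqrt(w)) @ P.T

E = rng.exponential(1.0, size=(reps, n, p)) - 1.0    # iid rows, mean 0, covariance I
X = mu + E @ S_half                                  # X_j with mean mu, covariance Sigma
U = np.sqrt(n) * (X.mean(axis=1) - mu) @ S_inv_half  # one U vector per replication

print(U.mean(axis=0))           # ~ zero vector
print(np.cov(U, rowvar=False))  # ~ identity matrix
```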

A parallel result also holds in the complex domain. Let \(\tilde {X}_j,\ j=1,\ldots , n,\) be iid from some complex population with mean \(\tilde {\mu }\) and Hermitian positive definite covariance matrix \(\tilde {\varSigma }=\tilde {\varSigma }^{*}>O\) where \(\Vert \tilde {\varSigma }\Vert <\infty \). Letting \(\bar {\tilde {X}}=\frac {1}{n}(\tilde {X}_1+\cdots +\tilde {X}_n)\), we have

$$\displaystyle \begin{aligned} \sqrt{n}\,\tilde{\varSigma}^{-\frac{1}{2}}(\bar{\tilde{X}}-\tilde{\mu})\to \tilde{N}_p(O,I)\mbox{ as }n\to\infty.{} \end{aligned} $$
(3.5a.5)

Exercises 3.5

3.2.33

By making use of the mgf or otherwise, show that the sample mean \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) in the real p-variate Gaussian case, X j ∼ N p(μ, Σ), Σ > O, is again Gaussian distributed with the parameters μ and \(\frac {1}{n}\varSigma \).

3.2.34

Let X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n and iid. Let X = (X 1, …, X n) be the p × n sample matrix. Derive the density of (1) \({\mathrm {tr}}(\varSigma ^{-1}({\mathbf {X}}-{\mathbf {M}})({\mathbf {X}}-{\mathbf {M}})^{\prime })\) where M = (μ, …, μ) is the p × n matrix all of whose columns are equal to μ; (2) \({\mathrm {tr}}({\mathbf {X}}{\mathbf {X}}^{\prime })\). Derive the densities in both cases, including the noncentrality parameter.

3.2.35

Let the p × 1 real vector X j ∼ N p(μ, Σ), Σ > O for j = 1, …, n and iid. Let X = (X 1, …, X n) be the p × n sample matrix. Derive the density of \({\mathrm {tr}}({\mathbf {X}}-{\bar {\mathbf {X}}})({\mathbf {X}}-{\bar {\mathbf {X}}})^{\prime }\) where \({\bar {\mathbf {X}}}=(\bar {X},\ldots , \bar {X})\) is the p × n matrix where every column is \(\bar {X}\).

3.2.36

Repeat Exercise 3.2.33 for the p-variate complex Gaussian case.

3.2.37

Repeat Exercise 3.2.34 for the complex Gaussian case and write down the density explicitly.

3.2.38

Consider a real bivariate normal density with the parameters μ 1, μ 2, \(\sigma _1^2,\sigma _2^2,\rho \). Write down the density explicitly. Consider a simple random sample of size n, X 1, …, X n, from this population where X j is 2 × 1, j = 1, …, n. Then evaluate the MLEs of these five parameters (1) by direct evaluation, (2) by using the general formula.

3.2.39

In Exercise 3.2.38 evaluate the maximum likelihood estimates of the five parameters if the following is an observed sample from this bivariate normal population:

3.2.40

Repeat Exercise 3.2.38 if the population is a bivariate normal in the complex domain.

3.2.41

Repeat Exercise 3.2.39 if the following is an observed sample from the complex bivariate normal population referred to in Exercise 3.2.40:

3.2.42

Let the p × 1 real vector X 1 be Gaussian distributed, X 1 ∼ N p(O, I). Consider the quadratic forms \(u_1=X_1^{\prime }A_1X_1, u_2=X_1^{\prime }A_2X_1\). Let \(A_j=A_j^2,\ j=1,2\) and A 1 + A 2 = I. What can you say about the chisquaredness and independence of u 1 and u 2? Prove your assertions.

3.2.43

Let X 1 ∼ N p(O, I). Let \(u_j=X_1^{\prime }A_jX_1, \ A_j=A_j^2,\ j=1,\ldots , k,\ A_1+\cdots +A_k=I\). What can you say about the chisquaredness and independence of the u j’s? Prove your assertions.

3.5.12

Repeat Exercise 3.2.43 for the complex case.

3.5.13

Let X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n and iid. Let \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\). Show that the exponent in the density of \(\bar {X}\), excluding \(-\frac {1}{2}\), namely \(n(\bar {X}-\mu )^{\prime }\varSigma ^{-1}(\bar {X}-\mu )\), is distributed as \(\chi _p^2\). Derive the density of \({\mathrm {tr}}(\bar {X}^{\prime }\varSigma ^{-1}\bar {X})\).

3.5.14

Let \(Q=n(\bar {X}-\mu )^{\prime }\varSigma ^{-1}(\bar {X}-\mu )\) as in Exercise 3.5.13. For a given α, consider the probability statement Pr{Q ≥ b} = α. Show that \(b=\chi _{p,\alpha }^2\) where \(Pr\{\chi _p^2\ge \chi _{p,\alpha }^2\}=\alpha \).

3.5.15

Let \(Q_1=n(\bar {X}-\mu _o)^{\prime }\varSigma ^{-1}(\bar {X}-\mu _o)\) where \(\bar {X}\) and \(\varSigma \) are as defined in Exercise 3.5.14 and μ is the true mean value vector. If μ o ≠ μ, show that \(Q_1\sim \chi _p^2(\lambda )\) where the noncentrality parameter is \(\lambda =\frac {n}{2}(\mu -\mu _o)^{\prime }\varSigma ^{-1}(\mu -\mu _o)\).

3.6. Elliptically Contoured Distribution, Real Case

Let X be a real p × 1 vector of distinct real scalar variables with x 1, …, x p as its components. For some p × 1 parameter vector B and p × p positive definite constant matrix A > O, consider the positive definite quadratic form (X − B)′A(X − B). We have encountered such a quadratic form in the exponent of a real p-variate Gaussian density, in which case B = μ is the mean value vector and A = Σ −1, Σ being the positive definite covariance matrix. Let g(⋅) ≥ 0 be a non-negative function such that \(|A|{ }^{\frac {1}{2}}g((X-B)^{\prime }A(X-B))\ge 0\) and

$$\displaystyle \begin{aligned} \int_X|A|{}^{\frac{1}{2}}g((X-B)^{\prime}A(X-B)){\mathrm{d}}X=1,{} \end{aligned} $$
(3.6.1)

so that \(|A|{ }^{\frac {1}{2}}g((X-B)^{\prime }A(X-B))\) is a statistical density. Such a density is referred to as an elliptically contoured density.
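
As an illustration (these special cases are standard and are added here only for orientation), the real p-variate Gaussian density corresponds to the choices B = μ, A = Σ −1 and

$$\displaystyle \begin{aligned} g(u)=(2\pi)^{-\frac{p}{2}}{\mathrm{e}}^{-\frac{u}{2}},\ \ \mbox{so that}\ \ |A|{}^{\frac{1}{2}}g((X-B)^{\prime}A(X-B))=\frac{{\mathrm{e}}^{-\frac{1}{2}(X-\mu)^{\prime}\varSigma^{-1}(X-\mu)}}{(2\pi)^{\frac{p}{2}}|\varSigma|{}^{\frac{1}{2}}}, \end{aligned}$$

while a Student-t type member is obtained with \(g(u)=c\,(1+\frac{u}{\nu})^{-\frac{\nu+p}{2}},\ \nu>0,\) for an appropriate normalizing constant c.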

3.6.1. Some properties of elliptically contoured distributions

Let \(Y=A^{\frac {1}{2}}(X-B).\) Then, from Theorem 1.6.1, \({\mathrm {d}}X=|A|{ }^{-\frac {1}{2}}{\mathrm {d}}Y\) and from (3.6.1),

$$\displaystyle \begin{aligned} \int_Yg(Y^{\prime}Y){\mathrm{d}}Y=1{} \end{aligned} $$
(3.6.2)

where

$$\displaystyle \begin{aligned}Y^{\prime}Y=y_1^2+y_2^2+\cdots+y_p^2, \ Y^{\prime}=(y_1,\ldots, y_p). \end{aligned}$$

We can further simplify (3.6.2) via a general polar coordinate transformation:

$$\displaystyle \begin{aligned} y_1&=r~\sin\theta_1\\ y_2&=r~\cos\theta_1\sin\theta_2\\ y_3&=r~\cos\theta_1\cos\theta_2\sin\theta_3\\ &\ \, \vdots\\ y_{p-2}&=r~\cos\theta_1\cdots\cos\theta_{p-3}\sin\theta_{p-2}\\ y_{p-1}&=r~\cos\theta_1\cdots\cos\theta_{p-2}\sin\theta_{p-1}\\ y_p&=r~\cos\theta_1\cdots\cos\theta_{p-1}{} \end{aligned} $$
(3.6.3)

for \(-\frac {\pi }{2}<\theta _j\le \frac {\pi }{2},\ j=1,\ldots , p-2,\ -\pi <\theta _{p-1}\le \pi , \ 0\le r<\infty \). It then follows that

$$\displaystyle \begin{aligned} {\mathrm{d}}y_1\wedge\ldots\wedge{\mathrm{d}}y_p=r^{p-1}(\cos\theta_1)^{p-2}(\cos\theta_2)^{p-3}\cdots(\cos\theta_{p-2})\,{\mathrm{d}}r\wedge{\mathrm{d}}\theta_1\wedge\ldots\wedge{\mathrm{d}}\theta_{p-1}.{} \end{aligned} $$
(3.6.4)

Note that, from (3.6.3),

$$\displaystyle \begin{aligned}y_1^2+\cdots+y_p^2=r^2. \end{aligned}$$

Given (3.6.3) and (3.6.4), observe that r, θ 1, …, θ p−1 are mutually independently distributed. Separating the factor containing θ i from (3.6.4), its normalizing constant is obtained from the integral

$$\displaystyle \begin{aligned} \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}(\cos\theta_i)^{p-i-1}{\mathrm{d}}\theta_i=2\int_0^{\frac{\pi}{2}}(\cos\theta_i)^{p-i-1}{\mathrm{d}}\theta_i. \end{aligned} $$
(i)

Let \(u=\sin \theta _i\Rightarrow {\mathrm {d}}u=\cos \theta _i\,{\mathrm {d}}\theta _i\). Then (i) becomes \(2\int _0^1(1-u^2)^{\frac {p-i}{2}-1}{\mathrm {d}}u\), and letting v = u 2 gives (i) as

$$\displaystyle \begin{aligned} \int_0^1v^{\frac{1}{2}-1}(1-v)^{\frac{p-i}{2}-1}{\mathrm{d}}v=\frac{\varGamma(\frac{1}{2})\varGamma(\frac{p-i}{2})}{\varGamma(\frac{p-i+1}{2})}. \end{aligned} $$
(ii)

Thus, the density of θ j, denoted by f j(θ j), is

$$\displaystyle \begin{aligned} f_j(\theta_j)=\frac{\varGamma(\frac{p-j+1}{2})}{\varGamma(\frac{1}{2})\varGamma(\frac{p-j}{2})}(\cos\theta_j)^{p-j-1},\ -\frac{\pi}{2}<\theta_j\le \frac{\pi}{2},{} \end{aligned} $$
(3.6.5)

and zero, elsewhere, for j = 1, …, p − 2, and

$$\displaystyle \begin{aligned} f_{p-1}(\theta_{p-1})=\frac{1}{2\pi},\ -\pi<\theta_{p-1}\le \pi, \end{aligned} $$
(iii)

and zero elsewhere. Taking the product of the p − 2 integrals of the type evaluated in (ii) and integrating over θ p−1 as in (iii), the total integral over the θ j’s is available as

$$\displaystyle \begin{aligned} \Big\{\prod_{j=1}^{p-2}\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}(\cos\theta_j)^{p-j-1}{\mathrm{d}}\theta_j\Big\}\int_{-\pi}^{\pi}{\mathrm{d}}\theta_{p-1}=\frac{2\pi^{\frac{p}{2}}}{\varGamma(\frac{p}{2})}.{} \end{aligned} $$
(3.6.6)

The expression in (3.6.6), excluding the factor 2, can also be obtained by making the transformation s = Y ′Y  and then expressing dY  in terms of ds by appealing to Theorem 4.2.3.
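
As a quick check of (3.6.6) (a worked special case added for illustration), take p = 3:

$$\displaystyle \begin{aligned} \Big\{\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}\cos\theta_1\,{\mathrm{d}}\theta_1\Big\}\int_{-\pi}^{\pi}{\mathrm{d}}\theta_2=(2)(2\pi)=4\pi=\frac{2\pi^{\frac{3}{2}}}{\varGamma(\frac{3}{2})}, \end{aligned}$$

which is the surface area of the unit sphere in three dimensions, as one would expect.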

3.6.2. The density of u = r 2

From (3.6.2) and (3.6.3),

$$\displaystyle \begin{aligned} \frac{2\pi^{\frac{p}{2}}}{\varGamma(\frac{p}{2})}\int_0^{\infty}r^{p-1}g(r^2){\mathrm{d}}r=1,{} \end{aligned} $$
(3.6.7)

that is,

$$\displaystyle \begin{aligned} 2\int_{r=0}^{\infty}r^{p-1}g(r^2){\mathrm{d}}r=\frac{\varGamma(\frac{p}{2})}{\pi^{\frac{p}{2}}}. \end{aligned} $$
(iv)

Letting u = r 2, we have

$$\displaystyle \begin{aligned} \int_0^{\infty}u^{\frac{p}{2}-1}g(u){\mathrm{d}}u=\frac{\varGamma(\frac{p}{2})}{\pi^{\frac{p}{2}}}, \end{aligned} $$
(v)

and the density of r, denoted by f r(r), is available from (3.6.7) as

$$\displaystyle \begin{aligned} f_r(r)=\frac{2\pi^{\frac{p}{2}}}{\varGamma(\frac{p}{2})}\ r^{p-1}g(r^2), \ 0\le r<\infty,{} \end{aligned} $$
(3.6.8)

and zero, elsewhere. The density of u = r 2 is then

$$\displaystyle \begin{aligned} f_u(u)=\frac{\pi^{\frac{p}{2}}}{\varGamma(\frac{p}{2})}u^{\frac{p}{2}-1}g(u),\ 0\le u<\infty,{} \end{aligned} $$
(3.6.9)

and zero, elsewhere. Considering the density of Y  given in (3.6.2), we may observe that y 1, …, y p are identically distributed.
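
The representation y j = r u j underlying (3.6.3)–(3.6.9) suggests a simple way of simulating from (3.6.1): generate a uniform direction on the unit sphere and an independent radius with density (3.6.8), and map back through \(A^{-\frac{1}{2}}\). The sketch below assumes the Gaussian choice of g, for which u = r 2 is chisquare with p degrees of freedom; the matrices, vectors and seed are illustrative choices.

```python
# A sampling sketch based on (3.6.3)-(3.6.9), assuming the Gaussian choice of g, for
# which the radius r has r^2 distributed as chisquare with p degrees of freedom: draw
# a uniform direction on the unit sphere and an independent radius, then map through
# A^(-1/2) and shift by B.
import numpy as np

rng = np.random.default_rng(5)
p, reps = 3, 200000
B = np.array([1.0, 2.0, -1.0])
A = np.array([[1.0, 0.3, 0.0], [0.3, 2.0, 0.2], [0.0, 0.2, 1.5]])  # plays the role of Sigma^{-1}

w, P = np.linalg.eigh(A)
A_inv_half = P @ np.diag(1.0 / np.sqrt(w)) @ P.T   # symmetric A^(-1/2)

Z = rng.standard_normal((reps, p))
U = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # uniform directions on the unit sphere
r = np.sqrt(rng.chisquare(p, size=(reps, 1)))      # radius consistent with (3.6.8) for Gaussian g
X = B + (r * U) @ A_inv_half                       # X = B + A^(-1/2) (r U)

print(X.mean(axis=0))           # ~ B, in agreement with Sect. 3.6.3
print(np.cov(X, rowvar=False))  # ~ A^{-1}, in agreement with (3.6.12) since E[r^2]/p = 1 here
```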

Theorem 3.6.1

If y j = r u j, j = 1, …, p, in the transformation in (3.6.3), then \(E[u_j^2]=\frac {1}{p}, \ j=1,\ldots , p,\) and the vector (u 1, …, u p) is uniformly distributed over the surface of the unit sphere \(u_1^2+\cdots +u_p^2=1\).

Proof

From (3.6.3), y j = ru j, j = 1, …, p. We may observe that \(u_1^2+\cdots +u_p^2=1\) and that u 1, …, u p are identically distributed. Hence \(E[u_1^2]+E[u_2^2]+\cdots +E[u_p^2]=1\Rightarrow E[u_j^2]=\frac {1}{p}\).

Theorem 3.6.2

Consider the y j ’s in Eq.(3.6.2). If g(u) is free of p and if E[u] < ∞, then \(E[y_j^2]=\frac {1}{2\pi }\) ; otherwise, \(E[y_j^2]=\frac {1}{p}E[u]\) provided E[u] exists.

Proof

Since r and u j are independently distributed and since \(E[u_j^2]=\frac {1}{p}\) in light of Theorem 3.6.1, \(E[y_j^2]=E[r^2]E[u_j^2]=\frac {1}{p}E[r^2]=\frac {1}{p}E[u]\). From (3.6.9),

$$\displaystyle \begin{aligned} \int_0^{\infty}u^{\frac{p}{2}-1}g(u){\mathrm{d}}u=\frac{\varGamma(\frac{p}{2})}{\pi^{\frac{p}{2}}}. \end{aligned} $$
(vi)

However,

$$\displaystyle \begin{aligned} E[u]=\frac{\pi^{\frac{p}{2}}}{\varGamma(\frac{p}{2})}\int_{u=0}^{\infty}u^{\frac{p}{2}+1-1}g(u){\mathrm{d}}u. \end{aligned} $$
(vii)

Thus, assuming that g(u) is free of p, that \(\frac {p}{2}\) can be taken as a parameter and that (vii) is convergent,

$$\displaystyle \begin{aligned} E[y_j^2]=\frac{1}{p}E[r^2]=\frac{1}{p}E[u]=\frac{1}{p}\frac{\pi^{\frac{p}{2}}}{\varGamma(\frac{p}{2})}\frac{\varGamma(\frac{p}{2}+1)}{\pi^{\frac{p}{2}+1}} =\frac{1}{2\pi};{} \end{aligned} $$
(3.6.10)

otherwise, \(E[y_j^2]=\frac {1}{p}E[u]\) as long as E[u] < ∞.

3.6.3. Mean value vector and covariance matrix

From (3.6.1),

$$\displaystyle \begin{aligned}E[X]=|A|{}^{\frac{1}{2}}\int_XX~g((X-B)^{\prime}A(X-B)){\mathrm{d}}X.\end{aligned} $$

Noting that

$$\displaystyle \begin{aligned} E[X]&=E[X-B+B]=B+E[X-B]\\ &=B+|A|{}^{\frac{1}{2}}\int_X(X-B)g((X-B)^{\prime}A(X-B)){\mathrm{d}}X\end{aligned} $$

and letting \(Y=A^{\frac {1}{2}}(X-B)\), we have

$$\displaystyle \begin{aligned}E[X]=B+A^{-\frac{1}{2}}\int_YYg(Y^{\prime}Y){\mathrm{d}}Y,\ -\infty<y_j<\infty,\ j=1,\ldots, p. \end{aligned}$$

But Y ′Y  is even in Y  whereas each element of Y  is linear and odd. Hence, if the integral exists, \(\int_YYg(Y^{\prime}Y){\mathrm{d}}Y=O\) and so, E[X] = B ≡ μ. Let V = Cov(X), the covariance matrix associated with X. Then

$$\displaystyle \begin{aligned} V&=E[(X-\mu)(X-\mu)^{\prime}]=A^{-\frac{1}{2}}\Big[\int_Y(YY^{\prime})g(Y^{\prime}Y){\mathrm{d}}Y\Big]A^{-\frac{1}{2}},\ \ \mbox{where}\ \ Y=A^{\frac{1}{2}}(X-\mu).{} \end{aligned} $$
(3.6.11)

Since the non-diagonal elements y i y j, i ≠ j, of YY ′ are odd functions and g(Y ′Y ) is even, the integrals over the non-diagonal elements are equal to zero whenever the second moments exist. Since E[Y ] = O, Cov(Y ) = E[YY ′]. It has already been determined in (3.6.10) that \(E[y_j^2]=\frac {1}{2\pi }\) for j = 1, …, p, whenever g(u) is free of p and E[u] exists, the density of u being as specified in (3.6.9). If g(u) is not free of p, the diagonal elements will each integrate out to \(\frac {1}{p}E[r^2]\). Accordingly,

$$\displaystyle \begin{aligned} {\mathrm{Cov}}(X)=V=\frac{1}{2\pi}A^{-1}\ \ \mbox{or }\ \ V=\frac{1}{p}E[r^2]A^{-1}.{} \end{aligned} $$
(3.6.12)

Theorem 3.6.3

When X has the p-variate elliptically contoured distribution defined in (3.6.1), the mean value vector of X is E[X] = B, and the covariance matrix of X, denoted by Σ, is such that \(\varSigma =\frac {1}{p}E[r^2]A^{-1}\) where A is the parameter matrix in (3.6.1), u = r 2 and r is as defined in the transformation (3.6.3).
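
As a consistency check of Theorem 3.6.3 (a worked special case added for illustration), take the Gaussian choice \(g(u)=(2\pi)^{-\frac{p}{2}}{\mathrm{e}}^{-\frac{u}{2}}\) with A = Σ −1. Then (3.6.9) gives

$$\displaystyle \begin{aligned} f_u(u)=\frac{\pi^{\frac{p}{2}}}{\varGamma(\frac{p}{2})}(2\pi)^{-\frac{p}{2}}u^{\frac{p}{2}-1}{\mathrm{e}}^{-\frac{u}{2}}=\frac{u^{\frac{p}{2}-1}{\mathrm{e}}^{-\frac{u}{2}}}{2^{\frac{p}{2}}\varGamma(\frac{p}{2})}, \end{aligned}$$

which is a chisquare density with p degrees of freedom, so that E[r 2] = E[u] = p and Theorem 3.6.3 yields \(\varSigma=\frac{1}{p}E[r^2]A^{-1}=A^{-1}\), as it should.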

3.6.4. Marginal and conditional distributions

Consider the density

$$\displaystyle \begin{aligned} f(X)=|A|{}^{\frac{1}{2}}g((X-\mu)^{\prime}A(X-\mu)), \ A>O,\ -\infty<x_j<\infty,\ -\infty<\mu_j<\infty{}\end{aligned} $$
(3.6.13)

where X ′ = (x 1, …, x p), μ ′ = (μ 1, …, μ p), A = (a ij) > O. Consider the following partitioning of X, μ and A:

$$\displaystyle \begin{aligned} X=\begin{bmatrix}X_1\\ X_2\end{bmatrix},\ \ \mu=\begin{bmatrix}\mu_{(1)}\\ \mu_{(2)}\end{bmatrix},\ \ A=\begin{bmatrix}A_{11}&A_{12}\\ A_{21}&A_{22}\end{bmatrix}, \end{aligned}$$

where X 1 and μ (1) are p 1 × 1, X 2 and μ (2) are p 2 × 1, A 11 is p 1 × p 1 and A 22 is p 2 × p 2, p 1 + p 2 = p. Then, as was established in Sect. 3.3,

$$\displaystyle \begin{aligned} (X-\mu)^{\prime}A(X-\mu)&=(X_1-\mu_{(1)})^{\prime}A_{11}(X_1-\mu_{(1)})+2(X_2-\mu_{(2)})^{\prime}A_{21}(X_1-\mu_{(1)})\\ &\ \ \ \ \ +(X_2-\mu_{(2)})^{\prime}A_{22}(X_2-\mu_{(2)})\\ &=(X_1-\mu_{(1)})^{\prime}[A_{11}-A_{12}A_{22}^{-1}A_{21}](X_1-\mu_{(1)})\\ &\ \ \ \ \ +(X_2-\mu_{(2)}+C)^{\prime}A_{22}(X_2-\mu_{(2)}+C),\ C=A_{22}^{-1}A_{21}(X_1-\mu_{(1)}).\end{aligned} $$

In order to obtain the marginal density of X 1, we integrate out X 2 from f(X). Let the marginal densities of X 1 and X 2 be respectively denoted by g 1(X 1) and g 2(X 2). Then

$$\displaystyle \begin{aligned} g_1(X_1)&=|A|{}^{\frac{1}{2}}\int_{X_2}g((X_1-\mu_{(1)})^{\prime}[A_{11}-A_{12}A_{22}^{-1}A_{21}](X_1-\mu_{(1)})\\ &\ \ \ \ \ +(X_2-\mu_{(2)}+C)^{\prime}A_{22}(X_2-\mu_{(2)}+C)){\mathrm{d}}X_2.\end{aligned} $$

Letting \(A_{22}^{\frac {1}{2}}(X_2-\mu _{(2)}+C)=Y_2\), \({\mathrm {d}}Y_2=|A_{22}|{ }^{\frac {1}{2}}\,{\mathrm {d}}X_2\) and

$$\displaystyle \begin{aligned}g_1(X_1)=|A|{}^{\frac{1}{2}}|A_{22}|{}^{-\frac{1}{2}}\int_{Y_2}g((X_1-\mu_{(1)})^{\prime}[A_{11}-A_{12}A_{22}^{-1}A_{21}](X_1-\mu_{(1)})+Y_2^{\prime}Y_2)\,{\mathrm{d}}Y_2. \end{aligned}$$

Note that \(|A|=|A_{22}|~|A_{11}-A_{12}A_{22}^{-1}A_{21}|\) from the results on partitioned matrices presented in Sect. 1.3 and thus, \(|A|{ }^{\frac {1}{2}}|A_{22}|{ }^{-\frac {1}{2}}=|A_{11}-A_{12}A_{22}^{-1}A_{21}|{ }^{\frac {1}{2}}\). We have seen that Σ −1 is a constant multiple of A where Σ is the covariance matrix of the p × 1 vector X. Then

$$\displaystyle \begin{aligned}(\varSigma^{11})^{-1}=\varSigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21} \end{aligned}$$

which is a constant multiple of \(A_{11}-A_{12}A_{22}^{-1}A_{21}\). If \(Y_2^{\prime }Y_2=s_2\), then from Theorem 4.2.3, \({\mathrm {d}}Y_2=\frac {\pi ^{\frac {p_2}{2}}}{\varGamma (\frac {p_2}{2})}s_2^{\frac {p_2}{2}-1}{\mathrm {d}}s_2\), so that the marginal density of X 1 becomes

$$\displaystyle \begin{aligned} g_1(X_1)=|A_{11}-A_{12}A_{22}^{-1}A_{21}|{}^{\frac{1}{2}}\,\frac{\pi^{\frac{p_2}{2}}}{\varGamma(\frac{p_2}{2})}\int_{s_2>0}s_2^{\frac{p_2}{2}-1}g(s_2+u_1)\,{\mathrm{d}}s_2{} \end{aligned} $$
(3.6.14)

where \(u_1=(X_1-\mu _{(1)})^{\prime }[A_{11}-A_{12}A_{22}^{-1}A_{21}](X_1-\mu _{(1)})\). Note that (3.6.14) is an elliptically contoured density, that is, X 1 has an elliptically contoured distribution. Similarly, X 2 has an elliptically contoured distribution. Letting \(Y_{11}=(A_{11}-A_{12}A_{22}^{-1}A_{21})^{\frac {1}{2}}(X_1-\mu _{(1)})\), Y 11 has a spherically symmetric distribution. Denoting the density of Y 11 by g 11(Y 11), we have

$$\displaystyle \begin{aligned} g_{11}(Y_{11})=\frac{\pi^{\frac{p_2}{2}}}{\varGamma(\frac{p_2}{2})}\int_{s_2>0}s_2^{\frac{p_2}{2}-1}g(s_2+Y_{11}^{\prime}Y_{11})\,{\mathrm{d}}s_2.{} \end{aligned} $$
(3.6.15)

By a similar argument, the marginal density of X 2, namely g 2(X 2), and the density of Y 22, namely g 22(Y 22), are as follows:

$$\displaystyle \begin{aligned} g_2(X_2)&=|A_{22}-A_{21}A_{11}^{-1}A_{12}|{}^{\frac{1}{2}}\frac{\pi^{\frac{p_1}{2}}}{\varGamma(\frac{p_1}{2})}\\ &\ \ \ \ \ \ \times\int_{s_1>0}s_1^{\frac{p_1}{2}-1}g(s_1 +(X_2-\mu_{(2)})^{\prime}[A_{22}-A_{21}A_{11}^{-1}A_{12}](X_2-\mu_{(2)}))\,{\mathrm{d}}s_1,\\ g_{22}(Y_{22})&=\frac{\pi^{\frac{p_1}{2}}}{\varGamma(\frac{p_1}{2})}\int_{s_1>0}s_1^{\frac{p_1}{2}-1}g(s_1+Y_{22}^{\prime}Y_{22})\,{\mathrm{d}}s_1.{} \end{aligned} $$
(3.6.16)
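
For instance (a worked special case added for illustration), with the Gaussian choice \(g(u)=(2\pi)^{-\frac{p}{2}}{\mathrm{e}}^{-\frac{u}{2}}\), the integral in (3.6.14) can be evaluated explicitly: since \(\int_0^{\infty}s_2^{\frac{p_2}{2}-1}{\mathrm{e}}^{-\frac{s_2}{2}}{\mathrm{d}}s_2=2^{\frac{p_2}{2}}\varGamma(\frac{p_2}{2})\),

$$\displaystyle \begin{aligned} g_1(X_1)=|A_{11}-A_{12}A_{22}^{-1}A_{21}|{}^{\frac{1}{2}}\,\frac{\pi^{\frac{p_2}{2}}}{\varGamma(\frac{p_2}{2})}(2\pi)^{-\frac{p}{2}}\,{\mathrm{e}}^{-\frac{u_1}{2}}\,2^{\frac{p_2}{2}}\varGamma\Big(\frac{p_2}{2}\Big)=\frac{|A_{11}-A_{12}A_{22}^{-1}A_{21}|{}^{\frac{1}{2}}}{(2\pi)^{\frac{p_1}{2}}}\,{\mathrm{e}}^{-\frac{u_1}{2}}, \end{aligned}$$

which is again a p 1-variate Gaussian density, in agreement with the marginal distributions obtained in Sect. 3.3.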

3.6.5. The characteristic function of an elliptically contoured distribution

Let T be a p × 1 parameter vector, T  = (t 1, …, t p), so that T X = t 1 x 1 + ⋯ + t p x p. Then, the characteristic function of X, denoted by ϕ X(T), is \(E[{\mathrm {e}}^{i\,T^{\prime }X}]\) where E denotes the expected value and \(i=\sqrt {(-1)}\), that is,

$$\displaystyle \begin{aligned} \phi_X(T)=E[{\mathrm{e}}^{i\,T^{\prime}X}]=\int_X{\mathrm{e}}^{i\,T^{\prime}X}|A|{}^{\frac{1}{2}}g((X-\mu)^{\prime}A(X-\mu)){\mathrm{d}}X.{} \end{aligned} $$
(3.6.17)

Writing X as X − μ + μ and then making the transformation \(Y=A^{\frac {1}{2}}(X-\mu )\), we have

$$\displaystyle \begin{aligned} \phi_X(T)={\mathrm{e}}^{i\,T^{\prime}\mu}\int_Y{\mathrm{e}}^{i\, T^{\prime}A^{-\frac{1}{2}}Y}g(Y^{\prime}Y){\mathrm{d}}Y.{} \end{aligned} $$
(3.6.18)

However, g(Y Y ) is invariant under orthonormal transformation of the type Z = PY, PP  = I, P P = I, as Z Z = Y Y  so that g(Y Y ) = g(Z Z) for all orthonormal matrices. Thus,

$$\displaystyle \begin{aligned} \phi_X(T)={\mathrm{e}}^{i\,T^{\prime}\mu}\int_Z{\mathrm{e}}^{i\, T^{\prime}A^{-\frac{1}{2}}P^{\prime}Z}g(Z^{\prime}Z){\mathrm{d}}Z {} \end{aligned} $$
(3.6.19)

for all P. This means that the integral in (3.6.19) is a function of \((T^{\prime }A^{-\frac {1}{2}})(T^{\prime }A^{-\frac {1}{2}})^{\prime }=T^{\prime }A^{-1}T\), say ψ(T A −1 T). Then,

$$\displaystyle \begin{aligned} \phi_X(T)={\mathrm{e}}^{i\,T^{\prime}\mu}\psi(T^{\prime}A^{-1}T){} \end{aligned} $$
(3.6.20)

where A −1 is proportional to Σ, the covariance matrix of X, and

$$\displaystyle \begin{aligned}\frac{\partial}{\partial T}\phi_{X}(T)|{}_{T=O}=i\mu\Rightarrow E(X)=\mu;\end{aligned}$$

the reader may refer to Chap. 1 for vector/matrix derivatives. Now, considering \(\phi_{X-\mu}(T)\), we have

$$\displaystyle \begin{aligned} &\frac{\partial}{\partial T}\phi_{X-\mu}(T)=\frac{\partial}{\partial T}\psi(T^{\prime}A^{-1}T)=\psi^{\prime}(T^{\prime}A^{-1}T)2A^{-1}T\\ &\Rightarrow \frac{\partial}{\partial T^{\prime}}\psi(T^{\prime}A^{-1}T)=\psi^{\prime}(T^{\prime}A^{-1}T)2T^{\prime}A^{-1}\\ &\Rightarrow\frac{\partial}{\partial T}\frac{\partial}{\partial T^{\prime}}\psi(T^{\prime}A^{-1}T)=\psi^{\prime\prime}(T^{\prime}A^{-1}T)(2A^{-1}T)(2T^{\prime}A^{-1})+\psi^{\prime}(T^{\prime}A^{-1}T)2A^{-1}\\ &\Rightarrow\frac{\partial}{\partial T}\frac{\partial}{\partial T^{\prime}}\psi(T^{\prime}A^{-1}T)|{}_{T=O}=2A^{-1}, \end{aligned} $$

assuming that ψ ′(T ′A −1 T)|T=O = 1 and ψ ′′(T ′A −1 T)|T=O = 1, where \(\psi ^{\prime }(u)=\frac {{\mathrm {d}}}{{\mathrm {d}}u}\psi (u)\) for a real scalar variable u and ψ ′′(u) denotes the second derivative of ψ with respect to u. The same procedure can be utilized to obtain higher order moments of the type E[⋯XX ′XX ′] by repeatedly applying vector derivatives to ϕ X(T) as \(\cdots \frac {\partial }{\partial T}\frac {\partial }{\partial T^{\prime }}\) operating on ϕ X(T) and then evaluating the result at T = O. Similarly, higher order central moments of the type E[⋯(X − μ)(X − μ)′(X − μ)(X − μ)′] are available by applying the vector differential operator \(\cdots \frac {\partial }{\partial T}\frac {\partial }{\partial T^{\prime }}\) on ψ(T ′A −1 T) and then evaluating the result at T = O. However, higher moments with respect to individual variables, such as \(E[x_j^k],\) are available by differentiating ϕ X(T) partially k times with respect to t j, and then evaluating the resulting expression at T = O. If central moments are needed, the differentiation is performed on ψ(T ′A −1 T).
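
For example (a standard special case added for illustration), for the p-variate real Gaussian distribution with A = Σ −1, one has ψ(w) = e−w∕2, so that (3.6.20) reduces to the familiar characteristic function

$$\displaystyle \begin{aligned} \phi_X(T)={\mathrm{e}}^{i\,T^{\prime}\mu-\frac{1}{2}T^{\prime}\varSigma T}. \end{aligned}$$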

Thus, we can obtain results parallel to those derived for the p-variate Gaussian distribution by applying the same procedures on elliptically contoured distributions. Accordingly, further discussion of elliptically contoured distributions will not be taken up in the coming chapters.

Exercises 3.6

3.2.44

Let x 1, …, x k be independently distributed real scalar random variables with density functions f j(x j), j = 1, …, k. If the joint density of x 1, …, x k is of the form \(f_1(x_1)\cdots f_k(x_k)=g(x_1^2+\cdots +x_k^2)\) for some differentiable function g, show that x 1, …, x k are identically distributed as Gaussian random variables.

3.2.45

Letting the real scalar random variables x 1, …, x k have a joint density such that f(x 1, …, x k) = c for \(x_1^2+\cdots +x_k^2\le r^2,\ r>0,\) show that (1) (x 1, …, x k) is uniformly distributed over the volume of the k-dimensional sphere; (2) E[x j] = 0, Cov(x i, x j) = 0 for i ≠ j, i, j = 1, …, k; (3) x 1, …, x k are not independently distributed.

3.2.46

Let u = (X − B)′A(X − B) in Eq. (3.6.1), where A > O and A is a p × p matrix. Let g(u) = c 1(1 − a u)ρ, a > 0, 1 − a u > 0 and c 1 is an appropriate constant. If \(|A|{ }^{\frac {1}{2}}g(u)\) is a density, show that (1) this density is elliptically contoured; (2) evaluate its normalizing constant and specify the conditions on the parameters.

3.2.47

Solve Exercise 3.2.46 for g(u) = c 2(1 + a u)ρ, where c 2 is an appropriate constant.

3.2.48

Solve Exercise 3.2.46 for g(u) = c 3 u γ−1(1 − a u)β−1, a > 0, 1 − a u > 0 and c 3 is an appropriate constant.

3.2.49

Solve Exercise 3.2.46 for g(u) = c 4 u γ−1eau, a > 0 where c 4 is an appropriate constant.

3.2.50

Solve Exercise 3.2.46 for g(u) = c 5 u γ−1(1 + a u)−(ρ+γ), a > 0 where c 5 is an appropriate constant.

3.2.51

Solve Exercises 3.2.46 to 3.2.50 by making use of the general polar coordinate transformation.

3.2.52

Let \(s=y_1^2+\cdots +y_p^2\) where y j, j = 1, …, p, are real scalar random variables. Let dY = dy 1 ∧… ∧dy p and let ds be the differential in s. Then, it can be shown that \({\mathrm {d}}Y=\frac {\pi ^{\frac {p}{2}}}{\varGamma (\frac {p}{2})}s^{\frac {p}{2}-1}{\mathrm {d}}s\). By using this fact, solve Exercises 3.2.46 to 3.2.50.

3.2.53

Write down the elliptically contoured density in (3.6.1) explicitly, taking an arbitrary B = E[X] = μ, for (1) g(u) = (a − c u)α, a > 0, c > 0, a − c u > 0; (2) g(u) = (a + c u)β, a > 0, c > 0, and specify the conditions on α and β.