10.1. Introduction

We will keep using the same notation in this chapter. More specifically, lower-case letters x, y, … will denote real scalar variables, whether mathematical or random. Capital letters X, Y, … will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed above letters such as \(\tilde {x},\tilde {y},\tilde {X},\tilde {Y}\) to denote variables in the complex domain. Constant matrices will for instance be denoted by A, B, C. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. The determinant of a square matrix A will be denoted by |A| or det(A) and, in the complex case, the absolute value or modulus of the determinant of A will be denoted as |det(A)|. When matrices are square, their order will be taken as p × p, unless specified otherwise. When A is a full rank matrix in the complex domain, then AA* is Hermitian positive definite, where an asterisk designates the complex conjugate transpose of a matrix. Additionally, dX will indicate the wedge product of all the distinct differentials of the elements of the matrix X. Letting the p × q matrix X = (x ij) where the x ij’s are distinct real scalar variables, \(\mathrm {d}X=\wedge _{i=1}^p\wedge _{j=1}^q\mathrm {d}x_{ij}\). For the complex matrix \(\tilde {X}=X_1+iX_2,\ i=\sqrt {-1}\), where X 1 and X 2 are real, \(\mathrm {d}\tilde {X}=\mathrm {d}X_1\wedge \mathrm {d}X_2\).

The necessary theory for the study of Canonical Correlation Analysis has already been introduced in Chap. 1, including the problem of optimizing a real bilinear form subject to two quadratic form constraints. This topic happens to be connected to the prediction problem. In regression analysis, the objective consists of seeking the best prediction function of a real scalar variable y based on a collection of preassigned real scalar variables x 1, …, x k. It was previously determined that the regression of y on x 1, …, x k, or the best predictor of y at preassigned values of x 1, …, x k, is the conditional expectation of y at the specified values of x 1, …, x k, that is, E[y|x 1, …, x k] where E denotes the expected value. In this case, best is understood to mean ‘in the minimum mean square’ sense. Now, consider the following generalization of this problem. Suppose that we wish to determine the best prediction function for a set of real scalar variables y 1, …, y q, on the basis of a collection of real scalar variables x 1, …, x p, where p need not be equal to q. Since the individual variables can be recovered from linear functions of those variables, we will convert the problem into one of predicting a linear function of y 1, …, y q from an arbitrary linear function of x 1, …, x p, and vice versa if we are interested in determining the association between two sets of variables. Let the linear functions be u = α 1 x 1 + ⋯ + α p x p = α′X with α′ = (α 1, …, α p) and X′ = (x 1, …, x p) and v = β 1 y 1 + ⋯ + β q y q = β′Y  with β′ = (β 1, …, β q) and Y′ = (y 1, …, y q), where the coefficient vectors α and β are arbitrary. Let us provide an interpretation of best predictor in the case of two linear functions. As a criterion, we may make use of the maximum joint scatter, that is, the joint variation in u and v as measured by the covariance between u and v or, equivalently, the maximum scale-free covariance, namely, the correlation between u and v, and optimize this joint variation. Given the properties of linear functions of real scalar variables, we obtain the variances of linear functions and covariance between linear functions as follows: Var(u) = α′Σ 11 α, Var(v) = β′Σ 22 β, Cov(u, v) = α′Σ 12 β = β′Σ 21 α, \(\varSigma _{12}^{\prime }=\varSigma _{21}\), where Σ 11 > O and Σ 22 > O are the variance-covariance matrices of X and Y , respectively, and \(\varSigma _{12}=\varSigma _{21}^{\prime }\) accounts for the covariance between X and Y . Letting the augmented vector be Z′ = (X′, Y′) and its associated covariance matrix be Σ, we have

$$\displaystyle \begin{aligned}Z=\left[\begin{array}{c}X\\ Y\end{array}\right],~~\varSigma=\mathrm{Cov}(Z)=\left[\begin{array}{cc}\varSigma_{11}&\varSigma_{12}\\ \varSigma_{21}&\varSigma_{22}\end{array}\right].\end{aligned}$$
Our aim is to maximize α′Σ 12 β = β′Σ 21 α. When the coefficient vectors α and β are unrestricted, the optimization of α′Σ 12 β proves meaningless since the quantity α′Σ 12 β can vary from −∞ to ∞. Consequently, we impose the constraints α′Σ 11 α = 1 and β′Σ 22 β = 1 on the coefficient vectors α and β. Accordingly, the mathematical problem consists of optimizing α′Σ 12 β subject to α′Σ 11 α = 1 and β′Σ 22 β = 1.

Letting

$$\displaystyle \begin{aligned}w=\alpha'\varSigma_{12}\beta-\frac{\rho_1}{2}(\alpha'\varSigma_{11}\alpha-1)-\frac{\rho_2}{2}(\beta'\varSigma_{22}\beta-1) \end{aligned}$$
(i)

where ρ 1 and ρ 2 are the Lagrangian multipliers, we differentiate w with respect to α and β and equate the resulting functions to null vectors. When differentiating with respect to β, we may utilize the equivalent form β′Σ 21 α = α′Σ 12 β. We then obtain the following equations:

$$\displaystyle \begin{aligned} \frac{\partial}{\partial \alpha}w=O&\Rightarrow \varSigma_{12}\beta-\rho_1\varSigma_{11}\alpha=O \end{aligned} $$
(ii)
$$\displaystyle \begin{aligned} \frac{\partial}{\partial \beta}w=O&\Rightarrow \varSigma_{21}\alpha-\rho_2\varSigma_{22}\beta=O. \end{aligned} $$
(iii)

On pre-multiplying (ii) by α′ and (iii), by β′, and using the fact that α′Σ 11 α = 1 and β′Σ 22 β = 1, one has ρ 1 = ρ 2 ≡ ρ and α′Σ 12 β = ρ. Thus,

$$\displaystyle \begin{aligned}\left[\begin{array}{cc}-\rho\,\varSigma_{11}&\varSigma_{12}\\ \varSigma_{21}&-\rho\,\varSigma_{22}\end{array}\right]\left[\begin{array}{c}\alpha\\ \beta\end{array}\right]=\left[\begin{array}{c}O\\ O\end{array}\right]\Rightarrow\left|\begin{array}{cc}-\rho\,\varSigma_{11}&\varSigma_{12}\\ \varSigma_{21}&-\rho\,\varSigma_{22}\end{array}\right|=0{}\end{aligned} $$
(10.1.1)

and

$$\displaystyle \begin{aligned} \mathrm{Cov}(\alpha'X,\beta'Y)=\alpha'\varSigma_{12}\beta=\beta'\varSigma_{21}\alpha&=\rho. \end{aligned} $$
(10.1.2)

Hence, the maximum value of Cov(α′X, β′Y ) yields the largest ρ. It follows from (ii) that \(\alpha =\frac {1}{\rho }\varSigma _{11}^{-1}\varSigma _{12}\beta \) which, once substituted in (iii), yields

$$\displaystyle \begin{aligned}{}[\varSigma_{21}\varSigma_{11}^{-1}\varSigma_{12}-\rho^2\varSigma_{22}]\beta=O\Rightarrow [\varSigma_{22}^{-1}\varSigma_{21}\varSigma_{11}^{-1}\varSigma_{12}-\rho^2I]\beta=O.\end{aligned}$$

This entails that ρ 2 = λ, an eigenvalue of \(B=\varSigma _{22}^{-1}\varSigma _{21}\varSigma _{11}^{-1}\varSigma _{12}\) or its symmetrized form \(\varSigma _{22}^{-\frac {1}{2}}\varSigma _{21}\varSigma _{11}^{-1}\varSigma _{12}\varSigma _{22}^{-\frac {1}{2}}\), and that β is a corresponding eigenvector. Similarly, by obtaining a representation of β from (iii), substituting it in (ii) and proceeding as above, it is seen that ρ 2 = λ is an eigenvalue of \(A=\varSigma _{11}^{-1}\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}\) or its symmetrized form \(\varSigma _{11}^{-\frac {1}{2}}\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}\varSigma _{11}^{-\frac {1}{2}}\), and that α is a corresponding eigenvector. Hence, manifestly, all the nonzero eigenvalues of A coincide with those of B. If p ≤ q and Σ 12 is of full rank p, then A > O (real positive definite) and B ≥ O (real positive semi-definite), whereas if q ≤ p and Σ 21 is of full rank q, then A ≥ O (real positive semi-definite) and B > O (real positive definite). If p = q and Σ 12 is of full rank p, then A and B are both positive definite. If p ≤ q and Σ 12 is of full rank p, then one should start with A and compute all the p nonzero eigenvalues of A since A will be of lower order; on the other hand, if q ≤ p and Σ 21 is of full rank q, then one ought to begin with B and determine all the nonzero eigenvalues of B. Thus, one can obtain the common nonzero eigenvalues of A and B or their symmetrized forms by making use of one of these sets of steps. Let us denote the largest of these common eigenvalues λ = ρ 2 by λ (1) and the corresponding eigenvectors with respect to A and B, by α (1) and β (1), where the eigenvectors are normalized via the constraints \(\alpha _{(1)}^{\prime }\varSigma _{11}\alpha _{(1)}=1\) and \(\beta _{(1)}^{\prime }\varSigma _{22}\beta _{(1)}=1\). Then, \((u_1,~v_1)\equiv (\alpha _{(1)}^{\prime }X,~\beta _{(1)}^{\prime }Y)\) is the first pair of canonical variables in the sense that u 1 is the best predictor of v 1 and v 1 is the best predictor of u 1. Similarly, letting \(\rho _{(i)}^2=\lambda _{(i)}\) be the i-th largest common eigenvalue of A and B and the corresponding eigenvectors such that \(\alpha _{(i)}^{\prime }\varSigma _{11}\alpha _{(i)}=1\) and \(\beta _{(i)}^{\prime }\varSigma _{22}\beta _{(i)}=1,\) be denoted by α (i) and β (i), the i-th largest correlation between u = α′X and v = β′Y  will be equal to \(\alpha _{(i)}^{\prime }\varSigma _{12}\beta _{(i)}= \rho _{(i)}=\sqrt {\lambda _{(i)}}\), i = 1, …, p, p denoting the common number of nonzero eigenvalues of A and B, and will occur when \(u=u_i= \alpha _{(i)}^{\prime }X\) and \(v= v_i=\beta _{(i)}^{\prime }Y\), u i and v i being the i-th pair of canonical variables. Clearly, Var(u i) and Var(v i), i = 1, …, p, are both equal to one. Once again, best is taken to mean ‘in the minimum mean square’ sense. Hence, the following results:

Theorem 10.1.1

Letting Σ, A, B, ρ, α (i) , β (i), u i and v i be as previously defined,

$$\displaystyle \begin{aligned}\max_{\alpha'\varSigma_{11}\alpha=1,~\beta'\varSigma_{22}\beta=1}[\alpha'\varSigma_{12}\beta]= \alpha_{(1)}^{\prime}\varSigma_{12}\beta_{(1)}=\rho_{(1)}{}\end{aligned} $$
(10.1.3)

where ρ (1) is the largest ρ or the largest canonical correlation, that is, the largest correlation attainable between u = α′X and v = β′Y, which equals the correlation between the first pair of canonical variables u 1 and v 1 , with \(\rho _{(1)}^2=\lambda _{(1)}\) , the common largest eigenvalue of A and B. Similarly, we have

$$\displaystyle \begin{aligned}\min_{\alpha'\varSigma_{11}\alpha=1,~\beta'\varSigma_{22}\beta=1}[\alpha'\varSigma_{12}\beta]= \alpha_{(p)}^{\prime}\varSigma_{12}\beta_{(p)}=\rho_{(p)}{}\end{aligned} $$
(10.1.4)

where ρ (p), which is the smallest nonzero value of ρ with \(\rho _{(p)}^2=\lambda _{(p)}\) , the common smallest nonzero eigenvalue of A and B, represents the smallest canonical correlation between u and v or the correlation between u p and v p.

This maximum correlation between the linear functions α′X and β′Y , or the correlation between the best predictors u 1 and v 1, or the maximum value of ρ, is called the first canonical correlation between the sets X and Y  in the sense that the correlation between u 1 and v 1 attains its maximum value. When p = 1 or q = 1, the canonical correlation becomes the multiple correlation, and when p = 1 and q = 1, it is simply the correlation between two real scalar random variables. The diagonal matrix of the nonzero eigenvalues of A and B, denoted by Λ, is Λ = diag(λ (1), …, λ (p)) when p ≤ q and Σ 12 is of full rank p; otherwise, p is replaced by q in Λ.

It should be noted that, for instance, the canonical variable β′Y  such that β satisfies the constraint β′Σ 22 β = 1 is identical to \(b'\varSigma _{22}^{-1/2} Y\) such that b′b = 1 since \(\beta '\varSigma _{22}\beta =b'\varSigma _{22}^{-1/2} \varSigma _{22}\varSigma _{22}^{-1/2}b=b'b\). Accordingly, letting the λ (i)’s as well as A and B be as previously defined, our definition of a canonical variable, that is, \(u_i=\alpha _{(i)}^{\prime } X\) and \(v_i=\beta _{(i)}^{\prime }Y,\) coincides with the customary one, that is, \(u_i^*=a_i^{\prime } \varSigma _{11}^{-1/2} X\) where a i is the eigenvector with respect to the symmetrized form of A which is associated with λ (i) and normalized by requiring that \(a_i^{\prime } a_i=1,\) and \(v_i^*=b_i^{\prime } \varSigma _{22}^{-1/2} Y\) where b i is an eigenvector with respect to the symmetrized form of B corresponding to λ (i) and such that \(b_i^{\prime } b_i=1.\) It can be readily proved that the canonical variables \(u_1^*,\ldots ,u_p^*\) (or equivalently the u i’s) are uncorrelated, as Cov(\(u_i^*,u_j^*)=a_i^{\prime } \varSigma _{11}^{-1/2} \varSigma _{11} \varSigma _{11}^{-1/2}a_j=0\) for i ≠ j since the normed eigenvectors a i are orthogonal to one another. It can be similarly established that the \(v_i^*\)’s or, equivalently, the v i’s are uncorrelated. Clearly, Cov(\(u_i^*,u_i^*)=\) Cov(u i, u i) = 1 and Cov\((v_i^*,v_i^*)= \) Cov(v i, v i) = 1. We now demonstrate that, for i ≠ k, the canonical variables u i and v k are uncorrelated. First, consider the equation A a = λ a, that is,

$$\displaystyle \begin{aligned}\varSigma_{11}^{-\frac{1}{2}}\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}\varSigma_{11}^{-\frac{1}{2}} a=\lambda\, a.\end{aligned}$$

On pre-multiplying both sides by \(\varSigma _{22}^{-\frac {1}{2}}\varSigma _{21}\varSigma _{11}^{-\frac {1}{2}}\), we obtain B b = λ b where \(b=\varSigma _{22}^{-\frac {1}{2}}\varSigma _{21}\varSigma _{11}^{-\frac {1}{2}}a\). Thus, if λ and a constitute an eigenvalue-eigenvector pair for A, then λ and b must also form an eigenvalue-eigenvector pair for B, and vice versa with \(a=\varSigma _{11}^{-\frac {1}{2}}\varSigma _{12}\varSigma _{22}^{-\frac {1}{2}} b\). By definition, Cov(\(u_i^*,v_k^*\))\(=a_i^{\prime } \varSigma _{11}^{-\frac {1}{2}}\varSigma _{12}\varSigma _{22}^{-\frac {1}{2}}b_k\) where the vector \(b_k= \theta \ \varSigma _{22}^{-\frac {1}{2}}\varSigma _{21}\varSigma _{11}^{-\frac {1}{2}}a_k\), θ being a positive constant such that the Euclidean norm of b k is one. Note that since \(b_k^{\prime } b_k =\theta ^2 \, a_k^{\prime }\, A\, a_k=\theta ^2\, a_k^{\prime }\,\lambda _{(k)}\,a_k=1, \ \theta \) must be equal to \(1/\sqrt {\lambda _{(k)}}\). Thus, \(b_k= \varSigma _{22}^{-\frac {1}{2}}\varSigma _{21}\varSigma _{11}^{-\frac {1}{2}}a_k/\sqrt {\lambda _{(k)}}\) with \( \varSigma _{11}^{-\frac {1}{2}}a_k=\alpha _{(k)} \) and \( b_k=\varSigma _{22}^{\frac {1}{2}}\beta _{(k)} \), which is equivalent to (iii) with α = α (k), β = β (k) and ρ = ρ (k), that is, \(\beta _{(k)}=\varSigma _{22}^{-{1}}\varSigma _{21}\alpha _{(k)}/ \rho _{(k)}.\) Then, Cov(\(u_i^*,v_k^*\)) \(=a_i^{\prime } \varSigma _{11}^{-\frac {1}{2}}\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}\varSigma _{11}^{-\frac {1}{2}}a_k/\sqrt {\lambda _{(k)}}=\) the (i, k)th element of diag(\(\lambda _{(1)},\ldots ,\lambda _{(p)})/\sqrt {\lambda _{(k)}}\), which is equal to 0 whenever i ≠ k. As expected, Cov(\(u_k^*,v_k^*)=\sqrt {\lambda _{(k)}}=\rho _{(k)}\), and Cov(u i, v k) = \(\alpha _{(i)}^{\prime } \varSigma _{12}\beta _{(k)}\!\!\overset {(iii)}{=}\alpha _{(i)}^{\prime }\varSigma _{12} \varSigma _{22}^{-{1}}\varSigma _{21}\alpha _{(k)}/\rho _{(k)}\) \(=a_i^{\prime } \varSigma _{11}^{-\frac {1}{2}}\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}\varSigma _{11}^{-\frac {1}{2}}a_k/\sqrt {\lambda _{(k)}} \) = Cov(\(u_i^*,v_k^*)\) for i, k = 1, …, p, assuming that p ≤ q and Σ 12 is of full rank; if p ≥ q and Σ 12 is of full rank, A and B will then share q nonzero eigenvalues.
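The computations described above are easily carried out numerically. Here is a minimal sketch in Python with NumPy, using hypothetical covariance blocks chosen only for illustration (they are not taken from the text); it forms A and B, confirms that their nonzero eigenvalues coincide, and recovers the canonical correlations \(\rho _{(i)}=\sqrt {\lambda _{(i)}}\):

```python
import numpy as np

# Hypothetical covariance blocks (p = 2, q = 3), chosen so that Sigma > O.
Sig11 = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Sig22 = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.4],
                  [0.0, 0.4, 1.5]])
Sig12 = np.array([[0.6, 0.2, 0.1],
                  [0.1, 0.5, 0.2]])

# A = Sigma_11^{-1} Sigma_12 Sigma_22^{-1} Sigma_21  (p x p)
# B = Sigma_22^{-1} Sigma_21 Sigma_11^{-1} Sigma_12  (q x q)
A = np.linalg.solve(Sig11, Sig12) @ np.linalg.solve(Sig22, Sig12.T)
B = np.linalg.solve(Sig22, Sig12.T) @ np.linalg.solve(Sig11, Sig12)

eigA = np.sort(np.linalg.eigvals(A).real)[::-1]
eigB = np.sort(np.linalg.eigvals(B).real)[::-1]
print(eigA)           # lambda_(1) > lambda_(2) > 0
print(eigB)           # the same nonzero eigenvalues, plus a zero
print(np.sqrt(eigA))  # canonical correlations rho_(i)
```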

10.1.1. An invariance property

An interesting property of canonical correlations is now pointed out. Consider the following nonsingular transformations of X and Y : Let X 1 = A 1 X and Y 1 = B 1 Y  where A 1 is a p × p nonsingular constant matrix and B 1 is a q × q constant nonsingular matrix so that |A 1|≠0 and |B 1|≠0. Now, consider the linear functions α′X 1 = α′A 1 X and β′Y 1 = β′B 1 Y  whose variances and covariance are as follows:

$$\displaystyle \begin{aligned} &\mathrm{Var}(\alpha'X_1)=\mathrm{Var}(\alpha'A_1X)=\alpha'A_1\varSigma_{11}A_1^{\prime}\alpha,~ \mathrm{Var}(\beta'Y_1)=\mathrm{Var}(\beta'B_1Y)=\beta'B_1\varSigma_{22}B_1^{\prime}\beta\\ &\mathrm{Cov}(\alpha'X_1,\beta'Y_1)=\alpha'A_1\varSigma_{12}B_1^{\prime}\beta=\beta'B_1\varSigma_{21}A_1^{\prime}\alpha.\end{aligned} $$

On imposing the conditions Var(α′X 1) = 1 and Var(β′Y 1) = 1, and maximizing Cov(α′X 1, β′Y 1) by means of the previously used procedure, we arrive at the equations

$$\displaystyle \begin{aligned} A_1\varSigma_{12}B_1^{\prime}\beta-\rho_1A_1\varSigma_{11}A_1^{\prime}\alpha &=0 \end{aligned} $$
(iv)
$$\displaystyle \begin{aligned} -\rho_2B_1\varSigma_{22}B_1^{\prime}\beta+B_1\varSigma_{21}A_1^{\prime}\alpha &=0.\end{aligned} $$
(v)

On pre-multiplying (iv) by α′ and (v) by β′, one has ρ 1 = ρ 2 ≡ ρ, say. Equations (iv) and (v) can then be re-expressed as

$$\displaystyle \begin{aligned}\left[\begin{array}{cc}-\rho\,A_1\varSigma_{11}A_1^{\prime}&A_1\varSigma_{12}B_1^{\prime}\\ B_1\varSigma_{21}A_1^{\prime}&-\rho\,B_1\varSigma_{22}B_1^{\prime}\end{array}\right]\left[\begin{array}{c}\alpha\\ \beta\end{array}\right]=\left[\begin{array}{c}O\\ O\end{array}\right].\end{aligned}$$
Taking the determinant of the coefficient matrix and equating it to zero to determine the roots, we have

$$\displaystyle \begin{aligned}\left|\begin{array}{cc}-\rho\,A_1\varSigma_{11}A_1^{\prime}&A_1\varSigma_{12}B_1^{\prime}\\ B_1\varSigma_{21}A_1^{\prime}&-\rho\,B_1\varSigma_{22}B_1^{\prime}\end{array}\right|=0{}\end{aligned} $$
(10.1.5)
$$\displaystyle \begin{aligned}\Rightarrow\ |A_1|^2~|B_1|^2~\left|\begin{array}{cc}-\rho\,\varSigma_{11}&\varSigma_{12}\\ \varSigma_{21}&-\rho\,\varSigma_{22}\end{array}\right|=0.{}\end{aligned} $$
(10.1.6)

As can be seen from (10.1.6), (10.1.1) and (10.1.5) have the same roots ρ, which means that the canonical correlations ρ are invariant under nonsingular linear transformations. Observe that when Σ 12 is of full rank p and p ≤ q, ρ (1), …, ρ (p), corresponding to the nonzero roots of (10.1.1) or (10.1.6), encompass all the canonical correlations, so that, in that case, we have the complete set of canonical correlations. Hence, the following result:

Theorem 10.1.2

Let X, a p × 1 vector of real scalar random variables x 1, …, x p, and Y, a q × 1 vector of real scalar random variables y 1, …, y q, have a joint distribution. Then, the canonical correlations between X and Y  are invariant under nonsingular linear transformations, that is, the canonical correlations between X and Y  are the same as those between A 1 X and B 1 Y  where |A 1|≠0 and |B 1|≠0.
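A quick numerical check of Theorem 10.1.2 is given below; this is a sketch assuming NumPy, in which the covariance blocks and the transformation matrices A 1 and B 1 are hypothetical (A 1 and B 1 are nonsingular with probability one):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 2, 3

def canonical_correlations(S11, S12, S22):
    # rho_(i): square roots of the eigenvalues of S11^{-1} S12 S22^{-1} S21
    A = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
    lam = np.sort(np.linalg.eigvals(A).real)[::-1]
    return np.sqrt(np.clip(lam, 0.0, None))

# Hypothetical blocks, as in the previous sketch.
S11 = np.array([[2.0, 0.5], [0.5, 1.0]])
S22 = np.array([[1.0, 0.3, 0.0], [0.3, 2.0, 0.4], [0.0, 0.4, 1.5]])
S12 = np.array([[0.6, 0.2, 0.1], [0.1, 0.5, 0.2]])

# Random nonsingular transformations A1 (p x p) and B1 (q x q).
A1 = rng.standard_normal((p, p)) + 2.0 * np.eye(p)
B1 = rng.standard_normal((q, q)) + 2.0 * np.eye(q)

# Blocks of Cov(Z1) for Z1' = ((A1 X)', (B1 Y)').
print(canonical_correlations(S11, S12, S22))
print(canonical_correlations(A1 @ S11 @ A1.T, A1 @ S12 @ B1.T, B1 @ S22 @ B1.T))
# The two outputs agree, illustrating Theorem 10.1.2.
```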

10.2. Pairs of Canonical Variables

As previously explained, λ (1) which denotes the largest eigenvalue of the matrix A =  \(\varSigma _{11}^{-1}\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}\) or its symmetrized form \(\varSigma _{11}^{-\frac {1}{2}}\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}\varSigma _{11}^{-\frac {1}{2}}\), as well as the largest eigenvalue of \(B=\varSigma _{22}^{-1}\varSigma _{21}\varSigma _{11}^{-1}\varSigma _{12}\) or \(\varSigma _{22}^{-\frac {1}{2}}\varSigma _{21}\varSigma _{11}^{-1}\varSigma _{12}\varSigma _{22}^{-\frac {1}{2}}\), also turns out to be equal to \(\rho _{(1)}^2\), the square of the largest root of equation (10.1.1). Having evaluated λ (1), we compute the corresponding eigenvectors α (1) and β (1) and normalize them via the constraints \(\alpha _{(1)}^{\prime }\varSigma _{11}\alpha _{(1)}=1\) and \(\beta _{(1)}^{\prime }\varSigma _{22}\beta _{(1)}=1\), which produces the first pair of canonical variables: \((u_1,~v_1)=(\alpha _{(1)}^{\prime }X,~\beta _{(1)}^{\prime }Y)\). We then take the second largest nonzero eigenvalue of A or B, denote it by λ (2), compute the corresponding eigenvectors α (2) and β (2) and normalize them so that \(\alpha _{(2)}^{\prime }\varSigma _{11}\alpha _{(2)}=1\) and \(\beta _{(2)}^{\prime }\varSigma _{22}\beta _{(2)}=1\), which yields the second pair of canonical variables: \((u_2,~v_2)=(\alpha _{(2)}^{\prime }X,~\beta _{(2)}^{\prime }Y)\). Continuing this process with all of the p nonzero eigenvalues if p ≤ q and Σ 12 is of full rank p, or with all of the q nonzero eigenvalues if q ≤ p and Σ 21 is of full rank q, will produce a complete set of pairs of canonical variables: \((u_i,~v_i)=(\alpha _{(i)}^{\prime }X,~\beta _{(i)}^{\prime }Y), \ i=1,\ldots ,p \ \mathrm {or}\ q\).

Since the symmetrized forms of A and B are symmetric and nonnegative definite, all of their eigenvalues will be nonnegative and all nonzero eigenvalues will be positive. As is explained in Chapter 1 and Mathai and Haubold (2017a), all the eigenvalues of real symmetric matrices are real and for such matrices, there exists a full set of orthogonal eigenvectors whether some of the roots are repeated or not. Hence, \(\alpha _{(j)}^{\prime }X\) will be uncorrelated with all the linear functions \(\alpha _{(r)}^{\prime }X,~r=1,2 ,\ldots , j-1\), and \(\beta _{(j)}^{\prime }Y\) will be uncorrelated with \(\beta _{(r)}^{\prime }Y,~r=1,2 ,\ldots , j-1\). When constructing the second pair of canonical variables, we may impose the condition that the second linear functions α′X and β′Y  must be uncorrelated with the first pair \(\alpha _{(1)}^{\prime }X\) and \(\beta _{(1)}^{\prime }Y,\) respectively, by taking two more Lagrangian multipliers, adding the conditions \(\mathrm {Cov}(\alpha 'X,~\alpha _{(1)}^{\prime }X)=\alpha '\varSigma _{11}\alpha _{(1)}=0\) and β′Σ 22 β (1) = 0 to the optimizing function w and carrying out the optimization. We will then realize that these additional conditions are redundant and that the original optimizing equations are recovered, as was observed in the case of Principal Components. Similarly, we could incorporate the conditions α′Σ 11 α (r) = 0, r = 1, …, j − 1 when constructing α (j) and similar conditions when constructing β (j). However, these uncorrelatedness conditions will become redundant in the optimization procedure. Note that \(\lambda _{(1)}=\rho _{(1)}^2\) is the square of the first canonical correlation. Thus, the first canonical correlation is denoted by ρ (1). Similarly \(\lambda _{(r)}=\rho _{(r)}^2\) is the square of the r-th canonical correlation, r = 1, …, p when p ≤ q and Σ 12 is of full rank p. That is, ρ (1), …, ρ (p), the p nonzero roots of (10.1.1) when p ≤ q and Σ 12 is of full rank p, are canonical correlations, ρ (r) being called the r-th canonical correlation which is the r-th largest root of the determinantal equation (10.1.1). If p ≤ q and Σ 12 is not of full rank p, then there will be fewer nonzero canonical correlations.
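Computationally, all the pairs can be generated at once: the singular values of \(C=\varSigma _{11}^{-\frac {1}{2}}\varSigma _{12}\varSigma _{22}^{-\frac {1}{2}}\) are the canonical correlations, since CC′ is the symmetrized form of A. A minimal sketch assuming NumPy, the function name canonical_pairs being ours:

```python
import numpy as np

def canonical_pairs(Sig11, Sig12, Sig22):
    """Return (rho, alphas, betas): canonical correlations and coefficient
    vectors normalized so that alpha' Sig11 alpha = 1, beta' Sig22 beta = 1.
    A sketch via the SVD of C = Sig11^{-1/2} Sig12 Sig22^{-1/2}."""
    def inv_sqrt(M):
        # symmetric inverse square root via an eigendecomposition
        w, V = np.linalg.eigh(M)
        return V @ np.diag(w ** -0.5) @ V.T
    H1, H2 = inv_sqrt(Sig11), inv_sqrt(Sig22)
    Umat, rho, Vt = np.linalg.svd(H1 @ Sig12 @ H2)
    alphas = H1 @ Umat    # columns: alpha_(i) = Sig11^{-1/2} a_i
    betas = H2 @ Vt.T     # columns: beta_(i)  = Sig22^{-1/2} b_i
    return rho, alphas, betas
```

By construction, the columns of alphas satisfy \(\alpha _{(i)}^{\prime }\varSigma _{11}\alpha _{(j)}=\delta _{ij}\), which reflects the uncorrelatedness of the u i’s discussed above.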

Example 10.2.1

Let Z, with Z′ = (X′, Y′), be a 5 × 1 real vector random variable where X is 3 × 1 and Y  is 2 × 1. Let the covariance matrix of Z be Σ where

Construct the pairs of canonical variables.

Solution 10.2.1

We need the following quantities:

Let us compute the eigenvalues of B since it is 2 × 2 whereas A is 3 × 3. The characteristic equation of 45B is (20 − λ)(26 − λ) − 70 = 0 ⇒ λ 2 − 46λ + 450 = 0. The roots are \(\lambda _1=23+\sqrt {79},~\lambda _2=23-\sqrt {79}\). Hence, the eigenvalues of B are \(\rho _j=\frac {\lambda _j}{45}\), that is, \(\rho _1=\frac {23+\sqrt {79}}{45},~\rho _2=\frac {23-\sqrt {79}}{45}\). We have denoted the second set of real scalar random variables by Y , with Y′ = (y 1, y 2). An eigenvector corresponding to ρ 1 is available from (B − ρ 1 I)Y = O. Since the right-hand side is null, we may omit the denominator. The first equation is then \((-3-\sqrt {79})y_1-10y_2=0\). Taking y 1 = 1, \(y_2=-\frac {1}{10}(3+\sqrt {79})\). It is easily verified that these values will also satisfy the second equation in (B − ρ 1 I)Y = O. An eigenvector, denoted by β 1, is the following:

We normalize β 1 through \(\beta _1^{\prime }\varSigma _{22}\beta _1=1\). To this end, consider

Hence a normalized β 1, denoted by β (1), and the corresponding canonical variable v 1 are the following:

The second eigenvalue of B is \(\rho _2=\frac {1}{45}(23-\sqrt {79})\). An eigenvector corresponding to ρ 2 is available from the equation (B − ρ 2 I)Y = O. The second equation gives \(-7y_1+(3+\sqrt {79})y_2=0\). Taking y 2 = 1, \(y_1=\frac {1}{7}(3+\sqrt {79})\). Hence, an eigenvector corresponding to ρ 2, denoted by β 2, is the following:

We normalize this vector through the constraint \(\beta _2^{\prime }\varSigma _{22}\beta _2=1\). Consider

Hence, the normalized eigenvector, denoted by β (2), and the corresponding canonical variable v 2 are

We will obtain the eigenvectors resulting from (A − ρ 1 I)X = O from the eigenvector β 1 instead of solving the equation relating to A, as the presence of the term \(3+\sqrt {79}\) can make the computations tedious. From equation (ii) of Sect. 10.1, we have

$$\displaystyle \begin{aligned}\alpha_1=\frac{1}{\rho_1}\varSigma_{11}^{-1}\varSigma_{12}\beta_1.\end{aligned}$$
Let us normalize this vector by requiring that \(\alpha _1^{\prime }\varSigma _{11}\alpha _1\!=\!1\) or \(\alpha _1^{\prime }\varSigma _{11}\alpha _1\!=\!\frac {1}{\rho _1^2}\beta _1^{\prime }\varSigma _{21}\varSigma _{11}^{-1}\varSigma _{12}\beta _1\) \(=\gamma _1^2\), say:

Hence, the normalized α 1, denoted by α (1), is the following:

so that the corresponding canonical variable is

$$\displaystyle \begin{aligned} u_1&=\frac{5}{\sqrt{(15)}\sqrt{(553+11\sqrt{79})}}\Big\{\Big[6+\frac{3}{10}(3+\sqrt{79})\Big]x_1-\Big[3+\frac{9}{10}(3+\sqrt{79})\Big]x_2\\ &\qquad \qquad \qquad \ \,\qquad \qquad \qquad +\Big[5-\frac{1}{2}(3+\sqrt{79})\Big]x_3\Big\}.\end{aligned} $$

Now, from the formula \(\alpha _2=\frac {1}{\rho _2}\varSigma _{11}^{-1}\varSigma _{12}\beta _2\), we have

Let us normalize this vector via the constraint \(\alpha _2^{\prime }\varSigma _{11}\alpha _2=1\) or

$$\displaystyle \begin{aligned}\alpha_2^{\prime}\varSigma_{11}\alpha_2=\frac{1}{\rho_2^2}\beta_2^{\prime}\varSigma_{21}\varSigma_{11}^{-1}\varSigma_{12}\beta_2=\gamma_2^2,\end{aligned}$$

say. Thus,

and the normalized vector α 2, denoted by α (2), is

so that the second canonical variable is

$$\displaystyle \begin{aligned} u_2&=\frac{7}{\sqrt{15}\sqrt{(1738+94\sqrt{79})}}\Big \{\Big[\frac{6}{7}(3+\sqrt{79})-3\Big]x_1-\Big[\frac{3}{7}(3+\sqrt{79})+9\Big]x_2\\ &\qquad \qquad \qquad \qquad \qquad \qquad +\Big[\frac{5}{7}(3+\sqrt{79})+5\Big]x_3\Big\}.\end{aligned} $$

Hence, the canonical pairs are (u 1, v 1), (u 2, v 2) where u j is the best predictor of v j and vice versa for j = 1, 2. The pair of canonical variables (u 2, v 2) has the second largest canonical correlation. It is easy to verify that Cov(u 1, u 2) = 0 and Cov(v 1, v 2) = 0.

10.3. Estimation of the Canonical Correlations and Canonical Variables

Consider a simple random sample of size n from a population designated by the (p + q) × 1 real vector Z, Z′ = (X′, Y′). Let the (corrected) sample sum of products matrix be denoted by

$$\displaystyle \begin{aligned}S=\left[\begin{array}{cc}S_{11}&S_{12}\\ S_{21}&S_{22}\end{array}\right]\end{aligned}$$
where S 11 is the sample sum of products matrix corresponding to the sample from the subvector X, whose (i, j)th element is of the form \(\sum _{k=1}^n(x_{ik}-\bar {x}_i)(x_{jk}-\bar {x}_j)\) with the matrix (x ik) denoting a sample of size n from X, S 22 is the sample sum of products matrix corresponding to the subvector Y  and \(\frac {1}{n}S_{12}\) is the sample covariance matrix between X and Y . Thus, denoting the estimates by hats, the estimates of Σ 11, Σ 22 and Σ 12 are \(\hat {\varSigma }_{11}=\frac {1}{n}S_{11}\), \(\hat {\varSigma }_{22}=\frac {1}{n}S_{22}\) and \( \hat {\varSigma }_{12}=\frac {1}{n}S_{12}\), respectively. These will also be the maximum likelihood estimates if we assume normality, that is, if

$$\displaystyle \begin{aligned}Z=\left[\begin{array}{c}X\\ Y\end{array}\right]\sim N_{p+q}(\mu,~\varSigma),~\varSigma=\left[\begin{array}{cc}\varSigma_{11}&\varSigma_{12}\\ \varSigma_{21}&\varSigma_{22}\end{array}\right]>O,{}\end{aligned} $$
(10.3.1)

where Σ 11 = Cov(X) > O, Σ 22 = Cov(Y ) > O and Σ 12 = Cov(X, Y ). For the estimates of these submatrices, equation (10.1.1) will take the following form:

$$\displaystyle \begin{aligned}\left|\begin{array}{cc}-t\,\hat{\varSigma}_{11}&\hat{\varSigma}_{12}\\ \hat{\varSigma}_{21}&-t\,\hat{\varSigma}_{22}\end{array}\right|=0{}\end{aligned} $$
(10.3.2)

where t is the sample canonical correlation; the reader may also refer to Mathai and Haubold (2017b). Letting \(\hat {\rho }=t\) be the estimated canonical correlation, whenever p ≤ q, t 2 is an eigenvalue of the sample canonical correlation matrix given by

$$\displaystyle \begin{aligned} \hat{\varSigma}_{11}^{-\frac{1}{2}}\hat{\varSigma}_{12}\hat{\varSigma}_{22}^{-1}\hat{\varSigma}_{21}\hat{\varSigma}_{11}^{-\frac{1}{2}}=S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}} =R_{11}^{-\frac{1}{2}}R_{12}R_{22}^{-1}R_{21}R_{11}^{-\frac{1}{2}}. \end{aligned} $$
(10.3.3)

Note that we have chosen the symmetric format for the sample canonical correlation matrix. Observe that the sample size n is omitted from the middle expression in (10.3.3) as it gets canceled. As well, the middle expression is expressed in terms of sample correlation matrices in the last expression appearing in (10.3.3). The conversion from a sample covariance matrix to a sample correlation matrix has previously been explained. Letting S = (s ij) denote the sample sum of products matrix, we can write

$$\displaystyle \begin{aligned}S=D\,R\,D,~~D=\mathrm{diag}(\sqrt{s_{11}},\ldots,\sqrt{s_{p+q,p+q}}),~~R=(r_{ij}),\end{aligned}$$
where r ij is the (i, j)th sample correlation coefficient, and for example, R 11 is the p × p submatrix within the (p + q) × (p + q) matrix R. We will examine the distribution of t 2 when the population covariance submatrix Σ 12 = O, as well as when Σ 12O, in the case of a (p + q)-variate real Gaussian population as given in (10.3.1), after considering an example to illustrate the computations of the canonical correlation ρ resulting from (10.1.1) and presenting an iterative procedure.
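Before proceeding, here is a brief sketch (assuming NumPy) of the conversion from S to R and of the symmetric form in (10.3.3); the eigenvalues of the returned matrix are the squared sample canonical correlations \(t_{(j)}^2\):

```python
import numpy as np

def sample_canonical_corr_matrix(S, p):
    # Convert the sum of products matrix S to the correlation matrix R = (r_ij).
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)
    R11, R12, R22 = R[:p, :p], R[:p, p:], R[p:, p:]
    # Symmetric form R11^{-1/2} R12 R22^{-1} R21 R11^{-1/2}; its eigenvalues,
    # the t_(j)^2, agree with those of the S-based form in (10.3.3).
    w, V = np.linalg.eigh(R11)
    H = V @ np.diag(w ** -0.5) @ V.T
    return H @ R12 @ np.linalg.solve(R22, R12.T) @ H
```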

Example 10.3.1

Let X and Y  be two real bivariate vector random variables and let Z′ = (X′, Y′). Consider the following simple random sample of size 5 from Z:

$$\displaystyle \begin{aligned}Z_1=\left[\begin{array}{r}1\\ 2\\ 1\\ -1\end{array}\right],~Z_2=\left[\begin{array}{r}2\\ 0\\ -1\\ 1\end{array}\right],~Z_3=\left[\begin{array}{r}2\\ 1\\ 0\\ -1\end{array}\right],~Z_4=\left[\begin{array}{r}0\\ 1\\ 1\\ 0\end{array}\right],~Z_5=\left[\begin{array}{r}0\\ 1\\ -1\\ 1\end{array}\right].\end{aligned}$$
Construct the sample pairs of canonical variables.

Solution 10.3.1

Let us use the standard notation. The sample matrix is Z = [Z 1, …, Z 5], the sample average \(\bar {Z}=\frac {1}{5}[Z_1 +\cdots +Z_5]\), the matrix of sample averages is \(\bar {\mathbf {Z}}=[\bar {Z} ,\ldots , \bar {Z}]\), the deviation matrix is \({\mathbf {Z}}_d=[Z_1-\bar {Z} ,\ldots , Z_5-\bar {Z}]\) and the sample sum of products matrix is \(S={\mathbf {Z}}_{d}\,{\mathbf {Z}}_d^{\prime }\). These quantities are the following:

$$\displaystyle \begin{aligned} \mathbf{Z}&=\left[\begin{array}{rrrrr}1&2&2&\ \ 0&0\\ 2&0&1&\ \ 1&1\\ 1&-1&0&\ \ 1&-1\\ -1&1&-1&\ \ 0&1\end{array}\right],~~\bar{Z}=\left[\begin{array}{r}1\\ 1\\ 0\\ 0\end{array}\right],\\ {\mathbf{Z}}_d&=\left[\begin{array}{rrrrr}0&1&1&-1&-1\\ 1&-1&0&0&0\\ 1&-1&0&1&-1\\ -1&1&-1&0&1\end{array}\right],~~ S=\left[\begin{array}{rrrr}4&-1&-1&-1\\ -1&2&2&-2\\ -1&2&4&-3\\ -1&-2&-3&4\end{array}\right],\end{aligned} $$

where, as per our notation,

$$\displaystyle \begin{aligned}S_{11}=\left[\begin{array}{rr}4&-1\\ -1&2\end{array}\right],~S_{12}=\left[\begin{array}{rr}-1&-1\\ 2&-2\end{array}\right],~S_{21}=S_{12}^{\prime},~S_{22}=\left[\begin{array}{rr}4&-3\\ -3&4\end{array}\right].\end{aligned}$$
We need to compute the following items:

$$\displaystyle \begin{aligned}S_{11}^{-1}=\frac{1}{7}\left[\begin{array}{rr}2&1\\ 1&4\end{array}\right],~S_{22}^{-1}=\frac{1}{7}\left[\begin{array}{rr}4&3\\ 3&4\end{array}\right],~S_{11}^{-1}S_{12}=\frac{1}{7}\left[\begin{array}{rr}0&-4\\ 7&-9\end{array}\right],~S_{22}^{-1}S_{21}=\frac{1}{7}\left[\begin{array}{rr}-7&2\\ -7&-2\end{array}\right].\end{aligned}$$
The matrices A and B are then the following (using the same notation as for the population values for convenience):

$$\displaystyle \begin{aligned}A=S_{11}^{-1}S_{12}S_{22}^{-1}S_{21}=\frac{2}{7^2}\left[\begin{array}{rr}14&4\\ 7&16\end{array}\right],~~B=S_{22}^{-1}S_{21}S_{11}^{-1}S_{12}=\frac{2}{7^2}\left[\begin{array}{rr}7&5\\ -7&23\end{array}\right].\end{aligned}$$
If the population covariance matrix of Z is Σ, an estimate of Σ is \(\frac {S}{n}\) where S is the sample sum of products matrix and n is the sample size, which is also the maximum likelihood estimate of Σ if Z is Gaussian distributed. Instead of using \(\frac {S}{n}\), we will work with S since the normalized eigenvectors of S and \(\frac {S}{n}\) are identical, although the eigenvalues of \(\frac {S}{n}\) are \(\frac {1}{n}\) times the eigenvalues of S.

The eigenvalues of A are \(\frac {2}{7^2}\) times the solutions of \((14-\lambda )(16-\lambda )-28=0\Rightarrow \lambda _1=15+\sqrt {29},~\lambda _2=15-\sqrt {29}\), so that the eigenvalues of A, denoted by λ (1) and λ (2), are \(\lambda _{(1)}=(\frac {2}{7^2})(15+\sqrt {29}),~ \lambda _{(2)}=\frac {2}{7^2}(15-\sqrt {29})\). The eigenvalues of B are \(\frac {2}{7^2}\) times the solutions of \((7-\nu )(23-\nu )+35=0\Rightarrow \nu _1=15+\sqrt {29},~\nu _2=15-\sqrt {29}\), so that the eigenvalues of B, denoted by ν (1) and ν (2), are \(\nu _{(1)}=\frac {2}{7^2}(15+\sqrt {29}),~\nu _{(2)}=\frac {2}{7^2}(15-\sqrt {29})\), which, as expected, are the same as those of A. Corresponding to λ (1), an eigenvector from A is available from the equation

$$\displaystyle \begin{aligned}\Big(\left[\begin{array}{rr}14&4\\ 7&16\end{array}\right]-(15+\sqrt{29})I\Big)\alpha_1=O,\end{aligned}$$
deleting \(\frac {2}{7^2}\) from both sides. Thus, one solution is

$$\displaystyle \begin{aligned}\alpha_1=\left[\begin{array}{c}4\\ 1+\sqrt{29}\end{array}\right].\end{aligned}$$
Let us normalize this vector through the constraint \(\alpha _1^{\prime }S_{11}\alpha _1=1\). Since

$$\displaystyle \begin{aligned}\alpha_1^{\prime}S_{11}\alpha_1=[4,~1+\sqrt{29}]\left[\begin{array}{rr}4&-1\\ -1&2\end{array}\right]\left[\begin{array}{c}4\\ 1+\sqrt{29}\end{array}\right]=4(29-\sqrt{29}),\end{aligned}$$
the normalized eigenvector, denoted by α (1), and the corresponding sample canonical variable, denoted by u 1, are

$$\displaystyle \begin{aligned}\alpha_{(1)}=\frac{1}{2\sqrt{29-\sqrt{29}}}\left[\begin{array}{c}4\\ 1+\sqrt{29}\end{array}\right],~~u_1=\alpha_{(1)}^{\prime}X=\frac{4x_1+(1+\sqrt{29})x_2}{2\sqrt{29-\sqrt{29}}}.\end{aligned}$$
The eigenvalues of \(\frac {7^2}{2}B\) are, as noted above, \(\nu _1=15+\sqrt {29},~~ \nu _2=15-\sqrt {29}\). Let us compute an eigenvector corresponding to the eigenvalue ν 1 obtained from B. This eigenvector can be determined from the equation

$$\displaystyle \begin{aligned}\Big(\left[\begin{array}{rr}7&5\\ -7&23\end{array}\right]-(15+\sqrt{29})I\Big)\beta_1=O,\end{aligned}$$
which gives one solution as

$$\displaystyle \begin{aligned}\beta_1=\left[\begin{array}{c}5\\ 8+\sqrt{29}\end{array}\right].\end{aligned}$$
Let us normalize under the constraint \(\beta _1^{\prime }S_{22}\beta _1=1\). Since

$$\displaystyle \begin{aligned}\beta_1^{\prime}S_{22}\beta_1=[5,~8+\sqrt{29}]\left[\begin{array}{rr}4&-3\\ -3&4\end{array}\right]\left[\begin{array}{c}5\\ 8+\sqrt{29}\end{array}\right]=232+34\sqrt{29},\end{aligned}$$
the normalized eigenvector, denoted by β (1), and the corresponding canonical variable denoted by v 1, are the following:

$$\displaystyle \begin{aligned}\beta_{(1)}=\frac{1}{\sqrt{232+34\sqrt{29}}}\left[\begin{array}{c}5\\ 8+\sqrt{29}\end{array}\right],~~v_1=\beta_{(1)}^{\prime}Y=\frac{5y_1+(8+\sqrt{29})y_2}{\sqrt{232+34\sqrt{29}}}.\end{aligned}$$
Therefore, one pair of canonical variables is (u 1, v 1) where u 1 is the best predictor of v 1 and vice versa. Now, consider \(\lambda _2=15-\sqrt {29}\) and \(\nu _2=15-\sqrt {29}\). Proceed as in the above case to obtain the second pair of canonical variables (u 2, v 2).
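The above computations can be verified mechanically. A sketch assuming NumPy, reproducing S and the eigenvalues \(\frac {2}{7^2}(15\pm \sqrt {29})\) from the data of this example:

```python
import numpy as np

# The five observations of Example 10.3.1 as columns of Z.
Z = np.array([[ 1,  2,  2,  0,  0],
              [ 2,  0,  1,  1,  1],
              [ 1, -1,  0,  1, -1],
              [-1,  1, -1,  0,  1]], dtype=float)
Zd = Z - Z.mean(axis=1, keepdims=True)      # deviation matrix Z_d
S = Zd @ Zd.T                               # sample sum of products matrix
S11, S12, S21, S22 = S[:2, :2], S[:2, 2:], S[2:, :2], S[2:, 2:]

A = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
lam = np.sort(np.linalg.eigvals(A).real)[::-1]
print(lam)                                  # approx [0.8320, 0.3924]
print(2 * (15 + np.sqrt(29)) / 49,
      2 * (15 - np.sqrt(29)) / 49)          # the same values
```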

10.3.1. An iterative procedure

Without any loss of generality, let p ≤ q. We will illustrate the procedure for the population values for convenience. Consider the matrix A as previously defined, that is, \(A=\varSigma _{11}^{-1}\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}\), and ρ, the canonical correlation, which is a solution of (10.1.1) with ρ 2 = λ where λ is an eigenvalue of A. When p is small, we may directly solve the determinantal equation (10.1.1) and evaluate the roots, which are the canonical correlations. When p is large, direct evaluation could prove tedious without resorting to computational software packages. In this case, the following iterative procedure may be employed. Let λ be an eigenvalue of A and α the corresponding eigenvector. We want to evaluate λ = ρ 2 and α, but we cannot solve (10.1.1) directly when p is large. In that case, take an initial trial vector γ 0 and normalize it so that \(\alpha _0^{\prime }\,\varSigma _{11}\alpha _0=1\), α 0 being the normalized γ 0. Then, \(\alpha _0=\frac {1}{\sqrt {\gamma _0^{\prime }\varSigma _{11}\gamma _0}}\gamma _0\). Now, consider the equation

$$\displaystyle \begin{aligned}A\,\alpha_0=\gamma_1. \end{aligned}$$

If α 0 happens to be the eigenvector α (1) corresponding to the largest eigenvalue λ (1) of A then \(A\,\alpha _0=\lambda _{(1)}\alpha _{(1)}; ~A\,\alpha _0=\gamma _1\Rightarrow \gamma _1^{\prime }\varSigma _{11}\gamma _1=\lambda _{(1)}^2\alpha _{(1)}^{\prime }\varSigma _{11}\alpha _{(1)}=\lambda _{(1)}^2\) since \(\alpha _{(1)}^{\prime }\varSigma _{11}\alpha _{(1)}=1\). Then \(\rho _{(1)}^2=\lambda _{(1)}=\sqrt {\gamma _1^{\prime }\varSigma _{11}\gamma _1}\). This gives the motivation for the iterative procedure. Consider the equation

$$\displaystyle \begin{aligned}A\,\alpha_i=\gamma_{i+1},~ \alpha_i=\frac{1}{\sqrt{\gamma_i^{\prime}\varSigma_{11}\gamma_i}}\gamma_i,~ i=0,1,\ldots \end{aligned}$$
(i)

Continue the iteration process. At each stage, compute \(\delta _i=\gamma _i^{\prime }\varSigma _{11}\gamma _i\) while ensuring that δ i is increasing. Halt the iteration when γ j = γ j−1 approximately, that is, when α j = α j−1 approximately, which indicates that γ j converges to some vector γ. At this stage, the normalized γ is α (1), the eigenvector corresponding to the largest eigenvalue λ (1) of A. Then, the largest eigenvalue λ (1) of A is given by \(\lambda _{(1)}=\sqrt {\gamma '\varSigma _{11}\gamma }\). Thus, as a result of the iteration process specified by equation (i),

$$\displaystyle \begin{aligned}\lim_{j\to\infty}\alpha_j=\alpha_{(1)}\mbox{ and }+\sqrt{\lim_{j\to\infty}\gamma_j^{\prime}\varSigma_{11}\gamma_j}=\lambda_{(1)}. \end{aligned}$$
(ii)

These initial iterations produce the largest eigenvalue \(\lambda _{(1)}=\rho _{(1)}^2\) and the corresponding eigenvector α (1). From (10.1.1), we have

$$\displaystyle \begin{aligned}\varSigma_{22}^{-1}\varSigma_{21}\alpha=\rho\,\beta\Rightarrow \frac{1}{\rho}\varSigma_{22}^{-1}\varSigma_{21}\alpha=\beta. \end{aligned}$$
(iii)

Substitute the computed ρ (1) and α (1) in (iii) to obtain β (1), the eigenvector corresponding to the largest eigenvalue λ (1) of \(B=\varSigma _{22}^{-1}\varSigma _{21}\varSigma _{11}^{-1}\varSigma _{12}\). This completes the first stage of the iteration process. Now, consider \(A_2=A-\lambda _{(1)}\alpha _{(1)}\alpha _{(1)}^{\prime }\), where, in this deflation step and in (iv) below, A stands for the symmetrized form and the α (i)’s denote its orthonormal eigenvectors, that is, the a i’s of Sect. 10.1. Observe that \(\alpha _{(1)}\alpha _{(1)}^{\prime }\) is a p × p matrix. In general, we can express a symmetric matrix in terms of its eigenvalues and normalized eigenvectors as follows:

$$\displaystyle \begin{aligned}A=\varSigma_{11}^{-\frac{1}{2}}\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}\varSigma_{11}^{-\frac{1}{2}}=\lambda_{(1)}\alpha_{(1)}\alpha_{(1)}^{\prime}+\lambda_{(2)}\alpha_{(2)}\alpha_{(2)}^{\prime} +\cdots+\lambda_{(p)}\alpha_{(p)}\alpha_{(p)}^{\prime}, \end{aligned}$$
(iv)

as explained in Chapter 1 or Mathai and Haubold (2017a). Carry out the second stage of the iteration process on A 2 as indicated in (i). This will produce the second largest eigenvalue λ (2) of A and the corresponding eigenvector α (2). Then, compute the corresponding β (2) via the procedure given in (iii). This will complete the second stage. For the next stage, consider

$$\displaystyle \begin{aligned}A_3=A_2-\lambda_{(2)}\alpha_{(2)}\alpha_{(2)}^{\prime}=A-\lambda_{(1)}\alpha_{(1)}\alpha_{(1)}^{\prime}-\lambda_{(2)}\alpha_{(2)}\alpha_{(2)}^{\prime} \end{aligned}$$

and perform the iterative steps (i) to (iv). This will produce λ (3), α (3) and β (3). Keep on iterating until all the p eigenvalues λ (1), …, λ (p) of A as well as α (j) and β (j), the corresponding eigenvectors of A and B are obtained for j = 1, …, p.
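The following is a minimal sketch of this scheme (assuming NumPy), applied to the symmetrized form of A with the ordinary normalization a′a = 1, which is equivalent to α′Σ 11 α = 1 with \(\alpha =\varSigma _{11}^{-\frac {1}{2}}a\); the deflation step follows (iv):

```python
import numpy as np

def eigen_by_power_iteration(Asym, tol=1e-12, max_iter=10000, seed=1):
    """Eigenvalues/eigenvectors of the symmetric p x p matrix
    Asym = Sig11^{-1/2} Sig12 Sig22^{-1} Sig21 Sig11^{-1/2}, extracted in
    decreasing order by power iteration with deflation (a sketch)."""
    p = Asym.shape[0]
    rng = np.random.default_rng(seed)
    M = Asym.copy()
    lams, vecs = [], []
    for _ in range(p):
        a = rng.standard_normal(p)
        a /= np.linalg.norm(a)
        for _ in range(max_iter):
            g = M @ a                      # step (i): gamma_{i+1} = A alpha_i
            nrm = np.linalg.norm(g)
            if nrm < tol:                  # remaining eigenvalues are ~ 0
                break
            a_new = g / nrm
            if np.linalg.norm(a_new - a) < tol:
                a = a_new
                break
            a = a_new
        lam = a @ M @ a                    # converged eigenvalue, step (ii)
        lams.append(lam)
        vecs.append(a)
        M = M - lam * np.outer(a, a)       # deflation, as in (iv)
    return np.array(lams), np.column_stack(vecs)
```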

In the case of sample eigenvalues and eigenvectors, start with the sample matrices

$$\displaystyle \begin{aligned} \hat{A}&=\hat{\varSigma}_{11}^{-\frac{1}{2}}\hat{\varSigma}_{12}\hat{\varSigma}_{22}^{-1}\hat{\varSigma}_{21}\hat{\varSigma}_{11}^{-\frac{1}{2}} =R_{11}^{-\frac{1}{2}}R_{12}R_{22}^{-1}R_{21}R_{11}^{-\frac{1}{2}} \\ \hat{B}&=\hat{\varSigma}_{22}^{-\frac{1}{2}}\hat{\varSigma}_{21}\hat{\varSigma}_{11}^{-1}\hat{\varSigma}_{12}\hat{\varSigma}_{22}^{-\frac{1}{2}} =R_{22}^{-\frac{1}{2}}R_{21}R_{11}^{-1}R_{12}R_{22}^{-\frac{1}{2}}.\end{aligned} $$

Carry out the iteration steps (i) to (iv) on \(\hat {A}\) to obtain the sample eigenvalues \(t_{(j)}^2\) and thereby \(\hat {\rho }_{(j)}=t_{(j)},~j=1 ,\ldots , p\) for p ≤ q, where t (j) is the j-th sample canonical correlation, as well as the corresponding eigenvectors of \(\hat {A}\), denoted by a (j), and those of \(\hat {B}\), denoted by b (j).

Example 10.3.2

Consider the real vectors X′ = (x 1, x 2), Y′ = (y 1, y 2, y 3), and let Z′ = (X′, Y′) where x 1, x 2, y 1, y 2, y 3 are real scalar random variables. Let the covariance matrix of Z be Σ > O where

with

Consider the problem of predicting X from Y  and vice versa. Obtain the best predictors by constructing pairs of canonical variables.

Solution 10.3.2

Let us first compute the inverses \(\varSigma _{11}^{-1},~\varSigma _{22}^{-1}\) and \(\varSigma _{11}^{-1}\varSigma _{12},~ \varSigma _{22}^{-1}\varSigma _{21}\). We are taking the non-symmetric form of A as the symmetric form requires more calculations. Either way, the eigenvalues are identical. On directly applying the formula \(C^{-1}=\frac {1}{|C|}\times [\mbox{the matrix of cofactors}]'\), we have

Thus, the non-symmetric forms of A and B are

Let us compute the eigenvalues of 3A. Consider , which gives

$$\displaystyle \begin{aligned}\lambda=\frac{12\pm \sqrt{(12)^2-4(24)}}{2}=6+2\sqrt{3},~ 6-2\sqrt{3}, \end{aligned}$$

the eigenvalues of A being \(\lambda _{(1)}=2+\frac {2}{3}\sqrt {3},~ \lambda _{(2)}=2-\frac {2}{3}\sqrt {3}\). These are the squares of the canonical correlation coefficients resulting from (10.1.1). Let us determine the eigenvectors corresponding to λ (1) and λ (2). Our notations for the linear functions of X and Y  are u = α′X and v = β′Y ; in this case, α′ = (α 1, α 2) and β′ = (β 1, β 2, β 3). Then, the eigenvector α corresponding to λ (1) is obtained from the equation

(i)

Observe that since (i) is a singular system of linear equations, we need only consider one equation and we can preassign a value for α 1 or α 2. Taking α 1 = 1, let us normalize the resulting vector via the constraint α′Σ 11 α = 1. Since

the normalized α, denoted by α (1), is

(ii)

Now, the eigenvector corresponding to the second eigenvalue λ (2) is such that

Since

the normalized α such that α′Σ 11 α = 1 is

(iii)

Observe that computing the eigenvalues of B from the equation |B − λ (j) I| = 0 will be difficult. However, we know that they are λ (1) and λ (2) as given above, the third one being equal to zero. So, let us first verify that 3λ (1) is an eigenvalue of 3B. Consider

The operations performed are the following: Taking out 2 from each row; interchanging the second and the first rows; adding \((1+\sqrt {3})\) times the first row to the second row and adding minus one times the first row to the third row. Similarly, it can be verified that λ (2) is also an eigenvalue of B. Moreover, since the third row of B is equal to the sum of its first two rows, B is singular, which means that the remaining eigenvalue must be zero. In Example 10.2.1, we made use of the formula resulting from (ii) of Sect. 10.1 for determining the second set of canonical variables. In this case, they will be directly computed from B to illustrate a different approach. Let us now determine the eigenvectors with respect to B, corresponding to λ (1) and λ (2):

This yields the equations

$$\displaystyle \begin{aligned} -(1+\sqrt{3})\beta_1-3\beta_2+2\beta_3&=0 \end{aligned} $$
(iv)
$$\displaystyle \begin{aligned} -\beta_1-\sqrt{3}\beta_2-\beta_3&=0 \end{aligned} $$
(v)
$$\displaystyle \begin{aligned} \beta_1-(2+\sqrt{3})\beta_3&=0,\end{aligned} $$
(vi)

whose solution in terms of an arbitrary β 3 is \(\beta _2=-(1+\sqrt {3})\beta _3 \) and \( \beta _1=(2+\sqrt {3})\beta _3\). Taking β 3 = 1, we have \(\beta _2=-(1+\sqrt {3})\) and \(\beta _1=2+\sqrt {3}\). Let us normalize the resulting vector via the constraint β′Σ 22 β = 1:

Thus, the normalized β, denoted by β (1), is

(vii)

Observe that we could also have utilized (iii) of Sect. 10.3.1 to evaluate β (1) and β (2) from α (1) and α (2). Consider the second eigenvalue \(\lambda _{(2)}=2-\frac {2}{3}\sqrt {3}\) and the equation (B − λ (2) I)β = O, that is,

which leads to the equations

$$\displaystyle \begin{aligned} (-1+\sqrt{3})\beta_1-3\beta_2+2\beta_3&=0 \end{aligned} $$
(viii)
$$\displaystyle \begin{aligned} -\beta_1+\sqrt{3}\beta_2-\beta_3&=0 \end{aligned} $$
(ix)
$$\displaystyle \begin{aligned} \beta_1+(-2+\sqrt{3})\beta_3&=0.\end{aligned} $$
(x)

Thus, when β 3 = 1, \(\beta _1=2-\sqrt {3} \) and \(\beta _2=\sqrt {3}-1\). Subject to the constraint β′Σ 22 β = 1, we have

Hence, the normalized β is

(xi)

The reader may also verify that this solution for β (2) is identical to that coming from (iii) of Sect. 10.3.1. Thus, the canonical pairs are the following: from (ii) and (vii), we have the first canonical pair (u 1, v 1), and from (iii) and (xi), the second pair (u 2, v 2). This means that u 1 is the best predictor of v 1 and vice versa, and that, among linear functions uncorrelated with the first pair, u 2 is the best predictor of v 2 and vice versa.

Let us ensure that no computational errors have been committed. Consider

with

$$\displaystyle \begin{aligned}\gamma_1\delta_1=6(2+\sqrt{3})2(3+\sqrt{3})=12(9+5\sqrt{3}), \end{aligned}$$

so that

$$\displaystyle \begin{aligned} \frac{[\alpha_{1}^{\prime}\varSigma_{12}\beta_{1}]^2}{\gamma_1\delta_1}&=\frac{16(3+2\sqrt{3})^2}{12(9+5\sqrt{3})}\\ &=\frac{16(21+12\sqrt{3})}{12(9+5\sqrt{3})}=\frac{4(7+4\sqrt{3})}{9+5\sqrt{3}}=\frac{4(7+4\sqrt{3})(9-5\sqrt{3})}{6}\\ &=\frac{2}{3}(3+\sqrt{3})=2+\frac{2}{\sqrt{3}}=2+\frac{2}{3}\sqrt{3}=\lambda_{(1)}\!:\mbox{the largest eigenvalue of}\ A,\end{aligned} $$

which corroborates the results obtained for α (1), β (1) and λ (1). Similarly, it can be verified that α (2), β (2) and λ (2) have been correctly computed.

10.4. The Sampling Distribution of the Canonical Correlation Matrix

Consider a simple random sample of size n from Z, Z′ = (X′, Y′). Let the (p + q) × (p + q) sample sum of products matrix be denoted by S and let Z have a real (p + q)-variate standard Gaussian density. Then S has a real (p + q)-variate Wishart distribution with the identity matrix as its parameter matrix and m = n − 1 degrees of freedom, n being the sample size. Letting the density of S be denoted by f(S),

$$\displaystyle \begin{aligned}f(S)=\frac{|S|{}^{\frac{m}{2}-\frac{p+q+1}{2}}}{2^{\frac{m(p+q)}{2}}\varGamma_{p+q}(\frac{m}{2})}\mathrm{e}^{-\frac{1}{2}\mathrm{tr}(S)},~S>O,~m\ge p+q.{}\end{aligned} $$
(10.4.1)

Let us partition S as follows:

$$\displaystyle \begin{aligned}S=\left[\begin{array}{cc}S_{11}&S_{12}\\ S_{21}&S_{22}\end{array}\right],~S_{11}~\mbox{being}~p\times p~\mbox{and}~S_{22},~q\times q,\end{aligned}$$
and let dS = dS 11 ∧dS 22 ∧dS 12. Note that tr(S) = tr(S 11) + tr(S 22) and

$$\displaystyle \begin{aligned} |S|&=|S_{22}|~|S_{11}-S_{12}S_{22}^{-1}S_{21}|\\ &=|S_{22}|~|S_{11}|~|I-S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}}|.\end{aligned} $$

Letting \(U=S_{11}^{-\frac {1}{2}}S_{12}S_{22}^{-\frac {1}{2}}\), we have \(\mathrm {d}U=|S_{11}|{ }^{-\frac {q}{2}}|S_{22}|{ }^{-\frac {p}{2}}\mathrm {d}S_{12}\) for fixed S 11 and S 22, so that the joint density of S 11, S 22 and S 12 is given by

$$\displaystyle \begin{aligned} f_1(S)\mathrm{d}S_{11}\wedge\mathrm{d}S_{22}\wedge\mathrm{d}S_{12}&=\frac{|S_{11}|{}^{\frac{m}{2}-\frac{p+1}{2}}\mathrm{e}^{-\frac{1}{2}\mathrm{tr}(S_{11})}}{2^{\frac{mp}{2}}\varGamma_p(\frac{m}{2})}\mathrm{d}S_{11}\\ &\ \ \ \ \times\frac{|S_{22}|{}^{\frac{m}{2}-\frac{q+1}{2}}\mathrm{e}^{-\frac{1}{2}\mathrm{tr}(S_{22})}}{2^{\frac{mq}{2}}\varGamma_q(\frac{m}{2})}\mathrm{d}S_{22}\\ &\ \ \ \ \times \frac{\varGamma_p(\frac{m}{2})\varGamma_q(\frac{m}{2})} {\varGamma_{p+q}(\frac{m}{2})} \,|I-UU'|{}^{\frac{m}{2}-\frac{p+q+1}{2}}\mathrm{d}U. \end{aligned} $$
(10.4.2)

It is seen from (10.4.2) that S 11, S 22 and U are mutually independently distributed, and so are S 11, S 22 and W = UU′. Further, S 11 ∼ W p(m, I) and S 22 ∼ W q(m, I). Note that \(W=UU'=S_{11}^{-\frac {1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac {1}{2}}\) is the sample canonical correlation matrix. It follows from Theorem 4.2.3 of Chapter 4, that for p ≤ q and U of full rank p,

$$\displaystyle \begin{aligned}\mathrm{d}U=\frac{\pi^{\frac{pq}{2}}}{\varGamma_p(\frac{q}{2})}|W|{}^{\frac{q}{2}-\frac{p+1}{2}}\mathrm{d}W.{}\end{aligned} $$
(10.4.3)

After integrating out S 11 and S 22 from (10.4.2) and substituting for dU as specified in (10.4.3), we obtain the following representation of the density of W:

$$\displaystyle \begin{aligned}f_2(W)=\frac{\pi^{\frac{pq}{2}}}{\varGamma_p(\frac{q}{2})}\frac{\varGamma_p(\frac{m}{2})\varGamma_q(\frac{m}{2})}{\varGamma_{p+q}(\frac{m}{2})}|W|{}^{\frac{q}{2}-\frac{p+1}{2}} |I-W|{}^{\frac{m-q}{2}-\frac{p+1}{2}},\end{aligned}$$

where

$$\displaystyle \begin{aligned}\frac{\varGamma_q(\frac{m}{2})}{\varGamma_{p+q}(\frac{m}{2})}=\frac{\pi^{\frac{q(q-1)}{4}}}{\pi^{\frac{(p+q)(p+q-1)}{4}}}\frac{\varGamma(\frac{m}{2})\cdots\varGamma(\frac{m}{2}-\frac{q-1}{2})} {\varGamma(\frac{m}{2})\cdots\varGamma(\frac{m}{2}-\frac{p+q-1}{2})} =\frac{1}{\pi^{\frac{pq}{2}}\varGamma_p(\frac{m-q}{2})}.\end{aligned}$$

Hence, the density of W is

$$\displaystyle \begin{aligned} f_2(W)=\frac{\varGamma_p(\frac{m}{2})}{\varGamma_p(\frac{q}{2})\varGamma_p(\frac{m-q}{2})}|W|{}^{\frac{q}{2}-\frac{p+1}{2}} |I-W|{}^{\frac{m-q}{2}-\frac{p+1}{2}}. \end{aligned} $$
(10.4.4)

Thus, the following result:

Theorem 10.4.1

Let Z, S, S 11, S 22, S 12, U and W be as defined above. Then, for p ≤ q and U of full rank p, the p × p canonical correlation matrix W = UU′ has the real matrix-variate type-1 beta density with the parameters \((\frac {q}{2},~\frac {m-q}{2})\) that is specified in (10.4.4) with m ≥ p + q, m = n − 1.

When q ≤ p and S 21 is of full rank q, the canonical correlation matrix \(W_1=U'U=S_{22}^{-\frac {1}{2}}S_{21}S_{11}^{-1}S_{12}S_{22}^{-\frac {1}{2}}\) will have the density given in (10.4.4) with p and q interchanged. Suppose that p ≤ q and we would like to consider the density of W 1 = U′U. In this case U′U is real positive semi-definite as the rank of U is p ≤ q. However, on expanding the following determinant in two different ways:

$$\displaystyle \begin{aligned}\left|\begin{array}{cc}I_p&U\\ U^{\prime}&I_q\end{array}\right|=|I_p-UU^{\prime}|=|I_q-U^{\prime}U|,\end{aligned}$$
it follows from (10.4.2) that the q × q matrix U′U has a distribution that is equivalent to that of the p × p matrix UU′, as given in (10.4.4). The distribution of the sample canonical correlation matrix has been derived in Mathai (1981) for a Gaussian population under the assumption that Σ 12O.

10.4.1. The joint density of the eigenvalues and eigenvectors

Without any loss of generality, let p ≤ q and U be of full rank p. Let W denote the sample canonical correlation matrix whose density is as given in (10.4.4) for the case when the population canonical correlation matrix is a null matrix. Let the eigenvalues of W be distinct and such that 1 > ν 1 > ν 2 > ⋯ > ν p > 0. Observe that \(\nu _j=r_{(j)}^2\) where r (j), j = 1, …, p, are the sample canonical correlations. For a unique p × p orthonormal matrix Q, QQ′ = I, Q′Q = I, we have Q′WQ = diag(ν 1, …, ν p) ≡ D. Consider the transformation from W to D and the normalized eigenvectors of W, which constitute the columns of Q. Then, as is explained in Theorem 8.2.1 or Theorem 4.4 of Mathai (1997),

$$\displaystyle \begin{aligned}\mathrm{d}W=\Big[\prod_{i<j}(\nu_i-\nu_j)\Big]\mathrm{d}D\wedge h(Q){}\end{aligned} $$
(10.4.5)

where h(Q) = ∧[(dQ)Q′] is the differential element associated with Q, and we have the following result:

Theorem 10.4.2

The joint density of the distinct eigenvalues 1 > ν 1 > ν 2 > ⋯ > ν p > 0, p ≤ q, of W = UU′ whose density is specified in (10.4.4), U being assumed to be of full rank p, and the normalized eigenvectors corresponding to ν 1, …, ν p , denoted by f 3(D, Q), is the following:

$$\displaystyle \begin{aligned} f_3(D,Q)\mathrm{d}D\wedge h(Q)&=\frac{\varGamma_p(\frac{m}{2})}{\varGamma_p(\frac{q}{2})\varGamma_p(\frac{m-q}{2})}\Big[\prod_{j=1}^p\nu_j^{\frac{q}{2}-\frac{p+1}{2}}\Big]\Big[\prod_{j=1}^p(1-\nu_j)^{\frac{m-q}{2}-\frac{p+1}{2}}\Big] \\ {} &\, \qquad \qquad \qquad \times \Big[\prod_{i<j}(\nu_i-\nu_j)\Big]\mathrm{d}D\wedge h(Q) \end{aligned} $$
(10.4.6)

where h(Q) is as defined in (10.4.5). To obtain the joint density of the squares of the sample canonical correlations \(r_{(j)}^2\) and the corresponding canonical vectors, it suffices to replace ν j by \(r_{(j)}^2,~1>r_{(1)}^2>\cdots >r_{(p)}^2>0,~-1<r_{(j)}<1,~ j=1 ,\ldots , p\).

The joint density of the eigenvalues can be determined by integrating out h(Q) from (10.4.6) in this real case. It follows from Theorem 4.2.2 that

$$\displaystyle \begin{aligned}\int_{O_p}h(Q)=\frac{\pi^{\frac{p^2}{2}}}{\varGamma_p(\frac{p}{2})},{}\end{aligned} $$
(10.4.7)

this result being also stated in Mathai (1997). For the complex case, the expression on the right-hand side of (10.4.7) is \(\pi ^{p(p-1)}\!/{{\tilde \varGamma }_p({p})}\). Hence, the joint density of the eigenvalues or, equivalently, the density of D and the density of Q are the following:

Theorem 10.4.3

When p ≤ q and U is of full rank p, the joint density of the distinct eigenvalues 1 > ν 1 > ⋯ > ν p > 0 of the canonical correlation matrix W in (10.4.4), which is available from (10.4.6) and denoted by f 4(D), is

$$\displaystyle \begin{aligned} f_4(D)&=\frac{\varGamma_p(\frac{m}{2})}{\varGamma_p(\frac{q}{2})\varGamma_p(\frac{m-q}{2})}\frac{\pi^{\frac{p^2}{2}}}{\varGamma_p(\frac{p}{2})}\Big[\prod_{j=1}^p\nu_j^{\frac{q}{2}-\frac{p+1}{2}}\Big] \\ {} & \ \ \ \ \times \Big[\prod_{j=1}^p(1-\nu_j)^{\frac{m-q}{2}-\frac{p+1}{2}}\Big]\Big[\prod_{i<j}(\nu_i-\nu_j)\Big], \end{aligned} $$
(10.4.8)

and the joint density of the normalized eigenvectors associated with W, denoted by f 5(Q), is given by

$$\displaystyle \begin{aligned}f_5(Q)=\frac{\varGamma_p(\frac{p}{2})}{\pi^{\frac{p^2}{2}}}h(Q){}\end{aligned} $$
(10.4.9)

where h(Q) is as defined in (10.4.5).

To obtain the joint density of the squares of the sample canonical correlations \(r_{(j)}^2\), one should replace ν j by \(r_{(j)}^2,\) j = 1, 2, …, p, in (10.4.8).

Example 10.4.1

Verify that (10.4.8) is a density for p = 2, m − q = p + 1, with q being a free parameter.

Solution 10.4.1

For m − q = p + 1, p = 2, the right-hand side of (10.4.8) becomes

$$\displaystyle \begin{aligned} \frac{\varGamma_p(\frac{q+p+1}{2})}{\varGamma_p(\frac{q}{2})\varGamma_p(\frac{p+1}{2})}&\frac{\pi^{\frac{p^2}{2}}}{\varGamma_p(\frac{p}{2})} (\nu_1\nu_2)^{\frac{q-(p+1)}{2}}(\nu_1-\nu_2)\\ &=\frac{\varGamma_2(\frac{q+3}{2})}{\varGamma_2(\frac{q}{2})\varGamma_2(\frac{p+1}{2})}\frac{\pi^2}{\varGamma_2(1)}(\nu_1\nu_2)^{\frac{q-3}{2}}(\nu_1-\nu_2). \end{aligned} $$
(i)

The constant part simplifies as follows:

$$\displaystyle \begin{aligned} \frac{\varGamma_2(\frac{q+3}{2})}{\varGamma_2(\frac{q}{2})\varGamma_2(\frac{p+1}{2})}\frac{\pi^2}{\varGamma_2(1)} &=\frac{\varGamma(\frac{q+3}{2})\varGamma(\frac{q+2}{2})}{\sqrt{\pi}[\varGamma(\frac{q}{2})\varGamma(\frac{q-1}{2})][\varGamma(\frac{3}{2})\varGamma(1)]} \frac{\pi^2}{\sqrt{\pi}\varGamma(1)\varGamma(\frac{1}{2})}; \end{aligned} $$
(ii)

now, noting that

$$\displaystyle \begin{aligned}\varGamma\Big(\frac{q+3}{2}\Big)=\Big(\frac{q+1}{2}\Big)\Big(\frac{q-1}{2}\Big)\varGamma\Big(\frac{q-1}{2}\Big)\mbox{ and }\varGamma\Big(\frac{q+2}{2}\Big) =\frac{q}{2}\varGamma\Big(\frac{q}{2}\Big), \end{aligned}$$

and substituting these values in (ii), the constant part becomes

$$\displaystyle \begin{aligned}\frac{(\frac{q+1}{2})(\frac{q-1}{2})(\frac{q}{2})}{\sqrt{\pi}(\frac{1}{2})\sqrt{\pi}}\frac{\pi^2}{\sqrt{\pi}\sqrt{\pi}} =\frac{(q-1)q(q+1)}{4}. \end{aligned}$$
(iii)

Let us show that the total integral equals 1. The integral part is the following:

$$\displaystyle \begin{aligned} \int_{\nu_1=0}^1\int_{\nu_2=0}^{\nu_1}&(\nu_1\nu_2)^{\frac{q-3}{2}}(\nu_1-\nu_2)\mathrm{d}\nu_1\wedge\mathrm{d}\nu_2\\ &=\int_0^1\nu_1^{\frac{q-1}{2}}\Big[\int_{\nu_2=0}^{\nu_1}\nu_2^{\frac{q-3}{2}}\mathrm{d}\nu_2\Big]\mathrm{d}\nu_1 -\int_0^1\nu_1^{\frac{q-3}{2}}\Big[\int_{\nu_2=0}^{\nu_1}\nu_2^{\frac{q-1}{2}}\mathrm{d}\nu_2\Big]\mathrm{d}\nu_1\\ &=\int_0^1\frac{\nu_1^{q-1}}{(\frac{q-1}{2})}\mathrm{d}\nu_1-\int_0^1\frac{\nu_1^{q-1}}{(\frac{q+1}{2})}\mathrm{d}\nu_1 =\frac{1}{q(\frac{q-1}{2})}-\frac{1}{q(\frac{q+1}{2})}\\ &=\frac{4}{(q-1)q(q+1)}. \end{aligned} $$
(iv)

The product of (iii) and (iv) being equal to 1, this verifies that (10.4.8) is a density for m − q = p + 1, p = 2. This completes the computations.
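The same verification can be delegated to a computer algebra system. A sketch assuming SymPy, for the concrete choice q = 5 (any q > 1 works equally well):

```python
import sympy as sp

q = 5                                       # concrete choice; any q > 1 works
nu1, nu2 = sp.symbols('nu1 nu2', positive=True)
const = sp.Rational((q - 1) * q * (q + 1), 4)                  # constant (iii)
integrand = (nu1 * nu2) ** sp.Rational(q - 3, 2) * (nu1 - nu2)
inner = sp.integrate(integrand, (nu2, 0, nu1))                 # integral (iv)
total = sp.integrate(inner, (nu1, 0, 1))
print(sp.simplify(const * total))           # prints 1
```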

10.4.2. Testing whether the population canonical correlations equal zero

In its symmetric form, whenever Σ 12 = O, the population canonical correlation matrix is a null matrix, that is, \(\varSigma _{11}^{-\frac {1}{2}}\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}\varSigma _{11}^{-\frac {1}{2}}=O\). Thus, when Σ 12 = O, the canonical correlations are equal to zero and vice versa. As was explained in Sect. 6.8.2, we have u 4, a one-to-one function of the likelihood ratio criterion for testing this hypothesis in the case of a Gaussian distributed population. It was established that

$$\displaystyle \begin{aligned}u_4=\frac{|S|}{|S_{11}|~|S_{22}|}=|I-S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}}|=|I-UU'|=|I-W|=\prod_{j=1}^p(1-r_{(j)}^2){}\end{aligned} $$
(10.4.10)

where r (j), j = 1, …, p, are the sample canonical correlations. It can also be seen from (10.4.2) that, when U is of full rank p, U has a rectangular matrix-variate type-1 beta distribution and W = UU′ has a real matrix-variate type-1 beta distribution. Since it has been determined in Sect. 6.8.2 that under H o, the h-th moment of u 4 for an arbitrary h is given by

$$\displaystyle \begin{aligned}E[u_4^h|H_o]=c\frac{\prod_{j=p+1}^{p+q}\varGamma(\frac{m}{2}-\frac{j-1}{2}+h)}{\prod_{j=1}^q\varGamma(\frac{m}{2}-\frac{j-1}{2}+h)},~ m=n-1,{}\end{aligned} $$
(10.4.11)

where n is the sample size and c is such that \(E[u_4^0|H_o]=1\), the density of u 4 is expressible in terms of a G-function. It was also shown in the same section that \(-n\ln u_4\) is asymptotically distributed as a real chisquare random variable having \(\frac {(p+q)(p+q-1)}{2}-\frac {p(p-1)}{2}-\frac {q(q-1)}{2}=p\,q\) degrees of freedom, which corresponds to the number of parameters restricted by the hypothesis Σ 12 = O since there are p q free parameters in Σ 12. Thus, the following result:

Theorem 10.4.4

Consider the hypothesis H o  : ρ (1) = ⋯ = ρ (p) = 0, that is, the population canonical correlations ρ (j), j = 1, …, p, are all equal to zero, which is equivalent to the hypothesis H o  : Σ 12 = O. Let u 4 denote the (2∕n)-th root of the likelihood ratio criterion for testing this hypothesis. Then, as the sample size n → ∞, under H o,

$$\displaystyle \begin{aligned}-n\ln u_4=-2 \ln(\mathit{\mbox{the likelihood ratio criterion}})\to\chi_{pq}^2\,,{}\end{aligned} $$
(10.4.12)

\(\chi _{\nu }^2\) denoting a real chisquare random variable having ν degrees of freedom.

An illustrative numerical example has already been presented in Chap. 6.
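In practice, the test of Theorem 10.4.4 can be carried out as follows; this is a sketch assuming NumPy and SciPy, with the function name and the simulated data being ours:

```python
import numpy as np
from scipy import stats

def test_zero_canonical_correlations(X, Y):
    """Asymptotic chi-square test of H_o: Sigma_12 = O (Theorem 10.4.4).
    X is n x p and Y is n x q, rows being the observations."""
    n, p = X.shape
    q = Y.shape[1]
    Z = np.hstack([X, Y])
    Zc = Z - Z.mean(axis=0)
    S = Zc.T @ Zc                                  # sum of products matrix
    u4 = (np.linalg.det(S)
          / (np.linalg.det(S[:p, :p]) * np.linalg.det(S[p:, p:])))
    stat = -n * np.log(u4)                         # statistic of (10.4.12)
    return stat, stats.chi2.sf(stat, df=p * q)

rng = np.random.default_rng(7)
X = rng.standard_normal((200, 2))
Y = rng.standard_normal((200, 3))                  # independent of X under H_o
print(test_zero_canonical_correlations(X, Y))      # large p-value expected
```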

Note 10.1

We have initially assumed that Σ > O, Σ 11 > O and Σ 22 > O. However, \(\varSigma _{12}=\varSigma _{21}^{\prime }\) may or may not be of full rank or some of its elements could be equal to zero. Note that \(\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}\) is either positive definite or positive semi-definite. Whenever p ≤ q and Σ 12 is of rank p, \(\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}>O\) and, in this instance, all the p canonical correlations are positive. If Σ 12 is not of full rank, then some of the eigenvalues of W as previously defined, as well as the corresponding canonical correlations will be equal to zero and, in the event that q ≤ p, similar statements would hold with respect to Σ 21, W′ and the resulting canonical correlations. This aspect will not be further investigated from an inferential standpoint.

Note 10.2

Consider the regression of X on Y , that is, E[X|Y ], when Z, Z′ = (X′, Y′), has the following real (p + q)-variate normal distribution:

$$\displaystyle \begin{aligned}Z\sim N_{p+q}(\mu,~\varSigma),~\varSigma=\left[\begin{array}{cc}\varSigma_{11}&\varSigma_{12}\\ \varSigma_{21}&\varSigma_{22}\end{array}\right]>O.\end{aligned}$$
Then, from equation (3.3.5), we have

$$\displaystyle \begin{aligned}E[X|Y]=\mu_{(1)}+\varSigma_{12}\varSigma_{22}^{-1}(Y-\mu_{(2)}) \end{aligned}$$

where \(\mu '=(\mu _{(1)}^{\prime },\mu _{(2)}^{\prime })\) and

$$\displaystyle \begin{aligned}\mathrm{Cov}(X|Y)=\varSigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}. \end{aligned}$$

Regression analysis is performed on the conditional space where Y  is either composed of non-random real scalar variables or given values of real scalar random variables, whereas canonical correlation analysis is carried out in the entire space of Z. Clearly, these techniques involve distinct approaches. When Y  consists of given values of random variables, Σ 12 and Σ 22 remain meaningful. In this instance, the hypothesis H o  : Σ 12 = O, in which case the regression coefficient matrix is a null matrix or, equivalently, the hypothesis that Y  does not contribute to predicting X, implies that the canonical correlation matrix \(\varSigma _{11}^{-\frac {1}{2}}\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}\varSigma _{11}^{-\frac {1}{2}}\) is also a null matrix. Accordingly, in this case, the ‘no regression’ hypothesis Σ 12 = O (no contribution of Y  in predicting X) is equivalent to the hypothesis that the canonical correlations are equal to zero and vice versa.

10.5. The General Sampling Distribution of the Canonical Correlation Matrix

Let the (p + q) × 1 real vector random variable Z, with X as its first p components and Y  as its remaining q components, be distributed as N p+q(μ, Σ), Σ > O. Consider a simple random sample of size n from this Gaussian population and let the sample sum of products matrix S be partitioned as in the preceding section. Let the sample canonical correlation matrix be denoted by R and the corresponding population canonical correlation matrix, by P, that is,

$$\displaystyle \begin{aligned}R=S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}}\ \mathrm{and}\ \ P=\varSigma_{11}^{-\frac{1}{2}}\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}\varSigma_{11}^{-\frac{1}{2}}. \end{aligned}$$

We now examine the distribution of R, assuming that P ≠ O. Letting the determinant of I − R be denoted by u, we have

$$\displaystyle \begin{aligned}u=|I-R|=|S_{11}-S_{12}S_{22}^{-1}S_{21}||S_{11}|{}^{-1}=\frac{|S|}{|S_{11}|~|S_{22}|}. \end{aligned}$$
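This determinantal identity is easy to check numerically. The following minimal sketch, assuming numpy, computes R from a simulated sum of products matrix with illustrative dimensions and compares |I − R| with |S|∕(|S 11||S 22|).

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 2, 3, 50
Z = rng.standard_normal((n, p + q))
Zc = Z - Z.mean(axis=0)
S = Zc.T @ Zc                          # sample sum of products matrix
S11, S12 = S[:p, :p], S[:p, p:]
S21, S22 = S[p:, :p], S[p:, p:]

evals, evecs = np.linalg.eigh(S11)     # symmetric inverse square root of S_11
S11_m12 = evecs @ np.diag(evals**-0.5) @ evecs.T
R = S11_m12 @ S12 @ np.linalg.solve(S22, S21) @ S11_m12

u = np.linalg.det(np.eye(p) - R)
print(u, np.linalg.det(S) / (np.linalg.det(S11) * np.linalg.det(S22)))
# the two printed values should agree to machine precision
```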

Thus, the h-th moment of u is

$$\displaystyle \begin{aligned}E[u^h]=E\Big[\frac{|S|}{|S_{11}|~|S_{22}|}\Big]^h=E[|S|{}^h|S_{11}|{}^{-h}|S_{22}|{}^{-h}]. \end{aligned}$$

Since S 11 and S 22 are submatrices of S, u is a function of S and we can integrate over the density of S, namely the Wishart density with m = n − 1 degrees of freedom and parameter matrix Σ > O. Then, for m ≥ p + q,

$$\displaystyle \begin{aligned} E[u^h]=\frac{1}{|2\varSigma|{}^{\frac{m}{2}}\varGamma_{p+q}(\frac{m}{2})}\int_{S>O}\Big[\frac{|S|}{|S_{11}|~|S_{22}|}\Big]^h|S|{}^{\frac{m}{2}-\frac{p+q+1}{2}}\mathrm{e}^{-\frac{1}{2}\mathrm{tr}(\varSigma^{-1}S)}\mathrm{d}S. \end{aligned} $$
(10.5.1)

Let us make the change of variables S = 2T and then relabel T as S; the powers of 2 then vanish and u is unaffected, being invariant to this rescaling. Next, let us replace \(|S_{11}|^{-h}\) and \(|S_{22}|^{-h}\) by the equivalent integrals

$$\displaystyle \begin{aligned} |S_{11}|{}^{-h}&=\frac{1}{\varGamma_p(h)}\int_{Y_1>O}|Y_1|{}^{h-\frac{p+1}{2}}\mathrm{e}^{-\mathrm{tr}(Y_1S_{11})}\mathrm{d}Y_1,~ \Re(h)>\frac{p-1}{2};\\ |S_{22}|{}^{-h}&=\frac{1}{\varGamma_q(h)}\int_{Y_2>O}|Y_2|{}^{h-\frac{q+1}{2}}\mathrm{e}^{-\mathrm{tr}(Y_2S_{22})}\mathrm{d}Y_2,~\Re(h)>\frac{q-1}{2}.\end{aligned} $$
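In passing, the scalar case p = 1 of the first representation, \(s^{-h}=\frac{1}{\varGamma(h)}\int_0^\infty y^{h-1}\mathrm{e}^{-ys}\,\mathrm{d}y\), can be verified numerically; the following is a minimal sketch assuming scipy, with illustrative values of s and h.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

s, h = 2.5, 1.7                       # illustrative values, Re(h) > 0
integral, _ = quad(lambda y: y**(h - 1) * np.exp(-y * s), 0, np.inf)
print(integral / gamma(h), s**(-h))   # the two values should agree
```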

Then,

$$\displaystyle \begin{aligned} E[u^h]&=\frac{1}{|\varSigma|{}^{\frac{m}{2}}\varGamma_{p+q}(\frac{m}{2})}\frac{1}{\varGamma_p(h)\varGamma_q(h)}\int_{Y_1>O}\int_{Y_2>O}|Y_1|{}^{h-\frac{p+1}{2}}|Y_2|{}^{h-\frac{q+1}{2}}\int_{S>O} |S|{}^{\frac{m}{2}+h-\frac{p+q+1}{2}}\\ &\qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad \times\mathrm{e}^{-\mathrm{tr}(\varSigma^{-1}S+YS)}\mathrm{d}S\wedge\mathrm{d}Y_1\wedge\mathrm{d}Y_2\end{aligned} $$
(10.5.2)

where

$$\displaystyle \begin{aligned}Y=\begin{bmatrix}Y_1&O\\ O&Y_2\end{bmatrix}.\end{aligned}$$
Integrating over S in (10.5.2) gives

$$\displaystyle \begin{aligned} E[u^h]&=\frac{\varGamma_{p+q}(\frac{m}{2}+h)}{\varGamma_{p+q}(\frac{m}{2})}\frac{1}{|\varSigma|{}^{\frac{m}{2}}\varGamma_p(h)\varGamma_q(h)}\int_{Y_1>O}\int_{Y_2>O}|Y_1|{}^{h-\frac{p+1}{2}}|Y_2|{}^{h-\frac{q+1}{2}}\\ &\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \times |\varSigma^{-1}+Y|{}^{-(\frac{m}{2}+h)}\mathrm{d}Y_1\wedge\mathrm{d}Y_2.\end{aligned} $$

Let

$$\displaystyle \begin{aligned}\varSigma^{-1}=\begin{bmatrix}\varSigma^{11}&\varSigma^{12}\\ \varSigma^{21}&\varSigma^{22}\end{bmatrix},\end{aligned}$$

where \(\varSigma^{11}\) is p × p and \(\varSigma^{22}\) is q × q.
Then, the determinant can be expanded as follows:

$$\displaystyle \begin{aligned} |\varSigma^{-1}+Y|&=|\varSigma^{22}+Y_2|~|\varSigma^{11}+Y_1-\varSigma^{12}(\varSigma^{22}+Y_2)^{-1}\varSigma^{21}|\\ &=|\varSigma^{22}+Y_2|~|Y_1+B|, \ B=\varSigma^{11}-\varSigma^{12}(\varSigma^{22}+Y_2)^{-1}\varSigma^{21},\end{aligned} $$

so that

$$\displaystyle \begin{aligned}|\varSigma^{-1}+Y|{}^{-(\frac{m}{2}+h)}=|\varSigma^{22}+Y_2|{}^{-(\frac{m}{2}+h)}|I+B^{-1}Y_1|{}^{-(\frac{m}{2}+h)}|B|{}^{-(\frac{m}{2}+h)}.\end{aligned}$$

Collecting the factors containing Y 1 and integrating out, we have

$$\displaystyle \begin{aligned}\frac{1}{\varGamma_p(h)}\int_{Y_1>O}|Y_1|{}^{h-\frac{p+1}{2}}|I+B^{-1}Y_1|{}^{-(\frac{m}{2}+h)}\mathrm{d}Y_1=\frac{\varGamma_p(\frac{m}{2})}{\varGamma_p(\frac{m}{2}+h)}|B|{}^h,~\Re(h)>\frac{p-1}{2}, \end{aligned}$$

and \(|B|^{-(\frac {m}{2}+h)}|B|^h=|B|^{-\frac {m}{2}}\). Noting that

$$\displaystyle \begin{aligned}|\varSigma^{22}+Y_2|~|\varSigma^{11}-\varSigma^{12}(\varSigma^{22}+Y_2)^{-1}\varSigma^{21}|=|\varSigma^{11}|~|\varSigma^{22}+Y_2-\varSigma^{21}(\varSigma^{11})^{-1}\varSigma^{12}|,\end{aligned}$$
|B| can be expressed in the following form:

$$\displaystyle \begin{aligned} |B|&=\frac{|\varSigma^{11}|~|Y_2+C|}{|\varSigma^{22}+Y_2|},~ C=\varSigma^{22}-\varSigma^{21}(\varSigma^{11})^{-1}\varSigma^{12}=\varSigma_{22}^{-1}, \end{aligned} $$
(i)

so that,

$$\displaystyle \begin{aligned} |B|{}^{-\frac{m}{2}}&=|\varSigma^{11}|{}^{-\frac{m}{2}}|Y_2+\varSigma_{22}^{-1}|{}^{-\frac{m}{2}}|Y_2+\varSigma^{22}|{}^{\frac{m}{2}}\Rightarrow\\ |B|{}^{-\frac{m}{2}}|Y_2+\varSigma^{22}|{}^{-(\frac{m}{2}+h)}&=|\varSigma^{11}|{}^{-\frac{m}{2}}|Y_2+\varSigma_{22}^{-1}|{}^{-\frac{m}{2}}|Y_2+\varSigma^{22}|{}^{-h}. \end{aligned} $$
(ii)
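Before proceeding, note that the type-2 beta integral used above to integrate out Y 1 can also be checked numerically in the scalar case p = 1, with b > 0 playing the role of B; the following is a minimal sketch assuming scipy, with illustrative values of m, h and b.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

m, h, b = 8.0, 1.3, 2.0               # illustrative values, Re(h) > 0, m > 0
lhs, _ = quad(lambda y: y**(h - 1) * (1 + y / b)**(-(m / 2 + h)), 0, np.inf)
print(lhs / gamma(h), gamma(m / 2) / gamma(m / 2 + h) * b**h)  # should agree
```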

Collecting all the factors containing Y 2 and integrating out, we have the following:

$$\displaystyle \begin{aligned} &\frac{1}{\varGamma_q(h)}\int_{Y_2>O}|Y_2|{}^{h-\frac{q+1}{2}}|Y_2+\varSigma_{22}^{-1}|{}^{-\frac{m}{2}}|Y_2+\varSigma^{22}|{}^{-h}\mathrm{d}Y_2\\ &=\frac{|\varSigma_{22}|{}^{\frac{m}{2}+h}}{\varGamma_q(h)}\int_{Y_2>O}|Y_2|{}^{h-\frac{q+1}{2}}|I+\varSigma_{22}^{\frac{1}{2}}Y_2\varSigma_{22}^{\frac{1}{2}}|{}^{-\frac{m}{2}}\\ &\qquad \qquad \qquad \ \ \ \times |\varSigma_{22}^{\frac{1}{2}}Y_2\varSigma_{22}^{\frac{1}{2}}+\varSigma_{22}^{\frac{1}{2}}\varSigma^{22}\varSigma_{22}^{\frac{1}{2}}|{}^{-h}\mathrm{d}Y_2\\ &=\frac{|\varSigma_{22}|{}^{\frac{m}{2}}}{\varGamma_q(h)}\int_{W>O}|W|{}^{h-\frac{q+1}{2}}|I+W|{}^{-\frac{m}{2}}|W+\varSigma_{22}^{\frac{1}{2}}\varSigma^{22}\varSigma_{22}^{\frac{1}{2}}|{}^{-h}\mathrm{d}W,~W=\varSigma_{22}^{\frac{1}{2}}Y_2\varSigma_{22}^{\frac{1}{2}}, \end{aligned} $$
(iii)

as \(W=\varSigma _{22}^{\frac {1}{2}}Y_2\varSigma _{22}^{\frac {1}{2}}\Rightarrow \mathrm {d}Y_2=|\varSigma _{22}|{ }^{-\frac {q+1}{2}}\mathrm {d}W\). Now, letting W = U −1 − I, so that dW = |U|−(q+1)dU with O < U < I, the expression in (iii), denoted by δ, becomes

$$\displaystyle \begin{aligned}\delta=\frac{|\varSigma_{22}|{}^{\frac{m}{2}}}{\varGamma_q(h)}\int_{O<U<I}|U|{}^{\frac{m}{2}-\frac{q+1}{2}}|I-U|{}^{h-\frac{q+1}{2}}|I-AU|{}^{-h}\mathrm{d}U \end{aligned}$$

where \(A=I-\varSigma _{22}^{\frac {1}{2}}\varSigma ^{22}\varSigma _{22}^{\frac {1}{2}}\). Note that since \(|\varSigma |{ }^{\frac {m}{2}}=|\varSigma _{22}|{ }^{\frac {m}{2}}|\varSigma _{11}-\varSigma _{12}(\varSigma _{22})^{-1}\varSigma _{21}|{ }^{\frac {m}{2}}=|\varSigma _{22}|{ }^{\frac {m}{2}}|\varSigma ^{11}|{ }^{-\frac {m}{2}}\), \(|\varSigma |{ }^{\frac {m}{2}}\) in the denominator of the constant part gets canceled out, the remaining constant expression being

$$\displaystyle \begin{aligned}\frac{\varGamma_{p+q}(\frac{m}{2}+h)}{\varGamma_{p+q}(\frac{m}{2})}\frac{\varGamma_p(\frac{m}{2})}{\varGamma_p(\frac{m}{2}+h)}. \end{aligned}$$
(iv)

The integral part of δ can be evaluated by making use of Euler’s integral representation of the Gauss hypergeometric function of matrix argument which, as given in formula (5.2.15) of Mathai (1997), is

$$\displaystyle \begin{aligned}\frac{\varGamma_q(a)\varGamma_q(c-a)}{\varGamma_q(c)}{{}_2F_1}(a,b;c;X)=\int_{O<Z<I}|Z|{}^{a-\frac{q+1}{2}}|I-Z|{}^{c-a-\frac{q+1}{2}}|I-XZ|{}^{-b}\mathrm{d}Z \end{aligned}$$

where O < Z < I and O < X < I are q × q real matrices. Thus, δ can be expressed as follows:

$$\displaystyle \begin{aligned}\delta=|\varSigma_{22}|{}^{\frac{m}{2}}\frac{\varGamma_q(\frac{m}{2})\varGamma_q(h)}{\varGamma_q(\frac{m}{2}+h)\varGamma_q(h)}{{}_2F_1}\Big(\frac{m}{2},h;\frac{m}{2}+h;I-\varSigma_{22}^{\frac{1}{2}}\varSigma^{22}\varSigma_{22}^{\frac{1}{2}}\Big), \end{aligned}$$

so that

$$\displaystyle \begin{aligned} E[u^h]=\frac{\varGamma_{p+q}(\frac{m}{2}+h)\varGamma_p(\frac{m}{2})}{\varGamma_{p+q}(\frac{m}{2})\varGamma_p(\frac{m}{2}+h)} \frac{\varGamma_q(\frac{m}{2})}{\varGamma_q(\frac{m}{2}+h)}\,{{}_2F_1}\Big(\frac{m}{2},h;\frac{m}{2}+h; I-\varSigma_{22}^{\frac{1}{2}}\varSigma^{22}\varSigma_{22}^{\frac{1}{2}}\Big). \end{aligned} $$
(10.5.3)
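For q = 1, the Euler representation employed above reduces to the classical scalar integral formula, which is easily confirmed numerically; the following is a minimal sketch assuming scipy, with illustrative values of a, b, c and x playing the roles of m∕2, h and m∕2 + h.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, hyp2f1

a, b, c, x = 2.0, 1.5, 4.0, 0.4       # illustrative values, 0 < x < 1
lhs = gamma(a) * gamma(c - a) / gamma(c) * hyp2f1(a, b, c, x)
rhs, _ = quad(lambda z: z**(a - 1) * (1 - z)**(c - a - 1) * (1 - x * z)**(-b), 0, 1)
print(lhs, rhs)                       # the two values should agree
```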

For p ≤ q, it follows from the definition of the real matrix-variate gamma function that \(\varGamma _{p+q}(\alpha )=\pi ^{\frac {pq}{2}}\varGamma _q(\alpha )\varGamma _p(\alpha -\frac {q}{2})\). Thus, the constant part in (10.5.3) simplifies to

$$\displaystyle \begin{aligned}\frac{\varGamma_p(\frac{m}{2}-\frac{q}{2}+h)}{\varGamma_p(\frac{m}{2}-\frac{q}{2})}\frac{\varGamma_p(\frac{m}{2})}{\varGamma_p(\frac{m}{2}+h)}. \end{aligned}$$

Then,

$$\displaystyle \begin{aligned} E[u^h]=\frac{\varGamma_p(\frac{m-q}{2}+h)}{\varGamma_p(\frac{m-q}{2})}\frac{\varGamma_p(\frac{m}{2})}{\varGamma_p(\frac{m}{2}+h)}\, {{}_2F_1}\Big(\frac{m}{2},h;\frac{m}{2}+h;I-\varSigma_{22}^{\frac{1}{2}}\varSigma^{22}\varSigma_{22}^{\frac{1}{2}}\Big) \end{aligned} $$
(10.5.4)

for \(m\ge p+q,~ \Re (h)>-\frac {m}{2}+\frac {p-1}{2}+\frac {q}{2},~ p\le q\). Had Y 2 been integrated out first instead of Y 1, we would have ended up with a hypergeometric function having \(I-\varSigma _{11}^{\frac {1}{2}}\varSigma ^{11}\varSigma _{11}^{\frac {1}{2}}\) as its argument, that is,

$$\displaystyle \begin{aligned}E[u^h]=\frac{\varGamma_p(\frac{m-q}{2}+h)}{\varGamma_p(\frac{m-q}{2})}\frac{\varGamma_p(\frac{m}{2})}{\varGamma_p(\frac{m}{2}+h)}\,{{}_2F_1}\Big(\frac{m}{2},h;\frac{m}{2}+h;I-\varSigma_{11}^{\frac{1}{2}}\varSigma^{11}\varSigma_{11}^{\frac{1}{2}}\Big).{}\end{aligned} $$
(10.5.5)

10.5.1. The sampling distribution of the multiple correlation coefficient

When p = 1 and q > 1, the matrix R reduces to the scalar \(r^2_{1(1\ldots q)}\), the square of the sample multiple correlation coefficient r 1(1…q). In this case, the argument in (10.5.5) is a real scalar quantity equal to \(1-\sigma_{11}\sigma^{11}\), the real matrix-variate Γ p(⋅) functions are simply Γ(⋅) functions and \(u=1-r^2_{1(1\ldots q)}\). Letting \(y=r^2_{1(1\ldots q)}\), E[1 − y]h is available from (10.5.5) for p = 1, and the argument of the 2 F 1 hypergeometric function is then

$$\displaystyle \begin{aligned}1-\sigma_{11}\sigma^{11}=1-\sigma_{11}(\sigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21})^{-1}=-\frac{\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}}{\sigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}}.{}\end{aligned} $$
(10.5.6)

By taking the inverse Mellin transform of (10.5.5) for h = s − 1 and p = 1, we can express the density f(y) of the square of the sample multiple correlation as follows:

$$\displaystyle \begin{aligned}f(y)=\frac{(1-\rho^2)^{\frac{m}{2}}\varGamma(\frac{m}{2})}{\varGamma(\frac{m-q}{2})\varGamma(\frac{q}{2})}y^{\frac{q}{2}-1}(1-y)^{\frac{m-q}{2}-1}{{}_2F_1}\Big(\frac{m}{2},\frac{m}{2};\frac{q}{2};\rho^2y\Big){}\end{aligned} $$
(10.5.7)

where ρ 2 is the square of the population multiple correlation coefficient, that is, \(\rho ^2=[\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}]/\sigma _{11}\). We can verify the result by computing the h-th moment of 1 − y from (10.5.7). The h-th moment can be determined as follows by expanding the 2 F 1 function and then integrating term by term:

$$\displaystyle \begin{aligned} E[1-y]^h&=\frac{(1-\rho^2)^{\frac{m}{2}}\varGamma(\frac{m}{2})}{\varGamma(\frac{m-q}{2})\varGamma(\frac{q}{2})}\sum_{k=0}^{\infty}\frac{(\frac{m}{2})_k(\frac{m}{2})_k}{(\frac{q}{2})_k}\frac{(\rho^2)^k}{k!}\\ &\ \ \ \ \times \int_0^1y^{\frac{q}{2}+k-1}(1-y)^{\frac{m-q}{2}+h-1}\mathrm{d}y,\end{aligned} $$

the integral part being

$$\displaystyle \begin{aligned}\frac{\varGamma(\frac{q}{2}+k)\varGamma(\frac{m-q}{2}+h)}{\varGamma(\frac{m}{2}+h+k)}=\frac{\varGamma(\frac{q}{2})\varGamma(\frac{m-q}{2}+h)}{\varGamma(\frac{m}{2}+h)}\frac{(\frac{q}{2})_k}{(\frac{m}{2}+h)_k}, \end{aligned}$$

so that

$$\displaystyle \begin{aligned}E[1-y]^h=(1-\rho^2)^{\frac{m}{2}}\frac{\varGamma(\frac{m-q}{2}+h)}{\varGamma(\frac{m-q}{2})}\frac{\varGamma(\frac{m}{2})}{\varGamma(\frac{m}{2}+h)}{{}_2F_1}\Big(\frac{m}{2},\frac{m}{2};\frac{m}{2}+h;\rho^2\Big).{}\end{aligned} $$
(10.5.8)

On applying the relationship,

$$\displaystyle \begin{aligned}{{}_2F_1}(a,b;c;z)=(1-z)^{-b}{{}_2F_1}\Big(c-a,b;c;\frac{z}{z-1}\Big),{}\end{aligned} $$
(10.5.9)

we have

$$\displaystyle \begin{aligned}{{}_2F_1}\Big(\frac{m}{2},\frac{m}{2};\frac{m}{2}+h;\rho^2\Big)=(1-\rho^2)^{-\frac{m}{2}}{{}_2F_1}\Big(h,\frac{m}{2};\frac{m}{2}+h;\frac{\rho^2}{\rho^2-1}\Big), \end{aligned}$$

with

$$\displaystyle \begin{aligned}\frac{\rho^2}{\rho^2-1}=-\frac{\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}}{\sigma_{11}-\varSigma_{12}\varSigma_{22}^{-1}\varSigma_{21}}, \end{aligned}$$

which agrees with (10.5.6). Observe that \((1-\rho ^2)^{\frac {m}{2}}\) gets canceled out so that (10.5.8) agrees with (10.5.5) for p = 1.
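The agreement between (10.5.7) and (10.5.8) can also be confirmed by numerical integration; the following minimal sketch, assuming scipy and illustrative values of m, q, ρ 2 and h, computes E[1 − y]h both ways.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, hyp2f1

m, q, rho2, h = 12.0, 3.0, 0.3, 1.5   # illustrative values
const = (1 - rho2)**(m / 2) * gamma(m / 2) / (gamma((m - q) / 2) * gamma(q / 2))

def f(y):                             # density (10.5.7) of y = r^2
    return (const * y**(q / 2 - 1) * (1 - y)**((m - q) / 2 - 1)
            * hyp2f1(m / 2, m / 2, q / 2, rho2 * y))

moment, _ = quad(lambda y: (1 - y)**h * f(y), 0, 1)
closed = ((1 - rho2)**(m / 2) * gamma((m - q) / 2 + h) / gamma((m - q) / 2)
          * gamma(m / 2) / gamma(m / 2 + h) * hyp2f1(m / 2, m / 2, m / 2 + h, rho2))
print(moment, closed)                 # the two values should agree
```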

We can also obtain a representation of the density of the sample canonical correlation matrix whose M-transform is as given in (10.5.5) for p ≤ q. This can be achieved by duplicating the steps utilized for the particular case considered in this section, which yields the following density:

$$\displaystyle \begin{aligned} f(R)&=\frac{|I-P|{}^{\frac{m}{2}}\varGamma_p(\frac{m}{2})}{\varGamma_p(\frac{m-q}{2})\varGamma_p(\frac{q}{2})}|R|{}^{\frac{q}{2}-\frac{p+1}{2}}|I-R|{}^{\frac{m-q}{2}-\frac{p+1}{2}}\\ {} &\ \ \ \ \times {{}_2F_1}\Big(\frac{m}{2},\frac{m}{2};\frac{q}{2};P^{\frac{1}{2}}RP^{\frac{1}{2}}\Big) \end{aligned} $$
(10.5.10)

where \(P=\varSigma _{11}^{-\frac {1}{2}}\varSigma _{12}\varSigma _{22}^{-1}\varSigma _{21}\varSigma _{11}^{-\frac {1}{2}}\) is the population canonical correlation matrix. Note that a function giving rise to a certain M-transform need not be unique. However, by making use of the Laplace transform and its inverse in the real matrix-variate case, Mathai (1981) has shown that the function specified in (10.5.10) is actually the unique density of R.

Exercises 10

10.1

In Example 10.3.2, verify that

$$\displaystyle \begin{aligned}\frac{[\alpha_{(2)}^{\prime}\varSigma_{12}\beta_{(2)}]^2}{\gamma_2\delta_2}=\lambda_2\end{aligned}$$

where λ 2 is the second largest eigenvalue of the canonical correlation matrix A.

10.2

In Example 10.3.2, use equation (10.1.1) or equation (ii) preceding it with ρ 1 = ρ 2 = ρ and evaluate β (1) and β (2) from α (1) and α (2). Obtain β first, normalize it subject to the constraint β′Σ 22 β = 1 and then obtain β (1) and β (2). Then verify the results

$$\displaystyle \begin{aligned}\frac{[\alpha_{(1)}^{\prime}\varSigma_{12}\beta_{(1)}]^2}{\gamma_1\delta_1}=\lambda_1\ \mbox{ and }\ \frac{[\alpha_{(2)}^{\prime}\varSigma_{12}\beta_{(2)}]^2}{\gamma_2\delta_2}=\lambda_2 \end{aligned}$$

where λ 1 and λ 2 are the largest and second largest eigenvalues of the canonical correlation matrix A.

10.3

Let

where x 1, x 2, x 3, y 1, y 2 are real scalar random variables. Using the notations of this chapter, evaluate the following: (1) the canonical correlations ρ (1) and ρ (2); (2) the first pair of canonical variables (u 1, v 1) by direct evaluation as done in Example 10.3.2; (3) verify that

$$\displaystyle \begin{aligned}\frac{[\beta_{(1)}^{\prime}\varSigma_{21}\alpha_{(1)}]^2}{\gamma_1\delta_1}=\lambda_1:\mbox{ the largest eigenvalue of}\ B \end{aligned}$$

where \(B=\varSigma _{22}^{-\frac {1}{2}}\varSigma _{21}\varSigma _{11}^{-1}\varSigma _{12}\varSigma _{22}^{-\frac {1}{2}}\); (4) evaluate the second pair of canonical variables (u 2, v 2) by using equation (10.1.1) to construct α (1) and α (2) after obtaining β (1) and β (2); (5) verify that

$$\displaystyle \begin{aligned}\frac{[\beta_{(2)}^{\prime}\varSigma_{21}\alpha_{(2)}]^2}{\gamma_2\delta_2}=\lambda_2:\mbox{ the second largest eigenvalue of}\ B.\end{aligned}$$

10.4

Repeat Problem 10.3 with X, Y  and their associated covariance matrices defined as follows:

where x 1, x 2, x 3, y 1, y 2, y 3 are real scalar random variables. Compute all three pairs of canonical variables as well.
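For Exercises 10.3 and 10.4, the computations can be organized around a small helper. The following is a hedged sketch, assuming numpy; the function name canonical_pairs and the normalization conventions (α′Σ 11 α = 1 and β′Σ 22 β = 1, as in this chapter) are ours, and the covariance blocks are to be taken from the problem statements.

```python
import numpy as np

def canonical_pairs(S11, S12, S22):
    """Canonical correlations rho_(j) and coefficient vectors alpha_(j), beta_(j).

    Assumes the first set is the smaller one (p <= q); otherwise exchange roles.
    """
    e1, v1 = np.linalg.eigh(S11)
    S11_m12 = v1 @ np.diag(e1**-0.5) @ v1.T           # Sigma_11^{-1/2}
    A = S11_m12 @ S12 @ np.linalg.solve(S22, S12.T) @ S11_m12
    lam, W = np.linalg.eigh(A)                        # lambda_j = rho_(j)^2
    order = np.argsort(lam)[::-1]                     # largest eigenvalue first
    lam, W = lam[order], W[:, order]
    rho = np.sqrt(np.clip(lam, 0, None))              # canonical correlations
    alpha = S11_m12 @ W                               # alpha' Sigma_11 alpha = I
    beta = np.linalg.solve(S22, S12.T) @ alpha        # beta_(j) up to normalization
    beta /= np.sqrt(np.einsum('ij,ij->j', beta, S22 @ beta))  # beta' Sigma_22 beta = 1
    return rho, alpha, beta
```

For Exercise 10.3, where the Y -set is the smaller one, one would pass Σ 22, Σ 21 and Σ 11 (that is, exchange the roles of the two sets), which amounts to working with the matrix B defined in the exercise; for Exercise 10.4, either ordering works since p = q = 3. The columns of alpha and beta then determine the canonical variable pairs.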

10.5

Show that the M-transform in (10.5.5) is available from the density specified in (10.5.10).