1.1. Introduction

It is assumed that the reader has had adequate exposure to basic concepts in Probability, Statistics, Calculus and Linear Algebra. This chapter provides a brief review of the results that will be needed in the remainder of this book; no detailed discussion of these topics will be attempted. For essential materials in these areas, the reader is referred, for instance, to Mathai and Haubold (2017a, 2017b). Some properties of vectors, matrices, determinants, Jacobians and the wedge product of differentials that will be utilized later on are included in the present chapter. For the sake of completeness, we initially provide some elementary definitions. First, the concepts of vectors, matrices and determinants are introduced.

Consider the consumption profile of a family in terms of the quantities of certain food items consumed every week. The following table gives this family’s consumption profile for three weeks:

All the numbers appearing in this table are in kilograms (kg). In Week 1 the family consumed 2 kg of rice, 0.5 kg of lentils, 1 kg of carrots and 2 kg of beans. Looking at the consumption over three weeks, we have an arrangement of 12 numbers into 3 rows and 4 columns. If this consumption profile is expressed in symbols, we have the following representation:

$$\displaystyle \begin{aligned}A=(a_{ij})=\left[\begin{matrix}a_{11}&a_{12}&a_{13}&a_{14}\\ a_{21}&a_{22}&a_{23}&a_{24}\\ a_{31}&a_{32}&a_{33}&a_{34}\end{matrix}\right]=\left[\begin{matrix}2.00&0.50&1.00&2.00\\ 1.50&0.50&0.75&1.50\\ 2.00&0.50&0.50&1.25\end{matrix}\right]\end{aligned}$$

where, for example, \(a_{11}=2.00\), \(a_{13}=1.00\), \(a_{22}=0.50\), \(a_{23}=0.75\), \(a_{32}=0.50\), \(a_{34}=1.25\).

Definition 1.1.1

A matrix. An arrangement of mn items into m rows and n columns is called an m by n (written as m × n) matrix.

Accordingly, the above consumption profile matrix is 3 × 4 (3 by 4), that is, it has 3 rows and 4 columns. The standard notation consists in enclosing the mn items within round ( ) or square [ ] brackets as in the above representation. The above 3 × 4 matrix has thus been represented in several ways: as A, as \((a_{ij})\), and as the array of items enclosed within square brackets. The mn items in the m × n matrix are called the elements of the matrix. In the above matrix A, \(a_{ij}\) denotes the element in the i-th row and j-th column, also called the (i, j)-th element. In the above illustration, i = 1, 2, 3 (3 rows) and j = 1, 2, 3, 4 (4 columns). A general m × n matrix A can be written as follows:

$$\displaystyle \begin{aligned}A=\left[\begin{matrix}a_{11}&a_{12}&\ldots&a_{1n}\\ a_{21}&a_{22}&\ldots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{m1}&a_{m2}&\ldots&a_{mn}\end{matrix}\right].{}\end{aligned} $$
(1.1.1)

The elements are separated by spaces in order to avoid any confusion; should there be any possibility of confusion, the elements will be separated by commas. Note that the plural of “matrix” is “matrices”. Observe that the position of each element in Table 1.1 has a meaning: the elements cannot be permuted, as rearranged elements will give different matrices. In other words, two m × n matrices \(A=(a_{ij})\) and \(B=(b_{ij})\) are equal if and only if \(a_{ij}=b_{ij}\) for all i and j, that is, they must be element-wise equal.

Table 1.1 Consumption profile

            Rice     Lentils   Carrots   Beans
  Week 1    2.00     0.50      1.00      2.00
  Week 2    1.50     0.50      0.75      1.50
  Week 3    2.00     0.50      0.50      1.25

In Table 1.1, the first row, which is also a 1 × 4 matrix, represents this family’s first week’s consumption. The fourth column represents the consumption of beans over the three weeks’ period. Thus, each row and each column in an m × n matrix has a meaning and represents different aspects. In Eq. (1.1.1), all rows are 1 × n matrices and all columns are m × 1 matrices. A 1 × n matrix is called a row vector and an m × 1 matrix is called a column vector. For example, in Table 1.1, there are 3 row vectors and 4 column vectors. If the row vectors are denoted by R 1, R 2, R 3 and the column vectors by C 1, C 2, C 3, C 4, then we have

$$\displaystyle \begin{aligned}R_1=[2.00~~0.50~~1.00~~2.00], R_2=[1.50~~0.50~~0.75~~1.50], R_3=[2.00~~0.50~~0.50~~1.25] \end{aligned}$$

and

$$\displaystyle \begin{aligned}C_1=\left[\begin{matrix}2.00\\ 1.50\\ 2.00\end{matrix}\right], \ C_2=\left[\begin{matrix}0.50\\ 0.50\\ 0.50\end{matrix}\right],\ C_3=\left[\begin{matrix}1.00\\ 0.75\\ 0.50\end{matrix}\right], \ C_4=\left[\begin{matrix}2.00\\ 1.50\\ 1.25\end{matrix}\right].\end{aligned}$$

If the total consumption in Week 1 and Week 2 is needed, it is obtained by adding the row vectors element-wise:

$$\displaystyle \begin{aligned}R_1+R_2=[2.00+1.50~~0.50+0.50~~1.00+0.75~~2.00+1.50]= [3.50~~1.00~~1.75~~3.50].\end{aligned}$$

We will define the addition of two matrices in the same fashion as in the above illustration. For the addition to hold, both matrices must be of the same order m × n. Let A = (a ij) and B = (b ij) be two m × n matrices. Then the sum, denoted by A + B, is defined as

$$\displaystyle \begin{aligned}A+B=(a_{ij}+b_{ij})\end{aligned}$$

or equivalently as the matrix obtained by adding the corresponding elements. For example,

$$\displaystyle \begin{aligned}C_1+C_3=\left[\begin{matrix}2.00\\ 1.50\\ 2.00\end{matrix}\right]+\left[\begin{matrix}1.00\\ 0.75\\ 0.50\end{matrix}\right]=\left[\begin{matrix}3.00\\ 2.25\\ 2.50\end{matrix}\right]. \end{aligned}$$

Repeating the addition, we have

$$\displaystyle \begin{aligned}C_1+C_3+C_4=(C_1+C_3)+\left[\begin{matrix}2.00\\ 1.50\\ 1.25\end{matrix}\right]=\left[\begin{matrix}3.00\\ 2.25\\ 2.50\end{matrix}\right]+\left[\begin{matrix}2.00\\ 1.50\\ 1.25\end{matrix}\right]=\left[\begin{matrix}5.00\\ 3.75\\ 3.75\end{matrix}\right]. \end{aligned}$$

In general, if A = (a ij), B = (b ij), C = (c ij), D = (d ij) are m × n matrices, then A + B + C + D = (a ij + b ij + c ij + d ij), that is, it is the matrix obtained by adding the corresponding elements.
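
The element-wise rule stated above is easy to check numerically. The following short sketch uses NumPy (not part of the text; it is assumed here purely for illustration) with the rows of Table 1.1.

```python
import numpy as np

# Rows of the consumption matrix of Table 1.1 (quantities in kg)
R1 = np.array([2.00, 0.50, 1.00, 2.00])   # Week 1
R2 = np.array([1.50, 0.50, 0.75, 1.50])   # Week 2
R3 = np.array([2.00, 0.50, 0.50, 1.25])   # Week 3

# Element-wise (matrix) addition: total consumption in Weeks 1 and 2
print(R1 + R2)        # 3.50, 1.00, 1.75, 3.50

# Addition of several matrices of the same order works the same way
print(R1 + R2 + R3)   # totals over the three weeks
```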

Suppose that in Table 1.1, we wish to express the elements in terms of grams instead of kilograms; then, each and every element therein must be multiplied by 1000. Thus, if A is the matrix corresponding to Table 1.1 and B is the matrix in terms of grams, we have

$$\displaystyle \begin{aligned} A&=\left[\begin{matrix}2.00&0.50&1.00&2.00\\ 1.50&0.50&0.75&1.50\\ 2.00&0.50&0.50&1.25\end{matrix}\right],\\ B&=\left[\begin{matrix}1000\times 2.00&1000\times 0.50&1000\times 1.00&1000\times 2.00\\ 1000\times 1.50&1000\times 0.50&1000\times 0.75&1000\times 1.50\\ 1000\times 2.00&1000\times 0.50&1000\times 0.50&1000\times 1.25\end{matrix}\right].\end{aligned} $$

We may write this symbolically as B = 1000 × A = 1000 A. Note that 1000 is a 1 × 1 matrix or a scalar quantity; any 1 × 1 matrix is called a scalar quantity. We may then define the scalar multiplication of a matrix A by the scalar quantity c as c A, that is, \(A=(a_{ij})\Rightarrow cA=(c\,a_{ij})\), or, equivalently, c A is obtained by multiplying each and every element of A by the scalar quantity c. As a convention, c is written on the left of A as c A and not as A c. If c = −1, then c A = (−1)A = −A and A + (−1)A = A − A = O where the capital O denotes a matrix whose elements are all equal to zero. A general m × n matrix wherein every element is zero is referred to as a null matrix and it is written as O (not zero). We may also note that if A, B, C are m × n matrices, then A + (B + C) = (A + B) + C. Moreover, A + O = O + A = A. If m = n, in which case the number of rows is equal to the number of columns, the resulting matrix is referred to as a square matrix because it is a square arrangement of elements; otherwise, the matrix is called a rectangular matrix.

Some special cases of square matrices are the following. For an n × n matrix or a square matrix of order n, suppose that \(a_{ij}=0\) for all \(i\ne j\) (that is, all non-diagonal elements are zeros; here “diagonal” means the diagonal going from the top left to the bottom right) and that there is at least one nonzero diagonal element; such a matrix is called a diagonal matrix and it is usually written as \(\mathrm{diag}(d_1,\ldots,d_n)\) where \(d_1,\ldots,d_n\) are the diagonal elements. Here are some examples of 3 × 3 diagonal matrices:

If, in \(D_3\), a = 1 so that all the diagonal elements are unities, the resulting matrix is called an identity matrix; a diagonal matrix whose diagonal elements are all equal to some number a that is not equal to 0 or 1 is referred to as a scalar matrix. A square non-null matrix \(A=(a_{ij})\) that contains at least one nonzero element below its leading diagonal and whose elements above the leading diagonal are all equal to zero, that is, \(a_{ij}=0\) for all i < j, is called a lower triangular matrix. Some examples of 2 × 2 lower triangular matrices are the following:

If, in a square non-null matrix, all elements below the leading diagonal are zeros and there is at least one nonzero element above the leading diagonal, then such a square matrix is referred to as an upper triangular matrix. Here are some examples:

Multiplication of Matrices. Once again, consider Table 1.1. Suppose that by consuming 1 kg of rice, the family is getting 700 g (where g represents grams) of starch, 2 g of protein and 1 g of fat; that by eating 1 kg of lentils, the family is getting 200 g of starch, 100 g of protein and 100 g of fat; that by consuming 1 kg of carrots, the family is getting 100 g of starch, 200 g of protein and 150 g of fat; and that by eating 1 kg of beans, the family is getting 50 g of starch, 100 g of protein and 200 g of fat. Then the starch-protein-fat matrix, denoted by B, is the following, where the rows correspond to rice, lentils, carrots and beans, respectively:

$$\displaystyle \begin{aligned}B=\left[\begin{matrix}700&2&1\\ 200&100&100\\ 100&200&150\\ 50&100&200\end{matrix}\right].\end{aligned}$$

Let B 1, B 2, B 3 be the columns of B. Then, the first column B 1 of B represents the starch intake per kg of rice, lentil, carrots and beans respectively. Similarly, the second column B 2 represents the protein intake per kg and the third column B 3 represents the fat intake, that is,

$$\displaystyle \begin{aligned}B_1=\left[\begin{matrix}700\\ 200\\ 100\\ 50\end{matrix}\right],\ B_2=\left[\begin{matrix}2\\ 100\\ 200\\ 100\end{matrix}\right],\ B_3=\left[\begin{matrix}1\\ 100\\ 150\\ 200\end{matrix}\right].\end{aligned}$$

Let the rows of the matrix A in Table 1.1 be denoted by A 1, A 2 and A 3, respectively, so that

$$\displaystyle \begin{aligned}A_1=[2.00~~0.50~~1.00~~2.00], A_2=[1.50~~0.50~~0.75~~1.50], A_3=[2.00~~0.50~~0.50~~1.25].\end{aligned}$$

Then, the total intake of starch by the family in Week 1 is available from

$$\displaystyle \begin{aligned}2.00\times 700+0.50\times 200+1.00\times 100+2.00\times 50=1700\ \mathrm{g}.\end{aligned}$$

This is the sum of the element-wise products of A 1 with B 1. We will denote this by A 1 ⋅ B 1 (A 1 dot B 1). The total intake of protein by the family in Week 1 is determined as follows:

$$\displaystyle \begin{aligned}A_1\cdot B_2=2.00\times 2+0.50\times 100+1.00\times 200+2.00\times 100=454\ \mathrm{g}\end{aligned}$$

and the total intake of fat in Week 1 is given by

$$\displaystyle \begin{aligned}A_1\cdot B_3=2.00\times 1+0.50\times 100+1.00\times 150+2.00\times 200=602\ \mathrm{g}.\end{aligned}$$

Thus, the dot product of A 1 with B 1, B 2, B 3 provides the intake of starch, protein and fat in Week 1. Similarly, the dot product of A 2 with B 1, B 2, B 3 gives the intake of starch, protein and fat in Week 2. Thus, the configuration of starch, protein and fat intake over the three weeks is

$$\displaystyle \begin{aligned}AB=\left[\begin{matrix}A_1\cdot B_1&A_1\cdot B_2&A_1\cdot B_3\\ A_2\cdot B_1&A_2\cdot B_2&A_2\cdot B_3\\ A_3\cdot B_1&A_3\cdot B_2&A_3\cdot B_3\end{matrix}\right].\end{aligned}$$
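
As a numerical cross-check of the above dot products, the following NumPy sketch (NumPy and the variable names are illustrative assumptions, not part of the text) forms AB for the consumption and nutrient matrices and recovers the Week 1 figures of 1700 g, 454 g and 602 g.

```python
import numpy as np

# Consumption matrix A (weeks x food items, in kg), Table 1.1
A = np.array([[2.00, 0.50, 1.00, 2.00],
              [1.50, 0.50, 0.75, 1.50],
              [2.00, 0.50, 0.50, 1.25]])

# Nutrient matrix B (food items x {starch, protein, fat}, grams per kg)
B = np.array([[700,   2,   1],
              [200, 100, 100],
              [100, 200, 150],
              [ 50, 100, 200]])

AB = A @ B                 # the (i, j) entry is the dot product A_i . B_j
print(AB[0])               # Week 1 intake of starch, protein, fat: 1700, 454, 602 (grams)
print(A[0] @ B[:, 0])      # A_1 . B_1 = 1700.0
```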

A matrix having one column and m rows is an m × 1 matrix that is referred to as a column vector of m elements or a column vector of order m. A matrix having one row and n columns is a 1 × n matrix called a row vector of n components or a row vector of order n. Let A be a row or column vector of order n, consisting of n elements or components, and let the elements comprising A be denoted by \(a_1,\ldots,a_n\). Let B be a row or column vector of order n consisting of the elements \(b_1,\ldots,b_n\). Then, the dot product of A and B, denoted by A ⋅ B = B ⋅ A, is defined as \(A\cdot B=a_1b_1+a_2b_2+\cdots +a_nb_n\), that is, the corresponding elements of A and B are multiplied and added up. Let A be an m × n matrix whose m rows are written as \(A_1,\ldots,A_m\), and let B be an n × r matrix whose r columns are written as \(B_1,\ldots,B_r\). Note that the number of columns of A is equal to the number of rows of B, which in this case is n. When the number of columns of A is equal to the number of rows of B, the product AB is defined and equal to

$$\displaystyle \begin{aligned}AB=\left[\begin{matrix}A_1\cdot B_1&A_1\cdot B_2&\ldots&A_1\cdot B_r\\ A_2\cdot B_1&A_2\cdot B_2&\ldots&A_2\cdot B_r\\ \vdots&\vdots&\ddots&\vdots\\ A_m\cdot B_1&A_m\cdot B_2&\ldots&A_m\cdot B_r\end{matrix}\right]\ \mbox{ with }\ A=\left[\begin{matrix}A_1\\ A_2\\ \vdots\\ A_m\end{matrix}\right],\ B=[B_1~~B_2~~\cdots~~B_r], \end{aligned}$$

the resulting matrix AB being of order m × r. When AB is defined, BA need not be defined. However, if r = m, then BA is also defined, otherwise not. In other words, if A = (a ij) is m × n and if B = (b ij) is n × r and if C = (c ij) = AB, then c ij = A i ⋅ B j where A i is the i-th row of A and B j is the j-th column of B or \(c_{ij}=\sum _{k=1}^na_{ik}\,b_{kj}\) for all i and j. For example,

Note that, in this case, BA is defined and equal to

As another example, let

which is 3 × 3, whereas BA is 1 × 1:

As yet another example, let

Note that here both A and B are lower triangular matrices. The products AB and BA are defined since both A and B are 3 × 3 matrices. For instance,

Observe that since A and B are lower triangular, AB is also lower triangular. Here are some general properties of products of matrices: When the product of the matrices is defined, in which case we say that they are conformable for multiplication,

(1): the product of two lower triangular matrices is lower triangular;

(2): the product of two upper triangular matrices is upper triangular;

(3): the product of two diagonal matrices is diagonal;

(4): if A is m × n, IA = A where I = I m is an identity matrix, and A I n = A;

(5): OA = O whenever OA is defined, and AO = O whenever AO is defined.

Transpose of a Matrix. A matrix whose rows are the corresponding columns of A or, equivalently, a matrix whose columns are the corresponding rows of A, is called the transpose of A, denoted by A′ (A prime). For example,

Observe that A 2 is lower triangular and \(A_2^{\prime }\) is upper triangular, that \(A_3^{\prime }=A_3\), and that \(A_4^{\prime }=-A_4\). Note that if A is m × n, then A′ is n × m. If A is 1 × 1 then A′ is the same scalar (1 × 1) quantity. If A is a square matrix and A′ = A, then A is called a symmetric matrix. If B is a square matrix and B′ = −B, then B is called a skew symmetric matrix. Within a skew symmetric matrix B = (b ij), a diagonal element must satisfy the equation \(b_{jj}^{\prime }=-b_{jj},\) which necessitates that b jj = 0, whether B be real or complex. Here are some properties of the transpose: The transpose of a lower triangular matrix is upper triangular; the transpose of an upper triangular matrix is lower triangular; the transpose of a diagonal matrix is diagonal; the transpose of an m × n null matrix is an n × m null matrix;

$$\displaystyle \begin{aligned}(A')'=A;\ (AB)'=B'A'; \ (A_1\,A_2\,\cdots \, A_k)'=A_k^{\prime}\,\cdots \, A_2^{\prime}\,A_1^{\prime}\, ;\ (A+B)'=A'+B' \end{aligned}$$

whenever AB, A + B, and \(A_1A_2\cdots A_k\) are defined.

Trace of a Square Matrix. The trace is defined only for square matrices. Let \(A=(a_{ij})\) be an n × n matrix whose leading diagonal elements are \(a_{11},a_{22},\ldots,a_{nn}\); then the trace of A, denoted by tr(A), is defined as \(\mathrm{tr}(A)=a_{11}+a_{22}+\cdots +a_{nn}\), that is, the sum of the elements comprising the leading diagonal. The following properties can be deduced directly from the definition. Whenever AB and BA are defined, tr(AB) = tr(BA), even though AB need not be equal to BA. If A is m × n and B is n × m, then AB is m × m whereas BA is n × n; however, the traces are equal, that is, tr(AB) = tr(BA), from which it also follows that tr(ABC) = tr(BCA) = tr(CAB) whenever these products are defined.

Length of a Vector. Let V be an n × 1 real column vector or a 1 × n real row vector; then V can be represented as a point in n-dimensional Euclidean space. Consider a 2-dimensional vector (or 2-vector) with the elements (1, 2). Then, this vector corresponds to the point depicted in Fig. 1.1.

Figure 1.1 The point P = (1, 2) in the plane

Let O be the origin and P be the point. Then, the length of the resulting vector is the Euclidean distance between O and P, that is, \(+\sqrt {(1)^2+(2)^2}=+\sqrt {5}\). Let \(U=(u_1,\ldots,u_n)\) be a real n-vector, either written as a row or a column. Then the length of U, denoted by \(\Vert U\Vert\), is defined as follows:

$$\displaystyle \begin{aligned}\Vert U\Vert =+\sqrt{u_1^2+\cdots +u_n^2}\end{aligned}$$

whenever the elements \(u_1,\ldots,u_n\) are real. If \(u_1,\ldots,u_n\) are complex numbers, then \(\Vert U\Vert =\sqrt {|u_1|{ }^2+\cdots +|u_n|{ }^2}\) where \(|u_j|\) denotes the absolute value or modulus of \(u_j\). If \(u_j=a_j+ib_j\), with \(i=\sqrt {(-1)}\) and \(a_j, b_j\) real, then \(|u_j|=+\sqrt {(a_j^2+b_j^2)}\). If the length of a vector is unity, that vector is called a unit vector. For example, \(e_1=(1,0,\ldots,0),\ e_2=(0,1,0,\ldots,0),\ldots, e_n=(0,\ldots,0,1)\) are all unit vectors. As well, \(V_1=(\frac {1}{\sqrt {2}},\frac {1}{\sqrt {2}})\) and \(V_2=(\frac {1}{\sqrt {6}},\frac {-2}{\sqrt {6}},\frac {1}{\sqrt {6}})\) are unit vectors. If two n-vectors, \(U_1\) and \(U_2\), are such that \(U_1\cdot U_2=0\), that is, their dot product is zero, then the two vectors are said to be orthogonal to each other. For example, if \(U_1=(1,1)\) and \(U_2=(1,-1)\), then \(U_1\cdot U_2=0\) and \(U_1\) and \(U_2\) are orthogonal to each other; similarly, if \(U_1=(1,1,1)\) and \(U_2=(1,-2,1)\), then \(U_1\cdot U_2=0\) and \(U_1\) and \(U_2\) are orthogonal to each other. If \(U_1,\ldots,U_k\) are k vectors, each of order n, all being either row vectors or column vectors, and if \(U_i\cdot U_j=0\) for all \(i\ne j\), that is, all distinct vectors are orthogonal to each other, then we say that \(U_1,\ldots,U_k\) forms an orthogonal system of vectors. In addition, if the length of each vector is unity, \(\Vert U_j\Vert =1,\ j=1,\ldots,k\), then we say that \(U_1,\ldots,U_k\) is an orthonormal system of vectors. If a matrix A is real and its rows and its columns form an orthonormal system, then A is called an orthonormal matrix. In this case, \(AA'=I_n\) and \(A'A=I_n\); accordingly, any square matrix A of real elements such that \(AA'=I_n\) and \(A'A=I_n\) is referred to as an orthonormal matrix. If only one of these equations holds, that is, B is a real matrix such that either \(BB'=I,\ B'B\ne I\) or \(B'B=I,\ BB'\ne I\), then B is called a semiorthonormal matrix. For example, consider the matrix

and A is an orthonormal matrix. As well,

and A here is orthonormal. However,

so that B is semiorthonormal. On deleting some rows from an orthonormal matrix, we obtain a semiorthonormal matrix such that \(BB'=I\) and \(B'B\ne I\). Similarly, if we delete some of the columns, we end up with a semiorthonormal matrix such that \(B'B=I\) and \(BB'\ne I\).

Linear Independence of Vectors. Consider the vectors \(U_1=(1,1,1),\ U_2=(1,-2,1),\ U_3=(3,0,3)\). Then, we can easily see that \(U_3=2U_1+U_2=2(1,1,1)+(1,-2,1)=(3,0,3)\) or \(U_3-2U_1-U_2=O\) (a null vector). In this case, one of the vectors can be written as a linear function of the others. Let \(V_1=(1,1,1),\ V_2=(1,0,-1),\ V_3=(1,-2,1)\). Can any one of these be written as a linear function of the others? If that were possible, then there would exist a linear function of \(V_1, V_2, V_3\) that is equal to a null vector. Let us consider the equation \(a_1V_1+a_2V_2+a_3V_3=(0,0,0)\) where \(a_1, a_2, a_3\) are scalars, at least one of which is nonzero. Note that \(a_1=0,\ a_2=0,\ a_3=0\) will always satisfy the above equation. Thus, our question is whether \(a_1=0,\ a_2=0,\ a_3=0\) is the only solution.

$$\displaystyle \begin{aligned} a_1V_1+a_2V_2+a_3V_3&=O\Rightarrow a_1(1,1,1)+a_2(1,0,-1)+a_3(1,-2,1)=(0,0,0)\\ \Rightarrow a_1+a_2+a_3&=0~~({\mathit{i}});~~a_1-2a_3=0~~({\mathit{ii}});~~a_1-a_2+a_3=0. \end{aligned} $$
(iii)

From (ii), a 1 = 2a 3. Then, from (iii), 3a 3 − a 2 = 0 ⇒ a 2 = 3a 3; then from (i), 2a 3 + 3a 3 + a 3 = 0 or 6a 3 = 0 or a 3 = 0. Thus, a 2 = 0, a 1 = 0 and there is no nonzero a 1 or a 2 or a 3 satisfying the equation and hence V 1, V 2, V 3 cannot be linearly dependent; so, they are linearly independent. Hence, we have the following definition: Let U 1, …, U k be k vectors, each of order n, all being either row vectors or column vectors, so that addition and linear functions are defined. Let a 1, …, a k be scalar quantities. Consider the equation

$$\displaystyle \begin{aligned}a_1U_1+a_2U_2+\cdots +a_kU_k=O \ (\mbox{a null vector}).\end{aligned} $$
(iv)

If a 1 = 0, a 2 = 0, …, a k = 0 is the only solution to (iv), then U 1, …, U k are linearly independent, otherwise they are linearly dependent. If they are linearly dependent, then at least one of the vectors can be expressed as a linear function of others. The following properties can be established from the definition: Let U 1, …, U k be n-vectors, k ≤ n.

(1) If \(U_1,\ldots,U_k\) are mutually orthogonal, then they are linearly independent, that is, if \(U_i\cdot U_j=0\) for all \(i\ne j\), then \(U_1,\ldots,U_k\) are linearly independent;

(2) There cannot be more than n mutually orthogonal n-vectors;

(3) There cannot be more than n linearly independent n-vectors.

Rank of a Matrix. The maximum number of linearly independent row vectors of an m × n matrix is called the row rank of the matrix; the maximum number of linearly independent column vectors is called the column rank of the matrix. It can be shown that the row rank of any matrix is equal to its column rank, and this common rank is called the rank of the matrix. If r is the rank of an m × n matrix, then r ≤ m and r ≤ n. If m ≤ n and the rank is m, or if n ≤ m and the rank is n, then the matrix is called a full rank matrix. A square matrix of full rank is called a nonsingular matrix. When the rank of an n × n matrix is r < n, this matrix is referred to as a singular matrix. Singularity is defined only for square matrices. The following properties clearly hold (a numerical illustration follows below):

(1) A diagonal matrix with at least one zero diagonal element is singular or a diagonal matrix with all nonzero diagonal elements is nonsingular;

(2) A triangular matrix (upper or lower) with at least one zero diagonal element is singular or a triangular matrix with all diagonal elements nonzero is nonsingular;

(3) A square matrix containing at least one null row vector or at least one null column vector is singular;

(4) Linear independence or dependence in a collection of vectors of the same order and category (either all are row vectors or all are column vectors) is not altered by multiplying any of the vectors by a nonzero scalar;

(5) Linear independence or dependence in a collection of vectors of the same order and category is not altered by adding any vector of the set to any other vector in the same set;

(6) Linear independence or dependence in a collection of vectors of the same order and category is not altered by adding a linear combination of vectors from the same set to any other vector in the same set;

(7) If a collection of vectors of the same order and category is a linearly dependent system, then at least one of the vectors can be made null by the operations of scalar multiplication and addition.

Note: We have defined “vectors” as ordered sets of items such as ordered sets of numbers. One can also give a general definition of a vector as an element of a set S that is closed under the operations of scalar multiplication and addition (these operations are to be defined on S); that is, if \(V_1\in S\) and \(V_2\in S\), then \(cV_1\in S\) and \(V_1+V_2\in S\) for all scalars c and for all \(V_1\) and \(V_2\), where the operations \(cV_1\) and \(V_1+V_2\) are to be properly defined. One can impose additional conditions on S. However, for our discussion, the notion of vectors as ordered sets of items will be sufficient.
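
A numerical illustration of the notions just introduced can be sketched as follows, using, among others, the vectors U 1, U 2, U 3 and V 1, V 2, V 3 of the text; NumPy, the 2 × 2 matrices A and B, and the routine numpy.linalg.matrix_rank are assumptions made purely for the illustration.

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [5., 2.]])

# Transpose and trace properties: (AB)' = B'A' and tr(AB) = tr(BA)
print(np.allclose((A @ B).T, B.T @ A.T))              # True
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True

# Lengths and orthogonality
U1, U2 = np.array([1., 1., 1.]), np.array([1., -2., 1.])
print(np.linalg.norm(U1), U1 @ U2)     # length sqrt(3), dot product 0

# U3 = 2 U1 + U2, so U1, U2, U3 form a linearly dependent system
U3 = np.array([3., 0., 3.])
print(np.linalg.matrix_rank(np.vstack([U1, U2, U3])))  # 2, which is less than 3

# The vectors V1, V2, V3 of the text are linearly independent
V = np.vstack([[1., 1., 1.], [1., 0., -1.], [1., -2., 1.]])
print(np.linalg.matrix_rank(V))        # 3, full rank
```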

1.2. Determinants

Determinants are defined only for square matrices. They are certain scalar functions of the elements of the square matrix under consideration. We will motivate this particular function by means of an example that will also prove useful in other areas. Consider two 2-vectors, either both row vectors or both column vectors. Let U = OP and V = OQ be the two vectors as shown in Fig. 1.2. If the vectors are separated by a nonzero angle θ then one can create the parallelogram OPSQ with these two vectors as shown in Fig. 1.2.

Figure 1.2 Parallelogram generated from two vectors

The area of the parallelogram is twice the area of the triangle OPQ. If the perpendicular from P to OQ is PR, then the area of the triangle is \(\frac {1}{2}PR\times OQ\) or the area of the parallelogram OPSQ is PR ×∥V ∥ where PR is \(OP\times \sin \theta =\Vert U\Vert \times \sin \theta \). Therefore the area is \(\Vert U\Vert ~\Vert V\Vert ~\sin \theta \) or the area, denoted by ν is

$$\displaystyle \begin{aligned}\nu=\Vert U\Vert ~\Vert V\Vert\sqrt{(1-\cos^2\theta)}.\end{aligned}$$

If \(\theta_1\) is the angle that U makes with the x-axis and \(\theta_2\) is the angle that V makes with the x-axis, and if U and V are as depicted in Fig. 1.2, then \(\theta=\theta_1-\theta_2\). It follows that

$$\displaystyle \begin{aligned}\cos\theta=\cos{}(\theta_1-\theta_2)=\cos\theta_1\cos\theta_2+ \sin\theta_1\sin\theta_2=\frac{U\cdot V}{\Vert U\Vert~\Vert V\Vert},\end{aligned}$$

as can be seen from Fig. 1.2. In this case,

$$\displaystyle \begin{aligned}\nu=\Vert U\Vert~\Vert V\Vert\sqrt{1-\Big(\frac{(U\cdot V)}{\Vert U\Vert~\Vert V\Vert}\Big)^2}=\sqrt{(\Vert U\Vert)^2~(\Vert V\Vert)^2-(U\cdot V)^2}{}\end{aligned} $$
(1.2.1)

and

$$\displaystyle \begin{aligned}\nu^2=(\Vert U\Vert)^2(\Vert V\Vert)^2-(U\cdot V)^2.{} \end{aligned} $$
(1.2.2)

This can be written in a more convenient way. Letting \(X=\left (\begin {matrix}U\\ V\end {matrix}\right )\),

$$\displaystyle \begin{aligned}XX'=\left(\begin{matrix}U\\ V\end{matrix}\right)\left(U'~V'\right)=\left[\begin{matrix}UU'&UV'\\ VU'&VV'\end{matrix}\right]=\left[\begin{matrix}U\cdot U&U\cdot V\\ V\cdot U&V\cdot V\end{matrix}\right].{}\end{aligned} $$
(1.2.3)

On comparing (1.2.2) and (1.2.3), we note that (1.2.2) is available from (1.2.3) by taking a scalar function of the following type. Consider a matrix

$$\displaystyle \begin{aligned}C=\left[\begin{matrix}a&b\\ c&d\end{matrix}\right];\mbox{ then (1.2.2) is available by taking } ad-bc \end{aligned}$$

where a, b, c, d are scalar quantities. A scalar function of this type is the determinant of the matrix C.
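
The connection between (1.2.2), (1.2.3) and the scalar function ad − bc can be verified numerically; a minimal NumPy sketch (the vectors chosen are arbitrary and not from the text):

```python
import numpy as np

U = np.array([3., 1.])     # two arbitrary 2-vectors
V = np.array([1., 2.])

X = np.vstack([U, V])      # X has U and V as its rows
M = X @ X.T                # the matrix XX' of (1.2.3)

# Applying ad - bc to XX' recovers nu^2 of (1.2.2)
ad_minus_bc = M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0]
nu_sq = (np.linalg.norm(U) * np.linalg.norm(V))**2 - (U @ V)**2
print(ad_minus_bc, nu_sq)   # both equal 25.0

# The same number is returned by the built-in determinant of XX'
print(np.linalg.det(M))     # approximately 25.0
```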

A general result can be deduced from the above procedure: If U and V  are n-vectors and if θ is the angle between them, then

$$\displaystyle \begin{aligned}\cos\theta=\frac{U\cdot V}{\Vert U\Vert~\Vert V\Vert}\end{aligned}$$

or the dot product of U and V divided by the product of their lengths when θ ≠ 0, and the numerator is equal to the denominator when θ = 2nπ, n = 0, 1, 2, … . We now provide a formal definition of the determinant of a square matrix.

Definition 1.2.1

The Determinant of a Square Matrix. Let \(A=(a_{ij})\) be an n × n matrix whose rows (columns) are denoted by \(\alpha_1,\ldots,\alpha_n\). For example, if \(\alpha_i\) is the i-th row vector, then

$$\displaystyle \begin{aligned}\alpha_i=(a_{i1}~a_{i2}\ \dots \ a_{in}).\end{aligned}$$

The determinant of A will be denoted by |A| or det(A) when A is real or complex and the absolute value of the determinant of A will be denoted by |det(A)| when A is in the complex domain. Then, |A| will be a function of α 1, …, α n, written as

$$\displaystyle \begin{aligned}|A|=\mathrm{det}(A)=f(\alpha_1,\ldots,\alpha_i,\ldots,\alpha_j,\ldots,\alpha_n), \end{aligned}$$

which will be defined by the following four axioms (postulates or assumptions): (this definition also holds if the elements of the matrix are in the complex domain)

$$\displaystyle \begin{aligned}f(\alpha_1,\ldots,c\,\alpha_i,\ldots,\alpha_n)=c f(\alpha_1,\ldots,\alpha_i,\ldots,\alpha_n), \end{aligned}$$
(1)

which is equivalent to saying that if any row (column) is multiplied by a scalar quantity c (including zero), then the whole determinant is multiplied by c;

$$\displaystyle \begin{aligned}f(\alpha_1,\ldots,\alpha_i,\ldots,\alpha_i+\alpha_j,\ldots,\alpha_n)=f(\alpha_1,\ldots,\alpha_i,\ldots,\alpha_j,\ldots,\alpha_n), \end{aligned}$$
(2)

which is equivalent to saying that if any row (column) is added to any other row (column), then the value of the determinant remains the same;

$$\displaystyle \begin{aligned}f(\alpha_1,\ldots,\gamma_i+\delta_i,\ldots,\alpha_n)=f(\alpha_1,\ldots,\gamma_i,\ldots,\alpha_n)+f(\alpha_1,\ldots,\delta_i,\ldots,\alpha_n), \end{aligned}$$
(3)

which is equivalent to saying that if any row (column), say the i-th row (column) is written as a sum of two vectors, α i = γ i + δ i then the determinant becomes the sum of two determinants such that γ i appears at the position of α i in the first one and δ i appears at the position of α i in the second one;

$$\displaystyle \begin{aligned}f(e_1,\ldots,e_n)=1 \end{aligned}$$
(4)

where e 1, …, e n are the basic unit vectors as previously defined; this axiom states that the determinant of an identity matrix is 1.

Let us consider some corollaries resulting from Axioms (1) to (4). On combining Axioms (1) and (2), we have that the value of a determinant remains unchanged if a linear function of any number of rows (columns) is added to any other row (column). As well, the following results are direct consequences of the axioms.

(i): The determinant of a diagonal matrix is the product of the diagonal elements [which can be established by repeated applications of Axiom (1)];

(ii): If any diagonal element in a diagonal matrix is zero, then the determinant is zero, and thereby the corresponding matrix is singular; if none of the diagonal elements of a diagonal matrix is equal to zero, then the matrix is nonsingular.

(iii): If any row (column) of a matrix is null, then the determinant is zero or the matrix is singular [Axiom (1)];

(iv): If any row (column) is a linear function of other rows (columns), then the determinant is zero [By Axioms (1) and (2), we can reduce that row (column) to a null vector]. Thus, the determinant of a singular matrix is zero or if the row (column) vectors form a linearly dependent system, then the determinant is zero.

By using Axioms (1) and (2), we can reduce a triangular matrix to a diagonal form when evaluating its determinant. For this purpose, we shall use the following standard notation: “c (i) + (j) ⇒” means “c times the i-th row is added to the j-th row, which results in the following”. Let us consider a simple example involving a triangular matrix and its determinant. Evaluate the determinant of the matrix

$$\displaystyle \begin{aligned}T=\left[\begin{matrix}2&1&5\\ 0&3&4\\ 0&0&-4\end{matrix}\right].\end{aligned}$$

It is an upper triangular matrix. We take out − 4 from the third row by using Axiom (1). Then,

$$\displaystyle \begin{aligned}|T|=-4\left\vert\begin{matrix}2&1&5\\ 0&3&4\\ 0&0&1\end{matrix}\right\vert. \end{aligned}$$

Now, add (−4) times the third row to the second row and (−5) times the third row to the first row. This in symbols is “− 4(3) + (2), −5(3) + (1) ⇒”. The net result is that the elements 5 and 4 in the last column are eliminated without affecting the other elements, so that

$$\displaystyle \begin{aligned}|T|=-4\left\vert\begin{matrix}2&1&0\\ 0&3&0\\ 0&0&1\end{matrix}\right\vert.\end{aligned}$$

Now take out 3 from the second row and then use the second row to eliminate 1 in the first row. After taking out 3 from the second row, the operation is “− 1(2) + (1) ⇒”. The result is the following:

$$\displaystyle \begin{aligned}|T|=(-4)(3)\left\vert\begin{matrix}2&0&0\\ 0&1&0\\ 0&0&1\end{matrix}\right\vert.\end{aligned}$$

Now, take out 2 from the first row; then, by Axiom (4), the determinant of the resulting identity matrix is 1, and hence |T| is nothing but the product of the diagonal elements, namely |T| = (−4)(3)(2) = −24. Thus, we have the following result:

(v): The determinant of a triangular matrix (upper or lower) is the product of its diagonal elements; accordingly, if any diagonal element in a triangular matrix is zero, then the determinant is zero and the matrix is singular. For a triangular matrix to be nonsingular, all of its diagonal elements must be nonzero.
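
Result (v) can be checked on the triangular matrix T considered above, whose diagonal elements are 2, 3 and − 4; a brief NumPy sketch (illustrative only, not part of the text):

```python
import numpy as np

T = np.array([[2., 1., 5.],
              [0., 3., 4.],
              [0., 0., -4.]])

# The determinant of a triangular matrix is the product of its diagonal elements
print(np.linalg.det(T))       # approximately -24
print(np.prod(np.diag(T)))    # -24.0
```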

The following result follows directly from Axioms (1) and (2). The proof is given in symbols.

(vi): If any two rows (columns) are interchanged (this amounts to one transposition), then the resulting determinant is multiplied by − 1, that is, every transposition brings a factor of − 1 outside the determinant. If an odd number of transpositions is performed, then the whole determinant is multiplied by − 1; for an even number of transpositions, the multiplicative factor is + 1, that is, the determinant is unchanged. An outline of the proof follows:

$$\displaystyle \begin{aligned} |A|&=f(\alpha_1,\ldots,\alpha_i,\ldots,\alpha_j,\ldots,\alpha_n)\\ &=f(\alpha_1,\ldots,\alpha_i ,\ldots,\alpha_i+\alpha_j,\ldots,\alpha_n)\ \ \ [\mbox{Axiom (2)}]\\ &=-f(\alpha_1,\ldots,\alpha_i,\ldots,-\alpha_i-\alpha_j,\ldots,\alpha_n)\ \ \ [\mbox{Axiom (1)}]\\ &=-f(\alpha_1,\ldots,-\alpha_j,\ldots,-\alpha_i-\alpha_j,\ldots,\alpha_n)\ \ \ [\mbox{Axiom (2)}]\\ &=f(\alpha_1,\ldots,\alpha_j,\ldots,-\alpha_i-\alpha_j,\ldots,\alpha_n)\ \ \ [\mbox{Axiom (1)}]\\ &=f(\alpha_1,\ldots,\alpha_j,\ldots,-\alpha_i,\ldots,\alpha_n)\ \ \ [\mbox{Axiom (2)}]\\ &=-f(\alpha_1,\ldots,\alpha_j,\ldots,\alpha_i,\ldots,\alpha_n)\ \ \ [\mbox{Axiom (1)}].\end{aligned} $$

Now, note that the i-th and j-th rows (columns) are interchanged and the result is that the determinant is multiplied by − 1.

With the above six basic properties, we are in a position to evaluate most of the determinants.

Example 1.2.1

Evaluate the determinant of the matrix

Solution 1.2.1

Since this is a triangular matrix, its determinant will be the product of its diagonal elements. Proceeding step by step, take out 2 from the first row by using Axiom (1). Then − 1(1) + (2), −2(1) + (3), −3(1) + (4) ⇒. The result of these operations is the following:

Now, take out 5 from the second row and then perform 1(2) + (3) ⇒, the result being the following:

$$\displaystyle \begin{aligned}|A|=(2)(5)\left\vert\begin{matrix}1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&1&4\end{matrix}\right\vert\end{aligned}$$

The diagonal element in the third row is 1 and there is nothing to be taken out. Now − 1(3) + (4) ⇒ and then, after having taken out 4 from the fourth row, the result is

$$\displaystyle \begin{aligned}|A|=(2)(5)(1)(4)\left\vert\begin{matrix}1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1\end{matrix}\right\vert .\end{aligned}$$

Now, by Axiom (4) the determinant of the remaining identity matrix is 1. Therefore, the final solution is |A| = (2)(5)(1)(4) = 40.

Example 1.2.2

Evaluate the determinant of the following matrix:

Solution 1.2.2

Since the first row, first column element is the convenient number 1, we start operating with the first row. Otherwise, we bring a convenient number to the (1, 1)-th position by interchanging rows and columns (with each interchange, the determinant is to be multiplied by (−1)). Our aim will be to reduce the matrix to a triangular form so that the determinant is the product of the diagonal elements. By using the first row, let us wipe out the elements in the first column. The operations are − 2(1) + (3), −5(1) + (4) ⇒. Then

Now, by using the second row we want to wipe out the elements below the diagonal in the second column. But the first number is 3. One element in the third row can be wiped out by simply adding 1(2) + (3) ⇒. This brings the following:

If we take out 3 from the second row, it will bring in fractions. We will avoid fractions by multiplying the second row by 8 and the fourth row by 3. In order to preserve the value, we keep \(\frac {1}{(8)(3)}\) outside. Then, we add the second row to the fourth row, that is, (2) + (4) ⇒. The result of these operations is the following:

Now, multiply the third row by 41 and the fourth row by 7, and then perform − 1(3) + (4) ⇒. The result is the following:

Now, take the product of the diagonal elements. Then

$$\displaystyle \begin{aligned}|A|=\frac{(1)(24)(-287)(55)}{(8)(3)(7)(41)}=-55.\end{aligned}$$

Observe that we did not have to repeat the 4 × 4 determinant each time. After wiping out the first column elements, we could have expressed the determinant as follows because only the elements in the second row and second column onward would then have mattered. That is,

Similarly, after wiping out the second column elements, we could have written the resulting determinant as

and so on.

Example 1.2.3

Evaluate the determinant of a 2 × 2 general matrix.

Solution 1.2.3

A general 2 × 2 determinant can be opened up by using Axiom (3), that is,

$$\displaystyle \begin{aligned} |A|&=\left\vert\begin{matrix}a_{11}&a_{12}\\ a_{21}&a_{22}\end{matrix}\right\vert=\left\vert\begin{matrix}a_{11}&0\\ a_{21}&a_{22}\end{matrix}\right\vert+\left\vert\begin{matrix}0&a_{12}\\ a_{21}&a_{22}\end{matrix}\right\vert\ \ [\mbox{Axiom (3)}]\\ &=a_{11}\left\vert\begin{matrix}1&0\\ a_{21}&a_{22}\end{matrix}\right\vert+a_{12}\left\vert\begin{matrix}0&1\\ a_{21}&a_{22}\end{matrix}\right\vert\ \ [\mbox{Axiom (1)}].\end{aligned} $$

If any of a 11 or a 12 is zero, then the corresponding determinant is zero. In the second determinant on the right, interchange the second and first columns, which will bring a minus sign outside the determinant. That is,

$$\displaystyle \begin{aligned}|A|=a_{11}\left\vert\begin{matrix}1&0\\ a_{21}&a_{22}\end{matrix}\right\vert -a_{12}\left\vert\begin{matrix}1&0\\ a_{22}&a_{21}\end{matrix}\right\vert =a_{11}a_{22}-a_{12}a_{21}. \end{aligned}$$

The last step is done by using the property that the determinant of a triangular matrix is the product of the diagonal elements. We can also evaluate the determinant by using a number of different procedures. Taking out a 11 if a 11≠0,

$$\displaystyle \begin{aligned}|A|=a_{11}\left\vert\begin{matrix}1&\frac{a_{12}}{a_{11}}\\ a_{21}&a_{22}\end{matrix}\right\vert .\end{aligned}$$

Now, perform the operation − a 21(1) + (2) or − a 21 times the first row is added to the second row. Then,

$$\displaystyle \begin{aligned}|A|=a_{11}\left\vert\begin{matrix}1&\frac{a_{12}}{a_{11}}\\ 0&a_{22}-\frac{a_{12}a_{21}}{a_{11}}\end{matrix}\right\vert .\end{aligned}$$

Now, expanding by using a property of triangular matrices, we have

$$\displaystyle \begin{aligned}|A|=a_{11}(1)\big[a_{22}-\frac{a_{12}a_{21}}{a_{11}}\big]=a_{11}a_{22}-a_{12}a_{21}.{}\end{aligned} $$
(1.2.4)

Consider a general 3 × 3 determinant evaluated by using Axiom (3) first.

$$\displaystyle \begin{aligned} |A|&=\left\vert\begin{matrix}a_{11}&a_{12}&a_{13}\\ a_{21}&a_{22}&a_{23}\\ a_{31}&a_{32}&a_{33}\end{matrix}\right\vert \\ &=a_{11}\left\vert\begin{matrix}1&0&0\\ a_{21}&a_{22}&a_{23}\\ a_{31}&a_{32}&a_{33}\end{matrix}\right\vert +a_{12}\left\vert\begin{matrix}0&1&0\\ a_{21}&a_{22}&a_{23}\\ a_{31}&a_{32}&a_{33}\end{matrix}\right\vert +a_{13}\left\vert\begin{matrix}0&0&1\\ a_{21}&a_{22}&a_{23}\\ a_{31}&a_{32}&a_{33}\end{matrix}\right\vert\\ &=a_{11}\left\vert\begin{matrix}1&0&0\\ 0&a_{22}&a_{23}\\ 0&a_{32}&a_{33}\end{matrix}\right\vert +a_{12}\left\vert\begin{matrix}0&1&0\\ a_{21}&0&a_{23}\\ a_{31}&0&a_{33}\end{matrix}\right\vert+a_{13}\left\vert\begin{matrix}0&0&1\\ a_{21}&a_{22}&0\\ a_{31}&a_{32}&0\end{matrix}\right\vert.\end{aligned} $$

The first step consists in opening up the first row by making use of Axiom (3). Then, eliminate the elements in rows 2 and 3 within the column headed by 1. The next step is to bring the columns whose first element is 1 to the first column position by transpositions. The first matrix on the right-hand side is already in this format. One transposition is needed in the second matrix and two are required in the third matrix. After completing the transpositions, the next step consists in opening up each determinant along their second row and observing that the resulting matrices are lower triangular or can be made so after transposing their last two columns. The final result is then obtained. The last two steps are executed below:

$$\displaystyle \begin{aligned} |A|&=a_{11}\left\vert\begin{matrix}1&0&0\\ 0&a_{22}&a_{23}\\ 0&a_{32}&a_{33}\end{matrix}\right\vert -a_{12}\left\vert\begin{matrix}1&0&0\\ 0&a_{21}&a_{23}\\ 0&a_{31}&a_{33}\end{matrix}\right\vert+a_{13}\left\vert\begin{matrix}1&0&0\\ 0&a_{21}&a_{22}\\ 0&a_{31}&a_{32}\end{matrix}\right\vert \\ &=a_{11}[a_{22}a_{33}-a_{23}a_{32}]-a_{12}[a_{21}a_{33}-a_{23}a_{31}] +a_{13}[a_{21}a_{32}-a_{22}a_{31}]\\ &=a_{11}a_{22}a_{33}+a_{12}a_{23}a_{31}+a_{13}a_{21}a_{32} -a_{11}a_{23}a_{32}-a_{12}a_{21}a_{33}-a_{13}a_{22}a_{31}.{}\end{aligned} $$
(1.2.5)

A few observations are in order. Once 1 is brought to the first row first column position in every matrix and the remaining elements in this first column are eliminated, one can delete the first row and first column and take the determinant of the remaining submatrix because only those elements will enter into the remaining operations involving opening up the second and successive rows by making use of Axiom (3). Hence, we could have written

$$\displaystyle \begin{aligned}|A|=a_{11}\left\vert\begin{matrix}a_{22}&a_{23}\\ a_{32}&a_{33}\end{matrix}\right\vert -a_{12}\left\vert\begin{matrix} a_{21}&a_{23}\\ a_{31}&a_{33}\end{matrix}\right\vert +a_{13}\left\vert\begin{matrix}a_{21}&a_{22}\\ a_{31}&a_{32}\end{matrix}\right\vert .\end{aligned}$$

This step is also called the cofactor expansion of the determinant. In a general matrix \(A=(a_{ij})\), the cofactor of the element \(a_{ij}\) is equal to \((-1)^{i+j}M_{ij}\) where \(M_{ij}\) is the minor of \(a_{ij}\). This minor is obtained by deleting the i-th row and j-th column and then taking the determinant of the remaining elements. The second item to be noted from (1.2.5) is that, in the final expression for |A|, each term contains one and only one element from each row and each column of A. Some terms have plus signs in front of them and others have minus signs. For each term, write the first subscripts in the natural order 1, 2, 3 in the 3 × 3 case and, in the general n × n case, in the natural order 1, 2, …, n. Now, examine the second subscripts. Let the number of transpositions needed to bring the second subscripts into the natural order 1, 2, …, n be ρ. Then, that term is multiplied by \((-1)^{\rho}\), so that an even number of transpositions produces a plus sign and an odd number of transpositions brings in a minus sign; equivalently, if ρ is even, the coefficient is + 1 and if ρ is odd, the coefficient is − 1. This also enables us to open up a general determinant. This will be considered after pointing out one more property for the 3 × 3 case. The final representation in the 3 × 3 case in (1.2.5) can also be obtained by the following mechanical procedure. Write all the elements of the matrix A in the natural order. Then, augment this arrangement with the first two columns. This yields the following format:

$$\displaystyle \begin{aligned}\begin{matrix} a_{11}&a_{12}&a_{13}&a_{11}&a_{12}\\ a_{21}&a_{22}&a_{23}&a_{21}&a_{22}\\ a_{31}&a_{32}&a_{33}&a_{31}&\,\,\,a_{32}\ .\end{matrix}\end{aligned}$$

Now take the products of the elements along the diagonals going from the top left to the bottom right. These are the elements with the plus sign. Take the products of the elements in the second diagonals or the diagonals going from the bottom left to the top right. These are the elements with minus sign. As a result, |A| is as follows:

$$\displaystyle \begin{aligned} |A|&=[a_{11}a_{22}a_{33}+a_{12}a_{23}a_{31}+a_{13}a_{21}a_{32}]\\ &\ \ \ -[a_{13}a_{22}a_{31}+a_{11}a_{23}a_{32}+a_{12}a_{21}a_{33}].\end{aligned} $$

This mechanical procedure applies only in the 3 × 3 case. The general expansion is the following:

$$\displaystyle \begin{aligned}|A|=\left\vert\begin{matrix}a_{11}&a_{12}&\ldots&a_{1n}\\ a_{21}&a_{22}&\ldots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\ldots&a_{nn}\end{matrix}\right\vert=\sum_{i_1} \cdots\sum_{i_n}(-1)^{\rho(i_1,\ldots,i_n)}a_{1i_1}a_{2i_2}\cdots a_{ni_n} {}\end{aligned} $$
(1.2.6)

where ρ(i 1, …, i n) is the number of transpositions needed to bring the second subscripts i 1, …, i n into the natural order 1, 2, …, n.
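
The expansion (1.2.6) can be implemented directly, summing over all permutations of the second subscripts and counting transpositions (inversions) to determine the sign. The sketch below is a Python/NumPy illustration (an assumption, not part of the text); it is an O(n!) procedure meant only to mirror the formula, not to serve as an efficient algorithm.

```python
import numpy as np
from itertools import permutations

def sign(perm):
    """(-1)**rho, where rho is the number of inversions in the permutation."""
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_by_expansion(A):
    """Determinant via the permutation expansion (1.2.6)."""
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[r, p[r]] for r in range(n)])
               for p in permutations(range(n)))

A = np.array([[1., 2., 0.],
              [3., 1., 4.],
              [0., 2., 5.]])
print(det_by_expansion(A), np.linalg.det(A))   # both approximately -33
```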

The cofactor expansion of a general determinant is obtained as follows. Suppose that we open up an n × n determinant |A| along the i-th row using Axiom (3). Then, after taking out \(a_{i1}, a_{i2},\ldots,a_{in}\), we obtain n determinants where, in the first determinant, 1 occupies the (i, 1)-th position, in the second one, 1 is at the (i, 2)-th position, and so on, so that, in the j-th determinant, 1 occupies the (i, j)-th position. Using that i-th row, we can now eliminate all the other elements in the column containing this 1. We then bring this 1 to the first row, first column position by transpositions in each determinant. The number of transpositions needed to bring this 1 from the j-th position in the i-th row to the first position in the i-th row is j − 1. Then, to bring that 1 to the first row, first column position, another i − 1 transpositions are required, so that the total number of transpositions needed is (i − 1) + (j − 1) = i + j − 2. Hence, the multiplicative factor is \((-1)^{i+j-2}=(-1)^{i+j}\), and the expansion is as follows:

$$\displaystyle \begin{aligned} |A|&=(-1)^{i+1}a_{i1}M_{i1}+(-1)^{i+2}a_{i2}M_{i2}+\cdots +(-1)^{i+n}a_{in}M_{in}\\ &=a_{i1}C_{i1}+a_{i2}C_{i2}+\cdots +a_{in}C_{in}{}\end{aligned} $$
(1.2.7)

where \(C_{ij}=(-1)^{i+j}M_{ij}\), \(C_{ij}\) being the cofactor of \(a_{ij}\) and \(M_{ij}\) the minor of \(a_{ij}\), the minor being obtained by taking the determinant of the remaining elements after deleting the i-th row and j-th column of A. Moreover, if we expand along a certain row (column) but use the cofactors of some other row (column), then the result will be zero. That is,

$$\displaystyle \begin{aligned}0=a_{i1}C_{j1}+a_{i2}C_{j2}+\cdots +a_{in}C_{jn},\mbox{ for all }i\ne j.{} \end{aligned} $$
(1.2.8)
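
Both the cofactor expansion (1.2.7) and the orthogonality property (1.2.8) can be verified numerically; the helper functions minor and cofactor below are illustrative names in a NumPy sketch (0-based indices are used, which leaves the parity of i + j unchanged).

```python
import numpy as np

def minor(A, i, j):
    """Delete row i and column j, then take the determinant of what remains."""
    return np.linalg.det(np.delete(np.delete(A, i, axis=0), j, axis=1))

def cofactor(A, i, j):
    """C_ij = (-1)**(i+j) * M_ij."""
    return (-1) ** (i + j) * minor(A, i, j)

A = np.array([[1., 2., 0.],
              [3., 1., 4.],
              [0., 2., 5.]])
n = A.shape[0]

# Expansion (1.2.7) along the first row reproduces |A|
print(sum(A[0, j] * cofactor(A, 0, j) for j in range(n)), np.linalg.det(A))

# Property (1.2.8): the first row expanded with the cofactors of the second row gives 0
print(sum(A[0, j] * cofactor(A, 1, j) for j in range(n)))   # approximately 0
```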

Inverse of a Matrix. Regular inverses exist only for square matrices that are nonsingular. The standard notation for the regular inverse of a matrix A is \(A^{-1}\). It is defined by \(AA^{-1}=I_n\) and \(A^{-1}A=I_n\). The following properties can be deduced from the definition. First, we note that \(AA^{-1}=A^{0}=I=A^{-1}A\). When A and B are n × n nonsingular matrices, \((AB)^{-1}=B^{-1}A^{-1}\), which can be established by pre- or post-multiplying the right-hand side by AB. Accordingly, with \(A^m=A\times A\times\cdots\times A\), we have \(A^{-m}=A^{-1}\times\cdots\times A^{-1}=(A^m)^{-1},\ m=1,2,\ldots\), and when \(A_1,\ldots,A_k\) are n × n nonsingular matrices, \((A_1A_2\cdots A_k)^{-1}=A_k^{-1}A_{k-1}^{-1}\cdots A_2^{-1}A_1^{-1}\). We can also obtain a formula for the inverse of a nonsingular matrix A in terms of cofactors. Assume that \(A^{-1}\) exists and let \(\mathrm{Cof}(A)=(C_{ij})\) be the matrix of cofactors of A, that is, if \(A=(a_{ij})\) and \(C_{ij}\) is the cofactor of \(a_{ij}\), then \(\mathrm{Cof}(A)=(C_{ij})\). It follows from (1.2.7) and (1.2.8) that

$$\displaystyle \begin{aligned}A^{-1}=\frac{1}{|A|}(\mathrm{Cof}(A))'=\frac{1}{|A|}\left[\begin{matrix}C_{11}&\ldots&C_{1n}\\ \vdots&\ddots&\vdots\\ C_{n1}&\ldots&C_{nn}\end{matrix}\right]',{} \end{aligned} $$
(1.2.9)

that is, the transpose of the cofactor matrix divided by the determinant of A. What about \(A^{\frac {1}{2}}\)? For a scalar quantity a, we have the definition that if b exists such that b × b = a, then b is a square root of a. Consider the following 2 × 2 matrices:

Thus, if we use the definition \(B^2=A\) and claim that B is the square root of A, there are several candidates for B; this means that, in general, the square root of a matrix cannot be uniquely determined. However, if we restrict ourselves to the class of positive definite matrices, then a square root can be uniquely defined. The definiteness of matrices will be considered later.
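
Formula (1.2.9) itself is easy to test numerically. The sketch below (NumPy assumed; the matrix is an arbitrary nonsingular example, not from the text) builds the cofactor matrix, divides its transpose by the determinant, and compares the result with the built-in inverse.

```python
import numpy as np

def cofactor_matrix(A):
    """Matrix of cofactors, C_ij = (-1)**(i+j) |M_ij| (0-based indices)."""
    n = A.shape[0]
    C = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            M = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(M)
    return C

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])

A_inv = cofactor_matrix(A).T / np.linalg.det(A)   # Eq. (1.2.9)
print(np.allclose(A @ A_inv, np.eye(3)))          # True
print(np.allclose(A_inv, np.linalg.inv(A)))       # True
```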

1.2.1. Inverses by row operations or elementary operations

Basic elementary matrices are of two types. Let us call them the E-type and the F-type. An elementary matrix of the E-type is obtained by taking an identity matrix and multiplying any row (column) by a nonzero scalar. For example,

where E 1, E 2, E 3 are elementary matrices of the E-type obtained from the identity matrix I 3. If we pre-multiply an arbitrary matrix A with an elementary matrix of the E-type, then the same effect will be observed on the rows of the arbitrary matrix A. For example, consider a 3 × 3 matrix A = (a ij). Then, for example,

Thus, the same effect applies to the rows, that is, the second row is multiplied by (−2). Observe that E-type elementary matrices are always nonsingular and so, their regular inverses exist. For instance,

Observe that post-multiplication of an arbitrary matrix by an E-type elementary matrix will have the same effect on the columns of the arbitrary matrix. For example, AE 1 will have the same effect on the columns of A, that is, the second column of A is multiplied by − 2; AE 2 will result in the first column of A being multiplied by 5, and so on. The F-type elementary matrix is created by adding any particular row of an identity matrix to another one of its rows. For example, consider a 3 × 3 identity matrix I 3 and let

$$\displaystyle \begin{aligned}F_1=\left[\begin{matrix}1&0&0\\ 1&1&0\\ 0&0&1\end{matrix}\right], \ F_2=\left[\begin{matrix}1&0&0\\ 0&1&0\\ 1&0&1\end{matrix}\right],\ F_3=\left[\begin{matrix}1&0&0\\ 0&1&0\\ 0&1&1\end{matrix}\right], \end{aligned}$$

where F 1 is obtained by adding the first row to the second row of I 3; F 2 is obtained by adding the first row to the third row of I 3; and F 3 is obtained by adding the second row of I 3 to the third row. As well, F-type elementary matrices are nonsingular, and for instance,

where \(F_1F_1^{-1}=I_3, F_2^{-1}F_2=I_3 \) and \( F_3^{-1}F_3=I_3 \). If we pre-multiply an arbitrary matrix A by an F-type elementary matrix, then the same effect will be observed on the rows of A. For example,

$$\displaystyle \begin{aligned}F_1A=\left[\begin{matrix}1&0&0\\ 1&1&0\\ 0&0&1\end{matrix}\right]\left[\begin{matrix}a_{11}&a_{12}&a_{13}\\ a_{21}&a_{22}&a_{23}\\ a_{31}&a_{32}&a_{33}\end{matrix}\right]=\left[\begin{matrix}a_{11}&a_{12}&a_{13}\\ a_{21}+a_{11}&a_{22}+a_{12}&a_{23}+a_{13}\\ a_{31}&a_{32}&a_{33}\end{matrix}\right]. \end{aligned}$$

Thus, the same effect applies to the rows, namely, the first row is added to the second row in A (as F 1 was obtained by adding the first row of I 3 to the second row of I 3). The reader may verify that F 2 A has the effect of the first row being added to the third row and F 3 A will have the effect of the second row being added to the third row. By combining E- and F-type elementary matrices, we end up with a G-type matrix wherein a multiple of any particular row of an identity matrix is added to another one of its rows. For example, letting

it is seen that \(G_1\) is obtained by adding 5 times the first row to the second row in \(I_3\), and \(G_2\) is obtained by adding − 2 times the first row to the third row in \(I_3\). Pre-multiplication of an arbitrary matrix A by \(G_1\), that is, \(G_1A\), will have the effect that 5 times the first row of A will be added to its second row. Similarly, \(G_2\) will have the effect that − 2 times the first row of A will be added to its third row. Being products of E- and F-type elementary matrices, G-type matrices are also nonsingular. We also have the result that if A, B, C are n × n matrices with A nonsingular, then B = C if and only if AB = AC. In general, if \(A_1,\ldots,A_k\) are n × n nonsingular matrices, we have

$$\displaystyle \begin{aligned} B=C&\Rightarrow A_kA_{k-1}\cdots A_2A_1B=A_kA_{k-1}\cdots A_2A_1C;\\ &\Rightarrow A_1A_2B=A_1(A_2B)=(A_1A_2)B=(A_1A_2)C=A_1(A_2C).\end{aligned} $$
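
The effect of pre- and post-multiplication by elementary matrices can be observed directly; the sketch below (NumPy assumed for illustration) uses matrices patterned on the \(E_1\), \(F_1\) and \(G_1\) described in the text.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

E1 = np.diag([1., -2., 1.])      # E-type: second row of I3 multiplied by -2
F1 = np.eye(3); F1[1, 0] = 1.0   # F-type: first row of I3 added to the second row
G1 = np.eye(3); G1[1, 0] = 5.0   # G-type: 5 times the first row added to the second row

print(E1 @ A)             # second row of A multiplied by -2
print(A @ E1)             # post-multiplication: second column of A multiplied by -2
print(F1 @ A)             # first row of A added to its second row
print(G1 @ A)             # 5 times the first row of A added to its second row
print(np.linalg.inv(G1))  # elementary matrices are nonsingular
```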

We will evaluate the inverse of a nonsingular square matrix by making use of elementary matrices. The procedure will also verify whether a regular inverse exists for a given matrix. If a regular inverse for a square matrix A exists, then AA −1 = I. We can pre- or post-multiply A by elementary matrices. For example,

$$\displaystyle \begin{aligned} AA^{-1}=I&\Rightarrow E_kF_r\cdots E_1F_1AA^{-1}=E_kF_r\cdots E_1F_1I\\ &\Rightarrow (E_k\cdots F_1A)A^{-1}=(E_k\cdots F_1).\end{aligned} $$

Thus, if the operations \(E_k\cdots F_1\) on A reduce A to an identity matrix, then \(A^{-1}\) is \(E_k\cdots F_1\). If an inconsistency occurs during the process, we can conclude that there is no inverse for A. Hence, our aim in performing the elementary operations on the left of A is to reduce it to an identity matrix, in which case the product of the elementary matrices on the right-hand side of the last equation will produce the inverse of A.
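
The whole procedure can be condensed into a short routine that carries the configuration [A | I] and applies elementary row operations until the left block becomes the identity. This is a sketch only (NumPy assumed; the function name and the pivoting by row interchange are illustrative choices, not the text's worked procedure).

```python
import numpy as np

def inverse_by_row_ops(A):
    """Reduce [A | I] to [I | A^{-1}] by elementary row operations."""
    A = A.astype(float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])              # the augmented configuration [A | I]
    for col in range(n):
        # bring a nonzero element to the (col, col) position (row interchange if needed)
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("matrix is singular; no regular inverse exists")
        M[[col, pivot]] = M[[pivot, col]]
        M[col] /= M[col, col]                  # E-type step: scale the pivot row
        for r in range(n):                     # G-type steps: wipe out the other entries
            if r != col:
                M[r] -= M[r, col] * M[col]
    return M[:, n:]

A = np.array([[2., 1.], [1., 1.]])
print(inverse_by_row_ops(A))                             # [[ 1. -1.] [-1.  2.]]
print(np.allclose(inverse_by_row_ops(A) @ A, np.eye(2)))  # True
```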

Example 1.2.4

Evaluate A −1 if it exists, where

Solution 1.2.4

If A −1 exists then AA −1 = I which means

This is our starting equation. Only the configuration of the elements matters; the matrix notation and the symbol \(A^{-1}\) can be disregarded. Hence, we consider only the configuration of the numbers of the matrix A on the left and the numbers in the identity matrix on the right. We then pre-multiply both A and the identity matrix by elementary matrices only. In the first set of steps, our aim consists in reducing every element in the first column of A to zero, except the first one, by using only the first row. For each elementary operation on A, the same elementary operation is performed on the identity matrix as well. Then, utilizing the second row of the resulting A, we reduce all the elements in the second column of A to zeros except the second one, and we continue in this manner until all the elements in the last column except the last one are reduced to zeros by making use of the last row, thus reducing A to an identity matrix, provided of course that A is nonsingular. In our example, the elements in the first column can be made equal to zeros by applying the following operations. We will employ the following standard notation: a(i) + (j) ⇒ means that a times the i-th row is added to the j-th row, giving the result. Consider (1) + (2); − 2(1) + (3); − 1(1) + (4) ⇒ (that is, the first row is added to the second row; then − 2 times the first row is added to the third row; then − 1 times the first row is added to the fourth row), the net result being

Now, start with the second row of the resulting A and the resulting identity matrix and try to eliminate all the other elements in the second column of the resulting A. This can be achieved by performing the following operations: (2) + (3);  − 1(2) + (1) ⇒

Now, start with the third row and eliminate all other elements in the third column. This can be achieved by the following operations. Writing the row used in the operations (the third one in this case) within the first set of parentheses for each operation, we have 2(3) + (4);  − 2(3) + (2); (3) + (1) ⇒

Divide the 4th row by 2 and then perform the following operations: \(\frac {1}{2}(4); \ -1(4)+(3);\ (4)+(2);\ -1(4)+(1)\Rightarrow \)

Thus,

This result should be verified to ensure that it is free of computational errors. Since

the result is indeed correct.

Example 1.2.5

Evaluate A −1 if it exists where

Solution 1.2.5

If A −1 exists, then AA −1 = I 3. Write

Starting with the first row, eliminate all other elements in the first column with the following operations: − 1(1) + (2);  − 2(1) + (3) ⇒

The second and third rows on the left side being identical, the left-hand side matrix is singular, which means that A is singular. Thus, the inverse of A does not exist in this case.

1.3. Determinants of Partitioned Matrices

Consider a matrix A written in the following format:

$$\displaystyle \begin{aligned}A=\left[\begin{matrix}A_{11}&A_{12}\\ A_{21}&A_{22}\end{matrix}\right]\mbox{ where }A_{11},A_{12},A_{21},A_{22}\mbox{ are submatrices}.\end{aligned}$$

For example,

$$\displaystyle \begin{aligned}A=\left[\begin{matrix}a_{11}&a_{12}&a_{13}\\ a_{21}&a_{22}&a_{23}\\ a_{31}&a_{32}&a_{33}\end{matrix}\right]=\left[\begin{matrix}A_{11}&A_{12}\\ A_{21}&A_{22}\end{matrix}\right] ,\ A_{11}=[a_{11}],\ A_{12}=[a_{12}~a_{13}],\ A_{21}=\left[\begin{matrix}a_{21}\\ a_{31}\end{matrix}\right]\end{aligned}$$

and \(A_{22}=\left [\begin {matrix}a_{22}&a_{23}\\ a_{32}&a_{33}\end {matrix}\right ]\). The above is a 2 × 2 partitioning or a partitioning into two sub-matrices by two sub-matrices. But a 2 × 2 partitioning is not unique. We may also consider

$$\displaystyle \begin{aligned}A_{11}=\left[\begin{matrix}a_{11}&a_{12}\\ a_{21}&a_{22}\end{matrix}\right],\ A_{22}=[a_{33}],\ A_{12}=\left[\begin{matrix}a_{13}\\ a_{23}\end{matrix}\right],\ A_{21}=[a_{31}~a_{32}],\end{aligned}$$

which is another 2 × 2 partitioning of A. We can also have a 1 × 2 or 2 × 1 partitioning into sub-matrices. We may observe one interesting property. Consider a block diagonal matrix. Let

$$\displaystyle \begin{aligned}A=\left[\begin{matrix}A_{11}&O\\ O&A_{22}\end{matrix}\right]\ \Rightarrow\ |A|=\left\vert\begin{matrix}A_{11}&O\\ O&A_{22}\end{matrix}\right\vert,\end{aligned}$$

where A 11 is r × r, A 22 is s × s, r + s = n and O indicates a null matrix. Observe that when we evaluate the determinant, all the operations on the first r rows will produce the determinant of A 11 as a coefficient, without affecting A 22, leaving an r × r identity matrix in the place of A 11. Similarly, all the operations on the last s rows will produce the determinant of A 22 as a coefficient, leaving an s × s identity matrix in place of A 22. In other words, for a diagonal block matrix whose diagonal blocks are A 11 and A 22,

$$\displaystyle \begin{aligned}|A|=|A_{11}|\times |A_{22}|.{} \end{aligned} $$
(1.3.1)

A triangular block matrix, be it lower or upper triangular, also has as its determinant the product of the determinants of its diagonal blocks. For example, consider

$$\displaystyle \begin{aligned}A=\left[\begin{matrix}A_{11}&A_{12}\\ O&A_{22}\end{matrix}\right]\ \Rightarrow\ |A|=\left\vert\begin{matrix}A_{11}&A_{12}\\ O&A_{22}\end{matrix}\right\vert.\end{aligned}$$

By using A 22, we can eliminate A 12 without affecting A 11 and hence, we can reduce the matrix of the determinant to a diagonal block form without affecting the value of the determinant. Accordingly, the determinant of an upper or lower triangular block matrix whose diagonal blocks are A 11 and A 22, is

$$\displaystyle \begin{aligned}|A|=|A_{11}|~|A_{22}|.{} \end{aligned} $$
(1.3.2)

Partitioning is done to accommodate further operations such as matrix multiplication. Let A and B be two matrices whose product AB is defined. Suppose that we consider a 2 × 2 partitioning of A and B into sub-matrices; if the multiplication is performed treating the sub-matrices as if they were scalar quantities, the following format is obtained:

$$\displaystyle \begin{aligned} AB&=\left[\begin{matrix}A_{11}&A_{12}\\ A_{21}&A_{22}\end{matrix}\right]\left[\begin{matrix}B_{11}&B_{12}\\ B_{21}&B_{22}\end{matrix}\right]\\ &=\left[\begin{matrix}A_{11}B_{11}+A_{12}B_{21}&A_{11}B_{12}+A_{12}B_{22}\\ A_{21}B_{11}+A_{22}B_{21}&A_{21}B_{12}+A_{22}B_{22} \end{matrix}\right]\ \ .\end{aligned} $$

If all the products of sub-matrices on the right-hand side are defined, then we say that A and B are conformably partitioned for the product AB. Let A be an n × n matrix whose determinant is defined. Let us consider the 2 × 2 partitioning

$$\displaystyle \begin{aligned}A=\left[\begin{matrix}A_{11}&A_{12}\\ A_{21}&A_{22}\end{matrix}\right]\mbox{ where }A_{11}\ \mbox{is}\ r\times r,\ A_{22}\mbox{ is }s\times s,\ r+s=n. \end{aligned}$$

Then, A 12 is r × s and A 21 is s × r. In this case, the first row block is [A 11 A 12] and the second row block is [A 21 A 22]. When evaluating a determinant, we can add a linear function of the rows to any other row, or a linear function of a block of rows to another block of rows, without affecting the value of the determinant. What sort of linear function of the first row block could be added to the second row block so that a null matrix O appears in the position of A 21? It is \(-A_{21}A_{11}^{-1}\) times the first row block. Then, we have

$$\displaystyle \begin{aligned}|A|=\left\vert\begin{matrix}A_{11}&A_{12}\\ A_{21}&A_{22}\end{matrix}\right\vert =\left\vert\begin{matrix} A_{11}&A_{12}\\ O&A_{22}-A_{21}A_{11}^{-1}A_{12}\end{matrix}\right\vert .\end{aligned}$$

This is a triangular block matrix and hence its determinant is the product of the determinants of the diagonal blocks. That is,

$$\displaystyle \begin{aligned}|A|=|A_{11}|~|A_{22}-A_{21}A_{11}^{-1}A_{12}|,\ |A_{11}|\ne 0.\end{aligned}$$

From symmetry, it follows that

$$\displaystyle \begin{aligned} |A| &=|A_{22}|~|A_{11}-A_{12}A_{22}^{-1}A_{21}|,\ |A_{22}|\ne 0.{}\end{aligned} $$
(1.3.3)
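As a numerical aside, formulas (1.3.1) and (1.3.3) can be checked directly; the sketch below (numpy) uses an arbitrary matrix and arbitrary partition sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 2                                  # arbitrary sizes: A is n x n, A11 is r x r
A = rng.standard_normal((n, n))
A11, A12 = A[:r, :r], A[:r, r:]
A21, A22 = A[r:, :r], A[r:, r:]

# |A| = |A11| |A22 - A21 A11^{-1} A12|, the companion form of Eq. (1.3.3)
lhs = np.linalg.det(A)
rhs = np.linalg.det(A11) * np.linalg.det(A22 - A21 @ np.linalg.inv(A11) @ A12)
print(np.isclose(lhs, rhs))                  # True

# Eq. (1.3.1): a block diagonal matrix has determinant |A11| |A22|
D = np.block([[A11, np.zeros((r, n - r))],
              [np.zeros((n - r, r)), A22]])
print(np.isclose(np.linalg.det(D), np.linalg.det(A11) * np.linalg.det(A22)))  # True
```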

Let us now examine the inverses of partitioned matrices. Let A and A −1 be conformably partitioned for the product AA −1. Consider a 2 × 2 partitioning of both A and A −1. Let

$$\displaystyle \begin{aligned}A=\left[\begin{matrix}A_{11}&A_{12}\\ A_{21}&A_{22}\end{matrix}\right],\ \ A^{-1}=\left[\begin{matrix}A^{11}&A^{12}\\ A^{21}&A^{22}\end{matrix}\right],\end{aligned}$$

where \(A_{11}\) and \(A^{11}\) are r × r and \(A_{22}\) and \(A^{22}\) are s × s with r + s = n, A being n × n and nonsingular. AA −1 = I gives the following:

$$\displaystyle \begin{aligned}\left[\begin{matrix}A_{11}&A_{12}\\ A_{21}&A_{22}\end{matrix}\right]~\left[\begin{matrix}A^{11}&A^{12}\\ A^{21}&A^{22}\end{matrix}\right]=\left[\begin{matrix}I_r&O\\ O&I_s\end{matrix}\right].\end{aligned}$$

That is,

$$\displaystyle \begin{aligned} A_{11}A^{11}+A_{12}A^{21}&=I_r \end{aligned} $$
(i)
$$\displaystyle \begin{aligned} A_{11}A^{12}+A_{12}A^{22}&=O \end{aligned} $$
(ii)
$$\displaystyle \begin{aligned} A_{21}A^{11}+A_{22}A^{21}&=O \end{aligned} $$
(iii)
$$\displaystyle \begin{aligned} A_{21}A^{12}+A_{22}A^{22}&=I_s.\end{aligned} $$
(iv)

From (ii), \(A^{12}=-A_{11}^{-1}A_{12}A^{22}\). Substituting in (iv),

$$\displaystyle \begin{aligned}A_{21}[-A_{11}^{-1}A_{12}A^{22}]+A_{22}A^{22}=I_s\Rightarrow [A_{22}-A_{21}A_{11}^{-1}A_{12}]A^{22}=I_s. \end{aligned}$$

That is,

$$\displaystyle \begin{aligned}A^{22}=(A_{22}-A_{21}A_{11}^{-1}A_{12})^{-1}, |A_{11}|\ne 0,{} \end{aligned} $$
(1.3.4)

and, from symmetry, it follows that

$$\displaystyle \begin{aligned} A^{11}&=(A_{11}-A_{12}A_{22}^{-1}A_{21})^{-1},|A_{22}|\ne 0{} \end{aligned} $$
(1.3.5)
$$\displaystyle \begin{aligned} A_{11}&=(A^{11}-A^{12}(A^{22})^{-1}A^{21})^{-1},|A^{22}|\ne 0{} \end{aligned} $$
(1.3.6)
$$\displaystyle \begin{aligned} A_{22}&=(A^{22}-A^{21}(A^{11})^{-1}A^{12})^{-1},|A^{11}|\ne 0.{}\end{aligned} $$
(1.3.7)

The rectangular components \(A^{12},\ A^{21},\ A_{12},\ A_{21}\) can also be evaluated in terms of the sub-matrices by making use of Eqs. (i)–(iv).
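These block formulas can be confirmed numerically as well. The following sketch (numpy, with an arbitrary well-conditioned test matrix) compares the corner blocks of A −1 with the inverses of the corresponding Schur complements in (1.3.4) and (1.3.5):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 5, 2
A = rng.standard_normal((n, n)) + 5 * np.eye(n)            # arbitrary nonsingular test matrix
A11, A12, A21, A22 = A[:r, :r], A[:r, r:], A[r:, :r], A[r:, r:]

Ainv = np.linalg.inv(A)

# Eq. (1.3.4): the lower right block of A^{-1} equals (A22 - A21 A11^{-1} A12)^{-1}
schur22 = A22 - A21 @ np.linalg.inv(A11) @ A12
print(np.allclose(Ainv[r:, r:], np.linalg.inv(schur22)))   # True

# Eq. (1.3.5): the upper left block of A^{-1} equals (A11 - A12 A22^{-1} A21)^{-1}
schur11 = A11 - A12 @ np.linalg.inv(A22) @ A21
print(np.allclose(Ainv[:r, :r], np.linalg.inv(schur11)))   # True
```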

1.4. Eigenvalues and Eigenvectors

Let A be an n × n matrix, X be an n × 1 vector, and λ be a scalar quantity. Consider the equation

$$\displaystyle \begin{aligned}AX=\lambda X\Rightarrow (A-\lambda I)X=O.\end{aligned}$$

Observe that X = O is always a solution. If this equation has a non-null vector X as a solution, then the determinant of the coefficient matrix must be zero because this matrix must be singular. If the matrix (A − λI) were nonsingular, its inverse (A − λI)−1 would exist and then, on pre-multiplying (A − λI)X = O by (A − λI)−1, we would have X = O, which is inadmissible since X ≠ O. That is,

$$\displaystyle \begin{aligned}|A-\lambda I|=0,\ \lambda \mbox{ being a scalar quantity}.{}\end{aligned} $$
(1.4.1)

Since the matrix A is n × n, equation (1.4.1) has n roots, which will be denoted by λ 1, …, λ n. Then

$$\displaystyle \begin{aligned}|A-\lambda I|=(\lambda_1-\lambda)(\lambda_2-\lambda)\cdots (\lambda_n-\lambda),\ AX_j=\lambda_jX_j. \end{aligned}$$

Then, λ 1, …, λ n are called the eigenvalues of A and X j ≠ O is an eigenvector corresponding to the eigenvalue λ j.

Example 1.4.1

Compute the eigenvalues and eigenvectors of the matrix \(A=\left [\begin {matrix}1&1\\ 1&2\end {matrix}\right ]\).

Solution 1.4.1

Consider the equation

$$\displaystyle \begin{aligned} |A-\lambda I|&=0\Rightarrow\left\vert\left[\begin{matrix}1&1\\ 1&2\end{matrix}\right]-\lambda\left[\begin{matrix}1&0\\ 0&1\end{matrix}\right]\right\vert=0\Rightarrow\\ \left\vert\begin{matrix}1-\lambda&1\\ 1&2-\lambda\end{matrix}\right\vert&=0\Rightarrow (1-\lambda)(2-\lambda)-1=0\Rightarrow \lambda^2-3\lambda+1=0\Rightarrow\\ \lambda &=\frac{3\pm \sqrt{(9-4)}}{2}\Rightarrow \lambda_1=\frac{3}{2}+\frac{\sqrt{5}}{2},\ \lambda_2=\frac{3}{2}-\frac{\sqrt{5}}{2}.\end{aligned} $$

An eigenvector X 1 corresponding to \(\lambda _1=\frac {3}{2}+\frac {\sqrt {5}}{2}\) is given by AX 1 = λ 1 X 1 or (A − λ 1 I)X 1 = O. That is,

$$\displaystyle \begin{aligned} \left[\begin{matrix}1-(\frac{3}{2}+\frac{\sqrt{5}}{2})&1\\ 1&2-(\frac{3}{2}+\frac{\sqrt{5}}{2})\end{matrix}\right]\left[\begin{matrix}x_1\\ x_2\end{matrix}\right]&=\left[\begin{matrix}0\\ 0\end{matrix}\right]\Rightarrow\\ \Big(\!\!-\frac{1}{2}-\frac{\sqrt{5}}{2}\Big)x_1+x_2&=0\, , \end{aligned} $$
(i)
$$\displaystyle \begin{aligned} x_1+ \Big(\frac{1}{2}-\frac{\sqrt{5}}{2} \Big)x_2&=0.\end{aligned} $$
(ii)

Since A − λ 1 I is singular, both (i) and (ii) must give the same solution. Letting x 2 = 1 in (ii), \(x_1=-\frac {1}{2}+\frac {\sqrt {5}}{2}\). Thus, one solution X 1 is

$$\displaystyle \begin{aligned}X_1=\left[\begin{matrix}-\frac{1}{2}+\frac{\sqrt{5}}{2}\\ 1\end{matrix}\right].\end{aligned}$$

Any nonzero constant multiple of X 1 is also a solution to (A − λ 1 I)X 1 = O. An eigenvector X 2 corresponding to the eigenvalue λ 2 is given by (A − λ 2 I)X 2 = O. That is,

$$\displaystyle \begin{aligned} \Big(\!\!-\frac{1}{2}+\frac{\sqrt{5}}{2}\, \Big)x_1+x_2&=0, \end{aligned} $$
(iii)
$$\displaystyle \begin{aligned} x_1+ \Big(\frac{1}{2}+\frac{\sqrt{5}}{2}\, \Big)x_2&=0.\end{aligned} $$
(iv)

Hence, one solution is

$$\displaystyle \begin{aligned}X_2=\left[\begin{matrix}-\frac{1}{2}-\frac{\sqrt{5}}{2}\\ 1\end{matrix}\right].\end{aligned}$$

Any nonzero constant multiple of X 2 is also an eigenvector corresponding to λ 2.
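Such hand computations are easily checked by machine. The minimal sketch below (numpy) verifies Solution 1.4.1 and, incidentally, that the determinant and the trace of A equal the product and the sum of its eigenvalues, as stated in properties (15) and (16) listed in the next paragraphs:

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 2.]])
eigvals, eigvecs = np.linalg.eig(A)

# The eigenvalues (3 +/- sqrt(5))/2 obtained in Solution 1.4.1
print(np.sort(eigvals))                            # approximately [0.382, 2.618]
print((3 - np.sqrt(5)) / 2, (3 + np.sqrt(5)) / 2)  # the same values

# Each column of eigvecs satisfies A X_j = lambda_j X_j (up to a constant multiple)
for lam, X in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ X, lam * X))             # prints True for each eigenpair

# Determinant = product of the eigenvalues, trace = sum of the eigenvalues
print(np.isclose(np.linalg.det(A), np.prod(eigvals)))  # True
print(np.isclose(np.trace(A), np.sum(eigvals)))        # True
```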

Even if all the elements of a matrix A are real, its eigenvalues can be positive, negative, zero, irrational or complex. Since the characteristic polynomial has real coefficients, complex roots appear in conjugate pairs: if \(a+ib,\ i=\sqrt {(-1)},\) with a and b real, is an eigenvalue, then a − ib is also an eigenvalue of the same matrix. The following properties can be deduced from the definition:

(1): The eigenvalues of a diagonal matrix are its diagonal elements;

(2): The eigenvalues of a triangular (lower or upper) matrix are its diagonal elements;

(3): If any eigenvalue is zero, then the matrix is singular and its determinant is zero;

(4): If λ is an eigenvalue of A and if A is nonsingular, then \(\frac {1}{\lambda }\) is an eigenvalue of A −1;

(5): If λ is an eigenvalue of A, then λ k is an eigenvalue of A k, k = 1, 2, …, their associated eigenvector being the same;

(7): The eigenvalues of an identity matrix are unities; however, the converse need not be true;

(8): The eigenvalues of a scalar matrix with diagonal elements a, a, …, a are a repeated n times when the order of A is n; however, the converse need not be true;

(9): The eigenvalues of an orthonormal matrix, AA′ = I, A′A = I, are of absolute value 1, the real ones being ± 1; however, the converse need not be true;

(10): The eigenvalues of an idempotent matrix, A = A 2, are ones and zeros; however, the converse need not be true. The only nonsingular idempotent matrix is the identity matrix;

(11): At least one of the eigenvalues of a nilpotent matrix of order r, that is, A ≠ O, …, A r−1 ≠ O, A r = O, is null;

(12): For an n × n matrix, both A and A′ have the same eigenvalues;

(13): The eigenvalues of a symmetric matrix are real;

(14): The eigenvalues of a skew symmetric matrix can only be zeros and purely imaginary numbers;

(15): The determinant of A is the product of its eigenvalues: |A| = λ 1 λ 2λ n ;

(16): The trace of a square matrix is equal to the sum of its eigenvalues;

(17): If A = A′ (symmetric), then the eigenvectors corresponding to distinct eigenvalues are orthogonal;

(18): If A = A′ and A is n × n, then there exists a full set of n eigenvectors which are linearly independent, even if some eigenvalues are repeated.

Result (16) requires some explanation. We have already derived the following two results:

$$\displaystyle \begin{aligned}|A|=\sum_{i_1}\cdots\sum_{i_n}(-1)^{\rho}a_{1i_1}a_{2i_2}\cdots a_{ni_n}\end{aligned} $$
(v)

and

$$\displaystyle \begin{aligned}|A-\lambda I|=(\lambda_1-\lambda)(\lambda_2-\lambda)\cdots (\lambda_n-\lambda).\end{aligned} $$
(vi)

Equation (vi) yields a polynomial of degree n in λ where λ is a variable. When λ = 0, we have |A| = λ 1 λ 2⋯λ n, the product of the eigenvalues. When writing |A − λI| in the format of equation (v), the term containing (−1)n λ n can only come from the product (a 11 − λ)(a 22 − λ)⋯(a nn − λ) (refer to the explicit form for the 3 × 3 case discussed earlier); every other term in the expansion omits at least two factors containing λ and hence contains λ at most to the power n − 2. Consequently, both (−1)n λ n and (−1)n−1 λ n−1 can only come from the product (a 11 − λ)⋯(a nn − λ). From (v), the coefficient of (−1)n−1 λ n−1 is a 11 + a 22 + ⋯ + a nn = tr(A) and, from (vi), the coefficient of (−1)n−1 λ n−1 is λ 1 + ⋯ + λ n. Hence tr(A) = λ 1 + ⋯ + λ n =  sum of the eigenvalues of A. This does not mean that λ 1 = a 11, λ 2 = a 22, …, λ n = a nn, only that the sums are equal.

Matrices in the Complex Domain When the elements of A = (a ij) can also be complex quantities, a typical element of A will be of the form \(a+ib,\ i=\sqrt {(-1)}, \) with a and b real. The complex conjugate of A will be denoted by \(\bar {A}\) and the conjugate transpose will be denoted by \(A^{*}\). Then, for example,

$$\displaystyle \begin{aligned}A=\left[\begin{matrix}1+i&2i&3-i\\ 4i&5&1+i\\ 2-i&i&3+i\end{matrix}\right]\Rightarrow\bar{A}=\left[\begin{matrix}1-i&-2i&3+i\\ -4i&5&1-i\\ 2+i&-i&3-i\end{matrix}\right],\ A^{*}=\left[\begin{matrix}1-i&-4i&2+i\\ -2i&5&-i\\ 3+i&1-i&3-i\end{matrix}\right].\end{aligned}$$

Thus, we can also write \(A^{*}=(\bar {A})'=\bar {(A')}\). When a matrix A is in the complex domain, we may write it as A = A 1 + iA 2 where A 1 and A 2 are real matrices. Then \(\bar {A}=A_1-iA_2\) and \(A^{*}=A_1^{\prime }-iA_2^{\prime }\). In the above example,

$$\displaystyle \begin{aligned}A_1=\left[\begin{matrix}1&0&3\\ 0&5&1\\ 2&0&3\end{matrix}\right]\ \ \mathrm{and}\ \ A_2=\left[\begin{matrix}1&2&-1\\ 4&0&1\\ -1&1&1\end{matrix}\right].\end{aligned}$$

A Hermitian Matrix If \(A=A^{*}\), then A is called a Hermitian matrix. In the representation A = A 1 + iA 2, if \(A=A^{*}\), then \(A_1=A_1^{\prime }\) or A 1 is real symmetric, and \(A_2=-A_2^{\prime }\) or A 2 is real skew symmetric. Note that when X is an n × 1 vector, \(X^{*}X\) is real. Let

$$\displaystyle \begin{aligned} X&=\left[\begin{matrix}2\\ 3+i\\ 2i\end{matrix}\right]\Rightarrow \bar{X}=\left[\begin{matrix}2~~\\ 3-i~~\\ -2i\end{matrix}\right], \ X^{*}=[2~3-i~-2i],\\ X^{*}X&=[2~\ 3-i~\ -2i]\left[\begin{matrix}2\\ 3+i\\ 2i\end{matrix}\right]=(2^2+0^2)+(3^2+1^2)+(0^2+(-2)^2)=18.\end{aligned} $$

Consider the eigenvalues of a Hermitian matrix A, which are the solutions of |A − λI| = 0. As in the real case, λ may a priori be real, positive, negative, zero, irrational or complex. Then, for X ≠ O,

$$\displaystyle \begin{aligned} AX&=\lambda X\Rightarrow \end{aligned} $$
(i)
$$\displaystyle \begin{aligned} X^{*}A^{*}&=\bar{\lambda}X^{*}\end{aligned} $$
(ii)

by taking the conjugate transpose. Since λ is a scalar, its conjugate transpose is \(\bar {\lambda }\). Pre-multiply (i) by \(X^{*}\) and post-multiply (ii) by X. Then, for X ≠ O, we have

$$\displaystyle \begin{aligned} X^{*}AX&=\lambda X^{*}X \end{aligned} $$
(iii)
$$\displaystyle \begin{aligned} X^{*}A^{*}X&=\bar{\lambda}X^{*}X.\end{aligned} $$
(iv)

When A is Hermitian, \(A=A^{*}\), and so the left-hand sides of (iii) and (iv) are the same. On subtracting (iv) from (iii), we have \(0=(\lambda -\bar {\lambda })X^{*}X\) where \(X^{*}X\) is real and positive, and hence \(\lambda -\bar {\lambda }=0\), which means that the imaginary part is zero or λ is real. If A is skew Hermitian, then we end up with \(\lambda +\bar {\lambda }=0\Rightarrow \lambda \) is zero or purely imaginary. The above procedure also holds for matrices in the real domain. Thus, in addition to properties (13) and (14), we have the following properties:

(19) The eigenvalues of a Hermitian matrix are real; however, the converse need not be true;

(20) The eigenvalues of a skew Hermitian matrix are zero or purely imaginary; however, the converse need not be true.
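Properties (19) and (20) are easy to illustrate numerically; in the sketch below (numpy), an arbitrary complex matrix is used to build a Hermitian and a skew Hermitian matrix whose eigenvalues are then inspected:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

H = B + B.conj().T          # Hermitian: H = H*
S = B - B.conj().T          # skew Hermitian: S = -S*

print(np.allclose(np.linalg.eigvals(H).imag, 0))  # eigenvalues of H are real
print(np.allclose(np.linalg.eigvals(S).real, 0))  # eigenvalues of S are purely imaginary
```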

1.5. Definiteness of Matrices, Quadratic and Hermitian Forms

Let X be an n × 1 vector of real scalar variables x 1, …, x n so that X′ = (x 1, …, x n). Let A = (a ij) be a real n × n matrix. Then, all the terms of the quadratic form u = X′AX are of degree 2. One can always replace A by an equivalent symmetric matrix when A is the matrix of a quadratic form. Hence, without any loss of generality, we may assume A = A′ (symmetric) when A appears in a quadratic form u = X′AX. Definiteness of a quadratic form and definiteness of a matrix are only defined for A = A′ (symmetric) in the real domain and for \(A=A^{*}\) (Hermitian) in the complex domain. Hence, the basic starting condition is that either A = A′ or \(A=A^{*}\). If X′AX > 0 for all non-null X, that is, X ≠ O, with A = A′, then A is said to be a positive definite matrix and X′AX > 0 is called a positive definite quadratic form. If \(X^{*}AX>0\) for all non-null X, with \(A=A^{*}\), then A is referred to as a Hermitian positive definite matrix and the corresponding Hermitian form \(X^{*}AX>0\), as Hermitian positive definite. Similarly, if for all non-null X, X′AX ≥ 0 or \(X^{*}AX\ge 0\), then A is positive semi-definite or Hermitian positive semi-definite; if for all non-null X, X′AX < 0 or \(X^{*}AX<0\), then A is negative definite; and if X′AX ≤ 0 or \(X^{*}AX\le 0\), then A is negative semi-definite. The standard notations being utilized are as follows:

A > O (A and X′AX > 0 are real positive definite; O is a capital o, not zero)

A ≥ O (A and X′AX ≥ 0 are positive semi-definite)

A < O (A and X′AX < 0 are negative definite)

A ≤ O (A and X′AX ≤ 0 are negative semi-definite).

All other matrices, which do not belong to any of those four categories, are called indefinite matrices. For example, if A is such that X′AX > 0 for some X and X′AX < 0 for some other X, then A is an indefinite matrix. The corresponding Hermitian cases are:

$$\displaystyle \begin{aligned} A&>O, X^{*}AX>0 \mbox{ (Hermitian positive definite)}\\ A&\ge O, X^{*}AX\ge 0 \mbox{ (Hermitian positive semi-definite)}\\ A&<O,X^{*}AX<0 \mbox{ (Hermitian negative definite)}\\ A&\le O, X^{*}AX\le 0 \mbox{ (Hermitian negative semi-definite).}{}\end{aligned} $$
(1.5.1)

In all other cases, the matrix A and the Hermitian form \(X^{*}AX\) are indefinite. Certain conditions for the definiteness of A = A′ or \(A=A^{*}\) are the following:

(1) All the eigenvalues of A are positive ⇔ A > O; all eigenvalues are greater than or equal to zero ⇔ A ≥ O; all eigenvalues are negative ⇔ A < O; all eigenvalues are ≤ 0 ⇔ A ≤ O; all other matrices A = A′ or \(A=A^{*}\) for which some eigenvalues are positive and some others are negative are indefinite.

(2) If A = A′ or \(A=A^{*}\) and all the leading minors of A are positive (leading minors are determinants of the leading sub-matrices, the r-th leading sub-matrix being obtained by deleting all rows and columns from the (r + 1)-th onward), then A > O; if the leading minors are ≥ 0, then A ≥ O; if all the odd order minors are negative and all the even order minors are positive, then A < O; if the odd order minors are ≤ 0 and the even order minors are ≥ 0, then A ≤ O; all other matrices are indefinite. If A ≠ A′ or \(A\ne A^{*}\), then no definiteness can be defined in terms of the eigenvalues or leading minors. Let

$$\displaystyle \begin{aligned}A=\left[\begin{matrix}2&0\\ 0&5\end{matrix}\right].\end{aligned}$$

Note that A is real symmetric as well as Hermitian. Since \(X'AX=2x_1^2+5x_2^2>0\) for all real x 1 and x 2, as long as x 1 and x 2 are not both equal to zero, A > O (positive definite). Similarly, \(X^{*}AX=2|x_1|{ }^2+5|x_2|{ }^2=2[\sqrt {(x_{11}^2+x_{12}^2)}]^2+5[\sqrt {(x_{21}^2+x_{22}^2)}]^2>0\) for all x 11, x 12, x 21, x 22, as long as they are not all simultaneously equal to zero, where x 1 = x 11 + ix 12, x 2 = x 21 + ix 22 with x 11, x 12, x 21, x 22 being real and \(i=\sqrt {(-1)}\). Consider

Then, B < O and C is indefinite. Consider the following symmetric matrices:

The leading minors of A 1 are \(5>0,\ \left \vert \begin {matrix}5&2\\ 2&4\end {matrix}\right \vert =16>0\). The leading minors of A 2 are \(2>0,\ \left \vert \begin {matrix}2&2\\ 2&1\end {matrix}\right \vert =-2<0\). The leading minors of A 3 are . Hence A 1 > O, A 3 < O and A 2 is indefinite. The following results will be useful when reducing a quadratic form or Hermitian form to its canonical form.

(3) For every real A = A′ (symmetric), there exists an orthonormal matrix Q, QQ′ = I, Q′Q = I such that Q′AQ = diag(λ 1, …, λ n) where λ 1, …, λ n are the eigenvalues of the n × n matrix A and diag(…) denotes a diagonal matrix. In this case, a real quadratic form will reduce to the following linear combination:

$$\displaystyle \begin{aligned}X'AX=Y'Q'AQY=Y'\mathrm{diag}(\lambda_1,\ldots,\lambda_n)Y=\lambda_1y_1^2+\cdots +\lambda_ny_n^2,\ Y=Q'X.{} \end{aligned} $$
(1.5.2)

(4): For every Hermitian matrix \(A=A^{*}\), there exists a unitary matrix U, \(U^{*}U=I\), \(UU^{*}=I\), such that

$$\displaystyle \begin{aligned}X^{*}AX=Y^{*}\mathrm{diag}(\lambda_1,\ldots,\lambda_n)Y=\lambda_1|y_1|{}^2+\cdots +\lambda_n|y_n|{}^2,\ Y=U^{*}X.{} \end{aligned} $$
(1.5.3)

When A > O (real positive definite or Hermitian positive definite), all the λ j’s are real and positive, and then X′AX and \(X^{*}AX\) are strictly positive.

Let A and B be n × n real symmetric (or Hermitian) matrices. If AB = BA, in which case we say that A and B commute, then both A and B can be simultaneously reduced to their canonical forms (diagonal forms with the diagonal elements being the eigenvalues) with the same orthonormal or unitary matrix P, that is, PP′ = I, P′P = I if P is real and \(PP^{*}=I,\ P^{*}P=I\) if P is complex, such that P′AP = diag(λ 1, …, λ n) and P′BP = diag(μ 1, …, μ n) where λ 1, …, λ n are the eigenvalues of A and μ 1, …, μ n are the eigenvalues of B. In the complex case, \(P^{*}AP=\mathrm{diag}(\lambda_1,\ldots,\lambda_n)\) and \(P^{*}BP=\mathrm{diag}(\mu_1,\ldots,\mu_n)\). Observe that the eigenvalues of Hermitian matrices are real.
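Returning to the definiteness criteria (1) and (2), they are straightforward to apply numerically. The sketch below (numpy) classifies a symmetric matrix by the signs of its eigenvalues and lists its leading minors; the 2 × 2 matrices A 1 and A 2 are reconstructed from the leading minors quoted earlier and serve only as an illustration:

```python
import numpy as np

def definiteness(A):
    """Classify a symmetric matrix by the signs of its eigenvalues (criterion (1))."""
    lam = np.linalg.eigvalsh(A)
    if np.all(lam > 0):
        return "positive definite"
    if np.all(lam < 0):
        return "negative definite"
    if np.all(lam >= 0):
        return "positive semi-definite"
    if np.all(lam <= 0):
        return "negative semi-definite"
    return "indefinite"

def leading_minors(A):
    """Determinants of the leading sub-matrices (criterion (2))."""
    return [np.linalg.det(A[:r, :r]) for r in range(1, A.shape[0] + 1)]

A1 = np.array([[5., 2.], [2., 4.]])   # leading minors 5 and 16: A1 > O
A2 = np.array([[2., 2.], [2., 1.]])   # leading minors 2 and -2: indefinite

print(definiteness(A1), leading_minors(A1))
print(definiteness(A2), leading_minors(A2))
```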

1.5.1. Singular value decomposition

For an n × n symmetric matrix A = A′, we have stated that there exists an n × n orthonormal matrix P, PP′ = I n, P′P = I n, such that P′AP = D = diag(λ 1, …, λ n), where λ 1, …, λ n are the eigenvalues of A. If a square matrix A is not symmetric, there exists a nonsingular matrix Q such that Q −1 AQ = D = diag(λ 1, …, λ n) when A possesses n linearly independent eigenvectors, which is the case, for instance, when its eigenvalues are distinct; otherwise, such a representation is not always possible. If A is a p × q rectangular matrix with p ≠ q, or if p = q and A ≠ A′, then we can find two orthonormal matrices U and V  such that \(A=U\left [\begin {matrix}\varLambda &O\\ O&O\end {matrix}\right ]V'\) where Λ = diag(λ 1, …, λ k), UU′ = I p, U′U = I p, VV′ = I q, V′V = I q and k is the rank of A. This representation is equivalent to the following:

$$\displaystyle \begin{aligned}A=U\left[\begin{matrix}\varLambda &O\\ O&O\end{matrix}\right]V'=U_{(1)}\varLambda V_{(1)}^{\prime}\end{aligned} $$
(i)

where U (1) = [U 1, …, U k], V (1) = [V 1, …, V k], U j being the normalized eigenvector of AA′ corresponding to the eigenvalue \(\lambda _j^2\) and V j, the normalized eigenvector of A′A corresponding to the eigenvalue \(\lambda _j^2\). The representation given in (i) is known as the singular value decomposition of A and λ 1 > 0, …, λ k > 0 are called the singular values of A. Then, we have

$$\displaystyle \begin{aligned}AA'=U\left[\begin{matrix}\varLambda^2&O\\ O&O\end{matrix}\right]U'=U_{(1)}\varLambda^2 U_{(1)}^{\prime},\ A'A=V\left[\begin{matrix}\varLambda^2&O\\ O&O\end{matrix}\right]V'=V_{(1)}\varLambda^2V_{(1)}^{\prime}.\end{aligned} $$
(ii)

Thus, the procedure is the following: If p ≤ q, compute the nonzero eigenvalues of AA′; otherwise, compute the nonzero eigenvalues of A′A. Denote them by \(\lambda _1^2,\ldots ,\lambda _k^2\) where k is the rank of A. Construct the corresponding normalized eigenvectors U 1, …, U k of AA′. This gives U (1) = [U 1, …, U k]. Then, by using the same eigenvalues \(\lambda _j^2,\ j=1,\ldots ,k\), determine the normalized eigenvectors, V 1, …, V k, from A′A, and let V (1) = [V 1, …, V k]. Let us verify the above statements with the help of an example. Let

Then,

The eigenvalues of AA′ are \(\lambda _1^2=3\) and \(\lambda _2^2=2\). The corresponding normalized eigenvectors of AA′ are \(U_1=\left[\begin{matrix}1\\ 0\end{matrix}\right]\) and \(U_2=\left[\begin{matrix}0\\ 1\end{matrix}\right]\), so that \(~U_{(1)}=[U_1,U_2]=\left [\begin {matrix}1&0\\ 0&1\end {matrix}\right ].\)

Now, by using \(\lambda _1^2=3\) and \(\lambda _2^2=2\), compute the normalized eigenvectors from A′A. They are:

Then \(\varLambda =\mathrm {diag}(\sqrt {3},\sqrt {2})\). Also,

This establishes the result. Observe that

$$\displaystyle \begin{aligned} AA'&=[U_{(1)}\varLambda V_{(1)}^{\prime}][U_{(1)}\varLambda V_{(1)}^{\prime}]'=U_{(1)}\varLambda^2 U_{(1)}^{\prime}\\ A'A&=[U_{(1)}\varLambda V_{(1)}^{\prime}]'[U_{(1)}\varLambda V_{(1)}^{\prime}]=V_{(1)}\varLambda^2 V_{(1)}^{\prime}\\ \varLambda^2&=\mathrm{diag}(\lambda_1^2,\lambda_2^2).\end{aligned} $$
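The same construction can be reproduced numerically. The sketch below (numpy) uses an arbitrary 2 × 3 matrix of rank 2 as an illustration; the signs of the V j’s are fixed by taking \(V_j=A'U_j/\lambda _j\), which is a normalized eigenvector of A′A associated with \(\lambda _j^2\) and is consistent with the chosen U j:

```python
import numpy as np

A = np.array([[1., 1., 1.],
              [1., -1., 0.]])                 # an arbitrary 2 x 3 matrix of rank 2

lam2, U1 = np.linalg.eigh(A @ A.T)            # eigenvalues and eigenvectors of AA'
order = np.argsort(lam2)[::-1]                # arrange so that lambda_1^2 >= lambda_2^2 > 0
lam2, U1 = lam2[order], U1[:, order]
Lam = np.diag(np.sqrt(lam2))                  # Lambda = diag(lambda_1, ..., lambda_k)

# V_j = A'U_j / lambda_j is a normalized eigenvector of A'A for lambda_j^2
V1 = A.T @ U1 / np.sqrt(lam2)

print(np.allclose(A, U1 @ Lam @ V1.T))        # the singular value decomposition of A
print(np.allclose(A.T @ A @ V1, V1 * lam2))   # columns of V1 are eigenvectors of A'A
```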

1.6. Wedge Product of Differentials and Jacobians

If y = f(x) is an explicit function of x, where x and y are real scalar variables, then we refer to x as the independent variable and to y as the dependent variable. In the present context, “independent” means that values of x are preassigned and the corresponding values of y are evaluated from the formula y = f(x). The standard notations for a small increment in x and the corresponding increment in y are Δx and Δy, respectively. By convention, Δx > 0, while Δy can be positive, negative or zero depending upon the function f. If Δx goes to zero, then the limit of Δy is also zero. However, if Δx goes to zero in the presence of the ratio \(\frac {\varDelta y}{\varDelta x}\), then we have a different situation. Consider the identity

$$\displaystyle \begin{aligned}\varDelta y\equiv \Big(\frac{\varDelta y}{\varDelta x} \Big)\varDelta x\ \Rightarrow \ \mathrm{d}y=A\, \mathrm{d}x, \ A=\frac{\mathrm{d}y}{\mathrm{d}x}.{} \end{aligned} $$
(1.6.1)

This identity can always be written due to our convention Δx > 0. Consider Δx → 0. If \(\frac {\varDelta y}{\varDelta x}\) tends to a limit as Δx → 0, let us denote this limit by \(A=\lim _{\varDelta x\to 0}\frac {\varDelta y}{\varDelta x}\); then the value of Δx at that stage is called the differential of x, namely dx, the corresponding Δy is dy, and A is the ratio of the differentials, \(A=\frac {\mathrm {d}y}{\mathrm {d}x}\). If x 1, …, x k are independent variables and if y = f(x 1, …, x k), then by convention Δx 1 > 0, …, Δx k > 0. Thus, in light of (1.6.1), we have

$$\displaystyle \begin{aligned}\mathrm{d}y=\frac{\partial f}{\partial x_1}\mathrm{d}x_1+\cdots +\frac{\partial f}{\partial x_k}\mathrm{d}x_k{} \end{aligned} $$
(1.6.2)

where \(\frac {\partial f}{\partial x_j}\) is the partial derivative of f with respect to x j or the derivative of f with respect to x j, keeping all other variables fixed.

Wedge Product of Differentials Let dx and dy be differentials of the real scalar variables x and y. Then the wedge product or skew symmetric product of dx and dy is denoted by dx ∧dy and is defined as

$$\displaystyle \begin{aligned}\mathrm{d}x\wedge\mathrm{d}y=-\mathrm{d}y\wedge\mathrm{d}x\ \Rightarrow\ \mathrm{d}x\wedge\mathrm{d}x=0\ \ \mathrm{and}\ \ \mathrm{d}y\wedge\mathrm{d}y=0.\ {} \end{aligned} $$
(1.6.3)

This definition indicates that higher order wedge products involving the same differential are equal to zero. Letting

$$\displaystyle \begin{aligned}y_1=f_1(x_1,x_2)\ \ \mathrm{and}\ \ y_2=f_2(x_1,x_2),\end{aligned}$$

it follows from the basic definitions that

$$\displaystyle \begin{aligned}\mathrm{d}y_1=\frac{\partial f_1}{\partial x_1}\mathrm{d}x_1+\frac{\partial f_1}{\partial x_2}\mathrm{d}x_2\ \ \mathrm{and}\ \ \mathrm{d}y_2=\frac{\partial f_2}{\partial x_1}\mathrm{d}x_1+\frac{\partial f_2}{\partial x_2}\mathrm{d }x_2. \end{aligned}$$

By taking the wedge product and using the properties specified in (1.6.3), that is, dx 1 ∧dx 1 = 0, dx 2 ∧dx 2 = 0, dx 2 ∧dx 1 = −dx 1 ∧dx 2, we have

$$\displaystyle \begin{aligned}\mathrm{d}y_1\wedge\mathrm{d}y_2=\Big(\frac{\partial f_1}{\partial x_1}\frac{\partial f_2}{\partial x_2}-\frac{\partial f_1}{\partial x_2}\frac{\partial f_2}{\partial x_1}\Big)\mathrm{d}x_1\wedge\mathrm{d}x_2=\left\vert\begin{matrix}\frac{\partial f_1}{\partial x_1}&\frac{\partial f_1}{\partial x_2}\\ \frac{\partial f_2}{\partial x_1}&\frac{\partial f_2}{\partial x_2}\end{matrix}\right\vert\,\mathrm{d}x_1\wedge\mathrm{d}x_2.\end{aligned}$$

In the general case we have the following corresponding result:

$$\displaystyle \begin{aligned}\mathrm{d}Y=J\,\mathrm{d}X{}\end{aligned} $$
(1.6.4)

where dX = dx 1 ∧… ∧dx k, dY = dy 1 ∧… ∧dy k and \(J=|(\frac {\partial f_i}{\partial x_j})|=\) the determinant of the matrix of partial derivatives where the (i, j)-th element is the partial derivative of f i with respect to x j. In dX and dY, the individual real scalar variables can be taken in any order to start with. However, for each interchange of variables, the result is to be multiplied by − 1.

Example 1.6.1

Consider the transformation \(x_1=r\cos ^2\theta ,\ x_2=r\sin ^2\theta ,\ 0\le r<\infty ,\ 0\le \theta \le \frac {\pi }{2},\ x_1\ge 0, x_2\ge 0\). Determine the relationship between dx 1 ∧dx 2 and dr ∧dθ.

Solution 1.6.1

Taking partial derivatives, we have

$$\displaystyle \begin{aligned} \frac{\partial x_1}{\partial r}&=\cos^2\theta,\ \frac{\partial x_1}{\partial \theta}=-2r\cos\theta\sin\theta,\\ \frac{\partial x_2}{\partial r}&=\sin^2\theta,\ \frac{\partial x_2}{\partial\theta}=2r\cos\theta\sin\theta.\end{aligned} $$

Then, the determinant of the matrix of partial derivatives is given by

$$\displaystyle \begin{aligned}\left\vert\begin{matrix}\cos^2\theta&-2r\cos\theta\sin\theta\\ \sin^2\theta&2r\cos\theta\sin\theta\end{matrix}\right\vert=(2r\cos\theta\sin\theta)(\cos^2\theta+\sin^2\theta)=2r\cos\theta\sin\theta\end{aligned}$$

since \(\cos ^2\theta +\sin ^2\theta =1\). Hence,

$$\displaystyle \begin{aligned}\mathrm{d}x_1\wedge\mathrm{d}x_2=2r\cos\theta\sin\theta~\mathrm{d}r\wedge\mathrm{d}\theta,\ J=2r\cos\theta\sin\theta.\end{aligned}$$

We may also establish this result by direct evaluation.

$$\displaystyle \begin{aligned} \mathrm{d}x_1&=\frac{\partial x_1}{\partial r}\mathrm{d}r+\frac{\partial x_1}{\partial\theta}\mathrm{d}\theta=\cos^2\theta~\mathrm{d}r-2r\cos\theta\sin\theta~\mathrm{d}\theta,\\ \mathrm{d}x_2&=\frac{\partial x_2}{\partial r}\mathrm{d}r+\frac{\partial x_2}{\partial \theta}\mathrm{d}\theta=\sin^2\theta~\mathrm{d}r+2r\cos\theta\sin\theta ~\mathrm{d}\theta,\end{aligned} $$
$$\displaystyle \begin{aligned} \mathrm{d}x_1\wedge\mathrm{d}x_2&=\cos^2\theta\sin^2\theta\,\mathrm{d}r\wedge\mathrm{d}r+\cos^2\theta(2r\cos\theta\sin\theta)\,\mathrm{d}r\wedge\mathrm{d}\theta\\ &\ \ \ \ -\sin^2\theta(2r\cos\theta\sin\theta)\,\mathrm{d}\theta\wedge\mathrm{d}r-(2r\cos\theta\sin\theta)^2\mathrm{d}\theta\wedge\mathrm{d}\theta\\ &=2r\cos\theta\sin\theta\,[\cos^2\theta\,\mathrm{d}r\wedge\mathrm{d}\theta-\sin^2\theta\,\mathrm{d}\theta\wedge\mathrm{d}r],\ [\mathrm{d r}\wedge\mathrm{d}r=0,\ \mathrm{d}\theta\wedge\mathrm{d}\theta=0]\\ &=2r\cos\theta\sin\theta\,[\cos^2\theta+\sin^2\theta]\,\mathrm{d}r\wedge\mathrm{d}\theta,\ [\mathrm{d}\theta\wedge\mathrm{d}r=-\mathrm{d}r\wedge\mathrm{d}\theta]\\ &=2r\cos\theta\sin\theta\,\mathrm{d}r\wedge\mathrm{d}\theta.\end{aligned} $$
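The same Jacobian can also be obtained symbolically. The short sketch below (sympy; an illustration only) forms the matrix of partial derivatives of Example 1.6.1 and takes its determinant:

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x1 = r * sp.cos(theta)**2
x2 = r * sp.sin(theta)**2

# Matrix of partial derivatives of (x1, x2) with respect to (r, theta)
Jmat = sp.Matrix([[sp.diff(x1, r), sp.diff(x1, theta)],
                  [sp.diff(x2, r), sp.diff(x2, theta)]])
J = sp.simplify(Jmat.det())
print(J)   # 2*r*sin(theta)*cos(theta), which sympy may display as r*sin(2*theta)
```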

Linear Transformation Consider the linear transformation Y = AX where

$$\displaystyle \begin{aligned}Y=\left[\begin{matrix}y_1\\ \vdots\\ y_p\end{matrix}\right], \ X=\left[\begin{matrix}x_1\\ \vdots\\ x_p\end{matrix}\right],\ A=\left[\begin{matrix}a_{11}&\ldots&a_{1p}\\ \vdots&\ddots&\vdots\\ a_{p1}&\ldots&a_{pp}\end{matrix}\right].\end{aligned}$$

Then, \(\frac {\partial y_i}{\partial x_j}=a_{ij}\Rightarrow (\frac {\partial y_i}{\partial x_j})=(a_{ij})=A\), so that J = |A| and dY = |A| dX. Hence, the following result:

Theorem 1.6.1

Let X and Y  be p × 1 vectors of distinct real variables and A = (a ij) be a constant nonsingular matrix. Then, the transformation Y = AX is one to one and

$$\displaystyle \begin{aligned}Y=AX,\ |A|\ne 0\ \Rightarrow \ \mathrm{d}Y=|A|\,\mathrm{d}X.{}\end{aligned} $$
(1.6.5)

Let us consider the complex case. Let \(\tilde {X}=X_1+iX_2\) where a tilde indicates that the matrix is in the complex domain, X 1 and X 2 are real p × 1 vectors if \(\tilde {X}\) is p × 1, and \(i=\sqrt {(-1)}\). Then, the wedge product \(\mathrm {d}\tilde {X}\) is defined as \(\mathrm {d}\tilde {X}=\mathrm {d}X_1\wedge \mathrm {d}X_2\). This is the general definition in the complex case whatever be the order of the matrix. If \(\tilde {Z}\) is m × n and if \(\tilde {Z}=Z_1+iZ_2\) where Z 1 and Z 2 are m × n and real, then \(\mathrm {d}\tilde {Z}=\mathrm {d}Z_1\wedge \mathrm {d}Z_2\). Letting the constant p × p matrix A = A 1 + iA 2 where A 1 and A 2 are real and p × p, and letting \(\tilde {Y}=Y_1+iY_2\) be p × 1 where Y 1 and Y 2 are real and p × 1, we have

$$\displaystyle \begin{aligned} \tilde{Y}&=A\tilde{X}\Rightarrow Y_1+iY_2=[A_1+iA_2][X_1+iX_2]\\ &=[A_1X_1-A_2X_2]+i[A_1X_2+A_2X_1]\Rightarrow\\ Y_1&=A_1X_1-A_2X_2,\ \ Y_2=A_1X_2+A_2X_1\Rightarrow\\ \left[\begin{matrix}Y_1\\ Y_2\end{matrix}\right]&=\left[\begin{matrix}A_1&-A_2\\ A_2&A_1\end{matrix}\right]\left[\begin{matrix}X_1\\ X_2\end{matrix}\right].\end{aligned} $$
(i)

Now, applying Theorem 1.6.1 to (i), it follows that

$$\displaystyle \begin{aligned}\mathrm{d}Y_1\wedge\mathrm{d}Y_2=\mathrm{det}\left[\begin{matrix}A_1&-A_2\\ A_2&A_1\end{matrix}\right] \mathrm{d}X_1\wedge\mathrm{d}X_2.\end{aligned} $$
(ii)

That is,

$$\displaystyle \begin{aligned}\mathrm{d}\tilde{Y}=\mathrm{det}\left[\begin{matrix}A_1&-A_2\\ A_2&A_1\end{matrix}\right]\mathrm{d}\tilde{X}\ \Rightarrow\ \mathrm{d}\tilde{Y}=J\,\mathrm{d}\tilde{X}\end{aligned} $$
(iii)

where the Jacobian turns out to be the square of the absolute value of the determinant of A. If the determinant of A is denoted by det(A) and its absolute value by |det(A)|, and if det(A) = a + ib with a, b real and \(i=\sqrt {(-1)}\), then the absolute value of the determinant is \(+\sqrt {(a+ib)(a-ib)}=+\sqrt {(a^2+b^2)}=+\sqrt {[\mathrm {det}(A)][\mathrm {det}(A^{*})]}=+\sqrt {[\mathrm {det}(AA^{*})]}\). It can be easily seen that the above Jacobian is given by

$$\displaystyle \begin{aligned} J&=\mathrm{det}\left[\begin{matrix}A_1&-A_2\\ A_2&A_1\end{matrix}\right]=\mathrm{det}\left[\begin{matrix}A_1&-iA_2\\ -iA_2&A_1\end{matrix}\right]\\ &\ \ \ \ \ \ \mbox{(multiplying the second row block by }-i\mbox{ and second column block by }i)\\ &=\mathrm{det}\left[\begin{matrix}A_1-iA_2&A_1-iA_2\\ -iA_2&A_1\end{matrix}\right]\mbox{ (adding the second row block to the first row block)}\\ &=\mathrm{det}(A_1-iA_2)\,\mathrm{det}\left[\begin{matrix}I&I\\ -iA_2&A_1\end{matrix}\right]=\mathrm{det}(A_1-iA_2)\mathrm{det}(A_1+iA_2)\\ &\mbox{ (adding }(-1)\mbox{ times the first }p\mbox{ columns to the last }p\mbox{ columns)}\\ &=[\mathrm{det}(A)]\,[\mathrm{det}(A^{*})]=[\mathrm{det}(AA^{*})]=|\mathrm{det}(A)|{}^2.\end{aligned} $$

Then, we have the following companion result of Theorem 1.6.1.

Theorem 1.6a.1

Let \(\tilde {X}\) and \(\tilde {Y}\) be p × 1 vectors in the complex domain, and let A be a p × p nonsingular constant matrix that may or may not be in the complex domain. If C is a constant p × 1 vector, then

$$\displaystyle \begin{aligned}\tilde{Y}=A\tilde{X}+C, \ \mathrm{det}(A)\ne 0\ \Rightarrow \ \mathrm{d}\tilde{Y}=|\mathrm{det}(A)|{}^2\mathrm{d}\tilde{X}=|\mathrm{det}(AA^{*})|\,\mathrm{d}\tilde{X}.{}\end{aligned} $$
(1.6a.1)
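The block-determinant identity used above, namely \(\mathrm {det}\left [\begin {matrix}A_1&-A_2\\ A_2&A_1\end {matrix}\right ]=|\mathrm {det}(A)|{}^2\), is easy to confirm numerically; the sketch below (numpy) uses an arbitrary complex matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
p = 3
A1 = rng.standard_normal((p, p))
A2 = rng.standard_normal((p, p))
A = A1 + 1j * A2                               # an arbitrary complex p x p matrix

M = np.block([[A1, -A2],
              [A2,  A1]])                      # the real 2p x 2p matrix appearing in (i)
print(np.isclose(np.linalg.det(M), abs(np.linalg.det(A))**2))   # True
```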

For the results that follow, the complex case can be handled in a similar way and hence, only the final results will be stated. For details, the reader may refer to Mathai (1997). A more general result is the following:

Theorem 1.6.2

Let X and Y  be real m × n matrices with distinct real variables as elements. Let A be an m × m nonsingular constant matrix and C be an m × n constant matrix. Then

$$\displaystyle \begin{aligned}Y=AX+C,\ \mathrm{det}(A)\ne 0\ \Rightarrow\ \mathrm{d}Y=|A|{}^n\mathrm{d}X.{}\end{aligned} $$
(1.6.6)

The companion result is stated in the next theorem.

Theorem 1.6a.2

Let \(\tilde {X}\) and \(\tilde {Y}\) be m × n matrices in the complex domain. Let A be a constant m × m nonsingular matrix that may or may not be in the complex domain, and C be an m × n constant matrix. Then

$$\displaystyle \begin{aligned}\tilde{Y}=A\tilde{X}+C,\ \mathrm{det}(A)\ne 0\Rightarrow \mathrm{d}\tilde{Y}=|\mathrm{det}(AA^{*})|{}^n\mathrm{d}\tilde{X}.{}\end{aligned} $$
(1.6a.2)

To prove Theorems 1.6.2 and 1.6a.2, consider the columns of Y  and X and apply Theorems 1.6.1 and 1.6a.1 to establish the results. If X, \(\tilde {X},\ Y, \ \tilde {Y}\) are as defined in Theorems 1.6.2 and 1.6a.2 and if B is an n × n nonsingular constant matrix, then we have the following results:

Theorems 1.6.3 and 1.6a.3

Let \(X ,\ \tilde {X},\ Y,\ \tilde {Y}\) be m × n matrices with distinct elements as previously defined, C be an m × n constant matrix and B be an n × n nonsingular constant matrix. Then

$$\displaystyle \begin{aligned}Y=XB+C,\ \mathrm{det}(B)\ne 0\Rightarrow \mathrm{d}Y=|B|{}^m\mathrm{d}X{}\end{aligned} $$
(1.6.7)

and

$$\displaystyle \begin{aligned}\tilde{Y}=\tilde{X}B+C,\ \mathrm{det}(B)\ne 0\Rightarrow\mathrm{d}\tilde{Y}=|\mathrm{det}(BB^{*})|{}^m\mathrm{d}\tilde{X}.{}\end{aligned} $$
(1.6a.3)

To prove these results, consider the rows of \(Y,\ \tilde {Y},\ X,\ \tilde {X}\) and then apply Theorems 1.6.1 and 1.6a.1 to establish the results. Combining Theorems 1.6.2 and 1.6.3, as well as Theorems 1.6a.2 and 1.6a.3, we have the following results:

Theorems 1.6.4 and 1.6a.4

Let \(X,\ \tilde {X},\ Y,\ \tilde {Y}\) be m × n matrices as previously defined, and let A be m × m and B be n × n nonsingular constant matrices. Then

$$\displaystyle \begin{aligned}Y=AXB,\,\mathrm{det}(A)\ne 0,\, \mathrm{det}(B)\ne 0\Rightarrow \mathrm{d}Y=|A|{}^n|B|{}^m\mathrm{d}X{} \end{aligned} $$
(1.6.8)

and

$$\displaystyle \begin{aligned}\tilde{Y}=A\tilde{X}B,\,\mathrm{det}(A)\ne 0,\ \mathrm{det}(B)\ne 0\Rightarrow\mathrm{d}\tilde{Y}=|\mathrm{det}(AA^{*})|{}^n|\mathrm{det}(BB^{*})|{}^m\mathrm{d}\tilde{X}.{}\end{aligned} $$
(1.6a.4)

We now consider the case of linear transformations involving symmetric and Hermitian matrices.

Theorems 1.6.5 and 1.6a.5

Let X = X′, Y = Y ′ be real symmetric p × p matrices and let \(\tilde {X}=\tilde {X}^{*},\ \tilde {Y}=\tilde {Y}^{*}\) be p × p Hermitian matrices. If A is a p × p nonsingular constant matrix, then

$$\displaystyle \begin{aligned}Y=AXA', \ Y=Y',\ X=X',\ \mathrm{det}(A)\ne 0\Rightarrow \mathrm{d}Y=|A|{}^{p+1}\mathrm{d}X{}\end{aligned} $$
(1.6.9)

and

$$\displaystyle \begin{aligned}\tilde{Y}=A\tilde{X}A^{*},\ \mathrm{det}(A)\ne 0\Rightarrow \mathrm{d}\tilde{Y}=|\mathrm{det}(AA^{*})|{}^p\mathrm{d}\tilde{X}{} \end{aligned} $$
(1.6a.5)

for \(\tilde {X}=\tilde {X}^{*}\) or \(\tilde {X}=-\tilde {X}^{*}.\)

The proof involves some properties of elementary matrices and elementary transformations. Elementary matrices were introduced in Sect. 1.2.1. There are two types of basic elementary matrices, the E and F types, where the E type is obtained by multiplying any row (column) of an identity matrix by a nonzero scalar and the F type is obtained by adding any row to any other row of an identity matrix. A combination of E and F type matrices results in a G type matrix, in which a constant multiple of one row of an identity matrix is added to any other row; the G type is not a basic elementary matrix. By successive pre-multiplications with E, F and G type matrices, a nonsingular matrix can be reduced to an identity matrix, and hence a nonsingular matrix can be written as a product of basic elementary matrices of the E and F types, observing that the E and F type elementary matrices are nonsingular. This result is needed to establish Theorems 1.6.5 and 1.6a.5. Let A = E 1 E 2 F 1⋯E r F s for some E 1, …, E r and F 1, …, F s. Then

$$\displaystyle \begin{aligned}AXA'=E_1E_2F_1\cdots E_rF_sXF_s^{\prime}E_r^{\prime}\cdots E_2^{\prime} E_1^{\prime}.\end{aligned}$$

Let \(Y_1=F_sXF_s^{\prime }\) in which case the connection between dX and dY 1 can be determined from F s. Now, letting \(Y_2=E_rY_1E_r^{\prime }\), the connection between dY 2 and dY 1 can be similarly determined from E r. Continuing in this manner, we finally obtain the connection between dY  and dX, which will give the Jacobian as |A|p+1 for the real case. In the complex case, the procedure is parallel.

We now consider two basic nonlinear transformations. In the first case, X is a p × p nonsingular matrix going to its inverse, that is, Y = X −1.

Theorems 1.6.6 and 1.6a.6

Let X and \( \tilde {X}\) be p × p real and complex nonsingular matrices, respectively. Let the regular inverses be denoted by Y = X −1 and \( \tilde {Y}=\tilde {X}^{-1},\) respectively. Then, ignoring the sign,

$$\displaystyle \begin{aligned}Y=X^{-1}\Rightarrow \mathrm{d}Y=\begin{cases}|X|{}^{-2p}\mathrm{d}X\mathit{\mbox{ for a general }}X \\ |X|{}^{-(p+1)}\mathrm{d}X\mathit{\mbox{ for }}X=X' \\ |X|{}^{-(p-1)}\mathrm{d}X\mathit{\mbox{ for }}X=-X' \end{cases}{}\end{aligned} $$
(1.6.10)

and

$$\displaystyle \begin{aligned}\tilde{Y}=\tilde{X}^{-1}\Rightarrow\mathrm{d}\tilde{Y}=\begin{cases} |\mathrm{det}(\tilde{X}\tilde{X}^{*})|{}^{-2p}\,\mathrm{d}\tilde{X}\mbox{ for a general }\tilde{X}\\ |\mathrm{det}(\tilde{X}\tilde{X}^{*})|{}^{-p}\,\mathrm{d}\tilde{X}\mbox{ for }\tilde{X}=\tilde{X}^{*}\mbox{ or } \tilde{X}=-\tilde{X}^{*}.\end{cases}{}\end{aligned} $$
(1.6a.6)

The proof is based on the following observations: In the real case XX −1 = I p ⇒ (dX)X −1 + X(dX −1) = O where (dX) represents the matrix of differentials in X. This means that

$$\displaystyle \begin{aligned}(\mathrm{d}X^{-1})=-X^{-1}(\mathrm{d}X)X^{-1}. \end{aligned}$$

The differentials appear only in the matrices of differentials (dX) and (dX −1). Hence this situation is equivalent to the general linear transformations considered in Theorems 1.6.4 and 1.6.5, where X and X −1 act as constants. The result is obtained upon taking the wedge product of the differentials. The complex case is parallel.

The next results involve real positive definite matrices or Hermitian positive definite matrices that are expressible in terms of triangular matrices, and the corresponding connection between the wedge products of differentials. Let X and \(\tilde {X}\) be p × p real positive definite and Hermitian positive definite matrices, respectively. Let T = (t ij) be a real lower triangular matrix with t ij = 0, i < j, t jj > 0, j = 1, …, p, and the t ij’s, i ≥ j, be distinct real variables. Let \(\tilde {T}=(\tilde {t}_{ij})\) be a lower triangular matrix with \(\tilde {t}_{ij}=0\), for i < j, the \(\tilde {t}_{ij}\)’s, i > j, be distinct complex variables, and \(\tilde {t}_{jj},\ j=1,\ldots ,p,\) be positive real variables. Then, the transformations X = TT′ in the real case and \(\tilde {X}=\tilde {T}\tilde {T}^{*}\) in the complex case can be shown to be one-to-one, which enables us to write dX in terms of dT and vice versa, uniquely, and \(\mathrm {d}\tilde {X}\) in terms of \(\mathrm {d}\tilde {T},\) uniquely. We first consider the real case. When p = 2,

$$\displaystyle \begin{aligned}X=\left[\begin{matrix}x_{11}&x_{12}\\ x_{12}&x_{22}\end{matrix}\right],\ x_{11}>0,\ x_{22}>0,\ x_{21}=x_{12},\ x_{11}x_{22}-x_{12}^2>0\end{aligned}$$

due to positive definiteness of X, and

$$\displaystyle \begin{aligned} X&=TT'=\left[\begin{matrix}t_{11}&0\\ t_{21}&t_{22}\end{matrix}\right]\left[\begin{matrix}t_{11}&t_{21}\\ 0&t_{22}\end{matrix}\right]=\left[\begin{matrix}t_{11}^2&t_{21}t_{11}\\ t_{21}t_{11}&t_{21}^2+t_{22}^2\end{matrix}\right]\Rightarrow\\ \frac{\partial x_{11}}{\partial t_{11}}&=2t_{11},\ \frac{\partial x_{11}}{\partial t_{21}}=0,\ \frac{\partial x_{11}}{\partial t_{22}}=0\\ \frac{\partial x_{22}}{\partial t_{11}}&=0,\ \frac{\partial x_{22}}{\partial t_{21}}=2t_{21}, \ \frac{\partial x_{22}}{\partial t_{22}}=2t_{22}\\ \frac{\partial x_{12}}{\partial t_{11}}&=t_{21},\ \frac{\partial x_{12}}{\partial t_{21}}=t_{11},\ \frac{\partial x_{12}}{\partial t_{22}}=0.\end{aligned} $$

Taking the x ij’s in the order x 11, x 12, x 22 and the t ij’s in the order t 11, t 21, t 22, we form the following matrix of partial derivatives:

$$\displaystyle \begin{aligned}\left[\begin{matrix}\frac{\partial x_{11}}{\partial t_{11}}&\frac{\partial x_{11}}{\partial t_{21}}&\frac{\partial x_{11}}{\partial t_{22}}\\ \frac{\partial x_{12}}{\partial t_{11}}&\frac{\partial x_{12}}{\partial t_{21}}&\frac{\partial x_{12}}{\partial t_{22}}\\ \frac{\partial x_{22}}{\partial t_{11}}&\frac{\partial x_{22}}{\partial t_{21}}&\frac{\partial x_{22}}{\partial t_{22}}\end{matrix}\right]=\left[\begin{matrix}2t_{11}&0&0\\ \ast&t_{11}&0\\ \ast&\ast&2t_{22}\end{matrix}\right]\end{aligned}$$

where an asterisk indicates that an element may be present in that position; however, its value is irrelevant since the matrix is triangular and its determinant will simply be the product of its diagonal elements. It can be observed from this pattern that for a general p, a diagonal element will be multiplied by 2 whenever x jj is differentiated with respect to t jj, j = 1, …, p. Then t 11 will appear p times, t 22 will appear p − 1 times, and so on, and t pp will appear once along the diagonal. Hence the product of the diagonal elements will be \(2^p\,t_{11}^p\,t_{22}^{p-1}\cdots t_{pp}=2^p\{\prod _{j=1}^pt_{jj}^{p+1-j}\}\). A parallel procedure will yield the Jacobian in the complex case. Hence, the following results:

Theorems 1.6.7 and 1.6a.7

Let X, \(\tilde {X}, T\) and \( \tilde {T}\) be p × p matrices where X is real positive definite, \(\tilde {X}\) is Hermitian positive definite, and T and \(\tilde {T}\) are lower triangular matrices whose diagonal elements are real and positive as described above. Then the transformations X = TT′ and \(\tilde {X}=\tilde {T}\tilde {T}^{*}\) are one-to-one, and

$$\displaystyle \begin{aligned}\mathrm{d}X=2^p\{\prod_{j=1}^pt_{jj}^{p+1-j}\}\,\mathrm{d}T{}\end{aligned} $$
(1.6.11)

and

$$\displaystyle \begin{aligned}\mathrm{d}\tilde{X}=2^p\{\prod_{j=1}^pt_{jj}^{2(p-j)+1}\}\,\mathrm{d}\tilde{T}.{}\end{aligned} $$
(1.6a.7)
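The Jacobian in (1.6.11) can be checked by differentiating the transformation T → TT′ numerically. The sketch below (numpy, p = 3, an arbitrary T with positive diagonal elements) builds the matrix of partial derivatives of the distinct elements of X = TT′ with respect to the elements of T by central differences and compares its determinant with \(2^p\prod _{j=1}^pt_{jj}^{p+1-j}\):

```python
import numpy as np

p = 3
idx = [(i, j) for i in range(p) for j in range(i + 1)]      # lower-triangular positions

def to_T(t):
    T = np.zeros((p, p))
    for value, (i, j) in zip(t, idx):
        T[i, j] = value
    return T

def x_of_t(t):
    X = to_T(t) @ to_T(t).T
    return np.array([X[i, j] for (i, j) in idx])             # distinct elements of X = TT'

rng = np.random.default_rng(4)
t0 = rng.uniform(0.5, 2.0, size=len(idx))                    # an arbitrary T with t_jj > 0

# Matrix of partial derivatives by central differences
h = 1e-6
J = np.empty((len(idx), len(idx)))
for k in range(len(idx)):
    e = np.zeros(len(idx))
    e[k] = h
    J[:, k] = (x_of_t(t0 + e) - x_of_t(t0 - e)) / (2 * h)

T0 = to_T(t0)
formula = 2**p * np.prod([T0[j, j]**(p - j) for j in range(p)])  # 2^p t11^p t22^(p-1) ... tpp
print(np.isclose(abs(np.linalg.det(J)), formula))                # True
```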

Given these introductory materials, we will explore multivariate statistical analysis from the perspective of Special Functions. As far as possible, the material in this chapter is self-contained. A few more Jacobians will be required when tackling transformations involving rectangular matrices or eigenvalue problems. These will be discussed in the respective chapters later on.

Example 1.6.2

Evaluate the following integrals: (1): \(\int _X\mathrm {e}^{-X'AX}\mathrm {d}X\) where A > O (real positive definite) is 3 × 3 and X is a 3 × 1 vector of distinct real scalar variables; (2): \(\int _X\mathrm {e}^{-\mathrm {tr}(AXBX')}\mathrm {d}X\) where X is a 2 × 3 matrix of distinct real scalar variables, A > O (real positive definite) is 2 × 2 and B > O (real positive definite) is 3 × 3, A and B being constant matrices; (3): ∫X>Oe−tr(X)dX where X = X′ > O is a 2 × 2 real positive definite matrix of distinct real scalar variables.

Solution 1.6.2

(1) Let X′ = (x 1, x 2, x 3), A > O. Since A > O, we can uniquely define \(A^{\frac {1}{2}}=(A^{\frac {1}{2}})'\). Then, write \(X'AX=X'A^{\frac {1}{2}}A^{\frac {1}{2}}X=Y'Y,\ Y=A^{\frac {1}{2}}X\). It follows from Theorem 1.6.1 that \(\mathrm {d}X=|A|{ }^{-\frac {1}{2}}\mathrm {d}Y\), and letting Y = (y 1, y 2, y 3), we have

$$\displaystyle \begin{aligned} \int_X\mathrm{e}^{-X'AX}\mathrm{d}X&=|A|{}^{-\frac{1}{2}}\int_Y\mathrm{e}^{-(Y'Y)}\mathrm{d}Y\\ &=|A|{}^{-\frac{1}{2}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\mathrm{e}^{-(y_1^2+y_2^2+y_3^2)}\mathrm{d}y_1\wedge\mathrm{d}y_2\wedge\mathrm{d}y_3.\end{aligned} $$

Since

$$\displaystyle \begin{aligned}\int_{-\infty}^{\infty}\mathrm{e}^{-y_j^2}\mathrm{d}y_j=\sqrt{\pi},\ j=1,2,3, \end{aligned}$$
$$\displaystyle \begin{aligned}\int_X\mathrm{e}^{-X'AX}\mathrm{d}X=|A|{}^{-\frac{1}{2}}(\sqrt{\pi})^3. \end{aligned}$$

(2) Since A is a 2 × 2 positive definite matrix, there exists a 2 × 2 matrix \(A^{\frac {1}{2}}\) that is symmetric and positive definite. Similarly, there exists a 3 × 3 matrix \(B^{\frac {1}{2}}\) that is symmetric and positive definite. Let \(Y=A^{\frac {1}{2}}XB^{\frac {1}{2}}\Rightarrow \mathrm {d}Y=|A|{ }^{\frac {3}{2}}|B|{ }^{\frac {2}{2}}\mathrm {d}X\) or \(\mathrm {d}X=|A|{ }^{-\frac {3}{2}}|B|{ }^{-1}\mathrm {d}Y\) by Theorem 1.6.4. Moreover, given two matrices A 1 and A 2, tr(A 1 A 2) = tr(A 2 A 1) even if A 1 A 2A 2 A 1, as long as the products are defined. By making use of this property, we may write

$$\displaystyle \begin{aligned} \mathrm{tr}(AXBX')&=\mathrm{tr}(A^{\frac{1}{2}}A^{\frac{1}{2}}XB^{\frac{1}{2}}B^{\frac{1}{2}}X')=\mathrm{tr}(A^{\frac{1}{2}}XB^{\frac{1}{2}}B^{\frac{1}{2}}X'A^{\frac{1}{2}})\\ &=\mathrm{tr}[(A^{\frac{1}{2}}XB^{\frac{1}{2}})(A^{\frac{1}{2}}XB^{\frac{1}{2}})']=\mathrm{tr}(YY')\end{aligned} $$

where \(Y=A^{\frac {1}{2}}XB^{\frac {1}{2}}\) and dY  is given above. However, for any real matrix Y , whether square or rectangular, tr(YY′) = tr(Y′Y ) =  the sum of the squares of all the elements of Y . Thus, we have

$$\displaystyle \begin{aligned}\int_X\mathrm{e}^{-\mathrm{tr}(AXBX')}\mathrm{d}X=|A|{}^{-\frac{3}{2}}|B|{}^{-1}\int_Y\mathrm{e}^{-\mathrm{tr}(YY')}\mathrm{d}Y. \end{aligned}$$

Observe that since tr(YY′) is the sum of squares of 6 real scalar variables, the integral over Y  reduces to a multiple integral involving six integrals where each variable is over the entire real line. Hence,

$$\displaystyle \begin{aligned}\int_Y\mathrm{e}^{-\mathrm{tr}(YY')}\mathrm{d}Y=\prod_{j=1}^6\int_{-\infty}^{\infty}\mathrm{e}^{-y_j^2}\mathrm{d}y_j=\prod_{j=1}^6(\sqrt{\pi})=(\sqrt{\pi})^6. \end{aligned}$$

Note that we have denoted the sum of the six \(y_{ij}^2\) as \(y_1^2+\cdots +y_6^2\) for convenience. Thus,

$$\displaystyle \begin{aligned}\int_X\mathrm{e}^{-\mathrm{tr}(AXBX')}\mathrm{d}X=|A|{}^{-\frac{3}{2}}|B|{}^{-1}(\sqrt{\pi})^6.\end{aligned}$$

(3) In this case, X is a 2 × 2 real positive definite matrix. Let X = TT′ where T is lower triangular with positive diagonal elements. Then,

$$\displaystyle \begin{aligned}T=\left[\begin{matrix}t_{11}&0\\ t_{21}&t_{22}\end{matrix}\right],\ t_{11}>0,\ t_{22}>0,\ TT'=\left[\begin{matrix}t_{11}&0\\ t_{21}&t_{22}\end{matrix}\right]\left[\begin{matrix}t_{11}&t_{21}\\ 0&t_{22}\end{matrix}\right]=\left[\begin{matrix}t_{11}^2&t_{11}t_{21}\\ t_{11}t_{21}&t_{21}^2+t_{22}^2\end{matrix}\right], \end{aligned}$$

and \(\mathrm {tr}(TT')=t_{11}^2+(t_{21}^2+t_{22}^2),\ t_{11}>0, \ t_{22}>0,\ -\infty <t_{21}<\infty \). From Theorem 1.6.7, the Jacobian is

$$\displaystyle \begin{aligned}\mathrm{d}X=2^p\{\prod_{j=1}^pt_{jj}^{p+1-j}\}\mathrm{d}T=2^2(t_{11}^2t_{22})\,\mathrm{d}t_{11}\wedge\mathrm{d}t_{21}\wedge\mathrm{d}t_{22}. \end{aligned}$$

Therefore

$$\displaystyle \begin{aligned} \int_{X>O}\mathrm{e}^{-\mathrm{tr}(X)}\mathrm{d}X&=\int_T\mathrm{e}^{-\mathrm{tr}(TT')}[2^2(t_{11}^2t_{22})]\mathrm{d}T\\ &=\Big(\int_{-\infty}^{\infty}\mathrm{e}^{-t_{21}^2}\mathrm{d}t_{21}\Big)\Big(\int_0^{\infty}2t_{11}^2\mathrm{e}^{-t_{11}^2}\mathrm{d}t_{11}\Big)\Big(\int_0^{\infty}2t_{22}\mathrm{e}^{-t_{22}^2}\mathrm{d}t_{22}\Big)\\ &=[\sqrt{\pi}]\,[\varGamma(\frac{3}{2})]\,[\varGamma(1)]=\frac{\pi}{2}.\end{aligned} $$

Example 1.6.3

Let A = A  > O be a constant 2 × 2 Hermitian positive definite matrix. Let \(\tilde {X}\) be a 2 × 1 vector in the complex domain and \(\tilde {X}_2>O\) be a 2 × 2 Hermitian positive definite matrix. Then, evaluate the following integrals: (1): \(\int _{\tilde {X}}\mathrm {e}^{-(\tilde {X}^{*}A\tilde {X})}\mathrm {d}\tilde {X}\); (2): \(\int _{\tilde {X}_2>O}\mathrm {e}^{-\mathrm {tr}(\tilde {X}_2)}\mathrm {d}\tilde {X}_2\).

Solution 1.6.3

(1): Since A = A  > O, there exists a unique Hermitian positive definite square root \(A^{\frac {1}{2}}\). Then,

$$\displaystyle \begin{aligned} \tilde{X}^{*}A\tilde{X}&=\tilde{X}^{*}A^{\frac{1}{2}}A^{\frac{1}{2}}\tilde{X}=\tilde{Y}^{*}\tilde{Y},\\ \tilde{Y}&=A^{\frac{1}{2}}\tilde{X}\Rightarrow \mathrm{d}\tilde{X}=|\mathrm{det}(A)|{}^{-1}\mathrm{d}\tilde{Y}\end{aligned} $$

by Theorem 1.6a.1. But \(\tilde {Y}^{*}\tilde {Y}=|\tilde {y}_1|{ }^2+|\tilde {y}_2|{ }^2\) since \(\tilde {Y}^{*}=(\tilde {y}_1^{*},\tilde {y}_2^{*})\). Since the \(\tilde {y}_j\)’s are scalar in this case, an asterisk means only the complex conjugate, the transpose being itself. However,

$$\displaystyle \begin{aligned}\int_{\tilde{y}_j}\mathrm{e}^{-|\tilde{y}_j|{}^2}\mathrm{d}\tilde{y}_j=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\mathrm{e}^{-(y_{j1}^2+y_{j2}^2)}\mathrm{d}y_{j1}\wedge\mathrm{d}y_{j2}=(\sqrt{\pi})^2=\pi \end{aligned}$$

where \(\tilde {y}_j=y_{j1}+iy_{j2},~ i=\sqrt {(-1)}, y_{j1}\) and y j2 being real. Hence

$$\displaystyle \begin{aligned} \int_{\tilde{X}}\mathrm{e}^{-(\tilde{X}^{*}A\tilde{X})}\mathrm{d}\tilde{X}&=|\mathrm{det}(A)|{}^{-1}\int_{\tilde{Y}}\mathrm{e}^{-(|\tilde{y}_1|{}^2+|\tilde{y}_2|{}^2)}\mathrm{d}\tilde{y}_1\wedge\mathrm{d}\tilde{y}_2\\ &=|\mathrm{det}(A)|{}^{-1}\Big(\prod_{j=1}^2\int_{\tilde{Y}_j}\mathrm{e}^{-|\tilde{y}_j|{}^2}\mathrm{d}\tilde{y}_j\Big)=|\mathrm{det}(A)|{}^{-1}\prod_{j=1}^2\pi\\ &=|\mathrm{det}(A)|{}^{-1}\pi^2.\end{aligned} $$

(2): Make the transformation \(\tilde {X}_2=\tilde {T}\tilde {T}^{*}\) where \(\tilde {T}\) is lower triangular with its diagonal elements being real and positive. That is,

$$\displaystyle \begin{aligned}\tilde{T}=\left[\begin{matrix}t_{11}&0\\ \tilde{t}_{21}&t_{22}\end{matrix}\right],\ \tilde{T}\tilde{T}^{*}=\left[\begin{matrix}t_{11}^2&t_{11}\bar{\tilde{t}}_{21}\\ t_{11}\tilde{t}_{21}&|\tilde{t}_{21}|{}^2+t_{22}^2\end{matrix}\right] \end{aligned}$$

and the Jacobian is \(\mathrm {d}\tilde {X}_2=2^p\{\prod _{j=1}^pt_{jj}^{2(p-j)+1}\}\mathrm {d}\tilde {T}=2^2t_{11}^3t_{22}\,\mathrm {d}\tilde {T}\) by Theorem 1.6a.7. Hence,

$$\displaystyle \begin{aligned}\int_{\tilde{X}_2>O}\mathrm{e}^{-\mathrm{tr}(\tilde{X}_2)}\mathrm{d}\tilde{X}_2=\int_{\tilde{T}}2^2t_{11}^3t_{22}\mathrm{e}^{-(t_{11}^2+t_{22}^2+|\tilde{t}_{21}|{}^2)}\mathrm{d}t_{11}\wedge\mathrm{d}t_{22}\,\wedge\mathrm{d}\tilde{t}_{21}. \end{aligned}$$

But

$$\displaystyle \begin{aligned} 2\int_{t_{11}>0}t_{11}^3\mathrm{e}^{-t_{11}^2}\,\mathrm{d}t_{11}&=\int_{u=0}^{\infty}u\,\mathrm{e}^{-u}\mathrm{d}u=1,\\ 2\int_{t_{22}>0}t_{22}\mathrm{e}^{-t_{22}^2}\mathrm{d}t_{22}&=\int_{v=0}^{\infty}\mathrm{e}^{-v}\mathrm{d}v=1,\\ \int_{\tilde{t}_{21}}\mathrm{e}^{-|\tilde{t}_{21}|{}^2}\mathrm{d}\tilde{t}_{21}&=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\mathrm{e}^{-(t_{211}^2+t_{212}^2)}\mathrm{d}t_{211}\wedge\mathrm{d}t_{212}\\ &=\Big(\int_{-\infty}^{\infty}\mathrm{e}^{-t_{211}^2}\mathrm{d}t_{211}\Big)\Big(\int_{-\infty}^{\infty}\mathrm{e}^{-t_{212}^2}\mathrm{d}t_{212}\Big)\\ &=\sqrt{\pi}\sqrt{\pi}=\pi,\end{aligned} $$

where \(\tilde {t}_{21}=t_{211}+it_{212},\ i=\sqrt {(-1)}\) and t 211, t 212 real. Thus

$$\displaystyle \begin{aligned}\int_{\tilde{X}_2>O}\mathrm{e}^{-\mathrm{tr}(\tilde{X}_2)}\mathrm{d}\tilde{X}_2=\pi.\end{aligned}$$

1.7. Differential Operators

Let

$$\displaystyle \begin{aligned}\frac{\partial}{\partial X}=\left[\begin{matrix}\frac{\partial}{\partial x_1}\\ \vdots\\ \frac{\partial}{\partial x_p}\end{matrix}\right],\ \ \frac{\partial}{\partial X'}=\Big[\frac{\partial}{\partial x_1},\ldots,\frac{\partial}{\partial x_p}\Big],\end{aligned}$$

where x 1, …, x p are distinct real scalar variables, \(\frac {\partial }{\partial X}\) is the partial differential operator and \(\frac {\partial }{\partial X'}\) is the transpose operator. Then, \(\frac {\partial }{\partial X}\frac {\partial }{\partial X'}\) is the configuration of all second order partial differential operators given by

$$\displaystyle \begin{aligned}\frac{\partial}{\partial X}\frac{\partial}{\partial X'}=\left[\begin{matrix}\frac{\partial^2}{\partial x_1^2}&\ldots&\frac{\partial^2}{\partial x_1\partial x_p}\\ \vdots&\ddots&\vdots\\ \frac{\partial^2}{\partial x_p\partial x_1}&\ldots&\frac{\partial^2}{\partial x_p^2}\end{matrix}\right].\end{aligned}$$

Let f(X) be a real-valued scalar function of X. Then, this operator, operating on f, will be defined as

$$\displaystyle \begin{aligned}\frac{\partial}{\partial X}\frac{\partial}{\partial X'}f=\Big(\frac{\partial^2 f}{\partial x_i\partial x_j}\Big),\ i,j=1,\ldots,p.\end{aligned}$$

For example, if \(f=x_1^2+x_1x_2+x_2^3,\) then \(\frac {\partial f}{\partial x_1}=2x_1+x_2, \frac {\partial f}{\partial x_2}=x_1+3x_2^2\), and

$$\displaystyle \begin{aligned}\frac{\partial}{\partial X}\frac{\partial}{\partial X'}f=\left[\begin{matrix}\frac{\partial^2 f}{\partial x_1^2}&\frac{\partial^2 f}{\partial x_1\partial x_2}\\ \frac{\partial^2 f}{\partial x_2\partial x_1}&\frac{\partial^2 f}{\partial x_2^2}\end{matrix}\right]=\left[\begin{matrix}2&1\\ 1&6x_2\end{matrix}\right].\end{aligned}$$

Let f = a 1 x 1 + a 2 x 2 + ⋯ + a p x p = A′X = X′A, A′ = (a 1, …, a p), X′ = (x 1, …, x p) where a 1, …, a p are real constants and x 1, …, x p are distinct real scalar variables. Then \(\frac {\partial f}{\partial x_j}=a_j\) and we have the following result:

Theorem 1.7.1

Let A, X and f be as defined above, where f = a 1 x 1 + ⋯ + a p x p is a linear function of X. Then

$$\displaystyle \begin{aligned}\frac{\partial}{\partial X}f=A.\end{aligned}$$

Letting \(f=X'X=x_1^2+\cdots +x_p^2\), \(\frac {\partial f}{\partial x_j}=2x_j\), and we have the following result.

Theorem 1.7.2

Let X be a p × 1 vector of real scalar variables and let \(f=X'X=x_1^2+\cdots +x_p^2\) . Then

$$\displaystyle \begin{aligned}\frac{\partial f}{\partial X}=2X.\end{aligned}$$

Now, let us consider a general quadratic form f = X′AX, A = A′, where X is a p × 1 vector whose components are real scalar variables and A is a constant matrix. Then \(\frac {\partial f}{\partial x_j}=(a_{j1}x_1+\cdots +a_{jp}x_p)+(a_{1j}x_1+a_{2j}x_2+\cdots +a_{pj}x_p)\) for j = 1, …, p. Hence we have the following result:

Theorem 1.7.3

Let f = X′AX be a real quadratic form where X is a p × 1 real vector whose components are distinct real scalar variables and A is a constant matrix. Then

$$\displaystyle \begin{aligned}\frac{\partial f}{\partial X}=\begin{cases}(A+A')X\mathit{\mbox{ for a general }} A\\ 2AX\mathit{\mbox{ when }}A=A'\end{cases}.\end{aligned}$$
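Theorem 1.7.3 is easily confirmed numerically. The short sketch below (numpy, with an arbitrary non-symmetric constant matrix A) compares the gradient of X′AX obtained by central differences with (A + A′)X:

```python
import numpy as np

rng = np.random.default_rng(5)
p = 4
A = rng.standard_normal((p, p))       # a general (non-symmetric) constant matrix
X = rng.standard_normal(p)

def f(x):
    return x @ A @ x                  # the quadratic form X'AX

# Central-difference gradient of f at X (exact here, since f is quadratic)
h = 1e-6
grad = np.array([(f(X + h * e) - f(X - h * e)) / (2 * h) for e in np.eye(p)])

print(np.allclose(grad, (A + A.T) @ X))   # Theorem 1.7.3: the gradient is (A + A')X
```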

1.7.1. Some basic applications of the vector differential operator

Let X be a p × 1 vector with real scalar elements x 1, …, x p. Let A = (a ij) = A′ be a constant matrix. Consider the problem of optimizing the real quadratic form u = X′AX. There is no unrestricted maximum or minimum: if A = A′ > O (positive definite), u can tend to +∞ and, similarly, if A = A′ < O, u can go to −∞. However, if we confine ourselves to the surface of a unit hypersphere, or equivalently require that X′X = 1, then we can have a finite maximum and a finite minimum. Let u 1 = X′AX − λ(X′X − 1), so that we have only added zero to u and hence u 1 is the same as u on the constraint set, where λ is an arbitrary constant or a Lagrangian multiplier. Then, differentiating u 1 with respect to x 1, …, x p, equating the resulting expressions to zero, and thereafter solving for critical points, is equivalent to solving the single equation \(\frac {\partial u_1}{\partial X}=O\) (null). That is,

$$\displaystyle \begin{aligned}\frac{\partial u_1}{\partial X}=O\Rightarrow 2AX-2\lambda X=O\Rightarrow (A-\lambda I)X=O. \end{aligned} $$
(i)

For (i) to have a non-null solution for X, the coefficient matrix A − λI has to be singular or its determinant must be zero. That is, |A − λI| = 0 and AX = λX or λ is an eigenvalue of A and X is the corresponding eigenvector. But

$$\displaystyle \begin{aligned}AX=\lambda X\Rightarrow X'AX=\lambda X'X=\lambda \mbox{ since }X'X=1. \end{aligned} $$
(ii)

Hence the maximum value of X′AX corresponds to the largest eigenvalue of A and the minimum value of X′AX, to the smallest eigenvalue of A. Observe that when A = A′ the eigenvalues are real. Hence we have the following result:

Theorem 1.7.4

Let u = X′AX, A = A′, where X is a p × 1 vector whose elements are real scalar variables. Then, subject to X′X = 1,

$$\displaystyle \begin{aligned} \max_{X'X=1}[X'AX]&=\lambda_1=\mathit{\mbox{ the largest eigenvalue of }}A\\ \min_{X'X=1}[X'AX]&=\lambda_p=\mathit{\mbox{ the smallest eigenvalue of }}A.\end{aligned} $$
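As a numerical illustration of Theorem 1.7.4 (a sketch under arbitrary illustrative choices of A and the dimension, not part of the text), the extreme values of X′AX on the unit sphere can be compared with the extreme eigenvalues of A obtained from a standard symmetric eigensolver.

```python
# Sketch of Theorem 1.7.4: the extrema of X'AX over X'X = 1 are the extreme
# eigenvalues of the symmetric matrix A, attained at the corresponding eigenvectors.
import numpy as np

rng = np.random.default_rng(1)
p = 5
M = rng.standard_normal((p, p))
A = (M + M.T) / 2                          # a symmetric A

eigvals, eigvecs = np.linalg.eigh(A)       # ascending eigenvalues, orthonormal eigenvectors
x_min, x_max = eigvecs[:, 0], eigvecs[:, -1]

print(x_max @ A @ x_max, eigvals[-1])      # maximum of X'AX on the unit sphere
print(x_min @ A @ x_min, eigvals[0])       # minimum of X'AX on the unit sphere

# random unit vectors stay within the eigenvalue bounds
Z = rng.standard_normal((1000, p))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
vals = np.einsum('ij,jk,ik->i', Z, A, Z)   # X'AX for each row X of Z
print(eigvals[0] - 1e-12 <= vals.min(), vals.max() <= eigvals[-1] + 1e-12)   # True True
```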

Principal Component Analysis, where it is assumed that A > O, relies on this result; this will be elaborated upon in later chapters. Now, we consider the optimization of u = X′AX, A = A′, subject to the condition X′BX = 1, B = B′. Take λ as the Lagrangian multiplier and consider u 1 = X′AX − λ(X′BX − 1). Then

$$\displaystyle \begin{aligned}\frac{\partial u_1}{\partial X}=O\Rightarrow AX=\lambda BX\Rightarrow |A-\lambda B|=0. \end{aligned} $$
(iii)

Note that X′AX = λX′BX = λ from (iii). Hence, the maximum of X′AX is the largest value of λ satisfying (iii) and the minimum of X′AX is the smallest value of λ satisfying (iii). Note that when B is nonsingular, |A − λB| = 0 ⇒ |AB −1 − λI| = 0, that is, λ is an eigenvalue of AB −1. Thus, this case can also be treated as an eigenvalue problem. Hence, the following result:

Theorem 1.7.5

Let u = X′AX, A = A′, where the elements of X are distinct real scalar variables, and consider the problem of optimizing X′AX subject to the condition X′BX = 1, B = B′, where A and B are constant matrices. Then

$$\displaystyle \begin{aligned} \max_{X'BX=1}[X'AX]&=\lambda_1=\mathit{\mbox{ largest eigenvalue of }} AB^{-1},\ |B|\ne 0\\ &= \mathit{\mbox{ the largest root of }}|A-\lambda B|=0;\\ \min_{X'BX=1}[X'AX]&=\lambda_p =\mathit{\mbox{ smallest eigenvalue of }} AB^{-1}, \ |B|\ne 0\\ &=\mathit{\mbox{ the smallest root of }} |A-\lambda B|=0.\end{aligned} $$
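A similar numerical sketch can be given for Theorem 1.7.5; it assumes, for illustration only, that B = B′ > O so that SciPy's generalized symmetric eigensolver applies (the particular matrices and dimension are arbitrary).

```python
# Sketch of Theorem 1.7.5: the extrema of X'AX subject to X'BX = 1 are the
# extreme roots of |A - lambda*B| = 0, i.e. the extreme eigenvalues of AB^{-1}.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
p = 4
M = rng.standard_normal((p, p))
A = (M + M.T) / 2                              # symmetric A
N = rng.standard_normal((p, p))
B = N @ N.T + p * np.eye(p)                    # symmetric positive definite B (assumption)

lam, V = eigh(A, B)                            # roots of |A - lambda*B| = 0, ascending;
                                               # columns of V satisfy V[:, j]' B V[:, j] = 1
x_min, x_max = V[:, 0], V[:, -1]
print(x_max @ A @ x_max, lam[-1])              # maximum of X'AX under X'BX = 1
print(x_min @ A @ x_min, lam[0])               # minimum of X'AX under X'BX = 1

# the same roots appear as eigenvalues of AB^{-1}
print(np.allclose(np.sort(np.linalg.eigvals(A @ np.linalg.inv(B)).real), lam))   # True
```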

Now, consider the optimization of a real quadratic form subject to a linear constraint. Let u = X′AX, A = A′ be a quadratic form where X is p × 1. Let B′X = X′B = 1 be a constraint where B′ = (b 1, …, b p), X′ = (x 1, …, x p) with b 1, …, b p being real constants and x 1, …, x p being real distinct scalar variables. Take 2λ as the Lagrangian multiplier and consider u 1 = X′AX − 2λ(X′B − 1). The critical points are available from the following equation

$$\displaystyle \begin{aligned} \frac{\partial}{\partial X}u_1=O&\Rightarrow 2AX-2\lambda B=O\Rightarrow X=\lambda A^{-1}B, \mbox{ for }|A|\ne 0\\ &\Rightarrow B'X=\lambda B'A^{-1}B\Rightarrow \lambda=\frac{1}{B'A^{-1}B}.\end{aligned} $$

In this problem, observe that the quadratic form is unbounded even under the restriction B′X = 1, and hence there is no maximum; the only critical point corresponds to a minimum. From AX = λB, we have X′AX = λX′B = λ. Hence the minimum value is λ = [B′A −1 B]−1, where it is assumed that A is nonsingular. Thus, we have the following result:

Theorem 1.7.6

Let u = X′AX, A = A′, |A|≠0, and let B′X = 1 where B′ = (b 1, …, b p) is a constant vector and X is a p × 1 vector of distinct real scalar variables. Then, the minimum of the quadratic form u under the restriction B′X = 1 is given by

$$\displaystyle \begin{aligned}\min_{B'X=1}[X'AX]=\frac{1}{B'A^{-1}B}.\end{aligned}$$
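The following sketch illustrates Theorem 1.7.6 numerically; it additionally assumes A = A′ > O (so that the critical point is indeed a minimum), and the particular matrices and vector are arbitrary illustrative choices.

```python
# Sketch of Theorem 1.7.6: min of X'AX subject to B'X = 1 is 1/(B'A^{-1}B),
# attained at X = lambda * A^{-1}B with lambda = 1/(B'A^{-1}B).
import numpy as np

rng = np.random.default_rng(3)
p = 4
M = rng.standard_normal((p, p))
A = M @ M.T + np.eye(p)                  # symmetric positive definite A (assumption)
b = rng.standard_normal(p)               # the constraint vector B

lam = 1.0 / (b @ np.linalg.solve(A, b))  # lambda = 1/(B'A^{-1}B)
x_star = lam * np.linalg.solve(A, b)     # critical point X = lambda * A^{-1}B

print(b @ x_star)                        # 1.0: the constraint B'X = 1 holds
print(x_star @ A @ x_star, lam)          # both equal the minimum value

# feasible perturbations of the critical point never fall below the bound
for _ in range(5):
    d = rng.standard_normal(p)
    d -= (b @ d) / (b @ b) * b           # project so that B'd = 0, keeping B'x = 1
    x = x_star + 0.1 * d
    print(x @ A @ x >= lam - 1e-12)      # True
```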

Such problems arise for instance in regression analysis and model building situations. We could have eliminated one of the variables with the linear constraint; however, the optimization would still involve all other variables, and thus not much simplification would be achieved by eliminating one variable. Hence, operating with the vector differential operator is the most convenient procedure in this case.

We will now consider the mathematical part of a general problem in prediction analysis where some variables are predicted from another set of variables. This topic is related to Canonical Correlation Analysis. We will consider the optimization part of the problem in this section. The problem consists in optimizing a bilinear form subject to quadratic constraints. Let X be a p × 1 vector of real scalar variables x 1, …, x p, and Y  be a q × 1 vector of real scalar variables y 1, …, y q, where q need not be equal to p. Consider the bilinear form u = X′AY  where A is a p × q rectangular constant matrix. We would like to optimize this bilinear form subject to the quadratic constraints X′BX = 1, Y′CY = 1, B = B′ and C = C′ where B and C are constant matrices. In Canonical Correlation Analysis, B and C are constant real positive definite matrices. Take λ 1 and λ 2 as Lagrangian multipliers and let \(u_1=X'AY-\frac {\lambda _1}{2}(X'BX-1)-{\frac {\lambda _2}{2}}(Y'CY-1)\). Then

$$\displaystyle \begin{aligned} \frac{\partial}{\partial X}u_1=O&\Rightarrow AY-\lambda_1BX=O\Rightarrow AY=\lambda_1BX\\ &\Rightarrow X'AY=\lambda_1X'BX=\lambda_1; \end{aligned} $$
(i)
$$\displaystyle \begin{aligned} \frac{\partial}{\partial Y}u_1=O&\Rightarrow A'X-\lambda_2CY=O\Rightarrow A'X={\lambda_2}CY\\ &\Rightarrow Y'A'X=\lambda_2 Y'CY=\lambda_2.\end{aligned} $$
(ii)

Since X′AY is 1 × 1, we have X′AY = Y′A′X, and it then follows from (i) and (ii) that λ 1 = λ 2 = λ, say. After substituting λ for λ 1 and λ 2, we can combine equations (i) and (ii) into a single matrix equation as follows:

$$\displaystyle \begin{aligned} \left[\begin{matrix}-\lambda B&A\\ A'&-\lambda C\end{matrix}\right]\left[\begin{matrix}X\\ Y\end{matrix}\right]&=O\Rightarrow\\ \left\vert\begin{matrix}-\lambda B&A\\ A'&-\lambda C\end{matrix}\right\vert&=0.\end{aligned} $$
(iii)

Opening up the determinant by making use of a result on partitioned matrices from Sect. 1.3, we have

$$\displaystyle \begin{aligned} |-\lambda B|~|-\lambda C-A'(-\lambda B)^{-1}A|&=0,\ |B|\ne 0\Rightarrow\\ |A'B^{-1}A-\lambda^2C|&=0.\end{aligned} $$
(iv)

Then \(\nu =\lambda ^2\) is a root obtained from Eq. (iv). We can also obtain a parallel result by opening up the determinant in (iii) as

$$\displaystyle \begin{aligned}|-\lambda C|~|-\lambda B-A(-\lambda C)^{-1}A'|=0\Rightarrow |AC^{-1}A'-\lambda^2B|=0,\ |C|\ne 0.\end{aligned} $$
(v)

Hence we have the following result.

Theorem 1.7.7

Let X and Y  be respectively p × 1 and q × 1 real vectors whose components are distinct scalar variables. Consider the bilinear form X′AY  and the quadratic forms X′BX and Y ′CY  where B = B′, C = C′, and B and C are nonsingular constant matrices. Then,

$$\displaystyle \begin{aligned} \max_{X'BX=1,Y'CY=1}[X'AY]&=|\lambda_1|\\ \min_{X'BX=1,Y'CY=1}[X'AY]&=|\lambda_p|\end{aligned} $$

where \(\lambda _1^2\) is the largest root resulting from equation (iv) or (v) and \(\lambda _p^2\) is the smallest root resulting from equation (iv) or (v).

Observe that if p < q, we may utilize equation (v) to solve for \(\lambda ^2\), and if q < p, we may use equation (iv) to solve for \(\lambda ^2\); both will lead to the same solution. In the above derivation, we assumed that B and C are nonsingular. In Canonical Correlation Analysis, both B and C are real positive definite matrices corresponding to the variances X′BX and Y′CY of linear forms, and X′AY then corresponds to the covariance between these linear forms.
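As a final numerical sketch (illustrative only; it assumes B > O and C > O, as in Canonical Correlation Analysis, and uses arbitrary dimensions and random matrices), the bound of Theorem 1.7.7 can be checked by sampling feasible pairs (X, Y) and comparing X′AY with the largest root of equation (iv).

```python
# Sketch of Theorem 1.7.7: with B > O and C > O, the maximum of X'AY subject to
# X'BX = 1 and Y'CY = 1 is |lambda_1|, where lambda_1^2 is the largest root of
# |A'B^{-1}A - lambda^2 C| = 0  (equation (iv)).
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
p, q = 3, 5
A = rng.standard_normal((p, q))
Mb = rng.standard_normal((p, p)); B = Mb @ Mb.T + np.eye(p)   # B > O (assumption)
Mc = rng.standard_normal((q, q)); C = Mc @ Mc.T + np.eye(q)   # C > O (assumption)

nu, _ = eigh(A.T @ np.linalg.solve(B, A), C)   # roots nu = lambda^2 of equation (iv), ascending
lam_max = np.sqrt(nu[-1])

# Monte Carlo over feasible pairs: |X'AY| never exceeds lam_max
best = 0.0
for _ in range(10000):
    x = rng.standard_normal(p); x /= np.sqrt(x @ B @ x)   # X'BX = 1
    y = rng.standard_normal(q); y /= np.sqrt(y @ C @ y)   # Y'CY = 1
    best = max(best, abs(x @ A @ y))

print(best <= lam_max + 1e-12, best, lam_max)   # True, with best below lam_max
```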

Note 1.7.1

We have confined ourselves to results in the real domain in this subsection since only real cases are discussed in connection with the applications that are considered in later chapters, such as Principal Component Analysis and Canonical Correlation Analysis. The corresponding complex cases do not appear to have practical applications. Accordingly, optimizations of Hermitian forms will not be discussed. However, parallel results to Theorems 1.7.1–1.7.7 could similarly be worked out in the complex domain.