Generalized Visual Information Analysis via Tensorial Algebra

High-order data are modeled using matrices whose entries are numerical arrays of a fixed size. These arrays, called t-scalars, form a commutative ring under the convolution product. Matrices with elements in the ring of t-scalars are referred to as t-matrices. The t-matrices can be scaled, added and multiplied in the usual way. There are t-matrix generalizations of positive matrices, orthogonal matrices and Hermitian symmetric matrices. With the t-matrix model, it is possible to generalize many well-known matrix algorithms. In particular, the t-matrices are used to generalize the SVD (Singular Value Decomposition), HOSVD (High Order SVD), PCA (Principal Component Analysis), 2DPCA (Two Dimensional PCA) and GCA (Grassmannian Component Analysis). The generalized t-matrix algorithms, namely TSVD, THOSVD, TPCA, T2DPCA and TGCA, are applied to low-rank approximation, reconstruction and supervised classification of images. Experiments show that the t-matrix algorithms compare favorably with standard matrix algorithms.


Introduction
In data analysis, machine learning and computer vision, the data are often given in the form of multi-dimensional arrays of numbers. For example, an RGB image has three dimensions, namely two for the pixel array and a third dimension for the values of the pixels. An RGB image is said to be an array of order three. Alternatively, the RGB image is said to have three modes or to be three-way. A video sequence of images is of order four, with two dimensions for the pixel array, one dimension for time and a fourth dimension for the pixel values.
One way of analyzing multi-dimensional data is to remove the array structure by flattening, to obtain a vector. A set of vectors obtained in this way can be analyzed using standard matrix-vector algorithms such as the singular value decomposition (SVD) and principal components analysis (PCA). An alternative to flattening is to use algorithms that preserve the multi-dimensional structure. In these algorithms, the elements of matrices and vectors are entire arrays rather than real numbers in R or complex numbers in C. Multi-dimensional arrays with the same dimensions can be added in the usual way, but there is no definition of multiplication which satisfies the requirements for a field such as R or C. However, multiplication based on the convolution product has many but not all of the properties of a field. Convolution multiplication differs from the multiplication in a field in that many non-zero elements have no multiplicative inverse. The multi-dimensional arrays with given dimensions form a commutative ring under the convolution product. The elements of this ring are referred to as t-scalars.
Footnote 1: liaoliangis@126.com, sjmaybank@dcs.bbk.ac.uk. arXiv:2001.11708v1 [cs.CV], 31 Jan 2020.
An application of the Fourier transform shows that each ring of t-scalars under the convolution product is isomorphic to a ring of arrays in which the Hadamard product defines the multiplication. In effect, the ring obtained by applying the Fourier transform splits into a product of copies of C. It is this splitting which allows the construction of new algorithms for analyzing tensorial data without flattening. The so-called t-matrices with t-scalar entries have many of the properties of matrices with elements in R or C. In particular, t-matrices can be scaled, added and multiplied. There is an additive identity and a multiplicative identity. The determinant of a t-matrix is defined and a given t-matrix is invertible if and only if it has an invertible determinant. The t-matrices include generalizations of positive matrices, orthogonal matrices and symmetric matrices.
A tensorial version, TSVD, of the SVD is described in [18] and [41]. The TSVD expresses a t-matrix as the product of three t-matrices, of which two are generalizations of the orthogonal matrices and one is a diagonal matrix with positive t-scalars on the diagonal. The TSVD is used to define tensorial versions of principal components analysis (PCA) and two dimensional PCA (2DPCA). A tensorial version of Grassmannian components analysis is also defined. These tensorial algorithms are tested by experiments that include low-rank approximations to tensors, reconstruction of tensors and terrain classification using hyperspectral images. The different algorithms are compared using the peak signal to noise ratio and Cohen's kappa.
The t-scalars are described in Section 2 and the t-matrices are described in Section 3. The TSVD is described in Section 4. A tensorial version of principal components analysis (TPCA) is obtained from the TSVD in Section 5 and then generalized to tensorial two dimensional PCA (T2DPCA). A tensorial version of Grassmannian components analysis (TGCA) is also defined in Section 5. The tensorial algorithms are tested experimentally in Section 6. Some concluding remarks are made in Section 7.
1.1. Related Work. A tensor of order two or more can be simplified using the so-called N-mode singular value decomposition (SVD). The three-mode case is described by Tucker in [30]. The multi-modal case is discussed in detail by De Lathauwer et al. in [6]. Each mode of the tensor has an associated set of vectors, each one of which is obtained by varying the index for the given mode while keeping the indices of the other modes fixed. In the N-mode SVD, an orthonormal basis is obtained for the space spanned by these vectors. In the 2-mode case, the result is the usual SVD. The resulting decomposition of a tensor is referred to as the higher-order SVD (HOSVD). Surveys of tensor decompositions can be found in Kolda and Bader [19] and Sidiropoulos et al. [27]. De Lathauwer et al. [6] describe a higher-order eigenvalue decomposition. Vasilescu and Terzopoulos [34] use the N-mode SVD to simplify a fifth-order tensor constructed from face images taken under varying conditions and with varying expressions. A tensor version of the singular value decomposition is described in [18], [41], and [17].
He et al. [15] sample a hyperspectral data cube to yield tensors of order three of which two orders are for the pixel array and one order is for the hyperspectral bands. A training set of samples is used to produce a dictionary for sparse classification. Lu et al. [23] use N-mode analysis to obtain projections of tensors to a lower-dimensional space. The resulting multilinear PCA is applied to the classification of gait images. Vannieuwenhoven et al. [32] describe a new method for truncating the higher-order SVD, to obtain low-rank multilinear approximations to tensors. The method is tested on the classification of handwritten digits and the compression of a database of face images.
Many authors have studied algebras of matrices in which the elements are tensors of order one, equipped with a convolution multiplication, under which they form a commutative ring R with a multiplicative identity. In particular, Gleich et al. [11] describe the generalized eigenvalues and eigenvectors of matrices with elements in R and show how the standard power method for finding an eigenvector and the standard Arnoldi method for constructing an orthogonal basis for a Krylov subspace can both be generalized. Braman [3] shows that the t-vectors with a given dimension form a free module over R. Kilmer and Martin [18] show that many of the properties and structures of canonical matrices and vectors can be generalized. Their examples include transposition, orthogonality and the singular value decomposition (SVD). The tensor SVD is used to compress tensors. A tensor-based method for image de-blurring is also described. Kilmer et al. [17] generalize the inner product of two vectors, suggest a notion of the angle between two vectors with elements in R, and define a notion of orthogonality for two vectors. A generalization of the Gram-Schmidt method for generating an orthonormal set of vectors is also described in [17].
Zhang et al. [41] use the tensor SVD to store video sequences efficiently and also to fill in missing entries in video sequences. Zhang et al. [39] use a randomized version of the tensor SVD to produce low-rank approximations to matrices. Ren et al. [28] define a tensor version of principal component analysis and use it to extract features from hyperspectral images. The features are classified using standard methods such as support vector machines and nearest neighbors. Liao et al. [20] generalize a sparse representation classifier to tensor data and apply the generalized classifier to image data such as numerals and faces. Chen et al. [4] use a four-dimensional HOSVD to detect changes in a time sequence of hyperspectral images. The K-means clustering algorithm is used to classify the pixel values as changed or unchanged. Fan et al. [8] model a hyperspectral image as the sum of an ideal image, a sparse noise term and a Gaussian noise term. A product of two low-rank tensors models the ideal image. The low-rank tensors are estimated by minimizing a penalty function obtained by adding the squared errors in a fit of the hyperspectral image to penalty terms for the sparse noise and the sizes of the two low-rank tensors. Lu et al. [22] approximate a third-order tensor using the sum of a low-rank tensor and a sparse tensor. Under suitable conditions, the low-rank tensor and the sparse tensor are recovered exactly.

T-scalars
The notations for t-scalars are summarized in Section 2.1. Basic definitions are given in Section 2.2. The Fourier transform of a t-scalar is defined in Section 2.3. Properties of t-scalars and the Fourier transform of a t-scalar are described in Section 2.4. A generalization of t-matrices to arrays of t-scalars of arbitrary order (g-tensors) is described in Section 3.5.

2.1. Notations and Preliminaries. An array of order $N$ over the complex numbers $\mathbb{C}$ is an element of the set $\mathcal{C}$ defined by $\mathcal{C} \equiv \mathbb{C}^{I_1\times\cdots\times I_N}$, where the $I_n$ for $1 \le n \le N$ are strictly positive integers. Similarly, an array of order $N$ over the real numbers is an element of the set $\mathcal{R}$ defined by $\mathcal{R} \equiv \mathbb{R}^{I_1\times\cdots\times I_N}$. The sets $\mathcal{R}$ and $\mathcal{C}$ have the structure of commutative rings, in which the product is defined by circular convolution. The elements of $\mathcal{C}$ and $\mathcal{R}$ are referred to as t-scalars.
Elements of $\mathbb{R}$ and $\mathbb{C}$ are denoted by lower case letters and tensorial data are denoted by upper case letters. The t-scalars are identified using the subscript $T$, for example, $X_T$. Lower case subscripts such as $i, j, \alpha, \beta$ are indices or lists of indices.
All indices begin from 1 rather than 0. Given an array of any order $N$, namely $X \in \mathbb{C}^{I_1\times\cdots\times I_N}$, its entries are denoted by $X_{i_1,\dots,i_N}$. The notation $X_i$, or $(X)_i$, is also used, where $i$ is a multi-index defined by $i = (i_1,\dots,i_N)$. Let $I = (I_1, I_2, \dots, I_N)$ and let $i$ be a multi-index. The notation $1 \le i \le I$ specifies the range of values of $i$ such that $1 \le i_n \le I_n$ for $1 \le n \le N$. It is often convenient to extend the indexing beyond the range specified by $I$. Let $j$ be a general multi-index. Then $X_j$ is defined by $X_j = X_i$, where $i$ is the multi-index such that each component $i_n$ is in the range $1 \le i_n \le I_n$ and $i_n - j_n$ is divisible by $I_n$. A multi-index such as $i - j + 1$ has components $i_n - j_n + 1$ for $1 \le n \le N$. The sum $\sum_{i=1}^{I}(\cdot)$ is an abbreviation for $\sum_{i_1=1}^{I_1}\cdots\sum_{i_N=1}^{I_N}(\cdot)$.

2.2. Definitions. The following definitions are for t-scalars in $\mathcal{C}$. Similar definitions can be made for t-scalars in $\mathcal{R}$.
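Definition 2.1. T-scalar addition. Given t-scalars $X_T$ and $Y_T$ in $\mathcal{C}$, their sum, denoted by $C_T = X_T + Y_T$, is the t-scalar in $\mathcal{C}$ defined by $C_{T,i} = X_{T,i} + Y_{T,i}$ for $1 \le i \le I$.
Definition 2.2. T-scalar multiplication. Given t-scalars $X_T$ and $Y_T$ in $\mathcal{C}$, their product, denoted by $D_T = X_T \bullet Y_T$, is the t-scalar in $\mathcal{C}$ defined by the circular convolution
$$D_{T,i} = \sum_{j=1}^{I} X_{T,j}\, Y_{T,i-j+1}, \quad 1 \le i \le I.$$
Definitions 2.1 and 2.2 reduce to complex number addition and multiplication when $N = 1$ and $I_1 = 1$.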
Definition 2.3. Zero t-scalar. The zero t-scalar $Z_T$ is the array in $\mathcal{C}$ defined by $Z_{T,i} = 0$ for $1 \le i \le I$.
Definition 2.4. Identity t-scalar. The identity t-scalar $E_T$ in $\mathcal{C}$ has the first entry equal to 1 and all other entries equal to 0, namely, $E_{T,i} = 1$ if $i = (1,\dots,1)$ and $E_{T,i} = 0$ otherwise.
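The definitions above can be made concrete with a minimal NumPy sketch, assuming a t-scalar is stored as a NumPy array of shape $(I_1,\dots,I_N)$ with 0-based indices. The function names are illustrative and not taken from the paper. The product evaluates the circular convolution of Definition 2.2 directly in the spatial domain by accumulating $X_{T,j}$ times a cyclically shifted copy of $Y_T$.

```python
import numpy as np


def tscalar_add(X, Y):
    """Definition 2.1: entrywise sum of two t-scalars of the same shape."""
    return X + Y


def tscalar_mul(X, Y):
    """Definition 2.2: circular convolution D_i = sum_j X_j * Y_{i-j+1}.

    With 0-based indices the shifted index becomes (i - j) mod I_n in every
    mode n; np.roll(Y, j)[i] is exactly Y[(i - j) mod I].
    """
    if X.shape != Y.shape:
        raise ValueError("t-scalars must have the same dimensions")
    axes = tuple(range(X.ndim))
    D = np.zeros(X.shape, dtype=np.result_type(X, Y))
    for j in np.ndindex(*X.shape):
        D += X[j] * np.roll(Y, shift=j, axis=axes)
    return D


def identity_tscalar(shape):
    """Definition 2.4: first entry equal to 1, all other entries equal to 0."""
    E = np.zeros(shape)
    E[(0,) * len(shape)] = 1.0
    return E


rng = np.random.default_rng(0)
X, Y = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
print(np.allclose(tscalar_mul(X, Y), tscalar_mul(Y, X)))         # commutativity
print(np.allclose(tscalar_mul(X, identity_tscalar((3, 4))), X))  # E_T is the identity
```

The cost of this direct product is quadratic in the number of t-scalar entries, which is one reason the Fourier-domain route of Section 2.3 is preferable in practice.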
The set of t-scalars satisfies the axioms of a commutative ring with $Z_T$ as an additive identity and $E_T$ as a multiplicative identity. This ring of t-scalars is denoted by $(\mathcal{C}, +, \bullet)$.
The ring $(\mathcal{C}, +, \bullet)$ is a generalization of the field $(\mathbb{C}, +, \cdot)$ of complex numbers. If the t-scalars are restricted to have real number elements, then the ring $(\mathcal{R}, +, \bullet)$ is obtained.
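2.3. Fourier Transform of a T-scalar. Let $\zeta_n$ be a primitive $I_n$-th root of unity, for example, $\zeta_n = e^{2\pi\sqrt{-1}/I_n}$. Let $\bar{\zeta}_n$ be the complex conjugate of $\zeta_n$ and let $X_T$ be a t-scalar in the ring $\mathcal{C}$. The Fourier transform $F(X_T)$ of $X_T$ is defined by
$$F(X_T)_i = \sum_{j=1}^{I} X_{T,j} \prod_{n=1}^{N} \bar{\zeta}_n^{\,(i_n-1)(j_n-1)}, \quad 1 \le i \le I.$$
The inverse of the Fourier transform is defined by
$$X_{T,i} = \frac{1}{I_1 I_2 \cdots I_N} \sum_{j=1}^{I} F(X_T)_j \prod_{n=1}^{N} \zeta_n^{\,(i_n-1)(j_n-1)}, \quad 1 \le i \le I.$$
Given t-scalars $X_T \in \mathcal{C}$ and $Y_T \in \mathcal{C}$ and their t-scalar product $D_T = X_T \bullet Y_T$, it follows that
$$F(D_T) = F(X_T) * F(Y_T), \qquad (2.4)$$
where $*$ denotes the Hadamard product in $\mathcal{C}$. Equation (2.4) is an extension of the convolution theorem [2]. The equation can be equivalently rewritten as
$$F(D_T)_i = F(X_T)_i \cdot F(Y_T)_i, \quad 1 \le i \le I,$$
where $\cdot$ is multiplication in $\mathbb{C}$.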
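Equation (2.4) can be checked numerically. This sketch assumes $\zeta_n = e^{2\pi\sqrt{-1}/I_n}$, so that the forward transform uses powers of $\bar{\zeta}_n$; that is the kernel used by `numpy.fft.fftn`, whose inverse `numpy.fft.ifftn` includes the factor $1/(I_1\cdots I_N)$. The helper names are hypothetical.

```python
import numpy as np


def tscalar_mul_fft(X, Y):
    """T-scalar product computed in the Fourier domain, as in equation (2.4)."""
    D = np.fft.ifftn(np.fft.fftn(X) * np.fft.fftn(Y))  # Hadamard product of the transforms
    # The product of real t-scalars is real; drop the rounding-level imaginary part.
    return D.real if np.isrealobj(X) and np.isrealobj(Y) else D


def tscalar_mul_direct(X, Y):
    """Reference spatial-domain circular convolution (Definition 2.2), 0-based indices."""
    axes = tuple(range(X.ndim))
    D = np.zeros(X.shape, dtype=complex)
    for j in np.ndindex(*X.shape):
        D += X[j] * np.roll(Y, shift=j, axis=axes)
    return D


rng = np.random.default_rng(1)
X, Y = rng.standard_normal((3, 4, 2)), rng.standard_normal((3, 4, 2))
print(np.allclose(tscalar_mul_fft(X, Y), tscalar_mul_direct(X, Y)))
```

The Fourier route costs one transform per operand plus an entrywise product, instead of a double loop over all entries.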
An equivalent definition of the Fourier transform of a high-order array in the form of multi-mode tensor multiplication and a diagram of the multiplication of two t-scalars, computed in the Fourier domain, is given in a supplementary file.
It is not difficult to prove that $\mathcal{C}$ is a commutative ring, $(\mathcal{C}, +, *)$, under the Hadamard product. The Fourier transform is a ring isomorphism from $(\mathcal{C}, +, \bullet)$ to $(\mathcal{C}, +, *)$. The identity element of $(\mathcal{C}, +, *)$ is $J_T = F(E_T)$. All the entries of $J_T$ are equal to 1.

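2.4. Properties of t-scalars. The invertible t-scalars are defined as follows.
Definition 2.5. Invertible t-scalar. A t-scalar $X_T \in \mathcal{C}$ is said to be invertible if there exists a t-scalar $Y_T \in \mathcal{C}$ such that $X_T \bullet Y_T = E_T$. The t-scalar $Y_T$ is the multiplicative inverse of $X_T$ and is denoted by $X_T^{-1}$. By equation (2.4), $X_T$ is invertible if and only if every entry of $F(X_T)$ is non-zero.
The zero t-scalar $Z_T$ is non-invertible. In addition, there are infinitely many non-zero t-scalars that are non-invertible. For example, if $I_1 I_2 \cdots I_N > 1$ and the entries of $X_T \in \mathcal{C}$ are all equal, then $X_T$ is non-invertible, because every entry of $F(X_T)$ except the first is equal to 0. The existence of non-zero non-invertible elements shows that $\mathcal{C}$ is not a field.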
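Because the Fourier transform is a ring isomorphism, invertibility can be tested, and inverses computed, entrywise in the Fourier domain. A sketch under the same array conventions as above; the function names are illustrative and the tolerance `tol` is an arbitrary numerical choice, not part of the definition.

```python
import numpy as np


def is_invertible(X, tol=1e-12):
    """X_T is invertible iff every entry of F(X_T) is non-zero (here: above tol)."""
    return bool(np.all(np.abs(np.fft.fftn(X)) > tol))


def tscalar_inverse(X, tol=1e-12):
    """Multiplicative inverse X_T^{-1}: invert each Fourier component."""
    if not is_invertible(X, tol):
        raise ValueError("t-scalar is not invertible")
    Xinv = np.fft.ifftn(1.0 / np.fft.fftn(X))
    return Xinv.real if np.isrealobj(X) else Xinv


# A t-scalar whose entries are all equal has F(X_T) = 0 except in the first entry.
print(is_invertible(np.full((2, 3), 2.0)))
```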
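Definition 2.6. Scalar multiplication of a t-scalar. Given a scalar $\lambda \in \mathbb{C}$ and a t-scalar $X_T \in \mathcal{C}$, their product, denoted by $Y_T = \lambda \cdot X_T \equiv X_T \cdot \lambda$, is the t-scalar given by $Y_{T,i} = \lambda X_{T,i}$ for $1 \le i \le I$. It can be shown that the set of t-scalars is a vector space over $\mathbb{C}$.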
The following definition of the conjugate of a t-scalar generalizes the conjugate of a complex number.
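Definition 2.7. Conjugate of a t-scalar. Given a t-scalar $X_T$ in $\mathcal{C}$, its conjugate, denoted by $\mathrm{conj}(X_T)$, is the t-scalar in $\mathcal{C}$ such that
$$\mathrm{conj}(X_T)_i = \overline{X_{T,2-i}}, \quad 1 \le i \le I, \qquad (2.7)$$
where $\overline{X_{T,2-i}}$ is the complex conjugate of $X_{T,2-i}$ in $\mathbb{C}$ and the multi-index $2 - i$ is interpreted cyclically, as in Section 2.1.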
The conjugate of a t-scalar reduces to the conjugate of a complex number when $N = 1$, $I_1 = 1$. The relationship of $\mathrm{conj}(X_T)$ and $X_T$ is much clearer if they are mapped to the Fourier domain: each entry of $F(\mathrm{conj}(X_T))$ is the complex conjugate of the corresponding entry of $F(X_T)$, namely
$$F(\mathrm{conj}(X_T))_i = \overline{F(X_T)_i}, \quad 1 \le i \le I.$$
It follows from equation (2.7) that $\mathrm{conj}(\mathrm{conj}(X_T)) = X_T$ for any $X_T \in \mathcal{C}$.
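In code, Definition 2.7 is an index reversal: with 0-based indices the 1-based index $2 - i$ becomes $(-i) \bmod I_n$ in each mode. The sketch below (hypothetical helper names) implements this with `np.flip` and `np.roll`, and tests self-conjugacy through the Fourier characterization (2.9).

```python
import numpy as np


def tscalar_conj(X):
    """Definition 2.7: conj(X_T)_i is the complex conjugate of X_{T,2-i}.

    With 0-based indices, 2 - i becomes (-i) mod I_n: flip every mode, then
    roll by one so that index 0 is mapped to itself.
    """
    axes = tuple(range(X.ndim))
    return np.conj(np.roll(np.flip(X, axis=axes), shift=(1,) * X.ndim, axis=axes))


def is_self_conjugate(X, tol=1e-10):
    """Equation (2.9): X_T is self-conjugate iff all entries of F(X_T) are real."""
    return bool(np.all(np.abs(np.fft.fftn(X).imag) <= tol))


rng = np.random.default_rng(2)
X = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
# Each Fourier entry of conj(X_T) is the complex conjugate of the entry of F(X_T).
print(np.allclose(np.fft.fftn(tscalar_conj(X)), np.conj(np.fft.fftn(X))))
```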
Definition 2.8. Self-conjugate t-scalar: Given a t-scalar $X_T \in \mathcal{C}$, if $X_T = \mathrm{conj}(X_T)$, then $X_T$ is said to be a self-conjugate t-scalar.
If $X_T$ is self-conjugate, then
$$F(X_T)_i = \overline{F(X_T)_i}, \quad 1 \le i \le I. \qquad (2.9)$$
It follows from equation (2.9) that $X_T$ is self-conjugate if and only if all the elements of $F(X_T)$ are real numbers.
The t-scalars $Z_T$ and $E_T$ are both self-conjugate. Furthermore, the self-conjugate t-scalars form a ring denoted by $\mathcal{C}_{sc}$. This ring is a subring of $\mathcal{C}$.
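Given any t-scalar $X_T \in \mathcal{C}$, let $\Re(X_T)$ and $\Im(X_T)$ be defined by
$$\Re(X_T) = \tfrac{1}{2}\big(X_T + \mathrm{conj}(X_T)\big), \qquad \Im(X_T) = \tfrac{1}{2\sqrt{-1}}\big(X_T - \mathrm{conj}(X_T)\big).$$
It follows from equation (2.9) that $\Re(X_T)$ and $\Im(X_T)$ are self-conjugate. The t-scalars $X_T \in \mathcal{C}$ and $\mathrm{conj}(X_T) \in \mathcal{C}$ can be expressed in the form
$$X_T = \Re(X_T) + \sqrt{-1}\cdot\Im(X_T), \qquad \mathrm{conj}(X_T) = \Re(X_T) - \sqrt{-1}\cdot\Im(X_T).$$
In an analogy with the real and imaginary parts of a complex number, $\Re(X_T)$ is called the real part of $X_T$ and $\Im(X_T)$ is called the imaginary part of $X_T$.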
Given two t-scalars X T and Y T , the equations (2.14) hold true and are backward compatible with the corresponding equations for complex numbers.
Definition 2.9. Nonnegative t-scalar: The t-scalar $X_T$ is said to be nonnegative if there exists a self-conjugate t-scalar $Y_T$ such that
$$X_T = Y_T \bullet Y_T = Y_T^2.$$
If a t-scalar $X_T$ is nonnegative, it is also self-conjugate, because the product of any two self-conjugate t-scalars is also a self-conjugate t-scalar. Both $Z_T$ and $E_T$ are nonnegative, since $Z_T$ and $E_T$ are self-conjugate t-scalars and satisfy $Z_T = Z_T^2$ and $E_T = E_T^2$. Furthermore, for all $X_T \in \mathcal{C}$, the ring element $\Re(X_T)^2 + \Im(X_T)^2$ is nonnegative.
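The set $S_{nonneg}$ of nonnegative t-scalars is closed under t-scalar addition and multiplication. Since a nonnegative t-scalar is also a self-conjugate t-scalar, $S_{nonneg} \subset \mathcal{C}_{sc} \subset \mathcal{C}$.
Theorem 2.10. For all t-scalars $X_T \in S_{nonneg}$, there exists a unique t-scalar $S_T \in S_{nonneg}$ satisfying $X_T = S_T \bullet S_T = S_T^2$. We call the nonnegative t-scalar $S_T$ the arithmetic square root of the nonnegative t-scalar $X_T$ and denote it by $\sqrt{X_T}$.
Proof. If $X_T = Y_T \bullet Y_T$ with $Y_T$ self-conjugate, then on applying the Fourier transform it follows that $F(X_T)_i = F(Y_T)_i^2 \ge 0$ for $1 \le i \le I$, because each $F(Y_T)_i$ is real by equation (2.9). Conversely, if every $F(X_T)_i$ is real and nonnegative, then $X_T$ is the square of the self-conjugate t-scalar whose Fourier components are $\sqrt{F(X_T)_i}$. Thus a t-scalar is nonnegative if and only if all the entries of its Fourier transform are real and nonnegative. Let $S_T$ be defined such that
$$F(S_T)_i = \sqrt{F(X_T)_i}, \quad 1 \le i \le I,$$
where the nonnegative square root is chosen for each value of $i$. The Fourier components $F(S_T)_i$ are real and nonnegative, thus $S_T \in S_{nonneg}$. The equation $X_T = S_T \bullet S_T$ holds because the Fourier transform is injective. If $S'_T \in S_{nonneg}$ also satisfies $X_T = S'_T \bullet S'_T$, then each $F(S'_T)_i$ is a nonnegative square root of $F(X_T)_i$, so $F(S'_T) = F(S_T)$ and $S'_T = S_T$.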
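The proof of Theorem 2.10 is constructive, so the arithmetic square root can be computed directly from the Fourier characterization of nonnegativity. In this sketch the function names are illustrative and `tol` is an assumed numerical threshold for deciding that a Fourier component is real and nonnegative.

```python
import numpy as np


def is_nonnegative(X, tol=1e-10):
    """X_T is nonnegative iff every entry of F(X_T) is real and >= 0 (up to tol)."""
    F = np.fft.fftn(X)
    return bool(np.all(np.abs(F.imag) <= tol) and np.all(F.real >= -tol))


def tscalar_sqrt(X, tol=1e-10):
    """Arithmetic square root (Theorem 2.10): nonnegative root of each F(X_T)_i."""
    if not is_nonnegative(X, tol):
        raise ValueError("t-scalar is not nonnegative")
    # Clip rounding-level negatives before taking the real square root.
    S = np.fft.ifftn(np.sqrt(np.clip(np.fft.fftn(X).real, 0.0, None)))
    return S.real if np.isrealobj(X) else S
```

Note that $\sqrt{X_T}$ is in general different from the self-conjugate $Y_T$ used to build $X_T = Y_T^2$, since $F(Y_T)$ may have negative entries.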
Definition 2.11. A nonnegative t-scalar that is invertible under multiplication is called a positive t-scalar. The set of positive t-scalars is denoted by $S_{pos}$.
The following inclusions are strict: $S_{pos} \subset S_{nonneg} \subset \mathcal{C}_{sc} \subset \mathcal{C}$. The inverse and the arithmetic square root of a positive t-scalar are positive.
The absolute t-value $r(X_T)$ of $X_T$ is defined by
$$r(X_T) = \sqrt{\Re(X_T)^2 + \Im(X_T)^2}. \qquad (2.16)$$
The t-scalars $\Re(X_T)$ and $\Im(X_T)$ are both self-conjugate, therefore $\Re(X_T)^2$ and $\Im(X_T)^2$ are both nonnegative. The sum $\Re(X_T)^2 + \Im(X_T)^2$ is nonnegative and it has a nonnegative arithmetic square root, namely $r(X_T)$.
If $r(X_T)$ is invertible, then let $\varphi(X_T)$ be defined by
$$\varphi(X_T) = r(X_T)^{-1} \bullet X_T, \quad \text{so that} \quad X_T = r(X_T) \bullet \varphi(X_T). \qquad (2.17)$$
The ring element $\varphi(X_T)$ is a generalized angle. The order 1 version of $\varphi(X_T)$ is obtained by Gleich et al. in [11]. Equation (2.17) generalizes the polar form of a complex number. It can be shown that $\varphi(X_T) \bullet \mathrm{conj}(\varphi(X_T)) = E_T$. The absolute t-value $r(X_T)$ is used in Section 3 to define a generalization of the Frobenius norm for t-matrices.
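With $\Re(X_T) = (X_T + \mathrm{conj}(X_T))/2$ and $\Im(X_T) = (X_T - \mathrm{conj}(X_T))/(2\sqrt{-1})$, the Fourier components of the real and imaginary parts are the real and imaginary parts of $F(X_T)_i$. Hence $F(r(X_T))_i = |F(X_T)_i|$ and $F(\varphi(X_T))_i = F(X_T)_i / |F(X_T)_i|$. A sketch of the polar form on that basis, with illustrative function names:

```python
import numpy as np


def t_abs(X):
    """Absolute t-value r(X_T), equation (2.16): F(r(X_T))_i = |F(X_T)_i|."""
    R = np.fft.ifftn(np.abs(np.fft.fftn(X)))
    return R.real if np.isrealobj(X) else R


def t_angle(X, tol=1e-12):
    """Generalized angle phi(X_T) = r(X_T)^{-1} • X_T, equation (2.17)."""
    F = np.fft.fftn(X)
    if np.any(np.abs(F) <= tol):
        raise ValueError("r(X_T) is not invertible")
    P = np.fft.ifftn(F / np.abs(F))  # every Fourier component has modulus 1
    return P.real if np.isrealobj(X) else P
```

Since every Fourier component of $\varphi(X_T)$ has unit modulus, the identity $\varphi(X_T) \bullet \mathrm{conj}(\varphi(X_T)) = E_T$ follows entrywise.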

Matrices with T-Scalar Elements
In this section it is shown that t-matrices, i.e. matrices with elements in the rings $\mathcal{C}$ or $\mathcal{R}$, are in many ways analogous to matrices with elements in $\mathbb{C}$ or $\mathbb{R}$.

3.1. Indexing. The t-matrices are order-two arrays of t-scalars. Since the t-scalars are arrays of complex numbers, it is convenient to organize t-matrices as hierarchical arrays of complex numbers.
Let $X_{TM}$ be a t-matrix with $D_1$ rows and $D_2$ columns. Then $X_{TM}$ is an element of $\mathcal{C}^{D_1\times D_2}$. The $(\alpha, \beta)$ entry of $X_{TM}$ is the element of $\mathcal{C}$ denoted by $X_{TM,\alpha,\beta}$ for $1 \le \alpha \le D_1$ and $1 \le \beta \le D_2$. Let $i$ be a multi-index for elements of $\mathcal{C}$. Then $X_{TM,i,\alpha,\beta}$ is the element of $\mathbb{C}$ given as the $i$-th entry of the ring element $X_{TM,\alpha,\beta}$.
The t-matrix $X_{TM}$ can be interpreted as an element in $\mathbb{C}^{I_1\times\cdots\times I_N\times D_1\times D_2}$ or alternatively as an element in $\mathbb{C}^{D_1\times D_2\times I_1\times\cdots\times I_N}$. The only thing needed to switch from one data structure to the other is a permutation of indices. The data structure $\mathbb{C}^{I_1\times\cdots\times I_N\times D_1\times D_2}$ is chosen unless otherwise indicated.
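(2) T-matrix multiplication: Given any t-matrices $A_{TM} \in \mathcal{C}^{D_1\times Q}$ and $B_{TM} \in \mathcal{C}^{Q\times D_2}$, their product, denoted by $C_{TM} = A_{TM} B_{TM}$, is the t-matrix in $\mathcal{C}^{D_1\times D_2}$ defined by
$$C_{TM,\alpha,\beta} = \sum_{q=1}^{Q} A_{TM,\alpha,q} \bullet B_{TM,q,\beta}, \quad 1 \le \alpha \le D_1,\ 1 \le \beta \le D_2.$$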
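(3) Identity t-matrix: The identity t-matrix is the diagonal t-matrix in which each diagonal entry is equal to the identity t-scalar $E_T$ in Definition 2.4 and each off-diagonal entry is equal to $Z_T$. The $D \times D$ identity t-matrix is denoted by $I_{TM}^{(D)}$. Given any $X_{TM} \in \mathcal{C}^{D_1\times D_2}$, it follows that $I_{TM}^{(D_1)} X_{TM} = X_{TM} I_{TM}^{(D_2)} = X_{TM}$. The t-matrix $I_{TM}^{(D)}$ is also denoted by $I_{TM}$ if the value of $D$ can be inferred from context.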
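(4) Scalar multiplication: Given any $A_{TM} \in \mathcal{C}^{D_1\times D_2}$ and $\lambda \in \mathbb{C}$, their product, denoted by $B_{TM} = \lambda \cdot A_{TM}$, is the t-matrix in $\mathcal{C}^{D_1\times D_2}$ defined by $B_{TM,\alpha,\beta} = \lambda \cdot A_{TM,\alpha,\beta}$, where the products with $\lambda$ are computed as in Definition 2.6.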
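(5) T-scalar multiplication: Given any $A_{TM} \in \mathcal{C}^{D_1\times D_2}$ and $\lambda_T \in \mathcal{C}$, their product, denoted by $B_{TM} = \lambda_T \bullet A_{TM}$, is the t-matrix in $\mathcal{C}^{D_1\times D_2}$ defined by $B_{TM,\alpha,\beta} = \lambda_T \bullet A_{TM,\alpha,\beta}$.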
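(6) Conjugate transpose of a t-matrix: Given any t-matrix $X_{TM} \in \mathcal{C}^{D_1\times D_2}$, its conjugate transpose, denoted by $X_{TM}^H$, is the t-matrix in $\mathcal{C}^{D_2\times D_1}$ given by $(X_{TM}^H)_{\beta,\alpha} = \mathrm{conj}(X_{TM,\alpha,\beta})$. The Fourier transform $F$ is extended to t-matrices element-wise, i.e. $F(X_{TM})$ is the $D_1 \times D_2$ t-matrix defined by
$$F(X_{TM})_{\alpha,\beta} = F(X_{TM,\alpha,\beta}). \qquad (3.1)$$
It is not difficult to prove that $F(X_{TM}^H)_{i,\beta,\alpha} = \overline{F(X_{TM})_{i,\alpha,\beta}}$ for all $i$, $\alpha$ and $\beta$.
(7) T-vector dot product and the Frobenius norm: Given any two t-vectors (i.e., two t-matrices, each having only one column) $X_{TV}$ and $Y_{TV}$ of the same length $D$, their dot product is the t-scalar defined by
$$\langle X_{TV}, Y_{TV}\rangle = X_{TV}^H Y_{TV} = \sum_{\alpha=1}^{D} \mathrm{conj}(X_{TV,\alpha}) \bullet Y_{TV,\alpha}.$$
If $\langle X_{TV}, Y_{TV}\rangle = Z_T$, then $X_{TV}$ and $Y_{TV}$ are said to be orthogonal. The nonnegative t-scalar $\sqrt{\langle X_{TV}, X_{TV}\rangle}$ is called the generalized norm of $X_{TV}$ and denoted by
$$\|X_{TV}\|_F = \sqrt{\langle X_{TV}, X_{TV}\rangle} = \sqrt{\textstyle\sum_{\alpha=1}^{D} r(X_{TV,\alpha})^2},$$
where $r(\cdot)$ is the absolute t-value as defined by equation (2.16). The generalized Frobenius norm of a $D_1 \times D_2$ t-matrix $W_{TM}$ is defined by
$$\|W_{TM}\|_F = \sqrt{\textstyle\sum_{\alpha=1}^{D_1}\sum_{\beta=1}^{D_2} r(W_{TM,\alpha,\beta})^2}. \qquad (3.3)$$
In order to have a mechanism to connect t-matrices with matrices with elements in $\mathbb{C}$ or $\mathbb{R}$, the slices of a t-matrix are defined as follows.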
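(8) Slice of a t-matrix: Any t-matrix $X_{TM} \in \mathcal{C}^{D_1\times D_2}$, organized as an array in $\mathbb{C}^{I_1\times\cdots\times I_N\times D_1\times D_2}$, has $I_1 I_2 \cdots I_N$ slices. For $1 \le i \le I$, the $i$-th slice is the matrix $X_{TM,i,:,:} \in \mathbb{C}^{D_1\times D_2}$ whose $(\alpha, \beta)$ entry is $X_{TM,i,\alpha,\beta}$.
The t-vectors with a given dimension form an algebraic structure called a module over the ring $\mathcal{C}$ [16]. Modules are generalizations of vector spaces [17]. The t-vector whose entries are all equal to $Z_T$ is denoted by $Z_{TV}$ and called the zero t-vector. The next step is to define what is meant by a set of linearly independent t-vectors and what is meant by a full column rank t-matrix.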
(9) Linear independence in a t-vector module: The t-vectors in a subset $\{X_{TV,1}, X_{TV,2}, \dots, X_{TV,K}\}$ of a t-vector module are said to be linearly independent if the equation $\sum_{k=1}^{K} \lambda_{T,k} \bullet X_{TV,k} = Z_{TV}$ holds only for $\lambda_{T,1} = \cdots = \lambda_{T,K} = Z_T$. If the t-vectors $X_{TV,i}$, $1 \le i \le K$, are linearly independent then they are said to have a rank of $K$. If the t-vectors $Y_{TV,i}$ for $1 \le i \le K'$ are linearly independent and span the same sub-module as the $X_{TV,i}$, then $K = K'$. For further information see [16].
(10) Full column rank t-matrix: A t-matrix is said to be of full column rank if all its column t-vectors are linearly independent.
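3.3. T-matrix Analysis via the Fourier Transform. The Fourier transform of the t-matrix $X_{TM} \in \mathcal{C}^{D_1\times D_2}$ is the t-matrix in $\mathcal{C}^{D_1\times D_2}$ given by equation (3.1).
Many t-matrix computations can be carried out efficiently using the Fourier transform. For example, any multiplication can be decomposed into $\prod_{n=1}^{N} I_n$ matrix multiplications over the complex numbers, namely
$$F(A_{TM} B_{TM})_{i,:,:} = F(A_{TM})_{i,:,:}\, F(B_{TM})_{i,:,:}, \qquad F(I_{TM}^{(D)})_{i,:,:} = \mathbb{I}_D, \qquad 1 \le i \le I,$$
where $\mathbb{I}_D$ is the canonical $D \times D$ identity matrix with elements in $\mathbb{C}$.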
The Fourier transform decomposes a t-matrix computation such as multiplication into $\prod_{n=1}^{N} I_n$ independent complex matrix computations in the Fourier domain. The $i$-th ($1 \le i \le I$) computation involves only the $i$-th slices of the associated t-matrices. This fact underlies an approach for speeding up t-matrix algorithms using parallel computations: the independence of the data in the Fourier domain makes it possible to implement parallel computing using so-called vectorization programming (also known as array programming), which is supported by many programming languages and libraries, including MATLAB, R, NumPy, Julia and Fortran.
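As an example of this vectorization, NumPy's batched `@` operator multiplies all Fourier slices in one call. The sketch assumes the layout $I_1\times\cdots\times I_N\times D_1\times D_2$ of Section 3.1 and compares the Fourier route with a direct spatial-domain evaluation of the product definition; both function names are illustrative.

```python
import numpy as np


def tmat_mul(A, B):
    """T-matrix product A_TM B_TM via one complex matmul per Fourier slice.

    A has shape I + (D1, Q) and B has shape I + (Q, D2), t-scalar modes first.
    """
    axes = tuple(range(A.ndim - 2))
    Cf = np.fft.fftn(A, axes=axes) @ np.fft.fftn(B, axes=axes)  # batched over all slices i
    C = np.fft.ifftn(Cf, axes=axes)
    return C.real if np.isrealobj(A) and np.isrealobj(B) else C


def tmat_mul_direct(A, B):
    """Reference: C_{a,b} = sum_q A_{a,q} • B_{q,b}, with circular convolutions."""
    n = A.ndim - 2
    axes = tuple(range(n))
    C = np.zeros(A.shape[:n] + (A.shape[-2], B.shape[-1]), dtype=np.result_type(A, B))
    for j in np.ndindex(*A.shape[:n]):
        # A[j] @ roll(B, j) adds the term A_{j} B_{i-j} for every slice i at once.
        C += A[j] @ np.roll(B, shift=j, axis=axes)
    return C


rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3, 4, 5))  # t-scalars of size 2 x 3, a 4 x 5 t-matrix
B = rng.standard_normal((2, 3, 5, 2))
print(np.allclose(tmat_mul(A, B), tmat_mul_direct(A, B)))
```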

3.4. Pooling. Sometimes, it is necessary to have a pooling mechanism to transform t-scalars to scalars in $\mathbb{R}$ or $\mathbb{C}$. Given any t-scalar $X_T \in \mathcal{C}$, its pooling result is denoted by $P(X_T) \in \mathbb{C}$. The pooling operation for t-matrices transforms each t-scalar entry to a scalar. More formally, given any t-matrix $Y_{TM} \in \mathcal{C}^{D_1\times D_2}$, its pooling result $P(Y_{TM})$ is by definition the matrix in $\mathbb{C}^{D_1\times D_2}$ given by
$$P(Y_{TM})_{\alpha,\beta} = P(Y_{TM,\alpha,\beta}). \qquad (3.7)$$
The pooling of t-vectors is a special case of equation (3.7).
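3.5. Generalized tensors. Generalized tensors, called g-tensors, generalize t-matrices and canonical tensors. The generalized tensors defined in this section are used to construct the higher order TSVD in Section 4.2. A g-tensor, denoted by $X_{GT} \in \mathcal{C}^{D_1\times D_2\times\cdots\times D_M}$, is a generalized tensor with t-scalar entries (i.e., an order-$M$ array of t-scalars). Its t-scalar entries are indexed by $(X_{GT})_{\alpha_1,\dots,\alpha_M}$. Then, the generalized mode-$k$ multiplication of $X_{GT}$ by a t-matrix $A_{TM} \in \mathcal{C}^{J\times D_k}$ is the g-tensor $M_{GT} = X_{GT} \times_k A_{TM} \in \mathcal{C}^{D_1\times\cdots\times D_{k-1}\times J\times D_{k+1}\times\cdots\times D_M}$ defined by
$$(M_{GT})_{\alpha_1,\dots,\alpha_{k-1},j,\alpha_{k+1},\dots,\alpha_M} = \sum_{\alpha_k=1}^{D_k} A_{TM,j,\alpha_k} \bullet (X_{GT})_{\alpha_1,\dots,\alpha_M}. \qquad (3.8)$$
The generalized mode-$k$ flattening of a g-tensor $X_{GT}$ is the t-matrix $X_{GT,(k)} \in \mathcal{C}^{D_k \times \prod_{m\neq k} D_m}$. Each column of the matrix is obtained by holding the indices in $K_2 = \{1,\dots,M\}\setminus\{k\}$ fixed and varying the index in $K_1 = \{k\}$.
The generalized mode-$k$ multiplication defined in equation (3.8) can also be expressed in terms of unfolded g-tensors:
$$M_{GT,(k)} = A_{TM}\, X_{GT,(k)},$$
where $M_{GT,(k)}$ and $X_{GT,(k)}$ are respectively the generalized mode-$k$ flattenings of the g-tensors $M_{GT}$ and $X_{GT}$.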
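The flattened form of the mode-$k$ multiplication gives a compact way to compute it: on each Fourier slice it is an ordinary matrix product with the mode-$k$ unfolding. A sketch assuming the layout $I_1\times\cdots\times I_N\times D_1\times\cdots\times D_M$ (t-scalar modes first) and a 0-based mode index `k`; the column order of the unfolding is NumPy's C order, which is one valid choice and does not affect the product. Function names are hypothetical.

```python
import numpy as np


def flatten(Tf, k, n):
    """Slice-wise generalized mode-k flattening: I + (D_k, product of the other D_m)."""
    Tk = np.moveaxis(Tf, n + k, n)
    return Tk.reshape(Tk.shape[:n + 1] + (-1,))


def mode_k_product(X, A, k, n):
    """M_GT = X_GT x_k A_TM (equation (3.8)), evaluated as A_TM X_GT,(k) on Fourier slices.

    X has shape I + (D_1, ..., D_M), A has shape I + (J, D_k); n = len(I), k is 0-based.
    """
    axes = tuple(range(n))
    Xf = np.fft.fftn(X, axes=axes)
    Af = np.fft.fftn(A, axes=axes)
    rest = np.moveaxis(Xf, n + k, n).shape[n + 1:]  # remaining tensor dimensions, in order
    Mf = (Af @ flatten(Xf, k, n)).reshape(Xf.shape[:n] + (A.shape[-2],) + rest)
    Mf = np.moveaxis(Mf, n, n + k)                  # put the new mode back at position k
    M = np.fft.ifftn(Mf, axes=axes)
    return M.real if np.isrealobj(X) and np.isrealobj(A) else M
```

With $I = (1)$ the t-scalars are ordinary numbers and `mode_k_product` reduces to the usual tensor-times-matrix product.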

Tensor Singular Value Decomposition
The singular value decomposition (SVD) is a well known factorization of real or complex matrices [12]. It generalizes the eigen-decomposition of positive semi-definite normal matrices to non-square and non-normal matrices. The SVD has a wide range of applications in data analytics, including computing the pseudo-inverse of a matrix, solving linear least squares problems, low-rank approximation and linear and multi-linear component analysis. A tensor version TSVD of the SVD is described in Section 4.1, and then applied in Section 4.2 to obtain a tensor version, THOSVD, of the Higher Order SVD (HOSVD). Further information about the TSVD can be found in [18] and [41].
4.1. TSVD: Tensorial SVD. Algorithm. The TSVD is described in [18] and [41]. Let $X_{TM} \in \mathcal{C}^{D_1\times D_2}$ and let $Q = \min(D_1, D_2)$. The TSVD of $X_{TM}$ yields the following three t-matrices $U_{TM} \in \mathcal{C}^{D_1\times Q}$, $S_{TM} \in \mathcal{C}^{Q\times Q}$ and $V_{TM} \in \mathcal{C}^{D_2\times Q}$, such that
$$X_{TM} = U_{TM} S_{TM} V_{TM}^H, \qquad (4.1)$$
where $U_{TM}^H U_{TM} = V_{TM}^H V_{TM} = I_{TM}^{(Q)}$, $S_{TM} = \mathrm{diag}(\lambda_{T,1},\dots,\lambda_{T,Q})$, and $\lambda_{T,1},\dots,\lambda_{T,Q} \in \mathcal{C}$ are nonnegative and satisfy $F(\lambda_{T,1})_i \ge F(\lambda_{T,2})_i \ge \cdots \ge F(\lambda_{T,Q})_i$ for $1 \le i \le I$.
The t-matrices $U_{TM}$ and $V_{TM}$ are generalizations of the orthogonal matrices in the SVD of a matrix with elements in $\mathbb{R}$ or $\mathbb{C}$. Although it is possible to compute $U_{TM}$, $S_{TM}$ and $V_{TM}$ in the spatial domain, it is preferable to organize the TSVD algorithm in the Fourier domain, because of the observation in Section 2.3 that the Fourier transform converts the convolution product to the Hadamard product. The TSVD of $X_{TM}$ can be decomposed into $\prod_{n=1}^{N} I_n$ SVDs of complex matrices given by the slices of the Fourier transform $F(X_{TM})$. The t-matrices $U_{TM}$, $S_{TM}$ and $V_{TM}$ in equation (4.1) are obtained in Algorithm 1.
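Algorithm 1 TSVD
Inputs: A t-matrix $X_{TM} \in \mathcal{C}^{D_1\times D_2}$ as in equation (4.1) and the t-scalar dimensions $I$.
Outputs: The t-matrices $U_{TM}$, $S_{TM}$ and $V_{TM}$ as in equation (4.1).
1: Compute $\tilde{X}_{TM} = F(X_{TM})$.
2: for each $i$ with $1 \le i \le I$ do
3: Compute the SVD $\tilde{X}_{TM,i,:,:} = U_{mat} S_{mat} V_{mat}^H$ of the $i$-th slice, with the $Q$ singular values in descending order, where $V_{mat}^H$ denotes the conjugate transpose of the complex matrix $V_{mat}$. Set $\tilde{U}_{TM,i,:,:} = U_{mat}$, $\tilde{S}_{TM,i,:,:} = S_{mat}$ and $\tilde{V}_{TM,i,:,:} = V_{mat}$.
4: end for
5: Compute $U_{TM} = F^{-1}(\tilde{U}_{TM})$, $S_{TM} = F^{-1}(\tilde{S}_{TM})$ and $V_{TM} = F^{-1}(\tilde{V}_{TM})$.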
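If $X_{TM}$ is defined over $\mathcal{R}$, then $U_{TM}$, $S_{TM}$ and $V_{TM}$ can be chosen such that they are defined over $\mathcal{R}$. It is sufficient to choose the slices such that $\tilde{U}_{TM,2-i,:,:} = \overline{\tilde{U}_{TM,i,:,:}}$ and $\tilde{V}_{TM,2-i,:,:} = \overline{\tilde{V}_{TM,i,:,:}}$ for $1 \le i \le I$, which is possible because $\tilde{X}_{TM,2-i,:,:} = \overline{\tilde{X}_{TM,i,:,:}}$ when $X_{TM}$ is real. When the t-scalar dimensions are given by $N = 1$, $I_1 = 1$, TSVD reduces to the canonical SVD of a matrix in $\mathbb{C}^{D_1\times D_2}$. The properties of the SVD can be used to show that the t-matrix $S_{TM}$ in Algorithm 1 is unique. The t-matrices $U_{TM}$ and $V_{TM}$ are not unique.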
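A sketch of Algorithm 1 in NumPy, assuming the array layout of Section 3.1 and 0-based indices. For a real input it assigns the conjugated factors of slice $i$ to the partner slice with index $2 - i$ (that is, $(-i) \bmod I_n$), and decomposes self-paired slices in real arithmetic, so that the returned t-matrices are real up to rounding. The function name is hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np


def tsvd(X):
    """Sketch of Algorithm 1 for a t-matrix X of shape I + (D1, D2).

    Returns U (I + (D1, Q)), S (I + (Q, Q)) and V (I + (D2, Q)) with Q = min(D1, D2).
    """
    I, (D1, D2) = X.shape[:-2], X.shape[-2:]
    Q, axes, real = min(D1, D2), tuple(range(X.ndim - 2)), np.isrealobj(X)
    Xf = np.fft.fftn(X, axes=axes)
    Uf = np.zeros(I + (D1, Q), dtype=complex)
    Sf = np.zeros(I + (Q, Q), dtype=complex)
    Vf = np.zeros(I + (D2, Q), dtype=complex)
    done = set()
    for i in np.ndindex(*I):
        p = tuple((-a) % m for a, m in zip(i, I))  # index of the slice F(X)_{2-i}
        if real and p in done:
            # Real X: slice p is the conjugate of slice i, so reuse its SVD conjugated.
            Uf[i], Sf[i], Vf[i] = Uf[p].conj(), Sf[p], Vf[p].conj()
        else:
            # Self-paired slices of a real X are real; decompose them in real arithmetic.
            M = Xf[i].real if real and p == i else Xf[i]
            u, s, vh = np.linalg.svd(M, full_matrices=False)  # s is in descending order
            Uf[i], Sf[i], Vf[i] = u, np.diag(s), vh.conj().T
        done.add(i)
    U, S, V = (np.fft.ifftn(A, axes=axes) for A in (Uf, Sf, Vf))
    return (U.real, S.real, V.real) if real else (U, S, V)
```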
TSVD Approximation. TSVD can be used to approximate data. Given a t-matrix $X_{TM} \in \mathcal{C}^{D_1\times D_2}$, let $Q = \min(D_1, D_2)$ and let the TSVD of $X_{TM}$ be computed as in equation (4.1). The low-rank approximation $\hat{X}_{TM}$ of $X_{TM}$ with rank $r$ ($1 \le r \le Q$) is defined by
$$\hat{X}_{TM} = (U_{TM})_{:,1:r}\, \mathrm{diag}(\lambda_{T,1},\dots,\lambda_{T,r})\, (V_{TM})_{:,1:r}^H. \qquad (4.2)$$
When the t-scalar dimensions are given by $N = 1$, $I_1 = 1$, equation (4.2) reduces to the SVD low-rank approximation to a matrix in $\mathbb{C}^{D_1\times D_2}$.
Furthermore, we contend that the approximation $\hat{X}_{TM}$ computed as in equation (4.2) is the solution of the following optimization problem
$$X_{TM}^{approx} = \arg\min_{\mathrm{rank}(Y_{TM}) \le r} \|X_{TM} - Y_{TM}\|_F, \qquad (4.3)$$
where $\|\cdot\|_F$ denotes the generalized Frobenius norm of a t-matrix, which is a nonnegative t-scalar, as defined in equation (3.3). The result $X_{TM}^{approx} = \hat{X}_{TM}$ generalizes the Eckart-Young-Mirsky theorem [7].
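In the Fourier domain, the rank-$r$ approximation (4.2) truncates every slice SVD at $r$ terms, and the Fourier components of the generalized Frobenius norm (3.3) are the ordinary Frobenius norms of the slices (because $F(r(W))_i = |F(W)_i|$). A sketch with illustrative names; taking the real part for real inputs relies on the truncated SVD of each slice being unique, which holds when the $r$-th and $(r+1)$-th singular values differ (the generic case).

```python
import numpy as np


def tsvd_lowrank(X, r):
    """Rank-r approximation (4.2) of a t-matrix X with layout I + (D1, D2)."""
    axes = tuple(range(X.ndim - 2))
    Xf = np.fft.fftn(X, axes=axes)
    U, s, Vh = np.linalg.svd(Xf, full_matrices=False)         # batched over Fourier slices
    Xr = (U[..., :, :r] * s[..., None, :r]) @ Vh[..., :r, :]  # keep the r largest terms
    Xr = np.fft.ifftn(Xr, axes=axes)
    return Xr.real if np.isrealobj(X) else Xr


def slice_errors(X, Y):
    """Fourier components of the generalized Frobenius norm ||X - Y||_F."""
    axes = tuple(range(X.ndim - 2))
    return np.linalg.norm(np.fft.fftn(X - Y, axes=axes), axis=(-2, -1))
```

Componentwise in the Fourier domain, increasing $r$ never increases the error, which is the slice-by-slice content of the generalized Eckart-Young-Mirsky statement.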
To have an optimization problem in the form of (4.3), the notation rank(·), i.e., the rank of a t-matrix, and min(·), i.e., the minimization of a nonnegative t-scalar variable belonging to a subset of S nonneg , and the ordering relationship ≤ between two nonnegative t-scalars need to be defined.
These definitions generalize their canonical counterparts. The definitions and the generalized Eckart-Young-Mirsky theorem are discussed in an appendix.
4.2. THOSVD: Tensor Higher Order SVD. In multilinear algebra, the higher order singular value decomposition (HOSVD), also known as the orthogonal Tucker decomposition of a tensor, is a generalization of the SVD. It is commonly used to extract directional information from multi-way arrays [30,6]. The applications of HOSVD include data analytics [32,29], machine learning [33,34,23], DNA and RNA analysis [26,25] and texture mapping in computer graphics [35].
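On using the t-scalar algebra, the HOSVD can be generalized further to obtain a tensorial HOSVD, called THOSVD. The THOSVD is obtained by replacing the complex number elements of each multi-way array by t-scalar elements. Based on the definitions of g-tensors in Section 3.5, the THOSVD of $X_{GT} \in \mathcal{C}^{D_1\times D_2\times\cdots\times D_M}$ is given by the following generalized mode-$k$ multiplications:
$$X_{GT} = S_{GT} \times_1 U_{TM,1} \times_2 U_{TM,2} \cdots \times_M U_{TM,M}, \qquad (4.4)$$
where $S_{GT}$ is the core g-tensor and each $U_{TM,k}$ has orthonormal column t-vectors, i.e. $U_{TM,k}^H U_{TM,k} = I_{TM}$.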
Given a g-tensor $X_{GT} \in \mathcal{C}^{D_1\times D_2\times\cdots\times D_M}$, the THOSVD of $X_{GT}$, as in equation (4.4), is obtained in Algorithm 2, using a strategy analogous to that of Tucker [30] and De Lathauwer et al. [6] for computing the HOSVD of a tensor with elements in $\mathbb{R}$ or $\mathbb{C}$.

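Algorithm 2 THOSVD
Input: A g-tensor $X_{GT} \in \mathcal{C}^{D_1\times D_2\times\cdots\times D_M}$.
Outputs: $U_{TM,1}, U_{TM,2}, \dots, U_{TM,M}$ and $S_{GT}$ as in equation (4.4).
1: for $k = 1, \dots, M$ do
2: Construct the generalized mode-$k$ flattening $X_{GT,(k)}$ of $X_{GT}$.
3: Compute the TSVD $X_{GT,(k)} = U_{TM,k} S_{TM,k} V_{TM,k}^H$ using Algorithm 1 and keep $U_{TM,k}$.
4: end for
5: Compute $S_{GT} = X_{GT} \times_1 U_{TM,1}^H \times_2 U_{TM,2}^H \cdots \times_M U_{TM,M}^H$.
Note that THOSVD generalizes the HOSVD for canonical tensors, TSVD for t-matrices, and SVD for canonical matrices. Many SVD and HOSVD based algorithms can be generalized by TSVD and THOSVD, respectively.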

Tensor Based Algorithms
Three tensor based algorithms are proposed. They are Tensorial Principal Component Analysis (TPCA), Tensorial Two-Dimensional Principal Component Analysis (T2DPCA) and Tensorial Grassmannian Component Analysis (TGCA). TPCA and T2DPCA are generalizations of the well-known algorithms PCA and 2DPCA [37]. TGCA is a generalization of the recent GCA algorithm [14,13]. It is possible to generalize many other linear or multi-linear algorithms using similar methods.
5.1. TPCA: Tensorial Principal Component Analysis. Principal Component Analysis (PCA) is a well known algorithm for extracting the prominent components of observed vectors. PCA is generalized to TPCA in a straightforward manner. Let $X_{TV,1}, \dots, X_{TV,K} \in \mathcal{C}^D$ be $K$ given t-vectors with mean $\bar{X}_{TV} = \frac{1}{K}\sum_{k=1}^{K} X_{TV,k}$. Then, the covariance-like t-matrix $G_{TM} \in \mathcal{C}^{D\times D}$ is defined by
$$G_{TM} = \frac{1}{K}\sum_{k=1}^{K} (X_{TV,k} - \bar{X}_{TV})(X_{TV,k} - \bar{X}_{TV})^H.$$
The t-matrix $U_{TM} \in \mathcal{C}^{D\times D}$ is computed from the TSVD of $G_{TM}$ as in Algorithm 1. Then, given any t-vector $Y_{TV} \in \mathcal{C}^D$, its feature t-vector $Y_{TV}^{feat} \in \mathcal{C}^D$ is given by
$$Y_{TV}^{feat} = U_{TM}^H (Y_{TV} - \bar{X}_{TV}).$$
In algebraic terminology, the column t-vectors of $U_{TM}$ span a linear sub-module of t-vectors, which is a generalization of a vector subspace [3]. In this sense, each t-scalar entry of $Y_{TV}^{feat}$ is a generalized coordinate of the projection of the t-vector $(Y_{TV} - \bar{X}_{TV})$ onto the sub-module. The low-rank reconstruction $Y_{TV}^{rec} \in \mathcal{C}^D$ with the parameter $d$ is given by
$$Y_{TV}^{rec} = (U_{TM})_{:,1:d}\,(Y_{TV}^{feat})_{1:d} + \bar{X}_{TV},$$
where $(U_{TM})_{:,1:d} \in \mathcal{C}^{D\times d}$ denotes the t-matrix containing the first $d$ t-vector columns of $U_{TM} \in \mathcal{C}^{D\times D}$ and $(Y_{TV}^{feat})_{1:d} \in \mathcal{C}^d$ denotes the t-vector containing the first $d$ t-scalar entries of $Y_{TV}^{feat}$. Note that PCA is a special case of TPCA. When the t-scalar dimensions are given by $N = 1$, $I_1 = 1$, TPCA reduces to PCA.
5.2. T2DPCA: Tensorial Two-Dimensional Principal Component Analysis. The algorithm 2DPCA is an extension of PCA proposed by Yang et al. [37] for analysing the principal components of matrices. Although 2DPCA is written in a non-centred row-vector oriented form in the original paper [37], it is rewritten here in a centred column-vector oriented form, which is consistent with the formulation of PCA. The centred column-vector oriented form of 2DPCA is chosen for discussing its generalization to T2DPCA (Tensorial 2DPCA).
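A sketch of the TPCA steps of Section 5.1 on Fourier slices, assuming the $K$ training t-vectors are stacked in an array of shape $(K, I_1,\dots,I_N, D)$. The slices of $G_{TM}$ are Hermitian positive semi-definite, so their SVD coincides with an eigendecomposition; the $1/K$ factor only rescales the singular values. The sketch does not apply the real-valued choice of Section 4.1, so feature t-vectors are complex-valued in general. Function names are illustrative.

```python
import numpy as np


def tpca_fit(Xs, n):
    """Xs: (K,) + I + (D,) training t-vectors; n = number of t-scalar modes.

    Returns the Fourier slices of the mean t-vector and of U_TM (TSVD of G_TM).
    """
    axes = tuple(range(1, n + 1))
    Xf = np.fft.fftn(Xs, axes=axes)
    mean_f = Xf.mean(axis=0)
    C = Xf - mean_f
    G = np.einsum('k...a,k...b->...ab', C, C.conj()) / Xs.shape[0]  # slices of G_TM
    U, _, _ = np.linalg.svd(G)  # one SVD per Hermitian PSD slice
    return mean_f, U


def tpca_transform(Y, mean_f, U, n, d):
    """Feature t-vector U^H (Y - mean) and rank-d reconstruction of Y (shape I + (D,))."""
    axes = tuple(range(n))
    feat_f = np.einsum('...ab,...a->...b', U.conj(), np.fft.fftn(Y, axes=axes) - mean_f)
    rec_f = np.einsum('...ab,...b->...a', U[..., :, :d], feat_f[..., :d]) + mean_f
    return np.fft.ifftn(feat_f, axes=axes), np.fft.ifftn(rec_f, axes=axes)
```

With $d = D$ the reconstruction returns $Y_{TV}$ exactly, because each slice of $U_{TM}$ is unitary.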
Similar to TPCA, T2DPCA also finds sub-modules, but they are obtained by analysing t-matrices. Let $X_{TM,1}, \dots, X_{TM,K} \in \mathcal{C}^{D_1\times D_2}$ be the $K$ observed t-matrices with mean $\bar{X}_{TM} = \frac{1}{K}\sum_{k=1}^{K} X_{TM,k}$. Then, the Hermitian covariance-like t-matrix $G_{TM} \in \mathcal{C}^{D_1\times D_1}$ is given by
$$G_{TM} = \frac{1}{K}\sum_{k=1}^{K} (X_{TM,k} - \bar{X}_{TM})(X_{TM,k} - \bar{X}_{TM})^H.$$
Then, the t-matrix $U_{TM} \in \mathcal{C}^{D_1\times D_1}$ is computed from the TSVD of $G_{TM}$ as in Algorithm 1. Given any t-matrix $Y_{TM} \in \mathcal{C}^{D_1\times D_2}$, its feature t-matrix $Y_{TM}^{feat} \in \mathcal{C}^{D_1\times D_2}$ is a centred t-matrix projection (i.e., a collection of centred column t-vector projections) on the module spanned by $U_{TM}$, namely
$$Y_{TM}^{feat} = U_{TM}^H (Y_{TM} - \bar{X}_{TM}).$$
The T2DPCA reconstruction with the parameter $d$ is given by $Y_{TM}^{rec} \in \mathcal{C}^{D_1\times D_2}$ as follows:
$$Y_{TM}^{rec} = (U_{TM})_{:,1:d}\,(Y_{TM}^{feat})_{1:d,:} + \bar{X}_{TM}.$$
5.3. TGCA: Tensorial Grassmannian Component Analysis. A t-matrix algorithm which generalizes the recent algorithm for Grassmannian Component Analysis (GCA) is proposed. An example of GCA can be found in [13], where it forms part of an algorithm for sparse coding on Grassmann manifolds. In this section GCA is extended to its generalized version called TGCA (Tensorial GCA).
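The T2DPCA steps of Section 5.2 differ from TPCA only in the shape of the samples and of the covariance-like t-matrix. A corresponding sketch, under the same assumptions as a TPCA implementation on Fourier slices (samples stacked along a leading axis, shape $(K, I_1,\dots,I_N, D_1, D_2)$; illustrative names; no real-valued choice of factors):

```python
import numpy as np


def t2dpca_fit(Xs, n):
    """Xs: (K,) + I + (D1, D2) training t-matrices; n = number of t-scalar modes."""
    axes = tuple(range(1, n + 1))
    Xf = np.fft.fftn(Xs, axes=axes)
    mean_f = Xf.mean(axis=0)
    C = Xf - mean_f
    G = np.einsum('k...ac,k...bc->...ab', C, C.conj()) / Xs.shape[0]  # I + (D1, D1)
    U, _, _ = np.linalg.svd(G)  # one SVD per Hermitian PSD slice
    return mean_f, U


def t2dpca_transform(Y, mean_f, U, n, d):
    """Feature t-matrix U^H (Y - mean) and rank-d reconstruction of Y (shape I + (D1, D2))."""
    axes = tuple(range(n))
    Uh = np.conj(np.swapaxes(U, -1, -2))
    feat_f = Uh @ (np.fft.fftn(Y, axes=axes) - mean_f)
    rec_f = U[..., :, :d] @ feat_f[..., :d, :] + mean_f
    return np.fft.ifftn(feat_f, axes=axes), np.fft.ifftn(rec_f, axes=axes)
```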
In TGCA, each measurement is a set of t-vectors organized into a "thin" t-matrix, with more rows than columns. Let $X_{TM,1}, \cdots, X_{TM,K} \in \mathbb{C}^{D\times d}$ ($D > d$) be the observed t-matrices. First, the t-vector columns of each t-matrix are orthogonalized. Using the t-scalar algebra, it is straightforward to generalize the classical Gram-Schmidt orthogonalization process to t-vectors. The TSVD can also be used to orthogonalize a set of t-vectors. In GCA and TGCA, the choice of orthogonalization algorithm does not matter, as long as the same algorithm is used consistently for all sets of vectors and t-vectors.
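Given a t-matrix $Y_{TM} \in \mathbb{C}^{D\times d}$, let $\dot{Y}_{TM} \in \mathbb{C}^{D\times d}$ be the corresponding unitary orthogonalized t-matrix computed from $Y_{TM}$. Let $\dot{X}_{TM,k} \in \mathbb{C}^{D\times d}$ be the unitary orthogonalized t-matrices computed from $X_{TM,k}$ for $1 \le k \le K$. Then, for $1 \le k, k' \le K$, the $(k, k')$ t-scalar entry of the symmetric t-matrix $G_{TM} \in \mathbb{C}^{K\times K}$ is nonnegative and given by

$$(G_{TM})_{k,k'} = \| \dot{X}^{H}_{TM,k} \circ \dot{X}_{TM,k'} \|^{2}_{F} \qquad (5.7)$$

where $\|\cdot\|_F$ is the generalized Frobenius norm of a t-matrix, as defined by equation (3.3).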
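Given any query t-matrix sample $Y_{TM} \in \mathbb{C}^{D\times d}$, let $\dot{Y}_{TM} \in \mathbb{C}^{D\times d}$ be the corresponding unitary orthogonalized t-matrix computed from $Y_{TM}$. Then, the $k$-th t-scalar entry of $K_{TV} \in \mathbb{C}^{K}$ is computed as follows:

$$(K_{TV})_{k} = \| \dot{X}^{H}_{TM,k} \circ \dot{Y}_{TM} \|^{2}_{F}, \quad 1 \le k \le K.$$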
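Since $G_{TM}$, computed as in equation (5.7), is symmetric, the TSVD of $G_{TM}$ has the following form:

$$G_{TM} = U_{TM} \circ S_{TM} \circ U^{H}_{TM}.$$

Furthermore, if it is assumed that the diagonal entries of $S_{TM} \doteq \mathrm{diag}(\lambda_{T,1}, \cdots, \lambda_{T,K})$ are all strictly positive, then the multiplicative inverse of $\lambda_{T,k}$ exists for $1 \le k \le K$. The t-matrix $S^{1/2}_{TM} \doteq \mathrm{diag}(\lambda^{1/2}_{T,1}, \cdots, \lambda^{1/2}_{T,K})$ is called the t-matrix square root of $S_{TM}$, and the t-matrix $S^{-1/2}_{TM} \doteq \mathrm{diag}(\lambda^{-1/2}_{T,1}, \cdots, \lambda^{-1/2}_{T,K})$ is its multiplicative inverse. Thus, the features of the t-matrix sample $Y_{TM} \in \mathbb{C}^{D\times d}$ are given by the t-vector $Y^{feat}_{TV} \in \mathbb{C}^{K}$ as

$$Y^{feat}_{TV} = S^{-1/2}_{TM} \circ U^{H}_{TM} \circ K_{TV},$$

and the features of the $k$-th measurement $X_{TM,k}$ are given by the t-vector $X^{feat}_{TV,k} \in \mathbb{C}^{K}$ as follows:

$$X^{feat}_{TV,k} = S^{-1/2}_{TM} \circ U^{H}_{TM} \circ (G_{TM})_{:,k},$$

where $(G_{TM})_{:,k}$ denotes the $k$-th t-vector column of $G_{TM}$. It is not difficult to verify that $S^{-1/2}_{TM} \circ U^{H}_{TM} \circ (G_{TM})_{:,k} = S^{1/2}_{TM} \circ (U^{H}_{TM})_{:,k}$. The dimension of a TGCA feature t-vector is reduced from $K$ to $K'$ ($K > K'$) by discarding the last $(K - K')$ t-scalar entries. Note that GCA is a special case of TGCA when the dimensions of the t-scalars are given by $N = 1$, $I_1 = 1$.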
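A minimal NumPy sketch of the TGCA feature map is given below. It rests on assumptions of ours rather than on code from the paper: t-scalars are $I_1 \times I_2$ arrays under 2-D circular convolution, every t-algebra step is carried out slice by slice after a 2-D FFT, and the Fourier slice of the t-scalar $\|A\|^2_F$ is taken to be the canonical $\|F(A)(i)\|^2_F$. Orthogonalization uses a slice-wise SVD (the text notes the TSVD can serve this purpose), and the TSVD of the Hermitian $G_{TM}$ is computed by an eigendecomposition, which coincides with the SVD for positive semidefinite slices. All function names are hypothetical.

```python
import numpy as np


def orthogonalize(A):
    """Unitary orthogonalization of a thin real t-matrix A of shape (I1, I2, D, d).

    Uses a slice-wise SVD in the Fourier domain and returns Fourier slices Q with
    Q^H Q = I in every slice.
    """
    Af = np.fft.fft2(A, axes=(0, 1))
    Q, _, _ = np.linalg.svd(Af, full_matrices=False)
    return Q


def projection_kernel(Pf, Qf):
    """Fourier slices of the t-scalar ||P^H o Q||_F^2 (assumed slice-wise form)."""
    M = np.conj(np.swapaxes(Pf, -1, -2)) @ Qf
    return np.sum(np.abs(M) ** 2, axis=(-2, -1))


def tgca_fit(samples):
    """samples: list of K real t-matrices, each of shape (I1, I2, D, d)."""
    Qs = [orthogonalize(A) for A in samples]
    K = len(Qs)
    I1, I2 = samples[0].shape[:2]
    Gf = np.empty((I1, I2, K, K))
    for k in range(K):
        for l in range(K):
            Gf[..., k, l] = projection_kernel(Qs[k], Qs[l])   # equation (5.7), per slice
    lam, Uf = np.linalg.eigh(Gf)                  # eigenvalues in ascending order
    lam, Uf = lam[..., ::-1], Uf[..., ::-1]       # reorder to descending, like the TSVD
    return Qs, Gf, lam, Uf


def tgca_features(kf, lam, Uf, k_dim):
    """S^{-1/2} o U^H o K_TV in the Fourier domain, keeping the first k_dim entries.

    Assumes every eigenvalue in lam is strictly positive, as the text does.
    """
    feat = (np.conj(np.swapaxes(Uf, -1, -2)) @ kf[..., None])[..., 0] / np.sqrt(lam)
    return feat[..., :k_dim]


rng = np.random.default_rng(0)
samples = [rng.normal(size=(2, 2, 8, 2)) for _ in range(5)]        # K = 5, D = 8, d = 2
Qs, Gf, lam, Uf = tgca_fit(samples)
Qy = orthogonalize(rng.normal(size=(2, 2, 8, 2)))                  # a query t-matrix
kf = np.stack([projection_kernel(Qk, Qy) for Qk in Qs], axis=-1)   # Fourier slices of K_TV
y_feat = tgca_features(kf, lam, Uf, k_dim=3)                       # K' = 3 entries kept
print(y_feat.shape)  # (2, 2, 3)
```

The features are left in the Fourier domain because each slice's eigenvectors are only defined up to a sign or phase; distances between feature t-vectors are unaffected by that choice. For a training sample, passing `Gf[..., :, k]` as `kf` gives $X^{feat}_{TV,k}$.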

Experiments
The results obtained from TSVD, THOSVD, TPCA, T2DPCA, TGCA and their precursors are compared in applications to low-rank approximation in Section 6.1, reconstruction in Section 6.2 and supervised classification of images in Section 6.3.
In these experiments, "vertical" and "horizontal" comparisons are made between the generalized algorithms and the corresponding canonical algorithms.
In a "vertical" comparison, the tensorized data are obtained from the canonical data using 3 × 3 neighborhoods, so that each associated t-scalar is a 3 × 3 array. To make the vertical comparison fair, the central slice of a generalized result, which has the original canonical form, is extracted and compared with the result of the associated canonical algorithm.
In a "horizontal" comparison, a generalized order-N array of order-two t-scalars is equivalent to a canonical order-(N + 2) array of scalars. Therefore, a generalized algorithm based on order-N arrays of order-two t-scalars is compared with a canonical algorithm based on order-(N + 2) arrays of scalars.
6.1. Low-rank Approximation. The TSVD approximation is computed as in equation (4.2). The THOSVD approximation generalizes both low-rank approximation by TSVD and low-rank approximation by HOSVD. To simplify the calculations, the approximation is obtained for a third-order g-tensor $X_{GT} \in \mathbb{C}^{D_1\times D_2\times D_3}$. The low-rank approximation $\hat{X}_{GT} \in \mathbb{C}^{D_1\times D_2\times D_3}$ to $X_{GT}$ with multilinear rank tuple $(r_1, r_2, r_3)$, where $1 \le r_k \le Q_k$ for all $k = 1, 2, 3$, is computed as

$$\hat{X}_{GT} = (S_{GT})_{1:r_1,1:r_2,1:r_3} \times_1 (U_{TM,1})_{:,1:r_1} \times_2 (U_{TM,2})_{:,1:r_2} \times_3 (U_{TM,3})_{:,1:r_3} \qquad (6.2)$$

where $(U_{TM,k})_{:,1:r_k}$ denotes the t-matrix containing the first $r_k$ t-vector columns of $U_{TM,k}$ for $k = 1, 2, 3$, and $(S_{GT})_{1:r_1,1:r_2,1:r_3} \in \mathbb{C}^{r_1\times r_2\times r_3}$ denotes the g-tensor containing the first $r_1 \times r_2 \times r_3$ t-scalar entries of $S_{GT}$. When the t-scalar dimensions are given by $N = 1$, $I_1 = 1$, equation (6.2) reduces to the HOSVD low-rank approximation of a tensor in $\mathbb{C}^{D_1\times D_2\times D_3}$. If, in addition, the g-tensor dimension $D_3 = 1$, equation (6.2) reduces to the SVD low-rank approximation of a canonical matrix in $\mathbb{C}^{D_1\times D_2}$.
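Equation (4.2) is not reproduced in this section, so the following is only a hedged illustration of a rank-$r$ TSVD approximation. It assumes that t-scalars are $I_1 \times I_2$ arrays multiplied by 2-D circular convolution, so that the TSVD reduces to one canonical SVD per Fourier slice (consistent with the slice-wise rank of Definition I in Appendix I). The name `tsvd_lowrank` is hypothetical.

```python
import numpy as np


def tsvd_lowrank(X, r):
    """Rank-r TSVD approximation of a real t-matrix X of shape (I1, I2, D1, D2)."""
    Xf = np.fft.fft2(X, axes=(0, 1))                           # Fourier slices over the t-scalar axes
    U, s, Vh = np.linalg.svd(Xf, full_matrices=False)          # batched SVD, one per slice
    Af = (U[..., :, :r] * s[..., None, :r]) @ Vh[..., :r, :]   # keep the r largest per slice
    return np.real(np.fft.ifft2(Af, axes=(0, 1)))              # back to the t-scalar domain


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3, 8, 6))          # an 8 x 6 t-matrix of 3 x 3 t-scalars
print(np.allclose(tsvd_lowrank(X, 6), X))  # True: full rank reproduces X
```

With $I_1 = I_2 = 1$ the FFTs are trivial and the function returns the ordinary truncated SVD, which is the sense in which TSVD generalizes SVD.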
6.1.1. TSVD versus SVD - A "Vertical" Comparison. The low-rank approximation performances of TSVD and SVD are compared. In the experiment, the test sample is the 512 × 512 × 3 RGB Lena image downloaded from Wikipedia.
For the SVD low-rank approximations, the RGB Lena image is split into three 512 × 512 monochrome images, and each monochrome image is analyzed using the SVD. The three extracted monochrome Lena images are order-two arrays in $\mathbb{R}^{512\times512}$. Each monochrome Lena image is tensorized to produce a t-image (a generalized monochrome image), i.e., a 512 × 512 t-matrix with 3 × 3 t-scalar entries, which is equivalent to an order-four array in $\mathbb{R}^{3\times3\times512\times512}$. In the tensorized version of the image, each pixel value is replaced by the 3 × 3 square of values obtained from the 3 × 3 neighborhood of the pixel. Zero padding is used where necessary at the image boundary.
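The 3 × 3 neighborhood tensorization with zero padding, together with the central-slice extraction used for the "vertical" comparisons, can be sketched as follows. The array layout (t-scalar axes first, then image axes) and the function names are our assumptions.

```python
import numpy as np


def tensorize(img, size=3):
    """Replace each pixel of a 2-D image by its size x size neighborhood.

    Returns an array of shape (size, size, H, W); entry [a, b, y, x] is the pixel at
    (y + a - size//2, x + b - size//2), with zeros outside the image.
    """
    h = size // 2
    padded = np.pad(img, h, mode="constant")          # zero padding at the boundary
    H, W = img.shape
    out = np.empty((size, size, H, W), dtype=img.dtype)
    for a in range(size):
        for b in range(size):
            out[a, b] = padded[a:a + H, b:b + W]      # one shifted copy per t-scalar slice
    return out


def central_slice(t_image):
    """Central slice (index (2, 2) in 1-based notation) of a tensorized array."""
    c1, c2 = t_image.shape[0] // 2, t_image.shape[1] // 2
    return t_image[c1, c2]


img = np.arange(12, dtype=float).reshape(3, 4)
T = tensorize(img)
print(T.shape)                                 # (3, 3, 3, 4)
print(np.array_equal(central_slice(T), img))   # True: the central slice is the original image
```

Because the central slice of the tensorized data equals the original image, comparing the central slice of a generalized result with the canonical result is a like-for-like comparison.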
Given an array $X$ of any order over the real numbers $\mathbb{R}$, let $\hat{X}$ be an approximation to $X$. Then, the PSNR (Peak Signal-to-Noise Ratio) for $\hat{X}$ is defined as in [1] by

$$\mathrm{PSNR} = 10 \log_{10} \frac{N_{entry} \cdot \mathrm{MAX}^2}{\| X - \hat{X} \|^2_F}$$

where $N_{entry}$ denotes the number of real number entries of $X$, $\| X - \hat{X} \|_F$ is the canonical Frobenius norm of the array $(X - \hat{X})$ and MAX is the maximum possible value of the entries of $X$. In all the experiments, MAX = 255. Figure 1 shows the PSNR curves of the SVD and TSVD approximations as functions of the rank of $\hat{X}$. The PSNR of the TSVD approximation is consistently higher than that of the SVD approximation. When the rank $r = 500$, the PSNRs of TSVD and SVD differ by more than 37 dB.
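The PSNR definition above translates directly into a few lines of NumPy; `psnr` is a hypothetical helper name. Note that identical inputs give a zero denominator (an infinite PSNR), which the sketch does not special-case.

```python
import numpy as np


def psnr(X, X_hat, max_value=255.0):
    """PSNR in dB: 10 log10(N_entry * MAX^2 / ||X - X_hat||_F^2) for arrays of any order."""
    X = np.asarray(X, dtype=float)
    err = np.sum((X - np.asarray(X_hat, dtype=float)) ** 2)   # squared canonical Frobenius norm
    return 10.0 * np.log10(X.size * max_value ** 2 / err)


# A uniform error of 25.5 (one tenth of MAX) on every entry gives 20 dB.
print(psnr(np.zeros((4, 4)), np.full((4, 4), 25.5)))
```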

6.1.2. TSVD versus HOSVD - A "Horizontal" Comparison. Given a monochrome Lena image as an order-two array in $\mathbb{R}^{512\times512}$ and its tensorized form as an order-four array in $\mathbb{R}^{3\times3\times512\times512}$, TSVD yields an approximation array in $\mathbb{R}^{3\times3\times512\times512}$. Since the HOSVD is applicable to order-four arrays in $\mathbb{R}^{3\times3\times512\times512}$, we give a "horizontal" comparison of the performances of TSVD and HOSVD.
Let the HOSVD of $X \in \mathbb{R}^{3\times3\times512\times512}$ be $X = S \times_1 U_1 \times_2 U_2 \times_3 U_3 \times_4 U_4$, where $S \in \mathbb{R}^{3\times3\times512\times512}$ denotes the core tensor, and $U_1 \in \mathbb{R}^{3\times3}$, $U_2 \in \mathbb{R}^{3\times3}$, $U_3 \in \mathbb{R}^{512\times512}$ and $U_4 \in \mathbb{R}^{512\times512}$ are orthogonal matrices. Then, to give a "horizontal" comparison with the TSVD approximation $\hat{X}_{TM}$ with rank $r$, the HOSVD approximation $\hat{X} \in \mathbb{R}^{3\times3\times512\times512}$ is given by the multi-mode product

$$\hat{X} = S_{:,:,1:r,1:r} \times_1 U_1 \times_2 U_2 \times_3 (U_3)_{:,1:r} \times_4 (U_4)_{:,1:r}.$$

For each of the generalized monochrome Lena images (marked by the channel types "red", "green" and "blue", respectively), as a 3 × 3 × 512 × 512 real number array, the PSNRs of TSVD and HOSVD are given in Figure 2. As the rank $r$ is varied, the PSNR of the TSVD approximation is always higher than that of the corresponding HOSVD approximation. When the rank $r$ is equal to 500, the PSNRs of the TSVD and HOSVD approximations differ significantly.
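A truncated HOSVD approximation of the kind used in this comparison can be sketched with NumPy as below. The truncated core is computed directly as $X \times_m U_m^{H}$ with only the leading columns of each $U_m$, which equals the corresponding leading block of the full HOSVD core. For the "horizontal" comparison above, the multilinear ranks would be $(3, 3, r, r)$. All function names are hypothetical.

```python
import numpy as np


def unfold(X, mode):
    """Mode-`mode` flattening: the fibres along `mode` become columns."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def mode_product(X, M, mode):
    """Multi-mode product X x_mode M for a matrix M of shape (J, X.shape[mode])."""
    Y = np.tensordot(M, X, axes=(1, mode))   # the new axis of length J is placed first
    return np.moveaxis(Y, 0, mode)           # move it back to position `mode`


def hosvd_approx(X, ranks):
    """Truncated HOSVD approximation of X with multilinear rank tuple `ranks`."""
    Us = [np.linalg.svd(unfold(X, m), full_matrices=False)[0][:, :r]
          for m, r in enumerate(ranks)]
    core = X
    for m, U in enumerate(Us):
        core = mode_product(core, U.conj().T, m)   # leading block of the core tensor S
    approx = core
    for m, U in enumerate(Us):
        approx = mode_product(approx, U, m)        # S x_1 U_1 x_2 U_2 ... (truncated)
    return approx


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3, 20, 20))          # a small stand-in for a 3 x 3 x 512 x 512 array
approx = hosvd_approx(X, (3, 3, 5, 5))       # ranks (3, 3, r, r) with r = 5
```

For an order-two array this reduces to the rank-$r$ truncated SVD, since the two projections are onto the leading left and right singular subspaces.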

6.1.3. THOSVD versus HOSVD - A "Vertical" Comparison. The low-rank approximation performances of THOSVD and HOSVD are compared. For the HOSVD approximations, the RGB Lena image, which is a tensor in $\mathbb{R}^{512\times512\times3}$, is used as the test sample. For the THOSVD, the 3 × 3 neighborhood (with zero-padding) strategy is used to tensorize each real number entry of the RGB Lena image. The obtained t-image $X_{GT}$ is a 512 × 512 × 3 g-tensor with 3 × 3 t-scalar entries, i.e., an order-five array in $\mathbb{R}^{3\times3\times512\times512\times3}$.
To give a "vertical" comparison, on obtaining an approximation $\hat{X}_{GT} \in \mathbb{R}^{3\times3\times512\times512\times3}$, we compare $\hat{X}_{GT}(i)|_{i=(2,2)} \in \mathbb{R}^{512\times512\times3}$, i.e., the central slice of the THOSVD approximation, with the HOSVD approximation of the RGB Lena image.

Figure 2. A "horizontal" comparison of low-rank approximations by HOSVD and TSVD on each generalized monochrome Lena image, as a fourth-order real number array in $\mathbb{R}^{3\times3\times512\times512}$. First column: PSNR curves, over rank $r$, of the HOSVD/TSVD approximations (on the same fourth-order data) for each generalized monochrome Lena image. Second column: some quantitative PSNRs of the HOSVD/TSVD approximations with rank $r$.

Figure 3 gives a "vertical" comparison of the PSNR maps of the THOSVD and HOSVD approximations and the tabulated PSNRs for some representative multilinear rank tuples $(r_1, r_2, r_3)$. It shows that the PSNR of the THOSVD approximation is consistently higher than the PSNR of the HOSVD approximation. When $(r_1, r_2, r_3) = (500, 500, 3)$, the approximations obtained by THOSVD and HOSVD differ by 30

Figure 3. A "vertical" comparison of THOSVD approximations and HOSVD approximations with the multilinear rank tuple $(r_1, r_2, r_3)$. First column: PSNR maps of the HOSVD approximation of the RGB Lena image. Second column: PSNR maps of the THOSVD approximation for the RGB Lena image (i.e., the third-order central slice of the THOSVD approximation). Third column: some quantitative PSNRs of the HOSVD/THOSVD approximations with representative multilinear rank tuples.
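Under our own assumption that t-scalar multiplication is 2-D circular convolution, the THOSVD approximation of equation (6.2) can be sketched as a canonical truncated HOSVD applied to every Fourier slice, followed by an inverse FFT; the "vertical" comparison then keeps only the central slice `approx[1, 1]`. Truncation is written as projection onto the leading mode subspaces, which equals multiplying the truncated core by the truncated factor matrices. This is an illustration, not the authors' implementation, and the function names are hypothetical.

```python
import numpy as np


def _hosvd_slice(A, ranks):
    """Truncated HOSVD of one (complex) Fourier slice A with multilinear ranks `ranks`."""
    Us = []
    for m, r in enumerate(ranks):
        Am = np.moveaxis(A, m, 0).reshape(A.shape[m], -1)          # mode-m flattening
        Us.append(np.linalg.svd(Am, full_matrices=False)[0][:, :r])
    for m, U in enumerate(Us):
        P = U @ U.conj().T                                         # projector onto leading subspace
        A = np.moveaxis(np.tensordot(P, A, axes=(1, m)), 0, m)     # apply it along mode m
    return A


def thosvd_approx(X, ranks):
    """X: real array (I1, I2, D1, D2, D3); slice-wise truncated HOSVD in the Fourier domain."""
    Xf = np.fft.fft2(X, axes=(0, 1))
    out = np.empty_like(Xf)
    for a in range(X.shape[0]):
        for b in range(X.shape[1]):
            out[a, b] = _hosvd_slice(Xf[a, b], ranks)
    return np.real(np.fft.ifft2(out, axes=(0, 1)))


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 3, 6, 5, 3))          # a tiny stand-in for a tensorized RGB image
central = thosvd_approx(X, (2, 2, 1))[1, 1]   # central slice, compared with the HOSVD result
print(central.shape)  # (6, 5, 3)
```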
6.1.4. THOSVD versus HOSVD - A "Horizontal" Comparison. Given a fifth-order array $X \in \mathbb{R}^{3\times3\times512\times512\times3}$ tensorized from the RGB Lena image, which is a third-order array in $\mathbb{R}^{512\times512\times3}$, both THOSVD and HOSVD can be applied to the same data $X$.
Figure panels: PSNR of the HOSVD approximation (on the same fifth-order data) with rank $r_3 = 1$; PSNR of the THOSVD approximation (on the same fifth-order data) with rank $r_3 = 1$.

The average PSNR for TPCA is consistently higher than the average PSNR for PCA. The PSNR standard deviation for TPCA is slightly larger than the PSNR standard deviation for PCA, but the ratio A/S for TPCA is generally smaller than the ratio A/S for PCA. This indicates that TPCA outperforms PCA in terms of reconstruction quality.

[...] is 31.98 dB. Furthermore, the PSNR standard deviation for T2DPCA is also generally smaller than the PSNR standard deviation for 2DPCA. In terms of reconstruction quality, T2DPCA outperforms 2DPCA.
6.3. Classification. TGCA and GCA are applied to the classification of the pixel values in hyperspectral images. Hyperspectral images have hundreds of spectral bands, in contrast with RGB images which have only three spectral bands. The multiple spectral bands and high resolution make hyperspectral imagery essential in remote sensing, target analysis, classification and identification [21,15,38,10,36,24,40]. Two publicly available data sets are used to evaluate the effectiveness of TGCA and GCA for supervised classification.
6.3.1. Datasets. The first hyperspectral image dataset is the Indian Pines cube (Indian cube for short), which consists of 145 × 145 hyperspectral pixels (hyperpixels for short) and has 220 spectral bands, yielding an order-three array in $\mathbb{R}^{145\times145\times220}$. The Indian cube comes with ground-truth labels for 16 classes [31]. The second hyperspectral image dataset is the Pavia University cube (Pavia cube for short), which consists of 610 × 340 hyperpixels with 103 spectral bands, yielding an order-three array in $\mathbb{R}^{610\times340\times103}$. Its ground truth contains 9 classes [31].
6.3.2. Tensorization. Given a hyperspectral cube, let $D_1$ be the number of rows, $D_2$ the number of columns and $D$ the number of spectral bands. A hyperpixel is represented by a vector in $\mathbb{R}^{D}$. Each hyperpixel is tensorized using its 3 × 3 spatial neighborhood. The tensorized hyperspectral cube is represented by an array in $\mathbb{R}^{3\times3\times D_1\times D_2\times D}$. Each tensorized hyperpixel, called a t-hyperpixel in this paper, is represented by a t-vector of length $D$ with 3 × 3 t-scalar entries, i.e., an array in $\mathbb{R}^{3\times3\times D}$. Figure 7 shows the tensorization of a canonical vector extracted from a hyperspectral cube. The tensorization of all vectors yields a tensorized hyperspectral cube in $\mathbb{R}^{3\times3\times D_1\times D_2\times D}$.

6.3.3. Input Matrices and T-matrices.
To classify a query hyperpixel, it is necessary to extract features from the hyperpixel. A t-hyperpixel in TGCA is represented by a set of t-vectors in the 5 × 5 neighborhood of the t-hyperpixel. These t-vectors are used to construct a t-matrix. A similar construction is used for GCA.
In GCA, for example, let the vectors in the 5 × 5 neighborhood of a hyperpixel be $X_{vec,1}, \cdots, X_{vec,25}$. The ordering of the vectors should be the same for all hyperpixels. The raw matrix $X_{mat}$ representing the hyperpixel is obtained by marshalling these vectors as its columns, namely $X_{mat} \doteq [X_{vec,1}, \cdots, X_{vec,25}] \in \mathbb{R}^{D\times25}$. The associated t-matrix $X_{TM} \in \mathbb{C}^{D\times25}$ in TGCA is obtained by marshalling the associated 25 t-vectors.
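The input construction for one hyperpixel can be sketched as follows: the cube is tensorized band by band with 3 × 3 neighborhoods, and the 25 vectors (or t-vectors) of the 5 × 5 neighborhood are marshalled as columns in a fixed raster order. Zero padding at the cube boundary, the raster ordering and all function names are our assumptions; the text only requires that the ordering be the same for all hyperpixels.

```python
import numpy as np


def tensorize_cube(cube, size=3):
    """cube: (D1, D2, D) -> (size, size, D1, D2, D) using zero-padded neighborhoods."""
    h = size // 2
    padded = np.pad(cube, ((h, h), (h, h), (0, 0)))
    D1, D2, _ = cube.shape
    return np.stack([np.stack([padded[a:a + D1, b:b + D2] for b in range(size)])
                     for a in range(size)])


def neighborhood_matrix(cube, y, x, size=5):
    """Marshal the size*size hyperpixel vectors around (y, x) as columns: (D, 25)."""
    h = size // 2
    padded = np.pad(cube, ((h, h), (h, h), (0, 0)))
    patch = padded[y:y + size, x:x + size]                 # (size, size, D), raster order
    return patch.reshape(size * size, -1).T


def neighborhood_tmatrix(t_cube, y, x, size=5):
    """Same marshalling for a tensorized cube (3, 3, D1, D2, D) -> (3, 3, D, 25)."""
    h = size // 2
    padded = np.pad(t_cube, ((0, 0), (0, 0), (h, h), (h, h), (0, 0)))
    patch = padded[:, :, y:y + size, x:x + size]           # (3, 3, size, size, D)
    I1, I2 = t_cube.shape[:2]
    return np.swapaxes(patch.reshape(I1, I2, size * size, -1), -1, -2)


cube = np.random.default_rng(0).normal(size=(7, 6, 30))    # toy cube with D = 30 bands
X_mat = neighborhood_matrix(cube, 3, 2)                     # (30, 25) raw matrix for GCA
X_tm = neighborhood_tmatrix(tensorize_cube(cube), 3, 2)     # (3, 3, 30, 25) t-matrix for TGCA
Q, _ = np.linalg.qr(X_mat)                                  # one way to orthogonalize the columns
```

The central slice `X_tm[1, 1]` coincides with `X_mat`, which is what makes the GCA and TGCA inputs directly comparable.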
After obtaining each matrix and t-matrix, the columns are orthogonalized. The resulting matrices and t-matrices are the input samples for GCA and TGCA, respectively.

6.3.4. Classification. To evaluate GCA, TGCA and the competing methods, the overall accuracies (OA) and the Cohen's κ indices of the supervised classification of hyperpixels (i.e., prediction of class labels of hyperpixels) are used. The overall accuracies and κ indices are obtained for different component analyzers and classifiers. Higher values of OA or κ indicate better component analyzer performance [9]. Let $K$ be the number of query samples and let $K'$ be the number of correctly classified samples. The overall accuracy is simply defined by $\mathrm{OA} = K'/K$. The κ index is defined by [5]

$$\kappa = \frac{K \cdot K' - \sum_{j=1}^{N_{class}} a_j b_j}{K^2 - \sum_{j=1}^{N_{class}} a_j b_j}$$

where $N_{class}$ is the number of classes, $a_j$ is the number of samples belonging to the $j$-th class and $b_j$ is the number of samples classified to the $j$-th class.
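The two evaluation measures can be computed as follows; the function names are hypothetical. The κ formula is the one given above, where $\sum_j a_j b_j$ runs over every class that appears in either the true or the predicted labels.

```python
import numpy as np


def overall_accuracy(y_true, y_pred):
    """OA = K'/K: the fraction of the K query samples that are correctly classified."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true == y_pred)


def kappa(y_true, y_pred):
    """Cohen's kappa: (K*K' - sum_j a_j b_j) / (K^2 - sum_j a_j b_j)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    K = y_true.size
    K_correct = np.sum(y_true == y_pred)
    classes = np.union1d(y_true, y_pred)
    ab = sum(np.sum(y_true == c) * np.sum(y_pred == c) for c in classes)   # sum_j a_j * b_j
    return (K * K_correct - ab) / (K ** 2 - ab)


print(overall_accuracy([0, 0, 1, 1], [0, 1, 1, 1]))   # 0.75
print(kappa([0, 0, 1, 1], [0, 1, 1, 1]))              # 0.5
```

In the example, $K = 4$, $K' = 3$, $a = (2, 2)$ and $b = (1, 3)$, so $\kappa = (12 - 8)/(16 - 8) = 0.5$.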
Two classical component analyzers, namely PCA and LDA, and four state-of-the-art component analyzers, namely TDLA [40], LTDA [42], GCA [13] and TPCA (ours), are evaluated against TGCA. As an evaluation baseline, the results obtained with the original raw canonical vectors for hyperpixels are given. These raw vectors are denoted as the "original" (ORI for short) vectors. Three vector-oriented classifiers, NN (Nearest Neighbor), SVM (Support Vector Machine) and RF (Random Forest), are used.

In the experiments, the background hyperpixels are excluded, because they do not have labels in the ground truth. A total of 10% of the foreground hyperpixels are randomly and uniformly chosen without replacement as the observed samples (i.e., samples whose class labels are known in advance). The remaining foreground hyperpixels are used as the query samples, that is, samples whose class labels are to be determined.
In order to use the vector-oriented classifiers NN, SVM and RF, the t-vector results, generated by TGCA or TPCA, are transformed by pooling them to yield canonical vectors. For TGCA, the canonical vectors obtained by pooling are referred to as TGCA-I features and the t-vectors without pooling are referred to as the TGCA-II features.
To assess the effectiveness of the TGCA-II features, a generalized classifier which deals with t-vectors is needed. It is possible to generalize many canonical classifiers from vector-oriented to t-vector-oriented; however, a comprehensive discussion of these generalizations is outside the scope of this paper. Nevertheless, it is straightforward to generalize NN. The $d$-dimensional t-vectors are not only elements of the module $\mathbb{C}^{d}$, but also elements of the vector space $\mathbb{C}^{3\times3\times d}$. This enables the use of the canonical Frobenius norm to measure the distance between two t-vectors, regarded as elements of $\mathbb{C}^{3\times3\times d}$. The canonical Frobenius norm should not be confused with the generalized Frobenius norm defined in equation (3.3).

6.3.5. TGCA versus GCA. Note that the maximum dimension of the TGCA and GCA features is equal to the number of observed training samples, and is therefore much higher than the original dimension, which is equal to the number of spectral bands. Thus, taking the original dimension as the baseline, one can employ TGCA or GCA either for dimension reduction or for dimension increase. When the so-called "curse of dimensionality" is the concern, one can discard the insignificant entries of the TGCA and GCA features. When accuracy is the primary concern, one can use higher dimensional features.
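The generalized NN rule described in Section 6.3.4 treats each t-vector as an element of the vector space $\mathbb{C}^{I_1\times I_2\times d}$ and uses the canonical Frobenius distance. A minimal sketch follows; the array layout, the function name and the optional `k_dim` argument (which mimics discarding the trailing, insignificant feature entries discussed in Section 6.3.5) are our assumptions.

```python
import numpy as np


def tnn_classify(train, labels, query, k_dim=None):
    """1-NN for t-vectors using the canonical Frobenius distance.

    train: (K, I1, I2, d) training t-vectors; labels: (K,); query: (M, I1, I2, d).
    Each t-vector is flattened into an element of the vector space C^{I1 x I2 x d}.
    k_dim optionally keeps only the first k_dim t-scalar entries of every t-vector.
    """
    if k_dim is not None:
        train, query = train[..., :k_dim], query[..., :k_dim]
    tr = train.reshape(len(train), -1)
    qu = query.reshape(len(query), -1)
    # squared Frobenius distances between every query and every training t-vector
    d2 = (np.sum(np.abs(qu) ** 2, axis=1)[:, None]
          + np.sum(np.abs(tr) ** 2, axis=1)[None, :]
          - 2 * np.real(qu @ np.conj(tr).T))
    return np.asarray(labels)[np.argmin(d2, axis=1)]


rng = np.random.default_rng(0)
train = rng.normal(size=(6, 3, 3, 4))                  # six t-vectors with d = 4
labels = np.array([0, 1, 2, 0, 1, 2])
query = train[[4, 2]] + 1e-3 * rng.normal(size=(2, 3, 3, 4))
print(tnn_classify(train, labels, query))              # [1 2]
```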
The performances of TGCA and GCA for varying feature dimensions are compared using the accuracy curves generated by TGCA (i.e., TGCA-I and TGCA-II) and GCA, as shown in Figure 11. The results are obtained for low feature dimensions and for high feature dimensions. The classification accuracies obtained using TGCA-I and TGCA-II are consistently higher than the accuracies obtained using GCA.

Figure 8. Classification accuracies obtained on two hyperspectral cubes.
6.3.6. TPCA versus PCA. The classification accuracies of TPCA and PCA are compared, although the highest classification accuracies are not obtained from TPCA or PCA. The classification accuracy curves obtained by TPCA and PCA (with the classifiers NN, SVM and RF) are given in Figure 12. No matter which classifier and feature dimension are chosen, the accuracy using TPCA is consistently higher than the accuracy using PCA.

6.4. Computational Cost. The run times of t-matrix manipulations with different t-scalar sizes $I_1 \times I_2$ are given in Figure 13. The t-scalar sizes range over $1 \le I_1, I_2 \le 32$. The evaluated t-matrix manipulations include addition and conjugate transposition, among others. From Figure 13, it can be seen that the run time is essentially an increasing linear function of the number of slices, i.e., $I_1 \cdot I_2$.
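To illustrate why run time grows with the number of slices $I_1 \cdot I_2$, here is a sketch of t-matrix multiplication. It assumes (our assumption about the implementation, not a statement from the text) that the convolution product is evaluated by a 2-D FFT over the t-scalar axes, one canonical matrix product per Fourier slice, and an inverse FFT. The per-slice products cost grows linearly in $I_1 \cdot I_2$, while the FFTs add a logarithmic factor. `circ_conv2` is a slow direct reference included only to check the FFT route; both names are hypothetical.

```python
import numpy as np


def t_product(A, B):
    """t-matrix product A o B with t-scalars multiplied by 2-D circular convolution.

    A: (I1, I2, D1, D2), B: (I1, I2, D2, D3) -> (I1, I2, D1, D3).
    """
    Af = np.fft.fft2(A, axes=(0, 1))
    Bf = np.fft.fft2(B, axes=(0, 1))
    return np.real(np.fft.ifft2(Af @ Bf, axes=(0, 1)))   # one matrix product per slice


def circ_conv2(a, b):
    """Direct 2-D circular convolution of two real t-scalars (reference implementation)."""
    I1, I2 = a.shape
    c = np.zeros((I1, I2))
    for p in range(I1):
        for q in range(I2):
            c += a[p, q] * np.roll(np.roll(b, p, axis=0), q, axis=1)
    return c


rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3, 4, 5))
B = rng.normal(size=(3, 3, 5, 2))
print(t_product(A, B).shape)   # (3, 3, 4, 2)
```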

Conclusion
An algebraic framework of tensorial matrices is proposed for generalized visual information analysis. The algebraic framework generalizes the canonical matrix algebra, combining the "multi-way" merits of high-order arrays and the "two-way" intuition of matrices. In the algebraic framework, scalars are extended to t-scalars, which are implemented as high-order numerical arrays of a fixed size. With appropriate operations, the t-scalars are trinitarian in the following sense. Tensorial matrices, called t-matrices, are constructed with t-scalar elements. The resulting t-matrix algebra is backward compatible with the canonical matrix algebra. Using this t-algebra framework, it is possible to generalize many canonical matrix and vector constructions and algorithms.
To demonstrate the "multi-way" merits and "two-way" matrix intuition of the proposed tensorial algebra and its applications to generalized visual information analysis, the canonical matrix algorithms SVD, HOSVD, PCA, 2DPCA and GCA are generalized. Experiments with low-rank approximation, reconstruction, and supervised classification show that the generalized algorithms compare favorably with their canonical counterparts on visual information analysis.
Acknowledgements
Liang Liao would like to thank Professor Pinzhi Fan (Southwest Jiaotong University, China) for his support and insightful suggestions on this work. Liang Liao also [...] Support for and collaboration on this research are welcome. Contact email: liaolangis@126.com or liaoliang2018@gmail.com.

Appendix I
Before giving a proof of the equivalence of equations (4.2) and (4.3), namely, the generalized Eckart-Young-Mirsky theorem, some notation needs to be defined.
First, rank(·) denotes the rank of a t-matrix, which generalizes the rank of a canonical matrix and is defined as follows.
Definition I, rank of a t-matrix. Given a t-matrix $X_{TM}$, the rank $Y_T \doteq \mathrm{rank}(X_{TM})$ is a nonnegative t-scalar such that

$$F(Y_T)_i = \mathrm{rank}\big(F(X_{TM})(i)\big) \ge 0, \quad 1 \le i \le I, \qquad (7.1)$$

where $F(X_{TM})(i)$ denotes the $i$-th slice of the Fourier transform $F(X_{TM})$.
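Definition I can be evaluated numerically as sketched below, assuming the Fourier transform $F$ is the 2-D FFT over the t-scalar axes of an array of shape $(I_1, I_2, D_1, D_2)$. Since the definition is exact but the slice ranks are computed in floating point, the sketch uses one global tolerance, relative to the largest singular value over all slices, so that round-off in near-zero Fourier coefficients is not counted as rank; this tolerance rule and the name `t_rank` are our assumptions.

```python
import numpy as np


def t_rank(X, rtol=1e-10):
    """Rank t-scalar of a t-matrix X of shape (I1, I2, D1, D2), per equation (7.1).

    Returns the nonnegative t-scalar Y (shape (I1, I2)) and its Fourier slices,
    which are the ranks of the Fourier slices of X.
    """
    Xf = np.fft.fft2(X, axes=(0, 1))
    s = np.linalg.svd(Xf, compute_uv=False)              # singular values of every slice
    tol = rtol * max(s.max(), np.finfo(float).tiny)      # one tolerance for all slices
    ranks = np.sum(s > tol, axis=-1).astype(float)       # F(Y)_i = rank(F(X)(i))
    return np.real(np.fft.ifft2(ranks)), ranks


E = np.zeros((3, 3, 1, 1))
E[0, 0] = 1.0                        # identity t-scalar E_T (a delta at the origin)
print(t_rank(E)[1])                  # every Fourier slice has rank 1: E_T is invertible
```

For the all-ones 3 × 3 t-scalar, only the zero-frequency coefficient is non-zero, so its rank is a t-scalar strictly between $Z_T$ and $E_T$, matching the statement about non-zero, non-invertible t-scalars later in this appendix.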
Definition II, partial ordering of nonnegative t-scalars. Given two nonnegative t-scalars $X_T$ and $Y_T$, the notation $X_T \le Y_T$ is equivalent to the following condition:

$$F(X_T)_i \le F(Y_T)_i, \quad 1 \le i \le I. \qquad (7.2)$$

Definition III, minimization of a nonnegative t-scalar variable. For a nonnegative t-scalar variable $X_T$ varying in a subset of $S_{nonneg}$, $Y_T \doteq \min(X_T)$ is the nonnegative t-scalar infimum of the subset, satisfying the following condition:

$$F(Y_T)_i = \inf_{X_T} F(X_T)_i, \quad 1 \le i \le I, \qquad (7.3)$$

where $F(Y_T)$ and $F(X_T)$ respectively denote the Fourier transforms of $Y_T$ and $X_T$.
Given two nonnegative t-scalars $X_T$ and $Y_T$, let $M_T$ be the nonnegative t-scalar defined by $M_T = \min(X_T, Y_T)$, namely

$$F(M_T)_i = \min\big(F(X_T)_i, F(Y_T)_i\big), \quad 1 \le i \le I. \qquad (7.4)$$

The above definitions are not arbitrary. Following them, it is not difficult to verify that many generalized rank properties hold in forms analogous to their canonical counterparts.
For example, given any t-matrices $X_{TM} \in \mathbb{C}^{D_1\times D_2}$ and $Y_{TM} \in \mathbb{C}^{D_2\times D_3}$, the following inequalities hold.
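$$Z_T \le \mathrm{rank}(X_{TM}) \le \min(D_1, D_2) \cdot E_T \qquad (7.5)$$

$$Z_T \le \mathrm{rank}(X_{TM} + Y_{TM}) \le \mathrm{rank}(X_{TM}) + \mathrm{rank}(Y_{TM}) \qquad (7.6)$$

$$\mathrm{rank}(X_{TM}) + \mathrm{rank}(Y_{TM}) - D_2 \cdot E_T \le \mathrm{rank}(X_{TM} \circ Y_{TM}) \le \min\big(\mathrm{rank}(X_{TM}), \mathrm{rank}(Y_{TM})\big) \qquad (7.7)$$

Since a t-scalar is a t-matrix with one row and one column, the rank of a t-scalar is also defined.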
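Given any t-scalar $X_T$, let $G_T \doteq \mathrm{rank}(X_T)$ be the rank of $X_T$. Then, following equation (7.1), it is not difficult to prove that the $i$-th entry of the Fourier transform $F(G_T)$ is given as follows:

$$F(G_T)_i = \begin{cases} 1, & F(X_T)_i \ne 0 \\ 0, & F(X_T)_i = 0 \end{cases}, \quad 1 \le i \le I. \qquad (7.8)$$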
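Following the partial ordering given in (7.2) and equation (7.1), it is not difficult to prove that the following propositions hold:

$$Z_T \le \mathrm{rank}(X_T) \le E_T \ \text{for all t-scalars } X_T; \qquad \mathrm{rank}(X_T) = Z_T \iff X_T = Z_T; \qquad \mathrm{rank}(X_T) = E_T \iff X_T \text{ is invertible}. \qquad (7.9)$$

It follows from (7.9) that $Z_T < \mathrm{rank}(X_T) < E_T$ iff the t-scalar $X_T$ is non-zero and non-invertible.

Generalized rank from a TSVD perspective. Given any t-matrix $X_{TM} = U_{TM} \circ S_{TM} \circ V^{H}_{TM}$, where $S_{TM} \doteq \mathrm{diag}(\lambda_{T,1}, \cdots, \lambda_{T,k}, \cdots, \lambda_{T,Q})$ and $\lambda_{T,k}$ is a t-scalar for all $k$, the following equation holds and generalizes its canonical counterpart:

$$\mathrm{rank}(X_{TM}) = \sum_{k=1}^{Q} \mathrm{rank}(\lambda_{T,k}).$$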
For $1 \le r \le Q$, let $\hat{X}_{TM} \doteq \sum_{k=1}^{r} \lambda_{T,k} \circ (U_{TM})_{:,k} \circ (V_{TM})^{H}_{:,k}$. Then, $\hat{X}_{TM}$ is a low-rank approximation to $X_{TM}$, since the following rank inequality holds:

$$\mathrm{rank}(\hat{X}_{TM}) \equiv \sum_{k=1}^{r} \mathrm{rank}(\lambda_{T,k}) \le \mathrm{rank}(X_{TM}).$$

The canonical Eckart-Young-Mirsky theorem guarantees the equivalence of equations (7.12) and (7.13).

Figure 15. An illustrative example of t-matrix multiplication where the size of t-scalars is 3 × 3.

Figure 16. An illustrative example of a generalized tensor in $\mathbb{C}^{2\times3\times2}$ and the mode-k (k = 1, 2, 3) flattened form of the generalized tensor.