Component SPD matrices: A low-dimensional discriminative data descriptor for image set classification

In pattern recognition, the task of image set classification has often been performed by representing data using symmetric positive definite (SPD) matrices, in conjunction with the metric of the resulting Riemannian manifold. In this paper, we propose a new data representation framework for image sets which we call the component symmetric positive definite representation (CSPD). Firstly, we obtain sub-image sets by dividing the images in the set into square blocks of the same size, and describe each sub-image set with a traditional SPD model. Then, we use a Riemannian kernel to measure the similarities between the corresponding sub-image sets. Finally, the CSPD matrix takes the form of the kernel matrix over all sub-image sets; its (i, j)-th entry measures the similarity between the i-th and j-th sub-image sets. The Riemannian kernel is shown to satisfy Mercer's theorem, so the CSPD matrix is symmetric and positive definite and also lies on a Riemannian manifold. Tests on three benchmark datasets show that CSPD is both a lower-dimensional and a more discriminative data descriptor than standard SPD for the task of image set classification.

The dimensionality of the traditional SPD matrices arising from the covariance descriptors [2], [4], [5], [6], [7], [8] used for image set classification is relatively high. Although high-dimensional data descriptors carry ample information, the curse of dimensionality entails heavy computation and degrades the efficiency of the algorithms. Dimensionality reduction (DR) is therefore a recurring necessity in computer vision and machine learning. Classical methods such as PCA (Principal Component Analysis) [16] and LDA (Linear Discriminant Analysis) [17] are pervasive in various applications. However, because SPD matrices lie on a non-linear manifold, these Euclidean-space methods are not suitable for analyzing them. Recently, DR has been extended to the space of SPD matrices by taking their Riemannian structure into account. BCM (Bidirectional Covariance Matrices) [8] and SPDML [2] are DR methods on the SPD manifold. BCM is a two-directional two-dimensional PCA [18] method that works directly on the SPD matrices to obtain low-dimensional descriptors. SPDML [2] embeds the high-dimensional SPD matrices into a lower-dimensional and more discriminative SPD manifold through a projection matrix.
In this paper, we propose a new framework to obtain low-dimensional and more discriminative descriptors for representing image sets. Let S be an image set with n samples, S = [s_1, s_2, …, s_n], where s_i ∈ R^D represents the i-th image in the set. The traditional SPD model, which arises in the form of covariance descriptors [2], [4], [5], [7], [8], gives a D × D SPD matrix as the representation of the image set; its entry SPD_{i,j} is the covariance between the i-th and j-th dimensions of all images in the set, i.e., between the i-th and j-th rows of the image set matrix S. We instead want to represent an image set by describing the similarity between regions of the image set rather than between dimensions. In our CSPD model, we first divide the image set into k × k square blocks of the same size. Each resulting sub-image set is represented by a covariance descriptor, and the entry CSPD_{i,j} denotes the similarity between the i-th and j-th sub-image sets. Since the number of blocks is k^2 and CSPD_{i,j} = CSPD_{j,i}, the CSPD descriptor is a symmetric matrix of dimensionality k^2 × k^2. Finally, in order to guarantee the positive definiteness of this symmetric matrix, we take the value of a Riemannian kernel function on the SPD matrices of the sub-image sets as the similarity between sub-image sets. Figure 1 illustrates the entire procedure of our approach and of the traditional model.
The remainder of this paper is organized as follows. In Section II, we give a brief overview of the geometry of the SPD manifold and some classical Riemannian metrics. In Section III, we present the original SPD model and our proposed CSPD model, and introduce the SPD manifold-based classification algorithms used in the experiments of this paper. In Section IV, we report the experimental results, with average accuracies and standard deviations, on three tasks: object categorization, hand gesture recognition, and virus cell classification; the experiments show that our CSPD model achieves better recognition rates and improves the efficiency of the classification algorithms. In Section V, we present our conclusions and future directions.

II. RELATED WORK
In this section, we give an overview of the geometry of the SPD manifold and some classical Riemannian metrics. Throughout this paper we use the following notation: S_d^+ is the space of real d × d SPD matrices; Sym_d is the space of real d × d symmetric matrices, which is the tangent space of S_d^+ at the identity matrix I_{d×d}; and T_P S_d^+ is the tangent space of S_d^+ at a point P ∈ S_d^+.

A. SPD manifold
SPD matrices have proved to be a powerful data representation for images and image sets via covariance [7], [8] and region covariance [15] descriptors. The space of SPD matrices does not satisfy the scalar multiplication axiom of a vector space: for example, multiplying an SPD matrix by a negative scalar yields a matrix that no longer lies in S_d^+ [11]. Measuring the similarity between two SPD matrices with Euclidean metrics is therefore not appropriate, and Riemannian metrics have been shown to work better on SPD matrices. As studied in [2], the SPD manifold formed by SPD matrices is a Riemannian manifold and constitutes the interior of a convex cone in Euclidean space.
A variety of Riemannian metrics on the SPD manifold have been proposed. In particular, the AIRM (Affine Invariant Riemannian Metric) [2], [8] is the most studied; it is the geodesic distance between two SPD matrices and is invariant to affine transformations. The Stein divergence and the Jeffrey divergence [2], [10], efficient metrics akin to AIRM for measuring the geodesic distance between two SPD matrices, are Bregman divergences for particular seed functions. The LEM (Log-Euclidean Metric) [7], [8] measures the similarity between two SPD matrices by computing distances in the space of matrix logarithms, which is the tangent space at the identity matrix. Below, we introduce AIRM and LEM in detail, as these two metrics are used in our experiments.

B. Affine Invariant Riemannian Metric
S_d^+ can be viewed as a convex cone in d(d + 1)/2-dimensional Euclidean space [2]. The similarity between two SPD matrices on the manifold can be described by the length of the geodesic curve connecting them, which is analogous to the straight line between two points in a vector space. The AIRM [2], [8] is one of the most popular Riemannian metrics on the SPD manifold; it measures the similarity between two points by the geodesic distance between them. For a point P on the SPD manifold, the AIRM is defined through two tangent vectors V, W ∈ T_P S_d^+:

⟨V, W⟩_P = tr(P^{-1} V P^{-1} W).    (1)

The geodesic distance d_AIRM between two points X and Y on the SPD manifold can then be written as:

d_AIRM(X, Y) = ||log(X^{-1/2} Y X^{-1/2})||_F,    (2)

where || · ||_F denotes the Frobenius norm and log(·) is the matrix logarithm operator.
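As an illustration, the AIRM geodesic distance of Eq. (2) can be computed in a few lines of NumPy. This is a minimal sketch (the helper names `_sym_apply` and `airm_distance` are ours, not from the paper); for SPD matrices the inverse square root and logarithm reduce to applying scalar functions to the eigenvalues:

```python
import numpy as np

def _sym_apply(X, f):
    # Apply a scalar function f to a symmetric matrix via eigendecomposition.
    w, U = np.linalg.eigh(X)
    return (U * f(w)) @ U.T

def airm_distance(X, Y):
    # Eq. (2): d_AIRM(X, Y) = ||log(X^{-1/2} Y X^{-1/2})||_F
    X_isqrt = _sym_apply(X, lambda w: 1.0 / np.sqrt(w))
    M = X_isqrt @ Y @ X_isqrt
    return np.linalg.norm(_sym_apply(M, np.log), 'fro')
```

A quick numerical check confirms the affine invariance mentioned above: replacing X and Y with G X G^T and G Y G^T for any invertible G leaves the distance unchanged.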
C. Log-Euclidean Metric
The LEM (Log-Euclidean Metric) [4], [7], [8], [11] is a bi-invariant Riemannian metric derived from the Lie group multiplication ⊙ on SPD matrices [11]:

X ⊙ Y = exp(log(X) + log(Y)),    (3)

where X and Y lie on the SPD manifold. The distance d_LEM between these two SPD matrices under this metric can be written as:

d_LEM(X, Y) = ||log(X) − log(Y)||_F,    (4)

where log(·) is the matrix logarithm operator and || · ||_F denotes the Frobenius norm. The LEM distance can be viewed as the distance between points in the tangent space Sym_d obtained by projecting the SPD manifold S_d^+ through the logarithm mapping [7], [8]:

log: S_d^+ → Sym_d, X ↦ log(X),    (5)

where Sym_d is a vector space; Figure 2 gives a conceptual illustration of the logarithm mapping. In accordance with the Riemannian multiplication ⊙ on SPD matrices, a scalar multiplication ⊗ can be defined [11] as:

λ ⊗ X = exp(λ log(X)),    (6)

where λ is a real scalar. Endowed with the Riemannian multiplication ⊙ and the Riemannian scalar multiplication ⊗, S_d^+ becomes a vector space [11]. Furthermore, a Riemannian kernel function can be defined through the Log-Euclidean inner product [7], [11]:

k_LogE(X, Y) = tr(log(X) log(Y)).    (7)

For any points X_1, …, X_m ∈ S_d^+, k_LogE is a symmetric function because k_LogE(X_i, X_j) = k_LogE(X_j, X_i).
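The LEM distance of Eq. (4) and the Log-Euclidean inner product of Eq. (7) are straightforward once the matrix logarithm is available; for SPD matrices it reduces to taking logs of the eigenvalues. A minimal NumPy sketch (function names are ours):

```python
import numpy as np

def logm_spd(X):
    # Matrix logarithm of an SPD matrix via eigendecomposition.
    w, U = np.linalg.eigh(X)
    return (U * np.log(w)) @ U.T

def lem_distance(X, Y):
    # Eq. (4): d_LEM(X, Y) = ||log(X) - log(Y)||_F
    return np.linalg.norm(logm_spd(X) - logm_spd(Y), 'fro')

def k_loge(X, Y):
    # Eq. (7): Log-Euclidean inner product tr(log(X) log(Y)).
    return np.trace(logm_spd(X) @ logm_spd(Y))
```

Note that d_LEM(X, Y)^2 = k_LogE(X, X) + k_LogE(Y, Y) − 2 k_LogE(X, Y), i.e., the kernel induces the LEM distance, which is what makes Eq. (7) a natural similarity measure.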
According to [7], the kernel of Eq. (7) is positive definite: for any real coefficients c_1, …, c_m,

∑_{i,j} c_i c_j k_LogE(X_i, X_j) = ||∑_i c_i log(X_i)||_F^2 ≥ 0.    (8)

Eq. (8) shows that the Log-Euclidean kernel guarantees the positive definiteness of the Riemannian kernel and satisfies Mercer's theorem. The kernel matrix of a set of points on the SPD manifold is therefore itself an SPD (Symmetric Positive Definite) matrix.

III. THE ORIGINAL SPD MODEL AND THE PROPOSED CSPD MODEL
In this section, we recall the original SPD model for image sets obtained by covariance descriptors [7], [8] and introduce our CSPD (Component Symmetric Positive Definite) model in detail.

A. Original Symmetric Positive Definite model
For an image set with n images, S = [s_1, s_2, …, s_n], where s_i is the i-th image sample represented as a D-dimensional vector, the covariance matrix [2], [7], [8] is computed from the raw intensities of the samples in the image set:

C = (1/(n − 1)) ∑_{i=1}^n (s_i − μ)(s_i − μ)^T = (1/(n − 1)) S J_n S^T,    (9)

where μ is the mean of all image vectors in the set S, J_n = I_n − (1/n) 1_n 1_n^T is the centering matrix, and 1_n is a column vector of n ones [2]. J_n is a symmetric matrix with rank(J_n) = n − 1 and J_n^2 = J_n. In general, the number of images in the set is smaller than the dimensionality of the feature vector, so the covariance matrix is not positive definite and we need to add a small perturbation [7]:

C* = C + λ I_D,    (10)

where λ is set to 10^{-3} and I_D is the identity matrix [7]. The image sets are now modeled as SPD matrices, which form the SPD manifold. In general, the dimensionality of the covariance descriptor is high; our model overcomes this limitation.
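Eqs. (9)-(10) can be sketched directly in NumPy (the function name `spd_descriptor` is ours); the centering matrix J_n makes the mean subtraction implicit:

```python
import numpy as np

def spd_descriptor(S, lam=1e-3):
    # S: D x n matrix whose columns are the vectorized images of the set.
    D, n = S.shape
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix J_n, Eq. (9)
    C = S @ J @ S.T / (n - 1)             # sample covariance
    return C + lam * np.eye(D)            # small perturbation, Eq. (10)
```

Even when n < D, so that the raw covariance is rank-deficient, the perturbed matrix has all eigenvalues at least λ and is therefore positive definite.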

B. Component Symmetric Positive Definite model
In our CSPD model, we first divide the image set into k × k square blocks of the same size. Each block of the image set is described by a covariance descriptor (Eq. 9), so there are k^2 SPD matrices, one per block. Our model is designed to describe the relationships between these blocks of the image set.
For example, following the path (a)-(c)-(d) in Fig. 1, the image set is first divided into 2 × 2 square blocks, forming 4 sub-image sets B_1, B_2, B_3 and B_4. Correspondingly, there are 4 covariance descriptors C_1, C_2, C_3 and C_4 for the 4 sub-image sets, and the resulting 4 × 4 matrix K describes the similarities between the sub-image sets. Note that the SPD matrix in Fig. 1(c) lies on a much higher-dimensional SPD manifold, whereas the covariance descriptors of the blocks are of lower dimensionality. In order to measure the similarity between sub-image sets, we use the Log-Euclidean inner product (Eq. 7) on their covariance descriptors:

K_{i,j} = k_LogE(C_i, C_j) = tr(log(C_i) log(C_j)),    (11)

where K_{i,j} measures the similarity between the i-th and j-th sub-image sets, and K_{i,j} = K_{j,i}. We use the Log-Euclidean inner product [7], [11] because it guarantees the positive definiteness of the CSPD. The final CSPD descriptor thus takes the form of the Riemannian kernel matrix of the covariance descriptors C_i over all sub-image sets.
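The whole CSPD construction can be sketched in a few lines of NumPy, assuming the image set is stored as an n × H × W array with H and W divisible by k (the function name and array layout are our own illustration, not from the paper):

```python
import numpy as np

def cspd_descriptor(image_set, k, lam=1e-3):
    # image_set: n x H x W array of n images; divide into k x k blocks.
    n, H, W = image_set.shape
    bh, bw = H // k, W // k
    logs = []
    for i in range(k):
        for j in range(k):
            block = image_set[:, i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            S = block.reshape(n, -1).T                 # (bh*bw) x n
            Sc = S - S.mean(axis=1, keepdims=True)
            C = Sc @ Sc.T / (n - 1) + lam * np.eye(S.shape[0])  # Eqs. (9)-(10)
            w, U = np.linalg.eigh(C)
            logs.append((U * np.log(w)) @ U.T)         # log of each C_i
    m = k * k                                          # number of blocks
    K = np.empty((m, m))
    for a in range(m):
        for b in range(a, m):
            K[a, b] = K[b, a] = np.trace(logs[a] @ logs[b])  # Eq. (11)
    return K                                           # the CSPD descriptor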

C. Classification algorithms based on SPD manifold
The NN (nearest neighbor) algorithm is one of the simplest methods for classification and regression in computer vision and pattern recognition. It assigns an input point the class of its closest neighbor, and its accuracy varies with the chosen geometric metric. Following [8], we apply NN classification based on AIRM and LEM to the SPD manifold; these simple classifiers clearly expose the benefits of our CSPD model.
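A nearest neighbor classifier on the SPD manifold takes only a few lines once a Riemannian distance is fixed. The sketch below (names are ours) uses the LEM distance; the AIRM variant differs only in the distance function passed in:

```python
import numpy as np

def logm_spd(X):
    # Matrix logarithm of an SPD matrix via eigendecomposition.
    w, U = np.linalg.eigh(X)
    return (U * np.log(w)) @ U.T

def lem_distance(X, Y):
    # LEM distance ||log(X) - log(Y)||_F.
    return np.linalg.norm(logm_spd(X) - logm_spd(Y), 'fro')

def nn_classify(query, train_descs, train_labels, dist=lem_distance):
    # Label the query descriptor with the class of its nearest
    # training descriptor under the chosen Riemannian distance.
    d = [dist(query, T) for T in train_descs]
    return train_labels[int(np.argmin(d))]
```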
In [7], CDL (covariance discriminative learning) was proposed for image set classification. It fully accounts for the geometric properties of the Riemannian manifold rather than applying classical classification algorithms to the SPD manifold directly: it derives a kernel function that maps the SPD matrices from the Riemannian manifold to a Euclidean space through the LEM metric. With this mapping, classical linear-space classification algorithms can be exploited in kernel form; [7] considers LDA (linear discriminant analysis) and PLS (partial least squares) for the classification task.
Lastly, we introduce the Riemannian sparse coding algorithm LogEKSR [11], which applies sparse representation and dictionary learning to SPD matrices by mapping them into an RKHS (Reproducing Kernel Hilbert Space) and obtaining the sparse coefficients through Log-Euclidean kernels. Note that the Log-Euclidean kernels in this algorithm are derived from Eq. 7.

IV. EXPERIMENTAL RESULTS AND ANALYSIS
In order to verify the effectiveness of our model, we conduct experiments on three tasks: object categorization, hand gesture recognition, and virus cell classification, using the ETH-80 [4], Cambridge hand gesture (CG) [5], and Virus [13] datasets respectively. In our experiments, we compare the accuracies of our CSPD model with the original SPD model under the same classification algorithms. Firstly, we use the most common nearest neighbor classifier based on AIRM [2], [8] and on LEM [4], [8], [11], both introduced in Section II; the NN classifier is a simple way to display the advantage of our model. Secondly, we employ classical Riemannian classification algorithms: the Log-E polynomial-kernel-based LogEKSR (Log-Euclidean Kernels for Sparse Representation) [11] and the LDA-based CDL (Covariance Discriminative Learning) [7], which are efficient methods on the SPD manifold. We name the different algorithms as follows:
• NN-AIRM_SPD: AIRM-based nearest neighbor classifier on the SPD manifold spanned by original SPD matrices.
• NN-AIRM_CSPD: AIRM-based nearest neighbor classifier on the CSPD manifold spanned by our proposed CSPD matrices.
• NN-LogED_SPD: LEM-based nearest neighbor classifier on the SPD manifold spanned by original SPD matrices.
• NN-LogED_CSPD: LEM-based nearest neighbor classifier on the CSPD manifold spanned by our proposed CSPD matrices.
• CDL_SPD: CDL-based classifier on the SPD manifold spanned by original SPD matrices.
• CDL_CSPD: CDL-based classifier on the CSPD manifold spanned by our proposed CSPD matrices.
• LogEKSR_SPD: LogEKSR-based classifier on the SPD manifold spanned by original SPD matrices.
• LogEKSR_CSPD: LogEKSR-based classifier on the CSPD manifold spanned by our proposed CSPD matrices.
In our experiments, we resize all images to 24 × 24, so the image set can be divided into 2 × 2, 3 × 3, 4 × 4, 6 × 6, 8 × 8, or 12 × 12 blocks. With this image size, the dimensionality of the original SPD descriptor is 576 × 576, while the dimensionality of the CSPD descriptor is 4 × 4, 9 × 9, 16 × 16, 36 × 36, 64 × 64, or 144 × 144 respectively. Thus, purely in terms of dimensionality, our approach yields a much lower-dimensional data representation. Next, we use the experimental results to verify the discriminative power of our model.
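The dimensionalities quoted above follow from simple arithmetic; the snippet below is merely a sanity check of those numbers:

```python
# For 24 x 24 images: the original SPD descriptor is D x D with D = 24 * 24,
# while a k x k block division yields a k^2 x k^2 CSPD descriptor.
image_side = 24
spd_dim = image_side * image_side
assert spd_dim == 576
cspd_dims = []
for k in (2, 3, 4, 6, 8, 12):
    assert image_side % k == 0        # the image divides evenly into blocks
    cspd_dims.append(k * k)
    assert k * k < spd_dim            # CSPD is always lower-dimensional
print(cspd_dims)                      # -> [4, 9, 16, 36, 64, 144]
```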

A. Object Categorization on the dataset ETH-80
For the task of object categorization, we selected the ETH-80 dataset. ETH-80 contains images of eight categories: apples, pears, tomatoes, cows, dogs, horses, cups, and cars. Each class has 10 image sets, and each image set consists of 41 images taken from different angles. Fig. 3 shows sample images from the ETH-80 dataset. For each class, we randomly chose 2 image sets as training data and used the remaining image sets as test data. We report the average accuracies and standard deviations over 10 cross-validation runs.
Table 1 shows the performance of our CSPD model and the original SPD model under the same classification algorithms. The CSPD results for the four classification algorithms are obtained with the image set divided into 6 × 6 blocks. The results of the NN classifiers under both Riemannian metrics and of the CDL-based classifier improve significantly with our CSPD model. In particular, the NN classifiers NN-AIRM_CSPD and NN-LogED_CSPD based on our CSPD model not only outperform NN-AIRM_SPD and NN-LogED_SPD, but also outperform CDL_SPD and LogEKSR_SPD based on the original SPD model. Moreover, CDL_CSPD and LogEKSR_CSPD are more accurate than CDL_SPD and LogEKSR_SPD, and LogEKSR_CSPD achieves the best accuracy of 89.92% with the lowest standard deviation of 3.84.

B. Hand Gesture Recognition
The Cambridge hand gesture dataset, composed of high-resolution color sequences acquired by the Senz3D sensor, contains image sequences of hand gestures defined by 3 primitive hand shapes and 3 primitive motions. There are 900 image sets across 9 classes, with 100 image sets per class (see Fig. 4 for examples). For the hand gesture recognition task, 20 image sets of each class were randomly selected as training data and the remaining image sets were used as test data. Ten-fold cross-validation experiments were run on this dataset, and Table 2 reports the average accuracies and standard deviations over the ten runs. The CSPD results for the four classification algorithms are obtained with the image set divided into 6 × 6 blocks. For all classifiers, the CSPD model yields higher recognition rates and lower standard deviations; in particular, the NN classifiers with the CSPD model have a clear advantage over the SPD model. Among all methods, LogEKSR_CSPD achieves the best recognition rate of 91.02% with a low standard deviation of 1.54.

C. Virus Cell Classification
The Virus dataset contains 15 categories; each category contains 5 image sets, each with 20 images taken from different angles. We arbitrarily chose 3 image sets per category for training and used the rest for testing. Figure 5 shows sample images from the Virus dataset. Table 3 reports the average recognition rates and standard deviations over ten runs for the four algorithms with both the SPD and CSPD image set descriptors. The CSPD results for the four classification algorithms are obtained with the image set divided into 4 × 4 blocks. Note that the recognition rates of all methods with the CSPD model are higher than with the SPD model. In particular, the accuracy of NN-LogED_CSPD with the CSPD model is close to that of CDL_SPD with the original SPD model. Among all methods, CDL_CSPD achieves the best recognition rate of 54.50%.

D. Effects of block number
Here, we examine the effect of the number of blocks on the average accuracies, standard deviations, and running time under the same classification algorithms. We use the ETH-80 dataset as an example and adopt the following notation:
• SPD_OR: the data representation obtained by the covariance descriptor.
• CSPD_{n×n}: the CSPD descriptor obtained by dividing the image set into n × n blocks.

1) Effects of block number on average accuracies and standard deviations
To display the effect of the block number, Table 4 shows the average accuracies on the ETH-80 dataset for the 6 kinds of CSPD descriptors arising from different segmentations of the image set, together with the original SPD descriptor. The recognition rates of the four classification algorithms with the CSPD model are lower than with the original SPD model when the image set is divided into 2 × 2 blocks, but all four algorithms achieve better recognition rates when the image set is divided into 6 × 6 square blocks. Together with the standard deviations in Table 5, these results show that our CSPD model attains higher recognition rates and lower standard deviations on ETH-80 when the image set is divided into 6 × 6 blocks; the CSPD results in Table 1 are therefore obtained with this 6 × 6 division. Similarly, we divide the image sets of the Cambridge hand gesture dataset into 6 × 6 blocks to obtain the CSPD results in Table 2, and the image sets of the Virus cell dataset into 4 × 4 blocks to obtain the CSPD results in Table 3. We omit the corresponding per-block-number accuracy and standard deviation tables for the Cambridge hand gesture and Virus cell datasets.

2) Effects of block number on running time
The dimensionality of our CSPD matrices is lower than that of the original SPD matrices, which saves running time. We consider the efficiency of our CSPD model from two aspects: 1) the running time of the classification algorithms with the different data representation models; 2) the time needed to compute the data descriptors (SPD or CSPD) from an image set.
Table 6 shows the time needed to compute the data descriptors (SPD or CSPD) from an image set, in seconds. As can be seen from Table 6, computing the CSPD takes less time than the original SPD when the image set is divided into 2 × 2, 3 × 3, 4 × 4, or 6 × 6 blocks; overall, the gap between the descriptor computation times is relatively small. As shown in Table 7, where times are given in milliseconds, the advantages of our CSPD model under the same classification algorithm are obvious for every block division. From these two tables we observe that our data descriptor greatly improves the efficiency of the classification algorithms.

V. CONCLUSION
In this paper, we propose the CSPD (Component Symmetric Positive Definite) model to extract novel descriptors for image sets. The superior performance of our proposed CSPD stems from its greater discriminative ability and lower dimensionality. Regarding discriminative ability, the recognition rates of CSPD are higher than those of the traditional SPD under the same classification algorithms, as expressed most directly by the comparisons between the two nearest neighbor classifiers. Regarding dimensionality, the time complexity is reduced and the efficiency of the algorithms is improved significantly. In future work, we will study further data descriptors for image set classification.

Fig. 1 .
Fig. 1. Flow chart of the traditional SPD model and our CSPD model. For an image set, the traditional SPD model follows path (a)-(b)-(c): the resulting SPD matrix is computed by the covariance descriptor and lies on a non-linear geometric structure, the SPD manifold. Our CSPD (Component Symmetric Positive Definite) model follows path (a)-(d)-(e)-(f): we first divide the image set into square blocks of the same size, obtain the representation of the i-th sub-image set B_i with the traditional SPD model, and then use a Riemannian kernel to describe the similarity between the sub-image sets; the final CSPD takes the form of the Riemannian kernel matrix over the representations of the sub-image sets.

Fig. 5 .
Fig. 5. Images in the Virus cell dataset.

Table 4 shows the average accuracies of different data descriptors with the NN-AIRM (AIRM-based nearest neighbor), NN-LogED (LEM-based nearest neighbor), CDL, and LogEKSR classification algorithms. Each row gives the average recognition rates of one data descriptor under the different classification algorithms, and each column gives the average recognition rates of one classification algorithm with the different data descriptors.
Table 7 compares the running times of the different data descriptors with the classification algorithms. Each row gives the running times of one data descriptor under the different classification algorithms, and each column gives the running times of one classification algorithm with the different data descriptors.

Table 1 :
Recognition rates and standard deviations for the ETH-80 dataset

Table 2 :
Recognition rates and standard deviations for the CG dataset

Table 3 :
Recognition rates and standard deviations for the Virus cell dataset

Table 4
Table 5
To show the robustness of the original SPD model and our proposed CSPD model on the ETH-80 dataset, Table 5 gives the average standard deviations over ten experiments. Each row gives the standard deviations of one data descriptor under the different classification algorithms, and each column gives the standard deviations of one classification algorithm with the different data descriptors. As shown in Table 5, the standard deviations of our CSPD model are generally lower than those of the original SPD model under the same classification algorithm, in particular when the image set is divided into 3 × 3, 4 × 4, 6 × 6, 8 × 8, or 12 × 12 blocks.

Table 6
Time needed to compute the data descriptors from an image set

Table 7
Running time comparison of different data descriptors with the same classification algorithm