Self-organizing mappings on the flag manifold with applications to hyper-spectral image data analysis

A flag is a nested sequence of vector spaces. The type of the flag encodes the sequence of dimensions of the vector spaces making up the flag. A flag manifold is a manifold whose points parameterize all flags of a fixed type in a fixed vector space. This paper provides the mathematical framework necessary for implementing self-organizing mappings on flag manifolds. Flags arise implicitly in many data analysis contexts including wavelet, Fourier, and singular value decompositions. The proposed geometric framework in this paper enables the computation of distances between flags, the computation of geodesics between flags, and the ability to move one flag a prescribed distance in the direction of another flag. Using these operations as building blocks, we implement the SOM algorithm on a flag manifold. The basic algorithm is applied to the problem of parameterizing a set of flags of a fixed type.


Introduction
Self-Organizing Mappings (SOMs) were introduced as a means to see data in high-dimensions [8][9][10][11].This competitive learning algorithm effectively transports the notion of proximity in the data space to proximity in the index space (which may in turn be endowed with its own geometry).As a tool, SOMs have been widely applied and extended [5].The goal of the SOM algorithm is to produce a topology preserving mapping from a high-dimensional space to a low-dimensional space in the sense that points that are neighbors in the high-dimensional space are also represented as neighbors in the low-dimensional index space.
The geometric framework of the vanilla version of the SOM algorithm is Euclidean space.In this setting, the distance between points is simply the standard 2-norm of the vector difference.The movement of a center toward a pattern takes place on a line segment in the ambient space.The only additional ingredient to the algorithm is a metric on the index space.Some additional treatments are needed when data are living on a high-dimensional manifold rather than Euclidean space.In [20], the author proposed a modification of the Self-organizing map algorithm to learn the manifold structure in the high-dimensional observation coordinates.Motivated by the subspace approach to data analytics we proposed a version of SOM using the geometric framework of the Grassmannian [3,[17][18][19].This subspace approach has proven to be effective in settings where you have a collection of subspaces built up from a set of patterns drawn from a given family.Given that one can compute distances between points on a Grassmann manifold and that one can move one point in the direction of another, it is possible to transport the SOM algorithm on Euclidean space to an SOM algorithm on a Grassmannian [7,14].
An interesting structure that generalizes Grassmannians and encodes additional geometry in data is known as the flag manifold.The points of a flag manifold parameterize the flags of a given type.Thus, a single point on a flag manifold corresponds to a sequence of nested subspaces.

Introduction to flag manifold with data analysis examples
In this section, we introduce the basics of the flag manifold, fix some terminology and notation, and provide examples of its appearance in the context of data analysis.
A flag of subspaces in R n is a nested sequence of sub- The signature or type of the flag refers to the dimensions of the V i .There are two standard ways to encode this dimension information.One way is as the sequence ðdim V 1 ; dim V 2 ; . ..; dim V d Þ.
A second way to encode this dimension information is as the sequence ðdim In this paper, we will use this second encoding for the type of flag.We let FLðn 1 ; n 2 ; . ..; n d Þ denote the flag manifold whose points parameterize all flags of type ðn 1 ; n 2 ; . ..; n d Þ.Thus, a point on this flag manifold corresponds to a nested sequence of As a special case, a flag of type ð1; 1; . ..; 1Þ is referred to as a full flag and FLð1; 1; . ..; 1Þ denotes the manifold whose points parameterize all full flags in R n .Figure 1 illustrates the nested structure of the first three low-dimensional elements comprising a full flag in R n .
A flag of type ðk; n À kÞ is simply a kÀdimensional subspace of R n (which can be considered as a point on the Grassmann manifold Gr(k, n)).Hence FLðk; n À kÞ ¼ Grðk; nÞ.The Grassmannian-SOM algorithm is developed in [7,14].The idea that the flag manifold is a generalization of the Grassmann manifold will be utilized later to introduce the geodesic formula on the flag manifold.The nested structure inherent in a flag shows up naturally in the context of data analysis.1. Wavelet analysis: Wavelet analysis and its associated multiresolution representation produces a nested sequence of vector spaces that approximate data with increasing resolution [2,15,16].Each scaling subspace V j is a dilation of its adjacent neighbor V jþ1 in the sense that if f ðxÞ 2 V j then a reduced resolution copy f ðx=2Þ 2 V jþ1 .The scaling subspaces are nested and in the finite-dimensional setting can be considered as a point on a flag manifold.In Fig. 2, we visualize points on the geodesic between the flags associated with Daubechies2 (Haar) and Daubechies4 as they are applied to a particular image of size 32 Â 32.To be more specific, for each timestamp t (0 t 1), we have a flag corresponding to a point on the geodesic between Daubechies2 and Daubechies4.Using the flag corresponding to one of these time steps, we can transform an MNIST image (considered as a 32 Â 32 matrix) by multiplying on both the left and the right by the projection matrices associated to each subspace in the flag.In this figure, each row is showing how this transformation affects the 32 Â 32 MNIST image while morphing along this geodesic.Each column is a visualization of the nested scaling subspaces, i.e., a 4-dimensional scaling subspace living in an 8-dimensional scaling subspace living in a 16-dimensional scaling subspace living in a 32-dimensional ambient space.Note that the last column remains constant for all t since it recovers the original MNIST image.2. SVD basis of a real data matrix: Let X 2 R nÂk be a real data matrix consisting of k samples in R n .Let URV T ¼ X be the thin SVD of X.The columns of the n-by-d orthonormal matrix U is an ordered basis for the column span of X.This basis is ordered by the magnitude of the singular values of X.This order provides a straightforward way to associate to U a point on a flag manifold.If U ¼ ½u 1 ju 2 j. ..ju d then the nested subspaces is a flag of type ð1; 1; . ..; 1; n À dÞ in R n .After we introduce the distance metric on the flag manifold in Sect.3.2, one could consider computing the distance between two flags, perhaps derived from a thin SVD of two different data sets, which takes the order of the bases into consideration.

Numerical representation and geodesics
A point in the vector space R n can be naturally represented by an n Â 1 vector.For a more abstract object like a Grassmann or flag manifold, we need a way to represent points in such a way that we can do computations.In this section, we describe how we can represent points and we describe how to determine and express geodesic paths between points.Note that in this paper we are using exp and log to denote the matrix exponential and the matrix logarithm.

Flag manifold
The flag manifold FLðn 1 ; n 2 ; . ..; n d Þ is a manifold whose points parameterize the set of all flags of type ðn 1 ; n 2 ; . ..; n d Þ.The presentation in [4] describes how to view the Grassmann manifold Gr(k, n) as the quotient Fig. 2 A visualization of the geodesic between the flags associated with Daubechies2 (Haar) and Daubechies4 transform matrix of size 32 Â 32 manifold OðnÞ=OðkÞ Â Oðn À kÞ where O(n) denotes the orthogonal group and OðkÞ Â Oðn À kÞ denotes the block diagonal matrix with elements from O(k) in the first block and elements from Oðn À kÞ in the second block.If we let SO(n) denote the special orthogonal group and SðOðkÞ Â Oðn À kÞÞ denote the subgroup of OðkÞ Â Oðn À kÞ consisting of matrices having determinant 1, then an equivalent description of Gr(k, n) is as the quotient manifold SOðnÞ=SðOðkÞ Â Oðn À kÞÞ.
In the same way, FLðn 1 ; n 2 ; . ..; SOðnÞ be an n-by-n special orthogonal matrix, the equivalence class [P], representing a point on the flag manifold, is the set of special orthogonal matrices A manifold closely related to FLðn 1 ; n 2 ; . ..; This map is subjective and is a 2 dÀ1 cover of FLðn 1 ; n 2 ; . ..; n d Þ.Thus, the inverse image of a point in the flag manifold is a collection of 2 dÀ1 points in the fully oriented flag manifold.
It is well known that the geodesic paths on SO(n) are given by exponential flows PðtÞ ¼ P expðtAÞ where A 2 R nÂn is any skew symmetric matrix and Pð0Þ ¼ P. The geodesics on SO(n), i.e., PðtÞ ¼ P expðtAÞ, continue to be geodesics on FLðn 1 ; n 2 ; . ..; n d Þ as long as they are perpendicular to the orbits generated by SðOðn 1 Þ Â Oðn 2 Þ Â Á Á Á Â Oðn d ÞÞ, which requires further constraints on the tangent vector A. FLðn 1 ; n 2 ; . ..; The tangent space to SO(n) at P, T P SOðnÞ, can be decomposed into a vertical space V P and a horizontal space H P .The vertical space is the set of vectors in the tangent space corresponding to motions flowing along the equivalence class [P] at P. The horizontal space is defined as the orthogonal (with respect to the Euclidean metric) complement of the vertical space in T P SOðnÞ.The Euclidean metric is defined as a function d : T P SOðnÞ Â T P SOðnÞ7 !R: dðU; VÞ ¼ TrðU T VÞ ¼ vecðUÞ T vecðVÞ Intuitively, the vectors in the vertical space can be thought of as the set of velocity vectors which preserve the equivalence class, while the vectors in the horizontal space modify the equivalence class.Therefore, tangent vectors to geodesics need to be further constrained to the horizontal space.If V is a tangent vector to FLðn 1 ; n 2 ; . ..; n d Þ, then there is a horizontal vector V 2 H P which represents V uniquely, which gives a numerical/matrix representation to the tangent vectors.
The vertical space at a point P is the set of matrices where A i is a n i -by-n i skew symmetric matrix.The horizontal space H P is the set of matrices which are orthogonal to the vertical space and living in T P SOðnÞ.Consider the following set of equations where A 2 R nÂn and A i 2 R n i Ân i are skew symmetric matrices.By solving the above system of equations, we can conclude that the horizontal space at P is the set of matrices where 0 n i denotes an n i Â n i matrix of zeros.This leads one to conclude that the geodesic paths on FLðn 1 ; n 2 ; . ..; n d Þ are exponential flows: where C is any skew symmetric matrix of the form 3.2 Skew symmetric matrix determines a geodesic between two points By Eq. ( 1), one may trace out the geodesic path on a flag manifold emanating from P in the direction of C. In this section, we utilize Eq. ( 1) to solve the inverse problem: Given two points Q 1 ; Q 2 2 SO(n), whose equivalence classes ½Q 1 ; ½Q 2 represent flags of type ðn 1 ; n 2 ; . ..; n d Þ, the goal is to obtain a factorization for H and M where H and M are constrained to be of the form where For additional details on this formula, see [4,21].If one starts with two special orthogonal matrices Q 1 and Q 2 , one can consider their equivalence classes ½Q 1 ; ½Q 2 as two points on a flag manifold.In order to compute their distance apart on the flag manifold FLðn 1 ; n 2 ; . ..; n d Þ, we consider the inverse image of ½Q 1 and ½Q 2 under the map /.The inverse image of each determines 2 dÀ1 points on FL O ðn 1 ; n 2 ; . ..; n d Þ.The algorithms that we propose in the next section compute the shortest length of a geodesic between a point in the inverse image of ½Q 1 and a point in the inverse image of ½Q 2 .The length of this shortest geodesic is the distance between ½Q 1 and ½Q 2 as points on FLðn 1 ; n 2 ; . ..; Equation ( 2) can be interpreted in the following way.First, we map Q 1 to a representative in ½Q 2 via the geodesic determined by the velocity matrix H. Second, we map this element in ½Q 2 to Q 2 via the matrix M. Figure 3 is a pictorial illustration of the idea behind Eq. (2).
For the more general case of computing the length of the geodesic between ½Q 1 and ½Q 2 (as shown in Fig. 3), we will present an iterative algorithm to obtain a numerical approximation of H and M in Sect.3.3.Before we proceed to the algorithm, let us further simplify Eq. ( 2) by letting This allows us to rewrite (1) as Here we define W as the vector space of all n-by-n skew symmetric matrices.Let p ¼ ðn 1 ; n 2 ; . ..; n d Þ.We define W p to be the set of all block diagonal skew symmetric matrices of type p and its orthogonal complement W ? p in W , i.e., where by definition, G i 2 R n i Ân i is skew symmetric for each i.Instead of solving Eq. ( 4) directly, we propose to solve the following alternative equation: where G 2 W p and H 2 W ? p .However, it is important to note that expðGÞ will produce an element in As a consequence, in these computations we implicitly work on the fully oriented flag manifold The fact that there is the natural 2 dÀ1 to 1 map from the fully oriented flag manifold to the flag manifold means we must compute values for H and G for many different representatives.As the output of the algorithm (Algorithm 2), we must pick the ''optimal'' H giving the shortest distance arising from this map.

Iterative alternating algorithm
The idea of the Iterative Alternating algorithm is straightforward.Given an initial guess G ½0 2 W p , since Q and G ½0 are known, we can solve for H numerically.Let Ĥ ¼ logðQ Á expðG ½0 Þ T Þ.Since Ĥ is generally not of the desired form (i.e., Ĥ 6 2 W ? p ), we project Ĥ onto W ? p to obtain an update for H.This projection has the effect of zeroing out entries in a certain pattern in Ĥ.We let H ½1 ¼ Proj W ? p ð ĤÞ.Then we start updating G. Let Ĝ ¼ logðexpðH ½1 Þ T QÞ then project Ĝ onto W p to obtain an update for G.This projection has the effect of zeroing out entries in a pattern complementary to what we did to obtain an update for H.We let G ½1 ¼ Proj G ð ĜÞ.Now iterate this process obtaining H ½2 then G ½2 and continue until the values stabilize.The pseudo code of our Iterative Alternating algorithm is presented in Algorithm 1 and Algorithm 2.

Algorithm 1: Iterative Alternating algorithm
Tr(H T H) while k ≤ itermax and err < do 11 We walk through two examples as an illustration of the numerical computation of the geodesic formula and distance between two points on a flag manifold.Here two types of flag manifolds are utilized to illustrate how the geometry of a Grassmann manifold differs from that of a related flag manifold.Let Here we look at two different flag structures: 1. Flag manifold of type p ¼ ð2; 2Þ: The initial G 0 (and any other G i in the iterative procedure) should be of the form The output velocity matrix H (and any other H i in the iterative procedure) should be of the form The unique singular values of output H are k 1 = 1.0172 and 2 ¼ 0:5536.The geodesic distance is therefore 4).It is easy to verify that k 1 ; k 2 are exactly the principal angles between X and Y. 2. Flag manifold of type p ¼ ð1; 1; 2Þ: For this example, the G i 's and H i 's should be of the form respectively.The unique singular values of output H are k 1 = 1.0469, k 2 ¼ 0:5404 and the geodesic distance Algorithm 3: Flag SOM algorithm The geodesic distance is larger than the previous example since we have imposed more structure in this example.

SOM on flag manifolds
In this section we extend the SOM algorithm to the setting of flag manifolds.The general setting of SOM starts with a set of training data x ðlÞ with l ¼ 1; . ..; p and an initial set of randomized centers fC i g where the subscript i is associated to the label of the low-dimensional index a i .The standard SOM center update equation is given by, The superscript m is indicating the m-th iteration in the SOM algorithm.Here i Ã is the winning center of data point X, i.e., We also set the localization function as the standard hðsÞ ¼ e Às 2 =r 2 and d is the metric which induces the geometry on the index set.Here we mainly focus on the simple one, where the indices are enumerated by subscript, i.e., the index set contains a 1 ; a 2 , . ..; a N .On the flag manifold, points are no longer living in a Euclidean space thus cannot be moved using the standard update equation.For a given data point X from a flag manifold of type p ¼ ðn 1 ; n 2 ; . ..; n d Þ, we identify the winning center, from the set of all nested subspaces of type p which represent centers fC i g, that is closest via where d g is defined in Eq. (3).To move the centers toward the nested subspace pattern X according to the SOM update We used 15 pixels to form the SVD basis we compute the geodesic, using the Iterative Alternating algorithm described in Algorithm 1 and 2, between each center C i and nested subspace pattern X.
Our localization term now becomes t n ¼ n h n ðdða i ; a i Ã ÞÞ: We now take where n ¼ 0 ð1 À n=TÞ and r n ¼ r 0 ð1 À n=TÞ.The centers thus change along the geodesic by moving from C i ð0Þ to C i ðtÞ where t is adjusted for the step size.The algorithm for SOM on a flag manifold is summarized in Algorithm 3.

Numerical experiment
In this section, to illustrate the proposed method for visualizing real world data, we implement it on the well known Indian Pine hyper-spectral image data set [12].This data set was collected over an agricultural area in Northwestern Indiana in 1992.It consists of 145 Â 145 pixels by 220 bands from 0:4lm to 2:4lm.This data set has been previously studied within the context of band selection [1], which will be utilized here to show the advantage of the flag manifold as a refined version of the Grassmann manifold.For the illustration of the algorithm, we consider a two-class problem and a three-class problem.We preprocessed data via mean centering, i.e., each pixel is subtracted by the mean value of the pixels for the whole scene (spectrum).In this application we selected only 5 bands (hence the ambient space is R 5 ) to form 5 Â 5 ordered orthogonal SVD basis matrices.Each SVD basis represents a data point within a specific class.The number of pixels (with the same class labels) required to form a robust SVD basis is also explored in this experiment.
For the two class problem we initialized the centers for flag-SOM by selecting 100 5 Â 5 orthogonal matrices at random, corresponding to a 10 Â 10 integer lattice.This was done by computing the singular value decomposition of matrices of size 5 Â 5 from the uniform distribution.We also assemble 15 5 Â k matrices Y i from both classes, which results in constructing 30 data points U i living on the flag manifold Fl(1, 1, 3).Here U i is the ordered set of left singular vectors of the corresponding data matrix Y i , i.e., In Fig. 4, we observe that as we increase the number of pixels (k) used to form SVD bases, more robust and clear separation is achieved via flag-SOM between two classes, namely Corn-notill and Grass/Trees.When k ¼ 15 pixels are used to construct SVD bases, there is a linear separation between two classes.With smaller values of k (e.g.k ¼ 5 or k ¼ 10), we observe a lack of linear separability.
In Fig. 5, we see the results of flag-SOM on the three class problem when the data points reside on Fl(1, 1, 3).225 centers (5 Â 5 orthogonal matrices) are randomly generated as the centers of the 15 Â 15 integer lattice.15 SVD bases from each class is generated as described previously.We observe in Fig. 5 that with only 5 bands selected from 220 bands, we still obtain an excellent clustering with all three classes.Here we also measure the quality of the flag-SOM by computing the topographic error.First, we define two centers to be adjacent in their index space if their indices has distance 1 (Note that indices are defined on the integer grid).We obtain a topographical error of 0.22.If we relax the definition of adjacency by allowing the surrounding 8 nodes on the integer grid to be considered as adjacent, the topographical error becomes 0.04.
For a numerical comparison of SOM visualizations, we introduce a distortion error to measure the separation and compactness for the class distributions on the SOM grid, in our case, the integer grid.Let a 2 R 2 be the coordinates of the winning centers on the integer grid, which belong to The visualization distortion error is defined as In Fig. 6, we demonstrate the Grassmannian-SOM on the same data set for the purpose of comparison.We observe that with low ambient dimension, the Grassmannian-SOM shows poor separation on the 2D grid with a distortion error of 2023.The flag-SOM visualization obtains well separated classes with a much lower distortion error of 981.Note that the Grassmannian-SOM suffers from the low ambient dimension while flag-SOM is still separating classes well thanks to the refined structure of the flag manifold.

Conclusions and future work
We have presented algorithms for Self-Organizing Mappings on flag manifolds.Techniques for computing the key ingredients of the SOM on flags are determining distances between flags and moving one flag a prescribed distance in the direction of another flag.The algorithm was tested on a sample problem that involves computing an ordering of points on a flag manifold.The flag-SOM algorithm has been demonstrated on hyper-spectral image data, in which case the algorithm organizes the hyper-spectral image data in the index space and separates 5 Â 5 SVD bases when only 5 out of 220 bands are utilized.Note that we have yet to explore the impact of the flag structure for the flag-SOM algorithm.Searching for an optimal flag structure has the potential to improve the visualization results.

Fig. 1
Fig. 1 Illustration of a nested sequence of subspaces corresponding to a point on the flag manifold FLð1; 1; . ..; 1Þ

and M 2
SOðnÞ.However, related to the covering map mentioned above, this factorization has multiple solutions.The expression QðtÞ ¼ Q 1 exp tH, as t varies from 0 to 1, traces out a geodesic path between Q 1 and Q 1 expðHÞ.The length of the geodesic path between Q 1 and Q 1 expðHÞ can be computed as a function of the eigenvalues of H which can be simplified to the expression Length of Path ¼

Fig. 3
Fig. 3 Illustration of Eq. (2).The vertical lines represent the equivalence classes ½Q 1 and ½Q 2 , respectively.Q 1 is mapped to an element in ½Q 2 by right multiplication with expðHÞ which is then sent to Q 2 by multiplying with M

Fig. 4 Fig. 5
Fig. 4 Flag-SOM visualization results of Corn-notill and Grass/Trees.Left: 5 pixels used to form the SVD basis.Middle: 10 pixels are used to form the SVD basis.Right: 15 pixels are used to form the SVD basis