Procrustes-based distances for exploring between-matrices similarity

The statistical shape analysis called Procrustes analysis minimizes the distance between matrices by similarity transformations. The method returns a set of optimal orthogonal matrices, which project each matrix into a common space. This manuscript presents two types of distances derived from Procrustes analysis for exploring between-matrices similarity. The first one focuses on the residuals from the Procrustes analysis, i.e., the residual-based distance metric. In contrast, the second one exploits the fitted orthogonal matrices, i.e., the rotational-based distance metric. Thanks to these distances, similarity-based techniques such as the multidimensional scaling method can be applied to visualize and explore patterns and similarities among observations. The proposed distances result in being helpful in functional magnetic resonance imaging (fMRI) data analysis. The brain activation measured over space and time can be represented by a matrix. The proposed distances applied to a sample of subjects -- i.e., matrices -- revealed groups of individuals sharing patterns of neural brain activation.


Introduction
Applications in several fields, such as ecology (Saito et al., 2015), biology (Rohlf and Slice, 1990), analytical chemometrics (Andrade et al., 2004), psychometrics (Green, 1952;McCrae et al., 1996), and neuroscience (Haxby et al., 2011) need to compare information described by matrices expressed in an arbitrary coordinate system.The dimension of the matrices corresponding to this arbitrary coordinate system results to be a functional misalignment.In this context, the statistical shape analysis (Dryden and Mardia, 2016) called Procrustes analysis (Gower et al., 2004) can be helpful.Briefly, the Procrustes analysis aligns the matrices into a common reference space by similarity transformations (i.e., rotation, reflection, translation, and scaling transformations).The optimal similarity transformations are those that minimize the squared distance between the matrices.
Several Procrustes-based functional alignment approaches can be found in the literature; two of the most used ones are the orthogonal Procrustes problem (OPP) (Berge, 1977) and the generalized Procrustes analysis (GPA) (Gower, 1975).The first deals with the alignment of two matrices, while the second finds optimal similarity transformations when more than two matrices are analyzed.OPP has a closed-form solution, while GPA is based on an iterative algorithm proposed by Gower (1975).Since the Procrustes problem can be seen as a least squares problem, Goodall (1991) translated it into a statistical model, i.e., the perturbation model, where the error terms follow a matrix normal distribution (Gupta and Nagar, 2018).
Neuroscience is one of the fields where Procrustes-based methods are most widely used.In particular, functional magnetic resonance imaging (fMRI) is the most widely used technique for studying the neural underpinnings of human cognition.Brain activation is expressed as the correlation between the sequence of cognitive stimuli and the sequence of measured blood oxygenation levels (BOLD).In response to neural activity, changes in brain hemodynamics affect the local intensity of the magnetic resonance signal, that is, the voxel intensity (single-volume elements).However, various criticalities arise when analysis (e.g., classification analysis, inference analysis) between subjects is performed.The anatomical and functional structures of the brain greatly vary between subjects, even if time-synchronized stimuli are proposed to the participants (Watson et al., 1993;Tootell et al., 1995;Hasson et al., 2004).For that, the alignment step is an essential part of the preprocessing procedure in fMRI group-level analysis.Anatomical normalization (e.g., Talairach (1988); Jenkinson et al. (2002); Fischl et al. (1999)) fixes the anatomical misalignment through affine transformations, where brain images are aligned to a standard anatomical template (e.g., Talairach template (Talairach, 1988), Montreal Imaging Institute (MNI) template (Collins et al., 1994)).However, the anatomical alignment does not capture the functional variability between subjects, which is a well-known problem in the neuroscience literature (Watson et al., 1993;Hasson et al., 2004;Tootell et al., 1995).
The brain activation of one subject can be described by a matrix where the rows represent the time points/stimuli and the columns the voxels.Therefore, each row shows the response activation to one stimulus across all voxels, and each column expresses the time series of activation for each voxel.The functional misalignment can be focused on the columns between matrices, i.e., the time series of activations are not in correspondence between subjects, while the response activations are since the stimuli are generally time-synchronized (Haxby et al., 2011;Andreella and Finos, 2022).In the context of fMRI data, one of the most popular Procrustes-based functional alignment methods is the hyperalignment technique proposed by Haxby et al. (2011), which is a sequential approach to OPP.However, both OPP and GPA and hyperalignment suffer from low interpretability of aligned matrices (i.e., fMRI images) and related results (e.g., statistical t-tests, classifier coefficients) as well as in-applicability in high-dimensional data.In particular, in fMRI data analysis, the first problem leads to losing the anatomical interpretation of the final aligned images, and the second one makes it impossible to apply the alignment method to the whole brain.The low interpretability is caused by the ill-posed structure of the Procrustes-based approaches: they do not return a unique solution for the optimal orthogonal transformation.For further details about the functional alignment problem in the fMRI data analysis framework, please see Andreella et al. (2022).
For that, Andreella and Finos (2022) proposed an extension, i.e., the ProMises model, of the perturbation model developed by Goodall (1991).In particular, the perturbation model rephrases the Procrustes problem as a statistical model.The extension of Andreella and Finos (2022) is focused on inserting a penalization in the orthogonal matrix's estimation process, specifying a proper prior distribution for the orthogonal matrix parameter.The von Mises-Fisher distribution (Downs, 1972) is used to insert prior information about the final structure of the common space.Thanks to that, the no-uniqueness problem of the Procrustes-based methods is solved, getting an interpretable estimator for the orthogonal matrix transformations.This permits to have unique aligned matrices as well as related statistical inference results.The alignment process does not affect the type I error since the ProMises model can be seen as a procedure that sorts the null hypotheses based on a priori information (Blinded, 2022).The computation of the maximum a posteriori estimate is straightforward; in fact, the von Mises-Fisher distribution is a conjugate prior to the matrix normal distribution (Gupta and Nagar, 2018), which is the distribution of the error terms in the ProMises and perturbation models.
In this work, we present a method that exploits the information coming from the functional misalignment resulting from Procrustes-based methods (e.g., GPA, hyperlalignment and ProMises model).We propose here two distance metrics (Deza and Deza, 2006) that capture different perspectives of similarity/dissimilarity between matrices, e.g., subjects in the fMRI cases.The minimization problem solved by Procrustes's methods can also be defined as distance among objects (Dryden and Mardia, 2016).The first distance metric presented here is based on the residuals coming from the solution of a Procrustes problem.The residual-based distance expresses then how the matrices/subjects are different/similar after functional alignment.In this case, the distance metric captures the dissimilarity/similarity in terms of noise since the matrices have the same orientations after functional alignment.Instead, the second distance exploits the orthogonal matrix parameters solution of the Procrustes problem.The rotational-based distance computes the squared distance between these estimated orthogonal matrices.As we will see, this metric measures the level of dissimilarity/similarity in orientation between matrices/subjects before functional alignment.
In the paper, we show how these metrics can be used in distance-based techniques such as the multidimensional scaling method (Carroll and Arabie, 1998), hierarchical clustering (Murtagh and Contreras, 2012) and t-distributed stochastic neighbor embedding (t-SNE) (Van der Maaten and Hinton, 2008) in order to visualize and quantify patterns and shared characteristics between matrices (i.e., individuals described by multiple dimensions).
The paper is organized as follows.Section 2 introduces the Procrustesbased methods.The core of the manuscript is contained in Section 3, where the residual-based and rotation-based distances are proposed.Finally, we explain how to use the distances between rotations and residuals as a tool to understand the underlying clusters between subjects in the fMRI data analysis framework in Section 4. The analyses of this manuscript are performed using the R package alignProMises available at https://github.com/angeella/alignProMises for the functional alignment part, and using the R package rotoDistance available at https://github.com/angeella/rotoDistance for the computation of the rotational-based and residual-based distances.

Procrustes analysis
Let {X i ∈ R n×m } i=1,...,N be a set of matrices to be aligned.The Procrustes analysis uses similarity transformations (Gower, 1975), i.e., scaling, rotation/ reflection, and translation, to map {X i ∈ R n×m } i=1,...,N into a common reference space.
If only two matrices are analyzed, i.e., N = 2, we can consider one of the two matrices as a common reference matrix.The orthogonal Procrustes problem (OPP) is then applied and defined as: where defines the translation vector, and 1 n ∈ R 1×n is a vector of ones.
The optimal translation results to be the column-centering, while R and where U DV is the singular value decomposition of X i X j .
If more than two matrices are analyzed, i.e., N > 2, the generalized Procrustes analysis (GPA) must be applied.In this case, the set of matrices {X i ∈ R n×m } i=1,...,N are mapped by similarity transformations into a common reference matrix M ∈ R n×m .This common reference matrix can be defined in several ways, e.g., element-wise arithmetic mean.The GPA is defined as (3) Unlike OPP, GPA does not have a closed-form solution for R i and α i , and an iterative algorithm must be used where at each step, the reference matrix is updated (Gower, 1975).
Another approach is the perturbation model proposed by Goodall (1991), where the least squares problem defined in Equation 3 is translated as a statistical model assuming that {X i } i=1,...,N are noisy rotations of a common space M .
The perturbation model is then defined as follows: where E i is the random error matrix following a normal matrix distribution (Gupta and Nagar, 2018) The similarity transformations are represented by the following parameters R i , α i , and t i that must be estimated for each i = 1, . . ., N .The optimal similarity transformations Ri and αi Ri are slight modifications of the ones found by OPP and GPA: where The extension of the perturbation model is proposed by Andreella and Finos (2022), where the orthogonal matrix parameter R i follows a von Mises-Fisher distribution (Downs, 1972): where F ∈ R m×m is the location matrix parameter, k ∈ R + represents the regularization parameter and C(F , k) is the normalizing constant.Andreella and Finos (2022) found that the maximum a posteriori estimates are slight modifications of the perturbation model proposed by Goodall (1991) (i.e., without imposing the von Mises-Fisher prior distribution for R i ).The estimators for the sets of parameters {R i } i=1,...,N and {α i } i=1,...,N are essentially the same but decomposing The straightforward solutions are due to the conjugacy of the von Mises-Fisher distribution to the matrix normal distribution (Green and Mardia, 2006;Andreella and Finos, 2022).Therefore, the prior information enters directly into the singular value decomposition step of the estimation process.
The motivation to impose an a priori distribution to the orthogonal matrix parameter R i stems from the assumption that "the anatomical alignment is not so far from the truth".The information of the three-dimensional spatial coordinates of the voxels is then inserted into the estimation process thanks to a proper definition of the prior location parameter F ∈ R m×m .Andreella and Finos (2022) define F as a similarity Euclidean distance.In this way, the rotation loadings that combine closer voxels are higher than the ones that combine voxels that are far apart.In addition, defining F as a similarity Euclidean matrix leads to X i Σ −1 n M Σ −1 m + kF having full rank, i.e., unique solution for R i .
Finally, Andreella and Finos (2022) proposed an efficient version of the ProMises model in the case of high-dimensional data.The problem when m >> n arises since the ProMises model, and also the perturbation model, must compute N singular value decompositions of matrices with dimensions m × m.Andreella and Finos (2022) use specific semi-orthogonal transformations to project the matrices X i ∈ R n×m into the lower dimensional space R n×n .In particular, if we consider as m × n semi-orthogonal transformation Q i the ones coming from the thin singular value decomposition (Bai et al., 2000) of X i we reach the same fit of data but reducing the time complexity from O(m 3 ) to O(mn 2 ), and the space complexity from O(m 2 ) to O(mn).
Briefly, the Efficient ProMises applies the semi-orthogonal transformation Q i to X i and then applies the ProMises model on the set of lower dimensional matrices {X i Q i ∈ R n×n }.The efficient ProMises model allows the alignment of high-dimensional data such as fMRI data where the dimension m (i.e., the number of voxels) equals approximately 200, 000.
For further details about the ProMises model and its Efficient version, please see Andreella and Finos (2022).

Procrustes-based distances
Procrustes-based methods (i.e., OPP, GPA, perturbation model, or ProMises model) find the orthogonal matrices that, applied to the original matrices, minimize the Frobenius distance among resulting matrices.It is, therefore, natural to define a distance that is based on this quantity: the squared residuals among aligned matrices.In this case, we measure how different two matrices are beyond rotation.Two matrices can look very different, while they may result to be very similar after rotation.Residual-based distance succeeds in capturing this aspect, thus evaluating only the distance between matrices net of rotations.
The second kind of distance that we will define is based on the rotational effort that is taken to align one matrix X i to another matrix X j .This effort is measured as the distance between the orthogonal matrix that solves the Procrustes problem Ri and I m (i.e., the matrix that does not operate any rotation): the larger the distance between the Ri and I m , the bigger the effort to align X i to X j .
In the following, we give the formal definitions of residual-based and rotationalbased distances: Definition 1 Consider a set of matrices { Xi ∈ R n×m } i=1,...,N functionally aligned by some Procrustes-based method presented in Section 2, i.e., Xi = αi Ri X i Ri .
The residual-based distance is defined as: We can note that the residual-based distance defined in Equation 6 is directly related to the GPA defined in Equation 3. If we consider two matrices, the distance is simply the pair's contribution within the GPA minimization problem, precisely the optimization's residuals.
Definition 2 Consider a set of orthogonal matrices { Ri ∈ O(m)} i=1,...,N estimated by some Procrustes-based method presented in Section 2. The rotationalbased distance is defined as: Since both distances are based on the matrix Frobenius norm, this implies that d Re (•) and d Ro (•) can be considered directly as a valid metric, i.e., distance functions d Re : R n×m × R n×m → R ≥0 and d Ro : Therefore, if d Re = 0, the two matrices are functionally similar without considering the orientation characteristics.In the same way, as d Re increases, dissimilarity in functional terms increases without considering orientation again.Instead, if d Ro = 0, we have two images sharing the same orientation, i.e., functional (mis)alignment concerning the reference matrix M .In the same way, if d Ro > 0, the two matrices have different orientations in terms of column dimension.
Indeed, the definition of rotational-based distance can be significantly simplified, thus simplifying both the computational calculation and the interpretation of distance itself.This is formalized in the following: Proof 1 Considering the rotational-based distance, the trace of the product between the two orthogonal matrices Ri and Rj can take only values between m and −m since the eigenvalues of an orthogonal matrix lie on the unit circle (i.e., they have module equal to 1).If tr( R i Rj ) = m, this means that Ri = Rj since the only way for an orthogonal matrix to have all the eigenvalues equal to 1 is being an identity matrix.In the same way, if the trace equals −m, this means that R i Rj = −I m , i.e., Ri = − Rj .
The residual-based distance and the rotational-based distance are then computed for each pair of aligned matrices { Xi } i=1,...,N and for each pair of orthogonal matrices { Ri } i=1,...,N , resulting in a the global distance matrix D ∈ R N ×N .Information from different matrices with large dimensions can be summarized through the proposed distance matrix, which will turn out to be of lower dimension, i.e., of dimension N × N .These distance matrices can be handy in various applications, particularly when handling big data.In the literature, various statistical methods are based on the distance matrix.However, they generally focus on analyzing the distances of several covariates described by a single matrix or on analyzing multiple distance matrices (e.g., the INDSCAL method proposed by Carroll and Arabie (1998)).In contrast, the distance matrix proposed in this manuscript directly summarizes several large matrices' similarity and dissimilarity characteristics.
This matrix D can then be used inside a dissimilarity-based algorithm such as the multidimensional scaling technique (Carroll and Arabie, 1998), hierarchical clustering (Murtagh and Contreras, 2012) and t-distributed stochastic neighbor embedding (t-SNE) (Van der Maaten and Hinton, 2008).

Application
We analyze 24 subjects passively looking at food and no-food (office utensils) images collected by Smeets et al. (2013).The food/no-food images are proposed to the participants alternately (24 seconds of food images and 24 seconds of no-food images) with a rest block of 12 seconds on average showing a crosshair.The food stimulus is a collection of attractive foods to capture brain activations concerning self-regulation in response to viewing images of tempting (i.e., palatable high-caloric) food (Smeets et al., 2013).
The dataset was preprocessed using the Functional MRI of the Brain Software Library (FSL) (Jenkinson et al., 2012) following a standard processing pipeline.The registration step to standard space images was computed using FLIRT (Jenkinson and Smith, 2001), the motion correction using MCFLIRT (Jenkinson et al., 2002), the non-brain removal using BET (Jenkinson et al., 2002), and spatial smoothing using a Gaussian Kernel FWHM (6mm).Finally, the intensity normalization of the entire four-dimensional dataset was computed by a single multiplicative factor, and the high-pass temporal filtering (Gaussian-weighted least-squares straight line fitting, with sigma=64.0s)was applied.The raw dataset is available at https://openneuro.org/datasets/ ds000157/versions/00001, while the preprocessed one is available in the R package rotoDistance (https://github.com/angeella/rotoDistance).For details about the experimental design and data acquisition, please see Smeets et al. (2013).
We analyze the right calcarine sulcus composed of 237 voxels being an area involved in processing visual information and related to regions involved in the regulation of food intake (Smeets et al., 2013).However, the whole brain can be analyzed instead of only a region of interest (e.g., right calcarine sulcus).In fact, the distances proposed permit to resume complex high-dimensional data, like the fMRI ones, that are generally composed by N matrices having dimensions 300 × 200, 000 (i.e., 300 time points and 200, 000 voxels), through a matrix of low N × N dimensions.
The ProMises model is fitted on preprocessed data, and aligned images are then used to compute the distance matrix D Re ∈ R 24×24 (i.e., residualbased distances), while the corresponding optimal rotation matrices are used to compute D Ro ∈ R 24×24 (i.e., rotational -based distances) as described in Section 3.
These two types of Procrustes-based distance capture different information, i.e., the between-subjects dissimilarity in terms of brain activations before and after functional alignment.Figure 1 shows the distances D Re and D Ro for each pair of subjects.The correlation between them is very low, i.e., ≈ 0.03, as we can note from Figure 1, i.e., D Re and D Ro return two distinct insights regarding the between-subjects dissimilarity brain activations.
The multidimensional scaling (MDS) technique (Carroll and Arabie, 1998) is now applied considering D Ro as distance matrix.A second analysis is run using D Re .For comparison porposes, we also computed the Euclidean distances between images that are not functionally aligned.We denote the corresponding distance matrix as F .We used the smacof R package (De Leeuw and Mair, 2009) for applying the multidimensional scaling technique.We decided to apply the spline MDS (monotone spline transformation) with as much flexibility as possible.See De Leeuw and Mair (2009) for more details.
Furthermore, we have some covariates for each subject to analyze, briefly described in Table 1 together with age, body mass index (BMI), and other information.We then analyze these covariates with the matrix of fitted configurations computed by the multidimensional scaling approach.Please see Smeets et al. (2013) for more details.
Focusing firstly on the distance matrix D Ro , Figure 2 shows the stress value considering several numbers of dimensions K = {1, . . ., 20} into the  multidimensional scaling method.We evaluated that K = 11 is a good value corresponding to stress ≈ 0.05.
We then performed simple generalized linear regressions with the covariates as dependent variables and the 11 configurations fitted by MDS as explanatory variables.We found a significant relationship between the covariate diet success and 6 dimension (t = −4.171,p = 0.0065) in accordance with the results found by Smeets et al. (2013).The p-value reported was adjusted for multiple testing using the Bonferroni method (Goeman and Solari, 2014).Therefore, Figure 3 shows the 1 and 6 fitted configurations along with the main covariate (diet success) and cycle phase covariate.
At first look, we can note how the y axis represents the success in a diet, where negative values correspond to low success and positive values high success in a diet.We can also note, for example, that subjects 16 and 18 share the same functional misalignment with similar diet success values and the same cycle phase.
However, if we instead apply multidimensional scaling on the matrix of residual-based distances, we do not find patterns as clear as those found using rotation-based distances, as can be seen from Figure 4.In this case, we automatically set the number of dimensions equal to 11 (which is equivalent to a stress of 0.05).The generalized linear regressions did not show any significant features, unlike the first analysis based on the distances of the rotations.
Finally, we would have the same situation found using D Re (or even worse) if we used as distance matrix D raw (i.e., using images not functionally aligned).The stress value equals 0.02 considering 11 dimensions, and no significant dimensions were found from the generalized linear regressions.Figure 5 represents the multidimensional scaling results using D raw .The two dimensions do Figure 5: D raw analysis: The x-axis represents the 1th fitted configuration computed by the multidimensional scaling approach, while the y-axis shows the 6th fitted configuration.The color gradient describes the appetite post covariate (scaled), while the size of the points specifies the phase of the cycle.
not capture the subject-level features analyzed.
To sum up, the rotational -based distance D Ro allows capturing the functional variability in neural response in terms of rotational effort.The subjects' neural activation then differs in terms of orientation.However, these differences are lessened after and before the application of functional alignment, i.e., considering the residual -based distance D Re and the raw -based distance D raw when some subject-specific covariates are analyzed in the same time.

Conclusions
In this manuscript, we proposed Procrustes-based distances based on the aligned images and orthogonal transformations estimated by Procrustes-based methods.These distances permit the exploration of the dissimilarity between matrices from two independent points of view.The residual -based distance expresses the dissimilarity in terms of functional columns net to rotations, i.e., eliminating the orientation component.Instead, the rotation-based distance describes the dissimilarity in terms of functional (mis)alignment of the matrices' columns, i.e., how the matrices have similar column orientations.The method is helpful when the research aim is to analyze matrices expressed in an arbitrary coordinate system.In addition, the proposed distances can also be advantageous when the focus is exploring the distances between big data matrices, e.g., fMRI application.In this framework, each subject is represented by a vast matrix with approximately 300 × 200, 000 dimensions.The Procrustesbased distances permit the exploration of these matrices in a space with di-mensions equal to the number of matrices/subjects analyzed.In the fMRI application, we found that these metrics result in reliable measures of individual differences.In fact, the Procrustes-based functional alignments permit reducing confounds from topographic idiosyncrasies and capturing variation around shared functional and anatomical responses across individuals.The distances proposed in this manuscript allowed to find groups of individuals sharing patterns of neural brain activation.In conclusion, the Procrustesbased distances thus add valuable exploratory and visualization tools to the world of Procrustes' methods.

Proposition 1
The rotational-based distance defined in Definition 2 can be expressed as: d Ro ( Ri , Rj ) = 2m − 2tr( R i Rj ) and takes values in [0, 4m].The same result can be obtained using the residualbased distance d Re defined in Equation 6 when Xi , Xj ∈ O(m).

Figure 1 :Figure 2 :
Figure 1: Scatterplot between D Re and D Ro .

Figure 3 :Figure 4 :
Figure3: D Ro analysis: The x-axis represents the 1th fitted configuration computed by the multidimensional scaling approach, while the y-axis shows the 6th fitted configuration.The color gradient describes the diet success covariate (scaled), while the size of the points specifies the phase of the cycle.