Procrustes Analysis for High-Dimensional Data

The Procrustes-based perturbation model (Goodall in J R Stat Soc Ser B Methodol 53(2):285–321, 1991) allows minimization of the Frobenius distance between matrices by similarity transformation. However, it suffers from non-identifiability, critical interpretation of the transformed matrices, and inapplicability in high-dimensional data. We provide an extension of the perturbation model focused on the high-dimensional data framework, called the ProMises (Procrustes von Mises–Fisher) model. The ill-posed and interpretability problems are solved by imposing a proper prior distribution for the orthogonal matrix parameter (i.e., the von Mises–Fisher distribution) which is a conjugate prior, resulting in a fast estimation process. Furthermore, we present the Efficient ProMises model for the high-dimensional framework, useful in neuroimaging, where the problem has much more than three dimensions. We found a great improvement in functional magnetic resonance imaging connectivity analysis because the ProMises model permits incorporation of topological brain information in the alignment’s estimation process. Supplementary Information The online version contains supplementary material available at 10.1007/s11336-022-09859-5.

The Procrustes problem aims at matching matrices using similarity transformations that minimize their Frobenius distance. It allows comparison of matrices whose dimensions are defined in an arbitrary coordinate system. The method has attracted the interest of applied researchers, whose plethora of applications highlights its potential in several fields, such as ecology (Saito et al., 2015), biology (Rohlf & Slice, 1990), analytical chemometrics (Andrade et al., 2004), and psychometrics (Green, 1952; McCrae et al., 1996).
The interest of a large audience from applied fields has stimulated, in parallel, the growth of a vast body of literature. Despite this, essentially all applications involve spatial coordinates (i.e., two- or three-dimensional). Haxby et al. (2011) first introduced this approach in a different context: aligning functional Magnetic Resonance Images (fMRI). The coordinates are replaced by voxels (i.e., three-dimensional pixels), and the problem becomes inherently high-dimensional. The approach rapidly grew in popularity in the neuroimaging community because of its effectiveness. However, the proposed solution is naive; the extension from the spatial context to a more general, high-dimensional one is a theoretical challenge that needs adequate attention.
The most serious concern is the results' interpretability. In most cases, Procrustes methods turn into an ill-posed problem. This is barely noticeable with spatial coordinates because the solution is unique up to rotations; the user is simply free to choose the point of view that provides the nicest picture. When the dimensions do not have a spatial meaning, however, any rotation completely changes the interpretation of the results. The proposed methods are implemented in the R (R Core Team, 2018) package alignProMises. We report the proofs of the main lemmas here, whereas the remaining proofs are included in the supplementary material.

Background
Let {X i ∈ IR n×m } i=1,...,N be a set of matrices to be aligned. The Procrustes-based method uses similarity transformations to match each matrix to the target one as closely as possible, according to the Frobenius distance.
Each matrix X_i can then be assumed to be a similarity transformation of a shared matrix M ∈ IR^{n×m}, which contains the common reference space's coordinates, plus a random error matrix E_i ∈ IR^{n×m}. The perturbation model proposed by Goodall (1991) is reported below.

Definition 1. (Goodall, 1991) Let {X_i ∈ IR^{n×m}}_{i=1,...,N} be a set of matrices to be aligned and O(m) the orthogonal group in dimension m. The perturbation model is defined as

X_i = α_i (M + E_i) R_i + 1_n' t_i,   R_i ∈ O(m),

where E_i ∼ MN_{n,m}(0, Σ_n, Σ_m) (i.e., the matrix normal distribution with scale parameters Σ_n ∈ IR^{n×n} and Σ_m ∈ IR^{m×m}), M ∈ IR^{n×m} is the shared matrix, α_i ∈ IR^+ is the isotropic scaling, t_i ∈ IR^{1×m} is the translation vector, and 1_n ∈ IR^{1×n} is a vector of ones.
To simplify the problem, the column-centered matrices C_n X_i are considered, whose distribution is

vec(C_n X_i) ∼ N_{nm}( vec(α_i C_n M R_i), α_i^2 (R_i' Σ_m R_i) ⊗ (C_n Σ_n C_n) ),

where C_n = I_n − (1/n) J_n, I_n ∈ IR^{n×n} is the identity matrix, J_n is an n × n matrix of ones, and vec(A) denotes the vectorization of the matrix A. Let the thin singular value decomposition of C_n be C_n = Ũ Ũ', where Ũ ∈ IR^{n×(n−1)}. Pre-multiplying by Ũ' leads to independence between the rows of E_i without affecting the estimation of R_i. The matrices Ũ' C_n X_i and Ũ' C_n M now have n − 1 rows; however, we can simply re-project them onto IR^{n×m} using Ũ. Because we can always write X̃_i = C_n X_i, M̃ = C_n M, and Σ̃_n = C_n Σ_n C_n without loss of generality, we can rewrite Definition 1 as follows:

Definition 2.

X̃_i = α_i (M̃ + E_i) R_i,   R_i ∈ O(m),   (2)

where E_i ∼ MN_{n,m}(0, Σ̃_n, Σ_m). The following notation is also adopted: || · || indicates the Frobenius norm and ⟨·, ·⟩ the Frobenius inner product (Golub & van Loan, 2013). The main objective of this work is comparing the shapes of the matrices X_i rather than analyzing their forms. The parameters of interest are therefore R_i and α_i, whereas M, Σ_n, and Σ_m are considered nuisance parameters for each i = 1, . . . , N. The estimation of the unknown parameters changes depending on whether these nuisance parameters are known. Section 1.2 first presents the estimates under this assumption and then treats the more realistic case of unknown nuisance parameters.
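As a quick numerical illustration of the centering step (a sketch in Python with variable names of our choosing, not code from the alignProMises package), pre-multiplying by C_n = I_n − (1/n) J_n removes the column means, and the eigenvalues of C_n confirm that it has rank n − 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4
X = rng.normal(size=(n, m))

# Centering matrix C_n = I_n - (1/n) J_n
C = np.eye(n) - np.ones((n, n)) / n
Xc = C @ X

# Pre-multiplying by C_n is the same as subtracting the column means
assert np.allclose(Xc, X - X.mean(axis=0))

# C_n is idempotent with rank n - 1: eigenvalues are 0 once and 1 (n - 1) times
evals = np.sort(np.linalg.eigvalsh(C))
assert np.allclose(evals, [0.0] + [1.0] * (n - 1))
```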

Estimation of the Perturbation Model
We formalize some results from Theobald and Wuttke (2006) in the case of known nuisance parameters M, Σ_m, and Σ_n, with Σ_n and Σ_m positive definite, in the following theorem.

Theorem 1. (Theobald & Wuttke, 2006) Consider the perturbation model described in Definition 2, and the singular value decomposition X̃_i' Σ̃_n^{-1} M̃ Σ_m^{-1} = U_i D_i V_i'. The maximum likelihood estimator of R_i is the maximizer of

tr(R_i' X̃_i' Σ̃_n^{-1} M̃ Σ_m^{-1}),   (3)

namely R̂_i = U_i V_i'.

Now consider N independent observations X_1, . . . , X_N. The joint log-likelihood is simply the sum of the N log-likelihoods.
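With identity covariances (Σ̃_n = I_n, Σ_m = I_m), Theorem 1 reduces to the classical orthogonal Procrustes solution. The following is our own illustrative Python sketch, not the authors' implementation:

```python
import numpy as np

def procrustes_rotation(X, M):
    """Rotation R minimizing ||X R - M||_F over O(m): take the SVD
    X' M = U D V' and return R_hat = U V' (Theorem 1 with identity
    row and column covariances)."""
    U, _, Vt = np.linalg.svd(X.T @ M)
    return U @ Vt

rng = np.random.default_rng(1)
n, m = 50, 3
M = rng.normal(size=(n, m))
Q, _ = np.linalg.qr(rng.normal(size=(m, m)))   # a random orthogonal matrix
X = M @ Q.T                                    # noiseless rotated copy of M

R_hat = procrustes_rotation(X, M)
assert np.allclose(R_hat, Q)        # the true rotation is recovered
assert np.allclose(X @ R_hat, M)    # and X is mapped back onto M
```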
In the case of unknown nuisance parameters M, Σ_m, and Σ_n, the joint likelihood cannot be written as a product of separate likelihoods, one for each X_i, because each of the unknown parameters is a function of the others. The solution must be found by an iterative algorithm. In particular, the two covariance matrices Σ_m and Σ_n can be estimated by the two-stage (flip-flop) algorithm of Dutilleul (1999), which alternates between the maximum likelihood estimator of Σ_n given Σ̂_m and that of Σ_m given Σ̂_n. The necessary and sufficient condition for the existence of Σ̂_n and Σ̂_m is N ≥ m/n + 1, assuming Σ_m and Σ_n are positive definite. In real applications, this assumption can be problematic. For example, in fMRI data analysis, m roughly equals 200,000 and n approximately equals 200; the researcher would therefore have to analyze at least 1,001 subjects, which is virtually impossible because fMRI is costly.
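The flip-flop idea can be sketched as follows. This is our own minimal Python illustration under the simplifying assumption that the shared matrix M is known and no rotations or scalings act on the data; it is not the estimator embedded in the paper's algorithm:

```python
import numpy as np

def flip_flop(Xs, M, n_iter=50):
    """Alternating MLE updates of the matrix-normal scale matrices
    (Dutilleul, 1999): Sigma_n given Sigma_m, then Sigma_m given Sigma_n."""
    N = len(Xs)
    n, m = Xs[0].shape
    Sm = np.eye(m)
    for _ in range(n_iter):
        Sn = sum((X - M) @ np.linalg.inv(Sm) @ (X - M).T for X in Xs) / (N * m)
        Sm = sum((X - M).T @ np.linalg.inv(Sn) @ (X - M) for X in Xs) / (N * n)
    return Sn, Sm

rng = np.random.default_rng(2)
N, n, m = 500, 3, 2
Xs = [rng.normal(size=(n, m)) for _ in range(N)]   # true Sigma_n = I, Sigma_m = I
Sn, Sm = flip_flop(Xs, np.zeros((n, m)))

# Sigma_n and Sigma_m are identified only up to a reciprocal scale factor,
# but their Kronecker product is identified and should be close to the identity
assert np.allclose(np.kron(Sn, Sm), np.eye(n * m), atol=0.25)
```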
Various solutions can be found in the literature: Theobald and Wuttke (2006) proposed a regularization for the covariance matrix, whereas Lele (1993) estimated Σ_n using the distribution of X_i X_i'. In this work, we maintain a general formulation of the estimators, Σ̂_m = g(Σ̂_n, M̂, X_i) and Σ̂_n = g(Σ̂_m, M̂, X_i), because we aim to find a proper estimate of R_i rather than of Σ_n and Σ_m. The shared matrix M is estimated by the element-wise arithmetic mean of the partially aligned matrices. We then modified the iterative algorithm of Gower (1975) (i.e., generalized Procrustes analysis) to estimate R_i. Groisser (2005) proved the convergence of the generalized Procrustes analysis algorithm. However, it leads to non-identifiable estimators of R_i, i = 1, . . . , N, as proved by the following lemma:

Algorithm 1
Generalized Procrustes analysis (Gower, 1975) combined with the two-stage algorithm from Dutilleul (1999), where T is the threshold for ||M̂ − M̂_old||^2, maxIt is the maximum number of iterations allowed, and ε_1 and ε_2 denote two infinitesimal positive quantities.
Require: X_i ∀i = 1, . . . , N; T; maxIt
Ensure: R̂_i, α̂_i, M̂, Σ̂_n, Σ̂_m

Lemma 1. Consider the proof of Theorem 1, placed in the supplementary material. For any Z ∈ O(m), the estimators {R̂_i Z}_{i=1,...,N} are still valid solutions for the maximization (3).
To sum up, in the more realistic case, when we must estimate the nuisance parameters, the Procrustes solutions are in general infinite, in both the high-dimensional and the low-dimensional case, by Lemma 1. We emphasize here that, to resolve the non-identifiability of R_i, the proposed ProMises model imposes a prior distribution on the parameter R_i.
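A minimal identity-covariance sketch of the generalized Procrustes iteration, together with a numerical check of Lemma 1, is given below (our own Python illustration; the stopping rule and variable names are simplifications of Algorithm 1):

```python
import numpy as np

def gpa(Xs, T=1e-12, max_it=100):
    """Generalized Procrustes analysis (Gower, 1975), identity covariances:
    alternate optimal rotations and the element-wise mean until M stabilizes."""
    M = np.mean(Xs, axis=0)
    for _ in range(max_it):
        Rs = []
        for X in Xs:
            U, _, Vt = np.linalg.svd(X.T @ M)
            Rs.append(U @ Vt)
        M_new = np.mean([X @ R for X, R in zip(Xs, Rs)], axis=0)
        if np.sum((M_new - M) ** 2) < T:
            return M_new, Rs
        M = M_new
    return M, Rs

rng = np.random.default_rng(3)
n, m, N = 20, 3, 4
M0 = rng.normal(size=(n, m))
Qs = [np.linalg.qr(rng.normal(size=(m, m)))[0] for _ in range(N)]
Xs = [M0 @ Q.T for Q in Qs]            # noiseless rotated copies of M0

M_hat, Rs = gpa(Xs)
fit = sum(np.sum((X @ R - M_hat) ** 2) for X, R in zip(Xs, Rs))
raw = sum(np.sum((X - np.mean(Xs, axis=0)) ** 2) for X in Xs)
assert fit < 1e-3 * raw                # alignment shrinks the spread drastically

# Lemma 1: post-multiplying every R_i by a common Z in O(m) yields an
# equally good solution, so the rotations are not identified
Z = np.linalg.qr(rng.normal(size=(m, m)))[0]
fit_z = sum(np.sum((X @ R @ Z - M_hat @ Z) ** 2) for X, R in zip(Xs, Rs))
assert np.isclose(fit, fit_z)
```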

The ProMises Model
In the previous section, we explained how the perturbation model can be problematic: Lemma 1 proves the non-identifiability of the parameter R_i in the realistic case of unknown nuisance parameters. This scenario becomes critical in the many applications where the m columns of the matrix X_i do not express three-dimensional spatial coordinates, such as the fMRI data framework illustrated in Sect. 4. In the high-dimensional case, the final orientations of the aligned data can be relevant for interpretation purposes.
For that reason, we propose a Bayesian extension: the ProMises model. We stress here that a proper prior distribution leads to a closed-form, interpretable, and unique point estimate of R_i. The specification of the prior parameters is essential, especially in our high-dimensional context.

Interpretation of the Prior Parameters
Because R_i ∈ O(m), a proper prior distribution must take values in the Stiefel manifold V_m(IR^m). The matrix von Mises–Fisher distribution is a non-uniform distribution on V_m(IR^m), which describes a rigid configuration of m distinct directions with fixed angles. It was proposed by Downs (1972) and investigated by many authors (e.g., Chikuse, 1979; Jupp & Mardia, 2003).
We report below the formal definition of the von Mises-Fisher distribution.
Definition 3. (Downs, 1972) The von Mises–Fisher distribution for R_i ∈ O(m) has density

f(R_i) = C(F, k) exp{k tr(F' R_i)},   (4)

where C(F, k) is a normalizing constant, F ∈ IR^{m×m} is the location matrix parameter, and k ∈ IR^+ is the concentration parameter.
The parameter k defined in (4) balances the amount of concentration of the distribution around F. If k → 0, the prior distribution is near a uniform distribution (i.e., unconstrained). If k → +∞, the prior distribution tends toward a Dirac distribution (i.e., maximum constraint).
A proper specification of the prior distribution leads to improved estimation of R_i. Therefore, the core of the ProMises model is the specification of F defined in (4). Consider the polar decomposition and the singular value decomposition of F:

F = P K = L Δ B' = (L B')(B Δ B'),

where P, L, B ∈ O(m), K ∈ IR^{m×m} is symmetric positive semi-definite, and Δ ∈ IR^{m×m} is a diagonal matrix with non-negative real numbers on the diagonal. The mode of the density (4) equals P (Jupp & Mardia, 1979), so the most plausible rotation matrix depends on the orientation characteristics of F. Merging the two decompositions, P = L B' describes the orientation part of F, and K = B Δ B' defines the concentration part. The mode is thus specified by the product of the left and right singular vectors of F. These decompositions are useful to understand when the density (4) is unimodal: if F has full rank, so does Δ, the polar decomposition is unique, and the mode P of the density is the global maximum. To sum up, the prior specification allows us to include a priori information about the optimal orientation in the perturbation model. We anticipate here the result of Lemma 3: if F is defined as a full-rank matrix, the maximum a posteriori solution R̂_i is unique, with the orientation structure of F.
Two simple examples of F are delineated below.
Example 1. The simplest definition of F is I_m (Lee, 2018). The eigenvalues are all 1, and the columns of L and B equal e_1, . . . , e_m, the standard basis of IR^m. The prior distribution then shrinks the possible solutions for R_i toward orthogonal matrices that combine only variables with the same location. Alternatively, considering the fMRI scenario, the hyperparameter F can be defined as a Euclidean similarity matrix using the three-dimensional anatomical coordinates x, y, and z of each voxel, e.g.,

F_{ij} = exp{−d_{ij}},   i, j = 1, . . . , m,

where d_{ij} is the Euclidean distance between the anatomical coordinates of voxels i and j. In this way, F is a symmetric matrix with ones on the diagonal, which means that voxels with the same spatial location are combined with weights equal to 1, and the weights decrease as the voxels to be combined become more spatially distant.
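Building such a location parameter from voxel coordinates can be sketched as follows (our own Python illustration; the exponential-decay form is one plausible choice consistent with the description above, not necessarily the exact kernel used by the authors):

```python
import numpy as np

def location_similarity(coords):
    """F_ij = exp(-||c_i - c_j||) from an m x 3 array of voxel coordinates:
    symmetric, ones on the diagonal, decaying with Euclidean distance."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-d)

coords = np.array([[0.0, 0, 0], [1, 0, 0], [3, 0, 0]])  # three toy voxels
F = location_similarity(coords)

assert np.allclose(np.diag(F), 1.0)   # same-location voxels get weight 1
assert np.allclose(F, F.T)            # symmetric similarity matrix
assert F[0, 1] > F[0, 2]              # weights decay with spatial distance
```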
Example 2. Consider N matrices, one for each plant, describing the three-dimensional spatial trajectories across time of a climbing plant, Pisum sativum, having a wooden support as stimulus (Guerra et al., 2019). The spatiotemporal trajectories of the plants are analyzed until they come to grasp the stick. The aim is to functionally align the three time series, one for each coordinate (x, y, z); hence m = 3. In this case, we may suppose that the rotation along the z axis is not of interest: the functional misalignment between plants can be along the x- and y-axes, the z-axis being the one that reflects the growth of the plants, while the x- and y-axes describe their elliptical movement (circumnutation) (Guerra et al., 2019). Therefore, the location matrix parameter F can be specified as

F = ( 1 1 0 ; 1 1 0 ; 0 0 0 ),   (5)

so that the axes x and y have the same probability of entering the calculation of the first two dimensions of the final common space, whereas the z axis is not considered. We use the kinematic plant data from Guerra et al. (2019), consisting of five matrices/plants. Figure 1 shows the elliptical movement expressed by the axes x and y for unaligned and aligned plant trajectories. The rotation transformations are estimated by the ProMises model with F specified as in (5). We do not go into detail about the meaning of the results, because we introduced this example only to explain the usefulness of F. However, we can note how the ProMises model aligns the final coordinates of the tendrils (i.e., when the plant touches the wooden support).

Von Mises-Fisher Conjugate Prior
The von Mises-Fisher distribution (4) was proved by Khatri and Mardia (1977) to be a member of the standard exponential family (Barndorff-Nielsen, 2014). Green and Mardia (2006) mentioned that the von Mises-Fisher distribution is a conjugate prior for the matrix normal distribution, which we formally prove in the following lemma under the perturbation model's assumptions.
Lemma 2. Consider the perturbation model of Definition 2, with R_i distributed according to (4). Then the posterior distribution f(R_i | k, F, X̃_i) is a von Mises–Fisher distribution, i.e., (4) is a conjugate prior, with posterior location parameter

F* = X̃_i' Σ̃_n^{-1} M̃ Σ_m^{-1} + k F.   (6)

The posterior location parameter is the sum of the data term X̃_i' Σ̃_n^{-1} M̃ Σ_m^{-1} and the prior term k F. Consider the polar decomposition of the data term,

X̃_i' Σ̃_n^{-1} M̃ Σ_m^{-1} = (U_i V_i')(V_i D_i V_i').   (7)

The right part of (7), V_i D_i V_i', is the elliptical part of X̃_i' Σ̃_n^{-1} M̃ Σ_m^{-1}, a measure of variation relative to the orthogonal part U_i V_i' (i.e., the maximum likelihood estimator of R_i). Focusing on the right part of (6), F = P K is the polar decomposition of F, where P is the mode of the von Mises–Fisher distribution and K is its measure of variation. Therefore, F* is expressed as a combination of the maximum likelihood estimate R̂_i = U_i V_i' and the prior mode P, each multiplied by a corresponding measure of variation.
Thanks to the conjugacy, the estimation process remains simple; with a small modification we keep the previous algorithm without increasing the computational burden.

Estimation of the ProMises Model
This section delineates the estimation process for R_i under the ProMises model. First of all, f(X̃_i | α_i, R_i) depends only on the product α_i R_i; we thus refer to the distribution f(X̃_i | α_i R_i) in (2). The following density is then considered as the prior distribution for the product α_i R_i:

f(α_i R_i) ∝ exp{k tr(F' α_i R_i)}.   (8)

The following theorem delineates the estimation of R_i with known nuisance parameters.

Theorem 2. The ProMises model is defined as the perturbation model specified in Definition 2 with the prior distribution (8) imposed on α_i R_i. Let the singular value decomposition of X̃_i' Σ̃_n^{-1} M̃ Σ_m^{-1} + k F be U_i* D_i* V_i*'. Then the maximum a posteriori estimator of R_i is R̂_i = U_i* V_i*'.

The prior information about the structure of R_i enters directly in the singular value decomposition step; the maximum a posteriori estimator turns out to be a slight modification of the solution given in Theorem 1: we decompose X̃_i' Σ̃_n^{-1} M̃ Σ_m^{-1} + k F instead of X̃_i' Σ̃_n^{-1} M̃ Σ_m^{-1}. Let X_1, . . . , X_N be a set of independent matrices. Then the joint posterior distribution is simply the product of the single posterior distributions.
If M, Σ_n, and Σ_m are unknown, the maximization problem has no closed-form solution, as in Sect. 1.2. Because we proved that the prior specification modifies only the singular value decomposition step of Theorem 1, we simply modify Line 5 of Algorithm 1 as follows:

Algorithm 2 ProMises model. Use Algorithm 1, only modify Line 5: compute R̂_i from the singular value decomposition of X̃_i' Σ̂_n^{-1} M̂ Σ̂_m^{-1} + k F.
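The modified step can be sketched as follows (identity covariances, our own Python illustration): with k = 0 the maximum a posteriori estimate coincides with the maximum likelihood rotation, while a very large k with F = I_m pulls the solution toward the identity:

```python
import numpy as np

def promises_rotation(X, M, F, k):
    """MAP rotation under the ProMises model with identity covariances:
    SVD of X' M + k F = U* D* V*', then R_hat = U* V*'."""
    U, _, Vt = np.linalg.svd(X.T @ M + k * F)
    return U @ Vt

rng = np.random.default_rng(4)
n, m = 50, 3
M = rng.normal(size=(n, m))
Q, _ = np.linalg.qr(rng.normal(size=(m, m)))   # true rotation
X = M @ Q.T

# k = 0 recovers the maximum likelihood (Theorem 1) solution
R_mle = promises_rotation(X, M, np.eye(m), 0.0)
assert np.allclose(R_mle, Q)

# a very concentrated prior around F = I_m dominates the data term
R_strong = promises_rotation(X, M, np.eye(m), 1e8)
assert np.allclose(R_strong, np.eye(m), atol=1e-4)
```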

On the Choice of the Parameter of the von Mises-Fisher Distribution
We choose the von Mises–Fisher distribution as prior distribution for R_i for its useful practical properties. First, as shown, it is a conjugate prior, leading to a direct calculation and interpretation of R̂_i. Second, it expresses the orthogonality constraint imposed by the Procrustes problem. Finally, the definition of F does not require strong assumptions (Downs, 1972); nevertheless, if we specify it as a full-rank matrix, we guarantee the uniqueness of the solution. This permits the formulation of the following lemma.

Lemma 3. If F has full rank, the maximum a posteriori estimates for R_i given by Theorem 2 are unique.
Proof Consider the proof of Lemma 1. Multiplying by Z ∈ O(m) leads to a maximization in which the cyclic permutation invariance property of the trace can no longer be exploited as in Lemma 1, because of the additional term k tr(F' R_i).
In addition, recalling Lemma 4, the solution for R_i is unique if and only if X̃_i' Σ̃_n^{-1} M̃ Σ_m^{-1} + k F has full rank. If F is defined with full rank, so is X̃_i' Σ̃_n^{-1} M̃ Σ_m^{-1} + k F, and the solution for R_i is unique. Furthermore, recalling Jupp and Mardia (1979), the mode of the von Mises–Fisher distribution is the orientation part of the location matrix parameter. Because the polar decomposition of F is unique, the maximum a posteriori estimate is unique.
To sum up, the ProMises model resolves the non-identifiability of R_i that characterizes the perturbation model. The prior information inserted in the model guides the estimation process, yielding a unique and interpretable orthogonal transformation of the data. Finally, all these properties are achieved without complicating the estimation process of the perturbation model; we only modify the singular value decomposition step of Algorithm 1.

Efficient ProMises Model
The framework depicted above can be applied in both low- and high-dimensional settings. However, the extension to the high-dimensional case does not come for free if the perturbation model is used. When n < m, the rank of X_i is at most n, and the identifiability of the solution is lost even when the nuisance parameters are known. The lemma below states this formally.

Lemma 4. Consider X_i ∈ IR^{n×m}; if n < m, then the maximum likelihood estimator for R_i defined in Theorem 1 is not unique.
Although the ProMises model provides unique solutions even in high-dimensional frameworks, a second issue remains prominent: the computational load. At each step, the presented algorithms perform N singular value decompositions of m × m matrices, which have polynomial time complexity O(m^3). When m becomes large, as in fMRI data where m is a few hundred thousand, the computational runtime and the required memory become inadmissible.
This section proposes the Efficient ProMises model, which resolves both of the above points. The method is efficient in terms of space and time complexity and fixes the non-identifiability of R_i. The algorithm allows a faster and more accessible shape analysis without loss of information in the case of n ≪ m. It essentially merges the thin singular value decomposition (Bai et al., 2000) with the Procrustes problem.
In practice, the Efficient ProMises approach projects the matrices X_i into an n-lower-dimensional space using a specific semi-orthogonal transformation Q_i ∈ IR^{m×n} (Abadir & Magnus, 2005; Groß et al., 1999), which preserves all of the data's information. It then aligns the reduced n × n matrices {X_i Q_i ∈ IR^{n×n}}_{i=1,...,N} by the perturbation or ProMises model. Finally, it projects the aligned matrices back to their original n × m dimensions. The following theorem proves that the maximum defined in Eq. (3) using {X_i Q_i ∈ IR^{n×n}}_{i=1,...,N} equals the original maximum, because the Procrustes problem analyzes the first n × n dimensions of R_i; the maximum remains the same if we multiply {X_i ∈ IR^{n×m}}_{i=1,...,N} by Q_i.

Theorem 3. Consider the perturbation model in Definition 2 with Σ_m = σ^2 I_m and the thin singular value decompositions X_i = L_i S_i Q_i' for each i = 1, . . . , N, where Q_i has dimensions m × n. Then the maximum of (3) computed on the reduced matrices {X_i Q_i}_{i=1,...,N} equals the maximum computed on the original matrices {X_i}_{i=1,...,N}.

Additionally, the condition for the existence of Σ̂_n from Dutilleul (1999) is satisfied, because X_i Q_i now has dimensions n × n and the functional alignment needs at least N = 2 observations. Theorem 3 is thus used to define an Efficient version of the ProMises model.
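The lossless reduction at the heart of the Efficient approach can be checked numerically (our own Python sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 20, 500                      # n << m, as in fMRI data
X = rng.normal(size=(n, m))

# Thin SVD: X = L S Q', with semi-orthogonal Q of size m x n
L, S, Qt = np.linalg.svd(X, full_matrices=False)
Q = Qt.T
X_red = X @ Q                       # reduced n x n representation

assert Q.shape == (m, n)
assert np.allclose(Q.T @ Q, np.eye(n))   # Q is semi-orthogonal
assert np.allclose(X_red @ Q.T, X)       # X is recovered exactly: nothing is lost
```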

Lemma 5. Consider the assumptions of Theorem 3; then the Efficient ProMises model is equivalent to the ProMises model applied to the reduced matrices with location parameter F̃ = Q_i' F Q_j ∈ IR^{n×n}, where F ∈ IR^{m×m}.
Proofs of Theorem 3 and Lemma 5 are shown in the supplementary material, whereas here, we make some further considerations about the proposed method.
At first glance, the assumption Σ_m = σ^2 I_m may seem to dilute the result's impact. However, this assumption does not imply that the data are column-wise independent, because this dependence is modeled by R_i. Additionally, a joint and accurate estimate of Σ_m, Σ_n, and R_i requires a large number of observations, so it is common in real applications to set Σ_m and Σ_n proportional to identity matrices in Procrustes-like problems (Haxby et al., 2020). When the model is high-dimensional, this problem becomes even more pronounced because of the huge number of parameters to be estimated.
The Efficient ProMises approach reaches the same maximum while working in the reduced space of the first n eigenvectors, which contains all the information, instead of the full data set. The original problem estimates orthogonal matrices of size m × m, R_i ∈ O(m), whereas the Efficient solution provides orthogonal matrices of size n × n, R_i ∈ O(n). Even when the solution is projected back into the m × m space through Q_i R_i Q_i', its rank remains n, whereas the matrices of the original solutions have rank m. This should clarify that the Efficient approach reaches the same fit to the data under a different set of constraints, that is, n × n orthogonal matrices instead of m × m ones; hence, the solutions of the two algorithms will not be identical.
We then add to Algorithm 1 the lines used to reduce the dimensions of X_i and modify Line 5 to insert the prior information:

Algorithm 3 Efficient ProMises model. Use Algorithm 1, only add these lines at the beginning:
1: for i = 1 to N do
2:   compute the thin singular value decomposition X_i = L_i S_i Q_i'
3:   X_i := X_i Q_i
4: end for
and modify Line 5 as in Algorithm 2.

The Efficient approach reduces the time complexity from O(m^3) to O(m n^2) and the space complexity from O(m^2) to O(m n).

Motivation
The alignment problem is recognized in fMRI multi-subject studies because the brain's anatomical and functional structures vary across subjects. The most used anatomical alignments are the Talairach normalization (Talairach & Tournoux, 1988) and the Montréal Neurological Institute (MNI) space normalization (Jenkinson et al., 2002), where the brain images are aligned to an anatomical template by affine transformations using a set of major anatomical landmarks. However, this alignment does not explore the between-subjects variability in anatomical positions of the functional loci. The functional brain regions are not consistently placed on the anatomical landmarks defined by the Talairach and MNI templates. The anatomical alignment is then an approximate inter-subject registration of the functional cortical areas. Haxby et al. (2011) proved that functional brain anatomy exhibits a regular organization at a fine spatial scale shared across subjects.
Therefore, we can assume that anatomical and functional structures are subject-specific (Conroy et al., 2009;Sabuncu et al., 2010) and that the neural activities in different brains are noisy rotations of a common space (Haxby et al., 2011). Functional alignments (e.g., Procrustes methods) attempt to rotate the neural activities to maximize similarity across subjects.
Specifically, each subject's brain activation can be represented by a matrix, where the rows represent the stimuli/time points and the columns represent the voxels. The stimuli are time-synchronized among subjects, so we have correspondence among the matrices' rows. However, the columns are not assumed to be in correspondence among subjects, as explained before. Each time series of brain activation (i.e., each of the matrices' columns) represents the voxels' functional characteristics, which the anatomical normalization fails to align. We aim to represent the neural responses to stimuli in a common high-dimensional space, rather than in a canonical anatomical space that does not consider the variability of the functional topographies. Figure 2 shows the neural activities of three voxels (i.e., v_1, v_2, and v_3), corresponding to three columns of the data matrix, in two subjects recorded across time. The functional pattern of v_2 is equal across subjects, whereas v_1 and v_3 are swapped. A rotation matrix can resolve this misalignment, the swap being a particular case of a rotation matrix. For further details about the motivation for using functional Procrustes-based alignment in fMRI studies, see Haxby et al. (2020).

Data Description
We apply the proposed method to data from Pernet et al. (2015), available at https://openneuro.org/datasets/ds000158/versions/1.0.0. The study consists of neural activations of 218 subjects passively listening to vocal (i.e., speech) and nonvocal sounds. Because the application has a merely illustrative purpose, we chose a small number of subjects (18) to facilitate the example's reproducibility by the readers. We preprocessed the data using the Functional MRI of the Brain Software Library (FSL) (Jenkinson et al., 2012) with a standard processing procedure (i.e., high-pass filtering, brain extraction, spatial smoothing, registration to standard MNI space, and correction for motion and differences in slice acquisition time). Anatomical and functional alignment (based on the ProMises model) are compared on images preprocessed in the same way; in one case the functional alignment is additionally applied, in the other it is not. For details about the experimental design and data acquisition, please see Pernet et al. (2015).

Functional Connectivity
We performed region of interest and seed-based correlation analysis (Cordes et al., 2000). The seed-based correlation map shows the level of functional connectivity between a seed and every voxel in the brain, whereas the region of interest analysis expresses the functional correlation between predefined regions of interest coming from a standard atlas. The analysis process is defined as follows: First, the subject images are aligned using Algorithm 3, then the element-wise arithmetic mean across subjects is calculated, and finally, the functional connectivity analysis is developed on this average matrix.
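A seed-based correlation map on a time-by-voxel matrix can be sketched as follows (our own Python illustration; the function name and arguments are hypothetical, not from the analysis pipeline used here):

```python
import numpy as np

def seed_correlation_map(data, seed_idx):
    """Pearson correlation between the seed voxel's time series and every
    voxel's time series; data is n x m (time points x voxels)."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)   # z-score each voxel
    return (z * z[:, [seed_idx]]).mean(axis=0)

rng = np.random.default_rng(6)
data = rng.normal(size=(30, 5))          # toy data: 30 time points, 5 voxels
cmap = seed_correlation_map(data, 0)

assert np.isclose(cmap[0], 1.0)                    # seed correlates 1 with itself
assert np.allclose(cmap, np.corrcoef(data.T)[0])   # matches numpy's correlation
```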
We take the frontal pole as seed, it being a region with functional diversity (Liu et al., 2013). The anatomical alignment considered here refers to the MNI space normalization (Jenkinson et al., 2002). Figure 3 shows the correlation values between the seed and each voxel in the brain using data without functional alignment (top of Fig. 3) and with functional alignment by the Efficient ProMises model (bottom of Fig. 3). The first evidence is that functional alignment produces more interpretable maps, in which the various regions, such as the superior temporal gyrus, are delineated by marked spatial edges, while the non-aligned map produces more diffuse, hence less interpretable, regions. It is interesting to evaluate the regions most correlated with the frontal pole, for example, the superior temporal gyrus, a region associated with the processing of auditory stimuli. The correlation of the superior temporal gyrus with the seed is clear in the bottom part of Fig. 3, where functionally aligned images are used.

Figure 3: Seed-based correlation map for M̂, using data aligned only anatomically (top) and data also functionally aligned by the Efficient ProMises model (bottom). The black point refers to the seed used (i.e., the frontal pole, with MNI coordinates (0, 64, 18)); the brain map indicates the level of correlation between each voxel and the frontal pole.
In contrast, the region-of-interest correlation analysis shows the integration mechanisms between specialized brain areas. Figure 4 shows the correlation matrices of time series extracted from the 39 main regions of the atlas of Varoquaux et al. (2011). Using functionally aligned data (right side of Fig. 4), we can see clearly delineated blocks of synchronized regions that can be interpreted as large-scale functional networks. Using data without functional alignment (left side of Fig. 4), the distinctions between blocks are markedly less clear. With functionally aligned data, the left and right visual systems, the dorsolateral prefrontal cortex (DLPFC), the frontal pole (Front pol), and the parietal (Par) regions are clearly visible, whereas in the analysis of functionally non-aligned data this distinction is hidden by noise.

Discussion
The ProMises model provides a methodologically grounded approach to the Procrustes problem, allowing functional alignment of high-dimensional data in a computationally efficient way. The issues of the perturbation model (Goodall, 1991), namely non-uniqueness, critical interpretation, and inapplicability when n ≪ m, are completely overcome thanks to our Bayesian extension. Indeed, the ProMises method returns unique and interpretable orthogonal transformations, and its Efficient version extends its applicability to high-dimensional data. The presented method is particularly useful in fMRI data analysis because it allows the functional alignment of images having roughly 200 × 200,000 dimensions, obtaining a unique representation of the aligned images in the brain space and a unique interpretation of the related results. In the application example presented in Sect. 4, a subsample was analyzed. However, the algorithm grows linearly in N, so working with larger samples is not a problem; the algorithm also permits parallel computation across subjects.
The Bayesian framework gives the user the advantage, and the duty, of inserting prior information into the model through k and F. The parameter k plays the role of a regularization parameter, which is rarely known a priori. We estimated it by cross-validation, although it may be interesting to adapt approximation-based methods (e.g., generalized cross-validation) to reduce the computational burden. Alternatively, we could assume a prior distribution taking values in IR^+ for the regularization parameter k and jointly estimate this parameter as well. More interestingly, the matrix F guides the estimate of the optimal rotations, which is favorable in the analysis of fMRI data because, in this context, the variables have a spatial anatomical location. In the example in Sect. 4, our definition of F favors the combination of voxels with equal locations. However, a more thoughtful specification could fully exploit the voxels' specific spatial positions in the anatomical template. This opens up the possibility of exploring various specifications of F and will be the subject of further research.