A consistently oriented basis for eigenanalysis

Repeated application of eigen-centric methods to an evolving dataset reveals that the eigenvectors calculated by well-established computer implementations are not stable along the evolving sequence. This is because any one eigenvector may point along either the positive or negative direction of its associated eigenaxis, and for any single call to an eigenanalysis routine, the choice of sign has no effect on the solution. This work reports a model-free algorithm that creates a consistently oriented basis of eigenvectors. The algorithm postprocesses any well-established eigenanalysis call and is therefore agnostic to the particular implementation of the latter. Once the basis is consistently oriented, directional statistics can be applied to the eigenvectors in order to track their motion and summarize their dispersion. When a consistently oriented eigensystem is applied to methods of machine learning, the time series of training weights becomes interpretable in the context of the machine-learning model. Ordinary linear regression is used to demonstrate such interpretability. A reference implementation of the algorithm reported herein has been written in Python and is freely available, both as source code and through the thucyd Python package.


Overview
Eigenanalysis plays a central role in many aspects of machine learning because the natural coordinates and relative lengths of a multifactor dataset are revealed by the analysis [6,1,9,17]. Singular-value decomposition and eigendecomposition are two of the most common instances of eigenanalysis [21,5], yet there is a wide range of adaptation, such as eigenface [12]. The eigenanalysis ecosystem is so important and well studied that highly optimized implementations are available in core numerical libraries such as LAPACK [2]. This article does not seek to suggest or make any improvement to the existing theoretical or numerical study of eigenanalysis as it applies to any one standalone problem. Instead, the focus of this article is on the improvement of eigenanalysis when applied repeatedly to an evolving dataset. With today's world being awash in data, and with machine-learning techniques being applied regularly, if not continuously, to ever-expanding datasets as new data arrives, the evolution of the eigensystem ought to be included in any well-conceived study. Yet, the researcher will find that the orientation of the eigenvector basis produced by current eigenanalysis algorithms is not consistent, that it may change according to hardware and software, and that it is sensitive to data perturbations. This paper demonstrates a method for the consistent orientation of an eigensystem, in which the orientation is made consistent after an initial call to a canonical eigenanalysis routine. This method, however, does not address the issue of a correct orientation: correctness is context specific and must be addressed in a manner exogenous to eigenanalysis. The analyst will need to know, for instance, whether a positive change in the factors aligned to particular eigenvectors should imply a positive or negative change in a dependent feature. In this article, consistency alone is sought.

arXiv:1912.12983v1 [math.NA] 30 Dec 2019

Fig. 1 An ellipsoid-shaped point cloud in R^3 typical of centered, variance-bound data having a stationary or slowly changing underlying random process. The point cloud is quoted in a constituent basis (π_1, π_2, π_3) while the eigenvectors lie along the natural axes of the data. Here, the ambiguity of the direction of v_2 is unresolved. Left, a left-handed basis of eigenvectors. Right, a right-handed basis of eigenvectors.
As a consequence of the method detailed below, the evolution of the eigenvectors themselves can be tracked. Eigenvectors that span R^n lie on a hypersphere S^{n-1} ⊂ R^n, and the mean pointing direction on the sphere and the dispersion about that mean are statistics of interest, as is any statistically significant drift. Basic results from directional statistics (see [13,8,11]) are used to study the mean direction and dispersion of the common and differential modes over the eigenvector evolution.

Inconsistent Orientation
Given a square, symmetric matrix, M^T = M, with real-valued entries, M ∈ R^{n×n}, the eigendecomposition is

M = V Λ V^T,    (1)

where the columns of V ∈ R^{n×n} are the eigenvectors of M, the diagonal entries along Λ ∈ R^{n×n} are the eigenvalues, and the entries in both are purely real. Moreover, when properly normalized, V^T V = I, indicating that the eigenvectors form an orthonormal basis.
To inspect how the eigenvectors might be oriented, write out V as (v_1, v_2, ..., v_n), where v_k is a column vector, and take the inner product two ways:

v_k^T M v_k = λ_k and (−v_k)^T M (−v_k) = λ_k.

A choice of +v_2 is indistinguishable from a choice of −v_2 because in either case the equality in (1) holds. In practice, the eigenvectors take an inconsistent orientation, even though they will have a particular orientation once returned from a call to an eigenanalysis routine.
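To see the ambiguity concretely, the following minimal sketch (using NumPy's `eigh` on a randomly generated symmetric matrix standing in for M) confirms that flipping the sign of one eigenvector leaves the reconstruction of M untouched:

```python
import numpy as np

# Hypothetical symmetric matrix standing in for M.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
M = (A + A.T) / 2.0

lam, V = np.linalg.eigh(M)  # M = V @ diag(lam) @ V.T

# Flip the sign of the second eigenvector: the equality in (1) still holds,
# which is exactly the sign ambiguity described in the text.
V_flipped = V.copy()
V_flipped[:, 1] *= -1.0

recon_original = V @ np.diag(lam) @ V.T
recon_flipped = V_flipped @ np.diag(lam) @ V_flipped.T

assert np.allclose(recon_original, M)
assert np.allclose(recon_flipped, M)
```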
A similar situation holds for singular-value decomposition (SVD). For a centered panel of data P ∈ R^{m×n}, where m > n, SVD gives

P = U Λ^{1/2} V^T,    (2)

where the columns in panel U ∈ R^{m×n} are the projections of the columns of P onto the eigenvectors V, scaled to have unit variance. The eigenvectors have captured the natural orientation of the data. Unlike the example above, the eigensystem here is positive semidefinite, so all eigenvalues in Λ are nonnegative, and in turn a scatter plot of the data in P typically appears as an ellipsoid in R^n (see figure 1). Following the thought experiment above, a sign flip of an eigenvector in V flips the sign of the corresponding column in U, thereby preserving the equality in (2). Consequently, the eigenvector orientations produced by SVD are also inconsistent, since either sign satisfies the equations.
In isolation, such inconsistent eigenvector orientation does not matter. The decomposition is correct and is generally produced efficiently. However, in subsequent analysis of the eigensystem, principal component analysis (PCA) being a well-known example [6], and in an evolving environment, the inconsistency leads to an inability to interpret subsequent results and an inability to track the eigenvector evolution.
For instance, a simple linear regression performed on the eigensystem (whether with reduced dimension or not) with dependent variable y ∈ R^m reads

y = U β + ε,

where the weight vector is

β = (U^T U)^{−1} U^T y.

Each entry in β corresponds to a column in U, which in turn corresponds to an eigenvector in V. If the sign of one of the eigenvectors is flipped (compare left and right in figure 1), the corresponding column in U has its sign flipped, which flips the sign of the corresponding β entry. In an evolving environment, a particular regression might be denoted by index k, and the kth regression reads

y_k = U_k β_k + ε_k.

When tracking any particular entry in the β vector, say β_i, it will generally be found that β_{i,k} flips sign throughout the evolution of the system (see figure 2). There are two consequences: First, for a single regression, the interpretation of how a change of the ith factor corresponds to a change in y is unknown because the ith eigenvector can take either sign. Second, as the data evolves, the sign of β_{i,k} will generally change, even if the unsigned β_i value remains fixed, so that the time series of β cannot be meaningfully analyzed. If these sources of sign flips can be eliminated, then a time series of weights β can be interpreted as we might have expected (see figure 2, lower right).

Fig. 2 Left, in R^2 a regression along the eigenaxes yields the tuple (β_1, β_2). Right, top, because the eigenvector direction can flip from one eigenanalysis to the next, the signs of the β-tuple also flip, defeating any attempt at interpretability. Right, bottom, ideally there is no sign flipping of the eigenvectors and therefore no flips of the β-tuple along a sequence.
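A small numerical sketch of this effect, with an orthonormal matrix standing in for the projection panel U and a hypothetical weight vector, shows that a single column sign flip negates the corresponding regression weight while leaving the fit itself unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 3
# Orthonormal columns stand in for the projections U; beta_true is hypothetical.
U = np.linalg.qr(rng.standard_normal((m, n)))[0]
beta_true = np.array([1.0, -2.0, 0.5])
y = U @ beta_true

# Ordinary least squares on the original panel.
beta = np.linalg.lstsq(U, y, rcond=None)[0]

# Flip the sign of column 1, as an eigenvector sign flip would do.
U_flipped = U.copy()
U_flipped[:, 1] *= -1.0
beta_flipped = np.linalg.lstsq(U_flipped, y, rcond=None)[0]

# The fitted values are identical, but the corresponding weight changes sign.
assert np.allclose(beta_flipped[1], -beta[1])
assert np.allclose(U @ beta, U_flipped @ beta_flipped)
```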

Toward a Consistent Orientation
To construct a consistent eigenvector basis we must begin with a reference, and SVD offers a suitable start.
The data in matrix P of (2) is presented with n columns, where, typically, each column refers to a distinct observable or feature. Provided that the data is well formed, such as having zero mean, bound variance, and negligible autocorrelation along each column, the row space of P forms the constituent basis Π ∈ R^{n×n} with basis vectors π_k ∈ R^n such that Π = (π_1, π_2, ..., π_n) (see figure 1). If we label the columns in P by (p_1, ..., p_n), then the basis Π, when materialized onto the constituent axes, is simply π_k = e_k, the kth column of the identity, or, Π = I. Now, when plotted as a scatter plot in R^n, the data in the rows of P typically form an ellipsoid in the space. When the columns of P have zero linear correlation, the axes of the ellipsoid align to the constituent axes. As correlation is introduced across the columns of P, the ellipsoid rotates away. Given such a case, we might seek to rotate the ellipsoid back onto the constituent axes. The constituent axes can therefore act as our reference.
This idea has good flavor, but it is incomplete. An axis is different from a vector, for a vector may point along either the positive or negative direction of a particular axis. Therefore, ambiguity remains. To capture this mathematically, let us construct a rotation matrix R ∈ R^{n×n} such that R^T rotates the data ellipse onto the constituent axes. A modified SVD will read

P = U Λ^{1/2} (R^T V)^T R^T.

If it were the case that R^T V = I, then the goal would be achieved. However, R^T V = I is not true. Instead,

R^T V = diag(±1, ±1, ..., ±1)    (5)

is the general case, where the sign on the kth entry depends on the corresponding eigenvector orientation in V. While the following is an oversimplification of the broader concept of orientation, it is instructive to consider that a basis in R^3 may be oriented in a right- or left-handed way. It is not possible to rotate one handedness onto the other because of the embedded reflection.
It is worth noting that an objection might be raised at this point because the eigenvector matrix V is itself a unitary matrix with det(V) = +1 and therefore serves as the sought-after rotation. Indeed, V^T V = I. The problem is that V^T V is quadratic, so any embedded reflections cancel. In order to identify the vector orientation within V, we must move away from the quadratic form.

A Representation for Consistent Orientation
The representation detailed herein uses the constituent basis Π = I as the reference basis and seeks to make (5) an equality for a particular V. To do so, first define the diagonal sign matrix S_k ∈ R^{n×n} as

S_k = diag(1, 1, ..., s_k, ..., 1), where s_k = ±1.    (6)

The product S = Π_k S_k is S = diag(s_1, s_2, ..., s_n), and clearly S^2 = I. Given an entry s_k = −1, the product V S_k reflects the kth eigenvector such that v_k → −v_k while leaving the other eigenvectors unaltered.
An equality can now be written with the representation

R^T V S = Π = I,    (7)

given suitable rotation and reflection matrices R and S. This representation says that a rotation matrix R can be found to rotate V S onto the constituent basis Π, provided that a diagonal reflection matrix S is found to selectively flip the sign of eigenvectors in V to ensure complete alignment in the end. Once (7) is satisfied, the oriented eigenvector basis V̄ can be calculated in two ways:

V̄ = V S or V̄ = R.    (8)

To achieve this goal, the first step is to order the eigenvalues in descending order, and to reorder the eigenvectors in V accordingly. (Since the representation (7) holds for real symmetric matrices M = M^T and not solely for positive semidefinite systems, it is the absolute values of the eigenvalues that should be ordered in descending order.) Indexing into V below expects that this ordering has occurred.
The representation (7) is then expanded so that each eigenvector is inspected for a reflection, and a rotation matrix is constructed for its alignment to the constituent basis. The expansion reads

R_n^T · · · R_2^T R_1^T V S_1 S_2 · · · S_n = I.    (9)

There is a good separation of concerns here: reflections are imparted by the S_k operators to the right of V, while rotations are imparted by the R_k^T operators to the left. Reflections do not preserve orientation, whereas rotations do.
The algorithm created to attain a consistent orientation follows the pattern

R_n^T (· · · (R_2^T ((R_1^T (V S_1)) S_2)) · · · S_n) = I.

For each eigenvector, a possible reflection is computed first, followed by a rotation. The result of the rotation is to align the target eigenvector with its associated constituent basis vector. The first rotation is performed on the entire space R^n, the second is performed on the subspace R^{n−1} so that the alignment v_1^T π_1 = 1 is preserved, the third rotation operates on R^{n−2} to preserve the preceding alignments, and so on. The solution to (7) will include n reflection entries s_k and n(n − 1)/2 rotation angles embedded in the n rotation matrices R_k.
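The reflect-then-rotate pattern can be sketched compactly in NumPy. This is an illustrative implementation under the assumptions stated in the text (orthonormal V, eigenvalue ordering already applied), not the thucyd reference implementation; here each rotation R_k is realized as a cascade of Givens rotations, as developed in the sections below:

```python
import numpy as np

def orient_eigenvectors(V):
    """Sketch of the reflect-then-rotate pattern: returns (S, R) such that
    R.T @ V @ diag(S) = I, so the oriented basis is V @ diag(S) = R.
    Assumes V is orthonormal with eigenvalue ordering already applied."""
    n = V.shape[0]
    W = V.copy()      # working matrix, driven toward the identity
    S = np.ones(n)    # reflection entries s_k
    R = np.eye(n)     # accumulated rotation
    for k in range(n):
        # Reflection step: force a nonnegative pivot (sign(0) = +1).
        if W[k, k] < 0:
            S[k] = -1.0
            W[:, k] *= -1.0
        # Givens cascade: zero the entries below the pivot in column k.
        for j in range(k + 1, n):
            a, b = W[k, k], W[j, k]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            G = np.eye(n)
            G[k, k] = c
            G[j, j] = c
            G[k, j], G[j, k] = s, -s
            W = G @ W    # rotate the working matrix
            R = R @ G.T  # accumulate R so that R.T @ V @ diag(S) = W
    return S, R

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
M = (M + M.T) / 2.0
lam, V = np.linalg.eigh(M)
order = np.argsort(-np.abs(lam))  # descending absolute eigenvalue
V = V[:, order]

S, R = orient_eigenvectors(V)
assert np.allclose(R.T @ V @ np.diag(S), np.eye(4))
assert np.allclose(V @ np.diag(S), R)  # the two ways to form the oriented basis
assert np.isclose(np.linalg.det(R), 1.0)  # R is a pure rotation
```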
Before working through the solution, it will be helpful to walk through an example of the attainment algorithm in R^3.

Algorithm Example in R^3
This example illustrates a sequence of reflections and rotations in R^3 to attain alignment between V and Π, and along the way shows cases in which a reflection or rotation is simply an identity operator.
An example eigenvector matrix V is created with a left-handed basis, such that an alignment to the constituent basis Π would yield R^T V = diag(1, −1, 1). The precondition of ordered eigenvalues, λ_1 > λ_2 > λ_3, has already been met. Figure 3 (a) shows the relative orientation of V to Π.
The first step is to consider whether or not to reflect v_1, the purpose being to bring v_1 into the nonnegative side of the halfspace created by the hyperplane perpendicular to π_1. To do so, the relative orientation of (v_1, π_1) is measured with an inner product. The reflection entry s_1 is then determined from

s_1 = sign(v_1^T π_1)

(with sign(0) = 1). From this, V S_1 is formed. A consequence of the resultant orientation is that the angle subtended between v_1 and π_1 is no greater than π/2. The next step is to align v_1 with π_1 through a rotation R_1; the result is shown in figure 3 (b). The principal result of R_1^T V S_1 is that v_1^T π_1 = 1: the basis vectors are aligned. Significantly, the remaining vectors (v_2, v_3) are perpendicular to π_1, and therefore the subsequent rotation needs only to act in the (π_2, π_3) plane.

Fig. 3 Example of the orientation of a left-handed eigenvector basis V to the constituent basis Π. a) Orthonormal eigenvectors (v_1, v_2, v_3) as they are oriented with respect to the constituent basis vectors.

With this sequence of operations in the full space R^3 now complete, the subspace R^2 is treated next. As before, the first step is to inspect the orientation of v_2 with respect to π_2. However, care must be taken because v_2 is no longer in its original orientation; instead, it was rotated with the whole basis during R_1^T V S_1, as shown in figure 3 (b). It is the new orientation of v_2 that needs inspection. Here v_2 and π_2 point in opposite directions, so eigenvector v_2 needs to be reflected: s_2 = −1. The rotation R_2, acting in the (π_2, π_3) plane, then aligns v_2 with π_2 while preserving the alignment v_1^T π_1 = 1. The solution then works through the last subspace, R, to align v_3. The reflection entry s_3 is evaluated from the sign of the remaining inner product, but since we are in the last subspace, R, the subsequent rotation is a tautology. Still, R_3 is included in the representation (9) for symmetry.
Lastly, the final, oriented eigenvector basis can be calculated in two ways, V̄ = V S or V̄ = R; compare with (8). In summary, the algorithm walks down from the full space R^3 into two subspaces in order to perform rotations that do not upset previous alignments. As this example shows, S_1 = S_3 = I, so two of the reflection operators are simply identities. Moreover, R_3 = I, since the prior rotations automatically align the last basis vector to within a reflection.

Algorithm Details
The algorithm that implements (9) is now generalized to R^n. There are three building blocks to the algorithm: sorting of the eigenvalues in descending order, calculation of the reflection entries, and construction of the rotation matrices. Of the three, construction of the rotation matrices requires the most work and has not been previously addressed in detail.

Sorting of Eigenvalues
The purpose of sorting the eigenvalues in descending order, and rearranging the corresponding eigenvectors accordingly, is to ensure that the first rotation operates on the axis of maximum variance, the second rotation on the axis of next-largest variance, and so on, the reason being that the numerical certainty of the principal ellipsoid axis is higher than that of any other axis. Since the rotations are concatenated in this method, an outsize error introduced by an early rotation persists throughout the calculation. By sorting the eigenvalues and eigenvectors first, any accumulated errors are minimized.
Since the algorithm works just as well for real-valued symmetric matrices, it is reasonable to presort the eigenvalues in descending order of their absolute value. From a data-science perspective, we generally expect positive semidefinite systems; a system with negative eigenvalues is therefore unlikely, and if one is encountered, there may be larger upstream issues to address.
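A minimal sketch of this presort, assuming eigenvalues and eigenvectors as returned by a canonical routine:

```python
import numpy as np

# Presort eigenvalues in descending order of absolute value and reorder
# the eigenvector columns to match, as required before orientation.
lam = np.array([0.3, -2.0, 1.1])  # hypothetical eigenvalues
V = np.eye(3)                     # placeholder eigenvectors, one per column

order = np.argsort(-np.abs(lam))
lam_sorted = lam[order]
V_sorted = V[:, order]

assert list(lam_sorted) == [-2.0, 1.1, 0.3]
```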

Reflection Calculation
Reflection entries are calculated by inspecting the inner product between a rotated eigenvector and its associated constituent basis vector (several examples are given in the previous section). The formal reflection-value expression for the kth basis vector is

s_k = sign(v'_k^T π_k), where v'_k = R_{k−1}^T · · · R_1^T v_k

(and sign(0) = 1). Despite the apparent complexity of these expressions, the practical implementation is simply to inspect the sign of the (k, k) entry in the working matrix defined below.

Rotation Matrix Construction
One way to state the purpose of a particular rotation is that the rotation reduces the dimension of the subspace by one. For an eigenvector matrix V in R^n, denoted by V^{(n)}, the rotation acts such that

R_1^T V^{(n)} S_1 = diag(1, V^{(n−1)}).

Here, S_1 ensures that the (1, 1) entry on the right is +1.
Next, the sign of the (2, 2) matrix entry is inspected and s_2 set accordingly. The next rotation decrements the dimension again such that

R_2^T R_1^T V^{(n)} S_1 S_2 = diag(1, 1, V^{(n−2)}).

As before, S_2 ensures that the (2, 2) entry on the right is +1. The 1 entries along the diagonal reflect the fact that an eigenvector has been aligned with its corresponding constituent basis vector. Subsequent rotations preserve the alignment by operating in an orthogonal subspace.

Alignment to One Constituent Basis Vector
To simplify the notation for the analysis below, let us collapse V and all operations prior to the kth rotation into a single working matrix, denoted by W_k:

W_k = R_{k−1}^T · · · R_1^T V S_1 · · · S_k.

The ℓth column of the kth working matrix is w_{k,ℓ}, and the mth entry of that column is w_{k,ℓ,m}.
With this notation, the actions of R_1 and R_2 above on the principal working columns w_{1,1} and w_{2,2} are

R_1^T w_{1,1} = δ_1 and R_2^T w_{2,2} = δ_2.    (16)

Strictly speaking, the unit entries in the column vectors on the right-hand sides are w_{k,k}^T w_{k,k}, but since V is expected to be orthonormal, the inner product is unity.
Equations like (16) are well known in the linear-algebra literature, for instance [21,5], and form the basis for the implementation of QR decomposition (see also [16]). There are two approaches to the solution: use of Householder reflection matrices, or use of Givens rotation matrices. The Householder approach is reviewed in section 7.1 below, but in the end it is not suitable in this context. The Givens approach works well, and the solution here has an advantage over the classical formulation because we can rely on w_{k,k}^T w_{k,k} = 1. To simplify the explanation once again, let us focus on the four-dimensional space R^4. In this case, R_1^T w_{1,1} needs to zero out three entries below the pivot, and likewise R_2^T w_{2,2} needs to zero out two entries below its pivot. Since one Givens rotation matrix can zero out at most one column entry, a cascade of Givens matrices is required to realize the pattern in (16), and the length of the cascade depends on the dimension of the current subspace.
A Givens rotation imparts a rotation within an R^2 plane embedded in a higher-dimensional space. A simple two-dimensional counterclockwise rotation matrix is

G(θ) = [ cos θ  −sin θ ;
         sin θ   cos θ ],

and a Givens rotation G(i, j, θ) embeds this 2×2 rotation into the ith and jth rows and columns of an identity matrix, so the pattern of nonzero entries is the identity's diagonal plus the four embedded trigonometric entries. The full rotation R_1 is then decomposed into a cascade of Givens rotations:

R_1 = G(1, 2, θ_2) G(1, 3, θ_3) G(1, 4, θ_4).

The order of the component rotations is arbitrary and is selected for best analytic advantage. That said, a different rotation order yields different Givens rotation angles, so there is no uniqueness to the angles calculated below. The only material concern is that the rotations must be applied in a consistent order.
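The embedding and its action can be sketched as follows; `givens` is a hypothetical helper with zero-based indices, and its sign convention is chosen so that a cascade acting on δ_1 reproduces the product-of-cosines pattern derived in the next section:

```python
import numpy as np

def givens(n, i, j, theta):
    """Embed a 2D counterclockwise rotation into the (i, j) plane of R^n.
    Indices are zero-based here, unlike the one-based text."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = G[j, j] = c
    G[i, j], G[j, i] = -s, s
    return G

# A single Givens rotation moves weight between exactly two rows.
theta = 0.5
delta1 = np.array([1.0, 0.0, 0.0, 0.0])
w = givens(4, 0, 3, theta) @ delta1
assert np.allclose(w, [np.cos(theta), 0.0, 0.0, np.sin(theta)])

# The cascade G(1,2) G(1,3) G(1,4) (one-based) fills all four rows.
t2, t3, t4 = 0.3, -0.4, 0.7
cascade = givens(4, 0, 1, t2) @ givens(4, 0, 2, t3) @ givens(4, 0, 3, t4)
w = cascade @ delta1
assert np.allclose(w, [np.cos(t2) * np.cos(t3) * np.cos(t4),
                       np.sin(t2) * np.cos(t3) * np.cos(t4),
                       np.sin(t3) * np.cos(t4),
                       np.sin(t4)])
```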
An advantageous cascade order of Givens rotations to expand the left-hand equation in (16) follows the pattern

w_{1,1} = R_1 δ_1 = G(1, 2, θ_2) G(1, 3, θ_3) G(1, 4, θ_4) δ_1,    (18)

where the rotation R_1 has been moved to the other side of the original equation, and δ_k denotes a vector in R^{n×1} in which all entries are zero except for a unit entry in the kth row. Each Givens matrix moves some of the weight from the first row of δ_1 into a specific row below, and otherwise the matrices are not coupled.
In the same vein, Givens rotations to represent R_2 follow the pattern

w_{2,2} = R_2 δ_2 = G(2, 3, θ_{2,3}) G(2, 4, θ_{2,4}) δ_2.

Similar to before, the unit weight in the second row of δ_2 is shifted into the third and fourth rows of the final vector. It is evident that R_3 requires only one Givens rotation, and that R_4 is simply the identity matrix.

Solution for Givens Rotation Angles
Now, to solve for the Givens angles, the cascades are multiplied through. For the R_1 cascade in (18), multiplying through yields

(a_1, a_2, a_3, a_4)^T = (cos θ_2 cos θ_3 cos θ_4, sin θ_2 cos θ_3 cos θ_4, sin θ_3 cos θ_4, sin θ_4)^T,    (19)

where the entries a_k denote row entries in w_{1,1} and are used only to further simplify the notation. This is the central equation, and it has a straightforward solution. First, there are several important properties to note:

1. The L2 norms of the two column vectors are both unity.
2. While there are four equations, the three angles (θ_2, θ_3, θ_4) are the only unknowns. The fourth equation is constrained by the L2 norm.
3. The a_1 entry (or specifically the w_{1,1,1} entry) is nonnegative: a_1 ≥ 0. This holds by construction because the associated reflection matrix, applied previously, ensures the nonnegative value of this leading entry.

With these properties in mind, a solution to (19) uses the arcsine method, which starts from the bottom row and works upward. The sequence in this example reads

θ_4 = arcsin(a_4),
θ_3 = arcsin(a_3 / cos θ_4),
θ_2 = arcsin(a_2 / (cos θ_3 cos θ_4)),    (20)

where θ_k ∈ [−π/2, π/2]. The sign of each angle is solely governed by the sign of the corresponding a_k entry. Moreover, other than the first angle, the Givens angles are coupled in the sense that no one angle can be calculated from the entries of the working matrix W_k alone, and certainly not prior to the rotation of W_{k−1} into W_k. This shows once again that there is no shortcut to working down from the full space through each subspace until R_1 is resolved.
Past the first equation, the arguments of the arcsine functions are quotients, and it is reassuring to verify that these quotients never exceed unity in absolute value. For θ_3, we can write

a_3^2 ≤ 1 − a_4^2 = 1 − sin^2 θ_4 = cos^2 θ_4,

thus a_3^2 ≤ cos^2 θ_4, so the arcsine function has a defined value. Similarly, for θ_2 we have

a_2^2 ≤ 1 − a_3^2 − a_4^2 = cos^2 θ_3 cos^2 θ_4,

and therefore a_2^2 ≤ cos^2 θ_3 cos^2 θ_4. Again, the arcsine function has a defined value. This pattern holds in any dimension.
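The bottom-up arcsine recovery can be checked numerically; the vector below is an arbitrary unit vector with a nonnegative leading entry, chosen purely for illustration:

```python
import numpy as np

# Recover (theta_2, theta_3, theta_4) from a unit vector a with a_1 >= 0
# using the bottom-up arcsine method, then rebuild the vector to confirm.
a = np.array([0.5, -0.5, 0.5, -0.5])  # unit norm, leading entry nonnegative
assert np.isclose(np.linalg.norm(a), 1.0)

t4 = np.arcsin(a[3])
t3 = np.arcsin(a[2] / np.cos(t4))
t2 = np.arcsin(a[1] / (np.cos(t3) * np.cos(t4)))

rebuilt = np.array([
    np.cos(t2) * np.cos(t3) * np.cos(t4),
    np.sin(t2) * np.cos(t3) * np.cos(t4),
    np.sin(t3) * np.cos(t4),
    np.sin(t4),
])
assert np.allclose(rebuilt, a)
```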
The preceding analysis is carried out for each eigenvector in V. For each subspace R^k there are k − 1 angles to resolve. For convenience, the Givens angles can be organized into the upper right triangle of a square matrix. For instance, in R^4,

Θ = [ 0  θ_{1,2}  θ_{1,3}  θ_{1,4} ;
      0  0        θ_{2,3}  θ_{2,4} ;
      0  0        0        θ_{3,4} ;
      0  0        0        0       ].    (21)

Lastly, it is worth noting that Dash constructed a storage matrix of angles similar to (21) when calculating the embedded angles in a correlation matrix [4]. His interest, shared here, is to depart from the Cartesian form of a matrix (whose entries in his case are correlation values and in the present case are eigenvector entries) and cast it into polar form. Polar form captures the natural representation of the system.

Summary
This section has shown that each eigenvector rotation matrix R_k is constructed from a concatenation of elemental rotation matrices called Givens rotations. Each Givens rotation has an angle, and provided that the rotation order is always preserved, the angles are a meaningful way to represent the orientation of the eigenbasis within the constituent basis.
The representation for Θ in (21) highlights that there are n(n − 1)/2 angles to calculate to complete the full basis rotation in R^n. It will be helpful to connect the angles to a rotation matrix, so let us define a generator function G(·) such that

R_k = G(θ_{k,·}) and R = G(Θ) = R_1 R_2 · · · R_n.    (22)

The job of the generator is to concatenate Givens rotations in the correct order, using the recorded angles, to correctly produce a rotation matrix, either a matrix R_k for one eigenvector or the matrix R for the entire basis.

Consideration of Alternative Approaches
The preceding analysis passes over two choices made during the development of the algorithm. These alternatives are detailed in this section, together with the reasoning for not selecting them.

Householder Reflections
Equations nearly identical to (16) appear in well-known linear-algebra texts, such as [5,21,20], typically in the context of QR decomposition, along with the advice that Householder reflections are preferred over Givens rotations because all entries below a pivot can be zeroed out in one step. This advice is accurate in the context of QR decomposition, but it does not hold in the current context.
Recall that in order to align eigenvector v_1 to constituent basis vector π_1, n − 1 Givens rotations are necessary for a space of dimension n. Arranged appropriately, each rotation has the effect of introducing a zero in the column vector below the pivot. A Householder reflection, by contrast, aligns v_1 to π_1 in a single operation by reflecting v_1 onto π_1 about an appropriately aligned hyperplane in R^n. In general this hyperplane is neither parallel nor perpendicular to v_1, nor to any of the other eigenvectors in V, so the reflection relocates every eigenvector in the basis.
The Householder operator is written as

H = I − 2 u u^T / (u^T u),

where u is the Householder vector that lies perpendicular to the reflecting hyperplane. The operator is Hermitian, unitary, and involutory, and it has det(H) = −1. In fact, all of its eigenvalues are +1, except for one that is −1.
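A quick numerical check of these properties, with an arbitrary Householder vector chosen for illustration:

```python
import numpy as np

def householder(u):
    """H = I - 2 u u^T / (u^T u): reflection about the hyperplane normal to u."""
    u = np.asarray(u, dtype=float)
    return np.eye(u.size) - 2.0 * np.outer(u, u) / (u @ u)

H = householder([1.0, -2.0, 0.5])
assert np.allclose(H, H.T)                  # Hermitian (symmetric in the real case)
assert np.allclose(H @ H, np.eye(3))        # involutory, hence also unitary
assert np.isclose(np.linalg.det(H), -1.0)   # the embedded reflection
```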
Let us reconsider the eigenvector representation (7) and apply a general unitary transform U to V, as in

U^T V S = I.    (23)

(Unitary transform U is not to be confused with the matrix of projected data U in SVD equation (2).) The equality holds for any suitable unitary operator, so Householder reflections are admitted. Let us then compare expanded representations in the form of (9) for both rotation and reflection operators to the left of V:

R_n^T · · · R_2^T R_1^T V S_1 S_2 · · · S_n = I versus H_n · · · H_2 H_1 V S_1 S_2 · · · S_n = I.

Even though (23) holds for Householder reflections, the equality itself is not the goal; rather, the interpretability of the eigenvector orientation is paramount. In the rotation method, reflections only ever reflect one eigenvector at a time, whereas rotations transform the basis as a whole. In contrast, the Householder method intermingles eigenvector reflections with basis reflections, thereby disrupting the interpretation of the basis orientation.
A comparison between the rotation and reflection methods is illustrated in figure 4. Here, the eigenvector basis spans R^2 with the initial orientation shown in the figure. Following the upper pane of the figure, where the rotation method is illustrated, it is found that v_1^T π_1 < 0, and therefore s_1 = −1. After the reflection and subsequent rotation, v_2^T π_2 > 0 (trivially so), and therefore s_2 = +1. The tuple of sign reflections for the eigenvectors in V is therefore (s_1, s_2) = (−1, +1).
A parallel sequence is taken along the lower pane, where a Householder reflection is used in place of rotation. Expression V S_1 is as before, but to align v_1 with π_1 a reflection is used. To do so, Householder vector u_1 is oriented so that the reflection plane is inclined by 22.5° from the π_1 axis. Reflection of V S_1 about this plane indeed aligns v_1 to π_1, but it has the side effect of reflecting v_2 too. As a consequence, when it is time to inspect the orientation of v_2 with respect to π_2, we find that s_2 = −1. Therefore the tuple of eigenvector signs is now (−1, −1).
Our focus here is not to discuss which tuple of signs is "correct," but rather how to interpret the representation. The choice made in this article is to use rotations instead of reflections for the basis transformations so that the basis orientation is preserved through this essential step. The use of Householder reflections, by contrast, scatters the basis orientation after each application.

Arctan Calculation Instead of Arcsine
A solution to (19) is stated above by equations (20) in terms of arcsine functions, and a requirement of this approach is that the range of angles be restricted to θ_k ∈ [−π/2, π/2]. While this is the preferred solution, there is another approach that uses arctan functions instead of arcsine. The arctan solution to (19) starts at the top of the column vector and walks downward, the angles being

θ_2 = arctan2(a_2, a_1 csc θ_1), θ_3 = arctan2(a_3, a_2 csc θ_2), θ_4 = arctan2(a_4, a_3 csc θ_3).

The first angle is defined as θ_1 ≡ π/2 (which is done solely for equation symmetry), and the arctan2 function is the four-quadrant form arctan2(y, x), in which the admissible angular domain is θ ∈ [−π, π]. It would appear that the arctan method is a better choice.
The difficulty arises with edge cases. Consider a working column whose entries drive the final arctan evaluation to arctan2(0, ±0). That evaluation is not numerically stable: for computer systems that support signed zero, the arctan can evaluate either to arctan2(0, 0) → 0 or to arctan2(0, −0) → −π, see [15,7]. The latter case is catastrophic because the product V S_1 has forced the sign of v_{1,1} to be positive, yet now R_1^T V S_1 flips the sign, since cos(−π) = −1. After R_1^T V S_1 is constructed, the first eigenvector is not considered thereafter, so there is no chance, unless the algorithm is changed, to correct this spurious flip.
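The signed-zero behavior is easy to reproduce with the standard library's four-quadrant arctangent (here `math.atan2`); the signs of the zero-valued arguments decide between 0 and ±π, and the cosine of the ±π result carries the spurious sign flip:

```python
import math

# Signed zero drives the unstable edge case: the "same" zero arguments
# produce angles that differ by pi, and cos(+/-pi) = -1 flips a sign.
assert math.atan2(0.0, 0.0) == 0.0          # both zeros positive
assert math.atan2(0.0, -0.0) == math.pi     # negative zero in x
assert math.atan2(-0.0, -0.0) == -math.pi   # negative zeros in x and y

# The catastrophic consequence: the resulting rotation negates the entry.
assert math.isclose(math.cos(math.atan2(0.0, -0.0)), -1.0)
```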
Therefore, to avoid edge cases that may give false results, the arcsine method is preferred.

Application to Regression
Returning now to the original motivation for a consistently oriented eigenvector basis, the workflow for regression and prediction on the eigenbasis becomes:

1. SVD: Find (V, Λ) from in-sample data P_in such that P_in = U Λ^{1/2} V^T.
2. Orientation: Find (R, S) from V such that R^T V S = I.
3. Regression: Find (β, ε) from (y, U, S, Λ) such that y = (U S) β + ε.    (26)
4. Prediction: Calculate E y from (P_out, R, β), where P_out is out-of-sample data, with E y = (P_out R) Λ^{−1/2} β.    (27)

The form of (26) here assumes that there is no exogenous correction to the direction of y in response to a change of the factors along the eigenbasis, as discussed in section 1, Overview.
The workflow highlights that both elements of the orientation solution, rotation R and reflection S, are used, although for different purposes.The regression (26) requires the reflection matrix S to ensure that the signs of β faithfully align to the response of y.Predictive use with out-of-sample data requires the β estimate, as expected, but also the rotation matrix R. The rotation orients the constituent basis, which is observable, into the eigenbasis, which is not.
Without dimension reduction, the rotation in (27) is associative:

E y = (P_out R) Λ^{−1/2} β = P_out (R Λ^{−1/2} β).
However, associativity breaks with dimension reduction. Principal component analysis, for instance, discards all but the top few eigenvector components (as ranked by their corresponding eigenvalues) and uses the remaining factors in the regression. The number of entries in β equals the number of remaining components, not the number of constituent components. In this case, only (P_out R) can be used. Typically, online calculation of this product is simple because the calculation is updated for each new observation. Rather than being an R^{m×n} matrix, an out-of-sample update is P_out ∈ R^{1×n}, which means that P_out R requires only a BLAS level-2 (vector-matrix) call on an optimized system [3].

Treatment of Evolving Data
Eigenvectors and eigenvalues calculated from data are themselves sample estimates and are subject to uncertainties based on the particular sample at hand: statistical uncertainties based on the number of independent samples available in each dimension as well as on assumptions about the underlying distribution, and scale-related numerical uncertainties based on the degree of collinearity among factors. All of these uncertainties exist for a single, static dataset. Additional uncertainty becomes manifest when the data evolves, because the estimate of the eigensystem will vary at each point in time, this variation being due in part to sample noise, and possibly to nonstationarity of the underlying random processes.
In the presence of evolving data, therefore, the eigensystem fluctuates, even when consistently oriented. It is natural, then, to seek statistics for the location and dispersion of the eigensystem. To do so, two orthogonal modes of variation are identified (see figure 5): stretch variation, which is tied to change of the eigenvalues, and wobble variation, which is tied to change of the eigenvectors. Each is treated in turn.

Stretch Variation
Stretch variation is nominally simple because eigenvalues λ_i are scalar, real numbers. The mean and variance follow from the usual sample forms of the statistics. For an ensemble of N samples, the average eigenvalues are simply

λ̄_i = (1/N) Σ_{j=1}^{N} λ_i[j]. (28)

A challenge for the variance statistic is that the eigenvalues are not independent: within each sample, the eigenvalues sum to the total variance, the trace of the covariance matrix. How the covariance manifests itself is specific to the dataset at hand.
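The stretch statistics reduce to ordinary sample moments over an ensemble of eigenvalue vectors, as in this small sketch on synthetic (hypothetical) eigenvalue data:

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 50, 3
# synthetic ensemble: one row of descending eigenvalues per observation
lam = np.sort(rng.uniform(0.5, 2.0, size=(N, n)))[:, ::-1]

lam_bar = lam.mean(axis=0)           # average eigenvalue per rank, eq. (28)
lam_var = lam.var(axis=0, ddof=1)    # sample variance per rank

# the eigenvalues of each sample are coupled through the trace of the
# covariance matrix: their per-sample sum is the total variance
total_var = lam.sum(axis=1)
```

Because the per-sample sums are constrained, the per-rank variances lam_var are not independent quantities; their covariance structure depends on the dataset.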
Fig. 5 Orthogonal modes of variation of an eigensystem.
Left, in one variation mode, the eigenvalues of a new observation differ from those of the past, thereby stretching and/or compressing the point-cloud ellipse. Right, in another variation mode, it is the eigenvectors that vary, thereby changing the orientation of the point-cloud ellipse with respect to the constituent basis.

Wobble Variation
Quantification of wobble variation requires a different toolset because eigenvectors are vectors, and these orthonormal vectors point onto the surface of a hypersphere such that v_k ∈ S^{n−1} ⊆ R^n. Figure 6, left, illustrates a case in point for R^3: a stationary variation of the eigenvectors forms point constellations on the surface of the sphere S^2, one constellation for each vector.^3 The field of directional statistics informs us how to define mean direction (the vector analogue of location) and dispersion in a consistent manner for point constellations such as these [13,8,11]. Directional statistics provides a guide on how to use the vector information available from V and the embedded angle information recorded in θ, see (21). Recent work focuses on the application of these statistics to machine learning [18]. Nonetheless, application of directional statistics to eigenvector systems appears to be underrepresented in the literature.
The following discussion applies only to eigensystems that have been consistently oriented.

Estimation of Mean Direction
Both mean direction and directional dispersion are measurable statistics for eigensystems, and determining the mean direction is the simpler of the two. The reason is that the eigenvectors are orthogonal in every instance of an eigenbasis, thus the mean directions must also be orthogonal. Consequently, the eigenvectors can be treated uniformly.

^3 Figure 6 was drawn using point constellations drawn from the von Mises-Fisher distribution [13] with concentration κ = 100. Sampling this distribution in R^2 and R^3 was done using the VonMisesFisher class in the TensorFlow Probability package [23], which belongs to the TensorFlow platform [22]. The VonMisesFisher class draws samples using the nonrejection-based method detailed by Kurz and Hanebeck [10].
It is a tenet of directional statistics that the mean direction is calculated from unit vectors, not from their angles.^4 For an ensemble of N unit vectors x[i], the mean direction x̄ is defined as

x̄ = x_S / ‖x_S‖, where x_S = Σ_{i=1}^{N} x[i], (29)

with the caveat that the mean direction is undefined for ‖x_S‖ = 0. The resultant vector x_S is the vector sum of the component vectors x[i] and has length ‖x_S‖ under an L2 norm (see figure 7). The mean direction is thus a unit-vector version of the resultant vector. Extending this construction for mean direction to an ensemble of oriented eigenvector matrices V[i], the mean location of the eigenvectors is

V_S = Σ_{i=1}^{N} V[i], (30)

with each column of V_S normalized to unit length to produce V̄. Using the methodologies above, the average basis V̄ can be rotated onto I with a suitable rotation matrix, R^T V̄ = I, and in doing so, the Cartesian form of V̄ can be converted into polar form, the generator function G(·) from (22) connecting the two.

Fig. 6 Illustration of eigenvector wobble in R^3 and its R^2 subspace. Left, points illustrating eigenvector wobble for a stationary process. The points lie on the hypersphere S^2, and there is one constellation for each eigenvector. Samples were drawn from a von Mises-Fisher distribution with concentration κ = 100. In this illustration, the point constellations are identical, each having been rotated onto the mean direction of each vector. In general the point constellations are mixtures having different component dispersions. Right, eigenvector v_2 may wobble in the (v_2, v_3) plane independently from v_1 wobble.
Points here illustrate wobble in this R^2 subspace.
Note that θ̄ is not the arithmetic average of the angles θ[i] but is exclusively derived from V̄: the ensemble of bases V[i] yields the basis sum in (30), the basis sum yields the mean basis V̄, and the mean angles θ̄ follow from V̄ through the generator.
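The resultant-vector construction for mean direction, and its column-wise extension to eigenvector matrices, can be sketched directly. The constellation below is synthetic and the ensemble of bases is a trivial stand-in; both are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 100, 3

# synthetic constellation: unit vectors scattered about a common direction
mu = np.array([1.0, 0.0, 0.0])
x = mu + 0.1 * rng.normal(size=(N, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)

x_S = x.sum(axis=0)                        # resultant vector
mean_dir = x_S / np.linalg.norm(x_S)       # mean direction, eq. (29)

# column-wise extension to an ensemble of eigenvector matrices V[i]
V_ens = np.stack([np.eye(n)] * N)          # stand-in ensemble of oriented bases
V_S = V_ens.sum(axis=0)                    # basis sum, eq. (30)
V_bar = V_S / np.linalg.norm(V_S, axis=0)  # normalize each column
```

Note that the mean direction comes from summing vectors, never from averaging angles; the angles are recovered afterward from the averaged basis.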

Estimation of Dispersion
Dispersion estimation is more involved for an ensemble of eigenbases than for an ensemble of single vectors because both common and differential modes of variation exist. Referring to figure 6, left, consider a case where the only driver of directional variation for the eigenbasis is a change of the pointing direction of v_1. As v_1 scatters about its mean direction, vectors v_2 and v_3 likewise scatter, together forming three constellations of points, as in the figure. Thus, wobble in v_1 imparts wobble in the other vectors. Since dispersion is a scalar independent of direction, the dispersion estimates along all three directions in this case are identical. Yet there is another possible driver of variation that is orthogonal to v_1, and that is motion within the (v_2, v_3) plane. Such motion is equivalent to a pirouette of the (v_2, v_3) plane about v_1. Taking (v_2, v_3) as a reference, variation within this R^2 subspace needs to be estimated, see figure 6, right. When viewed from R^n, however, the point-constellation distribution about v_2 is a mixture of variation drivers.
Regardless of the estimation technique for dispersion, it is clear that the pattern developed in section 6.3, Rotation Matrix Construction, for walking through subspaces of R^n to attain R^T V S = I occurs here as well. Thus, rather than calculating a dispersion estimate for each eigenvector, since the eigenvectors are mixtures, a dispersion estimate is made for each subspace.
Returning to the generator for subspace k in (22), given an ensemble of angle matrices θ[i], the ensemble of kth-subspace rotations is

R_k[i] = G(θ[i], k),

where R_k[i] ∈ R^{n×n}. From R_k[i] the kth column is extracted, and its lower n − k + 1 entries are taken to produce the column vector x_k[i] ∈ R^{(n−k+1)×1}. The reason for removing the top zero entries from the column vector is that parametric models of dispersion are isotropic in the subspace, and therefore keeping dimensions with zero variation would create an unintended distortion. Now that the vectors of interest x_k[i] have been identified, their dispersions can be considered. Unlike the mean direction, there is no single measure of dispersion. Two approaches are touched upon here, one model free and the other based on a parametric density function. Both approaches rely on the resultant vector x_S.
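The column extraction can be written as a one-line slice. In this sketch the subspace rotation R_k is a placeholder identity matrix; in practice it would come from the generator G(θ[i], k).

```python
import numpy as np

n, k = 4, 2          # ambient dimension, subspace index (one-based)
R_k = np.eye(n)      # placeholder for the generator output G(theta, k)

# kth column of R_k, keeping only its lower (n - k + 1) entries
x_k = R_k[k - 1:, k - 1]
```

The dropped leading entries are structurally zero for a subspace-k rotation, so discarding them avoids padding the dispersion estimate with dimensions that cannot vary.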
Returning to figure 7, the two panels differ in that on the left there is a higher degree of randomness in the pointing directions of the unit vectors along the sequence, while on the right there is a lower degree of randomness. The consequence is that the expected lengths of the resultant vectors differ: the lower the randomness, the longer the expected length. Define the mean resultant length for N samples as

r̄ = ‖x_S‖ / N, 0 ≤ r̄ ≤ 1. (36)

For distributions on a circle, Mardia and Jupp define a model-free circular variance as V ≡ 1 − r̄ (see [13]). Among other things, this variance is higher for more dispersed unit vectors and lower for less dispersed vectors. Yet, after the generalization to higher dimensions, it remains unclear how to compare the variance from one dimension to another.

Fig. 7 The resultant vector x_S ≡ Σ_i x[i] for two levels of concentration, κ = (1, 10). Lower concentration, which is akin to higher dispersion, leads to a smaller expected norm of the resultant vector ‖x_S‖, whereas higher concentration leads to a larger expected norm. Sample vectors in R^2 were generated from the von Mises-Fisher distribution. (Note that the difference in horizontal and vertical scales distorts the length of the unit vectors in the plots.)
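The model-free circular variance V = 1 − r̄ is only a few lines of code. This sketch contrasts a tight constellation with near-uniform scatter on the circle; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200

def circular_variance(x):
    """Model-free dispersion V = 1 - rbar for an array of unit row vectors."""
    x_S = x.sum(axis=0)                    # resultant vector
    rbar = np.linalg.norm(x_S) / len(x)    # mean resultant length, eq. (36)
    return 1.0 - rbar

# tight constellation versus near-uniform scatter on the circle
theta_tight = 0.05 * rng.normal(size=N)
theta_loose = rng.uniform(-np.pi, np.pi, size=N)
tight = np.column_stack([np.cos(theta_tight), np.sin(theta_tight)])
loose = np.column_stack([np.cos(theta_loose), np.sin(theta_loose)])

v_tight = circular_variance(tight)   # small: vectors nearly aligned
v_loose = circular_variance(loose)   # large: resultant nearly cancels
```

The statistic is bounded on [0, 1] by construction, but, as noted above, its values are not directly comparable across ambient dimensions.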
As an alternative, a model-based framework starts by positing an underlying parametric probability distribution and seeks to define its properties and validate that they meet the desired criteria. In the present case, eigenvectors are directional, as opposed to axial; their evolution-induced scatter is probably unimodal, at least across short sequences; and their dimensionality is essentially arbitrary. An appropriate choice for an underlying distribution is therefore the von Mises-Fisher (vMF) distribution. The vMF density function in R^n is [18]

p_vmf(x; n, µ, κ) = c_n(κ) e^{κ µ^T x}, where c_n(κ) = κ^{n/2−1} / ((2π)^{n/2} I_{n/2−1}(κ))

and I_v(·) is the modified Bessel function of the first kind and order v. The single argument to the density function is the vector x ∈ R^{n×1}, and the parameters are the dimension n, the mean direction µ ∈ R^{n×1}, and the concentration κ ∈ [0, ∞).
The mean-direction parameter is the same as in (29): µ = x̄. The concentration parameter κ, a scalar, is like an inverse variance. For κ = 0, unit vectors x are uniformly distributed on the hypersphere S^{n−1}, whereas as κ → ∞, the density concentrates onto the mean direction µ. Maximum-likelihood estimation yields

μ̂ = x_S / ‖x_S‖ and κ̂ = A_n^{−1}(r̄),

where the nonlinear function A_n(·) is

A_n(κ) = I_{n/2}(κ) / I_{n/2−1}(κ).

The key aspect here is the connection of the concentration parameter κ to the mean resultant length r̄ from (36). Notice as well that κ̂ is related to the dimensionality n, indicating that concentration values across different dimensions cannot be directly compared.
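Inverting A_n exactly requires a numerical root-finder over Bessel-function ratios; a widely used closed-form approximation, due to Banerjee et al., is sketched below in place of the exact inverse. The data and helper function are illustrative.

```python
import numpy as np

def vmf_mle(x):
    """Approximate vMF maximum-likelihood estimates for unit row vectors x.

    mu_hat is the exact MLE; kappa_hat uses the Banerjee et al. closed-form
    approximation to A_n^{-1}(rbar) rather than a numerical inversion.
    """
    N, n = x.shape
    x_S = x.sum(axis=0)                       # resultant vector
    mu_hat = x_S / np.linalg.norm(x_S)        # mean-direction MLE
    rbar = np.linalg.norm(x_S) / N            # mean resultant length, eq. (36)
    kappa_hat = rbar * (n - rbar**2) / (1.0 - rbar**2)
    return mu_hat, kappa_hat

# synthetic concentrated constellation about e1
rng = np.random.default_rng(6)
x = np.array([1.0, 0.0, 0.0]) + 0.1 * rng.normal(size=(500, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)
mu_hat, kappa_hat = vmf_mle(x)
```

The dimension n enters the κ̂ formula explicitly, which mirrors the caveat above: concentration estimates from subspaces of different dimension are not directly comparable.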
Sampling from the vMF distribution is naturally desirable. Early approaches used rejection methods (see the introduction in [10]); these are simpler, yet their runtime is not deterministic. Kurz and Hanebeck have since reported a stochastic sampling method that is deterministic, see [10], and it is this method that is implemented in TensorFlow (see footnote 3 on page 11). The samples in figure 6 were calculated with TensorFlow.
To conclude this section: in order to find the underlying drivers of variation along a sequence of eigenbasis observations, concentration parameters κ̂, or at least the mean resultant lengths r̄, should be computed for each descending subspace in R^n. The n concentration parameters might then be stored as a vector.

Rank-Order Change of Eigenvectors

Along an evolving system, the rank order of eigenvectors may change. This paper proposes no mathematically rigorous solution to dealing with such change (it is unclear that one exists, nor is its absence a defect of the current work), yet there are a few remarks that may assist the analyst in determining whether rank-order change is a property of the data itself or a spurious effect that results from an upstream issue. Eigenvectors are "labeled" by their eigenvalues: in the absence of eigenvalue degeneracy, each vector is mapped one-to-one with a value. The concept of rank-order change of eigenvectors along evolving data is only meaningful if a second label can be attached to the vectors such that the second label does not change when the first does. Intuitively (for evolving data), a second label is the pointing direction of the vectors. This choice is fraught: two time series are not comparable if they begin at different moments along a sequence because the initial labeling may not match. Additionally, validation of secondary-labeling stability needs to be performed, and revalidation is necessary for each update because stability cannot be guaranteed.
With an assumption that a second labeling type is suitably stable, rank-order changes may still be observed. Before it can be concluded that the effect is a trait of the data, modeling assumptions must first be considered. There is a significant assumption embedded in the estimation of the eigensystem itself: using an eig(..) function call presupposes that the underlying statistical distribution is Gaussian. However, data is often heavy tailed, so copula or implicit methods are appropriate because Gaussian-based methods are not robust (see [14]). Another modeling concern is the number of independent samples per dimension used in each panel: too few samples inherently skews the eigenvalue spectrum and, when coupled with sample noise, may induce spurious rank-order changes (again, see [14]). Lastly, PCA ought only to be performed on homogeneous data categories because the mixing of categories may well lead to spurious rank-order changes. It is better to apply PCA independently to each category and then combine the results.
If rank-order change is determined to be a true trait of the data, then axial statistics is used in place of directional statistics. Referring to figure 6 above, two or more constellations will have some of their points mirror-imaged through the origin, creating a "barbell" shape along a common axis. A juxtaposition of the Watson density function, which is an axial distribution, with the von Mises-Fisher density shows that the axial direction is squared in order to treat the dual-signed nature of the pointing directions [19]. With this, directional statistics can be applied once again.
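The barbell geometry defeats the resultant-vector mean, since mirrored points cancel. A standard axial-statistics remedy, sketched here on synthetic data, is to take the axial mean as the dominant eigenvector of the scatter matrix (this is the maximum-likelihood axis for Watson-type models; the construction below is illustrative, not taken from the paper).

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100
axis = np.array([0.0, 0.0, 1.0])

# axial data: points scattered about the axis, then randomly sign-flipped,
# forming a "barbell" through the origin
x = axis + 0.1 * rng.normal(size=(N, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)
x *= rng.choice([-1.0, 1.0], size=(N, 1))

# the directional resultant nearly cancels; the axial mean is the
# eigenvector of the scatter matrix with the largest eigenvalue
T = x.T @ x / N
w, Q = np.linalg.eigh(T)
axial_mean = Q[:, -1]
```

Squaring the direction through the outer products x x^T is exactly what removes the sign ambiguity that the Watson density handles parametrically.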

Regression Revisited
Let us apply the averages developed in this section to the equations in section 8, Application to Regression.
The SVD and orientation steps remain the same. The linear regression of (26) is modified to use the local average of the eigenvalues from (28). The prediction step is likewise modified from (27): the mean direction of the eigenvectors in (30) replaces the single-observation rotation matrix R in the original.
Use of eigenvalue and eigenvector averages will reduce sample-based fluctuation and therefore may improve the predictions. However, from a time-series perspective, averages impart delay because an average is constructed over a lookback interval. For stationary underlying processes the delay may not be a problem, yet for nonstationary processes the lag between the average and the current state can lead to lower-quality predictions.
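The lag imparted by a lookback average is easy to quantify for a trending quantity. This sketch uses a synthetic, linearly drifting "eigenvalue" series, where a window of length L lags the current state by (L − 1)/2 steps times the drift rate.

```python
import numpy as np

t = np.arange(100, dtype=float)
lam = 0.05 * t                      # nonstationary (drifting) eigenvalue series
L = 20                              # lookback window length

# rolling average over the trailing window of L observations
lam_avg = np.convolve(lam, np.ones(L) / L, mode="valid")

# gap between the current value and its trailing average: for a linear
# drift this is a constant, drift_rate * (L - 1) / 2
lag = lam[L - 1:] - lam_avg
```

For a stationary series the same gap would fluctuate about zero, which is the sense in which the delay "may not be a problem" in that regime.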
In any event, at the very least this section has presented a methodology to decompose variation across a sequence of eigensystem observations into meaningful quantities.

Reference Implementation
The Python package thucyd, written by this author, is freely available from PyPI and Conda-Forge and can be used directly once installed. The source code is available at gitlab.com/thucyd-dev/thucyd.
There are two functions exposed on the interface of thucyd.eigen: orient_eigenvectors implements the algorithm in section 6, and generate_oriented_eigenvectors implements R_k = G(θ, k), eq. (22).
The pseudocode in listing 1 outlines the reference implementation of orient_eigenvectors that is available at the above-cited source-code repositories. Matrix and vector indexing follows Python NumPy notation. With the embedded angles in hand, the oriented basis Vor can be built up through the partial products W_k: as the subspace rotations are concatenated, the rightward columns become aligned to the reference eigenbasis Vor. That the reconstructed matrix matches the reference matrix before the last rotation is simply because the last rotation, R_4, is the identity matrix, which is included in the formalism only for symmetry.
Looking ahead, an optimized implementation can avoid trigonometry by carrying the sine values sin θ rather than the angle values θ. On the range θ ∈ [−π/2, π/2] the sine function is invertible, thus either representation will do. Elimination of trigonometry leaves only arithmetic and matrix multiplication, making the orientation algorithm a candidate for a LAPACK implementation.
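The invertibility claim is easy to verify numerically: on [−π/2, π/2] the sine values carry exactly the same information as the angles.

```python
import numpy as np

# angles restricted to the principal domain of arcsine
theta = np.linspace(-np.pi / 2, np.pi / 2, 9)

s = np.sin(theta)           # carry sines instead of angles
theta_back = np.arcsin(s)   # round-trip recovers the angles exactly
```

Outside this domain the round trip fails (arcsin folds values back into the principal branch), which is why the angle restriction in footnote 4 matters for the optimized representation.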

Conclusions
Although eigenanalysis is an old and well-studied topic, linking eigenanalysis to an evolving dataset leads to unexpected results and new opportunities for analysis. The optimization of eigenanalysis codes for one-time solutions has admitted inconsistent eigenvector orientation because such inconsistency is irrelevant to the one-time solution. Yet for an evolving system it is precisely this inconsistency that disrupts the interpretability of an eigen-based model. This article reports a method to correct for the inconsistency, within a framework where orientation is a postprocessing step applied to existing eigenanalysis implementations. Once corrected, directional statistics brings a new avenue of inquiry to the behavior of the underlying factors in the dataset.
It is hoped that the methods in this paper will be applied in fields as disparate as human health, climate change, navigation, economics, and the internet, among other scientific domains.

^4 In order to take the arcsine uniquely, the rotation angles are restricted to the domain θ ∈ [−π/2, π/2]. As a consequence, the sign of each row is solely determined by the sine-function values, the cosine-function values being nonnegative on this domain: c_k ≥ 0. This leads to a global constraint on the solution since c_2 c_3 c_4 ≥ 0 could otherwise hold for pairs of negatively signed cosine-function values.
The output theta_matrix from the function orient_eigenvectors is the input to the focus function here:

    W_k = np.eye(full_dim)
    # iterate across columns in (Vor, W_k)
    for cursor in np.arange(full_dim):
        # call focus function (the 2nd argument is one-based)
        W_k = W_k.dot(
            thucyd.eigen.generate_oriented_eigenvectors(theta_matrix, cursor + 1))
    # reconstructed eigenbasis
    Vor_recon = W_k

While we could simply call

    Vor_recon = thucyd.eigen.generate_oriented_eigenvectors(theta_matrix)

it is revealing to build up Vor_recon one subspace at a time.