Implicit algorithms for eigenvector nonlinearities

We study and derive algorithms for nonlinear eigenvalue problems, where the system matrix depends on the eigenvector, or several eigenvectors (or their corresponding invariant subspace). The algorithms are derived from an implicit viewpoint. More precisely, we change the Newton update equation in a way that the next iterate does not only appear linearly in the update equation. Although, the modifications of the update equation make the methods implicit we show how corresponding iterates can be computed explicitly. Therefore we can carry out steps of the implicit method using explicit procedures. In several cases, these procedures involve a solution of standard eigenvalue problems. We propose two modifications, one of the modifications leads directly to a well-established method (the self-consistent field iteration) whereas the other method is to our knowledge new and has several attractive properties. Convergence theory is provided along with several simulations which illustrate the properties of the algorithms.


Introduction
Let M ⊂ R n×n denote the set of symmetric n × n-matrices.Let A : R n×p → M , p < n.We consider the problem of finding V ∈ R n×p and a symmetric S ∈ R p×p such that This is the general formulation of the eigenvector-dependent nonlinear eigenvalue problem.In our work A satisfies A(V P ) = A(V ) for any non-singular matrix P such that the range of V can be seen as an invariant subspace of A. This property (and a notion of invariant subspace) is characterized in Section 2, we show how a problem which does satisfy the condition can be transformed where we also give a problem transformation applicable when the condition is not satisfied.
The V -dependent matrix A(V ) is assumed to satisfy certain properties such that, such that the columns span of V can be This notion of invariant subspace is characterized in Section 2, in particular conditions on A.
If p = 1 the setting reduces to a class of problem which has received some attention, mostly in application specific settings.In this case we need to determine v ∈ R n and λ ∈ R such that (2) A(v)v = λv where v = 1.Our results are, in particular, applicable for this case.
A number of algorithms have been proposed for the above problems, as we summarize below.In this paper we propose to derive algorithms based on implicit formulations, in particular based on implicit improvements of Newton's method.One proposed algorithm leads to a linearly convergent wellestablished method, whereas the other approach leads to a new method with quadratic convergence.Both of the implicit approaches have advantageous properties for certain problem classes that we characterize.
Our approach is based on viewing iterative eigenvalue solvers (for eigenvector nonlinearities) as modifications of Newton's method.This has also been done for standard eigenvalue problems, already by Wilkinson and Peters [17].See also the recent review paper [24] and the paper by Unger [26].
One of the most important applications for (1) is within the field of quantum mechanics and electronic structure calculations.Discretization methods in combination with the Hartree-Fock approximation or the Kohn-Sham equations lead to problems of type (1).See standard literature in quantum chemistry [23].For a survey of numerical methods, see [22].Considerable application specific research has been carried to specialized algorithms for this problem, mainly based on the self-consistent field iteration (SCF).SCF is an iterative method that involves solving a linear eigenvalue problem in each step until convergence or self-consistency.The convergence of SCF and its variants has been studied in a number of works which can be classified into two broad categories: the optimization based approach of looking at (1) as the optimality conditions of a minimization problem [7], [13], [15], [14] or different matrix analysis based approaches [28], [27].Strategies for accelerating the convergence of SCF have also been studied well, e.g., [19], [18].
The special case p = 1 has its most important application in quantum physics.Characterization of the ground state of bosons is usually done with the Gross-Pitaevskii equation [12], [1], whose spatial discretization is of the form (2).Although SCF can be used in this case too, the more common techniques involve discretization of a gradient flow.See [4], and references therein.
Another class of applications where p = 1 arises is in data science, for example, applications such as spectral clustering which rely on computing eigenpairs of the p-Laplacian [6], [16], [9].See [3] for a Rayleigh quotient minimization approach for Fisher linear discriminant analysis, which is used in pattern recognition and classification.In [25], the authors propose a new model for the core-periphery detection problem in network science (in the sense of [5]) and show its equivalence to the p = 1 problem.
The contributions of the paper can be summarized as follows.In Section 2, we introduce the concept of basis invariance.This allows us to derive an alternate characterization of (1) in terms of an associated Jacobian.We introduce our implicit algorithms in Section 3 motivated by this result.Explicit procedures to carry out these algorithms are derived and studied in Section 4. In Section 5, we provide convergence results for these algorithms and Section 6 contains numerical examples, illustrating advantages of our approach.
We will extensively use vectorization and devectorization and introduce the following shorthand.Small letters denote the vectorization of capital letters.For example, For any H : R n → R n , the operator dH dv denotes forming the Jacobian of H with respect to v, where v denotes vectors in R n .

Notion of invariant subspace
In order to appropriately generalize the concept of invariant pairs, we will throughout the paper make the following assumption on A.

Assumption 1 (Basis invariance)
We consider A : R n×p → M such that it is a function of the outer product of W , i.e., for some B : M → M .Moreover, we assume that for any X ∈ M , where h : R → R denotes the heavyside function, generalized to a matrix sense, Assumption 1 is a generalization of the scaling invariance property for the case p = 1 in [12].If p = 1 and v ∈ R n , then for any α ∈ R, Moreover, Assumption 1 leads to the fact that A(W ) = A(W P ) for invertible P , as we shall illustrate in the following theorem.This is important in our context, since it allows us to interpret the columns of W as a basis of an invariant subspace, and A can be viewed as a function of a subspace, i.e., it is a function of a vector space, and independent of the basis.
Theorem 1 If A satisfies the basis invariant conditions (3) and (4) then A(W ) = A(W P ) for any non-singular matrix P ∈ R p×p and W ∈ R n×p where n ≤ p and rank(W ) = p .
Proof Let W = QS for some invertible matrix S and Q ∈ R n×p orthogonal.Let V, Λ + be a diagonalization of SS T , i.e., SS T = V Λ + V T , where V is orthogonal and Λ + is a positive diagonal matrix.Then, This along with (3) and (4) gives us If we let W = QR be a QR-factorization of W , we see that A(W ) = A(Q) with S = R, and A(W P ) = A(Q) with S = RP .This shows that A(W ) = A(W P ), which concludes the proof.
Since S is symmetric, it can be diagonalized as S = Q s Λ s Q T s where Q s is orthogonal.Problem (1) can be reformulated using Theorem 1 as ( 6) showing that a solution to (1) can be diagonalized.
Example 1 (Transformation to basis invariant form) The heaviside function usually does not appear directly in the standard formulation in NEPv-applications, but can be obtained easily.In the context of the self-consistent field iteration in quantum chemistry, we want to solve the equation ( 7) where, e.g.H(V ) = H 0 + diag(V V T ), which does not satisfy (3) and ( 4).This can be transformed to a problem satisfying (3) and (4) by defining ( 8) A pair (V, S) is a full rank solution to (7) if and only if it is a solution to (1) with A defined as in (8).However, the similarity transformation of a solution, i.e., (V P, P −1 SP ) is a solution to (1), but not (7).
We will denote the Jacobian as follows, and we directly characterize a theoretical property as a consequence of Assumption 1.
Definition 1 (Left-hand side Jacobian) The Jacobian of the vectorization of the LHS of the first subequation of (1) is denoted as J : R np → R np×np and given by ( 9) The vectorized form of (1) can now be written as ( 10) The method we propose will work better for problems where the Jacobian evaluated in the solution is non-singular.The Jacobian of (10) in the fixed point is given by ( 11) where As a conseqeunce of the Assumption 1, we conclude the following generalization of [12,Lemma 2.1], which shows a relationship between the eigenpairs of J and A. We exploit this relationship later in Section 3 and Section 4 where we formulate and derive our algorithms respectively.
Theorem 2 (Eigenproblem equivalence) For any v ∈ R np , we have Proof From (9), we have Interpreting the above as a directional derivative in the direction of v, we have This completes the proof.

Implicit algorithms
Standard Newton's method for the vectorized form ( 10) is where the ∆-matrices are updates, . where The iterates V (k+1) computed using ( 12) and ( 13) are not orthogonal matrices.We prefer iterates which are orthogonal since the solution is orthogonal and it is advisable to work with orthogonal iterates from a robustness perspective.We will now carry out a modification such that ( 14) In this first modification of Newton's method, we use an alternate definition of Z k (15) .
The reason this implies ( 14) can be seen as follows.We first observe the relation from the product rule ( 16) for an arbitrary vector w.The last block equation in the update (12) now reads Reversing the vectorization, and using ( 16) the left hand side can be expanded and simplified to The insertion into (17) leads to V (k+1) T V (k+1) = I p , i.e., ( 14) is satisfied for k = 2, 3, . ... We introduce this first implicit method in order to maintain orthogonality of the eigenvector approximations.This is summarized in the following lemma, which also illustrates that the behavior is not that different from the standard Newton method.The proof of statement (b) is based on theory for Newton-like methods and can be found in Appendix A.
Lemma 1 Let v (k) , s (k) , k = 1, . .., be a sequence of vectors that satisfy (12) with Z k given by (15).Then, If the sequence converges monotonically to a solution (V * , S * ), and the Jacobian evaluated in (V * , S * ) (given by (11)) is invertible, then it converges with the same convergence order as Newton's method.
In this work we consider two modifications of the Newton-like method in Lemma 1.Both lead either to new methods (which have some attractive properties) or well-established methods suggesting that the methods can be viewed as Newton-like methods.
-We modify the (1,2) block of the Jacobian to This is analogous to the modification [11, Equation (1.10)] which directly leads to the method of successive linear problems [21].With the techniques of the next section, this leads to Algorithm 1. -We modify the (1,1) block of the Newton's method Jacobian from J(v (k) ) to I p ⊗ A(V k ), in addition to the modification done to obtain Algorithm 1. ( We will show that this leads to Algorithm 2, which is the well known SCF iteration.

Explicit interpretation of algorithms
Both update equations ( 18) and ( 19) correspond to implicit methods.We will illustrate several situations where we can generate iterates that satisfy the implicit algorithms update equations in an explicit way.Equations ( 18) and ( 19) have to be expanded to derive usable algorithms, starting with the latter variant since it leads to a clear relationship with stateof-the-art methods.

Algorithm 2
Although, Algorithm 2 is a modification of Algorithm 1, we start our discussion with Algorithm 2 since it leads to a well-established method.We can obtain Algorithm 2 can be obtained from (19) by multiplying out the first subequation of (19) as follows.
Cancellation of terms leads to (20) Devectorizing this system gives the following result.
Theorem 3 Let (V (k) , S (k) ) be pairs that satisfy the update equation ( 19) where This leads to a practical way to carry out the algorithm, since ( 21) is an eigenvalue problem where V (k+1) are the eigenvectors and the diagonal matrix S (k+1) are the eigenvalues.This is the well-known SCF algorithm.

Algorithm 1
Similarly, Algorithm 1 can be obtained from the first subequation in (18) as follows.
From Theorem 2, we have k) .Using this in (22) leads to the following result.
Theorem 4 Let (V (k) , S (k) ) be pairs that satisfy the update equation (18 Since J is not block diagonal, ( 23) cannot be easily devectorized as was done for (20).For the special case p = 1 we directly identify that (23) reduces to Similar to ( 21), ( 24) is a standard eigenvalue problem and we can compute a next iterate with a solver for standard eigenvalue problem.It directly suggests that the matrix A(V (k) ) in the SCF-iteration, can be viewed as an approximation of the Jacobian matrix, and in order to obtain faster convergence it can be better to use J(v (k) ), or approximations thereof.This in turn leads to quadratic convergence and in contrast to Newton's method, the method converges in one step for a linear problem, and is superior to Newton's method for problems that are close to being linear in this sense (as we prove in Section 5.3).

Implementation aspects
In both the implicit algorithms, the first step in the loop can be done with standard tools for solving linear eigenproblems [2].The iterates of both algorithms will depend on which p eigenvectors are selected to construct V (k+1) .The best choice is usually application dependent.For example, in applications based on quantum mechanics, one is usually interested in the p first eigenvalues and hence, the eigenvectors corresponding to the p first eigenvalues would be selected in each step.Although Theorem 3 and Theorem 4 provide us with explicit ways to implement our implicitly formulated methods, they do not automatically enforce the orthogonality constraint V (k+1) T V (k+1) = I p .To this end, we compute an intermediate eigenvector eigenpair (Y, Z) and add an additional step in our algorithms that involves computing a thin QR factorization of Z to obtain V (k+1) .This improves the numerical stability of our implementation.
Compute thin QR factorizaton of Y to get V (k+1) .
Compute a similarity transformation of Z: Note that the final step in both algorithms need not be done for every iteration of the loop since we do not use S (k+1) anywhere inside it.We can Compute thin QR factorizaton of Y to get V (k+1) .
Compute a similarity transformation of Z: S (k+1) = R −1 ZR. end instead compute the transformation only once after the termination of the loop.
We do not provide an explicit procedure for (23) for p > 1, making the procedure somewhat theoretical for that case.There are very important applications for the p = 1 case, and we illustrate in the simulations that if we can solve (23) corresponding to p > 1 in some way (possibly approximately) we obtain attractive convergence properties.

Local convergence of Algorithm 2
Since Algorithm 2 is equivalent to SCF as shown in Theorem 3, the convergence can be described in the setting of SCF.There has been extensive study of convergence of SCF and its acceleration in the last fifty years.Several results exist in the literature, as mentioned in Section 1.In general, SCF exhibits linear local convergence when it converges.Convergence can be characterized in terms of gaps [28] (see also [27,Theorem 3.1]).Recent gloval convergence are available, e.g., in [15], [14].

Local convergence of Algorithm 1
Due to our inexact Newton viewpoint, the convergence of Algorithm 1 can be characterized using results in the rich literature on inexact Newton methods.Quadratic local convergence can be proved using theorems in [8].
Theorem 5 Let (V (k) , S (k) ), k = 1, . .., be a sequence of pairs that satisfies satisfy (23) with the normalization constraint V (k) T V (k) = I p .If the sequence converges monotonically to a solution (V * , S * ), and the Jacobian evaluated in (V * , S * ) (given by (11)) is invertible, then it converges with the same convergence order as Newton's method.
We refer the reader to Appendix A for the proof.

Single step analysis
As we illustrate in the examples, the implicit methods tend seem to often work considerably better in general and in particular for close-to-linear problems.This is intuitively natural since both implicit methods converge in one step if we apply it to linear problems.
This can be further characterized, by considering one step of the method applied to a problem parameterized by a parameter α, where α = 0 corresponds to a linear problem.For this analysis we consider the model problem Let v 0 , λ 0 T be an initial guess and v + , λ + T be the result of (for the moment) any of the two algorithms.We introduce three functions where β can be β = * , β = A and β = J.The values of P (where we dropped the parameters for notational convenience) denote the nonlinearity These nonlinear functions respectively corresponds the residual for the exact solution (β = * ), one step of Algorithm 1 (β = J) and one step of Algorithm 2 (β = A).
We can apply the implicit function theorem for all three functions, and express the first n+1 variables in terms of the third variable α in a neighborhood of the solution, if the associated Jacobian is non-singular.The Jacobian given (11) is now assumed to be non-singular in the solution.The exact solution can then be expanded as (26) v * (α) whereas both β = A and β = J can be expanded as where c A = P a v * (0) and c J = P j v * (0).The first terms in the Taylor expansion of the next iterate and the exact iterate as a function of the parameterization of the nonlinearity, are equal.Therefore, meaning that the accuracy of one step, is order of magnitude of the nonlinear term.Moreover, the coefficient is proportional to C(v * (0))v * (0) − P β v * (0) .

Scalar nonlinearity
The theory and methods are first illustrated with a reproducible example where p = 1.We consider ( 2) with ( 28) and for essentially any c ∈ R n .This specific example appears in [12, Section 3.3] and J is explicitly given by We solve four instances of this problem generated by four different values of α, that is α = 0, 0.5, 1 and 5. To all of these instances, we apply Algorithm 1 (using ( 24)), Algorithm 2, the J-Inverse iteration (from [12]) and Newton's method with initial guess v 0 = 1, 1, 1, 1 T .In Figure 1, we see the error history of all three methods for all four values of α.The error is computed as v k − v * , where v * is the reference solution.
We observe linear convergence for Algorithm 2 and quadratic convergence for Algorithm 1 as predicted by the theory in Section 5.Both implicit methods are competitive, at least for small values of α.For higher values of α, the number of iterations required to enter the regime of quadratic convergene increases for both Algorithm 1 and Newton's method.This example illustrates a simple case when Algorithm 1 is a better choice than Newton's method, although both methods converge quadratically.We observe linear convergence for J-Inverse iteration as predicted by [12,Theorem 3.1].
In Figure 2, we visualize the implications of the theory in Section 5.3 by plotting the single step errors for all three methods.It is clear that the single step error is linear in α, as expected from (26) and (27).The predicted line is plotted using the coefficent C(v * (0))v * (0) − P β v * (0) .This illustrates an advantage of the proposed methods for small α.The Gross-Pitaevskii equation (GPE) is a nonlinear PDE obtained by a Hartree-Fock approximation (see [22]) of the Schrödinger equation.It describes the ground state of identical bosons in a quantum system.We consider the case of a rotating Bose-Einstein condensate on the domain D = (−L, L) × (−L, L).
In this case, the GPE for the wave function Ψ : R 2 → C under an external Here, The scalar b is a constant indicating the strength of interaction between the bosons and Ω is the angular velocity of rotation.We choose the boundary condition Ψ (x, y) = 0 for (x, y) ∈ ∂D.
We perform a central difference discretization of (29) using a uniform grid of N + 2 points along each dimension with grid spacing ∆x.Details are in [12,Section 5.1].This leads to a problem of size n = 2N 2 with where Ã0 is the discretization of the linear operator gives the vectorization of Ψ evaluated at the interior points.We have We use Algorithm 1 on with J as defined by (30).Since one step of both Algorithm 1 and Algorithm 2 requires the solution of a standard eigenvalue problem, we need to select an appropriate eigenpair at each step.Special attention is needed in the selection in this problem.We select a new iterate in a way that minimizes the difference between two iterates.More precisely, we choose δ ∈ (0, 1) and select all eigenpairs which correspond to eigenvalue within a radius δ of a given target.We then do a least squares fitting to find the linear combination of these eigenvectors which is closest to the previous iterate.This is needed due to the fact that the problem has highly clustered eigenvalues.

Invariant subspace
We consider (31) where A 0 is the discrete 1D Laplacian.Problems of this type occur frequently in electronic structure calculations when using a Hartree-Fock discretization of the Schrödinger equation.See [28] for a discussion of the problem type and convergence results of SCF applied to (31).
If we let e i denote the the ith column of the identity matrix I n and E i,j = e i e T j , then We refer the reader to Appendix B for a derivation of J(V ) for problems of this type.The implementation of Algorithm 2 is straightforward from (21).We also illustrate the importance of the implicit formulation of Algorithm 1 by inexactly solving (23).We do this with the optimization subroutine fminsearch.It provides us a way to test Algorithm 1 for relatively small examples as a proof of concept.We use n = 10, p = 3 and apply Algorithm 1 and Algorithm 2 for two different values of α.In Figure 4, we observe that Algorithm 2 con- verges linearly and Algorithm 1 has much faster convergence, with an initial quadratic phase.The number of iterations required for convergence increases with increase in α, as expected from the single step analysis of section 5.3.The initial quadratic phase is suceeded by an asymptotic slowdown, which can be attributed to the inexact solution of the update equation (23).

Conclusions and Outlook
This paper shows that taking an inexact Newton approach towards deriving algorithms for problems with eigenvector nonlinearities leads to new algorithmic insights.Using this approach, we derive two algorithms.Algorithm 2 is shown to be the widely used SCF algorithm.This result shows a connection between Newton's method and the SCF algorithm which was previously unknown.Algorithm 1 is a new algorithm, to the best of our knowledge.We prove that Algorithm 1 exhibits quadratic local convergence.Both Algorithm 1 and Algorithm 2 have favourable convergence properties for problems that are close to being linear, as shown by the single step analysis of Section 5.3.Numerical simulations for the Gross-Pitaevskii equation in Section 6.2 show that Algorithm 1 is a competitive algorithm for the p = 1 case.The p > 1 example in Section 6.3 shows that Algorithm 1 converges faster than Algorithm 2 even when we solve the update equation ( 23) inexactly.
There are several improvements of the SCF algorithm.Some of these techniques may be interpretable from an implicit viewpoint as well.For instance, acceleration schemes such as DIIS [19] might be seen as an inexact Newton algorithm.This could be combined with other convergence theory to gain further understanding of DIIS.Another direction that can be explored is to develop application-specific strategies to solve the (23) for p > 1 or approximate solution techniques that lead to superlinear convergence.

A.1 Proof of Lemma 4
Proof Let F ′ k be the Jacobian of F evaluated in the iterate k.In the notation of [8] we introduce a residual denoted r k , corresponding to the difference between a Newton step and a inexact Newton step: For notational convenience we now define such that the Z k in the Jacobian ( 12) is Z k = 1 2 (α k + α k+1 ).Then, subtracting (18) from (32) the residual becomes In our setting where we used the smoothness of d dv vec(V T V ) such that (33) By the assumption of monotonic convergence, we have, (34) From the implicit function theorem (e.g. the formulation in [20,Theorem 9.28]) and the assumption about the invertability of the Jacobian at the solution we get that where F ′ * is the Jacobian of F evaluated in the solution.The combination of (33), ( 34) and (35) leads to By [8, Theorem 3.3], the proof is complete.

A.2 Proof of Theorem 7
Proof For any sequence of pairs (V (k) , S (k) ) that satisfies (23) with the V k T V k = Ip, the corresponding vectorized pairs must satisfy (18).The iterative method dictated by ( 18) is also an Inexact Newton Method with the Jacobian approximation where Bk is the modified (1,2) block of the Jacobian.Using this approximation, the residual vector is ∆s (k)  .
Repeating the final arguement in the proof of Lemma 1, the proof is complete.

B Derivation of J for a problem type
We consider a specific problem with where c i , d i ∈ R n and A i ∈ R n×n , ∀i ∈ {1, . . ., k}.
Using (9), we have Theorem 6 Suppose V ∈ R n×p has full column rank.For any d ∈ R n , ∃δ > 0 such that where g δ (x) = h(x − δ).
Proof Since V is full rank, we can diagonalize V V T as V V T = P SP T , where and P is orthogonal.Hence, if 0 < δ 0 < min i∈{1,...,p} s i,i , we have (37) h(V V T ) = P h(S)P T = P g δ 0 (S)P T = g δ 0 (P SP T ) = g δ 0 (V V T ).
Using the continuity of the eigendecomposition, we have Note that ||∆V || can be chosen to be arbitrarily small and by (37), we can choose δ > O( ∆V ) such that h (V + ∆V )(V + ∆V ) T − h(V V T ) = g δ (V + ∆V )(V + ∆V ) T − g δ (V V T ).
For such δ, we have Hence, the proof if complete.
Theorem 7 (Frechet derivative of shifted heavyside function) Let Lg(W, E) denote the Frechet derivative of g δ at W and applied to E. For any V ∈ R n×p such that V T V = I, we have Lg(V V T , E) = (I − V V T )EV V T + V V T E(I − V V T ).
Note that c T d dw (g δ (W )d) W =V V T = c T Lg(V V T , E 1,1 )d c T Lg(V V T , E 2,1 )d . . .c T Lg(V V T , En,n)d where E i,j = e i e T j .Let P n 2 ∈ R n 2 ×n 2 be the shuffle matrix such vec(W ) = P n 2 vec(W T ) for any W ∈ R n×n .Then for any V ∈ R n×p such that V T V = I.

Fig. 2
Fig. 2 Dependence of single step error on α

Fig. 3
Fig. 3 Algorithm 1 and J-Inverse iteration applied to GPE for different b

Fig. 4
Fig. 4 Algorithm 1 and Algorithm 2 for different α applied t