Abstract
We study and derive algorithms for nonlinear eigenvalue problems, where the system matrix depends on the eigenvector, or on several eigenvectors (or their corresponding invariant subspace). The algorithms are derived from an implicit viewpoint. More precisely, we change the Newton update equation in such a way that the next iterate appears not only linearly in the update equation. Although these modifications make the methods implicit, we show how corresponding iterates can be computed explicitly. Therefore, we can carry out steps of the implicit method using explicit procedures. In several cases, these procedures involve the solution of standard eigenvalue problems. We propose two modifications: one leads directly to a well-established method (the self-consistent field iteration), whereas the other is to our knowledge new and has several attractive properties. Convergence theory is provided along with several simulations which illustrate the properties of the algorithms.
Introduction
Let \(M\subset \mathbb {R}^{n\times n}\) denote the set of symmetric n × n matrices. Let \(A:\mathbb {R}^{n\times p}\to M\), p ≤ n. We consider the problem of finding \(V\in \mathbb {R}^{n\times p}\) and a symmetric \(S \in \mathbb {R}^{p\times p}\) such that
This is the general formulation of the eigenvector-dependent nonlinear eigenvalue problem (NEPv). In our work, A satisfies A(V P) = A(V ) for any nonsingular matrix P, so that the range of V can be seen as an invariant subspace of A. This property (and a notion of invariant subspace) is characterized in Section 2.1, where we also provide a problem transformation applicable when the condition is not satisfied.
If p = 1, the setting reduces to a class of problems which has received considerable attention, mostly in applicationspecific settings, as we further discuss below. In this case, we need to determine \(v\in \mathbb {R}^{n}\) and \(\lambda \in \mathbb {R}\) such that
where ∥v∥ = 1.
A number of algorithms have been proposed for the above problems, both for p = 1 and the general case. In this paper, we propose to derive algorithms based on implicit formulations, in particular based on implicit improvements of Newton’s method. One proposed algorithm leads to a linearly convergent well-established method, whereas the other approach leads to a new method with quadratic convergence. Both of the implicit approaches have advantages for certain problem classes that we characterize.
Our approach is based on viewing iterative eigenvalue solvers (for eigenvector nonlinearities) as modifications of Newton’s method. This has also been done for standard eigenvalue problems, already by Wilkinson and Peters [23]. The Newton’s method viewpoint of iterative methods has also been used in the derivation of algorithms for nonlinear eigenvalue problems with eigenvalue nonlinearity, e.g., [15, 21, 32]. See also the recent review paper [30] and, to our knowledge, the first publication in this direction by Unger [33].
One of the most important applications of (1) is within the field of quantum mechanics and electronic structure calculations. Discretization methods in combination with the Hartree-Fock approximation or the Kohn-Sham equations lead to problems of type (1). See standard literature in quantum chemistry [29]. For a survey of numerical methods, see [28]. Considerable application-specific research has been devoted to specialized algorithms for this problem, mainly based on the self-consistent field iteration (SCF). SCF is an iterative method that involves solving a linear eigenvalue problem in each step until convergence or self-consistency. The convergence of SCF and its variants has been studied in a number of works, which can be classified into two broad categories: the optimization-based approach of viewing (1) as the optimality conditions of a minimization problem [8, 18,19,20], or different matrix analysis–based approaches [34, 35]. For a discussion of similarities and differences between the two approaches, see [9]. Strategies for accelerating the convergence of SCF have also been well studied, e.g., [24, 25].
The special case p = 1 has very important applications in quantum physics. Characterization of the ground state of bosons is usually done with the Gross-Pitaevskii equation, see, e.g., references in [1] (and also [16]), whose spatial discretization is of the form (2). Although SCF can be used in this case too, the more common techniques involve discretization of a gradient flow. See [5], and references therein. Also applicable to the Gross-Pitaevskii equation is the result in our paper [16], which has similarities with the current approach in the sense that both use higher order terms (represented with a specific type of Jacobian).
Another class of applications where p = 1 arises is in data science, for example, applications such as spectral clustering which rely on computing eigenpairs of the p-Laplacian [7, 13, 22]. See [4] for a Rayleigh quotient minimization approach for Fisher linear discriminant analysis, which is used in pattern recognition and classification. In [31], the authors propose a new model for the core-periphery detection problem in network science (in the sense of [6]) and show its equivalence to the p = 1 problem.
Our approach is based on flavors of Newton’s method, and there are many other approaches also based on Newton’s method for related problems. A popular technique is based on deriving iterative methods from a geometric perspective [12], often with optimization techniques as in [36]. Our algorithms are not derived from an optimization method, and more importantly, our implicit construction directly provides features which are not directly available from a geometry perspective, e.g., the attractive property illustrated in Section 5.3.
We note that in the context of some applications, the methods of this paper are naturally viewed as the second stage in a two-stage approach outlined as follows:

1. Obtain an approximation of the solution of interest with a method which has attractive global convergence properties.

2. Improve the solution with a fast locally convergent method.
For example, in the context of the Gross-Pitaevskii equation, the popular gradient flow methods [5] often converge to the solution of interest (the ground state) but can be slow if a small step length is required in the time-stepping scheme. A gradient flow method can be used in the first stage, followed by a fast locally convergent method in the second stage to obtain an accurate solution. The method in the second stage can be initialized with the solution from the first stage. Some NEPv problems are easier to solve, in the sense that there exist algorithms based on a companion linearization [10]; such an algorithm is another example of a method to be used in the first stage.
The contributions of the paper can be summarized as follows. In Section 2.1, we introduce the concept of basis invariance. This allows us to derive an alternate characterization of (1) in terms of an associated Jacobian. We introduce our implicit algorithms in Section 3 motivated by this result. Explicit procedures to carry out these algorithms are derived and studied in Section 4. In Section 5, we provide convergence results for these algorithms and Section 6 contains numerical examples, illustrating advantages of our approach.
We will extensively use vectorization and devectorization and introduce the following shorthand. Lowercase letters denote the vectorization of the corresponding capital letters. For example,
For any \(H:\mathbb {R}^{n}\to \mathbb {R}^{n}\), the operator \(\frac {dH}{dv}\) denotes forming the Jacobian of H with respect to v, where v denotes vectors in \(\mathbb {R}^{n}\). Also, \(\left (\frac {dH(v)}{dv}\hat {v}\right )_{\hat {v}=v}\) denotes the directional derivative of H evaluated at v in the direction of v.
Preliminaries
Notion of invariant subspace
In order to appropriately generalize the concept of invariant pairs, we will throughout the paper make the following assumption on A.
Assumption 1 (Basis invariance)
We consider \(A:\mathbb {R}^{n\times p}\to M\) such that it is a function of the outer product of W, i.e.,
for some \(B:M\rightarrow M\). Moreover, we assume that
for any X ∈ M, where \(h:\mathbb {R}\to \mathbb {R}\) denotes the Heaviside function
Note that h is defined in a matrix-function sense, for example, using diagonalization as in [14, Definition 1.2]. Assumption 1 is a generalization of the scaling invariance property for the case p = 1 in [16]. If p = 1 and \(v\in \mathbb {R}^{n}\), then for any \(\alpha \in \mathbb {R}\),
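To make the matrix-function definition concrete, here is a minimal numerical sketch (our own illustration, not from the paper) of evaluating h of a symmetric matrix by diagonalization; the convention h(0) = 0 and the tolerance guarding rounding of zero eigenvalues are assumptions of this sketch:

```python
import numpy as np

def heaviside_matfun(X, tol=1e-12):
    """h(X) in the matrix-function sense: diagonalize X = Q diag(d) Q^T and
    apply h to the eigenvalues (h(t) = 1 for t > tol, else 0; the tolerance
    guards against rounding of the zero eigenvalues)."""
    d, Q = np.linalg.eigh(X)
    return Q @ np.diag((d > tol).astype(float)) @ Q.T

# For orthonormal V, h(V V^T) reproduces the orthogonal projector V V^T,
# since V V^T has eigenvalue 1 with multiplicity p and 0 otherwise.
rng = np.random.default_rng(0)
V = np.linalg.qr(rng.standard_normal((6, 2)))[0]
P = heaviside_matfun(V @ V.T)
print(np.allclose(P, V @ V.T))  # True
```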
Moreover, Assumption 1 implies that A(W) = A(WP) for invertible P, as we show in the following theorem. This is important in our context, since it allows us to interpret the columns of W as a basis of an invariant subspace; A can then be viewed as a function of a subspace, i.e., of a vector space, independent of the choice of basis.
Theorem 1
If A satisfies the basis invariance conditions (3) and (4), then A(W) = A(WP) for any nonsingular matrix \(P \in \mathbb {R}^{p\times p}\) and any \(W\in \mathbb {R}^{n\times p}\) with n ≥ p and rank(W) = p.
Proof
Let W = QU for some invertible matrix U and \(Q\in \mathbb {R}^{n\times p}\) orthogonal. Let V,Λ_{+} be a diagonalization of UU^{T}, i.e., \(UU^{T}=V{\Lambda }_{+}V^{T}\), where V is orthogonal and Λ_{+} is a positive diagonal matrix. Then,
This along with (3) and (4) gives us
If we let W = QR be a QR factorization of W, we see that A(W) = A(Q) with U = R, and A(WP) = A(Q) with U = RP. This shows that A(W) = A(WP), which concludes the proof. □
Since S is symmetric, it can be diagonalized as \(S = Q_{s}{\Lambda }_{s}{Q_{s}^{T}}\) where Q_{s} is orthogonal. Problem (1) can be reformulated using Theorem 1 as
showing that a solution to (1) can be diagonalized.
Example 1 (Transformation to basis invariant form)
The Heaviside function usually does not appear directly in the standard formulation of NEPv applications, e.g., those mentioned in the introduction, but it can be introduced easily. In the context of the self-consistent field iteration for a simplified version of a quantum chemistry problem, we want to solve the equation
where, e.g., H(V ) = H_{0} + diag(V V^{T}), which does not satisfy (3) and (4). This can be transformed to a problem satisfying (3) and (4) by defining
A pair (V,S) is a full rank solution to (7) if and only if it is a solution to (1) with A defined as in (8). However, the similarity transformation of a solution, i.e., (V P,P^{− 1}SP), is a solution to (1), but not to (7).
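As an illustration, one possible reading of such a transformation (the precise form of (8) is not reproduced here, so this is an assumption) is A(W) = H_0 + diag(h(WW^T)). The sketch below, with a randomly generated H_0, checks that this A agrees with H on orthonormal V and is invariant under V → VP:

```python
import numpy as np

def h(X, tol=1e-12):
    # Heaviside matrix function via diagonalization (tol guards rounding)
    d, Q = np.linalg.eigh(X)
    return Q @ np.diag((d > tol).astype(float)) @ Q.T

n, p = 6, 2
rng = np.random.default_rng(1)
H0 = rng.standard_normal((n, n)); H0 = (H0 + H0.T) / 2

def H(V):                 # original formulation, not basis invariant
    return H0 + np.diag(np.diag(V @ V.T))

def A(W):                 # a possible basis-invariant reformulation
    return H0 + np.diag(np.diag(h(W @ W.T)))

V = np.linalg.qr(rng.standard_normal((n, p)))[0]    # orthonormal columns
P = rng.standard_normal((p, p)) + 3 * np.eye(p)     # nonsingular basis change
print(np.allclose(A(V), H(V)))       # agrees with H on orthonormal V
print(np.allclose(A(V @ P), A(V)))   # invariant under V -> V P
```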
Jacobian properties
We will denote the Jacobian as follows, and we directly characterize a theoretical property as a consequence of Assumption 1.
Definition 1 (Left-hand side Jacobian)
The Jacobian of the vectorization of the LHS of (1a) is denoted as \(J:\mathbb {R}^{np}\to \mathbb {R}^{np\times np}\) and given by
In the analysis and derivation of our algorithms, we need an intermediate problem where the orthogonalization is done with respect to a constant matrix C. More precisely, we study the problem:
Equivalence of (10a) and (1) is discussed in Section 3. The vectorized form of (10a) can now be written as
The methods we propose work better for problems where the Jacobian evaluated at the solution is nonsingular. The Jacobian of (11) at the fixed point is given by
As a consequence of Assumption 1, we conclude the following generalization of [16, Lemma 2.1], which shows a relationship between the eigenpairs of J and A. We exploit this relationship later in Sections 3 and 4 where we formulate and derive our algorithms respectively.
Theorem 2 (Eigenproblem equivalence)
For any \(v\in \mathbb {R}^{np}\), we have
Proof
From (9), we have
Interpreting \(\left (\frac {d}{dv}(I_{p}\otimes A(V))\hat {v}\right )_{\hat {v}=v}\) as the directional derivative of I_{p}⊗ A(V) at v in the direction v, we have
This completes the proof. □
Implicit algorithms
Constant orthogonalization Newton’s method
The basis of our derivation is a variation of the nonlinear system (1). The property A(V P) = A(V ) implies that we can view V as a basis of a subspace, and therefore we can consider an equivalent formulation with a different orthogonality condition (10a), where \(C\in \mathbb {R}^{n\times p}\) is any fixed matrix (typically chosen as an approximation of W). The problems (10a) and (1) are equivalent for any W with full column rank, since if we let W^{T}W = R^{T}R be a Cholesky factorization of W^{T}W, we can define V = WR^{− 1} such that V^{T}V = I and A(V ) = A(W). This is a direct consequence of the Grassmann manifold description, as explained in the convergence theory (of SCF) in [3].
The constant orthogonalization formulation (10a) will be used to derive an algorithm. Although (1) and (10a) are equivalent, we use (10a) in the analysis for simplicity. Note that for p > 1, (1) has a continuum of solutions, because V is an orthogonal basis of a subspace and any other orthogonal basis is also a solution (for a different S). The solutions to (10a) are in general isolated.
Standard Newton’s method for the vectorized form (10a) is
where the Δ-matrices are updates, w^{(k+ 1)} = w^{(k)} + Δw^{(k)} and z^{(k+ 1)} = z^{(k)} + Δz^{(k)}, and where
Lemma 1
Let w^{(k)}, z^{(k)}, k = 1,…, be a sequence of vectors that satisfy (13). Then,

(a) C^{T}W^{(k)} = I_{p} for k = 2,3,…

(b) If the Jacobian evaluated in (W_{∗},Z_{∗}) (given by (12)) is invertible, then the convergence is quadratic.
Proof
If C^{T}W^{(k)} = I_{p}, then the second subequation of (13) gives
which proves (a). The statement (b) is a standard result about the convergence of Newton’s method. □
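For orientation, a minimal p = 1 instance of the Newton iteration (13) can be sketched as follows; the test matrix A(v) = A_0 + α diag(v ∘ v)/(vᵀv), the Jacobian `Jfun`, and the vector c are illustrative choices of this sketch, not taken from the paper:

```python
import numpy as np

def newton_nepv(A, Jfun, c, v0, maxit=50, tol=1e-12):
    """Newton's method on the p = 1 instance of (13):
    F(v, lam) = [A(v) v - lam v; c^T v - 1], where Jfun(v) is the
    Jacobian of v -> A(v) v. A sketch for illustration only."""
    n = v0.size
    v = v0 / (c @ v0)                       # enforce c^T v = 1 initially
    lam = (v @ A(v) @ v) / (v @ v)
    for _ in range(maxit):
        F = np.concatenate([A(v) @ v - lam * v, [c @ v - 1.0]])
        if np.linalg.norm(F) < tol:
            break
        top = np.hstack([Jfun(v) - lam * np.eye(n), -v[:, None]])
        bot = np.hstack([c[None, :], [[0.0]]])
        step = np.linalg.solve(np.vstack([top, bot]), -F)
        v, lam = v + step[:n], lam + step[n]
    return v, lam

# Illustrative scaling-invariant test problem (an assumption of this sketch)
n, alpha = 6, 0.3
A0 = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = lambda v: A0 + alpha * np.diag(v * v) / (v @ v)
Jfun = lambda v: A0 + alpha * (3 * np.diag(v * v) / (v @ v)
                               - 2 * np.outer(v**3, v) / (v @ v)**2)

c = np.linalg.eigh(A0)[1][:, 0]            # initial guess: eigenvector of A0
v, lam = newton_nepv(A, Jfun, c, c.copy())
print(np.linalg.norm(A(v) @ v - lam * v))  # small residual
```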
In this work, we consider two modifications of Newton’s method as formulated in (13). Each leads either to a new method (with some attractive properties) or to a well-established method, suggesting that the methods can be viewed as Newton-like methods.

We modify the (1,2) block of the Jacobian to − I_{p} ⊗ W^{(k+ 1)}:
$$ -F^{(k)}=\left[\begin{array}{cc} J(w^{(k)})-{Z^{(k)}}^{T}\otimes I_{n} & -I_{p}\otimes W^{(k+1)}\\ I_{p}\otimes C^{T} & 0 \end{array}\right]\left[\begin{array}{c} {{\varDelta}} w^{(k)}\\ {{\varDelta}} z^{(k)} \end{array}\right] \qquad (14) $$
This is analogous to the modification [15, Equation (1.10)], which directly leads to the method of successive linear problems [27]. With the techniques of the next section, this leads to Algorithm 1.

We modify the (1,1) block of the Newton’s method Jacobian from J(w^{(k)}) to I_{p} ⊗ A(W^{(k)}), in addition to the modification done to obtain Algorithm 1:
$$ -F^{(k)}=\left[\begin{array}{cc} I_{p}\otimes A(W^{(k)})-{Z^{(k)}}^{T}\otimes I_{n} & -I_{p}\otimes W^{(k+1)}\\ I_{p}\otimes C^{T} & 0 \end{array}\right]\left[\begin{array}{c} {{\varDelta}} w^{(k)}\\ {{\varDelta}} z^{(k)} \end{array}\right] \qquad (15) $$
We will show that this leads to Algorithm 2, which is the well-known SCF iteration.
Reformulation for direct computation
Both updates (14) and (15) correspond to implicit methods. We will illustrate several situations where we can generate iterates that satisfy the update equations of the implicit algorithms in an explicit way.
Algorithm 2
Although Algorithm 2 is a modification of Algorithm 1, we start our discussion with Algorithm 2 since it leads to a well-established method. We can obtain Algorithm 2 from (15) by multiplying out the first subequation of (15) as follows.
Cancellation of terms leads to
Devectorizing this system gives the following result.
Theorem 3
Suppose C^{T}W^{(k)} = I_{p}. Then, the pair (W^{(k+ 1)},Z^{(k+ 1)}) satisfies the update (15) if and only if it satisfies
and C^{T}W^{(k+ 1)} = I_{p}.
Equation (17) gives a practical way to compute the iterates from (15). Given W^{(k)}, we compute (W^{(k+ 1)},Z^{(k+ 1)}) as an invariant pair of A(W^{(k)}). This is the well-known SCF algorithm.
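The resulting SCF iteration can be sketched in a few lines; the test problem A(V) = A_0 + α diag(diag(VVᵀ)) and the selection of the p smallest eigenvalues are illustrative assumptions of this sketch:

```python
import numpy as np

def scf(A, p, V0, tol=1e-10, maxit=500):
    """Plain SCF: take V^{(k+1)} as an orthonormal basis of the invariant
    subspace of A(V^{(k)}) belonging to the p smallest eigenvalues."""
    V = V0
    for _ in range(maxit):
        _, Q = np.linalg.eigh(A(V))
        V = Q[:, :p]                    # eigenvectors of the p smallest eigenvalues
        S = V.T @ A(V) @ V              # Rayleigh-quotient matrix of the new iterate
        if np.linalg.norm(A(V) @ V - V @ S) < tol:
            break
    return V, S

# Illustrative test problem (an assumption of this sketch):
# A(V) = A0 + alpha * diag(diag(V V^T)) with A0 the 1D discrete Laplacian
n, p, alpha = 8, 2, 0.1
A0 = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = lambda V: A0 + alpha * np.diag(np.diag(V @ V.T))
rng = np.random.default_rng(0)
V0 = np.linalg.qr(rng.standard_normal((n, p)))[0]
V, S = scf(A, p, V0)
print(np.linalg.norm(A(V) @ V - V @ S))  # small residual of (1a)
```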
Algorithm 1
The first subequation in (14) implies that
From Theorem 2, we have \(J(w^{(k)})w^{(k)} = \left (I_{p}\otimes A(W^{(k)})\right )w^{(k)}\). Using this in (18) leads to the following result.
Theorem 4
Suppose C^{T}W^{(k)} = I_{p}. Then, the pair (W^{(k+ 1)},Z^{(k+ 1)}) satisfies the update (14) if and only if it satisfies
and C^{T}W^{(k+ 1)} = I_{p}.
Since J is not block diagonal, (19) cannot be easily devectorized as was done for (16). For the special case p = 1, we directly identify that (19) reduces to
Similar to (17), (20) is a standard eigenvalue problem, and we can compute the next iterate with a solver for standard eigenvalue problems. It directly suggests that the matrix A(w^{(k)}) in the SCF iteration can be viewed as an approximation of the Jacobian matrix, and that in order to obtain faster convergence it can be better to use J(w^{(k)}), or approximations thereof. This in turn leads to quadratic convergence, and in contrast to Newton’s method, the method converges in one step for a linear problem. It is superior to Newton’s method for problems that are close to being linear, as we prove in Section 5.3.
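A sketch of the resulting p = 1 iteration (20): each step solves a standard eigenvalue problem with J(v^{(k)}) in place of A(v^{(k)}). The scaling-invariant test matrix and the finite-difference Jacobian (standing in for an analytic Jacobian) are assumptions of this illustration:

```python
import numpy as np

def jacobian_fd(F, v, eps=1e-7):
    """Forward-difference approximation of the Jacobian of F at v
    (a stand-in for an analytic J(v))."""
    J = np.zeros((v.size, v.size))
    f0 = F(v)
    for i in range(v.size):
        e = np.zeros(v.size); e[i] = eps
        J[:, i] = (F(v + e) - f0) / eps
    return J

# Illustrative scaling-invariant test problem (an assumption of this sketch)
n, alpha = 6, 0.3
A0 = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = lambda v: A0 + alpha * np.diag(v * v) / (v @ v)   # A(beta v) = A(v)
F = lambda v: A(v) @ v

v = np.ones(n) / np.sqrt(n)
lam = v @ A(v) @ v                                    # Rayleigh quotient target
for _ in range(15):
    d, Q = np.linalg.eig(jacobian_fd(F, v))           # standard eigenvalue problem
    j = np.argmin(np.abs(d - lam))                    # eigenvalue closest to target
    v = np.real(Q[:, j]); v /= np.linalg.norm(v)
    lam = np.real(d[j])

print(np.linalg.norm(A(v) @ v - lam * v))  # small residual
```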
Further implementation aspects
Although the theory above provides us with explicit ways ((19) and (17)) to implement our implicitly formulated methods, they do not automatically enforce the constraint C^{T}W^{(k+ 1)} = I_{p}. To this end, we compute an intermediate eigenpair \(\left (Y,Z\right )\) and add an additional step in our algorithms to enforce orthogonality and compute V^{(k+ 1)}. Note that C can be chosen freely. Without substantial additional computation, we can impose
that is, C = V^{(k+ 1)}. We compute a thin QR factorization of Y to obtain V^{(k+ 1)} and perform a similarity transformation using the R matrix to compute S^{(k+ 1)}. This need not be performed in every step and can be done just once at the end, since only V^{(k+ 1)} is required for the algorithm to execute the next step. We prefer the normalization condition (21) over a constant C because numerical linear algebra folklore tells us that orthogonal matrices have better numerical stability properties. Combining (19) and (17) with this normalization step leads to Algorithm 1 and Algorithm 2, respectively.
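The normalization step can be verified in a few lines: if (Y,Z) is an invariant pair of a matrix M and Y = QR is a thin QR factorization, then (Q, RZR⁻¹) is again an invariant pair, now with an orthonormal basis. A small sketch with a synthetic M (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 7, 3
M = rng.standard_normal((n, n)); M = (M + M.T) / 2

# An (unnormalized) invariant pair (Y, Z): M Y = Y Z
d, Q = np.linalg.eigh(M)
T = rng.standard_normal((p, p)) + 3 * np.eye(p)    # arbitrary change of basis
Y = Q[:, :p] @ T
Z = np.linalg.solve(T, np.diag(d[:p]) @ T)         # then M Y = Y Z

# Normalization step: thin QR of Y, similarity transform of Z
Qy, R = np.linalg.qr(Y)
V = Qy
S = R @ Z @ np.linalg.inv(R)
print(np.allclose(M @ V, V @ S))        # still an invariant pair
print(np.allclose(V.T @ V, np.eye(p)))  # now with orthonormal columns
```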
Remark 1 (Selection of eigenvectors)
The iterates of both algorithms will depend upon which of the p eigenvectors are used to construct V^{(k+ 1)} in each step. The best choice is usually application dependent. For example, in quantum chemistry, one would select the eigenvectors corresponding to the p smallest eigenvalues. This is because only the p smallest eigenvalues are of interest, which correspond to the so-called occupied states [29].
In the case p = 1, specifically when we apply the method to the Gross-Pitaevskii equation, another strategy is natural. Suppose we use the algorithms in the second stage, as part of the two-stage approach (outlined in Section 1), where the first stage is done with a gradient flow method. We can use the eigenvector computed in the first stage as an initial guess for the iteration, and the eigenvalue as a target in the selection procedure. More precisely, if the gradient flow returns x, which is an approximation of the ground state, we can compute the corresponding eigenvalue approximation with the Rayleigh quotient σ = x^{T}A(x)x. In each iteration of Algorithm 1 or Algorithm 2, we can select the eigenvector corresponding to the eigenvalue closest to σ.
Yet another application with p = 1 occurs in hierarchical spectral clustering using the p-Laplacian [7]. However, in contrast to the Gross-Pitaevskii application, one is interested in computing the eigenpair corresponding to the second smallest eigenvalue. By construction, the p-Laplacian always has a zero eigenvalue and hence the strategy in this case is to select the eigenpair of the smallest nonzero eigenvalue.
In this setting, we only provide an exact direct computation formulation for (19) when p = 1, which has many important applications, most notably in computing numerical solutions to the Gross-Pitaevskii equation (see Section 6.2). When p > 1, we are not aware of any exact direct computation formulation, but we illustrate that if we resort to approximate solution methods, we still obtain similar attractive convergence properties. The approximate solution approach is illustrated with a specific example in the simulations section (Section 6.3).
Convergence theory
Local convergence of Algorithm 2
Since Algorithm 2 is equivalent to SCF as shown in Theorem 3, the convergence can be described in the setting of SCF. There has been extensive study of convergence of SCF and its acceleration in the last fifty years. Several results exist in the literature, as mentioned in Section 1. In general, SCF exhibits linear local convergence when it converges. Convergence can be characterized in terms of gaps [35] (see also [34, Theorem 3.1]). Rather than reviewing the details of the convergence results, we refer the reader to the general characterizations, e.g., in [3, 19, 20, 35] and the references therein.
Local convergence of Algorithm 1
Due to our inexact Newton viewpoint, the convergence of Algorithm 1 can be characterized using results in the rich literature on inexact Newton methods. Quadratic local convergence can be proved using theorems in [11].
Theorem 5
Let (W^{(k)}, Z^{(k)}), k = 1,…, be a sequence of pairs satisfying (14). Then, C^{T}W^{(k)} = I_{p} for k = 2,3,… If the sequence converges monotonically to a solution (W_{∗},Z_{∗}) of (10a), and the Jacobian given by (12) is invertible, then it converges with the same convergence order as Newton’s method.
We refer the reader to Appendix A for the proof.
Single step analysis
As we illustrate in the examples, the implicit methods often work well in general, and in particular for close-to-linear problems. This is intuitively natural since both implicit methods converge in one step when applied to a linear problem.
This can be further characterized by considering one step of the method applied to a problem parameterized by a parameter α, where α = 0 corresponds to a linear problem. For this analysis, we consider the model problem
Let \(\left [\begin {array}{c}v_{0}, \lambda _{0} \end {array}\right ]^{T}\) be an initial guess and \(\left [\begin {array}{c}v_{+}, \lambda _{+} \end {array}\right ]^{T}\) be the result of (for the moment) any of the two algorithms. We introduce three functions
where β can be β = ∗, β = A, and β = J. The values of P (where we dropped the parameters for notational convenience) denote the nonlinearity
These functions correspond to the residual for the exact solution (β = ∗), one step of Algorithm 1 (β = J), and one step of Algorithm 2 (β = A). Note that v_{+}(0) = v_{∗}(0) and λ_{+}(0) = λ_{∗}(0), since α = 0 corresponds to a linear eigenvalue problem.
We can apply the implicit function theorem to all three functions, and express the first n + 1 variables in terms of the third variable α in a neighborhood of the solution, if the associated Jacobian is nonsingular. The Jacobian given by (12) is now assumed to be nonsingular at the solution. The exact solution can then be expanded as
whereas both β = A and β = J can be expanded as
where c_{A} = P_{A}v_{∗}(0) and c_{J} = P_{J}v_{∗}(0).
The first terms in the Taylor expansion of the next iterate and the exact iterate as a function of the parameterization of the nonlinearity are equal. Therefore,
meaning that the accuracy of one step is of the order of magnitude of the nonlinear term. Moreover, the coefficient is proportional to ∥K(v_{∗}(0))v_{∗}(0) − P_{β}v_{∗}(0)∥.
Simulations
Scalar nonlinearity
The theory and methods are first illustrated with a reproducible example where p = 1. We consider (2) with
and
and \(\alpha \in \mathbb {R}\). Note that A in (25) satisfies Assumption 1 if we select
for essentially any \(c\in \mathbb {R}^{n}\). This specific example appears in [16, Section 3.3] and J is explicitly given by
We solve four instances of this problem generated by four different values of α, that is, α = 0, 0.5, 1, and 5, corresponding to four different weights of the nonlinear term. To all of these instances, we apply Algorithm 1 (using (20)), Algorithm 2, the J-Inverse iteration (from [16]), and Newton’s method, each with initial guess \(v_{0} = \left (\begin {array}{cccc}1,&1,&1,&1 \end {array}\right )^{T}\). In Fig. 1, we see the error history of all four methods for all four values of α. The error is computed as ∥v_{k} − v_{∗}∥, where v_{∗} is the reference solution.
We observe linear convergence for Algorithm 2 and quadratic convergence for Algorithm 1, as predicted by the theory in Section 5. Both implicit methods are competitive, at least for small values of α. For higher values of α, the number of iterations required to enter the regime of quadratic convergence increases for both Algorithm 1 and Newton’s method. This example illustrates a simple case where Algorithm 1 is a better choice than Newton’s method, although both methods converge quadratically. We observe linear convergence for the J-Inverse iteration, as predicted by [16, Theorem 3.1].
In Fig. 2, we visualize the implications of the theory in Section 5.3 by plotting the single step errors for all four methods. It is clear that the single step error is linear in α, as expected from (23) and (24). The predicted line is plotted using the coefficient ∥C(v_{∗}(0))v_{∗}(0) − P_{β}v_{∗}(0)∥. This illustrates an advantage of the proposed methods for small α.
The implicit algorithms derived in this paper may be used in combination with an additional globalization strategy to make them more robust with respect to the choice of initial guess. For example, in many applications, it is common to use the Armijo rule [2] to find a suitable step size along the update direction computed by a Newtonbased method. Figure 3 is an illustration of how the convergence basin can be enlarged when we use the Armijo rule. More precisely, we apply Algorithm 1 with many different initial guesses which are obtained by perturbing the second and third components (represented by the horizontal and vertical axes of the plots) of the reference eigenvector \(v_{*} \approx \left (\begin {array}{cccc}0.2107& 0.6730& 0.3909& 0.5915 \end {array}\right )^{T}\). The figure contains two separate contour plots of the natural logarithm of the residual after ten iterations. The plot on the left is obtained without using any globalization strategy. The one on the right is obtained by combining Algorithm 1 with one substep that implements the Armijo rule. The addition of the Armijo rule clearly benefits Algorithm 1 by enlarging the set of initial guesses that lead to convergence.
Computing the ground state of bosons
The Gross-Pitaevskii equation (GPE) is a nonlinear PDE obtained by a Hartree-Fock approximation (see [28]) of the Schrödinger equation. It describes the ground state of identical bosons in a quantum system. We consider the case of a rotating Bose-Einstein condensate on the domain \(\mathbb {D}=(-L,L)\times (-L,L)\). In this case, the GPE for the wave function \({{\varPsi }}: \mathbb {R}^{2}\to \mathbb {C}\) under an external potential \(V:\mathbb {R}^{2}\to \mathbb {R}\) is
Here, \(\frac {\partial }{\partial \phi } = y\frac {\partial }{\partial x}-x\frac {\partial }{\partial y}\). The scalar b is a constant indicating the strength of interaction between the bosons, and Ω is the angular velocity of rotation. We choose the boundary condition Ψ(x,y) = 0 for (x,y) ∈ ∂𝔻.
We perform a central difference discretization of (26) using a uniform grid of N + 2 points along each dimension with grid spacing Δx. Details are in [16, Section 5.1]. This leads to a problem of size n = 2N^{2} with
where \(\tilde {A}_{0}\) is the discretization of the linear operator \(-\frac {1}{2}{{\varDelta }}-\mathrm {i}{{\varOmega }}\frac {\partial }{\partial \phi }+V(x,y)\) and γ = b(Δx)^{− 2}. Note that both \(v_{1}, v_{2} \in \mathbb {R}^{N^{2}}\) and v_{1} + iv_{2} gives the vectorization of Ψ evaluated at the interior points. We have
We now apply and compare the performance of Algorithm 1 and the J-Inverse iteration with J as defined by (27).
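For readers who want to experiment, the Kronecker-sum structure of a central-difference discretization of the 2D Laplacian (one building block of the linear part; the rotation term and the potential are omitted in this sketch) can be set up as follows:

```python
import numpy as np

def laplacian_2d(N, L):
    """Central-difference 2D Laplacian on the N x N interior grid of
    (-L, L)^2 with homogeneous Dirichlet boundary conditions."""
    dx = 2.0 * L / (N + 1)
    D = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / dx**2
    return np.kron(np.eye(N), D) + np.kron(D, np.eye(N))  # Kronecker sum

Lap = laplacian_2d(4, 1.0)
print(Lap.shape)                 # (16, 16): the matrix acts on N^2 unknowns
print(np.allclose(Lap, Lap.T))   # symmetric
```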
As seen from Fig. 4, Algorithm 1 converges in far fewer iterations than the J-Inverse iteration. However, since each iteration of Algorithm 1 involves the solution of a linear eigenvalue problem, it performs worse when we measure the cumulative computation time over the iterations, because the J-Inverse iteration only solves a linear system in each step (Fig. 5).
Since one step of both Algorithm 1 and Algorithm 2 requires the solution of a standard eigenvalue problem, we need to select an appropriate eigenpair at each step. Special attention is needed in the selection for this problem, since it has highly clustered eigenvalues. We select a new iterate in a way that minimizes the difference between two iterates. More precisely, we choose δ ∈ (0,1) and select all eigenpairs whose eigenvalues lie within a radius δ of a given target. We then do a least squares fitting to find the linear combination of these eigenvectors which is closest to the previous iterate.
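The selection strategy can be sketched as follows (our own minimal illustration; in practice the target and δ would come from the previous iterate):

```python
import numpy as np

def select_by_least_squares(M, target, delta, v_prev):
    """Among the eigenvectors of M whose eigenvalues lie within radius
    delta of `target`, return the (normalized) linear combination that is
    closest to v_prev in the least-squares sense. Useful when the
    eigenvalues are highly clustered."""
    d, Q = np.linalg.eigh(M)
    mask = np.abs(d - target) < delta
    B = Q[:, mask]                       # basis of the clustered eigenspace
    c, *_ = np.linalg.lstsq(B, v_prev, rcond=None)
    v = B @ c
    return v / np.linalg.norm(v)

# With a multiple (clustered) eigenvalue, picking a single eigenvector is
# ambiguous, but the least-squares fit recovers the previous direction.
M = np.diag([1.0, 1.0, 3.0])
v_prev = np.array([0.6, 0.8, 0.0])
v = select_by_least_squares(M, target=1.0, delta=0.5, v_prev=v_prev)
print(np.allclose(v, v_prev))  # True
```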
Invariant subspace
We consider
where A_{0} is the discrete 1D Laplacian. This problem is a simplified version of problems that frequently occur in electronic structure calculations when we discretize the Hartree-Fock approximation of the Schrödinger equation. See [35] for a discussion of this problem type and convergence results of SCF applied to (28).
If we let e_{i} denote the i th column of the identity matrix I_{n} and \(E_{i,j} = e_{i}{e_{j}^{T}}\), then
The derivation and some computational aspects of J(V ) are contained in the (publicly available) extended technical report [17, Appendix B], which is omitted for brevity.
The implementation of Algorithm 2 follows directly from (17). We also illustrate the importance of the implicit formulation of Algorithm 1 by solving (19) inexactly. We do this with the optimization subroutine fminsearch. This provides a way to test Algorithm 1 for relatively small examples as a proof of concept. We use n = 10 and p = 3, and apply Algorithm 1 and Algorithm 2 for two different values of α.
In Fig. 6, we observe that Algorithm 2 converges linearly, and Algorithm 1 converges much faster, with an initial quadratic phase. The number of iterations required for convergence increases as α increases, as expected from the single step analysis of Section 5.3. The initial quadratic phase is succeeded by an asymptotic slowdown, which can be attributed to the inexact solution of the update (19).
Conclusions and outlook
This paper shows that taking an inexact and implicit Newton approach towards deriving algorithms for problems with eigenvector nonlinearities leads to new algorithmic insights. Using this approach, we derive two algorithms. Algorithm 2 is shown to be the widely used SCF algorithm. This result establishes a previously unknown connection between Newton’s method and the SCF algorithm. Algorithm 1 is, to the best of our knowledge, a new algorithm.
We prove that Algorithm 1 exhibits quadratic local convergence. Both Algorithm 1 and Algorithm 2 have favorable convergence properties for problems that are close to being linear, as shown by the single step analysis of Section 5.3. Numerical simulations for the GrossPitaevskii equation in Section 6.2 show that Algorithm 1 is a competitive algorithm for the p = 1 case. The p > 1 example in Section 6.3 shows that Algorithm 1 converges faster than Algorithm 2 even when we solve the update (19) inexactly.
There are several improvements of the SCF algorithm. Some of these techniques may be interpretable from an implicit viewpoint as well. For instance, acceleration schemes such as DIIS [25] might be seen as inexact Newton algorithms. This could be combined with other convergence theories to gain further understanding of DIIS. Another direction that can be explored is to develop application-specific strategies to solve (19) for p > 1, or approximate solution techniques that lead to superlinear convergence.
References
 1.
Altmann, R., Henning, P., Peterseim, D.: The J-method for the Gross-Pitaevskii eigenvalue problem. Tech. rep., Univ. Augsburg. arXiv:2009.09022 (2019)
 2.
Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math. 16(1), 1–3 (1966). https://doi.org/10.2140/pjm.1966.16.1
 3.
Bai, Z., Li, R.C., Lu, D.: Optimal convergence rate of self-consistent field iteration for solving eigenvector-dependent nonlinear eigenvalue problems. Tech. rep. arXiv:1908.00333 (2020)
 4.
Bai, Z., Lu, D., Vandereycken, B.: Robust Rayleigh quotient minimization and nonlinear eigenvalue problems. SIAM J. Sci. Comput. 40(5), A3495–A3522 (2018). https://doi.org/10.1137/18M1167681
 5.
Bao, W., Du, Q.: Computing the ground state solution of Bose-Einstein condensates by a normalized gradient flow. SIAM J. Sci. Comput. 25(5), 1674–1697 (2004). https://doi.org/10.1137/S1064827503422956
 6.
Borgatti, S.P., Everett, M.G.: Models of core/periphery structures. Soc. Networks 21(4), 375–395 (2000). https://doi.org/10.1016/S0378-8733(99)00019-2
 7.
Bühler, T., Hein, M.: Spectral clustering based on the graph p-Laplacian. In: Proceedings of the 26th International Conference on Machine Learning, pp. 81–88 (2009)
 8.
Cancès, E., Le Bris, C.: On the convergence of SCF algorithms for the Hartree-Fock equations. M2AN, Math. Model. Numer. Anal. 34(4), 749–774 (2000). https://doi.org/10.1051/m2an:2000102
 9.
Cancès, E., Kemlin, G., Levitt, A.: Convergence analysis of direct minimization and self-consistent iterations. Tech. rep. arXiv:2004.09088 (2020)
 10.
Claes, R., Jarlebring, E., Meerbergen, K., Upadhyaya, P.: Linearizability of eigenvector nonlinearities. Tech. rep. arXiv:2105.10361 (2021)
 11.
Dembo, R.S., Eisenstat, S., Steihaug, T.: Inexact Newton methods. SIAM J. Numer. Anal. 19, 400–408 (1982). https://doi.org/10.1137/0719025
 12.
Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)
 13.
Hein, M., Bühler, T.: An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. In: Advances in Neural Information Processing Systems 23, pp. 847–855 (2010)
 14.
Higham, N.: Functions of Matrices. SIAM (2008). https://doi.org/10.1137/1.9780898717778
 15.
Jarlebring, E., Koskela, A., Mele, G.: Disguised and new quasi-Newton methods for nonlinear eigenvalue problems. Numer. Algorithms 79, 331–335 (2018). https://doi.org/10.1007/s11075-017-0438-2
 16.
Jarlebring, E., Kvaal, S., Michiels, W.: An inverse iteration method for eigenvalue problems with eigenvector nonlinearities. SIAM J. Sci. Comput. 36(4), A1978–A2001 (2014). https://doi.org/10.1137/130910014
 17.
Jarlebring, E., Upadhyaya, P.: Implicit algorithms for eigenvector nonlinearities. Tech. rep., KTH Royal Institute of Technology. arXiv:2002.12805 (2020)
 18.
Levitt, A.: Convergence of gradientbased algorithms for the HartreeFock equations. ESAIM Math. Model. Numer. Anal. 46, 1321–1336 (2012). https://doi.org/10.1051/m2an/2012008
 19.
Liu, X., Wang, X., Wen, Z., Yuan, Y.: On the convergence of the self-consistent field iteration in Kohn-Sham density functional theory. SIAM J. Matrix Anal. Appl. 35(2), 546–558 (2014). https://doi.org/10.1137/130911032
 20.
Liu, X., Wen, Z., Wang, X., Ulbrich, M., Yuan, Y.: On the analysis of the discretized Kohn-Sham density functional theory. SIAM J. Numer. Anal. 53(4), 1758–1785 (2015). https://doi.org/10.1137/140957962
 21.
Mehrmann, V., Voss, H.: Nonlinear eigenvalue problems: a challenge for modern eigenvalue methods. GAMM-Mitteilungen 27, 121–152 (2004)
 22.
Mercado, P., Tudisco, F., Hein, M.: Spectral clustering of signed graphs via matrix power means. In: Proceedings of the 36th International Conference on Machine Learning, pp. 4526–4536 (2019)
 23.
Peters, G., Wilkinson, J.: Inverse iterations, ill-conditioned equations and Newton’s method. SIAM Rev. 21, 339–360 (1979). https://doi.org/10.1137/1021052
 24.
Pulay, P.: Convergence acceleration of iterative sequences – the case of SCF iteration. Chem. Phys. Lett. 73(2), 393–398 (1980). https://doi.org/10.1016/0009-2614(80)80396-4
 25.
Rohwedder, T., Schneider, R.: An analysis for the DIIS acceleration method used in quantum chemistry calculations. J. Math. Chem. 49(9), 1889–1914 (2011). https://doi.org/10.1007/s10910-011-9863-y
 26.
Rudin, W.: Principles of Mathematical Analysis, 3rd edn. McGraw-Hill (1976)
 27.
Ruhe, A.: Algorithms for the nonlinear eigenvalue problem. SIAM J. Numer. Anal. 10, 674–689 (1973). https://doi.org/10.1137/0710059
 28.
Saad, Y., Chelikowsky, J. T., Shontz, S. M.: Numerical methods for electronic structure calculations of materials. SIAM Rev. 52(1), 3–54 (2010). https://doi.org/10.1137/060651653
 29.
Szabo, A., Ostlund, N.S.: Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory. Dover Publications (1996)
 30.
Tapia, R. A., Dennis, J. E., Schäfermeyer, J. P.: Inverse, shifted inverse, and Rayleigh quotient iteration as Newton’s method. SIAM Rev. 60(1), 3–55 (2018). https://doi.org/10.1137/15M1049956
 31.
Tudisco, F., Higham, D. J.: A nonlinear spectral method for core–periphery detection in networks. SIAM J. Math. Data. Sci. 1(2), 269–292 (2019). https://doi.org/10.1137/18M1183558
 32.
Unger, G.: Convergence orders of iterative methods for nonlinear eigenvalue problems. Springer, Berlin (2013). https://doi.org/10.1007/978-3-642-30316-6_10
 33.
Unger, H.: Nichtlineare Behandlung von Eigenwertaufgaben. Z. Angew. Math. Mech. 30, 281–282 (1950). https://doi.org/10.1002/zamm.19500300839. English translation: http://www.math.tu-dresden.de/~schwetli/Unger.html
 34.
Upadhyaya, P., Jarlebring, E., Rubensson, E.H.: A density matrix approach to the convergence of the self-consistent field iteration. Numer. Algebra Control Optim., accepted for publication (2020)
 35.
Yang, C., Gao, W., Meza, J.C.: On the convergence of the self-consistent field iteration for a class of nonlinear eigenvalue problems. SIAM J. Matrix Anal. Appl. 30(4), 1773–1788 (2009). https://doi.org/10.1137/080716293
 36.
Zhao, Z., Bai, Z.J., Jin, X.Q.: A Riemannian Newton algorithm for nonlinear eigenvalue problems. SIAM J. Matrix Anal. Appl. 36(2), 752–774 (2015)
Acknowledgements
We thank Prof. Daniel Kressner for comments on a preliminary version of this manuscript.
Funding
Open access funding provided by Royal Institute of Technology.
Appendix: Proof of Theorem 5
Proof
Let \(F^{\prime }_{k}\) be the Jacobian of F evaluated at iterate k. In the notation of [11], we introduce a residual \(r_{k}\), corresponding to the difference between a Newton step and an inexact Newton step:
Then, subtracting (14) from (29), the residual becomes
Using the Cauchy-Schwarz inequality, we have
By the assumption of monotonic convergence, we have
From the implicit function theorem (e.g., the formulation in [26, Theorem 9.28]) and the assumption about the invertibility of the Jacobian at the solution, we get that
where \(F_{*}^{\prime }\) is the Jacobian of F evaluated at the solution. Combining (30), (31), and (32) leads to
By [11, Theorem 3.3], the proof is complete. □
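The inexact Newton framework of [11] underlying this proof can be sketched in code. The following is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes a generic differentiable F with invertible Jacobian near the root, and satisfies the Dembo-Eisenstat-Steihaug residual condition \(\|F^{\prime}(x_k)s_k + F(x_k)\| \le \eta \|F(x_k)\|\) trivially by an exact linear solve; all names are illustrative.

```python
import numpy as np

def inexact_newton(F, Fprime, x0, eta=0.5, tol=1e-12, maxit=50):
    # Inexact Newton in the sense of [11]: each step s_k need only satisfy
    # the residual condition ||F'(x_k) s_k + F(x_k)|| <= eta * ||F(x_k)||.
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(maxit):
        fx = F(x)
        if np.linalg.norm(fx) < tol:
            break
        # An exact solve gives r_k = 0, which satisfies the condition;
        # a truncated Krylov solve with relative tolerance eta would too.
        s = np.linalg.solve(Fprime(x), -fx)
        x = x + s
    return x
```

By [11, Theorem 3.3], a forcing term bounded away from 1 gives local linear convergence, and driving it to zero gives superlinear convergence; this is the mechanism exploited in the proof above.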
Cite this article
Jarlebring, E., Upadhyaya, P.: Implicit algorithms for eigenvector nonlinearities. Numer. Algorithms (2021). https://doi.org/10.1007/s11075-021-01189-4
Keywords
 Eigenvector nonlinearity
 Inexact Newton
 Implicit Newton
 SCF