Rational Krylov methods for fractional diffusion problems on graphs

In this paper we propose a method to compute the solution to the fractional diffusion equation on directed networks, which can be expressed in terms of the graph Laplacian $L$ as a product $f(L^T) \boldsymbol{b}$, where $f$ is a non-analytic function involving fractional powers and $\boldsymbol{b}$ is a given vector. The graph Laplacian is a singular matrix, causing Krylov methods for $f(L^T) \boldsymbol{b}$ to converge more slowly. In order to overcome this difficulty and achieve faster convergence, we use rational Krylov methods applied to a desingularized version of the graph Laplacian, obtained with either a rank-one shift or a projection on a subspace.

1. Introduction. The use of graph models to represent complex structures is extremely widespread, ranging from real and digital social networks, to transportation networks, to networks of chemical reactions, and many others. In applications it is often of interest to study diffusion processes taking place on a network, which evolve in time according to the distribution of edges in the underlying graph. Such a process can be described by a system of ordinary differential equations in time, with the Laplacian matrix $L$ of the graph as the coefficient matrix, and its solution at time $t$ can be written as $\boldsymbol{u}(t) = \exp(-tL^T)\boldsymbol{u}(0)$. An expression of this form can be computed efficiently using a polynomial Krylov method.
While a diffusion process is a local phenomenon, there are certain phenomena that allow long-range interactions and are non-local in nature: in the continuous setting, phenomena of this kind have been effectively modelled using fractional powers of the Laplace differential operator, that is (−∆) α for α ∈ [1/2, 1) (see, e.g., [19,Definition 2]). Analogously, in the context of directed graphs, these phenomena can be well described with a fractional diffusion model, which employs a fractional power of the graph Laplacian instead of the Laplacian itself. Unlike for the continuous Laplace operator, in the discrete case there is no need to restrict the values of α to the interval [1/2, 1), and we can use any exponent α ∈ (0, 1). This modelling approach has been recently studied in [27] in the undirected case, and in [2] for more general directed graphs; we refer to the book [24] for more information on fractional dynamics. We mention that there are alternative approaches for modelling nonlocal phenomena on graphs, e.g. the one based on the transformed k-path Laplacian presented in [12,13]: this model shares some properties with fractional diffusion, such as superdiffusive behaviour on infinite one-dimensional graphs.
In this paper we focus on the computational aspect of fractional dynamics, in particular on the efficient computation of the solution to the fractional diffusion equation on both undirected and directed graphs using rational Krylov methods. The solution at time t ≥ 0 can be expressed as u(t) = f (L T )u(0), where f (z) = exp(−tz α ) for α ∈ (0, 1). The function f has a branch cut on the negative real axis (−∞, 0], and hence it is not analytic in a neighbourhood of the spectrum of the graph Laplacian L, which is a singular M -matrix. The lack of a region of analyticity around the spectrum of L causes the error bounds for Krylov methods based on the spectrum and the field of values to be unusable, and in practice this also negatively affects the actual convergence rate of these methods. Moreover, since f is not analytic, it is preferable to approximate f (L T )u(0) using rational Krylov methods instead of polynomial methods. In our experiments, in addition to the polynomial Krylov method, we investigate the performance of the Shift-and-Invert Krylov method with two different shifts, and of a rational Krylov method with asymptotically optimal poles presented in [22] for Laplace-Stieltjes functions.
To improve convergence, we propose to remove the singularity of the graph Laplacian with either a rank-one shift or a projection on a subspace. This enables us to use any rational Krylov method on the (now nonsingular) modified Laplacian, where it exhibits faster convergence, and then recover the solution to the original problem at little additional cost. We also show that the improved convergence of the projection method can be achieved without explicitly projecting the graph Laplacian, by suitably manipulating the initial vector (see subsection 5.2.2). These techniques can be applied with no preliminary computations to any undirected graph, and for a general digraph they only require the solution of the singular linear system $L^T \boldsymbol{z} = \boldsymbol{0}$.
The paper is organized as follows. First, in section 2 we provide the necessary background and notation on graphs, and we introduce the graph Laplacian $L$. In section 3 we define the fractional powers $L^\alpha$, $\alpha \in (0, 1)$, and we briefly discuss the properties of the fractional Laplacian. In section 4 we introduce rational Krylov methods for the computation of matrix functions, and in section 5 we present some techniques to remove the singularity of the graph Laplacian. In section 6 we conduct some numerical experiments on real-world networks to compare the performance of different Krylov methods and demonstrate the effectiveness of the desingularization techniques proposed in section 5.
2. Background and notation regarding graphs. A directed graph, or digraph, is a pair $G = (V, E)$, where $V = \{v_1, \dots, v_n\}$ is a set of $n$ nodes or vertices, and $E \subseteq V \times V$ is the set of edges. A digraph can be represented with its adjacency matrix $A$, an $n \times n$ matrix whose entries are
$$A_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E, \\ 0 & \text{otherwise.} \end{cases}$$
The out-degree $d_i$ of a node $i$ is defined as the number of edges going out from $i$, i.e. edges of the form $(i, \ell) \in E$ for some node $\ell \in V$. The vector $\boldsymbol{d}$ of out-degrees can be computed as $\boldsymbol{d} = A\mathbf{1}$, where $\mathbf{1}$ denotes the all-ones vector of size $n$. If we denote by $D = \operatorname{diag}(\boldsymbol{d})$ the diagonal matrix of out-degrees, the (out-degree) graph Laplacian of $G$ is defined as $L = D - A$.
Note that one can also define the vector of in-degrees as $\boldsymbol{d}_{\mathrm{in}} = A^T \mathbf{1}$, and the in-degree graph Laplacian as $L_{\mathrm{in}} = \operatorname{diag}(\boldsymbol{d}_{\mathrm{in}}) - A$. Here, we focus solely on the out-degree graph Laplacian $L$, and we refer to it as the graph Laplacian whenever there is no ambiguity. Most of the properties of $L$ also hold for $L_{\mathrm{in}}$, and what we say for the out-degree graph Laplacian can be extended to the case of the in-degree Laplacian with only minor adjustments.
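As a small illustration of these definitions, the following Python sketch builds the adjacency matrix and both Laplacians from an edge list. It assumes NumPy and uses 0-based node labels; the function name `graph_laplacians` and the 4-node example graph are hypothetical choices made only for this example.

```python
import numpy as np

def graph_laplacians(n, edges):
    """Return (A, L_out, L_in) for an unweighted digraph with nodes 0..n-1."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = 1.0              # edge i -> j
    ones = np.ones(n)
    d_out = A @ ones               # out-degrees (row sums of A)
    d_in = A.T @ ones              # in-degrees (column sums of A)
    L_out = np.diag(d_out) - A     # out-degree Laplacian L = D - A
    L_in = np.diag(d_in) - A       # in-degree Laplacian
    return A, L_out, L_in

# Hypothetical strongly connected digraph: a 4-cycle plus the chord 0 -> 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
A, L_out, L_in = graph_laplacians(4, edges)

print(L_out @ np.ones(4))   # L 1 = 0
print(np.ones(4) @ L_in)    # 1^T L_in = 0^T
```

The out-degree Laplacian has zero row sums, while the in-degree Laplacian has zero column sums, which is the main difference to keep in mind when adapting results from one to the other.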
One can also consider weighted graphs, where each edge $(i, j) \in E$ is associated with a positive weight $w_{ij}$, representing the strength of the connection between nodes $i$ and $j$; if $(i, j) \notin E$, we write $w_{ij} = 0$. The matrix $W = (w_{ij})$ is a weighted adjacency matrix associated with $G$, and it can be used to define a weighted vector of out-degrees $\boldsymbol{d}_W = W\mathbf{1}$ and a weighted graph Laplacian $L_W = \operatorname{diag}(\boldsymbol{d}_W) - W$. In this paper we only consider unweighted graphs for simplicity, but the techniques that we propose can be applied in the same way to weighted graphs. We also mainly focus on strongly connected graphs, i.e., graphs in which there exists a directed path from node $i$ to node $j$ for any pair of nodes $(i, j)$. Recall that a graph is strongly connected if and only if its adjacency matrix $A$ (and hence also $L$) is irreducible, i.e., there exists no permutation matrix $P$ such that $P^T A P$ is block triangular.
2.1. Properties of the graph Laplacian. Here we briefly discuss some properties of the graph Laplacian L, which later will be used in the definition of its fractional powers. We also introduce the classical diffusion equation on graphs.
It follows from its definition that the graph Laplacian $L$ is a singular matrix: indeed, $L\mathbf{1} = \boldsymbol{0}$. More specifically, the graph Laplacian is a singular M-matrix.
One can easily prove the following basic result.
Proposition 2.2. The graph Laplacian $L$ of a digraph $G$ has the following properties:
• $L$ is a singular M-matrix.
• The nonzero eigenvalues of L have positive real part.
• 0 is a semisimple eigenvalue of L, i.e. its algebraic multiplicity and geometric multiplicity are the same.
These properties are fundamental for being able to define fractional powers of L, as we will see shortly.
The graph Laplacian is used as the coefficient matrix in the diffusion equation on the graph $G$. Denote by $\boldsymbol{u}(t) \in \mathbb{R}^n$ a vector of concentrations at time $t$ of a substance that is diffusing on the graph. Up to normalization, we can assume that $\boldsymbol{u}(t)$ is a probability vector, i.e. that $\boldsymbol{u}(t) \ge 0$ and $\boldsymbol{u}(t)^T \mathbf{1} = 1$. The diffusion equation on a directed graph reads
$$\frac{d}{dt}\boldsymbol{u}(t)^T = -\boldsymbol{u}(t)^T L, \qquad \boldsymbol{u}(0) = \boldsymbol{u}_0, \tag{2.1}$$
and the solution to this system of ordinary differential equations can be explicitly stated in terms of the matrix exponential,
$$\boldsymbol{u}(t)^T = \boldsymbol{u}_0^T e^{-tL}, \qquad \text{i.e.} \quad \boldsymbol{u}(t) = e^{-tL^T}\boldsymbol{u}_0.$$
Using properties of M-matrices, one can easily prove that $e^{-tL}$ is a stochastic matrix, i.e. that it has nonnegative entries and $e^{-tL}\mathbf{1} = \mathbf{1}$, and hence the solution $\boldsymbol{u}(t)$ is a probability vector at all times $t \ge 0$. Note that this property would not be preserved if we used column vectors instead of row vectors in (2.1): see, e.g., the discussion in [8].
3. The fractional graph Laplacian. In this section, we recall the general definition of a matrix function in terms of the Jordan canonical form, following [17, Section 1.2], and we use it to define the fractional graph Laplacian and the related fractional diffusion process.
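Recall that any matrix $A \in \mathbb{C}^{n \times n}$ can be expressed in the Jordan canonical form as
$$Z^{-1} A Z = J = \operatorname{diag}(J_1, \dots, J_p), \qquad J_k = J_k(\lambda_k) = \begin{bmatrix} \lambda_k & 1 & & \\ & \lambda_k & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_k \end{bmatrix} \in \mathbb{C}^{m_k \times m_k}, \tag{3.1}$$
where $Z$ is nonsingular and $m_1 + m_2 + \dots + m_p = n$. An eigenvalue $\lambda$ is semisimple if and only if all the Jordan blocks associated to $\lambda$ are $1 \times 1$. We have the following definition.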
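Definition 3.1. The function $f$ is said to be defined on the spectrum of $A$ if the values
$$f^{(j)}(\lambda_i), \qquad j = 0, \dots, n_i - 1, \quad i = 1, \dots, s,$$
exist, where $\lambda_1, \dots, \lambda_s$ are the distinct eigenvalues of $A$ and $n_i$ is the size of the largest Jordan block associated to $\lambda_i$.
Provided that $f$ is defined on the spectrum of $A$, the matrix function $f(A)$ can be defined for any matrix using the Jordan canonical form.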
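Definition 3.2. Let $f$ be defined on the spectrum of $A \in \mathbb{C}^{n \times n}$, and let $A$ have the Jordan canonical form (3.1). Then we define
$$f(A) := Z f(J) Z^{-1} = Z \operatorname{diag}\bigl(f(J_1), \dots, f(J_p)\bigr) Z^{-1}, \qquad f(J_k) := \begin{bmatrix} f(\lambda_k) & f'(\lambda_k) & \cdots & \frac{f^{(m_k-1)}(\lambda_k)}{(m_k-1)!} \\ & f(\lambda_k) & \ddots & \vdots \\ & & \ddots & f'(\lambda_k) \\ & & & f(\lambda_k) \end{bmatrix}.$$
By Proposition 2.2, the function $f(z) = z^\alpha$, $\alpha \in (0, 1)$, is defined on the spectrum of the graph Laplacian $L$, since the eigenvalue 0 is semisimple and all other eigenvalues are in the right half-plane. Here we denote by $z^\alpha$ the branch of the fractional power with a cut on the negative real axis $(-\infty, 0]$, i.e. if $z = \rho e^{i\theta}$, with $\rho > 0$ and $\theta \in (-\pi, \pi)$, then $z^\alpha = \rho^\alpha e^{i\alpha\theta}$. With the above definition, the fractional Laplacian $L^\alpha$ is still an M-matrix. Indeed, we have the following result.
Moreover, since $L^\alpha \mathbf{1} = \boldsymbol{0}$, we can interpret the fractional graph Laplacian as the Laplacian of a weighted graph on the same set of nodes as $G$, and we can use it to define a fractional diffusion process on $G$, with a system of differential equations analogous to (2.1):
$$\frac{d}{dt}\boldsymbol{u}(t)^T = -\boldsymbol{u}(t)^T L^\alpha, \qquad \boldsymbol{u}(0) = \boldsymbol{u}_0. \tag{3.2}$$
The solution of this system can be explicitly written in the form
$$\boldsymbol{u}(t)^T = \boldsymbol{u}_0^T e^{-tL^\alpha}, \qquad \text{i.e.} \quad \boldsymbol{u}(t) = e^{-t(L^T)^\alpha}\boldsymbol{u}_0.$$
As in the case of classical diffusion, the solution $\boldsymbol{u}(t)$ to (3.2) is a probability vector at all times $t \ge 0$.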
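For a small graph, the definitions above can be checked numerically. The following Python sketch evaluates $L^\alpha$ and $e^{-tL^\alpha}$ through an eigendecomposition, which is a simplification of the Jordan-form definition that assumes $L$ is diagonalizable; it is only meant for tiny dense examples and is not the Krylov approach studied in this paper. NumPy, the helper name `matrix_function` and the 4-node example graph are assumptions of this illustration.

```python
import numpy as np

def matrix_function(M, f, tol=1e-12):
    """f(M) via eigendecomposition; a sketch that assumes M is diagonalizable."""
    lam, Z = np.linalg.eig(M)
    lam = lam.astype(complex)
    lam[np.abs(lam) < tol] = 0.0          # snap the numerically zero eigenvalue to 0
    F = (Z * f(lam)) @ np.linalg.inv(Z)   # Z diag(f(lam)) Z^{-1}
    return F.real                         # imaginary part is rounding noise for real M

# Hypothetical strongly connected digraph: a 4-cycle plus the chord 0 -> 2.
n = 4
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]:
    A[i, j] = 1.0
L = np.diag(A.sum(axis=1)) - A

alpha, t = 0.5, 1.0
L_alpha = matrix_function(L, lambda z: z**alpha)            # fractional Laplacian
P = matrix_function(L, lambda z: np.exp(-t * z**alpha))     # e^{-t L^alpha}

u0 = np.array([1.0, 0.0, 0.0, 0.0])   # initial probability vector
u_t = P.T @ u0                         # u(t) = e^{-t (L^T)^alpha} u0
print(u_t, u_t.sum())
```

NumPy's complex power uses the principal branch, which matches the branch cut on $(-\infty, 0]$ chosen above. Snapping the computed zero eigenvalue to exactly 0 avoids amplifying rounding errors through the non-smooth behaviour of $z^\alpha$ at the origin.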

4. Rational Krylov methods.
In this section we briefly introduce rational Krylov methods for the computation of expressions of the form $f(A)\boldsymbol{b}$, with the goal of applying them to the computation of the solution to the fractional diffusion equation (3.2), which can be expressed in the form $\boldsymbol{u}(t) = f(L^T)\boldsymbol{u}_0$, where $f(z) = e^{-tz^\alpha}$. For a more extensive treatment of rational Krylov methods, including the problem of the selection of poles, we refer, e.g., to [15,16].
In many applications it is required to compute the product $f(A)\boldsymbol{b}$, where $A$ is a large and sparse matrix. In these cases, computing the whole matrix $f(A)$ first and then forming the product $f(A)\boldsymbol{b}$ is extremely expensive and often infeasible; moreover, the matrix function $f(A)$ is generally dense even when the original matrix $A$ is sparse, making the full computation of $f(A)$ costly also in terms of storage for large scale problems. A rational Krylov method overcomes these difficulties by directly approximating the product $f(A)\boldsymbol{b}$ using a low-dimensional search space, without explicitly computing $f(A)$. Each iteration of a rational Krylov method requires the solution of a shifted linear system involving $A$, which makes the iterations more expensive than those of a polynomial Krylov method, where only a matrix-vector product with $A$ is needed at each iteration. However, the increased cost per iteration of a rational method is offset by the often superior approximation properties of rational functions, compared to polynomial approximation, especially for functions that are not analytic.
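For any $k \ge 1$, define the rational Krylov subspace of order $k$ associated to $A$ and $\boldsymbol{b}$ as
$$\mathcal{Q}_k(A, \boldsymbol{b}) := q_{k-1}(A)^{-1} \mathcal{K}_k(A, \boldsymbol{b}), \qquad q_{k-1}(z) := \prod_{j=1}^{k-1} (1 - z/\xi_j),$$
where $(\xi_j)_{j \ge 1}$ is a sequence of nonzero poles in $(\mathbb{C} \cup \{\infty\}) \setminus \sigma(A)$. If all poles are equal to $\infty$, the rational Krylov subspace $\mathcal{Q}_k(A, \boldsymbol{b})$ reduces to the polynomial Krylov subspace
$$\mathcal{K}_k(A, \boldsymbol{b}) := \operatorname{span}\{\boldsymbol{b}, A\boldsymbol{b}, \dots, A^{k-1}\boldsymbol{b}\}.$$
The rational Krylov subspaces $\mathcal{Q}_k(A, \boldsymbol{b})$ form a sequence of nested subspaces, each of dimension $k$, as long as $k \le K$, where $K$ is the invariance index of the sequence, i.e. the smallest index such that $\mathcal{Q}_K(A, \boldsymbol{b}) = \mathcal{Q}_{K+1}(A, \boldsymbol{b})$. Generally it is assumed that $k \le K$. If this is the case, we can compute an orthonormal basis of $\mathcal{Q}_k(A, \boldsymbol{b})$ using Ruhe's rational Arnoldi algorithm [28], which is summarized in Algorithm 4.1. The first basis vector is chosen as $\boldsymbol{v}_1 = \boldsymbol{b}/\|\boldsymbol{b}\|_2$. As is customary, we assume that the last computed basis vector $\boldsymbol{v}_j$ is a continuation vector, and we compute $\boldsymbol{v}_{j+1}$ by orthonormalizing $\boldsymbol{w}_j = (I - A/\xi_j)^{-1} A \boldsymbol{v}_j$ against all the previous basis vectors.

Algorithm 4.1 Rational Arnoldi algorithm
Input: $A$, $\boldsymbol{b}$, poles $\xi_1, \dots, \xi_{k-1}$
Output: orthonormal basis $V_k = [\boldsymbol{v}_1 \dots \boldsymbol{v}_k]$ of $\mathcal{Q}_k(A, \boldsymbol{b})$
  $\boldsymbol{v}_1 = \boldsymbol{b}/\|\boldsymbol{b}\|_2$
  for $j = 1, \dots, k-1$ do
    $\boldsymbol{w}_j = (I - A/\xi_j)^{-1} A \boldsymbol{v}_j$
    for $i = 1, \dots, j$ do
      $h_{i,j} = \boldsymbol{v}_i^T \boldsymbol{w}_j$, $\boldsymbol{w}_j = \boldsymbol{w}_j - h_{i,j} \boldsymbol{v}_i$
    end for
    $h_{j+1,j} = \|\boldsymbol{w}_j\|_2$
    if $h_{j+1,j} = 0$ then stop
    $\boldsymbol{v}_{j+1} = \boldsymbol{w}_j / h_{j+1,j}$
  end for

For the solution of the fractional diffusion problem, we are going to use real poles (in particular, located on $(-\infty, 0)$), so the matrices $V_k$ are going to be real. When the sequence of poles $(\xi_k)_{k \ge 1}$ consists of a single repeated pole $\xi$, the rational Krylov subspaces that are generated are known as Shift-and-Invert Krylov subspaces, and they can be written more simply as
$$\mathcal{Q}_k(A, \boldsymbol{b}) = \mathcal{K}_k\bigl((A - \xi I)^{-1}, \boldsymbol{b}\bigr) = \operatorname{span}\{\boldsymbol{b}, (A - \xi I)^{-1}\boldsymbol{b}, \dots, (A - \xi I)^{-(k-1)}\boldsymbol{b}\},$$
i.e. it corresponds to a polynomial Krylov subspace relative to the matrix $(A - \xi I)^{-1}$.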
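The rational Arnoldi iteration described above can be sketched in a few lines of Python. This is an illustrative dense implementation under simplifying assumptions: NumPy only, real poles, a fresh dense solve at every step, and a symmetric test matrix so that the projected function can be evaluated with `eigh`. The function names `rational_arnoldi` and `krylov_fAb`, the path-graph test matrix and the single repeated pole $\xi = -1$ are choices made for this example only.

```python
import numpy as np

def rational_arnoldi(A, b, poles):
    """Orthonormal basis of Q_k(A, b) with k = len(poles) + 1.

    The last basis vector is used as continuation vector and
    w = (I - A/xi)^{-1} A v is orthonormalized; xi = np.inf gives w = A v.
    """
    n = b.size
    V = np.zeros((n, len(poles) + 1))
    V[:, 0] = b / np.linalg.norm(b)
    I = np.eye(n)
    for j, xi in enumerate(poles):
        w = A @ V[:, j]
        if np.isfinite(xi):
            w = np.linalg.solve(I - A / xi, w)
        for _ in range(2):                       # Gram-Schmidt, repeated once for stability
            w -= V[:, : j + 1] @ (V[:, : j + 1].T @ w)
        h = np.linalg.norm(w)
        if h < 1e-12:                            # breakdown: invariant subspace reached
            return V[:, : j + 1]
        V[:, j + 1] = w / h
    return V

def krylov_fAb(A, b, poles, f):
    """Approximate f(A) b by V f(V^T A V) V^T b, for symmetric A."""
    V = rational_arnoldi(A, b, poles)
    lam, X = np.linalg.eigh(V.T @ A @ V)
    return V @ (X @ (f(lam) * (X.T @ (V.T @ b))))

# Laplacian of an undirected path graph on 8 nodes (symmetric and singular).
n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0
b = np.zeros(n)
b[0] = 1.0

alpha, t = 0.5, 1.0
f = lambda z: np.exp(-t * np.maximum(z, 0.0) ** alpha)   # clip rounding-level negatives

lam, X = np.linalg.eigh(A)
exact = X @ (f(lam) * (X.T @ b))

errors = {}
for k in (3, 5, 8):
    approx = krylov_fAb(A, b, [-1.0] * (k - 1), f)       # Shift-and-Invert, pole xi = -1
    errors[k] = np.linalg.norm(approx - exact)
    print(k, errors[k])
```

With $k = n$ the subspace is the whole space, so the approximation agrees with the reference up to rounding; for smaller $k$ the printed errors only indicate the behaviour on this toy problem. In a large-scale setting one would reuse a sparse factorization of $A - \xi I$ instead of calling a dense solver at each step.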
The Shift-and-Invert method was first investigated for the approximation of matrix functions in [25,32]. Even though this type of Krylov subspace sacrifices some flexibility in the choice of the poles, it is appealing because at each iteration we are required to solve a linear system with the same matrix $(A - \xi I)$; this allows us, for instance, to compute an LU factorization of $(A - \xi I)$ only once, and then reuse it at each iteration to solve the linear systems at a reduced cost. Therefore, although a Shift-and-Invert method will typically require more iterations to converge compared to a rational Krylov method with a carefully chosen sequence of poles, it can still be competitive in terms of execution time.
The field of values $W(A) = \{\boldsymbol{x}^H A \boldsymbol{x} : \boldsymbol{x} \in \mathbb{C}^n, \|\boldsymbol{x}\|_2 = 1\}$, also known as numerical range, is a convex and compact set which contains the spectrum $\sigma(A)$, and it reduces to the convex hull of $\sigma(A)$ when $A$ is a normal matrix.
The following theorem by Crouzeix and Palencia provides a bound for the norm of a matrix function using the field of values W(A).
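Theorem 4.1. Let $f$ be analytic in a neighbourhood of $W(A)$. Then it holds
$$\|f(A)\|_2 \le (1 + \sqrt{2})\, \|f\|_{W(A)}, \qquad \|f\|_{W(A)} := \max_{z \in W(A)} |f(z)|.$$
A conjecture by Crouzeix states that the inequality $\|f(A)\|_2 \le C \|f\|_{W(A)}$ holds with $C = 2$ for any matrix $A$. With Theorem 4.1 one can prove the following.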
where Π k−1 denotes the set of polynomials of degree ≤ k − 1.
The bound given by Proposition 4.2 decays rapidly to zero when $f$ is an entire function (e.g., $f(z) = e^z$), or if it has a large region of analyticity surrounding the field of values of $A$. Unfortunately, in the case of fractional diffusion on graphs, the eigenvalue 0 of the Laplacian is located on the boundary of the region of analyticity of $f$. Moreover, for most directed graphs the field of values of the Laplacian intersects the negative real axis $(-\infty, 0)$, preventing us from using convergence results based on the field of values. The presence of an eigenvalue at 0 can also be detrimental in practice for the convergence of Krylov methods. With this motivation in mind, in section 5 we propose some techniques to remove the singularity of the graph Laplacian, in order to work with nonsingular matrices and improve the convergence of Krylov methods.

Laplace-Stieltjes functions.
The problem of the selection of poles for rational Krylov methods is a highly active area of research, and many different choices have been proposed in the literature, depending on the function $f$ and on the spectrum of $A$ (see, for instance, [16]). Most of the existing analysis deals with real symmetric matrices, since in that case the field of values is reduced to an interval on the real line, and hence the minimization problem (4.2) becomes easier to handle.
A pole selection strategy for the evaluation of f (A)b was recently proposed in [22] for the case of a Hermitian positive definite matrix A and a Cauchy-Stieltjes or Laplace-Stieltjes function f . For a matrix A with spectrum contained in the positive interval [a, b], the choice of poles described in [22] gives after k iterations an error . However, the poles that satisfy the error bound for iteration k + 1 are not obtained by adding a new pole to the ones of iteration k, so in order to effectively use this pole selection strategy one would need to decide a priori the number of iterations to be performed. In order to overcome this drawback, in [22,Section 3.5] the authors use the method of equidistributed sequences (EDS) to construct an infinite sequence of poles with the same asymptotic rate of convergence, that can be more easily used in practice. For the details on the construction of this pole sequence, we refer to the discussion in [22,Section 3.5].
In this section we observe that the function f (z) = e −tz α is a Laplace-Stieltjes function, and hence we can use the pole sequence proposed in [22] for the fractional diffusion problem (3.2). Even though there are no guarantees on the effectiveness of this pole sequence for general matrices, in our numerical experiments we observed that it provides a good convergence rate even when A is the (singular and nonsymmetric) Laplacian of a directed graph.
The class of Laplace-Stieltjes functions coincides with the class of completely monotonic functions, i.e. infinitely differentiable functions $f$ defined on $(0, \infty)$ such that
$$(-1)^k f^{(k)}(z) \ge 0 \qquad \text{for all } z > 0 \text{ and } k = 0, 1, 2, \dots \tag{4.4}$$
The equivalence between these two classes of functions is known as Bernstein's theorem [29,Theorem 1.4]. A class of functions which is closely related to completely monotonic functions is the class of Bernstein functions, that consists of all functions $f : (0, \infty) \to \mathbb{R}$ of class $C^\infty$ such that
$$f(z) \ge 0 \quad \text{and} \quad (-1)^{k-1} f^{(k)}(z) \ge 0 \qquad \text{for all } z > 0 \text{ and } k = 1, 2, \dots$$
Observe that a nonnegative function $f : (0, \infty) \to \mathbb{R}$ is a Bernstein function if and only if $f'$ is a completely monotonic function. The fractional power $f(z) = z^\alpha$, for $\alpha \in (0, 1)$, is an example of a Bernstein function. By [29,Theorem 3.7], if $f$ is a positive Bernstein function, then the function $g(z) = e^{-tf(z)}$ is completely monotonic for all $t > 0$. This proves that $g(z) = e^{-tz^\alpha}$ is a completely monotonic (equivalently, Laplace-Stieltjes) function for all $t > 0$ and $\alpha \in (0, 1)$. This fact can also be easily proved directly by computing the derivatives of $g$ and checking that condition (4.4) is verified.
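To make the direct verification concrete, the first instances of the sign condition for $g(z) = e^{-tz^\alpha}$ can be written out explicitly. The block below is a worked computation for the orders $k = 0, 1, 2$ only; it illustrates the sign pattern and is not a full proof for all orders.

```latex
% For g(z) = e^{-t z^\alpha}, with t > 0, 0 < \alpha < 1 and z > 0:
\begin{align*}
g(z)   &= e^{-t z^{\alpha}} \ge 0, \\
g'(z)  &= -t\alpha\, z^{\alpha-1}\, g(z) \le 0, \\
g''(z) &= \bigl[\, t\alpha(1-\alpha)\, z^{\alpha-2} + t^{2}\alpha^{2}\, z^{2\alpha-2} \,\bigr]\, g(z) \ge 0 .
\end{align*}
```

The factor $1 - \alpha > 0$ is what makes the second derivative nonnegative, which is why the restriction $\alpha \in (0, 1)$ matters here.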
5. Dealing with the singularity. As we have discussed previously, the functions f (z) = z α and g(z) = e −tz α that are involved in fractional dynamics are not analytic at z = 0. Since the graph Laplacian L always has a zero eigenvalue, the convergence of rational Krylov methods for the computation of f (L T )b and g(L T )b may be hindered by the fact that the function has no region of analyticity surrounding the spectrum of L.
In this section we propose a rank-one shift and a subspace projection that can be used to transform the graph Laplacian into a nonsingular matrix, and we provide simple formulas that link f (L) and f (L T )b with functions of the transformed matrix. We are also going to show that Krylov methods directly applied to the singular graph Laplacian can inherit the improved convergence of the projection approach, at least in exact arithmetic, provided that the initial vector b is suitably modified.
We present these techniques in detail for strongly connected directed graphs. Recall that in this case the eigenvalue 0 of the Laplacian L is simple.
5.1. Rank-one shift. Recall that L1 = 0, and let z > 0 be such that z T L = 0 T and z T 1 = 1 (the positivity of z is a consequence of the Perron-Frobenius Theorem [23]). The vector z is uniquely defined by the above identities if the graph G is strongly connected.
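The right and left eigenvectors $\mathbf{1}$ and $\boldsymbol{z}$ can be respectively completed to a right and left Jordan basis for $L$ with two matrices $R, S \in \mathbb{C}^{n \times (n-1)}$, so that we have
$$Z^{-1} L Z = \begin{bmatrix} 0 & \\ & J_1 \end{bmatrix}, \qquad Z = \begin{bmatrix} \mathbf{1} & R \end{bmatrix}, \qquad Z^{-1} = \begin{bmatrix} \boldsymbol{z} & S \end{bmatrix}^H.$$
The matrix $J_1 \in \mathbb{C}^{(n-1) \times (n-1)}$ contains all the other Jordan blocks of $L$, which correspond to nonzero eigenvalues. Now, denoting by $\boldsymbol{e}_1 \in \mathbb{R}^n$ the first vector of the canonical basis, observe that $\mathbf{1}\boldsymbol{z}^T = Z \boldsymbol{e}_1 \boldsymbol{e}_1^T Z^{-1}$, and hence for all $\theta \in \mathbb{R}$ we have
$$L + \theta \mathbf{1}\boldsymbol{z}^T = Z \begin{bmatrix} \theta & \\ & J_1 \end{bmatrix} Z^{-1},$$
i.e. the matrix $L + \theta \mathbf{1}\boldsymbol{z}^T$ has the same spectrum as $L$ except for the eigenvalue 0, which is replaced by $\theta$. Therefore, using basic properties of matrix functions, for any function $f$ defined on the spectra of $L$ and $L + \theta \mathbf{1}\boldsymbol{z}^T$ it holds
$$f(L) = f(L + \theta \mathbf{1}\boldsymbol{z}^T) + [f(0) - f(\theta)]\, \mathbf{1}\boldsymbol{z}^T. \tag{5.1}$$
Identity (5.1) allows us to compute $f(L + \theta \mathbf{1}\boldsymbol{z}^T)$ instead of $f(L)$ and then recover the latter at minimal cost. For any $\theta > 0$ (e.g., $\theta = 1$), the matrix $L + \theta \mathbf{1}\boldsymbol{z}^T$ is nonsingular and all its eigenvalues have strictly positive real part, so we expect Krylov methods to converge faster when $f$ has a branch cut on $(-\infty, 0]$, e.g. for $f(z) = z^\alpha$.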
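In particular, for fractional diffusion the objective is the computation of $f(L^T)\boldsymbol{b}$ for a probability vector $\boldsymbol{b}$ and $f(z) = e^{-tz^\alpha}$, and identity (5.1) becomes
$$f(L^T)\boldsymbol{b} = f(L^T + \theta \boldsymbol{z}\mathbf{1}^T)\boldsymbol{b} + (1 - e^{-t\theta^\alpha})\, \boldsymbol{z}. \tag{5.2}$$
When the matrix $L$ is large and sparse and a rational Krylov method is used to approximate $f(L^T + \theta \boldsymbol{z}\mathbf{1}^T)\boldsymbol{b}$, it would be preferable to solve shifted linear systems involving the dense matrix $L^T + \theta \boldsymbol{z}\mathbf{1}^T$ without explicitly forming it. This can be done by using the Sherman-Morrison formula: for an invertible matrix $A$ and two vectors $\boldsymbol{u}, \boldsymbol{v}$ such that $1 + \boldsymbol{v}^T A^{-1} \boldsymbol{u} \ne 0$, the matrix $A + \boldsymbol{u}\boldsymbol{v}^T$ is invertible and it holds
$$(A + \boldsymbol{u}\boldsymbol{v}^T)^{-1} = A^{-1} - \frac{A^{-1}\boldsymbol{u}\boldsymbol{v}^T A^{-1}}{1 + \boldsymbol{v}^T A^{-1}\boldsymbol{u}}. \tag{5.3}$$
In our setting, for a pole $\xi \in (-\infty, 0)$ and $\theta > 0$ the invertibility condition $1 + \theta \mathbf{1}^T (L^T - \xi I)^{-1} \boldsymbol{z} \ne 0$ is always satisfied (since $(L^T - \xi I)^{-1} \ge 0$, being the inverse of a nonsingular M-matrix), and identity (5.3) becomes
$$(L^T + \theta \boldsymbol{z}\mathbf{1}^T - \xi I)^{-1} = (L^T - \xi I)^{-1} - \frac{\theta\, (L^T - \xi I)^{-1} \boldsymbol{z}\mathbf{1}^T (L^T - \xi I)^{-1}}{1 + \theta\, \mathbf{1}^T (L^T - \xi I)^{-1} \boldsymbol{z}}. \tag{5.4}$$
Remark 5.1. We mention that the Sherman-Morrison formula has already been applied in the literature in the context of rational Krylov methods. For instance, in [30, Section 3.1] the authors use the Sherman-Morrison-Woodbury formula in the construction of an "augmented" Krylov subspace associated to a singular matrix, arising in connection with the solution of a constrained Sylvester equation.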
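The rank-one shift identity for $f(L^T)\boldsymbol{b}$ can be checked numerically on a small example. The sketch below compares $f(L^T)\boldsymbol{b}$ with $f(L^T + \theta \boldsymbol{z}\mathbf{1}^T)\boldsymbol{b} + [f(0) - f(\theta)](\mathbf{1}^T\boldsymbol{b})\boldsymbol{z}$ for $f(z) = e^{-tz^\alpha}$. It assumes NumPy and evaluates matrix functions by a dense eigendecomposition (valid here because the example matrices are diagonalizable); the helper `matrix_function` and the 4-node graph are hypothetical choices for this illustration.

```python
import numpy as np

def matrix_function(M, f, tol=1e-12):
    """f(M) via eigendecomposition; a sketch that assumes M is diagonalizable."""
    lam, Z = np.linalg.eig(M)
    lam = lam.astype(complex)
    lam[np.abs(lam) < tol] = 0.0          # snap the numerically zero eigenvalue to 0
    return ((Z * f(lam)) @ np.linalg.inv(Z)).real

# Hypothetical strongly connected, unbalanced digraph: 4-cycle plus chord 0 -> 2.
n = 4
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]:
    A[i, j] = 1.0
L = np.diag(A.sum(axis=1)) - A

one = np.ones(n)
z = np.array([1.0, 1.0, 2.0, 2.0]) / 6.0      # left null vector: L^T z = 0, 1^T z = 1

theta, alpha, t = 1.0, 0.5, 1.0
f = lambda x: np.exp(-t * x**alpha)

b = np.array([1.0, 0.0, 0.0, 0.0])            # probability vector
direct = matrix_function(L.T, f) @ b          # f(L^T) b on the singular matrix
shifted = L.T + theta * np.outer(z, one)      # nonsingular matrix L^T + theta z 1^T
via_shift = matrix_function(shifted, f) @ b + (f(0.0) - f(theta)) * (one @ b) * z

print(np.linalg.norm(direct - via_shift))
```

Here the two sides agree up to rounding; the point of the identity in practice is that the Krylov iteration is run on the shifted matrix, and the correction term costs only one vector update.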
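Remark 5.2. Using the Jordan canonical form, it is not difficult to see that it holds
$$(L^T - \xi I)^{-1} = -\xi^{-1} \boldsymbol{z}\mathbf{1}^T + O(1) \qquad \text{as } \xi \to 0, \tag{5.5}$$
and hence, since $(L^T - \xi I)^{-1}\boldsymbol{z} = -\xi^{-1}\boldsymbol{z}$ and $\mathbf{1}^T (L^T - \xi I)^{-1} = -\xi^{-1}\mathbf{1}^T$, for small $\xi < 0$ and any vector $\boldsymbol{w}$ the identity (5.4) becomes
$$(L^T + \theta \boldsymbol{z}\mathbf{1}^T - \xi I)^{-1}\boldsymbol{w} = (L^T - \xi I)^{-1}\boldsymbol{w} - \frac{\theta}{\xi(\xi - \theta)} (\mathbf{1}^T\boldsymbol{w})\, \boldsymbol{z}, \tag{5.6}$$
where by (5.5) the first term is dominated by $-\xi^{-1}(\mathbf{1}^T\boldsymbol{w})\boldsymbol{z}$ and the second term is approximately $\xi^{-1}(\mathbf{1}^T\boldsymbol{w})\boldsymbol{z}$, i.e. we are very close to subtracting two multiples of the vector $\boldsymbol{z}$ of approximately the same length. Hence formula (5.6) may suffer from severe numerical cancellation when the pole $\xi$ is very close to the origin, and therefore its use is not advised in that situation. Indeed, in our numerical experiments we observed a large loss of precision when solving linear systems with formula (5.6) for poles $\xi < 0$ of order $10^{-6}$.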
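In order to address the issue mentioned in Remark 5.2, we now derive an alternative way to compute the solution of the shifted linear system $(L^T - \xi I)\boldsymbol{\varphi} = \boldsymbol{w}$ that avoids the cancellation in (5.6) when $\xi$ is small. Since $\mathbf{1}^T L^T = \boldsymbol{0}^T$, we have
$$\mathbf{1}^T\boldsymbol{w} = \mathbf{1}^T (L^T - \xi I)\boldsymbol{\varphi} = -\xi\, \mathbf{1}^T\boldsymbol{\varphi},$$
so we can define $\boldsymbol{\psi} := \boldsymbol{\varphi} + \xi^{-1}(\mathbf{1}^T\boldsymbol{w})\boldsymbol{z}$, and it holds by construction that $\mathbf{1}^T\boldsymbol{\psi} = 0$. It is also straightforward to verify that $\boldsymbol{\psi}$ is the solution to the linear system
$$(L^T - \xi I)\boldsymbol{\psi} = \boldsymbol{w} - (\mathbf{1}^T\boldsymbol{w})\boldsymbol{z},$$
and that the vector $\boldsymbol{w} - (\mathbf{1}^T\boldsymbol{w})\boldsymbol{z}$ is orthogonal to $\mathbf{1}$. Hence, we can compute $\boldsymbol{\varphi}$ as
$$\boldsymbol{\varphi} = (L^T - \xi I)^{-1}\bigl(\boldsymbol{w} - (\mathbf{1}^T\boldsymbol{w})\boldsymbol{z}\bigr) - \xi^{-1}(\mathbf{1}^T\boldsymbol{w})\boldsymbol{z}. \tag{5.7}$$
With formula (5.7), we have explicitly separated a component of the solution that is proportional to $\xi^{-1}\boldsymbol{z}$. By substituting (5.7) in (5.6), we obtain
$$(L^T + \theta \boldsymbol{z}\mathbf{1}^T - \xi I)^{-1}\boldsymbol{w} = (L^T - \xi I)^{-1}\bigl(\boldsymbol{w} - (\mathbf{1}^T\boldsymbol{w})\boldsymbol{z}\bigr) + (\theta - \xi)^{-1}(\mathbf{1}^T\boldsymbol{w})\boldsymbol{z}. \tag{5.8}$$
Observe that cancellation is avoided when using (5.8), because the subtraction of the two close multiples of the vector $\boldsymbol{z}$ is performed analytically. Moreover, because of (5.5) and since $(\boldsymbol{w} - (\mathbf{1}^T\boldsymbol{w})\boldsymbol{z}) \perp \mathbf{1}$, we have
$$-\xi^{-1}\boldsymbol{z}\mathbf{1}^T \bigl(\boldsymbol{w} - (\mathbf{1}^T\boldsymbol{w})\boldsymbol{z}\bigr) = \boldsymbol{0},$$
so for small $\xi < 0$ we do not expect $\boldsymbol{\psi}$ to have a component of order $\xi^{-1}$ along the vector $\boldsymbol{z}$ (note that, in general, this argument fails for $\boldsymbol{\varphi}$). Our numerical experiments confirm that the use of formula (5.8) fixes the problem of cancellation.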
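The two ways of solving the shifted system with $L^T + \theta \boldsymbol{z}\mathbf{1}^T - \xi I$ can be compared on a toy problem. In the Python sketch below, `solve_shifted_naive` applies the Sherman-Morrison correction to $(L^T - \xi I)^{-1}\boldsymbol{w}$, while `solve_shifted_stable` first removes the component of $\boldsymbol{w}$ along $\boldsymbol{z}$ and adds back the term $(\theta - \xi)^{-1}(\mathbf{1}^T\boldsymbol{w})\boldsymbol{z}$. NumPy, dense solves, the function names and the 4-node graph are assumptions of this example; how large the gap between the two variants is in practice depends on the solver and on the problem.

```python
import numpy as np

def solve_shifted_naive(LT, z, theta, xi, w):
    """Sherman-Morrison form: adds two large, nearly opposite multiples of z."""
    n = LT.shape[0]
    phi = np.linalg.solve(LT - xi * np.eye(n), w)
    return phi - theta / (xi * (xi - theta)) * w.sum() * z

def solve_shifted_stable(LT, z, theta, xi, w):
    """Cancellation-free form: solve with the right-hand side projected against z."""
    n = LT.shape[0]
    s = w.sum()                                   # 1^T w
    psi = np.linalg.solve(LT - xi * np.eye(n), w - s * z)
    return psi + s / (theta - xi) * z

# Hypothetical strongly connected digraph: 4-cycle plus chord 0 -> 2.
n = 4
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]:
    A[i, j] = 1.0
LT = (np.diag(A.sum(axis=1)) - A).T
z = np.array([1.0, 1.0, 2.0, 2.0]) / 6.0          # L^T z = 0, 1^T z = 1

theta, xi = 1.0, -1e-6                            # pole very close to the origin
w = np.array([1.0, 2.0, 3.0, 4.0])

# Reference: dense solve with the explicitly formed (well-conditioned) shifted matrix.
M = LT + theta * np.outer(z, np.ones(n)) - xi * np.eye(n)
x_ref = np.linalg.solve(M, w)

err_naive = np.linalg.norm(solve_shifted_naive(LT, z, theta, xi, w) - x_ref)
err_stable = np.linalg.norm(solve_shifted_stable(LT, z, theta, xi, w) - x_ref)
print(err_naive, err_stable)
```

Forming the dense reference matrix is only feasible for tiny examples; for a large sparse $L$ both variants need only solves with the sparse matrix $L^T - \xi I$.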
Remark 5.3. Note that if the graph is undirected, or more generally if it is balanced (i.e. each node has equal in- and out-degree), we also have $\boldsymbol{z} = \mathbf{1}$ up to a normalization factor, so no preliminary computation is needed to use this approach with the rank-one shift. On the other hand, for a general digraph it is first required to compute a nonzero vector $\boldsymbol{z}$ such that $L^T \boldsymbol{z} = \boldsymbol{0}$.
The problem of solving this linear system was recently discussed, for instance, in [3]. One possible approach is to compute an LDU factorization of the transpose of the graph Laplacian, $L^T = \mathcal{L}\mathcal{D}\mathcal{U}$, where $\mathcal{L}$ is unit lower triangular, $\mathcal{U}$ is unit upper triangular, and $\mathcal{D}$ is diagonal with $\mathcal{D}_{ii} > 0$ for $i = 1, \dots, n-1$ and $\mathcal{D}_{nn} = 0$. Such a factorization always exists since $L$ is an irreducible singular M-matrix [6], and it can be computed in a stable way using Gaussian elimination, with no pivoting required [14]. The original linear system $L^T \boldsymbol{z} = \boldsymbol{0}$ is thus equivalent to $\mathcal{D}\mathcal{U}\boldsymbol{z} = \boldsymbol{0}$, which can be solved by fixing $z_n = 1$ and solving the upper triangular linear system $\mathcal{U}\boldsymbol{z} = \boldsymbol{e}_n$ via backward substitution. We remark that $\mathcal{U}^{-1} \ge 0$, so the vector $\boldsymbol{z}$ is nonnegative, and it can indeed be normalized so that $\boldsymbol{z}^T \mathbf{1} = 1$. We also mention that when $L$ is sparse this method can be improved by computing the LDU factorization of $P^T L^T P$ instead of $L^T$, where $P$ is a permutation matrix suitably chosen to reduce the fill-in in the factors $\mathcal{L}$ and $\mathcal{U}$. For example, the Matlab routines amd and symrcm can be used for this purpose.
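The elimination-based computation of $\boldsymbol{z}$ can be sketched as follows. This is a dense illustration that assumes NumPy and a strongly connected graph (so that the first $n-1$ pivots are positive); it performs Gaussian elimination without pivoting on $L^T$ and then a backward substitution with $z_n = 1$. The function name `left_null_vector` and the example graph are hypothetical, and no fill-reducing permutation is applied.

```python
import numpy as np

def left_null_vector(L):
    """Solve L^T z = 0 with 1^T z = 1 by Gaussian elimination without pivoting."""
    R = np.array(L, dtype=float).T.copy()      # work on a copy of L^T
    n = R.shape[0]
    for k in range(n - 1):                     # reduce L^T to upper triangular form
        for i in range(k + 1, n):
            m = R[i, k] / R[k, k]
            R[i, k:] -= m * R[k, k:]
    z = np.zeros(n)
    z[-1] = 1.0                                # fix z_n = 1 (last pivot is zero)
    for i in range(n - 2, -1, -1):             # backward substitution on rows 1..n-1
        z[i] = -(R[i, i + 1:] @ z[i + 1:]) / R[i, i]
    return z / z.sum()                         # normalize so that 1^T z = 1

# Hypothetical strongly connected digraph: 4-cycle plus chord 0 -> 2.
n = 4
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]:
    A[i, j] = 1.0
L = np.diag(A.sum(axis=1)) - A

z = left_null_vector(L)
print(z)                       # proportional to (1, 1, 2, 2)
print(np.abs(L.T @ z).max())   # residual of L^T z = 0
```

The upper triangular matrix produced by the elimination equals $\mathcal{D}\mathcal{U}$, so dividing each of its first $n-1$ rows by the pivot gives the rows of $\mathcal{U}$ used in the backward substitution.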
Alternatively, when L is very large and sparse, the linear system L T z = 0 can be solved iteratively, e.g. with a preconditioned GMRES method (see [3] and references therein). Of course, if L is large and we choose to solve L T z = 0 iteratively, we should also use an iterative method for the solution of the shifted linear systems at each step of the rational Krylov iteration. However, in this paper we do not address this specific subproblem, and we instead focus on the case where it is feasible to solve the linear systems with a direct method.

5.2. Projected Krylov methods.
Another way to obtain a nonsingular matrix from the graph Laplacian is to project L on the n − 1 dimensional subspace S = span{1} ⊥ . We remark that the approach we present here is similar to the one described in [7,Section 4], where the authors separate the eigenvalue 0 from the rest of the spectrum of a symmetric positive semidefinite matrix A, to compute f (A)b with an integral on a contour surrounding σ(A) \ {0}. See also [20,21] for a discussion of more general spectral splitting methods for symmetric matrices.
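Let $\{\boldsymbol{q}_1, \dots, \boldsymbol{q}_{n-1}\}$ be an orthonormal basis of $S$, and define the $n \times (n-1)$ matrix $Q := [\boldsymbol{q}_1 \dots \boldsymbol{q}_{n-1}]$. The matrix $\widetilde{Q} := [Q \;\; \tfrac{1}{\sqrt{n}}\mathbf{1}]$ is orthogonal, and we have $Q^T Q = I_{n-1}$ and $Q Q^T = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T$. Here we denoted by $I_k$ the identity matrix of size $k \times k$ in order to stress that the two matrices have a different size; in the sequel, we will drop the subscript when there is no ambiguity. Observe that the matrix $Q^T L Q$ is nonsingular, since $\operatorname{range} Q = \operatorname{span}\{\mathbf{1}\}^\perp$, $\ker Q^T = \ker L = \operatorname{span}\{\mathbf{1}\}$ and $\operatorname{range} L = \operatorname{span}\{\boldsymbol{z}\}^\perp$.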
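We are going to rewrite $f(L)$ in terms of $f(Q^T L Q)$ by using some properties of matrix functions. Recalling that $L\mathbf{1} = \boldsymbol{0}$ and that $\mathbf{1}^T Q = \boldsymbol{0}^T$, we have:
$$\widetilde{Q}^T L \widetilde{Q} = \begin{bmatrix} Q^T L Q & \boldsymbol{0} \\ \tfrac{1}{\sqrt{n}}\mathbf{1}^T L Q & 0 \end{bmatrix}.$$
Now, using well known properties of matrix functions, we have
$$f(L) = \widetilde{Q} \begin{bmatrix} f(Q^T L Q) & \boldsymbol{0} \\ \boldsymbol{\phi}^T & f(0) \end{bmatrix} \widetilde{Q}^T = Q f(Q^T L Q) Q^T + \tfrac{1}{\sqrt{n}}\mathbf{1}\boldsymbol{\phi}^T Q^T + \tfrac{f(0)}{n}\mathbf{1}\mathbf{1}^T \tag{5.9}$$
for some $\boldsymbol{\phi} \in \mathbb{R}^{n-1}$. The vector $\boldsymbol{\phi}$ can be expressed in closed form (see, e.g., [17,Theorem 1.21]), but this is not necessary for our purposes.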
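Let us assume at first that our goal is to compute $f(L^T)\boldsymbol{v}$ for a vector $\boldsymbol{v}$ such that $\mathbf{1}^T\boldsymbol{v} = 0$. Using (5.9) and $f(L^T) = f(L)^T$, we get
$$f(L^T)\boldsymbol{v} = Q f(Q^T L^T Q) Q^T \boldsymbol{v}. \tag{5.10}$$
Now, consider the computation of $f(L^T)\boldsymbol{b}$ for a generic vector $\boldsymbol{b}$. If $\mathbf{1}^T\boldsymbol{b} = \beta \ne 0$, we can always write $\boldsymbol{b} = \boldsymbol{v} + \beta\boldsymbol{z}$ for some vector $\boldsymbol{v} \perp \mathbf{1}$ (recall that $\boldsymbol{z}$ satisfies $L^T\boldsymbol{z} = \boldsymbol{0}$ and $\mathbf{1}^T\boldsymbol{z} = 1$). Hence, using (5.10) we have
$$f(L^T)\boldsymbol{b} = f(L^T)\boldsymbol{v} + \beta f(L^T)\boldsymbol{z} = Q f(Q^T L^T Q) Q^T \boldsymbol{v} + \beta f(0)\boldsymbol{z}. \tag{5.11}$$
Using (5.11), we can compute $f(L^T)\boldsymbol{b}$ by using a rational Krylov method on the nonsingular projected matrix $Q^T L^T Q$. As with the rank-one shift, this requires knowledge of $\boldsymbol{z}$, the left 0-eigenvector of the graph Laplacian, which must be computed beforehand by solving the singular linear system $L^T\boldsymbol{z} = \boldsymbol{0}$. In order to make this approach viable, we need to be able to compute matrix-vector products with $Q$ efficiently: we address this problem in subsection 5.2.1. We are also going to show that in exact arithmetic the Krylov methods for $f(L^T)\boldsymbol{v}$ and $Q f(Q^T L^T Q) Q^T \boldsymbol{v}$ construct precisely the same approximations after an equal number of iterations, so it is actually not necessary to perform the projection explicitly.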

5.2.1. Fast matrix-vector products with Q.
In this part we show how to construct a matrix $Q$ with orthonormal columns spanning the subspace $S = \operatorname{span}\{\mathbf{1}\}^\perp$, such that the matrix-vector products of the form $Q\boldsymbol{u}$ and $Q^T\boldsymbol{v}$ can be performed with cost $O(n)$.
Let us define the orthogonal matrixQ = Q 1 √ n 1 as where we denoted by 0 n−1 , 1 n−1 and I n−1 respectively the all-zeroes vector, the allones vector, and the identity matrix of size n−1. It is straightforward to see that with the above definitionQ is indeed an orthogonal matrix. Now, for any vector v ∈ R n and u ∈ R n−1 we have The last equality in (5.13) follows from Lemma 5.5(b).
Hence the matrix-vector products Qu and Q T v can be computed with cost O(n), and the solution of a shifted linear system with Q T L T Q can be reduced with cost O(n) to the solution of a shifted linear system with L T .
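One way to obtain $O(n)$ products is sketched below. It uses a Householder-type choice, $Q = \begin{bmatrix} -\tfrac{1}{\sqrt{n}}\mathbf{1}_{n-1}^T \\ I_{n-1} - \tfrac{1}{n + \sqrt{n}}\mathbf{1}_{n-1}\mathbf{1}_{n-1}^T \end{bmatrix}$, which is an assumption made for this illustration and is not necessarily the matrix used in the text above. The function names `Q_times` and `Qt_times` are hypothetical, and NumPy is assumed.

```python
import numpy as np

def Q_times(u):
    """Compute Q u for u in R^{n-1} in O(n) operations (Householder-type Q)."""
    n = u.size + 1
    s = u.sum()
    out = np.empty(n)
    out[0] = -s / np.sqrt(n)
    out[1:] = u - s / (n + np.sqrt(n))
    return out

def Qt_times(v):
    """Compute Q^T v for v in R^n in O(n) operations."""
    n = v.size
    return v[1:] - (v[0] / np.sqrt(n) + v[1:].sum() / (n + np.sqrt(n)))

# Check the claimed properties on a small example by forming Q explicitly.
n = 6
Q = np.column_stack([Q_times(e) for e in np.eye(n - 1)])
v = np.arange(1.0, n + 1.0)

print(np.abs(Q.T @ Q - np.eye(n - 1)).max())   # orthonormal columns
print(np.abs(np.ones(n) @ Q).max())            # columns orthogonal to the ones vector
print(np.abs(Qt_times(v) - Q.T @ v).max())     # fast product matches the dense one
```

Only sums and scalar updates of vectors are needed, so $Q$ never has to be stored; the explicit matrix is formed here solely to verify the properties.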

5.2.2. Implicit projection.
In the following part, we are going to examine how rational Krylov methods for the computation of f (L T )b are related to their projected counterpart, i.e. to methods that first approximate f (Q T L T Q)Q T b using rational Krylov subspaces and then use (5.10) to compute f (L T )b, in the case of an initial vector b ⊥ 1. Note that the assumption b ⊥ 1 is not satisfied when computing the solution to the fractional diffusion equation, since in that case the initial vector u 0 is a probability vector, and hence 1 T u 0 = 1. However, with the same procedure used in identity (5.11), the results of this section can be used with minor modifications for any initial vector b.
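We are going to prove our result in a slightly more general scenario: let $A \in \mathbb{R}^{n \times n}$, and let $\boldsymbol{x}$ be a left eigenvector of $A$ such that $\boldsymbol{x}^T A = \lambda\boldsymbol{x}^T$. The specific case of the graph Laplacian will then correspond to $A = L^T$, $\boldsymbol{x} = \mathbf{1}$ and $\lambda = 0$. Let $Q$ be an $n \times (n-1)$ matrix whose columns are an orthonormal basis of $\operatorname{span}\{\boldsymbol{x}\}^\perp$. If $\boldsymbol{b} \perp \boldsymbol{x}$, the same argument used in the proof of (5.10) gives us
$$f(A)\boldsymbol{b} = Q f(Q^T A Q) Q^T \boldsymbol{b}. \tag{5.14}$$
Recall that the usual rational Arnoldi algorithm for $f(A)\boldsymbol{b}$ computes an orthonormal sequence $\{\boldsymbol{v}_k\}_{k \ge 1}$ such that $\boldsymbol{v}_1 = \boldsymbol{b}/\|\boldsymbol{b}\|_2$ and $\operatorname{span}\{\boldsymbol{v}_1, \dots, \boldsymbol{v}_k\} = \mathcal{Q}_k(A, \boldsymbol{b})$. If we define $V_k = [\boldsymbol{v}_1 \dots \boldsymbol{v}_k]$ and $B_k = V_k^T A V_k$, a rational Krylov method then yields the approximation
$$f(A)\boldsymbol{b} \approx \boldsymbol{y}_k := V_k f(B_k) V_k^T \boldsymbol{b}. \tag{5.15}$$
Alternatively, if we work with the right hand side of (5.14), after $k$ iterations the rational Arnoldi algorithm constructs the matrix $U_k = [\boldsymbol{u}_1 \dots \boldsymbol{u}_k] \in \mathbb{R}^{(n-1) \times k}$, whose columns $\{\boldsymbol{u}_1, \dots, \boldsymbol{u}_k\}$ are an orthonormal basis for $\mathcal{Q}_k(Q^T A Q, Q^T \boldsymbol{b})$. Then the vector $f(Q^T A Q) Q^T \boldsymbol{b}$ can be approximated by
$$f(Q^T A Q) Q^T \boldsymbol{b} \approx U_k f(C_k) U_k^T Q^T \boldsymbol{b}, \qquad C_k := U_k^T Q^T A Q U_k.$$
Applying now (5.14), we have the following approximation to $f(A)\boldsymbol{b}$:
$$f(A)\boldsymbol{b} \approx \bar{\boldsymbol{y}}_k := Q U_k f(C_k) U_k^T Q^T \boldsymbol{b}. \tag{5.16}$$
We will refer to the method described by equation (5.16) as a projected rational Krylov method.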
The main result of this section is the following.
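Proof of Theorem 5.4. We start by showing that there exists a $k \times k$ diagonal and orthogonal matrix $D_k$ such that $U_k = Q^T V_k D_k$. Since $\boldsymbol{u}_1 = Q^T \boldsymbol{v}_1$ by definition, it is enough to prove that $\boldsymbol{u}_{k+1} = \pm Q^T \boldsymbol{v}_{k+1}$, with the assumption that $U_k = Q^T V_k D_k$, for $k \ge 1$. If we denote by $\boldsymbol{z}_k$ a continuation vector for $\mathcal{Q}_k(Q^T A Q, Q^T \boldsymbol{b})$, then by Lemma 5.5(d) we have that $\boldsymbol{w}_k = Q\boldsymbol{z}_k$ is a continuation vector for $\mathcal{Q}_k(A, \boldsymbol{b})$. We have the following chain of equalities, using Lemma 5.5(b) for the second equality:
$$Q^T (I - A/\xi_k)^{-1} A \boldsymbol{w}_k = Q^T (I - A/\xi_k)^{-1} A Q \boldsymbol{z}_k = (I - Q^T A Q/\xi_k)^{-1} Q^T A Q \boldsymbol{z}_k.$$
So we have $\boldsymbol{u}_{k+1} = \pm Q^T \boldsymbol{v}_{k+1}$, since both vectors are in $\mathcal{Q}_{k+1}(Q^T A Q, Q^T \boldsymbol{b})$ and they are orthogonal to the columns of $U_k$. Hence we have obtained
$$U_k = Q^T V_k D_k \qquad \text{for all } k \ge 1,$$
where $D_k$ is a diagonal matrix whose diagonal elements are equal to $\pm 1$. Now it is straightforward to see that $C_k = D_k B_k D_k$ and that $\bar{\boldsymbol{y}}_k = \boldsymbol{y}_k$. Indeed, since $Q Q^T V_k = V_k$ and $Q Q^T \boldsymbol{b} = \boldsymbol{b}$, we have
$$\bar{\boldsymbol{y}}_k = Q U_k f(C_k) U_k^T Q^T \boldsymbol{b} = Q Q^T V_k D_k f(D_k B_k D_k) D_k V_k^T Q Q^T \boldsymbol{b} = V_k f(B_k) V_k^T \boldsymbol{b} = \boldsymbol{y}_k.$$
The result of Theorem 5.4 can also be seen as a consequence of the implicit Q theorem for rational Arnoldi decompositions [5,Theorem 3.2].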
Although Theorem 5.4 is guaranteed to hold only in exact arithmetic, we observed from our experiments that the error curves given by the two approximations (5.15) and (5.16) are almost always overlapping, so the implicit projection is a valid (and cheaper) alternative to (5.16).
Remark 5.6. The approach described in this section is similar to the deflation and augmentation strategies used in the solution of linear systems with Krylov methods: the aim of these techniques is to include exact or approximate spectral information on the matrix in order to speed up the convergence. This can be done by either adding a few known eigenvectors to the Krylov subspace, or by directly solving a deflated problem constructed using the spectral information on the matrix. Since the implicit projection method constructs a Krylov subspace that is orthogonal to the vector x (see Lemma 5.5(a)), it can be interpreted as an implicit way to construct an augmented Krylov subspace, also containing the eigenvector x. For additional details on deflation and augmentation techniques used in Krylov methods for the solution of linear systems, we refer to the review article [31,Section 9] and to the references cited therein.
6. Numerical experiments. In this section we test and compare the performance of the various methods for the computation of $f(A)b$ that we presented earlier, using them to approximate the solution to the fractional diffusion equation (3.2) on real-world networks, both undirected and directed. Recall that the solution to (3.2) at time $t$ can be expressed in the form
(6.1) $$u(t) = f(L^T) u_0, \qquad f(z) = e^{-t z^{\alpha}}.$$
Since the graphs we consider are strongly connected, the eigenvalues $\lambda_1, \dots, \lambda_n$ of the graph Laplacian can be ordered in such a way that $0 = \lambda_1 < |\lambda_2| \le \dots \le |\lambda_n|$.
We use the result of Theorem 5.4 to compute $f(L^T) u_0$ via (5.11) in the following way: letting $\beta = \mathbf{1}^T u_0 > 0$, there exists $w \perp \mathbf{1}$ such that $u_0 = w + \beta z$ (recall that $L^T z = 0$), and thus we can compute $f(L^T) u_0$ as
(6.2) $$f(L^T) u_0 = f(L^T) w + \beta f(0) z.$$
Theorem 5.4 guarantees that, at least in exact arithmetic, a rational Krylov method for $f(L^T) w$ yields the same approximate solution as the same Krylov method applied to the projected problem $f(Q^T L^T Q) Q^T w$. We refer to the method obtained by using (6.2) and approximating $f(L^T) w$ with a Krylov method as an implicitly projected Krylov method.
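The splitting of the starting vector can be checked numerically on a small example. The sketch below assumes an undirected cycle graph (so that $L$ is symmetric and $z = \mathbf{1}/n$), the function $f(x) = e^{-t x^{\alpha}}$, and an eigendecomposition in place of a Krylov method. The helper `f_of_symmetric` and all parameter values are illustrative.

```python
import numpy as np

def f_of_symmetric(L, f):
    """Evaluate f(L) for a symmetric positive semidefinite L via its eigenpairs."""
    lam, V = np.linalg.eigh(L)
    lam = np.maximum(lam, 0.0)          # clip tiny negative rounding errors
    return (V * f(lam)) @ V.T

# Laplacian of an undirected cycle graph with n nodes (illustrative data).
n = 8
adj = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
L = np.diag(adj.sum(axis=1)) - adj

t, alpha = 1.0, 0.5
f = lambda x: np.exp(-t * x**alpha)

z = np.full(n, 1.0 / n)                 # null vector of L^T with entries summing to 1
u0 = np.zeros(n)
u0[0] = 1.0                             # all the initial mass on node 0

beta = u0.sum()                         # beta = 1^T u0
w = u0 - beta * z                       # component of u0 orthogonal to 1

F = f_of_symmetric(L, f)                # here L^T = L
lhs = F @ u0
rhs = F @ w + beta * f(0.0) * z         # f(L^T) w + beta f(0) z
```

In an actual implementation only `F @ w` would be replaced by a Krylov approximation; the term `beta * f(0.0) * z` is available in closed form.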
As we mentioned earlier, we use poles located on the negative real axis $(-\infty, 0)$. For the Shift-and-Invert Krylov method, we compare two different choices of poles. Recall that in the case of a symmetric positive definite matrix $A$, if $a > 0$ is a lower bound for the smallest eigenvalue of $A$ and $b > 0$ is an upper bound for its largest eigenvalue, so that $\sigma(A) \subset [a, b]$, an effective pole choice for the Shift-and-Invert Krylov method is given by $\xi = -\sqrt{ab}$ (see, e.g., [1, Section 6]). In analogy with this choice, we use the pole $\xi = -\sqrt{|\lambda_2 \lambda_n|}$: if the graph is undirected, when we use the rank-one shift approach (5.2) with $\theta \ge |\lambda_2|$, this pole corresponds exactly to the optimal choice $\xi = -\sqrt{\lambda_{\min} \lambda_{\max}}$ for symmetric positive definite matrices; the same choice appears to be reasonable also for the singular graph Laplacian and in the directed case. Indeed, our experiments show that this choice always provides a reliable convergence rate.
As a second possible choice for the Shift-and-Invert Krylov method, we use the pole $\xi = -t^{-2/\alpha}$ proposed in [26]; this choice is based on an integral bound for the error of the Shift-and-Invert Krylov method, obtained using an integral expression for the function $f$. The choice $\xi = -t^{-2/\alpha}$ was proposed for specific functions that arise in the context of fractional differential equations, like $f(z) = e^{-t z^{\alpha}}$ or $f(z) = (1 + t z^{\alpha})^{-1}$, with $\alpha \in (0, 1)$ and $t > 0$. This pole choice is particularly effective when $t$ is large, but it is more sensitive to changes in the parameters.
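The two single-pole choices are simple closed-form expressions. The short sketch below evaluates them for made-up inputs; the function names and the sample values of $\lambda_2$, $\lambda_n$, $t$ and $\alpha$ are illustrative assumptions, not data from the experiments.

```python
import math

def geometric_mean_pole(lambda_2, lambda_n):
    """Pole xi = -sqrt(|lambda_2 * lambda_n|), built from spectral information."""
    return -math.sqrt(abs(lambda_2 * lambda_n))

def time_scaled_pole(t, alpha):
    """Pole xi = -t^(-2/alpha), which needs no knowledge of the spectrum."""
    return -t ** (-2.0 / alpha)

xi_spectral = geometric_mean_pole(0.01, 100.0)   # depends only on the graph
xi_small_t = time_scaled_pole(0.1, 0.5)          # small t gives a pole far from zero
xi_large_t = time_scaled_pole(10.0, 0.5)         # large t gives a pole close to zero
```

The second pole moves by eight orders of magnitude between $t = 0.1$ and $t = 10$ when $\alpha = 0.5$, which is one way to see why its behaviour depends strongly on the parameters.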
For the rational Krylov method based on the equidistributed sequence (EDS) described in subsection 4.1, we computed the asymptotically optimal poles using the spectral interval $[0.99 \cdot |\lambda_2|, \, 1.01 \cdot |\lambda_n|]$, once again ignoring the presence of the eigenvalue of the Laplacian at $0$. The resulting poles are located on the negative real axis $(-\infty, 0)$. Similarly to the choice $\xi = -\sqrt{|\lambda_2 \lambda_n|}$ for the Shift-and-Invert Krylov method, this pole sequence is guaranteed to have the asymptotic rate of convergence (4.3) on the rank-one shifted matrix $L^T + \theta z \mathbf{1}^T$ when the graph is undirected (for $\theta \ge |\lambda_2|$). The experiments that we performed suggest that the same choice is still very effective in the directed case, and even when applied directly to the singular graph Laplacian. Note that this method, as well as the Shift-and-Invert method with $\xi = -\sqrt{|\lambda_2 \lambda_n|}$, requires knowledge of the largest and smallest nonzero eigenvalues of the graph Laplacian $L$, which have to be computed beforehand.
All the experiments were performed in Matlab using the rat_krylov function in the Rational Krylov toolbox [4]. The shifted linear systems in the Shift-and-Invert Krylov method were solved by computing beforehand an LU decomposition of the permuted matrix $P^T L^T P - \xi I$, where $P$ is a fill-in reducing permutation matrix obtained using the Matlab function amd. In the rank-one shifted and in the projected version of the methods, we used the modified Sherman-Morrison formula (5.8), with the vector $\psi$ defined in (5.7), and the identities (5.13) to avoid explicitly forming the dense matrices $L^T + \theta z \mathbf{1}^T$ and $Q^T L^T Q$. The use of (5.8) over (5.4) allowed us to avoid the cancellation in (5.6) for poles close to zero, as discussed after Remark 5.2. We set $\theta = 1$ in (5.2) and we used the matrix $Q$ defined by (5.12).
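The reason the dense rank-one shifted matrix never has to be formed can be seen from the standard Sherman-Morrison identity, sketched below. This is the textbook formula, not the modified formula (5.8), whose statement is not reproduced in this section. The function name `solve_rank_one_shifted`, the dense solver standing in for a sparse LU factorization, and the path graph are illustrative assumptions.

```python
import numpy as np

def solve_rank_one_shifted(M_solve, u, v, c):
    """Solve (M + u v^T) x = c using only solves with M (Sherman-Morrison)."""
    Minv_c = M_solve(c)
    Minv_u = M_solve(u)
    denom = 1.0 + v @ Minv_u            # must be nonzero for the identity to apply
    return Minv_c - Minv_u * (v @ Minv_c) / denom

# Laplacian of an undirected path graph (illustrative data), so z = 1/n.
n = 6
adj = np.diag(np.ones(n - 1), 1)
adj = adj + adj.T
L = np.diag(adj.sum(axis=1)) - adj
z = np.full(n, 1.0 / n)
ones = np.ones(n)

theta, xi = 1.0, -0.5
M = L.T - xi * np.eye(n)                # the sparse matrix that would be factorized once
M_solve = lambda rhs: np.linalg.solve(M, rhs)

c = np.arange(1.0, n + 1.0)
x = solve_rank_one_shifted(M_solve, theta * z, ones, c)

# Reference: form the dense shifted matrix explicitly (only viable for small n).
x_direct = np.linalg.solve(L.T + theta * np.outer(z, ones) - xi * np.eye(n), c)
```

Because $L^T z = 0$, the quantity `M_solve(theta * z)` equals $-\theta z / \xi$, which grows as the pole approaches zero. This is consistent with the cancellation issue that motivates a modified formula for poles close to zero.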
The error that we display is the relative error in the 2-norm, $\|y - \bar{y}_k\|_2$, where $y$ is the solution to (3.2) at a certain time $t$, or an accurate approximation computed with a Krylov method when the size of the graph is large. In all our experiments, we first extracted the largest strongly connected component (LCC) of a graph and we restricted our problem to that component. Information on the number of nodes and edges of these components, as well as the maximum and minimum nonzero eigenvalues of the corresponding graph Laplacians, is reported in Table 1. The real-world networks that we used are available in the Sparse Matrix Collection [11].
As we observed in section 3, the solution $u(t)$ to the fractional diffusion equation (3.2) is a probability vector for all $t \ge 0$, and hence it is desirable for the approximations computed with Krylov methods to have the same property. In our experiments, we observed that this is indeed the case for the Krylov methods applied to the shifted matrix $L^T + z \mathbf{1}^T$, as well as for the projected and implicitly projected Krylov methods. On the other hand, in general the approximate solutions obtained by working directly with the singular graph Laplacian $L^T$ do not have entries that sum up to 1, and often exhibit a wildly oscillating error; moreover, upon closer inspection, we observed that a significant portion of the error lay along the left null-eigenvector $z$ of the graph Laplacian $L$. By subtracting a multiple of $z$ from the approximate solution to enforce $\mathbf{1}^T \bar{y}_k = 1$, we were able to "correct" this oscillating component of the error; specifically, for each $k$ we replaced the approximate solution $\bar{y}_k$ obtained after $k$ iterations of a Krylov method with the corrected approximation
(6.3) $$\bar{y}_k - (\mathbf{1}^T \bar{y}_k - 1) \, z,$$
whose entries now sum up to 1 (recall that $\mathbf{1}^T z = 1$). We found that this correction greatly reduced the error both in the undirected and in the directed case, and hence we always applied it to the standard version of Krylov methods in all our experiments. On the other hand, the error correction (6.3) was never needed for the shifted, projected and implicitly projected variants of the methods. The error with and without the correction (6.3) is illustrated for the directed graph Roget in Figure 1.
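The correction that restores the unit sum amounts to a single vector update. The sketch below assumes only that $z$ has a nonzero entry sum; the function name `enforce_unit_sum` and the sample vectors are illustrative.

```python
import numpy as np

def enforce_unit_sum(y_approx, z):
    """Subtract the multiple of z that makes the entries of y_approx sum to 1.

    If z is already normalized so that its entries sum to 1, this reduces to
    y_approx - (sum(y_approx) - 1) * z.
    """
    return y_approx - (y_approx.sum() - 1.0) * z / z.sum()

z = np.array([0.2, 0.3, 0.5])           # stand-in for the null vector, entries sum to 1
y_k = np.array([0.1, 0.4, 0.6])         # stand-in approximation with entry sum 1.1
y_corrected = enforce_unit_sum(y_k, z)
```

The update costs $O(n)$ operations, so applying it at every iteration does not change the overall cost of the method.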
The first set of experiments is performed on graphs of moderate size (about 1000 nodes), where the solution to (6.1) can still be computed directly in a reasonable amount of time via an eigendecomposition of the graph Laplacian. In these experiments, we used the largest connected component (LCC) of the undirected graph minnesota (2640 nodes) and the LCC of the directed graph wiki-Vote (1300 nodes). The results for different values of $t$ and $\alpha$ are shown in Figures 2 and 3. We can see that the EDS method always converges in the smallest number of iterations, with a rate that is always equal to or better than the one predicted by the bound (4.3), even in the nonsymmetric case. The Shift-and-Invert (S&I) Krylov method with $\xi = -\sqrt{|\lambda_2 \lambda_n|}$ has a reliable convergence rate for all choices of the parameters, while the one with $\xi = -t^{-2/\alpha}$ is more effective for large $t$ (see, e.g., Figure 3(c)); however, it is more sensitive to changes in the parameters, and sometimes converges slowly (Figure 2(a)). As expected, the polynomial Krylov method usually converges very slowly, except for the case $\alpha = 1$, corresponding to the matrix exponential (Figure 2(c)), for which polynomial Krylov methods are known to be effective. Note that in Figure 2(b) we have $t^{-2/\alpha} = 10^{-4}$, and observe that the rank-one shifted S&I method with $\xi = -t^{-2/\alpha}$ attains the same accuracy as the other methods, as a result of formula (5.8); because of the cancellation discussed in Remark 5.2, the same accuracy could not be attained by using (5.4) in place of (5.8). The error curves of the desingularized methods always overlap (showing, in particular, that the result of Theorem 5.4 also holds in practice in finite precision arithmetic), and they often represent an improvement over the standard version of Krylov methods.
Note that the desingularization techniques seem to be always effective for the polynomial Krylov method, reducing the error by at least one or two orders of magnitude compared to the standard version.
We also point out that in certain cases the desingularized methods manage to attain a higher final accuracy: this is most apparent in Figure 3(c) for the S&I method with $\xi = -\sqrt{|\lambda_2 \lambda_n|}$.
The second set of experiments deals with larger graphs with about 10000 or 20000 nodes, for which the computation of an eigendecomposition of the graph Laplacian would be very expensive. Based on the results of the experiments on smaller graphs, in this case we compute the error using as the reference solution an approximation to $f(L^T) u_0$ computed using the EDS rational Krylov method with implicit projection, stopping the iterations when the 2-norm of the difference between two consecutive iterates is of the order of machine precision. To avoid bias in the error curves, we chose a different starting point for the EDS of the reference solution, thus producing a sequence of poles that is different from the one used to plot the error of the EDS method. We used the LCC of the undirected graphs Oregon-1 (11174 nodes), ca-HepPh (11204 nodes) and as-july06 (22963 nodes), and of the directed graphs enron (8271 nodes), p2p-Gnutella30 (8490 nodes) and hvdc1 (24836 nodes). The results for different values of $\alpha$ and $t$ are shown in Figures 4 and 5. The use of desingularization is again shown to be beneficial, often leading to faster convergence and attaining a better maximum accuracy. In Figure 4(b) and Figure 5(a) we can observe that the S&I method with $\xi = -t^{-2/\alpha}$ does not suffer from cancellation by virtue of (5.8), despite the presence of poles close to zero. These experiments also show that the polynomial Krylov method can have a variable and unpredictable convergence rate, depending on the graph: there are situations in which the convergence can take place quickly (Figure 5(b)) or with moderate speed (Figure 4(a)), but more often than not this method converges very slowly and the error practically stagnates (see, e.g., Figure 4(b)).
7. Conclusions. In this work we have discussed the use of rational Krylov methods for the solution of the fractional diffusion equation on a graph.
In order to improve the convergence speed of the methods, we have proposed three different procedures to deal with the eigenvalue at zero of the graph Laplacian, namely a rank-one shift, a subspace projection, and an implicit version of this projection. The experiments we conducted show that these three procedures yield in practice the same convergence curves, and often they are faster and attain higher accuracy than the original Krylov methods. To be applied, these methods only require the computation of the left zero-eigenvector of the graph Laplacian, and an additional cost of O(n) per iteration for the rank-one shift and projection techniques. The implicit projection approach is extremely easy to implement, since it only modifies the starting vector for the Krylov method and it requires no additional computations at each iteration.
Among the Krylov methods that we tested, the one based on the EDS and the S&I method with $\xi = -\sqrt{|\lambda_2 \lambda_n|}$ converge quickly regardless of the parameters $\alpha$ and $t$; however, these methods require the computation of approximations to the eigenvalues $\lambda_2$ and $\lambda_n$ of the graph Laplacian. On the other hand, the S&I method with $\xi = -t^{-2/\alpha}$ requires no previous knowledge of the spectrum of $L$, but its rate of convergence is more sensitive to changes in the parameters; even so, this method can sometimes outperform the others, especially when $t$ is large.