1 Introduction

Linear operators are used in various tasks in engineering and scientific research, such as simulations and data analysis. A classical example is a differential operator describing a natural phenomenon. Krylov subspace methods have been actively researched in numerical linear algebra for approximating the behavior of a given operator, such as its eigenvalues, the solutions of linear equations, and operator functions acting on vectors; these quantities provide approximations of, or information on, the solutions of differential equations [16, 19, 24, 27, 31,32,33, 35, 36, 43, 45]. In many cases, problems in infinite dimensional spaces such as differential equations are discretized by, for example, a finite difference method [8] or finite element method [1] and are transformed into finite dimensional problems with matrices, after which the Krylov subspace methods are applied to the matrices. On the other hand, Krylov subspace methods for operators in infinite dimensional Hilbert spaces without discretization have also been investigated, and more general results than those for matrices have been developed [6, 9,10,11,12, 15, 26, 28, 30].

Meanwhile, linear operators that represent time evolutions in dynamical systems, called transfer operators, are being investigated in relation to various fields such as machine learning, physics, molecular dynamics, and control engineering [18, 21, 25, 40,41,42]. Unknown transfer operators are estimated through data generated by dynamical systems. Since transfer operators are linear even if the underlying dynamical systems are nonlinear, Krylov subspace methods can be used to understand nonlinear dynamical systems [3, 4, 14, 20]. To make the algorithms computable only with the data, transfer operators are often discussed in relation to RKHSs (reproducing kernel Hilbert spaces). The Arnoldi and shift-invert Arnoldi methods have been proposed as Krylov subspace methods for transfer operators in RKHSs. The Arnoldi method is a standard Krylov subspace method [3, 35], but its convergence requires the operator to which it is applied to be bounded. However, not all transfer operators are bounded [17]. For example, transfer operators defined in the RKHS associated with the Gaussian kernel are unbounded if the dynamical system is nonlinear and deterministic. Thus, the shift-invert Arnoldi method was also considered [14]. In the shift-invert Arnoldi method, the shifted and inverted operator \((\gamma I-K)^{-1}\), for some \(\gamma\) not contained in the spectrum of K, is considered instead of the unbounded K.

The main difference between the classical settings assumed for the Krylov subspace methods mentioned in the first paragraph and those for transfer operators is whether information about the model is given. In the classical setting, a differential operator is given, and a model-driven approach with the operator is applied. In the above setting for transfer operators, on the other hand, neither a dynamical system nor a transfer operator is given; instead, data generated by the system is given, and a data-driven approach is applied. The purpose of applying the Krylov subspace method also differs in some situations. When we apply it to a transfer operator, denoted as K, one important task is to estimate \(K^nv\) for a given vector v and some \(n\in \mathbb {N}\) because K is unknown. On the other hand, in the classical setting for a given operator such as a differential operator, denoted as A, the product Av for a given vector v is already computable because both A and v are known. The main task of the Krylov subspace method in that setting is to estimate f(A)v for a given vector v and a function f such as \(f(z)=z^{-1}\) or \(f(z)=e^{z}\), other than \(f(z)=z^n\). For this reason, Krylov subspace methods for estimating operator-vector multiplications have not been discussed in the classical setting of numerical linear algebra. Meanwhile, although Krylov subspace methods for estimating operator-vector multiplications have been proposed in machine learning, their convergence has not been fully investigated.

The objective of this paper is to analyze the convergence of such Krylov subspace methods for estimating operator-vector multiplications. We define a “residual” for approximating \(K^nv\) for a vector v and analyze the convergence of the residuals of the Krylov approximations. The classical Krylov subspace methods for estimating f(A)v are frequently associated with residuals. For \(f(z)=z^{-1}\), for example, the GMRES (generalized minimal residual method) approximation minimizes the residual in a Krylov subspace and the convergence of the residual is superlinear [9, 28, 44]. Moreover, in BiCG (biconjugate gradient) type approximations, the residual or a value relevant to the residual is orthogonal to the Krylov subspace [43]. For a more general f, a generalized residual is proposed for evaluating the convergence of approximations [2, 13, 16, 34].

In our case, we show that the Arnoldi approximation converges to the minimizer of the residual. To show this, we use an error bound for a Krylov approximation of an operator function acting on a vector [10,11,12, 15, 27]. For the shift-invert Arnoldi method, the convergence analysis is not straightforward. At first glance, the problem of estimating \(K^nv\) seems to be the same as that of estimating the operator function f(K) acting on the vector v, where \(f(z)=z^n\). However, the situation is different from that of the classical Krylov subspace methods in terms of operator functions. The existing error bound for the Krylov approximation of f(K)v requires the holomorphicity of f on the spectrum of K. On the other hand, the function \(f(z)=(\gamma -z^{-1})^n\), for which “\(f((\gamma I-K)^{-1})=K^n\)” holds formally, is not holomorphic at 0, and 0 is contained in the spectrum of \((\gamma I-K)^{-1}\) if K is unbounded. We resolve this problem through the factor \(K^{-n}\) that appears in the residual.

This paper is structured in the following manner. In Sect. 2, to explain why operator-vector multiplications need to be estimated for data analysis, we review the definition of a transfer operator and the Krylov subspace methods for it. In Sect. 3, we generalize the problem to Krylov approximations for estimating operator-vector multiplications for linear operators in a Hilbert space and present a convergence analysis. In Sect. 4, we empirically confirm the results of Sect. 3. Section 5 is the conclusion.

1.1 Notations

Linear operators are denoted with standard capital letters, except for \(m\times m\) matrices, which are denoted in bold. Calligraphic capital letters and Italicized Greek capital letters denote sets. The inner product and norm are denoted as \(\left\langle \cdot ,\cdot \right\rangle\) and \(\Vert \cdot \Vert\), respectively.

2 Background

In this section, we briefly review the definition of Perron–Frobenius operators and Krylov subspace methods for Perron–Frobenius operators [14, 20]. Perron–Frobenius operators are transfer operators often discussed in relation to RKHSs, and their Krylov subspaces naturally appear [18, 20, 22]. The adjoint operators of Perron–Frobenius operators are referred to as Koopman operators [23], which are also transfer operators and have been researched for data-driven approaches [3, 4, 21, 40, 42].

2.1 Perron–Frobenius operator in RKHS

Consider the following dynamical system with random noise [14]:

$$\begin{aligned} X_{t+1}=h(X_{t})+\xi _t, \end{aligned}$$
(1)

where \(t\in \mathbb {Z}_{\ge 0}\), \((\varOmega ,\mathcal {F})\) is a measurable space, \((\mathcal {X},\mathcal {B})\) is a Borel measurable and locally compact Hausdorff vector space, \(X_t\) and \(\xi _t\) are random variables from \(\varOmega\) to \(\mathcal {X}\), and \(h:\mathcal {X}\rightarrow \mathcal {X}\) is a generally nonlinear map. Assume \(\{\xi _t\}_{t\in \mathbb {Z}_{\ge 0}}\) is an i.i.d. stochastic process and that \(\xi _t\) is independent of \(X_t\). The random variable \(\xi _t\) corresponds to random noise in \(\mathcal {X}\).

Let P be a probability measure on \(\varOmega\). The nonlinear time evolution of \(X_t\) in the dynamical system (1) is regarded as a linear time evolution of the pushforward measure \({X_t}_*P\), defined by \({X_t}_*P(B)=P({X_t}^{-1}(B))\) for \(B\in \mathcal {B}\). To describe this time evolution in a Hilbert space, an RKHS [37] is used. An RKHS is a Hilbert space constructed from a map \(k:\mathcal {X}\times \mathcal {X}\rightarrow \mathbb {C}\) called a positive definite kernel. For \(x\in \mathcal {X}\), the map \(\phi :\mathcal {X}\rightarrow \mathbb {C}^{\mathcal {X}}\) defined as \(\phi (x)=k(x,\cdot )\) is called a feature map. Let \(\mathcal {H}_{k,0}\) be a vector space defined as

$$\begin{aligned} \mathcal {H}_{k,0}=\bigg \{\sum _{i=1}^nc_i\phi (x_i)\mid \ n\in \mathbb {N},\ c_1,\ldots ,c_n\in \mathbb {C},\ x_1,\ldots ,x_n\in \mathcal {X}\bigg \}. \end{aligned}$$

In \(\mathcal {H}_{k,0}\), the inner product associated with k is defined, and the completion of \(\mathcal {H}_{k,0}\), which is denoted as \(\mathcal {H}_k\), is called an RKHS. An observation \(z\in \mathcal {X}\) is regarded as a vector \(\phi (z)\) in \(\mathcal {H}_k\) through \(\phi\). Moreover, if k is bounded, continuous, and \(c_0\)-universal, then the space of all the complex-valued finite regular Borel measures on \(\mathcal {X}\), which is denoted as \(\mathcal {M}(\mathcal {X})\), is densely embedded into \(\mathcal {H}_k\). That is, a map \(\varPhi :\mathcal {M}(\mathcal {X})\rightarrow \mathcal {H}_k\) defined as \(\mu \mapsto \int _{x\in \mathcal {X}}\phi (x)\;d\mu (x)\) is injective [39] and \(\varPhi (\mathcal {M}(\mathcal {X}))\) is dense in \(\mathcal {H}_k\) [14]. Here, \(c_0\)-universal means that \(\mathcal {H}_k\) is dense in the space of all continuous functions that vanish at infinity. The map \(\varPhi\) is called a kernel mean embedding [29]. For example, the Gaussian kernel \(e^{-c\Vert x-y\Vert ^2_2}\) and Laplacian kernel \(e^{-c\Vert x-y\Vert _1}\) with \(c>0\) for \(x,y\in \mathbb {R}^d\) are bounded and continuous \(c_0\)-universal kernels. Therefore, a complex-valued finite regular Borel measure \(\mu\) is regarded as a vector \(\varPhi (\mu )\) in the dense subset of Hilbert space \(\mathcal {H}_k\). Since the map \(\varPhi :\mathcal {M}(\mathcal {X})\rightarrow \mathcal {H}_k\) is linear, it is possible to define a linear operator \(K:\varPhi (\mathcal {M}(\mathcal {X}))\rightarrow \mathcal {H}_k\), which is called a Perron–Frobenius operator, in \(\mathcal {H}_k\) as follows:

$$\begin{aligned} K\varPhi (\mu )=\varPhi ({\beta _t}_*(\mu \otimes P)), \end{aligned}$$
(2)

where \(\beta _t:\mathcal {X}\times \varOmega \rightarrow \mathcal {X}\) is defined as \((x,\omega )\mapsto h(x)+\xi _t(\omega )\). Since \(\xi _t\) and \(X_t\) are independent, \(\varPhi ({\beta _t}_*({X_t}_*P\otimes P))=\varPhi ((h(X_t)+\xi _t)_*P)\) holds, and K maps \(\varPhi ({X_t}_*P)\) to \(\varPhi ({X_{t+1}}_*P)\). In addition, since \(\{\xi _t\}_{t\in \mathbb {Z}_{\ge 0}}\) is an i.i.d. process, it can be shown that K does not depend on t.
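For intuition, the empirical counterpart of the embedding is computable directly: for an empirical measure \(\mu =1/N\sum _{i}\delta _{x_i}\), we have \(\varPhi (\mu )=1/N\sum _i\phi (x_i)\), and inner products between embedded measures reduce to kernel evaluations by the reproducing property \(\left\langle \phi (x),\phi (y)\right\rangle =k(x,y)\). The following minimal Python sketch (hypothetical helper names; the Gaussian kernel is one admissible choice) illustrates this:

```python
import numpy as np

def gauss_kernel(x, y, c=1.0):
    # Gaussian kernel k(x, y) = exp(-c |x - y|^2) on R
    return np.exp(-c * (x - y) ** 2)

def embed_inner(xs, ys, kernel=gauss_kernel):
    # <Phi(mu), Phi(nu)> for empirical measures mu = (1/N) sum_i delta_{x_i}
    # and nu = (1/M) sum_j delta_{y_j}; by the reproducing property this is
    # (1/(N M)) sum_{i,j} k(x_i, y_j)
    xs, ys = np.asarray(xs), np.asarray(ys)
    return kernel(xs[:, None], ys[None, :]).mean()

print(embed_inner([0.1, 0.2, 0.3], [0.15, 0.25]))
```

All kernel-based computations in the following subsections reduce to inner products of this form.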

2.2 Krylov subspace methods for Perron–Frobenius operators

Let \(\{x_0,x_1,\ldots \}\subseteq \mathcal {X}\) be observed time-series data from the dynamical system (1), i.e., \(x_t=X_t(\omega _0)\) for some \(\omega _0\in \varOmega\). By using Krylov subspace methods, we estimate \(K^n\phi (x_t)\) for \(x_t\in \mathcal {X}\) to predict \(\phi (x_{t+n})\) through available data. In the mth Krylov step, the data is split into S datasets; examples of the choice of S are \(S=m+1\) and \(S=M\) for a sufficiently large natural number M. Let \(\mu _{t,N}^S=1/N\sum _{i=0}^{N-1}\delta _{x_{t+iS}}\ (t=0,\ldots ,m)\) be empirical measures with the datasets, where \(N\in \mathbb {N}\) and \(\delta _x\) denotes the Dirac measure at \(x\in \mathcal {X}\). It is assumed that \(\mu _{t,N}^S\) weakly converges to a finite regular Borel measure \(\mu _{t}^S\) as \(N\rightarrow \infty\) for \(t=0,\ldots ,m\).
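Concretely, the S datasets are sub-samplings of the time series with stride S; a small sketch (hypothetical helper, assuming enough observations are available) collects the sample indices behind each \(\mu _{t,N}^S\):

```python
import numpy as np

def split_indices(m, S, N):
    # Indices of the samples x_{t+iS} (i = 0, ..., N-1) behind mu_{t,N}^S
    # for t = 0, ..., m; requires len(xs) >= m + (N - 1) * S + 1
    return [np.arange(t, t + N * S, S) for t in range(m + 1)]

# Example: m = 3 Krylov steps with S = m + 1 = 4 and N = 5 samples per measure
print(split_indices(3, 4, 5))
```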

A Krylov subspace is constructed with \(\varPhi (\mu _{t}^S)\). To construct the Krylov subspace only with the observed data \(\{x_0,x_1,\dots \}\), the following equality of the average of noise \(\xi _t\) is assumed for any measurable and integrable function f:

$$\begin{aligned}&\lim _{N\rightarrow \infty }\frac{1}{N}\sum _{i=0}^{N-1}\int _{\omega \in \varOmega }f(h({x}_{t+iS})+\xi _t(\omega ))\;dP(\omega ) \nonumber \\&\quad =\lim _{N\rightarrow \infty }\frac{1}{N}\sum _{i=0}^{N-1}f(h({x}_{t+iS})+\xi _{t+iS}(\eta ))\ a.s.\ \eta \in \varOmega . \end{aligned}$$
(3)

The left- and right-hand sides of Eq. (3) represent the space average and time average of \(\xi _t\), respectively. The same type of assumption as Eq. (3) has also been considered in other studies [4, 42]. Under the above settings and assumptions, the Arnoldi and shift-invert Arnoldi approximations for the Perron–Frobenius operator K, defined as Eq. (2), are computed as explained in the following subsections.

2.3 The Arnoldi method

In this section, the Perron–Frobenius operator K is assumed to be bounded. Under the assumption (3), the following equation is derived since \(\varPhi\) is continuous:

$$\begin{aligned} \lim _{N\rightarrow \infty }K\varPhi (\mu _{t,N}^S)&=\varPhi (\mu _{t+1}^S)\ (t=0,\ldots ,m-1). \end{aligned}$$
(4)

Thus, if K is bounded, \(K\varPhi (\mu _{t}^S)=\varPhi (\mu _{t+1}^S)\) holds. Therefore, if the set of the vectors \(\{\varPhi (\mu _{0}^S),\ldots ,\varPhi (\mu _{m-1}^S)\}\) is linearly independent, the following space, denoted as \(\mathcal {K}_m(K,\varPhi (\mu _0^S))\), is an m-dimensional Krylov subspace of the operator K and vector \(\varPhi (\mu _0^S)\):

$$\begin{aligned} \mathcal {K}_m(K,\varPhi (\mu _0^S))={\text {Span}}\{\varPhi (\mu _{0}^S),\ldots , \varPhi (\mu _{m-1}^S)\}. \end{aligned}$$

Remark 2.1

If S depends on m, the initial vector \(\varPhi (\mu _{0}^S)\) depends on m. In this case, the inclusion \(\mathcal {K}_{m-1}(K,\varPhi (\mu _0^S))\subseteq \mathcal {K}_m(K,\varPhi (\mu _0^S))\) does not always hold. On the other hand, if S does not depend on m, \(\varPhi (\mu _{0}^S)\) does not depend on m and the inclusion \(\mathcal {K}_{m-1}(K,\varPhi (\mu _0^S))\subseteq \mathcal {K}_m(K,\varPhi (\mu _0^S))\) holds.

Let \(q_1,\ldots ,q_m\) be an orthonormal basis of the Krylov subspace \(\mathcal {K}_m(K,\varPhi (\mu _0^S))\) obtained through the Gram–Schmidt orthonormalization and \(Q_m:\mathbb {C}^m\rightarrow \mathcal {H}_k\) be defined as \([c_1,\ldots ,c_m]\mapsto \sum _{i=1}^m c_iq_i\). Note that \(Q_mQ_m^*\), where \(^*\) denotes the adjoint, is the orthogonal projection onto the Krylov subspace. There exists an invertible matrix \(\mathbf {R}_m\in \mathbb {C}^{m\times m}\) such that \([\varPhi (\mu _0^S),\ldots ,\varPhi (\mu _{m-1}^S)]=Q_m\mathbf {R}_m\). This makes it possible to compute the following Arnoldi approximation of \(K\phi (z)\) for an observation \(z\in \mathcal {X}\) only with the observed data \(\{x_0,x_1,\ldots \}\):

$$\begin{aligned} K\phi (z)&\approx Q_mQ_m^*KQ_mQ_m^*\phi (z) \\&=Q_mQ_m^*[\varPhi (\mu _1^S),\ldots ,\varPhi (\mu _{m}^S)]\mathbf {R}_m^{-1}Q_m^*\phi (z) \\&=Q_m\tilde{\mathbf {K}}_mQ_m^*\phi (z) , \end{aligned}$$

where \(\tilde{\mathbf {K}}_m=Q_m^*KQ_m=Q_m^*[\varPhi (\mu _1^S),\ldots ,\varPhi (\mu _{m}^S)]\mathbf {R}_m^{-1}\).
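With finite data, this approximation reduces entirely to kernel evaluations: the Gram matrix of \(\{\varPhi (\mu _{0,N}^S),\ldots ,\varPhi (\mu _{m-1,N}^S)\}\) yields \(\mathbf {R}_m\) via a Cholesky factorization (the matrix counterpart of Gram–Schmidt), after which \(\tilde{\mathbf {K}}_m\) and \(Q_m^*\phi (z)\) follow. The sketch below is a hypothetical realization under this representation, not the authors' reference implementation; it returns the coefficients of the approximation of \(K\phi (z)\) in the basis \(\{\varPhi (\mu _{t,N}^S)\}_{t=0}^{m-1}\):

```python
import numpy as np

def arnoldi_approx(xs, m, S, N, z, c=1.0):
    # Hypothetical sketch of the kernel Arnoldi approximation of K phi(z),
    # computed from the observed time series xs only.
    xs = np.asarray(xs)
    kernel = lambda x, y: np.exp(-c * (x - y) ** 2)  # Gaussian kernel
    idx = [np.arange(t, t + N * S, S) for t in range(m + 1)]
    ip = lambda a, b: kernel(xs[a][:, None], xs[b][None, :]).mean()
    # Gram matrix G[s, t] = <Phi(mu_s), Phi(mu_t)>; Gram-Schmidt amounts to
    # the Cholesky factorization G = R_m^* R_m (assumes linear independence)
    G = np.array([[ip(idx[s], idx[t]) for t in range(m)] for s in range(m)])
    R = np.linalg.cholesky(G).conj().T
    # K~_m = Q_m^* [Phi(mu_1), ..., Phi(mu_m)] R_m^{-1} = R_m^{-*} B R_m^{-1}
    B = np.array([[ip(idx[s], idx[t + 1]) for t in range(m)] for s in range(m)])
    Ktil = np.linalg.solve(R.conj().T, B) @ np.linalg.inv(R)
    # Q_m^* phi(z) via <Phi(mu_s), phi(z)> = (1/N) sum_i k(x_{s+iS}, z)
    g = np.array([kernel(xs[i], z).mean() for i in idx[:m]])
    w = np.linalg.solve(R.conj().T, g)
    return np.linalg.solve(R, Ktil @ w)  # coefficients in the Phi(mu_t) basis
```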

2.4 The shift-invert Arnoldi method

The convergence of the Arnoldi method as m increases is not always attained if K is unbounded [14]. According to Ikeda et al. [17], not all Perron–Frobenius operators are bounded. For this reason, the shift-invert Arnoldi method is also considered.

Let \(\gamma \notin \varLambda (K)\) be fixed, where \(\varLambda (K)\) is the spectrum of K under the assumption \(\varLambda (K)\ne \mathbb {C}\), and consider the bounded bijective operator \((\gamma I-K)^{-1}\) instead of K. Under the assumption (3), the following equation is derived:

$$\begin{aligned} \lim _{N\rightarrow \infty }(\gamma I-K)^{-1}u_{t+1,N}^S&=u_t^S, \end{aligned}$$
(5)

where \(u_{t,N}^S=\sum _{i=0}^{t}\left( {\begin{array}{c}t\\ i\end{array}}\right) (-1)^{i}\gamma ^{t-i}\varPhi (\mu _{i,N}^S)\) and \(u_{t}^S=\sum _{i=0}^{t}\left( {\begin{array}{c}t\\ i\end{array}}\right) (-1)^{i}\gamma ^{t-i}\varPhi (\mu _{i}^S)\). Since \((\gamma I-K)^{-1}\) is bounded, \((\gamma I-K)^{-1}u_{t+1}^S=u_t^S\) holds. Therefore, if the set of vectors \(\{u_1^S,\ldots ,u_m^S\}\) is linearly independent, the space spanned by \(\{u_1^S,\ldots ,u_m^S\}\) is an m-dimensional Krylov subspace of the operator \((\gamma I-K)^{-1}\) and vector \(u_m^S\). As in the Arnoldi method, let \(q_1,\ldots ,q_m\) be an orthonormal basis of the Krylov subspace \(\mathcal {K}_m((\gamma I-K)^{-1},u_m^S)\) obtained through the Gram–Schmidt orthonormalization and \(Q_m:\mathbb {C}^m\rightarrow \mathcal {H}_k\) be defined as \([c_1,\ldots ,c_m]\mapsto \sum _{i=1}^m c_iq_i\). There exists an invertible matrix \(\mathbf {R}_m\in \mathbb {C}^{m\times m}\) satisfying \([u_1^S,\ldots ,u_m^S]=Q_m\mathbf {R}_m\). If K is unbounded, Kv for \(v\in \mathcal {H}_k\) is not always defined. However, if \(v\in \varPhi (\mathcal {M}(\mathcal {X}))\), then Kv is defined, in which case Kv is represented as \((\gamma I-((\gamma I-K)^{-1})^{-1})v\). On the basis of this observation, the following shift-invert Arnoldi approximation of \(K\phi (z)\) for \(z\in \mathcal {X}\) is deduced if \(\tilde{\mathbf {L}}_m\), defined as \(\tilde{\mathbf {L}}_m=Q_m^*(\gamma I-K)^{-1}Q_m=Q_m^*[u_0^S,\ldots ,u_{m-1}^S]\mathbf {R}_m^{-1}\), is invertible:

$$\begin{aligned} K\phi (z)&\approx Q_mf_{\gamma }(Q_m^*(\gamma I-K)^{-1}Q_m)Q_m^*\phi (z) \\&=Q_mf_{\gamma }(Q_m^*[u_0^S,\ldots ,u_{m-1}^S]\mathbf {R}_m^{-1})Q_m^*\phi (z) \\&=Q_m\tilde{\mathbf {K}}_mQ_m^*\phi (z), \end{aligned}$$

where \(f_{\gamma }(z)=\gamma -z^{-1}\) for \(z\in \mathbb {C}\) and \(\tilde{\mathbf {K}}_m=f_{\gamma }(\tilde{\mathbf {L}}_m)\).
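A corresponding sketch for the shift-invert Arnoldi method (again hypothetical; the binomial coefficients come from the definition of \(u_{t,N}^S\)) assembles the Gram matrix of the vectors \(u_t\), extracts \(\tilde{\mathbf {L}}_m\), and applies \(f_{\gamma }\):

```python
import numpy as np
from math import comb

def sia_reduced_matrices(xs, m, S, N, gamma, c=1.0):
    # Hypothetical sketch: reduced matrices L~_m and K~_m = f_gamma(L~_m)
    # for the shift-invert Arnoldi method, from the time series xs only.
    xs = np.asarray(xs)
    kernel = lambda x, y: np.exp(-c * (x - y) ** 2)  # Gaussian kernel
    idx = [np.arange(t, t + N * S, S) for t in range(m + 1)]
    ip = lambda a, b: kernel(xs[a][:, None], xs[b][None, :]).mean()
    M = np.array([[ip(idx[s], idx[t]) for t in range(m + 1)]
                  for s in range(m + 1)])  # M[s, t] = <Phi(mu_s), Phi(mu_t)>
    # u_t = sum_i binom(t, i) (-1)^i gamma^(t-i) Phi(mu_i)
    C = np.array([[comb(t, i) * (-1) ** i * gamma ** (t - i) if i <= t else 0.0
                   for i in range(m + 1)] for t in range(m + 1)], dtype=complex)
    U = C.conj() @ M @ C.T                      # U[s, t] = <u_s, u_t>
    R = np.linalg.cholesky(U[1:, 1:]).conj().T  # [u_1, ..., u_m] = Q_m R_m
    B = U[1:, :m]                               # B[s, t] = <u_{s+1}, u_t>
    Ltil = np.linalg.solve(R.conj().T, B) @ np.linalg.inv(R)
    Ktil = gamma * np.eye(m) - np.linalg.inv(Ltil)  # f_gamma(L~_m)
    return Ltil, Ktil
```

The approximation of \(K\phi (z)\) is then assembled exactly as in the Arnoldi sketch, with this \(\tilde{\mathbf {K}}_m\) in place of the Arnoldi reduced matrix.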

3 Convergence analysis

In this section, we provide a convergence analysis of the Arnoldi method and shift-invert Arnoldi method described in Sect. 2. The problem is generalized to a separable complex Hilbert space \(\mathcal {H}\) and a linear operator K on \(\mathcal {H}\) by setting \(v=\phi (z)\), \(v_0=\varPhi (\mu _0^S)\), and \(v_i=K^iv_0\) for \(i=1,\ldots ,m\).

In Sect. 3.1, we generalize the problem. In Sect. 3.2, we define a residual of an approximation of \(K^nv\). Then, we investigate the relationship between the two methods and the residuals in Sects. 3.3 and 3.4.

3.1 The general setting for Krylov subspace methods for estimating operator-vector multiplications

Let \(\mathcal {H}\) be a separable complex Hilbert space, let \(K:\mathcal {D}\rightarrow \mathcal {H}\) be an unknown linear map, where \(\mathcal {D}\) is a dense subset of \(\mathcal {H}\), and let v and \(v_0\) be given vectors in \(\mathcal {H}\). We assume \(K^iv_0\in \mathcal {D}\) for any natural number i; this holds for the Perron–Frobenius operator K with \(\mathcal {D}=\varPhi (\mathcal {M}(\mathcal {X}))\) since \(Kv\in \mathcal {D}\) by the definition (2). The purpose of the Krylov subspace method is to estimate \(K^nv\) only with \(v,v_0,\ldots ,v_m\), where \(v_i=K^iv_0\).

Assume the dimension of \(\mathcal {K}_m(K,v_0)\) is m. Let \({q_1,\ldots ,q_m}\) be an orthonormal basis of the Krylov subspace \(\mathcal {K}_m(K,v_0)\) obtained through the Gram–Schmidt orthonormalization and \(Q_m:\mathbb {C}^m\rightarrow \mathcal {H}\) be defined as \([c_1,\ldots ,c_m]\mapsto \sum _{i=1}^m c_iq_i\). Then, there exists an invertible matrix \(\mathbf {R}_m\in \mathbb {C}^{m\times m}\) such that \([v_0,\ldots ,v_{m-1}]=Q_m\mathbf {R}_m\). The Arnoldi approximation of \(K^nv\), which is denoted as \(a_m^{{\text {Arnoldi}}}\), is defined as

$$\begin{aligned} a_m^{{\text {Arnoldi}}}=Q_m\tilde{\mathbf {K}}_m^n Q_m^*v, \end{aligned}$$

where \(\tilde{\mathbf {K}}_m=Q_m^*KQ_m\), which can be represented as \(Q_m^*[v_1,\ldots ,v_m]\mathbf {R}_m^{-1}\).

Analogously, let \(\gamma \notin \varLambda (K)\), let \({q_1,\ldots ,q_m}\) be an orthonormal basis of the Krylov subspace \(\mathcal {K}_m((\gamma I-K)^{-1},u_m)\) obtained through the Gram–Schmidt orthonormalization, where \(u_m=\sum _{i=0}^{m}\left( {\begin{array}{c}m\\ i\end{array}}\right) (-1)^{i}\gamma ^{m-i}v_i\), and let \(Q_m:\mathbb {C}^m\rightarrow \mathcal {H}\) be defined as \([c_1,\ldots ,c_m]\mapsto \sum _{i=1}^m c_iq_i\). Then, there exists an invertible matrix \(\mathbf {R}_m\in \mathbb {C}^{m\times m}\) such that \([u_1,\ldots ,u_{m}]=Q_m\mathbf {R}_m\). Let \(\tilde{\mathbf {L}}_m=Q_m^*(\gamma I-K)^{-1}Q_m\), which can be represented as \(Q_m^*[u_0,\ldots ,u_{m-1}]\mathbf {R}_m^{-1}\). If \(\tilde{\mathbf {L}}_m\) is invertible, the shift-invert Arnoldi approximation of \(K^nv\), which is denoted as \(a_m^{{\text {SIA}}}\), is defined as

$$\begin{aligned} a_m^{{\text {SIA}}}=Q_m\tilde{\mathbf {K}}_m^n Q_m^*v, \end{aligned}$$

where \(\tilde{\mathbf {K}}_m=f_{\gamma }(\tilde{\mathbf {L}}_m)\) and \(f_{\gamma }(z)=\gamma -z^{-1}\). Note that this expression of \(\tilde{\mathbf {K}}_m\) for the shift-invert Arnoldi method differs from that for the Arnoldi method.

3.2 A residual of an approximation of operator-vector multiplication

Assume \(0\notin \varLambda (K)\). We define a residual of an approximation \(a_m\) of \(K^nv\) as follows:

$$\begin{aligned} {\text {res}}(a_m)=v-K^{-n}a_m. \end{aligned}$$
(6)

Although the approximation error \(K^nv-a_m\) is generally not available since \(K^nv\) is unknown, \(K^{-n}a_m\) is available in some cases. For example, if K is a Perron–Frobenius operator and we know past observations \(x_{-1},\ldots ,x_{-n}\), then we can calculate \(K^{-n}\varPhi (\mu _t^S)\) for \(t=0,\ldots ,m-1\), and hence also \(K^{-n}a_m^{{\text {Arnoldi}}}\). In fact, the residual (6) is a reasonable criterion for evaluating the convergence of the approximation for two reasons. First, the residual of an approximation \(a_m\) of the solution of a linear equation \(Ax=b\) is defined as \(b-Aa_m\). If the problem of approximating \(K^nv\) is regarded as that of solving \(K^{-n}x=v\), the residual of the approximation \(a_m\) is \(v-K^{-n}a_m\). Second, the following proposition shows that the value \(v-K^{-n}a_m\) can be decomposed into a generalized residual of the Krylov approximation, as proposed by Saad [34] and Hochbruck et al. [16], and the error of projecting v onto a Krylov subspace.

Proposition 3.1

Assume \(0\notin \varLambda (K)\). Let \(a_m=Q_m\tilde{\mathbf {K}}_m^nQ_m^*v\) be the Arnoldi or shift-invert Arnoldi approximation of \(K^nv\) and let \(f(z)=z^{-1}\). In addition, let \(r_m\) be the generalized residual of \(a_m\) with respect to \(f(K^{-n})v\), i.e.,

$$\begin{aligned} r_m=\frac{1}{2\pi \mathrm {i}}\int _{z\in \varGamma }f(z)\big ((zI-K^{-n})Q_m(zI-\tilde{\mathbf {K}}_m^{-n})^{-1}Q_m^*v-v\big )dz, \end{aligned}$$

where \(\mathrm {i}\) is the imaginary unit and \(\varGamma\) is a rectifiable Jordan curve enclosing \(\varLambda (\tilde{\mathbf {K}}_m^{-n})\) but not enclosing 0. Then, the residual of \(a_m\) defined as (6) is decomposed as follows:

$$\begin{aligned} {\text {res}}(a_m)=r_m+\left( v-Q_mQ_m^*v\right) . \end{aligned}$$

Proof

Since \(0\notin \varLambda (K)\) and \(\varLambda (\tilde{\mathbf {K}}_m)=\varLambda (Q_m^*KQ_m)\subseteq \varLambda (K)\) hold, we have \(0\notin \varLambda (\tilde{\mathbf {K}}_m)\). Thus, we obtain \(0\notin \varLambda (\tilde{\mathbf {K}}_m^{-n})\), and there exists a rectifiable Jordan curve \(\varGamma\) enclosing \(\varLambda (\tilde{\mathbf {K}}_m^{-n})\) but not 0 such that f is holomorphic in the region enclosed by \(\varGamma\) and continuous on \(\varGamma\). Therefore, \(\int _{z\in \varGamma }f(z)dz=0\), and by Cauchy's integral formula, the following equalities are derived:

$$\begin{aligned} r_m&=\frac{1}{2\pi \mathrm {i}}\int _{z\in \varGamma }f(z)\big ((zI-K^{-n})Q_m(zI-\tilde{\mathbf {K}}_m^{-n})^{-1}Q_m^*v-v\big )dz\\&=Q_mf(\tilde{\mathbf {K}}_m^{-n})\tilde{\mathbf {K}}_m^{-n}Q_m^*v-K^{-n}Q_mf(\tilde{\mathbf {K}}_m^{-n})Q_m^*v\\&=Q_mQ_m^*v-K^{-n}a_m, \end{aligned}$$

which completes the proof of the proposition. \(\square\)
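Proposition 3.1 can be checked numerically in a finite-dimensional toy setting. The following sketch (our own illustration, not from the paper) takes a random matrix K with spectrum near 1, so that a circle around 1 encloses \(\varLambda (\tilde{\mathbf {K}}_m^{-n})\) but not 0, approximates \(r_m\) by discretizing the contour integral with the trapezoidal rule, and confirms the decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 20, 5, 2
K = np.eye(d) + 0.02 * rng.standard_normal((d, d))  # invertible, spectrum near 1
v0, v = rng.standard_normal(d), rng.standard_normal(d)

# Orthonormal basis of K_m(K, v0) and the Arnoldi approximation a_m of K^n v
V = np.column_stack([np.linalg.matrix_power(K, i) @ v0 for i in range(m)])
Q, _ = np.linalg.qr(V)
Ktil = Q.T @ K @ Q
a_m = Q @ np.linalg.matrix_power(Ktil, n) @ (Q.T @ v)
res = v - np.linalg.matrix_power(K, -n) @ a_m       # res(a_m) = v - K^{-n} a_m

# Generalized residual r_m via the contour integral with f(z) = 1/z; Gamma is
# a circle that encloses the spectrum of Ktil^{-n} but leaves 0 outside
M = 400
theta = np.linspace(0.0, 2 * np.pi, M, endpoint=False)
zs = 1.0 + 0.9 * np.exp(1j * theta)
dzs = 0.9j * np.exp(1j * theta)                     # z'(theta)
Kinv_n = np.linalg.matrix_power(K, -n)
Ktilinv_n = np.linalg.matrix_power(Ktil, -n)
g = [(1 / z) * ((z * np.eye(d) - Kinv_n) @ Q
                @ np.linalg.inv(z * np.eye(m) - Ktilinv_n) @ (Q.T @ v) - v)
     for z in zs]
r_m = sum(gk * dz for gk, dz in zip(g, dzs)) / (1j * M)

# Proposition 3.1: res(a_m) = r_m + (v - Q Q^* v)
print(np.allclose(res, r_m + (v - Q @ (Q.T @ v))))  # expected: True
```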

3.3 Convergence analysis for the Arnoldi method

In this section, we assume K is bounded and \(0\notin \varLambda (K)\).

The Arnoldi approximation \(a_m^{{\text {Arnoldi}}}=Q_m\tilde{\mathbf {K}}_m^n Q_m^*v\) is obtained through two projections. First, the vector \(v\in \mathcal {H}\) is projected onto the Krylov subspace \(\mathcal {K}_m(K,v_0)\). Then, K acts on the projected vector in \(\mathcal {K}_m(K,v_0)\), and the result is projected back onto the Krylov subspace; this is repeated n times. Note that we do not need the first projection in the classical Krylov subspace method for approximating f(A)v for a given linear operator A, vector v, and function f, since we can compute \(A^iv\) for \(i=1,\ldots ,m-1\) and construct the Krylov subspace of A and v. On the other hand, we cannot construct the Krylov subspace of K and v in our current case, since K is unknown and only \(K^iv_0\) for \(i=1,\ldots ,m-1\), not \(K^iv\), and a vector \(v_0\) are given. This prevents us from directly evaluating the convergence speed of the approximation error or residual, since the convergence speed of the approximation depends on how fast the projected vector \(Q_mQ_m^*v\) approaches the original vector v. Therefore, we first consider the minimizer of the residual in a Krylov subspace and evaluate the difference between the Arnoldi approximation and this minimizer.

In fact, since the projection \(Q_mQ_m^*\) is orthogonal, the projected vector \(Q_mQ_m^*v\) minimizes the difference from the original vector v, i.e.,

$$\begin{aligned}&\mathrm{arg\,min}_{u\in \mathcal {K}_m(K,v_0)}\Vert v-u\Vert =Q_mQ_m^*v. \end{aligned}$$
(7)

Since each \(u\in \mathcal {K}_m(K,v_n)\) satisfies \(K^{-n}u\in \mathcal {K}_m(K,v_0)\), Eq. (7) implies that the inequality \(\Vert v-K^{-n}\tilde{a}_m\Vert \le \Vert v-K^{-n}u\Vert\) holds, where \(\tilde{a}_m=K^{n}Q_mQ_m^*v\in \mathcal {K}_m(K,v_n)\). Therefore, \(\tilde{a}_m\) minimizes \(\Vert v-K^{-n}u\Vert\) for all \(u\in \mathcal {K}_m(K,v_n)\), i.e.,

$$\begin{aligned} \mathrm{arg\,min}_{u\in \mathcal {K}_m(K,v_n)}\Vert v-K^{-n}u\Vert =\tilde{a}_m. \end{aligned}$$

However, in practice, \(\mathcal {K}_m(K,v_n)\) is not available from \(v,v_0,\ldots ,v_m\) alone, so \(\tilde{a}_m\) is also unavailable. Thus, \(a_m^{{\text {Arnoldi}}}\) is used instead of \(\tilde{a}_m\) for estimating \(K^nv\).

We evaluate the difference between \(a_m^{{\text {Arnoldi}}}\) and \(\tilde{a}_m\). Let \({\mathbb {D}_{\rho }}=\{z\in \mathbb {C}\mid \vert z\vert \le \rho \}\) be the closed disk of radius \(\rho >0\), let \(\mathcal {W}(K)=\{\left\langle v,Kv\right\rangle \mid \ v\in {\mathcal {D}},\ \Vert v\Vert =1\}\) be the numerical range of K, and let \(\overline{\mathbb {C}}=\mathbb {C}\cup \{\infty \}\) be the extended complex plane. Moreover, let \(\alpha _{\rho }\) be a conformal map from \(\overline{\mathbb {C}}{\setminus }\overline{\mathcal {W}(K)}\) to \(\overline{\mathbb {C}}{\setminus } \mathbb {D}_{\rho }\) that satisfies \(\alpha _{\rho }(\infty )=\infty\) and \(\lim _{z\rightarrow \infty }\alpha _{\rho }(z)/z=1\), and let \(\varGamma _r\) be the region enclosed by the contour \(\{z\in \mathbb {C}\mid \vert \alpha _{\rho }(z)\vert =r\}\) for \(r>\rho\). Here, \(\overline{\mathcal {W}(K)}\) is the closure of \({\mathcal {W}(K)}\), and the map \(\alpha _{\rho }\) exists by the Riemann mapping theorem. For example, if \(\overline{\mathcal {W}(K)}=\mathbb {D}_{\rho }\), then \(\alpha _{\rho }\) is the identity map and \(\varGamma _r\) is the disk of radius r. The following theorem is deduced.

Theorem 3.2

Let \(n<m\), and let \(p_{m-n-1}\) and \({\tilde{p}}_{n-1}\) be polynomials of order \(m-n-1\) and \(n-1\) that satisfy \(a_m^{{\text {Arnoldi}}}=K^np_{m-n-1}(K)v_0+{\tilde{p}}_{n-1}(K)v_0\). Assume the set \(\{v_0,\ldots ,v_{m-1}\}\) is linearly independent. If the function \(f_m\) defined as \(f_m(z)=z^{-n}{\tilde{p}}_{n-1}(z)\) is holomorphic in \(\varGamma _r\), the residual of \(a_m^{{\text {Arnoldi}}}\) is evaluated as follows:

$$\begin{aligned}&\Vert {\text {res}}(a_m^{{\text {Arnoldi}}})-{\text {res}}(\tilde{a}_m)\Vert \le 2C_1C_2(m)\Vert v_0\Vert \frac{(\rho /r)^m}{1-(\rho /r)}{,} \end{aligned}$$

where \(C_1>0\) is a constant and \(C_2(m)>0\) depends on m.

We use the following lemma for deriving Theorem 3.2.

Lemma 3.3

Let \(n<m\). Assume \(0\notin \mathcal {W}(K)\) and the set \(\{v_0,\ldots ,v_{m-1}\}\) is linearly independent. Then, the following equality is deduced:

$$\begin{aligned}&{\text {res}}(a_m^{{\text {Arnoldi}}})-{\text {res}}(\tilde{a}_m)={Q_m\tilde{\mathbf {K}}_m^{-n}{\tilde{p}}_{n-1}(\tilde{\mathbf {K}}_m)Q_m^*v_0-K^{-n}{\tilde{p}}_{n-1}(K)v_0}. \end{aligned}$$
(8)

Proof

The identity \(p(K)v_0=Q_mp(\tilde{\mathbf {K}}_m)Q_m^*v_0\) holds for any polynomial p of an order less than or equal to \(m-1\). In addition, by the assumption of \(0\notin \mathcal {W}(K)\) and the inclusion \(\mathcal {W}(\tilde{\mathbf {K}}_m)\subseteq \mathcal {W}(K)\), \(\tilde{\mathbf {K}}_m\) is invertible. As a result, the following equalities are derived:

$$\begin{aligned}&{\text {res}}(a_m^{{\text {Arnoldi}}})-{\text {res}}(\tilde{a}_m)=K^{-n}\tilde{a}_m-K^{-n}a_m^{{\text {Arnoldi}}}\\&\quad =Q_m\tilde{\mathbf {K}}_m^{-n}\tilde{\mathbf {K}}_m^nQ_m^*v-\left( p_{m-n-1}(K)v_0+K^{-n}{\tilde{p}}_{n-1}(K)v_0\right) \\&\quad =Q_m\tilde{\mathbf {K}}_m^{-n}Q_m^*\left( K^np_{m-n-1}(K)v_0+{\tilde{p}}_{n-1}(K)v_0\right) \\&\qquad -\left( p_{m-n-1}(K)v_0+K^{-n}{\tilde{p}}_{n-1}(K)v_0\right) \\&\quad =Q_m\tilde{\mathbf {K}}_m^{-n}\left( \tilde{\mathbf {K}}_m^np_{m-n-1}(\tilde{\mathbf {K}}_m)Q_m^*v_0+{\tilde{p}}_{n-1}(\tilde{\mathbf {K}}_m)Q_m^*v_0\right) \\&\qquad -\left( {Q_m}p_{m-n-1}(\tilde{\mathbf {K}}_m)Q_m^*v_0+K^{-n}{\tilde{p}}_{n-1}(K)v_0\right) \\&\quad =Q_m\tilde{\mathbf {K}}_m^{-n}{\tilde{p}}_{n-1}(\tilde{\mathbf {K}}_m)Q_m^*v_0-K^{-n}{\tilde{p}}_{n-1}(K)v_0, \end{aligned}$$

which completes the proof of the lemma. \(\square\)

The vector \(Q_m\tilde{\mathbf {K}}_m^{-n}{\tilde{p}}_{n-1}(\tilde{\mathbf {K}}_m)Q_m^*v_0\) in the right-hand side of Eq. (8) is equivalent to the Arnoldi approximation of the operator function \(f_m(K)\) acting on the vector \(v_0\) [12, 15]. Note that since \({\tilde{p}}_{n-1}\) depends on m, \(f_m\) depends on m. By using this fact, we now prove Theorem 3.2.

Proof

(Proof of Theorem 3.2) Let \(\mathcal {P}_{m-1}\) be the set of all polynomials of orders less than or equal to \(m-1\). By Crouzeix et al. [5], the following bound is deduced for any \(p\in \mathcal {P}_{m-1}\):

$$\begin{aligned}&\Vert {\text {res}}(a_m^{{\text {Arnoldi}}})-{\text {res}}(\tilde{a}_m)\Vert =\Vert Q_mf_m(\tilde{\mathbf {K}}_m)Q_m^*v_0- f_m(K)v_0\Vert \\&\quad \le \Vert Q_mf_m(\tilde{\mathbf {K}}_m)Q_m^*v_0-Q_mp(\tilde{\mathbf {K}}_m)Q_m^*v_0\Vert {+}\Vert f_m(K)v_0-p(K){v_0}\Vert \\&\quad \le 2C_1\Vert v_0\Vert \Vert f_m-p\Vert _{\infty ,\mathcal {W}(K)}, \end{aligned}$$

where \(0<C_1\le 1+\sqrt{2}\). Here, for a linear operator K and a map \(f:\mathbb {C}\rightarrow \mathbb {C}\) that is holomorphic in the interior of \(\mathcal {W}(K)\) and continuous on \(\overline{\mathcal {W}(K)}\), the norm \(\Vert f\Vert _{\infty ,\mathcal {W}(K)}\) is defined as \(\Vert f\Vert _{\infty ,\mathcal {W}(K)}=\sup _{z\in \mathcal {W}(K)}\vert f(z)\vert\). By taking the infimum over \(p\in \mathcal {P}_{m-1}\), we obtain

$$\begin{aligned} \Vert {\text {res}}(a_m^{{\text {Arnoldi}}})-{\text {res}}(\tilde{a}_m)\Vert&\le 2C_1\Vert v_0\Vert \inf _{p\in \mathcal {P}_{m-1}}\Vert f_m-p\Vert _{\infty ,\mathcal {W}(K)}. \end{aligned}$$
(9)

In fact, the infimum in the inequality (9) can be taken over \(p\in \{\tilde{p}\in \mathcal {P}_{m-1}\mid \ \Vert \tilde{p}\Vert _{\infty ,\mathcal {W}(K)}\le 2\Vert f_m\Vert _{\infty ,\mathcal {W}(K)}\}\), which is a compact set. Indeed, for a polynomial \(p\in \mathcal {P}_{m-1}\) satisfying \(\Vert p\Vert _{\infty ,\mathcal {W}(K)}>2\Vert f_m\Vert _{\infty ,\mathcal {W}(K)}\), we have

$$\begin{aligned} \Vert f_m-p\Vert _{\infty ,\mathcal {W}(K)}>\Vert f_m\Vert _{\infty ,\mathcal {W}(K)}=\Vert f_m-0\Vert _{\infty ,\mathcal {W}(K)}, \end{aligned}$$

and \(0\in \mathcal {P}_{m-1}\). Therefore, the infimum in the inequality (9) can be replaced with a minimum. By Ellacott [7, Corollary 2.2], this minimum is bounded as

$$\begin{aligned} \min _{p\in \mathcal {P}_{m-1}}\Vert f_m-p\Vert _{\infty ,\mathcal {W}(K)}\le C_2(m)\frac{(\rho /r)^m}{1-(\rho /r)}, \end{aligned}$$

where \(C_2(m)=\max _{z\in \varGamma _r}\vert f_m(z)\vert\), which completes the proof of the theorem. \(\square\)

In fact, the following proposition guarantees that the growth of the factor \(C_2(m)\) is at most linear in m in the case where the initial vector \(v_0\) does not depend on m (see Remark 2.1). Thus, in this case, the Arnoldi approximation \(a_m^{{\text {Arnoldi}}}\) approaches \(\tilde{a}_m\), the minimizer of the residual, at the rate \(m\alpha ^m\) for some \(0<\alpha <1\).

Proposition 3.4

Assume the set \(\{v_0,\ldots ,v_{m-1}\}\) is linearly independent and \(v_0\) does not depend on m. If the function \(f_m\) is holomorphic in \(\varGamma _r\), then the factor \(C_2(m)\) is bounded as

$$\begin{aligned} C_2(m)\le C_2(1)+(m-1)C_3, \end{aligned}$$

for some constant \(C_3>0\) that does not depend on m.

Proof

We first evaluate the coefficients of the polynomial \({\tilde{p}_{n-1}}\). Since \(p_{m-n-1}\) and \({\tilde{p}_{n-1}}\) depend on m, we denote them as \(p_{m-n-1}^m\) and \({\tilde{p}_{n-1}}^m\) in this proof. Let \({\tilde{p}_{n-1}}^m(z)=\sum _{i=0}^{n-1}c_i(m)z^i\) and \(p_{m-n-1}^m(z)=\sum _{i=n}^{m-1}c_i(m)z^{i-n}\), where \(c_i(m)\in \mathbb {C}\). Then, by the definitions of \({\tilde{p}_{n-1}}^m\) and \(p_{m-n-1}^m\), we have

$$\begin{aligned}&\Vert a_m^{{\text {Arnoldi}}}-a_{m-1}^{{\text {Arnoldi}}}\Vert =\bigg \Vert \sum _{j=0}^{m-1}c_j(m)v_j-\sum _{j=0}^{m-2}c_j(m-1)v_j\bigg \Vert \\&\ \ge \bigg \vert \bigg \langle \tilde{q}_i,\bigg (\sum _{j=0}^{m-1}c_j(m)v_j-\sum _{j=0}^{m-2}c_j(m-1)v_j\bigg )\bigg \rangle \bigg \vert =\vert \langle \tilde{q}_i,v_i\rangle \vert \vert c_i(m)-c_i(m-1)\vert , \end{aligned}$$

for \(i=0,\ldots ,n-1\), where \(\tilde{q}_i\) is a normalized vector in the orthogonal complement of the space spanned by \(\{v_0,\ldots ,v_{i-1},v_{i+1},\ldots ,v_{m-1}\}\). In addition, we have

$$\begin{aligned}&\Vert a_m^{{\text {Arnoldi}}}-a_{m-1}^{{\text {Arnoldi}}}\Vert =\Vert Q_m\tilde{\mathbf {K}}_m^nQ_m^*v-Q_{m-1}\tilde{\mathbf {K}}_{m-1}^nQ_{m-1}^*v\Vert \le 2\Vert K^n\Vert \Vert v\Vert . \end{aligned}$$

As a result, the following inequality is derived:

$$\begin{aligned} \vert c_i(m)-c_i(m-1)\vert \le \frac{2\Vert K^n\Vert \Vert v\Vert }{\vert \langle \tilde{q}_i,v_i\rangle \vert }. \end{aligned}$$
(10)

We now evaluate \(C_2(m)\). By the inequality (10) and the holomorphicity of \(f_m\), we obtain

$$\begin{aligned} C_2(m)&=\sup _{z\in \varGamma _r}\vert z^{-n}{\tilde{p}_{n-1}}^m(z)\vert \nonumber \\&\le \sup _{z\in \varGamma _r}\vert z^{-n}{\tilde{p}_{n-1}}^m(z)-z^{-n}{\tilde{p}_{n-1}}^{m-1}(z)\vert +\sup _{z\in \varGamma _r}\vert z^{-n}{\tilde{p}_{n-1}}^{m-1}(z)\vert \nonumber \\&=\sup _{z\in \varGamma _r}\bigg \vert \sum _{i=0}^{n-1}z^{-n+i}(c_i(m)-c_i(m-1))\bigg \vert +C_2(m-1) \nonumber \\&\le C_3+C_2(m-1), \end{aligned}$$
(11)

where

$$\begin{aligned} C_3=\sum _{i=0}^{n-1}\frac{2\Vert K^n\Vert \Vert v\Vert }{\vert \langle \tilde{q}_i,v_i\rangle \vert }\sup _{z\in \varGamma _r}\vert z^{-n+i}\vert . \end{aligned}$$

Applying the inequality (11) recursively completes the proof of the proposition. \(\square\)

The decrease in the value \(\Vert {\text {res}}(a_m^{{\text {Arnoldi}}})-{\text {res}}(\tilde{a}_m)\Vert\) is confirmed numerically in Sect. 4.2.

3.4 Convergence analysis for the shift-invert Arnoldi method

The convergence of the Arnoldi method is not guaranteed when K is unbounded. Moreover, although Theorem 3.2 requires an assumption on the numerical range of K, it is generally hard to compute the numerical range of a linear operator in an infinite dimensional space. Therefore, we also consider the shift-invert Arnoldi method.

The shift-invert Arnoldi approximation \(a_m^{{\text {SIA}}}=Q_m\tilde{\mathbf {K}}^n_m Q_m^*v\) is also obtained through two projections, similar to the Arnoldi method. However, in this case, instead of K itself, a polynomial in \((\gamma I-K)^{-1}\) that approximates K acts on the projection of v onto \(\mathcal {K}_m((\gamma I-K)^{-1},u_m)\).

Let \(n<m\). To address \(K^{-n}\) in the residual, we slightly modify the Krylov subspace \(\mathcal {K}_m((\gamma I-K)^{-1},u_m)\) and define a space \(\tilde{\mathcal {K}}_m((\gamma I-K)^{-1},w_{m-n})\) as follows:

$$\begin{aligned}&\tilde{\mathcal {K}}_m((\gamma I-K)^{-1},w_{m-n})\\&\quad {:}={\text {Span}}\{w_1,\ldots ,w_{m-n},K^{-1}w_{m-n},\ldots ,K^{-n}w_{m-n}\}\\&\quad ={\text {Span}}\{(\gamma I-K)^{-m+n+1}w_{m-n},\ldots ,(\gamma I-K)^{-1}w_{m-n},\\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad w_{m-n}, K^{-1}w_{m-n},\ldots ,K^{-n}w_{m-n}\}\\&\quad ={\text {Span}}\{(\gamma I-K)^{-m+n+1}K^{-n}w_{m-n},\ldots ,(\gamma I-K)^{-1}K^{-n}w_{m-n},\\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad w_{m-n}, K^{-1}w_{m-n},\ldots ,K^{-n}w_{m-n}\}\\&\quad ={\text {Span}}\{K^{-n}w_1,\ldots ,K^{-n}w_{m-n-1},w_{m-n},K^{-1}w_{m-n},\ldots ,K^{-n}w_{m-n}\}, \end{aligned}$$

where \(w_t=\sum _{i=0}^t\left( {\begin{array}{c}t\\ i\end{array}}\right) (-1)^{i}\gamma ^{t-i}v_{i+n}\); note that \((\gamma I-K)^{-1}w_{t+1}=w_t\) holds. The third equality is obtained because, by partial fraction decomposition, \((\gamma I-K)^iK^{-n}w_{m-n}\) for \(i=-1,\ldots ,-m+n+1\) is represented as a linear combination of \((\gamma I-K)^{i}w_{m-n},\ldots ,(\gamma I-K)^{-1}w_{m-n},K^{-n}w_{m-n},\ldots ,K^{-1}w_{m-n}\), in which the coefficient of \((\gamma I-K)^{i}w_{m-n}\) is not 0. Assume the dimension of \(\tilde{\mathcal {K}}_m((\gamma I-K)^{-1},w_{m-n})\) is m. Let \(q_1,\ldots ,q_m\) be the orthonormal basis of the space \(\tilde{\mathcal {K}}_m((\gamma I-K)^{-1},w_{m-n})\) obtained through the Gram–Schmidt orthonormalization and let \(Q_m:\mathbb {C}^m\rightarrow \mathcal {H}\) be defined as \([c_1,\ldots ,c_m]\mapsto \sum _{i=1}^m c_iq_i\). Then, there exists an invertible matrix \(\mathbf {R}_m\in \mathbb {C}^{m\times m}\) such that \([w_1,\ldots ,w_{m-n},K^{-1}w_{m-n},\ldots ,K^{-n}w_{m-n}]=Q_m\mathbf {R}_m\). Let \(a_m^{{\text {SIA}}}=Q_m\tilde{\mathbf {K}}_m^nQ_m^*v\), where \(\tilde{\mathbf {K}}_m=f_{\gamma }(\tilde{\mathbf {L}}_m)\) and \(\tilde{\mathbf {L}}_m=Q_m^*(\gamma I-K)^{-1}Q_m\).

Since the projection \(Q_mQ_m^*\) is orthogonal, the projected vector \(Q_mQ_m^*v\) minimizes the difference from the original vector v, i.e.,

$$\begin{aligned}&\mathrm{arg\,min}_{u\in \tilde{\mathcal {K}}_m((\gamma I-K)^{-1},{w_{m-n}})}\Vert v-u\Vert =Q_mQ_m^*v. \end{aligned}$$
(12)

Since \((\gamma I-K)^{i}K^{-n}w_{m-n}\) can be represented as a linear combination of \(K^{-n+i}w_{m-n},\ldots ,K^{-n}w_{m-n}\), for \(i=1,\ldots ,n\), we have

$$\begin{aligned} K^{-n}w_{m-n+i}=(\gamma I-K)^{i}K^{-n}w_{m-n}\in \tilde{\mathcal {K}}_m((\gamma I-K)^{-1},w_{m-n}). \end{aligned}$$

Thus, each \(u\in {\text {Span}}\{w_1,\ldots ,w_m\}\) satisfies \(K^{-n}u\in \tilde{\mathcal {K}}_m((\gamma I-K)^{-1},w_{m-n})\). Therefore, Eq. (12) implies that \(\Vert v-K^{-n}\tilde{a}_m\Vert \le \Vert v-K^{-n}u\Vert\) holds, where \(\tilde{a}_m=K^nQ_mQ_m^*v\). As a result, \(\tilde{a}_m\) minimizes \(\Vert v-K^{-n}u\Vert\) for all \(u\in {\text {Span}}\{w_1,\ldots ,w_m\}\), i.e.,

$$\begin{aligned} \mathrm{arg\,min}_{u\in {\text {Span}}\{w_1,\ldots ,w_m\}}\Vert v-K^{-n}u\Vert =\tilde{a}_m. \end{aligned}$$

However, in practice, \({\text {Span}}\{w_1,\ldots ,w_m\}\) is not available from \(v,v_0,\ldots ,v_m\) alone, so \(\tilde{a}_m\) is also unavailable. Thus, \(a_m^{{\text {SIA}}}\) is used instead of \(\tilde{a}_m\) for estimating \(K^nv\).

Concerning the difference between \(a_m^{{\text {SIA}}}\) and \(\tilde{a}_m\), the following theorem is deduced. Here, r, \(\rho\), and \(\varGamma _r\) are defined in the same manner as those for the Arnoldi approximation by replacing \(\mathcal {W}(K)\) with \(\mathcal {W}((\gamma I-K)^{-1})\).

Theorem 3.5

Let n satisfy \(m\ge 2n+1\), let \(p_{m-n-1}\) be a polynomial of order \(m-n-1\), and let \({\tilde{p}}_{n}\) be a polynomial of order n without a constant term such that \(a_m^{{\text {SIA}}}=p_{m-n-1}((\gamma I-K)^{-1})w_{m-n}+{\tilde{p}}_{n}(K^{-1})w_{m-n}\). Assume \(0\notin \varLambda (K)\) and the set \(\{w_1,\ldots ,w_{m-n},K^{-1}w_{m-n},\ldots ,K^{-n}w_{m-n}\}\) is linearly independent. If \(g_{\gamma }(z):=z^n(\gamma z-1)^{-n}\) is holomorphic in \(\varGamma _r\), the residual of \(a_m^{{\text {SIA}}}\) is evaluated as

$$\begin{aligned}&\Vert {\text {res}}(a_m^{{\text {SIA}}})-{\text {res}}(\tilde{a}_m)\Vert \le 2C\Vert {\tilde{p}}_{n}(K^{-1})w_{m-n}\Vert \frac{(\rho /r)^{m-n}}{1-(\rho /r)}, \end{aligned}$$

where \(C>0\) is a constant.

We use the following lemma for deriving Theorem 3.5.

Lemma 3.6

Let n satisfy \(m\ge 2n+1\). Assume \(0\notin \varLambda (K)\) and the set \(\{w_1,\ldots ,w_{m-n},K^{-1}w_{m-n},\ldots ,K^{-n}w_{m-n}\}\) is linearly independent. If \(1/\gamma \notin \mathcal {W}((\gamma I-K)^{-1})\), the following equality is deduced:

$$\begin{aligned}&{\text {res}}(a_m^{{\text {SIA}}})-{\text {res}}(\tilde{a}_m) \nonumber \\&\quad ={Q_mg_{\gamma }(\tilde{\mathbf {L}}_m)Q_m^*{\tilde{p}}_{n}(K^{-1})w_{m-n}-K^{-n}{\tilde{p}}_{n}(K^{-1})w_{m-n}}. \end{aligned}$$
(13)

Proof

Let \(\hat{w}_{m-n}=K^{-n}w_{m-n}\). Then, we have

$$\begin{aligned} (\gamma I-K)^{-n}K^n\hat{w}_{m-n}&=(\gamma I-K)^{-n}(\gamma I-(\gamma I-K))^n\hat{w}_{m-n} \\&=(\gamma (\gamma I-K)^{-1}-I)^n\hat{w}_{m-n}. \end{aligned}$$

Since \(p((\gamma I-K)^{-1})w_{m-n}=Q_mp(\tilde{\mathbf {L}}_m)Q_m^*w_{m-n}\) and \(p((\gamma I-K)^{-1})\hat{w}_{m-n}=Q_mp(\tilde{\mathbf {L}}_m)Q_m^*\hat{w}_{m-n}\) hold for any polynomial p of order less than or equal to \(m-n-1\) (in particular, for orders up to n, since \(m\ge 2n+1\)), the following equality is deduced:

$$\begin{aligned} \tilde{\mathbf {L}}_m^{n}Q_m^*w_{m-n}&=(\gamma \tilde{\mathbf {L}}_m-I)^nQ_m^*\hat{w}_{m-n}. \end{aligned}$$
(14)

By the assumption of \(1/\gamma \notin \mathcal {W}((\gamma I-K)^{-1})\) and the inclusion \(\mathcal {W}(\tilde{\mathbf {L}}_m)\subseteq \mathcal {W}((\gamma I-K)^{-1})\), \(\gamma \tilde{\mathbf {L}}_m-I\) is invertible. Therefore, by Eq. (14), the following equalities are deduced:

$$\begin{aligned}&{\text {res}}(a_m^{{\text {SIA}}})-{\text {res}}(\tilde{a}_m)=K^{-n}\tilde{a}_m-K^{-n}a_m^{{\text {SIA}}}\\&\quad =Q_mg_{\gamma }(\tilde{\mathbf {L}}_m)\tilde{\mathbf {K}}_m^nQ_m^*v-K^{-n}\left( p_{m-n-1}((\gamma I-K)^{-1})w_{m-n}+{\tilde{p}}_{n}(K^{-1})w_{m-n}\right) \\&\quad =Q_mg_{\gamma }(\tilde{\mathbf {L}}_m)Q_m^*\left( p_{m-n-1}((\gamma I-K)^{-1})w_{m-n}+{\tilde{p}}_{n}(K^{-1})w_{m-n}\right) \\&\qquad -\left( p_{m-n-1}((\gamma I-K)^{-1})K^{-n}w_{m-n}+K^{-n}{\tilde{p}}_{n}(K^{-1})w_{m-n}\right) \\&\quad =Q_mg_{\gamma }(\tilde{\mathbf {L}}_m)\left( p_{m-n-1}(\tilde{\mathbf {L}}_m)Q_m^*w_{m-n}+Q_m^*{\tilde{p}}_{n}(K^{-1})w_{m-n}\right) \\&\qquad -\left( Q_mp_{m-n-1}(\tilde{\mathbf {L}}_m)g_{\gamma }(\tilde{\mathbf {L}}_m)Q_m^*w_{m-n}+K^{-n}{\tilde{p}}_{n}(K^{-1})w_{m-n}\right) \\&\quad =Q_mg_{\gamma }(\tilde{\mathbf {L}}_m)Q_m^*{\tilde{p}}_{n}(K^{-1})w_{m-n}-K^{-n}{\tilde{p}}_{n}(K^{-1})w_{m-n}, \end{aligned}$$

which completes the proof of the lemma. \(\square\)

The vector \(Q_mg_{\gamma }(\tilde{\mathbf {L}}_m)Q_m^*{\tilde{p}}_{n}(K^{-1}){w_{m-n}}\) in the right-hand side of Eq. (13) is equivalent to the shift-invert Arnoldi approximation of the operator function \(g_{\gamma }((\gamma I-K)^{-1})\) acting on the vector \({\tilde{p}}_{n}(K^{-1})w_{m-n}\). In the same manner as the Arnoldi method, Theorem 3.5 is proved as follows.

Proof

(Proof of Theorem 3.5) By Crouzeix et al. [5] and Ellacott [7, Corollary 2.2], the following bound is deduced:

$$\begin{aligned}&\Vert Q_mg_{\gamma }(\tilde{\mathbf {L}}_m)Q_m^*{\tilde{p}}_{n}(K^{-1})w_{m-n}-K^{-n}{\tilde{p}}_{n}(K^{-1})w_{m-n}\Vert \\&\quad \le 2C_1C_2\Vert {\tilde{p}}_{n}(K^{-1})w_{m-n}\Vert \min _{p\in \mathcal {P}_{m-n-1}}\Vert g_{\gamma }-p\Vert _{\infty ,\mathcal {W}((\gamma I-K)^{-1})}\\&\quad \le 2C_1C_2\Vert {\tilde{p}}_{n}(K^{-1})w_{m-n}\Vert \frac{(\rho /r)^{m-n}}{1-(\rho /r)}, \end{aligned}$$

where \(0<C_1\le 1+\sqrt{2}\) and \(C_2=\max _{z\in \varGamma _r}\vert g_{\gamma }(z)\vert\), which completes the proof of the theorem. \(\square\)

Remark 3.7

If K is unbounded, 0 is contained in \(\varLambda ((\gamma I-K)^{-1})\). Since \(\varLambda ((\gamma I-K)^{-1})\subseteq \overline{\mathcal {W}((\gamma I-K)^{-1})}\subseteq {\varGamma _r}\) holds, 0 is contained in \({\varGamma _r}\). This is why \(g_{\gamma }\), which is holomorphic at 0 (and entire when \(\gamma =0\)), is used instead of \(f_{\gamma }\), which is not holomorphic at 0, in evaluating Eq. (13).

Remark 3.8

We can choose r arbitrarily as long as \(g_{\gamma }\) is holomorphic in \(\varGamma _r\). The choice of r corresponds to a trade-off between the decay rate \(\rho /r\) of \(\Vert {\text {res}}(a_m^{{\text {SIA}}})-{\text {res}}(\tilde{a}_m)\Vert\) and the magnitude of the constant \(C_2\). Indeed, the decay rate \(\rho /r\) is small if r is large. On the other hand, the larger r becomes, the smaller the distance between \(\varGamma _r\) and \(1/\gamma\), the singular point of \(g_{\gamma }\), becomes. Thus, the larger r becomes, the larger \(C_2=\max _{z\in \varGamma _r}\vert g_{\gamma }(z)\vert\) becomes.

As a result, if \(\gamma\) is chosen so that \(g_{\gamma }\) is holomorphic in \(\varGamma _r\), and if the factor \(\Vert {\tilde{p}}_{n}(K^{-1})w_{m-n}\Vert\) is bounded by some constant, the difference between residuals of \(a_m^{{\text {SIA}}}\) and \(\tilde{a}_m\), the minimizer of the residual, exponentially decays as m becomes larger. Unfortunately, evaluating \(\Vert \tilde{p}_{n}(K^{-1})w_{m-n}\Vert\) theoretically is a challenging task. Thus, we numerically confirm that the factor \(\Vert \tilde{p}_{n}(K^{-1})w_{m-n}\Vert\) is bounded by a constant in Sect. 4.2.

4 Numerical experiments

Several numerical experiments are presented in this section. They are designed to illustrate that the shift-invert Arnoldi method performs better than the Arnoldi method and to confirm the results of Theorems 3.2 and 3.5 numerically. All numerical computations are executed with Python 3.6.

All the experiments are under the setting described in Sect. 2. Therefore, Krylov subspaces are subspaces of RKHSs and discrepancies are measured by norms in the RKHSs. In practical computations, all \(\mu _t^S\)s in the algorithms are replaced by \(\mu _{t,N}^S\) for \(N\in \mathbb {N}\). The convergence of the approximation constructed with \(\mu _{t,N}^S\) to the one constructed with \(\mu _t^S\) is shown in [14, Section 4.3].

4.1 Comparison between the Arnoldi and shift-invert Arnoldi methods

To illustrate that the shift-invert Arnoldi method performs better than the Arnoldi method, the following dynamical system is considered in \(\mathcal {X}\subseteq \mathbb {R}\):

$$\begin{aligned} X_t=0.99X_{t-1}\cos (0.1X_{t-1})+\xi _t, \end{aligned}$$
(15)

where \(X_0=0.5\) and \(\xi _t\) (\(t=0,1,\ldots\)) are independent random variables with the Gaussian distribution with a mean of 0 and a standard deviation of 0.01. For \(n=1\), \(N=50\), \(m=2,3,\ldots ,12\), and \(v=\phi (x_{1600})\), the discrepancy \(\Vert a_m-a_{m-1}\Vert\), where \(a_m=a_m^{{\text {Arnoldi}}}\) or \(a_m=a_m^{{\text {SIA}}}\), is computed. The vector \(v_0\) is set as \(\varPhi (\mu _{0,N}^S)\) as defined in Sect. 2.2. In addition, we set \(\gamma =1+\mathrm {i}\), \(k(x,y)=e^{-\Vert x-y\Vert _2^2}\), and \(S=m+1\). The mean values over 100 time series randomly generated by Eq. (15) are illustrated in Fig. 1. If K is bounded and the Krylov subspaces grow to span a dense subset of \(\mathcal {H}_k\), \(Q_m\tilde{\mathbf {K}}_mQ_m^*\) converges strongly to K. Therefore, \(a_m\) converges to Kv in \(\mathcal {H}_k\), and \(\Vert a_m-a_{m-1}\Vert\) decreases as m becomes larger in this case. However, in Fig. 1, \(\Vert a_m-a_{m-1}\Vert\) does not decrease as m becomes larger with the Arnoldi method. This is due to the unboundedness of K. On the other hand, the discrepancy with the shift-invert Arnoldi method decreases as m becomes larger. The results indicate that the shift-invert Arnoldi method can address the unboundedness of K and is a better choice than the Arnoldi method in this case.
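For reproducibility, the time series for this experiment can be generated as follows (a minimal sketch; the use of NumPy's default random generator is our assumption, not the paper's original code):

```python
import numpy as np

def simulate(T, x0=0.5, sigma=0.01, seed=None):
    # Trajectory of X_t = 0.99 X_{t-1} cos(0.1 X_{t-1}) + xi_t, Eq. (15)
    rng = np.random.default_rng(seed)
    xs = np.empty(T)
    xs[0] = x0
    for t in range(1, T):
        xs[t] = 0.99 * xs[t - 1] * np.cos(0.1 * xs[t - 1]) + rng.normal(0.0, sigma)
    return xs

xs = simulate(1601)  # enough samples to form v = phi(x_1600)
```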

Fig. 1 The convergence behavior of \(a_m\) along the dimension m of the Krylov subspace

4.2 Confirmation of Theorems 3.2 and 3.5

The following deterministic dynamical system on \(\mathcal {X}=\mathbb {R}\) is considered:

$$\begin{aligned} X_t=0.99X_{t-1}+0.2, \end{aligned}$$
(16)

where \(X_0=1\). In this example, we set \(n=1\), \(N=10\), \(S=15\), \(v_0=\varPhi (\mu _{0,N}^S)\), and \(v=\phi (x_{300})\).

To confirm the result of Theorem 3.5, we first check the assumptions of Theorem 3.5 and the behavior of \(\Vert \tilde{p}_n(K^{-1})w_{m-n}\Vert\). Regarding the condition \(0\notin \varLambda (K)\), the function \(h(x)=0.99x+0.2\) is bijective and has the inverse map \(h^{-1}(x)=(x-0.2)/0.99\). Therefore, the linear operator L defined as \(L\phi (x)=\phi (h^{-1}(x))\) is the inverse of the Perron–Frobenius operator K with respect to the dynamical system (16), i.e., \(L=K^{-1}\). Let \(k(x,y)=e^{-\vert x-y\vert }\) be the Laplacian kernel and put \(\eta =1/0.99\), the coefficient of x in \(h^{-1}\). Then, \(K^{-1}\) is bounded. Indeed, for \(v\in \mathcal {H}_k\), there exists a sequence \(\{v_l\}_{l=1}^{\infty }\subseteq \mathcal {H}_{k,0}\) (see Sect. 2.1) such that \(v=\lim _{l\rightarrow \infty }v_l\). Let \(v_l\) be represented as \(v_l=\sum _{i=1}^{n(l)}\phi (x_i(l))c_i(l)\) for some \(x_i(l)\in \mathcal {X}\) and \(c_i(l)\in \mathbb {C}\). By the identity \(e^{-\vert x\vert }=\int _{\omega \in \mathbb {R}}e^{-\mathrm {i}x\omega }4/(1+\omega ^2)d\omega\) [38, Table 2], we have

$$\begin{aligned}&\Vert K^{-1}v_l-K^{-1}v_{l'}\Vert ^2 =\int _{\omega \in \mathbb {R}}G(l,l')\frac{4}{1+\omega ^2/\eta ^2}d\omega \nonumber \\&\quad \le \int _{\omega \in \mathbb {R}}G(l,l')\frac{4\eta ^2}{1+\omega ^2}d\omega =\eta ^2\bigg \Vert \sum _{i=1}^{n(l)}\phi (x_i(l))c_i(l)-\sum _{i=1}^{n(l')}\phi (x_i(l'))c_i(l')\bigg \Vert ^2 \nonumber \\&\quad =\eta ^2\Vert v_l-v_{l'}\Vert ^2, \end{aligned}$$
(17)

where

$$\begin{aligned}&G(l,l')= \bigg \vert \sum _{i=1}^{n(l)}c_i(l)e^{-\mathrm {i}x_i(l)\omega }-\sum _{i=1}^{n(l')}c_i(l')e^{-\mathrm {i}x_i(l')\omega }\bigg \vert ^2. \end{aligned}$$

Since \(\{v_l\}_{l=1}^{\infty }\) is a Cauchy sequence, Eq. (17) implies \(\{K^{-1}v_l\}_{l=1}^{\infty }\) is a Cauchy sequence. Thus, the sequence \(\{K^{-1}v_l\}_{l=1}^{\infty }\) converges. Therefore, \(K^{-1}\) is bounded.

Concerning the linear independence condition, the orbit of the dynamical system (16) is monotonically increasing, so the observations \(x_0,x_1,\ldots\) satisfy \(x_i\ne x_j\) for \(i\ne j\). Therefore, since k is the Laplacian kernel, the set \(\{\phi (x_0),\phi (x_1),\ldots \}\) is linearly independent. As a result, the linear independence condition is satisfied, since \(v_0\) is represented as \(v_0=\varPhi (\mu _{0,N}^S)=1/N\sum _{i=0}^{N-1}\phi (x_{iS})\) and by the definition of \(w_i\).

Regarding the holomorphicity of \(g_{\gamma }\), it has a singular point at \(1/\gamma\). Moreover, by the Cauchy–Schwarz inequality, the following inequality holds for \(v\in \mathcal {H}_k\), \(\Vert v\Vert =1\):

$$\begin{aligned} \vert \left\langle v,(\gamma I-K)^{-1}v\right\rangle \vert \le \Vert (\gamma I-K)^{-1}\Vert . \end{aligned}$$

Since the function \(H(\gamma ):=\Vert (\gamma I-K)^{-1}\Vert\) is continuous on \(\mathbb {C}{\setminus }\varLambda (K)\), we consider \(\Vert K^{-1}\Vert\) instead of \(\Vert (\gamma I-K)^{-1}\Vert\) for \(\gamma \approx 0\). By Eq. (17), \(\Vert K^{-1}\Vert \le \eta\) holds. Thus, for a sufficiently small \(\gamma\), we have \(\Vert (\gamma I-K)^{-1}\Vert \lesssim \eta\). This implies the numerical range \(\mathcal {W}((\gamma I-K)^{-1})\) is contained in the disk \(\mathbb {D}_{\rho }\) with \(\rho \approx \eta\). As a result, if we set \(\gamma\) as a sufficiently small value, \(\rho \approx \eta\), and \(\rho < r\lesssim 1/\vert \gamma \vert\), then \(g_{\gamma }\) is holomorphic in \(\varGamma _r\).

As for the behavior of \(\Vert \tilde{p}_n(K^{-1})w_{m-n}\Vert\), the following inequality is derived since \(\tilde{p}_n(z)=c_mz\) for some \(c_m\in \mathbb {C}\) in the case of \(n=1\):

$$\begin{aligned} \Vert \tilde{p}_n(K^{-1})w_{m-1}\Vert \le \vert c_m\vert \Vert K^{-1}\Vert \Vert w_{m-1}\Vert . \end{aligned}$$

Thus, the behavior of \(\Vert \tilde{p}_n(K^{-1})w_{m-n}\Vert\) is described by that of \(\vert c_m\vert \Vert w_{m-1}\Vert\). Figure 2 shows the value \(\vert c_m\vert \Vert w_{m-1}\Vert\) along m for \(\gamma =10^{-2}\) and \(\gamma =10^{-4}\). We can see that \(\Vert \tilde{p}_n(K^{-1})w_{m-1}\Vert\) is bounded by around 0.1 for sufficiently large m in both cases.

Next, we consider the decay rate of \(\Vert {\text {res}}(a_m^{{\text {SIA}}})-{\text {res}}(\tilde{a}_m)\Vert\). We can set \(\rho \approx \eta\) and \(\rho < r\lesssim 1/\vert \gamma \vert\). However, as mentioned in Remark 3.8, choosing a larger r makes the constant \(C_2\) in Theorem 3.5 larger. Figure 3 illustrates the value \(\Vert {\text {res}}(a_m^{{\text {SIA}}})-{\text {res}}(\tilde{a}_m)\Vert\) along m. We can see the decay rate is around 9/10. Thus, we set \(\rho =\eta \approx 1\) and \(r=10/9\), and compute the following theoretical upper bound in accordance with the proof of Theorem 3.5:

$$\begin{aligned} b(\rho ,r,m):=2(1+\sqrt{2})C_2(r)\cdot 0.1\frac{(\rho /r)^{m-1}}{1-(\rho /r)}, \end{aligned}$$

where \(C_2(r)=\max _{z\in \varGamma _r}\vert g_{\gamma }(z)\vert\). We can see the theoretical upper bound describes the decay of the value \(\Vert {\text {res}}(a_m^{{\text {SIA}}})-{\text {res}}(\tilde{a}_m)\Vert\) correctly.
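The bound itself is cheap to evaluate; a sketch (our own, assuming \(\varGamma _r\) is approximated by the circle \(\vert z\vert =r\), consistent with \(\rho \approx \eta\) above, and with the factor 0.1 taken from the numerically observed bound on \(\Vert \tilde{p}_n(K^{-1})w_{m-1}\Vert\)):

```python
import numpy as np

def b(rho, r, m, gamma, n=1, num=4000):
    # b(rho, r, m) from the proof of Theorem 3.5; C2(r) = max |g_gamma| on
    # Gamma_r, approximated here by sampling the circle |z| = r
    z = r * np.exp(1j * np.linspace(0.0, 2 * np.pi, num, endpoint=False))
    C2 = np.max(np.abs(z ** n / (gamma * z - 1) ** n))
    return 2 * (1 + np.sqrt(2)) * C2 * 0.1 * (rho / r) ** (m - n) / (1 - rho / r)

print(b(1.0, 10 / 9, 10, 1e-2))
```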

Fig. 2 The dimension m of the Krylov subspace versus \(\vert c_m\vert \Vert w_{m-1}\Vert\) (left: \(\gamma =10^{-2}\), right: \(\gamma =10^{-4}\))

Fig. 3 The dimension m of the Krylov subspace versus \(\Vert {\text {res}}(a_m^{{\text {SIA}}})-{\text {res}}(\tilde{a}_m)\Vert\) (left: \(\gamma =10^{-2}\), right: \(\gamma =10^{-4}\))

Finally, we confirm Theorem 3.2. In the same manner as for the numerical range of \(K^{-1}\), that of K is shown to be contained in the disk \(\mathbb {D}_{0.99}\). Unfortunately, this evaluation is not sufficient to check the holomorphicity of \(f_m\) defined in Theorem 3.2 and to determine \(\rho\) and r, since \(f_m\) has a singular point at 0. Figure 4 illustrates the value \(\Vert {\text {res}}(a_m^{{\text {Arnoldi}}})-{\text {res}}(\tilde{a}_m)\Vert\) along m. It decays for small m but grows for larger m, which implies the assumption of the holomorphicity of \(f_m\) in Theorem 3.2 would not be satisfied for this example.

Fig. 4 The dimension m of the Krylov subspace versus \(\Vert {\text {res}}(a_m^{{\text {Arnoldi}}})-{\text {res}}(\tilde{a}_m)\Vert\)

4.3 Confirmation of the decrease in the residuals

To confirm the decrease in the residuals of the Arnoldi and shift-invert Arnoldi approximations, experiments are conducted with synthetic data generated by the Landau equation and with real-world data.

4.3.1 Experiments with data generated by the Landau equation

The following Landau equation in \(\mathcal {X}\subseteq [0,\infty )\) [4] is considered:

$$\begin{aligned} \frac{dr}{dt}=0.5r-r^3. \end{aligned}$$
(18)

Discretizing Eq. (18) and adding random noise results in the following discrete dynamical system:

$$\begin{aligned} X_t=X_{t-1}+\varDelta t(0.5X_{t-1}-X_{t-1}^3+\xi _t), \end{aligned}$$
(19)

where \(X_0=1\) and \(\xi _t\) (\(t=0,1,\ldots\)) are independent random variables with the Gaussian distribution with a mean of 0 and a standard deviation of 0.01. For \(n=1\), \(N=50\), \(m=2,3,\ldots ,12\), \(v_0=\varPhi (\mu _{0,N}^S)\), and \(v=\phi (x_{1600})\), the residual \(\Vert {\text {res}}(a_m)\Vert\), where \(a_m=a_m^{{\text {Arnoldi}}}\) or \(a_m=a_m^{{\text {SIA}}}\), is computed. In addition, we set \(\varDelta t=0.01\), \(\gamma =1+\mathrm {i}\), \(k(x,y)=e^{-\vert x-y\vert }\), and \(S=m+1\) or \(S=15\). The mean values over 100 time series randomly generated by Eq. (19) are shown in Fig. 5. The results indicate that the residual \(\Vert {\text {res}}(a_m)\Vert\) decreases as m increases, both with the Arnoldi and shift-invert Arnoldi methods and regardless of whether S depends on m. Although the residuals for \(S=m+1\) are larger than those for \(S=15\), they decrease faster.
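The synthetic trajectories of Eq. (19) can be generated by an explicit Euler step (a minimal sketch under the same hedging as before):

```python
import numpy as np

def simulate_landau(T, dt=0.01, x0=1.0, sigma=0.01, seed=None):
    # Euler discretization of dr/dt = 0.5 r - r^3 with additive noise, Eq. (19)
    rng = np.random.default_rng(seed)
    xs = np.empty(T)
    xs[0] = x0
    for t in range(1, T):
        drift = 0.5 * xs[t - 1] - xs[t - 1] ** 3
        xs[t] = xs[t - 1] + dt * (drift + rng.normal(0.0, sigma))
    return xs
```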

Fig. 5 The dimension m of the Krylov subspace versus the residual \(\Vert {\text {res}}(a_m)\Vert\) (left: \(S=15\), right: \(S=m+1\))

4.3.2 Experiments with real-world data

For the last experiment, real-world teletraffic data, shown in Fig. 6, is used. For \(t=0,1,\ldots\), \(x_t\) represents the amount of teletraffic (Gbps) that passes through a certain node (ID 12) in a network composed of 23 nodes and 227 links at time t. The sampling interval is 15 seconds. To extract the relationship between \(x_{t+1}\) and \(x_{t-p+1},\ldots ,x_{t}\), \(x_t\) is considered to be generated by a random variable \(Y_t\), and \(X_t=[Y_t,\ldots ,Y_{t-p+1}]\) is set for obtaining the relation (1) with \(\mathcal {X}\subseteq \mathbb {R}^p\). For \(n=1,2\), \(N=50\), \(p=1,3\), \(m=2,3,\ldots ,12\), \(v_0=\varPhi (\mu _{0,N}^S)\), and \(v=\phi ([x_{1600},\ldots ,x_{1600+p-1}])\), the residual \(\Vert {\text {res}}(a_m)\Vert\), where \(a_m=a_m^{{\text {Arnoldi}}}\) or \(a_m=a_m^{{\text {SIA}}}\), is computed. In addition, we set \(\gamma =1+\mathrm {i}\), \(k(x,y)=e^{-\vert x-y\vert }\), and \(S=15\). The results are shown in Fig. 7. For \(p=1\), the residual \(\Vert {\text {res}}(a_m)\Vert\) does not always decrease as m becomes larger with either the Arnoldi or shift-invert Arnoldi method. This implies that the relationship between \(x_{t+1}\) and \(x_{t}\) does not fully describe the behavior of the data. For \(p=3\), the residual of the shift-invert Arnoldi method decreases as m becomes larger, whereas that of the Arnoldi method does not. This implies that the relationship between \(x_{t+1}\) and \(x_{t},\ldots ,x_{t-2}\) describes the behavior of the data, but either the unboundedness of the Perron–Frobenius operator prevents the Arnoldi method from constructing a proper approximation from the data or the assumption about the holomorphicity of \(f_m\) in Theorem 3.2 is not satisfied. The advantage of the shift-invert Arnoldi method over the Arnoldi method is also underlined in this example.
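The delay embedding \(X_t=[Y_t,\ldots ,Y_{t-p+1}]\) can be formed as below (a hypothetical sketch; x is the raw traffic series):

```python
import numpy as np

def delay_embed(x, p):
    # Rows are the states [x_t, x_{t-1}, ..., x_{t-p+1}] for t = p-1, p, ...
    x = np.asarray(x)
    return np.column_stack([x[p - 1 - j:len(x) - j] for j in range(p)])

x = np.arange(10.0)
print(delay_embed(x, 3))  # rows [x_t, x_{t-1}, x_{t-2}] for t = 2, ..., 9
```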

Fig. 6 Real-world teletraffic data at each t

Fig. 7 The dimension m of the Krylov subspace versus the residual \(\Vert {\text {res}}(a_m)\Vert\) (left: \(p=1\), right: \(p=3\))

5 Conclusion

In this paper, we investigated the convergence of Krylov subspace methods for estimating operator-vector multiplications \(K^nv\), where K is a linear operator and v is a vector in a Hilbert space. The Arnoldi method and shift-invert Arnoldi method were considered. Although these methods have been proposed for time-series data analysis in machine learning, their convergence had not been thoroughly analyzed. We proved that the Arnoldi approximation converges to the minimizer of the residual. For the shift-invert Arnoldi method, the derivation was not straightforward, since the function that appears in the estimated operator-vector multiplication is not holomorphic. This problem was addressed through the factor \(K^{-n}\) that appears in the residual. As a result, we showed that the shift-invert Arnoldi approximation converges to the minimizer of the residual under the assumption that a factor related to the initial vector and \(K^{-1}\) is bounded by some constant. The aforementioned results were also confirmed numerically with synthetic and real-world data.

As future work, we will theoretically evaluate the aforementioned factor and give sufficient conditions for the convergence of the shift-invert Arnoldi approximation.