1 Introduction

In this paper, we present an extension of the non-Hermitian Lanczos algorithm (see, e.g., [25]) where the inputs are a 4-mode tensor \(\boldsymbol {\mathcal {A}} \in \mathbb {C}^{N \times N \times M \times M}\) and vectors \(\mathbf {w}, \mathbf {v} \in \mathbb {C}^{N}\) such that \(\mathbf{w}^{H}\mathbf{v} \neq 0\). We aim to use the introduced algorithm to approximate the bilinear form \(\mathbf{w}^{H}\boldsymbol{\mathsf{U}}(t)\mathbf{v}\), where \(\boldsymbol {\mathsf {U}}(t) \in \mathbb {C}^{N \times N}\) is the so-called time-ordered exponential, i.e., the solution of the ordinary differential equation

$$ \frac{d}{dt}\boldsymbol{\mathsf{U}}(t) = \boldsymbol{\mathsf{A}}(t) \boldsymbol{\mathsf{U}}(t), \quad \boldsymbol{\mathsf{U}}({a})=I_{N}, \quad t \in I = [{a},b], $$
(1)

where \(I_{N}\in \mathbb {R}^{N\times N}\) is the identity matrix and \(\boldsymbol {\mathsf {A}}(t) \in \mathbb {C}^{N \times N}\) is a smooth matrix-valued function defined on the real interval I. Equation (1) emerges in a variety of applications. For example, its solution is crucial in quantum physics, where the matrix \(\boldsymbol{\mathsf{A}}(t)\) corresponds to the Hamiltonian operator. Situations where \(\boldsymbol{\mathsf{U}}(t)\) has no accessible expression are frequent in the literature, see, e.g., [1, 7, 38, 49]. For instance, in Nuclear Magnetic Resonance (NMR) experiments, the associated bilinear form \(\mathbf{w}^{H}\boldsymbol{\mathsf{U}}(t)\mathbf{v}\) represents the measurement of changes in an applied magnetic field caused by nuclear spins that are excited with electromagnetic waves, i.e., spectroscopy [29, 39]. Other applications are found in control theory, filter design, and model reduction problems [4, 5, 14, 37, 47]. In the mentioned applications, the matrix \(\boldsymbol{\mathsf{A}}(t)\) is often large, or even huge, and sparse. The introduced algorithm is motivated and theoretically supported by a new expression for the bilinear form, obtained by combining the two symbolic methods known as Path-sum and the ⋆-Lanczos algorithm [21,22,23,24]. Given the matrix-valued function \(\boldsymbol{\mathsf{A}}(t)\) and the vectors \(\mathbf{w},\mathbf{v}\), the two symbolic methods produce an expression for the bilinear form \(\mathbf{w}^{H}\boldsymbol{\mathsf{U}}(t)\mathbf{v}\) composed of a finite and tractable number of integrals and scalar integral equations. To our knowledge, no other symbolic method can express the bilinear form with a tractable, finite number of integral subproblems. Two commonly used alternative expressions are given by the Magnus series, i.e., an infinite series of nested integrals (e.g., [40]), and by Floquet theory, where the solution of an infinite system of coupled linear differential equations is required (e.g., [7]).

The integrals and the integral equations generated by the ⋆-Lanczos and Path-sum methods do not always have an easily accessible solution. As a consequence, a numerical approach is needed. A possible strategy for the numerical approximation of the mentioned integrals and the integral equations is outlined in [23] and it is based on the discretization of the interval I into M − 1 equispaced subintervals. The algebraic objects resulting from the discretization strategy are the 4-mode tensor \(\boldsymbol {\mathcal {A}}\) (corresponding to A(t)) and the 3-mode tensors V,W (corresponding to v,w).

The outputs obtained by combining the ⋆-Lanczos algorithm with the mentioned discretization strategy are mathematically equivalent to the outputs of the tensor Lanczos algorithm presented here with, as inputs, \(\boldsymbol {\mathcal {A}}, \mathbf {v}, \mathbf {w}\). The main goal of this paper is to show that, in fact, the tensor Lanczos algorithm converges to the outcome of the ⋆-Lanczos method with an accuracy of the same order as the discretization strategy. Moreover, the reported numerical experiments will show that the approximation of \(\mathbf{w}^{H}\boldsymbol{\mathsf{U}}(t)\mathbf{v}\) obtained by combining tensor Lanczos with the discretized Path-sum approach also converges to the solution at the order of the discretization. Naturally, many numerical methods for the solution of non-autonomous ODEs can be found in the literature, see, for instance, [2, 6, 7, 10, 13, 15, 30, 34,35,36]. For large matrices, these numerical methods are known to be highly demanding both in terms of computational cost and storage. This motivates the search for novel approaches suitable for large-scale problems. In order to be competitive with the most advanced techniques, tensor Lanczos needs to be used in combination with more accurate discretization schemes. The development of suitable, faster converging discretization schemes is ongoing research and outside the scope of this work. At the same time, it is important to note that the algorithm proposed here is part of a wider class of tensor extensions of Krylov subspace methods that recently appeared in the literature, see, e.g., [18, 26, 33, 46].

The Lanczos-type process we introduce can also be equivalently written as a block Lanczos method, since the 4-mode tensor \(\boldsymbol {\mathcal {A}}\) can be seen as a block matrix; information about block Krylov subspace methods can be found, e.g., in [20]. Despite this fact, we prefer to interpret such a block structure in a tensorial fashion. Indeed, the tensorial approach has a direct translation in terms of a discretized ⋆-Lanczos algorithm. Moreover, as we will experimentally show, this interpretation is motivated by observing that several tensors from real-world examples related to (1) admit a low-parametric approximation known as the Tensor Train (TT) decomposition [41, 42]. Such a low-parametric approximation allows one to efficiently manipulate and store the tensors. This paves the way for further improvements of our proposal, where the TT structure is fully exploited in the Lanczos-type procedure. Examples of tensor Krylov subspace methods combined with the TT decomposition can be found in [18, 48], further motivating our tensor-based point of view.

In more detail, this work is organized as follows. Preliminaries and definitions of tensor operations are given in Section 2. In Section 3 we discuss how to construct the non-Hermitian Lanczos procedure for tensors and we prove several crucial properties. In Section 4 we discuss the breakdown issue which typically arises when working with non-Hermitian Lanczos approaches. Numerical experiments are presented in Section 5, where we also give several examples exposing the low-rank TT structure of the considered tensors \(\boldsymbol {\mathcal {A}}\). Section 6 concludes the paper and Appendix A contains several proofs.

2 Preliminaries

In this work, we use a notation borrowed from Matlab®. Fixing \(i_{1}\in \{1,\dots ,N_{1}\}\) and \(i_{2}\in \{1,\dots ,N_{2}\}\), if \(\boldsymbol {\mathcal {A}} \in \mathbb {C}^{N_{1} \times N_{2} \times M \times M }\), then \(\boldsymbol {\mathcal {A}}_{i_{1},i_{2},:,:}\) stands for the matrix

$$ \boldsymbol{\mathcal{A}}_{i_{1},i_{2},:,:}:=\left[\boldsymbol{\mathcal{A}}_{i_{1},i_{2},j_{1},j_{2}}\right]_{j_{1},j_{2}= 1}^{M}. $$

This notation similarly applies to 3-mode tensors, matrices, and vectors. Table 1 summarizes the notation used in the paper.

Table 1 Summary of notation

In the following, we define several tensorial operations, which can be seen as generalizations of the usual products involving matrices and vectors. We summarize them in Table 2.

Table 2 Each of the considered products involves two tensors with n and m modes and gives as an outcome a tensor with k modes

In the following definitions we consider the tensors \(\boldsymbol {\mathcal {A}} \in \mathbb {C}^{N_{1}\times N_{2} \times M \times M },\boldsymbol {{\mathscr{B}}} \in \mathbb {C}^{N_{2} \times N_{3} \times M \times M }\), \(A \in \mathbb {C}^{N_{2} \times M \times M }\), \(B \in \mathbb {C}^{N_{1} \times M \times M }\), \(\boldsymbol {\alpha } \in \mathbb {C}^{M \times M}\). Moreover, the indices \(i_{1}\in \{1,\dots ,N_{1}\}\) and \(i_{2}\in \{1,\dots ,N_{2}\}\) are fixed.

Definition 1 (∗-Tensor product)

The product \((\boldsymbol {\mathcal {A}}*\boldsymbol {{\mathscr{B}}})\in \mathbb {C}^{N_{1} \times N_{3} \times M \times M } \) is defined as

$$ (\boldsymbol{\mathcal{A}}*\boldsymbol{\mathcal{B}})_{i_{1},i_{2},:,:}:= {\sum}_{k=1}^{N_{2}}{\boldsymbol{\mathcal{A}}_{i_{1},k,:,:}} \boldsymbol{\mathcal{B}}_{k,i_{2},:,:} . $$

Definition 2 (Tensor-Hypervector product)

The product \((\boldsymbol {\mathcal {A}}*A) \in \mathbb {C}^{N_{1} \times M \times M }\) is defined as

$$ (\boldsymbol{\mathcal{A}}*A)_{i_{1},:,:}:= {\sum}_{k=1}^{N_{2}}{\boldsymbol{\mathcal{A}}_{i_{1},k,:,:}} A_{k,:,:}. $$

We also need to define the action of a 3-mode tensor from the left. Every tensor with three modes that acts, will act, or is the outcome of a ∗-product from the left will be denoted with a “D” (dual) superscript and, in the remainder of this work, we will use \(B^{D}_{k,:,:}\) to denote \((B^{D})_{k,:,:}\). We define \( (B^{D}*\boldsymbol {\mathcal {A}})^{D}\in \mathbb {C}^{N_{2} \times M \times M }\) as

$$ (B^{D}*\boldsymbol{\mathcal{A}})^{D}_{i_{2},:,:}:= {\sum}_{k=1}^{N_{1}}{{B^{D}_{k,:,:}} \boldsymbol{\mathcal{A}}_{k,i_{2},:,: }}. $$

Note that the following 4-mode tensor is the identity for ∗-products introduced above

$$ \mathbb{C}^{N_{1} \times N_{1} \times M \times M} \ni (\boldsymbol{\mathcal{I}}_{*})_{i_{1},i_{2},: , :}:= \left \{\begin{array}{ll} I_{M}, & \text{ if } i_{1}=i_{2} \\ 0_{M}, & \text{otherwise} \end{array} \right. . $$

Definition 3 (Hypervector inner-product)

The product \((B^{D}*A) \in \mathbb {C}^{M\times M}\) is defined as

$$ (B^{D}*A)_{:,:}:= {\sum}_{k=1}^{N_{1}}{B^{D}_{k,:,:} A_{k,:,: }}. $$

Definition 4 (Tensor-matrix product)

The products \((\boldsymbol {\mathcal {A}} \times \boldsymbol {\alpha }), (\boldsymbol {\alpha } \times \boldsymbol {\mathcal {A}} ) \in \mathbb {C}^{N_{1} \times N_{2} \times M \times M }\) are defined as

$$ (\boldsymbol{\mathcal{A}} \times \boldsymbol {\alpha})_{i_{1},i_{2},:,:}:= {\boldsymbol{\mathcal{A}}_{i_{1},i_{2},:,: }} \boldsymbol {\alpha} \quad \text{ and } \quad (\boldsymbol {\alpha} \times \boldsymbol{\mathcal{A}} )_{i_{1},i_{2},:,:}:= {\boldsymbol {\alpha} \boldsymbol{\mathcal{A}}_{i_{1},i_{2},:,: }}. $$

Definition 5 (Hypervector-matrix product)

The products \(({A} \times \boldsymbol {\alpha }), (\boldsymbol {\alpha } \times {A} ) \in \mathbb {C}^{N_{2} \times M \times M }\) are defined as

$$ ({A} \times \boldsymbol {\alpha})_{i_{1},:,:}:= {{A}_{i_{1},:,: }}\boldsymbol {\alpha} \quad \text{ and } \quad (\boldsymbol {\alpha} \times {A} )_{i_{1},:,:}:= {\boldsymbol {\alpha}{A}_{i_{1},:,: }}. $$

Definition 6 (Vector-to-Hypervector)

Given \(\mathbf {a} \in \mathbb {C}^{N}\) we define the product \( A=\mathbf {a}\otimes I_{M} \in \mathbb {C}^{N \times M \times M}\) as

$$A_{i_{1},:,:}=\mathbf{a}_{i_{1}}I_{M} \quad i_{1} \in \{1,\dots,N\}.$$

Note that, rearranging A as a block matrix, we recover the usual Kronecker product. All the products are clearly distributive with respect to the usual addition. On the other hand, the associativity of some of the products is less obvious. Therefore, we state it in the following Lemma 1, postponing its proof to Appendix A.

Lemma 1

The tensor-tensor and tensor-hypervector ∗-products are associative. In particular:

  • Given \(\boldsymbol {\mathcal {A}} \in \mathbb {C}^{N_{1} \times N_{1} \times M \times M}\), \(A \in \mathbb {C}^{N_{1} \times M \times M}\) we have

    $$ (\boldsymbol{\mathcal{A}}*\boldsymbol{\mathcal{A}})*{A}= \boldsymbol{\mathcal{A}}*(\boldsymbol{\mathcal{A}}*{A}){.} $$
  • Given \(B^{D} \in \mathbb {C}^{N_{1} \times M \times M}\), \({A} \in \mathbb {C}^{N_{2} \times M \times M }\), \(\boldsymbol {\mathcal {A}} \in \mathbb {C}^{N_{1}\times N_{2} \times M \times M }\), then

    $$ (B^{D} * \boldsymbol{\mathcal{A}})^{D} *A=B^{D} * (\boldsymbol{\mathcal{A}} *A). $$
  • Given \(\boldsymbol {\mathcal {A}} \in \mathbb {C}^{N_{1}\times N_{2} \times M \times M }, \boldsymbol {{\mathscr{B}}} \in \mathbb {C}^{N_{2} \times N_{3} \times M \times M }, \boldsymbol {\mathcal {C}} \in \mathbb {C}^{N_{3}\times N_{1} \times M \times M }\), then

    $$ (\boldsymbol{\mathcal{C}}*\boldsymbol{\mathcal{A}})*\boldsymbol{\mathcal{B}}=\boldsymbol{\mathcal{C}}*(\boldsymbol{\mathcal{A}}*\boldsymbol{\mathcal{B}}). $$

Having introduced the required products and their basic properties, we are ready to derive the tensor non-Hermitian Lanczos algorithm.
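Before doing so, we give a minimal NumPy sketch of the ∗-products of Definitions 1–3 and 6, together with a numerical spot-check of the first statement of Lemma 1 (the function names are ours and the data are random; this is an illustration, not library code).

```python
import numpy as np

def star(A, B):
    """*-product of a 4-mode tensor A (N1 x N2 x M x M) with either a
    4-mode tensor B (N2 x N3 x M x M) or a 3-mode hypervector B (N2 x M x M)."""
    if B.ndim == 4:                                   # Definition 1
        return np.einsum('ikab,kjbc->ijac', A, B)
    return np.einsum('ikab,kbc->iac', A, B)           # Definition 2

def star_left(Bd, A):
    """Action of a dual hypervector B^D (N1 x M x M) from the left."""
    if A.ndim == 4:                                   # (B^D * A)^D
        return np.einsum('kab,kjbc->jac', Bd, A)
    return np.einsum('kab,kbc->ac', Bd, A)            # Definition 3 (inner product)

def vec_to_hyper(a, M):
    """Definition 6: a in C^N  ->  A = a (x) I_M in C^{N x M x M}."""
    return np.einsum('i,ab->iab', a, np.eye(M))

# spot-check of associativity (first item of Lemma 1) on random data
N, M = 4, 3
rng = np.random.default_rng(0)
A4 = rng.standard_normal((N, N, M, M)) + 1j * rng.standard_normal((N, N, M, M))
V = vec_to_hyper(rng.standard_normal(N), M)
assert np.allclose(star(star(A4, A4), V), star(A4, star(A4, V)))
```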

3 The Lanczos-type process

Using the operations given in Table 2, we propose a sensible generalization of Krylov subspaces where, instead of the usual matrix-vector product, the tensor-hypervector product is used to generate the subspaces. Section 3.1 describes these tensor Krylov-type subspaces in detail and defines biorthogonal bases for them. A Lanczos-type algorithm which generates these biorthogonal bases is proposed in Section 3.2. In Section 3.3 two important properties of the classical Lanczos algorithm are generalized, namely, the tensor representation of the three-term recurrence relations for the biorthogonal bases and the matching moment property. The computational cost and storage requirements of the algorithm are discussed in Section 3.4.

3.1 Krylov-type tensor subspaces

Consider the tensor \(\boldsymbol {\mathcal {A}}\in \mathbb {C}^{N \times N \times M \times M}\). We define the polynomials of degree \(\ell\) of \(\boldsymbol {\mathcal {A}}\) as

$$ \begin{array}{@{}rcl@{}} p(\boldsymbol{\mathcal{A}}) &:= \sum\limits_{k=0}^{{\ell}} \boldsymbol{\mathcal{A}}^{k_{*}} \times \boldsymbol {\alpha}_{k} ,\\ p^{D}(\boldsymbol{\mathcal{A}}) &:= \sum\limits_{k=0}^{{\ell}} \boldsymbol {\alpha}_{k}^{H} \times \boldsymbol{\mathcal{A}}^{k_{*}} , \end{array} $$

where \(\boldsymbol {\mathcal {A}}^{k_{*}}\) stands for k ∗-multiplications of \(\boldsymbol {\mathcal {A}}\) by itself, and \(\boldsymbol {\alpha }_{k}^{H}\) is the conjugate transpose of \(\boldsymbol{\alpha}_{k}\). Given the tensors \({A}\in \mathbb {C}^{N \times M \times M}\), \(B\in \mathbb {C}^{N \times M \times M}\) we can define the Krylov-type subspaces

$$ \begin{array}{@{}rcl@{}} \mathcal{K}_{n}(\boldsymbol{\mathcal{A}},A)&:=& \{p(\boldsymbol{\mathcal{A}})*A \text{ s.t. } deg(p) \leq n-1 \},\\ \mathcal{K}_{n}^{D}(B^{D},\boldsymbol{\mathcal{A}})&:=& \{B^{D} * p^{D}(\boldsymbol{\mathcal{A}}) \text{ s.t. } deg(p^{D}) \leq n-1 \}. \end{array} $$

Every element in \(\mathcal {K}_{n}(\boldsymbol {\mathcal {A}},A)\) is a tensor in \(\mathbb {C}^{N \times M \times M}\) and can be written as

$$ p(\boldsymbol{\mathcal{A}})* A = {\sum}_{k=0}^{n-1} (\boldsymbol{\mathcal{A}}^{k_{*}}\times \boldsymbol {\alpha}_{k} ) *A. $$

From now on we will assume that A is of the form \(A=\mathbf{a}\otimes I_{M}\) for some \(\mathbf {a} \in \mathbb {C}^{N}\). In this case, the matrices αk commute with A, giving

$$ p(\boldsymbol{\mathcal{A}})* A = {\sum}_{k=0}^{n-1} (\boldsymbol{\mathcal{A}}^{k_{*}} *A)\times \boldsymbol {\alpha}_{k}. $$

An analogous result holds for \(B^{D}*p^{D}(\boldsymbol {\mathcal {A}})\) when \(B^D=\overline {\mathbf {b}} \otimes I_M\) for any \(\mathbf {b} \in \mathbb {C}^{N}\), where \(\overline {\mathbf {b}}\) denotes the entrywise conjugate of \(\mathbf{b}\).

Driven by the analogy with the matrix case, our aim is to build two “biorthonormal bases” for the Krylov-type subspaces \(\mathcal {K}_{n}(\boldsymbol {\mathcal {A}},A)\) and \( \mathcal {K}_{n}^{D}(B^{D}, \boldsymbol {\mathcal {A}})\). The following Definition 7 allows us to characterize spaces spanned by 3-mode tensors.

Definition 7

Given \(V_{1},\dots ,V_{n} \in \mathbb {C}^{N \times M \times M}\), \({W_{1}^{D}},\dots ,{W^{D}_{n}} \in \mathbb {C}^{N \times M \times M}\), we define the subspaces

$$ \begin{array}{@{}rcl@{}} \langle V_{1},\dots, V_{n} \rangle&:=& \left\{V= \sum\limits_{{k}=1}^{n} V_{{k}} \times \boldsymbol{\eta}_{{k}}, \textrm{ for } \boldsymbol{\eta}_{1},\dots, \boldsymbol{\eta}_{n} \in \mathbb{C}^{M \times M}\right\}; \\ \langle {W_{1}^{D}},\dots, {W_{n}^{D}} \rangle&:=& \left\{W^{D}= \sum\limits_{{k}=1}^{n} \boldsymbol{\eta}_{{k}} \times W^{D}_{{k}}, \textrm{ for } \boldsymbol{\eta}_{1},\dots, \boldsymbol{\eta}_{n} \in \mathbb{C}^{M \times M} \right\}. \end{array} $$

We say that \(V_{1},\dots , V_{n}\) is a basis for the subspace \(\langle V_{1},\dots , V_{n} \rangle \) and \({W_{1}^{D}},\dots , {W_{n}^{D}}\) is a basis for the subspace \(\langle {W_{1}^{D}},\dots , {W_{n}^{D}} \rangle \).

Biorthonormal bases for Krylov-type subspaces are represented by the tensors \(\boldsymbol {\mathcal {V}}_{n} \in \mathbb {C}^{N\times n \times M \times M }\) and \(\boldsymbol {\mathcal {W}}_{n}\in \mathbb {C}^{n\times N \times M \times M }\) satisfying

$$ \boldsymbol{\mathcal{W}}_{n}*\boldsymbol{\mathcal{V}}_{n}=\boldsymbol{\mathcal{I}}_{*} \in \mathbb{R}^{n \times n \times M \times M}, $$
(2)

with the hypervectors \(V_{k}:= (\boldsymbol {\mathcal {V}}_{n})_{:,k,:,:}\) and \({W^{D}_{k}}:= (\boldsymbol {\mathcal {W}}_{n})_{k,:,:,:}\), for \(k=1,\dots ,n\), forming, respectively, bases for \(\mathcal {K}_{n}(\boldsymbol {\mathcal {A}},A)\) and \( \mathcal {K}_{n}^{D}(B^{D}, \boldsymbol {\mathcal {A}})\), i.e.,

$$ \begin{array}{@{}rcl@{}} \langle V_{1},\dots, V_{n} \rangle = \mathcal{K}_{n}(\boldsymbol{\mathcal{A}},A), \quad \langle {W_{1}^{D}},\dots, {W_{n}^{D}} \rangle = \mathcal{K}_{n}^{D}(B^{D}, \boldsymbol{\mathcal{A}}). \end{array} $$

In the following section we derive such bases by constructing the tensor non-Hermitian Lanczos Algorithm.

3.2 The tensor Lanczos process

Given the inputs \(\boldsymbol {\mathcal {A}}\in \mathbb {C}^{N\times N \times M \times M}\) and \(\mathbf {v}, \mathbf {w} {\in \mathbb {C}^{N}}\), Algorithm 1 constructs, when no breakdown occurs, the bases \(\boldsymbol {\mathcal {V}}_{n}\) and \(\boldsymbol {\mathcal {W}}_{n}\), for \(\mathcal {K}_{n}(\boldsymbol {\mathcal {A}},A)\) and \( \mathcal {K}_{n}^{D}(B^{D}, \boldsymbol {\mathcal {A}})\), respectively, which satisfy the ∗-biorthogonality conditions (2).

Algorithm 1 (the tensor non-Hermitian Lanczos process; pseudocode given as a figure in the original)

Details on how the algorithm constructs these bases using three-term recurrences are described below.

  • By definition, the first hypervectors of the bases are \({W_{1}^{D}}, V_{1}\) satisfying \({W_{1}^{D}}*V_{1}=I_{M}\);

  • Consider the vector \(\widehat {W}^{D}_{2} \in \mathcal {K}^{D}_{2}(W^{D},\boldsymbol {\mathcal {A}})\) given by

    $$ \widehat{W}^{D}_{2}:={W_{1}^{D}}*\boldsymbol{\mathcal{A}}-\boldsymbol {\alpha}_{1} \times {W_{1}^{D}}. $$

    Imposing that \(\widehat {W}_{2}^{D}\) satisfies the ∗-biorthogonal condition \(\widehat {W}_{2}^{D}*V_{1}=0\), we have \(\boldsymbol {\alpha }_{1}={W_{1}^{D}} * \boldsymbol {\mathcal {A}} *V_{1}\).

  • Analogously, define the vector \(\widehat {V}_{2} \in \mathcal {K}_{2}(\boldsymbol {\mathcal {A}},V)\) given by

    $$ \widehat{V}_{2} := \boldsymbol{\mathcal{A}}*V_{1}-V_{1}\times \boldsymbol {\alpha}_{1}. $$

    Imposing the ∗-biorthogonality condition, we find the ∗-biorthogonal vectors

    $$ V_{2}:=\widehat{V}_{2}\times \boldsymbol {\beta}_{2}^{-1} \text{ where } \boldsymbol {\beta}_{2}=\widehat{W}_{2}^{D}*\widehat{V}_{2}=\widehat{W}_{2}^{D}*\boldsymbol{\mathcal{A}}*V_{1} \text{ and } W_{2}=\widehat{W}_{2}. $$
  • Clearly \(\mathcal {K}_{2}({\boldsymbol {\mathcal {A}},V})=\langle V_{1},V_{2} \rangle \) and \(\mathcal {K}_{2}^{D}(W^{D},{\boldsymbol {\mathcal {A}}})=\langle {W^{D}_{1}},{W^{D}_{2}} \rangle \).

  • Now, assume the ∗-biorthonormal bases \(V_{1},\dots ,V_{{k}}\) and \({W^{D}_{1}},\dots ,W^{D}_{{k}}\) are available. Consider the hypervector

    $$ \widehat{W}^{D}_{{k}+1}:={W}_{{k}}^{D}*\boldsymbol{\mathcal{A}}-{\sum}_{i=1}^{{k}}\boldsymbol{\boldsymbol{\eta}}_{i} \times {W}_{i}^{D}. $$

    The matrices ηi are determined by the conditions \(\widehat {W}_{{k}+1}^{D}*V_{i}=0\), for \(i~=~1,\dots ,~k\), which give

    $$ \boldsymbol{\eta}_{i}=W_{{k}}^{D}*\boldsymbol{\mathcal{A}}*V_{i}, \quad \text{ for } i=1,\dots,{k}. $$

    In particular, since \(\boldsymbol {\mathcal {A}}*V_{i} \in \mathcal {K}_{i+1}(\boldsymbol {\mathcal {A}},V )\), we get ηi = 0 for \(i=1,\dots ,{k}-2\). An analogous argument is valid for \(\widehat {V}_{k+1}\). This leads to the following three-term recurrences

    $$ \begin{array}{@{}rcl@{}} W_{k+1}^{D}={W_{k}^{D}}*\boldsymbol{\mathcal{A}}-\boldsymbol {\alpha}_{k}\times {W_{k}^{D}}-\boldsymbol {\beta}_{k}\times W_{k-1}^{D}, \end{array} $$
    (3a)
    $$ \begin{array}{@{}rcl@{}} V_{k+1}\times \boldsymbol {\beta}_{k+1}=\boldsymbol{\mathcal{A}} *V_{k}- V_{k}\times \boldsymbol {\alpha}_{k}-V_{k-1}, \end{array} $$
    (3b)

    with coefficients

    $$ \boldsymbol {\alpha}_{k}= {W_{k}^{D}}*\boldsymbol{\mathcal{A}}*V_{k}, \boldsymbol {\beta}_{k+1}=W_{k+1}^{D}*\boldsymbol{\mathcal{A}}*V_{k}. $$
    (4)
  • To prove that \(\langle V_{1}, \dots , V_{n} \rangle =\mathcal {K}_{n}(\boldsymbol {\mathcal {A}},V)\) and \(\langle {W_{1}^{D}}, \dots , {W_{n}^{D}} \rangle =\mathcal {K}_{n}^{D}(W^{D},\boldsymbol {\mathcal {A}})\), it is enough to use induction and the fact that \(V_{{k}} \in \mathcal {K}_{{k}}(\boldsymbol {\mathcal {A}},V)\) and \(W_{{k}}^{D} \in \mathcal {K}^{D}_{{k}}(W^{D},\boldsymbol {\mathcal {A}})\) for all \({k}=1,\dots ,n\).

Let us finally observe that, should \(\boldsymbol{\beta}_{k+1}\) not be invertible, we would get a breakdown in the algorithm.

Different rescaling strategies are possible by setting an invertible coefficient \(\boldsymbol{\gamma}_{k+1}\) and noticing that

$$ (\boldsymbol{\gamma}_{k+1})^{-1} \times W_{k+1}^{D} \ast V_{k+1} \times \boldsymbol{\gamma}_{k+1} = {I_{M}}. $$

This last observation completes the construction of Algorithm 1.
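Since the pseudocode of Algorithm 1 is given as a figure, the following NumPy sketch reconstructs the iteration from the recurrences (3a)–(3b) and the coefficients (4), with no rescaling (\(\boldsymbol{\gamma}_{k+1}=I_{M}\)) and no breakdown handling; it is meant as an illustration of the process, not as the authors' reference implementation.

```python
import numpy as np

def tensor_lanczos(A4, v, w, n):
    """Sketch of the tensor non-Hermitian Lanczos process (gamma_k = I_M).
    A4: (N, N, M, M) tensor; v, w: vectors in C^N with w^H v != 0.
    Returns alpha_1..alpha_n and beta_2..beta_n (all M x M matrices)."""
    M = A4.shape[2]
    IM = np.eye(M)
    txh   = lambda T, X: np.einsum('ikab,kbc->iac', T, X)    # tensor * hypervector
    dual  = lambda Xd, T: np.einsum('kab,kjbc->jac', Xd, T)  # (W^D * A)^D
    inner = lambda Xd, Y: np.einsum('kab,kbc->ac', Xd, Y)    # W^D * V

    V  = np.einsum('i,ab->iab', v / (w.conj() @ v), IM)      # V_1 (so that W_1^D * V_1 = I_M)
    Wd = np.einsum('i,ab->iab', w.conj(), IM)                # W_1^D
    V_old, Wd_old = np.zeros_like(V), np.zeros_like(Wd)
    alphas, betas = [], []
    for k in range(n):
        AV = txh(A4, V)
        alpha = inner(Wd, AV)                                           # (4): alpha_k
        V_hat = AV - np.einsum('iab,bc->iac', V, alpha) - V_old         # (3b), before scaling
        Wd_hat = dual(Wd, A4) - np.einsum('ab,ibc->iac', alpha, Wd) \
                 - (np.einsum('ab,ibc->iac', betas[-1], Wd_old) if betas else 0)  # (3a)
        beta = inner(Wd_hat, V_hat)                                     # (4): beta_{k+1}
        alphas.append(alpha)
        betas.append(beta)
        V_old, Wd_old = V, Wd
        V  = np.einsum('iab,bc->iac', V_hat, np.linalg.inv(beta))       # V_{k+1}
        Wd = Wd_hat                                                     # W_{k+1}^D
    return alphas, betas[:-1]
```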

3.3 Main properties of the tensor Lanczos algorithm

It is important to note that the coefficients in the three-term recurrences (3a)–(3b) can be represented by a sparse 4-mode tensor. To this aim, let us consider the tensor \(\boldsymbol {\mathcal {T}}_{n} \in \mathbb {C}^{n \times n \times M \times M}\) defined as

$$ (\boldsymbol{\mathcal{T}}_{n})_{i_{1},i_{2}, :, :}:= \left \{\begin{array}{lll} \boldsymbol {\alpha}_{i_{1}}, & \text{ if } i_{1}=i_{2} & \text{ and } 1 \leq i_{1} \leq n \\ \boldsymbol{\gamma}_{i_{1}}, & \text{ if } i_{2}=i_{1}+1 & \text{ and } 1 \leq i_{1} \leq n-1 \\ \boldsymbol {\beta}_{i_{1}}, & \text{ if } i_{2}=i_{1}-1 & \text{ and } 2 \leq i_{1} \leq n\\ \boldsymbol{0}, & \text{ otherwise } \end{array} \right. . $$
(5)

where \(\boldsymbol {\alpha }_{i_{1}}, \boldsymbol {\beta }_{i_{1}}, \boldsymbol {\gamma }_{i_{1}}\) are the matrices in Algorithm 1. The tensor \(\boldsymbol {\mathcal {T}}_{n}\) is a generalization of the so-called (complex) Jacobi matrix associated with the non-Hermitian Lanczos algorithm; see, e.g., [45] and references therein. By using \(\boldsymbol {\mathcal {T}}_{n}\), Theorem 1 provides a compact representation of the three-term recurrences constructing the biorthogonal bases.
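Before stating Theorem 1, we note that \(\boldsymbol{\mathcal{T}}_{n}\) is straightforward to assemble from the coefficients returned by a run of the process; the following small helper (ours, shown only to fix the indexing, under the no-rescaling convention \(\boldsymbol{\gamma}_{k}=I_{M}\)) does exactly that.

```python
import numpy as np

def assemble_Tn(alphas, betas, M):
    """Assemble the tridiagonal 4-mode tensor T_n of (5) with gamma_k = I_M:
    alphas = [alpha_1..alpha_n] on the diagonal, betas = [beta_2..beta_n] on
    the subdiagonal, identity matrices on the superdiagonal."""
    n = len(alphas)
    Tn = np.zeros((n, n, M, M), dtype=complex)
    for i in range(n):
        Tn[i, i] = alphas[i]
        if i < n - 1:
            Tn[i, i + 1] = np.eye(M)      # gamma = I_M (no rescaling)
            Tn[i + 1, i] = betas[i]       # beta_{i+2}
    return Tn
```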

Theorem 1

The three-term recurrences (3a)–(3b) can be written in the compact form

$$ \begin{array}{@{}rcl@{}} \boldsymbol{\mathcal{A}}*\boldsymbol{\mathcal{V}}_{n}=\boldsymbol{\mathcal{V}}_{n}*\boldsymbol{\mathcal{T}}_{n}+ \widetilde{\boldsymbol{\mathcal{V}}}_{n} \end{array} $$
(6a)
$$ \begin{array}{@{}rcl@{}} \boldsymbol{\mathcal{W}}_{n}*\boldsymbol{\mathcal{A}}=\boldsymbol{\mathcal{T}}_{n}*\boldsymbol{\mathcal{W}}_{n}+\widetilde{\boldsymbol{\mathcal{W}}}_{n} \end{array} $$
(6b)

where \( \widetilde {\boldsymbol {\mathcal {V}}}_{n} \in \mathbb {C}^{N \times n \times M \times M}\) is

$$ (\widetilde{\boldsymbol{\mathcal{V}}}_{n})_{:,k,:,:} := \left \{\begin{array}{ll} V_{n+1} \times \boldsymbol {\beta}_{n+1}, & \text{ if } k=n \\ \boldsymbol{0}, & \text{ otherwise } \end{array} \right., $$

and \(\widetilde {\boldsymbol {\mathcal {W}}}_{n} \in \mathbb {C}^{n \times N \times M \times M}\) is

$$ (\widetilde{\boldsymbol{\mathcal{W}}}_{n})_{k,:,:,:} := \left \{\begin{array}{ll} \boldsymbol{\gamma}_{n+1} \times W^{D}_{n+1}, & \text{ if } k=n \\ \boldsymbol{0}, & \text{ otherwise } \end{array} \right. . $$

Proof

By direct inspection. We have, for all \(i_{1} \in \{1,\dots ,N \}\), \(i_{2} \in \{1,\dots ,n-1\}\)

$$ (\boldsymbol{\mathcal{A}}*\boldsymbol{\mathcal{V}}_{n})_{i_{1},i_{2},:,:}={\sum}_{k=1}^{N}\boldsymbol{\mathcal{A}}_{i_{1},k,:,:}(\boldsymbol{\mathcal{V}}_{n})_{k,i_{2},:,:}={\sum}_{k=1}^{N}\boldsymbol{\mathcal{A}}_{i_{1},k,:,:}({V}_{i_{2}})_{k,:,:}=(\boldsymbol{\mathcal{A}}*{V}_{i_{2}})_{i_{1},:,:} $$
(7)

and

$$ \begin{array}{llllll} (\boldsymbol{\mathcal{V}}_{n}*\boldsymbol{\mathcal{T}}_{n}+ \widetilde{\boldsymbol{\mathcal{V}}}_{n})_{i_{1},i_{2},:,:}&= \sum\limits_{k=1}^{n}(\boldsymbol{\mathcal{V}}_{n})_{i_{1},k,:,:}(\boldsymbol{\mathcal{T}}_{n})_{k,i_{2},:,:} \\ & = (V_{i_{2}})_{i_{1}}\boldsymbol {\alpha}_{i_{2}}+(V_{i_{2}+1})_{i_{1}} \boldsymbol {\beta}_{i_{2}+1}+(V_{i_{2}-1})_{i_{1}} \\ & = (V_{i_{2}}\times \boldsymbol {\alpha}_{i_{2}}+V_{i_{2}+1}\times \boldsymbol {\beta}_{i_{2}+1}+V_{i_{2}-1})_{i_{1}} . \end{array} $$
(8)

The equality between (7) and (8) follows using (3b) and proves (6a). The remaining part of the theorem can be proved analogously. □

If we ∗-multiply (6a) by \(\boldsymbol {\mathcal {W}_{n}}\) from the left and use the ∗-biorthogonality (2), noting that \(\boldsymbol{\mathcal{W}}_{n}*\widetilde{\boldsymbol{\mathcal{V}}}_{n}=\boldsymbol{0}\), we obtain the expression

$$ \boldsymbol{\mathcal{T}}_{n}=\boldsymbol{\mathcal{W}}_{n}* \boldsymbol{\mathcal{A}}*\boldsymbol{\mathcal{V}}_{n}. $$

The tensor \(\boldsymbol {\mathcal {T}}_{n}\) satisfies a generalization of the matching moment property which is stated in Theorem 2.

Theorem 2 (Matching Moment Property)

Let \(\boldsymbol {\mathcal {A}}, V, W\) and \(\boldsymbol {\mathcal {T}}_{n}\) be as described above, then

$$ W^{D} * (\boldsymbol{\mathcal{A}}^{k_{*}}) * V = {E_{1}^{D}} * (\boldsymbol{\mathcal{T}}_{n})^{k_{*}}* E_{1}, \quad \text{ for } \quad k=0,\dots, 2n-1, $$

where \(E_{1} = \mathbf{e}_{1}\otimes I_{M}\) and \(\mathbf{e}_{1}\) is the first vector of the canonical basis of \(\mathbb {C}^{n}\).

The proof of Theorem 2 can be found in Appendix A.
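As an illustration, the matching moment property can be checked numerically with the helpers sketched above (a sanity check under our no-rescaling convention, not a substitute for the proof). Here Wd1 and V1 denote the normalized starting hypervectors \(W_{1}^{D}\) and \(V_{1}\).

```python
import numpy as np

def moment(T, Xd, X, k):
    """Compute X^D * (T^{k_*}) * X for a 4-mode tensor T and hypervectors X^D, X."""
    Y = X
    for _ in range(k):
        Y = np.einsum('ikab,kbc->iac', T, Y)      # T * (T^{j_*} * X)
    return np.einsum('kab,kbc->ac', Xd, Y)        # X^D * (...)

def check_matching_moments(A4, Tn, Wd1, V1, n):
    """Compare W^D * A^{k_*} * V with E_1^D * T_n^{k_*} * E_1 for k = 0..2n-1."""
    M = A4.shape[2]
    E1 = np.einsum('i,ab->iab', np.eye(Tn.shape[0])[:, 0], np.eye(M))   # e_1 (x) I_M
    for k in range(2 * n):
        lhs = moment(A4, Wd1, V1, k)
        rhs = moment(Tn, E1, E1, k)
        print(k, np.linalg.norm(lhs - rhs) / max(np.linalg.norm(lhs), 1e-300))
```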

3.4 Numerical properties

The tensor \(\boldsymbol {\mathcal {A}}\) is obtained by discretizing A(t) and stores in \(\boldsymbol {\mathcal {A}}_{k,l,:,:}\) the coefficients representing the (k,l)-th element of A(t). Different methods of discretization are possible. In this paper, following [23], we discretize the interval I obtaining the mesh

$$ \tau_{i} = h (i-1) + a, \quad i = 1,\dots,M, \quad h = \frac{b-a}{M-1}. $$
(9)

For this mesh the discretization of \(\boldsymbol {\mathsf {A}}(t) = \left [\boldsymbol {\mathsf {A}}_{k,\ell }(t) \right ]_{k,\ell }^{N}\) is the tensor

$$ \boldsymbol{\mathcal{A}}_{k,\ell,:,:} := \boldsymbol{\nu}_{k,\ell}, \quad k,\ell =1,\dots,N, $$
(10)

where the matrices \(\boldsymbol {\nu }_{k,\ell }\in \mathbb {C}^{M \times M}\) are lower triangular matrices defined as

$$ \left( \boldsymbol{\nu}_{k,\ell} \right)_{i,j} = \left\{\begin{array}{ll} \boldsymbol{\mathsf{A}}_{k,\ell}(\tau_{i}) h, & i \geq j \\ 0 & i < j \end{array}\right . . $$

This discretization scheme has an accuracy of order \(\mathcal {O}(h) = \mathcal {O}(1/M)\) and, indeed, in Section 5, we show that when this discretization scheme is used, the approximation of the bilinear form of interest also has an accuracy of \(\mathcal {O}(1/M)\).
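A small sketch of this discretization, building the 4-mode tensor of (10) on the mesh (9) from a function handle returning \(\boldsymbol{\mathsf{A}}(t)\) (the helper name is ours):

```python
import numpy as np

def discretize(A_fun, a, b, M):
    """Build the 4-mode tensor (10) on the mesh (9) from A_fun(t), which
    returns the N x N matrix A(t); first-order accurate, O(h) = O(1/M)."""
    h = (b - a) / (M - 1)
    tau = a + h * np.arange(M)                      # tau_i = a + h (i - 1)
    samples = np.stack([A_fun(t) for t in tau])     # shape (M, N, N)
    lower = np.tril(np.ones((M, M)))                # 1 if i >= j, else 0
    # (nu_{k,l})_{i,j} = A_{k,l}(tau_i) * h  for i >= j, and 0 otherwise
    A4 = h * np.einsum('ikl,ij->klij', samples, lower)
    return A4, tau
```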

The computational cost of Algorithm 1 depends on the chosen number of discretization points M and on the number of iterations n. In this algorithm the dominant cost is the multiplication of a 4-mode tensor with a 3-mode tensor, i.e., \(\boldsymbol {\mathcal {A}}\ast V_{k}\) and \({W_{k}^{D}}\ast \boldsymbol {\mathcal {A}}\). The worst case complexity of one such product is \(\mathcal {O}(M^{3} N^{2})\), for a total cost of \(\mathcal {O}(2 n M^{3} N^{2})\). However, since A(t) is sparse in all practical applications, the computational cost can be much lower. For example, if there are Nnz < N nonzeros in each column of A(t), then the cost reduces to \(\mathcal {O}(2 n M^{3} N N_{\text {nz}} )\). It is important to note, moreover, that the term M3 arises from the matrix-matrix multiplication between Vk and the blocks in \(\boldsymbol {\mathcal {A}}\). Since these blocks arise from a discretization strategy, it is likely that they will exhibit a particular structure that can be exploited for efficient computations. E.g., in the discretization used in this work, these blocks are lower triangular matrices for which the matrix-matrix multiplication has a cost of \(\frac {M^{3}}{2}\).

Finally, the storage cost of Algorithm 1 amounts to three basis hypervectors Vi, three basis hypervectors \(W^{D}_{i}\), and the 3n − 1 nonzero blocks of \(\boldsymbol {\mathcal {T}_{n}}\), for a total of \(\mathcal {O}(6M^{2} N+ 3M^{2} n)\). Only three basis hypervectors per sequence must be kept in memory thanks to the underlying three-term recurrence relation.

Let us conclude this section by observing that, as highlighted by the previous discussion, both the computational cost and the storage requirements depend strongly on the number M of discretization points used. For the discretization scheme described above, we expect that a large number of discretization points is required since its accuracy is \(\mathcal {O}(1/M)\). This justifies the search for more accurate discretization schemes, for example Legendre polynomial approximation. However, other discretization schemes will not be discussed here since they are the subject of future research and since the discretization scheme introduced above suffices to illustrate the potential of Algorithm 1.

4 Breakdowns

If the matrix \(\boldsymbol{\beta}_{k+1}\) is singular, then line 11 in Algorithm 1 cannot be performed and the algorithm breaks down. This breakdown issue is inherited from the (usual) non-Hermitian Lanczos algorithm; see, e.g., [19, 27, 28, 43, 52]. There are two different kinds of breakdowns. The first one, the so-called lucky breakdown, occurs when one of the Krylov-type subspaces \(\mathcal {K}_{k}(\boldsymbol {\mathcal {A}},A)\) or \(\mathcal {K}_{k}^{D}(B^{D},\boldsymbol {\mathcal {A}})\) becomes invariant under ∗-multiplication with \(\boldsymbol{\mathcal {A}}\) from the left or right, respectively. Suppose that \(\mathcal {K}_{k}(\boldsymbol {\mathcal {A}},A)\) is an invariant subspace; this results, in exact arithmetic, in \(\widehat {V}_{k+1} = \boldsymbol {0}\) in Line 5 of Algorithm 1. In finite precision, \(\widehat {V}_{k+1}\in \mathbb {C}^{N\times M \times M}\) will never be exactly zero. Therefore, the Frobenius norm

$$ \Vert \widehat{V}_{k+1}\Vert_{F} := \left( {\sum}_{i=1}^{N} {\sum}_{j=1}^{M} {\sum}_{{\ell=1}}^{M} \left\vert\left( \widehat{V}_{k+1}\right)_{i,j,{\ell}}\right\vert^{2} \right)^{1/2} $$

is used to define the following criterion to detect a lucky breakdown:

$$ \frac{\Vert \widehat{V}_{k+1}\Vert_{F}}{\Vert{V}_{k}\Vert_{F}} < \epsilon, $$

with \(\epsilon \ll 1\) a user-defined threshold close to machine precision. The same applies to the case \(\widehat {W}_{k+1}^{D} = \boldsymbol {0}\). The second kind of breakdown occurs when both \(\widehat {V}_{k+1} \neq \boldsymbol {0}\) and \(\widehat {W}_{k+1}^{D} \neq \boldsymbol {0}\), but \(\boldsymbol {\beta }_{k+1}\in \mathbb {C}^{M\times M}\) is still singular; then the algorithm breaks down. This case is known as a serious breakdown. In numerical computations, the condition number of \(\boldsymbol{\beta}_{k+1}\) is monitored to decide if Line 11 can be computed sufficiently accurately. A user-defined threshold \(\epsilon_{s} \gg 1\) specifies an upper bound on the allowed condition number of \(\boldsymbol{\beta}_{k+1}\). That is, if the ratio of its largest and smallest singular values exceeds \(\epsilon_{s}\), i.e., \(\sigma _{\max \limits }(\boldsymbol {\beta }_{k+1})/\sigma _{\min \limits }(\boldsymbol {\beta }_{k+1}) > \epsilon _{s}\), then the algorithm breaks down. Note that the choice of \(\boldsymbol{\gamma}_{k+1}\) will influence the condition number of \(\boldsymbol{\beta}_{k+1}\). A sketch of both tests is given below.
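The following is a minimal sketch of the two tests; the threshold values are illustrative user choices, not values prescribed by the paper.

```python
import numpy as np

def detect_breakdown(V_hat, V_prev, beta, eps=1e-14, eps_s=1e12):
    """Return 'lucky' if the unnormalized hypervector V_hat is negligible
    relative to V_prev, 'serious' if beta_{k+1} is numerically singular,
    and None otherwise."""
    if np.linalg.norm(V_hat) < eps * np.linalg.norm(V_prev):   # Frobenius norms
        return 'lucky'
    s = np.linalg.svd(beta, compute_uv=False)
    if s[0] / max(s[-1], np.finfo(float).tiny) > eps_s:        # condition number test
        return 'serious'
    return None
```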

In the usual non-Hermitian Lanczos algorithm, a serious breakdown can be treated by using a so-called look-ahead strategy; see, e.g., [8, 9, 19, 27, 28, 43, 51]. Connections between serious breakdowns, (formal) orthogonal polynomials, and the matching moment property can be found in [16, 44]. If needed, an analogous look-ahead strategy may be implemented for the tensor Lanczos algorithm. At the moment, an easier strategy to deal with serious breakdowns is to reformulate the problem so as to change the input vectors v,w. For instance, when w = ei and v = ej, a serious breakdown is likely to happen due to the sparsity of \(\boldsymbol {\mathcal {A}}\). However, we can rewrite the bilinear form of the time-ordered exponential U(t) as

$$\boldsymbol{e}_{i}^{H}\boldsymbol{\mathsf{U}}(t)\boldsymbol{e}_{j} = (\boldsymbol{e} + \boldsymbol{e}_{i})^{H} \boldsymbol{\mathsf{U}}(t) \boldsymbol{e}_{j} - \boldsymbol{e}^{H} \boldsymbol{\mathsf{U}}(t) \boldsymbol{e}_{j}, $$

with \(\boldsymbol {e} = (1, \dots , 1)^{H}\). Then one can approximate \((\boldsymbol {e} + \boldsymbol {e}_{i})^{H} \boldsymbol {\mathsf {U}}(t) \boldsymbol {e}_{j}\) and \(\boldsymbol {e}^{H} \boldsymbol {\mathsf {U}}(t) \boldsymbol {e}_{j}\) separately, which are less likely to suffer a breakdown thanks to the fact that e is a full vector; see, e.g., [25, Section 7.3] and [23].

5 Numerical examples

Let us consider the following smooth matrix-valued function defined on a real interval I = [a,b]:

$$ \boldsymbol{\mathsf{A}}(t): I \subset \mathbb{R} \rightarrow \mathbb{C}^{N \times N}. $$

As anticipated in the Introduction, the time-ordered exponential of A(t) is the unique matrix-valued function \(\boldsymbol {\mathsf {U}}(t) \in \mathbb {C}^{N\times N}\) defined on I that is the solution of the system of linear ordinary differential equations

$$ \frac{d}{dt}\boldsymbol{\mathsf{U}}(t) = \boldsymbol{\mathsf{A}}(t) \boldsymbol{\mathsf{U}}(t), \quad \boldsymbol{\mathsf{U}}(a)=I_{N}, \quad t \in I, $$

see [17]. In this section, we aim to approximate the bilinear form \(\mathbf{w}^{H}\boldsymbol{\mathsf{U}}(t)\mathbf{v}\) by using the tensor non-Hermitian Lanczos algorithm. If the matrix-valued function \(\boldsymbol{\mathsf{A}}(t)\) is such that \(\boldsymbol{\mathsf{A}}(\tau_{1})\boldsymbol{\mathsf{A}}(\tau_{2}) - \boldsymbol{\mathsf{A}}(\tau_{2})\boldsymbol{\mathsf{A}}(\tau_{1}) = 0\) for all \(\tau_{1},\tau_{2}\in I\), then \(\boldsymbol {\mathsf {U}}(t)=\exp \left ({{\int \limits }_{a}^{t}} \boldsymbol {\mathsf {A}}(\tau ) \mathrm {d}\tau \right ).\) Unfortunately, \(\boldsymbol{\mathsf{U}}(t)\), and the related bilinear forms, cannot be expressed by an analogously simple formula in the general case. Indeed, even for small matrices, \(\boldsymbol{\mathsf{U}}(t)\) may be given by complicated special functions [32, 53].
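In the commuting case just described, a reference solution is cheap to obtain; the sketch below (our own helper, used only for comparison purposes) evaluates \(\boldsymbol{\mathsf{U}}(t)=\exp(\int_{a}^{t}\boldsymbol{\mathsf{A}}(\tau)\,\mathrm{d}\tau)\) entrywise with a quadrature followed by a matrix exponential.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

def U_commuting(A_fun, a, t):
    """U(t) = exp(int_a^t A(tau) dtau); valid only when A(t) commutes with
    itself at all times (e.g., constant or diagonal A(t))."""
    N = A_fun(a).shape[0]
    integral = np.empty((N, N), dtype=complex)
    for k in range(N):
        for l in range(N):
            re = quad(lambda s: A_fun(s)[k, l].real, a, t)[0]
            im = quad(lambda s: A_fun(s)[k, l].imag, a, t)[0]
            integral[k, l] = re + 1j * im
    return expm(integral)
```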

A new approach for the approximation of a time-ordered exponential bilinear form was introduced in [22,23,24] and it is based on ⋆-Lanczos, which is a symbolic algorithm. This method is able to approximate the bilinear form

$$\mathbf{w}^{H} \boldsymbol{\mathsf{U}}(t) \mathbf{v}, \quad t \in I$$

for the given vectors w,v, with \(\mathbf{w}^{H}\mathbf{v}\neq 0\). The matrices \(\boldsymbol {\alpha }_{1},\dots ,\boldsymbol {\alpha }_{n}\), \(\boldsymbol {\beta }_{2},\dots ,\boldsymbol {\beta }_{n}\), and \(\boldsymbol {\gamma }_{2},\dots ,\boldsymbol {\gamma }_{n}\), which compose the 4-mode tensor \(\boldsymbol {\mathcal {T}}_{n}\) in (5), are obtained by running n iterations of Algorithm 1 with, as inputs, the 4-mode tensor \(\boldsymbol {\mathcal {A}}\) in (10) and the vectors v,w.

Sampling the true solution \(\mathbf{w}^{H}\boldsymbol{\mathsf{U}}(t)\mathbf{v}\) on the discretization nodes τi gives the vector \(\hat {\mathbf {s}}\) defined as

$$ {\hat{\mathbf{s}}:= \begin{bmatrix} \mathbf{w}^{H} \boldsymbol{\mathsf{U}}(\tau_{1}) \mathbf{v} & \mathbf{w}^{H} \boldsymbol{\mathsf{U}}(\tau_{2}) \mathbf{v} & {\dots} & \mathbf{w}^{H} \boldsymbol{\mathsf{U}}(\tau_{M}) \mathbf{v} \end{bmatrix}^{\top}.} $$

Exploiting the results described in [23], the sampled solution vector \(\hat {\mathbf {s}}\) can be approximated by

$$ {\mathbf{s}_{n}} := \frac{1}{h}\left( \boldsymbol{\theta} \times \left( R_{\ast}(\boldsymbol{\mathcal{T}}_{n})\right)_{1,1,:,:}\right) \mathbf{e}_{1} \approx \hat{\mathbf{s}}, $$
(11)

where \(R_{\ast}\) is the ∗-resolvent, i.e., the tensor

$$ R_{\ast}(\boldsymbol{\mathcal{T}}_{n}) := \boldsymbol{\mathcal{I}}_{\ast} + {\sum}_{k=1}^{\infty} \left( \boldsymbol{\mathcal{T}}_{n} \right)^{k_{\ast}}, $$

and

$$ \boldsymbol{\theta} := h \begin{bmatrix} 1 & 0& 0 &{\dots} & 0\\ 1 & 1 & 0 & {\dots} & 0\\ {\vdots} & {\vdots} & {\ddots} & {\ddots} & \vdots\\ 1 & 1 & {\dots} & 1 & 0\\ 1 & 1 & {\dots} & 1 & 1 \end{bmatrix} \in \mathbb{C}^{M\times M}. $$

Overall, the accuracy of the approximation in (11) cannot be better than \(\mathcal {O}(h)\). This is due to the fact that, as explained in [23], the discretization (10) is based on a rectangular quadrature rule. Finally, using the Path-sum method [21] we get the following explicit expression for the ∗-resolvent in terms of a continued fraction

$$ \begin{array}{llllll} &R_{\ast}(\boldsymbol{\mathcal{T}_{n}})_{1,1,:,:} = \\ & \left( \widetilde{\boldsymbol {\alpha}}_{1} - \boldsymbol {\beta}_{2} \left( \widetilde{\boldsymbol {\alpha}}_{2} - \boldsymbol {\beta}_{3} \left( {\cdots} \boldsymbol {\beta}_{n-1} \widetilde{\boldsymbol {\alpha}}_{n}^{-1} \boldsymbol{\gamma}_{n-1} {\cdots} \right)^{-1} \boldsymbol{\gamma}_{3} \right)^{-1} \boldsymbol{\gamma}_{2} \right)^{-1}, \end{array} $$
(12)

with \(\widetilde {\boldsymbol {\alpha }}_{i} = I_{{M}} - \boldsymbol {\alpha }_{i}\). Equation (12) is computed from the innermost term moving outward, where the inversion operation is performed using the backslash operator in Matlab®. Note that the ∗-resolvent and all inverses appearing in (12) are expected to exist for h small enough, since their continuous counterparts exist under certain regularity conditions on A(t); see [22, 24].

The rest of the section is structured as follows. Section 5.1 describes the measures that will be used to quantify the errors of the final solution and of the computed biorthonormal bases for Krylov subspaces. In Section 5.2 two examples are discussed for which an analytical solution is available. This allows us to compare the approximation to an exact solution and to show that it converges with the expected rate of convergence. Small-scale examples from NMR spectroscopy are discussed in Section 5.3. Finally, in Section 5.4, we analyze the approximability of the previously considered tensors by the Tensor Train representation.

5.1 Error measures

In this section we define a series of error measures which quantify the quality of the generated biorthogonal bases and the accuracy of the approximation (11). These measures use the Frobenius norm, which, for a 4-mode tensor, is defined as

$$ \Vert \boldsymbol{\mathcal{A}}\Vert_{F} := \left( {\sum}_{i=1}^{N} {\sum}_{j=1}^{N} {\sum}_{k=1}^{M} {\sum}_{l=1}^{M} \left\vert\left( \boldsymbol{\mathcal{A}}\right)_{i,j,k,l}\right\vert^{2} \right)^{1/2}. $$

The main goal is to analyze the rate of convergence as the number of discretization points M is increased. To stress the dependence on M of computed quantities we use the superscript “(M)”.

Generalizations of the usual error measures for Krylov subspace methods are used. As a measure for the biorthonormality of the bases \(\boldsymbol {\mathcal {V}}_{n} \in \mathbb {C}^{N\times n \times M \times M }\) and \(\boldsymbol {\mathcal {W}}_{n}\in \mathbb {C}^{n\times N \times M \times M }\) generated by n steps of the algorithm, we use

$$ \text{err}_{\mathrm{o}} := \frac{\Vert \boldsymbol{\mathcal{W}}_{n}^{(M)}*\boldsymbol{\mathcal{V}}_{n}^{(M)} - \boldsymbol{\mathcal{I}}_{*}\Vert_{F}}{\max (\Vert\boldsymbol{\mathcal{V}}_{n}^{(M)} \Vert_{F},\Vert\boldsymbol{\mathcal{W}}_{n}^{(M)} \Vert_{F})}. $$

For a robust algorithm, it is paramount that the term \({\max \limits } (\Vert \boldsymbol {\mathcal {V}}_{n}^{(M)} \Vert _{F},\Vert \boldsymbol {\mathcal {W}}_{n}^{(M)} \Vert _{F})\) remains small as n increases. This can be obtained by employing an appropriate strategy to rescale the basis hypervectors in \(\boldsymbol {\mathcal {V}}_{n}^{(M)}\) and \(\boldsymbol {\mathcal {W}}_{n}^{(M)}\), i.e., by choosing \(\boldsymbol{\gamma}_{k+1}\) in Algorithm 1. In this section we choose \(\boldsymbol{\gamma}_{k+1} = I_{M}\) for all k, i.e., no rescaling. An effective rescaling strategy can improve on the numerical results reported below, but developing such a strategy is the subject of future research.

To measure the quality of the recurrences (6a)–(6b), we use

$$ \begin{array}{@{}rcl@{}} \text{err}_{\mathrm{V}} := \frac{\Vert \boldsymbol{\mathcal{A}}*\boldsymbol{\mathcal{V}}_{n}^{(M)} - \boldsymbol{\mathcal{V}}_{n}^{(M)}*\boldsymbol{\mathcal{T}}_{n}^{(M)} - \widetilde{\boldsymbol{\mathcal{V}}}_{n}^{(M)}\Vert_{F}}{\max(\Vert \boldsymbol{\mathcal{A}}*\boldsymbol{\mathcal{V}}_{n}^{(M)} \Vert_{F}, \Vert\boldsymbol{\mathcal{V}}_{n}^{(M)}*\boldsymbol{\mathcal{T}}_{n}^{(M)} + \widetilde{\boldsymbol{\mathcal{V}}}_{n}^{(M)}\Vert_{F})},\\ \text{err}_{\mathrm{W}} :=\frac{\Vert \boldsymbol{\mathcal{W}}_{n}^{(M)}*\boldsymbol{\mathcal{A}} - \boldsymbol{\mathcal{T}}_{n}^{(M)}*\boldsymbol{\mathcal{W}}_{n}^{(M)} - \widetilde{\boldsymbol{\mathcal{W}}}_{n}^{(M)}\Vert_{F}}{\max(\Vert \boldsymbol{\mathcal{W}}_{n}^{(M)}*\boldsymbol{\mathcal{A}} \Vert_{F}, \Vert \boldsymbol{\mathcal{T}}_{n}^{(M)}*\boldsymbol{\mathcal{W}}_{n}^{(M)} + \widetilde{\boldsymbol{\mathcal{W}}}_{n}^{(M)}\Vert_{F})}. \end{array} $$

As a measure for the Matching Moment Property, see Theorem 2, we use

$$ \text{err}_{\mathrm{M}}(k) := \frac{\Vert W^{D} * (\boldsymbol{\mathcal{A}}^{k_{*}}) * V - {E_{1}^{D}} * (\boldsymbol{\mathcal{T}}_{n}^{(M)})^{k_{*}}* E_{1} \Vert_{F}}{\max(\Vert W^{D} * (\boldsymbol{\mathcal{A}}^{k_{*}}) * V \Vert_{F}, \Vert {E_{1}^{D}} * (\boldsymbol{\mathcal{T}}_{n}^{(M)})^{k_{*}}* E_{1}\Vert_{F}) }, $$

which should be close to zero for \(k=0,\dots , 2n-1\).

Finally, to quantify the quality of the solution, we consider as error measure for (11) the quantity

$$ \text{err}_{\text{sol}} := \frac{\Vert \hat{\mathbf{s}}-\mathbf{s}^{(M)}_{n} \Vert_{2}}{\Vert \hat{\mathbf{s}} \Vert_{2}}, $$

where, if no analytic expression is available, an approximation of \(\hat{\mathbf{s}}\) is obtained by using ode45 in Matlab®. In the formula above, ∥⋅∥2 stands for the usual Euclidean norm. The rate at which errsol decreases as M increases is expected to be \(\mathcal {O}(h)= \mathcal {O}(1/M)\), i.e., the accuracy of the discretization used here.
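When no analytic expression for \(\hat{\mathbf{s}}\) is available, a reference can be obtained by integrating (1) numerically; the following sketch uses SciPy's solve_ivp in place of ode45 (an assumption on our part, since the paper works in Matlab).

```python
import numpy as np
from scipy.integrate import solve_ivp

def err_sol(A_fun, w, v, a, tau, s_n):
    """Relative error between the approximation s_n of (11) and the samples
    w^H U(tau_i) v obtained by integrating dU/dt = A(t) U numerically."""
    N = len(v)
    rhs = lambda t, u: (A_fun(t) @ u.reshape(N, N)).ravel()
    sol = solve_ivp(rhs, (a, tau[-1]), np.eye(N, dtype=complex).ravel(),
                    t_eval=tau, rtol=1e-10, atol=1e-12)
    s_hat = np.array([w.conj() @ sol.y[:, i].reshape(N, N) @ v
                      for i in range(len(tau))])
    return np.linalg.norm(s_hat - s_n) / np.linalg.norm(s_hat)
```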

5.2 Proof of concept

As a proof of concept, we test our proposal on two problems which originally appeared in [23]. In both experiments a discretization with M points is used and we run n iterations of Algorithm 1 with \(\boldsymbol{\gamma}_{k+1}=I_{M}\). This produces the tensor \(\boldsymbol {\mathcal {T}}_{n}^{(M)}\) defined in (5) with coefficients \(\boldsymbol {\alpha _{1}}^{(M)},\dots , \boldsymbol {\alpha _{n}}^{(M)}\) and \(\boldsymbol {\beta _{2}}^{(M)},\dots , \boldsymbol {\beta _{n}}^{(M)}\), depending on M. For the two experiments considered here the result of the ⋆-Lanczos algorithm [23] is known. The coefficients resulting from the latter algorithm are bivariate functions \(\alpha _{1}(t,s),\dots ,\alpha _{n}(t,s)\) and \(\beta _{2}(t,s),\dots ,\beta _{n}(t,s)\), because ⋆-Lanczos is a symbolic algorithm. The tensor Lanczos algorithm is a discretization of the ⋆-Lanczos algorithm, which means that \(\boldsymbol {\alpha _{i}}^{(M)}\) and \(\boldsymbol {\beta _{i}}^{(M)}\) can be seen as discretizations of the functions αi(t,s) and βi(t,s), respectively. Consider the evaluation of these functions on the mesh τi:

$$ \boldsymbol{ \hat{\alpha}}_{i} := \begin{bmatrix} \alpha_{i}(\tau_{k},\tau_{\ell}) \end{bmatrix}_{k,\ell=1}^{M}, \qquad \boldsymbol{ \hat{\beta}}_{i} := \begin{bmatrix} \beta_{i}(\tau_{k},\tau_{\ell}) \end{bmatrix}_{k,\ell=1}^{M}, $$

then, we can define the errors \(\frac {\Vert \boldsymbol {\hat {\alpha }}_{i} - \boldsymbol {\alpha }_{i}^{(M)}\Vert _{2}}{\Vert \boldsymbol {\hat {\alpha }}_{i} \Vert _{2}}\), \(i=1,\dots ,n\), and \(\frac {\Vert \boldsymbol {\hat {\beta }}_{i} - \boldsymbol {\beta }_{i}^{(M)}\Vert _{2}}{\Vert \boldsymbol { \hat {\beta }}_{i} \Vert _{2}}\), \(i=2,\dots ,n\). These errors will be used as a measure for the accuracy of the computed tensor \(\boldsymbol {\mathcal {T}}_{n}^{(M)}\). The number of iterations n is chosen equal to the problem size N, which allows us to compare all the available functions \(\alpha _{1}(t,s),\dots , \alpha _{N}(t,s)\) and \(\beta _{2}(t,s),\dots , \beta _{N}(t,s)\) with the elements in \(\boldsymbol {\mathcal {T}}_{N}^{(M)}\) and track the convergence rate with M of the latter.

5.2.1 Time-independent matrix

Consider a constant matrix and starting vectors

$$ {\boldsymbol{\mathsf{A}}}(t) = \begin{bmatrix} -1 & \phantom{0.}1 & \phantom{0.}1\\ \phantom{0.}1 & \phantom{0.}0 & \phantom{0.}1\\ \phantom{0.}1 & \phantom{0.}1 & -1 \end{bmatrix},\qquad \mathbf{v},\mathbf{w} = \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}, $$

and the interval I = [0,1]. The inputs of Algorithm 1 are the starting hypervectors \(\mathbf{v}\otimes I_{M}, \mathbf{w}\otimes I_{M}\) and the tensor \(\boldsymbol {\mathcal {A}}\) whose components \(\boldsymbol {\mathcal {A}}_{i_{1},i_{2},:,:}\) are defined as

$$ \boldsymbol{\mathcal{A}}_{{i_{1},i_{2}},:,:} = \begin{cases} -\boldsymbol{\theta}, &\text{ if } {i_{1}=i_{2}} = 1 \text{ or } {i_{1}=i_{2}}=3\\ \boldsymbol{0}, &\text{ if } {i_{1}=i_{2}}=2 \\ \boldsymbol{\theta}, &\text{ otherwise} \end{cases}, $$

where \(\boldsymbol {0}\in \mathbb {R}^{M\times M}\) is the null matrix. The tensor \(\boldsymbol {\mathcal {A}}\) is obtained by sampling the matrix-valued function A(t) on the M point mesh (9) and following the definition in (10). The output for n = N = 3 iterations is \(\boldsymbol {\mathcal {T}}_{N}^{(M)}\). Table 3 reports the Krylov error measures; the recurrence measures behave as expected. The loss of biorthogonality observed for increasing values of M is presumably due to the fact that no rescaling is used in the algorithm.

Table 3 Krylov error measures for time-independent matrix

On the other hand, this loss of biorthogonality does not compromise the moment matching capabilities of \(\boldsymbol {\mathcal {T}}_{N}^{(M)}\), as it becomes evident from Table 4 where we report errM(k) for k ≤ 5 = 2n − 1.

Table 4 Measure for moment matching errM(k) for time-independent matrix. Entries for k = 0,1,2 are omitted since they are equal to zero

Moreover, as the values reported in Table 5 confirm, we can observe that the elements βi converge at the expected rate of \(\mathcal {O}(1/M)\). The elements αi are very accurate for M = 10 whereas for larger M, the accuracy of αi decreases: this decrease is, presumably, the result of error propagation in the numerical algorithm. This error is still smaller than the expected order of \(\mathcal {O}(1/M)\).

Table 5 Error on the elements of the tridiagonal tensor for time-independent matrix

Moreover, in this particular case, the analytical solution to the ODE is known; see [23]:

$$ \hat{\mathbf{s}} = \begin{bmatrix} \left( \exp(A\tau_{1}) \right)_{11} & \left( \exp(A\tau_{2}) \right)_{11} & {\dots} & \left( \exp(A\tau_{M}) \right)_{11} \end{bmatrix}^{\top}, $$

with \(\left (\exp (At) \right )_{11} = -\frac {1}{2} \sinh (2t) + \frac {1}{2}\cosh (2t) + \frac {1}{2} \cosh (\sqrt {2}t)\). Hence, it is possible to compare this exact solution with (11). Note that for n = 3 the ∗-resolvent is given by the continued fraction:

$$ R_{\ast}(\boldsymbol{\mathcal{T}_{3}})_{1,1,:,:} = \left( I_{M} - \boldsymbol {\alpha}_{1} - \left( I_{M} - \boldsymbol {\alpha}_{2} - \left( I_{M} - \boldsymbol {\alpha}_{3}\right)^{-1} \boldsymbol {\beta}_{3} \right)^{-1} \boldsymbol {\beta}_{2} \right)^{-1}. $$
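A small sketch of how (11) can be evaluated from such a continued fraction, generalizing the n = 3 pattern above to arbitrary n (with \(\boldsymbol{\gamma}_{k}=I_{M}\), as in these experiments), is the following.

```python
import numpy as np

def approx_bilinear_form(alphas, betas, h):
    """Evaluate s_n of (11): build R_*(T_n)_{1,1,:,:} by the continued fraction
    (innermost term first), then apply theta and e_1.
    alphas = [alpha_1..alpha_n], betas = [beta_2..beta_n], h = mesh width."""
    M = alphas[0].shape[0]
    I = np.eye(M)
    X = np.linalg.inv(I - alphas[-1])                     # innermost: (I - alpha_n)^{-1}
    for k in range(len(alphas) - 2, -1, -1):              # k = n-2, ..., 0 (0-based)
        X = np.linalg.inv(I - alphas[k] - X @ betas[k])   # (I - alpha_{k+1} - X beta_{k+2})^{-1}
    theta = h * np.tril(np.ones((M, M)))                  # the matrix theta of Section 5
    e1 = np.zeros(M)
    e1[0] = 1.0
    return (theta @ X @ e1) / h                           # s_n, cf. (11)
```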

Table 6 shows the error measure errsol for increasing M, which converges at the expected rate \(\mathcal {O}(1/M)\).

Table 6 Error of approximation to the quantity of interest wHU(t)v for time-independent matrix

5.2.2 Time-dependent matrix

Consider the time-dependent matrix

$$ \tilde{\boldsymbol{\mathsf{A}}}(t) = \begin{bmatrix} \cos(t) & 0 & 1 & 2 & 1\\ 0 & \cos(t)-t & 1-3t & t & 0\\ 0 & t & 2t+\cos(t) & 0 & 0\\ 0 & 1 & 2t+1 & t + \cos(t) & t\\ t & -t-1 & -6t-1 & 1-2t & \cos(t)-2t \end{bmatrix}, $$

the starting vectors \(\mathbf {v} = \mathbf {w} = \begin {bmatrix} 1& 0& 0& 0 & 0 \end {bmatrix}^{\top }\), and the interval \(I = [10^{-4},1]\). As becomes apparent from the results reported in Table 7, for this particular experiment the recurrence measures are small whereas the biorthogonality measure is large.

Table 7 Krylov error measures for time-dependent matrix

The loss of orthogonality is an inherent feature of Lanczos-like algorithms, and it does not necessarily compromise the algorithm’s capability to produce an approximation to the bilinear form. Indeed, Table 8 confirms that the matching moment property is not affected by the loss of ∗-biorthogonality.

Table 8 Measure for moment matching errM(k) for time-dependent matrix

On the other hand, the results presented in Table 9, where we report the error measures for the coefficients computed by Algorithm 1, show that the loss of ∗-biorthogonality of the computed bases has a limited impact on the convergence of the algorithm to the solution. Indeed, in this case, for all βi the expected convergence rate is observed whereas, for αi, only i = 1,2,3 show the expected decrease in the error measure.

Table 9 Error on the elements of the tridiagonal tensor for time-dependent matrix

Table 10 shows that the approximation to the quantity of interest converges at the rate \(\mathcal {O}(1/M)\). Hence, the loss of biorthogonality and the inaccurate coefficients of the tridiagonal tensor did not compromise this approximation.

Table 10 Error of approximation to the quantity of interest wHU(t)v for time-dependent matrix

5.3 NMR experiments

Nuclear magnetic resonance (NMR) spectroscopy studies the structure and dynamics of molecules by looking at nuclear spins [31, 39]. Computer simulations of NMR experiments are important because they can improve the design and analysis of laboratory experiments [50]. In this section, three small, realistic examples arising from NMR spectroscopy [3] are discussed. The ODE that governs the dynamics of nuclear spins during NMR spectroscopy is the Schrödinger equation

$$ \frac{d}{dt} \phi(t) = -\imath 2\pi H(t) \phi(t), \quad t\in\left[0,\tau_{\exp}\right] $$

where H(t) is the so-called Hamiltonian, ϕ(t) the wave function, \(\tau _{\exp }\) the duration of the experiment, and \(\imath = \sqrt {-1}\). The size of the Hamiltonian is related to the number of nuclear spins present in the system: for l nuclear spins, H(t) is of size \(2^{l} \times 2^{l}\). Hence, H(t) grows exponentially with the number of spins, but it is a sparse matrix, making it an ideal candidate for a Lanczos-like algorithm.

The experiments discussed in this section use M = 500 discretization points because of memory constraints. It is important to note that such memory constraints may be overcome using the Tensor Train approximation presented in Section 5.4. The number of iterations of the tensor Lanczos algorithm is chosen to obtain the maximal attainable accuracy, which is determined by the discretization scheme with M = 500. Since the discretization used here has an accuracy of \(\mathcal {O}(1/M)\), the smallest number of iterations n such that errsol is of order \(10^{-3}\) suffices. Choosing a larger n will not decrease errsol further for a fixed M and will increase the computational cost.

5.3.1 Experiment 1: Weak coupling

Consider four nuclear spins with heteronuclear dipolar couplings. In this framework, the Hamiltonian for a magic angle spinning (MAS) experiment [29] is the diagonal matrix

$$ \begin{array}{@{}rcl@{}} H(t) = \text{diag}\left[\{f_{k}(t)\}_{k=1}^{16}\right],\quad f_{k}(t) = \alpha_{k} + \beta_{k} \cos(2\pi \nu t) + \gamma_{k} \cos(4\pi \nu t), \end{array} $$

with \(\alpha _{k},\beta _{k},\gamma _{k}\in \mathbb {R}\) and \(\nu = 10^{4}\). The diagonal matrix A(t) = −ı2πH(t) commutes with itself at all times and thus the solution U(t) can be computed explicitly:

$$ \begin{array}{@{}rcl@{}} U(t) &= \text{diag}\left[\left\{\exp\left( -\imath \alpha_{k} t -\imath \frac{\beta_{k}}{2\pi\nu}\sin(2\pi \nu t)-\imath \frac{\gamma_{k}}{4\pi \nu} \sin(4\pi \nu t)\right) \right\}_{k=1}^{16}\right]. \end{array} $$

The starting vectors are chosen to excite and measure the lowest oscillatory components in U(t): \(\mathbf {w} =\mathbf {v} = \begin {bmatrix} 0 & 1 & 1 & 0 & 1 & 1& {\dots } & 0& 1& 1 \end {bmatrix}^{\top }\). A typical experiment would run for a time of the order of \(10^{-2}\) seconds. Since the problem is (highly) oscillatory and the current discretization requires many points to accurately compute a solution, we choose to restrict the experiment time to \(\tau _{\exp }=5\times 10^{-5}\). This is a valid approach since the total time interval of \(10^{-2}\) can be split into subintervals of length \(5 \times 10^{-5}\) and the solutions on the subintervals can be combined to obtain the solution on the whole interval.

Algorithm 1 is run for n = 3 iterations and the corresponding Krylov error measures are shown in Table 11. A first observation is that, going from M = 5 to M = 50, a large decrease of the biorthogonality measure is observed. This is due to the fact that, when discretizing with fewer discretization points, e.g., M = 5, the original matrix in the ODE, −ı2πH(t), is translated into a simpler (and inaccurate) discretized input, for which the tensor Lanczos iteration converges fast. The discretization with M = 50 represents the original input better, as is suggested by the stagnation of erro going from M = 50 to M = 500.

Table 11 Error measures for Experiment 1

The error errsol is computed using the analytical solution \(\mathbf{w}^{H}\boldsymbol{\mathsf{U}}(t)\mathbf{v}\) evaluated at the discretization points and decays at the expected rate \(\mathcal {O}(1/M)\). Figure 1 shows \(\hat {\mathbf {s}}\) and the approximation \(\mathbf {s}^{(M)}_{n}\) as a function of time; the approximation clearly converges for increasing M.

Fig. 1 Quantity of interest \(\hat {\mathbf {s}}\) and approximation \(\mathbf {s}^{(M)}_{n}\) for Experiment 1. Real part on the left and imaginary part on the right. The x-axis reports time

5.3.2 Experiment 2: Strong coupling

MAS with four nuclear spins with homonuclear dipolar couplings leads to the Hamiltonian

$$ \begin{array}{@{}rcl@{}} H(t) = \text{diag}\left[\{\alpha_{k}\}_{k=1}^{16}\right]+ B \cos(2\pi \nu t) + C \cos(4\pi \nu t), \end{array} $$

where \(\alpha _{k}\in \mathbb {R}\) are scalars and \(B,C\in \mathbb {R}^{16\times 16}\) are matrices with a sparsity structure as shown in Fig. 2.

Fig. 2 Sparsity structure of B and C in Experiment 2; × denotes a nonzero element

A typical experiment time is \(10^{-2}\) seconds and \(\nu = 10^{4}\). The simulated experiment time is \(\tau _{\exp } = 5 \times 10^{-6}\), the size of the Krylov subspace is k = 4, and \(\mathbf {v}=\mathbf {w}=\begin {bmatrix} 0 & 1 & 1 & 0 & 1 & 1& {\dots } & 0& 1& 1 \end {bmatrix}^{\top }\). The corresponding error measures are shown in Table 12 and exhibit a behavior similar to that of Experiment 1. The measure errsol is computed with the reference \(\hat {\mathbf {s}}\) obtained by ode45.

Table 12 Error measures for Experiment 2

5.3.3 Experiment 3: Uncoupled spins under a pulse wave

The Hamiltonian for four uncoupled spins under a pulse wave is

$$ \begin{array}{@{}rcl@{}} H(t) &= &\text{diag}\left[\{\alpha_{k}\}_{k=1}^{16}\right]+ B(0.5+\cos(4t) + \sin(10t) - 0.4 \sin(16t))\\ &\quad+& C(\sin(4t) + \cos(8t) + 2 \sin(12t)), \end{array} $$

with \(\alpha _{k}\in \mathbb {R}\) and \(B\in \mathbb {R}^{16\times 16}\), \(C\in \mathbb {C}^{16 \times 16}\) have a structure as shown in Fig. 3.

Fig. 3 Structure of B and C in Experiment 3; × denotes a nonzero element

A practical experiment time ranges from \(10^{-6}\) to \(10^{-3}\) seconds; here \(\tau _{\exp } = 10^{-3}\) is used. The starting vectors are \(\mathbf {v}= \mathbf {w}= \begin {bmatrix} 1 & {\dots } & 1 \end {bmatrix}^{\top }\) and n = 4 iterations of the tensor Lanczos algorithm are run. The Krylov error measures shown in Table 13 behave similarly to the measures observed for Experiments 1 and 2. The measure errsol is obtained via ode45 and shows a convergence rate slightly slower than \(\mathcal {O}(1/M)\). The slower convergence rate can, in part, be explained by the fact that the comparison is made with the ode45 solution. Additional errors are incurred when comparing \(\mathbf{s}^{(M)}_{n}\) with \(\hat {\mathbf {s}}\), because the former is available only at the points τi whereas the latter is available only at points determined by ode45.

Table 13 Error measures for Experiment 3

5.4 Tensor Train approximations

As briefly mentioned in the Introduction, despite the fact that the block matrix and the tensor formulations of the problem (1) are mathematically equivalent, the tensor formulation introduced and analyzed in this work allows the exploitation of particular low-parametric representations. The aim of this section is indeed to show that, for all the examples previously presented, the resulting tensors can be accurately and conveniently approximated using a low-parametric representation called the Tensor Train (TT) format [41, 42].

As a matter of fact, multilinear algebra, tensor analysis, and the theory of tensor approximations play increasingly important roles in today's computational mathematics and numerical analysis, thereby attracting tremendous interest in recent years [12]. In this panorama, Tensor Train (TT) approximations are a powerful technique for dealing with the curse of dimensionality, i.e., the particularly unpleasant feature where the number of unknowns and the computational complexity grow exponentially as the dimension of the problem increases.

Before presenting the computational results, we briefly survey the main features of the TT representation, referring the interested reader to the surveys [11, 12]. We consider the Tensor Train (TT) format [41] for the tensors of interest in this work. Specifically, a 4-mode tensor \(\boldsymbol {\mathcal {A}} \in \mathbb {C}^{N_{1} \times N_{2} \times M \times M }\) is expressed in TT format when

$$ \boldsymbol{\mathcal{A}}_{i_{1},i_{2},i_{3},i_{4}}=G_{1}(i_{1})G_{2}(i_{2})G_{3}(i_{3})G_{4}(i_{4}) $$

where \(G_{k}(i_{k})\) is a matrix of dimension \(r_{k-1} \times r_{k}\) and \(r_{0} = r_{4} = 1\). The numbers \(r_{k}\) are called TT-ranks, and the \(G_{k}(i_{k})\) are the cores of the TT decomposition. If \(r_{k}\leq r\) and \(n_{k}\leq n\), where \(n_{k}\) denotes the k-th mode size, then storing the TT representation requires memorizing at most \(4nr^{2}\) numbers. If r is small, then the memory requirement is much smaller than storing the full tensor, i.e., storing \(n^{4}\) numbers.
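For completeness, we include a minimal TT-SVD sketch in NumPy, following the standard construction of [41]; the experiments below use the Matlab TT-toolbox, so this is only an illustration of how such a decomposition is obtained.

```python
import numpy as np

def tt_svd(A4, tol=1e-10):
    """TT-SVD of a 4-mode tensor: returns cores G_1..G_4 (each of shape
    r_{k-1} x n_k x r_k) and the TT-ranks [r_1, r_2, r_3]."""
    dims = A4.shape
    delta = tol * np.linalg.norm(A4) / np.sqrt(3)        # per-step truncation tolerance
    cores, ranks = [], [1]
    C = A4
    for k in range(3):
        C = C.reshape(ranks[-1] * dims[k], -1)
        U, s, Vh = np.linalg.svd(C, full_matrices=False)
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]    # tail[r] = ||s[r:]||_2
        r = max(1, int(np.argmax(tail <= delta)) if np.any(tail <= delta) else len(s))
        cores.append(U[:, :r].reshape(ranks[-1], dims[k], r))
        ranks.append(r)
        C = s[:r, None] * Vh[:r]                         # carry Sigma V^H forward
    cores.append(C.reshape(ranks[-1], dims[3], 1))
    return cores, ranks[1:]

def tt_full(cores):
    """Reconstruct the full tensor from its TT cores (for checking the error)."""
    T = cores[0]
    for G in cores[1:]:
        T = np.einsum('...a,ajb->...jb', T, G)
    return T.squeeze(axis=(0, -1))
```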

It is important to note that the TT representation allows various tensor operations to be performed efficiently, see, e.g., [41, Sec. 4]. In this paper, we do not propose a low-parametric TT version of Algorithm 1. To be efficient, such a TT version would need a TT representation of the tensor products used in Algorithm 1. This paper aims to show that Algorithm 1 works; further enhancements are postponed to future investigations.

In Tables 14, 15, and 16 we present the TT ranks for all the tensors considered in Section 5.3. In particular, the tables present the details of the TT approximations obtained using the TT-toolbox [41] when the required accuracy for the approximation is set to \(10^{-5}\) and \(10^{-10}\). As becomes evident from the presented results, all the considered tensors are amenable to a low-parametric representation provided by the TT format and, indeed, for all the presented results the Compression Factor (C.F.), which is defined as \(({\sum }_{k=1}^{4} r_{k-1} \times n_{k} \times r_{k})/nnz(\boldsymbol {\mathcal {A}})\), with \(nnz(\boldsymbol {\mathcal {A}})\) the number of nonzero elements of \(\boldsymbol {\mathcal {A}}\), lies in the interval \((10^{-3},0.5)\). It is important to observe that when increasing the accuracy from \(10^{-5}\) to \(10^{-10}\) the C.F. does not change significantly, suggesting that, for the considered tensors, the TT format is closer to an exact representation than to an approximation. Finally, it is important to note that the ranks of the TT approximations are robust across the choices of the parameter ν (cf. the TT ranks in Tables 14 and 15) and that, for some of the considered problems, the number of parameters needed for the approximation can be two orders of magnitude smaller than \(nnz(\boldsymbol {\mathcal {A}})\); see Tables 15 and 16.

Table 14 TT ranks and compression factor for Experiment 1
Table 15 TT ranks and compression factor for Experiment 2
Table 16 TT ranks and compression factor for Experiment 3

6 Conclusions

In this work we introduced a non-Hermitian Lanczos algorithm for tensors and we provided the corresponding theoretical analysis. In particular, after introducing all the necessary theoretical background, we are able to interpret such a Lanczos-type process in terms of tensor polynomials and to prove the related matching moment property. A series of numerical experiments performed on real-world problems confirms the effectiveness of our approach. Using a linearly converging approximation for the inputs, the algorithm produces a linearly converging approximation of the bilinear form \(\mathbf{w}^{H}\boldsymbol{\mathsf{U}}(t)\mathbf{v}\), where \(\boldsymbol{\mathsf{U}}(t)\) is the solution of the ODE (1). More accurate approximation schemes for the inputs are currently being developed by some of the paper’s authors, possibly leading to faster convergence. Moreover, in all the considered examples, the related tensors show a low-parametric structure in terms of the Tensor Train representation. This important feature paves the way for future efficiency improvements of our proposal, where this representation is fully exploited in the Lanczos-type procedure.