1 Introduction

Many one-dimensional tasks in numerical analysis can be generalized to a two-dimensional formulation by means of tensor product formulas. This is the case, for example, in the context of spectral decomposition or interpolation of multivariate functions. Indeed, the one-dimensional formula

$$s_{i}=\sum\limits_{j=1}^{m}t_{j} \ell_{ij}, \quad 1 \leq i \leq n,$$

where the values tj are linearly combined to obtain the values si (i.e., s = Lt, with \(\boldsymbol {s} = (s_{i}) \in \mathbb {C}^{n}\), \(\boldsymbol {t} = (t_{j}) \in \mathbb {C}^{m}\), and \(L = (\ell _{ij})\in \mathbb {C}^{n\times m}\)), can be easily extended to the two-dimensional case as

$$s_{i_{1} i_{2}}=\sum\limits_{j_{2}=1}^{m_{2}}\sum\limits_{j_{1}=1}^{m_{1}}t_{j_{1}j_{2}} \ell_{i_{1}j_{1}}^{1}\ell_{i_{2}j_{2}}^{2}, \quad 1 \leq i_{1} \leq n_{1}, \quad 1 \leq i_{2} \leq n_{2}.$$
(1)

The meaning of the involved scalar quantities depends on the specific example under consideration. In any case, a straightforward implementation of formula (1) requires four nested for-loops, with a resulting computational cost of \(\mathcal {O}(n^{4})\) (if, for simplicity, we consider m1 = m2 = n1 = n2 = n). On the other hand, formula (1) can be written equivalently in matrix formulation as

$$\boldsymbol{S} = L_{1} \boldsymbol{T} L_{2}^{\mathsf{T}},$$
(2)

where \(L_{1} = (\ell ^{1}_{i_{1} j_{1}})\in \mathbb {C}^{n_{1}\times m_{1}}\), \(L_{2} = (\ell ^{2}_{i_{2} j_{2}})\in \mathbb {C}^{n_{2}\times m_{2}}\), \(\boldsymbol {T} = (t_{j_{1} j_{2}}) \in \mathbb {C}^{m_{1} \times m_{2}}\), and \(\boldsymbol {S} = (s_{i_{1} i_{2}}) \in \mathbb {C}^{n_{1} \times n_{2}}\). Formula (2) requires two matrix-matrix products, each of which can be implemented with three nested for-loops: this approach reduces the cost of computing the elements of \(\boldsymbol{S}\) to \(\mathcal {O}(n^{3})\). A still more efficient way to realize formula (2) is to exploit optimized Basic Linear Algebra Subprograms (BLAS) [1, 2, 3, 4], a set of numerical linear algebra routines that perform such matrix operations with an efficiency close to the theoretical hardware limit. A performance comparison of the three approaches to compute the values \(s_{i_{1}i_{2}}\) in the matlab language, for increasing size of the task, is given in Table 1. As expected, for all values of n under study, the most efficient way to compute the elements of S is to realize formula (2) through the BLAS approach. Note that these considerations on computational complexity and BLAS efficiency are essentially language-independent, and apply to other interpreted or compiled languages as well, such as Python, Julia, R, Fortran, and C++. For clarity of exposition and simplicity of presentation of the codes, from now on we will use the matlab programming language.

Table 1 Wall-clock time (in seconds) for the computation of the values \(s_{i_{1}i_{2}}\) in formula (1) with increasing size m1 = m2 = n1 = n2 = n and different approaches, using MathWorks MATLAB® R2019a. The input values are standard normal distributed random numbers
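For illustration, the first and the third approaches of Table 1 could be sketched in matlab as follows (a minimal sketch with illustrative variable names and size, not the actual benchmark script):

n = 300;
L1 = randn(n); L2 = randn(n); T = randn(n);
% formula (1) with four nested for-loops: O(n^4) operations
S_loop = zeros(n);
for i2 = 1:n
  for i1 = 1:n
    for j2 = 1:n
      for j1 = 1:n
        S_loop(i1,i2) = S_loop(i1,i2) + T(j1,j2)*L1(i1,j1)*L2(i2,j2);
      end
    end
  end
end
% formula (2) through optimized BLAS: two matrix-matrix products, O(n^3)
S_blas = L1*T*L2.';

The second approach of Table 1 corresponds to writing each of the two matrix-matrix products of formula (2) as an explicit triple loop.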

In other contexts, such as numerical solution of (stiff) differential equations on two-dimensional tensor product domains by means of exponential integrators or preconditioned iterative methods, it is required to compute quantities like

$$\text{vec}{(\boldsymbol{S})} = (L_{2} \otimes L_{1}) \text{vec}{(\boldsymbol{T})},$$
(3)

where again L1, L2, T, and S are matrices of suitable size whose meaning depends on the specific example under consideration. Here ⊗ denotes the standard Kronecker product of two matrices, while vec represents the vectorization operator; see Appendix for their formal definitions. A straightforward implementation of formula (3) would need to assemble the large-sized matrix L2 ⊗ L1. If, for simplicity, we consider again m1 = m2 = n1 = n2 = n, this approach requires \(\mathcal {O}(n^{4})\) storage and computational cost, which is impractical. However, owing to the properties of the Kronecker product (see Appendix), formula (3) is equivalent to formula (2). Therefore, all the considerations made for the previous example on the employment of optimized BLAS apply in this case as well.
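The equivalence of formulas (3) and (2) is also easy to verify numerically; a minimal sketch on small random data (sizes illustrative) could be:

n = 4;
L1 = randn(n); L2 = randn(n); T = randn(n);
s_kron = kron(L2,L1)*T(:); % formula (3): assembles an n^2 x n^2 matrix
S_mat = L1*T*L2.';         % formula (2): two small matrix products
norm(s_kron - S_mat(:))    % zero up to round-off

Here T(:) realizes the vec operator, consistently with the column-major storage of matlab.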

The aim of this work is to provide a common framework for generalizing formula (2) to arbitrary dimension d, which results in an efficient BLAS realization of the underlying task. This is very useful in the context of solving tensor-structured problems which may arise from different scientific and engineering fields. The pursued approach is illustrated in detail in Section 2, in which we present the μ-mode product and some associated operations (the Tucker operator, in particular), both from a theoretical and a practical point of view. These operations are widely known in the tensor algebra community, but their usage is mostly restricted to the context of tensor decompositions (see [5, 6]). Then, we proceed in Section 3 by describing more precisely the one- and two-dimensional formulations of the problems mentioned in this section, as well as their generalization to the d-dimensional case in terms of μ-mode products. We collect in Section 4 the related numerical experiments and we finally draw the conclusions in Section 5.

All the functions and the scripts needed to perform the relevant tensor operations and to reproduce the numerical examples of this manuscript are contained in our matlab package KronPACK.

2 The μ-mode product and its applications

In order to generalize formula (2) to the d-dimensional case, we rely on some concepts from tensor algebra (see [5, 6] for more details). Throughout this section, we assume that \(\boldsymbol {T}\in \mathbb {C}^{m_{1}\times \cdots \times m_{d}}\) is an order-d tensor whose elements are either denoted by \(t_{j_{1}{\ldots } j_{d}}\) or by T(j1,…,jd).

Definition 2.1

A μ-fiber of T is a vector in \(\mathbb {C}^{m_{\mu }}\) obtained by fixing every index of the tensor but the μth.

A μ-fiber is nothing but a generalization of the concept of rows and columns of a matrix. Indeed, for an order-2 tensor (i.e., a matrix), 1-fibers are the columns, while 2-fibers are the rows. On the other hand, for an order-3 tensor, 1-fibers are the column vectors, 2-fibers are the row vectors, while 3-fibers are the so-called “page” or “tube” vectors, that is, vectors along the third dimension.

Definition 2.2

The μ-matricization of T, denoted by \(T^{(\mu )}\in ~\mathbb {C}^{m_{\mu } \times m_{1}{\cdots } m_{\mu -1}m_{\mu +1}{\cdots } m_{d}}\), is defined as the matrix whose columns are the μ-fibers of T.

Note that for an order-2 tensor the 1- and 2-matricizations simply correspond to the matrix itself and its transpose, respectively. In dimensions higher than two, the μ-matricization requires the concept of generalized transpose of a tensor and its unfolding into a matrix. The first operation is realized in matlab by the function permute, which we use to interchange the μ-fibers with the 1-fibers of the tensor T. The second operation is performed by the reshape function, which we use to unfold the “transposed” tensor into the matrix T(μ). In matlab, the anonymous function which performs the μ-matricization of a tensor T, given


m = size(T); d = length(m);

can be written as


mumat = @(T,mu) reshape(permute(T,[mu,1:mu-1,mu+1:d]),...
    m(mu),prod(m([1:mu-1,mu+1:d])));
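As a quick self-contained check of the order-2 case discussed above, one can verify that the 1-matricization of a matrix is the matrix itself and the 2-matricization its transpose (note that m and d must be set before defining mumat, since anonymous functions capture their values):

T = reshape(1:6,2,3);
m = size(T); d = length(m);
mumat = @(T,mu) reshape(permute(T,[mu,1:mu-1,mu+1:d]),...
    m(mu),prod(m([1:mu-1,mu+1:d])));
isequal(mumat(T,1),T)   % true
isequal(mumat(T,2),T.') % true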

By means of μ-fibers, it is possible to define the following operation.

Definition 2.3

Let \(L\in \mathbb {C}^{n\times m_{\mu }}\) be a matrix. The μ-mode product of T with L, denoted by S = T ×μL, is the tensor \(\boldsymbol {S}\in \mathbb {C}^{m_{1}\times \cdots \times m_{\mu -1}\times n\times m_{\mu +1}\times {\cdots } \times m_{d}}\) obtained by multiplying the matrix L onto the μ-fibers of T.

From this definition, it appears clear that the μ-fiber S(j1,…,jμ−1,⋅,jμ+1,…,jd) of S can be computed as the matrix-vector product of L and the μ-fiber T(j1,…,jμ−1,⋅,jμ+1,…,jd). Therefore, the μ-mode product T ×μ L could be performed by calling level 2 BLAS \(m_{1}{\cdots} m_{\mu-1}m_{\mu+1}{\cdots} m_{d}\) times. However, owing to the concept of matricization of a tensor introduced in Definition 2.2, it is possible to perform the same task more efficiently by a single level 3 BLAS call. Indeed, the μ-mode product of T with L is just the tensor S such that

$$S^{(\mu)}=L T^{(\mu)}.$$
(4)

In particular, in the two-dimensional setting, the 1-mode product corresponds to the multiplication LT, while the 2-mode product corresponds to \((L\boldsymbol{T}^{\mathsf{T}})^{\mathsf{T}} = \boldsymbol{T}L^{\mathsf{T}}\). In general, we can compute the matrix S(μ) appearing in formula (4) as L*mumat(T,mu), and in order to recover the tensor S from S(μ) we need to invert the operations of unfolding and “transposing”. This can be done easily with the aid of the matlab functions reshape and ipermute, respectively. All in all, given the value n = size(L,1), the anonymous function that computes the μ-mode product of an order-d tensor T with L by a single matrix-matrix product can be written as


mump = @(T,L,mu) ipermute(reshape(L*mumat(T,mu),...
    [n,m([1:mu-1,mu+1:d])]),[mu,1:mu-1,mu+1:d]);

Notice that from formula (4) it appears clear that the computational cost of the μ-mode product, in terms of floating point operations, is \(\mathcal{O}(n m_{1}{\cdots} m_{d})\).
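A usage sketch of mump (sizes illustrative; recall that m, d, and n must be set before defining mumat and mump as above, since anonymous functions capture their values):

T = randn(3,4,5); L = randn(6,4);
m = size(T); d = length(m); n = size(L,1);
% define mumat and mump as above, then:
S = mump(T,L,2);  % S has size 3 x 6 x 5
% consistency with Definition 2.3 on a single 2-fiber:
norm(squeeze(S(2,:,3)).' - L*squeeze(T(2,:,3)).')  % zero up to round-off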

One of the main applications of the μ-mode product is the so-called Tucker operator, which is implemented in KronPACK in the function tucker.

Definition 2.4

Let \(L_{\mu }\in \mathbb {C}^{n_{\mu }\times m_{\mu }}\) be matrices, with μ = 1,…,d. The Tucker operator of T with L1,…,Ld is the tensor \(\boldsymbol {S}\in \mathbb {C}^{n_{1}\times \cdots \times n_{d}}\) obtained by concatenating d consecutive μ-mode products with matrices Lμ, that is

$$\boldsymbol{S} = \boldsymbol{T}\times_{1}L_{1}\times_{2}\cdots\times_{d}L_{d}.$$
(5)

We notice that the single element \(s_{i_{1}{\ldots } i_{d}}\) of S in formula (5) turns out to be

$$s_{i_{1}{\ldots} i_{d}}=\sum\limits_{j_{d}=1}^{m_{d}}{\cdots} \sum\limits_{j_{1}=1}^{m_{1}} t_{j_{1}{\ldots} j_{d}}\prod\limits_{\mu=1}^{d}\ell_{i_{\mu} j_{\mu}}^{\mu},\quad 1\leq i_{\mu}\leq n_{\mu},$$
(6)

provided that \(\ell ^{\mu }_{i_{\mu } j_{\mu }}\) are the elements of Lμ. Hence, as formula (6) is clearly the generalization of formula (1) to the d-dimensional setting, formula (5) is the sought d-dimensional generalization of formula (2). We also notice that the Tucker operator (5) is invariant with respect to the ordering of the μ-mode products, and that the implicit ordering given by Definition 2.4 is equivalent to performing the sums in formula (6) starting from the innermost.
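For illustration, a minimal unoptimized sketch of the Tucker operator as d successive μ-mode products could read as follows (the KronPACK function tucker additionally composes consecutive permutations, see Remark 1 below):

function S = tucker_naive(T,L)
% Minimal sketch of the Tucker operator (5): apply the mu-mode product
% for mu = 1,...,d in sequence, L being a cell array with L{mu} of
% size n(mu) x m(mu). Two explicit permutations per mu-mode product.
  d = length(L);
  S = T;
  for mu = 1:d
    sz = size(S); sz(end+1:d) = 1; % pad trailing singleton dimensions
    perm = [mu,1:mu-1,mu+1:d];
    Smat = reshape(permute(S,perm),sz(mu),prod(sz)/sz(mu));
    S = ipermute(reshape(L{mu}*Smat,...
        [size(L{mu},1),sz(perm(2:end))]),perm);
  end
end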

The Tucker operator is strictly connected with the Kronecker product of matrices applied to a vector.

Lemma 2.1

Let \(L_{\mu } \in \mathbb {C}^{n_{\mu } \times m_{\mu }}\) be matrices, with μ = 1,…,d. Then, the elements of S in formula (5) are equivalently given by

$$\text{vec}(\boldsymbol{S}) = (L_{d} \otimes {\cdots} \otimes L_{1})\text{vec}(\boldsymbol{T}).$$
(7)

Proof 

The μ-mode product satisfies the following property

$$\boldsymbol{S}=\boldsymbol{T} \times_{1} L_{1} \times_{2} {\cdots} \times_{d} L_{d} \iff S^{(\mu)} = L_{\mu} T^{(\mu)}(L_{d}\otimes {\cdots} \otimes L_{\mu+1}\otimes L_{\mu-1} \otimes {\cdots} \otimes L_{1})^{\mathsf{T}},$$

see [6]. Then, with μ = 1 we obtain

$$\boldsymbol{S}=\boldsymbol{T} \times_{1} L_{1} \times_{2} {\cdots} \times_{d} L_{d} \iff S^{(1)} = L_{1} T^{(1)}(L_{d}\otimes {\cdots} \otimes L_{2})^{\mathsf{T}}.$$

By means of the properties of the Kronecker product (see Appendix) we have then

$$S^{(1)} =L_{1} T^{(1)}(L_{d}\otimes {\cdots} \otimes L_{2})^{\mathsf{T}} \iff \text{vec}(S^{(1)}) = (L_{d}\otimes {\cdots} \otimes L_{1}) \text{vec}(T^{(1)})$$

and finally, by definition of vec operator,

$$\text{vec}(S^{(1)}) = (L_{d}\otimes {\cdots} \otimes L_{1}) \text{vec}(T^{(1)}) \iff \text{vec}(\boldsymbol{S}) = (L_{d}\otimes {\cdots} \otimes L_{1}) \text{vec}(\boldsymbol{T}).$$

Notice that formula (7) is precisely the d-dimensional generalization of formula (3). Hence, tasks written as in formula (7) can be equivalently stated, and computed more efficiently, by formula (5), without assembling the large-sized matrix Ld ⊗ ⋯ ⊗ L1.
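A numerical check of Lemma 2.1 in dimension d = 3 (sizes illustrative), using the KronPACK function tucker, could be:

T = randn(2,3,4);
L = {randn(5,2),randn(6,3),randn(7,4)};
S = tucker(T,L);                     % formula (5)
v = kron(L{3},kron(L{2},L{1}))*T(:); % formula (7)
norm(v - S(:))                       % zero up to round-off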

We can then summarize as follows: the element-wise formulation (6), the tensor formulation (5) and the vector formulation (7) can all be used to compute the entries of the tensor S. However, in light of the considerations for the μ-mode product, only the tensor formulation can be efficiently computed by d calls of level 3 BLAS, with an overall computational cost of \(\mathcal {O}(n^{d+1})\) for the case mμ = nμ = n. This is the reason why the relevant functions of our package KronPACK are based on formulation (5).

Remark 1

The implementation of a single μ-mode product in the function mump of KronPACK involves two explicit permutations of the tensor (except for the 1-mode and the d-mode products, which are realized without explicit permutation, thanks to the design of the function reshape in matlab). On the other hand, the function tucker, which realizes the Tucker operator (5), composes each pair of consecutive permutations, thus reducing their overall number. This is important when dealing with large-sized tensors, because the cost of permuting is not negligible due to the underlying alteration of the memory layout. For this reason, several algorithms which further reduce or completely avoid permutations in an efficient way have been developed (see, for instance, [7, 8, 9, 10]). In this context, for instance, it is possible to use the function pagemtimes to efficiently realize a “Loops-over-GEMMs” strategy. However, as this function was introduced only recently in MathWorks MATLAB® R2020b and is still not available in the latest stable GNU Octave release 7.1.0, for compatibility reasons we do not follow this approach.

Notice that the definition of μ-mode product and its realization through the function mump can be easily extended to the case in which instead of a matrix L we have a matrix-free operator \({\mathcal{L}}\).

Definition 2.5

Let \({\mathcal{L}}: \mathbb {C}^{m_{\mu }} \to \mathbb {C}^{n}\) be an operator. Then the μ-mode action of T with \({\mathcal{L}}\), still denoted \(\boldsymbol {S} = \boldsymbol {T} \times _{\mu } {\mathcal{L}}\), is the tensor \(\boldsymbol {S}\in \mathbb {C}^{m_{1}\times \cdots \times m_{\mu -1}\times n\times m_{\mu +1}\times {\cdots } \times m_{d}}\) obtained by the action of the operator \({\mathcal{L}}\) on the μ-fibers of T.

In matlab, if the operator \({\mathcal{L}}\) is represented by the function Lfun which operates on columns, we can implement the μ-mode action by


mumpfun = @(T,Lfun,mu) ipermute(reshape(Lfun(mumat(T,mu)),...
    [n,m([1:mu-1,mu+1:d])]),[mu,1:mu-1,mu+1:d]);

The corresponding generalization of the Tucker operator, denoted again by

$$\boldsymbol{S} = \boldsymbol{T} \times_{1} \mathcal{L}_{1} \times_{2} {\cdots} \times_{d} \mathcal{L}_{d}$$
(8)

and implemented in KronPACK in the function tuckerfun, follows straightforwardly. Clearly, in this case, some properties of the Tucker operator (5), such as the aforementioned invariance with respect to the ordering of the μ-mode product operations, may not hold anymore for generic operators \({\mathcal{L}}_{\mu }\). Generalization (8) is useful in some instances; see Remark 3 and Section 4.2 for an example. We remark that such an extension is not available in some other popular tensor algebra toolboxes, such as Tensor Toolbox for MATLAB [11] (which also lacks GNU Octave support) and Tensorlab [12], both of which are more devoted to tensor decomposition and related topics.

The μ-mode product is also useful for computing the action of the Kronecker sum (see Appendix for its definition) of the matrices Lμ on a vector v, that is

$$(L_{d}\oplus\cdots\oplus L_{1})\boldsymbol{v}=\text{vec}\left(\sum\limits_{\mu=1}^{d}(\boldsymbol{V} \times_{\mu} L_{\mu})\right),$$
(9)

where v = vec(V). In fact, as can be noticed from formula (4), the identity matrix is the identity element of the μ-mode product. Combining this observation with Lemma 2.1, we easily obtain formula (9). In our package KronPACK, the matrix resulting from the Kronecker sum on the left-hand side of equality (9) can be computed as kronsum(L), where L is the cell array containing Lμ in L{mu}. On the other hand, its action on v can be computed equivalently in tensor formulation, without forming the matrix itself, by kronsumv(V,L).
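Equivalence (9) can be checked, for instance, as follows (sizes illustrative; we assume here that kronsumv returns the resulting tensor):

L = {randn(3),randn(4),randn(5)};
V = randn(3,4,5);
w = kronsum(L)*V(:); % assembles the 60 x 60 Kronecker sum
W = kronsumv(V,L);   % tensor formulation, matrix-free
norm(w - W(:))       % zero up to round-off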

3 Problem formulation in d dimensions

In this section we discuss in more detail the problems that were briefly introduced in Section 1. Their generalization to arbitrary dimension d is addressed thanks to the common framework presented in Section 2.

3.1 Pseudospectral decomposition

Suppose that a function \(f\colon R\to \mathbb {C}\), with \(R\subseteq \mathbb {R}\), can be expanded into a series

$$f(x)=\sum\limits_{i=1}^{\infty} f_{i}\phi_{i}(x),$$

where fi are complex scalar coefficients and ϕi(x) are complex functions orthonormal with respect to the standard L2(R) inner product, i.e.,

$${\int}_{R}\phi_{i}(x)\overline{\phi_{j}(x)}dx=\delta_{ij}, \quad \forall i,j.$$

Then, the spectral coefficients fi are defined by

$$f_{i}={\int}_{R} f(x)\overline{\phi_{i}(x)}dx,$$

and can be approximated by a quadrature formula. Usually, in this context, specific Gaussian quadrature formulas are employed, whose nodes and weights vary depending on the chosen family of basis functions. If we consider q quadrature nodes ξk and weights wk, we can compute the first m pseudospectral coefficients by

$$\hat f_{i}=\sum\limits_{k=1}^{q} f(\xi^{k})\overline{\phi_{i}(\xi^{k})}w^{k}\approx f_{i},\quad 1\leq i\leq m.$$

By collecting the values \(\overline {\phi _{i}(\xi ^{k})}\) in position (i,k) of the matrix \({\Psi }\in \mathbb {C}^{m\times q}\) and the values f(ξk)wk in the vector fw, we can compute the pseudospectral coefficients by means of the single matrix-vector product

$$\hat{\boldsymbol{f}}={\Psi}\boldsymbol{f}_{\boldsymbol{w}}.$$

In the two-dimensional case, the coefficients of a pseudospectral expansion in a tensor product basis (see, for instance, [13, Ch. 6.10]) are given by

$$\hat f_{i_{1} i_{2}}=\sum\limits_{k_{2}=1}^{q_{2}}\sum\limits_{k_{1}=1}^{q_{1}} f\left(\xi_{1}^{k_{1}},\xi_{2}^{k_{2}}\right)\overline{\phi_{i_{1}}^{1}(\xi_{1}^{k_{1}})} \overline{\phi_{i_{2}}^{2}(\xi_{2}^{k_{2}})}w_{1}^{k_{1}}w_{2}^{k_{2}},$$

which can be efficiently computed as

$$\hat{\boldsymbol{F}}={\Psi}_{1} \boldsymbol{F}_{\boldsymbol{W}} {\Psi}_{2}^{\mathsf{T}},$$

where \({\Psi }_{\mu }\in \mathbb {C}^{m_{\mu }\times q_{\mu }}\) has element \(\overline {\phi _{i_{\mu }}^{\mu }(\xi _{\mu }^{k_{\mu }})}\) in position (iμ,kμ), with μ = 1,2, and FW is the matrix with element \(f(\xi _{1}^{k_{1}},\xi _{2}^{k_{2}})w_{1}^{k_{1}}w_{2}^{k_{2}}\) in position (k1,k2).

In general, the coefficients of a d-dimensional pseudospectral expansion in a tensor product basis are given by

$$\hat f_{i_{1} {\ldots} i_{d}}=\sum\limits_{k_{d}=1}^{q_{d}}\cdots \sum\limits_{k_{1}=1}^{q_{1}} f(\xi_{1}^{k_{1}},\ldots,\xi_{d}^{k_{d}})\overline{\phi_{i_{1}}^{1}(\xi_{1}^{k_{1}})}\cdots \overline{\phi_{i_{d}}^{d}(\xi_{d}^{k_{d}})}w_{1}^{k_{1}}{\cdots} w_{d}^{k_{d}}.$$

In tensor formulation, the coefficients can be computed as (see formulas (5) and (6))

$$\hat{\boldsymbol{F}} = \boldsymbol{F}_{\boldsymbol{W}} \times_{1} {\Psi}_{1} \times_{2} {\cdots} \times_{d} {\Psi}_{d},$$

where Ψμ is the transform matrix with element \(\overline {\phi _{i_{\mu }}^{\mu }(\xi _{\mu }^{k_{\mu }})}\) in position (iμ,kμ), and we collect in the order-d tensors \(\hat {\boldsymbol {F}}\) and FW the values \(\hat f_{i_{1} {\ldots } i_{d}}\) and \(f(\xi _{1}^{k_{1}},\ldots ,\xi _{d}^{k_{d}})w_{1}^{k_{1}}{\cdots } w_{d}^{k_{d}}\), respectively. The corresponding pseudospectral approximation of f(x) is

$$\hat{f}(\boldsymbol{x})=\sum\limits_{i_{d}=1}^{m_{d}}\cdots\sum\limits_{i_{1}=1}^{m_{1}} \hat{f}_{i_{1} {\ldots} i_{d}} \phi_{i_{1}}^{1}(x_{1})\cdots\phi_{i_{d}}^{d}(x_{d}),$$
(10)

where x = (x1,…,xd). An application to Hermite–Laguerre–Fourier function decomposition is given in Section 4.2.
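In KronPACK, collecting the transform matrices Ψμ in a cell array PSI (with PSI{mu} containing Ψμ) and the weighted function values in the tensor FW, the computation of the coefficients reduces to a single call

Fhat = tucker(FW,PSI);

see Section 4.2 for a related instance based on tuckerfun.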

3.2 Function approximation

Suppose we are given an approximation of a univariate function f(x) in the form

$$\tilde f(x)=\sum\limits_{i=1}^{m} c_{i} \phi_{i}(x)\approx f(x),$$
(11)

where ci are scalar coefficients and ϕi(x) are generic (basis) functions. This is the case, for example, in the context of function interpolation or pseudospectral expansions. We are interested in the evaluation of formula (11) at given points \(x^{\ell}\), with \(1 \leq \ell \leq n\). This can be easily realized in a single matrix-vector product: indeed, if we collect the coefficients ci in the vector \(\boldsymbol {c} \in \mathbb {C}^{m}\) and we form the matrix \({\Phi }\in \mathbb {C}^{n\times m}\) with element \(\phi_{i}(x^{\ell})\) in position \((\ell,i)\), the sought evaluation is given by

$$\tilde{\boldsymbol{f}}={\Phi}\boldsymbol{c},$$

where \(\tilde {\boldsymbol {f}}\in \mathbb {C}^{n}\) is the vector containing the approximated function values at the given set of evaluation points.

The extension of formula (11) to the tensor product bivariate case is straightforward (see, for instance, [14, Ch. XVII]). Indeed, in this case the approximating function is given by

$$\tilde f(x_{1},x_{2})=\sum\limits_{i_{2}=1}^{m_{2}}\sum\limits_{i_{1}=1}^{m_{1}} c_{i_{1} i_{2}}\phi_{i_{1}}^{1}(x_{1})\phi_{i_{2}}^{2}(x_{2})\approx f(x_{1},x_{2}),$$
(12)

where \(c_{i_{1} i_{2}}\) represent scalar coefficients and \(\phi _{i_{\mu }}^{\mu }(x_{\mu })\) the (univariate) basis functions, with \(1 \leq i_{\mu} \leq m_{\mu}\) and μ = 1,2. Then, given a Cartesian grid of points \((x_{1}^{\ell _{1}},x_{2}^{\ell _{2}})\), with \(1 \leq \ell_{\mu} \leq n_{\mu}\), the evaluation of approximation (12) can be computed efficiently in matrix formulation by

$$\tilde {\boldsymbol{F}} = {\Phi}_{1} \boldsymbol C {\Phi}_{2}^{\mathsf{T}}.$$

Here we collected the function evaluations \(\tilde {f}(x_{1}^{\ell _{1}},x_{2}^{\ell _{2}})\) in the matrix \(\tilde {\boldsymbol {F}}\), we formed the matrices \({\Phi }_{\mu }\in \mathbb {C}^{n_{\mu }\times m_{\mu }}\) with element \(\phi ^{\mu }_{i_{\mu }}(x_{\mu }^{\ell _{\mu }})\) in position \((\ell_{\mu},i_{\mu})\), and we let C be the matrix with element \(c_{i_{1} i_{2}}\) in position (i1,i2).

In general, the approximation of a d-variate function f with tensor product basis functions is given by

$$\tilde f(\boldsymbol{x})=\sum\limits_{i_{d}=1}^{m_{d}}\cdots\sum\limits_{i_{1}=1}^{m_{1}} c_{i_{1}{\ldots} i_{d}}\phi_{i_{1}}^{1}(x_{1})\cdots\phi_{i_{d}}^{d}(x_{d})\approx f(\boldsymbol{x}),$$
(13)

where \(c_{i_{1}{\ldots } i_{d}}\) represent scalar coefficients and \(\phi _{i_{\mu }}^{\mu }(x_{\mu })\) the (univariate) basis functions, with \(1 \leq i_{\mu} \leq m_{\mu}\). Then, given a Cartesian grid of points \((x_{1}^{\ell _{1}},\ldots ,x_{d}^{\ell _{d}})\), with \(1 \leq \ell_{\mu} \leq n_{\mu}\), the evaluation of approximation (13) can be expressed in tensor formulation as

$$\tilde {\boldsymbol{F}} = \boldsymbol C \times_{1} {\Phi}_{1} \times_{2} {\cdots} \times_{d} {\Phi}_{d},$$
(14)

see formulas (5) and (6). Here we denote by Φμ the matrix with element \(\phi _{i_{\mu }}^{\mu }(x_{\mu }^{\ell _{\mu }})\) in position \((\ell_{\mu},i_{\mu})\), and we collect in the order-d tensors C and \(\tilde {\boldsymbol {F}}\) the coefficients and the resulting function approximation at the evaluation points, respectively. We present an application to barycentric multivariate interpolation in Section 4.3.

Remark 2

Clearly, formula (14) can be employed to evaluate a pseudospectral approximation (10) at a generic Cartesian grid of points, by properly defining the involved tensor C and matrices Φμ. In the context of direct and inverse spectral transforms, for example for the effective numerical solution of differential equations (see [15]), one could be interested in the evaluation of pseudospectral decompositions at the same grid of quadrature points \((\xi _{1}^{k_{1}},\ldots ,\xi _{d}^{k_{d}})\) used to approximate the spectral coefficients. Under standard hypotheses, this can be done by applying formula (14) with matrices \({\Phi }_{\mu } = {\Psi }_{\mu }^{*}\), where the symbol ∗ denotes the conjugate transpose. Without explicitly forming the matrices Φμ, the desired evaluation can be computed from the matrices Ψμ by means of the KronPACK function cttucker.

Remark 3

Several functions are available which perform, given suitable inputs, the whole one-dimensional procedure of approximating a function and evaluating it on a set of points. This is the case, for example, in the interpolation context, of the matlab built-in functions spline, interp1 (which performs different kinds of one-dimensional interpolation), and interpft (which resamples the input values by means of FFT techniques), or, in the approximation context, of the functions provided by the QIBSH++ library [16]. It is then possible to extend the usage of this kind of functions to approximation in the d-dimensional tensor setting by means of concatenations of μ-mode actions (see Definition 2.5), yielding the generalization of the Tucker operator (8). In practice, we can perform this task with the KronPACK function tuckerfun; see the numerical example in Section 4.2.

3.3 Action of the matrix exponential

Suppose we want to solve the linear Partial Differential Equation (PDE)

$$\left\{ \begin{aligned} \partial_{t}u(t,x) &= \mathcal{A}u(t,x), \quad t>0, \quad x\in{\Omega}\subset\mathbb{R},\\ u(0,x)&=u_{0}(x), \end{aligned}\right.$$
(15)

coupled with suitable boundary conditions, where \(\mathcal {A}\) is a linear time-independent spatial (integer or fractional) differential operator, typically stiff. The application of the method of lines to equation (15), by discretizing first the spatial variable, e.g., by finite differences or spectral differentiation, leads to the system of Ordinary Differential Equations (ODEs)

$$\left\{\begin{aligned} \boldsymbol{u}^{\prime}(t) &= A\boldsymbol{u}(t), \quad t>0,\\ \boldsymbol{u}(0)&=\boldsymbol{u}_{0}, \end{aligned}\right.$$
(16)

for the unknown vector u(t). Here, \(A\in \mathbb {C}^{n\times n}\) is the matrix which approximates the differential operator \(\mathcal {A}\) on the grid points \(x^{\ell}\), with \(1 \leq \ell \leq n\). The exact solution of system (16) is obviously \(\boldsymbol {u}(t)=\exp (tA)\boldsymbol {u}_{0}\) and, if the size of A allows, it can be effectively computed by Padé or Taylor approximations (see [17, 18]). If the size of A is too large, then one has to rely on algorithms to approximate the action of the matrix exponential \(\exp (tA)\) on the vector u0. Examples of such methods are [19, 20, 21, 22].

Suppose now we want to solve instead

$$\left\{ \begin{aligned} \partial_{t}u(t,x_{1},x_{2}) &= \mathcal{A}u(t,x_{1},x_{2}), \quad t>0, \quad (x_{1},x_{2})\in{\Omega}\subset\mathbb{R}^{2},\\ u(0,x_{1},x_{2})&=u_{0}(x_{1},x_{2}), \end{aligned}\right.$$
(17)

coupled again with suitable boundary conditions. If PDE (17) admits a Kronecker structure, such as for some linear Advection–Diffusion–Absorption (ADA) equations on tensor product domains or linear Schrödinger equations with a potential in Kronecker form (see [15] for more details and examples), then the method of lines yields the system of ODEs

$$\left\{\begin{aligned} \boldsymbol{u}^{\prime}(t) &= \left(I_{2}\otimes A_{1}+A_{2} \otimes I_{1}\right) \boldsymbol{u}(t), \quad t>0, \\ \boldsymbol{u}(0)&=\boldsymbol{u}_{0}. \end{aligned}\right.$$
(18)

Here Aμ, with μ = 1,2, represent the one-dimensional stencil matrices corresponding to the discretization of the one-dimensional differential operators that constitute \(\mathcal {A}\) on the grid points \(x_{\mu }^{\ell _{\mu }}\), with \(1 \leq \ell_{\mu} \leq n_{\mu}\). Moreover, the notation Iμ stands for identity matrices of size nμ, and the component \(\ell_{1}+(\ell_{2}-1)n_{1}\) of u corresponds to the grid point \((x_{1}^{\ell _{1}},x_{2}^{\ell _{2}})\), that is

$$u_{\ell_{1}+(\ell_{2}-1)n_{1}}(t)\approx u(t,x_{1}^{\ell_{1}},x_{2}^{\ell_{2}}).$$

This, in turn, is consistent with the linearization of the indexes of the vec operator defined in Appendix.

Clearly, the solution of system (18) is given by

$$\boldsymbol{u}(t) = \exp\left(t(I_{2}\otimes A_{1}+A_{2} \otimes I_{1})\right)\boldsymbol{u}_{0},$$
(19)

which again could be computed by any method for the action of the matrix exponential on a vector. Note that, since the matrices I2 ⊗ A1 and A2 ⊗ I1 commute, using the properties of the Kronecker product (see Appendix) one could write everything in terms of the exponentials of the small-sized matrices Aμ. Indeed, we have

$$\begin{array}{@{}rcl@{}} \boldsymbol{u}(t) &= \exp\left(t(I_{2}\otimes A_{1}+A_{2} \otimes I_{1})\right)\boldsymbol{u}_{0} = \exp(t(I_{2}\otimes A_{1}))\exp(t(A_{2} \otimes I_{1}))\boldsymbol{u}_{0} \\ &= \left(I_{2} \otimes \exp(tA_{1})\right)\left(\exp(tA_{2})\otimes I_{1}\right)\boldsymbol{u}_{0} = (\exp(tA_{2})\otimes\exp(tA_{1}))\boldsymbol{u}_{0}. \end{array}$$

However, as in general the matrices \(\exp (tA_{\mu })\) are full, their Kronecker product results in a large and full matrix to be multiplied into u0, which is an extremely inefficient approach. Nevertheless, if we fully exploit the tensor structure of the problem, we can still compute the solution of the system efficiently just in terms of the exponentials \(\exp (tA_{\mu })\). Indeed, let U(t) be the n1 × n2 matrix whose stacked columns form the vector u(t), that is

$$\text{vec}(\boldsymbol{U}(t))=\boldsymbol{u}(t).$$

Then, using this matrix notation and by means of the properties of the Kronecker product, problem (18) takes the form

$$\left\{\begin{aligned} \boldsymbol{U}^{\prime}(t) &= A_{1}\boldsymbol{U}(t) + \boldsymbol{U}(t) A_{2}^{\mathsf{T}}, \quad t>0,\\ \boldsymbol{U}(0)&= \boldsymbol{U}_{0}, \end{aligned}\right.$$

and it is well-known (see [23]) that its solution can be computed in matrix formulation as

$$\boldsymbol{U}(t) = \exp(t A_{1})\boldsymbol{U}_{0}\exp(t A_{2})^{\mathsf{T}}.$$

In general, the d-dimensional version of solution (19) is

$$\boldsymbol{u}(t) = \exp\left(t\sum\limits_{\mu=1}^{d} \left(I_{d}\otimes {\cdots} \otimes I_{\mu+1}\otimes A_{\mu}\otimes I_{\mu-1}\otimes {\cdots} \otimes I_{1}\right)\right)\boldsymbol{u}_{0},$$

which can be written in more compact notation as

$$\boldsymbol{u}(t)=\exp\left(t\left(A_{d}\oplus {\cdots} \oplus A_{1}\right)\right)\boldsymbol{u}_{0}.$$
(20)

Here, Aμ are square matrices of size nμ, and u0 is a vector of length N = n1 ⋯ nd. Then, similarly to the two-dimensional case, we have

$$\boldsymbol u(t)=\exp\left(t(A_{d}\oplus {\cdots} \oplus A_{1})\right)\boldsymbol{u}_{0} =(\exp(tA_{d})\otimes\cdots\otimes\exp(tA_{1}))\boldsymbol{u}_{0}.$$

Finally, using Lemma 2.1, we have

$$\boldsymbol U(t) = \boldsymbol U_{0} \times_{1} \exp(tA_{1}) \times_{2} {\cdots} \times_{d} \exp(tA_{d}),$$
(21)

where U(t) and U0 are d-dimensional tensors such that u(t) = vec(U(t)) and u0 = vec(U0). Hence, the action of the large-sized matrix exponential appearing in formula (20) can be computed by the Tucker operator (21) which just involves the small-sized matrix exponentials \(\exp (t A_{\mu })\). For an application in the context of solution of an ADA linear evolutionary equation with spatially variable coefficients, see Section 4.4.
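In KronPACK terms, assuming the matrices Aμ are stored in a cell array A and U0 is the tensor with vec(U0) = u0, a minimal sketch of formula (21) reads (a concrete instance is given in Section 4.4):

for mu = 1:d
  E{mu} = expm(t*A{mu}); % small n_mu x n_mu exponentials
end
U = tucker(U0,E);        % formula (21)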

3.4 Preconditioning of linear systems

Suppose we want to solve the semilinear PDE

$$\left\{ \begin{aligned} \partial_{t}u(t,x) &= \mathcal{A}u(t,x) + f(t,u(t,x)), \quad t>0, \quad x\in{\Omega}\subset\mathbb{R},\\ u(0,x)&=u_{0}(x), \end{aligned}\right.$$
(22)

coupled with suitable boundary conditions, where \(\mathcal {A}\) is a linear time-independent spatial differential operator and f is a nonlinear function. Using the method of lines, similarly to what led to system (16), we obtain

$$\left\{\begin{aligned} \boldsymbol{u}^{\prime}(t) &=A\boldsymbol{u}(t)+\boldsymbol{f}(t,\boldsymbol{u}(t)), \quad t>0, \\ \boldsymbol{u}(0)&=\boldsymbol{u}_{0}. \end{aligned}\right.$$
(23)

A common approach to integrating system (23) in time involves the use of IMplicit EXplicit (IMEX) schemes. For instance, the application of the well-known backward-forward Euler method with constant time step size τ leads to the solution of the linear system

$$M\boldsymbol u_{k+1}= \boldsymbol u_{k} + \tau \boldsymbol{f} (t_{k},\boldsymbol u_{k})$$

at every time step, where \(M=(I-\tau A)\in \mathbb {C}^{n\times n}\) and I is an identity matrix of suitable size. If the space discretization allows (second order centered finite differences, for example), the system can then be solved by means of the very efficient Thomas algorithm. If, on the other hand, this is not the case, a suitable direct or (preconditioned) iterative method can be employed.

Let us consider now the two-dimensional version of the semilinear PDE (22), i.e.,

$$\left\{ \begin{aligned} \partial_{t}u(t,x_{1},x_{2}) &= \mathcal{A}u(t,x_{1},x_{2}) + f(t,u(t,x_{1},x_{2})), \quad t>0, \quad (x_{1},x_{2})\in{\Omega}\subset\mathbb{R}^{2},\\ u(0,x_{1},x_{2})&=u_{0}(x_{1},x_{2}), \end{aligned}\right.$$
(24)

again with suitable boundary conditions, \(\mathcal {A}\) a linear time-independent spatial differential operator and f a nonlinear function. As for equation (17), if the PDE admits a Kronecker sum structure, the application of the method of lines leads to

$$\left\{\begin{aligned} \boldsymbol{u}^{\prime}(t)&=(I_{2}\otimes A_{1} + A_{2}\otimes I_{1})\boldsymbol{u}(t)+\boldsymbol{f}(t,\boldsymbol{u}(t)), \quad t>0,\\ \boldsymbol{u}(0)&=\boldsymbol{u}_{0}, \end{aligned}\right.$$
(25)

which can be integrated in time again by means of the backward-forward Euler method. The matrix of the resulting linear system to be solved at every time step is now

$$M=I_{2}\otimes M_{1}+M_{2}\otimes I_{1}= I_{2}\otimes \left(\frac{1}{2}I_{1}-\tau A_{1}\right)+ \left(\frac{1}{2}I_{2}-\tau A_{2}\right)\otimes I_{1}.$$

If we use an iterative method, we can obtain the action of the matrix M on a vector v as

$$M_{1}\boldsymbol{V}+\boldsymbol{V}M_{2}^{\mathsf{T}} = \boldsymbol{V}_{\!M},\quad \text{vec}(\boldsymbol{V})=\boldsymbol{v},$$

by observing that

$$M \boldsymbol{v}=\text{vec}(\boldsymbol{V}_{\!M}).$$

Moreover, examples of effective preconditioners for this kind of linear systems are the ones of Alternating Direction Implicit (ADI) type (see [24]). In this case, we can use the product of the matrices arising from the discretization of equation (24) after neglecting all the spatial variables but one in the operator \(\mathcal {A}\). We obtain then the preconditioner

$$(I_{2}-\tau A_{2})\otimes(I_{1}-\tau A_{1}) = P_{2}\otimes P_{1} = P,$$
(26)

which is expected to be effective since \(P=M+\mathcal {O}(\tau ^{2})\). In addition, the action of \(P^{-1}\) on a vector v can be efficiently obtained as

$$P_{1}^{-1}\boldsymbol{V}P_{2}^{-\mathsf{T}} = \boldsymbol{V}_{\!P^{-1}},$$

by noticing that

$$P^{-1}\boldsymbol{v}=(P_{2}^{-1}\otimes P_{1}^{-1})\boldsymbol{v}=\text{vec}(\boldsymbol{V}_{\!P^{-1}}).$$
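In matlab, this inverse action in the two-dimensional case can be sketched as follows (variable names illustrative; in practice one would factorize P1 and P2 once per time step size):

V = reshape(v,n1,n2);
Z = P1\V;       % action of P1^{-1} on the columns (1-fibers)
Z = (P2\Z.').'; % action of P2^{-1} on the rows (2-fibers)
z = Z(:);       % equals P^{-1}*v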

Remark 4

Another approach to the solution of equation (25) would be to write the equivalent matrix formulation of the problem, i.e.,

$$\left\{ \begin{aligned} \boldsymbol{U}^{\prime}(t)&=A_{1} \boldsymbol{U}(t)+\boldsymbol{U}(t)A_{2}^{\mathsf{T}}+\boldsymbol{F}(t,\boldsymbol{U}(t)), \quad t>0,\\ \boldsymbol{U}(0)&=\boldsymbol{U}_{0}, \end{aligned}\right.$$

and then apply appropriate algorithms to integrate it numerically, mainly based on the solution of Sylvester equations. This is the approach pursued, for example, in [25].

In general, for a d-dimensional semilinear problem with a Kronecker sum structure, the linear system to be solved at every time step has now matrix

$$M = M_{d} \oplus {\cdots} \oplus M_{1}, \quad M_{\mu}=\left(\frac{1}{d}I_{\mu}-\tau A_{\mu}\right).$$

Again, the action of the matrix M on a vector v can be computed without assembling the matrix (see equivalence (9)). Finally, an effective preconditioner for the linear system is a straightforward generalization of formula (26), i.e.,

$$(I_{d}-\tau A_{d})\otimes\cdots\otimes(I_{1}-\tau A_{1})= P_{d}\otimes\cdots\otimes P_{1} = P.$$

Similarly to the two-dimensional case, its inverse action on a vector v can be computed efficiently as

$$\boldsymbol{V} \times_{1} P_{1}^{-1} \times_{2} {\cdots} \times_{d} P_{d}^{-1} = \boldsymbol{V}_{\!P^{-1}},$$
(27)

see Lemma 2.1. In our package KronPACK, formula (27) can be realized without explicitly inverting the matrices Pμ by using the function itucker. We notice that this is another feature not available in the tensor algebra toolboxes mentioned in Section 2. For an example of application of these techniques, in the context of solution of evolutionary diffusion–reaction equations, see Section 4.5.

We finally notice that there also exist specific techniques to solve linear systems in Kronecker form, usually arising in the discretization of time-independent differential equations; see, for instance, [26, 27].

4 Numerical experiments

We present in this section some numerical experiments of the proposed μ-mode approach for tensor-structured problems, which make extensive use of the functions contained in our package KronPACK. We remark that, when we employ Cartesian grids of points, they have been produced by the matlab command ndgrid. If one prefers instead the ordering induced by the meshgrid command (which, however, works only up to dimension three), it is enough to interchange the first and the second matrix in the Tucker operator (5). The resulting tensor is then the (2,1,3)-permutation of S in Definition 2.4.

All the numerical experiments have been performed with MathWorks MATLAB® R2019a on an Intel® Core i7-8750H CPU with 16 GB of RAM. The degrees of freedom of the problems have been kept at a moderate size, in order to be reproducible with the package KronPACK in a few seconds on a personal laptop.

4.1 Code validation

In this section we validate the tucker function of KronPACK by comparing it to the corresponding functions of the toolboxes mentioned in Section 2, i.e., ttm and tmprod of Tensor Toolbox for MATLAB and Tensorlab, respectively. We performed several tests on tensors of different orders and sizes, and the three functions always produced the same output (up to round-off unit) at comparable computational times. For simplicity of exposition, we report in Fig. 1 just the wall-clock times of the experiments with tensors of order d = 3 and d = 6. For each selected value of d, we take tensor and matrix sizes mμ = nμ = n, with μ = 1,…,d, for different values of n, in such a way that the number of degrees of freedom nd ranges from \(N_{\min \limits }=12^{6}\) to \(N_{\max \limits }=18^{6}\). The input tensors and matrices have normal distributed random values, and the complete code can be found in the script code_validation.m.

Fig. 1

Wall-clock times for different realizations of the Tucker operator (5) with the functions ttm, tmprod, and tucker. The left plot refers to the case d = 3, while the right plot refers to the case d = 6. Each test has been repeated several times in order to avoid fluctuations

4.2 Hermite–Laguerre–Fourier function decomposition

We are interested in the approximation, by means of a pseudospectral decomposition, of the trivariate function

$$f(\boldsymbol{x})=\frac{{x_{2}^{2}}\sin(20x_{1})\sin(10x_{2})\exp(-{x_{1}^{2}}-2x_{2})}{\sin(2\pi x_{3})+2}, \quad \boldsymbol{x}=(x_{1},x_{2},x_{3})\in{\Omega},$$

where Ω = [−b1,b1] × [0,b2] × [a3,b3]. The decays in the first and second directions and the periodicity in the third direction suggest the use of a Hermite–Laguerre–Fourier (HLF) expansion. This mixed transform is useful, for instance, for the solution of differential equations with cylindrical coordinates by spectral methods, see [28]. We then introduce the normalized and scaled Hermite functions (orthonormal in \(L^{2}(\mathbb {R})\))

$$\mathcal{H}^{\beta_{1}}_{i_{1}}(x_{1})=\sqrt{\frac{\beta_{1}}{\sqrt{\pi}2^{i_{1}-1} (i_{1}-1)!}}H_{i_{1}}(\beta_{1} x_{1})\mathrm{e}^{-{\beta_{1}^{2}}{x_{1}^{2}}/2},$$

where \(H_{i_{1}}\) is the (physicist’s) Hermite polynomial of degree i1 − 1. We consider the m1 scaled Gauss–Hermite quadrature points \(\{\xi _{1}^{k_{1}}\}_{k_{1}}\) and define \({\Psi }_{1}\in \mathbb {R}^{m_{1}\times m_{1}}\) to be the corresponding transform matrix with element \({\mathcal{H}}^{\beta _{1}}_{i_{1}}(\xi ^{k_{1}}_{1})\) in position (i1,k1). The parameter β1 is chosen so that the quadrature points are contained in [−b1,b1] (see [29]). This is possible by estimating the largest quadrature point for the unscaled functions by \(\sqrt {2m_{1}+1}\) (see [30, Ch. 6]) and setting

$$\beta_{1}=\frac{\sqrt{2m_{1}+1}}{b_{1}}.$$

Moreover, we consider the normalized and scaled generalized Laguerre functions (orthonormal in \(L^{2}(\mathbb {R}^{+})\))

$$\mathcal{L}^{\alpha,\beta_{2}}_{i_{2}}(x_{2})= \sqrt{\frac{\beta_{2}(i_{2}-1)!}{\Gamma(i_{2}+\alpha)}} L_{i_{2}}^{\alpha}(\beta_{2} x_{2})(\beta_{2} x_{2})^{\alpha/2}\mathrm{e}^{-\beta_{2} x_{2}/2},$$

where \(L_{i_{2}}^{\alpha }\) is the generalized Laguerre polynomial of degree i2 − 1. We define Ψ2 to be the corresponding transform matrix with element \({\mathcal{L}}^{\alpha ,\beta _{2}}_{i_{2}}(\xi ^{k_{2}}_{2})\) in position (i2,k2), where \(\{\xi _{2}^{k_{2}}\}_{k_{2}}\) are the m2 scaled generalized Gauss–Laguerre quadrature points. The parameter β2 is chosen, similarly to the Hermite case, as

$$\beta_{2}=\frac{4m_{2}+2\alpha+2}{b_{2}},$$

see [30, Ch. 6] for the asymptotic estimate which holds for \(\lvert \alpha \rvert \ge 1/4\) and α > − 1. Finally, for the Fourier decomposition, we obviously do not construct the transform matrix, but we rely on a Fast Fourier Transform (FFT) implementation provided by the matlab function interpft, which performs a resample of the given input values by means of FFT techniques. We measure the approximation error, for varying values of nμ, μ = 1,2,3, by evaluating the pseudospectral decomposition at a Cartesian grid of points \((x_{1}^{\ell _{1}},x_{2}^{\ell _{2}},x_{3}^{\ell _{3}})\), with 1 ≤ μnμ. In order to do that, we construct the matrices Φ1 and Φ2 containing the values of the Hermite and generalized Laguerre functions at the points \(\{x_{1}^{\ell _{1}}\}_{\ell _{1}}\) and \(\{x_{2}^{\ell _{2}}\}_{\ell _{2}}\), respectively. The relevant code for the approximation of f and its evaluation, by using the KronPACK function tuckerfun, can be written as


PSIFUN{1} = @(f) PSI{1}*f;
PSIFUN{2} = @(f) PSI{2}*f;
PSIFUN{3} = @(f) f;
Fhat = tuckerfun(FW,PSIFUN);
PHIFUN{1} = @(f) PHI{1}*f;
PHIFUN{2} = @(f) PHI{2}*f;
PHIFUN{3} = @(f) interpft(f,n(3));
Ftilde = tuckerfun(Fhat,PHIFUN);

where FW is the three-dimensional array containing the values \(f(\xi _{1}^{k_{1}},\xi _{2}^{k_{2}},\xi _{3}^{k_{3}})w_{1}^{k_{1}}w_{2}^{k_{2}}\), while \(\{\xi _{3}^{k_{3}}\}_{k_{3}}\) are the m3 equispaced Fourier quadrature points in [a3,b3) and \(\{w_{\mu }^{k_{\mu }}\}_{k_{\mu }}\), with μ = 1,2, are the scaled weights of the Gauss–Hermite and generalized Gauss–Laguerre quadrature rules, respectively. The values \(\{\xi _{\mu }^{k_{\mu }}\}_{k_{\mu }}\) and \(\{w_{\mu }^{k_{\mu }}\}_{k_{\mu }}\), for μ = 1,2, have been computed by the relevant functions available, for instance, in Chebfun [31]. The complete example can be found in the script example_spectral.m.

Given a prescribed accuracy, we look for the smallest number of basis functions (m1,m2,m3) that achieve it, and we measure the computational time needed to perform the approximation of f and its evaluation with the HLF method. As a term of comparison, we consider the same experiment with a three-dimensional Fourier spectral approximation (FFF method): in fact, for the size of the computational domain and the exponential decays along the first and second directions of the function f we are considering, it appears reasonable to approximate f by a periodic function in Ω and take advantage of the efficiency of a three-dimensional FFT.

The results with α = 4, b1 = 4, b2 = 11, b3 = −a3 = 1, and n1 = n2 = n3 = 301 evaluation points uniformly distributed in Ω are displayed in Fig. 2. As we can observe, the total number of degrees of freedom needed by the HLF approach is always smaller than the corresponding FFF one. In particular, despite the exponential decay along the second direction, the FFF method requires a very large number of Fourier coefficients along that direction in order to reach the most stringent accuracies. In these situations, the HLF method implemented with the μ-mode approach is preferable in terms of computational time to the well-established implementation by the FFT technique of the FFF method.

Fig. 2

Achieved accuracies versus wall-clock times (in seconds, averaged over 20 runs) for the Hermite–Laguerre–Fourier (HLF) and the Fourier–Fourier–Fourier (FFF) approaches. The label of the marks in the plot indicates the number of basis functions used in each direction

4.3 Multivariate interpolation

Let us consider the approximation of a function f(x) through a five-variate interpolating polynomial in Lagrange form

$$p(\boldsymbol{x}) = \sum\limits_{i_{5}=1}^{m_{5}}\cdots\sum\limits_{i_{1}=1}^{m_{1}} f_{i_{1}{\ldots} i_{5}}L_{i_{1}}(x_{1}){\cdots} L_{i_{5}}(x_{5}).$$
(28)

Here \(L_{i_{\mu }}(x_{\mu })\) is the Lagrange polynomial of degree mμ − 1 on a set \(\{\xi ^{k_{\mu }}_{\mu }\}_{k_{\mu }}\) of mμ interpolation points written in the second barycentric form, with μ = 1,…,5, i.e.,

$$L_{i_{\mu}}(x_{\mu})=\frac{\frac{w^{i_{\mu}}_{\mu}}{x_{\mu}-\xi^{i_{\mu}}_{\mu}}}{{\sum}_{k_{\mu}}\frac{w^{k_{\mu}}_{\mu}}{x_{\mu}-\xi^{k_{\mu}}_{\mu}}}, \quad w_{\mu}^{i_{\mu}} = \frac{1}{{\prod}_{k_{\mu}\neq i_{\mu}} (\xi_{\mu}^{i_{\mu}}-\xi_{\mu}^{k_{\mu}})},$$

while \(f_{i_{1}{\ldots } i_{5}} = f(\xi _{1}^{i_{1}},\ldots ,\xi _{5}^{i_{5}})\).

For our numerical example, we consider the five-dimensional Runge function

$$f(x_{1},\ldots,x_{5})=\frac{1}{1+ 16{\sum}_{\mu} x_{\mu}^{2}}$$

in the domain [− 1,1]5. We choose as interpolation points a Cartesian grid of Chebyshev nodes

$$\xi_{\mu}^{k_{\mu}}=\cos\left(\frac{(2k_{\mu}-1)\pi}{2m_{\mu}}\right),\quad k_{\mu}=1,\ldots,m_{\mu},$$

whose barycentric weights are

$$w_{\mu}^{k_{\mu}}=(-1)^{k_{\mu}+1}\sin\left(\frac{(2k_{\mu}-1)\pi}{2m_{\mu}}\right),\quad k_{\mu}=1,\ldots,m_{\mu}.$$
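For each direction, the matrix Lμ appearing in formula (29) below can be assembled, for instance, as follows (a sketch assuming that no evaluation point coincides with an interpolation node; here x contains the evaluation points, xi the interpolation nodes, and w the barycentric weights of the current direction):

num = w(:).'./(x(:)-xi(:).'); % element (l,i): w_i/(x_l - xi_i)
Lmu = num./sum(num,2);        % second barycentric form, row-normalized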

This is the five-dimensional version of one of the examples presented in [32, Sec. 6]. We evaluate the polynomial at a uniformly spaced Cartesian grid of points \((x_{1}^{\ell _{1}},\ldots ,x_{5}^{\ell _{5}})\), with \(1 \leq \ell_{\mu} \leq n_{\mu}\). Then, approximation (28) at this grid can be computed as

$$\boldsymbol P = \boldsymbol{F} \times_{1} L_{1} \times_{2} {\cdots} \times_{5} L_{5},$$
(29)

where we collected the function evaluations at the interpolation points in the tensor F, and Lμ contains the element \(L_{i_{\mu }}(x_{\mu }^{\ell _{\mu }})\) in position \((\ell_{\mu},i_{\mu})\). If we store the matrices Lμ in a cell array L, the corresponding matlab command for computing the desired approximation is


P = tucker(F,L);

The results, for a number of evaluation points fixed to nμ = n = 35 and varying number of interpolation points mμ = m, are reported in Fig. 3, and the complete code can be found in the script example_interpolation.m.

Fig. 3

Results for approximation (29) with an increasing number mμ = m of interpolation points. The relative error (blue circles) is computed in maximum norm at the evaluation points. For reference, a dashed line representing the theoretical decay estimate is added

As expected, the error decreases according to the estimate

$$\lVert f(\boldsymbol{x}) - p(\boldsymbol{x}) \rVert_{\infty} \approx K^{-m}, \quad K = \frac{1}{4} + \sqrt{\frac{17}{16}},$$

see [32, 33].

4.4 Linear evolutionary equation

Let us consider the following three-dimensional Advection–Diffusion–Absorption evolutionary equation, written in conservative form, for a concentration u(t,x) (see [34])

$$\left\{\begin{aligned} &\partial_{t} u(t,\boldsymbol{x}) + \sum\limits_{\mu=1}^{3}\beta_{\mu}\partial_{x_{\mu}}(x_{\mu} u(t,\boldsymbol{x})) =\alpha\sum\limits_{\mu=1}^{3}\beta_{\mu}^{2}\partial_{x_{\mu}}(x_{\mu}^{2}\partial_{x_{\mu}} u(t,\boldsymbol{x}))-\gamma u(t,\boldsymbol{x}),\\ &u(0,\boldsymbol{x})=u_{0}(\boldsymbol{x})=x_{1}(2-x_{1})^{2}x_{2}(2-x_{2})^{2}x_{3}(2-x_{3})^{2}, \end{aligned}\right.$$
(30)

where βμ, μ = 1,2,3, and α > 0 are advection and diffusion coefficients and γ ≥ 0 is a coefficient governing the decay of u(t,x). After a space discretization by second order centered finite differences on a Cartesian grid, we end up with a system of ODEs

$$\left\{ \begin{aligned} \boldsymbol{u}^{\prime}(t)&=(A_{3}\oplus A_{2}\oplus A_{1})\boldsymbol u(t),\\ \boldsymbol{u}(0)&=\boldsymbol{u}_{0}, \end{aligned}\right.$$
(31)

where \(A_{\mu }\in \mathbb {R}^{n_{\mu }\times n_{\mu }}\) is the one-dimensional discretization of the operator

$$(2\alpha\beta_{\mu}^{2}x_{\mu}-\beta_{\mu} x_{\mu})\partial_{x_{\mu}}+\alpha\beta_{\mu}^{2} x_{\mu}^{2}\partial_{x_{\mu}}^{2} -\left(\beta_{\mu}+\frac{\gamma}{3}\right).$$

If we denote by U0 and U(t) the order-3 tensors such that vec(U0) = u0 and vec(U(t)) = u(t), then we have

$$\boldsymbol U(t)=\boldsymbol U_{0}\times_{1} \exp(t A_{1})\times_{2} \exp(t A_{2}) \times_{3} \exp(t A_{3}).$$
(32)

We consider equation (30) for \(\boldsymbol{x}\in[0,2]^{3}\), coupled with homogeneous Dirichlet–Neumann boundary conditions (u(t,x) = 0 at xμ = 0 and \(\partial _{x_{\mu }}u(t,\boldsymbol {x})=0\) at xμ = 2, μ = 1,2,3). The coefficients are fixed to

$$\beta_{1}=\beta_{2}=\beta_{3}=\frac{2}{3},\quad \alpha=\frac{1}{2}, \quad \gamma=\frac{1}{100}.$$

Then, if we compute the needed matrix exponentials by the function expm in matlab and define


E{mu} = expm(tstar*A{mu});

the solution U(t) at final time t = 0.5 can be computed as


U = tucker(U0,E);

since the matrix exponential provides the exact solution and thus no substepping strategy is needed. The complete example is reported in the script example_exponential.m.

In Table 2 we show the results with a discretization in space of n = (50,55,60) grid points. Since the problem is moderately stiff, we consider for comparison the solution of system (31) by the ode23 matlab function (which implements an explicit adaptive Runge–Kutta method of order 2(3)) and by a standard implementation of the explicit Runge–Kutta method of order four (RK4). For the Runge–Kutta methods, we consider both the tensor and the vector implementations, using the functions kronsumv and kronsum, respectively (see equivalence (9)). The number of uniform time steps for RK4 has been chosen in order to obtain an error comparable with that of the variable time step solver ode23. As we can see, the tensor formulation (32) implemented using the function tucker is much faster than any other considered approach. Indeed, this is due to the fact that formula (32) requires a single time step and calls level 3 BLAS only three times. For other experiments involving the approximation of the action of the matrix exponential in tensor-structured problems, we invite the reader to check [15].

Table 2 Summary of the results for solving the ODEs system (31) with the three described approaches. We report the number of time steps, the wall-clock times in seconds for both the tensor and the vector formulations (when feasible) and the relative error in infinity norm of the final solution with respect to the solution given by the tucker approach

4.5 Semilinear evolutionary equation

We consider the following three-dimensional semilinear evolutionary equation

$$\left\{ \begin{aligned} &\partial_{t} u(t,\boldsymbol{x}) = {\Delta} u(t,\boldsymbol{x}) + \frac{1}{1+u(t,\boldsymbol{x})^{2}} + {\Phi}(t,\boldsymbol{x}),\\ &u(0,\boldsymbol{x}) = u_{0}(\boldsymbol{x})=x_{1}(1-x_{1})x_{2}(1-x_{2})x_{3}(1-x_{3}), \end{aligned} \right.$$
(33)

for \(\boldsymbol{x}\in[0,1]^{3}\), where the function Φ(t,x) is chosen so that the exact solution is \(u(t,\boldsymbol{x})=\mathrm{e}^{t}u_{0}(\boldsymbol{x})\). We complete the equation with homogeneous Dirichlet boundary conditions in all the directions. This is the three-dimensional generalization of the example presented in [35].

We discretize the problem in space by means of second order centered finite differences on a Cartesian grid, with nμ grid points for the spatial variable xμ, μ = 1,2,3. Then, the application of the backward-forward Euler method leads to the following marching scheme

$$M\boldsymbol{u}_{k+1} = \boldsymbol{u}_{k} + \tau\boldsymbol{f}(t_{k},\boldsymbol{u}_{k}),$$
(34)

where uku(tk,x), τ is the time step size, tk is the current time and

$$\boldsymbol{f}(t_{k},\boldsymbol{u}_{k}) = \frac{1}{1+\boldsymbol{u}_{k}^{2}} + {\Phi}(t_{k},\boldsymbol{x}).$$

The matrix of the linear system (34) is given by

$$M = M_{3} \oplus M_{2} \oplus M_{1}, \quad M_{\mu} = \left(\frac{1}{3}I_{\mu} - \tau A_{\mu}\right),$$

where Aμ is the discretization of the partial differential operator \(\partial_{x_{\mu}}^{2}\) and Iμ is the identity matrix of size nμ. One could solve the linear system (34) using a direct method, in particular by computing the Cholesky factors of the matrix M once and for all (if the step size τ is constant). Another approach would be to use the Conjugate Gradient (CG) method for the single marching step (34). In matlab, the latter can be performed as


pcg(M,uk+tau*f(tk,uk),tol,maxit,[],[],uk);

or


pcg(Mfun,uk+tau*f(tk,uk),tol,maxit,[],[],uk);

where M is the matrix assembled using kronsum (vector approach), while Mfun is implemented by means of the function kronsumv (tensor approach). As described in Section 3.4, an effective preconditioner for system (34) is the one of ADI-type

$$P_{3} \otimes P_{2} \otimes P_{1}, \quad P_{\mu} = (I_{\mu} - \tau A_{\mu}).$$

The action of the inverse of this preconditioner on a vector v can be easily performed in tensor formulation, see formula (27), and the resulting Preconditioned Conjugate Gradient (PCG) method is


pcg(Mfun,uk+tau*f(tk,uk),tol,maxit,Pfun,[],uk);

where Pfun is implemented through the KronPACK function itucker. The complete example is reported in the file example_imex.m.

In Table 3 we report the results obtained for a space discretization of n = (40,44,48) grid points. The time step size τ of the marching scheme (34) is 0.01 and the final time of integration is t = 1. For all the methods, the final relative error in infinity norm with respect to the exact solution is 9.7 ⋅ 10− 3. As clearly shown, the ADI-type preconditioner is very effective in reducing the number of iterations of the CG method. Moreover, the resulting method is the fastest among all the considered approaches.

Table 3 Summary of the results for solving the semilinear equation (33) by the method of lines and the backward-forward Euler method. The elapsed time is the wall-clock time measured in seconds

5 Conclusions

In this work, we presented how d-dimensional tensor-structured problems can be stated by means of compositions of one-dimensional rules, in such a way that the resulting μ-mode BLAS formulation can be efficiently implemented on modern computer hardware. The common thread is the suitable employment of tensor product operations, with special emphasis on the Tucker operator and its variants. After validating our package KronPACK against other commonly used tensor operation toolboxes, the effectiveness of the μ-mode approach compared to other well-established techniques is shown on several examples from different fields of numerical analysis. In more detail, we employed this approach for a pseudospectral Hermite–Laguerre–Fourier trivariate function decomposition, for the barycentric Lagrange interpolation of a five-variate function, and for the numerical solution of three-dimensional stiff linear and semilinear evolutionary differential equations by means of exponential techniques and a (preconditioned) IMEX method, respectively.