1 Introduction

We are interested in the numerical computation of the unique solution \({\textit{\textbf{X}}}\in {\mathbb {R}}^{n\times n\times n}\) to the nonsingular system \({{\mathcal {A}}} {\textit{\textbf{x}}} = b\) written in the following tensor form

$$\begin{aligned} ( M_1\otimes A_1 \otimes {\textit{\textbf{H}}}+ A_2\otimes {\textit{\textbf{M}}} \otimes {\textit{\textbf{H}}} + H_3\otimes {\textit{\textbf{M}}} \otimes A_3) \mathrm{vec}({\textit{\textbf{X}}}) = b_3\otimes b_2\otimes b_1 \quad \end{aligned}$$
(1.1)

where all coefficient matrices are real and have the same \(n\times n\) dimensions. Here \(\otimes \) denotes the Kronecker product (to be recalled later) and vec(X) stacks the components of the tensor X one after the other. In particular, in (1.1) pairs of terms share a common matrix: \({\textit{\textbf{H}}}\) appears in the first two terms and \({\textit{\textbf{M}}}\) in the last two (both purposely in bold face), while the matrices \(A_i\), \(i=1,2,3\), and \(H_3, M_1\) have no relation to each other. The only assumption on the coefficient matrices, in addition to the nonsingularity of \({{\mathcal {A}}}\), is that \({\textit{\textbf{M}}}, {\textit{\textbf{H}}}\) and \(M_1, H_3\) be nonsingular. The unknown solution tensor is also highlighted in bold face, to emphasize that this is the array to be determined.
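To fix the ordering of the Kronecker factors and of the right-hand side, the full matrix \({{\mathcal {A}}}\) can be assembled explicitly when n is small; the following Matlab sketch (all variable names are purely illustrative, and the diagonal shifts only serve to enforce nonsingularity) builds (1.1) in Kronecker form and computes a reference solution by a direct solve.

```matlab
% Illustrative only: assemble the (small-n) Kronecker form of (1.1).
n  = 4;
A1 = rand(n); A2 = rand(n); A3 = rand(n);
M  = rand(n) + n*eye(n);   H  = rand(n) + n*eye(n);   % nonsingular M, H
M1 = rand(n) + n*eye(n);   H3 = rand(n) + n*eye(n);   % nonsingular M1, H3
b1 = rand(n,1); b2 = rand(n,1); b3 = rand(n,1);

Acal = kron(M1, kron(A1, H)) + kron(A2, kron(M, H)) + kron(H3, kron(M, A3));
rhs  = kron(b3, kron(b2, b1));
x    = Acal \ rhs;              % feasible only for small n (Acal is n^3 x n^3)
X    = reshape(x, n, n, n);     % fold vec(X) back into the 3-mode solution tensor
```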

This tensor equation is representative of a large class of problems that can be described by means of tensors and formulated as a linear array equation. For instance, the discretization of three-dimensional partial differential equations by means of a tensor basis, as is the case for finite differences on parallelepipedal domains or certain spectral methods, can lead to tensor equations of type (1.1). Tensor equations have become a fundamental algebraic ingredient for the numerical treatment of mathematical models depending on many parameters, as is the case in uncertainty quantification and parameter-dependent model order reduction methodologies; see, e.g., [1, 4, 5, 15, 20, 23]. In typical situations, tensor equations with many terms occur, and each term may have a number of Kronecker products. In a simplified context, for instance, the following tensor equation is of interest (see, e.g., [17])

$$\begin{aligned} \sum _{i=1}^\ell \left( I \otimes \cdots \otimes {\mathop {A_i}\limits ^{i}} \otimes \cdots \otimes I \right) {\textit{\textbf{x}}} ={{\,\mathrm{\bigotimes }\,}}_{i=1}^\ell b_i \end{aligned}$$

For \(\ell = 3\) this is a special case of our problem (1.1), where all matrices except \(A_i\), \(i=1,2,3\) are equal to the identity matrix I. Finally, we explicitly remark that if the right-hand side in (1.1) were the sum of rank-one tensors, then the solution could be expressed as the sum of solutions to tensor equations with right-hand sides of rank one.

The literature on tensors, their analysis and the associated approximation methods, has grown tremendously in the past twenty years. Different tensor representations and decompositions have been analyzed. We refer the reader to [16] for an introductory and historical account, and to [11] for a literature survey up to 2013. Numerous different decompositions have allowed the development of various problem-dependent strategies, see, e.g., [7, 12, 14, 18, 24].

More recently, methods for solving linear equations in tensor format have been proposed and analyzed. In most cases, the authors have been interested in the presence of many summands and many Kronecker products, for which iterative methods appear to be mandatory. In this context, most approaches try to take into account the Kronecker structure and the possible low rank of the involved iteration matrices, see, e.g., [2, 3, 6, 13, 19,20,21]. However, little has been said on “direct” dense methods for low order tensor equations, without the explicit use of the Kronecker form. Here we close this gap for the special case of (1.1), which nonetheless appears to be a feasible algebraic formulation of a quite large set of differential problems.

2 A closed form solution

The numerical solution to (1.1) can be given in closed form by unfolding the 3-mode tensor in one of the three directions. In particular, a tensor \(X\in {\mathbb {R}}^{n_1\times n_2\times n_3}\) can be written using the mode-1 unfolding as (see, e.g., [16])

$$\begin{aligned} X_{(1)} = [X_1, X_2, \ldots , X_{n_3}], \quad X_j \in {\mathbb {R}}^{n_1\times n_2}, j=1, \ldots , n_3; \end{aligned}$$

each \(X_j\) is called a slice, and \(X_{(1)}\) is a matrix in \({\mathbb {R}}^{{n_1}\times n_2n_3}\). Some additional standard notation needs to be recalled. The Kronecker product of two matrices X, Y is defined in the standard block form as
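In Matlab, for instance, the mode-1 unfolding and the corresponding folding are plain column-major reshapes; a minimal illustration:

```matlab
X  = rand(3,4,5);                 % n1 x n2 x n3 tensor
X1 = reshape(X, size(X,1), []);   % mode-1 unfolding [X_1, X_2, ..., X_{n3}]
Xb = reshape(X1, size(X));        % fold back; isequal(X, Xb) is true
```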

$$\begin{aligned} X \otimes Y = \begin{bmatrix} X_{1,1} Y &{} \cdots &{} X_{1,n_2} Y \\ \vdots &{} \ddots &{} \vdots \\ X_{n_1,1} Y &{} \cdots &{} X_{n_1,n_2} Y \end{bmatrix}, \end{aligned}$$

where \(X_{i,j}\) denotes an element of X. Moreover, vec(X) is the operator stacking all columns of the matrix X one after the other. In the case of third order tensors, we will apply the vec operator to the mode-1 unfolding. The reverse operation, for known dimensions of the vector x, will be denoted by \(\mathrm{mat}(x,n_1,n_2)\), so that \(x=\mathrm{vec}(X)\) and \(X=\mathrm{mat}(x,n_1,n_2)\). Similarly, \(X=\mathrm{tensor}_{(1)}(x,n_1,n_2,n_3)\) will fold a long vector into a tensor via the mode-1 unfolding. A standard property of the Kronecker product that will be used repeatedly is the following

$$\begin{aligned} \mathrm{vec}(A X B) = (B^T \otimes A) \mathrm{vec}(X), \end{aligned}$$
(2.1)

(\(B^T\) denotes the real transpose of B), which allows one to go back and forth between the vector and matrix notations. Other properties used in the sequel are i) \((A\otimes B)^T=A^T\otimes B^T\), ii) \((A\otimes B)(C\otimes D) = (A C\otimes BD)\) for compatible matrix dimensions, iii) if A and B are both invertible, then \(A\otimes B\) is invertible and \((A\otimes B)^{-1} = A^{-1} \otimes B^{-1}\); see, e.g., [9, Ch.12].
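Property (2.1) can be verified numerically in a few lines, for instance:

```matlab
% Check of (2.1): vec(A*X*B) equals kron(B.', A)*vec(X).
A = rand(3,4); X = rand(4,5); B = rand(5,2);
lhs = reshape(A*X*B, [], 1);
rhs = kron(B.', A) * X(:);
norm(lhs - rhs)                   % of the order of machine precision
```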

The following result holds. Here \(Q^*\) denotes the conjugate transpose of the complex matrix Q, while \({H}^{-T} = ({H}^{-1})^T\) and T denotes transposition.

Theorem 2.1

Let \(A_3^T {\textit{\textbf{H}}}^{-T} = Q R Q^{*}\) be the Schur decomposition of \(A_3^T {\textit{\textbf{H}}}^{-T}\), and \([\gamma _1, \ldots , \gamma _n]:=b_1^T {\textit{\textbf{H}}}^{-T} Q\). Using the mode-1 unfolding, the solution \(\textit{\textbf{X}}\) to (1.1) is given by

$$\begin{aligned} {\textit{\textbf{X}}}_{(1)} = Q^{-T} [{\tilde{z}}_1, \ldots , {\tilde{z}}_n]^T \in {\mathbb {R}}^{n\times n^2}, \quad {\tilde{z}}_j = \mathrm{vec}({\tilde{Z}}_j), \end{aligned}$$

where for \(j=1, \ldots , n\), the matrix \({\tilde{Z}}_j\) solves the generalized Sylvester equation

$$\begin{aligned} {\textit{\textbf{M}}} Z ( H_3^T R_{j,j} + A_2^T) + A_1 Z M_1^T = b_2 \gamma _j b_3^T - {\textit{\textbf{M}}}\, \mathrm{mat}([{\tilde{z}}_1, \ldots , {\tilde{z}}_{j-1}] R_{1:j-1,j},n,n)\, H_3^T , \end{aligned}$$

where \(R_{j,j}\) denotes the (j,j) element of the upper triangular matrix R and \(R_{1:j-1,j}\) the first \(j-1\) components of its jth column; the last term on the right-hand side is set to zero for \(j=1\).

Proof

Using (2.1) for the unfolded tensor we have

$$\begin{aligned} {\textit{\textbf{H}}} {\textit{\textbf{X}}}_{(1)} (A_2\otimes {\textit{\textbf{M}}} +M_1\otimes A_1)^T + A_3 {\textit{\textbf{X}}}_{(1)} (H_3\otimes {\textit{\textbf{M}}})^T= & {} b_1 (b_3\otimes b_2)^T \\ {\textit{\textbf{X}}}_{(1)} (A_2\otimes {\textit{\textbf{M}}}+M_1\otimes A_1)^T(H_3 \otimes {\textit{\textbf{M}}})^{-T} + {\textit{\textbf{H}}}^{-1} A_3 {\textit{\textbf{X}}}_{(1)}= & {} {\textit{\textbf{H}}}^{-1} b_1 (b_3\otimes b_2)^T (H_3 \otimes {\textit{\textbf{M}}})^{-T}\\ {\textit{\textbf{X}}}_{(1)} (A_2^T H_3^{-T} \otimes I + M_1^T H_3^{-T} \otimes A_1^T {\textit{\textbf{M}}}^{-T}) + {\textit{\textbf{H}}}^{-1} A_3 {\textit{\textbf{X}}}_{(1)}= & {} {\textit{\textbf{H}}}^{-1} b_1 (b_3^T H_3^{-T} \otimes b_2^T {\textit{\textbf{M}}}^{-T}) \end{aligned}$$

For later readability, let us transpose both sides and set \(Y=({\textit{\textbf{X}}}_{(1)})^T\). Then we obtain

$$\begin{aligned} (H_3^{-1} A_2 \otimes I + H_3^{-1} M_1\otimes {\textit{\textbf{M}}}^{-1}A_1) Y + Y ({\textit{\textbf{H}}}^{-1} A_3)^T = (H_3^{-1} b_3 \otimes {\textit{\textbf{M}}}^{-1} b_2) b_1^T {\textit{\textbf{H}}}^{-T} . \end{aligned}$$

Using \(({\textit{\textbf{H}}}^{-1} A_3)^T = Q R Q^{*}\) and multiplying the equation by Q from the right, we can write

$$\begin{aligned} (H_3^{-1} A_2 \otimes I + H_3^{-1} M_1\otimes {\textit{\textbf{M}}}^{-1}A_1) Y Q + YQ R = (H_3^{-1} b_3 \otimes {\textit{\textbf{M}}}^{-1} b_2) b_1^T {\textit{\textbf{H}}}^{-T} Q . \end{aligned}$$

Let \(b_1^T {\textit{\textbf{H}}}^{-T} Q =: [\gamma _1, \ldots , \gamma _n]\) and \(Y Q=:[{\hat{z}}_1, \ldots , {\hat{z}}_n]\). Thanks to the upper triangular form of R, for the first column \({\hat{z}}_1\) it holds

$$\begin{aligned} (H_3^{-1} A_2 \otimes I + H_3^{-1} M_1\otimes {\textit{\textbf{M}}}^{-1}A_1) {\hat{z}}_1 + {\hat{z}}_1 R_{1,1} = (H_3^{-1} b_3 \otimes {\textit{\textbf{M}}}^{-1} b_2) \gamma _1 . \end{aligned}$$
(2.2)

For the subsequent columns \(j=2, \ldots , n\), taking into account once again the triangular form of R, we set \(w_{j-1} = [{\hat{z}}_1, \ldots , {\hat{z}}_{j-1}] R_{1:j-1,j}\) so that

$$\begin{aligned} (H_3^{-1} A_2 \otimes I + H_3^{-1} M_1\otimes {\textit{\textbf{M}}}^{-1}A_1) {\hat{z}}_j + {\hat{z}}_j R_{j,j} = (H_3^{-1} b_3 \otimes {\textit{\textbf{M}}}^{-1} b_2) \gamma _j - w_{j-1} . \end{aligned}$$
(2.3)

Each column can be obtained in sequence by further undoing the Kronecker product as follows. Let us reshape each \({\hat{z}}_j\) so that \({\hat{z}}_j = \mathrm{vec}({\hat{Z}}_j)\). By using (2.1) in (2.2) for \(j=1\), we can write

$$\begin{aligned} {\textit{\textbf{M}}}^{-1}A_1 {\hat{Z}}_1 (H_3^{-1} M_1)^T + {\hat{Z}}_1(R_{1,1} I + (H_3^{-1} A_2)^T) = {\textit{\textbf{M}}}^{-1} b_2 \gamma _1 (H_3^{-1} b_3)^T, \end{aligned}$$

which can be written as

$$\begin{aligned} {\textit{\textbf{M}}}^{-1}A_1 {\hat{Z}}_1 + {\hat{Z}}_1(R_{1,1} H_3^T M_1^{-T} + A_2^T M_1^{-T} ) = {\textit{\textbf{M}}}^{-1} b_2 \gamma _1 b_3^T M_1^{-T}. \end{aligned}$$
(2.4)

Analogously, for \(j=2, \ldots , n\) and letting \(W_{j-1} = \mathrm{mat}([{\hat{z}}_1, \ldots , {\hat{z}}_{j-1}] R_{1:j-1,j})\), from (2.3) we first obtain

$$\begin{aligned} {\textit{\textbf{M}}}^{-1}A_1 {\hat{Z}}_j (H_3^{-1} M_1)^T + {\hat{Z}}_j(R_{j,j} I + (H_3^{-1} A_2)^T) = {\textit{\textbf{M}}}^{-1} b_2 \gamma _j (H_3^{-1} b_3)^T - W_{j-1}, \end{aligned}$$

or equivalently, for \(j=2, \ldots , n\), as

$$\begin{aligned} {\textit{\textbf{M}}}^{-1}A_1 {\hat{Z}}_j + {\hat{Z}}_j(R_{j,j} H_3^T M_1^{-T} + A_2^T M_1^{-T} ) = {\textit{\textbf{M}}}^{-1} b_2 \gamma _j b_3^T M_1^{-T} - W_{j-1} H_3^T M_1^{-T} . \end{aligned}$$
(2.5)

Multiplying both sides by \(M_1^T\) from the right and by \({\textit{\textbf{M}}}\) from the left, the result follows, with \({\tilde{Z}}_j = {\hat{Z}}_j\). \(\square \)

We notice that the use of the mode-1 unfolding is related to the specific location of the repeated matrices \({\textit{\textbf{H}}}\), \({\textit{\textbf{M}}}\). Different unfoldings could be used if these matrices occupy different positions. We also remark that the same procedure could be applied if data were complex, without any particular change in the proof, or in the algorithm below, except that one should keep in mind that property (2.1) uses real transposition, even if data are complex.

3 The new algorithm

The proof of Theorem 2.1 is constructive, as it provides an explicit way to generate the tensor solution, one slice at a time. The complete procedure is described in the algorithm below, in the following called the Three-Term-Tensor Sylvester (\(\mathrm{T}^3\)-Sylv) method.

[Algorithm: \(\mathrm{T}^3\)-Sylv (pseudocode figure not reproduced)]
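Since the pseudocode figure is not reproduced, we include a minimal Matlab sketch along the lines of Theorem 2.1; the function name t3_sylv and all identifiers are illustrative, and each slice equation is reduced to a standard Sylvester equation by scaling with \({\textit{\textbf{M}}}^{-1}\) from the left and \(M_1^{-T}\) from the right.

```matlab
function X = t3_sylv(A1, A2, A3, M, H, M1, H3, b1, b2, b3)
% Sketch of T^3-Sylv (Theorem 2.1); identifiers are illustrative.
% Solves (M1 x A1 x H + A2 x M x H + H3 x M x A3) vec(X) = b3 x b2 x b1.
n = size(A1, 1);
[Q, R] = schur((H \ A3)', 'complex');   % A3^T H^{-T} = Q R Q^*
gamma  = b1' * (H' \ Q);                % [gamma_1, ..., gamma_n]
A  = M \ A1;                            % left coefficient, common to all slices
F0 = (M \ b2) * b3';                    % rank-one part of the right-hand sides
Z  = complex(zeros(n^2, n));            % columns will hold vec(Zhat_j)
for j = 1:n
    W  = reshape(Z(:, 1:j-1) * R(1:j-1, j), n, n); % coupling with previous slices
    Bj = (A2' + R(j,j) * H3') / M1';               % (A2^T + R_jj H3^T) M1^{-T}
    Fj = (gamma(j) * F0 - W * H3') / M1';
    Zj = sylvester(A, Bj, Fj);          % standard Sylvester solve for slice j
    Z(:, j) = Zj(:);
end
X1 = real((Z * Q').');                  % mode-1 unfolding; drop round-off imaginary parts
X  = reshape(X1, n, n, n);              % fold into the solution tensor
end
```

For small n, the output can be checked against the Kronecker form of (1.1) assembled as in the Introduction.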

In practice, using appropriate transformations, the method is a nested Sylvester solver, which treats one slice at a time, and updates the corresponding coefficient matrix and right-hand side F. The solvability of the Sylvester equations is related to that of the original problem, and in particular to the nonsingularity of \({\mathcal {A}}\).

The algorithm relies on the initial Schur decomposition, which provides robust unitary transformations. Moreover, for each slice, a matrix Sylvester equation needs to be solved, whose solution also involves the Schur decompositions of the coefficient matrices, see, e.g., [26]; its sensitivity has been analyzed in [8, sec.4.1]. The stability of the overall algorithm is also affected by the presence of several inverses, which can be harmful in the case of ill-conditioned matrices. Indeed, if some of the involved matrices are severely ill-conditioned, the solution may lose accuracy. This was observed in our experiments, some of which are reported in Example 4.3.

3.1 Numerical experiments

In this section we report on some numerical experiments with the \(\mathrm{T}^3\)-Sylv method. All experiments were performed using Matlab [22].

Example 3.1

To test the efficiency of the method, we consider dense matrices with random entries (taken from a uniform distribution in the interval (0,1), Matlab function rand) of increasing size n. The same distribution is used for the vectors \(b_1, b_2, b_3\). We stress that the Kronecker form of the problem would involve a dense matrix \({\mathcal {A}}\) of size \(n^3\times n^3\), which could not even be stored.
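A driver in the spirit of this experiment may look as follows (it relies on the illustrative t3_sylv sketch of the previous section).

```matlab
n  = 256;                                 % so that n^3 = 16,777,216
A1 = rand(n); A2 = rand(n); A3 = rand(n);
M  = rand(n); H  = rand(n); M1 = rand(n); H3 = rand(n);
b1 = rand(n,1); b2 = rand(n,1); b3 = rand(n,1);
tic, X = t3_sylv(A1, A2, A3, M, H, M1, H3, b1, b2, b3); toc
```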

We readily observe that the method is able to solve a (random) structured dense problem of size \(n^3=16,777,216\) in about 34 seconds on a standard laptop. The CPU times in Table 1 show that the computational cost of the method grows by a factor between six and ten as the dimension n doubles. However, going from n to 2n, the problem dimension in the full space would grow from \(n^3\) to \(2^3 n^3\). Hence, the actual cost appears to grow linearly with \(n^3\). Since the data are dense, Gaussian elimination on \({\mathcal {A}}\) would instead require \({{\mathcal {O}}}( (n^3)^3)\) floating point operations.

Table 1 CPU times (in secs) of \(\mathrm{T}^3\)-Sylv for increasing dimensions of the coefficient matrices, having uniformly distributed random entries

4 The symmetric case

If all matrices are symmetric and positive definite, the solution is also symmetric and positive semidefinite, in the appropriate tensor representation; see, e.g., [10] for a general discussion. With these hypotheses, the derivation of the solution procedure simplifies accordingly, as shown in the following result. We stress that many problems can be brought to this setting. Consider for instance the differential equation \(-\Delta u = f\) on the unit cube with zero Dirichlet boundary conditions. By discretizing using linear finite elements in each direction (this may be seen as a linear finite element discretization using \({\textit{\textbf{Q}}}_1\) brick elements), we obtain

$$\begin{aligned} ( M\otimes A \otimes M + A\otimes M \otimes M + M\otimes M \otimes A) \mathrm{vec}({\textit{\textbf{X}}}) = F, \end{aligned}$$

with \(M=\mathrm{tridiag}(-1, \underline{4}, -1)\in {\mathbb {R}}^{n\times n}\) and \(A=\mathrm{tridiag}(-1, \underline{2}, -1)\in {\mathbb {R}}^{n\times n}\) (the underlined number corresponds to the diagonal entry in the two symmetric and tridiagonal matrices). If f can be well approximated by a separable function in spatial dimensions, then F will have the desired Kronecker form, see, e.g., [10, 17] for a similar description.
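A sketch of the corresponding one-dimensional factors, with the matrices as given above and mesh-width scalings omitted, is the following (f1, f2, f3 denote hypothetical separable samples of f at the interior nodes).

```matlab
n = 32; e = ones(n, 1);
A = spdiags([-e 2*e -e], -1:1, n, n);    % tridiag(-1, 2, -1)
M = spdiags([-e 4*e -e], -1:1, n, n);    % tridiag(-1, 4, -1)
% F = kron(f3, kron(f2, f1));            % Kronecker-form right-hand side
```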

Analogously, one could consider the equation \({{\mathcal {L}}}(u) = f(x,y,z)\), \((x,y,z)\in \Omega \subset {\mathbb {R}}^3\) and

$$\begin{aligned} {{\mathcal {L}}}(u)= -{\textit{\textbf{m}}}(z) h_3(y) \phi _1(x) u_{xx} - m_1(x) {\textit{\textbf{h}}}(z)\phi _2(y) u_{yy} - {\textit{\textbf{h}}}(x) {\textit{\textbf{m}}}(y) \phi _3(z) u_{zz}, \end{aligned}$$

while \(f(x,y,z)=f_1(x)f_2(y)f_3(z)\). Finite difference discretization leads to the Kronecker form in (1.1), where, \({\textit{\textbf{M}}}\) is a diagonal matrix containing the coefficients in \({\textit{\textbf{m}}}(z_k)\) at the interior nodes \(z_k\) in the z-direction; similarly for the other coefficients. The matrices \(A_i\), \(i=1,2,3\) contain the three-point stencil of the discretized second order derivative in each direction, respectively, together with the coefficient \(\phi _i\); see, e.g., [25].

We do not report numerical results with data stemming from these discretizations, as they would not significantly differ from those shown in our examples, for dense data.

We then describe the specialized result for symmetric and positive definite matrices. This leads to better stability properties of the algorithm (see Example 4.3).

Proposition 4.1

Assume all coefficient matrices in (1.1) are symmetric and positive definite, and assume that \(b\equiv b_1=b_2=b_3\). Let \({\textit{\textbf{H}}}={\textit{\textbf{L}}}_{\textit{\textbf{H}}}{\textit{\textbf{L}}}_{\textit{\textbf{H}}}^T\), \({\textit{\textbf{M}}}={\textit{\textbf{L}}}_{\textit{\textbf{M}}}{\textit{\textbf{L}}}_{\textit{\textbf{M}}}^T\), \(H_3=L_3 L_3^T\) and \(L_3^{-1} M_1 L_3^{-T}=L_1 L_1^T\) be the Cholesky factorizations of the corresponding matrices. Let \({\widehat{A}}_3 = Q \Lambda Q^T\) be the eigenvalue decomposition of \({\widehat{A}}_3 = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-1} A_3 {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-T}\) and \([\gamma _1, \ldots , \gamma _n]:={b}^T {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-T} Q\). Using the mode-1 unfolding, the solution \(\textit{\textbf{X}}\) to (1.1) is given by

$$\begin{aligned} {\textit{\textbf{X}}}_{(1)} = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-T} Q [ \mathrm{vec}({\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}{\widehat{Z}}_1 L_3^{-1}), \ldots , \mathrm{vec}({\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}{\widehat{Z}}_n L_3^{-1})]^T, \end{aligned}$$

where for \(j=1, \ldots , n\), \({\widehat{Z}}_j={\widetilde{Z}}_j L_1^{-1}\) and the matrix \({\widetilde{Z}}_j\) solves the generalized Sylvester equation

$$\begin{aligned} {\widetilde{Z}}_j L_1^{-1} (\lambda _jI + {\widehat{A}}_2)L_1^{-T} + {\widehat{A}}_1{\widetilde{Z}}_j = {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b \gamma _j (L_3^{-1} b)^TL_1^{-T}, \quad j=1, \ldots , n, \end{aligned}$$

with \({\widehat{A}}_2 = L_3^{-1} A_2 L_3^{-T}\) and \({\widehat{A}}_1 = {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} A_1 {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}\).

Proof

Consider the following Cholesky factorizations of the given matrices

$$\begin{aligned} {\textit{\textbf{H}}} = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}{\textit{\textbf{L}}}_{\textit{\textbf{H}}}^T, \quad H_3\otimes {\textit{\textbf{M}}} = (L_3 \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}) (L_3^T \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^T) . \end{aligned}$$

Using (2.1) for the unfolded tensor we have

$$\begin{aligned}&{\textit{\textbf{H}}}{\textit{\textbf{X}}}_{(1)} (A_2\otimes {\textit{\textbf{M}}}+M_1\otimes A_1) + A_3 {\textit{\textbf{X}}}_{(1)} (H_3\otimes {\textit{\textbf{M}}}) = b (b\otimes b)^T \\&{\textit{\textbf{L}}}_{\textit{\textbf{H}}}^T {\textit{\textbf{X}}}_{(1)} (A_2\otimes {\textit{\textbf{M}}}+M_1\otimes A_1) (L_3^{-T} \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}) + {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-1} A_3 {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-T} {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^T {\textit{\textbf{X}}}_{(1)} (L_3 \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}) \\&\quad = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-1} b (b\otimes b)^T (L_3^{-T} \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T})\\&\widehat{\textit{\textbf{X}}}_{(1)} ({\widehat{A}}_2\otimes I +{\widehat{M}}_1\otimes {\widehat{A}}_1) + {\widehat{A}}_3 \widehat{\textit{\textbf{X}}}_{(1)} = {\widehat{b}} (b^T L_3^{-T} \otimes b^T {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}) \end{aligned}$$

where we have defined \({\widehat{b}} = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-1} b\), \(\widehat{\textit{\textbf{X}}}_{(1)} = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^T {\textit{\textbf{X}}}_{(1)} (L_3 \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}})\), \({\widehat{A}}_3 = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-1} A_3 {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-T}\), \({\widehat{A}}_2=L_3^{-1} A_2 L_3^{-T}\), \({\widehat{M}}_1=L_3^{-1} M_1 L_3^{-T}\) and \({\widehat{A}}_1={\textit{\textbf{L}}}_{{\textit{\textbf{M}}}}^{-1} A_1 {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}\). Note that all these “hat” coefficient matrices are still symmetric and positive definite. For later readability, let us transpose both sides and set \(Y=(\widehat{\textit{\textbf{X}}}_{(1)})^T\). Then we obtain

$$\begin{aligned} ({\widehat{A}}_2\otimes I +{\widehat{M}}_1\otimes {\widehat{A}}_1) Y + Y {\widehat{A}}_3 = (L_3^{-1} b \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b) {\widehat{b}}^T . \end{aligned}$$

Using \({\widehat{A}}_3 = Q \Lambda Q^T\) and multiplying the equation by Q from the right, we can write

$$\begin{aligned} ({\widehat{A}}_2\otimes I +{\widehat{M}}_1\otimes {\widehat{A}}_1) Y Q + YQ \Lambda = (L_3^{-1} b \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b) {\widehat{b}}^T Q. \end{aligned}$$

Let \({\widehat{b}}^T Q =: [\gamma _1, \ldots , \gamma _n]\) and \(Y Q=:[{\hat{z}}_1, \ldots , {\hat{z}}_n]\). Thanks to the diagonal form of \(\Lambda \), namely \(\Lambda =\mathrm{diag}(\lambda _1, \ldots , \lambda _n)\), for each column \({\hat{z}}_j\) it holds

$$\begin{aligned} ({\widehat{A}}_2\otimes I +{\widehat{M}}_1\otimes {\widehat{A}}_1) {\hat{z}}_j + {\hat{z}}_j \lambda _j = (L_3^{-1} b \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b) \gamma _j . \end{aligned}$$
(4.1)

Each column can then be obtained in sequence by further undoing the Kronecker product as follows. Let us reshape each \({\hat{z}}_j\) so that \({\hat{z}}_j = \mathrm{vec}({\widehat{Z}}_j)\). By using (2.1) in (4.1), we can write

$$\begin{aligned} {\widehat{Z}}_j (\lambda _jI + {\widehat{A}}_2) + {\widehat{A}}_1{\widehat{Z}}_j {\widehat{M}}_1 = {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b \gamma _j (L_3^{-1} b)^T, \end{aligned}$$

which, upon factorization of \({\widehat{M}}_1\) as \({\widehat{M}}_1=L_1 L_1^T\) can be written as

$$\begin{aligned} {\widetilde{Z}}_j L_1^{-1} (\lambda _jI + {\widehat{A}}_2)L_1^{-T} + {\widehat{A}}_1{\widetilde{Z}}_j ={\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b \gamma _j (L_3^{-1} b)^TL_1^{-T}, \quad j=1, \ldots , n, \end{aligned}$$
(4.2)

where \({\widetilde{Z}}_j = {\widehat{Z}}_j L_1\). The matrix equation (4.2) is a standard Sylvester equation, which can be solved for each j. Once \({\widehat{Z}}_j\) is recovered, we obtain

$$\begin{aligned} {\textit{\textbf{X}}}_{(1)} = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-T} Q [ \mathrm{vec}({\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}{\widehat{Z}}_1 L_3^{-1}), \ldots , \mathrm{vec}({\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}{\widehat{Z}}_n L_3^{-1})]^T. \end{aligned}$$

\(\square \)

We will call the corresponding algorithm t\(^3\)-sym-sylv. Its Matlab implementation is reported in the Appendix.
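Since the Appendix is not reproduced here, we sketch below a minimal Matlab implementation written directly from the proof of Proposition 4.1; the function name t3_sym_sylv and all identifiers are illustrative and need not coincide with the Appendix code.

```matlab
function X = t3_sym_sylv(A1, A2, A3, M, H, M1, H3, b)
% Sketch of the symmetric solver (Proposition 4.1); identifiers are illustrative.
n  = size(A1, 1);
LH = chol(H, 'lower');  LM = chol(M, 'lower');  L3 = chol(H3, 'lower');
L1 = chol(L3 \ M1 / L3', 'lower');        % \hat M_1 = L1*L1'
A3h = LH \ A3 / LH';  A3h = (A3h + A3h')/2;
A2h = L3 \ A2 / L3';
A1h = LM \ A1 / LM';
[Q, Lam] = eig(A3h);  lam = diag(Lam);    % \hat A_3 = Q*Lam*Q'
gamma = (LH \ b)' * Q;                    % [gamma_1, ..., gamma_n]
c1 = LM \ b;  c2 = L3 \ b;
Zc = zeros(n^2, n);
for j = 1:n
    Bj = (L1 \ (lam(j)*eye(n) + A2h)) / L1';   % L1^{-1}(lam_j I + A2h) L1^{-T}
    Fj = gamma(j) * c1 * (c2' / L1');
    Zt = sylvester(A1h, Bj, Fj);               % equation (4.2)
    Zh = Zt / L1;                              % \hat Z_j = \tilde Z_j L1^{-1}
    Zc(:, j) = reshape(LM' \ Zh / L3, [], 1);  % vec(LM^{-T} \hat Z_j L3^{-1})
end
X1 = (LH' \ Q) * Zc';                     % mode-1 unfolding X_(1)
X  = reshape(X1, n, n, n);
end
```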

We notice that the proof does not require all matrices to be positive definite, and \(A_3\) only needs to be symmetric. In fact, the matrices \(A_2\) and \(A_1\) do not even need to be symmetric, although the current proof omits the corresponding transpositions. The proof could be easily adapted to treat this setting. Indeed, for \(A_1, A_2\) nonsymmetric the Sylvester equation (4.2) could be written as

$$\begin{aligned} {\widetilde{Z}}_j L_1^{-1} (\lambda _jI + {\widehat{A}}_2^T)L_1^{-T} + {\widehat{A}}_1{\widetilde{Z}}_j ={\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b \gamma _j (L_3^{-1} b)^TL_1^{-T}, \quad j=1, \ldots , n. \end{aligned}$$

The rest of the derivation would follow as before.

Remark 4.2

We stated Proposition 4.1 for the right-hand side \(b\otimes b\otimes b\) to maintain symmetry in the overall problem. Nonetheless, the “symmetric” simplifications still hold also in the more general situation where the right-hand side is \(b_1\otimes b_2\otimes b_3\). In this case, the proof goes through in the same way, with obvious modifications to the right-hand side terms, following the steps in the proof of Theorem 2.1.

Example 4.3

We investigate the accuracy of the symmetric procedure, compared with that of the general algorithm t\(^3\)-sylv, as the condition number of the given matrices increases. We consider \(5\times 5\) symmetric and positive definite matrices with random entries taken from a uniform distribution in the interval (0,1). To this end, we define each coefficient matrix by using the Matlab function sprandsym with density 1 (giving a full matrix), type 1 (positive definite), and the same reciprocal condition number \(\kappa ^{-1}=0.2\cdot 10^{-k}\), with \(k=0, 0.2, 0.4, \ldots , 10\). The results are reported in Fig. 1, where the relative errors

$$\begin{aligned} \frac{\Vert x^* - x_{nonsym}\Vert }{\Vert x^*\Vert }, \quad \frac{\Vert x^* - x_{sym}\Vert }{\Vert x^*\Vert } \end{aligned}$$

are displayed, with \(x_{nonsym}\) obtained by algorithm t\(^3\)-sylv and \(x_{sym}\) by t\(^3\)-sym-sylv; here \(x^*\) is a reference solution, obtained by using the Matlab direct solver “\(\backslash \)” on the Kronecker form of the problem, with coefficient matrix of size \(125\times 125\). The figure also displays the quantities \(10^{-15} \kappa ^{3/2}\) and \(10^{-15} \kappa ^{5/2}\), which appear to match the dependence of the error on the conditioning of the matrices for the two methods, respectively. Clearly, the possibility of using the symmetric solver significantly improves the accuracy of the obtained solution with respect to the problem condition number. The sensitivity of the method deserves a deeper analysis that will be performed in future work.
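For one conditioning level, the coefficient matrices of this experiment can be generated, for instance, as in the following illustrative setup (which relies on the t3_sym_sylv sketch above).

```matlab
n = 5; k = 4; kappa_inv = 0.2 * 10^(-k);          % one value of the scanned range
spd = @() full(sprandsym(n, 1, kappa_inv, 1));    % dense SPD, prescribed 1/kappa
A1 = spd(); A2 = spd(); A3 = spd();
M  = spd(); H  = spd(); M1 = spd(); H3 = spd();
b  = rand(n, 1);
X  = t3_sym_sylv(A1, A2, A3, M, H, M1, H3, b);
```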

Fig. 1: Dependence of the accuracy of the symmetric and nonsymmetric solvers on the condition number of the matrices

5 Conclusions

We have proposed a new method for solving order-3 tensor linear equations. We derived a general approach relying on the Schur decomposition, and a specialized one that effectively exploits the symmetric positive definiteness of the coefficient matrices.

Although the considered class of tensor equations is restricted by the role and position of the two matrices \(\textit{\textbf{H}}\) and \(\textit{\textbf{M}}\), our presentation shows that this setting is sufficiently general to represent a good variety of practical problems. On the other hand, the repeated presence of \({\textit{\textbf{M}}}\) and \({\textit{\textbf{H}}}\) forced us to use the same dimensions in all modes. We will try to relax this constraint in future work. We also remark that the algebraic problem could be formulated with these two repeated matrices located in other (different) positions in the tensor equation, in a way that a similar solution derivation could be devised.

Since the right-hand side has low numerical tensor rank, we expect \(\textit{\textbf{X}}\) to have low tensor rank [10, 20, Th.3.6]. Tensor-based truncation strategies could be employed to satisfactorily approximate the obtained solution. This would avoid storing the whole dense tensor, if dimensions become large.

The proposed strategy allows us to solve with an essentially direct method problems of structured large dimensions. Nonetheless, if n is required to be significantly larger and the coefficient matrices are sparse, then an iterative variant of the proposed method could be considered. As an alternative, the new method can serve as workhorse for solving the reduced equation in projection type procedures for large and sparse third order tensor equations; see similar strategies in [26] for linear matrix equations.