1 Introduction

We are interested in the numerical computation of the unique solution \({\textit{\textbf{X}}}\in {\mathbb {R}}^{n\times n\times n}\) to the nonsingular system \({{\mathcal {A}}} {\textit{\textbf{x}}} = b\) written in the following tensor form

$$\begin{aligned} ( M_1\otimes A_1 \otimes {\textit{\textbf{H}}}+ A_2\otimes {\textit{\textbf{M}}} \otimes {\textit{\textbf{H}}} + H_3\otimes {\textit{\textbf{M}}} \otimes A_3) \mathrm{vec}({\textit{\textbf{X}}}) = b_3\otimes b_2\otimes b_1 \quad \end{aligned}$$
(1.1)

where all coefficient matrices are real and have the same \(n\times n\) dimensions. Here \(\otimes \) denotes the Kronecker product (to be recalled later) and vec(X) stacks the components of the tensor X one after the other. In particular, in (1.1) pairs of terms share a common matrix: \({\textit{\textbf{H}}}\) appears in the first two terms and \({\textit{\textbf{M}}}\) in the last two (both purposely in bold face), while the matrices \(A_i\), \(i=1,2,3\), and \(H_3, M_1\) have no relation to each other. The only assumption on the coefficient matrices, in addition to the nonsingularity of \({{\mathcal {A}}}\), is that \({\textit{\textbf{M}}}, {\textit{\textbf{H}}}\) and \(M_1, H_3\) be nonsingular. The unknown solution tensor is also highlighted in bold face, to emphasize that this is the array to be determined.
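To fix the ordering of the Kronecker factors and of the right-hand side, the full matrix \({{\mathcal {A}}}\) can be assembled explicitly when n is small; the following Matlab sketch (all variable names are purely illustrative, and the diagonal shifts only serve to enforce nonsingularity) builds (1.1) in Kronecker form and computes a reference solution by a direct solve.

```matlab
% Illustrative only: assemble the (small-n) Kronecker form of (1.1).
n  = 4;
A1 = rand(n); A2 = rand(n); A3 = rand(n);
M  = rand(n) + n*eye(n);   H  = rand(n) + n*eye(n);   % nonsingular M, H
M1 = rand(n) + n*eye(n);   H3 = rand(n) + n*eye(n);   % nonsingular M1, H3
b1 = rand(n,1); b2 = rand(n,1); b3 = rand(n,1);

Acal = kron(M1, kron(A1, H)) + kron(A2, kron(M, H)) + kron(H3, kron(M, A3));
rhs  = kron(b3, kron(b2, b1));
x    = Acal \ rhs;              % feasible only for small n (Acal is n^3 x n^3)
X    = reshape(x, n, n, n);     % fold vec(X) back into the 3-mode solution tensor
```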

This tensor equation is representative of a large class of problems that can be described by means of tensors and formulated as a linear array equation. For instance, the discretization of three-dimensional partial differential equations by means of a tensor basis, as is the case for finite differences on parallelepipedal domains or certain spectral methods, can lead to tensor equations of type (1.1). Tensor equations have become a fundamental algebraic ingredient for the numerical treatment of mathematical models depending on many parameters, as is the case in uncertainty quantification and parameter-dependent model order reduction methodologies; see, e.g., [1, 4, 5, 15, 20, 23]. In typical situations, tensor equations with many terms occur, and each term may have a number of Kronecker products. In a simplified context, for instance, the following tensor equation is of interest (see, e.g., [17])

$$\begin{aligned} \sum _{i=1}^\ell \left( I \otimes \cdots \otimes {\mathop {A_i}\limits ^{i}} \otimes \cdots \otimes I \right) {\textit{\textbf{x}}} ={{\,\mathrm{\bigotimes }\,}}_{i=1}^\ell b_i \end{aligned}$$

For \(\ell = 3\) this is a special case of our problem (1.1), where all matrices except \(A_i\), \(i=1,2,3\) are equal to the identity matrix I. Finally, we explicitly remark that if the right-hand side in (1.1) were the sum of rank-one tensors, then the solution could be expressed as the sum of solutions to tensor equations with right-hand sides of rank one.

The literature on tensors, their analysis and the associated approximation methods, has grown tremendously in the past twenty years. Different tensor representations and decompositions have been analyzed. We refer the reader to [16] for an introductory and historical account, and to [11] for a literature survey up to 2013. Numerous different decompositions have allowed the development of various problem-dependent strategies, see, e.g., [7, 12, 14, 18, 24].

More recently, methods for solving linear equations in tensor format have been proposed and analyzed. In most cases, the authors have been interested in the presence of many summands and many Kronecker products, for which iterative methods appear to be mandatory. In this context, most approaches try to take into account the Kronecker structure and the possible low rank of the involved iteration matrices, see, e.g., [2, 3, 6, 13, 19,20,21]. However, little has been said on “direct” dense methods for low order tensor equations, without the explicit use of the Kronecker form. Here we close this gap for the special case of (1.1), which nonetheless appears to be a feasible algebraic formulation of a quite large set of differential problems.

2 A closed form solution

The numerical solution to (1.1) can be given in closed form by unfolding the 3-mode tensor in one of the three directions. In particular, a tensor \(X\in {\mathbb {R}}^{n_1\times n_2\times n_3}\) can be written using the mode-1 unfolding as (see, e.g., [16])

$$\begin{aligned} X_{(1)} = [X_1, X_2, \ldots , X_{n_3}], \quad X_j \in {\mathbb {R}}^{n_1\times n_2}, j=1, \ldots , n_3; \end{aligned}$$

each \(X_j\) is called a slice, and \(X_{(1)}\) is a matrix in \({\mathbb {R}}^{{n_1}\times n_2n_3}\). Some additional standard notation needs to be recalled. The Kronecker product of two matrices X, Y is defined in the standard block form as
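In Matlab, for instance, the mode-1 unfolding and the corresponding folding are plain column-major reshapes; a minimal illustration:

```matlab
X  = rand(3,4,5);                 % n1 x n2 x n3 tensor
X1 = reshape(X, size(X,1), []);   % mode-1 unfolding [X_1, X_2, ..., X_{n3}]
Xb = reshape(X1, size(X));        % fold back; isequal(X, Xb) is true
```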

$$\begin{aligned} X \otimes Y = \begin{bmatrix} X_{1,1} Y &{} \cdots &{} X_{1,n_2} Y \\ \vdots &{} \ddots &{} \vdots \\ X_{n_1,1} Y &{} \cdots &{} X_{n_1,n_2} Y \end{bmatrix}, \end{aligned}$$

where \(X_{i,j}\) denotes an element of X. Moreover, vec(X) is the operator stacking all columns of the matrix X one after the other. In the case of third order tensors, we will apply the vec operator to the mode-1 unfolding. The reverse operation, for known dimensions of the vector x, will be denoted by \(\mathrm{mat}(x,n_1,n_2)\), so that \(x=\mathrm{vec}(X)\) and \(X=\mathrm{mat}(x,n_1,n_2)\). Similarly, \(X=\mathrm{tensor}_{(1)}(x,n_1,n_2,n_3)\) will fold a long vector into a tensor via the mode-1 unfolding. A standard property of the Kronecker product that will be used repeatedly is the following

$$\begin{aligned} \mathrm{vec}(A X B) = (B^T \otimes A) \mathrm{vec}(X), \end{aligned}$$
(2.1)

(\(B^T\) denotes the real transpose of B), which allows one to go back and forth between the vector and matrix notations. Other properties used in the sequel are i) \((A\otimes B)^T=A^T\otimes B^T\), ii) \((A\otimes B)(C\otimes D) = (A C\otimes BD)\) for compatible matrix dimensions, iii) if A and B are both invertible, then \(A\otimes B\) is invertible and \((A\otimes B)^{-1} = A^{-1} \otimes B^{-1}\); see, e.g., [9, Ch.12].
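Property (2.1) can be verified numerically in a few lines, for instance:

```matlab
% Check of (2.1): vec(A*X*B) equals kron(B.', A)*vec(X).
A = rand(3,4); X = rand(4,5); B = rand(5,2);
lhs = reshape(A*X*B, [], 1);
rhs = kron(B.', A) * X(:);
norm(lhs - rhs)                   % of the order of machine precision
```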

The following result holds. Here \(Q^*\) denotes the conjugate transpose of the complex matrix Q, while \({H}^{-T} = ({H}^{-1})^T\) and T denotes transposition.

Theorem 2.1

Let \(A_3^T {\textit{\textbf{H}}}^{-T} = Q R Q^{*}\) be the Schur decomposition of \(A_3^T {\textit{\textbf{H}}}^{-T}\), and \([\gamma _1, \ldots , \gamma _n]:=b_1^T {\textit{\textbf{H}}}^{-T} Q\). Using the mode-1 unfolding, the solution \(\textit{\textbf{X}}\) to (1.1) is given by

$$\begin{aligned} {\textit{\textbf{X}}}_{(1)} = Q^{-T} [{\tilde{z}}_1, \ldots , {\tilde{z}}_n]^T \in {\mathbb {R}}^{n\times n^2}, \quad {\tilde{z}}_j = \mathrm{vec}({\tilde{Z}}_j), \end{aligned}$$

where for \(j=1, \ldots , n\), the matrix \({\tilde{Z}}_j\) solves the generalized Sylvester equation

$$\begin{aligned} {\textit{\textbf{M}}} Z ( H_3^T R_{j,j} + A_2^T) + A_1 Z M_1^T = b_2 \gamma _j b_3^T - {\textit{\textbf{M}}}\, \mathrm{mat}([{\tilde{z}}_1, \ldots , {\tilde{z}}_{j-1}] R_{1:j-1,j},n,n)\, H_3^T , \end{aligned}$$

where \(R_{j,j}\) denotes the (j,j) element of the upper triangular matrix R and \(R_{1:j-1,j}\) the first \(j-1\) components of its jth column; the last term on the right-hand side is set to zero for \(j=1\).

Proof

Using (2.1) for the unfolded tensor we have

$$\begin{aligned} {\textit{\textbf{H}}} {\textit{\textbf{X}}}_{(1)} (A_2\otimes {\textit{\textbf{M}}} +M_1\otimes A_1)^T + A_3 {\textit{\textbf{X}}}_{(1)} (H_3\otimes {\textit{\textbf{M}}})^T= & {} b_1 (b_3\otimes b_2)^T \\ {\textit{\textbf{X}}}_{(1)} (A_2\otimes {\textit{\textbf{M}}}+M_1\otimes A_1)^T(H_3 \otimes {\textit{\textbf{M}}})^{-T} + {\textit{\textbf{H}}}^{-1} A_3 {\textit{\textbf{X}}}_{(1)}= & {} {\textit{\textbf{H}}}^{-1} b_1 (b_3\otimes b_2)^T (H_3 \otimes {\textit{\textbf{M}}})^{-T}\\ {\textit{\textbf{X}}}_{(1)} (A_2^T H_3^{-T} \otimes I + M_1^T H_3^{-T} \otimes A_1^T {\textit{\textbf{M}}}^{-T}) + {\textit{\textbf{H}}}^{-1} A_3 {\textit{\textbf{X}}}_{(1)}= & {} {\textit{\textbf{H}}}^{-1} b_1 (b_3^T H_3^{-T} \otimes b_2^T {\textit{\textbf{M}}}^{-T}) \end{aligned}$$

For later readability, let us transpose both sides and set \(Y=({\textit{\textbf{X}}}_{(1)})^T\). Then we obtain

$$\begin{aligned} (H_3^{-1} A_2 \otimes I + H_3^{-1} M_1\otimes {\textit{\textbf{M}}}^{-1}A_1) Y + Y ({\textit{\textbf{H}}}^{-1} A_3)^T = (H_3^{-1} b_3 \otimes {\textit{\textbf{M}}}^{-1} b_2) b_1^T {\textit{\textbf{H}}}^{-T} . \end{aligned}$$

Using \(({\textit{\textbf{H}}}^{-1} A_3)^T = Q R Q^{*}\) and multiplying the equation by Q from the right, we can write

$$\begin{aligned} (H_3^{-1} A_2 \otimes I + H_3^{-1} M_1\otimes {\textit{\textbf{M}}}^{-1}A_1) Y Q + YQ R = (H_3^{-1} b_3 \otimes {\textit{\textbf{M}}}^{-1} b_2) b_1^T {\textit{\textbf{H}}}^{-T} Q . \end{aligned}$$

Let \(b_1^T {\textit{\textbf{H}}}^{-T} Q =: [\gamma _1, \ldots , \gamma _n]\) and \(Y Q=:[{\hat{z}}_1, \ldots , {\hat{z}}_n]\). Thanks to the upper triangular form of R, for the first column \({\hat{z}}_1\) it holds

$$\begin{aligned} (H_3^{-1} A_2 \otimes I + H_3^{-1} M_1\otimes {\textit{\textbf{M}}}^{-1}A_1) {\hat{z}}_1 + {\hat{z}}_1 R_{1,1} = (H_3^{-1} b_3 \otimes {\textit{\textbf{M}}}^{-1} b_2) \gamma _1 . \end{aligned}$$
(2.2)

For the subsequent columns \(j=2, \ldots , n\), taking into account once again the triangular form of R, we set \(w_{j-1} = [{\hat{z}}_1, \ldots , {\hat{z}}_{j-1}] R_{1:j-1,j}\) so that

$$\begin{aligned} (H_3^{-1} A_2 \otimes I + H_3^{-1} M_1\otimes {\textit{\textbf{M}}}^{-1}A_1) {\hat{z}}_j + {\hat{z}}_j R_{j,j} = (H_3^{-1} b_3 \otimes {\textit{\textbf{M}}}^{-1} b_2) \gamma _j - w_{j-1} . \end{aligned}$$
(2.3)

Each column can be obtained in sequence by further undoing the Kronecker product as follows. Let us reshape each \({\hat{z}}_j\) so that \({\hat{z}}_j = \mathrm{vec}({\hat{Z}}_j)\). By using (2.1) in (2.2) for \(j=1\), we can write

$$\begin{aligned} {\textit{\textbf{M}}}^{-1}A_1 {\hat{Z}}_1 (H_3^{-1} M_1)^T + {\hat{Z}}_1(R_{1,1} I + (H_3^{-1} A_2)^T) = {\textit{\textbf{M}}}^{-1} b_2 \gamma _1 (H_3^{-1} b_3)^T, \end{aligned}$$

which can be written as

$$\begin{aligned} {\textit{\textbf{M}}}^{-1}A_1 {\hat{Z}}_1 + {\hat{Z}}_1(R_{1,1} H_3^T M_1^{-T} + A_2^T M_1^{-T} ) = {\textit{\textbf{M}}}^{-1} b_2 \gamma _1 b_3^T M_1^{-T}. \end{aligned}$$
(2.4)

Analogously, for \(j=2, \ldots , n\) and letting \(W_{j-1} = \mathrm{mat}([{\hat{z}}_1, \ldots , {\hat{z}}_{j-1}] R_{1:j-1,j})\), from (2.3) we first obtain

$$\begin{aligned} {\textit{\textbf{M}}}^{-1}A_1 {\hat{Z}}_j (H_3^{-1} M_1)^T + {\hat{Z}}_j(R_{j,j} I + (H_3^{-1} A_2)^T) = {\textit{\textbf{M}}}^{-1} b_2 \gamma _j (H_3^{-1} b_3)^T - W_{j-1}, \end{aligned}$$

or equivalently, for \(j=2, \ldots , n\), as

$$\begin{aligned} {\textit{\textbf{M}}}^{-1}A_1 {\hat{Z}}_j + {\hat{Z}}_j(R_{j,j} H_3^T M_1^{-T} + A_2^T M_1^{-T} ) = {\textit{\textbf{M}}}^{-1} b_2 \gamma _j b_3^T M_1^{-T} - W_{j-1} H_3^T M_1^{-T} . \end{aligned}$$
(2.5)

Multiplying both sides by \(M_1^T\) from the right and by \({\textit{\textbf{M}}}\) from the left, the result follows, with \({\tilde{Z}}_j = {\hat{Z}}_j\). \(\square \)

We notice that the use of the mode-1 unfolding is related to the specific location of the repeated matrices \({\textit{\textbf{H}}}\), \({\textit{\textbf{M}}}\). Different unfoldings could be used if these matrices occupy different positions. We also remark that the same procedure could be applied if data were complex, without any particular change in the proof, or in the algorithm below, except that one should keep in mind that property (2.1) uses real transposition, even if data are complex.

3 The new algorithm

The proof of Theorem 2.1 is constructive, as it provides an explicit way to generate the tensor solution, one slice at a time. The complete procedure is described in the algorithm below, in the following called the Three-Term-Tensor Sylvester (\(\mathrm{T}^3\)-Sylv) method.

[Algorithm: \(\mathrm{T}^3\)-Sylv (pseudocode figure not reproduced)]
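Since the pseudocode figure is not reproduced, we include a minimal Matlab sketch along the lines of Theorem 2.1; the function name t3_sylv and all identifiers are illustrative, and each slice equation is reduced to a standard Sylvester equation by scaling with \({\textit{\textbf{M}}}^{-1}\) from the left and \(M_1^{-T}\) from the right.

```matlab
function X = t3_sylv(A1, A2, A3, M, H, M1, H3, b1, b2, b3)
% Sketch of T^3-Sylv (Theorem 2.1); identifiers are illustrative.
% Solves (M1 x A1 x H + A2 x M x H + H3 x M x A3) vec(X) = b3 x b2 x b1.
n = size(A1, 1);
[Q, R] = schur((H \ A3)', 'complex');   % A3^T H^{-T} = Q R Q^*
gamma  = b1' * (H' \ Q);                % [gamma_1, ..., gamma_n]
A  = M \ A1;                            % left coefficient, common to all slices
F0 = (M \ b2) * b3';                    % rank-one part of the right-hand sides
Z  = complex(zeros(n^2, n));            % columns will hold vec(Zhat_j)
for j = 1:n
    W  = reshape(Z(:, 1:j-1) * R(1:j-1, j), n, n); % coupling with previous slices
    Bj = (A2' + R(j,j) * H3') / M1';               % (A2^T + R_jj H3^T) M1^{-T}
    Fj = (gamma(j) * F0 - W * H3') / M1';
    Zj = sylvester(A, Bj, Fj);          % standard Sylvester solve for slice j
    Z(:, j) = Zj(:);
end
X1 = real((Z * Q').');                  % mode-1 unfolding; drop round-off imaginary parts
X  = reshape(X1, n, n, n);              % fold into the solution tensor
end
```

For small n, the output can be checked against the Kronecker form of (1.1) assembled as in the Introduction.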

In practice, using appropriate transformations, the method is a nested Sylvester solver, which treats one slice at a time, and updates the corresponding coefficient matrix and right-hand side F. The solvability of the Sylvester equations is related to that of the original problem, and in particular to the nonsingularity of \({\mathcal {A}}\).

The algorithm relies on the initial Schur decomposition, which provides robust unitary transformations. Moreover, for each slice, a matrix Sylvester equation needs to be solved, whose solution also involves the Schur decompositions of the coefficient matrices, see, e.g., [26]; its sensitivity has been analyzed in [8, sec.4.1]. The stability of the overall algorithm is also affected by the presence of several inverses, which can be harmful in the case of ill-conditioned matrices. Indeed, if some of the involved matrices are severely ill-conditioned, the solution may lose accuracy. This was observed in our experiments, some of which are reported in Example 4.3.

3.1 Numerical experiments

In this section we report on some numerical experiments with the \(\mathrm{T}^3\)-Sylv method. All experiments were performed using Matlab [22].

Example 3.1

To test the efficiency of the method, we consider dense matrices with random entries (taken from a uniform distribution in the interval (0,1), Matlab function rand) of increasing size n. The same distribution is used for the vectors \(b_1, b_2, b_3\). We stress that the Kronecker form of the problem would involve a dense matrix \({\mathcal {A}}\) of size \(n^3\times n^3\), which could not even be stored.
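A driver in the spirit of this experiment may look as follows (it relies on the illustrative t3_sylv sketch of the previous section).

```matlab
n  = 256;                                 % so that n^3 = 16,777,216
A1 = rand(n); A2 = rand(n); A3 = rand(n);
M  = rand(n); H  = rand(n); M1 = rand(n); H3 = rand(n);
b1 = rand(n,1); b2 = rand(n,1); b3 = rand(n,1);
tic, X = t3_sylv(A1, A2, A3, M, H, M1, H3, b1, b2, b3); toc
```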

We readily observe that the method is able to solve a (random) structured dense problem of size \(n^3=16,777,216\) in about 34 seconds on a standard laptop. The CPU times in Table 1 show that the computational cost of the method grows by a factor between six and ten as the dimension n doubles. However, going from n to 2n, the problem dimension in the full space would grow from \(n^3\) to \(2^3 n^3\). Hence, the actual cost appears to grow linearly with \(n^3\). Since the data are dense, Gaussian elimination on \({\mathcal {A}}\) would instead require \({{\mathcal {O}}}( (n^3)^3)\) floating point operations.

Table 1 CPU times (in secs) of \(\mathrm{T}^3\)-Sylv for increasing dimensions of the coefficient matrices, having uniformly distributed random entries

4 The symmetric case

If all matrices are symmetric and positive definite, the solution is also symmetric and positive semidefinite, in the appropriate tensor representation; see, e.g., [10] for a general discussion. With these hypotheses, the derivation of the solution procedure simplifies accordingly, as shown in the following result. We stress that many problems can be brought to this setting. Consider for instance the differential equation \(-\Delta u = f\) on the unit cube with zero Dirichlet boundary conditions. By discretizing using linear finite elements in each direction (this may be seen as a linear finite element discretization using \({\textit{\textbf{Q}}}_1\) brick elements), we obtain

$$\begin{aligned} ( M\otimes A \otimes M + A\otimes M \otimes M + M\otimes M \otimes A) \mathrm{vec}({\textit{\textbf{X}}}) = F, \end{aligned}$$

with \(M=\mathrm{tridiag}(-1, \underline{4}, -1)\in {\mathbb {R}}^{n\times n}\) and \(A=\mathrm{tridiag}(-1, \underline{2}, -1)\in {\mathbb {R}}^{n\times n}\) (the underlined number corresponds to the diagonal entry in the two symmetric and tridiagonal matrices). If f can be well approximated by a separable function in spatial dimensions, then F will have the desired Kronecker form, see, e.g., [10, 17] for a similar description.
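A sketch of the corresponding one-dimensional factors, with the matrices as given above and mesh-width scalings omitted, is the following (f1, f2, f3 denote hypothetical separable samples of f at the interior nodes).

```matlab
n = 32; e = ones(n, 1);
A = spdiags([-e 2*e -e], -1:1, n, n);    % tridiag(-1, 2, -1)
M = spdiags([-e 4*e -e], -1:1, n, n);    % tridiag(-1, 4, -1)
% F = kron(f3, kron(f2, f1));            % Kronecker-form right-hand side
```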

Analogously, one could consider the equation \({{\mathcal {L}}}(u) = f(x,y,z)\), \((x,y,z)\in \Omega \subset {\mathbb {R}}^3\) and

$$\begin{aligned} {{\mathcal {L}}}(u)= -{\textit{\textbf{m}}}(z) h_3(y) \phi _1(x) u_{xx} - m_1(x) {\textit{\textbf{h}}}(z)\phi _2(y) u_{yy} - {\textit{\textbf{h}}}(x) {\textit{\textbf{m}}}(y) \phi _3(z) u_{zz}, \end{aligned}$$

while \(f(x,y,z)=f_1(x)f_2(y)f_3(z)\). Finite difference discretization leads to the Kronecker form in (1.1), where, \({\textit{\textbf{M}}}\) is a diagonal matrix containing the coefficients in \({\textit{\textbf{m}}}(z_k)\) at the interior nodes \(z_k\) in the z-direction; similarly for the other coefficients. The matrices \(A_i\), \(i=1,2,3\) contain the three-point stencil of the discretized second order derivative in each direction, respectively, together with the coefficient \(\phi _i\); see, e.g., [25].

We do not report numerical results with data stemming from these discretizations, as they would not significantly differ from those shown in our examples, for dense data.

We then describe the specialized result for symmetric and positive definite matrices. This leads to better stability properties of the algorithm (see Example 4.3).

Proposition 4.1

Assume all coefficient matrices in (1.1) are symmetric and positive definite, and assume that \(b\equiv b_1=b_2=b_3\). Let \({\textit{\textbf{H}}}={\textit{\textbf{L}}}_{\textit{\textbf{H}}}{\textit{\textbf{L}}}_{\textit{\textbf{H}}}^T\), \({\textit{\textbf{M}}}={\textit{\textbf{L}}}_{\textit{\textbf{M}}}{\textit{\textbf{L}}}_{\textit{\textbf{M}}}^T\), \(H_3=L_3 L_3^T\) and \(L_3^{-1} M_1 L_3^{-T}=L_1 L_1^T\) be the Cholesky factorizations of the corresponding matrices. Let \({\widehat{A}}_3 = Q \Lambda Q^T\) be the eigenvalue decomposition of \({\widehat{A}}_3 = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-1} A_3 {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-T}\) and \([\gamma _1, \ldots , \gamma _n]:={b}^T {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-T} Q\). Using the mode-1 unfolding, the solution \(\textit{\textbf{X}}\) to (1.1) is given by

$$\begin{aligned} {\textit{\textbf{X}}}_{(1)} = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-T} Q [ \mathrm{vec}({\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}{\widehat{Z}}_1 L_3^{-1}), \ldots , \mathrm{vec}({\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}{\widehat{Z}}_n L_3^{-1})]^T, \end{aligned}$$

where for \(j=1, \ldots , n\), \({\widehat{Z}}_j={\widetilde{Z}}_j L_1^{-1}\) and the matrix \({\widetilde{Z}}_j\) solves the generalized Sylvester equation

$$\begin{aligned} {\widetilde{Z}}_j L_1^{-1} (\lambda _jI + {\widehat{A}}_2)L_1^{-T} + {\widehat{A}}_1{\widetilde{Z}}_j = {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b \gamma _j (L_3^{-1} b)^TL_1^{-T}, \quad j=1, \ldots , n, \end{aligned}$$

with \({\widehat{A}}_2 = L_3^{-1} A_2 L_3^{-T}\) and \({\widehat{A}}_1 = {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} A_1 {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}\).

Proof

Consider the following Cholesky factorizations of the given matrices

$$\begin{aligned} {\textit{\textbf{H}}} = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}{\textit{\textbf{L}}}_{\textit{\textbf{H}}}^T, \quad H_3\otimes {\textit{\textbf{M}}} = (L_3 \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}) (L_3^T \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^T) . \end{aligned}$$

Using (2.1) for the unfolded tensor we have

$$\begin{aligned}&{\textit{\textbf{H}}}{\textit{\textbf{X}}}_{(1)} (A_2\otimes {\textit{\textbf{M}}}+M_1\otimes A_1) + A_3 {\textit{\textbf{X}}}_{(1)} (H_3\otimes {\textit{\textbf{M}}}) = b (b\otimes b)^T \\&{\textit{\textbf{L}}}_{\textit{\textbf{H}}}^T {\textit{\textbf{X}}}_{(1)} (A_2\otimes {\textit{\textbf{M}}}+M_1\otimes A_1) (L_3^{-T} \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}) + {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-1} A_3 {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-T} {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^T {\textit{\textbf{X}}}_{(1)} (L_3 \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}) \\&\quad = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-1} b (b\otimes b)^T (L_3^{-T} \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T})\\&\widehat{\textit{\textbf{X}}}_{(1)} ({\widehat{A}}_2\otimes I +{\widehat{M}}_1\otimes {\widehat{A}}_1) + {\widehat{A}}_3 \widehat{\textit{\textbf{X}}}_{(1)} = {\widehat{b}} (b^T L_3^{-T} \otimes b^T {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}) \end{aligned}$$

where we have defined \({\widehat{b}} = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-1} b\), \(\widehat{\textit{\textbf{X}}}_{(1)} = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^T {\textit{\textbf{X}}}_{(1)} (L_3 \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}})\), \({\widehat{A}}_3 = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-1} A_3 {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-T}\), \({\widehat{A}}_2=L_3^{-1} A_2 L_3^{-T}\), \({\widehat{M}}_1=L_3^{-1} M_1 L_3^{-T}\) and \({\widehat{A}}_1={\textit{\textbf{L}}}_{{\textit{\textbf{M}}}}^{-1} A_1 {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}\). Note that all these “hat” coefficient matrices are still symmetric and positive definite. For later readability, let us transpose both sides and set \(Y=(\widehat{\textit{\textbf{X}}}_{(1)})^T\). Then we obtain

$$\begin{aligned} ({\widehat{A}}_2\otimes I +{\widehat{M}}_1\otimes {\widehat{A}}_1) Y + Y {\widehat{A}}_3 = (L_3^{-1} b \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b) {\widehat{b}}^T . \end{aligned}$$

Using \({\widehat{A}}_3 = Q \Lambda Q^T\) and multiplying the equation by Q from the right, we can write

$$\begin{aligned} ({\widehat{A}}_2\otimes I +{\widehat{M}}_1\otimes {\widehat{A}}_1) Y Q + YQ \Lambda = (L_3^{-1} b \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b) {\widehat{b}}^T Q. \end{aligned}$$

Let \({\widehat{b}}^T Q =: [\gamma _1, \ldots , \gamma _n]\) and \(Y Q=:[{\hat{z}}_1, \ldots , {\hat{z}}_n]\). Thanks to the diagonal form of \(\Lambda \), namely \(\Lambda =\mathrm{diag}(\lambda _1, \ldots , \lambda _n)\), for each column \({\hat{z}}_j\) it holds

$$\begin{aligned} ({\widehat{A}}_2\otimes I +{\widehat{M}}_1\otimes {\widehat{A}}_1) {\hat{z}}_j + {\hat{z}}_j \lambda _j = (L_3^{-1} b \otimes {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b) \gamma _j . \end{aligned}$$
(4.1)

Each column can then be obtained in sequence by further undoing the Kronecker product as follows. Let us reshape each \({\hat{z}}_j\) so that \({\hat{z}}_j = \mathrm{vec}({\widehat{Z}}_j)\). By using (2.1) in (4.1), we can write

$$\begin{aligned} {\widehat{Z}}_j (\lambda _jI + {\widehat{A}}_2) + {\widehat{A}}_1{\widehat{Z}}_j {\widehat{M}}_1 = {\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b \gamma _j (L_3^{-1} b)^T, \end{aligned}$$

which, upon factorization of \({\widehat{M}}_1\) as \({\widehat{M}}_1=L_1 L_1^T\) can be written as

$$\begin{aligned} {\widetilde{Z}}_j L_1^{-1} (\lambda _jI + {\widehat{A}}_2)L_1^{-T} + {\widehat{A}}_1{\widetilde{Z}}_j ={\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b \gamma _j (L_3^{-1} b)^TL_1^{-T}, \quad j=1, \ldots , n, \end{aligned}$$
(4.2)

where \({\widetilde{Z}}_j = {\widehat{Z}}_j L_1\). The matrix equation (4.2) is a standard Sylvester equation, which can be solved for each j. Once \({\widehat{Z}}_j\) is recovered, we obtain

$$\begin{aligned} {\textit{\textbf{X}}}_{(1)} = {\textit{\textbf{L}}}_{\textit{\textbf{H}}}^{-T} Q [ \mathrm{vec}({\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}{\widehat{Z}}_1 L_3^{-1}), \ldots , \mathrm{vec}({\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-T}{\widehat{Z}}_n L_3^{-1})]^T. \end{aligned}$$

\(\square \)

We will call the corresponding algorithm t\(^3\)-sym-sylv. Its Matlab implementation is reported in the Appendix.
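Since the Appendix is not reproduced here, we sketch below a minimal Matlab implementation written directly from the proof of Proposition 4.1; the function name t3_sym_sylv and all identifiers are illustrative and need not coincide with the Appendix code.

```matlab
function X = t3_sym_sylv(A1, A2, A3, M, H, M1, H3, b)
% Sketch of the symmetric solver (Proposition 4.1); identifiers are illustrative.
n  = size(A1, 1);
LH = chol(H, 'lower');  LM = chol(M, 'lower');  L3 = chol(H3, 'lower');
L1 = chol(L3 \ M1 / L3', 'lower');        % \hat M_1 = L1*L1'
A3h = LH \ A3 / LH';  A3h = (A3h + A3h')/2;
A2h = L3 \ A2 / L3';
A1h = LM \ A1 / LM';
[Q, Lam] = eig(A3h);  lam = diag(Lam);    % \hat A_3 = Q*Lam*Q'
gamma = (LH \ b)' * Q;                    % [gamma_1, ..., gamma_n]
c1 = LM \ b;  c2 = L3 \ b;
Zc = zeros(n^2, n);
for j = 1:n
    Bj = (L1 \ (lam(j)*eye(n) + A2h)) / L1';   % L1^{-1}(lam_j I + A2h) L1^{-T}
    Fj = gamma(j) * c1 * (c2' / L1');
    Zt = sylvester(A1h, Bj, Fj);               % equation (4.2)
    Zh = Zt / L1;                              % \hat Z_j = \tilde Z_j L1^{-1}
    Zc(:, j) = reshape(LM' \ Zh / L3, [], 1);  % vec(LM^{-T} \hat Z_j L3^{-1})
end
X1 = (LH' \ Q) * Zc';                     % mode-1 unfolding X_(1)
X  = reshape(X1, n, n, n);
end
```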

We notice that the proof does not require all matrices to be positive definite, and \(A_3\) only needs to be symmetric. In fact, the matrices \(A_2\) and \(A_1\) do not even need to be symmetric, although the current proof omits the corresponding transpositions. The proof could be easily adapted to treat this setting. Indeed, for \(A_1, A_2\) nonsymmetric the Sylvester equation (4.2) could be written as

$$\begin{aligned} {\widetilde{Z}}_j L_1^{-1} (\lambda _jI + {\widehat{A}}_2^T)L_1^{-T} + {\widehat{A}}_1{\widetilde{Z}}_j ={\textit{\textbf{L}}}_{\textit{\textbf{M}}}^{-1} b \gamma _j (L_3^{-1} b)^TL_1^{-T}, \quad j=1, \ldots , n. \end{aligned}$$

The rest of the derivation would follow as before.

Remark 4.2

We stated Proposition 4.1 for the right-hand side \(b\otimes b\otimes b\) to maintain symmetry in the overall problem. Nonetheless, the “symmetric” simplifications still hold also in the more general situation where the right-hand side is \(b_1\otimes b_2\otimes b_3\). In this case, the proof goes through in the same way, with obvious modifications to the right-hand side terms, following the steps in the proof of Theorem 2.1.

Example 4.3

We investigate the accuracy of the symmetric procedure, compared with that of the general algorithm t\(^3\)-sylv, as the condition number of the given matrices increases. We consider \(5\times 5\) symmetric and positive definite matrices with random entries taken from a uniform distribution in the interval (0,1). To this end, we define each coefficient matrix by using the Matlab function sprandsym with density 1 (giving a full matrix), type 1 (positive definite), and the same reciprocal condition number \(\kappa ^{-1}=0.2\cdot 10^{-k}\), with \(k=0, 0.2, 0.4, \ldots , 10\). The results are reported in Fig. 1, where the relative errors

$$\begin{aligned} \frac{\Vert x^* - x_{nonsym}\Vert }{\Vert x^*\Vert }, \quad \frac{\Vert x^* - x_{sym}\Vert }{\Vert x^*\Vert } \end{aligned}$$

are displayed, with \(x_{nonsym}\) obtained by algorithm t\(^3\)-sylv and \(x_{sym}\) by t\(^3\)-sym-sylv; here \(x^*\) is a reference solution, obtained by using the Matlab direct solver “\(\backslash \)” on the Kronecker form of the problem, with coefficient matrix of size \(125\times 125\). The figure also displays the quantities \(10^{-15} \kappa ^{3/2}\) and \(10^{-15} \kappa ^{5/2}\), which appear to match the dependence of the error on the conditioning of the matrices for the two methods, respectively. Clearly, the possibility of using the symmetric solver significantly improves the accuracy of the obtained solution with respect to the problem condition number. The sensitivity of the method deserves a deeper analysis that will be performed in future work.
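For one conditioning level, the coefficient matrices of this experiment can be generated, for instance, as in the following illustrative setup (which relies on the t3_sym_sylv sketch above).

```matlab
n = 5; k = 4; kappa_inv = 0.2 * 10^(-k);          % one value of the scanned range
spd = @() full(sprandsym(n, 1, kappa_inv, 1));    % dense SPD, prescribed 1/kappa
A1 = spd(); A2 = spd(); A3 = spd();
M  = spd(); H  = spd(); M1 = spd(); H3 = spd();
b  = rand(n, 1);
X  = t3_sym_sylv(A1, A2, A3, M, H, M1, H3, b);
```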

Fig. 1: Dependence of the accuracy of the symmetric and nonsymmetric solvers on the condition number of the matrices

5 Conclusions

We have proposed a new method for solving order-3 tensor linear equations. We derived a general approach relying on the Schur decomposition, and a specialized one that effectively exploits the symmetric positive definiteness of the coefficient matrices.

Although the considered class of tensor equations is restricted by the role and position of the two matrices \(\textit{\textbf{H}}\) and \(\textit{\textbf{M}}\), our presentation shows that this setting is sufficiently general to represent a good variety of practical problems. On the other hand, the repeated presence of \({\textit{\textbf{M}}}\) and \({\textit{\textbf{H}}}\) forced us to use the same dimensions in all modes. We will try to relax this constraint in future work. We also remark that the algebraic problem could be formulated with these two repeated matrices located in other (different) positions in the tensor equation, in a way that a similar solution derivation could be devised.

Since the right-hand side has low numerical tensor rank, we expect \(\textit{\textbf{X}}\) to have low tensor rank [10, 20, Th.3.6]. Tensor-based truncation strategies could be employed to satisfactorily approximate the obtained solution. This would avoid storing the whole dense tensor, if dimensions become large.

The proposed strategy allows us to solve with an essentially direct method problems of structured large dimensions. Nonetheless, if n is required to be significantly larger and the coefficient matrices are sparse, then an iterative variant of the proposed method could be considered. As an alternative, the new method can serve as workhorse for solving the reduced equation in projection type procedures for large and sparse third order tensor equations; see similar strategies in [26] for linear matrix equations.