Iterative optimal solutions of linear matrix equations for hyperspectral and multispectral image fusing

For a linear matrix function f in X ∈ R^{m×n} we consider inhomogeneous linear matrix equations f(X) = E for E ≠ 0 that may or may not have solutions. For such systems we compute optimal norm constrained solutions iteratively using the Conjugate Gradient and Lanczos methods in combination with the Moré-Sorensen optimizer. We build codes for ten linear matrix equations, of Sylvester, Lyapunov, Stein and structured types and their T-versions, that differ only in ten equation specific code lines (two lines, each repeated five times). Numerical experiments with linear matrix equations are performed that illustrate the universality and efficiency of our method for small dense data matrices, as well as for sparse and certain structured input matrices. Specifically we show how to adapt our universal method to sparse inputs and to structured data such as encountered when fusing image data sets via a Sylvester equation algorithm to obtain an image of higher resolution.


I. INTRODUCTION
Linear systems have a long history and nearly infinitely many uses and applications. The most basic linear vector system is Ax = b for an m by n matrix A and vectors x ∈ R^n and b ∈ R^m, where A and b are given and x is unknown. Clearly Ax = b is solvable precisely when b lies in the column space of A. Otherwise the given system is unsolvable. Yet even then a best 'near solution' may be useful for applications. For unsolvable linear vector equations Ax = b, one might need to find a vector x that minimizes the residual ‖Ax − b‖ over all x ∈ R^n, measured in an appropriate norm. When the Euclidean norm is used, this is called the 'least squares problem' for linear vector equations.
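A quick NumPy illustration (our own sketch, not taken from the paper's MATLAB codes): for a generically unsolvable tall system Ax = b, the least squares solution makes the residual orthogonal to the column space of A, i.e., it satisfies the normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))          # tall: more equations than unknowns
b = rng.standard_normal(6)               # generically not in the column space of A

# Least-squares solution: minimizes ||Ax - b|| in the Euclidean norm.
x, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)

# The residual Ax - b is orthogonal to the column space of A.
r = A @ x - b
print(np.allclose(A.T @ r, 0.0))         # normal equations: A^T (A x - b) = 0
```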
Generalizing to matrix equations, we call an equation f(X) = E linear if f is linear in the unknown matrix X. In this sense, the classical Sylvester equations AX + XB = E with f(X) = AX + XB or A^T XA + B^T XB = E with g(X) = A^T XA + B^T XB are linear matrix equations in the unknown matrix X, and so is the commutator equation AX − XB = O for h(X) = AX − XB. More specifically, the eigenvector equation Ax − xλ = 0 for a known eigenvalue λ of A is linear, and so forth. Both the continuous Lyapunov equation AX + XA^T = E and its discrete analogue AXA^T − X = E have this same linear form f(X) = E for different linear matrix functions f. In this paper we deal with generalized versions of inhomogeneous Sylvester, Lyapunov and Stein equations [43], [37], as well as with their transposed or T-versions, of the following general form:

    f(X) = Σ_{k=1}^{k_0} A_k X B_k + Σ_{j=1}^{j_0} C_j X^T D_j = E ,        (I.1)

where A_k ∈ R^{p×m}, B_k ∈ R^{n×q}, C_j ∈ R^{p×n}, D_j ∈ R^{m×q}, E ∈ R^{p×q} are given for k = 1, ..., k_0, j = 1, ..., j_0 and X ∈ R^{m×n} is unknown. Here the row and column sizes m, n, p, q are such that the intended matrix multiplications in (I.1) can be performed, and either sum can be void. We call a matrix equation of T-type if the second sum (involving X^T) is not void, i.e., if j_0 > 0.
Fusing hyperspectral (HS) and multispectral (MS) images, also known as multiband image fusion, has recently drawn special attention in remote sensing [26], [29], [36]. Its purpose is to reconstruct a high-spatial and high-spectral multiband image from two degraded and complementary observed images. Based on [41] and [42], this challenging task can be solved by using a Sylvester equation for large sparse matrices with a certain structure, which is just a special case of (I.1). Here we develop a universal method to solve a multitude of linear matrix equations and then adapt it to solve the sparse structured Sylvester equation for multiband image fusion problems efficiently.
Here we consider the generalized Sylvester matrix equation

    f(X) = Σ_{k=1}^{k_0} A_k X B_k + Σ_{j=1}^{j_0} C_j X^T D_j = E        (I.1)

as induced from multiband image fusion. This equation is well studied and has many other applications, see [6], [9], [20], [30], [34], [41], [50], [27] for example. In particular, we consider its classical Tikhonov regularization: Find

    arg min_X ‖f(X) − E‖_F^2 + ζ ‖X‖_F^2 ,        (I.2)

where ζ > 0 is the regularization parameter. Problem (I.2) is equivalent to the following Frobenius norm 'least squares problem' with norm inequality constraint:

    min ‖f(X) − E‖_F   subject to   ‖X‖_F ≤ ∆        (I.3)

for some constant ∆ > 0. The proof of the equivalence of formulation (I.3) and Problem (I.2) is given in Section II. Finally note that since {X : ‖X‖_F ≤ ∆} is compact, Problem (I.3) has at least one global minimum by Weierstrass' Theorem.
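At toy size, the Tikhonov problem (I.2) can be solved directly via the vec/Kronecker identity vec(AXB) = (B^T ⊗ A) vec(X). The following sketch (our own Python illustration with made-up data and a two-term operator without T-terms; the paper's codes are in MATLAB) forms the Kronecker matrix explicitly, which is only feasible for small m and n, but it shows the regularized solution that Problem (I.2) asks for.

```python
import numpy as np

def vec(M):                       # column-wise stacking, so vec(AXB) = (B^T ⊗ A) vec(X)
    return M.flatten(order="F")

rng = np.random.default_rng(1)
m = n = 4
A, B, C, D = (rng.standard_normal((m, n)) for _ in range(4))
E = rng.standard_normal((m, n))
zeta = 1e-2                       # regularization parameter ζ

# f(X) = A X B + C X D  becomes  M @ vec(X)  with  M = B^T⊗A + D^T⊗C.
M = np.kron(B.T, A) + np.kron(D.T, C)

# Tikhonov normal equations: (M^T M + ζ I) vec(X) = M^T vec(E).
x = np.linalg.solve(M.T @ M + zeta * np.eye(m * n), M.T @ vec(E))
X = x.reshape((m, n), order="F")

residual = np.linalg.norm(A @ X @ B + C @ X @ D - E, "fro")
```

The gradient of the objective ‖Mx − vec(E)‖² + ζ‖x‖² vanishes at the computed x, which is a direct check of optimality.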
The rest of this paper is structured as follows. Section II gives some notations and preliminaries. In Section III we propose and develop a matrix product based iterative method to solve Sylvester type matrix equations that uses the generalized Lanczos trust region algorithm (GLTR) for solving Problem (I.2), see [35] and [14]. Our GLTR algorithm is based on the Steihaug-Toint algorithm [35], [38]. In Section IV we prove general convergence of the method and speed up the algorithm further. In Section V, applications from the literature and numerical tests illustrate the efficiency and accuracy of our algorithm in theoretical and real world applications, both for generalized Sylvester, Lyapunov and Stein equations and their T-versions, for dense, sparse and structured sparse matrices alike.
Previously, semi-direct canonical form methods have been used for dense matrix equation problems, while Krylov projection methods are generally preferred for sparse linear matrix equations. The first class of methods is based on normal form computations of associated matrices and uses Francis' QR algorithm or SVD computations to form triangular equivalent systems that are then solved for the entries of the unknown solution X. See Bartels and Stewart [4] for Sylvester equations and Kitagawa [23] or Barraud [3] for Lyapunov equations. With the advent of multishift Francis QR by Braman, Byers and Mathias [7], [8], normal form based methods can theoretically be applied to matrix dimensions up to 10,000 by 10,000 and succeed. The Krylov projection approach was developed more recently; see the survey article by Simoncini [34], her earlier paper [33] on solving sparse Lyapunov matrix equations, and Dopico [11] for sparse structured T-Sylvester equations.
Currently, linear matrix equation problems have become huge and structured. Sparse direct and eigenvalue based methods are generally not able to handle such inputs efficiently. Our iterative method relies entirely on matrix multiplications and has low overhead and low storage requirements. Our set of algorithms and their computational codes are an extension and outgrowth of the second named author's two previous papers [45], [44], which dealt with the 1-term Sylvester type matrix equation AXB = E. The current paper builds in part on these earlier works, refines the algorithm, and extends it to solve three new classes of linear matrix equations. Moreover, we deal with sparse and structured input matrices as well. In each of our eleven versions for Sylvester-like linear matrix equation problems, only ten lines of code use maximally four matrix multiplications each, and our iteration counts stay low. This gives our iterative method a great advantage over canonical form based methods for dense matrices, while it also performs well for general sparse matrices. Moreover, our codes and method can easily be adapted to structured matrices; see Subsection V.C for a computed example from multiband image fusion. Our iterative algorithms work alike for solvable and unsolvable linear matrix equations, and they do so without any of the known spectral restrictions on the input matrices that sparse or structured Krylov methods often encounter, see [11, Numerical Tests 7.3 through 7.9] for example.

II. NOTATIONS AND PRELIMINARIES
Throughout this paper, I represents the identity matrix of appropriate dimension, and A^T and ‖A‖_F denote the transpose and the Frobenius norm of the matrix A, respectively. For A = (a_ij) ∈ R^{p×m} and B = (b_ij) ∈ R^{n×q}, A ⊗ B denotes the Kronecker product of A and B, that is, A ⊗ B = (a_ij B) ∈ R^{pn×mq}. The inner product on R^{k×l} is defined by ⟨A, B⟩ = tr(B^T A) for A, B ∈ R^{k×l}, and the induced matrix norm is the Frobenius norm.
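These two notational conventions can be spot-checked in a few lines of Python (our own illustration): the Kronecker product block structure determines its size, and the trace inner product ⟨X, X⟩ = tr(X^T X) reproduces the squared Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 5))

# Kronecker product: block (i,j) of A⊗B equals a_ij * B, so A⊗B is (2*4) x (3*5).
K = np.kron(A, B)
print(K.shape)                                # (8, 15)

# Frobenius inner product <A, B> = tr(B^T A) induces the Frobenius norm.
X = rng.standard_normal((3, 3))
inner = np.trace(X.T @ X)
print(np.isclose(inner, np.linalg.norm(X, "fro") ** 2))
```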
In the algorithms and codes that follow we will use the adjoint function f* with respect to the inner product ⟨A, B⟩ = tr(B^T A) of a given linear matrix function f of the form (I.1). By definition, the adjoint of a linear function f with respect to any inner product ⟨·, ·⟩ is the function f* for which ⟨f(x), y⟩ = ⟨x, f*(y)⟩ holds for all x and y in their respective domains. Since f in (I.1) is linear in each of its terms, it suffices to find the adjoint of a typical Sylvester summand h(X) = A_k X B_k in (I.1) and of its T-Sylvester counterpart g(X) = C_j X^T D_j individually. Using elementary properties of the matrix trace function, one can easily derive the identity

    h*(Y) = A_k^T Y B_k^T .

Likewise, for a T-Sylvester term of the form g(X) = C_j X^T D_j in (I.1) we can again use the cyclic property for two or more factors, such as tr(AB) = tr(BA) or tr(ABC) = tr(BCA), and the symmetry property tr(X^T Y) = tr(X Y^T) for two-factor matrix products. Hence

    g*(Y) = D_j Y^T C_j .

A. Proof of the equivalence of Problem (I.3) and Problem (I.2)

Proof. According to [32], to understand the equivalence of formulation (I.3) and Problem (I.2), we observe that if E ∉ R(f), with R(f) denoting the range of f, then any solution of Problem (I.3) is a minimizer. Therefore the Karush-Kuhn-Tucker conditions for a feasible solution X of Problem (I.3) with corresponding Lagrange multiplier λ ≥ 0 are

    (a) f*(f(X) − E) + λX = 0   and   (b) λ(‖X‖ − ∆) = 0 ,

where from here on we write ‖..‖ instead of ‖..‖_F, as we will always use the Frobenius norm.

III. THE ITERATIVE METHOD

We now describe our iterative method to solve Problem (III.1) in basic detail, as designed to solve Problem (I.2).
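The adjoint identities h*(Y) = A^T Y B^T and g*(Y) = D Y^T C can be verified numerically against the defining property ⟨f(X), Y⟩ = ⟨X, f*(Y)⟩. The following Python sketch (our own check with random data; the paper's codes are in MATLAB) does this for one Sylvester term plus one T-Sylvester term.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p, q = 3, 4, 5, 6
A = rng.standard_normal((p, m)); B = rng.standard_normal((n, q))
C = rng.standard_normal((p, n)); D = rng.standard_normal((m, q))

def f(X):                         # one Sylvester term plus one T-Sylvester term
    return A @ X @ B + C @ X.T @ D

def f_star(Y):                    # adjoint: A^T Y B^T + D Y^T C
    return A.T @ Y @ B.T + D @ Y.T @ C

def inner(U, V):                  # Frobenius inner product <U, V> = tr(V^T U)
    return np.trace(V.T @ U)

X = rng.standard_normal((m, n))
Y = rng.standard_normal((p, q))
print(np.isclose(inner(f(X), Y), inner(X, f_star(Y))))   # True
```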

Algorithm 3.1: Generalized Sylvester Equation; Basic Version
Input : Compatibly sized input matrices for f(..) = E and a positive real number ∆.

End While
While Switch = 1 and Done = 0 do: (Second (boundary) branch)
  2.2 : Find the optimal solution h_k of the boundary subproblem (III.2):

End While
Output : Solution matrix X* = X_{k+1} (from branch 1) or X̄_k (from branch 2), iteration counter k.

Remark III.1. If R_0 = −f*(E) = 0, then X* = 0 solves Problem (I.3) according to Theorem IV.1 of Section IV below. Therefore we only consider the case R_0 ≠ 0 in all versions of the algorithm.
The basic iteration of Algorithm 3.1 involves two branches: the first uses the Conjugate Gradient (CG) method in step 1.2 and tries to compute the solution of Problem (I.3) inside the feasible region {X : ‖X‖ < ∆}, see also [14], [35].
When Problem (I.3) cannot be solved in the interior of the feasible region via CG, we solve Problem (III.2) instead. In this case the optimal solution lies on the boundary {X : ‖X‖ = ∆} according to Theorems A.9 and IV.5, and it is obtained by the Moré-Sorensen algorithm [31].
The flow chart of this algorithm is in Figure 1.
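A stripped-down sketch of the interior (CG) branch can be written in a few lines of Python (our own illustration with made-up, well-conditioned data; it omits the trust region bookkeeping, the Switch/Done flags and the ∆-test of Algorithm 3.1): CG is run on the normal equations f*(f(X)) = f*(E), entirely via matrix products with f and f*.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
# Well-conditioned test operator (made-up data, not from the paper).
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))
B = np.eye(n) + 0.05 * rng.standard_normal((n, n))
C = 0.05 * rng.standard_normal((n, n))
D = 0.05 * rng.standard_normal((n, n))

def f(X):      return A @ X @ B + C @ X @ D           # two-term Sylvester operator
def f_star(Y): return A.T @ Y @ B.T + C.T @ Y @ D.T   # its adjoint

X_true = rng.standard_normal((n, n))
E = f(X_true)                                         # make the system solvable

# CG on the normal equations f*(f(X)) = f*(E), entirely via matrix products.
X = np.zeros((n, n))
R = f_star(E)                      # residual of the normal equations at X = 0
P = R.copy()
rho = np.sum(R * R)                # <R, R> in the Frobenius inner product
for _ in range(10 * n * n):        # m*n steps in exact arithmetic; slack for rounding
    Q = f_star(f(P))
    alpha = rho / np.sum(P * Q)
    X += alpha * P
    R -= alpha * Q
    rho_new = np.sum(R * R)
    if np.sqrt(rho_new) < 1e-12:
        break
    P = R + (rho_new / rho) * P
    rho = rho_new

print(np.linalg.norm(f(X) - E, "fro"))
```

The loop never forms the Kronecker matrix of f, which is what makes the matrix-product formulation attractive for sparse and structured inputs.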
Now we detail how to solve Problem (III.2). This will complete Algorithm 3.1. Based on Theorem A.9 of Appendix A, we solve Problem (A.9) in Algorithm 3.2 below to get one solution of Problem (III.2). Our method is based on the work of J. J. Moré and D. C. Sorensen in [31], executed here in slightly different form.
IV. CONVERGENCE OF THE METHOD

In this section, we develop solvability conditions for Problem (I.3), equivalent to our original Problem (I.2), and show that Problem (I.3) can be solved in finitely many iterations if we disregard rounding errors and if subproblem (III.2) can be solved. Then we propose a more effective algorithm than Algorithm 3.1. First let us recall Problem (III.1):

    min ‖f(X) − E‖   subject to   ‖X‖ ≤ ∆ .

Theorem IV.1. (Solvability condition) The matrix X* is a solution of Problem (I.3) if and only if X* is feasible, i.e., ‖X*‖ ≤ ∆, and there is a scalar λ* ≥ 0 such that

    f*(f(X*) − E) + λ* X* = 0   and   λ*(‖X*‖ − ∆) = 0 .

Next we show that Problem (I.3) can be solved in finitely many steps in the absence of rounding errors. When the algorithm does not enter the second branch, Remark A.3 of Appendix A tells us that our algorithm has found a solution after finitely many iterations in its first branch. If the algorithm, however, switches to its second branch, we only need to show that the second branch stopping criterion will then be satisfied after finitely many steps. The actual proof of this is as follows.
Theorem IV.2. Suppose that the sequences {X_k} and {R_k} are generated by Algorithm 3.1. Then the following equation holds for all k.

Proof. We use induction. For k = 0 the conclusion holds. Assume that the conclusion holds for k.

Remark IV.3. Combining Remark A.3 and Theorem IV.2, we see that, according to Theorem IV.1, λ* = 0 and X_{k+1} solves Problem (I.3).
for some λ_k ≥ 0, which implies that X̄_k solves Problem (I.3) according to Theorem IV.1.
Remark IV.6. Based on Lemma A.4, the second stopping criterion of the algorithm will be satisfied after finitely many iterations, except for rounding errors.
Remark IV.7. (Convergence) According to Remarks A.3 and IV.6, when disregarding rounding errors, Algorithm 3.1 obtains a solution in at most m·n iterations, whether it uses its first branch exclusively or both branches in conjunction.
According to Theorem A.6, Algorithm 3.1 can be shortened as follows.

Algorithm 4.1: Generalized Sylvester Equation; Simplified Version
Input : Compatibly sized matrices A, ..., E and a positive real number ∆.

End While
While Switch = 1 and Done = 0 do: (boundary optimum search)
  2.2 : Use Algorithm 3.2 to compute the solution h_k of Problem (A.9),

End While
Output : Solution matrix X* = X_{k+1} or X̄_k, iteration counter k.
The flow chart for the simpler and faster version in Algorithm 4.1 is in Figure 2.

V. APPLICATIONS, NUMERICAL TESTS AND COMPARISONS
We have adapted our algorithm to solve nine Sylvester and T-Sylvester type inhomogeneous linear matrix equations f(X) = E ≠ 0 that fall into four different classes in Table 1. Note that we also provide a fast f and f* matrix product implementation in a model code that solves structured sparse Sylvester equations for multiband image fusion via the equation AX + XD = E in fastmult_SGLTR_3i_ADE.m. The MATLAB m-files in the above list differ in just ten entry lines, where the respective equation defining linear functions f(X) and their adjoints f*(..) have been adjusted for each of the ten different linear matrix functions f. All ten program codes have been tested and are available online at [39]. Several detailed numerical examples follow below.
A. Small Random Coefficient Matrix Case.
Our first test uses small and simple random entry matrices to show that Algorithm 4.1 finds the unique norm bounded solution X of the Sylvester equation AXB + CXD = E ≠ 0 precisely, except for rounding errors.

For all ∆ ≥ ‖X‖ the prescribed solution X is retrieved with small inaccuracies in the last 2 or 3 of MATLAB's sixteen digits, and for ∆ = 0.99·‖X‖ and 0.999·‖X‖ the all-integer entries of the solution matrix X become increasingly recognizable in the computed X*. Note further that in this example the algorithm takes between 30 and 43 iterations, which exceeds the theoretical convergence bound of m·n = 5·5 = 25.
Our second example expands on the first. Here we perturb the fixed right hand side matrix E of the Sylvester equation for the same random entry matrices A through D and now try to solve AXB + CXD = E + E_p. Here E_p is a small random entry perturbation matrix with ‖E_p‖ = ‖X‖/10. In this example the norm of the solution X*_p must differ from the solution X* of the previous unperturbed example. By construction the perturbed equation is unsolvable, and the optimal residual of the perturbed Sylvester equation can at most equal the size ‖E_p‖ of the right hand side perturbation. This is borne out in the following graph, in which the horizontal line is drawn at ‖E_p‖ = 4.3966, while the final matrix equation error of the optimal solution X*_p of the perturbed system with ‖X*_p‖ ≤ ∆ has size 2.8259, well below ‖E_p‖ once ∆ is chosen to exceed ‖X‖; see the annotations of Figure 4.
Note that the relative residual matrix equation errors do not decrease monotonically in general. But the norms ‖X_i‖ of the iterates X_i increase monotonically in practice, as they do in theory; see Lemma A.2 and Figure 5.
In this example our algorithm takes 30 to 35 iterations, again slightly exceeding the theoretical maximal iteration bound of m·n = 5·5 = 25. We have also investigated the effect of not starting from X_0 = 0. What if we start with a matrix X_0 ≠ 0 of norm ∆/10 or even larger? This extended the number of iterations by 30 to 40 % and gave no better results at all, especially when X_0 was chosen with ‖X_0‖ > ‖X_opt‖ for the given input matrices A through E.
The final example in this subsection involves three matrices A, B, and E, each of size 28 by 28, for which the classical Sylvester equation AX + XB = E cannot be solved for any ∆ according to the well known theory. In our chosen random entries example, the global optimal solution X has a Frobenius norm of approximately 2600. We vary ∆ from 29 through 5800, record whether the norm restricted optimal solution X was computed on the boundary sphere {X : ‖X‖ = ∆} or in the interior {X : ‖X‖ < ∆}, and depict the relative matrix equation error ‖AX + XB − E‖/‖X‖ in Figure 6. The blue dots in the plot indicate the ∆ values for which the optimal norm constrained solution was computed on the ∆-sphere, while the red + signs indicate the ∆ values for which the optimal solution was found inside the ∆-sphere. For ∆ < 2600 the maximal number of interior branch iterations was reached just below ∆ = 2600, with 279 interior branch iterations and maximally 39 additional boundary branch iteration steps. For ∆ > 2600 the algorithm used only the interior branch and 1,902 iterations for every such ∆.
For smaller ∆ ≤ 1100 our algorithm used maximally just 4 interior steps and maximally 18 boundary steps. This data is representative of many similarly sized examples with a relative break-off tolerance of 10^{−10}.

B. Sparse Coefficient Matrix Case.
Here we compare our method with MATLAB's built-in sylvester.m function for solving C_1 X + X C_2 = C_3 for data from [41, formula (7)], and with a simple eigenvalue-eigenvector approach suggested by Dopico [12]. MATLAB's Sylvester equation solver is based on canonical forms and eigenspace computations as well as blocking methods. It is built on the work of Bartels and Stewart [4] and further extensions of this method by Jonsson and Kågström [21], [22]. In the computed example below, C_1 is 4 by 4, C_2 is 6400 by 6400 and sparse, and C_3 as well as X are 4 by 6400 data matrices. Since C_1 is small (4 by 4), an appealing solution method [12] might be to diagonalize C_1 = V D V^{−1} and solve

    D X̃ + X̃ C_2 = C̃_3   for   X̃ = V^{−1} X and C̃_3 = V^{−1} C_3 .

We do this one row of X̃ at a time by Gaussian elimination for the linear system X̃(j,:) (d_j I + C_2) = C̃_3(j,:). Below we include time and accuracy data with C_2 (of size 6400 × 6400) in full and sparse modes for these methods, all performed on the same platform with err = 10^{−14}; see Table 2.
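The diagonalize-and-solve-rowwise idea can be sketched at toy size (our own Python illustration with made-up, well-separated data; complex arithmetic handles the generally complex eigenvalues of C_1):

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 4, 50                         # C1 small (p x p), C2 larger (n x n)
C1 = rng.standard_normal((p, p))
C2 = rng.standard_normal((n, n)) + n * np.eye(n)   # shift keeps d_j I + C2 nonsingular
C3 = rng.standard_normal((p, n))

# Diagonalize the small factor: C1 = V diag(d) V^{-1}.
d, V = np.linalg.eig(C1)
C3t = np.linalg.solve(V, C3.astype(complex))       # C̃3 = V^{-1} C3

# Row j of X̃ = V^{-1} X solves  X̃(j,:) (d_j I + C2) = C̃3(j,:).
Xt = np.empty((p, n), dtype=complex)
for j in range(p):
    Xt[j, :] = np.linalg.solve((d[j] * np.eye(n) + C2).T, C3t[j, :])

X = (V @ Xt).real                                   # solution is real for real data
print(np.linalg.norm(C1 @ X + X @ C2 - C3, "fro")) # small residual
```

Each row requires one n×n linear solve, so the approach stays cheap only while p remains very small, which matches the remark that [12] needs one of C_1 or C_2 to be tiny.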
The solutions X_sparse (with C_2 in sparse matrix mode) and X_full (with C_2 in full mode) computed via CG plus Lanczos or Gauss, Y from sylvester.m (with C_2 necessarily in full mode) and the solution from [41] differ only very slightly. On a platform in Toulouse, Qi Wei compared our CG algorithm to the algorithm developed in [41], which was designed for structured huge input data but not for unstructured sparse problems such as ours. There our method took 3.3 seconds while the method of [41] took 27 seconds. More comparisons of our method with the method of [41] on structured data inputs follow in Subsection V.C. We repeat that our CG plus Lanczos method is iterative and uses matrix multiplications throughout, which work for both sparse and full matrices in MATLAB without requiring any changes in the code, while sylvester.m, by using canonical forms and blocking techniques, can only work when all data matrices C_i are in full MATLAB mode, and the simple eigenvalue based method of [12] requires one of C_1 or C_2 to be a very small matrix.

C. Coefficient Matrix with Kronecker Structure.
This example deals with another image fusion problem, using a set of much larger image data matrices, one of which has Kronecker structure. It compares the recent work of Qi Wei, Nicolas Dobigeon and Jean-Yves Tourneret [41] with our algorithm when used for structured left hand side data matrices A through D.
Here we use image data that was acquired over Moffett Field in California in 1994 by the JPL and NASA airborne visible and infrared imaging spectrometer (AVIRIS) [16]. The original image set has high-spatial and high-spectral resolution. It contains 224 images X_i, each of size 390 × 180, for i = 1, ..., 224. The original data is stored as a third-order tensor X of dimensions 390 × 180 × 224. Its matrix version is X = [vec X_1, vec X_2, ..., vec X_224]^T ∈ R^{224×70200}, where vec X_i stacks the columns of X_i into one long column vector. Each row of X contains one image X_i. In practice, the high-spatial and high-spectral image data X is unknown.
One aim of image fusion is to approximate the unknown high-spatial high-spectral data from known high-spatial low-spectral multispectral (MS) data (or high spatial resolution panchromatic (PAN) data) combined with low-spatial high-spectral hyperspectral (HS) data. These known complementary image data sets result from linear spectral and spatial degradations of the full resolution image data X, according to the well-regarded model developed in [42], [40] and [41]:

    Y_M = LX + N_M ,   Y_H = XBS + N_H ,        (V.1)

where
• X ∈ R^{224×70200} is the full resolution target image data with 224 bands (or rows) and 70200 pixels (or columns). A composite RGB image can be formed by selecting the red (or 28th), green (or 19th) and blue (or 11th) bands of the target image data. This is shown in Figure 7 (a), as taken from [40] and [41].
• L is the a priori known spectral degradation matrix, mapping the 224 bands to the MS bands; it depends on the spectral response of the MS sensor, see [41] and [48] again.
• B ∈ R^{70200×70200} is a cyclic convolution operator acting on each band. Specifically, B has the factored form B = F D F^H, where D ∈ R^{70200×70200} is a diagonal matrix from [42] and [41], and F and F^H are the discrete Fourier transform (DFT) and the inverse DFT, respectively (F F^H = F^H F = I_70200).
• S ∈ R^{70200×2808} is a downsampling matrix (with downsampling factor 25 = 5 × 5) acting on each band (vec X_i)^T B for i = 1, ..., 224, as introduced in [41]. The downsampling matrix satisfies S^H S = I_2808, while the matrix SS^H ∈ R^{70200×70200} is idempotent, i.e., (SS^H)^2 = S(S^H S)S^H = SS^H. Based on Lemma 1 of [42], the structure of F^H SS^H F is obtained in factored form (V.2), where 1_5 ∈ R^5 is a vector of ones.
• N M and N H are the MS and HS noise matrices, respectively.The noise matrices are assumed distributed according to the following matrix normal distributions [41].
In this specific image fusion task, following [41], we ignore the noise terms N_M and N_H by setting both Λ_M and Λ_H equal to the identity matrix.
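The two stated properties of the downsampling matrix S are easy to confirm on a toy selection matrix (our own Python illustration): S^H S is the small identity, while SS^H is merely an idempotent 0/1 projector, not the identity.

```python
import numpy as np

n, r = 20, 5                         # 20 pixels, downsampling factor 5 -> 4 samples
S = np.zeros((n, n // r))
S[np.arange(0, n, r), np.arange(n // r)] = 1.0   # keep every 5th pixel

# S^H S = I (the selected columns are orthonormal) ...
print(np.allclose(S.T @ S, np.eye(n // r)))
# ... while S S^H is idempotent, not the identity: (S S^H)^2 = S (S^H S) S^H = S S^H.
P = S @ S.T
print(np.allclose(P @ P, P), np.allclose(P, np.eye(n)))   # True False
```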
The solution X = [vec X_1, vec X_2, ..., vec X_224]^T of this problem does not have full rank, because each band (row) (vec X_i)^T lies in a subspace whose dimension (set to 10 here) is much smaller than the number of bands, 224. In other words, X = HU, where H is a full column rank matrix and U is the projection of X onto the subspace spanned by the columns of H. In this image simulation, the matrix H is determined from a principal component analysis (PCA) of the HS data Y_H, as explained in [40].
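The subspace factorization X = HU can be sketched via the SVD, which is the computational core of PCA (our own toy-sized Python illustration; the paper derives H from the HS data Y_H as in [40], with sizes 224 bands and subspace dimension 10):

```python
import numpy as np

rng = np.random.default_rng(6)
bands, pixels, rank = 30, 500, 4     # toy sizes standing in for 224 bands, dim 10
H_true = rng.standard_normal((bands, rank))
U_true = rng.standard_normal((rank, pixels))
Y = H_true @ U_true                  # band-by-pixel data lying in a low-dim subspace

# PCA/SVD of the (hyperspectral-like) data recovers a basis H of the band subspace.
Uv, s, Vt = np.linalg.svd(Y, full_matrices=False)
H = Uv[:, :rank]                     # leading left singular vectors, H^T H = I
U = H.T @ Y                          # projection of the data onto span(H)

print(np.linalg.norm(H @ U - Y) / np.linalg.norm(Y))   # ~ 0, so X = H U exactly
```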
It is clear how to formulate a Sylvester matrix equation directly from the linear model (V.1), according to the discussions of [42] and [41], for reconstructing the target image in the absence of regularization. Namely, one obtains a Sylvester equation

    C_1 U + U C_2 = C_3 ,        (V.3)

whose coefficient matrices C_1, C_2 = BS(BS)^H and C_3 are built from H, L, Y_M and Y_H as in [42] and [41]. After solving this Sylvester matrix equation without regularization for U*, the desired fused image is X* = HU*. The key issue here is how to solve the Sylvester equation (V.3). Its main difficulty is the huge size of C_2 = BS(BS)^H ∈ R^{70200×70200}. The second Sylvester term of (V.3) contains C_2 as a factor. C_2 has around 5 · 10^9 entries and thus is too huge to construct, to compute with directly, or to store explicitly. And besides, C_1 is not small, as was the case in Subsection V.B.
Fortunately, C_2 = BS(BS)^H = F D(F^H SS^H F)D^H F^H has a specific structure as the product of the Kronecker structured sparse matrix F^H SS^H F, the DFT matrix F and the diagonal matrix D, according to formula (V.2). Note that our algorithm does not destroy the Kronecker structure and sparsity of F^H SS^H F, nor does it destroy diagonal matrices such as D. Thus our method can take advantage of these properties in every iteration step. We have deposited our fast implementation of the matrix product U(C_2) for this specific example in [39, fastmult_SGLTR_3i_ABE.m].
There the f and f * fast matrix multiplication subroutines BluSparse and TBluSparse are attached for this specific problem.These codes can easily be adapted to other structured Sylvester matrix equation problems.
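The principle behind such fast codes can be sketched in Python at toy size (our own illustration; BluSparse and TBluSparse are the paper's MATLAB routines and are not reproduced here): apply C_2 = F D (F^H S S^H F) D^H F^H as a chain of FFTs, diagonal scalings and a 0/1 pixel mask, without ever assembling C_2.

```python
import numpy as np

n, r = 20, 5
rng = np.random.default_rng(7)
d = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # diagonal of D
mask = np.zeros(n); mask[::r] = 1.0                        # S S^H is a 0/1 pixel mask

def apply_C2(v):
    """Apply C2 = F D (F^H S S^H F) D^H F^H without ever forming C2."""
    w = np.fft.ifft(v, norm="ortho")     # F^H v
    w = np.conj(d) * w                   # D^H
    w = np.fft.fft(w, norm="ortho")      # F
    w = mask * w                         # S S^H  (keep every r-th pixel)
    w = np.fft.ifft(w, norm="ortho")     # F^H
    w = d * w                            # D
    return np.fft.fft(w, norm="ortho")   # F

# Check against the explicitly assembled matrix (only feasible at toy size).
F = np.fft.fft(np.eye(n), norm="ortho")  # unitary DFT matrix, F F^H = I
C2 = F @ np.diag(d) @ F.conj().T @ np.diag(mask) @ F @ np.diag(np.conj(d)) @ F.conj().T
v = rng.standard_normal(n)
print(np.allclose(apply_C2(v), C2 @ v))  # True
```

Each application costs a few FFTs and elementwise products, so the cost scales like n log n instead of n², which is what makes the 70200-pixel problem tractable.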
A further advantage of our method is that it finds the regularized solution of equation (V.3) when we choose a suitable ∆. Figure 7 (d) shows the composite color image fusion result obtained by Algorithm 4.1 using the problem specific fast codes for the matrix products that involve C_2 and C_2^T, which appear in the matrix function f(..) and its adjoint f*(..). In our runs we have experimented with choosing 640 ≤ ∆ ≤ 2000 and have obtained near identical optimal results, all in nearly the same CPU run times and with iteration counts differing by at most two when ∆ was changed.
Our algorithm and the algorithm of Wei, Dobigeon and Tourneret [41] for image fusion differ very little in their data preparation parts. Their only significant variations occur when they solve the associated Sylvester equation, which takes up between 50 and 90 % of the total CPU time for this problem. The algorithm of Wei, Dobigeon and Tourneret [41] uses interpolation techniques and a subspace projection method. Its solution matrix X has norm 610.88. It is computed in 1.807 sec overall for this example, with its Sylvester solver using 0.586 sec thereof. Our CG plus Lanczos and Moré-Sorensen algorithm computes the global solution X* with Frobenius norm 630.89 in 20.9 seconds, and its Sylvester solver with our fast U·C_2 implementation takes 19.1 sec of CPU time. Its relative matrix equation error is 9.2·10^{−10} when we set ∆ = 640 and err = 10^{−8} inside our Sylvester solver. Clearly our method is much more accurate but more time consuming. Tightening err increases the accuracy without changing the computed solution X* except in a few of its trailing digits. To understand the inherent inaccuracy of Wei, Dobigeon and Tourneret's projection method [41], we have used our algorithm with ∆ set equal to 610.90 and err = 10^{−2}. Then the optimal solution X inside the norm bounded ball {X : ‖X‖ ≤ 610.90} is computed on its boundary and has Frobenius norm 618.94. This low accuracy run took only 5.146 sec and used 2.922 sec for its Sylvester solver part. It achieves a relative matrix equation error of 9.209·10^{−4}, which is a lower bound for the relative error of the X obtained in [41]. Thus the entries of the solution X from the interpolation and projection method of [41] carry only around 3 accurate digits. This may be good enough for image fusion problems, but it gives food for thought otherwise.
To further evaluate our method for image fusion, we compare our approach to seven methods, namely GSA [2], SFIM-HS [25], GLP-HS [1], CNMF [49], HySure [36], MAP-SMM [13] and Wei's FUSE [41]. For the comparison we use the image data of Headwall's Hyperspec Visible and Near-Infrared, series C (VNIR-C) imaging sensor taken over Chikusei, Ibaraki, Japan, in 2014 [47]. Specifically, we have selected a 540 × 420-pixel-size image set with 128 bands as reference for the experiment; a 2-band MS image set and an HS image set were obtained, respectively, by filtering this reference image set and by down-sampling every 5 pixels in both the vertical and horizontal directions in each band of the reference image set. We present the experimental results in Table 3 with respect to four quality measures [46]: 1) the peak SNR (PSNR), 2) the spectral angle mapper (SAM), 3) the erreur relative globale adimensionnelle de synthèse (ERGAS), and 4) the Q2^n index. The quality of the reconstructed image data is listed in terms of PSNR, SAM, ERGAS and Q2^n, with the best results in bold. These experiments show clearly that our CG method, in its [39, fastmult SGLTR 3i ABE.m] version, obtains very satisfactory results.

D. T-Sylvester Matrix Equations.
An extensive analysis of theoretical and computational aspects of various T-Sylvester type matrix equations was presented by Froilán Dopico in [10]. Iterative Krylov subspace projection methods and codes for T-Sylvester equations AX + X^T B = E are available in Dopico et al [11] for specific low rank right hand side matrices E for which explicit multi-dyadic representations E_{n,n} = C1_{n,r} · (C2_{n,r})^T with r << n are known a priori. These codes are fast; but for comparisons with our method, huge dense low rank right hand side matrices E can unfortunately not be handled by our more general iterative method, which needs sparse or at least structured matrices in its active multiplications of X by C_1 or C_2.

E. Outlook.
Our MATLAB codes for solving ten different linear matrix equations are collected at [39]. These codes allow for two optional inputs in addition to the necessary input matrices. The last optional input, err, is the desired output accuracy. For image restoration problems, for example, with relatively low accuracy sensor data when compared to MATLAB's machine constant, an output with a relative error of 10^{−7} or 10^{−10} for the solution X might suffice rather than our default error bound of 10^{−14}. Since our algorithm is iterative, a less stringent error threshold reduces the number of iterations that are needed, and this can result in a near 2-fold speed gain. The second-to-last optional input denotes the maximal norm ∆ of a solution X that we consider for the given Sylvester type equation. Our default is ∆ = 200, but for image processing problems choosing ∆ = ‖E‖ may be sufficient. If the computed solution X is such that ‖X‖ = ∆, then the algorithm has found an optimal norm bounded solution on the boundary of the admissible set {X : ‖X‖ ≤ ∆}, and there is a chance that 'better' solutions might lie beyond this ∆-sphere. Increasing ∆ to two or three times its value and repeating the computation may then find a solution with smaller relative residual matrix equation error for the given problem. Besides, there is no great efficiency penalty (at most just a few extra iterations) if ∆ is chosen not too far above the actual norm of the optimal solution. Finally, our algorithms work equally well for solvable and unsolvable Sylvester and T-Sylvester type equations: they find a norm bounded solution if the equation is solvable and the optimal norm bounded solution if it is not.
As explained and shown earlier, our iterative methods can easily be adapted and extended to solve any linear matrix equation f(X) = E quickly, accurately, and optimally with respect to norm limits on the solution X. This can be done for dense, sparse, and certain huge structured matrix systems. The method has proven its value as an accuracy checker in a Sylvester matrix equation example with a massive data matrix [41] that previously did not allow for direct accuracy checking of the computed solution.

VI. CONCLUSION
In this paper, we document a matrix iterative algorithm that uses the Conjugate Gradient and Lanczos methods in conjunction with the GLTR algorithm and the Moré-Sorensen constrained optimization method to solve inhomogeneous linear matrix equations optimally (Problem (I.2)). We prove global constrained convergence along two branches in finitely many steps. Throughout, we model the Conjugate Gradient method for solving Sylvester type equations f(X) = E ≠ 0 most efficiently through the equations' respective defining functions f and their adjoints f*. All our codes for solving f(X) = E similarly rely on the use of f and f* and can easily be modified to solve other linear matrix equations. Our method is general and generically applicable to all linear matrix equations f(X) = E ≠ 0. It cannot and does not compete with application specific methods that rely, for example, on knowledge of low rank factorizations of the right hand side matrix E, as [5], [11], or [24] do. However, our numerical tests and real world applications to image fusion and image restoration problems such as encountered in [40], [41], and [42] illustrate the efficiency, accuracy, and usefulness of our algorithms for solving Sylvester type linear matrix equations.

APPENDIX A THEORETICAL PROPERTIES OF ALGORITHM 3.1
We develop useful theoretical properties of the matrix sequences of Algorithm 3.1 for the generalized Sylvester equation f(X) = E ≠ 0. These theoretical properties describe orthogonality relations between certain computed matrix iterates that will allow us in Section IV to prove general convergence of the method and to speed up the algorithm further.
Lemma A.1. If the matrix sequences {R_i}, {P_i}, and {f(P_i)} are generated as stipulated by Algorithm 3.1, then ⟨R_i, R_j⟩ = 0 and ⟨f(P_i), f(P_j)⟩ = 0 for all i ≠ j with 0 ≤ i, j ≤ k, and ⟨P_i, R_j⟩ = 0 for all 0 ≤ i < j ≤ k.
Proof. We use induction to prove that the conclusion holds for all 0 ≤ i < j ≤ k.
Step 2. We use that ⟨P_i, R_{i+l}⟩ = 0, ⟨f(P_i), f(P_{i+l})⟩ = 0, and ⟨R_i, R_{i+l}⟩ = 0 for all 0 ≤ i ≤ k and 1 < l < k and show that ⟨P_i, R_{i+l+1}⟩ = 0, ⟨f(P_i), f(P_{i+l+1})⟩ = 0, and ⟨R_i, R_{i+l+1}⟩ = 0. From Steps 1 and 2, we learn by induction that ⟨R_i, R_j⟩ = 0, ⟨P_i, R_j⟩ = 0, and ⟨f(P_i), f(P_j)⟩ = 0 for 0 ≤ i < j ≤ k. Since ⟨A, B⟩ = ⟨B, A⟩ holds for all same size matrix pairs A, B, the relations ⟨R_i, R_j⟩ = 0 and ⟨f(P_i), f(P_j)⟩ = 0 also hold for all 0 ≤ j < i ≤ k and our proof is complete.
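The orthogonality relations of Lemma A.1 are easy to verify numerically. The sketch below is not Algorithm 3.1 itself but a plain Conjugate Gradient iteration on the normal equations of a two-term Sylvester operator f(X) = AX + XB, written, as in the paper, only in terms of f and its adjoint f*; the residuals R_i and images f(P_i) come out mutually orthogonal up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 6
# well-conditioned symmetric positive definite coefficients for f(X) = AX + XB
A0, B0 = rng.standard_normal((m, m)), rng.standard_normal((n, n))
A = A0 @ A0.T + m * np.eye(m)
B = B0 @ B0.T + n * np.eye(n)
E = rng.standard_normal((m, n))

f = lambda X: A @ X + X @ B           # the defining linear function f
fs = lambda Y: A.T @ Y + Y @ B.T      # its adjoint f*
ip = lambda U, V: np.trace(V.T @ U)   # matrix inner product <U, V> = tr(V^T U)

X = np.zeros((m, n))
R = fs(f(X) - E)                      # gradient of (1/2) ||f(X) - E||_F^2
P = -R
Rs, FPs = [R], [f(P)]
while len(Rs) < 6 and np.linalg.norm(R) > 1e-6 * np.linalg.norm(Rs[0]):
    FP = f(P)
    alpha = ip(R, R) / ip(FP, FP)     # <P, f*(f(P))> = <f(P), f(P)>
    X = X + alpha * P
    R_new = R + alpha * fs(FP)
    beta = ip(R_new, R_new) / ip(R, R)
    P = -R_new + beta * P
    R = R_new
    Rs.append(R)
    FPs.append(f(P))

norm = np.linalg.norm
# largest normalized off-diagonal inner products: both should be near zero
off_R = max(abs(ip(Rs[i], Rs[j])) / (norm(Rs[i]) * norm(Rs[j]))
            for j in range(len(Rs)) for i in range(j))
off_FP = max(abs(ip(FPs[i], FPs[j])) / (norm(FPs[i]) * norm(FPs[j]))
             for j in range(len(FPs)) for i in range(j))
```

The update constants alpha and beta are those of classical CG applied to H = f* ∘ f, which is why ⟨R_i, R_j⟩ = 0 and the H-conjugacy ⟨f(P_i), f(P_j)⟩ = 0 hold.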
Lemma A.2. The sequence of matrices X_k generated by Algorithm 3.1 satisfies ||X_k|| < ||X_{k+1}||.
Proof. We first show that the matrix sequences generated by Algorithm 3.1 satisfy ⟨X_k, R_k⟩ = 0 for k ≥ 0 and ⟨X_k, P_k⟩ > 0 for k ≥ 1.
Our algorithm computes X_{k+1} recursively in terms of X_k. Once all terms of this recursion are written out explicitly, we have X_{k+1} = X_0 + α_0 P_0 + α_1 P_1 + · · · + α_k P_k with X_0 = 0. Taking the inner product with R_j and applying Lemma A.1 gives us ⟨X_j, R_j⟩ = 0 for every j. An induction proof will establish the second assertion that ⟨X_k, P_k⟩ > 0. To do so we apply Lemma A.1 again and obtain ⟨X_1, P_1⟩ = ⟨α_0 P_0, −R_1 + β_0 P_0⟩ = α_0 β_0 ⟨P_0, P_0⟩ > 0. (A.1) We now make the inductive hypothesis that ⟨X_k, P_k⟩ > 0 and show that this implies ⟨X_{k+1}, P_{k+1}⟩ > 0. Since ⟨X_{k+1}, R_{k+1}⟩ = 0 by the first part, we have ⟨X_{k+1}, P_{k+1}⟩ = ⟨X_{k+1}, −R_{k+1} + β_k P_k⟩ = β_k ⟨X_k + α_k P_k, P_k⟩ = β_k (⟨X_k, P_k⟩ + α_k ⟨P_k, P_k⟩), and by the inductive hypothesis the last expression is positive.
Next we prove that ||X_k|| < ||X_{k+1}||, where X_{k+1} = X_k + α_k P_k and k ≥ 1. Observe that ||X_{k+1}||² = ⟨X_k + α_k P_k, X_k + α_k P_k⟩ = ||X_k||² + 2 α_k ⟨X_k, P_k⟩ + α_k² ||P_k||² > ||X_k||², since α_k > 0 and ⟨X_k, P_k⟩ > 0. This shows that ||X_k|| < ||X_{k+1}|| and our proof is complete.
Remark A.3. By Lemma A.1, the matrix sequence R_0, R_1, R_2, · · · ⊂ R^{m×n} consists of mutually orthogonal matrices. Therefore there is a positive integer k + 1 ≤ m · n with R_{k+1} = 0. Hence, disregarding rounding errors, as long as the algorithm never switches to the second branch, the first stopping criterion of our algorithm will be satisfied after finitely many iterations.
Lemma A.4. [44, Lemma 2] Let {Q_i} be the matrix sequence generated by Algorithm 3.1. Then this matrix sequence consists of mutually orthonormal matrices in the Frobenius norm.
The proof below replicates the one given for the matrix equation AXB = C with symmetric constraint in [44]. We rework it here for clarity, now for the generalized Sylvester equation with its multi-term function f(X) of (I.1) instead of just AXB as in [44].
Recall that the matrix inner product of two matrices in R^{m×n} is defined as ⟨A, B⟩ = tr(B^T A).
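As a quick numerical sanity check (an illustrative snippet, not part of the paper's codes), the trace form of the inner product agrees with the entrywise sum of products, and the Frobenius norm is the norm it induces:

```python
import numpy as np

rng = np.random.default_rng(2)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))

ip = np.trace(B.T @ A)            # <A, B> = tr(B^T A)
entrywise = np.sum(A * B)         # equals the sum of entrywise products
fro = np.sqrt(np.trace(A.T @ A))  # induced norm: ||A||_F = sqrt(<A, A>)
```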
Lemma A.5. [44, Lemma 3] Let {γ_k}, {T_k}, and {Q_i} be the sequences generated by Algorithm 3.1 and let e_1 denote the first unit vector. Then T_k is positive semi-definite.
Proof. By the definition of T_k and Q_k (k = 0, 1, 2, . . .), the stated equation holds. Obviously h^T T_k h = ⟨f(X̃), f(X̃)⟩ ≥ 0 for all h ∈ R^{k+1}. Therefore T_k is positive semidefinite and the proof is complete.
Proof. By the definition of Q_k and R_k, we have expansions with real coefficients a_i and b_i (for i = 0, 1, 2, . . ., k) and H = f* ∘ f. These equations imply that Q_k and R_k belong to the Krylov space generated by H and R_0. Furthermore, since ||h_k||_2 < ∆, the second formula of (A.10) implies that λ_k = 0, and the first formula of (A.10) then ensures T_k h_k = −γ_0 e_1.
By Lemma A.5, T k is positive semidefinite.
Case (1): We prove that if T_k is positive definite, then solving problem (III.2) in the second branch of Algorithm 3.1 cannot occur, leading to a contradiction. To show that the first branch (CG method) has led to success in this case, we show that P_j ≠ 0 and f(P_j) ≠ 0 for all j = 0, 1, 2, · · · , k. Since T_k is positive definite, its diagonal entries satisfy δ_j ≠ 0 for all j = 0, 1, 2, · · · , k. Then f(Q_j) ≠ 0 and thus Q_j ≠ 0. From Theorem A.6 we know that R_j ≠ 0. Since P_j = −R_j + β_{j−1} P_{j−1} and R_j ⊥ P_{j−1}, we conclude that P_j ≠ 0 for all j = 1, 2, · · · , k.
From the proof of Lemma A.5 it follows that f(P_k) ≠ 0. Since for j = 0, 1, 2, · · · , k each T_j is positive definite as a submatrix of T_k, we conclude that f(P_j) ≠ 0 for all j. As T_k is positive definite, h_k = −T_k^{−1}(γ_0 e_1) ≠ 0 with ||h_k||_2 < ∆. Thus h_k is also a solution of Problem (III.2). Clearly h_k = −T_k^{−1}(γ_0 e_1) must be unique as the solution of min_{h ∈ R^{k+1}} ½ h^T T_k h + h^T (γ_0 e_1). Combined with Lemma A.5, this states that X̃_k = (Q_0, Q_1, . . ., Q_k)(h_k ⊗ I) is the unique solution of Problem (A.7). From Lemma A.8, we have X_{k+1} = X̃_k with X_{k+1} from the CG method part. Since ||X̃_k|| = ||h_k||_2 < ∆, we have ||X_{k+1}|| < ∆. From Lemma A.2, we know that ||X_1|| < · · · < ||X_j|| < · · · < ||X_{k+1}|| < ∆. Therefore in its first k iteration steps, Algorithm 3.1 has only been implemented inside the first, the CG branch, which is a contradiction.
Case (2). If T_k is positive semidefinite but not definite, then there exists a vector z such that T_k(h̃_k + z) = −γ_0 e_1 and ||h̃_k + z|| = ∆. This implies that h̃_k + z = h_k is also a solution of Problem (III.2) on the boundary. The proof of Theorem IV.1 extends the proof of Theorem 2 in [44], given there for explicit 1-term Sylvester functions, to multi-term ones, and it is now formulated in terms of f and its adjoint function f*.
Proof. Assume that there is a scalar λ* ≥ 0 such that (IV.1) holds. Define ϕ(X) = ½ ⟨f(X), f(X)⟩ − ⟨f(X), E⟩. Then for any matrix W ∈ R^{m×n} the corresponding expansion of ϕ(X* + W) − ϕ(X*) is nonnegative. Hence for λ* ≥ 0 we have ϕ(X) ≥ ϕ(X*) for all X ∈ R^{m×n} with ||X|| ≤ ∆. Therefore X* is a global minimizer of (I.3).
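The key inequality of this argument can be sketched as follows, under the assumption that (IV.1) is the standard stationarity and complementarity pair f*(f(X*) − E) + λ* X* = 0 and λ*(∆ − ||X*||) = 0:

```latex
\varphi(X^*+W)-\varphi(X^*)
  = \langle W,\; f^*\!\bigl(f(X^*)-E\bigr)\rangle + \tfrac12\,\|f(W)\|_F^2
  = -\lambda^*\langle W,\,X^*\rangle + \tfrac12\,\|f(W)\|_F^2 .
```

If ||X* + W|| ≤ ∆ = ||X*||, then expanding ||X* + W||² gives ⟨W, X*⟩ ≤ −½||W||², so the difference above is at least (λ*/2)||W||² ≥ 0 whenever λ* ≥ 0; and if ||X*|| < ∆, then λ* = 0 and the difference equals ½||f(W)||² ≥ 0 for every W.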
This implies that (IV.1) holds for λ* = 0. When ||X*|| = ∆, the second equation of (IV.1) is satisfied and consequently X* is the solution of the constrained problem for all X ∈ R^{m×n}. Suppose that there are only negative values of λ* that satisfy (B.1). Then we have from (B.2) that ϕ(X) ≥ ϕ(X*) whenever ||X|| ≤ ||X*|| = ∆.
Since we already know that X* minimizes ϕ(X) for ||X|| ≤ ∆, it follows that X* is a global, i.e., unconstrained minimizer of ϕ(X). Therefore condition (B.1) holds with λ* = 0, which contradicts our assumption that only negative values of λ* can satisfy (B.1).

B. Proof of Lemma IV.4
Proof. Assume that h_k is the solution of Problem (III.2). Then there exists a nonnegative number λ_k such that the two corresponding identities hold. Since Problem (III.2) is a convex quadratic problem, these two conditions are necessary and sufficient. Equivalence with Problem (I.2) follows directly, since a solution X* of Problem (I.3) is also a solution of Problem (I.2) for ζ = λ*. Conversely, if X_ζ is a solution of Problem (I.2) for a given ζ, then X_ζ solves Problem (I.3) for ∆ = ||X_ζ||_F.
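This ζ ↔ ∆ correspondence can be checked in a small dense vector example (an illustrative sketch, not the paper's matrix codes, assuming (I.2) plays the role of the ζ-penalized problem and (I.3) the norm constrained one): the penalized solution x_ζ of (A^T A + ζI)x = A^T b is also the minimizer of ||Ax − b|| over the ball of radius ∆ = ||x_ζ||.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((15, 6))
b = rng.standard_normal(15)
zeta = 0.7

# penalized (Tikhonov-type) solution for the fixed multiplier zeta
x_zeta = np.linalg.solve(A.T @ A + zeta * np.eye(6), A.T @ b)
Delta = np.linalg.norm(x_zeta)           # the matching constraint radius
r_star = np.linalg.norm(A @ x_zeta - b)

# no random point on the Delta sphere attains a smaller residual
r_samples = min(
    np.linalg.norm(A @ (Delta * y / np.linalg.norm(y)) - b)
    for y in rng.standard_normal((200, 6))
)
```

The sampled residuals r_samples never drop below r_star, consistent with x_ζ being the constrained minimizer for ∆ = ||x_ζ||.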

Figure 1: The flow chart of Algorithm 3.1.

Figure 2: The flow chart of Algorithm 4.1.

Figures 4 and 5: Y_M and Y_H ∈ R^{224×2808} are the given MS and HS image data matrices, respectively. If the band number of the MS image is smaller than the subspace dimension that we set below, the MS data degenerates into PAN data. The one-band MS image data Y_M, used as the PAN image, and the composite color image of the HS image data Y_H are shown in Figures 7 (b) and 7 (c).

Figure 7: (a) composite color image of the target image X; (b) the PAN image data Y_M; (c) composite color image of the HS image data matrix Y_H; (d) composite color image of the image fusion obtained by using our CG plus Lanczos and Moré-Sorensen algorithm.

Table 1: Nine Sylvester and T-Sylvester type matrix equations in four classes.

Table 3: The quality measures for the Hyperspec Chikusei image data.