A model reduction approach for inverse problems with operator valued data

We study the efficient numerical solution of linear inverse problems with operator-valued data, which arise, e.g., in seismic exploration, inverse scattering, or tomographic imaging. The high dimensionality of the data space implies extremely high computational cost already for the evaluation of the forward operator, which makes a numerical solution of the inverse problem, e.g., by iterative regularization methods, practically infeasible. To overcome this obstacle, we take advantage of the underlying tensor product structure of the problem and propose a strategy for constructing low-dimensional certified reduced order models of quasi-optimal rank for the forward operator, which can be computed much more efficiently than the truncated singular value decomposition. A complete analysis of the proposed model reduction approach is given in a functional analytic setting, and the efficient numerical construction of the reduced order models as well as their application to the numerical solution of the inverse problem are discussed. In summary, the setup of a low-rank approximation can be achieved in an offline stage at essentially the same cost as a single evaluation of the forward operator, while the actual solution of the inverse problem in the online phase can be performed with extremely high efficiency. The theoretical results are illustrated by application to a typical model problem in fluorescence optical tomography.

Here c ∈ X is the quantity to be determined, and we assume that M^δ : Y → Z′, representing the possibly perturbed measurements, is a linear operator of Hilbert-Schmidt class between the Hilbert space Y and the dual Z′ of a second Hilbert space Z. We further assume that the forward operator T : X → HS(Y, Z′) is linear and compact, and admits a factorization of the form T(c) = V′ D(c) U, (1.2) with U, D(c), and V again denoting appropriate linear operators and V′ the dual of V. Problems of this kind arise in a variety of applications, e.g., in fluorescence tomography [1,30], inverse scattering [6,13], or source identification [17], but also as linearizations of related nonlinear inverse problems; see e.g. [8,34] or [23] and the references given there. In such applications, U typically models the propagation of excitation fields generated by the sources, D describes the interaction with the medium to be probed, and V models the emitted fields which can be recorded by the detectors. In the following, we briefly outline our basic approach towards the numerical solution of (1.1)–(1.2) and review related work in the literature.

1.1. Regularized inversion.
By the particular functional analytic setting, the inverse problem (1.1)–(1.2) amounts to an ill-posed linear operator equation in Hilbert spaces, and standard regularization theory can be applied for its stable solution [2,9]. Following standard arguments, we assume that M^δ is a perturbed version of the exact data M and that a bound ‖M^δ − M‖_{HS(Y,Z′)} ≤ δ on the measurement noise is available. We further denote by c† the minimum norm solution of (1.1) with M^δ replaced by M = T(c†). A stable approximation for the solution c† can then be obtained by the regularized inversion of (1.1), e.g., by spectral regularization methods of the form c^δ_α = g_α(T⋆T) T⋆ M^δ. (1.4) Here T⋆ : HS(Y, Z′) → X denotes the adjoint of the operator T, and g_α(λ) denotes a regularized approximation of 1/λ, i.e., the filter function, satisfying some standard conditions; we refer to [2, Chapter 2] or [9, Chapter 4] for details and to [26] for generalizations. A typical example for the filter function is g_α(λ) = (λ + α)^{-1}, which leads to Tikhonov regularization c^δ_α = (T⋆T + αI)^{-1} T⋆ M^δ, see [36]. Another filter function, with certain optimality properties, results from truncated singular value decomposition and reads g_α(λ) = 1/λ if λ ≥ α and g_α(λ) = 0 otherwise.
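To make the filtering idea concrete, the following self-contained NumPy sketch (not part of the original presentation; all matrices are randomly generated stand-ins for the discretized operators) applies both filter functions mentioned above to a small synthetic problem.

```python
import numpy as np

# Toy illustration of spectral filtering c_alpha = g_alpha(T*T) T* M^delta.
rng = np.random.default_rng(0)
T = rng.standard_normal((50, 10))                      # discretized forward operator
c_true = rng.standard_normal(10)
M_delta = T @ c_true + 1e-3 * rng.standard_normal(50)  # noisy data

U, s, Vt = np.linalg.svd(T, full_matrices=False)

def filtered_solution(g):
    # c = sum_k g(s_k^2) s_k <u_k, M^delta> v_k
    return Vt.T @ (g(s**2) * s * (U.T @ M_delta))

alpha = 1e-6
c_tik = filtered_solution(lambda lam: 1.0 / (lam + alpha))                      # Tikhonov
c_tsvd = filtered_solution(lambda lam: np.where(lam >= alpha, 1.0 / lam, 0.0))  # TSVD
```

For this well-conditioned toy matrix both filters essentially invert T; the difference between the two strategies becomes visible only when singular values fall below the threshold α.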
Note that the choice of the regularization norm in (1.4) is incorporated implicitly in the definition of the function spaces, and one can obtain convergence c^δ_α → c† of the regularized solutions to the minimum norm solution c† in this norm if δ → 0 and α = α(δ, M^δ) is chosen appropriately; the rate of convergence depends on the properties of the filter function g_α, the parameter choice α(δ, M^δ), and the smoothness of the minimum norm solution c†; see [2,9,26] for details. For the tomographic applications we have in mind, uniqueness results are usually available [18,28], i.e., T can be assumed to be injective, in which case c† is independent of the regularization norm; see Remark 3.3 below. We will not delve further into the analysis of regularization methods, but rather focus on their efficient numerical realization for problems with operator-valued data.
For the actual computation of the regularized solution c^δ_α, a sufficiently accurate finite dimensional approximation of the operator T is required, which is usually obtained by some discretization procedure; in the language of model order reduction, this is called the truth or high-fidelity approximation [3,32]. In the following discussion, we will not distinguish between infinite dimensional operators and their truth approximations. We thus assume that dim(X) = m, dim(Y) = k_Y, and dim(Z) = k_Z. For ease of notation, we assume that k_Y = k_Z = k in the following. We may then identify c with a vector in R^m, M^δ with a matrix in R^{k×k}, and T with a third order tensor in R^{k×k×m} or a matrix in R^{k²×m}. In typical applications, like computerized tomography, the dimensions m and k are very large, and one may assume that k < m < k²; see [5,24]. The inverse problem (1.1) thus can be considered to be typically of large scale and overdetermined. For its efficient solution, we consider low-rank approximations of the form T_N = Q_N T, (1.5) based on projection in data space, where Q_N is an orthogonal projection with finite rank N, which is the dimension of the range of Q_N. Since T is assumed compact, we can always choose N sufficiently large such that ‖T − T_N‖_{L(X,HS(Y,Z′))} ≤ δ, (1.6) and we may assume that typically N ≪ m, k, where m, k are the dimensions of the truth approximation used for the computations.
For the stable and efficient numerical solution of the inverse problem (1.1)–(1.2), we may then consider the low-dimensional regularized approximation c^δ_{α,N} = g_α(T⋆_N T_N) T⋆_N M^δ. (1.7) As shown in [29], the low-rank approximation c^δ_{α,N} defined in (1.7) has essentially the same quality as the infinite dimensional approximation c^δ_α, as long as the perturbation bound (1.6) can be guaranteed. In the sequel, we therefore focus on the numerical realization of (1.7), which can be roughly divided into the following two stages:
• Setup of the approximations Q_N, T_N, and T⋆_N T_N. This compute-intensive part can be done in an offline stage, and the constructed approximations can be used for the repeated solution of the inverse problem (1.1) for multiple data.
• Computation of the regularized solution (1.7). This online stage, which is relevant for the actual solution of (1.1), comprises the following three steps: the compression of the data, M^δ_N = Q_N M^δ; the analysis, i.e., the solution of the reduced regularized problem; and the synthesis of the regularized solution c^δ_{α,N}. Let us note that the analysis step is completely independent of the large system dimensions k, m of the truth approximation, and therefore the compression and synthesis steps are the compute-intensive parts of the online stage. If k² > m > k, which is the typical situation [5,24], the data compression turns out to be the most compute and memory expensive step. As we will explain below, the tensor product structure of (1.2) allows us to considerably reduce the memory cost in the compression step.
1.3. Low-rank approximations. A particular example of a low-rank approximation (1.5) is given by the truncated singular value decomposition T_{N_svd}, for which Q_{N_svd} amounts to the orthogonal projection onto the N_svd-dimensional subspace of the left singular vectors corresponding to the largest singular values σ_1 ≥ ... ≥ σ_{N_svd} of the operator T. By construction of the singular value decomposition, we have ‖T − T_{N_svd}‖_{L(X,HS(Y,Z′))} = σ_{N_svd+1}, (1.8) which allows us to guarantee the desired accuracy (1.6) by choosing N_svd with σ_{N_svd+1} ≤ δ < σ_{N_svd}. From the Eckart–Young–Mirsky theorem, we can conclude that N = N_svd is the minimal rank of an operator T_N satisfying the perturbation bound (1.6), i.e., the truncated singular value decomposition yields the best possible low-rank approximation for a given rank N.
Based on Fourier techniques, fast analytic singular value decompositions for linear operators arising in optical diffusion tomography have been constructed in [25] for problems with regular geometries and constant coefficients. In more general situations, however, the full assembly and decomposition of the operator T is computationally prohibitive. Krylov subspace methods [16,35] and randomized algorithms [14,27] then provide alternatives that allow the construction of approximate singular value decompositions using only a moderate number of evaluations of T and its adjoint T⋆. By combining randomized singular value decompositions for subproblems associated with a single frequency in a recursive manner, approximate singular value decompositions for inverse medium problems have been constructed in [5].
In a recent work [24], motivated by [23] and [22], finite dimensional inverse scattering problems of a particular structure are considered. Using the Khatri–Rao product (A ⊙ B)_{ij,l} = A_{i,l} B_{j,l} for matrices A, B ∈ R^{k×m}, with the row multi-index ij = (i − 1)k + j, this problem can be cast into a linear system for the vectorized data, where vec(M) ∈ R^{k²} denotes the vectorization of the matrix M by columns. The Khatri–Rao product structure allows the efficient evaluation of T⋆T, required for the solution of the inverse problem, using pre-computed low-rank approximations for U⋆U and V⋆V; we refer to [20] for a definition and properties of the Khatri–Rao product and a survey on tensor decompositions. Apart from the more restrictive assumptions on the problem structure, the computational cost of the reconstruction algorithms in [24] is still rather high, since the dimension m of the parameter c still appears in the system, which may be prohibitive for problems with distributed parameters.
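For readers less familiar with this reformulation, the following NumPy sketch (hypothetical small dimensions, and a row-major pair index rather than the column-wise vectorization used above) verifies that a measurement of the form M = Vᵀ diag(c) U depends linearly on c through a Khatri–Rao-type system matrix.

```python
import numpy as np

# Khatri-Rao reformulation of M = V^T diag(c) U as vec(M) = T c.
rng = np.random.default_rng(1)
k, m = 4, 6
U = rng.standard_normal((m, k))   # excitation matrix (columns = sources)
V = rng.standard_normal((m, k))   # emission matrix (columns = detectors)
c = rng.standard_normal(m)

M = V.T @ (c[:, None] * U)        # measurement matrix, shape (k, k)

# Row (i, j) of the system matrix is A[i, :] * B[j, :] with A = V^T, B = U^T;
# we flatten row-major here, so the pair index is ij = i*k + j.
A, B = V.T, U.T
T_mat = np.einsum('il,jl->ijl', A, B).reshape(k * k, m)

assert np.allclose(T_mat @ c, M.reshape(-1))
```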
Another popular strategy towards dimension reduction for inverse problems with multiple excitations consists in synthetically reducing the number of sources. Such simultaneous or encoded sources have been used, e.g., in geophysics [15,21] and tomography [37]; see [33] for further references. The systematic construction of low-rank approximations is investigated intensively also in the context of model order reduction; see [3,32] for a survey on results in this direction.
Let us note that in the context of inverse problems, the mapping T here has to be understood as a linear operator, i.e., as a tensor of order two, and the norms in which the approximation quality should be measured are prescribed by the functional analytic setting; see (1.8). By the Eckart–Young–Mirsky theorem, the truncated singular value decomposition therefore yields the optimal low-rank approximation, and the main aspect here is to compute the truncated singular value decomposition of the operator T, or a sufficiently good approximation, with minimal effort. As we will see, this can be done on a rather general level, using only the particular form (1.2) and some abstract smoothness conditions on the involved operators.
1.4. Contributions and outline of the paper. The main scope of this paper is the systematic construction and analysis of low-rank approximations T_N for operators T of the particular form (1.2) with
• certified approximation error bounds (1.6), and
• quasi-optimal rank N comparable to that of the truncated singular value decomposition.
The stable solution of the inverse problem (1.1) can then be achieved by (1.7) in a highly efficient manner. The tensor-product structure (1.2) will further allow us to
• set up T_N at essentially the same cost as a single evaluation of T(c);
• compress the data M^δ on the fly already during recording.
Before diving into the detailed analysis of our approach, let us briefly highlight the main underlying principles and key steps of the construction.
1.4.1. Sparse tensor product compression. Let Q_{K,U}, Q_{K,V} denote orthogonal projections of rank K in the space Y of sources and the space Z of detectors, respectively. If dim(Y) = dim(Z) = k, then clearly K ≤ k. One may also use different ranks K_U, K_V for the two approximations, but for ease of notation we take K = K_U = K_V. In the language of [15,21], the columns of the operators Q_{K,U} and Q_{K,V} amount to optimal sources and detectors, and the appropriate choice of Q_{K,U} and Q_{K,V} is also related to optimal experiment design [31]. We define corresponding approximations U_K = U Q_{K,U} and V_K = V Q_{K,V} for the operators U : Y → U and V : Z → V, each of rank K. Since U and V are compact operators, we may choose K large enough such that the error in these approximations is as small as desired. The resulting tensor product approximation T_{K,K}(c) = V′_K D(c) U_K may then be used as an approximation for T. Following our notation, we may write T_{K,K} = Q_{K,K} T, with Q_{K,K} denoting a tensor-product projection in data space. Unfortunately, the rank of T_{K,K} is in general K², which typically turns out to be much larger than the optimal rank achievable by a truncated singular value decomposition of the same accuracy. Instead of T_{K,K}, we therefore consider a hyperbolic-cross approximation [7], which has the general form T_K̂ = Q_K̂ T_{K,K}, with an orthogonal projection Q_K̂ onto the K̂ most significant components in the range of T_{K,K}. We will show in detail how to construct sparse tensor product approximations T_K̂ of any desired accuracy, using knowledge about U_K, V_K, and D(c) only. Moreover, under some mild conditions on the operators U, V, we will see that K̂ ≈ K is sufficient to guarantee essentially the same accuracy as the full tensor-product approximation. Let us further note that T_K̂ does not have a tensor-product structure, but is formally based on a tensor-product approximation, which turns out to be advantageous when computing the projected data M^δ_K̂ = Q_K̂ M^δ; see below.
1.4.2. Recompression.
By truncated singular value decomposition, we can further reduce the rank of the sparse tensor product approximation T_K̂, leading to a final approximation T_N = P_N T_K̂, which can be shown to have essentially the rank N ≈ N_svd of the truncated singular value decomposition of T with the same accuracy; see Subsection 2.4 for details. We thus obtain computable approximations T_N for T with essentially the same rank as the truncated singular value decomposition of similar accuracy. Only some mild assumptions on the mapping properties of the operators U and V are required to rigorously establish and guarantee the approximation property (1.6).

1.4.3. Summary of basic properties.
It turns out that the proposed two-step construction of the approximation T_N, which is based on the underlying tensor-product structure of the problem, has significant advantages over the truncated singular value decomposition in the setup, i.e., T_N can be computed at the computational cost of essentially one single evaluation of the forward operator T(c). Moreover, the underlying tensor-product structure also allows the projection M^δ_N = Q_N M^δ to be computed in an efficient manner. By construction, the projection Q_N involves a tensor-product projection in data space that can be computed by two separate projections Q_{K,U}, Q_{K,V} of rank K in the spaces Y and Z of sources and detectors. Since the projection Q_{K,V} can already be applied during recording, simultaneous access to the full data M^δ is never required. As a consequence, the memory cost of data recording and compression is reduced to 3Kk + K².
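The memory savings can be illustrated by a small NumPy sketch (dimensions and projections are hypothetical): each recorded data column is projected onto the K detector modes immediately, so only a K×k buffer is needed instead of the full k×k data matrix.

```python
import numpy as np

# On-the-fly compression of the data by two separate projections.
rng = np.random.default_rng(2)
k, K = 100, 8
QV = np.linalg.qr(rng.standard_normal((k, K)))[0]  # detector modes (orthonormal)
QU = np.linalg.qr(rng.standard_normal((k, K)))[0]  # source modes (orthonormal)
M = rng.standard_normal((k, k))                    # full data, for reference only

buffer = np.zeros((K, k))
for j in range(k):                 # "recording": one source (column) at a time
    buffer[:, j] = QV.T @ M[:, j]  # compress immediately; memory K*k, not k*k
M_K = buffer @ QU                  # final K x K compressed data

assert np.allclose(M_K, QV.T @ M @ QU)
```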

1.4.4. Outline. The remainder of the manuscript is organized as follows: In Section 2, we discuss in detail the construction of quasi-optimal low-rank approximations T_N for problems of the form (1.2) with guaranteed accuracy (1.6). To illustrate the applicability of our theoretical results, we discuss in Section 3 a particular example stemming from fluorescence diffuse optical tomography. An appropriate choice of function spaces allows us to verify all conditions required for the analysis of our approach. In Section 4, we report in detail on numerical tests, which demonstrate the computational efficiency of the model reduction approach and the resulting numerical solution of the inverse problem.
2. Analysis of the model reduction approach. We start by introducing our basic notation and then provide a complete analysis of the data compression and model reduction approach outlined in the introduction.
2.1. Notation. Function spaces will be denoted by A, B, ... and assumed to be separable Hilbert spaces with scalar product (·, ·)_A and norm ‖·‖_A. By A′ we denote the dual of A, i.e., the space of bounded linear functionals on A, and by ⟨a′, a⟩_{A′×A} the corresponding duality product. Furthermore, L(A, B) denotes the Banach space of bounded linear operators S : A → B with norm ‖S‖_{L(A,B)} = sup_{‖a‖_A=1} ‖Sa‖_B < ∞. We write R(S) = {Sa : a ∈ A} for the range of the operator S and define rank(S) = dim(R(S)). By S′ : B′ → A′ and S⋆ : B → A we denote the dual and the adjoint of a bounded linear operator S ∈ L(A, B), defined, respectively, for all a ∈ A, b ∈ B, and b′ ∈ B′ by ⟨S′b′, a⟩_{A′×A} = ⟨b′, Sa⟩_{B′×B} and (S⋆b, a)_A = (b, Sa)_B. The two operators S′ and S⋆ are directly related by Riesz isomorphisms.
Any operator S^δ with ‖S − S^δ‖_{L(A,B)} ≤ δ will be called a δ-approximation for S in the following. Let us recall that any compact linear operator S : A → B has a singular value decomposition, i.e., a countable system {(σ_k, a_k, b_k)}_{k≥1} with singular values σ_1 ≥ σ_2 ≥ ... ≥ 0 and with {a_k : σ_k > 0} and {b_k : σ_k > 0} denoting orthonormal bases for R(S⋆) ⊂ A and R(S) ⊂ B, respectively. Also note that ‖S‖_{L(A,B)} = σ_1 and rank(S) = sup{k : σ_k > 0}. Moreover, by the Courant–Fischer min-max principle [12] and the Eckart–Young–Mirsky theorem, the kth singular value can be characterized by σ_k = min{‖S − S_{k−1}‖_{L(A,B)} : rank(S_{k−1}) < k}. (2.3) Hence every compact linear operator S : A → B can be approximated by truncated singular value decompositions S_K = Σ_{k≤K} σ_k (·, a_k)_A b_k. Recall further that S is of Hilbert-Schmidt class, S ∈ HS(A, B), if ‖S‖²_{HS(A,B)} = Σ_{k≥1} ‖S a_k‖²_B is finite, where {a_k}_{k≥1} is an orthonormal basis of A. Moreover, the scalar product and the associated norm are independent of the choice of this basis. Let us mention the following elementary results, which will be used several times later on. Here and below, we use a ≲ b to express a ≤ C b with some generic constant C that is independent of the relevant context, and we write a ≃ b when a ≲ b and b ≲ a.
Lemma 2.1. Let S ∈ HS(A, B). Then
(a) the truncated singular value decompositions S_K of rank K satisfy ‖S − S_K‖_{L(A,B)} ≤ K^{-1/2} ‖S‖_{HS(A,B)};
(b) for R ∈ L(B, C) one has R S ∈ HS(A, C) with ‖R S‖_{HS(A,C)} ≤ ‖R‖_{L(B,C)} ‖S‖_{HS(A,B)}, and analogously ‖R S‖_{HS(A,C)} ≤ ‖R‖_{HS(B,C)} ‖S‖_{L(A,B)} whenever R is of Hilbert-Schmidt class.
For convenience of the reader, we provide a short proof of these assertions.
Proof. The assumption S ∈ HS(A, B) implies that S is compact with square summable singular values; in particular, K σ²_{K,S} ≤ Σ_{k≤K} σ²_{k,S} ≤ ‖S‖²_{HS(A,B)}, and hence σ_{K,S} ≤ K^{-1/2} ‖S‖_{HS(A,B)}. The truncated singular value decomposition S_K then satisfies ‖S − S_K‖_{L(A,B)} = σ_{K+1,S} ≤ K^{-1/2} ‖S‖_{HS(A,B)}, which yields (a). After choosing an orthonormal basis {a_k}_{k≥1} ⊂ A, we can write
‖R S‖²_{HS(A,C)} = Σ_{k≥1} ‖R S a_k‖²_C ≤ ‖R‖²_{L(B,C)} Σ_{k≥1} ‖S a_k‖²_B = ‖R‖²_{L(B,C)} ‖S‖²_{HS(A,B)},
which implies the first inequality of assertion (b). The second inequality follows from the same arguments applied to the adjoint (R S)⋆ = S⋆ R⋆, noting that the respective norms of an operator and its adjoint are the same.
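These facts are easy to check numerically in the finite dimensional case; the following NumPy sketch (a toy matrix, not from the paper) verifies the norm identities and the best-approximation property of the truncation.

```python
import numpy as np

# Finite dimensional check of the recalled facts: ||S|| = sigma_1,
# ||S||_HS = l2 norm of the singular values (Frobenius norm),
# and ||S - S_K|| = sigma_{K+1}.
rng = np.random.default_rng(3)
S = rng.standard_normal((12, 9))
Um, s, Vt = np.linalg.svd(S, full_matrices=False)

assert np.isclose(np.linalg.norm(S, 2), s[0])                   # operator norm
assert np.isclose(np.linalg.norm(S, 'fro'), np.linalg.norm(s))  # HS norm

K = 4
S_K = Um[:, :K] @ np.diag(s[:K]) @ Vt[:K]                       # rank-K truncation
assert np.isclose(np.linalg.norm(S - S_K, 2), s[K])             # error = sigma_{K+1}
```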

2.2. Preliminaries and basic assumptions.
We now introduce in more detail the functional analytic setting for the inverse problem (1.1) used in our considerations. We assume that the operators U, V, and D appearing in the definition (1.2) satisfy
Assumption 2.2. Let U ∈ HS(Y, U), V ∈ HS(Z, V), and D ∈ L(X, L(U, V′)).
Following our convention, all function spaces appearing in these conditions, except the space L(·, ·), are separable Hilbert spaces. We can now prove the following assertion.
Lemma 2.3. Let Assumption 2.2 hold. Then the operator T : X → HS(Y, Z′) defined by (1.2) is linear, bounded, and compact.
Proof. Linearity of T is clear by construction and the linearity of U, V, and D. Now let {y_k}_{k≥1} denote an orthonormal basis of Y and let c ∈ X be arbitrary. Then
‖T(c)‖²_{HS(Y,Z′)} = Σ_{k≥1} ‖V′ D(c) U y_k‖²_{Z′} ≤ ‖V′ D(c)‖²_{L(U,Z′)} ‖U‖²_{HS(Y,U)} ≤ ‖V′‖²_{L(V′,Z′)} ‖D(c)‖²_{L(U,V′)} ‖U‖²_{HS(Y,U)},
where we used Lemma 2.1 in the second step and the boundedness of the operators in the third.
This shows ‖T(c)‖_{HS(Y,Z′)} ≲ ‖c‖_X for all c ∈ X, i.e., T is bounded. Using Lemma 2.1(a), we can further approximate U and V by operators U_K, V_K of rank K, such that ‖U − U_K‖_{L(Y,U)} ≲ K^{-1/2} and ‖V − V_K‖_{L(Z,V)} ≲ K^{-1/2}, (2.5) and we can define an operator T_{K,K} : X → HS(Y, Z′) by T_{K,K}(c) = V′_K D(c) U_K. Using Assumption 2.2 and the bounds (2.5), we thus conclude that T can be approximated uniformly by finite rank operators, and hence T is compact.
2.3. Sparse tensor product approximation. As a direct consequence of the arguments used in the previous result, we obtain the following preliminary approximation result.
Lemma 2.4. Let Assumption 2.2 hold. Then for any δ > 0 there exists K ∈ N with K ≲ δ^{-2} and rank-K approximations U_K = U Q_{K,U} and V_K = V Q_{K,V} with ‖U − U_K‖_{L(Y,U)} ≤ δ and ‖V − V_K‖_{L(Z,V)} ≤ δ. Here, Q_{K,U} and Q_{K,V} are orthogonal projections on Y and Z, respectively. Furthermore, the operator T_{K,K} defined by T_{K,K}(c) = V′_K D(c) U_K has rank K² and satisfies ‖T − T_{K,K}‖_{L(X,HS(Y,Z′))} ≲ δ. If the singular values of U and V satisfy σ_{k,U}, σ_{k,V} ≲ k^{-α} for some α > 1/2, then the assertions hold with K ≲ δ^{-1/α}, and consequently rank(T_{K,K}) ≲ δ^{-2/α}.
Note that different ranks K_U, K_V could be chosen for the approximations of U and V, but for ease of notation we assume K_U = K_V = K. Since U and V are Hilbert-Schmidt, we know that σ_{k,U}, σ_{k,V} ≲ k^{-α} with α ≥ 1/2, and thus the decay assumption on the singular values is not very restrictive. In general, rank(T_{K,K}) = K² may, however, be substantially larger than the optimal rank N_svd of a truncated singular value decomposition satisfying a similar perturbation bound. We now show that, based on the approximations U_K, V_K, and assuming sufficient decay of the singular values σ_{k,U}, σ_{k,V}, one can construct a δ-approximation T_K̂ for T of rank K̂ ≪ K².
Lemma 2.5. Let σ_{k,U} ≲ k^{-β} and σ_{k,V} ≲ k^{-α} (or σ_{k,U} ≲ k^{-α} and σ_{k,V} ≲ k^{-β}) for some β > 1/2 and α > β + 1/2. Then σ_{k,T} ≲ k^{-β}, and for any δ > 0 we can find K̂ ∈ N with K̂ ≲ δ^{-1/β} and an approximation T_K̂ = Q_K̂ T of rank K̂, such that ‖T − T_K̂‖_{L(X,HS(Y,Z′))} ≲ δ.
Proof. Let {(σ_{k,U}, a_{k,U}, b_{k,U})} and {(σ_{k,V}, a_{k,V}, b_{k,V})} denote the singular systems of U and V, respectively. We now show that the hyperbolic cross approximation [7] T_K̂, obtained from T_{K,K} by retaining only those tensor product modes (i, j) for which the product σ_{i,U} σ_{j,V} of the associated singular values is sufficiently large, has the required properties. By counting, one can verify that rank(T_K̂) = K̂ ≲ δ^{-1/β}. By observing that ‖a_{k,V}‖_V = 1, ‖D(c)‖_{L(U,V′)} ≲ ‖c‖_X, and σ_{k,V′} = σ_{k,V}, and by using the decay properties of the singular values, we obtain the asserted error bound; in the final estimate, we use the fact that −2α + 2β(1 + ε) < −1 and K̂ ≲ δ^{-1/β}, which follows immediately from the construction.
Remark 2.6. Comparing the results of Lemmas 2.4 and 2.5, we expect to obtain a tensor product approximation T_{K,K} of rank K² ≃ δ^{-2/α}, while the hyperbolic cross approximation T_K̂, and consequently also the truncated singular value decomposition of the same accuracy, only has rank K̂ ≃ δ^{-1/(α−1/2−ε)} with ε = α − β − 1/2 > 0, which may be substantially smaller for α > 1. Note that, like T_{K,K}, the sparse tensor product approximation T_K̂ can be constructed directly from the low-rank approximations U_K and V_K.
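The rank reduction promised by the hyperbolic cross can be illustrated numerically; in the following NumPy sketch, the decay exponents and the selection rule σ_{i,U} σ_{j,V} ≥ δ are assumptions chosen for illustration, not taken from the model problem.

```python
import numpy as np

# Count the modes kept by a hyperbolic-cross selection versus the full
# tensor-product set, for assumed algebraic singular value decay.
K = 32
alpha, beta = 1.5, 1.0                   # assumed decay exponents (alpha > beta + 1/2)
sigma_U = np.arange(1.0, K + 1) ** -beta
sigma_V = np.arange(1.0, K + 1) ** -alpha

weights = np.outer(sigma_U, sigma_V)     # weight of tensor-product mode (i, j)
delta = sigma_U[-1]                      # target accuracy ~ K^{-beta}
keep = weights >= delta                  # hyperbolic-cross index set

n_full, n_cross = K * K, int(keep.sum())
# n_cross grows roughly like K, while the full set has K^2 modes.
```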
2.4. Quasi-optimal low-rank approximation. Let P_{N_svd} denote the orthogonal projection onto the space spanned by the left singular vectors of T corresponding to the first N_svd singular values, and let N_svd be chosen minimally such that ‖P_{N_svd} T − T‖_{L(X,HS(Y,Z′))} ≤ δ. We now show that a compression of any δ-approximation T^δ, e.g., the tensor product approximation T_{K,K} or its hyperbolic-cross approximation T_K̂, allows us to construct another δ-approximation T^δ_{N_δ} = P^δ_{N_δ} T^δ for T with quasi-optimal rank N_δ ≤ N_svd.
Lemma 2.7. Let δ > 0 and let T^δ : X → HS(Y, Z′) be a linear compact operator such that ‖T^δ − T‖_{L(X,HS(Y,Z′))} ≤ Cδ for some C > 0. Let P^δ_{N_δ} T^δ denote the truncated singular value decomposition of T^δ with minimal rank N_δ such that ‖P^δ_{N_δ} T^δ − T^δ‖_{L(X,HS(Y,Z′))} ≤ (1 + C)δ. Then N_δ ≤ N_svd and ‖P^δ_{N_δ} T^δ − T‖_{L(X,HS(Y,Z′))} ≤ (1 + 2C)δ, i.e., P^δ_{N_δ} T^δ is a δ-approximation for T with quasi-optimal rank.
Proof. Step 1. We start by recalling a well-known perturbation result for singular values [19], i.e., we show that for each k ∈ N one has |σ_k − σ^δ_k| ≤ Cδ, (2.10) where {σ_k} and {σ^δ_k} denote the singular values of T and T^δ, respectively. Let us abbreviate ‖·‖ = ‖·‖_{L(X,HS(Y,Z′))}, choose ε > 0, and let P^svd_M T denote the truncated singular value decomposition of T with optimal rank M = M(ε) such that ‖P^svd_M T − T‖ < ε. Using the optimality of M and the triangle inequality, we can estimate σ^δ_{M+1} ≤ ‖T^δ − P^svd_M T‖ ≤ ‖T^δ − T‖ + ‖T − P^svd_M T‖ < Cδ + ε, and letting ε → σ_{k+1} with M = k yields σ^δ_{k+1} ≤ σ_{k+1} + Cδ. The second inequality in (2.10) follows by considering T as a δ-approximation for the operator T^δ.
Step 2. Now let N δ be as in the statement of the lemma and N svd = M (δ) as defined in Step 1.
From (2.10) we infer σ^δ_{N_svd+1} ≤ σ_{N_svd+1} + Cδ ≤ (1 + C)δ, and from the monotonicity of the singular values, we conclude that N_δ ≤ N_svd. Furthermore, ‖P^δ_{N_δ} T^δ − T‖ ≤ ‖P^δ_{N_δ} T^δ − T^δ‖ + ‖T^δ − T‖ ≤ (1 + 2C)δ, i.e., P^δ_{N_δ} T^δ is a δ-approximation for T with quasi-optimal rank N_δ ≤ N_svd. We now apply the previous lemma with T^δ = T_K̂ and P^δ_{N_δ} = P_N the projection of the corresponding truncated singular value decomposition, which leads to the final approximation T_N = P_N T_K̂ = P_N (Q_K̂ T). By construction, this is a δ-approximation for T of quasi-optimal rank N ≤ N_svd.
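The perturbation bound used in Step 1 is Weyl's inequality for singular values; a quick numerical sanity check with toy matrices (not related to the model problem) reads:

```python
import numpy as np

# Check |sigma_k(T) - sigma_k(T + E)| <= ||E||_2 for a random perturbation E.
rng = np.random.default_rng(4)
T = rng.standard_normal((30, 20))
E = rng.standard_normal((30, 20))
E *= 1e-2 / np.linalg.norm(E, 2)     # spectral norm of E is now 1e-2

s = np.linalg.svd(T, compute_uv=False)
s_delta = np.linalg.svd(T + E, compute_uv=False)

assert np.max(np.abs(s - s_delta)) <= 1e-2 + 1e-12
```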
2.5. Summary. Let us briefly summarize the main observations and results of this section. Based on δ-approximations U_K, V_K for the operators U and V, we can construct a low-rank δ-approximation T_K̂ = Q_K̂ T for T via hyperbolic-cross approximation. We observed that, in typical situations, T_K̂ is of much lower rank than the corresponding full tensor product approximation T_{K,K}, whose assembly can be avoided completely. By truncated singular value decomposition of T_K̂, we obtained another δ-approximation T_N = P_N T_K̂ = P_N (Q_K̂ T) with quasi-optimal rank N ≤ N_svd. In summary, we thus efficiently computed a low-rank approximation T_N for T with similar rank and approximation properties as the truncated singular value decomposition.
The analysis in this section was carried out in abstract function spaces and applies verbatim to infinite-dimensional operators as well as to their finite-dimensional (truth) approximations obtained after discretization. As a consequence, the computational results, e.g., the ranks K and N of the approximations, can be expected to be essentially independent of the actual truth approximation used for the computations. In the language of model reduction, the low-rank approximation T_N is a certified reduced order model.

3. Fluorescence optical tomography.
In order to illustrate the viability of the theoretical results derived in the previous section, we now consider in some detail a typical application arising in medical imaging.
3.1. Model equations. Fluorescence optical tomography aims at retrieving information about the concentration c of a fluorophore inside an object by illuminating this object from outside with near-infrared light and measuring the light re-emitted by the fluorophores at a different wavelength. The distribution u_x = u_x(q_x) of the light intensity inside the object generated by a source q_x at the boundary is described by
−div(κ_x ∇u_x) + µ_x u_x = 0 in Ω, (3.1)
κ_x ∂_n u_x + ρ_x u_x = q_x on ∂Ω. (3.2)
We assume that Ω ⊂ R^d, d = 2, 3, is a bounded domain with smooth boundary enclosing the object under consideration. The light intensity u_m = u_m(u_x, c) emitted by the fluorophores is described by a similar equation,
−div(κ_m ∇u_m) + µ_m u_m = c u_x in Ω, (3.3)
κ_m ∂_n u_m + ρ_m u_m = 0 on ∂Ω. (3.4)
The model parameters κ_i, µ_i, and ρ_i, i = x, m, characterize the optical properties of the medium at the excitation and emission wavelengths; we assume these parameters to be known, e.g., determined by independent measurements [1]. As shown in [8], the above linear model, which can be interpreted as a Born approximation or linearization, is a valid approximation for moderate fluorophore concentrations.

3.2. Forward operator.
The forward problem in fluorescence optical tomography models an experiment in which the emitted light, resulting from excitation with a known source and interaction with a given fluorophore concentration, is measured at the boundary. The measurable quantity is the outward photon flux, which is proportional to u_m; see [1] for details. The potential data for a single excitation with source q_x, measured by a detector with characteristic q_m, can be described by a duality pairing ⟨T(c) q_x, q_m⟩ of the emitted light with the detector characteristic, where u_m and u_x are determined by the boundary value problems (3.1)–(3.4). The inverse problem finally consists of determining the concentration c of the fluorophore marker from measurements ⟨T(c) q_x, q_m⟩ for multiple excitations q_x and detectors q_m.
We now illustrate that fluorescence optical tomography fits perfectly into the abstract setting of Section 2. Let us begin by defining the excitation operator U : q_x ↦ u_x, which maps a source q_x to the corresponding weak solution u_x of (3.1)–(3.2). The interaction with the fluorophore can be described by the multiplication operator D(c) : u ↦ c u. In dimension d ≤ 3, the product cu of two functions c ∈ L²(Ω) and u ∈ H¹(Ω) lies in L^{3/2}(Ω) and can thus be interpreted as a bounded linear functional on H¹(Ω); this shows that D is a bounded linear operator. We further introduce the linear operator V : q_m ↦ v_m, which maps a detector characteristic q_m to the weak solution of the corresponding adjoint problem. In order to apply the results of Section 2, it remains to verify Assumption 2.2. We already showed that D ∈ L(X, L(U, V′)) is a bounded linear operator. The following assertion states that the remaining conditions on U and V hold true as well.
Lemma 3.1. The operators U and V are of Hilbert-Schmidt class, and their singular values satisfy σ_{k,U}, σ_{k,V} ≲ k^{-3/(2d-2)}.
Proof. The Hilbert-Schmidt property follows immediately from the decay behavior of the singular values. Let Ω_h be a quasi-uniform triangulation of the domain Ω of mesh size h and ∂Ω_h be the induced segmentation of the boundary ∂Ω. Further, let Y_h = P_1(∂Ω_h) ∩ H¹(∂Ω) ⊂ H^{-1/2}(∂Ω) be the space of piecewise linear finite elements on ∂Ω_h. Let Q_h be the L²-orthogonal projection onto Y_h and q ∈ H¹(∂Ω) arbitrary. Then standard approximation error estimates, see e.g. [4], yield ‖q − Q_h q‖_{H^{-1/2}(∂Ω)} ≲ h^{3/2} ‖q‖_{H¹(∂Ω)}. A-priori estimates for elliptic PDEs yield ‖U q‖_{H¹(Ω)} ≲ ‖q‖_{H^{-1/2}(∂Ω)}, and hence U can be continuously extended to an operator on H^{-1/2}(∂Ω); see e.g. [10]. This yields ‖U − U Q_h‖ ≲ h^{3/2} ≃ k^{-3/(2d-2)}, where k ≃ h^{-(d-1)} is the dimension of the space Y_h. From the min-max characterization (2.3) of the singular values, we may therefore conclude that σ_{k,U} ≲ k^{-3/(2d-2)}, as required. The result for σ_{k,V} follows in the same way.
Remark 3.2. If prior knowledge supp(c) ⊂⊂ Ω on the support of the fluorophore concentration is available, which is frequently the case in practice, elliptic regularity [10] implies exponential decay of the singular values σ_{k,U} and σ_{k,V}. In such a situation, the ranks K and N in Lemmas 2.4 and 2.7 depend only logarithmically on the noise level δ, and an accurate approximation T_N of very low rank can be found.

4. Algorithmic realization and complexity estimates.
We will now discuss in detail the implementation of the model reduction approach presented in Section 2 for the fluorescence optical tomography problem and demonstrate its viability by some preliminary considerations. For ease of presentation, we consider a simple two-dimensional test problem. Our observations, however, carry over almost verbatim to three-dimensional problems of similar dimensions.

4.1. Problem setup.
For the discretization of (3.1)–(3.2) and (3.9)–(3.10), we use a standard finite element method with continuous piecewise linear polynomials. The computational meshes used for the truth approximations are obtained by successive uniform refinement of the initial mesh, leading to quasi-uniform conforming triangulations Ω_h of the domain Ω, with h > 0 denoting the mesh size. The corresponding spaces U_h, V_h ⊂ H¹(Ω) then each have dimension m ≃ h^{-d}. For our test problem, we have d = 2, since we consider a two-dimensional setting. We choose the same finite element space X_h also for the approximation of the concentration c. The sources q_x, q_m for the forward and the adjoint problem are approximated by piecewise linear functions on the boundary of the same mesh Ω_h; hence Y_h, Z_h ⊂ H¹(∂Ω) have dimension k ≃ h^{-(d-1)}. All approximation spaces are equipped with the topologies induced by their infinite dimensional counterparts. Standard error estimates allow us to quantify the discretization errors in the resulting truth approximation of the forward operator and to establish the δ-approximation property for h small enough. The error introduced by the discretization can therefore be assumed to be negligible.
A sketch of the domain Ω ⊂ R² and the coarsest mesh Ω_h used for our computations, as well as the parameter c† to be identified, are depicted in the accompanying figure. The characteristic dimensions of the relevant function spaces after discretization can be deduced from Table 4.1. The numbers in the table also illustrate the assumption k < m < k² and show that the discretized inverse problem is formally overdetermined. Note that an approximation on level ref ≥ 4 is required to guarantee a discretization error below the relevant noise level.

4.2. Truth approximation.
Let us briefly discuss in a bit more detail the algebraic structure of the problems arising in the truth approximation. Under the considered setup, the finite element approximation of problem (3.1)–(3.2) leads to the linear system (K_x + M_x + R_x) U = E_x Q_x. (4.1) Here K_x, M_x ∈ R^{m×m} are the stiffness and mass matrices with coefficients κ_x, µ_x, and the matrices R_x ∈ R^{m×m}, E_x ∈ R^{m×k} stem from the discretization of the boundary conditions. The columns of the regular matrix Q_x ∈ R^{k×k} represent the individual independent sources in the basis of Y_h. Any excitation generated by a source in Y_h can thus be expressed as a linear combination of columns of the excitation matrix U ∈ R^{m×k}, which serves as a discrete counterpart of the operator U. In a similar manner, the discretization of the adjoint problem (3.9)–(3.10) leads to (K_m + M_m + R_m) V = E_m Q_m, (4.2) whose solution matrix V ∈ R^{m×k} can be interpreted as the discrete counterpart of the operator V. The system matrices K_m, M_m, R_m, and E_m have a similar meaning as above, and the columns of Q_m represent the individual detector characteristics. Recall from Subsection 1.1 that k = k_Y = k_Z only to simplify the exposition. The algebraic form of the truth approximation finally reads T(c) = Vᵀ D(c) U, (4.3) where D(c) ∈ R^{m×m} is the matrix representation of the finite element approximation of the operator D(c_h), with c ∈ R^m denoting the coordinates of the function c_h ∈ X_h. The discrete measurement M_{ij} = (Vᵀ D(c) U)_{ij} = V(:, i)ᵀ D(c) U(:, j) then approximates the data taken by the ith detector for excitation with the jth source.
From the particular form (4.3) of the forward operator, one can see that only the matrices U, V, and a routine for the application of D(c) are required to evaluate T(c). In our example, the application of D(c) amounts to multiplication by a diagonal matrix, which leads to a complexity of O(k^2 m) flops for the application and a memory requirement of O(2km + m) bytes for the forward operator. Note that the matrices U and V can now be stored on a standard workstation, while the application of the forward operator is still too compute-intensive to be useful for the efficient solution of the inverse problem under consideration. Remark 4.1. Note that we simply have ‖M‖_HS = ‖M‖_F if the columns of Qx, Qm are chosen orthonormal with respect to the SY and SZ inner products right from the beginning. We will use this fact in our numerical tests below.
Remark 4.2. Using multigrid solvers, the matrices U and V can be computed in O(mk) operations [11], which is, at least asymptotically, negligible compared to the application of T(c) in tensor product form. In our computational tests, we utilize sparse direct solvers for the computation of U and V, for which the computational cost is O(m^{3/2} + km log(m)). Since m^{3/2} ≤ mk and log(m) ≤ k in our two-dimensional setting, this is still of lower complexity than even a single evaluation of T(c). Note that some slight modifications would be required here if the source and detector matrices Qx and Qm were not chosen as identity matrices. The columns of Ax and Am are orthogonal with respect to the SY and SZ scalar products and thus define bases of the discrete source and detector spaces. After appropriate scaling, the columns can be assumed to be normalized such that Ax'*SY*Ax and Am'*SZ*Am equal the identity matrix. To simplify the subsequent discussion, we change the definition of the sources and detectors as well as of the excitation and emission matrices, and redefine the forward operator according to
Qx=Qx*Ax; U=U*Ax; Qm=Qm*Am; V=V*Am; T=@(c) V'*D(c)*U;
The columns of Qx and Qm are now orthogonal with respect to the SY and SZ scalar products, and as a consequence, the Hilbert-Schmidt norm in the measurement space amounts to the Frobenius norm of M=T(c); see Remark 4.1 for details.
The memory cost for storing AY = U'*SX*U and AZ = V'*SX*V amounts to O(k^2) bytes, while the computation of the matrix products requires O(k^2 m) flops. The complexity of the eigenvalue decompositions finally is O(k^3) flops. Note that the setup of the matrices AY and AZ has the same cost as a single evaluation T(c) of the forward operator. The additional memory required for storing the k × k matrices AY and AZ is negligible.
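The assembly of the small Gram matrix and its eigenvalue decomposition can be sketched as follows in NumPy (a stand-in for the paper's Matlab code; the mass matrix SX is replaced by a hypothetical diagonal weight matrix, and U is random rather than an actual excitation matrix):

```python
import numpy as np

m, k = 200, 6
rng = np.random.default_rng(1)
U = rng.standard_normal((m, k))
SX = np.diag(rng.random(m) + 0.5)   # stand-in for the mass matrix of X_h

# k-by-k Gram matrix: O(k^2 m) flops to assemble, O(k^2) bytes to store.
AY = U.T @ SX @ U

# Eigenvalue decomposition at O(k^3) cost; the eigenvalues are the
# squared singular values of U with respect to the SX inner product.
dx, Ax = np.linalg.eigh(AY)

assert np.allclose(AY, AY.T)        # Gram matrix is symmetric
assert dx.min() >= -1e-8            # and positive semidefinite
```

The analogous computation with V and AZ yields the eigenvector matrix Am used for orthogonalizing the detectors.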

4.3.2.
Low-rank approximations for U and V. The eigenvalues computed in the decompositions above correspond to the squares of the singular values of U and V. We here allow for different ranks in the two approximations and define truncation indices by
dx=diag(Dx); xKK=find(dx>delta^2); xK=length(xKK);
dm=diag(Dm); mKK=find(dm>delta^2); mK=length(mKK);
We could further set K=max(xK,mK) to stay exactly with the notation used in Section 2, but we stress that our implementation works in full generality. The low-rank approximations for U and V and the resulting tensor product approximation of the forward operator T(c) are then given by
QxK=Qx(:,xKK); UK=U(:,xKK); QmK=Qm(:,mKK); VK=V(:,mKK); TKK=@(c) VK'*D(c)*UK;
Observe that the measurements MKK=TKK(c) obtained by this approximation correspond to a sub-block of the full measurements, i.e., MKK=M(mKK,xKK).
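The truncation rule can be sketched in NumPy as follows (hypothetical eigenvalue sequences, standing in for diag(Dx) and diag(Dm) from the actual decompositions):

```python
import numpy as np

# Hypothetical eigenvalues (squared singular values), sorted descending.
dx = np.array([1.0, 1e-2, 1e-5, 1e-9])
dm = np.array([2.0, 3e-3, 4e-6, 1e-8, 1e-12])
delta = 1e-3

# Keep all components whose eigenvalue exceeds delta^2, i.e. whose
# singular value exceeds delta.
xKK = np.flatnonzero(dx > delta**2)
mKK = np.flatnonzero(dm > delta**2)
xK, mK = len(xKK), len(mKK)

assert xK == 3 and mK == 3   # three components survive in each case here
```

Slicing U and V with these index sets (UK = U[:, xKK], VK = V[:, mKK]) then gives the low-rank factors of the tensor product approximation.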

4.3.3.
Hyperbolic cross approximation. The proof of Lemma 2.5 shows that we may replace the tensor product operator TKK(c) by the hyperbolic cross approximation TK(c), which takes into account only the entries M(k,l)=MKK(k,l) of the measurements M=T(c) with indices k · l ≤ N ≃ δ^{-β}. In our computations, we actually replace N by K, i.e., we utilize the hyperbolic cross approximation TK(c) of TKK(c). The assembly of the matrix representation AK for TK(c)=AK*c then reads
m=0;
for k=1:K
  for l=1:floor(K/k)
    m=m+1; AK(m,:)=(VK(:,k)'.*UK(:,l)')*DD;
  end
end
Here, DD is a diagonal matrix representing the numerical integration over the computational domain. Let us note that AK, and thus also the operator TK(c), no longer has a tensor product structure; therefore the measurements MK=TK(c) are stored as a column vector rather than a matrix. The norm in the reduced measurement space then is the Euclidean vector norm. Also note that the construction of AK and TK only requires access to the matrices UK and VK defining the operator TKK.
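A NumPy version of this assembly loop might look as follows (random stand-ins for UK, VK, and the quadrature matrix DD; the 1-based loop indices of the Matlab code are kept for readability):

```python
import numpy as np

m, K = 30, 5
rng = np.random.default_rng(2)
VK = rng.standard_normal((m, K))
UK = rng.standard_normal((m, K))
DD = np.diag(rng.random(m) + 0.1)   # quadrature weights on the mesh

rows = []
for kk in range(1, K + 1):              # keep index pairs with kk * ll <= K
    for ll in range(1, K // kk + 1):
        # row = elementwise product of the two modes, weighted by DD
        rows.append((VK[:, kk - 1] * UK[:, ll - 1]) @ DD)
AK = np.vstack(rows)

# number of retained pairs equals sum_k floor(K/k)
assert AK.shape == (sum(K // kk for kk in range(1, K + 1)), m)
```

For K = 5 this keeps 10 of the 25 tensor product entries, illustrating how the hyperbolic cross thins out the index set.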
The tensor product approximation TKK(c)=VK'*D(c)*UK requires only the sub-blocks UK, VK of the excitation and emission matrices U, V, and therefore no additional memory cost arises in setting up this approximation. To achieve a δ-approximation with δ = 10^{-3}, for instance, we expect to require approximately K = 100 singular components of U and V; see Lemma 3.1 for details. The tensor product approximation will then have rank K^2 = 10^4. For the hyperbolic cross approximation TK, however, we expect to require only approximately 2K = 200 components of the tensor product approximation TKK; compare with Lemma 2.5. In Table 4.2 we summarize the expected memory and computational cost for the corresponding approximations.

Table 4.2. Memory and computational cost for storing and applying the tensor product approximation TKK of rank K^2 = 10^4 and the corresponding hyperbolic cross approximation TK of rank 2K = 200. The theoretical memory and computational cost are mem(TKK)=mem(TK)=16Km bytes, ops(TKK(c))=Km + K^2 m flops, and ops(TK(c))=2Km flops, respectively.

Note that the tensor product structure allows the tensor product approximation TKK to be stored as efficiently as the hyperbolic cross approximation TK. The application of the latter is, however, substantially more efficient.

Final recompression.
The last step in our model reduction approach consists in a further compression of the hyperbolic cross approximation TK of the tensor product operator TKK; cf. Subsection 2.4 for details. This can be realized by
AKt=DD\AK'; AKAKt=AK*AKt; [VA,DA]=eigs(AKAKt,NK);
where NK=size(AK,1) is the number of terms used in the hyperbolic cross approximation. The recompression then consists in selecting the largest entries, i.e.,
N=find(diag(DA)>delta^2); AN=VA(:,N)'*AK; ANt=DD\AN';
Matrix representations for the projection operators Q_{K,K}, P_K, and P_N, corresponding to the tensor product, the hyperbolic cross, and the final approximation, can be assembled easily from the eigenvectors computed during the construction.
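The recompression step can be sketched in NumPy as follows (a random stand-in for the hyperbolic-cross matrix AK and a hypothetical diagonal weight vector dd for DD; note that the reduced operator is formed as VA(:,N)'*AK so that the dimensions work out):

```python
import numpy as np

NK, m = 12, 40                      # hyperbolic-cross terms, mesh dofs
rng = np.random.default_rng(3)
AK = rng.standard_normal((NK, m))
dd = rng.random(m) + 0.1            # diagonal of the quadrature matrix DD

AKt = (AK / dd).T                   # DD \ AK'
B = AK @ AKt                        # NK-by-NK, independent of the mesh size
ev, VA = np.linalg.eigh(B)          # full decomposition of the small matrix

delta = 1e-3
keep = np.flatnonzero(ev > delta**2)
AN = VA[:, keep].T @ AK             # final reduced operator
ANt = (AN / dd).T

assert AN.shape == (len(keep), m)
```

Since the eigenvectors are orthonormal, AN*ANt is diagonal with the retained eigenvalues on its diagonal, which is what makes the final regularized solve so cheap.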
As before, the compression is based on singular value decompositions of operators, realized via the solution of generalized eigenvalue problems for the matrices BKK=AKK*(DD\AKK') and BK=AK*(DD\AK'), respectively, where AKK and AK are the matrix representations of the operators TKK(c) and TK(c). The computational cost for the assembly of BKK and BK is listed in Table 4.3. For an evaluation of the computational complexity, we again assume that TKK has rank K^2 = 10^4 and that TK is of rank 2K = 200. Let us note that the memory required for storing BKK and BK is independent of the mesh size; for the setting considered here, it is given by mem(BKK)=745MB and mem(BK)=0.3MB. Assuming that an eigenvalue decomposition of an n × n matrix needs roughly ops(eig)=50n^3 operations, we obtain ops(eig(BKK))=46 566 Gflops and ops(eig(BK))=0.373 Gflops. Even if a computationally more efficient low-rank approximation [14,35] for the tensor product operator TKK were used, the evaluation of TKK(c) would remain rather expensive; see Table 4.2 for details. Therefore, the tensor product approximation TKK is not really useful for the computation of low-rank approximations on large computational meshes. As shown in Section 2, a quasi-optimal approximation TN can also be computed by truncation of the singular value decomposition of the hyperbolic cross approximation TK, which does not require any additional computations.

Online phase.
After the construction of the low-rank approximation TN(c) as outlined above, the actual solution of the inverse problem consists of three basic steps; see Section 1 for a brief explanation. The first step is the data compression, which can be expressed as MN=PN*vec(MKK) with MKK=QmK'*M*QxK. Here we make explicit use of the tensor product structure, which allows us to compress the data efficiently already during recording. After this, only the second projection PN has to be applied. The additional memory required for computing MKK is O(kK) bytes for each of QxK, QmK, and QmK'*M, and thus negligible. Note that storing the full data M requires O(k^2) bytes, which is substantially more. The computational cost of the data compression step is O(K^2 k + k^2 K) flops. As mentioned before, the data can be partially compressed already during recording, so that access to the full data is actually never required.
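The two-stage compression can be sketched in NumPy as follows (random stand-ins for the data and the projection matrices; the transposed application of QmK and QxK is an assumption consistent with their columns being orthonormal modes):

```python
import numpy as np

k, Kc, N = 8, 3, 4   # detectors/sources k, kept components Kc, reduced dim N
rng = np.random.default_rng(4)

M = rng.standard_normal((k, k))        # full measurement matrix
QmK = rng.standard_normal((k, Kc))     # retained detector modes
QxK = rng.standard_normal((k, Kc))     # retained source modes
PN = rng.standard_normal((N, Kc * Kc)) # projection onto the final reduced space

# Stage 1: project onto the retained modes, O(K^2 k + k^2 K) flops;
# this can already be done while the data are being recorded.
MKK = QmK.T @ M @ QxK
# Stage 2: apply the second projection to the vectorized block.
MN = PN @ MKK.flatten(order="F")       # vec(MKK), column-major as in Matlab

assert MN.shape == (N,)
```

Only the small vector MN of length N enters the subsequent regularized solve.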
The final compression MN=PN*vec(MKK) is independent of the system dimensions m, k, and its computational cost is therefore negligible. The same applies to the solution of the regularized inverse problem zadN=(AN*ANt+alpha*I)\MN, which is the second step of the online phase and depends only on the dimension N of the reduced model.
The synthesis of the solution according to (1.7) can finally be realized by the simple multiplication cadN=ANt*zadN, where ANt denotes the matrix representation of the adjoint of the fully reduced forward operator TN. The additional memory required for ANt is O(mN) bytes, and the computation of cadN can be accomplished in O(mN) flops. Thus, as claimed in the introduction, the most compute-intensive part of the online phase is the data compression, even if the tensor product structure is utilized to compress the data already during recording.
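The second and third online steps together amount to one small solve and one tall matrix-vector product, which the following NumPy sketch illustrates (random stand-ins for AN, the weights dd of DD, and the compressed data MN):

```python
import numpy as np

N, m = 5, 60
rng = np.random.default_rng(5)
AN = rng.standard_normal((N, m))      # reduced forward operator
dd = rng.random(m) + 0.1              # diagonal of the quadrature matrix DD
ANt = (AN / dd).T                     # DD \ AN', the discrete adjoint
MN = rng.standard_normal(N)           # compressed data
alpha = 1e-8

# Step 2: solve the N-by-N regularized normal equations directly.
zadN = np.linalg.solve(AN @ ANt + alpha * np.eye(N), MN)
# Step 3: synthesize the reconstruction on the mesh, O(mN) flops.
cadN = ANt @ zadN

assert cadN.shape == (m,)
```

Only step 3 touches the mesh dimension m; step 2 is fully mesh-independent.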

Computational results.
We now illustrate the practical performance of our model reduction approach for the test problem introduced in the previous section. For comparison, we also report corresponding results for traditional iterative methods for solving the inverse problem (1.1)-(1.2), as well as for methods based on a tensor product approximation. A snapshot of the geometry, the exact solution, and a typical reconstruction is depicted in Figure 4.1.
For our numerical tests, the model parameters are set to κ_x = 1, µ_x = 0.2, ρ_x = 10 and κ_m = 2, µ_m = 0.1, ρ_m = 10. We further assume prior knowledge that c is supported in a circle of radius 0.9, i.e., the distance of its support to the boundary ∂Ω is at least 0.1. The singular values of the operators U and V, as well as those of T, can thus be assumed to decay exponentially. The noise level in (1.3) is set to δ = 10^{-5}, and the regularization parameter α is chosen from {10^{-n} : n ∈ N} via the discrepancy principle. In all our computations, this led to α = 10^{-8}, which complies with the theoretical predictions for exponentially ill-posed problems [9].
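The parameter choice rule just mentioned can be sketched as follows; this is a minimal, generic version of the discrepancy principle over the grid {10^{-n}}, with a hypothetical safety factor tau and a hypothetical residual model standing in for the actual Tikhonov residual of the paper:

```python
import numpy as np

def discrepancy_choice(residual_norm, delta, tau=1.5, n_max=16):
    """Return the first alpha = 10^{-n} whose residual drops below
    tau * delta (a sketch of the discrepancy principle; tau, n_max
    are hypothetical choices)."""
    for n in range(n_max + 1):
        alpha = 10.0 ** (-n)
        if residual_norm(alpha) <= tau * delta:
            return alpha
    return 10.0 ** (-n_max)

# Hypothetical monotone residual model, for illustration only.
res = lambda alpha: alpha ** 0.5
alpha = discrepancy_choice(res, delta=1e-5)
assert res(alpha) <= 1.5e-5
```

In practice residual_norm(alpha) would evaluate the data misfit of the regularized solution for the given alpha.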
Let us note that, in order to ensure sufficient accuracy ‖T_h − T‖_{L(X;HS(Y,Z′))} ≤ δ of the truth approximation, one should utilize a discretization of the forward operator on mesh level ref ≥ 4 for the reconstruction; see Table 4.1 and Section 1. For the evaluation of computational performance, we however also report results on coarser meshes.
All computations are performed on a workstation with an Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz and 768GB of memory. In our tests, we use only a single core of the processor and an implementation in Matlab 9.6.0.

Problem initialization.
This step consists of setting up the excitation and emission matrices U, V, which are required for the efficient evaluation of T(c). In Table 5.1, we also report on the singular value decompositions of U and V, by which we orthogonalize the sources and detectors; as mentioned in Subsection 4.3, this is required for the computation of the Hilbert-Schmidt norm ‖M^δ‖_{HS(Y,Z′)} of the measurement operator. While the theoretical complexity of the first and third steps is somewhat smaller than that of the second and fourth steps, the overall computation times for the individual steps in the setup phase are comparable.
Note that the computations reported in Table 5.1 are required for the solution of the inverse problem (1.1), independent of the particular solution strategy, and they can be performed in a pre-processing step.

5.2. Model reduction - offline phase.
The singular values computed in the decompositions of U and V allow us to determine the truncation indices xK and mK used to define the δ-approximations UK=U(:,xKK) and VK=V(:,mKK). The values of xK and mK obtained in our numerical tests are reported in Table 5.2. On the coarsest mesh, the number of possible excitations and detectors is limited by the number of boundary vertices, but otherwise the truncation indices xK and mK are almost independent of the truth approximation. This is to be expected, since the eigenvalues converge with increasing refinement of the mesh.

Forward evaluation.
As outlined in Subsection 4.3, the full operator and its tensor product approximation can now simply be defined by T=@(c) V'*D(c)*U and TKK=@(c) VK'*D(c)*UK. In Table 5.3, we report the computation times for a single evaluation of these operators. As can be seen, even the problem-adapted evaluation of the full operator T(c) is practically useless for the solution of the inverse problem (1.1). The tensor product approximation TKK(c), which is the underlying approximation for methods based on optimal sources [21] or on the Khatri-Rao product [24], seems somewhat better suited but, as we will see below, may still not be appropriate for the efficient solution of the inverse problem.

Truncated singular value decomposition.
As a theoretical reference for model reduction, we consider the low-rank approximation of T(c) and TKK(c) by truncated singular value decomposition, which can be computed via eigenvalue decompositions of the symmetric operators T(Tt(M)) and TKK(TKKt(MKK)). These can be computed numerically by the eigs routine of Matlab in a matrix-free way, i.e., requiring only the application of the operators T(c), TKK(c) and their adjoints Tt(M), TKKt(M). The sum of xK and mK specifies the maximal number of eigenvalues to be considered by the algorithm. In Table 5.4, we display the computation times of the eigenvalue solvers and the number N of relevant eigenvalues required to obtain a δ-approximation. The computation times for the decomposition of the full operator T increase roughly by a factor of 8 per refinement, while those for the tensor product approximation increase only by a factor of 4. Computations taking longer than 1000 sec were not conducted. Due to the substantially smaller rank, the evaluation TN(c) of the low-rank approximations resulting from one of the singular value decompositions above is faster by a factor of more than 100 compared to that of the tensor product approximation TKK(c), and even on the finest mesh it only takes about 0.01 sec. Let us recall that a discretization at mesh level ref ≥ 4 is required to guarantee sufficient accuracy ‖T_h − T‖_{L(X;HS(Y,Z′))} ≤ δ of the truth approximation T_h used for the solution of the inverse problem.
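The matrix-free idea behind the eigs computation can be illustrated in NumPy with a simple power iteration in place of eigs (random stand-ins for U, V; only applications of T and its adjoint Tt are used, never an assembled matrix, and the dense cross-check at the end is for verification only):

```python
import numpy as np

m, k = 80, 5
rng = np.random.default_rng(6)
U = rng.standard_normal((m, k))
V = rng.standard_normal((m, k))

def T(c):                              # mesh function c -> k-by-k measurements
    return V.T @ (c[:, None] * U)      # V' * D(c) * U with diagonal D(c)

def Tt(M):                             # adjoint: measurements -> mesh function
    return np.einsum("pi,ij,pj->p", V, M, U)

# Matrix-free power iteration for the largest eigenvalue of T o Tt,
# i.e. the largest squared singular value of T (a stand-in for eigs).
M = rng.standard_normal((k, k))
for _ in range(200):
    M = T(Tt(M))
    M /= np.linalg.norm(M)
lam = np.sum(M * T(Tt(M)))             # Rayleigh quotient

# Cross-check against the dense matrix representation of T.
A = np.einsum("pi,pj->ijp", V, U).reshape(k * k, m)
sigma_max = np.linalg.svd(A, compute_uv=False)[0]
assert np.isclose(lam, sigma_max**2, rtol=1e-6)
```

A Krylov method such as eigs converges much faster than this plain power iteration, but uses the operators in exactly the same black-box fashion.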

5.2.3.
Setup of reduced order model. As described in Section 2, we can utilize the hyperbolic cross approximation TK instead of the full tensor product approximation TKK without losing the δ-approximation property. In Table 5.5, we summarize the computation times for assembling the hyperbolic cross approximation TK and for the subsequent singular value decomposition used in the final recompression step. Note that the setup cost for the hyperbolic cross approximation increases roughly by a factor of 4 with each refinement, while the subsequent singular value decomposition and the ranks are essentially independent of the mesh level. Due to the moderate rank K = rank(TK) of the hyperbolic cross approximation, it pays off to compute the matrix representation of TKTKt and to use it for the subsequent eigenvalue decomposition. As can be seen from Table 5.5, the recompression step allows us to reduce the rank by another factor of about 5. As predicted by our theoretical investigations, the rank of the final approximation TN is comparable to that of the truncated singular value decomposition of the full operator T or of its tensor product approximation TKK; cf. Table 5.4. The use of the hyperbolic cross approximation TK instead of the full operator or its tensor product approximation, however, allows us to speed up the computation of the final low-rank approximation TN substantially. Again, the rank of the approximation becomes essentially independent of the mesh after some initial refinements, reflecting the mesh independence of our approach.

Solution of inverse problem -online phase.
We now turn to the online phase of the solution process. Iterative methods are used for the solution of the inverse problem with the full operator T and its tensor product approximation TKK. As mentioned before, we choose the regularization parameter α = 10^{-8}, which was determined by the discrepancy principle. For the computation of the regularized solution (1.7) with the full operator T(c) and the tensor product approximation TKK(c), we use Matlab's pcg routine with tolerance set to tol = αδ^2. Since the rank of the final reduced order model TN is rather small, we can use a direct solution of (1.7) by Matlab's backslash operator in that case.
In Table 5.6, we display the online solution times and the error err = ‖c_α^δ − c^†‖ obtained for the final iterate.

Table 5.6. Computation times (sec) for the solution of the inverse problem via Tikhonov regularization. Iterative methods are utilized for the solution of (1.4) in the first two cases, working with the operators T and TKK, while a direct solver is used for the low-rank approximation TN.

Approximately 1 800 iterations are required for the iterative solution of (1.4) with the full operator T and the tensor product approximation TKK on all mesh levels, which again illustrates the mesh independence of the algorithms. Note that even for the tensor product approximation, the iterative solution on fine meshes becomes practically infeasible, while for the low-rank approximation TN of quasi-optimal rank, the solution of the inverse problem remains extremely efficient up to the finest mesh. In Table 5.7, we discuss in more detail the computation times for the individual steps in (1.7), namely the data compression, the solution of the regularized normal equations, and the synthesis of the reconstruction. Similar online computation times are also obtained for the low-rank approximation computed by truncated singular value decomposition of the full operator T, since its rank and approximation properties are very similar to those of the approximation constructed by our approach. As announced in the introduction and predicted by our complexity estimates, the data compression step becomes the most compute-intensive task in the online solution via the low-rank reduced order model TN. While the data compression and synthesis steps depend on the dimension of the truth approximation, the solution of the regularized normal equations is completely independent of the computational mesh. Also observe that the quality of the reconstruction is not degraded by the use of a low-rank approximation in the solution process.
Overall, we thus obtained an extremely efficient, stable, and accurate reconstruction for fluorescence tomography.
6. Summary. A novel approach towards the systematic construction of approximations for high-dimensional linear inverse problems with operator valued data was proposed, yielding certified reduced order models of quasi-optimal rank. The approach was fully analyzed in a functional analytic setting, and the theoretical results were illustrated by an application to fluorescence optical tomography. The main advantages of our approach, compared to more conventional low-rank approximations like the truncated singular value decomposition, lie in a vastly improved setup time and in the possibility to partially compress the data already during recording. In particular, the computational effort of setting up the reduced order model TN is comparable to that of one single evaluation T(c) of the forward operator. Due to the underlying tensor product approximation, access to the full data M^δ is not required.
The most compute-intensive part of the offline phase consists in the setup of the discrete representations of U and V as well as their eigenvalue decompositions. A closer investigation and the use of parallel computation could certainly further improve the computation times for this step. Further acceleration of the data compression and synthesis steps could probably be achieved by using graphics hardware. The low-dimensional reduced order models obtained in this paper may also serve as preconditioners for the iterative solution of related nonlinear inverse problems, which would substantially broaden the field of potential applications.