1 Introduction

Radiative transfer models describe the streaming, absorption, and scattering of radiation waves propagating through a turbid medium occupying a bounded convex domain \(R\subset \mathbb {R}^d\), and they arise in a variety of applications, e.g., neutron transport [11, 35], heat transfer [39], climate sciences [20], geosciences [38], or medical imaging and treatment [2, 4, 45]. The underlying physical model can be described by the anisotropic radiative transfer equation,

$$\begin{aligned} s \cdot \nabla _ru(s,r) + \sigma _t(r) u(s,r)&= \sigma _s(r)\int _S k(s\cdot s')u(s',r)ds' + q(s,r). \end{aligned}$$
(1)

The specific intensity \(u=u(s,r)\) depends on the position \(r\in R\) and the direction of propagation described by a unit vector \(s\in S\), i.e., we assume a constant speed of propagation. The medium is characterized by the total attenuation coefficient \(\sigma _t=\sigma _a+\sigma _s\), where \(\sigma _a\) and \(\sigma _s\) denote the absorption and scattering rates, respectively. The scattering phase function k relates pre- and post-collisional directions, and, as a representative example, we consider the Henyey-Greenstein phase function

$$\begin{aligned} k(s\cdot s')=\frac{1}{4\pi }\frac{1-g^2}{{[1-2g(s\cdot s')+g^2]}^{3/2}}, \end{aligned}$$
(2)

with anisotropy factor g. For \(g=0\), we speak of isotropic scattering, and for g close to one, we say that the scattering is (highly) forward peaked. For simplicity, we assume \(0\le g<1\) in the following; the case \(-1<g\le 0\) is similar. Internal sources of radiation are modeled by the function \(q\). Introducing the outer unit normal vector field \(n(r)\) on \(\partial R\), the boundary condition is given by

$$\begin{aligned} u(s,r) = f(s,r) \quad \text {for } (s,r)\in S\times \partial R \text { such that } s\cdot n(r)<0. \end{aligned}$$
(3)
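As a quick numerical sanity check of the phase function (2), the following Python snippet (our own illustration; the function name `hg_phase` and the quadrature order are our choices) verifies that the Henyey-Greenstein kernel integrates to one over the unit sphere and that its degree-\(l\) Legendre coefficient equals \(g^l\), the eigenvalues of the scattering operator encountered later in the paper.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, Legendre

def hg_phase(mu, g):
    """Henyey-Greenstein phase function k(s.s') from Eq. (2), with mu = s.s'."""
    return (1.0 - g**2) / (4.0 * np.pi * (1.0 - 2.0 * g * mu + g**2)**1.5)

g = 0.8
mu, w = leggauss(64)                       # Gauss-Legendre nodes/weights on [-1, 1]

# Normalization over the unit sphere: the azimuthal integral contributes 2*pi.
norm = 2.0 * np.pi * np.sum(w * hg_phase(mu, g))
print(norm)                                # ≈ 1.0

# The l-th Legendre coefficient 2*pi * int_{-1}^{1} k(mu) P_l(mu) dmu equals g**l,
# the eigenvalue of the scattering operator on spherical harmonics of degree l.
for l in range(4):
    Pl = Legendre.basis(l)(mu)
    coeff = 2.0 * np.pi * np.sum(w * hg_phase(mu, g) * Pl)
    print(l, coeff)                        # ≈ g**l
```

The same quadrature check works for any admissible \(0\le g<1\); the coefficients decay geometrically like \(g^l\), which is the reason forward-peaked scattering retains many angular modes.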

In this paper we consider the iterative solution of the linear systems arising from the discretization of the anisotropic radiative transfer equations (1)–(3) by preconditioned Richardson iterations. We are particularly interested in methods that converge robustly across physical regimes, i.e., that can handle ballistic regimes (\(\sigma _s\ll 1\)) as well as diffusive regimes (\(\sigma _s\gg 1\) and \(\sigma _a>0\)) and highly forward peaked scattering, as occurs, for example, in medical imaging applications [22]. Due to the size of the arising systems of linear equations, their numerical solution is challenging, and a variety of methods have been developed, as briefly summarized next.

1.1 Related Work

Since for realistic problems analytical solutions are not available, numerical approximations are required. Common discretization methods can be classified into two main approaches based on their semidiscretization in \(s\). The spherical harmonics method [5, 19, 35] approximates the solution \(u\) by a truncated series of spherical harmonics, which allows for spectral convergence for smooth solutions. For non-smooth solutions, which is the generic situation, local approximations in \(s\) can be advantageous, which is achieved, e.g., by discrete ordinates methods [26, 35, 43, 44, 46], continuous Galerkin methods [7], the discontinuous Galerkin (DG) method [24, 32, 40], iteratively refined piecewise polynomial approximations [13], or hybrid methods [12, 30].

A common step in the solution of the linear systems resulting from local approximations in \(s\) is to split the discrete system into a transport part and a scattering part. While the inversion of transport is usually straightforward, scattering introduces a dense coupling in \(s\). The Richardson iteration resulting from this splitting is called the source iteration [1, 37], and it converges linearly with rate \(c=\Vert \sigma _s/\sigma _t\Vert _\infty \). For scattering-dominated problems, such as the biomedical applications mentioned above, we have \(c\approx 1\), and the convergence of the source iteration becomes impractically slow. Acceleration of the source iteration can be achieved by preconditioning, which usually employs the diffusion approximation to (1)–(3) [1]; the resulting scheme is called the diffusion synthetic accelerated (DSA) source iteration [1]. Although this approach is well motivated by asymptotic analysis, it faces several issues, such as a proper generalization to multi-dimensional problems with anisotropy, strong variations in the optical parameters, or the use of unstructured and curved meshes, see [1].

Effective DSA schemes rely on a consistent discretization of the corresponding diffusion approximation, see [40, 48] for isotropic scattering, and [41] for two-dimensional problems with anisotropic scattering. The latter employs a modified interior penalty DG discretization for the corresponding diffusion approximation, which has also been used in [47], where, however, the DSA scheme is found to become less effective for highly heterogeneous optical parameters. A discrete analysis of DSA schemes for high-order DG discretizations on possibly curved meshes, which may complicate the inversion of the transport part, can be found in [28]. In the variational framework of [40], consistency is achieved automatically by subspace correction instead of by finding a consistent discretization of the diffusion approximation. This variational treatment made it possible to prove convergence of the corresponding iteration, and numerical results showed robust contraction rates, even in multi-dimensional calculations with heterogeneous optical parameters.

It is the purpose of this paper to generalize the approach of [40] to the anisotropic scattering case, which requires non-trivial extensions as outlined in the next section.

1.2 Approach and Contribution

In this paper we focus on the construction of robustly and provably convergent efficient iterative schemes for the radiative transfer equation with anisotropic scattering. To describe our approach, let us introduce the linear system that we need to solve, which stems from a mixed finite element discretization of (1)–(3) using discontinuous polynomials on the sphere [17, 40], i.e.,

$$\begin{aligned} \begin{bmatrix} \mathbf {R}+ \mathbf {M}^{\!+}&{}\quad -\mathbf {A}\!^{\intercal }\\ \mathbf {A}&{}\quad \mathbf {M}^{\!-}\end{bmatrix}\begin{bmatrix}\mathbf {u}^{\!+}\\ \mathbf {u}^{\!-}\end{bmatrix} = \begin{bmatrix} \mathbf {K}^{\!+}&{}\\ &{}\mathbf {K}^{\!-}\end{bmatrix}\begin{bmatrix}\mathbf {u}^{\!+}\\ \mathbf {u}^{\!-}\end{bmatrix} + \begin{bmatrix}\mathbf {q}^{\!+}\\ \mathbf {q}^{\!-}\end{bmatrix}. \end{aligned}$$
(4)

Here, the superscripts refer to the even (‘\(+\)’) and odd (‘−’) parts from the underlying discretization. The matrices \(\mathbf {K}^{\!+}\) and \(\mathbf {K}^{\!-}\) discretize scattering, \(\mathbf {R}\) incorporates boundary conditions, \(\mathbf {M}^{\!+}\) and \(\mathbf {M}^{\!-}\) are mass matrices related to \(\sigma _t\), and \(\mathbf {A}\) discretizes \(s\cdot \nabla _r\); their assembly can be done with standard FEM codes. The even part satisfies the even-parity equations

$$\begin{aligned} \mathbf {E}\mathbf {u}^{\!+}= \mathbf {K}^{\!+}\mathbf {u}^{\!+}+ \mathbf {q}, \end{aligned}$$
(5)

i.e., the Schur complement of (4), with symmetric positive definite matrix \(\mathbf {E}=\mathbf {A}\!^{\intercal } (\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\mathbf {A}+\mathbf {M}^{\!+}+\mathbf {R}\) and source term \(\mathbf {q}=\mathbf {q}^{\!+}+\mathbf {A}\!^{\intercal } (\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\mathbf {q}^{\!-}\). Once the even part \(\mathbf {u}^{\!+}\) is known, the odd part \(\mathbf {u}^{\!-}\) can be obtained from (4). The preconditioned Richardson iteration considered in this article then reads

$$\begin{aligned} \mathbf {u}^{\!+}_{n+1}= \big (\mathbf {I}- \mathbf {P}_2 \mathbf {P}_1 (\mathbf {E}-\mathbf {K}^{\!+})\big )\mathbf {u}^{\!+}_n + \mathbf {P}_2\mathbf {P}_1 \mathbf {q}, \end{aligned}$$
(6)

with preconditioners \(\mathbf {P}_1\) and \(\mathbf {P}_2\). Compared to standard DSA source iterations, \(\mathbf {P}_1\) corresponds to a transport sweep, and a typical choice that renders the convergence behavior of (6) independent of the discretization parameters is \(\mathbf {P}_1=\mathbf {E}^{-1}\). More precisely, we show that this choice of \(\mathbf {P}_1\) yields a contraction rate of \(c=\Vert \sigma _s/\sigma _t\Vert _\infty \). The second preconditioner \(\mathbf {P}_2\) aims to improve the convergence behavior in diffusive regimes, \(c\approx 1\). In the spirit of [40], we construct \(\mathbf {P}_2\) via Galerkin projection onto suitable subspaces, which guarantees monotone convergence of (6). The construction of suitable subspaces that give good error reduction is motivated by the observation that error modes that are hardly damped by \(\mathbf {I}-\mathbf {P}_1(\mathbf {E}-\mathbf {K}^{\!+})\) can be approximated well by spherical harmonics of low degree, cf. Sect. 3.4. While for the isotropic case \(g=0\), spherical harmonics of degree zero, i.e., constants in angle, are sufficient for obtaining good convergence rates, we show that higher order spherical harmonics should be used for anisotropic scattering. To preserve consistency, we replace higher order spherical harmonics, which are the eigenfunctions of the integral operator in (1), by discrete eigenfunctions of \(\mathbf {K}^{\!+}\).

The efficiency of the proposed iterative scheme hinges on the ability to implement and apply the arising operators efficiently. For \(g=0\), it holds \(\mathbf {K}^{\!-}=0\), the matrix \(\mathbf {K}^{\!+}\) can be applied via the fast Fourier transform, and \(\mathbf {E}\) is block-diagonal with sparse blocks, which allows for an efficient application of \(\mathbf {E}\); for \(g>0\), the situation is more involved. We show that \(\mathbf {K}^{\!+}\) and \(\mathbf {K}^{\!-}\) can be applied efficiently by exploiting their Kronecker structure as the product of a sparse matrix and a dense matrix, where the dense factor can be applied efficiently using \(\mathcal {H}\)- or \(\mathcal {H}^2\)-matrix approximations, independently of g. As we show, the practical implementation of \(\mathcal {H}\)- or \(\mathcal {H}^2\)-matrices can be done with standard libraries, such as H2LIB [9] or BEMBEL [15]. This, in combination with standard FEM assembly routines for the other matrices, ensures robustness and maintainability of the code.
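The Kronecker trick alluded to above can be illustrated with plain dense NumPy matrices (a minimal sketch; the dimensions and matrices \(K_1\), \(K_2\) are arbitrary stand-ins, and the \(\mathcal {H}\)-matrix compression of the dense factor is not reproduced): applying \(K_1\otimes K_2\) to a vector never requires forming the Kronecker product, since with row-major vectorization \((K_1\otimes K_2)\,\mathrm {vec}(X)=\mathrm {vec}(K_1 X K_2^{\intercal })\).

```python
import numpy as np

rng = np.random.default_rng(0)
n_ang, n_sp = 30, 40                       # angular and spatial dimensions (arbitrary)
K1 = rng.standard_normal((n_ang, n_ang))   # dense factor (angular coupling)
K2 = rng.standard_normal((n_sp, n_sp))     # sparse in practice; dense here for brevity
x = rng.standard_normal(n_ang * n_sp)

# Naive application: form the Kronecker product explicitly, O((n_ang*n_sp)^2) work.
y_naive = np.kron(K1, K2) @ x

# Structured application: (K1 ⊗ K2) x = vec(K1 @ X @ K2.T) with row-major vec(.),
# costing only two matrix products on the small factors.
X = x.reshape(n_ang, n_sp)
y_fast = (K1 @ X @ K2.T).ravel()

print(np.allclose(y_naive, y_fast))        # True
```

In the setting of the paper, the dense angular factor would additionally be replaced by an \(\mathcal {H}\)- or \(\mathcal {H}^2\)-matrix, so that its application also scales almost linearly.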

Since \(\mathbf {A}\), \(\mathbf {M}^{\!+}\), and \(\mathbf {R}\) are sparse and block diagonal, the main bottleneck in the application of \(\mathbf {E}\) is the application of \((\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\). Based on the tensor structure of \(\mathbf {K}^{\!-}\) and its spectral properties, we derive a preconditioner such that \((\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\) can be applied robustly in g in only a few iterations. Thus, we can apply \(\mathbf {E}\) in almost linear complexity. Efficiency of (6) is further increased by realizing \(\mathbf {P}_1=\mathbf {E}^{-1}\approx \mathbf {P}_1^l\) inexactly by employing a small, fixed number of l steps of an inner iterative scheme. We show that the condition number of \(\mathbf {P}_1^l\mathbf {E}\) is \(O((1-(cg)^{l})^{-1})\), which is robust in the limit \(c\rightarrow 1\). In contrast, the condition number of \(\mathbf {P}_1^l(\mathbf {E}-\mathbf {K}^{\!+})\) is \(O((1-c)^{-1})\), i.e., a straightforward iterative solution of the even-parity equations using a black-box solver, such as preconditioned conjugate gradients, is in general not robust for \(c\rightarrow 1\).

Summarizing, each step of our iteration (6) can be performed very efficiently. The iteration is provably convergent and numerical results show that the contraction rates are robust for \(c\rightarrow 1\). The result is a highly efficient numerical scheme for the solution of the even parity equations (5) and, thus, also for the overall system (4).

1.3 Outline

The structure of the paper is as follows: In Sect. 2 we recall the variational formulation that builds the basis of our numerical scheme and establish some spectral equivalences for the scattering operator, which are key to the construction of our preconditioners. In Sect. 3 we present iterative schemes for the even-parity equations of radiative transfer in Hilbert spaces, which, after discretization in Sect. 4, result in the schemes described in Sect. 1.2. Details of the implementation and its complexity are described in Sect. 5. Numerical studies of the performance of the proposed methods are presented in Sect. 7. The paper closes with a discussion in Sect. 8.

2 Preliminaries

In the following we recall the relevant functional analytic framework, state the corresponding variational formulation of the radiative transfer problem (1)–(3) and provide some analytical results about the spectrum of the scattering operator, which we will later use for the construction of our preconditioners.

2.1 Function Spaces

By \(L^2(M)\) we denote the usual Hilbert space of square integrable functions on a manifold M, and we denote by \((u,w)_M=\int _{M} uw\, dM\) the corresponding inner product and by \(\Vert u\Vert _{L^2(M)}\) the induced norm. For \(M=D=S\times R\), we write \(\mathbb {V}=L^2(D)\) and \((u,w)=(u,w)_D\). Functions \(w\in \mathbb {V}\) with weak derivative \(s \cdot \nabla _rw\in \mathbb {V}\) have a well-defined trace [36]. We restrict the natural trace space [36], and consider the weighted Hilbert space \(L^2(\partial D_\pm ;|s \cdot n|)\) of measurable functions w on

$$\begin{aligned} \partial D_\pm =\{(s,r)\in S\times \partial R: \pm s\cdot n(r)>0\} \end{aligned}$$

with \(|s \cdot n|^{1/2} w\in L^2(\partial D_\pm )\). For the weak formulation of (1)–(3) we use the Hilbert space

$$\begin{aligned} \mathbb {W}=\{w\in L^2(D): s \cdot \nabla _rw\in L^2(D),\,\, w_{\mid \partial D_-}\in L^2(\partial D_-;|s \cdot n|)\}, \end{aligned}$$

with corresponding norm \(\Vert w\Vert _\mathbb {W}^2=\Vert s \cdot \nabla _rw\Vert _{L^2(D)}^2+\Vert w\Vert _{L^2(D)}^2+\Vert w\Vert _{L^2(\partial D_-;|s \cdot n|)}^2\).

2.2 Assumptions on the Optical Parameters and Data

The data terms are assumed to satisfy \(q \in L^2(D)\) and \(f\in L^2(\partial D_-;|s \cdot n|)\). Absorption and scattering rates are non-negative and essentially bounded functions \(\sigma _a,\sigma _s\in L^\infty (R)\). We assume that the medium occupied by R is absorbing, i.e., that there exists a constant \(\gamma >0\) such that \(\sigma _a(r)\ge \gamma \) for a.e. \(r\in R\). Thus, the ratio between the scattering rate and the total attenuation rate \(\sigma _t=\sigma _a+\sigma _s\) is strictly less than one, \(c=\Vert \sigma _s/\sigma _t\Vert _\infty <1\).

2.3 Even–Odd Splitting

The space \(\mathbb {V}\) admits the orthogonal decomposition \(\mathbb {V}=\mathbb {V}^+\oplus \mathbb {V}^-\) into even and odd functions of the variable \(s\in S\). The even part \(u^{\!+}\) and the odd part \(u^{\!-}\) of a function \(u\in \mathbb {V}\) are defined a.e. by

$$\begin{aligned} u^{\pm }(s,r)=\frac{1}{2}(u(s,r)\pm u(-s,r)). \end{aligned}$$

Similarly, we denote by \(\mathbb {W}^{\pm }\) the corresponding subspaces of functions \(u\in \mathbb {W}\) with \(u\in \mathbb {V}^\pm \).
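The even-odd splitting can be illustrated numerically (our own sketch, using Gauss-Legendre nodes on \([-1,1]\) as a one-dimensional stand-in for the sphere): on a direction grid whose nodes come in antipodal pairs, the two parts recombine exactly to u and are orthogonal with respect to any quadrature that is symmetric under \(s\mapsto -s\).

```python
import numpy as np

# Gauss-Legendre nodes are symmetric about 0, so mu[i] = -mu[-1-i] and the
# reflection s -> -s corresponds to reversing the sample order.
mu, w = np.polynomial.legendre.leggauss(16)

u = np.exp(mu) + mu**3                     # some direction-dependent intensity
u_flip = u[::-1]                           # samples of u(-s)

u_even = 0.5 * (u + u_flip)                # even part, here cosh(mu)
u_odd  = 0.5 * (u - u_flip)                # odd part, here sinh(mu) + mu**3

print(np.allclose(u_even + u_odd, u))              # True: the splitting is exact
print(abs(np.sum(w * u_even * u_odd)) < 1e-12)     # True: parts are orthogonal
```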

2.4 Operator Formulation of the Radiative Transfer Equation

The weak formulation of (1)–(3) presented in [17] can be stated concisely using suitable operators and we refer to [17] for proofs of the corresponding mapping properties. Let \(u^{\!+},w^+\in \mathbb {W}^+\) and \(u^{\!-}\in \mathbb {V}^-\). The transport operator \(\mathcal {A}:\mathbb {W}^+\rightarrow \mathbb {V}^-\) is defined by

$$\begin{aligned} \mathcal {A}u^{\!+}=s \cdot \nabla _ru^{\!+}. \end{aligned}$$

Identifying the dual \(\mathbb {V}'\) of \(\mathbb {V}\) with \(\mathbb {V}\), the dual transport operator \(\mathcal {A}':\mathbb {V}^-\rightarrow (\mathbb {W}^+)'\) is defined by

$$\begin{aligned} \langle \mathcal {A}' u^{\!-},w^+\rangle =(\mathcal {A}w^+, u^{\!-}). \end{aligned}$$

Boundary terms are handled by the operator \(\mathcal {R}:\mathbb {W}^+\rightarrow (\mathbb {W}^+)'\) defined by

$$\begin{aligned} \langle \mathcal {R}u^{\!+},w^+\rangle =(|s \cdot n|u^{\!+},w^+)_{\partial D}. \end{aligned}$$

Scattering is described by the operator \(\mathcal {S}:L^2(S)\rightarrow L^2(S)\) defined by

$$\begin{aligned} (\mathcal {S}u)(s) = \int _{S} k(s\cdot s') u(s')ds', \end{aligned}$$

where k is the phase function defined in (2). In slight abuse of notation, we also denote the trivial extension of \(\mathcal {S}\) to an operator \(L^2(D)\rightarrow L^2(D)\) by \(\mathcal {S}\). We recall that \(\mathcal {S}\) maps even to even and odd to odd functions [17, Lemma 2.6], and so does \(\mathcal {K}:\mathbb {V}\rightarrow \mathbb {V}\) defined by

$$\begin{aligned} \mathcal {K}u = \sigma _s\mathcal {S}u. \end{aligned}$$

We denote by \(\mathcal {K}\) also its restrictions to \(\mathbb {V}^\pm \) and \(\mathbb {W}^+\), respectively. The spherical harmonics \(\{H^l_m: l\in \mathbb {N}_0, -l\le m\le l\}\) form a complete orthogonal system for \(L^2(S)\), and we assume the normalization \(\Vert H^l_m\Vert _{L^2(S)}=1\). Furthermore, \(H^{l}_m\) is an eigenfunction of \(\mathcal {S}\) with eigenvalue \(g^l\), i.e.,

$$\begin{aligned} \mathcal {S}H^l_m = g^l H^l_m, \end{aligned}$$
(7)

and \(H^l_m\in \mathbb {V}^+\) if l is an even number and \(H^l_m\in \mathbb {V}^-\) if l is an odd number. Attenuation is described by the multiplication operator \(\mathcal {M}:\mathbb {V}\rightarrow \mathbb {V}\) defined by

$$\begin{aligned} \mathcal {M}u=\sigma _tu. \end{aligned}$$

Introducing the functionals \(\ell ^+ \in (\mathbb {W}^+)'\) and \(\ell ^-\in (\mathbb {V}^-)'\) given by

$$\begin{aligned} \ell ^+(w^+)={}&(q,w^+) +2 (|s \cdot n| f, w^+)_{\partial D_-},&w^+\in \mathbb {W}^+,\\ \ell ^-(w^-)={}&(q,w^-),&w^-\in \mathbb {V}^-,\\ \end{aligned}$$

the operator formulation of the radiative transfer equation (1)–(3) is [17]: Find \((u^{\!+},u^{\!-})\in \mathbb {W}^+\times \mathbb {V}^-\) such that

$$\begin{aligned} \mathcal {R}u^{\!+}- \mathcal {A}' u^{\!-}+ \mathcal {M}u^{\!+}&= \mathcal {K}u^{\!+}+\ell ^+ \qquad \text { in } (\mathbb {W}^+)', \end{aligned}$$
(8)
$$\begin{aligned} \mathcal {A}u^{\!+}+ \mathcal {M}u^{\!-}&= \mathcal {K}u^{\!-}+ \ell ^- \qquad \text { in } \mathbb {V}^-. \end{aligned}$$
(9)

2.5 Well-Posedness

In the situation of Sect. 2.2, there exists a unique solution \((u^{\!+},u^{\!-})\in \mathbb {W}^+\times \mathbb {V}^-\) of (8) and (9) satisfying

$$\begin{aligned} \Vert u^{\!+}\Vert _{\mathbb {W}} + \Vert u^{\!-}\Vert _{\mathbb {V}} \le C \left( \Vert q\Vert _{L^2(D)}+ \Vert f\Vert _{L^2(\partial D_-;|s \cdot n|)}\right) , \end{aligned}$$

with a constant C depending only on \(\gamma \) and \(\Vert \sigma _t\Vert _\infty \) [17]. Notice that this well-posedness result remains true even if \(\sigma _a\) and \(\sigma _s\) are allowed to vanish [18]. As shown in [17, Theorem 4.1] it holds that \(u^{\!-}\in \mathbb {W}^-\) and \(u^{\!+}+u^{\!-}\in \mathbb {W}\) satisfies (1) a.e. in D and (3) holds in \(L^2(\partial D_-;|s \cdot n|)\).

2.6 Even-Parity Formulation

As in [17], it follows from (7) that

$$\begin{aligned} \inf _{r\in R} (\sigma _a+(1-g)\sigma _s) \Vert v^-\Vert _\mathbb {V}^2 \le \Vert v^-\Vert _{\mathcal {M}-\mathcal {K}}^2\le \Vert \sigma _t\Vert _\infty \Vert v^-\Vert ^2_\mathbb {V}\quad \text {for } v^-\in \mathbb {V}^-, \end{aligned}$$
(10)

where we write \(\Vert w\Vert _{\mathcal {Q}}^2=(\mathcal {Q}w,w)\) for any positive operator \(\mathcal {Q}\). Thus, \(\mathcal {M}-\mathcal {K}:\mathbb {V}^-\rightarrow \mathbb {V}^-\) is boundedly invertible, and, by (9),

$$\begin{aligned} u^{\!-}= (\mathcal {M}-\mathcal {K})^{-1} (\ell ^- -\mathcal {A}u^{\!+}). \end{aligned}$$
(11)

Using (11) in (8) and introducing

$$\begin{aligned} \mathcal {E}:\mathbb {W}^+\rightarrow (\mathbb {W}^+)',\quad \mathcal {E}u^{\!+}= \mathcal {R}u^{\!+}+ \mathcal {A}' (\mathcal {M}-\mathcal {K})^{-1} \mathcal {A}u^{\!+}+ \mathcal {M}u^{\!+}, \end{aligned}$$

and

$$\begin{aligned} \ell (w^+)=\ell ^+(w^+) +((\mathcal {M}-\mathcal {K})^{-1} q,\mathcal {A}w^+),\qquad w^+\in \mathbb {W}^+, \end{aligned}$$

the even-parity formulation of the radiative transfer equation is: Find \(u^{\!+}\in \mathbb {W}^+\) such that

$$\begin{aligned} (\mathcal {E}- \mathcal {K}) u^{\!+}= \ell . \end{aligned}$$
(12)

As shown in [17], the even-parity formulation is a coercive, symmetric problem, which is well-posed by the Lax-Milgram lemma. Solving (12) for \(u^{\!+}\in \mathbb {W}^+\), we can retrieve \(u^{\!-}\in \mathbb {V}^-\) by (11). In turn, \((u^{\!+},u^{\!-})\in \mathbb {W}^+\times \mathbb {V}^-\) solves (8)–(9).

2.7 Preconditioning of \(\mathcal {M}-\mathcal {K}\)

We generalize the inequalities (10) to obtain spectrally equivalent approximations to \(\mathcal {M}-\mathcal {K}\). Since \(\mathcal {K}=\sigma _s\mathcal {S}\), we can construct approximations to \(\mathcal {K}\) by approximating \(\mathcal {S}\). To do so, let us define for \(N\in \mathbb {N}\) and \(v\in \mathbb {V}\)

$$\begin{aligned} \mathcal {S}_N v =\sum _{l=0}^{N} g^{l} \sum _{m=-l}^{l} (v,H^l_m)_{S} H^l_m. \end{aligned}$$
(13)

Notice that the summation is only over even integers \(0\le l\le N\) if \(v\in \mathbb {V}^+\) and only over odd ones if \(v\in \mathbb {V}^-\). The approximation of \(\mathcal {K}\) is then defined by \(\mathcal {K}_N=\sigma _s\mathcal {S}_N\).

Lemma 1

The operator \(\mathcal {M}-\mathcal {K}_N\) is spectrally equivalent to \(\mathcal {M}-\mathcal {K}\), that is

$$\begin{aligned} \big (1-cg^{N+1}\big ) ((\mathcal {M}-\mathcal {K}_N)v,v)\le ((\mathcal {M}-\mathcal {K})v,v) \le ((\mathcal {M}-\mathcal {K}_N)v,v) \end{aligned}$$

for all \(v\in \mathbb {V}\), with \(c=\Vert \sigma _s/\sigma _t\Vert _\infty \). In particular, \(\mathcal {M}-\mathcal {K}_N\) is invertible.

Proof

We use that \(\{H^l_m\}\) is a complete orthonormal system of \(L^2(S)\). Hence, any \(v\in \mathbb {V}=L^2(S)\otimes L^2(R)\) has the expansion

$$\begin{aligned} v(s,r) = \sum _{l=0}^\infty \sum _{m=-l}^l v^l_m(r) H^l_m(s), \end{aligned}$$

with \(v^l_m\in L^2(R)\) and \(\Vert v\Vert _{\mathbb {V}}^2=\sum _{l=0}^\infty \sum _{m=-l}^l \Vert v^l_m\Vert ^2_{L^2(R)}<\infty \), and

$$\begin{aligned} ((\mathcal {M}-\mathcal {K}_N)v,v) = \sum _{l=0}^N\sum _{m=-l}^l \Big \Vert \sqrt{\sigma _t-g^l\sigma _s}v^l_m\Big \Vert _{L^2(R)}^2 + \sum _{l=N+1}^\infty \sum _{m=-l}^l \Big \Vert \sqrt{\sigma _t} v^l_m\Big \Vert _{L^2(R)}^2. \end{aligned}$$

Using \(c=\Vert \sigma _s/\sigma _t\Vert _\infty \) it follows that

$$\begin{aligned} 0\le ( (\mathcal {K}-\mathcal {K}_N)v,v)&=\sum _{l=N+1}^\infty g^{l}\sum _{m=-l}^l \Big \Vert \sqrt{\sigma _s} v^l_m\Big \Vert _{L^2(R)}^2\le c g^{N+1} ((\mathcal {M}-\mathcal {K}_N)v,v). \end{aligned}$$
(14)

The inequalities in the statement then follow from

$$\begin{aligned} ((\mathcal {M}-\mathcal {K})v,v) = ((\mathcal {M}-\mathcal {K}_N)v,v) - ((\mathcal {K}-\mathcal {K}_N)v,v), \end{aligned}$$

while invertibility follows from [17, Lemma 2.14].
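Lemma 1 can be sanity-checked in a diagonal toy model (our own illustration): at a single spatial point with constant coefficients, \(\mathcal {M}-\mathcal {K}\) acts on a mode of degree l by multiplication with \(\sigma _t-g^l\sigma _s\), while \(\mathcal {M}-\mathcal {K}_N\) replaces \(g^l\) by zero for \(l>N\), so the modewise Rayleigh quotients must lie in \([1-cg^{N+1},1]\).

```python
import numpy as np

sigma_a, sigma_s, g, N = 0.05, 0.95, 0.9, 3
sigma_t = sigma_a + sigma_s
c = sigma_s / sigma_t

ls = np.arange(50)                                   # spherical-harmonic degrees
eig_MK  = sigma_t - g**ls * sigma_s                  # modes of M - K   (diagonal model)
eig_MKN = sigma_t - np.where(ls <= N, g**ls, 0.0) * sigma_s   # modes of M - K_N

ratio = eig_MK / eig_MKN                             # Rayleigh quotients per mode
print(ratio.min(), ratio.max())
# Lemma 1 predicts: 1 - c*g**(N+1) <= ratio <= 1.
print(ratio.min() >= 1 - c * g**(N + 1) - 1e-12, ratio.max() <= 1 + 1e-12)
```

The lower bound is attained at \(l=N+1\), which shows that the constant \(1-cg^{N+1}\) in Lemma 1 is sharp in this model.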

3 Iteration for the Even-Parity Formulation

We generalize the Richardson iteration of [40] for the radiative transfer equation with isotropic scattering to the anisotropic case and equip the iteration process with a suitable preconditioner, which we will investigate later. We restrict ourselves to a presentation suitable for the error analysis and postpone the linear algebra setting and the discussion of its efficient realization to Sect. 5.

3.1 Derivation of the Scheme

We consider the solution of (12) along the following two steps:

Step (i) Given \(u^{\!+}_n\in \mathbb {W}^+\) and a symmetric and positive definite operator \(\mathcal {P}_1:(\mathbb {W}^+)'\rightarrow \mathbb {W}^+\), we compute

$$\begin{aligned} u^{\!+}_{n+\frac{1}{2}}=u^{\!+}_n-\mathcal {P}_1((\mathcal {E}-\mathcal {K})u^{\!+}_n-\ell ). \end{aligned}$$
(15)

Step (ii) Compute a subspace correction to \(u^{\!+}_{n+1/2}\) based on the observation that the error \(e^+_{n+1/2}=u^{\!+}-u^{\!+}_{n+1/2}\) satisfies

$$\begin{aligned} (\mathcal {E}-\mathcal {K}) e^+_{n+\frac{1}{2}} = ((\mathcal {E}-\mathcal {K})\mathcal {P}_1-\mathcal {I})((\mathcal {E}-\mathcal {K})u^{\!+}_n-\ell ). \end{aligned}$$
(16)

Solving (16) is as difficult as solving the original problem. Let \(\mathbb {W}_N^+\subset \mathbb {W}^+\) be closed, and consider the Galerkin projection \(\mathcal {P}_G:\mathbb {W}^+\rightarrow \mathbb {W}_N^+\) onto \(\mathbb {W}_N^+\) defined by

$$\begin{aligned} \langle (\mathcal {E}-\mathcal {K}) \mathcal {P}_G w,v\rangle = \langle (\mathcal {E}-\mathcal {K}) w,v\rangle \quad \text {for all } v\in \mathbb {W}_N^+. \end{aligned}$$
(17)

Using (16), the correction \(u^{\!+}_{c,n}=\mathcal {P}_G e^+_{n+1/2}\) is then characterized as the solution to

$$\begin{aligned} \langle (\mathcal {E}-\mathcal {K})u^{\!+}_{c,n} ,v\rangle = \langle ((\mathcal {E}-\mathcal {K})\mathcal {P}_1-\mathcal {I})((\mathcal {E}-\mathcal {K})u^{\!+}_n-\ell ),v\rangle \quad \text {for all } v\in \mathbb {W}_N^+, \end{aligned}$$
(18)

where the right-hand side involves available data only. The update is performed via

$$\begin{aligned} u^{\!+}_{n+1} = u^{\!+}_{n+\frac{1}{2}} + u^{\!+}_{c,n}. \end{aligned}$$
(19)
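The two steps (15) and (19) can be mimicked in finite dimensions; the sketch below (our own toy matrices, not the discretization of this paper) combines a damped Richardson step for a symmetric positive definite system with a Galerkin correction on a fixed subspace and records the energy-norm error, which decreases monotonically.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)            # SPD stand-in for E - K
b = rng.standard_normal(n)
u_exact = np.linalg.solve(A, b)

lam_max = np.linalg.eigvalsh(A).max()
P1 = np.eye(n) / lam_max               # ensures eigenvalues of P1 @ A lie in (0, 1]

V = np.linalg.qr(rng.standard_normal((n, 8)))[0]   # fixed coarse subspace (8 vectors)
coarse = np.linalg.inv(V.T @ A @ V)    # Galerkin matrix on the subspace

def energy(e):
    """Squared energy norm ||e||_A^2, the analogue of the (E - K)-norm."""
    return float(e @ A @ e)

u = np.zeros(n)
errs = [energy(u - u_exact)]
for _ in range(30):
    u = u - P1 @ (A @ u - b)                   # step (i): preconditioned Richardson
    r = b - A @ u
    u = u + V @ (coarse @ (V.T @ r))           # step (ii): Galerkin subspace correction
    errs.append(energy(u - u_exact))

print(all(e1 <= e0 for e0, e1 in zip(errs, errs[1:])))   # True: monotone decrease
```

The monotone decrease reflects the two mechanisms of the scheme: the Richardson step contracts because the eigenvalues of \(\mathcal {P}_1(\mathcal {E}-\mathcal {K})\) lie in \((0,1]\), and the Galerkin correction is a projection in the energy norm and hence non-expansive.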

3.2 Error Analysis

Since \(\mathcal {P}_G\) is non-expansive in the norm induced by \(\mathcal {E}-\mathcal {K}\), the error analysis for the overall iteration (15) and (19) relies on the spectral properties of \(\mathcal {P}_1\). Therefore, the following theoretical investigations consider the generalized eigenvalue problem

$$\begin{aligned} (\mathcal {E}-\mathcal {K})w = \lambda \mathcal {P}_1^{-1} w. \end{aligned}$$
(20)

The following well-known lemma asserts that the half-step (15) yields a contraction if an appropriate preconditioner \(\mathcal {P}_1\) is chosen. We provide a proof for later reference.

Lemma 2

Let \(0<\beta \le 1\) and assume that the eigenvalues \(\lambda \) of (20) satisfy \(\beta \le \lambda \le 1\). Then, for any \(u^{\!+}_n\in \mathbb {W}^+\), \(u^{\!+}_{n+1/2}\) defined via (15) satisfies

$$\begin{aligned} \Vert u^{\!+}-u^{\!+}_{n+\frac{1}{2}}\Vert _{\mathcal {E}-\mathcal {K}} \le (1-\beta ) \Vert u^{\!+}-u^{\!+}_n\Vert _{\mathcal {E}-\mathcal {K}}. \end{aligned}$$

Proof

Assume that \(\{(w_k,\lambda _k)\}_{k\ge 0}\) is the eigensystem of the generalized eigenvalue problem (20). For any \(u^{\!+}_n\), the error \(e^+_n=u^{\!+}-u^{\!+}_n\) satisfies

$$\begin{aligned} e^+_{n+\frac{1}{2}}=(\mathcal {I}-\mathcal {P}_1(\mathcal {E}-\mathcal {K}))e^+_n. \end{aligned}$$
(21)

Using the expansion \(e^+_n=\sum _{k=0}^\infty a_k w_k\), we compute \(\Vert e^+_n\Vert ^2_{\mathcal {E}-\mathcal {K}} = \sum _{k=0}^\infty a_k^2 \lambda _k\). Using (21), we thus obtain \(e^+_{n+1/2} = \sum _{k=0}^\infty (1-\lambda _k) a_k w_k\), and hence

$$\begin{aligned} \Vert e^+_{n+\frac{1}{2}}\Vert _{\mathcal {E}-\mathcal {K}}^2 = \sum _{k=0}^\infty (1-\lambda _k)^2 \lambda _k a_k^2 \le \sup _{0\le k<\infty }(1-\lambda _k)^2 \Vert e^+_n\Vert ^2_{\mathcal {E}-\mathcal {K}}. \end{aligned}$$

Since \(0<\beta \le \lambda _k\le 1\) by assumption, the assertion follows.

The next statement asserts that the iterative scheme defined by (19) converges linearly to the even part of the solution of the radiative transfer equation. It is a direct consequence of Lemma 2 and the observation that \(e_{n+1}^+=(\mathcal {I}-\mathcal {P}_G)e_{n+1/2}^+\) satisfies

$$\begin{aligned} \Vert e_{n+1}^+\Vert _{\mathcal {E}-\mathcal {K}}=\inf _{v\in \mathbb {W}^+_N} \Vert e_{n+\frac{1}{2}}^+-v\Vert _{\mathcal {E}-\mathcal {K}}. \end{aligned}$$
(22)

Lemma 3

Let \(\mathbb {W}_N^+\subset \mathbb {W}^+\) be closed, and assume that the eigenvalues \(\lambda \) of (20) satisfy \(\beta \le \lambda \le 1\) for some \(0<\beta \le 1\). Then, for any \(u^{\!+}_0\in \mathbb {W}^+\), the sequence \(\{u^{\!+}_n\}\) defined in (15) and (19) converges linearly to the solution \(u^{\!+}\) of (12), i.e.,

$$\begin{aligned} \Vert u^{\!+}-u^{\!+}_{n+1}\Vert _{\mathcal {E}-\mathcal {K}} \le (1-\beta )\Vert u^{\!+}-u^{\!+}_n\Vert _{\mathcal {E}-\mathcal {K}}. \end{aligned}$$
(23)

In view of the previous lemma, fast convergence \(u^{\!+}_n\rightarrow u^{\!+}\) can be obtained by ensuring that \(\beta \) is close to one or by making the best-approximation error in (22) small. These two possibilities are discussed in more detail in the remainder of this section.

3.3 Generic Preconditioners

The next result builds the basis for the preconditioner we will use later.

Lemma 4

Let \(\mathcal {P}_1\) be defined either by

  (i) \(\mathcal {P}_1^{-1}=\mathcal {E}\), or

  (ii) \(\mathcal {P}_1^{-1}=\mathcal {E}_0=(1-cg)^{-1} \mathcal {A}'\mathcal {M}^{-1}\mathcal {A}+ \mathcal {M}+ \mathcal {R}.\)

Then \(\mathcal {P}_1\) is spectrally equivalent to \(\mathcal {E}-\mathcal {K}\), i.e.,

$$\begin{aligned} (1-c)(\mathcal {P}_1^{-1} w^+,w^+) \le ((\mathcal {E}-\mathcal {K})w^+,w^+) \le (\mathcal {P}_1^{-1} w^+,w^+), \end{aligned}$$

for all \(w^+\in \mathbb {W}^+\). It holds \(1-\beta =c\) in Lemma 3 in both cases.

Proof

Since \(\mathcal {A}w^+\in \mathbb {V}^-\), the result is a direct consequence of Lemma 1.

Remark 1

We can further generalize the choices for \(\mathcal {P}_1^{-1}\) by choosing \(N^+\ge -1\), \(N^-\ge 0\), and \(\gamma _{N^-}=1/(1-cg^{N^-+1})\). Then

$$\begin{aligned} \mathcal {P}_1^{-1}=\mathcal {P}^{-1}_{N^+,N^-}=\mathcal {R}+ \gamma _{N^-}\mathcal {A}' (\mathcal {M}-\mathcal {K}_{N^-})^{-1}\mathcal {A}+ \mathcal {M}-\mathcal {K}_{N^+} \end{aligned}$$

and \(\mathcal {E}-\mathcal {K}\) are spectrally equivalent, i.e.,

$$\begin{aligned} (1-cg^{\min (N^-,N^+)+1})(\mathcal {P}_1^{-1} w^+,w^+) \le ((\mathcal {E}-\mathcal {K})w^+,w^+) \le (\mathcal {P}_1^{-1} w^+,w^+) \end{aligned}$$

for all \(w^+\in \mathbb {W}^+\). In particular, \(1-\beta =cg^{\min (N^-,N^+)+1}\) in Lemma 3.

Remark 2

For isotropic scattering \(g=0\), we have that \(\mathcal {E}=\mathcal {E}_0\). Thus, both choices in Lemma 4 can be understood as generalizations of the iteration considered in [40].

The preconditioners in Remark 1 yield arbitrarily small contraction rates for sufficiently large \(N^+\) and \(N^-\). However, the efficient implementation of such a preconditioner seems to be rather challenging. Therefore, we focus on the preconditioners defined in Lemma 4 in the following. Since these choices for \(\mathcal {P}_1\) yield slow convergence for \(c\approx 1\), we need to construct \(\mathbb {W}_N^+\) properly. This construction is motivated next, see Sect. 5.4 for a precise definition.

3.4 A Motivation for Constructing Effective Subspaces

From the proof of Lemma 2, one sees that error modes associated with small eigenvalues \(\lambda \) of (20) converge slowly. Hence, in order to regain fast convergence, such modes should be approximated well by functions in \(\mathbb {W}_N^+\), see (22). Next, we give a heuristic motivation that such slowly convergent modes might be approximated well by low-order spherical harmonics.

Since we use \(\mathcal {P}_1^{-1}\approx \mathcal {E}\) below, let us fix \(\mathcal {P}_1^{-1}=\mathcal {E}\) in this subsection. Furthermore, let w be a slowly damped mode, i.e., w satisfies (20) with \(\lambda \) such that \(\lambda \approx 1-c \approx 0\). Observe that w also satisfies \(\mathcal {K}w = \delta \mathcal {E}w\) with \(\delta =1-\lambda \approx c \approx 1\), and \(\delta \le c\) by Lemma 4(i). Let us expand the angular part of w into spherical harmonics, cf. Sect. 2.4,

$$\begin{aligned} w(s,r) = \sum _{l=0}^\infty \sum _{m=-l}^l w^l_m(r) H^l_m(s), \end{aligned}$$

where \(w^l_m=0\) if l is odd. As in the proof of Lemma 1, we obtain

$$\begin{aligned} \mathcal {K}w = \sum _{l=0}^\infty g^l \sum _{m=-l}^l \sigma _s(r) w^l_m(r) H^l_m(s). \end{aligned}$$

Since \(\sigma _s\le \sigma _t\), orthogonality of the spherical harmonics implies

$$\begin{aligned}&\sum _{l=0}^\infty c g^l \sum _{m=-l}^l \Vert \sqrt{\sigma _t} w^l_m\Vert _{L^2(R)}^2\\&\quad \ge (\mathcal {K}w,w) =\delta \bigg ( \langle \mathcal {R}w,w\rangle + \Vert s \cdot \nabla _rw\Vert _{(\mathcal {M}-\mathcal {K})^{-1}}^2 + \sum _{l=0}^\infty \sum _{m=-l}^l \Vert \sqrt{\sigma _t} w^l_m\Vert _{L^2(R)}^2\bigg ). \end{aligned}$$

Neglecting the contributions from \(\mathcal {R}\) and \(s \cdot \nabla _r\), we see that

$$\begin{aligned} \sum _{l=0}^\infty (c g^l-\delta ) \sum _{m=-l}^l \Vert \sqrt{\sigma _t} w^l_m\Vert _{L^2(R)}^2\ge 0. \end{aligned}$$
(24)

Since \(\delta \approx c\approx 1\) by assumption and \(g<1\), (24) can hold true only if w can be approximated well by spherical harmonics of degree less than or equal to N for some moderate integer N.

To make this plausible, we consider in the following the case \(g=0\) and remark that the overall behaviour does not change substantially when varying g. If \(c=\delta \), then (24) implies that \(w_m^l=0\) for all \(l>0\). If \(\delta <c\), then (24) is equivalent to

$$\begin{aligned} \Vert \sqrt{\sigma _t} w^0_0\Vert _{L^2(R)}^2\ge \frac{\delta }{c-\delta } \sum _{l=1}^\infty \sum _{m=-l}^l \Vert \sqrt{\sigma _t} w^l_m\Vert _{L^2(R)}^2. \end{aligned}$$

Therefore, using orthogonality of the spherical harmonics once more, we obtain

$$\begin{aligned} \sum _{l=1}^\infty \sum _{m=-l}^l \Vert \sqrt{\sigma _t} w^l_m\Vert _{L^2(R)}^2&= \Vert \sqrt{\sigma _t}w\Vert _{L^2(D)}^2-\Vert \sqrt{\sigma _t}w_0^0\Vert _{L^2(R)}^2\\&\le \Vert \sqrt{\sigma _t}w\Vert _{L^2(D)}^2-\frac{\delta }{c-\delta } \sum _{l=1}^\infty \sum _{m=-l}^l \Vert \sqrt{\sigma _t} w^l_m\Vert _{L^2(R)}^2. \end{aligned}$$

Rearranging terms yields the estimate

$$\begin{aligned} \sum _{l=1}^\infty \sum _{m=-l}^l \Vert \sqrt{\sigma _t} w^l_m\Vert _{L^2(R)}^2 \le \big (1-\delta /c\big )\Vert \sqrt{\sigma _t}w\Vert _{L^2(D)}^2. \end{aligned}$$

Since, by assumption, \(\delta \approx c\), we conclude that w can be approximated well by \(w_0^0 H_0^0\). Note that this statement quantifies approximation in terms of the \(L^2\)-norm. However, using recurrence relations of spherical harmonics to incorporate the terms \(\langle \mathcal {R}w,w\rangle +\Vert s \cdot \nabla _rw\Vert ^2_{(\mathcal {M}-\mathcal {K})^{-1}}\) into (24) suggests that a similar statement also holds for the \(\mathcal {E}-\mathcal {K}\)-norm. A full analysis of this statement seems beyond the scope of this paper, and we postpone it to future research. We conclude that effective subspaces \(\mathbb {W}_N^+\) consist of linear combinations of low-order spherical harmonics, and we employ this observation in our numerical realization.

4 Galerkin Approximation

The iterative scheme of the previous section has been formulated for the infinite-dimensional function spaces \(\mathbb {W}^+\) and \(\mathbb {W}_N^+\subset \mathbb {W}^+\). For the practical implementation we recall the approximation spaces described in [17] and [40, Section 6.3]. Let \(\mathcal {T}_h^R\) and \(\mathcal {T}_h^S\) denote shape-regular triangulations of R and S, respectively. For simplicity we assume the triangulations to be quasi-uniform. To properly define even and odd functions associated with the triangulations, we further require that \(-K_S\in \mathcal {T}_h^S\) for each spherical element \(K_S\in \mathcal {T}_h^S\). The latter requirement can be ensured by starting with a triangulation of a half-sphere and reflecting it.

Let \(\mathbb {X}_h^+=\mathbb {P}_1^c(\mathcal {T}_h^R)\) denote the vector space of continuous, piecewise linear functions subordinate to the triangulation \(\mathcal {T}_h^R\) with basis \(\{\varphi _i\}\) and dimension \(n_R^+\), and let \(\mathbb {X}_h^-=\mathbb {P}_0(\mathcal {T}_h^R)\) denote the vector space of piecewise constant functions subordinate to \(\mathcal {T}_h^R\) with basis \(\{\chi _j\}\) and dimension \(n_R^-\). Similarly, we denote by \(\mathbb {S}_h^+=\mathbb {P}_0(\mathcal {T}_h^S)\cap L^2(S)^+\) and \(\mathbb {S}_h^-=\mathbb {P}_1(\mathcal {T}_h^S)\cap L^2(S)^-\) the vector spaces of even, piecewise constant and odd, piecewise linear functions subordinate to the triangulation \(\mathcal {T}_h^S\), respectively. We can construct a basis \(\{\mu _k^+\}\) for \(\mathbb {S}_h^+\) by choosing the \(n_S^+\) triangles with midpoints in a given half-sphere and defining the functions \(\mu _k^+\) to be the indicator functions of these triangles. For any other point \(s\in S\), we find \(K_S\in \mathcal {T}_h^S\) with midpoint in the given half-sphere such that \(-s\in K_S\), and we set \(\mu _k^+(s)=\mu _k^+(-s)\). A similar construction leads to a basis \(\{\psi _l^-\}\) of \(\mathbb {S}_h^-\).
The conforming approximation spaces are then defined through tensor product constructions, \(\mathbb {W}_h^+=\mathbb {S}_h^+\otimes \mathbb {X}_h^+\), \(\mathbb {V}_h^-=\mathbb {S}_h^-\otimes \mathbb {X}_h^-\). Thus, for some coefficient matrices \(\big [\mathbf {U}^{\!+}_{i,k}\big ]\in \mathbb {R}^{n_R^+\times n_S^+}\) and \(\big [\mathbf {U}^{\!-}_{j,l}\big ]\in \mathbb {R}^{n_R^-\times n_S^-}\), any \(u^{\!+}_h\in \mathbb {W}_h^+\) and \(u^{\!-}_h\in \mathbb {V}_h^-\) can be expanded as

$$\begin{aligned} u^{\!+}_h = \sum _{i=1}^{n_R^+}\sum _{k=1}^{n_S^+} \mathbf {U}^{\!+}_{i,k} \varphi _i \mu _k^+,\qquad u^{\!-}_h = \sum _{j=1}^{n_R^-}\sum _{l=1}^{n_S^-} \mathbf {U}^{\!-}_{j,l} \chi _j \psi _l^-. \end{aligned}$$
(25)

The Galerkin approximation of (8)–(9) computes \((u^{\!+}_h,u^{\!-}_h)\in \mathbb {W}_h^+\times \mathbb {V}_h^-\) such that

$$\begin{aligned} \mathcal {R}u^{\!+}_h - \mathcal {A}' u^{\!-}_h + \mathcal {M}u^{\!+}_h&= \mathcal {K}u^{\!+}_h +\ell ^+ \qquad \text { in } (\mathbb {W}_h^+)', \end{aligned}$$
(26)
$$\begin{aligned} \mathcal {A}u^{\!+}_h + \mathcal {M}u^{\!-}_h&= \mathcal {K}u^{\!-}_h + \ell ^- \qquad \text { in } \mathbb {V}_h^-. \end{aligned}$$
(27)

The discrete mixed system (26)–(27) admits a unique solution [17]. Denoting by \(\mathbf {u}^\pm ={\text {vec}}(\mathbf {U}^\pm )\) the concatenation of the columns of the matrices \(\mathbf {U}^\pm \) into a vector, the mixed system (26)–(27) can be written as the following linear system

$$\begin{aligned} \begin{bmatrix}\mathbf {R}+ \mathbf {M}^{\!+}&{}\quad -\mathbf {A}^\intercal \\ \mathbf {A}&{}\quad \mathbf {M}^{\!-}\end{bmatrix}\begin{bmatrix}\mathbf {u}^{\!+}\\ \mathbf {u}^{\!-}\end{bmatrix} = \begin{bmatrix} \mathbf {K}^{\!+}&{}\\ &{}\mathbf {K}^{\!-}\end{bmatrix}\begin{bmatrix}\mathbf {u}^{\!+}\\ \mathbf {u}^{\!-}\end{bmatrix} + \begin{bmatrix}\mathbf {q}^{\!+}\\ \mathbf {q}^{\!-}\end{bmatrix}. \end{aligned}$$
(28)

The matrices in the system are given by

$$\begin{aligned} \mathbf {K}^{\!+}={}&\varvec{\mathsf {S}}\!^{+}\otimes \varvec{\mathfrak {M}}\!^{+}_s,&\mathbf {K}^{\!-}={}&\varvec{\mathsf {S}}\!^{-}\otimes \varvec{\mathfrak {M}}\!^{-}_s, \end{aligned}$$
(29)
$$\begin{aligned} \mathbf {M}^{\!+}={}&\varvec{\mathsf {M}}\!^{+}\otimes \varvec{\mathfrak {M}}\!^{+}_t,&\mathbf {M}^{\!-}={}&\varvec{\mathsf {M}}\!^{-}\otimes \varvec{\mathfrak {M}}\!^{-}_t, \end{aligned}$$
(30)
$$\begin{aligned} \mathbf {A}={}&\sum _{i=1}^d\varvec{\mathsf {A}}_i\otimes \varvec{\mathfrak {D}}_i,&\mathbf {R}={}&{\text {blkdiag}}(\varvec{\mathfrak {R}}_1,\ldots ,\varvec{\mathfrak {R}}_{n_S^+}), \end{aligned}$$
(31)

where Gothic letters denote the matrices arising from the discretization on R and sans-serif letters denote the matrices arising from the discretization on S, i.e.,

$$\begin{aligned} (\varvec{\mathfrak {M}}\!^{-}_t)_{j,j'}&=\int _R \sigma _t \chi _j \chi _{j'} dr,&(\varvec{\mathsf {S}}\!^{-})_{l,l'}&= \int _S \mathcal {S}\psi _l^-\psi _{l'}^-ds,\\ (\varvec{\mathfrak {M}}\!^{+}_t)_{i,i'}&=\int _R \sigma _t \varphi _i \varphi _{i'} dr,&(\varvec{\mathsf {S}}\!^{+})_{k,k'}&= \int _S \mathcal {S}\mu _k^+\mu _{k'}^+ ds,\\ (\varvec{\mathfrak {D}}_n)_{j,i}&= \int _R \frac{\partial \varphi _i}{\partial r_n} \chi _j dr,&(\varvec{\mathsf {A}}_n)_{l,k}&= \int _S s_n \psi _{l}^- \mu _k^+ ds,\\ (\varvec{\mathfrak {R}}_k)_{i,i'}&= \int _{\partial R} \varphi _i \varphi _{i'} \omega _k dr,&\omega _k&=\int _{S}|s \cdot n|(\mu _k^+)^2ds . \end{aligned}$$

The matrices \(\varvec{\mathfrak {M}}\!^{-}_s\) and \(\varvec{\mathfrak {M}}\!^{+}_s\) are defined accordingly. By \(\varvec{\mathsf {M}}\!^{+}\) and \(\varvec{\mathsf {M}}\!^{-}\) we denote the Gramian matrices in \(L^2(S)\). We readily remark that all of these matrices are sparse, except for \(\varvec{\mathsf {S}}\!^{+}\) and \(\varvec{\mathsf {S}}\!^{-}\), which are dense. \(\varvec{\mathsf {M}}\!^{+}\) and \(\varvec{\mathsf {M}}\!^{-}\) are diagonal and \(3\times 3\) block diagonal, respectively. Moreover, we note that \(\varvec{\mathfrak {M}}\!^{-}_t\) is a diagonal matrix.

To conclude this section let us remark that taking the Schur complement of (28) finally yields the matrix counterpart of the even-parity system (12), i.e.,

$$\begin{aligned} \mathbf {E}\mathbf {u}^{\!+}= \mathbf {K}^{\!+}\mathbf {u}^{\!+}+ \mathbf {q}\end{aligned}$$
(32)

with \(\mathbf {E}=\mathbf {A}\!^{\intercal }(\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\mathbf {A}+\mathbf {M}^{\!+}+\mathbf {R}\) and \(\mathbf {q}=\mathbf {q}^{\!+}+\mathbf {A}\!^{\intercal } (\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\mathbf {q}^{\!-}\).
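The equivalence between the mixed system (28) and the even-parity system (32) is a purely algebraic consequence of the Schur complement and can be checked on toy matrices. The following sketch (NumPy; the random SPD blocks are hypothetical stand-ins for the discretization matrices, with scalings chosen only to keep the systems invertible) solves both formulations and compares the even-parity components.

```python
import numpy as np

rng = np.random.default_rng(0)
n_p, n_m = 6, 8  # hypothetical toy dimensions for u+ and u-

def spd(n, scale=1.0):
    """Random symmetric positive definite stand-in block."""
    B = rng.standard_normal((n, n))
    return scale * (B @ B.T + n * np.eye(n))

RM = spd(n_p)            # stands in for R + M+
Kp = spd(n_p, 0.05)      # stands in for K+, scaled so E - K+ stays definite
Mm = spd(n_m)            # stands in for M-
Km = spd(n_m, 0.05)      # stands in for K-
A = rng.standard_normal((n_m, n_p))
qp, qm = rng.standard_normal(n_p), rng.standard_normal(n_m)

# Solve the full mixed system (28) directly.
blk = np.block([[RM - Kp, -A.T], [A, Mm - Km]])
u_plus_mixed = np.linalg.solve(blk, np.concatenate([qp, qm]))[:n_p]

# Solve the even-parity Schur complement system (32).
E = A.T @ np.linalg.solve(Mm - Km, A) + RM   # E = A^T (M- - K-)^{-1} A + M+ + R
q = qp + A.T @ np.linalg.solve(Mm - Km, qm)
u_plus_schur = np.linalg.solve(E - Kp, q)

assert np.allclose(u_plus_mixed, u_plus_schur)
```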

5 Discrete Preconditioned Richardson Iteration

After discretization, the iteration presented in Sect. 3 becomes

$$\begin{aligned} \mathbf {u}^{\!+}_{n+1}=\mathbf {u}^{\!+}_n-\mathbf {P}_2\mathbf {P}_1((\mathbf {E}-\mathbf {K}^{\!+})\mathbf {u}^{\!+}_n-\mathbf {q}). \end{aligned}$$
(33)
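As a minimal illustration of (33), the following sketch (NumPy; a random SPD matrix stands in for \(\mathbf {E}-\mathbf {K}^{\!+}\), and a scaled identity plays the role of the preconditioner \(\mathbf {P}_2\mathbf {P}_1\)) runs the preconditioned Richardson iteration and converges to the solution of the linear system.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)           # SPD stand-in for E - K+
b = rng.standard_normal(n)            # stand-in for q
P = np.eye(n) / np.linalg.norm(A, 2)  # crude SPD preconditioner stand-in

u = np.zeros(n)
for _ in range(600):
    u = u - P @ (A @ u - b)           # u_{n+1} = u_n - P ((E - K+) u_n - q)

assert np.allclose(u, np.linalg.solve(A, b))
```

For SPD systems this simple scaling guarantees a contraction factor \(1-\lambda _{\min }/\lambda _{\max }<1\); the point of the preconditioners discussed below is to make that factor independent of the physical and mesh parameters.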

The preconditioner \(\mathbf {P}_1\) is directly related to \(\mathcal {P}_1\) in (15). Collecting the coordinate vectors of the basis functions of the subspace \(\mathbb {W}^+_{h,N}\subset \mathbb {W}^+_h\) as the columns of a matrix \(\mathbf {W}\), the matrix representation of the overall preconditioner is

$$\begin{aligned} \mathbf {P}_2\mathbf {P}_1 = \mathbf {P}_1+ \mathbf {W}\big (\mathbf {W}\!^{\intercal }(\mathbf {E}-\mathbf {K}^{\!+})\mathbf {W}\big )^{-1}\mathbf {W}\!^{\intercal }(\mathbf {I}^{\!+}-(\mathbf {E}-\mathbf {K}^{\!+})\mathbf {P}_1). \end{aligned}$$
(34)

Denoting \(\mathbf {P}_G=\mathbf {W}\big (\mathbf {W}\!^{\intercal }(\mathbf {E}-\mathbf {K}^{\!+})\mathbf {W}\big )^{-1}\mathbf {W}\!^{\intercal }(\mathbf {E}-\mathbf {K}^{\!+})\) the matrix representation of the Galerkin projection \(\mathcal {P}_G\) defined in (17), the iteration matrix admits the factorization

$$\begin{aligned} \mathbf {I}^{\!+}-\mathbf {P}_2\mathbf {P}_1(\mathbf {E}-\mathbf {K}^{\!+}) = (\mathbf {I}^{\!+}-\mathbf {P}_G)\big (\mathbf {I}^{\!+}-\mathbf {P}_1(\mathbf {E}-\mathbf {K}^{\!+})\big ). \end{aligned}$$

The discrete analog of Lemma 3 implies that the sequence \(\{\mathbf {u}^{\!+}_n\}\) generated by (33) converges for any initial choice \(\mathbf {u}^{\!+}_0\) to the solution \(\mathbf {u}^{\!+}\) of (32). More precisely, by choosing \(\mathbf {P}_1\) according to Lemma 4, there holds

$$\begin{aligned} \Vert \mathbf {u}^{\!+}-\mathbf {u}^{\!+}_{n+1}\Vert _{\mathbf {E}-\mathbf {K}^{\!+}}\le \eta \Vert \mathbf {u}^{\!+}-\mathbf {u}^{\!+}_{n}\Vert _{\mathbf {E}-\mathbf {K}^{\!+}}, \end{aligned}$$
(35)

where \(0\le \eta \le c<1\) is defined as

$$\begin{aligned} \eta =\sup \Vert (\mathbf {I}^{\!+}-\mathbf {P}_G)(\mathbf {I}^{\!+}-\mathbf {P}_1(\mathbf {E}-\mathbf {K}^{\!+}))\mathbf {v}^+\Vert _{\mathbf {E}-\mathbf {K}^{\!+}} \end{aligned}$$
(36)

with the supremum taken over all \(\mathbf {v}^+\in \mathbb {R}^{n_S^+ n_R^+}\) satisfying \(\Vert \mathbf {v}^+\Vert _{\mathbf {E}-\mathbf {K}^{\!+}}=1\). The realization of (33) relies on the efficient application of \(\mathbf {E}\), \(\mathbf {K}^{\!+}\), \(\mathbf {P}_1\), and \(\mathbf {P}_2\), which we discuss next.

5.1 Application of \(\mathbf {E}\)

In view of (30) and (31) it is clear that \(\mathbf {A}\), \(\mathbf {M}^{\!+}\), and \(\mathbf {M}^{\!-}\) can be stored and applied efficiently by using their tensor product structure, sparsity, and the characterization

$$\begin{aligned} (\mathbf {B}\otimes \mathbf {C}){\text {vec}}(\mathbf {X})={\text {vec}}(\mathbf {D}) \quad \Longleftrightarrow \quad \mathbf {C}\mathbf {X}\mathbf {B}^\intercal =\mathbf {D}, \end{aligned}$$
(37)

where \(\mathbf {C}\in \mathbb {R}^{m\times n}\), \(\mathbf {X}\in \mathbb {R}^{n\times p}\), \(\mathbf {B}\in \mathbb {R}^{q\times p}\), \(\mathbf {D}\in \mathbb {R}^{m\times q}\). The boundary matrix \(\mathbf {R}\) consists of sparse diagonal blocks and can thus also be applied efficiently, see Sect. 6 for details. The remaining operation required for the application of \(\mathbf {E}\) as given in (32) is the application of \((\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\), which deserves some discussion. Since the condition number of \(\mathbf {M}^{\!-}-\mathbf {K}^{\!-}\) is bounded by \((1-cg)^{-1}\) due to Lemma 1, a straightforward implementation with the conjugate gradient method may be inefficient for \(cg\approx 1\). To mitigate the influence of cg, we can use Lemma 1 once more and obtain preconditioners derived from \(\mathcal {M}-\mathcal {K}_N\), which bound the condition number by \((1-(cg)^{N+2})^{-1}\) for odd N. In what follows, we comment on the practical realization of such preconditioners and their numerical construction. As we will verify in the numerical examples, these preconditioners allow the application of \((\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\) in only a few iterations even for g close to 1.
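The identity (37) is the standard Kronecker-product vec relation; a quick NumPy check with toy dimensions reads:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p, q = 3, 4, 5, 2
C = rng.standard_normal((m, n))
X = rng.standard_normal((n, p))
B = rng.standard_normal((q, p))

# vec stacks the columns of a matrix (column-major / Fortran order).
vec = lambda M: M.reshape(-1, order="F")

assert np.allclose(np.kron(B, C) @ vec(X), vec(C @ X @ B.T))
```

In practice this means \((\mathbf {B}\otimes \mathbf {C})\) is never formed explicitly; one multiplies the small factors against the reshaped coefficient matrix instead.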

After discretization, the continuous eigenvalue problem (7) for the scattering operator becomes the generalized eigenvalue problem

$$\begin{aligned} \varvec{\mathsf {S}}\!^{-}\varvec{\mathsf {W}}\!^{-}=\varvec{\mathsf {M}}\!^{-}\varvec{\mathsf {W}}\!^{-}\varvec{{\Lambda }}\!^-. \end{aligned}$$

Since \(\varvec{\mathsf {S}}\!^{-}\) is symmetric positive semi-definite and \(\varvec{\mathsf {M}}\!^{-}\) is symmetric positive definite, the eigenvalues satisfy \(0\le \lambda _l\le g\), and we assume that they are ordered non-increasingly. The eigenvectors, collected in the columns of \(\varvec{\mathsf {W}}\!^{-}\), form an \(\varvec{\mathsf {M}}\!^{-}\)-orthonormal basis, i.e., \((\varvec{\mathsf {W}}\!^{-})\!^{\intercal }\varvec{\mathsf {M}}\!^{-}\varvec{\mathsf {W}}\!^{-}=\varvec{\mathsf {I}}\!^{-}\). Truncating the eigendecomposition at index \(d_N=(N+1)(N+2)/2\), N odd, which is the number of odd spherical harmonics of order less than or equal to N, yields the approximation

$$\begin{aligned} \varvec{\mathsf {S}}\!^{-}= \varvec{\mathsf {M}}\!^{-}\varvec{\mathsf {W}}\!^{-}\varvec{{\Lambda }}\!^-(\varvec{\mathsf {W}}\!^{-})\!^{\intercal }\varvec{\mathsf {M}}\!^{-}\approx \varvec{\mathsf {M}}\!^{-}\varvec{\mathsf {W}}\!^{-}_N\varvec{{\Lambda }}\!^-_N(\varvec{\mathsf {W}}\!^{-}_N)^\intercal \varvec{\mathsf {M}}\!^{-}=:\varvec{\mathsf {S}}\!^{-}_N. \end{aligned}$$
(38)

The discrete version of \(\mathcal {M}-\mathcal {K}_N\) then reads \(\mathbf {M}^{\!-}-\mathbf {K}^{\!-}_N\), with \(\mathbf {K}^{\!-}_N=\varvec{\mathsf {S}}\!^{-}_N\otimes \varvec{\mathfrak {M}}\!^{-}_s\). An explicit representation of its inverse is given by the following lemma. Its essential idea is to use an orthogonal decomposition of \(\mathbb {V}_h^-\) induced by the eigendecomposition of \(\varvec{\mathsf {S}}\!^{-}\), and to employ the diagonal representation of \(\mathbf {M}^{\!-}-\mathbf {K}^{\!-}_N\) in the angular eigenbasis.

Lemma 5

Let \(\mathbf {b}\in \mathbb {R}^{n_S^-n_R^-}\). Then \(\mathbf {x}=(\mathbf {M}^{\!-}-\mathbf {K}^{\!-}_N)^{-1}\mathbf {b}\) is given by

$$\begin{aligned} \begin{aligned} \mathbf {x}&={} \Big (\varvec{\mathsf {W}}\!^{-}_N\otimes \varvec{\mathfrak {I}}\!^{-}\Big )\Big (\varvec{\mathsf {I}}\!^{-}\otimes \varvec{\mathfrak {M}}\!^{-}_t-\varvec{{\Lambda }}\!^-_N\otimes \varvec{\mathfrak {M}}\!^{-}_s\Big )^{-1}\Big ((\varvec{\mathsf {W}}\!^{-}_N)^\intercal \otimes \varvec{\mathfrak {I}}\!^{-}\Big )\mathbf {b}\\&\qquad + \Big (\Big ((\varvec{\mathsf {M}}\!^{-})^{-1}-\varvec{\mathsf {W}}\!^{-}_N(\varvec{\mathsf {W}}\!^{-}_N)^\intercal \Big )\otimes (\varvec{\mathfrak {M}}\!^{-}_t)^{-1}\Big )\mathbf {b}, \end{aligned} \end{aligned}$$
(39)

where \(\varvec{\mathfrak {I}}\!^{-}\) and \(\varvec{\mathsf {I}}\!^{-}\) denote the identity matrices of dimension \(n_R^-\) and \(d_N\), respectively.

Proof

We first decompose \(\mathbf {x}\) as follows

$$\begin{aligned} \mathbf {x}= \big (\varvec{\mathsf {W}}\!^{-}_N(\varvec{\mathsf {W}}\!^{-}_N)\!^{\intercal }\varvec{\mathsf {M}}\!^{-}\otimes \varvec{\mathfrak {I}}\!^{-}\big )\mathbf {x} + \big ((\varvec{\mathsf {I}}\!^{-}-\varvec{\mathsf {W}}\!^{-}_N(\varvec{\mathsf {W}}\!^{-}_N)^\intercal \varvec{\mathsf {M}}\!^{-})\otimes \varvec{\mathfrak {I}}\!^{-}\big )\mathbf {x}. \end{aligned}$$
(40)

Applying \((\varvec{\mathsf {W}}\!^{-}_N)^\intercal \otimes \varvec{\mathfrak {I}}\!^{-}\) to \((\mathbf {M}^{\!-}-\mathbf {K}^{\!-}_N)\mathbf {x}=\mathbf {b}\) and using (38) as well as the \(\varvec{\mathsf {M}}\!^{-}\)-orthogonality of \(\varvec{\mathsf {W}}\!^{-}_N\) yields

$$\begin{aligned} \big (\varvec{\mathsf {I}}\!^{-}\otimes \varvec{\mathfrak {M}}\!^{-}_t-\varvec{{\Lambda }}\!^-_N\otimes \varvec{\mathfrak {M}}\!^{-}_s\big ) \big ((\varvec{\mathsf {W}}\!^{-}_N)^\intercal \varvec{\mathsf {M}}\!^{-}\otimes \varvec{\mathfrak {I}}\!^{-}\big )\mathbf {x}= \big ((\varvec{\mathsf {W}}\!^{-}_N)^\intercal \otimes \varvec{\mathfrak {I}}\!^{-}\big )\mathbf {b}. \end{aligned}$$

Inverting \(\varvec{\mathsf {I}}\!^{-}\otimes \varvec{\mathfrak {M}}\!^{-}_t-\varvec{{\Lambda }}\!^-_N\otimes \varvec{\mathfrak {M}}\!^{-}_s\) and applying \(\varvec{\mathsf {W}}\!^{-}_N\otimes \varvec{\mathfrak {I}}\!^{-}\) further yields

$$\begin{aligned} \big (\varvec{\mathsf {W}}\!^{-}_N(\varvec{\mathsf {W}}\!^{-}_N)^\intercal \varvec{\mathsf {M}}\!^{-}\otimes \varvec{\mathfrak {I}}\!^{-}\big )\mathbf {x} = \big (\varvec{\mathsf {W}}\!^{-}_N\otimes \varvec{\mathfrak {I}}\!^{-}\big )\big (\varvec{\mathsf {I}}\!^{-}\otimes \varvec{\mathfrak {M}}\!^{-}_t-\varvec{{\Lambda }}\!^-_N\otimes \varvec{\mathfrak {M}}\!^{-}_s\big )^{-1}\big ((\varvec{\mathsf {W}}\!^{-}_N)^\intercal \otimes \varvec{\mathfrak {I}}\!^{-}\big )\mathbf {b}. \end{aligned}$$

For the other part in (40), apply \(\big ((\varvec{\mathsf {M}}\!^{-})^{-1}-\varvec{\mathsf {W}}\!^{-}_N(\varvec{\mathsf {W}}\!^{-}_N)^\intercal \big )\otimes (\varvec{\mathfrak {M}}\!^{-}_t)^{-1}\) to \((\mathbf {M}^{\!-}-\mathbf {K}^{\!-}_N)\mathbf {x}=\mathbf {b}\) and obtain

$$\begin{aligned} \big (\big (\varvec{\mathsf {I}}\!^{-}-\varvec{\mathsf {W}}\!^{-}_N(\varvec{\mathsf {W}}\!^{-}_N)^\intercal \varvec{\mathsf {M}}\!^{-}\big )\otimes \varvec{\mathfrak {I}}\!^{-}\big )\mathbf {x} = \big (\big ((\varvec{\mathsf {M}}\!^{-})^{-1}-\varvec{\mathsf {W}}\!^{-}_N(\varvec{\mathsf {W}}\!^{-}_N)^\intercal \big )\otimes (\varvec{\mathfrak {M}}\!^{-}_t)^{-1}\big )\mathbf {b}. \end{aligned}$$

Substituting both expressions into (40) yields the assertion.
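The formula (39) can be verified numerically on small toy matrices. The following sketch (NumPy; all matrix sizes and scalings are hypothetical, and the generalized eigenproblem is reduced to a standard one via a Cholesky factorization of the angular Gramian) builds a truncated \(\varvec{\mathsf {S}}\!^{-}_N\) as in (38), assembles \(\mathbf {M}^{\!-}-\mathbf {K}^{\!-}_N\), and checks that (39) inverts it exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
n_s, n_r, d_N = 6, 4, 3   # hypothetical angular size, spatial size, truncation

def spd(n, scale=1.0):
    """Random symmetric positive definite stand-in matrix."""
    B = rng.standard_normal((n, n))
    return scale * (B @ B.T + n * np.eye(n))

M_ang = spd(n_s)          # stands in for the Gramian M^- on the sphere
S_ang = spd(n_s, 0.05)    # symmetric stand-in for the scattering matrix S^-
M_t = spd(n_r)            # spatial mass matrix weighted by sigma_t
M_s = spd(n_r, 0.02)      # spatial mass matrix weighted by sigma_s

# Generalized eigenproblem S^- W = M^- W Lambda via Cholesky reduction.
L = np.linalg.cholesky(M_ang)
lam, V = np.linalg.eigh(np.linalg.solve(L, np.linalg.solve(L, S_ang).T))
W = np.linalg.solve(L.T, V)                 # M^- -orthonormal eigenvectors
idx = np.argsort(lam)[::-1][:d_N]           # keep the d_N largest eigenvalues
WN, lamN = W[:, idx], lam[idx]

SN = M_ang @ WN @ np.diag(lamN) @ WN.T @ M_ang   # truncated S^- as in (38)
MK = np.kron(M_ang, M_t) - np.kron(SN, M_s)      # M^- - K^-_N
b = rng.standard_normal(n_s * n_r)

# Explicit inverse following (39).
core = np.kron(np.eye(d_N), M_t) - np.kron(np.diag(lamN), M_s)
I_r = np.eye(n_r)
x = np.kron(WN, I_r) @ np.linalg.solve(core, np.kron(WN.T, I_r) @ b)
x += np.kron(np.linalg.inv(M_ang) - WN @ WN.T, np.linalg.inv(M_t)) @ b

assert np.allclose(MK @ x, b)
```

Note that only the small \(d_N n_R^-\times d_N n_R^-\) block `core` requires a solve; the complementary part acts through inverses of mass matrices, mirroring the structure exploited in the lemma.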

Remark 3

If \(\sigma _s\) has large spatial variations, a more effective approximation to \(\mathbf {K}^{\!-}\) can be obtained from the eigendecomposition

$$\begin{aligned} \varvec{\mathfrak {M}}\!^{-}_s \varvec{\mathfrak {I}}\!^{-}= \varvec{\mathfrak {M}}\!^{-}_t \varvec{\mathfrak {I}}\!^{-}\Delta \end{aligned}$$

with diagonal matrix \(\Delta \) with entries \(\Delta _{j}=\int _R \sigma _s\chi _jdr/\int _R\sigma _t\chi _j dr\). The modified approximation \({\widetilde{\mathbf {K}^{\!-}}}\) is then computed by considering only those combinations of spatial and angular eigenfunctions for which \(\lambda _l\Delta _j\) is above a certain tolerance.

5.2 Application of \(\mathbf {K}^{\!+}\) and \(\mathbf {K}^{\!-}\)

Although \(\mathbf {K}^{\!+}\) and \(\mathbf {K}^{\!-}\) have the tensor product structure (29) involving the sparse matrices \(\varvec{\mathfrak {M}}\!^{+}_s\) and \(\varvec{\mathfrak {M}}\!^{-}_s\), the density of the scattering matrices \(\varvec{\mathsf {S}}\!^{+}\) and \(\varvec{\mathsf {S}}\!^{-}\) becomes a bottleneck for iterative methods due to the quadratic complexity of their storage, assembly, and matrix–vector products. \(\mathcal {H}\)- and \(\mathcal {H}^2\)-matrices, which can be considered abstract variants of the fast multipole method [21, 23], were developed in the context of the boundary element method and can realize the storage, assembly, and matrix–vector multiplication in linear or almost linear complexity, see [8, 25] and the references therein. A sufficient condition for compressibility in these formats is the following.

Definition 1

Let \(\tilde{S}\subset \mathbb {R}^d\) be such that \(k:\tilde{S}\times \tilde{S}\rightarrow \mathbb {R}\) is defined and arbitrarily often differentiable for all \(\tilde{\mathbf {x}}\ne \tilde{\mathbf {y}}\) with \(\tilde{\mathbf {x}},\tilde{\mathbf {y}}\in \tilde{S}\). Then \(k\) is called asymptotically smooth if

$$\begin{aligned} \big |\partial _{\tilde{\mathbf {x}}}^{\varvec{\alpha }}\partial _{\tilde{\mathbf {y}}}^{\varvec{\beta }} k(\tilde{\mathbf {x}},\tilde{\mathbf {y}})\big | \le C\frac{(|\varvec{\alpha }|+|\varvec{\beta }|)!}{r^{{|\varvec{\alpha }|+|\varvec{\beta }|}}} \Vert \tilde{\mathbf {x}}-\tilde{\mathbf {y}}\Vert ^{-|\varvec{\alpha }|-|\varvec{\beta }|},\qquad \tilde{\mathbf {x}}\ne \tilde{\mathbf {y}}, \end{aligned}$$
(41)

independently of \(\varvec{\alpha }\) and \(\varvec{\beta }\) for some constants \(C,r>0\).

While several methods [14, 16] can operate on the Henyey-Greenstein kernel on the sphere directly, most classical methods require an extension into space, which we define as

$$\begin{aligned} K(\tilde{\mathbf {x}},\tilde{\mathbf {y}}) = k(\mathbf {x}\cdot \mathbf {y}), \qquad \text {with}~\mathbf {x}=\tilde{\mathbf {x}}/\Vert \tilde{\mathbf {x}}\Vert ,~\mathbf {y}=\tilde{\mathbf {y}}/\Vert \tilde{\mathbf {y}}\Vert . \end{aligned}$$
(42)

The following result allows us to use this extension in most \(\mathcal {H}\)- and \(\mathcal {H}^2\)-matrix libraries such as [9, 15, 33] in a black-box fashion.

Lemma 6

Let \(g\ge 0\). Then \(K(\tilde{\mathbf {x}},\tilde{\mathbf {y}})\) is asymptotically smooth for \(\tilde{\mathbf {x}},\tilde{\mathbf {y}}\in \mathbb {R}^d\setminus \{0\}\).

Proof

We first remark that the law of cosines implies for \(\mathbf {x},\mathbf {y}\in S\) with angle \(\varphi \) that \(\mathbf {x}\cdot \mathbf {y}= \cos (\varphi ) = 1-\Vert \mathbf {x}-\mathbf {y}\Vert ^2/2\). Moreover, \(\tilde{k}(\xi ) = k(1-\xi ^2/2)\) is holomorphic for \(\Re (\xi )>0\), so that its Taylor series around \(\xi >0\) has convergence radius \(\xi \) and the derivatives of \(\tilde{k}\) satisfy \(\big |\partial _\xi ^\alpha \tilde{k}(\xi )\big |\le cr^\alpha \alpha !|\xi |^{-\alpha }\), \(\alpha \in \mathbb {N}_0\), for all \(\xi >0\). Since \(\tilde{\mathbf {x}}\mapsto \mathbf {x}=\tilde{\mathbf {x}}/\Vert \tilde{\mathbf {x}}\Vert \) is analytic for \(\tilde{\mathbf {x}}\ne 0\) and since \(K(\tilde{\mathbf {x}},\tilde{\mathbf {y}})=\tilde{k}(\Vert \mathbf {x}-\mathbf {y}\Vert )\), the assertion follows in complete analogy to the appendix of [27].

The \(\mathcal {H}\)- or \(\mathcal {H}^2\)-approximation of \(\varvec{\mathsf {S}}\!^{+}\) and \(\varvec{\mathsf {S}}\!^{-}\) and the sparsity of \(\varvec{\mathfrak {M}}\!^{+}_s\) and \(\varvec{\mathfrak {M}}\!^{-}_s\) combined with the tensor product identity (37) then allow for an application of \(\mathbf {K}^{\!+}\) and \(\mathbf {K}^{\!-}\) in almost linear or even linear complexity.
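To illustrate the compressibility that these formats exploit, the following sketch (NumPy; the cluster geometry, sample counts, and tolerance are hypothetical) evaluates the Henyey-Greenstein kernel (2) on two well-separated direction clusters and observes that the resulting block is numerically low-rank, as expected for an admissible block of an asymptotically smooth kernel.

```python
import numpy as np

g = 0.9  # strongly forward-peaked anisotropy factor

def hg(c):
    """Henyey-Greenstein phase function k(s . s') from (2)."""
    return (1 - g**2) / (4 * np.pi * (1 - 2 * g * c + g**2) ** 1.5)

rng = np.random.default_rng(5)

def cap(center, radius, n):
    """n random unit vectors clustered around `center` (hypothetical geometry)."""
    pts = center + radius * rng.standard_normal((n, 3))
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)

X = cap(np.array([0.0, 0.0, 1.0]), 0.05, 200)    # cluster near the north pole
Y = cap(np.array([0.0, 0.0, -1.0]), 0.05, 200)   # well-separated cluster

block = hg(X @ Y.T)                    # kernel block for an admissible pair
sv = np.linalg.svd(block, compute_uv=False)
rank = int(np.sum(sv > 1e-8 * sv[0]))  # numerical rank at relative tol 1e-8

assert rank < 40                       # far below the block size 200
```

A hierarchical partition into such admissible blocks, each stored in low-rank form, is precisely what yields the (almost) linear complexity quoted above.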

5.3 Choice and Implementation of \(\mathbf {P}_1\)

As shown in Sect. 3, choosing \(\mathbf {P}_1\) as in Lemma 4 leads to contraction rates \(\eta \le c\) in (35), i.e., rates independent of the mesh parameters. The choice \(\mathbf {P}_1=\mathbf {E}^{-1}\) can be realized through an inner iterative method, such as a preconditioned Richardson iteration, resulting in an inner-outer iteration scheme when employed in (33). An effective preconditioner for \(\mathbf {E}\) is given by the block-diagonal, symmetric positive definite matrix

$$\begin{aligned} \mathbf {E}_0=\frac{1}{1-cg}\mathbf {A}^\intercal (\mathbf {M}^-)^{-1}\mathbf {A}+\mathbf {R}+\mathbf {M}^+ \end{aligned}$$

which provides the spectral estimates

$$\begin{aligned} (1-cg)\mathbf {x}^\intercal \mathbf {E}_0\mathbf {x}\le \mathbf {x}^\intercal \mathbf {E}\mathbf {x}\le \mathbf {x}^\intercal \mathbf {E}_0\mathbf {x}, \end{aligned}$$
(43)

for all \(\mathbf {x}\in \mathbb {R}^{n_S^+n_R^+}\), cf. Lemma 1. Thus, the condition number of \(\mathbf {E}_0^{-1}\mathbf {E}\) is bounded by \((1-cg)^{-1}\), which is uniformly bounded for \(c\in [0,1]\) for fixed \(g<1\). For clarity of presentation, we will use a preconditioned Richardson iteration for the inner iteration to implement \(\mathbf {P}_1\) in the rest of the paper, but remark that a non-stationary preconditioned conjugate gradient method will lead to even better performance. Applying \(\mathbf {P}_1\) with high accuracy may still involve many iterations. Instead, we use a preconditioner \(\mathbf {P}_1^l\) which performs l steps of an inner iteration, i.e., we set \(\mathbf {P}_1^l \mathbf {b}=\mathbf {z}_l\), where

$$\begin{aligned} \mathbf {z}_0=0,\qquad \mathbf {z}_{k+1}=\mathbf {z}_{k}-\mathbf {E}_0^{-1}(\mathbf {E}\mathbf {z}_k-\mathbf {b}),\quad k<l. \end{aligned}$$
(44)

Notice that \(\mathbf {P}_1^1=\mathbf {E}_0^{-1}\), while \(\mathbf {P}_1^l\mathbf {b}\rightarrow \mathbf {E}^{-1}\mathbf {b}\) as \(l\rightarrow \infty \). In fact, with similar arguments as in Lemma 2, it follows from (43) that

$$\begin{aligned} \Vert \mathbf {P}_1^l \mathbf {b}- \mathbf {E}^{-1}\mathbf {b}\Vert _{\mathbf {E}}\le (cg)^l \Vert \mathbf {E}^{-1}\mathbf {b}\Vert _\mathbf {E}, \end{aligned}$$
(45)

where \(\Vert \mathbf {x}\Vert _\mathbf {E}^2=\mathbf {x}^{\!\intercal }\mathbf {E}\mathbf {x}\). The next result asserts that this inexact realization of the preconditioner leads to a convergent scheme.
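A toy sketch of the truncated inner iteration (44) (NumPy; random SPD stand-ins, and a scaled identity plays the role of \(\mathbf {E}_0\)) illustrates that \(\mathbf {P}_1^l\mathbf {b}\) approaches \(\mathbf {E}^{-1}\mathbf {b}\) monotonically as l grows:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 15
B = rng.standard_normal((n, n))
E = B @ B.T + n * np.eye(n)            # SPD stand-in for E
E0 = np.linalg.norm(E, 2) * np.eye(n)  # crude stand-in for E0 with E <= E0
b = rng.standard_normal(n)

def apply_P1l(b, l):
    """z_l = P_1^l b via l steps of the inner iteration (44)."""
    z = np.zeros_like(b)
    for _ in range(l):
        z = z - np.linalg.solve(E0, E @ z - b)
    return z

exact = np.linalg.solve(E, b)
errs = [np.linalg.norm(apply_P1l(b, l) - exact) for l in (1, 10, 100)]
assert errs[0] > errs[1] > errs[2]     # error shrinks with more inner steps
assert np.allclose(apply_P1l(b, 400), exact)
```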

Lemma 7

Let \(l\ge 1\) be fixed. The iteration (33) with preconditioner \(\mathbf {P}_1=\mathbf {P}_1^l\) defines a convergent sequence, i.e., (35) holds with \(\eta \le c\) and \(\eta \) as in (36).

Proof

Observing that \(\mathbf {P}_1^l = \sum _{k=0}^{l-1} (\mathbf {E}_0^{-1}(\mathbf {E}_0-\mathbf {E}))^k\mathbf {E}_0^{-1}\) and that each term in the sum is symmetric and positive semi-definite for \(k>0\) and positive definite for \(k=0\), it follows that \(\mathbf {P}_1^l\) is symmetric positive definite. Using (43), we deduce that the sum converges as a Neumann series to \(\mathbf {E}^{-1}\) as \(l\rightarrow \infty \). Hence, it follows that

$$\begin{aligned} \mathbf {x}^{\!\intercal } \mathbf {E}_0^{-1}\mathbf {x}\le \mathbf {x}^{\!\intercal } \mathbf {P}_1^{l}\mathbf {x}\le \mathbf {x}^{\!\intercal } \mathbf {E}^{-1}\mathbf {x}\end{aligned}$$
(46)

for all \(\mathbf {x}\in \mathbb {R}^{n_S^+n_R^+}\), which implies that \(\mathbf {x}^{\!\intercal } \mathbf {E}\mathbf {x}\le \mathbf {x}^{\!\intercal } (\mathbf {P}_1^{l})^{-1}\mathbf {x}\le \mathbf {x}^{\!\intercal } \mathbf {E}_0\mathbf {x}\) and, in turn,

$$\begin{aligned} (1-c)\mathbf {x}^{\!\intercal } (\mathbf {P}_1^{l})^{-1}\mathbf {x}\le \mathbf {x}^{\!\intercal } (\mathbf {E}-\mathbf {K})\mathbf {x}\le \mathbf {x}^{\!\intercal } (\mathbf {P}_1^{l})^{-1}\mathbf {x}, \end{aligned}$$
(47)

where we used Lemma 4. The assertion then follows as in Sect. 3.

Remark 4

On the one hand, inspecting (47) we observe that the condition number of \(\mathbf {P}_1^l(\mathbf {E}-\mathbf {K})\), and, similarly, of \(\mathbf {E}^{-1}(\mathbf {E}-\mathbf {K})\), is bounded by \((1-c)^{-1}\), which is not robust in scattering dominated regimes \(c\rightarrow 1\); cf. also Lemma 4. On the other hand, combining the second inequality in (46) with (45), we obtain, as in Lemma 1, that

$$\begin{aligned} (1-(cg)^l)\mathbf {x}^{\!\intercal } (\mathbf {P}_1^{l})^{-1}\mathbf {x}\le \mathbf {x}^{\!\intercal } \mathbf {E}\mathbf {x}\le \mathbf {x}^{\!\intercal } (\mathbf {P}_1^{l})^{-1}\mathbf {x}, \end{aligned}$$

which shows that the condition number of \(\mathbf {P}_1^l\mathbf {E}\) is bounded by \((1-(cg)^l)^{-1}\), which, for fixed \(g<1\), is robust for \(c\rightarrow 1\).

5.4 Implementation of the Subspace Correction

The optimal subspaces for the correction (18) are constructed from the eigenfunctions associated with the largest eigenvalues of the generalized eigenproblem (20), as can be seen from the proof of Lemma 3. The iterative computation of these eigenfunctions is, however, computationally expensive. Instead, we employ a different, computationally efficient tensor product construction based on discrete counterparts of low-order spherical harmonics expansions, as motivated in Sect. 3.4. More precisely, the subspace for the correction is defined as \(\mathbb {W}_{h,N}^+=\mathbb {P}_{0,N}(\mathcal {T}_h^S)\otimes \mathbb {P}_1^c(\mathcal {T}_h^R)\), where \(\mathbb {P}_{0,N}(\mathcal {T}_h^S)\subset \mathbb {P}_{0}(\mathcal {T}_h^S)\) is the space spanned by the eigenfunctions associated with the \(d_N=(N+1)(N+2)/2\) largest eigenvalues of the generalized eigenvalue problem

$$\begin{aligned} \varvec{\mathsf {S}}\!^{+}\varvec{\mathsf {W}}\!^{+}= \varvec{\mathsf {M}}\!^{+}\varvec{\mathsf {W}}\!^{+}\varvec{{\Lambda }}^+ \end{aligned}$$

for the scattering operator, mimicking (7) after discretization. Note that \(d_N\) with N even is the number of even spherical harmonics of order less than or equal to N, and \(\mathbb {P}_{0,N}(\mathcal {T}_h^S)\) approximates their span. Denote by \(\varvec{\mathsf {W}}\!^{+}_N\) the corresponding matrix of coefficient vectors. The subspace \(\mathbb {W}^+_{h,N}\) is spanned by the columns of the matrix \(\mathbf {W}^{\!+}=\varvec{\mathsf {W}}\!^{+}_N\otimes \varvec{\mathfrak {I}}\!^{+}\). At the discrete level, the correction equation (18) thus reads

$$\begin{aligned} \big ({\mathbf {W}^{\!+}}\!^{\intercal }(\mathbf {E}-\mathbf {K}^{\!+})\mathbf {W}^{\!+}\big ) \mathbf {u}_c={\mathbf {W}^{\!+}}\!^{\intercal }((\mathbf {E}-\mathbf {K}^{\!+})\mathbf {P}_1-\mathbf {I})((\mathbf {E}-\mathbf {K}^{\!+})\mathbf {u}_n-\mathbf {q}). \end{aligned}$$
(48)

The efficient assembly of the matrix on the left-hand side relies on the tensor product structure of \(\mathbf {K}^{\!+}\) and the choice of \(\varvec{\mathsf {W}}\!^{+}_N\) as outlined in the following. A simple and direct representation of the scattering operator on \(\mathbb {W}^+_{h,N}\) is obtained by

$$\begin{aligned} {\mathbf {W}^{\!+}}\!^{\intercal }\mathbf {K}^{\!+}\mathbf {W}^{\!+}= \varvec{{\Lambda }}_N^+\otimes \varvec{\mathfrak {M}}\!^{+}_s. \end{aligned}$$

Similarly, we have that \({\mathbf {W}^{\!+}}\!^{\intercal }\mathbf {M}^{\!+}\mathbf {W}^{\!+}=\varvec{\mathsf {I}}\!^{+}\otimes \varvec{\mathfrak {M}}\!^{+}_t\), and the block-diagonal structure of \(\mathbf {R}\) allows us to compute \({\mathbf {W}^{\!+}}\!^{\intercal }\mathbf {R}\mathbf {W}^{\!+}\) blockwise, i.e., the (i, j)th block-entry is given by

$$\begin{aligned} \sum _{k=1}^{n_S^+} \varvec{\mathfrak {R}}_k (\varvec{\mathsf {W}}\!^{+}_N(k,i)\varvec{\mathsf {W}}\!^{+}_N(k,j)) \end{aligned}$$

which requires \(O(n_S^+(n_R^+)^{(d-1)/d}d_N)\) multiplications. The efficient assembly of the remaining term \({\mathbf {W}^{\!+}}\!^{\intercal }\mathbf {A}\!^{\intercal }(\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\mathbf {A}\mathbf {W}^{\!+}\) relies on another eigenvalue decomposition, which diagonalizes \(\mathbf {M}^{\!-}-\mathbf {K}^{\!-}\) on the column range of \(\mathbf {A}\mathbf {W}^{\!+}\). The arguments are similar to those in Sect. 5.1 and we leave the details to the reader.

6 Full Algorithm and Complexity

For the convenience of the reader we provide here the full algorithm of our numerical scheme. To simplify the presentation we start with the application of \(\mathbf {E}\) as given in Algorithm 1 and the application of \(\mathbf {P}_1\) as given in Algorithm 2. The full preconditioned Richardson iteration (33) is outlined in Algorithm 3.

[Algorithm 1: application of \(\mathbf {E}\)]
[Algorithm 2: application of \(\mathbf {P}_1\)]
[Algorithm 3: preconditioned Richardson iteration (33)]

For the efficient implementation of these algorithms one may exploit that, except for \(\mathbf {R}\), all matrices have a tensor product structure, see (29)–(31), allowing for efficient storage in \({\mathcal {O}}(n_S^{\pm }+n_R^{\pm })\) or \({\mathcal {O}}(c_{H}n_S^{\pm }+n_R^{\pm })\) complexity by using their sparsity or their \(\mathcal {H}^2\)-matrix representation. Here, \(c_{H}\) is a constant related to the compression pattern of the \(\mathcal {H}^2\)-matrix. The storage requirements and application of \(\mathbf {R}\) have complexity \({\mathcal {O}}(n_S^+(n_R^+)^{(d-1)/d})\). The relation (37) then allows for an efficient application of all matrices occurring in (28) in \({\mathcal {O}}(n_S^{\pm }n_R^{\pm })\) or \({\mathcal {O}}(c_{H}n_S^{\pm }n_R^{\pm })\) operations. Since the solution vector itself has size \(n_S^+n_R^+\), see also (25), and since \(3n_S^+=n_S^-\) and \(n_R^+\sim n_R^-\), all matrices appearing in (28) can be stored and applied with linear complexity.

In the following we elaborate on the algorithmic complexities of Algorithms 1–3 in more detail.

6.1 Complexity of Applying \(\mathbf {E}\)

The listing of Algorithm 1 directly indicates that the main effort in applying \(\mathbf {E}\) lies in the preconditioned conjugate gradient method used to apply \((\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\). From Lemma 5, we obtain that \((\mathbf {M}^{\!-}-\mathbf {K}^{\!-}_N)^{-1}(\mathbf {M}^{\!-}-\mathbf {K}^{\!-})\) can be applied in \({\mathcal {O}}((d_N+c_{H}) n_S^- n_R^-)\) operations, while its condition number is \((1-(cg)^{N+2})^{-1}\). Hence, for \(cg\approx 1\), the preconditioned conjugate gradient method with a fixed tolerance applies \((\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\) in a number of iterations proportional to \((1-(cg)^{N+2})^{-1/2}\). The overall complexity for applying \((\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\), and thus also \(\mathbf {E}\), is then \({\mathcal {O}}((d_N+c_{H}) n_S^- n_R^-/(1-(cg)^{N+2})^{1/2})\). We note that typically \(d_N\ll c_{H}\) for moderate N.
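The structure of this inner solve, conjugate gradients on \(\mathbf {M}-\mathbf {K}\) preconditioned by a truncated operator \(\mathbf {M}-\mathbf {K}_N\), can be sketched as follows. The matrices below are synthetic stand-ins with the right qualitative structure (diagonal SPD mass part, symmetric low-rank scattering part, and a further truncation of it); this is not the paper's implementation.

```python
import numpy as np

# Sketch (synthetic stand-ins, not the paper's matrices): apply (M - K)^{-1}
# by conjugate gradients preconditioned with the truncated operator M - K_N.
def pcg(A, b, apply_Pinv, rtol=1e-12, maxiter=500):
    """Preconditioned conjugate gradient method for an SPD matrix A."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_Pinv(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= rtol * np.linalg.norm(b):
            return x, it
        z = apply_Pinv(r)
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return x, maxiter

rng = np.random.default_rng(1)
n = 300
M = np.diag(2.0 + rng.random(n))               # mass-like SPD part
U = rng.standard_normal((n, 12)) / np.sqrt(n)
K = 0.5 * (U @ U.T)                            # symmetric low-rank "scattering"
K_N = 0.5 * (U[:, :6] @ U[:, :6].T)            # truncated approximation of K

P = M - K_N                                    # preconditioner; factor once in practice
b = rng.standard_normal(n)
x, its = pcg(M - K, b, lambda r: np.linalg.solve(P, r))
assert np.linalg.norm((M - K) @ x - b) <= 1e-9 * np.linalg.norm(b)
```

Because the preconditioned operator differs from the identity only by a well-conditioned correction, the iteration count stays small and independent of the problem size, mirroring the \((1-(cg)^{N+2})^{-1/2}\) bound above.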

6.2 Complexity of Applying the Preconditioner \(\mathbf {P}_1^l\)

\(\mathbf {P}_1^l\) consists of \(l-1\) applications of \(\mathbf {E}\) and l applications of \(\mathbf {E}_0^{-1}\). Since \(\mathbf {E}_0\) is block-diagonal with \(n_S^+\) sparse blocks of size \(n_R^+\times n_R^+\), the application of \(\mathbf {E}_0^{-1}\) can be performed in \({\mathcal {O}}(n_S^+ (n_R^+)^\gamma )\) operations if the inversion of each block has \({\mathcal {O}}((n_R^+)^\gamma )\) complexity. This amounts to a complexity of \({\mathcal {O}}(l (d_N+c_{H}) n_S^+ n_R^+/(1-(cg)^{N+2})^{1/2}+ l n_S^+ (n_R^+)^\gamma )\) for the application of \(\mathbf {P}_1^l\). For moderate N, the subspace correction amounts to solving an elliptic system that is reminiscent of an order N spherical harmonics approximation, which can be solved efficiently with a conjugate gradient method preconditioned by a V-cycle geometric multigrid with a Gauss–Seidel smoother, cf. [3].

Let us also remark that each diagonal block of \(\mathbf {E}_0\) discretizes an anisotropic diffusion problem with a diffusion tensor \(\sigma _t^{-1}\int _{K_S} s\cdot s^\intercal ds\) for \(K_S\in \mathcal {T}_h^S\). The results reported in [29] indicate that such problems can be treated efficiently by multigrid methods with line smoothing, allowing for \(\gamma =1\). A full analysis in the present context is beyond the scope of this paper, but any method that achieves \(\gamma =1\) allows one to perform one step of the Richardson iteration (33) with linear complexity in the dimension of the solution vector. Sparse direct solvers, although yielding \(\gamma >1\), may work well in practice, too, cf. Table 9.
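The blockwise application of \(\mathbf {E}_0^{-1}\) can be sketched as follows: each block is factorized once offline, and the application then reduces to independent sparse solves, one per angular cell. The block contents below are placeholder one-dimensional diffusion stencils, not the actual anisotropic diffusion discretizations.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Sketch (assumed structure, placeholder blocks): E_0 is block diagonal with
# n_S^+ sparse blocks of size n_R^+ x n_R^+; factoring each block once makes
# the application of E_0^{-1} a sequence of independent sparse solves.
n_S, n_R = 4, 50
off = -np.ones(n_R - 1)
blocks = [sp.diags([off, (2.0 + k) * np.ones(n_R), off], [-1, 0, 1], format="csc")
          for k in range(n_S)]                 # placeholder 1D diffusion stencils
factors = [splu(B) for B in blocks]            # offline: one factorization per block

def apply_E0_inv(v):
    out = np.empty_like(v)
    for k, lu in enumerate(factors):           # trivially parallel over blocks
        out[k * n_R:(k + 1) * n_R] = lu.solve(v[k * n_R:(k + 1) * n_R])
    return out

v = np.ones(n_S * n_R)
w = apply_E0_inv(v)
E0 = sp.block_diag(blocks, format="csc")
assert np.allclose(E0 @ w, v)
```

Replacing `splu` by a multigrid cycle for each block is what would reduce the per-block cost from \({\mathcal {O}}((n_R^+)^\gamma )\) with \(\gamma >1\) to \(\gamma =1\).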

6.3 Complexity of the Overall Iteration

We begin by remarking that the truncated eigendecompositions of the smaller matrices \(\varvec{\mathsf {S}}\!^{+}\) and \(\varvec{\mathsf {S}}\!^{-}\) can be obtained by a few iterations of an iterative eigensolver. Once this is done, the reduced matrix \(\mathbf {E}_c\) can be computed in \({\mathcal {O}}(n_S^+n_R^+d_N)\) operations, see Sect. 5.4. Thus, the offline cost for the construction of the preconditioners is \({\mathcal {O}}(n_S^+n_R^+d_N)\). The discussion of the application of \(\mathbf {E}\) and \(\mathbf {P}_1\) shows that a single iteration of Algorithm 3 can be accomplished in \({\mathcal {O}}(l (d_N+c_{H}) n_S^+ n_R^+/(1-(cg)^{N+2})^{1/2}+ l n_S^+ (n_R^+)^\gamma )\) operations.

Let us remark that, in the case \(\gamma =1\), each iteration has linear complexity and can be implemented such that it offers perfect parallel weak scaling in \(n_S^+n_R^+\) as long as the number of processors is bounded by \(n_S^+\) and \(n_R^+\). To see this, note that, with \(\mathbf {R}\) being the only exception, we rely only on matrix–vector products with matrices having tensor-product structure (or sums thereof). Using the identity (37), these operations offer the promised weak scaling when the corresponding matrix–matrix products are parallelized over the rows and columns of the middle matrix. The matrix \(\mathbf {R}\) does not directly provide such a structure, but its block-diagonal structure, cf. (31), allows for a perfectly weakly scaling implementation as well.

In summary, each step of (33) can be executed very efficiently with straightforward parallelization. In the next section we show numerically that the number of iterations required to decrease the error below a given threshold is small already for small values of l and N.

7 Numerical Realization and Examples

We present the performance of the proposed iterative schemes using a lattice-type problem [10], see Fig. 1. Here, \(R=(0,7)\times (0,7)\), the inflow boundary source is \(f=0\), and \(c= \Vert \sigma _s/\sigma _t\Vert _\infty \approx 0.999\). The coarsest triangulation of the sphere consists of 128 elements, i.e., \(n_S^+=64\), and the spatial domain is discretized with \(n_R^+=3249\) vertices. Finer meshes are obtained by uniform refinement; the new grid points of \(\mathcal {T}_h^S\) are projected onto the sphere. To minimize consistency errors, we use higher-order integration rules for the spherical integrals.

Fig. 1
figure 1

Left: geometry of the lattice problem. The optical parameters are \(\sigma _s=10\) and \(\sigma _a=0.01\) in the white and grey regions, and \(\sigma _s=0\) and \(\sigma _a=1\) in the black regions; \(q=1\) in the grey region and \(q=0\) outside of it. Right: sketch of the spherical grid

The timings are performed on a dual AMD EPYC 7742 system with 128 cores and 1024 GB of memory.

7.1 Application of \((\mathcal {M}-\mathcal {K})^{-1}\)

We show that \((\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\) can be applied efficiently and robustly in g. To that end, we implemented a preconditioned conjugate gradient method with preconditioner \(\mathbf {M}^{\!-}-\mathbf {K}^{\!-}_N\), see Sect. 5.1. Table 1 shows the iteration counts required to achieve a relative error below \(10^{-13}\). For all g, the iteration counts decrease with N, as predicted by the considerations in Sect. 6. In particular, for \(g=0\) we have \(\mathbf {K}^{\!-}=\mathbf {K}^{\!-}_N=0\), and only one iteration is needed for convergence. Moreover, we see that, although increasing N increases the workload per iteration, the overall solution time can decrease; this is due to the fact that the scattering operator dominates the computational cost for moderate \(d_N\), see Sect. 6. In the remainder of the paper, we employ \(N=5\), which yields fast convergence for the considered values of g.

Table 1 Iteration counts (timings in sec.) for the application of \((\mathbf {M}^{\!-}-\mathbf {K}^{\!-})^{-1}\) using a preconditioned CG method with preconditioner \(\mathbf {M}^{\!-}-\mathbf {K}^{\!-}_N\) and tolerance \(10^{-13}\) for \(n_S^+=256\) and \(n_R^+=12{,}769\)
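The decay behind these iteration counts can be illustrated directly on the kernel: the Henyey–Greenstein phase function (2) has the Legendre expansion \(k(t)=\frac{1}{4\pi }\sum _{n\ge 0}(2n+1)g^n P_n(t)\), so truncating at degree N, as in the preconditioner \(\mathbf {M}^{\!-}-\mathbf {K}^{\!-}_N\), leaves a remainder of order \(g^{N+1}\). The following Python sketch (our own illustration, independent of the paper's code) verifies this numerically for \(g=0.5\).

```python
import numpy as np
from numpy.polynomial import legendre

# Sketch: the Henyey-Greenstein kernel satisfies
# k(t) = (1/4pi) sum_{n>=0} (2n+1) g^n P_n(t); truncating at degree N leaves
# a remainder bounded by the series tail, since |P_n(t)| <= 1 on [-1, 1].
g = 0.5

def hg(t):
    return (1.0 - g**2) / (4.0 * np.pi * (1.0 - 2.0 * g * t + g**2) ** 1.5)

t = np.linspace(-1.0, 1.0, 2001)
for N in (1, 3, 5, 7):
    n = np.arange(N + 1)
    coeffs = (2 * n + 1) * g**n / (4.0 * np.pi)    # Legendre coefficients up to N
    err = np.max(np.abs(hg(t) - legendre.legval(t, coeffs)))
    tail = sum((2 * m + 1) * g**m for m in range(N + 1, 200)) / (4.0 * np.pi)
    assert err <= tail + 1e-12                     # remainder bounded by the tail
```

The tail is geometric in g, which explains both the rapid convergence of the preconditioned CG iteration for moderate g and its slowdown as \(g\rightarrow 1\).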

7.2 Convergence Rates

We study the norm \(\eta \) of the iteration matrix \((\mathbf {I}^{\!+}-\mathbf {P}_G)(\mathbf {I}^{\!+}-\mathbf {P}_1^l(\mathbf {E}-\mathbf {K}^{\!+}))\) defined in (36) and its spectral radius

$$\begin{aligned} \rho = \max \{|\lambda |:\,\, \lambda \text { is an eigenvalue of } (\mathbf {I}^{\!+}-\mathbf {P}_G)(\mathbf {I}^{\!+}-\mathbf {P}_1^l(\mathbf {E}-\mathbf {K}^{\!+}))\} \end{aligned}$$

for different choices of preconditioners \(\mathbf {P}_1=\mathbf {P}_1^l\), anisotropy factors g and dimensions \(d_N\) chosen for the subspace correction. Since \(\mathbf {P}_G\) is a projection, we have that

$$\begin{aligned} (\mathbf {I}^{\!+}-\mathbf {P}_G)^\intercal (\mathbf {E}-\mathbf {K}^{\!+}) (\mathbf {I}^{\!+}-\mathbf {P}_G) = (\mathbf {E}-\mathbf {K}^{\!+})(\mathbf {I}^{\!+}-\mathbf {P}_G). \end{aligned}$$

Therefore, \(\eta ^2\) is the largest eigenvalue of the eigenvalue problem

$$\begin{aligned} (\mathbf {I}^{\!+}-\mathbf {P}_1^l(\mathbf {E}-\mathbf {K}^{\!+}))(\mathbf {I}^{\!+}-\mathbf {P}_G)(\mathbf {I}^{\!+}-\mathbf {P}_1^l(\mathbf {E}-\mathbf {K}^{\!+})) \mathbf {w}=\lambda \mathbf {w}. \end{aligned}$$

We use Matlab's eigs function to compute \(\rho \) and \(\eta \) with tolerance \(10^{-7}\) and a maximum of 300 iterations.
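In a Python setting, the same quantities can be computed matrix-free with SciPy's iterative eigensolvers, the analogue of Matlab's eigs; in the sketch below the iteration matrix is a random stand-in, and only its action on vectors is needed.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigs, eigsh

# Sketch of the matrix-free computation of rho and eta; T_dense is a random
# stand-in for the iteration matrix, wrapped so only matvecs are used.
rng = np.random.default_rng(2)
n = 300
T_dense = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
T = LinearOperator((n, n), matvec=lambda v: T_dense @ v, dtype=np.float64)

lam = eigs(T, k=1, which="LM", return_eigenvectors=False, tol=1e-7)
rho = abs(lam[0])                               # spectral radius

# eta^2 is the largest eigenvalue of the symmetric problem T^T T; here the
# identity replaces the energy inner product, purely for illustration.
TtT = LinearOperator((n, n), matvec=lambda v: T_dense.T @ (T_dense @ v),
                     dtype=np.float64)
eta = np.sqrt(eigsh(TtT, k=1, which="LA", return_eigenvectors=False, tol=1e-7)[0])
assert rho <= eta + 1e-8                        # spectral radius bounded by the norm
```

In the paper's setting the symmetric eigenvalue problem carries the energy inner product induced by \(\mathbf {E}-\mathbf {K}^{\!+}\), as in the eigenvalue problem displayed above.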

For the isotropic case \(g=0\), we have \(\mathbf {P}_1^l=\mathbf {E}_0^{-1}=\mathbf {E}^{-1}\), i.e., \(\rho \) and \(\eta \) do not depend on l. For \(N=0\), Table 2 shows that the values of \(\eta \) and \(\rho \) are essentially independent of the discretization parameters, see also [40]. We observed numerically that choosing \(N\in \{2,4\}\) improves the values of \(\rho \) and \(\eta \) only slightly.

Table 2 Values of \(\rho \) and \(\eta \) of the iteration matrix for \(g=0\) and different angular grids

In the next experiments, we vary g from 0.1 to 0.9 in steps of 0.2. Tables 3, 4, 5, 6 and 7 display the corresponding values of \(\rho \) and \(\eta \). For these anisotropic cases, the number of iterations l used to realize the preconditioner \(\mathbf {P}_1^l\), as well as the dimension \(d_N\) defined in Sect. 5.4, play an important role. For all combinations of \(d_N\) and l, we observe convergent behavior with \(\eta \le c<1\), which is in line with Lemma 7. The values of \(\rho \) and \(\eta \) decrease substantially with increasing \(d_N\), which is in line with the motivation of Sect. 3.4, while, for fixed \(d_N\), a saturation in l can be observed. For \(d_N\) sufficiently large, it seems that \(\rho =\eta =g^{l}\), see, e.g., Table 6 for \(d_4\) and \(1\le l\le 4\). We conclude that we can achieve very good convergence rates for moderate values of \(d_N\) and l if they are combined appropriately.

Table 3 Values of \(\rho \) and \(\eta \) for \(g=0.1\) and different values of \(d_N\) and l to realize \(\mathbf {P}_1^l\)
Table 4 Values of \(\rho \) and \(\eta \) for \(g=0.3\) and different values of \(d_N\) and l to realize \(\mathbf {P}_1^l\)
Table 5 Values of \(\rho \) and \(\eta \) for \(g=0.5\) and different values of \(d_N\) and l to realize \(\mathbf {P}_1^l\)
Table 6 Values of \(\rho \) and \(\eta \) for \(g=0.7\) and different values of \(d_N\) and l to realize \(\mathbf {P}_1^l\)
Table 7 Values of \(\rho \) and \(\eta \) for \(g=0.9\) and different values of \(d_N\) and l to realize \(\mathbf {P}_1^l\)

7.3 \(\mathcal {H}^2\)-Matrix Approximation of \(\mathcal {S}\)

We demonstrate the \(\mathcal {H}^2\)-compressibility of the scattering operator \(\mathcal {S}\). Since every \(\mathcal {H}^2\)-matrix can be represented as an \(\mathcal {H}\)-matrix, this also demonstrates the compressibility of \(\mathcal {S}\) by means of \(\mathcal {H}\)-matrices. For the implementation, we use a MEX interface to include the library H2Lib [9] in our Matlab implementation.

For the numerical experiments themselves, we choose \(g=0.5\) and use the same quadrature formula in our Matlab implementation and in our implementation within H2Lib. The compression algorithm of \(\textsc {H2Lib}\) uses multivariate polynomial interpolation, which requires the extension of the Henyey–Greenstein kernel as in (42). The compression parameters are set to an admissibility parameter \(\eta _{H}=1.4\), \(p=4\) interpolation points per interval, and a minimal block size parameter \(n_{\min }=64\), see [8, 25]. We also tested an implementation without the need for an extension within the Bembel library [15], which yields similar results but requires a finite element discretization on quadrilaterals rather than triangles. In both cases, the differences between the dense and the compressed scattering matrix are below the discretization error.

Table 8 lists the memory requirements, the setup time, and the time for a single matrix–vector multiplication of \(\varvec{\mathsf {S}}\!^{+}\) in dense and in \(\mathcal {H}^2\)-compressed form. We clearly observe the quadratic complexity for storage and matrix–vector multiplication of the dense matrices and the asymptotically linear complexity of the \(\mathcal {H}^2\)-matrices. The scaling of the assembly times for dense and \(\mathcal {H}^2\)-matrices seems to be worse than predicted by theory, which is possibly caused by memory effects. Nevertheless, the assembly times of the \(\mathcal {H}^2\)-matrices scale much better than those of the dense matrices.

Table 8 Memory consumption in MB and timings in sec. for assembly and matrix–vector multiplication of \(\varvec{\mathsf {S}}\!^{+}\) and the corresponding \(\mathcal {H}^2\)-matrix approximation \(\overline{\varvec{\mathsf {S}}\!^{+}}\) for \(g=0.5\)

7.4 Benchmark Example

We demonstrate the viability of the preconditioned Richardson iteration (33) by some larger computations. We fix \(g=0.5\) and solve the even-parity equations (32) for the lattice problem. We fix \(l=4\) steps to realize the preconditioner \(\mathbf {P}_1^l\) and \(N=4\), i.e., we use \(d_4=15\) eigenfunctions of \(\varvec{\mathsf {S}}\!^{+}\) for the subspace correction, cf. Sect. 5.4. In view of Table 5, we expect a contraction rate \(\eta \approx 0.16\). Therefore, in order to achieve an error bound \(\Vert \mathbf {u}^{\!+}-\mathbf {u}^{\!+}_{n}\Vert _{\mathbf {E}-\mathbf {K}^{\!+}}<10^{-8}\), we expect to require \(n\approx 10\) iterations. In our implementation, we choose \(\mathbf {u}^{\!+}_0=0\), and we stop the iteration at the first index n for which

$$\begin{aligned} \Vert \mathbf {u}^{\!+}_{n} - \mathbf {u}^{\!+}_{n-1}\Vert _{\mathbf {E}-\mathbf {K}^{\!+}}<10^{-8} \Vert \mathbf {u}^{\!+}_{1}\Vert _{\mathbf {E}-\mathbf {K}^{\!+}}. \end{aligned}$$
(49)

Note that, assuming a contraction rate \(\eta =0.16\), Banach's fixed point theorem asserts that the error satisfies \(\Vert \mathbf {u}^{\!+}-\mathbf {u}^{\!+}_{n}\Vert _{\mathbf {E}-\mathbf {K}^{\!+}} \le 0.2 \Vert \mathbf {u}^{\!+}_{n} - \mathbf {u}^{\!+}_{n-1}\Vert _{\mathbf {E}-\mathbf {K}^{\!+}}\). The dimension of the problem on the finest grid is \(n_R^+n_S^+=207{,}360{,}000\), i.e., storing the solution vector requires 1.5 GB of memory. Note that the corresponding dimension of the solution vector of the mixed system is about \(1.5\times 10^9\). Motivated by Table 8, we implement the scattering operators \(\varvec{\mathsf {S}}\!^{+}\) and \(\varvec{\mathsf {S}}\!^{-}\) using dense matrices in this example. The application of \(\mathbf {E}_0^{-1}\) is implemented with Matlab's sparse LU factorization, i.e., here, \(\gamma \le 1.5\) in the complexity estimates of Sect. 6.
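The interplay of the stopping criterion (49) and the a posteriori bound from Banach's fixed point theorem can be sketched for a generic contraction as follows; the iteration map is a synthetic stand-in for one preconditioned Richardson step, and the Euclidean norm replaces the energy norm for illustration.

```python
import numpy as np

# Sketch: fixed point iteration u_n = T u_{n-1} + c with contraction rate
# eta = 0.16, stopped by the relative criterion (49), together with the
# a posteriori bound ||u* - u_n|| <= eta/(1-eta) ||u_n - u_{n-1}||.
rng = np.random.default_rng(3)
n = 100
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
T = Q @ np.diag(0.16 * rng.random(n)) @ Q.T     # symmetric stand-in, ||T|| <= 0.16
c = rng.standard_normal(n)
u_star = np.linalg.solve(np.eye(n) - T, c)      # exact fixed point

u_prev = np.zeros(n)
u = T @ u_prev + c                               # first iterate u_1
ref = np.linalg.norm(u)                          # reference norm ||u_1||
n_it = 1
while np.linalg.norm(u - u_prev) >= 1e-8 * ref:  # stopping criterion (49)
    u_prev, u = u, T @ u + c
    n_it += 1

eta = 0.16
bound = eta / (1.0 - eta) * np.linalg.norm(u - u_prev)   # ~0.2 * ||u_n - u_{n-1}||
assert np.linalg.norm(u_star - u) <= bound + 1e-12
```

With \(\eta =0.16\), roughly twelve iterations suffice to drive the increment below \(10^{-8}\Vert u_1\Vert \), consistent with the expectation \(n\approx 10\) above.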

Fig. 2
figure 2

\(\log _{10}\)-plot of the spherical average of the numerical solution \(\mathbf {u}^+\) to the benchmark problem as in Sect. 7.4 for \(n_S^+=1024\) and \(n_R^+=12{,}769\)

Figure 2 shows, as an example, the spherical average of the computed solution for \(n_S^+=1024\) and \(n_R^+=12{,}769\). Table 9 displays the iteration counts and timings for different grid refinements. We observe mesh-independent convergence behavior of the iteration, which matches the theoretical estimate \(n\approx 10\) well. Furthermore, the computation time scales like \((n_R^+)^{1.3}\) for fixed \(n_S^+\). When \(n_S^+\) increases from 1024 to 4096, the superlinear growth in computation time can be explained by the use of dense matrices for \(\varvec{\mathsf {S}}\!^{+}\) and \(\varvec{\mathsf {S}}\!^{-}\); as shown in Table 8, this can be remedied by using the compressed scattering operators.

Table 9 Iteration index n (timings in sec.) such that (49) holds for the benchmark example.

8 Conclusions

We have presented efficient preconditioned Richardson iterations for anisotropic radiative transfer that are provably convergent and show robust convergence in the optical parameters, including forward-peaked scattering and heterogeneous absorption and scattering coefficients. This has been achieved by employing black-box matrix compression techniques to handle the scattering operator efficiently and by constructing appropriate preconditioners. In particular, we have shown that, for anisotropic scattering, subspace corrections constructed from low-order spherical harmonics expansions considerably improve the convergence of our iteration.

On the discrete level, our preconditioners can be obtained algebraically from the matrices of any FEM code that provides the matrices of the mixed system (28). We discussed further implementation details and their computational complexity, and several numerical tests showed the efficiency of our method. If a solver with linear computational complexity for anisotropic elliptic problems is employed to realize \(\mathbf {E}_0^{-1}\), each single iteration of our scheme has linear computational complexity in the discretization parameters. Our numerical examples employed low-order polynomials for discretization, but the presented methodology directly applies to high-order polynomial approximations as well.

Let us mention that the saddle-point problem (4) may also be solved using the MINRES algorithm after appropriate multiplication of the second equation by \(-1\). In view of the inf-sup theory for (8)–(9) given in [17], block-diagonal preconditioners with blocks \(\mathbf {E}-\mathbf {K}^{\!+}\) and \(\mathbf {M}^{\!-}-\mathbf {K}^{\!-}\) lead to robust convergence behavior [50, Section 5.2], but the efficient inversion of \(\mathbf {E}-\mathbf {K}^{\!+}\) is as difficult as solving the even-parity equations, which has been considered in this paper.

Our subspace correction approach can also be related to multigrid schemes [51], and we refer to [31, 34, 42] and the references therein in the context of radiative transfer. Compared to nonsymmetric Krylov space methods, such as GMRES or BiCGStab, see [1, 6, 49] and the references therein, our approach is very memory efficient and monotone convergence behavior is guaranteed. Moreover, in view of its good convergence rates, the preconditioned Richardson iteration presented here is competitive with these multilevel and Krylov space methods.