1 Introduction

Diffusive optical tomography (DOT) is a promising imaging technique with many clinical applications, such as breast cancer screening and cerebral imaging [18, 19, 22]. In this modality, an input flux of near-infrared (NIR) photons illuminates the body and the output flux is measured on its surface. Chromophores in the NIR window, such as oxygenated and deoxygenated hemoglobin, water, and lipid, are abundant in body tissue, and a weighted sum of their contributions determines the local absorption coefficient [13]. The measured pairs of fluxes then provide information for detecting and reconstructing the optical properties inside the body by imaging the distribution of the absorption coefficient. Let the absorbing medium occupy an open bounded connected domain \(\Omega \subseteq {\mathbb {R}}^2\) with a piecewise \(C^2\) boundary, and let D be a subdomain of \(\Omega \) representing the inhomogeneous inclusions. The absorption coefficient of the medium in \(\Omega \) is described by a non-negative function \(\mu \in L^{\infty }(\Omega )\), where \(\mu _0\) is the absorption coefficient of the homogeneous background medium and the support of \(\mu -\mu _0\) occupies the subdomain D. We consider the DOT model in which the potential \(u\in H^1(\Omega )\), representing the photon density field, satisfies the following equations:

$$\begin{aligned} - \triangle u_{\omega } + \mu u_{\omega }&= 0 ~~~~ \text {in} ~~ \Omega , \end{aligned}$$
(1.1)
$$\begin{aligned} \frac{\partial u_{\omega }}{\partial {\textbf{n}}}&= g_{\omega } ~~~ \text {on} ~~ \partial \Omega , ~~~ \omega = 1,2,\dots ,N, \end{aligned}$$
(1.2)

where \(g_{\omega }\in H^{-1/2}(\partial \Omega )\) denotes the surface flux along \(\partial \Omega \), and \(f_{\omega } = u_{\omega }|_{\partial \Omega }\) is the measurement of the surface potential on the boundary. In this paper, we consider the Neumann boundary value problem for simplicity; the proposed algorithm can also be applied to the Robin boundary value problem. The overarching goal of the DOT problem is to recover the geometry of the inhomogeneous inclusions D from the N pairs of potential-flux data \((g_{\omega },f_{\omega })_{\omega =1}^N\), referred to as Cauchy data pairs. It is noted that the considered model can be viewed as a special case of the more general one in [7], which has more complicated jump conditions.

The inverse procedure in the DOT problem can be described by the well-known Neumann-to-Dirichlet (NtD) mapping

$$\begin{aligned} \Lambda _{\mu }: H^{-1/2}(\partial \Omega ) \rightarrow H^{1/2}(\partial \Omega ), ~~~ g \mapsto u|_{\partial \Omega }, \end{aligned}$$
(1.3)

where u is the solution of (1.1) with the Neumann boundary data \(g\in H^{-1/2}(\partial \Omega )\). It is known that the NtD mapping preserves all the information needed to recover \(\mu \) [7]; see also the discussion in Sect. 3. However, approximating the whole NtD mapping requires a large number of Cauchy data pairs, which hinders its application in many practical situations. To circumvent this limitation, in this work we focus on developing a reconstruction algorithm that requires only a reasonably small number of Cauchy data pairs.

Over the past decades, much effort has been devoted to solving the DOT problem, with many promising developments. One widely used class of algorithms consists of iterative methods. The augmented Lagrangian method [1] and the shape optimization methods [42, 43] reconstruct the inhomogeneous inclusions by minimizing a functional measuring the misfit between observed and simulated data. To alleviate the onerous computational cost caused by the necessity of using a fine mesh, multigrid methods [33, 38] have been proposed. Another standard approach formulates an integral equation involving the Green's function of the background absorption coefficient and solves it with Born-type iterations [37, 39] until convergence. Nonetheless, the major issue of these approaches is that a large number of forward PDEs need to be solved in the iterative process, which is time-consuming and computationally infeasible in many practical situations such as three-dimensional problems. Thus, for ultrasound-modulated DOT, the works in [2,3,4] provide hybrid methods capable of reconstructing the parameters in known inclusions with only a few iterations.

Another popular group of methods for solving the DOT problem consists of non-iterative methods. The joint sparsity methods [31, 32] solve the aforementioned integral equation non-iteratively, but an ill-conditioned matrix needs to be inverted and suitable preconditioners or regularization are required. The well-known factorization methods described in [7, 12, 28] compute the spectral information of the boundary data mapping and determine the inclusion shape by checking the convergence of a series involving the spectral data. Based on a nonlinear Fourier transform, the D-bar methods in [35, 36] reconstruct the absorption coefficient by solving a boundary integral equation. Nevertheless, the severe ill-posedness of the original inverse problem manifests itself in the computation of the boundary integral equations, which makes the computation strenuous. This is improved in [29, 30] by linearizing the reconstruction problem and reducing it to a well-posed boundary-value problem for a coupled system of elliptic equations that can be solved efficiently. Besides these classical approaches, deep neural network based algorithms for the DOT problem have undergone vast development, for instance, CNNs that learn the nonlinear photon scattering physics and reconstruct 3D optical anomalies [41], and FDU-Net, which rapidly recovers irregularly shaped inclusions with accurate localization for breast DOT [20]; detailed surveys can be found in the recent monographs [6, 40].

Recently, direct sampling methods (DSM) have emerged as very appealing non-iterative strategies for many geometric inverse problems. The critical component of DSM is the construction of an index function that indicates the shape and location of the inclusions through a duality product acting on the boundary data and some probing functions. The DSM has been proven to be highly robust, effective and efficient for many inverse problems, such as electrical impedance tomography (EIT) [16], inverse scattering [25, 26], moving potential reconstruction [17] and the inversion of the Radon transform for computed tomography [14]. For the DOT problem, the original DSM [15] designs an elegant index function for the case of a single measurement pair. However, a single measurement yields insufficient reconstruction accuracy in more complicated cases, which stymies its application in practical scenarios of medical imaging. Indeed, a closed-form explicit index function for multiple Cauchy data pairs or a complex-shaped domain may become very intricate, and deriving it with conventional mathematical approaches is very challenging.

On the one hand, deep learning has recently become an alternative approach to canonical mathematical derivation. On the other hand, it is worth mentioning that the index function of DSM can be regarded as a mapping from a specially generated data manifold to the inclusion distribution. The data manifold consists of data functions built up by solving the forward problem with the background coefficient, and this underlying mathematical structure serves as our crucial guideline for designing neural networks (NN) such that the complicated nonlinear mapping can be learned from data. In our previous work for EIT [23], we found that the corresponding data-driven index function can successfully capture the essential structure of the true index function, smooth out the noise and outperform the conventional DSM.

To address this barrier of the original DSM, in this work we develop a novel deep direct sampling method (DDSM) for solving the DOT problem based on a convolutional neural network (CNN). The main benefits of the proposed DDSM include:

  • It is easy to implement for 2D and particularly 3D cases which challenge many conventional approaches.

  • The index function in the DDSM is able to incorporate multiple Cauchy data pairs, which enhances the reconstruction quality and the robustness with respect to noise.

  • The DDSM inherits the high efficiency of DSM [15], which benefits from its offline-online decomposition structure. For given measurements, the reconstruction is based on the fast evaluation of the CNN-based index function in the online stage, which has almost the same speed as the original DSM.

  • Similar to DSM, constructing data functions by solving elliptic equations smooths out the noise on the boundary such that the resulting NN is highly robust against the noise.

  • It is capable of incorporating very limited boundary data points to achieve satisfactory reconstruction.

  • It can yield adequate reconstruction even if the absorption coefficients of the materials to be recovered are significantly different from those used for training, which is of much practical interest as only a very rough guess is needed.

The rest of the paper is organized as follows. In Sect. 2, we briefly review the original DSM. The development of our proposed DDSM and its theoretical justification are introduced in Sect. 3. The numerical experiments to validate the advantages of the DDSM are provided in Sect. 4. Finally, Sect. 5 summarizes the research findings.

2 Direct Sampling Methods

In this section, we briefly review the conventional DSM proposed by Chow et al. in [15] for DOT, which will serve as the framework guiding the design of suitable neural networks. The main idea is to approximate an index function satisfying

$$\begin{aligned} \mathcal {I}(x) = {\left\{ \begin{array}{ll} 1~&{}\quad \text {if} ~ x\in D, \\ 0~&{}\quad \text {if} ~ x \in \Omega \backslash {\overline{D}}, \end{array}\right. } \end{aligned}$$
(2.1)

which depends on the given Cauchy data. Since the key structure of the index function in [15] assumes only a single pair of Cauchy data, we follow this assumption in this section.

We begin by introducing a family of probing functions \(\eta _x(\xi )\), \(x\in \Omega \), defined on \(\partial \Omega \), which are the fundamental ingredients for both the uniqueness theory of DOT [7] and the DSM [15]. Consider the following diffusion equation with the homogeneous background absorption coefficient \(\mu _0\):

$$\begin{aligned} - \triangle w_x + \mu _0 w_x = \delta _x ~~~~ \text {in} ~ \Omega ; ~~~ w_x = 0 ~~~ \text {on}~ \partial \Omega , \end{aligned}$$
(2.2)

where \(\delta _x(\xi )\) is the delta function associated with \(x\in \Omega \). For a fixed point \(x \in \Omega \), the probing function \(\eta _{x}\) is defined as the surface flux of \(w_x\) over \( \partial \Omega \):

$$\begin{aligned} \eta _x(\xi ):= \frac{\partial w_x(\xi )}{\partial {\textbf{n}}}, ~~~ \xi \in \partial \Omega . \end{aligned}$$
(2.3)

An essential component of DSM is the duality product,

$$\begin{aligned} \langle \phi , \psi \rangle _{\partial \Omega ,s}: = \int _{\partial \Omega } (-\triangle _{\partial \Omega })^s\phi \, \psi ~ ds ~~~~ \text {on} ~H^{2s}(\partial \Omega )\times L^2(\partial \Omega ), \end{aligned}$$
(2.4)

where \((-\triangle _{\partial \Omega })^s\) is a fractional power of the surface Laplacian operator defined on \(\partial \Omega \). The index function can then be defined as

$$\begin{aligned} \mathcal {I}(x) = {c_0 (x)} \frac{ \langle \eta _x, f - \Lambda _{\mu _0}g \rangle _{\partial \Omega ,s} }{|\eta _x|_Y} \end{aligned}$$
(2.5)

where \(c_0(x)\) is a normalization factor specified later, and \(|\cdot |_Y\) is an algebraic combination of seminorms in \(H^{2s}(\partial \Omega )\) (which itself may not be a norm or seminorm); a typical choice in [15] is \(|\cdot |_Y = |\cdot |^{1/2}_{H^1(\partial \Omega )}|\cdot |^{3/4}_{L^2(\partial \Omega )}\). In the DSM for various geometric inverse problems [15, 16, 26], this duality product structure plays an important role in the index functions since it can effectively remove the errors contained in the data \(f - \Lambda _{\mu _0}g\) thanks to the high smoothness of \(\eta _x\) near \(\partial \Omega \).

Note that the form of the index function in (2.5) may not be computationally efficient since the probing functions \(\eta _x\) change with respect to x. Therefore, an alternative characterization of the index function is derived in [15], which is the foundation of our neural network. Let \(\varphi \) denote the solution of the following equation, which also involves only the background absorption coefficient:

$$\begin{aligned} - \triangle \varphi + \mu _0 \varphi = 0 ~~~~ \text {in} ~ \Omega ; ~~~ \varphi = -(-\triangle _{\partial \Omega })^s (f-\Lambda _{\mu _0}g)~~~ \text {on}~ \partial \Omega . \end{aligned}$$
(2.6)

Then the index function in (2.5) can be effectively computed through \(\varphi \). To show the relationship between the duality product and \(\varphi \), we provide a brief derivation for \(s=0\). Based on the equation (2.2), the definitions (2.3) and (2.4), and Green's formula, we have

$$\begin{aligned} \begin{aligned} \langle \eta _x, f - \Lambda _{\mu _0}g \rangle _{\partial \Omega ,0}&= \int _{\partial \Omega } \eta _x( f - \Lambda _{\mu _0}g )ds = -\int _{\partial \Omega } \frac{\partial w_x}{\partial {\textbf{n}}} \varphi ds \\&= \int _{\partial \Omega } w_x\frac{\partial \varphi }{\partial {\textbf{n}}} - \frac{\partial w_x}{\partial {\textbf{n}}} \varphi ds = \int _{\Omega } w_x \triangle \varphi - \varphi \triangle w_x dx \\&= \int _{\Omega } w_x (\mu _0 \varphi ) - \varphi \triangle w_x dx = \varphi (x). \end{aligned} \end{aligned}$$
(2.7)

The same result can be obtained for \(s=1\) in [15]. With (2.5) and (2.7), the index function can be equivalently rewritten as

$$\begin{aligned} \mathcal {I}(x) = c_0(x)\frac{ \varphi (x) }{|\eta _x|_Y}, \end{aligned}$$
(2.8)

where the typical choice for \(c_0\) is \((\Vert \varphi \Vert _{L^{\infty }(\Omega )}+ \varphi (x))^{-1}\) [15].

It is highlighted that the index function depends on the Cauchy data only through the function \(\varphi \), while the value of \(|\eta _x|_Y\) is based only on the geometry of \(\Omega \). This mathematical structure guides us to set \(\varphi \) as the input of the networks designed to approximate the index function, which will be detailed in the next section. Given the importance of \(\varphi \), we shall call it the Cauchy difference function in the following discussion. Note that \(\varphi \) is readily solvable and needs to be solved only once from (2.6) for a single pair of Cauchy data (f, g). Since there is only one PDE to solve in the reconstruction procedure, the cost is much lower than that of optimization-based methods. Our proposed DDSM inherits this advantage, namely only N PDEs need to be solved for N Cauchy data pairs.
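For concreteness, the following is a hedged sketch of computing the Cauchy difference function from (2.6) with \(s=0\) on a square domain; the five-point stencil, the data layout and all names are our illustrative assumptions rather than the authors' implementation, and the closing comment indicates how (2.8) could be assembled with the typical \(c_0\) from [15].

```python
# Hypothetical sketch: solve (2.6) with s = 0 on an n x n grid of a square,
# i.e. -Lap(phi) + mu0*phi = 0 with Dirichlet data phi = -(f - Lambda_{mu0} g).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def cauchy_difference(mu0, bdata, h):
    """bdata: dict of length-n arrays of f - Lambda_{mu0} g on the sides
    "bottom", "top", "left", "right"; h: mesh size."""
    n = bdata["bottom"].size
    k = lambda i, j: i * n + j
    A = sp.lil_matrix((n * n, n * n))
    b = np.zeros(n * n)
    for i in range(n):
        for j in range(n):
            if i in (0, n - 1) or j in (0, n - 1):     # Dirichlet boundary node
                if i == 0:       side, t = "bottom", j
                elif i == n - 1: side, t = "top", j
                elif j == 0:     side, t = "left", i
                else:            side, t = "right", i
                A[k(i, j), k(i, j)] = 1.0
                b[k(i, j)] = -bdata[side][t]           # phi = -(f - Lambda g)
            else:                                      # interior 5-point stencil
                A[k(i, j), k(i, j)] = 4.0 / h**2 + mu0
                for ii, jj in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                    A[k(i, j), k(ii, jj)] -= 1.0 / h**2
    phi = spla.spsolve(A.tocsr(), b).reshape(n, n)
    # Index (2.8), up to the geometric factor |eta_x|_Y, could then be formed
    # as phi / (np.abs(phi).max() + phi), following the c_0 choice from [15].
    return phi
```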

Besides \(\varphi \), the index function in (2.8) is determined by the probing functions \(\eta _x\) computed from (2.2) and (2.3). However, evaluating \(\eta _x\) requires solving (2.2) for every sampling point x, which may be inefficient. To address this issue, explicit formulas are provided in [15] for some specific domains; for instance, the probing function corresponding to the unit disk is

$$\begin{aligned} \eta _x(\xi ) = \frac{1}{2\pi } \sum _{n\in {\mathbb {Z}}} \frac{J_n(i\sqrt{\mu _0}r_x)}{J_n(i\sqrt{\mu _0})} e^{in(\theta _x-\theta _{\xi })}, ~~~ \xi \in {\mathbb {S}}^1, \end{aligned}$$
(2.9)

where \(J_n\) are the Bessel functions of the first kind, and \((r_x,\theta _x)\) are the polar coordinates of a point x. However, it has not escaped our notice that such explicit formulas may not be available for all types of geometries. Different from the analytically derived index function (2.5) in the conventional DSM, the index function in the proposed DDSM is represented by a neural network and learned from data; it is not in closed form but is capable of handling more complicated and practical problems.
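As a side remark, the series (2.9) can be evaluated without complex arithmetic via the identity \(J_n(iz) = i^n I_n(z)\) with the modified Bessel functions \(I_n\); a minimal sketch, assuming \(\mu _0>0\) and a hypothetical truncation parameter n_max:

```python
# Hedged evaluation of the probing function (2.9) on the unit disk, assuming
# mu0 > 0 and using J_n(i*z) = i^n * I_n(z) so that all terms are real.
import numpy as np
from scipy.special import iv            # modified Bessel function I_n

def probing_function(r_x, theta_x, theta_xi, mu0, n_max=50):
    """theta_xi: array of boundary angles; (r_x, theta_x): polar coords of x."""
    theta_xi = np.asarray(theta_xi, dtype=float)
    s = np.sqrt(mu0)
    eta = np.full_like(theta_xi, iv(0, s * r_x) / iv(0, s))
    for n in range(1, n_max + 1):       # pair the +n and -n terms of the series
        eta += 2.0 * iv(n, s * r_x) / iv(n, s) * np.cos(n * (theta_x - theta_xi))
    return eta / (2.0 * np.pi)
```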

3 Deep Direct Sampling Methods

Despite the successful application of the original DSM, we believe some aspects can be further improved with recently developed DNN techniques, in places where conventional mathematical derivation may face difficulties.

First, the index function above involves only a single Cauchy data pair, which may limit its accuracy. Our numerical results show that one pair of Cauchy data is robust for the spatially-invariant noise used in [15]. But when the noise becomes highly spatially variant over the boundary data points, for example independent and identically distributed Gaussian noise as illustrated in the left plot of Fig. 1, the reconstruction may not be robust, as shown by the numerical results in Fig. 5, which brings out the importance and necessity of including multiple measurements. However, it has so far been unclear how to theoretically derive an explicit closed-form index function that systematically incorporates multiple Cauchy data pairs, though some basic strategies can be applied, such as taking the average, maximum or product of the individual index functions. Second, even for the case of a single measurement, the form of the index function \(\mathcal {I}(x)\) may not be the optimal approximation to the true index function, for example due to the empirical choice of the tuning parameter s and the norm \(| \cdot |_{Y}\). We believe this is where DNN models can exploit their advantages and replace some theoretical derivation by data-driven approaches such that a better index function can be obtained.

Fig. 1

The boundary data with independent and identically distributed Gaussian noise (left), the generated \(\varphi (x)\) without noise (middle), and \(\varphi (x)\) with noise (right)

Therefore, to enhance the performance of DSMs in the aforementioned aspects, in this work we propose a Deep Direct Sampling Method (DDSM) by mimicking the underlying mathematical structure suggested by the DSM. For simplicity, we mainly discuss the two-dimensional case; the method extends readily and naturally to the three-dimensional case. We have implemented the method for both the 2D and 3D DOT problems, and the numerical examples and reconstruction results will be provided in Sect. 4.

3.1 Neural Network Structure

We note that the index function (2.8) suggests the existence of a non-linear mapping or operator from \(\varphi \) to the location of x, namely whether x is inside or outside the inclusions. We shall see in Sect. 3.2 that this operator exists theoretically if the NtD mapping is given, i.e., all the Cauchy data pairs are available, but it is not easy to derive an explicit index function that fully incorporates the information contained in limited Cauchy data pairs.

Motivated by the classical DSM, we define the Cauchy difference functions \(\{\varphi ^{\omega }\}_{\omega =1}^{N}\), where \(\varphi ^{\omega }\) \((1 \le \omega \le N)\) is the solution of (2.6) with the boundary value formed by the \(\omega \)-th pair of Cauchy data, \(f_{\omega } - \Lambda _{\mu _0}g_{\omega }\), namely

$$\begin{aligned} - \triangle \varphi ^{\omega } + \mu _0 \varphi ^{\omega } = 0 ~~~~ \text {in} ~ \Omega ; ~~~ \varphi ^{\omega } = -(-\triangle _{\partial \Omega })^s (f_{\omega } - \Lambda _{\mu _0} g_{\omega })~~~ \text {on}~ \partial \Omega . \end{aligned}$$
(3.1)

Then, the input of the DNN is designed as \((x, \varphi ^1, \dots , \varphi ^{N})\). Using these data functions as the input is one of the main differences between our proposed DDSM and the learning-based approaches reviewed above, and it brings quite a few advantages. First, it extends the boundary data to the domain interior, mimicking images and their features: the input can be treated as an \((N+2)\)-channel image, and the nonlinear operator to be trained can be viewed as a semantic image segmentation process [11, 24] partitioning a digital image into multiple segments (sets of pixels) based on two characteristics, inside or outside the inclusions. The relationship is listed in Table 1 for readers from various backgrounds. This analogy provides us with some well-established tools such as the architecture of the CNN [24]. Then, the new index function is assumed to take the form

$$\begin{aligned} \mathcal {I} = \mathcal {F}_{\text {CNN}}(x, \varphi ^1, \dots , \varphi ^N)~:~~ [H^1(\Omega )]^{N+2} \rightarrow L^2(\Omega ), \end{aligned}$$
(3.2)

where \( \mathcal {F}_{\text {CNN}}\) is a function expressed by a CNN with parameters to be trained. This structure agrees with the theory discussed in Sect. 3.2. Note that \(\mathcal {I}\) becomes an operator from one Sobolev space to another, so in the following discussion we shall call it an index operator, which is one of the main differences from the original DSM. In addition, a remarkable feature of using the generated \(\varphi \) as the inputs is that the boundary noise is smoothed out through solving the elliptic PDEs, namely the data functions are highly smooth inside the domain. For example, in Fig. 1, the \(\varphi \) on the right exhibits rough behavior only near the boundary but is very smooth inside the domain, while \(\varphi \) without noise is shown in the middle.

Table 1 Relationship between DSM for DOT problems and image segmentation problems

Now we proceed to describe the detailed structure of the proposed CNN. For simplicity, we first assume that \(\Omega \) has a rectangular shape and leave the general situation to the later discussion of implementation details. We suppose that \(\Omega \) is discretized by an \(n_1 \times n_2\) Cartesian grid, where \(n_1\) and \(n_2\) correspond to the \(x_1\) and \(x_2\) directions, respectively. Based on the previous explanation, the input of the CNN is a 3D array (a 4D tensor for 3D problems). For example, for an inclusion sample with N Cauchy difference functions \(\{ \varphi ^{\omega }\}_{\omega =1}^{N}\) generated from \(\{ g_{\omega }, f_{\omega }\}_{\omega = 1}^N\), the corresponding input, denoted by \(z_{\text {in}} \in {\mathbb {R}}^{n_1 \times n_2 \times (N+2)}\), is a stack of \(N+2\) matrices in \({\mathbb {R}}^{n_1 \times n_2}\), where the first two slices are formed by the spatial coordinates \(x_1\) and \(x_2\), respectively, and the remaining N slices are the numerical solutions \(\varphi ^1(x), \dots , \varphi ^N(x)\) evaluated at the Cartesian grid points. A pictorial elucidation of \(z_{\text {in}}\) is shown in Fig. 2.
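Assembling \(z_{\text {in}}\) is then straightforward; a minimal sketch, assuming the grid covers \((-1,1)^2\) as in the later experiments:

```python
# Sketch of assembling the CNN input z_in: two coordinate channels plus the
# N Cauchy difference functions evaluated on the n1 x n2 grid.
import numpy as np

def build_input(phis):
    """phis: list of N arrays of shape (n1, n2) holding phi^1, ..., phi^N."""
    n1, n2 = phis[0].shape
    x1, x2 = np.meshgrid(np.linspace(-1.0, 1.0, n1),
                         np.linspace(-1.0, 1.0, n2), indexing="ij")
    return np.stack([x1, x2, *phis], axis=-1)      # shape (n1, n2, N + 2)
```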

Fig. 2

The structure of the CNN for the 2D DOT problem. The input is a 3D array in \({\mathbb {R}}^{n_1 \times n_2 \times (N+2)}\) and the output is a matrix in \({\mathbb {R}}^{n_1 \times n_2}\)

The overall configuration of the proposed CNN consists of two parts, a convolution network and a transposed convolution network, as illustrated in Fig. 2. The convolution network acts as a feature extractor that transforms the input into a multidimensional feature representation, where the \(i\)-th block can be represented as

$$\begin{aligned} \phi ^{c}_{i}(z_{i}) = \mathcal {M}(\kappa (\beta (W_{\text {conv}} * z_{i} + \varvec{b}_{\text {conv}}))), \end{aligned}$$
(3.3)

in which \(z_i\) is the input image of the block, \(\mathcal {M}\) is the max-pooling layer that highlights the most prominent features of the input image by selecting the maximum element from each non-overlapping subregion, \(W_{\text {conv}}\) is the convolution filter of the 2D convolutional layer, \(\kappa \) denotes the activation function, \(\beta \) is the batch normalization layer [24], \(*\) denotes the convolution operation, and \(\varvec{b}_{\text {conv}}\) is the bias. The transposed convolution network acts like a backwards-strided convolution that produces the output by dilating the extracted features with the learned filters, where the \(i\)-th block can be expressed as

$$\begin{aligned} \phi ^{t}_{i}(z_{i}) = \mathcal {C}(\kappa (\beta (\mathcal {T}(z_{i}, W_{\text {trans}}, \varvec{b}_{\text {trans}})))), \end{aligned}$$
(3.4)

where \(\mathcal {T}\) denotes a fractionally-strided convolution layer, \( W_{\text {trans}}\) and \(\varvec{b}_{\text {trans}}\) are the corresponding transposed convolution filter and bias, \(\mathcal {C}\) is a concatenation layer, and the other notations are the same as in (3.3). We denote by \(\Theta \) the set of all unknown parameters to be learned in training, which includes the convolution and transposed convolution filters as well as the biases. In this work, we consider two activation functions, ReLU and sigmoid:

$$\begin{aligned} \kappa (z) = \max \{0, z\} ~~~ \text {and} ~~~~ \kappa (z) = \frac{1}{1+ e^{-z}}. \end{aligned}$$
(3.5)

It is known that choosing ReLU can significantly enhance the sharpness of the reconstruction. But our experience suggests that using only ReLU may not always yield satisfactory reconstructions when the inclusions are far away from the training data set. Nevertheless, the numerical results show that using the sigmoid activation in the last layer may improve the reconstruction in those situations.

Then the entire CNN model can be represented as

$$\begin{aligned} \varvec{y}_{\text {out}} = \phi ^{t}_{N_t} \circ \dots \circ \phi ^{t}_{1} \circ \phi ^{c}_{N_c} \circ \dots \circ \phi ^{c}_{1}(z_{\text {in}}), \end{aligned}$$
(3.6)

where the output \(\varvec{y}_{\text {out}}\) is an \(n_1 \times n_2\) matrix which is supposed to approximate the inclusion distribution, i.e., the index function values on the entire domain \(\Omega \). Here the composite DNN function approximates the index functional or operator, namely

$$\begin{aligned} \phi ^{t}_{N_t} \circ \dots \circ \phi ^{t}_{1} \circ \phi ^{c}_{N_c} \circ \dots \circ \phi ^{c}_{1} \approx \mathcal {I}, \end{aligned}$$

where, again, the input is not the data at individual points but on the entire domain. As mentioned before, unlike in the conventional DSM, there is no closed form for the DNN-based index functional, but the trained parameters may better approximate the true index operator for the available data.
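To make the architecture concrete, the following is a minimal PyTorch sketch in the spirit of (3.3)-(3.6): an encoder of convolution blocks followed by a decoder of transposed convolution blocks with a sigmoid in the last layer. The channel widths and kernel sizes are our illustrative choices, and the concatenation layers \(\mathcal {C}\) are omitted for brevity.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):                  # one block phi^c as in (3.3)
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # W * z + b
        nn.BatchNorm2d(c_out),                             # beta
        nn.ReLU(),                                         # kappa
        nn.MaxPool2d(2))                                   # M

def trans_block(c_in, c_out, last=False):     # one block phi^t as in (3.4),
    return nn.Sequential(                     # without the concatenation C
        nn.ConvTranspose2d(c_in, c_out, kernel_size=2, stride=2),  # T
        nn.BatchNorm2d(c_out),                                     # beta
        nn.Sigmoid() if last else nn.ReLU())                       # kappa

class DDSMNet(nn.Module):
    def __init__(self, n_channels):           # n_channels = N + 2
        super().__init__()
        self.encode = nn.Sequential(conv_block(n_channels, 32),
                                    conv_block(32, 64))
        self.decode = nn.Sequential(trans_block(64, 32),
                                    trans_block(32, 1, last=True))

    def forward(self, z):                      # z: (batch, N + 2, n1, n2)
        return self.decode(self.encode(z)).squeeze(1)   # (batch, n1, n2)
```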

Let S be the total number of training samples. To measure the accuracy of the CNN model (3.6), we employ the mean squared error (MSE) as the loss function

$$\begin{aligned} \mathcal {L}_{\text {loss}}(\Theta ) = \frac{1}{S}\sum _{\ell = 1}^{S}(\varvec{y}_{\text {out}}(z_{\text {in}}^{\ell }) - \mathcal {I}^{\ell })^{\top }(\varvec{y}_{\text {out}}(z_{\text {in}}^{\ell }) - \mathcal {I}^{\ell }), \end{aligned}$$
(3.7)

where \(\mathcal {I}^{\ell }\) is the true distribution (the index values) corresponding to the \(\ell \)-th inclusion sample, and \(z_{\text {in}}^{\ell }\) denotes the input image formed from the Cauchy difference functions \(\{\varphi ^{(\ell , \omega )}\}_{\omega =1}^{N}\) of the \(\ell \)-th inclusion sample. Since full-batch gradient descent can be infeasible for large data sets, stochastic gradient descent (SGD) is implemented to minimize the loss function (3.7); we omit the details here.
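A hedged sketch of such a training loop is given below; the optimizer settings and batch size are illustrative assumptions, not the settings used for the experiments.

```python
# Illustrative mini-batch SGD loop for the MSE loss (3.7).
import torch

def train(model, inputs, targets, epochs=100, lr=1e-3, batch_size=32):
    """inputs: (S, N+2, n1, n2) tensor; targets: (S, n1, n2) index values."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.MSELoss()
    data = torch.utils.data.TensorDataset(inputs, targets)
    loader = torch.utils.data.DataLoader(data, batch_size=batch_size,
                                         shuffle=True)
    for _ in range(epochs):
        for z, target in loader:
            opt.zero_grad()
            loss = loss_fn(model(z), target)   # MSE over the index values
            loss.backward()
            opt.step()
    return model
```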

3.2 Unique Determination

Note that the proposed DNN model implicitly assumes that the inclusion distribution can be uniquely determined by the data functions \(\{\varphi ^{\omega }\}_{\omega =1}^N\). In this subsection, we show that this is indeed true for \(N\rightarrow \infty \) with \(\mu _0>0\) and \(\mu \ge \mu _0\) in D, i.e., when the boundary data are fully known in the Sobolev space \(H^{-1/2}(\partial \Omega )\times H^{1/2}(\partial \Omega )\). Let us first recall some known results. Define the difference between the operators \(\Lambda _{\mu }\) and \(\Lambda _{\mu _0}\) as

$$\begin{aligned} {\widetilde{\Lambda }}_{\mu } = \Lambda _{\mu } - \Lambda _{\mu _0}~: ~ H^{-1/2}(\partial \Omega ) \rightarrow H^{1/2}(\partial \Omega ). \end{aligned}$$
(3.8)

According to [7], under the assumption that \(\mu _0>0\) and \(\mu \ge \mu _0\) in D, the operator \({\widetilde{\Lambda }}_{\mu }\) is self-adjoint and positive, and thus it admits eigenpairs \((\lambda _\omega ,\nu _\omega )\), \(\omega =1,2,\dots \), with \(\lambda _\omega >0\), such that

$$\begin{aligned} {\widetilde{\Lambda }}_{\mu } \nu _\omega = \lambda _\omega \nu _\omega , ~~~ {\omega }=1,2... , \end{aligned}$$
(3.9)

where the \(\nu _\omega \) form an orthogonal basis of the space \(H^{-1/2}(\partial \Omega )\). Let \(\mathcal {R}({\widetilde{\Lambda }}_{\mu }^{1/2})\) be the range of the operator \({\widetilde{\Lambda }}_{\mu }^{1/2}\). By the factorization theory for DOT [7], the information of \(\mathcal {R}({\widetilde{\Lambda }}_{\mu }^{1/2})\) together with the probing functions can be used to determine the inclusion distribution. Here we shall employ this theory to show that the data functions can also uniquely determine the inclusion distribution.

Theorem 3.1

Given a collection of orthonormal basis functions \(\{g_{\omega }\}_{\omega =1}^{\infty }\) in \(H^{-1/2}(\partial \Omega )\), let \(\{g_{\omega },\Lambda _{\mu } g_{\omega }\}_{\omega =1}^{\infty }\) be the corresponding Cauchy data pairs and let \(\{\varphi ^{\omega }\}_{\omega =1}^{\infty }\) be the generated data functions. Then the inclusion can be uniquely determined by the data functions \(\{\varphi ^{\omega }\}_{\omega =1}^{\infty }\).

Proof

According to [7], we know that \(x\in D\) if and only if the corresponding probing function \(\eta _x(\xi )\in \mathcal {R}({\widetilde{\Lambda }}_{\mu }^{1/2})\). By Picard's criterion, this is further equivalent to the convergence of the following series

$$\begin{aligned} \mathcal {S}(x; \{\nu _\omega \}_{\omega =1}^{\infty } ) = \sum _{\omega =1}^{\infty } \frac{(\eta _x,\nu _{\omega })^2_{\partial \Omega }}{| \lambda _{\omega }| } < \infty . \end{aligned}$$
(3.10)

Let us first consider \(\{g_{\omega }\}_{\omega =1}^{\infty }\) being exactly the eigenfunctions \(\{\nu _\omega \}_{\omega =1}^{\infty }\) in (3.9). Then, using \( \nu _{\omega } = \lambda ^{-1}_{\omega } {\widetilde{\Lambda }}_{\mu }\nu _{\omega } \) and the identity in (2.7), i.e., integration by parts, we have

$$\begin{aligned} \begin{aligned} (\eta _x, \nu _{\omega })_{\partial \Omega }&= \lambda ^{-1}_{\omega } (\eta _x, {\widetilde{\Lambda }}_{\mu }\nu _{\omega } )_{\partial \Omega } \\&= \lambda ^{-1}_{\omega } (\eta _x, (\Lambda _{\mu } - \Lambda _{\mu _0})\nu _{\omega } )_{\partial \Omega } = \lambda ^{-1}_{\omega } \varphi ^{\omega }(x) \end{aligned} \end{aligned}$$
(3.11)

where \(\varphi ^{\omega }\) are the data functions corresponding to \(g_{\omega }=\nu _{\omega }\) solved from (2.6) with \(s=0\). Then, combining (3.10) and (3.11), we have

$$\begin{aligned} \mathcal {S}(x; \{\nu _\omega \}_{\omega =1}^{\infty } ) = \sum _{\omega =1}^{\infty } \frac{ |\varphi ^{\omega }(x)|^2 }{| \lambda _{\omega }|^3 }. \end{aligned}$$
(3.12)

Since \(|\lambda _{\omega }| = \Vert {\varphi ^{\omega }}\Vert _{L^2(\partial \Omega )}/\Vert \nu _{\omega } \Vert _{L^2(\partial \Omega )}\), the inclusion distribution is determined by the convergence of the series in (3.12), which is in turn determined by the \(\varphi ^{\omega }\). Next, for a general orthonormal basis \(\{g_{\omega }\}_{\omega =1}^{\infty }\), following the argument in [5, Theorem 3.8], we can express \(\{\nu _\omega \}_{\omega =1}^{\infty }\) by expansions in \(\{g_{\omega }\}_{\omega =1}^{\infty }\), which are then used to express the corresponding data functions \(\{\varphi ^{\omega }\}_{\omega =1}^{\infty }\). Plugging these into (3.12), we obtain a new series determining the inclusion shape that depends only on \(\{\varphi ^{\omega }\}_{\omega =1}^{\infty }\). \(\square \)

Remark 3.2

According to the theory above, the uniqueness does not depend on the specific values of \(\mu _0\) and \(\mu \); it requires only that they are distinct. Indeed, this theoretical property in a certain sense manifests itself in the proposed DDSM, since good reconstructions can be obtained for values of \(\mu \) that differ significantly from those used for training.

Remark 3.3

In our work for EIT [23], a similar data function is generated, but the input of the fully connected neural network there is the gradient of the Cauchy difference function. This difference is theoretically supported by the form in which \(\varphi \) enters the generated series (3.12).

Suppose that the fluxes \(\{g_{\omega }\}\) are exactly the eigenfunctions \(\{\nu _\omega \}\), \(\omega =1,2,\dots \), and let \(\varphi ^{\omega }\) be the corresponding data functions. Then, the argument of Theorem 3.1 enables us to explicitly construct a function approximating the true indicator function: given any \(\epsilon > 0\), there exist a sufficiently large integer N and a function \(\mathcal {I}_N\) depending on \(\varphi ^{\omega }\), \(\omega = 1,\dots ,N\), such that

$$\begin{aligned} \Vert \mathcal {I} - \mathcal {I}_N \Vert _{L^{\infty }(\Omega )} \le \epsilon . \end{aligned}$$
(3.13)

To see this, according to the series in (3.12), we define \(\mathcal {S}_N(x):= \sum _{\omega =1}^{N} \frac{ |\varphi ^{\omega }(x)|^2 }{| \lambda _{\omega }|^3 }\). Theorem 3.1 implies that there is a constant \(\rho \) such that \(\rho > \mathcal {S}_N(x)\) for all \(x\in D\), and that for each \(\epsilon >0\) there is an integer N such that \(\mathcal {S}_N(x)>4\rho \epsilon ^{-2}/\pi ^2\) for all \(x\notin D\). Then, we can define \(\mathcal {I}_N\) as

$$\begin{aligned} \mathcal {I}_N(x) = 1 - \frac{2}{\pi } \arctan \left( \frac{\pi \epsilon }{2\rho } \mathcal {S}_N(x) \right) \end{aligned}$$
(3.14)

The desired inequality in (3.13) can be verified by the basic inequalities \(z > \arctan (z) \ge \frac{\pi }{2} - z^{-1}\), \(\forall z>0\). However, it is worthwhile to mention that such a function \(\mathcal {I}_N\) is generally not computable in practice, as it still requires the full spectral information of \({\widetilde{\Lambda }}_{\mu } \). If only limited data pairs are available, an approximation formula remains unknown in theory. Here, we instead borrow the mathematical structure described above to construct a CNN for the approximation.
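Nevertheless, when the spectral data are assumed given, the construction is simple to evaluate; a minimal sketch of (3.12) and (3.14):

```python
# Sketch of the truncated series S_N from (3.12) and the surrogate indicator
# I_N from (3.14); phis and lams (eigenvalues) are assumed to be available.
import numpy as np

def indicator_N(phis, lams, rho, eps):
    """phis: list of N arrays phi^omega on the grid; lams: N eigenvalues."""
    S_N = sum(np.abs(p) ** 2 / abs(lam) ** 3 for p, lam in zip(phis, lams))
    return 1.0 - (2.0 / np.pi) * np.arctan(np.pi * eps / (2.0 * rho) * S_N)
```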

3.3 Implementation Details

In order to perform the convolution operations, the discretization of \(\Omega \) needs to be a Cartesian grid, which is natural for rectangular domains. Note that the data functions \(\varphi ^{\omega }\) are solved from equations with only the background absorption coefficient, so there is no need for the mesh to align with the interface, and a simple Cartesian mesh may yield satisfactory numerical solutions. For a generally shaped domain, we just need to immerse it into a rectangle such that a Cartesian grid can be generated on the whole rectangular fictitious domain. However, if the data functions \(\varphi ^{\omega }\) are computed on a general triangular mesh, they need to be re-evaluated at the generated Cartesian grid points, since the values at the triangular mesh points cannot directly support the CNN computation. To alleviate the computational burden, we can prepare these data functions before training rather than solve them during training. It should be noted that the DDSM indeed requires more memory compared with other deep learning approaches that directly input the original boundary point data, since the newly generated data functions are at least two-dimensional.
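For the re-evaluation step, standard scattered-data interpolation suffices; a hedged sketch using scipy, where the fictitious rectangle \((-1,1)^2\) and the zero fill value outside \(\Omega \) are our assumptions:

```python
# Sketch: re-evaluate a data function given at triangular mesh vertices on the
# Cartesian grid of the fictitious rectangle; values outside are set to zero.
import numpy as np
from scipy.interpolate import griddata

def to_cartesian(tri_points, tri_values, n1, n2):
    """tri_points: (M, 2) vertex coordinates; tri_values: (M,) nodal values."""
    x1, x2 = np.meshgrid(np.linspace(-1.0, 1.0, n1),
                         np.linspace(-1.0, 1.0, n2), indexing="ij")
    return griddata(tri_points, tri_values, (x1, x2),
                    method="linear", fill_value=0.0)
```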

4 Numerical Experiments

In this section, we present numerical experiments to demonstrate that the newly developed DDSM is effective and robust for the reconstruction of inhomogeneous inclusions in DOT. For such medical imaging problems, generally only limited data can be obtained from real clinical environments; in contrast, simulated (so-called synthetic) data are cheap, easily accessible and not subject to the constraints of clinical data collection. As suggested by many works in the literature [8, 21, 23, 27, 44], these features make synthetic data suitable for training DNNs, and the resulting network can be further enhanced by realistic data from clinics. We therefore sample the inclusion distributions by randomly creating some basic geometric objects in \(\Omega \), which are then used to generate a synthetic data set for training and testing.

4.1 2D Problem Setting and Data Generation

Let the boundary (interface) of the i-th (\(i=1,2,\dots ,M\)) basic geometric object be represented by a level-set function \(\Gamma _i(x_1,x_2)\); the boundary of the inclusion is then defined as the zero level set of

$$\begin{aligned} \Gamma (x_1,x_2) = \min _{i=1,2,...,M} \{ \Gamma _i(x_1,x_2) \}. \end{aligned}$$
(4.1)

The support of the inclusions is then the subset \(\{(x_1,x_2) ~:~ \Gamma (x_1,x_2)<0\}\). More specifically, we consider the following two scenarios with different basic geometric objects for training data generation:

Scenario 1: \(\Gamma _i\) are 5 random circles with the radius sampled from \(\mathcal {U}(0.2,0.4)\) and the center sampled from \(\mathcal {U}(-0.7,0.7)\).

Scenario 2: \(\Gamma _i\) are 4 random ellipses with the longer axis, the eccentricity and the center points sampled from \(\mathcal {U}(0.2,0.6)\), \(\mathcal {U}(0,0.9)\) and \(\mathcal {U}(-0.7,0.7)\), respectively.

Here \(\mathcal {U}(a,b)\) denotes the uniform distribution over [a, b]. Sampling circles or ellipses is widely used in many deep learning methods for generating synthetic data in DOT [21, 27]. However, a major difference is that the construction (4.1) allows those basic inclusions to touch and overlap with each other, so the overall inclusion distribution can be much more geometrically complicated, which makes the reconstruction more challenging. Besides, we set \(\mu _0=0\) and \(\mu _1=50\) for our experiments.
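A minimal sketch of sampling a Scenario 1 inclusion mask through (4.1); the grid resolution and function names are illustrative:

```python
# Sketch of Scenario 1: five random circles whose level-set functions are
# combined by the pointwise minimum in (4.1) to form the inclusion mask.
import numpy as np

def sample_inclusion(n=100, m=5, rng=None):
    rng = rng or np.random.default_rng()
    x1, x2 = np.meshgrid(np.linspace(-1.0, 1.0, n),
                         np.linspace(-1.0, 1.0, n), indexing="ij")
    gamma = np.full((n, n), np.inf)
    for _ in range(m):
        r = rng.uniform(0.2, 0.4)                     # radius
        c = rng.uniform(-0.7, 0.7, size=2)            # center
        gamma_i = np.hypot(x1 - c[0], x2 - c[1]) - r  # circle level set
        gamma = np.minimum(gamma, gamma_i)            # the min in (4.1)
    return (gamma < 0).astype(float)                  # index values: 1 inside
```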

For the boundary conditions, following our previous work [23], we use Fourier functions as the applied surface flux data \(g_{\omega }\), since they naturally form an orthogonal basis on the boundary. In particular, we pick the first N modes:

$$\begin{aligned} g_{\omega }(\theta ) = \cos (\omega \theta ), ~~ \omega = 1,2,\dots ,N/2, ~~~ \text {and} ~~~ g_{\omega }(\theta ) = \sin ( (\omega -N/2) \theta ), ~~ \omega = N/2+1,\dots ,N, \end{aligned}$$
(4.2)

where \(\theta \in [0,2\pi )\) is the polar angle of \((x_1,x_2)\in \partial \Omega \); we choose \(g_{1}(\theta ) = \cos ( \theta )\) for the case of a single measurement, i.e., \(N=1\), and \(N=10,20\) for the case of multiple measurements (Fig. 6).
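Generating the patterns (4.2) is straightforward; a brief sketch, with the single-measurement case handled separately as described above:

```python
# Sketch of the Fourier flux patterns (4.2) at given boundary angles theta.
import numpy as np

def flux_patterns(theta, N):
    """Return an (N, len(theta)) array of g_1, ..., g_N."""
    theta = np.asarray(theta, dtype=float)
    if N == 1:                      # single-measurement case: g_1 = cos(theta)
        return np.array([np.cos(theta)])
    g = [np.cos(w * theta) for w in range(1, N // 2 + 1)]
    g += [np.sin(w * theta) for w in range(1, N - N // 2 + 1)]
    return np.array(g)
```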

For the diffusion-reaction equation with a discontinuous absorption coefficient, the solutions generally have smooth first-order derivatives [10], so typical second-order schemes may achieve optimal convergence on sufficiently fine meshes. If the jump of the coefficient is large, however, there may be a thin layer around the inclusion interface [10] requiring extremely fine meshes to resolve [9]. Here, for simplicity, we focus on the case of a moderate jump and apply a uniform \(100\times 100\) Cartesian mesh to solve the forward model (1.1). Moreover, as we only need the simulated solutions on the boundary, which is away from the coefficient discontinuity, the numerical solutions may not be affected much by the discontinuity. For each inclusion sample and each boundary condition in (4.2), we obtain the numerical solutions \(u_{\omega }\) and \(f_{\omega }=u_{\omega }|_{\partial \Omega }\), and the data pairs \((f_{\omega },g_{\omega })_{\omega =1}^N\) are used in (3.1) to generate the data functions as the input of the proposed DNN. Numerical errors cannot be avoided in this procedure and are considered as noise in the data; in fact, such errors are much smaller than the artificial noise described below.
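For reference, the following is a minimal sketch of such a Cartesian finite-difference solver for (1.1)-(1.2); the ghost-node treatment of the Neumann condition, the data layout and all names are our illustrative choices rather than the implementation used for the experiments.

```python
# Hypothetical finite-difference sketch of the forward model (1.1)-(1.2) on a
# uniform n x n grid over the square (-1,1)^2; assumes mu > 0 somewhere so
# that the pure-Neumann system is nonsingular.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_forward(mu, g, h):
    """mu: (n, n) nodal absorption values; g: dict with length-n flux arrays
    for the sides "bottom", "top", "left", "right"; h: mesh size."""
    n = mu.shape[0]
    k = lambda i, j: i * n + j                     # row-major node numbering
    A = sp.lil_matrix((n * n, n * n))
    b = np.zeros(n * n)
    for i in range(n):
        for j in range(n):
            A[k(i, j), k(i, j)] = 4.0 / h**2 + mu[i, j]
            for di, dj, side, t in ((-1, 0, "bottom", j), (1, 0, "top", j),
                                    (0, -1, "left", i), (0, 1, "right", i)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:    # interior coupling
                    A[k(i, j), k(ii, jj)] -= 1.0 / h**2
                else:                              # Neumann edge: ghost node
                    # u_ghost = u_mirror + 2*h*g enforces du/dn = g
                    A[k(i, j), k(i - di, j - dj)] -= 1.0 / h**2
                    b[k(i, j)] += 2.0 * g[side][t] / h
    u = spla.spsolve(A.tocsr(), b)
    return u.reshape(n, n)                         # f = u restricted to edges
```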

On the one hand, it is generally difficult to obtain accurate knowledge of the physiological noise present in human data. On the other hand, as discussed in Sect. 3 and shown in the left plot of Fig. 1, using the boundary data to generate the data functions \(\varphi ^{\omega }\) smooths out the noise in the sense that the data functions are highly smooth in the interior. Both the original DSM [15, 16] and our recent work on DDSM for EIT [23] indicate that this mechanism can significantly enhance the robustness with respect to noise. Therefore, instead of adding noise to the training set, we add very large noise to the test data. This is intentional, in order to test the robustness of the proposed DNN with respect to noise that is not contained in the training set. In particular, we apply the following point-wise noise to the synthetic measured data:

$$\begin{aligned} f^{\delta }_{\omega }(x) = (1 + \delta G(x)) f_{\omega }, \end{aligned}$$
(4.3)

where \(\delta \) is the relative noise level chosen as 0, \(5\%\) and \(10\%\), and G(x) are standard Gaussian random variables assumed to be independent and identically distributed over the points x. Different from the set-up in [15], which uses spatially-invariant noise, such noise produces very rough data on the boundary, as shown in Fig. 1, challenging the robustness of the reconstruction algorithms more seriously.
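A short sketch of the noise model (4.3), with the random generator as an illustrative choice:

```python
# Sketch of the point-wise multiplicative noise model (4.3).
import numpy as np

def add_noise(f, delta, rng=None):
    rng = rng or np.random.default_rng()
    return (1.0 + delta * rng.standard_normal(f.shape)) * f  # i.i.d. per point
```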

4.2 2D Reconstructions

Now we present reconstruction results for each scenario in 2D, and explore the performance of the proposed algorithm in some more challenging situations.

Effect of \(\varphi \): To begin with, we present a group of results to demonstrate that using \(\varphi \) as the input to the DNNs can indeed significantly improve the reconstruction accuracy. For this purpose, we train a CNN with merely \(f-f_0\) as the input and compare its performance with the proposed DDSM by examining their average accuracy on a test set in terms of different metrics. The numerical results are reported in Table 2, where we can see that the DDSM outperforms the approach with \(f-f_0\) as the input by a large margin. In addition, the employment of \(\varphi \) improves the performance in many respects. When applying the NNs trained with \(\mu _0 = 0, \mu = 10\) to the data generated with \(\mu _0 = 0, \mu = 100\), we still observe significant improvement. As for out-of-distribution reconstruction, i.e., when the shape to be reconstructed is very different from the training data set, the improvement is also large; see the numerical example of Fig. 9.

Table 2 Comparison of NNs with \(f-f_0\) as the input and with \(\varphi \) as the input (\(\mu _0 = 0, \mu = 10\))

Some basic results: The reconstruction results for three basic cases in each scenario are provided in Figs. 3 and 4. The figures clearly show the accurate reconstruction of the proposed algorithm. In particular, in the third case of Fig. 4, the true inclusion has a concave portion near the domain center, away from the boundary data, which is generally difficult to capture. But the proposed algorithm can recover it quite satisfactorily.

Fig. 3

Reconstruction for 3 cases in Scenario 1 (4 circles) with different Cauchy data numbers and noise levels: Case 1 (top), Case 2 (middle) and Case 3 (bottom)

Fig. 4

Reconstruction for 3 cases in Scenario 2 (4 ellipses) with different Cauchy data numbers and noise levels: Case 1 (top), Case 2 (middle) and Case 3 (bottom)

We observe that the proposed algorithm gives comparably good reconstructions with single or multiple measurements. However, our further numerical results suggest that using multiple measurements can significantly enhance the robustness of the reconstruction with respect to noise. To demonstrate this, for the first case in Scenario 1 (the top one in Fig. 3), we present the reconstructions from a single pair and from 10 pairs with noise in Fig. 5. It is observed that even \(5\%\) noise can totally destroy the reconstruction in the single measurement case, while the performance with 10 pairs of measurements is much better but still worse than with 20 pairs. It is also worth mentioning that, even for a single measurement, the reconstruction is still highly robust with respect to the spatially-invariant noise used in [15], i.e., G independent of x in (4.3); those results are omitted here.

Fig. 5

Comparison of reconstruction with respect to noise for the single measurement and 10 pairs of measurements

Sensitivity to Data: The DOT problem is well known to be severely ill-posed, which means the boundary data are very insensitive to the inclusions. To examine this phenomenon and test the sensitivity of the algorithm with respect to the data, we consider two different inclusion distributions in Fig. 7, where the only difference is the center inclusion. We plot the corresponding \(f_{\omega }\) in Fig. 6 for fixed applied fluxes \(g_{\omega }\), \(\omega =1,2,\dots ,10\); the two data sets are indeed quite close to each other. Namely, the boundary data are extremely insensitive to the center inclusion, which makes its reconstruction very difficult. The results in Fig. 7 show that the center inclusion can still be clearly captured. We highlight that with 20 pairs of measurements the center inclusion is captured even with \(5\%\) noise, which is larger than the relative difference of the two boundary data sets. These results demonstrate that the proposed algorithm can extract the small differences buried in the boundary data for various inclusion distributions while remaining robust with respect to noise.

Fig. 6

Plots of \(g_{1,\omega }|_{\partial \Omega }, g_{2,\omega }|_{\partial \Omega }\) and \(g_{1,\omega }|_{\partial \Omega } - g_{2,\omega }|_{\partial \Omega }\) versus the polar angle \(\theta \) (of points on the boundary), where \(g_{1,\omega }\) and \(g_{2,\omega }\) correspond to the inclusion distributions with and without the center inclusion, respectively

Fig. 7

Comparison of reconstructions for two close inclusion distributions

Out-of-distribution: We note that basic geometric shapes such as circles or ellipses may be appropriate choices for imitating tumors in clinical applications of the DDSM. But even so, one cannot expect the shape to be reconstructed to always be covered by or close to the training data set, which prompts us to further investigate the capability of the DDSM for inclusion distributions whose geometry is out of the scope of the training data, that is, that cannot be generated by those basic geometric objects. Here we present the reconstructions for three different shapes in Fig. 8: a triangle, two bars and an annulus. In particular, the annulus has a hollow center, which is difficult to capture by light sourced at the boundary. But we can still observe satisfactory reconstruction of the basic geometric properties of all these inclusions. Similar to the previous results, the reconstruction with 20 pairs is still robust with respect to large noise. Note that one can add these shapes to the training data set if more accurate reconstruction is needed.

Fig. 8

CNN-DDSM reconstruction for 3 special inclusion shapes: one triangle (top), two long rectangular bars (middle) and a rectangular ring (bottom)

In addition, we also apply the trained DNN to a hexagonal inclusion; see the reconstruction in Fig. 9. Here, we specifically compare two different DNNs: one has \(\varphi \) as the input, which is the key to the proposed DDSM, while the other simply has \(f-f_0\) as the input.

Fig. 9

DDSM reconstruction for a hexagon (out-of-distribution) with different inputs: input \(f-f_0\) without harmonic extensions (top), input \(\varphi (x)\) (bottom). MSE loss: 0.0471 for the input \(f-f_0\) and 0.0169 for the input \(\varphi \) (1 data pair); 0.0262 for the input \(f-f_0\) and 0.0125 for the input \(\varphi \) (10 data pairs). The improvement from using \(\varphi \) is 64.12% (1 data pair) and 52.29% (10 data pairs)

Limited data points: Note that all the previous results are generated under the assumption that the data are available at every boundary grid point for the finite difference method used to generate the data functions \(\varphi ^{\omega }\), which may not be practical. Indeed, in DOT experiments, see [34] for example, even if a camera may receive the light at every point, the light sources can be placed at only a few points on the boundary. We explore this issue by limiting the data points where the Neumann data \(g_{\omega }\) are available. In particular, we consider the case of 4L data points on the boundary, \(L=8,16\), with L points equally distributed on each side of the domain. The data \(g_{\omega }\) are assumed to be obtained at these points and are then linearly interpolated to generate functions on the boundary. These functions are then used to generate the corresponding Dirichlet data \(f_{\omega }\), available at all the boundary mesh points. The interpolated Neumann data and the simulated Dirichlet data are used to generate the data functions \(\varphi ^{\omega }\) as the input to the same DNN used above, and the reconstruction results for the first case in Fig. 4 are presented in Fig. 10. As we can see, the three relatively larger ellipses are reconstructed quite satisfactorily even with \(10\%\) noise. However, the result for the small ellipse is not as good as the others. We suspect this is due to its small size: it receives too little light to pass its geometric information to the boundary. Despite the inaccurate shape, the algorithm still indicates that an inclusion exists around the upper-left corner.

Fig. 10

DDSM reconstruction for random ellipses with limited data points and \(N=20\)

Reconstruction for different \(\mu \): Now we demonstrate that the proposed algorithm can obtain reasonable reconstructions even when the true absorption coefficient values are quite different from those used to generate the training data. Here we again use the first case in Fig. 4 of Scenario 2 as an example, but we generate the boundary data with two different groups of absorption coefficient values, \((\mu _0,\mu _1)=(0,200)\) and (1, 100), where the background and inclusion coefficient values may both vary. The same DNN is used to predict the inclusion geometry, and the results are presented in Fig. 11. Although the reconstruction is not as good as the one shown in Fig. 4, we can see that all four ellipses are clearly captured and the reconstruction is still stable with respect to noise. We highlight that this feature is very important for practical clinical situations, as the material properties of patients' tumors may vary and may not be known accurately. In spite of this, the proposed algorithm still has the potential to detect the appearance and shapes of the tumors to a certain extent.

Fig. 11

CNN-DDSM reconstruction for different \(\mu \): \((\mu _0, \mu ) = (0, 200)\) (left three columns) and (1, 100) (right three columns)

In addition, we also apply the trained DNNs to a non-piecewise-constant coefficient \(\mu \), here given by a Gaussian distribution; see Fig. 12 for the reconstruction. In this case, the inclusion boundary can be understood as a diffuse interface. Note that the training data set used here is still generated with piecewise constant \(\mu \). Indeed, we can still observe quite reasonable reconstructions of the location.

Fig. 12

DDSM reconstruction for \(\mu \) with a random Gaussian distribution

Comparison of DSM and DDSM: We also present a group of numerical examples to compare the classical DSM and the proposed DDSM. The classical DSM is implemented through the formula (2.8), where \(\varphi (x)\) is obtained by solving (2.6) with \(s=1\). The numerical results are reported in Fig. 13. Specifically, we have implemented the formula with boundary data generated from various frequencies. It is clearly observed that the proposed DDSM generates sharper reconstructions than the classical DSM with a single data pair; with multiple data pairs, the reconstruction improves considerably. Note that the classical DSM may not handle multiple data pairs systematically. In addition, we also present the plots of the normalized \(\varphi \), which by themselves do not reveal any recognizable information. Thus, the processing by the CNN is critical for extracting the hidden information and producing accurate reconstructions.

Fig. 13

Comparison of DSM reconstruction and DDSM reconstruction for 3 circles. The MSE of the results of DSM with frequency \(\cos \theta \), \(\cos 2\theta \), \(\cos 4\theta \), \(\cos 8\theta \) are 0.0512, 0.0532, 0.0502 and 0.0487. The MSE of the results of DDSM with 1 and 10 data pairs are 0.0429 and 0.0161, respectively. The last row is the normalized \(\varphi \), i.e., \(\varphi /(\Vert \varphi \Vert _{L^{\infty }(\Omega )} + |\varphi ({\textbf{x}})| )\)

4.3 3D Reconstruction

In this subsection, we apply the DDSM to 3D DOT problems. We consider the cubic domain \(\Omega =(-1,1)\times (-1,1) \times (-1,1)\) and two ellipsoids with level-set functions \(\Gamma _i(x_1,x_2,x_3)\), \(i=1,2\), whose axis lengths, rotation angles and center points are sampled from \(\mathcal {U}(0.4,0.6)\), \(\mathcal {U}(0, 2\pi )\) and \(\left[ \mathcal {U}(-0.4,0.4) \right] ^3\), respectively. Similar to (4.1), we let the inclusions be generated by the following function involving the random variables described above:

$$\begin{aligned} \Gamma (x_1,x_2,x_3) = \min _{i=1,2} \{ \Gamma _i(x_1,x_2,x_3) \}. \end{aligned}$$
(4.4)

In the 3D case, we employ the first 9 spherical harmonic functions below, mapped to the surface of the domain,

$$\begin{aligned} \begin{aligned}&\Re Y^m_l(\theta ,\phi ) = \sqrt{ \frac{2l+1}{4\pi } \frac{(l-m)!}{(l+m)!} } P^m_l(\cos (\theta )) \cos (m \phi ) \\&\Im Y^m_l(\theta ,\phi ) = \sqrt{ \frac{2l+1}{4\pi } \frac{(l-m)!}{(l+m)!} } P^m_l(\cos (\theta )) \sin (m \phi ) \end{aligned} \end{aligned}$$
(4.5)

to generate the flux boundary data, where \(P^m_l(z)\) are the associated Legendre polynomials with \(l=0,1,2\), \(0\le m \le l\); some examples are plotted in Fig. 14. The other settings are similar to the 2D case.
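A hedged sketch of generating these nine real patterns with scipy, whose convention sph_harm(m, l, azimuth, polar) we assume; the mapping from the sphere angles to the cube surface is omitted.

```python
# Sketch: the first 9 real spherical harmonics (l = 0, 1, 2) from (4.5),
# taken as real and imaginary parts of scipy's complex Y_l^m.
import numpy as np
from scipy.special import sph_harm

def real_harmonics(theta, phi, l_max=2):
    """theta: azimuthal angles in [0, 2*pi); phi: polar angles in [0, pi]."""
    gs = []
    for l in range(l_max + 1):
        for m in range(l + 1):
            Y = sph_harm(m, l, theta, phi)
            gs.append(Y.real)              # the cos(m*phi) pattern
            if m > 0:
                gs.append(Y.imag)          # the sin(m*phi) counterpart
    return np.array(gs)                    # 1 + 3 + 5 = 9 patterns for l_max = 2
```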

Fig. 14

Harmonic functions on the surface of \(\Omega \)

Fig. 15

Reconstruction for 2 cases with different Cauchy data numbers and noise levels: Case 1 (top) and Case 2 (bottom)

The reconstruction results for two typical cases are presented in Fig. 15, where the two ellipsoids are glued together in the first case and separated in the second. To show the reconstructed inclusion distribution, we employ 3D density plots: the red, blue and mesh surfaces correspond to the isosurfaces with values 0.75, 0.06 and 0.025, respectively, shown in the first row of each case. In addition, we also plot some cross sections, given in the second row of each case. As shown by all these plots, the reconstruction results are quite accurate and robust with respect to large noise. Note that for 3D DOT problems, solving the 3D forward problems required by iterative methods can be highly expensive. Given the efficiency, accuracy and robustness of the proposed DDSM, we believe it can be very attractive for further real-world applications.

Fig. 16

Data points on two faces of the cubic domain

For the second case, we also test the performance of the DDSM when the boundary data are only available at a few points rather than at every grid point. Specifically, as only low-frequency data are used for the 3D examples, we assume there are only 9 points evenly distributed on each face of the domain, see Fig. 16 for an illustration, so that there are only 26 distinct points on the boundary. Similar to the 2D case, linear interpolation is used to generate the data functions on each face. The reconstruction results are presented in Fig. 17, where we can see that the shape can be recovered quite well even though the data are extremely limited.

Fig. 17

Reconstruction with limited data points on the boundary

Moreover, we apply the DNN to a duck-shaped inclusion that is not in the scope of the training data set, i.e., it is certainly not a union of two ellipsoids (Fig. 18). In this case, we can still observe that the reconstruction captures the basic duck shape and remains robust with respect to large noise. But the geometry of the beak is lost, as the training data set is sampled only from two ellipsoids; it cannot be expected that such detailed geometry will be recovered accurately. However, one can add more inclusion samples with various geometric randomness to the training data set in order to attain better accuracy.

Fig. 18

Reconstruction for a duck-shaped inclusion with different noise level

5 Conclusions

Inspired by the DSM [15], this paper proposes a novel deep direct sampling method for the DOT problem, where the neural network architecture is designed to approximate the index functional. Hybridizing the DSM and CNN has several advantages. First, the index functional approximated by the CNN is capable of systematically incorporating multiple Cauchy data pairs, which improves the reconstruction quality and the robustness of the algorithm against noise. Second, once the neural network is well trained, the data-driven index functional can be evaluated efficiently, which inherits the main benefit of the conventional DSM. Third, the DDSM can successfully handle challenging cases such as limited data and 3D reconstruction problems, which shows great potential for practical application. Various numerical experiments justify these findings. Hence, we believe the proposed DDSM provides an efficient technique and a promising new direction for solving the DOT problem.