Networks for Nonlinear Diffusion Problems in Imaging
Abstract
A multitude of imaging and vision tasks have recently seen a major transformation by deep learning methods, in particular by the application of convolutional neural networks. These methods achieve impressive results, even for applications where it is not apparent that convolutions are suited to capture the underlying physics. In this work, we develop a network architecture based on nonlinear diffusion processes, named DiffNet. By design, we obtain a nonlinear network architecture that is well suited for diffusion-related problems in imaging. Furthermore, the performed updates are explicit, by which we obtain better interpretability and generalisability compared to classical convolutional neural network architectures. The performance of DiffNet is tested on the inverse problem of nonlinear diffusion with the Perona–Malik filter on the STL-10 image dataset. We obtain results that are competitive with the established U-Net architecture, with a fraction of the parameters and necessary training data.
Keywords
Neural networks · Deep learning · Partial differential equations · Nonlinear diffusion · Image flow · Nonlinear inverse problems
1 Introduction
We are currently undergoing a paradigm shift in imaging and vision tasks from classical analytic to learning and data-based methods. In particular, this shift is driven by deep learning and the application of convolutional neural networks (CNN). Whereas highly superior results are obtained, interpretability and analysis of the involved processes remain a challenging and ongoing task [20, 24].

how and to what extent can learned models replace physical models?

how do learned models depend on training protocols and how well do they generalise?

what are appropriate architectures for the learned models, what is the size of the parameter set \(\varTheta \) that needs to be learned, and how can these be interpreted?
Motivated by the success of these analytic methods in imaging problems in the past, we propose to combine physical models with data-driven methods to formulate network architectures for solving both forward and inverse problems that take the underlying physics into account. We limit ourselves to the case where the physical model is of diffusion type, although more general models could be considered in the future. The leading incentive is given by the observation that the underlying processes in a neural network do not need to be limited to convolutions.
Similar ideas of combining partial differential equations with deep learning have been considered earlier. Examples include learning a PDE via optimal control [26], deriving CNN architectures motivated by diffusion processes [6], deriving stable architectures by drawing connections to ordinary differential equations [11], and constraining CNNs [33] by interpreting them as partial differential equations. 'PDE-NET 2.0' [27, 28] is a recent example of a network architecture designed to learn dynamic PDEs assumed to be of the form (1.3), where the function F is learned as a polynomial in convolution filters with appropriate vanishing moments. Our approach can also be interpreted as introducing the imaging model into the network architecture; such approaches have led to a major improvement in reconstruction quality for tomographic problems [2, 14, 17].
This paper is structured as follows. In Sect. 2, we review some theoretical aspects of diffusion processes for imaging and their inversion, based on the theory of partial differential equations and differential operators. We formulate the underlying conjecture for our network architecture that the diffusion process can be inverted by a set of local nonstationary filters. In Sect. 3, we introduce the notion of continuum networks and formally define the underlying layer operator needed to formulate network architectures in a continuum setting. We draw connections to the established convolutional neural networks in our continuum setting. We then proceed to define the proposed layer operators for diffusion networks in Sect. 4 and derive an implementable architecture by discretising the involved differential operator. In particular, we derive a network architecture that is capable of reproducing inverse filtering with regularisation for the inversion of nonlinear diffusion processes. In Sect. 5, we examine the reconstruction quality of the proposed DiffNet for an illustrative example of deconvolution and for the challenging inverse problem of inverting nonlinear diffusion with the Perona–Malik filter. We achieve results that are competitive with popular CNN architectures with a fraction of the number of parameters and amount of training data. Furthermore, all computed components that are involved in the update process are interpretable and can be analysed empirically. In Sect. 6, we examine the generalisability of the proposed network with respect to necessary training data. Additionally, we empirically analyse the obtained filters and test our underlying conjecture. Section 7 presents some conclusions and further ideas.
2 Diffusion and Flow Processes for Imaging
Remark 1
When considering bounded domains \(\varOmega \subset \mathbb {R}^d\), we will augment (2.1) with boundary conditions on \(\partial \varOmega \). We return to this point in Sect. 4.
Definition 1
Remark 2
2.1 Forward Solvers
It is also useful to look at the Green's function solutions.
Lemma 1
Proof
2.2 Inverse Filtering
Let us now consider the inverse problem of reversing the diffusion process. That is, we have \(u_T\) and aim to recover the initial condition \(u_0\). This is a typical ill-posed problem, as we discuss in the following.
2.2.1 Isotropic Case \(\gamma \equiv 1\)
 (i) The factor \(\mathrm{e}^{k^2 T} \) is unbounded, and hence, the equivalent convolution kernel in the spatial domain does not exist.
 (ii) Equation (2.14) is unstable in the presence of even a small amount of additive noise, and hence, it has to be regularised in practice.
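The two points above can be illustrated numerically. The following is a minimal one-dimensional sketch, assuming a periodic grid with unit spacing and the forward multiplier \(\mathrm{e}^{-k^2 T}\); the function names, the box test signal, the noise level and the Tikhonov-type damping with parameter `alpha` are assumptions made for this illustration and are not taken from the paper.

```python
import numpy as np

def diffusion_multiplier(n, T):
    """Fourier multiplier exp(-k^2 T) of isotropic diffusion on a periodic grid."""
    k = 2 * np.pi * np.fft.fftfreq(n)  # angular frequencies for unit grid spacing
    return np.exp(-k**2 * T)

def inverse_filter(u_T, T, alpha=0.0):
    """Invert the diffusion in Fourier space.

    alpha = 0 gives the naive inverse multiplier exp(k^2 T); alpha > 0 is a
    Tikhonov-type damping chosen for this sketch (an assumption, not the
    regularisation used in the paper).
    """
    g = diffusion_multiplier(u_T.size, T)
    return np.real(np.fft.ifft(g / (g**2 + alpha) * np.fft.fft(u_T)))

rng = np.random.default_rng(0)
n, T = 256, 2.0
u0 = np.zeros(n)
u0[96:160] = 1.0  # box signal used as ground truth

# Forward diffusion, followed by a small amount of additive noise
u_T = np.real(np.fft.ifft(diffusion_multiplier(n, T) * np.fft.fft(u0)))
noisy = u_T + 1e-3 * rng.standard_normal(n)

def rel_err(u):
    return np.linalg.norm(u - u0) / np.linalg.norm(u0)

err_naive = rel_err(inverse_filter(noisy, T))
err_reg = rel_err(inverse_filter(noisy, T, alpha=1e-4))
print(f"naive: {err_naive:.2e}, regularised: {err_reg:.2e}")
```

In this sketch the naive inverse amplifies the noise at high frequencies by many orders of magnitude, whereas the damped filter trades some loss of high-frequency detail for stability.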
2.2.2 Anisotropic Case
Conjecture 1
Remark 3
For the presented problem of nonstationary nonlinear blind deconvolution/inverse filtering, we are not aware of any suitable classical methods. For a recent study that discusses backward diffusion, see [4].
2.3 Discretisation
We introduce the definition of a sparse matrix operator representing local nonstationary convolution.
Definition 2
\(\mathsf {W}\) is called a Sparse SubDiagonal (SSD) matrix if its nonzero entries are all on subdiagonals corresponding to the local neighbourhood of pixels on its diagonal.
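As an illustration of this definition, the sketch below assembles such a matrix for a five-point neighbourhood (up, down, left, right, centre) on a small \(n\times n\) image with row-major vectorisation. The helper name `ssd_matrix`, the ordering of the five filters and the dense storage are assumptions of this example; a practical implementation would use sparse storage or avoid building the matrix altogether.

```python
import numpy as np

def ssd_matrix(zeta):
    """Assemble a dense n^2-by-n^2 matrix from five per-pixel filter images.

    zeta has shape (5, n, n); zeta[s, i, j] is the weight that pixel (i, j)
    assigns to its neighbour in direction s (up, down, left, right, centre).
    Neighbours outside the image are simply dropped.
    """
    n = zeta.shape[1]
    shifts = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]
    W = np.zeros((n * n, n * n))
    for s, (di, dj) in enumerate(shifts):
        for i in range(n):
            for j in range(n):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    W[i * n + j, ii * n + jj] = zeta[s, i, j]
    return W

n = 4
rng = np.random.default_rng(1)
zeta = rng.uniform(0.1, 1.0, size=(5, n, n))  # arbitrary nonstationary weights
W = ssd_matrix(zeta)

# All nonzero entries lie on the diagonals with offsets 0, +-1 and +-n
rows, cols = np.nonzero(W)
offsets = sorted(set((cols - rows).tolist()))
print(offsets)  # [-4, -1, 0, 1, 4]
```

Because the weights vary from pixel to pixel, each row of the matrix carries its own local filter, which is what distinguishes this operator from a stationary convolution.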
3 Continuum Networks
Motivated by the previous section, we aim to build network architectures based on diffusion processes. We first discuss the notion of (neural) networks in a continuum setting for which we introduce the concept of a continuum network as a mapping between function spaces. That is, given a function on a bounded domain \(\varOmega \subset \mathbb {R}^d\) with \(f\in L^p(\varOmega )\), we are interested in finding a nonlinear parametrised operator \(\mathcal {H}_\varTheta :L^p(\varOmega )\rightarrow L^p(\varOmega )\) acting on the function f. We will consider in the following the case \(p\in \{1,2\}\); extensions to other spaces depend on the involved operations and will be the subject of future studies.
We will proceed by defining the essential building blocks of a continuum network and then discuss specific choices to obtain a continuum version of the most common convolutional neural networks. Based on this, we will then introduce our proposed architecture as a diffusion network in Sect. 4.
3.1 Formulating a Continuum Network
The essential building blocks of a deep neural network are the layers of neurons, but since these have a specific meaning in classical neural networks (see, for instance, [35]), we will not use the term neurons, to avoid confusion. We rather introduce the concept of layers and channels as the building blocks of a continuum network. In this construction, each layer consists of a set of functions on a product space and each function represents a channel.
Definition 3
(Layer and channels) For \(k\in \mathbb {N}_0\), let \(F_k=\{f^k_1,f^k_2,\dots , f^k_I\}\) be a set of functions \(f^k_i\in L^p(\varOmega )\) for \(i\in \mathbb {I}=\{1,\dots ,I\}\), \(I\ge 1\). Then, we call \(F_k\) the layer k with I channels and corresponding index set \(\mathbb {I}\).
The continuum network is then built by defining a relation or operation between layers. In the most general sense, we define the concept of a layer operator for this task.
Definition 4
3.2 Continuum Convolutional Networks
Let us now proceed to discuss a specific choice for the layer operator, namely convolutions. With this choice, we will obtain a continuum version of the widely used convolutional neural networks, which we will call here a continuum convolutional network, to avoid confusion with the established convolutional neural networks (CNN). We note that similar ideas have been addressed as well in [2].
Let us further consider linearly ordered network architectures, which means that each layer operator maps between consecutive layers. The essential layer operator for a continuum convolutional network is then given by the following definition.
Definition 5
If the layer operator does not include a nonlinearity, we write \(\mathcal {C}_{\varTheta ,{\mathrm{Id}}}\). Now, we can introduce the simplest convolutional network architecture by applying \(K\ge 1\) convolutional layer operators consecutively.
Definition 6
4 DiffNet: Discretisation and Implementation
In this section, we want to establish a layer operator based on the diffusion processes discussed in Sect. 2. This means that we now interpret the layers \(F_k\) of the continuum network as time states of the function \(u: \varOmega \times \mathbb {R}_+ \rightarrow \mathbb {R}\), where u is a solution of the diffusion Eq. (2.1). In the following, we assume single-channel networks, i.e. \(|F_k|=1\) for all layers. Then, we can associate each layer with the solution u such that \(F_k=u^{(k)}=u(x,t=t_k)\). To build a network architecture based on the continuum setting, we introduce the layer operator versions of (2.10) and (2.20):
Definition 7
Note that this formulation includes a learnable time step, and hence the time instances that each layer represents change. That also means that a stable step size is implicitly learned, if there are enough layers. In the following, we discuss a few options for the implementation of the above layer operator, depending on the type of diffusivity.
Remark 4
The assumption of a single-channel network, i.e. \(|F_k|=1\) for all k, can be relaxed easily, either by assuming \(|F_k|=m\) for some \(m \in \mathbb {N}\) and all layers, or by introducing a channel mixing as in the convolutional operator (3.1).
As a natural application, we could consider RGB or hyperspectral images as a multichannel input. In that case, the filters would become a tensor representing both intra- and inter-channel mixing but still modelled as a diffusion process; see, for example, [9].
4.1 Discretisation of a Continuum Network
Let us briefly discuss some aspects of the discretisation of a continuum network; we start with affine linear networks, such as the convolutional networks discussed in Sect. 3.2. Rather than discussing the computational implementation of a CNN (see, for example, the comprehensive description in [10]), we concentrate instead on an algebraic matrix–vector formulation that serves our purposes.
For simplicity, we concentrate on the two-dimensional case \(d=2\) here. Let us then assume that the functions \(f_i\) in each layer are represented as a square \(n\times n\) image and we denote the vectorised form as \(\mathbf f \in \mathbb {R}^{n^2}\). Then, any linear operation on \(\mathbf f \) can be represented by some matrix \(\mathsf {A}\); in particular, we can represent convolutions as a matrix.
4.2 Learned Forward and Inverse Operators
4.3 Formulating DiffNet
 (i) Linear diffusion; spatially varying and possibly time-dependent, \(\gamma =\gamma (x,t)\).
 (ii) Nonlinear diffusion; diffusivity depending on the solution u, \(\gamma =\gamma (u(x,t))\).
The linear case (i) corresponds to the diffusion layer operator (4.1) and is aimed at reproducing a linear diffusion process with fixed diffusivity. Thus, learning the mean-free filter suffices to capture the physics. The resulting network architecture is outlined in Fig. 1. Here, the learned filters can be directly interpreted as the diffusivity of layer k and are then applied to \(\mathbf F _{k-1}\) to produce \(\mathbf F _k\).
In the nonlinear case (ii), we follow the same update structure, but now the filters are not learned explicitly; they are rather estimated from the input layer itself, as illustrated in Fig. 2. Furthermore, since this architecture is designed for inversion of the nonlinear diffusion process, we employ the generalised stencil \({\mathsf {W}}_{\delta t}(\zeta )\). Then, to compute layer \(\mathbf F _k\), the filters \(\zeta \) are estimated by a small CNN from \(\mathbf F _{k-1}\) and then applied in an explicit update as in (4.6). Note that the diagonals in the update matrix are produced by the estimation CNN. We will refer to this nonlinear filtering architecture as the DiffNet under consideration in the rest of this study.
In contrast to classical CNN architectures, the proposed DiffNet is nonlinear by design, and hence, no additional nonlinearities are necessary. Compared to previous approaches, such as [6], we note that the parameters of the estimating CNN are not used to process the image directly, but rather to produce the filters \(\zeta \) for the update matrix only.
Comparing DiffNet with PDE-NET 2.0 [27], the training of the latter assumes that full time-series data u(x, t) are available, and the PDE is approximated by a forward Euler method with appropriate stability constraints. In our approach, only initial and final conditions are assumed to be known, and the coefficients of the PDE are spatially varying. Both a forward and an inverse problem can be learned; the latter requires regularisation, which is learned simultaneously with the PDE coefficients.
4.3.1 Implementation
The essential part of the implementation of a diffusion network is to perform the update (4.6) with either \(\mathsf {L}_{\delta t}(\gamma )\) or \(\mathsf {W}_{\delta t}(\zeta )\). For computational reasons, it is not practical to build the sparse diagonal matrix and evaluate (4.6); we rather represent the filters \(\gamma \) and \(\zeta \) as \(n\times n\) images and apply them by pointwise matrix–matrix multiplication with a shifted and cropped image, according to the position in the stencil. This way, the zero Neumann boundary condition (4.5) is also automatically incorporated.
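The shift-and-multiply idea can be sketched in a few lines of NumPy. The example below is a generic explicit update for the mean-free (linear) case with a four-neighbour stencil, not a transcription of (4.6); the function name, the use of replicated-edge padding to realise the zero Neumann condition and the ordering of the four filter images are assumptions of this illustration.

```python
import numpy as np

def explicit_diffusion_step(F, gamma, dt):
    """One explicit update using shifted, cropped copies of the image.

    F     : (n, n) image
    gamma : (4, n, n) per-pixel weights for the up, down, left, right neighbours
    dt    : time step
    Replicating the edge pixels makes the difference across the boundary zero,
    which acts as a zero Neumann (no-flux) boundary condition.
    """
    P = np.pad(F, 1, mode="edge")
    neighbours = (P[:-2, 1:-1], P[2:, 1:-1], P[1:-1, :-2], P[1:-1, 2:])
    # Pointwise product of each filter image with the shifted difference image
    update = sum(g * (nb - F) for g, nb in zip(gamma, neighbours))
    return F + dt * update

rng = np.random.default_rng(2)
F = rng.uniform(size=(8, 8))

# Constant diffusivity: the total intensity is preserved by the no-flux boundary
F1 = explicit_diffusion_step(F, np.ones((4, 8, 8)), dt=0.1)

# A constant image is left unchanged for any choice of filters (mean-free stencil)
const = np.full((8, 8), 3.0)
const1 = explicit_diffusion_step(const, rng.uniform(size=(4, 8, 8)), dt=0.1)
```

No matrix of size \(n^2\times n^2\) is ever formed here; the cost per step is a handful of elementwise array operations.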
For the linear diffusion network, we would need to learn the parameter set \(\varTheta \), consisting of filters and time steps, explicitly. This has the advantage of learning a global operation on the image where all parameters are interpretable, but it comes with a few disadvantages. First of all, in this form, we are limited to linear diffusion processes and a fixed image size. Furthermore, the parameters grow with the image size, i.e. for an image of size \(n\times n\), we need \(5 n^2\) parameters per layer. Thus, applications may be limited.
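To make the growth in parameters concrete, the following small sketch evaluates the count of \(5 n^2\) filter parameters per layer for the image sizes used in the experiments. The helper name and the assumption of one additional scalar time step per layer are choices made for this illustration.

```python
def linear_diffnet_parameters(n, layers, stencil_size=5, learn_time_step=True):
    """Number of explicitly learned parameters of a linear diffusion network.

    Each layer holds stencil_size filter images of size n-by-n; one scalar time
    step per layer is added when learn_time_step is True (an assumption here).
    """
    per_layer = stencil_size * n * n + (1 if learn_time_step else 0)
    return layers * per_layer

for n in (64, 96):
    print(n, linear_diffnet_parameters(n, layers=1, learn_time_step=False))
# 64 20480
# 96 46080
```

Doubling the image side length therefore quadruples the number of parameters per layer, which is the limitation noted above.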
5 Computational Experiments
In the following, we will examine the reconstruction capabilities of the proposed DiffNet. The experiments are divided into a simple case of deconvolution, where we can examine the learned features, and a more challenging problem of recovering an image from its nonlinearly diffused and noise-corrupted version.
5.1 Deconvolution with DiffNet
The forward problem is given by (2.1) with zero Neumann boundary condition (4.5) and constant diffusivity \(\gamma \equiv 1\). For the experiment, we choose \(T=1\), which results in a small uniform blurring, as shown in Fig. 4. Recall that for isotropic diffusion, the forward model is equivalent to convolution in space with the kernel \(G_{\sqrt{2T}}\), see (2.3). As also illustrated in Fig. 4, convolution in the spatial domain is equivalent to multiplication in the Fourier domain. In particular, high frequencies are damped and the convolved image is dominated by low frequencies. Hence, for the reconstruction task without noise, we essentially need to recover the high frequencies.
The training and test data for DiffNet consist of simple discs of varying radius and contrast. The training set consists of 1024 samples and the test set of an additional 128, each of size \(64\times 64\). The network architecture is chosen following the schematic in Fig. 2, with three diffusion layers and a final projection to the positive numbers by a ReLU layer. The filter estimator is given by a four-layer CNN, as described in Sect. 4.3.1. All networks were implemented in Python with TensorFlow [1].
The input to the network is given by the convolved image without noise, and we have minimised the \(\ell ^2\)-loss of the output with respect to the ground-truth image. The optimisation is performed for about 1000 epochs in batches of 16 with the Adam algorithm, an initial learning rate of \(4\,\times \,10^{-4}\) and a gradual decrease to \(10^{-6}\). Training on a single Nvidia Titan Xp GPU takes about 24 min. The final training and test error are both at a PSNR of 86.24, which corresponds to a relative \(\ell ^2\)-error of \(2.5\,\times \,10^{-4}\). Recall that this experiment was performed without noise.
5.2 Nonlinear Diffusion
For the experiments, we have used the test data from the STL-10 database [7], which consists of 100,000 RGB images with resolution \(96\times 96\). These images have been converted to grey scale and divided into 90,000 for training and 10,000 for testing. The obtained images were then diffused for four time steps with \(\delta t = 0.1\) and \(\lambda =0.2\). A few sample images from the test data with the result of the diffusion are displayed in Fig. 6. The task is then to revert the diffusion process with additional regularisation to deal with noise in the data.
For all experiments, we have used the same DiffNet architecture, as illustrated in Fig. 2. By performing initial tests on the inversion without noise, we have found that five diffusion layers with a four-layer CNN, following the architecture in 3, gave the best trade-off between reconstruction quality and network size. Increasing the number of layers of either kind led to only a minimal increase in performance. Additionally, we have used a ReLU layer at the end to enforce nonnegativity of the output, as in the previous experiment. We emphasise that this architecture was used for all experiments and hence some improvements for the high-noise cases might be expected with more layers. All networks were trained for 18 epochs, with a batch size of 16 and \(\ell ^2\)-loss. For the optimisation, we have used the Adam algorithm with an initial learning rate of \(2\,\times \,10^{-3}\) and a gradual decrease to \(4\,\times \,10^{-6}\). Training on a single Nvidia Titan Xp GPU takes about 75 min.
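A forward diffusion of this kind can be sketched as follows. This is a minimal explicit four-neighbour Perona–Malik scheme with the step size, number of steps and contrast parameter quoted above; the specific diffusivity \(g(s) = 1/(1+s^2/\lambda ^2)\), the unit grid spacing, the replicated-edge boundary handling and the function name are assumptions of this example, since the discretisation used for the experiments is not spelled out here.

```python
import numpy as np

def perona_malik(u0, steps=4, dt=0.1, lam=0.2):
    """Explicit Perona-Malik diffusion with a four-neighbour stencil.

    Assumes the diffusivity g(s) = 1 / (1 + (s / lam)^2) evaluated on the
    one-sided differences, and replicated edges as zero Neumann boundary.
    """
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(steps):
        P = np.pad(u, 1, mode="edge")
        # Differences to the up, down, left and right neighbours
        diffs = [nb - u for nb in (P[:-2, 1:-1], P[2:, 1:-1],
                                   P[1:-1, :-2], P[1:-1, 2:])]
        # Small differences diffuse freely, large ones (edges) are preserved
        u = u + dt * sum(d / (1.0 + (d / lam) ** 2) for d in diffs)
    return u

rng = np.random.default_rng(3)
img = rng.uniform(size=(16, 16))  # stand-in for a grey-scale image in [0, 1]
out = perona_malik(img)
print(img.var(), out.var())
```

With \(\delta t = 0.1\) and a diffusivity bounded by one, each update is a convex combination of a pixel and its neighbours, so the scheme preserves the mean intensity and the range of the image in this sketch.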
6 Discussion
6.1 Generalisability
To test the generalisation properties of the proposed DiffNet, we have performed similar experiments to those shown in Sect. 5.2 for nonlinear diffusion, but with increasing amounts of training data. Under the assumption that DiffNet learns a more explicit update than a classic CNN, we would also expect it to require less training data to achieve a good test error. To test this assumption, we have examined four settings of nonlinear diffusion with the Perona–Malik filter: learning the forward model, learning to reconstruct from the diffused image without noise, as well as with 0.1% and 1% noise. We then created training datasets of increasing size from just 10 samples up to the full size of 90,000. For all scenarios, we have trained DiffNet and U-Net following the training protocol described in Sect. 5.2. Additionally, we have aborted the procedure when the networks started to clearly overfit the training data.
Results for the four scenarios are shown in Fig. 10. Most notably, DiffNet clearly outperforms U-Net for the forward problem and the noise-free inversion, by 4 dB and 3 dB, respectively. For the noisy cases, both networks perform very similarly for the full training data size of 90,000. The biggest difference overall is that DiffNet already reaches its best test performance with 500–1000 samples, independent of the noise case, whereas the U-Net test error saturates earlier with higher noise. In conclusion, we can say that for the noisy cases, both networks are very comparable in reconstruction quality, but for small amounts of data, the explicit nature of DiffNet is clearly superior.
6.2 Interpretation of Learned Filters
Since all updates are performed explicitly with the output from the filter estimation CNN, we can interpret some of the learned features. For this purpose, we show the filters for the ship image from Sect. 5.2 for the three inversion scenarios under consideration. In Fig. 11, we show the sum of all learned filters, i.e. \(\sum _{i=1}^4 \zeta _i - \zeta _5\). If the network only learned the mean-free differentiating part, then these images would be zero. This implies that the illustrated filters in Fig. 11 can be related to the learned regularisation \(\mathsf {S}(\zeta )\). Additionally, we also show the diagonal filters \(\zeta _5\) in Fig. 12.
We would expect that with increasing noise level, the filters will incorporate more smoothing to deal with the noise; this implies that the edges get wider with increasing noise level. This can be nicely observed for the diagonal filters in Fig. 12. For the smoothing in Fig. 11, we see that the first layer consists of broader details and edges that are refined in the noise-free case for increasing layers. In the noisy case, the later layers include some smooth features that might depict the regularisation necessary in the inversion procedure. It is generally interesting to observe that the final layer shows very fine local details, necessary to restore fine details for the final output.
7 Conclusions
In this paper, we have explored the possibility of establishing novel network architectures based on physical models other than convolutions; in particular, we concentrated here on diffusion processes. As main contributions, we have introduced some nonlinear forward mappings, modelled through learning rather than just through PDEs or integral transforms. We have reviewed (regularised) inverse diffusion processes for inverting such maps. In particular, we have conjectured that these inverse diffusion processes can be represented by local nonstationary filters, which can be learned in a network architecture. More specifically, these local filters can be represented by a sparse subdiagonal (SSD) matrix and hence efficiently used in the discrete setting of a neural network. We emphasise that even though we have concentrated this study on a specific structure for these SSD matrices based on diffusion, other (higher order) models can be considered.
We obtain higher interpretability of the network architecture, since the image processing is explicitly performed by the application of the SSD matrices. Consequently, this means that only a fraction of parameters is needed in comparison with classical CNN architectures to obtain similar reconstruction results. We believe that the presented framework and the proposed network architectures can be useful for learning physical models in the context of imaging and inverse problems, especially where a physical interpretation of the learned features is crucial to establish confidence in the imaging task.
Footnotes
 1. Note that the definition of the Fourier transform is chosen to give the correct normalisation, so that \(\hat{u}(k)|_{k=0} = \int _{\mathbb {R}^d} u(x)\, {\mathrm{d}}^d x.\)
Notes
Acknowledgements
Open access funding provided by University of Oulu including Oulu University Hospital. We thank Jonas Adler and Sebastian Lunz for valuable discussions and comments.
References
 1. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems. Software available from https://www.tensorflow.org/ (2015)
 2. Adler, J., Öktem, O.: Solving ill-posed inverse problems using iterative deep neural networks. Inverse Probl. 33(12), 124007 (2017)
 3. Antholzer, S., Haltmeier, M., Schwab, J.: Deep learning for photoacoustic tomography from sparse data. Inverse Probl. Sci. Eng. 27, 987–1005 (2019)
 4. Bergerhoff, L., Cárdenas, M., Weickert, J., Welk, M.: Stable backward diffusion models that minimise convex energies. ArXiv preprint arXiv:1903.03491 (2019)
 5. Calvetti, D., Somersalo, E.: Hypermodels in the Bayesian imaging framework. Inverse Probl. 24, 034013 (2008)
 6. Chen, Y., Yu, W., Pock, T.: On learning optimized reaction diffusion processes for effective image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5261–5269 (2015)
 7. Coates, A., Ng, A., Lee, H.: An analysis of single-layer networks in unsupervised feature learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 215–223 (2011)
 8. Douiri, A., Schweiger, M., Riley, J., Arridge, S.: Local diffusion regularization method for optical tomography reconstruction by using robust statistics. Opt. Lett. 30(18), 2439–2441 (2005)
 9. Ehrhardt, M.J., Arridge, S.R.: Vector-valued image processing by parallel level sets. IEEE Trans. Image Process. 23(1), 9–18 (2013)
 10. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press. http://www.deeplearningbook.org (2016)
 11. Haber, E., Ruthotto, L.: Stable architectures for deep neural networks. Inverse Probl. 34(1), 014004 (2017)
 12. Hamilton, S.J., Hauptmann, A.: Deep D-bar: real-time electrical impedance tomography imaging with deep neural networks. IEEE Trans. Med. Imaging 37, 2367–2377 (2018)
 13. Hamilton, S.J., Hauptmann, A., Siltanen, S.: A data-driven edge-preserving D-bar method for electrical impedance tomography. Inverse Probl. Imaging 8(4), 1053–1072 (2014)
 14. Hammernik, K., Klatzer, T., Kobler, E., Recht, M.P., Sodickson, D.K., Pock, T., Knoll, F.: Learning a variational network for reconstruction of accelerated MRI data. Magn. Reson. Med. 79(6), 3055–3071 (2018)
 15. Hannukainen, A., Harhanen, L., Hyvönen, N., Majander, H.: Edge-promoting reconstruction of absorption and diffusivity in optical tomography. Inverse Probl. 32(1), 015008 (2015)
 16. Hauptmann, A., Arridge, S., Lucka, F., Muthurangu, V., Steeden, J.: Real-time cardiovascular MR with spatio-temporal artifact suppression using deep learning - proof of concept in congenital heart disease. Magn. Reson. Med. 81, 1143–1156 (2019)
 17. Hauptmann, A., Lucka, F., Betcke, M., Huynh, N., Adler, J., Cox, B., Beard, P., Ourselin, S., Arridge, S.: Model-based learning for accelerated, limited-view 3D photoacoustic tomography. IEEE Trans. Med. Imaging 37(6), 1382–1393 (2018)
 18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
 19. Helin, T., Lassas, M.: Hierarchical models in statistical inverse problems and the Mumford–Shah functional. Inverse Probl. 27(1), 015008 (2010)
 20. Hofer, C., Kwitt, R., Niethammer, M., Uhl, A.: Deep learning with topological signatures. In: Advances in Neural Information Processing Systems, pp. 1634–1644 (2017)
 21. Jin, K.H., McCann, M.T., Froustey, E., Unser, M.: Deep convolutional neural network for inverse problems in imaging. IEEE Trans. Image Process. 26(9), 4509–4522 (2017)
 22. Kang, E., Min, J., Ye, J.C.: A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction. Med. Phys. 44(10), e360–e375 (2017)
 23. Khoo, Y., Lu, J., Ying, L.: Solving parametric PDE problems with artificial neural networks. arXiv:1707.03351v2 (2017)
 24. Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F., et al.: Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). In: International Conference on Machine Learning, pp. 2673–2682 (2018)
 25. Kimmel, R.: Numerical Geometry of Images: Theory, Algorithms, and Applications. Springer, Berlin (2003)
 26. Liu, R., Lin, Z., Zhang, W., Su, Z.: Learning PDEs for image restoration via optimal control. In: European Conference on Computer Vision, pp. 115–128. Springer (2010)
 27. Long, Z., Lu, Y., Dong, B.: PDE-Net 2.0: learning PDEs from data with a numeric-symbolic hybrid deep network. ArXiv preprint arXiv:1812.04426 (2018)
 28. Long, Z., Lu, Y., Ma, X., Dong, B.: PDE-Net: learning PDEs from data. In: Proceedings of the 35th International Conference on Machine Learning (ICML 2018) (2018)
 29. Meinhardt, T., Moeller, M., Hazirbas, C., Cremers, D.: Learning proximal operators: using denoising networks for regularizing inverse imaging problems. In: International Conference on Computer Vision, pp. 1781–1790 (2017)
 30. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12(7), 629–639 (1990)
 31. Raissi, M., Karniadakis, G.E.: Hidden physics models: machine learning of nonlinear partial differential equations. arXiv:1708.00588v2 (2017)
 32. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
 33. Ruthotto, L., Haber, E.: Deep neural networks motivated by partial differential equations. ArXiv preprint arXiv:1804.04272 (2018)
 34. Sapiro, G.: Geometric Partial Differential Equations and Image Analysis. Cambridge University Press, Cambridge (2006)
 35. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, Cambridge (2014)
 36. Sirignano, J., Spiliopoulos, K.: DGM: a deep learning algorithm for solving partial differential equations. arXiv:1708.07469v1 (2017)
 37. Tompson, J., Schlachter, K., Sprechmann, P., Perlin, K.: Accelerating Eulerian fluid simulation with convolutional networks. arXiv:1607.03597v6 (2017)
 38. Weickert, J.: Anisotropic Diffusion in Image Processing. Teubner, Stuttgart (1998)
 39. Weickert, J., Romeny, B.T.H., Viergever, M.A.: Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. Image Process. 7(3), 398–410 (1998)
 40. E, W., Han, J., Jentzen, A.: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. arXiv:1706.04702v1 (2017)
 41. Wu, Y., Zhang, P., Shen, H., Zhai, H.: Visualizing neural network developing perturbation theory. arXiv:1802.03930v2 (2018)
 42. Xu, L., Ren, J.S., Liu, C., Jia, J.: Deep convolutional neural network for image deconvolution. In: Advances in Neural Information Processing Systems, pp. 1790–1798 (2014)
 43. Ye, J.C., Han, Y., Cha, E.: Deep convolutional framelets: a general deep learning framework for inverse problems. SIAM J. Imaging Sci. 11(2), 991–1048 (2018)
 44. Zhu, B., Liu, J.Z., Cauley, S.F., Rosen, B.R., Rosen, M.S.: Image reconstruction by domain-transform manifold learning. Nature 555, 487–489 (2018)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.