Real-time super-resolution mapping of locally anisotropic grain orientations for ultrasonic non-destructive evaluation of crystalline material

Estimating the spatially varying microstructures of heterogeneous and locally anisotropic media non-destructively is necessary for the accurate detection of flaws and reliable monitoring of manufacturing processes. Conventional algorithms used for solving this inverse problem come with significant computational cost, particularly in the case of high dimensional non-linear tomographic problems. In this paper, we propose a framework which uses deep neural networks (DNNs) with full aperture, pitch-catch and pulse-echo transducer configurations to reconstruct material maps of crystallographic orientation. We also present the first application of generative adversarial networks (GANs) to achieve super resolution of ultrasonic tomographic images, providing a factor-four increase in image resolution and up to a 50% increase in structural similarity. We highlight the importance of including appropriate prior knowledge in the GAN training dataset: known information about the material's structure should be represented in the training data to increase inversion accuracy. We show that, after a computationally expensive training process, the DNNs and GANs can be used in less than one second (0.9 seconds on a standard desktop computer) to provide a high resolution map of the material's grain orientations.


Introduction
Ultrasonic non-destructive evaluation (NDE) is widely used across a number of industries including aerospace, nuclear, and oil and gas. The technique involves the generation, transmission and reception of high-frequency mechanical waves through a component [10]. An image of the component's interior is then generated via post processing of this data to aid in the detection of any internal defects [31]. Conventional ultrasonic imaging algorithms within NDE typically assume that the material that is being inspected is isotropic and homogeneous. However, metals can develop locally anisotropic and heterogeneous microstructures, particularly when they are subjected to extreme thermal cycles, such as those present in welding and additive manufacturing processes [15,37,45].
Conventional ultrasonic imaging algorithms which assume homogeneity or isotropy can fail to focus the energy correctly in the image domain in such cases and are therefore unreliable [33,41,50].
Algorithms which incorporate a priori information about a material's spatially varying properties significantly improve the accuracy of defect characterisation [41].
In recent years, much effort has been expended on generating material property maps nondestructively using tomographic inversion, where material properties such as wave speed, or microstructural descriptors such as grain orientation, are estimated from the scattered wave field data recorded at the surface of an object. A wide range of advanced tomographic algorithms are used across geophysics [1,2,11,29,43,51,53], bio-medicine [16] and NDE [13,27,41,42]. A common approach is to use iterative methods to improve the fit of the measured data to forward modelled data which depend on an estimate of the material map. They sample potential material maps from some multi-dimensional parameter space, solve a forward problem for each new material property map, and update the estimated map to improve the data fit [27]. In the case of probabilistic sampling frameworks (for example, those built around Markov chain Monte Carlo methods [42,52]), there is the added benefit of extracting uncertainty information on the parameter estimates, facilitating valuable uncertainty quantification studies. Although these algorithms have demonstrated impressive results in reconstructing wave speed and grain orientation maps, they are computationally demanding, often requiring the storage of large sample sets and compute times of several hours to several weeks. This poses a problem for the NDE community, where there is an increasing demand for the monitoring of dynamical processes employed during manufacturing, for example in welding and additive manufacturing processes [23,24], and so it is desirable to carry out inspection in real-time.
Machine learning shows strong potential to solve material characterisation inverse problems rapidly [17]. Specifically, we focus on the use of deep neural networks (DNNs), which can approximate any non-linear relationship between two parameter spaces, given a sufficiently large set of training data (pairs of dependent and corresponding independent parameters [8]). The training of a DNN is computationally expensive. However, the training process is only performed once prior to using a DNN, and a trained network can be used effectively in real time without the need for high-performance computing.
However, DNNs have not yet been implemented for tomographic reconstruction of anisotropic material properties. Although various deep learning algorithms have been used to solve inverse problems in NDE, for example, to predict material fatigue behaviour [3], to augment ultrasonic data [44], and for ultrasonic crack characterisation [35] and crack detection using image recognition [14,21], the use of DNNs for tomography has yet to be explored in this context.
In addition to DNNs, generative adversarial networks (GANs) have more recently been applied to various computer vision tasks, including super-resolution with upscaling by up to a factor of four [30], colourisation [19], and segmentation and labelling [22]. This family of algorithms has strong potential to improve image resolution and has been used increasingly in remote sensing [25] and X-ray tomography [49]; however, GANs have not previously been applied in NDE to produce ultrasonic tomographic images.
In this paper, we present the first DNN framework for rapid, non-linear two-dimensional tomography of heterogeneous and locally anisotropic materials. The datasets used for the tomographic inversion are the arrival times of ultrasonic waves which have been transmitted and received by an array of sensors on the exterior of the component. The examples shown are inspired by the NDE of polycrystalline materials, but the methodology should naturally extend to other domains, for example imaging anisotropic fibrous tissue [18,20] or the Earth's subsurface [54]. We compare the network's performance for a range of transducer configurations, model textures and different types of simulated ultrasonic testing data (i.e. we move beyond inverse crime scenarios). A novel GAN-based method for post-processing ultrasound tomographic images to achieve super-resolution with a four-fold upscaling factor is presented, achieving up to 50% improvement in structural similarity metrics. We define the term super-resolution, in the context of image processing, as reconstructing images below the original lengthscale. This differs from an alternative definition often used in physical acoustics, which is to image below the wavelength of the data.

Method
We employ model-driven deep learning, where a large dataset of simulated material maps and corresponding travel time measurements is used to train a DNN and hence solve the tomographic inverse problem. The forward modelling problem can be denoted as T m = f(m, s), where f is a forward mechanical wave modelling operator, m is a material model, s contains the locations of the elements in the ultrasonic transducer array and T m is the time-of-flight (ToF) matrix between every pair of array elements. Within each database used for network training, the transducer configuration s is fixed and therefore s is omitted in the notation for the ToF matrix T m. We use deep learning to obtain (or learn) an approximation of f −1, which maps the measured data T m to a material map m (i.e., DNN ≈ f −1). In this study, the training data consist of two-dimensional material models with spatially varying crystal orientations θ(x, y) and the travel time matrix T m corresponding to each one. To generate models in such a way that the orientations are randomly assigned but still exhibit some structural correlation, an initial random Voronoi tessellation [39] with 30 seeds (a set of two-dimensional Cartesian coordinates lying within the domain of interest) is computed, and an orientation θ between 0° and 45° is randomly assigned to each of the 30 resulting Voronoi regions or cells (Fig. 1a). We consider only in-plane crystal rotation, and therefore the orientation θ relates to the orientation of a slowness curve in each cell. This slowness curve plots the reciprocal of velocity in the crystal over a range of incident wave directions [42]. The material models used in the training data {m 16, T m16} are generated by discretising the Voronoi tessellation onto a regularly spaced 16 × 16 grid and smoothing with a Gaussian kernel.
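As an illustration, the model-generation procedure above can be sketched in a few lines of Python. The seed count (30), grid size (16 × 16) and orientation range come from the text; the smoothing width sigma=1.0 and the helper name make_orientation_model are assumptions, not code from the paper's repository.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_orientation_model(n_seeds=30, n=16, sigma=1.0, rng=None):
    """Random Voronoi orientation model, rasterised and smoothed (a sketch)."""
    rng = np.random.default_rng(rng)
    seeds = rng.uniform(0.0, 1.0, size=(n_seeds, 2))   # Voronoi seed coordinates
    theta = rng.uniform(0.0, 45.0, size=n_seeds)       # one orientation per cell
    # Pixel centres of the regular n x n grid
    xs = (np.arange(n) + 0.5) / n
    X, Y = np.meshgrid(xs, xs)
    pix = np.stack([X.ravel(), Y.ravel()], axis=1)
    # Voronoi assignment = nearest seed to each pixel centre
    d2 = ((pix[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
    m = theta[d2.argmin(axis=1)].reshape(n, n)
    return gaussian_filter(m, sigma=sigma)             # smoothed model m16

m16 = make_orientation_model(rng=0)
```

Smoothing with the Gaussian kernel keeps the orientations within the original 0° to 45° range while removing the hard cell boundaries, mimicking the smoothed training models m 16.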

Forward Model Approaches
Following model parameterisation, an efficient forward model is required for computing the time-of-flight matrix T m (Fig. 1d) corresponding to a grain orientation model m 16 for each source-receiver pair. We take two approaches: a semi-analytic model using an anisotropic multi-stencil fast marching method (AMSFMM) algorithm from [42], denoted as f FMM, and a finite element analysis (FEA) method, denoted as f FEA. The AMSFMM incorporates the effects of ray bending due to variations in locally anisotropic grain orientations, and models the travel-time field by solving the Eikonal equation using an upwind finite difference scheme [36,40,42]. This allows the calculation of the shortest travel time between transmitter and receiver locations, from which the matrix T m FMM can be constructed (that is, T m FMM = f FMM(m 16)). As wave reflections are not incorporated into the AMSFMM, a different approach is required for the pulse-echo transducer array configuration.
In this case, the time of flight between the transmitter and receiver is calculated by the summation of the travel times from the transmitting element to the reflecting surface and from that surface back to the receiving element.
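As a simplified illustration of the shortest-travel-time idea, the sketch below replaces the anisotropic multi-stencil Eikonal solver with a plain Dijkstra search over an 8-connected pixel graph, so first arrivals still bend around slow regions. The speed map, grid size and the travel_times helper are assumptions for demonstration only; the paper's solver is anisotropic and uses an upwind finite-difference scheme instead.

```python
import heapq
import numpy as np

def travel_times(speed, src):
    """First-arrival times from src to every pixel via Dijkstra (a sketch)."""
    n, m = speed.shape
    t = np.full((n, m), np.inf)
    t[src] = 0.0
    pq = [(0.0, src)]
    steps = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
             if (di, dj) != (0, 0)]
    while pq:
        ti, (i, j) = heapq.heappop(pq)
        if ti > t[i, j]:
            continue                         # stale queue entry
        for di, dj in steps:
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < m:
                # edge length divided by the local average speed
                d = (di * di + dj * dj) ** 0.5
                tn = ti + d * 2.0 / (speed[i, j] + speed[ni, nj])
                if tn < t[ni, nj]:
                    t[ni, nj] = tn
                    heapq.heappush(pq, (tn, (ni, nj)))
    return t

speed = np.full((32, 32), 6.0)               # nominal wave speed (illustrative)
speed[12:20, :] = 3.0                        # a slow layer across the model
t = travel_times(speed, (0, 0))              # "transmitter" at one corner
row_of_tof = t[-1, :]                        # arrivals along the opposite edge
```

Repeating this for every transmitter position and reading off the receiver locations fills one ToF matrix, analogous to T m FMM = f FMM(m 16).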

Deep neural network for orientation mapping
Deep neural networks (DNNs) are mathematical mappings that emulate the relationship between two parameter spaces [17]. Here, we seek a map between the ToF matrices T m and the grain orientation models m 16. We configure three DNNs (corresponding to the three transducer configurations), each with five fully connected layers (illustrated in Fig. 2), using sigmoid activation functions. The final output layer contains a single node corresponding to the orientation of a single pixel in the imaging domain. Therefore, following the approach of [17], a separate network is trained for each pixel, so for a 16 × 16 resolution image a total of 256 networks are trained. The networks are trained using the Adam optimisation algorithm [28]. A description of network hyper-parameters is provided in Appendix 6.2. These hyper-parameters are selected using a stochastic optimisation library [6] for each network architecture corresponding to the different transducer configurations. We use a mean-squared-error (MSE) loss function, given by MSE = (1/N) Σ_i (m true 16,i − m pred 16,i)², where m true 16 and m pred 16 are the true and predicted grain orientation models, i denotes the pixel index and N is the total number of pixels (for models m 16, N = 256). A validation data set is created using 20% of the training data. To avoid over-fitting the network to the training data, the cost function is periodically evaluated over the validation data set, and we implement an early stopping algorithm so that training stops once the validation loss stops decreasing (with a patience of 10 iterations).
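A minimal numpy sketch of the forward pass of one such per-pixel network is given below: five fully connected layers (input, three hidden, one output node), with sigmoid activations on the hidden layers. The layer widths and the dummy ToF input are placeholders; the tuned widths per transducer configuration are those listed in Table 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, rng):
    """Random (weights, biases) pairs for consecutive fully connected layers."""
    return [(rng.standard_normal((a, b)) * np.sqrt(1.0 / a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def predict_pixel(tof, layers):
    """One per-pixel network: flattened ToF matrix in, one orientation out."""
    h = tof.ravel()
    for W, b in layers[:-1]:
        h = sigmoid(h @ W + b)               # hidden layers, sigmoid activation
    W, b = layers[-1]
    return (h @ W + b).item()                # single output node: one pixel

rng = np.random.default_rng(0)
sizes = [64 * 64, 128, 64, 32, 1]            # placeholder widths (see Table 1)
layers = init_layers(sizes, rng)
tof = rng.random((64, 64))                   # dummy ToF matrix
angle = predict_pixel(tof, layers)
```

In the full framework, 256 such networks (one per pixel of m 16) are trained with the Adam optimiser against the MSE loss above.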

Generative adversarial networks for super resolution
Conditional GANs learn a mapping between two images [22] and so can be used for post-processing of the DNN tomography output (m pred 16) to increase resolution and accuracy. The GAN architecture, as illustrated in Figure 3(a), consists of two separate trainable networks: a generator (GAN G) and a discriminator (GAN D). Here, the generator is a modified U-net [38] based on fully convolutional layers (see the Appendix for the network architecture). The discriminator takes the output of the generator m G 64, as well as the known 64x64 high resolution image (m true 64) that was used to generate the ToF data, and predicts which image is generated (fake) and which is part of the training data (real). The accuracy of the discriminator prediction can then be established. These competing networks are then trained against each other; in each iteration of training, the accuracy of the discriminator is fed into the loss function of the generator network. The generator seeks to create images m G 64 that decrease the discriminator accuracy, meaning that m G 64 cannot be discriminated from the reference training data m true 64. Following the training process, the generator can be used to map 16x16 images to 64x64 resolution images.

DNN Results
Following the training of the fully connected DNNs, we predict material maps m pred 16 = DNN(T m FMM) using the three transducer array configurations shown in Figure 1, where T m FMM is test data which has not been used in the network training process. The test data are generated following the same protocol as the training data, using smoothed Voronoi models m 16 and the AMSFMM algorithm to generate a total of 200 test models and data. The true models m true 16 are compared with the predicted models m pred 16. So far, the same mathematical model has been used for both the training data and the test data (a so-called inverse crime [47]), and this is not a sufficient challenge of the methodology [26]. We therefore now use a different mathematical model to test the trained DNN. One further challenge is to generate material maps using a different method from that used in the training data, i.e. not originating from Voronoi diagrams. The material maps in Figure 6(a) show a range of such textures.

GAN Results
Three GANs are trained using the layered, 6-seed Voronoi and 30-seed Voronoi models m true 64, and 200 additional models per GAN are used for testing, of which 5 are shown in each of Figures 7(a), 8(a) and 9(a); the distributions of the resulting changes in accuracy are shown in Figure 10.
For the 5 layer models (Fig. 7), the GAN predictions are significantly more accurate than the low resolution DNN tomography output (Fig. 7b). The GAN also performs well for the 6-seed Voronoi tessellation models (Fig. 8), where reconstructed grain orientation maps from the GAN exhibit discontinuous, piecewise constant orientations for each grain. The GAN improves the MAE and SSIM in all cases; however, there is slight blurring across some grain boundaries. The GAN results for the 30-seed Voronoi tessellation models (Fig. 9) exhibit stronger blurring across grain boundaries. While the GAN predictions are texturally more similar to the true models (piecewise constant and discontinuous regions), the distributions of ∆MAE and ∆SSIM in Figure 10 show the GAN offers only marginal improvements in reconstruction accuracy, and in some cases the accuracy decreases when using the GAN (∆MAE > 0 and ∆SSIM < 0). The difference between the 6-seed and 30-seed Voronoi models lies in the model complexity due to the smaller individual grains in the 30-seed models.
In these models, multiple grains can fit into a single pixel of a low resolution DNN tomography image, resulting in a loss of spatial information that the GAN cannot fully recover. These results show that a GAN can be used for post-processing tomography results to improve reconstruction accuracy and image resolution, particularly when prior information regarding the spatial distribution of the material map is known (e.g., if the sample is known to be layered, or similarly well-structured) and the spatial distribution is simple.
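The accuracy deltas summarised in Figure 10 can be reproduced in outline as follows. The stand-in models below are random placeholders; as in Figure 11, the 16 × 16 DNN output is upscaled by nearest-neighbour interpolation before comparison, and a negative ∆MAE indicates the GAN improved the reconstruction.

```python
import numpy as np

def upscale_nn(img, factor=4):
    """Nearest-neighbour upscaling of a low resolution image."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def mae(a, b):
    """Mean absolute error between two orientation maps."""
    return float(np.mean(np.abs(a - b)))

rng = np.random.default_rng(0)
m_true64 = rng.uniform(0, 45, (64, 64))          # stand-in true model
m_pred16 = rng.uniform(0, 45, (16, 16))          # stand-in DNN output
m_gan64 = m_true64 + rng.normal(0, 1, (64, 64))  # stand-in GAN output

# dMAE < 0 means the GAN output is closer to the truth than the
# nearest-neighbour-upscaled DNN output.
d_mae = mae(m_gan64, m_true64) - mae(upscale_nn(m_pred16), m_true64)
```

The same comparison with SSIM in place of MAE yields the ∆SSIM distributions of Figure 10.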

Discussion
The framework presented includes several stages: (1) the generation of training data using the AMSFMM method, (2) training of the DNN, and (3) training of the GAN. However, each of these stages needs to be performed only once. Thereafter, the DNN and GAN can be used in effectively real time (< 1 second). Here, the time required to generate 7500 ToF matrices T m FMM was approximately 1 hour, to train the DNN approximately 40 minutes (until convergence), and to train the GAN approximately 8 hours (using Google Colab GPUs [9]). It is clear that when repeated material map reconstructions are desired, as is the case for NDE monitoring purposes, the deep learning framework excels in its ability to provide real time results.
The benefits of real time inversion come at the expense of a few limitations that are yet to be overcome in the current work. Firstly, the DNN is trained with a constant transducer configuration, so a trained DNN cannot be generally extended to changes in relative transducer locations. This is not a problem for many applications in NDE, as the transducer arrays are rigid and fixed, and the test sample geometries do not change through time. However, limited network flexibility may be problematic in cases where the configuration changes, such as the in-process monitoring of additive manufacturing: during the building process the shape of the sample changes, and therefore the distribution of transducer elements also changes. One solution is to train many DNNs for all the possible transducer configurations throughout the building process; however, this would require a significantly more expensive training process. Another solution, proposed in [17], is to train more flexible networks that account for missing data by augmenting the training data set with additional input samples taken from additional transducer locations. Travel times in the ToF matrix can be set to zero to indicate that a transducer is not used for a particular transducer configuration, and then the trained network can invert using multiple configurations.
The GAN is also limited in its applicability. This is highlighted when a trained GAN is used to invert for textures that are dissimilar to those found in the training data. Training a GAN with a much broader training data set, for example one including all of the layered, 6-seed and 30-seed Voronoi models, would allow for more general application of the GAN where less prior knowledge of the material is available. We leave this for future work.
Where real time inversions are not required, more computationally expensive tomography algorithms can be implemented. Algorithms such as the rj-MCMC [41] offer more information, including an estimate of the uncertainty in the tomography results. A place for rapid deep learning-based tomography still exists within this framework, as it can provide a fast, coarse initial model which can be used as a starting point for more sophisticated algorithms. Additionally, a GAN can be used to post-process any tomographic image. Linearised imaging methods are often regularised and hence predict smoother structures than are expected to exist in the true medium; a GAN can therefore be trained to upscale the resolution and sharpen these images. Even where the GAN provides only marginal improvements to the DNN tomography results, the GAN output models exhibit discontinuous boundaries. It can be important that such boundaries are present in tomography algorithms where entire waveforms are modelled and matched to the recorded waveforms (that is, full waveform inversion [43]). A GAN might also be extended to take the full waveform as an input, though this would require expensive FEA modelling to generate the training data, so that all internal reflections are modelled.

Figure 11: (a) True high resolution (64x64) grain orientation maps m true 64, (b) 16x16 resolution DNN tomography output using AMSFMM generated data (m pred 16), and (c) 64x64 GAN output m G 64. For row (b), the MAE and SSIM are calculated on an image upscaled to 64 × 64 resolution using nearest neighbour interpolation. The similar MAE and SSIM values for the Voronoi diagram arise because this type of texture was used in the training data of the DNN and GAN; the GAN performs significantly worse in the cases where the material texture is not part of the training data (columns 2, 3 and 4).

Conclusion
We present a deep learning based framework for the real time tomographic reconstruction of spatially varying crystal orientations in locally anisotropic media using ultrasonic array time-of-flight data. We train a series of deep neural networks (DNNs) using 7500 models in a training data set, to accurately reconstruct orientation maps using full aperture, pitch-catch and pulse-echo transducer array configurations. We present the first application of generative adversarial networks (GANs) to ultrasonic tomographic data, where a series of GANs are trained with three sets of training data exhibiting increasing levels of complexity in the model textures. The GAN takes the low resolution DNN output and upscales the resolution by a factor of four. We show that the prior information used to create the training data for both the DNN and the GAN is an important factor in providing accurate estimations of the orientation maps. Using the methods presented unlocks a wide range of potential applications for ultrasonic monitoring, allowing for faster and more accurate detection of flaws and in-process inspection during manufacturing.

Finite element analysis
We implement a finite element simulation of elastic wave propagation in anisotropic media using OnScale [34]. We apply absorbing boundary conditions on all sides of the domain so energy continues past boundaries with no reflections. We use Ricker wavelets with central frequencies of 1 MHz as the source-time function, and apply pressure loads following the full aperture transducer array configuration as shown in Figure 1(d). The values for the finite element node spacing (∆x, ∆y) are selected to ensure spatial stability conditions following ∆x, ∆y = λ/15, where λ is the shortest wavelength in the domain.
Following the simulation for each transmitting array element, the travel time to each receiving transducer is automatically picked by selecting the time at which the arriving energy first rises above a threshold. This threshold is taken to be 2% of the peak displacement in the recorded signal.
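That picking rule can be expressed directly in code. The synthetic Gaussian trace and the 100 MHz sampling interval below are illustrative assumptions; only the 2% threshold comes from the text.

```python
import numpy as np

def pick_arrival(signal, dt, threshold=0.02):
    """Return the first time the signal exceeds a fraction of its peak."""
    level = threshold * np.max(np.abs(signal))
    idx = np.argmax(np.abs(signal) >= level)   # first sample above the level
    return idx * dt

dt = 1e-8                                      # 100 MHz sampling (assumed)
t_axis = np.arange(2000) * dt
t0 = 5e-6                                      # true arrival at 5 microseconds
trace = np.exp(-((t_axis - t0) / 2e-7) ** 2)   # Gaussian pulse stand-in
picked = pick_arrival(trace, dt)
```

Note that a threshold picker fires slightly before the pulse peak, on the rising edge of the arriving energy; this small, systematic offset is shared by all picks.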

Network Architectures
The deep neural networks (DNNs) are trained using five fully connected layers, where each node receives an input from every node in the previous layer and applies a sigmoid activation function. The number of nodes in each layer is shown in Table 1.

Table 1: Network configurations showing the number of nodes in each layer, including the three hidden layers (L1-L3), for the full aperture, pitch-catch and pulse-echo transducer array configurations.
The GAN generator is a modified U-Net based on [22] consisting of an encoder-decoder chain.
Each block in the encoder is a convolution - batch normalisation - leaky rectified linear unit (ReLU) activation sequence. Each block in the decoder is a transposed convolution - batch normalisation - ReLU sequence, with skip connections between mirrored layers in the encoder and decoder stacks [38] (as shown in Fig. 12a). All convolutional layers use a kernel size of 4. The generator loss is the sigmoid cross entropy between the discriminator output for the generated image and an array of ones, combined with the mean absolute error between the generated and known target image.
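A numpy sketch of this combined generator loss is given below. The L1 weighting lam=100 follows the pix2pix formulation [22] and is an assumption here, as the text does not state the weight; the random logits and images are placeholders.

```python
import numpy as np

def sigmoid_xent_with_ones(logits):
    """Sigmoid cross entropy against a target of ones, in a stable form."""
    # -log(sigmoid(x)) = log(1 + exp(-x))
    return np.mean(np.log1p(np.exp(-logits)))

def generator_loss(disc_logits, generated, target, lam=100.0):
    adv = sigmoid_xent_with_ones(disc_logits)   # reward fooling the discriminator
    l1 = np.mean(np.abs(target - generated))    # stay close to the target image
    return adv + lam * l1

rng = np.random.default_rng(0)
logits = rng.standard_normal((30, 30))          # PatchGAN patch logits
gen = rng.random((64, 64))                      # stand-in generated image
tgt = rng.random((64, 64))                      # stand-in target image
loss = generator_loss(logits, gen, tgt)
```

The adversarial term pushes the generator to produce images the discriminator labels as real, while the L1 term keeps the output pixel-wise close to the known high resolution model.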
The GAN discriminator (Fig. 12b) follows a PatchGAN architecture [22], which divides the image into smaller 30x30 patches; the discriminator then tries to classify each patch separately.
This motivates the GAN to discriminate high frequency structure. The discriminator receives both the generated image m G 64 and the true high resolution image m true 64 as inputs.

Structural Similarity Index Measure (SSIM)
We use the SSIM described by [46] for image comparison. The SSIM is defined as a weighted combination of comparisons between image luminance l(X, Y), contrast c(X, Y) and structure s(X, Y), where X and Y describe an image window of size N × N in the known and estimated images.
The SSIM is therefore

SSIM(X, Y) = [l(X, Y)]^α [c(X, Y)]^β [s(X, Y)]^γ,

where α, β and γ are the weighting parameters. We use α = β = γ = 1. Luminance, contrast and structure are calculated as

l(X, Y) = (2 µ_X µ_Y + C_1) / (µ_X² + µ_Y² + C_1),
c(X, Y) = (2 σ_X σ_Y + C_2) / (σ_X² + σ_Y² + C_2),
s(X, Y) = (σ_XY + C_3) / (σ_X σ_Y + C_3),

where µ and σ are the mean and standard deviation of the windows X or Y, σ_XY is the covariance of X and Y, and C_1, C_2 and C_3 are small constants which stabilise the division. This is computed over a sliding Gaussian window of size 9 × 9.
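These terms translate directly into code for a single pair of windows (with α = β = γ = 1). The stabilising constants C1 = (0.01 L)², C2 = (0.03 L)² and C3 = C2/2 for dynamic range L follow the common choice in [46] and are assumptions here.

```python
import numpy as np

def ssim_window(X, Y, L=45.0):
    """SSIM of one window pair: luminance * contrast * structure."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    C3 = C2 / 2.0
    mx, my = X.mean(), Y.mean()
    sx, sy = X.std(), Y.std()
    sxy = ((X - mx) * (Y - my)).mean()
    l = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)   # luminance
    c = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)   # contrast
    s = (sxy + C3) / (sx * sy + C3)                     # structure
    return l * c * s

X = np.random.default_rng(0).uniform(0, 45, (9, 9))
print(ssim_window(X, X))  # identical windows give SSIM = 1
```

The full image-level SSIM is obtained by sliding a 9 × 9 Gaussian-weighted window over both images and averaging the per-window values.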

Acknowledgments
This work was funded by the Engineering and Physical Sciences Research Council (UK): grant number EP/P005268/1.

Data Availability
The data and Python scripts required to reproduce these findings are available at: https://github.com/jonnyrsingh/DeepLearningAnisoTomo, which can be executed within Google Colaboratory. This requires no additional software or downloads for the user.