Learning-Based and Quality Preserving Super-Resolution of Noisy Images

Several applications require the super-resolution of noisy images together with the preservation of geometrical and texture features. State-of-the-art super-resolution methods do not account for noise and generally enhance artefacts in the output image (e.g., aliasing, blurring). We propose a learning-based method that accounts for the presence of noise and preserves the properties of the input image, as measured by quantitative metrics (e.g., normalised cross-correlation, normalised mean squared error, peak signal-to-noise ratio, structural similarity, feature-based similarity, universal image quality). We train our network to up-sample a low-resolution noisy image while preserving its properties. We perform our tests on the Cineca Marconi100 cluster, ranked 26th in the TOP500 list. The experimental results show that our method outperforms learning-based methods, has results comparable with standard methods, preserves properties of the input image such as contours, brightness, and textures, and reduces artefacts. In terms of average quantitative metrics, our method achieves a PSNR of 23.81 on the super-resolution of images with Gaussian noise and a 2X up-sampling factor, whereas previous work achieves 23.09 (standard method) and 21.78 (learning-based method). Our learning-based and quality-preserving super-resolution improves the high-resolution prediction of noisy images with respect to state-of-the-art methods across different noise types and up-sampling factors.


Introduction
The super-resolution of 2D images is widely studied and used in many applications, such as biomedicine [LAPB14], astronomy [PK05], and industrial contexts [QCJY22]. In the literature, several methods guarantee excellent results in terms of reconstruction accuracy [SS20]. However, most current super-resolution methods do not account for noise in the image, although the preservation of the features and visual quality of the input data is affected by the underlying noise distribution (Sect. 2). Instead, these super-resolution methods generally smooth the noise with a blurring effect [VCSR21], [MNV+22] or apply a denoising filter before the super-resolution [HLH21].
On the one hand, denoising may be required in image processing; on the other hand, noise reduction alters the visual quality of the super-resolution image. For example, ultrasound images are affected by speckle noise, which is generated by the superimposition of ultrasound waves. The super-resolution of ultrasound images improves the visibility of anatomical structures, keeps the quality of the input data unchanged, makes the high-resolution image visually close to the corresponding low-resolution one, and avoids blurring artefacts, allowing the physician to better visualise anatomical features in the image [CNP23]. Furthermore, ultrasound images can then be processed with specialised denoising filters, applied after the super-resolution, to remove the noise component while preserving the anatomical features [CNP22]. It is therefore relevant to generate high-resolution images that keep the visual quality in terms of noise properties, main geometries, and grey-scale values, and to apply a dedicated denoising filter afterwards if needed.
In this context, we aim to generate a high-resolution image from a low-resolution noisy image, preserving the visual quality and the quantitative similarity of the input data without altering the noise distribution. To this end (Fig. 1), we define a novel super-resolution method for noisy images based on a learning model that accounts for the presence of noise and preserves the similarity of the high-resolution image in terms of visual quality and quantitative metrics (e.g., normalised cross-correlation, normalised mean squared error, peak signal-to-noise ratio, structural similarity, feature-based similarity, universal image quality). In contrast, state-of-the-art super-resolution methods do not exploit the noise distribution to preserve the similarity with the high-resolution image. To train our model, we define a data set of ground-truth, noisy, and noisy down-sampled synthetic 2D images, and we train our network to up-sample the low-resolution noisy image so that it matches the high-resolution noisy image. At the same time, the prediction has to preserve the noise properties with respect to the ground-truth image. We specialise our networks to different up-sampling factors (i.e., 2X and 4X) and noise types (i.e., speckle and Gaussian) (Sect. 3). Our approach is general and can be specialised to additional noise types (e.g., Poisson) and up-sampling factors (e.g., 8X).
As the main result, our method improves the super-resolution of noisy images with respect to state-of-the-art standard and learning-based methods. We evaluate several quantitative measures of local, structural, and quality similarity. For example, we achieve an average PSNR of 20.64 with a 4X up-sampling factor and Gaussian noise, while previous works achieve an average of 20.11; also, we obtain an average mean squared error of 247 with a 2X up-sampling factor and speckle noise, while state-of-the-art methods exceed 400 on average. We evaluate the preservation of image properties, such as brightness, contours, and textures. On ultrasound images affected by speckle noise, our super-resolution has results comparable with the standard method and improves on learning-based methods. We also analyse the distribution of the generated noise with respect to the ground-truth image; our method generally better preserves the noise distribution. Finally, we discuss conclusions and future work (Sect. 4). Trained networks for 2X and 4X up-sampling factors, with Gaussian and speckle noise, are available at https://github.com/cammarasana123/noise-SuperResolution.

Related work
We discuss deep learning and standard methods for the super-resolution of 2D images.
Learning-based super-resolution
In recent years, deep learning methods for super-resolution have become widespread. The statistical prediction model (SPM) applies a sparse representation of patch pairs over two dictionaries, one for low-resolution (LR) and one for high-resolution (HR) images [PE14], and captures […]. A super-resolution generative adversarial network (SRGAN) [LTH+17] applies a deep residual network with skip connections and a perceptual loss between generated and target images. The artefacts of SRGAN are reduced by the Enhanced SRGAN (ESRGAN) [WYW+19], which improves the network architecture and the adversarial and perceptual losses, removes the batch normalisation layers, and applies residual scaling and smaller initialisation values. The perceptual quality of ESRGAN is improved by ESRGAN+ [RR20] through a novel Residual-in-Residual Dense Residual block, which increases the network capacity without affecting its complexity. The introduction of weight normalisation and wider features before the rectified linear unit activation function [YFH20] achieves good super-resolution results at a low computational cost. Additional methods, classified according to supervised/unsupervised approaches and domain-specific applications, are discussed in [WCH20].

Standard super-resolution
We refer to non-learning-based methods as standard methods. Interpolating super-resolution with cubic kernels (i.e., cubic convolution [Key81], CC for short) offers high accuracy at a low computational cost through appropriate boundary conditions and constraints on the kernel functions. The interpolated values are computed as the weighted average of the pixels in the 2 × 2 (bilinear, [GJB03]) or 4 × 4 (bi-cubic, [MMP+14]) neighbourhood. A fast implementation of bilinear and bi-cubic interpolation [KAJ+20] targets Field Programmable Gate Arrays (FPGAs), reducing the computational complexity and the FPGA resources while providing an excellent trade-off between image quality and computational simplicity. After training two dictionaries for LR and HR patches [YWHM10], the similarity between the sparse representations of LR and HR patches over their respective dictionaries is exploited to generate the high-resolution image. Anchored neighbourhood regression [TDSVG13] and its improved version [TDSVG14] learn a regression that correlates LR and HR images for each atom of the dictionary and its precomputed neighbourhood. The search for recurring patches within an image [HSA15] is extended by allowing […]

Learning-based and quality-preserving image super-resolution
We describe our learning-based super-resolution method for 2D noisy images (Sect. 3.1), the data set and quantitative metrics for the super-resolution network (Sect. 3.2), and the experimental results (Sect. 3.3).

Learning model
We select WDSR [YFH20], an architecture that exploits residual blocks, since residual learning improves the prediction of images where the difference between the input and the target is small. We propose a customised version of this network, custom-WDSR. In particular, our network architecture is a variant of WDSR-A, where the expansion of the features before the rectified linear unit (ReLU) activation allows more information to pass through while preserving the non-linearity of the network. After normalising the data, we apply a 2D convolution and weight normalisation, which improves the conditioning of the optimisation problem and, thus, the convergence. Then, we apply eight residual blocks with wide activation, each comprising two […]. Finally, we combine residual blocks and convolution layers, apply a deconvolution to interpolate the missing rows and columns, and de-normalise the output to match the target image. The kernel filter size depends on the up-sampling factor: 3 × 3 for 2X up-sampling and 5 × 5 for 4X up-sampling. With this setting, the total number of trained parameters is 889K for the 2X network and 253K for the 4X network (Fig. 2). The network input is a noisy low-resolution image, while the target is the corresponding high-resolution noisy image. The loss function (Eq. (1)) accounts for both the discrepancy between the predicted image and the target image (i.e., the noisy image) and the log-likelihood of the generated noise with respect to the input noise. In particular, we denote the prediction, noisy, and ground-truth images by P, N, and G, respectively; Θ is the known noise distribution (e.g., Gaussian) and m is the number of pixels of the high-resolution image. The first term of the loss (i.e., $\|P - N\|_F$) trains the network to match the target high-resolution noisy image, while the second term trains the network to generate a prediction whose noise with respect to the ground-truth image (i.e., P − G) follows the distribution Θ; the second term is computed with the log-loss function L and weighted by the parameter λ = −10. The parameter λ is negative because we maximise the agreement of the generated noise with Θ while minimising the overall loss. For training, we use the Adam optimiser with a learning rate of 0.001, a maximum of 60 epochs, and an early […]
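The core building block described above, a WDSR-A-style residual block with wide activation and weight normalisation, can be sketched as a NumPy forward pass. This is an illustration under stated assumptions, not the authors' custom-WDSR: the channel counts, the expansion factor, the 3 × 3 kernels, and the helper names (`init_block`, `wide_residual_block`) are hypothetical, and the two convolutions per block are assumed to widen and then narrow the features as in WDSR-A [YFH20]. The head convolution, the deconvolution, and the de-normalisation are omitted.

```python
import numpy as np

def weight_norm(v, g):
    """Weight normalisation w = g * v / ||v||, with one norm per output channel.
    v: (k, k, c_in, c_out), g: (c_out,)."""
    norms = np.sqrt(np.sum(v ** 2, axis=(0, 1, 2)))
    return v * (g / norms)

def conv2d_same(x, w, b):
    """Naive zero-padded 'same' 2D convolution (cross-correlation, as in DL frameworks).
    x: (H, W, c_in), w: (k, k, c_in, c_out) with odd k, b: (c_out,)."""
    k = w.shape[0]
    p = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))  # zero padding keeps the spatial size
    out = np.zeros((H, W, w.shape[3])) + b
    for i in range(k):
        for j in range(k):
            # Contract the input channels of the shifted window with kernel tap (i, j)
            out += np.tensordot(xp[i:i + H, j:j + W, :], w[i, j], axes=([2], [0]))
    return out

def wide_residual_block(x, params):
    """Assumed WDSR-A-style block: widen channels, ReLU, narrow back, add the skip."""
    h = conv2d_same(x, weight_norm(params["v1"], params["g1"]), params["b1"])
    h = np.maximum(h, 0.0)  # ReLU on the widened (expanded) features
    h = conv2d_same(h, weight_norm(params["v2"], params["g2"]), params["b2"])
    return x + h  # residual connection

def init_block(channels, expansion, k=3, seed=0):
    """Random parameters for one block (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    wide = channels * expansion
    return {
        "v1": rng.normal(size=(k, k, channels, wide)), "g1": np.ones(wide), "b1": np.zeros(wide),
        "v2": rng.normal(size=(k, k, wide, channels)), "g2": np.ones(channels), "b2": np.zeros(channels),
    }
```

Stacking the eight blocks mentioned in the text amounts to applying `wide_residual_block` in a loop, each with its own parameters; the block preserves the feature-map shape, which is what makes the skip connection possible.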

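A minimal sketch of the two-term loss, assuming Θ is a zero-mean Gaussian with known standard deviation σ and taking L as the per-pixel Gaussian log-density of the generated noise P − G, averaged over the m pixels. Both the averaging and this choice of L are assumptions, since the exact expression of Eq. (1) is not reproduced here; the function names are hypothetical. In training, P would be a differentiable tensor (e.g., in TensorFlow); NumPy is used only to show the arithmetic.

```python
import numpy as np

def gaussian_log_likelihood(residual, sigma):
    """Per-pixel log-density of N(0, sigma^2) evaluated on the generated noise."""
    return -0.5 * np.log(2.0 * np.pi * sigma ** 2) - residual ** 2 / (2.0 * sigma ** 2)

def noise_aware_loss(P, N, G, sigma, lam=-10.0):
    """Assumed form: ||P - N||_F + (lam / m) * sum_i log p_Theta(P(i) - G(i))."""
    m = P.size
    fidelity = np.linalg.norm(P - N)  # Frobenius norm for 2D arrays
    log_lik = gaussian_log_likelihood(P - G, sigma).sum() / m
    # lam < 0: minimising the loss maximises the likelihood of the generated noise
    return float(fidelity + lam * log_lik)
```

With λ = −10, any increase in the average log-likelihood of P − G under Θ lowers the loss, which is the behaviour described for the second term.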
Data sets and metrics
We build a data set of synthetic noisy images from 500 images taken from the ImageNet data set [RDS+15]. Given the ground-truth image G, we apply artificial noise (e.g., Gaussian, speckle) to generate the noisy image N. Then, we down-sample N to the noisy low-resolution image L with a down-sampling factor k. We use two down-sampling factors: k = 2, where we remove every second row and every second column, and k = 4, where we remove three rows and three columns out of every four. We generate separate data sets for each up-sampling factor and noise type, as specialising the trained networks improves the accuracy of the reconstruction of the target image.
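The generation of the (G, N, L) triplets can be sketched as follows. The noise models and parameters are assumptions for illustration: Gaussian noise is modelled as additive, N = G + n, and speckle noise as multiplicative, N = G + G·n (the convention used, e.g., by scikit-image's `random_noise`), with n zero-mean Gaussian. The σ values, the absence of clipping to the grey-level range, and the function names are hypothetical. Decimation keeps one pixel out of every k along each axis, which matches the row and column removal described above.

```python
import numpy as np

def add_gaussian_noise(G, sigma, rng):
    """Additive model (assumed): N = G + n, n ~ N(0, sigma^2)."""
    return G + rng.normal(0.0, sigma, size=G.shape)

def add_speckle_noise(G, sigma, rng):
    """Multiplicative model (assumed): N = G + G * n, n ~ N(0, sigma^2)."""
    return G + G * rng.normal(0.0, sigma, size=G.shape)

def decimate(N, k):
    """Keep one row and one column every k, i.e. drop k - 1 of every k."""
    return N[::k, ::k]

def make_triplet(G, k, noise="gaussian", sigma=0.1, seed=0):
    """Return (ground truth G, noisy N, noisy low-resolution L)."""
    rng = np.random.default_rng(seed)
    if noise == "gaussian":
        N = add_gaussian_noise(G, sigma, rng)
    elif noise == "speckle":
        N = add_speckle_noise(G, sigma, rng)
    else:
        raise ValueError(f"unsupported noise type: {noise}")
    return G, N, decimate(N, k)
```

For an image of size H × W, the low-resolution input has size ⌈H/k⌉ × ⌈W/k⌉, and the network is trained to map L back to N.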

Quality metrics
The quality preservation of our super-resolution is measured through quantitative metrics. The predicted image P is compared with N to measure the super-resolution accuracy, and with G to measure the properties of the generated noise. Our comparison accounts for the visual similarity of the super-resolution images (predicted and noisy) to quantify the preservation of geometries and features, and evaluates several metrics for local, structural, and texture similarity. Given the noisy and the predicted images on m points, we compute:
• the normalised cross-correlation $\mathrm{NCC}(P, N) = \frac{\sum_{i=1}^{m} (N(i) - \bar{N})(P(i) - \bar{P})}{\left[\sum_{i=1}^{m} (N(i) - \bar{N})^2 \sum_{i=1}^{m} (P(i) - \bar{P})^2\right]^{1/2}}$, where $\bar{N}$ and $\bar{P}$ are the average values of the two images;
• the peak signal-to-noise ratio $\mathrm{PSNR} = 10 \log_{10} \frac{(\max(N))^2}{\mathrm{MSE}(N, P)}$;
• the structural similarity $\mathrm{SSIM}(P, N) = l(P, N) \times c(P, N) \times s(P, N)$, with $l(P, N) = \frac{2\mu(P)\mu(N) + C_1}{\mu(P)^2 + \mu(N)^2 + C_1}$, $c(P, N) = \frac{2\sigma(P)\sigma(N) + C_2}{\sigma(P)^2 + \sigma(N)^2 + C_2}$, and $s(P, N) = \frac{\sigma_{PN} + C_3}{\sigma(P)\sigma(N) + C_3}$, where µ(•) is the mean of (•), σ(•) is the standard deviation of (•), $\sigma_{PN}$ is the covariance between P and N, and the positive constants $C_1$, $C_2$, and $C_3$ avoid a null denominator;
• the feature similarity $\mathrm{FSIM} = \frac{\sum_{x} S_L(x)\, PC_m(x)}{\sum_{x} PC_m(x)}$, where $S_L = S_{PC} \cdot S_G$ combines a similarity score of the phase congruency PC [Kov99] and of the gradient magnitude G, and $PC_m(x) = \max(PC_N(x), PC_P(x))$;
• the universal image quality (UIQ) [WB02] between N and P, $\mathrm{UIQ} = \frac{4\,\sigma_{PN}\,\mu(P)\,\mu(N)}{\left(\sigma(P)^2 + \sigma(N)^2\right)\left(\mu(P)^2 + \mu(N)^2\right)}$.
NCC, SSIM, and FSIM vary from 0 (worst case) to 1 (best case), PSNR varies from 0 (worst case) to +∞ (best case), MSE and NRMSE go from +∞ (worst case) to 0 (best case), and UIQ varies from −1 (worst case) to +1 (best case). Finally, we perform a qualitative assessment of blurring, artefacts, and noise patterns, and we compare the histogram of the generated noise component with respect to the ground-truth image (P − G) with the histogram of the input noise (N − G).
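The pixel-based metrics above can be sketched directly from their definitions. This is an illustrative implementation, not the evaluation code used for the tables: SSIM is computed over a single global window (standard SSIM averages it over local windows), its constants follow the common choice C1 = (0.01 L)², C2 = (0.03 L)², C3 = C2/2 with dynamic range L = 255 (an assumption), and FSIM and NRMSE are omitted because they require phase congruency and a normalisation that is not specified here.

```python
import numpy as np

def mse(N, P):
    """Mean squared error over the m pixels."""
    return float(np.mean((N - P) ** 2))

def psnr(N, P):
    """PSNR = 10 log10(max(N)^2 / MSE(N, P)); infinite for identical images."""
    err = mse(N, P)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(np.max(N) ** 2 / err))

def ncc(N, P):
    """Normalised cross-correlation around the image means."""
    dn, dp = N - N.mean(), P - P.mean()
    return float(np.sum(dn * dp) / np.sqrt(np.sum(dn ** 2) * np.sum(dp ** 2)))

def ssim_global(P, N, data_range=255.0):
    """Single-window SSIM = l * c * s (simplification of the windowed SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    c3 = c2 / 2.0
    mu_p, mu_n = P.mean(), N.mean()
    sd_p, sd_n = P.std(), N.std()
    cov = np.mean((P - mu_p) * (N - mu_n))
    lum = (2 * mu_p * mu_n + c1) / (mu_p ** 2 + mu_n ** 2 + c1)
    con = (2 * sd_p * sd_n + c2) / (sd_p ** 2 + sd_n ** 2 + c2)
    struct = (cov + c3) / (sd_p * sd_n + c3)
    return float(lum * con * struct)

def uiq(N, P):
    """Universal image quality index [WB02], computed globally (undefined for constant images)."""
    mu_n, mu_p = N.mean(), P.mean()
    var_n, var_p = N.var(), P.var()
    cov = np.mean((N - mu_n) * (P - mu_p))
    return float(4 * cov * mu_n * mu_p / ((var_n + var_p) * (mu_n ** 2 + mu_p ** 2)))
```

For instance, a prediction that is the noisy image shifted by one grey level has MSE = 1 and PSNR = 20 log10(max(N)), while NCC still equals 1, which illustrates why several complementary metrics are reported.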
Gaussian noise
Figs. 3 and 4 show the super-resolution results with Gaussian noise at 2X and 4X, respectively. Our super-resolution better preserves the contours of the geometrical elements (e.g., the sphere's edges) and reduces artefacts, in particular the blurring of the sphere (cf. CC in Fig. 3) and the scattering of the clouds (cf. EDSR in Fig. 4). Our super-resolution also preserves texture, brightness, […]. According to Tables 1 and 2, our method outperforms state-of-the-art super-resolution methods in terms of quantitative metrics. In particular, the average MSE of our approach with a 2X up-sampling factor is 304, while all the other methods have an average MSE above 400. Also, the PSNR of our super-resolution with the 4X up-sampling factor is 20.6, while the best state-of-the-art result, obtained by cubic convolution, is 20.1. Our super-resolution outperforms the other learning-based methods (i.e., EDSR and SPM) in terms of SSIM and FSIM, while CC has slightly better results. Finally, Fig. 5 shows the box plots of the PSNR values for the 2X and 4X up-sampling factors. Our method has lower variability than CC and outperforms EDSR and SPM. According to the histograms of the generated noise (Figs. 6 and 7), our super-resolution preserves the mean and standard deviation of the Gaussian noise better than state-of-the-art methods, for both the 2X and 4X up-sampling factors. Fig. 8 shows the noise generated by the super-resolution methods with respect to the ground truth. For all methods, the generated noise does not show good randomness properties; instead, it tends to adapt to the geometries and grey-scale values of the input image.

Speckle noise
Fig. 9 shows the super-resolution results with speckle noise at 2X on a phantom image reproducing an ultrasound scan of cysts of various sizes. All the methods preserve the contours of the cysts when speckle noise is used. Table 3 shows that our approach outperforms state-of-the-art super-resolution methods in terms of quantitative metrics. In particular, the MSE of our method with a 2X up-sampling factor is 247, while all the other methods have an MSE above 400. Our approach has results comparable with the other learning-based techniques (i.e., EDSR and SPM) in terms of SSIM and FSIM, while CC has slightly better results. CC performs worst at preserving the noise distribution (Fig. 10), while our method performs similarly to SPM. Fig. 11 shows the super-resolution results on an ultrasound image of the abdominal district with 2X up-sampling, obtained with our network trained on synthetic speckle-noise images. Previous learning-based works generate artefacts, e.g., blurring in SPM and scattering in EDSR. CC and our method have comparable results, preserving textures and anatomical features without enhancing artefacts.
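The histogram analysis of the generated noise (P − G) against the input noise (N − G) can be summarised numerically. The sketch below computes the mean and standard deviation of each noise component and, as an illustrative extra not used in the paper, the intersection of their normalised histograms on shared bins (1 means identical histograms, 0 means disjoint ones); the bin count and function names are hypothetical.

```python
import numpy as np

def noise_statistics(image, ground_truth):
    """Mean and (population) standard deviation of a noise component, e.g. P - G or N - G."""
    residual = image - ground_truth
    return float(residual.mean()), float(residual.std())

def histogram_overlap(res_a, res_b, bins=64):
    """Histogram intersection in [0, 1] between two residuals, using shared bin edges."""
    lo = min(res_a.min(), res_b.min())
    hi = max(res_a.max(), res_b.max())
    h_a, _ = np.histogram(res_a, bins=bins, range=(lo, hi))
    h_b, _ = np.histogram(res_b, bins=bins, range=(lo, hi))
    p_a = h_a / h_a.sum()  # normalise counts to fractions of the pixels
    p_b = h_b / h_b.sum()
    return float(np.minimum(p_a, p_b).sum())
```

In use, one would compare `noise_statistics(P, G)` with `noise_statistics(N, G)` and report `histogram_overlap(P - G, N - G)`; a method that preserves the noise distribution keeps the two pairs of moments close and the overlap near 1.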
Training and execution time
Fig. 12 reports the training and validation loss of the Gaussian 2X network, showing the convergence of our model. With reference to Eq. (1), the left y-axis shows the first term of the loss (i.e., $\|P - N\|_F$), while the right y-axis shows the second, λ-weighted term.
Our training minimises the first term (i.e., the approximation error with respect to the noisy image) and simultaneously increases the second term, i.e., the probability that the generated noise complies with the input noise distribution. Training takes around 30 seconds per epoch on the Cineca Marconi100 cluster, ranked 26th in the TOP500 list [url]. The cluster has 980 nodes, each equipped with an IBM Power9 AC922 (32 cores at 3.1 GHz), 4 NVIDIA Volta V100 GPUs (16 GB) interconnected through NVLink 2.0, and 256 GB of RAM. The prediction time is below 1 second on a standard workstation with 2 Intel i9-9900KF CPUs (3.60 GHz), 32 GB of RAM, and TensorFlow 2.7.
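As a rough order of magnitude derived from the figures above, and assuming the per-epoch time stays roughly constant, the maximum of 60 epochs at about 30 seconds per epoch bounds a training run at

$$60\ \text{epochs} \times 30\ \text{s/epoch} = 1800\ \text{s} = 30\ \text{min},$$

and early stopping can only shorten it.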

Conclusion and future work
We have presented a novel learning-based method for the super-resolution of 2D noisy images. Our approach reconstructs the high-resolution image while preserving noise-related geometries and features. We have tested our super-resolution with different up-sampling factors (i.e., 2X and 4X) and noise types (i.e., Gaussian and speckle), comparing quantitative results with state-of-the-art methods. Our method outperforms learning-based methods and has results comparable with standard methods. In future work, we plan to extend our […]

Figure 11: Comparison among super-resolution methods on an ultrasound image from the abdominal district, 2X up-sampling factor.

Table 1: Results with a 2X up-sampling factor on Gaussian noise images: metrics computed between the target and each super-resolution method, averaged over the test data set. The best results are in bold.
• the mean squared error $\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m} [N(i) - P(i)]^2$ and the normalised mean squared error NRMSE

Table 2: Results with a 4X up-sampling factor on Gaussian noise images: metrics computed between the target and each super-resolution method, averaged over the test data set. The best results are in bold.

Table 3: Results with a 2X up-sampling factor on speckle noise images: metrics computed between the target and each super-resolution method, averaged over the test data set. The best results are in bold.

[LSK+17] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition Workshops, pages 136–144, 2017.
[LTH+17] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, and Wenzhe Shi. Photo-