High-resolution image reconstruction for portable ultrasound imaging devices
Pursuing better imaging quality and miniaturizing imaging devices are two trends in the current development of ultrasound imaging. While the first leads to more complex and expensive imaging equipment, poor image quality is a common problem of portable ultrasound imaging systems. In this paper, an image reconstruction method was proposed to break through the imaging quality limitation of portable devices by introducing a generative adversarial network (GAN) model into the field of ultrasound image reconstruction. We combined two GAN generator models, the encoder-decoder model and the U-Net model, to build a sparse skip connection U-Net (SSC U-Net) to tackle this problem. To produce more realistic output, stabilize the training procedure, and improve spatial resolution in the reconstructed ultrasound images, a new loss function combining adversarial loss, L1 loss, and differential loss was proposed. Three datasets including 50 pairs of simulation, 40 pairs of phantom, and 72 pairs of in vivo images were used to evaluate the reconstruction performance. Experimental results show that our SSC U-Net is able to reconstruct ultrasound images with improved quality. Compared with U-Net, our SSC U-Net preserves more details in the reconstructed images and improves the full width at half maximum (FWHM) of point targets by 3.23%.
Keywords: Image reconstruction; Generative adversarial networks; Portable ultrasound imaging system

Abbreviations:
- GAN: Generative adversarial network
- SSC U-Net: Sparse skip connection U-Net
- MRI: Magnetic resonance imaging
- PET/SPECT: Positron emission tomography/single-photon emission computed tomography
- PSF: Point spread function
- PSNR: Peak signal-to-noise ratio
- SSIM: Structural similarity index
- FWHM: Full width at half maximum
- GPU: Graphics processing unit
Ultrasound has become an indispensable imaging technology in current clinical medicine. Compared with commonly used imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography/single-photon emission computed tomography (SPECT/PET), ultrasound imaging is real-time, portable, inexpensive, and free of ionizing radiation risks. Over the past decades, ultrasound imaging has gone through rapid development and presented two trends. Since the major limitation of ultrasonography is its relatively low imaging quality, considerable effort has been made to improve the resolution and contrast, or to reduce the artifacts, of ultrasound imaging [2, 3, 4]. Benefiting from major developments in imaging techniques, ultrasound imaging quality has improved considerably. These advanced imaging algorithms and signal processing pipelines, however, naturally lead to more complex and expensive ultrasound equipment. On the other hand, in order to make full use of the portability of ultrasound imaging, the other trend is to miniaturize or simplify the imaging equipment for a wide range of applications such as home examination or health care in extreme environments [5, 6, 7]. Due to the size limit of the equipment, the imaging quality is further degraded in portable ultrasound imaging systems.
Conventional ultrasound imaging systems usually weigh hundreds of kilograms. Most are equipped with wheels, which allows a certain degree of mobility. For example, bedside ultrasound has been used in biopsy guidance, intraoperative navigation, and obstetrical monitoring. The large size and heavy weight of conventional ultrasound equipment, however, prohibit the use of ultrasound imaging in other out-of-hospital arenas. With recent advances in integrated circuits, lightweight pocket-size ultrasound imaging devices have appeared and have been used in emergency aid at accident scenes, disaster relief, military operations, health care in spaceships, family doctor scenarios, etc. [11, 12]. Despite the wide range of possible applications, poor imaging quality has become a major limitation of portable ultrasound imaging systems. Therefore, it is of great interest to improve the imaging quality of portable ultrasound equipment.
There are three aspects of ultrasound imaging quality, namely, spatial resolution, contrast, and noise level. Compared with traditional normal-size imaging devices, portable equipment typically produces images with lower spatial resolution, lower contrast, and greater noise. Poor imaging quality not only hinders doctors from giving confident diagnoses but may also mislead them into wrong decisions or operations in emergency treatment. As a result, poor imaging quality has become the major obstacle to the development and further application of portable ultrasound equipment. This motivates us to propose an ultrasound image reconstruction method to improve the imaging quality of portable equipment in terms of resolution, contrast, and noise reduction.
In the last three decades, many methods have been proposed for quality improvement in ultrasound imaging. Beamforming is a commonly used method to improve the lateral/axial resolution or contrast of the image. By creating spatial selectivity of signals received from or sent to a transducer array, beamforming can produce imaging signals with a narrow main-lobe, suppressed side-lobes, dynamic focus, and reduced speed-of-sound errors. Representative beamforming algorithms include delay-and-sum (DAS), minimum variance (MV) [14, 15], and eigenspace-based MV (ESMV) [16, 17]. Adaptive beamforming such as MV or ESMV can provide resolution and contrast improvements of around 30% or more over traditional DAS beamforming. Besides beamforming methods, some deep learning methods have recently been introduced to reconstruct images from radio frequency (RF) signals. Nair et al. designed a fully convolutional neural network to segment anechoic cysts directly from RF signals without beamforming. Luchies et al. reconstructed ultrasound images from RF signals with a deep neural network and observed better reconstruction quality compared with DAS beamforming. Although advanced beamforming methods and deep learning methods based on RF signals can improve imaging quality, these methods involve complex calculations on RF signals, which are rarely accessible in commercially available ultrasound imaging equipment.
Compared with beamforming methods in the RF signal domain, image reconstruction methods in the image domain are more convenient and versatile. Yang et al. used a variation of the pixel compounding method to reconstruct a high-resolution image from a sequence of ultrasound images acquired with random motion. Taxt and Jirík proposed a noise-robust deconvolution method that deconvolves the first and second harmonic images separately, resulting in higher-resolution images with reduced speckle noise. Chen et al. proposed a compressive deconvolution framework that reconstructs enhanced RF images via the alternating direction method of multipliers (ADMM). The main assumption used there is that the point spread function (PSF) of the ultrasound imaging is spatially invariant and can be estimated from the RF signal. Thus, to reconstruct a high-quality image, image reconstruction–based methods usually need more information than a single low-quality image: a few measurements with random motion, fundamental and harmonic images, and even parameter estimation from the RF signal are required in [21, 22, 23], respectively.
Speckle noise reduction is also an important aspect of image quality improvement, since speckle considerably lowers image contrast and blurs image details. Many speckle reduction techniques have been proposed, such as frequency/spatial compounding [24, 25], spatial filtering [26, 27], and multiscale analysis [28, 29]. Although some methods reduce speckle noise effectively and are helpful for image analysis tasks such as segmentation, registration, and object detection, these methods must always strike a balance between noise reduction and detail preservation. Furthermore, speckle noise reduction contributes nothing to resolution, which is the most important index of imaging quality.
The abovementioned methods, such as beamforming, image reconstruction, and noise reduction, tend to focus on one or two aspects of image quality. In this paper, we try to improve image quality in all aspects by using a deep learning method to generate high-quality images. Previous works have shown that deep learning methods can be applied to medical image generation [30, 31, 32] and usually outperform traditional methods. Specifically for ultrasound image reconstruction, compared with normal-size ultrasound imaging equipment, the imaging quality of portable equipment is jointly degraded by lower resolution, lower contrast, and heavy noise. Reconstructing a high-resolution ultrasound image from a low-resolution one is similar to an image-to-image translation task. We followed previous works that dealt with similar problems using generative adversarial networks (GANs). An image reconstruction method based on GANs was proposed to break through the imaging quality limitation of portable ultrasound devices. For the task at hand, the GAN-based method has the following advantages: (1) A multi-level nonlinear mapping between the low-quality and high-quality images can be extracted by the learning model, which therefore has the potential to improve imaging quality in multiple aspects. (2) The feature extractors are learned automatically from actual ultrasound images rather than designed by hand, and are therefore more representative and adaptive to the data. (3) The discriminator used in a GAN helps improve imaging quality visually. (4) Once the model is trained, the reconstruction procedure is a one-step, feedforward process, which is more direct and efficient than methods involving iterated calculations, and also more suitable for real-time ultrasound image processing. (5) Fast-developing technology for hardware implementation of neural networks allows our method to be implemented on small, portable hardware like FPGAs, and thus easily incorporated into current portable ultrasound equipment.
The rest of the paper is organized as follows: Methods describes the proposed method and the experimental data, Experiments shows the experimental results, and Results and discussion concludes our work.
2.1 Network architecture
2.1.1 Generator with sparse skip connections
However, a bottleneck exists in the structure of the encoder-decoder model, which limits the sharing of low-level information between input and output. The U-Net model was proposed to allow more low-level information to pass from the input to the output. To do this, the U-Net model adds skip connections between mirrored encoders and decoders compared with an encoder-decoder model. The U-Net model we used is given in Fig. 2(b). The U-Net model has been successfully applied to the super-resolution reconstruction of many medical images, such as MRI [30, 32], plane wave ultrasound images, and CT. However, since low-quality ultrasound images contain many speckles and artifacts, applying a U-Net model to the ultrasound image super-resolution reconstruction task raises a new issue: sharing all low-level information between the input and the output brings the speckles and artifacts of the low-resolution images into the reconstructed high-resolution images. This is because the top skip connections form a shallow feedforward path that extracts few features from the input images.
In order to preserve the structural information in the low-resolution image while not carrying its speckles and imaging artifacts into the high-resolution one, we designed a new generator that concatenates only the output of the third encoder to the input of the third decoder. This design keeps the benefits of the U-Net while reducing the amount of low-level information passed directly from input to output. We call our model a sparse skip connection U-Net (SSC U-Net). Our network is shown in Fig. 2(c).
2.1.2 Discriminator and training strategy
The discriminator operates on local patches of the original image during the training process. This strategy is based on the assumption that pixels from different patches are independent, and it encourages the discriminator to model high-frequency details. Local patching also enlarges the dataset and saves memory resources during training. This idea is commonly accepted in tasks like image style transfer. The patch size in our network is 128 × 128.
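As an illustration of this patch-based strategy, the sketch below enumerates the top-left corners of all 128 × 128 windows that fit inside an image; the 10-pixel stride matches the patch generation described later in the experiments, and `extract_patch_coords` is a hypothetical helper name, not code from the paper:

```python
def extract_patch_coords(height, width, patch_size=128, stride=10):
    """Top-left corners of every patch_size x patch_size window,
    stepped every `stride` pixels, that fits inside the image."""
    coords = []
    for y in range(0, height - patch_size + 1, stride):
        for x in range(0, width - patch_size + 1, stride):
            coords.append((y, x))
    return coords

# A 256 x 256 image yields (256-128)//10 + 1 = 13 positions per axis
coords = extract_patch_coords(256, 256)
```

Each coordinate indexes one training patch, so a single image pair contributes many discriminator examples.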
Our training strategy follows the approach in . We train the generator first. The discriminator is then trained with the real images and the images generated by the generator. We use the Adam solver in our method.
where x refers to the input vector, y refers to the output vector, D refers to the discriminator, G refers to the generator, L refers to the loss function, and E refers to expectation. The adversarial loss encourages the generator to generate images as close to the real images as possible.
In our experiments, we choose α = 100 and β = 80 in Eq. (4).
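Since Eq. (4) is not reproduced here, the sketch below is only an assumed reading of the combined objective: adversarial loss plus α times the L1 loss plus β times a differential loss, shown on 1-D lists for brevity. The gradient-difference form of the differential loss is our assumption, not the paper's stated definition:

```python
def l1_loss(gen, ref):
    """Mean absolute difference between generated and reference pixels."""
    return sum(abs(g - r) for g, r in zip(gen, ref)) / len(gen)

def differential_loss(gen, ref):
    """Assumed form: L1 distance between first differences (local gradients)."""
    dg = [gen[i + 1] - gen[i] for i in range(len(gen) - 1)]
    dr = [ref[i + 1] - ref[i] for i in range(len(ref) - 1)]
    return sum(abs(a - b) for a, b in zip(dg, dr)) / len(dg)

def generator_loss(adv_loss, gen, ref, alpha=100.0, beta=80.0):
    """Combined objective: adversarial + alpha * L1 + beta * differential."""
    return adv_loss + alpha * l1_loss(gen, ref) + beta * differential_loss(gen, ref)
```

With α = 100 and β = 80 as chosen above, the L1 and differential terms dominate the adversarial term, which in practice keeps the generator close to the ground truth while the discriminator refines high-frequency detail.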
3.1 Training datasets
Three datasets including 50 pairs of simulation, 40 pairs of phantom, and 72 pairs of in vivo images were used to train and test the GAN model. High-quality and low-quality images of the phantom and in vivo data are generated by different devices and hence need to be registered. We align the images using a non-rigid image registration algorithm, with mutual information as the similarity metric.
A total of 50 pairs of simulation data are generated with the Field II ultrasound simulation program [44, 45]. All simulated data were generated using plane wave transmission. Two simulation models, cysts and fetus, are used. The images of cyst phantoms are simulated with the following settings: number of transducer elements = 64, number of scanning lines = 50, number of scatterers = 100,000, and central frequency = 3.5 MHz, 5 MHz, and 8 MHz. We simulated 20 images for each central frequency. The images at 3.5 and 5 MHz central frequency are used as low-quality images, and the images at 8 MHz central frequency are averaged to obtain a high-quality image. For the fetus phantom, we simulate 10 low-quality images with central frequency = 5 MHz, number of transducer elements = 64, number of scanning lines = 128, and number of scatterers = 200,000, and 10 high-quality images with central frequency = 8 MHz, number of transducer elements = 128, number of scanning lines = 128, and number of scatterers = 500,000. According to [44, 45], these settings are able to simulate fully developed speckle. In all, we obtain 50 pairs of simulated images: 40 pairs of cysts and 10 pairs of fetus.
A total of 40 pairs of phantom data are acquired with a Vantage 64 research ultrasound system (Verasonics, Inc., USA) and an mSonics MU1 (Youtu Technology, China). The phantoms include two CIRS phantoms (Computerized Imaging Reference Systems, Inc., USA), namely an ultrasound resolution phantom (model 044) and a multi-purpose multi-tissue ultrasound phantom (model 040GSE), and two self-made pork phantoms. The Verasonics Vantage 64 programmable ultrasound system is used for the high-quality images and the mSonics MU1 handheld ultrasound scanner for the low-quality ones. The settings of the Verasonics ultrasound system are as follows: central frequency = 7 MHz, dynamic range = 50 dB, multi-angle plane wave compounding (20 angles from −16° to 16°), and a 40-mm wide L11-4v transducer with 128 elements. The settings of the mSonics MU1 handheld ultrasound scanner are as follows: central frequency = 7 MHz, gain = 70 dB, using a 40-mm wide L10-5 transducer with 128 elements. In all, we acquire 40 pairs of phantom images: 25 for the CIRS phantoms and 15 for the pork phantoms.
Seventy-two pairs of in vivo data are acquired with a Toshiba Aplio 500 (Toshiba Medical Systems Corporation, Japan) and an mSonics MU1 (Youtu Technology, China). Ultrasound images of the thyroid from 50 subjects and of the carotid from 22 subjects are scanned. The clinical Toshiba Aplio 500 machine, with central frequency = 7.5 MHz and gain = 76 dB, is used to generate the high-quality images. The portable mSonics MU1 machine is set as follows: central frequency = 6 MHz, gain = 95 dB, using a 40-mm wide L10-5 transducer with 128 elements. The focal depth of the ultrasound image acquisition is around 1 to 2 cm for the in vivo data.
3.2 Performance metrics
To evaluate the image reconstruction performance, we calculate peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and mutual information (MI) for all three datasets. Full width at half maximum (FWHM) of point targets and contrast resolution (CR) are calculated for simulated point object images and cyst images respectively.
where MSE is the mean squared error between the reconstructed and ground-truth high-quality images, and for uint8 images, n = 8. A higher PSNR indicates a higher intensity similarity between the reconstructed image and the high-quality image.
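Assuming the standard definition PSNR = 10·log10((2^n − 1)² / MSE), consistent with the description above, a minimal sketch on flattened uint8 images:

```python
import math

def psnr(recon, truth, n_bits=8):
    """Peak signal-to-noise ratio between two flattened images.

    Peak value is 2**n_bits - 1 (255 for uint8 images, n = 8)."""
    mse = sum((r - t) ** 2 for r, t in zip(recon, truth)) / len(recon)
    if mse == 0:
        return float("inf")  # identical images
    peak = (2 ** n_bits - 1) ** 2
    return 10 * math.log10(peak / mse)
```

For example, two images differing by 255 everywhere give 0 dB, and a uniform error of 1 gray level gives about 48.13 dB.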
where a and b are windows on the reconstructed and ground-truth high-quality images respectively, μa and μb are the average values of a and b, σa2 and σb2 are the variances of a and b, and σab is the covariance between a and b. The constants c1 and c2 prevent the denominator from being zero: c1 = (k1L)2 and c2 = (k2L)2, where k1 = 0.01 and k2 = 0.03. The windows are isotropic Gaussian functions with a standard deviation of 1.5. A higher MSSIM represents a higher structural similarity between the reconstructed image and the high-quality image.
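A sketch of the per-window SSIM term described above; for simplicity it weights all pixels in the window equally, whereas the paper uses Gaussian-weighted (σ = 1.5) windows and averages the per-window values into MSSIM:

```python
def ssim_window(a, b, L=255, k1=0.01, k2=0.03):
    """SSIM between two equal-size flattened windows (uniform weighting)."""
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((x - mu_b) ** 2 for x in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1 = (k1 * L) ** 2  # stabilizes the luminance term
    c2 = (k2 * L) ** 2  # stabilizes the contrast term
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```

Identical windows score 1, and the score drops as means, variances, or structure diverge.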
where PI and PJ represent the marginal probability distributions of I and J and PIJ is the joint probability distribution of I and J.
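The MI definition above can be computed from empirical histograms; a minimal sketch on flattened grayscale images:

```python
import math
from collections import Counter

def mutual_information(img_i, img_j):
    """Mutual information (in bits) from empirical joint and marginal
    distributions of corresponding pixel pairs."""
    n = len(img_i)
    p_ij = Counter(zip(img_i, img_j))  # joint histogram of (I, J) pairs
    p_i = Counter(img_i)               # marginal histogram of I
    p_j = Counter(img_j)               # marginal histogram of J
    mi = 0.0
    for (a, b), count in p_ij.items():
        pij = count / n
        mi += pij * math.log2(pij / ((p_i[a] / n) * (p_j[b] / n)))
    return mi
```

An image paired with itself yields its own entropy, while a constant image shares zero information with anything.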
where SA and SB represent the average intensity values in two regions of interest (ROIs) of the same size. A higher CR represents higher imaging contrast.
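The precise CR equation is not reproduced above; the sketch below assumes the common logarithmic definition CR = 20·log10(SA/SB) in dB, which should be checked against the paper's actual formula:

```python
import math

def contrast_resolution(roi_a, roi_b):
    """Contrast in dB between two same-size ROIs (assumed log definition)."""
    s_a = sum(roi_a) / len(roi_a)  # mean intensity of ROI A
    s_b = sum(roi_b) / len(roi_b)  # mean intensity of ROI B
    return 20 * math.log10(s_a / s_b)
```

A tenfold intensity ratio between the two ROIs corresponds to 20 dB under this definition.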
Since FWHM and CR are only applicable to simulation data, PSNR, SSIM, and MI are used to evaluate the performance of the algorithms on the phantom and in vivo data. The following tables in this part record the results of 5-fold cross-validation experiments. Each dataset is randomly divided into five groups; each time, one group is used as test data while the other groups are used as training data.
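The 5-fold splitting procedure described above can be sketched as follows (`five_fold_splits` is a hypothetical helper name, not code from the paper):

```python
import random

def five_fold_splits(n_items, n_folds=5, seed=0):
    """Randomly partition item indices into folds and yield
    (train, test) index lists, one pair per fold."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for f in folds if f is not test for i in f]
        yield train, test

splits = list(five_fold_splits(50))  # e.g. the 50 simulation pairs
```

Each of the five rounds trains on four groups and tests on the held-out fifth, so every image pair is tested exactly once.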
We implement our reconstruction model in TensorFlow and use a Titan Xp graphics processing unit (GPU) for training. We generate a 128 × 128 patch every ten pixels from each original image. The total number of data patches is 68,450 pairs of simulation patches, 38,480 pairs of phantom patches, and 73,736 pairs of in vivo patches. The input images are uint8 images. We rescale their pixel intensity values to [−1, 1] and subtract the mean value of the training dataset from the input data. The mean value of the training dataset is added back to the output images, whose intensity values are then clipped to [−1, 1] and transformed into uint8 grayscale images. The learning rate is set to 0.00005 and follows an exponential decay to 0.000005. β1 for the Adam solver is set to 0.9.
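The intensity pre- and post-processing steps described above (rescale uint8 to [−1, 1], subtract/add the training-set mean, clip, and convert back to uint8) can be sketched on flattened pixel lists as:

```python
def preprocess(pixels, train_mean):
    """uint8 intensities -> [-1, 1], then subtract the training-set mean."""
    return [(p / 127.5 - 1.0) - train_mean for p in pixels]

def postprocess(outputs, train_mean):
    """Add the mean back, clip to [-1, 1], and rescale to uint8."""
    result = []
    for v in outputs:
        v = v + train_mean
        v = max(-1.0, min(1.0, v))          # clip to the valid range
        result.append(int(round((v + 1.0) * 127.5)))
    return result
```

Clipping before the uint8 conversion guarantees that network outputs slightly outside the valid range still map to legal gray levels.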
4 Results and discussion
In this section, we report the results from the three models we tested: the encoder-decoder model, the U-Net model, and our SSC U-Net on simulation, phantom, and in vivo data.
4.1 Simulation data results
Results of the three models on simulation dataset
4.2 Phantom data results
Results of the three models on phantom dataset
4.3 In vivo data results
Results of the three models on in vivo dataset
4.4 Differential loss
Image quality is vital for portable ultrasound imaging devices. In this paper, we proposed a new generative model called SSC U-Net to improve the image quality of portable ultrasound imaging devices. We tested our model on three datasets: simulation data, phantom data, and in vivo data, and compared our results with two other widely used GAN models: the encoder-decoder model and the U-Net model. Our experimental results show that the SSC U-Net model outperformed the two other models in general on all three datasets. Images generated by our SSC U-Net had better resolution and preserved more details than those of the other two methods.
RW and ZF designed the methods, carried out the experiments, and wrote this paper. JG designed the methods and offered help in the experimental design. YG and SZ were responsible for clinical design and data collection. JY and YW proposed the project, directed the methods and experiment design, and revised the paper. CC proposed the clinical requirements and organized clinical data collection. All authors read and approved the final manuscript.
This work is supported by the National Natural Science Foundation of China (61771143), Shanghai Science and Technology Innovation Action Plan (19441903100).
Ethics approval and consent to participate
The ethics approval was waived by the ethics committee of Fudan University Shanghai Cancer Center.
Competing interests
The authors declare that they have no competing interests.
- 7. P. Bornemann, G. Bornemann, Military family physicians' perceptions of a pocket point-of-care ultrasound device in clinical practice. Mil. Med. 179(12), 1474–1477 (2014)
- 14. J. Capon, High resolution frequency-wavenumber spectrum analysis. Proc. IEEE 57(8), 1408–1418 (1969)
- 18. B. Madore, F.C. Meral, Reconstruction algorithm for improved ultrasound image quality. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 59(2), 217–230 (2012)
- 19. A.A. Nair, T.D. Tran, A. Reiter, M.A.L. Bell, A deep learning based alternative to beamforming ultrasound images, in IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE, Calgary, 2018), pp. 3359–3363
- 33. O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI, Cham, 2015), pp. 234–241
- 38. L.A. Gatys, A.S. Ecker, M. Bethge, Image style transfer using convolutional neural networks, in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, Las Vegas, 2016), pp. 2414–2423
- 39. D. Pathak, P. Krähenbühl, J. Donahue, T. Darrell, A.A. Efros, Context encoders: feature learning by inpainting, in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, Las Vegas, 2016), pp. 2536–2544
- 40. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in Conference and Workshop on Neural Information Processing Systems (NIPS, Montreal, 2014), pp. 2672–2680
- 41. P. Isola, J.Y. Zhu, T. Zhou, A.A. Efros, Image-to-image translation with conditional adversarial networks, in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, Honolulu, 2017), pp. 5967–5976
- 42. A.B.L. Larsen, S.K. Sønderby, H. Larochelle, O. Winther, Autoencoding beyond pixels using a learned similarity metric, in International Conference on Machine Learning (ICML, New York, 2016), pp. 1558–1566
- 43. T.S. Yoo, Insight Into Images: Principles and Practice for Segmentation, Registration and Image Analysis, 1st edn. (A.K. Peters Ltd., Natick, 2004)
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.