Keywords

1 Introduction

Increasing the frame rate is a major challenge in 2D and 3D echocardiography. Investigating deformations at different stages of the cardiac cycle is crucial for cardiovascular imaging; hence high temporal resolution is highly desired in addition to the spatial resolution. There are several ways to increase the frame rate of ultrasound imaging; one of the most commonly used techniques, which is implemented in many ultrasound scanners, is multi-line acquisition (MLA) [1], often referred to as parallel receive beamforming (PRB) [2].

Single- vs. Multi-line Acquisition. In single-line acquisition (SLA), a narrow-focused pulse is transmitted by introducing transmit time delays through a linear phased array of acoustic transducer elements. Upon reception the obtained signal is dynamically focused along the receive (Rx) direction which is identical to the transmit (Tx) direction. The spatial region of interest is raster scanned line-by-line to obtain an ultrasound image.

The need to transmit a large number of pulses sequentially results in a low frame rate and renders SLA inadequate for cardiovascular imaging, where a high frame rate is mandatory, especially for quantitative analysis or during stress tests. For the same reason, SLA is neither useful for scanning large fields of view in real time 3D imaging applications.

In an attempt to overcome the frame rate problem, the MLA method was proposed in [1, 3]. The main idea behind MLA is to transmit a weakly focused beam that provides a sufficiently wide coverage for a high number of received lines. On the receiver side, m lines is constructed from the data acquired from each transmit event, thereby increasing the frame rate by m (the latter number is usually referred to as the MLA factor). Signal formation in the SLA and MLA modalities is demonstrated in Fig. 1 where 5-MLA is depicted. For a 5-MLA, we construct 5 Rx lines per each Tx thus increasing the frame rate by the factor of 5.

MLA Artifacts. As the Tx and Rx are no longer aligned in the MLA mode, the two-way beam profile is shifted towards the original transmit direction, making the lateral sampling irregular [2]. This beam warping effect causes sharp lateral discontinuities that are manifested as block artifacts in the image domain.

The observed block artifacts in the ultrasound images (see, e.g., Fig. 1) tend to be more obvious when the number of transmit events decreases. The MLA artifact can be measured by assessing the correlation coefficient between each two adjacent Rx lines in the in-phase and quadrature (I/Q) demodulated beamformed data [4]. In SLA or compensated MLA, the averaged correlation values inside MLA groups and between MLA groups are almost the same. In the uncompensated cases, however, the correlation values are different.

Apart from beam warping, there are two other effects caused by the transmit-receive misalignment: skewing, where shape of the two-way beam profile becomes asymmetric, and gain variation, where the outermost lines inside the group have a lower gain than the innermost lines [4].

Related Work. Several methods have been proposed in literature to decrease MLA artifacts, including transmit sinc apodization [5] and dynamic steering [6], incoherent interpolation [7, 8] (applied after envelope detection), and its coherent (before envelope detection) counterparts [2, 9]. One of the more prominent methods, synthetic transmit beamforming (STB) [2], creates synthetic Tx lines by coherently interpolating information received from each two adjacent Tx events in intermediate directions. This technique creates highly correlated lines, attenuating block artifacts. A common practice for MLA imaging with focused beams is to create 2-4 Rx lines per each Tx event in cases without overlap, or 4-8 lines in the presence of overlaps from adjacent transmissions, in order to perform the correction [2, 4, 10]. Thus, creating eight lines with overlaps provides an effective frame rate increase by the factor of 4. In this paper, however, we used odd MLA factors \(m=5,7\) for the purpose of acquiring data from aligned directions for both SLA and MLA.

Recently, data-driven learning techniques based on convolutional neural networks (CNNs) have been extensively used for solving inverse problems in imaging and in medical imaging in particular, for example, in X-ray CT reconstruction and denoising [11]. Inspired by their success, we propose a data-driven approach to overcome MLA artifacts.

Contributions. We propose an end-to-end CNN-based approach for MLA artifact correction. Our fully convolutional network consists of interpolation layers followed by a trainable apodization layer, and is trained on in-vivo cardiac data to approximate an SLA quality image. We demonstrate the effectiveness of this network both visually and quantitatively using the decorrelation measure (\(D_c\)) and SSIM [12] quality criteria. To the best of our knowledge, this is the first study to report good artifact corrections in the case of 5-7-MLA. We show that the trained network generalizes well across patients, as well as to phantom data.

Fig. 1.
figure 1

Single (left) and multi-line (right, with MLA factor \(m=5\)) acquisition procedures and their corresponding ultrasound scans. Block artifacts can be seen along the axial direction in MLA. Zooming in is recommended.

2 Methods

2.1 Improving MLA with CNNs

Aiming at providing a general and optimal solution for MLA interpolation achieving SLA quality, we propose to replace MLA artifact correction and apodization phases in the traditional MLA pipeline as shown in Fig. 2 with an end-to-end CNN depicted in Fig. 3. We draw similarities to [10] who showed that combining MLA interpolation with an optimal apodization method produces superior results compared to the traditional approaches. Our network comprises both the interpolation and the apodization stages that are trained jointly.

Interpolation Stage. The interpolation stage consists of our CNN containing 10 convolutional layers with symmetric skip connections [13, 14] from each layer in the downsampling track to its corresponding layer in the upsampling track as visualized in Fig. 3. Downsampling is performed using average pooling and strided convolutions are used for upsampling. The number of bifurcations is set to 5 for all the experiments. The interpolation stage takes as an input the time-delayed and phase-rotated element-wise I/Q data from the transducer.

Apodization Stage. Following the interpolation stage, we introduce a convolutional layer to perform apodization. This is performed using point-wise convolutions (\(1\times 1\)) for each element’s channel in the network and the results are then added to the learned weights of the convolution. The weights of the channel are initialized with a Hann window.

Fig. 2.
figure 2

Traditional MLA ultrasound imaging pipeline.

Fig. 3.
figure 3

Proposed CNN-based MLA artifact correction pipeline.

Optimization. We use the \(L_1\) norm training loss to measure the discrepancy between the image predicted by the network and the ground truth SLA images. The loss is minimized using the Adam optimizer [15] with a learning rate of \(10^{-4}\). We observed that adding the apodization stage accelerates the training process, and makes the network converge faster.

2.2 Data Acquisition and Training

We generated a dataset for training the network using cardiac data from six patients; each patient contributed 4-5 cine loops, containing 32 frames. The data was acquired using a GE experimental breadboard ultrasound system. The same transducer was used for both phantom and cardiac acquisition. Excitation sinusoidal pulses of 1.75 cycles, centered around 2.5 MHz, were transmitted using 28 central elements out of the total 64 element in the probe with a pitch of 0.3 mm, elevation size of 13 mm and elevation focus of 100 mm. The depth focus was set at 71 mm. In order to assess the desired aperture for MLA setup, Field II simulator [16] was used as in [10] using the transducer impulse response and tri-state transmission excitation sequence, requiring a minimal insonification of \(-3\) dB for all MLAs from a single Tx.

On the Rx side, the I/Q demodulated signals were dynamically focused using linear interpolation, with an f-number of 1. The FOV was covered with 140/140 Tx/ Rx lines in SLA mode, 28/140 Tx/Rx lines in the 5-MLA mode, and 20/140 Tx/Rx lines in the 7-MLA mode. For both phantom and cardiac cases, the data were acquired in the SLA mode; 5-MLA and 7-MLA data was obtained by appropriately decimating the Rx pre-beamformed data.

In total, we used 745 frames from five patients for training and validation, while keeping the cine loops from the sixth patient for testing. The data set comprised pairs of beamformed I/Q images with Hann window apodization, and the corresponding 5- and 7-MLA pre-apodization samples with the dimensions of \(652 \times 64 \times 140\) (depth \(\times \) elements \(\times \) Rx lines). The MLA data was acquired by decimation of the Tx lines of the SLA samples by the MLA factor (\(m=5,7\)).

We trained dedicated CNNs for the reconstruction of SLA images from 5- and 7-MLA. Each CNN was trained to a maximum of 200 epochs on mini batches of size 4.

3 Experimental Evaluation

3.1 Settings

In order to assess the performance of our trained networks, we used cine loops from one patient excluded from the training/validation set. From two cine loops, each containing 32 frames, we generated pairs of 5- and 7-MLA samples and their corresponding SLA images the same way as described in Sect. 2.2, resulting in 64 test samples. For quantitative evaluation of the performance of our method we measured the decorrelation (\(D_c\)) criterion that evaluates the artifact strength [4], and the SSIM [12] structural similarity criterion with respect to the SLA image. In addition, we tested the performance of our networks on four frames acquired from the GAMMEX Ultrasound 403GS LE Grey Scale Precision Phantom.

3.2 Results

Quantitative results for the cardiac test set are summarized in Table 1. We show a major improvement in decorrelation and SSIM for both 5- and 7-MLA. The corrected 7-MLA performance approaches that of 5-MLA, suggesting the feasibility of larger MLA factors. Figure 4 shows representative images from each imaging modality. We show that the correlation coefficients profile of the corrected 5- and 7-MLA approaches that of SLA.

Table 1. Image reconstruction results on cardiac data: comparison of average decorrelation and SSIM measures between the original and corrected 5- and 7-MLA cardiac images. Decorrelation of SLA is reported in the first column; left and right values in the entry indicate the values calculated for 5- and 7-MLA, respectively.
Table 2. Image reconstruction results on phantom data: Comparison of average decorrelation and SSIM measures between original and corrected 5- and 7-MLA phantom images. Decorrelation of SLA is reported in the first column; left and right values in the entry indicate values calculated for 5- and 7-MLA, respectively.
Fig. 4.
figure 4

CNN-based MLA artifact correction tested on cardiac data. A test frame from cardiac sequence demonstrating the performance of the proposed artifact correction algorithm. Each image is depicted along with the plot of the correlation coefficients between adjacent lines.

Similarly, quantitative results for the phantom test set are summarized in Table 2, again showing a significant improvement in the image quality for both 5- and 7-MLA. Visual results with the corresponding correlation coefficients profiles are depicted in Fig. 1 in the Supplementary Material. These results suggest that the networks trained on real cardiac data generalize well to the phantom data without any further training or fine-tuning. For comparison, [4] reported a decorrelation value of \(-1.5\) for a phantom image acquired in a \(4-\)MLA mode with STB compensation, while we report closer to zero \(D_c\) values, 0.457 for 5-MLA and 0.956 for 7-MLA, which both use a greater decimation rate. The slight dissimilarities in the recovered data can be explained by the acquisition method being used: since the scanned object was undergoing a motion, there is a difference between all but a central line in each MLA group and the matching lines in SLA. We assume that training the network on images of static organs may further improve its performance. Independently, small areas with vertical stripes were observed in several images. In our opinion, the origin of the stripes is a coherent summation of the beamformed lines across the moving object. Since the frame rate of the employed acquisition sequence was slower than of genuine MLA acquisition, the magnitude of this artifact is probably exaggerated.

4 Conclusion

In this paper, we have shown that conventional ultrasound MLA correction can be substituted with an end-to-end CNN performing both optimal interpolation and apodization in order to approximate SLA image quality. In the future, we aim at extending this approach to even earlier stages in multi-line acquisition such as beamforming, assuming it will provide a greater improvement in image quality. Moreover, a similar method could probably be applied for other fast US acquisition modalities, such as multi-line transmission (MLT) [17].