Introduction

Magnetic resonance imaging (MRI) of the knee is among the most commonly performed MRI examinations and requires about 15 min of acquisition time. Reference standards for knee MRI are proton density (PD)– and T1–weighted turbo spin-echo (TSE) sequences due to the excellent tissue contrast and high in-plane spatial resolution with good assessment of meniscal, ligamentous, and cartilaginous injuries [1,2,3].

Due to their anisotropic voxel size, two-dimensional (2D)–TSE sequences require the acquisition of different image planes separately, which is time consuming [4, 5]. One approach to accelerate MRI of the knee is three-dimensional (3D)–TSE techniques, which generate isotropic data sets of higher spatial resolution so that virtually any image plane can be created from a single parent data set [4, 6]. Although technical developments, mainly based on parallel imaging (PI) acceleration, can provide accelerated imaging, the acquisition of a high-quality isotropic data set with 3D–TSE still requires around 5 to 10 min [4, 7]. In addition, with increasing PI acceleration, the signal-to-noise ratio (SNR) decreases rapidly while residual artifacts generally increase, which limits the achievable speed [8].

Another innovative technique commonly used to accelerate MRI is compressed sensing (CS), in which only a reduced set of data points is required. SNR is preserved better than with PI alone, but CS tends to oversimplify image content, resulting in residual blurring and loss of realistic image textures.

The latest promising approaches to overcome this drawback are deep-learning (DL) algorithms. These feature trainable components in contrast to a priori assumptions on sparsity and promise higher acceleration factors while simultaneously increasing SNR and preserving high image quality [9, 10]. With regard to MRI of the knee, a recently published study using retrospectively undersampled data showed that DL images perform interchangeably with standard clinical images for the detection of internal derangement of the knee [8]. Furthermore, retrospectively undersampled, DL–accelerated images were rated with higher image quality than standard imaging and allowed an acceleration of the standard images [8]. There have been other technical developments of the DL reconstruction recently, but so far, there has been no prospective clinical study at both 1.5 and 3 T.

Our hypothesis was that TSE with DL reconstruction (TSEDL) can produce image quality comparable to that of clinically used segmented sequences while significantly reducing the acquisition time. Therefore, we implemented TSEDL at 1.5 and 3 T in a prospective study to assess its diagnostic performance compared to standard TSE sequences (TSES) in routine clinical practice.

Materials and methods

Study design

Institutional review board approval and written informed consent from all participants were obtained for this prospective, single-center study. All study procedures were conducted in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments.

The power calculation for our sample size estimation yielded a sample size of 60 subjects, using a test for agreement between two raters (kappa statistics) with 80% power to detect a true kappa value of 0.80 in a test with two categories with frequencies equal to 0.35 and 0.65, based on a significance level of 0.05 [11]. Participants were recruited consecutively from October 2020 to March 2021. Adult patients who underwent clinically indicated knee MRI were prospectively included. Exclusion criteria were general contraindications for MRI or an incomplete study protocol. A final sample of 60 participants was included (see Fig. 1 and Table 1).

Fig. 1

Flowchart of study participants

Table 1 Participants’ characteristics

MRI system and acquisition parameters

All examinations were performed on clinical 1.5-T and 3-T scanners (MAGNETOM Skyra, MAGNETOM Prismafit, MAGNETOM Vida, MAGNETOM Aera, and MAGNETOM Avanto; all Siemens Healthineers) with participants in supine position using clinical knee surface coils. All participants underwent our clinical standard knee MRI protocol including 2D-PD TSES with fat suppression in three planes (coronal, sagittal, and axial) and 2D-PD TSEDL with fat suppression in three planes (coronal, sagittal, and axial), as well as 2D-T1-weighted TSES and 2D-T1-weighted TSEDL in coronal orientation. Imaging parameters are displayed in Table 2.

Table 2 MRI acquisition parameters

TSE with DL reconstruction

On the acquisition side, a conventional undersampling pattern known from PI is used [10, 12], which provides the same performance when reconstructed with DL–based methods as the incoherent sampling patterns favored by CS. The prototype image reconstruction comprises a fixed iterative reconstruction scheme, or variational network [9, 10, 13]. For the image reconstruction, k-space data, bias-field correction, and coil-sensitivity maps are fed into the variational network. The fixed unrolled algorithm for accelerated MRI reconstruction consists of multiple cascades, each made up of a data-consistency step using a trainable Nesterov momentum, followed by a convolutional neural network (CNN)–based regularization [13].
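The cascade structure described above can be illustrated with a minimal sketch. It is not the prototype reconstruction: it assumes a single coil, a one-dimensional signal, and a binary Cartesian sampling mask; it omits bias-field correction and coil-sensitivity maps; and it replaces the trained CNN regularizer with a hypothetical fixed smoothing filter. All function names and the step, momentum, and weight values are illustrative assumptions.

```python
import cmath

def dft(x, inverse=False):
    """Unitary 1D discrete Fourier transform of a list of complex samples."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = n ** -0.5
    return [
        scale * sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

def data_consistency_gradient(x, y, mask):
    """Gradient of 0.5 * ||M F x - y||^2 with respect to x, i.e. F^H M (M F x - y)."""
    residual = [m * (kx - ky) for m, kx, ky in zip(mask, dft(x), y)]
    return dft(residual, inverse=True)

def placeholder_regularizer(x, weight):
    """Hypothetical stand-in for the trained CNN: mild circular 3-point smoothing."""
    n = len(x)
    return [
        (1 - weight) * x[i] + weight * 0.5 * (x[i - 1] + x[(i + 1) % n])
        for i in range(n)
    ]

def unrolled_reconstruction(y, mask, cascades=5, step=1.0, momentum=0.5, reg_weight=0.1):
    """Run a fixed number of cascades: momentum look-ahead, data consistency, regularizer."""
    x = dft([m * v for m, v in zip(mask, y)], inverse=True)  # zero-filled starting image
    x_prev = x
    for _ in range(cascades):
        # Nesterov-style look-ahead point (the momentum is fixed here, not trained)
        z = [a + momentum * (a - b) for a, b in zip(x, x_prev)]
        g = data_consistency_gradient(z, y, mask)
        x_prev = x
        x = placeholder_regularizer([a - step * b for a, b in zip(z, g)], reg_weight)
    return x

def data_residual(x, y, mask):
    """Euclidean norm of the mismatch between the image and the sampled k-space lines."""
    return sum(abs(m * (kx - ky)) ** 2 for m, kx, ky in zip(mask, dft(x), y)) ** 0.5
```

With reg_weight set to 0, every cascade leaves the sampled k-space lines unchanged, which makes the data-consistency step easy to check in isolation.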

The reconstruction was trained on prior volunteer acquisitions using conventional TSE protocols. About 10,000 slices were acquired on volunteers using various clinical 1.5-T and 3-T scanners (MAGNETOM scanners, Siemens Healthineers).

A detailed description of the used reconstruction is given in prior studies [13]. Besides this physics-based k-space to image reconstruction method, no other DL–based image–enhancement techniques such as super-resolution methods [14, 15] were employed in this study.

Image evaluation

Corresponding TSE datasets were separated into TSES and TSEDL, and each dataset was independently evaluated by three radiologists with 3 to 9 years of experience in interpreting musculoskeletal MRI. The readers were blinded to all participant information, reconstruction type, and clinical and radiological reports, as well as to each other’s assessments. Prior to the actual image analysis, each reader received a training session to become familiar with the Likert-scale classification. Image analysis was performed on a PACS workstation (GE Healthcare Centricity™ PACS RA1000). PD– and T1–weighted images were evaluated separately regarding overall image quality, artifacts, banding artifacts, sharpness, noise, diagnostic confidence, and subjective SNR using a 5-point Likert scale (1 = non-diagnostic; 2 = low image quality; 3 = moderate image quality; 4 = good image quality; 5 = excellent image quality). Reading scores of ≥ 3 were considered sufficient. Banding artifacts are characteristic artifacts produced by Cartesian DL reconstruction; they are particularly strong in low-SNR regions of the reconstructed image and appear as a streaking pattern exactly aligned with the phase-encoding direction [16]. Furthermore, TSES and TSEDL were evaluated regarding the image impression using a Likert scale ranging from 1 to 5 (1 = unrealistic; 5 = realistic).

Assessment of pathologies and internal derangement was conducted by the same three radiologists and included the evaluation of the medial and lateral menisci; medial and lateral collateral ligaments; anterior and posterior cruciate ligaments; and cartilage defects of the medial and lateral femur trochlea, the medial tibia plateau, the trochlear groove, and the retropatellar cartilage. Structural abnormalities were graded as 0 = normal, 1 = altered (degenerative, postoperative), and 2 = tear. Cartilage defects were classified using a modified version of the classification system of the International Cartilage Repair Society (ICRS). If more than one cartilage defect was present, only the dominant cartilage lesion was considered. Areas of bone marrow edema (femoral, patellar, tibial), as well as fractures and joint effusion, were evaluated as present or absent. If there were discrepancies between the readers, a consensus reading was performed to define false-positive and false-negative findings. All evaluated items of anatomic structures and pathologies are displayed in Table 3.

Table 3 Evaluated items of anatomic structures and pathologies of the knee

Statistical analysis

Statistical analyses were performed using SPSS version 26 (IBM Corp). Participants’ demographics and clinical characteristics were summarized by using descriptive statistics. Qualitative image analysis results are given as mean and median values with interquartile range (IQR). An exact paired-sample Wilcoxon signed-rank test was used to compare the sequences in terms of the image quality scores from each reader. A post hoc multinomial regression analysis (generalized linear model for ordinal variables) was computed to assess the impact of field strength, reader, and patient demographics. Significance was assumed at a level of p < 0.05.
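For readers unfamiliar with the exact paired test, the following sketch shows one way to compute a two-sided exact Wilcoxon signed-rank p-value for paired Likert scores. It is an illustration under stated assumptions, not the SPSS implementation used in this study: zero differences are dropped, tied absolute differences receive midranks, and the null distribution is enumerated by dynamic programming. The function name is hypothetical.

```python
def exact_wilcoxon_signed_rank(x, y):
    """Return (W+, two-sided exact p-value) for paired samples x and y.

    Assumptions: zero differences are discarded and tied absolute
    differences share their average rank (midranks).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # no informative pairs

    # Midranks of |d|, stored doubled so that all values stay integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # 2 * average of ranks (i+1)..(j+1)
        i = j + 1

    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total2 = sum(ranks2)

    # counts[s] = number of sign assignments whose doubled positive-rank sum is s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]

    # Two-sided: sums at least as far from the null mean as the observed one.
    dev = abs(2 * w_plus2 - total2)
    extreme = sum(c for s, c in enumerate(counts) if abs(2 * s - total2) >= dev)
    return w_plus2 / 2, extreme / 2 ** n
```

Because Likert scores produce many zero differences and ties, the conventions chosen for both can change the p-value; other software may handle them differently.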

Inter-reader agreement of the three readers was assessed by using Fleiss’ κ and intra-reader agreement by using weighted Cohen’s κ, both with 95% confidence intervals and interpreted as follows: 0.20 or less, poor agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; and greater than 0.80, almost perfect agreement.
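As an illustration of the agreement measure and its interpretation bands, the sketch below computes Fleiss’ κ from a table of category counts and maps a κ value to the verbal labels listed above. The function names and the input layout are assumptions made for this example; confidence intervals and the weighted Cohen’s κ are not covered.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters placing subject i in category j.

    Every subject must be rated by the same number of raters (at least two).
    The value is undefined when all ratings fall into a single category.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Overall proportion of ratings falling into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per subject, then averaged over subjects.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_subj) / n_subjects
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_bar - p_exp) / (1 - p_exp)

def interpret_kappa(kappa):
    """Verbal label for a kappa value, using the bands stated in the text."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

For three readers and the three-level grading of structural abnormalities, each row of counts would hold three non-negative integers summing to 3.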

Results

Among 72 eligible participants, a final sample of 60 participants (83%; mean age 44 ± 17 years; range 18–85 years; 29 males, 31 females) was prospectively included in this study. Thirty examinations were performed at 1.5 T and 30 examinations at 3 T, regardless of diagnosis, current treatment, first examination, or follow-up (Table 1 and Fig. 1).

Image quality

Inter-reader agreement was substantial to almost perfect with values between 0.61 and 0.85 (see Table 4). Because of the good inter-reader reliability, in the following, only the results of reader 1 are given. A summary of all qualitative image analyses and Fleiss’ κ are provided in Table 4.

Table 4 Image quality and inter-reader agreement of standard TSE (TSES) and deep-learning-reconstructed TSE (TSEDL)

With regard to the PD sequences, overall image quality was rated highest for TSEDL (median 5, IQR 5–5), significantly higher compared to TSES (median 5, IQR 5–5, p = 0.003). Sharpness, noise, and subjective SNR were also rated to be significantly higher in TSEDL (median 5, IQR 5–5) compared to TSES (median 5, IQR 4–5, p < 0.001). The extent of artifacts was rated to be similar between TSEDL and TSES (median 5, IQR 5–5, p > 0.05), although TSEDL was rated to show significantly more banding artifacts (median 5, IQR 4–5) compared to TSES (median 5, IQR 5–5, p = 0.003). Nonetheless, no difference was found with reference to the diagnostic confidence of both sequences (median 5, IQR 5–5, p > 0.05).

Concerning the T1-weighted sequences, overall image quality was rated to be significantly higher in TSEDL (median 5, IQR 5–5) compared to TSES (median 5, IQR 5–5, p = 0.046). Noise was evaluated significantly superior in TSEDL (median 5, IQR 5–5) compared to TSES (median 5, IQR 5–5, p = 0.002). There was no significant difference regarding artifacts, banding artifacts, sharpness, diagnostic confidence, and subjective SNR between TSEDL and TSES (median 5, IQR 5–5, p > 0.05).

For further illustration, raw data of a patient examined at 1.5 T and of a patient examined at 3 T were exported, and exemplary SNR maps were determined offline using a pseudo-replica method. Furthermore, the raw data of the TSEDL acquisition were reconstructed using both the DL technique and a conventional generalized autocalibrating partially parallel acquisition (GRAPPA) reconstruction to illustrate the differences between the reconstruction techniques. Note that noise is highest in images acquired at 1.5 T and reconstructed with the GRAPPA reconstruction (see Figs. 7, 8 and 9).
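The general idea of a pseudo-replica SNR estimate can be sketched as follows. This is a simplified illustration, not the offline tool used for Figs. 7, 8 and 9: it assumes real-valued data, independent Gaussian noise with a known standard deviation, and a user-supplied reconstruct function. The function name, parameters, and defaults are hypothetical.

```python
import random
import statistics

def pseudo_replica_snr(data, reconstruct, noise_sd, n_replicas=100, seed=0):
    """Per-pixel SNR estimate: reference image divided by replica-to-replica spread.

    data        : list of real-valued raw samples (simplifying assumption)
    reconstruct : callable mapping raw samples to a list of pixel values
    noise_sd    : standard deviation of the synthetic noise added to each replica
    """
    rng = random.Random(seed)
    reference = reconstruct(data)
    replicas = []
    for _ in range(n_replicas):
        # Add fresh synthetic noise to the raw data and reconstruct again.
        noisy = [value + rng.gauss(0.0, noise_sd) for value in data]
        replicas.append(reconstruct(noisy))
    snr_map = []
    for pixel, ref_value in enumerate(reference):
        # Noise level of this pixel = standard deviation across replicas.
        spread = statistics.stdev([replica[pixel] for replica in replicas])
        snr_map.append(abs(ref_value) / spread if spread > 0 else float("inf"))
    return snr_map
```

Because the noise is propagated through the reconstruction itself, the same procedure can in principle be applied to any reconstruction, linear or not, which is why it suits a comparison of GRAPPA and DL reconstructions.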

A post hoc multinomial regression analysis via a generalized linear model for ordinal variables was utilized to investigate whether “field strength” (1.5 T/3 T), patients’ demographics (sex and age), and “reader” (readers 1–3) could predict how noise and banding artifacts were rated for each reconstruction type (TSES/TSEDL).

For noise in TSES, the factor “field strength” was found to contribute to the model (p < 0.001), whereas the factor “reader” was not a significant contributor to the model (p > 0.05). For each 1-point decrease in the noise rating on the Likert scale, the odds of the image having been acquired on a 1.5-T scanner increased almost 19-fold (odds ratio 18.5, 95% CI [8.8–39]).

For noise in TSEDL, the factor “field strength” was not a significant contributor to the model (p > 0.05).

For banding artifacts in TSEDL, the factor “field strength” was found to contribute to the model (p < 0.001), whereas the factor “reader” was not a significant contributor to the model (p > 0.05). For each 1-point increase in the banding-artifact rating on the Likert scale, the odds of the image having been acquired on a 3-T scanner increased almost 11-fold (odds ratio 10.9, 95% CI [5.4–22.1]). For banding artifacts in TSES, the factor “field strength” was not a significant contributor to the model (p > 0.05).

For image quality in TSEDL and TSES, the patient demographic factors “sex” and “age” were not significant contributors to the model (p > 0.05).
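The reported odds ratios relate to the regression coefficients through the exponential function. The sketch below shows this relationship under the assumption of a Wald-type 95% confidence interval; the source does not state which interval type was used. It recovers an implied coefficient and standard error from the reported values and checks that they reproduce the interval. The function names are hypothetical.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald-type interval from a log-odds coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coefficient_from_reported(odds_ratio, ci_low, ci_high, z=1.96):
    """Implied log-odds coefficient and SE, assuming a Wald-type interval."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

# Values reported in the text for noise in TSES (field strength).
beta, se = coefficient_from_reported(18.5, 8.8, 39.0)
print(odds_ratio_with_ci(beta, se))
```

A Wald interval is symmetric on the log scale, so the reported odds ratio should lie close to the geometric mean of the interval limits, which holds for both odds ratios given above.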

Visibility of anatomic structures and internal derangement

Concerning the detection of degeneration or tears of the menisci and ligaments, inter- and intra-reader agreement was almost perfect with κ values between 0.92 and 1.00. There was no clinically relevant difference concerning the detection of structural abnormalities between TSES and TSEDL. Regarding the detection and evaluation of cartilage defects, inter- and intra-reader agreement was moderate to almost perfect with κ values between 0.58 and 0.98. No difference was found between the readers and the two sequences TSES and TSEDL with regard to the detection of femoral, tibial, and patellar bone marrow edema, as well as regarding the detection of fractures. A total of four fractures were detected by all readers in both sequences. Inter- and intra-reader agreement was almost perfect with κ values between 0.89 and 0.97 for the presence of joint effusion.

Intra- and inter-reader agreement of detected pathologies is summarized in Table 5. An overview of all detected pathologies is displayed as supplemental material (Table 6). Image examples of TSES and TSEDL are provided in Figs. 2, 3, 4, 5 and 6.

Table 5 Intra- and inter-reader agreement of detected pathologies in standard TSE (TSES) and deep-learning-reconstructed TSE imaging (TSEDL)

Fig. 2

Image example of a standard and deep-learning-reconstructed PD TSE imaging of the knee at 3 T. This is an example of a comprehensive knee MRI at 3 T of a 46-year-old patient with pain in the medial side of the right knee after trauma. PD– and T1–weighted TSES (upper row) and TSEDL (lower row) in different orientations are compared. TSEDL provides higher image quality with lower extents of noise and improved sharpness of the anatomic structures. Note that bone marrow edema (white arrowheads) of the femoral condyle is clearly definable in both TSES and TSEDL

Fig. 3

Image examples of standard and deep-learning-reconstructed PD TSE imaging of the knee at 3 T and 1.5 T. The upper-row images are examples of knee MRI at 3 T in coronal orientation of an 18-year-old professional athlete with pain in the area of the patella of both knees. After a break from training, the complaints had improved. The lower-row images are examples of knee MRI at 1.5 T in coronal orientation (lower row, PD TSES left and PD TSEDL right) of a 30-year-old patient after knee distortion. Comparing PD TSES (left) and PD TSEDL (right), the difference in the extent of noise is less pronounced at 3 T (upper row) than in images acquired at 1.5 T (lower row). However, TSEDL images at 1.5 T (lower row, right) show characteristic banding artifacts (white arrowheads), known as streaking, which are not present in TSES

Fig. 4

Image example of standard and deep-learning-reconstructed PD– and T1–weighted TSE imaging of the knee at 1.5 T. This is an example of a knee MRI at 1.5 T in coronal orientation of a 59-year-old patient after partial resection of the medial meniscus. PD TSES (upper row, left) and PD TSEDL (upper row, right) and T1w TSES (lower row, left) and T1w TSEDL (lower row, right). Comparing PD TSE (upper row) and T1w TSE (lower row), the effect of the noise reduction is more pronounced between PD TSES and PD TSEDL than between T1w TSES and T1w TSEDL

Fig. 5

Image example of a standard and deep-learning-reconstructed TSE imaging of the knee at 1.5 T. This is an example of a knee MRI at 1.5 T in sagittal and axial orientation of a 52-year-old patient with pain in the medial side of the right knee. In the sagittal images, the cartilage defect (ICRS grade 4; white arrows) of the medial femoral condyle with adjacent bone marrow edema (white arrowheads) is visible in both TSES (left) and TSEDL (right). Comparing TSES (left) and TSEDL (right) especially in the axial orientation, TSEDL shows characteristic banding artifacts of deep-learning-accelerated images when acquired at 1.5 T

Fig. 6

Image example of a standard and deep-learning-reconstructed TSE imaging of the knee. This is an example of a knee MRI in axial orientation comparing the cartilage defects (ICRS grades 1 to 4, from left to right) of the retropatellar cartilage in both TSES (upper row) and TSEDL (lower row). All cartilage defects are definable in both sequences TSES and TSEDL

Discussion

In this study, we investigated the feasibility and performance of a deep-learning-based reconstruction for 2D–TSE sequences (TSEDL) compared to standard 2D-TSE sequences concerning overall image quality items and the diagnosis of internal derangement of the knee at 1.5 T and 3 T. TSEDL enables a robust and reliable acquisition of images in clinical routine practice, providing even higher overall image quality and equal diagnostic performance compared to TSES in a short acquisition time.

The current clinical standard for MRI examinations of the knee is a multi-plane 2D-TSE sequence, which is time consuming due to its multiple planes and contrasts, with an acquisition time of about 15 min. Several approaches have been made to accelerate knee imaging; particularly promising are 3D sequences such as 3D-TSE or 3D-SPACE [4, 7, 17], which allow any imaging plane and slice thickness to be created from a single volume. Nevertheless, the trade-off between acquisition time and image quality leads to relatively long acquisition times of about 10 min for small voxel sizes of (0.5 mm)³ [4, 7]. Small voxel sizes are needed to ensure the visibility of fine anatomic details and interplanar uniformity of reconstructions. Although several studies indicate the equality or even superiority of 3D sequences [7, 17,18,19], this technique has not yet been widely adopted in clinical practice, and most study protocols consisted exclusively of PD-weighted images [7].

For the current standard 2D-TSE imaging of the knee, other acceleration techniques have been used, such as PI, CS, and simultaneous multi-slice [20,21,22,23,24]. Diagnostic equivalence can be obtained when using acceleration factors up to twofold. However, PI and simultaneous multi-slice may suffer from reduced SNR, noise enhancement, aliasing, and reconstruction artifacts, especially if higher acceleration factors are used [25, 26]. The immense potential of AI-based reconstruction techniques, such as deep learning, to accelerate MRI while maintaining or even improving the image quality has been shown in several studies [8, 27,28,29,30,31]. In line with these studies, TSEDL in our study improved the overall image quality and significantly reduced the extent of noise, especially for images acquired at 1.5 T. The acquisition time of a knee MRI can be reduced to 6:11 min using TSEDL compared to 11:56 min for our standard protocol using TSES. Even though the extent of general artifacts showed no difference between TSES and TSEDL, banding artifacts were present in images acquired at 1.5 T; such artifacts have been observed with multiple different deep-learning reconstruction techniques [16]. They have been linked to the Cartesian sampling scheme with integrated reference scans and are particularly strong in low-SNR regions of the reconstructed image. As such, images acquired at 1.5 T and image contrasts with fat suppression are known to be more prone to banding artifacts (Figs. 7, 8 and 9). Accordingly, our PD protocols, which employed spectral fat suppression, were more affected by banding artifacts. Recent approaches have shown promising results in reducing such banding artifacts [16]. However, although banding artifacts are present in TSEDL and need to be reduced in further developments of the network used, they do not affect the diagnostic confidence of TSEDL.
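The relative time saving implied by the two protocol durations stated above can be worked out with a few lines of arithmetic. The helper names are illustrative; only the two durations (6:11 min and 11:56 min) come from the text.

```python
def to_seconds(mm_ss):
    """Convert a 'minutes:seconds' string such as '6:11' to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)

def time_saving(standard, accelerated):
    """Return (relative reduction, speed-up factor) for two 'mm:ss' durations."""
    t_standard = to_seconds(standard)
    t_accelerated = to_seconds(accelerated)
    reduction = 1 - t_accelerated / t_standard
    factor = t_standard / t_accelerated
    return reduction, factor

reduction, factor = time_saving("11:56", "6:11")
print(f"reduction: {reduction:.1%}, speed-up: {factor:.2f}x")
```

With these durations (716 s versus 371 s), the protocol time is roughly halved.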

Fig. 7

Comparison of different reconstruction techniques and SNR for standard and deep-learning PD TSE imaging of the knee at 3 T. Exemplary visualization of different reconstruction techniques (upper row) and signal-to-noise ratio (SNR) as SNR maps (lower row) of PD-weighted TSE in coronal orientation of the knee acquired at 3 T. On the left, TSES dataset reconstructed with a standard GRAPPA reconstruction. In the middle, TSEDL dataset reconstructed with the DL technique and, on the right, TSEDL dataset reconstructed with a standard GRAPPA reconstruction. Compared to the TSES (upper row, left), the TSEDL reconstructed with GRAPPA (upper row, right) shows higher noise levels and a decrease of SNR (lower row, left and right). The TSEDL reconstructed with the DL technique (upper row, middle) shows lower noise levels and an increase of SNR compared to both TSES and TSEDL reconstructed with GRAPPA (lower row)

Fig. 8

Comparison of different reconstruction techniques and SNR for standard and deep-learning T1 TSE imaging of the knee at 3 T. Exemplary visualization of different reconstruction techniques (upper row) and signal-to-noise ratio (SNR) as SNR maps (lower row) of T1-weighted TSE in coronal orientation of the knee acquired at 3 T. On the left, TSES dataset reconstructed with a standard GRAPPA reconstruction. In the middle, TSEDL dataset reconstructed with the DL technique and, on the right, TSEDL dataset reconstructed with a standard GRAPPA reconstruction. Compared to the TSES (upper row, left), the TSEDL reconstructed with GRAPPA (upper row, right) shows higher noise levels and a decrease of SNR (lower row, left and right). The TSEDL reconstructed with the DL technique (lower row, middle) shows lower noise levels and an increase of SNR compared to TSEDL reconstructed with GRAPPA. TSES (left) and TSEDL reconstructed with the DL technique (middle) are comparable concerning the noise levels and SNR

Fig. 9

Comparison of different reconstruction techniques for standard and deep-learning PD TSE imaging of the knee at 1.5 and 3 T. Exemplary visualization of different reconstruction techniques at 1.5 T (upper row) and 3 T (lower row) of PD–weighted TSE in coronal orientation of the knee. On the left, TSES dataset reconstructed with a standard GRAPPA reconstruction. In the middle, TSEDL dataset reconstructed with the DL technique and, on the right, TSEDL dataset reconstructed with a standard GRAPPA reconstruction. Comparing the TSEDL reconstructed with GRAPPA at 1.5 T (upper row, right) and 3 T, the image acquired at 3 T (lower row, right) shows lower noise levels, and the effects of the DL reconstruction are minor when images are acquired at 3 T

Concerning the detection of internal derangement, there was no substantial difference between the TSES and TSEDL sequences. Although intra- and inter-reader agreement for the presence of cartilage defects showed lower κ values, this would not have led to any change in the therapy of the participants and can be explained by the subjective nature of the reading, which has already been described in the literature [32].

With regard to the acquisition time of the MRI, in addition to the acceleration of the data acquisition, there is another advantage compared to previously used acceleration techniques such as CS: up to now, acceleration techniques have suffered from long post-processing times and the need for high computational resources [33, 34]. The deep-learning approach stands out because most of the computational work is done in advance during training of the network; thus, the reconstruction time of deep-learning-based sequences is very low.

Our findings should be interpreted within the context of the study’s limitations. First, while all readers were blinded to the shown sequences, the characteristic differences in appearance allowed readers to recognize the reconstruction technique. Therefore, personal preferences may have influenced the study results. Second, in this study, only one network was used to reconstruct the undersampled image data, and this network was trained on various anatomic regions. Further improvements of this first network have already been made and should be evaluated in further studies, especially with regard to the extent of banding artifacts in images acquired on 1.5-T scanners. Third, all examinations were performed on MRI scanners produced by a single vendor. Further studies on scanners from multiple vendors are needed to evaluate the performance of this network, also with regard to other anatomic regions, in order to fully assess the generalizability of this technique.

In conclusion, our study indicates that TSEDL is clinically feasible, providing even better image quality in a shorter acquisition time. Given its ability to accurately reconstruct meniscus and ligament tears, TSEDL yields diagnostic performance for internal derangement of the knee comparable to that of standard TSE.