Feasibility of an accelerated 2D-multi-contrast knee MRI protocol using deep-learning image reconstruction: a prospective intraindividual comparison with a standard MRI protocol

Objectives The aim of this study was to evaluate the image quality and diagnostic performance of a deep-learning (DL)–accelerated two–dimensional (2D) turbo spin echo (TSE) MRI of the knee at 1.5 and 3 T in clinical routine in comparison to standard MRI. Material and methods Sixty participants, who underwent knee MRI at 1.5 and 3 T between October/2020 and March/2021 with a protocol using standard 2D–TSE (TSES) and DL–accelerated 2D–TSE sequences (TSEDL), were enrolled in this prospective institutional review board–approved study. Three radiologists assessed the sequences regarding structural abnormalities and evaluated the images concerning overall image quality, artifacts, noise, sharpness, subjective signal-to-noise ratio, and diagnostic confidence using a Likert scale (1–5, 5 = best). Results Overall image quality for TSEDL was rated to be excellent (median 5, IQR 4–5), significantly higher compared to TSES (median 5, IQR 4 – 5, p < 0.05), showing significantly lower extents of noise and improved sharpness (p < 0.001). Inter- and intra-reader agreement was almost perfect (κ = 0.92–1.00) for the detection of internal derangement and substantial to almost perfect (κ = 0.58–0.98) for the assessment of cartilage defects. No difference was found concerning the detection of bone marrow edema and fractures. The diagnostic confidence of TSEDL was rated to be comparable to that of TSES (median 5, IQR 5–5, p > 0.05). Time of acquisition could be reduced to 6:11 min using TSEDL compared to 11:56 min for a protocol using TSES. Conclusion TSEDL of the knee is clinically feasible, showing excellent image quality and equivalent diagnostic performance compared to TSES, reducing the acquisition time about 50%. Key Points • Deep-learning reconstructed TSE imaging is able to almost halve the acquisition time of a three-plane knee MRI with proton density and T1-weighted images, from 11:56 min to 6:11 min at 3 T. • Deep-learning reconstructed TSE imaging of the knee provided significant improvement of noise levels (p < 0.001), providing higher image quality (p < 0.05) compared to conventional TSE imaging. • Deep-learning reconstructed TSE imaging of the knee had similar diagnostic performance for internal derangement of the knee compared to standard TSE. Supplementary Information The online version contains supplementary material available at 10.1007/s00330-022-08753-z.


Introduction
Magnetic resonance imaging (MRI) of the knee is among the most commonly performed MRI examinations and requires about 15 min of acquisition time. Reference standards for knee MRI are proton density (PD)-and T1weighted turbo spin-echo (TSE) sequences due to the excellent tissue contrast and high in-plane spatial resolution with good assessment of meniscal, ligamentous, and cartilaginous injuries [1][2][3].
Due to their anisotropic voxel size, two-dimensional (2D)-TSE sequences require the acquisition of different image planes separately, which is time consuming [4,5]. One approach to accelerate MRI of the knee is threedimensional (3D)-TSE techniques, generating isotropic data sets of higher spatial resolution to create virtually any image plane from a single parental data set [4,6]. Although technical developments can provide accelerated imaging, mainly based on parallel imaging (PI) acceleration, the acquisition time of a high-quality isotropic data set with 3D-TSE requires still around 5 to 10 min [4,7]. Besides, with increasing acceleration in PI, the signal-tonoise ratio (SNR) decreases rapidly, while residual artifacts are generally increased which limits the achievable speed [8].
Another innovative technique, which is commonly used to accelerate MRI, is compressed sensing (CS), in which only a reduced set of data points is required. SNR is preserved better than by PI only, but CS tends to oversimplify image content, resulting in residual blurring and loss of realistic image textures.
The latest promising approaches to overcome this drawback are deep-learning (DL) algorithms. These feature trainable components in contrast to a priori assumptions on sparsity and promise higher acceleration factors while simultaneously increasing SNR and preserving high image quality [9,10]. With regard to MRI of the knee, a recently published study using retrospectively undersampled data showed that DL images perform interchangeably with standard clinical images for the detection of internal derangement of the knee [8]. Furthermore, retrospectively undersampled, DL-accelerated images were rated with higher image quality than standard imaging and allowed an acceleration of the standard images [8]. There have been other technical developments of the DL reconstruction recently, but so far, there has been no prospective clinical study at both 1.5 and 3 T.
Our hypothesis was that TSE DL can produce similar image quality that is comparable to clinically used segmented sequences while significantly reducing the acquisition time. Therefore, we implemented TSE DL at 1.5 and 3 T in a prospective study to assess diagnostic performance compared to standard imaging sequences in routine clinical practice.

Study design
Institutional review board approval and written informed consent from all participants were obtained for this prospective, single-center study. All study procedures were conducted in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments.
The power calculation for our sample size estimation revealed a sample size of 60 subjects using a test for agreement between two raters (kappa statistics) with 80% power to detect a true kappa value of 0.80 in a test with two categories with frequencies equal to 0.35 and 0.65 based on a significance level of 0.05 [11]. Study recruitment commenced consecutively from October 2020 to March 2021. Adult patients who underwent clinically indicated knee MRI were prospectively

MRI system and acquisition parameters
All examinations were performed on clinical 1.5-T and 3-T scanners (MAGNETOM Skyra, MAGNETOM Prisma fit , a n d MAGNETOM Avanto; all Siemens Healthineers) with participants in supine position using clinical knee surface coils. All participants underwent our clinical standard knee MRI protocol including 2D-PD TSE S with fat suppression in three planes (coronal, sagittal, and axial) and 2D-PD TSE DL with fat  suppression in three planes (coronal, sagittal, and axial), as  well as 2D-T1-weighted TSE S and 2D-T1-weighted TSE DL  in coronal orientation. Imaging parameters are displayed in  Table 2.

TSE with DL reconstruction
On the acquisition side, a conventional under-sampling pattern known from PI is used [10,12], which provides the same performance when reconstructed with DL-based methods as incoherent sampling patterns favored by CS. The prototype image reconstruction comprises a fixed iterative reconstruction scheme or variational network [9,10,13]. For the image reconstruction, k-space data, bias-field correction and coilsensitivity maps are inserted into the variational network. The fixed unrolled algorithm for accelerated MRI reconstruction consists of multiple cascades, each made up from a data consistency using a trainable Nesterov Momentum followed by a convolutional neural network (CNN)-based regularization [13]. The reconstruction was trained on prior volunteer acquisitions using conventional TSE protocols. About 10,000 slices were acquired on volunteers using various clinical 1.5-T and 3-T scanners (MAGNETOM scanners, Siemens Healthineers).  TA time of acquisition, FOV field of view (mm 2 ), TE/TR echo time/repetition time (ms); FA flip angle (degree), AV averages, C concatenations, PAT parallel acquisition technique, TF turbo factor, TSE turbo spin echo, PD proton density, T1w T1-weighted, FS fat saturation A detailed description of the used reconstruction is given in prior studies [13]. Besides this physics-based k-space to image reconstruction method, no other DL-based image-enhancement techniques such as super-resolution methods [14,15] were employed in this study.

Image evaluation
Corresponding TSE datasets have been separated for TSE S and TSE DL , and each dataset was independently evaluated by radiologists with 3 to 9 years of experience in interpreting musculoskeletal MRI. The readers were blinded toward all participant information, reconstruction type, and clinical and radiological reports as well as each other's assessments. Prior to the actual image analysis, each reader had received a training session to familiarize themselves with the Likert-scale classification. Image analysis was performed on a PACS workstation (GE Healthcare Centricity™ PACS RA1000). PD-and T1-weighted images were evaluated separately regarding overall image quality, artifacts, banding artifacts, sharpness, noise, diagnostic confidence, and subjective SNR using a 5-point Likert scale (1 = non-diagnostic; 2 = low image quality; 3 = moderate image quality; 4 = good image quality; 5 = excellent image quality). Reading scores were considered sufficient when reaching ≥ 3. Banding artifacts are characteristic artifacts produced by Cartesian DL reconstruction, particularly strong in low-SNR regions of the reconstructed image, appearing as a streaking pattern exactly aligned with the phase-encoding direction [16]. Furthermore, TSE S and TSE DL were evaluated regarding the image impression using a Likert scale ranging from 1 to 5 (1 = unrealistic; 5 = realistic).
Assessment of pathologies and internal derangement were conducted by the same three radiologists and included the evaluation of the medial and lateral menisci; medial and lateral collateral ligaments; anterior and posterior cruciate ligaments; and cartilage defects of the medial and lateral femur trochlea, the medial tibia plateau, the trochlear groove, and the retropatellar cartilage. Structural abnormalities were graded as 0 = normal, 1 = altered (degenerative, postoperative), and 2 = tear. Cartilage defects were classified using a modified version of the classification system of the International Cartilage Repair Society (ICRS). If more than one cartilage defect was present, only the dominant cartilage lesion was considered. Areas of bone marrow edema (femoral, patellar, tibial), as well as fractures and joint effusion, were evaluated being present or absent. If there were discrepancies between the readers, a consensus reading was enclosed to define false-positive and false-negative findings. All evaluated items of anatomic structures and pathologies are displayed in Table 3.

Statistical analysis
Statistical analyses were performed using SPSS version 26 (IBM Corp). Participants' demographics and clinical Nearly normal (superficial lesions: soft indentation and/or superficial fissures and cracks) 2 Abnormal (lesions extending down to < 50% of cartilage depth) 3a Severely abnormal (cartilage defects extending down > 50% of cartilage depth) 3b Severely abnormal (cartilage defects extending down to calcified layer) 3c Severely abnormal (cartilage defects extending down to but not through the subchondral bone) 4 Severely abnormal (penetration of the subchondral bone) Bone marrow edema 0 Absent 1 P r e s e n t Fracture 0 Absent 1 P r e s e n t Joint effusion 0 Absent 1 P r e s e n t The results are reported as mean [median (interquartile range)] SNR signal-to-noise ratio, Fleiss' κ inter-reader agreement between the three readers characteristics were summarized by using descriptive statistics. Qualitative image analysis assessment was given as mean and median values with interquartile range (IQR). An exact paired-sample Wilcoxon signed-rank test was used to compare the sequences in terms of the image quality scores from each reader. A post hoc multinominal regression analysis (generalized linear model for ordinal variables) was computed for the impact of field strength, reader, and patient demographics. Significance was assumed at a level of p < 0.05.
Inter-reader agreement of the three readers was assessed by using Fleiss' κ and intra-reader agreement by using weighted Cohen's κ, both with 95% confidence intervals and interpreted as follows: 0.20 or less, poor agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, substantial agreement; and greater than 0.80, almost perfect agreement.

Results
Among 72 eligible participants, a final sample of 60 participants (84%, mean age 44 ± 17; range 18-85 (years); 29 males, 31 females) were prospectively included in this study. Thirty examinations were performed on 1.5 T and 30 examinations on 3 T regardless of diagnosis, current treatment, first examination, or follow-up (Table 1 and Fig. 1).

Image quality
Inter-reader agreement was substantial to almost perfect with values between 0.61 and 0.85 (see Table 4). Because of the good inter-reader reliability, in the following, only the results of reader 1 are given. A summary of all qualitative image analyses and Fleiss' κ are provided in Table 4.
For further illustration, raw data of a patient examined at 1.5 and of a patient examined at 3 T were exported and exemplary SNR maps were determined offline using a pseudo-replica method. Furthermore, the raw data of the TSE DL acquisition was reconstructed using the DL technique and a conventional generalized autocalibrating partially parallel acquisition (GRAPPA) reconstruction to illustrate the differences between the reconstruction techniques. Note that noise is highest in images acquired at 1.5 T and reconstructed with the GRAPPA reconstruction; see Figs. 7, 8 and 9.
A post hoc multinomial regression analysis via a generalized linear model for ordinal variables was utilized to investigate whether "field strength" (1.5 T/3 T), patients' demographics (sex and age), and "reader" (readers 1-3) could predict how noise and banding artifacts were rated for each reconstruction type (TSE S /TSE DL ).
For noise in TSE S , the factor "field strength" was found to contribute to the model (p < 0.001), whereas the factor "reader" was not a significant contributor to the model (p > 0.05). For each deduction of noise by 1-point decrease on the Likert scale, the likelihood of the image being scanned on a 1.5-T scanner was almost 19-fold (odds ratio 18.5, 95% CI [8.8-39]).
For noise in TSE DL , the factor "field strength" was not a significant contributor to the model (> 0.05).
For banding artifacts in TSE DL , the factor "field strength" was found to contribute to the model (p < 0.001), whereas the factor "reader" was not a significant contributor to the model (p > 0.05). For each improvement of noise by 1-point increase on the Likert scale, the likelihood of the image being scanned on a 3-T scanner was almost 11-fold (odds ratio 10.9, 95% CI [5.4-22.1]). For Fig. 2 Image example of a standard and deep-learning-reconstructed PD TSE imaging of the knee at 3 T. This is an example of a comprehensive knee MRI at 3 T of a 46-year-old patient with pain in the medial side of the right knee after trauma. PD-and T1-weighted TSE S (upper row) and TSE DL (lower row) in different orientations are compared. TSE DL provides higher image quality with lower extents of noise and improved sharpness of the anatomic structures. Note that bone marrow edema (white arrowheads) of the femoral condyle is clearly definable in both TSE S and TSE DL banding artifacts in TSE S , the factor "field strength" was not a significant contributor to the model (> 0.05).
For image quality in TSE DL and TSE S , the patient demographic factors "sex" and "age" were not significant contributors to the model (> 0.05).

Visibility of anatomic structures and internal derangement
Concerning the detection of degeneration or tears of the menisci and ligaments, inter-and intra-reader agreement was almost perfect with κ values between 0.92 and 1.00. There was no clinically relevant difference concerning the detection of structural abnormalities between TSE S and TSE DL . Regarding the detection and evaluation of cartilage defects, inter-and intra-reader agreement was substantial to almost perfect with κ values between 0.58 and 0.98. No difference was found between the readers and the two sequences TSE S and TSE DL with regard to the detection of femoral, tibial, and patellar bone marrow edema, as well as regarding the detection of fractures. A total of four fractures were detected by all readers in both sequences. Inter-and intra-reader agreement was almost perfect with κ values between 0.89 and 0.97 for the presence of joint effusion.
Intra-and inter-reader agreement of detected pathologies is summarized in Table 5. An overview of all detected pathologies is displayed as supplemental material (Table 6). Image examples of TSE S and TSE DL are provided in Figs. 2, 3, 4, 5 and 6.

Discussion
In this study, we investigated the feasibility and performance of a deep-learning-based reconstruction for 2D-TSE Unfortunately, TSE DL images at 1.5 (lower row, right) show characteristic banding artifacts (white arrowheads), known as streaking, which are not present in TSE S sequences (TSE DL ) compared to standard 2D-TSE sequences concerning overall image quality items and the diagnosis of internal derangement of the knee at 1.5 T and 3 T. TSE DL enables a robust and reliable acquisition of images in clinical routine practice, providing even higher overall image quality and equal diagnostic performance compared to TSE S in a short acquisition time.
The current clinical standard for MRI examinations of the knee is a multi-plane 2D-TSE sequence, which is, due to its multiple planes and contrasts, time consuming, with an acquisition time of about 15 min. Several approaches have been made to accelerate knee imaging, especially promising 3D sequences such as 3D-TSE or 3D-SPACE [4,7,17] with the ability to create any imaging plane and slice thickness from a single volume. Regardless, the inverse relationship between acquisition time and image quality leads to relatively long acquisition times of about 10 min for small voxel sizes of (0.5 mm) 3 [4,7]. Small voxel sizes are needed to ensure the visibility of fine anatomic details and interplanar uniformity of reconstructions. Although several studies indicate the equality or even superiority of 3D sequences [7,[17][18][19], this technique has not yet been widely adopted in clinical practice and most study protocols consisted exclusively of PD-weighted images [7]. Fig. 4 Image example of a standard and deep-learningreconstructed PD-and T1weighted TSE imaging of the knee at 1.5 T. This is an example of a knee MRI at 1.5 T in coronal orientation of a 59-year-old patient after partial resection of the medial meniscus. PD TSE S (upper row, left) and PD TSE DL (upper row, right) and T1w TSE S (lower row, left) and PD TSE DL (lower row, right). Comparing PD TSE (upper row) and T1w TSE (lower row), the effect of the noise reduction is more present in PD TSE DL compared to PD TSE S than comparing T1w TSE DL compared to T1w TSE S For the current standard 2D-TSE imaging of the knee, other acceleration techniques have been used, such as PI, CS, and simultaneous multi-slice [20][21][22][23][24]. Diagnostic equivalence can be obtained when using acceleration factors up to twofold. However, PI and simultaneous multi-slice may suffer from reduced SNR, noise enhancement, aliasing, and Fig. 5 Image example of a standard and deep-learningreconstructed TSE imaging of the knee at 1.5 T. This is an example of a knee MRI at 1.5 T in sagittal and axial orientation of a 52-yearold patient with pain in the medial side of the right knee. In the sagittal images, the cartilage defect (ICRS grade 4; white arrows) of the medial femoral condyle with adjacent bone marrow edema (white arrowheads) is visible in both TSE S (left) and TSE DL (right). Comparing TSE S (left) and TSE DL (right) especially in the axial orientation, TSE DL shows characteristic banding artifacts of deep-learning-accelerated images when acquired at 1.5 T Fig. 6 Image example of a standard and deep-learning-reconstructed TSE imaging of the knee. This is an example of a knee MRI in axial orientation comparing the cartilage defects (ICRS grades 1 to 4, from left to right) of the retropatellar cartilage in both TSE S (upper row) and TSE DL (lower row). All cartilage defects are definable in both sequences TSE S and TSE DL reconstruction artifacts, especially if higher acceleration factors are used [25,26]. The immense potential of AI-based reconstruction techniques, such as deep learning, to accelerate MRI while maintaining or even improving the image quality, had been shown in several studies [8,[27][28][29][30][31]. According to these, in our study, TSE DL enabled an improvement of the overall image quality and significantly reduced the extent of noise, especially for images acquired at 1.5 T. The acquisition time of a knee MRI can be reduced to 6:11 min using TSE DL compared to 11:56 min for our standard protocol using TSE S . Even though the extent of general artifacts showed no difference between TSE S and TSE DL , banding artifacts in images acquired at 1.5 T were present, which have been observed with multiple, different deep-learning reconstruction techniques [16]. They have been correlated to the Cartesian sampling scheme with integrated reference scans and are particularly strong in low signal-to-noise regions of the reconstructed image. As such, images acquired at 1.5 T and image contrasts with fat suppression are known to be more prone to banding artifacts (Figs. 7, 8 and 9). Coincidently, our PD protocols employed spectral fat suppression and therefore were more affected by banding artifacts. Recent approaches have shown promising results to reduce such banding artifacts [16]. However, although banding artifacts are present in TSE DL and need to be reduced in further developments of the used network, they do not affect the diagnostic confidence of TSE DL .
Concerning the detection of internal derangement, there was no substantial difference between the TSE S and TSE DL sequences. Although intra-and inter-reader agreement for the presence of cartilage defects showed lower κ values, it would not have led to any change in therapy of the participants, and can be explained by the subjective reading, what is already described in literature [32]. With regard to the acquisition time of the MRI, in addition to the acceleration of the data acquisition, there is also another advantage compared to previously used acceleration techniques such as CS: Up to now, acceleration techniques suffered from long post-processing times and the need of high computational resources [33,34]. The deep-learning approach stands out, due to the fact that most of the computational work has been done in advance during training of the network; thus, the reconstruction time of deep-learning-based sequences is very low.
Our findings should be interpreted within the context of the study's limitations. First, while all readers were blinded to the shown sequences, the characteristic differences in the appearance allowed readers to recognize the reconstruction technique. Therefore, personal preferences may have influenced the study results. Second, in this study, just one network was used to reconstruct the undersampled image data, and this network was trained on various anatomic regions. Further improvements of the used first network have already been done and should be evaluated in further studies, especially with regard to the extent of banding artifacts at images of 1.5-T scanners. Third, all examinations were performed on MRI scanners produced by a single vendor. Further studies on multiple-vendor scanners are needed evaluating the performance of this network also Fig. 8 Comparison of different reconstruction techniques and SNR for standard and deep-learning T1 TSE imaging of the knee at 3 T. Exemplary visualization of different reconstruction techniques (upper row) and signal-to-noise ratio (SNR) as SNR maps (lower row) of T1weighted TSE in coronal orientation of the knee acquired at 3 T. On the left, TSE S dataset reconstructed with a standard GRAPPA reconstruction. In the middle, TSE DL dataset reconstructed with the DL technique and, on the right, TSE DL dataset reconstructed with a standard GRAPPA reconstruction. Compared to the TSE S (upper row, left), the TSE DL reconstructed with GRAPPA (upper row, right) shows higher noise levels and a decrease of SNR (lower row, left and right). The TSE DL reconstructed with the DL technique (lower row, middle) shows lower noise levels and an increase of SNR compared to TSE DL reconstructed with GRAPPA. TSE S (left) and TSE DL reconstructed with the DL technique (middle) are comparable concerning the noise levels and SNR with regard to other anatomic regions to entirely assess the generalizability of this technique.
In conclusion, our study indicates that TSE DL is clinically feasible, providing even better image quality in a shorter acquisition time. Dependent on its ability to accurately reconstruct meniscus and ligament tears, TSE DL yields comparable diagnostic performance for internal knee derangement to standard TSE.
Funding Open Access funding enabled and organized by Projekt DEAL.

Declarations
Guarantor The scientific guarantor of this publication is Prof. Dr. med. Ahmed E. Othman.

Conflict of interest
The authors of this manuscript declare relationships with Siemens Healthineers. The prototype DL reconstruction was provided by Siemens Healthcare, Erlangen, Germany. Full control of patient data was maintained by the authors.
Statistics and biometry No complex statistical methods were necessary for this paper.
Informed consent Written informed consent was obtained from all subjects (patients) in this study. Fig. 9 Comparison of different reconstruction techniques for standard and deep-learning PD TSE imaging of the knee at 1.5 and 3 T. Exemplary visualization of different reconstruction techniques at 1.5 T (upper row) and 3 T (lower row) of PD-weighted TSE in coronal orientation of the knee. On the left, TSE S dataset reconstructed with a standard GRAPPA reconstruction. In the middle, TSE DL dataset reconstructed with the DL technique and, on the right, TSE DL dataset reconstructed with a standard GRAPPA reconstruction. Comparing the TSE DL reconstructed with GRAPPA at 1.5 T (upper row, right) and 3 T, the image acquired at 3 T (lower row, right) shows less noise levels and the effects of the DL reconstruction are minor when images are acquired at 3 T Ethical approval Institutional review board approval was obtained.

Methodology
• prospective • observational • performed at one institution Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.