Evaluation of a deep learning-based reconstruction method for denoising and image enhancement of shoulder MRI in patients with shoulder pain

Objectives To evaluate the diagnostic performance of an automated reconstruction algorithm combining MR imaging acquired using compressed SENSE (CS) with deep learning (DL) in order to reconstruct denoised high-quality images from undersampled MR images in patients with shoulder pain. Methods Prospectively, thirty-eight patients (14 women, mean age 40.0 ± 15.2 years) with shoulder pain underwent morphological MRI using a pseudo-random, density-weighted k-space scheme with an acceleration factor of 2.5 using CS only. An automated DL-based algorithm (CS DL) was used to create reconstructions of the same k-space data as used for CS reconstructions. Images were analyzed by two radiologists and assessed for pathologies, image quality, and visibility of anatomical landmarks using a 4-point Likert scale. Results Overall agreement for the detection of pathologies between the CS DL reconstructions and CS images was substantial to almost perfect (κ 0.95 (95% confidence interval 0.82–1.00)). Image quality and the visibility of the rotator cuff, articular cartilage, and axillary recess were overall rated significantly higher for CS DL images compared to CS (p < 0.03). Contrast-to-noise ratios were significantly higher for cartilage/fluid (CS DL 198 ± 24.3, CS 130 ± 32.2, p = 0.02) and ligament/fluid (CS DL 184 ± 17.3, CS 141 ± 23.5, p = 0.03) and SNR values were significantly higher for ligaments and muscle of the CS DL reconstructions (p < 0.04). Conclusion Evaluation of shoulder pathologies was feasible using a DL-based algorithm for MRI reconstruction and denoising. In clinical routine, CS DL may be beneficial in particular for reducing image noise and may be useful for the detection and better discrimination of discrete pathologies. Summary statement
 Assessment of shoulder pathologies was feasible with improved image quality as well as higher SNR using a compressed sensing deep learning–based framework for image reconstructions and denoising. Key Points • Automated deep learning–based reconstructions showed a significant increase in signal-to-noise ratio and contrast-to-noise ratio (p < 0.04) with only a slight increase of reconstruction time of 40 s compared to CS. • All pathologies were accurately detected with no loss of diagnostic information or prolongation of the scan time. • Significant improvements of the image quality as well as the visibility of the rotator cuff, articular cartilage, and axillary recess were detected.


Introduction
In modern society, shoulder pain is very common and may cause impairment in everyday and work activities [1]. In the general population, acute as well as chronic shoulder pain often originates from rotator cuff pathologies and pathologies of other soft tissue structures [2]. MR imaging is currently the modality of choice for imaging of shoulder pain, due to the high contrast and resolution, in particular if soft tissue pathologies are suspected [3,4]. MR imaging of the shoulder is often challenging in patients with shoulder pain (breathing artifacts, motion artifacts caused by pain, etc.) and potential artifacts caused by surgery [5,6]. Furthermore, imaging quality is often corrupted due to increased image noise (e.g., Rician noise) or limited by the surface coil where tissues that lie peripheral to the coil elements are less assessable due to increased noise [7]. As consequence, increased image noise could lead to inaccurate diagnosis or impaired image analysis [7]. Different strategies have previously been introduced in order to reduce image noise including traditional approaches which mainly are based on filtering, transformations or statistical methods, as well as modern DL-based approaches which are based on convolutional neuronal networks (CNNs) and general adversarial networks (GANs) [7][8][9][10][11][12][13][14]. Traditional patch-based denoising methods such as non-local means (NLM) algorithms rely on the self-spatial similarities of natural images and have proven to be compatible with iterative image reconstruction methods based on parallel accelerated imaging, e.g., SENSE and GRAPPA [10,11,[15][16][17]. In recent studies, the application of deep learning-based algorithms was suggested for reconstruction of accelerated MR scans as well as for image denoising [12,14,18,19]. Zhang et al created a supervised feed-forward CNN which separates noise from noisy observation and uses residual modules as well as batch normalization to speed up the denoising performance [12]. As a self-learning self-supervised image denoising network, Xu et al proposed the "Noisy-As-Clean" network [13]. This method declares corrupted test images as ground truth ("clean" target) and uses synthetic images, which consist out of small alterations to the corrupted test image in order to train the network. In general, CNNs and GANs apply self-learning reconstruction schemes and have shown promising results in order to reduce image noise and accelerate the MR imaging data acquisition process in contrast to the classic iterative reconstruction schemes [18,[20][21][22][23][24]. These DL-based methods apply reconstruction schemes in order to calculate high-quality images from undersampled MR data.
In this study, we used a reconstruction framework which utilizes a novel CNN to integrate and enhance conventional CS algorithms based on the adaptive-CS-network, previously described by Pezzotti et al [18]. Therefore, the purpose of this study was to assess the diagnostic performance and denoising capabilities of the reconstruction framework which combines PI, CS, and a DL-based algorithm (CS DL) for the assessment of various shoulder pathologies on multiplanar shoulder MRI compared to images reconstructed with CS only.

Study participants
In this prospective study, patients with shoulder pain (N = 38, mean age 40 ± 15.2 years, 14 women) that were admitted to the orthopedic and trauma surgery departments between June 2021 and January 2022 were enrolled. The patients presented with shoulder pain due to various pathologies including suspected chronic degenerative changes (n = 21), acute trauma (n = 9), as well as unclear shoulder pain (n = 8). Informed consent was obtained from all study participants prior to inclusion. The study was approved by our institutional review board (Ethics Commission of the Medical Faculty, Technical University of Munich, Germany; ethics proposal number 42/21S). To calculate the appropriate number of study participants, a priori power analysis was performed using data of a preceding study [25]. The data of the SNR calculations between the ankle MRI with CS only and the ankle MRI with the deep learning reconstructed CS was used to simulate a comparison between the two different groups. Finally, a sample size of at least 24 subjects per group was determined to achieve a power > 0.8. Therefore, we included 38 participants into the study to ensure adequate group sizes for the comparison.

MR imaging
Each patient underwent a 3-T MRI examination (Ingenia Elition; Philips Healthcare) of the shoulder using a dedicated 16-channel shoulder coil. A clinical routine imaging protocol was used including a triplanar intermediate-weighted (IM) turbo spin-echo (TSE) sequence with spectral fat saturation, a sagittal T2-weighted TSE sequence, and a coronal T1-weighted TSE sequence. Detailed scan parameters are displayed in Table 1. All acquired data were reconstructed with standard CS and CS DL.

C-SENSE DL
The CS DL reconstructions investigated in this study are based on an adaptive-CS-network, as previously described by Pezzotti et al [18]. This network utilizes a novel CNN to integrate and enhance conventional CS algorithms. The adaptive-CS-network is an advancement of the deep learning-based iterative shrinkage-thresholding algorithm (ISTA) network proposed by Zhang and Ghanem [26]. It integrates multiscale sparsification in a problem-specific learnable manner. Further, the CNN-based sparsification approach is combined with the image reconstruction approach of CS and therefore, ensures data consistency. Prior information such as coil sensitivity distribution and location of the image background are automatically incorporated as well. The adaptive-CS-network therefore combines parallel imaging, compressed sensing, and deep learning into a single algorithm and replaces the wavelet transform by a CNN as sparsifying transform in the CS algorithm. The adaptive-CS-network used in this study was initially trained with a dataset of approximately 740,000 MR images from various anatomical regions acquired using 1.5-T and 3-T MR imaging. The algorithm was refined to run on standard reconstruction hardware, in contrast to the previously reported network [18]. Reconstruction was performed on the scanner and the reconstructions took approx. 80 s for the CS DL reconstructions compared to approximately 40 s for the standard CS reconstructions.

Quantitative image analysis
Signal-to-noise (SNR) and contrast-to-noise (CNR) values were calculated for CS only and CS DL using an established subtraction method (Figs. 1 and 2) [27][28][29]. Therefore, sequences of ten patients were acquired twice in the same exam session. The repeated sequences were subtracted using the inbuild MRI software to create the noise maps in which regions of interest were placed in the same location on three consecutive slices. SNR was calculated as previously described [27]: SI 1 measures the signal intensity of the ROI in the first series, SI 2 the signal from the ROI in the second series, and SI3 the ROI from the ROI in the noise maps. Following SNR calculation was then performed for muscle, ligaments, joint fluid, subchondral bone, and fat. CNR was calculated by subtracting the SNR of tissue 1 with the SNR of tissue 2 and was calculated for cartilage/fluid, subchondral bone/ cartilage, ligament/fluid, and ligament/fat.

Semi-quantitative image analysis
Image readings were performed by two experienced radiologists separately and independently and blinded to all clinical information (Y.L. with 4 years of experience in musculoskeletal imaging and J.N. with 10 years of experience in musculoskeletal imaging). Readings were performed on a PACS work station certified for clinical use (IDS7 21.2, Sectra). The standard CS images and CS DL reconstructed images were read with at least 3 weeks in between readings, respectively. For intra-reader reproducibility, 10 patients were assessed once again after 4 weeks by both radiologists. The CS and CS DL images were analyzed for the visibility of anatomical landmarks graded with a 4-point Likert scale (1 = inadequate, 2 = adequate, 3 = good, 4 = excellent) based on the extent of the partial volume effect, blurring, image noise, signal inhomogeneity, and discrimination from adjacent structures [27]. Following landmarks were assessed for visibility: rotator cuff tendons and muscles, biceps anchor, long biceps tendon, rotator cuff interval, AC joint, articular cartilage, axillary recess, and labrum. Overall image quality was assessed also using a 4-point Likert scale based on the overall image expression. Furthermore, the rotator cuff tendons, long biceps tendon, rotator cuff muscles, glenoid and humeral cartilage, bone, bursa, AC joint, glenoid labrum, and joint capsule were assessed for visibility and presence of pathologies. The bone was assessed for bone marrow edema, subchondral cysts, or osseous defects such as Bankart and Hill-Sachs lesions [30]. Rotator cuff tendons were assessed for tendinopathy including subacromial impingement, rotator cuff tendinitis/tendinosis, and calcific tendonitis, and the fatty infiltration was graded according to Goutallier et al [31]. The biceps tendon was assessed for tendinopathic changes, dislocation, as well as lesions of the biceps pulley. Anatomical variations of the labrum were graded according to Kanatli et al [32] (for detailed information about the gradings of abnormalities, see Table 2). The overall diagnostic confidence of the readings was also graded with a 4-point Likert scale based on how confident the readers were regarding the evaluation of pathologies. Severeness of motion artifacts, blurring, or image noise was semiquantitatively classified into no, little, and severe artifacts.

Statistical analysis
The data were analyzed using IBM SPSS, version 25.0 (IBM Corp.). All statistical tests were performed two sided, and a level of significance (α) of 0.05 was used for all tests. A Shapiro-Wilk test was performed to test for normal or nonnormal distribution of the data. A paired t-test was used for comparison of normally distributed numerical variables and a Wilcoxon signed-rank test was used to compare nonnormally distributed numerical and categorical variables between CS and CS DL image assessments. McNemar's test was used to assess for binary categorical variables.

Assessment of shoulder pathologies/abnormalities
Almost all pathologies were accurately detected on the CS DL reconstructions by both readers compared to the standard CS images with no significant difference (readers 1 and 2: κ 0.95 (95% confidence interval 0.82-1.00)). In total, 9 acute fractures were detected, of which 7 were osseous Bankart lesions (Figs. 3, 4, and 5) and two were humerus fractures. Moreover, four partial tears (Fig. 6) and one complete tear of one or more rotator cuff tendons were detected. Joint effusion was detected in 7 patients and 28 patients showed signs of an osteoarthritis of the AC joint. Detailed numbers and the distributions of pathologies are listed in Table 2. The overall diagnostic confidence of pathologies detected on the CS DL images was higher compared to CS in both readers, and this finding reached the statistical level of significance for one reader (reader 1 CS: 3.2 ± 0.1, CS DL 3.8 ± 0.2, p = 0.18; reader 2 CS: 2.9 ± 0.2, CS DL 3.7 ± 0.1, p = 0.04, Table 3). Compared to the standard CS images, the overall image quality of the CS DL reconstructions was rated significantly higher (reader 1 CS 2.8 ± 0.3, CS DL 3.7 ± 0.1, p = 0.02; reader 2 CS 2.7 ± 0.1, CS DL 3.8 ± 0.3, p = 0.01). No significant differences were detected for the ratings of motion artifacts (reader 1 CS: 3.7 ± 0.4, CS DL 3.8 ± 0.4, p = 0.23; reader 2 CS: 3.7 ± 0.4 and CS DL 3.8 ± 0.5, p = 0.25). There were no severe motion artifacts detected in the CS DL reconstructions as well as in the CS only images.

Visibility of anatomical features
The overall visibility of anatomical regions recorded was higher for the CS DL reconstruction by both readers, yet no statistical significance was reached (reader 1 CS DL 2.5 ± 0.   (Table 4). Only a slight increase was seen in the visibility of the long biceps tendon, muscle, and bone. None of the anatomical regions was rated lower in the CS DL reconstructions than in the standard CS images.

Quantitative image analysis
Significant higher SNR values were detected for ligaments and muscle of the CS DL reconstructions compared to the standard CS images (ligaments p = 0.01, muscle p = 0.04).
Although the SNR values of the CS DL reconstructions were generally higher, no statistical significance was reached for fat, joint fluid, and subchondral bone (Fig. 1). Comparing the CNR values, the CNR values of cartilage/fluid and ligament/ fluid of the CS DL reconstructions were significantly higher compared to standard CS images (cartilage/fluid p = 0.02 and ligament/fluid p = 0.03, respectively; Fig. 2).

Inter-and intra-reader agreement
Inter-reader agreement for the detection and grading of pathologies was substantial to almost perfect (κ 0.89 (95% confidence interval 0.71-1.00)). Agreement for the grading of the visibility of anatomical regions was also substantial to almost perfect (κ 0.94 (95% confidence interval 0.89-1.00)). For intra-reader reliability, both readers reassessed the images of 10 patients after at least 4 weeks. The intra-reader agreement was overall substantial to almost perfect (range κ 0.84 to 1.00) for both readers. All acute pathologies were once more accurately identified on the CS DL reconstructions (κ 1.00 (95% CI 1.00-1.00) for both readers). Only in two patients the tendinopathy of the supraspinatus tendons was rated as partial tears instead (κ 0.84 (95% confidence interval 0.75-1.00)).

Discussion
In this study, the application of a compressed sensing deep learning-based framework for image reconstruction and denoising was investigated and shown to be feasible and to improve image quality while remaining accurate regarding the assessment of various pathologies of the shoulder. Images with CS DL reconstructions overall showed a significantly higher image quality and a higher diagnostic confidence indicating that the better image quality and visibility of anatomical landmarks may be useful for the visualization and differentiation of pathologies. The increased image quality and denoising in the CS DL reconstructions may be particularly beneficial in patients with subtle findings and increased image noise. Almost all pathologies were identically detected on the CS DL reconstructions as in the standard CS images, with no loss of information due to the DLbased reconstruction algorithm. Furthermore, the visibility of the articular cartilage, AC joint, and rotator cuff tendons was rated higher in the CS DL images compared to standard CS reconstructions. The reduced image noise also improved the discrimination between tissues which may improve the detection and assessment of e.g. acute fractures or degenerative changes. Due to the denoising, significant higher SNR values were detected for ligaments and muscle of the CS DL reconstructions which further increases the image quality. CNR for cartilage/fluid and ligament/fluid was significantly higher in the CS DL reconstruction. In general, the reduced image noise of the CS DL reconstruction may be particularly useful when assessing more peripheral pathologies, e.g., injury to the biceps tendon, muscle, or scapula, and might help identify discrete pathologies which otherwise would be difficult to discriminate. In recent studies, different approaches for deep learning-based enhancement of reconstruction quality and noise reduction in CS imaging had been proposed. Manimala et al successfully implemented a CNN for fast denoising of sparse MR images corrupted with Rician noise [19]. The algorithm exploits patch-based processing in order to update and refine the dictionary of weights. Furthermore, it can be employed without estimating the noise level and preserves the local structures better compared to traditional methods like NLM [19]. Chaudhari et al implemented a deep learning-based 3D CNN called "DeepResolve" to reconstruct small-slice high-resolution images from acquired thicker slices and was able to achieve superior quantitative and qualitative diagnostic performance [35]. Quan et al developed a convolutional autoencoder and GAN which employ deeper generator and discriminator networks with cyclic data consistency loss for interpolation of the under-sampled k-space data [36]. This enables faster image acquisition but also enhanced reconstruction quality using a chained network. In a retrospective study, Koch et al successfully utilized a neuronal network for denoising of shoulder and hip MRI which was trained with a supervised learning approach using pairs of high-spatial-resolution high-signal-to-noise ratio images and synthesized low-resolution low-signalto-noise ratio images [37].
To our knowledge, none of the abovementioned studies evaluated shoulder pathologies and anatomical structures with traumatic and non-traumatic pathologies prospectively in a larger patient cohort. Furthermore, with the CS DL algorithm used in this study, image reconstruction was performed automatically during acquisition with no further time-consuming postprocessing needed. We found that CS DL reconstructions are generally applicable and reliable, yet, depending on the pathology and patient compliance, it has a larger or smaller benefit on image quality. Furthermore, our study was conducted in a clinical routine setup and our framework is applicable on every standard 3 T MRI scanner and can be implied on scanning protocols without the need of special hardware.
In our current work, the CS DL reconstructions were not accelerated, meaning there was no difference in scan times. Both CS and CS DL images were reconstructed from the same data, to minimize differences and confounders due to quality variations in the acquired data for both techniques. However, in future work, a thorough optimization of CS DL sequences is warranted to investigate the impact of CS DL on further image acceleration or increased resolution compared to what currently is possible with standard CS.
There are certain limitations to our study which need to be addressed. First, we examined an inhomogeneous patient collective of 38 patients with various disorders and injuries of the shoulder. Studies with larger study cohorts with multiple cases of one pathology are needed in the future in order to show the applicability of the CS DL reconstructions for  various pathologies. For comparison, the standard CS images were used as standard of reference but no further modality, e.g., arthroscopy, was available. Therefore, confirmation of diagnosis was not verified by an external standard of reference. Only in the acute trauma cases with occurring fractures (n = 9) an additional conventional CT scan was available. The assessment of shoulder pathologies was feasible, and the image quality and the SNR were significantly improved, while remaining accurate regarding the assessment of the pathologies when using a compressed sensing deep learning-based framework for image reconstructions. The reduced image noise improved the quality as well as visibility of anatomical landmarks compared to the standard CS reconstructions and might help with the detection of even discrete pathologies. In clinical routine, this automated reconstruction and denoising technique might be particularly useful when applied to challenging MRI acquisitions, e.g., in traumatic shoulder injuries.
Funding Open Access funding enabled and organized by Projekt DEAL. This work was funded by the German Society of Musculoskeletal Radiology (Deutsche Gesellschaft für muskuloskelettale Radiologie; DGMSR) from which G.C.F received a research grant.

Declarations
Guarantor The scientific guarantor of this publication is PD. Dr. med. Alexandra S. Gersing.
Conflict of interest K.W. is employed by Philips GmbH Market DACH but was not involved in data acquisition or analysis. The rest of the authors of this manuscript declare no relationships with any companies, whose products or services may be related to the subject matter of the article.

Statistics and biometry
No complex statistical methods were necessary for this paper.
Informed consent Written informed consent was obtained from all subjects (patients) in this study.
Ethical approval Institutional review board approval was obtained.

Methodology
• prospective • case-control study • performed at one institution Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.