Introduction

Magnetic resonance imaging (MRI) offers optimal bone and soft tissue contrast and is hence the preferred modality for the assessment of the shoulder joint [1, 2]. MRI allows detailed assessment of anatomic structures such as the rotator cuff tendons, biceps tendon, labrum, and cartilage, and of their respective pathologies, with high accuracy [3,4,5,6].

Conventional MRI of the shoulder joint is usually performed using multiplanar fast spin-echo (FSE) sequences, resulting in acquisition of high-resolution images in different contrast weightings [7, 8]. However, motion artifacts caused by laboured breathing in patients with multiple chronic conditions or by pulsation of neighbouring vessels may lower the image quality of FSE sequences [9, 10]. Because the shoulder is located peripherally in the body, patient positioning can be difficult, which may result in motion artifacts even in compliant patients who present with a painful shoulder [11, 12].

To overcome these limitations, different techniques to reduce motion artifacts have been proposed, e.g. radial k-space sampling by periodically rotated overlapping parallel lines with enhanced reconstruction (PROPELLER), also termed BLADE (Siemens) and MultiVane-XD (Philips) [13,14,15]. The PROPELLER technique collects data in concentric strips of parallel lines rotated around the centre of k-space, which enables correction of spatial variations and eventually reduction of motion artifacts. The main drawback of the PROPELLER method is usually an increase in acquisition time [16].

Deep learning–based convolutional neural networks (DL) have recently been introduced to accelerate image reconstruction of conventional sequences, as they allow the reduction of image noise and scan time while maintaining optimal image contrast [17, 18]. Most MRI protocols with deep learning–based reconstructions routinely use FSE sequences, and their successful implementation for assessment of different musculoskeletal structures has been shown in various studies [19,20,21]; however, the application of deep-learning reconstruction to PROPELLER sequences has not yet been examined.

Combining the PROPELLER acquisition technique with DL image reconstruction could allow the suppression of motion artifacts and reduce image noise and scan time at the same time.

The aim of this study was to compare the image quality and diagnostic performance of conventional PROPELLER MRI sequences with those of accelerated PROPELLER MRI sequences after DL post-processing for the assessment of the shoulder joint.

Materials and methods

Ethical board approval was obtained for this prospective cohort study. Written informed consent was obtained from each included patient.

Patients

Patients with an indication for an MRI of the shoulder between June and October 2021 were prospectively scanned and included in the study. All sequences, including conventional sequences, were acquired in the coronal oblique, sagittal oblique and axial plane using the PROPELLER technique and then transferred to the viewer for the purpose of DL-based post-processing.

An a priori power analysis was performed to determine the minimum cohort size, with an effect size of 0.5, an alpha error of 0.05, and a beta error of 0.2. A Laplace distribution of the data was assumed, resulting in a minimum cohort size of 23 patients. We therefore aimed to exceed the required 23 patients and ultimately included 30 patients.
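The reported minimum of 23 patients can be approximately reproduced under several assumptions that are not stated in the text: a two-sided paired comparison, a normal-approximation sample size formula with Guenther's small-sample correction, and an asymptotic relative efficiency (ARE) of 1.5 for the Wilcoxon signed-rank test relative to the paired t-test under a Laplace parent distribution. The Python sketch below is illustrative only; the function name is hypothetical, and this is not necessarily the procedure the authors' software applied.

```python
import math
from statistics import NormalDist


def wilcoxon_paired_sample_size(effect_size, alpha, beta, are):
    """Approximate minimum number of pairs for a two-sided signed-rank test.

    Assumption: the paired t-test sample size (normal approximation plus
    Guenther's correction term z_alpha^2 / 2) is divided by the ARE.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(1 - beta)        # quantile for power 1 - beta
    n_t_test = ((z_alpha + z_beta) / effect_size) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n_t_test / are)


# Values from the text; ARE = 1.5 is the assumed Laplace-parent efficiency.
n_min = wilcoxon_paired_sample_size(effect_size=0.5, alpha=0.05, beta=0.2, are=1.5)
print(n_min)
```

Under these assumptions the sketch yields 23 pairs; setting `are=1.0` gives the corresponding paired t-test estimate.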

A total of 30 patients between 18 and 80 years of age with a male predominance (male n = 19, female n = 11) were included in the study. A flow chart with a detailed description is shown in Fig. 1.

Fig. 1 Flow diagram of patient selection

MRI protocol and MR acquisition

The MRI examinations were performed using a 1.5-T MRI scanner (SIGNA Artist, GE Healthcare) with a dedicated 16-channel shoulder coil.

The standard examination of the shoulder joint includes axial proton density (PD) fat-saturated (FS), sagittal oblique T2-weighted FS, sagittal oblique T1-weighted, coronal oblique PD FS, and coronal oblique T1-weighted FS sequences. The same sequences were additionally acquired with markedly reduced scan time but increased image noise and were subjected to a DL reconstruction algorithm (AIR™ Recon DL®, GE Healthcare). All sequences were acquired using the PROPELLER technique.

The mean scan time of the standard protocol was 19 min 18 s, compared to 7 min 16 s for the accelerated protocol used for the DL-based reconstruction. For detailed MRI parameters, refer to Table 1.

Table 1 MRI parameters of conventional PROPELLER MRI sequences and accelerated PROPELLER MRI sequences used for post-processing

Once raw accelerated MR images were acquired, they were transferred to the post-processing software (Orchestra SDK, GE Healthcare), reconstructed using the AIR™ Recon DL algorithm, and finally labelled as DL sequences.

The AIR™ Recon DL pipeline includes a deep convolutional neural network (CNN) that operates on raw, complex-valued imaging data to produce a clean output image. Specifically, the CNN is designed to (1) provide a user-tuneable reduction in image noise, (2) reduce truncation artifacts, and (3) improve edge sharpness. Integration into the scanner’s native, inline reconstruction pipeline is critical as this provides access to raw, full bit-depth data. The CNN contains 4.4 million trainable parameters in approximately 10,000 kernels. It is a convolutional network, making it suitable for all MR-relevant image sizes. The CNN was trained with a supervised learning approach using pairs of images representing near-perfect and conventional MRI images. The near-perfect training data consisted of high-resolution images with minimal ringing and very low noise levels. The conventional training data were synthesized from near-perfect images using established methods to create lower-resolution versions with more truncation artifacts and with higher noise levels [22].
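How a lower-quality training counterpart might be synthesized from a near-perfect reference can be illustrated with a one-dimensional toy example. The Python sketch below is an assumption-laden illustration, not the vendor's method: it truncates the high spatial frequencies of a signal's discrete Fourier transform (producing lower resolution and truncation ringing) and adds complex Gaussian noise in k-space. The function names, the box-shaped reference profile, and the parameter values are all hypothetical.

```python
import cmath
import random


def dft(signal):
    """Naive discrete Fourier transform (adequate for short 1-D profiles)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def degrade(reference, keep, noise_sd, seed=0):
    """Return a complex-valued, lower-quality copy of a 1-D reference profile.

    keep     -- highest absolute frequency index retained (k-space truncation)
    noise_sd -- SD of the complex Gaussian noise added to each retained sample
    """
    rng = random.Random(seed)
    n = len(reference)
    degraded = []
    for k, value in enumerate(dft(reference)):
        freq = k if k <= n // 2 else k - n  # signed frequency index
        if abs(freq) > keep:
            degraded.append(0j)             # discard high spatial frequencies
        else:
            noise = complex(rng.gauss(0, noise_sd), rng.gauss(0, noise_sd))
            degraded.append(value + noise)
    return idft(degraded)


# Hypothetical "near-perfect" profile: a sharp-edged box of 64 samples.
reference = [1.0 if 16 <= t < 48 else 0.0 for t in range(64)]
conventional = degrade(reference, keep=8, noise_sd=0.5)
magnitude = [abs(v) for v in conventional]  # magnitude profile, as displayed
```

A network trained on such pairs would take the degraded profile as input and the reference as target; real pipelines operate on two-dimensional complex-valued k-space data.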

The dedicated software for image post-processing (Orchestra SDK, GE Healthcare) uses the AIR™ Recon DL algorithm to remove noise and Gibbs ringing artifacts from the raw data used as input before the final image is calculated [22]. The level of SNR improvement in post-processing can be set to low, medium, or high. A maximum level of 100% for SNR improvement was used for DL sequences.

All image data sets were eventually sent to the PACS (IMPAX 6; Agfa-Gevaert N.V.) of our department for further analysis.

Image analysis

In the initial training, a set of fifteen MRI examinations of the shoulder joint not included in the study sample was evaluated using conventional PROPELLER and post-processed DL sequences. Discrepancies were thoroughly discussed until agreement was reached. Then, MRI images were assessed independently by two readers (a board-certified radiologist with 6 years of experience and a board-certified radiologist with more than 15 years of experience in musculoskeletal radiology) blinded to any clinical information. All image sets were stripped of all sequence identifiers (conventional vs. DL sequences) and then mixed. The readers reviewed all images in a random order. After the readouts were performed, information on the sequence type was revealed for the purpose of the statistical analysis.

Intra-reader agreement was assessed from a repeat readout performed by the board-certified radiologist (M.K.) 8 weeks after the initial readout.

Qualitative assessment of image quality

The image quality of conventional and DL sequences was assessed separately for bone and cartilage (humeral and glenoid), glenoid labrum, muscle (deltoid muscle and muscles of the rotator cuff), rotator cuff tendons, long head of biceps tendon, subcutaneous fat, and acromioclavicular joint using a 5-point Likert scale (0 = poor, 1 = mild, 2 = moderate, 3 = good, 4 = perfect). The readers were instructed how to score image quality based on previously shown image examples illustrating each grade of the 5-point Likert scale. A score of 4 indicated the best image quality, with high image sharpness, no detectable image noise, and perfect delineation of the analysed structures without any inhomogeneities or signal changes. A score of 3 was assigned to images comparable to routine daily image quality, that is, marginally inferior image quality with minor image noise but very good delineation of the analysed structures without notable inhomogeneities. A score of 2 was given to images of considerably lower image quality with easily detectable image noise and with significant but not disturbing inhomogeneities, in which delineation of the structures of the shoulder joint was preserved. Scores of 1 and 0 were given to images of poor image quality with substantial noise, in which delineation of the analysed structures was markedly distorted (score 1) or almost impossible (score 0).

Diagnostic confidence for evaluation of the above-mentioned structures, together with contour sharpness and homogeneity of fat saturation in the central and peripheral field of view (FOV), was rated using a 5-point scale (0 = poor, 1 = mild, 2 = moderate, 3 = good, 4 = perfect). Readers rated diagnostic confidence as follows: score of 4: perfect lesion detection, a very high suspicion of a lesion; score of 3: good lesion detection, a high suspicion of a lesion; score of 2: lesion detection still possible, moderate suspicion of a lesion; score of 1: lesion detection hardly possible; and score of 0: inadequate assessment of any pathologies.

Central FOV was defined as the region of the glenohumeral joint at the level of midportion of glenoid, peripheral FOV as the most medial part of the pectoralis major muscle. Presence of motion artifacts was additionally evaluated in both image sets.

Shoulder structures and associated pathologies

Evaluation of different joint structures was performed as follows: any pathological finding of the bone was noted and described. Cartilage was assessed as either (0) normal and homogenous, (1) focal areas of inhomogeneities with normal contour, (2) partial-thickness cartilage loss of less than 50%, or (3) partial-thickness cartilage loss of more than 50% or full-thickness cartilage loss with exposed subchondral bone. Muscle quality of the rotator cuff muscles was assessed as described by Goutallier et al [23, 24]. The quality of the supraspinatus (SSP) tendon was categorized as (0) normal, (1) tendinopathy, (2) articular-sided partial-thickness tear, (3) bursal-sided partial-thickness tear, and (4) full-thickness tear. The infraspinatus (ISP) and subscapularis (SSC) tendons were characterized as (0) normal, (1) tendinopathy, (2) partial-thickness tear, and (3) full-thickness tear. The quality and position of the long head of biceps tendon (LHBT) in the bicipital groove was evaluated as follows: (0) normal, (1) tendinopathy, (2) subluxation but still within the bicipital groove, and (3) displaced from the bicipital groove. The glenoid labrum was categorized as (0) normal, (1) mild, (2) moderate, (3) advanced degeneration, and (4) torn. The acromioclavicular (AC) joint was evaluated as (0) normal, (1) mild, (2) moderate, or (3) advanced degeneration. The subacromial bursa was characterized as (0) not visible; (1) less than 2 mm, considered normal; and (2) thickened over 2 mm, considered abnormal as described by White et al [25]. Each structure was assessed in all planes of the acquired image sets and sequences.

Quantitative assessment of the image quality

To quantitatively assess image quality, the signal-to-noise ratio (SNR) and the contrast-to-noise ratio (CNR) were measured for both sequences. Regions of interest (ROIs) of 5 mm² were placed separately on each image set to define the signal intensity (SI) in bone (in the humeral head), muscle (deltoid muscle), and subcutaneous fat. The noise was defined as the standard deviation (SD) of the SI in an ROI measured in extracorporeal air.

The SNR and CNR were calculated as:

$$
\begin{aligned}
\mathrm{SNR} &= \frac{\mathrm{SI}}{\mathrm{SD}(\mathrm{air})} \\
\mathrm{CNR}(\mathrm{bone}) &= \frac{\mathrm{SI}(\mathrm{bone}) - \mathrm{SI}(\mathrm{muscle})}{\mathrm{SD}(\mathrm{air})} \\
\mathrm{CNR}(\mathrm{fat}) &= \frac{\mathrm{SI}(\mathrm{fat}) - \mathrm{SI}(\mathrm{muscle})}{\mathrm{SD}(\mathrm{air})}
\end{aligned}
$$
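As a minimal worked example, the SNR and CNR definitions can be written as two small Python functions. The function names and the ROI values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not study data.

```python
def snr(signal_intensity, sd_air):
    """Signal-to-noise ratio: mean ROI signal divided by the SD of air."""
    if sd_air <= 0:
        raise ValueError("sd_air must be positive")
    return signal_intensity / sd_air


def cnr(si_tissue, si_muscle, sd_air):
    """Contrast-to-noise ratio of a tissue relative to muscle."""
    if sd_air <= 0:
        raise ValueError("sd_air must be positive")
    return (si_tissue - si_muscle) / sd_air


# Hypothetical mean signal intensities (arbitrary units) and noise SD.
si = {"bone": 300.0, "muscle": 200.0, "fat": 500.0}
sd_air = 10.0

snr_values = {tissue: snr(value, sd_air) for tissue, value in si.items()}
cnr_bone = cnr(si["bone"], si["muscle"], sd_air)
cnr_fat = cnr(si["fat"], si["muscle"], sd_air)
```

Because both ratios share the same denominator, a reduction in background noise raises SNR and CNR proportionally even when tissue signal is unchanged.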

Statistical analysis

All findings of image quality and diagnostic confidence were summarized and compared between conventional and DL sequences using a Wilcoxon signed-rank test [26]. Correlation between image quality and diagnostic confidence was calculated using Spearman rank correlation. A Shapiro-Wilk test was applied to assess the normality of the distribution of findings [27,28,29]. If a significant difference between sequences was noticed, a Bonferroni-Holm post hoc test for multiple comparisons was additionally performed [30].
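The Bonferroni-Holm step-down adjustment can be sketched in a few lines of Python. This is a generic illustration of the procedure rather than the SPSS implementation used in the study; the function name and the p values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so that adjusted values never decrease.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p values from four pairwise comparisons.
raw = [0.004, 0.030, 0.020, 0.500]
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

Compared with the plain Bonferroni correction, which multiplies every p value by the number of tests, the step-down scheme is less conservative while still controlling the family-wise error rate.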

Agreement between conventional and DL sequences and inter-reader and intra-reader reliability for image quality and diagnostic confidence were calculated using the intraclass correlation coefficient (ICC) [31]. ICC values under 0.5 were considered to indicate poor, between 0.5 and 0.75 moderate, between 0.75 and 0.9 good, and over 0.9 excellent reliability [32].

Pathologies of all the evaluated structures were recorded separately for each reader. Cohen’s kappa statistic was applied for inter-reader and intra-reader agreement for evaluation of pathological findings [33, 34]. Kappa values between 0.41 and 0.60 were considered moderate, between 0.61 and 0.80 substantial, and between 0.81 and 1.00 almost perfect agreement [35]. A p value < 0.05 was considered significant. All statistical analyses were conducted using SPSS, v. 26.0 (IBM).
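The interpretation bands for ICC and kappa can be expressed as simple lookup functions, as in the Python sketch below. The function names are hypothetical, and the handling of values that fall exactly on a boundary (for example an ICC of exactly 0.75) is an assumption, because the text does not specify it. Kappa values below 0.41 are not categorized in the text and are returned here as "below moderate".

```python
def interpret_icc(icc):
    """Map an ICC value to the reliability bands used in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def interpret_kappa(kappa):
    """Map a Cohen's kappa value to the agreement bands used in the text."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    return "below moderate"


# Illustrative values only.
icc_labels = [interpret_icc(v) for v in (0.45, 0.66, 0.84, 0.93)]
kappa_labels = [interpret_kappa(v) for v in (0.52, 0.74, 0.84)]
```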

Results

In total, 30 patients (11 females, 19 males; age range 18–80 years) were included in the study. Patients’ characteristics (including age, gender, side of shoulder joint), indications for the shoulder MRI, and MRI findings are found in the Supplementary materials (Table 1.supp).

Qualitative image quality and diagnostic confidence

The mean image quality of conventional and DL sequences in the assessment of bone was 2.6 and 3.8, respectively, for cartilage 2.2 and 3.6, for rotator cuff muscles 2.6 and 3.7, and for the glenoid labrum 2.6 and 3.5.

The mean diagnostic confidence for evaluation of bone was 3.5 and 3.8 for conventional and DL sequences, 3.6 and 3.9 for rotator cuff muscles, 2.8 and 3.7 for cartilage, and 2.9 and 3.6 for the glenoid labrum.

The mean image quality and diagnostic confidence were significantly better for DL sequences compared to conventional sequences for all analysed structures of the shoulder joint (p < 0.05). Detailed information is found in Tables 2 and 3. Examples of both sequences are shown in Fig. 2.

Table 2 Image quality of all analysed structures of the shoulder joint in conventional and post-processed sequences using deep learning–based convolutional neural network (DL). For the assessment of the image quality of bone and cartilage, both the glenoid and the humeral side were evaluated. Image quality of conventional and DL sequences was assessed using a 5-point Likert scale (0 = poor, 1 = mild, 2 = moderate, 3 = good, and 4 = perfect)
Table 3 Diagnostic confidence of all analysed structures of the shoulder joint in conventional and post-processed sequences using deep learning–based convolutional neural network (DL). Diagnostic confidence of conventional and DL sequences was assessed using a 5-point Likert scale (0 = poor, 1 = mild, 2 = moderate, 3 = good, and 4 = perfect)
Fig. 2 MR images of the right shoulder joint of a 32-year-old male. a Conventional PROPELLER coronal oblique proton density (PD) fat-saturated (FS) image, b coronal oblique PD FS image after post-processing using a deep learning–based convolutional neural network (DL), c conventional PROPELLER sagittal oblique T2-weighted (T2w) FS image, d sagittal oblique T2w FS image after post-processing using DL, e conventional PROPELLER axial PD FS image, f axial PD FS image after post-processing using DL, g conventional PROPELLER sagittal oblique T1-weighted (T1w) image, and h sagittal oblique T1w image after post-processing using DL

Assessment of shoulder structures and associated pathologies

Thickening of the subacromial bursa was identified in 17 cases using the DL sequences but in only 7 cases using the conventional sequences, a significant difference between sequences in the delineation of the subacromial bursa (p < 0.05). The other analysed structures and associated pathological findings could be evaluated adequately with both conventional and DL sequences.

The summary of pathologies of all analysed structures in conventional and DL sequences as assessed by both readers can be found in Table 4. Examples of both sequences with pathological findings are shown in Figs. 3, 4 and 5.

Table 4 The summary of all analysed structures in conventional and post-processed sequences using deep learning–based convolutional neural network (DL)
Fig. 3 MR images of the right shoulder joint of a 71-year-old female. a Conventional PROPELLER coronal oblique proton density (PD) fat-saturated (FS) image, b coronal oblique PD FS image after post-processing using a deep learning–based convolutional neural network (DL) showing the subacromial bursa (arrow). The presence of a small amount of fluid within the subacromial bursa with a slight thickening of the subacromial bursa is clearly visible in DL sequences

Fig. 4 MR images of the right shoulder joint of a 40-year-old female with shoulder pain after anterior shoulder dislocation. a Conventional PROPELLER axial proton density (PD) fat-saturated (FS) image, b axial PD FS image after post-processing using DL, showing a tear of the anterior midportion of the glenoid labrum (arrow). The pathology can be suggested both in conventional and post-processed MR sequences; however, it is more sharply delineated in the post-processed sequence

Fig. 5 MR images of the right shoulder joint of a 78-year-old male with chronic shoulder pain. a Conventional PROPELLER sagittal oblique T1-weighted (T1w) image, b sagittal oblique T1w image after post-processing using DL. The images show degenerative changes of the acromioclavicular joint (broad white arrow), subchondral cysts in the humeral head (thin white arrow), and a joint effusion (triangle). All pathologies can be delineated in both sequences; however, the post-processed sequence is less noisy, so the pathologies can be identified more easily

Quantitative assessment of the image quality (SNR and CNR)

The mean SNR for bone, muscle, and fat was higher for DL sequences compared to conventional sequences with significant difference for muscle and fat (p < 0.05), but with no significant difference for bone (p > 0.05) (Table 7 and 8.supp).

The mean CNR was significantly higher for DL sequences compared to conventional sequences (p < 0.05).

Box plots for SNR and CNR are shown in Fig. 6. No motion artifacts were noted in any of the analysed image sets.

Fig. 6 SNR (a) for bone, muscle, and fat and CNR (b) for post-processed sequences using DL and conventional sequences. Mean SNR for bone, muscle, and fat was higher for post-processed sequences using DL compared to conventional sequences with significant difference for muscle and fat (p < 0.05), but with no significant difference for bone. Mean CNR was significantly higher for post-processed sequences using DL compared to conventional sequences (p < 0.05)

Inter-reader agreement

There was a moderate overall inter-reader agreement for assessment of image quality, with ICC values of 0.659 for conventional sequences and 0.582 for DL sequences. Inter-reader agreement for assessment of diagnostic confidence was likewise moderate, with ICC values of 0.695 and 0.595 for conventional and DL sequences, respectively.

Detailed findings with inter-reader agreement for evaluation of image quality and diagnostic confidence are found in Supplementary materials (Table.supp 2 and 3).

In the evaluation of pathological findings, there was an almost perfect inter-reader agreement for evaluation of the rotator cuff muscles with kappa values of 0.947 and 0.892 for conventional and DL sequences, and a moderate agreement for assessment of the cartilage with kappa values of 0.524 and 0.532.

Inter-reader agreements for assessment of pathological findings are shown in Table 5.

Table 5 Inter-reader agreement for assessment of pathological findings of all investigated structures for conventional and post-processed sequences using deep learning convolutional neural network (DL). Inter-reader reliability for image quality and diagnostic confidence was calculated using the intraclass correlation coefficient (ICC). ICC values under 0.5 were considered to indicate poor, between 0.5 and 0.75 moderate, between 0.75 and 0.9 good, and over 0.9 excellent reliability

Intra-reader agreement

There was a good overall intra-reader agreement for assessment of image quality, with ICC values of 0.837 and 0.898 for conventional and DL sequences, and for assessment of diagnostic confidence, with ICC values of 0.883 and 0.819, respectively.

There was an almost perfect intra-reader agreement for evaluation of the supraspinatus tendon, with kappa values of 0.907 and 0.815 for conventional and DL sequences, and of the glenoid labrum, with kappa values of 0.821 and 0.860, respectively. Intra-reader agreement for assessment of the cartilage was substantial for conventional sequences and almost perfect for DL sequences, with kappa values of 0.741 and 0.841. Overall, there was an almost perfect intra-reader agreement for assessment of the pathological findings of all analysed structures, with kappa values of 0.840 and 0.842 in conventional and DL sequences.

Intra-reader agreements for all analysed parameters are shown as Supplementary material (Table.supp 4-6).

Discussion

To the best of our knowledge, this is the first study to combine the PROPELLER MR acquisition technique with a DL image reconstruction approach for imaging of the shoulder joint. The PROPELLER DL sequences showed substantially higher image quality of investigated anatomical structures of the shoulder joint compared to conventional PROPELLER sequences, resulting in higher diagnostic confidence and comparable diagnostic performance.

Dietrich et al described the use of the PROPELLER technique for MRI of the shoulder as a useful method for reduction of motion artifacts while increasing image quality [13]. The PROPELLER technique collects data in concentric strips of parallel lines rotated around the centre of k-space, which enables correction of spatial variations and eventually reduction of motion artifacts [16]. The main drawback of the PROPELLER method is usually an increase in acquisition time. In our study, the mean acquisition time of the conventional PROPELLER sequences was 19 min 18 s and could be reduced to 7 min 16 s in the accelerated sequences used for post-processing using DL. This equals a reduction in scan time of 62%.
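The stated reduction follows directly from the two mean scan times. A short Python check of the arithmetic (the helper name is hypothetical):

```python
def to_seconds(minutes, seconds):
    """Convert a scan time given as minutes and seconds to seconds."""
    return minutes * 60 + seconds


standard = to_seconds(19, 18)    # conventional PROPELLER protocol
accelerated = to_seconds(7, 16)  # accelerated protocol used for DL

# Relative reduction in scan time, as a fraction of the standard protocol.
reduction = (standard - accelerated) / standard
print(f"{reduction:.1%}")
```

The two protocols take 1158 s and 436 s, so the saving of 722 s corresponds to about 62% of the standard scan time.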

In conventional MR image reconstruction, suppression of Gibbs ringing artifacts results in a loss of spatial resolution and a lower image quality. With application of deep-learning-based vendor software used for the image post-processing, it is possible to suppress ringing artifacts while maintaining high image quality and resolution [22]. As expected, there was an overall better image quality and diagnostic confidence of DL sequences compared to conventional sequences. However, the delineation and detection of most pathological findings was equally possible using both sequences. The only exception was the subacromial bursa which could be better delineated and assessed in DL sequences. This may be explained by the higher image quality and subsequent easier delineation of subtle structures such as the subacromial bursa in DL sequences.

DL-based image reconstruction using FSE MR sequences has been applied for imaging of different organs including the brain, liver, heart, and peripheral nerves [21, 36,37,38,39,40]. Application of the PROPELLER technique with DL-based image reconstruction for imaging of the brain and prostate has been recently described and resulted in improvement of the SNR and image sharpness [41, 42].

The PROPELLER technique is a well-established method for image acquisition when motion reduction is desired especially in the imaging of the abdomen, lung, or shoulder joint [13, 16, 43,44,45,46,47]. Blood flow in the axillary vessels could be a potential source of pulsation artifacts in shoulder MRI; therefore, the use of the PROPELLER technique should be considered to minimize motion artifacts. Application of the PROPELLER technique in our study resulted in suppression of motion artifacts in both conventional and DL sequences, and no motion artifacts were noted.

These findings are in accordance with the study of Hahn et al who investigated the retrospective application of DL reconstructions for fast spin-echo sequences for accelerated shoulder MRI [19]. The mean scan time for accelerated MRI sequences in the study of Hahn et al was 3 min 5 s with the image quality lower than that in conventional MRI sequences, whereas application of deep-learning reconstruction resulted in image quality comparable with that of conventional sequences. While Hahn et al performed a retrospective study, we prospectively investigated a PROPELLER acquisition technique to minimize motion artifacts in combination with DL reconstruction. The substantial reduction of scan time not only allows for higher patient throughput per scanner but also likely affords higher patient comfort.

There was a moderate inter-reader agreement for image quality on conventional and DL sequences, and a moderate agreement for diagnostic confidence on both conventional and DL sequences. The unfamiliar image impression of the novel DL sequences for readers who were accustomed to reading conventional MR images may have affected the subjective perception of image quality.

Our study has several limitations. First, all MR images were acquired using the PROPELLER technique; hence, we did not compare conventional FSE and PROPELLER sequences as the basis for the accelerated acquisitions used for DL post-processing. While FSE sequences have conventionally been used with DL reconstruction, as previously described, there is a lack of literature on the application of the PROPELLER technique for deep-learning-based reconstructions [19,20,21].

Second, we did not follow up the patients with suspected injuries of the shoulder joint, and there was no correlation of MRI findings with an arthroscopic reference standard. Nevertheless, a good correlation of MR findings and arthroscopy in evaluation of the shoulder pathologies has been described in previous studies with a high accuracy in diagnosis of rotator cuff tears, osteochondral defects, and some labral tears, and in assessment of the muscle quality [48,49,50,51,52,53,54]. Moreover, the main aim of this study was to compare image quality and diagnostic performance of the conventional PROPELLER technique versus those using DL reconstructions. Finally, we did not analyse the impact of the PROPELLER technique on image quality and diagnostic performance for imaging of shoulder implants and postoperative susceptibility artifacts. This would be interesting to analyse in further studies.

In summary, the motion-corrected PROPELLER MR imaging technique with DL post-processing showed superior image quality and higher diagnostic confidence compared to the conventional PROPELLER sequences in imaging of the shoulder joint. Pathologies of the shoulder joint can be assessed correctly in the conventional PROPELLER and DL sequences. Due to significantly shorter scan times and higher SNR and CNR compared to conventional sequences, post-processed PROPELLER sequences using DL could be considered for clinical use after further validation at other sites.