Introduction

The incidence of shoulder complaints is high. Rotator cuff pathology is the most common cause of shoulder pain with a reported incidence of 5–40% [8]. Instability is another commonly presented shoulder problem, mostly resulting from lesions of the capsulo–labral complex.

MR imaging is a commonly used non-invasive test for assessing lesions of the glenoid labrum and musculotendinous units of the rotator cuff with high accuracy [1, 2, 5, 6, 9–11, 16–18, 21, 23, 25, 26].

However, in daily clinical practice radiologists and orthopaedic surgeons frequently differ in the interpretation of MR examinations of the shoulder. This disagreement is even more pronounced when arthroscopic findings are compared with the clinical MR report. What causes this divergence, and whether it reflects a true difference in interpretation or merely a different use of terminology, has not been investigated before. The aim of this study is to evaluate the inter-observer agreement between orthopaedic surgeons and radiologists in the assessment of MR examinations of the shoulder in daily clinical practice. Furthermore, we evaluate the accuracy of orthopaedic surgeons and radiologists in predicting shoulder pathology based on MR imaging, with arthroscopic findings as the standard of reference.

Materials and methods

Patients

The study subjects in this investigation consisted of a group of patients with clinically suspected shoulder pathology who underwent either unenhanced MR imaging or MR arthrography and subsequently arthroscopy of the shoulder at our institution, from January 2007 to January 2010. A total of 73 patients were considered for enrolment in this study. Patients were excluded if imaging quality was impaired (due to movement artefacts or otherwise) or if the arthroscopy report was not available. Furthermore, patients were excluded if arthroscopy was performed more than 180 days after MR imaging. After applying these exclusion criteria 65 patients remained. Of these patients 50 were selected, so that 25 had unenhanced MR imaging and 25 had MR arthrography of the shoulder joint. Mean time between imaging and arthroscopy was 76 days (median 65 days, range 22–174 days). The 50 patients included in this study (30 men and 20 women) were aged 17 to 79 years (mean 44, median 46) at the time of MR imaging. The spectrum of pathology in the patients ranged from no abnormalities to the presence of multiple lesions. Four patients had prior surgery of the investigated shoulder.

As shown in Table 1, the majority of patients (54%) that were included in this study presented with shoulder pain. After physical examination 34% of the patients were suspected of having subacromial impingement syndrome, in 18% a rotator cuff tear was suspected. One patient (2%) presented with frozen shoulder syndrome. 46% of the patients in this study presented with unidirectional shoulder instability. Institutional review board approval was not needed for this retrospective study.

Table 1 Patient characteristics (n = 50)

MR imaging protocol

All MR images were obtained at our institution using a 1.5 T MRI (Siemens, either type Avanto or Espree) with a standard shoulder coil. High resolution and small field of view imaging was performed.

For unenhanced MR imaging of the shoulder the following sequences were obtained: T1-weighted turbo-spin-echo images in the axial plane and T1-weighted spin-echo sequences in the coronal–oblique plane. T2-weighted turbo-spin-echo images were acquired in the sagittal plane and T2-weighted turbo spin-echo sequence with fat suppression in the coronal–oblique plane.

MR arthrography was routinely performed with fluoroscopic guidance and in most cases a posterior approach. Intra-articular needle placement was verified with the injection of 1–5 mL Iomeprol (Iomeron® 300 mg/mL). Thereafter 0.5 mL Gadoteridol (ProHance® 2.793 g/10 mL) was diluted in 100 mL 0.9% saline solution. Of this mixture, 15–20 mL was injected into the glenohumeral joint. Following arthrography, T1-weighted turbo-spin-echo sequences with fat suppression were obtained in the axial plane and T1-weighted VIBE sequences in the coronal–oblique plane, from which multiplanar reconstructions were obtained in the coronal–oblique, axial and sagittal planes. Proton density and T2-weighted turbo-spin-echo images were acquired in the sagittal plane and the same sequences with fat suppression in the coronal–oblique plane.

Image analysis

Images were interpreted by two radiologists (radiologist 1 and 2) and one orthopaedic surgeon, all three with vast experience in shoulder pathology. MR examinations were retrospectively reviewed and scored independently by all three observers. The observers were blinded to the patients’ name, date of birth and patient number. This ensured that none of the observers had access to the arthroscopy report or the clinical MR report. CD-ROMs with the MR images could only be read on “stand alone” computers (that were not linked to the hospital network) with standard monitor quality. Standard evaluation forms, developed by the authors, were used to score for pathology of the rotator cuff, glenoid labrum, tendon of the long head of the biceps brachii (biceps tendon), labral–bicipital complex (SLAP lesion) and glenohumeral ligaments. Furthermore, the presence or absence of glenohumeral osteoarthritis, a Bankart lesion, a Hill-Sachs lesion or impingement of the rotator cuff was noted on the evaluation forms. If a rotator cuff lesion was found to be present, its location and size were scored on the evaluation forms. The location of a rotator cuff tear was described in terms of which muscle/tendon was affected (supraspinatus, infraspinatus, subscapularis or teres minor). A partial rotator cuff tear was categorised according to the proportion of rotator cuff thickness affected (less or more than 50%). A full thickness rotator cuff tear was categorised according to the size of the biggest gap in the coronal plane (less or more than 3 cm). Furthermore, if a lesion of the glenoid labrum was scored on the evaluation forms, the location of the lesion had to be noted as superior, inferior, anterior or posterior. If a ligamentous lesion was noted to be present, the exact ligament affected (the superior, middle or inferior glenohumeral ligament) was also noted.

The observers were not provided with instructions about specific criteria to use for interpreting the MR images. Moreover, they were specifically asked to assess the MR examinations as they would in daily practice. Still, the observers were aware of the standard criteria established in the literature for the assessment of labral and rotator cuff pathology [9].

All three observers assessed the 50 MR examinations twice, with a 2-week interval between the appraisal of the first and second series. The first series was assessed in a different order than the second series.

Arthroscopy

All arthroscopies were performed at the same institution, Diakonessenhuis Hospital, Utrecht, the Netherlands. Of the 50 arthroscopies included in this study, 47 (94%) were performed by the same experienced orthopaedic surgeon. Both shoulder surgeons at our institution routinely perform arthroscopy using a posterior and an anterior portal consecutively, followed by subacromial bursoscopy. The evaluation of the glenohumeral joint is performed at our institution following a protocol that is concordant with the 15-point Anatomy Review as described by Snyder [19]. Subacromial bursoscopy is also performed in a standardised manner concordant with Snyder’s Eight-point Bursal Anatomy Review by all our shoulder surgeons [20].

Patients clinically suspected of having subacromial impingement syndrome (positive Neer test) who were found to have spurs around the inferior portion of the acromion or the acromioclavicular joint on plain radiographs and/or arthroscopy, as well as patients with typical soft tissue changes at arthroscopy, were considered to have subacromial impingement. These typical changes encompass fraying of the bursal floor and rotator cuff tendon, a partial bursal-sided rotator cuff tear or a degenerative full thickness rotator cuff tear.

The arthroscopic findings were noted by the performing surgeon in the arthroscopy report, using the same systematic approach as for performing the arthroscopy itself.

Data collection

Data on the MR findings were obtained from the standard evaluation forms. Data on the arthroscopic findings were collected from the surgical records. If the rotator cuff or glenoid labrum was described to be “degenerative” in the arthroscopy report, this was regarded as negative for the scoring of rotator cuff tears or labral lesions. Furthermore, structures that were not mentioned in the arthroscopy report were assumed to be normal. The arthroscopy reports that were reviewed in this study did not contain any intra-operative photographs.

Statistical analysis

Intra- and inter-observer agreement for the presence or absence of a given lesion on MR imaging was calculated using the kappa statistic in SPSS. Kappa values were calculated manually when asymmetry of cross tables prevented the calculation by SPSS. The kappa values can be interpreted as poor (K = 0), slight (K = 0.00–0.20), fair (K = 0.21–0.40), moderate (K = 0.41–0.60), substantial (K = 0.61–0.80) and almost perfect agreement (K = 0.81–1.00) [15].
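The kappa calculation described above can be illustrated with a minimal Python sketch. This is not the SPSS procedure used in the study; the function names `cohen_kappa` and `interpret_kappa` and the example ratings are hypothetical. The sketch takes the union of the categories used by both observers, so it also covers the asymmetric cross tables mentioned above, and it assumes that values of zero or below are labelled poor, as in the scale given in this section.

```python
from collections import Counter
from math import isnan


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two observers who rated the same cases."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both observers must rate the same non-empty set of cases")
    n = len(ratings_a)
    # Observed proportion of cases on which the two observers agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Union of categories: a category used by only one observer still counts,
    # which is what makes an "asymmetric" cross table computable.
    categories = set(counts_a) | set(counts_b)
    # Agreement expected by chance from the marginal totals.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return float("nan")  # both observers used one identical category only
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Label a kappa value using the scale quoted in the text (assumption: K <= 0 is poor)."""
    if isnan(kappa):
        return "undefined"
    if kappa <= 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings of ten examinations (not study data).
radiologist = ["present", "present", "absent", "absent", "present",
               "absent", "present", "absent", "absent", "absent"]
surgeon = ["present", "absent", "absent", "absent", "present",
           "absent", "present", "present", "absent", "absent"]
kappa = cohen_kappa(radiologist, surgeon)
print(round(kappa, 2), interpret_kappa(kappa))
```

In this made-up example the observers agree on 8 of 10 cases, chance agreement is 0.52, and kappa is about 0.58, which falls in the moderate band.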

Accuracy of detecting pathology of the glenohumeral joint was determined for all three observers. The sensitivity and specificity for each observer were calculated per lesion type. For each reader, the percentages of correctly diagnosed lesions per lesion type, as confirmed by the arthroscopy report, were calculated. Differences in the percentage of correct diagnoses among the observers were tested for significance using the McNemar statistic. Differences were considered significant at the 5% level (p < 0.05). Statistical evaluations were performed using SPSS 17.0 software.
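As an illustration of these accuracy measures, the following Python sketch computes sensitivity and specificity against a reference standard and compares two observers with McNemar's test. The text does not state which variant of McNemar's test was run in SPSS, so the exact binomial (two-sided) form used here is an assumption. The function names and the example data are hypothetical.

```python
from math import comb


def sensitivity_specificity(predicted, reference):
    """Sensitivity and specificity of one observer against the reference standard."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # lesion scored and confirmed
    fn = sum(1 for p, r in pairs if not p and r)      # lesion missed
    tn = sum(1 for p, r in pairs if not p and not r)  # correctly scored as absent
    fp = sum(1 for p, r in pairs if p and not r)      # lesion scored but not confirmed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def mcnemar_exact_p(correct_a, correct_b):
    """Two-sided exact McNemar p-value for paired correct/incorrect diagnoses."""
    # Only discordant pairs carry information about a difference between observers.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    # Binomial tail probability under the null hypothesis p = 0.5, doubled for two sides.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical scores for six examinations (1 = lesion present), not study data.
observer = [1, 1, 0, 0, 0, 1]
arthroscopy = [1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(observer, arthroscopy)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")

# Hypothetical paired correctness for two observers over ten examinations.
correct_a = [True] * 9 + [False]
correct_b = [False] * 8 + [True, False]
print(f"McNemar exact p={mcnemar_exact_p(correct_a, correct_b):.4f}")
```

An observer who never scores a lesion that is present in only a few patients obtains a sensitivity of 0 with a specificity of 1, for example `sensitivity_specificity([0, 0, 0, 0], [1, 0, 0, 1])`.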

Results

As shown in Table 2, according to the arthroscopy reports of our 50 study subjects, 16 patients (32%) had rotator cuff pathology, 18 patients (36%) had labral pathology and 7 patients (14%) had a lesion of the biceps tendon. In two patients (4%), several glenohumeral ligaments were ruptured and in three patients (6%) a Hill-Sachs defect was found at arthroscopy. In nine study subjects, degenerative changes of the glenohumeral joint were found at arthroscopy; one of these patients (2%) had osteoarthritis. There was impingement of the rotator cuff in 26 (52%) of our study subjects. In two subjects (4%), no abnormalities were found at arthroscopy. Of the patients with rotator cuff pathology, seven patients had a partial thickness tear, eight patients had a full thickness tear of the rotator cuff and one patient had both. The surgical reports did not provide information on the size of the partial thickness rotator cuff tears. In six of the eight full thickness rotator cuff tears, the arthroscopy report did not give conclusive information on the extent of retraction in the coronal plane.

Table 2 Incidence of pathology as found at arthroscopy (n = 50)

Table 3 summarises the inter-observer agreement among radiologists 1 and 2 and the orthopaedic surgeon per lesion type assessed on all MR examinations. The most notable finding listed here is the poor inter-observer agreement among the radiologists and the orthopaedic surgeon in assessing whether a Bankart lesion is present, absent or “not interpretable”. The inter-observer agreement on the presence or absence of impingement among the radiologists is moderate. However, the agreement between the radiologists and the orthopaedic surgeon is fair to slight. Furthermore, among the radiologists there is a fair inter-observer agreement in assessing a Hill-Sachs lesion. There is also a fair agreement between radiologist 1 and the orthopaedic surgeon. However, radiologist 2 and the orthopaedic surgeon have a poor agreement on the presence or absence of a Hill-Sachs lesion.

Table 3 Inter-observer agreement among radiologist 1 and 2 and the orthopaedic surgeon per lesion type assessed on all MR-examinations (n = 50)

As shown in Table 4, when assessing a Bankart lesion on enhanced MR images alone there is a perfect agreement among radiologists (kappa 1.00). The agreement between the radiologists and the orthopaedic surgeon is poor (kappa 0.0). Also, there is a perfect agreement among radiologists in assessing labral lesions, but the agreement between radiologists and the orthopaedic surgeon is fair. The radiologists have substantial agreement on the presence or absence of a lesion of the glenohumeral ligaments, whereas the agreement between radiologists and the orthopaedic surgeon is slight.

Table 4 Inter-observer agreement among radiologist 1 and 2 and the orthopaedic surgeon per lesion type assessed on enhanced MR images (n = 25)

The percentages of correctly diagnosed lesions confirmed by the arthroscopy report are summarised in Tables 5 and 6. As marked in Table 5, radiologist 2 has the highest percentage of correctly diagnosed Hill-Sachs lesions, which is significantly higher than that of the other two observers (p < 0.05; McNemar’s test). Radiologist 1 is significantly less accurate in assessing osteoarthritis than the other observers. The orthopaedic surgeon has the highest percentage of correctly diagnosed impingement and is significantly more accurate than radiologist 2. Furthermore, the orthopaedic surgeon is significantly the most accurate observer in determining the cause of impingement. The remaining findings listed in Tables 5 and 6 are not significantly different among the observers.

Table 5 Percentage of correctly diagnosed lesions confirmed by arthroscopy for radiologist 1 and 2 and the orthopaedic surgeon (in predicting pathology of the glenohumeral joint on MR examinations) (n = 50)
Table 6 Percentage of correctly diagnosed lesions confirmed by arthroscopy report per lesion type assessed on enhanced MR images (n = 25)

In Tables 7 and 8, the intra-observer agreement of the radiologists and the orthopaedic surgeon is presented. Intra-observer agreement of the radiologists is almost perfect in assessing tears of the rotator cuff and glenoid labrum, whereas it is substantial for the orthopaedic surgeon. Intra-observer consistency of the orthopaedic surgeon is only slight in assessing ligamentous lesions, whereas the radiologists have moderate and substantial consistency. The orthopaedic surgeon is more consistent in predicting impingement than the radiologists.

Table 7 Intra-observer agreement for radiologist 1 and 2 and the orthopaedic surgeon per lesion type assessed on all MR examinations (n = 50)
Table 8 Intra-observer agreement for radiologist 1 and 2 and the orthopaedic surgeon per lesion type assessed on enhanced MR images (n = 25)

Tables 9 and 10 contain the sensitivity and specificity of each observer in predicting pathology of the glenohumeral joint.

Table 9 Sensitivity and specificity of radiologist 1 and 2 and the orthopaedic surgeon in predicting pathology of the glenohumeral joint on MR examinations (n = 50)
Table 10 Sensitivity and specificity of radiologist 1 and 2 and the orthopaedic surgeon in predicting pathology of the glenohumeral joint on enhanced MR images (n = 25)

Discussion

We assessed the inter-observer agreement among two radiologists and an orthopaedic surgeon in predicting pathology of the glenohumeral joint on MR examinations. We found a wide range of inter-observer agreements between the radiologists and the orthopaedic surgeon, varying per lesion type.

The inter-observer agreement between the radiologists and the orthopaedic surgeon was markedly lower than the agreement among the radiologists in assessing impingement, ligamentous lesions, Bankart lesions and labral lesions. These findings indicate that the radiologists and the orthopaedic surgeon have a different interpretation of what defines these lesion types. The orthopaedic surgeon was significantly more accurate than the radiologists in assessing impingement. An explanation for this difference is that orthopaedic surgeons commonly need to determine the cause of impingement when preparing for operative treatment. Radiologists do not routinely assess these features when evaluating an MRI of the shoulder joint. On enquiry, the orthopaedic surgeon had a more dynamic approach to assessing impingement, combining the findings of tendinopathy in one plane and spurs around the acromion or acromioclavicular joint in another plane to make the diagnosis.

The radiologists and the orthopaedic surgeon also differed in their interpretation of what defines a ligamentous lesion. The orthopaedic surgeon scored elongation of the glenohumeral ligaments as positive for a (chronic) ligamentous lesion; however, this was regarded as negative in the calculation of the sensitivity, specificity and the percentage of correct diagnoses. The radiologists did not score the elongation of glenohumeral ligaments as positive for ligamentous lesions. None of the observers detected the presence of a ligamentous lesion correctly resulting in a sensitivity of 0.0%. This is the result of the low prevalence of ligamentous lesions in the study subjects.

The results of the observers determining which one of the glenohumeral ligaments was affected are useless since none of the observers detected the presence of a ligament lesion correctly in the first place.

In this study, the radiologists agreed perfectly that no Bankart lesion was present at all, whereas the orthopaedic surgeon found several Bankart lesions to be present on enhanced MR studies. This remarkable difference resulted from a different interpretation of what defines a Bankart lesion. The radiologists only scored for bony Bankart lesions whereas the orthopaedic surgeon scored classic Bankart lesions of the anterior glenoid labrum as well. In the arthroscopy reports a Bankart lesion was defined as a classic Bankart lesion. However, these differences did not result in a significant difference in the percentage of correctly diagnosed lesions. This can be explained by the high false positive rate of the orthopaedic surgeon.

The interpretation of what defines a labral lesion was not really different among the observers. However, the orthopaedic surgeon had a higher sensitivity in detecting labral lesions than the radiologists. Also the percentage of correct diagnoses was the highest for the orthopaedic surgeon, but not significantly different from the radiologists.

No significant differences were found in the percentages of correctly diagnosed Bankart lesions and ligamentous lesions.

The inter-observer agreement among radiologists was in the same range as the agreement between the orthopaedic surgeon and the radiologists in detecting osteoarthritis, lesions of the rotator cuff, lesions of the biceps tendon, a Hill-Sachs lesion and a SLAP lesion. These findings indicate that the observers have the same interpretation of the definition of these types of pathology. There were no significant differences in the percentages of correctly diagnosed rotator cuff tears, lesions of the biceps tendon and SLAP lesions. However, radiologist 1 was significantly less accurate than the other observers in detecting cases of osteoarthritis. This was because of a high false positive rate resulting in low specificity. Radiologist 2 was significantly more accurate than the other observers in determining the presence or absence of a Hill-Sachs lesion. The latter was completely due to the high specificity, because radiologist 2 did not once predict the presence of a Hill-Sachs lesion correctly.

In this study, the sensitivity and specificity of the observers in assessing rotator cuff tears, Bankart lesions, lesions of the biceps tendon and SLAP lesions are in the same range as reported in most studies [1, 4, 6, 9–11, 17, 21, 23, 25]. Furthermore, the sensitivity and specificity of the observers in detecting osteoarthritis are in the same range as the results of Guntern et al. [5] on predicting humeral and glenoid cartilage lesions.

However, the sensitivity of the observers in this study in detecting impingement is notably lower than the findings of Iannotti et al. [9]. In the study by Iannotti et al. [9], however, the MR studies were retrospectively reviewed after the findings at surgery proved the first assessment to be false.

All three observers had remarkably lower sensitivity in detecting Hill-Sachs lesions than reported in other studies [4, 21]. This can be explained by the low prevalence of Hill-Sachs lesions in the arthroscopy reports. It is likely that small Hill-Sachs lesions were not mentioned in the arthroscopy report because these do not need surgical treatment. The prevalence of Hill-Sachs lesions in the arthroscopy reports is therefore an underestimation of the true prevalence in the study subjects. This led to a lower true positive rate and consequently the relatively low sensitivity of the observers in assessing Hill-Sachs lesions.

In assessing labral lesions, the specificity of all three observers was also unexpectedly lower than in most studies [1, 2, 4, 9, 21]. However, these findings are difficult to compare because of the small number of MR arthrographies in this study. All three observers scored very low percentages of correct diagnoses in determining the location of labral lesions. This is probably due to nonspecific terminology used in the arthroscopy report, noting for example only that a labral tear is anterior. The observers, however, scored several of these labral lesions as being antero-inferior. These cases were considered “false” in the calculation of sensitivity and specificity and the percentage of correctly diagnosed lesions.

Concordant with the recent literature [21], the sensitivity of the observers in detecting lesions of the biceps tendon in this study is low. This can be explained by the technical shortcomings of MR imaging in general. The arched course of the tendon of the long head of the biceps, which passes through every plane, makes it typically susceptible to the “blind spots” of MR imaging, such as the partial volume effect and magic angle artefacts. Furthermore, anomalous origins of the long head of the biceps brachii can also complicate the evaluation on MR images, although these are rarely encountered in daily practice [7, 12, 13, 24].

Intra-observer agreement

Internal consistency of our observers varied from almost perfect to slight. The radiologists were more consistent than the orthopaedic surgeon in detecting labral and rotator cuff pathology. The orthopaedic surgeon was most consistent in assessing impingement. Overall, all three observers were most consistent in predicting the pathology for which they scored the highest percentages of correct diagnoses. This indicates that the differences we found in this study are reproducible.

Limitations

The most important limitations of our study lie in its retrospective character. First of all, the use of arthroscopy reports as the standard of reference is precarious. The quality of these surgical reports is generally moderate, which makes them susceptible to interpretation. Moreover, structures not mentioned in the surgical report were assumed to be normal. Second, only patients with an indication for arthroscopic surgery were included in this study. This is only a certain proportion of all patients who undergo MR examination of the shoulder. The assessment of pathology in these patients is of vital importance to determine whether or not arthroscopic surgery is indicated.

The fact that patients who had prior surgery of the investigated shoulder were included in our study could have complicated the assessment of these MR images. Also, the differentiation between physiologic degenerative change and pathologic degeneration was complicated in this study. The observers could not determine the clinical relevance of the finding of degenerative change, because they were blinded to the patient’s age and clinical information. In daily clinical practice, the assessing radiologist and orthopaedic surgeon do have access to the patient’s medical history.

Intra-articular injection for arthrography was performed in most but not in all cases with a posterior approach. According to preference of the performing radiologist, an anterior approach could also have been used. This made it difficult for the observers to differentiate between contrast spillage through the puncture hole and (partial thickness) rotator cuff tears.

The use of MR arthrography in detecting partial lesions of the rotator cuff has been advocated [3, 16, 23]. In our institution, we routinely use unenhanced MR imaging to assess rotator cuff pathology. We believe that, owing to technical improvements in the image quality of modern MR machines, these unenhanced MR studies are also capable of detecting partial rotator cuff lesions. Also, the use of the ABER view can be considered to improve the visualisation of the anterior glenoid labrum, the labral–bicipital complex and the rotator cuff [2, 11, 14, 22]. Unfortunately, an appropriate shoulder coil that is needed for this arm position is not yet available at our institution.

The above-mentioned factors play a role in the diagnostic performance of the observers. However, they affected all three observers equally and did not affect the differences between them.

Daily clinical practice

We used a study design that resembled daily clinical practice. The observers were not provided with particular criteria to use for the interpretation of the MR examinations. Therefore, the differences we found in this study are the differences in the observers’ own interpretation.

At our institution, MR arthrography is routinely performed when labral or ligamentous pathology is clinically suspected. Unenhanced MR imaging is routinely performed when the patient is clinically suspected of having shoulder pathology other than labral or ligamentous lesions. This helped the observers to look for certain pathology in particular and is therefore similar to daily clinical practice in which the radiologist and orthopaedic surgeon have access to the clinical information.

Conclusion

Radiologists and the orthopaedic surgeon at our institution differed in predicting some but not all types of pathology of the glenohumeral joint on MR imaging. The biggest differences were found in the assessment of Hill-Sachs lesions, osteoarthritis and impingement. The orthopaedic surgeon performed better than the radiologists in the assessment of impingement. Furthermore, differences in the interpretation of what defines Bankart lesions and ligamentous lesions were found. It is important for orthopaedic surgeons and radiologists to become aware of these differences to obtain mutual understanding and to learn from each other’s expertise.