Introduction

Rotator cuff calcific tendinitis (RCCT) is frequently diagnosed: reported incidence rates range from 6.8 to 54 % [1–4]. Nevertheless, information on its epidemiology, radiological characteristics, long-term course and prognostic factors is scarce. Most current studies assess small populations with short follow-up periods. This includes a recent trial from our institution, in which barbotage (needling and lavage) gave better results than subacromial injections after one year of follow-up [5]. Surprisingly, results of both treatments appeared comparable in the case of Gärtner type I calcifications [6]. However, the measurement properties and reliability of most radiological RCCT characteristics, including the Gärtner classification, and their association with long-term outcome are unclear.

The current observational study is the first to assess long-term shoulder function in a large group of RCCT patients treated with barbotage (under local anesthesia) or more conservative methods. Additionally, patient demographics, radiological characteristics (size, location, Gärtner classification), interobserver agreement of radiological characteristics, and the association of these baseline parameters with long-term outcome are evaluated.

Typical RCCT symptoms are pain in the deltoid region, with variable functional impairment [7–9] and a variable duration of symptoms, ranging from months to years [10–12]. Treatment of these generally self-limiting symptoms is usually conservative, e.g. with non-steroidal anti-inflammatory drugs (NSAIDs) and physical therapy. In the case of persisting or severe symptoms, more invasive treatments can be applied, including corticosteroid injections, barbotage, extracorporeal shock wave therapy (ESWT), or surgery [4, 7–9, 13–38]. Only a few studies have compared treatments and their long-term effects. It is also unclear which patients will follow a mild, self-limiting course and which might benefit from more invasive treatment strategies. Consequently, clinical decision making is often based on personal experience and regional preferences.

With regard to the etiology of RCCT, several mechanisms have been proposed, including cell-mediated calcification, RC degeneration, RC overuse and micro-trauma, genetic predisposition, local metabolic or hemodynamic abnormalities, and subacromial impingement [4, 39–45]. Based on these theories, RCCT would be expected to predominantly affect the dominant arm, or both arms, in individuals with a suboptimal vascular status (e.g. middle to older age, diabetes, or smoking) who frequently perform overhead activities. However, this has not been confirmed in clinical studies.

Radiological characteristics of RCCT, including the number, size, appearance (Gärtner classification) and location of calcifications, have been associated with clinical outcome by some authors [3, 23], but this association has been disputed by others [16, 23, 46–49]. There is also little knowledge of the radiological characteristics of calcifications in large patient groups, or of their measurement properties, including reliability.

We assessed demographic and radiological characteristics in a large group of patients with RCCT. Long-term clinical outcomes were evaluated with questionnaires. Our objectives were to evaluate (1) baseline demographics and radiological characteristics, (2) interobserver agreement of common radiological RCCT measures, and (3) the association of demographic and radiological characteristics with long-term shoulder function. More knowledge of these factors may help in predicting patients’ prognoses and in clinical decision making, e.g. when considering more invasive treatment methods for patients with persisting symptoms and negative prognostic factors.

Materials and methods

Study population and baseline RCCT characteristics

Since 1980, patients referred to the Orthopedics department of the Leiden University Medical Center have received a medical diagnosis code. Using these codes, all patients diagnosed with RCCT between January 1980 and November 2009 were identified. During most of this period, our institution was considered a center of expertise for RCCT and was one of the few regional institutions performing barbotage.

Medical records and radiology reports were reviewed for eligibility by the first author, who was not involved in patient care. Patients were included if RCCT was demonstrated on available radiographs and/or noted in the radiology reports, and if they were aged ≥18 years at the time of diagnosis. Patients were excluded if no medical records, radiographs or radiology reports were available, or if the diagnosis of RCCT was not mentioned in these records.

In total, 420 patients had the RCCT diagnosis code. Of these, 78 were excluded because RCCT could not be definitely confirmed after reassessment of all available medical and radiology records (radiographs and reports), or because they were <18 years old. This left 342 confirmed RCCT patients for the analysis of baseline demographics, radiographs, and disease characteristics (Fig. 1). These 342 patients also formed the source population for the follow-up evaluation.

Fig. 1

Study flowchart

The following baseline data were recorded: affected side(s), age, gender, date of diagnosis, age at diagnosis, type of treatment, duration of symptoms at presentation, diabetes, tendon problems at other sites, systemic inflammatory diseases, and other systemic or musculoskeletal diseases. Type of treatment was recorded as either barbotage or conservative treatment; standard conservative treatment at the time included physical therapy, NSAIDs and/or subacromial corticosteroid injections.

Follow-up and questionnaires

Addresses of the 342 patients and data on patient deaths were checked using the municipal personal records database. All available subjects were contacted by mail and asked to complete a general information form and two questionnaires: the Western Ontario Rotator Cuff index (WORC), which was specifically developed to assess shoulder function and quality of life in patients with rotator cuff disorders, and the Disabilities of the Arm, Shoulder and Hand score (DASH) [50–52]. Arm dominance, any diseases for which medication was currently used, medical care history, and any diseases affecting shoulder and arm function were also recorded. Subjects reporting the latter were excluded from further analyses. Reminders were sent after 4 and 8 weeks to all subjects from whom no reply was received.

Of the 342 confirmed RCCT patients, 31 could not be contacted due to unknown address or death. Of the remaining 311, 252 replied (81.0 %), and 194 could be included for follow-up evaluation (Fig. 1). Demographic baseline data of the available (194 responders) and non-available subjects are depicted in Table 1.

Table 1 Baseline demographics and disease characteristics

As all subjects were contacted from November 2011 onwards, minimum follow-up was 2 years. All responders gave written informed consent and the study was approved by the institutional Medical Ethics Committee (study ID: P09.239).

Baseline radiological characteristics, interobserver agreement and association with long-term outcome

Radiographs acquired within 1 year of the date of diagnosis and before any barbotage were used to evaluate baseline calcification characteristics. These were available for 204 shoulders in 196 patients. Owing to national regulations, radiographs older than 15 years had generally been destroyed.

Radiographs were evaluated independently by two trained researchers (PBW, RvA), blinded to the clinical status of patients. In a consensus meeting, final radiological outcome measures (described below) were determined for each subject. In the case of disagreement, radiographs were re-evaluated by an experienced musculoskeletal radiologist (MR), who acted as adjudicator.

Affected tendon(s), Size (mm), and number of calcifications per shoulder were recorded on anteroposterior (AP) views (internal and external rotation) and axial views, all of which are included in the standard shoulder protocol at our institution. The location of each calcific deposit was further quantified using the system of Ogon et al., which we refer to as Location [3]. With this method, a line is drawn from the lateral border of the acromion, parallel to the glenoid, on external rotation AP radiographs. Location is the distance (mm) between this line and the medial border of the calcification (Fig. 2). More subacromial extension (a negative Location value) has been reported to be a negative prognostic factor [3].
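To make the sign convention of the Location measure explicit, the following Python sketch computes it from landmark coordinates. It is illustrative only and is not the measurement procedure of Ogon et al.: it assumes hypothetical (x, y) landmarks already converted to millimetres, with x increasing laterally, and defines the orientation of the glenoid by two points along it.

```python
import math


def location_mm(acromion_lateral, glenoid_top, glenoid_bottom, calc_medial):
    """Signed Location (mm) of a calcification relative to the reference line.

    All points are hypothetical (x, y) landmarks in mm on an external-rotation AP
    radiograph, with x increasing towards lateral. The reference line passes
    through the lateral acromion border and runs parallel to the glenoid, whose
    orientation is given by two points along it. Positive values: the medial
    calcification border lies lateral to the line; negative values: it lies
    medial to the line (subacromial extension).
    """
    dx = glenoid_bottom[0] - glenoid_top[0]
    dy = glenoid_bottom[1] - glenoid_top[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("the two glenoid points must differ")
    # Unit normal to the reference line, oriented to point laterally (+x)
    nx, ny = dy / length, -dx / length
    if nx < 0:
        nx, ny = -nx, -ny
    elif nx == 0:
        raise ValueError("a horizontal reference line has no medial/lateral side")
    # Project the vector from the acromion landmark onto the lateral normal
    px = calc_medial[0] - acromion_lateral[0]
    py = calc_medial[1] - acromion_lateral[1]
    return px * nx + py * ny
```

For example, with a vertical glenoid axis, `location_mm((50, 0), (40, 10), (40, 60), (45, 20))` returns -5.0: the medial calcification border lies 5 mm medial to the reference line. Using a signed perpendicular distance keeps the measure independent of how the glenoid axis is tilted on the image.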

Fig. 2

Locations of the calcific deposits were evaluated using the system of Ogon et al., which we refer to as Location in this paper [3]. A line perpendicular to the most lateral border of the acromion is drawn, parallel to the glenoid, on external rotation AP radiographs. Location is the distance (mm) between this line and the medial border of the calcification, where negative values represent a medial calcification border, i.e. between the glenoid and the drawn line.

Calcific deposits were also assessed using Gärtner’s classification: Type I calcifications have a sharp border and a dense structure; Type II calcifications have either a sharp border and an inhomogeneous structure or a vague border and a homogeneous structure; Type III calcifications have a vague border and are more or less transparent, with a cloudy appearance (Fig. 3) [6]. These types are thought to reflect stages in the natural course of RCCT and are potentially valuable in determining patients’ prognosis [23]. In a recent trial at our institution, barbotage gave better results than subacromial injections in patients with Type II and III calcifications, but not in those with Type I calcifications [5].
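The verbal criteria of the three Gärtner types can be written as a small lookup table, which also makes explicit that some combinations of border and structure are not covered by the classification. The sketch below is a hypothetical encoding: it assumes that an observer first records the border as "sharp" or "vague" and the structure as one of four descriptive labels; these labels are our simplification, not part of Gärtner's original description.

```python
from typing import Optional

# Hypothetical encoding of Gärtner's verbal criteria as (border, structure) pairs
GARTNER_RULES = {
    ("sharp", "dense"): "I",
    ("sharp", "inhomogeneous"): "II",
    ("vague", "homogeneous"): "II",
    ("vague", "transparent"): "III",  # more or less transparent, cloudy appearance
}


def gartner_type(border: str, structure: str) -> Optional[str]:
    """Return 'I', 'II' or 'III', or None if the combination is not defined."""
    return GARTNER_RULES.get((border.lower(), structure.lower()))


def type_i_vs_other(gartner: str) -> str:
    """Dichotomise into Type I vs. Types II/III, the split used in the trial subgroups."""
    return "I" if gartner == "I" else "II/III"
```

A result of `None` flags a calcification whose appearance does not fit any type, which is one plausible source of disagreement between observers.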

Fig. 3

Examples of Gärtner calcification classification types [6]. A) Gärtner type I: sharp border and a dense structure; B) Gärtner type II: either a sharp border and an inhomogeneous structure or a vague border and a homogenous structure; C) Gärtner type III: a vague border, more or less transparent in structure and a cloudy appearance

For assessing interobserver agreement, metric measures (Size and Location) and Gärtner classifications of all available radiographs (analogue and digital) and all calcifications were used (n = 248 calcifications in 196 patients). To evaluate the association of baseline radiological characteristics with long-term outcome, the characteristics of the largest calcification per patient were used. For these analyses, all radiographs could be used for characteristics such as Gärtner classification and affected tendon, but for metric measures only the available digital radiographs (n = 50) could be used, as their magnification factor was known and consistent.

Statistical analysis

Demographics and disease characteristics were expressed as proportions, means and standard deviations, or medians and ranges, where appropriate. Data distributions were evaluated using histograms. Questionnaire data were processed similarly. If a single WORC item was missing, its value was estimated as the mean of the other items in its domain, according to the instructions of the WORC developers. If more than one item was missing in a single domain, the questionnaire was excluded (n = 14). Similarly, 26 incomplete DASH questionnaires were excluded.
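As an illustration of the WORC imputation rule, the sketch below scores a single questionnaire. It assumes the standard WORC layout (21 items scored 0–100 on visual analogue scales, higher meaning worse, in five domains of 6, 4, 4, 4 and 3 items) and conversion of the 0–2100 total into a percentage where 100 is the best possible status. It interprets the exclusion rule as more than one missing item per domain; the domain keys are hypothetical labels.

```python
from statistics import mean

# Assumed WORC layout: 21 visual-analogue items scored 0-100 (higher = worse),
# grouped into five domains; the domain keys are hypothetical labels.
WORC_DOMAIN_SIZES = {"physical": 6, "sports": 4, "work": 4, "lifestyle": 4, "emotions": 3}


def worc_percentage(responses):
    """Return the WORC score as a percentage (100 = best), or None if excluded.

    responses maps each domain key to a list of item scores, with None marking a
    missing item. A single missing item is replaced by the mean of the answered
    items in its domain; more than one missing item in any domain excludes the
    questionnaire.
    """
    total = 0.0
    for domain, size in WORC_DOMAIN_SIZES.items():
        items = responses[domain]
        if len(items) != size:
            raise ValueError(f"domain {domain!r} should have {size} items")
        answered = [v for v in items if v is not None]
        n_missing = size - len(answered)
        if n_missing > 1:
            return None  # questionnaire excluded from analysis
        total += sum(answered)
        if n_missing == 1:
            total += mean(answered)  # impute from the other items in the domain
    max_total = 100 * sum(WORC_DOMAIN_SIZES.values())  # 2100 for 21 items
    return 100 * (max_total - total) / max_total
```

Returning `None` rather than a partial score keeps excluded questionnaires visibly separate from valid low scores.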

For calcification characteristics, interobserver agreement was assessed with the Kappa statistic for Gärtner classifications, and with paired t-tests and intraclass correlation coefficients (ICC) for Size and Location.
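The agreement statistics can be sketched in plain Python. The kappa and ICC variants computed in SPSS are not specified above, so this sketch assumes unweighted Cohen's kappa for the Gärtner types and the two-way random-effects, absolute-agreement, single-measure ICC (ICC(2,1) in Shrout and Fleiss notation) for Size and Location. Only the mean paired difference is shown; its paired t-test p-value could be obtained with, for example, `scipy.stats.ttest_rel`.

```python
from collections import Counter
from statistics import mean


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two observers' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each observer's marginal category frequencies
    categories = count_a.keys() | count_b.keys()
    p_expected = sum(count_a[c] * count_b[c] for c in categories) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both observers use one category")
    return (p_observed - p_expected) / (1 - p_expected)


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single observer.

    ratings: one row per calcification, one column per observer.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two rated calcifications")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("each row needs the same number (>= 2) of observer ratings")
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def mean_difference(obs_a, obs_b):
    """Mean paired difference between two observers (systematic bias)."""
    return mean(a - b for a, b in zip(obs_a, obs_b))
```

For the Type I vs. Types II/III analysis, the same `cohens_kappa` function can be applied to dichotomised labels. Because ICC(2,1) measures absolute agreement, a constant offset between observers lowers it, whereas a consistency-type ICC would ignore such an offset.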

In this observational study, the association of baseline characteristics with long-term shoulder function was assessed with the WORC as the primary outcome. Because WORC and DASH scores were skewed, outcomes were dichotomized: WORC scores ≥80 and DASH scores ≤20 were defined as a good outcome. The univariate association of each recorded variable with inferior outcome was then evaluated using logistic regression and expressed as odds ratios (OR) with 95 % confidence intervals (95 %-CI). Sensitivity analyses were performed for alternative WORC and DASH cut-offs.
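For a single binary baseline variable (e.g. bilateral disease), the univariate logistic regression odds ratio equals the cross-product ratio of a 2 × 2 table, and its Wald 95 %-CI equals the log-based (Woolf) interval. The following sketch, with hypothetical function names, dichotomises scores at the cut-offs described above and computes this interval. Continuous variables such as age still require fitting a logistic regression model.

```python
import math


def inferior_worc(score: float, cutoff: float = 80.0) -> bool:
    """Inferior outcome: WORC below the cut-off (good outcome is WORC >= 80)."""
    return score < cutoff


def inferior_dash(score: float, cutoff: float = 20.0) -> bool:
    """Inferior outcome: DASH above the cut-off (good outcome is DASH <= 20)."""
    return score > cutoff


def odds_ratio_ci(pairs, z: float = 1.96):
    """Odds ratio and Wald 95% CI for inferior outcome given a binary factor.

    pairs: iterable of (factor_present, inferior_outcome) booleans, one per subject.
    Cells: a = factor & inferior, b = factor & good,
           c = no factor & inferior, d = no factor & good.
    """
    pairs = list(pairs)
    a = sum(1 for f, y in pairs if f and y)
    b = sum(1 for f, y in pairs if f and not y)
    c = sum(1 for f, y in pairs if not f and y)
    d = sum(1 for f, y in pairs if not f and not y)
    if 0 in (a, b, c, d):
        raise ValueError("empty cell: odds ratio and Wald CI are undefined")
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log odds ratio
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

Sensitivity analyses with alternative cut-offs only require a different `cutoff` argument, e.g. `inferior_worc(score, cutoff=70.0)`.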

To gain more insight into independent prognostic factors, multivariable logistic regression models were constructed for the WORC and DASH. To avoid overfitting, the number of covariates in each model was limited to 10 % of the number of events (i.e. at least ten events per covariate). Variables were selected based on clinical relevance and the univariate results (p ≤ 0.05). Because data on the duration of symptoms were missing for several patients, and only a limited number of digital baseline radiographs were available, these variables were not entered in the multivariable models.
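The 10 % restriction corresponds to the common rule of thumb of at least ten events per covariate. A minimal sketch, assuming that the limiting count is the smaller of the two outcome groups (a common convention that is not stated explicitly above):

```python
def max_covariates(n_events: int, n_nonevents: int, events_per_variable: int = 10) -> int:
    """Largest number of covariates allowed under an events-per-variable rule.

    Uses the smaller outcome group as the limiting number of events.
    """
    limiting = min(n_events, n_nonevents)
    return limiting // events_per_variable
```

With the WORC counts reported in the Results (99 inferior vs. 81 good outcomes among 180 subjects), `max_covariates(99, 81)` returns 8; for the DASH (75 vs. 93 among 168 subjects) it returns 7. Counting only inferior outcomes as events would give 9 and 7. All of these limits exceed the six covariates entered in each final model.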

Statistical analyses were performed using SPSS Statistics 20.0 (IBM, Armonk, New York, USA).

Results

Baseline demographics and disease characteristics

Of 342 RCCT patients, 203 (59.5 %) were female. Mean age at diagnosis was 49.0 years (SD = 10.0). In 73 patients (21.3 %), both arms were affected (bilateral disease). Overall, 200 (58.5 %) underwent barbotage (Table 1).

With regard to concomitant pathologies, 17 patients were diagnosed with diabetes, seven with kidney disorders, and four with thyroid disorders; two had acromegaly and one was HIV-positive. Concomitant tendon disorders were reported in the records of 15 patients (4.4 %): 11 had had an episode of lateral epicondylitis of the elbow, two had Achilles tendon calcifications, one had biceps tendinitis and one had plantar fasciitis.

In the 196 patients with available baseline radiographs (248 calcifications, including bilateral and multiple calcifications), the supraspinatus tendon was affected in 167 patients (85.2 %), and 63 (32.1 %) had a Gärtner type I calcification. Mean calcification Size was 18.7 mm (SD = 10.1), and mean Location was -10.1 mm (SD = 11.8) (Table 2). Of this subpopulation, 58.3 % underwent barbotage.

Table 2 Baseline data obtained from available analog (n = 154) and digital radiographs (n = 50)

Interobserver agreement of radiological RCCT measures

For interobserver agreement, the mean difference between observers was 0.11 mm (95 %-CI: -0.46 to 0.67; p = 0.71) for Size and 0.08 mm (95 %-CI: -1.16 to 1.00; p = 0.89) for Location, with ICCs of 0.84 (p < 0.001) and 0.77 (p < 0.001), respectively. Agreement on the Gärtner classification was moderate, with a Kappa-value of 0.466 (p < 0.001). When Gärtner type I was compared with types II and III combined, the Kappa-value was 0.471 (p < 0.001).

Long-term shoulder function

The 194 subjects who returned questionnaires had a mean follow-up of 14 years (SD = 7.1, range 2-33, median 13 years). Mean current age was 62 years (SD = 9.2, range 39-89). Median WORC was 72.5 (range, 3.0-100.0) and median DASH 17.0 (range, 0.0-82.0).

For the WORC, 99 (55 %) of 180 available subjects had a WORC <80, and 76 (42.2 %) even had a WORC below 60 (Fig. 4A). Univariate analyses demonstrated that patients with a longer duration of symptoms at presentation, bilateral disease, or dominant arm involvement significantly more often had an inferior long-term outcome (WORC < 80) (Table 3). Additionally, female gender showed a clinically relevant negative association with long-term outcome (OR = 1.82, 95 %-CI: 0.99-3.35, p = 0.05).

Fig. 4

Histograms of the clinical scores. A) WORC score: 55 % had an inferior long-term functional outcome, with scores below 80 points; B) DASH score: 45 % had scores over 20 points, indicating disability

Table 3 Univariate and multivariate analyses of the associations of baseline characteristics with inferior long-term clinical outcome, expressed in a WORC score <80 points

In total, 106 subjects had both baseline radiographs and clinical scores available. Results of the univariate logistic regression analyses of radiological parameters are shown in Tables 3 and 4. The number of calcifications per shoulder had an OR of 2.1 (95 %-CI: 0.97-4.62) for a WORC < 80, suggesting that a larger number of calcifications was associated with inferior long-term shoulder function in our data, although the confidence interval included 1.

Table 4 Univariate and multivariate analyses of the associations of baseline characteristics with inferior long-term clinical outcome, expressed in a DASH >20 points

The final multivariable WORC model included gender, age at follow-up, years after diagnosis, bilateral disease, dominant side involvement, and treatment method. Female gender had a significant negative effect: OR = 2.2 (95 %-CI: 1.1–4.2). The effect sizes for bilateral disease (OR = 2.2, 95 %-CI: 0.94–5.1) and dominant arm involvement (OR = 1.7, 95 %-CI: 0.79–3.6) also suggested relevant negative effects, but did not reach statistical significance. There was no significant association between treatment method (barbotage or more conservative methods) and WORC outcome (Table 3). Sensitivity analyses using WORC cut-off points of <70 and <90 gave similar results (data not shown).

For the DASH, 75 (44.6 %) subjects scored ≥20 points and 37 (22.0 %) scored ≥40 points, indicating inferior long-term shoulder function (Fig. 4B). Univariate analyses revealed no variables with significant effects (Table 4). The final multivariable DASH model included gender, age at follow-up, years after diagnosis, bilateral disease, dominant side involvement, and treatment method. In this model too, female gender had a statistically significant negative effect: OR = 2.0 (95 %-CI: 1.0–4.0) (Table 4). Sensitivity analyses using DASH cut-off points of >10 and >30 gave similar results (data not shown).

Discussion

In this first study combining long-term follow-up with a large group of RCCT patients, the results show that many subjects have persisting shoulder complaints more than a decade after diagnosis, regardless of the treatment applied (barbotage vs. conservative). At a mean follow-up of 14 years, about 55 % had WORC scores below 80 points and 42.2 % even scored below 60 points, indicating severely impaired shoulder function. Dominant arm involvement, bilateral disease, longer duration of symptoms at presentation, a larger number of calcifications and female gender all appeared to be negative prognostic factors for long-term shoulder function. We found no association of outcome with common radiological parameters (calcification size, location, Gärtner classification), and interobserver agreement was good for size and location, but only moderate for the Gärtner classification.

Long-term follow-up

Previous studies on calcific tendinitis have mostly reported on small populations with a relatively short follow-up. Some studies had >2 years of follow-up [10, 13, 53–61] or large patient groups (n > 100) [15, 22, 34, 62–66], but the combination of both is scarce [3, 9, 11]. In one of the few larger RCCT cohorts with long-term follow-up, Serafini et al. reported good outcomes for both barbotage and conservative treatment, with average Constant Scores >90 points at 10 years, in contrast to our results [9]. In accordance with our results, they found no difference in clinical outcome between barbotage and conservative treatment. However, their conservative group consisted of patients who refused barbotage and instead underwent other, unreported treatments, and this group had a high drop-out rate. A possible explanation for their superior results is the younger mean age at diagnosis (40.2 years, compared with 49.0 years in our study). Also, RCCT was diagnosed in 323 shoulders in approximately 3 years in their study, versus 420 patients in 29 years at our institution. The latter might partly be due to the high density of hospitals in our country and to the fact that a general practitioner acts as a gatekeeper for referral to a medical specialist. Both factors can limit the referral of patients for treatment, specifically in cases with mild symptoms. Furthermore, referring physicians in the region of Serafini et al. may be more familiar with RCCT and its treatment, which could lead to earlier diagnosis at a younger age and to earlier referral and adequate treatment. Consistent with this, our univariate analyses show that a longer duration of symptoms at presentation is related to inferior long-term outcome. Lastly, the Constant Score as applied by Serafini et al. is a general shoulder function score, in contrast to the WORC, which is validated specifically for rotator cuff problems.

Demographics and prognostic characteristics

Dominant arm involvement was associated with inferior long-term outcome. A plausible explanation is that involvement of the dominant arm has a larger influence on the patient’s life. The dominant arm was also more often affected than the non-dominant arm. This contrasts with most previous studies, which either found no influence of arm dominance on outcome or did not analyze this effect [9, 67, 68]. Bilateral disease was negatively associated with clinical outcome, which is consistent with previous reports [3, 16].

The higher incidence of RCCT in women is in accordance with most other studies. Some authors attribute this to the higher prevalence of endocrine disorders (thyroid and estrogen metabolism) in women [67]. Of all thyroid disorders identified from the medical records, the majority (75 %) occurred in women. Similarly, 66.6 % of all patients with concomitant tendon disorders were female. Although these data are probably underreported and we did not investigate underlying associations, our results are consistent with a higher incidence of RCCT in the presence of systemic diseases in women.

We found no effect of treatment (barbotage vs. conservative) on long-term outcome. Because of the presumed self-limiting character of RCCT, treatment is usually conservative [4, 16, 57]. Yet, as our results show, symptoms can persist for more than a decade. Various more invasive methods have been reported for patients with severe or persisting symptoms, including ESWT [26, 28, 29], barbotage [9, 15, 19, 34, 38], and surgery [13, 14]. However, reported outcomes are highly variable, and only a few studies have compared treatment methods. More research is needed on the indications and long-term effects of the various treatment methods, specifically the more invasive techniques.

Radiological measures and prognostic characteristics

This is one of the first studies assessing interobserver agreement and the prognostic value of radiological calcification characteristics: the Gärtner classification [6], calcification Size, and the Location measure of Ogon et al. [3]. Both metric measures (Size, Location) had good ICCs and small mean interobserver differences. For the Gärtner classification, agreement was only moderate (Kappa 0.47), comparable to previously reported values in a smaller patient group [48]. Our previous trial showed that barbotage was particularly superior to subacromial injections in patients with Type II and III calcifications [5]. Therefore, we also assessed interobserver agreement for Type I vs. Type II/III calcifications, but agreement was similarly moderate. Overall, we found no prognostic value of radiological characteristics. Consistent with this, others have reported that symptoms and treatment outcome do not depend on the classification and size of the calcific deposit at baseline, but that patients with radiological improvement over time (e.g. a decrease in size or Gärtner classification) report better clinical results [16, 46]. The latter was not investigated in our study. However, we did find a relevant association between a higher number of calcifications at baseline and inferior long-term outcome.

Strengths and limitations

There are some limitations that have to be taken into account when interpreting our results. Firstly, as with all retrospective studies, a substantial part of our data depends on accurate medical record keeping. Furthermore, selection bias could have played a role: only 194 of 342 subjects could be included in the follow-up part of our study, and only a limited number also had radiographs available. However, the baseline characteristics of subjects who did not send a reply were comparable to those of the evaluated subjects (Tables 1 and 2). Moreover, sensitivity analyses including only subjects with both radiographs and clinical scores available showed similar results (Supplementary Table S1). Despite these limitations, this is one of the largest studies with the longest follow-up of its kind.

Secondly, it is unclear whether the inferior long-term shoulder scores are due to persisting, residual, or recurrent RCCT, or to other shoulder pathology (whether or not secondary to RCCT). Although it would be interesting to know whether subjects with inferior outcomes actually still have RCCT, the fact that many subjects (formerly) diagnosed with RCCT have serious symptoms in the long term is relevant information in itself: the clinical scores of many subjects in this study are worse than those of the general population, even years after diagnosis. This is one of the first studies to show this phenomenon. Further research is needed to investigate underlying conditions in the long-term course of RCCT.

Thirdly, some patients may have had (secondary) treatments in other institutions. However, our institution was one of the few regional centers performing barbotage and other RCCT treatments during the study period. Furthermore, despite potential secondary treatments, we still found persisting symptoms in many subjects after more than a decade.

Lastly, there could have been confounding by indication: patients who underwent barbotage are likely to have had different or more severe symptoms. However, the ORs for treatment method were around 1.0 for both the WORC and DASH. Thus, even if barbotage was preferentially applied to patients with more severe symptoms, these patients did not have an inferior long-term outcome compared with the more conservatively treated patients.

Conclusions

In this observational study, we found that 55 % of RCCT patients, treated with either barbotage or more conservative modalities, had symptoms and impaired shoulder function at a mean of 14 years after diagnosis. These observations contrast with the general view that RCCT is a self-limiting disease. Dominant arm involvement, bilateral disease, a larger number of calcifications, female gender, and a longer duration of symptoms were associated with inferior functional outcome. We found no association of treatment method or baseline radiological characteristics with long-term outcome. Interobserver agreement of the radiological Gärtner classification was only moderate.

Applying these findings in clinical decision making might help to prevent a long-term symptomatic course: it is plausible that a wait-and-see strategy or conservative treatments (not aimed at decreasing the calcium deposits) are not necessarily the most effective options in patients with persisting symptoms, no signs of resorption over time, and one or more of the reported negative prognostic factors. We suggest taking these factors into account in future (prospective) studies, in order to evaluate whether earlier and more invasive treatment results in better outcome in selected patients.