Introduction

Although the different types of shoulder arthroplasty usually are successful in treating a wide range of glenohumeral disorders [4, 10, 15, 20, 31, 32, 46, 47, 49, 52], their results vary unpredictably: a substantial number of patients having shoulder arthroplasty experience minimal or no improvement, or have complications. Factors previously associated with poorer results include multiple surgeries on the shoulder before the arthroplasty, work-acquired shoulder injuries, patient comorbidities, and limited surgeon experience with arthroplasty [3, 5, 7, 17, 21, 25, 27–29, 41, 54, 56, 57]. It is in the interest of patients, surgeons, and the economy to further define, for individual patients, the factors prognostic of better outcomes from shoulder arthroplasty. This knowledge can help inform each patient’s expectations of the surgical outcome and may influence the decision to proceed with surgery.

Although some guidance can be gained from studies of registries [2, 19, 48], population databases [9, 41, 46], or retrospective case series [13, 61], such reports rarely include prospectively collected details of the characteristics of the patient and the shoulder and of the surgical techniques that influence the result. We suggest that if factors predictive of better outcomes for individual patients could be prospectively identified, shared patient-surgeon decision-making and patient expectations would be better informed.

In this study we asked: (1) What factors are associated with better outcomes after shoulder arthroplasty? (2) What are the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of a multivariate model predicting a better outcome?

Patients and Methods

Study Design and Study Subjects

The protocol for this prospective study was established before the first patient was enrolled. Four hundred twenty-five English-speaking patients presenting to either of two experienced shoulder surgeons (FAM, WJW) for consideration of primary shoulder arthroplasty at the University of Washington Medical Center between August 24, 2010 and December 31, 2012 were invited by a research coordinator (SMR) to enroll prospectively in this study, which was approved by our institutional review committee. For patients who had arthroplasty on both shoulders during the study period, only the first shoulder was included in the analysis. Eight patients declined to participate, 37 did not have surgery, 24 had surgery after December 2012, and 17 had a procedure other than a primary arthroplasty, leaving 339 patients consenting to participate in this study. Two patients were excluded because they were missing three or more baseline variables. Two-year outcomes were missing for 43 of the remaining 337 patients, leaving 294 for the initial analysis of the relationship between 2-year outcomes and the characteristics of the patient, the shoulder, and the procedure (Fig. 1). As explained below, the characteristics of the 43 patients with missing outcomes were considered in creating an enhanced predictive model.
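The patient flow can be checked with simple arithmetic. The sketch below uses only counts stated in this paragraph and in the Results; note that 337 − 43 = 294, which equals the 237 + 57 patients later classified as having better or worse outcomes. Variable names are illustrative.

```python
# Patient flow, using only counts stated in the text.
invited = 425
declined, no_surgery, surgery_after_2012, other_procedure = 8, 37, 24, 17
consented = invited - (declined + no_surgery + surgery_after_2012 + other_procedure)

missing_baseline = 2        # missing three or more baseline variables
analyzed = consented - missing_baseline

missing_two_year = 43       # no 2-year outcome available
with_two_year = analyzed - missing_two_year

better, worse = 237, 57     # classifications reported in the Results
print(consented, analyzed, with_two_year, better + worse)  # prints: 339 337 294 294
```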

Fig. 1

Four hundred twenty-five patients were invited to participate in this study, in which the patient-reported Simple Shoulder Test (SST) was the primary outcome measure. Two-year or revision outcomes were available for 337 patients.

The patients, diagnoses, and procedures included in this study represent the actual diversity of the practice of shoulder arthroplasty wherein surgeons and patients choose among various procedures (hemiarthroplasty, arthroplasty for cuff tear arthropathy, ream and run, total shoulder arthroplasty, reverse total shoulder arthroplasty) for managing various conditions (osteoarthritis, rotator cuff tear arthropathy, capsulorrhaphy arthropathy, avascular necrosis, posttraumatic arthritis, chondrolysis, rheumatoid arthritis, secondary arthritis). In each case the decision of which surgery would be performed was based on shared patient-surgeon decision-making that included consideration of the diagnosis, the condition and goals of the patient, the prosthetic options, and the experience of the surgeons, in addition to a full discussion of the possible risks and benefits. We typically suggested total shoulder arthroplasty for patients with glenohumeral arthritis and clinically functional rotator cuffs. For well-motivated patients who desired a high level of shoulder function and who wished to avoid the potential risks and limitations associated with a polyethylene glenoid component and polymethylmethacrylate, we presented the possibility of a ream and run procedure. For individuals with arthritis and rotator cuff deficiency who had active elevation greater than 100°, we suggested arthroplasty using a cuff tear arthropathy prosthesis. For individuals with pseudoparalysis and/or anterosuperior escape, we recommended a reverse total shoulder arthroplasty. Finally, we proposed a hemiarthroplasty alone for patients with avascular necrosis or posttraumatic deformity of the proximal humerus if the glenoid articular surface appeared intact, or for extremely tight shoulders with insufficient joint volume to accommodate a glenoid component.

Description of Experiment, Treatment, or Surgery

Informed consent for study enrollment was obtained by a research coordinator (SMR), who then recorded 21 baseline characteristics of the shoulder, the patient, and the procedure (Table 1). Before surgery, each shoulder had a standardized AP radiograph that enabled evaluation for superior decentering of the humeral head on the glenoid, and a standardized axillary view with the arm in a functional position (elevated 90° in the plane of the scapula) that enabled evaluation of the glenoid type, the angle between the glenoid face and the scapular body, and the centering of the humeral head on the glenoid (recognizing that functional posterior decentering of the humeral head on the glenoid can be overlooked by imaging made with the arm at the side) [29, 40, 43, 45, 55, 59].

Table 1 Baseline characteristics of 337 patients by 2-year outcomes

The procedures performed for the study patients included hemiarthroplasty (n = 27), ream and run arthroplasty (n = 115), cuff tear arthropathy arthroplasty (n = 24), and total shoulder arthroplasty (n = 155), all using components from the Global Advantage® Shoulder System (DePuy Synthes, Warsaw, IN, USA) with standard-length humeral stems. The essential elements of our surgical techniques were described in detail previously [36, 39, 42, 43]: a deltopectoral approach, subscapularis peel, retention of the long head tendon of the biceps unless it was damaged, humeral head cut in 30° retroversion, conservative glenoid reaming without specific attempt to normalize glenoid version, use of the Anchor Peg Glenoid prosthesis (DePuy Synthes) [63] in total shoulder arthroplasties, and fixation of the humeral component using impaction autografting [6, 24]. Any tendency for excessive posterior translation at the time of surgery was managed with anteriorly eccentric humeral head components and/or rotator interval plication [29, 36, 39, 42, 43]. None of the glenoid components were posteriorly augmented. No glenoid bone grafts were used. There were 16 reverse total shoulder arthroplasties using either the Delta (DePuy Synthes) or the Reverse Shoulder Prosthesis (RSP; DJO Global, Vista, CA, USA).

Description of Followup Routine

A research coordinator (SMR) contacted the patients at 6 weeks and at 3, 6, 12, 18, and 24 months after the index surgery, collecting outcome data and documenting any secondary procedures. The Simple Shoulder Test (SST) was the primary instrument used to document the patient-reported status of the shoulder before and sequentially after the shoulder arthroplasty [8, 18, 21–23, 27, 33–35, 37, 51].

Variables, Outcome Measures, Data Sources, and Bias

Instead of using a predetermined value for a minimum clinically important difference (MCID) [58], we characterized the clinical outcome as the percentage of the maximal possible improvement from the preoperative SST score that was realized at 2 years [22, 43, 44]. Because 12 is the highest possible SST score, the change as a percentage of the maximal possible improvement is calculated from the formula:

$$\text{Percent of maximal possible improvement} = 100\% \times \frac{\text{SST score at 2 years} - \text{preoperative SST score}}{12 - \text{preoperative SST score}}.$$

This approach enabled us to set a relatively high standard for analyzing the effects of baseline characteristics on the result: we defined a better outcome as a positive change of at least 30% of the maximal possible improvement in the SST at 2 years after surgery in the absence of a second procedure within 2 years of the index arthroplasty. Shoulders that did not improve by at least 30% of the maximal possible improvement (including those with no change) or that had any type of second procedure within 2 years were characterized as having worse results. Because this definition sets a higher standard for a better outcome than the MCID, it increases the number of patients with worse outcomes against whom those with better outcomes can be compared; this rationale was explained in a prior publication [22].
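The outcome definition can be expressed as two small functions. This is a minimal sketch; the function names are illustrative, and raising an error for a preoperative score of 12 (where the percentage is undefined) is our assumption, since the text does not say how such shoulders were handled.

```python
def percent_max_improvement(pre_sst, post_sst, max_score=12):
    """Change in SST as a percentage of the maximal possible improvement."""
    if pre_sst >= max_score:
        # Assumption: handling of a preoperative score at the ceiling is not stated.
        raise ValueError("no room for improvement; percentage is undefined")
    return 100.0 * (post_sst - pre_sst) / (max_score - pre_sst)


def classify_outcome(pre_sst, post_sst, had_second_procedure, threshold=30.0):
    """'better' if at least `threshold`% of the maximal possible improvement was
    realized without a second procedure within 2 years; otherwise 'worse'."""
    if had_second_procedure:
        return "worse"
    if percent_max_improvement(pre_sst, post_sst) >= threshold:
        return "better"
    return "worse"


print(round(percent_max_improvement(3.8, 9.3), 1))        # 67.1
print(classify_outcome(4, 6, had_second_procedure=False))  # worse (25%)
print(classify_outcome(2, 5, had_second_procedure=False))  # better (exactly 30%)
```

Applied to the group means reported in the Results (3.8 before surgery, 9.3 at 2 years), the formula gives about 67%. The mean of patient-level percentages generally differs from a percentage computed from mean scores, so the two should not be assumed to coincide.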

At 2 years, patients also were asked to rate their result as delighted, pleased, mostly satisfied, mixed feelings, mostly dissatisfied, unhappy, or terrible. For open repeat procedures, culture results were documented; all revisions were cultured for Propionibacterium according to our established protocol [38].

Statistical Analysis, Study Size

We determined odds ratio (OR) effect-size estimates for three levels (20%, 33%, 50%) of prevalence of candidate binary baseline characteristics [11, 12]. Assuming a conservative loss-to-followup rate of 20% at 24 months, an alpha of 0.05, and a failure rate of 20%, we aimed to enroll a minimum of 330 patients to have 80% power to detect ORs greater than 2.4, 2.2, and 2.1 for characteristics with 20%, 33%, and 50% prevalence, respectively, among patients having shoulder arthroplasty.
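One way to see how power figures of this kind arise is a normal-approximation calculation for a single binary characteristic. The sketch below is a swapped-in textbook approach (a Wald test on the log OR using Woolf's variance, the sum of reciprocal expected 2 × 2 cell counts), not necessarily the method of [11, 12], and its output need not reproduce the thresholds stated above. Function names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def group_risks(odds_ratio, prevalence, overall_rate):
    """Return (p0, p1): failure risk without and with a binary characteristic,
    chosen so that their odds ratio equals `odds_ratio` (> 1) and their
    prevalence-weighted mean equals `overall_rate`. Solved by bisection on p0."""
    if odds_ratio <= 1.0:
        raise ValueError("this sketch assumes odds_ratio > 1")

    def risk_with(p0):
        odds1 = odds_ratio * p0 / (1.0 - p0)
        return odds1 / (1.0 + odds1)

    lo, hi = 0.0, overall_rate  # with OR > 1, p0 must lie below the overall rate
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        mean = prevalence * risk_with(mid) + (1.0 - prevalence) * mid
        if mean < overall_rate:
            lo = mid
        else:
            hi = mid
    p0 = 0.5 * (lo + hi)
    return p0, risk_with(p0)


def approx_power(odds_ratio, prevalence, overall_rate, n, z_alpha=1.959964):
    """Approximate two-sided power (alpha 0.05) of a Wald test of log(OR)
    with n analyzed patients, using Woolf's variance for log(OR)."""
    p0, p1 = group_risks(odds_ratio, prevalence, overall_rate)
    n1 = n * prevalence
    n0 = n - n1
    cells = (n1 * p1, n1 * (1.0 - p1), n0 * p0, n0 * (1.0 - p0))
    se = math.sqrt(sum(1.0 / c for c in cells))
    return norm_cdf(abs(math.log(odds_ratio)) / se - z_alpha)


# Hypothetical usage: 330 enrolled, 20% lost to followup, 20% failure rate.
analyzed = int(330 * 0.8)
for prevalence, odds_ratio in [(0.20, 2.4), (0.33, 2.2), (0.50, 2.1)]:
    power = approx_power(odds_ratio, prevalence, 0.20, analyzed)
    print(f"prevalence {prevalence:.0%}, OR {odds_ratio}: power ~ {power:.2f}")
```

Because power depends on the number actually analyzed, the same calculation with n = 330 (no loss to followup) gives noticeably higher values than with n = 264.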

We used univariate logistic regression models to determine the association between the 2-year outcome and each of the baseline characteristics. We then constructed a multivariate logistic regression model with the 2-year outcome as the response variable and all of the baseline characteristics as independent variables, and performed backward stepwise variable selection using the Akaike Information Criterion (AIC) [1] to determine which characteristics to retain in the model. We made this choice instead of relying on p values from the univariate analyses alone, which can be misleading when a large number of possibly correlated factors are considered. At each step, the backward stepwise procedure evaluates every variable currently in the model and removes the one whose omission yields the lowest AIC (that is, the variable contributing least to the model). The process is repeated until dropping any single remaining variable would no longer lower the AIC of the model.
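A minimal NumPy sketch of backward elimination by AIC follows, assuming every candidate characteristic is a single numeric column (a multilevel categorical variable would need to be dropped as a group, as R's `step()` does with model terms). The functions and simulated data are hypothetical illustrations; the study's analysis was done in R.

```python
import numpy as np


def logistic_loglik(X, y, n_iter=100, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson and return the maximized
    log-likelihood. X must include an intercept column. There is no safeguard
    against complete separation (a sketch, not production code)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def aic(data, y, columns):
    """AIC = 2k - 2 log L for a model with an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + [data[:, j] for j in columns])
    return 2 * X.shape[1] - 2 * logistic_loglik(X, y)


def backward_stepwise_aic(data, y, names):
    """Start with all candidate columns; at each step drop the column whose
    removal gives the lowest AIC; stop when no removal lowers the AIC."""
    current = list(range(data.shape[1]))
    current_aic = aic(data, y, current)
    while current:
        trials = [(aic(data, y, [c for c in current if c != drop]), drop) for drop in current]
        best_aic, best_drop = min(trials)
        if best_aic >= current_aic:
            break
        current.remove(best_drop)
        current_aic = best_aic
    return [names[j] for j in current], current_aic


# Hypothetical data: only x0 is truly related to the outcome.
rng = np.random.default_rng(0)
n = 400
data = rng.normal(size=(n, 3))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * data[:, 0])))).astype(float)
selected, final_aic = backward_stepwise_aic(data, y, ["x0", "x1", "x2"])
print(selected, round(final_aic, 1))
```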

Forty-three patients lacked 2-year outcome data. Certain baseline characteristics (such as female gender, work relationship of the shoulder problem, history of smoking) were relatively more prevalent among the patients without final results and among patients having worse outcomes (Table 1). To account for the potential biasing effect of omitting these patients from the study, we conducted our primary analysis on the available data and then applied the method of generalized estimating equations with robust standard errors and inverse probability weighting to generate an enhanced model [50].
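The weighting idea can be sketched as follows. This is a simplification under stated assumptions: the same baseline variables are used to model both the probability of having 2-year data and the outcome; patients are treated as independent, in which case GEE with an independence working correlation reduces to a weighted logistic regression with a sandwich (robust) variance; and the uncertainty from estimating the weights is ignored. Function names and the simulated data are hypothetical.

```python
import numpy as np


def fit_weighted_logistic(X, y, w, n_iter=100, tol=1e-10):
    """Weighted logistic regression by Newton-Raphson; returns coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def ipw_logistic(X, y, observed):
    """Inverse-probability-weighted logistic regression with sandwich SEs.
    1. Model P(outcome observed | X) with a logistic regression.
    2. Weight each observed patient by 1 / P(observed | X).
    3. Fit the outcome model on observed patients with those weights."""
    obs_beta = fit_weighted_logistic(X, observed.astype(float), np.ones(len(observed)))
    p_obs = 1.0 / (1.0 + np.exp(-X @ obs_beta))
    Xo, yo = X[observed], y[observed]
    w = 1.0 / p_obs[observed]
    beta = fit_weighted_logistic(Xo, yo, w)
    p = 1.0 / (1.0 + np.exp(-Xo @ beta))
    bread = np.linalg.inv(Xo.T @ (Xo * (w * p * (1.0 - p))[:, None]))
    meat = Xo.T @ (Xo * ((w * (yo - p)) ** 2)[:, None])
    robust_se = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, robust_se, w


# Hypothetical data: followup is less likely when x is high, and x also
# affects the outcome, so a complete-case fit could be biased.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(1.0 + 0.8 * x)))).astype(float)
observed = rng.random(n) < 1.0 / (1.0 + np.exp(-(2.0 - 0.7 * x)))
beta, robust_se, weights = ipw_logistic(X, y, observed)
print(beta.round(2), robust_se.round(2))
```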

To assess the prognostic performance of the enhanced model, we compared its in-sample predictions against the true 2-year outcomes. We also performed 10-fold cross-validation, in which the entire data set was randomly split into 10 parts and predictions for patients in each fold were made using a model generated from the remainder of the data. We examined the overall performance of the enhanced model using the receiver operating characteristic (ROC) curve based on the complete set of cross-validated predictions; the area under the ROC curve indicates how well the model distinguishes patients with better outcomes from those with worse outcomes. We used the ROC curve to estimate a cutoff probability that maximizes sensitivity and specificity and report the PPV and NPV at that cutoff value.
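The cross-validation and ROC steps can be sketched in NumPy. Assumptions: a plain (unweighted) logistic model stands in for the enhanced model; the area under the ROC curve is computed in its rank (Mann-Whitney) form; and "maximizes sensitivity and specificity" is read as maximizing their sum (Youden's J), which may not be the exact criterion used. Function names and data are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        step = np.linalg.solve(X.T @ (X * (p * (1.0 - p))[:, None]), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def cross_validated_probs(X, y, k=10, seed=0):
    """Each patient's probability comes from a model fitted without their fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    probs = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        beta = fit_logistic(X[train_idx], y[train_idx])
        probs[test_idx] = 1.0 / (1.0 + np.exp(-X[test_idx] @ beta))
    return probs


def roc_auc(scores, y):
    """Probability that a randomly chosen positive scores higher than a
    randomly chosen negative (ties count one half)."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def youden_cutoff(scores, y):
    """Return (cutoff, sensitivity, specificity) maximizing sens + spec."""
    best = None
    for c in np.unique(scores):
        predicted_better = scores >= c
        sens = np.mean(predicted_better[y == 1])
        spec = np.mean(~predicted_better[y == 0])
        if best is None or sens + spec > best[1] + best[2]:
            best = (float(c), sens, spec)
    return best


# Hypothetical data with mostly "better" outcomes.
rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(1.4 + 1.5 * x)))).astype(float)
probs = cross_validated_probs(X, y)
print(round(roc_auc(probs, y), 2), youden_cutoff(probs, y))
```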

All statistical analyses were performed using the R statistical analysis package (Version 3.2.3; R Core Team, Vienna, Austria). All data were maintained by the research coordinator (SMR) in a Research Electronic Data Capture (REDCap) database [26].

Results

Results Overview

For the 294 patients with known 2-year outcomes, SST scores improved from 3.8 ± 2.7 to 9.3 ± 2.9 (p < 0.001), an average improvement of 67% of the maximal possible improvement. Single Assessment Numeric Evaluation [62] scores improved from 37.6 ± 21.4 to 78.9 ± 20.0 (p < 0.001). SF-36 Physical Component Summary [16] scores improved from 39.8 ± 8.9 to 47.8 ± 10.5 (p < 0.001). SF-36 Mental Component Summary [16] scores did not change (51.6 ± 8.3 and 51.5 ± 7.6; p = 0.891).

Two hundred thirty-seven (81%) of the 294 patients with known results met our definition of a “better” outcome. Of these, 198 (84%) reported that they were delighted, pleased, or mostly satisfied with the result of their surgery; 19 (8%) had mixed feelings; 10 (4%) rated their result as mostly dissatisfied, unhappy, or terrible; and the responses of 10 (4%) were missing. Fifty-seven (19%) of the 294 patients with known results met our definition of a “worse” outcome; of these, 18 (32%) rated their result as mostly dissatisfied, unhappy, or terrible; 15 (26%) were delighted, pleased, or mostly satisfied; 13 (23%) had mixed feelings; and the responses of 11 (19%) were missing. Forty-three (13%) of the 337 patients enrolled provided insufficient data to classify their result as “better” or “worse”.

Plots of the recovery of shoulder function over time showed the effects of different baseline characteristics on improvement in the SST during the 2 years after surgery. The work relationship of the shoulder problem, the type of arthroplasty, the superior position of the humeral head on an AP radiograph, the glenoid type, the glenoid scapular body angle, and the degree of decentering of the humeral head on the glenoid each had differing effects on the rate and extent of recovery of shoulder comfort and function (Fig. 2), as did patient gender, age, BMI, working status at the time of arthroplasty, insurance coverage, Charlson Comorbidity Index score (Fig. 3), American Society of Anesthesiologists (ASA) classification, history of anxiety/depression, history of smoking, current alcohol consumption, prior surgery, and preoperative SST score (Fig. 4), diagnosis, and antibiotic prophylaxis (Fig. 5).

Fig. 2A–F

The pattern of recovery of patient self-assessed comfort and function as documented by the Simple Shoulder Test (SST) after surgery is shown for patients grouped by (A) the relationship of the shoulder problem to the patient’s work, (B) the type of shoulder arthroplasty, (C) the presence of superior decentering of the humeral head in relation to the glenoid determined on the preoperative AP radiograph taken in the plane of the scapula, (D) the preoperative glenoid type determined on the preoperative standardized axillary view with the arm held in a functional position (90° elevation in the plane of the scapula), (E) the angle between the glenoid face and the scapular body determined on the preoperative standardized axillary view (angles less than 90° indicate retroversion of the glenoid), and (F) the position of the humeral head relative to the glenoid determined on the preoperative standardized axillary view (a ratio of 0.5 indicates a centered humeral head, ratios greater than 0.5 indicate posterior decentering of the humeral head on the glenoid). Zero on the horizontal axis indicates the preoperative SST scores.

Fig. 3A–F

The pattern of recovery of patient self-assessed comfort and function as documented by the Simple Shoulder Test (SST) after surgery is shown for patients grouped by the patient’s (A) gender, (B) age at the time of arthroplasty, (C) BMI, (D) working status at the time of arthroplasty, (E) type of insurance covering the shoulder condition, and the (F) Charlson Comorbidity Index.

Fig. 4A–F

The pattern of recovery of patient self-assessed comfort and function as documented by the Simple Shoulder Test (SST) after surgery is shown for patients grouped by (A) American Society of Anesthesiologists (ASA) classification, (B) patient history of anxiety or depression, (C) patient history of smoking, (D) the patient’s current consumption of alcohol, (E) history of prior surgery on the shoulder, and (F) the SST score before the arthroplasty.

Fig. 5A–B

The pattern of recovery of patient self-assessed comfort and function as documented by the Simple Shoulder Test (SST) after surgery is shown for patients grouped by (A) the shoulder diagnosis and (B) the type of antibiotic prophylaxis used for the arthroplasty. The Cefazolin group received cefazolin with (n = 8) or without (n = 171) vancomycin. The Ceftriaxone/Ceftazidime group received either ceftriaxone only (n = 4), ceftriaxone and vancomycin (n = 111), or ceftazidime and vancomycin (n = 2). The Clindamycin group received clindamycin with (n = 5) or without (n = 32) vancomycin. The Other group received either ciprofloxacin only (n = 1) or vancomycin only (n = 3). OA = osteoarthritis; CTA = cuff tear arthropathy.

Most of the revision procedures were performed for postoperative stiffness (Table 2).

Table 2 Second procedure

Factors Associated with Better Outcome

After controlling for potentially relevant confounding variables, the multivariate analysis showed that the only patient factor significantly associated with better outcomes after shoulder arthroplasty was ASA Class I (OR, 1.94; 95% CI, 1.03–3.65; p = 0.041) (Table 3). The following shoulder factors were associated with better outcomes: shoulder problem not related to work (OR, 5.36; 95% CI, 2.15–13.37; p < 0.001), lower baseline SST score (OR, 1.32 per point; 95% CI, 1.23–1.42; p < 0.001), no prior shoulder surgery (OR, 1.79; 95% CI, 1.18–2.70; p = 0.006), humeral head not superiorly displaced on the AP radiograph (OR, 2.14; 95% CI, 1.15–4.02; p = 0.017), and glenoid type other than A1 (OR, 4.47; 95% CI, 2.24–8.94; p < 0.001). The 33 shoulders with Type A1 glenoids tended to have diagnoses other than osteoarthritis (avascular necrosis in 13, cuff tear arthropathy in nine, osteoarthritis in six, posttraumatic arthritis in four, and capsulorrhaphy arthropathy in one). Neither preoperative glenoid version nor posterior decentering of the humeral head on the glenoid was associated with the 2-year outcomes.

Table 3 Statistical analysis results showing odds ratio (OR) for better outcome

Creating and Testing a Predictive Model

We developed a model predictive of a better result; it was driven mainly by the absence of a relationship of the shoulder problem to the patient’s work, a low preoperative SST score, no prior surgery on the shoulder, no superior displacement of the humeral head on the AP radiograph, glenoid pathoanatomy other than A1, and a low patient ASA class. The other factors selected by the AIC method for inclusion in the model were age, gender, anxiety/depression, smoking, and alcohol consumption (Table 3). The enhanced model additionally accounted for the 43 patients missing 2-year followup data. The area under the ROC curve generated from the cross-validated enhanced predictive model (Fig. 6) was 0.79 (values of 0.7 to 0.8 are generally considered fair and values of 0.8 to 0.9 good). The false-positive and true-positive fractions depend on the cutoff probability selected (ie, the probability above which the prediction is classified as a better outcome) (Fig. 6). A cutoff probability of 0.68 yielded the best performance of the model, with cross-validation predictions of better outcomes for 236 patients (80%) and worse outcomes for 58 patients (20%); sensitivity of 91% (95% CI, 88%–95%); specificity of 65% (95% CI, 53%–77%); PPV of 92% (95% CI, 88%–95%); and NPV of 64% (95% CI, 51%–76%). Performance was slightly poorer with a cutoff probability of 0.65: the cross-validation procedure yielded predictions of better outcomes for 240 patients (82%) and worse outcomes for 54 patients (18%); sensitivity of 92% (95% CI, 88%–95%); specificity of 60% (95% CI, 53%–77%); PPV of 90% (95% CI, 87%–94%); and NPV of 63% (95% CI, 50%–76%).
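The four performance measures follow from a 2 × 2 table of cross-validated predictions against observed outcomes, with a "positive" meaning a predicted better outcome. The counts in this sketch (216 true-positive, 20 false-positive, 21 false-negative, and 37 true-negative predictions at the 0.68 cutoff) are not reported in the text; they are back-calculated from the stated totals (236 predicted better, 58 predicted worse, 237 observed better, 57 observed worse) and are the only split consistent with the rounded percentages. The normal-approximation (Wald) interval is an assumption about the method; with these counts it yields the intervals reported for the 0.68 cutoff. Function names are hypothetical.

```python
import math


def proportion_with_wald_ci(k, n, z=1.96):
    """Proportion k/n with a normal-approximation (Wald) 95% CI, in percent."""
    p = k / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return 100 * p, 100 * (p - half_width), 100 * (p + half_width)


def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV with Wald 95% CIs."""
    return {
        "sensitivity": proportion_with_wald_ci(tp, tp + fn),
        "specificity": proportion_with_wald_ci(tn, tn + fp),
        "PPV": proportion_with_wald_ci(tp, tp + fp),
        "NPV": proportion_with_wald_ci(tn, tn + fn),
    }


# Back-calculated (not reported) counts for the 0.68 cutoff.
summary = diagnostic_summary(tp=216, fp=20, fn=21, tn=37)
for name, (est, lo, hi) in summary.items():
    print(f"{name}: {est:.0f}% (95% CI, {lo:.0f}%-{hi:.0f}%)")
```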

Fig. 6

The receiver operating characteristic (ROC) curve for the predictive model is shown. The true-positive fraction is the sensitivity. The false-positive fraction is 1 minus the specificity. The color scale on the right represents the model-based probability of a better outcome; selecting different probabilities of a “better” outcome as a cutoff yields different sensitivities and specificities. As the ROC curve is constructed through a 10-fold cross-validation process, the area under the curve estimates how well the model based on 90% of the sample performs at predicting the outcome for the remaining 10% of the patients.

These results may help inform decision-making in practices similar to ours: in rank order, the ORs for a better outcome are (1) shoulder problem not related to work, 5.36 (95% CI, 2.15–13.37); (2) glenoid type other than A1, 4.47 (95% CI, 2.24–8.94); (3) humeral head not elevated on the AP radiograph, 2.14 (95% CI, 1.15–4.02); (4) ASA Class I, 1.94 (95% CI, 1.03–3.65); (5) no prior surgery on the shoulder, 1.79 (95% CI, 1.18–2.70); and (6) each 1-point lower preoperative SST score, 1.32 (95% CI, 1.23–1.42). To indicate how these results might be applied in practices with characteristics similar to ours, we incorporated these ORs into an example of a possible outcome estimator (Table 4).
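Under a logistic model without interaction terms, ORs for separate factors combine multiplicatively on the odds scale. The sketch below multiplies the ORs listed above for a patient relative to a hypothetical reference patient who has none of the favorable factors and the same values of the other model covariates (age, gender, and so on). Converting to a probability requires a reference probability, which is not given in this section and is supplied here as a hypothetical placeholder. The dictionary keys and function names are illustrative and do not reproduce Table 4.

```python
import math

# Odds ratios for a better outcome, as listed in the text. Keys are illustrative names.
ODDS_RATIOS = {
    "not_work_related": 5.36,
    "glenoid_not_A1": 4.47,
    "head_not_superiorly_displaced": 2.14,
    "asa_class_1": 1.94,
    "no_prior_surgery": 1.79,
}
OR_PER_POINT_LOWER_SST = 1.32


def combined_odds_ratio(favorable_factors, sst_points_lower=0):
    """Multiply the ORs (add log ORs) for the favorable factors a patient has,
    relative to a reference patient with none of them and a preoperative SST
    `sst_points_lower` points higher."""
    log_or = sum(math.log(ODDS_RATIOS[f]) for f in favorable_factors)
    log_or += sst_points_lower * math.log(OR_PER_POINT_LOWER_SST)
    return math.exp(log_or)


def probability_from_reference(reference_probability, odds_ratio):
    """Convert a reference probability to odds, scale by the OR, convert back."""
    reference_odds = reference_probability / (1.0 - reference_probability)
    odds = reference_odds * odds_ratio
    return odds / (1.0 + odds)


# Hypothetical patient; the 0.5 reference probability is a placeholder, not a study value.
ratio = combined_odds_ratio(["not_work_related", "no_prior_surgery"], sst_points_lower=3)
print(round(ratio, 2), round(probability_from_reference(0.5, ratio), 3))
```

Working on the log-odds scale and exponentiating at the end mirrors how a fitted logistic model adds coefficients; the CIs of the individual ORs are not propagated in this sketch.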

Table 4 Example of an estimator for a better outcome at 2 years

Discussion

Although the different types of shoulder arthroplasty are effective in treating most patients with the various forms of glenohumeral arthritis, the benefit patients derive from these procedures varies widely for reasons that are not yet well defined. Predicting the likely result for each patient has been difficult because the important prognostic baseline patient and shoulder characteristics associated with better or worse clinical outcomes have not been identified. We attempted to address this gap in knowledge, aiming to determine the factors that are predictive of a better outcome for patients having a shoulder arthroplasty and to incorporate these factors in a predictive model.

The factors associated with better outcomes were ASA Class I, a shoulder problem unrelated to the patient’s work, a lower initial SST score, no prior surgery on the shoulder, a humeral head that was not superiorly decentered in relation to the glenoid on the preoperative AP radiograph, and shoulders with a glenoid type other than A1 [15]. Although the association of A1 glenoid types with inferior results may appear counterintuitive, shoulders with Type A1 glenoids were more likely to have diagnoses associated with poorer prognoses, such as posttraumatic arthritis, avascular necrosis, chondrolysis, or cuff tear arthropathy.

In contrast to the findings of some prior studies [13, 14, 30, 31, 56, 60], preoperative radiographic glenoid retroversion, glenoid biconcavity, and posterior decentering of the humeral head on the glenoid were not associated with worse outcomes in our patients. This may be because the surgical techniques used were effective in managing the potentially adverse effects of these pathoanatomic factors [29]. The predictive model we created showed fair discrimination in 10-fold cross-validation (area under the ROC curve, 0.79).

Previous authors have attempted to identify predictors of the outcome of shoulder arthroplasty using smaller patient cohorts. Iannotti and Norris [31] conducted a multicenter clinical outcome study of 128 shoulders in 118 patients with primary osteoarthritis, finding, in contrast to our results, that preoperative posterior humeral head subluxation was associated with inferior outcomes. In a prospective study, Simmen et al. [53] used a 1-year postoperative Constant-Murley score of 80 or more as the definition of a successful outcome and found that only 47 of 140 (34%) shoulder arthroplasties were successful; like ours, their study found that shoulders with previous surgery did less well.

In a prior study completed before initiation of this investigation, Gilmer et al. [22] found that for 176 patients having the ream and run procedure and considering the final SST score as the primary outcome metric, preoperative glenoid retroversion, preoperative glenoid biconcavity, and preoperative posterior decentering of the humeral head on the glenoid were not associated with poorer outcomes. These findings are consistent with the results presented here. Of interest, in both studies the arthroplasties were performed without attempting to correct glenoid version, yet no revisions were required for posterior instability or glenoid component failure. Gilmer et al. [22] did not include types of arthroplasty other than the ream and run or diagnoses other than osteoarthritis in their study.

The results of the current study need to be interpreted in light of some limitations. First, the patients were those of two experienced surgeons (FAM, WJW), with similar selection criteria for the different surgical interventions and similar surgical and rehabilitation techniques; thus the outcomes reported here may not be generalizable to other practices with different combinations of patients, surgeons, diagnoses, and procedures. Second, the study design sought outcomes at a defined 2-year time point; because some adverse outcomes, such as glenoid component failure, may not appear until 5 or more years after the index procedure, results with longer followup might differ [47, 60]. Third, although this appears to be the largest truly prospective study of its type, the sample size of 337 shoulders is still relatively small; in the future these results should be refined through study of a larger population, ideally one drawn from multiple practice sites.

We found six factors that were significantly associated with better outcomes: ASA Class I, a shoulder problem unrelated to the patient’s work, a lower initial SST score, no prior surgery on the shoulder, a humeral head that was not superiorly decentered in relation to the glenoid on the preoperative AP radiograph, and shoulders with a glenoid type other than A1.

The clinical significance of these results is that they suggest careful selection of candidates for shoulder arthroplasty may be an effective means of optimizing the outcomes of these procedures. In considering elective surgery for glenohumeral arthritis, patients lacking the factors associated with a better outcome may be advised to consider nonoperative rather than surgical treatment. If these patients elect to proceed despite the presence of adverse factors, they may be counseled not to expect the same quality of outcome as individuals without these factors.