Introduction

Patients 65 yr of age and older comprise an ever-increasing proportion of the surgical population.1,2 This increase in the number of older people having surgery reflects the substantial aging of Western populations. Advanced age is an important predictor of adverse postoperative outcomes,3,4,5 such as a two- to four-fold increase in morbidity and mortality compared with younger age.6 Multistakeholder partnerships have identified improving outcomes for older patients, as well as understanding what outcomes matter most to patients, as top priorities in anesthesia research.7,8 Nevertheless, understanding which outcomes matter most to older people is largely unstudied in perioperative care.9

Not knowing which outcome measures matter most to patients is a barrier to meaningful healthcare improvement.10 While substantial progress has been made in defining core outcome sets in perioperative medicine11 and in improving care of older adults in general,9 few data address the unique needs and perspectives of older surgical patients. This is especially important as older people may have specific preferences regarding prioritization of function over survival,12 as well as considerations about postoperative support and transitions out of hospital. Recent focus on patient-reported outcomes represents an important advancement in generating evidence that is relevant to patients, but many patient-reported outcomes exist and their uptake remains limited.13 Most studies continue to focus on routinely measured postoperative outcomes, such as in-hospital or 30-day mortality, complications, and length of stay.14

This work aimed to address the lack of knowledge around outcome prioritization for older patients having elective noncardiac surgery. Our primary objective was to directly engage older people one year after surgery and ask them to rate the importance of commonly measured core outcomes for older patients and perioperative medicine. Our secondary objective was to elicit open-ended responses to identify other high priority outcomes, while we also explored whether subgroups of older patients may exist based on their differential prioritization of outcome measures.

Methods

Design and study setting

This was a cross-sectional study nested within a multicentre prospective cohort study conducted at three hospitals in Ottawa, Ontario, Canada.15,16,17 Patients were enrolled in the main study during their preanesthesia clinic visit at either the Civic Campus of The Ottawa Hospital (which provides tertiary care for neuro, vascular, gynecologic, spine, and general surgery patients), the General Campus of The Ottawa Hospital (which provides tertiary care for oncology, orthopedic, thoracic, and head and neck surgery patients), or Hôpital Montfort (a community-oriented centre serving a largely francophone population receiving orthopedic, general, urologic and gynecologic surgery). Together, these hospitals account for approximately 95% of adult major noncardiac surgeries in our local health region, which serves a catchment of approximately 2 million people. The main study evaluated the association of frailty with patient-reported disability after surgery and revealed increased disability rates in those with frailty 90 days after surgery,18 but a nonsignificant difference at one year.17 Findings were reported using appropriate standards for Bayesian analysis, observational studies, and qualitative research.19,20,21,22

Study population, inclusion, and exclusion criteria

Individuals who were at least 65 yr old on the day of major elective noncardiac surgery (i.e., expected length of stay of at least two days, not involving the heart or pericardium) who spoke English or French and who would be reachable by telephone after surgery were eligible for inclusion. An inability to answer outcome scales of the primary study was the only exclusion criterion. Patients were initially recruited in the preoperative assessment unit at each centre, and the nested study involved the final consecutive participants in the primary study, each contacted one year after their initial surgery once an amendment to our study protocol and ethical approval update was granted by the respective ethics review boards (Protocol Approval #20150342-01H and DM-31-08-15; amendment approval date, 20 February 2018).

Outcomes and ascertainment processes

Upon contact by telephone one year after surgery, participants were asked to rate the importance of six postoperative outcomes. Outcome importance was rated on an 11-point Likert scale where 1 represented “not at all important, should not be studied” and 10 represented “extremely important, should always be studied”. Responses were integers; no non-integer values (such as 7.5) were permitted. As no core outcome sets existed for older surgical patients at the time of our study, we identified outcomes that are routinely collected for surgical patients (occurrence of a complication, hospital length of stay, and discharge disposition) and three patient-centred outcomes (not developing a new disability [also known as disability-free survival],23 disability score on a 100-point scale,24 and days at home after surgery)25 (see eAppendix 1 for specific wording of the questionnaire in the Electronic Supplementary Material [ESM]) that are included in core outcome sets for perioperative medicine and care of older adults. To avoid participant burden, we did not ask about survival as this is already part of core outcome sets for older people and was already identified as important.9 Nevertheless, to ensure broad input, participants were also given the opportunity to describe any other outcomes that they thought should be considered in studies of surgical patients similar to themselves. To avoid question-order bias, question orders were randomized for each participant.

Participant characteristics and covariate data

Baseline demographic characteristics (age, sex), medical comorbidities (using Elixhauser diagnoses),26 cognitive impairments,27 frailty status (using the Clinical Frailty Scale),28 and depression and anxiety status were recorded,29 as recommended by practice guidelines for preoperative assessment of older individuals.30

Analysis

Analyses were conducted using SAS version 9.4 for Windows (SAS Institute, Cary, NC, USA) and R programming language (R Foundation for Statistical Computing, Vienna, Austria). We conducted descriptive analyses of the full cohort. Continuous variables are described using means and standard deviations (normally distributed based on visual inspection) and medians and interquartile ranges (skewed distribution based on visual inspection). Categorical variables are described using proportions.

Outcome prioritization

We compared the importance ratings for each outcome to estimate the probability that a given outcome was rated to be more important than each of the other outcomes by participants in our sample. Unlike the more commonly used frequentist approach to analysis (which computes the probability that data as extreme or more extreme would be observed if the difference in ratings were truly zero after repeating the same experiment with the same assumptions many times), we chose to use a Bayesian analysis framework. Bayesian analysis allowed us to calculate the probability of one outcome being rated higher than the comparator, based on our data and prior beliefs. We did not rank outcomes by priority, but performed pairwise comparisons. Because we had little data to inform our beliefs, weakly or noninformative priors were used.

To compare outcome ratings, we used multivariate (i.e., multiple dependent variables [each outcome]) regression models that included only an intercept term, with no predictor variables (as the intercepts represented an estimate of the central value of each outcome’s rating) in the ‘brms’ package in R.31 Default, weakly informative priors based on the t distribution with three degrees of freedom and scale of 10 (this distribution has little influence on the parameters estimated, but improves sampling efficiency by making unrealistically large or small values less likely) were used for the intercept and σ parameters. Our primary approach was to use linear regression, where the intercept estimated the mean response for each outcome. To test the robustness of these findings, and because Likert scale responses may not adhere to assumptions of linear regression, we repeated our analysis using quantile regression (where the model intercepts estimated the median values). For both models, we estimated the posterior distribution of the intercept for each outcome and then performed pairwise comparisons to estimate the proportion of posterior samples in which the estimate was higher for one outcome than the other. These estimates did not directly equate to the number of participants who rated a given outcome to be higher priority than another, but instead measured the proportion of all samples that composed the posterior distribution where the intercept (i.e., the average estimated priority rating) for a given outcome was greater than the estimated intercept for the comparator outcome. These simulation results were the output that combined contributions from the prior distribution and the likelihood derived from the data and were generated from Markov Chain Monte Carlo simulations using a Hamiltonian Monte Carlo sampler. This allowed us to calculate a probability that a given outcome was rated higher than each other outcome.
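The pairwise comparison step can be illustrated with a minimal sketch. The study's models were fitted with brms in R, and the Python below is not that code. It assumes that posterior draws of each outcome's intercept have already been extracted and are aligned by draw index; the function name `pairwise_exceedance` and the toy draws are hypothetical and are not study results.

```python
from itertools import permutations


def pairwise_exceedance(draws):
    """Return P(intercept_a > intercept_b) for every ordered pair of outcomes.

    `draws` maps an outcome name to its list of posterior draws. All lists
    must have the same length, and index i must refer to the same joint
    posterior draw in every list.
    """
    lengths = {len(values) for values in draws.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("all outcomes need the same, nonzero number of draws")
    n_draws = lengths.pop()

    probabilities = {}
    for a, b in permutations(draws, 2):
        # Count the joint draws in which outcome a's intercept exceeds b's.
        higher = sum(1 for x, y in zip(draws[a], draws[b]) if x > y)
        probabilities[(a, b)] = higher / n_draws
    return probabilities


# Hypothetical draws for illustration only; these are NOT study results.
toy_draws = {
    "complications": [9.1, 9.3, 9.0, 9.4],
    "length_of_stay": [7.8, 7.6, 9.2, 7.9],
}
probs = pairwise_exceedance(toy_draws)
print(probs[("complications", "length_of_stay")])  # 3 of 4 draws are higher: 0.75
```

Because the comparison is made draw by draw, any posterior correlation between the intercepts is carried through to the resulting probability.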
The performance of each model was assessed by ensuring adequate effective sample sizes (with values >1,000 being considered adequate as a measure of independent information derived from our Markov Chains), Rhat values (which evaluate convergence of Markov Chains by comparing between- and within-chain variances for each parameter; optimal = 1.00, with larger values indicating nonconvergence), and through visual inspection of burn-in plots, autocorrelation plots, and posterior density plots (using the ShinyStan package).32 Annotated code is provided in ESM eAppendix 2.
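The idea behind Rhat, comparing between-chain and within-chain variance, can be sketched as follows. This is the classic Gelman-Rubin form, shown only as an illustration; it is an assumption that it approximates the diagnostic reported by the software used in the study, which may apply a refined variant (for example, one that splits chains). The function name `gelman_rubin_rhat` and the toy chains are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic (non-split) Rhat for one parameter.

    `chains` is a list of equal-length lists of draws, one list per chain.
    """
    n_chains = len(chains)
    n_draws = len(chains[0])
    if n_chains < 2 or n_draws < 2 or any(len(c) != n_draws for c in chains):
        raise ValueError("need at least 2 chains of equal length (>= 2 draws)")

    chain_means = [mean(c) for c in chains]
    # Between-chain variance B: n times the sample variance of the chain means.
    between = n_draws * variance(chain_means)
    # Within-chain variance W: average of the per-chain sample variances.
    within = mean([variance(c) for c in chains])
    if within == 0:
        raise ValueError("within-chain variance is zero")

    # Pooled estimate of the posterior variance, then the ratio to W.
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return sqrt(pooled / within)


# Hypothetical chains for illustration only.
overlapping = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]]
separated = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
rhat_overlapping = gelman_rubin_rhat(overlapping)
rhat_separated = gelman_rubin_rhat(separated)
print(rhat_overlapping, rhat_separated)
```

Chains that explore the same region give a value near 1, whereas chains stuck in different regions inflate the between-chain term and push the value well above 1.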

We also completed a thematic analysis of qualitative data provided by our open-ended question according to the methods of Braun and Clarke.33 Two reviewers conducted this analysis. First, they jointly created a set of codes using an inductive approach. Next, each reviewer independently assigned codes to each response and reviewers sought consensus. The same process was repeated to organize the codes into larger themes.

Cluster analysis

We performed an exploratory analysis that aimed to identify whether certain clusters of participants were evident based on their importance ratings across each outcome. To identify clusters, we used k-means clustering techniques (SAS, PROC FASTCLUS), which identified clusters by grouping related individuals based on minimization of differences (i.e., distances) in a series of continuous measures (in this case, the outcome prioritization scale for the six outcomes). The optimal number of clusters was identified by calculating the F statistic and cubic clustering criterion for a preplanned number of models with three to a maximum of eight clusters. We also aimed to avoid clusters with fewer than six members, as description of characteristics for small sized clusters would violate healthcare privacy legislation (in other words, we could not report cell sizes < 6). The optimal number of clusters was identified where values of the F statistic and cubic clustering criterion no longer increased substantially with additional clusters and where no clusters had fewer than six members. We then identified the number and proportion of individuals in each cluster and their mean values for each outcome rating and compared the baseline characteristics and study outcome measures across clusters.
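The clustering and model-selection logic described above can be sketched in a few lines. The study used SAS PROC FASTCLUS; the Python below is a plain Lloyd's-algorithm illustration and not the authors' code. It assumes that the F statistic referred to above is the pseudo-F (between-cluster to within-cluster variance ratio); the function names, starting centers, and two-variable toy ratings are hypothetical (the study clustered on six ratings).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm from the given starting centers.

    Returns (labels, centers), where labels[i] is the cluster index of points[i].
    """
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are final
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = centroid(members)
    return labels, centers


def pseudo_f(points, labels, centers):
    """Between-cluster mean square divided by within-cluster mean square."""
    n, k = len(points), len(centers)
    grand = centroid(points)
    within = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    between = sum(
        labels.count(j) * squared_distance(centers[j], grand) for j in range(k)
    )
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical ratings on two outcomes, for illustration only.
ratings = [[9, 9], [10, 9], [9, 10], [2, 3], [3, 2], [2, 2]]
labels, centers = kmeans(ratings, centers=[[9, 9], [2, 2]])
f_stat = pseudo_f(ratings, labels, centers)
smallest_cluster = min(labels.count(j) for j in range(len(centers)))
print(labels, round(f_stat, 1), smallest_cluster)
```

In practice the same calculation would be repeated for each candidate number of clusters, with the pseudo-F values and the smallest cluster size compared across models before a final solution is chosen.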

Sample size and missing data

No formal sample size estimate was pre-established; our final sample was based on the number of one-year follow-up calls remaining in the main study when ethical approval for the substudy was granted. Nevertheless, the available sample did inform our analyses, as rules of thumb recommend: (1) for k-means clustering, a sample size of at least 2^m,34 where m = number of clustering variables (in our study, 2^6 = 64); and (2) for linear regression, a sample size of at least 50 or 100 to estimate a model intercept.35,36 All participants responded to the questionnaires, meaning that no outcome data were missing; all participants were consecutive, avoiding issues of response bias. No adjustment for multiple testing of outcome importance ratings was required as all comparative statistical testing was conducted under a Bayesian framework.

Results

We surveyed 101 consecutive older people one year after elective, inpatient noncardiac surgery. Patients most commonly had orthopedic, thoracic, urologic, or gynecologic surgery; most lived with comorbidity; and one third were frail. Over a quarter screened positive for mild to moderate cognitive dysfunction (Table 1).

Table 1 Study population characteristics

Prioritization ratings

The mean and median ratings for each outcome (along with measures of dispersion) are provided in Fig. 1. The raw numbers and proportions of higher, tied, and lower scores for each comparison are provided in ESM eAppendix 3. Complications and discharge location had the highest mean values, although all were rated ≥ 7.7/10. Four of six outcomes had a median rating of 10. Analyses using linear regression estimated that complications had a larger probability of a high rating than any other outcome (> 99% vs length of stay and days at home, 57% vs discharge location). Discharge location had a larger probability of a higher rating than all outcomes except for complications. Disability and not developing a new disability had larger probabilities of high ratings than length of stay and days at home, but lower than discharge location and complications (see Fig. 2 for all pairwise comparison probabilities). Code and statistical output are provided in ESM eAppendix 2. Similar results were found using quantile regression (ESM eAppendix 4).

Fig. 1

This figure represents the probability distributions for the mean ratings of each outcome assessed. The circle represents the point estimate for the median of the highest posterior density interval, the thick line represents the 50% credible interval, and the thin line represents the 95% credible interval. DAH = days at home; DFS = not developing a new disability; LOS = length of stay

Fig. 2

This figure reports the pairwise probabilities that one outcome was more highly prioritized than another based on a Bayesian multivariate linear model with weakly informative prior distributions. The arrow size corresponds to probability, and probabilities presented represent the probability that the outcome most proximal to the number is more highly prioritized than the comparator

Open-ended responses

Forty-three (43%) respondents indicated that there were no other outcomes that they thought should be prioritized in perioperative research for older people. Nevertheless, among the 58 who did provide responses, 64 unique recommendations were made. Procedure-specific issues were most commonly identified as outcomes for prioritization (n = 23; 36%), e.g., “I would have liked to know about getting a tube up my nose”; “My bladder function really changed, I wasn’t ready for that”; and “Eating habits really changed [after surgery], this is a big change and I needed [more] information”.

Long-term physical recovery was also a consistent concern (n = 20; 31%), e.g., “[I wish I’d known] that the recovery [..] would take a whole year to get back to running”; “What a long-haul recovery is [..], I did not recover well and still have muscle issues”; “I wanted to know the timeline of recovery. How long until I can [expect to do] certain activities”; and “[I needed] more information about recovery from surgery and what life after [surgery] looks like”.

The third most common open-ended theme was the need for post-discharge support services (n = 9; 14%), e.g., “The need for physio [after surgery]”; and “[Home care needs] should be identified and set up in advance. I wish I knew about this [need] ahead of time”. A full list of themes, occurrences, and supporting quotes is provided in ESM eAppendix 5.

Cluster analysis

Our analyses identified three as the most appropriate number of clusters. Although models with up to eight clusters were planned, we tested models with up to five clusters only, because at five clusters several consistently had fewer than six members. With three clusters, our F statistic was higher than with four or five clusters, the cubic clustering criterion was not substantially smaller (and exceeded the minimum value of 2.0 that indicates good clustering),37 the R² value was not substantially smaller than with four clusters, and no cluster had fewer than six members (see ESM eAppendix 6).

Within the identified clusters, the largest (n = 76; cluster 1) had high mean importance ratings for all outcomes (8.9–9.4). The second largest cluster (n = 19; cluster 2) had high ratings for complications, discharge, disability score, and not developing a new disability (means all > 8.0), but low ratings for days at home and length of stay (5.1 and 3.3, respectively). The smallest cluster (n = 6; cluster 3) had low ratings for all outcomes (2.8–5.3). Participant characteristics within each cluster are reported in Table 2, where individuals in cluster 2 appeared to have greater frailty, multimorbidity, and higher American Society of Anesthesiologists Physical Status scores than those in clusters 1 and 3.

Table 2 Study population stratified by cluster

Discussion

In this prospective, nested cross-sectional study of older people one year after major elective noncardiac surgery, we found that avoidance of major medical or surgical complications and being discharged home were the most highly prioritized outcomes among three routine and three patient-reported outcomes that are often used in perioperative research. Nevertheless, disability score, not developing a new disability, length of stay, and days at home were all highly rated (≥ 7.7/10), suggesting that commonly recommended outcomes are reassuringly relevant to older people. Furthermore, expected physical recovery trajectory and the importance of procedure-specific impacts of surgery were highlighted in open-ended responses. Across participants, there was evidence of differing priorities, with more vulnerable older people placing greater priority on disability-related outcomes and lower priority on length of stay or days at home. Together these findings should inform future research and practice specific to older surgical patients and suggest the potential need to personalize approaches to outcome measurement.

Core outcome sets are an identified minimum set of outcomes that should be recorded and reported in all studies for a given disease or population.38 Several core outcome sets relevant to older surgical patients exist, including those specific to perioperative care (e.g., Standardized Endpoints for Perioperative Medicine [StEP], a comprehensive core outcome set covering 12 domains relevant to perioperative medicine)11 as well as those related to healthcare for older people more generally.9 While each provides useful insights when identifying important outcomes, both suffer from limitations when addressing outcomes most relevant to older surgical patients. First, neither StEP,11 nor Akpan et al., directly engaged patient partners (although Akpan et al. did receive perspectives from a six-member focus group of older people),9 meaning that recommendations were not generated in a patient-oriented manner.39 Next, older surgical patients are a unique subset of the surgical population with specific preferences, underlying risk factors, and expected recovery trajectories.40 Therefore, one would expect them to have unique perspectives on what outcomes they most highly prioritize. The current study helps to address the important knowledge gap related to outcome prioritization for older surgical patients. Based on responses from 101 consecutive patients with perioperative experience, three routinely collected outcomes (complications, length of stay, and discharge location) and three increasingly studied patient-centred outcomes (disability score, not developing a new disability and days at home) were all prioritized by older people one year after their surgery. This suggests that much of the evidence currently being generated in clinical research is likely relevant to older people.

While it is reassuring that the six outcomes studied were all prioritized, our findings also provide insights into how future research can increase its relevance by focusing on the priorities of older people. Although complications and discharge location had the greatest probability of being highest rated, substantive differences appear to exist between health- and function-specific outcomes (complications, discharge location, disability score, and not developing a new disability) compared with more system-related outcomes (days at home and length of stay). This may reflect patients’ understanding that serious complications can lead to poor recovery and longer term adverse outcomes after surgery.41,42 The high prevalence of loss of independence and non-home discharge in older patients and those with geriatric conditions like frailty18,43,44 is consistent with previous studies showing that older people value function and quality of life as much or more than survival after an episode of acute illness.45 Disability-related outcomes also reflect the prioritization of function, and reflect patient-reported outcomes that are increasingly valued as they directly reflect the patient experience without interpretation by the clinical or research team.46,47 Lower prioritization of system-related outcomes of length of stay and time spent away from home may reflect patients’ recognition that time away from home is to be expected after surgery and in some ways is an investment toward longer term positive outcomes.

Finally, our results indicate that personalizing care, which has typically focused on identifying individuals’ unique risk profiles or expected treatment responses,48,49,50 may also require personalizing the manner in which the success of a surgical procedure is judged. Although exploratory, our data suggest that there may be subgroups of patients who differentially prioritized the six outcomes evaluated. Future research with larger samples would strengthen the certainty of this phenomenon. This was in keeping with our qualitative results from open-ended questions, which highlighted the need for specific and individualized information to allow patients to better understand the details of their planned procedure, as well as to plan for their transition home. In particular, although we identified procedure-specific information as a core theme, quotes supporting this theme were diverse and reflected both processes (such as need for different types of invasive lines and tubes), as well as specific personal impacts on day-to-day life. This suggests that the preoperative period could be used to provide patients with better procedure-specific education to optimize their understanding of the perioperative journey. Similarly, quotes within the physical recovery theme reflected both high-capacity function (such as running), as well as more basic impacts on activities of daily living.
Moving forward, evaluation of successful surgery could use goal attainment scaling, a method of scoring the extent to which patients’ individual goals are achieved in the course of an intervention.51 This approach was first introduced for assessing outcomes in mental health settings and is suitable for health problems that warrant a multidimensional and individualized approach to treatment planning and outcome measurement.52 Importantly, goal attainment scaling has been shown to be feasible for older adults as a strategy to facilitate patient-centred care and suggests that the process of personalized goal-setting itself may facilitate goal attainment.53,54

Strengths and limitations

This study should be appraised in terms of its strengths and limitations. First, we studied 101 consecutive older people who had undergone major elective noncardiac surgery from a multicentre study that achieved > 80% enrolment of eligible participants. Therefore, our findings should be generalizable to similar patients. Nevertheless, our sample includes people having a variety of surgical procedures, which may have introduced heterogeneity. To avoid question response bias, participants were asked to prioritize outcomes in a randomized order using standardized prompts. That said, to avoid participant burden (as this was a substudy of a larger project), we only asked about six outcomes. Therefore, we cannot comment on relative prioritization versus other outcomes known to be important (e.g., mortality) or commonly studied perioperatively (e.g., pain, nausea and vomiting, and quality of recovery). Additionally, complications are variably defined in the perioperative literature (for example, the National Surgical Quality Improvement Program,55 International Surgical Outcomes Study,56 Clavien–Dindo classification,57 and Post-Operative Morbidity Survey58 are all commonly used but apply different criteria to define a complication), and are probably variably interpreted by patients. Other outcomes queried may also have been variably interpreted by participants, suggesting that future research may need to focus on developing a deeper understanding of how patients understand outcome labels routinely applied by researchers. We also do not have a clinically important difference available for the Likert scale used. Our open-ended query did allow participants to provide additional insights, but these data were not structured and required qualitative (as opposed to quantitative) analysis.
These qualitative data may be further limited by the fact that the interaction was over the telephone after completing outcome questionnaires from the main study (which could both influence how participants were thinking, as well as limit the degree to which they wanted to have further discussions). Qualitative research must also be interpreted within the context of those performing the analysis (often called reflexivity). In the current study, both coders were anesthesiologists actively involved in preoperative and intraoperative care, while one runs a geriatric surgery research program. These experiences may influence how qualitative data were interpreted. Our results would have been further strengthened by inclusion of a representative patient partner in study planning, analysis, and interpretation; unfortunately, the study was conducted without direct contributions by such a partner. Finally, participation required surviving to one year after major surgery, meaning that we cannot infer the preferences of individuals who did not survive the full year after surgery.

Conclusion

One year following major elective noncardiac surgery, older people most highly prioritize health- and function-related outcomes over system-related outcomes. Nevertheless, all studied outcomes were highly rated, suggesting that much of the data currently collected in perioperative outcome studies is relevant to older patients. Personalization of outcomes may represent a means to further improve the relevance of perioperative research to older patients.