With an aging US population, frailty is increasingly recognized as an important predictor of healthcare outcomes beyond geriatric medicine [1–3]. Frailty is defined as a decrease in physiologic reserves giving rise to vulnerability separate from the normal aging process [4]. It reflects an individual patient’s physiologic reserve and resistance to stressors, and it is assessed primarily in older adults. With over half of all operations in the USA being performed on patients 60 years and older, the fastest growing segment of the US population, the ability to identify frailty is more important than ever [5].

Although many providers caring for the elderly recognize frailty when they see it, no single standardized clinical assessment tool for quantifying frailty has been established and universally applied [6]. Two major frailty assessment models have been described. The first, largely applied in the geriatric medicine literature, is a physical phenotype model described by Fried et al. [7]. The second is a multiple domain aggregate model based on the concept of ‘accumulating deficits’ [7]. Both models are cumbersome and require additional assessment of patients, which makes them difficult to apply in real time when making decisions about operative risk as it relates to frailty. They are also difficult to apply retrospectively to large population-based datasets.

Given these limitations, researchers developed an 11-item modified frailty index (mFI) based on the accumulating deficits model. This index was created for application to the NSQIP dataset and has been described and validated in several surgical studies [8]. The 11-item mFI was created by mapping the 70 items in the Canadian Study of Health and Aging (CSHA), a large population-based index, to 11 corresponding variables captured by NSQIP [8]. Our group recently used the 11-item mFI to study the impact of frailty on outcomes following paraesophageal hernia (PEH) repair [9]. We demonstrated that frailty could be assessed using the 11-item mFI and NSQIP data. However, a significant portion of eligible patients with appropriate ICD-9 and CPT codes for study inclusion were ultimately excluded because they were missing 1 or more of the variables in the 11-item mFI. We hypothesized that frailty could be assessed using NSQIP with fewer than 11 variables and that doing so would allow us to include a greater number of subjects from NSQIP in our study. We felt that a shorter index would be particularly useful from a practical standpoint if it were to be used in real-time decision making with patients. We also felt that such an index would make it more feasible to use NSQIP to study the impact of frailty on outcomes following relatively rare procedures or conditions.

Materials and methods

The American College of Surgeons National Surgical Quality Improvement Program (NSQIP) participant user file (PUF) for years 2011–2013 was used for this study. Under the Medical College of Wisconsin policy on publicly available datasets with de-identified patient information, this study was exempt from Institutional Review Board approval and from the requirement to obtain informed consent. Patients were included if they underwent surgery with CPT and ICD-9 codes consistent with paraesophageal hernia repair (Table 1) and if they were ≥60 years old at the time of surgery.

Table 1 CPT procedure and ICD-9 diagnosis codes for paraesophageal hernia

We created a 5-item mFI using a subset of 5 variables from the 11-item mFI (Table 2). We based our frailty index on the 5 most consistently reported variables out of the 11 total frailty variables in the NSQIP dataset. The 6 excluded variables were frequently missing: 75 % of patients were missing one or more of these 6 data points in the dataset, while 84 % of all eligible patients had all 5 of the variables required for our mFI. The frailty index was calculated for each patient by counting how many of the 5 variables were present for that patient, with 0–5 total points possible.
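As an illustration of this scoring rule, the following Python sketch counts the deficits present for one patient and returns no score when any of the five variables is missing, so that the patient would be excluded. The field names are placeholders introduced here for illustration, not NSQIP variable names; the actual variables are those listed in Table 2.

```python
from typing import Mapping, Optional

# Placeholder names (assumption): Table 2 lists the actual five variables.
MFI5_ITEMS = ("item_1", "item_2", "item_3", "item_4", "item_5")


def mfi5_score(record: Mapping[str, Optional[bool]]) -> Optional[int]:
    """Return the 0-5 count of deficits present, or None if any item is missing."""
    values = [record.get(name) for name in MFI5_ITEMS]
    if any(value is None for value in values):
        return None  # incomplete record: no index can be calculated
    return sum(1 for value in values if value)


# Hypothetical patient with two of the five deficits present.
patient = {
    "item_1": True,
    "item_2": False,
    "item_3": True,
    "item_4": False,
    "item_5": False,
}
print(mfi5_score(patient))  # 2
```

Returning None rather than treating a missing variable as absent mirrors the complete-case approach described above, in which only patients with all 5 variables recorded are scored.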

Table 2 The 11-item modified frailty index (mFI) variables mapped to the 5-item mFI

The 5-item mFI scores were categorized as 0, 1, and 2+ for the purpose of regression analysis. These score categories were a departure from our original publication analyzing the applicability of the 11-item mFI, which used categories of 0, 1, 2, and 3+. The decision to replace the 3+ category with 2+ was made to better represent the compressed scale, as the 5-item mFI resulted in a reduced frequency of patients with a score ≥3. Agreement between the 5-item mFI and the 11-item mFI was measured with a weighted Kappa statistic.
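A minimal Python sketch of the categorization and of a weighted Kappa calculation follows. It rests on two assumptions that are not stated in the text: that linear disagreement weights are used, and that both indexes are first collapsed onto the same ordered categories (here 0, 1, and 2+). The function names and the sample scores are hypothetical.

```python
from collections import Counter
from typing import Sequence


def categorize(score: int, cap: int = 2) -> int:
    """Collapse a raw score into categories 0, 1, ..., cap (cap means 'cap or more')."""
    return min(score, cap)


def weighted_kappa(a: Sequence[int], b: Sequence[int], n_categories: int) -> float:
    """Linearly weighted Cohen's kappa for two ratings coded 0..n_categories-1."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    observed = Counter(zip(a, b))   # joint counts of (rating a, rating b)
    row, col = Counter(a), Counter(b)  # marginal counts
    max_dist = n_categories - 1
    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = abs(i - j) / max_dist  # 0 on the diagonal, 1 at the extremes
            obs_disagreement += weight * observed[(i, j)] / n
            exp_disagreement += weight * (row[i] / n) * (col[j] / n)
    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - obs_disagreement / exp_disagreement


# Hypothetical raw scores for four patients on each index.
five_item = [categorize(s) for s in (0, 1, 2, 3)]
eleven_item = [categorize(s) for s in (0, 1, 3, 4)]
print(weighted_kappa(five_item, eleven_item, n_categories=3))  # 1.0 (perfect agreement)
```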

Descriptive statistics were calculated for patient characteristics (Table 3) and 30-day outcomes (Table 4) to confirm that the condensed mFI cohort represented a population similar to the 11-item mFI cohort. The severity of a postoperative complication was evaluated using the Clavien-Dindo classification system, which ranks surgical complications by the magnitude of the intervention required to manage the complication and by whether it results in permanent disability or death [10]. The postoperative complications tracked through NSQIP were grouped based on how they are treated in routine clinical practice under Clavien-Dindo grading criteria (Table 5) [11].

Table 3 Comparison of PEH patient characteristics for the 5-item mFI and the 11-item mFI
Table 4 Comparison of PEH 30-day outcomes for the 5-item mFI and the 11-item mFI
Table 5 Clavien-Dindo classification of surgical complications

Univariate analysis was performed using the Chi-square test for categorical variables and Kruskal–Wallis tests for continuous variables. Multivariate logistic regression models were used to determine whether the 5-item mFI was a significant predictor of readmission, mortality, discharge destination, and Clavien-Dindo grade, while controlling for variables that we determined to be associated clinically with adverse surgical outcomes, including ASA class, advanced age, surgical approach (open versus laparoscopic), wound class, and procedure timing (emergent versus non-emergent). Results of the multiple logistic regression analyses are presented as odds ratios (95 % CI). All p values were two-sided, and p < 0.05 was considered statistically significant. All analyses were completed using SAS software version 9.4 (SAS Institute, Cary, NC).
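The analyses in this study were run in SAS. Purely to illustrate the Chi-square test named above, the Python sketch below computes the Pearson Chi-square statistic and its degrees of freedom for a contingency table of frailty category by outcome. The counts are hypothetical and are not study data, and the function assumes that no row or column total is zero.

```python
from typing import List, Sequence, Tuple


def chi_square_statistic(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Return (Pearson chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals: List[float] = [sum(row) for row in table]
    col_totals: List[float] = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are mFI 0, 1, and 2+; columns are
# (no complication, complication).
counts = [[90, 10], [80, 20], [70, 30]]
print(chi_square_statistic(counts))  # (12.5, 2)
```

The statistic is then compared against the Chi-square distribution with the returned degrees of freedom to obtain a p value.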

Results

Of the 4434 PEH repairs performed in patients ≥60 years old during the study interval, 3711 patients (84 %) had all 5 items contained in the condensed mFI, while only 885 (20 %) had data corresponding to all items in the 11-item mFI. One or more of the 6 variables that were excluded from the 5-item index but included in the original 11-item index were missing 44 % of the time in the 2011 PUF, 69 % of the time in the 2012 PUF, and 100 % of the time in the 2013 PUF. For the 5 variables in the condensed mFI, one or more variables were missing with a relatively low frequency (2.5 % in 2011; 5 % in 2012; and 8.5 % in 2013).
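The completeness percentages follow directly from the reported counts, as this short Python check shows. Only the counts 4434, 3711, and 885 come from the text; the function name is introduced here for illustration.

```python
def percent_complete(n_complete: int, n_eligible: int) -> float:
    """Percentage of eligible patients with every index variable recorded."""
    return 100.0 * n_complete / n_eligible


ELIGIBLE = 4434          # PEH repairs in patients >= 60 years old
COMPLETE_5_ITEM = 3711   # all 5 condensed mFI variables recorded
COMPLETE_11_ITEM = 885   # all 11 original mFI variables recorded

print(round(percent_complete(COMPLETE_5_ITEM, ELIGIBLE)))   # 84
print(round(percent_complete(COMPLETE_11_ITEM, ELIGIBLE)))  # 20
print(round(COMPLETE_5_ITEM / COMPLETE_11_ITEM, 1))         # 4.2
```

The last line shows that the analyzable cohort is roughly four times larger with the 5-item index than with the 11-item index.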

In the larger pool of 3711 patients (5-item mFI), mortality occurred within 30 days of surgery in 1.2 % of patients, Clavien-Dindo grade ≥3 complications occurred in 5.9 % of patients, 8.8 % of patients were discharged to a place other than home, and 7.8 % of patients were readmitted to the hospital within 30 days of surgery (Table 4). A complete assessment of patient demographics for both the 11-item mFI and the 5-item mFI is shown in Table 3. A weighted Kappa analysis assessing the agreement between the 5-item mFI and the 11-item mFI demonstrated strong agreement, with a weighted Kappa value of 0.871 (CI 0.853–0.889), p < 0.001 (Fig. 1; Table 6).

Fig. 1

The scatterplot shows the 11-point mFI (x-axis) and 5-point mFI (y-axis) scores for 885 patients with complete data. The size of the points is proportional to the number of patients. The weighted Kappa statistic and p value are provided. For reference, a weighted Kappa of 1 would correspond to perfect agreement and 0 to no agreement between the two measures

Table 6 Weighted Kappa measuring the agreement between the 5-item mFI and 11-item mFI. A weighted Kappa of 1 implies a perfect agreement between the two scales. p value <0.05 suggests the observed agreement between the indexes is statistically significant

Clavien-Dindo outcomes

Univariate analysis utilizing Chi-square testing revealed that a higher frailty score was predictive of a higher Clavien-Dindo complication grade (p < 0.0001). The percentage of patients with a Clavien-Dindo grade ≥3 complication for 5-item mFI scores of 0, 1, and ≥2 was 3.6, 5.9, and 10.5 %, respectively. Outcomes for patients with an mFI score of ≥2 differed significantly from those for patients with an mFI score of 0 (p < 0.001) (Fig. 2). While the 11-item mFI was also significant, its p value did not reach that of the 5-item mFI (Fig. 2).

Fig. 2

The forest plot shows the adjusted odds ratios and 95% confidence intervals comparing mFI levels in the 11-point index multiple regression analysis (above) and the 5-point index multiple regression analysis (below). An adjusted odds ratio greater than 1 indicates an increased odds of increasing Clavien-Dindo classification. An overall p value for the association of mFI with the outcome was calculated using a likelihood ratio test (LRT) with 2 degrees of freedom

Multivariate ordinal logistic regression revealed that the 5-item mFI was not a significant predictor of Clavien-Dindo grade when controlling for age, emergent procedures, ASA class, wound classification, and surgical approach (p > 0.05). However, the 11-item mFI remained significant (p = 0.012) after we controlled for the same variables. Pairwise comparison of the 11-item mFI revealed that the index was significant only when comparing scores of ≥2 versus 0; a score of 1 on the 11-item index was not significant (p = 0.672).
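The figure captions cite a likelihood ratio test (LRT) with 2 degrees of freedom for the overall association of the mFI with each outcome. A plausible reading, stated here as an assumption, is that the three mFI categories (0, 1, and 2+) enter the model as two indicator terms, so that removing the mFI drops two parameters. The Python sketch below shows the arithmetic of such a test; the log-likelihood values are hypothetical, and the closed form for the p value holds only for 2 degrees of freedom.

```python
import math
from typing import Tuple


def lrt_p_value_2df(loglik_reduced: float, loglik_full: float) -> Tuple[float, float]:
    """Return (LRT statistic, p value) for nested models differing by 2 parameters.

    For a chi-square distribution with 2 degrees of freedom the upper tail
    probability has the closed form exp(-x / 2).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the full model cannot fit worse than the reduced model")
    return statistic, math.exp(-statistic / 2.0)


# Hypothetical log-likelihoods: a model without the mFI terms (reduced)
# and the same model with them (full).
statistic, p_value = lrt_p_value_2df(loglik_reduced=-100.0, loglik_full=-97.0)
print(statistic, round(p_value, 4))  # 6.0 0.0498
```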

Mortality

Univariate analysis with Kruskal–Wallis testing demonstrated that a higher frailty score was significantly correlated with an increased mortality rate. Reported 30-day postoperative mortality for 5-item mFI scores of 0, 1, and ≥2 was 0.3, 1.3, and 2.5 % of patients, respectively. Mortality for 5-item mFI scores of 1 and ≥2 differed significantly from mortality for an mFI score of 0 (p = 0.0002) (Fig. 3). In contrast, univariate analysis of the 11-item mFI failed to reach statistical significance for 30-day mortality (p = 0.0920). After we controlled for competing risk factors, including age, ASA score, wound classification, surgical approach, and procedure timing (emergent vs non-emergent), we found that a score of ≥2 on the 5-item mFI remained predictive of mortality within 30 days of PEH surgery (OR 4.01, 95 % CI 1.09–14.74; p = 0.037). A comparable multivariate mortality model for the 11-item mFI could not be computed because the mortality rate was less than 1 % among the 885 patients who had all 11 variables available in NSQIP. This further supports the versatility of the 5-item mFI.

Fig. 3

The forest plot shows the adjusted odds ratios and 95% confidence intervals comparing mFI levels in the 5-point index multiple regression analysis. An adjusted odds ratio greater than 1 indicates an increased odds of mortality within 30 days of PEH repair. An overall p value for the association of mFI with the outcome was calculated using a likelihood ratio test (LRT) with 2 degrees of freedom. A comparable mortality model for the 11-point mFI could not be computed due to the low number of events. Thus, a forest plot for the 11-point mFI is not included

Discharge destination

Univariate analysis utilizing the Chi-square test revealed that higher frailty scores were predictive of patients being discharged to a facility other than home. The frequency of discharge to a facility other than home for 5-item mFI scores of 0, 1, and ≥2 was 4.2, 9.1, and 17.6 %, respectively. The discharge rates for 5-item mFI scores of 1 and ≥2 differed significantly from the rate for an mFI score of 0 (p < 0.0001). The results of the 5-item mFI were statistically similar to the 11-item mFI data (Fig. 4).

Fig. 4

The forest plot shows the adjusted odds ratios and 95% confidence intervals comparing mFI levels in the 11-point index multiple regression analysis (above) and the 5-point index multiple regression analysis (below). An adjusted odds ratio greater than 1 indicates an increased odds of non-routine discharge (among patients discharged alive). An overall p value for the association of mFI with the outcome was calculated using a likelihood ratio test (LRT) with 2 degrees of freedom

Multivariate logistic regression demonstrated that the 5-item mFI continued to be a significant predictor of non-routine discharge following PEH repair, after controlling for competing risk factors, including advanced age, ASA score, wound classification, surgical approach, and procedure timing (emergent vs non-emergent) (p < 0.0001). Further, pairwise comparison of the multivariate logistic regression model revealed a score of ≥2 versus a score of 0 on the 5-item mFI was a significant predictor of a patient being discharged to a destination that was not their home (OR 2.47, 95 % CI 1.63–3.76; p < 0.001). Multivariate logistic regression of the 11-item index revealed similar findings for a score of ≥2 after controlling for the same competing risk factors (OR 3.48, 95 % CI 1.53–7.92; p = 0.003).
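For readers less familiar with how logistic regression output maps to the odds ratios and confidence intervals reported here, the Python sketch below converts a model coefficient and its standard error into an odds ratio with a Wald-type 95 % interval. This is an illustration under stated assumptions: the text does not say which interval method SAS was configured to use, and the coefficient and standard error in the example are hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95 % interval


def odds_ratio_ci(beta: float, std_err: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) from a log-odds coefficient."""
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )


# Hypothetical coefficient for mFI >= 2 versus 0 and its standard error.
odds_ratio, lower, upper = odds_ratio_ci(beta=0.9, std_err=0.2)
print(f"OR {odds_ratio:.2f}, 95 % CI {lower:.2f}-{upper:.2f}")
```

Because the interval is symmetric on the log scale, the reported bounds are not symmetric around the odds ratio itself, which is why an interval such as 1.53–7.92 extends much further above the estimate than below it.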

Readmission

Univariate analysis utilizing the Chi-square test revealed that frailty score was not correlated with readmission rate for either index (p > 0.05). The 30-day readmission rate for 5-item mFI scores of 0, 1, and ≥2 was 7.5, 7.7, and 9 %, respectively. Multivariate logistic regression analysis also demonstrated that neither index was a statistically significant predictor of hospital readmission after PEH repair (p > 0.05) (Fig. 5).

Fig. 5

The forest plot shows the adjusted odds ratios and 95% confidence intervals comparing mFI levels in the 11-point index multiple regression analysis (above) and the 5-point index multiple regression analysis (below). An adjusted odds ratio greater than 1 would indicate an increased odds of readmission. An overall p value for the association of mFI with the outcome was calculated using a likelihood ratio test (LRT) with 2 degrees of freedom

Discussion

Elderly patients carry an increasing burden of chronic illnesses affecting overall health and well-being [8]. Older patients being considered for surgery have unique physiologic vulnerability requiring assessment beyond the traditional preoperative evaluation of adults [4]. Frailty is a unique domain of health status that can be a marker of decreased reserves and resultant vulnerability in older patients and may help explain why some older patients recover better than expected and others far worse [12]. In addition, an objectively assessed measure of frailty may improve our understanding of the heterogeneity of vulnerability in the geriatric patient. An aging population coupled with the increased incidence of symptomatic PEH makes it increasingly important to identify frail patients at high risk of postoperative morbidity and mortality.

Two major frailty assessment models have been described in the literature. The first, the physical phenotype model, is based on five characteristics: unintentional weight loss, exhaustion, weakened grip strength, slow walking, and low physical activity. This model has been largely applied in the geriatric medicine literature to assess frailty and has been validated and applied in a number of studies [12–14]. The strength of this index is its clinical reproducibility [15]. Variations of the model have been described and continue to be used; however, the model is cumbersome, and acquiring frailty data requires equipment and significant training of personnel [16–18]. Additionally, it is difficult to apply to patients in an acute disease setting or to a population-based database like NSQIP.

A second frailty model, which is the foundation for this paper, is a multiple domain aggregate model based on the concept of ‘accumulating deficits.’ This model integrates medical, psychological, and functional capabilities and has been validated by a number of studies, including a 70-item large population-based study, the Canadian Study of Health and Aging (CSHA) [8, 19]. This model has been compared to the phenotype model and was found to be a stronger predictor of outcomes in elderly patients [18, 20]. Unlike the phenotype model, this model does not require trained personnel and equipment to acquire frailty data. Frailty data are acquired by history and physical examination alone and, thus, can be collected in an acute disease setting [18, 19]. A significant downside of the 70-item frailty index is the difficulty of capturing and considering 70 discrete variables. A concise, valid, and reliable frailty index that can be determined from readily available data points contained in large population-based datasets would be a more useful tool for clinicians and researchers alike.

The 11-item mFI was developed by mapping NSQIP variables to the 70-item CSHA index and has since been studied and validated in several surgical studies [8, 19, 21, 22]. We recently utilized this 11-item mFI to study the impact of frailty on outcomes following PEH repair using the NSQIP dataset [9]. In our original work, a significant percentage (80 %) of otherwise eligible patients were excluded from the final analysis due to missing NSQIP variables. Some researchers have used statistical techniques to impute missing variables. Others have chosen simply to exclude patients with missing data, as we did in our previous study, at the expense of a smaller sample size [22–24]. Surprisingly, still others have not articulated how they dealt with missing data [25, 26].

The information in the NSQIP PUF varies slightly from year to year. NSQIP now captures over 300 variables, including demographics, preoperative risk factors, intraoperative and postoperative variables, and morbidity and mortality outcomes, with new data points added and others removed each year. Incomplete data have become increasingly common, largely because it is impractical to consistently capture over 300 variables in a healthcare culture characterized by limited resources and time [27]. Further, as the NSQIP program has expanded, more data have been missing from the annual PUFs. Rubinfeld et al. demonstrated this pattern of increasingly missing variables when comparing the 2005 NSQIP PUF to the 2008 version. In the mere 3-year interval of our study, we observed that the 6 items we omitted from the original 11-item mFI were missing at a rapidly escalating rate (44, 69, and 100 % in 2011, 2012, and 2013, respectively). With only 20 % of patients who had a PEH repair in the 3-year interval of this study having the data needed to calculate a modified frailty index based on the 11-item index, we felt that a more concise frailty index was needed. In our original publication analyzing the applicability of the 11-item index in PEH repair, we also observed that the number of patients who had 4 or more of the preoperative variables used to assess frailty was extremely low. This made us question the necessity of an index that assessed the presence of 11 possible variables to determine frailty.

The 5-item mFI comprises the 5 variables from the 11-item mFI that were most consistently reported in the 2011, 2012, and 2013 NSQIP PUFs. A weighted Kappa analysis demonstrated a statistically significant agreement between the 5-item mFI and the 11-item mFI. Figure 1 provides further visual representation of this agreement. Our results suggest that the 5-item mFI is a valid frailty index and can be applied to the NSQIP dataset for assessing outcomes of PEH repair. Additionally, a majority of patients had the same variables on the 5-item and 11-item indexes (the other 6 variables occurring infrequently) (Table 2). The 6 variables we excluded from the 11-item index, many of which were related to cardiovascular risk factors (cerebrovascular accident, myocardial infarction, and angina), were excluded solely because they were rarely recorded in NSQIP in recent years. Previous studies have examined cardiovascular risk factors and identified a relationship between these variables and frailty [2, 8, 28]. Variables were not excluded from the condensed index for statistical reasons pertaining to frailty.

Our decision to apply the 5-item mFI to the NSQIP dataset allowed for a significantly increased sample size in comparison with the 11-item mFI data. There were 4434 eligible patients with a primary procedure of PEH repair. The sample size went from 885 (20 %) when analyzed with the 11-item mFI to 3711 (84 %) with the 5-item mFI. The primary and secondary outcomes showed similar trends for the two frailty indexes. With the larger sample size, mortality reached statistical significance when analyzed with the 5-item mFI, whereas it did not with the 11-item mFI.

One publication suggested that although elderly patients are more likely to suffer from a symptomatic PEH and experience diminished quality of life, clinicians are reluctant to pursue surgical consultation or offer surgical intervention because of the fear of increased morbidity and mortality coupled with a perceived lack of symptomatic benefit [29]. Only 8 patients in the 11-item mFI cohort had reported mortality within 30 days of PEH surgery between 2011 and 2013, which suggests that very few frail patients end up undergoing surgical intervention. This practice appears to result primarily from the lack of high-quality, frailty-specific data that would allow a better understanding of the impact of frailty on postoperative outcomes. Objective data describing the prevalence of frailty and the estimated probability of a patient not only tolerating a procedure but also achieving a reasonable postoperative functional status may eventually guide surgical decision making. For frail patients deemed to be adequate surgical candidates, these data may inform decisions regarding the optimal approach (laparoscopic vs open) and, more specifically, the surgeon or hospital (foregut surgery specialist, high volume surgeons and/or hospital) that should perform the procedure.

Although the 5-item mFI has not yet been validated in other studies, the index preliminarily appears to be a promising tool for assessing frailty. Further, the 11-item mFI and 5-item mFI appear to demonstrate the same results, as evidenced by Fig. 1. The 5-item mFI may be practical because it requires few variables that are easily available in the clinical setting, is easily applied to the NSQIP database, is easy to adapt to an acute disease setting, and requires very little time to assess frailty. The 5-item mFI uses patient factors found by simple history alone, obviating the need for additional trained personnel to evaluate and calculate a patient’s frailty index.

Because the NSQIP data are prospectively collected, biases associated with retrospective data collection were minimized. Weaknesses of the study include the lack of control inherent in retrospective studies, demonstrated by the missing variables needed to calculate the mFI. Further, reliance on the accumulating deficits model is a limitation inherent in using the NSQIP dataset to create a modified frailty index. Further analysis of the 5-item index in a study of prospective design is indicated to validate that we are truly assessing frailty with our proposed 5-item index. However, the concordance between our new index and the previously validated 11-item index suggests that we are measuring frailty as adequately as the 11-item index does. We plan to study the 5-item mFI in a prospective clinical trial, as well as to assess the applicability of the 5-item mFI to other surgical procedures.

In conclusion, the proposed 5-item mFI is an acceptable alternative to the 11-item mFI. With the reality that many variables in NSQIP are missing, the 5-item mFI is a more statistically sound frailty index as it allows for a more inclusive patient dataset. Our proposed 5-item mFI has been demonstrated to be a significant predictor of 30-day mortality and discharge to a facility other than home following PEH repair. In comparison with the 11-item mFI, the utilization of the 5-item index allows for a significantly increased study sample size and a cohort that is likely more representative of the study group of interest (84 vs 20 % of otherwise eligible patients). Identifying elderly patients who are frail has become increasingly important for surgeons to provide patients with realistic expectations and risks prior to surgery.