Frailty may affect 30–50% of patients aged over 65 yr presenting for surgery, with higher rates of frailty identified in the emergency surgery population.1 As many as 30% of patients with frailty die, are discharged to a nursing home/care facility, or have a new disability after surgery.2 The 2010 United Kingdom National Confidential Enquiry into Patient Outcome and Death recommended that an agreed means of frailty assessment should be developed and included in the routine risk assessment of older surgical patients.2 National guidelines from the USA and UK recommend routine frailty assessment before surgery.3,4 Implementation of this recommendation has been poor in Australia and internationally. A 2020 survey of Australian anesthesia departments found that only 5% (2/41) routinely record frailty during the preoperative assessment,5 echoing similarly slow uptake in the USA and Canada.6,7

The Clinical Frailty Scale (CFS) is a simplified nine-point instrument that enables rapid frailty screening incorporating a range of domains affected by frailty, without the need for geriatric expertise or functional testing.8 It is valid and reliable in multiple populations, including surgical populations.9,10 The CFS is simple to apply in the clinic or at the bedside, and has better feasibility and prognostic accuracy than other frailty instruments.11 Although the interrater reliability of the CFS has been assessed in community-dwelling adults, acute hospitalized populations, and the critically ill, to our knowledge, only one previous study has reported on the interrater reliability between doctors and nurses in the perioperative setting. Among 67 adults aged 60 yr and over undergoing inpatient vascular surgery in the UK, agreement between consultant anesthesiologists and perioperative nurses was good (Cohen’s kappa, 0.61).12 Another perioperative study assessed agreement between one consultant anesthesiologist and two medical students for 112 adults aged over 80 yr undergoing emergency abdominal surgery in Norway, finding good to very good agreement (Cohen’s kappa, 0.74 to 0.85).13 The objective of our study was to determine the interrater reliability between anesthesiologists and perioperative nurses in the preadmission clinic (PAC) setting at a large university hospital in regional Australia. We hypothesized that good agreement between raters would be found, thus supporting measurement of the CFS by either member of the health care team in this setting for future clinical trials and cohort studies.

Methods

This cohort study was conducted at the John Hunter Hospital PAC (New Lambton Heights, NSW, Australia) from July 2020 to February 2021 using prospectively collected CFS ratings.

The John Hunter Hospital is a 796-bed hospital and the principal tertiary referral centre for Newcastle and northern New South Wales. A diverse range of adult and pediatric surgical specialties is covered, with a 50:50 split of planned to unplanned procedures. At the time of data collection, approximately 750 elective procedures occurred per month, with an average of 650 patients attending the PAC. Nurse-led clinical triage by patient and surgical factors allocated all attendees to either a “health check” telephone call with a dedicated perioperative service enrolled or registered nurse, or a face-to-face anesthetic assessment with either a consultant with a perioperative interest or an anesthetic trainee. The CFS had been introduced as a routine component of face-to-face anesthetic assessments; however, it was proposed that compliance and workflow might improve if nurse completion of the CFS could replace this for all patients attending a face-to-face appointment. No specific training in CFS completion was given to either group.

The primary outcome was interrater agreement between anesthesiologists and perioperative nurses for each individual patient CFS rating (rating 1–9), measured by Cohen’s kappa statistic.14 We hypothesized that agreement between raters would be “good,” as indicated by a Cohen’s kappa value of 0.61 to 0.80. A guide to the interpretation of Cohen’s kappa statistic is shown in Fig. 1.15

Fig. 1
figure 1

Interpretation of Cohen’s kappa statistic
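For readers implementing this banding, the mapping in Fig. 1 can be sketched as follows. The labels for values below 0.61 are illustrative assumptions based on common usage; the study itself specifies only that 0.61 to 0.80 is interpreted as “good.”

```python
def interpret_kappa(kappa: float) -> str:
    """Map a Cohen's kappa value to an agreement category.

    Banding follows the Landis & Koch convention; the "good"/"very good"
    labels match those used in this study for kappa >= 0.61. Labels for
    lower bands are illustrative assumptions.
    """
    if kappa < 0.21:
        return "poor"
    if kappa < 0.41:
        return "fair"
    if kappa < 0.61:
        return "moderate"
    if kappa < 0.81:
        return "good"
    return "very good"


# The weighted kappa reported in this study (0.70) falls in the "good" band.
print(interpret_kappa(0.70))  # good
```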

The study was approved through the Hunter New England Health Human Research Ethics Committee (2020/ETH02975). As the study used data collected in routine clinical practice, a waiver of patient and staff consent was granted. The study is reported according to the Reporting of Studies Conducted using Observational Routinely-Collected Health Data (RECORD) statement, an extension of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines and the Guidelines for Reporting Reliability and Agreement Studies.16,17,18

Study population

The study population included adults aged ≥ 65 yr attending the John Hunter Hospital PAC in person ahead of planned inpatient elective noncardiac surgery with an expected length of stay of at least one night. Patients who did not speak English were included, with frailty measured with the assistance of a translator as part of routine perioperative assessment. There were no exclusion criteria. Patients with repeat preadmission episodes of care (either attached to the same or a new surgical procedure) did not have subsequent encounters included in this study.

Blinding of assessors

Clinical Frailty Scale scoring forms part of the routine anesthetic assessment in the John Hunter Hospital Perioperative Service for patients aged ≥ 65 yr undergoing elective surgery. During redesign of PAC workflow in 2020, routine CFS evaluation was moved from medical to nursing assessment. During this period of overlap, both nursing staff (preadmission nurses) and medical staff (anesthesiologists) completed frailty assessment, with each scorer blinded to the other’s rating. Nursing assessment occurred first, with the CFS notated on a paper form and stored in the research office. Anesthesiologists then completed their own CFS measurement, with results entered into the patients’ clinical notes.

Data collection and management

Data were entered into a dedicated REDCap database (REDCap, Vanderbilt University, Nashville, TN, USA), with separate access for entering nursing and medical assessments, such that data entry personnel remained blinded to the alternative scores. Identifiable data were subsequently used to extract details of the corresponding surgical admission from the hospital administration database. Demographic data included date of birth, age, sex, American Society of Anesthesiologists’ Physical Status (ASA–PS),19 and baseline residence (independent/nursing facility). Frailty data included CFS rating, date of assessment, and designation of rater (anesthesiologist/nurse). Surgical factors included procedure name, Australian Classification of Health Interventions procedure code,20 and date of surgery.

The REDCap study database was hosted securely on the Hunter New England Local Health District server. Only study investigators and data entry staff had access to the study database.

Statistical analysis

We calculated a minimum sample size based on the likely distribution of CFS in the population, which was chosen to mirror the proportions found in a contemporary surgical cohort in a tertiary metropolitan hospital.21 At a one-sided 5% level of significance and 80% power, we required at least 99 participants to show that the interrater reliability as measured by kappa was at least substantial (> 0.6) assuming the interrater reliability was 0.75. We assumed 5% missing data, with no adjustment made for withdrawals or dropouts.

Statistical analyses were performed using Stata version 15.1 (StataCorp, College Station, TX, USA). Baseline characteristics of the study population were summarized using descriptive statistics. The CFS is a categorical (ordinal) rating scale.22 Agreement between each pair of raters for individual patients was calculated using Cohen’s kappa statistic, with quadratic weighting used (as ascending CFS categories are associated with increased magnitude of clinical difference) and 95% confidence intervals obtained from 5,000 replications. Sensitivity analyses were performed analyzing interrater reliability separating patients by age into four roughly equal strata (age < 70 yr, 70–74 yr, 75–79 yr, and ≥ 80 yr). Cohen’s kappa statistics were categorized using the scale of Landis and Koch.15
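The primary analysis (quadratic-weighted Cohen’s kappa with a percentile bootstrap confidence interval) can be sketched as below. This is a minimal illustration using scikit-learn in place of Stata and simulated ratings, not the study data.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Illustrative paired CFS ratings (1-9); NOT the study data.
nurse = rng.integers(1, 8, size=238)
doctor = np.clip(nurse + rng.integers(-1, 2, size=238), 1, 9)

# Quadratic weighting penalizes larger disagreements more heavily,
# reflecting the ordinal nature of the CFS.
kappa = cohen_kappa_score(nurse, doctor, weights="quadratic")

# Percentile bootstrap CI, analogous to the 5,000 replications in the study.
n = len(nurse)
boot = []
for _ in range(5000):
    idx = rng.integers(0, n, size=n)
    boot.append(cohen_kappa_score(nurse[idx], doctor[idx], weights="quadratic"))
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"kappa = {kappa:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```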

Results

Study population

During the study period, CFS was assessed by nursing staff in 308 patients and by medical staff in 245 patients, with 243 patients being assessed by both disciplines (Fig. 2). Five of these were duplicate assessments for a second operation and were excluded from the analysis, leaving 238 assessments by both nursing and medical staff for inclusion in the study. The median [interquartile range] age was 74 [70–80] yr, and 47% of patients were female. Nearly two thirds of the study population had an ASA–PS of III or IV (153/238, 64%). Baseline characteristics of the study population are shown in Table 1.

Fig. 2
figure 2

Study participants. ASA = American Society of Anesthesiologists Physical Status; CFS = Clinical Frailty Scale

Table 1 Study population—baseline characteristics

One patient did not proceed to surgery. Process of care data including critical care admission, discharge destination, and hospital readmission within 30 days are presented in Table 2.

Table 2 Study population—process of care outcomes

Primary outcome

The distribution of CFS ratings among the 238 patients is shown in Fig. 3. One hundred and twelve (47%) of CFS scores agreed perfectly between nursing and medical staff, with a further 99 (42%) differing by only one point, 24 (10%) by two points, and three (1%) by three points. The weighted kappa coefficient was 0.70 (95% CI, 0.63 to 0.77; P < 0.001), suggesting good agreement. There were no major differences identified between age strata (Table 3). Dichotomous kappa comparing nonfrail (CFS 1–4) with frail (CFS ≥ 5) was 0.52 (95% CI, 0.37 to 0.67; P < 0.001). Bland–Altman analysis suggested that 95% agreement limits were at approximately three levels above and below the mean. A Bland–Altman plot is shown in Fig. 4.

Fig. 3
figure 3

Distribution of Clinical Frailty Scale ratings among nursing and medical staff

CFS = Clinical Frailty Scale

Table 3 Interrater reliability separated by age strata
Fig. 4
figure 4

Bland–Altman plot. This Bland–Altman plot shows the difference between medical and nursing CFS assessments against the average of medical and nursing CFS assessments for each patient, with 95% limits of agreement. Of the CFS scores, 112 (47%) agreed perfectly between nursing and medical staff, with a further 99 (42%) differing by only one point, 24 (10%) by two points, and three (1%) by three points.

CFS = Clinical Frailty Scale
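The limits of agreement underlying the Bland–Altman analysis are the mean paired difference ± 1.96 standard deviations of the differences. A minimal sketch using simulated ratings (not the study data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative paired CFS ratings (1-9); NOT the study data.
nurse = rng.integers(1, 8, size=238)
doctor = np.clip(nurse + rng.integers(-2, 3, size=238), 1, 9)

diff = doctor - nurse             # medical minus nursing, per patient
mean_pair = (doctor + nurse) / 2  # x-axis of the Bland-Altman plot

bias = diff.mean()                # mean difference between raters
sd = diff.std(ddof=1)
loa = (bias - 1.96 * sd, bias + 1.96 * sd)  # 95% limits of agreement

print(f"bias = {bias:.2f}, limits of agreement = {loa[0]:.2f} to {loa[1]:.2f}")
```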

Discussion

In a single-centre study of preoperative frailty assessments performed by nurses and anesthesiologists, we found good agreement in CFS ratings between the two groups, consistent with our hypothesis. Agreement was stronger when the individual CFS score was used rather than a dichotomized score. Histograms showing the distribution of ratings by each group of assessors were similar. Sensitivity analysis confirmed good agreement within different age groups, although the lower confidence limit dropped to or below 0.5 in three of four age groups.

Our study is consistent with perioperative studies from the UK comparing agreement between anesthesiologists and perioperative nurses (good agreement; kappa, 0.61) and from Norway comparing agreement between an anesthesiologist and two medical students (good to very good agreement; kappa, 0.74 to 0.85).12,13 In critically ill patients, interrater reliability of the CFS was found to be good in three separate studies in the UK, Australia, and Wales.23,24,25 In the emergency department setting, a USA study reported very good interrater agreement between nurses and doctors (weighted kappa, 0.90) over the entire spectrum of CFS scores and also when dichotomized as vulnerable/frail vs not frail (weighted kappa, 0.80).26 A 2019 Canadian study in the emergency department found patient-registered nurse and patient-doctor interrater agreement was moderate (kappa, 0.51 and kappa, 0.42, respectively) and physician-registered nurse agreement was good (kappa, 0.72).27 In the cardiology setting, a 2011 Swedish study showed extremely high interrater reliability (intraclass correlation coefficient, 0.97).28 In the community setting, a 2018 Canadian study was conducted by a geriatric outreach service, showing good interrater reliability (kappa, 0.64).29

Most reports of interrater reliability in various populations have shown moderate or good interrater agreement, with the exception of the 2011 Swedish cardiology study observing extremely high agreement and the Norwegian perioperative study showing good to very good agreement.13,28 Studies of interrater reliability in distinguishing frail (CFS ≥ 5) from nonfrail (comprising both nonfrail [CFS 1–2] and prefrail [CFS 3–4]) have generally found “good” or “substantial” agreement (Cohen’s kappa, 0.61 to 0.80). This study improves on these prior investigations by examining agreement across each increment of the scale, highlighting the improved agreement that occurs when the full granularity of the scale is used. However, the Bland–Altman plot shows that the 95% limits of agreement are quite wide (± 3 points), suggesting opportunities exist to improve routine frailty assessment in clinical practice, to apply scores across the full range of the scale, and to support application of the CFS through training.

Our study has several limitations. First, no formal training was provided to nursing or medical staff in CFS assessment, potentially leading to variability in assessment. Rater training should be considered in future.30 Second, we cannot exclude informal consultation between nursing and medical staff when assigning CFS rating, as part of good communication in routine clinical care. Third, we conducted a single centre study and our results in this setting require external validation. Finally, results from interrater reliability studies apply to a population of patients; the accuracy of interrater assessment for individuals can vary from patient to patient.

Strengths of our study include the large sample size achieved by collection of routine clinical data during a period of transition in the PAC, making this one of the larger CFS interrater reliability studies to date. Blinding of assessors was achieved through the workflow structure of nursing and medical assessors at the PAC. This was a real-world study in a large regional university teaching hospital with a high PAC patient load: no formal training was provided to nursing or medical staff in CFS assessment, there were no exclusion criteria, and two thirds of patients were high risk (ASA–PS III–IV).

Our single-centre study suggests the CFS can be applied in routine clinical practice and is reliable across a population of patients when applied by perioperative nursing staff or anesthesiologists with no formal training in its use. It shows that CFS assessment can be included in the normal PAC workflow with a high degree of data capture, and that when part of routine workflow, it was recorded with almost 100% completeness by nursing staff but less frequently by anesthesiologists. These findings should ideally be confirmed in a multicentre study across a range of public and private hospital facilities in different geographical settings.

At John Hunter Hospital, CFS assessment will be formally moved to the nursing workflow for elective surgery patients seen in PAC. This model for CFS assessment could easily be adopted by other hospitals to enable its completion. A mechanism is similarly required for routine capture of CFS in the emergency surgery setting, and this will likely require it to be systematically recorded by the treating anesthesiologist as standard practice.

Frailty is an important predictor of perioperative outcomes and should be included in future risk-adjustment models in clinical trials, cohort studies, and registries. Further, the addition of frailty as a covariable improves the accuracy of surgical mortality risk prediction models.31,32,33,34 To undertake this modelling, a large data set containing frailty assessment will be required. Implementation of international recommendations for routine assessment and recording of frailty status in the perioperative setting should be an immediate priority for Australian state and territory health systems, led by specialist colleges of surgery and anesthesia. At the health system level, this can be achieved by including frailty status as a variable in Australia’s Admitted Patient Care National Minimum Data Set, regularly reported by public and private hospitals to state and territory health departments.35