Editor’s key points

  • Clinical studies on postoperative delirium (POD) that rely on validated diagnostic scales are a fraction of the available literature accounting for only 5.6% of retrieved studies. This systematic review intended to provide the full list of studies that used a validated POD diagnostic scale.

  • Studies that fulfill inclusion criteria for the present systematic review can provide clinical and research insights for evidence-based prevention, diagnosis, and treatment of POD.

  • Of the 12 scales listed by the ESA-POD-GL, the 260 original studies included in this systematic review (SR), 205/260 (78.8%) adopted a single POD diagnostic scale, 48/260 (18.5%) used 2 diagnostic scales, and 7/260 (2.7%) used 3 or more diagnostic scales. The Confusion Assessment Method (CAM), CAM in Intensive Care Unit (CAM-ICU), and Diagnostic and Statistical Manual of mental disorders (DSM) accounted for more than 95% of used scales.

Introduction

Postoperative delirium (POD) is an acute and fluctuating alteration of mental state of reduced awareness and disturbance of attention that can occur immediately after the recovery from anesthesia and up to 5 postoperative days [1]. It is reported to complicate the postoperative course of a substantial proportion of patients and is diagnosed in 10–50% patients depending on the specific clinical setting including age cohort, type of surgery (e.g., abdominal, orthopedic, urological, thoracic), and the indication criteria (elective, emergency, or urgent) [2, 3]. Clinical presentation of POD includes hyperactivity, hypoactivity, and mixed forms, and its occurrence is associated with increased perioperative morbidity and prolonged hospitalization, worse functional outcome and survival rates to long-term follow-up [4, 5].

In 2017, the task force of the European Society of Anesthesia and Intensive Care (ESAIC-TF) on POD delivered dedicated guidelines that reported the need to implement routine monitoring using validated scales (Table 1) [1]. These guidelines highlight the need to engage with “[ … ] integrated actions aimed to reduce the incidence and duration of POD.”; nevertheless, appropriate initiatives can only be identified using evidence-based medicine principles [6,7,8]. The prerequisite for data analysis is the selection of studies that fulfill necessary quality criteria to be considered for structured recommendations, including appropriate diagnostic workup.

Table 1 Scales validated for POD monitoring listed in the European Society of Anesthesia and Intensive Care (ESAIC) guidelines [1]

The aim of this systematic review (SR) is to identify clinical studies related to POD that included a postoperative monitoring diagnostic workup based on validated scales.

Methods

This SR was registered in the International Prospective Register of Systematic Review (PROSPERO registration number: CRD42021246906) and performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) recommendations [9]. Literature search strategy was based on the 2017 ESAIC-TF POD guidelines; the same keywords were used and to optimize data retrieving; query strings were updated conforming the most recent version of searched databases (PubMed, Embase, Cinhal, Cochrane, Scopus, and Web of science) [1]. Searched keywords included the following terms: postoperative, postsurgical, post anesthesia, anesthesia recovery, delirium, and confusion. Literature search strings, tailored to individual requirements of each of the searched database, were used and are listed in Appendix 1. Considering the enhancement of “automatic term mapping”, implemented in PubMed 2019–delivered version, the use of the asterisk was avoided. Redundancies of the strings used by the ESAIC-TF on POD for literature search accomplished for guidelines released in 2017 have been eliminated, and inconsistencies have been corrected. Furthermore, references sections of retrieved SRs and meta-analyses were searched for studies missed through database literature screening. Literature exploration was accomplished between April and May 2021 and intended to include evidence published up to March 15th, 2021. After automatic and manual duplicate removal, selected papers were evaluated throughout two screening phases.

Study selection and inclusion criteria

Suitable studies included: randomized clinical trials (RCTs), prospective and retrospective studies, cohort studies, and case–control studies. The adult population (older than 18 years old) was tested or monitored in the postoperative period for more than 24 h with a validated POD scale. Studies designed to record POD as primary or secondary outcome, in patients undergoing non-cardiac and non-cerebral surgery have been selectively included. Only full papers in English language were considered eligible. Papers that did not report information on an original dataset of patients (editorials, commentaries, reviews, metanalysis, etc.) and case series reporting < 5 cases were excluded. Abstracts and meeting/symposium proceedings were excluded. Studies that selectively recruited patients presenting POD were not considered appropriate for this SR. Online registered protocols of trials (ClinicalTrials.gov, WHO) were excluded.

First screening phase

Two researchers (GR and LF) independently screened and assessed title, details of the publication, and the abstract, to identify papers that fulfilled the following criteria: full paper in English, that present original information on adult patients who have undergone surgery (except cardiac or brain surgery), and in which POD recording was among the primary or secondary outcomes.

Second screening phase

Studies that qualified for the second screening phase were evaluated throughout full-text analysis of type and timing of POD monitoring. Only those that used a POD validated scale (Table 1), for more than 24 h, in patients not selectively presenting delirium, and that reported the quantitative figure of recorded POD incidence were considered eligible. Selected studies were analyzed using a standardized data extraction form, and the following variables were extrapolated: validated diagnostic POD scale used, number of recruited patients, overall POD incidence, study design and structure (i.e., RCTs, prospective or retrospective observational, and single center or multicentric), type of surgery (categorized as: breast, eye, general abdominal, gynecologic, maxillofacial, orthopedic—not spine, otolaryngologic, plastic, spinal, thoracic, urologic, vascular), and country where the study was conducted. Data will be reported and made available in the SRDR website.

Results

Study identification

Database search led to retrieve 6475 hits; after duplicate removal (1849), 4626 qualified for subsequent evaluation and underwent the first-phase screening (Fig. 1). A total of 4239 studies were excluded during the first screening phase (not full-text or original studies or not published in English or that included patients aged < 18 years or who had undergone brain or cardiac procedures). The analysis of references sections of retrieved SRs and meta-analyses led to identify additional 67 studies not retrieved through database literature search but are suitable for second-phase screening. This process led to select 454 original studies that fulfilled the criteria for second-phase screening. Among studies screened in the second phase, it was not possible to retrieve the full text of 3 manuscripts; 22 were post hoc analyses of trials included in the list. Of the 429 studies analyzed, 128 did not use a POD validated scale; in the 22 that used POD monitoring that lasted < 24 h, 6 selectively included patients presenting delirium, and 13 did not report POD incidence quantitatively. A total of 260 studies (5.6% of the studies retrieved through literature search), published between 1987 and 2021, included in their methods a diagnostic workup with the use of a POD validated scale and monitored patients for more than 24 h therefore are qualified to be included in the present SR; the full list of these papers is in Appendix 2.

Fig. 1
figure 1

CONSORT diagram

Used diagnostic scales

The 260 original studies included in this SR utilized some of the 12 scales listed by the ESA-POD-GL (Table 1): 205/260 (78.8%) adopted a single POD diagnostic scale, 48/260 (18.5%) used 2 diagnostic scales, and 7/260 (2.7%) used 3 or more diagnostic scales. The Confusion Assessment Method (CAM; versions: 3D, b, S, CR, 4) was the most frequently used diagnostic scale and was included in 158/260 (60.8%), the CAM-Intensive Care Unit (ICU) in 38/260 (14.6%), the Diagnostic and Statistical Manual of Mental Disorders (DSM) (versions: III, IV, V) in 58/260 (22.3%), the International Classification of Disease (ICD) (versions 9, 10) in 5/260 (2.0%), the Delirium Rating Scale (DSR) (R-98 or the previous) in 20/260 (7.7%), the Nursing Delirium Screening Scale (Nu-DESC) in 11/260 (4.2 %), the Memorial Delirium Assessment Scale (MDAS) in 9/260 (3.4%), the Delirium observation screening in 13/260 (5.0%), the Neelon and Champagne Confusion Scale (NEECHAM) in 5/260 (2.0%), the Delirium Symptom Interview (DSI) in 3/260 (1.1%), and the 4 'A's (Arousal, Attention, Abbreviated Mental Test - 4, Acute change) Test in 1/260 (0.4%).

Number of recruited patients

Data from a total of 1,327,808 patients were reported in the 260 studies included in this SR. The number of recruited patients ranged between 11 and 578,457; the mode (value that appears more often) is 101 patients. Considering the quartile distribution, 25% of studies include 11–89 patients (mean value: 56.3); 25%, 90–184 (mean value: 124.7); 25%, 186–366 (mean value: 256.1); and 25%, 377–578.457 (mean value: 19.986.7).

Overall POD incidence

Among the 260 studies that qualified for this SR, the reported POD incidence ranged between 0.01 and 84.0%; mean POD incidence was 23.0%, and the mode was 25.0%. Considering the quartile distribution, in 25% of the studies, POD overall incidence ranged from 0.01 to 13.1% (mean value: 8.0%); 25%, 13.5–21.1% (mean value: 17.1%); 25%, 21.2–27.9% (mean value: 24.3%); and 25%, 28.0–84.0% (mean value: 41.3%). In patients aged > 65, reported POD incidence ranged between 15 and 40% (15%, 32%, 40%) [10,11,12].

Study design and structure

Of the 260 studies selected in this SR, 156 were single-center, and 104 were multicentric; of these, 118 were prospective observational, 109 RCTs, and 33 retrospective observational studies.

Type of surgery

The 260 studies included in this SR reported data from patients undergoing a single and selective type of surgery in 223/260 (85.8%), while 35/260 (13.5%) reported data from two or multiple types of surgical procedures; 2/260 (0.8%) included patients undergoing non-brain and non-cardiac surgery, without specifying the exact type of surgery.

Considering the exclusion of cardiac and brain surgery in the first phase of screening, the most common surgery was orthopedic with 126/260 (48.4%); general abdominal surgery with 109/260 (42.0%); thoracic with 23/260 (8.8%); urological with 24/260 (9.2%); spinal with 20/260 (7.7%); maxillofacial with 7/260 (2.7%); breast with 1/260 (0.4%); otolaryngologic with 6/260 (2.3%); vascular with 8/260 (3.0%); gynecological with 9/260 (3.5%); plastic with 1/260 (0.4%), and eye with 1/260 (0.4%).

Country where the studies were conducted

The country with the highest number of original studies was China: 59/260 (22.3%); the USA was the second one: 58/260 (22.3%). Among the multicenter studies, 6/104 were international cooperative studies. One study was included from each of the following countries: Albania, Brazil, Chile, Finland, Israel, Malesia, New Zealand, and Serbia; 2 studies each from Iran, Switzerland, and Turkey; 3 studies each from Denmark, Greece, Portugal, and Thailand; 4 studies each from Australia, Belgium, Spain, and Taiwan; 5 studies from France; 6 studies each from Canada and Norway; 7 studies from Sweden; 9 studies from Italy; 10 studies from the UK, 17 studies each from Korea and Germany; 18 studies from Netherlands; and 22 studies from Japan. Even when report data were taken from similar surgical settings, studies completed in different countries display no homogeneous figures (orthopedic surgery: the USA, the UK,Turkey; Urologic surgery: China, Italy, Germany) [13,14,15,16,17,18].

POD risk factors

The analysis of the studies that qualified for the present SR suggests that there are risk factors that specifically connect with POD. Some of these risk factors can be ruled out in the preoperative phase (age; cognitive status; level of education; type and indication criteria for surgery; preoperative pain severity; some laboratory test exams; individual habits: chronic benzodiazepine use, alcohol and drug abuse, and low physical activity; etc.); other factors can complicate the intra or postoperative course (severe bleeding, using opioid, anemization, transfusion, ICU admission, hypothermia (core temperature < 36 °C at admission to the recovery room); etc.).

Discussion

The primary endpoint of this SR is to provide a selection of clinical studies that addressed POD including in their methods a postoperative monitoring diagnostic workup based on validated scales. The full list of these studies—reported as an appendix to this manuscript—can serve clinicians and researchers to extract methodological hints and available evidence-based clues. The analysis of used POD diagnostic scales indicates that the largest majority of the studies was used a single scale and that CAM and CAM-ICU are the most used. The analysis of the studies that fulfilled the criteria to be listed in the present SR demonstrated the high heterogeneity of the number of recruited patients. Of note, considering the absolute value, the recorded results should adequately support the identification of pre, intra, and postoperative risk factors; stratification criteria; and preventive and therapeutic protocols. The reported POD incidence ranged widely but consistently indicated that, in patients aged > 65 years, this complication is frequent and occurs in about a quarter of the cases thus making it a priority in perioperative medicine. The most represented type of study is prospective observational tightly followed by RCTs witnessing the interest and the effort that the scientific community is investing to better understand the pathophysiologic mechanisms and therapeutic strategies that can effectively be implemented to prevent and treat POD. Several therapies had been tested, but available evidence is conflicting and inconclusive. Presented cases were selectively recruited among single types of surgery, most commonly orthopedic or general abdominal. Interestingly, studies accomplished in China or the USA are equally represented and account for over 50% of the total.

The need to standardize diagnostic criteria is an emerging necessity in studies intended to evaluate postoperative neurocognitive complications. This is also proved by the recent systematic review on POCD published by Borchers et al. that highlights how the heterogeneity in the diagnostic workup can bias extracted evidence [19]. The authors compared the methodology of studies on postoperative cognitive decline to the reference criteria published in 1995 and, similarly to our findings, from more than 8000 studies published, only 274 (3.4%) used baseline cognitive testing and followed patients for a proper length of time. Noteworthy, according to a recent dedicated SR, the level of evidence that supports the largest proportion of guidelines released for anesthesiologists on perioperative care by the North American and European societies relies on a low level of evidence [20]. Hence, it is necessary to identify POD studies that fulfill quality criteria that can extract clinical and research insights.

In the present SR, clinical trials that involved cardiac or brain surgery patients were excluded because of the specific risk of cerebral dysfunction that can take place in these surgical settings. Patients undergoing cardiac surgery, with or without extracorporeal circulation, as well as those undergoing brain surgery, can develop focal ischemia and stroke that mask or trigger functional cognitive abnormalities [21,22,23].

The present literature analysis, despite using the same literature search strategy (including keywords and searched databases) proposed and validated by the ESA task force on POD, extracted a more limited number of hits. This is possibly due to the revised functional algorithm used by scientific library databases. Especially relevant is the finding that a significant number of clinical studies related to POD were not identified with the used keywords and strings. The mismatch between searching criteria and published evidence is possibly attributable to a suboptimal manuscript categorization during the submission and publication phases. In the future, it would be important to identify a clear nomenclature for those studies that provide insights related to POD. The present SR was designed conforming to the latest version of the “automatic term mapping” enhancement released by PubMed in 2019 and the use of the asterisk was removed by the literature search strings accordingly. Furthermore, redundancies of the strings used by the ESAIC-TF for literature search, accomplished to prepare the guidelines released in 2017, have been eliminated.

Study limitations

The major limitation of this SR is that it does not provide any specific information on POD prevention or diagnosis, or treatment. This is because we intend the underlying work to provide to the scientific community a shared platform to extract evidence-based medicine principles and methodological hints to be included in future studies. Considering the large variability of anesthesiology techniques used and the non-homogeneity in reporting the adopted approach (general, neuraxial, or loco-regional) and of the various types of surgical procedures, it was not possible to describe, in the present SR, in detail the relationship between the type of used anesthesia and POD. Future analysis of the presented list can address several aspects including the sensibility and sensitivity of the POD diagnostic scales considering those studies that adopted two or more of the validated scales.

In conclusion, available clinical literature on POD, that qualifies to prepare evidence-based indications, relies on a limited selection of studies that include a diagnostic workup based on validated scales. In order to extract indications based on reliable evidence-based criteria, these are the studies that should be selectively considered. The analysis of these studies can also serve to design future projects and to test clinical hypothesis with a more standardized methodological approach.