1 Introduction

Cognitive change affecting patients after anesthesia and surgery, particularly elderly individuals, has been recognized in one form or another for more than 100 years [1]. Postoperative cognitive dysfunction (POCD) has gradually become one of the most common postoperative complications. Such cognitive changes can last for months to years and have a significant impact on patients' prognosis and quality of life, while also increasing caregiver burden and health care costs [2,3,4]. Therefore, investigating potential methods of improving cognitive reserve in surgical patients is vital.

Currently, evidence-based strategies that effectively reduce the risk of POCD are still lacking. Some nonpharmacologic interventions are considered first-line treatments [5, 6]. A number of recent studies have proposed cognitive training (CT) as an innovative, low-risk, scalable intervention to increase the cognitive reserve of community-dwelling elders. CT is expected to improve cognitive functions, including attention, short-term memory and visuospatial processing, and its effects can last for months or even years [7,8,9,10]. Improvements in these cognitive domains may also protect against POCD, which can be characterized by deficits in one or more of these domains. Thus, in recent years, there has been widespread interest in whether cognitive training in patients undergoing general anesthesia can improve postoperative cognitive function and reduce the risk of POCD [11,12,13,14,15,16]. Both preoperative and postoperative cognitive training have been reported, covering cardiac surgery, neurosurgery and other major noncardiac surgery populations, but their feasibility and effectiveness are still controversial [16,17,18,19,20,21].

We conducted this systematic review and meta-analysis to explore the effect of CT on the prevention of POCD in patients undergoing surgery with general anesthesia and to investigate whether different timings of CT have diverse effects and which surgical populations benefit most.

2 Methods

This work adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [22] and was prospectively registered with PROSPERO (CRD42022355908).

2.1 Information sources and study selection

In this review, we systematically searched Medline, Embase, Web of Science and Cochrane Library from inception to July 18, 2022, for randomized controlled trials that examined the effects of CT on cognitive outcomes in surgical patients with general anesthesia (see full search strategy in Table S1). The search terms used a combination of subject and entry terms. We did not apply database or publication language limits, and the reference and citation lists of relevant studies were manually scanned for potentially eligible articles. One reviewer (W.J.Z.) performed initial eligibility screening based on the title and abstract, and two independent reviewers (Y.J. and P.P.F.) assessed full-text versions. If there were discrepancies that could not be resolved, a senior researcher (X.S.L.) joined and approved the final list of included studies. One reviewer (W.J.Z.) contacted authors for full information when data or eligibility was unclear.

2.2 Eligibility criteria

2.2.1 Types of participants and interventions

We included published reports of randomized controlled trials (RCTs) examining cognitive outcomes of CT in surgical patients with general anesthesia. Interventions included cognitive training, which was defined as repeated practice of cognitively challenging tasks, including strategy training or drill exercises using computer or paper-and-pencil methods. For studies that used a combination of CT and other interventions (e.g., exercise rehabilitation), only those in which CT accounted for at least 50% of the intervention were included.

2.2.2 Types of controls

Studies were required to have a passive (no-contact, wait-list) or active (e.g., psychoeducation) control condition.

2.2.3 Types of outcomes

The main outcome was the incidence of POCD. Incidence of postoperative delirium (POD), global cognition, various subdomains of cognition (executive function, working memory, speed, language and verbal memory) and psychosocial functioning (depressive symptoms and anxiety symptoms) were included as secondary outcome measures. All eligible outcomes per study and domain were included. We did not exclude pilot studies because some of them had a large sample size.

2.3 Data collection and coding

According to accepted neuropsychological categorization [23], two reviewers (Y.J. and W.J.Z.) coded each outcome measure into cognitive domains (for categorization of outcomes by domains, see Table S2). The main outcome was the incidence of POCD, coded by consensus and approved by X.S.L. It should be noted that POCD within 30 days after anesthesia and surgery, as we have defined it in this study, would be classified as “delayed neurocognitive recovery” using a newly recommended nomenclature for describing perioperative cognitive disorders [1], which could be assessed by single scale or neuropsychological battery tests (NPTs) (full details about the definition of the outcome are shown in supplementary Table S4). Secondary outcomes included postoperative delirium, global cognition, cognitive subdomains (executive function, speed, language, verbal memory and working memory) and psychosocial function (depressive symptoms and anxiety symptoms). Outcomes were recorded as the proportion of POCD risk or as the mean and standard deviation (SD) of cognitive and psychosocial function.

2.4 Risk of bias in individual studies and quality appraisal

The Cochrane Collaboration’s risk of bias tool was used to evaluate the risk of bias of the included studies [24]. This tool contains the following 6 items: sequence generation, allocation concealment, blinding of outcome assessment, incomplete outcome data, selective outcome reporting and other sources of bias, with each item assessing results as low bias, high bias, or unclear. Studies that lacked assessor blinding or complete outcome data were considered to have a high risk of bias. In addition, methodological quality within studies was assessed using the adapted version of the Physiotherapy Evidence Database (PEDro-P) Rating Scale [25]. The original scale consists of 11 items. However, blinding of therapists and patients was not assessed due to impracticality in CT trials, and thus, the maximum obtainable score was set at 9. Assessments were conducted by multiple independent reviewers (Y.J., Z.X.S., and S.G.). A senior reviewer (X.S.L.) established consensus scores and resolved disagreements.

2.5 Statistical analysis

We calculated summary relative risks (RRs) using a random-effects model to express the difference in the proportion of POCD and POD between the CT and control conditions. For cognitive and psychosocial functioning outcomes, we used the standardized mean difference (SMD) between the CT and control groups in change from baseline to posttraining. Analogous to Cohen’s d [26], the SMD was calculated as Hedges’ g with a 95% confidence interval (CI); estimates of <0.30, ≥0.30 but <0.60, and ≥0.60 were considered small, moderate, and large, respectively. A positive SMD indicated a therapeutic effect of CT over and above the control. Where studies provided more than one outcome per domain, SMDs and variances were combined into a single study-level estimate for analysis.
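The effect sizes described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' Stata code: the function names, the 2×2 counts and the change-score summaries are hypothetical, and the variance of the log RR and the small-sample correction factor J = 1 − 3/(4·df − 1) are the usual textbook forms, assumed here rather than taken from the paper. Taking the absolute value of g before labelling its size is also an assumption of this sketch.

```python
import math

def log_rr(events_ct, n_ct, events_ctrl, n_ctrl):
    """Log relative risk and its variance from 2x2 counts (hypothetical helper)."""
    rr = (events_ct / n_ct) / (events_ctrl / n_ctrl)
    var = 1 / events_ct - 1 / n_ct + 1 / events_ctrl - 1 / n_ctrl
    return math.log(rr), var

def hedges_g(mean_ct, sd_ct, n_ct, mean_ctrl, sd_ctrl, n_ctrl):
    """Hedges' g (CT minus control) and its variance from change-score summaries."""
    df = n_ct + n_ctrl - 2
    pooled_sd = math.sqrt(((n_ct - 1) * sd_ct**2 + (n_ctrl - 1) * sd_ctrl**2) / df)
    d = (mean_ct - mean_ctrl) / pooled_sd          # Cohen's d
    j = 1 - 3 / (4 * df - 1)                       # small-sample correction (assumed form)
    var_d = (n_ct + n_ctrl) / (n_ct * n_ctrl) + d**2 / (2 * (n_ct + n_ctrl))
    return j * d, j**2 * var_d

def magnitude(g):
    """Label |g| with the thresholds stated in the text (0.30 and 0.60)."""
    size = abs(g)
    if size < 0.30:
        return "small"
    if size < 0.60:
        return "moderate"
    return "large"

# Hypothetical trial: 8/40 POCD events with CT versus 16/40 with control.
y, v = log_rr(events_ct=8, n_ct=40, events_ctrl=16, n_ctrl=40)
print(f"RR = {math.exp(y):.2f}, SE(log RR) = {math.sqrt(v):.3f}")

# Hypothetical change scores: mean (SD) of 2.0 (4.0) with CT versus 0.5 (4.0) with control.
g, vg = hedges_g(2.0, 4.0, 40, 0.5, 4.0, 40)
print(f"g = {g:.2f} ({magnitude(g)}), SE = {math.sqrt(vg):.3f}")
```

Working on the log scale keeps the RR confidence interval symmetric before it is back-transformed, which is why the pooling step is usually applied to log RRs rather than to the RRs themselves.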

The I2 statistic with 95% CI was used to quantify the heterogeneity across studies, and I2 values of 25%, 50%, and 75% were considered low, moderate, and large, respectively [27]. We drew funnel plots for each analysis to inspect for asymmetry that might suggest a small study effect (publication bias) [28]. Because there were fewer than 10 studies for each outcome, the planned analysis of funnel plot asymmetry using Egger’s test of the intercepts was not performed owing to insufficient power [29]. When potential asymmetry was found, a sensitivity analysis was conducted by repeating the random-effects analysis after removing the included studies one at a time; the results were considered robust when the removal of any single study did not change the findings. Similarly, to explore potential heterogeneity, we conducted subgroup analyses by study characteristics (timing of CT, evaluation period, implementation methods of CT, methods of evaluation and surgical populations). All analyses were performed using Stata 15.0 (Stata Corporation, College Station, TX) software.
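The pooling, heterogeneity and leave-one-out steps described above can be illustrated as follows. This is a sketch under stated assumptions: the paper does not name the between-study variance estimator, so the DerSimonian-Laird estimator is assumed here; the function names and the study-level inputs are hypothetical; and the 95% CI for I2 reported in the paper is not computed.

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate, its SE, tau^2 and I^2 (%), assuming the DL estimator."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))   # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]                      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2, i2

def leave_one_out(effects, variances):
    """Pooled estimate after dropping each study in turn."""
    return [
        dersimonian_laird(effects[:i] + effects[i + 1:], variances[:i] + variances[i + 1:])[0]
        for i in range(len(effects))
    ]

# Hypothetical log RRs and variances for four trials (not data from the review).
log_rrs = [math.log(0.40), math.log(0.50), math.log(0.45), math.log(1.40)]
variances = [0.10, 0.12, 0.15, 0.14]

pooled, se, tau2, i2 = dersimonian_laird(log_rrs, variances)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"RR = {math.exp(pooled):.2f} [{low:.2f}, {high:.2f}], I2 = {i2:.1f}%")
for i, est in enumerate(leave_one_out(log_rrs, variances), start=1):
    print(f"without study {i}: RR = {math.exp(est):.2f}")
```

If the pooled RR stays on the same side of 1 with a CI that still excludes 1 after each removal, the finding would be called robust in the sense used above.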

3 Results

3.1 Study selection

We initially identified 7635 articles in the databases; after duplicate entries were removed, 6514 articles were screened for initial eligibility based on titles and abstracts. We then assessed the full-text versions of 117 articles, of which 15 studies were eligible for inclusion in the review. Two articles did not provide summary data: the authors of one responded to our contact but could not offer the full information [30], and the authors of the other did not respond [31]. In addition, we requested additional data from the authors of 5 reports, of which 2 provided data [32, 33]. Finally, a data set of 13 independent comparisons remained (Fig. 1).

Fig. 1

Summary of trial identification and selection. Note that a single study could be excluded on more than one criterion, but appears only once in the chart. RCT randomized controlled trial

3.2 Characteristics of included studies

Overall, there were 989 participants across the 13 included studies (CT, n=497, mean group size=38; controls, n=492, mean group size=38; Table 1). The mean age ranged between 49 and 72 years old, and approximately 47.83% of participants were male. The types of surgery included cardiac and noncardiac surgery (cardiac, k=4, noncardiac, k=9). Two of the 13 studies compared CT to an active control intervention. The mean PEDro-P score was 7.4/9 (SD=0.96), and 7/13 studies were found to have a high or unclear risk of bias (for risk of bias assessments, see Table S3 and Figure S4 in the Supplement).

Table 1 Characteristics of included studies

Intervention designs varied across studies. Approximately half of the studies (7/13) administered supervised on-line CT (computer based), while the remaining studies used off-line CT (paper-and-pencil based). Four studies trained participants in the preoperative period, 7 in the postoperative period, and 2 in the perioperative period. Each training session lasted from 20 to 60 minutes, and the total training time ranged from 3 to 30 hours (Table 1). In most studies, the primary outcome was delayed neurocognitive recovery evaluated within 30 days after anesthesia and surgery; only two studies evaluated POCD from 30 days after anesthesia and surgery to 12 months of follow-up. POD was evaluated within 7 days after surgery. Other secondary outcomes were evaluated within 3 months after surgery.

3.3 Overall efficacy on POCD risk

Seven studies were included in the analysis of CT and POCD risk, which yielded a significant decrease in the risk of POCD (k=7, RR=0.52 [0.34–0.78], P<0.01, I2=40%, Fig. 2). The funnel plot revealed one conspicuous outlier [11]. Removal of this study yielded a lower and still statistically significant combined RR (RR=0.43 [0.31–0.60], P<0.01, I2=0%), which did not change the original result (Figure S6). We conducted subgroup analyses according to the possible predictors. CT was associated with a significantly reduced risk of POCD in studies of noncardiac surgery but not in those of cardiac surgery (k=4, RR=0.43 [0.29–0.63], P<0.01, I2=0% vs k=3, RR=0.73 [0.28–1.87], P=0.51, I2=68.5%, Figure S2.1). The pooled RRs for preoperative CT and postoperative CT were both low and statistically significant, while that for perioperative CT was not (k=2, RR=0.42 [0.25–0.70], P<0.01, I2=0% vs k=4, RR=0.43 [0.28–0.67], P<0.01, I2=0% vs k=1, RR=1.44 [0.69–3.01], P=0.34, I2=0%, Figure S2.4). Off-line CT was associated with a significantly reduced risk of POCD, whereas on-line CT was not (k=5, RR=0.40 [0.27–0.59], P<0.01, I2=0% vs k=2, RR=0.85 [0.31–2.32], P=0.75, I2=75.9%, Figure S2.2). CT was effective in reducing the risk of both delayed neurocognitive recovery and POCD (Figure S2.3). The removal of two studies that assessed POCD by different methods also resulted in statistically significant effect sizes (Figure S2.5). Sensitivity analyses demonstrated that the results were robust (Figure S5).
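As a rough consistency check, the reported RRs, confidence intervals and P values can be related to one another on the log scale. The sketch below assumes a normal approximation and a 95% CI half-width of 1.96 standard errors; the helper name is hypothetical, and because the published values are rounded, the recovered z and P are only approximate.

```python
import math

def z_from_ratio_ci(estimate, lower, upper, crit=1.96):
    """Approximate SE, z and two-sided P from a ratio estimate and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * crit)   # SE on the log scale
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))                    # two-sided normal P value
    return se, z, p

# Overall POCD result reported above: RR=0.52 [0.34–0.78].
se_all, z_all, p_all = z_from_ratio_ci(0.52, 0.34, 0.78)
print(f"overall: SE(log RR) = {se_all:.3f}, z = {z_all:.2f}, P = {p_all:.4f}")

# Cardiac surgery subgroup reported above: RR=0.73 [0.28–1.87].
se_card, z_card, p_card = z_from_ratio_ci(0.73, 0.28, 1.87)
print(f"cardiac: SE(log RR) = {se_card:.3f}, z = {z_card:.2f}, P = {p_card:.2f}")
```

Under these assumptions, the overall interval corresponds to a P value below 0.01 and the cardiac surgery interval, which spans 1, to a clearly nonsignificant P value, in line with the values reported in the text.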

Fig. 2

Overall efficacy of CT on risk of POCD. POCD postoperative cognitive dysfunction, CT cognitive training, CI confidence interval

3.4 Overall efficacy on POD risk

Four studies were included in the analysis of CT and POD risk, which did not yield a significant decrease in the risk of POD (k=4, RR=0.86 [0.50–1.48], P=0.59, I2=20.6%, Fig. 3). The funnel plot did not reveal substantial asymmetry (Figure S6). A subgroup analysis according to the surgical populations was conducted, and the pooled RR of CT did not show a significant decrease in the risk of POD in cardiac or noncardiac surgery (k=2, RR=1.02 [0.32–3.26], P=0.97, I2=21.3% vs k=2, RR=0.86 [0.37–1.98], P=0.73, I2=53.5%, Figure S3.1). We were unable to perform a subgroup analysis according to the other two possible predictors because of the small number of included studies. Sensitivity analysis demonstrated robust results (Figure S5).

Fig. 3

Overall efficacy of CT on risk of POD. POD postoperative delirium, CT cognitive training, CI confidence interval

3.5 Overall efficacy on cognitive outcomes

3.5.1 Global cognition

A total of 3 articles provided outcomes of global cognition. The overall effect of CT on global cognition was not statistically significant (k=3, g=1.56 [−0.46–3.57], P=0.13, I2=95.9%, Fig. 4). The funnel plot revealed one conspicuous outlier [33]. Removal of this study yielded a smaller combined effect size that remained nonsignificant (g=0.31 [−0.11–0.72], P=0.14, I2=0%), which did not change the original result (Figure S6). A sensitivity analysis showed that the results were robust (Figure S5).

Fig. 4

Efficacy of CT on cognitive subdomains. CT cognitive training, CI confidence interval

3.5.2 Executive function

A total of 6 articles provided outcomes of executive function. The overall effect of CT on executive function was moderate and statistically significant (k=6, g=0.30 [0.05–0.55], P=0.02, I2=20.7%, Fig. 4). The funnel plot did not reveal substantial asymmetry (Figure S6). Sensitivity analysis proved robust results (Figure S5).

3.5.3 Working memory

A total of 5 articles provided outcomes of working memory. The overall effect of CT on working memory was moderate but not statistically significant (k=5, g=0.44 [−0.05–0.92], P=0.08, I2=74.9%, Fig. 4). The funnel plot did not reveal substantial asymmetry (Figure S6). A sensitivity analysis showed that the results were robust (Figure S5).

3.5.4 Speed

A total of 6 articles provided outcomes of speed. The overall effect of CT on speed was small and statistically significant (k=6, g=0.28 [0.06–0.50], P=0.01, I2=0%, Fig. 4). The funnel plot did not reveal substantial asymmetry (Figure S6).

3.5.5 Language

A total of 4 articles provided language outcomes. The overall effect of CT on language was small and statistically significant (k=4, g=0.28 [0.03–0.52], P=0.03, I2=0%, Fig. 4). The funnel plot did not reveal substantial asymmetry (Figure S6).

3.5.6 Verbal memory

A total of 6 articles provided outcomes of verbal memory. The overall effect of CT on verbal memory was small and statistically significant (k=6, g=0.22 [0.01–0.44], P=0.04, I2=0%, Fig. 4). The funnel plot did not reveal substantial asymmetry (Figure S6).

3.6 Overall efficacy on psychosocial function

3.6.1 Anxiety symptoms

A total of 2 articles provided outcomes of anxiety symptoms. The overall effect of CT on anxiety symptoms was small and not statistically significant (k=2, g=0.1 [−0.27–0.46], P=0.61, I2=0%, Fig. 5). The funnel plot did not reveal substantial asymmetry (Figure S6).

Fig. 5

Efficacy of CT on psychosocial function. CT cognitive training, CI confidence interval

3.6.2 Depressive symptoms

A total of 2 articles provided outcomes of depressive symptoms. The overall effect of CT on depressive symptoms was small and not statistically significant (k=2, g=0.1 [−0.27–0.47], P=0.59, I2=0%, Fig. 5). The funnel plot did not reveal substantial asymmetry (Figure S6).

3.6.3 Adverse events

No adverse events related to CT were reported.

4 Discussion

Previous studies have focused on healthy populations and populations with mild cognitive impairment and dementia, and have demonstrated the effectiveness of cognitive training in these populations [34,35,36,37,38,39]. To our knowledge, this is the first meta-analysis of CT in surgical patients. Based on the results of 7 moderate-quality randomized controlled trials, it suggests that CT may be a feasible intervention for reducing the risk of POCD in surgical patients. Heterogeneity of the results of the component studies was modest.

The subgroup analyses showed that heterogeneity arose mainly from the type of surgery and the timing of CT. CT was associated with a significantly lower risk of POCD in the noncardiac surgery population but not in the cardiac surgery population. This may be due to the insufficient number of CT studies in cardiac surgery patients, and potentially different mechanisms of POCD in cardiac and noncardiac surgery may be another key factor [11, 12, 40]. This suggests the need for further large-scale studies of CT in the cardiac surgery population. Similarly, heterogeneity was reduced after grouping according to the timing of CT (preoperative period, postoperative period and perioperative period). The subgroup analysis showed that preoperative CT and postoperative CT significantly decreased the risk of POCD, while perioperative CT did not. This could be explained by the fact that there was only one feasibility study in the perioperative subgroup, and its sample size may not have provided enough statistical power to detect significant differences [11]. Another subgroup analysis based on the type of CT suggested that off-line CT was associated with a low risk of POCD, while on-line CT was not. This result may be due to lower training adherence of surgical patients during on-line CT, resulting in insufficient training time. Moreover, compared with off-line CT conducted face to face by trained professionals, on-line CT lacks a focused environment, which might weaken the effect of CT. Given the higher scalability and convenience of on-line CT, future studies should devote more effort to optimizing it. In addition, the results of the subgroup analysis based on the assessment period coincided with the overall efficacy of CT on POCD risk, which suggested that CT is effective in reducing the risk of both delayed neurocognitive recovery and POCD. The results of the subgroup analyses and sensitivity analyses described above demonstrate that the findings are robust and thus suggest a beneficial therapeutic effect of CT in the surgical population.

The results for POD from 4 randomized controlled trials showed that CT did not yield a significant decrease in the risk of POD. The heterogeneity of the results was small and increased after grouping according to type of surgery (cardiac surgery and noncardiac surgery), and the results did not reveal any statistically significant difference. The reason might be that training adherence varied widely across the included studies, and the training time in some studies fell short of the 10 hours presumed to be the effective “dose” of CT. Thus, the compliance of patients in the surgical population may be an important consideration [41]. The type of CT and the timing of CT were possibly other sources of heterogeneity, but these potential predictors could not be statistically tested in view of the insufficient sample size. Therefore, given the small number of included studies, the true efficacy of CT on POD in surgical patients requires further validation, and feasibility issues should be addressed carefully before a large randomized controlled efficacy trial is conducted.

In line with previous findings in healthy elderly individuals, small to moderate effect sizes on most memory and learning domains were found in our analysis, especially on executive function, a key predictor of functional decline. However, the effect on working memory did not reach the threshold of statistical significance, in contrast to findings in elderly individuals with cognitive impairment, which showed a large effect [42]. This result may be due to inadequate working memory training in the surgical population, and more time should be considered for working memory tasks in future studies. Moreover, similar to a previous review of Parkinson’s disease [43], the effect on global cognition was not statistically significant. One reason could be the insufficient sample size and large heterogeneity in the analysis of global cognition, and another may be the use of diverse subjective measures.

Depression has been shown to be associated with cognitive impairment and conversion to dementia. We found no significant effect of CT on depression or anxiety, which is consistent with previous studies and suggests that CT may not be beneficial for patients' mood [42, 43]. One limitation in this area is that only two studies were included in the analysis of anxiety and depressive symptoms, which may not provide adequate statistical power [13, 32].

This review has several limitations. First, the precision of our results was limited by the relatively small number of RCTs and their typically small sample sizes, which reduced the power of the planned subgroup analyses based on several possible predictors. Second, the methods used to evaluate POCD were not consistent across the included studies because of the lack of a unified diagnostic method for POCD. To account for the impact on the estimation of heterogeneity, subgroup analyses were added based on different assessment criteria, but this potential heterogeneity still had some impact on our findings. In addition, we focused on the short-term effect of CT because few studies have examined long-term effects, so the long-term effect of CT in the surgical population remains to be established in future studies. Finally, the included studies differed in surgical populations, CT methods and training periods, which may introduce confounding factors and biases and may render the results less valid. However, for our primary results, subgroup analyses and sensitivity analyses were performed based on possible predictors, and the conclusions proved to be robust.

5 Conclusions

The current body of RCT evidence suggests that both preoperative and postoperative CT may be beneficial in reducing the incidence of POCD, particularly in the noncardiac surgery population, but CT did not yield a significant decrease in the risk of POD. Certain cognitive domains, such as executive function, speed, language and verbal memory, showed small to moderate improvements in surgical patients, whereas global cognition, working memory and psychosocial functioning did not. This intervention therefore warrants longer-term and larger-scale trials to examine its effects on the risk of POD and its application to the cardiac surgery population.