Introduction

Cognitive impairment is a common non-motor symptom in idiopathic Parkinson’s disease (PD) with a prevalence of approximately 40% [1]. Since cognitive deficits have a negative impact on patients’ quality of life (QoL) [2], increase mortality [3] and so far only limited pharmacological treatment options are available [4, 5], there is a need for research in non-pharmacological interventions. Two meta-analyses showed positive effects of cognitive training (CT) in PD patients regarding executive functioning, working memory, memory, processing speed, or attention with small to medium effect sizes [6, 7]. A review on non-pharmacological management of cognitive impairment in PD reported level B evidence for improving or maintaining memory, attention and working memory performance after CT [8], while another recent review on CT in PD patients with mild cognitive impairment (PD-MCI) and PD dementia did not find clear evidence that CT improves cognitive functioning [9]. However, the authors emphasize the low level of certainty due to small sample sizes, the heterogeneous study population concerning varying degrees of cognitive impairment, and the lack of studies reporting on long-term effectiveness. Moreover, little research has been done in the past to identify predictors of CT responsiveness in PD patients. Few previous studies systematically investigated a variety of sociodemographic, clinical, genetic, and neuropsychological factors [10,11,12,13,14], however, inconsistent results were reported for most predictors.

Our recently published multicenter randomized controlled trial (RCT) that is directly linked to the present study analyzed the short-term results of CT in PD-MCI patients compared to an active physical training control group (PT) [15]. In the CT group, an enhancement of executive functions (especially verbal fluency) and self-reported physical activity could be demonstrated while working memory improved in the PT group. In the memory domain, however, no significant training gains were found. Baseline cognitive levels, education, disease progression, and Apolipoprotein E4 (ApoE4) state were significant predictors for training responsiveness, indicating that vulnerable patients benefit the most from CT. Also, it could be shown that CT is feasible and safe for PD-MCI patients. Here, we report the long-term results of the study at 6 and 12 months follow-up assessments after CT. We aimed (1) to examine the long-term efficacy of CT regarding memory and executive functioning as well as further secondary cognitive and non-cognitive outcome parameters in PD-MCI, and (2) to identify predictors for training responsiveness at these follow-up time points.

Methods

Study design

The study is registered in the German Clinical Trials Register (ID: DRKS00010186) and was approved by the local ethics committees of all participating centers. All patients gave their informed consent in written form. Data were collected in four German university hospitals (Cologne, Duesseldorf, Tuebingen, Kiel) between July 2016 and May 2018. An a priori sample size calculation focused on short-term training effects showed that an overall sample size of n = 80 at baseline was necessary to achieve 80% power at a significance level of α = 0.05 when considering a 10–15% drop-out rate. The participants were randomized to the CT or PT group, and the persons who carried out the outcome assessments were blinded to intervention type. The patients were assessed pre- and post-intervention as well as 6 and 12 months after intervention, each assessment within a time frame of 4 weeks based on the first or last session of the intervention. All intervention sessions and diagnostic examinations were performed under regular antiparkinsonian medication. Data were entered in a secured online database system in pseudonymized form. Data monitoring was carried out by two members of another study site. For a detailed report on study design, randomization procedure and data management following the CONSORT statement, please see Kalbe et al. [15].

Patients

All patients were diagnosed with PD according to the UK Brain Bank criteria [16] and PD-MCI according to the Movement Disorder Society task force Level-II criteria [17] requiring impairment in at least two cognitive tests (operationalized as at least one standard deviation below the mean normative score). Further inclusion criteria were age between 50 and 80 years and a PD duration of at least three years with a stable medication within the four weeks before the screening procedure as well as subjective cognitive impairment as diagnosed using the Subjective Cognitive Impairment questionnaire [18] and/or objective cognitive impairment in the Montreal Cognitive Assessment [19] (cut-off < 26 points). Exclusion criteria were a clinical PD dementia diagnosis according to the criteria of Emre et al. [20], impaired activities of daily living (ADL) according to the Pill Questionnaire [21] (impact on daily living is assumed when the patient cannot describe his or her regular medication and, in case of doubt, a caregiver confirms that he or she is no longer able to take the pills safely and reliably without supervision), and severe depression measured with the Beck Depression Inventory II [22] (cut-off ≥ 20 points, range 0–63 points, higher scores indicate more severe signs and symptoms of depression). In an anamnestic interview, the following exclusion criteria were evaluated: suicidal tendency, severe comorbidities, severe fatigue, prominent impulse control disorder or dopamine dysregulation syndrome, acute psychosis or psychotic episode in the last six months, dementia medication, participation in other treatment studies within the last two months, pregnancy, or deep brain stimulation.
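The numeric cut-offs named above can be written down as simple checks. The following is only a minimal sketch, not the study's screening procedure: the function names are hypothetical, and it covers just the thresholds stated in the text (at least two tests at or below one standard deviation under the norm, MoCA < 26, BDI-II ≥ 20), not the full Level-II criteria or the interview-based exclusion criteria.

```python
from typing import Sequence

# Thresholds taken from the text; all names below are illustrative assumptions.
IMPAIRMENT_Z = -1.0        # "at least one standard deviation below the mean normative score"
MIN_IMPAIRED_TESTS = 2     # "impairment in at least two cognitive tests"
MOCA_CUTOFF = 26           # objective impairment if MoCA total < 26
BDI_II_CUTOFF = 20         # severe depression (exclusion) if BDI-II total >= 20


def count_impaired_tests(z_scores: Sequence[float]) -> int:
    """Count tests whose normative z-score is at or below -1 SD."""
    return sum(1 for z in z_scores if z <= IMPAIRMENT_Z)


def meets_impairment_criterion(z_scores: Sequence[float]) -> bool:
    """True if at least two tests are impaired by the operationalization above."""
    return count_impaired_tests(z_scores) >= MIN_IMPAIRED_TESTS


def moca_indicates_impairment(moca_total: int) -> bool:
    """True if the MoCA total falls below the cut-off of 26 points."""
    return moca_total < MOCA_CUTOFF


def excluded_for_depression(bdi_ii_total: int) -> bool:
    """True if the BDI-II total reaches the exclusion cut-off of 20 points."""
    return bdi_ii_total >= BDI_II_CUTOFF
```

For instance, a hypothetical patient with z-scores of -1.2, -1.0 and 0.3 would satisfy the two-test criterion, whereas z-scores of -1.2, -0.9 and 0.3 would not.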

Interventions

As CT, the NEUROvitalis program [23] was conducted. In this standardized training program, executive functions, memory, attention, and visuocognition are trained by group tasks, activity games, individual exercises, and homework. Furthermore, it contains psychoeducative elements, e.g. explaining cognitive functions and strategies to enhance these functions. Two sessions of the original version of the program were modified in consideration of the characteristic cognitive profile of PD patients. More precisely, two memory sessions were replaced by sessions focusing on executive functions and visuocognition. The modified program was recently published as NEUROvitalis Parkinson [24]. The PT group received a low-intensity physical activity program which aimed to improve motor function but not cognition. Each session included warm-up exercises, specific exercises focusing on stretching, flexibility, loosening up, or relaxation, psychoeducation, and homework. Both training programs were conducted in groups with three to five patients and included two 90 min sessions a week over a total of six weeks. As part of CT and PT, patients were encouraged to stimulate themselves cognitively and physically after the end of the training phase, but no new training sessions or exercises were conducted until the follow-up assessments. For further details of the study interventions, we refer to Supplementary Table 1 in the article by Kalbe et al. [15].

Table 1 Sociodemographic and clinical baseline characteristics of the PD-MCI subgroups that are included in the 6 and 12 months follow-up analyses, respectively

Outcomes

Primary study outcomes were (i) a composite score for memory and (ii) a composite score for executive functions, both defined as averaged z-scores of the respective cognitive test parameters. Secondary outcomes were composite scores for attention, working memory, visuocognition, and language, as well as single test results for ADL, self-reported physical activity, depression, QoL, self-experienced attention deficits, motor impairment, and freezing of gait. The diagnostic tests used were the following:

  • Memory: California Verbal Learning Test (CVLT) [25]—total score trials 1–5 and long delay free recall II, Rey-Osterrieth Complex Figure Test (ROCFT) [26]—delayed recall.

  • Executive functions: Regensburger word fluency tests [27]—phonemic and semantic word fluency, modified card sorting test [28]—categories completed, Behavioural Assessment of the Dysexecutive Syndrome [29]—Key Search test.

  • Attention: d2-R [30]—errors and concentration performance.

  • Working memory: Wechsler Adult Intelligence Scale III [31]—letter-number sequencing and digit span backwards.

  • Visuocognition: ROCFT—copy, Benton Judgment of Line Orientation [32].

  • Language: Consortium to Establish a Registry for Alzheimer's Disease [33]—Boston Naming Test, Aphasia Check List [34]—speech comprehension.

  • ADL: Bayer Activities of Daily Living Scale [35].

  • Depression: Beck Depression Inventory II [22].

  • Self-reported physical activity: Physical Activity Scale for the Elderly [36].

  • Quality of Life: Parkinson’s Disease Questionnaire 39 [37].

  • Self-experienced attention deficits: Self-perceived deficits in attention questionnaire [38].

  • Motor impairment: Unified Parkinson’s Disease Rating Scale Part III (UPDRS III) [39].

  • Freezing of gait: Freezing of Gait Questionnaire [40].

Parallel test versions were used if available. Neuropsychological assessments were conducted by trained psychologists, neurological tests were carried out by neurologists, physicians in neurological training, or PD nurses.
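The composite scores described above (averaged z-scores of the test parameters within a domain) can be illustrated as follows. This is a sketch under assumptions: the normative means and standard deviations in the usage note are invented for illustration, and the sign inversion for parameters where lower raw values are better (for example, error counts) is an assumption about how such parameters might be handled, not a detail stated in the text.

```python
from statistics import mean
from typing import Sequence


def z_score(raw: float, norm_mean: float, norm_sd: float,
            higher_is_better: bool = True) -> float:
    """Standardize a raw test value against normative data.

    If lower raw values indicate better performance, the sign is flipped
    so that higher z-scores always mean better performance (assumption).
    """
    if norm_sd <= 0:
        raise ValueError("normative standard deviation must be positive")
    z = (raw - norm_mean) / norm_sd
    return z if higher_is_better else -z


def composite_score(z_scores: Sequence[float]) -> float:
    """Domain composite: the arithmetic mean of the domain's z-scores."""
    if not z_scores:
        raise ValueError("at least one z-score is required")
    return mean(z_scores)
```

With three hypothetical memory z-scores of -1.5, -0.5 and -1.0, `composite_score` returns -1.0, i.e. the patient performs on average one standard deviation below the norm in that domain.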

Statistical analysis

Data analyses were carried out using SPSS Statistics for Windows, Version 25.0 (Armonk, NY: IBM Corp). To investigate long-term effects of the CT group in comparison to PT, 3 × 2 (time × group) mixed repeated measures analyses of variance (ANOVA) were computed for primary and secondary outcome variables. In general, an effect was considered significant at p ≤ 0.05. Because there were two primary outcome scores, Bonferroni correction for multiple testing was applied to them, so that an effect on a primary outcome was considered significant at p ≤ 0.025. Due to the exploratory character, no alpha-correction was applied for the secondary outcome analyses. Partial eta square (η2) is reported as effect size, indicating small effects from η2 = 0.01 to η2 ≤ 0.06, medium effects from η2 > 0.06 to η2 < 0.14, and large effects from η2 ≥ 0.14 [41]. To reduce the risk of dropout-associated bias, we report the results of a per-protocol (PP) approach as well as of an intention-to-treat (ITT) approach for the ANOVAs. For the PP approach, only patients who completed the respective follow-up assessment were included in the analyses; for the ITT approach, missing data were imputed using the Last Observation Carried Forward (LOCF) method.
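Three of the rules in this paragraph lend themselves to a short illustration: the Bonferroni-adjusted threshold, the effect size bands for partial η2, and LOCF imputation. The sketch below is not the SPSS procedure used in the study; the function names are hypothetical, and the label "negligible" for values below 0.01 is an assumption, since the text does not name that range.

```python
from typing import List, Optional, Sequence


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def classify_partial_eta_squared(eta2: float) -> str:
    """Map partial eta squared to the bands given in the text."""
    if eta2 < 0.01:
        return "negligible"   # assumption: range not named in the text
    if eta2 <= 0.06:
        return "small"
    if eta2 < 0.14:
        return "medium"
    return "large"


def locf(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Last Observation Carried Forward: fill each missing value (None)
    with the most recent observed value; leading gaps stay missing."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled
```

For two primary outcomes, `bonferroni_alpha(0.05, 2)` gives the 0.025 threshold stated above. A hypothetical patient with a baseline score of 0.2 and no follow-up data would enter the ITT analysis as `locf([0.2, None, None])`, i.e. with the baseline value carried to both follow-ups.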

In case of a significant time × group interaction effect, test-specific post-hoc analyses were calculated to examine direction and temporal course of the effect. For this purpose, change scores were computed by subtracting baseline scores from 6 and 12 months follow-up scores, and tested for normal distribution using the Shapiro–Wilk test. Afterwards, change score differences between the intervention groups were compared with independent samples t tests or, where normality was violated, Mann–Whitney U tests. Moreover, paired t tests or Wilcoxon signed-rank tests, respectively, were computed to detect significant mean score changes over time within both groups. Post-hoc significance levels were Bonferroni corrected for the number of cognitive tests within the respective domain.
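The change scores and the domain-wise Bonferroni correction described above can be sketched as follows. Two points are assumptions rather than statements from the text: the uncorrected post-hoc level is taken to be 0.05, and the number of test parameters per domain is counted from the list of diagnostic tests in the Outcomes section.

```python
# Number of cognitive test parameters per domain, counted from the list
# of diagnostic tests in the Outcomes section (assumed counting).
TESTS_PER_DOMAIN = {
    "memory": 3,               # CVLT total, CVLT long delay free recall, ROCFT delayed recall
    "executive functions": 4,  # phonemic fluency, semantic fluency, card sorting, Key Search
    "attention": 2,
    "working memory": 2,
    "visuocognition": 2,
    "language": 2,
}


def change_score(baseline: float, follow_up: float) -> float:
    """Follow-up minus baseline; positive values indicate improvement
    when higher scores mean better performance."""
    return follow_up - baseline


def posthoc_alpha(domain: str, base_alpha: float = 0.05) -> float:
    """Bonferroni-corrected post-hoc threshold for one domain
    (base_alpha = 0.05 is an assumption)."""
    return base_alpha / TESTS_PER_DOMAIN[domain]


def is_significant(p_value: float, domain: str, base_alpha: float = 0.05) -> bool:
    """Compare a post-hoc p-value with the domain-specific threshold."""
    return p_value <= posthoc_alpha(domain, base_alpha)
```

Under these assumptions, the three memory parameters lead to a post-hoc threshold of 0.05 / 3 ≈ 0.0167.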

Furthermore, we examined possible predictors of intervention responsiveness. Intervention responsiveness was operationalized by the change scores (differences between baseline level of the respective cognitive outcome score and the performance at follow-up assessment). Therefore, multiple linear regression analyses were performed for the 6 months as well as for the 12 months change scores. To assess the specificity of the training, the analyses were computed for both intervention groups. Following studies with healthy older adults and PD-MCI patients [42,43,44,45,46,47,48,49], we included as predictors the baseline level of the respective outcome variable, age, sex, education level, and ApoE4 status. Regarding PD characteristics, we added UPDRS III and levodopa equivalent daily dose (LEDD) as possible predictors, which is in line with Kalbe et al. [15].
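Because the regression models are summarized by the adjusted coefficient of determination (R2adj) in the Results, it may help to recall how the adjustment penalizes the number of predictors relative to the sample size. The sketch below uses the standard formula; the sample size and R2 value in the usage note are illustrative, and whether all seven candidate predictors entered each model is an assumption.

```python
# Candidate predictors named in the text (seven in total).
PREDICTORS = (
    "baseline level of the outcome",
    "age",
    "sex",
    "education level",
    "ApoE4 status",
    "UPDRS III",
    "LEDD",
)


def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof
```

For example, a hypothetical model with R2 = 0.50 fitted to 28 patients with all seven predictors would have an adjusted value of 1 - 0.5 × 27 / 20 = 0.325, which shows how strongly the adjustment acts in samples of this size.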

Results

Dropout analysis

Initially, 76 patients were screened for eligibility and after pretest 64 patients were randomly allocated to the CT group (n = 33) or PT (n = 31), respectively. The dropout rate during the intervention phase was 4.7% (CT: n = 2, PT: n = 1). Out of the 61 patients who completed the pre- and post-intervention assessments, 54 patients completed the 6 months (CT: n = 28, PT: n = 26) and 49 patients completed the 12 months follow-up assessment (CT: n = 25, PT: n = 24). Dropout rates were 11.5% from post-intervention assessment to 6 months follow-up and 9.3% from 6 to 12 months follow-up. Reasons for dropout during follow-up were illness other than PD that made further participation impossible (CT: n = 2, PT: n = 2), loss of contact (CT: n = 1, PT: n = 3), patients’ wish to stop participation (CT: n = 2, PT: n = 1), and deep brain stimulation (CT: n = 1), see also Supplementary Fig. 1 (online resource). Patients who dropped out did not significantly differ from patients who completed the study in terms of age (p = 0.281, Mann–Whitney U test), sex (p = 0.223, Fisher’s exact test), intervention group (p = 1.000, Fisher’s exact test), and motor impairment (p = 0.409, Mann–Whitney U test).
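The reported dropout rates can be reproduced from the patient counts given above. The helper below is a hypothetical illustration; note that the 11.5% figure is obtained when the 61 patients who completed the post-intervention assessment are used as the denominator.

```python
def dropout_rate(n_start: int, n_end: int) -> float:
    """Percentage of patients lost between two assessments, rounded to one decimal."""
    if n_start <= 0 or n_end > n_start:
        raise ValueError("expected 0 < n_end <= n_start")
    return round(100 * (n_start - n_end) / n_start, 1)


# Counts taken from the text: 64 randomized, 61 after the intervention,
# 54 at the 6 months and 49 at the 12 months follow-up.
intervention_phase = dropout_rate(64, 61)   # 3 of 64 patients
to_six_months = dropout_rate(61, 54)        # 7 of 61 patients
six_to_twelve = dropout_rate(54, 49)        # 5 of 54 patients
```

These three values correspond to the 4.7%, 11.5% and 9.3% stated in the text.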

Comparability between groups

Sociodemographic and clinical baseline characteristics of the subgroups included in the 6 and 12 months follow-up analyses can be seen in Table 1. The intervention groups were comparable with regard to age, sex distribution, education, disease onset, disease duration, severity of motor symptoms, LEDD, ApoE4 state, and depression. Further, we checked for comparability between groups concerning the training participation. Patients included in the 6 months follow-up analysis participated in 11 of the 12 training sessions (median; CT range: 8–12, PT range: 9–11) independent of group affiliation (χ2 = 5.333; p = 0.255). For the 12 months follow-up groups median and range did not change (χ2 = 2.536; p = 0.638).

Long-term effects of the cognitive training

Table 2 presents the results of the training effects analyses. Regarding the primary outcome variables, time × group interaction was significant for memory composite score (PP: p = 0.006, η2 = 0.214; ITT: p = 0.023, η2 = 0.123), indicating, by the thresholds given above, a large effect in the PP analysis and a medium effect in the ITT analysis, both favouring the CT group. Interaction effects for the executive functions composite score as well as for all secondary cognitive and non-cognitive outcomes did not reach significance. Post-hoc tests showed that change scores were significantly higher in the CT group than in the PT group at 6 months follow-up for CVLT total score (p = 0.011) and ROCFT delayed recall (p = 0.014); however, there were no significant change score differences at 12 months follow-up assessment (Table 3). Moreover, paired t tests showed significantly better test results at 6 months follow-up compared to baseline assessment for CVLT total score (p < 0.001) and ROCFT delayed recall (p = 0.002) in the CT group. No significant differences were found between pre-intervention and 12 months follow-up assessment. In the PT group, there were significant differences between baseline and 6 as well as 12 months follow-up assessments for CVLT delayed recall (p = 0.001 and p = 0.013, respectively) with better test results at the follow-up assessments. All significant results indicate an improvement over time. Between 6 and 12 months follow-up, there were no significant memory changes in either group. The results are presented in Table 4. Figure 1 illustrates the course of the memory scores in both groups.

Table 2 Training effects for both intervention groups
Table 3 6 and 12 months memory change score differences between cognitive training and physical activity group
Table 4 Memory test results before intervention and at 6 and 12 months follow-up assessment in both intervention groups
Fig. 1 Memory domain z-scores pre-intervention and at 6 and 12 months follow-up assessments for both training groups

Prediction of long-term effects

Significant models for predicting change scores of the CT group were found within the executive function, visuocognition and language domains as well as for QoL and motor function at 6 months follow-up (0.374 ≤ R2adj ≤ 0.713). There was no significant regression model for the prediction of training responsiveness in the memory domain after 6 months. At 12 months follow-up, significant predictive models were found within the memory, executive functions, attention, working memory, visuocognition, and language domains as well as for self-reported physical activity and QoL (0.337 ≤ R2adj ≤ 0.651). A lower baseline level in the respective outcome variable significantly predicted training gains in almost all significant regression models; the only exceptions were the QoL models. Additionally, depending on the outcome, higher or lower age, female or male sex, higher education level, lower baseline motor status and LEDD, and positive or negative ApoE4 status were significant predictors for training gains in some secondary outcome parameters after CT. For the PT group, significant regression models were found for the prediction of memory, executive, visuocognitive, language, motor function and ADL change scores after 6 or 12 months (0.374 ≤ R2adj ≤ 0.961) with lower baseline levels as significant predictors for training responsiveness in all cases, and higher age, male sex, higher education level, lower baseline UPDRS III score, and higher baseline LEDD as significant predictors in a few single variables. All significant regression models are presented in Supplementary Tables 1 and 2 (online resource).

Discussion

We report the long-term results of a multicenter RCT assessing the effects of CT in comparison to an active control training in PD-MCI. In our previous report [15], we could show that CT is feasible and safe for PD patients. Furthermore, we provided evidence for an enhancement of executive functions shortly after CT compared to PT. In the present study, we extended these results by demonstrating training gains of the CT group in the memory domain after 6 months. The main results for 6 and 12 months follow-up assessments were: (i) CT enhanced memory functions after 6 months while there was no positive effect after 12 months, (ii) there were no significant improvements of executive functions or other cognitive and non-cognitive parameters at 6 and 12 months follow-up assessments, (iii) training gains in the memory domain cannot be predicted by means of baseline score, age, sex, education, LEDD, or ApoE4 state. These results provide Class 1 evidence for memory enhancement following CT after 6 months given the multi-center randomized and single-blinded design.

We found a significant interaction effect for the memory composite score indicating an enhancement of memory performance in the CT group. This effect remained after imputing missing data. Post-hoc analyses showed that the significant interaction effect is driven by significant verbal and nonverbal memory improvement of the CT group from baseline to 6 months follow-up assessment, while after 12 months the test performance declines. The largest CT improvement was demonstrated for the CVLT total score trials 1–5, a marker for the multidimensional construct of verbal learning. Remarkably, a comparable word list learning score turned out to be the most sensitive memory score for detecting memory dysfunction and cognitive impairment in PD-MCI patients [50], indicating that CT is enhancing highly vulnerable memory functions. Memory functions as primary outcome were expected to improve as the NEUROvitalis program includes training sessions focusing on the memory domain. Moreover, an enhancement in memory functioning after CT could be shown in previous PD studies [13, 51, 52]; however, these studies examined the training effect immediately after intervention. Alloni et al. [53] also demonstrated significant memory improvement immediately after CT, while six months after training the improvement remained for one out of three memory test variables. Notably, in our study, the CT group did not benefit shortly after intervention regarding memory functioning, but only at the 6 months follow-up assessment. This result is consistent with a study from Lawrence et al. [54] who could show a significant verbal memory improvement 12 weeks after CT, while immediately after CT this effect did not reach significance. One possible explanation for the delayed effect could be that CT contributes to the development of cognitive strategies, which first results in an enhancement of executive functioning (as we found in our study immediately after training, see Kalbe et al. [15]) and is later transferred to memory performance. An argument for this hypothesis is the high strategic load of the CVLT due to the possibility of semantic clustering. Accordingly, an influence of executive control on CVLT performance was demonstrated for patients with PD [55], PD dementia [56], mixed neurological patients [57], and older adults with suspected dementia [58]. Moreover, Alexander et al. [59] showed that patients with frontal lesions have difficulties in the CVLT due to poor implementation of a strategy of subjective organization. This explanation may also be applicable to the ROCFT, even though previous studies mainly focused on executive components of the copy condition and the few studies on the recall condition provided inconsistent results regarding its strategic load [60, 61]. Test–retest effects must also be considered as an explanation for the delayed memory improvement, as the same test version was used at baseline assessment and 6 months follow-up assessment, while a parallel version was administered immediately after intervention and at 12 months follow-up assessment. However, there are two arguments against this suggestion. First, we found a significant time × group interaction effect, while a test–retest effect would affect both groups. Second, there are no relevant mean z-score differences between post-intervention assessment (results reported by Kalbe et al. [15]) and 12 months follow-up for CVLT total score (CT: p = 0.638, PT: p = 0.148) and ROCFT delayed recall (CT: p = 0.271, PT: p = 0.957) in either group, although the same test version was used in these assessments.

Regarding executive functions, the pre-post analyses showed a significant enhancement immediately after the training in the CT group compared to the PT group [15]; however, after 6 and 12 months these results were no longer evident. Similar results for PD patients were found in the studies from Lawrence et al. [54] and Alloni et al. [53] in which training effects in executive functioning were significant immediately after CT, but mostly not at follow-up assessment (12 and 24 weeks, respectively). Similarly, in MCI patients without PD it has been demonstrated that CT impact is strong in the short term, but not always strong enough to maintain efficient functioning in the long term [62]. Especially with regard to the training effort (for both patients and clinical personnel), future studies must examine how training effects can be preserved in the long term. One possible method may be to conduct further training sessions periodically after the main intervention (so-called “booster training”) to refresh the strategies learned. Also, continuous home exercises could prevent a detraining effect over time.

The regression analyses did not reveal a significant model for predicting memory improvement after 6 months, although memory was the only domain in which significant improvements of the CT group could be demonstrated. Therefore, memory enhancement after CT could not be predicted by means of baseline score, age, sex, education level, motor status (UPDRS III), LEDD, or ApoE4 state, indicating that CT was comparably effective in all patients regardless of specific sociodemographic or disease-related characteristics. For executive functioning and the cognitive and non-cognitive secondary outcome variables, the respective baseline level turned out to be the main predictor for training gain in almost all cases; more precisely, lower baseline levels were predictive for CT responsiveness in the respective domain. This is in line with the short-term results of our study as lower baseline cognitive levels turned out to be the main predictor for training improvement directly after intervention [15]. Additionally, depending on the outcome, higher or lower age, female or male sex, higher education level, lower baseline motor status, lower baseline LEDD, and positive or negative ApoE4 status predicted training gains after 6 or 12 months in the CT group for selected outcomes. Previous PD studies detected lower baseline scores [12, 14], higher global cognitive status [11], higher fluid intelligence and higher self-efficacy expectancy [14], MCI diagnosis [13], higher educational level [11, 14], longer [10] or shorter disease duration [11], younger age [14], and younger age at PD diagnosis [10] as predictive for enhancements in cognitive functions immediately or 3 months after CT. These inconsistent results may be explained by study-specific differences (e.g., sample size and heterogeneity, cognitive tests used), but may also indicate the challenge of predicting CT responsiveness in cognitively impaired PD patients. 
In our study, the prediction results of the CT group were comparable to those of the PT group as in both groups a lower cognitive baseline level turned out as the main predictor for training responsiveness after 6 and 12 months. Therefore, a low specificity of the predictions for the type of interventional training is assumed. While the randomization procedure minimized the risk of a regression-to-the-mean effect [63], the predictive character of baseline level in both intervention groups may be explained by unspecific test–retest effects. In conclusion, CT can be recommended in PD-MCI patients irrespective of cognitive, educational or motor level, sex, medication characteristics, and ApoE4 status.

There are a few limitations to our study. First, due to recruitment difficulties, the a priori calculated sample size to achieve 80% power for detecting medium effect sizes was not reached. However, as we found significant results, the risk that an underpowered study fails to detect significant effects did not materialize. Second, the persons who carried out the diagnostic assessments were blinded regarding the intervention type, but the blinding was not complete as some patients reported details of the intervention despite appropriate instructions. However, blinding is a general challenge in non-pharmacological studies. Third, the study did not include a passive control group, which may restrict the clinical relevance as a physical activity training does not reflect clinical routine. However, the active control group is also a strength of our study because the significant effects cannot be attributed to unspecific effects due to the attention which is given to the patients during the training sessions. Nevertheless, future studies with an active and a passive control group should be carried out. Another strength of our study is that it is one of the first RCTs examining long-term effects of CT and its predictors for long-term responsiveness in PD-MCI.

Conclusions

In summary, this study provides Class 1 evidence that multi-domain group CT enhances memory functions (but not executive functions) in PD-MCI patients in the long term. The previously reported results of improvements in executive functioning immediately after CT could be extended by a delayed verbal and nonverbal memory improvement 6 months after intervention. Therefore, CT is an effective treatment of memory and executive functions in PD-MCI. No significant predictors could be detected for memory training gain, indicating that CT is useful for PD patients irrespective of sociodemographic or disease-related characteristics.