Background

Parkinson disease (PD) is one of the most common progressive neurodegenerative diseases worldwide. Its prevalence increases with age, and it affects 1% of the population above 60 years [1]. It is estimated that by around 2030, the number of PD patients in China will reach 5 million, accounting for about 50% of all PD patients in the world [2]. PD is characterized by motor symptoms such as rest tremor, bradykinesia, rigidity, and postural instability, which affect gait, balance, and movement quality. These symptoms make basic daily activities difficult, reduce quality of life, and place a heavy burden on families and society [3]. Multidisciplinary input is increasingly recognized as important in PD management [4]. Currently, drugs and surgery are the main treatments for PD. Clinically approved drug treatments mainly include levodopa, dopamine receptor agonists, and monoamine oxidase-B inhibitors. Levodopa is considered a first-line drug, but its long-term use leads to many complications [5]. Deep brain stimulation may be an effective treatment in PD patients; however, clinical trials have shown that it may have cognitive and psychiatric side effects [6]. Conventional rehabilitation is considered an adjuvant to pharmacological and surgical treatments for PD that can improve many dysfunctions and self-care ability and may even delay the progression of the disease.

Virtual reality (VR) has emerged as a promising technology for researching complex impairments in people with PD and for providing personalized rehabilitation [7]. This technology typically combines real-time motion detection with a virtual environment in the context of a (video) game. The user can perceive, feel, and interact with the virtual environment, viewing an avatar (a character or graphical representation of the user) that mimics the user’s movements [8], through multiple sensory channels such as sight, sound, and touch [9]. Immediate feedback about performance and success is provided both concurrently (during game play) and terminally (at the end of the game). VR therapy attempts to promote activity-dependent neuroplasticity and motor learning [10, 11]. Recently, numerous systematic reviews (SRs) and meta-analyses (MAs) of randomized controlled trials (RCTs) on the clinical effectiveness of VR therapy in the treatment of PD have been published. However, their overall results remain mixed or inconclusive, and the quality of the reviews is uneven. An overview of SR-MAs is a relatively new method that aims to support clinical decision-making by synthesizing the findings, critically appraising the quality, and attempting to resolve discordant outcomes.

Therefore, we conducted an overview of SR-MAs to identify and summarize the existing evidence and to systematically determine the clinical effectiveness of using VR therapy to treat PD.

Methods

The overview was completed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [12] and the guidelines recommended by the Cochrane Collaboration [13]. The PRISMA checklist can be found in Additional file 1. The protocol was not prospectively registered.

Search strategy

We systematically searched the PubMed, Embase, and Cochrane Library databases for systematic reviews from inception to December 5, 2020, and updated the search on January 26, 2022. We used a combination of Medical Subject Headings with Entry Terms, or EMTREE terms with keywords, as follows: Parkinson Disease, Virtual Reality Exposure Therapy, Virtual Reality, Exergaming, Systematic Review, and Meta-Analysis. In addition, to ensure comprehensive data collection, the reference lists of relevant reviews were searched manually to identify additional eligible studies. The search strategy for the PubMed database is shown in Additional file 2.

Eligibility criteria

Types of reviews

In this overview, we included SR-MAs of RCTs whose full text was published in English. A review qualified as an SR-MA if, at a minimum, it had been conducted with systematic methods, it attempted to identify all relevant primary studies in at least one database and provided a search strategy, it appraised the quality of the included primary trials, and it included a quantitative synthesis. We required a quantitative synthesis because meta-analyses offer a pooled effect estimate that facilitates data analysis, whereas systematic reviews without meta-analysis do not.

Types of participants

Participants in the included reviews had a clinically definite diagnosis of PD, defined by the UK Parkinson’s Disease Society Brain Bank criteria or other diagnostic criteria. We placed no restrictions on gender, age, drug dosage, disease duration, or severity of PD. We included reviews reporting an intervention carried out in a mixed sample of participants if data for participants with PD were provided separately.

Types of interventions

Intervention groups received VR-based rehabilitation interventions (with or without combined interventions). Control interventions needed to involve passive treatment (PT) or active treatment (AT) without a VR component. PT included either educational programs or a control group receiving no intervention. AT involved usual care or any other exercise intervention without a VR component.

Types of outcome measures

The primary outcomes covered two aspects: (1) Gait. Gait speed, stride/step length, walking stability, measured with the Dynamic Gait Index (DGI) or Functional Gait Assessment (FGA), and walking distance, measured with the Two- or Six-Minute Walk Test (2MWT or 6MWT), were used to evaluate gait. (2) Balance function. Balance was assessed with the Berg Balance Scale (BBS), Timed Up and Go test (TUG), Single-Leg Stance Test (SLS), or Mini-Balance Evaluation Systems Test (Mini-BESTest).

The secondary outcomes included the following: (1) Balance confidence. The Falls Efficacy Scale (FES), FES-International (FES-I), and Activities-specific Balance Confidence scale (ABC) were used to measure the patient’s level of confidence in doing specific activities that could affect balance and cause falls. (2) Motor function. The Unified Parkinson’s Disease Rating Scale (UPDRS) part III was used to assess global motor function changes. (3) Quality of life. Quality of life was determined by the 39-Item Parkinson’s Disease Questionnaire (PDQ-39), its short form (PDQ-8), or the World Health Organization Quality of Life for Older Persons (WHOQOL-OLD). (4) Activities of daily living. UPDRS part II and the modified Barthel Index (MBI) were used to measure activities of daily living. (5) Cognitive function. Cognitive function was measured by the Montreal Cognitive Assessment (MoCA), Digit Span Forward (DSF), and Mini-Mental State Examination (MMSE). (6) Neuropsychiatric symptoms. The Beck Anxiety Inventory (BAI), Beck Depression Inventory (BDI), Hamilton Depression Scale (HAMD), Hospital Anxiety and Depression Scale (HADS), and 15-item Geriatric Depression Scale (GDS-15) were used to record changes in neuropsychiatric symptoms. (7) Postural control. The Sensory Organization Test (SOT) was used to examine the degree of postural control.

The exclusion criteria were as follows: (1) studies with mixed samples (PD, stroke, multiple sclerosis, cerebral palsy, or other neurological disorders) from which data for PD could not be extracted separately; (2) studies in which all PD patients used VR without a control group, or in which the control group consisted of healthy individuals; (3) studies in which PD patients with different symptoms (freezers vs. non-freezers) underwent the same VR therapy; and (4) non-systematic reviews, guidelines, conference abstracts, surveys, commentaries, editorials, letters, and notes.

Study selection

All titles and abstracts were initially screened by two independent investigators (L.Y.Q and G.Y.G) after automatically removing duplicate results to identify potentially relevant studies for inclusion. At this stage, we excluded studies that were not focused on the effects of VR therapy on PD patients or not described as SR-MAs. Furthermore, full-text articles were reviewed and selected according to eligibility criteria. We excluded reviews that did not present summary statistics for outcomes (effect size with 95% CIs). Final relevant studies were shortlisted. In case of discrepancies, a consensus was achieved by discussion. If consensus could not be reached, a third reviewer (Y.Y.S) was consulted.

Data extraction

Two investigators (L.Y.Q and G.Y.G) extracted the following basic characteristics from each eligible review: the first author, publication year, country of the review author, the number of included studies, sample size, interventions (experiment interventions and control interventions), outcomes of interest, quality assessment tools, and main conclusions. Differences between the review authors were settled by discussion, and a third reviewer (Y.Y.S) was consulted if differences persisted. The study authors were contacted with the aim of acquiring additional information on the data presented.

Quality assessment

Two independent investigators (L.Y.Q and G.Y.G) assessed the methodological quality of the included SR-MAs and the certainty of evidence within them. We resolved discrepancies through discussion or, if needed, through arbitration by a third review author (Y.Y.S).

Methodological quality of included SR-MAs

The methodological quality of each included review was evaluated using A MeaSurement Tool to Assess systematic Reviews 2 (AMSTAR-2) [14]. AMSTAR-2 is a comprehensive critical appraisal tool for SRs/MAs of randomized and non-randomized studies that focuses on weaknesses in critical domains rather than on an overall score. The tool assesses 16 items, of which 7 are critical domains (items 2, 4, 7, 9, 11, 13, and 15). Each item is rated with one of three options: “Yes,” “Partial Yes,” or “No.” AMSTAR-2 classifies the overall confidence in the results of the review into four levels: high, moderate, low, and critically low.
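
The way item ratings roll up into an overall confidence level can be sketched as follows. This is a minimal illustration, not the appraisal procedure used in this overview. It assumes the rating scheme published with AMSTAR-2 (no or one non-critical weakness gives high, more than one non-critical weakness gives moderate, one critical flaw gives low, more than one critical flaw gives critically low), and it assumes that only a "No" rating counts as a weakness. The function name `amstar2_confidence` and the example ratings are hypothetical.

```python
from typing import Dict

# Critical items as listed in the text above.
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}


def amstar2_confidence(ratings: Dict[int, str]) -> str:
    """Map ratings for items 1 to 16 to an overall confidence level.

    Assumption: only "No" counts as a weakness; "Yes" and "Partial Yes"
    do not. Appraisers may reasonably adopt a stricter convention.
    """
    if set(ratings) != set(range(1, 17)):
        raise ValueError("expected ratings for items 1 to 16")
    weak = {item for item, rating in ratings.items() if rating == "No"}
    critical_flaws = len(weak & CRITICAL_ITEMS)
    noncritical_weaknesses = len(weak - CRITICAL_ITEMS)
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# Hypothetical review: no registered protocol (item 2) and no list of
# excluded studies (item 7), everything else rated "Yes".
example = {item: "Yes" for item in range(1, 17)}
example[2] = "No"
example[7] = "No"
print(amstar2_confidence(example))
```

Under this scheme a single unmet critical item is enough to cap a review at low confidence, which is why unmet critical items weigh so heavily in the overall rating.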

Certainty of evidence in included SR-MAs

We did not re-evaluate the certainty of the evidence for the main outcomes if the review authors had already performed the assessment. In those cases, we used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) assessment of the pooled outcome data as reported by the review authors. Where review authors did not undertake GRADE, we performed a new assessment ourselves. GRADE certainty is judged on five domains: risk of bias, inconsistency, imprecision, indirectness, and publication bias [15]. Results are divided into four levels: high, moderate, low, and very low.
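
The bookkeeping behind a GRADE rating can be sketched as follows. This is a simplified illustration under stated assumptions, not the assessment procedure used here: evidence from RCTs starts at high certainty, each of the five domains may downgrade by zero, one, or two levels, and the result cannot fall below very low. Upgrading factors are omitted. The function name `grade_certainty` and the example judgments are hypothetical, and the judgment of how many levels to downgrade remains qualitative.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = (
    "risk_of_bias",
    "inconsistency",
    "imprecision",
    "indirectness",
    "publication_bias",
)


def grade_certainty(downgrades: dict) -> str:
    """Return the certainty level for RCT evidence after downgrading.

    `downgrades` maps a domain name to 0, 1, or 2 levels; domains that
    are not mentioned are treated as not downgraded.
    """
    unknown = set(downgrades) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    total = 0
    for domain in DOMAINS:
        steps = downgrades.get(domain, 0)
        if steps not in (0, 1, 2):
            raise ValueError(f"{domain}: downgrade must be 0, 1, or 2")
        total += steps
    # Start at "high" (index 3) and never go below "very low" (index 0).
    return LEVELS[max(0, 3 - total)]


# Hypothetical judgment: serious risk of bias and serious imprecision.
print(grade_certainty({"risk_of_bias": 1, "imprecision": 1}))
```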

Statistical analysis

We did not conduct novel analyses for this overview. We summarized the characteristics of included reviews as well as the AMSTAR-2 ratings for each separate review. We have presented comparisons for each primary and secondary outcome where possible. Comparisons of primary interest were as follows.

  • VR therapy versus AT

  • VR therapy versus PT

  • VR therapy versus controls (mixed AT with PT)

We created bubble plots to present the evidence base using Microsoft Office Excel 2016 software (Microsoft Corp, Redmond, WA, www.microsoft.com). Each bubble plot displayed information in 5 dimensions: effect size (standardized mean difference (SMD) or mean difference (MD)) of VR therapy for PD patients (y-axis), clinical outcome area (x-axis), number of trials (bubble size), statistical significance (bubble pattern), and certainty of evidence (bubble color).
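
The five-dimensional encoding described above can be sketched as a mapping from pooled results to visual attributes. This is an illustration only: the plots in this overview were produced in Excel, and the class `PooledResult`, the function `to_bubbles`, the color palette, the hatch pattern, the size scale, and all example values below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical palette: one color per certainty level.
CERTAINTY_COLORS = {
    "high": "green",
    "moderate": "blue",
    "low": "orange",
    "very low": "red",
}


@dataclass
class PooledResult:
    outcome: str        # clinical outcome area (x-axis)
    effect: float       # SMD or MD (y-axis)
    n_trials: int       # number of trials (bubble size)
    significant: bool   # statistical significance (bubble pattern)
    certainty: str      # certainty of evidence (bubble color)


def to_bubbles(results: List[PooledResult]) -> List[Dict]:
    """Translate pooled results into plot-ready bubble attributes."""
    outcomes = sorted({r.outcome for r in results})
    x_of = {name: i for i, name in enumerate(outcomes)}
    return [
        {
            "x": x_of[r.outcome],
            "y": r.effect,
            "size": 40 * r.n_trials,                  # arbitrary scale factor
            "hatch": "" if r.significant else "///",  # pattern marks non-significance
            "color": CERTAINTY_COLORS[r.certainty],
        }
        for r in results
    ]


# Illustrative values only, not results from this overview.
demo = [
    PooledResult("BBS", 2.0, 6, True, "very low"),
    PooledResult("TUG", -1.5, 4, False, "low"),
]
for bubble in to_bubbles(demo):
    print(bubble)
```

The resulting dictionaries could be passed to any plotting tool that supports per-point size, fill pattern, and color.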

Results

Search results

A flow diagram of the study screening and selection procedures is shown in Fig. 1. Our electronic search yielded 585 potentially relevant publications. After automatic removal of duplicates, 380 records were screened on the basis of title or abstract, leaving 46 reviews for full-text assessment. Of these, 34 reviews were excluded: participants did not have PD (n = 8), the intervention was not VR (n = 1), SR-MAs were not based on RCTs (n = 8), only a conference abstract was available (n = 3), the systematic review had no quantitative data synthesis (n = 13), and the full text was not in English (n = 1). Finally, 12 SR-MAs [16,17,18,19,20,21,22,23,24,25,26,27] met the inclusion criteria and were included in this overview.
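
The counts reported above can be checked with simple arithmetic. In the sketch below, the numbers of duplicates removed and of records excluded at title or abstract screening are derived values that are not stated in the text; they assume that no records entered the flow from other sources. The variable names are illustrative.

```python
# Counts stated in the text.
identified = 585   # records from the electronic search
screened = 380     # records left after automatic duplicate removal
full_text = 46     # reviews assessed in full text

full_text_exclusions = {
    "participants did not have PD": 8,
    "intervention was not VR": 1,
    "not based on RCTs": 8,
    "conference abstract only": 3,
    "no quantitative synthesis": 13,
    "full text not in English": 1,
}

# Derived counts (assumption: no records added from other sources).
duplicates_removed = identified - screened
excluded_at_screening = screened - full_text
excluded_full_text = sum(full_text_exclusions.values())
included = full_text - excluded_full_text

print(duplicates_removed, excluded_at_screening, excluded_full_text, included)
```

The six exclusion reasons sum to the 34 excluded reviews, and 46 minus 34 gives the 12 included SR-MAs.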

Fig. 1 A flow diagram of study screening and selection procedures

Study characteristics

The characteristics of the 12 SR-MAs included in our final analysis are summarized in Table 1. All were published between 2015 and 2021. The number of relevant primary studies included in each review ranged from 2 to 22, and the sample sizes ranged from 74 to 901. All reviews reported VR-based rehabilitation training (VR therapy) as the intervention. Of the eligible SR-MAs, seven [16, 18, 21,22,23,24,25] compared VR therapy with AT, and two [19, 26] compared VR therapy with AT and with PT separately. Two reviews [17, 20] did not classify the control group and mixed AT with PT. In addition, one review [27] presented two evidence syntheses that were each derived from a single study. Six SR-MAs [16, 19,20,21, 25, 26] used the Cochrane Collaboration’s tool, and six SR-MAs [17, 18, 22,23,24, 27] used the PEDro scale.

Table 1 Characteristics of the included systematic reviews

Methodological quality of SR-MAs

Detailed information on the methodological quality of the included SR-MAs is provided in Table 2. The AMSTAR-2 assessment showed that one review [25] (8.3%) was of moderate quality, three [22, 23, 26] (25.0%) were of low quality, and the remaining eight [16,17,18,19,20,21, 24, 27] (66.7%) were of critically low quality. The key factors affecting quality were item 2 (only five reviews [16,17,18, 25, 26] had registered a protocol before performing the review), item 4 (seven reviews [16, 17, 19, 22, 23, 26, 27] used a comprehensive literature search strategy that included searching the references of relevant reviews or relevant gray literature), item 7 (two reviews [25, 26] provided a list of excluded studies and justified the exclusions), item 9 (all reviews [16,17,18,19,20,21,22,23,24,25,26,27] assessed risk of bias using a satisfactory technique), item 11 (10 reviews [16, 18, 19, 21,22,23,24,25,26,27] combined results statistically using appropriate methods), item 13 (all reviews [16,17,18,19,20,21,22,23,24,25,26,27] accounted for the risk of bias in the primary studies when interpreting the results), and item 15 (three reviews [22, 23, 25] adequately investigated publication bias and discussed its impact on the review).

Table 2 Result of the AMSTAR-2 assessments

Effect of interventions

We found marked heterogeneity of the evaluated comparisons and measured outcomes among the included reviews. Various comparison modes in included reviews and key findings are summarized below.

Comparison 1: VR therapy versus AT

A summary of the review results is provided in Table 3. Figures 2 and 3 present the evidence maps of effectiveness for VR therapy compared with AT in patients with PD.

Table 3 Summary of the effectiveness of virtual reality therapy compared to active intervention by outcomes in Parkinson’s disease

Fig. 2 Evidence map of effectiveness (MD) of virtual reality therapy for patients with Parkinson’s disease compared with active intervention. Note. MD, mean difference; AT, active intervention; VR, virtual reality; DGI, Dynamic Gait Index; 6MWT, Six-Minute Walk Test; BBS, Berg Balance Scale; TUG, Timed Up and Go test; ABC, Activities-specific Balance Confidence scale; UPDRS, Unified Parkinson Disease Rating Scale; PDQ-39, 39-Item Parkinson’s Disease Questionnaire; MBI, modified Barthel index

Fig. 3 Evidence map of effectiveness (SMD) of virtual reality therapy for patients with Parkinson’s disease compared with active intervention. Note. SMD, standardized mean difference; AT, active intervention; VR, virtual reality; DGI, Dynamic Gait Index; BBS, Berg Balance Scale; UPDRS, Unified Parkinson Disease Rating Scale; MoCA, Montreal Cognitive Assessment

Five reviews [16, 19, 23, 25, 26] reported stride/step length and concluded that VR therapy produced a greater improvement in stride/step length than AT. Balance function was assessed by the Berg Balance Scale (BBS) and the Timed Up and Go test (TUG) in ten [16, 18, 19, 21,22,23,24,25,26,27] and six [16, 18, 21,22,23, 25] reviews, respectively, and the majority (7/10 and 4/6) indicated a significant difference in favor of VR therapy over AT. Only one review [25] investigated the effect of VR therapy on neuropsychiatric symptoms and found a significant improvement (SMD = −0.96, 95% CI = −1.27 to −0.65, very low-certainty evidence) compared with AT. Because the certainty of evidence across reviews was low to very low, it was not possible to state whether VR therapy offers more benefit than AT for stride/step length, balance function, or neuropsychiatric symptoms.
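
As a worked example of how a reported interval relates to statistical significance, the standard error and z statistic can be recovered from the 95% CI reported for neuropsychiatric symptoms. This is a sketch that assumes a symmetric interval based on the normal approximation; the function name `se_and_z_from_ci` is hypothetical, and the calculation says nothing about the certainty of the evidence, which is judged separately.

```python
import math


def se_and_z_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover the standard error, z statistic, and two-sided p value
    from a symmetric 95% confidence interval (normal approximation)."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under the normal model
    return se, z, p


# SMD and 95% CI for neuropsychiatric symptoms as reported in the text.
se, z, p = se_and_z_from_ci(-0.96, -1.27, -0.65)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.2g}")
```

An interval that excludes zero corresponds to |z| above the critical value, which is the sense in which the pooled estimate is statistically significant even though the underlying evidence is of very low certainty.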

The results regarding gait speed, walking stability, balance confidence, quality of life, and activities of daily living were mixed and provided no convincing evidence of the effect of VR therapy versus AT on these areas.

We found no significant difference between VR and AT on walking distance, motor function, and cognitive function. Most reviews described similar improvements in both exercise groups.

Comparison 2: VR therapy versus PT

A summary of the review results is provided in Table 4.

Table 4 Summary of the effectiveness of virtual reality therapy compared to passive intervention by outcomes in Parkinson’s disease

We found three reviews investigating VR therapy versus PT in participants with PD. Triegaardt et al. [19] reported that VR therapy had greater effects on gait speed, stride/step length, balance function, and activities of daily living compared with PT. Dockx et al. [26] showed a significant benefit of VR exercise on balance as a composite measure (SMD 1.02, 95% CI 0.38 to 1.65) compared with PT. The evidence in the third review [27], derived from a single study, showed an improvement in postural control (SMD 2.57, 95% CI 1.53 to 3.60) after VR therapy. Given the moderate to very low certainty of the evidence and the limited data, we were unable to draw any conclusion about the effect of VR therapy versus PT on function in people with PD.

Comparison 3: VR therapy versus controls (mixed AT and PT)

One review [17] found that VR training significantly improved balance (g = 0.66, P < 0.001), quality of life (g = 0.28, P = 0.015), activities of daily living (g = 0.62, P < 0.001), and neuropsychiatric symptoms (g = 0.67, P = 0.021) compared with the control group. A second review [20] reported that Kinect and Wii showed immediate positive effects on functional locomotion in people with PD. However, we considered this pooled comparison to be flawed: combining AT and PT groups is problematic given the likely differences in underlying effect sizes for these two groups in head-to-head comparisons with VR therapy. We therefore have not presented this result in a table. Both reviews reporting pooled analyses rated the quality of the evidence as low to very low.

Discussion

Summary of main findings

Based on the current findings, VR therapy induced (1) increased benefits on stride/step length, balance, and neuropsychiatric symptoms as compared with AT and (2) greater effects on gait speed, stride/step length, balance, activities of daily living, and postural control as compared with PT in people with PD.

Three reviews [16, 23, 26] formally rated the evidence using the GRADE approach and self-rated the evidence as very low quality. The remaining reviews [17,18,19,20,21,22, 24, 25, 27] did not explicitly use the GRADE approach; however, following consideration of factors such as their risk of bias appraisal results and the size of included studies, we rated them also as offering very low certainty of evidence. In addition, the overall quality of methodology of included reviews was also unsatisfactory.

Although the included reviews were published between 2015 and 2021, this overview was unable to offer any reliable estimate of the effect of VR therapy in terms of gait, balance, motor function, quality of life, activities of daily living, cognitive function, neuropsychiatric symptoms, and postural control.

In addition, we considered potential causes of the inconsistent results across outcomes: (1) Participants’ characteristics and clinical stages (Hoehn-Yahr, H&Y) may have differed. Sarasso et al. [16] found that VR-based balance training had a larger effect in patients with greater balance deficits and disease severity (H&Y > 2) at baseline. Patients with greater balance deficits are usually in a more advanced phase of the disease and also have initial executive-attentive and visuospatial dysfunctions that could influence balance. In these patients, VR might have the potential to train both motor and cognitive domains (particularly executive-attentive and visuospatial functions), leading to a greater balance improvement. (2) Different VR modalities may be a key factor. Sarasso et al. [16] reported that VR rehabilitation-specific systems, designed and customized for a rehabilitative goal, are more effective than non-specific systems, such as commercial exergames, in improving balance in PD patients. This finding is supported by similar preliminary evidence in stroke patients [28] and supports the continued development and implementation of customizable VR systems. (3) There was high heterogeneity in outcome measures, making it difficult to make valid comparisons between different reviews. For example, activities of daily living assessed with the Unified Parkinson Disease Rating Scale part II (UPDRS-II) [18, 19, 25] or the modified Barthel index (MBI) [21] did not yield consistent results even under the same comparison mode.

Strengths and limitations of the overview

To the best of our knowledge, our study is the first overview of SR-MAs to explore the effect of VR therapy on PD rehabilitation, which may have reference value for clinical practice. In addition, the findings of this overview were based on relatively recent evidence, as all studies were published in the last 6 years. Moreover, this overview included SR-MAs of RCTs using strict inclusion standards in order to reduce the risk of bias. However, this study has several limitations. First, the methodological quality and evidence quality of the included SR-MAs were generally very low; thus, results based on primary studies should be interpreted with caution. Second, we only searched English databases, so SR-MAs published in other languages that met the inclusion criteria may have been missed. Third, there was great heterogeneity of outcomes across the included reviews, which limited the ability to interpret overall pooled estimates. Future research should at least define a homogeneous core outcome set to assess the effect of VR therapy in PD patients. Fourth, the combined effects of VR therapy with any type of AT should be compared with the same type of AT so that the additional benefits of VR therapy can be elucidated. Unfortunately, the meta-analyses often pooled trials with highly heterogeneous interventions (i.e., VR therapy alone and VR therapy combined with other ATs), which makes interpretation of their results very difficult. Fifth, our overall GRADE assessment was based on a combination of assessments made by the systematic review authors and ourselves. This combination may entail inconsistency in assessments, as the reliability between the assessments made by the authors of the systematic reviews and by our research group is unknown. Our overview cannot avoid these limitations, and our findings must be interpreted with caution.

Conclusion

We found that the methodological quality of the reviews and the certainty of the evidence within them were poor. We were therefore unable to conclude with any confidence that, in people with PD, VR therapy is beneficial for gait, balance function, balance confidence, motor function, quality of life, activities of daily living, cognitive function, neuropsychiatric symptoms, or postural control. Rigorously designed, high-quality RCTs with larger sample sizes are needed to further verify the effectiveness of VR therapy in the treatment of PD.