Background

Stroke is one of the leading global causes of disability [1, 2], with over 17 million individuals worldwide sustaining a stroke each year [2]. Although stroke mortality is decreasing with improvements in medical technology [3], the neurological trauma resulting from stroke can be devastating: the majority of stroke survivors have substantial motor [4, 5], cognitive [6,7,8,9], and functional rehabilitation needs [3, 10, 11], and a much-reduced quality of life [3, 12, 13]. Targeted rehabilitation can help address some of these post-stroke deficits; historically, however, many individuals, in particular patients with cognitive impairment, have had difficulty engaging in standard therapies [14,15,16] at a level that will produce meaningful and lasting improvements [16,17,18,19]. Enriched and interactive rehabilitation programs are clearly needed to minimize functional disability [13, 20], increase participation in age-appropriate roles and activities [21], promote greater motivation and treatment compliance [17, 22], and reduce the long-term cost of care for stroke survivors [20, 23, 24].

Virtual reality

Virtual reality refers to simulated interactions with environments and events that are presented to the performer with the aid of technology. These so-called virtual environments may mirror aspects of the real world or represent spaces that are far removed from it, while allowing various forms of user interaction through movement and/or speech [25]. Virtual reality based rehabilitation, or Virtual Rehabilitation (VR), shows considerable promise as a safe, engaging, interactive, patient-centered and relatively inexpensive medium for rehabilitation training [26,27,28,29,30,31]. VR has the potential to target a wide range of motor, functional, and cognitive issues [23], affords methods that automatically record and track patient performance [32], and offers a high level of flexibility and control over therapeutic tasks [17, 18, 33]. This scalability allows patients to train at the highest intensity their individual ability permits [34], while keeping interaction with therapeutic tasks enjoyable and compelling [17, 29]. At the same time, VR may enable patients with a neurodisability (such as stroke) to practice without excessive physical fatigue [32, 35], which might otherwise deter continued effort and engagement in therapy [36, 37].

Currently, there are two main types of VR: purpose-designed Virtual Environments (VE) and Commercial Gaming (CG) systems. Both types of systems can provide augmented feedback, that is, additional sensory feedback about the patient’s movement over and above the feedback that arises naturally as a consequence of the movement itself [11, 38]. VE systems are often designed by rehabilitation scientists (and others) to enhance the delivery of augmented feedback in order to develop the patient’s sense of position in space [39,40,41], reinforce specific movement parameters (such as trajectory and endpoint), and reduce extraneous movements (e.g. excessive trunk displacement) [42, 43].

VE systems are also more likely to involve specially designed tangible user interfaces used in mixed reality rehabilitation systems [13] or training of daily functional activities [44]. By comparison, CG rehabilitation systems are typically “off-the-shelf” devices such as Wii (Nintendo), Xbox (Microsoft) and PlayStation (Sony), which have the advantage of being readily available and relatively inexpensive when compared with VE systems [11]. On the other hand, CG systems are typically designed for able-bodied participants and may not consider the physiological, motor, and cognitive aspects of recovery in rehabilitation, and may lack the scalability of purpose-designed VE systems [45].

Systematic reviews comparing VE and CG systems

There is conflicting evidence about the relative effectiveness of VE- and CG-based VR systems. In a recent Cochrane review of VR following stroke [46], VE systems demonstrated a significant treatment effect on upper-limb function when compared with controls (d = 0.42; 95% CI: 0.07–0.76), while the effect for CG systems failed to reach significance (d = 0.50; 95% CI: −0.04 to 1.04); a caveat, however, was that only two of the nine studies (22%) in these comparisons were CG-based. In contrast, a meta-analysis of VR following stroke by Lohse and colleagues [11] found no significant difference between VE (g = 0.43, based on 13 studies) and CG interventions (g = 0.76, based on three studies) on Body Structure/Function level outcomes. For Activity level outcomes, CG interventions showed a large but non-significant effect (g = 0.76, p = 0.14), although this estimate was based on only four of 26 studies (15%); VE interventions, by contrast, showed a significant treatment effect (g = 0.54, p < .001). Taken together, these two reviews suggest benefits of VE systems, while previous analyses of CG treatment effects have been underpowered and inconclusive.

Cognition and VR

Cognitive impairments, including difficulties in attention, language, visuospatial skills, memory, and executive function, are common and persistent sequelae of stroke [14, 47] and exert considerable influence on rehabilitation outcomes [48]. Cognitive dysfunction may reduce the ability to (re-)acquire motor [25, 49,50,51,52] and functional skills [47], and decrease engagement and participation in rehabilitation programs [48, 53]. While the important role of cognition in both conventional and VR-based rehabilitation is increasingly recognized [52,53,54], the impact of VR on cognitive function has not yet been formally evaluated in a quantitative review.

Analysis of individual domains of functioning

The World Health Organization’s International Classification of Functioning, Disability, and Health (ICF-WHO [55]) is currently one of the most widely used classification systems. It is a foundation for understanding outcome effects in clinical practice [56] and the preferred means for translating clinical findings in a patient-centered manner [56]. Under the ICF-WHO, disability and functioning are seen to arise from the interaction of the health condition, the environment, and personal factors, and can be measured at three main levels: (i) Body Structure/Function, (ii) Activity (or skill), and (iii) Participation. The ICF-WHO has been used to classify outcome measures in studies of VR (for example [57]) and in recent systematic reviews [11, 58, 59]. A brief critique of these reviews highlights several important conclusions, but also some significant gaps in the research.

An early systematic review by Crosbie and colleagues [60] examined the efficacy of VR for stroke upon motor and cognitive outcomes. Of the 11 studies reviewed (up to 2005), only five addressed upper-limb function and two addressed cognitive outcomes. Overall, the review reported significant benefits of VR, but only three studies were RCTs and no effect size estimates were reported. At around the same time, a systematic review by Henderson and colleagues [61] showed that there was very good evidence that immersive VR was more beneficial than no therapy for upper-limb rehabilitation in adult stroke, but insufficient evidence for non-immersive VR. Comparisons with traditional physical therapy were less impressive, however.

A 2016 systematic review by Vinas-Diz and colleagues [62] included both controlled clinical trials and randomized controlled trials (RCTs) in stroke published between 2009 and 2014. The review included 25 papers: four systematic reviews [19, 46, 63, 64] and 21 original trials. Evidence for treatment efficacy on upper-limb function was strong across a range of measures, including the Fugl-Meyer Test, Wolf Motor Function Test, and Motricity Index. However, the effects were not analyzed quantitatively, and important aspects of treatment implementation, such as dose and session scheduling, were not formally examined.

A recent systematic review by Santos-Palma and colleagues [58] examined the efficacy of VR on motor outcomes for stroke using the ICF-WHO framework, covering work published up to June 2015. Of the studies deemed high quality, 20 examined outcomes at the Body Structure/Function level, 17 at the Activity level, and eight examined Participation. Intriguingly, positive outcomes were evident only at the Body Structure/Function level, while results for Activity and Participation were not conclusive. Unfortunately, only three studies addressed manual ability at the Activity level, which severely limited any evaluation of skill-specific effects.

In a combined systematic review and meta-analysis of 37 RCTs published between 2004 and 2013, Laver and colleagues [46] presented a more comprehensive examination of the effects of VR on upper-limb function. They also classified outcomes broadly into upper-limb function, Activities of Daily Living (ADLs), and other aspects of motor function. In roughly one-half of the studies, study quality was low and the risk of bias high. Outcomes were significant for upper-limb function (d = 0.28) and ADLs (d = 0.43), but somewhat smaller than those reported by Lohse and colleagues [11]. Results for other aspects of motor function, including several at what may be considered the Body Structure/Function level, were non-significant. Dose varied considerably between studies, ranging from less than 5 h to more than 21 h in total, and studies that used higher doses (> 15 h of therapy) were reported as more effective. Unfortunately, results could not be pooled for cognitive outcomes, and the importance of additional treatment implementation parameters, such as training frequency and duration, and of specific study design factors, including the recovery stage of participants and the type of control group (i.e. active vs passive), was not determined.

An updated systematic review by Laver and colleagues [65] included an additional 35 studies that reported outcomes for upper-limb function and activity. A subset of only 22 studies that compared VR with conventional therapy showed no significant effect of VR on upper-limb function (d = 0.07), and no significant difference between higher (> 15 h of therapy) and lower doses. However, when VR was used in addition to usual care (10 studies; 210 participants), there was a significant effect on upper-limb outcomes (d = 0.49). As before, no significant difference was shown between high- and low-dose studies. Unfortunately, analyses of cognitive outcomes and moderator analyses of study quality and implementation parameters (e.g. daily intensity, weekly intensity, treatment frequency, and total number of sessions) were not included in the updated review. In addition, the assessment of study quality was limited to the 5-item GRADE system, the ICF classification system was not given full consideration, and no distinction was drawn between treatment as usual (TAU) and active control groups (TAU + some form of additional therapy).

Taken together, recent reviews on the use of VR for adult stroke show encouraging evidence of efficacy at the level of Body Structure/Function, but mixed results for Activity and ADLs, and a paucity of evidence bearing on Participation. The impact and effectiveness of VR on cognitive outcomes also remain poorly understood, despite the important role of cognitive dysfunction in learning and rehabilitation [17, 18] and increasing evidence of interconnections between cognitive function and motor deficits at the Body Structure/Function, Activity, and Participation levels of the ICF [52]. VE-based platforms have been suggested to be superior to CG approaches [46] in promoting motor function, but until recently few CG studies have been available for analysis. Moreover, other design factors that may moderate treatment effects (such as stage of recovery and control group type) have either not been explored or have been examined in too few studies to draw firm conclusions. The total dose of VR therapy has varied considerably [46, 60], and no analysis has yet tested the dose-response relationship in moderator analyses. Finally, most conclusions have relied on qualitative synthesis, with little quantitative analysis of empirical data to inform opinion.

In view of the limitations of past reviews and the continued acceleration of VR research, the aim of our review was to conduct a systematic literature review and meta-analysis to re-evaluate the strength of evidence bearing on VR of upper-limb function and cognition in stroke. This review is critical given evidence that stroke rehabilitation needs to better optimize intervention techniques during the recovery windows that exist in the acute phase [66] and beyond. Focusing only on RCTs, we considered outcomes across levels of the ICF-WHO and analyzed the moderating effects of design factors and dose-related parameters.

Methods

The current review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [67]; the review protocol was not registered.

Data sources and search strategy

Scopus, Cochrane Database, CINAHL, The Allied and Complementary Medicine Database, Web of Science, MEDLINE, Pre-Medline, PsycEXTRA, and PsycINFO databases were systematically searched from inception until 28 June 2017. Boolean search terms included the following: “stroke, cerebrovascular disease, or cerebrovascular attack” and “Virtual reality, Augment* reality, virtual gam*” (see Appendix for an example of the full MEDLINE search strategy).

Inclusion and exclusion criteria

RCTs published in English in peer-reviewed journals that used a VR intervention to address upper-limb motor function, cognition, or activities of daily living in stroke patients were included in the current review (see Fig. 1). VR was defined as a type of user-computer interface that involves real-time simulation of an activity or environment, enabling the user to interact with the environment using motor actions and sensory systems. Comparison groups included “usual care”, “standard care”, or “conventional therapy”, involving physical therapy and/or occupational therapy. Studies were excluded if they applied a “hybrid” approach combining virtual reality with exogenous stimulation or robotics, targeted lower-limb function, recruited a mixed cohort including non-stroke participants, or did not use motor, cognitive, or participation outcome measures.

Fig. 1

Population, Intervention, Comparison, Outcome (PICO) Question and the main variables included in the systematic literature review and meta-analysis

Identification of relevant studies and data extraction

Two of the authors (AA and JR) independently assessed eligibility using a standardized protocol. After duplicate papers were removed, the titles and abstracts of all articles were screened for suitability for inclusion. Those considered potentially eligible were read in full. In addition, the reference lists of relevant reviews were searched by hand; the last hand search was performed on 28 June 2017. For articles meeting the inclusion criteria, data on study design, participant characteristics, and intervention outcomes were extracted by the same two authors (AA and JR). Disagreements between reviewers were resolved by consensus.

Extracted VR outcomes were organized according to the three levels of functioning classified by the ICF-WHO [55]: (i) Body Structure/Function, which refers to physiological functions of body systems (e.g. Fugl-Meyer Assessment); (ii) Activity, which refers to the execution of tasks or actions (e.g. Box and Blocks Test); and (iii) Participation, which refers to involvement in life situations (e.g. Motor Activity Log [57]).

Quality assessment

Two authors (AA and PW) assessed the risk of bias of each included article using the Physiotherapy Evidence Database (PEDro) Scale [68]. The PEDro Scale rates methodological quality across 11 bias-reducing items relating to the domains of Selection, Performance, Detection, Information, and Attrition biases [69]. Studies with PEDro total scores from 6 to 10 were considered high quality [70]; scores below 6 were considered fair quality. Disagreements between reviewers were resolved by consensus.
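The scoring rule above can be sketched as follows. On the standard PEDro scale, item 1 (eligibility criteria specified) is rated but not counted toward the total, which is why 11 items yield a total out of 10. This is a minimal illustration only: the function names and the example ratings are hypothetical, and the cut-off of 6 is the one stated above.

```python
def pedro_total(items: list[bool]) -> int:
    """Return the PEDro total from 11 yes/no item ratings.

    Item 1 (eligibility criteria) is excluded from the total, as on the
    standard PEDro scale, so totals range from 0 to 10.
    """
    if len(items) != 11:
        raise ValueError("PEDro has exactly 11 items")
    return sum(items[1:])  # items 2-11 each contribute one point


def pedro_quality(total: int, cutoff: int = 6) -> str:
    """Classify a total using the review's threshold (6-10 = high, below 6 = fair)."""
    return "high" if total >= cutoff else "fair"


# Hypothetical trial: eligibility specified and random allocation, but no
# concealed allocation, no blinding of subjects, therapists, or assessors,
# and no intention-to-treat analysis.
ratings = [True, True, False, True, False, False, False, True, False, True, True]
total = pedro_total(ratings)
print(total, pedro_quality(total))  # 5 fair
```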

Quantitative analysis

From each published manuscript, post-intervention means and standard deviations on each outcome measure, p values, and sample sizes for the experimental and control groups were entered into Comprehensive Meta-Analysis (CMA; Biostat, Englewood, NJ, USA) version 3.3.070. Effect sizes were expressed as Hedges’ g, a variation of Cohen’s d that corrects for bias in small samples, and were pooled using a random-effects model. The magnitude of Hedges’ g was categorized as small (≥ 0.2), medium (≥ 0.5), or large (≥ 0.8) [71]. Pooled effect sizes were calculated by averaging study effect sizes weighted by the inverse of their variance (which depends chiefly on sample size), and 95% confidence intervals (CIs) and z scores were derived from the pooled mean and its standard error. Meta-analysis was performed only where there was more than one study in each group [72]. Effect sizes favoring VR were assigned a positive value, while effects favoring the control condition (i.e. treatment as usual) were assigned a negative value. Heterogeneity was formally assessed with the I² statistic, where an I² value greater than 50% indicated substantial heterogeneity [71]. The risk of publication bias was assessed using the classic fail-safe N and Egger’s regression test (two-tailed p value). Finally, moderator analyses were conducted using the Q statistic to estimate the likelihood of a given variable moderating the observed effect sizes. A total of ten moderator variables were examined, including five design factors and five implementation parameters (see Table 1).
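The effect-size and pooling steps described above can be sketched in Python. This is a hypothetical illustration, not CMA's code. It assumes that higher scores indicate better outcomes (for measures where lower is better, such as the Modified Ashworth Scale, the sign of g would need to be reversed), uses the standard large-sample variance of g, and assumes the DerSimonian-Laird estimator for the between-study variance τ², which the review does not specify. The names `Study`, `hedges_g`, and `random_effects` are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    """Post-intervention summary data for one two-arm trial (hypothetical)."""
    mean_vr: float
    sd_vr: float
    n_vr: int
    mean_ct: float
    sd_ct: float
    n_ct: int


def hedges_g(s: Study) -> tuple[float, float]:
    """Return (g, variance of g); positive g favors VR when higher scores are better."""
    df = s.n_vr + s.n_ct - 2
    sd_pooled = math.sqrt(((s.n_vr - 1) * s.sd_vr**2 + (s.n_ct - 1) * s.sd_ct**2) / df)
    d = (s.mean_vr - s.mean_ct) / sd_pooled          # Cohen's d
    j = 1 - 3 / (4 * df - 1)                         # small-sample correction factor
    var_d = (s.n_vr + s.n_ct) / (s.n_vr * s.n_ct) + d**2 / (2 * (s.n_vr + s.n_ct))
    return j * d, j**2 * var_d


def random_effects(effects: list[float], variances: list[float]) -> dict[str, float]:
    """Inverse-variance random-effects pooling (DerSimonian-Laird tau^2) with I^2."""
    k = len(effects)
    if k < 2:
        raise ValueError("pooling requires more than one study")
    w = [1 / v for v in variances]                   # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)               # between-study variance
    w_re = [1 / (v + tau2) for v in variances]       # random-effects weights
    g = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {"g": g, "ci_low": g - 1.96 * se, "ci_high": g + 1.96 * se,
            "z": g / se, "tau2": tau2, "I2": i2}


# Three hypothetical trials (e.g. post-intervention FMA-UE scores).
trials = [Study(38.0, 9.0, 15, 33.0, 10.0, 14),
          Study(41.0, 12.0, 10, 39.0, 11.0, 10),
          Study(30.0, 8.0, 20, 27.0, 9.0, 21)]
gs, vs = zip(*(hedges_g(t) for t in trials))
print(random_effects(list(gs), list(vs)))
```

When τ² is estimated as zero, the random-effects weights reduce to the fixed-effect inverse-variance weights, so the two models only diverge when there is excess between-study heterogeneity.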

Table 1 Moderators included in the analyses

Results

After removal of duplicates, 17,300 records were screened for eligibility. Following the selection process depicted in Fig. 2, a final sample of 31 articles was identified for inclusion in this review. Twenty-eight studies [13, 21, 44, 57, 73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96] used an upper-limb training intervention, one further study [97] targeted both upper-limb and cognitive function, and two studies [53, 54] targeted cognitive function alone. The pool of studies included work conducted in the UK, Korea, Spain, USA, Brazil, Israel, Sweden, Australia, and New Zealand (see Table 2). Two of the 31 articles each presented two separate studies for analysis [21, 91], providing a total of 33 independent studies. All studies used an RCT design, comparing 492 participants receiving VR (per study M = 14.9, SD = 10.9) with 479 participants receiving Conventional Therapy (CT; per study M = 14.5, SD = 11.4).

Fig. 2

Four-phase Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram, showing the process for identifying and screening of the articles for inclusion and exclusion in the systematic literature review and meta-analysis

Table 2 Characteristics of the included studies

Participant characteristics

Sample sizes ranged from 4 to 62 participants per group. Eight studies had fewer than 10 participants in the VR group [21, 57, 76, 79, 81, 83, 84, 90], while only five studies had 20 or more participants (range, 20–59) in the VR group [74, 77,78,79, 86] (see Table 2). The average age was 60.0 years (SD = 6.3 years, range 48.2–74.1). The average time post-stroke per study (based on 29 independent studies; four studies [54, 90, 91, 97] did not report time post-stroke) varied considerably, from 1.9 weeks to 427.8 weeks (M = 79.6 weeks, SD = 105.2). Seven studies [21, 44, 53, 74, 76, 86, 91] (21%) were conducted during the sub-acute (≤ 3 months post-stroke) stage (range 1.9–10.3 weeks, M = 3.86 weeks, SD = 3.23), while the remainder delivered VR interventions during the chronic (> 3 months post-stroke) stage (range 17.2–427.8 weeks, M = 127.40 weeks, SD = 132.5). Seventeen studies [13, 21, 53, 57, 74, 77, 79,80,81,82, 85, 87, 91, 92, 94] included both ischemic and hemorrhagic stroke patients, three included only ischemic stroke patients [78, 86, 93], and 11 did not report specific details about stroke type. Only three studies reported data on stroke severity: two [21, 74] used the National Institutes of Health Stroke Scale (NIHSS) and one [86] used the Canadian Neurological Scale of Stroke Severity.

VR and control interventions

Of the 33 independent studies, 19 used a VE approach and 14 evaluated a CG-based therapy (see Table 3). VE interventions involved either video capture or tabletop systems. Video capture systems required the patient to be seated in front of a wall display while grasping a sensor, as in the Reinforced Feedback in Virtual Environment system [77,78,79] and the Rehab Master game-based VE system [75, 76]. Tabletop systems involved multitouch display technologies (e.g. [92, 94]), requiring either a finger-touch response [94] or the manipulation of tangible user interfaces. CG therapies included the Wii (Nintendo [73, 74, 80, 83, 86, 91]), Xavix [83], EyeToy (PlayStation [88]), the IREX system [53, 82, 97], Xbox Kinect [81, 89], or a combination of systems [95]. All but two intervention programs (93%) took place in a hospital; one [90] was home-based, and another [92] provided rehabilitation at a local community center. Only one study [85] reported the number of repetitions per session.

Table 3 Description of the virtual rehabilitation interventions, conventional control group therapies, and additional control treatments, when applicable

All VR and CT group participants received CT. In most of the included studies, this “treatment as usual” was only described in limited terms, but typically involved aspects of either physio- or occupational therapy (see Table 3). In 21 studies, CT group participants also received additional rehabilitation interventions, to match the additional time in therapy provided to participants randomized to VR. These so-called “active” control group interventions included, for example, additional physio- and occupational therapy [83], or additional standard therapy tailored to individual needs [77, 79, 93] (see Table 3). In contrast, 12 studies utilized “passive” control groups that received no additional intervention beyond treatment as usual.

Dose and session scheduling

For all VR approaches combined, the mean overall Dose was 685 min (SD = 355, range 200–1440 min), with a mean Daily Intensity of 42 min (SD = 15, median = 30, range 20–60 min) and a mean Weekly Intensity of 153.9 min (SD = 80.38, median = 135, range 60–800 min). The mean Frequency was three sessions a week (range 1–5 sessions), and the median Duration was 18 sessions (range 4–36 sessions).
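The five implementation parameters are arithmetically related. The sketch below assumes, as the units reported above suggest, that Daily Intensity is minutes per session (one session per day), Frequency is sessions per week, Weekly Intensity is minutes per week, Duration is the total number of sessions, and Dose is total minutes of VR. The class and field names are hypothetical and are not taken from Table 1. Because a mean of products is not the product of means, the summary statistics reported above need not satisfy these identities exactly.

```python
from dataclasses import dataclass


@dataclass
class VRSchedule:
    """Hypothetical fixed VR programme; names are illustrative, not from Table 1."""

    minutes_per_session: float  # assumed meaning of "Daily Intensity"
    sessions_per_week: int      # "Frequency"
    weeks: int                  # programme length in weeks

    @property
    def duration(self) -> int:
        """Total number of sessions ("Duration")."""
        return self.sessions_per_week * self.weeks

    @property
    def weekly_intensity(self) -> float:
        """Minutes of VR per week ("Weekly Intensity")."""
        return self.minutes_per_session * self.sessions_per_week

    @property
    def dose(self) -> float:
        """Total minutes of VR therapy ("Dose")."""
        return self.minutes_per_session * self.duration


# A hypothetical programme: 45-minute sessions, three times a week, for six weeks.
plan = VRSchedule(minutes_per_session=45, sessions_per_week=3, weeks=6)
print(plan.duration, plan.weekly_intensity, plan.dose)  # 18 135 810
```

The sketch assumes a constant session length; a programme whose sessions vary in length would need Dose summed session by session.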

ICF-WHO outcomes

Twenty-seven studies reported Body Structure/Function level outcomes, with the Fugl-Meyer Assessment-Upper Extremity (FMA-UE) as the most common instrument (21 studies). An additional study [95] used the FMA-UE to classify baseline participant characteristics but did not include it as an outcome measure. Twenty-nine studies reported Activity level outcomes, most commonly using the Box and Blocks Test (seven studies), Functional Independence Measure (eight studies), and Barthel Index (six studies). Participation level outcomes were reported by five studies, most often using the Motor Activity Log (four studies). Only four studies [53, 54, 91, 97] reported data on cognitive outcomes; each of these reported multiple cognitive outcomes, all of which were included in the analyses (see Table 4).

Table 4 Outcome measures included in the data analysis
Table 5 PEDro Scale risk of bias ratings for the included studies

Risk of bias

The methodological quality of included studies was generally high (see Table 5), with an average PEDro total score of 7.06 (SD = 1.26, range 5–9). Eligibility criteria were specified in all studies, and all but one study [83] specified random allocation of participants. However, despite the restriction to RCT designs, four of the included studies [21, 80, 85, 92] were rated only fair quality, owing to the omission of concealed allocation, blinding, and intention-to-treat analyses. In addition, Egger’s intercept for all outcomes combined was 1.23 (p = 0.02, two-tailed), suggesting pronounced funnel-plot asymmetry and an increased likelihood that smaller studies tended to report larger than average effects [98]. To minimize the risk of publication bias, all reported effect size outcomes were based on a random-effects model to give more weight to larger trials [99].
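Egger's test can be sketched as an ordinary least-squares regression of each study's standardized effect (g divided by its standard error) on its precision (1 divided by the standard error); an intercept far from zero indicates funnel-plot asymmetry. This is the classic formulation and a minimal sketch only: CMA's implementation details may differ, and the example data are hypothetical. A two-tailed p value can be obtained from the t statistic with k − 2 degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), k - 2)` if SciPy is available; the sketch itself uses only the standard library.

```python
import math


def egger_test(effects: list[float], std_errors: list[float]) -> tuple[float, float, float]:
    """Classic Egger regression of standardized effect on precision.

    Returns (intercept, standard error of intercept, t statistic with k - 2 df).
    """
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three studies with matching standard errors")
    x = [1 / se for se in std_errors]                    # precision
    z = [g / se for g, se in zip(effects, std_errors)]   # standardized effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must vary across studies")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    residuals = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    s2 = sum(r**2 for r in residuals) / (k - 2)          # residual variance
    se_intercept = math.sqrt(s2 * (1 / k + x_bar**2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, se_intercept, t


# Hypothetical studies in which less precise (smaller) trials report larger effects.
g = [0.3, 0.4, 0.6, 0.7, 0.9]
se = [0.1, 0.2, 0.3, 0.4, 0.5]
print(egger_test(g, se))
```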

Main effects of VR after stroke

For all outcomes combined (see Fig. 3 and Additional file 1: Figure S1), the average effect size for VR interventions was small to medium (g = 0.46; 95% CI: 0.33–0.59, p < 0.01), with significant benefit of VR compared with CT. The overall fail-safe N was high at 439, and heterogeneity was minimal (I² = 0%), suggesting a robust finding. Both VE and CG approaches were significantly more effective than CT, with an average small effect size for CG (g = 0.33; 95% CI: 0.14–0.51, p < 0.01) and an average medium effect size for VE interventions (g = 0.58; 95% CI: 0.41–0.76, p < 0.01). Moderator analysis confirmed that the difference between VE and CG-based approaches was statistically significant [Q(1) = 3.96, p = 0.047].
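The VE versus CG comparison can be approximated from the reported subgroup results alone. This back-of-the-envelope sketch treats each subgroup's pooled g as known, with the standard error implied by its reported 95% CI, and computes the between-subgroup Q statistic (the weighted squared deviation of subgroup means from their weighted grand mean). It is an assumption-laden illustration, not CMA's procedure; the helper names are hypothetical, and the closed-form p values cover only 1 and 2 degrees of freedom.

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * z)


def chi2_sf(q: float, df: int) -> float:
    """Chi-square upper-tail p value; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(q / 2))
    if df == 2:
        return math.exp(-q / 2)
    raise NotImplementedError("use a statistics library for df > 2")


def q_between(estimates: list[float], std_errors: list[float]) -> tuple[float, float]:
    """Q statistic for differences among subgroup means (df = subgroups - 1)."""
    w = [1 / se**2 for se in std_errors]
    grand = sum(wi * g for wi, g in zip(w, estimates)) / sum(w)
    q = sum(wi * (g - grand) ** 2 for wi, g in zip(w, estimates))
    return q, chi2_sf(q, len(estimates) - 1)


# Subgroup results reported above: VE g = 0.58 (0.41-0.76), CG g = 0.33 (0.14-0.51).
q, p = q_between([0.58, 0.33], [se_from_ci(0.41, 0.76), se_from_ci(0.14, 0.51)])
print(f"Q(1) = {q:.2f}, p = {p:.3f}")
```

This gives roughly Q(1) ≈ 3.7 (p ≈ 0.05), close to but not identical with the reported Q(1) = 3.96, p = 0.047; a small gap is expected because the published CIs are rounded and CMA's subgroup procedure (for example, how τ² is handled) may differ.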

Fig. 3

Forest plot showing the main effect sizes of Virtual Rehabilitation after stroke on the motor, functional, and cognitive outcomes combined; the three levels of the International Classification of Functioning (Body Function outcomes included Fugl-Meyer Assessment-Upper Extremity and Modified Ashworth Scale; Activity outcomes included Box and Blocks Test; Participation outcomes included Motor Activity Log and Quality of Movement); and cognitive outcomes using the random-effects model. Notes: CG: Commercial Gaming; CI: Confidence Intervals; CT: Conventional Treatment; ICF: International Classification of Functioning; VE: Virtual Environment; VR: Virtual Rehabilitation

The average effect size for cognitive outcomes was small but significant (g = 0.45; 95% CI: 0.02–0.88, p = 0.04). Heterogeneity between studies was minimal (I² = 14.69%), but the fail-safe N was only 2, suggesting a tenuous finding. For upper-limb motor and functional outcomes, data were examined at each of the three ICF-WHO levels (see Fig. 3 and Additional file 2: Figure S2). Small-to-medium overall effects were observed on Body Structure/Function (g = 0.41; 95% CI: 0.28–0.55, p < 0.01) and Activity outcomes (g = 0.47; 95% CI: 0.34–0.60, p < 0.01), while the effect on Participation outcomes was non-significant (g = 0.38; 95% CI: −0.29 to 1.04, p = 0.27).
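Rosenthal's classic fail-safe N asks how many additional unpublished studies averaging a null result (z = 0) would be needed to pull the combined Stouffer z, Σz / √(k + N), down to the significance threshold. Solving for N gives N = (Σz)² / z_α² − k, which explains why a handful of modest studies, as in the cognitive analysis above, yields a very small N. This is a minimal sketch: the threshold z_α = 1.645 (one-tailed α = 0.05, as in Rosenthal's original) is an assumption, since the settings used in CMA are not reported, and the example z values are hypothetical.

```python
import math


def fail_safe_n(z_scores: list[float], z_alpha: float = 1.645) -> int:
    """Rosenthal's classic fail-safe N.

    Number of additional null-result studies (mean z = 0) needed to bring the
    combined Stouffer z, sum(z) / sqrt(k + N), down to z_alpha.
    """
    k = len(z_scores)
    n = (sum(z_scores) / z_alpha) ** 2 - k
    return max(0, math.floor(n))


# Four hypothetical studies with modest individual z values.
zs = [1.2, 0.9, 1.5, 0.8]
print(fail_safe_n(zs), fail_safe_n(zs, z_alpha=1.96))  # 3 1
```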

Moderator analysis

Moderator analysis (see Fig. 4) found no significant difference in overall outcomes between interventions that used an active or a passive control group [Q(1) = 0.05, p = 0.83], between fair- and high-quality studies [Q(1) = 0.001, p = 0.98], or between studies with small and large sample sizes [Q(1) = 0.67, p = 0.41]. Moreover, overall outcomes did not differ significantly between patients receiving VR during the sub-acute and chronic stages [Q(1) = 2.39, p = 0.12], or between interventions that focused specifically on hand function and those targeting overall upper-limb function [Q(1) = 2.82, p = 0.09].

Fig. 4

Forest plot showing the main moderator analyses of Virtual Rehabilitation outcomes after stroke using the random-effects model. Note: AR: Additional Rehabilitation; CI: Confidence Intervals; CT: Conventional Treatment; TAU: Treatment As Usual; VR: Virtual Rehabilitation

Different levels of dose (high, medium, low) did not significantly moderate the overall effect [Q(2) = 2.22, p = 0.33]. Variations in daily intensity [Q(1) = 0.16, p = 0.70], frequency [Q(2) = 0.67, p = 0.71], weekly intensity [Q(1) = 0.03, p = 0.85], and duration [Q(1) = 2.77, p = 0.10] also had no significant impact.

Meaningful comparisons between levels of severity (determined using the gold-standard FMA-UE) could not be performed, as only a single study [82] recruited a group of mild severity. The remaining (moderate-to-severe) studies clustered tightly around a mean FMA-UE score of 34.9 (SD = 8.9). When the mild-severity study [82] was excluded from the overall analysis, the overall effect for all outcomes combined remained small (g = 0.47; 95% CI: 0.34–0.60, p < 0.01), with significant benefit of VR compared with CT.

On the basis of the statistically significant advantage for VE approaches relative to CG designs, treatment effects for VE-based rehabilitation alone were also analyzed at each ICF-WHO level (see Fig. 5). There was a medium effect overall on Body Structure/Function (g = 0.54; 95% CI: 0.35–0.73; p < 0.01), and a medium to large effect on Activity (g = 0.62; 95% CI: 0.43–0.81, p < 0.01). The overall effect on Participation was unchanged, as no CG approaches examined outcomes in this ICF-WHO domain. Within-group heterogeneity was minimal for Activity (I² = 0%) and Body Function (I² = 0%) outcomes, and large for Participation outcomes (I² = 65%).

Fig. 5

Forest plot showing the main effect sizes of Virtual Environment therapy after stroke on the three levels of the International Classification of Functioning using the random-effects model. Body Function outcomes included Fugl-Meyer Assessment-Upper Extremity and Modified Ashworth Scale; Activity outcomes included Box and Blocks Test; Participation outcomes included Motor Activity Log and Quality of Movement. Note: CI: Confidence Intervals; CT: Conventional Treatment; ICF: International Classification of Functioning; VE: Virtual Environment

Follow-up data

Twelve studies also included follow-up data: six studies re-assessed outcomes four to six weeks after intervention [44, 76, 79, 86, 94, 96] and six studies re-assessed outcomes eight to 26 weeks later [74, 87, 88, 90, 93, 95]. Both CG [74, 86, 88, 95] and VE [44, 76, 79, 87, 90, 93, 94, 96] approaches, and sub-acute [44, 74, 76, 86] and chronic [79, 87, 88, 90, 93,94,95,96] populations, were represented (see Fig. 6). There was no significant difference in treatment effect [Q(2) = 0.35, p = 0.72] between the four to six week follow-up (g = 0.36, p = 0.02), the eight to 26 week follow-up (g = 0.58, p < 0.01), and the overall effect of VR observed immediately following intervention (g = 0.46, p < 0.01). Differences between CG and VE approaches were not statistically significant at either follow-up [4–6 weeks: Q(1) = 2.03, p = 0.15; 8–26 weeks: Q(1) = 0.10, p = 0.76]. Overall, small to medium effects for both Body Structure/Function and Activity level outcomes were observed at both the four to six week and the eight to 26 week follow-ups. Only three studies examined Participation outcomes at follow-up [44, 87, 90]; these effects were small and non-significant (p = 0.48), in keeping with the pre-post findings. No studies examined cognitive outcomes at follow-up. Consistent with the pre-post data analysis, treatment effects did not vary as a function of the implementation parameters (i.e. dose, daily intensity, weekly intensity, frequency, duration) or recovery stage (i.e. sub-acute vs. chronic).

Fig. 6

Forest plot showing the follow-up effects of Virtual Rehabilitation after stroke on the motor, functional, and cognitive outcomes combined using the random-effects model. Note: CI: Confidence Intervals; ICF: International Classification of Functioning

Discussion

VR is an engaging form of therapy for stroke [19] and has been suggested to enhance motor, functional, and cognitive performance [11, 19, 46, 54], whether delivered via VE [11, 46] or CG [11]. While recent reviews of VR therapy have shown improvements in upper-limb function superior to those of conventional physical therapy [11, 19, 46, 58, 62], little is known about treatment effects across all ICF-WHO levels or about how outcomes vary with different implementation parameters and design factors [19, 46], leaving uncertainty about the training protocol that affords the greatest efficacy. The aim of this study was to address these gaps in understanding by analyzing the current evidence base on VR of upper-limb and cognitive function in stroke, in a combined systematic review and meta-analysis.

Overall, the current review of 33 RCTs found that VR interventions produced a small to medium effect (g = 0.46) above and beyond conventional physical rehabilitation. Specifically, small to medium effects were observed on Body Structure/Function (g = 0.41) and Activity outcomes (g = 0.47), while Participation outcomes (g = 0.38) were highly variable (I² = 65%) and non-significant overall (p = 0.26). A small to medium effect on cognitive outcomes was also found (g = 0.41), albeit based on only four studies. Intriguingly, the effect of VR was not moderated by dose-related parameters, and no moderator effects for chronicity were evident. These results are discussed in detail below.
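Pooled estimates such as g = 0.46, and heterogeneity indices such as I² = 65%, come from a random-effects meta-analysis. The sketch below assumes the DerSimonian-Laird estimator of between-study variance (τ²), a common default; the estimator actually used by the review is not stated in this section, and the three studies in the usage example are invented.

```python
import math


def random_effects(effects, variances):
    """Pool study effect sizes with a DerSimonian-Laird random-effects model.

    effects   -- per-study effect sizes (e.g. Hedges' g)
    variances -- their sampling variances
    Returns the pooled effect, its SE and 95% CI, tau^2, Q and I^2 (%).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with one variance each")
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # method-of-moments between-study variance
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "g": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "Q": q,
        "I2": i2,
    }


if __name__ == "__main__":
    # Three made-up studies, each with sampling variance 0.04.
    result = random_effects([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
    print(f"g = {result['g']:.2f}, I2 = {result['I2']:.0f}%")
```

Here Q = 4.5 on 2 degrees of freedom, τ² = 0.05 and I² ≈ 56%. Because τ² is added to every study's sampling variance, the random-effects confidence interval is wider than a fixed-effect interval computed from the same studies.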

Overall effectiveness of virtual rehabilitation

The extent of motor recovery after conventional stroke rehabilitation is often “modest” [100, 101], with no significant advantage of one conventional approach over another [100, 102]. When compared with these conventional interventions (including occupational therapy and physiotherapy), VR produced an additional small to medium treatment effect in the current meta-analysis, above and beyond the gains of treatment as usual. The magnitude of this benefit was comparable to that shown in earlier quantitative reviews [11, 19, 46, 65] and reflects an important advance in rehabilitation outcomes. Other attempts to identify novel adjunctive therapies that boost the effects of conventional rehabilitation have been less successful. For example, a review of robotic-assisted therapy for stroke patients with upper-limb impairment [103] found no significant difference between intensive conventional therapy and robotic-assisted therapy groups in motor recovery, activities of daily living, strength, or motor control.

Virtual environment versus commercial gaming systems

The current review evaluated two main types of VR interventions: purpose-designed VE platforms, examined in 19 studies, and commercially available CG systems, examined in 14. Previous reviews have also examined the separate impact of these two types of intervention, but with too few CG studies to draw firm conclusions about relative efficacy [11, 46]. Whereas CG-based studies made up only 17% [11] and 22% [46] of the studies in the two previous major reviews, almost half (42%, 14 of 33) of the studies in the current review used CG-based interventions, suggesting a growing interest in off-the-shelf solutions.

In the current review, both VE and CG intervention types were significantly superior to conventional therapies, with medium effect sizes observed for VE platforms (g = 0.59) and small effects for CG systems (g = 0.33). This difference between VR approaches was statistically significant, and suggests that while both VE and CG systems afford good training effects overall, VE-based systems are somewhat superior [46]. This finding supports the value of customizing rehabilitation tasks according to the clinical needs and capacities of patients. Consistent with previous reviews [11, 46] the positive effect of VE approaches was observed mainly on outcomes at the Body Structure/Function and Activity levels of the ICF, which is discussed in the next section.

Virtual rehabilitation outcomes by domains of function

Over 50 different outcome measures were used by the studies included in the current review, underlining the importance of standardized classification using the ICF-WHO [58, 104]. At the Body Structure/Function and Activity levels, effect sizes for VR (VE and CG combined) were significant (g = 0.41 and 0.47, respectively). Effects at these levels were more pronounced, however, when VE systems were considered separately: g = 0.54 and 0.62 for Body Structure/Function and Activity, respectively, compared with 0.27 and 0.32 for CG systems. The results for VE approaches were comparable to previously reported effect sizes at the Body Function (g = 0.48) and Activity (g = 0.54) levels [11], which were based on outcomes from upper and lower limb interventions combined.

The current meta-analysis of RCTs (published up to June 2017) showed strong evidence of meaningful change at the Body Structure/Function and Activity levels of the ICF-WHO, unlike earlier reviews [46, 58]. First, we found significant effects at the Body Structure/Function level, whereas the earlier review by Laver and colleagues [46] showed no change on a group of “other outcomes” that were largely at this ICF-WHO level. Second, the largest effect sizes in our review were consistently at the Activity level, whereas Palma and colleagues [58] found inconclusive support and Laver et al. [46] reported relatively small effects on upper-limb function (d = 0.28). Finally, treatment effects at the Participation level were small (g = 0.38) and non-significant. Variation in the magnitude of effect across studies in our review (g ranging from −0.37 to 2.04 over five studies) may reflect problems in the assessment of participation outcomes, which is currently an imprecise science [105].
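Effect sizes in this review are expressed as Hedges' g, whereas Laver et al. reported Cohen's d; g is d multiplied by a small-sample correction factor, so the two are nearly identical in large samples. A minimal sketch for a two-arm trial follows, assuming post-intervention means and SDs with a pooled-SD denominator. How each included trial's g was derived (e.g. from post scores or change scores) is not specified in this section, and the function name and inputs are hypothetical.

```python
import math


def hedges_g(mean_vr, sd_vr, n_vr, mean_ctl, sd_ctl, n_ctl):
    """Bias-corrected standardized mean difference (Hedges' g) and its variance.

    Positive values favour the VR group when higher scores mean better function.
    """
    df = n_vr + n_ctl - 2
    sd_pooled = math.sqrt(((n_vr - 1) * sd_vr**2 + (n_ctl - 1) * sd_ctl**2) / df)
    d = (mean_vr - mean_ctl) / sd_pooled  # Cohen's d
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor J
    var_d = (n_vr + n_ctl) / (n_vr * n_ctl) + d**2 / (2 * (n_vr + n_ctl))
    return j * d, j**2 * var_d


if __name__ == "__main__":
    # Hypothetical trial: 15 participants per arm, post-treatment scores.
    g, var_g = hedges_g(10.0, 4.0, 15, 8.0, 4.0, 15)
    print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```

Here d = 0.50 and the correction shrinks it to g ≈ 0.49 (SE ≈ 0.36). The adjustment matters most for small trials, such as the 15-per-group samples typical of the studies reviewed.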

Cognitive outcomes

Although cognitive impairment is common post-stroke [16, 20, 27], and cognitive and motor systems overlap at a structural and functional level [9, 20], only four studies [53, 54, 91, 97] included in the current meta-analysis measured cognitive outcomes. The overall effect of VR on cognition was preliminary but encouraging, with a mean effect size of g = 0.45. The limited number of studies did not permit any conclusions about the superiority of either VE or CG approaches. Palma and colleagues [58] also reviewed cognitive outcomes (from four RCTs) but found no advantage for a VR approach. However, the relevance of several of their included studies was questionable. One study compared VR with a computerized cognitive rehabilitation program rather than with physical therapy [106], a second contained no identifiable cognitive outcome measures [107], and in a third the mental function under investigation was mood state rather than cognitive status [75]. The fourth study was also included in the current meta-analysis [91]. The results of the current review therefore appear more valid, and provide encouragement that VR can contribute to cognitive rehabilitation. Moving forward, researchers and clinicians are encouraged to be mindful of the inter-relationship between motor and cognitive systems [9] and the potential cognitive benefits of motor-based stroke rehabilitation using VR [25, 31, 54]. For example, a within-group study by Kizony et al. [51] found preliminary evidence of an interaction between motor and cognitive function in stroke patients undergoing VR, and a more recent study by Subramanian and colleagues [50] provided further evidence of an association between cognitive and motor recovery. The same study showed that patients’ psychological well-being can also affect motor learning using VR [50]; this factor should also be taken into account in future studies of VR in stroke.

Implementation parameters and design factors

Dose-effect relationships in VR remain inconclusive and in need of further investigation. Reviewing the literature published between 1999 and 2004, Crosbie and colleagues [60] found that VR was most commonly delivered three times per week for 1–1.5 h over a 2–4 week period (i.e. 6–18 h in total). Similarly, in their review of the literature from 2008 to 2015, Palma et al. [58] reported an average VR dose of 17.6 h for upper limb motor function rehabilitation and 13.2 h for motor activity rehabilitation. These trends continued in the current review: the average VR intervention consisted of 40 min sessions delivered three days per week for 6 weeks, a total of approximately 12 h. However, there was large variability in these implementation parameters, with protocols providing sessions of up to 60 min, up to five times per week, for as many as 36 sessions. While a higher number of repetitions and longer training times are argued to be more beneficial for motor learning [108], VR outcomes are argued not to depend exclusively on dose [46]. In the current review, moderator analysis likewise found no clear added benefit of higher doses or massed practice of VR, which may indicate a ceiling beyond which gains plateau. While the dose of rehabilitation may not be the most important factor affecting recovery [109], the average intensity, frequency and duration of VR training identified in the current review appeared to provide an effective schedule for cognitive and motor outcomes, while reducing the chance of the participant fatigue or burnout that may occur under higher intensity training.
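The dose figures above follow from multiplying session length by weekly frequency and program length. The short sketch below reproduces them; the helper name is hypothetical, and protocols with missed or variable-length sessions would need per-session logs instead of a fixed schedule.

```python
def vr_dose(minutes_per_session, sessions_per_week, weeks):
    """Return (number of sessions, total hours) for a fixed weekly schedule."""
    sessions = sessions_per_week * weeks
    hours = minutes_per_session * sessions / 60
    return sessions, hours


if __name__ == "__main__":
    schedules = {
        "current review, average": (40, 3, 6),
        "Crosbie et al., lower bound": (60, 3, 2),
        "Crosbie et al., upper bound": (90, 3, 4),
    }
    for label, schedule in schedules.items():
        sessions, hours = vr_dose(*schedule)
        print(f"{label}: {sessions} sessions, {hours:g} h")
```

This gives 18 sessions and 12 h for the average protocol in the current review, and 6 h and 18 h for the bounds reported by Crosbie and colleagues. Multiplying average session length, frequency and duration only approximates the average total dose, because the mean of a product is not in general the product of the means.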

Active versus passive control groups

There was no difference in effect sizes (g = 0.45 cf. 0.48) between interventions that used an active control group (i.e. additional conventional therapy beyond treatment as usual) and those that used a passive one (i.e. treatment as usual only). This was unexpected, as active control group designs are preferred because they are presumed to control for Hawthorne effects and other biases that arise when comparison groups are not balanced for time in therapy. However, the current findings suggest that the use of a passive control group does not inflate the effect size for the intervention group. They also suggest that the treatment strategies embedded in active control conditions may add little to the training effects usually observed for treatment as usual. This finding lends credence to studies that lack the resources to implement an active control group and instead use a treatment-as-usual comparison group, which is more often than not the case in rehabilitation research [110].

Sub-acute versus chronic stage

Moderator analysis showed that VR was effective when administered in both the sub-acute (g = 0.25) and the chronic stage (g = 0.51). However, only seven studies in the current review intervened early after stroke, and the optimal time window for delivering VR remains an issue for further study. In the chronic group, time since stroke varied widely (from 6 months to several years). Although participants with longer-term impairment remained responsive to VR treatment, early intervention is still recommended [111, 112] to address neurological changes before chronic disability ensues [101]. As particular treatment modalities are refined with advances in technology (e.g., delivery of augmented feedback), there will be unique opportunities to enhance neuroplastic changes during this critical time period [113].

Outcomes at follow-up

A third of all studies (12 of 33) included follow-up assessment [44, 74, 76, 79, 86, 87, 88, 90, 93, 94, 95, 96]. Participant retention was generally high, with only one study experiencing attrition rates over 10% at follow-up [86]. Follow-up duration was four weeks in five studies [44, 76, 79, 86, 94], six weeks in one study [96], eight weeks in two studies [90, 93], 13 weeks in three studies [74, 88, 95], and 26 weeks in one study [87]. Across all follow-up durations, the initial gains reported immediately after VR training were preserved. These findings are encouraging and suggest that a discrete period of VR can effect longer-lasting improvements in overall motor function, and in ICF-WHO Body Structure/Function and Activity level outcomes in particular. By comparison, there is accumulating evidence that early improvements after conventional rehabilitation may not be sustained long-term after stroke [114, 115]. Notably, gains were maintained regardless of VR approach (CG or VE), dosing (i.e. frequency, intensity, or duration of training), or stage of recovery (i.e. sub-acute or chronic). Surprisingly, no studies examined cognitive outcomes at follow-up, so the durability of post-training improvements in this domain remains unknown. The stability of gains over periods longer than six months has also not been explored and should be addressed in future research. Further questions include whether booster sessions or other strategies, such as activity monitoring, goal setting, or feedback systems [116], are needed to optimize stroke survivors’ longer-term outcomes after VR.

Risk of bias

To maximize the quality of evidence in this review, all included studies were Level 1b (RCTs) to Level 2b (small RCTs) according to the Centre for Evidence-Based Medicine [117]. As evaluated formally using the PEDro Scale, the quality of studies was also generally high. Not surprisingly, the only design component consistently omitted was the blinding of participants, which is difficult to achieve with novel and distinct interventions like VR [46]. One study described its methodology as a double-blind procedure [82], but while participants may have been naïve to the intended outcomes of the study, it is unlikely they were unaware of their group assignment (VR vs. passive control group). The current study did not include a search and review of unpublished (grey) literature, which could be important to account for publication bias (the file drawer effect) [118]. The review deliberately focused on published, peer-reviewed articles to ensure the high quality of included data, but a fail-safe N [119] calculation was performed to estimate the influence of missing studies and grey literature. The fail-safe N was 439; that is, about 13 missing null-result studies for every observed study would be required to nullify the overall effect of VR, further supporting its observed efficacy. With the risk of bias minimized, we are confident that VR, and in particular VE, can be recommended as a useful adjunct or alternative to conventional therapy when retraining motor and cognitive function following stroke. The ability of VR to enhance experience-dependent neuroplasticity is suggested but demands new research investigating changes at the brain level. These recommendations are discussed below.
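Fail-safe N is commonly computed with Rosenthal's method, which the sketch below assumes; this section cites [119] without stating the formula. Missing studies are taken to average a null effect (z = 0), and N is the number of such studies needed to bring the combined one-tailed Stouffer test down to p = .05. The per-study z values in the usage example are hypothetical.

```python
from statistics import NormalDist


def fail_safe_n(z_values, alpha=0.05):
    """Rosenthal's fail-safe N.

    Number of additional studies with an average z of 0 that would pull the
    Stouffer combined z, sum(z) / sqrt(k + N), down to the one-tailed
    critical value for `alpha`.
    """
    k = len(z_values)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = .05
    return max(0.0, sum(z_values) ** 2 / z_crit**2 - k)


if __name__ == "__main__":
    # Hypothetical per-study z statistics for four trials.
    print(round(fail_safe_n([2.0, 2.0, 2.0, 2.0]), 1))
    # Ratio quoted in the text: 439 missing studies for 33 observed ones.
    print(round(439 / 33, 1))
```

The first line prints 19.7 (about five null studies per observed study in this toy case); the second prints 13.3, the per-study ratio behind the "13 missing studies" figure above.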

Limitations and directions for future research

The current review did not extend to a formal investigation of active ingredients (i.e. those aspects of VR that have the most profound impact on functioning), which remains an important and unresolved issue in VR. What makes this issue particularly hard to dissect is the sheer variety of interfaces, augmented feedback, settings, and so on across studies. It is likely there are both generic and more specific effects of VR on neuroplastic change and on the process of skill learning itself. For example, novelty and engagement are critical to any rehabilitation paradigm and can be captured by a number of well-designed (game-like) VE platforms, or by popular CG systems like Wii, Kinect, and PlayStation [90]. The capability of VR to scale levels of difficulty and to provide appropriate rewards in the context of gameplay and advancement between levels is critical to CG. The use of augmented feedback (known to be important in motor learning) is one factor that will vary greatly with interface design and the type of human-computer interaction that a given system affords [36]. Componential approaches to system evaluation, which vary a single ingredient thought to predict an outcome while holding all other factors constant, will be particularly valuable in future research.

The effect of different neurological characteristics on VR rehabilitation outcomes also needs examination. Most studies in the current review used mixed samples of hemorrhagic and ischemic stroke patients; only three studies sampled ischemic stroke patients exclusively [78, 86, 93]. Some studies suggest that hemorrhagic stroke may result in more severe cognitive, motor and functional impairment than ischemic stroke [120, 121]. By comparison, other work shows that differences between stroke types are marginal across these domains [122,123,124]. Future investigations would benefit from comparison of these stroke types to test the impact of mixed cohorts. Moreover, consistent reporting of details including lesion location (e.g. Oxfordshire Community Stroke Project Classification) and hemisphere, and initial severity and symptom profiles (e.g. National Institutes of Health Stroke Scale; modified Rankin Scale), will assist in identifying the neurological characteristics of stroke more or less responsive to VR.

Many studies in the current review had small participant numbers. With an average of only 15 participants per group, a number of studies lacked sufficient statistical power to examine more than one or two outcomes [125], and were likely underpowered to examine interactions, predictors, or multivariate effects. As we recommend examining outcomes across all three levels of the ICF-WHO, including cognitive outcomes, larger studies are needed in future, with power calculations pointing to more than 20 participants per treatment arm.
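How far beyond 20 participants per arm a study must go depends on the effect size it is powered to detect. The sketch below uses the standard normal approximation for a two-sided, two-sample comparison (α = .05, 80% power); the function name is hypothetical, and the effect sizes plugged in are illustrative rather than the review's own power calculation.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-arm trial.

    Normal approximation to a two-sided, two-sample comparison of means:
    n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2, rounded up.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size**2)


if __name__ == "__main__":
    for d in (0.46, 0.8, 1.0):
        print(f"d = {d}: {n_per_arm(d)} per arm")
```

Under these assumptions, d = 1.0 needs 16 participants per arm, d = 0.8 needs 25, and d = 0.46 needs 75; an exact t-test calculation gives slightly larger numbers.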

Variation in the choice of primary outcome measure also limits comparison between studies. The VR research field should consider developing a consensus statement on evaluation research to improve consistency of measurement. For example, at the Body Structure/Function level, the Fugl-Meyer Assessment (FMA) might be considered a “gold standard” in the absence of a better tool at this point in time. At the Activity level, the Box and Block Test has been shown to correlate very highly with longer test batteries that assess skill (such as the Action Research Arm Test), and could be included as a standard, easy-to-administer measure. Because few studies reported cognitive outcomes, the current review could only report on cognition as a unified concept rather than on its more specific domains. Future studies therefore need to include well-validated assessments of cognition.

The far transfer of training effects to important aspects of daily functioning, independence, and quality of life was examined in only five of the current studies, all of which used a VE approach [13, 44, 57, 87, 90]. The overall effect size on Participation outcomes was non-significant (p = 0.26). This result mirrors an earlier review by Saposnik and Levin [19], which identified only one study reporting on (social) participation. By comparison, the review of Lohse and colleagues [11] identified a single study that reported a significant effect on participation outcomes. However, Lohse and colleagues [11] misclassified the Jebsen Taylor Hand Function Test as a Participation measure, when it is usually classified as an Activity level outcome [59, 126]. In our review, results varied widely, from a non-significant effect (g = −0.37) to a large significant effect (g = 2.04). The latter study was the only home-based intervention and involved a control group that completed their conventional rehabilitation before the study started. One or both of these unique features may explain the size of the observed effect. Overall, the efficacy of VR at the Participation level of the ICF-WHO remains inconclusive (see also [58]), and the evidence bearing on it is very limited [19, 58]. We recommend examination of far-transfer Participation outcomes as standard practice.

Implications for practice

Knowledge of the pattern of treatment effects across ICF levels has important implications for the design of tailored interventions for stroke and for evidence-based recommendations for care. Stronger effects for VE-based systems than for CG suggest that the added expense of acquiring purpose-designed systems might be a good investment for clinicians, provided those systems are backed by well-controlled evaluation studies. However, at this point, we still do not have sufficient data to make strong predictions about the (far-transfer) effects of such training on Participation. Where the cost of and access to VE systems are an issue, CG systems can still improve outcomes at the Body Structure/Function and Activity levels.

There are too few data yet to draw firm conclusions about the impact of VR on cognition. However, there are a number of examples where cognitive performance has been enhanced through what are essentially motor-based interventions for the upper limbs. For example, in patients with traumatic brain injury (TBI), Mumford and colleagues [38] showed that VR produced a significant subjective improvement in attention and memory function.

Taken together, clinicians and researchers alike are encouraged to seek out purpose-designed VE systems that can boast high-quality evidence for their efficacy. The principled and evidence-based approach to the design, implementation, and evaluation of VE instruments confers a considerable therapeutic advantage at the level of functional movement skill. As a matter of course, future research needs to extend the evaluation of outcomes across all ICF-WHO levels.

As mentioned, the moderator analysis did not detect a linear dose-response relationship: no advantage of higher dosing was found for any of the VR approaches or rehabilitation outcomes. Future studies should seek to identify the active ingredients of VR, to maximize both the efficacy and the efficiency of treatment rather than simply relying on higher doses. Implications for patient engagement, retention, and satisfaction remain to be explored.

Conclusion

The physical and cognitive impairment resulting from stroke is persistent and prominent, and the prospect of recovery both compelling and elusive. VR interventions offer the unique opportunity for patients to interact in an enriched environment, providing structured, scalable training opportunities augmented by multi-sensory feedback to enhance skill learning and neuroplasticity through repeated practice. Findings from this review suggest that VR has an added advantage over conventional interventions, and can produce immediate and longer-term improvements in motor function and in the performance of cognitive and motor activities following stroke. The evidence-based efficacy of a VR approach extends to patients in both the sub-acute and chronic recovery stages, using a spaced training schedule delivered via either purpose-designed or commercially available systems. Continued application of this promising technology is encouraged, to refine our understanding of the factors contributing to the beneficial effects of VR and to promote the transfer of gains to participation outcomes.