Study design and participants
Campbell and colleagues recommend an iterative phased approach starting with exploratory trials (phase II studies) before conducting definitive randomized controlled trials (phase III studies). This phase II exploratory study used a single-arm pre-post testing design. From March to May 2017, potential participants were recruited through public advertisements in local newspapers and from the pensioner community ETH Zurich (PVETH, Switzerland). Assessments and intervention were performed at ETH Hönggerberg (Zurich, Switzerland). Measurements were conducted before (June 2017) and after (September 2017) the intervention period. In addition, a between-measurement consisting of two questionnaires was performed after the first week of training. Before the intervention period started, participants wore an activity monitoring device (StepWatch) for 1 week. The ETH Zurich Ethics Committee (Zurich, Switzerland) granted ethical approval (protocol number EK 2017-N-06). All participants were fully informed prior to participation and signed an informed consent form according to the Declaration of Helsinki before any measurement was conducted.
The potential participants were screened using the Montreal Cognitive Assessment (MOCA) to assess cognitive status. Furthermore, the participants completed a health questionnaire including anthropometric data and questions about their health, medical history and physical activity level. Participants fulfilling all of the following inclusion criteria were eligible for the study: (1) age ≥ 65 years, (2) living independently, (3) healthy (self-reported), (4) able to walk at least 20 m with or without walking aids. Participants exhibiting at least one of the following criteria were excluded from the study: (1) mobility impairments that prevent training participation, (2) severe and uncontrolled health problems (e.g. recent cardiac infarction, uncontrolled diabetes or hypertension), (3) orthopaedic disease that prevents training participation, (4) neurological disease (e.g. history of stroke or epilepsy, Parkinson’s disease), (5) Alzheimer’s disease or other forms of dementia, (6) acute severe, rapidly progressive or terminal illness, (7) cognitive impairments (MOCA < 26 points), (8) intake of any psychoactive substances (e.g. neuroleptics, antidepressants), (9) high alcohol, caffeine or nicotine consumption. The minimal intended study sample size of 20 participants was based on previously conducted feasibility and usability studies [35, 36] and on a practitioner’s guide.
From June to September 2017, the participants performed three training sessions per week for 7 weeks, resulting in a maximum of 21 training sessions. The training sessions were scheduled individually from Monday to Friday with a guideline of no more than one training session per day. The 21 training sessions were distributed over a period of 7 to 9 weeks, as a holiday interruption of up to 2 weeks was allowed in between. Each session consisted of 40 min training with the newly developed Active@Home exergame prototype including Tai Chi-inspired training (20–30 min) and dance exercises (10 min). Tai Chi-inspired exercises were a combination of lower-limb and core strength exercises and Tai Chi elements in three different stance positions (squat, plié, and lunge). Tai Chi-like movements were used as this ancient Chinese physical activity is often performed in a semi-squat posture, placing load on the lower limbs and core muscles. These muscles are important for functional movements such as walking [39, 40] and are positively influenced by Tai Chi training in the elderly. Besides increasing muscle strength, Tai Chi has been shown to enhance balance and coordinative skills as well as cognitive functions; the latter may be due to the cognitively demanding exercises [38, 42, 43]. To ensure optimal training effects, muscle loading recommendations for older adults were applied to the Tai Chi-inspired exercises (e.g. time under tension of 6 s per repetition, a rest of 4 s between repetitions, 7–9 repetitions per set, a rest of 60 s between sets, a training volume of 2–3 sets per exercise). Additionally, dance exercises were included in the Active@Home exergame. Dancing exercises were based on common dances such as Bachata, Disco Fox, Salsa, Waltz, Cha-Cha-Cha, and Jive and, in general, require motor components of balance, coordination, and agility, but also cognitive resources [45,46,47].
Dancing and the execution of rapid and well-directed steps have been shown to improve balance, coordinative skills, endurance and cognitive functions [48,49,50,51,52,53]. Both Tai Chi and dancing are “holistic” and task-oriented physical activities [54, 55]. The exercises were accentuated with background music.
The exergame prototype implemented some basic training principles, such as a feedback system with a real-time colour code for performance (red for bad performance, orange for moderate performance, green for good performance) and performance scores during and after each exercise. To ensure optimal challenge (optimal load of task demands) and increasing difficulty (progression), several difficulty levels for Tai Chi-inspired and dance exercises were developed. Progression was reached through more complex movements in the Tai Chi-inspired exercises (e.g. additional arm movements, upper body rotations, increased range of motion, longer time in unstable position) and through additional weights (e.g. filled water bottles), while faster and more complex motion sequences were performed in dance exercises.
The game story was about travelling through Europe and training in several different European cities. To demonstrate the exercises, a virtual instructor was used. The game interface was presented on a TV screen connected to a laptop running the exergame software. For movement evaluation, the participants wore four inertial measurement units (IMUs) providing both accelerometer and gyroscope assessments. The IMUs were connected via Bluetooth to the laptop and attached to participants’ wrists and ankles with Velcro straps. Figure 1 shows the training set-up. Participants trained alone in the laboratory at ETH Hönggerberg (Zurich, Switzerland) wearing comfortable sports clothes and shoes. Two postgraduate students supported the participants and systematically observed them throughout the intervention. Furthermore, they ensured that the training principles of optimal load and progression were applied [57, 58]. Training intensity was individually adapted to target a moderate to vigorous training level [59, 60]. Intervention characteristics such as frequency, duration and training intensity were based on recommendations for fall prevention in the elderly [44, 59,60,61] and on studies showing positive training effects of exergame training in older adults.
Usability of the newly developed exergame prototype was evaluated using quantitative and qualitative assessments. A mixed-method approach was chosen, similar to other studies that evaluated the usability of exergames. Questionnaires were completed by participants after three training sessions (between-measurement) and after the intervention period (post-measurement).
The System Usability Scale (SUS) includes 10 items rated on a 5-point Likert scale (0 = “strongly disagree” to 4 = “strongly agree”) and is a validated and reliable scale for evaluating subjective usability of newly developed devices and systems [63, 64]. The sum of all item scores was multiplied by 2.5 to obtain the SUS score, which ranges from 0 to 100, with higher scores indicating better usability. Based on the verbal categorization rate of Bangor, we expected a SUS score ≥ 70 for an “acceptable system”. An additional question was added at the end of the SUS, asking participants about their general opinion of the Active@Home exergame. This question was also rated on a 5-point Likert scale (0 = “I don’t like it” to 4 = “I like it a lot”) and the mean was calculated over all participants.
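The SUS scoring rule described above can be illustrated with a minimal Python sketch. It assumes that the ten item scores have already been recoded so that a higher value always reflects better usability, a detail the text does not spell out; the function name and the example ratings are hypothetical.

```python
ACCEPTABLE_THRESHOLD = 70  # expected minimum score for an "acceptable system"


def sus_score(item_scores):
    """Return the SUS score (0-100) from ten item scores on a 0-4 scale.

    Assumption: negatively worded items are already recoded, so each
    score counts towards better usability.
    """
    if len(item_scores) != 10:
        raise ValueError("SUS requires exactly 10 item scores")
    if any(not 0 <= s <= 4 for s in item_scores):
        raise ValueError("each item score must lie between 0 and 4")
    return sum(item_scores) * 2.5


# Hypothetical ratings of one participant (sum = 30).
example_ratings = [3, 3, 4, 2, 3, 3, 4, 3, 2, 3]
example_score = sus_score(example_ratings)
is_acceptable = example_score >= ACCEPTABLE_THRESHOLD
```

With these hypothetical ratings the score is 75, which would fall above the expected threshold of 70.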
The Game Experience Questionnaire (GEQ) assessed several categories of subjective game experience (competence, immersion, flow, tension, challenge, negative affect, positive affect) [31, 66], and includes in total 42 items rated on a 5-point Likert scale (0 = “not at all” to 4 = “extremely”). Competence implies feelings of being successful, strong or skilful in the game. Immersion includes the interest and pleasure of a player in the game. Flow summarizes the feelings of being deeply concentrated and absorbed, forgetting time and losing connection to the world outside the game. Tension includes feelings of annoyance, frustration and pressure. Challenge implies feelings of being stimulated and challenged. Negative affect summarizes feelings related to a bad mood and boredom, whereas positive affect includes feelings of happiness and enjoyment. The GEQ was analysed by calculating the average rating for each of the seven categories. Two categories involved only negatively coded items (tension and negative affect). These two categories were reverse evaluated.
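The category-wise GEQ analysis can be sketched as follows. This is only an illustration under two assumptions that are not stated in the text: that "reverse evaluated" means the category mean is subtracted from the scale maximum of 4, and that the ratings are already grouped by category (the actual item-to-category mapping is not reproduced here). All names and example values are hypothetical.

```python
from statistics import mean

MAX_RATING = 4  # upper end of the 0-4 Likert scale
REVERSED_CATEGORIES = {"tension", "negative affect"}


def geq_category_scores(ratings_by_category):
    """Average the item ratings per category and reverse the two
    negatively coded categories (assumed rule: MAX_RATING - mean)."""
    scores = {}
    for category, ratings in ratings_by_category.items():
        average = mean(ratings)
        if category in REVERSED_CATEGORIES:
            average = MAX_RATING - average
        scores[category] = average
    return scores


# Hypothetical ratings for two of the seven categories.
example = {"competence": [3, 4, 2], "tension": [0, 1, 2]}
example_scores = geq_category_scores(example)
```

Under this reading, a high value is favourable in every category, which makes the seven category scores directly comparable.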
Training observation and feedback
The usability protocol was structured in six categories: (1) functionality and interaction with the system, (2) IMUs, (3) design, (4) training principles, (5) exercises, and (6) emotions. It was filled in by the supervisors observing the participants during their training sessions. The participants were requested to “think aloud” and mention all thoughts that came to their mind while using the exergame. Furthermore, the protocol included general feedback from participants. The collected observations and statements were separated into positive and negative aspects for each category.
An attendance protocol, filled in by the supervisors after each training session, was used to record the number of visited training sessions. The adherence rate was calculated as the number of visited training sessions expressed as a percentage of the maximum possible training sessions [36, 69]. A 70% attendance rate (15 visited out of 21 total training sessions planned) was considered “being adherent” to the training program [69, 70]. For attrition, the number of participants lost during the trial was recorded (drop-outs) and calculated as a percentage of the total sample size. Considering the median rate for attrition in fall prevention interventions for clinical trials, a 10% attrition rate (two drop-outs) was regarded as acceptable. Drop-outs were not considered in the calculation of the adherence rate. Reasons for non-adherence and drop-outs were recorded when given by the participants, with a special interest in non-compliance related to usability issues.
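The adherence and attrition calculations described above amount to simple percentages, as in the following minimal Python sketch. The function names are hypothetical; the constants (21 planned sessions, 70% adherence threshold) are taken from the text.

```python
MAX_SESSIONS = 21            # planned training sessions per participant
ADHERENCE_THRESHOLD = 70.0   # percent of sessions needed to count as adherent


def adherence_rate(visited_sessions, max_sessions=MAX_SESSIONS):
    """Visited sessions as a percentage of the maximum possible sessions."""
    return 100.0 * visited_sessions / max_sessions


def is_adherent(visited_sessions, max_sessions=MAX_SESSIONS):
    """True if the participant reached the 70% attendance threshold."""
    return adherence_rate(visited_sessions, max_sessions) >= ADHERENCE_THRESHOLD


def attrition_rate(drop_outs, total_sample_size):
    """Drop-outs as a percentage of the total sample size."""
    return 100.0 * drop_outs / total_sample_size
```

For example, 15 of 21 sessions corresponds to roughly 71% and therefore counts as adherent, whereas 14 of 21 (roughly 67%) does not; two drop-outs in a sample of 20 give an attrition rate of 10%.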
As secondary outcomes, physical and cognitive functions as well as cortical activity were measured before and after the intervention period (pre-measurement and post-measurement, respectively).
Parameters of gait kinematics were assessed using the Physilog5 IMU (Gait Up Sàrl, Lausanne, Switzerland), which has been shown to reliably measure gait performance. The Physilog5 IMUs were fixed to the top of the right and left forefoot of participants using elastic straps. A USB port allows data transfer to the computer for further data analysis. A walking protocol involving at least 50 gait cycles was used. Participants walked a straight distance of 80 m under two conditions: (1) single-task condition (ST): participants were instructed to walk at preferred speed without talking; (2) dual-task condition (DT): participants had to walk at preferred speed and simultaneously count backwards (cognitive task) in steps of seven from a randomly given number between 200 and 250. In this condition, participants were asked to perform both tasks concurrently and not to prioritise one task at the cost of the other. This is a common method to measure multitasking capabilities [73, 74]. Two walking steps for initiation and termination were discarded in order to analyse steady-state walking. Speed [m/s], cadence [steps/min], stride length [m], and minimal toe clearance [cm] were evaluated and expressed as mean values of both legs in the two walking conditions. For each parameter, the dual-task cost (DTC) of walking was calculated as a percentage of loss of the DT relative to the ST condition according to the formula: DTC [%] = (ST – DT)/ST × 100.
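As a worked illustration of the DTC formula, the following Python sketch computes the dual-task cost for a single gait parameter. The function name and the example gait speeds are hypothetical and are not study data.

```python
def dual_task_cost(single_task_value, dual_task_value):
    """Dual-task cost in percent: (ST - DT) / ST * 100.

    A positive value indicates a loss under the dual-task condition
    relative to the single-task condition.
    """
    if single_task_value == 0:
        raise ValueError("single-task value must be non-zero")
    return (single_task_value - dual_task_value) / single_task_value * 100.0


# Hypothetical gait speeds in m/s (ST = 1.25, DT = 1.00).
example_dtc = dual_task_cost(1.25, 1.00)
```

In this hypothetical case, walking 0.25 m/s slower while counting backwards corresponds to a dual-task cost of about 20%. A negative DTC would indicate that the parameter increased under the dual-task condition.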
To assess lower extremity functioning, the Short Physical Performance Battery (SPPB) was applied [77, 78]. A maximum of 12 points can be achieved, where a low score is associated with a higher risk of falls. The SPPB includes a balance test, a 4 m-walk test and a 5-chair rises test (a maximum of 4 points for each subtest). Details about the protocol can be found elsewhere. In line with Eggenberger et al., we extended the balance test with two additional tasks to avoid ceiling effects. The first additional task was a 20 s single-leg stance (with preferred leg) where two points were achieved for reaching 20 s, one point for 10–20 s and zero points for < 10 s. The second additional task was a single-leg stance with eyes closed (with preferred leg) where one point was assigned for every 5 s of successful task achievement. For the extended version of the SPPB, the maximum point score is unlimited. For the analysis, the total score of the extended SPPB was calculated as well as the score for each subtest (balance score, 4 m-gait score, 5-chair rises score). For pre- and post-measurement comparison, time measures of the 4 m-gait test and the 5-chair rises test were also evaluated.
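The scoring of the two additional balance tasks can be sketched in Python as follows. The sketch assumes that a stance of exactly 10 s earns one point and that only completed 5 s blocks count in the eyes-closed task; the text does not specify these boundary cases. All function names are hypothetical.

```python
def eyes_open_points(stance_seconds):
    """Single-leg stance, eyes open: 2 points for reaching 20 s,
    1 point for 10 s up to 20 s, 0 points below 10 s."""
    if stance_seconds >= 20:
        return 2
    if stance_seconds >= 10:
        return 1
    return 0


def eyes_closed_points(stance_seconds):
    """Single-leg stance, eyes closed: 1 point per completed 5 s."""
    return int(stance_seconds // 5)


def extended_balance_score(standard_points, eyes_open_s, eyes_closed_s):
    """Standard SPPB balance points (0-4) plus both additional tasks."""
    if not 0 <= standard_points <= 4:
        raise ValueError("standard balance score must lie between 0 and 4")
    return (standard_points
            + eyes_open_points(eyes_open_s)
            + eyes_closed_points(eyes_closed_s))
```

For instance, a participant with the full 4 standard balance points, a 20 s stance with eyes open and a 12 s stance with eyes closed would reach 4 + 2 + 2 = 8 balance points. Because the eyes-closed task has no time limit in this scheme, the extended score has no upper bound.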
Higher cognitive functions such as working memory, divided and selective attention, inhibition and mental flexibility were assessed using four tests of the computerized test battery Test of Attentional Performance (TAP, D-TAP 2.3 VL, PSYTEST, Psychologische Testsysteme, Herzogenrath, Germany). The TAP is a valid assessment of different attentional and executive functions. The following tests were performed on a computer using two answer buttons: Working memory (difficulty level 3), Divided attention, GoNogo (1 out of 2), Set-shifting (alternating letters and numbers). Each test was preceded by a short familiarization session. Details about this protocol can be found elsewhere. For each of the four tests, reaction times [ms], number of errors and omissions were analysed.
Cortical activity and analysis
In order to assess cortical oscillatory activity, 5 minutes of resting state electroencephalography (EEG) were recorded at 500 Hz sampling rate, using a 20-channel dry-electrode cap (ENOBIO 20, Neuroelectrics, Barcelona, Spain) placed according to the international 10–20 system and referenced using the Driven-Right-Leg (DRL) / Common Mode Sense (CMS) technique (two external electrodes placed on either side of the left earlobe with an ear-clip). Before electrode placement on the forehead and earlobe, the skin was prepared with abrasive paste (H + H Medizinprodukte GbR, Münster, Germany).
EEG data analysis was performed using custom scripts written in MATLAB R2017b (The Mathworks, Natick, Massachusetts, USA) and using the EEGLAB 14.1.0b open source toolbox. EEG data were first high-pass filtered [zero-phase Hamming windowed sinc FIR, cut-off frequency (− 6 dB) 0.5 Hz, passband edge 1 Hz, transition bandwidth 1 Hz, order 1651] and subsequently low-pass filtered [zero-phase Hamming windowed sinc FIR, cut-off frequency (− 6 dB) 45 Hz, passband edge 40 Hz, transition bandwidth 10 Hz, order 167]. Further analysis was restricted to seven parieto-occipital EEG electrodes (Pz, P3/4, P7/8, and O1/2), since this cortical area is widely used to detect individual alpha frequency (IAF) reliably [83, 84]. Channel rejection was performed using the automatic procedure supplied by the clean_rawdata EEGLAB extension: a channel was rejected if its correlation with a reconstruction of it based on the other channels, in a given time window, was less than 0.4, or if it was flat for more than 5 seconds. On average, ~ 95% of the parieto-occipital channels in the pre-measurement EEG recordings remained for further analysis (σ: ~ 10%; range: ~ 71–100%) and ~ 92% (σ: ~ 9%; range: ~ 71–100%) in the post-measurement EEG recordings. Artefactual data points were rejected if their amplitude exceeded ±75 μV within a 500 ms time window as detected by the trimOutlier EEGLAB plugin. On average, ~ 6% of data was rejected in the pre-measurement EEG recordings (σ: ~ 9%; range: ~ 0–30%) and ~ 8% (σ: ~ 14%; range: ~ 0–48%) in the post-measurement EEG recordings. Afterwards, two IAF measures were estimated: peak alpha frequency (PAF) and center of gravity (CoG), by means of the restingIAF v1.0 open source package available from https://github.com/corcorana/restingIAF.
This allowed a fully automatic and reliable strategy to determine IAF estimates during resting state EEG recordings, of which a more detailed and extensive description can be found elsewhere [84, 85]. Briefly, one-sided channel-wise power spectral density (PSD) was first calculated in the 1–40 Hz frequency range by Welch’s modified periodogram method, using a 2048-sample (~ 4 s) Hamming window (50% overlap) across segments (frequency resolution = 0.244 Hz) and normalized by dividing each PSD channel estimate (within the passband) by the mean spectral power. Then, each PSD estimate was smoothed using a Savitzky-Golay filter with a frame length of 11 frequency bins and a polynomial degree of five. From the smoothed PSD and within an a priori defined alpha frequency band (7–13 Hz), evident frequency peaks were detected and IAF estimates from spectral peaks’ boundaries were computed. Detecting spectral peaks via the first derivative seemingly yields more accurate estimates than simply searching for maximal values within a predefined alpha frequency band. Finally, IAF estimates were computed by averaging the obtained spectral peak estimates across channels. The minimum number of valid channels necessary to estimate IAF was set to one, given the relatively low-density parieto-occipital EEG channels used for this analysis. Additionally, spectral power within the alpha frequency band was calculated by averaging, in each participant, the PSD estimates of all included channels, and then summing the resulting channel-mean power across the alpha frequency band. Alpha spectral bandwidth was defined as the individual PAF ±2 Hz.
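Two of the quantities above can be checked with a short Python sketch: the frequency resolution implied by a 2048-sample window at 500 Hz, and the alpha band power obtained by averaging the PSD across channels and summing over PAF ± 2 Hz. This is not the restingIAF implementation; the function names are hypothetical and the flat PSD is synthetic, used only to make the bin counting visible.

```python
from statistics import mean

SAMPLING_RATE_HZ = 500.0
WINDOW_SAMPLES = 2048


def frequency_resolution(fs_hz=SAMPLING_RATE_HZ, window_samples=WINDOW_SAMPLES):
    """Spacing of PSD frequency bins: sampling rate / window length."""
    return fs_hz / window_samples


def alpha_band_power(freqs_hz, channel_psds, paf_hz, half_width_hz=2.0):
    """Average the PSD across channels, then sum the channel-mean power
    over all bins within paf_hz +/- half_width_hz."""
    mean_psd = [mean(values) for values in zip(*channel_psds)]
    return sum(power for freq, power in zip(freqs_hz, mean_psd)
               if abs(freq - paf_hz) <= half_width_hz)


resolution = frequency_resolution()              # 500 / 2048 Hz
freqs = [k * resolution for k in range(165)]     # bins from 0 to about 40 Hz
flat_psd = [1.0] * len(freqs)                    # synthetic, unit power per bin
example_power = alpha_band_power(freqs, [flat_psd, flat_psd], paf_hz=10.0)
```

The resolution evaluates to about 0.244 Hz, in agreement with the value reported above. With the synthetic flat spectrum and a PAF of 10 Hz, the summed power simply equals the number of bins falling between 8 and 12 Hz.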
Other outcome measures
The participants wore a StepWatch (Orthocare Innovations LLC, Edmonds, Washington, USA) for 1 week before the intervention period started to assess their daily physical activity behaviour. The StepWatch recorded every step and calculated the number of steps for each day. The mean of 7 days was used as a baseline characteristic. Furthermore, the participants rated their current training motivation on a Visual Analog Scale (1 = unmotivated lethargic smiley to 5 = motivated happy smiley) before every training session. After each training session, the participants estimated their perceived exertion on the Borg scale from 6 to 20 (6 = “less than very light”, 20 = “more than very hard”) for Tai Chi-inspired and dance exercises, respectively.
For all statistical analyses, SPSS 23.0 for Windows (SPSS Inc., Chicago, Illinois, USA) was used. Descriptive statistics were generated for all variables. Following a conservative approach and due to non-normality of some of the data, confirmed by both Shapiro-Wilk test and Q-Q-plots, non-parametric testing was applied. Intragroup differences between the two measurements were analysed by Wilcoxon signed-rank test. A significance level of α = 0.05 was applied. Correlational effect sizes (r), according to the equation r = z/√(n1 + n2) with n1 = n at pre-measurement and n2 = n at post-measurement, were calculated in MS Office Excel (version 2016) and reported according to Cohen: an effect size of r = 0.10 indicates a small effect, r = 0.30 a medium effect, and r ≥ 0.50 a large effect. For pre- and post-measurement comparisons, drop-outs were excluded from analysis (per-protocol analysis). An intention-to-treat analysis was not performed because the drop-out reasons were clearly described. Moreover, only participants who reached 70% of the maximal possible training sessions were included in the pre-post-comparison.
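The effect size calculation can be illustrated with a minimal Python sketch. The z value and sample sizes in the example are hypothetical, and the label "negligible" for values below 0.10 is an assumption added here for completeness; the text only defines small, medium and large effects.

```python
import math


def effect_size_r(z_value, n_pre, n_post):
    """Correlational effect size r = z / sqrt(n1 + n2)."""
    return z_value / math.sqrt(n_pre + n_post)


def cohen_label(r_value):
    """Classify |r| using the thresholds 0.10, 0.30 and 0.50."""
    magnitude = abs(r_value)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "negligible"  # assumed label, not defined in the text


# Hypothetical Wilcoxon z statistic with 16 participants at both time points.
example_r = effect_size_r(-2.0, 16, 16)
example_label = cohen_label(example_r)
```

In this hypothetical case, r is about -0.35 and would be reported as a medium effect. The sign of r follows the sign of z; only its magnitude is used for the classification.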