Effects of age simulation suits on psychological and physical outcomes: a systematic review

Age simulation suits (ASS) are widely used to simulate sensory and physical restrictions that typically occur as people age. This review has two objectives: first, we synthesize the current research on ASS in terms of the observed psychological and physical effects. Second, we analyze indicators that can be used to estimate the validity of ASS in simulating “true” ageing processes. Following the PRISMA guidelines, eight electronic databases were searched (BASE, CINAHL, Cochrane, Google Scholar, ProQuest, PsycINFO, PubMed, and Web of Science). Qualitative and quantitative studies addressing effects of ASS interventions regarding psychological outcomes (e.g., empathy, attitudes) or physical parameters (e.g., gait, balance) were included. The Mixed Methods Appraisal Tool was applied for quality assessment. Of 1890 identified citations, we included 94 for full-text screening, and 26 studies were finally examined. Publication years ranged from 2001 to 2021. Study populations were predominantly based on students in health-related disciplines. Results suggest that ASS can initiate positive effects on attitudes toward (weighted d = 0.33) and empathy for older adults (weighted d = 0.54). Physical performance was significantly reduced; however, there is only little evidence of a realistic simulation of typical ageing processes. Although positive effects of ASS are supported to some extent, more diverse study populations and high-quality controlled designs are needed. Further, validation studies examining whether the simulation indeed reflects “real” ageing are needed and should build on reference data generated by standardized geriatric assessments or adequate comparison groups of older adults. Prospero registration: 232686. Supplementary Information: The online version contains supplementary material available at 10.1007/s10433-022-00722-1.


Introduction
The application of age simulation suits (ASS) has been undergoing continuous development since the 1990s, when the automotive industry started to use the first prototypes. ASS were originally constructed to raise engineers' awareness of physical impairments, whether age-related or of other origin, when designing new cars. Concurrently in the gerontological arena, educational programs involving ASS emerged with the ambition to reduce negative attitudes toward older adults in caregiving settings (Galanos et al. 1993; Pacala et al. 1995), enhance empathy and role-taking in relation to older adults, and explore the benefit of experience-based education at large. In this paper, ASS are defined as devices that simulate both physical and sensory restrictions by using additional weights, hearing protection, and specifically designed goggles.
Framing the use of ASS more generally, the global shift toward an increasing proportion of older people brings challenges and requirements for the health care system, e.g., understanding what ageing means on various levels. More specifically, ASS come with the ambition to foster empathy and role-taking with respect to understanding daily life dominated by physical and sensory impairments. The main expectation here is an increase in positive attitudes toward older adults through age simulation (Bennett et al. 2016; Chen et al. 2015). Primary target groups are health care personnel, family caregivers, and younger age groups in general (Bowden et al. 2020). ASS meanwhile offer many different means to simulate impairments in gross and fine motor behavior, hearing, and vision that show strong associations with older age and can be seen as markers of ageing (Bergman and Rosenhall 2001). For example, various versions of goggles and hearing protectors mimic different magnitudes of impaired vision or hearing, additional weights (vest, ankle and wrist cuffs) simulate reduced stamina and physical capacity, while joint bandages are used to limit the range of motion (Allen 2018; Lauenroth et al. 2017; Scherf 2014).
Responsible Editor: Morten Wahrendorf.
Since ASS have become increasingly popular in different settings in recent years, it is fundamentally important to investigate the possibilities and limitations of such simulations thoroughly and empirically. On the one hand, no false picture of age-related limitations should be conveyed, as this might create fears and concerns about later life or negatively affect views on ageing and older adults. On the other hand, there is a lack of research that explores whether ASS allow a realistic simulation of ageing processes and whether the simulated age range corresponds to average functional abilities of older adults in the third (60-79 years) or fourth age (80+). So far, predominantly young participants have been included in ASS studies, and there is some evidence that the simulated impairments did not correspond to old age but rather to middle adulthood. A realistic simulation (i.e., reaching the average functional impairments of a 70-, 80-, or 90-year-old person) would also be important if ASS are used in the development phases of geriatric assistive devices, when the risk of falling is still high and older adults cannot be consulted for first pilot studies for practical or ethical reasons. Although there is a relatively large body of case-like reports often containing positive experiences with the application of ASS, a comprehensive systematic overview of the currently existing research on the psychological and physical outcomes of wearing an ASS is missing. A scoping review on age simulation interventions was published in 2017, but included only two studies, which were conducted among nursing students (Coelho et al. 2017). A second review article focused on the effects of ASS on attitudes, empathy, and anxiety levels among student populations only (Eost-Telling et al. 2020). A third, most recent review focused on the educational effects of ASS on person-centered care (Bowden et al. 2021).
These reviews of existing data on ASS have some shortcomings and gaps in their syntheses. First, none of the available reviews addressed outcomes related to physical functioning such as strength loss, gait parameters or balance issues. Second, none quantified and discussed the validity of simulated physical impairments as a realistic simulation in comparison with "real ageing." Third, existing reviews show limitations on included study populations. Eost-Telling et al. (2020) and Bowden et al. (2021) only included studies with participants in the healthcare sector, which limits generalizability. Fourth, they also included geriatric (medication) games, using role-playing, i.e., with focus on medication intake, meaning that some participants only acted as observers so that not everybody experienced the simulation firsthand. Further, game-based approaches did not apply a complete ASS, but typically used certain parts of the ASS set-up, thus not allowing for a full and more holistic experience.
Therefore, the first objective of the present review was to synthesize the current research examining the effect of ASS interventions on psychological as well as physical outcomes. Psychological outcomes have partly been addressed by Eost-Telling et al. (2020) and Bowden et al. (2021), but several more recent studies have not yet been included in their syntheses. In addition, unlike previous reviews, we did not focus exclusively on students from health professions, but also included studies targeting general populations of younger and middle-aged adults.
Our second objective was to analyze indicators able to estimate the validity of existing ASS in simulating typical ageing processes, i.e., by drawing on reference values of established assessments or via comparisons with the performance of older adults in the target age of the simulation.

Methods
We checked PROSPERO (https://www.crd.york.ac.uk/prospero/) for similar systematic reviews on this topic or ongoing projects. No registered review could be found. The systematic review was prospectively registered in PROSPERO (CRD42021232686, February 28, 2021) and was conducted in accordance with the PRISMA statement (Moher et al. 2009).

Search strategy
In June 2020, a literature search was conducted in seven electronic databases (BASE, CINAHL, Cochrane, ProQuest, PsycINFO, PubMed, and Web of Science) without time limits for publication years. Search terms and combinations were customized for each database as shown in the supplemental material.

Eligibility criteria
Studies were included if they (a) applied ASS to mimic physical and sensory limitations; (b) reported qualitative, quantitative, or mixed-methods outcomes regarding attitudes, understanding, or empathy toward older adults and/or assessments of physical functioning (e.g., gait, mobility, balance, strength); and (c) were published in English or German. We also included gray literature and excluded reviews, meta-analyses, comments, protocols, case reports, and conference papers/presentations. Studies simulating specific medical conditions (e.g., hemiparesis) were excluded, as we focused on typical and frequent ageing-related physical and sensory limitations. Educational board games or role-plays that concentrated on single sensory or physical restrictions and did not explicitly report an intervention for all participants were excluded. Studies that did not report any results or did not initially aim to study effects with a clear research question (e.g., evaluations of seminars) were excluded as well.

Selection
We screened all articles by title and abstract to identify potentially relevant manuscripts based on the inclusion criteria. At this level, only very obviously ineligible titles were removed. For the full-text screening, two authors (AS, LS) independently assessed 50% of the potentially eligible articles while one author (TG) independently assessed all. Disagreements were resolved through discussion and involvement of the respective uninvolved author (AS or LS). Subsequently, the first author extracted information on the study (author, title, year of publication, country of origin), study characteristics (design, methods, sample size, types and modalities of the simulation and duration of interventions), participants' characteristics (age, gender), and indices regarding self-reported psychological outcomes (i.e., empathy) and/or physical performance outcomes (i.e., gait, flexibility). If relevant data were not available, we contacted the authors of the study to request missing information.

Quality assessment and statistical analyses
To assess the quality of the selected articles, the Mixed Methods Appraisal Tool (MMAT) for systematic mixed methods reviews was used (Hong et al. 2019). Two authors (AS, LS) independently assessed 50% of the articles, while one author (TG) assessed all articles. Disagreements were resolved by discussion, with the involvement of the respective uninvolved author (AS or LS) if needed. To compare the effects of ASS interventions between studies, we calculated pre-to-post effect sizes (Cohen's d) from the indices reported or received on request (Lenhard and Lenhard 2017). Cohen's d is interpreted as follows: no effect: d = 0-0.1; small effect: d = 0.2-0.4; medium effect: d = 0.5-0.7; large effect: d ≥ 0.8 (Cohen 1988). Subsequently, we calculated pre-to-post weighted mean effect sizes for attitudes and empathy separately by weighting each effect size by the respective sample size of the study participants receiving an ASS intervention. Those weighted means also include pre-to-post differences of the intervention groups of the few (randomized) controlled studies, weighted by the number of participants in the respective intervention group. For the latter designs, we additionally calculated effect sizes for group differences (control group vs. intervention group), taking into account baseline scores (Morris 2008).

Results

Figure 1 illustrates the results of the screening process according to the PRISMA guidelines (Moher et al. 2009). A total of 1948 articles were found. After removing duplicates, 1890 abstracts were screened and 94 were included in the full-text screening. At full-text level, 68 studies were excluded for the following reasons: geriatric medication/ageing games, role plays or similar studies not using ASS (n = 27), conference contributions (n = 11), not reporting respective results (n = 7), non-academic reports (e.g., newsletters) (n = 7), language not English or German (n = 5), review articles (n = 3), unavailable after contacting the authors (n = 3).
After the quality assessment, further studies were excluded due to insufficient data to answer the two screening questions (see next section; n = 5). Finally, 26 articles were included in the synthesis. Of those, 15 studies had not been included in previous review articles.
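The screening numbers reported above can be tallied as a quick consistency check. The following is only a minimal sketch: all counts are copied from the text, while the variable names are chosen here for illustration. It shows that the seven reasons listed for the full-text stage sum to 63, and that the total of 68 exclusions is reached once the five studies excluded after the quality assessment are added.

```python
# Counts copied from the text; variable names are illustrative only.
identified = 1948
after_duplicates = 1890
full_text_screened = 94

# Exclusion reasons listed for the full-text stage.
full_text_reasons = {
    "games/role plays not using ASS": 27,
    "conference contributions": 11,
    "no respective results": 7,
    "non-academic reports": 7,
    "language not English or German": 5,
    "review articles": 3,
    "unavailable after contacting authors": 3,
}
excluded_after_quality_assessment = 5

duplicates_removed = identified - after_duplicates
listed_exclusions = sum(full_text_reasons.values())
total_excluded = listed_exclusions + excluded_after_quality_assessment
included = full_text_screened - total_excluded

print(f"duplicates removed: {duplicates_removed}")
print(f"listed full-text exclusions: {listed_exclusions}")
print(f"total excluded incl. quality assessment: {total_excluded}")
print(f"included in synthesis: {included}")
```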

Quality assessment
The MMAT (Hong et al. 2019) allows diverse study designs to be evaluated in five categories: (1) qualitative, (2) quantitative randomized, (3) quantitative non-randomized, (4) quantitative descriptive, and (5) mixed methods. The tool draws on two screening questions, (1) "Is there a clear research question?" and (2) "Do the collected data allow to address the research question?", and five additional quality criteria that vary depending on the category of study design. Results of the quality assessment revealed a heterogeneous picture of study quality. Two studies did not meet the first screening question and three studies did not meet the second screening question of the MMAT; these five studies were therefore excluded from the following synthesis. Three studies announced written or oral feedback in seminars as qualitative results, but used quantitative descriptive methods to analyze the data and were therefore evaluated in the respective category of the MMAT.
With respect to study design, we included qualitative (n = 1), quantitative randomized (n = 2), quantitative non-randomized (n = 16), quantitative descriptive (n = 2), and mixed methods studies (n = 5). One of the randomized trials also included qualitative results and was therefore assigned to the mixed methods category. Results of the MMAT indicated that eight articles met all five quality criteria of the respective design, twelve articles did not meet one criterion, and six did not meet two criteria. More specifically, the qualitative study met all relevant criteria, and the two quantitative randomized studies received good ratings, with the exception that assessors' blinding was unclear (n = 1) or not implemented (n = 1). Among the sixteen quantitative non-randomized studies, eight studies neither described nor analyzed confounders, while two studies included participants that were not suitable or representative for their target population. For both quantitative descriptive studies, it remained unclear whether the authors controlled for nonresponse bias, and in one of them, participants were not suitable or representative. Among the five mixed methods studies, one study lacked an explanation for integrating qualitative and quantitative methods and one study lacked a link between the chosen methods and their interpretation. Furthermore, two studies lacked an explanation for divergences between quantitative and qualitative results.

Study characteristics
Key characteristics of the included studies are presented in Table 1. Studies were mostly conducted in Europe (n = 11), followed by Asia (n = 7), the United States (n = 3), Turkey (n = 2), and one each in Australia, Egypt, and Iran. Publication dates ranged from 2001 to 2021, with twelve of the 26 articles published in 2020 and 2021. Sample sizes varied depending on the research method and design used. The nineteen studies collecting quantitative data with questionnaires reported the largest numbers of participants (range: N = 49-330), followed by studies on physical performance measurements (range: N = 20-178), and qualitative methods (range: N = 15-64). The majority of studies (n = 21) predominantly included participants between 20 and 30 years, because most studies were conducted with pharmacy, medicine, or nursing students and younger health care staff. The duration of the procedures, including the application of the ASS, a habituation phase (if implemented), and the execution of a diverse range of tasks under ASS conditions, ranged from 10 min (Hsu et al. 2016) to 4 h (Bowden et al. 2020). Some studies were embedded in university courses, consisting of an introduction by means of a lecture (Akpinar Söylemez et al. 2021; Jeong and Kwon 2020; Mohamed et al. 2017; Robinson and Rosher 2001; Yu and Chen 2012); others followed a workshop format (Filz 2010) including interactions with older adults (Lee and Teh 2020). Follow-up designs to study mid- to long-term effects were used in only three studies, and solely regarding psychological outcomes (Jeong et al. 2017; Jeong and Kwon 2020; Lee and Teh 2020), with follow-up periods varying between 3 weeks and 3 months. The vast majority of studies (n = 20) aimed to find starting points to enhance the quality of care, and therefore addressed empathy, attitudes, and/or understanding, as these are assumed to be critical skills for health professions. These outcomes were measured by questionnaires, qualitative interviews, or evaluations of group discussions. Another six studies examined whether ASS can simulate diverse age-related impairments and used quantitative performance measurements, e.g., heart rate to determine the physical load, geriatric assessments, gait analysis, and, in one study, cognitive tasks.

Psychological outcomes
Detailed information on study results and calculated effect sizes can be found in Tables 2 and 3. Nineteen studies measured the effects of ASS on psychological outcomes quantitatively with established or self-developed questionnaires. The following instruments were applied (alphabetical order with frequency of use in brackets): Aging Semantic Differential (ASD) (3), Attitude Toward the Older People Scale (1), [...] ((C)-FAQ) (3), Semantic Differential Scale (SD) (2), UCLA Geriatric Attitudes Test (UCLA-GA) (1), and Willingness to Care for Older People Scale (WCOP) (1). Four studies also used self-developed questionnaires. The predominant purpose of these studies was to investigate the usefulness of ASS to improve empathy and/or attitudes toward older adults and/or raise awareness regarding challenges of the ageing process among samples of younger adults. The most frequent outcome measures were attitudes (n = 12), followed by assessments of empathy and understanding (n = 9), willingness to care for (n = 2), or behavior toward older adults (n = 1). As different scales vary in their coding procedures (e.g., lower scores in ASD, SD, or MSS indicate more positive attitudes toward older adults), the term increased is used in the following to indicate more positive and the term decreased is used to indicate more negative attitudes or empathy. Hence, positive effect sizes (Cohen's d) represent an improvement within the respective construct.
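The sign convention described above can be illustrated with a short sketch. It assumes that a raw effect size has already been computed as post minus pre on the original scale; the function name `harmonize_direction` and the example values are hypothetical and not taken from any included study.

```python
def harmonize_direction(raw_d: float, lower_is_better: bool) -> float:
    """Return an effect size where a positive value always means improvement.

    raw_d is assumed to be computed as (post - pre) on the original scale.
    For scales on which lower scores indicate more positive attitudes
    (e.g., ASD, SD, or MSS according to the text), the sign is flipped.
    """
    return -raw_d if lower_is_better else raw_d


# Hypothetical example values, for illustration only.
print(harmonize_direction(-0.40, lower_is_better=True))   # score dropped on a reverse-coded scale
print(harmonize_direction(0.30, lower_is_better=False))   # score rose on a regularly coded scale
```

With this convention, both hypothetical examples end up with a positive sign, i.e., both would be described as increased attitudes in the terminology used here.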
Regarding the 12 studies that assessed attitudes toward older adults, our effect size calculations (pre-to-post) with the reported scores of the two randomized controlled trials indicated small (d = 0.36) and medium-sized positive effects (d = 0.71). Our calculations on quantitative non-randomized studies (n = 8) revealed small positive effects (n = 3; range: d = 0.34-0.46), one large positive effect (n = 1; d = 4.43), two small negative effects (n = 2; d = -0.23 and d = -0.36), and no effect (n = 2; d = -0.09 and d = 0.16). For the quantitative parts of the two mixed methods studies, our calculations indicated one large positive (d = 0.95) and one medium negative (d = -0.63) effect. Of note, two studies that initially found negative effects on attitude measures after the ASS intervention reported positive changes in a later follow-up (Jeong et al. 2017; Jeong and Kwon 2020). Overall, the weighted mean effect size for pre-to-post changes in attitudes was d = 0.33, corresponding to a small effect; detailed results for each study can be found in Tables 2 and 3. We additionally calculated effect sizes between groups for the five studies that used controlled designs (IG vs. CG, see Table 2). The weighted mean effect size for attitudes in those between-subjects designs was d = 0.29, corresponding to a small effect.
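As a minimal sketch of the sample-size weighting and the verbal interpretation used for these results: the function names are chosen here for illustration, the sample sizes in the example are hypothetical (the per-study values are in Tables 2 and 3, which are not reproduced here), and treating the lower bound of each band as the cut-off is an assumption, since the bands quoted from Cohen (1988) leave gaps (e.g., between 0.4 and 0.5).

```python
def weighted_mean_d(effects: list[tuple[float, int]]) -> float:
    """Sample-size-weighted mean of effect sizes given as (d, n) pairs."""
    total_n = sum(n for _, n in effects)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    return sum(d * n for d, n in effects) / total_n


def interpret_d(d: float) -> str:
    """Verbal label for |d|; lower band bounds used as cut-offs (assumption)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "no effect"


# d values appear in the text; the sample sizes are HYPOTHETICAL placeholders.
example = [(0.36, 60), (0.71, 40), (-0.23, 100)]
mean_d = weighted_mean_d(example)
print(round(mean_d, 3), interpret_d(mean_d))
```

Because larger samples receive more weight, a single large study with a negative effect can pull the weighted mean well below the unweighted average of the individual effect sizes.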
Regarding the outcomes concerned with empathy for older adults, our effect size calculations indicated no effect (n = 1; d = 0.12) for the randomized trial within the mixed methods design, small and medium effects (n = 3; d = 0.40, d = 0.48, d = 0.54) for the non-randomized quantitative designs, and one small and one large effect for the quantitative parts of the two mixed method studies (n = 2, d = 0.42 and d = 1.03). Two studies reported no adequate data to compute effect sizes. The weighted mean effect size from pre-to-post changes in empathy was d = 0.54, corresponding to a medium-sized effect. We additionally calculated effect sizes between groups for empathy (IG vs. CG, see Table 2) for the three studies that used controlled designs. The weighted mean effect size in those between-subjects designs was d = 0.07, corresponding to no meaningful effect.
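The between-group effect sizes reported above take baseline scores into account (Morris 2008). One estimator from that source is the pre-post-control effect size with a pooled pretest standard deviation, often written d_ppc2. The sketch below implements that estimator; whether exactly this variant was applied in the present review is an assumption, and all input values in the example are hypothetical.

```python
import math


def morris_dppc2(m_pre_t: float, m_post_t: float, sd_pre_t: float, n_t: int,
                 m_pre_c: float, m_post_c: float, sd_pre_c: float, n_c: int) -> float:
    """Pre-post-control effect size with pooled pretest SD (d_ppc2).

    Difference of the mean pre-to-post changes in the treatment (t) and
    control (c) groups, divided by the pooled pretest standard deviation
    and multiplied by a small-sample bias correction.
    """
    df = n_t + n_c - 2
    sd_pre_pooled = math.sqrt(
        ((n_t - 1) * sd_pre_t ** 2 + (n_c - 1) * sd_pre_c ** 2) / df
    )
    bias_correction = 1 - 3 / (4 * df - 1)
    change_difference = (m_post_t - m_pre_t) - (m_post_c - m_pre_c)
    return bias_correction * change_difference / sd_pre_pooled


# HYPOTHETICAL scores: intervention group improves by 5 points, control by 1.
d = morris_dppc2(50.0, 55.0, 10.0, 20, 50.0, 51.0, 10.0, 20)
print(round(d, 3))
```

Subtracting the change in the control group is what distinguishes this estimate from a plain pre-to-post effect size, which helps explain why the between-group values can be much smaller than the pre-to-post values for the same outcome.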
Of the six qualitative and mixed methods studies, four conducted semi-structured interviews or discussions in which participants could share their experiences after wearing an ASS (Bowden et al. 2020; Jeong et al. 2017; Ross et al. 2013; Sari et al. 2020). In their analysis of focus groups, Bowden et al. (2020) reported enhanced insight into the process of ageing among the participants and growing empathy for their future selves. Jeong et al. (2017) used in-depth interviews, and results indicated a better understanding of challenges due to physical and sensory impairments. Subjectively increased empathy and the feeling of having gained a better understanding of the process of ageing were reported across all ASS studies evaluating qualitative data. Beyond that, Jeong et al. (2017) reported subjectively increased willingness to care for older adults, Sari et al. (2020) reported higher awareness regarding difficulties with activities of daily living, and Ross et al. (2013) reported better understanding of specific needs of older people, e.g., fear of falling and feeling safe. Lavallière et al. (2017) reported that participants attributed perceived difficulties in completing given tasks to environmental restrictions (e.g., narrow aisles in a supermarket) rather than to the ASS. Therefore, the analysis did not indicate differentiated awareness of age-related limitations caused by the suit. Lee and Teh (2020) included a practical interaction with older adults in their polypharmacy workshop before the ASS intervention. Afterward, they used open-ended questionnaires and identified three themes: (1) "lending an ear," which meant taking more time to listen, (2) "sense of respect," meaning realizing the challenges in the lives of older adults, and (3) "understanding the emotion," which indicated the importance of empathy in healthcare.

Physical outcomes
Six studies assessed the effects of ASS on physical performance (see Table 4). Four studies used validated (geriatric) assessments (Lauenroth et al. 2017; Lavallière et al. 2017; Vieweg and Schaefer 2020; Watkins et al. 2021), and two focused on self-developed or modified established tests (Scherf 2014; Zijlstra et al. 2016). We calculated Cohen's d effect sizes for the differences with and without the ASS, indicating within-subject or pre-to-post differences (Lenhard and Lenhard 2017). Negative effect sizes represent decreased physical performance in the respective tasks, abilities, or physiological parameters. Lavallière et al. (2017) found significantly decreased performance while wearing an ASS in postural balance tests (standing on both legs with eyes open: d = -0.57; eyes closed: d = -0.99), flexibility [...]
The two remaining studies used additional physiological and subjective indicators to quantify physical load. Scherf (2014) monitored younger assembly line workers accomplishing a task (putting together automotive parts) with and without an ASS and additionally compared them with older employees without an ASS. In comparison with measures without ASS, participants' heart rate (d = -1.02), subjective physical load (d = -2.03), and completion time (d = -1.02) increased, which indicates decreased performance. Zijlstra et al. (2016) assessed heart and respiratory rate, route efficiency, and walking speed in a wayfinding task in a hospital. Findings indicated that while wearing an ASS, participants had a higher heart rate (d = -0.60) and respiratory rate (d = -0.35), and walked significantly slower (d = -0.72); no significant changes were found in route efficiency (d = -0.15).

Findings on research question 2: validity of age simulation suits regarding various age-related impairments
Five of the six studies on physical performance measures provided data that could be used for our second aim, namely to clarify if ASS are valid in terms of a realistic simulation of normative age-related performance decreases (see Table 5). To classify and compare study results, we used established reference values, if available.
We identified five studies comparing ASS physical performance data with reference data. Lauenroth et al. (2017) examined various gait variables (velocity, step length, step time, base width) and compared them between different age groups. Four younger groups (18-29, 30-39, 40-49, and 50-59 years) [...] were comparable between participants aged 40-49 years with ASS and those aged 60-69 years without ASS in the study (likewise, step length for 50-59 years with ASS was comparable to 70-85 years without ASS). The results of participants aged 40-49 years with ASS for gait velocity were also comparable to external reference values for females aged 60-69 years, but not to male reference values. Younger participants, aged 18-29 years, wearing ASS were slightly faster than reference values for 70-79-year-old males and females. For step length and step time, participants' results with ASS were still better than reference values of adults older than 70 years. Established reference values for base width were not available. Lavallière et al. (2017) assessed different physical outcomes, but without relating these to available reference values. Their reported gait velocity of younger adults (20-29 years) with ASS, measured on a 10-m walkway, corresponded to reference values for females aged 50-59 and males aged 60-69 years (Bohannon and Williams Andrews 2011).
In the study of Zijlstra et al. (2016), gait velocity was calculated from the time to complete a wayfinding task and the measured distance walked when wearing an ASS (participants' age: 20.0 ± 1.8 years). Reported results were still better than reference values of adults aged 50 years and did not correspond to the target group of adults aged 65 years and older (Bohannon and Williams Andrews 2011). Vieweg and Schaefer (2020) conducted the Functional Fitness Test (FFT) with a group of students (20-28 years) and compared their results with reference values from Rikli and Jones (1999). The included arm strength test revealed results comparable to reference values of adults aged 60-64 years. Participants' leg strength also decreased when wearing the ASS. Nevertheless, men still did better than reference values for people in their mid-50s. Results of the Timed Up and Go test (TUG) indicated a decline with ASS that was comparable to 60-64-year-old adults, which was similar for aerobic endurance (2-min stepping test). Hip flexibility with ASS was still better than normative values of adults aged 60-64 years, and shoulder flexibility was comparable to 65-69-year-old adults (male/female).
Finally, Watkins et al. (2021) conducted the Functional Reach Test (FRT) with thirty students (20-40 years). Three of them still performed too well with the ASS to be comparable to reference values of middle-aged adults, six students wearing the ASS reached normative values of 41-69-year-old adults, and 21 students reached values of 70-87-year-old adults (Long et al. 2020). For the TUG, they reported longer completion times with ASS for all participants. Thirteen participants met normative values for 60-69-year-old, five for 70-79-year-old, and one for 80-89-year-old adults.
Taking all five studies together, results indicated that ASS reduced physical performance in almost all domains, but overall not to the extent that participants wearing the ASS were comparable to older adults' reference values.
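The comparison logic used in this section, i.e., matching a performance score obtained with the ASS to the age band of the closest reference value, can be sketched as follows. The reference table is entirely HYPOTHETICAL (made-up gait velocities in m/s, not the published norms of Bohannon and Williams Andrews 2011 or any other source), and the function names are chosen here for illustration.

```python
# HYPOTHETICAL reference means for gait velocity (m/s) per age band.
# These numbers are invented for illustration and are NOT published norms.
HYPOTHETICAL_REFERENCE = {
    (20, 29): 1.40,
    (30, 39): 1.35,
    (40, 49): 1.30,
    (50, 59): 1.25,
    (60, 69): 1.20,
    (70, 79): 1.10,
    (80, 89): 1.00,
}


def closest_age_band(value: float, reference: dict) -> tuple:
    """Return the age band whose reference mean is closest to the measured value."""
    return min(reference, key=lambda band: abs(reference[band] - value))


def simulated_ageing_years(participant_age: float, band: tuple) -> float:
    """Years between the participant's age and the midpoint of the matched band."""
    midpoint = (band[0] + band[1]) / 2
    return midpoint - participant_age


# A hypothetical 25-year-old walking at 1.21 m/s while wearing an ASS.
band = closest_age_band(1.21, HYPOTHETICAL_REFERENCE)
print(band, simulated_ageing_years(25, band))
```

A nearest-value match is only one possible rule; sex-specific reference tables or confidence intervals around the norms, as used in some of the included studies, would require a more detailed lookup.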

Discussion
The primary purpose of this review was to synthesize the current research on ASS and their effects on psychological and physical performance outcomes. The second aim was to examine the validity of ASS in terms of a realistic simulation of the normative ageing process, particularly in its functional domains. Twenty-six studies with publication years ranging from 2001 to 2021 were finally included, of which twenty addressed psychological outcomes such as empathy for and attitudes toward older adults, while six focused on physical assessments. Seventeen of the included studies were published in the last 5 years, demonstrating that research on ASS has attracted considerable interest recently. Only five articles contained information that allowed an estimation of the age validity of wearing an ASS, i.e., by providing data from established assessments that we could compare with reference values of older adults.

Effects of age simulation suits: psychological outcomes
The majority of studies reported a positive effect on empathy for and attitudes toward older adults. For all studies assessing pre-to-post changes, the weighted mean effect size was d = 0.33 for attitudes and d = 0.54 for empathy. However, some of the rare studies that used controlled designs did not find meaningful differences between the control group and ASS group (Cheng et al. 2020;Lee and Teh 2020), or even negative effects on attitudes immediately after wearing an ASS (Jeong et al. 2017;Jeong and Kwon 2020;Lucchetti et al. 2017). In conclusion, the effects of wearing an ASS on psychological outcomes seem to be overall positive; still, the rather short time frames covered have to be considered. That is, only three studies assessed outcomes in follow-ups longer than three weeks (Jeong et al. 2017;Jeong and Kwon 2020;Lee and Teh 2020).
Taking a more critical look, some of the positive effects cannot be solely attributed to the ASS interventions, as similar results in control groups suggest that addressing the feeling of being older could be sufficient to improve attitudes toward older adults. As one study indicated that "placebo clothes" caused similar reactions, the mind-set of being older might have influenced participants in the same way (Cheng et al. 2020).
Some articles reported reduced positive attitudes toward older adults or decreased empathy immediately after the ASS intervention. The authors concluded that the simulation raised negative emotions, such as anxiety and fear of future physical or sensory limitations, which might lead to these effects. This finding underlines the importance of providing the opportunity to reflect on the experiences. Furthermore, the measurements largely focused on attitudes and empathy, whereas multifaceted views on one's own ageing process such as awareness of age-related gains and losses (Diehl and Wahl 2010) or ageing-related changes in stereotypes in diverse domains (Kornadt and Rothermund 2011) have not been studied yet. Similarly, research has not addressed how ASS affect broader constructs related to more general views on ageing, i.e., age stereotypes in different life domains, perceived obsolescence, or health-related risk perception. Some of the rather descriptive designs or qualitative evaluations gave the impression of not being a priori planned as a study, but rather as a post-hoc course evaluation. This may have led to a publication bias, with positive effects being more likely to be published, whereas mixed or negative results might be underrepresented. In addition, the often missing randomization and blinding of assessors, as well as the assessment of psychological outcomes prone to social desirability may have resulted in biased results. Qualitative results might be biased even more by social desirability, i.e., answering in a manner that will be viewed favorably by other students or the investigator in focus groups. However, the setting and expectation of improvements are rather obvious in most designs. In summary, the limited number of controlled studies only allows for cautious and preliminary conclusions and further research is needed.

Effects of age simulation suits: physical outcomes
Six included studies focused on a variety of performance-based measures addressing the areas of gait parameters (n = 3), flexibility (n = 3), functional mobility (n = 2), balance (n = 2), physiological changes (n = 2), strength (n = 1) and aerobic endurance (n = 1). The strongest decreases in terms of effect sizes due to wearing an ASS were found for flexibility and functional assessments, whereas smaller decreases appeared in balance tests. In most studies, established assessments such as the TUG, FFT, and gait performance were used (n = 5). Limitations with respect to accuracy (e.g., velocity measured with stopwatches) could be overcome with more advanced technical systems. Moreover, covariates such as participants' fitness level or physical activity habits should have been taken into account.
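To make the notion of an effect size for a with/without-suit comparison concrete, the following is a minimal sketch of one common way to compute Cohen's d from reported means and standard deviations. It is an assumption-laden illustration, not the procedure of this review: the function name `cohens_d`, the `higher_is_better` flag, and the use of the root mean square of the two SDs as the standardizer are choices made here, and the example numbers are hypothetical. Within-subject designs can also be standardized in other ways (for example, by the SD of the differences), which would give different values.

```python
import math


def cohens_d(mean_without: float, sd_without: float,
             mean_with: float, sd_with: float,
             higher_is_better: bool = True) -> float:
    """Standardized mean difference between the two conditions.

    Uses the root mean square of the two SDs as the standardizer, which
    equals the pooled SD when both conditions have the same n.
    A negative result means performance decreased while wearing the suit.
    """
    if sd_without <= 0 or sd_with <= 0:
        raise ValueError("standard deviations must be positive")
    pooled_sd = math.sqrt((sd_without ** 2 + sd_with ** 2) / 2)
    d = (mean_with - mean_without) / pooled_sd
    # For time-based scores a larger value is worse, so flip the sign.
    return d if higher_is_better else -d


# Hypothetical numbers for illustration only (not taken from any study):
# a timed test taking 6.0 s (SD 1.0) without and 8.0 s (SD 1.0) with the suit.
d_timed = cohens_d(6.0, 1.0, 8.0, 1.0, higher_is_better=False)
print(round(d_timed, 2))  # -2.0
```

The sign flip keeps the convention that negative values indicate decreased performance, regardless of whether the raw score is a time (lower is better) or a distance or count (higher is better).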
For future ASS studies focusing on physical performance, more complex tasks, more diverse established assessments and everyday activities might have the potential to depict age-related limitations that are often multidimensional and might not be replicated in isolated measurements. For example, motor-cognitive dual tasks, dynamic balance, or (instrumental) activities of daily living could be considered.

Validity of ASS in terms of simulating the ageing experience realistically
For our second objective, to summarize and quantify indicators that can be used to estimate validity, we were able to draw upon findings from five studies, three of which assessed gait velocity. The consideration of gait parameters offers a well-established quantification of locomotion, with the advantage that reference values are available for many parameters. Results indicated decreased performance in young and middle-aged participants, corresponding to an "instant ageing" effect of about 20-40 years when established gait assessments were compared to reference values. The extremely reduced gait velocity in one study (Zijlstra et al. 2016) was not representative of older adults. It should be considered, though, that the authors calculated gait velocity after a full wayfinding task had been completed, whereas reference values are mostly lab-based data with known limitations, but without distractions. However, reported step length and step time did not reach the levels of older reference groups, and the participants still demonstrated better performance. Overall, results indicated that performance scores of the assessments with ASS often did not correspond to age norms of adults aged 60-64 years or older, but still resembled those of younger age groups, e.g., in leg and arm strength or aerobic endurance. One explanation might be a generally good fitness level of participants, which may not be representative. Moreover, the length of the habituation phase and the length of the simulation intervention can influence physical performance and have to be considered. Still, complex assessments (TUG, BBS) demonstrated that more than 50% of participants had an increased risk of falling while wearing the ASS and that scores corresponded to an "instant ageing" of about 30-40 years. These tests are known as the gold standard for evaluating balance limitations in older adults, as impaired balance is one of the major risks for falls in older adults and therefore an important indicator of a typical ageing process (Ambrose et al. 2013). Regarding flexibility measurements, the three respective studies reported mixed findings. Some isolated flexibility measurements seemed to be overstated (e.g., neck), while others were in line with reference values of older adults (FFT shoulder and FRT overall score).
Table note: ASS, age simulation suit; n, number; s, seconds; cm, centimeters; °, degrees; min, minutes. Effect sizes were calculated as Cohen's d (Lenhard and Lenhard 2017), variant 1, and represent pre-to-post differences; positive/negative effects indicate increased/decreased performance. Lauenroth et al. (2017) was excluded from the table because it did not use a within-subjects design and is therefore not comparable.
One study assessed the Digit Symbol Test with and without an ASS and found that the performance with ASS was comparable to reference values of adults older than eighty. However, the authors assumed that a large portion of the decline was due to visual impairments rather than cognitive challenges.
In conclusion, physical performance decreases could be simulated among younger and middle-aged participants in most assessments, but predominantly not to the extent that represents adults older than 65 years, let alone the fourth age (80+). Some suppliers of ASS specify certain age ranges that should be reached with their ASS (e.g., mid-70s; AGNES ASS) or claim that users age by 30 to 40 years (e.g., GERT ASS), but those assumptions have not yet been verified with data. Our review provides first insights but points out the need for differentiation regarding the population under study with the ASS and the specific tests that are applied. The mentioned studies reinforced the attempt to use ASS to mimic typical age-related impairments, but should be recognized as a start or proof of concept.
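The "instant ageing" estimates discussed in this section rest on matching a score obtained with the suit to age-banded reference values. The sketch below shows one simple way such a match could be computed, by picking the age band whose reference mean lies closest to the observed score. It is an illustrative assumption, not the method of any included study: the names `ReferenceBand`, `closest_band` and `instant_ageing_years` are hypothetical, and the reference table contains placeholder numbers that are not published norms.

```python
from typing import NamedTuple, Sequence


class ReferenceBand(NamedTuple):
    label: str           # age band, e.g. "60-69"
    midpoint_age: float  # representative age of the band in years
    mean_value: float    # reference mean of the assessment in that band


def closest_band(observed: float,
                 bands: Sequence[ReferenceBand]) -> ReferenceBand:
    """Return the band whose reference mean is closest to the observed score."""
    if not bands:
        raise ValueError("at least one reference band is required")
    return min(bands, key=lambda band: abs(band.mean_value - observed))


def instant_ageing_years(participant_age: float, observed: float,
                         bands: Sequence[ReferenceBand]) -> float:
    """Difference between the matched band's midpoint age and the actual age."""
    return closest_band(observed, bands).midpoint_age - participant_age


# PLACEHOLDER gait velocities in m/s, invented for illustration only;
# real analyses would substitute published reference data.
PLACEHOLDER_BANDS = [
    ReferenceBand("20-39", 30.0, 1.40),
    ReferenceBand("40-59", 50.0, 1.35),
    ReferenceBand("60-69", 65.0, 1.25),
    ReferenceBand("70-79", 75.0, 1.15),
]

# A hypothetical 30-year-old walking at 1.24 m/s while wearing the suit.
print(closest_band(1.24, PLACEHOLDER_BANDS).label)          # 60-69
print(instant_ageing_years(30.0, 1.24, PLACEHOLDER_BANDS))  # 35.0
```

A nearest-mean match ignores the spread within each band, so a fuller analysis would also compare against the reference standard deviations or, as the section suggests, against an adequate comparison group of older adults.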

Strengths and limitations
Strengths of this review, and a new approach to the matter, are its focus on rather homogeneous ASS interventions and its consideration of a broad range of outcomes. This focus excluded ageing and geriatric games, in which some people only observe, so that not all participants have the chance to experience the simulation. While earlier reviews focused on psychological outcomes only (Bowden et al. 2021; Coelho et al. 2017; Eost-Telling et al. 2020), we extended the synthesis to performance-based assessments in our first objective, calculated effect sizes wherever possible and provided insights on validity estimates in our second objective. Limitations included quite large variations in method quality and study designs, and little to no information on or consideration of confounders (e.g., sociodemographic information, health status, previous experiences with older adults) in the included studies, which may reduce the reliability of results. This adds to a possible publication bias resulting in the under-representation of negative results. As only studies in English or German were included, and samples were predominantly drawn from Western, educated and industrialized populations, the generalizability of findings is also limited.

Conclusion
ASS play a prominent role in various contexts as an educational device able to evoke empathy and a better understanding of what it means to get older. Considering this, it would be highly desirable to be able to rely on robust research showing that ASS devices fulfil both aims: enhancing empathy and positive views on ageing, and doing so based on a realistic and valid simulation of the typical ageing process. Given the rapid growth of research on ASS interventions, with the large majority of the included studies published in the past 5 years, there indeed seems to be a promising development in this research area. Largely consistent with earlier reviews focusing on psychological outcomes of wearing an ASS, predominantly positive effects on attitudes and empathy toward older adults were identified, although effect sizes were not calculated in earlier reviews and showed large variation in our work. The existing research, which in some instances reports conflicting findings that sometimes point toward negative ASS effects, unfortunately does not allow for definite conclusions about the conditions under which such negative consequences are likely to occur. Identifying these conditions would be an important task for future research. Given that the awareness of ageing processes and the ability to change perspectives are important soft skills for health care professions, and that the simulation of older age might help younger adults, such as those in midlife, to better prepare for their own ageing, ASS indeed seem to be an important resource for future ageing societies on different levels. Regarding a range of key physical outcomes important for independent functioning in everyday life, large effects were identified, although this body of research is still relatively small. Therefore, research on a diversity of outcomes echoing everyday challenges, including more complex everyday tasks such as doing chores or cooking, would be an important addition.
Still, the crucial point is to simulate a range of motor-related everyday tasks in a realistic and age-valid way. Here, considering the domains of gait, functional mobility and strength, only limited evidence is available that younger participants wearing an ASS accurately simulate adults aged 65 years and older. Future research should follow robust (controlled) research designs, include follow-up measurements, and reduce the likelihood of social desirability, e.g., by using less obvious questions and drawing on anonymous questionnaires instead