Background

Concussion, also referred to as mild traumatic brain injury (mTBI), is described as a transient disturbance of brain function [1] and is a common injury in contact sports, such as rugby league [2], and in certain occupations, such as the military [3]. Concussions are caused by the transfer of energy across the brain as a result of direct (collision) or indirect (whiplash mechanism) trauma to the head and/or neck [4, 5]. Such impacts disrupt normal cellular function, resulting in an ‘energy crisis’ [4,5,6,7,8,9], with symptoms typically including headache, nausea, poor coordination, vision deficits, and behavioural abnormalities such as irritability or depressive mood states [5, 10]. Given the multiple symptoms that present following a concussion, monitoring recovery can be complex for clinicians and practitioners.

To account for the multitude of symptoms experienced, a variety of assessment tools are available to clinicians [11]. Across numerous sports, athletes diagnosed with a concussion are guided through a graduated return-to-play (RTP) process by a medical practitioner and/or rehabilitation staff. Progress through the staged RTP is primarily based upon symptom resolution at rest and during exertion, as well as a return to pre-concussion baseline for cognitive and motor scores [12,13,14,15]. Of concern, however, is the ambiguity surrounding diagnostic tools and, more specifically, the lack of evidence supporting their implementation in the latter stages of concussion management. For example, the common subjective balance assessments used by clinicians (e.g. the Balance Error Scoring System [BESS] and tandem gait) [16] may lack the resolution to detect changes in function that can linger post-concussion. There appears to be an increased risk of subsequent concussion and musculoskeletal injuries up to 12 months following sport-related concussion (SRC) [17,18,19], which may be linked to lingering motor deficits [20] and suggests that subclinical changes remain beyond RTP clearance that are poorly detected by many of the assessments readily available to clinicians [17, 19, 21]. Reliance on diagnostic tools as a means to evaluate recovery, in conjunction with the subjective nature of many clinical assessments, may explain why subtle, underlying motor changes go largely undetected [22]. Given this concern, it is important to understand how post-concussion changes in motor performance can be monitored more effectively, allowing clinicians to make decisions based on sound objective data as well as clinical judgement.

To minimise the risk of incorrect recovery diagnosis, assessments need to demonstrate clinically acceptable reliability and validity, whilst also being feasible to conduct. Reliability refers to an instrument’s ability to produce consistent measures across multiple time points, ensuring that a change in score is attributable to changes in performance rather than instrument error [23, 24]. Validity can be broken into three categories: logical, criterion, and construct [25]. For this review, only construct validity has been reported, i.e. an instrument’s ability to correctly distinguish concussed from non-concussed populations. The higher the sensitivity and specificity of an instrument, the better its ability to classify those with and those without concussion [25]. Feasibility is also vital to consider when selecting a test: the time, resources, and expertise required will influence which tests can be administered.
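As a minimal illustration of how sensitivity and specificity quantify this classification ability, the following Python sketch computes both from the four cells of a diagnostic confusion matrix. The counts used are hypothetical and not drawn from any study in this review.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity and specificity of a diagnostic test.

    tp: concussed participants correctly flagged (true positives)
    fn: concussed participants missed (false negatives)
    tn: healthy participants correctly cleared (true negatives)
    fp: healthy participants incorrectly flagged (false positives)
    """
    sensitivity = tp / (tp + fn)  # proportion of concussed cases detected
    specificity = tn / (tn + fp)  # proportion of healthy cases correctly cleared
    return sensitivity, specificity

# Hypothetical example: 40 concussed athletes (32 flagged by the test)
# and 50 healthy controls (45 correctly cleared)
sens, spec = classification_metrics(tp=32, fn=8, tn=45, fp=5)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# sensitivity = 0.80, specificity = 0.90
```

A test with high sensitivity but low specificity would flag most concussed athletes at the cost of holding back healthy ones; the reverse trade-off risks premature RTP clearance.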

Numerous lower-limb motor assessments are reported in the literature to monitor impairments following concussion, with varying time, expertise, and equipment requirements. Despite this, there is no consensus or collated source on the reliability, validity, and feasibility of these assessments, which makes it difficult for clinicians and practitioners to select the most appropriate assessment based on their needs and the time since concussion. This systematic review aims to (1) consolidate the reliability and validity of motor function assessments across the time course of concussion management and (2) summarise their feasibility for clinicians and other end-users. The purpose is to provide clinicians with evidence to support the utility and practicality of selected assessments and to identify potential gaps in the current management of concussion.

Methods

Search Strategy

This systematic review was structured in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [26] and registered on PROSPERO (reg no. CRD42021256298). Five academic databases, including SPORTDiscus, CINAHL, Web of Science, Medline, and Scopus were systematically searched from earliest record to May 17, 2023. Eligible studies were identified through searching titles, abstracts, and keywords for predetermined search terms (Table 1). References were extracted from each database and imported into a reference manager (EndNote X20, Clarivate Analytics, London, United Kingdom) before removing any duplicate articles. Subsequently, to allow simultaneous, blinded screening, articles were imported into Covidence (www.covidence.org; Melbourne, Australia), an online tool for systematic reviews. Titles and abstracts were analysed by one reviewer (LD); the full texts of the remaining studies were then assessed by two reviewers (LD and RJ). Where any conflicts arose, the two reviewers met to determine study eligibility.

Table 1 Search terms used for the review; searches 1–5 were combined with the operator ‘AND’; search 6 was combined with the operator ‘NOT’

Eligibility Criteria

Eligible studies must have (1) been original research articles; (2) been full-text articles written in the English language; (3) been peer-reviewed articles with a level of evidence equal to or greater than level III [27]; (4) assessed the validity of lower-limb motor assessments used to diagnose or determine RTP readiness for athletes or military personnel who had sustained a concussion; or (5) assessed the test–retest reliability of lower-limb motor assessments used for concussion management amongst healthy athletes. Acceptable lower-limb motor assessments were classified into four categories: static, dynamic, gait, and other. Static balance assessments included tasks in which individuals remained at a fixed point during various stances (e.g. BESS), where postural sway or number of balance errors were the outcome variables. Dynamic balance assessments included any task that required movement (e.g. limb excursion) from an individual while remaining at a fixed point (e.g. Y-balance test). Gait assessments comprised any task that required locomotion, with temporal and/or spatial parameters measured. Assessments specific to sport or military tasks were categorised as other. Assessments were further classified as non-instrumented (subjective scoring or use of basic equipment [e.g. stopwatch]) or instrumented (objective measurement [e.g. accelerometers]).

For studies to be included as reliability studies, they must have assessed the test–retest (intra-class correlation coefficient [ICC]) or inter-rater reliability of an assessment in healthy athletes. For validity, studies must have assessed the between-group differences of a lower-extremity motor task in a case–control study or shown the predictive performance of the measure in discriminating concussed from healthy participants (i.e. area under the curve [AUC], sensitivity, specificity). Reference lists from eligible studies were manually examined for any studies missed during the initial search; these studies were then screened and assessed for eligibility. Commentaries, letters, editorials, conference proceedings, case reports, conference abstracts, and non-peer-reviewed articles were excluded. Studies examining animal or biomechanical models of brain injury were also excluded from analysis.
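The test–retest criterion above relies on the ICC. As a minimal sketch, the two-way random-effects, absolute-agreement, single-measure ICC(2,1) commonly reported for test–retest designs can be computed from a two-way ANOVA decomposition; note that the choice of ICC form and the example scores below are assumptions for illustration, since the review does not specify which ICC model each included study used.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random-effects, absolute-agreement, single-measure ICC.

    scores: list of [session1, session2, ...] measurements, one row per subject.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)  # between subjects
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)  # between sessions
    sse = sum((scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    mse = sse / ((n - 1) * (k - 1))                               # residual

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical balance-error counts for five athletes tested seven days apart
scores = [[3, 4], [7, 6], [5, 5], [9, 8], [4, 4]]
print(round(icc_2_1(scores), 2))  # 0.93
```

Because ICC(2,1) penalises systematic differences between sessions (absolute agreement), it is stricter than a simple Pearson correlation between the two test days.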

Data Extraction and Analysis

Data from eligible studies were extracted into Covidence. Data pertaining to study characteristics and protocols were extracted first, followed by all relevant outcome measures (reliability and/or validity measures) from each study. Data were categorised by assessment type (e.g. static, dynamic, gait) and by relevant findings, i.e. reliability and/or validity (e.g. sensitivity, specificity). Due to the heterogeneous nature of the findings, a meta-analysis was not performed.

Quality Assessment

To assess the methodological quality and the clinician-reported outcome measurements (ClinROMs; reliability and validity), the Consensus-based Standards for the selection of health Measurement INstruments (COSMIN) Risk of Bias tool for outcome measurement instruments [28] and the COSMIN guideline on Risk of Bias for assessing the quality of studies on reliability and measurement error (i.e. the variability between repeated measures) were used [29]. The COSMIN checklists were developed to quantitatively assess the methodological quality of studies and the ClinROMs evaluated. The first step involved rating the methodological quality of each study against nine measurement properties: content validity, internal structure (structural validity, internal consistency, and cross-cultural validity), reliability, measurement error, criterion validity, hypotheses testing for construct validity, and responsiveness. Each measurement property was assessed using a four-point grading scale: very good, where the model or formula was described and matched the study design; adequate, where the model or formula was not described or did not match the study design; doubtful, where no evidence of systematic difference was provided; and inadequate, where the calculation was deemed not optimal. Overall methodological reporting quality was determined using the ‘worst score counts’ approach [28, 30]. Feasibility of an assessment tool is no longer included within COSMIN’s measurement properties, as it does not refer to the quality of an outcome measurement instrument; we instead highlighted the feasibility of each instrument by reporting the interpretability of the outcome, time to complete, and equipment and expertise required. The second step was to rate the ClinROMs from each study (validity and/or reliability values) using the COSMIN criteria for good measurement properties guideline [30]. A rating of sufficient (+), insufficient (−), or indeterminate (?) was given for each assessment’s measurement property based on the statistical outcome measures for that property [29]. Two authors (LD and RJ) independently assessed the methodological quality and measurement properties of all studies; any disagreements were discussed by these authors.
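The ‘worst score counts’ aggregation described above can be sketched as follows; this is a minimal illustration in which the rating labels follow the four-point scale described in the text, and the example ratings are hypothetical.

```python
# Four-point COSMIN grading scale, ordered from worst to best
SCALE = ["inadequate", "doubtful", "adequate", "very good"]

def worst_score_counts(ratings):
    """Overall methodological quality under the 'worst score counts' rule:
    the lowest rating across the assessed standards determines the overall rating."""
    return min(ratings, key=SCALE.index)

# A study rated 'very good' and 'adequate' on two standards but 'doubtful' on a third
print(worst_score_counts(["very good", "adequate", "doubtful"]))  # doubtful
```

The rule is deliberately conservative: a single weak standard caps the overall rating, regardless of how well the other standards are met.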

Results

Search Results

The systematic search retrieved 1270 results from five academic databases, of which 637 duplicates were removed. Titles and abstracts of the remaining 633 studies were screened, with 475 not meeting eligibility criteria. Full-text review was conducted on the remaining 158 studies, with 112 deemed ineligible. A total of 46 studies were eligible, with an additional 21 included via the manual screening of reference lists. Therefore, this review included a total of 67 studies. The identification process is outlined in Fig. 1.

Fig. 1

PRISMA flowchart depicting steps taken in the search strategy

Research Quality

The quality of research investigating the reliability and/or validity of lower-limb motor assessments for concussion management was variable, with methodological reporting quality ranging from inadequate to very good. Measurement property quality for all studies ranged from sufficient to indeterminate (see Additional file 1: Tables S1–S18).

Study Characteristics

Reliability

Studies were conducted on healthy adults (n = 29) and minors (n = 8), with a total sample size of 6888. The most common assessments were the BESS and tandem gait (instrumented and non-instrumented), each representing 15% of all assessments. A summary of study characteristics is presented in Tables 2, 3, 4 and 5. A full table of study characteristics is presented in Additional file 1: Table S1 through to Additional file 1: Table S7.

Table 2 Overview of reliability, validity and measurement error for static balance assessments used to monitor movement changes following a concussion
Table 3 Overview of reliability, validity and measurement error for dynamic motor assessments used to monitor movement changes following a concussion
Table 4 Overview of reliability, validity and measurement error for gait assessments used to monitor movement changes following a concussion
Table 5 Overview of reliability, validity and measurement error for other motor assessments used to monitor movement changes following a concussion

There were 22 different lower-limb motor assessments used across 37 different studies; 12 studies assessed the reliability of more than one test; and one study assessed reliability for adults and minors (see Additional file 1: Table S1). Assessments were categorised as static balance (n = 20 studies, 9 different assessments), dynamic balance (n = 5 studies, 4 different assessments), gait (n = 13 studies, 9 different assessments), or other (n = 1 study, 1 assessment). Studies were further subdivided based on type of reliability: test–retest (n = 34 studies, 20 different assessments) or inter-rater (n = 5 studies, 5 different assessments) and instrumented (n = 13 assessments) or non-instrumented (n = 9 assessments).

Static Balance Assessments

For static balance assessments, test–retest correlations ranged from 0.13 to 0.94, with measurement property quality ranging from doubtful to adequate. Outcome variables for non-instrumented assessments included time and number of errors. Instrumented assessments reported number of errors, centre-of-mass (COM) displacement, and centre-of-pressure (COP) displacement. Time between assessments ranged from the same day to 20 months, with a tendency for poorer reliability over longer periods. Assessments included the BESS (n = 5), instrumented BESS (n = 2), modified BESS (mBESS) [double leg, single leg, and tandem stance on firm ground] (n = 3), instrumented mBESS (n = 1), single leg stance (n = 2), instrumented single leg stance (n = 1), double leg balance using accelerometers (balance accelerometry measure [BAM]) (n = 2), double leg balance on a portable force plate (balance tracking system) (n = 1), double- and single-leg balance (SWAY balance mobile application) (n = 1), and the Sensory Organization Test (SOT) (n = 2). The BESS demonstrated sufficient reliability when conducted with one trial (ICC = 0.60–0.78); however, reliability improved when the double leg stance was removed and 2–7 trials were performed (ICC = 0.83–0.94). Instrumented BESS using a force plate and Wii Balance Board (ICC = 0.88–0.89) and the balance tracking system (ICC = 0.92) also displayed sufficient reliability over seven- and 15-day periods, respectively [31, 32]. The BESS and mBESS showed improved reliability with an increased number of trials [33]. It is imperative to note that, while studies report improved reliability with an increased number of trials, these assessments are routinely performed only once in clinical practice. In summary, a minimum of two trials of four conditions (excluding double leg variations) of the BESS displayed sufficient test–retest reliability over a seven-day period [34].
The balance tracking system utilising a force plate also displayed sufficient reliability in addition to offering clinicians more in-depth, objective analysis [31].

Dynamic Balance Assessments

For dynamic balance assessments, test–retest correlations ranged from 0.32 to 0.99, with measurement property quality ranging from doubtful to adequate. Outcome variables included time, number of errors, COM displacement, and COP displacement. Time between assessments ranged from the same day to 11 months, with a median of seven days and a tendency for poorer reliability over periods greater than 10 days. Assessments included the instrumented Y-balance test (n = 1), clinical reaction time (n = 1), instrumented limits of stability test (n = 2), and the dynamic postural stability index (DPSI) (n = 1). The most reliable assessments were the instrumented Y-balance test (ICC = 0.76–0.99), for which same-day test–retest reliability was assessed [35], and the instrumented limits of stability test (ICC = 0.95–0.96), with tests conducted seven days apart [36]. Both assessments provided clinicians with consistent objective measures across trials.

Gait Assessments

For gait assessments, test–retest correlations ranged from 0.10 to 0.99, with measurement property quality ranging from doubtful to adequate. Outcome variables for non-instrumented assessments included time or number of errors. Instrumented assessments reported COM displacement, COP displacement, and spatio-temporal metrics. Time between assessments ranged from the same day to 11 months, with a median of seven days and a tendency for poorer reliability over periods greater than two weeks. Assessments included tandem gait (n = 6), instrumented gait (n = 7), instrumented dual-task gait (n = 2), dual-task tandem gait (n = 2), instrumented dual-task tandem gait (n = 2), timed up and go (TUG) (n = 1), and walking on a balance beam (n = 1). Most gait assessments displayed sufficient test–retest reliability; however, non-instrumented assessments displayed insufficient reliability across periods greater than two months. Instrumented gait assessments (e.g. normal, tandem, and dual-task gait) utilising force plates or inertial measurement units (IMUs) were most consistent across time points extending to eight months. Outcome variables including step length, step time, and gait velocity were most reliable.

Inter-Rater Reliability

Correlations for inter-rater reliability of non-instrumented assessments performed on healthy controls ranged from 0.20 to 0.99, with measurement property quality adequate for all studies. Static balance assessments included the BESS (n = 4), which ranged from 0.20 to 0.96 when using three assessors [32, 37, 38], and the mBESS (n = 2), with reliability ranging from 0.80 to 0.83 using two and three assessors, respectively [39, 40]. Gait assessments included tandem gait (n = 1), the TUG (n = 1), and walking on a balance beam (n = 1). The TUG demonstrated the best inter-rater reliability (ICC = 0.99) between two assessors. Other assessments consisted of the military-specific run-roll-aim task (n = 1), with reliability ranging from 0.28 to 0.89 [41].

Validity

The validity of 32 different assessments was reported across 35 studies; 17 studies assessed the validity of more than one test. Assessments were categorised into static balance (n = 21 studies, 13 different assessments), dynamic balance (n = 8 studies, 8 different assessments), gait (n = 13 studies, 8 different assessments), or other (n = 3 studies, 2 different assessments), and analysed either construct validity (n = 30 studies) or known-group validity (n = 8 studies). Studies were conducted on adults (n = 24) and minors (n = 11), with a total sample size of 1417 concussed and 1616 control participants. A summary of study characteristics is presented in Tables 2, 3, 4 and 5. A full table of study characteristics is presented in Additional file 1: Tables S8 through S15.

Construct Validity

Static Balance Assessments

Outcome variables for non-instrumented static assessments included time or number of errors. Instrumented assessments reported COM displacement and COP displacement using force plates, IMUs, smartphones, or laboratory equipment. Time since concussion ranged from 24 h to eight months, with a tendency for insufficient sensitivity as time increased. Assessments included the BESS (n = 3), instrumented BESS (n = 2), BAM (n = 1), mBESS (n = 7), instrumented mBESS (n = 4), SOT (n = 3), balance tracking system (n = 1), modified clinical test of sensory interaction in balance (MCTSIB) (n = 1), instrumented MCTSIB (n = 1), Phybrata system (n = 1), and virtual reality static balance (n = 1). On average, the non-instrumented BESS and mBESS displayed sufficient sensitivity when conducted within 48 h of sustaining a concussion [42, 43]. However, sensitivity was insufficient when conducted beyond this period and up to two months post-concussion [44]. The instrumented BESS displayed sufficient sensitivity up to six months post-concussion [45]. Virtual reality balance and the Phybrata system displayed sufficient sensitivity at 10 and 30 days, respectively, and are promising alternatives to current assessments if the equipment becomes more readily available to clinicians [46, 47].

Dynamic Balance Assessments

Outcome variables for non-instrumented assessments included time, heart rate, or number of errors. Instrumented assessments reported COM displacement, COP displacement, or reach distance using force plates, IMUs, or laboratory equipment. Time since concussion ranged from 24 h to eight months, with a tendency for insufficient sensitivity as time increased. Assessments included the physical and neurological examination of subtle signs (PANESS) (n = 1), community balance and mobility scale (n = 2), Kasch pulse recovery test (KPR) (n = 1), instrumented Y balance test (YBT) (n = 1), battery assessments (n = 2), and the Computer-Assisted Rehabilitation ENvironment (CAREN) system (n = 1). The KPR test displayed sufficient sensitivity when conducted on adolescents [48]. All assessments except the battery assessments displayed sufficient sensitivity for adult populations. However, only the PANESS assessment reported time since concussion, with sufficient sensitivity up to 14 days post-concussion.

Gait Assessments

Outcome variables for non-instrumented gait assessments included time to complete or number of errors. Instrumented assessments provided more objective outcomes, including COM displacement, step length, step time, cadence, anterior–posterior and medio-lateral accelerations, and gait velocity using pressure-sensitive walkways, IMUs, smartphones, or other laboratory equipment. Time since concussion ranged from the same day to three years, with a tendency for insufficient sensitivity as time increased. Assessments included the functional gait assessment (n = 2), tandem gait (n = 5), complex tandem gait (n = 1), dual-task tandem gait (n = 3), dual-task gait (n = 1), instrumented gait (n = 3), instrumented dual-task gait (n = 3), and a battery of gait assessments (n = 1). In general, sensitivity remained sufficient for up to two weeks for instrumented assessments and seven days for non-instrumented assessments. Time to complete the task was the primary outcome measure for non-instrumented assessments.

Other Assessments

Other assessments included a military-specific assessment, the Warrior Test of Tactile Agility (n = 1). This assessment was performed two years post-concussion and required participants to perform various motor tasks, including forward/backward runs, lateral shuffles, combat rolls, and changes in position (e.g. lying to standing). The lowering and rolling movements within the assessment battery demonstrated a sufficient AUC (0.83) [49].

Known-Group Validity

For known-group validity, static balance assessments included the paediatric clinical test of sensory interaction in balance (PCTSIB) (n = 1), mBESS (n = 2), and virtual reality balance (n = 1). Outcome variables were time and number of errors for non-instrumented assessments; instrumented versions assessed COP displacement. Time since concussion averaged seven days for all assessments. Dynamic assessments included the Bruininks–Oseretsky test of motor proficiency (n = 1) and the Postural Stress Test (PST) (n = 1). The PST measured the weight required for counterbalance, while the Bruininks–Oseretsky test measured number of errors and time to complete. Both assessments were conducted at one-week and three-month time points. Gait assessments included tandem gait (n = 1), dual-task tandem gait (n = 1), gait (n = 1), instrumented gait (n = 1), dual-task gait (n = 1), and instrumented dual-task gait (n = 1). Time since concussion ranged from seven days to three years. Other assessments included the run-roll-aim task (n = 1) and the Portable Warrior Test of Tactile Agility (n = 2). Both the mBESS and virtual reality static balance showed significant between-group differences when conducted within 10 days of sustaining a concussion. Both dynamic assessments displayed significant between-group differences up to three months post-concussion; however, reliance on specialised equipment reduces their feasibility for clinicians. Gait assessments, including single- and dual-task tandem gait and normal gait, also showed significant between-group differences when conducted seven days post-concussion.

Athletes from contact and non-contact sports (n = 2533; 97%) were included, as well as military personnel (n = 83; 3%) who had been diagnosed with concussion. The most common test was the mBESS, representing 16% of all tests.

Measurement Error

The measurement error of 13 lower-limb motor assessments was assessed across 10 different studies. Quality ranged from adequate to very good. Assessments were categorised into static balance (n = 5), dynamic balance (n = 2), and gait (n = 6). Static balance assessments included the BESS (n = 1), instrumented BESS (n = 1), SOT (n = 1), instrumented SWAY balance (n = 1), and instrumented single leg stance (n = 1). Studies reported the standard error of measurement (SEM), limits of agreement (LOA), or minimal detectable change (MDC). The instrumented BESS (SEM = 0.04–0.45) and instrumented single leg stance (SEM = 0.49–2.97) displayed the lowest SEM [37, 64]. Dynamic assessments included the instrumented limits of stability test (n = 1) and the DPSI (n = 1); both single-task (SEM = 0.0047–0.023) and dual-task (SEM = 0.004–0.019) variations displayed low SEM [64]. Gait assessments included tandem gait (n = 2), dual-task tandem gait (n = 2), instrumented dual-task tandem gait (n = 1), instrumented gait (n = 4), instrumented dual-task gait (n = 1), and a gait battery assessment (n = 1). All gait assessments displayed low SEM across trials, supporting the use of instrumented or non-instrumented gait assessments as acceptable tools to measure motor changes. A summary of study characteristics is presented in Table 2. Full details of study characteristics are presented in Additional file 1: Tables S16 through S18.
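The SEM and MDC values reported above relate to reliability through standard formulas: SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The sketch below illustrates this relationship; the SD and ICC values are hypothetical and not taken from any included study, and individual studies may have used different variants of these formulas.

```python
import math

def sem(sd, icc):
    """Standard error of measurement: SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)

def mdc95(sem_value):
    """Minimal detectable change at 95% confidence for a test-retest design:
    MDC95 = 1.96 * sqrt(2) * SEM (the sqrt(2) reflects error in both measurements)."""
    return 1.96 * math.sqrt(2) * sem_value

# Hypothetical example: between-subject SD of 5.0 balance errors, test-retest ICC of 0.78
s = sem(sd=5.0, icc=0.78)
print(round(s, 2), round(mdc95(s), 2))
```

Under these assumed values, the MDC exceeds six errors, illustrating why a test can be acceptably reliable yet still too noisy to detect the small post-concussion changes discussed later in this review.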

Discussion

This systematic review aimed to (1) consolidate the reliability and validity of motor function assessments across the time course of concussion management, and (2) summarise their feasibility for clinicians and other end-users. In general, instrumented assessments providing objective analysis tended to offer superior reliability and validity compared with non-instrumented, subjective assessments, but may not be feasible for all users. Gait-based assessments showed the best reliability, with instrumented methods offering a range of outcome variables. Sensitivity improved with objective measurement of performance, more complex tasks, and assessment during the acute stage of injury. Non-instrumented assessments offer greater practical utility, but this may come at the expense of reliability and validity, particularly beyond two weeks post-concussion. Overall, each assessment had limitations, and practitioners should be mindful of these when selecting the most appropriate assessment for their setting. Best practice encourages practitioners to use a variety of assessments within a battery to capture the multitude of symptoms experienced; solely relying on a single-assessor, subjective diagnostic test to guide the RTP or return-to-duty process should be avoided. When selecting appropriate assessments and interpreting results, reliability, validity, and feasibility should all be considered. Where possible, practitioners should use instrumented assessments for which the error, reliability, and validity have been established and a range of outcome variables can be monitored.

Reliability

In general, objective testing from instrumented assessments offered greater test–retest reliability than subjective, non-instrumented assessments. Instrumented assessments also offer clinicians more detailed measures of motor function, providing a more comprehensive analysis of readiness for RTP [97].

Static Balance Assessments

Test–retest reliability for static assessments varied between subjective and objective measurements. In general, non-instrumented assessments relying on subjective interpretation, such as the BESS and mBESS, displayed insufficient reliability across multiple testing points ranging from two days to 20 months [33, 34, 50, 52, 53]. However, improved reliability was reported for both assessments when an increased number of trials was performed and a minimum of two assessors were present [33]. Due to a suggested learning effect associated with the BESS and mBESS, allowing a practice trial followed by 2–3 subsequent test trials produced the best reliability, taking around 10 min to administer [33]. The BESS displayed the greatest test–retest reliability when more than two trials of four conditions (excluding double leg stance) were performed [34]. This differs from standard practice, where practitioners perform a single assessment as a means of evaluating balance deficits. Although this approach is more feasible for clinicians, only one study displayed sufficient reliability with one trial (r = 0.78) [32], with other studies showing greater reliability with multiple trials [33, 34]. Best practice would be to perform multiple trials, as a single trial likely jeopardises the reliability of the assessment, limiting its justification for inclusion. Clinicians therefore need to decide which takes priority: reliability of the measure or practicality of its implementation. Differences in the interpretation of errors between assessors also contribute to the insufficient reliability of these tools [98]. These differences may be exacerbated when testing concussed individuals during the acute stage of injury, as an increased number of balance errors offers greater capacity for disagreement.
Previous findings have shown that basing recommendations on the average of three different clinicians’ assessments and providing clear guidelines on how to administer and score the test may improve reliability [98], although this may not be viable in many practical settings. Instrumented static balance assessments offering objective outcomes displayed sufficient reliability, with the instrumented BESS, balance tracking system, and BAM superior to other instrumented static assessments. Of these, the BAM, utilising accelerometers, may be a more feasible and cost-effective option for clinicians than force plates. Being aware of the inherent noise and the MDC of these assessments is vital when making decisions on changes in performance. For example, the BESS has shown a test–retest MDC of 7.3 errors [37]; however, studies have shown that an average of only 3–7 errors is typically committed post-concussion [13, 99]. The test may therefore lack the sensitivity to detect important balance deficits beyond the acute stages of injury. Instrumented static assessments (i.e. with a force plate or IMU) should be selected over non-instrumented methods wherever possible. If practitioners work in settings that only permit non-instrumented static assessments, they should ensure sufficient familiarisation prior to scoring, use multiple assessors, and follow clear scoring guidelines. If these criteria cannot be met, the justification for conducting the assessment beyond diagnosis should be scrutinised in future standardised assessment protocols.

Dynamic Balance Assessments

Few studies analysed the reliability of dynamic assessments, with results favouring the use of dynamic assessments over static. Only one study assessed the reliability of a non-instrumented dynamic motor response assessment, clinical reaction time (modified drop-stick test) [50]. While this study demonstrated insufficient test–retest reliability (ICC = 0.32) over an 11-month timeframe [50], reliability may be improved over shorter time periods. Instrumented dynamic assessments, on average, displayed clinically acceptable reliability (r = 0.32–0.99) when conducted within 10 days. Force plates sampling at 100–1200 Hz were shown to be useful for assessing postural sway [36, 64] but may not be readily available to all clinicians. Alternatively, IMUs also demonstrated sufficient reliability during the Y balance test (ICC = 0.76–0.99) [35] and may be a more feasible option. For those without access to the required equipment, non-instrumented gait assessments are recommended.

Gait Assessments

In general, gait assessments showed the greatest test–retest reliability when compared with static and dynamic balance tests. Non-instrumented tandem gait assessments focusing on temporal gait parameters (i.e. time to complete, cadence) showed sufficient reliability across most studies [59, 60, 82,83,84]. However, test–retest reliability was insufficient when retesting occurred more than two months after baseline. This presents an issue when relying on pre-season baseline testing of tandem gait (such as during the SCAT6 protocol [100]) to interpret post-concussion scores. Therefore, if subjective assessments are to be used, it is recommended that practitioners be aware of these reliability limitations and schedule baseline assessments in line with these timepoints.

Instrumented gait assessments measuring temporal and spatial (i.e. stride length) gait parameters also demonstrated sufficient reliability. Lumbar and foot-mounted IMUs were clinically acceptable and offer clinicians an inexpensive and reliable alternative to laboratory equipment [86,87,88]. Smartphone apps measuring movement vectors also displayed sufficient test–retest reliability when firmly positioned on the body [86, 88, 89, 93], but exhibited insufficient reliability when held in the hand. Measures of step length, step time, gait velocity, and cadence were most reliable when derived from placement at the lumbar spine or pelvis (anteriorly via belt) [89]. Laboratory equipment such as 3D motion capture or a GAITRite system also displayed sufficient reliability across trials [89, 90]; however, the associated equipment costs and expertise requirements reduce the feasibility of these tools in most situations. Feasibility is also compromised by the difficulty of obtaining baseline pre-injury scores, meaning normative or control comparisons are needed. Researchers should aim to develop a more readily available means of capturing pre-concussion baseline scores using commercially available technologies such as smartphones, IMUs, and global navigation satellite system (GNSS) devices.

Considerations

Clinicians should be encouraged to implement dynamic balance or gait-based assessments as part of a comprehensive and multifaceted concussion assessment approach, due to their higher test–retest reliability compared with static approaches. As previously mentioned, consistency across trials allows variations in motor strategies to be more easily detected when a concussion is sustained [25]. If performing non-instrumented static assessments, multiple trials should be completed and the average taken [33], with the assessment made by multiple clinicians rather than one, to minimise measurement noise and allow smaller changes in performance to be detected as real changes [98]. Additionally, clinicians should be mindful of the time between repeated measures. Objective measures drawn from instrumented assessments provide better test–retest reliability, place less pressure on the clinician, and limit the ability of players to hide symptoms. The use of more clinically practical tools such as IMUs or smartphones, which are reliable for use in dynamic and gait-based tasks [35, 86,87,88,89], should be encouraged.
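The rationale for averaging multiple trials or assessors can be quantified: the SEM of a mean of n independent measurements shrinks by a factor of √n, which proportionally shrinks the MDC. A brief sketch under an assumed, hypothetical single-trial SEM of 2.6 errors:

```python
import math

def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem

def sem_of_mean(sem_single: float, n_trials: int) -> float:
    """Averaging n independent trials (or assessors) divides the SEM by sqrt(n)."""
    return sem_single / math.sqrt(n_trials)

# Hypothetical single-trial SEM of 2.6 errors (an assumption for illustration):
print(round(mdc95(2.6), 1))                  # MDC for a single trial
print(round(mdc95(sem_of_mean(2.6, 3)), 1))  # MDC for the mean of three trials
```

Under this assumption the MDC falls from roughly 7.2 to roughly 4.2 errors, meaning smaller genuine deteriorations in balance become distinguishable from noise.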

Validity

Validity ratings of assessments ranged from sufficient to insufficient based on COSMIN guidelines [29]. In general, dynamic balance and gait assessments offered greater validity when compared with static assessments. However, validity was compromised across all assessments as time since concussion increased beyond seven days, which is likely an artefact of partial or complete recovery from the concussion beyond this point.

Static Balance Assessments

Construct validity for static assessments varied, with instrumented assessments offering better validity than non-instrumented. The commonly used subjective assessments, the BESS and mBESS, displayed an insufficient ability to discriminate between groups when performed more than 48 h post-concussion, but had sufficient validity when performed within 24 h [42,43,44, 54,55,56,57]. Therefore, these assessments may aid in diagnosis; however, caution should be applied if implementing them as part of an RTP protocol. Traditional models of SRC management include the assessment of subjective static balance (mBESS) to assist with decisions regarding RTP [16, 97]. Whilst instrumenting these assessments with a force plate or IMU improves sensitivity, they remain limited beyond two weeks post-injury [45, 66]. Motor function entails a complex hierarchy of integration between systems and therefore needs to be assessed along a spectrum of varying complexity [97]. During the acute stages, athletes demonstrate a significant increase in errors when performing the mBESS, but return to baseline 3–5 days post-concussion [97, 101]. Due to the gross outcome measures and suggested learning effect, these assessments are believed to be unable to challenge the sensorimotor system sufficiently to identify underlying deficits in motor function [97]. Further, these simple static tasks do not reflect the complex dynamic athletic tasks performed in sport, such as running and tackling.

Virtual reality static balance assessment using a 3D projection system displayed a sufficient ability to discriminate between concussed and non-concussed individuals (0.857) when conducted 10 days post-concussion [47]. This highlights the promise of virtual reality technology for monitoring concussion symptoms, although the equipment is not readily available in most practical settings, thereby reducing its feasibility.

Dynamic Balance Assessments

In general, dynamic balance assessments displayed better construct validity than static balance assessments. However, these were still limited beyond two weeks post-concussion. Findings highlighted the importance of test selection relative to the population being assessed. In particular, the KPR test displayed sufficient sensitivity for children and may be a feasible option for assessing readiness for RTP in this population [48]. The PANESS, community balance and mobility scale, and instrumented Y balance test all demonstrated sufficient sensitivity in adult populations (0.76–1.00) when conducted within two weeks post-concussion [75, 76, 80, 81]. Like static assessments, these tasks are unlikely to challenge the neuromuscular system beyond the acute stage of injury. Using them to monitor changes across a graduated RTP protocol may not be best practice, particularly in concussions where symptoms persist beyond two weeks.

Gait Assessments

Validity of gait assessments varied amongst studies. The functional gait assessment ranged from insufficient to sufficient sensitivity (0.05–0.75) [80, 81], with higher sensitivity found when the assessment was performed within one week post-concussion. Therefore, clinicians should be cautious if implementing this assessment tool beyond this time. Assessment of gait speed during normal and tandem gait generally demonstrated sufficient sensitivity when conducted within one week post-concussion [43, 54, 55, 58, 80, 81, 84]. Dual-task gait displayed sufficient sensitivity for children when conducted within two weeks of sustaining an SRC [85]. However, clinicians should be mindful of using gross measures of gait (e.g. time taken), due to the limited range of outcome measures they provide. The addition of a cognitive task (dual-task) improved sensitivity for most studies [54, 58, 84] when completed one week post-concussion [56]. Instrumented gait assessments had mixed results. Assessment of single- and dual-task gait using lumbar and foot-mounted IMUs amongst adult populations within five days of sustaining a concussion demonstrated insufficient sensitivity for gait speed, cadence, and stride length when compared with normative reference values [92]. However, measures of single-task gait velocity and cadence using a smartphone affixed to the lumbar spine demonstrated sufficient AUC and between-group differences for adolescent populations with concussion when conducted one week post-injury (0.76–0.79) [58]. Like tandem gait assessments, the addition of a cognitive task improved sensitivity. Dual-task conditions aim to highlight potential deficits in attention allocation and executive function, typically observable through increased errors in the cognitive task or increased variability in the gait task [102]. Although these assessments tend to provide greater sensitivity than single-task versions, limitations still exist beyond two weeks post-concussion [95].
The use of a virtual reality system three months post-concussion displayed sufficient AUC (0.79–0.84) and significant between-group differences for reaction time and lateral movement asymmetries during a reactive movement task [94]. However, further research is warranted due to the small sample size used within this study. Additionally, the need for normative data currently reduces the utility of this assessment. An instrumented gait assessment battery conducted one week post-concussion, consisting of gait velocity, cadence, tandem gait time, and dual-task tandem gait time, displayed sufficient sensitivity and specificity when all measures were combined (AUC = 0.91) [58]. However, the time taken to conduct the full battery may be a barrier. Clinicians are encouraged to implement gait assessments where possible due to their superior ability to classify those with and without SRC. Instrumented versions using laboratory equipment, or more feasible tools such as IMUs or smartphones, are the preferred option.
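The AUC values cited throughout this section have a concrete interpretation: AUC is the probability that a randomly selected concussed individual scores worse than a randomly selected control, which equals the Mann–Whitney U statistic scaled to [0, 1]. A minimal sketch using invented dual-task gait times, not data from the cited studies:

```python
def auc(concussed, controls):
    """Rank-based AUC: P(concussed score > control score), ties counted as 0.5.

    Equivalent to the Mann-Whitney U statistic divided by n1 * n2.
    """
    wins = sum((c > h) + 0.5 * (c == h) for c in concussed for h in controls)
    return wins / (len(concussed) * len(controls))

# Hypothetical dual-task gait completion times (s); concussed tend to be slower.
concussed_times = [14.2, 15.1, 13.8, 16.0, 14.9]
control_times = [12.1, 13.0, 14.0, 13.5, 12.6]
print(auc(concussed_times, control_times))  # 0.96
```

An AUC of 0.5 indicates chance-level discrimination and 1.0 perfect separation, which is why values such as 0.91 for the combined gait battery represent strong classification performance.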

Other Assessments

The military-specific run-roll-aim assessment demonstrated statistically significant differences between concussed and control participants for the ability to complete the task within two weeks post-concussion [41]. No differences were found for total time, number of correct targets identified, or delay in reaction time to a cognitive stimulus (the Stroop effect). The Portable Warrior Test of Tactical Agility demonstrated statistically significant differences in time to complete for both single- and dual-task variations [96]. The instrumented version of this assessment, utilising IMUs, demonstrated a sufficient ability to discriminate between concussed and control participants during the ‘lowering and rolling’ movements (AUC = 0.83) [49]. No statistically significant differences were seen for other portions of the assessment.

Considerations

In general, instrumented assessments demonstrated a better ability to discriminate between concussed and non-concussed individuals. Measures of static balance were more accurate via the use of force plates [45, 61] or a 3D virtual reality projection system [47]. However, suggested learning effects, together with practical limitations such as cost and low ecological validity, restrict the application of these devices throughout the concussion management process. Both instrumented and non-instrumented dynamic balance assessments displayed sufficient sensitivity when conducted within two weeks post-concussion, therefore offering cost-effective and more objective options for clinicians. Assessing time to complete dual-task tandem gait was shown to be a sensitive and cost-effective assessment that clinicians could easily implement if access to instrumented versions is not feasible [54, 58, 84]. However, this does not provide clinicians with a variety of outcome measures, nor does it have any use beyond the acute stages of concussion [1, 103].

In general, the sensitivity of assessments reduced as time from initial injury increased, which is unsurprising given the varied time course of recovery between individuals. Furthermore, the sensitivity of both static [45, 73] and gait [95] assessments was reduced beyond two weeks post-concussion, meaning clinicians must be cautious when using these assessments as an RTP measure beyond this timeframe. Athletes returning to play following a concussion have shown an increased risk of acute musculoskeletal injury [21, 104, 105]. It is suggested that subclinical neuromuscular deficits may linger beyond expected recovery timeframes, but due to poor assessment availability and limited research surrounding best-care concussion management, many of these changes go undetected [21, 97, 104, 105]. This review provides clinicians with reliability and validity measures of assessments to allow a more educated selection of tests. However, it also highlights the problems with concussion management protocols, specifically the over-reliance on tools not initially designed to inform RTP decisions.

Feasibility and Utility

This review aimed to summarise the reliability and validity of lower-limb motor assessments for the management of SRC. However, what should not be overlooked is the clinical utility and feasibility of such assessments and their seamless integration within an RTP or return-to-duty protocol. Aside from the reliability and validity of a measure, stakeholders must also consider factors such as interpretability of outcomes, cost of equipment, expertise required, and time needed for implementation and analysis of results when developing assessment protocols. In general, instrumented assessments demonstrated better test–retest reliability across multiple time periods as well as a better ability to discriminate between concussed and non-concussed individuals. Of these, laboratory assessments using force plates, 3D motion capture, or pressure-sensitive walkways provided clinicians with more accurate objective measures. However, these display low ecological validity for the assessment of field-based sports due to the controlled environmental conditions [103] and the limited range of tasks that can be performed, and therefore may have poor crossover to the stochastic nature of sports competition. Equipment and facility requirements are typically associated with high cost and are therefore not feasible for most team sports [103]. Furthermore, the need for trained personnel to collect and analyse the data may act as a further barrier to uptake within practice.

Other tools used for instrumented assessments included IMUs and smartphone devices. These tools were shown to have better test–retest reliability and validity for most assessment categories (static, dynamic, gait). Studies included in this review assessed the validity and reliability of lumbar and foot-mounted IMUs [35, 86,87,88]. Test–retest reliability for dynamic and gait assessments using these devices was similar to that of laboratory assessments. Similar findings were associated with the use of smartphone devices, which displayed sufficient test–retest reliability during gait assessments [71, 89]. Although they achieved poorer validity than laboratory equipment, IMUs and smartphone devices offered clinically acceptable validity, specifically during dynamic balance and gait assessments [58, 76, 79, 92, 95]. With regard to interpretability of results, cadence and gait velocity metrics derived from IMUs and smartphones displayed a sufficient ability to discriminate between concussed and non-concussed individuals. Typically, these measures are made readily available to clinicians by the appropriate software for the respective device, thereby avoiding the need for additional analysis. As such, the lower cost, autonomy of analysis, and greater portability of these devices may improve their uptake in the field. These devices may offer practitioners the ability to identify at-risk individuals who require further investigation through more in-depth assessments. Efforts should be made to make these instrumented assessments more feasible for end-users without compromising reliability or validity. Utilising technologies such as IMUs embedded in current wearable devices (e.g. GPS units, smartphones, and watches) should be explored further.

Conclusions

Based on the findings from this review, clinicians are encouraged to implement instrumented or non-instrumented dynamic balance or gait assessments as part of a battery of assessments and not in isolation. Instrumented assessments utilising more complex gait tasks should be encouraged to add resolution to existing RTP protocols. On average, static assessments displayed insufficient test–retest reliability and validity for the management of SRC. If practitioners do not have the resources to perform instrumented tests, it is recommended that they consider the reliability and validity issues that potentially limit the simpler test options. Future research should aim to establish standardised protocols and best practice for monitoring motor function during the RTP period and beyond. Developing the use of accessible technologies such as IMUs, smartphones, and markerless tracking to monitor gait function is an important step for concussion management. Furthermore, understanding how movement changes under more context-specific scenarios, where fatigue, decision-making, and more complex movements occur, is warranted.