The Validity and Reliability of Commercially Available Resistance Training Monitoring Devices: A Systematic Review

Monitoring resistance training presents a range of unique difficulties due to differences in physical characteristics and capacity between athletes, and the indoor environment in which it often occurs. Traditionally, methods such as volume load have been used, but these have inherent flaws. In recent times, numerous portable and affordable devices have become available that purport to accurately and reliably measure kinetic and kinematic outputs, potentially offering practitioners a means of measuring resistance training loads with confidence. However, a thorough and systematic review of the literature describing the reliability and validity of these devices has yet to be undertaken, which may leave practitioners uncertain about the utility of these devices. This is a systematic review of studies that investigate the validity and/or reliability of commercially available devices that quantify kinetic and kinematic outputs during resistance training. Following PRISMA guidelines, a systematic search of SPORTDiscus, Web of Science, and Medline was performed; studies included were (1) original research investigations; (2) full-text articles written in English; (3) published in a peer-reviewed academic journal; and (4) assessed the validity and/or reliability of commercially available portable devices that quantify resistance training exercises. A total of 129 studies were retrieved, of which 47 were duplicates. The titles and abstracts of 82 studies were screened and the full texts of 40 manuscripts were assessed. A total of 31 studies met the inclusion criteria. An additional 13 studies, identified via reference list assessment, were included. Therefore, a total of 44 studies were included in this review. Most of the studies within this review did not utilise a gold-standard criterion measure when assessing validity. This has likely led to under- or over-reporting of error for certain devices. 
Furthermore, studies that have quantified intra-device reliability have often failed to distinguish between technological and biological variability, which has likely obscured the true precision of each device. However, it appears that linear transducers have greater accuracy and reliability than other forms of device. Future research should endeavour to utilise gold-standard criterion measures across a broader range of exercises (including weightlifting movements) and relative loads.


Abstract
Background Technology has long been used to track player movements in team sports, with initial tracking via manual coding of video footage. Since then, wearable microtechnology in the form of global and local positioning systems have provided a less labour-intensive way of monitoring movements. As such, there has been a proliferation in research pertaining to these devices.
Objective A systematic review of studies that investigate the validity and/or reliability of wearable microtechnology to quantify movement and specific actions common to intermittent team sports.
Methods A systematic search of CINAHL, MEDLINE and SPORTDiscus was performed; studies included must have been (1) original research investigations; (2) full-text articles written in English; (3) published in a peer-reviewed academic journal; and (4) assessed the validity and/or reliability of wearable microtechnology to quantify movements or specific actions common to intermittent team sports.
Results A total of 384 studies were retrieved, of which 187 were duplicates. The titles and abstracts of 197 studies were screened and the full texts of 88 manuscripts were assessed. A total of 62 studies met the inclusion criteria. An additional 10 studies, identified via reference list assessment, were included. Therefore, a total of 72 studies were included in this review.
Conclusion There are many studies investigating the validity and reliability of wearable microtechnology to track movement and detect sport-specific actions. It is evident that for the majority of metrics, validity and reliability are multifactorial, in that they depend upon a wide variety of factors including wearable technology brand and model, sampling rate, type of movement performed (e.g. straight-line, change of direction) and intensity of movement (e.g. walk, sprint).
Practitioners should be mindful of the accuracy and repeatability of the devices they are using when making decisions on player training loads.

Key Points
• Wearable microtechnology validity and reliability is dependent upon a wide variety of factors including brand, sampling rate, type of movement performed and intensity of movement.
• When making decisions on player training loads, practitioners should bear in mind the accuracy and precision of the devices they are using when (1) determining which metrics to track; (2) progressing or regressing an individual's training; (3) providing 'top up' sessions to players based on comparisons to planned loads or other players.
• Global navigation satellite systems (GNSS) generally possess suitable validity for measuring distance during team sport movements; however, validity can be compromised for devices with a sampling rate < 10-Hz when straight-line and frequent change of direction movements are performed in isolation.
• Practitioners should utilise GNSS with a sampling rate ≥ 10-Hz to minimise the error associated with distance measures, particularly when movements are performed in isolation (e.g. during rehabilitation drills).
• Global navigation satellite systems generally possess suitable validity for measuring peak velocity during straight-line sprinting.
• Local positioning systems appear to be a suitable alternative to GNSS for measuring common metrics (e.g. total distance, average speed), as long as they are set-up correctly, although further research must be performed to establish the true validity and reliability of these systems for other measures (e.g. peak velocity).
• Intra-device reliability is poorly researched; these studies report a combination of biological variation and technological variation (the intended measure) of the device. As such, the true intra-device reliability is difficult to determine in most instances.

Background
The importance of tracking athlete training intensity and volume to manage fatigue [1], fitness [2,3], injury [4,5] and performance [6,7] has been well established. Subjective ratings of exertion and heart rate are collected to provide an indication of an athlete's internal response to training [8], while player movements have historically been tracked via manual coding of video footage [9,10] or with semi- or fully automated systems to gain an understanding of the amount of training performed (i.e. external training load). However, the limitations associated with these tracking tools led to the development of wearable microtechnologies that allow numerous metrics to be collected, monitored in real time and downloaded following each session, helping to quantify the external loads to which athletes are exposed [11]. Since wearable microtechnologies were introduced to track players' movements, they have become central to sport science, with global navigation satellite systems (GNSS), local positioning systems (LPS) and inertial measurement units (IMU) all used across a variety of sports.
Sports that commonly use GNSS and LPS technology to track external loads include rugby league, rugby union, soccer, Australian football, American football, basketball and netball [12]. Commonly collected metrics include total distance, velocity-based threshold distance, velocity (peak, instantaneous, average), accelerations and decelerations [12,13]. The majority of GNSS devices are equipped with a triaxial accelerometer (typically 100-Hz) capable of measuring acceleration in three axes (x, y, z) to compute a composite vector magnitude (g force) [12], termed accelerometer load. Some devices also include gyroscopes and magnetometers, which, coupled with the accelerometer, are termed IMUs and have been used to develop algorithms for the automatic detection of sport-specific events such as physical collisions in rugby league [14,15], scrums, rucks and one-on-one tackles in rugby union [14,16], and balls bowled in cricket [17]. Given that GNSS, LPS and IMU tracking devices house multiple sensors collecting various information, they can be collectively referred to as wearable microtechnology.
Over the last decade, there has been a proliferation of research investigating the association between external training load (measured by wearable devices) and player injury risk [4,18-21], physical fitness [2,22], in-season availability [23], match activity [24] and technical performance [6,24]. In turn, practitioners use the information collected by these devices to minimise injury risk while increasing physical fitness, in-season availability, physical match activity and technical performance. Therefore, it is important that these devices are both valid and reliable in their measurements, allowing stakeholders to make well-informed decisions.
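The following is a minimal sketch of the composite vector magnitude described above. It assumes raw triaxial samples expressed in g. For each sample, the resultant is the square root of the sum of the squared x, y and z components. The function names are illustrative only. Commercial 'accelerometer load' metrics typically apply further proprietary processing (e.g. filtering or accumulation over time) that is not reproduced here.

```python
import math


def vector_magnitude(ax: float, ay: float, az: float) -> float:
    """Resultant acceleration (g) for one triaxial accelerometer sample."""
    return math.sqrt(ax ** 2 + ay ** 2 + az ** 2)


def resultant_series(samples):
    """Apply vector_magnitude to a sequence of (x, y, z) samples in g.

    Illustrative helper only; manufacturer 'accelerometer load' algorithms
    are assumed to include additional processing not shown here.
    """
    return [vector_magnitude(x, y, z) for x, y, z in samples]


# Hypothetical 100-Hz samples (g); a device at rest reads about 1 g (gravity).
samples = [(0.0, 0.0, 1.0), (1.0, 2.0, 2.0), (3.0, 4.0, 0.0)]
print(resultant_series(samples))  # [1.0, 3.0, 5.0]
```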
The validity of an instrument is defined as its ability to measure what it is intended to measure with accuracy and precision [25]. This is typically quantified by comparing the output of the respective instrument to the 'gold-standard' or criterion measure. Typical measures of validity include bias (relative and absolute), standard error of the estimate (SEE), standard error of measurement (SEM) and typical error (TE) expressed as a coefficient of variation (CV) [26]. However, when data are received as a time series, other measures such as the root mean square error (RMSE) and mean absolute error (MAE) can be used and expressed as a percentage.
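The Python sketch below shows how bias, MAE and RMSE could be computed for paired device and criterion values. Each measure is also expressed as a percentage of the criterion mean. It assumes one device value per criterion value (e.g. per trial or per time sample). SEE, SEM and TE require additional modelling and are not shown. The function name `validity_summary` is hypothetical.

```python
import math
from statistics import mean


def validity_summary(device, criterion):
    """Agreement between paired device and criterion values (hypothetical helper).

    Percentages are relative to the criterion mean; some studies instead
    average per-trial percentage errors, which can give different values.
    """
    if len(device) != len(criterion) or not device:
        raise ValueError("device and criterion must be non-empty and equal length")
    diffs = [d - c for d, c in zip(device, criterion)]
    crit_mean = mean(criterion)
    bias = mean(diffs)                            # signed mean difference
    mae = mean(abs(e) for e in diffs)             # mean absolute error
    rmse = math.sqrt(mean(e * e for e in diffs))  # root mean square error
    return {
        "bias": bias,
        "bias_pct": 100.0 * bias / crit_mean,
        "mae": mae,
        "mae_pct": 100.0 * mae / crit_mean,
        "rmse": rmse,
        "rmse_pct": 100.0 * rmse / crit_mean,
    }


# Hypothetical distances (m): device vs criterion for four 100 m trials.
summary = validity_summary([102.0, 101.0, 103.0, 98.0], [100.0] * 4)
print({k: round(v, 2) for k, v in summary.items()})
# bias 1.0, MAE 2.0, RMSE about 2.12 (in m, and in % of the 100 m criterion)
```

Reporting MAE or RMSE alongside bias matters because bias alone carries only the direction and average size of the error, not its spread.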
The reliability of an instrument denotes its ability to reproduce measures on separate occasions when it is known that the measure of interest should not fluctuate [27]. Otherwise termed 'intra-device' or 'test-retest' reliability, this is important when tracking and identifying 'meaningful' changes over a specified period (i.e. within player). Further, when the measures of numerous devices are compared (i.e. a squad of players), 'inter-device' or 'between device' reliability is important.
Typical measures of reliability include TE expressed as a CV and intra-class correlations (ICC) [26]. Intra-class correlations quantify the association between two variables that have a permanent degree of relatedness [28], while CV describes the variability between multiple data sets [29].
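The sketch below illustrates TE expressed as a CV. It assumes two repeated trials on the same participants or devices. It follows a common approach: divide the standard deviation of the between-trial differences by √2, then express the result as a percentage of the grand mean. It works in raw units; the log-transformed variants that some authors prefer are not shown. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def typical_error(trial1, trial2):
    """Typical error: SD of the between-trial differences divided by sqrt(2)."""
    if len(trial1) != len(trial2) or len(trial1) < 2:
        raise ValueError("need two equal-length trials with at least two values")
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return stdev(diffs) / math.sqrt(2)


def typical_error_cv(trial1, trial2):
    """Typical error as a coefficient of variation (%) of the grand mean."""
    grand_mean = mean(list(trial1) + list(trial2))
    return 100.0 * typical_error(trial1, trial2) / grand_mean


# Hypothetical total distances (m) from two repeats of the same drill.
trial1 = [100.0, 110.0, 120.0, 130.0]
trial2 = [102.0, 108.0, 123.0, 129.0]
print(round(typical_error(trial1, trial2), 2))     # 1.68
print(round(typical_error_cv(trial1, trial2), 2))  # 1.46
```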
In 2016, a review of studies that had examined the validity and reliability of GNSS for quantifying team sport movements was conducted [26]. However, this review did not consider the validity and reliability of other common wearable microtechnology (i.e. LPS). Further, advances in GNSS manufacturing since this review have brought numerous changes to these devices and a general increase in the number of units available to consumers.
Given the steady growth of wearable microtechnologies, importance placed upon their output by practitioners, and the commensurate increase in research assessing the validity and reliability of these devices since this earlier review (pre-2017 = 86 studies vs. post-2017 = 76 studies), an updated review of the literature is warranted. Therefore, the aim of this review was to identify and appraise peer-reviewed studies that investigated the validity and/or reliability of wearable microtechnology to quantify movement and specific actions common to intermittent team sports.

Search Strategy
This systematic review was prepared in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [30]. The academic databases SPORTDiscus, CINAHL and Medline were systematically searched from the earliest record to March 2020 to identify English-language, peer-reviewed original research studies that investigated the validity and/or reliability of wearable microtechnology to quantify movement patterns common to intermittent team sports. Studies were identified by searching abstracts, titles and keywords for pre-determined terms relevant to the scope of this review (Table 1). All search results were extracted and imported into a reference manager (EndNote X9, Thomson Reuters, Philadelphia, PA, USA).

Selection Criteria
Duplicate studies were removed, and the titles and abstracts of all remaining studies were scanned for relevance by two authors (ZC & RJ). Studies deemed beyond the scope of the review were removed. The full text of the remaining studies was then assessed for eligibility. To be eligible for inclusion, studies must have (1) been original research investigations; (2) been full-text articles written in English; (3) been published in a peer-reviewed academic journal; and (4) assessed the validity and/or reliability of wearable microtechnology to quantify movement or specific actions common to intermittent team sports (e.g. rugby league, rugby union, Australian football, handball, basketball, soccer, cricket).
'Validity' and 'reliability' were defined using the definitions previously outlined in this review and elsewhere [25,27]. If it was deemed that a study did not meet the inclusion criteria, it was removed from the analysis. The reference list of all eligible studies was then manually searched for any studies that were not retrieved in the initial search. If a study was identified, it was subjected to the same assessment as previously described.

Data Extraction and Analysis
All relevant data were extracted into a Microsoft Excel (2016; Microsoft Corp, Redmond, WA, USA) spreadsheet by two of the authors (ZC & RJ). The data extracted from each study included: study type (e.g. validity or reliability), wearable device(s) used, sampling rate, movements performed, criterion measure (where relevant) and relevant findings (e.g. CV, bias). The heterogeneous nature of the identified studies prevented further data analysis (e.g. meta-analysis). In addition, further analysis would require the extraction of the raw means ± SDs, which were not typically reported in inter-device reliability studies.

Research Quality Assessment
The quality of research was assessed by the same two authors (ZC & RJ) using a modified version of the Downs and Black checklist [31] (Table 2). This method is valid for assessing the methodological quality of observational study designs [31] and has previously been used in systematic reviews pertaining to sport science [32]. Quality was assessed against a total of eight, nine or ten items, depending on the study design (e.g. validity vs. validity and reliability). Items were scored from '0' (unable to determine, or no) to '1' (yes). Quality scores were expressed relative to the best attainable score for each respective study, with "100%" indicating the highest study quality.

Identification of Studies
The systematic search retrieved a total of 384 studies, of which 187 were removed as duplicates. The titles and abstracts of the remaining 197 studies were screened, and 109 were deemed clearly outside the scope of the review. These were removed, and the full manuscripts of the remaining 88 studies were assessed. Of these, 62 studies met the inclusion criteria. An additional 10 studies, identified via reference list assessment, were also included.
Therefore, a total of 72 studies were included in this review. The identification process is outlined in Figure 1.

Research Quality
The quality of the research investigating the validity and/or reliability of wearable microtechnology when assessed against a modified version of the Downs and Black checklist [31] ranged from a score of 64

Study Characteristics
The studies in this review investigated the validity and reliability of wearable microtechnology such as GNSS (n = 47 studies), LPS (n = 12 studies) and IMUs (n = 23 studies). The results of the studies examining the validity (n = 59 studies; Supplementary

Discussion
The aim of this systematic review was to identify and subsequently appraise studies that investigated the validity and/or reliability of wearable microtechnology to quantify movement and specific actions common to intermittent team sports.
Most validity studies identified in this review did not use 'gold-standard' criterion measures (i.e. high-speed 3D motion capture systems [e.g. VICON, Oxford Metrics Ltd, Oxford, United Kingdom] or radar). Thus, establishing the true validity of wearable microtechnology should be a focus of future research. When examining the findings of studies in this review, precedence should be given to those using 'gold-standard' criterion measures. Intra-device reliability was poorly researched, with studies relying on human participants to perform the exact same movement repeatedly, which is unlikely to occur. Consequently, the reported 'intra-device reliability' consists of both biological and technological variation, and the true intra-device reliability cannot be determined.
Given the heterogeneous nature of the statistical analyses employed between studies, it is difficult to provide collective interpretations of validity and reliability. However, for the purpose of this review, validity and reliability were generally deemed 'suitable' or 'accurate' if the error or variation was below 10%, as in previous research pertaining to wearable microtechnology [26,33].
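A small simulation can illustrate how biological and technological variation become confounded. For illustration only, it assumes that an athlete's movement-to-movement variation and device noise are independent and normally distributed. Under that assumption, the typical error from repeated trials estimates √(SD_bio² + SD_tech²) rather than the device's noise alone. The function name and parameter values are hypothetical.

```python
import math
import random
from statistics import stdev


def simulated_typical_error(n_pairs, bio_sd, tech_sd, true_value=100.0, seed=1):
    """Typical error (SD of differences / sqrt(2)) from simulated repeat trials.

    Each observed value = true_value + biological deviation + device noise,
    with both terms drawn independently from normal distributions (assumption).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_pairs):
        trial1 = true_value + rng.gauss(0.0, bio_sd) + rng.gauss(0.0, tech_sd)
        trial2 = true_value + rng.gauss(0.0, bio_sd) + rng.gauss(0.0, tech_sd)
        diffs.append(trial2 - trial1)
    return stdev(diffs) / math.sqrt(2)


# Hypothetical: device noise SD = 1.0, movement-to-movement SD = 2.0 (same units).
print(simulated_typical_error(20000, bio_sd=2.0, tech_sd=1.0))  # near sqrt(5), about 2.24
print(simulated_typical_error(20000, bio_sd=0.0, tech_sd=1.0))  # near 1.0 (device only)
```

Only the second case, where the movement is reproduced exactly (e.g. by a mechanical rig), isolates the technological variation.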

Validity
In the validity studies, statistical analyses that do not provide sufficient detail about the magnitude and direction of error (e.g. null hypothesis tests, ICC, Pearson's correlation coefficient) were used as the primary analysis in some instances (13.6% of validity studies). In addition, simply examining the average difference (i.e. bias) between two measures is problematic when dealing with time series data. For example, a time series could fluctuate substantially above and below the true value, yet a small bias could be reported because the positive and negative errors cancel each other out. Therefore, to suitably assess the validity (e.g. SEE, SEM, CV, RMSE) and reliability (e.g. CV) of wearables, future studies must incorporate an assessment of the residuals.
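Here is a minimal numerical example of this cancellation, using hypothetical values. The device alternates 1 m·s⁻¹ above and below a constant criterion velocity. Its bias is zero, but the RMSE computed from the residuals reveals a 20% error.

```python
import math
from statistics import mean

criterion = [5.0] * 10       # hypothetical true velocity (m·s⁻¹)
device = [6.0, 4.0] * 5      # device oscillates ±1 m·s⁻¹ around the criterion

residuals = [d - c for d, c in zip(device, criterion)]
bias = mean(residuals)                            # positives and negatives cancel
rmse = math.sqrt(mean(r * r for r in residuals))  # residual magnitude is kept
print(f"bias = {bias:.2f}, RMSE = {rmse:.2f} ({100 * rmse / mean(criterion):.0f}%)")
# bias = 0.00, RMSE = 1.00 (20%)
```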

Total distance
The results for this section are displayed in Supplementary Table 2. One study has reported that a 5-Hz device is valid during some change of direction protocols; however, the null hypothesis test used to assess validity does not provide the magnitude of the error, which may in fact be substantial [40]. It appears that the velocity at which movements are performed also plays a role, with validity decreasing as movement velocity increases [47,52].
Sampling rate is clearly important to the validity of a unit's measurements, with the margin of error generally smaller for devices with higher sampling frequencies (≥ 10-Hz) than for 1- and 5-Hz devices. However, given the heterogeneous nature of the studies in question, it is difficult to make direct comparisons. Indeed, it appears that most devices with sampling frequencies of ≥ 10-Hz are not heavily influenced by short straight-line movement [17,34,36 Although a true 'gold-standard' criterion was used, this study employed a null hypothesis test in isolation to quantify validity, and therefore the magnitude of any error cannot be ascertained. Thus, more weight must be given to the findings of studies that use suitable statistical analyses while making comparisons with a true gold-standard criterion [34].
Nonetheless, practitioners should utilise devices with a sampling rate ≥ 10-Hz to minimise the error associated with distance measures, particularly when movements are performed in isolation (e.g. during rehabilitation drills).
Therefore, it appears that LPS is not compromised by frequent change of direction [41,60-65], short-distance movement [62-65] or high movement velocities [41,60-65]; these findings contrast with those for some traditional GNSS devices. Although it is encouraging that one system has been reported as accurate during match-play, the criterion method used in that study (trundle wheel) is vulnerable to human measurement error [66]. Nonetheless, these systems appear accurate during a variety of movements replicating match-play when compared with 'gold-standard' 3D motion analysis [34,62,65]. Unlike GNSS, however, LPS require careful set-up in line with the manufacturer's recommendations. Indeed, when set up 'sub-optimally' (i.e. system asymmetrical, small distance between nodes and testing area), the errors are much larger (bias = 15.0-29.5%) than with an 'optimal' set-up (bias = 0.5-1.8%) [63]. In terms of validity, LPS is a suitable and potentially superior alternative to GNSS for quantifying distance.

Velocity-based threshold distance
The results for this section are displayed in Supplementary Table 3.

Global Navigation Satellite System
Practitioners will regularly discretise data into velocity-based thresholds, such as low-, moderate- and high-speed activity.
The literature investigating the accuracy of GNSS to measure velocity-based threshold distance is limited, with only three different devices examined. This is likely attributable to the expense of the required criterion system (high-speed 3D motion analysis). Further research in this area is crucial, given that practitioners frequently use 'high-speed' running metrics to make decisions about injury risk and prevention [4,20]. The issue with threshold-based distance is the discretisation of a continuous variable into a categorical one, which can result in a large loss of information through over-simplification of the data. Further, noise in the data can produce skewed results [67]. To maintain the accuracy of time series data, a specific algorithm should be used, but in most cases is not [68]. While a plethora of statistical techniques are available to discretise data, such as change point analysis, this methodical approach has typically not been used for GNSS time series data. This is potentially due to a lack of understanding of complex methods, in which selecting the correct number of intervals or zones is a difficult task [69]. The accuracy of a device is also influenced by the validity of the segmentation algorithm used to discretise the time series data into specific activities (e.g. distance above 5.5 m·s⁻¹). If the segmentation algorithm is inaccurate, the returned metrics will be affected. Therefore, the segmentation algorithm must also be validated to ensure that the distances measured during the activities reflect those originally intended to be examined.
In shuttle-like activities (70 m bouts), 5- and 10-Hz devices can accurately measure distance covered while movement velocity is above 4.17 m·s⁻¹, with a significant reduction in validity for distance covered above 5.56 m·s⁻¹ [38]. An increase in sampling rate to 15-Hz does not appear to improve validity, with a large margin of error across a range of different thresholds (RMSE = 3.7-97.4%) during a team sport circuit, shuttle runs and a small-sided game [34]. This is potentially an issue for practitioners seeking to monitor the distances their players cover at high speeds.
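The sketch below illustrates both the discretisation and the effect of noise near a threshold. It sums distance within velocity bands from a velocity time series, using 4.17 and 5.56 m·s⁻¹ as example limits, and treats each sample as contributing velocity × sample interval. It assumes a fixed sampling rate and no additional smoothing or minimum-duration rules, which commercial software may apply. The function name is hypothetical.

```python
def distance_in_bands(velocity, fs_hz, bands):
    """Distance (m) covered within each velocity band (hypothetical helper).

    velocity: instantaneous velocity samples (m·s⁻¹) at a fixed rate fs_hz.
    bands: (lower, upper) limits in m·s⁻¹; a sample counts toward a band when
           lower <= v < upper, and contributes v * dt metres.
    """
    dt = 1.0 / fs_hz
    totals = [0.0] * len(bands)
    for v in velocity:
        for i, (lower, upper) in enumerate(bands):
            if lower <= v < upper:
                totals[i] += v * dt
                break
    return totals


# Example limits (m·s⁻¹), for illustration only.
BANDS = [(0.0, 4.17), (4.17, 5.56), (5.56, float("inf"))]

steady = [5.5] * 10      # 1 s at 10 Hz, just below the top threshold
noisy = [5.6, 5.4] * 5   # same mean velocity with ±0.1 m·s⁻¹ noise
print(distance_in_bands(steady, 10, BANDS))  # about [0, 5.5, 0]: no top-band distance
print(distance_in_bands(noisy, 10, BANDS))   # about [0, 2.7, 2.8]: half now top band
```

Total distance is identical in both cases. Only the allocation to bands changes, which is why noise near a threshold can skew 'high-speed' metrics.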

Local Positioning System
An LPS can accurately quantify distance covered within movement velocity thresholds of 0.28-1.7 m·s⁻¹ and 1.7-4.2 m·s⁻¹, with a large reduction in validity (RMSE = 13.9-207.1%) for distance covered when movement velocity is above 4.2 m·s⁻¹ [34]. As such, the velocity-based threshold employed appears to have a large influence on validity, which decreases as the threshold (i.e. movement velocity) increases. Given that only a single system has been examined, further research is required.

Peak velocity
The results for this section are displayed in Supplementary Table 4.

Global Navigation Satellite System
The accurate assessment of peak velocity is important given the association between high-speed exposures and injury risk [21]. The majority of devices appear to accurately detect peak velocity during a variety of straight-line [34, 40, 43-44, 51, 54, 58, 61, 70-75] and team sport protocols [34,49,51,55]. Although significant differences have been identified between 10-Hz devices and timing gates [53,55], the error in question is small (bias < 2.5%), again highlighting the unsuitability of null hypothesis testing to assess validity.
Throughout change of direction protocols, it is unclear whether 1- and 5-Hz devices are accurate, given the mixture of findings reported [40,58]. While change of direction may degrade validity, the velocity attained likely also plays a role. For example, the velocity achieved is much lower during change of direction protocols (4.9 m·s⁻¹) [40,58] than during team sport circuits or straight-line sprints (6.8 m·s⁻¹) [70], and therefore may influence accuracy.
This issue appears to dissipate for devices with a sampling rate of 10-Hz and above [40].
The findings of studies using straight-line sprints are potentially of greater practical significance, given that the majority of peak velocities during team sport match-play are attained in open space (e.g. a line break in rugby league), often in critical match scenarios where minimal change of direction is required [76]. A significant limitation of 53.3% of studies is the use of timing gates as the criterion measure, a method that cannot measure peak velocity.
Timing gates simply provide a measure of time over a set distance (i.e. distance between gates) and therefore only calculate average speed. Future research should use high-speed 3D motion capture systems or laser guns as criterion measures. Given the current evidence, modern wearable devices appear appropriate for measuring peak velocity.
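The sketch below illustrates why timing gates cannot capture peak velocity. It assumes a mono-exponential sprint velocity model, v(t) = v_max(1 − e^(−t/τ)), with hypothetical values for v_max and τ. This model is a common simplification and is not drawn from the studies reviewed here. The sketch compares the average speed a pair of gates would report with the velocity actually reached at the second gate. All function names are hypothetical.

```python
import math


def velocity(t, vmax, tau):
    """Modelled sprint velocity (m·s⁻¹) at time t (s); assumed mono-exponential."""
    return vmax * (1.0 - math.exp(-t / tau))


def position(t, vmax, tau):
    """Distance (m) covered by time t: the integral of velocity() from 0 to t."""
    return vmax * (t - tau * (1.0 - math.exp(-t / tau)))


def time_at(distance, vmax, tau, t_max=60.0):
    """Time (s) to reach `distance`, found by bisection (position is increasing)."""
    lo, hi = 0.0, t_max
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if position(mid, vmax, tau) < distance:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gate_average_speed(d1, d2, vmax, tau):
    """Speed a pair of timing gates at d1 and d2 (m) would report."""
    return (d2 - d1) / (time_at(d2, vmax, tau) - time_at(d1, vmax, tau))


# Hypothetical athlete: vmax = 9.0 m·s⁻¹, tau = 1.2 s.
VMAX, TAU = 9.0, 1.2
for d1, d2 in [(0, 10), (30, 40)]:
    avg = gate_average_speed(d1, d2, VMAX, TAU)
    v_at_gate = velocity(time_at(d2, VMAX, TAU), VMAX, TAU)
    print(f"{d1}-{d2} m: gates {avg:.2f} vs velocity at {d2} m {v_at_gate:.2f} m·s⁻¹")
```

In this model, the gate-derived value is always below the velocity reached within the split. The gap is largest during early acceleration and narrows as the athlete approaches v_max.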

Local Positioning System
There is conflicting evidence in the literature about the validity of LPS to measure peak velocity [34, 51, 61-62, 65, 75, 77]. There is a large amount of error associated with some systems during straight-line movement (trial velocities 1.7-5.3 m·s⁻¹; bias = 11.8-13.2%) and shuttle runs (RMSE = 11.3%) [34,77]. In contrast, a range of other systems have shown suitable accuracy (< 10%) during similar movements [34, 51, 61-62, 65, 75, 77], in particular straight-line sprinting (where true peak velocity is likely obtained), which should give practitioners confidence when interpreting peak velocity [61,77]. Given that only five systems have been assessed, further research must be conducted to truly establish measurement accuracy.

Instantaneous velocity
The results for this section are displayed in Supplementary Table 5.
However, when instantaneous velocity is assessed during specific components of a straight-line movement (e.g. timing gate splits, acceleration component, deceleration component), validity varies [74,80]. For example, validity is poorest during initial splits (CV = 13.1% vs. 0.9%) for 15-Hz devices [74], while 5-Hz devices are inaccurate during decelerations from high starting velocities (5-8 m·s⁻¹) [80]. Similarly, poor validity has been reported during accelerations performed from a low continuous velocity (1-3 m·s⁻¹), with accuracy improving as the continuous movement velocity increases (3-8 m·s⁻¹) [80]. Thus, high initial acceleration appears to compromise the validity of 5- and 15-Hz devices [74,80], with 10-Hz devices possessing superior validity [78,80]. Given that all team sports involve a large number of changes in pace, often performed at lower velocities [7], using devices with sampling frequencies below 10-Hz to monitor such movements may be problematic.

Local Positioning System
The validity of instantaneous velocity measures from LPS has only been assessed for two systems (Clearsky T6, Inmotio) [34,63]. The two studies reported different results, highlighting the influence that specific manufacturing parameters (e.g. software, hardware, data filters) can have on a system's outputs. During a team sport circuit, shuttle run and small-sided game, the 'Inmotio' system was accurate [34]. However, when isolated changes of direction were performed at speed, there was a notable reduction in validity for the 'Clearsky T6' (bias = 33.5-39.2%) [63], with a further reduction (bias = 74.4-90.8%) when the system set-up was 'sub-optimal' (system asymmetrical, small distance between nodes and testing area) [63]. This suggests that repeated change of direction compromises the validity of these systems [63]. This is likely attributable to the large and frequent changes in velocity experienced during such movements, which the system struggles to measure. While more work on LPS is required, this is an issue for quantifying velocity during the change of direction movements common to intermittent sports. Moreover, the careful set-up required limits the portability of these systems.

Average speed
The results for this section are displayed in Supplementary Table 6.

Global Navigation Satellite System
There is minimal error for 1-Hz devices during long-distance (487 m) team sport circuits [56]. However, when short-distance straight-line movements are performed in isolation (e.g. ≤ 40 m), there are significant differences between the device and 3D motion analysis [58]. It is currently unclear whether 5-Hz devices are accurate during similar movements, given conflicting findings [40, 58-59], while a variety of 10-Hz devices have shown suitable accuracy [17, 40-41, 79, 81].
Although the 'Polar Team Sensor' has shown errors as high as 33% and 31% for back- and chest-mounted sensors, respectively [46], this device has not been investigated as extensively as other devices (n = 1).
When frequent change of direction is incorporated, validity is compromised for 1-, 5 60-63, 65, 77], shuttle activity [62] and team sport simulations [60,61,77]. The set-up of the system is paramount, with a large reduction in validity (bias = 14.7-29.1%) for 'sub-optimal' set-ups (system asymmetrical, small distance between nodes and testing area) compared with 'optimal' set-ups (bias = 0.5-2.8%) [63]. Therefore, it is important that practitioners understand the correct set-up of each system to ensure validity.

Collision detection
The results for this section are displayed in Supplementary Table 7.

Inertial Measurement Unit
Collisions are detected by the accelerometer and gyroscope housed inside the wearable device, using software-embedded algorithms [15]. The ability to detect the occurrence of a collision is likely a useful load monitoring metric for contact sports, given the association between collisions and player fatigue [82,83]. During rugby league and rugby union match-play, devices containing 100-Hz accelerometers can accurately detect these events, with superior accuracy for 'heavy' rather than 'light' collisions [15,84-86].

Sport specific events
The results for this section are displayed in Supplementary Table 8.

Inertial Measurement Unit
Through software embedded and consumer developed algorithms, wearable devices that contain accelerometers, gyroscopes and/or magnetometers can be used to quantify sport specific events. Cricket bowling events can be detected during match-play (sensitivity = 99.5%, specificity = 74.0%) and training (sensitivity = 99.0%, specificity = 98.1%) [17].
Notably, specificity is reduced during match-play (i.e. more false positives are recorded), which may be attributed to the greater number of fielding events performed. In rugby union, algorithms for automatically detecting scrums, rucks and one-on-one tackles appear suitable for use in both training and competition [14]. However, this accuracy is manufacturer- and sport-specific, with a large number of false-positive tackle events (an event detected when none occurred) identified during Australian football match-play [87].
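Sensitivity and specificity of an event-detection algorithm can be computed from counts of agreements and disagreements with a criterion such as coded video. The sketch below uses hypothetical counts. Here, a 'true negative' is assumed to be a non-target action (e.g. a fielding event) that the algorithm correctly did not classify as the target event.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (sensitivity %, specificity %) for an event-detection algorithm.

    tp: criterion events the device detected (true positives)
    fp: device detections with no criterion event (false positives)
    tn: non-target actions the device correctly ignored (true negatives)
    fn: criterion events the device missed (false negatives)
    """
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts: extra false positives lower specificity, not sensitivity.
print(detection_metrics(tp=95, fp=5, tn=95, fn=5))   # (95.0, 95.0)
print(detection_metrics(tp=95, fp=30, tn=70, fn=5))  # (95.0, 70.0)
```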

Acceleration & deceleration-based metrics
The results for this section are displayed in Supplementary Table 9.

Global Navigation Satellite System
A variety of acceleration- and deceleration-derived metrics are commonly used by practitioners in sport to monitor load. Because expensive high-speed 3D motion capture systems are generally required as the criterion, the literature is limited.
Acceleration and deceleration (m·s⁻²) are generally derived from the GNSS chip housed inside the wearable, using the change in instantaneous velocity. In sporting applications, the derived accelerations are often classified as 'peak', 'average' and 'instantaneous' measures. These devices are currently unable to precisely quantify instantaneous acceleration, or the distance covered when performing acceleration (> 3 m·s⁻²) and deceleration (< -3 m·s⁻²) efforts [34].
Raw average change of pace (termed average acceleration) data extracted from 10-Hz devices are accurate [81]; however, validity appears to be compromised when these data are derived from the manufacturer's software [81]. This is likely attributable, at least in part, to the filtering and smoothing methods applied to the raw data by different manufacturers. It may therefore be important to extract the raw data from the device when considering average acceleration measures, although even these data have likely already undergone some form of filtering.
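To illustrate how acceleration is derived from changes in instantaneous velocity, and how the distance in the > 3 m·s⁻² and < -3 m·s⁻² bands follows from it, the sketch below applies a plain finite difference to an unfiltered velocity trace. This is an assumption made for illustration. Manufacturers apply their own, often undisclosed, filtering before this step, which is one reason device outputs differ. The function names and the 10-Hz example trace are hypothetical.

```python
def acceleration_from_velocity(velocity, fs):
    """First-order finite-difference acceleration (m/s^2) from velocity (m/s) at fs Hz."""
    dt = 1.0 / fs
    return [(v1 - v0) / dt for v0, v1 in zip(velocity, velocity[1:])]


def distance_in_bands(velocity, fs, accel_threshold=3.0, decel_threshold=-3.0):
    """Distance (m) covered while acceleration is above / below the given thresholds."""
    dt = 1.0 / fs
    accel = acceleration_from_velocity(velocity, fs)
    accel_dist = decel_dist = 0.0
    for i, a in enumerate(accel):
        # Distance over the interval [i, i + 1] from the mean velocity (trapezoid rule)
        step = 0.5 * (velocity[i] + velocity[i + 1]) * dt
        if a > accel_threshold:
            accel_dist += step
        elif a < decel_threshold:
            decel_dist += step
    return accel_dist, decel_dist


# Hypothetical 10-Hz velocity trace: accelerate, hold speed, then decelerate
velocity_10hz = [0.0, 0.5, 1.0, 1.5, 2.0, 2.0, 2.0, 1.5, 1.0]
accel_distance, decel_distance = distance_in_bands(velocity_10hz, 10)
```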

Inertial Measurement Unit
Alternatively, a more complex but potentially more accurate means of quantifying acceleration magnitude (termed resultant acceleration) is a 100-Hz tri-axial accelerometer, typically housed inside GNSS devices, which combines acceleration (g) in the x, y and z axes into a vector magnitude [88]. It is difficult to form a collective conclusion given the heterogeneous nature of the studies investigating these measures; however, the filter and cut-off frequency applied to the raw data appear to have a large influence [88-90]. Of the 6-25-Hz filters examined, data filtered at 10-16 Hz all possessed suitable accuracy (CV < 10%) for measuring peak resultant acceleration during team sport activities, with 12 Hz being optimal [90]. Further, 5-Hz data with a complementary filter are superior to 100-Hz data and 10-Hz data with a Kalman filter for peak and average acceleration during straight-line and change-of-direction movement [89]. Despite this superiority, validity was still poor (CV > 10%) for peak resultant acceleration, although better for average acceleration (CV = 5.9-8.9%) [89]. However, when different filters (e.g. 3- and 10-point moving averages) are applied to raw average resultant acceleration data, validity is compromised, again highlighting the influence of filter choice [91]. Measuring the vector magnitude during collision events may also be useful for contact sports when a 20-Hz filter is applied, with small error during tackle bag contact (CV = 6.5%) but a degradation in validity (CV = 11.2-11.3%) when contact occurs with another human [88].
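The resultant (vector magnitude) acceleration and the influence of smoothing can be sketched as follows. The example uses a simple trailing moving average, like the 3- and 10-point filters mentioned above, rather than the cut-off-frequency filters used in [89, 90]. The function names and data are hypothetical. Even a 3-point average substantially lowers a brief peak, which is why filter choice matters so much for peak resultant acceleration.

```python
import math


def resultant(ax, ay, az):
    """Vector magnitude (g) of tri-axial accelerometer samples."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]


def moving_average(signal, window):
    """Trailing moving average; the output has len(signal) - window + 1 samples."""
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    running = sum(signal[:window])
    out = [running / window]
    for i in range(window, len(signal)):
        running += signal[i] - signal[i - window]  # slide the window by one sample
        out.append(running / window)
    return out


# Hypothetical samples (g) containing one brief spike on the second sample
ax = [0.0, 3.0, 0.0, 0.0, 0.0, 0.0]
ay = [0.0, 4.0, 0.0, 0.0, 0.0, 0.0]
az = [1.0, 0.0, 1.0, 1.0, 1.0, 1.0]
raw = resultant(ax, ay, az)           # peak of 5 g on the spike
smoothed = moving_average(raw, 3)     # peak reduced to about 2.33 g
print(max(raw), max(smoothed))
```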

Local Positioning System
Average acceleration and deceleration can be accurately quantified during shuttle activities and a single change of direction [62, 65]. However, validity is compromised when change of direction is performed repeatedly, with a bias as large as 16.1% [62]. While average acceleration can also be quantified during straight-line activity, there is a large margin of error for average deceleration (CV = 15.0-21.0%, bias = -3.8 to 10.7%) [62, 65]. Peak acceleration and deceleration follow a similar pattern: measures obtained during a single change of direction appear relatively accurate (CV = 5.1-5.3%), with error increasing when direction change is performed repeatedly (bias = -12.3 to 41.1%) and during shuttle activity (bias = -14.9 to 10.1%) [62, 65]. The accuracy of LPS for measuring peak acceleration and deceleration during straight-line movement is less clear, with conflicting findings [62, 65]. This is likely due to differences between manufacturers' systems; as such, the 'Clearsky T6' and 'Inmotio' systems appear to provide suitable measures of peak acceleration and peak deceleration during straight-line movement, respectively. The 'Inmotio' system, however, is unable to accurately measure instantaneous acceleration [34].

Other metrics
The results for this section are displayed in Supplementary Table 10.

Global Navigation Satellite System
Metabolic energy expenditure (i.e. metabolic power) is generally quantified using open-circuit spirometry and radar, and can be estimated from a GNSS chip using a method [92] that focuses on the energetic cost of the acceleration and deceleration phases of running, based on a theoretical model [93].
There is a systematic underestimation of metabolic energy expenditure (bias = -5.94 kcal·min⁻¹) during repeated efforts (i.e. running and collisions) [94], while measures of average metabolic power appear suitable during shuttle activity [38] but not during a soccer-specific circuit [95]. Therefore, collision activity may degrade the validity of GNSS for quantifying energy expenditure [94]. Further, when metabolic power is measured using thresholds (> 20 W·kg⁻¹, > 25 W·kg⁻¹), there is a slight reduction in validity (CV = 9.0-11.6%) for 5-Hz devices, while 10-Hz devices are superior (CV = 4.5 -
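The GNSS metabolic power method treats acceleration as running up an equivalent slope. The sketch below assumes the formulation commonly used for this purpose: equivalent slope ES = a_f / g, an equivalent body mass factor EM = sqrt(a_f²/g² + 1), and the gradient-running energy-cost polynomial (3.6 J·kg⁻¹·m⁻¹ on flat ground) scaled by EM, with metabolic power equal to energy cost multiplied by speed. This review does not state whether [92] uses exactly these constants or an additional terrain factor, so treat the coefficients as an assumption to check against [92, 93]. The time-above-threshold helper mirrors the > 20 W·kg⁻¹ thresholds discussed above. All names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def energy_cost(af):
    """Energy cost of running (J/kg/m) at forward acceleration af (m/s^2).

    Assumed equivalent-slope formulation: ES = af / g, EM = sqrt(af^2 / g^2 + 1),
    and the gradient-running cost polynomial evaluated at ES, scaled by EM.
    """
    es = af / G
    em = math.sqrt(af * af / (G * G) + 1.0)
    cost_gradient = (155.4 * es**5 - 30.4 * es**4 - 43.3 * es**3
                     + 46.3 * es**2 + 19.5 * es + 3.6)
    return cost_gradient * em


def metabolic_power(velocity, fs):
    """Metabolic power (W/kg) per sample interval = energy cost x mean speed."""
    dt = 1.0 / fs
    powers = []
    for v0, v1 in zip(velocity, velocity[1:]):
        af = (v1 - v0) / dt  # forward acceleration over the interval
        powers.append(energy_cost(af) * 0.5 * (v0 + v1))
    return powers


def time_above(powers, fs, threshold=20.0):
    """Time (s) spent above a metabolic power threshold (W/kg)."""
    return sum(1 for p in powers if p > threshold) / fs


# Constant 5 m/s running at 10 Hz: 3.6 J/kg/m x 5 m/s = 18 W/kg
print(metabolic_power([5.0, 5.0, 5.0], 10))
```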

Inertial Measurement Unit
When measures of energy expenditure are provided by accelerometers, there is a large degree of error (bias = -56.9 to 36.7%) [97, 98]. Thus, GNSS devices should be used rather than accelerometers to quantify metabolic energy expenditure.

Local Positioning System
Peak force and power appear accurate when measured using LPS, although further research must be conducted to be confident in these metrics [51].

Total Distance
The results for this section are displayed in Supplementary Table 11.

Global Navigation Satellite System
A large number of studies have investigated the inter-device reliability of a variety of devices, which is important to understand when comparing data between players and tracking training sessions in real time [11]. It is clear … This is likely attributable to such devices possessing a true sample rate of 5 Hz, which is then interpolated to 15 Hz following collection. Nonetheless, devices sampling at 10 Hz and above provide suitable reliability for continuous movement [41] and team sport circuits [33, 35, 51, 53, 55, 74, 100, 101]. This is important because such circuits reflect the movement sequences experienced during match-play (e.g. change of direction to sprint to deceleration), as opposed to single movements performed in isolation (e.g. a single change of direction), which rarely occur.
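The point about devices with a true 5-Hz sample rate being reported at 15 Hz can be made concrete. Assuming simple linear interpolation (the manufacturers' actual interpolation methods are not described in the reviewed studies), every inserted sample lies on a straight line between two measured ones. No new movement information is added, and the peak can never exceed the 5-Hz peak. The function name and speed values are hypothetical.

```python
def upsample_linear(samples, factor):
    """Insert factor - 1 linearly interpolated points between each pair of samples."""
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)  # k = 0 keeps the measured sample
    out.append(samples[-1])
    return out


# Hypothetical 5-Hz speed samples (m/s) reported at 15 Hz
speed_5hz = [4.0, 5.5, 6.1, 6.0]
speed_15hz = upsample_linear(speed_5hz, 3)
print(len(speed_15hz), max(speed_15hz))
```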

Local Positioning System
Local positioning systems provide suitable between-device measures of total distance during team sport circuits [51], continuous movement [41] and change of direction [41], similar to GNSS.

Velocity-based threshold distance
The results for this section are displayed in Supplementary Table 12.

Peak velocity
The results for this section are displayed in Supplementary Table 13.

Global Navigation Satellite System
The inter-device reliability of 1-Hz devices is unclear, with a single study reporting a CV range of 2 … Collectively, GNSS devices offer suitable reliability during team sport activity and straight-line sprinting, but not during frequent change of direction or low-intensity running. A player's greatest velocity is likely attained through straight-line sprinting, in either match-play or training. As such, depending on the activity, practitioners can be confident in comparing peak velocity outputs between players.

Local Positioning System
Local positioning systems appear to offer suitable between-device reliability for detecting peak velocity [51]. However, only one system has been investigated, so further research is required.

Instantaneous velocity
The results for this section are displayed in Supplementary Table 14.

Average speed
The results for this section are displayed in Supplementary Table 15.

Global Navigation Satellite System
There is limited research investigating the inter-device reliability of 1-

Local Positioning System
Local positioning systems appear to offer suitable between-device reliability for detecting average speed during continuous movement with minimal direction change [41] and frequent change of direction [41]. However, only one system has been investigated, so further research is required.

Acceleration & deceleration derived metrics
The results for this section are displayed in Supplementary Table 16.
All devices possess suitable reliability when measuring average acceleration, average deceleration and average acceleration/deceleration [33, 103]. Peak acceleration can also be derived from a device's GNSS chip, with reliability appearing to be influenced by manufacturer-specific parameters (e.g. filters, cut-off frequencies, software) [71, 100, 101]: the CV ranges from 4.0% to 14.0% for 5- and 15-Hz devices and improves for 16-Hz devices (CV = 6.4%). Peak deceleration, although investigated only once, should not be compared between players [101].

Inertial Measurement Unit
The inter-device reliability of inertial movement acceleration magnitude and the subsequent frequency of efforts (> 1.5 m·s⁻¹ change in velocity), calculated from tri-axial accelerometer data, is appropriate [104].

PlayerLoad
The results for this section are displayed in Supplementary Table 17.

Inertial Measurement Unit
PlayerLoad is a composite vector magnitude calculated from the accelerations acting on the x, y and z axes of an accelerometer. Between-player comparisons of PlayerLoad appear suitable during team sport match-play and training [54, 55, 104, 105].
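PlayerLoad is often described in published work as the square root of the summed squared sample-to-sample changes in acceleration on each axis, accumulated over time and scaled by 100. The sketch below follows that description as an assumption. The manufacturer's exact filtering and scaling are proprietary and may differ, and `player_load` is a hypothetical name.

```python
import math


def player_load(ax, ay, az, scale=100.0):
    """Accumulated PlayerLoad-style index from tri-axial accelerometer data (g).

    Assumed formulation: sum over samples of sqrt(dx^2 + dy^2 + dz^2), where
    dx, dy, dz are the sample-to-sample changes on each axis, divided by `scale`.
    """
    total = 0.0
    for i in range(1, len(ax)):
        dx = ax[i] - ax[i - 1]
        dy = ay[i] - ay[i - 1]
        dz = az[i] - az[i - 1]
        total += math.sqrt(dx * dx + dy * dy + dz * dz)
    return total / scale


# Hypothetical samples: a single step change of 5 g in magnitude, then no change
load = player_load([0.0, 3.0, 3.0], [0.0, 4.0, 4.0], [1.0, 1.0, 1.0])
print(load)
```

Because only changes in acceleration contribute, a stationary device accumulates no load regardless of its orientation relative to gravity.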

Other metrics
The results for this section are displayed in Supplementary Table 18.

Global Navigation Satellite System
It appears suitable to make between-player comparisons for exertion index measurements [54, 55]. In contrast, it may be problematic to make comparisons when measuring repeated high-intensity efforts [54, 55] and a variety of collision-based metrics derived from the GNSS chip (e.g. collision velocity, momentum) (CV = 13.2%) [86]. Collision load, designed to indicate the intensity of a collision (e.g. a tackle), is calculated using data collected by the GNSS chip and accelerometer housed inside the wearable [86]; however, there is large variation between devices (CV = 10.1%) when they are worn during contact-based training. Further, the reliability of peak power and force measures appears better for 18-Hz devices than for 10-Hz devices [51].

Inertial Measurement Unit
Impact force (g) measured via the accelerometer housed within the wearable device appears to vary considerably between devices during contact-based training [86].

Local Positioning System
There was a small amount of variation (CV = 5.9 -7.3%) between theoretical power and force measurements obtained from the 'Kinexon one' system during a team sport circuit [51].

Intra-device reliability
Intra-device reliability is important to understand, given the interest in tracking individualised training loads over time.
Readers should be aware that there are inherent limitations with most studies that have investigated the test-retest reliability of wearable microtechnology: they have largely relied on participants to perform identical movements on repeated occasions. Despite closely controlled movement paths, variations (outside of those reported by the device) will occur. Therefore, the difference in measurements between tests encompasses both biological and technological variation, and the true intra-device reliability, which is the intended scope of these studies, cannot be determined. To understand the true test-retest reliability of wearables, biological variation must be eliminated by ensuring that identical movements are performed on repeated occasions.
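A common test-retest statistic is the typical error (TE), derived from the differences between two trials and often expressed as a CV. The sketch below assumes TE = SD(differences) / √2 and CV = 100 × TE / grand mean. The function name and trial values are hypothetical. When the two trials are performed by people, the result mixes biological and technological variation; only a precisely repeatable movement would isolate the device's contribution.

```python
import math
from statistics import mean, stdev


def typical_error_cv(trial1, trial2):
    """Test-retest typical error and its CV (%) from two paired trials.

    TE = SD of the trial-to-trial differences / sqrt(2); CV = 100 * TE / grand mean.
    """
    diffs = [b - a for a, b in zip(trial1, trial2)]
    te = stdev(diffs) / math.sqrt(2)
    grand_mean = mean(trial1 + trial2)
    return te, 100.0 * te / grand_mean


# Hypothetical total distance (m) for three repetitions of the same circuit, two trials each
trial1 = [100.0, 102.0, 98.0]
trial2 = [101.0, 100.0, 99.0]
te, cv = typical_error_cv(trial1, trial2)
print(f"TE = {te:.2f} m, CV = {cv:.2f}%")
```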

Total distance
The results for this section are displayed in Supplementary Table 19.

… study has reported that 4-Hz 'VX' and 5-Hz 'SPI-ProXⅡ' devices show poor test-retest reliability, the statistical analysis employed only explored the relationship between the test-retest measures, rather than the magnitude of the difference [75], which may explain the disparity with other studies. Further, it appears that within-player comparisons should not be made when distance is collected during a straight-line sprint using the 5-Hz 'MinimaxX' [37].

Local Positioning System
Three systems have been assessed thus far, with two showing suitable intra-device reliability during change of direction, match-play replication (wheelchair sport) and straight-line movement [41, 61]. It may be problematic to make within-device comparisons for the 'Inmotio' system, although this system has only been assessed once and should be examined further [75].

Velocity-based threshold distance
The results for this section are displayed in Supplementary Table 20.

Global Navigation Satellite System
There is limited research in this area, given the difficulty of designing a methodology that truly assesses intra-device reliability for velocity-based threshold distance. The reliability of 10-Hz devices appears superior to that of 1- and 4-Hz devices, although comparison is difficult given the large variation among the thresholds used (0. …) [70, 95]. Regardless of whether reliability was suitable, the studies that have investigated this have significant limitations, because their designs include biological error where intra-device reliability is concerned. Therefore, future research with suitable methodologies, as previously stated, is required before any conclusions can be drawn about the intra-device reliability of velocity-based threshold distance.
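Velocity-based threshold distance is the distance accumulated while speed exceeds a chosen threshold. This is why differing thresholds make studies hard to compare. The minimal sketch below assumes rectangle-rule integration of a uniformly sampled speed trace and a strict 'greater than' comparison; devices and studies may differ on both points. The function name and data are hypothetical.

```python
def distance_above_speed(speed, fs, threshold):
    """Distance (m) covered while speed (m/s) is strictly above `threshold`.

    Assumes uniformly sampled speed at fs Hz and rectangle-rule integration.
    """
    dt = 1.0 / fs
    return sum(v * dt for v in speed if v > threshold)


# Hypothetical 10-Hz speed trace (m/s) analysed with two different thresholds
speed_10hz = [4.0, 5.0, 5.6, 6.0, 5.4, 4.8]
above_5_0 = distance_above_speed(speed_10hz, 10, 5.0)
above_5_5 = distance_above_speed(speed_10hz, 10, 5.5)
print(above_5_0, above_5_5)
```

In this example, raising the threshold by 0.5 m·s⁻¹ reduces the reported distance by about a third, even though the movement is unchanged.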

Peak velocity
The results for this section are displayed in Supplementary Table 21.

Global Navigation Satellite System
As discussed previously in this section, these findings should be approached with caution, given that participant peak velocity is likely to vary between trials and biological error will therefore be included in these studies. With that caveat, there appears to be minimal variation between straight-line sprinting and team sport circuit trials for 1- to 10-Hz devices [42, 57, 59, 70, 75, 106]. Change of direction, however, appears to degrade reliability for 4-Hz devices (ICC = 0.41-0.66), whereas reliability is superior for 10-Hz devices (CV = 0.8%) [42]. Collectively, these findings highlight the importance of considering sampling rate.

Local Positioning System
Consistent with GNSS devices, there is a significant reduction in reliability for frequent and single changes of direction (ICC = -0.09 to 0.32), with reliability improving when such movement is removed (CV = 1.6-2.7%; ICC = 0.97) [61, 75]. The degradation in reliability may not be caused by the device itself, but rather by the greater difficulty of reproducing similar peak velocities on repeated occasions for movements involving frequent change of direction, compared with simple straight-line sprints.

Average speed
The results for this section are displayed in Supplementary Table 22.

Local Positioning System
It appears that an LPS serves as a viable option to measure average speed when considering intra-device reliability, as

Acceleration and deceleration-based metrics
The results for this section are displayed in Supplementary Table 23.

Global Navigation Satellite System
The literature investigating intra-device reliability when quantifying peak acceleration is limited, with peak acceleration derived from a GNSS chip via time-motion analysis showing poor test-retest associations (ICC = -0.7 to 0.49) [75]. The same is true for distance covered while performing acceleration and deceleration efforts [95].

Inertial Measurement Unit
There is only small within-device variation (CV = 5.0-5.2%) when peak acceleration magnitude (g) measured via the accelerometer housed inside a GNSS device is considered [59]. The ability to detect an acceleration magnitude above 5 g is superior during a 10 m sprint (CV = 4.7%) compared with a 30 m sprint (CV = 14.2%) [59]. This may reflect the magnitude obtained: the magnitude achieved in the 30 m sprint (8.3 g) is much larger than that during the 10 m sprint (7.3 g), which the device may not be able to tolerate [59].

Local Positioning System
A single local positioning system produced varying test-retest measures for peak acceleration, which may suggest poor test-retest reliability [75], although further research in which the same peak acceleration occurs repeatedly is required to establish this.

PlayerLoad
The results for this section are displayed in Supplementary Table 24.

Other metrics
The results for this section are displayed in Supplementary Table 25.

Global Navigation Satellite System
Measures of average metabolic power derived from a GNSS chip are repeatable; however, when based on a threshold (> 20 W·kg⁻¹), reliability is poor [95].

Conclusion
There are many studies investigating the validity and reliability of wearable microtechnology to track movement and detect sport-specific actions. For the majority of metrics, validity and reliability are multifactorial: they depend on a wide variety of factors, including wearable technology brand, sampling rate, type of movement performed (e.g. straight-line, change of direction) and intensity of movement (e.g. walk, sprint). As such, it is difficult to form any definite conclusions regarding the overall validity and reliability of wearable microtechnology devices. Nevertheless, practitioners should be mindful of the accuracy and repeatability of the devices they use when making decisions on player training loads. For example, if 'top-up' drills are prescribed at the end of a training session based on the high-speed distance players have performed during training, differences between players should be interpreted relative to the error of the device.
Similarly, when prescribing increments in training load in a rehabilitation setting, the speeds and distances performed by a player should be interpreted with the within-device error taken into account.
It is important that future validity research compares the outputs of wearable devices with a true 'gold-standard' criterion for each metric (e.g. a high-speed 3D motion capture system for distance covered). While cost-effective, the criterion measures commonly used in the reviewed research (e.g. measuring tape, timing gates) possess inherent validity issues and may therefore contribute to the reported measurement error of the wearable devices.
Many of the differences between data generated by wearable technology and those from criterion measures such as VICON may be attributed to the filtering and smoothing of the data [89]. Studies have shown that data filtering can have a large impact on the results obtained, and this should therefore be considered in future studies. Accessing the raw data of both the practical and criterion measures and applying the same filtering processes to both data sets would allow more equitable comparisons. Unfortunately, selecting an appropriate smoothing cut-off frequency is complex and there are no definitive guidelines. The movements being performed are an important consideration, with a trade-off between removing noise from the data and maintaining sufficient resolution to quantify the metrics of interest.
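One way to apply the same filtering process to both data sets is a zero-lag low-pass Butterworth filter, widely used in biomechanics. The filter is run forward and then backward so that no phase lag is introduced. The sketch below is a pure-Python second-order version built with the bilinear transform. It illustrates that assumption and is not a method attributed to any manufacturer or study in this review. The function names are hypothetical. Because the forward-backward procedure applies the filter twice, attenuation at the cut-off is greater than for a single pass.

```python
import math


def butter2_lowpass_coeffs(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass coefficients via the bilinear transform."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (2.0 * (k * k - 1.0) * norm, (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


def _filter_once(x, b, a):
    # Direct-form I difference equation. The history is initialised to the first
    # sample (the filter's DC gain is 1) to limit the start-up transient.
    x1 = x2 = y1 = y2 = x[0]
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


def zero_lag_lowpass(x, cutoff_hz, fs_hz):
    """Forward then backward pass, which cancels the phase lag of a single pass."""
    b, a = butter2_lowpass_coeffs(cutoff_hz, fs_hz)
    forward = _filter_once(x, b, a)
    backward = _filter_once(forward[::-1], b, a)
    return backward[::-1]
```

In practice, the same `cutoff_hz` and `fs_hz` would be passed for both the wearable signal and the criterion signal, after resampling both to a common rate, before any error statistics are computed.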
Most research on the intra-device reliability of wearable devices is limited by methodological issues (e.g. test-retest movements are not identical); as such, the studies in this review assess the combined technological and biological variation between movements, rather than technological variation alone. To measure technological variation, future research must ensure that an identical movement construct (i.e. velocity, distance) is performed on multiple occasions. Given that humans are unlikely to be able to perform such a precise task, other technology (e.g. a model train set) may be required. Alternatively, examining the stability of validity (i.e. assessing validity on multiple occasions) would also provide an indication of test-retest reliability and should be emphasised in future research. These aspects of future research are vital given the important decisions made on the progression or regression of an individual's training loads.

Data Availability Statement
All of the extracted data are included in the manuscript and supplementary files.

Funding
No sources of funding were used to assist in the preparation of this article.

Table 1 Search terms and key words used in each database. Searches 1, 2 and 3 were combined with 'AND'.
Search 1: "Rugby" OR Football OR "Team Sport*" OR Soccer OR Basketball OR "Australian Rules" OR Hockey OR Cricket
Search 2: "Global positioning system" OR "Local positioning system" OR "Global navigation satellite system" OR GNSS OR GPS OR LPS OR Microtechnology OR Magnetometer OR Accelerometer OR Gyroscope OR MEMS OR "Micro-electrical mechanical system" OR IMU OR "Inertial measurement unit"
Search 3: Validity OR Reliability

Supplementary table excerpt (device: Clearsky T6; criterion: 3D motion analysis):
Straight-line sprint to deceleration (10 m): bias = 1.5% (optimal set-up); 24.9% (sub-optimal set-up)
Left and right 75° diagonal movements (5 m): bias = 1.8% (optimal set-up); 29.0% (sub-optimal set-up)
Straight-line sprint with 90° change of direction to deceleration (10 m): bias = 1.6% (optimal set-up); 20.9% (sub-optimal set-up)
Zig-zag (60° and 360° change of direction): bias = 1.5% (optimal set-up); 15.0% (sub-optimal set-up)
Zig-zag (60° change of direction): bias = 0.5% (optimal set-up); 29.5% (sub-optimal set-up)

Conflicts of Interest
Supplementary table footnotes: CV = coefficient of variation; ICC = intraclass correlation; r = Pearson's correlation coefficient; *** indicates a metric that is calculated using data extracted from the GNSS chip and accelerometer of the wearable device.