Participant Selection
Eighty-eight participants with clinically diagnosed ataxias, 44 with clinically diagnosed parkinsonism, and 34 neurologically healthy control participants were recruited at Massachusetts General Hospital and in collaboration with the Ataxia-Telangiectasia Children’s Project (see Table 1 for participant demographics). Ataxia diagnoses spanned a diverse set of underlying conditions. Participants with a clinical diagnosis of possible or probable multiple system atrophy were categorized as ataxia-dominant or parkinsonism-dominant based on their clinical phenotypes. One individual with a clinical diagnosis of progressive supranuclear palsy had predominant cerebellar ataxia and was categorized under the ataxia phenotype. Four participants were diagnosed with presumed autoimmune-related ataxia based in part on their response to immunosuppressant therapy. Inclusion criteria were that participants 1) were between 2 and 90 years old; 2) were able to perform the instrumented finger-to-nose test (FNT); and 3) had a clinical diagnosis of ataxia or parkinsonism or were neurologically healthy. Fifteen individuals with ataxia, five with parkinsonism, and two controls completed the experiment multiple times at separate visits.
Table 1 Participant demographics
Data Collection
Each participant’s overall motor impairment severity was assessed using the Brief Ataxia Rating Scale (BARS; range 0–30, in half-point increments) for the ataxia group and the Unified Parkinson’s Disease Rating Scale (UPDRS) Part III Motor Examination [20] (range 0–108) for the parkinsonism group. On both scales, higher scores indicate greater motor disease severity. BARS, which is based on the Modified ICARS and correlates with both SARA and ICARS, was used for participants with ataxia because of its brevity and because its scoring criteria for the FNT emphasize segmentation of movement [21]. Participants wore nine-axis IMUs (Opal, APDM Wearable Technologies) on both wrists and sat upright with their feet firmly on the floor in front of a 12.9-in. tablet (iPad Pro, Apple Inc.) positioned at approximately 90% of arm’s reach, in the midline of the body (i.e., in the frontal plane), and at the participant’s eye level (Fig. 1a). The tablet displayed a circular target 1.5 cm in diameter that alternated between the left and right sides of the screen every 10 s. The tablet position and target size were chosen to emulate the physician-performed FNT [22], and the 10-s interval was chosen to allow participants to perform multiple reach-and-return sequences for each target position. Participants performed a continuous 40-s FNT with each hand (i.e., a total of 80 s), repeatedly moving their finger between their nose and the circular target as quickly and accurately as possible.
Pre-processing of Inertial Data
Inertial data were sampled at 128 Hz. Gravity was removed from the acceleration time-series by subtracting the mean, and a sixth-order low-pass Butterworth filter with a cutoff frequency of 20 Hz was then applied to remove high-frequency noise from non-human sources [23]. The filtered acceleration time-series were integrated using the trapezoidal rule to obtain velocity time-series. Because the reaching target changed positions every 10 s, a sixth-order band-pass Butterworth filter with cutoff frequencies of 0.1 Hz and 20 Hz was used to remove integration drift and high-frequency noise from the velocity time-series [23].
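The preprocessing chain above can be sketched with SciPy as follows. This is a minimal sketch under stated assumptions, not the authors’ code: zero-phase filtering (`sosfiltfilt`), a zero initial velocity, and reading “sixth-order band-pass” as `butter(3, ...)` (SciPy doubles the order for band-pass designs) are our assumptions, and `preprocess_acceleration` is a hypothetical helper name.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.signal import butter, sosfiltfilt

FS = 128.0  # IMU sampling rate in Hz (from the text)


def preprocess_acceleration(acc, fs=FS):
    """Convert an (n_samples, 3) acceleration array (m/s^2) into band-passed velocity (m/s).

    Hypothetical helper; filter details beyond order and cutoffs are assumptions.
    """
    acc = np.asarray(acc, dtype=float)

    # 1) Remove gravity (treated as a constant offset) by subtracting the per-axis mean.
    acc = acc - acc.mean(axis=0)

    # 2) Sixth-order low-pass Butterworth at 20 Hz; zero-phase filtering is assumed.
    sos_lp = butter(6, 20.0, btype="lowpass", fs=fs, output="sos")
    acc = sosfiltfilt(sos_lp, acc, axis=0)

    # 3) Trapezoidal integration to velocity, assuming zero initial velocity.
    vel = cumulative_trapezoid(acc, dx=1.0 / fs, axis=0, initial=0)

    # 4) Band-pass 0.1-20 Hz to remove integration drift. SciPy returns a band-pass
    #    filter of order 2*N, so N=3 yields a sixth-order filter (our interpretation).
    sos_bp = butter(3, [0.1, 20.0], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos_bp, vel, axis=0)
```

As a sanity check, a unit-amplitude 1 Hz sinusoidal acceleration should yield a velocity oscillation with amplitude close to 1/(2π) ≈ 0.159 m/s away from the recording edges, where filter transients have decayed.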
To apply movement decomposition [14], the velocity time-series must be expressed along the body’s anatomical axes. Because participants sat upright and repeatedly moved a finger between their nose and a tablet positioned directly in front of them, the rostrocaudal axis was aligned with gravity, and the anteroposterior axis was taken to be the direction of greatest variance of the velocity time-series, identified using principal component analysis (PCA) [24]. Each axis of the body-oriented 3D velocity time-series was independently segmented into movement elements at its zero-crossings (i.e., where the velocity is zero) (Fig. 1b) [15]. As in prior work, movement elements smaller than 1 mm or shorter than 5 ms (accounting for 3.0% of the combined total duration of all movement elements) were regarded as potential sensor noise and excluded from further analysis [15].
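A minimal NumPy sketch of the body-frame alignment and zero-crossing segmentation follows. It assumes the gravity direction is available (for example, from the mean of the raw acceleration), restricts the PCA to the plane orthogonal to gravity so that the axes stay orthogonal, and takes the mediolateral axis as the cross product of the other two; these choices, the function names, and the rectangle-rule distance estimate are our assumptions rather than details given in the text.

```python
import numpy as np

FS = 128.0  # sampling rate in Hz


def body_axes(vel, gravity):
    """Return a 3x3 matrix whose rows are the mediolateral, anteroposterior, and rostrocaudal axes."""
    g = np.asarray(gravity, float)
    g = g / np.linalg.norm(g)  # rostrocaudal axis aligned with gravity
    # Remove the vertical component, then take the direction of greatest variance (PCA).
    horiz = vel - np.outer(vel @ g, g)
    horiz = horiz - horiz.mean(axis=0)
    _, _, vt = np.linalg.svd(horiz, full_matrices=False)
    ap = vt[0]  # first principal direction -> anteroposterior axis
    ml = np.cross(g, ap)  # completes an orthonormal frame (assumed convention)
    return np.vstack([ml, ap, g])


def segment_axis(v, fs=FS, min_dist=1e-3, min_dur=5e-3):
    """Split one velocity axis (m/s) at zero-crossings.

    Returns rows of (signed distance in m, duration in s, mean speed in m/s).
    """
    v = np.asarray(v, float)
    sign = np.sign(v)
    # A crossing lies between samples i-1 and i when the sign flips.
    crossings = np.flatnonzero(sign[:-1] * sign[1:] < 0) + 1
    bounds = np.concatenate(([0], crossings, [len(v)]))
    rows = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        seg = v[start:stop]
        duration = len(seg) / fs
        distance = np.abs(seg).sum() / fs  # rectangle-rule integral of |v|
        if distance < min_dist or duration < min_dur:
            continue  # treated as potential sensor noise, as in the text
        rows.append((np.sign(seg.mean()) * distance, duration, distance / duration))
    return np.array(rows).reshape(-1, 3)
```

Body-frame velocities would then be `vel @ body_axes(vel, g).T`, with `segment_axis` applied to each column. Partial elements at the start and end of a recording are kept here unless they fail the size or duration thresholds.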
Feature Extraction
Fifty-three features hypothesized to be relevant to ataxia severity were extracted from each participant’s movement elements. Movement elements from both hands were pooled for feature calculations. Less-impaired participants were hypothesized to generate large, consistent movement elements corresponding to relatively smooth reaching motions between the nose and tablet. For example, prior work showed that healthy participants performing a 3D reaching movement toward a target (i.e., reaching for a can of soda) generated velocity profiles dominated by a large, primary movement element in each of the three axes [14]. In contrast, participants with more severe ataxia were hypothesized to perform increasingly segmented, oscillatory, and irregular movements corresponding to smaller, more variable movement elements. In particular, dysmetria was hypothesized to produce many small movement elements with alternating directions, reflecting corrective movements.
To capture the size and speed of movement elements, their durations, distances, and mean speeds were computed. Logarithms of distances and mean speeds were used to emphasize differences among smaller movement elements and because prior work suggested that the logarithms of the two variables would be approximately linearly related [14]. Each of these attributes was aggregated for each participant using the mean, standard deviation, minimum, maximum, range, interquartile range, median, and tenth and ninetieth percentiles. In neurologically healthy participants, the distance (D) and mean speed (\( \overline{v} \)) of movement elements are related by the two-thirds power law [14], \( \overline{v}\propto {D}^{\alpha } \), where α = 2/3. A decrease in this scaling exponent (α) indicates that slower speeds are generated to achieve a given movement distance. Because participants with ataxia were expected to generate slower movements [16], α was computed as a feature for each participant by fitting a least-squares linear regression of the logarithms of the mean speeds on the logarithms of the distances and extracting the slope of the regression line.
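The per-participant aggregation and the estimate of α can be sketched as below. The use of the sample standard deviation (`ddof=1`), NumPy’s default linear percentile interpolation, and the helper names are assumptions; the logarithm base does not affect the fitted slope.

```python
import numpy as np


def aggregate(x):
    """Nine summary statistics used for each movement-element attribute."""
    x = np.asarray(x, float)
    p10, q25, med, q75, p90 = np.percentile(x, [10, 25, 50, 75, 90])
    return {
        "mean": x.mean(),
        "std": x.std(ddof=1),  # sample SD (assumed)
        "min": x.min(),
        "max": x.max(),
        "range": x.max() - x.min(),
        "iqr": q75 - q25,
        "median": med,
        "p10": p10,
        "p90": p90,
    }


def scaling_exponent(distances, mean_speeds):
    """Least-squares slope of log(mean speed) on log(distance), i.e. alpha in v ~ D**alpha."""
    slope, _intercept = np.polyfit(np.log(distances), np.log(mean_speeds), deg=1)
    return slope
```

Distances must be unsigned (positive) for the logarithm. For elements that follow the two-thirds power law exactly, for example distances `[0.005, 0.01, 0.05, 0.2]` m with mean speeds `0.3 * D ** (2 / 3)`, `scaling_exponent` recovers α = 2/3.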
Temporal relationships were captured by analyzing changes in distance and direction across consecutive movement elements. The probability density of transitions between consecutive movement elements was estimated using a normalized 2D histogram of signed distances (i.e., the signed distance of each movement element vs. that of the subsequent one). In other words, each bin of the 2D histogram represented the probability that two consecutive movement elements had a particular pair of distances and directions. To reduce the number of features required by the model, only histogram bins corresponding to transitions between small movement elements were included, because transitions between small movements were hypothesized to be more common in ataxic movements as a result of dysmetria (Fig. 3). The ratios of consecutive unsigned movement element distances were computed to characterize how distance changed from one movement element to the next across both small and large movements. The mean, standard deviation, minimum, maximum, range, interquartile range, median, and tenth and ninetieth percentiles of these ratios were extracted as features.
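A sketch of the transition features follows. The text does not specify the histogram bin edges, the threshold that defines a “small” element, or the direction of the distance ratio, so the `edges` and `small` arguments and the choice of a subsequent-over-prior ratio below are hypothetical placeholders.

```python
import numpy as np


def transition_features(signed_d, edges, small=0.01):
    """Small-element transition probabilities and consecutive distance ratios.

    signed_d: signed distances (m) of consecutive movement elements on one axis.
    edges: shared bin edges (m) for both histogram dimensions (hypothetical).
    small: |bin center| below which a bin counts as "small" (hypothetical).
    """
    d = np.asarray(signed_d, float)
    edges = np.asarray(edges, float)
    prev, nxt = d[:-1], d[1:]
    hist, _, _ = np.histogram2d(prev, nxt, bins=[edges, edges])
    total = hist.sum()
    if total > 0:
        hist = hist / total  # each bin: probability of that (prior, subsequent) pair
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.flatnonzero(np.abs(centers) < small)
    small_bins = hist[np.ix_(idx, idx)].ravel()  # keep only small-to-small transitions
    ratios = np.abs(nxt) / np.abs(prev)  # subsequent / prior unsigned distance (assumed)
    return small_bins, ratios
```

The ratios would then be summarized with the same nine statistics listed in the text.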
To capture differences in the morphology of movement elements’ velocity profiles, which relates to the smoothness and pattern of acceleration and deceleration, movement elements were spatially normalized by dividing each by its mean velocity and then temporally normalized by resampling each to sixty samples (Fig. 1c and d); this length was chosen based on the median duration of the extracted movement elements [14, 15]. PCA was used to summarize the 60-dimensional normalized movement elements in a Leave-One-Subject-Out manner (i.e., the principal components applied to each held-out participant were computed from the other participants’ movement elements). Statistical aggregations (i.e., the mean, standard deviation, minimum, maximum, range, interquartile range, median, and tenth and ninetieth percentiles) of the first two principal component scores of each normalized movement element were extracted as features.
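The normalization and Leave-One-Subject-Out PCA can be sketched as follows. Linear interpolation for resampling, PCA via SVD on mean-centered data, and the helper names are assumptions. Because principal-component signs are arbitrary, scores from different folds may need sign alignment before aggregation, which this sketch does not do.

```python
import numpy as np

N_SAMPLES = 60  # resampled length of each movement element (from the text)


def normalize_element(v, n=N_SAMPLES):
    """Divide by mean velocity (this also flips negative elements) and resample to n points."""
    v = np.asarray(v, float)
    v = v / v.mean()
    old = np.linspace(0.0, 1.0, len(v))
    new = np.linspace(0.0, 1.0, n)
    return np.interp(new, old, v)  # linear resampling (assumed)


def project(train, test, k=2):
    """Fit PCA on train rows and return the first k scores of test rows."""
    mu = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mu, full_matrices=False)
    return (test - mu) @ vt[:k].T


def loso_pc_scores(elements, subject_ids, k=2):
    """PC scores for each element, with PCA fit only on other participants' elements."""
    elements = np.asarray(elements, float)
    subject_ids = np.asarray(subject_ids)
    scores = np.empty((len(elements), k))
    for s in np.unique(subject_ids):
        held = subject_ids == s
        scores[held] = project(elements[~held], elements[held], k)
    return scores
```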
Estimation of Clinical Scores
To estimate participants’ total BARS scores, a regression model (Gaussian Process Regression [25] with a Radial Basis Function kernel) was trained and evaluated using Leave-One-Subject-Out Cross Validation. Pediatric participants were excluded from the training data to reduce the possibility of the model learning trends corresponding to immature motor patterns. Features were scaled such that the training set had a fixed range [26], and clinician-assessed scores were normalized such that the training data had zero mean and unit variance. A similar model was also trained to estimate participants’ summed upper-limb BARS scores (range 0–8). Total BARS was used as the primary label because the total score has greater granularity, may be more robust to error because it integrates information from several domains, and supports the goal of identifying arm movement properties that relate to overall disease severity.
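The Leave-One-Subject-Out regression loop can be sketched with scikit-learn as below. This assumes participant IDs are used as groups (so repeat visits stay in the same fold), that `MinMaxScaler` with its default [0, 1] range stands in for the fixed-range scaling, and that pediatric participants are dropped from each training fold but still receive estimates; kernel hyperparameters are left to scikit-learn’s default optimizer. The function name is hypothetical, and fold-specific features such as the Leave-One-Subject-Out PCA scores would also need to be computed inside this loop.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def loso_gpr_estimates(X, y, groups, is_pediatric):
    """Leave-one-participant-out BARS estimates from a GPR with an RBF kernel."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    is_pediatric = np.asarray(is_pediatric, bool)
    estimates = np.empty(len(y))
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        train_idx = train_idx[~is_pediatric[train_idx]]  # adults only in training
        x_scaler = MinMaxScaler().fit(X[train_idx])  # fixed range on the training set
        y_scaler = StandardScaler().fit(y[train_idx, None])  # zero mean, unit variance
        gpr = GaussianProcessRegressor(kernel=RBF(), random_state=0)
        gpr.fit(x_scaler.transform(X[train_idx]),
                y_scaler.transform(y[train_idx, None]).ravel())
        z = gpr.predict(x_scaler.transform(X[test_idx]))
        estimates[test_idx] = y_scaler.inverse_transform(z[:, None]).ravel()
    return estimates
```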
Estimation performance was evaluated using r² and the root mean square error (RMSE) between the model-estimated and clinician-assessed scores. The Pearson correlation between estimation errors and participant ages was computed to determine whether the model was biased by age. Reliability was evaluated using participants who were assessed at multiple visits, each separated by several months (283.4 ± 106.1 days; range 126–433 days). Pearson correlation and Welch’s t-test were used to determine whether changes in clinician-assessed and estimated BARS agreed for participants with multiple assessments. A two-way mixed-effects, consistency, single-rater model was used to calculate the intraclass correlation (ICC(3, 1)) and 95% confidence interval (CI) of the repeated model estimates.
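For reference, ICC(3,1) for an n × k matrix of repeated estimates (n participants, k visits) can be computed from two-way ANOVA mean squares as (MS_R − MS_E) / (MS_R + (k − 1) MS_E), where MS_R is the between-participant mean square and MS_E the residual mean square. The sketch below assumes a complete matrix with no missing visits and omits the confidence interval; `icc_3_1` is a hypothetical helper name.

```python
import numpy as np


def icc_3_1(Y):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    Y: array of shape (n_subjects, k_sessions) with no missing values.
    """
    Y = np.asarray(Y, float)
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum()  # between-subject
    ss_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum()  # between-session
    ss_err = ((Y - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
```

Because the consistency form removes the session (column) effect, a constant shift between visits does not lower the ICC: `icc_3_1([[1, 3], [2, 4], [3, 5], [4, 6]])` returns 1.0.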
To further understand the contributions of dominant- and nondominant-hand data to severity estimation and to further assess model consistency, BARS scores were also estimated from features calculated from each hand separately (i.e., the model training process was unaltered, and additional hand-specific estimates were computed). Pairwise correlations were calculated among the dominant-hand-only estimates, the nondominant-hand-only estimates, and the original estimates based on data from both hands.
Ataxia Classification
To evaluate the specificity of the extracted features to ataxia, classification models (Gaussian Process Classifier [25] with a Radial Basis Function kernel) were trained to distinguish participants with ataxia from healthy controls and participants with ataxia from those with parkinsonism. The classification models were trained and evaluated using the same pipeline as the regression model, and performance was evaluated using the area under the receiver operating characteristic curve (AUC) and its 95% CI. Because the populations were not age-matched, AUCs were computed for participants in each decade of life, and Pearson correlation was used to determine whether a relationship existed between age and model error. Additional classification models were trained using only participants between 18 and 45 years old and using only participants at least 45 years old, to mitigate the possibility that the classifiers were leveraging age-related differences in motor performance [27].
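The classification pipeline and AUC confidence interval can be sketched with scikit-learn as follows. The text does not state how the 95% CI was obtained; the percentile bootstrap over individual estimates used here is an assumption (a participant-level bootstrap would be a reasonable alternative when visits repeat), as are the `MinMaxScaler` preprocessing and the function name.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.preprocessing import MinMaxScaler


def loso_gpc_auc(X, y, groups, n_boot=1000, seed=0):
    """LOSO probabilities from a GP classifier, AUC, and a percentile-bootstrap 95% CI."""
    X, y = np.asarray(X, float), np.asarray(y, int)
    prob = np.empty(len(y))
    for tr, te in LeaveOneGroupOut().split(X, y, groups):
        scaler = MinMaxScaler().fit(X[tr])
        clf = GaussianProcessClassifier(kernel=RBF(), random_state=0)
        clf.fit(scaler.transform(X[tr]), y[tr])
        prob[te] = clf.predict_proba(scaler.transform(X[te]))[:, 1]  # P(class 1)
    auc = roc_auc_score(y, prob)
    rng = np.random.default_rng(seed)
    boot = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        if np.unique(y[idx]).size == 2:  # AUC is undefined with only one class
            boot.append(roc_auc_score(y[idx], prob[idx]))
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return auc, (lo, hi)
```

Labels are coded 0/1 (e.g., 1 = ataxia), so column 1 of `predict_proba` holds the probability of the positive class.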
Analyzing Morphological Changes During Task Performance
Based on the results of the movement element morphology analysis, we investigated whether a motor optimization process occurred during performance of the FNT. Optimization of normalized movement elements has been observed in healthy participants as they become more proficient at a task [28]; specifically, the morphology of normalized movement elements converges to the theoretical model proposed by Hoff for 2D point-to-point movements [29]. To determine whether optimization occurred during the FNT, three metrics were computed from the first 20 s and the last 20 s of each participant’s 40-s FNT time-series for each hand: 1) the standard deviation of the first principal component score representing each movement element’s morphology, 2) the standard deviation of the second principal component score, and 3) the coefficient of determination (r²) between each normalized movement element and the theoretical model. Decreases in the standard deviations of the principal component scores indicated more consistent morphologies in the second half of the test, and increases in r² indicated morphologies more similar to the theoretical model. Significant changes were identified using one-sample t-tests of the changes between halves against a hypothesized mean of zero (i.e., no change).
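The per-element r² and the test on within-participant changes can be sketched as follows. Hoff’s model is not reproduced here, so the bell-shaped minimum-jerk profile 30τ²(1 − τ)² (whose mean over τ ∈ [0, 1] is 1, matching normalization by mean velocity) serves purely as a stand-in template; computing r² as 1 − SS_res/SS_tot against the template and defining changes as last-20-s minus first-20-s values are also assumptions.

```python
import numpy as np
from scipy.stats import ttest_1samp


def stand_in_template(n=60):
    """Minimum-jerk speed profile sampled at n points (a stand-in, not Hoff's model)."""
    tau = np.linspace(0.0, 1.0, n)
    return 30.0 * tau**2 * (1.0 - tau) ** 2


def r2_to_template(element, template):
    """Coefficient of determination of a normalized element against a template."""
    element = np.asarray(element, float)
    ss_res = np.sum((element - template) ** 2)
    ss_tot = np.sum((element - element.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def change_ttest(first_half, second_half):
    """One-sample t-test of per-participant changes (second - first) against zero."""
    change = np.asarray(second_half, float) - np.asarray(first_half, float)
    return ttest_1samp(change, popmean=0.0)
```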
Additional Statistical Analyses
A significance level of p < 0.05 was used for all tests. Welch’s ANOVA was used to determine whether the number of movement elements extracted per participant differed significantly among the three populations. Pearson correlation was used to measure the strength of relationships between individual features and BARS, and Welch’s t-tests were used to determine whether features differed significantly between healthy and ataxia participants. Pearson correlation was also used to determine the strength of relationships between features and age.
To test for significant differences in feature values across severity levels, participants were divided into four groups based on their total BARS scores. The healthy group consisted of the controls (N = 34), and three ataxia groups were defined by equally partitioning the range of BARS scores present in the collected data (total BARS 0–8, N = 30; 8.5–16, N = 44; and 16.5–24, N = 15). One participant’s data were included in two groups because their severity increased between visits. Significant differences were determined using Welch’s ANOVA and Games-Howell post-hoc tests.
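Welch’s ANOVA and the Games-Howell post-hoc test can be written out directly from their standard formulas, as sketched below using SciPy’s F and studentized-range distributions. The helper names are hypothetical, a dedicated statistics package would normally be preferred, and this sketch does not account for the one participant whose visits fall in two groups.

```python
import numpy as np
from itertools import combinations
from scipy.stats import f as f_dist, studentized_range


def welch_anova(*groups):
    """Welch's heteroscedastic one-way ANOVA; returns (F, p)."""
    groups = [np.asarray(g, float) for g in groups]
    k = len(groups)
    n = np.array([len(g) for g in groups])
    means = np.array([g.mean() for g in groups])
    var = np.array([g.var(ddof=1) for g in groups])
    w = n / var  # precision weights
    w_sum = w.sum()
    grand = (w * means).sum() / w_sum
    a = (w * (means - grand) ** 2).sum() / (k - 1)
    lam = ((1 - w / w_sum) ** 2 / (n - 1)).sum()
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    F = a / b
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)
    return F, f_dist.sf(F, df1, df2)


def games_howell(*groups):
    """Games-Howell pairwise p-values, keyed by (i, j) group indices."""
    groups = [np.asarray(g, float) for g in groups]
    k = len(groups)
    out = {}
    for i, j in combinations(range(k), 2):
        gi, gj = groups[i], groups[j]
        vi, vj = gi.var(ddof=1) / len(gi), gj.var(ddof=1) / len(gj)
        q = abs(gi.mean() - gj.mean()) / np.sqrt((vi + vj) / 2)  # studentized range stat
        df = (vi + vj) ** 2 / (vi ** 2 / (len(gi) - 1) + vj ** 2 / (len(gj) - 1))
        out[(i, j)] = studentized_range.sf(q, k, df)
    return out
```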