Background

Wearable accelerometers are commonly used in clinical research as an inexpensive and unobtrusive means of measuring an individual’s movements, mobility, and physical activity, both within and outside of clinical settings. Specifically, for people with spinal cord injury (SCI), accelerometers have been used to assess physical activity and energy expenditure among wheelchair users, [1, 2] assess sleep, [3] predict in-lab versus at-home activities, [4] and count steps during inpatient physical therapy sessions [5].

A variety of time and frequency domain features can be extracted from wearable accelerometer data to provide diverse characteristics of an individual’s movement. We have defined limb accelerations (LA) as features calculated from an individual’s limb movements while asleep at night, which may capture accelerations from periodic limb movements (PLM), spasms, positional shifts, rolling, and turning. These movements are free of biases that daytime activities may introduce secondary to an individual’s occupation, exercise routines, and leisure time activities. LA have previously been shown to provide rich, descriptive information that was related to functional ambulation among a sample with chronic, motor incomplete SCI [6].

While asleep, an individual likely moves mostly subconsciously for comfort, pressure relief, temperature, or in response to other sensations [7, 8]. These movements encompass aspects of sensation to cue the individual to move and strength to perform the movement. For example, it is expected that more complex voluntary or subconscious movements, such as rolling or substantial repositioning movements, would require greater muscle strength than simpler movements, such as subtly moving a limb. Further, someone with better sensation may have more of these complex movements than someone with poorer sensation since they may have increased cues to reposition [7, 8]. Since these more complex movements would likely produce larger accelerations and longer completion times than simpler movements, we would anticipate that individuals with better strength and sensation would have higher amplitude and duration LA than individuals with limited strength and sensation.

Although more intense spasms may result in larger amplitude and duration movements, most spastic movements are relatively small in amplitude and duration [9]. Further, clinical assessments of spasticity, such as the Modified Ashworth Scale (MAS), define the most severe spasticity scores as a considerable increase in muscle tone causing movement to be difficult (score of 3) or a rigid joint (score of 4 out of 4) [10]. It is anticipated that an individual with more severe spasticity may experience more resistance to movement and this may result in lower amplitude and shorter duration movements [11, 12]. Supine positioning has been shown to increase spasticity, so spasticity and other involuntary movements may be more prevalent while lying down to sleep at night [13, 14]. Therefore, we believe that features of LA measured during sleep can capture the unique attributes of an individual’s movement patterns and are related to clinical measures of strength, sensation, and spasticity among individuals with SCI.

The primary objective of this study was to provide quantitative evidence of validity of LA as a measure of impairment among individuals with SCI. Construct validity is the demonstrated relationship that a measurement is comparable to a different measure assessing a similar concept and unlike dissimilar concepts [15]. Concurrent validity quantifies the relationship between the novel measure and another previously validated measure of the intended construct [15]. We aimed to establish the construct and concurrent validity of LA as a clinically meaningful metric by evaluating the relationship between LA and summative standard clinical measures of lower limb strength, sensation, and spasticity among a population with chronic SCI. We hypothesized that features of LA related to amplitude and duration of movements would be the features most strongly related to each clinical outcome. Further, we anticipated that greater strength, better sensation, and less severe spasticity would be associated with larger amplitude and longer duration movements. As a supplemental analysis to provide additional evidence of construct validity, we aimed to quantify the unique information provided by LA as compared to models consisting of possible covariate measures such as pain and sleep quality.

Methods

All participants provided informed consent as approved by the VA Pittsburgh Healthcare System Institutional Review Board. Individuals with chronic (≥ 1 year), motor complete and incomplete SCI were included in this analysis, although individuals with motor complete SCI were recruited in smaller numbers in order to mitigate bias in the impairment score distribution. Participants were excluded if they had a medical diagnosis of a condition that may affect sleep (e.g., sleep apnea), were unable to wear limb accelerometers, or had an injury to the legs that would significantly impair ambulation (e.g., amputation).

Data collection was consistent with methods described in detail in prior work investigating LA among individuals with incomplete SCI [6]. In brief, participants were recruited at adaptive sporting events and from a research registry from 2018 to 2021. Participants completed questionnaires that assessed personal, psychosocial, and environmental factors such as demographics, [16, 17] pain, [18,19,20] and sleep quality [21]. Participants completed a sleep and activity log for each night of the collection which reported activities that could affect sleep, fatigue and sleep quality ratings, and if the participant considered that night “typical” of how they normally sleep [18, 19, 22,23,24,25,26]. Participants also had their strength, sensation, and spasticity in their upper and lower limbs assessed by one of two trained clinicians following the International Standards for Neurological Classification of Spinal Cord Injury and MAS guidelines, except that all participants were assessed in a seated position. Participants then wore ActiGraph GT9X Link accelerometers for 1–5 days on their bilateral ankles and non-dominant wrist. The duration of collection was limited for some participants by logistical constraints, including the short time frame of the adaptive sporting events.

Data analysis

Input variables: LA and covariates

Only nights that the participant reported as “typical” to how they normally sleep were included in the analysis so that the LA analyzed were most representative of the participant’s normal movements and abilities. Sixty-one LA features were extracted from each ankle movement measured with the accelerometers, and the median and interquartile range of each feature (and maximum of one feature) were calculated across all movements per “typical” night. Using the ankle and wrist accelerometers, 10 additional features were computed per night such as the time asleep and proportions of movements that involved each limb. One final set of features per participant was determined by calculating the median for each feature across all nights of the collection [6]. This resulted in a final set of 133 LA features that captured changes in positioning, movement directions, frequency, smoothness, temporal characteristics, signal stability, and intensity (Supplementary Appendix 1 provides additional descriptions of the features) [6, 27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48].
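The two-stage summary described above (per-night median and interquartile range across movements, then the median across nights) can be sketched for a single feature as follows. This is a minimal illustration, not the study's code: the function names, the input layout, and the "inclusive" quantile convention are assumptions.

```python
from statistics import median, quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1); needs at least two values."""
    # The "inclusive" quantile method is an assumption; the source does not
    # state which quantile convention was used.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def summarize_feature(nights):
    """Collapse one feature to a participant-level median and IQR.

    nights: list of "typical" nights, each a list holding the feature value
    for every detected movement on that night (hypothetical layout).
    """
    per_night_median = [median(night) for night in nights]
    per_night_iqr = [iqr(night) for night in nights]
    # Final participant-level value: median across nights of each summary.
    return {"med": median(per_night_median), "iqr": median(per_night_iqr)}
```

For example, `summarize_feature([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [10, 10, 10]])` takes the median of the nightly medians (3, 6, 10) and the median of the nightly IQRs (2, 4, 0).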

Since the impairment outcomes were measured cross-sectionally, the measurement of both impairment and LA could be affected by factors such as demographics, pain, sleep quality, exercise, sleep medication, or consumption of caffeine or alcohol (24 covariate features, Supplementary Appendix 2) [16,17,18,19,20,21,22,23,24,25,26]. As a supplemental analysis, we evaluated how much unique variance in impairment was explained by adding selected LA features to models made using only covariates. All features were scaled by the minimum and maximum across participants to a 0–1 scale.
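The min-max scaling step can be illustrated with a short sketch. The function name is hypothetical, and mapping a constant feature to zeros is an assumption made here to avoid division by zero; the source does not say how such a case was handled.

```python
def min_max_scale(values):
    """Rescale one feature across participants to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a feature with no spread carries no information,
        # so it is mapped to 0.0 for every participant.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Scaling every feature to a common range keeps the ℓ1 penalty from favoring features simply because of their units.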

Output variables: strength, sensation, and spasticity

Strength was quantified by the lower extremity motor score, which sums the manual muscle test motor scores from the L2-S1 myotomes across both lower limbs for a score between 0 (total paralysis) and 50 (normal). Lower limb sensation score was similarly calculated by summing the individual light touch scores from each dermatome across the lower limbs for a total score between 0 (no sensation) and 20 (full sensation) [49].
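As a small worked sketch of these summed scores, the functions below assume five key muscles per limb graded 0 to 5 and five light touch dermatomes per limb graded 0 to 2. The dermatome count and grading are inferred from the stated 0 to 20 range and are assumptions, as are the function names.

```python
def lower_extremity_motor_score(left, right):
    """Sum manual muscle test grades (0-5) for five myotomes per limb."""
    grades = list(left) + list(right)
    if len(grades) != 10 or any(not 0 <= g <= 5 for g in grades):
        raise ValueError("expected five grades between 0 and 5 per limb")
    return sum(grades)  # 0 (total paralysis) to 50 (normal)


def lower_limb_light_touch_score(left, right):
    """Sum light touch grades (0-2) for five dermatomes per limb (assumed)."""
    grades = list(left) + list(right)
    if len(grades) != 10 or any(g not in (0, 1, 2) for g in grades):
        raise ValueError("expected five grades of 0, 1, or 2 per limb")
    return sum(grades)  # 0 (no sensation) to 20 (full sensation)
```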

Spasticity was measured by the MAS for the knee flexors and ankle plantarflexors of both lower limbs. MAS had a skewed distribution in our sample with many participants having no spasticity, only two participants having a MAS score of 3, and no participants having a score of 4 in any of the areas assessed. To address this imbalance in scores and to improve clinical interpretability of lower limb spasticity, the MAS scores were categorized into 3 groups: no, mild, and moderate spasticity. Participants were categorized as “no spasticity” if they had a MAS = 0 for all areas assessed, “mild spasticity” if they had some spasticity (MAS > 0) recorded but all MAS scores were < 2, or “moderate spasticity” if any MAS score was ≥ 2 [10].
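The categorization rule can be written out directly. In this sketch the function name is hypothetical, and coding the MAS grade of 1+ as 1.5 is an assumption about the numeric representation rather than something stated in the text.

```python
def spasticity_category(mas_scores):
    """Map the MAS scores from all assessed muscle groups to one category.

    mas_scores: numeric MAS grades for the knee flexors and ankle
    plantarflexors of both limbs (1+ assumed to be coded as 1.5).
    """
    scores = list(mas_scores)
    if not scores:
        raise ValueError("at least one MAS score is required")
    if any(s >= 2 for s in scores):
        return "moderate"  # any area scored 2 or higher
    if any(s > 0 for s in scores):
        return "mild"      # some spasticity, but every score below 2
    return "no"            # MAS = 0 in all areas
```

For example, scores of `[0, 1.5, 0, 1]` fall in the mild category, while a single score of 2 among otherwise zero scores is enough for the moderate category.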

Analysis models

To select a subset of LA features, the least absolute shrinkage and selection operator (LASSO) implemented with least angle regression (LARS) and multinomial logistic regression with ℓ1 regularization algorithms were utilized for the numerical (strength and sensation) and categorical (spasticity) outcomes, respectively. While the LASSO LARS and logistic regression with ℓ1 regularization algorithms have much in common with linear regression and logistic regression, respectively, they are preferred in this instance due to their ability to perform feature selection as part of the model building process, making them more efficient for high dimensional data [50,51,52].
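For reference, the two penalized objectives can be written as follows. The notation is ours rather than the original's: \(n\) participants, feature vector \(x_i\), outcome \(y_i\), \(K = 3\) spasticity categories, and penalty weight \(\lambda\) chosen by cross-validation. The \(1/(2n)\) and \(1/n\) scalings are one common convention and are an assumption here. LARS is the algorithm used to trace the LASSO solution path, not a different objective.

```latex
% LASSO for the numerical outcomes (strength, sensation)
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \lambda \lVert \beta \rVert_1

% l1-regularized multinomial logistic regression for the spasticity categories
\hat{B} = \arg\min_{B}\;
  -\frac{1}{n}\sum_{i=1}^{n}
  \log\frac{\exp\bigl(\beta_{0,y_i} + x_i^{\top}\beta_{y_i}\bigr)}
           {\sum_{k=1}^{K}\exp\bigl(\beta_{0,k} + x_i^{\top}\beta_{k}\bigr)}
  + \lambda \sum_{k=1}^{K}\lVert \beta_k \rVert_1
```

Because the \(\ell_1\) penalty drives some coefficients exactly to zero, the features with nonzero coefficients at the selected \(\lambda\) form the selected subset.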

For the supplemental analysis, covariate features were selected using the LASSO LARS and logistic regression with ℓ1 regularization algorithms. Baseline performance using only the selected covariate features was then determined using linear and logistic regression models for the numerical and categorical outcomes, respectively. Selected LA features were then added to the covariates to assess the unique variance explained (strength/sensation) or improvement in classification (spasticity) from the addition of LA. All analyses used 10-fold block cross-validation among the randomized participant feature sets in the selection of the optimal features and all available samples in the final regression models.
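One way to read "10-fold block cross-validation among the randomized participant feature sets" is to shuffle the participants once and then cut the shuffled order into ten contiguous blocks, each serving once as the held-out fold. The sketch below follows that reading; the function name, the fixed seed, and the interpretation itself are assumptions.

```python
import random


def block_folds(n_samples, n_folds=10, seed=0):
    """Return (train_indices, test_indices) pairs for block cross-validation."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # randomize participant order once
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        # The first `extra` folds take one more sample when sizes are uneven.
        size = base + (1 if k < extra else 0)
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        folds.append((train, test))
        start += size
    return folds
```

With 40 participants this yields ten held-out blocks of four participants each, and every participant is held out exactly once.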

Model evaluation

For the strength and sensation LA models, the primary evaluation metric was R², which represents the variance in the outcome explained by the selected input features. When comparing the explained variance between models, it is important to account for the number of features included in the model, as models with more features likely have greater explained variance. Therefore, the adjusted R², which applies a correction to R² for the number of features in the model, was used as the primary evaluation metric for the supplemental analysis for strength and sensation when comparisons were made between models containing only covariates or with covariates and LA [52]. Statistical significance of the change in the strength and sensation linear regression models when LA features were added to covariates was assessed using the change in the F-statistic and p < 0.05. For both the primary and supplemental analyses, other evaluation metrics included mean absolute error, mean squared error, and root mean squared error. Cohen’s \(f^2\) was used to evaluate effect size from the adjusted R² with 0.02, 0.15, and 0.35 indicative of small, medium, and large effects, respectively [53].
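The metrics named above follow standard textbook formulas, sketched below. The function names are hypothetical, and whether the authors computed \(f^2\) in exactly this form is an assumption; the F-change statistic is the usual nested-model test and uses unadjusted R² values.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def cohens_f2(r2):
    """Cohen's f^2 effect size for a given (adjusted) R^2."""
    return r2 / (1.0 - r2)


def effect_size_label(f2):
    """Label f^2 using the 0.02 / 0.15 / 0.35 thresholds."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"


def f_change(r2_full, r2_reduced, n, p_full, p_reduced):
    """F statistic for adding (p_full - p_reduced) predictors to a nested model."""
    numerator = (r2_full - r2_reduced) / (p_full - p_reduced)
    denominator = (1.0 - r2_full) / (n - p_full - 1)
    return numerator / denominator
```

The F-change value is compared against an F distribution with (p_full − p_reduced) and (n − p_full − 1) degrees of freedom to obtain the p-value.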

The overall classification accuracy (OCA), precision, recall, and F1-score were used to describe the spasticity model performance. OCA represents the percentage of participants who were correctly classified. Precision is the fraction of participants assigned to a category who truly belong to it (i.e., positive predictive value), while recall is the fraction of participants truly in a category who were correctly identified (i.e., true positive rate). The F1-score is the harmonic mean of precision and recall [54, 55]. The log-likelihood ratio was used to assess for statistically significant change from the addition of LA to the covariates logistic regression models in the supplemental analysis.
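These classification metrics can be computed per category and then averaged with weights equal to the number of participants truly in each category, which is the usual meaning of a weighted average F1-score. The sketch below uses hypothetical names and returns 0.0 whenever a denominator is zero, which is an assumption about edge-case handling.

```python
def classification_report(y_true, y_pred):
    """Return overall accuracy, per-class metrics, and support-weighted F1."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class, weighted_f1 = {}, 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
        weighted_f1 += f1 * (tp + fn)  # weight by the class's true count
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, per_class, weighted_f1 / len(y_true)
```

A precision of 1 for a category, as reported for moderate spasticity, means that no participant from another category was assigned to it; it does not by itself imply that every participant in that category was found.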

Evaluation of validity

Construct validity was determined by evaluating the features chosen by each model and the clinical interpretation in relation to the impairment outcome. Further evidence of construct validity was also provided by determining the variance in each impairment outcome that was uniquely explained by LA, in the presence of other factors that could potentially represent similar information to LA or affect the measurement of LA or the impairment outcomes. This provided confirmation that LA are truly a measure of impairment and not simply a surrogate measure of other factors, like sleep quality. Concurrent validity was measured by the variance explained/classification accuracy of the models using LA and the standard clinical assessments.

Results

Participants

Thirty-six participants with motor incomplete SCI and 13 with motor complete SCI completed the data collection. Eight participants were excluded from the analysis because they self-reported that they had no “typical” nights recorded during the collection period. One additional participant was excluded because the accelerometers were likely removed overnight. Data collection was completed for two participants before the spasticity measures were added to the study, so the spasticity analysis had 38 total participants included, while the strength and sensation analyses had 40 participants. A post hoc power analysis showed that > 88% power was achieved for all linear regression models given the sample size, number of predictors, α = 0.1, and effect size (\(f^2\)) [56]. Participants were primarily male, non-Hispanic/Latino White, Veterans with paraplegia who used a manual wheelchair as their primary mode of mobility (Table 1). Examples of ankle acceleration plots are shown in Fig. 1.

Table 1 Participant demographics and impairment outcomes
Fig. 1 Example of acceleration vector magnitude vs. time plots from one ankle across one night for 3 participants with corresponding demographics and impairment outcomes. Figure 1c includes a zoomed section for additional detail of likely spastic or other involuntary movements

Strength

Sixteen LA features were selected which explained 68.7% of the variance in lower limb strength (Table 2). The features with the greatest association with higher strength scores were larger variations in energy (Wave Approx- IQR), fewer variations in the similarity between recent movements (Num Cross Corr Peaks- IQR), greater variation in local dynamic stability (variations in the response to perturbations, Lyapunov Exp- IQR) and faster rotational movements (Angle Rate Change- Med, Tables 3 and 4). When LA features were combined with covariates, an additional 35.5% of the variance in strength could be explained (adjusted R² = 0.847, 72% increase, p = 0.021), as compared to the model with only covariates (Supplementary Appendices 3 and 4).

Table 2 Strength, sensation, and spasticity model results
Table 3 Number (percentage) of selected LA features for each impairment outcome by category, with darker shading representing a higher proportion of features selected
Table 4 LA features included in the strength, sensation, and spasticity models, sorted by the absolute value of the coefficient

Sensation

A model containing 15 LA features explained 73.3% of the variance in lower limb sensation. Having a less variable time between movements (Time Since Prev- IQR), more consistent movement directions (Corr YZ- Med), and lower frequency movements (Dom Freq 1- Med) were most strongly associated with more intact sensation. When added to covariates, LA explained an additional 49.2% of the variance in sensation (adjusted R² = 0.714, 222% increase, p = 0.001).
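The reported percentage increases for strength and sensation are consistent with treating the "additional variance" as the difference in adjusted R² between the LA + covariates model and the covariates-only model. The short check below makes that arithmetic explicit; the reading of the reported values and the function name are assumptions.

```python
def baseline_and_relative_gain(adj_r2_full, added):
    """Recover the covariates-only adjusted R^2 and the relative increase."""
    baseline = adj_r2_full - added  # implied covariates-only adjusted R^2
    return baseline, added / baseline


# Strength: adjusted R^2 = 0.847 with an additional 0.355 from LA.
strength_baseline, strength_gain = baseline_and_relative_gain(0.847, 0.355)
# Sensation: adjusted R^2 = 0.714 with an additional 0.492 from LA.
sensation_baseline, sensation_gain = baseline_and_relative_gain(0.714, 0.492)
```

Under this reading, the implied covariates-only adjusted R² is about 0.492 for strength and 0.222 for sensation, and the relative gains round to the reported 72% and 222%.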

Spasticity

Spasticity categories had an OCA of 81.6% using 7–10 selected LA features (weighted average F1-Score = 0.814). No participants were falsely classified as having moderate spasticity (precision = 1), but the highest recall (0.933) was for the no spasticity category, indicating that those without spasticity were most likely to be correctly classified. The features most associated with having no spasticity included moving in less consistent directions (Corr YZ- Med), more variable movement symmetry (Skewness- IQR), less power at the second dominant frequency (Power Dom Freq 2- Med), and more variable recent movements (Close Cross Cov/Corr Peak- IQR). LA features associated with moderate spasticity included less variable movement entropy (Wave Entropy- IQR), more movements per hour (Move/hour), less variable symmetry of movements (Skewness- IQR), and lower and less variable energy (Fig. 1c). When combined, the LA + covariates model achieved 89.5% accuracy in classifying spasticity categories, including an increase in the weighted average F1-score of 0.186 (26% increase, p < 0.001, p = 0.275, and p < 0.001 for no, mild, and moderate spasticity models, respectively) as compared to the model using only covariates.

Discussion

We have provided evidence of construct and concurrent validity for LA as a measure of impairment by demonstrating that regression models consisting of only LA features were able to explain the majority of the variance in strength and sensation and correctly classify the majority of participants into spasticity categories. Since LA features are continuous, LA may provide variability and detailed information about impairment that clinical measures currently lack and, thus, may be useful in providing increased resolution compared to existing measures. Further, in the supplemental analysis, LA accounted for additional variance beyond what can be attributed to covariates alone. This supports construct validity by showing that LA features uniquely capture aspects associated with each impairment outcome rather than related measures like sleep quality.

We hypothesized that LA features measuring the amplitude and duration of movements would be most related to the measures of impairment, with larger amplitudes and longer durations associated with better impairment outcomes. The findings provided mixed support for this hypothesis. Movement duration was not selected for any of the models and therefore was not directly among the most important LA features in relation to each outcome. However, other features that may indirectly contain movement duration information, such as the percentage of movements that meet the criteria for PLM (PLM %) and PLM per hour (PLM Index), were selected in nearly all models. By definition, PLM must be short duration movements that occur in series [41, 44]. Therefore, having a higher percentage of movements that meet the criteria for PLM being related to greater strength, better sensation, and less spasticity provides support that movement duration, in combination with the other characteristics that define a movement as part of a PLM series, may be an important aspect of LA in relation to impairment.

As shown in Tables 3 and 4, features evaluating the spectral power in the frequency domain and wavelet energy bands (signal characteristics such as Power Dom Freq 2- Med, Power Dom Freq 1/Total- IQR, Wave Approx- IQR) of movements were often some of the most strongly related features to each measure of impairment and were selected for each impairment outcome. Both the statistical and frequency domain features consist of similar information about the intensity of movements, but the statistical features are with respect to time while features like power and energy are with respect to frequency or both time and frequency. Therefore, it makes intuitive sense that higher power and energy movements may be associated with greater strength and less severe spasticity. Likewise, more impaired sensation was associated with higher and less variable frequency (Dom Freq 1- Med/IQR) and more powerful movements (Power Dom Freq 2- Med) suggesting a lack of motor control to vary and regulate movements based upon sensory feedback. Therefore, the hypothesis that larger amplitude movements would be associated with improved outcomes was indirectly supported for strength and spasticity. The hypothesis was not supported for sensation since higher power movements with lower frequency were associated with poorer sensation. Similar features have also been found to be related to lower limb rehabilitation [36] and gait among various populations, [34] further indicating the clinical relevance of these measures.

Movement consistency was found to be related to all impairment outcomes through the consistency of movement directions (correlation coefficients between axes) and consistency between movements (PLM, relationship to recent movements, timing). For example, more consistently timed movements were related to greater strength (Num Cross Corr Peaks- IQR, Time Since Prev- IQR), better sensation (Time Since Prev- IQR, Num Cross Cov/Corr Peaks- IQR, PLM Index), and less spasticity (No Spasticity: Close Cross Cov/Corr Peak- IQR, PLM %; Moderate Spasticity: Move/hour, PLM Index). Additionally, moving in a larger variety of directions was associated with greater strength (Corr XY- IQR/Med) and less spasticity (Corr YZ- Med), but worse sensation (Corr YZ- Med, Corr XZ- IQR). Since moving in a variety of directions requires muscle activation from a greater number of locations, it makes sense that moving in more directions was related to greater strength. Additionally, it is logical to infer that participants with more frequent, consistent, repetitive movements may have more severe spasticity (or related involuntary movements such as myoclonus or PLM) while those with more variable, less consistent movements have little to no spasticity. Since about 40% of individuals with SCI may experience problematic spasms that affect their sleep, [57, 58] LA may be an unobtrusive way to evaluate spasticity and the effects of treatment.

Additional measures of movement consistency that were related to greater strength include having a wider range of responses to perturbations (higher Lyapunov Exp- IQR), smoother movements (lower Num Med Crossings Norm- Med), and more negatively skewed movements (lower Skewness- Med). These findings are supported by previous studies that have shown the Lyapunov exponent to be related to improvements in lower limb rehabilitation [36] and ambulation [6] and that healthy controls generally had smoother movements and more negative skewness than individuals with Parkinson’s disease [34].

Limb movement percentages and velocity and distance features were not selected for any impairment outcomes. While velocity and distance information may be better represented by similar features such as those in the frequency domain, limb movement percentages represented unique information that may not be as meaningfully related to strength, sensation, or spasticity.

Faster rotational movements (i.e., rolling, Angle Rate Change- Med) and smoother movements (Num Med Crossings Norm- Med) were related to greater strength but were not among the selected features for sensation or spasticity. That certain categories of LA were related to some measures of impairment but not others demonstrates the diversity of information that LA can detect, which may provide greater resolution about an individual’s strength, sensation, and spasticity than current clinical measures. This may be particularly useful in the context of clinical prediction rules, as models using common clinical assessments may be inadequate for accurately predicting long-term functional ambulation among those with incomplete SCI [16, 59,60,61].

Our previous work found that LA improved the classification accuracy of categories of functional ambulation among those with motor incomplete SCI [6]. This further supports the validity of LA and demonstrates that LA likely contain richer information than clinical measures of impairment alone. Future studies should evaluate other lower limb functional tasks and how LA can be utilized for outcome prediction in a longitudinal sample with acute SCI.

Although LA features were specifically extracted to be clinically meaningful individually, they provide the most beneficial and comprehensive information when interpreted together [34]. Since all LA features are calculated using the same data set with minimal computational time to extract many features, one can obtain a versatile set of detailed features related to impairment with minimal collection burden.

Limitations

Although only nights “typical” to how the participant normally sleeps were included in the analysis, this measure was self-reported by the participants and it is possible that even during typical nights, LA were affected by unusual sleep patterns. Factors that may affect the LA data collection such as exercising, consuming alcohol, and daily and overall sleep quality were included in the initial covariate models to ensure that these factors were accounted for in the supplemental analysis. Participants were excluded if they self-reported a medical diagnosis of a condition that affects sleep. Given the high proportion of individuals with chronic SCI who have sleep-disordered breathing and the demographics of the sample, [62, 63] it is possible that participants with an undiagnosed sleep disorder were included in the sample. Further research should examine the differences in LA between typical and atypical nights, as well as the effect of sleep disorders on LA.

Although participants were asked to report any medications that they took that may affect sleep, details such as when these medications were taken and if participants used any medications to decrease spasticity were not explicitly recorded. The sample population had a lower proportion of participants with moderate to severe spasticity than expected. Although the results support LA being associated with up to moderate spasticity, future studies should assess the relationship between LA and severe spasticity and the effect of antispasmodic medications on LA.

It is possible that the demographics of the sample are not sufficiently representative of the general SCI population and differences in demographic factors could have affected the presented results. Additional analyses using a larger, representative sample of individuals with SCI should be explored to verify the current findings. Differences in LA based upon participant demographics, such as sex and completeness of injury, should also be explored.

The clinical measures of strength, sensation, and spasticity have limitations and are not an ideal gold standard for comparison [10, 64,65,66]. However, using summed measures of strength and sensation over the whole lower limbs, a categorized measure of lower limb spasticity, and only two clinicians for all assessments should minimize the effect of the limitations in reliability and responsiveness seen in the individual measurements [66,67,68].

For prediction models, it is critical that the model is assessed using a separate, unseen test set to avoid results that appear favorable, but perform poorly in practice. However, we do not intend to use LA as a predictor of impairment, as this is not clinically useful nor a goal of the current analysis. Thus, holding out a separate test set of samples or using a computationally intensive analysis such as nested cross-validation to assess the model performance on unseen data was not deemed necessary. Therefore, the results from this analysis are effective for estimating the relationship between LA and measures of impairment in our sample and demonstrating the validity of LA as a clinical measure. If prediction is a goal of a future analysis, then utilization of a larger sample and a strict validation method with an unseen test set would be required.

It is possible that the LASSO LARS and multinomial logistic regression with ℓ1 regularization algorithms that were used for feature selection were affected by noise in the input features and resulted in suboptimal feature selection or selection of redundant features. The use of 10-fold cross-validation was intended to minimize this possibility. Further, additional steps were taken (targeted participant recruitment, collection of multiple nights when possible, etc.) to minimize the bias in the data and maximize the generalizability of the findings.

Conclusion

Finding that LA measured during sleep are uniquely related to standard clinical measures of strength, sensation, and spasticity has provided evidence of construct and concurrent validity among a sample with chronic SCI. This demonstrates that features derived from LA are clinically meaningful metrics related to neuromuscular impairment that could be useful in many future applications including clinical prediction rules for ambulation after an acute SCI.