1 Introduction

In cross-country skiing, athletes with structural or functional impairment of the lower limbs, pelvis, or trunk compete on a sit-ski. Because these sit-skiers have full upper limb function, they generate propulsion with two poles using the double poling technique. To compete in cross-country sit skiing, athletes must have at least one of the seven eligible physical impairments: impaired muscle power, impaired passive range of movement, limb deficiency, leg length difference, hypertonia, ataxia, and athetosis.

To ensure equal and fair competition, sit-skiers with different eligible physical impairments undergo a functional classification process. As a result, skiers are assigned to one of five classes (LW10, LW10.5, LW11, LW11.5, and LW12) [1] according to the effect of the impairment on cross-country sit skiing performance [2]. Athletes in class LW10 are unable to maintain an unsupported sitting position. Athletes in classes LW10.5, LW11, and LW11.5 have some ability to control the trunk and keep balance while sitting, and this ability increases with class level. Athletes in class LW12 have complete trunk function [3].

Functional classification has the advantage of being sport-specific, but it is subjective. Athletes are assessed physically and technically by at least two experienced classifiers, as described in the World Para Nordic Skiing Classification Rules and Regulations [3]. In addition, the classification panel may observe athletes during a competition [3]. Because the current classification system for cross-country sit-skiers is subjective, high precision and reliability may be difficult to achieve, which calls the validity of the method into question [2]. Indeed, in 2007, the International Paralympic Committee recognized the need for objective, evidence-based classification in all sports and stated this in the International Paralympic Committee Classification Code [4].

For each Paralympic sport, evidence-based classification aims to define eligible types of impairment, set minimum impairment criteria, and identify and profile classes [5]. To achieve these goals, it is necessary to develop (i) reliable measures of impairment for classification purposes and (ii) sport-specific measures of performance determinants [5]. Regarding measures of impairment, core stability is a key issue in many seated sports. Methods to evaluate trunk impairment can be applied to different sports (e.g., wheelchair racing, wheelchair rugby, and para-canoeing) [6, 7]. Trunk strength impairments affect force generation during maximal voluntary contractions [8], whereas trunk control impairments influence trunk balance and range of motion during unpredictable balance perturbations [9].

Several Paralympic sports have initiated evidence-based classification research that considers measures of both physical impairment and performance. These include wheelchair racing [10, 11], wheelchair rugby [12,13,14,15], athletics [16, 17], throwing [18], and swimming [19]. In cross-country sit skiing, few studies have provided scientific evidence to support a classification system. Regarding measures of impairment, unpredictable balance perturbations in the sagittal plane have been used to assess trunk control ability [20]. To measure performance determinants, an adapted ergometer equipped with sensors was used in [21,22,23]. To the best of our knowledge, only one study has investigated the relationship between impairment and performance, and it involved athletes with intellectual impairment [24].

The measures of impairment reported in [20] represent an advance towards an evidence-based classification process for cross-country sit skiing. However, that study had limitations related to standardization: participants were tested in their personal sit-skis, with customized seats and personalized constraints such as straps and backrests. To address these limitations, this work presents a new device to measure trunk strength and trunk control impairments under standardized conditions [25].

The first aim of the present study was to assess whether the proposed measurements of trunk strength and control could be used for classification purposes, in compliance with the Position Stand [2] and subsequent documents [5, 26, 27]. The second aim was to identify the most influential measures for objectively assessing trunk strength and control. A cluster analysis, an approach previously applied to measures of impairment [20] and performance [23], was used to group athletes, and a minimum set of measures was then defined to assess trunk strength and trunk control. The authors hypothesized that the cluster analysis would group sit-skiers according to their trunk impairments, because trunk strength and trunk control affect a sit-skier’s ability to generate propulsive force and to keep balance on the sit-ski, respectively.

2 Methods

2.1 Participants

Fourteen elite cross-country sit-skiers (nine males and five females; mean ± standard deviation: 32 ± 6 years, 160 ± 18 cm, 55 ± 13 kg) volunteered as participants. Participants had different impairment conditions (spinal cord injury = five, spina bifida = two, lower limb amputation = seven). According to the functional classification process [1], they were assigned to the classes LW10.5 = one, LW11 = two, LW11.5 = three, and LW12 = eight. Research methods and protocols were approved by the University of Jyväskylä Ethics Committee. The procedures were performed in accordance with the Declaration of Helsinki. Before the tests, participants were informed about the test purpose and procedures, and they signed a consent form.

2.2 Materials

The tests took place in the Vuokatti Sports Technology Unit laboratory (University of Jyväskylä, Finland). The testing device consisted of a seat mounted on a motorized sledge and surrounded by an aluminum frame (Fig. 1). The device included a rubber seat cover to prevent participants from sliding or falling, two belts to fix the pelvis and thighs to the seat, two padded belts to hold the shanks, and two lateral bars to prevent lateral falls (Fig. 1).

Fig. 1

A new device for testing trunk strength and control impairment in standardized conditions

A horizontal pushing bar and a backrest were fixed to the frame, anteriorly and posteriorly, respectively. Both the bar and the backrest were adjustable in height and depth to fit the participants’ anthropometry. A tri-axial force sensor (K3D120; Me-Meßsysteme GmbH, Germany, full range 2 kN, resolution 0.4 N) was mounted between the horizontal pushing bar and the frame. A uniaxial force sensor (TB5; Lahti Precision, Finland, full range 5 kN, resolution 0.5 N) was mounted between the backrest and the frame. Two ropes with instrumented handles (uniaxial force sensors, University of Jyväskylä, Finland, full range 1 kN, resolution 0.3 N) [28] were hung from the anterior top of the frame. All force signals were synchronized (CED Power1401-3; CED Ltd., England) and collected at a sampling frequency of 100 Hz.

The sledge was electrically driven on a linear rail (LF 12S; Bosch Rexroth, Germany). It was actuated by a three-phase motor (MSK060C; Bosch Rexroth, Germany) and a servo drive (HCS01.1E; Bosch Rexroth, Germany). Sledge movement parameters (acceleration, velocity, and amplitude) were set by an operator using LabVIEW (National Instruments, USA). Maximum sledge acceleration, velocity, and displacement were set to ± 3 m/s², ± 1 m/s, and ± 0.4 m, respectively.

One inertial sensor (Physilog® 4; GaitUp, Switzerland) was fixed to the posterior right corner of the sledge and another over the participant’s seventh cervical vertebra. The inertial sensors sampled 3D acceleration (full range ± 16 g, sensitivity 2048 LSB/g) and 3D angular velocity [full range ± 2000°/s, sensitivity 16.4 LSB/(°/s)] at 500 Hz. The sensors were synchronized by an electronic trigger [29].
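As an illustration of what the stated sensitivities imply, the following Python sketch converts raw inertial-sensor counts to physical units. It assumes signed 16-bit raw samples, for which a ± 16 g range gives 65536/32 = 2048 LSB/g and a ± 2000°/s range gives 65536/4000 ≈ 16.4 LSB/(°/s); the helper name `counts_to_physical` is hypothetical and not part of the Physilog® software.

```python
import numpy as np

# Assumed: signed 16-bit raw samples; sensitivities follow from the stated ranges.
ACC_LSB_PER_G = 65536 / 32.0      # ±16 g full range -> 2048 LSB/g
GYR_LSB_PER_DPS = 16.4            # ±2000 °/s full range -> 65536/4000 ≈ 16.4 LSB/(°/s)
STANDARD_GRAVITY = 9.80665        # m/s^2
FS_IMU = 500.0                    # Hz, inertial sensor sampling frequency

def counts_to_physical(acc_counts, gyr_counts):
    """Hypothetical helper: raw counts -> (acceleration in m/s^2, angular velocity in °/s)."""
    acc = np.asarray(acc_counts, dtype=float) / ACC_LSB_PER_G * STANDARD_GRAVITY
    gyr = np.asarray(gyr_counts, dtype=float) / GYR_LSB_PER_DPS
    return acc, gyr

# One sample with the sensor at rest (z axis up) and a slow rotation about x.
acc, gyr = counts_to_physical([0, 0, 2048], [164, 0, -164])
print(acc, gyr, "sample period (s):", 1.0 / FS_IMU)
```

At 500 Hz, consecutive samples are 2 ms apart, which bounds the time resolution of any event detected on these signals.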

2.3 Protocol

All tests were conducted during the International Paralympic Committee World Cup in March 2016 in Vuokatti (Finland). The protocol consisted of isometric maximal voluntary contraction tests to evaluate strength impairment and balance control tests to evaluate trunk control impairment [25]. To evaluate test–retest reliability, the maximal voluntary contraction and balance control tests were performed on two occasions (day 1 and day 2), 24–36 h apart. To ensure standardized and comparable measures between days, participants did not train or compete in the 4 h before the tests. Test conditions and the participants’ posture were standardized to increase methodological consistency and reduce the effect of external factors.

The maximal voluntary contraction tests (Fig. 2) were: (a) upright seated press without using the backrest, (b) upright seated press using the backrest, and (c) simulated poling in which the participant pulled the two ropes isometrically.

Fig. 2

Isometric maximal voluntary contraction test conditions: (a) upright seated press without using the backrest; (b) upright seated press using the backrest; (c) simulated poling

The pelvis was fixed to the seat by the two straps to prevent participants from sliding on the seat. The lower limbs were suspended under the seat using two padded belts. The belts prevented voluntary or involuntary leg movements and stopped participants from pushing their feet against the ground to increase the force generated by the upper body.

In the upright seated press tests (a) and (b), participants were asked to keep their hands on the anterior pushing bar. The distance between the hands was set so that shoulder abduction in the frontal plane was close to zero. The anterior pushing bar was then adjusted to the participant’s anthropometry: (i) its height corresponded to the participant’s mesosternum, and (ii) its antero-posterior distance was set so that, in the sagittal plane, the elbow was flexed at 90° and the shoulder at 30° with respect to the trunk.

The participant’s back was 75 mm away from the backrest in test (a) and in contact with the backrest in test (b). The distance was set using a calibrated block placed between the participant’s back (with the trunk straight) and the backrest; the block was removed during data collection. Therefore, in the upright seated press with backrest, participants could push against the backrest to generate more force, whereas in the upright seated press without backrest, with the trunk straight and unsupported, participants generated force relying mostly on their core muscles.

The poling test (c) was designed to simulate the beginning of the pushing phase of double poling on flat terrain [30]. The arms were positioned in the sagittal plane with flexion angles of 120° at the elbow and 90° at the shoulder with respect to the trunk. For all maximal voluntary contraction tests, participants were instructed to exert maximal force and hold it for 3 s. Each participant performed three repetitions of each test, with 30 s between repetitions. A rest interval of 5–10 min separated the maximal voluntary contraction tests.

The balance control test consisted of forward and backward unpredictable perturbations, obtained using the electrically driven sledge. Participants were seated on the testing device, with their pelvis and lower limbs strapped, as for the maximal voluntary contraction tests. Participants were instructed to maintain their trunk in a straight position and their arms in a neutral position, relaxed along the body. Participants were forbidden to use their arms and hands to keep or recover the initial position.

The balance control test was performed by imposing three sledge accelerations (± 0.5, ± 1, and ± 2.5 m/s²) with a stroke of ± 0.3 m. For each acceleration, five forward and five backward stimuli were given, in random order and with random inter-stimulus intervals. In general, forward (backward) stimuli produce a backward (forward) trunk movement before physiological control mechanisms start influencing the movement. After each stimulus, the sledge returned to its initial position in the middle of the rail.
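The stimulus sequence can be sketched as below. This is a minimal Python illustration, not the LabVIEW operator software used in the study: it assumes one block per acceleration with the direction order shuffled within each block, and the 5–15 s bounds for the random inter-stimulus interval are hypothetical, since the text does not report them.

```python
import random

ACCELERATIONS = (0.5, 1.0, 2.5)   # m/s^2, magnitudes used in the protocol
REPS_PER_DIRECTION = 5            # five forward and five backward stimuli per acceleration
STROKE_M = 0.3                    # sledge stroke (m); the sign encodes the direction

def make_schedule(seed=None, min_gap_s=5.0, max_gap_s=15.0):
    """Return (acceleration, direction, displacement_m, gap_s) tuples.

    Assumptions (not reported in the study): one block per acceleration,
    and inter-stimulus gaps drawn uniformly between min_gap_s and max_gap_s.
    """
    rng = random.Random(seed)
    schedule = []
    for acc in ACCELERATIONS:
        block = ["forward"] * REPS_PER_DIRECTION + ["backward"] * REPS_PER_DIRECTION
        rng.shuffle(block)                             # random direction order
        for direction in block:
            displacement = STROKE_M if direction == "forward" else -STROKE_M
            gap = rng.uniform(min_gap_s, max_gap_s)    # random inter-stimulus time
            schedule.append((acc, direction, displacement, gap))
    return schedule

for stimulus in make_schedule(seed=42)[:3]:
    print(stimulus)
```

Randomizing both direction and timing is what keeps each perturbation unpredictable, so participants cannot pre-activate their trunk muscles in anticipation.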

2.4 Data analysis

For each maximal voluntary contraction test, the best of the three repetitions (highest force peak), computed from unfiltered data, was retained for analysis. The maximal voluntary contraction tests gave five measures of strength. The upright seated press without backrest yielded one measure: the peak anterior pushing force (PFawo). The upright seated press with backrest yielded two measures: (i) the peak anterior pushing force (PFaw) and (ii) the peak posterior pushing force (PFpw). For PFawo and PFaw, the magnitude of the force vector measured along the three axes was calculated, whereas PFpw was measured along the longitudinal axis. The simulated poling yielded one measure, calculated as the mean of the left and right peak pulling forces (PFp). The fifth measure was the ratio between PFawo and PFaw (PFawo/PFaw).
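The five strength measures can be computed as in the following Python sketch. The function names and input layout are hypothetical. The sketch also assumes that each peak is taken from its own best repetition and that, for simulated poling, the left and right peaks are averaged within each repetition before the best repetition is retained; the text does not state these details explicitly.

```python
import numpy as np

def peak_resultant(force_xyz):
    """Peak of |F| = sqrt(Fx^2 + Fy^2 + Fz^2) over one unfiltered trial of shape (n, 3)."""
    f = np.asarray(force_xyz, dtype=float)
    return float(np.max(np.linalg.norm(f, axis=1)))

def peak_uniaxial(force):
    """Peak of a uniaxial force signal (backrest sensor or poling handle)."""
    return float(np.max(np.asarray(force, dtype=float)))

def strength_measures(press_wo, press_w_bar, press_w_back, poling):
    """Five strength measures; each argument holds the repetitions of one test.

    poling: list of (left_handle_force, right_handle_force) pairs, one per repetition.
    """
    pf_awo = max(peak_resultant(t) for t in press_wo)      # best attempt, no backrest
    pf_aw = max(peak_resultant(t) for t in press_w_bar)    # best attempt, with backrest
    pf_pw = max(peak_uniaxial(t) for t in press_w_back)    # posterior (backrest) force
    pf_p = max((peak_uniaxial(l) + peak_uniaxial(r)) / 2.0 for l, r in poling)
    return {"PFawo": pf_awo, "PFaw": pf_aw, "PFpw": pf_pw,
            "PFp": pf_p, "PFawo/PFaw": pf_awo / pf_aw}
```

Using the resultant magnitude for the tri-axial bar sensor makes PFawo and PFaw insensitive to small misalignments between the pushing direction and the sensor axes.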

In the balance control test, the inertial sensor data were used to compute measures of trunk control. The inertial sensors were functionally calibrated based on an assisted upright posture (to define the trunk’s longitudinal axis) and on controlled forward–backward sledge movements (to define the trunk’s anterior–posterior axis) [29, 31]. Segment orientation was obtained by strapdown integration with static drift correction, with the azimuth orientation constrained to zero [32]. Trunk angle was computed as the angle between the trunk’s longitudinal axis and the vertical axis of the fixed global frame (i.e., the direction of gravity), obtained as the arccosine of their dot product. Forward trunk movements were defined as positive and backward movements as negative.
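A minimal Python sketch of the trunk angle computation follows. It assumes that the strapdown integration has already produced the trunk’s longitudinal axis as unit vectors in a global frame whose x-axis is anterior and whose z-axis is vertical (opposite to gravity); the forward-positive sign convention is taken from the text, while the function name is hypothetical.

```python
import numpy as np

def trunk_angle_deg(long_axis_global):
    """Signed sagittal trunk angle (deg) from the trunk longitudinal axis.

    long_axis_global: (n, 3) vectors of the trunk longitudinal axis in a global
    frame assumed to have x = anterior and z = vertical (up).
    """
    u = np.asarray(long_axis_global, dtype=float)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)   # re-normalize each sample
    vertical = np.array([0.0, 0.0, 1.0])
    cos_theta = np.clip(u @ vertical, -1.0, 1.0)       # dot product = cos(angle)
    theta = np.degrees(np.arccos(cos_theta))           # unsigned tilt from vertical
    return np.where(u[:, 0] >= 0.0, theta, -theta)     # forward (+x) tilt is positive
```

The arccosine alone gives only the magnitude of the tilt, so the sign has to come from the anterior component of the axis; with the azimuth constrained to zero, that component lies in the sagittal plane.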

The balance control tests gave 18 measures of trunk control. For each of the three acceleration values (0.5, 1.0, and 2.5 m/s²) and two perturbation directions (backward and forward), the following measures were evaluated: (i) trunk angle before the perturbation (Trunkrest), (ii) trunk reaction time (TRT), and (iii) trunk range of motion in the sagittal plane (Trunkrom). Trunkrest was calculated as the mean trunk angle in the 500 ms before each backward (Trunkrest-b) and each forward (Trunkrest-f) perturbation at the three accelerations. TRT was calculated as the time between the onset of trunk movement and the first inversion of its motion. These two instants were automatically detected on the trunk angular velocity signal as (i) the instant when the signal started increasing sharply from a baseline value and (ii) the instant when the signal crossed zero, respectively. Trunkrom was calculated as the difference between (i) the trunk angle when the trunk first inverted its motion and (ii) Trunkrest.
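The event detection described above can be sketched in Python as follows. The onset rule (angular velocity departing from its pre-stimulus mean by more than five baseline standard deviations, with a 5°/s floor) is a hypothetical stand-in for the study’s “sharp increase from a baseline value” criterion; the 500 ms rest window and the zero-crossing rule for motion inversion follow the text.

```python
import numpy as np

def trunk_response(theta_deg, omega_dps, stim_idx, fs=500.0, k_sd=5.0, floor_dps=5.0):
    """Return (Trunkrest, TRT in s, Trunkrom in deg) for one perturbation.

    theta_deg: sagittal trunk angle; omega_dps: sagittal angular velocity;
    stim_idx: sample index of stimulus onset. k_sd and floor_dps are
    hypothetical onset-detection parameters.
    """
    theta = np.asarray(theta_deg, dtype=float)
    omega = np.asarray(omega_dps, dtype=float)
    pre = slice(max(0, stim_idx - int(round(0.5 * fs))), stim_idx)  # 500 ms before stimulus
    trunk_rest = float(theta[pre].mean())
    base_mean, base_sd = omega[pre].mean(), omega[pre].std()
    threshold = max(k_sd * base_sd, floor_dps)
    departed = np.flatnonzero(np.abs(omega[stim_idx:] - base_mean) > threshold)
    if departed.size == 0:
        return trunk_rest, float("nan"), float("nan")   # no detectable response
    onset = stim_idx + departed[0]                        # trunk starts moving
    direction = np.sign(omega[onset])
    flipped = np.flatnonzero(np.sign(omega[onset:]) != direction)
    if flipped.size == 0:
        return trunk_rest, float("nan"), float("nan")   # motion never inverted
    inversion = onset + flipped[0]                        # first sample at or past zero crossing
    trt = (inversion - onset) / fs
    trunk_rom = float(theta[inversion] - trunk_rest)
    return trunk_rest, trt, trunk_rom
```

Averaging the three outputs over the five stimuli of each direction and acceleration gives the per-participant values (e.g., Trunkrom-b at 1 m/s²).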

For each participant and acceleration value, Trunkrest, TRT, and Trunkrom were averaged separately over backward (Trunkrest-b, TRTb, and Trunkrom-b) and forward (Trunkrest-f, TRTf, and Trunkrom-f) perturbations. All analyses were performed using custom scripts in MATLAB (Release 2015, The MathWorks, Inc., Natick, Massachusetts, USA).

2.5 Statistical analysis

Test repeatability between days 1 and 2 was evaluated to assess whether the proposed measures of impairment could be used for classification purposes. To assess test–retest reliability, the statistical difference between the 2 days (Wilcoxon test), the intraclass correlation coefficient (ICC; two-way mixed model, 95% confidence interval), and the standard error of measurement were computed for all analyzed measures [33, 34]. ICC was defined as poor (ICC < 0.50), moderate (0.50 ≤ ICC < 0.75), good (0.75 ≤ ICC < 0.90), or excellent (ICC ≥ 0.90) [34].
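For reference, a Python sketch of the reliability statistics follows. The text specifies a two-way mixed model but not whether consistency or absolute agreement was used; the sketch assumes the consistency form for single measures, ICC(3,1) in Shrout and Fleiss notation, and computes the standard error of measurement as SD × √(1 − ICC), which is one common definition. The Wilcoxon test is omitted here; `scipy.stats.wilcoxon` provides it.

```python
import numpy as np

def icc_3_1(data):
    """ICC(3,1): two-way mixed, consistency, single measure.

    data: (n_subjects, k_sessions) array, e.g. day 1 and day 2 in the columns.
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between participants
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between sessions
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

def sem(data, icc):
    """Standard error of measurement as SD * sqrt(1 - ICC) (one common definition)."""
    return float(np.std(np.asarray(data, dtype=float), ddof=1) * np.sqrt(1.0 - icc))

def icc_label(icc):
    """Qualitative bands used in this study."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"
```

Because the consistency form removes the session (day) effect, a uniform shift between days does not lower ICC(3,1); the Wilcoxon test is what checks for such a systematic difference.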

To identify the measures that best group participants according to their trunk strength and control, a cluster analysis was performed on the measures of strength (maximal voluntary contraction tests) and control (balance control tests). A similar cluster analysis approach was previously adopted to group cross-country sit-skiers based on trunk impairment in a balance perturbation test [20] and on performance during simulated poling on an ergometer [23]. A similar approach was also used to evaluate strength impairment in wheelchair racers [6].

Day 1 data from the 14 participants were used. Before the cluster analysis, the data were prepared as follows. Data were checked for outliers using the ± 3 standard deviation criterion. Only measures with a coefficient of variation above 5% were included in the cluster analysis. Because the measures had different scales, they were normalized as z-scores before running the cluster analysis.
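The preparation steps can be sketched in Python as follows, for a participants × measures matrix. The function name is hypothetical, and using the sample standard deviation (ddof = 1) and the absolute mean in the coefficient of variation are assumptions; the latter keeps the criterion meaningful for measures with negative values, such as backward trunk angles.

```python
import numpy as np

def preprocess(X, cv_threshold=0.05, n_sd=3.0):
    """Prepare a participants x measures matrix for k-means.

    Returns (z-scored matrix of kept columns, kept column indices, outlier mask).
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    outliers = np.abs(X - mu) > n_sd * sd              # ±3 SD rule, flagged per value
    with np.errstate(divide="ignore"):
        cv = sd / np.abs(mu)                           # coefficient of variation
    keep = np.flatnonzero(cv > cv_threshold)           # drop near-constant measures
    Z = (X[:, keep] - mu[keep]) / sd[keep]             # z-score: comparable scales
    return Z, keep, outliers
```

The z-score step matters because k-means uses Euclidean distance: without it, forces in newtons would dominate angles in degrees and times in seconds.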

A k-means cluster analysis [35] was run to empirically categorize participants on the basis of their trunk strength and control. Running k-means requires the number of clusters to be defined in advance. Previous studies on classification in cross-country sit skiing [20, 23] worked best with two or three clusters. Therefore, in the present study, participants were divided a priori into three clusters (cluster_1, cluster_2, and cluster_3). Compared with a two-cluster solution, three clusters were expected to represent a wider range of impairment (i.e., absent, partial, or full trunk control).
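The k-means step can be illustrated with a plain implementation of Lloyd’s algorithm with random restarts. This is a sketch, not the MATLAB routine used in the study; the number of restarts and the seed are arbitrary choices.

```python
import numpy as np

def kmeans(Z, k=3, n_init=50, max_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns (labels, centroids, inertia)."""
    Z = np.asarray(Z, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = Z[rng.choice(len(Z), size=k, replace=False)]  # random initial seeds
        for _ in range(max_iter):
            d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)                               # assignment step
            new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])   # update step
            if np.allclose(new, centroids):
                break
            centroids = new
        inertia = ((Z - centroids[labels]) ** 2).sum()            # within-cluster SS
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best
```

Because k-means labels are arbitrary integers, the resulting clusters still have to be ordered (for example, by mean strength) before they can be named cluster_1 to cluster_3 and compared with the a priori groups.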

The cluster analysis was validated using internal and external criteria. For internal validation, principal component analysis [35] was used to represent the data in the space of the first two principal components and visualize cluster formation. For external validation, the k-means output (participants gathered in clusters) was compared with a priori information [36] (participants gathered in groups). To this end, three groups of participants were defined a priori according to their trunk control ability, as described in [1]. Group_1: participants with absent or limited trunk control and ability to keep balance (LW10.5). Group_2: participants with fair trunk control and ability to keep balance (LW11). Group_3: participants with complete or near-complete trunk control and ability to keep balance (LW11.5–LW12) [1]. External validation was assessed by calculating the accuracy, precision, and sensitivity [37] of the k-means classifier. In addition, the macro-averaging F score, which ranges between 0 and 1, was calculated as the harmonic mean of precision and sensitivity [38].
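Both validation steps can be sketched in Python as below. The function names are hypothetical. The sketch assumes that cluster labels have already been mapped so that cluster j corresponds to group j, and computes the macro-averaging F score as the harmonic mean of the macro-averaged precision and sensitivity; the example labels in the test are illustrative, not study data.

```python
import numpy as np

def pca_2d(Z):
    """Project z-scored data onto its first two principal components (via SVD)."""
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0)
    _, _, vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ vt[:2].T

def external_validation(groups, clusters, k=3):
    """Accuracy, per-class precision and sensitivity, and macro-averaging F score.

    groups: a priori labels 0..k-1; clusters: k-means labels mapped to the same indices.
    """
    cm = np.zeros((k, k), dtype=int)                    # rows = groups, cols = clusters
    for g, c in zip(groups, clusters):
        cm[g, c] += 1
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)      # per cluster
    sensitivity = tp / np.maximum(cm.sum(axis=1), 1)    # per group
    p_m, r_m = precision.mean(), sensitivity.mean()
    f_macro = 2 * p_m * r_m / (p_m + r_m)               # harmonic mean of macro P and R
    return tp.sum() / cm.sum(), precision, sensitivity, f_macro
```

Precision is computed per cluster and sensitivity per group, matching how the results report the lowest precision for a cluster and the lowest sensitivity for a group.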

The contribution of each measure to cluster formation was quantified using the Kruskal–Wallis test (with Fisher’s least significant difference post hoc test), and the effect size was assessed. Statistical significance was set at α = 0.05; analyses were performed in MATLAB (Release 2015, The MathWorks, Inc., Natick, Massachusetts, USA). To determine statistical power, a post hoc power analysis (software: G*Power; test: ANOVA, fixed effects, omnibus, one-way; α = 0.05) was conducted on the measures that contributed most to clustering athletes. Spearman correlations were calculated between the maximal voluntary contraction and balance control test measures.
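The rank-based statistics can be sketched in Python as follows. The text does not state which effect-size index was used; the sketch assumes eta-squared based on H, (H − k + 1)/(n − k), and omits the tie correction of H and the post hoc comparisons. `scipy.stats.kruskal` and `scipy.stats.spearmanr` can be used to cross-check the results.

```python
import numpy as np

def rank_avg(x):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                                      # extend the run of ties
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0     # average of ranks i+1..j+1
        i = j + 1
    return ranks

def kruskal_wallis_h(values, labels):
    """H statistic (no tie correction) and eta^2_H = (H - k + 1) / (n - k)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    r = rank_avg(values)
    n = len(values)
    groups = np.unique(labels)
    k = len(groups)
    h = 12.0 / (n * (n + 1)) * sum(r[labels == g].sum() ** 2 / np.sum(labels == g)
                                   for g in groups) - 3 * (n + 1)
    return h, (h - k + 1) / (n - k)

def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of the average ranks."""
    return float(np.corrcoef(rank_avg(x), rank_avg(y))[0, 1])
```

With only 14 participants and one participant in cluster_1, rank-based tests such as these are a natural choice, since they do not assume normally distributed measures.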

3 Results

Test–retest reliability was very good (Table 1). It was evaluated on 12 of the 14 participants, because two of them performed the test protocol only on day 1. No significant differences between days 1 and 2 were found for any maximal voluntary contraction or balance control measure.

Table 1 Results of test–retest reliability between day 1 and day 2

ICC values for the maximal voluntary contraction measures ranged from 0.71 to 0.98: the lowest value, for PFawo/PFaw (ICC = 0.71), was moderate, whereas all other strength measures showed excellent reliability. In the balance control test, good to excellent ICC values were found in both stimulus directions for Trunkrom at the three accelerations (0.94 < ICC < 0.96) and for TRT at ± 1 and ± 2.5 m/s² (0.83 < ICC < 0.99) (Table 1) [34]. In contrast, ICC was poor for Trunkrest in backward and forward perturbations at the three accelerations, and for TRT in both perturbation directions at the lowest acceleration (0.5 m/s²). The standard error of measurement was low for all maximal voluntary contraction and balance control measures (Table 1).

No outliers were found in the data, and all measures showed a coefficient of variation above the 5% threshold. Therefore, all five measures of strength and all 18 measures of trunk control were included in the k-means analysis. The cluster analysis allocated participants to the three clusters as follows: one in cluster_1 (high impact of impairment), four in cluster_2 (middle impact of impairment), and nine in cluster_3 (low impact of impairment).

For internal validation, the data were represented in the space of the first two principal components (Fig. 3), where three clusters were clearly visible, supporting the hypothesis. External validation showed an accuracy of 86%; results for precision and sensitivity are reported in Table 2. The lowest precision was found for cluster_2 (50%) and the lowest sensitivity for group_3 (81.8%). The macro-averaging F score was 0.89, showing an overall balance between precision and sensitivity for the classification model used.

Fig. 3

Internal validation. Data representation in the space of the first two components of principal component analysis (PC1 and PC2) clearly showed three clusters

Table 2 Results for external validation

For each identified cluster, the means ± standard deviations of all measures were calculated (Table 3). The Kruskal–Wallis test and effect size identified the measures most relevant to cluster formation (last two columns of Table 3).

Table 3 Results for measures relevance

Concerning the maximal voluntary contraction test, PFawo and PFawo/PFaw were the two measures that contributed most to distinguishing between cluster_2 and cluster_3 (effect size > 0.72). Concerning the balance control test, Trunkrom discriminated cluster_3 from cluster_1 and from cluster_2 for both forward and backward stimuli at 1 m/s² (effect size > 0.75). No trunk control measure discriminated between cluster_1 and cluster_2. For the four measures that contributed most to clustering participants, the post hoc statistical power exceeded 80%.

Spearman correlation results showed that PFawo was inversely correlated with Trunkrom at all balance control accelerations and in both stimulus directions (Table 4). Trunkrom in both stimulus directions at 1 m/s² also correlated with PFaw and with PFawo/PFaw.

Table 4 Results for Spearman correlation between maximal voluntary contraction and balance control measures

4 Discussion

Overall, ICC, standard error of measurement, and Wilcoxon test showed very good results for the repeatability of impairment measures. Accuracy, precision, sensitivity, and macro-averaging F score showed very good results for the cluster analysis validation. Results of Kruskal–Wallis test, statistical power, and Spearman correlation showed that two measures of strength and two measures of trunk control were relevant in grouping participants in three clusters according to their impairment.

4.1 Measures for the classification purposes

The Position Stand and subsequent documents on the development of evidence-based classification systems [2, 5, 27] state the importance of identifying measures of impairment that can be used for classification purposes. Measures of impairment should be specific to only one of the eligible impairments, reliable, representative of the greatest possible variance in sport performance, quantitative, and ratio scaled [5]. These requirements are discussed individually below.

The present study evaluated two different measurements, each specific to one eligible impairment (trunk strength and trunk control). The tests to assess trunk strength (upright seated press and simulated poling) are selective and allow participants to produce maximal force [13, 39]. In addition, isometric maximal voluntary contraction tests were selected to evaluate strength impairment because isometric tests have been successfully adopted for this purpose in wheelchair racing [6, 8]. Finally, maximal voluntary contraction tests have been recommended to assess muscle strength for classification in Paralympic sports [39].

Core stability has been defined as the ability to maintain or recover trunk equilibrium after a perturbation [40]. Therefore, unpredictable perturbations have been used in the literature to assess trunk stability and trunk control ability [9, 41]. Trunk stability can be maintained by a combination of passive stiffness (ligaments and bones) and active stiffness (muscles), with the latter playing a greater role [42]. Stability can be increased by co-contraction of the trunk flexor and extensor muscles and by increasing intra-abdominal pressure [43]. Athletes with physical impairments such as spinal cord injury have a reduced or absent ability to voluntarily activate core muscles, depending on the level of impairment.

Balance control tests have previously been used to evaluate trunk control in cross-country sit-skiers seated on their personal sledges [20]. In the current study, balance control tests were performed at three different accelerations to find one that was comfortable for participants but high enough to be influenced by the impairment. During skiing, when a sudden loss of balance occurs, trunk accelerations are similar to those used in this study. However, the trunk acceleration at the beginning of the poling phase is expected to be much lower, and this condition was not taken into account in the balance control test.

The proposed measurements of trunk strength and trunk control impairment also appear to be reliable, given the test–retest results between days 1 and 2 (no statistical differences, high intraclass correlation coefficients, and low standard error of measurement). Some intraclass correlation coefficient results, however, deserve further discussion. The first concerns the maximal voluntary contraction tests: the ratio between the anterior pushing forces without and with a backrest reached a lower intraclass correlation coefficient (0.71) than the other four measures of strength (> 0.93). This may be because the ratio is a derived measure, whereas the other four measures were acquired directly. The variability of the two forces used to calculate the ratio combines and may reduce its intraclass correlation coefficient, which nevertheless still indicates moderate reliability [34].
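As a rough guide to why a derived ratio tends to be less repeatable than its components, a first-order (delta-method) error propagation can be written for R = PFawo/PFaw, where σ denotes the measurement error of each quantity and ρ the correlation between the two errors. This is an illustrative approximation, not an analysis performed in the study:

```latex
\left(\frac{\sigma_R}{R}\right)^2 \approx
\left(\frac{\sigma_{PF_{awo}}}{PF_{awo}}\right)^2
+ \left(\frac{\sigma_{PF_{aw}}}{PF_{aw}}\right)^2
- 2\rho\,\frac{\sigma_{PF_{awo}}}{PF_{awo}}\,\frac{\sigma_{PF_{aw}}}{PF_{aw}}
```

For example, with uncorrelated errors (ρ = 0) and a hypothetical 5% relative error in each force, the relative error of the ratio is about √(0.05² + 0.05²) ≈ 7.1%. For the same between-participant spread, a larger measurement error lowers the ICC.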

The other two intraclass correlation coefficient results that require further discussion concern the balance control test. The low intraclass correlation coefficient of trunk angle at rest, in both perturbation directions and at all three accelerations, suggests that the variability within participants, both within a day and between the 2 days, was comparable to the variability between participants. This could occur because trunk angle at rest, measured before stimulus onset, does not reflect participants’ behavior in response to the perturbations. Indeed, within-participant variability may have made the greater contribution. Therefore, trunk angle at rest does not seem to be a suitable measure from a reliability perspective.

Moreover, at the lowest acceleration (0.5 m/s²), trunk reaction time had a low intraclass correlation coefficient in both perturbation directions. This finding may suggest that participants could easily manage and control the perturbation at the lowest acceleration. Therefore, 0.5 m/s² seems to be the least suitable of the three candidate accelerations for measuring trunk control.

The proposed measurements of impairment are intended to represent the greatest possible variance in sport performance [5]: trunk strength and trunk control, which affect propulsive force generation and balance on the sit-ski, respectively, can both be assessed with maximal voluntary contraction and balance control tests on a single device. Finally, measures of impairment should be quantitative and ratio scaled. These requirements are also satisfied, since numeric measures (rather than qualitative or graded scales) were used to assess the level of impairment.

4.2 Measures to assess trunk impairment

Cluster analysis is a transparent scientific process and, if successfully validated, can contribute to defining a class structure for classification in Paralympic sports [6]. This has also been shown in studies focused on cross-country sit skiing [20, 23]. In the present study, the cluster analysis validation showed promising results for the proposed classification model. Internal validation revealed three clusters (Fig. 3), and external validation results (accuracy, precision, sensitivity, and F score) showed overall agreement between the three clusters (cluster analysis output) and the three groups (a priori information based on participants’ trunk control and balance ability [1]).

Although cluster_1 contained only one participant (classified as LW10.5), the cluster analysis results seem to be in line with the current classification system. Only 2 of the 14 participants were allocated to a cluster different from the expected one. Both were expected to belong to cluster_3, since they were classified as LW11.5 or LW12. However, one of them has partial trunk control ability due to a spinal cord injury, which seems more consistent with the partial trunk control that characterizes cluster_2. Whether cluster analysis yields a number of classes similar to that of the current classification system remains to be studied with more participants. Such a study would benefit from including more sit-skiers from the lower classes (LW10 and LW10.5) and able-bodied skiers as an upper benchmark.

A second important outcome of the cluster analysis was the identification of the measures that best discriminated participants’ impairment (Table 3). Among the trunk strength measures, neither the anterior nor the posterior force in the upright seated press with backrest contributed significantly to discriminating between clusters. The posterior force was collected to check that all forces were consistent. However, since the posterior force was close to the anterior force, it is advisable not to consider it further.

The anterior force in the upright seated press without backrest and the ratio between the two anterior forces (without/with backrest) had the greatest effect on cluster differentiation. Therefore, they may be sufficient for classification purposes, and the simulated poling test may be unnecessary. However, the ratio of the two anterior forces showed the lowest reliability. In further studies, a possible way to reduce variability and increase reliability could be to use the mean force over a time interval instead of the peak value alone; a suitable time interval would need to be determined.
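The alternative suggested above, a mean over a time interval instead of the single peak, can be sketched in Python as the highest moving-window mean of the force signal. The 1 s default window is a hypothetical choice, since the text leaves the interval to be determined; the 100 Hz default matches the force sampling frequency.

```python
import numpy as np

def peak_force(f):
    """Single highest sample, as used in the present study."""
    return float(np.max(np.asarray(f, dtype=float)))

def best_window_mean(f, fs=100.0, window_s=1.0):
    """Highest mean force over any contiguous window of window_s seconds.

    window_s is a hypothetical choice; the text leaves the interval open.
    """
    f = np.asarray(f, dtype=float)
    w = int(round(window_s * fs))
    if w < 1 or w > len(f):
        raise ValueError("window must fit inside the recording")
    c = np.cumsum(np.insert(f, 0, 0.0))       # prefix sums give O(n) moving means
    means = (c[w:] - c[:-w]) / w
    return float(means.max())
```

A brief spike dominates the peak but contributes only 1/w of any window mean, which is why averaging over the 3 s hold may be less sensitive to transient force fluctuations.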

Concerning the balance control test, trunk angle at rest cannot be considered a measure of trunk control because of its low reliability and its non-significant contribution to discriminating among clusters. Trunk reaction time, being significant only for backward perturbations at 2.5 m/s², does not seem to be a suitable measure of trunk control either. Trunk range of motion showed the greatest effect size at the three accelerations and for both forward and backward stimuli, the only exception being forward stimuli at 2.5 m/s². Comparing the three accelerations, ± 1 m/s² gave the greatest discrimination among the three clusters, whereas the other two (± 0.5 and ± 2.5 m/s²) distinguished only between cluster_2 and cluster_3.

These findings are also in line with the correlation results (Table 4), which highlight high correlations between the two trunk control measures and the two trunk strength measures. Trunk range of motion in forward and backward perturbations at 1 m/s² correlated with the anterior force without backrest and with the ratio of the two anterior forces (without/with backrest) collected in the strength tests. Therefore, these four measures can be identified as the most descriptive of participants’ characteristics. However, it is difficult to compare this result with previous work that used unpredictable perturbations for classification [20], since that study used only the ± 2.5 m/s² acceleration, found two rather than three clusters, and tested participants on their own sit-skis instead of a standard seat.

Overall, the results indicate that a shorter protocol with fewer measures than those used here might suffice to assess trunk strength and control impairment. However, to validate this finding, further analysis with more participants that accounts for possible gender effects is required.

4.3 Limitations

This study has two main limitations. The first is the small sample size, which was skewed towards participants with low impairment; such a sample may make cluster analysis difficult. However, the four measures that contributed most to clustering participants had statistical power above 80%. Testing more sit-skiers would corroborate the current results for cluster_2 and cluster_3 and would allow generalization of the results for cluster_1, which included only one sit-skier. Since the number of elite cross-country sit-skiers worldwide is low, athletes from other sitting sports, especially those with high trunk impairment, could be included.

The second main limitation concerns the inclusion of both male and female participants, especially in the maximal voluntary contraction tests. To limit the gender effect on force, the ratio between the anterior forces was calculated. Both genders were included because, in all maximal voluntary contraction tests, the generated strength showed an overall increasing trend from cluster_1 to cluster_3 (i.e., with decreasing impairment), and this trend appears more evident than the gender effect (Fig. 4).

Fig. 4

Generated strength for male (black triangle) and female (white circle) participants in the three clusters for the maximal voluntary contraction tests

5 Conclusion

The development of evidence-based classification in Paralympic sports has become mandatory. An evidence-based classification system relies on measures of impairment that can be used for classification purposes and on sport-specific measures of performance. This study aimed to develop and evaluate measures of trunk strength and trunk control impairment that can be used for classification purposes. The current findings suggest that, by performing trunk strength and balance control tests and adopting a clustering approach, sit-skiers can be grouped into three clusters consistent with their trunk control and balance ability. However, these findings should be validated in a larger sample, taking gender effects into account. Since trunk impairment is crucial in many sports, the results found for cross-country sit skiing may also be extended to other sitting sports.