Background

Gait analysis provides highly relevant outcomes for the older population. It reflects both impairment-level deficits and functional status [1–3]. Temporal-spatial gait variables have repeatedly been shown to be important for identification of injury/disease [4–6], prediction of falls [7, 8], and quantification of the effect of interventions [9, 10]. In particular, gait speed has been associated with health status, activity levels and quality of life, and is predictive of future morbidity and mortality [11–14].

The GAITRite® system is a well-established method of quantifying gait. Over 200 papers using data collected and processed with the GAITRite® system have been published since 2000. The measurement properties of a large number of temporal and spatial outcomes derived from GAITRite® data have been reported (e.g. [15–17]). Recently, a new program has been developed to address some of the problems with processing difficult footstep patterns, such as overlapping steps and turns. The PKmas® software purports to derive temporal-spatial outcomes accurately from raw GAITRite® data. However, to interpret clinical and research findings from PKmas®-processed gait data, and to compare them with published data obtained with the GAITRite® system, the inter-program reliability of the two processing algorithms needed to be examined. A direct comparison of outcomes from the same walk trials would allow the variability caused by the processing program alone to be determined, independent of other sources of noise in the data.

This study examined the level of agreement and inter-program variability between the two processing programs, using data from older people walking at their self-selected, preferred speed on a GAITRite® mat. Very high agreement for an outcome variable would indicate that the variable is interchangeable regardless of the program used to process it. Systematic differences, if known, can be taken into account when making comparisons. Lower agreement due to a random spread of differences would suggest that the outcome may differ importantly when processed with PKmas®, and that its reliability and validity should not be assumed to be the same as with GAITRite®.

Methods

Participants

Data from two groups of participants were used for this study. The first group consisted of 100 healthy older people from the community in Trondheim, Norway, who were recruited for the Generation 100 study, an exercise intervention study (ClinicalTrials.gov identifier: NCT01666340). The second group included 50 older people who were tested four months after surgical repair of a hip fracture; all were part of the Trondheim Hip Fracture Trial [18]. All participants gave written informed consent to participate in their respective studies. Ethical approvals for the studies, which included the use of their data for purposes of cross-sectional and methods analyses, were granted by the Norwegian Ethical Review Board for Medical and Health Research (REK) – South East Region (2013/787b) and the Regional Committee of Ethics in Medical Research (Mid-Norway) (REK4.2008.335) respectively.

Procedures

For the healthy group, baseline GAITRite® (CIR Systems Inc, Havertown, PA) raw data were collected using a mat with an active length of 5.5 m. Participants were asked to walk along the walkway at their preferred (usual) speed, starting and stopping at least 1 m outside the ends of the mat (total walkway length at least 8.7 m). The hip fracture group were similarly asked to walk at their preferred speed along a 4.7 m GAITRite® mat (total walkway length at least 7.7 m). Only the first pass was used for this study.

The raw data were processed with both GAITRite® (v3.8E) and PKmas® (v507C4i3) (ProtoKinetics, Havertown, PA) software and exported to Excel. After processing, all walks were checked to ensure that the same steps, and the same number of steps, were used by both processing methods. Thirteen healthy participants and six hip fracture participants were excluded because a different number of steps was retained by the two programs during processing of the walk files. A slight variation in which footfalls are retained would lead to small differences in the outcome values. Such differences are likely to be clinically insignificant, but we wanted to exclude all sources of variation apart from those caused by the different software algorithms. We noted that when a walk had two or fewer footfalls with one foot, PKmas® did not calculate the standard deviation (SD) for ipsilateral Stride Length, Step Length, Stride Duration, Step Duration and Base Width, and GAITRite® did not calculate the SD of Stride Length, Stride Duration and Base Width. When no SD is calculated, PKmas® exports a blank cell to Excel, whereas GAITRite® exports a zero, which creates an error when the right and left values are averaged. For this reason, we excluded walks with fewer than six footfalls in total. One healthy participant was excluded for this reason.
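The averaging problem described above can be illustrated with a minimal Python sketch. It assumes the exported left and right SDs have already been read into Python, with a blank PKmas® cell represented as `None` and a GAITRite® zero kept as `0.0`. The function names and example values are hypothetical and are not part of either program's export format; only the six-footfall threshold comes from this study.

```python
from typing import Optional

# Exclusion rule used in this study: walks with fewer than six footfalls in total.
MIN_FOOTFALLS = 6


def walk_is_usable(n_footfalls: int) -> bool:
    """Return True if the walk has enough footfalls to be retained."""
    return n_footfalls >= MIN_FOOTFALLS


def mean_left_right(left: Optional[float], right: Optional[float]) -> Optional[float]:
    """Average the left and right values of one variable.

    A missing value (None, as for a blank exported cell) makes the average
    missing instead of being silently treated as zero.
    """
    if left is None or right is None:
        return None
    return (left + right) / 2


# Hypothetical Stride Length SDs (cm) for a walk in which the right foot had
# too few footfalls for an SD to be calculated.
left_sd = 3.2
pkmas_right = None      # blank cell -> missing
gaitrite_right = 0.0    # zero exported in place of a missing SD

print(mean_left_right(left_sd, pkmas_right))     # None: flagged as missing
print(mean_left_right(left_sd, gaitrite_right))  # 1.6: an artefact, half the left SD
print(walk_is_usable(5))                         # False
```

Treating the exported zero as a real SD halves the averaged value without any warning, which is why such walks were excluded rather than averaged.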

Outcome variables

There are many gait variables that can be derived from data collected with GAITRite® mats. The outcome variables compared in this study were chosen from those previously reported in validity and/or reliability studies using the GAITRite® system (e.g. [15–17]; further information is provided in Additional file 1: A). The included variables were those calculated from the footfalls themselves, rather than variables derived from other gait variables; thus symmetry variables and composite scores were not examined. Exceptions to this are Speed, which combines Stride Length and Stride Duration, and the ‘percentage of gait cycle’ variables. For all variables apart from Speed and Cadence, the mean of the left and right values was calculated and used as a single data point for the variable.

Statistical analyses

For each group and outcome variable, the mean difference between the values from the two programs and the mean absolute percentage difference (the absolute difference expressed as a percentage of the GAITRite® value, averaged across participants) were calculated to quantify the magnitude of the differences between the processing algorithms. A mean of signed percentage differences underestimates the error at the individual level when differences are both positive and negative, because they cancel out; the mean absolute percentage difference therefore better indicates the size of the error for individuals. The mean differences for the total cohort are also presented, expressed as a percentage of the mean GAITRite® value. Intraclass correlation coefficients (ICC) for absolute agreement (2,1) and consistency (3,1) were calculated for each pair of outcomes to determine inter-program reliability [19]. Absolute agreement indicates how close the values obtained with the two programs are for each individual, while consistency indicates relative agreement, that is, agreement regardless of systematic error [20]. The Bland-Altman method was used to calculate the 95% limits of agreement (LOA) to show the spread of differences [21], and plots of the difference against the mean were inspected to identify heteroscedasticity in the differences over the range of values.
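The difference statistics described above can be sketched in Python using only the standard library. This is an illustrative implementation, not the software used in the study: the sign convention (PKmas® minus GAITRite®), the function name and the example values are assumptions, and percentage differences are undefined when a GAITRite® value is zero.

```python
import statistics


def agreement_summary(gaitrite, pkmas):
    """Compare paired values of one outcome variable from the two programs.

    Differences are taken as PKmas minus GAITRite (an assumed convention),
    so a negative mean difference means PKmas values are lower.
    """
    gaitrite, pkmas = list(gaitrite), list(pkmas)
    if len(gaitrite) != len(pkmas) or len(gaitrite) < 2:
        raise ValueError("need at least two paired values")
    if any(g == 0 for g in gaitrite):
        raise ValueError("percentage differences are undefined for zero GAITRite values")

    diffs = [p - g for g, p in zip(gaitrite, pkmas)]
    pct = [100 * d / g for d, g in zip(diffs, gaitrite)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences

    return {
        "mean_diff": mean_diff,
        # Mean difference as a percentage of the mean GAITRite value.
        "mean_diff_pct": 100 * mean_diff / statistics.mean(gaitrite),
        # Signed percentages can cancel out across participants ...
        "mean_pct_diff": statistics.mean(pct),
        # ... absolute percentages cannot, so they reflect individual error.
        "mean_abs_pct_diff": statistics.mean([abs(x) for x in pct]),
        # Bland-Altman 95% limits of agreement: mean difference +/- 1.96 SD.
        "loa": (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff),
    }


# Hypothetical values: each participant differs by 10%, in alternating directions.
summary = agreement_summary([10.0, 10.0, 10.0, 10.0], [11.0, 9.0, 11.0, 9.0])
print(summary["mean_pct_diff"])      # 0.0
print(summary["mean_abs_pct_diff"])  # 10.0
```

The example mirrors the reasoning in the text: the signed mean percentage difference is zero even though every individual value differs by 10%.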

Results

The final cohort consisted of 86 healthy and 44 hip fracture participants, with a mean age ± SD of 72.0 ± 1.3 years and 82.7 ± 6.0 years, respectively. Fifty-six percent of the healthy group and 82% of the hip fracture group were women. Table 1 presents the means for each group, program and variable, together with the mean difference between the values generated by the two programs and the mean absolute percentage difference. The mean differences between programs were similar for both groups of participants, although the mean absolute percentage difference for the variability measures was sometimes higher in the healthy group, because SD values tended to be lower among the healthier older people and the same absolute difference therefore represents a larger percentage.

Table 1 Data for each outcome variable

Table 2 presents the ICCs, the mean differences for the total cohort and the LOA. The inter-program reliability was very high (both ICCs ≥ 0.99, p < 0.001) for Speed, Cadence, Stride Length, Step Length, Stride Duration, Step Duration, Stance Duration, Swing Duration, Double Support Duration, Stance%, Double Support% and Stride Duration Variability. ICC(2,1) showed absolute agreement above 0.95 for all other variables except Base Width (0.86) and Step Length Variability (0.84). ICC(3,1) was similar to ICC(2,1) for all measures except Base Width, for which consistency was very high at 0.97. High consistency but lower absolute agreement indicates a systematic difference between the programs in Base Width values.
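Why a systematic offset lowers absolute agreement but not consistency can be shown with a small Python sketch of ICC(2,1) and ICC(3,1), computed from two-way ANOVA mean squares in the Shrout and Fleiss formulation. It is a hand-rolled illustration, not the software used in the study; it omits confidence intervals and p-values, and the Base Width values are hypothetical.

```python
import statistics


def _mean_squares(a, b):
    """Two-way ANOVA mean squares for n subjects each measured by k = 2
    'raters' (here, the two processing programs)."""
    a, b = list(a), list(b)
    n, k = len(a), 2
    grand = statistics.mean(a + b)
    row_means = [(x + y) / 2 for x, y in zip(a, b)]
    col_means = [statistics.mean(a), statistics.mean(b)]

    ss_total = sum((x - grand) ** 2 for x in a + b)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between programs
    ss_err = max(ss_total - ss_rows - ss_cols, 0.0)         # guard against rounding

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return n, k, msr, msc, mse


def icc_2_1(a, b):
    """Absolute agreement: a systematic difference between programs counts as error."""
    n, k, msr, msc, mse = _mean_squares(a, b)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_3_1(a, b):
    """Consistency: a systematic difference between programs is ignored."""
    n, k, msr, msc, mse = _mean_squares(a, b)
    return (msr - mse) / (msr + (k - 1) * mse)


# Hypothetical Base Width values (cm); the second program reads 1.6 cm lower.
gaitrite = [8.0, 9.0, 10.0, 11.0, 12.0]
pkmas = [g - 1.6 for g in gaitrite]
print(round(icc_3_1(gaitrite, pkmas), 3))  # 1.0
print(round(icc_2_1(gaitrite, pkmas), 3))  # 0.661
```

With a pure 1.6 cm offset and no random error, consistency is perfect while absolute agreement falls to about 0.66: an exaggerated version of the Base Width pattern reported above.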

Table 2 Intraclass correlations and limits of agreement

The mean differences between the two programs were very small relative to the magnitudes of the variables themselves for all measures apart from Base Width (mean difference −1.6 cm, or 17.4% of the mean GAITRite® value) and Foot Angle (mean difference 0.7°, or 9.7% of the mean GAITRite® value). Mean absolute percentage differences showed that individual differences could be large for all of the variability measures except Stride Duration Variability. Mean absolute percentage differences were also large for Base Width (around 20%; differences ranged from −4.1 to 0.4 cm) and Foot Angle (differences ranged from −2.6 to 3.5°). The differences were especially large for Foot Angle, with a mean absolute percentage difference for the cohort of 57%.

Scatter plots and Bland-Altman plots for Speed, Base Width, Step Length Variability and Stride Duration Variability are shown in Figure 1. The plot for Base Width shows that more than 95% of differences were negative, indicating that PKmas® Base Width values were systematically lower than GAITRite® values. The plots for Stride Duration Variability and Step Duration Variability (not shown) showed greater differences at lower values of variability, which affected only a small number of healthy participants. Apart from these two variables, the plots showed an even spread of differences over the range of values.

Figure 1

Associations between GAITRite® and PKmas® data. Scatter plots showing the associations between GAITRite® and PKmas® data, and Bland-Altman plots showing the mean difference and 95% limits of agreement for Speed, Base Width, Step Length Variability and Stride Duration Variability. ● = healthy older people, ○ = post hip fracture patients.

Discussion

This study demonstrated high levels of absolute agreement and consistency between the new and the established algorithms for most of the temporal and spatial gait variables we examined, using electronic walkway data from healthy and gait-impaired older people. All ICC values were 0.84 or higher and, with the exception of Base Width and Step Length Variability, greater than 0.95. However, the study identified several variables that should be interpreted with some caution at the group level, and a few more that could be problematic at the individual level when comparing GAITRite® with PKmas®.

Base width

The ICC(2,1) for absolute agreement for Base Width was 0.86, but the ICC(3,1) for consistency was 0.97. This suggests that, although absolute agreement with GAITRite® values may be lacking and neither individual- nor group-level comparisons are recommended, Base Width processed by PKmas® may itself be reliable and as good as GAITRite® at detecting change over time. PKmas® values are approximately 1.6 cm, or about 17%, lower than GAITRite® values. The systematic and random differences between the two programs can be explained by differences in how they define and calculate Base Width (see Additional file 2: B1). In essence, an outward foot angle greater than zero degrees will make the GAITRite® Base Width larger than the PKmas® Base Width, and the greater the Foot Angle, the larger the difference between the two values. It should be noted, however, that previous studies have questioned the reliability of GAITRite® Base Width as an outcome measure. Menz et al. found that the test-retest ICC using the average of three walks was only 0.49 in a group of older people [16], which suggests that within-individual variation can be close to between-individual variability.

Step length variability

The lower ICCs for absolute agreement and consistency for Step Length Variability suggest that the outputs from the two processing methods should not be considered equivalent at the individual level, and should be interpreted with caution at the group level. One reason is that the variable itself is small, so even small differences between the programs are large relative to its value. In addition, step spatial calculations differ between the two processing methods (Additional file 2: B2). Differences that do not noticeably affect Step Length itself when the walk is reasonably straight can produce relatively larger differences in the SD of Step Length. If the direction of progression is not parallel to the mat, the values, and the SDs of the values, can differ between the two programs even more.

Foot angle

The ICCs indicated that Foot Angle was acceptable at group and individual level, although values appeared to be consistently about 0.7° higher with PKmas®. The upper 95% limit of agreement was 2.6°, and differences of this size could be considered unacceptably large. Individual values differed by 57% on average, which also appears unacceptably large. It is important to note that, as with Base Width, the reliability of Foot Angle as an outcome measure has been questioned because the variability within individuals is relatively large compared with the magnitude of the variable [16]. The difference between the programs can again be explained by their different methods of calculation (Additional file 2: B3). It is not possible from this study to say which method is more valid or reliable.

All variability measures

The agreement for variability of both the temporal and spatial stride and step values appeared to be good at group level, but some individual absolute percentage differences were unacceptably high, in particular among individuals with very low variability. This seems to be due to the resolution of the standard deviation calculation when values are close to zero: some small values are exported as zero by GAITRite® but as greater than zero by PKmas®. Small differences in the spatial measures of Stride and Step Length can also be explained by differences in the location of the heel reference point (Additional file 2: B1), and there are also differences in the calculation of temporal measures (Additional file 2: B4).
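A minimal Python sketch shows why a limited output resolution matters most when variability is low. It assumes, purely for illustration, that one program rounds an SD to a fixed resolution before export while the other does not; the 0.01 s resolution, the function names and the SD values are hypothetical, not documented properties of GAITRite® or PKmas®.

```python
from typing import Optional

RESOLUTION = 0.01  # hypothetical export resolution (s)


def export_rounded(sd: float, resolution: float = RESOLUTION) -> float:
    """Round an SD to the nearest multiple of the export resolution."""
    return round(sd / resolution) * resolution


def abs_pct_diff(reference: float, other: float) -> Optional[float]:
    """Absolute difference as a percentage of the reference value,
    or None when the reference is zero and the percentage is undefined."""
    if reference == 0:
        return None
    return 100 * abs(other - reference) / reference


# Hypothetical Step Duration SDs (s), from very low to moderate variability.
for true_sd in (0.004, 0.012, 0.054):
    coarse = export_rounded(true_sd)  # value from the rounding program (reference)
    fine = true_sd                    # value from the full-precision program
    print(true_sd, coarse, abs_pct_diff(coarse, fine))
```

Under these assumptions, the smallest SD is exported as zero (so no percentage can be computed), and the relative discrepancy is roughly 20% at 0.012 s but only about 8% at 0.054 s, although the rounding error is bounded by the same resolution in every case.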

Prior studies have determined the validity and reliability of variables derived from the GAITRite® system (Additional file 1: A). GAITRite® data have been compared with paper-and-ink techniques, video-based systems, in-shoe stride analysers and 3-dimensional motion analysis systems [15–17, 22, 23]. The measurement error between the PKmas® and GAITRite® algorithms was smaller than the errors reported in these other comparisons. The clinical meaning of the magnitude of the differences needs to be considered in the light of the purpose of the measurement. The slight differences in the definitions and calculations used by PKmas® for some variables may affect (improve or reduce) the validity of those variables in terms of their association with disease status, function and fall risk. Studies examining this are recommended for future research.

We chose to take the average of the values from the left and right sides, rather than the average of all the steps. For most variables there will be a negligible difference between the mean of the left and right sides and the mean of all the footfalls. For the variability measures, however, this decision is clinically important, because the mean of the left and right SDs is a better indication of within-individual variability than the SD of all steps, which also reflects the degree of asymmetry [24]. There were also practical reasons for this approach: GAITRite® only exports left and right means, not the mean of all the footfalls, so deriving the latter would require exporting the individual footfalls. PKmas® exports right, left and grand means. Other considerations regarding the two programs include:

1. We found that PKmas® can indeed process difficult walks that include overlapping, double or backward steps more easily than GAITRite®.

2. GAITRite® exports a zero when a value cannot be calculated, for example because there are too few steps. This affects the SD of many variables when there are five or fewer footfalls. Only one of our healthy participants needed as few as five steps to cover the active walkway (5.5 m), but our participants were all over 70 years of age and walking at their preferred speed. Researchers interested in the SD of walks from younger participants, or from people walking at faster speeds, should treat SD data exported from GAITRite® with caution, especially when using shorter mats. We also found that SD values close to zero are exported as zero by GAITRite® but as a small value by PKmas®.

3. PKmas® purports to be able to process data recorded with GAITRite® hardware; however, we encountered a few problems. In particular, PKmas® sometimes reads a single active sensor as a footfall, and careful checking is required to identify these ‘extra’ footfalls. In addition, PKmas® occasionally had difficulty determining the duration of the stance phase for the final step. This may be because both of our mats have ‘seen a lot of action’, but we recommend careful checking of each walk when processing GAITRite® data with PKmas®.
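The reason given above for averaging the left and right SDs, rather than taking the SD of all steps, can be illustrated with a short Python sketch. The step lengths are hypothetical and describe an asymmetric walker whose steps vary little within each side; the function names are illustrative only.

```python
import statistics


def mean_side_sd(left, right):
    """Mean of the left-side and right-side SDs (the approach used in this study)."""
    return (statistics.stdev(left) + statistics.stdev(right)) / 2


def all_steps_sd(left, right):
    """SD of all steps pooled together, ignoring which side they came from."""
    return statistics.stdev(list(left) + list(right))


# Hypothetical step lengths (cm): each side varies by only 1 cm,
# but left steps are about 10 cm longer than right steps.
left = [60.0, 61.0, 60.0, 61.0]
right = [50.0, 51.0, 50.0, 51.0]

print(round(mean_side_sd(left, right), 2))  # 0.58
print(round(all_steps_sd(left, right), 2))  # 5.37
```

The pooled SD is roughly nine times larger here, almost entirely because of left-right asymmetry rather than step-to-step variability within each side.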

This study did not directly investigate the reliability or validity of PKmas®-derived data. However, for the variables with good absolute agreement and consistency and only minor differences from GAITRite®-derived variables, validity and reliability can be assumed to be the same as for GAITRite®. For the remaining variables, it is not possible to know from this study whether validity and reliability are better or worse than for the GAITRite®-derived variables. The study aimed to compare the two programs directly, and a strength of the study is that the same footsteps were used by both processing algorithms, so the differences found can only be explained by the processing. We included participants with a range of gait ability (preferred gait speed ranged from 27 to 182 cm/s), both with and without gait impairment. In addition, the study used testing procedures typical of those used in research studies with this population. However, the findings cannot be generalised to all populations and testing procedures.

Conclusions

GAITRite® is a widely used clinical and research tool, and this report is an important step in determining the utility of PKmas® as an alternative processing method. We conclude that Speed, Cadence, Stride Length, Step Length, Stride Duration, Step Duration, Stance Duration, Swing Duration, Double Support Duration, Stance%, Double Support% and Stride Duration Variability values are interchangeable with GAITRite® values. Base Width and Foot Angle show systematic differences, with PKmas® values 1.6 cm lower and 0.7° higher, respectively. Because of the relatively large, randomly spread differences found for Base Width, Foot Angle, and the variability of Stride Length, Step Length, Step Duration and Step Width, we recommend that these values are not compared at the individual level. The findings from this study will help inform clinicians and researchers wishing to interpret data processed using PKmas®, and to compare individual or group level data with published data that were processed using GAITRite®.