1 Introduction

Many researchers, and even professional practitioners, regard statistically significant results as important. However, if subjects improve their postural stability (e.g., observed as a decrease in mean center of pressure (COP) trajectory) after an exercise program, those few millimeters, while statistically significant, might not mean much for highly skilled athletes [1], although they might be particularly meaningful for individuals with balance impairments as a result of age, disease, or injury [2–4]. A combined use of P values and effect sizes can provide a statistically and practically meaningful interpretation of the results of balance studies. This assumption was verified by comparing the statistical and practical significance of variables of task-oriented balance tests in different groups of subjects and after different balance training programs.

2 Task-Oriented Balance Tests

Recently, methods based on visual feedback control of body position, which are more sophisticated than static and dynamic posturography, have become part of the functional assessment of body balance [5–7]. Among a variety of alternatives, a visually guided center of mass (COM) target-matching task and a visually guided COM tracking task seem to be the most promising. In both cases, the subjects are provided with feedback on the COM displacement on a computer screen while standing on either a force platform or a spring-supported platform equipped with a system used for feedback monitoring of the COM movement [8].

In the first case, the subjects have to hit a target that appears randomly in one of the corners of the screen by a horizontal COM shift in the appropriate direction (Fig. 1). The system registers the time, distance, and velocity of the COP trajectory between the appearance of the stimulus and the moment it is hit.
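The three registered quantities can be illustrated with a minimal sketch. It assumes that the COP positions are available as (x, y) samples in millimeters and that they are sampled at 100 Hz; the function name `cop_path_metrics` and the data layout are hypothetical and do not describe how the FiTRO Sway Check system computes its output.

```python
import math

def cop_path_metrics(samples, sampling_rate_hz=100):
    """Return (time in s, path length in mm, mean velocity in mm/s).

    samples: COP positions as (x, y) tuples in mm, recorded from the
    appearance of the stimulus until it is hit (assumed layout).
    """
    # Path length: sum of straight-line steps between consecutive samples.
    distance = sum(math.dist(p, q) for p, q in zip(samples, samples[1:]))
    # Elapsed time: number of sampling intervals divided by the rate.
    duration = (len(samples) - 1) / sampling_rate_hz
    velocity = distance / duration if duration > 0 else 0.0
    return duration, distance, velocity

# Hypothetical three-sample trajectory, for illustration only.
print(cop_path_metrics([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]))
```

A shorter path at the same response time lowers the mean velocity, which is why the three variables are reported separately.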

Fig. 1 Visually guided COM target-matching task: (a) a subject hits the target randomly appearing in one of the corners of the screen by a horizontal COM shift in the appropriate direction while standing on either (b) a force platform or (c) a spring-supported platform equipped with the FiTRO Sway Check system used for feedback monitoring of the COP movement. COM center of mass, COP center of pressure

In the second case, the subjects' task is to trace, by shifting their COM, a curve flowing in either a horizontal or vertical direction (Fig. 2). The deviation of the instantaneous COP position from the curve is recorded at 100 Hz by means of the FiTRO Sway Check system.

Fig. 2 Visually guided COM tracking task: (a) a subject is provided with feedback on the COM displacement on a computer screen while standing on a portable force platform. The subject's task is to trace, by shifting the COM, a curve flowing in either a (b) vertical or (c) horizontal direction. The deviation of the instantaneous COP position from the curve is recorded at 100 Hz by means of the FiTRO Sway Check system. COM center of mass, COP center of pressure

The analysis of repeated measurements showed test-retest correlation coefficients and measurement errors of 0.81 and 8.8 %, respectively, for a visually guided COM target-matching task and 0.83 and 7.0 %, respectively, for a visually guided COM tracking task [9]. Such reliability of task-oriented balance tests is comparable to that of static balance tests, but with better potential for discriminating between groups with different levels of balance capabilities [10].

3 P Values and Effect Size Values in Balance Research

A unique feature of the above-described task-oriented balance tests is that voluntary feedback control of COM movement can be provided under two different conditions: the subjects can either concentrate on a particular part of the action (e.g., hitting the target) or focus on the movements themselves (e.g., the positioning of the feet). A moderate correlation (r = 0.46) between variables of these task-oriented balance tests and the common variance of only 13 % suggest that they measure distinct qualities [11]. Such variations of these tests may allow for assessing the accuracy of regulation of COM movement that requires less or more feedback processing. This approach, which allows different aspects of postural control to be evaluated, is also important for the conception and evaluation of visual feedback interventions. The comparison of balance parameters before and after training or rehabilitation provides information not only on physiological adaptations (e.g., improvement in proprioceptive function) but also on mechanical changes in technique (e.g., more precise regulation of COM movement with less effort) [12].

Taking into account slight between-group differences [13, 14] and changes associated with balance training based on platform feedback exercises [15, 16], it may be assumed that these tests are representative for the interpretation of significant and practically meaningful effects in cross-sectional and intervention studies. Large samples are more likely to show a significant difference. In such cases, statistically significant results can be found even though the size of the effect is too small to be practically important. Conversely, large effect sizes may be spurious, particularly when sample sizes are small and statistical significance is lacking. Therefore, it is important to look at the P values as well as the effect size and its confidence interval when interpreting the test results of balance studies. There are different methods of calculating effect sizes. Cohen’s d [17] is the most common estimate and can also be used in meta-analyses. Such an estimate is needed because, when the results of several studies are all reported as statistically significant, the relative effects of different balance training programs cannot be decided by comparing P values. A Cohen’s d score is frequently accompanied by a confidence interval (CI). Calculating a 95 % CI around the Cohen’s d score can facilitate the comparison of effect sizes of different interventions. While P values are used to assess whether or not an effect exists, 95 % CIs allow the uncertainty in the magnitude of the effect to be assessed.
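As an illustration of how Cohen’s d and its 95 % CI might be obtained, the following minimal sketch uses the pooled-SD form of d for two independent groups and a common large-sample approximation of its standard error. The function names and the data are hypothetical, and the reviewed studies may have used a different variant of the calculation.

```python
import math
from statistics import NormalDist, mean, stdev

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, based on the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (mean(group_a) - mean(group_b)) / pooled_sd

def d_confidence_interval(d, n_a, n_b, level=0.95):
    """Approximate CI for d from its large-sample standard error."""
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95 %
    return d - z * se, d + z * se

# Hypothetical scores for two groups of three subjects each.
older, young = [2, 4, 6], [1, 3, 5]
d = cohens_d(older, young)
print(d, d_confidence_interval(d, len(older), len(young)))
```

With only three subjects per group, the interval around d is wide and includes zero, which shows why an effect size from a small sample should be read together with its CI.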

However, it should be noted that the effect size is largely not interpretable in the absence of a statistically significant effect, where the P value threshold for statistical significance has been set prior to the analysis. Conversely, the P value alone does not give any indication of the size of the intervention effect, and the magnitude of the P value cannot be used to describe the practical importance of an intervention effect.

3.1 Significant and Practically Meaningful Between-Group Differences in Parameters of Task-Oriented Balance Tests

3.1.1 P Values and Effect Sizes in a Comparison of Subjects of Different Ages

There are several methods to translate statistically significant data into results that may be applied to practice. One of them is a comparison of a sample to a meaningful reference group. Some research results are in the form of test scores that can be compared with a table of norms. By such a comparison, we can evaluate whether the sample is average, superior, or inferior when compared with the population. In the following studies, healthy young subjects were used as a reference group [2, 9, 18, 19]. The statistical and practical significance of differences between groups of subjects of different ages in parameters of a visually guided COM target-matching task and a visually guided COM tracking task is shown in Tables 1 and 2. It seems that P values and effect size statistics provide a comparable estimate of between-group differences when large samples are used. However, the interpretation of Cohen’s d can be problematic in samples with non-normal distributions or restricted ranges. As shown, an alternative way of interpreting effect sizes is to convert Cohen’s d scores to percentiles. Effect sizes can also be interpreted in terms of the percent of non-overlap of the reference group’s scores with those of the group of different ages. The same approach can be used for a comparison of experimental and control groups in intervention studies.
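The two conversions can be sketched as follows, assuming that both groups are normally distributed with equal SDs. Under that assumption, the percentile corresponds to Cohen’s U3 and the percent of non-overlap to Cohen’s U1. The function names are hypothetical, and the values in Tables 1 and 2 may have been derived differently.

```python
from statistics import NormalDist

def percentile_standing(d):
    """Cohen's U3: percentile of the reference group at which the mean
    of the comparison group falls (normality and equal SDs assumed)."""
    return 100 * NormalDist().cdf(d)

def percent_nonoverlap(d):
    """Cohen's U1: percent of the combined area of the two
    distributions that is not shared (same assumptions)."""
    p = NormalDist().cdf(abs(d) / 2)
    return 100 * (2 * p - 1) / p

for d in (0.2, 0.5, 0.8):
    print(d, round(percentile_standing(d)), round(percent_nonoverlap(d), 1))
```

For d = 0.8, the mean of the comparison group lies at about the 79th percentile of the reference group, and about 47 % of the two distributions do not overlap.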

Table 1 Between-group differences in parameters of a visually guided COM target-matching task
Table 2 Between-group differences in parameters of a visually guided COM tracking task

3.1.2 P Values and Effect Sizes in a Comparison of Athletes of Different Specializations

Another study showed no significant differences among groups of competitors in snowboarding, windsurfing, karate, cycling, canoeing, and rowing in the mean COP distance from the horizontally (range 1–11 %) and vertically (range 0.5–10 %) flowing curves [18]. Nevertheless, the effect sizes calculated between the groups with the lowest and highest values (1.1 and 0.9, respectively) indicate large effects that can be considered of great practical significance. The nonsignificant differences are likely due to the small number of participants (from 5 to 13), because t and F statistics are partially a function of the sample size. This assumption may also be corroborated by a comparison of these and the above-mentioned studies [2, 9, 18, 19], in which equivalent differences between groups with different sample sizes yielded widely varying t and F statistics. Because the results of small groups of elite athletes are often analyzed in sport practice, effect size estimates that are not influenced by sample size should be used.
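The dependence of the t statistic on sample size can be made explicit with a simplified sketch. It assumes a comparison of two independent groups of equal size n, for which t = d × √(n/2) with 2n − 2 degrees of freedom. The cited study compared more than two groups, so this illustrates the principle only and does not reanalyze its data.

```python
import math

def t_from_d(d, n_per_group):
    """t statistic implied by Cohen's d for two independent groups
    of equal size (df = 2 * n_per_group - 2)."""
    return d * math.sqrt(n_per_group / 2)

# The same effect size evaluated at three hypothetical group sizes.
for n in (5, 13, 50):
    print(n, round(t_from_d(1.1, n), 2))
```

With five subjects per group, d = 1.1 gives a t value of about 1.74, which is below the two-tailed 5 % critical value of 2.31 for 8 degrees of freedom. The same effect size with 50 subjects per group gives t = 5.5.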

3.1.3 P Values and Effect Sizes in a Comparison of Individuals After Lower Limb Injuries

A visually guided COM tracking task might be an appropriate alternative for individuals after lower limb injuries, namely in an early phase of rehabilitation when effusion and pain in the joint can make it particularly sensitive to movement, which is perceived as possibly aggravating that injury [20]. A comparison of individuals after anterior cruciate ligament injury (n = 13) showed a significantly higher mean COP distance from the curve in the antero-posterior (A-P) direction while standing on the injured leg than on the non-injured leg (16.1 %; P < 0.01; 95 % CI 13.6–18.6). Conversely, its values in the medio-lateral (M-L) direction did not differ significantly between legs (6.5 %; P > 0.05; 95 % CI 4.4–8.6) [18]. However, the calculated effect sizes of 1.7 and 0.5 signify large and moderate effects, respectively. This means that evaluating the accuracy of visual feedback control of COM movement in both A-P and M-L directions can provide useful information on between-leg differences and the efficiency of rehabilitation after lower limb injuries.

3.2 Significant and Practically Meaningful Post-Intervention Changes in Parameters of Task-Oriented Balance Tests

3.2.1 P Values and Effect Sizes in Evaluation of Acute Response to Balance Exercises

The first study evaluated the accuracy of visual feedback control of body position and static and dynamic balance over repeated trials of a visually guided COM target-matching task (20 sets of 60 stimuli with a 2-min rest in between) [21]. Mean response time decreased significantly from the 1st to the 20th trial (44 %, P < 0.01), and a substantial share of the improvement took place during the initial six trials. At the same time, the mean distance of COP movement decreased significantly from the 1st to the 12th trial (36 %, P < 0.05) and then increased slightly by the 20th trial (17 %). Conversely, mean COP velocity increased significantly from the 1st to the 20th trial (28 %, P < 0.05). These changes were in accordance with the calculated effect sizes (1.7 for mean response time, 0.6 for mean COP distance covered, and 1.7 for mean COP velocity). This means that with repeated trials (1,200 shifts of the COM to visual stimuli in total), subjects responded to visual stimuli faster and more precisely by shifting the COM horizontally in one of the four directions according to the position of the stimulus on the screen.

However, such an acute improvement of accuracy of visual feedback control of body position during practice was not beneficial for improvement of static and dynamic balance. There were no significant changes in the COP velocity registered in static and dynamic conditions (4.1 % and 6.2 %, respectively). Although deemed a small effect (0.3 for both static and dynamic balance), such an improvement would have far greater clinical importance for elderly people and those with impaired coordination due to disease or injury than a large effect of >0.8 in highly skilled athletes.

A second study evaluated the accuracy of visual feedback control of body position and static and dynamic balance over repeated 30-s trials of a visually guided COM tracking task [22]. The distance of the sway trajectory from the curve decreased in both A-P and M-L directions when the task was performed repeatedly. However, a significant improvement was observed only during the initial seven trials (39.4 %, P < 0.01). After cessation of practice, its values decreased slightly over a period of 10 min and then gradually increased towards 30 min of recovery. These changes were in accordance with effect sizes of >0.8, which indicate large effects. This means that this form of balance exercise (20 min on average) temporarily improves the accuracy of visual feedback control of COP movement in both A-P and M-L directions. Further analysis showed a greater decline in these values over repeated trials under dynamic than under static conditions (46.1 % and 26.3 %, respectively). This effect is most likely due to more efficient regulation of COM movement, primarily by rotation of the ankle joints, during stance on an unstable spring-supported platform.

3.2.2 P Values and Effect Sizes in Evaluation of the Effects of Balance Training Interventions

3.2.2.1 Balance Exercises Without and with Visual Feedback

One study compared the effects of three different 10-week training programs without and with visual feedback balance exercises on coordination abilities in early school-age children [23, 24]. Regarding the changes in parameters of a visually guided COM tracking task, all changes in the experimental groups were significant at P < 0.01. Even the most trivial effect was found to be significant. This is illustrated by the small effect sizes found in experimental groups 1, 2, and 3 in the mean COP distance from both the horizontally (0.23, 0.20, and 0.18, respectively) and vertically flowing curve (0.17, 0.28, and 0.09, respectively). If the means of two measures do not differ by 0.2 standard deviations (SDs) or more, the pre-post intervention changes can be considered trivial, even if they are statistically significant. In other words, an effect size of <0.2 is described as a small effect that is not particularly meaningful for practice. Such a small effect size may be the result of small mean differences and/or relatively large SD values. In this study, it was mainly due to large SD values.
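The influence of the SD on the effect size can be illustrated with a minimal sketch. It assumes that the pre-post effect size is the mean change divided by a single SD (for example, the pooled or pre-test SD) and uses the conventional benchmarks of 0.2, 0.5, and 0.8 for small, moderate, and large effects. The function names and numbers are hypothetical and are not taken from the cited study.

```python
def effect_size(mean_pre, mean_post, sd):
    """Standardized pre-post change: mean difference divided by the SD."""
    return (mean_pre - mean_post) / sd

def label(d):
    """Verbal label based on the conventional 0.2 / 0.5 / 0.8 benchmarks."""
    d = abs(d)
    if d < 0.2:
        return "trivial"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "moderate"
    return "large"

# The same 2-unit improvement with a large and with a small SD.
for sd in (12.0, 2.0):
    d = effect_size(20.0, 18.0, sd)
    print(sd, round(d, 2), label(d))
```

The same mean change is labeled trivial when the scores are widely scattered and large when they are tightly clustered, which corresponds to the role of the large SD values noted above.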

3.2.2.2 Task-Oriented Balance Exercises

The first study evaluated the effect of 3 weeks of visually guided COM target-matching exercise (three sets of 200 stimuli with a 5-min rest in between, three times a week) on neuromuscular performance in untrained subjects [25]. Mean response time decreased significantly (47.9 %, P < 0.01). At the same time, mean distance of COP movement also decreased significantly (18.2 %, P < 0.05), while mean COP velocity increased (32 %, P < 0.05). While the mean response time showed a large effect (effect size >0.8), small and moderate effects were found for mean COP distance and mean COP velocity (effect sizes of 0.4 and 0.7, respectively).

The second study evaluated the effect of 12 weeks of conventional and task-oriented balance training on visual feedback control of body position in individuals with functional imbalances [22]. The training during the initial 4 weeks consisted of conventional exercises (four sessions/week), and visual feedback exercises were added to the program during the next 8 weeks (two of four sessions/week). The mean COP distance from the horizontally and vertically flowing curve measured during a visually guided COM tracking task decreased only slightly during the initial 4 weeks (~8.7 %, P > 0.05). However, a greater decline was observed from the 5th to the 8th week (~10.6 %) and from the 9th to the 12th week of the training (~14.5 %, P < 0.05), when visual feedback exercises were included in the training program. A similar trend was also observed in the case of a visually guided COM target-matching task. However, there were significant individual differences. The subject with a good initial performance learned faster than the subject with a slower response time and a longer distance of COP movement registered prior to the training (29.3 % and 17.0 %, respectively).

These findings indicate that a conventional training program consisting of balance exercises does not improve the accuracy of visual feedback control of body position (effect size = 0.35). Providing visual feedback of COM movement on a computer screen during training contributes to a more precise perception of COM position and regulation of its movement during different task-oriented balance exercises (effect size = 1.1). However, a limited sample size (six subjects) was used in this study [22]. In such a case, using elaborate statistics does not make the finding more meaningful. A study using a small number of subjects (<10) should be considered simply as a case study and should be treated accordingly.

3.2.2.3 Balance Exercises Alone and in Combination with Agility or Resistance Exercises

One study evaluated the effect of 8 weeks of instability agility training on parameters of balance and reaction time in recreationally active individuals [26]. Where statistically significant pre-post training changes in parameters of static balance tests were found, the effect sizes were ≥0.8. In contrast, the nonsignificant changes in task-oriented balance tests after the training were associated with effect sizes ≤0.2.

Another study evaluated the effect of 12 weeks of different forms of instability exercises, either performing on a Bosu ball or using an Aquahit, on body balance in karate-kata competitors [27]. Significant improvements at P ≤ 0.01 in parameters of task-oriented balance tests were accompanied by effect sizes >0.8.

The last study evaluated the effect of 12 weeks of combined balance and resistance exercises on neuromuscular performance in athletes after anterior cruciate ligament reconstruction [28, 29]. The improvement in accuracy of visual feedback control of COM position was significant in the A-P direction (31.0 %; P < 0.05) but not in the M-L direction (19.8 %; P > 0.05). It could be argued that these differences prior to and after the exercise program are practically meaningful, and a calculation of effect sizes supports this possibility. The effect sizes of 1.5 and 0.6 for mean COP distance from the curve in A-P and M-L directions would be described as large and moderate effects, respectively, and as particularly meaningful in terms of clinical practice. As the exercise program was designed specifically for each subject because of the different type and magnitude of the injury, another way to interpret the results is to compare individual changes during the exercise program.

A summary of statistical and practical significance of the above-mentioned studies is provided in Table 3.

Table 3 Statistical and practical significance of different balance training programs

4 Conclusions

The present study outlined significant and practically meaningful effects in balance research in the context of sport training, rehabilitation, and health-oriented exercise programs. Findings showed significant differences in parameters of task-oriented balance tests between groups of subjects of different ages that were in accordance with large effect sizes. Conversely, no significant differences were found between groups of athletes of different specializations. Likewise, the values did not differ significantly during stance on injured and non-injured legs. Nevertheless, moderate to large effect sizes indicate that the data are particularly meaningful in terms of sport and rehabilitation practice.

Controversy exists, however, concerning the effects of intervention studies. Six of our studies differing in the group of interest and the type of balance training were taken as representative for interpretation of findings. Visual feedback balance training in school-age children showed highly statistically significant changes. Even the most trivial effect (effect size <0.3) was found to be significant. Such a small effect size may be the result of small mean differences and/or relatively large SD values. In this case, it was mainly due to large SD values in the population of children. Some discrepancy was also found between the statistically significant effect of task-oriented balance training in untrained subjects and the small to large effect sizes. Conversely, the statistical significance of different balance training programs in physically active individuals and competitive athletes corresponded with the calculated effect sizes. However, using elaborate statistics in studies with a small sample size (individuals with functional imbalances and after lower limb injuries) did not make the findings more meaningful. Such studies using a small number of subjects (<10) should be considered simply as case studies and should be treated accordingly.

It may be concluded that statistical significance, although frequently used in balance research, does not imply that the between-group differences and the changes observed after training are practically meaningful, or vice versa. Therefore, both P values and effect sizes should be used when interpreting the results of cross-sectional and intervention balance studies.