1 Introduction

Virtual reality (VR) has been adopted in many different fields and has established itself as a valuable tool for improving performance and acquiring skills (Hülsmann et al. 2019; Michalski et al. 2019; Petri et al. 2019b, 2019c). VR is becoming increasingly attractive because it expands the range of psychological and sport training applications: the process, or the participants' interaction with the virtual scene, can be controlled and manipulated (Harris et al. 2019; Hoffmann et al. 2014; Mueller et al. 2007). Previous research has emphasized the usability of VR for players and coaches and has confirmed that reliable assessment and training situations can be developed despite technical restrictions such as latency or cybersickness (Faure et al. 2020).

Generally, simulating scenes or reproducing images that are difficult to realize in the real environment offers many uses. For example, different task constraints can be varied (Gray 2017), or feedback information can be generalized, which is challenging to realize in real-world scenarios (Sigrist et al. 2015). Previous VR scenarios have revealed additional benefits, such as interaction with any desired object (Fox et al. 2009), the possibility of conducting experiments or training that would be questionable under real conditions because of the risk of injury, and the possibility for people to train without access to the sporting environment or to multiple training partners (Michalski et al. 2019). VR interventions increasingly emphasize the importance of monitoring participants' performance and developing their skills while they are constantly challenged (Farley et al. 2020; Khan et al. 2019). This process can be supported by adjusting the difficulty level to the users' skill level (Gray 2017).

When considering VR as a training tool, different types of VR systems are available (Huang et al. 2019; Jackson II 2021). A popular system is the head-mounted display (HMD), which is smaller and more practical and portable (wearable, and some models even wireless) than alternatives such as the CAVE (Neumann et al. 2018). HMDs have reached the commercial market, and their application has been extended to fields such as healthcare and education (Nor et al. 2020). For sports in particular, this system offers benefits because natural locomotion is possible, which increases immersion (Kallioniemi et al. 2017).

2 Background and current issues

Currently, only a few studies have compared the quality of visual perceptual cues in VR with natural conditions or discussed how VR training could be designed to effectively acquire complex sensorimotor skills for the real world (Harris et al. 2019). To make VR training more realistic, different measurement systems have been included in or combined with VR applications, for example, eye-tracking (Pastel et al. 2020a), EEG (Tauscher et al. 2019), and motion capture systems that enable whole-body visualization (WB) (Pastel et al. 2020b). Such approaches are necessary to improve the translation of physical actions into virtual sports performance (Neumann et al. 2018). Differences in perception, and in the transfer of skills acquired in VR to real-world conditions, can still be detected (Pastel et al. 2020a). Therefore, it is necessary to investigate how VR training should be designed so that it leads to positive transfer to reality.

Only a few studies have focused on the use of VR in sports (for a review, see Neumann et al. 2018). In particular, the question of whether learned skills and experiences can be transferred to real-world conditions has not yet been satisfactorily answered. Improvements after VR training sessions have been observed, for example, in juggling (Lammfromm and Gopher 2011), karate (Petri et al. 2019a), dart throwing (Tirp et al. 2015), and baseball batting (Gray 2017). Michalski et al. (2019) reported that participants significantly improved their real-world table tennis performance after completing VR training only. For tennis, a positive transfer of skills acquired in VR to the real world can be inferred only indirectly, since a VR environment representative of real-world tennis was developed and the authors suggested its potential for performance transfer (Le Noury et al. 2021). Although VR simulators are assumed to support the positive transfer of acquired skills to the real world, their validity and fidelity must be explored before they are adopted as training devices (Wood et al. 2021).

The studies above help in understanding how VR applications could be integrated into sports scenarios. In summary, several elements are still missing for a conceptual design of sport interventions in VR. Some studies do not examine the transfer of a newly learned skill to the real world: performance often improved within the VR environment, but transfer to the real world was not considered. Although studies exist comparing how different technologies support the learning of complex motor skills (Burns et al. 2011), new devices and software have since opened up many new possibilities.

Apart from a few exceptions that addressed both execution and learning (e.g., Liu et al. 2020), most of the studies listed above examined the execution of motor tasks. The natural locomotion possible in HMD-based VR scenarios ensures relatively natural body movements (taking one step forward leads to a corresponding positional shift in VR), which is essential for sports activities. Also crucial for learning new movements is perceiving the body via kinesthetic, proprioceptive, vestibular, and visual cues (Wong et al. 2012). Since participants perform natural movements while perceiving the VR environment through an HMD, they rely on feedback similar to that in the real world from muscle tension and stretch or from the vestibular system, although differences cannot be ruled out (for example, due to the weight of the HMD, about 500 g on average). Furthermore, only a few studies have examined how much of the body must be visually represented in VR to ensure performance comparable to real-world conditions. Pastel et al. (2020a, b) showed that WB visualization is not necessary for high performance in different motor tasks but emphasized the importance of visualizing at least one body part, which is in line with other findings (Kim et al. 2018; Lugrin et al. 2018). However, these studies examined body visualization with respect to movement execution and did not test whether different types of body visualization affect the acquisition of new movements.

In sports training, visual feedback on one's own performance compared with an ideal execution is often preferred for improving motor skills, since athletes can detect and correct errors (see Rhoads et al. 2014; Vaziri et al. 2006). The authors presented the benefits of visual feedback for motor learning in many sports but also warned against adopting it in coaching without reservation, since research findings do not show visual feedback to be the most efficient method. They also discussed the use of visual feedback with respect to the athletes' level, since minor improvements in performance can be crucial for becoming the best (Rhoads et al. 2014). Visual feedback can take several forms. Athletes can be shown videos of experts' movement execution or of their own performance to form a better mental image of basic movement patterns. In another context, visual feedback refers to persistent visual feedback of one's own body, which leads to significantly greater accuracy in visuomotor tasks than non-persistent feedback (Unell et al. 2021). One reason why visual feedback has not yet been established as a crucial tool may be the way it is presented. Athletes were often shown only videos instead of being placed in a fully immersive environment, in which they could receive feedback on their performance or improve their movement imagery and freely choose perspectives to obtain a three-dimensional impression. Here, we see great potential in VR to meet these requirements, since realistic training scenarios can be perceived from the first-person perspective.

VR applications have become commercially available and are accessible to most people owing to their low cost and practicality. In the next decade, most people will probably still not own a motion capture system that allows reliable WB visualization. Instead, other hardware components that track the body, such as the HTC Vive Tracker (HTC Corporation, Taiwan), which can visualize single body parts or objects, could enable private learning sessions with WB visualization. This raises the questions of how important WB visualization is for motor learning and which body parts should be prioritized. In practice, the controllers are often used to visualize the forearms, helping participants orient themselves in VR scenes or interact with objects (Lougiakis et al. 2020). Using VR while acquiring a new skill could accelerate and support the learning process and help participants become accustomed to it. Besides integrating VR into regular training, standalone training is becoming more important because of rising stress levels and other daily demands (Pan et al. 2012). In this context, video-based training could also support motor learning compared with VR (Yu et al. 2020), but which of these methods is more effective has not yet been sufficiently answered. Both can be used as additional training tools that extend the possibilities for motor learning in private settings. People are often reluctant to take up new sports because they feel insecure when comparing their performance with that of people with intermediate skills. To accelerate learning progress or eliminate individual mistakes, VR can generate additional, individualized feedback that supports motor learning (Thurlings et al. 2013).

Previous research has examined the use of VR applications for motor learning in different sports, such as golf (Harris et al. 2020) or squat training (Hülsmann et al. 2019). In the current study, a sport-specific technique from combat sports was chosen. The Soto Uke in Zenkutsu Dachi is a defensive technique from karate kumite and a complex movement for a beginner or someone without martial arts experience. We chose this movement to test whether VR or videos can serve as practical tools for skill acquisition, since most participants (except the combat sports athletes) had no experience with it. Furthermore, the movement does not require jumps or fast rotations, which could be hampered by the hardware (the weight of the HMD).

To date, no study has compared different degrees of body visualization in the context of movement learning with respect to the quality of the learned movement, contrasting WB with partial visualization. Previous studies have dealt only with the effect of the visibility of body parts on movement execution, not on movement learning. Likewise, the influence of different observation perspectives or different avatar representations within WB visualization on movement learning has been investigated, but not the substantial impact of the visualization types themselves. Most research on motor learning has taken place in controlled laboratory settings and has consisted of theory-driven examinations using simple movement tasks (Farrow and Robertson 2017).

The main goal of the current study was to examine whether VR can be used to acquire a specific combat technique, the Soto Uke in Zenkutsu Dachi. Four groups were formed: a VR training group that perceived whole-body visualization (VR-WB), a VR training group in which only the forearms were visualized during the intervention (VR-FA), a video training group representing conventional training (VB), and a control group (C) with no intervention. The training groups were compared to determine whether VR training can lead to outcome performance comparable to that of a video-based learning program. Thus, the study examined whether learning a combat technique with a VR training tool is as successful as comparable video-based training. A secondary aim was to examine whether visualizing the forearms via the controllers is sufficient for learning without any loss of acquisition quality. Because of the high immersion in the VR scenario and the three-dimensional presentation of each partial movement, we hypothesized that VR leads to more efficient acquisition than video-based learning. Since visual feedback is essential for motor learning (Park et al. 2018), we further hypothesized that perceiving the whole body leads to greater improvement than receiving less visual feedback of the body, especially for body parts that are not otherwise visually perceivable. Although the learning content was similar across conditions (VR and video-based training), it should be investigated whether the skills can be acquired equally well in a virtual environment. This allows further conclusions about whether movements in VR are perceived equally well visually and whether learning can proceed as under real conditions, which in turn yields suggestions for designing future VR training sessions.

3 Methods

Four groups were tested, each undergoing a different kind of intervention. A pretest was conducted to determine the participants' starting performance level. After each group's intervention, the post-test and retention test were carried out. The retention test was completed after a 2-day pause and followed the same procedure as the pre- and post-test (see Fig. 2). Approval was obtained from the Ethics Committee of the Otto-von-Guericke University at the Medical Faculty and University Hospital Magdeburg (number 132/16).

3.1 Technique description Soto Uke in Zenkutsu Dachi

The Soto Uke (SU) in Zenkutsu Dachi (ZD) is a defense technique from karate kumite (see Fig. 1). It consists, on the one hand, of the stance, which can be executed forward or backward in ZD or in other karate stances (Wichmann and Seer 2005). On the other hand, an arm movement (Soto Uke) is performed, which is intended to block the opponent's attack. In the ZD, the direct distance between the heels is about two shoulder widths. The main weight rests on the front leg, which is strongly bent. The outer edge of the front foot is positioned so that it points straight forward. The back leg is extended to give a firmer grip on the ground and a stable stance (Wichmann and Seer 2005). With the SU (sometimes also called Soto-Ude-Uke), attacks are deflected laterally by the shortest route. Both arms are involved in this movement, so the attack can be parried from either side. To execute the basic defense, one first swings out the arm with which one wishes to ward off the attack and places the fist at eye level to the side of the head (Wichmann and Seer 2005). The inside of the fist is turned outward so that the thumb points forward. The upper arm is horizontal, and the forearm is vertical (90°). This is the starting position from which the blocking movement is initiated. The other arm is extended, with the fist held at the level of the opponent's chin. This movement was chosen because it involves every body part and because it is unlikely that participants other than karate athletes had encountered such a movement before. With regard to using VR to learn movements involving all body parts, the chosen karate technique can be considered representative of such movements.

Fig. 1

Illustration of the Soto Uke in Zenkutsu Dachi. The first row (a) indicates the arm movement (Soto Uke), the second row (b) shows the leg movement (Zenkutsu Dachi) from the frontal and lateral perspective, and the third row (c) visualizes the complete technique including both arm- and leg movements (Soto Uke in Zenkutsu Dachi)

3.2 Experimental apparatus for both studies

3.2.1 Hardware

A wireless HTC Vive Pro Eye (HTC, Taiwan) with a field of view of 110° was used. To run the VR environment smoothly, a high-performance desktop computer with an Intel i7 CPU, 16 GB memory, a 512 GB SSD, and an Nvidia GTX 1080 8 GB graphics card was used. A motion capture system (Vicon Shogun, Oxford, UK) with 13 cameras sampling at 200 Hz was used to visualize the WB (including finger tracking) in VR. For this purpose, a second computer running Vicon Shogun (Intel i7 CPU, 32 GB memory, 512 GB SSD, Nvidia Quadro K2200 4 GB graphics card) was connected to the first. To synchronize each participant's skeleton with that of the virtual avatar, 63 markers were placed on the bodies of the participants in the whole-body visualization group. The two controllers of the HTC Vive system were used to visualize the forearms in the other VR training group (see Fig. 3). Two synchronized cameras (GoPro 6 and Casio Exilim EX-F1, 1920 × 1080 pixels, 60 Hz) recorded the movement from the lateral and frontal perspectives in the pre-, post-, and retention tests.

3.2.2 Software

To create a high-fidelity environment that prevents a conflict between the real-world and the virtual environment, the virtual room was modeled in Blender (version 2.79) using the dimensions and textures of the objects in the real test room. This test room was also used during the experiment for the other groups (video-based and control group). The virtual environment was then imported into Unity3D (version 2019.1), and SteamVR (version 2.5.0) was used to enable users to interact in VR. Visual Studio 2017 was used to implement the C# program for Unity3D that controlled the studies.

3.2.3 Participants and experimental setup

Eighty-three young sports students (33 females and 50 males, age 22.92 ± 3.11 years) with normal or corrected-to-normal vision and no reported eye or neurological impairment participated voluntarily in this experiment. A power analysis was conducted using the software G-Power (version …) for the within–between interaction (time × group), taking into account the number of participants, the number of groups, the statistical method, the number of measurements, the expected effect size (we assumed at least a medium effect, f = 0.25, since the control group completed no training and performance improvements were expected after the training interventions), and the alpha error probability (0.05); this yielded a power (1 − β error probability) of 0.99. The participants gave written consent after fully understanding the aim and procedure of the study. In total, 48.2% of participants had prior VR experience, but none of them had their own VR application. VR experience was recorded if a participant had taken part in at least one VR study or had ever participated in a VR gaming session. We classified these VR experiences as marginal, since becoming proficient in virtual environments requires more than occasional participation, which is unlikely without access to VR at home. Sixty-one percent of participants regularly played video games (M = 7.87 h per week, SD = 7.80). Weekly gaming hours varied widely even among participants with gaming experience (VR-WB: 2.73 ± 6.20; VR-FA: 6.18 ± 9.58; VB: 4.10 ± 5.76; C: 6.14 ± 6.34). Most participants did not own a VR device for private use (proportion without a device: VR-WB: 100%; VR-FA: 90.9%; VB: 100%; C: 95.2%). In addition, 40% in VR-WB, 32% in VR-FA, 70% in VB, and 38% in C had no combat sports experience at all.
Taking a closer look at this varied distribution of prior experience, more people in VR-WB, VR-FA, and C had participated in the university judo course, which we did not count as combat experience, since it is designed for beginners and involves different technical components. Group allocation prioritized equal VR experience, operationalized as ownership of a VR device for private use, and an equal combat level across all groups.
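The reported power can be roughly checked. The sketch below is a minimal Monte Carlo approximation, assuming the noncentrality parameter and degrees of freedom that G-Power uses for its "repeated measures, within-between interaction" design, with two inputs the text does not report set to G-Power's defaults: a correlation of ρ = 0.5 among the repeated measures and a nonsphericity correction of ε = 1. With N = 83, four groups, three measurements, f = 0.25, and α = 0.05, this gives λ = f²·N·m·ε/(1 − ρ) = 31.125 and df = (6, 158). The function names are hypothetical; in practice an analytic noncentral F distribution would replace the simulation.

```python
import random


def chi2_draw(df, rng):
    """Central chi-square draw with df degrees of freedom (via the gamma distribution)."""
    return 2.0 * rng.gammavariate(df / 2.0, 1.0)


def noncentral_chi2_draw(df, nc, rng):
    """Noncentral chi-square draw: one shifted normal carries all of the noncentrality."""
    shifted = rng.gauss(0.0, 1.0) + nc ** 0.5
    rest = chi2_draw(df - 1, rng) if df > 1 else 0.0
    return shifted * shifted + rest


def interaction_power(f, n_total, groups, measurements,
                      rho=0.5, eps=1.0, alpha=0.05, sims=100_000, seed=1):
    """Monte Carlo power for the group x time interaction of a mixed ANOVA.

    ASSUMPTION: lambda and df follow the G-Power convention for the
    within-between interaction; rho and eps are G-Power defaults, not
    values reported in the study.
    """
    rng = random.Random(seed)
    lam = f ** 2 * n_total * measurements * eps / (1.0 - rho)
    df1 = (groups - 1) * (measurements - 1) * eps
    df2 = (n_total - groups) * (measurements - 1) * eps

    # Critical value: the (1 - alpha) quantile of simulated central F draws.
    central = sorted((chi2_draw(df1, rng) / df1) / (chi2_draw(df2, rng) / df2)
                     for _ in range(sims))
    f_crit = central[int((1.0 - alpha) * sims)]

    # Power: share of noncentral F draws that exceed the critical value.
    hits = sum((noncentral_chi2_draw(df1, lam, rng) / df1)
               / (chi2_draw(df2, rng) / df2) > f_crit
               for _ in range(sims))
    return lam, df1, df2, f_crit, hits / sims


lam, df1, df2, f_crit, power = interaction_power(f=0.25, n_total=83, groups=4, measurements=3)
print(f"lambda={lam:.3f}, df=({df1:g}, {df2:g}), F_crit~{f_crit:.2f}, power~{power:.3f}")
```

Because the estimate depends on the assumed ρ, a lower correlation among the repeated measures would yield a smaller λ and hence lower power.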

3.2.4 Procedure

Before the experiment, each participant read and signed the participant information and the consent form and completed a self-created questionnaire about prior experience in VR or combat sports. Participants were excluded if they had at least six months of experience in karate kumite or another combat sport. An overview of the procedure is shown in Fig. 2. In the VR part of the study, the investigator calibrated the HMD using the official built-in SteamVR calibration protocol. For the group that perceived the whole body during the interventions, a subject calibration, including the attachment of the markers to the motion suit, took place before the VR scenario started. The investigator explained the study's procedure and measured the participants' interpupillary distance to ensure clear visual input from the HMD. The participants could then look and walk around the VR scene during a 1-min adjustment phase, after which the formal study started. Before the tests and interventions, the participants were instructed to warm up for at least 5 min.

Fig. 2

Test design of the current study

3.2.5 Pre-test, post-test, and retention test

The participants were instructed to observe the movement on the monitor three times. Two cameras were placed at fixed locations (distance, height, inclination angle) that remained the same throughout the study to ensure valid and reliable data for the video analysis. The first camera captured the frontal perspective, and the second was placed laterally at a sufficient distance to capture the whole movement. After the observation, the participants were asked to reproduce the observed movement as well as they could. They started with the right leg and stopped when the step with the left leg was completed (two steps forward in total). The tests were conducted in the real laboratory, so the transfer from virtual training to a real-world setting could be investigated after the interventions.

3.2.6 Intervention

An overview of the content of each training session is presented in Table 1. The VR tool was designed to meet several requirements: self-directed learning without an external trainer, suitability for home-based training, and learning sessions based mainly on visual information processing. Possible advantages for supporting the learning process include three-dimensional observation from different (including freely chosen) perspectives, additional information on technical details and auditory signals, and controlled attention steering that minimizes distractors within the VR scenes. For the forearm group (VR-FA), learning the fist pose was limited because the controller's posture is static. This part was therefore skipped, and participants only observed the fist pose to imagine the correct posture. These limitations applied throughout the intervention phase; no further restrictions were imposed. Generally, the units were split into observation and execution phases. Different execution velocities were used during the phases to introduce the participants to the movement as comfortably as possible. Before each new unit started, the main contents of the previous unit were repeated within 5 min. During the sessions, a board with written information on the correct execution of each sub-movement was displayed to provide additional detail. Each session lasted approximately 15 to 20 min. Although the participants received written information about the key points of the movement on a virtual board, they did not receive individual feedback about their movement, which is comparable to other studies in which the participants' level still improved (Petri et al. 2019a, b, c). Table 1 shows the learning contents of each training session and the methods that were used.

Table 1 Overview of the contents of each training session

The groups differed in the following respects:

  • Group 1 (VR-WB)

    For this group, the whole body was visualized in VR by means of the motion suit, which synchronized the participants' motion with the avatar's body, including finger tracking (see Fig. 3). To avoid differences in perceived body proportions, either a female or a male avatar was used. During the intervention, the participants perceived their WB in the virtual mirror and could use it to adjust their movements.

  • Group 2 (VR-FA)

    The training for this group also took place in the virtual room. The participants held the Vive controllers, which enabled visualization of their forearms (see Fig. 3).

  • Group 3 (video-based, VB)

    This group trained with video-based instruction. The participants were placed in front of a monitor on which the movements were presented from different perspectives and at different speeds. The learning content was equivalent to that of the VR training sessions (see Table 1).

  • Group 4 (control group, C)

    This group underwent no intervention and only completed the pre-, post-, and retention tests.

Fig. 3

Overview of the groups: a whole-body visualization (participants wore the motion suit), b forearm visualization (the user holds the controllers instead of wearing the motion suit), and c video-based training from the first-person perspective (1PP) and the third-person perspective (3PP) in the real-world setting (RW). A mirror was placed in front of the participants within the VR scene, providing visual feedback and awareness of their position. The control group is not shown because its participants completed no intervention. In the VR scenarios, the left avatar in the virtual mirror is the virtual trainer (transparent blue field), and the right one is the user's avatar (transparent red field) (color figure online)

3.2.7 Proposed VR application

The supervisor guided the VR application by following a protocol. There was no user interface through which participants could choose, for example, the perspective, the avatar's movements, or the speed. The process was meant to be standardized for all groups, without delays within any training session. Therefore, the supervisor guided each participant by activating specific visual cues with a click (for example, colored lines indicating the next position to be taken) or by verbally instructing the number of repetitions to be completed for each movement. Different movements were started through public arrays from which a movement could be selected by clicking on it. Example scenes from the participants' view are presented in Fig. 4.
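The supervisor-driven design can be sketched as follows. This is a minimal Python illustration, not the authors' C# implementation in Unity3D: the class `SupervisorConsole`, its methods, and the movement and cue names are all hypothetical. It only mirrors the idea described above, in which the participant has no interface and the supervisor starts movements from an array and toggles visual cues with a click.

```python
from dataclasses import dataclass, field


@dataclass
class SupervisorConsole:
    """Hypothetical supervisor-side controller (all names are illustrative).

    movements mirrors an array of movement clips the supervisor can click;
    cues holds on/off visual aids (e.g. guide lines) toggled by clicking.
    """
    movements: list
    cues: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def start_movement(self, index, speed=1.0):
        # Selecting an array entry starts that movement at the chosen speed.
        clip = self.movements[index]
        self.log.append(("start", clip, speed))
        return clip

    def toggle_cue(self, name):
        # Each click flips a cue on or off; unknown cues start as off.
        self.cues[name] = not self.cues.get(name, False)
        self.log.append(("cue", name, self.cues[name]))
        return self.cues[name]


console = SupervisorConsole(movements=["zenkutsu_dachi", "soto_uke", "soto_uke_in_zenkutsu_dachi"])
console.start_movement(1, speed=0.5)
console.toggle_cue("floor_lines")
print(console.log)
```

Keeping every control on the supervisor side is what standardizes the sequence across participants; in this sketch, the log simply records what was triggered and when.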

Fig. 4

Examples of participants’ perspectives during the training sessions. For better illustration, the visualization of the participant's own body (FA the forearms and WB the whole body) has been removed in this graph. a The colored lines representing the lateral perspective (blue line) and the frontal perspective (yellow line). b The fist animation illustrating the position of each finger. c The lines on the ground indicating the width of the leg position and the “roof” that should not be touched while moving forward. d The board on which all movement instructions were presented. The avatar is the virtual trainer (color figure online)

3.2.8 Data analysis and statistics

Movement quality was compared using the scores obtained from an expert rating system. The rating system was constructed by two experts with many years of experience, holding the second and fourth Dan, respectively (Deutscher Karate Verband e.V.). The scoring system consisted of 42 items: 8 items described the movement of the upper body and 8 items described the execution of the lower-body movement. Each of the 8 criteria for the upper body, the lower body, and the fist movement was multiplied by a factor reflecting its importance for beginners' learning progress (less important = 0.5, important = 1, very important = 1.5). Since both movements were rated twice, this yielded 32 items in total. Five items assessed the quality of the starting movement in the hip-width stance, and another 5 items assessed the smoothness of the transition between upper- and lower-body movements. The raters were instructed by the experts and were aware of the distribution of points. A maximum of 2 points could be given per item, and the raters had to follow specific rules depending on the markers positioned on the participants' bodies. The raters were not informed of the time point (pre-, post-, or retention test) at which a video was captured, and they rated independently of each other. To assess inter-rater agreement, the intraclass correlation coefficient (ICC) was calculated following Koo and Li (2016). ICCs were calculated separately for the upper-body (0.924) and lower-body (0.882) movements and for the fist pose (0.825). In addition, the ICC for the total score was determined; its value of 0.877 was acceptable for using the data in the statistical analysis (Koo and Li 2016). The lowest ICC, 0.787, was calculated for the fist pose, which is still acceptable.
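The text does not state which ICC form was used. As a hedged illustration, the sketch below computes two common single-rater forms from two-way ANOVA mean squares (Shrout and Fleiss notation): ICC(2,1) for absolute agreement and ICC(3,1) for consistency. The function names and the example ratings are hypothetical. Koo and Li (2016) characterize values between 0.75 and 0.9 as good and values above 0.9 as excellent reliability.

```python
def _anova_parts(ratings):
    """Two-way ANOVA mean squares for an n_subjects x k_raters table."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return n, k, ms_rows, ms_cols, ms_err


def icc_2_1(ratings):
    """Two-way random effects, absolute agreement, single rater: ICC(2,1)."""
    n, k, msr, msc, mse = _anova_parts(ratings)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_3_1(ratings):
    """Two-way mixed effects, consistency, single rater: ICC(3,1)."""
    n, k, msr, msc, mse = _anova_parts(ratings)
    return (msr - mse) / (msr + (k - 1) * mse)


# Hypothetical scores: rows = participants, columns = rater A, rater B.
ratings = [[1, 2], [2, 3], [3, 4], [4, 5]]
print(icc_3_1(ratings), round(icc_2_1(ratings), 3))  # 1.0 0.769
```

In this hypothetical example, rater B always gives one point more than rater A: consistency is perfect, but absolute agreement is penalized for the systematic offset, which is why the chosen ICC form matters when a value is judged acceptable.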

After confirming acceptable ICC values, we used the mean of the two raters' scores for further statistical analysis. Before the inferential statistics, we checked whether the starting level differed between the intervention groups [VR-WB, VR-FA, VB, C] and whether there were statistical outliers, using Bland–Altman boxplots. The starting level was determined from the mean scores of the two raters. A one-way ANOVA for independent samples was conducted on the total movement-quality points of all groups. It revealed no significant differences between the groups in the pretest (p > 0.05). This was supported by the boxplots, which showed no significant outliers within any group (or across all participants).

Two main questions guided the data analysis. The first concerns the comparison of the groups' improvements resulting from the different interventions. We therefore conducted a mixed repeated-measures ANOVA with the between-subject factor group [VR-WB, VR-FA, VB, C] and the within-subject factors time [pre-, post-, and retention test] and body region [upper body, lower body, fist pose]. We also analyzed the total score (the sum of the body regions) to obtain a better impression of the performance improvements. Because the weighting factors described above give each body region a different maximum score (upper body = 26 points, lower body = 32 points, start of the movement = 11 points, smooth transition = 10 points, fist movement = 17 points), we used the percentage of achieved points to compare regions and draw conclusions about the quality of movement acquisition. In total, participants could reach a maximum of 96 points. For the total score, we conducted an additional two-way ANOVA with the between-subject factor group [VR-WB, VR-FA, VB, C] and the within-subject factor time [pre-, post-, and retention test]. To examine body visualization, the VR-WB and VR-FA groups were additionally compared, since the reduced visualization in the VR-FA group could lead to a loss in movement acquisition, especially for the lower body. If sphericity was violated, the Greenhouse–Geisser correction was applied in all calculations. In addition, we conducted a multilevel linear regression with the points (not the percentages) as the dependent variable and group and time as independent variables.
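The normalization to percentages can be illustrated with the maxima listed above, which sum to the stated 96 points. In this minimal sketch, each item rating (0 to 2 points) is multiplied by its importance weight (0.5, 1, or 1.5) before summing; the dictionary keys, function names, and example ratings are hypothetical.

```python
# Maximum weighted points per scored component, as listed in the text.
REGION_MAX = {
    "upper_body": 26,
    "lower_body": 32,
    "movement_start": 11,
    "smooth_transition": 10,
    "fist": 17,
}
TOTAL_MAX = sum(REGION_MAX.values())  # 96 points


def weighted_points(items):
    """Sum of item ratings (0-2 points) multiplied by their importance weights."""
    for rating, weight in items:
        if not 0 <= rating <= 2:
            raise ValueError(f"rating {rating} outside 0-2")
        if weight not in (0.5, 1.0, 1.5):
            raise ValueError(f"unexpected weight {weight}")
    return sum(rating * weight for rating, weight in items)


def percent_of_max(points, region=None):
    """Express points as a percentage of a region's maximum (or of the total)."""
    maximum = TOTAL_MAX if region is None else REGION_MAX[region]
    return 100.0 * points / maximum


# Hypothetical ratings for three weighted items: (rating, weight).
print(weighted_points([(2, 1.5), (1, 0.5), (2, 1.0)]))  # 5.5
print(percent_of_max(13, "upper_body"))                  # 50.0
```

Expressing every region as a percentage of its own maximum is what makes a 13-point upper-body score and a 16-point lower-body score directly comparable (both 50%).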

Other factors, such as enjoyment during the training sessions, were considered in order to identify further advantages of the different intervention types. After each training session, the participants rated how motivated they felt on a scale from 1 to 10 (1 = does not apply at all, 10 = fully applies). Multivariate ANOVAs were then conducted to reveal possible differences between the groups [VR-WB, VR-FA, VB, C] and training sessions [T1, T2, T3, T4].

4 Results

4.1 Comparison of the groups

First, we examined the learning effectiveness of each group using the total score (sum of all body regions). All groups that received an intervention improved from pre- to post-test (see Fig. 1), whereas the control group (C) remained unchanged. No group showed a significant difference between post- and retention test (see Table 2).

Table 2 Descriptive statistics of the groups

Having examined the total score, we asked whether the same pattern held for all body regions, since the different visualization types might impair performance in particular regions. Significant main effects were found for time and body region (see Table 2). Performance differed significantly between pre- and post-test and between pre- and retention test. The main effect of body region reflected that, in all groups, the fist pose showed no improvement compared with the other body regions. As shown in Fig. 5, the between-subject analysis revealed significant group differences with a large effect size (see Table 2). After the control group (C) was excluded from the analyses, the three-way interaction of group, body region, and time was not significant (F(6.864, 87.656) = 0.890, p = 0.525). Hence, learning in all body regions (upper and lower body, fist pose) progressed similarly over time (pre-, post-, and retention test) in all intervention groups.
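Non-integer degrees of freedom such as F(6.864, 87.656) are characteristic of the Greenhouse–Geisser correction, which multiplies both the numerator and the denominator degrees of freedom by an estimate ε of the departure from sphericity. For a within-subject factor with k levels, ε lies between 1/(k − 1) and 1, where 1 means sphericity holds. The sketch below is an illustration, not the software used in the study: it estimates ε for a single within-subject factor from the double-centered sample covariance matrix of the repeated measures. Interactions such as group × body region × time require the covariance of the corresponding contrasts, which this sketch does not cover; function names and example scores are hypothetical.

```python
from statistics import mean


def covariance_matrix(rows):
    """Sample covariance (n - 1 denominator) of k repeated measures.

    rows: one list of k scores per participant, e.g. [pre, post, retention].
    """
    n, k = len(rows), len(rows[0])
    col_means = [mean([r[j] for r in rows]) for j in range(k)]
    return [
        [sum((r[i] - col_means[i]) * (r[j] - col_means[j]) for r in rows) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser epsilon from a symmetric k x k covariance matrix."""
    k = len(cov)
    # For a symmetric matrix, row means equal column means.
    row_means = [sum(row) / k for row in cov]
    grand_mean = sum(row_means) / k
    # Double-center: subtract row and column means, add back the grand mean.
    centered = [
        [cov[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centered[i][i] for i in range(k))
    sum_sq = sum(v * v for row in centered for v in row)
    return trace ** 2 / ((k - 1) * sum_sq)


def corrected_df(df_effect, df_error, epsilon):
    """Scale both degrees of freedom by epsilon, as the correction does."""
    return df_effect * epsilon, df_error * epsilon


# Hypothetical percentage scores at pre, post, and retention for five participants
scores = [[40, 70, 68], [35, 60, 62], [50, 72, 70], [45, 80, 75], [30, 55, 58]]
eps = greenhouse_geisser_epsilon(covariance_matrix(scores))
# One within-subject factor: df_effect = k - 1 = 2, df_error = (n - 1)(k - 1) = 8
print(round(eps, 3), corrected_df(2, 8, eps))
```

The p-value is then read from the F distribution with the shrunken degrees of freedom, which makes the test more conservative when sphericity is violated.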

Fig. 5

Overview of the learning process over time (pre-, post-, and retention test) for the video-based (VB), whole-body visualization in VR (VR-WB), forearm visualization in VR (VR-FA), and control (C) groups

4.2 Improvement of the body regions over time

In the VR-FA group, holding the controllers did not impair performance on the fist pose, and the missing lower-body visualization did not limit lower-body performance. This is also visible in Fig. 6, which shows each group's progress in each body region.

Fig. 6

Differences in percentage performance between successive testing times (pre-, post-, and retention test). The dashed line indicates no learning effect; positive values therefore indicate learning progress

Multilevel linear regression was used to examine whether the independent variables “group” and “time” significantly predicted the dependent variable (using points instead of percentages). The R² for the overall model was 0.65 (adjusted R² = 0.42), indicating a high goodness of fit according to Cohen (1988). Together, group and time significantly predicted the participants’ level, F(2, 248) = 92.10, p < 0.001. Time was a significant predictor of points, with points increasing over time (β = 8.403; t(248) = 10.63; p < 0.001). However, this increase did not occur between the post- and retention tests (Fig. 7).
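The fit statistics above rest on standard regression definitions for p predictors and n observations: R² = 1 − SS_res/SS_tot, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), and an overall F test with (p, n − p − 1) degrees of freedom. The sketch below applies these single-level (ordinary least squares) formulas; multilevel models have several competing R² definitions, so this is an illustration under that simplifying assumption, not a reproduction of the study's model. Function names and example numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Share of variance in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R² for the number of predictors p given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2, n, p):
    """F statistic of the whole model, returned with df = (p, n - p - 1)."""
    df_model, df_resid = p, n - p - 1
    return (r2 / df_model) / ((1 - r2) / df_resid), df_model, df_resid


# Hypothetical example: R² = 0.5 with 2 predictors and 11 observations
print(adjusted_r_squared(0.5, 11, 2))  # 0.375
print(overall_f(0.5, 11, 2))           # (4.0, 2, 8)
```

Because the overall F depends only on R², n, and p, these formulas also offer a quick way to check that reported fit statistics are mutually consistent.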

Fig. 7

Distribution of the total points for the independent variables Group and Time, with the reference line. The pattern mirrors the results presented above. (a) The distribution of points differs only between the training groups (VB, VR-WB, VR-FA) and the group that did not train (C). (b) Improvement over time, indicated by the reference line (minimum and maximum performance). Performance quality differs only between the pretest and the other time points (post and retention)

4.3 Enjoyment

We additionally examined enjoyment during the training phases. The univariate ANOVA revealed a significant group × training session interaction (F(6, 177) = 2.742, p = 0.014, η² = 0.085). Enjoyment was lowest in the VR-WB group. For the main factor group, VR-WB and VR-FA differed significantly (p < 0.05, Cohen's f = 0.44, large effect). Across training sessions, enjoyment did not differ significantly (F(2.457, 177) = 0.430, p = 0.732, η² = 0.07). For illustration, Fig. 8 presents the enjoyment rating after the pretest, the subjectively perceived difficulty of the movement to be learned, the preferred perspective while observing the avatar’s movement, and enjoyment during training.
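Two standard conversions relate the effect sizes reported in this section. Partial η² can be recovered from an F statistic and its degrees of freedom as η²p = F·df1 / (F·df1 + df2), and Cohen's f follows from η² as f = √(η² / (1 − η²)), with 0.10, 0.25, and 0.40 as Cohen's conventional thresholds for small, medium, and large effects. Applied to the interaction statistic F(6, 177) = 2.742, the first formula gives about 0.085, matching the reported η². The sketch below is illustrative; the function names are hypothetical.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def cohens_f(eta_squared):
    """Cohen's f from (partial) eta squared."""
    return math.sqrt(eta_squared / (1 - eta_squared))


def effect_label(f):
    """Classify f by Cohen's (1988) conventional thresholds."""
    if f >= 0.40:
        return "large"
    if f >= 0.25:
        return "medium"
    if f >= 0.10:
        return "small"
    return "below small"


eta = partial_eta_squared(2.742, 6, 177)  # group x training session interaction
print(round(eta, 3))       # 0.085
print(effect_label(0.44))  # large
```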

Fig. 8

Plotted responses of the participants for a enjoyment after the pretest, measuring motivation before the start of the study (1 = does not apply at all, 10 = applies to me completely), b the subjectively estimated difficulty of the movement to be learned (1 = very easy, 5 = very difficult), c whether the perspectives were sufficient to learn the movement (1 = does not apply at all, 5 = applies to me completely), and d enjoyment after each training session (T), used to check the participants’ motivation (1 = does not apply at all, 10 = applies to me completely). In (d), the control group is not shown because it did not train

5 Discussion

In recent years, VR has increasingly been used as an additional learning tool in many different fields, and the range of applications enabled by its technical components continues to grow (Ahir et al. 2020). How and where VR should be used in sports remains unclear, because current research reports inconsistent findings on its effectiveness. However, if VR is to serve as a tool for standalone training at home, for example when no training group exists nearby or during pandemic-related lockdowns, it is essential to examine whether such learning is comparable to conventional training methods such as video training. Investigating the learning process enabled by a new technology is also important because it may produce negative outcomes (Srinivasan et al. 2006). The present study therefore compared methods for learning a movement that requires coordination of the whole body. The karate kumite technique Soto Uke in Zenkutsu Dachi is well suited for this purpose, since the upper and lower body have to be coordinated.

In total, four groups were defined: two trained in VR (VR-WB, VR-FA), one received video-based training (VB), and one received no training (C), which allowed the effect of the intervention on performance to be determined. The participants’ level was measured in the pre-, post-, and retention test. In each test, the participants observed the movement three times from different perspectives on a monitor without imitating it simultaneously. After observation, they were asked to execute the movement as accurately as possible. A rating system created in cooperation with experts was used to assess the participants’ levels. It consisted of partial movements of the upper and lower body, as well as the fist pose. Because the participants had no prior knowledge of the technique or of combat sports in general, emphasis was placed on the basic elements of the movement. To ensure a reliable and objective scoring system, the ICC between the two trained raters was calculated. Apart from a few differences, each group (except C) completed the same intervention. This applied to the study procedure, for example, the division into partial movements, the number of repetitions of each movement, the perspectives (VR additionally offered a freely selectable perspective), the feedback given as written information, and the possibility to observe one’s own performance in a mirror. The VR-FA group faced some restrictions because not all body parts were visualized. In addition, this group could not imitate the fist pose as fully as the other groups, because the controllers had to be held throughout, which restricted finger movements. In VB, no auditory feedback on maintaining the vertical height could be provided during the first lesson, in which the ZD (lower-body movement) was presented.
To investigate the influence of body visualization type on learning, and in particular whether WB visualization is necessary to acquire each component of the movement, performance was additionally compared between VR-WB and VR-FA.

Overall, the VR interventions proved able to compete with the more widely used and established video training, since no significant differences were found between the training groups (VR-WB, VR-FA, VB). All training groups improved significantly from pre- to post-test, and the improvement was maintained, as indicated by the absence of a significant difference between post- and retention tests. Positive transfer of skills acquired in VR to the real world has previously been shown in football (Huang et al. 2015), rowing (Hoffmann et al. 2014), table tennis (Liu et al. 2020), and other sports, but has not been reported in combat sports (Petri et al. 2019c). Nevertheless, we must reject our first hypothesis, which predicted greater learning in the VR training groups than in VB.

Being in an immersive virtual room while an avatar presents various movements appears to be an attractive learning situation, especially when the observation perspective can be freely chosen according to individual needs. As previous studies have shown, different observation perspectives can improve movement learning (Hülsmann et al. 2019). Gathering impressions in a virtual scene also seems promising, since other studies report greater vividness, interactivity, telepresence, and satisfaction for media consumed in VR than on a 2-D screen (Kim and Ko 2019). Nevertheless, we found no difference between the VR training groups and VB, in contrast to other studies in which VR training appeared more effective than video training (Vignais et al. 2015). This may be a consequence of keeping the interventions comparable, which required identical learning content across the different learning methods. Another possibility is that the unfamiliar surroundings increased cognitive load and distraction, thereby reducing the learning outcome (Makransky et al. 2019). This suggests that merely being in a three-dimensional virtual environment does not necessarily support learning beyond what video-based training can achieve. For VR to become a more efficient learning method, more of its potential advantages must be integrated to maximize learning progress. It is essential to discuss these advantages and how they can be integrated into VR sessions, as has been done previously (Checa and Bustillo 2020).

Although the learning format (division into partial movements, number of repetitions, chosen perspectives, speed of motion, etc.) was the same across groups, the VR intervention included some additional features. For example, participants received auditory feedback if they exceeded the vertical line while moving forward. Auditory feedback can enhance interaction performance and compensate for missing haptic feedback (Rausch et al. 2012). Apart from these features, the interventions did not differ. Thus, the full range of advantages VR can provide was not integrated, which would explain the similar course of improvement over time.

Since no significant differences were detected between the VR training groups with different body visualization types, the whole body does not necessarily have to be visualized in VR to achieve satisfactory performance, which is in line with previous findings (Lugrin et al. 2018; Pastel et al. 2020b). The VR-FA group kept pace in fist-pose performance and showed similar progress in lower-body performance compared with VR-WB and VB, although its participants received no visual feedback of the whole body during the intervention (except for the forearms, see Fig. 3). This suggests that holding the controller did not adversely affect acquisition of the hand posture in this context. The lack of lower-limb visualization also did not impair learning of the lower-body movement. It remains open whether this also applies to high-level training for experts, in which details are refined and more visual feedback may be needed. Based on this result, we currently assume that WB visualization is unnecessary, especially when learning a new technique and acquiring basic skills. In addition, the VR-FA group reported greater enjoyment during the training sessions, and setup required less effort because only the two controllers had to be calibrated. Regarding upper-body improvements, however, the controller appears to have restricted learning: although no significant differences emerged among the intervention groups, the dispersion in VR-FA shows that some individuals did not improve. Other studies have also investigated interaction and embodiment when using handheld controllers (Lougiakis et al. 2020). In positioning tasks involving interaction with other objects, the controller appeared adequate for completing tasks with high precision.
The authors also reported that visualizing the hand generated the strongest sense of ownership and was the representation participants preferred.

The training intervention was constructed following guidelines summarized and recommended by Fischer and Paul (2020). Although the authors mainly described optimal video training design, the same goals transfer to VR training interventions. Their stated goals are, first, to improve the understanding of the relationships between the actions and functions of sporting techniques through movement analyses of kinematic factors (e.g., joint angles, trajectories) and, second, to optimize athletic training by making athletes aware of discrepancies between the actual execution of a movement and its internal representation (Fischer and Paul 2020). The target values were provided by the virtual avatar or by the athlete shown in the video training. To strengthen the internal representation of the optimal technique, the avatar/athlete demonstrated each partial movement at reduced speed, and key points of the movement were frozen, giving participants a better chance to compare their performance with the optimum. At this point, external feedback from the trainer or avatar on the actual values of the participants’ movements would have been helpful, since the participants had to rely on their own judgment. Slowing down can be advantageous, especially at the beginning of learning a new movement; to compensate, however, presenting the movement in real time is indispensable (O and Hall 2009). We therefore slowed down the movements at the beginning (both the avatar’s/athlete’s demonstration and the participants’ own execution) and increased the speed in each training session.

Frequently reported benefits of VR learning sessions are higher motivation (reflected in the feeling of presence), interest, and fun (Makransky et al. 2019; Parong and Mayer 2018). High motivation was observed across the training units in all groups, but no difference between the VR groups and VB was detected. The participants’ verbal feedback on the VR training scenarios was quite positive. They reported having a lot of fun and perceived the training as an exciting and thrilling experience. Some participants needed time to get used to the new surroundings: only 2% owned a VR device for private use, although 48.2% had previously taken part in other VR studies.

6 Limitation

The current study has some limitations regarding the use of VR in a learning scenario. To maintain comparability, the interventions were constructed similarly for each group in order to determine the impact of a highly immersive virtual environment on acquisition. For example, an interactive avatar that reacts to participants individually was not integrated. Furthermore, no feedback on the participants’ current performance was given during the intervention, so they could only correct their movements based on their own perception. It should also be kept in mind that not all athletes prefer a visual demonstration of a movement to acquire its basic elements. Auditory feedback was largely omitted, and different learner types should be considered. Although the participants received an auditory signal when they exceeded a certain height, this does not amount to detailed auditory feedback such as verbal instructions from the avatar or realistic sounds accompanying its movement. This study does not answer which kind of feedback is most effective for acquiring motor skills. The complexity of this question is well discussed (Thurlings et al. 2013), and further research should pursue it, especially in VR, where virtually any type of feedback can be provided.

7 Conclusion

We conclude that VR is a suitable tool for acquiring sport-specific techniques, especially for beginners. We also observed high interest in and motivation for a training method the participants had not yet experienced. Because the content of the training interventions did not differ substantially, the benefits of VR could not yet be fully demonstrated, which is why the performance outcomes were quite similar between the groups. Future research should clarify these benefits in order to maximize learning progress in VR learning sessions. The current study shows that merely being in an immersive virtual environment is not more effective than video-based training for learning a high-skilled movement. To realize all the advantages VR could offer in the future, more features must be included, such as real-time feedback on one’s own performance, interaction with an avatar, or other feedback sources such as auditory ones. Furthermore, the results suggest that it is not necessary to visualize the whole body. Individual limbs could be visualized with simple sensors, without elaborate technical components. Where possible, other tracking methods such as the Vive Tracker could be used to visualize the hands or feet, which should suffice to provide feedback on one’s position in the virtual scene and additional feedback on limb positions. Future research should also consider different learner types, since this study focused on the visual perception of the key elements.

In addition, transfer was measured only for the execution of the technique, not for the reaction to a real attack by an opponent. To prepare optimally for defense, the training should include attacks by the virtual avatar, with the controller’s vibrotactile feedback simulating hits. Another exciting idea is to develop an autonomous learning scenario in which participants decide, for example, how often the demonstration of a single movement is repeated, its speed, or where it is frozen.