1 Introduction

In recent years, Virtual Reality (VR) has become a useful tool in many different applications, such as learning tutorials, simulations of hazardous situations, and sports. Its advantages can be used to expand knowledge about, for example, anxiety [75], autism therapy [99], or the importance of body perception during sport motor task completion [66]. Using VR simplifies research by reducing costs, shortening time-consuming processes, and providing access to the needed physical surroundings. An additional benefit of VR is the synchronization with other measurement systems such as eye-tracking (ET) [16, 67] or infrared motion capturing systems [24, 66]. This opens up possibilities that cannot be implemented in reality, such as occlusion of one’s body parts [66] or changing and expanding body properties [33, 73]. Moreover, VR has been beneficial in sports training [62] by improving athletes’ performances [70, 90]. Previous studies have already confirmed a positive transfer effect in sports skill acquisition from VR into the real world (RW) [25, 58], for instance in attentional training [7]. Further positive aspects have been verified, such as increased athlete motivation and a decrease in the feeling of subjective strain during sports activities [26, 54, 97].

Following their commercial success, several devices exist on which virtual environments can be presented, e.g., the Cave Automatic Virtual Environment (CAVE) or head-mounted displays (HMDs). Different kinds of immersion can occur, such as physical and mental immersion [81]. High immersion occurs in an HMD because the visual impressions are entirely computer-generated [21]. Since wider fields of view (FoVs) are realizable, recently released HMDs generate a higher sense of spatial presence and immersion, which are relevant components of the VR experience [49, 85]. High immersion ensures more natural user behavior and is essential for drawing conclusions about realistic human reactions to VR stimuli. A few papers list available HMD technologies and the types of visual information processing that have already been examined to improve mobility performance or support people with low vision [18, 22, 30]. Due to the first-person perspective (1PP) presented in an HMD with real-time calculation of the rotations and translations of the participant’s body, a close correspondence to realistic interaction within a computer-generated three-dimensional world can be ensured, which also makes HMDs recommendable for sports-related purposes.

VR has established itself as a learning or training tool in different fields, and it continues to progress. This is accomplished by implementing different technologies in VR systems that enable the collection of additional data on human behavior. For example, gaze behavior can be captured by ET integrated in VR. With the combined technology of ET and VR, future training applications might be more target-oriented and more efficient. The increased interest in HMD-based VR in particular could lead to a rethinking of how users and athletes reach their specific training goals in future training sessions [37]. To illustrate this, we first explain what possibilities arise from using ET in VR more generally. Afterward, we demonstrate what role gaze behavior plays within the sports sector and why it is important to improve it. We specify the review’s goal and purpose at the end of the introduction.

2 Eye-tracking in virtual reality

Progressive technical development enables the integration of ET in HMDs, which provides an easy method to analyze gaze behavior within virtual environments [16, 59]. With the synchronization of both systems, attractive and innovative ideas came up to expand gaze behavior analysis to situations that were previously not realizable and to ensure more sophisticated immersion in virtual environments. This can also be important within the field of sports. Previous studies show differences between experts and novices in visual information uptake [64, 101]. For example, in a virtual learning session in which a novice observes a new movement to be learned, VR can serve as a guide by highlighting visual cues that attract the user’s attention to the key points of the movement. Meanwhile, ET can be used to determine the user’s visual attention and enable gaze-controlled VR scenarios or control the processes within virtual environments depending on the user’s gaze behavior.

Nevertheless, ET cannot elucidate all the processes involved in visual input. A critical issue concerning the measurement of visual perception is that ET devices only analyze overt attention (foveal information) but not covert attention (peripheral information) [92]. In this regard, it is helpful to apply occlusion techniques and use performance parameters as indicators for the usage of visual information [5, 95]. Occlusions within the visual field of a participant can easily be implemented in a virtual environment. Due to easy access to the software and hardware components needed to create immersive virtual environments, it is possible to create any desired user-perceived scenario; for example, a scenario with a virtual handball thrower in which handball goalkeepers had to anticipate the ball’s trajectories [93]. Therefore, VR applications are a well-suited method for realizing gaze training concepts, which can optimize or enhance gaze behavior during sports activities. We refer to [4], who also stated the benefit of simulations recreating a sports environment and how visual training improves sports performances. They described that visual perceptual and oculomotor tasks enhance vision, leading to quicker sensory processing, more accurate motor movements, and a reduced risk of being injured [4]. To measure athletes’ skill level or examine essential visual cues, direct methods, which require the application of eye-tracking devices, ensure real-time measurement of eye movements during the observation of static pictures or natural scenes [38].

3 Enhancing gaze behavior

The advantages mentioned above could substantially extend the options for planning and creating training that improves gaze behavior. Predominantly, visual, acoustic, and haptic sensations are used during the perception of the virtual environment [20]. The practically unlimited selection of visual stimuli in terms of their magnitude, form, color, duration of appearance, etc., makes it easy to create standardized visual learning programs to improve vision skills. To illustrate the advantages and purpose of this, previous work listed studies to identify the importance of enhancing gaze behavior for sports competition [50]. Those findings show improved perceptual skills and decision-making by learning with impaired vision in basketball [78]. It was also shown that enhanced gaze behavior leads to more accurate and consistent practical performance in the golf swing [9].

Further studies revealed the superiority of experts’ perceptual and cognitive skills compared to novices, who were less efficient in using search strategies when blocking the opponents’ attacks in karate [60]. Skilled athletes’ ability to anticipate opponent actions or scan the environment and recognize advance cues arises from differences in visual perception [1, 2, 11]. Analyzing gaze behavior in sports is already a powerful tool for determining athletes’ levels. It has been used to expand the knowledge of perceptual-cognitive skills and to assess the differences in decision-making between novices and experts [6, 52].

4 Research deficits and aim of the review

Currently, only a few studies have described how HMD-integrated ET can be used in terms of its technical properties and how usable gaze parameters can be extracted to quantify gaze behavior in virtual environments. A reason for that might be the differences that may occur due to the artificial representation of the surrounding environment via an HMD. For example, the fixed distance between the user’s eyes and the display(s) allows vergence but not accommodation, which decreases the possibility of relying on depth perception. Although VR has already been used in sports, few integrated ET systems have been applied to examine gaze behavior in sports-relevant scenarios within virtual environments. Such systems could help to generate more knowledge about differences in visual perception through the standardization of experimental procedures ensured in VR (for example, the appearance of visual stimuli the athletes should react to), reducing confounding variables. Besides, ET within VR could identify anticipatory cues during gaming and training and between different athletes’ performance levels. Trainers and referees could benefit from being informed about their athletes’ attention within sports-specific scenarios and help them focus on the relevant parts.

Therefore, the present study aims to provide a systematic review of the current state of ET usage in virtual environments and of the possibilities for using this combined technique for sports-related research questions. The broad question in the present review is: What are the technical options for integrating ET in VR, and how can they be transferred to sports science? The review’s purpose is to present current methods applied for using ET in VR and to give recommendations on how to use the combined technology during sports scenarios. The results should provide an overview for using ET in VR and serve as a source of ideas for using this combined technique to measure gaze behavior during sports or to expand and improve the opportunities and applications.

To approach the aforementioned goal step by step, we first listed the HMDs and the ET systems used to capture gaze behavior in VR to give an overview of the accessible devices. Second, we considered all gaze parameters that were determined within each study. In addition, we were interested in whether the extracted gaze parameters were compared between VR and real conditions to reveal possible differences that may occur due to the still largely unexplored technology. Furthermore, we picked studies that include gaze parameters (fixations, saccades, pupil diameter, point of regard (POR), etc.) that are transferable to sports-related purposes. In the discussion, the studies’ contents were summarized and categorized, and additional recommendations for the usage of ET in VR during sports scenarios were made and rated with regard to their practical implementation.

5 Literature review method

5.1 Search method

The literature search was carried out using three databases: PubMed, Scopus, and IEEE Xplore. The search terms were: (virtual AND reality [also VR] AND eye tracking AND gaze). The relevant articles were screened in three stages. Sequentially, the title, the abstract, and the full text of each article were checked. The last search of all databases was conducted on October 16, 2020.

5.2 Study selection

Various inclusion and exclusion criteria were used to filter relevant studies. The following were defined as exclusion criteria: studies that were not research articles, meaning marketing and research-trend studies (therefore, only “Journal Article” was used in the PubMed and Scopus databases as well as in IEEE Xplore, in which conference papers were also considered), and studies in which VR is only used as a distractor, for example to reduce the sensation of pain [43]. Furthermore, studies in which an audience’s social behavior and attention patterns were analyzed were also excluded. Since the latest research shows that HMDs offer the best immersion [85], all studies with VR simulators or desktop VR were excluded. Moreover, studies related to gaze parameters that provided no product-specific HMD or ET information, or that gave no information about participants and their role in the examination, were excluded. Only studies published from 2015 onward are considered within the review (2014 was the cut-off year). All inclusion criteria are also summarized in Fig. 1. As the primary goal of this review is to collect information about studies analyzing or using gaze behavior in VR, studies that did not meet both criteria were not considered in this review.

5.3 Data collection

The following information was extracted from the remaining studies that met all criteria: application areas, the technology used (HMD and ET devices), the gaze parameters used, and differences between gaze behavior in VR and in the RW. The gaze parameters can be separated into static gaze parameters and dynamic gaze parameters, the latter defined as eye movements. The static gaze parameters include the following types: fixations [29], quiet eye [94], and visual pivot point [71]. LaValle (2017) categorized dynamic eye movements into six movements: saccades, smooth pursuit eye movements, vestibulo-ocular reflex, optokinetic reflex, vergence, and microsaccades [39].

The following flowchart is intended to provide a comprehensive overview of the literature search. Ultimately, 38 useful and appropriate studies were included in the review.

Fig. 1 Flowchart of the literature search process

6 Results

Based on the literature search, we found 1161 articles. From these, 38 articles met our search criteria and were therefore used and analyzed for this review.

6.1 Technologies

First, the different application areas in which ET is used in VR were compiled. The two technologies are most frequently combined and utilized in informatics (39.5%), followed by engineering (15.8%), psychology (13.2%), and human-computer interaction (HCI) with approximately 10%. Only 2.6% of the relevant studies came from sports science and ergonomics/healthcare. An overview listing all application areas is given in Table 1.

As mentioned in the literature review method, only articles with HMD devices were included. The HTC VIVE (and HTC VIVE Pro) was the most frequently used VR device with 63.2% of all included articles, followed by the Oculus Rift DK2 with 18.4%, the FOVE 0 with 13.2%, and a self-made HMD, into which the display of the HTC Desire Eye was built and which was used in only one study (2.6%) [40] (see Table 1). To better understand different HMD systems, we refer to [56], who tested various products on criteria including neck strain, heat development, color accuracy, text readability, comfort, and contrast perception. The authors endorse the HTC Vive Pro as the best performer, which may explain its frequent use in the studies included in this review.

This section provides information about the ET devices used within the HMDs mentioned above. All ET systems used in these studies are integrated into the HMDs. The SensoMotoric Instruments (SMI) integrated ET was used most frequently with 34%, followed by the Pupil Labs integrated eye tracker with 24%, the FOVE 0 integrated eye tracker with 13%, the Tobii Pro integrated eye tracker with 13%, the HTC VIVE integrated ET with 7%, and the aGlass integrated eye tracker, the Oculus Rift DK2 monocular add-on cups, and the self-made HMD with endoscope cameras and an infrared light-emitting element from OSRAM with 3% each (see Table 1). The recording frequency of the ET system is also listed since the majority of sports performances take place at a fast pace. However, one should distinguish whether the refresh rate (Hz) of the HMD display or the recording frequency of the integrated ET system was reported, which could not be clearly determined for some of the included studies.

Table 1 Characteristics of the studies (n = 38) using eye-tracking in VR and additionally the comparison of the gaze behavior between VR and real condition

6.2 Commonly regarded gaze parameters for sports-related purposes

To give a better idea of how the combined technique could be integrated into sports, the gaze parameters used in the 38 studies were extracted and are presented in this section. We are interested in those that are often considered and can be transferred to sports-related purposes (for review, see [38]). The recording of the gaze parameters in the respective studies is described below and summarized in Table 1. A field is marked with “No” in the table if no gaze parameters were found in the study or their definition was not described. Note that this review only serves as an overview without providing detailed information on each study. This section provides a better understanding of which parameters can be extracted from ET in VR and how the authors defined them. In addition, it shows how visual perception can be measured using various gaze parameters.

In many studies, the identification and calculation of gaze fixations were used as a basis for further applications. The authors therefore often describe an algorithm for identifying fixations in VR [51], including various parameters such as fixation time and dispersion thresholds [48]. The raw horizontal and vertical gaze point data (often given in a coordinate system with x- and y-values) were used to determine fixations by calculating the orthodromic distance and the speed between successive gaze points [28]. Potential fixations were linked if they occurred 150 ms apart and two group centroids were displaced by less than 1°. Another criterion was to exclude fixations shorter than 100 ms. [8] define fixations as stable viewpoints that can be extracted from the raw gaze data. They used an algorithm that identified a fixation when the gaze had a minimum duration of 80 ms and a maximum dispersion of 3° great-circle distance. This differs from other studies using thresholds from 100 to 250 ms [3], or a minimum of 150 ms and a maximum dispersion of 1° [87]. In this context, “areas of interest” (AOIs) were often defined by identifying fixations, which are defined as focusing on an object for at least 200 ms [35]. The authors analyzed the visual perception process by classifying gaze fixations with a duration of at least 200 ms as consciously perceived and those with a duration of at least 300 ms as visually understood. This method is standard in ET studies in real scenarios [91] but often relies on frame-by-frame analysis [38]. Here, VR has great potential to simplify gaze analyses using AOIs, since gaze vectors within virtual environments can easily be followed when they hit virtual objects. Determining gaze fixations within sports scenarios is essential since picking up relevant visual information is fundamental to performing sports exercises effectively [38]. High-acuity vision is limited to the fovea centralis, which continuously forces us to initiate eye movements.
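The duration and dispersion criteria described above can be illustrated with a minimal dispersion-threshold sketch. It is not the algorithm of any of the cited studies: the function names, the use of unit gaze direction vectors as input, and the choice of the largest pairwise angle as the dispersion measure are assumptions made for illustration. The default thresholds (80 ms, 3°) follow the values reported for [8].

```python
import math

def angular_distance_deg(a, b):
    """Great-circle angle in degrees between two unit gaze direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def dispersion_deg(directions):
    """Largest pairwise angular distance within a window (assumed dispersion measure)."""
    return max(
        (angular_distance_deg(a, b)
         for i, a in enumerate(directions)
         for b in directions[i + 1:]),
        default=0.0,
    )

def detect_fixations(timestamps_ms, directions,
                     min_duration_ms=80.0, max_dispersion_deg=3.0):
    """Return (start_ms, end_ms) for each window that satisfies both thresholds."""
    fixations = []
    start = 0
    n = len(timestamps_ms)
    while start < n:
        # Grow an initial window that spans at least the minimum duration.
        end = start
        while end < n and timestamps_ms[end] - timestamps_ms[start] < min_duration_ms:
            end += 1
        if end >= n:
            break
        if dispersion_deg(directions[start:end + 1]) <= max_dispersion_deg:
            # Extend the window while the samples stay within the dispersion limit.
            while end + 1 < n and dispersion_deg(directions[start:end + 2]) <= max_dispersion_deg:
                end += 1
            fixations.append((timestamps_ms[start], timestamps_ms[end]))
            start = end + 1
        else:
            start += 1
    return fixations
```

Changing the two defaults to, for example, 150 ms and 1° reproduces the stricter criteria mentioned for [87], which shows how strongly the number of detected fixations can depend on the chosen definition.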

In addition, the analysis of saccades is considered in most of the studies. [8] define them by determining the angle differences between successive fixations. One study aimed to examine the vestibulo-ocular reflex (VOR), which stabilizes the gaze position during a gaze fixation while the head is moving. The authors also measured saccadic eye movements using a method based on the “Developmental Eye Movement” (DEM) test [41]. Here, the saccades are divided into “Adapted Time”, “Remaining Time”, and “Transfer Time”. In the study by [53], saccades are identified by determining the speed of the points of regard (PORs): the distance between the eye directions of the next and the previous sample is divided by the time between these samples. According to [53], the amplitude of the total change in eye orientation is another criterion and must be at least 1°, and the gaze should move at least 20% faster than the ball. The authors categorized saccades into corrective and predictive saccades. Especially for saccade detection, it is necessary to work with a high-frequency eye tracker (at least 50 Hz) to obtain reliable results [38]. In [88], the authors examine saccades and the associated saccadic suppression. The suppression occurs before, during, and after a saccade [88]. Saccadic suppression begins with each saccade and can last up to 100 ms, whereas a visual saccade lasts from 20 to 200 ms. The previous two gaze samples recorded by the eye tracker are used to estimate the current angular speed of the user’s gaze. If the angular velocity is greater than 180° per second, this is identified as a saccade. Saccadic performance is analyzed and compared between athletes with different skill levels to examine automatisms within gaze behavior elicited by motor learning [23]. The combination of VR and ET could be helpful to investigate further saccadic eye movements and visual attentional cues within different sports situations. VR has great potential to conduct standardized training sessions and measure participants’ gaze behavior within virtual environments. For example, saccadic eye movement training in web game playing is often used to reduce abnormalities such as longer anti-saccade latencies and lower pro-saccade accuracy [42]. In addition, it has been used after a sports-related concussion to restore saccadic amplitude and velocity [61].
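The velocity criterion described above (angular change between two successive gaze samples divided by the elapsed time, compared against 180° per second as reported for [88]) can be sketched as follows. The function names and the input format (unit gaze direction vectors with strictly increasing timestamps in seconds) are assumptions for illustration, and the additional amplitude and ball-speed criteria of [53] are not included.

```python
import math

def angular_velocity_deg_per_s(prev_dir, next_dir, dt_s):
    """Angle between two unit gaze directions divided by the time between them."""
    dot = sum(x * y for x, y in zip(prev_dir, next_dir))
    angle_deg = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
    return angle_deg / dt_s

def flag_saccade_samples(timestamps_s, directions, threshold_deg_per_s=180.0):
    """Mark each sample whose velocity relative to the previous sample exceeds the threshold.

    Assumes strictly increasing timestamps; the first sample has no predecessor
    and is therefore never flagged.
    """
    flags = [False] if directions else []
    for i in range(1, len(directions)):
        dt_s = timestamps_s[i] - timestamps_s[i - 1]
        velocity = angular_velocity_deg_per_s(directions[i - 1], directions[i], dt_s)
        flags.append(velocity > threshold_deg_per_s)
    return flags
```

At a sampling rate of 50 Hz, a saccade lasting 20 ms would be covered by a single sample interval, which illustrates why a sufficiently high recording frequency matters for this kind of detection.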

Many studies considered the raw data extracted from the ET system to measure gaze accuracy and precision [67]. The raw data included all points of regard (PORs) measured frame by frame. The PORs are necessary to calculate gaze parameters such as fixations, saccades, or the VOR. They can also be used for foveated rendering, which presents a more realistic virtual environment (higher graphical quality) with less computational power, supporting usability. Measuring gaze accuracy and precision is also essential for determining the quality of the ET system, which, together with adequate algorithms, improves the validity of the extracted gaze parameters.
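As an illustration of how accuracy and precision could be computed from raw gaze samples, the sketch below uses one common convention, which is an assumption here and not necessarily the definition used in the reviewed studies: accuracy as the mean angular offset between the measured gaze directions and a known target direction, and precision as the root mean square of the angular distances between successive samples. All function names are hypothetical.

```python
import math

def angle_deg(a, b):
    """Angle in degrees between two unit direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def gaze_accuracy_deg(gaze_dirs, target_dir):
    """Mean angular offset between measured gaze and the target (lower is better)."""
    offsets = [angle_deg(g, target_dir) for g in gaze_dirs]
    return sum(offsets) / len(offsets)

def gaze_precision_rms_deg(gaze_dirs):
    """RMS of sample-to-sample angular distances (lower means less noise)."""
    steps = [angle_deg(a, b) for a, b in zip(gaze_dirs, gaze_dirs[1:])]
    return math.sqrt(sum(s * s for s in steps) / len(steps))
```

Both functions expect samples recorded while the participant fixates a single target; at least one sample is needed for accuracy and at least two for precision.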

Other studies focus on the VOR for analyzing gaze behavior, or develop new interaction concepts by calculating the vergence angle resulting from the vergence movements of the eyes [32, 87]. This is the angle between the visual axes of both eyes when the subject focuses on a point. To calculate the vergence angle, the 3D positions of both eyes are recorded, and the lines of sight of both eyes are converted into angles of rotation using inverse kinematics (using the OpenSim software) and the Iskander eye model. The eye tracker records the 3D position of the eyes and the point of gaze in Unity space. [87] shows an inverse linear relationship between head speed and relative gaze speed when fixating a target. The reflex ensures that the eyes move against the direction of head movement. In [77], gaze accuracy and gaze precision are defined by determining the deviations from the POR. Smooth pursuit eye movements (SPEM) cause the difference in fixation accuracy between the moving target and the static fixation cross [77].
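The geometric definition of the vergence angle given above (the angle between the visual axes of both eyes when a point is fixated) can be sketched directly from the 3D eye positions and the fixated point. This does not reproduce the inverse kinematics pipeline with OpenSim and the Iskander eye model; the function name, the coordinate convention, and the example interpupillary distance of 64 mm are assumptions for illustration.

```python
import math

def vergence_angle_deg(left_eye, right_eye, fixation_point):
    """Angle in degrees between the two visual axes that meet at the fixation point."""
    def unit_axis(eye):
        # Visual axis: direction from the eye position to the fixated point.
        axis = [p - e for p, e in zip(fixation_point, eye)]
        norm = math.sqrt(sum(c * c for c in axis))
        return [c / norm for c in axis]

    left_axis = unit_axis(left_eye)
    right_axis = unit_axis(right_eye)
    dot = sum(l * r for l, r in zip(left_axis, right_axis))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

# Example: eyes 64 mm apart (assumed), target 1 m straight ahead, units in meters.
near_target = vergence_angle_deg((-0.032, 0.0, 0.0), (0.032, 0.0, 0.0), (0.0, 0.0, 1.0))
far_target = vergence_angle_deg((-0.032, 0.0, 0.0), (0.032, 0.0, 0.0), (0.0, 0.0, 2.0))
```

The angle shrinks as the fixated point moves away from the eyes, which is why the vergence angle can serve as an indicator of the depth at which a user is looking.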

6.3 Objectives and possible application into sports

After considering the gaze parameters found in the related studies, suggestions are made, using examples, for their possible application to sports. For a better idea of potential ET applications in VR and future realizable gaze analyses, see Fig. 2 (Table 2).

Fig. 2 An example of possible gaze analyses within virtual environments during movement observation. The yellow circle indicates the binocular Point of Regard (POR) of the participant. Using colliders that are overlaid on different body regions enables the measurement of visual attention (1). (2) shows the sum of observation time in each region. (3) shows the scanning pattern (sequential POR pathway reconstruction and duration of observation indicated by the circle size) during the observation of movement execution

Table 2 Objectives of the studies where the extraction of gaze parameters is reported and the meaning of them concerning the usage into the sports science
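The analyses illustrated in Fig. 2, the summed observation time per body region and the sequential scanning pattern, can be sketched from a list of gaze samples that have already been labeled with the collider hit by the gaze ray. The sample format (timestamp in ms plus a region label, or None when no collider is hit) and the function names are assumptions for illustration; each interval between two samples is attributed to the region of the earlier sample.

```python
from collections import defaultdict

def dwell_time_per_aoi(samples):
    """Sum the observation time (ms) per area of interest.

    `samples` is a list of (timestamp_ms, aoi_label_or_None) tuples in
    chronological order.
    """
    totals = defaultdict(float)
    for (t0, aoi), (t1, _) in zip(samples, samples[1:]):
        if aoi is not None:
            totals[aoi] += t1 - t0
    return dict(totals)

def scan_path(samples):
    """Collapse consecutive samples on the same AOI into (aoi, duration_ms) visits."""
    visits = []
    previous = None
    for (t0, aoi), (t1, _) in zip(samples, samples[1:]):
        if aoi is not None:
            if aoi == previous:
                last_aoi, last_duration = visits[-1]
                visits[-1] = (last_aoi, last_duration + (t1 - t0))
            else:
                visits.append((aoi, t1 - t0))
        previous = aoi
    return visits
```

The visit durations returned by `scan_path` correspond to what Fig. 2 encodes as circle size, while `dwell_time_per_aoi` corresponds to the summed observation time per region.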

7 Discussion

The current review gives an overview of the current state of combining ET systems with VR applications and shows the extraction of various gaze parameters that could be used within sports-related situations. In the end, 38 studies were collected that presented different approaches to measure or use gaze behavior to develop more advanced implementations in VR. This review does not provide a summary of the entire content of each study, but it can be seen as an overview of how ET might be useful for VR scenarios in general. Furthermore, it could be used to collect ideas for sports-related purposes. In this section, the content of each study, including the extracted gaze parameters, is discussed with regard to its practical use for sports-related issues. Several major categories emerged, under which the studies’ content is grouped. In the end, the compiled results support new ideas for using ET in VR during sports scenarios to analyze gaze data or to improve VR applications, making them more suitable for sports.

Tremendous work has been done regarding the integration of ET systems in HMD-based VR for a variety of scientific purposes. Generally, ET data was not only used to expand knowledge of visual perception. In many studies, especially in informatics (brain-computer interfaces, human-computer interaction), the captured gaze data was used to optimize VR applications by exploiting the weaknesses of the visual system. Before presenting the main idea behind each study, we recommend reading studies that clearly describe how ET can be integrated into VR to ensure basic comprehension [16, 40]. This may serve as a supportive step to understand each study’s purpose. After reviewing the literature, different goals or purposes of the studies emerged. Therefore, we sorted them by similar intentions and discussed possible sports references.

7.1 Predictive eye-movements

A few studies focused on fixations that can be captured through the integrated ET systems listed [3, 48, 87]. The main question addressed was how people explore virtual environments and how user attention can be driven in them [87]. In this study, an existing fixation bias could be detected, which was used by the authors to adapt existing saliency predictors. The authors stated that predicting exploratory user behavior could be useful regarding gaze-contingent techniques [87]. The number and duration of fixations, the speed of gaze, and the VOR were considered in their analyses. Predicting gaze is also used to maintain the accuracy of the integrated ET, even if the position of the HMD changes relative to the position of the user’s head [82]. The authors presented a solution for inaccurate gaze estimation by detecting the drift vector without interrupting the VR experience, which could also be usable for foveated rendering [82]. Predictive eye movements were also analyzed in VR using ET during a task that required the participants to hit a bouncing ball [53]. The authors concluded that predictive saccades direct the gaze above the location at which the ball will bounce to allow suitable ball tracking after it touches the ground. [53] could be seen as a kind of pioneer in the use of ET in VR during sports. This feature is suitable for examining anticipatory skill, which is more sophisticated in experts’ visual behavior than in that of novices [27, 76, 79]. Detecting fixations accurately and precisely while perceiving the virtual environment is suitable for examining the quiet eye duration, which is defined as the final fixation before initiating the movement and lasts at least 100 ms [13].

7.2 Recognition of spatial information

Among other uses, the analysis of gaze fixations was applied to the recognition of spatial information for wayfinding in virtual environments [35]. For this purpose, areas of interest (AOIs) were created to observe gaze fixations in specific areas and to obtain information on participants’ cognitive processes of visual perception during wayfinding tasks. Other studies that do not fully match our search criteria exploited a recently developed VR environment in a novel RW 360° scene. Here, ET was also used to determine exploratory behavior in eye movements by extracting the number and duration of fixations, saccades, and head turns [28], or with a focus on color perception [17]. Using similar parameters, a visual exploration of omnidirectional panoramic scenes has been conducted to further examine the correlation between eye and head movements [8]. In sports, it could be helpful to create AOIs not only for measuring wayfinding ability but also for finding relevant visual cues when it comes to learning new movements or inspecting the events on the playing field. Previous findings confirm that observing an individual learning a new movement enhances one’s own performance [55].

7.3 Comparing different methods to analyze gaze behavior

Another study using gaze fixations compared two methods, the classical frame-by-frame analysis and a newly developed algorithm that allows fixation detection in immersive virtual environments [3]. The authors used ray-casting to determine where the user is looking. It could be shown that algorithmic solutions seem to be more efficient than conventional frame-by-frame analyses. It is fundamental to determine fixations’ spatial and temporal characteristics since different definitions could lead to different results, which is also discussed in [48, 67]. A further approach to developing an algorithm for identifying fixations is made by [48]. They presented a guideline for algorithms detecting parameters such as the number of fixations and the percentage of points that belong to a fixation [48]. Overall, detecting fixations in virtual environments works well, since comparisons with conventional methods have verified the use of ET in VR [3, 87].

7.4 Foveated rendering

A high number of studies did not focus on visual perception analysis. Instead, ET was used to exploit the human visual system’s limitations to increase rendering performance, which leads to a more realistic visualization of the virtual environment without increased computational capacity [44, 57, 69, 77, 96]. Foveated rendering accepts a quality loss in the peripheral regions of the FoV to ensure high fidelity in the fovea. HMD technology develops continuously, resulting in increased display resolutions and target refresh rates [69]. The resulting problem is providing real-time rendering while maintaining low latencies, reducing nausea, and ensuring high-level immersion [77, 96]. For example, the HTC Vive Pro (the most used device in VR studies, see Table 1) provides a resolution of 2880 × 1600 pixels and a FoV of 110°. The entire dynamic FoV of the human visual system requires a resolution of 32k × 24k pixels and a FoV of 220° horizontally and 150° vertically [77]. To circumvent this problem, studies using foveated rendering have set the goal of developing different approaches to reduce the quantity of data while ensuring higher fidelity, better immersion, and less motion sickness. One approach consisted of a simple technique for reducing the cost of foveated rendering by leveraging ocular dominance [57]. The authors could show that eye-dominance-guided foveated rendering provides similar results to conventional foveated rendering methods and ensures a higher rendering frame rate.
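The size of the gap between current displays and the full dynamic FoV can be made concrete with a short calculation based on the figures quoted above. Reading “32k × 24k” as 32,000 × 24,000 pixels is an assumption; a binary interpretation (32,768 × 24,576) would give a slightly larger ratio.

```python
def pixel_count(width, height):
    """Total number of pixels for a given resolution."""
    return width * height

vive_pro_pixels = pixel_count(2880, 1600)        # combined resolution quoted in the text
full_fov_pixels = pixel_count(32_000, 24_000)    # assumption: "32k x 24k" read as decimal
ratio = full_fov_pixels / vive_pro_pixels

print(f"HTC Vive Pro: {vive_pro_pixels:,} pixels")
print(f"Full dynamic FoV: {full_fov_pixels:,} pixels")
print(f"Ratio: {ratio:.1f}x")
```

Under this assumption, covering the full dynamic FoV at the quoted resolution would require roughly 167 times as many pixels per frame as the HTC Vive Pro renders, which illustrates why restricting full quality to the foveal region is attractive.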

To guarantee smooth foveated rendering, the integrated ET system has to deliver high accuracy and precision to meet perceptual requirements [77, 96]. Therefore, the authors let the participants fixate on static and dynamic targets. Sufficient accuracy was measured for static stimuli, but not for moving targets [77], which is in line with other studies [67]. In this context, another study evaluated the perceptual abilities of human peripheral vision in combination with a foveated rendering technique [69]. The authors criticized that previous foveated rendering methods exhibit significant head- and gaze-dependent temporal aliasing, which harms immersion and increases possible distraction. Their system ensures a reduced rendering workload, a constant visibility sampling rate, avoidance of gaze-dependent blurring artifacts, and fewer shading pixel quads without significantly impacting perception. To concentrate more computational power on focused regions, the limitation of the human visual system through the depth-of-field (DoF) was used to remove high-frequency signals from the visual periphery by its inherent blur [96]. The approaches in all studies considering foveated rendering showed significant quality improvements. They can serve as recommendations for optimizing the design of virtual environments without obtaining new expensive hardware components. This development is interesting for all research areas because it could simplify accessibility through lower hardware costs and improve graphical features, with synergy effects such as higher immersion, a stronger feeling of being present, and the perception of a more detailed environment. Especially in sports, fast situations occur, and a smooth process must be ensured to maintain the user’s impression of presence. Moreover, since a lot of ET data and movement data, such as joint angles, can be recorded, shifting the computational power to the relevant mechanisms within the scene could be beneficial.

7.5 Facial reenactment

Another innovative use of ET data in VR was real-time gaze-aware facial reenactment [89]. The purpose was to create a novel form of video teleconferencing in which two people wearing HMDs can participate in a VR conference that includes a photo-realistic 3D rendering of the interlocutor. The transfer of facial expressions and natural eye appearance was hard to realize because the HMD occludes the majority of the face; the authors found a solution by relying on ET devices combined with VR [89]. An integrated ET system captures eye motion in real-time (for example, blinks), which is crucial for preserving the natural eye contact that ensures realistic conversations [89]. With similar technology, another study found that available stereoscopic depth cues support the recognition of images of faces [47]. Those findings could be relevant for communication in future VR applications that allow multiple users to participate in one training session.

7.6 Saccadic eye-movements

A further limitation of the human visual system can be exploited to enable infinite walking in virtual environments [88]. The size of the physical surroundings can have a massive impact on VR experiences. Therefore, [88] developed a method that detects saccadic suppression and head movements in order to redirect the user without being noticed. They showed that rotation gains could be applied without visual distortion or simulator sickness [88]. As a result, large open virtual spaces can be explored in small physical environments (the authors reported using an area as small as 12.25 m²). This presupposes a system that can detect rapid eye movements (saccades). The authors also developed a method that provokes the user to make saccades without losing focus on the essential task demands. The camera rotation (12.6°) could not be detected by the user as long as the gaze velocity was above 180°/s by enhancing the visual search performance. The frequency and duration of saccades, the tolerance of image displacement during saccadic suppression, and the ET-to-display latency are emphasized as the important factors that enable redirected walking. The authors also mentioned that the effectiveness of their method increases as the size of the physical environment expands [88]. This method is relevant for all VR applications and can considerably enhance the perception of virtual environments since rotation gains can be increased without introducing visual distortions or simulator sickness. In sports situations, movements have to be initiated quickly, and many rotations have to be completed at high speed. This method could also make VR training possible at home with limited physical space.
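
The velocity criterion mentioned above can be illustrated with a simple velocity-threshold check. This is a minimal sketch and not the detection method of [88]: the function names, the sample format (azimuth and elevation in degrees at a fixed sampling rate), and the example data are assumptions made for illustration; only the 180°/s threshold is taken from the text.

```python
import math

def angular_velocities(samples, rate_hz):
    """Approximate gaze velocity (deg/s) between consecutive samples.

    samples: list of (azimuth_deg, elevation_deg) gaze directions recorded
    at a fixed sampling rate. A small-angle approximation is used, so the
    angular step is treated as a Euclidean distance in degrees.
    """
    dt = 1.0 / rate_hz
    return [
        math.hypot(a1 - a0, e1 - e0) / dt
        for (a0, e0), (a1, e1) in zip(samples, samples[1:])
    ]

def redirection_windows(samples, rate_hz, threshold_deg_s=180.0):
    """Flag the sample intervals in which gaze velocity exceeds the threshold."""
    return [v > threshold_deg_s for v in angular_velocities(samples, rate_hz)]

# Hypothetical 250 Hz recording: fixation, a fast gaze shift, fixation again.
gaze = [(0.0, 0.0), (0.1, 0.0), (1.5, 0.0), (3.0, 0.0), (3.05, 0.0)]
flags = redirection_windows(gaze, rate_hz=250)
print(flags)
```

In this sketch, only the two intervals belonging to the fast gaze shift are flagged, so a camera rotation would be applied only there. A practical system would additionally have to account for the ET-to-display latency emphasized by the authors.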

Related to walking methods within virtual environments, [45] dealt with a navigation method using ET data (steering in the direction to walk), ensuring continuous movement to the target instead of jumping point by point. Two methods were compared, eye-gaze steering (gaze ray where the eyes converge) and head-ray steering (ray emitted by the headset), to examine participants’ eyeball activity, which is crucial for VR training inducing lucid dreaming. Here, the interplay between saccades and fixations is crucial for realizing this method. Whether and to what extent these methods can also be used for sports is questionable, since in sports the direction of locomotion can deviate from the gaze or head direction. This method may promote the activity of the eyeballs, which in turn can lead to physiological adaptations that are also provoked during visual training.

Visual search behavior is relevant for athletes’ performance, especially when parafoveal vision is used to gather more information at once [72]. The importance of saccadic eye movements is also mentioned in [41], which extended the measurement of saccades by integrating the Developmental Eye Movement (DEM) test into VR. The authors showed that more parameters could be extracted more easily since head movements are inevitable during natural behavior and gaze analyses can be completed with the HMD fixed on participants’ heads. This enables collecting more detailed data such as the time the eyes stay on the target or the delay between perceiving the target and the initiated movement [41]. Being able to detect saccades and fixations could be informative concerning the role of different sources (foveal versus peripheral) during opponents’ attacks [80].

7.7 Interaction of eye, head and body movements

Another study using ET data in virtual environments aimed to examine the interaction between eye, head, and body movements and to derive design recommendations for human-computer interaction [86]. The authors stressed the importance of gaze behavior since inferences about attention, interest, and intent can be drawn from it, and gaze shifts can also initiate torso shifts toward the objects observed by the user [86]. One aim was to make VR applications more comfortable to use, for example by placing UI elements at a comfortable position in the FoV to reduce the initiation of head movements, or by avoiding placing them at the edges of the screen, where the accuracy of the integrated ET system decreases and interaction quality could suffer. This could be essential for future stand-alone training applications in VR. Regarding practical use, they also highlighted multiple advantages of ET in VR, such as the blocking of external light and a more stable system that is mounted closer to participants’ eyes and is less affected by shifts triggered by head movements. In this regard, [15] developed a fully convolutional network (a deep learning system) that can identify corneal reflections during position shifts of the HMD with less memory and faster execution than other deep learning systems.

A few studies included analyses of gaze scanning or pupil dilation, which reveal clinical implications for patients suffering from visual field loss [14], examine individual light perception [51], or provide additional information about working memory [83]. Similar to the approach in [57] of using the combined technology to analyze the gaze data of each eye separately, the researchers investigated whether peripheral vision is limited by the performance of the “better eye” [14]. It was also associated with head movements, as in other studies [86]. By letting the participants conduct a search task, they showed that the weaker eye still provides essential information in the periphery of binocular sensitivity, which is not in line with previous findings [14]. One reason for examining the role of peripheral vision in sports was to determine the gaze anchor, which is a cue-optimized position that allows the monitoring of peripheral cues and avoids saccadic suppression [92]. With ET in VR, more information can be collected due to the endless ways of modeling and programming sport-specific scenarios. This could be confirmed in [31], in which users’ gaze behavior in VR was investigated during dynamic scenes under free-viewing conditions to better understand visual attention and predict the user’s gaze position. This could be important for VR content design, gaze-controlled interactions, and gaze-contingent rendering. It also shows that ET in VR can be integrated into more complex scenes, including dynamic visual stimuli that attract the user’s attention in a way comparable to sports situations.

7.8 Differences between the virtual and real environment

Other authors were concerned that HMDs cannot compete with the human eye’s ability in terms of the high dynamic range of brightness and color [51]. They recommended adapting simulations to each user using gaze parameters such as pupil size and gaze direction to increase visual acuity. In contrast to other studies, the authors reported the measured performance of the ET system in terms of its accuracy (∼ 1.0°) and precision (∼ 0.08°) instead of presenting the manufacturer’s data. Considering the different task demands, the data quality is in line with other studies except for a slight loss in gaze accuracy [67]. In the latter study, the data quality of the gaze behavior was compared between VR and RW. In VR, the participants wore an HTC Vive HMD with an integrated SMI eye tracker. In the RW condition, the binocular SMI ET Glasses 2.0 were used to record participants’ gaze behavior. The participants had to complete various tasks, which were visualized on a 23.5-inch monitor, presented in VR as a virtual monitor perceived through the HMD and in RW as a real setup with the same physical arrangement. This study aimed to record gaze accuracy and precision during three tasks in order to investigate differences between VR and RW. First, the participants observed static crosses at the edges of the screen. Second, they had to pursue a target moving in an infinite loop from the same distance to the monitor as in task 1. During the third task, the distance to the monitor was varied, and a fixed cross in the monitor’s center had to be fixated for at least three seconds. The results showed no significant differences in gaze accuracy between VR and RW when the participants looked at static targets at a short distance (1 m). However, there were small but significant differences when the targets were placed at different distances and large significant differences when tracking moving targets. In terms of precision, the data quality was lower in VR than in RW in all tasks. Nevertheless, the authors stated that the precision values are still in an acceptable range, and therefore, they endorsed the use of ET in VR.
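
Accuracy and precision of this kind are commonly computed as the mean angular offset between gaze and target and as the root mean square (RMS) of sample-to-sample angular distances, respectively. The sketch below assumes these common definitions; the exact procedure used in the cited studies is not restated here, and all function names and example values are illustrative.

```python
import math

def angle_between(u, v):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_angle))

def accuracy_deg(gaze_dirs, target_dir):
    """Mean angular offset between recorded gaze directions and the target."""
    return sum(angle_between(g, target_dir) for g in gaze_dirs) / len(gaze_dirs)

def precision_rms_deg(gaze_dirs):
    """RMS of the angular distances between consecutive gaze samples."""
    steps = [angle_between(a, b) for a, b in zip(gaze_dirs, gaze_dirs[1:])]
    return math.sqrt(sum(s * s for s in steps) / len(steps))

# Hypothetical fixation: gaze is constantly 1 degree off a target straight ahead.
target = (0.0, 0.0, 1.0)
offset = math.radians(1.0)
gaze = [(math.sin(offset), 0.0, math.cos(offset))] * 5

acc = accuracy_deg(gaze, target)
prec = precision_rms_deg(gaze)
print(f"accuracy: {acc:.2f} deg, precision (RMS): {prec:.4f} deg")
```

The example separates the two measures: a constant offset yields poor accuracy but near-perfect precision, whereas noisy samples centered on the target would show the opposite pattern.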

7.9 Scanning pattern

An analysis of gaze scanning patterns revealed an impact of the engineering information format provided, such as a 2D isometric drawing, a 3D model, or a VR model, on task performance in a construction operation task [83]. One issue is the visualization of additional information, which could lead to more cognitive load and impair the ability to extract essential information [83]. Besides, the authors emphasized the correlation between gaze scanning patterns and levels of working memory. By integrating different visualization systems, they aimed to compare them and investigate the additional effect on working memory. The results showed improved task performance in the 3D and VR model groups due to the participants’ high level of immersion and sense of presence.

7.10 Vergence-accommodation-conflict

A small part of the studies found addresses the analysis of eye movements, more specifically the vergence-accommodation conflict (VAC) [32, 36]. One study focused on how humans use allocentric information for memory-guided reaching of visual targets in depth [36]. The authors found that binocular depth cues such as vergence and retinal disparity provided essential information for coding target location in depth and revealed a preferred use of allocentric representations [36]. The VAC is the main contributor to visual fatigue when perceiving virtual environments [32]. Therefore, the authors tested how HMDs could affect the vergence system in order to make further recommendations on how a virtual environment must be constructed to allow unrestricted and comfortable interaction through eye gaze. They recommended visualizing gaze-controlled virtual objects at a specific size and distance to enhance performance.

7.11 Synchronization with other measurement systems

Interaction in VR is not limited to ET alone. A new interaction method has been developed that combines ET, VR, and electromyography (EMG) and allows physiological signals to be used as input [65]. The idea was to create a hands-free interaction method using ET data for pointing and EMG data (muscle activity in the forearm) for selecting. Because the extracted data are reliable and valid, this method improves VR interaction and enhances the VR experience [65]. As in all other studies that focused on gaze interaction in VR, the accuracy of the integrated ET system is emphasized as an essential factor to ensure trouble-free usage. [98] confirmed the importance of precisely detecting the users’ gaze, secured by adjusting the IPD. In a comparison of different interaction methods, dwell gaze (fixating the target for a fixed duration to trigger the interaction) scored lowest, whereas the use of a motion controller (conventional approach) and the combination of ET and EMG were similarly preferred [65]. [46] used a gaze-based method named “OrthoGaze”, which allows the position of a virtual object to be manipulated using only the eyes and head gaze.
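
The dwell gaze principle described above can be sketched as a counter over consecutive gaze samples that hit the same target. This is an illustrative sketch only: the function name, the per-sample target labels, and the dwell duration are hypothetical, since the text does not state the values used in [65].

```python
def dwell_selections(target_ids, rate_hz, dwell_s=0.5):
    """Return (sample_index, target_id) for every completed dwell.

    target_ids: per-sample label of the object currently hit by the gaze,
    or None if no object is hit. A selection is triggered once the gaze has
    stayed on the same object for dwell_s seconds without interruption.
    """
    needed = max(1, int(round(dwell_s * rate_hz)))  # samples required
    selections = []
    current, run = None, 0
    for i, tid in enumerate(target_ids):
        if tid is not None and tid == current:
            run += 1
        else:
            # Gaze moved to another object (or to nothing): restart counting.
            current, run = tid, (1 if tid is not None else 0)
        if tid is not None and run == needed:
            selections.append((i, tid))
    return selections

# Hypothetical 10 Hz stream: a short glance at A, then a long look at B.
stream = ["A"] * 3 + [None] + ["B"] * 6
picked = dwell_selections(stream, rate_hz=10, dwell_s=0.5)
print(picked)
```

The short glance at A does not trigger a selection, while the sustained fixation on B does. The sketch also shows a drawback of dwell-based selection: every interaction costs at least the dwell duration, which may contribute to its low rating.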

An additional benefit of VR is the synchronization with infrared motion capturing systems such as Vicon Nexus [68] and Vicon Shogun [66], or with systems that generate inputs, such as a smartphone keyboard used as a typing interface [34]. Regarding sports, the visualization of a virtual avatar can simplify the visual occlusion of body parts, amplifying the knowledge about the use of different visual cues [6]. Further systems have been integrated that allow measuring neurophysiological signals during training under stress in VR accompanied by ET [84] or testing human-robot interface designs [100]. Functional Near-Infrared Spectroscopy (fNIRS) is an innovative, easy-to-apply, portable method that can measure brain activity during movement [12]. It has also been integrated into VR with ET, enabling the investigation of possible correlations between task performance and gaze movement patterns or neural features [84]. With this setup, the impact of stressors on effective learning can be examined. Physiological measurements have also been integrated within VR scenes by using EEG signals to examine participants’ attention further [19]. Another research direction focused on user identification methods based on human kinesiological movements and gaze data [63]. For example, this method is essential when multiple users simultaneously perceive the same virtual environment, or for security purposes to identify individual body properties. Gaze data such as fixations, saccades, and accelerations are additionally used as biometric features to improve the recognition of individual characteristics.

7.12 Limitations of the combined technology crystallized by the outcome of the studies

Most studies discussed limitations concerning their own content, but only a few reported the boundaries of today’s technology. Even though the visualization of virtual environments has developed further, the problem of motion sickness still exists, which harms participants’ performance, for example through slower walking and stopping to look around the scene [16]. Longer sessions can also lead to increased eye fatigue, which results in a loss of gaze accuracy. Gaze accuracy, however, is crucial both for analysis and for methods that improve scene visualization without the purchase of additional hardware with higher computational power (foveated rendering). A loss in gaze accuracy was also discussed in [65], who reported that repeated calibration was necessary to achieve an acceptable level of accuracy. They also found differences between participants related to several factors such as eye color, size, or the relative position of the eyes [65]. One study criticized the restricted FoV during the experiment since it makes it hard to compare visual perception or search strategies with realistic conditions [67]. The lack of real construction operations is also mentioned in [82], who conducted their examination in a controlled laboratory environment instead of complex RW scenarios. [67] also revealed significant differences between the ET system integrated in the HMD and the mobile one. Even though the system in the HMD is the less precise one, its quality is still sufficient compared to other measurement systems. In addition, the authors were restricted to older ET systems (developed in 2015) because their focus was on the comparison between them. More sophisticated and more highly developed ET and VR systems are now available, from which increased quality can be expected. Nevertheless, a loss in data quality was found when the participants had to follow moving targets, which was also noticed in [77]. This could occur through tracking latency, the unpredictability of the target’s movement, or the tracking precision itself, which is an issue for smooth pursuit eye movements. Thus, the velocity at which visual stimuli are presented could harm accuracy due to physiological constraints [77].

In terms of the visualization of the virtual environment, it would be helpful if manufacturers provided design guidelines for interactive objects. They should also present information about the lenses used in order to specify the zone of comfortable viewing [32] or provide an application programming interface (API) for further embedding [63]. A further limitation of the VR system itself was also mentioned: the accommodation of the eye lenses, as it occurs in RW, is crucial for depth cues, but the distance between the display and the eyes never changes [36]. The resulting less precise depth perception should be considered, as it makes it hard to transfer all results entirely to RW scenarios. One experiment reported a loss in accuracy caused by eyelashes and provided a solution [40]. Concerning the gaze interaction method OrthoGaze, the authors stated that usability increases if users can change their viewpoint [46]. The results also showed low accuracy on the ground plane; it is therefore recommended to check which areas in the virtual environment play an essential role in the interaction. To improve the usability of gaze interaction concepts, an optimized visualization based on the user’s viewpoint is required [46]. [51] stated that the visual acuity perceived under RW lighting was impossible to reproduce in the HMD due to insufficient hardware performance, which was also discussed in [69]. [53] also reported methodological limitations and emphasized the importance of using suitable hardware components (an ET system with high temporal accuracy) to make statements about performances with high temporal constraints. Foveated rendering remains a challenge since artifacts due to temporal aliasing of moving objects, phase-aligned aliasing, and saliency-map-based aliasing can occur (e.g., [57]). We refer to [86], who showed that gaze is to be understood as multimodal, involving eye, head, and torso movements; therefore, some limitations can be excluded for different user groups. [89] presented a way of real-time gaze-aware reenactment, which was limited to the modification of only one eye; a second IR camera would need to be included to collect more data.

8 Limitations

This review still has some limitations that need to be mentioned. First, it merely gives interested readers an overview of different applications going in different directions, but it does not show a straightforward way to implement such software and hardware components. Second, incorporating the research fields was not easy since the research observed is highly interdisciplinary, and the specific background was not always easy to follow. Third, although many studies have been found and considered in this review, the search should be expanded to more conference papers (not only IEEE) since new and innovative technologies are presented there, helping to understand the potential of ET in VR. The current review should give an impression of how the combined technology can be used in future sports applications and encourage new ideas.

9 Conclusions

The current review gives an overview of ET usage in VR and insight into the commonly used hardware components. In addition, it shows different categories and application fields in which ET data are used to analyze participants’ gaze behavior within a virtual environment, improve usability, and enable more realistic viewing conditions without increasing the computational requirements. Except for one example, no study has used this advanced technique to record and analyze gaze behavior during sports situations. The content of the studies leads to new ideas regarding sports science research, and the continuous development of today’s technology supports the use of VR including ET.

In general, it has been shown that individual gaze parameters such as gaze fixations, saccades, vergence, and accommodation can be recorded with newly developed algorithms. Furthermore, in some studies, AOIs were also defined, and gaze data within them were recorded at high frame rates (comparable to real conditions). Besides, the advantages of VR concerning the collection of viewing angle data were often mentioned, which can be extracted through the precise acquisition of distances from the viewing point to the object fixed to the camera. Due to the advantages of VR itself, any sports situation can be played without additional costs, time expenditure, etc. Although HMDs equipped with ET are generally more expensive than those without, the development of VR scenarios is free of charge since game engines such as Unity and other freeware for modeling and simulating, e.g., Blender, are available and usable even for people without an informatics background. This is supported by numerous tutorials giving an overview of the main functions of each program. Future virtual training sessions may be more cost-efficient than real training settings, e.g., no costs for personnel (coaches, opponents), infrastructure-related costs such as travel, or material costs. The integration of further measuring systems in VR combined with ET, such as fMRI or EMG, will enable even more user-friendly operation and help collect more data within one scene in terms of biofeedback.

Compared to classical systems, the integrated ET in VR offers several advantages for measuring the gaze behavior of athletes in different sports scenarios. For example, when fast movements are made, one big issue for generating valid and reliable data sets from the ET system is the shifting of the HMD’s position relative to the head. A solution for this has been provided, which is crucial for analyzing gaze behavior during sports situations. Furthermore, ET offers the possibility of making the small range of sharp vision even more realistic using today’s computing systems. For the VR experience, the degree of immersion is of immense importance, which can be supported by improved graphic representation and innovative forms of infinite walking. This would be of great advantage, especially in sports scenarios, since vast playing areas exist, for example, in football, soccer, basketball, etc. When fast movements occur, the measuring systems must provide high frequencies and accurate measurements. Many studies stated suitable refresh rates of the ET systems (a minimum of 60 Hz and a maximum of 250 Hz in VR) for sports-related movements, whereas in the real world the standard of ET devices (at least for mobile ones) is 25–60 Hz. Especially for saccade detection, fast systems are necessary to measure saccadic movements over time. Here, the refresh rate of the HMDs should also be considered since it can differ from that of the integrated ET system and influence the output. In this context, the problem frequently occurs that no information is provided or only the manufacturer’s specifications are reported, which are not always in line with real conditions.

10 Future directions

Considering the usage of ET in VR in sports-related contexts, the studies’ contents provide food for thought on how to further reveal athletes’ visual attention. The compiled results have shown that the existing methods are feasible but still need improvement.

Nevertheless, we must emphasize that this equipment will not be available or accessible to everyone. Regarding the content of each study, the understanding of the development and the later application possibilities is changing. The representation of facial features could also be used in sports for a particular form of training. The HMD with integrated ET can detect gaze behavior and head movements, which also play a crucial role in sports. Based on the results of the current review, it can be suggested that analyzing gaze behavior during sports scenarios in VR is realizable and that the advantages can be used to the full extent. The combined technology can be used to examine the correlation between eye and body movements since both data can be recorded simultaneously, and previous studies have already tried to understand the sequential order of the gaze, visual, and motor systems in planning complex movements. Compared with the conventional method (ET and video presentation), it is especially suited for determining depth perception cues. For example, in VR scenarios, the binocular gaze vector is represented by a ray that can hit observed objects and gives feedback on whether objects at greater distances were really fixated by the participants. In real-world settings, 2D maps are often used as an indirect indicator of users’ gaze position. Furthermore, videos do not represent realistic situations, especially in sports, whereas HMD-based virtual scenarios allow the participant to experience complex situations while profiting from high immersion. Since existing algorithms can already compensate for the position of the HMD relative to the head and the eyes, it is conceivable to use it also in faster and mobile situations. This kind of training also allows increased variation and flexibility, such as changing positions on the field within a team sport. Gaze parameters could be collected during all these situations and could reveal the visual attention of the athletes. Trainers could comprehend the athletes’ actions in different situations and direct the athletes’ attention to essential cues. Furthermore, it can be used to understand the visual behavior of advanced athletes compared to novices or to develop visual training scenarios within VR. Due to the advantages of VR, this can be done more easily than in the real environment, for example, by calculating 3D gaze points and gazed objects using colliders.
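
The idea of determining gazed objects and 3D gaze points with a gaze ray and colliders can be sketched as a ray-sphere intersection test. This is a minimal sketch under stated assumptions: spherical colliders, a single combined gaze ray, and hypothetical object names and coordinates; it does not reproduce the API of any particular game engine.

```python
import math

def ray_sphere_distance(origin, direction, center, radius):
    """Distance along a unit-length ray to the first hit with a sphere, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(x * d for x, d in zip(oc, direction))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None                      # ray misses the sphere
    root = math.sqrt(disc)
    t = -b - root                        # nearer intersection
    if t < 0:
        t = -b + root                    # origin lies inside the sphere
    return t if t >= 0 else None

def gazed_object(origin, direction, colliders):
    """Return (name, gaze_point) of the nearest collider hit by the gaze ray."""
    length = math.sqrt(sum(d * d for d in direction))
    unit = [d / length for d in direction]   # normalise the gaze direction
    best = None
    for name, (center, radius) in colliders.items():
        t = ray_sphere_distance(origin, unit, center, radius)
        if t is not None and (best is None or t < best[1]):
            best = (name, t)
    if best is None:
        return None
    point = tuple(o + best[1] * u for o, u in zip(origin, unit))
    return best[0], point

# Hypothetical scene: a ball 5 m ahead and a larger goal area 20 m ahead.
scene = {"ball": ((0.0, 0.0, 5.0), 0.5), "goal": ((0.0, 0.0, 20.0), 2.0)}
hit = gazed_object((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene)
miss = gazed_object((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), scene)
print(hit, miss)
```

Because the nearest hit is returned, the sketch reports the ball rather than the goal behind it, together with the 3D gaze point on the ball’s surface. This is the kind of depth-resolved information that 2D gaze maps from video-based setups cannot provide directly.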

Further examinations in the sports-related context should clarify the suitability of the combined ET and VR technology since only one study was found in which this combined technology was used to measure gaze behavior in a virtual environment in a sport-specific scenario. Conclusions about the use of this technology could therefore only be drawn indirectly.