Interacting with and observing dynamic objects is one of the most common activities of intelligent life. Humans routinely analyze their surroundings to find and react to dynamic objects. Example tasks are manoeuvring in traffic or playing basketball. Within those task environments, analyzing the movement of dynamic objects, like a car or a ball, helps us to evaluate our current situation. This enables us to select an action that improves the outcome in the immediate future, such as braking in front of a traffic light or catching the ball. Knowing which information is gathered to perform a task therefore increases insight into cognitive processes. The motion of the eye in combination with the used sources of information is defined as eye movement (Seifert, Rötting, & Jung, 2001). This article describes an Integration Guideline for Dynamic Areas of Interest (IGDAI) for capturing and integrating dynamic objects into an experimental setup, and combining them with eye movement.

Over the past decades, human-computer interaction analysis has become more attractive because the best product needs not only the best functionality but also an intuitive control concept. This also led to an increase in popularity for eye-movement analysis, as it is an objective method for measuring the information-gathering process of humans interacting with interfaces. Eye-movement analysis has been used in a variety of domains, such as reading (e.g., Castelhano & Rayner, 2008; Granka, Joachims, & Gay, 2004; Rayner, 1995) or image processing (e.g., Henderson, Weeks, & Hollingworth, 1999; McCarthy, Sasse, & Riegelsberger, 2004; Rayner & Pollatsek, 1992). The first-level analysis of eye-movement data usually focuses on the separation between fixations and saccades. Second-level analysis concentrates on summarizing first-level results in relation to the task environment, such as the combination of an area of interest (AOI) with the distribution of eye movement, scan paths, or transition matrices. AOIs are defined regions where specific information is presented within the task environment (e.g., screen) (Duchowski, 2007).

The first-level analysis requires an algorithm for separating raw eye-movement data into fixations and saccades. Numerous such algorithms are well documented and can be divided into real-time versus post-analysis approaches (Duchowski, 2007) and into velocity-, area-, or dispersion-based approaches (Salvucci & Goldenberg, 2000). However, in dynamic task environments, smooth pursuits (Reimer & Sodhi, 2006) prevent a clear separation between fixations and saccades.

The second-level analysis usually includes information about the task or the environment and combines these with the eye movement. A common approach is the application of AOI. Combining AOI with eye movement allows insights into the information-gathering process within the environment, such as the meaningfulness and visibility of a status bar integrated into a website. The maximum amount of information gathering is reached by combining all visible information and all actions performed by the human. Gathering this kind of data normally takes a lot of planning and consideration of the task and the environment, especially in the case of a dynamic environment with moving objects.

This article focuses on the combination of eye movement with dynamic AOI within artificial environments. Studies combining complex dynamic environments and eye movement occur frequently in the domains of driving (e.g., Akamatsu et al., 2001; Palinko, Kun, Shyrokov, & Heeman, 2010), marketing (Bulling & Gellersen, 2010; Josephson & Holmes, 2006; Nielsen, 2006) or air traffic management (Bruder, Leuchter, & Urbas, 2003; Landry, Sheridan, & Yufik, 2001; Möhlenbrink, Oberheid, & Werther, 2008). These studies use eye movement within artificial environments that comprise moving objects that are partially or completely controlled by participant actions, such as the view out of a moving car, interaction with web pages, or air traffic controller interfaces. The moving objects within these different environments are of importance to the tasks. However, these studies concentrate on the application of dynamic AOI and do not aim to improve the method itself; some of them do not even publish their method. Comments in the conclusions of those studies state that combining eye-movement data with dynamic objects is extremely time consuming and specialized only for the described environment (e.g., Papenmeier & Huff, 2010).

As described above, regions within the task environment that are a source of information and useful to the eye-movement analysis are considered as AOI. Attributes to describe an AOI within a task environment are identifier, shape, and position. The identifier is unique for each AOI within a task environment. Shape defines the area of the AOI. Position defines the center of the shape in relation to the task environment. Static and dynamic AOIs can both be defined using identifier, position, and shape. The distinction between static and dynamic AOIs can be explained using a coordinate system as reference for the areas over time. Static AOIs have the same position, shape, and size in relation to the reference system at all times. Dynamic AOIs change their position, shape, orientation, or size in relation to the reference system over time. An example of a dynamic task environment is a radar screen, because it contains both types of AOIs. A radar screen normally displays a two-dimensional top view of airspace sectors. Assuming that the radar screen is fixed over a particular sector, it provides a two-dimensional coordinate system as reference for AOIs. The radar screen sometimes shows motor highways, rivers, or forests to help the air traffic controller with orientation. These areas never move in relation to the screen and therefore represent static AOIs. The radar screen also shows aircraft that move through the sector. The moving aircraft are AOIs that change their position in relation to the screen and are therefore dynamic.

Considering the differences between static and dynamic AOIs, an analysis of eye-movement data in combination with only static AOIs would not provide a comprehensive view on the used sources of information. The solution is to define dynamic AOIs for moving objects and combine them with eye-movement data in post processing. Thereby the separation of the dynamic information and the static background becomes possible and increases the significance of the data analysis.

The radar screen example above demonstrates the three main issues for applying dynamic AOIs. The first issue (Set-up requirements) concerns configuring the environment to capture the needed information. The second issue (Storage requirements) concerns identifying the information that has to be recorded and deriving a suitable data structure. The third issue (Data Analysis Methods) concerns matching eye movement and AOIs. All three issues depend on each other; this means that the solution to the first issue influences the solutions to the second and third issues. An overall approach that addresses all issues and is universal for the application to any task is needed for applying dynamic AOIs to any environment. Below, an overview of existing approaches for combining eye movement and dynamic environments is provided.

Existing approaches

Research connected to dynamic AOIs ranges from theoretical descriptions (e.g., Bruder et al., 2003) to the application of methods that are customized to the studied task or environment (e.g., Papenmeier & Huff, 2010). The three issues identified above are usually covered in the method section by briefly describing the individual solutions. Only a few publications try to give an overall view covering all three issues connected to dynamic AOIs. To assess the different approaches, we have to investigate how the three issues are solved.

The approaches can be categorized into online and offline. Applying dynamic AOIs online means matching the gaze vector with the dynamic objects within the environment at the time the eye movement is recorded. In contrast, applying dynamic AOIs offline means the matching is made in a post-processing step. One example of online matching of eye movement and dynamic AOIs is presented by Sennersten et al. (2007). They combined an eye-tracking system (Tobii) and a game engine (e.g., CryEngine (Crytek GmbH, n.d.)) and transferred the gaze position and direction from the eye tracker online into the virtual environment generated by the game engine. Thereby they cover the Set-up requirements and the Storage requirements. Then they calculate where the eye vector hits the first surface within their scene. The name of this surface is then stored for further analysis. This covers the Data Analysis Method issue. Avoiding the post processing of eye movement and dynamic AOIs by online matching is time efficient. However, playback of the analysis is sometimes difficult because the complete virtual environment, all participant actions, and all eye vectors have to be stored. This problem is aggravated if the analysis has to be rerun, for example because some objects (e.g., windows) are to be removed from the analysis (Papenmeier & Huff, 2010) or because the calibration of the eye tracker deviated slightly. The online matching of dynamic AOIs also has the disadvantage of not applying any method for correcting the eye-movement data (Hornof & Halverson, 2002; Zhang & Hornof, 2011). Given the research presented above, the authors believe that, at present, methods of replay and post correction are necessary in the field of eye-movement analysis. Therefore, the offline matching of dynamic AOIs is pursued in the following research.

A straightforward approach for realizing offline matching of dynamic AOIs is to use an eye-tracking system and a scene camera. Those combined generate videos that include the scene and the calculated eye position, also covering Set-up requirements and Storage requirements. The Data Analysis Method is covered while post processing the dynamic objects. They are identified offline and manually (e.g., Land & McLeod, 2000; Sennersten, 2004) by separate rating of every frame of the combined video. This approach is time consuming and subjective since the position of the eye in relation to dynamic AOIs can be interpreted differently.

BeGaze 3.0, an analysis software developed by SMI, also supports an offline definition of dynamic AOI. Here the analyst can define key frames with AOI and the software interpolates the positions of the AOIs depending on the number of frames between the key frames. The dynamic AOI can even be stored for repetitive use. However, the interaction between operator and environment changes most scenarios and creates an individual set of moving objects (and therefore also AOIs) for each run, which limits the repeated use of stored AOIs in BeGaze. An automated procedure for synchronizing video files with eye-movement data is presented by Papenmeier and Huff (2010). They present DynAOI, which matches the gaze position with three-dimensional (3D) models underlying the presented video files. The approach converts the eye-movement data captured on a two-dimensional screen into a three-dimensional representation of the events within the presented video files. The gaze vector within the three-dimensional representation is then used to determine all objects on its path, with the objects then serving directly as AOIs. Huff, Papenmeier, Jahn, and Hesse (2010) use this approach with real-time generated videos.
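The key-frame idea can be illustrated with a minimal sketch. It assumes linear interpolation between two key frames, which is an assumption made for illustration and not a documented property of BeGaze; the function name and the data layout are hypothetical.

```python
def interpolate_position(key_a, key_b, frame):
    """Linearly interpolate an AOI centre between two key frames.

    key_a, key_b -- (frame_index, (x, y)) pairs with key_a before key_b
    frame        -- frame index between the two key frames (inclusive)
    Linear interpolation is an assumption of this sketch.
    """
    (frame_a, (xa, ya)), (frame_b, (xb, yb)) = key_a, key_b
    if not frame_a <= frame <= frame_b:
        raise ValueError("frame lies outside the key-frame interval")
    if frame_a == frame_b:
        return (xa, ya)
    weight = (frame - frame_a) / (frame_b - frame_a)
    return (xa + weight * (xb - xa), ya + weight * (yb - ya))


# Hypothetical key frames: the AOI moves from (0, 0) to (100, 50) in 10 frames.
print(interpolate_position((0, (0.0, 0.0)), (10, (100.0, 50.0)), 5))
```

Such an interpolation only reproduces the true object path if the object moves uniformly between key frames, which is one reason why interactive scenarios need many key frames per run.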

Bruder et al. (2003) propose an approach by presenting a set of technical requirements for capturing the dynamic information, covering the Set-up Requirements. The requirements include that eye movement is captured in relation to the screen on which the dynamic objects are presented. They cover the Storage Requirements by suggesting the application of markers, generated by the environment and stored with the eye-movement data indicating a change of information on a screen.

Another approach for the Data Analysis Method, depending on a synchronized and separate recording of eye movement and dynamic AOIs, is proposed by Zelinsky and Neider (2008) using the shortest distance rule for assigning the eye-movement data on a screen to the presented dynamic AOIs. They calculate the distances between each measured eye-movement point on the screen and the center of each visible dynamic AOI. The dynamic AOI that is closest to the eye movement is then classified as “looked at.” This approach can be implemented quickly and provides a good estimation of the results (Gross, Friedrich, & Möhlenbrink, 2010) if bigger dynamic AOIs further away do not cover smaller ones that are nearer to the eye movement. Fehd and Seiffert (2008) extended the shortest distance rule with a location competition analysis that calculates a weight for each moving target in relation to the nearest eye movement. The weights are accumulated over all frames to identify the targets. This approach will lead to a distribution of eye-movement data onto dynamic objects but sometimes has difficulties if the participants do not look at any object.
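The shortest distance rule can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the implementation used in the cited studies: gaze points and AOI centers are assumed to share one screen coordinate system, and the function and AOI names are hypothetical.

```python
import math


def nearest_aoi(gaze, aoi_centers):
    """Return the id of the AOI whose center is closest to the gaze point.

    gaze        -- (x, y) gaze sample in screen coordinates
    aoi_centers -- dict mapping AOI id to its (x, y) center at the same timestamp
    Returns None if no AOI is visible.
    """
    if not aoi_centers:
        return None
    return min(aoi_centers, key=lambda aoi_id: math.dist(gaze, aoi_centers[aoi_id]))


# Hypothetical centers of two visible dynamic AOIs for one timestamp.
centers = {"aircraft_1": (100.0, 100.0), "aircraft_2": (400.0, 300.0)}
print(nearest_aoi((120.0, 90.0), centers))  # aircraft_1
```

Because the rule uses only the centers, it ignores the size of the AOIs and always assigns a sample to some AOI as long as one is visible.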


IGDAI was developed as an efficient way to cover all three issues (Set-up, Storage, and Data Analysis) in a structured and flexible manner, as needed for each individual eye-tracking task environment. Because of the wide variety of different eye-tracking tasks and environments, IGDAI does not determine a fixed solution. IGDAI addresses the main issues of integrating dynamic AOIs into an environment for capturing eye movement by defining requirements that need to be fulfilled. Therefore the implementation of the requirements is open and can be adapted to the individual environment, but the individual solutions are at the same time interconnected by following IGDAI. IGDAI also includes an example implementation that fulfils the requirements, to aid understanding of the process.

The guideline begins by defining Set-up requirements (Fig. 1 A, B, and C) that concern the process of data capturing to ensure the needed output information. Requirement D (Fig. 1) covers the storage of the eye-movement data with all necessary information. Then Requirement E (Fig. 1) accounts for overlaid visual information and thereby supports the process of post analysis. IGDAI provides a data structure that can store dynamic information as dynamic AOIs (Fig. 1 F). An example of how to store the necessary information for the post analysis is provided below. Taking all the previous steps into account, IGDAI describes the combination of eye movement and dynamic AOIs (Fig. 1 G and H) with the depth information. Practical solutions are provided for matching eye-movement data with dynamic AOIs.

Fig. 1 IGDAI separated into three issues and seven requirements

Set-up requirements

The Set-up requirements (Figs. 1 and 2) are Mapping of Eye Movement (A), Timer (B), and Visualization Engine (C). Requirement A: The eye tracker shall have the capability to map the eye movement onto a reference system (e.g., a surface representing the computer monitor). Requirement B: The capturing of eye movement and dynamic information shall be time synchronized. Requirement C: The visualization engine shall log the dynamic information on the same reference system as used by the eye tracker. The main purpose of the Set-up requirements is to harmonize the reference systems between the eye tracker and visualization engine, in preparation for combining the eye-movement data and the dynamic AOIs.

Fig. 2 Set-up Requirements in relation to capturing of eye-movement data and dynamic areas of interest (AOIs)

Figure 2 shows the components of a simplified set-up in relation to the requirements A, B, and C, which we will now discuss in detail. As mentioned above, Requirement A concerns the capturing of the eye-movement data in relation to a reference system. A solution to fulfil the requirement is to use virtual surfaces as reference systems and map the eye movement onto virtual surfaces that are for example equal to the screens in the environment. Therefore the eye tracker needs to have the capability of defining virtual surfaces for mapping the eye movement onto them. These virtual surfaces should be coherent with the actual monitors that present visual information. For example, Fig. 3 shows the virtual surfaces that cover a multimonitor workplace. The eye movement is determined as the intersection of gaze vector (vector with the eye position as origin) with the first virtual surface on its path. The result of this requirement is x, y, and screen information of the eye movement.
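The mapping of a gaze vector onto virtual surfaces can be sketched as a ray-rectangle intersection. Eye trackers that support virtual surfaces normally perform this step internally; the following is only a minimal illustration under assumptions: each surface is a flat rectangle given by one corner and two perpendicular edge vectors, and all names and coordinates are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def map_gaze(eye, direction, surfaces):
    """Return (screen, x, y) for the first surface hit by the gaze, else None.

    surfaces -- dict: name -> (corner, u_edge, v_edge), rectangles in 3D
    x and y are fractions (0..1) along the u and v edges of the surface.
    """
    best = None
    for name, (corner, u_edge, v_edge) in surfaces.items():
        normal = cross(u_edge, v_edge)
        denom = dot(normal, direction)
        if abs(denom) < 1e-12:
            continue  # gaze runs parallel to this surface
        t = dot(normal, sub(corner, eye)) / denom
        if t <= 0:
            continue  # surface lies behind the eye
        hit = tuple(e + t * d for e, d in zip(eye, direction))
        rel = sub(hit, corner)
        x = dot(rel, u_edge) / dot(u_edge, u_edge)
        y = dot(rel, v_edge) / dot(v_edge, v_edge)
        if 0 <= x <= 1 and 0 <= y <= 1 and (best is None or t < best[0]):
            best = (t, name, x, y)  # keep the nearest surface on the path
    return None if best is None else best[1:]


# Hypothetical 40 x 30 cm screen, 65 cm in front of the eye, looked at centrally.
screens = {"screen_1": ((-20.0, -15.0, 65.0), (40.0, 0.0, 0.0), (0.0, 30.0, 0.0))}
print(map_gaze((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), screens))
```

The returned triple corresponds to the x, y, and screen information that Requirement A produces.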

Fig. 3 Multiple virtual surface setting for a multi monitor workplace (Friedrich, Möhlenbrink, & Carstengerdes, 2012)

Requirement B concerns the time-synchronized capturing of eye movement and dynamic information. If the eye tracker and the visualization engine are separate systems, which they often are, their processes of capturing and storing the data need to be synchronized. This requirement can be fulfilled through the application of a timestamp that is used by both systems. The introduction of a timestamp can be realized in several different ways. Figure 2 shows an example solution where an external timer provides the timestamp for both systems. An alternative implementation could be that either the eye tracker or the visualization engine provides timestamps and distributes them to the respective other system. Depending on how the synchronized timestamps are implemented, there are different sources of time lag that have to be taken into consideration. The three major sources are the processing time of the eye tracker, the time between logging the AOIs and showing them on the screen, and the time needed to synchronize eye tracker and visualization engine. The resolution of the timestamp should be in the range of milliseconds (ms); otherwise the temporal accuracy of the eye-tracking data (modern eye trackers sample at up to 2,000 Hz) is decreased. A rule of thumb for two systems, from the authors' experience, is that the time deviation is less than 15 ms over 1 min. Therefore, synchronizing every 10 s should be enough for a 200-Hz eye-tracking system. Independent of the implementation, the result of this requirement is that the presented visual information and the captured eye movement are synchronized by timestamps.
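The reasoning behind the 10-s synchronization interval can be made explicit with a short calculation. The sketch assumes that the time deviation between the two systems grows linearly between synchronizations, which is a simplification; the function names are hypothetical.

```python
def max_drift_ms(drift_ms_per_min, sync_interval_s):
    """Upper bound of the clock deviation between two synchronizations,
    assuming the deviation accumulates linearly (a simplifying assumption)."""
    return drift_ms_per_min * sync_interval_s / 60.0


def sample_interval_ms(sampling_rate_hz):
    """Time between two consecutive eye-tracker samples."""
    return 1000.0 / sampling_rate_hz


drift = max_drift_ms(15.0, 10.0)       # rule of thumb: < 15 ms per minute
interval = sample_interval_ms(200.0)   # 200-Hz eye tracker
print(drift, interval, drift < interval)  # 2.5 5.0 True
```

With a deviation of at most 2.5 ms between synchronizations and 5 ms between samples, the deviation stays below one sample interval, so each sample can still be assigned to the correct AOI log entry.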

As Fig. 2 shows, Requirement C concerns generating the dynamic information in relation to the same reference system as the eye movement and at a defined logging rate. If the virtual surfaces (Requirement A) cover the screens in the task environment, as proposed, the reference systems for the dynamic AOIs should be the same screens. In addition to their identifier, position, shape, orientation, and size, dynamic AOIs also have timestamps (Requirement B) depending on the logging rate. The logging rate depends on how often the content of the task environment changes (e.g., 1 Hz for an ATC radar workplace or 60 Hz for a driving simulator). For example, if the task environment changes at 1 Hz, a logging rate of 1 Hz is enough, because a higher logging rate would not generate more information but only duplicate the information already logged.

Figure 4 shows an example of the information needed to fulfil Requirement C. The two objects on the screen change their position once per second and are therefore logged at 1 Hz. The objects move for a total of 3 s. The time is synchronized with the eye movement. The name of the object is a unique identifier that is consistent over time. Each pair of X and Y values represents a corner of an object. The X and Y values are given as fractions of their particular axis (e.g., on a screen with a horizontal resolution of 1,920 pixels, X1 = 0.10 stands for pixel 192).
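The conversion from the logged relative coordinates to pixels can be sketched as follows. Only the value X1 = 0.10 on a 1,920-pixel axis is taken from the text; the remaining corner values and the vertical resolution of 1,080 pixels are hypothetical, as is the function name.

```python
def to_pixels(corners, width_px, height_px):
    """Convert corners given as fractions of the screen axes into pixel coordinates."""
    return [(round(x * width_px), round(y * height_px)) for x, y in corners]


# Hypothetical rectangular AOI; only X1 = 0.10 is taken from the example in the text.
corners = [(0.10, 0.20), (0.15, 0.20), (0.15, 0.25), (0.10, 0.25)]
print(to_pixels(corners, 1920, 1080))
# [(192, 216), (288, 216), (288, 270), (192, 270)]
```

Storing fractions rather than pixels keeps the AOI log independent of the screen resolution, as long as the eye movement is mapped onto the same reference system.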

Fig. 4 Two dynamic areas of interest (AOIs) moving for 3 s in different directions

Requirements A, B, and C are the necessary extensions to an existing eye movement environment. If fulfilled, the set-up captures the eye movement and the dynamic AOIs synchronized and in relation to the same reference system. This completes the first step (Fig. 1) of IGDAI.

Storage requirements

IGDAI not only sets up the task environment to record dynamic information, but also prepares the analysis process by structuring the storage of the dynamic information. The requirements for the storage of dynamic information are Eye (Fig. 1D), Depth (Fig. 1E), and Position and Shape (Fig. 1F). Requirement D: The eye-movement data shall be stored with the timestamp. Requirement E: The dynamic information shall be stored with depth information to distinguish between visible and hidden objects. Requirement F: The position and shape of dynamic information shall be stored as AOIs with a timestamp.

Requirement D concerns the storage of the eye-movement data with the appropriate timestamp and in relation to the reference system (Requirement A). The result of Requirement A is the information of x, y, and the screen of each eye-movement point. Requirement D is fulfilled if this information is logged together with the timestamp of the capturing. The information can be logged as a new line into a log file. If stored correctly, this file can be loaded automatically and used for the data analysis.

Requirement E concerns the storage of AOIs that overlay each other within a task environment. This requirement can be fulfilled by storing the depth information of every AOI in relation to the reference system (Requirement A and C). In 3D task environments the depth information could be described as distance from the dynamic object to the reference system. The dynamic object with the least depth would be the visual object in the foreground unless it is transparent or semi-transparent. In 2D task environments no depth information is available. IGDAI suggests the usage of a layer model for a 2D task environment to put the dynamic objects in order in relation to the reference system.

The layer model orders the dynamic objects from high (nearest to the reference system) to low (furthest away from the reference system), but is independent of the task relevance of the visual information. The lowest layer should contain an AOI for the complete reference system (screen) to ensure that each eye-movement point can be assigned to at least one AOI. The number and order of applied layers depend on the task environment and can be redefined for each different task environment.

Figure 5 shows an example implementation of the layer model adapted to a 2D air traffic control environment with four different layers. The window layer is reserved for dialog windows that are on top of the other layers. The object layer contains the dynamic objects that change their position in relation to the observer, for example the moving aircraft on a radar screen. The background layer contains static information that does not change its position in relation to the reference system, for example the routing structure of an airspace or the speed indicator within a driving environment. The base layer contains the screen on which the visualization is presented.

Fig. 5 The layer model used for an air traffic controller task environment

Requirement F is fulfilled by storing the shape and position for every timestamp as a new line in a log file. The shape of the AOI could be defined by points along a path defining a convex or concave polygon. As preparation for the data analysis, the path direction should be the same (clockwise or counterclockwise) for all logged AOIs, because some algorithms for solving the point-in-polygon problem require one or the other order, for example, the winding number algorithm (Hormann & Agathos, 2001). If a complex polygon (the path intersects itself) is used, special methods have to be applied for the analysis (Galetzka & Glauner, 2012). The position of the AOI specifies where to place the shape in relation to the screen. IGDAI supports Requirement F by providing an XML data structure that builds on the assumption that dynamic objects do not change their position or shape most of the time, and that also integrates the depth information. The XML data structure is optimized for one screen only, which means that for a multimonitor environment only the screens with actual dynamic AOIs have to be logged.
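The two preparation steps mentioned above, a consistent path direction and a point-in-polygon test, can be sketched as follows. This is a common textbook formulation of the winding number test for simple polygons and is not taken from the cited publications; the function names are hypothetical. The orientation check uses the shoelace formula and assumes a y axis pointing upward; with screen coordinates (y pointing downward) the sign convention is reversed.

```python
def signed_area(polygon):
    """Shoelace formula; positive for counterclockwise order (y axis pointing up)."""
    area = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return area / 2.0


def normalize_ccw(polygon):
    """Return the path in counterclockwise order so all logged AOIs are consistent."""
    return list(polygon) if signed_area(polygon) >= 0 else list(reversed(polygon))


def is_left(p0, p1, p2):
    """> 0 if p2 lies left of the directed line p0 -> p1, < 0 if right, 0 if on it."""
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])


def winding_number(point, polygon):
    """Count how often the polygon path winds around the point."""
    wn = 0
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        if a[1] <= point[1]:
            if b[1] > point[1] and is_left(a, b, point) > 0:
                wn += 1   # upward crossing with the point on the left
        elif b[1] <= point[1] and is_left(a, b, point) < 0:
            wn -= 1       # downward crossing with the point on the right
    return wn


def contains(point, polygon):
    return winding_number(point, polygon) != 0


square_cw = [(0, 0), (0, 1), (1, 1), (1, 0)]   # logged clockwise
square = normalize_ccw(square_cw)
print(signed_area(square), contains((0.5, 0.5), square), contains((2, 0.5), square))
```

In this formulation the non-zero test works for either path direction; normalizing the direction at logging time mainly matters for algorithms that rely on a fixed order.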

The IGDAI XML data structure separates (Table 1) the shape of an AOI from its movement, size, or orientation and from the time it is visible. IGDAI XML has the main node types ShapeTemplateList, UpdateList, and AoiList. ShapeTemplateList contains shape template nodes that describe the shapes (e.g., list of points) that occur during the recording. The UpdateList contains an Update node for each time interval describing the current position and depth (3D: distance; 2D: layer model) of a dynamic AOI. The nodes in UpdateList can also contain size or orientation changes. The AoiList contains the link between the dynamic AOIs and their shapes. The reuse of equal shapes reduces the amount of data stored. Table 1 shows the XML example implementation of an aircraft shape template in combination with AoiList using the shapes and the update of the screen at second 2.

Table 1 ShapeTemplateList, UpdateList, and AoiList implemented in XML

The following description explains the process of storing dynamic AOIs within the visualization engine. For each timestamp the dynamic objects have to be transformed to the reference systems (Requirements A and C). Then the shapes of the previous timestamp are compared with the current shapes based on their AOI_IDs. The comparison identifies if the position, orientation or size has changed. If one of the values has changed the update contains this information. If a new AOI_ID is introduced its shape is compared to all existing nodes in the ShapeTemplateList. If no match is found a new node containing the shape of the new AOI_ID is added to ShapeTemplateList. The AOI_ID is then added to the AoiList and Update node. With Requirements D, E, and F we are now able to combine the dynamic AOIs with the eye-movement data.
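The storage process described above can be sketched as a small recorder class. The node names ShapeTemplateList, UpdateList, and AoiList follow the text; everything else (the class, the attribute names, the root element, one Update node per changed AOI, and the omission of size and orientation) is a hypothetical simplification and not the IGDAI XML schema itself.

```python
import xml.etree.ElementTree as ET


class AoiRecorder:
    """Collects dynamic AOIs and writes them in an IGDAI-like XML layout (sketch)."""

    def __init__(self):
        self.shape_templates = []  # unique shapes; list index serves as template id
        self.aoi_list = {}         # AOI_ID -> template id
        self.updates = []          # (timestamp, AOI_ID, position, depth)
        self._last_state = {}      # AOI_ID -> (position, depth) of previous timestamp

    def log(self, timestamp, aoi_id, shape, position, depth):
        shape = tuple(shape)
        if aoi_id not in self.aoi_list:
            # New AOI_ID: reuse an equal shape template or add a new one.
            if shape not in self.shape_templates:
                self.shape_templates.append(shape)
            self.aoi_list[aoi_id] = self.shape_templates.index(shape)
        state = (position, depth)
        if self._last_state.get(aoi_id) != state:
            # Only changes compared with the previous timestamp are stored.
            self.updates.append((timestamp, aoi_id, position, depth))
            self._last_state[aoi_id] = state

    def to_xml(self):
        root = ET.Element("IGDAI")  # hypothetical root element name
        templates = ET.SubElement(root, "ShapeTemplateList")
        for template_id, shape in enumerate(self.shape_templates):
            node = ET.SubElement(templates, "ShapeTemplate", id=str(template_id))
            for x, y in shape:
                ET.SubElement(node, "Point", x=str(x), y=str(y))
        aois = ET.SubElement(root, "AoiList")
        for aoi_id, template_id in self.aoi_list.items():
            ET.SubElement(aois, "Aoi", id=aoi_id, template=str(template_id))
        updates = ET.SubElement(root, "UpdateList")
        for timestamp, aoi_id, (x, y), depth in self.updates:
            ET.SubElement(updates, "Update", time=str(timestamp), aoi=aoi_id,
                          x=str(x), y=str(y), depth=str(depth))
        return ET.tostring(root, encoding="unicode")


# Hypothetical recording: two aircraft share one square shape; only one moves.
square = [(-1, -1), (1, -1), (1, 1), (-1, 1)]  # corners relative to the AOI center
recorder = AoiRecorder()
recorder.log(0, "LandA", square, (10, 10), "object")
recorder.log(0, "StartA", square, (50, 50), "object")
recorder.log(1, "LandA", square, (12, 10), "object")
recorder.log(1, "StartA", square, (50, 50), "object")  # unchanged, not stored again
print(len(recorder.shape_templates), len(recorder.updates))  # 1 3
```

The example shows the intended effect of the structure: equal shapes are stored once, and an AOI that does not move produces no further update entries.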

Data analysis methods

In preparation for the eye-movement analysis, IGDAI (Fig. 1) defines a requirement for matching eye movement and AOIs (G). Requirement G: For each timestamp with eye-movement data, a set of AOIs separated by depth shall be available. The eye movement uses the same timestamp (Requirements A and D) as the stored AOIs. The goal is to assign dynamic AOIs to the eye movement.

Algorithm 1 describes pseudocode for the automatic matching of eye movement and AOIs. Taking the layer model into account, the matching algorithm starts by checking the AOIs in the highest layer (Fig. 5, window layer). Checking whether an eye-movement point is inside an AOI is a point-in-polygon problem (for complex polygons, Galetzka & Glauner, 2012; for simple polygons, Taylor, 1994). If the eye-movement point does not match any of the AOIs on the highest layer, the matching process is continued using the AOIs from the layer directly below (Fig. 5, object layer). This process is continued for all layers until an AOI is identified that contains the eye-movement point. The result is that each eye-movement point is assigned to only one AOI.

Algorithm 1 Merge algorithm to identify the correct AOI for every captured eye-movement point

figure a
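The layered matching can be illustrated with the following minimal sketch. It is not a transcription of Algorithm 1: the data class, the numeric layer convention, the even-odd point-in-polygon test for simple polygons, and the use of the vertex mean as AOI center are assumptions made for illustration. If several AOIs on one layer contain the point, the sketch picks the one with the nearest center, which corresponds to the shortest-distance rule IGDAI recommends for 2D task environments.

```python
import math
from dataclasses import dataclass


@dataclass
class Aoi:
    aoi_id: str
    layer: int      # hypothetical convention: higher value = higher layer
    polygon: list   # [(x, y), ...] in screen coordinates for one timestamp

    def center(self):
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return (sum(xs) / len(xs), sum(ys) / len(ys))


def point_in_polygon(point, polygon):
    """Even-odd ray casting test for simple polygons."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def match_gaze(point, aois):
    """Assign one eye-movement point to exactly one AOI, highest layer first."""
    for layer in sorted({a.layer for a in aois}, reverse=True):
        hits = [a for a in aois
                if a.layer == layer and point_in_polygon(point, a.polygon)]
        if hits:
            # Same-layer conflict: choose the AOI with the nearest center.
            return min(hits, key=lambda a: math.dist(point, a.center())).aoi_id
    return None  # only possible without a base layer covering the screen


# Hypothetical AOIs for one timestamp on a 1,280 x 1,024 screen.
aois = [
    Aoi("screen", 0, [(0, 0), (1280, 0), (1280, 1024), (0, 1024)]),
    Aoi("runway", 1, [(90, 90), (400, 90), (400, 120), (90, 120)]),
    Aoi("aircraft", 2, [(100, 100), (110, 100), (110, 110), (100, 110)]),
]
print(match_gaze((105, 105), aois), match_gaze((300, 100), aois), match_gaze((900, 900), aois))
```

Because the base layer covers the whole screen, every on-screen eye-movement point receives an AOI, and the layer order decides which of the overlapping AOIs is reported.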

IGDAI also accounts for two special cases that arise when matching eye movement to AOIs, independent of the implementation. Requirement H: The special cases shall be addressed to conduct the eye-movement analysis. The first special case occurs if an eye-movement point can be assigned to multiple AOIs on the same layer. This prevents an unambiguous assignment of the eye movement to the gathered information. This special case will occur, for example, if a pedestrian covers a traffic sign in a driving simulation and both AOIs are classified in the same layer. In a 3D task environment the eye-movement point should be assigned to the AOI with the least depth (Requirement E). In a 2D task environment IGDAI recommends determining the distance between the eye-movement point and the center of each matching AOI. The eye-movement point is then classified based on the shortest distance (Algorithm 1, Shortest_Distance function), assuming that the AOIs surround the visual information equally centered.

The second special case is the mismatch of eye movement and AOI due to the accuracy of eye-tracking systems. Figure 6 (1×) shows an example of eye movement aggregated for one second (blue dots) and the nearest AOI (square). The eye movement in Fig. 6 (1×) would be classified incorrectly on a lower layer, probably the lowest one, because of the accuracy of the eye-tracker system. IGDAI recommends an enlargement of the AOIs if it is possible to determine the degree of accuracy. The accuracy consists of the accuracy of the gaze vector and the accuracy of calibration. The accuracy of the gaze vector should be provided by the eye-tracking manufacturer. The accuracy of calibration depends on the calibration of the participant and the stability of the calibration throughout the experiment (e.g., a head-mounted eye-tracking system could become dislocated). Therefore, the degree of accuracy should be checked at short intervals throughout the capturing of the eye movement, e.g., by calibration points during the experiment. A scale factor for enlargement can be calculated using the tangent of the degree of accuracy (normally expressed as an angle), the distance of the participant to the screen, and the original size of the AOI on the screen. Figure 6 (1× to 4×) shows the offsetting of the AOI with three different scale factors. If the degree of accuracy cannot be checked during the experiment due to constraints of the experimental design or task environment, IGDAI recommends using no scale factor (1×) for the eye-movement analysis. Even if no scale factor is used, a set of self-selected scale factors can help to determine the quality of the recorded eye movement. This analysis is based on the assumption that each participant must dedicate an average amount of eye movement to the dynamic AOIs to solve the task. A calibration error (e.g., a dislocated eye-tracking system) could lead to a strong increase of eye-movement data in the perimeter of the dynamic AOIs. Systematically enlarging the dynamic AOIs helps to identify participants with an abnormal increase in assigned eye movement. An individual examination of the eye-movement data of these participants should help to decide if their data is valid for further analysis. An example analysis is provided in the result section.
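One way to derive such a scale factor is sketched below. The exact formula is an assumption of this sketch: the accuracy angle is converted into an on-screen offset via the tangent and the viewing distance, and this offset is added on both sides of the AOI. All function names and example values are hypothetical.

```python
import math


def accuracy_offset_cm(accuracy_deg, viewing_distance_cm):
    """On-screen offset that corresponds to the accuracy angle at a given distance."""
    return viewing_distance_cm * math.tan(math.radians(accuracy_deg))


def scale_factor(aoi_side_cm, accuracy_deg, viewing_distance_cm):
    """Enlargement factor, assuming the offset is added on both sides of the AOI."""
    offset = accuracy_offset_cm(accuracy_deg, viewing_distance_cm)
    return (aoi_side_cm + 2.0 * offset) / aoi_side_cm


def scale_polygon(polygon, factor):
    """Scale an AOI about the mean of its corner points."""
    cx = sum(x for x, _ in polygon) / len(polygon)
    cy = sum(y for _, y in polygon) / len(polygon)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in polygon]


# Hypothetical values: 0.5 degree accuracy, 65 cm viewing distance, 2.5 cm AOI side.
factor = scale_factor(2.5, 0.5, 65.0)
print(round(accuracy_offset_cm(0.5, 65.0), 2), round(factor, 2))
print(scale_polygon([(0, 0), (2, 0), (2, 2), (0, 2)], 2.0))
```

The same scale_polygon helper can be applied with a series of self-selected factors (e.g., 1× to 4×) to check how the share of eye movement assigned to the dynamic AOIs grows for each participant.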

Fig. 6 Eye-movement data (dots) and dynamic areas of interest (AOIs) on the object layer (square) with different area sizes

IGDAI defines requirements for the task environment, provides a structure for recording the dynamic AOIs, and then simplifies the post processing by taking into account the previous steps. IGDAI is kept generic to be applicable to a large variety of task environments. An IGDAI implementation was done by the German Aerospace Center (DLR) using an SMI head-mounted eye tracker, an air traffic controller simulation (FAirControl) as task environment, and the analysis software Eye Tracking Analyzer (EyeTA). EyeTA was also developed by the DLR and represents the first software that implements IGDAI for the purpose of standardized eye-movement analysis with dynamic AOIs.

Verification of IGDAI

The feasibility of IGDAI is shown by implementing it in a task environment and showing its effect on the analysis. We selected FAirControl (Möhlenbrink, Werther, & Rudolph, 2007) as the task environment. For the verification, we assume that the focus of the study is to identify whether the participants gather information by monitoring fixed points or by pursuing dynamic objects to reach a decision.


The group of participants consisted of nine female and 11 male (N = 20) students with an average age of 24 years (SD = 2.12 years). The participants studied at the Technical University of Berlin. They participated in the experiment in separate sessions and were compensated with 10 €.


The task environment to capture the eye movement was set up as proposed by IGDAI and was therefore similar to Fig. 2. FAirControl represents an airfield from a top view (Fig. 7). The airfield consists of a runway and two routes to reach the runway entry point (target circle). FAirControl simulates an aircraft (LandA) landing on Route Land and an aircraft (StartA) taking off on Route Start. The aircraft start simultaneously on both routes. The time for an aircraft to fly its particular route varied between 9.5 and 11 s. The time difference between the two aircraft reaching the entry point varied between 1.5 s (LandA first on Target) and -1.5 s (StartA first on Target), but was never zero. The speed of the aircraft on Route Land varied between 6.8 and 7.9 cm/s. The AOI of LandA was a square with sides of 2.5 cm. The speed of the aircraft on Route Start varied between 0.9 and 1.1 cm/s. The AOI of StartA was a square with sides of 1.5 cm.

Fig. 7 FAirControl, the task environment that was used for verification of IGDAI

FAirControl was presented in full-screen mode on a 22-in. display (17.6 in. wide by 13.7 in. high) with a resolution of 1,280 × 1,024 pixels. An SMI RED eye-tracker system with a 60 Hz sampling rate was used to capture the eye movement. The eye-tracker warning system made sure that the participants kept a distance of 65 cm and stayed within an area of 50 × 30 cm around the center of the screen.

The eye-tracker system fulfils Requirement A. The timer (Requirement B) was implemented in FAirControl: the eye-movement data were sent to FAirControl, which added a timestamp. If no eye-movement data were sent or the eye-movement point was not on the screen, FAirControl wrote zero values for the x and y coordinates in the eye log file (Requirement D). The logging of the dynamic AOIs was also set to 60 Hz to match the eye movement. FAirControl stored the dynamic AOIs using the layer model (Requirement E) and the data structure (Requirement F). The chosen layer names and layer depths are equal to those in Fig. 5, except for the window layer, which was not used because FAirControl can be handled without extra dialog windows. The base layer contains the screen. The background layer contains AOIs for approach, runway, both taxiways (in and out), the target, and the buttons (Fig. 7). The object layer contains the bounding boxes surrounding the aircraft that moved along both routes.
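How a single gaze sample could be assigned to an AOI under such a layer model is sketched below. The `AOI` class, the `classify` function, the pixel coordinates, and the convention that a larger depth means a layer closer to the viewer are all hypothetical choices for illustration; they are not the data structure of Requirement F. The handling of (0, 0) as an invalid sample follows the zero-value logging described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AOI:
    name: str
    depth: int   # assumed convention: larger depth = layer on top
    x: float     # left edge in pixels
    y: float     # top edge in pixels
    w: float     # width in pixels
    h: float     # height in pixels

    def contains(self, gx, gy):
        return self.x <= gx <= self.x + self.w and self.y <= gy <= self.y + self.h

def classify(gx, gy, aois):
    """Return the name of the topmost AOI hit by the gaze point, or None."""
    if gx == 0 and gy == 0:
        # Zero values mark missing or off-screen samples in the eye log.
        return None
    hits = [a for a in aois if a.contains(gx, gy)]
    return max(hits, key=lambda a: a.depth).name if hits else None

# One logged frame with made-up coordinates: base, background, and object layer.
aois = [
    AOI("Screen", 0, 0, 0, 1280, 1024),
    AOI("Approach", 1, 100, 100, 600, 200),
    AOI("LandA", 2, 300, 150, 60, 60),
]
print(classify(320, 170, aois))   # gaze on the aircraft bounding box
print(classify(150, 150, aois))   # gaze on the approach area only
print(classify(0, 0, aois))       # invalid sample
```

Because the dynamic AOIs are logged at the same 60 Hz rate as the eye movement, a lookup of this kind can be repeated for every sample using the AOI positions stored for the matching timestamp.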


After the participants had read the instructions and prior to the experimental session, the eye-tracking system was calibrated for each participant. In a 10-min training session, participants were familiarized with FAirControl. The participants were instructed to monitor FAirControl and decide which aircraft would reach the runway entry point (Fig. 7, Target) first. They answered by pressing one of the buttons while the aircraft on Route Land moved through the decision window (Fig. 7). After the decision was made, each aircraft continued on its route and the participants received feedback on whether they were correct. When the aircraft on Route Land reached the start position of Route Start, the next cycle began. Every participant assessed 60 randomized cycles with an equally distributed number of aircraft from both routes reaching the runway entry point first. The experiment took approximately 17 min to complete.

Results and discussion

As described above, the IGDAI layer model was used to classify the different types of AOIs used within FAirControl. Because fixations, saccades, and smooth pursuits are difficult to distinguish clearly in dynamic task environments, the data analysis was based on the raw eye movement and the classification by the IGDAI layer model. To verify the impact of IGDAI on the analysis of eye movement, we performed three analyses. The first analysis quantifies the acceleration of data preparation by IGDAI in relation to a manual description. The second analysis quantifies the quality of the recorded eye movement. The third analysis shows the impact of the layer model on the quantification of the eye-movement distribution.

The first analysis shows the influence of IGDAI on the AOI preparation time (the amount of time needed for generating AOIs) and the quality of the AOIs. The analysis was performed using BeGaze to transcribe five randomly selected data sets without using IGDAI. The five data sets had an average length of 17.2 min (SD = 0.95) with two dynamic (LandA and StartA) and seven static (Screen, Approach, TaxiOut, Buttons, Target, Runway, and TaxiIn) AOIs. The AOI preparation took 35.5 min on average (SD = 4.9). Therefore, the transcription of all 20 data sets would have taken approximately 11 h. The AOI preparation time without IGDAI increases linearly with the number of participants. The AOI preparation time with IGDAI was zero, because the AOIs were captured while the study was conducted. The additional effort for implementing IGDAI into the task environment was approximately 6 h. The implementation time for IGDAI is a fixed value that is independent of the number of participants. The influence of IGDAI on the quality of AOIs was determined by comparing the dwell times in percentages. Figure 8 shows that the results from BeGaze and IGDAI are similar. The difference between the dynamic AOIs is 0.02 percentage points (BeGaze 31.49 %; IGDAI 31.51 %). Neither the dynamic [t(4) = -0.108, p < .91] nor the static [t(4) = 0.149, p < .88] AOIs showed significant differences.
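The trade-off between the linearly growing manual preparation time and the fixed IGDAI implementation effort can be made explicit with a short calculation. The sketch below uses only the two reported figures (35.5 min per data set and about 6 h of implementation effort); the function and constant names are illustrative, and the purely linear extrapolation is an assumption.

```python
import math

MANUAL_MIN_PER_DATASET = 35.5   # reported mean manual AOI preparation time
IGDAI_SETUP_MIN = 6 * 60        # reported one-off implementation effort

def manual_minutes(n_participants):
    """Manual preparation time, assumed to grow linearly with participants."""
    return n_participants * MANUAL_MIN_PER_DATASET

def igdai_minutes(n_participants):
    """IGDAI effort is a fixed cost, independent of the number of participants."""
    return IGDAI_SETUP_MIN

# Smallest number of participants for which the manual effort exceeds the fixed cost.
break_even = math.ceil(IGDAI_SETUP_MIN / MANUAL_MIN_PER_DATASET)

print(f"manual effort for 20 data sets: {manual_minutes(20):.0f} min")
print(f"IGDAI pays off from {break_even} participants onward")
```

Under these assumptions, the implementation effort is recovered within a single study of the size used here, and any reuse of the instrumented task environment in later studies adds no further preparation time.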

Fig. 8 Comparison between dwell times in percentages for five participants and static or dynamic areas of interest using BeGaze and IGDAI

The second analysis concentrated on the quality of the eye-movement data recorded for the individual participants. First, the proportion of valid eye movement per participant (valid eye-movement samples divided by the total number of captured samples) was calculated and compared. Eye movement was invalid if the eye tracker could not detect the eyes or the gaze vector was not on the experimental screen. For each participant, the validity of the eye-movement data reached more than 94 % (M = 97 %, SD = 1.8). Second, the quality of the initial calibration was determined by a 9-point verification method. After the calibration of the eye tracker, 91 % (SD = 4.5) of the eye-movement data of each participant lay within an error of 1.76° around the actual reference points. Third, because the experimental design did not allow the degree of accuracy to be checked during the experiment, the alternative analysis proposed by IGDAI was applied, using self-selected scale factors (2×, 3×, and 4×) to enlarge the dynamic AOIs. This leads to an increase in eye data assigned to the dynamic AOIs in relation to the original size (1×). If the increase shows outliers, these participants should be checked individually for calibration errors. Figure 9 shows the increase between 1× and the scale factors in boxplots to identify outliers. An individual examination of both outliers (VP4 and VP18) showed that their percentage of eye movement on dynamic AOIs did not increase at one particular factor, but was above average even for 1×. The analysis with self-selected factors showed that the increase of assigned eye data is similar for all participants, and therefore all participants are comparable in the following analysis.
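The outlier screening on the increase of assigned eye data can be sketched as follows. The 1.5 × IQR fence is the common boxplot convention and is assumed here, since the criterion behind Fig. 9 is not spelled out in the text; the participant labels and values are made up for illustration and are not data from the study.

```python
import statistics

def iqr_outliers(increase_by_participant, k=1.5):
    """Return participants whose increase lies outside the k * IQR fences."""
    values = sorted(increase_by_participant.values())
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p, v in increase_by_participant.items() if v < low or v > high]

# Hypothetical increase (percentage points) in eye data assigned to the
# dynamic AOIs when going from scale factor 1x to 2x.
increase_1x_to_2x = {
    "P01": 4.0, "P02": 4.2, "P03": 4.5, "P04": 4.8,
    "P05": 5.0, "P06": 5.3, "P07": 5.5, "P08": 15.0,
}
print(iqr_outliers(increase_1x_to_2x))
```

A participant flagged in this way is only a candidate for a calibration error; as in the analysis above, the individual data still need to be inspected before a decision about exclusion is made.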

Fig. 9 Increase of assigned eye data to dynamic areas of interest (AOIs) in relation to the scale factor 1× for three self-selected factors

The third analysis demonstrates the impact of the IGDAI layer model on the eye-movement analysis. As a baseline for this analysis, the base and background layers were analyzed first and then contrasted with the object layer. The dynamic AOIs on the object layer had the original size (1×). Figure 10 shows the results using IGDAI layers (base and background layer: BackgroundL; BackgroundL plus object layer: ObjectL). The bars represent the percentage of eye movement per FAirControl AOI (Fig. 7). The bars for each condition sum to 100 %. When only BackgroundL is used, the data are divided into seven AOIs. When the analysis is extended by the object layer, the data are divided into nine AOIs.
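The two analysis conditions can be illustrated with a small aggregation over classified gaze samples. In this sketch each valid sample carries the AOI hit on the base or background layer and, if present, the AOI hit on the object layer; this pair representation, the function name `dwell_percentages`, and the four example samples are assumptions for illustration only.

```python
from collections import Counter

def dwell_percentages(samples, use_object_layer):
    """Percentage of samples per AOI.

    samples: list of (background_aoi, object_aoi_or_None) tuples, one per
    valid gaze sample. With use_object_layer=True an object-layer hit
    takes precedence over the background AOI underneath it.
    """
    labels = [obj if (use_object_layer and obj) else bg for bg, obj in samples]
    counts = Counter(labels)
    return {name: 100.0 * n / len(labels) for name, n in counts.items()}

# Made-up samples: two of them fall on an aircraft bounding box.
samples = [
    ("Approach", "LandA"),
    ("Approach", None),
    ("TaxiOut", "StartA"),
    ("Runway", None),
]
print(dwell_percentages(samples, use_object_layer=False))  # BackgroundL condition
print(dwell_percentages(samples, use_object_layer=True))   # ObjectL condition
```

In both conditions the percentages sum to 100 %; including the object layer only redistributes samples from the background AOIs that lie underneath the dynamic objects.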

Fig. 10 Mean plot for percentage of dwell times with standard error (n = 20) separated by application of different layers

Figure 10 shows that the base layer always contains less than 7 % of the eye movement, independent of the analysis. BackgroundL contains 91.2 % (Runway + Target + TaxiIn + Buttons = 30.8 %; Approach + TaxiOut = 60.4 %) of the eye movement. Including ObjectL in the analysis significantly decreases the percentage of eye-movement data on Approach and TaxiOut [t(19) = 9.833, p < .001]. No other AOIs in the background layer showed a significant increase or decrease when ObjectL was included.

Together, the results of the three analyses give an overview of the structured data analysis that IGDAI facilitates. Applying the IGDAI layer model increased the level of detail in the analysis of the eye-movement distribution. Structuring the AOIs in different layers revealed how the eye movement in the example data was distributed across static and dynamic AOIs, which provides a more detailed insight than an analysis that concentrates on static AOIs only.


We introduced the guideline IGDAI, which allows an automated classification of eye movement with respect to static or dynamic AOIs. IGDAI enhances the eye-movement analysis by simplifying the capturing of dynamic AOIs through a step-by-step integration approach. IGDAI is also independent of the eye-tracking technology and the task environment used, because it structures the flow of information that is necessary for eye-movement analysis. It can be applied to simple one-screen task environments, but also to complex multiple-monitor task environments (e.g., Fig. 3) or even aircraft cockpits (Sarter, Mumaw, & Wickens, 2007). The requirements (Fig. 1) for configuring the task environment form the basis for an efficient capturing of the dynamic information and consequently accelerate the data analysis.

An important step in understanding how participants solve a task is to identify the information-gathering sequence. This sequence has to be as accurate as possible because approximations could lead to misinterpretation. With IGDAI, the information-gathering sequence can be identified objectively. The verification study demonstrated the proper functioning of IGDAI and its influence on various levels of eye-movement analysis. It showed that using the actual position and size of a dynamic object allows a more detailed analysis than static AOIs alone.

Future IGDAI developments should extend the automatic generation of dynamic AOIs to non-artificial task environments. The generic structure of IGDAI allows its adaptation to such task environments, for example a view through the windscreen while driving in a real environment. Methods (Hsu & Huang, 2001) and tools that support the reconstruction of video movements by post-processing, such as the DynAOI generation tool (Papenmeier & Huff, 2010), already exist and need only small adaptations to fulfil the IGDAI requirements.

The results showed that IGDAI lowers the effort of using static and dynamic AOIs as a method of eye-movement analysis. The application of dynamic AOIs is often associated with an effort that is not justified by the results. IGDAI changes that and might lead to dynamic AOIs becoming a standard tool for eye-movement analysis.