1 Introduction

Eye-tracking data can provide insight into human behavior and cognition. Many methods for the analysis of scanpaths are designed to identify patterns, e.g., similar group behavior. Purely quantitative measures can provide computational results, but the analyst has to trust and interpret the numbers. Visual analytics keeps the human in the loop and allows interactive exploration of data sets. However, as the amount of data increases, visualization methods reach limits of available display space, leading to overdraw and clutter.

One way to tackle this issue is to apply scanpath simplification, smoothing, or interpolation algorithms. But this might introduce unwanted artifacts. For example, the eye-tracking data set used in our evaluation (Netzel et al. 2016) is based on public transit maps. Meaningful correlations to the underlying data (stations, transit lines) might become disturbed, or lost, in interpolated views. The same observation potentially applies to many other data sets in which the visual stimuli might become disconnected from the scanpaths after interpolation. Instead, we want to select a representative subset of the original input data that conveys a concise impression of the whole data set. To support the identification of behavioral patterns with less displayed data, we propose the application of Group Diagrams (GDs) to eye-tracking scanpaths. In previous work, GDs were introduced to represent common movement patterns of animals or other moving entities, in particular to detect split and merge points in groups of (typically GPS-based) trajectories (Buchin et al. 2020). The GD framework allows selecting a distance measure to guide the selection of representative subsets of the original data. What makes GDs particularly interesting is the fact that the user may input an error threshold \(\delta\) to adjust the level of abstraction. The output diagram then ensures that each input trajectory is represented by a path in the diagram whose distance to it (according to the chosen distance measure) does not exceed \(\delta\). Hence, GDs come with a provable quality guarantee, which is in stark contrast to most methods applied to scanpath simplification or clustering in previous work.

2 Contribution

Our main contributions target the application of GDs to eye-tracking-specific scanpaths. We carefully implement the algorithmic framework for GDs and explore the suitability of different similarity measures for scanpaths. As the structure of eye-tracking scanpaths differs from that of GPS-based trajectories, new challenges arise when implementing GDs.

We show that GDs based on the equal-time and similar-time distance (which work sufficiently well for GPS-based trajectories) result in heavily fragmented scanpath diagrams, which is undesirable. Using the Fréchet distance, however, we show that the output diagram faithfully represents the input scanpaths and produces sensible abstractions for visual analytics.

Furthermore, we investigate and improve computational aspects of GDs. The construction of GDs requires segmenting the input paths by inserting so-called augmentation points. We show that especially the long straight parts of scanpaths, where a person jumps from one part of the visual stimulus to another, demand the insertion of a huge number of these augmentation points. We observe that heavily intertwined trajectories create combinatorial challenges, resulting in impractical running times. Fortunately, these challenges can be met by moderate changes in the algorithmic design. We propose a method to reduce the number of augmentation points needed without losing the quality guarantee of the GD.

We describe further engineering ideas tailored to scanpaths and provide an experimental evaluation that demonstrates the usefulness of the approach for scanpath analysis. We discuss in detail how the choice of the error threshold \(\delta\) impacts the output and how providing different levels of abstractions might benefit the visual analysis.

An earlier version of our work was presented at the VINCI '22 conference (Schäfer et al. 2022). In this article, we present new results and improvements. Most importantly, we were able to further reduce the number of augmentation points by re-interpreting a minimality criterion. We explain why this approach is justified and how it improves scalability. Second, we improved the running times of our prototype implementation and show the benefits of multi-core processing. To evaluate the usefulness of our approach, we conducted a qualitative study with a group of eye-tracking experts, comparing our results to a similar technique and collecting suggestions for improvements. Finally, we applied our implementation to the above-mentioned GPS data sets and find that our design enhancements allow for significantly improved detail in the original problem setting, too.

3 Related work

Identifying patterns in groups of trajectories is a crucial step in many areas, including computational movement analysis (Gudmundsson et al. 2011), the visualization of scanpath sets (Raschke et al. 2014), and map generation from heterogeneous trajectories (He et al. 2018).

For studying the spatial and temporal aspects of scanpaths, a great number of methods have been proposed. Both Andrienko et al. (2012) and Anderson et al. (2015) provide extensive and comparative overviews. Peysakhovich and Hurter (2017) proposed a pipeline consisting of clustering and attribute-driven edge bundling to produce so-called flow direction maps for visualization. Rodrigues et al. (2018) introduced a multiscale aggregation method and evaluated it on the same data set we use in our evaluation. One issue with their method is that it relies on a discretization of the plane that might induce some artifacts. Andrienko et al. (2012) present metrics for path similarity analysis (PSA) that could serve to calculate distances to an optimal trajectory. However, they have no method to actually calculate this optimal path. To the best of our knowledge, there is no work on scanpath set representation that comes with a quality guarantee for the produced output.

For the analysis and representation of GPS-based trajectory sets, different concepts were introduced in previous work. One idea is to condense the whole input set into a single trajectory, e.g., the so-called median trajectory (Buchin et al. 2013). But this method is typically too restrictive to find patterns that only arise in a subset of the input. Approaches for detecting and representing groups of moving objects were described under notions such as flocks (Gudmundsson and van Kreveld 2006), swarms (Li et al. 2010), or herds (Huang et al. 2008). Of those three, only the herds concept allows for splitting and merging of sub-groups. Recently, the concept of Group Diagrams was introduced as an approach to capture sub-group movements with a quality guarantee (Buchin et al. 2020). The results of the algorithm are guaranteed to lie within a given distance range, thus allowing control over their visual complexity. There is, however, no guarantee regarding the number of pruned points. So far, the method was applied to analyze the movement patterns of geese. We describe in detail how the approach works and how it can be applied to produce useful results on eye-tracking scanpaths.

4 Group Diagrams

The concept of Group Diagrams (GDs) was introduced by Buchin et al. (2020). We will shortly recall the main aspects of their algorithm and refer to the original paper for further details.

We model one scanpath, or trajectory, by a polygonal curve \(T=((p_1,t_1),(p_2,t_2),\dots ,(p_n,t_n))\) consisting of n vertices \(p_1, \dots , p_n \in \mathbb {R}^2\) with attached timestamps \(t_i\). Given a set \(\mathcal {T}=\{T_1,\dots ,T_k\}\) of k trajectories, with a total of N vertices, the task is to compute a set of sub-trajectories from these input trajectories.
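For illustration, this trajectory model can be sketched as a plain data structure. The following is a hypothetical C++ sketch; the type and function names are our own and do not reflect the actual implementation:

```cpp
#include <cstddef>
#include <vector>

// A vertex of a polygonal curve: a position p_i in R^2 with timestamp t_i.
struct Vertex {
    double x, y;  // position
    double t;     // timestamp
};

using Trajectory    = std::vector<Vertex>;      // T = ((p_1,t_1),...,(p_n,t_n))
using TrajectorySet = std::vector<Trajectory>;  // T = {T_1,...,T_k}

// Total number of vertices N across all trajectories in the set.
inline std::size_t totalVertices(const TrajectorySet& ts) {
    std::size_t n = 0;
    for (const auto& t : ts) n += t.size();
    return n;
}
```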

Fig. 1

The main stages of Group Diagram computation

Definition 1

(Group Diagram, Buchin et al. 2020) A Group Diagram is a geometric graph with vertices augmented by a temporal component that represents all input trajectories \(\mathcal {T}\). We say the graph represents a trajectory \(T \in \mathcal {T}\) if there is a similar path P in the graph under a given similarity measure and a similarity threshold \(\delta\). A GD is minimal if it is minimal in size, either with respect to its number of edges, or the total length of edges.

In Fig. 2b, we demonstrate why it may be necessary to select partial segments into the solution set. If we selected only complete segments as representatives, some parts of the input data would be covered by more than one representative. Since we want a minimal solution, we have to allow for partial segments. We call this the local minimality criterion [see Buchin et al. (2020) for a formal definition].

The algorithmic framework consists of three major stages (see Fig. 1):

(1) Segmentation: insert augmentation points

(2) Clustering: compute a set of relevant cluster candidates

(3) Set-Cover: pick a minimal set that covers the input data

(1) Segmentation In order to satisfy the local minimality criterion, we need to add extra vertices to the input data, so-called augmentation points. In Fig. 2c, d, we show how these augmentation points are computed for the Fréchet distance variant of the algorithm. For the equal-time and similar-time variants of the algorithm, augmentation points are needed to allow for a proper alignment of timestamps. In Sect. 5.1, we discuss the impact of augmentation points on scalability and find ways to improve it.

(2) Clustering A set of close sub-trajectories with a representative \(\tau\) is called a cluster \(c(\tau )\). During the Clustering stage, a set of relevant candidate clusters is determined. Clusters are computed by first selecting a representative segment, then finding all close segments according to the given distance measure. Next, we try to extend the cluster as long as the set of affected trajectories does not change. Once the cluster cannot be extended without changing the set of trajectories, we call the resulting cluster relevant. See Fig. 3 for examples of relevant and irrelevant clusters.

(3) Set-Cover Given the set of all relevant clusters from the second stage, we want to select a minimal subset of clusters that covers all trajectories of the input data. This is done by solving a Set-Cover problem instance during the third stage of the algorithm framework. Since the Set-Cover problem is known to be NP-complete, and even hard to approximate, a greedy approach with a proven approximation guarantee is the best practical choice. In each step, the greedy algorithm picks the cluster that covers the most remaining segments and removes those segments from the remaining set. If we denote the optimal result size by m, the solution of the greedy approach is no worse than \(m \log N\) (where N is the total number of segments). Its runtime is bounded by \(O(sN \cdot \min \{s,N\})\), where s is the number of relevant clusters.
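The greedy step can be sketched as follows. This is a simplified illustration in which segments are integer IDs and clusters are plain sets; the names are hypothetical and do not reflect our actual implementation:

```cpp
#include <cstddef>
#include <set>
#include <vector>

using Cluster = std::set<int>;  // the set of segment IDs covered by one cluster

// Greedy Set-Cover: repeatedly pick the cluster that covers the most
// still-uncovered segments. Returns the indices of the chosen clusters;
// the standard greedy bound gives an O(log N)-approximation.
std::vector<std::size_t> greedySetCover(const std::vector<Cluster>& clusters,
                                        std::set<int> uncovered) {
    std::vector<std::size_t> chosen;
    while (!uncovered.empty()) {
        std::size_t best = clusters.size();
        std::size_t bestGain = 0;
        for (std::size_t i = 0; i < clusters.size(); ++i) {
            std::size_t gain = 0;
            for (int s : clusters[i]) gain += uncovered.count(s);
            if (gain > bestGain) { bestGain = gain; best = i; }
        }
        if (best == clusters.size()) break;  // remaining elements not coverable
        for (int s : clusters[best]) uncovered.erase(s);
        chosen.push_back(best);
    }
    return chosen;
}
```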

Fig. 2

Local minimality criterion and augmentation points

Fig. 3

Relevant and irrelevant clusters. a is an irrelevant cluster because it can be extended. b, c and d are relevant clusters

The algorithmic framework allows for some flexibility in choosing the similarity measure. In the following, we discuss three available similarity measures and their usefulness for our problem settings. The framework also allows for different optimization goals: find a solution with a minimum number of segments, or find a solution that minimizes total segment length. Since we aim at reducing visual complexity, we will concentrate on minimizing the number of segments.

4.1 Group Diagrams using equal-time and similar-time distance

The segmentation step of the GD framework depends on the similarity measure used. There exist many measures for spatio-temporal trajectories; see, e.g., Buchin et al. (2014) for a more in-depth discussion. Two rather simple measures used in Buchin et al. (2020) are the equal-time and similar-time distance. For the equal-time distance, we align the vertices of the trajectories by their timestamps. (Timestamps were recorded for each vertex during the eye-tracking process.) We call trajectories similar for a given threshold \(\delta\) if the Euclidean distance of the aligned vertices is at most \(\delta\). During the segmentation stage of the algorithm, \(O(k^3n)\) augmentation points have to be inserted. For the similar-time distance, the timestamp alignment is relaxed to an interval \([t_i-a,t_i+a]\) for each timestamp \(t_i\).
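The equal-time distance between two piecewise-linear trajectories can be sketched as follows. This is a hypothetical illustration (names are ours) that assumes vertices sorted by timestamp and clamps positions at the ends of the recorded time range; since the distance between two linearly moving points is convex in t on each piece, it suffices to evaluate it at the vertex timestamps of both trajectories:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vertex { double x, y, t; };
using Trajectory = std::vector<Vertex>;

// Linear interpolation of the trajectory's position at time t
// (vertices assumed sorted by timestamp; t clamped to the recorded range).
static Vertex positionAt(const Trajectory& tr, double t) {
    if (t <= tr.front().t) return tr.front();
    for (std::size_t i = 1; i < tr.size(); ++i) {
        if (t <= tr[i].t) {
            double a = (t - tr[i - 1].t) / (tr[i].t - tr[i - 1].t);
            return { tr[i - 1].x + a * (tr[i].x - tr[i - 1].x),
                     tr[i - 1].y + a * (tr[i].y - tr[i - 1].y), t };
        }
    }
    return tr.back();
}

// Equal-time distance: the maximum Euclidean distance between time-aligned
// positions, evaluated at the vertex timestamps of both trajectories.
double equalTimeDistance(const Trajectory& p, const Trajectory& q) {
    double d = 0.0;
    auto eval = [&](double t) {
        Vertex a = positionAt(p, t), b = positionAt(q, t);
        d = std::max(d, std::hypot(a.x - b.x, a.y - b.y));
    };
    for (const auto& v : p) eval(v.t);
    for (const auto& v : q) eval(v.t);
    return d;
}
```

Two trajectories are then similar for threshold \(\delta\) if this value is at most \(\delta\).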

Unfortunately, both distance measures are too restrictive for our envisioned application to scanpaths. Equal-time distance and similar-time distance are appropriate for trajectories moving at similar speed, and with similar starting points in time. If individual speeds differ too much, results are prone to become heavily fragmented. This effect is expected to happen on scanpath sets, since speed might vary significantly among different scanpaths. Therefore, we are looking for a more flexible distance measure that is less sensitive to speed but still considers the orientation and course of the scanpaths.

4.2 Group Diagrams using the Fréchet distance

The Fréchet distance is widely used as a trajectory similarity measure and in particular for trajectory clustering (Alt and Godau 1995; Yuan et al. 2017). It is often called the dog-leash distance. Consider a man walking his dog along two curves: the man follows one curve and the dog follows the other curve. Both move at arbitrary speed (but always forward). Then, the Fréchet distance is the minimum length of the leash that allows man and dog to traverse the curves; see Fig. 4, top, for an illustration. We shortly recall the formal definition of the Fréchet distance:

Definition 2

(Fréchet distance) Given two parameterized curves \(P:[0,1] \rightarrow \mathbb {R}^2\) and \(Q:[0,1] \rightarrow \mathbb {R}^2\), the Fréchet distance

$$\begin{aligned} \delta _F(P,Q) \ =\ \inf _{\sigma ,\tau }\ \max _{t\in [0,1]}\ \Vert \, P(\sigma (t)) - Q(\tau (t))\, \Vert \end{aligned}$$

is the infimum over all continuous and increasing bijections \(\sigma ,\tau : [0,1] \rightarrow [0,1]\). By \(\Vert \cdot \Vert\), we denote the underlying norm, in our case the Euclidean norm.
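As an aside, the discrete Fréchet distance, computed on the vertex sequences by a simple dynamic program in the style of Eiter and Mannila, upper-bounds the continuous distance of Definition 2 and converges to it as the curves are sampled more densely. A hypothetical sketch (not part of the GD framework itself):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Discrete Fréchet distance between two vertex sequences via dynamic
// programming over the coupling table ca[i][j].
double discreteFrechet(const std::vector<Pt>& P, const std::vector<Pt>& Q) {
    const std::size_t n = P.size(), m = Q.size();
    auto dist = [&](std::size_t i, std::size_t j) {
        return std::hypot(P[i].x - Q[j].x, P[i].y - Q[j].y);
    };
    std::vector<std::vector<double>> ca(n, std::vector<double>(m));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < m; ++j) {
            double d = dist(i, j);
            if (i == 0 && j == 0)  ca[i][j] = d;
            else if (i == 0)       ca[i][j] = std::max(ca[0][j - 1], d);
            else if (j == 0)       ca[i][j] = std::max(ca[i - 1][0], d);
            else ca[i][j] = std::max(
                std::min({ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]}), d);
        }
    return ca[n - 1][m - 1];
}
```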

Fig. 4

Fréchet distance and free-space diagram. The x-axis maps to the red curve; the y-axis maps to the blue curve. The shaded area represents all pairs of points whose distance is \(\le \delta\)

The Fréchet distance seems difficult to compute because we have to find the infimum over all possible reparameterizations. The basic algorithm was introduced by Alt and Godau (1995), solving the decision problem (‘is \(\delta _F(P,Q) \le \varepsilon\) ?’) in \(O(n^2)\) time. An important concept is the free-space diagram (see Fig. 4b), which will also be the centerpiece of the second, clustering stage of our algorithm. We compute a free-space diagram for the concatenation of all trajectories \(T_1\circ T_2\circ \dots \circ T_k\). On top of this (huge) free-space diagram, a directed graph is computed that allows us to quickly find reachable parts within the free-space diagram. Using this data structure, relevant clusters \(c(\tau )\) are computed by finding monotone paths within the free-space diagram (see Fig. 5 for a schematic example). In order to find those paths, we need to follow a sequence of critical locations in the free-space diagram, i.e., locations where new routes open, close, or diverge.

During the first, segmentation stage of the algorithm, up to \(O(k^4n^4)\) augmentation points need to be inserted into the input data. Although the worst-case scenario is unlikely to occur on real inputs, a large number of augmentation points might have to be inserted. This has a negative effect on both runtime and memory usage. We discuss practical methods to tackle these challenges in Sect. 5.

Fig. 5

Computing clusters for the Fréchet distance variant comprises finding sets of monotone paths (yellow) within a free-space diagram. The x-axis maps to the cluster representative \(\tau\); the y-axis maps to the cluster segments \(c(\tau )\). In order to traverse those diagrams efficiently, they are overlaid with a directed graph structure [explained in detail by Buchin et al. (2008)]. Augmentation points need to be inserted to cover critical locations of the free-space diagram, and to satisfy the local minimality criterion

4.3 Runtime complexity

Computing minimal GDs exactly has been proven to be NP-hard for all variants. Depending on the similarity measure, we can, however, give worst-case estimates for the first and second stage of the algorithm (segmentation and clustering).

For the equal-time and similar-time distance variants, we can compute \(O(k^3n)\) Set-Cover instances in time \(O((k^5+k^4 \log n)n)\), where k is the size of each instance.

For the Fréchet distance variant, we can compute a Set-Cover instance of size O(kN) in time \(O(k^2 N^3)\), where k is the number of trajectories and N the maximum number of vertices in a trajectory. See Buchin et al. (2020) for further details.

The subsequent third stage (Set-Cover) can, as we already noted, be solved approximately in polynomial time. The chosen greedy algorithm returns a solution of size less than \(m \log u\), where m is the minimum number of subsets and u the total number of elements. The runtime of the greedy algorithm is bounded by \(O(su \min \{s,u\})\), where s is the number of subsets.

5 Implementation

As already observed in the previous sections, implementing the GD algorithmic framework for scanpaths poses a number of challenges with respect to runtime and memory usage: During the segmentation stage, many additional points have to be inserted into the input data, increasing both memory usage and the runtime of subsequent stages. Additionally, the Fréchet distance variant of the algorithm uses large data structures to compute relevant clusters. We now give an overview of the implementation choices that we made in order to improve the scalability of the respective algorithms.

5.1 Improving segmentation

During the segmentation stage of the Fréchet variant of the algorithm, two types of augmentation points are inserted into the input data (see also Fig. 2):

Type 1: for each vertex that is closer than \(\delta\) to another segment, a new point is inserted at distance \(\le \delta\)

Type 2: for intersecting segments, up to four new points are inserted at distance \(\delta\)

These augmentation points are necessary to satisfy the local minimality criterion (see Fig. 2b), which in turn is essential to arrive at a minimal solution. Augmentation points of ‘Type 1’ are constructed by dropping the perpendicular from a vertex to a nearby segment (see Fig. 2c). If the distance is smaller than \(\delta\), an augmentation point is inserted. With heavily intertwined input trajectories like ours, the number of these augmentation points can be considerably large, in particular for small values of \(\delta\). We observe, however, that for the local minimality criterion to be satisfied, there is some flexibility in choosing the augmentation point. The requirement is to select a point whose distance is \(\le \delta\). Thus, we improved the process by using a more flexible approach:

1. For each vertex and each close line segment, record the interval of points closer than \(\delta\) (i.e., intersect a circle of radius \(\delta\) with the line segment),

2. collect all intersection intervals for a given segment,

3. choose a minimal number of points that stab all intervals.
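Step 1, the circle-segment intersection, amounts to solving a quadratic equation in the segment parameter. A hypothetical sketch (names and representation are ours, not the actual implementation):

```cpp
#include <algorithm>
#include <cmath>
#include <optional>
#include <utility>

struct Pt { double x, y; };

// Intersect the circle of radius delta around vertex c with the segment
// a-b. Returns the parameter interval [u0,u1] within [0,1] of segment
// points closer than delta, or nothing if the circle misses the segment.
std::optional<std::pair<double, double>>
closeInterval(Pt c, Pt a, Pt b, double delta) {
    double dx = b.x - a.x, dy = b.y - a.y;
    double fx = a.x - c.x, fy = a.y - c.y;
    // Solve |f + u*d|^2 = delta^2, a quadratic A*u^2 + B*u + C = 0.
    double A = dx * dx + dy * dy;
    double B = 2.0 * (fx * dx + fy * dy);
    double C = fx * fx + fy * fy - delta * delta;
    double disc = B * B - 4 * A * C;
    if (disc < 0 || A == 0) return std::nullopt;  // circle misses the line
    double s = std::sqrt(disc);
    double u0 = std::max(0.0, (-B - s) / (2 * A));
    double u1 = std::min(1.0, (-B + s) / (2 * A));
    if (u0 > u1) return std::nullopt;  // intersection lies outside the segment
    return std::make_pair(u0, u1);
}
```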

The interval stabbing task can be done by a simple order-and-sweep algorithm:

1. sort the intervals by their lower bounds in ascending order,

2. place the sweep line at the upper bound of the first interval,

3. iterate over the sorted intervals while their lower bound is smaller than or equal to the current sweep line position,

4. pick the upper bound of the next interval and repeat from step 3.

It is easy to show that this sweep algorithm produces an optimal result in \(O(n \log n)\) time.
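A sketch of this sweep is given below. This is a hedged illustration with hypothetical names; we additionally shrink the sweep position to the smallest upper bound seen so far, which keeps the chosen stab point inside every interval of the current group:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Minimum-point interval stabbing by order-and-sweep: sort by lower bound,
// stab at the (shrinking) smallest upper bound of the current group of
// overlapping intervals, then start a new group. Returns the stab points.
std::vector<double> stabIntervals(std::vector<std::pair<double, double>> iv) {
    std::vector<double> stabs;
    std::sort(iv.begin(), iv.end());  // ascending lower bounds (step 1)
    std::size_t i = 0;
    while (i < iv.size()) {
        double stab = iv[i].second;   // steps 2/4: upper bound of next interval
        // step 3: consume every interval whose lower bound <= current stab,
        // shrinking the stab to the smallest upper bound seen so far
        while (i < iv.size() && iv[i].first <= stab) {
            stab = std::min(stab, iv[i].second);
            ++i;
        }
        stabs.push_back(stab);
    }
    return stabs;
}
```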

For augmentation points of ‘Type 2,’ we consider intersecting segments. Additional points are inserted at the locations where the distance between both segments is exactly \(\delta\). Again, for heavily intertwined trajectories, the number of these augmentation points can become prohibitively large. We therefore choose to relax the local minimality criterion by allowing partial segments to be slightly longer than necessary. By doing so, augmentation points of ‘Type 2’ can now be chosen from an interval (see Fig. 2e for an illustration). Again, we use the stabbing approach to service overlapping intervals with a minimum set of points. The interval size is adjustable. It is important to note that, by relaxing the local minimality criterion, we do not modify the number of solution segments and thus do not compromise the global optimization goal.

The results of our improvements are examined in Table 1. In particular, the reduction in ‘Type 2’ points for small values of \(\delta\) is considerable. Whereas the original algorithm was not executable for some input data due to memory constraints, we are now able to run the algorithm for all values of \(\delta\) (see also Table 3).

Table 1 Number of augmentation points in original algorithm, and after our improvements

5.2 Space-saving free-space diagrams

As mentioned above, Buchin et al. (2020) construct a free-space diagram from the concatenation of all input trajectories. While conceptually easy to understand, we observe that this data structure contains redundant data: it is symmetric about the diagonal. In other words, the free-space diagram \(\hbox {FD}(T_i,T_j)\) contains the same information as \(\hbox {FD}(T_j,T_i)\), and \(\hbox {FD}(T_i,T_i)\) is not needed at all. To save memory, we chose to store only the lower half of the concatenated free-space diagram, i.e., we store \(\hbox {FD}(T_i,T_j)\) only if \(i < j\). In order to retrieve the symmetric information, we also had to adjust the free-space graph data structure described by Buchin et al. (2008). Another observation on practical input data is that usually some areas of the free-space diagram remain empty. Bringmann et al. (2019) developed a number of heuristics to map out these unneeded parts of the free-space diagram. Applying these techniques may be another future path for improving the memory footprint and speed of the algorithm.
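The triangular storage scheme boils down to an index mapping from a pair \((i,j)\) with \(i<j\) to a slot in a flat array of \(k(k-1)/2\) stored diagrams. A hypothetical sketch of this mapping (the function name is ours):

```cpp
#include <cstddef>

// Map a pair (i,j) with i < j to its slot in a flat array holding only the
// strict upper triangle of a k x k matrix of free-space diagrams, laid out
// row by row. Row i contributes (k-1-i) entries, so the rows before it
// occupy i*k - i*(i+1)/2 slots in total.
inline std::size_t pairIndex(std::size_t i, std::size_t j, std::size_t k) {
    return i * k - i * (i + 1) / 2 + (j - i - 1);
}
```

A lookup for \((j,i)\) with \(j>i\) simply swaps the arguments before indexing, exploiting the symmetry.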

5.3 Parallelization

Some computation steps can be executed independently for all input trajectories. Assuming that the number k of trajectories is large enough to keep all processor cores busy, the following steps are easy to rewrite for parallel computation:

  • Segmentation: computation of augmentation points can be done in parallel for all input trajectories

  • Free-space diagrams: computing the concatenated free-space diagram (see Sect. 5.2) can be done in parallel for each pair of trajectories

  • Clustering: computation of relevant clusters by sweep can be done in parallel for all input trajectories

The greedy algorithm to solve the final, Set-Cover stage of the algorithm is inherently sequential. It is not well suited for parallelization.
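The per-trajectory parallelization pattern can be sketched generically. Our implementation uses Intel TBB, but the same idea is shown here with standard C++ futures (a hypothetical sketch, not our actual code; processOne stands in for any of the independent steps listed above):

```cpp
#include <future>
#include <vector>

// Apply processOne to every item independently, one asynchronous task per
// item, and collect the results in input order.
template <typename In, typename Out, typename F>
std::vector<Out> parallelMap(const std::vector<In>& items, F processOne) {
    std::vector<std::future<Out>> futures;
    futures.reserve(items.size());
    for (const auto& item : items)
        futures.push_back(std::async(std::launch::async,
                                     [&processOne, &item] { return processOne(item); }));
    std::vector<Out> results;
    results.reserve(items.size());
    for (auto& f : futures) results.push_back(f.get());
    return results;
}
```

In practice, a work-stealing scheduler such as TBB's avoids oversubscription when the number of items exceeds the number of cores; the one-task-per-item version above is only meant to convey the structure.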

Table 2 Effectiveness of parallelization

6 Experimental results

We implemented the algorithm in C++, using Intel TBB for multi-core support. The source code is available from our git repository (Schäfer 2022). Experiments were conducted on a 32-core AMD processor with clock speeds from 2.1 to 4.2 GHz.

Parallelization experiments (see Table 2) were conducted with an (artificially enlarged) input instance containing 80 trajectories. Speed-up factors are excellent up to 20 threads and then level out, which might be explained by the sequential portions of the algorithm.

6.1 Sample data

To evaluate the use of Group Diagrams for eye-tracking scanpaths, we chose a previously recorded data set by Netzel et al. (2016). They used public transit maps of 24 cities and asked study participants to find a connection between two stations indicated by a green hand (start) and a red dartboard (destination). Eye tracking enabled them to analyze reading strategies on colored and monochrome maps. The data from 40 participants were recorded at 60 Hz using a Tobii T60 XL eye tracker. Netzel et al. used the manufacturer’s I-VT algorithm to extract fixations and saccades. For each transit map, experiments were conducted with two different pairs of start and end points, and with colored and gray-scale images. This resulted in 1920 scanpaths on 96 stimuli. The number of vertices varies from about 20 to a few hundred per scanpath. Test runs required up to 40 GB of RAM. Running times varied from 74 ms per instance up to 25 s (ca. 230 s on single-core hardware). Usually, very small values of \(\delta\) have a negative impact on both total running time and RAM usage.

Trying to analyze different reading strategies by simply drawing all 20 scanpaths on each stimulus leads to high amounts of clutter and overdraw. Using a different color for each path also leads to hues that are hard to distinguish. As all participants need to find a connection between stations, they will follow the metro lines. Some will transfer between lines at one station, some at another. Some might try to minimize the travel distance and explore multiple possible lines before arriving at the destination. However, we can assume that there are much fewer different strategies than study participants. Therefore, it is possible to select representative scanpaths to reduce clutter and aid the analysts in their task. This is where Group Diagrams come into play.

6.2 Influence of the distance measure

We first verified our hypothesis that the Fréchet distance is more suitable for scanpath GDs than the simpler measures proposed by Buchin et al. (2020). As expected, GDs become heavily fragmented under the equal-time and similar-time distance measures and produce unsatisfactory results, as illustrated in Fig. 6b. In contrast, differences in each participant’s task completion time do not affect the Fréchet distance. That is why this measure allows for a more concise selection of sub-trajectories (see Fig. 6c, d).

Fig. 6

Results on a public transit map of Antwerpen. Fréchet distance results are usually more coherent and easier to interpret

Fig. 7

Result set size as a function of threshold \(\delta\). (x-axis: threshold \(\delta\), y-axis: number of retained points in relation to input size)

Fig. 8

Scanpaths on a public transit map of Frankfurt

Fig. 9

Scanpaths on a public transit map of Zürich

6.3 Influence of the threshold value and implementation choices

The threshold value \(\delta\) effectively controls the level of detail in the resulting GD: Small threshold values result in very detailed diagrams, while larger values create less detailed but more descriptive diagrams (see Fig. 6b). As shown in Table 3 and Fig. 7, the number of points in the GD is significantly reduced compared to the input. The adjustable \(\delta\)-threshold gives analysts good control over the level of clutter vs. reduction. Complete sets of input data and GDs with several threshold values are provided in supplementary material to this paper.

Table 3 Result set size

As outlined in Sect. 5, we used several strategies to keep running times and memory usage in check. Table 1 shows the effect of our improved segmentation algorithm (stabbing) on the number of augmentation points.

6.4 Problem size

We observe that running times are mostly dependent on the number of input points and augmentation points. Our largest data set containing 2700 points (plus some 28,000 augmentation points) was processed in 25 s using parallel hardware. Scanpath data sets containing less than 1800 points could be processed in less than 4 s. Most of our scanpath data sets contained, however, less than 1000 points and are processed in less than a second.

For very small \(\delta\), both memory usage and runtime grow rapidly, primarily caused by the combinatorial explosion of augmentation points. On the other hand, too small values of \(\delta\) are less useful for producing concise results anyway.

6.5 Comparison to previous work

Rodrigues et al. (2018) used the same data set in their work on multiscale visualization of scanpaths. Thus, we can compare our results to theirs. Both techniques allow the analyst to dynamically adjust the amount of clutter in visualizations. In the multiscale approach, users choose the grid coarseness: The finer the grid, the more details are displayed. With Group Diagrams, analysts choose the distance threshold \(\delta\): A large threshold produces a more concise output.

As shown in Fig. 8, the source data visualization is very cluttered. Nonetheless, one can discern a main route where study participants first follow a metro line to the east, before turning south near the center of the map (east-first strategy). Other participants went south first and then turned east without passing through the center. The multiscale approach reduces the complexity of path shapes. The introduced meta-fixations with averaged locations spread the visualized lines parallel to each other. But the modified spatial positions detach the paths from the underlying metro lines, so that it becomes difficult to trace the routes taken by participants. Also, it is hard to discern the south-first strategy from the dense triangle. Outliers in the northeast and southeast are perceptible, but hard to associate with metro stations.

Similarly, our Group Diagram with \(\delta =800\) does not show precise outlier positions, despite the ragged shape that covers them. On the positive side, this GD yields an uncluttered view with a single trajectory. When we decrease \(\delta\) to 600, we arrive at a well-suited representation of the original scanpaths. We can see exact outlier positions in the northeastern and southern map areas. Both the east-first, and south-first strategies are visible. Most importantly, we can better correlate the paths to the underlying stimulus and identify the stations where participants changed metro lines. In summary, GDs yield representative paths that reduce clutter significantly, without introducing artifacts.

6.6 Expert review

So far, we have discussed mainly technical aspects: What is the influence of the distance measure and threshold? What are the runtime and complexity? We also presented a number of result images, enabling the reader to perform their own visual evaluation. Now, as a third level of evaluation, we report the results of an expert review with seven participants (E1–E7) who have previous experience in the analysis of eye-tracking data (work colleagues, non-authors). Participants were aged between 25 and 45. Five were female and two were male.

In the first section of the expert review, we showed the original data in Fig. 8a and gave a coarse explanation of the Group Diagram in Fig. 8d. Users were given 10–15 min to evaluate the images. Except for E4, all experts confirmed that the trajectories in the Group Diagram represent the original data well. E4 noted an outlier in the GD and therefore preferred the original view for a first glimpse at the data to avoid issues of over-representation. Four experts preferred the GD because it is less cluttered. Two participants acknowledged the cleaner look but were undecided because the details in the original data might be necessary for specific analysis tasks.

Next, we introduced a view using the multiscale visualization from Rodrigues et al. (2018) with one level of subdivision (see Fig. 8b). This time, we asked whether the participants preferred the Group Diagram or the multiscale (MS) visualization for a first glimpse. Three experts chose GDs because they were more confident in the location accuracy of the trajectories, including the outlier. E1 and E5 preferred the MS view; E5 explained this decision with its suitability for finding hot areas, transition frequencies between quadrants, and connectivity between clusters of common viewing strategies. Two experts were undecided because the views work well for different tasks. They mentioned that GDs work well for spatial accuracy, but the MS view shows a coarse shape and count of the trajectories.

In another phase, we asked the experts about other visualizations of multiple eye-tracking scanpaths and how these compare against Group Diagrams. Four participants suggested using an edge-bundling technique with the original data. Of these, two preferred the bundling-based visualization, while two were undecided. E3 and E4 mentioned heat maps. While E3 preferred GDs, E4 leaned toward the heat map because of its simplicity. Two experts suggested defining areas of interest (AOIs) and creating scarf plots for analysis. However, for a first glimpse at the data, they still preferred GDs because they do not require manual annotation with AOIs. Individual participants also mentioned other techniques for data analysis [e.g., clustering of fixations, clustering of AOI sequences, interactive lenses, and space-time cubes (Kurzhals and Weiskopf 2013)], but none preferred them for an initial glimpse. The reasons were that Group Diagrams already provide a good overview of the data, whereas the alternatives require preparatory work or an interactive environment.

Finally, participants provided general feedback and suggestions. Most noted that GDs provide a cleaner, less cluttered view of the data. E2 mentioned that Group Diagrams provide a good overview of the eye-tracking scanpaths, help with finding common paths, and might be useful for sequence analysis; however, they would not serve well for comparing specific individual trajectories, since they visualize a reduced data set. E1 liked that GDs show scanpaths at the exact location on the stimulus without moving them. One expert would like to know more details about the Group Diagram algorithm to increase trust in the selection of displayed scanpaths.

Multiple participants had suggestions with regard to the use of color for the trajectories in Group Diagrams, ranging from choosing different color schemes to dynamically changing the color along each path to encode additional information. While we agree that such suggestions could benefit the analysis of eye-tracking trajectories, our algorithm is color-agnostic: it only requires positional information and is targeted toward reducing clutter while guaranteeing that all scanpaths lie within a maximum distance of the chosen representatives. Therefore, and to present our algorithm more clearly, we kept the colors consistent between the view of the entire data set and the view of the selected representatives, preserving the mental map when comparing the images.
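The distance guarantee mentioned above can be illustrated with a small sketch. The following Python snippet is purely illustrative: it uses the discrete Fréchet distance as a stand-in for the continuous variant and checks each scanpath against a single representative, whereas the actual Group Diagram guarantee holds with respect to a path in the diagram. Function names (`discrete_frechet`, `covered`) are our own for this example.

```python
import math

def discrete_frechet(p, q):
    """Discrete Fréchet distance between two polylines (lists of 2D points),
    computed with the standard dynamic program over the coupling matrix."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = cost
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], cost)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], cost)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i][j - 1],
                                   ca[i - 1][j - 1]), cost)
    return ca[-1][-1]

def covered(scanpaths, representatives, delta):
    """Check the (simplified) guarantee: every scanpath lies within
    distance delta of at least one selected representative."""
    return all(
        min(discrete_frechet(s, r) for r in representatives) <= delta
        for s in scanpaths
    )
```

For instance, a horizontal scanpath at height 0 is covered by a parallel representative at height 1 for δ = 1, but not for δ = 0.5.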

6.7 Experiments on movement data

We also tested our implementation on the original input data set used by Buchin et al. (2020): migration data of white-fronted geese equipped with GPS trackers. As explained in their work, the input data contain too many redundant samples, so data reduction techniques are mandatory. Buchin et al. (2020) accomplished this reduction by coalescing all movement samples within a radius of 90 km, which is a rather coarse threshold. Due to our enhancements to the scalability of the algorithm, we were able to improve on their results significantly: we ran our experiments with coalescing thresholds of 1 km and less, resulting in much more detailed output data. Running times remained acceptably low, in the range of 5–30 s. Some results are shown in Fig. 10.
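The coalescing step can be sketched as follows. This is an illustrative greedy variant in Python using Euclidean distances on plain coordinates; it is not the exact preprocessing of Buchin et al. (2020), which operates on geographic positions and would use a geodesic distance.

```python
import math

def coalesce(samples, radius):
    """Greedily thin a trajectory: drop every sample that lies within
    `radius` of the most recently kept sample. With radius = 90 (km-scale
    units) this mimics coarse reduction; radius = 1 keeps far more detail."""
    kept = [samples[0]]
    for p in samples[1:]:
        if math.dist(p, kept[-1]) > radius:
            kept.append(p)
    return kept
```

Lowering the radius from 90 to 1 trades a larger remaining sample count for finer spatial detail, which is what made the more detailed outputs in our experiments possible.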

Fig. 10 Movement data of white-fronted geese across Northern Europe

We conclude that our algorithmic approach, thanks in part to design enhancements, is well suited for both types of input data. These promising results suggest that it might be useful for other, similar problem settings, too.

7 Conclusion and outlook

We showed that Group Diagrams are an effective new approach to reducing clutter in sets of scanpaths and in other data sets. In particular, the Fréchet distance variant produces useful results, and the adjustable threshold gives analysts good control over the level of detail. We improved the scalability of the original algorithm significantly through small adjustments to the problem definition and various implementation techniques. We evaluated our approach for multiple use cases with quantitative metrics and an expert review, with a positive outcome. Future work could focus on further improving the Group Diagram pipeline, with the goal of allowing analysts to change thresholds interactively. In addition, outlier detection algorithms that filter nonsensical scanpaths might produce even more concise outputs (see Fig. 9 for an example with noticeable outliers).