Abstract
A common enhancement of scatterplots represents points as small multiples, glyphs, or thumbnail images. As this encoding often results in overlaps, a general strategy is to alter the positions of the data points, for instance, toward a grid-like structure. Previous approaches rely on solving expensive optimization problems or on dividing the space, which alters the global structure of the scatterplot. To find a good balance between efficiency and neighborhood and layout preservation, we propose Hagrid, a technique that uses space-filling curves (SFCs) to “gridify” a scatterplot without employing expensive collision detection and handling mechanisms. Using SFCs ensures that the points are plotted close to their original positions, retaining approximately the same global structure. The resulting scatterplot is mapped onto a rectangular or hexagonal grid, using Hilbert and Gosper curves. We discuss and evaluate the theoretical runtime of our approach and quantitatively compare it to three state-of-the-art gridifying approaches, DGrid, Small Multiples with Gaps (SMWG), and Correlated Multiples (CMDS), in an evaluation comprising 339 scatterplots. Here, we compute several quality measures for neighborhood preservation together with an analysis of the actual runtimes. The main results show that, compared to the best other technique, Hagrid is faster by a factor of four, while achieving similar or even better quality of the gridified layout. Due to its computational efficiency, our approach also enables novel applications of gridifying approaches in interactive settings, such as removing local overlap upon hovering over a scatterplot.
1 Introduction
Scatterplots are widely used representations for 2D data. The position of the dot is the main visual encoding and allows the user to perceive proximity or similarity between individual data points. Additional channels, such as color, shape, and size, can be used to show other properties of the respective data point.
A possible enhancement for scatterplots is replacing dots with meaningful glyphs or images that provide additional semantic information (Ward 2002). For instance, the individual points can represent the handwritten digits from the MNIST dataset (Deng 2012). In that case, their positions are the 2D projections resulting from applying dimensionality reduction to the high-dimensional dataset. Rather than using dots, color, or shape to represent the points, we can plot the small image of the handwritten digit, allowing the user to gain more insight from the data.
For datasets that are not sourced from images, we may use glyphs, which may have various shapes and sizes, such as circular or hexagonal ones. In these cases, occlusions stemming from overlapping points might impede the readability of the scatterplot (Hilasaca and Paulovich 2019). A common approach to dealing with this issue is to detect collisions between image or glyph points and slightly jitter or move them around to reduce the overlap. Handling collisions is also necessary in graph drawing, wordles, and in gridifying maps.
Techniques formulated as optimization problems (Meulemans et al. 2016; Liu et al. 2018) produce overlap-free layouts with high quality, but come with high runtimes, impeding interactive applications. Techniques generating space-filling results have better runtimes (Hilasaca and Paulovich 2019; Duarte et al. 2014), but can only preserve the shape and patterns of the scatterplot by adding dummy points, which also increases the runtime complexity for large grids. Our work is primarily motivated by dimensionality reduction processes in which users have to frequently change the parameterization of the algorithms to find good results. For such applications, a good trade-off between fast runtimes and maintaining the original global structure is important.
To address these issues, we propose Hagrid, a technique that uses space-filling curves (SFCs) to “gridify” a scatterplot. An SFC is created from a starting pattern by recursively replacing each vertex of the pattern with a rotated and flipped copy of the pattern itself, so that the ends of the copies connect to each other. This process creates a self-similar and self-avoiding curve, whose limit under recursion is a space-filling curve (see Fig. 2).
Most importantly, this construction is bijective, i.e., a point in the area of the SFC can be mapped to a 1D index on the curve and then decoded back to the 2D domain. The vertices of the SFC correspond to the centers of the square (HC) or hexagonal (GC) cells of the resulting grid, where the glyph or image point is ultimately plotted (see Fig. 1 for an example using the GC). We use these properties to align the points of a scatterplot on a grid and to handle collisions, i.e., points mapped to the same SFC vertex or the same cell on the grid. We resolve collisions on the curve by moving the respective point left or right along the SFC (see Algorithm 2).
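The HC index mapping can be computed iteratively without materializing the curve. The following is a minimal Python sketch of the standard iterative Hilbert-index algorithm (the function names `xy2d`/`d2xy` and the side-length parameter `n` are ours for illustration, not taken from the Hagrid implementation); the Gosper curve works analogously with a base-7 pattern:

```python
def xy2d(n, x, y):
    """Map a 2D cell (x, y) on an n-by-n grid (n a power of 2)
    to its 1D index on the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate the quadrant so the sub-curve is oriented consistently
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

def d2xy(n, d):
    """Inverse mapping: decode a 1D Hilbert index back to (x, y)."""
    t = d
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

Round-tripping an index through `d2xy` and `xy2d` returns the original index, which is exactly the bijectivity property exploited above.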
To evaluate Hagrid, we quantitatively compare it with three related approaches. To this end, we use 339 scatterplots and compute different quality metrics on neighborhood preservation and layout similarity, as well as runtimes. Our results indicate that Hagrid is substantially faster while providing similar or even better visual quality. In summary, we make the following main contributions:

Hagrid, a technique for aligning 2D points on a grid, defined using space-filling curves.

The results of a quantitative evaluation comparing Hagrid to DGrid, Correlated Multiples (CMDS), and Small Multiples with Gaps (SMWG).
Equipped with this faster approach, we provide two case studies illustrating how it can be used in interactive applications and to visualize large datasets. Our implementations of Hagrid and all evaluation metrics used are available, both in Python (https://github.com/kix2mix2/Hagrid) and JavaScript (https://github.com/saehm/hagrid). This article is an extension of a previous conference paper on the same topic (Cutura et al. 2021).
2 Related work
Space-filling curves can be used in many different ways (Bader 2012), for instance, to lay out data in the cache, which improves the runtime of certain algorithms. Here, we provide related work on how space-filling curves are currently used in visualization. Then, we review existing techniques for collision removal or reduction in scatterplots and similar visual encoding techniques.
2.1 Applications of spacefilling curves
This section discusses existing techniques that use SFCs for visualization-related use cases.
Jigsaw (Wattenberg 2005) uses SFCs, such as Hilbert curves, to generate space-filling maps, competing with treemap generation algorithms. Their SFC approach creates non-rectangular maps, which have several superior properties, such as better preservation of the ordering of the underlying tree.
Buchmüller et al. (2018) use SFCs to create 2D visualizations of simulation processes, so-called MotionRugs. Space-filling curves allow them to transfer the position of agents in the simulation to a vertex on the curve. The curve is then visualized as a one-pixel-wide column. The artifacts of this process allow tracing the behavior of the agents in the simulation.
Muelder and Ma (2008) use SFCs for node-link graph layouts. They compute a matrix ordering of the graph and project this ordering onto a space-filling curve, in this case an HC or a GC. Taking advantage of the neighborhood-preservation property of SFCs, this leads to a graph layout where clustered nodes are close in the final visualization.
GosperMap (Auber et al. 2013) creates treemaps with non-rectangular shapes from hierarchical data by laying it out on the Gosper curve. Other works use space-filling curves for neighborhood search in point clouds (Ježowicz et al. 2014).
These techniques take advantage of the fast mapping either from 2D to 1D or vice versa. They use this mapping and the neighborhood-preservation property to visualize different aspects of the specific data. Our technique does not focus on mapping data for a specific use case. Our primary goal is more general, that is, creating a gridified layout without any overlap by remapping points. While we illustrate our approach with scatterplots, it can be used for any data that can be processed as a set of (x, y) coordinates, for instance, node-link diagrams or point clouds.
2.2 Point positioning
Closest to our work are approaches that seek to remove the overlap of point representations by altering their positions. The problem of point positioning occurs across different visualization types, which we use to categorize the techniques into the following groups:
2.2.1 Graphs—Node overlap removal
The idea of removing node overlap in graph drawing is similar to our approach. For example, MIOLA (Gomez-Nieto et al. 2013a) arranges rectangular boxes such that no overlap remains and the neighborhood of each box is preserved as much as possible. It uses a mixed-integer quadratic optimization formulation, which can be solved by interfacing with optimization engines. To align all points on a grid, MIOLA requires additional constraints, further increasing the runtime. The algorithm is compared with VPSC (Dwyer et al. 2006), Prism (Gansner and Hu 2008), Voronoi (Du et al. 1999), and RWordle-C (Strobelt et al. 2012), which share the same goal and use case. For comparison, they use various quality metrics: Euclidean distance, layout similarity, orthogonal ordering, size increase, and neighborhood preservation. The selected datasets are video snippet collections. The reported runtimes are based on solving their proposed formulation with Gurobi, a fast commercial optimization engine. Nachmanson et al. (2016) build a minimum spanning tree and grow edges between colliding nodes. They compare their technique GTree with Prism (Gansner and Hu 2008) by measuring the area of the result, edge length dissimilarity, and Procrustean similarity (Borg and Groenen 2003). Marcílio-Jr et al. (2019) also evaluate node overlap removal techniques (Pinho et al. 2009; Dwyer et al. 2006; Gomez-Nieto et al. 2013b; Strobelt et al. 2012; Gansner and Hu 2008). All of these techniques were compared to at least one of the baselines implemented for this paper, and hence were not selected for our evaluation.
2.2.2 Space-filling treemaps
NMap (Duarte et al. 2014) seeks to generate a space-filling treemap of a given visual area. Starting with a scatterplot, it recursively replaces the individual dots with unevenly sized rectangular boxes, which eventually form a treemap. They compare their method with OOT (One-dimensional Ordered Treemap) and SOT (Spatially-Ordered Treemap) with respect to the aspect ratio of the individual boxes, displacement, and neighborhood preservation. They evaluate the relationship between runtime and the number of points on nine generated datasets. NMap, OOT, and SOT have in common that areas are transformed in such a way that the proximity between objects remains intact, thereby filling the whole visualization plane. NMap has an extension that adds points in such a way that the algorithm results in a layout where all generated bounding boxes have the same size. By comparison, our technique preserves the distances (i.e., empty spaces) between points as much as possible and generates equally sized squares or hexagons by default.
2.2.3 Maps
Several techniques exist for arranging geographical entities into uniform tiles, for instance, Small Multiples with Gaps (SMWG) (Meulemans et al. 2016), generated tile maps (McNeill and Hale 2017), or coherent grid maps (Meulemans et al. 2020). These methods work on the centroids of geographical areas, trying to maintain the global shape while keeping the potential adjacency of the respective areas. Evaluating the use case of these techniques requires additional metrics; for instance, orthogonal ordering is important. Simplifying maps puts high demands on visual quality to keep the map recognizable, but runtimes are relatively unimportant as the results are mostly used in a static setting. In contrast, we focus on use cases like gridifying dimensionality reductions and interactive settings, in which runtime plays a central role.
2.2.4 Scatterplots
Dust & Magnet (D&M) (Vollmer and Döllner 2020) is a technique for creating overlap-free scatterplots. Virtual magnets attract or repel points depending on a given feature. Overlaps are avoided with a continuous optimization process: first, starting from the target position of a point on a defined grid, an empty spot is searched for in different directions. After finding an empty spot, in a second step, each placed point looks “back” at all adjacent grid cells closer to its target position and takes the nearest available one. This approach can be used if the input is high-dimensional; our technique gridifies already projected data.
GridFit (Keim and Herrmann 1998) uses quadtrees to gridify scatterplots and proposes two baselines for its evaluation: a naive approach using nearest neighbors to find the best available cell for a colliding point, and one employing SFCs to handle overlap. Their SFC baseline shares some commonalities with our method, but only uses the SFC for the grid coordinate computation, whereas Hagrid uses the SFC for collision handling, which constitutes the biggest runtime gain. GridFit, as presented in the paper, is not entirely reproducible, and none of the related work that we are aware of has evaluated against it.
Correlated Multiples (CMDS) (Liu et al. 2018) uses a variation of MDS to “gridify” data plots. Their use case is to show uniformly sized small multiples instead of dots in a scatterplot to enrich the visualization with more information. They follow an approach similar to a force-directed layout method, as usually employed for graphs. Their evaluation focuses on runtime analysis. They compare their technique against SpatialGrid (Wood and Dykes 2008) and GridMap (Eppstein et al. 2015). They also conduct a user study analyzing the usefulness of small multiples in a scatterplot, with favorable results.
DGrid (Hilasaca and Paulovich 2019) takes a space-dividing approach similar to GridFit and bisects the visual space repeatedly, so that each point has its own rectangular or square grid cell while neighborhoods are preserved. They compare their technique with Kernelized Sorting (Quadrianto et al. 2010), the Self-Sorting Map (Strong and Gong 2014), and IsoMatch (Fried et al. 2015). Their evaluation is the most extensive one, covering neighborhood preservation, layout similarity, and cross-correlation on a range of datasets from the UCI Machine Learning Repository.
SMWG, CMDS, and DGrid address the same use case as we do and are, thus, the primary candidates for comparison. We have adopted some of the quality metrics used in their individual evaluations; the metrics employed and the results of this comparison are presented in Sect. 4.
3 Technique
This section starts by providing some background on space-filling curves (SFCs), specifically Hilbert and Gosper curves. We also derive a list of their properties that are beneficial for our goal.
3.1 Background and properties of spacefilling curves
SFCs start with a pattern that is recursively repeated. For the Hilbert curve (HC) (Hilbert 1935), this base pattern creates a grid of squares indexed continuously from 0 to \(m^l\), where m is the number of vertices in the pattern and l is the number of recursions, referred to in this paper as the (depth) level of the curve. The Gosper curve (GC) (Gardner 1976) creates a hexagonal grid from its own start pattern. Examples of HC and GC at various levels are shown in Fig. 2.
In our approach, we specifically use Hilbert and Gosper curves, as they have several properties that are desirable for visualization use cases:

Space-filling As the level of the curve increases toward infinity, every high-dimensional point can be represented by a vertex of the 1D curve. This ensures there is no constraint on either the dimensionality or the number of items to be visualized (Uher et al. 2019).

Bijectivity 2D points can be mapped to the 1D curve and back into the 2D plane, allowing us to switch between the two representations (Wattenberg 2005).

Neighborhood preservation A point retains approximately the same position on the curve regardless of whether the level of the curve is increased or decreased (Wattenberg 2005).

Stability Small changes in the position on a curve yield the smallest possible change in the 2D outputs (Uher et al. 2019).
3.2 Proposed technique
With Hagrid, our goal is to create overlap-free, gridified scatterplots where the dots are replaced with images or glyphs. Our technique differs from the space-filling approaches (DGrid, NMap) above in that we not only seek to preserve the local neighborhoods, but also the global structure of a scatterplot. By global structure, we mean the preservation of the characteristics of a set of points projected into 2D Euclidean space. These properties can include, but are not limited to, measures such as density, skewness, shape, and outliers (Sedlmair et al. 2012; Wilkinson et al. 2005). Loss of global structure implies that information like outliers or point density is lost in the resulting layout. CMDS and SMWG try to retain the global structure of the visualization, but have high runtimes.
Hagrid consists of the following steps:

(1)
Begin by setting the level l of the used SFC, where \(l \ge l_\text {min}=\lceil \log _m n\rceil\) (see Eq. 1).

(2)
Assign the data points to a grid defined by level l and the used SFC, \(g_l: {\mathbb {R}}^2 \rightarrow G_l = \{(i, j)\}\), where the pairs (i, j) represent the possible coordinates of the grid \(G_l\) (see Fig. 2).

(3)
Map the grid coordinates to curve indices using \(f_l: G_l \rightarrow I_l \subset {\mathbb {N}}\), where \(I_l\) is the set of indices or vertices of the SFC of level l.

(4)
If a point is assigned to an already occupied vertex of the SFC, Algorithm 2 resolves the collision.

(5)
Finally, \(f_l^{-1}: I_l \rightarrow G_l\) maps the points back onto the grid \(G_l\), resulting in an overlap-free scatterplot.
The function \(g_l\) maps the original 2D coordinates to the grid \(G_l\). First, the data is transformed to fit into the boundaries of grid \(G_l\). Then, the transformed coordinates are rounded to the grid cell (i, j) that the data point is assigned to.
The main added value of using the SFC is, therefore, the collision handling.
Figure 3 conceptually illustrates the process introduced above, using images from the public Art UK Paintings dataset (Crowley and Zisserman 2014) instead of dots. The mapping \((f_l \circ g_l): {\mathbb {R}}^2 \rightarrow I_l\) (steps 2 and 3 in the list) returns an index, transforming the position of the 2D point to a 1D vertex on the SFC (see Fig. 3, middle). The SFC needs to be deep enough to hold all the points of the used dataset (see Eq. 1). We loop through each point in the list and assign it a 1D index on the SFC. Collision handling happens in the 1D space, on the fly. If a new point is assigned to an occupied vertex, we move the point to the left or to the right, based on which direction offers the closest empty spot (yellow areas in Fig. 3). After the final point positioning, all collisions are resolved (Fig. 3, right). Finally, the 1D vertices are transformed back to the 2D grid \(G_l\), using the function \(f_l^{-1}: I_l \rightarrow G_l\). The entire process is detailed in Algorithm 1.
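The on-the-fly collision handling described above can be sketched in a few lines of Python. This is a simplified sketch under our own assumptions: ties between equally distant free vertices are broken to the right, and occupancy is tracked in a plain list rather than the data structures of the actual implementation:

```python
def resolve_collisions(indices, size):
    """Place 1D curve indices one by one; a point whose vertex is
    already taken moves to the nearest free vertex on the curve
    (ties broken to the right). Assumes size >= len(indices)."""
    occupied = [False] * size
    placed = []
    for idx in indices:
        offset = 0
        while True:
            right, left = idx + offset, idx - offset
            if right < size and not occupied[right]:
                pos = right
                break
            if left >= 0 and not occupied[left]:
                pos = left
                break
            offset += 1  # widen the left/right scan
        occupied[pos] = True
        placed.append(pos)
    return placed
```

For example, three points that all map to vertex 2 of an 8-vertex curve end up at vertices 2, 3, and 1, each the closest free spot at insertion time.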
The final set of coordinates represents the centers of square or hexagonal cells of uniform size. The size of the cells is determined by the depth level of the curve (l), introduced in Sect. 3.1. The depth level is, therefore, a parameter of our technique. Since our goals cover both plotting images or glyphs and completely avoiding overlap, we can reformulate them as follows: we want to plot a set of images as large as possible without overlap.
Given the space-filling property of the curves, if we were to generate deep enough curves, no overlap would occur, as the curve would pass through all of 2D space as \(l \rightarrow \infty\). In other words, the gridified structure would approach the original layout of the scatterplot. However, a higher curve level l also leads to a more fine-grained grid, and hence smaller cells in which the glyphs and images can be plotted. For this reason, we calculate the minimal depth level according to the number of points to be plotted. This \(l_\text {min}\) is the default value and is calculated by

$$\begin{aligned} l_\text {min} = \lceil \log _m n\rceil , \end{aligned}$$
(1)

where n is the number of points to be scattered, and m is the number of vertices in the initial recursion pattern (i.e., the SFC at level 1). For HC, \(m=4\), and for GC, \(m=7\) (see Fig. 2). Users may always alter the level to achieve different results depending on their individual goals. For example, if analyzing the global structure of a scatterplot is more important, we recommend setting the level to \(l_\text {min} + 1\) or higher, or using the depth level that guarantees a user-defined minimum percentage of whitespace, by using \(n \cdot (1+w)\) in Eq. 1 instead of n, where w is the percentage of whitespace. For example, the level with guaranteed 50% whitespace is calculated as \(l_{50\%}=\lceil \log _m(n \cdot 1.5)\rceil\). Table 1 lists the maximum number of cells available for different levels.
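The level computation from Eq. 1, including the whitespace-adjusted variant, amounts to a one-liner. A small Python sketch (the function name and the `whitespace` parameter are illustrative, not taken from the Hagrid implementation):

```python
import math

def min_level(n, m, whitespace=0.0):
    """Minimal SFC depth level whose grid holds n points,
    optionally guaranteeing a fraction of whitespace cells.
    m = 4 for the Hilbert curve, m = 7 for the Gosper curve.
    Note: for n exactly a power of m, floating-point log may
    round across the ceiling boundary."""
    return math.ceil(math.log(n * (1.0 + whitespace), m))
```

For instance, 1000 points need level 5 on the Hilbert curve (4^4 = 256 cells are too few, 4^5 = 1024 suffice) but only level 4 on the Gosper curve (7^4 = 2401 cells).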
The process of placing a point onto the SFC is similar to inserting a value into a hash table. The recursion depth l of the SFC depends on the number of points in the scatterplot (see Eq. 1), and defines the required number of steps of the index computation of a single point. The computation of an index assignment of one point using \((f_l \circ g_l): {\mathbb {R}}^2 \rightarrow I_l\) needs \(\mathcal {O}(l)\) time, or—as l depends on n (Eq. 1)—\(\mathcal {O}(\log n)\) time. For all points, the overall runtime is thus \(\mathcal {O}(n \cdot l) = \mathcal {O}(n \log n)\).
The above considerations are valid as long as there is no collision on the SFC. This best case occurs if the points are evenly distributed in the sense that there is at most one point per index on the SFC from the mapping by f. However, collisions will happen in most cases, and collision handling can introduce additional computational costs. The worst case happens if all points are initially mapped by f to the same SFC index. In this case, collision handling takes as many steps as there are points already placed on the SFC. When placing n points subsequently, this leads to an average of n/2 collision-handling steps per point, or \(\mathcal {O}(n^2)\) steps to handle all n points. In total, the runtime is therefore \(\mathcal {O}(n \log n + n^2) = \mathcal {O}(n^2)\), where the first term \(\mathcal {O}(n \log n)\) comes from the initial placement of the points and the second term \(\mathcal {O}(n^2)\) from collision handling. The worst-case runtime is thus governed by collision handling.
However, both extreme cases are not typical for our applications. For the in-between cases, it is critical to model the number of collision-handling steps depending on n. Let us assume that the function c(n) provides the average number of collision-handling steps. Then, we arrive at a total runtime of \(\mathcal {O}( n \log n + c(n) \cdot n)\), composed of the initial placement of points and the subsequent handling of collisions for all n points. Of course, c(n) depends heavily on the distribution of points in 2D and, therefore, cannot be discussed here in full detail. We refer to generative data models (Schulz et al. 2016) for approaches to model data with characteristics that could be related to the properties relevant for collision handling. Even without such a data model, we know that the number of collision-handling steps is bounded: \(0 \le c(n) \le n\), comprising the best- and worst-case scenarios from above. While c(n) is a theoretical model, Fig. 4 shows the number of actual collisions for randomly generated, uniformly distributed datasets, as well as their impact on runtime. The runtime will thus be between \(\mathcal {O}(n \log n)\) and \(\mathcal {O}(n^2)\).
There is a relationship between our collision handling and linear probing in closed hashing (also called open addressing) (Cormen et al. 2009) that allows us to model an in-between case. If we assume that our data is distributed in a way that leads to the same collision probabilities as in uniform hashing, then we arrive at an average number of probing steps of \(1/(1-\alpha )\) for adding a point (or for unsuccessful search in hashing); here, \(\alpha\) is the load factor, i.e., the percentage to which the table is filled with points. This means that the number of collision-handling steps depends on the load factor, but not on n. To fill the SFC, we keep adding points subsequently, thus increasing the load factor from 0 to the maximum load factor \(\alpha _\text {max}\) in equidistant discretization steps.
In the limit case for asymptotic behavior, we then have infinitesimal steps for \(\alpha\), and \(\int _{0}^{\alpha _\text {max}} 1/(1-\alpha )\,d\alpha = -\ln (1-\alpha _\text {max})\) for the average number of collision-handling steps. Therefore, with the assumption of uniform hashing and a fixed maximum load factor, we have \(c(n) \in \mathcal {O}(1)\) to handle one point and, thus, \(\mathcal {O}(n \log n + n) = \mathcal {O}(n \log n)\) for the overall runtime, which is identical to the best-case behavior. With an increasing load factor, there is an increasing number of collisions, which impacts the runtime.
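Under the uniform-hashing assumption, the amortized cost per inserted point can be spelled out explicitly (a sketch of the standard linear-probing argument, not specific to Hagrid):

\[
\frac{1}{\alpha _\text {max}} \int _{0}^{\alpha _\text {max}} \frac{d\alpha }{1-\alpha } \;=\; \frac{1}{\alpha _\text {max}} \ln \frac{1}{1-\alpha _\text {max}},
\]

which is constant for any fixed \(\alpha _\text {max} < 1\); for instance, \(\alpha _\text {max} = 0.5\) yields \(2 \ln 2 \approx 1.39\) probing steps per insertion on average.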
It should be noted that the input data at hand does not necessarily come with the properties of uniform hashing; therefore, the runtime will then be between \(\mathcal {O}(n \log n)\) and \(\mathcal {O}(n^2)\).
4 Quantitative analysis
In this section, we present a quantitative evaluation of Hagrid in comparison to selected techniques.
We introduce the quality metrics used, the setup of the evaluation, and the results and runtime analysis.
4.1 Competing techniques
We compare our technique, in its two versions using the Hilbert curve (\({{\text {Hagrid}}_\mathrm{{HC}}}\)) and the Gosper curve (\({{\text {Hagrid}}_\mathrm{{GC}}}\)), to three state-of-the-art approaches: CMDS, DGrid, and SMWG. We selected these candidates based on two criteria: (i) how well they address our use cases, and (ii) whether the techniques had not previously been compared against each other. To this end, we selected CMDS and DGrid, as they specifically cater to scatterplot layout simplification. In previous evaluations against related methods (Eppstein et al. 2015; Fried et al. 2015; Strong and Gong 2014; Wood and Dykes 2008), these approaches showed very convincing results in terms of runtime and quality metrics.
All in all, the selected methods represent some of the best available techniques that address the problem of point positioning and, more specifically, our scatterplot layout use case. A side contribution of this paper is the set of Python and JavaScript implementations of the used techniques (except SMWG), as well as of the evaluation metrics presented in the next section.
4.2 Evaluation metrics
For the comparison, we have selected the following evaluation metrics: neighborhood preservation (NP, also called \(\mathrm{AUC}_{\mathrm{log}}\mathrm{RNX}\) (Lee et al. 2015)), cross-correlation (CC), Euclidean distance (ED), size increase (SI), and runtime (RT). These metrics were selected based on the ones mentioned in the related work.
Our motivation for selecting these particular metrics is twofold. Neighborhood preservation and cross-correlation describe whether the local neighborhoods are preserved. The other metrics, Euclidean distance and size increase, are more sensitive to changes in the global shape of the scatterplot. We selected these metrics because our main goal is to remove overlap while roughly retaining the positions of the points in the original scatterplot. The local metrics neighborhood preservation and cross-correlation help us compare against DGrid, whose overarching goal was a space-filling layout that maintains neighborhoods. To make the metrics comparable, we scaled the extents of all original and gridified scatterplots to [0, 1].
Neighborhood preservation (NP, or \(\mathrm{AUC}_{\mathrm{log}}\mathrm{RNX}\) (Lee et al. 2015)) enhances the metric k-neighborhood preservation (\(\text {NP}_k\)) (Paulovich and Minghim 2008) by aggregating the measurements over all neighborhood sizes k. The metric \(\text {NP}_k\) is used by nearly all methods mentioned in the related work section (Duarte et al. 2014; Gomez-Nieto et al. 2013a; Hilasaca and Paulovich 2019; Marcílio-Jr et al. 2019). It calculates the average percentage of the k-nearest neighbors of each point that are preserved in the final layout and takes a value between 0 and 1.
We calculate \(\text {NP}_k\) as follows:

$$\begin{aligned} \text {NP}_k = \frac{1}{n \cdot k} \sum _{i=1}^{n} \left| V_i^{(k)}(X) \cap V_i^{(k)}(Y)\right| , \end{aligned}$$

where X is the original set of points, Y is the gridified set of points, \(V_i^{(k)}\) returns the set of the k-nearest neighbors of the ith point of the respective set of data points, and n is the total number of points in the data. These values are then scaled and aggregated over all neighborhood sizes k with a logarithmic weighting, as defined for \(\mathrm{AUC}_{\mathrm{log}}\mathrm{RNX}\) by Lee et al. (2015).
Values between 0 and 1 are possible, where 1 means perfect neighborhood preservation, and 0 means no neighborhood preservation.
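As a reference point, \(\text {NP}_k\) can be computed directly from the two point sets. A brute-force Python/NumPy sketch (quadratic in the number of points; the function name is ours, not from the evaluation code):

```python
import numpy as np

def np_k(X, Y, k):
    """Average fraction of each point's k-nearest neighbors (by
    Euclidean distance) shared between the original layout X and
    the gridified layout Y."""
    def knn(P):
        # full pairwise-distance matrix, self-distance excluded
        D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
        np.fill_diagonal(D, np.inf)
        return np.argsort(D, axis=1)[:, :k]
    nx, ny = knn(np.asarray(X, float)), knn(np.asarray(Y, float))
    shared = sum(len(set(a) & set(b)) for a, b in zip(nx, ny))
    return shared / (k * len(X))
```

A layout that preserves all neighborhoods, e.g., a uniform scaling, scores 1.0.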
Cross-correlation (\(\text {CC}\)) measures the correlation between the pairwise distances in the original layout and those in the new layout. The distance can be interpreted as a measure of dissimilarity between the points; ideally, dissimilar points in the original layout remain as such in the new one. Hilasaca and Paulovich (2019) also use this measure in their evaluation of DGrid. The \(\text {CC}\) measure is defined as:

$$\begin{aligned} \text {CC} = \frac{1}{N \, \sigma _X \sigma _Y} \sum _{i < j} \left( \Vert x_i - x_j\Vert - {\overline{\delta }}_X\right) \left( \Vert y_i - y_j\Vert - {\overline{\delta }}_Y\right) , \end{aligned}$$

where \(x_i\) and \(x_j\) are points belonging to X (the original set of points), \(y_i\) and \(y_j\) are points belonging to Y (the gridified set of points), N is the number of point pairs, \(\sigma _X\) and \(\sigma _Y\) are the respective standard deviations of the pairwise distances, and \({\overline{\delta }}_X\) and \({\overline{\delta }}_Y\) are the respective mean distances between any pair of points.
This measure should be interpreted in the same way as any correlation coefficient. If the value is close to \(-1\), the pairwise distances are negatively correlated, i.e., points that used to be close together are now far away from each other. If the value is around 0, there is no relationship between the pairwise distances of the original and new layouts. Ideally, this measure takes the value of 1: then, points that were close together stay together, and points far away from each other remain far away.
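Interpreted this way, \(\text {CC}\) is the Pearson correlation between the two vectors of pairwise distances, which suggests a direct NumPy sketch (our own illustrative implementation, not the evaluation code):

```python
import numpy as np

def cross_correlation(X, Y):
    """Pearson correlation between the vectors of pairwise distances
    in the original layout X and the gridified layout Y."""
    def pdist(P):
        P = np.asarray(P, float)
        D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
        # keep each unordered pair once (upper triangle, no diagonal)
        return D[np.triu_indices(len(P), k=1)]
    return float(np.corrcoef(pdist(X), pdist(Y))[0, 1])
```

Any similarity transform of the layout (uniform scaling, translation, reflection) leaves all pairwise distances proportional and therefore scores 1.0.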
Euclidean distance (\(\text {ED}\)) is the average distance between the original points and their gridified counterparts:

$$\begin{aligned} \text {ED} = \frac{1}{n} \sum _{i=1}^{n} \Vert x_i - y_i \Vert , \end{aligned}$$

where \(x_i\) and \(y_i\) correspond again to the original and the final position of our data points. We use the Euclidean distance as a measure for the global structure of the visualization. The main objective of methods such as NMap and DGrid is a space-filling visualization rather than the preservation of global structure; therefore, we expect them to perform poorly with respect to the Euclidean distance. Intuitively, for the global structure to change as little as possible, each point should be moved to a position as close as possible to its original one. The higher the average Euclidean distance, the worse the global structure of the scatterplot is preserved.
Size increase (\(\text {SI}\)) is the ratio of the area of the convex hull of the gridified scatterplot (\(C_Y\)) to the area of the convex hull of the initial version (\(C_X\)):

$$\text{SI} = \frac{\text{area}(C_Y)}{\text{area}(C_X)}$$
This measure assesses global structure preservation. Ideally, the resulting grid has a convex hull similar to the initial one. While size increase can take any value in \(]0, \infty [\), 1 is the optimal value. Size increase has also been used in previous evaluations (Duarte et al. 2014; Gomez-Nieto et al. 2013b). Due to the space-filling nature of DGrid, we expect it to score worse on this metric.
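SI only requires the two convex-hull areas. Below is a dependency-free sketch using Andrew's monotone chain and the shoelace formula; the direction of the ratio (gridified over original) is our assumption, chosen so that the name "size increase" matches values above 1:

```python
import numpy as np

def _cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 for a left turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_area(P):
    """Area of the 2-D convex hull (monotone chain + shoelace formula)."""
    pts = sorted(set(map(tuple, np.asarray(P, dtype=float))))
    if len(pts) < 3:
        return 0.0
    def half(points):
        h = []
        for p in points:
            while len(h) >= 2 and _cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    # lower hull + upper hull, each without its duplicated endpoint
    hull = half(pts)[:-1] + half(pts[::-1])[:-1]
    area = 0.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def size_increase(X, Y):
    """SI: convex-hull area of the gridified layout Y over that of X."""
    return hull_area(Y) / hull_area(X)
```

With SciPy available, `scipy.spatial.ConvexHull(P).volume` (the enclosed area for 2-D input) would replace `hull_area`.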
Runtime (\(\text {RT}\)) is the total time a technique needs to compute the gridified version from the original scatterplot. All techniques we evaluate against also report the runtime performance of their methods.
4.3 Evaluation data
For the evaluation, we gathered 60 real and synthetic datasets; 54 of them stem from Aupetit and Sedlmair (2016). We further included six additional image datasets that are publicly available online. From these datasets, we created a total of 339 scatterplots by projecting them with different dimensionality reduction (DR) methods. These methods lead to vastly diverse layouts with different distributions of points. They therefore allow us to assess how well \({{\text {Hagrid}}_\mathrm{{HC}}}\), \({{\text {Hagrid}}_\mathrm{{GC}}}\), and the other techniques perform depending on the number of points in the dataset and on the total number of collisions.
In the supplemental material, we provide a list of the datasets involved in the evaluation, including their names, sizes, and the number of projections computed on each. In terms of DR algorithms, we used PCA (Pearson 1901), robust PCA (Candès et al. 2011), t-SNE (van der Maaten and Hinton 2008), Isomap (Tenenbaum et al. 2000), LLE (Roweis and Saul 2000), UMAP (McInnes et al. 2018), Spectral Embedding (Belkin and Niyogi 2003), Gaussian Random Projection (Bingham and Mannila 2001), and MDS (Kruskal 1964). For parameterizable DR methods, we performed a grid search through a range of values for each parameter involved. The final 339 scatterplots were selected by uniformly sampling across different scatterplot sizes from a total of 1695 projections. For all tests, we normalized the scatterplots so that the dimensions have values between 0 and 1.
4.4 Evaluation setup
We implemented \({{\text {Hagrid}}_\mathrm{{HC}}}\) and \({{\text {Hagrid}}_\mathrm{{GC}}}\), as well as CMDS and DGrid, in Python and JavaScript. For SMWG, we used the available online tool (http://www.gicentre.org/smwg/), as we could not replicate the code based on publicly available information. For the evaluation, we used the Python implementations, because Python supports parallel processing for speeding up the computation of the evaluation metrics. All evaluations were run on an Intel(R) Core(TM) i7-8705G CPU @ 3.10 GHz with 16 GB of RAM.
We categorized the 339 scatterplots into six groups according to the number of points (see Fig. 5). \({{\text {Hagrid}}_\mathrm{{HC}}}\), \({{\text {Hagrid}}_\mathrm{{GC}}}\), and DGrid were computed for all 339 scatterplots. CMDS was only computed for scatterplots with up to 250 points (groups 1 and 2), and SMWG only for scatterplots with up to 100 points (group 1). The reason for this choice is that the computation times of these two techniques became intractable for larger datasets, far beyond the requirements of our use case of rapid or even interactive application. Last, we computed the quality metrics for all results.
For the Hagrid results, we used the level that guarantees at least \(50\%\) whitespace (\(l_{50\%} = \lceil \log _m\left( n \cdot 1.5\right) \rceil\)), to leave space for maintaining global structures. If the vertices of the underlying SFC were completely occupied, the result would be space-filling, as with DGrid. We used the recommended parameters for CMDS (\(\alpha =0.1\) and \(\mathrm {iter}=20\) per round), used the tile size defined by \({{\text {Hagrid}}_\mathrm{{HC}}}\), and set the boundaries to the extent of the scatterplot plus the size of one tile as margin. Furthermore, we fixed CMDS's maximum number of rounds to 50 so that we could measure comparable runtimes. DGrid allows one to set an aspect ratio; we computed square results, as we normalized the scatterplot beforehand to a range of [0, 1] in all dimensions. For SMWG, we used the online tool: we first loaded the dataset, then defined the grid to have 0.5 whitespace and "minimum aspect ratio". For the optimization, we employed the simulated annealing option with default settings, but with the weights set to 1 for "VEC", "DIST", "TOPO", and "SHAPE", and the ones specifically relevant for maps ("DIR" and "DAT") set to 0.
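The level formula can be evaluated directly. Here \(m\) is the subdivision factor per level (4 for the Hilbert curve, 7 for the Gosper curve), and the factor 1.5 encodes the 50% whitespace target; the function name is ours:

```python
import math

def level_for_whitespace(n, m, whitespace=0.5):
    """Smallest SFC level l such that m**l >= n * (1 + whitespace),
    i.e., l = ceil(log_m(n * (1 + whitespace)))."""
    return math.ceil(math.log(n * (1.0 + whitespace), m))
```

For example, 100 points on a Hilbert-curve grid (m = 4) need level 4, since \(4^4 = 256 \ge 150\), while level 3 offers only 64 cells.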
The supplemental material includes further details and the code to reproduce the evaluation results.
4.5 Results
Figure 5 summarizes the results of our comparative evaluation according to the four quality metrics introduced above and the runtime. We investigate the scalability of Hagrid by aggregating the results according to dataset size, and report an additional evaluation measuring only the runtime in Fig. 4. The supplemental material contains a more detailed version of Fig. 5, which shows the results for each dataset separately. As we fixed CMDS's maximum number of rounds to 50, not all overlap was removed from the final layout; the quality metrics for CMDS should therefore be read as slightly optimistic, since the layout after 50 rounds is still closer to the original than a fully converged one would be.
Generally, Hagrid's runtime outperforms every other method we evaluated against; it is about four times as fast as the next best technique, DGrid, across different dataset sizes. Hagrid outperforms the other methods in almost all metrics, except for NP. SMWG and CMDS generally produce good-quality results, but at the cost of much higher runtimes. In group 1 (0–100 points), Hagrid needs roughly a millisecond to gridify the result, whereas SMWG needed 30 seconds on average. Please note that, in Fig. 5, we use a \(\log _{10}\) scale for runtime and a \(\log _2\) scale for the SI metric to improve readability.
To expand on the analysis provided by the boxplots in Fig. 5, we show in Fig. 7 the resulting grids for the COIL100 dataset (Nene et al. 1996), only for Hagrid and DGrid, because the runtime for CMDS was too high and the online tool for SMWG even crashed for this dataset. The dataset consists of 792 images of 10 random objects, for example, a car, a rubber duck, and a toy, photographed from many different angles (see Fig. 6, left). We applied UMAP to this dataset, resulting in the scatterplot layout in Fig. 6. Our aim in selecting this dataset was both to highlight how the different techniques transform such an intricate layout and to provide an example of when preserving the global structure of a dataset can be beneficial.
In Fig. 7, points are colored according to class labels or to per-point quality metrics. The space-filling approach DGrid preserves the local cluster structure. However, when more cells are instantiated than there are points, DGrid places the empty cells on the borders of the grid rather than inside the clusters. This characteristic reduces the preservation of more global patterns.
From a qualitative angle, we see that DGrid is optimized for space-fillingness, resulting in larger thumbnail image representations. If this is the sole objective, it is clearly a good choice. However, our goal was to additionally preserve the spatial structure of the scatterplot. While DGrid also preserves the cluster contents, it loses information about the projection (e.g., potential manifolds), the particularities of the clusters, and the separation between them (Brehmer et al. 2014). The empty cells of DGrid are not optimally placed in this case.
5 Use cases
Our primary goal with Hagrid was to find a good balance between removing overlap and preserving scatterplot structure, while having a fast runtime. To illustrate the value of this combination, we now provide two use cases: adjusting the curve level to create different types of layouts, and realizing an interactive lens view.
5.1 Interactively adjusting level and point size
By altering the level parameter, Hagrid provides a natural handle for the tradeoff between space-fillingness (cell size and space efficiency) and point position accuracy (global structure preservation). With a low level setting, Hagrid can achieve a more compact, square-like layout, similar to DGrid. An example is given in Fig. 8, which shows a layout with the minimum \({{\text {Hagrid}}_\mathrm{{HC}}}\) level (5), one with the level one higher (6), which outputs the grid, and the DGrid layout for reference. In our case, the white space is still scattered around the convex hull of the initial layout. Interactively transitioning between different levels allows one to stepwise translate a scatterplot into a space-filling map with Hagrid.
5.2 Lens view
Due to its computational efficiency, our approach can also be used to support interactive lenses such as the one proposed in Glyphboard (Kammer et al. 2020). In Glyphboard, DR projections are plotted as normal scatterplots, and as the user zooms in, the dots are replaced with circular glyphs. The lens view in their implementation uses a force-directed layout algorithm to handle overlap, which does not guarantee neighborhood preservation. Based on our comparison to CMDS, which closely resembles a force-directed layout algorithm, we believe our technique could fit this type of use case better. In Fig. 9, we show a possible way of using Hagrid as part of a lens view. Here, we used a t-SNE projection of photographs of flowers as the source dataset (Nilsback and Zisserman 2008). For the overall view, we maintain the original scatterplot, including overlaps. In the lens view, however, we use a low-level HC to dynamically align the photos. As our approach is computationally efficient, it lends itself to supporting such interactive features.
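The lens mechanism itself is independent of the concrete gridifying routine: only the points currently under the cursor need to be re-laid out. A hedged sketch, where the `gridify` callback stands in for Hagrid's actual layout function (we assume a signature mapping a (k, 2) array to a (k, 2) array):

```python
import numpy as np

def lens_view(points, center, radius, gridify):
    """Return a copy of `points` in which only the points inside the
    circular lens have been re-positioned by `gridify`; points outside
    the lens keep their original coordinates."""
    points = np.asarray(points, dtype=float)
    inside = np.linalg.norm(points - center, axis=1) <= radius
    out = points.copy()
    if inside.any():
        out[inside] = gridify(points[inside])  # de-overlap the lens only
    return out
```

Because only the lens content is recomputed on each interaction, the cost stays proportional to the number of points under the lens, which is what keeps the interaction fluid.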
6 Conclusions and future work
A limitation of Hagrid and other gridifying techniques occurs in scatterplots with very dense or unevenly dense regions. To maintain the global structure of such areas with Hagrid, high levels of the respective SFC are required, which can lead to very small grid cells. We also observed this behavior in our analysis, where some scatterplots led to poor results, especially for the NP metric (see Fig. 5). Hagrid is sensitive to extreme outliers, which squeeze the non-outlying points into a very small region of the scatterplot. Such situations are suboptimal for filling a static SFC grid, as many collisions occur (see supplemental material for examples). A possible enhancement to reduce this sensitivity to extreme outliers could be the use of adaptive SFCs (Zhou et al. 2020).
Like other comparative studies, our investigation is naturally limited to a certain selection of existing techniques. For future work, it might, for instance, be interesting to compare against further approaches. A recent update of DGrid (Hilasaca and Paulovich 2019) (version 4 on arXiv) adds dummy points to also preserve the global structure of a scatterplot. However, similar to the extension of NMap (Duarte et al. 2014) for producing a treemap with uniform rectangles, adding dummy points increases the runtime for gridifying a scatterplot. With some effort, the overlap removal method of the Dust & Magnet technique (Vollmer and Döllner 2020) could also be disentangled and used as an additional baseline.
An additional limitation is inherited from the SFCs used, Hilbert and Gosper curves, which are not circular. When collisions happen close to the ends of the curves, Algorithm 2 has an imbalanced number of vertices to the left and to the right, which might worsen the quality of the result. In the future, we want to replace the SFCs used with their circular versions. The Moore curve (Ghatak et al. 2013) is, for instance, the circular alternative of the Hilbert curve. We used the Hilbert curve and the Gosper curve because these implementations have better computational efficiency.
In this paper, we proposed Hagrid, a novel approach for generating gridified layouts that preserves the local and global neighborhoods and the cluster structure of a scatterplot while removing the overlap of point representations. We demonstrated the approach by employing two space-filling curves, Hilbert and Gosper, to create square and hexagonal grids. Our comparisons show that Hagrid outperforms existing state-of-the-art techniques in terms of several quality metrics and is roughly four times faster than the fastest of them.
Notes
Hagrid is short for Hilbert And Gosper curve-based GRIDs
References
Auber D, Huet C, Lambert A, Renoust B, Sallaberry A, Saulnier A (2013) GosperMap: using a gosper curve for laying out hierarchical data. IEEE Trans Vis Comput Graph (TVCG) 19(11):1820–1832. https://doi.org/10.1109/TVCG.2013.91
Aupetit M, Sedlmair M (2016) SepMe: 2002 New visual separation measures. In: IEEE Pacific Vis Symp (PacificVis). https://doi.org/10.1109/PACIFICVIS.2016.7465244
Bader M (2012) Space-filling curves: an introduction with applications in scientific computing. Vol. 9. Springer Science & Business Media. https://doi.org/10.1007/9783642310461
Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396. https://doi.org/10.1162/089976603321780317
Bingham E, Mannila H (2001) Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of ACM international conference on knowledge discovery and data mining (SIGKDD). pp 245–250. https://doi.org/10.1145/502512.502546
Borg I, Groenen P (2003) Modern multidimensional scaling: theory and applications. J Educ Meas (JEM) 4(3):277–280. https://doi.org/10.1111/j.17453984.2003.tb01108.x
Brehmer M, Sedlmair M, Ingram S, Munzner T (2014) Visualizing dimensionallyreduced data: Interviews with analysts and a characterization of task sequences. In: Proceedings of beyond time and errors: novel evaluation methods for visualization (BELIV). pp 1–8. https://doi.org/10.1145/2669557.2669559
Buchmüller J, Jäckle D, Cakmak E, Brandes U, Keim DA (2018) MotionRugs: visualizing collective trends in space and time. IEEE Trans Vis Comput Graph (TVCG) 25(1):76–86. https://doi.org/10.1109/TVCG.2018.2865049
Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J ACM 58(3):37. https://doi.org/10.1145/1970392.1970395
Cormen TH, Leiserson CE, Rivest RL (2009) Introduction to algorithms. MIT Press
Crowley E, Zisserman A (2014) The state of the art: object retrieval in paintings using discriminative regions. In: Proceedings of the British machine vision conference. BMVA Press
Cutura R, Morariu C, Cheng Z, Wang Y, Weiskopf D, Sedlmair M (2021) Hagrid: gridify scatterplots with hilbert and gosper curves. In: International symposium on visual information communication and interaction. pp 1–8. https://doi.org/10.1145/3481549.3481569
Deng L (2012) The MNIST database of handwritten digit images for machine learning research. IEEE Sign Process Mag 29(6):141–142. https://doi.org/10.1109/MSP.2012.2211477
Du Q, Faber V, Gunzburger M (1999) Centroidal voronoi tessellations: applications and algorithms. SIAM Rev 41(4):637–676. https://doi.org/10.1137/S0036144599352836
Duarte FSLG, Sikansi F, Fatore FM, Fadel SG, Paulovich FV (2014) Nmap: a novel neighborhood preservation space-filling algorithm. IEEE Trans Vis Comput Graph (TVCG) 20(12):2063–2071. https://doi.org/10.1109/TVCG.2014.2346276
Dwyer T, Marriott K, Stuckey PJ (2006) Fast node overlap removal: correction. In: International symposium on graph drawing. Springer, pp 446–447. https://doi.org/10.1007/9783540709046_44
Eppstein D, van Kreveld M, Speckmann B, Staals F (2015) Improved grid map layout by point set matching. Int J Comp Geom Appl 25(2):101–122. https://doi.org/10.1142/S0218195915500077
Fried O, DiVerdi S, Halber M, Sizikova E, Finkelstein A (2015) IsoMatch: creating informative grid layouts. In: Computer graphics forum, Vol. 34. Wiley Online Library, pp 155–166. https://doi.org/10.1111/cgf.12549
Gansner ER, Hu Y (2008) Efficient node overlap removal using a proximity stress model. In: International Symposium on Graph Drawing, Springer, pp 206–217. https://doi.org/10.1007/9783642002199_20
Gardner M (1976) Mathematical games: in which "monster" curves force redefinition of the word "curve". Sci Am 235(6):124–133
Ghatak R, Pal M, Goswami C, Poddar DR (2013) Moore curve fractalshaped miniaturized complementary spiral resonator. Microw Opt Technol Lett 55(8):1950–1954. https://doi.org/10.1002/mop.27682
Gomez-Nieto E, Casaca W, Nonato LG, Taubin G (2013) Mixed integer optimization for layout arrangement. In: Symposium on graphics, patterns and images (SIBGRAPI). IEEE, pp 115–122. https://doi.org/10.1109/SIBGRAPI.2013.25
Gomez-Nieto E, Roman FS, Pagliosa P, Casaca W, Helou ES, de Oliveira MCF, Nonato LG (2013) Similarity preserving snippet-based visualization of web search results. IEEE Trans Vis Comput Graph (TVCG) 20(3):457–470. https://doi.org/10.1109/TVCG.2013.242
Hilasaca G, Paulovich FV (2019) Distance preserving grid layouts. arXiv preprint arXiv:1903.06262
Hilbert D (1935) Über die stetige Abbildung einer Linie auf ein Flächenstück. In: Dritter band: analysis grundlagen der mathematik physik verschiedenes, Springer, pp 1–2. https://doi.org/10.1007/9783662384527_1
Ježowicz T, Gajdoš P, Ochodková E, Snášel V (2014) A new iterative approach for finding nearest neighbors using space-filling curves for fast graphs visualization. In: International Joint Conference SOCO'14-CISIS'14-ICEUTE'14. Springer, pp 11–20. https://doi.org/10.1007/9783319079950_2
Kammer D, Keck M, Gründer T, Maasch A, Thom T, Kleinsteuber M, Groh R (2020) Glyphboard: visual exploration of highdimensional data combining glyphs with dimensionality reduction. IEEE Trans Vis Comput Graph (TVCG) 26(4):1661–1671. https://doi.org/10.1109/TVCG.2020.2969060
Keim DA, Herrmann A (1998) The Gridfit algorithm: an efficient and effective approach to visualizing large amounts of spatial data. In: Proc. Visualization. pp 181–188. https://doi.org/10.1109/VISUAL.1998.745301
Kruskal JB (1964) Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29(1):1–27. https://doi.org/10.1007/BF02289565
Lee JA, PeluffoOrdóñez DH, Verleysen Michel (2015) Multiscale similarities in stochastic neighbour embedding: reducing dimensionality while preserving both local and global structure. Neurocomputing 169(2015):246–261. https://doi.org/10.1016/j.neucom.2014.12.095
Liu X, Hu Y, North S, Shen HW (2018) CorrelatedMultiples: spatially coherent small multiples with constrained multidimensional scaling. In: Computer graphics forum, Vol. 37. Wiley Online Library, pp 7–18. https://doi.org/10.1111/cgf.12526
MarcílioJr WE, Eler DM, Garcia RE, Pola IRV (2019) Evaluation of approaches proposed to avoid overlap of markers in visualizations based on multidimensional projection techniques. Inf Vis 18(4):426–438. https://doi.org/10.1177/1473871619845093
McInnes L, Healy J, Melville J (2018) UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426
McNeill G, Hale SA (2017) Generating tile maps. In: Computer graphics forum, Vol. 36. Wiley Online Library, pp 435–445. https://doi.org/10.1111/cgf.13200
Meulemans W, Dykes J, Slingsby A, Turkay C, Wood J. Small multiples with gaps. http://www.gicentre.org/smwg/. Accessed 1 Dec 2020
Meulemans W, Dykes J, Slingsby A, Turkay C, Wood J (2016) Small multiples with gaps. IEEE Trans Vis Comput Graph (TVCG) 23(1):381–390. https://doi.org/10.1109/TVCG.2016.2598542
Meulemans W, Sondag M, Speckmann B (2020) A simple pipeline for coherent grid maps. IEEE Trans Vis Comput Graph (TVCG). https://doi.org/10.1109/TVCG.2020.3028953
Muelder C, Ma KL (2008) Rapid graph layout using space filling curves. IEEE Trans Vis Comput Graph (TVCG) 14(6):1301–1308. https://doi.org/10.1109/TVCG.2008.158
Nachmanson L, Nocaj A, Bereg S, Zhang L, Holroyd A (2016) Node overlap removal by growing a tree. In: International symposium on graph drawing and network visualization, Springer, pp 33–43. https://doi.org/10.1007/9783319501062_3
Nene SA, Nayar SK, Murase H et al. (1996) Columbia object image library (COIL-100)
Nilsback ME, Zisserman A (2008) Automated flower classification over a large number of classes. In: Conference on Computer Vision, Graphics and Image Processing. IEEE, pp 722–729. https://doi.org/10.1109/ICVGIP.2008.47
Paulovich FV, Minghim R (2008) HiPP: a novel hierarchical point placement strategy and its application to the exploration of document collections. IEEE Trans Vis Comput Graph (TVCG) 14(6):1229–1236. https://doi.org/10.1109/TVCG.2008.138
Pearson K (1901) LIII. On lines and planes of closest fit to systems of points in space. London Edinb Dublin Philos Mag J Sci 2(11):559–572. https://doi.org/10.1080/14786440109462720
Pinho R, de Oliveira MCF, de Lopes AA (2009) Incremental board: a gridbased space for visualizing dynamic data sets. In: ACM Symposium on Applied Computing (SAC). pp 1757–1764. https://doi.org/10.1145/1529282.1529679
Quadrianto N, Smola AJ, Song L, Tuytelaars T (2010) Kernelized sorting. IEEE Trans Patt Anal Mach Intell 32(10):1809–1821. https://doi.org/10.1109/TPAMI.2009.184
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326. https://doi.org/10.1126/science.290.5500.2323
Schulz C, Nocaj A, ElAssady M, Frey S, Hlawatsch M, Hund M, Karch G, Netzel R, Schätzle C, Butt M, Keim DA, Ertl T, Brandes U, Weiskopf D (2016) Generative data models for validation and evaluation of visualization techniques. In: Proceedings of beyond time and errors: novel evaluation methods for visualization (BELIV). pp 112–124. https://doi.org/10.1145/2993901.2993907
Sedlmair M, Tatu A, Munzner T, Tory M (2012) A taxonomy of visual cluster separation factors. In: Computer graphics forum, Vol. 31. Wiley Online Library, pp 1335–1344. https://doi.org/10.1111/j.14678659.2012.03125.x
Strobelt H, Spicker M, Stoffel A, Keim D, Deussen O (2012) Rolled-out Wordles: a heuristic method for overlap removal of 2D data representatives. In: Computer graphics forum, Vol. 31. Wiley Online Library, pp 1135–1144. https://doi.org/10.1111/j.14678659.2012.03106.x
Strong G, Gong M (2014) Selfsorting map: an efficient algorithm for presenting multimedia data in structured layouts. IEEE Trans Multimed 16(4):1045–1058. https://doi.org/10.1109/TMM.2014.2306183
Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323. https://doi.org/10.1126/science.290.5500.2319
Uher V, Gajdoš P, Snášel V, Lai YC, Radecký M (2019) Hierarchical hexagonal clustering and indexing. Symmetry 11(6):731. https://doi.org/10.3390/sym11060731
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605. http://jmlr.org/papers/v9/vandermaaten08a.html
Vollmer JO, Döllner J (2020) 2.5D Dust and magnet visualization for large multivariate data. In: International symposium on visual information communication and interaction. pp 21–1. https://doi.org/10.1145/3430036.3430045
Ward MO (2002) A taxonomy of glyph placement strategies for multidimensional data visualization. Inf Vis 1(3–4):194–210. https://doi.org/10.1057/PALGRAVE.IVS.9500025
Wattenberg M (2005) A note on space-filling visualizations and space-filling curves. In: Proceedings of the IEEE information visualization symposium. IEEE, pp 181–186. https://doi.org/10.1109/INFVIS.2005.1532145
Wilkinson L, Anand A, Grossman R (2005) Graphtheoretic scagnostics. In: Proceedings of the IEEE information visualization symposium. IEEE, pp 157–164. https://doi.org/10.1109/INFVIS.2005.1532142
Wood J, Dykes J (2008) Spatially ordered treemaps. IEEE Trans Vis Comput Graph (TVCG) 14(6):1348–1355. https://doi.org/10.1109/TVCG.2008.165
Zhou L, Johnson CR, Weiskopf D (2020) Datadriven spacefilling curves. IEEE Trans Vis Comput Graph (TVCG). https://doi.org/10.1109/TVCG.2020.3030473
Acknowledgements
This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Project-ID 251654672, TRR 161.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Cutura, R., Morariu, C., Cheng, Z. et al. Hagrid: using Hilbert and Gosper curves to gridify scatterplots. J Vis 25, 1291–1307 (2022). https://doi.org/10.1007/s12650022008547