Density-based label placement


We introduce a versatile density-based approach to label placement that aims to put labels in uncluttered areas of an underlying 2D visualization. Our novel, image-space algorithm constructs a density map by applying kernel density estimation to the input features, i.e., the locations of the points to be labeled. In order to find a suitable position for a label where it does not overlap any features or other labels, we move it following the gradient descent of this density map. This guides labels toward nearby areas of low feature density, resulting in a layout where labels are spread around feature-dense areas. The gradient descent trajectory can be used to draw curved leaders that connect the point features to their labels. Additionally, our approach supports prioritized label placement, user-defined label-to-label and label-to-feature margins, obstacle-constrained labeling, and arbitrarily shaped labels. The proposed method is conceptually simple and can easily be implemented using OpenCV and image-processing libraries.


Introduction

The automated placement of labels on maps and other visualizations is a long-standing goal of information visualization. Human map makers and designers can solve the labeling problem quite well, but an automated approach is often required for large dynamic data sets or for real-time generation of visualizations. Most optimization problems related to label placement are NP-hard [11, 29], but several heuristics with different objectives and underlying principles have been proposed over the years [37, 39]. In the literature, automated label placement is often split into two main approaches: internal labeling and boundary labeling. In internal labeling, labels are placed directly adjacent or in close proximity to the features (e.g., points or image features). In boundary labeling, all labels are placed around the perimeter of the visualization and connected to their corresponding features with leaders (i.e., lines). One of the main difficulties with internal labeling is that dense areas of the visualization may not contain enough space to place all labels adjacent to their features without overlapping (and thus cluttering and obscuring) other graphical features. A drawback of boundary labeling, in contrast, is that the associations between features and their labels are neither immediately apparent nor easy to follow: one has to trace the leader across the visualization to find the corresponding label. Mixed approaches, which place labels inside the visualization but not necessarily adjacent to the features and connect them with leaders, have received little attention in the literature.

In this paper, we propose a novel approach for solving the labeling problem. Our approach is entirely image-based: Given a feature cloud embedded in the plane, we first convolve the features with two different kernels to construct two density maps: one with a large kernel that encompasses the overall structure of the feature cloud and one with a smaller kernel that results in a finer-grained density map. For each feature to be labeled, we then compute a density gradient descent of the label starting at the position of the feature. During the first phase, we move the label according to the gradient of the density map based on the large kernel. Then, we continue the gradient descent using the finer-grained density map in the second phase. We iteratively move the label until it reaches an area with enough free space to place the label. Our proposed method allows the user to specify label-to-label or label-to-feature margins, which are taken into account when checking whether a label can be placed. The method supports prioritized labeling by adapting the order in which labels are placed. Furthermore, we propose a way to modify our density map that allows for the addition of obstacles of arbitrary shapes and sizes. Our main contributions are:

  • An image-based labeling heuristic that is conceptually simple and supports placing prioritized labels;

  • A method to generate curved leaders along gradient descent trajectories, avoiding crossings between leaders;

  • A technique to define a maximum overlap or minimum margin between labels, as well as between features and labels;

  • A way to route labels away from arbitrarily shaped obstacles in the layout.

Figure 1 shows a labeling result obtained using our approach: Labels are nicely spread around feature-dense areas, with smooth non-crossing leaders between features and labels. Since our method is image-based, it can be used for a wide array of labeling applications. It is particularly suitable when the data contain some dense areas or areas where no labels should be placed, but the visualization generally has enough space for the labels.

Fig. 1

Details of our label placement results for a German train station data set [39] for the area surrounding Dortmund

Related work

Automated label placement

Automated labeling has been an active research area for decades, so a comprehensive review of all related work is beyond the scope of this paper. We focus on highlighting major developments and the works closely related to our approach. We refer the reader to the bibliography by Wolff and Strijk [39] for an extensive list of published heuristics and to the survey by Van Garderen [37] for a detailed literature review. Label placement in 3D settings is discussed in depth by Kouřil et al. [22].

The labeling problem originated in cartography as the problem of placing textual labels on geographical maps, resulting in the internal labeling model. This model has been studied most extensively; proposed approaches include integer linear programs [41], expert systems [12], simulated annealing [6], and combinatorial optimization [38]. Closely related to internal labeling is the overlap removal problem, where labels are not placed adjacent to some feature (e.g., a point) but instead of it. Their initial positions then have to be adjusted to avoid overlap. Proposed heuristics for this problem include force-based methods [29], constrained optimization [7], and proximity stress [14].

Boundary labeling, although commonly used in medical drawings [30], only recently started to receive attention as a computational problem [3]. In this version of the problem, not only the placement of the labels, but also the routing of the leaders has to be considered [3, 4, 21]. Kaufmann [20] pointed out the need for hybrid approaches that allow some internal and some boundary labels or that use leaders for internal labels. So far, however, only a few such methods have been developed. Bekos et al. [2] and Löffler et al. [26] worked on mixed map labeling, where some labels are placed internally and the others along the boundary. The particle-based labeling system by Luboschik et al. [27] places as many labels as possible adjacent to the features, but in a second pass uses internal labeling with leaders for the remaining points. The clutter-aware layout approach by Meng et al. [28] also combines adjacent and distant labels, while filtering out some labels in dense areas to reduce clutter.

Although the placement of labels in high-density areas has been identified as a problem in several of the aforementioned papers on automated labeling, none of the existing methods, to the best of our knowledge, explicitly take density into account to find a suitable label position. The fact that crossing-free curved leaders follow directly from our placement strategy is also unique to our approach.

Density-based approaches

Density-based approaches shift the discrete representation of geometric features into a continuous spectrum of aggregated values. Density visualization has proven useful to represent nodes [35], edges [23], or both nodes and edges [40] of a node-link diagram with density fields. This implicit aggregation of values allows for fast rendering and the development of fast algorithms. In image processing, density maps are generally computed using kernel density estimation (KDE) with Gaussian or Epanechnikov kernels [15, 16, 25].

Several approaches exist to efficiently compute a KDE map. A common approach consists of directly splatting a precomputed kernel at each feature position and then accumulating the results into a floating-point buffer by using additive blending [16]. An alternative to the splatting method consists of using a gathering strategy where the spatial density is computed for each pixel in the image based on nearby point features [36]. Splitting the 2D convolution into two 1D convolutions further accelerates this approach, but it still yields a complexity that creates serious scalability issues for high-resolution image densities. The Fourier transform approach [31] further reduces the complexity of the KDE 2D convolution. Using the properties of the fast Fourier transform (FFT), this approach shifts the spatial convolution into a spectral point-wise signal product [25, 31]. The FFT approach is currently the best method to compute KDE maps for high-resolution images, yielding the lowest complexity of all known approaches [25]. Therefore, we will use this method to compute our density maps.

Overview of the density-based approach

We consider a set P of n points embedded in the plane and a set L of at most n labels, with at most one label for each point in P. In the basic version of our approach, we assume that labels are uniform rectangles of size \(w\times h\). Each label is anchored at its centroid and initially positioned at the position of its corresponding point. Our labeling approach aims to find new positions for the labels, such that they are as close as possible to their point without overlapping any other points or labels. To find such positions, we route the labels toward a local minimum of the spatial density \(\rho \) of the point features. The labeling operation results in a set of placed labels, where each placed label occupies a previously empty position in the neighborhood of its point feature.

Following other image-based approaches [16, 25, 36], we model the local spatial density \(\rho \) using KDE methods [33]: Given the drawing \(D(P) \subset {{\mathbb {R}}^2}\) of the points P in the image space, we can estimate \(\rho \) by convolving all points \(p\in P\) with a decaying radial kernel K of support radius H (see also Sect. 4.2). The density map constructed in this way reflects the point-feature density. A uniform distribution of features will yield a flat map. More interestingly, low values in the map correspond to zones of low feature density, where we would ideally place the labels. Following the gradient descent of the local feature density (i.e., moving against \(\nabla \rho \)), starting from the position of a feature, will guide the label toward areas of low feature density where there is enough space to place it. This process is illustrated in Fig. 2.

Fig. 2

Density-based approach: the labels start at the feature positions (black points) and follow the gradient descent (orange arrows) of the density map, shown in the background

Local and global density map

The chosen bandwidth of the kernel K has a strong impact on the shape of the density map and the location of the nearest local minimum for any point. Figure 3a illustrates two issues that might occur when the bandwidth is chosen poorly.

Fig. 3

Density maps computed with different kernel bandwidths for the same point set. A small bandwidth (a) may result in isolated points with a vanishing gradient (area A1) and local minima too small to hold all labels (area A2), while a large bandwidth (b) results in the loss of local details

A feature that is too far away from any other features (relative to the bandwidth) might get a vanishing gradient, because it is the only contributor to the density in this area. This is the case for the point in the lower left corner of Fig. 3a (area A1). Furthermore, some input configurations might result in a local minimum that does not provide enough space to place all the labels. This can happen, for example, for points on a convex curve, like the group of points in Fig. 3a (area A2). By using a larger kernel, i.e., a larger bandwidth, we can obtain a density map for the same point set without either of these issues, as illustrated in Fig. 3b. However, using a large kernel size may lead to too much loss of local details in the density map.

Our approach makes use of two density maps, to benefit from the advantages of both a small and a large kernel. The global density map (\(\rho _{\text {glob}}\)) is computed with a large kernel bandwidth. We use \(\rho _{\text {glob}}\) for the initial displacement of the labels, to avoid vanishing gradients and to steer the labels away from local density minima that do not provide enough space. The local density map (\(\rho _{\text {loc}}\)) is computed with a smaller kernel bandwidth. We use \(\rho _{\text {loc}}\) for all subsequent iterations of the algorithm. Consequently, our gradient descent method consists of two phases. In the first phase, labels are moved according to the global density distribution of the features in the plane. In the second phase, the labels are routed according to the local density map.

Density gradient descent

All labels are initialized at the positions of their corresponding features. Each label is moved following the density gradient descent in an iterative process. In each iteration, the direction of the density gradient of the current label position is looked up and the label is moved by a small step size \(\delta \) in the opposite direction (see also Sect. 4.3).

The first phase, using the global density map \(\rho _{\text {glob}}\), continues until the label is a small distance d away from its initial position. Then, we switch to the second phase, using the local density map \(\rho _{\text {loc}}\), until termination of the algorithm. To ensure termination of the algorithm, we make use of three stopping conditions for the gradient descent: (1) A suitable position for the label has been found (see Sect. 4.4 for details); (2) a predefined maximum number of iterations has been reached; or (3) the gradient has vanished. If either the second or the third condition is fulfilled, the label cannot be placed and will be left out of the resulting set of placed labels.

Additional parameters

Our labeling approach supports a (user-defined) prioritization of labels by adapting the order in which labels are placed (see also Sect. 4.1). For the first few features that are labeled, there is a relatively high probability of finding an empty space to place the label close to the feature. Labels that are placed later, however, are more likely to be forced further away from their features due to the space already occupied by previously placed labels. By placing high-priority labels first, they are more likely to be placed and more likely to be close to their features as compared to low-priority labels.

Furthermore, our method supports user-defined margins between labels as well as between labels and features. Such margins can be particularly helpful to ensure the features remain clearly visible and the labels are easily distinguishable. Negative margins are also supported, which can be used to allow a small “authorized” overlap between labels in order to fit more labels into the visualization. The effect of different margin settings is illustrated in Fig. 4.

Finally, the placed labels can be connected to their features with leaders (see also Sect. 4.5). In addition to straight-line leaders, our method supports leaders based on the gradient descent trajectories of the labels.

Fig. 4

Labeling without margins (a), with positive label and feature margins (b), and with negative label margins and no feature margins (c). Solid black lines represent actual label sizes, and the margins are shown by dashed red lines

Method details

In this section, we discuss each step of our approach in more detail while providing mathematical background and implementation details where relevant. The implementation of our method is realized in C# on Windows and uses a GPU image-based approach with OpenCV [5] and [13] libraries.

Ordering of labels

First, we order all point features of the input point set P according to their user-defined priorities. If no priorities are provided, we suggest two data-driven orderings as alternatives: label size (either maximum dimension or area) or global density value. Prioritizing the placement of large labels over smaller ones increases the chance of finding empty space for large labels. Ordering labels based on the global density map can either be used to ensure that features in low-density areas will be labeled (by placing low-density labels first, e.g., when outliers are of interest) or to increase the number of features in high-density areas that can be labeled (by placing high-density labels first). The effect of prioritizing labels based on density values is illustrated in Fig. 5. Our implementation includes the option to order the labels according to the global density value of their respective feature: we compute the global density map using Eq. 1 (see Sect. 4.2) and then order the labels according to the density value at the position of their respective feature in the density map.
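The ordering step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the C# implementation mentioned above: the function name `placement_order`, the convention that larger priority values are placed first, and the `density` callable (standing in for a lookup in the global density map) are all hypothetical.

```python
def placement_order(features, priority=None, density=None, dense_first=True):
    """Return feature indices in the order in which their labels are placed.

    features    : list of (x, y) feature positions
    priority    : optional list of user-defined priorities (assumption:
                  larger value means the label is placed earlier)
    density     : optional callable (x, y) -> global density value
    dense_first : place labels of high-density features first if True,
                  low-density features (e.g., outliers) first if False
    """
    indices = list(range(len(features)))
    if priority is not None:
        # User-defined priorities take precedence over data-driven orderings.
        return sorted(indices, key=lambda i: priority[i], reverse=True)
    if density is not None:
        return sorted(indices, key=lambda i: density(*features[i]),
                      reverse=dense_first)
    return indices  # no information: keep the input order
```

Because `sorted` is stable, features with equal priority or equal density keep their input order.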

Gradient estimation

To move the labels away from their features and toward areas with low feature density, we first need to compute the gradient of the density maps (\(\nabla \rho _{\text {glob}}\) and \(\nabla \rho _{\text {loc}}\)) of n features rasterized into an image (or texture) of size \({R_{\text {glob}}}^2\) and \({R_{\text {loc}}}^2\), respectively.

We use an FFT-based approach for our implementation, yielding a lower complexity than other approaches (see Sect. 2.2) as well as the ability to scale up to high-resolution images [25, 31]. This approach transforms the spatial convolution into a spectral point-wise signal product, so we compute the density map \(\rho \) as

$$\begin{aligned} \rho = D(P) * K = {\mathscr {F}}^{-1}[{\mathscr {F}}[D(P)]\cdot {\mathscr {F}}[K]]~, \end{aligned} \tag{1}$$

where D(P) is the drawing of the features in the image space, \({\mathscr {F}}\) and \({\mathscr {F}}^{-1}\) are the Fourier and inverse Fourier transforms, respectively, and K is a decaying kernel. We use a centered Gaussian normal distribution [1] as our kernel (\(K = {\mathscr {N}}(\sigma ,0)\)), but we could also have used an Epanechnikov kernel that optimally approximates \(\rho \) with respect to minimal variance [8, 18]. Our algorithm requires only the gradient of the density map (\(\nabla \rho \)), not the density map itself. Because differentiation commutes with convolution, \(\nabla (D(P) * K) = D(P) * \nabla K\). Instead of computing the gradient by finite differences of the density map, we therefore follow Lhuillier et al. [24, 25] and apply Eq. 1 to the partial derivatives of the kernel:

$$\begin{aligned} \nabla \rho = \big (&{\mathscr {F}}^{-1}[{\mathscr {F}}[D(P)]\cdot {\mathscr {F}}[\nabla _x K]],\\ &{\mathscr {F}}^{-1}[{\mathscr {F}}[D(P)]\cdot {\mathscr {F}}[\nabla _y K]]\big )~, \end{aligned} \tag{2}$$

where \(\nabla _x K\) and \(\nabla _y K\) are the precomputed analytical partial derivatives of K.
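As an illustration of Eqs. 1 and 2, the following NumPy sketch computes a density map and its gradient with FFT-based convolution. It is a minimal example under assumptions, not the GPU implementation: the function names are hypothetical, features are given as integer pixel coordinates, the kernel is normalized to unit sum, and the FFT implies periodic (wrap-around) boundaries, which are only harmless when the kernel is small compared to the image.

```python
import numpy as np

def gaussian_kernel_and_derivatives(size, sigma):
    """Centered Gaussian K and its analytical partial derivatives."""
    ax = np.arange(size) - size // 2          # zero sits at index size // 2
    x, y = np.meshgrid(ax, ax, indexing="xy")
    k = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    k /= k.sum()                              # assumption: unit-sum kernel
    dkx = -x / sigma**2 * k                   # dK/dx
    dky = -y / sigma**2 * k                   # dK/dy
    return k, dkx, dky

def fft_convolve(image, kernel):
    """Circular convolution F^-1[F[image] . F[kernel]] (Eq. 1)."""
    # ifftshift moves the kernel center to index (0, 0) so that the
    # result is not translated.
    shifted = np.fft.ifftshift(kernel)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(shifted)))

def density_and_gradient(points, size, sigma):
    """Return rho (size x size) and grad rho (size x size x 2)."""
    d = np.zeros((size, size))
    for px, py in points:                     # rasterize D(P)
        d[py, px] += 1.0
    k, dkx, dky = gaussian_kernel_and_derivatives(size, sigma)
    rho = fft_convolve(d, k)
    grad = np.stack([fft_convolve(d, dkx), fft_convolve(d, dky)], axis=-1)
    return rho, grad
```

For a single feature, the x-component of the gradient is negative to the right of the feature and positive to its left, so a step against the gradient moves a label away from the feature.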

Fig. 5

Results for labeling from left to right (a), high-density features first (b), and low-density features first (c)

The gradient estimation step computes two rasterized gradient maps \(\nabla \rho _{\text {glob}}\) and \(\nabla \rho _{\text {loc}}\) of resolutions \(R_{\text {glob}}\) and \(R_{\text {loc}}\) following Eq. 2 with the respective kernels \(K_{\text {glob}} = {\mathscr {N}}(\sigma _{\text {glob}}, 0)\) and \(K_{\text {loc}} = {\mathscr {N}}(\sigma _{\text {loc}}, 0)\). This step requires two parameters per map, R and \(\sigma \). In our application, we set the parameters manually based on the overall distribution of point features and such that \(R_{\text {glob}} \le R_{\text {loc}}\) and \(\sigma _{\text {glob}} \ge \sigma _{\text {loc}}\). However, kernel bandwidth could also be estimated with existing methods such as mean integrated squared error or data-driven adaptive selectors [18, 32].

Label advection

After ordering the list of features to be labeled and obtaining the gradient of the two feature-density maps (\(\nabla \rho _{\text {glob}}\) and \(\nabla \rho _{\text {loc}}\)), we advect each label in the direction of the gradient descent with a small step \(\delta \) to yield a new intermediate position of the label. We define our advection operator as the solution of an ordinary first-order differential equation:

$$\begin{aligned} \frac{\mathrm{{d}}l}{\mathrm{{d}}t} =\frac{\delta \cdot \nabla \rho (t)}{\max {(\left\| \nabla \rho (t)\right\| ,\epsilon )}}. \end{aligned} \tag{3}$$

For the first phase of the gradient descent, we use the global gradient of the density map (\(\nabla \rho = \nabla \rho _{\text {glob}}\)). In the second phase, we use the gradient of the local density map (\(\nabla \rho = \nabla \rho _{\text {loc}}\)) to route the label toward a more local density minimum. In Eq. 3, the gradient \(\nabla \rho \) is normalized in a regularized manner – the term \(\epsilon \simeq 10^{-6}\) in the denominator takes care of vanishing gradients. Moreover, the normalization of the gradient limits the displacement of a label, preventing it from “jumping” from one side of a local density minimum to the other. In practice, we set \(\delta \) to a small multiple of the unit distance of the density map, and ensure that labels move by at least one pixel in each iteration.

As outlined in Sect. 3.2, we iteratively advect each label until the label either (1) finds a suitable empty space, (2) is moved more than a predefined number of steps, or (3) converges to a local minimum. Ideally, (1) is achieved when the label reaches a position where there is sufficient free space to place the label (see Sect. 4.4). Since it might happen that no suitable label position can be found, we use two additional termination criteria. For (2), as the resolution of the density map \(R_{\text {loc}}\) is known, we know that the gradient should converge after a number of iterations \(i_{\text {max}}\) smaller than the image resolution \(R_{\text {loc}}^2\) (i.e., after having traveled every possible position in the density map). In practice, we set \(i_{\max }\) to \(R_{\text {loc}}\), meaning we stop once the label traveled the full width of the density map (which is highly unlikely). Stopping condition (3) is numerical convergence of the gradient descent. Once the label reaches a density well or moves beyond the edge of the density map, the density gradient at the current label position vanishes (i.e., \(\left\| \nabla \rho (l_i)\right\| \le \epsilon \)) and the label cannot be moved any further.
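The two-phase descent and its three stopping conditions can be summarized by the following Python sketch. It is an illustration under assumptions rather than the actual implementation: the function name `advect_label` and its parameters are hypothetical, the two gradient maps and the collision test are passed in as callables instead of textures, and the step is taken against the gradient so that labels move toward lower density.

```python
import math

def advect_label(start, grad_glob, grad_loc, is_free,
                 delta=1.0, d_switch=5.0, max_iter=512, eps=1e-6):
    """Move one label by normalized gradient descent.

    grad_glob, grad_loc : callables (x, y) -> (gx, gy), gradients of the
                          global and local density maps
    is_free             : callable (x, y) -> bool, collision test
    d_switch            : distance d from the feature after which the
                          local map replaces the global one
    Returns (position or None, trajectory, status).
    """
    x, y = start
    path = [(x, y)]
    for _ in range(max_iter):
        if is_free(x, y):                      # condition (1): space found
            return (x, y), path, "placed"
        moved = math.hypot(x - start[0], y - start[1])
        grad = grad_glob if moved < d_switch else grad_loc
        gx, gy = grad(x, y)
        norm = math.hypot(gx, gy)
        if norm <= eps:                        # condition (3): gradient vanished
            return None, path, "vanished"
        scale = delta / max(norm, eps)         # regularized normalization
        x -= scale * gx                        # step against the gradient
        y -= scale * gy
        path.append((x, y))
    return None, path, "max_iter"              # condition (2): too many steps
```

The returned trajectory is the list of intermediate positions, which is the same polyline that can later serve as a curved leader.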

Collision mask

After each step of the gradient descent, we use a collision mask (\(C_\mathrm{{m}}\)) to test whether the current label position is empty. As detailed below, this texture is constructed such that, for uniform rectangular labels of size \(w\times h\), we have to look up only one position in this mask to find out if there is sufficient space for the label. In our implementation, \(C_\mathrm{{m}}\) comprises two layers.

The first layer is a density map of the input points, computed using a unit kernel \(K_{{\mathbb {1}}}\) (i.e., consisting only of ones) of the same size as the labels (\(w\times h\)). This layer is computed during the initialization using the FFT approach following Eq. 1. A zero-density position in this layer is a position where a label can be placed without overlapping any feature, whereas a position with nonzero density (the area marked by the dashed outline in Fig. 6a) indicates that a label placed here would overlap a feature. A custom label-to-feature margin m can be incorporated by changing the size of \(K_{{\mathbb {1}}}\) to \((w+m)\times (h+m)\).

The second layer is a collision map of the labels that have already been placed, computed by splatting a unit kernel of size \(2w\times 2h\) centered around the position of the newly placed label. Similar to the first layer, zero-density positions in this layer are positions where a label can be placed without overlapping other labels (see Fig. 6b). Like the label-to-feature margins, user-defined label-to-label margins can be incorporated by adapting the size of the kernel.

In our implementation, the collision mask is a binary texture of size \({R_\mathrm{{c}}}^2\), initialized with the convolution of the input points and updated each time a label has been placed. Empty positions in this texture are suitable label positions. The sampling ratio between the collision mask resolution \({R_\mathrm{{c}}}^2\) and the final image is at most one, but to reduce memory access we can use a coarser grid, with a spacing of up to the maximum label width and height.

Fig. 6

Collision mask for uniform rectangular labels applied to a feature (a) and to a label (b) without margins

Given a set of rectangular labels with (close to) uniform size \(w \times h\), our approach is beneficial in comparison with a naive lookup of available space. Using our approach requires writing \(2w\cdot 2h\) pixels per placed label and only one read per collision test. In comparison, a naive lookup would require only \(w\cdot h\) write operations per placed label, but also \(w\cdot h\) reads for each collision test during the gradient descent. However, our approach might not be suitable when labels have arbitrary shapes and/or sizes. To solve this issue, we propose an extension of the method to allow for arbitrary label shapes in Sect. 5.2.
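The single-lookup collision test for uniform labels can be sketched as follows. This is a simplified NumPy illustration, not the texture-based implementation: the class name `CollisionMask` is hypothetical, both layers are merged into one boolean array, margins are omitted, and label sizes are assumed to be odd pixel counts so that each label is centered on a pixel. Under that assumption, the continuous \(w\times h\) and \(2w\times 2h\) boxes become \(w\times h\) and \((2w-1)\times (2h-1)\) pixels.

```python
import numpy as np

class CollisionMask:
    """Boolean mask of forbidden label centers for uniform w x h labels."""

    def __init__(self, width, height, features, w, h):
        self.mask = np.zeros((height, width), dtype=bool)
        self.w, self.h = w, h
        for fx, fy in features:
            # Layer 1: a label centered within half a label of a feature
            # would cover that feature.
            self._block(fx, fy, w // 2, h // 2)

    def _block(self, cx, cy, half_w, half_h):
        rows, cols = self.mask.shape
        y0 = max(cy - half_h, 0)
        y1 = max(min(cy + half_h + 1, rows), 0)
        x0 = max(cx - half_w, 0)
        x1 = max(min(cx + half_w + 1, cols), 0)
        self.mask[y0:y1, x0:x1] = True

    def is_free(self, cx, cy):
        """One read per collision test."""
        rows, cols = self.mask.shape
        inside = 0 <= cx < cols and 0 <= cy < rows
        return inside and not self.mask[cy, cx]

    def place(self, cx, cy):
        # Layer 2: two labels overlap when their centers are closer than
        # one full label size, hence the doubled box.
        self._block(cx, cy, self.w - 1, self.h - 1)
```

The extra writes in `place` are paid once per placed label, whereas `is_free` is called at every step of the descent, which is where the single read pays off.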

Leader representation

Fig. 7

Labeling with straight-line leaders (a) and with the smoothed gradient descent trajectory as leaders (b)

Once the gradient descent of all labels has been computed, our algorithm outputs the set of placed labels and we can draw the labels in a given visualization. Since our approach allows for distant label placement, leaders can be drawn to connect features to their placed labels. The leaders can be visually represented using standard straight lines as shown in Fig. 7a. More interestingly, we can use the trajectory computed during the gradient descent as the leader. In addition to being easily computed during the label placement, such leaders have the advantage that crossings between leaders are avoided, because ordinary first-order differential equations (Eq. 3) have unique solutions for density fields (i.e., streamlines in a vector field cannot cross). Therefore, using the gradient descent trajectories as leaders ensures that they will not cross each other, as illustrated in Fig. 7b. However, multiple labels might end up following a very similar path (the space discretization may even cause them to converge), making it difficult to distinguish which label belongs to which feature. Possible ways to alleviate this issue are discussed in Sect. 7.

In order to counteract some artifacts of the gradient descent, we apply a light amount of Laplacian smoothing on the gradient descent trajectory. The smoothing process takes care of the small errors introduced by the imprecise estimation of the density map. The imprecision results from errors in the kernel bandwidth selection, the finite resolution of the density map, and the choice of our descent step size. Gradient descent artifacts could be further reduced by refining this step, for example, by using better smoothing algorithms or link curvature minimization approaches.
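A light Laplacian smoothing of a trajectory can be sketched as follows. The function name `laplacian_smooth`, the weight `lam`, and the number of iterations are assumptions chosen for illustration; the endpoints are kept fixed so that the leader still starts at the feature and ends at the label.

```python
def laplacian_smooth(path, lam=0.5, iterations=1):
    """Smooth a polyline by moving each interior point toward the
    midpoint of its two neighbors; the endpoints stay fixed."""
    pts = [tuple(p) for p in path]
    for _ in range(iterations):
        new = list(pts)
        for i in range(1, len(pts) - 1):
            mid_x = 0.5 * (pts[i - 1][0] + pts[i + 1][0])
            mid_y = 0.5 * (pts[i - 1][1] + pts[i + 1][1])
            new[i] = (pts[i][0] + lam * (mid_x - pts[i][0]),
                      pts[i][1] + lam * (mid_y - pts[i][1]))
        pts = new
    return pts
```

A straight trajectory is left unchanged, while a single-pixel zigzag caused by the discretization is flattened.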


Extensions

In this section, we describe two additions to our density-based labeling approach that are easily added on top of our basic method described above: obstacle-constrained labeling and generic-shape labels.

Obstacle-constrained labeling

In some labeling applications, there are areas in the embedding space that should be avoided when placing labels (e.g., icons, glyphs, or other zones of interest). The density-based labeling approach is suitable not only for the basic point-feature labeling problem, but also for labeling with obstacle constraints. Given a set of o obstacles \(\Gamma _{1 \le i \le o} \subset {\mathbb {R}}^2\) embedded in the plane, we want to route labels away from these obstacles. We define our obstacles as arbitrary shapes that may be placed freely. Our shapes are modeled as binary images, with black pixels delimiting the shapes and white pixels outside the shapes. Figure 12b shows an obstacle map based on the image in Fig. 12a.

To force labels to be routed outside \(\Gamma \), we modify our feature-density gradient and collision map to take obstacles into account. Given a binary image of the obstacle (\(D(\Gamma )\)), we first dilate (or erode) the image using standard image-processing methods of OpenCV [5] according to some threshold value \(\tau \), i.e., \(\Gamma _{\text {infl}} = T(\Gamma ,\tau )\). This yields an inflated (or deflated) obstacle image, allowing user-defined margins (or overlap) between the obstacles and the labels. We update the collision mask \(C_\mathrm{{m}}\) such that \(C_\mathrm{{m}} = C_\mathrm{{m}} || \Gamma _{\text {infl}}\), i.e., the two binary textures are merged together with a logical OR. Next, to constrain the gradient descent outside of the obstacles, we modify \(\nabla \rho \) in Eq. 3 such that:

$$\begin{aligned} \frac{\mathrm{{d}}l}{\mathrm{{d}}t} = \frac{\delta \cdot (\nabla \rho (t) + \alpha \cdot \nabla \mathrm{{DT}}(\Gamma _{\text {infl}})(t))}{\max (\left\| \nabla \rho (t) + \alpha \cdot \nabla \mathrm{{DT}}(\Gamma _{\text {infl}})(t)\right\| ,\epsilon )}~, \end{aligned} \tag{4}$$

where \(\mathrm {{DT}}(\Gamma _{\text {infl}}) : \Gamma \rightarrow {\mathbb {R}}^+\) is the distance transform (DT) of the shape’s boundary \(\partial \Gamma \), \(\nabla \mathrm {{DT}}(\Gamma _{\text {infl}})\) is its gradient, and \(\alpha \in {\mathbb {R}}^+\) is a weight. Since the gradient \(\nabla \mathrm {{DT}}(\Gamma _{\text {infl}})\) is a vector that points from each point \(x \in \Gamma \) to the point on \(\partial \Gamma \) closest to x, by using a weighted linear combination of the feature-density gradient and the DT gradient, we force the labels inside the shape \(\Gamma _{\text {infl}}\) to move outside the obstacles. Once the label exits \(\Gamma _{\text {infl}}\), the term \(\nabla \mathrm {{DT}}(\Gamma _{\text {infl}})\) vanishes and Eq. 4 simplifies into Eq. 3.

In our implementation, we compute the distance transform with the built-in OpenCV method [5], but we could have used more efficient methods such as the AFMM method [34] or its CUDA implementation [9]. Using Eq. 2 with a Gaussian kernel and minimal variance (\(\sigma \simeq 0.2\)), we can compute \(\nabla \mathrm {{DT}}(\Gamma _{\text{ infl }})\). Finally, we duplicate and resize \(\nabla \mathrm {{DT}}(\Gamma _{\text{ infl }})\) to match the resolution of \(\nabla \rho _{\text {glob}}\) and \(\nabla \rho _{\text {loc}}\) using OpenCV. All in all, using the obstacle map requires the algorithm to compute the shape distance transform, update the initial obstacle map, and then create two new rasterized gradient maps of the DT gradient of size \(R_{\text {glob}}\) and \(R_{\text {loc}}\). Figures 10 and 12 show two application examples of obstacle-constrained labeling, which will be discussed in more detail in Sect. 6.
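The inflation of an obstacle and its merge into the collision mask, \(C_\mathrm{{m}} = C_\mathrm{{m}} || \Gamma _{\text {infl}}\), can be illustrated with the NumPy sketch below. It is a stand-in for the OpenCV routines used in the implementation, not a reproduction of them: the function names are hypothetical, the structuring element is assumed to be a square of side \(2\tau +1\), and only dilation (\(\tau \ge 0\)) is covered.

```python
import numpy as np

def dilate(binary, tau):
    """Dilate a boolean image by tau pixels (square structuring element)."""
    binary = np.asarray(binary, dtype=bool)
    rows, cols = binary.shape
    padded = np.pad(binary, tau, mode="constant", constant_values=False)
    out = np.zeros_like(binary)
    # OR together all shifted copies within the (2*tau+1)^2 neighborhood.
    for dy in range(2 * tau + 1):
        for dx in range(2 * tau + 1):
            out |= padded[dy:dy + rows, dx:dx + cols]
    return out

def add_obstacle(collision, obstacle, tau):
    """Merge the inflated obstacle into the collision mask (logical OR)."""
    return collision | dilate(obstacle, tau)
```

After the merge, the existing single-read collision test rejects label positions inside the inflated obstacle without any further change to the descent loop.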

Generic-shape labels

So far, we have discussed only the special case of uniform rectangular labels. Our approach can also be extended to arbitrarily shaped labels. Here, we represent the label shape \(\Omega \) as a binary texture of size \(w\times h\) where foreground pixels delimit the label shape and background pixels are outside the shape (as depicted in Fig. 8a, solid line). To handle generic-shape labels, we change the computation of the two layers of our collision mask \(C_\mathrm{{m}}\) as follows. The first layer is still computed during the initialization but with a unit kernel of size \((1+m_x,1+m_y)\), where \(m_x\) and \(m_y\) are the label-to-feature margins. This creates a collision area around the feature that is inflated by the size of the margins. The second layer is now computed by splatting the shape texture onto the binary collision texture, similar to the obstacle-constrained labeling (with a logical OR). We again allow for user-defined label-to-label margins by dilating or eroding the label texture, e.g., \(\Omega _{\text {infl}}=T(\Omega , \tau _\mathrm{{m}})\), where \(\tau _\mathrm{{m}}\) is a threshold value in pixels (see Fig. 8a, dashed line).

Fig. 8

Example of collision test for our generic-shape extension: a initial shape of the label (solid line) and its dilated version (red dashes) taking into account label margin, b collision map centered around the position to be tested, c resulting intersection of the collision map (blue dots) with the extended label shape as a mask (red dashes). Nonzero values (black pixels) in this result indicate overlap between the shape and the collision map

To test whether a specific label can fit at a given position, the algorithm cannot just read a single pixel to determine whether or not there is enough space to put this particular label. Instead, we propose using an image-processing mask-copy operation implemented in OpenCV: For an area of the collision map of size \(w\times h\), centered around the position to test, we copy only the pixels that are true in the binary texture that holds the label shape. This process yields a new binary texture where a position x is true if and only if both the collision map and the label-shape texture are true at that position (as depicted in Fig. 8c). Finally, we only need to check whether or not the binary texture contains a true element. If it does, there is not enough space to fit the label; conversely, a texture without any true element allows us to place the label. Performance decreases as the complexity and size of the label shapes increase. However, this can potentially be alleviated by using axis-aligned bounding boxes for the labels or by reducing the resolution of the label texture as much as possible. Figure 11 shows an application example of labeling with arbitrarily shaped labels, which will be discussed in more detail in Sect. 6.
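The collision test described above can be sketched in a few lines. This is a plain-Python stand-in for the masked copy, not the OpenCV-based implementation; the names `fits` and `splat` are hypothetical, and treating any shape pixel that falls outside the image as a collision is an assumption.

```python
def fits(collision, shape, cx, cy):
    """True if the label shape centred at pixel (cx, cy) touches no occupied pixel."""
    h, w = len(shape), len(shape[0])
    top, left = cy - h // 2, cx - w // 2
    rows, cols = len(collision), len(collision[0])
    for dy in range(h):
        for dx in range(w):
            if not shape[dy][dx]:
                continue  # background pixels of the label never collide
            y, x = top + dy, left + dx
            if not (0 <= y < rows and 0 <= x < cols):
                return False  # assumption: labels may not leave the image
            if collision[y][x]:
                return False  # shape and collision map are both true here
    return True

def splat(collision, shape, cx, cy):
    """Mark the pixels covered by a placed label (logical OR into the map)."""
    h, w = len(shape), len(shape[0])
    top, left = cy - h // 2, cx - w // 2
    for dy in range(h):
        for dx in range(w):
            y, x = top + dy, left + dx
            if shape[dy][dx] and 0 <= y < len(collision) and 0 <= x < len(collision[0]):
                collision[y][x] = 1
```

With a single occupied pixel at (2, 2), a plus-shaped \(3\times 3\) label fits at (3, 3) because its empty corner covers the occupied pixel, whereas a full \(3\times 3\) square at the same position does not. This is the effect that allows generic shapes to be packed more tightly than their bounding boxes.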

Application examples

In the following, we illustrate the versatility of our approach with three examples: the labeling of the German railway data set (Sect. 6.1), the automated placement of generic label shapes (Sect. 6.2), and the automated placement of uniform labels using an obstacle map (Sect. 6.3). These use cases outline the main capabilities and extensions of our labeling technique.

German railway

The German railway data set, consisting of 366 point features (train stations in Germany) with corresponding labels (the names of the train stations), is a commonly used benchmark for testing map labeling algorithms [27, 28, 39]. Following the approach described by Meng et al. [28], we computed the labeling using an image resolution of \(2850\times 3200\) pixels and the label size was determined using Latin letters in “Lucida Sans Unicode” font size 13.

Fig. 9

Details of our layout results for the German railway data set, for the areas surrounding Frankfurt (a) and Berlin (b)

Figures 1 and 9 show the results obtained by our approach for some of the denser regions in the data set: the metropolitan areas surrounding Dortmund (Fig. 1), Frankfurt (Fig. 9a), and Berlin (Fig. 9b). All three figures are details of the layout for the complete data set. In this test, our technique managed to place all 366 labels without any leader crossings, using a highest-density-first prioritization and the smoothed gradient descent trajectories as leaders. As can be seen most clearly in Fig. 9a, the use of a local density map ensures that labels are nicely spread around the dense areas of the map (Mainz and Frankfurt), while labels further away from this area can still move toward it (e.g., Bensheim and Heppenheim at the bottom of the figure).
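A density-based prioritization such as the highest-density-first ordering used here can be sketched as a sort of the features by the density value at their pixel position. The function name `placement_order`, the mode strings, and the seeded random ordering are illustrative assumptions, not part of the described implementation.

```python
import random

def placement_order(features, density, mode="high_first", seed=0):
    """Return feature indices in the order in which their labels are placed.

    features: list of (x, y) pixel coordinates
    density:  2D grid indexed [y][x], e.g. a rasterized density map
    mode:     "high_first", "low_first", or "random"
    """
    order = list(range(len(features)))
    if mode == "random":
        random.Random(seed).shuffle(order)
        return order
    if mode not in ("high_first", "low_first"):
        raise ValueError(f"unknown mode: {mode}")

    def density_at(i):
        x, y = features[i]
        return density[y][x]

    return sorted(order, key=density_at, reverse=(mode == "high_first"))
```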

Our results can be compared to the works by Meng et al. [28, Fig. 1] and Luboschik et al. [27, Fig. 9a], where the same regions of the German railway data set are shown. Compared to both these approaches, our labeling method manages to place the labels in an orderly fashion around the dense urban areas (Dortmund and Berlin agglomerations in Figs. 1 and 9b, respectively).

Fig. 10

Layout results for the German railway data set with a circular obstacle: comparison of our approach (b) to that of Luboschik et al. [27] (a)

Fig. 11

Labeling of the Berlin tourist shops data set [39] using generic-shape labels, where each shape corresponds to a category of tourist shops. The full data set contains 357 point features; here we show a close-up of one of the denser areas. We can see how labels are placed in a tight manner depending on their respective shape (see Sect. 5.2)

Fig. 12

Automated placement of labels using an obstacle map on a grass plant illustration: a manually labeled example (figure redrawn from [19]), b the extracted obstacle map, and c the labeling result using our density-based technique

Following the ideas of excentric neighborhood labeling by Fekete and Plaisant [10] and the labeling lens proposed by Luboschik et al. [27], Fig. 10b shows the result of our labeling approach with a round obstacle placed over the Frankfurt area. For comparison, the result by Luboschik et al. [27] for the same data with a lens similar to our obstacle is shown in Fig. 10a. Whereas both methods still manage to place all the labels, our approach produces a more visually pleasing result with smooth leaders and no leader crossings. It should be noted that in Fig. 10a, the additional constraint that labels should not overlap the state borders was taken into account [27]. The same effect could also be obtained with our method by adding the state borders to the obstacle map.

Glyph labeling using generic shapes

As detailed in Sect. 5.2, our approach can also be extended to labeling using arbitrarily shaped labels. We illustrate this using a data set consisting of tourist shops in Berlin [28, 39]. The original data set contains the names and coordinates of 357 shops. We assigned each of the shops to one of seven categories describing the type of shop, which we want to display using different shapes (e.g., square, triangle, circle). We processed the full data set at an image size of \(2000\times 2000\) pixels with a binary texture of \(20\times 20\) pixels for each shape, and we could place 97% of the labels. Figure 11 shows a part of the resulting layout, with a close-up of one of the denser areas. The close-up clearly shows that the labels were placed based on their actual shapes rather than a rectangular bounding box, allowing them to be packed more closely together (e.g., the cross and the upward triangle at the top).

Schematics labeling using obstacles

Another application where the possibility to avoid certain areas can be particularly useful is the labeling of schematic illustrations and diagrams. Typically, the labels should be close to the features, but without overlapping important areas of the underlying drawing. Our approach is well-suited for this application, as demonstrated in Fig. 12.

Figure 12 showcases the automated labeling of an illustration of a grass plant. Figure 12a shows a manually labeled illustration. Figure 12b displays the obstacle map that was extracted from the original drawing. For this example, we created the obstacle map manually, but image analysis methods could be used to automatically create an obstacle map from any image. The label layout generated by our approach is shown in Fig. 12c. We used uniform labels and smoothed gradient descent trajectories as leaders. In our example, all labels are placed and routed toward low-density areas outside of the obstacle. Compared to the original, we see that our technique forces the labels of spatially close features to be more spread out (e.g., compare the labels around the top left bubble between Fig. 12a, c).

Number of placed labels

One criterion commonly used to evaluate labeling algorithms is the percentage of labels that can be placed. Although our main objective is to place labels nicely, rather than placing the maximum number of labels, our approach still manages to place a reasonably large percentage of the labels. To provide a baseline for comparison with other methods, we generated test data with 500 to 1500 points distributed uniformly at random. Note that this is the worst case for our algorithm, because uniformly distributed points will result in a mostly flat density map. Following Christensen et al. [6], we tested these data with an image size of \(792\times 612\) and a label size of \(30\times 7\). Table 1 shows the percentage of labels placed for different orderings of the labels. The results indicate that the number of placed labels is slightly improved if we use a density-based prioritization. For up to 1000 points, where placing all labels would correspond to a screen space coverage of 43% excluding features and leaders, we can still place over 85% of the labels.
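The coverage figure quoted above follows directly from the label and image sizes. The short sketch below reproduces the arithmetic; the function name `label_coverage` is hypothetical, and the calculation assumes non-overlapping labels that lie fully inside the image.

```python
def label_coverage(n_labels, label_size, image_size):
    """Fraction of the image area covered if all labels were placed without overlap."""
    label_w, label_h = label_size
    image_w, image_h = image_size
    return n_labels * label_w * label_h / (image_w * image_h)

# 1000 labels of 30 x 7 pixels on a 792 x 612 image:
# 1000 * 210 / 484704, i.e. roughly 0.43
coverage_1000 = label_coverage(1000, (30, 7), (792, 612))
```

By the same calculation, 500 labels would cover about 22% of the image and 1500 labels about 65%.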

Table 1 Percentage of labels placed for uniformly random point sets of n points when labels are placed in a random order (random) or prioritized based on density (low first and high first)

For data sets of 1500 points, the runtime of the gradient descent step stays below 150 ms. Furthermore, our implementation using FFTW yields a runtime below 120 ms for computing a density map with a resolution of 400\(\times \)400 pixels (which could be improved using CUDA [31]).


While the application examples above show that our approach can easily be adapted and applied in various scenarios, some limitations of our method also become apparent in the results.

A drawback inherent to the density-based approach is that the gradient descent trajectories, although they cannot cross, may converge. This may lead to multiple labels aligned in a row and overlapping leaders (due to rendering and the discretization of the density map), making it hard to recognize which label belongs to which point. This problem is visible, for example, in Fig. 9b for the three points in the lower left corner, and in Fig. 12c for the two labels to the right of the lower bubble. This issue is more likely to occur as labels have to move further away from their features. One way to alleviate this problem would be to update the local density map each time a label has been placed. This would ensure that after a label is placed, its position will not be a density well in the new density map. As a result, the following labels would not move along the same trajectory. However, this would be computationally more expensive and the leader trajectories would no longer be guaranteed to be crossing-free.

Another issue relating directly to the use of density maps is that our approach is sensitive to the parameter settings for the local and global density maps, especially when there are very dense and very sparse areas. In our current implementation, these parameters have to be set manually. Extending our approach with existing methods to automatically determine the best kernel size for a given data set [18, 32] would improve its usability. Alternatively, our approach could be extended into a multilevel approach using more than two different kernel sizes. This could be coupled with simulated annealing, so local density minima without sufficient space for labels could be avoided by switching to a map with a larger kernel. Another possible approach would be to use adaptive map-resolution refinement. Similar to multilevel density maps, this would allow for more precise routing of labels toward positions with enough space.

Some aspects of the labeling problem are currently not taken into account by our approach. While overlaps between labels and overlaps between labels and features are avoided, labels can still overlap the leaders of other labels, making it difficult to trace them in densely labeled areas. Furthermore, we only consider the labeling of point features. With some adaptations, our approach could also be extended to line-feature and area labeling. Other possible extensions of the current approach that we have not yet addressed include support for breaking up textual labels if a label with a different aspect ratio would be easier to place [17], rotating labels to see if they can be placed in a different orientation, and keeping labels for a group of features close to each other.


We introduced and demonstrated a novel image-based approach for labeling point features. Our approach uses density maps obtained through KDE of the distribution of the point features. Labels are initialized at the positions of their corresponding points and moved following the density gradient descent until they can be placed. Point features can be connected to their labels by drawing the (smoothed) gradient descent trajectories as leaders. This results in aesthetically pleasing curved leaders without any crossings between leaders.

The fact that our method is entirely image-based makes it versatile and widely applicable, which we illustrated with three use cases. By defining parts of the image as obstacles, we can create a layout where the labels do not overlap these areas, which is particularly useful when labeling illustrations or diagrams. While our approach works fastest with uniform rectangular labels, we also support the use of arbitrarily shaped labels. Compared to the use of a uniform bounding box for all labels, generic-shape labels can be placed more tightly together, increasing the total number of labels that can be placed. The curved leaders provide an additional benefit when labeling abstract schematics, because they can be clearly distinguished from straight lines already present in such diagrams.

Our approach focuses on creating aesthetically pleasing, readable results by making use of the low-density areas in a visualization, rather than on placing the maximum number of labels. Nevertheless, we manage to place a reasonably high number of labels even for relatively dense and uniformly distributed point sets.

Multiple directions for future improvement in our technique have become apparent. Label-leader overdraw could be avoided by incorporating leaders in the density map. By modifying the kernels, different styles of labeling could be obtained, such as horizontal or vertical layouts. Finally, our two-level kernel density estimation could be extended to a multilevel approach. This would allow for more precise routing of the labels depending on the local point-feature distribution and the level of detail of the visualization.


1. Abramowitz, M., Stegun, I.A., et al.: Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables, vol. 55. Dover Publications, New York (1972)
2. Bekos, M.A., Kaufmann, M., Papadopoulos, D., Symvonis, A.: Combining traditional map labeling with boundary labeling. In: SOFSEM 2011: Theory and Practice of Computer Science, pp. 111–122. Springer, Berlin (2011)
3. Bekos, M.A., Kaufmann, M., Symvonis, A., Wolff, A.: Boundary labeling: models and efficient algorithms for rectangular maps. Comput. Geom. 36(3), 215–236 (2007)
4. Benkert, M., Haverkort, H.J., Kroll, M., Nöllenburg, M.: Algorithms for multi-criteria boundary labeling. J. Graph Algorithms Appl. 13(3), 289–317 (2009)
5. Bradski, G., Kaehler, A.: Learning OpenCV: Computer Vision with the OpenCV Library. O’Reilly Media, Newton (2008)
6. Christensen, J., Marks, J., Shieber, S.: An empirical study of algorithms for point-feature label placement. ACM Trans. Graph. 14(3), 203–232 (1995)
7. Dwyer, T., Marriott, K., Stuckey, P.J.: Fast node overlap removal. In: Proceedings of 13th International Symposium on Graph Drawing (GD’05), pp. 153–164. Springer, Berlin (2006)
8. Epanechnikov, V.A.: Non-parametric estimation of a multivariate probability density. Theory Prob. Appl. 14(1), 153–158 (1969)
9. Ersoy, O., Hurter, C., Paulovich, F., Cantareiro, G., Telea, A.: Skeleton-based edge bundling for graph visualization. IEEE Trans. Vis. Comput. Graph. 17(12), 2364–2373 (2011)
10. Fekete, J.D., Plaisant, C.: Excentric labeling: Dynamic neighbourhood labeling for data visualization. In: Proceedings of SIGCHI Conference on Human Factors in Computing Systems (CHI ’99), pp. 512–519 (1999)
11. Formann, M., Wagner, F.: A packing problem with applications to lettering of maps. In: Proceedings of 7th Annual Symposium on Computational Geometry (SoCG’91), pp. 281–288 (1991)
12. Freeman, H.: An expert system for the automatic placement of names on a geographic map. Inf. Sci. 45(3), 367–378 (1988)
13. Frigo, M., Johnson, S.G.: FFTW: An adaptive software architecture for the FFT. In: Proceedings of 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1381–1384 (1998)
14. Gansner, E.R., Hu, Y.: Efficient node overlap removal using a proximity stress model. In: Tollis, I.G., Patrignani, M. (eds.) Proceedings of 16th International Symposium on Graph Drawing (GD’08), pp. 206–217. Springer, Berlin (2009)
15. Gonzalez, R.C.: Digital Image Processing. Prentice Hall, Englewood Cliffs (2016)
16. Hurter, C., Ersoy, O., Telea, A.: Graph bundling by kernel density estimation. Comput. Graph. Forum 31, 865–874 (2012)
17. Iturriaga, C., Lubiw, A.: Elastic labels around the perimeter of a map. J. Algorithms 47(1), 14–39 (2003)
18. Jones, M.C., Marron, J.S., Sheather, S.J.: A brief survey of bandwidth selection for density estimation. J. Am. Stat. Assoc. 91(433), 401–407 (1996)
19. Kallenbach, R.L., Bishop-Hurley, G.J.: Dairy grazing: growth of pasture plants, vol. M182. University of Missouri Extensions, Columbia (2012)
20. Kaufmann, M.: On map labeling with leaders. In: Albers, S., Alt, H., Näher, S. (eds.) Efficient Algorithms: Essays Dedicated to Kurt Mehlhorn on the Occasion of His 60th Birthday, pp. 290–304. Springer, Berlin (2009)
21. Kindermann, P., Niedermann, B., Rutter, I., Schaefer, M., Schulz, A., Wolff, A.: Multi-sided boundary labeling. Algorithmica 76(1), 225–258 (2016)
22. Kouřil, D., Čmolík, L., Kozlíková, B., Wu, H., Johnson, G., Goodsell, D.S., Olson, A., Gröller, M.E., Viola, I.: Labels on levels: labeling of multi-scale multi-instance and crowded 3D biological environments. IEEE Trans. Vis. Comput. Graph. 25(1), 977–986 (2019)
23. Lampe, O.D., Hauser, H.: Interactive visualization of streaming data with kernel density estimation. In: Proceedings of 2011 IEEE Pacific Visualization Symposium (PacificVis), pp. 171–178 (2011)
24. Lhuillier, A.: Bundling: a clutter reduction technique and its application to Alzheimer study. Ph.D. thesis, Université Paul Sabatier-Toulouse III (2017)
25. Lhuillier, A., Hurter, C., Telea, A.: FFTEB: edge bundling of huge graphs by the fast Fourier transform. In: Proceedings of 2017 IEEE Pacific Visualization Symposium (PacificVis), pp. 190–199 (2017)
26. Löffler, M., Nöllenburg, M., Staals, F.: Mixed map labeling. In: Paschos, V.T., Widmayer, P. (eds.) Algorithms and Complexity, pp. 339–351 (2015)
27. Luboschik, M., Schumann, H., Cords, H.: Particle-based labeling: fast point-feature labeling without obscuring other visual features. IEEE Trans. Vis. Comput. Graph. 14(6), 1237–1244 (2008)
28. Meng, Y., Zhang, H., Liu, M., Liu, S.: Clutter-aware label layout. In: Proceedings of 2015 IEEE Pacific Visualization Symposium (PacificVis), pp. 207–214 (2015)
29. Misue, K., Eades, P., Lai, W., Sugiyama, K.: Layout adjustment and the mental map. J. Vis. Lang. Comput. 6(2), 183–210 (1995)
30. Oeltze-Jafra, S., Preim, B.: Survey of labeling techniques in medical visualizations. In: Proceedings of 4th Eurographics Workshop Visual Computing Biology Medicine (VCBM’14), pp. 199–208 (2014)
31. Podlozhnyuk, V.: FFT-Based 2D Convolution. NVIDIA white paper (2007)
32. Sheather, S.J., Jones, M.C.: A reliable data-based bandwidth selection method for kernel density estimation. J. R. Stat. Soc. Ser. B (Methodol.) 53(3), 683–690 (1991)
33. Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, London (1986)
34. Telea, A., van Wijk, J.J.: An augmented fast marching method for computing skeletons and centerlines. In: Proceedings of 2002 Symposium on Data Visualisation, pp. 251–258 (2002)
35. van Liere, R., de Leeuw, W.: GraphSplatting: visualizing graphs as continuous fields. IEEE Trans. Vis. Comput. Graph. 9(2), 206–212 (2003)
36. van der Zwan, M., Codreanu, V., Telea, A.: CUBu: universal real-time bundling for large graphs. IEEE Trans. Vis. Comput. Graph. 22(12), 2550–2563 (2016)
37. van Garderen, M.: Automated map labelling. In: Pictures of the past—visualization and visual analysis in archaeological context, chap. 6, pp. 155–174. Doctoral dissertation, University of Konstanz (2018)
38. Wagner, F., Wolff, A.: A combinatorial framework for map labeling. In: Proceedings of 6th International Symposium on Graph Drawing (GD’98), pp. 316–331 (1998)
39. Wolff, A., Strijk, T.: The map-labeling bibliography. (2009)
40. Zinsmaier, M., Brandes, U., Deussen, O., Strobelt, H.: Interactive level-of-detail rendering of large graphs. IEEE Trans. Vis. Comput. Graph. 18(12), 2486–2495 (2012)
41. Zoraster, S.: Integer programming applied to the map label placement problem. Cartograph. Int. J. Geograp. Inf. Geovis. 23(3), 16–27 (1986)


MvG was funded by the European Research Council (ERC) under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC Grant agreement No. 319209 (project NEXUS 1492).

Author information



Corresponding author

Correspondence to Mereke van Garderen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Lhuillier, A., van Garderen, M. & Weiskopf, D. Density-based label placement. Vis Comput 35, 1041–1052 (2019).


  • Automated label placement
  • Image-based information visualization
  • Kernel density estimation