
## 1 Introduction

There are two main variants of force-directed layout methods, expressed either in terms of forces to balance or an energy function to minimize [3, 25]. For convenience, we refer to the former as spring embedders and to the latter as multidimensional scaling (MDS) methods.

Force-directed layout methods are in widespread use and of high practical significance, but their scalability is a recurring issue. Besides investigations into adaptation, robustness, and flexibility, much research has therefore been devoted to speed-up methods [20]. These efforts address, e.g., the speed of convergence [10, 11] or the time per iteration [1, 17]. Generally speaking, the most scalable methods are based on multi-level techniques [13, 18, 21, 35].

Experiments [5] suggest that minimization of the stress function [27]

\begin{aligned} s(x) = \sum _{i<j}w_{ij}(||x_i-x_j|| - d_{ij})^2 \end{aligned}
(1)

is the primary candidate for high-quality force-directed layouts $$x\in (\mathbb {R}^{2})^V$$ of a simple undirected graph $$G=(V,E)$$ with $$V=\{1,\ldots ,n\}$$ and $$m=|E|$$. The target distances $$d_{ij}$$ are usually chosen to be the graph-theoretic distances, the weights set to $$w_{ij} = 1/d_{ij}^2$$, and the dominant method for minimization is majorization [16]. Several variant methods reduce the cost of evaluating the stress function by involving only a subset of node pairs over the course of the algorithm [6, 7, 13]. If long distances are represented well already, for instance because of initialization with a fast companion algorithm, it has been suggested that one restrict further attention to short-range influences from k-neighborhoods only [5].
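As a minimal sketch of how Eq. (1) can be evaluated, the following Python function computes the stress of a two-dimensional layout with the usual weights $$w_{ij} = 1/d_{ij}^2$$. The function name `stress` and the dense distance matrix `dist` are illustrative assumptions, not part of the method described here; the double loop makes the quadratic cost per evaluation explicit.

```python
import math


def stress(pos, dist):
    """Full stress of Eq. (1) with weights w_ij = 1 / d_ij^2.

    pos  : list of (x, y) coordinates, one entry per node
    dist : dist[i][j] is the target distance d_ij (assumed > 0 for i != j)
    """
    total = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair i < j exactly once
            d = dist[i][j]
            w = 1.0 / (d * d)
            total += w * (math.dist(pos[i], pos[j]) - d) ** 2
    return total
```

A layout that realizes all target distances exactly, such as a path drawn on a straight line with unit spacing, has stress zero.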

We here propose to stabilize the sparse stress function restricted to 1-neighborhoods [5] with aggregated long-range influences inspired by the use of Barnes & Hut approximation [1] in spring embedders [33]. Extensive experiments suggest how to determine representatives for individually weak influences, and that the resulting method represents a favorable compromise between efficiency and quality.

Related work is discussed in more detail in the next section. Our approach is derived in Sect. 3, and evaluated in Sect. 4. We conclude in Sect. 5.

## 2 Related Work

While we are interested in approximating the full stress model of Eq. (1), there are other approaches capable of dealing with given target distances such as the strain model [4, 24, 32] or the Laplacian [19, 26].

An early attempt to make the full stress model scale to large graphs is GRIP [13]. Via a greedy maximal independent node set filtration, this multi-level approach constructs a hierarchy of increasingly coarse graphs. While a sparse stress model calculates the layout of the coarsened levels, the finest level is drawn by a localized spring embedder [11]. Given the coarsening hierarchy for graphs of bounded degree, GRIP requires $$\mathcal {O}(nk^2)$$ time and $$\mathcal {O}(nk)$$ space with $$k = \log \max \{d_{ij}: i,j \in V\}$$.

Another notable attempt has been made by Gansner et al. [15]. Like the spring embedder, the maxent-model is split into two terms:

\begin{aligned} \sum _{\{i,j\} \in E} w_{ij}(||x_i-x_j|| - d_{ij})^2 - \alpha \sum _{\{i,j\} \not \in E} \log ||x_i-x_j|| \end{aligned}

The first part is the 1-stress model [4, 13], while the second term tries to maximize the entropy. Applying the Barnes & Hut approximation technique [1], the running time of the maxent-model can be reduced from $$\mathcal {O}(n^2)$$ per iteration to $$\mathcal {O}(m+n\log n)$$, e.g., using quad-trees [30, 34]. In order to make the maxent-model even more scalable, Meyerhenke et al. [28] embed it into a multi-level framework, where the coarsening hierarchy is constructed using an adapted size-constrained label propagation algorithm.

Gansner et al. [14], inspired by the idea of decomposing the stress model into two parts, proposed COAST. The main difference between COAST and maxent is that COAST adds a square to the two terms in the 1-stress part and that its second term is quadratic instead of logarithmic. Transforming the energy system of COAST allows one to apply fast convex optimization techniques, making its running time comparable to that of the maxent model.

While all these approaches somewhat steer away from the stress model, MARS [23] tries to approximate the solution of the full stress model. Building on a result of Drineas et al. [9], MARS requires only $$k \ll n$$ instead of n single-source shortest path computations. Reconstructing the distance matrix from two smaller matrices and by setting $$w_{ij} = 1/d_{ij}$$, MARS runs in $$\mathcal {O}(kn+n\log n+ m)$$ per iteration with a preprocessing time in $$\mathcal {O}(k^3 + k(m+n \log n + k^2n))$$, and a space requirement in $$\mathcal {O}(nk)$$.

## 3 Sparse Stress Model

The full stress model, Eq. (1), is in our opinion the best choice to draw general graphs, not least because of its very natural definition. However, its $$\mathcal {O}(n^2)$$ running time per iteration and space requirement, together with an expensive preprocessing time of $$\mathcal {O}(n(m+n\log n))$$, hamper its adoption in practice.

The reason sparse stress models are still in early stages of development is that the adaptation to large graphs requires a reduction not just in the running time per iteration, but also in the preprocessing time and its associated space requirement. Where these problems originate is best explained by rewriting Eq. (1) in the following form:

\begin{aligned} s(x) = \sum _{\{i,j\} \in E} w_{ij}(||x_i-x_j|| - d_{ij})^2 + \sum _{\{i,j\} \in {V \atopwithdelims ()2} \setminus E} w_{ij}(||x_i-x_j|| - d_{ij})^2 \end{aligned}
(2)

As minimizing the first term only requires $$\mathcal {O}(m)$$ computations and all $$d_{ij}$$ are part of the input, solving this part of the stress model can be done efficiently. Yet, the second term requires an all-pairs shortest path (APSP) computation, $$\mathcal {O}(n^2)$$ time per iteration, and, in order to stay within this bound, $$\mathcal {O}(n^2)$$ additional space. We note that the 1-stress approaches of Gajer et al. [13] and Brandes and Pich [4] presented in Sect. 2 ignore the second term, while Gansner et al. [14, 15] replace it. Discounting the problems arising from the APSP computation, we can see that the spring embedder suffered from exactly the same problem, namely the computation of the second term, there called repulsive forces. Barnes & Hut introduced a simple, yet ingenious and efficient solution, namely to approximate the second term by using only a subset of its addends.

To approximate the repulsive forces operating on node i, Barnes & Hut partition the graph. Associated with each of these $$\mathcal {O}(\log n)$$ partitions is an artificial representative, a so-called super-node, used to approximate the repulsive forces of the nodes in its partition affecting i. However, as these super-nodes only have positions in Euclidean space, but no graph-theoretic distance to any node in the graph, they cannot be processed in the stress model. Furthermore, deriving a distance for a super-node as a function of the graph-theoretic distances of the nodes it represents would require an APSP computation, which is too costly, and, since the partitioning is computed in the layout space, would probably not yield a good approximation. Choosing a node from the partition as a super-node would not solve the problems either, not least because the partitioning changes over time.

Therefore, adapting this approach cannot be done in a straightforward manner. However, the model we are proposing sticks to its main ideas. In order to reduce the complexity of the second term in Eq. (2), we restrict the stress computation of each $$i \in V$$ to a subset $$\mathcal {P}\subseteq V$$ of $$k = |\mathcal {P}|$$ representatives, from now on called pivots. The resulting sparse stress model, where N(i) are the neighbors of i and $$w_{ip}'$$ are adapted weights, has the following form:

\begin{aligned} s'(x) = \sum _{\{i,j\} \in E} w_{ij}(||x_i-x_j|| - d_{ij})^2 + \sum _{i \in V}\sum _{p \in \mathcal {P}\setminus N(i)} w'_{ip}(||x_i-x_p|| - d_{ip})^2 \end{aligned}
(3)

Note that GLINT [22] uses a similar function, yet the pivots change in each iteration, no weights are involved, and it is assumed that $$d_{ip}$$ is accessible in constant time.
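To make the structure of Eq. (3) concrete, the sketch below evaluates it for a given layout. All names (`sparse_stress`, `edges`, `dist_to_pivot`, `weight`) are hypothetical, the adapted weights $$w'_{ip}$$ are assumed to be supplied by the caller, and the term for a pivot paired with itself is skipped because $$d_{pp} = 0$$; this last choice is an assumption of the sketch rather than something stated in the text.

```python
import math


def sparse_stress(pos, edges, pivots, neighbors, dist_to_pivot, weight):
    """Evaluate the sparse stress of Eq. (3).

    pos           : list of (x, y) coordinates
    edges         : dict {(i, j): d_ij}, one entry per undirected edge
    pivots        : list of pivot node ids
    neighbors     : neighbors[i] is the set N(i)
    dist_to_pivot : dist_to_pivot[p][i] = d_ip
    weight        : weight[p][i] = w'_ip (adapted weights, given as input)
    """
    total = 0.0
    # First term: all edges, with w_ij = 1 / d_ij^2.
    for (i, j), d in edges.items():
        total += (math.dist(pos[i], pos[j]) - d) ** 2 / (d * d)
    # Second term: every node against every pivot that is not a neighbor.
    for i in range(len(pos)):
        for p in pivots:
            if p == i or p in neighbors[i]:
                continue  # assumption: a pivot contributes nothing to itself
            d = dist_to_pivot[p][i]
            total += weight[p][i] * (math.dist(pos[i], pos[p]) - d) ** 2
    return total
```

The two loops touch $$m$$ edges and at most $$kn$$ node-pivot pairs, so one evaluation costs $$\mathcal {O}(kn+m)$$ operations.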

Just like Barnes & Hut, we associate with each pivot $$p \in \mathcal {P}$$ a set of nodes $$\mathcal {R}(p) \subseteq V$$, where $$p \in \mathcal {R}(p), \bigcup _{p \in \mathcal {P}} \mathcal {R}(p) = V$$, and $$\mathcal {R}(p) \cap \mathcal {R}(p') = \emptyset$$ for distinct $$p,p' \in \mathcal {P}$$. However, we propose to use only one global partitioning of the graph that does not change over time. Still, just like the super-nodes, we want the pivots to be representative of their associated region. In terms of the localized stress minimization algorithm [16] this means that for each $$i \in V$$ and $$p \in \mathcal {P}$$ we want

\begin{aligned} \frac{\sum _{j \in \mathcal {R}(p)} w_{ij} (x_j^{\alpha } + d_{ij}(x_i^{\alpha } - x_j^{\alpha }) / ||x_i-x_j||)}{\sum _{j \in \mathcal {R}(p)}w_{ij}} \approx x_p^{\alpha } + \frac{d_{ip}(x_i^{\alpha } - x_p^{\alpha })}{||x_i-x_p||}, \end{aligned}

where $$\alpha$$ is the dimension. As the left part is the weighted average of all positional votes of $$j \in \mathcal {R}(p)$$ for the new position of i, we require p to fulfill the following requirements in order to be a good representative:

• The graph-theoretic distances to i from all $$j \in \mathcal {R}(p)$$ should be similar to $$d_{ip}$$

• The positions of $$j \in \mathcal {R}(p)$$ in $$x$$ should be well distributed in close proximity around p.

We propose to construct the partitioning induced by $$\mathcal {R}$$ based only on the graph structure, not on the layout space, and to associate each node $$v \in V$$ with the region $$\mathcal {R}(p)$$ of its closest pivot with respect to graph-theoretic distance. As our algorithm incrementally constructs $$\mathcal {R}$$, ties are broken by favoring the currently smallest partition. Provided that $$\mathcal {P}$$ has been chosen properly, and since all nodes in $$\mathcal {R}(p)$$ are at least as close to p as to any other pivot in the graph, and consequently also in the stress drawing, it is appropriate to assume that both conditions are met.
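The partitioning described above can be sketched as follows for a connected, unweighted graph. This is an illustration under stated assumptions, not the incremental multi-source procedure itself: it runs one breadth-first search per pivot and then assigns nodes in order of increasing distance to their closest pivot, which is one plausible way to realize "favoring the currently smallest partition". The names `bfs_distances` and `build_regions` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted, connected graph."""
    dist = [None] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_regions(adj, pivots):
    """Assign every node to the region R(p) of its closest pivot.

    Returns (regions, dist) with regions[p] = list of nodes in R(p)
    and dist[p][v] = d_vp. Ties go to the currently smallest region.
    """
    dist = {p: bfs_distances(adj, p) for p in pivots}
    regions = {p: [p] for p in pivots}  # each pivot belongs to its own region
    others = [v for v in range(len(adj)) if v not in regions]
    # Assumption of this sketch: process nodes by distance to nearest pivot.
    others.sort(key=lambda v: min(dist[p][v] for p in pivots))
    for v in others:
        best = min(dist[p][v] for p in pivots)
        closest = [p for p in pivots if dist[p][v] == best]
        owner = min(closest, key=lambda p: len(regions[p]))
        regions[owner].append(v)
    return regions, dist
```

For weighted graphs the breadth-first search would have to be replaced by Dijkstra's algorithm; the assignment step stays the same.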

Even if the positional vote of each pivot is optimal w.r.t. $$\mathcal {R}(p)$$, it is still not enough to approximate the full stress model. In the full stress model the iterative algorithm to minimize the stress moves one node at a time while fixing the rest. By setting node i’s position in dimension $$\alpha$$ to

\begin{aligned} x_i^{\alpha } = \frac{\sum _{j \not = i} w_{ij} (x_j^{\alpha } + d_{ij}(x_i^{\alpha } - x_j^{\alpha }) / ||x_i-x_j||)}{\sum _{j \not = i}w_{ij}}, \end{aligned}

it can be shown that the stress monotonically decreases [16]. However, in our model we move node i according to

\begin{aligned} x_i^{\alpha } = \frac{\sum _{j \in N(i)} w_{ij} \left( x_j^{\alpha } + \frac{d_{ij}(x_i^{\alpha } - x_j^{\alpha })}{||x_i-x_j||} \right) + \sum _{p \in \mathcal {P}\setminus N(i)} w_{ip}' \left( x_p^{\alpha } + \frac{d_{ip}(x_i^{\alpha } - x_p^{\alpha })}{||x_i-x_p||}\right) }{\sum _{j \in N(i)}w_{ij}+ \sum _{p \in \mathcal {P}\setminus N(i)}w_{ip}'}. \end{aligned}
(4)

This implies that in order to find the globally optimal position of i we furthermore have to find weights $$w_{ip}'$$, such that $$\frac{w'_{ip}}{\sum _{j\in N(i)} w_{ij} + \sum _{q \in \mathcal {P}\setminus N(i)} w'_{iq}} \approx \frac{\sum _{j\in \mathcal {R}(p)} w_{ij}}{\sum _{j \not = i}w_{ij}}$$. Since our goal is only to reconstruct the proportions, and our model only knows the shortest-path distance between all nodes $$i \in V$$ and $$p \in \mathcal {P}$$, we set $$w_{ip}' = s/d_{ip}^2$$ where $$s \ge 1$$. At first glance, setting $$s = |\mathcal {R}(p)|$$ seems appropriate, since p represents $$|\mathcal {R}(p)|$$ addends of the stress model. Nevertheless, this strongly overestimates the weight of close partitions. Therefore, we propose to set $$s = |\{j \in \mathcal {R}(p) : d_{jp} \le d_{ip} /2\}|$$. This follows the idea that p is only a good representative for the nodes in $$\mathcal {R}(p)$$ that are at least as close to p as to i. Since the graph-theoretic distance between i and $$j \in \mathcal {R}(p)$$ is unknown, our best guess is that j lies on the shortest path from p to i. Consequently, if $$d_{jp}\le d_{ip} / 2$$ node j must be at least as close to p as to i. Note that $$w_{pp'}'$$ does not necessarily equal $$w_{p'p}'$$ for $$p,p' \in \mathcal {P}$$, and if $$k = n$$ our model reduces to the full stress model.
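The weight rule $$w_{ip}' = s/d_{ip}^2$$ with $$s = |\{j \in \mathcal {R}(p) : d_{jp} \le d_{ip}/2\}|$$ can be sketched as below. The function name and the input layout (`regions[p]` as a list of nodes, `dist[p][i]` as $$d_{ip}$$) are assumptions made for illustration. Sorting the distances within a region once lets each count be answered by binary search.

```python
from bisect import bisect_right


def adapted_weights(regions, dist):
    """Return weight[p][i] = s / d_ip^2, s = |{j in R(p) : d_jp <= d_ip / 2}|.

    regions : dict {p: list of nodes in R(p)}, with p itself included
    dist    : dict {p: list of distances d_ip for every node i}
    """
    weight = {}
    for p, members in regions.items():
        d_p = dist[p]
        # Distances from p to the members of its own region, ascending.
        radii = sorted(d_p[j] for j in members)
        weight[p] = [0.0] * len(d_p)
        for i, d_ip in enumerate(d_p):
            if i == p:
                continue  # no weight for the pivot paired with itself
            # Number of region members within half the distance to i;
            # at least 1, because p itself is at distance 0.
            s = bisect_right(radii, d_ip / 2)
            weight[p][i] = s / (d_ip * d_ip)
    return weight
```

Because the pivot always counts towards $$s$$, the condition $$s \ge 1$$ holds automatically, and nodes close to a pivot see only a small share of its region.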

Asymptotic Running Time: To minimize Eq. (3) in each iteration we displace all nodes $$i \in V$$ according to Eq. (4). Since this requires $$|N(i)| + k$$ constant time operations, given that all graph-theoretic distances are known, the total time per iteration is in $$\mathcal {O}(kn+m)$$. Furthermore, only the distances between all $$i\in V$$ and $$p\in \mathcal {P}$$ have to be known, which can be done in $$\mathcal {O}(k(m + n\log n))$$ time and requires $$\mathcal {O}(kn)$$ additional space. If the graph-theoretic distances for all $$p \in \mathcal {P}$$ are computed with a multi-source shortest path algorithm (MSSP), it is possible to construct $$\mathcal {R}$$ as well as calculate all $$w_{ip}'$$ during its execution without increasing its asymptotic running time. The full algorithm to minimize our sparse stress model is presented in Algorithm 1.
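A single displacement according to Eq. (4) might look as follows in two dimensions. This is a sketch, not Algorithm 1: the names are hypothetical, edge weights are taken as $$1/d_{ij}^2$$, and the handling of coincident positions (dropping the directional part of the vote) is an assumption added to avoid a division by zero.

```python
import math


def update_node(i, pos, adj_dist, pivots, dist, weight):
    """Return the new position of node i according to Eq. (4).

    pos      : list of (x, y) coordinates
    adj_dist : adj_dist[i] is a dict {j: d_ij} over the neighbors of i
    pivots   : list of pivot node ids
    dist     : dist[p][i] = d_ip
    weight   : weight[p][i] = w'_ip
    """
    num = [0.0, 0.0]
    den = 0.0

    def vote(j, d, w):
        nonlocal den
        r = math.dist(pos[i], pos[j])
        for a in range(2):
            # Assumption: coincident points cast a purely positional vote.
            shift = d * (pos[i][a] - pos[j][a]) / r if r > 0 else 0.0
            num[a] += w * (pos[j][a] + shift)
        den += w

    for j, d in adj_dist[i].items():  # |N(i)| votes from neighbors
        vote(j, d, 1.0 / (d * d))
    for p in pivots:  # at most k votes from non-adjacent pivots
        if p != i and p not in adj_dist[i]:
            vote(p, dist[p][i], weight[p][i])
    return (num[0] / den, num[1] / den)
```

Each call performs $$|N(i)| + k$$ constant-time votes, which matches the per-iteration bound of $$\mathcal {O}(kn+m)$$ when applied to all nodes.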

## 4 Experimental Evaluation

We report on two sets of experiments. The first is concerned with the evaluation of the impact of different pivot sampling strategies. The second set is designed to assess how well the different sparse stress models approximate the full stress model, in both absolute terms and in relation to the speed-up achieved.

For the experiments we implemented the sparse stress model, Algorithm 1, as well as different sampling techniques in Java using Oracle SDK 1.8 and the yFiles 2.9 graph library (www.yworks.com). The tests were carried out on a single 64-bit machine with a 3.60 GHz quad-core Intel Core i7-4790 CPU, 32 GB RAM, running Ubuntu 14.10. Times were measured using System.currentTimeMillis(). The reported running times were averaged over 25 iterations. We note here that all drawing algorithms, unless stated otherwise, were initialized with a 200-pivot PivotMDS layout [4]. Furthermore, the maximum number of iterations for the full stress algorithm was set to 500. As stress is not invariant under scaling, see Eq. (1), we optimally rescaled each drawing such that it attains the lowest possible stress value [2].
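One way to obtain such a scaling factor is in closed form: minimizing $$\sum _{i<j} w_{ij}(\alpha ||x_i-x_j|| - d_{ij})^2$$ over $$\alpha$$ gives $$\alpha = \sum w_{ij} d_{ij} ||x_i-x_j|| / \sum w_{ij} ||x_i-x_j||^2$$. The sketch below applies this with $$w_{ij} = 1/d_{ij}^2$$; whether this matches the exact procedure of [2] is not stated here, and the function name is hypothetical.

```python
import math


def optimal_scale(pos, dist):
    """Scaling factor alpha that minimizes the stress of alpha * pos.

    Uses w_ij = 1 / d_ij^2, so that w*d*r = r/d and w*r^2 = (r/d)^2.
    Assumes at least two nodes are drawn at distinct positions.
    """
    num = 0.0
    den = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            ratio = math.dist(pos[i], pos[j]) / dist[i][j]
            num += ratio
            den += ratio * ratio
    return num / den
```

For a drawing that is a perfect layout stretched by a factor of two, the function returns 0.5, which shrinks it back to zero stress.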

Data: We conducted our experiments on a series of different graphs, see Table 1, most of them taken from the sparse matrix collection [8]. We selected these graphs as they differ in their structure and size, and are large enough to compare the results of different techniques. Two of the graphs, LeHavre and commanche, have predefined edge lengths that were derived from the node coordinates. We did not modify the graphs in any way, except for those that were disconnected. In this case we only kept the largest component.

### 4.1 Sampling Evaluation

In Sect. 3 we discussed how vital the proper selection of the pivots is for our model. In the optimal case we would sample pivots that are well distributed over the graph, creating regions of equal complexity, and are central in the drawing of their regions. In order to evaluate the impact of different sampling strategies on the quality of our sparse stress model and recommend a proper sampling scheme, we compared a set of different strategies:

• random: nodes are selected uniformly at random

• MIS filtration: nodes are sampled according to the maximal independent set filtration algorithm by Gajer et al. [13]. Once $$n \le k$$ the coarsening stops. If $$n < k$$, unsampled nodes from the previous level are randomly added

• max/min euclidean: starting with a uniformly randomly chosen node, $$\mathcal {P}$$ is extended by adding $${{\mathrm{arg\,max}}}_{i \in V \setminus \mathcal {P}} \min _{p \in \mathcal {P}}||x_i-x_p||$$

• max/min sp: similar to max/min euclidean except that $$\mathcal {P}$$ is extended according to $${{\mathrm{arg\,max}}}_{i \in V \setminus \mathcal {P}} \min _{p \in \mathcal {P}}d_{ip}$$ [4]
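The max/min sp rule from the list above can be sketched for an unweighted, connected graph as follows. The optional `start` argument is a hypothetical addition that makes the first, normally random, choice reproducible; ties in the arg max are resolved in favor of the smallest node id, which is an assumption of this sketch.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted, connected graph."""
    dist = [None] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def maxmin_sp_pivots(adj, k, start=None):
    """Pick k pivots (k <= n), each maximizing the distance to its closest predecessor."""
    first = random.randrange(len(adj)) if start is None else start
    pivots = [first]
    closest = bfs_distances(adj, first)  # closest[v] = min over pivots of d_vp
    while len(pivots) < k:
        # Pivots have closest == 0, so they are never selected again.
        nxt = max(range(len(adj)), key=lambda v: closest[v])
        pivots.append(nxt)
        d = bfs_distances(adj, nxt)
        closest = [min(a, b) for a, b in zip(closest, d)]
    return pivots
```

On a path, the procedure picks the two end nodes right after the start node, which illustrates the tendency to sample leaves first.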

Pretests showed that the max/min sp strategy initially favors sampling leaves, but nevertheless produces good results for large k. Thus, we also evaluated strategies that build on this idea, yet try to overcome the problem of leaf node sampling.

• max/min random sp: similar to max/min sp, yet each node i is sampled with a probability proportional to $$\min _{p \in \mathcal {P}}d_{ip}$$

• k-means layout: the nodes are selected via a k-means algorithm, running at most 50 iterations, on the initial layout

• k-means sp: initially k nodes are sampled with max/min sp, followed by k-means sampling using the shortest-path entries of these pivots

• k-means + max/min sp: $$\mathcal {P}$$ is initialized with k / 2 pivots via k-means layout and the remaining nodes are sampled via max/min sp

To quantify how well suited each of the sampling techniques is for our model, we ran each combination on each graph with $$k \in \{50,51,\dots ,200\}$$ pivots. For all tests the sparse stress algorithm terminated after 200 iterations. Since all techniques at some point rely on a random decision, we repeated each execution 20 times in order to ensure we do not rest our results upon outliers. To distinguish the applicability of the different techniques to our model, we used two measures. The first measure is the normalized stress, which is the stress value divided by $${n \atopwithdelims ()2}$$. While the normalized stress measures the quality of our drawing, we also calculated the Procrustes statistic, which measures how well the layout matches the full stress drawing [31]. The range of the Procrustes statistic is [0, 1], where 0 is the optimal match.

The results of these experiments for some of the instances are presented in Figs. 1 and 2 (see the Appendix in [29] for the full set of data). In these plots each dot represents the median and each line starts at the 25th or 75th percentile and ends at the 5th or 95th percentile, respectively. For the sake of readability we binned each 25 consecutive sample sizes. Furthermore, the strategies were ordered according to their overall ranking w.r.t. the evaluated measure. For most of the graphs, k-means sp sampling yields the layouts with the lowest normalized stress value. There are only two graphs where this strategy performs worse than other tested strategies. The one graph where k-means sp is outclassed, yet only for large k by max/min sp, is pesa. The reason for this result is that k-means sp mainly samples pivots in the center of the left arm, see Table 4, creating twists. Max/min sp for small k in contrast mostly samples nodes on the contour of the arm, yet once k reaches a certain threshold the resulting distribution of the pivots prevents twists, yielding a lower normalized stress value.

The explanation of the poor behavior for lpship04l is strongly related to its structure. The low diameter of 13 causes, after a few iterations, the max/min sp strategy to repeatedly sample nodes that are part of the same cluster, see Table 4, and consequently are structurally very similar. As k-means sp builds on max/min sp, it can only slightly improve the pivot distribution. The argument that the problem is related to the structure is reinforced by the outcome of the random strategy. Still, except for these two graphs k-means sp generates the best outcomes, and since this strategy is also strongly favorable over the others with respect to the Procrustes statistic, see Fig. 2, our following evaluation always relies on this sampling strategy. However, we note that the Procrustes statistics for btree and lpship04l are orders of magnitude larger than for any other tested graph. While for lpship04l this is mostly caused by the quality of the drawings, this is only partly true for btree. The other factor contributing to the high Procrustes statistic for btree is the restricted set of operations provided by the Procrustes analysis. As only dilation, translation, and rotation are used to find the best match between two layouts, the Procrustes analysis cannot resolve reflections. Therefore, if in one layout of btree the subtree $$T_1$$ of v is drawn to the right of subtree $$T_2$$ of v, and vice versa in the second drawing, the statistic will be high although the two layouts are otherwise identical. This symmetry problem mainly explains the low performance w.r.t. btree.

### 4.2 Full Stress Layout Approximation

The next set of experiments is designed to assess how well our sparse stress model using k-means sp sampling, as well as related sparse stress techniques, resembles the full stress model. For this we compared the median stress layout over 25 repetitions on the same graph of our sparse stress model with $$k \in \{50,100,200\}$$, with MARS, maxent, PivotMDS, 1-stress, and the weighted version of GRIP. The number of iterations of our model as well as of MARS and 1-stress has been limited to 200. Furthermore, we tested MARS with 100 and 200 pivots and report the layout with the smallest stress from the drawings obtained by running mars with argument -p $$\in \{1,2\}$$ combined with a PivotMDS or randomly initialized layout.

Besides comparing the resulting stress values and Procrustes statistics, we compared the distribution of pairwise Euclidean distances subject to their graph-theoretic distances. Since the Procrustes statistic has problems with symmetries, as we pointed out in the previous subsection, we propose to evaluate the similarity of the sparse stress layouts with the full stress layout via Gabriel graphs [12]. The Gabriel graph of a given layout $$x$$ contains an edge between a pair of points if and only if the disc whose diameter is the segment between the two points does not contain any other point. Since the treatment of identical positions is not defined for Gabriel graphs, we resolve this by adding edges between each pair of identical positions. We assess the similarity between the Gabriel graph of the full stress layout and those of the sparse stress layouts by comparing the k-neighborhoods of a node in the graphs using the Jaccard coefficient.
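The Gabriel graph comparison can be illustrated with a brute-force sketch. It tests every pair of points against all others, so it runs in cubic time and is meant only to show the definition, not to be efficient. Points on the boundary of the disc are treated as not contained, which is an assumption of this sketch, and all function names are hypothetical.

```python
def gabriel_edges(pos):
    """Gabriel graph of a 2D point list, as a set of pairs (i, j) with i < j."""
    n = len(pos)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            cx = (pos[i][0] + pos[j][0]) / 2
            cy = (pos[i][1] + pos[j][1]) / 2
            r2 = ((pos[i][0] - pos[j][0]) ** 2 + (pos[i][1] - pos[j][1]) ** 2) / 4
            # Edge iff no third point lies strictly inside the diameter disc.
            # Identical positions give r2 == 0 and are therefore connected.
            if all((pos[k][0] - cx) ** 2 + (pos[k][1] - cy) ** 2 >= r2
                   for k in range(n) if k != i and k != j):
                edges.add((i, j))
    return edges


def k_neighborhood(edges, v, k):
    """Nodes within at most k hops of v (excluding v) in the given edge set."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = {v}
    frontier = {v}
    for _ in range(k):
        frontier = {w for u in frontier for w in adj.get(u, ())} - seen
        seen |= frontier
    return seen - {v}


def jaccard(a, b):
    """Jaccard coefficient of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Comparing `k_neighborhood` sets of the same node in the two Gabriel graphs with `jaccard` yields a value in [0, 1], where 1 means the neighborhoods coincide.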

A further measure we introduce evaluates the visual error. More precisely, we measure for a given node v the percentage of nodes that lie in the drawing area of the k-neighborhood, but are not part of it. We calculate this value by computing the convex hull induced by the k-neighborhood and then testing for each other node whether it lies within the hull. This number is then divided by $$n - |\{w \in V | d_{vw} \le k\}|$$. Therefore, a low value implies that there are only a few nodes lying in the region, while high values indicate that we cannot distinguish non k-neighborhood and k-neighborhood nodes in the drawing. This measure is to a certain extent similar to the precision of neighborhood preservation [15].
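A possible realization of this visual error for a single node v is sketched below, using Andrew's monotone chain algorithm for the convex hull. Several details are assumptions of the sketch rather than statements from the text: v itself is counted as part of its k-neighborhood, points on the hull boundary count as inside, and a degenerate hull (fewer than three non-collinear points) is treated as containing no other node.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull, q):
    """True if q lies in or on a counter-clockwise convex polygon."""
    if len(hull) < 3:
        return False  # assumption: degenerate hulls contain no other node
    m = len(hull)
    return all(cross(hull[a], hull[(a + 1) % m], q) >= 0 for a in range(m))


def visual_error(pos, hop_dist_v, k):
    """Fraction of nodes outside the k-neighborhood of v drawn inside its hull.

    hop_dist_v[w] is the graph-theoretic distance d_vw from v to w.
    """
    members = [w for w, d in enumerate(hop_dist_v) if d <= k]
    outsiders = [w for w, d in enumerate(hop_dist_v) if d > k]
    if not outsiders:
        return 0.0
    hull = convex_hull([tuple(pos[w]) for w in members])
    intruders = sum(1 for w in outsiders if inside(hull, pos[w]))
    return intruders / len(outsiders)
```

The denominator `len(outsiders)` equals $$n - |\{w \in V | d_{vw} \le k\}|$$, so the returned value is the share of foreign nodes that visually intrude into the neighborhood of v.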

The results of all these experiments, see Tables 2 and 4, Figs. 3 and 4, and the Appendix in [29], reveal that our model resembles the full stress drawing more closely than any of the other tested algorithms, while showing comparable running times that scale nicely with k, cf. Table 3. The error plots in Table 4 expose the strength of our approximation scheme. We can see that, while all approaches work very well in representing short distances, our approach is more precise in approximating middle and especially long distances, explaining our good results. As the evaluation clearly shows that our approach yields better approximations of the full stress model, we rather want to discuss the low performance of our model for lpship04l and thereby expose one weakness of our approach.

Looking at the sparse 50 drawing of lpship04l in Table 4, we can see that a large portion of nodes share a similar or even the same position. This is because lpship04l has a lot of nodes that share very similar graph-theoretic distance vectors, exhibit highly overlapping neighborhoods, and are drawn in close proximity in the initial PivotMDS layout. While our model would rely on small variations of the graph-theoretic distances to create a good drawing, we diminish these differences even further by restricting our model to $$\mathcal {P}$$. Consequently, the positional votes for two similar non-pivot nodes i and j that lie in the same partition will differ only slightly, mainly because of their distinct neighbors. However, as these neighbors are also in close proximity in the initial drawing of lpship04l, the distance between i and j will not increase. Therefore, if the graph has a lot of structurally very similar nodes and the initial layout has poor quality, our approach will inevitably create drawings where nodes are placed very close to one another.

## 5 Conclusion

In this paper we proposed a sparse stress model that requires $$\mathcal {O}(kn+m)$$ space and time per iteration, and a preprocessing time of $$\mathcal {O}(k(m+n\log n))$$. While Barnes & Hut derive their representatives from a given partitioning, we argued that for our model it is more appropriate to first select the pivots and then to partition the graph only relying on its structure. Since the approximation quality heavily depends on the proper selection of these pivots, we evaluated different sampling techniques, showing that k-means sp works very well in practice.

Furthermore, we compared a variety of sparse stress models w.r.t. their performance in approximating the full stress model. To this end, we proposed two new measures to assess the similarity between two layouts of the same graph. For the tested graphs, all our experiments clearly showed that our proposed sparse stress model exceeds related approaches in approximating the full stress layout without compromising the computation time.