1 Introduction

Several of the earliest papers on Graph Drawing (for example, [21–23]) discussed requirements for a “good” visualization of a graph. For example, Tamassia et al. [22] state:

Aesthetics : We use the term aesthetics to denote the criteria that concern certain aspects of readability. A well-admitted aesthetics, valid independently from the graphic standard, is the minimisation of crossings between edges. Also, in order to avoid unnecessary waste of space, it is usual to keep the area occupied by the drawing reasonably small. When the grid standard is adopted, it is meaningful to minimize the number of bends (turns) along the edges, as well as their total length.

We prefer the term quality metric rather than “aesthetics”. The underlying and often unstated assumption that these geometric properties of layout measure the “goodness” of a graph drawing was unchallenged until the experiments of Purchase [20]. These experiments showed that task performance is correlated with some of the previously defined quality metrics. A conclusive result was that human task times and error rates were both correlated with the number of edge crossings. Subsequent experiments have confirmed and refined these initial results [11, 17–19, 25]. All these early experiments used relatively small graphs as stimuli; human experiments with larger graphs began recently [13, 14]. In particular, it has been pointed out that edges and vertices become “blobs” in large graph drawings such as the biological network in Fig. 1; almost all the edge crossings are hidden in the blobs. In such drawings, any causal relationship between readability and the number of edge crossings seems unlikely. In this paper we propose a quality metric for large drawings such as Fig. 1.

Fig. 1. Crossings can be hidden in a drawing of a large graph.

Although it is seldom explicitly stated as a quality metric for graph drawing, stress is often used as such. There are various measures of stress (for example, see [5, 6, 8, 10]); the most commonly used is to define the stress in a drawing D of a connected graph \(G = (V, E)\) as \(\sum _{u,v \in V} w_{uv} \left( d_G (u,v) - d_{\mathfrak {R}^2} (D(u),D(v)) \right) ^2\), where \(d_G (u,v)\) is the graph theoretic distance between u and v, \(d_{\mathfrak {R}^2} (D(u),D(v))\) is the Euclidean distance between the locations D(u) and D(v) of u and v, and \(w_{uv}\) is a constant. Stress appears to measure the faithfulness of a graph drawing [15] rather than its readability. For example, a low value of the stress in a drawing indicates that the Euclidean distances between vertices are (approximately) proportional to the graph-theoretic distances in the graph.
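The stress formula above can be evaluated directly on a small example. The following is a minimal sketch rather than a reference implementation. It assumes the graph is given as an adjacency dictionary `adj` and the drawing as a dictionary `pos` of coordinates (both hypothetical names), it sums over unordered vertex pairs, and it takes \(w_{uv} = d_G(u,v)^{-2}\), which is one common choice; the text only requires \(w_{uv}\) to be a constant.

```python
import math
from collections import deque

def graph_distances(adj):
    """All-pairs shortest-path lengths in an unweighted connected graph (BFS from each vertex)."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[s] = d
    return dist

def stress(adj, pos):
    """Sum over unordered pairs {u, v} of w_uv * (d_G(u,v) - |D(u) - D(v)|)^2.

    Assumption: w_uv = d_G(u,v)^-2; other weightings are possible.
    """
    dist = graph_distances(adj)
    nodes = list(adj)
    total = 0.0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            d_g = dist[u][v]                    # graph-theoretic distance
            d_e = math.dist(pos[u], pos[v])     # Euclidean distance in the drawing
            total += (d_g - d_e) ** 2 / d_g ** 2
    return total
```

For a path drawn on a straight line with unit spacing, every Euclidean distance equals the corresponding graph distance and the stress is zero; bending the path makes the stress positive.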

Quality metrics are significant simply because they measure success or failure of a graph drawing method. Most importantly, they are used as optimisation goals in graph drawing algorithms. Methods that aim to draw graphs with a small number of crossings are well established in the literature. Stress minimization algorithms, in one form or another, are by far the most popular methods for drawing undirected graphs.

This paper proposes a new family of quality metrics for graph visualization, especially for large graph drawings. Here, by “large”, we mean that the graphs are large enough to make “blobs” such as in Fig. 1 inevitable. This includes dense graphs with at least a few hundred vertices as well as sparse graphs with at least a few thousand vertices.

The proposed metrics are based on the notion of the “shape” of a set of points in \(\mathfrak {R}^2\). Our proposal is that a drawing is good if the shape of the set of vertex positions is similar to the original graph.

In Sect. 2, we describe this notion more precisely and illustrate with examples. In Sect. 3 we give some empirical indication that the metrics are valid, based on data sets from previous experiments [1, 14]. Section 4 concludes with open problems.

2 Shape-Based Metrics

Figure 2 summarises our proposal. The quality of a drawing D of a graph G is the similarity between G and the “shape” of the set of vertex locations of D. The “shape” is expressed as a graph, called a “shape graph”. To make these notions more precise, we need to examine the notion of the shape of a set of points, and the notion of similarity between two graphs.

Fig. 2. Shape-based quality metrics.

2.1 Shape as a Graph

Informally, a shape graph for a set of points P is a geometric graph \(G = (P, E)\) that captures the “shape” of P. The classical example of a shape graph is the \(\alpha \) -shape [3]; however, \(\alpha \)-shapes capture the shape of the boundary of P, and not the internal structure of P. Another kind of shape graph is a “proximity graph”: an edge is placed between two points \(p , q \in P\) if p is “close to” q in some sense. There are many kinds of proximity graphs (see [24]); some examples are below.

  • The k-nearest neighbours graph has a (directed) edge from point \(p \in P\) to a distinct point \(q \in P\) if the number of points \(r \in P \setminus \{p\}\) with \(d(p,r) < d(p, q)\) is at most \(k-1\).

  • The Delaunay triangulation: the dual of the Voronoi diagram on P.

  • The Gabriel graph (GG) has an edge between distinct points \(p , q \in P\) if the closed disc which has the line segment pq as a diameter contains no other elements of P.

  • The relative neighbourhood graph (RNG) has an edge between distinct points \(p , q \in P\) if there is no other point \(r \in P\) such that \(d(p,r) \le d(p,q)\) and \(d(q,r) \le d(p,q)\).

  • A Euclidean minimum spanning tree (EMST) is a minimum spanning tree of P where the weight of the edge between each pair of points is the Euclidean distance.

Each of these shape graphs can be computed in \(O(n \log n)\) time using standard methods [16]. In Sect. 3 below, we examine quality metrics based on the Euclidean minimum spanning tree, the Gabriel graph, and the relative neighborhood graph. However, our remarks apply in principle to any shape graph.
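To make the definitions of the Gabriel graph, the relative neighbourhood graph, and the Euclidean minimum spanning tree concrete, here is a brute-force sketch. It is not one of the \(O(n \log n)\) methods of [16]: the Gabriel and RNG tests below take cubic time and the spanning tree uses a quadratic version of Prim's algorithm. The function names are illustrative, points are assumed to be distinct coordinate pairs, and edges are returned as index pairs (i, j) with i < j.

```python
from itertools import combinations

def dist2(p, q):
    """Squared Euclidean distance (exact for integer coordinates)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def gabriel_graph(pts):
    """Edge pq iff the closed disc with diameter pq contains no other point."""
    n = len(pts)
    edges = set()
    for i, j in combinations(range(n), 2):
        d = dist2(pts[i], pts[j])
        # r lies in the closed disc with diameter pq iff |pr|^2 + |qr|^2 <= |pq|^2
        if all(dist2(pts[i], pts[k]) + dist2(pts[j], pts[k]) > d
               for k in range(n) if k != i and k != j):
            edges.add((i, j))
    return edges

def relative_neighbourhood_graph(pts):
    """Edge pq iff no other point r has d(p,r) <= d(p,q) and d(q,r) <= d(p,q)."""
    n = len(pts)
    edges = set()
    for i, j in combinations(range(n), 2):
        d = dist2(pts[i], pts[j])
        if all(max(dist2(pts[i], pts[k]), dist2(pts[j], pts[k])) > d
               for k in range(n) if k != i and k != j):
            edges.add((i, j))
    return edges

def euclidean_mst(pts):
    """Prim's algorithm on the complete Euclidean graph, O(n^2) time."""
    n = len(pts)
    if n == 0:
        return set()
    # best[k] = (squared distance to the tree, nearest tree vertex)
    best = {k: (dist2(pts[0], pts[k]), 0) for k in range(1, n)}
    edges = set()
    while best:
        k = min(best, key=lambda v: best[v][0])
        _, parent = best.pop(k)
        edges.add((min(k, parent), max(k, parent)))
        for m in best:
            d = dist2(pts[k], pts[m])
            if d < best[m][0]:
                best[m] = (d, k)
    return edges
```

On the three points (0, 0), (4, 0), (2, 3) the Gabriel graph is the full triangle, while the relative neighbourhood graph and the minimum spanning tree both drop the longest side.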

2.2 Graph Similarity

Suppose that \(G_1 = (V , E_1) \) and \(G_2= (V , E_2) \) are two graphs with the same vertex set. A simple measure for the similarity of \(G_1\) and \(G_2\) is the Jaccard sum similarity:

$$\begin{aligned} JSS( G_1 , G_2 ) = \sum _{u \in V} \frac{ | N_1 (u) \cap N_2 (u) | }{ | N_1 (u) \cup N_2 (u) | }, \end{aligned} \tag{1}$$

where \(N_i(u)\) is the set of neighbors of u in \(G_i\) for \(i = 1 , 2\). It is straightforward to compute the Jaccard sum similarity in linear time.
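A direct transcription of Eq. (1) might look as follows. This is a sketch under two assumptions that the text does not settle: graphs are passed as adjacency dictionaries over the same vertex set, and a vertex that is isolated in both graphs (empty union of neighbourhoods) contributes 0 to the sum.

```python
def jaccard_sum_similarity(adj1, adj2):
    """Sum over vertices u of |N1(u) & N2(u)| / |N1(u) | N2(u)|."""
    total = 0.0
    for u in adj1:
        n1 = set(adj1[u])
        n2 = set(adj2.get(u, ()))
        union = n1 | n2
        if union:  # assumption: an empty union contributes 0 rather than 0/0
            total += len(n1 & n2) / len(union)
    return total
```

Comparing a path a-b-c with itself gives 3 (one unit per vertex); removing the edge b-c from the second graph lowers the value to 1.5.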

More complex measures for graph similarity include graph edit distance [7], and measures based on the notion that the similarity of two vertices u and \(u'\) depends on the similarity of their neighbours (see, for example, [12]). However, these metrics are computationally expensive and do not scale beyond a few thousand vertices; we have found that the Jaccard sum similarity is adequate for our purposes.

2.3 The Metrics

We can now define our proposed metrics. Suppose that D is a drawing of a graph G; we want to measure the quality of D. Let P denote the set of vertex locations of D, and suppose that \(\mu \) is a shape graph function (that is, \(\mu \) takes a set of points and produces a shape graph on this set of points). Further, let \(\eta \) be a graph similarity function, that is, \(\eta \) takes two graphs as input and returns a non-negative real number that indicates the similarity between these two graphs. Then we define the quality metric \(Q_{\mu ,\eta }\) by \(Q_{\mu ,\eta } ( D ) = \eta ( G , \mu (P) )\).
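Since \(Q_{\mu ,\eta }\) is simply the composition of a shape graph function and a similarity function, it can be sketched as a higher-order function. The example below is illustrative only: it uses a k-nearest neighbours graph as \(\mu \), made undirected by symmetrising the directed edges (an assumption), and the Jaccard sum as \(\eta \). All function and parameter names are hypothetical.

```python
import math

def knn_shape_graph(pos, k=1):
    """mu: k-nearest neighbours graph on the vertex locations, symmetrised (assumption)."""
    adj = {u: set() for u in pos}
    for u in pos:
        others = sorted((v for v in pos if v != u),
                        key=lambda v: math.dist(pos[u], pos[v]))
        for v in others[:k]:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def jaccard_sum(adj1, adj2):
    """eta: Jaccard sum similarity; a vertex with no neighbours in either graph adds 0."""
    total = 0.0
    for u in adj1:
        union = set(adj1[u]) | set(adj2[u])
        if union:
            total += len(set(adj1[u]) & set(adj2[u])) / len(union)
    return total

def quality(graph_adj, pos, mu, eta):
    """Q_{mu,eta}(D) = eta(G, mu(P)), where P is the set of vertex locations of D."""
    return eta(graph_adj, mu(pos))
```

For the path a-b-c-d, a layout that places the vertices in path order scores higher than one in which b and c are swapped, which matches the intuition behind the metric.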

These metrics are, in spirit, related to the “graph theoretic scagnostics” approach to scatterplots (see [26]).

The “neighborhood inconsistency” [6] and “neighborhood preservation precision” [5, 6] metrics used by Gansner et al. are also related, especially when the shape graph \(\mu \) is a kind of nearest neighbor graph. These two metrics have a different motivation to ours: rather than measure the general notion of shape, they attempt to measure whether neighbours in the layout coincide with neighbours in the graph. Nevertheless, we can regard the “neighborhood inconsistency” as an example of a shape-based metric when the shape graph \(\mu \) is a k-nearest neighbor graph, and the similarity function \(\eta \) is based on the “stochastic neighbor embedding” of Hinton and Roweis [9].

Throughout this paper we use the Jaccard sum similarity for graph similarity, and so we abbreviate \(Q_{\mu ,\eta }\) to \(Q_\mu \). The time to compute \(Q_{\mu }\) depends on the choice of \(\mu \); for all such choices \(\mu \) described in this paper, \(Q_{\mu }\) can be computed in time \(O(n \log n )\).

2.4 An Example

Although our proposal is aimed at large graphs, this example uses a smaller graph so that it is easier to understand. Consider the graph drawing \(D_0\) in Fig. 3(a). The set \(P_0\) of vertex locations of \(D_0\) is shown in Fig. 3(b). A Euclidean minimum spanning tree \(T_0\) on \(P_0\) is shown in Fig. 3(c).

Fig. 3. (a) A graph drawing \(D_0\). (b) The set \(P_0\) of vertex locations of \(D_0\). (c) A Euclidean minimum spanning tree \(T_0\) on \(P_0\).

Fig. 4. The drawing \(D_\delta \) in the second column is formed from the drawing \(D_0\) in Fig. 3 by moving each vertex in a random direction by a random distance in the range \([ 0 , \delta s ]\), where s is the size of the screen. The graph \(T_\delta \) in the third column is a Euclidean minimum spanning tree of the vertex locations of \(D_\delta \).

Our proposal is that the quality \(Q_{EMST} (D_0)\) of the graph drawing \(D_0\) can be measured as the similarity between the (combinatorial) graphs in Fig. 3(a) and (c). Using the Jaccard sum similarity in Eq. (1), we can calculate the value \(Q_{EMST} (D_0)\) as 0.61. The comparatively high value of \(Q_{EMST} (D_0)\) expresses the fact that for each vertex u the neighbors of u in the shape graph \(T_0\) overlap considerably with the neighbors of u in G. Intuitively, the graph drawing \(D_0\) is a reasonably faithful representation of the graph G, in that the “shape” of \(D_0\) is similar to G.

Fig. 5. Two graph drawings from the data set described in Sect. 3.1, and Euclidean minimum spanning trees of the vertex locations.

Next we examine what happens when we make the drawing progressively worse. Suppose that \(D_\delta \) is formed from \(D_0\) by moving each vertex in a random direction by a random distance in the range \([ 0 , \delta s ]\), where s is the size of the screen. Drawings \(D_\delta \) for \(\delta = 0.1, 0.2\), and 0.5 are shown in Fig. 4. For \(\delta = 0.1\), the shape of the drawing is fairly close to the graph, the minimum spanning tree \(T_\delta \) shares quite a few edges with \(D_\delta \), and the value \(Q_{EMST} (D_{0.1} ) = 0.42\) is reasonably high. As \(\delta \) increases, the minimum spanning tree \(T_\delta \) shares fewer edges with \(D_\delta \), and the values of \(Q_{EMST} (D_{\delta } )\) fall. For \(\delta = 0.5\) the shape of the drawing shows no resemblance to the graph, and \(Q_{EMST} (D_{\delta } )\) is low. Intuitively, as the drawing becomes worse, the shape of the set of points differs more and more from the graph.
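The perturbation that produces \(D_\delta \) can be sketched as below. The text does not say how the random direction and distance are distributed; this sketch assumes both are uniform, and the names `perturb`, `screen_size`, and `rng` are illustrative.

```python
import math
import random

def perturb(pos, delta, screen_size, rng=None):
    """Move each vertex in a random direction by a random distance in [0, delta * screen_size].

    Assumption: direction and distance are drawn uniformly and independently.
    """
    rng = rng or random.Random()
    moved = {}
    for v, (x, y) in pos.items():
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = rng.uniform(0.0, delta * screen_size)
        moved[v] = (x + radius * math.cos(angle), y + radius * math.sin(angle))
    return moved
```

Passing a seeded `random.Random` instance makes the perturbed drawings reproducible, which is convenient when comparing metric values across several values of \(\delta \).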

Larger examples are shown in Fig. 5, from the data set described in Sect. 3.1. Here the graph drawing (a) has 1160 vertices and 6424 edges, and (b) is a Euclidean minimum spanning tree of (a); the graph drawing (c) has 1749 vertices and 13957 edges, and (d) is a Euclidean minimum spanning tree of (c). In both cases, the Euclidean minimum spanning tree shares many edges with the graph, and expresses the shape of the graph drawing well.

2.5 Remarks

We should point out that the metrics that we have defined above are not normalised across graphs. If D and \(D'\) are two drawings of the same graph, then \(Q_\mu ( D ) > Q_\mu (D')\) whenever D is a better drawing than \(D'\). However, we make no such claim for two drawings D and \(D'\) of two different graphs.

Further, note that the Gabriel graph contains the relative neighborhood graph, which in turn contains the Euclidean minimum spanning tree [24]. We expect the Gabriel graph to model the shape of a set of points more precisely than a Euclidean minimum spanning tree.

3 The Experiments

In this section, we describe how the shape-based quality metrics perform on two specific data sets from previous experiments [1, 14].

3.1 The “Untangling” Data Set

Marner et al. [14] introduced a new method called GION for supporting interaction with graph drawings on large displays. The user study of [14] focussed on the task of untangling a graph drawing: subjects were presented with a graph drawing (a Fruchterman-Reingold layout [4]), and were simply asked to untangle the layout. Eight RNA sequence graphs were used, ranging from 1159 to 7885 vertices; there were 16 subjects. The experimental system captured, for each subject and each graph, a snapshot drawing every 5 seconds; the snapshot at time t is denoted by \(D_t\). Two such snapshot graph drawings are shown in Fig. 5(a) and (c). The main result of the experiment was that the GION method is better in several ways than more traditional interaction methods. For more details, see [14].

The experiment gave a large data set of graph drawings (8 graphs \(\times \) 16 users \(\times \) 24 snapshot drawings) that we can use to check our shape-based quality metrics. For each snapshot \(D_t\), we computed the number \(\chi (D_t)\) of edge crossings, the (scaled) stress \(\sigma (D_t)\), and the metrics \(Q_{EMST} (D_t) \), \(Q_{GG}(D_t)\), and \(Q_{RNG}(D_t)\), respectively based on Euclidean minimum spanning trees, Gabriel graphs, and relative neighborhood graphs.

Commonly held graph drawing wisdom is that \(\chi (D_t) \) and \(\sigma (D_t)\) decrease as the quality of the graph drawing increases. We expect that quality increases as the graph is untangled, and so we expect that \(\chi (D_t) \) and \(\sigma (D_t)\) decrease with t. In contrast, the proposed quality metrics \(Q_{EMST} (D_t) \), \(Q_{GG}(D_t)\), and \(Q_{RNG}(D_t)\) are expected to increase with t. To make the comparison between these metrics easier, we place them on a comparable scale by inverting and normalising crossings and stress, as follows. We define

$$\begin{aligned} \bar{Q}_\chi (D_t) = \frac{M_\chi - \chi (D_t)}{M_\chi },~~ \bar{Q}_\sigma (D_t) = \frac{M_\sigma - \sigma (D_t)}{M_\sigma }, \end{aligned}$$

where \(M_\chi = \max _t \chi (D_t)\) and \(M_\sigma = \max _t \sigma (D_t)\). Note that \(\bar{Q}_\chi (D_t)\) (respectively \(\bar{Q}_\sigma (D_t)\)) increases from 0 to 1 as the number of crossings (respectively the stress) decreases from the maximum to 0. For the shape-based metrics, we simply linearly normalise \(Q_{EMST}\) (respectively \(Q_{GG}\) and \(Q_{RNG}\)) to give \(\bar{Q}_{EMST}\) (respectively \(\bar{Q}_{GG}\) and \(\bar{Q}_{RNG}\)) so that it increases from 0 to 1 as the quality of the drawing increases.
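The two normalisations can be sketched as follows. The inverted form follows the displayed formula for \(\bar{Q}_\chi \) and \(\bar{Q}_\sigma \): a snapshot with the maximum number of crossings (or maximum stress) maps to 0, and a snapshot with none maps to 1. For the shape-based metrics the text only says "linearly normalise"; min-max scaling over the snapshots is assumed here, and the function names are illustrative.

```python
def inverted_normalised(values):
    """(M - x) / M for each x, where M = max(values). Assumes M > 0."""
    m = max(values)
    return [(m - x) / m for x in values]

def linear_normalised(values):
    """Min-max scaling to [0, 1] (an assumed reading of 'linearly normalise').

    Assumes the values are not all equal.
    """
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]
```

For example, crossing counts of 10, 5, and 0 over three snapshots map to 0, 0.5, and 1.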

It is reasonable to assume that the drawing improves in quality as the untangling proceeds. However, the results reported in [14] were counterintuitive in terms of crossings and stress: as the subjects untangled the graph drawings, there was a tendency to increase both crossings and stress (that is, both \(\bar{Q}_\chi \) and \(\bar{Q}_\sigma \) decreased).

In contrast, we found that \(\bar{Q}_{EMST}\), \(\bar{Q}_{GG}\), and \(\bar{Q}_{RNG}\) all increased as the subjects untangled the drawings. The charts in Fig. 6 show \(\bar{Q}_\chi \), \(\bar{Q}_\sigma \), \(\bar{Q}_{EMST}\), \(\bar{Q}_{GG}\), and \(\bar{Q}_{RNG}\), averaged over all subjects, for the first 3 of the 8 graphs. The horizontal axis is time t; the vertical axis shows the values of the metrics. For graphs #1 and #2, both crossings and stress increase with t (that is, \(\bar{Q}_\chi (D_t)\) and \(\bar{Q}_\sigma (D_t)\) decrease). In contrast, \(\bar{Q}_{EMST}\), \(\bar{Q}_{GG}\), and \(\bar{Q}_{RNG}\) increase. Graphs #4, #5, #6, #7, and #8 showed very similar patterns to graphs #1 and #2. Graph #3 was a little different in that crossings decrease (and thus \(\bar{Q}_\chi \) increases), albeit chaotically.

Fig. 6. Metrics against untangling.

Overall, the data from the untangling experiment shows that both crossings and stress metrics became worse as the subjects untangled the graphs, but the shape-based metrics became better. With some provisos (see Sect. 3.3 below), this suggests that the shape-based metrics are better than crossings and stress for measuring untangling.

3.2 The “Preference” Data Set

Chimani et al. [1] report an experiment at the University of Osnabrueck aimed at determining whether human preferences in graph drawing correlate with crossings and stress. There were two follow-up experiments, at the Graph Drawing conference in 2014, and at the University of Sydney. The design and results of all three experiments were similar; see [1]. Here we investigate the data from the University of Sydney experiment, aiming to determine whether shape-based metrics are correlated with preference.

This experiment had 40 subjects. Each subject was presented with 20 “instances”. Each instance displayed a pair of drawings of the same graph, as in the screenshot in Fig. 7. There is a slider bar at the bottom of the screen, and the subject indicates which of the pair of drawings they prefer by sliding to the left or right. The slider bar has a scale on the left from 5 to 1 and on the right from 1 to 5, with zero in the middle. The slider bar is used to give a score to the drawing that the subject prefers, indicating how much the subject prefers it. A score of 5 on the left indicates a strong preference for the drawing on the left, and a score of 5 on the right indicates a strong preference for the drawing on the right.

Fig. 7. Example of a typical “instance” (a graph pair shown to participants).

A total of 118 graphs, ranging in size from small (25 vertices and 29 edges) to moderately large (8000 vertices and 15580 edges), were used. Five drawings for each graph were generated, and the instances were chosen randomly. For details, see [1].

The results for a particular quality metric Q are expressed in terms of the “Q-ratio”, defined as follows. Consider an instance consisting of two drawings \(D_{left}\) and \(D_{right}\) of a graph G, such as in Fig. 7. Let \(Q (D_{left})\) (respectively \(Q (D_{right})\)) be the value of the Q metric for \(D_{left}\) (respectively for \(D_{right}\)). We define the Q-ratio for this instance as

$$\begin{aligned} \frac{\max ( Q(D_{left}) , Q (D_{right}) )}{\min ( Q(D_{left}) , Q (D_{right}) )}. \end{aligned}$$

If the Q-ratio is significantly larger than 1, then we expect that most subjects prefer the drawing with the higher quality (according to the quality metric Q). Further, as the Q-ratio increases, we expect that more and more subjects prefer the drawing with higher quality. To make this precise, we need to define some further terms.

For each quality metric Q and each instance I we compute a score \(S_Q(I)\) as follows. Suppose that for this instance, the subject gives a score of x (\(0 \le x \le 5\)). If the subject chose the drawing with a higher value of the quality metric Q, then \(S_Q(I) = x\); otherwise \(S_Q(I) = -x\). The expectation that most subjects prefer the drawing with the higher quality becomes an expectation that in most instances, \(S_Q(I)\) is positive.
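The Q-ratio and the signed score \(S_Q(I)\) can be sketched as below. The representation of an instance (two metric values, which side was chosen, and the slider strength x) and the function names are assumptions made for illustration. Following the text literally, the score is \(+x\) only when the chosen drawing has the strictly higher metric value; a tie therefore yields \(-x\).

```python
from statistics import median

def q_ratio(q_left, q_right):
    """max(Q(D_left), Q(D_right)) / min(Q(D_left), Q(D_right)). Assumes both values are positive."""
    return max(q_left, q_right) / min(q_left, q_right)

def signed_score(q_left, q_right, chosen, strength):
    """S_Q(I): +x if the subject chose the drawing with the higher Q value, otherwise -x.

    chosen is "left" or "right"; strength is the slider value x with 0 <= x <= 5.
    """
    q_chosen = q_left if chosen == "left" else q_right
    q_other = q_right if chosen == "left" else q_left
    return strength if q_chosen > q_other else -strength

def median_score(instances):
    """Median of S_Q(I) over (q_left, q_right, chosen, strength) tuples."""
    return median(signed_score(*inst) for inst in instances)
```

Grouping instances by their Q-ratio and taking the median score in each group gives the kind of summary that is charted against the Q-ratio.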

For each metric Q, we chart the median of \(S_Q(I)\) over all instances I against the Q-ratio in Fig. 8. The charts for crossings and stress are shown in Fig. 8(a), and for EMST, GG, and RNG in Fig. 8(b). For both crossings and stress, there is adequate data for ratios from 1 to 5; however, the data for ratios larger than 4.5 is sparse (fewer than 20 instances) and the results at this end of the spectrum must be treated with caution.

Fig. 8. Stress and crossing ratios, shape graph ratios, and preferences.

  • Crossings. Overall, there is a slight preference for fewer crossings (median over all instances is \(+1\)). As the crossing ratio increases, the median preference for the drawing with fewer crossings increases. When the crossing ratio is above 2.5 the median preference for the drawing with fewer crossings is \(+3\), and stays steady at \(+3\) as the crossing ratio increases beyond 2.5.

  • Stress. Overall, there is a preference for lower stress (median over all instances is \(+2\)). As the stress ratio increases, the median preference for lower stress rises; it hovers between \(+3\) and \(+4\) when the stress ratio is above 4.

For EMST, GG, and RNG, there is adequate data for ratios from 1 to 1.5; however, the data for ratios larger than 1.45 is sparse (fewer than 20 instances) and the results at this end of the spectrum must be treated with caution.

  • EMST. The median preference for the drawing with higher value of \(\bar{Q}_{EMST}\) is chaotic when the EMST-ratio is less than 1.2. The preference rises to \(+4\) when the EMST-ratio rises from 1.2 to 1.3, and remains at \(+4\) as the EMST-ratio increases beyond 1.3.

  • GG. Overall, there is a preference for drawings with a higher value of \(\bar{Q}_{GG}\) (median over all instances is \(+2\)). The preference for the drawing with higher value of \(\bar{Q}_{GG}\) rises smoothly with GG-ratio. When the GG-ratio is above 1.2 the median preference for the drawing with higher value of \(\bar{Q}_{GG}\) is \(+4\), and remains at \(+4\) as the GG-ratio increases beyond 1.2.

  • RNG. Overall, there is a preference for drawings with a higher value of \(\bar{Q}_{RNG}\) (median over all instances is \(+1\)). The preference for the drawing with higher value of \(\bar{Q}_{RNG}\) rises smoothly with RNG-ratio. When the RNG-ratio is above 1.2 the median preference for the drawing with higher value of \(\bar{Q}_{RNG}\) is \(+4\), and remains at \(+4\) as the RNG-ratio increases beyond 1.2.

One can conclude that people prefer drawings with fewer crossings, lower stress, and higher values for the shape-based metrics \(\bar{Q}_{EMST}\), \(\bar{Q}_{GG}\), and \(\bar{Q}_{RNG}\). Note that the preference for better GG and RNG based metrics appears to be a little stronger than the preference for fewer crossings and lower stress. The overall preference for EMST-based metrics seems unreliable when the EMST-ratio is small.

3.3 Remarks on the Experiments

The data from both the “untangling” experiment and the “preference” experiment support the proposal that the shape-based metrics are good measures of the quality of a graph drawing; there is some indication that the shape-based metrics are better than crossings and stress. However, this support has some limitations:

  • Both experiments were designed for other purposes. Neither experiment was designed to test the shape-based metrics. To completely validate the new metrics, further study is needed.

  • The “untangling” experiment used a very specific kind of graph: RNA sequence graphs, which are locally dense with a global “tree-like” structure. For more general classes of graphs, further experimentation would be useful.

  • The experiments use the notions of “untangledness” and “preference” as proxies for ground truth quality. It would be useful to test the metrics in a task-oriented experiment.

4 Conclusion and Open Problems

This paper proposes a new family of metrics, aimed at measuring the quality of large graph drawings. We have some evidence from data in two previous experiments that these metrics are effective.

Our proposal raises several open problems:

  • Design experiments to fully validate shape-based metrics. In particular, it would be interesting to know whether time and error of tasks on large graphs (see [13]) are related to shape-based metric values.

  • Design algorithms to produce layouts that optimise the metrics. Note that (as with most graph layout problems) optimisation problems of this kind are typically NP-hard (see, for example, [2]), and thus heuristic approaches are in order. In particular, it would be interesting to know whether a stress minimisation algorithm gives a reasonable approximation.