The visualizations presented in this paper are generated in Python with libraries including numpy, sklearn, networkx and plotly, running in Jupyter notebooks or other web-based Python frameworks. We draw the nodes of a graph as red dots and color each edge red, green or grey according to its length. Equation 11 defines the color assignment, where \(e\), \(\mathrm{color}_{e}\), \(d_{\mu }\) and \(d_{\sigma }\) denote the length of the edge, the color of the edge, and the mean and standard deviation of the pairwise graph-theoretic distances of the individual graph dataset, respectively.
$$\mathrm{color}_{e} =
\begin{cases}
\mathrm{red}, & \text{if } e < d_{\mu} - d_{\sigma},\\
\mathrm{green}, & \text{if } d_{\mu} - d_{\sigma} \le e \le d_{\mu} + d_{\sigma},\\
\mathrm{grey}, & \text{if } e > d_{\mu} + d_{\sigma}.
\end{cases}
\tag{11}$$
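For illustration, the rule translates into a few lines of Python. The sketch below is not the paper's released code; it assumes that the edge length \(e\) is the Euclidean length of the drawn edge in a layout scaled to be comparable with the graph-theoretic distances, and the graph and spring layout are placeholders:

```python
import networkx as nx
import numpy as np

G = nx.dodecahedral_graph()        # illustrative graph
pos = nx.spring_layout(G, seed=0)  # placeholder for a t-SNE layout

# Mean and standard deviation of all pairwise graph-theoretic distances.
lengths = dict(nx.all_pairs_shortest_path_length(G))
all_d = np.array([lengths[u][v] for u in G for v in G if u != v], dtype=float)
d_mu, d_sigma = all_d.mean(), all_d.std()

def edge_color(e):
    """Color for an edge of drawn length e, following Eq. 11."""
    if e < d_mu - d_sigma:
        return "red"
    if e > d_mu + d_sigma:
        return "grey"
    return "green"

colors = [edge_color(np.linalg.norm(pos[u] - pos[v])) for u, v in G.edges]
```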
Test data
We test our approach with the 20 datasets released by the tsNET team [22]. Table 1 lists the datasets.
To analyze the results, we label the datasets according to their size and graph density. Small and large graphs are those with fewer than 1000 and at least 1000 vertices, respectively; sparse and dense graphs are those with graph density below 3 and above 3, respectively, where the graph density is calculated as \(\vert E\vert /\vert V\vert \).
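With networkx, for instance, these labels follow directly from \(\vert V\vert \) and \(\vert E\vert \) (a minimal sketch using our thresholds):

```python
import networkx as nx

def label_graph(G: nx.Graph):
    """Label a graph by size (1000-vertex threshold) and by its
    graph density |E|/|V| (threshold 3)."""
    size = "small" if G.number_of_nodes() < 1000 else "large"
    density = G.number_of_edges() / G.number_of_nodes()
    return size, "sparse" if density < 3 else "dense"
```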
Table 1 Benchmark datasets

Validation of hypothesis
We carry out a series of tests to validate the hypothesis. The perplexity is set to 2%, 5%, 10%, 20%, 30%, 40% and 50% of the size of each dataset. Figure 1 shows boxplots of \(M_{np}\) and \(M_{\sigma }\) over the 20 datasets, where the mean value of each subgroup, labeled by its percentage, is marked with a dot.
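For each dataset with \(n\) vertices, the tested perplexities are thus simply fixed fractions of \(n\); a sketch (the floor of 2 is our safety assumption, not stated in the paper):

```python
# Perplexity settings used in the sweep, as fractions of the dataset size n.
FRACTIONS = (0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50)

def sweep_perplexities(n):
    """Return the tested perplexities for a dataset of n vertices."""
    return [max(2, round(f * n)) for f in FRACTIONS]
```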
Figure 1b shows that the average neighborhood preservation \(M_{np}\) rises to its peak while the perplexity is below 10% of the dataset size, then decreases as the perplexity grows. The average normalized stress \(M_{\sigma }\) in Fig. 1a rises to its peak while the perplexity is below 5% of the dataset size, then decreases as the perplexity grows. However, the decrease in \(M_{\sigma }\) is less sharp than that in \(M_{np}\), as shown in Table 2, where the linear model coefficient \(\mathrm{lm}\_\mathrm{coef}\) of \(M_{\sigma }\) and \(M_{np}\) is \(-\,4.3826\) and \(-\,6.1940\), respectively. If we reflect one of the fitted lines across the x-axis, the two lines intersect at \(x=0.4664\). This indicates that a perplexity as large as, but no more than, 47% of the dataset size is likely to produce a stable visualization in which normalized stress and neighborhood preservation are well balanced. The correlation between pairs of variables is presented in Table 2. Pearson's correlation coefficient, reported as cor_coef, shows that both the average \(M_{np}\) and the average \(M_{\sigma }\) are significantly negatively correlated with perplexity, with correlation coefficients of − 0.8299 and − 0.9450, respectively, both very close to − 1. The results strongly support the hypothesis that a smaller perplexity corresponds to a larger \(M_{\sigma }\) and a larger \(M_{np}\), and that a larger perplexity corresponds to a smaller \(M_{\sigma }\) and a smaller \(M_{np}\). In addition, the smoothed quadratic means of \(M_{\sigma }\) and \(M_{np}\) shown in Fig. 2, fitted in R by the loess method with formula \(y \sim x\), provide strong visual support for the hypothesis.
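The paper computes these statistics in R; an equivalent check in Python might look as follows (a sketch, not the authors' pipeline, with `x` as the perplexity fractions and `ms`, `mnp` as the averaged measures):

```python
import numpy as np
from scipy.stats import pearsonr

def table2_stats(x, ms, mnp):
    """x: perplexity fractions; ms, mnp: averaged M_sigma and M_np.
    Returns Pearson correlations, linear-fit slopes, and the x at which
    the ms line, reflected across the x-axis, meets the mnp line."""
    cor_ms, _ = pearsonr(x, ms)
    cor_mnp, _ = pearsonr(x, mnp)
    a1, b1 = np.polyfit(x, ms, 1)    # ms  ~ a1*x + b1
    a2, b2 = np.polyfit(x, mnp, 1)   # mnp ~ a2*x + b2
    # Reflected ms line: -(a1*x + b1); intersect with a2*x + b2.
    x_cross = -(b1 + b2) / (a1 + a2)
    return cor_ms, cor_mnp, a1, a2, x_cross
```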
Table 2 Correlation of perplexity percentage and average \(M_{\sigma }\) and average \(M_{np}\)
To illustrate the issue visually, Fig. 3 shows two datasets laid out by the modified standard t-SNE with perplexities gradually increasing from 2 to 100% of the dataset size. The visualizations generated with a larger perplexity are fairly robust from a global perspective (lower \(M_{\sigma }\)), while preserving neighborhood information with less precision (lower \(M_{np}\)), as more nodes overlap as the perplexity increases.
We also examine the tsNET and tsNET* visualizations using their released code with the same perplexity changes as in Fig. 3; the results are shown in Fig. 4. The maximal iteration number is set to 1100 for all tests presented in Figs. 3 and 4. The graph layouts in Fig. 4 show a trend similar to Fig. 3: tsNET and tsNET* visualizations generated with a larger perplexity are quite robust from the global perspective. However, the nodes of dwt_1005 in the tsNET and tsNET* layouts become strongly compressed as the perplexity increases up to 20% of the dataset size, and then spread out with less overlap as the perplexity increases beyond 30% of the dataset size. This differs from what we observe in Fig. 3, most probably due to the additional control of node repulsion and edge attraction in tsNET [6].
Overall, we can identify the descending trends of both \(M_{np}\) and \(M_{\sigma }\) with increasing perplexity in Fig. 1, which validates the hypothesis. In addition, as shown in Table 2, \(M_{np}\) and \(M_{\sigma }\) are positively correlated with \(p = 0.63\), which suggests the need for a trade-off between \(M_{np}\) and \(M_{\sigma }\) through an appropriate perplexity: one large enough to generate a good layout with low \(M_{\sigma }\), but small enough to preserve more neighborhood details with high \(M_{np}\).
Perplexity estimation with our approach based on the modified standard t-SNE
We test our perplexity estimation approach with the 20 datasets. The perplexity is set according to Eq. 9, with a fixed perplexity of 40 for small graphs; the size threshold is 1000 vertices. Further, we roughly set \(\delta (d) \in \{0.1, 0.3\}\) in our experiment as described in Eq. 12, and apply the larger \(\delta (d)\) value only to large and very dense graphs to avoid overfitting.
$$\delta (d) =
\begin{cases}
0.1, & \text{if } d < 6,\\
0.3, & \text{if } d \ge 6 \text{ and } \vert V \vert \ge 1000.
\end{cases}
\tag{12}$$
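Eq. 12 translates directly into code. The sketch below also includes a hypothetical `estimate_perplexity` wrapper to show how \(\delta (d)\) and the 1000-vertex threshold fit together; Eq. 9 itself is given earlier in the paper and is not reproduced here:

```python
def delta(d, n_vertices):
    """Graph-density regulator delta(d) from Eq. 12."""
    return 0.3 if (d >= 6 and n_vertices >= 1000) else 0.1

def estimate_perplexity(n_vertices, density):
    """Sketch of the estimation: a fixed perplexity of 40 below the
    1000-vertex threshold; above it, Eq. 9 (defined earlier in the
    paper) combines the dataset size with delta(density)."""
    if n_vertices < 1000:
        return 40
    # Placeholder: substitute the Eq. 9 formula here; delta(...) supplies
    # the density regulator used by that equation.
    raise NotImplementedError("apply Eq. 9 with delta(density, n_vertices)")
```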
\(M_{\sigma }\) and \(M_{np}\)
Table 3 shows the experimental results in detail. Overall, both the average \(M_{\sigma }\) and the average \(M_{np}\) of our approach improve on the overall averages of tsNET* reported in [6]. This suggests that our perplexity estimation approach is robust and effective at balancing normalized stress against neighborhood preservation. Looking at the four types of graphs separately, the average \(M_{\sigma }\) and \(M_{np}\) for the small and large groups as well as the sparse group also demonstrate excellent performance stability. The performance of our approach on dense graphs is likewise comparable to that of tsNET*, with slightly lower average normalized stress and neighborhood preservation, though without a significant difference. On several individual datasets, including dwt_72, rajat11, 3elt, us_powergrid and dwt_1005, our approach outperforms tsNET*; on the others it achieves a better or comparable \(M_{\sigma }\) or a better or comparable \(M_{np}\) at the same time, except on dwt_419. When we test dwt_419 with a larger perplexity, over 15% of its dataset size, we obtain very stable, excellent performance.
Table 3 Normalized stress (\(M_\sigma \)) and neighborhood preservation (\(M_{np}\)) measures of graph layouts by the modified standard t-SNE (st-SNE), where italic and bold numbers mark measures that outperform and underperform tsNET*, respectively
Table 4 Normalized stress (\(M_\sigma \)), neighborhood preservation (\(M_{np}\)) measures and runtime (in seconds) of sklearn Barnes–Hut TSNE (TSNE)
Visual inspection
Visual inspections of all datasets with perplexities of 2% and 10% of the dataset size, as well as with the perplexity estimated by our method based on the modified standard t-SNE, are shown in Fig. 6. A small perplexity often reduces the fitting precision, or makes it uncertain, and therefore causes visual distortions of the graph layout. Several mesh-type graphs in Fig. 6, such as dwt_72, can_96, 3elt, dwt_1005, sierpinski3d and bcsstk09, illustrate this problem clearly when their perplexities are set as small as 2% of the individual dataset size. Overall, most of the visualizations generated with the perplexity estimated by our approach, except dwt_419, display better structure with little or no visual distortion (shown as grey edges). dwt_419 can be better visualized when the perplexity is increased, as mentioned above.
Runtime
The runtime in seconds is shown in Fig. 5, where the datasets are ordered along the x-axis by their number of nodes (size). We do not compare runtimes as precisely as in Table 3 because the runtimes reported for tsNET and tsNET* were based on customized perplexity settings and stop conditions that are unknown to us. We can observe that the runtime in Fig. 5 generally grows as the perplexity increases from 2 to 50% of the dataset size, and that large datasets need more time. The figure also shows that our perplexity estimation approach chooses a perplexity between 2 and 10% of the dataset size for most of the tested datasets.
The experiments above focus on the quality of our approach based on the modified standard t-SNE rather than on speed. As tsNET/tsNET* and our modified standard t-SNE take several hundred seconds on large datasets with over 4000 nodes, we try to improve the speed with the adapted estimation method described in Eq. 10. Our experiment on the large test datasets again demonstrates very good performance, with \(M_\sigma = 0.1218\) and \(M_{np} = 0.6296\) on average, compared to the tsNET* averages of \(M_\sigma = 0.1243\) and \(M_{np} = 0.5971\), as given in Table 4. We find that our perplexity estimation for the Barnes–Hut t-SNE does not work well on small datasets, and that it works better on large dense data than on large sparse data, as the Barnes–Hut approximation on sparse or small data causes a higher information loss. The average runtime drops dramatically from 598 s with our modified standard t-SNE to 12.9 s with the sklearn Barnes–Hut TSNE, while keeping an overall graph layout quality comparable to that of the method based on the modified standard t-SNE.
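A minimal sketch of this fast path, using the standard sklearn API rather than the authors' released code (the dataset and the perplexity value are placeholders; Eq. 10 is not reproduced here):

```python
# Sketch: graph layout via sklearn's Barnes-Hut t-SNE on graph-theoretic
# distances. Assumes a reasonably recent scikit-learn, where precomputed
# distances are supported with method="barnes_hut".
import networkx as nx
import numpy as np
from sklearn.manifold import TSNE

G = nx.random_geometric_graph(2000, 0.05, seed=1)          # placeholder data
G = G.subgraph(max(nx.connected_components(G), key=len))   # largest component
nodes = list(G.nodes)

# Dense matrix of pairwise graph-theoretic (shortest-path) distances.
lengths = dict(nx.all_pairs_shortest_path_length(G))
D = np.array([[lengths[u][v] for v in nodes] for u in nodes], dtype=float)

perplexity = 0.02 * len(nodes)  # illustrative choice, not Eq. 10
layout = TSNE(n_components=2, perplexity=perplexity, metric="precomputed",
              init="random", method="barnes_hut").fit_transform(D)
```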
Discussion
In our approach, we estimate the value of perplexity based on the normality of the input data, which proves very effective on the benchmark datasets. We also note that each test dataset is a single connected component, so the distribution of its pairwise graph-theoretic distances fits normality well. For graphs with many disconnected components, t-SNE tends to generate a layout with much higher stress, as our pilot experiments show; similar findings are also reported in [7]. Focusing on the largest connected component, or on the connected component of interest, is a practical way to employ t-SNE on graphs with many disconnected components.
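Both conditions are easy to check up front. The sketch below extracts the largest connected component and applies scipy's D'Agostino-Pearson normality test as one possible check (the random graph is a placeholder, and the test choice is ours, not the paper's):

```python
import networkx as nx
import numpy as np
from scipy import stats

def largest_component_distances(G):
    """Pairwise graph-theoretic distances of the largest component."""
    C = G.subgraph(max(nx.connected_components(G), key=len))
    lengths = dict(nx.all_pairs_shortest_path_length(C))
    return np.array([lengths[u][v] for u in C for v in C if u != v], float)

d = largest_component_distances(nx.erdos_renyi_graph(300, 0.02, seed=2))
stat, p = stats.normaltest(d)  # a low p suggests a poor fit to normality
```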
We set the perplexity for small graphs with fewer than 1000 nodes to 40, an approximation between 2 and 10% of 1000 based on the average graph size of the test data. An increased value of the graph density regulator \(\delta (d)\) is not applied to the small graphs, whose visualizations show more randomness/uncertainty than those of the large graphs due to data sparsity, in order to avoid overfitting. In our experiment, we observe an automatic increase in perplexity from 40 to 100 for the small dense dataset jazz, which shows the robustness of our approach.
Our approach offers advantages in ease of use and effectiveness.
First, our approach does not require users to try multiple perplexities as input before obtaining an acceptable output, which makes it very easy to use. For example, the EVA dataset can be visualized with perplexities much smaller than the 600 used in tsNET and tsNET*, as shown in Fig. 6. An extra grid search could still help to find an optimum, at the extra cost of time and effort.
Second, our approach does not rely on t-SNE parameter tuning such as the weight of the compression term [6, 7], and it does not require an additional PivotMDS layout as initialization as tsNET* does, although a PivotMDS initialization appears beneficial for small graphs in our tests. Because tsNET* builds on PivotMDS initializations, t-SNE cannot obtain better embeddings when the PivotMDS layouts do not fully reflect the relations given by the graph-theoretic distances.
Experimental results show that our approach is very effective at visualizing graph data with good quality, as evaluated by normalized stress, neighborhood preservation and visual inspection, and that it achieves a significant improvement in runtime when applied with the sklearn Barnes–Hut TSNE on large datasets without losing visualization quality. Our work is a step beyond the work of Kruiger et al. [6] and Wattenberg et al. [13]: it not only reveals the impact of t-SNE perplexity on graph layouts, but also presents guidelines for estimating an appropriate perplexity for a good layout, and offers the possibility of accelerating the process using widely used Python tools.