1 Introduction

Visual analytics combines automatic analysis with interactive visualization to support effective exploration and decision making [1]. Converting complex datasets into graphics or images makes it easier for users to understand large-scale data and discover hidden information.

In many studies, a network is treated simply as a complex graph, or as a data structure made of nodes and edges. These studies generally fall into two categories: visualization of a single layer, and multi-layer visualization of complex networks.

Several works display network status in a single layer. For the network control layer, VizFlowConnect [2] and PortVis [3] use parallel coordinates and scatter plots to visualize NetFlow data. IPMatrix [4] and Netvis [5] use node-link diagrams and treemaps to visualize IDS alarm data. Romero-Gomez [6] designs THACO (THreat Analysis COnsole) for DNS-based network threat analysis. Researchers also use topology diagrams of network routing information to analyze network events; BGP data is regularly used in systems such as BGPlay [7], BGP Eye [8] and BGPfuse [9]. From the application perspective, Alsaleh [10] presents an extension to PHPIDS that correlates PHPIDS logs with the corresponding web server logs to plot security-related events. ThousandEyes [11] monitors internal and external network performance to improve application delivery and reduce service interruptions. Similarly, RIPE Atlas [12] is a global network of probes that measure Internet connectivity and reachability in real time.

Some researchers also propose multi-layer visual models to show complex networks. Zhang [13] argues that the subsystems of the real world are not isolated but interconnected. Katib [14] separates the logical layer from the physical layer and proposes a three-layer model named IP-OTN-DWDM to ease resource management and fault detection in communication networks. In 2017, Wei [15] took the application perspective and proposed a two-layer network with a carrier layer and a business layer. To display the different levels of network topology and the relationships between layers, researchers usually place each network's nodes and edges in a 2D plane and stack these planes in 3D space. This method is called two-and-a-half-dimensional (2.5D) visualization [16].

However, single-layer visualization usually cannot show the relationships between different layers of a network. Meanwhile, traditional multi-layer visualization of social or biological networks focuses on displaying the same type of data hierarchically, based on community detection: each community is placed on a different 2D plane, and straight lines connect the planes. Such models ignore the connections among different types of layers in a real network.

To address the problems above, we use the 2.5D visualization method to design a two-layer visual model. Concretely, unlike the traditional Open System Interconnection (OSI) model, we consider only the application service layer and the network control layer, viewed from the application perspective. The two layers are connected through the IP mapping relationship. To avoid unnecessary loss of computational efficiency, we modify the Louvain algorithm by pruning leaf nodes before dividing the network control layer into communities. To make the view structured and well-distributed, we add community attractive forces to the Fruchterman-Reingold algorithm. Then, to reduce the number of crossing lines between the two layers, we derive the locations of the nodes in the application service layer from the locations of the corresponding nodes in the network control layer. Finally, we merge the application service layer and the network control layer into a two-layer visualization model, in which the overall network trend, topology and incidence relations can be easily observed.

2 Preliminary Knowledge

In the network control layer, we can abstract the Internet topology at the inter-domain level into an Autonomous System (AS) connection graph. In this graph, the network is a complex graph whose nodes are ASes and whose links are BGP routing sessions. As in social networks, real networks contain communities: the AS nodes within a community interact closely, while the links between communities are relatively sparse. To produce a layout that shows the community structure well, this section introduces the Louvain algorithm for community detection and the Fruchterman-Reingold algorithm for node layout, both of which are used in this paper.

2.1 Louvain Algorithm

Existing community detection algorithms for complex networks fall mainly into two categories. The first is based on graph theory, for example the k-clique algorithm [17] and the label propagation algorithm (LPA) [18]. The second is hierarchical clustering, for example the Fast Newman algorithm (FN) [19] and the Louvain algorithm [20]. The Louvain algorithm is currently regarded as one of the most effective non-overlapping community detection algorithms.

Girvan and Newman [21] first proposed the concept of modularity Q in 2002. The modularity Q in formula (1) is now commonly used to measure the strength of the community structure of a network. The larger the value of Q, the more robust and compact the community structure. The value of Q is at most one.

$$ \begin{array}{l} Q = \frac{1}{2m}\sum\nolimits_{i,j} \left[ A_{ij} - \frac{k_{i} k_{j}}{2m} \right] \delta (c_{i}, c_{j}) \\ \quad = \frac{1}{2m}\sum\nolimits_{c} \left[ \sum\nolimits_{i,j \in c} A_{ij} - \frac{\left( \sum\nolimits_{i \in c} k_{i} \right)\left( \sum\nolimits_{j \in c} k_{j} \right)}{2m} \right] \\ \quad = \frac{1}{2m}\sum\nolimits_{c} \left[ \sum in - \frac{(\sum tot)^{2}}{2m} \right] \end{array} $$
(1)

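Here \( A_{ij} \) is the weight of the edge between nodes i and j, \( k_{i} \) is the sum of the weights of the edges attached to node i, m is the sum of all edge weights, \( c_{i} \) is the community of node i, and \( \delta (c_{i}, c_{j}) \) is 1 if \( c_{i} = c_{j} \) and 0 otherwise. \( \sum in \) is the sum of the weights of all the edges inside community c, and \( \sum tot \) is the sum of the weights of the edges attached to the nodes in community c.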
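As a small illustration, the last line of formula (1) can be evaluated directly. The sketch below is not the implementation used in this paper; the function name and the toy graph are assumptions. It takes \( \sum in \) as the sum of \( A_{ij} \) over ordered pairs inside c, as the first line of formula (1) implies, so each internal edge contributes its weight twice.

```python
from collections import defaultdict


def modularity(edges, community):
    """Evaluate Q = (1/2m) * sum_c [ sum_in(c) - sum_tot(c)^2 / (2m) ].

    edges: list of (u, v, weight) for an undirected graph
    community: dict mapping node -> community label
    """
    m = 0.0
    sum_in = defaultdict(float)   # sum of A_ij over ordered pairs i, j inside c
    sum_tot = defaultdict(float)  # sum of weighted degrees of the nodes in c
    for u, v, w in edges:
        m += w
        sum_tot[community[u]] += w
        sum_tot[community[v]] += w
        if community[u] == community[v]:
            sum_in[community[u]] += 2 * w  # A_uv and A_vu both count
    return sum(sum_in[c] - sum_tot[c] ** 2 / (2 * m) for c in sum_tot) / (2 * m)


# Toy graph (assumed for illustration): two triangles joined by one edge.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}

print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

Splitting the toy graph into its two triangles gives a clearly positive Q, while putting every node in one community gives Q = 0, which matches the intuition that Q rewards dense groups with sparse links between them.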

The Louvain algorithm is based on optimizing the modularity Q. The modularity gain ∆Q obtained by moving a node \( {\text{n}}_{i} \) into community c can be derived from formula (1):

$$ \Delta Q = \frac{1}{2m}\left( k_{i,in} - \frac{\sum tot \cdot k_{i}}{m} \right) $$
(2)

Here \( k_{i,in} \) is the sum of the weights of the edges between node \( {\text{n}}_{i} \) and the nodes in community c. Because each move is evaluated greedily with this local quantity, the algorithm remains efficient on large complex networks.

The Louvain algorithm proceeds in two steps.

In the first step, every node in the network is regarded as its own community. For each node, we tentatively move it into the community of each of its neighbors and calculate the modularity gain ∆Q. If the largest gain is positive (∆Q > 0), the node is placed in the community that yields it; otherwise it stays where it is.

In the second step, the nodes of each community are merged into a single new node, and the weights of the edges between communities become the weights of the edges between the new nodes. Steps 1 and 2 are repeated until the modularity Q no longer changes.
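The two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this paper: for clarity it recomputes Q from scratch for every tentative move instead of using the incremental formula (2), which is far slower but easy to check. All function names are hypothetical, and node labels are assumed to be sortable (integers here).

```python
from collections import defaultdict


def modularity(edges, community):
    """Q from formula (1); a self-loop stands for the internal weight of a merged community."""
    m = sum(w for _, _, w in edges)
    sum_in, sum_tot = defaultdict(float), defaultdict(float)
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        sum_tot[cu] += w
        sum_tot[cv] += w
        if cu == cv:
            sum_in[cu] += 2 * w
    return sum(sum_in[c] - sum_tot[c] ** 2 / (2 * m) for c in sum_tot) / (2 * m)


def local_moving(edges, nodes):
    """Step 1: one community per node, then greedy moves while Q increases."""
    community = {n: n for n in nodes}
    neighbors = defaultdict(set)
    for u, v, _ in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    improved = True
    while improved:
        improved = False
        for n in nodes:
            current = community[n]
            best_c, best_q = current, modularity(edges, community)
            for c in sorted({community[x] for x in neighbors[n]} - {current}):
                community[n] = c                      # tentative move
                q = modularity(edges, community)
                if q > best_q + 1e-12:                # keep only strict improvements
                    best_c, best_q = c, q
            community[n] = best_c
            improved = improved or best_c != current
    return community


def aggregate(edges, community):
    """Step 2: merge each community into one node and sum the edge weights."""
    weight = defaultdict(float)
    for u, v, w in edges:
        a, b = sorted((community[u], community[v]))
        weight[(a, b)] += w                           # a == b gives a self-loop
    return [(a, b, w) for (a, b), w in weight.items()]


def louvain(edges, nodes):
    assignment = {n: n for n in nodes}                # original node -> community
    while True:
        community = local_moving(edges, nodes)
        if all(community[n] == n for n in nodes):     # nothing moved: Q is stable
            return assignment
        assignment = {n: community[c] for n, c in assignment.items()}
        edges = aggregate(edges, community)
        nodes = sorted(set(community.values()))


# Toy graph (assumed for illustration): two triangles joined by one edge.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
nodes = list(range(6))
partition = louvain(edges, nodes)
print(partition)
```

On the toy graph the two triangles end up in two different communities. A practical implementation would replace the calls to `modularity` inside the loop with the incremental gain, so that each tentative move costs time proportional to the degree of the node.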

2.2 Fruchterman-Reingold Algorithm

The most common graph drawing methods rely on physical simulation, such as the force-directed algorithm (FDA) [22], the Kamada-Kawai algorithm (KK) [23] and the Fruchterman-Reingold algorithm (FR) [24]. In such a system, the edges between nodes act like springs or other physical connections, and the nodes reach equilibrium through the interaction of elastic forces. Our research is based on the FR algorithm. This method treats nodes as atoms in a physical system, with attractive and repulsive forces acting between them. By calculating the total energy of the system, it produces an aesthetically pleasing and balanced layout with a simple cooling schedule.

To distribute the nodes of a graph evenly, the FR algorithm requires that nodes connected by an edge be drawn close to each other and that nodes not connected by an edge be drawn far apart. The method defines an attractive force (\( f_{a} \)) and a repulsive force (\( f_{r} \)). The attractive force acts between every pair of nodes connected by an edge, and the repulsive force acts between every pair of nodes. The coefficient k controls the ideal edge length. The forces are calculated as follows:

$$ \left\{ \begin{array}{l} f_{a} = d^{2} / k \\ f_{r} = - k^{2} / d \end{array} \right. $$
(3)

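Here d is the distance between the two nodes, and \( {\text{k}} = {\text{C}}\sqrt {\frac{S}{N}} \) is the balance coefficient, where C is a constant, S is the layout area and N is the number of nodes. The two forces cancel when d = k, so k acts as the preferred distance between adjacent nodes. With the FR algorithm, the nodes are distributed evenly.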
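The following sketch shows how formula (3) can drive a layout loop. It is a simplified illustration, not the code used in this paper: the starting temperature, the linear cooling and the function names are assumptions, and the repulsion is computed naively over all pairs.

```python
import math
import random


def f_a(d, k):
    """Attractive force between adjacent nodes, formula (3)."""
    return d * d / k


def f_r(d, k):
    """Repulsive force between any two nodes, formula (3); negative means push apart."""
    return -k * k / d


def fr_layout(nodes, edges, width=1.0, height=1.0, c=1.0, iterations=50, seed=0):
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = c * math.sqrt(width * height / len(nodes))  # k = C * sqrt(S / N)
    temperature = width / 10.0                      # assumed initial step limit
    cooling = temperature / iterations              # assumed linear cooling
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, u in enumerate(nodes):               # repulsion: all pairs
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                push = -f_r(d, k)
                disp[u][0] += dx / d * push
                disp[u][1] += dy / d * push
                disp[v][0] -= dx / d * push
                disp[v][1] -= dy / d * push
        for u, v in edges:                          # attraction: adjacent pairs only
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            pull = f_a(d, k)
            disp[u][0] -= dx / d * pull
            disp[u][1] -= dy / d * pull
            disp[v][0] += dx / d * pull
            disp[v][1] += dy / d * pull
        for n in nodes:                             # move, limited by the temperature
            length = max(math.hypot(*disp[n]), 1e-9)
            step = min(length, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + disp[n][0] / length * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + disp[n][1] / length * step))
        temperature -= cooling
    return pos


# Toy graph (assumed for illustration): two triangles joined by one edge.
nodes = list(range(6))
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
pos = fr_layout(nodes, edges)
```

The all-pairs repulsion loop is the \( |V|^{2} \) term in the cost of each iteration, which is why large graphs need an approximation of the repulsive forces.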

3 The Method of Two-Layer Network Topology Visualization

In our research, we abstract the network and its applications into a network control layer and an application service layer. We focus on the topological structure of the network control layer and on the relationships between the two layers, so we use the 2.5D visualization method to design our two-layer model. This section introduces our work on the topological visualization of the network control layer and the display algorithm of the two-layer network model.

3.1 Visualization of Network Control Layer

As in social networks, real networks contain communities: the AS nodes within a community interact closely, while the links between communities are sparse. The traditional Louvain and FR algorithms have shortcomings in both display quality and efficiency. To make the view structured and well-distributed, we modify the Louvain algorithm by pruning leaf nodes and add community attractive forces to our modified FR algorithm.

The Modified Louvain Algorithm

A real network contains a large number of leaf nodes, that is, nodes connected to exactly one other node. Calculating the modularity gain ∆Q for every such node causes an unnecessary loss of computational efficiency. As shown in Fig. 1, the black node has five leaf nodes. Because community detection avoids leaving a single node as a community of its own, the leaf nodes and the black node must belong to the same community. Therefore, in the first step of the Louvain algorithm, we assign each leaf node directly to the community of its adjacent non-leaf node. The larger the proportion of leaf nodes in the network, the greater the gain in efficiency.
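The leaf assignment can be sketched as a preprocessing pass before step 1 of the Louvain algorithm. This is an illustrative sketch only; the function name and the small example graph (the hub of Fig. 1 plus two hypothetical non-leaf neighbors, nodes 6 and 7) are assumptions.

```python
from collections import defaultdict


def assign_leaves(edges):
    """Place each leaf in the community of its only neighbor.

    Returns (community, core): the initial community of every node, and the
    non-leaf nodes for which the modularity gain still has to be evaluated.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    leaves = {n for n, nb in neighbors.items() if len(nb) == 1}
    community = {}
    for n in neighbors:
        if n in leaves:
            (parent,) = neighbors[n]
            # An isolated edge has two leaves; give both the same label.
            community[n] = parent if parent not in leaves else min(n, parent)
        else:
            community[n] = n
    core = [n for n in neighbors if n not in leaves]
    return community, core


# Node 0 plays the role of the black node in Fig. 1, with leaves 1..5.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (6, 7)]
community, core = assign_leaves(edges)
print(community)  # leaves 1..5 share the label of node 0
print(core)       # [0, 6, 7]
```

In this example only three of the eight nodes remain candidates for the ∆Q evaluation, which is where the saving comes from.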

Fig. 1. Network with leaf nodes.

We obtained the routing data of the rrc00.ripe.net collector at 16:00 on May 7, 2018 from the RIPE Routing Information Service (RIS) and extracted the relationships between global AS nodes with Python. Because some edges appear in many AS paths and others in only a few, we use the number of AS paths containing an edge as its weight in the Louvain algorithm. Compared with an unweighted graph, this makes nodes joined by high-weight links more likely to fall into the same community.
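A minimal sketch of this weighting step is given below. The input format is an assumption (each AS path as a space-separated string of AS numbers, as in a text dump of BGP routes), the sample paths are invented for illustration and are not taken from the RIS data, and the function name is hypothetical.

```python
from collections import Counter


def as_edges_from_paths(as_paths):
    """Count, for each undirected AS pair, the number of AS paths that contain it."""
    weights = Counter()
    for path in as_paths:
        hops = [int(h) for h in path.split()]
        seen = set()
        for a, b in zip(hops, hops[1:]):
            if a != b:                       # ignore prepending (same AS repeated)
                seen.add(tuple(sorted((a, b))))
        weights.update(seen)                 # each path counts an edge at most once
    return weights


# Invented sample paths, for illustration only.
paths = ["4808 23724 55967", "4808 4808 23724", "9808 4808 23724"]
weights = as_edges_from_paths(paths)
for (a, b), w in sorted(weights.items()):
    print(f"AS{a} - AS{b}: {w}")
```

The resulting `(a, b) -> weight` mapping can be passed to a weighted community detection routine as its edge list.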

We classify the nodes and edges by country and select the 338 nodes and 544 edges of the China collection for the experiment. We run the Louvain algorithm and the modified Louvain algorithm on this dataset and compare them with the Fast Newman algorithm (FN) and the label propagation algorithm (LPA). Table 1 shows the results.

Table 1. Quantitative evaluation of the FN, LPA, Louvain and modified Louvain algorithms.

As shown in Table 1, the Fast Newman algorithm performs worst: its modularity Q is acceptable, but its iterations take too much time. LPA is the fastest of the four algorithms, but the Louvain algorithm achieves a much better modularity Q. Compared with the original Louvain algorithm, the modified one takes 4.12% less time. At the same time, the number of communities drops from 15 to 12 and the modularity Q increases from 0.6137 to 0.6231. On this dataset, the modified Louvain algorithm is therefore the most suitable community detection method for visualizing the network control layer.

The Modified FR Algorithm

Using the FR algorithm to lay out large-scale data raises two issues: (1) when there are too many communities, it is difficult to observe the community structure and the connections between communities; (2) the time complexity of FR is \( {\text{O}}(\left| {\text{E}} \right| + |V|^{2} ) \), so the calculation takes a long time when the number of nodes is large.

Therefore, building on the FR algorithm, we redefine the attractive force (\( f_{a} \)) and the repulsive force (\( f_{r} \)), and add a community force (\( f_{com} \)) that pulls the nodes of the same community closer together while keeping the layout even. To avoid local optima, we introduce an energy function into our method and use simulated annealing to approximate the optimal solution. In addition, the Barnes-Hut force-calculation model [25] is introduced to reduce the time complexity, so that the modified algorithm can be applied to large-scale network layouts.

Nodes joined by high-weight links should stay closer together than other nodes, so we introduce the weight of each edge into the calculation of the attractive force (\( f_{a} \)). In addition, high-degree nodes usually belong to different communities, so we want them to lie farther away from each other in order to separate the communities visually. We therefore also introduce the degrees of the two nodes into the calculation of the repulsive force (\( f_{r} \)) between all pairs of nodes. The forces are calculated as follows:

$$ \left\{ \begin{array}{l} f_{a} = \frac{d^{2} (n_{1}, n_{2}) \cdot w(n_{1}, n_{2})}{k} \\ f_{r} = - \frac{k^{2}}{d(n_{1}, n_{2})} \cdot \text{deg}(n_{1}) \cdot \text{deg}(n_{2}) \end{array} \right. $$
(4)

For nodes in the same community, we want edges with high weight to bind the nodes tightly together. Following the attractive force in formula (3) defined by FR, we introduce the sum of the edge weights in a community (\( w_{com} \)) and define the community force as follows:

$$ f_{com} = \frac{d^{2} (n_{1}, n_{2})}{k} \cdot w(n_{1}, n_{2}) \cdot \frac{w_{com}}{w_{all}} $$
(5)

Here \( {\text{d}}\left( {n_{1} ,n_{2} } \right) \) is the distance between nodes \( n_{1} \) and \( n_{2} \), \( w\left( {n_{1} ,n_{2} } \right) \) is the weight of the edge \( \left( {n_{1} , n_{2} } \right) \), \( {\text{deg}} \left( {n_{1} } \right) \) is the degree of node \( n_{1} \), \( w_{com} \) is the sum of the edge weights in a community, and \( w_{all} \) is the sum of the weights of all edges.
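Formulas (4) and (5) translate directly into three small functions. The sketch below only evaluates the force magnitudes; the function names and the sample values are assumptions, and it is assumed here that the community force is applied only to adjacent nodes of the same community.

```python
def attractive_force(d, w, k):
    """Formula (4): f_a = d^2 * w / k, for two nodes joined by an edge of weight w."""
    return d * d * w / k


def repulsive_force(d, deg1, deg2, k):
    """Formula (4): f_r = -(k^2 / d) * deg(n1) * deg(n2), for any two nodes."""
    return -(k * k / d) * deg1 * deg2


def community_force(d, w, w_com, w_all, k):
    """Formula (5): f_com = (d^2 / k) * w * (w_com / w_all)."""
    return d * d / k * w * (w_com / w_all)


# Sample values chosen for illustration only.
d, w, k = 2.0, 3.0, 4.0
print(attractive_force(d, w, k))                          # 3.0
print(repulsive_force(d, deg1=2, deg2=5, k=k))            # -80.0
print(community_force(d, w, w_com=10.0, w_all=40.0, k=k)) # 0.75
```

Since \( f_{com} = f_{a} \cdot w_{com} / w_{all} \), the community force strengthens the attraction inside a community in proportion to the share of the total edge weight that the community holds.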

To reduce the time complexity of the repulsive force calculation, we apply the Barnes-Hut force-calculation model to it. In our modified FR algorithm, the time complexity of the repulsion calculation is thereby reduced from \( O(|V|^{2} ) \) to \( O(|V|\log |V|) \).

We compare the modified FR algorithm with other classic layout algorithms on the public Dolphins dataset (62 nodes, 159 edges) and the Football Club dataset (115 nodes, 613 edges). The results are shown in Figs. 2 and 3.

Fig. 2. Visualization of the Dolphins dataset (a. raw data without a layout algorithm, b. Yifan Hu algorithm [26], c. ForceAtlas2 algorithm [27], d. FR algorithm, e. modified FR algorithm).

Fig. 3. Visualization of the Football Club dataset (a. raw data without a layout algorithm, b. Yifan Hu algorithm, c. ForceAtlas2 algorithm, d. FR algorithm, e. modified FR algorithm).

In the visualizations of the two datasets in Figs. 2 and 3, the modified FR algorithm shows the community relationships in the network more clearly and in a more structured way than the Yifan Hu and ForceAtlas2 algorithms. The communities that are more strongly connected in the current network state can be easily distinguished.

3.2 Establishment of Two-Layer Network Model

In our research, the real network is modeled as two layers: the application service layer and the network control layer. The locations of the nodes in the application service layer are closely related to the locations of the nodes in the network control layer. To reduce the visual confusion caused by crossings between the layers, we calculate and adjust the locations of the nodes in the application service layer. In general, one application service corresponds to multiple nodes in the network control layer. In our 2.5D model, let the coordinates of a node in the network control layer be \( x_{i} \), \( y_{i} \), \( z_{i} = 0 \). The coordinates of the corresponding node in the application service layer are then:

$$ \left\{ \begin{array}{l} x_{j} = (x_{i} + x_{i + 1} + \cdots + x_{i + n - 1} )/n \\ y_{j} = (y_{i} + y_{i + 1} + \cdots + y_{i + n - 1} )/n \\ z_{j} = 1 \end{array} \right. $$
(6)

Here \( x_{i} \), \( y_{i} \), \( z_{i} \) are the coordinates of a node in the network control layer, \( x_{j} \), \( y_{j} \), \( z_{j} \) are the coordinates of a node in the application service layer, and n is the number of network control layer nodes that correspond to one application service. Each application service node is thus placed above the centroid of the AS nodes it maps to.
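Formula (6) amounts to a centroid computation, which can be sketched as follows. The function name and the AS coordinates are hypothetical; only the AS numbers are taken from the Baidu Fanyi example in Sect. 4.

```python
def service_position(as_positions, mapped_as):
    """Formula (6): place a service at the centroid of its AS nodes, on the plane z = 1."""
    n = len(mapped_as)
    x = sum(as_positions[a][0] for a in mapped_as) / n
    y = sum(as_positions[a][1] for a in mapped_as) / n
    return (x, y, 1.0)


# Hypothetical (x, y) layout positions in the network control layer (z = 0).
as_positions = {
    "AS4808": (0.0, 0.0),
    "AS4847": (2.0, 0.0),
    "AS9808": (2.0, 2.0),
    "AS23724": (0.0, 2.0),
    "AS55967": (1.0, 1.0),
}
mapped = ["AS4808", "AS4847", "AS9808", "AS23724", "AS55967"]
print(service_position(as_positions, mapped))  # (1.0, 1.0, 1.0)
```

Because the service node sits above the centroid, the inter-layer links fan out from a single point over their targets, which tends to keep links of different services from crossing when the services map to different regions of the control layer.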

The layout algorithm built on the two-layer model is described in figure a.

Figure a. Layout algorithm of the two-layer model.

4 Visual Analysis

For the experiment, we select the 338 nodes and 544 edges of the China collection in the network control layer and 13 website nodes in the application service layer. The two layers are connected through IP mapping, and the result is shown in Fig. 4. The black nodes represent the 13 websites, and the remaining colored nodes are the AS nodes in the network control layer. Different colors represent different communities.

Fig. 4. Two-layer network topology for fault detection (FR algorithm and modified FR algorithm with community forces).

As shown in Fig. 4, comparing the two-layer network model built with the FR algorithm with the one built with the modified algorithm, the community structure produced by our modified algorithm is more evident and better structured.

A user who is interested in a website can select the corresponding node in the application service layer. All the relationships associated with it in the network control layer are then highlighted, and the AS node information can be read from the node labels. The result is shown in Fig. 5.

Fig. 5. Select a node in application service layer.

As shown in Fig. 5, taking the Baidu Fanyi node as an example, the AS nodes connected to it are AS4808, AS4847, AS9808, AS23724 and AS55967. Therefore, as long as these nodes are operating safely, Baidu Fanyi can be expected to run properly.

To observe the relationships of an AS node, or the business carried by it, the user can select that AS node in the network control layer. The result is shown in Fig. 6.

Fig. 6. Select an AS node (AS23724) in network control layer.

Figure 6 shows the result of selecting an AS node (AS23724 in this example) in the network control layer. All the nodes connected to AS23724, in both the network control layer and the application service layer, can be easily distinguished. If AS23724 fails, the nodes connected to it in the network control layer and the websites that rely on it in the application service layer may also be affected.
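The two selection operations reduce to simple lookups over the inter-layer mapping and the AS graph. The sketch below is illustrative only: the function names are hypothetical, the service-to-AS mapping uses the Baidu Fanyi example above, and the AS-to-AS links are invented because the actual neighbors of AS23724 are not listed in the text.

```python
def select_service(service, service_to_as):
    """AS nodes to highlight when a website is selected (as in Fig. 5)."""
    return sorted(service_to_as[service])


def select_as(asn, as_links, service_to_as):
    """Neighboring AS nodes and dependent websites when an AS node is selected (as in Fig. 6)."""
    neighbors = sorted({b if a == asn else a for a, b in as_links if asn in (a, b)})
    services = sorted(s for s, ases in service_to_as.items() if asn in ases)
    return neighbors, services


service_to_as = {
    "Baidu Fanyi": {"AS4808", "AS4847", "AS9808", "AS23724", "AS55967"},
}
# Invented links, for illustration only.
as_links = {("AS23724", "AS4808"), ("AS23724", "AS9808"), ("AS4808", "AS4847")}

print(select_service("Baidu Fanyi", service_to_as))
print(select_as("AS23724", as_links, service_to_as))
```

The second call returns both the control-layer neighbors and the websites that depend on the selected AS, which is the set of elements that may be affected if that AS fails.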

5 Conclusion

Current visualization of real networks focuses mainly on single-layer representations. Our research considers the application service layer and the network control layer together with the IP mapping relationship between them. In the network control layer, to avoid unnecessary loss of computational efficiency, we modified the Louvain algorithm for community detection by pruning leaf nodes. To make the layout of the network control layer structured and well-distributed, we added community attractive forces to the FR algorithm. Finally, we merged the application service layer and the network control layer into a 2.5D visual model that helps users analyze the network trend, topology and incidence relations.

In future work, we plan to add a geographic location layer to the multi-layer model to show the IP location of each node in the application service layer. With such a multi-layer visual model, the operating status and topology of the network can be observed from multiple dimensions.