Onion under Microscope: An in-depth analysis of the Tor network

Tor is an anonymity network that allows offering and accessing various kinds of resources, known as hidden services, while guaranteeing sender and receiver anonymity. The Tor web is the set of web resources that exist on the Tor network, and Tor websites are part of the so-called dark web. Recent research has evaluated Tor's security, its evolution over time, and its thematic organization. Nevertheless, little information is available about the structure of the graph defined by the network of Tor websites. The limited number of Tor entry points that can be used to crawl the network makes the study of this graph far from simple. In this paper we aim at better characterizing the Tor Web by analyzing three crawling datasets collected over a five-month time frame. On the one hand, we extensively study the global properties of the Tor Web, considering two different graph representations and verifying the impact of Tor's renowned volatility. We present an in-depth investigation of the key features of the Tor Web graph, showing what makes it different from the surface Web graph. On the other hand, we assess the relationship between contents and structural features. We analyse the local properties of the Tor Web to better characterize the role different services play in the network and to understand to what extent topological features are related to the contents of a service.


Introduction
"Darkweb" is a generic name used to denote the part of the web that is accessible only through specific privacy-preserving browsers. Besides not being indexed by popular search engines, the darkweb is "dark" in the sense that the identity of both the services offering contents and the users enjoying them may be kept anonymous through overlay networks implementing suitable cryptographic protocols. Tor is probably the best known and most widespread overlay network, owing its name to The Onion Routing protocol it is based upon. Tor guarantees privacy and anonymity by redirecting traffic through a set of relays, each adding a layer of encryption to the data packets they forward. Past research on the Tor network has evaluated its security [1], evolution [2], and thematic organization [3]. Nevertheless, an in-depth study of Tor's characteristics is difficult due to the limited number of Tor entry points on the surface Web.
In this paper, building on and extending previous results on the topic [4,5], we aim at better characterizing the Tor Web by analyzing several crawling datasets collected over a five-month time frame. Those data are aggregated by Hidden Service (HS), the equivalent of a domain on the surface web, in order to provide a structural analysis of these services and of their interconnections (i.e., hyperlinks). In particular, the paper investigates the existence of specific Tor Web features and, where possible, infers the latent patterns of interaction among Tor users. It also assesses whether graph metrics can be used to tell apart hidden services playing specific/unique roles in the network.
In addition, by making use of a publicly available topic-tagged dataset of Tor hidden services [6], the paper assesses potential relationships between contents and structural features.
In line with previous work on the WWW [7] and with a recent trend for criminal networks and dark/deep web [4,8,9,10], we will especially focus on the topology of the Tor web graph as a source of valuable information about Tor as a complex system, shedding light on usage patterns as well as dynamics and vulnerabilities of the Tor network. To that end, along with the three snapshot graphs induced by the three crawling data sets, we will also consider an intersection graph and a union graph, in an effort to discriminate intrinsic features from noise. As a side effect, the present paper also addresses several open questions about the persistence of the Tor Web, showing the actual changes that took place in the quality, quantity and shape of available services and in their interconnections over the considered time span.
This work provides several contributions, the most relevant of which are summarized as follows:
• On the one hand, we extensively study the global properties of the Tor Web, considering two different graph representations (directed and undirected) and verifying the impact of Tor's renowned volatility. Among other findings, we show that: Tor is a small world but is inefficient, consisting of a stable core of mostly in- and out-hubs, surrounded by an unstable periphery of services that point to or are pointed to by the services in the core; when only mutual connections are considered, we obtain an undirected version of the Tor Web that is not a small world anymore, but that better preserves the social structure of the graph, such as its community structure, which appears to be generally stable and, as such, meaningful. Overall, Tor turns out to have significant structural differences with respect to the WWW.
• On the other hand, we analyse the local properties of the Tor Web to better characterize the role that different services play in the network and to understand to what extent topological features are related to the contents/structure of a service. We show that authorities and hubs (as defined in [11]) are indeed separate services in Tor. We provide evidence that both the volatility of Tor's hidden services and the tendency of services to cluster together are unrelated to the services' content. We verify that switching to mutual connections also impacts the distribution of local metrics and brings to the surface the social life of Tor in a broad sense. Finally, we show that some topological metrics are informative of the activity occurring on a service and especially of whether this activity is "suspicious" (as defined in [12]).
To the best of our knowledge, this is to date the widest and most accurate study of this type on the Tor Web, exceeding previous efforts in the literature [13,10,3,5].

Related Work
Several works studying the topology of the underlying network and/or semantically analyzing Tor contents have appeared so far. In particular, Biryukov et al. [1] managed to collect a large number of hidden service descriptors by exploiting a since-fixed Tor vulnerability, finding that the most popular hidden services were related to botnets. Owen et al. [14] reported on the persistence, contents, and popularity of hidden services by operating 40 relays over a 6-month time frame. Their aim was to classify services based on their contents. Griffith et al. [10] performed a topological analysis of the Tor hidden services graph. They crawled the Tor network using the scrapinghub.com commercial service through the tor2web proxy onion link. Interestingly, they reported that more than 87% of Darkweb sites never link to another site. The main difference with our work lies in both the extent of the explored network (we collected a much more extensive dataset than that accessible through tor2web) and the depth of the network analysis (we evaluate a far larger set of network characteristics). Ghosh et al. [15] employed another automated tool to explore the Tor network, mapping onion site contents to a set of categories and clustering Tor services to categorize onion content. The main limitation of that work is that it focused on page contents/semantics and did not consider network topology. Christin et al. [9] collected crawling data on Tor hidden services over an 8-month lifespan. They evaluated the evolution/persistence of such services over time, and performed a study on the contents and the topology of the explored network. The main difference with our work is that the Tor graph we explore is much larger, not being limited to a single marketplace. In addition, we present here a more in-depth evaluation of the graph topology. De Domenico et al. [16] used the data collected in [17] to study the topology of the Tor network. 
They gave a characterization of the topology of the Darknet and proposed a generative model for the Tor network to study its resilience. Their viewpoint is quite different from ours, as they consider the network at the autonomous system (AS) level. Duxbury et al. [18] examine the global and local network structure of an encrypted online drug distribution network. Their aim is to identify vendor characteristics that can help explain variations in the network structure. Their study leverages structural measures and community detection analysis to characterize the network structure. ToRank [12] [20] analyzed a large amount of onion domains obtained using the Ichidan search engine and the Fresh Onions site. They classified every encountered onion domain into 6 categories, creating a directed graph and attempting to determine the relationships and characteristics of each instance. Norbutas et al. [21] made use of publicly available crawls of a single cryptomarket (Abraxas) during 2015 and leveraged descriptive social network analysis and Exponential Random Graph Models (ERGM) to analyze the structure of the trade network. They found the structure of the online drug trade network to be primarily shaped by geographical boundaries, leading to strong geographic clustering, especially strong between continents and weaker for countries within Europe. As such, they suggest that cryptomarkets might be more localized and less international than previously thought. So far, the largest Tor dataset collected from an automated Tor network exploration is due to Bernaschi et al. [4], whose work aimed at relating semantic content similarity with Tor topology. Further work [5] by the same authors features a very detailed network topology study, investigating similarities and differences with respect to the surface Web and applying a novel set of measures to the data collected by automated exploration.

Roadmap
The rest of the paper is organized as follows. In Section 2 we describe our dataset, including statistics about the organization of the hidden services as websites: tree map, number of characters and number of links. We introduce our hidden services graphs in Section 3, briefly recalling how these graphs were extracted and characterizing them by analyzing properties such as global metrics, degree distribution and community structure. In Section 4 we study local (i.e., vertex-level) properties of our graphs in relation to content-based classification: (i) we provide a correlation analysis of several metrics, (ii) we measure the prevalence of thematic classes in graphs and communities, and (iii) we study the information gain provided by topological features for topic-based classification. Finally, we draw conclusions in Section 5.

The Dataset
The dataset considered in the present paper is the result of three independent scraping procedures of the Tor Web, described in detail in [5]. We executed three independent six-week runs of our customized crawler, resulting in three "snapshots" of the Tor Web: SNP1, SNP2 and SNP3. More details on the design of the crawler and the outcome of the scraping procedures can be found in [22,5].
It is quite common to analyze a dataset obtained by crawling the web. Yet, it must be kept in mind that the analysis may be susceptible to fluctuations due to the order in which pages were first visited (and, hence, not revisited thereafter) [23]. In the case of the Tor Web, the issue is exacerbated by the renowned volatility of Tor hidden services [1,13,14]. By executing three independent scraping attempts over five months, we aimed at making our analysis more robust and at telling apart "stable" and "temporary" features of the Tor Web.
In total, we reached millions of onion pages (more than 3 million in the second run alone) and almost 30 thousand distinct hidden services. The distribution of these hidden services across the three snapshots is depicted as a pie diagram in Figure 1. Although active services may temporarily appear offline to the crawler (e.g., because all paths to those services are concurrently unavailable), these statistics are quite informative about the volatility of the Tor web. "Only" 10685 onion URLs were successfully reached by all three crawling runs. It is quite likely that those hidden services were durably present over the considered five-month time frame; they account for 83.3% of SNP1, 42.2% of SNP2 and 61.2% of SNP3, respectively. Among the hidden services that are absent from just one of the three data sets, especially notable are the 76 hidden services that reappeared in SNP3 after not having been found during SNP2.
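The persistence figures above boil down to set operations over the services seen in each snapshot. The following is a minimal sketch of that bookkeeping, assuming each snapshot is available as a set of onion names; the function name `persistence` and the toy onion names are hypothetical and not taken from the dataset.

```python
def persistence(snapshots):
    """Return the services found in every snapshot and their share of each snapshot."""
    stable = set.intersection(*snapshots.values())
    shares = {name: len(stable) / len(services)
              for name, services in snapshots.items()}
    return stable, shares


# Toy data (hypothetical onion names, not taken from the dataset).
snapshots = {
    "SNP1": {"a.onion", "b.onion", "c.onion"},
    "SNP2": {"a.onion", "b.onion", "d.onion", "e.onion"},
    "SNP3": {"a.onion", "b.onion", "c.onion", "e.onion"},
}
stable, shares = persistence(snapshots)

# Services seen in SNP1 and SNP3 but missed during SNP2.
reappeared = (snapshots["SNP1"] & snapshots["SNP3"]) - snapshots["SNP2"]
```

On the real data, `shares` would correspond to the three percentages reported above and `reappeared` to the services that skipped the second snapshot.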
To gain a better understanding of the structure and degree of mutability of Tor contents, we resorted to a few high-level indicators. On the one hand, we reconstructed the whole tree structure of the sub-domains and pages of each hidden service; on the other hand, we computed the total number of characters and the total number of hyperlinks (i.e., the number of hrefs in the HTML source) of each service. These metrics can be used as proxies for the complexity of Tor domains, taking both structure and contents into account. In particular, we identified all onions whose tree structure and/or text volume remained constant across all snapshots, both to obtain a further measure of volatility and to better characterize stable services.
Figure 1: Services persistence over time: outer disc is SNP1, middle disc is SNP2, inner disc is SNP3.
Figure 2 shows the statistical distribution of tree heights for the three snapshots and for the 6961 hidden services (approximately 65% of all stable onion URLs) whose tree structure remains consistent across all snapshots (denoted "persistent" in the legend). The figure shows that the trees are generally very short, and thus poorly informative on the "nature" of the service, and that only "short" trees seem to be persistent. Figure 3 instead shows the joint distribution of char and link counts, where, together with the three snapshots, we considered the 4590 hidden services (approximately 43% of all stable onion URLs) whose char count remains constant across all snapshots (again, denoted "persistent" in the legend). While the char count is generally variable, services with 0 links are predominant. This is especially true for hidden services with a persistent char count (hence, presumably persistent contents), which appear to be very peripheral, besides having a lower than average char count. Although there is no visible correlation between char and link counts, there seems to be an almost constant upper bound for the ratio of links over chars. This is not entirely surprising, since clickable hrefs require a small amount of text. In particular, we highlighted the plane region bounded by web pages having one link every 20 chars (i.e., ≈ 3 words) and one link every 200 chars (i.e., 1 to 2 sentences), a pattern that we may expect for link directories. In Section 4, we will correlate the links-to-char ratio (LCRatio) with a set of topological features to better investigate the relation between topological and content-related features. The above analysis also leads to a further finding: automatically detecting hidden services that stay durably online but under different names is hardly feasible, because the contents often vary with time even for stable hidden services. The tree structure and the char count are in fact only proxies for the contents of a hidden service, and yet the portion of stably-structured hidden services, as well as that of stably-sized ones, is far from 100%. Although we ideally aimed at determining the persistence of a service regardless of whether it changed its name (e.g., to avoid being tracked down), in the remainder we will therefore rely on the onion URL as the unique hidden service identifier.
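The link-directory band described above (between one link every 200 chars and one link every 20 chars) can be expressed as a simple test on the LCRatio. This is an illustrative sketch only: the function names are hypothetical, and treating the band limits as inclusive is an assumption.

```python
def lc_ratio(links, chars):
    """Links-to-char ratio of a service; None when the service has no text."""
    return links / chars if chars > 0 else None


def in_directory_band(links, chars, low=1 / 200, high=1 / 20):
    """True if the ratio lies between one link per 200 chars and one per 20 chars.

    Assumption: both bounds are treated as inclusive.
    """
    ratio = lc_ratio(links, chars)
    return ratio is not None and low <= ratio <= high
```

For example, a service with 10 links over 1000 chars falls inside the band, whereas one with 100 links over the same amount of text exceeds the upper bound.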

Characterizing the Tor Web Graphs
The aforementioned scraping procedures produced three WARC files. From each of them, we extracted two graphs: a Directed Service Graph (DSG) and an Undirected Service Graph (USG). As detailed in [22], a vertex of these graphs represents the set of pages belonging to a hidden service. In the DSG, a directed edge is drawn from hidden service HS1 to HS2 if any page in HS1 contains at least one hyperlink to any page in HS2. The directed graphs obtained from the three snapshots are denoted DSG1, DSG2, and DSG3, respectively. In the USG, instead, an undirected edge connects hidden services HS1 and HS2 if they are mutually connected in the corresponding DSG, that is, if there exists at least one page in HS1 linking to any page in HS2 and at least one page in HS2 linking to any page in HS1. A vast majority of isolated vertices "naturally" emerges from only considering mutual connections; these are ignored in the following (i.e., we consider edge-induced graphs), since they convey no structural information. The undirected graphs obtained from the three snapshots are denoted USG1, USG2, and USG3, respectively.
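The construction just described can be sketched as follows, under a few stated assumptions: page-level links are given as (source URL, target URL) pairs, the hypothetical helper `service_of` maps a page to its hidden service via the URL host, links inside the same service are dropped, and the weight of a mutual edge is the sum of the two directed counts. None of these choices is confirmed by the text; they are made for illustration only.

```python
from collections import Counter
from urllib.parse import urlsplit


def service_of(url):
    """Hypothetical helper: identify a hidden service by the host of a page URL."""
    return urlsplit(url).hostname


def build_dsg(page_links):
    """Collapse page-level hyperlinks into service-level directed edges with link counts."""
    dsg = Counter()
    for src_page, dst_page in page_links:
        src, dst = service_of(src_page), service_of(dst_page)
        if src != dst:  # assumption: links within the same service are not edges
            dsg[(src, dst)] += 1
    return dsg


def build_usg(dsg):
    """Keep only mutual connections; isolated vertices disappear (edge-induced graph)."""
    usg = {}
    for (src, dst), count in dsg.items():
        if (dst, src) in dsg:
            # assumption: the undirected weight sums both directions
            usg[frozenset((src, dst))] = count + dsg[(dst, src)]
    return usg


# Toy crawl with hypothetical URLs.
page_links = [
    ("http://a.onion/1", "http://b.onion/"),
    ("http://a.onion/2", "http://b.onion/x"),
    ("http://b.onion/", "http://a.onion/1"),
    ("http://a.onion/1", "http://c.onion/"),
    ("http://a.onion/1", "http://a.onion/2"),
]
dsg = build_dsg(page_links)
usg = build_usg(dsg)
```

In the toy crawl, only `a.onion` and `b.onion` link to each other, so `c.onion` appears in the DSG but not in the USG.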
As highlighted in Section 2, the snapshot graphs are inevitably conditioned by the effect of scraping a reputedly volatile network. We therefore also consider the intersection and the union of the aforementioned graphs. Precisely, we consider edge-induced intersection and union, because hyperlinks are generally "less stable" than hidden services (for more details, see [5]) and, again, we choose to ignore isolated vertices. This means that DSGI is induced by the set of common edges among DSG1, DSG2 and DSG3, whereas DSGU is induced by the set of edges existing in at least one of them. Analogously, USGI is induced by the set of common edges among USG1, USG2 and USG3, whereas USGU is induced by the set of edges existing in at least one of them. Both USGI and USGU are composed of multiple connected components.
We do not preserve multi-edges so as to permit a direct comparison with most previous work on other web and social/complex networks; indeed, even simple metrics (e.g., the degree) may be misleading for multi-graphs. However, algorithms on web graphs are often based, either directly or indirectly, on modeling a random web surfer [24], so we shall not neglect the fact that some edges may be more likely travelled by the surfer than others. In both directed and undirected graphs, we store the number of links that have been "flattened" onto an edge as a weight attribute assigned to that edge, taking the minimum available weight for edges of our intersection graph and the maximum for the union. Whether to use this weight in practice will be discussed separately for each task. We believe that the edge weight may be interpreted as a measure of connection strength, expressing endorsement/trust but not altering distances. This is the reason why we will ignore it in all global metrics discussed in the following.
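The edge-induced intersection and union, with the minimum and maximum weight rules stated above, can be sketched as below. The representation of a graph as a dictionary from edge to weight, and the function names, are assumptions made for this example.

```python
def edge_intersection(graphs):
    """Edges present in every graph, each carrying the minimum of its weights."""
    common = set.intersection(*(set(g) for g in graphs))
    return {edge: min(g[edge] for g in graphs) for edge in common}


def edge_union(graphs):
    """Edges present in at least one graph, each carrying the maximum of its weights."""
    every = set.union(*(set(g) for g in graphs))
    return {edge: max(g[edge] for g in graphs if edge in g) for edge in every}


def vertices(graph):
    """Vertex set induced by the edges, so isolated vertices never appear."""
    return {v for edge in graph for v in edge}


# Toy snapshots: edge (source, target) -> number of flattened links.
g1 = {("a", "b"): 3, ("b", "c"): 1}
g2 = {("a", "b"): 1, ("c", "d"): 2}
g3 = {("a", "b"): 5, ("b", "c"): 4}
inter = edge_intersection([g1, g2, g3])
union = edge_union([g1, g2, g3])
```

Because both graphs are edge-induced, a service that exists in all three snapshots but shares no persistent edge does not appear in the intersection graph.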

Global Metrics
As a first step towards a possible understanding of Tor dynamics, we characterize our ten graphs relying on well-known metrics, summarized in Table 2; graph-related symbols used throughout the paper are reported in Table 1.
Many of these metrics are undefined for disconnected graphs, or may provide misleading results when evaluated over multiple isolated components. To make up for this and allow for a fair comparison, in the remainder of this section we only consider the giant (weakly) connected component of all disconnected graphs. It is worth mentioning that the three DSGs, and therefore their union DSGU, consist of a single weakly connected component. On the contrary, all USGs are disconnected graphs. DSGI is also disconnected, albeit only two hidden services (violet77pvqdmsiy.onion and typefacew3ijwkgg.onion) are isolated from the rest and connected by an edge. We will again consider the graphs in their entirety in Section 3.3.
Table 2: Global metrics. For all graphs: average (in-/out-) degree; ρ, assortativity (see (26)); average shortest path length; E_glo, global efficiency. For DSGs only: out-degree centralization. For USGs only: ∆/N, normalized maximum degree; Cen, degree centralization; C, global clustering coefficient (# closed triplets / # all triplets).
Table 3 provides a first glimpse of the structure of the Tor web in terms of global metrics of our DSGs. We observe a significant variance in the sizes N and M of the three snapshots, which is however consistent with publicly available aggregated statistics, as already discussed in [5]. In all graphs, there are huge out-hubs reaching 35% to 61% of the network, but no equivalently prominent in-hubs. The values of Cen_out show that the network tends to be out-centralized, and the high values of ∆_out/N denote the importance of large out-hubs in the graphs' connectivity. However, the greatest of such hubs emerges in the largest graphs, and ∆_in/N and ∆_out/N are comparably smaller in DSGI with respect to the snapshots, suggesting that the degree of such stable hubs is heavily influenced by the non-persistent nodes. 
Our analysis suggests the presence of a stable core mainly composed of in- and out-hubs, and of an unstable periphery composed of nodes that point to or are pointed to by the nodes in the core. The assortativity ρ is computed according to Newman's original definition [25], i.e., it measures the correlation between a node's out-degree and the adjacent nodes' respective in-degree. All networks are disassortative, meaning that links are more likely to connect high-out-degree nodes to low-in-degree nodes, or low-out-degree nodes to high-in-degree nodes. In other words, hubs (i.e., link directories) "rarely" link to authorities. The reported transitivity T measures how often vertices that are adjacent to the same vertex are connected in at least one direction. This happens more often than expected, as emerges from comparing deg/N with the measured values for T. The diameter d and the average path length dist are approximately logarithmic in N, as in most social and web graphs. In other words, the Tor web graph is a small world. However, these metrics only consider finite (i.e., existing) paths. If we also include infinite paths, as done when computing the global efficiency E_glo, we see that many vertex pairs are disconnected and information diffusion in the Tor web is quite inefficient.
From Table 4 we immediately see that the sizes N and M of the three snapshots are again variable, but, somewhat surprisingly, USG3 is now the biggest. This is probably related to the existence in USG3 of a large hub that is absent in the other two snapshots. Indeed, the value of Cen shows that the network tends to be centralized, and the high value of ∆/N denotes the importance of such large hubs in the graphs' connectivity. Although all of these graphs are much smaller, the average distance and the diameter are comparable to, or even greater than, their directed counterparts. 
This tells us that long "bidirectional" paths exist and that, when only mutual connections are considered, the Tor web is not a small world anymore. For undirected graphs, the assortativity ρ measures the tendency of a node to connect with other nodes having similar degree. As all networks are again disassortative, we know that hubs are more likely connected with peripheral nodes. However, in both USG1 and USG2, where the tendency of vertices to cluster together is not hidden by the presence of a dominant hub, we notice that the clustering coefficient C is one order of magnitude greater than deg/N, that is, its estimate in completely random graphs. Finally, these graphs are much more efficient than the DSGs, although this may be due simply to all pairs now being connected in both directions (in fact, E_glo ≈ 1/dist).

Degree Distribution
Figure 4 shows the distributions of the in- and out-degree for all five directed graphs on a log-log scale. We fitted a power law to the distribution using the statistical methods developed in [26], relying on the implementation provided by the POWERLAW Python package [27]. We also fitted a log-normal and reported the comparison of these two fits in the figure. As already suggested in the literature [28], a log-normal distribution may be a better fit of degree distributions in many complex networks. In particular, a recent work suggests that a log-normal distribution may emerge from the combination of preferential attachment and growth [29]. In our DSGs, the log-normal fit slightly outperforms the power-law fit for the tail of the distribution. It is worth specifying that POWERLAW autonomously finds a lower bound k_min for the degrees to be fitted. In this case, even if k_min is much less than the maximum degree, all values greater than k_min account for just a small percentage of the whole graph. However, we believe this should not prevent us from taking these fits seriously: the tail of the distribution de facto describes the central part of the graph that actually has a meaningful structure, as opposed to the bulk of the distribution, which mostly depicts vertices with out-degree 0 (83% to 95% of the graph, depending on the specific DSG considered) and/or in-degree 1 (17% to 43%).
Looking at the numbers, we immediately realize the huge difference between the in-degree and out-degree distributions. This is not surprising, since very different interpretations can be given of these two quantities: while the in-degree measures the ability of a service to attract interest, the out-degree is just a measure of a "design choice" (how many links to include in a web page), which is related to the nature of the service but not inherently associated with any social characteristics of the network. For DSG2, DSG3 and DSGU the α exponent of the in-degree distribution is greater than the threshold 3 that is known to control the variance of the distribution, whereas for DSG1 α is close to 2.9 and for the DSGI graph it is close to 2.7. Intuitively, this says that the DSG1 and the DSGI "look more like" social networks. All out-degree distributions have instead α ≈ 1.5, which is a very low value reflecting the existence of many large out-hubs, i.e., link directories or similar web services. In other words, the long tail of the out-degree distribution says that the large value of ∆_out observed in Section 3.1 is not an isolated case, but rather evidence of a general trend.
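To make the role of the exponent α concrete, the sketch below estimates it with the standard continuous maximum-likelihood formula, α = 1 + n / Σ ln(x_i / x_min), on synthetic samples. It is not the procedure of the POWERLAW package: the lower bound x_min is fixed by hand here instead of being selected automatically, degrees are treated as continuous values, and the function names and synthetic data are assumptions made for illustration.

```python
import math
import random


def fit_alpha(values, x_min):
    """Continuous maximum-likelihood estimate of the exponent for the tail x >= x_min."""
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def sample_power_law(alpha, x_min, n, rng):
    """Draw n continuous power-law samples by inverse-transform sampling."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


rng = random.Random(0)
# Synthetic stand-ins: a heavy tail (alpha = 1.5) and a lighter one (alpha = 3.2).
heavy = sample_power_law(alpha=1.5, x_min=1.0, n=20000, rng=rng)
light = sample_power_law(alpha=3.2, x_min=1.0, n=20000, rng=rng)
alpha_heavy = fit_alpha(heavy, 1.0)
alpha_light = fit_alpha(light, 1.0)
```

A power law has finite variance only when α > 3, which is why the threshold matters: samples drawn with α ≈ 1.5 are dominated by a handful of enormous values, much like the large out-hubs discussed above.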
In Figure 5, we report the fitted degree distribution for the USGs. These plots seem to broadly confirm the insights provided by the DSGs. In addition, they show that only considering mutual connections is apparently an effective way of preserving the social structure of the graph when switching from a directed to an undirected graph. Furthermore, apart from a couple of outliers in the union graph, the power-law fit is especially accurate in the two largest graphs (the USG2 and the USGU), which speaks in favour of keeping all available information together to capture the social dynamics of Tor.
Motivated by the long tail of the degree distributions and with the purpose of gaining a better understanding of how the whole graph can be explored from just a few starting points, in Figure 6 we plot the cumulative percentage of the giant component of the network that is at distance one from the 25 top hubs, i.e., top out-degree vertices in the DSGs and top degree vertices in the USGs. We see that just a few out-hubs suffice to reach ≈ 90% of the graph in just one click in DSG2, DSG3 as well as in DSGU. In DSG1, however, we need the top-20 out-degree hidden services to reach the same percentage of the graph, although the top-6 out-degree services reach almost 80% of the nodes. This difference between DSG1 and the other snapshots is reflected in the behavior of DSGI. If we switch to the USGs, we see a completely different scenario. The giant components of USG3 and USGU are dominated by just two services, with all others giving a negligible contribution. On the other hand, the giant components of USG1, USG2 and USGI are much less centralized, in accordance with the much lower value for Cen reported in Table 4. What is especially surprising when comparing Figures 6a and 6b is that shifting from directed to mutual connections seems to completely alter the topology of DSG2.
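The cumulative coverage curve described above can be computed along the following lines. This is a sketch under assumptions: the graph is given as a dictionary from each service to the list of services it links to, and the hubs themselves are counted as reached, which the text does not specify.

```python
def hub_coverage(out_neighbors, top_k):
    """Cumulative fraction of vertices within one click of the top-k out-degree hubs."""
    nodes = set(out_neighbors)
    for targets in out_neighbors.values():
        nodes |= set(targets)
    hubs = sorted(out_neighbors, key=lambda v: len(out_neighbors[v]), reverse=True)[:top_k]
    reached, curve = set(), []
    for hub in hubs:
        # assumption: a hub counts as reached together with its out-neighbours
        reached |= set(out_neighbors[hub]) | {hub}
        curve.append(len(reached) / len(nodes))
    return curve


# Toy directed graph with two hypothetical link directories.
toy = {
    "dir1": ["a", "b", "c", "d"],
    "dir2": ["c", "d", "e"],
    "a": ["b"],
    "f": [],
}
curve = hub_coverage(toy, top_k=2)
```

Each entry of `curve` gives the share of the graph covered after adding one more hub, so a curve that saturates after a few entries signals a strongly centralized graph.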

Community Structure
Let us now examine the community structure of the 10 Tor Web graphs. We used the well-known Louvain algorithm [30], based on modularity maximization. As often done in the literature [31], we considered edge weights to make it harder to break an edge corresponding to a hyperlink that appears several times in the dataset. In Figures 7 and 8 we compare the obtained community structures for all of our graphs. First, in Figure 7 we plot the distribution of cluster sizes for the DSGs and USGs, respectively. These two plots highlight that, at a high level, all DSGs appear to have a very similar structure in terms of number and size of the clusters. This sort of similarity is still partially visible in the USGs, with the significant difference in graph size apparently mostly impacting the size of the largest communities. It is possibly more important to assess to what extent the obtained clusters are influenced by the network's volatility or, in other words, whether they are coherent across different graphs, which would support the possibility that the pinpointed clusters have an intrinsic meaning. In Figure 8 we use the well-known Adjusted Mutual Information (AMI) to compare the clusters that emerged across different graphs. We recall that the AMI of two partitions is 1 if the two partitions are identical, it is 0 if the mutual information of the two partitions equals the expected mutual information of two random partitions, and it is negative if the mutual information of the two partitions is worse than the expected one. We see that common vertices are clustered in an extremely stable way in the USGs, meaning that the existence of a mutual link is, as expected, a stronger indicator of the similarity between two services. We also see that the union graphs DSGU and USGU, i.e., the graphs based on all collected data, are those whose community structure is least influenced by switching from the directed to the undirected graph. 
In some sense, this means that the clustering obtained for DSGU can be reasonably considered as an extension of the very meaningful partition obtained for USGU.
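The three reference values of the AMI recalled above can be checked on small examples. The sketch below is an illustrative implementation, not the code used for Figure 8: it assumes the usual hypergeometric model for the expected mutual information and normalises by the arithmetic mean of the two entropies, and both choices are assumptions. It is also far too slow for large partitions.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    n = len(u)
    a, b, joint = Counter(u), Counter(v), Counter(zip(u, v))
    return sum(nij / n * math.log(n * nij / (a[i] * b[j]))
               for (i, j), nij in joint.items())


def expected_mutual_information(u, v):
    """Expected MI of random partitions with the same cluster sizes as u and v."""
    n = len(u)
    emi = 0.0
    for ai in Counter(u).values():
        for bj in Counter(v).values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                # hypergeometric probability of observing an overlap of size nij
                prob = math.comb(ai, nij) * math.comb(n - ai, bj - nij) / math.comb(n, bj)
                emi += nij / n * math.log(n * nij / (ai * bj)) * prob
    return emi


def adjusted_mutual_information(u, v):
    mi, emi = mutual_information(u, v), expected_mutual_information(u, v)
    mean_entropy = (entropy(u) + entropy(v)) / 2  # assumption: arithmetic-mean normalisation
    return (mi - emi) / (mean_entropy - emi)


same = adjusted_mutual_information([0, 0, 0, 1, 1, 2], ["x", "x", "x", "y", "y", "z"])
crossed = adjusted_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Two partitions that differ only in their labels give an AMI of 1, while the crossed example, whose mutual information is zero and thus below the chance level, gives a negative value.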

Bow-Tie Structure
Finally, as commonly done to describe Web graphs [7], in Table 5 we provide a bow-tie decomposition of our directed graphs, compared with previous work. Our findings broadly confirm what emerged in [10], i.e., that in the Tor Web the LSCC is very small and everything else falls into the OUT component, giving rise to a significantly different structure with respect to the WWW. However, by comparing three large crawls of the Tor Web we highlight a few features that were not noticed before. On the one hand, the share of the LSCC in the total size of the graph is very variable over time, to the point that in DSG3 it is ∼ 4× larger than in the other two snapshots. On the other hand, the structure of the DSGI graph is slightly more similar to the WWW, with all components being non-empty.
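A bow-tie decomposition of the kind reported in Table 5 can be sketched as follows: find the largest strongly connected component (LSCC), then label as OUT the vertices reachable from it and as IN the vertices that can reach it. The code is a simplified illustration under assumptions: the strongly connected components are found with a quadratic procedure that only suits small graphs, and all remaining vertices (tendrils, tubes, disconnected parts) are lumped into a single OTHER group.

```python
from collections import deque


def reachable(adj, sources):
    """All vertices reachable from `sources` by following the edges in `adj` (BFS)."""
    seen, queue = set(sources), deque(sources)
    while queue:
        for nxt in adj.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(edges):
    fwd, bwd, nodes = {}, {}, set()
    for src, dst in edges:
        fwd.setdefault(src, set()).add(dst)
        bwd.setdefault(dst, set()).add(src)
        nodes |= {src, dst}
    # Largest strongly connected component; quadratic, acceptable for a sketch only.
    lscc, unassigned = set(), set(nodes)
    while unassigned:
        v = next(iter(unassigned))
        scc = reachable(fwd, [v]) & reachable(bwd, [v])
        unassigned -= scc
        if len(scc) > len(lscc):
            lscc = scc
    seed = list(lscc)
    out = reachable(fwd, seed) - lscc
    in_ = reachable(bwd, seed) - lscc
    other = nodes - lscc - in_ - out
    return {"LSCC": lscc, "IN": in_, "OUT": out, "OTHER": other}


# Toy graph: a 3-cycle core, one vertex pointing in, a chain pointing out, a detached pair.
parts = bow_tie([("a", "b"), ("b", "c"), ("c", "a"),
                 ("i", "a"), ("c", "o"), ("o", "p"), ("t", "q")])
```

On a graph shaped like the Tor Web as described above, the LSCC set would be small and most vertices would end up in OUT.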

Discussion
Summarizing our results, we conclude that, although the Tor web graph presents a few common features of other real-world networks, it has a significantly different structure with respect to the WWW. In the Tor Web the LSCC is very small and everything else falls into the OUT component. Only the structure of the DSGI graph is slightly more similar to the WWW, with all components being non-empty.
Generally speaking, Tor is a small-world network composed of a large percentage of volatile hidden services. The network is characterized by the presence of in- and out-hub nodes that are critical for the graph's connectivity. The hubs are persistent/stable nodes whose degree is heavily influenced by non-persistent nodes. In particular, there are few in-hubs and several large out-hubs in the network, i.e., link directories or similar web services. More than half of the network's nodes are reachable in just one click from the top out-hubs. Peripheral services are however loosely connected, making the Tor Web an inefficient network.
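The contrast drawn above between short paths and low efficiency can be reproduced on a toy graph. The sketch below computes the average length of the finite shortest paths and the global efficiency, i.e., the mean of 1/d over all ordered vertex pairs with unreachable pairs contributing 0. Edge weights are ignored, as stated for the global metrics; the function names and the star-shaped example are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every vertex reachable from it."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def path_metrics(adj):
    """Return (average finite shortest-path length, global efficiency) of a directed graph."""
    nodes = set(adj) | {w for targets in adj.values() for w in targets}
    finite = []
    for v in nodes:
        finite += [d for w, d in bfs_distances(adj, v).items() if w != v]
    n = len(nodes)
    avg_dist = sum(finite) / len(finite)
    # unreachable ordered pairs add 0 to the sum but still count in the denominator
    efficiency = sum(1 / d for d in finite) / (n * (n - 1))
    return avg_dist, efficiency


# A single out-hub linking to four services that link nowhere.
star_dist, star_eff = path_metrics({"hub": ["a", "b", "c", "d"]})
```

In the star example every existing path has length 1, yet only 4 of the 20 ordered pairs are connected, so the efficiency is just 0.2: short finite distances and poor overall reachability can coexist.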
Although many Tor nodes are not persistent, the graph as a whole seems to possess a meaningful and stable community structure. This is especially visible when only mutual connections are considered. "Bidirectional" ties not only provide more consistent communities, but they also generally better preserve a few social structural features of the graph, such as the tendency of nodes with degree in some intermediate range to cluster. However, bidirectional paths in the Tor network are generally long, when they exist at all. Considering only mutual connections, the Tor Web is thus no longer a small world.
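To make the notion of mutual connections concrete, the sketch below builds an undirected graph that keeps a link only when both directions exist, and measures hop distances on it. This is our reading of "mutual connections" for illustration purposes; the helper names `mutual_graph` and `hop_distance`, and the choice to drop nodes with no reciprocated link, are assumptions of this example.

```python
from collections import deque

def mutual_graph(edges):
    """Undirected adjacency map keeping only reciprocated links u <-> v."""
    directed = {(u, v) for u, v in edges if u != v}  # ignore self-loops
    adj = {}
    for u, v in directed:
        if (v, u) in directed:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj  # nodes without any mutual link are absent (assumption)

def hop_distance(adj, source, target):
    """BFS distance in hops over mutual links, or None when no path exists."""
    if source not in adj or target not in adj:
        return None
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

# Toy example (hypothetical): a -> c exists, but c -> a does not, so the
# only bidirectional route from a to c goes through b.
toy = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("a", "c"), ("d", "a")]
usg = mutual_graph(toy)
```

In the toy graph the directed distance from a to c is one hop, while the bidirectional distance is two, and d is not reachable at all: a small-scale analogue of the lengthening of paths discussed above.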
Global metrics of the graph are largely consistent across snapshots. This means that the volatility of peripheral nodes does not heavily influence the global structure of the network. The DSGU and DSGI seem to capture all the features of the snapshots, reflecting in different ways some specific features that appear in only a few of them. For these reasons, in the next sections we use only the intersection and union graphs, thus avoiding the drawbacks of information overload.

Characterizing Tor Services
Hereafter, we shift the focus onto vertex properties of the Tor Web graph. With the purpose of sorting out the possible roles of a service in the network, we first provide a correlation analysis of several local structural properties. Among the considered set of metrics, recapped in Table 6, we include the hubscore and authscore provided by the HITS algorithm [11]. These two metrics allow us to identify and characterize the out-hubs ("hubs", i.e., high hubscore) and in-hubs ("authorities", i.e., high authscore) identified in Section 3, in a more proper yet conceptually similar way. We then compare semantic and topological features by making use of the DUTA [12] dataset of manually labeled hidden services. We measure the similarity of content-based and modularity-based clusterings and we determine to what extent a service's properties are related to its contents/structure. As already discussed in Section 3, we consider only the intersection and union graphs for the sake of clarity.
Table 6 (excerpt): pagerank, PageRank, see [33]; authscore, Authority score, see [34]; hubscore, Hub score, see [34]; efficiency, Efficiency.
We rely on Spearman's rank correlation coefficient, rather than the widely used Pearson's, for a number of reasons: (i) we are not especially interested in verifying linear dependence, nor do we expect to find it; (ii) we argue that not all the considered metrics yield a clearly defined interval scale, while they evidently provide an ordinal scale; (iii) when either of the two distributions of interest has a long tail, Spearman's is usually preferable because the rank transformation compensates for asymmetries in the data; and (iv) recent work [35] showed that Pearson's may exhibit pathological behavior in large scale-free networks.
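The difference between the two coefficients can be illustrated with a minimal standard-library sketch. Spearman's coefficient is Pearson's coefficient computed on ranks, with ties receiving their average rank; the function names and the toy values below are assumptions of this example and do not come from our datasets.

```python
import math

def average_ranks(values):
    """1-based ranks of `values`; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's linear correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical long-tailed metric: one hub with a huge out-degree.
out_degree = [1, 2, 3, 4, 5000]
hubscore = [0.01, 0.02, 0.05, 0.07, 0.09]
rho = spearman(out_degree, hubscore)   # monotone relation: rank correlation is 1
r = pearson(out_degree, hubscore)      # dominated by the single outlier
```

Because the relation in the toy data is monotone but far from linear, the rank-based coefficient is maximal while the linear one is not, which is the behavior motivating points (i) and (iii) above.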
In the DSGs (Figures 9a and 9b) we notice a few interesting, albeit not entirely surprising, trends. The authscore tends to correlate more with the in-degree, closeness and pagerank, whereas the hubscore tends to correlate more with the out-degree, betweenness, efficiency, transitivity and LCRatio. In other words, vertices that are authoritative are, on average, easier to reach and may not be hubs. Hubs, on the other hand, are not necessarily authoritative: they facilitate information flows and sit at the center of highly clustered regions. The LCRatio seems to perform quite well, on average, as a measure of hubbiness, as expected. The eccentricity is instead uncorrelated or negatively correlated with all other metrics. This suggests that central nodes are either close to or entirely disconnected from any other service, while long paths exist that connect peripheral services.
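For readers unfamiliar with HITS [11], the hub and authority scores discussed above can be approximated by a simple power iteration: a vertex is a good authority if it is linked by good hubs, and a good hub if it links to good authorities. The sketch below is a minimal illustration under stated assumptions (unique directed edges, a fixed number of iterations, L2 normalization); the function names and the toy graph are hypothetical and this is not the implementation used in our analysis.

```python
import math

def _normalize(scores):
    """Scale scores to unit L2 norm (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()} if norm > 0 else scores

def hits(edges, iterations=100):
    """Return (hubscore, authscore) dictionaries for a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the vertices pointing to v.
        new_auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_auth[v] += hub[u]
        auth = _normalize(new_auth)
        # Hub update: sum of authority scores of the vertices u points to.
        new_hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_hub[u] += auth[v]
        hub = _normalize(new_hub)
    return hub, auth

# Toy graph (hypothetical): "dir" behaves like a link directory,
# "b" is the service most frequently linked by others.
toy = [("dir", "a"), ("dir", "b"), ("dir", "c"), ("e", "b"), ("a", "b")]
hubscore, authscore = hits(toy)
top_hub = max(hubscore, key=hubscore.get)
top_authority = max(authscore, key=authscore.get)
```

In the toy graph the directory-like vertex obtains the highest hubscore while having zero authscore, mirroring the observation that hubs are not necessarily authoritative.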
The most remarkable aspect emerging from the correlation analysis of the USGs is probably the great impact that switching to mutual connections has on the distribution of local metrics. The results for USGI and USGU are not only very different from their directed counterparts, but they also significantly differ from each other. In the USGI what stands out is the lack of correlation between closeness and pagerank, and between the LCRatio and all other metrics.
In the USGU we instead notice a very interesting phenomenon: the closeness and the eccentricity "agree" with each other while they negatively correlate with all other measures.

Content-based Classification of Services
For content analysis we rely on the DUTA dataset, the widest publicly available thematic dataset for Tor, consisting of a three-layer classification of 10250 hidden services [6,12]. Although the DUTA dataset does not cover our graphs entirely, it has the undeniable advantage of being manually tagged: by choosing it rather than carrying out a fresh classification of our dataset, we trade coverage for accuracy. The percentage of vertices of our graphs contained in the DUTA dataset is significant, especially for the two intersection graphs (49.5% and 96%, respectively, dropping to 28% and 24.3% for the unions). Further, if we only consider the first 200 nodes ordered by out-degree, only ≈15% are not tagged.
The DUTA dataset provides a two-layer thematic classification plus a language tag for each service. The thematic classes are further categorized as "Normal", "Suspicious" or "Unknown". The "Unknown" category only includes classes that correspond to services whose nature was impossible to establish: "Empty", "Locked" or "Down". Due to the limited information provided by these tags, we ignore all "Unknown" services in the following. For certain first-layer classes (e.g., "Marketplace") that may be both "Suspicious" and "Normal", the second layer is used precisely to tell apart "Legal" and "Illegal" content. We consider the second layer for this purpose only, thus obtaining the customized version of the DUTA thematic classification reported in Table 7.
In Figure 10 we compare the distribution of thematic tags in the DUTA dataset and in the four intersection and union graphs. We immediately see that "Hosting" services are predominant in all cases. We also see that the distribution in both the DSGI and the DSGU follows the original distribution quite closely, suggesting that the volatility of Tor's hidden services is unrelated to their content.
Finally, we notice that in the USGI and in the USGU, instead, some common classes are entirely missing or barely present (e.g., "Cryptocurrency"), while others are relatively much more frequent than in DUTA (e.g., "Social Network"). It is interesting that the latter are mostly classes related to sociality in a broad sense, again corroborating the idea that mutual connections better capture the social structure of Tor.
Figure 10: Distribution of tags from Table 7 in the DUTA dataset and in the four considered Tor Web graphs.
Having assessed the global prevalence of different classes of services in our graphs, we aim at understanding whether a more specific pattern emerges when we focus on modularity-based communities. Since a single label from Table 7 is assigned to each service, the DUTA classification naturally induces three hard partitions, denoted in the following by "duta" (the individual classes), "duta type" (the macro categories "Normal" and "Suspicious") and "lang" (the language). For the set of hidden services that our graphs share with the DUTA dataset, we can assess the coherence of topic-based and modularity-based clustering by plotting the AMI of "duta", "duta type" and "lang" with respect to the Louvain clusters discussed in Section 3.3. From Figure 11 it emerges very clearly that modularity-based clusters are not thematically uniform, since the mutual information of the two partitions is always barely greater than the mutual information of two random partitions. Our analysis makes clear that DUTA clusters and Louvain clusters are substantially unrelated.
Figure 11: The comparison of the topic-based partition induced by the DUTA dataset and the modularity-based partitions obtained through Louvain's algorithm on our graphs.
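The adjusted mutual information (AMI) corrects the mutual information of two partitions by subtracting the value expected for random partitions with the same cluster sizes, so that values near zero indicate unrelated clusterings. The following is a minimal standard-library sketch of this computation; the arithmetic-mean normalization and all function names are assumptions of this example, and the toy labels are hypothetical rather than taken from DUTA.

```python
import math
from collections import Counter

def entropy(sizes, n):
    """Shannon entropy (in nats) of a partition with the given cluster sizes."""
    return -sum(s / n * math.log(s / n) for s in sizes if s > 0)

def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(labels_a)
    a, b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    return sum(nij / n * math.log(n * nij / (a[i] * b[j]))
               for (i, j), nij in joint.items())

def expected_mutual_information(a_sizes, b_sizes, n):
    """Expected MI of random partitions with fixed cluster sizes (hypergeometric model)."""
    lf = lambda k: math.lgamma(k + 1)  # log(k!)
    emi = 0.0
    for ai in a_sizes:
        for bj in b_sizes:
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = (lf(ai) + lf(bj) + lf(n - ai) + lf(n - bj)
                         - lf(n) - lf(nij) - lf(ai - nij) - lf(bj - nij)
                         - lf(n - ai - bj + nij))
                emi += nij / n * math.log(n * nij / (ai * bj)) * math.exp(log_p)
    return emi

def ami(labels_a, labels_b):
    """AMI with arithmetic-mean normalization (an assumption of this sketch)."""
    n = len(labels_a)
    a_sizes = list(Counter(labels_a).values())
    b_sizes = list(Counter(labels_b).values())
    mi = mutual_information(labels_a, labels_b)
    emi = expected_mutual_information(a_sizes, b_sizes, n)
    denom = (entropy(a_sizes, n) + entropy(b_sizes, n)) / 2 - emi
    return (mi - emi) / denom if abs(denom) > 1e-12 else 1.0

# Hypothetical labels for four services: thematic class vs. community id.
topics = ["hosting", "hosting", "forum", "forum"]
aligned = [0, 0, 1, 1]      # communities coincide with topics
unrelated = [0, 1, 0, 1]    # communities cut across topics
```

With these toy labels, `ami(topics, aligned)` equals 1 up to floating-point error, while `ami(topics, unrelated)` is negative, i.e., below what random partitions of the same sizes would give on average.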

Topological Features for Content-based Classification
Finally, in this section we measure the information gain provided by topological vertex properties with respect to content-based classification. To this end, we proceed as follows:
• For each category C, we consider the dummy variable X_C that indicates whether a randomly picked service belongs to the considered category.
• We let each metric m induce a probability distribution P_m over the set of all services, in such a way that the probability of selecting a service is proportional to the value of that metric for that service.
• To measure the importance of knowing a metric m with respect to a specific category C, we compare the distribution of X_C under two different assumptions: that the services are drawn according to P_m, and that they are drawn uniformly at random, the latter meaning that Pr[X_C = 1] is the overall prevalence of C in the graph.
• As a measure of information gain, we use the Kullback-Leibler divergence.
Since the statistical relevance of the above approach relies on a reasonably sized sample, we will only consider the DSGI and DSGU.
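The procedure above can be sketched in a few lines. Since X_C is a binary variable, the comparison reduces to the Kullback-Leibler divergence between two Bernoulli distributions. The direction of the divergence (metric-weighted distribution relative to the uniform one), the use of base-2 logarithms, the requirement that metric values be non-negative with a positive sum, and all names below are assumptions of this example; the toy values are hypothetical.

```python
import math

def category_probability(metric, labels, category):
    """Pr[X_C = 1] when a service is drawn with probability proportional to `metric`."""
    total = sum(metric.values())  # assumed positive, with non-negative values
    return sum(v for s, v in metric.items() if labels[s] == category) / total

def bernoulli_kl(p, q):
    """KL divergence, in bits, of Bernoulli(p) from Bernoulli(q)."""
    kl = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:  # terms with zero probability contribute nothing
            kl += a * math.log2(a / b)
    return kl

def information_gain(metric, labels, category):
    """Divergence between metric-weighted and uniform sampling for one category."""
    p = category_probability(metric, labels, category)
    q = sum(1 for s in metric if labels[s] == category) / len(metric)  # prevalence
    return bernoulli_kl(p, q)

# Hypothetical services: one hosting service with a very large out-degree.
labels = {"s1": "Hosting", "s2": "Forum", "s3": "Forum", "s4": "Forum"}
out_degree = {"s1": 8, "s2": 1, "s3": 1, "s4": 0}
uniform_metric = {"s1": 1, "s2": 1, "s3": 1, "s4": 1}

gain_out_degree = information_gain(out_degree, labels, "Hosting")
gain_uniform = information_gain(uniform_metric, labels, "Hosting")
```

A metric that takes the same value on every service induces the uniform distribution and therefore yields zero gain, whereas a metric concentrated on the services of a category yields a strictly positive gain for that category.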
In Figure 12 we show the obtained results, separately considering "Normal" classes, "Suspicious" classes and their aggregate. Interestingly, the DSGI and the DSGU broadly provide the same view. Generally speaking, most of the metrics appear to be uninformative with respect to content-based categories, i.e., the probability of finding a service of a specific class does not increase or decrease significantly when we select the service with probability proportional to most of its topological properties. However, there are a few remarkable exceptions: (i) the out-degree and the hubscore are especially informative about hosting services and illegal forums; (ii) services discussing religious topics are highlighted by their efficiency and transitivity, arguably because they tend to strongly cluster together; (iii) in the DSGU, the transitivity is also somewhat informative of services that focus on drugs, while the LCRatio is associated with hosting services, even though not as much as one could expect. These class-level information gains can only partially explain the notable improvement that many metrics instead seem to provide for the more general goal of telling apart "Suspicious" and "Normal" services. This opens new perspectives towards the design of classifiers that make use of topological features instead of text analysis.