Abstract
Tor is open-source software that allows accessing various kinds of resources, known as hidden services, while guaranteeing sender and receiver anonymity. Tor relies on a free, worldwide overlay network, managed by volunteers, that works according to the principles of onion routing, in which messages are encapsulated in layers of encryption, analogous to the layers of an onion. The Tor Web is the set of web resources that exist on the Tor network, and Tor websites are part of the so-called dark web. Recent research works have evaluated Tor security, its evolution over time, and its thematic organization. Nevertheless, limited information is available about the structure of the graph defined by the network of Tor websites, not to be mistaken for the network of nodes that supports the onion routing. The limited number of entry points that can be used to crawl the network makes the study of this graph far from simple. In the present paper we analyze two graph representations of the Tor Web and the relationship between contents and structural features, considering three crawling datasets collected over a five-month time frame. Among other findings, we show that Tor consists of a tiny strongly connected component, in which link directories play a central role, and of a multitude of services that can (only) be reached from there. From this viewpoint, the graph appears inefficient. Nevertheless, if we only consider mutual connections, a more efficient subgraph emerges that is, probably, the backbone of social interactions in Tor.
1 Introduction
“Dark web” is a generic term for the subset of the Web that, besides being non-indexed by popular search engines, is accessible only through specific privacy-preserving browsers and overlay networks. Those networks, often called darknets, implement suitable cryptographic protocols to keep anonymous the identities of both the services offering content and the users accessing it. The best known and most widespread of them is probably Tor, which takes its name from The Onion Routing protocol it is based upon. Tor guarantees privacy and anonymity by redirecting traffic through a set of relays, each adding a layer of encryption to the data packets it forwards. The equivalent of a domain on the surface Web is called a Hidden Service (HS) in Tor.
Past research on the Tor network has evaluated its security [8], evolution [22], and thematic organization [36]. Nevertheless, an in-depth study of Tor’s characteristics is difficult due to the limited number of Tor entry points on the surface web. In this paper, building on and extending previous results on the topic [5, 6], we aim at better characterizing the Tor Web by analyzing three crawling datasets collected over a five-month time frame. In line with previous work on the WWW [11] and with a recent trend for criminal networks and the dark/deep web [5, 13, 20, 34], we investigate Tor as a complex system, shedding new light on usage patterns as well as on the dynamics and resilience of the Tor Web. We consider the Tor Web graph aggregated by HS, i.e., the network of Tor HSs connected by hyperlinks – not to be mistaken for the network of Tor relays. We analyze the topology of two different graph representations of the Tor Web – directed and undirected – also using local properties of the graphs to characterize the role that different services play in the network. Relying on a large dataset of manually tagged HSs [2], we relate a few structural properties with the thematic organization of Tor’s web content.
Along with the three snapshot graphs induced by the three crawling datasets, we also consider an intersection graph and a union graph, in an effort to discriminate intrinsic features from noise. As a side effect, the present paper also addresses several open questions about the persistence of the Tor Web, showing the actual changes that took place in the quality, quantity and shape of the available services and of their interconnections over the considered time span.
Overall, Tor turns out to have significant structural differences with respect to the WWW. Our main findings may be summarized as follows:

The Tor Web is a network that resembles a small-world one but is somewhat inefficient, consisting of a tiny strongly connected component (SCC) surrounded by a multitude of services that can be reached from the SCC but do not allow getting back to it.

The stable core of the Tor Web is mostly composed of in- and out-hubs, whereas the periphery is highly volatile. The in-hubs and out-hubs are generally separate services in Tor.

The (relatively small) undirected subgraph of the Tor Web, obtained by only considering mutual connections, is quite efficient although it lacks most of the features of a small-world network. As a matter of fact, the undirected graph better preserves the social organization of the graph, such as its community structure, which appears to be generally stable and, as such, meaningful.

Both the volatility of Tor’s HSs and the tendency of the HSs to cluster together are unrelated to the services’ content.

With a few exceptions, the topological metrics are scarcely informative of the activity occurring on a service; however, the “hubbiness” of a HS may be of some help in detecting “suspicious” activities (as defined in [1]).
To the best of our knowledge, the amount of data we collected for the study of the Tor Web exceeds previous efforts reported in the literature [6, 7, 20, 36], making an in-depth analysis possible.
1.1 Related Work
Several interesting works studying the topology of the underlying network and/or semantically analyzing Tor contents have appeared so far.
Biryukov et al. [8] managed to collect a large number of hidden service descriptors by exploiting a since-fixed Tor vulnerability, finding out that the most popular hidden services were related to botnets. Owen et al. [32] reported on hidden service persistence, contents, and popularity by operating 40 relays over a six-month time frame.
ToRank [1], by Al-Nabki et al., is an approach to rank Tor hidden services. The authors collected a large Tor dataset called DUTA-10K, extending the previous Darknet Usage Text Address (DUTA) dataset [2]. The ToRank approach selects nodes relevant to the Tor network robustness. The DUTA-10K analysis reveals that only 20% of the accessible hidden services are related to suspicious activities. It also shows that domains related to suspicious activities usually present multiple clones under different addresses. Zabihimayvan et al. [39] evaluated the contents of English Tor pages by performing a topic and network analysis on data collected by crawling, composed of 7,782 pages from 1,766 unique onion domains. They classified 9 different domain types according to the information or service they host. Further, they highlighted how some types of domains intentionally isolate themselves from the rest of Tor. Contrary to [1], their measurements suggest that marketplaces of illegal drugs and services emerge as the dominant type of Tor domain. Similarly, Takaaki et al. [37] analyzed a large number of onion domains obtained using the Ichidan search engine and the Fresh Onions site. They classified every encountered onion domain into 6 categories, creating a directed graph and attempting to determine the relationships and characteristics of each instance. Ghosh et al. [18] employed another automated tool to explore the Tor network and analyze the contents of onion sites, mapping onion site contents to a set of categories and clustering Tor services to categorize onion content. The main limitation of that work is that it focused on page contents/semantics, and did not consider network topology.
A few research works focus on Tor’s illegal marketplaces. Duxbury et al. [16] examined the global and local network structure of an encrypted online drug distribution network, aiming to identify vendor characteristics that can help explain variations in the network structure. Their study leverages structural measures and community detection analysis to characterize the network structure. Norbutas et al. [31] made use of publicly available crawls of a single cryptomarket (Abraxas) during 2015 and leveraged descriptive social network analysis and Exponential Random Graph Models (ERGM) to analyze the structure of the trade network. They found that the structure of the online drug trade network is primarily shaped by geographical boundaries, leading to strong geographic clustering, especially strong between continents and weaker for countries within Europe. As such, they suggest that cryptomarkets might be more localized and less international than previously thought. Christin et al. [13] collected crawling data on specific Tor hidden services over an eight-month time span. They evaluated the evolution/persistence of such services over time, and performed a study of the contents and the topology of the explored network. The main difference with our work is that the Tor graph we explore is much larger, not being limited to a single marketplace. In addition, we present here a more in-depth evaluation of the graph topology.
De Domenico et al. [15] used the data collected in [4] to study the topology of the Tor network. They gave a characterization of the topology of this darknet and proposed a generative model for the Tor network to study its resilience. Their viewpoint is quite different from ours, as they consider the network at the autonomous system (AS) level. Griffith et al. [20] performed a topological analysis of the Tor hidden services graph. They crawled Tor using the scrapinghub.com commercial service through the tor2web proxy onion link. Interestingly, they reported that more than 87% of dark websites never link to another site. The main difference with our work lies in both the extent of the explored network (we collected a much more extensive dataset than that accessible through tor2web) and the depth of the network analysis (we evaluate a far larger set of network characteristics).
So far, one of the largest Tor datasets collected by automated Tor network exploration is due to Bernaschi et al. [5]. They aimed at relating semantic content similarity with the Tor topology, searching for smaller connected components that exhibit a larger semantic uniformity. Their results show that the Tor Web is very topic-oriented, with most pages focusing on a specific topic and only a few pages dealing with several different topics. Further work [6] by the same authors features a very detailed network topology study, investigating similarities with and differences from the surface Web and applying a novel set of measures to the data collected by automated exploration. They show that no simple graph model fully explains Tor’s structure and that out-hubs govern the structure of the Tor Web.
1.2 Roadmap
The rest of the paper is organized as follows. In Section 2 we describe: (i) our dataset, including statistics about the organization of the hidden services as websites (tree map, number of characters and links); (ii) the DUTA dataset we used for content analysis. In Section 3 we describe how we extracted our graph representations from the available data and we recall the definitions of all graph-related notation and metrics used throughout the paper. In Section 4 we present and discuss the results of our in-depth analysis of the Tor Web, carried out through a set of structural measures and statistics. We study properties such as the bow-tie decomposition, global and local (i.e., vertex-level) metrics, degree distributions, community structure, and content-related distributions and metrics. Finally, we draw conclusions in Section 5.
2 Data
The present paper analyzes a dataset that is the result of three independent six-week runs of our customized crawler, resulting in three “snapshots” of the Tor Web: SNP1, SNP2 and SNP3. The design of the crawler and the outcome of the scraping procedures are reported in Appendix 1 and more extensively discussed in [6, 12].
It is quite common to analyze a dataset obtained by crawling the web. Yet, it must be kept in mind that the analysis may be susceptible to fluctuations due to the order in which pages were first visited – and, hence, not revisited thereafter [26]. In the case of the Tor Web, the issue is exacerbated by the renowned volatility of Tor hidden services [7, 8, 32]. By executing three independent scraping attempts over five months, we aimed at making our analysis more robust and at telling apart “stable” and “temporary” features of the Tor Web.
In total, we reached millions of onion pages (more than 3 million in the second run alone) and almost 30 thousand distinct hidden services. The distribution of these hidden services across the three snapshots is reported in Table 1. Although active services may temporarily appear offline to the crawler (e.g., due to all paths to those services being unavailable), these statistics are quite informative about the volatility of the Tor web. Just 10685 onion URLs were successfully reached by all three crawling runs. It is quite likely that those hidden services were durably present over the considered five-month time frame; they account for, respectively, 83.3% of SNP1, 42.2% of SNP2 and 61.2% of SNP3. Among the hidden services that are absent in just one of the three datasets, especially notable are the 76 hidden services that reappeared in SNP3 after disappearing during SNP2.
To provide a better picture of the complexity of Tor websites, for each and every hidden service, we proceeded as follows: i) we reconstructed the whole tree structure of subdomains and pages; ii) we computed the total number of characters and the total number of hyperlinks (i.e., the number of hrefs in the HTML source). Figure 1 shows the statistical distribution of tree heights for the three snapshots and the distribution of tree height variations across different snapshots (for hidden services present in at least two snapshots). The trees are generally very short and do not vary remarkably over time, yet exceptions exist, with variations comparable to the maximum “size” of a hidden service. The char count is generally variable, whereas services with 0 hyperlinks are predominant. A significant number of hidden services have one hyperlink every 20 to 200 chars (i.e., from \(\approx 3\) words up to \(\approx 2\) sentences). In the following sections we rely on the ratio of the number of hyperlinks to the number of characters (links-to-char ratio, or LCRatio) to assess whether hidden services that are central in the Tor Web graph are indeed just link directories. It is worth noting that, of the 10685 hidden services reached in all three snapshots, only \(\approx 65\%\) had a constant tree height and only \(\approx 43\%\) had a constant char count across all snapshots. Automatically detecting hidden services that stay durably online but under different names (e.g., to prevent being tracked down) thus requires manual work that lies beyond the scope of the present paper.
For content analysis we rely on the DUTA dataset, the widest publicly available thematic dataset for Tor, consisting of a three-layer classification of 10250 hidden services [1, 2]. Although the DUTA classification does not cover our dataset entirely, the percentage of HSs of our snapshots contained in the DUTA dataset is significant: for instance, \(\approx 49.5\%\) of the fully persistent HSs found in all three snapshots, and \(\approx 85\%\) of the 200 HSs having the most hyperlinks to other HSs, have a DUTA tag. In addition, the DUTA dataset has the undeniable advantage of being manually tagged – by choosing it rather than carrying out a fresh new classification of our dataset, we trade coverage for accuracy.
The DUTA dataset provides a two-layer thematic classification plus a language tag for each service. The thematic classes are further categorized as “Normal”, “Suspicious” or “Unknown”. The “Unknown” category only includes classes that correspond to services whose nature could not be established: “Empty”, “Locked” or “Down”. Due to the limited information provided by these tags, we ignore all “Unknown” services in the following. For certain first-layer classes (e.g., “Marketplace”) that can be both “Suspicious” and “Normal”, the second layer is used precisely to tell apart “Legal” and “Illegal” content. We consider the second layer for this purpose only, thus obtaining the customized version of the DUTA thematic classification reported in Table 2.
3 Methods
3.1 Graph construction
From each of the three WARC ^{Footnote 1} files obtained from the scraping procedures we extracted two graphs: a Directed Service Graph (DSG) and an Undirected Service Graph (USG). As detailed in [12], a vertex of these graphs represents the set of pages belonging to a hidden service. In the DSG, a directed edge is drawn from hidden service HS1 to HS2 if any page in HS1 contains at least a hypertextual link to any page in HS2^{Footnote 2}. The directed graphs obtained from the three snapshots are denoted DSG1, DSG2, and DSG3, respectively. In the USG, instead, an undirected edge connects hidden services HS1 and HS2 if they are mutually connected in the corresponding DSG, that is, if there exists at least one page in HS1 linking any page in HS2 and at least one page in HS2 linking any page in HS1. More formally, an edge \((u,v) \in E_{USG}\) iff \((u,v) \in E_{DSG}\) and \((v,u) \in E_{DSG}\). Figure 2 shows an example of the construction of a DSG and a USG. When we only consider mutual connections, the vast majority of vertices remain isolated. These are ignored in the following, since they convey no structural information. In other words, we consider edge-induced graphs. The undirected graphs obtained from the three snapshots are denoted USG1, USG2, and USG3, respectively.
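The construction just described can be sketched in a few lines of Python; the input format (page-level links already mapped onto their source and target hidden services) and the service names are assumptions for illustration only.

```python
# Sketch of the DSG/USG construction described above. The input format
# (page-level links already mapped to their hidden services) is an
# assumption for illustration, not the crawler's actual output format.

def build_dsg(page_links):
    """Aggregate page-level links into directed service-level edges."""
    dsg = set()
    for src_hs, dst_hs in page_links:
        if src_hs != dst_hs:          # ignore links within the same service
            dsg.add((src_hs, dst_hs))
    return dsg

def build_usg(dsg):
    """Keep only mutual connections; isolated vertices are dropped,
    i.e., the USG is edge-induced."""
    return {frozenset((u, v)) for (u, v) in dsg if (v, u) in dsg}

links = [("hs1", "hs2"), ("hs2", "hs1"), ("hs1", "hs3"), ("hs3", "hs4")]
dsg = build_dsg(links)   # 4 directed edges
usg = build_usg(dsg)     # only the mutual pair {hs1, hs2} survives
```

Note that `build_usg` returns unordered pairs, reflecting the fact that a mutual connection carries no direction.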
Since the snapshot graphs are inevitably conditioned by the effect of scraping a reputedly volatile network, we also consider the edgeinduced intersection and union of the aforementioned graphs. Precisely, we denote DSGI the graph induced by the edge set \(E_{DSGI}= E_{DSG1} \cap E_{DSG2} \cap E_{DSG3}\) and DSGU the graph induced by the edge set \(E_{DSGU}= E_{DSG1} \cup E_{DSG2} \cup E_{DSG3}.\) Analogously, USGI is induced by the edge set \(E_{USGI}= E_{USG1} \cap E_{USG2} \cap E_{USG3}\) and USGU is induced by the edge set \(E_{USGU}= E_{USG1} \cup E_{USG2} \cup E_{USG3}.\)
We do not preserve multi-edges, in order to allow a direct comparison with most previous work on other web and social/complex networks. However, in both directed and undirected graphs, we store the information about the number of links that have been “flattened” onto an edge as a weight attribute assigned to that edge – taking the minimum available weight for the edges of our intersection graph and the maximum for the union. We interpret the edge weight as a measure of connection strength that does not alter distances but expresses endorsement/trust and quantifies the likelihood that a random web surfer [33] travels on that edge.
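The edge-induced intersection and union graphs, together with the min/max weight aggregation just described, can be sketched as follows; representing each snapshot as a dict mapping directed edges to weights is an assumption for illustration.

```python
# Sketch of the edge-induced intersection and union graphs with the
# min/max weight aggregation described above. Representing a snapshot as
# a dict mapping a directed edge (u, v) to the number of page-level links
# flattened onto it is an assumption for illustration.

def edge_intersection(*snapshots):
    common = set.intersection(*(set(s) for s in snapshots))
    # the intersection keeps the minimum available weight for each edge
    return {e: min(s[e] for s in snapshots) for e in common}

def edge_union(*snapshots):
    union = {}
    for s in snapshots:
        for e, w in s.items():
            # the union keeps the maximum weight for each edge
            union[e] = max(union.get(e, 0), w)
    return union

dsg1 = {("a", "b"): 3, ("b", "c"): 1}
dsg2 = {("a", "b"): 1, ("c", "a"): 5}
dsgi = edge_intersection(dsg1, dsg2)   # only ("a", "b"), with weight 1
dsgu = edge_union(dsg1, dsg2)          # all three edges, max weights
```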
3.2 Graph Analysis
In line with previous work on Web and social graphs [11, 19, 21, 26], we analyze the Tor Web graph through a set of structural measures and statistics, including a bow-tie decomposition of the directed graphs, global and local (i.e., vertex-level) metrics, and modularity-based clustering. The main graph-based notions and definitions are reported in the following, while the graph-related symbols used throughout the paper are reported in Table 3.
Bow-tie decomposition In a directed graph, two vertices u and v are strongly connected if there exists a path from u to v and a path from v to u. Strong connectedness defines equivalence classes called strongly connected components. A common way to characterize a directed graph consists in partitioning its vertices based on whether and how they are connected to the largest strongly connected component of the graph. This “bow-tie” decomposition [26] consists of six mutually disjoint classes, defined as follows: (i) a vertex v is in LSCC if v belongs to the largest strongly connected component; (ii) v is in IN if v is not in LSCC and there is a path from v to LSCC; (iii) v is in OUT if v is not in LSCC and there is a path from LSCC to v; (iv) v is in TUBES if v is not in any of the previous sets and there is a path from IN to v and a path from v to OUT; (v) v is in TENDRILS if v is not in any of the previous sets and there is either a path from IN to v or a path from v to OUT, but not both; otherwise, (vi) v is in DISCONNECTED.
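The decomposition above can be computed with standard reachability sweeps around the largest strongly connected component. The following is a didactic stdlib sketch (Kosaraju's algorithm for the SCCs), not the code used in the paper; the toy graph is an assumption.

```python
# Sketch of the six-class bow-tie decomposition on an adjacency-list digraph.
from collections import defaultdict

def bowtie(adj):
    """Classify the vertices of a digraph (dict: vertex -> set of
    successors) into the six bow-tie classes defined above."""
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    radj = defaultdict(set)
    for u in adj:
        for v in adj[u]:
            radj[v].add(u)

    def reach(seeds, graph):
        """All vertices reachable from `seeds` following `graph` edges."""
        seen, stack = set(seeds), list(seeds)
        while stack:
            for w in graph.get(stack.pop(), ()):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    # Kosaraju: order vertices by DFS finish time, then sweep the
    # reversed graph in reverse finish order to peel off the SCCs.
    order, seen = [], set()
    def dfs(u):
        seen.add(u)
        for v in adj.get(u, ()):
            if v not in seen:
                dfs(v)
        order.append(u)
    for u in nodes:
        if u not in seen:
            dfs(u)
    sccs, assigned = [], set()
    for u in reversed(order):
        if u in assigned:
            continue
        comp, stack = {u}, [u]
        while stack:
            for w in radj.get(stack.pop(), ()):
                if w not in assigned and w not in comp:
                    comp.add(w)
                    stack.append(w)
        assigned |= comp
        sccs.append(comp)

    lscc = max(sccs, key=len)
    out_ = reach(lscc, adj) - lscc        # reachable from the LSCC
    in_ = reach(lscc, radj) - lscc        # can reach the LSCC
    rest = nodes - lscc - in_ - out_
    from_in = reach(in_, adj) & rest      # path from IN to v
    to_out = reach(out_, radj) & rest     # path from v to OUT
    return {"LSCC": lscc, "IN": in_, "OUT": out_,
            "TUBES": from_in & to_out,
            "TENDRILS": from_in ^ to_out,
            "DISCONNECTED": rest - from_in - to_out}

# Toy digraph: {a, b} strongly connected, i upstream, o downstream,
# t on a tube (i -> t -> o), x a tendril, d disconnected.
adj = {"a": {"b", "o"}, "b": {"a"}, "i": {"a", "t", "x"},
       "t": {"o"}, "o": set(), "x": set(), "d": set()}
bt = bowtie(adj)
```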
Global metrics To characterize our ten graphs we resort to well-known metrics, summarized in Table 4. Most of these metrics have a straightforward definition. Let us just mention that: in directed graphs, following Newman’s original definition [30], the assortativity \(\rho\) measures the correlation between a node’s out-degree and the adjacent nodes’ respective in-degree; in undirected graphs, \(\rho\) measures the correlation between a node’s degree and the degree of its adjacent nodes; the global efficiency \(E_{glo}\) is the average of inverse path lengths; in directed graphs, the transitivity T measures how often vertices that are adjacent to the same vertex are connected in at least one direction; the clustering coefficient C is the transitivity in undirected graphs, defined as the ratio of closed triplets to the total number of triplets. Many of the metrics from Table 4 are undefined for disconnected graphs, or may provide misleading results when evaluated over multiple isolated components. To compensate and allow for a fair comparison, we only consider the giant (weakly) connected component of all disconnected graphs. It is worth mentioning that the three Directed Service Graphs (DSGs), and therefore their union DSGU, consist of a single weakly connected component. On the contrary, all Undirected Service Graphs (USGs) are disconnected graphs. DSGI is also disconnected, albeit only two hidden services – violet77pvqdmsiy.onion and typefacew3ijwkgg.onion – are isolated from the rest and connected by an edge. We instead consider the graphs in their entirety for other types of analysis.
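As an illustration of two of these metrics, the following stdlib sketch computes the average distance over connected pairs and the global efficiency by breadth-first search; the toy graph is an assumption.

```python
# Minimal sketch of two global metrics from Table 4, computed by BFS on
# an adjacency-list digraph: the average distance over connected pairs,
# and the global efficiency (average inverse distance, with unreachable
# pairs contributing 0, i.e., infinite distance).
from collections import deque

def bfs_dists(adj, src):
    """Hop distances from src to every vertex reachable from it."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def avg_distance_and_efficiency(adj):
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    n = len(nodes)
    tot_d = pairs = tot_inv = 0
    for s in nodes:
        for t, d in bfs_dists(adj, s).items():
            if t != s:
                tot_d += d
                pairs += 1
                tot_inv += 1 / d
    # average over connected ordered pairs vs. average over ALL pairs
    return tot_d / pairs, tot_inv / (n * (n - 1))

adj = {"a": {"b"}, "b": {"c"}, "c": set()}
avg_d, e_glo = avg_distance_and_efficiency(adj)
# connected ordered pairs: a->b (1), a->c (2), b->c (1), so avg_d = 4/3;
# e_glo = (1 + 1/2 + 1) / 6 = 5/12
```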
Correlation analysis of centrality metrics We perform a correlation analysis of several local structural properties to the purpose of sorting out the possible roles of a service in the network. We rely on Spearman’s rank correlation coefficient – rather than the widely used Pearson’s – for a number of reasons: (i) we are neither especially interested in verifying linear dependence, nor do we expect to find it; (ii) we argue that not all the considered metrics yield a clearly defined interval scale – while they apparently provide an ordinal scale; (iii) when either of the two distributions of interest has a long tail, Spearman’s is usually preferable, because the rank transformation compensates for asymmetries in the data; and (iv) recent work [27] showed that Pearson’s may have pathological behaviors in large scale-free networks. The considered metrics^{Footnote 3} are shown in Table 5. In words: the betweenness of v measures the ratio of shortest paths that pass through v; the closeness of v is the inverse of the average distance of v from all other vertices; the pagerank of v measures the likelihood that a random web surfer ultimately lands on v; the authscore and hubscore of v, jointly computed by the HITS algorithm [25], respectively measure how easy it is to reach v from a central vertex or to reach a central vertex from v; the efficiency of v is the average inverse distance of v from all other vertices; the transitivity of v is the ratio of pairs of neighbors of v that are themselves adjacent; the eccentricity of v is the maximum distance of any other vertex from v; the LCRatio of v is not a graph-based metric, but we define it as the ratio of the number of hyperlinks to the number of characters in the text extracted from the HS associated with v.
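Spearman's coefficient is simply Pearson's correlation computed on rank-transformed data, which is why it only assumes an ordinal scale. A self-contained sketch (with average ranks for ties) could look as follows; the metric vectors are toy assumptions.

```python
# Spearman's rank correlation in plain Python: rank-transform both metric
# vectors (average ranks for ties), then compute Pearson's r on the ranks.

def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average rank for the tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Any monotone (even strongly non-linear) relation yields rho = 1
degrees = [1, 2, 5, 40, 300]
pageranks = [0.01, 0.02, 0.03, 0.5, 0.9]
rho = spearman(degrees, pageranks)     # rho == 1.0
```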
Degree distribution We perform a log-normal and a power-law fit of the degree distribution of all graphs using the statistical methods developed in [14], relying on the implementation provided by the powerlaw python package [3]. A log-normal distribution may be a better fit of degree distributions in many complex networks [28], and recent work suggests that a log-normal distribution may emerge from the combination of preferential attachment and growth [35]. Nevertheless, using a power-law fit is standard practice in the study of long-tailed distributions and allows direct comparison with previous works. It is worth specifying that powerlaw autonomously finds a lower bound \(k_{\min }\) for degrees to be fitted. In our case, even if \(k_{\min }\) is much less than the maximum degree, all values greater than \(k_{\min }\) account for just a small percentage of the whole graph. However, we believe this should not prevent us from taking these fits seriously into consideration: the tail of the distribution de facto describes the central part of the graph that actually has a meaningful structure – as opposed to the bulk of the distribution, mostly depicting vertices with out-degree 0 (83% to 95% of the graph, according to the specific DSG considered) and/or in-degree 1 (17% to 43%). The procedure by which we calculate the reach of the most important hubs of each network is the following: taking into account just the giant component, we i) sort the hidden services by degree (out-degree in the DSGs); ii) compute the cumulative percentage of the giant component that is at distance one from one of the first i hubs, for \(i\in \{1,\ldots ,25\}\).
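To convey what the power-law fit estimates, here is a stdlib sketch of the discrete maximum-likelihood estimator of \(\alpha\) for a *given* \(k_{\min}\), following the approximation of Clauset et al. [14]; the actual analysis relies on the powerlaw package, which also selects \(k_{\min}\) automatically and compares candidate distributions. The synthetic sample below is an assumption for illustration.

```python
# Stdlib sketch of the power-law fit: the discrete MLE approximation
# alpha = 1 + n / sum(ln(k_i / (k_min - 0.5))) of Clauset et al. for a
# given k_min. Real analyses should use the `powerlaw` package instead.
import math
import random

def alpha_mle(degrees, k_min):
    tail = [k for k in degrees if k >= k_min]
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)

# Synthetic check: sample a power law with alpha = 2.5 by inverse-transform
# sampling of the continuous CDF, rounded to integer degrees.
random.seed(7)
sample = [int(round((1 - random.random()) ** (-1 / 1.5)))
          for _ in range(20000)]
alpha = alpha_mle(sample, k_min=2)
# alpha should land close to the true exponent 2.5 on a sample this size
```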
Community structure To extract a community structure for our graphs we rely on the well-known Louvain algorithm [9], based on modularity maximization. As often done in the literature [19], we consider edge weights, to make it harder to break an edge corresponding to a hyperlink that appears several times in the dataset. To compare the clusters that emerged across different graphs, we consider how common vertices are grouped in each graph, using the well-known Adjusted Mutual Information (AMI) to measure the similarity of two partitions. The AMI of two partitions is 1 if the two partitions are identical, it is 0 if the mutual information of the two partitions equals the expected mutual information of two random partitions, and it is negative if the mutual information of the two partitions is worse than the expected one. Since a single label from Table 2 is assigned to each service, the DUTA classification naturally induces three hard partitions, denoted “duta” (the individual classes), “duta type” (the macro categories “Normal” and “Suspicious”) and “lang” (the language) in the following. For the set of hidden services that our graphs share with the DUTA dataset, we can assess the coherence of topic-based and modularity-based clustering by computing the AMI of “duta”, “duta type” and “lang” with respect to Louvain’s clusters.
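Assuming the Louvain clusters are already available (e.g., from a library implementation), the AMI comparison can be sketched in plain Python following the standard definition: the observed mutual information, corrected by its expected value under random partitions with the same cluster sizes.

```python
# Sketch of the Adjusted Mutual Information between two hard partitions,
# each given as a list of cluster labels (one per common vertex).
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_info(u, v):
    n = len(u)
    cu, cv = Counter(u), Counter(v)
    return sum((nij / n) * math.log(n * nij / (cu[a] * cv[b]))
               for (a, b), nij in Counter(zip(u, v)).items())

def expected_mi(u, v):
    """Expected MI of two random partitions with the same cluster sizes
    (hypergeometric model, as in the standard AMI definition)."""
    n = len(u)
    emi = 0.0
    for ai in Counter(u).values():
        for bj in Counter(v).values():
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                p = (math.comb(ai, nij) * math.comb(n - ai, bj - nij)
                     / math.comb(n, bj))
                emi += p * (nij / n) * math.log(n * nij / (ai * bj))
    return emi

def ami(u, v):
    mi, emi = mutual_info(u, v), expected_mi(u, v)
    mean_h = (entropy(u) + entropy(v)) / 2
    return (mi - emi) / (mean_h - emi)

clusters_g1 = [0, 0, 1, 1, 2, 2]   # e.g., Louvain communities in one graph
clusters_g2 = [1, 1, 2, 2, 0, 0]   # the same grouping, different labels
score = ami(clusters_g1, clusters_g2)   # 1.0: AMI ignores label names
```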
3.3 Topological features for content-based classification
To measure the information gain provided by topological vertex properties with respect to content-based classification, we proceed as follows:

For each DUTA category C, we consider the dummy variable \(X_C\) that indicates whether a randomly picked service belongs to the considered category.

We let each metric m induce a probability distribution \(P_m\) over the set of all services, in such a way that the probability of selecting a HS is proportional to the value of that metric for that service.

To measure the importance of knowing a metric m with respect to a specific category C, we compare the distribution of \(X_C\) under two different assumptions: that the HSs are drawn according to \(P_m\) and that they are drawn uniformly at random – the latter meaning that \(\Pr [X_C=1]\) is the overall prevalence of C in the graph.

As a measure of information gain, we use the Kullback-Leibler divergence. The KL divergence lies in \([0,+\infty ]\), and it is 0 if the two distributions coincide.
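The steps above can be sketched as follows; the metric values and the category membership flags are toy assumptions for illustration.

```python
# Sketch of the information-gain measure described above: compare
# Pr[X_C = 1] when services are drawn proportionally to a metric against
# the uniform baseline, via the KL divergence of the two Bernoullis.
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    kl = 0.0
    if p > 0:
        kl += p * math.log(p / q)
    if p < 1:
        kl += (1 - p) * math.log((1 - p) / (1 - q))
    return kl

def info_gain(metric, in_cat):
    total = sum(metric.values())
    # Pr[X_C = 1] when services are drawn proportionally to the metric
    p = sum(w for hs, w in metric.items() if in_cat[hs]) / total
    # Pr[X_C = 1] under the uniform baseline: the prevalence of C
    q = sum(in_cat.values()) / len(in_cat)
    return kl_bernoulli(p, q)

metric = {"hs1": 10.0, "hs2": 1.0, "hs3": 1.0}   # e.g., hub scores (toy)
in_cat = {"hs1": True, "hs2": False, "hs3": False}
gain = info_gain(metric, in_cat)                 # > 0: metric informative
uniform = info_gain({h: 1.0 for h in metric}, in_cat)  # 0: no information
```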
4 Results and discussion
Hereafter, we summarize and discuss our main findings; additional explanations, statistics and figures are available in the Appendices. Since we monitored Tor over a sufficiently long time span, our analysis is robust against fluctuations of the results obtained for different snapshots. The union and intersection graphs, in particular, capture most of the features of the snapshots, reflecting in different ways some of their specific characteristics. We will therefore often focus on such graphs to provide a clear and synthetic overview of the results.
The bow-tie decomposition of the DSGs is reported and compared with previous work in Table 6. In general agreement with [20], we found that the Tor Web has a radically different structure with respect to the WWW, except, in part, for the DSGI graph, where all components are non-empty. The Tor Web consists of just a very small LSCC and a much larger OUT component, albeit the share of the LSCC in the total size of the graph may be heavily influenced by the volatility of the network.
At first sight, the Tor Web, seen as a directed graph, seems to show the key features of a small-world network: the transitivity T is one order of magnitude greater than \(\langle \deg \rangle /N\), which is the expected transitivity in a comparable random graph; the distance between any two connected nodes is approximately logarithmic in N, as in most social and web graphs (see Appendix 1). A typical small-world network, however, should be efficient, while the Tor Web has a very low global efficiency (\(E_{glo}\)), which is computed by assigning infinite distance to unconnected vertex pairs.
The emerging structure entails that most resources of potential interest for Tor users are not easy to reach. The only way to effectively browse this network is to find a HS that belongs to the LSCC (possibly through a link from the surface Web) and, from there, look for a path to the resource of interest. Navigation in the network is thus mostly tied to paths that connect the tiny LSCC to the many peripheral nodes, whereas the inverse route is precluded. This shows that the user experience in Tor is quite different from that of the WWW, and supports the general perception that most Tor users do not actually browse the Tor Web, but already know the onion URL they want to visit.
The small-world effect that we observe in the DSGs is not visible in the undirected version of the Tor Web graphs, which have far fewer vertices but a comparable, or even greater, average distance and diameter than their directed counterparts. Most of the paths in the USGs, however, have length close to \(\langle \text{dist} \rangle\), so that \(E_{glo}\approx 1/\langle \text{dist} \rangle\); the graphs are thus much more efficient than the DSGs. The clustering coefficient C is large in the USGI, but drops to \(\approx \langle \deg \rangle /N\) in the USGU due to the presence of a huge hub in USG3.
To gain insights into the local properties of the network, we performed a correlation analysis of several centrality metrics (see Appendix 2 for additional details). In Figure 3 we see that in the DSGs the central vertices can be broadly categorized in two groups. On the one hand, HSs having a large in-degree, authscore, pagerank and closeness are those that provide the most valuable content: they attract connections from all other HSs, including “important” ones, and are, on average, easier to reach. On the other hand, HSs having a large out-degree, hubscore, betweenness and efficiency provide a significant contribution to information flows and are at the center of highly clustered regions. The fact that the LCRatio correlates with the latter set of metrics suggests that these hubs are mostly link directories or similar Web services.
Supported by the correlation analysis, we then focused on the degree sequences (in- and out-degree for the DSGs) to gain information on the hierarchical organization of the network. The tails of the distributions, in particular, describe the central part of the graph having a meaningful structure, whereas the bulk of the distribution mostly depicts peripheral HSs with a very low degree. We performed both a log-normal and a power-law fit of the degree distributions (see Appendix 3 for details). While the former is slightly more accurate, power-law fits are widely used in the literature and looking at the \(\alpha\) exponent of the power law is a straightforward way to classify the Tor Web graph with respect to the vast body of work on complex networks.
The in- and out-degree distributions of the DSGI and DSGU are shown in Figure 4. The value of \(\alpha\) obtained for the out-degree distribution lies consistently around 1.5 for all directed graphs. This may be interpreted as the emergence of some level of self-organization: the choice of how many links to include in its web pages, arguably made in full autonomy by each HS, makes the network resilient and facilitates its navigability. While the low value of \(\alpha\) obtained for the out-degree distribution shows that hubs are quite common in Tor, the strong out-degree centralization \(\varDelta _{out}\) signifies that some of these hubs are especially large compared with the others. In the DSGU, more than \(90\%\) of the graph is in fact at distance 1 from (at least) one of the top 6 hubs; in the DSGI, which is much less centralized, more than \(90\%\) of the graph is still at distance 1 from (at least) one of the top 23 hubs (see Appendix 3).
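The coverage figures above can be obtained by taking the union of the out-neighborhoods of the top-k out-degree hubs. A sketch under the assumption that coverage counts the vertices at distance at most 1 from at least one top hub (the toy adjacency list is illustrative, not drawn from our data):

```python
def hub_coverage(adj, k):
    """Fraction of vertices at distance <= 1 from the top-k out-degree
    hubs (the hubs themselves count as covered).
    adj: dict vertex -> list of out-neighbours."""
    hubs = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
    covered = set(hubs)
    for h in hubs:
        covered.update(adj[h])
    return len(covered) / len(adj)

# Toy graph: vertex 0 plays the role of a link-directory-like hub.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [], 3: [], 4: [], 5: []}
print(hub_coverage(adj, 1))  # 5 of 6 vertices covered
```

Plotting this fraction as k grows is how one reads off that 6 hubs suffice in the DSGU while 23 are needed in the DSGI.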
All DSGs are disassortative, meaning that most of the HSs linked by a hub have a low in-degree. This means that the neighborhoods of different hubs are at least partially disjoint. As we have seen, the in-degree correlates with a set of metrics that express the authority and measure the ease of reach of a service. For the in-degree, \(\alpha\) lies around the threshold 3 that is known to control the variance of the distribution, with \(\alpha \approx 2.7\) in the DSGI and \(\alpha \approx 3.3\) in the DSGU. This indicates that even authorities have a moderate in-degree and that, to balance the out-degree, there are many HSs with a very low in-degree that may become almost impossible to reach due to minimal changes in the Tor link connectivity. Combined with the disassortativity, and contrary to what the out-degree distribution may suggest, this means that access to valuable information is far from guaranteed in Tor.
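Disassortativity can be quantified, in one of its directed flavours, as the correlation between the out-degree of an edge's source and the in-degree of its target (Newman's mixing coefficient). A stdlib-only sketch on a toy edge list, where a hub pointing at low in-degree leaves yields a negative coefficient:

```python
from math import sqrt

def degree_assortativity(edges):
    """Pearson correlation between the out-degree of the source and the
    in-degree of the target, taken over all directed edges (one of the
    out-in flavours of Newman's directed assortativity)."""
    outdeg, indeg = {}, {}
    for u, v in edges:
        outdeg[u] = outdeg.get(u, 0) + 1
        indeg[v] = indeg.get(v, 0) + 1
    xs = [outdeg[u] for u, v in edges]
    ys = [indeg[v] for u, v in edges]
    n = len(edges)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

# A hub (0) pointing at many low in-degree leaves: strongly negative.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (5, 0), (6, 0)]
print(degree_assortativity(edges))
```

Negative values of this coefficient are what we report as disassortativity throughout the paper.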
The degree distribution of the USGs mostly follows a power-law with \(\alpha\) exponent \(\approx 2.5\), closer to the value typically found in social networks, as visible in Figure 5. Mutual connections seem to represent the backbone of the social structure of the Tor Web graph, as also confirmed by a comparison of the distribution of DUTA topics in the DSGs and USGs (see Figure 6). While the DSGI and the DSGU follow the original DUTA distribution quite closely, the thematic tag distributions in the USGs show an increased prevalence of classes of HSs related to sociality in a broad sense, such as “Social Network” or “Forum”. HSs of this type are more inclined to link to other similar HSs, thus favoring the emergence of mutual links. More generally, depending on the tag, HSs could instead compete and, hence, not connect with one another. This could explain why common classes such as “Cryptocurrency” or “Drugs” are entirely missing or barely present in the USGI and in the USGU.
A few outliers in the degree sequence of the USGU show that the inferred scale-free distribution cannot fully explain the organization of mutual links in Tor. The combined neighborhoods of the two most central HSs, in particular, cover almost 90% of the USGU. The USGI, instead, is much less centralized. Since all networks are again disassortative, we know that in the USGs, too, hubs are mostly connected with peripheral nodes.
We inferred the community structure of our graphs through modularity-based clustering and measured the similarity of the clusters obtained for different graphs by computing the Adjusted Mutual Information (AMI) on the clusters projected onto the set of common vertices. The community structure of the DSGs is very similar, in terms of number and size of the clusters, and reasonably consistent, considering the volatility of the network, with AMI \(\approx 0.5\) for all combinations (see Appendix 4). In any case, the apparent significance of the obtained clusters does not correspond to thematic homogeneity: for the set of HSs that our graphs share with the DUTA dataset, the coherence of topic-based and modularity-based clustering resulted in an AMI score \(\approx 0\).
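The AMI corrects the plain mutual information of two labelings for chance agreement; in practice it is available in ready-made routines (e.g., scikit-learn's adjusted_mutual_info_score). The uncorrected quantity at its core is easy to compute with the stdlib alone (the toy labelings below are illustrative only):

```python
from math import log
from collections import Counter

def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two clusterings of the same
    vertex set; the AMI further subtracts the expected MI of random
    partitions and rescales to at most 1."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    cab = Counter(zip(labels_a, labels_b))
    return sum((nab / n) * log(n * nab / (ca[a] * cb[b]))
               for (a, b), nab in cab.items())

# Identical partitions: MI equals the partition's entropy (here ln 2);
# "orthogonal" partitions: MI is 0.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

Projecting both clusterings onto the common vertices before this computation is what makes the scores comparable across snapshots.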
While the USGs have a more heterogeneous community structure, common vertices are clustered in an extremely stable way across the USGs, meaning that the existence of a mutual link is – as expected – a stronger indicator of the similarity between two services. We also see that the union graphs DSGU and USGU, i.e., the graphs based on all collected data, are those whose community structure is least affected by switching from the directed to the undirected graph. In some sense, this means that the clustering obtained for the DSGU can reasonably be considered an extension of the very meaningful partition obtained for the USGU.
To assess whether computing graph-based centrality metrics provides any advantage for inferring the thematic tag of a HS, we proceeded as described in Section 3.3. In Figure 7 we show the measured information gain for the DSGI and DSGU, separately considering “Normal” classes, “Suspicious” classes and their aggregate. The scenario for the other DSGs is almost identical and thus omitted, whereas the USGs were not considered because their limited size affects the statistical relevance of this method. Generally speaking, most of the metrics appear to be uninformative with respect to content-based categories, i.e., the probability of finding a service of a specific class does not increase or decrease significantly when we select the service with probability proportional to most of its topological properties. However, there are a few remarkable exceptions: (i) the out-degree and the hub-score are especially informative about hosting services and illegal forums; (ii) services discussing religion topics are highlighted by their efficiency and transitivity, arguably because they tend to strongly cluster together; (iii) in the DSGU, the transitivity is also somewhat informative of services that focus on drugs, whereas the LCRatio is associated with hosting services, even though not as much as one could expect. These class-level information gains are only partially able to explain the notable improvement that many metrics instead seem to provide to the goal of telling apart, more generally, “Suspicious” and “Normal” services. This opens new perspectives towards the design of classifiers that make use of topological features instead of text analysis.
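As a rough illustration of the idea (a simple "lift", not the exact information-gain measure of Section 3.3), one can compare the probability of drawing a service of a given class when sampling proportionally to a metric against the uniform-sampling baseline; values far from 1 indicate an informative metric. All data below are hypothetical:

```python
def class_lift(metric, labels, target):
    """Probability of drawing a service of class `target` when sampling
    proportionally to `metric`, divided by the uniform-sampling
    probability. A value >> 1 means the metric highlights the class."""
    total = sum(metric)
    p_weighted = sum(m for m, c in zip(metric, labels) if c == target) / total
    p_uniform = labels.count(target) / len(labels)
    return p_weighted / p_uniform

# Hypothetical data: hosting services with large out-degree dominate
# out-degree-proportional sampling.
outdeg = [200, 150, 3, 1, 2, 4]
labels = ["Hosting", "Hosting", "Drugs", "Drugs", "Forum", "Forum"]
print(class_lift(outdeg, labels, "Hosting"))
```

A constant metric gives a lift of exactly 1 for every class, which is the "uninformative" baseline against which Figure 7 should be read.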
5 Conclusion
In this paper, we presented an in-depth investigation of the key features of the Tor Web graph, providing a clear view of its topology and of how the topology is affected by the volatility of the network, inferring the latent patterns of interaction among Tor users, and assessing whether graph metrics can be used to expose the thematic organization of the network. The Tor Web is composed of a large percentage of volatile hidden services and of mostly persistent hubs that are critical for the graph connectivity. The volatility of peripheral nodes does not heavily influence the global structure of the Tor Web graph, which consists of a small strongly connected component from which the remainder of the network can be reached in just a few steps. Although a small-world effect can be observed, the Tor Web has a very low global efficiency and most resources of potential interest for Tor users are not easy to reach. The graph seems to possess a meaningful and stable community structure, not related to the thematic organization of the network, which is especially visible when only mutual connections are considered. The subgraph induced by mutual connections comprises just a tiny fraction of the nodes and features a major presence of topics related to sociality in a broad sense. Considering a class-level categorization, most of the applied topological metrics appear to be scarcely informative with respect to the hidden services’ content. Nevertheless, some metrics seem to provide a notable improvement in the goal of telling apart “Suspicious” from “Normal” services.
We are used to thinking of the Web and online social networks as systems in which we can find or disseminate information. The Tor Web does not seem to be based on these two cornerstones: it is inefficient in spreading information and difficult to navigate. Compared with most real-world complex networks, it has a fairly simple and asymmetric structure that is reflected in its navigation being facilitated only in one direction: users select a starting out-hub and then move on looking for the website of interest. Peripheral nodes, once reached, usually do not provide any possibility to go back and navigate in other directions. The number of hops required to reach a node, when possible, remains limited, but the overall structure is quite different from that of a typical small-world network. As a consequence of these topological features, Tor provides a very different user experience from that of the WWW and online social media.
Future efforts will be devoted to widening the scope and depth of the analysis. Any study of the dynamics on and of Tor would benefit from monitoring the Tor Web consistently over a long time range and possibly measuring the influence of exogenous factors (e.g., changes in legislation or breaking news from the real world) on the Tor Web organization. More generally, crawling specific areas of the surface Web (e.g., forums on Reddit or public groups on WhatsApp or Telegram) may lead to onion URLs that could not be found by scraping Tor itself. This could either confirm that the majority of Tor’s HSs are isolated from the subset of the Tor Web having a network structure, or, on the contrary, reveal a more complex system composed of multiple portions of the Tor Web connected through a layer of surface websites.
Availability of data and material
The dataset used for the analysis is available at the following address https://www.cranic.it/data/supporting_material.tar.gz. Readers interested in additional information about the dataset are welcome to contact the authors.
Code availability
To explore Tor we used a set of open-source tools, namely tor, tinyproxy and BUbiNG. To extract metrics and analyze data we used a set of software libraries, mainly Python libraries such as igraph, numpy and scipy. To build the graphs we developed custom software in C. Readers interested in obtaining our tools are welcome to contact the authors.
Notes
Edges from/to the surface web have been ignored.
Beware that some of these metrics are only defined for directed graphs.
Here, the meaning of “random” depends on the choice of a distribution over the set of all possible partitions [38].
wikitjerrta4qgz4.onion
References
AlNabki, M.W., Fidalgo, E., Alegre, E., FernándezRobles, L.: Torank: Identifying the most influential suspicious domains in the tor network. Expert Systems with Applications 123, 212–226 (2019)
Al Nabki, M.W., Fidalgo, E., Alegre, E., de Paz, I.: Classifying illegal activities on tor network based on web textual contents. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pp. 35–43 (2017)
Alstott, J., Bullmore, E., Plenz, D.: Powerlaw: a Python package for analysis of heavytailed distributions. PloS One 9(1), e85777 (2014)
Annessi, R., Schmiedecker, M.: Navigator: Finding faster paths to anonymity. In: IEEE European Symposium on Security and Privacy (Euro S&P). IEEE (2016)
Bernaschi, M., Celestini, A., Guarino, S., Lombardi, F.: Exploring and analyzing the tor hidden services graph. ACM Trans. Web 11(4), 24:124:26 (2017). https://doi.org/10.1145/3008662
Bernaschi, M., Celestini, A., Guarino, S., Lombardi, F., Mastrostefano, E.: Spiders like onions: On the network of tor hidden services. In: The World Wide Web Conference, WWW ’19, pp. 105–115. ACM, New York, NY, USA (2019). https://doi.org/10.1145/3308558.3313687
Biryukov, A., Pustogarov, I., Thill, F., Weinmann, R.P.: Content and popularity analysis of tor hidden services. In: Distributed Computing Systems Workshops (ICDCSW), 2014 IEEE 34th International Conference on, pp. 188–193 (2014). https://doi.org/10.1109/ICDCSW.2014.20
Biryukov, A., Pustogarov, I., Weinmann, R.P.: Trawling for tor hidden services: Detection, measurement, deanonymization. In: Proceedings of the 2013 IEEE Symposium on Security and Privacy, SP ’13, pp. 80–94. IEEE Computer Society, Washington, DC, USA (2013). https://doi.org/10.1109/SP.2013.15
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008(10), P10008 (2008)
Boldi, P., Marino, A., Santini, M., Vigna, S.: Bubing: Massive crawling for the masses. In: Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, pp. 227–228 (2014)
Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., Wiener, J.: Graph structure in the web. Computer Networks 33(1–6), 309–320 (2000). https://doi.org/10.1016/S13891286(00)000839
Celestini, A., Guarino, S.: Design, implementation and test of a flexible tororiented web mining toolkit. In: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, WIMS ’17, pp. 19:1–19:10. ACM, New York, NY, USA (2017). https://doi.org/10.1145/3102254.3102266
Christin, N.: Traveling the silk road: A measurement analysis of a large anonymous online marketplace. In: Proceedings of the 22Nd International Conference on World Wide Web, WWW ’13, pp. 213–224. ACM, New York, NY, USA (2013). https://doi.org/10.1145/2488388.2488408
Clauset, A., Shalizi, C.R., Newman, M.E.: Powerlaw distributions in empirical data. SIAM Review 51(4), 661–703 (2009)
De Domenico, M., Arenas, A.: Modeling structure and resilience of the dark network. Phys. Rev. E 95, 022313 (2017). https://doi.org/10.1103/PhysRevE.95.022313
Duxbury, S.W., Haynie, D.L.: The network structure of opioid distribution on a darknet cryptomarket. Journal of Quantitative Criminology 34(4), 921–941 (2018)
Franceschet, M.: Pagerank: Standing on the shoulders of giants. Commun. ACM 54(6), 92–101 (2011). https://doi.org/10.1145/1953122.1953146
Ghosh, S., Das, A., Porras, P., Yegneswaran, V., Gehani, A.: Automated categorization of onion sites for analyzing the darkweb ecosystem. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, pp. 1793–1802. ACM, New York, NY, USA (2017). https://doi.org/10.1145/3097983.3098193
Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99(12), 7821–7826 (2002)
Griffith, V., Xu, Y., Ratti, C.: Graph theoretic properties of the darkweb. arXiv:1704.07525 (2017)
Guarino, S., Trino, N., Celestini, A., Chessa, A., Riotta, G.: Characterizing networks of propaganda on twitter: a case study. Applied Network Science 5(1) (2020). https://doi.org/10.1007/s4110902000286y
Jansen, R., Bauer, K., Hopper, N., Dingledine, R.: Methodically modeling the tor network. In: Proceedings of the 5th USENIX Conference on Cyber Security Experimentation and Test, CSET’12, pp. 8–8. USENIX Association, Berkeley, CA, USA (2012). http://dl.acm.org/citation.cfm?id=2372336.2372347
Khare, R., Cutting, D., Sitaker, K., Rifkin, A.: Nutch: A flexible and scalable opensource web search engine. Oregon State University 1, 32–32 (2004)
Kleinberg, J., Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: The web as a graph: Measurements, models, and methods. In: Asano, T., Imai, H., Lee, D., Nakano, S.i., Tokuyama, T. (eds.) Computing and Combinatorics, Lecture Notes in Computer Science, vol. 1627, pp. 1–17. Springer Berlin Heidelberg (1999). https://doi.org/10.1007/3540486860_1
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM) 46(5), 604–632 (1999)
Lehmberg, O., Meusel, R., Bizer, C.: Graph structure in the web: Aggregated by paylevel domain. In: Proceedings of the 2014 ACM Conference on Web Science, WebSci ’14, pp. 119–128. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2615569.2615674
Litvak, N., Van Der Hofstad, R.: Uncovering disassortativity in large scalefree networks. Physical Review E 87(2), 022801 (2013)
Mitzenmacher, M.: A brief history of generative models for power law and lognormal distributions. Internet mathematics 1(2), 226–251 (2004)
Mohr, G., Stack, M., Ranitovic, I., Avery, D., Kimpton, M.: An introduction to heritrix an open source archival quality web crawler. In: In IWAW’4, 4th International Web Archiving Workshop. Citeseer (2004)
Newman, M.E.J.: Mixing patterns in networks. Phys. Rev. E 67(2), 026126 (2003). https://doi.org/10.1103/PhysRevE.67.026126
Norbutas, L.: Offline constraints in online drug marketplaces: An exploratory analysis of a cryptomarket trade network. International Journal of Drug Policy 56, 92–100 (2018)
Owen, G., Savage, N.: Empirical analysis of tor hidden services. IET Information Security 10(3), 113–118 (2016)
Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: Bringing order to the web. Tech. rep, Stanford InfoLab (1999)
SanchezRola, I., Balzarotti, D., Santos, I.: The onions have eyes: A comprehensive structure and privacy analysis of tor hidden services. In: Proceedings of the 26th International Conference on World Wide Web, WWW ’17, pp. 1251–1260. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland (2017). https://doi.org/10.1145/3038912.3052657
Sheridan, P., Onodera, T.: A preferential attachment paradox: How preferential attachment combines with growth to produce networks with lognormal indegree distributions. Scientific Reports 8(1), 2811 (2018)
Spitters, M., Verbruggen, S., van Staalduinen, M.: Towards a comprehensive insight into the thematic organization of the tor hidden services. In: Intelligence and Security Informatics Conference (JISIC), 2014 IEEE Joint, pp. 220–223 (2014). https://doi.org/10.1109/JISIC.2014.40
Takaaki, S., Atsuo, I.: Dark web content analysis and visualization. In: Proceedings of the ACM International Workshop on Security and Privacy Analytics, pp. 53–59. ACM (2019)
Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of the 26th annual international conference on machine learning, pp. 1073–1080 (2009)
Zabihimayvan, M., Sadeghi, R., Doran, D., Allahyari, M.: A broad evaluation of the tor english content ecosystem. arXiv:1902.06680 (2019)
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix A: Data collection
To collect data from the Tor Web we used a customized crawler fed with a list of seeds. Specifically, we assembled a large root set by merging onion URLs advertised on well-known Tor wikis and link directories (e.g., “The Hidden Wiki”^{Footnote 4}), or obtained from standard (e.g., Google) and Tor-specific (e.g., Ahmia) search engines. Then, in the 5-month time frame between January 2017 and May 2017, we launched our customized crawler three times and let each execution run for about six weeks. As a result, we obtained three different “snapshots” of the Tor Web, denoted SNP1, SNP2, and SNP3, respectively. Table 7 describes our datasets, whose composition is comparable to that of similar studies in the literature [36]. Yet, if we refer to the statistics provided by the Tor Project for the corresponding time window^{Footnote 5}, our crawls only reached 25% to 35% of the total number of daily published hidden services. It is not clear to what extent those estimates are inflated by the existence of Tor-specific messaging services in which each user is identified by a unique onion domain [20] and by hidden services that do not host websites.
To access the Tor network and collect data from hidden services we evaluated different crawlers. In particular, we evaluated the following alternatives: Apache Nutch^{Footnote 6} [23], Heritrix^{Footnote 7} [29] and BUbiNG [10]. Considering criteria such as performance, configurability and extensibility, we found BUbiNG to be the most appropriate choice for our goals. BUbiNG is a high-performance, scalable, distributed, open-source crawler, written in Java, and developed by the Laboratory for Web Algorithmics (LAW) of the Computer Science Department of the University of Milan. To allow BUbiNG to operate in the Tor network (instead of the surface Web), we used an HTTP proxy configured with the SOCKS proxy provided by Tor. During the crawling phase we observed that some hidden services check the user-agent of the requester and, if it does not match the latest version of the Tor Web browser, they reply with an error. This behavior had to be taken into account when collecting data, to allow the crawler to reach the largest possible portion of hidden services. Another issue that arose during crawling is the load of the Tor client, i.e., the software used to access Tor. We noticed that under stress (i.e., when too many requests are performed in parallel), the Tor client quite often does not respond correctly, i.e., it may mistakenly report that a hidden service is not available, even if the service is actually up and running. The maximum load depends on the specifications of the machine where the software runs, and we assessed it for our configuration during the experimental phase.
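Such a proxy chain can be set up, for instance, by letting tinyproxy forward all traffic to the local Tor SOCKS port. A minimal configuration sketch (directive names and SOCKS upstream support vary across tinyproxy versions, and the ports shown are common defaults, not necessarily the ones we used):

```
# tinyproxy.conf -- bridge the crawler's HTTP requests onto Tor
# HTTP proxy port that BUbiNG connects to
Port 8888
# Forward all traffic to the local Tor SOCKS port (requires a tinyproxy
# release with SOCKS upstream support)
Upstream socks5 127.0.0.1:9050
```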
Appendix B: Extended results
B.1 Global metrics
The global metrics for our DSGs are reported in Table 8. The variance in the sizes N and M of the three snapshots is consistent with publicly available aggregated statistics^{Footnote 8}, as already discussed in [6]. The values of \(\varDelta _{out}/N\) and \(\varDelta _{in}/N\) show that the main out-hubs reach 35% to 61% of the network, whereas no equivalently prominent in-hubs exist. The values of \(\text{Cen}_{\text{out}}\) show that these main out-hubs have a prominent role in the graphs’ connectivity. However, the greatest of such hubs emerges in the largest graphs, and \(\varDelta _{in}/N\) and \(\varDelta _{out}/N\) are comparably smaller in the DSGI than in the snapshots, suggesting that the degree of such stable hubs is heavily influenced by the non-persistent nodes. All networks are disassortative, meaning that links are more likely to connect high-out-degree nodes to low-in-degree nodes, or low-out-degree nodes to high-in-degree nodes. By comparing the transitivity T with \(\langle \deg \rangle / N\), we see that in our graphs two vertices adjacent to the same vertex are connected (in at least one direction) significantly more often than in a random graph. The diameter d and the average path length \(\langle \text{dist} \rangle\) are approximately logarithmic in N. The emergence of these two properties is usually called the small-world effect. Another quantity often used to quantify small-world behavior in networks is the global efficiency \(E_{glo}\), defined as the average of inverse pairwise distances, where disconnected pairs of vertices have infinite distance. Since \(E_{glo}\ll 1 / \langle \text{dist} \rangle\), we realize that our Tor graphs are inefficient because many pairs of nodes are disconnected, and they thus cannot really be considered small-world networks.
Table 9 reports analogous metrics for the giant connected components of the USGs. The sizes N and M of the three snapshots are again variable, but the USGs are generally much smaller than the DSGs. USG3 is now the biggest, probably thanks to the existence of a huge hub that is absent in the other two snapshots, as visible from the values of \(\varDelta /N\) and \(\text{Cen}\). The presence of this hub also explains why the clustering coefficient C is significantly greater than \(\langle \deg \rangle / N\) in USG1, USG2 and USGI, but not in USG3 and USGU. d and \(\langle \text{dist} \rangle\) are comparable to or even greater than in the DSGs and \(E_{glo}\approx 1 / \langle \text{dist} \rangle\), meaning that most pairs are indeed at distance \(\approx \langle \text{dist} \rangle\). Mutual connections thus induce a subgraph that is quite efficient but not really a small world. Here the assortativity \(\rho\) measures the tendency of a node to connect with others having similar degree. All networks are again disassortative.
B.2 Correlation analysis of centrality metrics
Figure 8 visually shows, for the intersection and union graphs (which we restrict to for the sake of clarity), the pairwise correlation of the metrics defined in Table 5. In the DSGs (Figures 8a and b) the in-degree, auth-score, closeness and pagerank correlate with each other, and the same happens for the out-degree, hub-score, betweenness, efficiency, transitivity and LCRatio. In other words, vertices that are authoritative are, on average, easier to reach and need not be hubs. Hubs, on the other hand, are not necessarily authoritative; they facilitate information flows and are at the center of highly clustered regions. Having a high LCRatio, the hubs are most likely link directories or similar Web services. The eccentricity is instead uncorrelated or negatively correlated with all other metrics. In the USGs almost all metrics are in general agreement, meaning that considering only mutual connections leads to a network with a well-defined vertex hierarchy. There are, however, a few exceptions: in the USGI there is a lack of correlation between closeness and pagerank; in the USGU the closeness and the eccentricity “agree” with each other while they negatively correlate with all other measures; in both cases, the LCRatio is uncorrelated with all other metrics.
B.3 Degree distribution
Figure 9 shows the distributions of the in- and out-degree for all five DSGs on a log-log scale. For DSG2, DSG3 and DSGU the fitted power-law distribution has finite variance (\(\alpha >3\)), contrary to DSG1 (\(\alpha \approx 2.9\)) and DSGI (\(\alpha \approx 2.7\)). This divergent behavior advises caution in the interpretation of this fit and suggests to mostly focus on the differences between the DSGI and the DSGU. All out-degree distributions have instead \(\alpha \approx 1.5\), a very low value that reflects the existence of many large out-hubs – i.e., link directories or similar web services.
In Figure 10, we report the fitted degree distribution for the USGs. We focus in particular on the USGU, which stores all available information about mutual connections between Tor’s hidden services. Most of the degree sequence follows a power-law with \(\alpha\) exponent \(\approx 2.42\), lower than in all DSGs and typical of social networks, but huge hubs are significantly more likely to exist than in a scale-free network with such an \(\alpha\). The plot broadly confirms the insights provided by the DSGs and shows that mutual connections are indeed the backbone of the social structure of the Tor Web graph.
Motivated by the long tail of the degree distributions, and with the purpose of gaining a better understanding of how the whole graph can be explored from just a few starting points, in Figure 11 we show how many hidden services can be reached in just one step from the top hubs. The top-6 out-degree services reach out to almost 70% of the nodes in the DSGI, 80% in DSG1, and \(90\%\) in DSG2, DSG3 and the DSGU, and, in all cases, the percentage quickly exceeds 90% or even 95%. Among the USGs, USG1, USG2 and the USGI are much less centralized. In USG3 – and, hence, in the USGU – the top hub alone is at distance one from more than \(65\%\) of the graph, and with just two hubs we get to more than 85%. Again, the DSGI and DSGU are representative of two opposite behaviors that may emerge from scraping the same network.
B.4 Community structure
Figure 12 shows the distribution of cluster sizes for the DSGs (a) and the USGs (b). In Figure 13 we use the well-known Adjusted Mutual Information (AMI) to compare the clusters that emerged across different graphs, based on how common vertices are grouped in each graph. We recall that the AMI of two partitions is 1 if the two partitions are identical, 0 if the mutual information of the two partitions equals the expected mutual information of two random partitions^{Footnote 9}, and negative if the mutual information of the two partitions is worse than the expected one. All DSGs have a very similar structure, in terms of number and size of the clusters, and the pairwise AMI of the obtained clusters lies around 0.5. While the USGs have a more heterogeneous structure, their communities are more similar, in line with the intuition that the existence of a mutual link is a stronger indicator of the similarity between two services. The only case in which a directed graph and the corresponding undirected graph have an AMI \(>0.5\) is that of the union graphs DSGU and USGU, i.e., the graphs based on all collected data.
To assess the coherence of topic-based and modularity-based clustering, we focused on the set of hidden services that our graphs share with the DUTA dataset and we measured the AMI of the partitions induced by the “duta”, “duta type” and “lang” classes with respect to the Louvain clusters. From Figure 14 it emerges very clearly that modularity-based clusters are not thematically uniform, since the mutual information of the two partitions is always barely greater than the mutual information of two random partitions. Thus, the apparent significance of the obtained Louvain clusters cannot be explained by a thematic homogeneity of the clusters.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bernaschi, M., Celestini, A., Cianfriglia, M. et al. Onion under Microscope: An indepth analysis of the Tor Web. World Wide Web 25, 1287–1313 (2022). https://doi.org/10.1007/s1128002201044z