1 Introduction

Tor provides a secure, anonymous channel through which clients reach the public Internet. As of August 2021, approximately 1,300 exit relays serve clients in the Tor network [1]. Because the exit relay is the last hop between the Tor network and the target website [2], Tor’s onion encryption does not protect traffic between the exit relay and the target website. Unencrypted client traffic (such as HTTP requests) [3, 4] may therefore be directly exposed to an attacker. The identity of the exit relay is thus highly sensitive and bears directly on the security and privacy of client traffic.

Since Tor relays are contributed by volunteers, attackers can insert malicious nodes with different roles into the Tor network at very low cost to mount attacks such as the Sybil attack [7]. Much prior work has shown that attackers have seriously harmed the anonymity and stability of the Tor network by manipulating nodes under their control [6,7,8]. Because the exit node sits between the client and the target website, controlling exit nodes is a common attack vector [12]. Attackers may deploy malicious exit nodes for two reasons. Some, such as government or academic researchers, try to break the anonymity of the Tor network. Others are motivated by profit: by mounting a man-in-the-middle attack [8], the attacker can hijack transaction traffic and forge its content.

The Tor Project and several researchers have built practical defenses against common exit node attacks such as traffic sniffing, DNS poisoning, and SSL-based attacks [9, 19]. For example, the Tor Project launched the Tor Metrics project [10] to measure the Tor ecosystem. Philipp Winter et al. designed the ExitMap tool to detect malicious exit nodes [9]. In 2016, Philipp Winter’s team designed the Sybilhunter tool to detect Sybil nodes in the Tor network [11]. These works achieved remarkable results, but they perform poorly against some newer malicious exit node attacks. While observing the behavior of exit nodes in the Tor network over a long period, we found the following phenomenon: the IP address of the host that connects directly to the target website is inconsistent with the consensus IP address of the exit node. We define this as a new attack mode of exit nodes. Unfortunately, existing exit node probing tools cannot detect this behavior, because this silent attack does not trigger their detection modules.

We developed the ExitSniffer system in Python; it runs on Ubuntu 16.04. ExitSniffer follows a distributed design. It consists of three cloud hosts (each with a 1-core CPU and 4 GB of RAM) located in different countries, and it can control multiple Tor clients at the same time. The node scanning frequency is set through ExitSniffer’s timing module.

ExitSniffer works as follows: through the Stem control library, it uses the exit node under test to access a website that reports the IP address of the connecting host. It then compares the IP address returned by the website with the IP address of the exit node in the consensus file to judge whether the exit node is malicious.

By analyzing the results returned by ExitSniffer, we find that the consensus IP of a malicious exit node has a malicious binding relationship with the IP returned by the website. We also observe cases in which the IP returned by the website is in the same /24 network segment as the consensus IP of the exit node. In these cases we cannot tell whether the actual IP and the consensus IP belong to different network cards of the same host or to different hosts. Nevertheless, the malicious binding relationship indicates a co-owner relationship between exit nodes.

To sum up, we make the following key contributions:

  • We find exit nodes that are not directly connected to the target website, and we define this type of exit node as a malicious exit node with a proxy.

  • We design and implement ExitSniffer, which can detect malicious exit nodes with proxies in consensus files in real time. It adopts a distributed design and is lightweight and easy to deploy.

  • We analyze the influence of these malicious exit nodes with proxies on the Tor network and explore the co-owner relationship of these nodes.

2 Related Work

The low threshold for adding nodes to the Tor network promotes its continuous growth, but it also lets the network absorb malicious nodes. To date, the Tor network has been subject to traffic correlation attacks and node-based attacks, such as traffic confirmation attacks, DoS attacks, and Sybil attacks. Meanwhile, researchers have continued to support the development of Tor, and the communities around Tor remain active. They have developed many practical tools to defend against attacks on Tor.

Murdoch et al. [22] verified the IX-level attack: by sampling real traffic and extracting statistical features such as message sending rate and message length, they confirmed the communication relationship between entities. Nasr et al. [23] designed DeepCorr, a traffic correlation system that uses a deep learning architecture to learn a correlation function suited to the complex Tor network and can correlate Tor connections. Ling et al. [24] controlled the sending pattern of Tor cells at a controlled OR node. Wang et al. [25] proposed an attack in which, once the attacker finds that the target page responds to traffic at his exit node, he injects malicious links to empty images so that the client’s browser downloads them and generates a specific traffic pattern.

Other researchers have worked on defending Tor. Akhoondi et al. [26] designed an efficient algorithm to judge whether a Tor circuit can be correlated through AS-level traffic, and built a new Tor client, LASTor, on this algorithm. LASTor avoids establishing Tor circuits that can be correlated at the AS level, thereby improving the security of the Tor network. Philipp Winter et al. designed the ExitMap tool to detect malicious exit nodes [9]. In 2016, Philipp Winter’s team designed the Sybilhunter tool to detect Sybil nodes in the Tor network [11]. Sybilhunter integrates the functions of the ExitMap tool and adds HTML and HTTP injection detection.

By observing the exit nodes in the Tor consensus document over a long period, we found a class of malicious exit nodes with a proxy (MENP) that existing node scanning tools overlook. We therefore designed a dedicated tool, ExitSniffer, to detect MENP nodes; it can scan all exit nodes in the consensus file in a short time. We hope that ExitSniffer can complement ExitMap and other tools and strengthen the defenses against malicious exit nodes. Next, we introduce the design of ExitSniffer and analyze the behavior of MENP nodes.

Fig. 1. Schematic diagram of ExitSniffer. With a distributed design, ExitSniffer can start multiple Tor clients at the same time, create Tor circuits through the Stem library, visit websites with an IP detection function, and output the results to the specified database.

3 The Design of ExitSniffer and Phenomenon

3.1 The Design of ExitSniffer

We first describe how ExitSniffer works. As shown in Fig. 1, ExitSniffer downloads the exit node information from the latest consensus file every hour, creates circuits with the Stem control library [14], uses the target exit node to access a website with an IP query function to judge whether that exit node is malicious, and finally writes the result to the specified database. Stem is a Python controller library for Tor. With it we can use Tor’s control protocol to script against the Tor process, or build things such as Nyx. We use the Stem library to control Tor’s circuit creation: for example, we can decide when to establish a circuit and which exit node to use as its last hop. The target website has one special function: it returns the IP address of the host directly connected to it. We can deploy such a website ourselves, which is more secure, or use a public website (such as https://jsonip.com/), which takes less effort but carries some security risk. Normally the exit node connects directly to the target website, so the IP address seen by the target website should match the IP address of the exit node in the consensus document. If it does not, we consider the exit node problematic.
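The probing step described above can be sketched with Stem as follows. This is a minimal illustration under stated assumptions, not the ExitSniffer implementation: the port numbers, the helper names `probe_exit` and `parse_echo_response`, the two-hop path, the use of the `requests` library over Tor’s SOCKS port, and the assumption that the IP-echo site replies with a JSON object containing an `ip` field are all ours.

```python
import json

SOCKS_PORT = 9050      # assumption: local Tor SOCKS port
CONTROL_PORT = 9051    # assumption: local Tor control port
IP_ECHO_URL = "https://jsonip.com/"  # public IP-echo site named in the text


def parse_echo_response(body: str) -> str:
    """Extract the address from an IP-echo reply.

    Assumption: the reply is a JSON object with an "ip" field.
    """
    return json.loads(body)["ip"].strip()


def probe_exit(first_hop_fp: str, exit_fp: str, timeout: float = 30.0) -> str:
    """Build a two-hop circuit ending at exit_fp; return the IP the site sees."""
    # Third-party imports stay local so parse_echo_response works without them.
    import requests  # needs the optional SOCKS extra: requests[socks]
    from stem import StreamStatus
    from stem.control import Controller, EventType

    with Controller.from_port(port=CONTROL_PORT) as controller:
        controller.authenticate()
        circuit_id = controller.new_circuit([first_hop_fp, exit_fp],
                                            await_build=True)

        def attach_stream(stream):
            # Pin every new stream to the circuit we just built.
            if stream.status == StreamStatus.NEW:
                controller.attach_stream(stream.id, circuit_id)

        controller.add_event_listener(attach_stream, EventType.STREAM)
        try:
            # Stop Tor from attaching streams to circuits of its own choosing.
            controller.set_conf("__LeaveStreamsUnattached", "1")
            proxies = {"https": f"socks5h://127.0.0.1:{SOCKS_PORT}"}
            reply = requests.get(IP_ECHO_URL, proxies=proxies, timeout=timeout)
            return parse_echo_response(reply.text)
        finally:
            controller.remove_event_listener(attach_stream)
            controller.reset_conf("__LeaveStreamsUnattached")
            controller.close_circuit(circuit_id)
```

The address returned by `probe_exit` would then be compared with the consensus address of `exit_fp`. Pinning streams to an explicitly built circuit is what lets the scanner choose the exit under test instead of leaving the choice to Tor.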

ExitSniffer is a distributed sniffer tool developed in Python that can detect malicious exit nodes with proxies. We use a distributed setup with 3 virtual machines (VMs) in the cloud environment provided by Vultr. The VMs are located in different countries (Singapore, France, and the United States) to ensure the diversity of traces. Each VM is configured with a 1-core CPU and 4 GB of RAM. Each VM runs 10 Docker instances, each with a separate Tor process (version 0.4.4.6). Next, we introduce the dataset captured with ExitSniffer.

3.2 Dataset

We set ExitSniffer to run a detection pass every two hours. The ExitSniffer system collected a total of 7,125,133 data records from 2020-02-18 to 2021-08-18. Due to machine failure and other reasons, data collection was interrupted for the two months from 2021-02-18 to 2021-04-18, but this did not affect the following work. The dataset has 10 fields:

  • consensus-IP: IP address of the exit node in the consensus file.

  • actual-IP: IP address of the exit node actually observed by ExitSniffer.

  • bandwidth: Bandwidth of the exit node in the consensus file.

  • flags: Labels of the exit node in the consensus file.

  • or-port: OR port of the exit node in the consensus file.

  • fingerprint: Fingerprint of the exit node in the consensus file.

  • nickname: Nickname of the exit node in the consensus file.

  • spent-time: Time ExitSniffer takes to scan the exit node.

  • status: Detection result of ExitSniffer for the exit node.

  • recording-time: Local time at which ExitSniffer scans the exit node.

It is worth noting that the data set mainly contains the information of exit nodes in the consensus file, so it depends on the accuracy of the consensus file.

4 Experimental Analysis

By testing exit node behavior over a long period, we find a hidden phenomenon: the IP address of the exit node in the consensus document is inconsistent with the IP address that connects to the target website. This is equivalent to adding a proxy between the exit node and the target website. We consider this behavior malicious and mark exit nodes that exhibit it as MENP (malicious exit node with a proxy). The inconsistency between the consensus IP address and the actual IP address may be caused by malicious operations of the relay owner, such as man-in-the-middle attacks. Figure 2 shows that MENP nodes can reroute traffic back into the Tor network or route it to hosts outside the Tor network. By analyzing the actual IPs returned by ExitSniffer, we find that some MENP nodes route client traffic to nodes outside the Tor network. Furthermore, an attacker may control multiple malicious exit nodes, first aggregate the traffic relayed through these exits at a controlled node outside the Tor network, and then route it to the client’s target website. This behavior harms the client’s privacy, because we do not know what happens between the malicious exit node and the target website.

Fig. 2. MENP nodes can decide whether to route the traffic that flows through them into or out of the Tor network. In (a), MENP nodes route traffic into the Tor network, and the traffic eventually arrives at the target website from a random exit node. In (b), the MENP node routes traffic to a colluding host outside the Tor network.

For malicious exit nodes with a proxy (MENP) that route traffic outside the Tor network, there are two cases for the actual exit node IP (actual-IP) observed by ExitSniffer:

  1) Actual-IP is another IP address in the same /24 network segment as the consensus IP address (consensus-IP) of the exit node. For example, the consensus file lists the exit node at 109.70.100.9, while the IP address detected by ExitSniffer is 109.70.100.27. We cannot tell whether such a node is malicious, even though its consensus IP differs from the one detected by ExitSniffer, because the machine may have multiple network cards.

  2) Actual-IP and the consensus IP address of the exit node are not in the same /24 network segment. As shown in Fig. 2(b), the malicious exit node routes client traffic to other hosts that are not part of the Tor network, during which the attacker is likely to carry out a man-in-the-middle attack.
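The two cases can be told apart with a simple /24 membership test. The sketch below uses Python’s standard `ipaddress` module; the function names and the three labels are our own illustrative choices, and IPv4 addresses are assumed.

```python
import ipaddress


def same_slash24(ip_a: str, ip_b: str) -> bool:
    """Return True if both IPv4 addresses fall in the same /24 network."""
    # strict=False lets us build the /24 network from a host address.
    network = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in network


def classify_actual_ip(consensus_ip: str, actual_ip: str) -> str:
    """Label a scan record by how actual-IP relates to consensus-IP."""
    if ipaddress.ip_address(consensus_ip) == ipaddress.ip_address(actual_ip):
        return "consistent"      # the exit connects to the site directly
    if same_slash24(consensus_ip, actual_ip):
        return "same-/24"        # case 1: possibly another network card
    return "different-/24"       # case 2: traffic leaves via another host


# The example pair from case 1 above.
print(classify_actual_ip("109.70.100.9", "109.70.100.27"))
```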

Perhaps MENP nodes do nothing but route the traffic between the client and the target website through the proxy, which may even be well-intentioned. According to the work of Zhao Zhang et al., some websites block the IP addresses of Tor exit nodes [13]. Adding a proxy outside the Tor network behind the exit node can circumvent a site’s blocking of Tor, but this is potentially risky because it could expose more traffic between clients and target sites to attackers.

4.1 The Size of the Malicious Exit Nodes

We analyze the data on MENP nodes (malicious exit nodes with a proxy) from 2020-02-18 to 2021-08-18 (data from 2021-02-18 to 2021-04-18 is missing). In total, we find 1983 malicious exit relays. Figure 3 shows the ratio of malicious exit relays to the total number of exit nodes in the consensus file over time. The proportion curve (green line) fluctuates considerably, which means that MENP nodes have a high churn rate: the attacker deploys MENP nodes and carries out malicious behavior within a short period. However, from 2020-07-18 to 2020-08-18, MENP nodes accounted for 16\(\%\) of all exit nodes, a very large scale.

Fig. 3. Churn of MENP nodes over time. Many MENP nodes run for only a short time, so the proportion curve (green line) fluctuates greatly. (Color figure online)

4.2 Bandwidth Ratio of MENP Nodes

Tor selects exit nodes according to its path selection algorithm [15]. Using the bandwidth weighting algorithm [16], the client randomly selects an exit node from the set of relays that satisfy the exit node label rules. The weight\(_{final}\) of each candidate node is computed as follows:

$$\begin{aligned} weight_{final}=weight*bw_{n}= \left\{ \begin{array}{cl} I_{n}(W_{db})*W_{d}*bw_{n},&{} n \in S_{guard,exit}\\ I_{n}(W_{gb})*W_{g}*bw_{n},&{} n \in S_{guard}\\ I_{n}(W_{eb})*W_{e}*bw_{n},&{} n \in S_{exit}\\ I_{n}(W_{mb})*W_{m}*bw_{n},&{} else \end{array} \right. \end{aligned}$$
(1)
$$\begin{aligned} I_{n}(x)= \left\{ \begin{array}{cl} x,&{} n \in S_{dir}\\ 1,&{} else \end{array} \right. \end{aligned}$$
(2)

Here S is the set of all nodes in the consensus file, n is a node in S, and bw\(_{n}\) is the consensus bandwidth of node n. S\(_{guard,exit}\) is the set of nodes in S with both Guard and Exit labels, S\(_{guard}\) is the set of nodes with only the Guard label, and S\(_{exit}\) is the set of nodes with only the Exit label. W\(_{db}\), W\(_{gb}\), W\(_{eb}\), and W\(_{mb}\) can be obtained from the consensus file: W\(_{db}\) is the bandwidth weight for nodes with Guard and Exit tags, W\(_{gb}\) is the bandwidth weight for nodes with the Guard tag, W\(_{eb}\) is the bandwidth weight for nodes with the Exit tag, and W\(_{mb}\) is the bandwidth weight for nodes without Guard or Exit tags. W\(_{d}\) is the W\(_{ed}\) parameter in the consensus file, W\(_{g}\) is the W\(_{eg}\) parameter, W\(_{e}\) is the W\(_{ee}\) parameter, and W\(_{m}\) is the W\(_{me}\) parameter.

Querying these parameters in the consensus file and normalizing them, we find:

$$\begin{aligned} W_{g} = W_{gb}=W_{m}=W_{mb}=W_{e}=W_{eb}=W_{d}=W_{db}=1 \end{aligned}$$
(3)

Substituting Eq. 3 into Eq. 1 and Eq. 2, every weight factor equals 1, which gives:

$$\begin{aligned} weight_{final}= bw_{n} \end{aligned}$$
(4)

According to the weighted bandwidth selection algorithm, the greater the weight\(_{final}\) of a node, the greater the probability that the node is selected. That is, the greater the bandwidth of a node, the greater the probability P\(_{n}\) that the client selects it as the last hop of the circuit.

$$\begin{aligned} P_{n}=\frac{weight_{final,n}}{\sum _{i=0}^{m}weight_{final,i}}=\frac{bw_{n}}{\sum _{i=0}^{m}bw_{i}},i \in (0,m) \end{aligned}$$
(5)

In the above formula, P\(_{n}\) is the probability that the client selects exit node n, bw\(_{i}\) is the consensus bandwidth of exit node i, and m is the number of exit nodes in the consensus file.
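As a small worked illustration of Eq. 5, the sketch below computes P\(_{n}\) for each exit from its consensus bandwidth and sums the probabilities over a flagged subset. The function names and the toy bandwidth values are hypothetical, and exact fractions are used only to avoid rounding in the example.

```python
from fractions import Fraction


def selection_probabilities(bandwidths):
    """Map each node to bw_n / sum(bw_i), as in Eq. 5.

    bandwidths: dict of node identifier -> integer consensus bandwidth.
    """
    total = sum(bandwidths.values())
    if total <= 0:
        raise ValueError("total bandwidth must be positive")
    return {node: Fraction(bw, total) for node, bw in bandwidths.items()}


def subset_share(bandwidths, flagged):
    """Probability that the selected exit belongs to the flagged subset."""
    probs = selection_probabilities(bandwidths)
    return sum((probs[node] for node in flagged if node in probs), Fraction(0))


# Hypothetical consensus bandwidths for three exits.
toy = {"A": 100, "B": 300, "C": 600}
print(selection_probabilities(toy)["B"])   # 3/10
print(subset_share(toy, {"A", "B"}))       # 2/5
```

Because the probabilities are proportional to bandwidth, the chance of picking any exit from a set equals that set’s share of the total exit bandwidth.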

Algorithm 1 is the weighted bandwidth algorithm of the exit node, and Algorithm 2 shows the algorithm for selecting the index of each node according to the weighted bandwidth value of each node. Therefore, we can deduce Eq. 1 and Eq. 5.

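Algorithm 1. Weighted bandwidth algorithm of the exit node.
Algorithm 2. Selection of a node index according to the weighted bandwidth values.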

In Algorithm 2, the loop does not exit immediately even after the index of the selected node is found. This resists timing attacks: the attacker cannot infer the value of rand\(\_\)value from the running time of the algorithm.
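The idea of scanning the whole array without an early exit can be sketched as follows. This is our own illustration of the technique, not a transcription of Algorithm 2: the function name is hypothetical, integer weights are assumed, and Python itself gives no real constant-time guarantee, so the code only shows the control flow.

```python
import secrets


def choose_index_by_weight(weights, rand_value=None):
    """Pick index i with probability weights[i] / sum(weights).

    The loop always visits every entry, so the number of iterations
    does not depend on where the chosen index lies.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive integer")
    if rand_value is None:
        rand_value = secrets.randbelow(total)  # uniform in [0, total)

    chosen = -1
    running = 0
    for i, weight in enumerate(weights):
        running += weight
        # Record the first index whose cumulative weight exceeds
        # rand_value, then keep looping instead of breaking out.
        first_hit = running > rand_value and chosen < 0
        chosen = i if first_hit else chosen
    return chosen


# With weights 10/30/60, values 0-9 map to index 0, 10-39 to 1, 40-99 to 2.
print(choose_index_by_weight([10, 30, 60], rand_value=39))  # 1
```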

Fig. 4. Bandwidth capabilities of MENP nodes.

Figure 4 shows that the bandwidth curve of the MENP nodes is basically stable, but their share of the total exit bandwidth fluctuates greatly. This is because the high churn rate of Tor exit nodes causes the total exit bandwidth to fluctuate greatly [11]. Our statistics show that MENP nodes account for 10.12\(\%\) of the total exit bandwidth. By Eq. 5, the probability that a client selects one of these malicious exit nodes as the last hop is therefore 10.12\(\%\). This is alarming: malicious exit nodes can listen to or hijack about 10.12\(\%\) of exit traffic without being detected by the Tor Project.

4.3 Behavior Exploration of MENP Nodes

After 16 months of observation and data collection, we find that MENP nodes route client traffic in two ways: 1) rerouting traffic into the Tor network; 2) routing traffic to hosts outside the Tor network.

Traffic Is Routed to the Tor Network (MRIT, Malicious Routing into the Tor Network). When scanning exit nodes with the ExitSniffer system, we find that many exit nodes reroute client traffic into the Tor network. We suspect two reasons: 1) The circuit is replaced. When Tor detects a circuit failure, it selects another circuit, so the exit node IP detected by ExitSniffer is inconsistent with the consensus IP of the target exit node. 2) Malicious operations by node owners. For example, some researchers reroute traffic into the Tor network in order to collect more traffic information.

Traffic Is Routed Outside the Tor Network (MROT, Malicious Routing Out of the Tor Network). The attacker manipulates the exit node to route client traffic outside the Tor network. We cannot rule out that the exit node is innocent and that the connection between the exit node and the target website is hijacked by an attacker. Since we cannot distinguish these cases, we uniformly attribute such behavior to a malicious exit node.
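One way to separate the two behaviors in the scan records is to check whether the observed address belongs to a relay listed in the consensus. The sketch below assumes exactly that criterion, which the text does not spell out; the function name, the labels, and the example addresses (taken from documentation ranges) are hypothetical.

```python
def routing_behaviour(consensus_ip, actual_ip, consensus_relay_ips):
    """Label one scan record as consistent, MRIT, or MROT.

    Assumption: traffic re-entered Tor (MRIT) if the observed address is
    the consensus address of some other relay; otherwise it left through
    a host outside the Tor network (MROT).
    """
    if actual_ip == consensus_ip:
        return "consistent"
    if actual_ip in consensus_relay_ips:
        return "MRIT"
    return "MROT"


# Hypothetical consensus containing three exit addresses.
relays = {"192.0.2.10", "192.0.2.20", "198.51.100.7"}
print(routing_behaviour("192.0.2.10", "198.51.100.7", relays))  # MRIT
print(routing_behaviour("192.0.2.10", "203.0.113.5", relays))   # MROT
```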

Fig. 5. Ratio of MRIT malicious nodes to MROT malicious nodes.

Fig. 6. Bandwidth capabilities of MROT malicious nodes.

Fig. 7. Number of malicious nodes with MROT behavior whose actual returned IP address is, versus is not, in the same /24 network segment as the consensus IP address.

Figure 5 shows that MROT malicious exit nodes account for 67.59\(\%\) of all MENP malicious exit nodes. Figure 6 shows the bandwidth capacity of MROT malicious exit nodes, which accounts for 68.30\(\%\) of the total bandwidth of MENP malicious exit nodes. We therefore focus on MROT malicious exit nodes. We were surprised to find that, for some malicious exit nodes with MROT behavior, the actual IP address obtained by ExitSniffer was in the same /24 network segment as the consensus IP address of the exit node. We were initially inclined to think that ExitSniffer had made an error in obtaining the real IP address. We instead assume that it obtained the IP address of another network card on the same host: if two IP addresses are in the same /24 network segment, we aggregate them into one IP address. Such records are common in our dataset, for example where the consensus IP is in the same /24 network segment as the actual IP, or where actual IP1 is in the same /24 network segment as actual IP2.

Figure 7 compares the number of malicious nodes with MROT behavior whose actual returned IP is in the same /24 network segment as the consensus IP with the number whose actual IP is not. For most malicious exit nodes, the consensus IP and the measured IP are not in the same subnet. Moreover, consensus IPs and actual returned IPs exhibit one-to-one, one-to-many, many-to-one, and many-to-many malicious binding relationships. Table 1 shows the one-to-one relationship between consensus IP and actual IP, and Fig. 8 shows that an exit node may be bound to multiple hosts outside the Tor network. Figure 9 shows that multiple exit nodes are bound to one or more hosts outside the Tor network. Our intuition is that when multiple exit nodes are bound to one or more nodes outside the Tor network, these exit nodes have a “co-owner” relationship [17, 18], which we discuss in the next section.

Table 1. Consensus IP and actual IP have a one-to-one mapping relationship.
Fig. 8. A consensus-IP corresponds to multiple actual-IPs.

4.4 The Co-owner Relationship of the Malicious Exit Node

Multiple exit nodes route traffic to one or more IP addresses outside the Tor network. We suspect that these exit nodes are held by the same person or organization, i.e., that they have a “co-owner” relationship. We used a graph algorithm and the Gephi tool to display the results of family aggregation. A total of 35 co-owner families were found, and the largest family included 230 exit nodes.
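The text does not name the graph algorithm used. One plausible reading, sketched below, treats each (consensus-IP, actual-IP) binding as an edge of a bipartite graph and reports the consensus-IPs of each connected component as one family. The union-find implementation, the function name, and the example addresses are our assumptions, and the sketch merges on any shared actual-IP without applying an overlap threshold.

```python
from collections import defaultdict


def co_owner_families(bindings):
    """Group consensus-IPs that share, directly or transitively, an actual-IP.

    bindings: iterable of (consensus_ip, actual_ip) pairs.
    Returns a list of families (sorted lists of consensus-IPs), largest first.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag each address with its role so the two sides never collide.
    for consensus_ip, actual_ip in bindings:
        union(("consensus", consensus_ip), ("actual", actual_ip))

    families = defaultdict(set)
    for node in list(parent):
        if node[0] == "consensus":
            families[find(node)].add(node[1])
    return sorted((sorted(f) for f in families.values()),
                  key=lambda fam: (-len(fam), fam))


# Hypothetical bindings: two exits share one proxy, a third is on its own.
pairs = [("192.0.2.1", "203.0.113.9"), ("192.0.2.2", "203.0.113.9"),
         ("192.0.2.2", "203.0.113.8"), ("198.51.100.1", "203.0.113.7")]
print(co_owner_families(pairs))
```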

Fig. 9. Multiple consensus-IPs correspond to one or more actual-IPs.

Fig. 10 contains one large family. This may seem surprising, but a closer analysis of the data shows that the sets of exit nodes bound to different actual IPs overlap heavily, as shown in Table 2. For example, 66 of the exit nodes bound to actual IP 109.70.100.32 are also bound to 89.31.57.5. We believe that the same attacker controls host 109.70.100.32 and host 89.31.57.5, and that the exit nodes bound to them carry out malicious acts jointly.

Fig. 10. Exit nodes of the same family are grouped together and shown in the same color. (Color figure online)

Table 2. Different actual-IPs are bound to the same group of consensus-IPs. These actual-IPs can be aggregated once their overlap exceeds a threshold, so the consensus-IPs eventually form a large family.

As mentioned by Zhao Z. [13, 21], many websites [20] block the IP addresses of exit nodes. The malicious exit nodes detected in this paper may be a well-intentioned bridge between the client and the target website built by the owner of the exit node, but such behavior brings additional security risks: it exposes more user traffic to attackers. It is therefore reasonable to define this kind of node as a malicious exit node.

5 Conclusion

In this paper, we revisited the trustworthiness of Tor exit relays. We designed and developed the ExitSniffer tool to continuously track, over a period of 16 months, the scale and bandwidth of malicious exit relays whose consensus IP is inconsistent with the actual returned IP. By analyzing the anomalous binding relationships of malicious exit nodes, we found a total of 1983 malicious exit relays, which contribute on average 10.12% of the total Tor exit bandwidth each month and thus pose a serious threat to Tor users’ anonymity under the current path selection algorithm. In addition, our results reveal two types of anomalous binding relationship comprising 35 exit relay families that are neither announced in the consensus document nor detected by the Tor network.