1 Introduction

Bitcoin is a purely peer-to-peer version of electronic cash [1] that allows online payments to be sent directly from one party to another without going through a financial institution. It relies on digital signatures to prove ownership and on a public history of transactions to prevent double-spending. Bitcoin does not rely on third-party credit and offers strong anonymity, which is reflected in three aspects. First, transaction addresses are anonymous: a Bitcoin address is created by the user independently, carries no user identity information, and requires no third party to create or use. Second, transaction behavior is fragmented: the Bitcoin system lets users generate a different address for each transaction, so a user’s transaction records can be dispersed arbitrarily across different anonymous addresses. Third, the source of a Bitcoin transaction packet is difficult to find in the network: the Bitcoin communication network uses a P2P protocol with no central node, and transaction information is broadcast across the whole network, so it is difficult to trace the origin of a transaction by monitoring a single server. Because of this strong anonymity, Bitcoin is often used in gambling, illegal fund-raising, fraud, pyramid selling, money laundering and other illegal activities.

Traditional anti-anonymity technology for Bitcoin transactions falls into two types. The first is the network layer method, which detects and collects the transaction information broadcast at the Bitcoin network layer, analyzes the propagation path of a specific transaction in the P2P network, infers the IP address of the originating service node, and then locates the IP address of the user who issued the transaction. The second is the transaction layer method, which builds a user portrait for a specific wallet address by analyzing the transaction relationships between addresses, especially with the help of address labels for exchanges, mining pools and other institutions. Both types are of limited effectiveness because they cannot trace the social identity of the user to whom a transaction address belongs.

To address these shortcomings, this paper integrates on-chain and off-chain data and proposes an anti-anonymity technology for Bitcoin transactions based on a behavior vector mapping and aligning model. A social behavior vector is built from off-chain social data, and a mapping and aligning model is established between it and the transaction behavior vector built from Bitcoin ledger data, which realizes the anti-anonymity of Bitcoin addresses and transactions. Because the social behavior vector contains the real social identity information of users, the proposed technology has a better practical effect than traditional anti-anonymity technology.

2 Bitcoin Transaction Overview

Every transaction in the blockchain has a list of inputs and a list of outputs, each of which includes the addresses used in the transaction and the amount of coins transferred. The inputs of the current transaction come from the outputs of previous transactions, and the outputs of the current transaction will be used as inputs of later transactions, which forms a transaction chain (see Fig. 1).

Fig. 1. Bitcoin transaction chain

A typical payment has either a single input from a larger previous transaction or multiple inputs combining smaller amounts, and at most two outputs: one for the payment, and one returning the change, if any, back to the sender. The change output will later be selected automatically by the Bitcoin client as an input to future transactions.
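The chaining of outputs to inputs, including the payment and change outputs, can be illustrated with a minimal sketch. The class names, the `spend` helper and the in-memory `ledger` dictionary are hypothetical and exist only for illustration; signatures and scripts are omitted, and values are in satoshis.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TxOutput:
    address: str
    value: int  # amount in satoshis


@dataclass(frozen=True)
class TxInput:
    prev_txid: str   # transaction whose output is being spent
    prev_index: int  # position of that output in the previous transaction


@dataclass
class Transaction:
    txid: str
    inputs: List[TxInput]
    outputs: List[TxOutput]


def spend(txid: str, sources: List[Tuple[str, int]],
          ledger: Dict[str, Transaction], pay_to: str, amount: int,
          change_to: str, fee: int) -> Transaction:
    """Build a payment that spends earlier outputs (hypothetical helper)."""
    total = sum(ledger[t].outputs[i].value for t, i in sources)
    change = total - amount - fee
    if change < 0:
        raise ValueError("inputs do not cover amount plus fee")
    outputs = [TxOutput(pay_to, amount)]               # payment output
    if change > 0:
        outputs.append(TxOutput(change_to, change))    # change back to sender
    tx = Transaction(txid, [TxInput(t, i) for t, i in sources], outputs)
    ledger[txid] = tx
    return tx
```

Under these assumptions, spending the change output of one transaction as the input of the next reproduces the chain structure described above.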

Bitcoin transactions can be roughly divided into two types. The first type is the mining reward transaction. Each block has one mining reward transaction, which has no input and only outputs: the system transfers the block’s mining reward and the fees of the transactions contained in the block to the outputs. The second type is the ordinary transaction, which has several inputs and several outputs.

Different input addresses of a transaction correspond to different private keys, and spending each input requires a signature from the corresponding private key. Whoever creates a multi-input transaction must therefore control all of those keys, so it is generally believed that the multiple input addresses of a transaction belong to the same entity. With the help of transaction address clustering, the transaction behaviors of one entity that are scattered across the ledger can be gathered together, which makes it easier to understand the behavior characteristics of that entity.

There are four kinds of transaction address clustering technology [2]. The first is clustering based on multiple input addresses: the multiple input addresses of a transaction belong to the same address cluster. The second is clustering based on the change address: the change address of a transaction belongs to the same address cluster as its input addresses, and, with the change address as the connecting link, the input addresses of two transactions can be merged into one cluster. The third is clustering based on mining reward transactions: the multiple output addresses of a mining reward transaction belong to the same address cluster. The fourth is comprehensive clustering, which combines the three technologies above.
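The comprehensive clustering idea can be sketched with a union-find structure. This is an illustrative sketch, not the implementation used in [2]: the dictionary keys `inputs`, `outputs`, `change` and `coinbase` are assumed names, and the change address is taken as already identified, since the text does not specify how it is detected.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure over address strings."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_addresses(transactions):
    """Group addresses using the three heuristics described above."""
    uf = UnionFind()
    for tx in transactions:
        inputs = tx.get("inputs", [])
        outputs = tx.get("outputs", [])
        for addr in inputs + outputs:
            uf.find(addr)                     # register every address
        for addr in inputs[1:]:               # heuristic 1: multiple inputs
            uf.union(inputs[0], addr)
        change = tx.get("change")             # heuristic 2: change address
        if change is not None and inputs:
            uf.union(inputs[0], change)
        if tx.get("coinbase"):                # heuristic 3: mining reward outputs
            for addr in outputs[1:]:
                uf.union(outputs[0], addr)
    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return list(clusters.values())
```

Because the change address of one transaction often appears as an input of a later one, the unions propagate and merge the input addresses of both transactions into a single cluster.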

3 Transaction Scene Graph Structure

Bitcoin transaction scenes include mining reward, deposit or withdrawal on an exchange, gambling, blackmail, MLM fraud, etc. Among them, deposit and withdrawal of Bitcoin on an exchange are more common.

A deposit transaction transfers Bitcoin held at the user’s personal wallet address to the deposit wallet address assigned to the user by the exchange. The private key of the deposit wallet address is controlled by the exchange, and different deposit wallet addresses correspond to different users. Deposit transactions include the customer to customer (C2C) transaction scene and the business to customer (B2C) transaction scene.

The general characteristics of the graph structure of a C2C deposit transaction are a small number of transaction inputs and two outputs, one of which is the user’s deposit wallet address, whose cluster label is the name of the exchange (see Fig. 2).

Fig. 2. Graph structure of C2C deposit transaction scene

The B2C deposit transaction scene graph has a 1-to-N structure. It is generally characterized by a small number of transaction input addresses and a large number of transaction outputs, in which the output addresses are the deposit wallet addresses of a large number of different users, and the cluster labels of the output addresses may be the same exchange or different exchanges (see Fig. 3).

Fig. 3. Graph structure of B2C deposit transaction scene

A withdrawal transaction transfers Bitcoin hosted on the exchange to the wallet address specified by the user. To reduce transaction fees, an exchange usually collects the withdrawal orders of multiple users and transfers Bitcoin to their wallet addresses in one transaction.

The graph of a withdrawal transaction also has a 1-to-N structure: the cluster labels of the transaction input addresses are the same exchange, and the transaction output addresses are specified by a large number of different users (see Fig. 4).

Fig. 4. Graph structure of withdrawal transaction scene

Because each transaction requires a fee, in reality deposit and withdrawal are sometimes combined in one transaction, that is, a user withdraws Bitcoin from one exchange and deposits it directly into another exchange.
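The structural rules of this section can be sketched as a small rule-based classifier. This is a sketch under stated assumptions: the function name, the scene strings and the thresholds `few_inputs` and `many_outputs` are hypothetical, since the text says only "a small number" and "a large number", and each address is assumed to carry a cluster label, or `None` when unlabeled.

```python
def classify_scene(input_labels, output_labels, exchanges,
                   few_inputs=5, many_outputs=10):
    """Guess the transaction scene from the cluster labels of a transaction.

    input_labels / output_labels: one label per address (None if unlabeled).
    exchanges: set of labels known to be exchanges.
    few_inputs / many_outputs: illustrative thresholds, not from the paper.
    """
    input_exchanges = {lab for lab in input_labels if lab in exchanges}
    inputs_from_one_exchange = (
        bool(input_labels)
        and len(input_exchanges) == 1
        and all(lab in exchanges for lab in input_labels)
    )
    exchange_outputs = sum(1 for lab in output_labels if lab in exchanges)

    # Withdrawal: inputs labeled with one exchange, many user-chosen outputs.
    if inputs_from_one_exchange and len(output_labels) >= many_outputs:
        return "withdrawal"
    # B2C deposit: few inputs, many outputs that are exchange deposit
    # addresses (one unlabeled output is tolerated as possible change).
    if (len(input_labels) <= few_inputs
            and len(output_labels) >= many_outputs
            and exchange_outputs >= len(output_labels) - 1):
        return "b2c_deposit"
    # C2C deposit: few inputs, at most two outputs, exactly one of them
    # labeled with an exchange.
    if (len(input_labels) <= few_inputs
            and len(output_labels) <= 2
            and exchange_outputs == 1):
        return "c2c_deposit"
    return "unknown"
```

A combined withdrawal and deposit transaction satisfies the withdrawal rule first in this sketch; a fuller version would report both scenes.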

4 Traditional Bitcoin Anti-anonymity Technology

Traditional Bitcoin anti-anonymity technology mainly includes network layer anti-anonymity technology and transaction layer anti-anonymity technology.

Network layer anti-anonymity technology [3] refers to collecting the transaction packets transmitted over the Bitcoin P2P network, analyzing the propagation path of a specific transaction packet, and inferring the server IP of the node that first broadcast it. For example, Koshy et al. [4] used special transactions to find the originating node. Most normal transactions are forwarded by multiple nodes, while malformed transactions are forwarded only once, by the originating node; this feature can be used to identify the originating node of such special transactions. However, because special transactions make up only a small proportion of all transactions, the effect of this method is limited. In addition, Biryukov et al. [5, 6] proposed a transaction traceability mechanism based on neighbor nodes, which improves traceability accuracy by using a node’s neighbors as the basis for judgment. However, the scheme needs to continuously send packets to all nodes in the Bitcoin network, which may seriously interfere with the network.
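The single-relay observation attributed to Koshy et al. above can be illustrated with a toy sketch. It is not their actual procedure: the function name and the `(txid, peer_ip)` observation format are assumptions, standing in for whatever a monitoring node would log.

```python
from collections import defaultdict


def single_relayer_candidates(observations):
    """Map each transaction relayed by exactly one peer to that peer.

    observations: iterable of (txid, peer_ip) pairs seen by a monitor.
    """
    relayers = defaultdict(set)
    for txid, peer_ip in observations:
        relayers[txid].add(peer_ip)
    # Transactions announced by a single peer are attributed to that peer.
    return {txid: next(iter(peers))
            for txid, peers in relayers.items() if len(peers) == 1}
```

Transactions relayed by several peers are left out of the result, which reflects why the method covers only a small share of transactions.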

Network layer anti-anonymity technology can infer the IP of the originating service node of a transaction with a certain probability. Gao Feng, Mao Hong-liang and others [3] achieved traceability with a recall rate of 60% and an accuracy rate of 35.3%. Tracing from the service node IP to the end-user IP further requires the operator’s traffic analysis technology and IP positioning data.

Transaction layer anti-anonymity technology refers to finding the correlation between different Bitcoin addresses by analyzing the transaction records in the Bitcoin ledger, so as to infer the transaction behavior patterns and capital flow of a transaction address. Liao et al. [7] analyzed the extortion process of the ransomware CryptoLocker using Bitcoin ledger data, found multiple Bitcoin addresses belonging to the extortion organization, and identified a large number of Bitcoin ransom transactions. Meiklejohn et al. [8] used heuristic cluster analysis to identify multiple Bitcoin addresses belonging to the Silk Road website. Guo Wen-sheng et al. [9] studied how to divide Bitcoin entities into types with different characteristics through machine learning on Bitcoin ledger data.

Transaction layer anti-anonymity technology can analyze and infer the on-chain trading behavior of a specific wallet address. Combined with the label information of exchanges, mining pools and other platform institutions, it can infer the ownership of some wallet addresses, but it is difficult to determine the user’s social identity. In practice, investigations of many Bitcoin hacking incidents analyze the transaction data of the Bitcoin ledger, track the exchange into which the Bitcoin is transferred, and ask the exchange to provide user information for the Bitcoin addresses involved.

In recent years, Bitcoin anti-anonymity technology that integrates on-chain and off-chain data has gradually become a research hotspot. Husam et al. [10] identified Tor network anonymous services and their users by integrating online social network data and Bitcoin ledger data.

5 Behavior Vectors Mapping and Aligning Model

Because Bitcoin transaction addresses and the trading process are anonymous and Bitcoin ledger data is hard to read, most centralized institutions or platforms, such as exchanges and mixing services, synchronously record the user identity and behavior information corresponding to the Bitcoin ledger data. This data is called off-chain social data. Although it does not contain Bitcoin addresses, making full use of it enables the positioning and anti-anonymity of the transaction behavior recorded in the Bitcoin ledger.

We define the social behavior vector S with five dimensions: [time, value, scene, name, account]. Time is the time when the user receives the social data, value is the amount of Bitcoin stated in the social data, scene is the transaction scene described in the social information, name is the platform name, and account is the user’s social account. If only time and value are considered, and the transaction scene and platform name are missing or ignored, the accuracy of anti-anonymity will suffer in some complex cases.

Similarly, we define the transaction behavior vector E with seven dimensions: [time, value, scene, input label, output label, input address, output address]. Time is the transaction time recorded in the Bitcoin ledger, value is the amount of Bitcoin in the transaction output, scene is the transaction scene inferred through graph structure analysis, input label is the clustering label of the transaction input address, output label is the clustering label of the transaction output address (excluding the change address), input address is the transaction input address, and output address is the transaction output address (excluding the change address). Suppose transaction behavior vector E and social behavior vector S satisfy the following conditions:

  1. The difference between S.time and E.time is small, that is, the social time is close to the Bitcoin ledger transaction time, for example less than 10 min;

  2. S.value is equal to E.value, that is, the transaction values on and off the chain are consistent;

  3. S.scene is equal to E.scene, that is, the transaction scenes on and off the chain are consistent;

  4. For a deposit transaction, S.name is equal to E.output label, that is, the platform name is consistent with the address clustering label on the chain.

Then the user’s social account S.account can be considered to correspond to the Bitcoin transaction address E.output address. A user’s social account is more unique and more social than an IP address or a user behavior portrait, so it better reflects the user’s social identity.
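The two vectors and the four alignment conditions can be written down as a minimal sketch. The class and function names are hypothetical, as is the assumption that scene labels on and off the chain have been normalized to the same vocabulary (for example the string "deposit"). The 10 min threshold is the example value given in condition 1, and `Decimal` is used so that the value comparison is exact.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SocialVector:            # S
    time: datetime
    value: Decimal
    scene: str
    name: str                  # platform name
    account: str               # user's social account


@dataclass(frozen=True)
class TransactionVector:       # E
    time: datetime
    value: Decimal
    scene: str
    input_label: Optional[str]
    output_label: Optional[str]
    input_address: str
    output_address: str


def aligns(s: SocialVector, e: TransactionVector,
           max_gap: timedelta = timedelta(minutes=10)) -> bool:
    """Return True when E and S satisfy conditions 1 to 4."""
    if abs(s.time - e.time) >= max_gap:                          # condition 1
        return False
    if s.value != e.value:                                       # condition 2
        return False
    if s.scene != e.scene:                                       # condition 3
        return False
    if s.scene == "deposit" and s.name != e.output_label:        # condition 4
        return False
    return True


def map_accounts(s: SocialVector,
                 candidates: List[TransactionVector]) -> List[Tuple[str, str]]:
    """Pair E.output address with S.account for every aligned candidate."""
    return [(e.output_address, s.account) for e in candidates if aligns(s, e)]
```

When more than one candidate aligns with the same social vector, this sketch returns all of them; resolving such ties is outside what the conditions above specify.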

6 Experiment and Result Analysis

To verify that the behavior vector mapping and aligning model for on-chain and off-chain data can realize the anti-anonymity of Bitcoin transactions more accurately, we conducted an experimental test on the deposit (recharge) transactions of a platform. The experimental process is as follows:

  1. Deposit Bitcoin to the two deposit wallet addresses assigned by the exchange, then receive 26 social messages sent by the exchange through two social accounts. The 26 social messages correspond to 26 social behavior vectors, of which 11 belong to social account A and 15 belong to social account B. A sample social behavior vector after anonymization is as follows: [‘2020-05-12 14:24’, ‘0.010 *’, deposit, * exchange, ‘account’]

  2. Determine the time window of Bitcoin ledger data. In this experiment, the window runs from 20 min before the social behavior vector’s time to 10 min after it.

  3. Extract the time and value fields of each social behavior vector, match the value against the values of the Bitcoin ledger transaction outputs in the time window, and choose the transaction outputs with an equal value.

  4. Analyze the graph structure of each such transaction, and choose the transactions whose transaction scene is the same as the social behavior vector’s scene.

  5. For the transaction output addresses, choose the address whose cluster label is consistent with the exchange’s name in the social behavior vector.
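Steps 2 to 5 amount to a staged filter over ledger outputs, which can be sketched as follows. The function name and dictionary keys are hypothetical, and each ledger output is assumed to already carry its inferred scene and cluster label. The asymmetric window of 20 min before and 10 min after comes from step 2.

```python
from datetime import datetime, timedelta
from decimal import Decimal

BEFORE = timedelta(minutes=20)  # window start: social time minus 20 min
AFTER = timedelta(minutes=10)   # window end: social time plus 10 min


def match_outputs(social, ledger_outputs):
    """Return the output addresses that survive steps 2 to 5.

    social: dict with keys 'time', 'value', 'scene', 'name'.
    ledger_outputs: list of dicts with keys
        'time', 'value', 'scene', 'label', 'address'.
    """
    start, end = social["time"] - BEFORE, social["time"] + AFTER
    # Step 2: keep outputs whose transaction time lies in the window.
    in_window = [o for o in ledger_outputs if start <= o["time"] <= end]
    # Step 3: keep outputs whose value equals the social value.
    same_value = [o for o in in_window if o["value"] == social["value"]]
    # Step 4: keep outputs whose inferred scene matches the social scene.
    same_scene = [o for o in same_value if o["scene"] == social["scene"]]
    # Step 5: keep outputs whose cluster label matches the exchange name.
    same_label = [o for o in same_scene if o["label"] == social["name"]]
    return [o["address"] for o in same_label]
```

Applying the cheap time and value filters first keeps the graph structure analysis of step 4 limited to a small number of candidate transactions.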

The experimental results are shown in the following table (see Table 1):

Table 1. Anti-anonymity experimental results of Bitcoin transaction

Eleven social behavior vectors of social account A are respectively aligned with eleven C2C deposit transaction behavior vectors, and these Bitcoin transaction behavior vectors belong to one Bitcoin address, which is also the deposit address opened by the exchange for user A. Fifteen social behavior vectors of social account B are respectively aligned with fifteen B2C deposit transaction behavior vectors, and these Bitcoin transaction behavior vectors belong to one Bitcoin address, which is also the deposit address opened by the exchange for user B.

7 Conclusion

The anti-anonymity technology for Bitcoin transactions proposed in this paper, based on the behavior vector mapping and aligning model, realizes the fusion analysis of on-chain and off-chain data. Compared with traditional anti-anonymity technology, it has a stronger practical effect. The proposed technology is also applicable to the anti-anonymity of other virtual currencies, such as Ethereum and Tether (USDT).