Introduction

The cryptocurrency industry is experiencing rapid innovation and constant evolution, driven by its power and utility. Despite the promises of security, immutability, and complete transparency offered by blockchain technology, certain cryptocurrencies, particularly Bitcoin, have been used in both legal and illegal activities, ranging from trading and buying goods to money laundering, scams, terrorism financing, and ransomware payments. In this context, tackling terrorist financing through investigation, prosecution, and prevention has become a worldwide issue that extends beyond Europe. Every day, terrorists find new means to communicate, campaign, and finance their activities. For example, as reported by EUROPOL in the IOCTA report [1], two main trends are crowdfunding campaigns and revenue generation in markets. In both cases, to maintain anonymity, they often employ a combination of cryptocurrencies and dark market technologies [2].

Consequently, law enforcement officers (LEOs) face a critical challenge in analyzing these crypto transactions and identifying the responsible parties. The (pseudo) anonymity of the network, the absence of regulatory oversight, the use of anonymizer mechanisms, the changing behavior of entities, and the emergence of new dynamics all contribute to the complexity of this task. Additionally, the sheer volume of information that needs to be examined can lead to a significant waste of time and resources, thereby impeding the progress of investigations.

To tackle these needs and combat cybercrime, new paradigms, such as artificial intelligence (AI) and big data, can be used alongside conventional systems to create novel investigation tools. In this work, we present Kriptosare, a tool able to classify entity behaviors in three main cryptocurrencies: Bitcoin (BTC), Bitcoin Cash (BCH), and Litecoin (LTC). Kriptosare extracts behaviors (or classes) from the interactions and dynamics of known entities involved in the transactions and then predicts the behaviors of new, unseen entities. Pre-defined machine learning (ML) models are provided for a first classification, although users can train new ones as new information becomes available and then reclassify the whole blockchains. For this task, the blockchain information is combined with open-source external data containing crypto addresses and the real-world entity names detected over the years. This additional information facilitates the behavior definition following the taxonomy (Footnote 1) provided by Interpol (Exchange, Mixer, Miner Pool, Marketplace, etc.) and represents a ground truth for the ML training. However, these external data are unevenly distributed, i.e., some entity behaviors are more represented than others, which introduces a class imbalance problem [3]. The imbalance problem is critical since it can strongly affect ML performance, leading the model to learn skewed scenarios. Furthermore, addressing this issue is even more challenging in cryptocurrency applications, where detecting and collecting new observation data is complex and expensive in terms of resources and costs. Indeed, it is easier to find labeled behaviors of entities related to licit transactions than of those involved in illicit activities, which are the most interesting from an investigation point of view.
For this reason, Kriptosare also includes a synthetic data generator module, i.e., a crypto simulator able to create and manage a private Bitcoin, Bitcoin Cash, or Litecoin network. Controlling this crypto environment allows users to replicate real behaviors, generate synthetic data, and then use them to address the imbalance problem introduced by external sources. More specifically, to create their private network, users have two options: (a) deploy standard wallets, i.e., traditional and behavior-free entities, or (b) deploy pre-defined behavioral entities, i.e., intelligent wallets able to replicate the specific real behavior assigned to them. In this way, on the one hand, it is possible to enhance the performance of Kriptosare.class while reducing costs. On the other hand, LEOs can study behaviors in captivity, i.e., in an isolated and controlled environment, to improve their knowledge about them.
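The effect of topping up an imbalanced ground truth with synthetic observations can be illustrated with a minimal sketch. The class names follow the taxonomy mentioned above, but the counts and the helpers `imbalance_ratio` and `synthetic_needed` are hypothetical and serve only to illustrate the idea; they are not taken from Kriptosare.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio between the largest and the smallest class (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def synthetic_needed(labels, target=None):
    """Synthetic samples each class would need to reach `target` samples.

    By default the target is the size of the most represented class.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(0, target - n) for cls, n in counts.items()}


# Hypothetical labelled entities: licit classes dominate the ground truth.
real_labels = ["Exchange"] * 500 + ["Miner Pool"] * 120 + ["Mixer"] * 20

needed = synthetic_needed(real_labels)
# Stand-in for simulator output: one synthetic label per missing sample.
balanced = real_labels + [cls for cls, n in needed.items() for _ in range(n)]

print(imbalance_ratio(real_labels))  # 25.0
print(needed)
print(imbalance_ratio(balanced))     # 1.0
```

In practice, the synthetic samples would be feature vectors extracted from simulated entities rather than bare labels, and full balancing is only one possible target.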

In summary, Kriptosare allows users to manage both the classification and the generator modules easily, through an intuitive and user-friendly interface (frontend). LEOs can use the tool to search for and highlight the most important red flag indicators that could suggest criminal behavior, for example, a divergence between the real labels obtained from external sources and the Kriptosare.class predictions, or the usage of specific entities that are usually involved in illicit activities, such as anonymizers or tumblers. These results can also support LEOs’ analysis and optimize their investigation resources by focusing their effort on the most relevant behaviors, excluding the ones that are completely unregulated and which would require longer analysis times.
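The two red flag indicators mentioned above can be expressed as simple rules. The following is a minimal sketch, assuming a hypothetical record layout (`entity_id`, `label`, `predicted`) and a hypothetical set of anonymizer classes; neither is taken from the Kriptosare data model.

```python
# Assumption: classes treated as anonymizers for the purpose of this sketch.
ANONYMIZER_CLASSES = {"Mixer"}


def red_flags(entities):
    """Return (entity_id, reasons) for every entity that raises a red flag."""
    flagged = []
    for entity in entities:
        reasons = []
        label = entity.get("label")  # external label; None if unknown
        if label is not None and label != entity["predicted"]:
            reasons.append("label/prediction divergence")
        if entity["predicted"] in ANONYMIZER_CLASSES:
            reasons.append("anonymizer behavior")
        if reasons:
            flagged.append((entity["entity_id"], reasons))
    return flagged


# Hypothetical classification results.
results = [
    {"entity_id": "e1", "label": "Exchange", "predicted": "Exchange"},
    {"entity_id": "e2", "label": "Exchange", "predicted": "Mixer"},
    {"entity_id": "e3", "label": None, "predicted": "Mixer"},
    {"entity_id": "e4", "label": None, "predicted": "Miner Pool"},
]

for entity_id, reasons in red_flags(results):
    print(entity_id, reasons)
```

Here `e2` is flagged for both reasons, `e3` only for its predicted anonymizer behavior, and the remaining entities are not flagged.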

Related Work

The capability for non-transparent transactions and the absence of robust regulatory measures have spread the usage of cryptocurrency in both legal and illegal/criminal activities. The most striking case is Bitcoin [4]. In fact, over the years, the number of transactions involved in activities such as money laundering, selling illegal goods, ransomware, and Ponzi schemes has sharply increased. This trend is confirmed in the “2023 Crypto Crime Report” [5] released by Chainalysis (Footnote 2), which reports that $20.6B were moved by illicit addresses in 2022. Consequently, the task of reducing anonymity within the network and categorizing crypto entities has become challenging and essential for law enforcement agencies (LEAs) [6].

For these reasons, many studies [7,8,9] have tried to address this task by using new paradigms like artificial intelligence (AI) and machine learning (ML). However, the majority of them, although valid from an academic point of view, have not been used and validated by end users in an operational (investigation) context. On the other hand, the most common tools, like Chainalysis, Graphsense [10], BlockSci [11], Blockchair (Footnote 3), and Ciphertrace (Footnote 4) [12], are mainly focused on detecting entity behavior by gathering tags, labels, and information from the clear and dark web, rather than using AI and ML algorithms to predict it. As a result, they need to be continuously fed with new external information (tags/labels) that is not always easy, or cheap, to find.

For this reason, in this study, we try to merge the two needs by introducing Kriptosare, a tool able to predict entity behaviors within cryptocurrencies using ML techniques. The tool analyzes the interactions and dynamics of entities engaged in transactions and, starting from a few known (tagged/labeled) entities, generalizes their behaviors to detect similar ones across the blockchain. Furthermore, Kriptosare allows the generation of synthetic data in a private and isolated environment. In this way, it is possible to reduce the issues related to the acquisition of external information.

General Architecture

As shown in Fig. 21.1, Kriptosare includes a central database (DB) and five interconnected units: four of them represent the backend (kripto_data, kripto_brain, kripto_API, and kripto_twins) and one (kripto_viz) the frontend.

Fig. 21.1
The general architecture of Kriptosare includes a central database (DB) and five interconnected units, four of which form the backend (kripto_data, kripto_brain, kripto_API, and kripto_twins) and one of which (kripto_viz) is the frontend.

Kriptosare architecture

In particular, the backend is based on the following technologies:

  • Python-Flask: micro web service API.

  • Swagger: Python-Flask API development.

  • Python Scikit Learn: Python library for ML application.

  • Cassandra: database (DB).

  • Litecoin and Bitcoin Core: daemons for running real wallets.

The frontend, in turn, is based on:

  • Vue.js: JavaScript framework.

As already described, the backend is composed of four units. The first one is the kripto_data unit, which is in charge of the data collection. More specifically, this unit allows Kriptosare to download all the available blockchains (BTC, BCH, LTC) up to the current date. This operation is done by running a blockchain daemon with a specific configuration inside a Docker container (one container for each cryptocurrency). Once these containers are created and linked to the real network (Mainnet), they start to synchronize and thus download the data. During this synchronization phase, a dedicated task copies the blockchain data into the centralized DB so that the information can be consumed by the other units. This unit thus safeguards the data of the networks that are created and used over the tool’s lifetime.
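The copy task described above can be pictured as flattening each decoded block into rows before they are written to the DB. The sketch below assumes the JSON layout that Bitcoin Core returns for `getblock` with verbosity 2 (recent versions expose a single `address` field per output; other daemons or older versions may differ). The function name and the row layout are hypothetical, and the actual Cassandra write is left out.

```python
def flatten_block(block):
    """Turn one decoded block into flat input and output rows."""
    inputs, outputs = [], []
    for tx in block["tx"]:
        for vin in tx["vin"]:
            if "coinbase" in vin:
                continue  # newly minted coins do not spend a previous output
            inputs.append({
                "txid": tx["txid"],
                "spent_txid": vin["txid"],   # transaction being spent
                "spent_index": vin["vout"],  # output index being spent
            })
        for vout in tx["vout"]:
            outputs.append({
                "txid": tx["txid"],
                "index": vout["n"],
                "value": vout["value"],
                # Non-standard or data-only outputs carry no address.
                "address": vout["scriptPubKey"].get("address"),
                "height": block["height"],
            })
    return inputs, outputs


# Hypothetical, heavily shortened block used only to exercise the function.
sample_block = {
    "hash": "00ab",
    "height": 1,
    "tx": [
        {"txid": "t1",
         "vin": [{"coinbase": "04ff"}],
         "vout": [{"value": 50.0, "n": 0,
                   "scriptPubKey": {"address": "addrA"}}]},
        {"txid": "t2",
         "vin": [{"txid": "t0", "vout": 1}],
         "vout": [{"value": 1.5, "n": 0,
                   "scriptPubKey": {"address": "addrB"}},
                  {"value": 0.0, "n": 1,
                   "scriptPubKey": {"type": "nulldata"}}]},
    ],
}

input_rows, output_rows = flatten_block(sample_block)
print(len(input_rows), len(output_rows))
```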

The second unit that composes the backend is the kripto_brain. This unit represents the core of Kriptosare. In fact, it is in charge of three main processes: data preprocessing, entity creation, and feature extraction. In the first process, blockchain data are analyzed and processed in order to extract direct relations between input and output addresses. This information is a key aspect of the LEOs’ investigations and of the follow-the-money approach [13]. Once the data are preprocessed, the entity creation process is run. This script applies common cryptocurrency heuristics [14] that link addresses controlled by the same user based on publicly available transaction information or users’ mistakes, such as address reuse. In this way, it is possible to create a cluster of addresses that represents a concrete user [15]. Finally, once the entities are created, the last process analyzes the interactions between the entities in the blockchain and extracts the features that are the inputs of the ML model. This information is then stored in the centralized DB. All these processes are executed as Python scripts that run continuously in the background so that new information is constantly preprocessed and updated.
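The entity creation and feature extraction steps can be sketched in a few lines. The example below assumes the widely used multi-input (common-input-ownership) heuristic, which treats all input addresses of one transaction as belonging to the same entity; the text does not state which heuristics from [14] Kriptosare actually applies. The transaction layout and the three features are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint sets used to merge addresses into entities."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_entities(transactions):
    """Group addresses that appear together as inputs of a transaction."""
    uf = UnionFind()
    for tx in transactions:
        for addr in tx["inputs"] + tx["outputs"]:
            uf.find(addr)  # register every address, even isolated ones
        for addr in tx["inputs"][1:]:
            uf.union(tx["inputs"][0], addr)
    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return list(clusters.values())


def entity_features(transactions, clusters):
    """Count addresses and sent/received transactions per entity."""
    owner = {addr: i for i, c in enumerate(clusters) for addr in c}
    feats = {i: {"n_addresses": len(c), "tx_sent": 0, "tx_received": 0}
             for i, c in enumerate(clusters)}
    for tx in transactions:
        for i in {owner[a] for a in tx["inputs"]}:
            feats[i]["tx_sent"] += 1
        for i in {owner[a] for a in tx["outputs"]}:
            feats[i]["tx_received"] += 1
    return feats


# Hypothetical preprocessed transactions (input/output address relations).
txs = [
    {"inputs": ["A", "B"], "outputs": ["C"]},
    {"inputs": ["B", "D"], "outputs": ["E", "A"]},
    {"inputs": ["C"], "outputs": ["F"]},
]
clusters = cluster_entities(txs)
features = entity_features(txs, clusters)
print(sorted(sorted(c) for c in clusters))
```

Addresses A, B, and D end up in the same entity because B is co-spent with A in the first transaction and with D in the second. Real feature sets would be far richer (amounts, timing, counterparties), but the flow from addresses to entities to per-entity vectors is the same.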

The third unit (kripto_API) has two primary objectives. Firstly, it serves as a conventional API, i.e., the contact point between the user interface and the data: it allows users to consult the stored information and get classification results, statistics, and so on. Secondly, kripto_API also executes the scripts that control the training of the ML models and the (re-)classification task. More specifically, the ML model used by Kriptosare follows the cascading machine learning approach presented in [16], a strategy that has already shown very promising performance in scientific investigations. Again, all the predictions and the new models are stored in the centralized DB.

Finally, the last module that composes the backend is the kripto_twins (or simulator). This unit allows users to generate and control private cryptocurrency networks (Regtest) of BTC, BCH, or LTC. The simulator is implemented following the instructions released in [17], where Docker containers are used to simulate the different crypto wallets. In each of these containers, the appropriate crypto daemon is run, and then remote procedure call (RPC) commands are used to control these nodes in order to create connections and transactions, mine blocks, and simulate complex behaviors.
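As an illustration of how RPC commands can drive such a private network, the sketch below sends a payment between two Regtest nodes using the Bitcoin Core RPC methods `getnewaddress`, `generatetoaddress`, and `sendtoaddress`. The node names, URLs, and credentials are hypothetical, each node is assumed to have a wallet already loaded, and this is not the kripto_twins implementation. The `call` function is injectable so that the scenario can be rehearsed without a running daemon.

```python
import base64
import json
import urllib.request


def build_rpc_payload(method, params=(), request_id="sketch"):
    """Encode one JSON-RPC request body for a Bitcoin-like daemon."""
    return json.dumps({"jsonrpc": "1.0", "id": request_id,
                       "method": method, "params": list(params)}).encode()


def make_caller(urls, user, password):
    """Return call(node, method, *params) that talks to real daemons.

    `urls` maps hypothetical node names to RPC endpoints, for example
    {"miner": "http://miner:18443"} (18443 is Bitcoin Core's Regtest port).
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def call(node, method, *params):
        request = urllib.request.Request(
            urls[node],
            data=build_rpc_payload(method, params),
            headers={"Authorization": f"Basic {token}",
                     "Content-Type": "text/plain"},
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())["result"]

    return call


def simulate_payment(call, amount):
    """Mine spendable coins on one node and pay another node."""
    miner_addr = call("miner", "getnewaddress")
    # Coinbase outputs need 100 confirmations, so 101 blocks are mined
    # to make the first reward spendable.
    call("miner", "generatetoaddress", 101, miner_addr)
    receiver_addr = call("receiver", "getnewaddress")
    txid = call("miner", "sendtoaddress", receiver_addr, amount)
    call("miner", "generatetoaddress", 1, miner_addr)  # confirm the payment
    return txid


# Dry run: record the commands instead of contacting a daemon.
issued = []


def dry_run(node, method, *params):
    issued.append((node, method, params))
    return {"getnewaddress": f"{node}-addr",
            "sendtoaddress": "txid-1"}.get(method)


print(simulate_payment(dry_run, 0.5))
for command in issued:
    print(command)
```

Replacing `dry_run` with the function returned by `make_caller` would issue the same commands against real containers. More complex behaviors can then be composed from such small scenarios.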

The frontend is based on the kripto_viz module. This unit serves as a bridge between the user and the underlying functionalities of the tool, enabling a seamless and user-friendly experience. It promotes interactivity, helping users to retrieve the classification information and to fully parametrize synthetic networks by specifying values such as the number of wallets or their behavior (within a pre-defined set of available behavior types). It also improves operational efficiency, allowing users to run complex tasks and workflows in a few steps, while providing meaningful error messages for recovering from mistakes and preventing critical errors. Finally, this unit allows the users to interpret and visually understand the results through a series of graphs and tables.

As shown in Fig. 21.2, the Kriptosare main page is composed of four main sections, each one highlighted with a different color. On the right-hand side (red section), there is a menu that allows the user to navigate through the tool functionalities: Classifier, for getting classification results; Model Management, dedicated to modifying or training machine learning models; Network Management, facilitating the creation, deletion, and status retrieval of private blockchains; and lastly Behavior Simulator, designed for generating transactions, mining blocks, and simulating intricate behaviors within the simulator. The central section (green area) is the command area, where the user can select the desired cryptocurrency and configure the relevant parameters according to the chosen functionality. In the specific instance presented in Fig. 21.2, when users navigate to the Classifier menu, they must input a valid cryptocurrency address. At the lower part of the interface (highlighted in yellow), a log area displays all messages related to ongoing operations, their real-time statuses, and their final outcomes. This setup keeps users informed about the progress of all the operations, which matters because certain tasks, such as training a new model or generating a substantial volume of blocks, may take considerable time. Finally, the orange section shows the results obtained after the function’s execution. An example of the results that can be obtained using the Classifier function is reported in Fig. 21.3. As can be seen, Kriptosare shows statistical information related to the searched address as well as the behavior prediction (provided by the ML model) of the entity that controls or can control the address, using a very intuitive view.

Fig. 21.2
A screenshot of the Kriptosare window. The left panel presents the Kriptosare class and Kriptosare gen. The right panel presents the address. The Bitcoin tab at the top and the classify tab at the bottom are highlighted.

Kriptosare interface (main page)

Fig. 21.3
A screenshot of a statistics window. It presents the information about the address. A pie chart presents the distribution of exchange, gambling, market, miner, and service.

Example of address statistics and classification results extracted and visualized in Kriptosare

All the functionalities provided by Kriptosare are available to all users without any prior knowledge. However, two groups of users can be distinguished: basic users and ML experts. The first group uses the interface to investigate crypto address predictions and the generator to create private networks that validate their hypotheses (Classifier, Network Management, and Behavior Simulator menus). The second group includes users who know how beneficial it can be to train a new ML model and reclassify the whole blockchain, and who know how to include the synthetic data in the loop to improve the model’s abilities. In this sense, the ML experts fully exploit the Model Management features.

Validation and Conclusions

Kriptosare has been evaluated in two different European projects: TITANIUM (Footnote 5) and Tools4LEAs (Footnote 6). In the first one, the initial version of the tool (a prototype) was made available to the project stakeholders (mainly LEAs from Germany, Spain, Finland, and Interpol) during two events called Field Labs. These events were Capture-The-Flag (CTF) exercises, in which LEAs used Kriptosare and other project tools to tackle challenges associated with criminal investigations and terrorist activities involving virtual currencies and underground markets in the darknet. This approach provided valuable insights from end-users regarding the tool’s relevance to their investigations and day-to-day responsibilities. Additionally, it allowed us to gather feedback on how to improve the tool, i.e., include new functionalities, increase the interoperability of the tool, and improve usability and user experience.

The tool was then improved, and its maturity level enhanced, within the Tools4LEAs project. In this second project, the tool was again evaluated by domain experts selected by the European Anti-Cybercrime Technology Development Association (EACTDA) (Footnote 7). The purpose of EACTDA is to support the collaboration of multiple essential stakeholders and provide technological solutions for European Law Enforcement Agencies and Forensic Laboratories to use in their fight against crime. In this second validation, domain experts had the chance to read the tool documentation (installation and user guide), and then they tested the tool freely, using only the provided materials as a guide. After that, they evaluated the tool according to the eight software characteristics defined by the ISO/IEC 25010:2011 (Footnote 8) standard. Finally, the experts answered some closing questions, such as which enhancements they would suggest for the tool and whether they considered the tool valuable in their investigations. At the end of this process, Kriptosare was a fully tested and operational tool ready to be used by EU public security organizations for fighting cybercrime.

In this chapter, Kriptosare, a tool for cryptocurrency entity behavioral analysis and simulation, is presented. Preliminary results gathered from LEAs, practitioners, and domain experts showed the potential of this tool and its applicability in use case investigations. However, on its first deployment, the tool takes a long time to bring all the blockchains up to date (depending on the physical resources). In fact, each time a new instance of Kriptosare is run, it needs months to download, preprocess, train, and classify all the data of the three blockchains, considering that the Bitcoin blockchain alone has about 866 M transactions and more than 1000 M addresses generated in 14 years (until the publication date).

As a product of the Tools4LEAs project, Kriptosare is now accessible to EU public security organizations, practitioners, and customers. To gain access to the tool, interested parties may reach out to EACTDA at info@eactda.eu.