1 Introduction

Tokens (more specifically crypto tokens) are similar to the coins of a cryptocurrency, with two main differences. First, they do not have a blockchain or distributed ledger of their own. Rather, they are a digital asset on top of a cryptocurrency or blockchain, representing the right to something. As a medium of exchange, tokens can act as a currency themselves.

Second, tokens are programmable and can be used beyond the mere exchange of value. In this respect, tokens are part of an application, often a decentralized application (DApp). DApps are applications on a P2P network that are not controlled by a single entity. Decentralization can be achieved by implementing critical components on a blockchain. Governance of and access to DApps are often controlled by application-specific tokens, but tokens can also act as the local currency of a DApp.

In addition to these use cases (exchange of value, part of an application), tokens may be linked to off-chain assets. Moreover, they can serve as means of fundraising, pre-order or investment, as well as means for building an ecosystem or a community.

Tokens gained in importance as more and more value was attached to them. At the same time, they sparked the interest of regulatory bodies. With the proliferation of tokens, one may ask what people intend to achieve by using tokens and how they attempt to achieve it.

As Ethereum is the major platform for tokens, we seek clarification on the actual usage of tokens by examining its public data. More specifically, we investigate the following regulatory and technological aspects of tokens:

  • Which types of tokens can be distinguished?

  • Which standards for token contracts are in use?

  • How can token contracts be identified in transaction data?

  • How can the type of a token be automatically inferred?

Our approach. We address these questions by analyzing the transaction traces of Ethereum with regard to the deployed bytecodes (static data) as well as the calls to token contracts (dynamic data). Concerning methods, we discuss the automated identification and classification of token contracts. The methods rely on reconstructing the interface of contracts from bytecode as well as on observing the actual behavior of contracts.

Contribution. Most related work focuses on tokens with a high market cap and on the flow of Ether and tokens. In contrast, we view token contracts as a particular group of smart contracts in their own right, covering the full range from unused contracts to top tokens. Like other authors, we discuss tokens complying with the most prevalent ERC-20 standard [37], but we include other token standards as well. Furthermore, we give an account of the utilization of the standards and depict their usage over time.

Based on our exploration of contract interfaces and activities, we derive indicators for detecting token contracts that do not comply with any of the standards considered. Moreover, we evaluate these indicators systematically on a carefully selected ground truth of tokens and non-tokens. Finally, we propose a heuristic approach to assess the type of token contracts—security versus non-security—and evaluate it qualitatively against decisions of the US Securities and Exchange Commission (SEC).

Overall, this paper advances the field of blockchain analytics, in particular regarding tokens on Ethereum.

Roadmap. Section 2 introduces blockchain tokens and typical functionalities of token contracts. In Sect. 3, we discuss types of tokens with an emphasis on regulatory aspects. In Sect. 4, we summarize relevant token standards. In Sect. 5, we compare our approach to related work. Section 6 introduces terms and data. We present methods for the identification of compliant tokens in Sect. 7 and discuss their prevalence in Sect. 8. In Sect. 9, we characterize token contracts beyond standard compliance and discuss indicators for identifying non-compliant tokens. We compare the indicators in Sect. 10. To assess the type of tokens, we introduce the concept of purity in Sect. 11 and give examples in Sect. 12. Section 13 concludes with a summary of our findings.

2 Token basics

Token contracts maintain a ledger that records the ownership of tokens. Most contracts implement fungible tokens, which are mutually indistinguishable. In this case, it suffices to store the amount of tokens for each holder. Non-fungible tokens, on the other hand, are uniquely identified by individual bit patterns, like numbers, and the contract has to associate each individual token with its owner. The ledger is safeguarded by the cryptographic mechanics of the underlying blockchain.

The core functionality of token contracts consists of methods that allow holders to transfer some of their tokens to a specified address. Moreover, the contracts often enable administrators to create or destroy tokens (known as minting and burning).

2.1 Benefits

Three main characteristics make tokens on a blockchain particularly attractive.

  • Programmability: Token contracts facilitate the automated management of aspects such as the enforcement of regulations.

  • Tamper evidence: The immutable traces of transfers on the blockchain provide evidence whether the digital ownership has been tampered with.

  • Liquidity: With tokens, ownership can readily be divided into fractions, which increases the liquidity of otherwise indivisible assets.

2.2 Acquisition and value

Tokens can be purchased (e.g., during an initial coin offering (ICO) or through a crypto exchange), traded on-chain, or received for free (e.g., during an airdrop or as a reward for a service or behavior).

The value of a token depends on supply and demand as well as on the trust of the participating community, which in turn rests on the credibility of the issuer and the service provided.

2.3 Design of token contracts

As tokens are a widespread application, coding patterns and best-practice examples are readily available, such as the collections provided by ConsenSys and OpenZeppelin. Many token contracts are generated by factories (on-chain or as a web service) according to a given specification.

Most tokens aim at establishing trust and credibility by disclosing their source code on Etherscan.io. As a service, this platform checks that the deployed bytecode is the result of compiling the source code with the given compiler settings and labels it as ‘verified source code.’

3 Types of tokens

A common high-level categorization of tokens distinguishes between payment, security, and utility tokens [30]. The need to clarify the differences arises because, in most jurisdictions, security tokens are more strictly regulated than other tokens. The main distinguishing feature is the investment purpose of security tokens, as opposed to the added value for the functioning of a product that is typical of utility tokens. Payment tokens offer little to no functionality beyond the transfer of value. Legally, the distinction is still a gray area in many jurisdictions.

3.1 Howey test

In [32], Rohr et al. base their discussion of legal aspects of token sales under US law on a similar classification of tokens and emphasize the importance of the so-called Howey test. They argue that jurisdictions should provide ‘regulatory certainty and a sensible path to compliance.’

The Howey test essentially identifies three criteria as characteristic of securities. A financial instrument is considered a security if it involves (i) the investment of money, (ii) in a common enterprise, (iii) with the expectation of profits mainly from the efforts of others [35]. For crypto-tokens, criterion (i) is met if the token is sold on-chain in exchange for a cryptocurrency or other crypto-assets. Whether a token is related to a ‘common enterprise’ mainly depends on the legal assessment of off-chain factors. For criterion (iii), an analysis of the underlying token contract may contribute to the overall assessment of the token. In Sect. 11, we will introduce our concept of ‘purity’ as an indicator that the token contract itself does not provide any means that would allow a token holder to make efforts on-chain.

3.2 Definitions

In this work, we rely on the distinction of token types as stated by the Swiss FINMA [19] as a common ground for US [32], EU [22], and other jurisdictions.

Security Tokens are ‘assets, such as a debt or equity claim on the issuer. In terms of their economic function, therefore, these tokens are analogous to equities, bonds or derivatives.’ Typically, such a token represents a share in the issuing company (equity token).

Regarding legal compliance, there is an ongoing discussion on how it could be integrated into a token standard (cf. Sect. 4), as well as into wallets and exchanges (cf. [2]).

Utility Tokens are usually backed by a project, an application, or a DApp with a definable benefit (like access) and intend to ‘provide access digitally to an application or service by means of a blockchain-based infrastructure. The issue of utility tokens does not require supervisory approval if the digital access to an application or service is fully functional at the time the tokens are issued.’ The purpose of a utility token may include voting rights, some sort of reward, or staking governance.

3.3 Categorization

As these purposes and categories may overlap for a specific token, a finer-grained classification scheme may be more adequate. Many tokens are hybrids concerning this coarse categorization [22]. Based on a literature review and a subsequent empirical study, Oliveira et al. [30] distill eight archetypes of tokens.

It would be desirable to automatically identify the type of a token that a contract implements. In this work, we discuss first steps toward this goal.

4 Interface standards for tokens

Standardized interfaces for token contracts enable applications such as wallets to recognize tokens and to interact with them. In this section, we first introduce accepted token standards and then proposed security token standards.

4.1 Accepted token standards

The community continuously discusses and establishes standard interfaces for tokens in the programming language Solidity, which is prevalent on Ethereum. The following standards have been accepted so far.

ERC-20 Token Standard [37] is the most widely used and most general token standard that ‘provides basic functionality to transfer tokens, as well as allows tokens to be approved so they can be spent by another on-chain third party.’ It lists six mandatory and three optional functions as well as two events to be implemented by a conforming API.

ERC-721 Non-Fungible Token Standard [17] concerns tokens where each token is distinct (aka non-fungible) and thus enables the tracking of distinguishable assets. Each asset must have its ownership individually and atomically tracked. This standard requires compliant tokens to implement 10 mandatory functions and three events.

ERC-777 Token Standard [8] defines advanced features to interact with tokens while remaining backwards compatible with ERC-20. It defines operators to send tokens on behalf of another address and hooks for sending and receiving in order to offer token holders more control over their tokens. This standard requires compliant tokens to implement 13 mandatory functions and five events.

ERC-1155 Multi Token Standard [31] allows for the management of any combination of fungible and non-fungible tokens in a single contract, including transferring multiple token types at once. This standard requires compliant tokens to implement six mandatory functions and four events.

4.2 Proposed security token standards

Apart from the accepted standards, several others are proposed and discussed, but not yet finalized. From the legal perspective, the following security token standards seem interesting. While the first one is rather general, the other two are project-specific and company-backed.

ERC-1462 Base Security Token [25] is a minimal extension to ERC-20 that ‘provides compliance with securities regulations and legal enforceability’ and aims at general use-cases, while additional functionality and limitations related to projects or markets can be enforced separately. Furthermore, it includes ‘KYC (Know Your Customer) and AML (Anti Money Laundering) regulations and the ability to lock tokens for an account, and restrict them from transfer due to a legal dispute.’ Moreover, it provides means to attach documents to tokens. This standard requires compliant tokens to implement four further mandatory checking functions (on top of ERC-20) and two optional documentation functions.

ERC-1450 LDGRToken [33] is a ‘security token for issuing and trading SEC-compliant securities’ that extends ERC-20. This standard ‘facilitates the recording of ownership and transfer of securities sold in compliance with the Securities Act Regulations CF, D and A.’ Apart from its own mandatory functions, it makes optional parts of ERC-20 mandatory. Moreover, it requires certain modifiers and constructor arguments to be implemented.

ERC-1644 Controller Token Operation Standard [15] ‘allows a token to transparently declare whether or not a controller can unilaterally transfer tokens between addresses.’ This is motivated by the fact that ‘in some jurisdictions the issuer (or an entity delegated to by the issuer) may need to retain the ability to force transfer tokens.’ This standard requires compliant tokens to implement three mandatory functions and two events.

ERC-1644 is part of ERC-1400 [16], a library of standards for security tokens, which requires the contained standards to be backwards compatible with ERC-20 and via extensions also with ERC-777. Additionally, the library contains ERC-1410 for differentiated ownership and transparent restrictions, ERC-1594 for on- and off-chain restrictions, and ERC-1643 for document and legend management.

5 Comparison to related work

Most of the distantly related work focuses on the financial aspects (specifically the transfer of assets), network aspects (like address clustering), or cryptocurrency platforms other than Ethereum.

5.1 Ethereum token networks and transactions

The work mentioned here is related to our approach to the extent that it deals with Ethereum tokens and transaction data.

Ethereum transactions. Chan et al. [4] analyze the transactions as a graph in order to de-anonymize addresses. With the aim of addressing security issues, Chen et al. [5] analyze the transaction graph with regard to money transfer, contract creation, and contract calls. Applying network science theory to the transaction graph, Guo et al. [21] conclude that ‘transaction volume, transaction relation, and component structure, exhibit a heavy-tailed property and can be approximated by the power law function.’ Likewise, Chen et al. [7] employ a graph approach to analyze the token ecosystem by constructing a graph each for the creators, holders, and transfers of tokens.

ERC-20 token networks. Somin et al. [34] study the token trading network in its entirety by analyzing it as a graph and show power-law properties for the degree distribution. Similarly, Victor et al. [36] measure token networks, which they define as the network of addresses that have owned a specific type of token at any point in time, connected by the transfers of the respective token.

Our Approach. Rather than de-anonymization, security issues, or trading aspects, our investigation puts a focus on the identification of token contracts that comply with an interface standard, fully or partially. Furthermore, we aim at automatically inferring the type of an implemented token. To this end, we consider transactions not from a network or graph perspective, but on the level of contract deployment (for the bytecode of the contract) and event logs as well as the call frequency of functions and contracts. Moreover, we employ the analysis of calls as an add-on to the analysis of bytecode in order to identify aspects of deployed contracts more reliably than by relying on bytecode alone.

5.2 EVM bytecode analysis

The work mentioned here is related closely to our approach since we employ bytecode analysis for identifying both standard compliant and non-compliant token contracts.

Code Clones. To detect code clones, He et al. [23] first de-duplicate contracts by ‘removing function unrelated code (e.g., creation code and Swarm code), and tokenizing the code to keep opcodes only.’ Then, they generate fingerprints of the de-duplicated contracts by a customized version of fuzzy hashing and compute pairwise similarity scores. In another approach to clone detection, Liu et al. [27, 28] characterize each smart contract by a set of critical high-level semantic properties. Then, they detect clones by computing the statistical similarity between the respective property sets. On the source code level, Kondo et al. [24] apply a tree-based clone detector to 33,000 verified contracts from Etherscan up to the year 2018.

ERC-20 Compliance. Fröwis et al. [20] as well as Norvill et al. [29] demonstrate the feasibility of identifying ERC-20 compliance via the interface of a contract. To detect token systems automatically, Fröwis et al. [20] compare the effectiveness of a behavior-based method, combining symbolic execution and taint analysis, with a signature-based approach limited to ERC-20 compliant tokens. They demonstrate that the latter approach detects 99% of the tokens in their ground-truth data set. Extracting function signatures and restoring the interface is also reported in our previous work [10, 13].

Partial Compliance. Moreover, Fröwis et al. [20] consider tokens to be partially ERC-20 compliant when they implement at least five of the six mandatory functions. While the usage of interface signatures is in line with [20, 29], our previous work extends it beyond ERC-20 compliance by including other standards as well and by discussing partial compliance [13].

Type Distinction. Next to employing a graph approach to analyze the token ecosystem, Chen et al. [7] try to classify token contracts by reading the descriptive texts in their source code, although less than 1% of the tokens provide such a text. In our previous work [14], we infer the token type via a semantic classification of the token interface.

Our Approach. The method of computing code skeletons is comparable to the first step of the similarity detection in [23]. Instead of fuzzy hashing as a second step, though, we rely on the set of function signatures extracted from the bytecode and on manual analysis, as our purpose is to identify token contracts reliably. This is in line with previous work on ERC-20 standard compliance [10, 13, 20, 29].

Regarding non-compliant tokens, we devise further methods for their identification that extend our previous work [13].

Additionally, we aim at an automatic distinction of token types. In contrast to [7], where Chen et al. use descriptive texts from the source code, we work at the bytecode level and approach the distinction via the concept of pure token contracts, which we define by the set of functions implemented in the bytecode, and we apply this concept to exemplary security tokens.

6 Terms and data

In this section, we introduce relevant terms and describe the data used for the analysis. Throughout the paper, we abbreviate the factors 1000, 1,000,000 and 1,000,000,000 by the letters k, M, and G, respectively.

6.1 Terms

We assume the reader to be familiar with blockchain technologies and cryptocurrencies in general. Regarding the specifics of Ethereum, we refer to [3, 18, 38].

6.1.1 Accounts, transactions, and messages

Ethereum distinguishes between externally owned accounts, often called users, and contract accounts or simply contracts. Accounts are uniquely identified by addresses of 20 bytes. Users can issue transactions (signed data packages) that transfer value to users and contracts, or that call or create contracts. These transactions are recorded on the blockchain. Contracts need to be triggered to become active, either by a transaction from a user or by a call (a message) from another contract. Messages are not recorded on the blockchain since they are deterministic consequences of the initial transaction. They only exist in the execution environment of the Ethereum Virtual Machine (EVM) and are reflected in the execution trace and potential state changes. We use ‘message’ as a collective term for any (external) transaction or (internal) message.

6.1.2 Application binary interface (ABI)

Most contracts in the Ethereum universe adhere to the ABI standard [1], which identifies functions by a particular hash of the header. More precisely, such a function signature consists of the first four bytes of the Keccak-256 hash of the function name concatenated with the parameter types. The bytecode of a contract contains instructions that compare the first four bytes of the call data to the signatures of its functions. The latter can usually be found literally in the deployed bytecode and indicate that the contract implements functions with these headers.

Another component of the interface are events. Emitting an event during the execution of a contract results in a log entry that can be observed by off-chain programs. Events are implemented via the instruction LOG, whose first argument is the hash of the event header. The presence of the hash in the bytecode indicates the ability to issue the corresponding event.
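To make this concrete, the following minimal Python sketch (assuming pycryptodome for Keccak-256) derives the 4-byte function signature and the 32-byte event hash from a header; the ERC-20 transfer function and the Transfer event serve as examples.

```python
# Minimal sketch of ABI signature computation (assumes pycryptodome is installed).
from Crypto.Hash import keccak

def keccak256(header: str) -> str:
    return keccak.new(data=header.encode("ascii"), digest_bits=256).hexdigest()

def function_signature(header: str) -> str:
    # first four bytes (eight hex digits) of the Keccak-256 hash of the header
    return "0x" + keccak256(header)[:8]

def event_signature(header: str) -> str:
    # events are identified by the full 32-byte hash of their header
    return "0x" + keccak256(header)

print(function_signature("transfer(address,uint256)"))
# 0xa9059cbb
print(event_signature("Transfer(address,address,uint256)"))
# 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef
```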

6.2 Database

Our analysis is based on the transaction data of the Ethereum main chain up to block 10.5 M, which was mined on July 21, 2020. We retrieve the blocks, transactions, and execution traces via the RPC interface of the Ethereum client OpenEthereum v3.0.1. To speed up the analysis of contracts, we use the verified source code of contracts at Etherscan. If not available, we resort to disassembling or decompiling the bytecode.

For efficient querying, we store the data in a Postgres database. Each of the 2 G messages (creations, calls, and self-destructions) is uniformly represented by a record composed of an abstract timestamp, the message type, the success status, the addresses of context, sender and recipient, the input and output data, and the transferred amount of Ether.
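As an illustration only, the uniform message record could be modeled as follows; the field names and types are our own sketch, not the actual database schema.

```python
# Illustrative sketch of the uniform message record (hypothetical field names,
# not the actual Postgres schema used for the analysis).
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class MessageType(Enum):
    CREATE = "create"
    CALL = "call"
    SELFDESTRUCT = "selfdestruct"

@dataclass
class Message:
    timestamp: int            # abstract timestamp of the message
    kind: MessageType         # creation, call, or self-destruction
    success: bool             # success status of the execution
    context: str              # context address
    sender: str               # sender address
    recipient: Optional[str]  # recipient address
    input_data: bytes         # call data or deployment code
    output_data: bytes        # return data or deployed code
    value_wei: int            # transferred amount of Ether in wei
```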

6.2.1 Contracts

For each of the 28.1 M successful creation messages, our table of contracts contains an entry with the timestamps of start and end of deployment, of the contract’s first use after deployment, and of an optional self-destruction. Moreover, we store the deployment and the deployed bytecode, the deployment address, and the address of the creator.

6.2.2 Bytecodes

Frequently, contracts share the same bytecode. For each of the 300 k distinct codes, our table of codes contains the function and event signatures. Moreover, we maintain dictionaries with 400 k function and 60 k event headers that allow us to reconstruct the headers for the majority of signatures. See the next section for details.

6.2.3 Logs

For each of the 710 M LOG instructions executed so far, our table of log entries records a timestamp, the context address, and several fields of log data. The first field holds the hash of the event header. We are particularly interested in the standardized event accompanying token transfers, which accounts for 60% of the entries.

6.2.4 Messages

The dynamic data, i.e., the calls to and from contracts as well as the emitted events, are sparse and noisy. For most contracts, only a small fraction of the offered functions has ever been called, and many events have never been emitted. Moreover, observing a call to a contract with a particular signature does not mean that the corresponding function is indeed implemented; often a so-called fallback function catches unknown signatures without raising an error. Only if a function is called frequently is it safe to assume that it is part of the interface. To get a more complete picture, we accumulate the dynamic data for all contracts with the same bytecode.

6.2.5 Proxies

Furthermore, proxies are a phenomenon to be considered. They forward incoming calls to a central contract via a particular type of call. This way the proxy contract may implement an interface without containing the corresponding signatures in its bytecode. We identify proxies statically via their bytecode as well as dynamically by detecting the forwarding of calls.

7 Methods for ERC-compliant token contracts

In this section, we concentrate on contracts that comply with the token standards in Sect. 4, referring to them as ‘fully compliant’, and summarize methods to identify them. In Sect. 9, we consider methods for token contracts that are partially or not compliant.

Behavior-oriented approach. The central task of a token contract is bookkeeping. Each token contract maintains a data structure that maps user ids like addresses to quantities of fungible tokens or lists of non-fungible ones. Moreover, it usually implements functions for querying the data structure and for transferring tokens between users.

Chen et al. [6] observe the EVM execution trace to capture changes in the bookkeeping of a token. Then, they try to match the found changes with emitted events in order to detect inconsistencies.

Fröwis et al. try to detect bookkeeping by symbolic execution and taint analysis of the bytecode in order to identify token contracts. Due to the difficulty of the problem, this method is still less effective than the interface approach [20]. We therefore resort to interface methods in our analysis.

Interface-oriented approach. Token contracts need to be accessible by wallets and exchanges; hence, they offer standardized interfaces. We therefore expect fully compliant token contracts to be identifiable by the functions and events they implement. It is unlikely that a contract offers six or more functions with the profiles prescribed by a standard without implementing token semantics. We found a single bogus contract whose interface pretends to be a token contract but that does not record token holdings.

Figure 1 gives an overview of the procedure for interface reconstruction. In the first step, we split the raw bytecode into sections. Then, we locate all function entry points as well as selected events in the first code section; their signatures form the interface. For many signatures, we are able to restore the original headers, which helps to understand the purpose of the contract. In the following, we describe the algorithms in more detail.

Fig. 1 Interface reconstruction: after decomposing the bytecode into code sections, data and meta-data, we extract the function and event signatures from the first code section. Using dictionaries with known headers, we are able to restore most function and event headers

7.1 Skeletons

To detect functional similarities between contracts, we compare their skeletons. They are obtained from the bytecodes of contracts by replacing meta-data, constructor arguments, and the arguments of the operations \({\texttt {PUSH}}\) uniformly by zeros and by stripping trailing zeros. The rationale is to remove variability that has little to no impact on the functional behavior (like the swarm hashes added by the Solidity compiler or hard-coded addresses of companion contracts). Skeletons allow us to transfer knowledge gained about one contract to others with the same skeleton.
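A simplified version of the skeleton computation for a single code section could look as follows; it assumes that meta-data and constructor arguments have already been removed by the sectioning step described in Sect. 7.2, so only the PUSH arguments remain to be zeroed.

```python
# Simplified skeleton computation for one code section (meta-data and
# constructor arguments are assumed to have been removed already, cf. Sect. 7.2).
def skeleton(code: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(code):
        op = code[i]
        out.append(op)
        i += 1
        if 0x60 <= op <= 0x7F:                 # PUSH1 .. PUSH32
            n = min(op - 0x5F, len(code) - i)  # number of argument bytes
            out.extend(b"\x00" * n)            # replace the pushed constant by zeros
            i += n
    return bytes(out).rstrip(b"\x00")          # strip trailing zeros
```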

As an example, the 28.1 M contract deployments correspond to just 140 k distinct skeletons. This is still a large number, but more manageable than the 298 k distinct bytecodes. By exploiting creation histories and the similarity via skeletons, we are able to relate 13.7 M contract addresses to one of the 92 k source codes on Etherscan, an increase from 0.3% to 49%.

7.2 Sectioning EVM bytecode

As preparation for code analysis techniques like code skeletons, signature extraction, and control flow graphs, we decompose the bytecode of contracts into code, data, and meta-data sections, as otherwise parts of the bytecode may be misinterpreted. Apart from the proper contract code at the beginning, the bytecode may contain the code of further contracts to be deployed as well as literals. Moreover, the Solidity compiler adds meta-data with information on the source code and the compiler version. Meta-data may be followed by constructor arguments. Some bytecodes consist of more than 40 sections with as many as 14 meta-data parts.

The decomposition takes place in three stages. First, meta-data can be unambiguously detected as CBOR-encoded mappings that contain one of the keys bzzr0, bzzr1, or ipfs. Second, the byte strings before, between, and after meta-data are split at instruction sequences that are characteristic for the start of a new contract; they are marked as code. Finally, the parts after meta-data that do not start with a characteristic sequence are labeled as constructor arguments.
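The first stage can be sketched as follows for the common case of a single meta-data blob at the end of the bytecode; Solidity appends the CBOR mapping followed by its length as a 16-bit big-endian integer. The full algorithm additionally handles meta-data in the middle of the bytecode, multiple contained contracts, and constructor arguments.

```python
# Heuristic sketch of meta-data detection at the end of a bytecode; the paper's
# sectioning also covers meta-data occurring in the middle of the bytecode.
METADATA_KEYS = (b"bzzr0", b"bzzr1", b"ipfs")

def split_trailing_metadata(code: bytes):
    """Return (remaining bytecode, meta-data blob or empty bytes)."""
    if len(code) < 2:
        return code, b""
    length = int.from_bytes(code[-2:], "big")   # length of the CBOR mapping
    if length + 2 > len(code):
        return code, b""
    candidate = code[-(length + 2):-2]
    if any(key in candidate for key in METADATA_KEYS):
        return code[:-(length + 2)], candidate
    return code, b""
```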

Evaluation. To validate our method, we count the number of good and bad jumps. For each instruction JUMP(I) preceded by a PUSH of the target address, we determine whether the target instruction is JUMPDEST (good jump) or not (bad jump). Bad jumps raise an exception that reverts the entire transaction, so they are used only infrequently in regular code. If, on the other hand, the sectioning algorithm determines the start of a code section incorrectly, then virtually all jumps will be bad jumps. We find that our decomposition heuristic works correctly for 99.9% of the bytecodes. The first code section, relevant for extracting function signatures (see below), is faulty for only 0.03% of the bytecodes. Among the faulty cases are ‘contracts’ that are actually data repositories for other contracts and are not meant to be executed.

7.3 Extracting function signatures

When calling a contract that adheres to the standard for application binary interfaces (ABI), the first four bytes of the call data identify the function to be executed. The contract compares these bytes to the signatures of the implemented functions and branches to the respective code. To aid code analysis, tools like Mythril heuristically identify byte sequences involved in comparisons, look them up in a database and, if successful, annotate the code with the function header found. Since a function header exists, chances are high that the byte sequence indeed is a signature.

Our goal is different, as we want to reconstruct the interfaces reliably, regardless of whether signatures correspond to known function headers or not. We need to avoid that arbitrary data or signatures of other code sections are mistaken as part of the interface. Therefore, we identify the first code section of the contract and then apply algorithm 1. It uses eight pairs of regular expressions, where the first expression in each pair locates the code that reads the call data, and the second one is applied repeatedly to extract the signatures from the comparisons. Tables 1 and 2 show one such pair.

Algorithm 1: Extraction of function signatures from the first code section (listing not reproduced)
Table 1 One of the regular expressions \({\textit{reData}}_i\). It specifies 44 equivalent code fragments that push the first four bytes of the calldata on the stack. \({\texttt {PUSH}}\ 2^{224}\) is shorthand for a reg.exp. describing five ways of putting the constant \(2^{224}\) on the stack. Question marks with the same index denote elements that are simultaneously present or missing
Table 2 One of the regular expressions \({\textit{reSig}}_i\). It specifies two equivalent code fragments that compare \({\textit{signature}}\) to the top of the stack and, on equality, jump to offset
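As a rough illustration of Algorithm 1, the following Python sketch matches only the most common Solidity dispatcher fragment (DUP1, PUSH4 signature, EQ, PUSH1/PUSH2 offset, JUMPI) on the hex-encoded first code section. The actual algorithm uses eight pairs of regular expressions such as those in Tables 1 and 2 and covers many more code variants, so this sketch is not a faithful reimplementation.

```python
# Greatly simplified stand-in for Algorithm 1: extract 4-byte function
# signatures from the most common Solidity dispatcher pattern only.
import re

DISPATCH_RE = re.compile(
    r"80"                               # DUP1
    r"63([0-9a-f]{8})"                  # PUSH4 <signature>
    r"14"                               # EQ
    r"(?:60[0-9a-f]{2}|61[0-9a-f]{4})"  # PUSH1/PUSH2 <jump offset>
    r"57"                               # JUMPI
)

def extract_signatures(code_section_hex: str) -> set:
    return {m.group(1) for m in DISPATCH_RE.finditer(code_section_hex.lower())}
```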

Evaluation. We evaluated the algorithm on the bytecodes of 81,000 verified contracts from Etherscan, using the ABIs listed by Etherscan as ground truth. The signatures extracted by our tool differed from the ground truth in 71 cases. We verified manually that our tool was also correct in these cases, whereas the ABIs on Etherscan did not faithfully reflect the signatures in the bytecode (e.g., due to compiler optimization or library code). The validation set consisted almost exclusively of bytecode generated by the Solidity compiler (covering most of its releases), with just a few samples of LLL and Vyper code. We therefore regard the validation as representative of the 12.5 M deployed contracts generated by the Solidity compiler.

Another group of contracts consisted of 11.4 M short contracts, mainly gasTokens, but also proxies (contracts redirecting calls elsewhere) and contracts involved in attacks. They do not have entry points, and our algorithm also does not detect any. The same holds for a third large group of 4.2 M contracts that self-destruct at the end of the deployment phase.

After subtracting these groups from the total of 28.1 M, we are left with 3.1 k contracts (732 skeletons). For these, our tool shows an error rate of 8%, extrapolated from a random sample of 60 skeletons that we manually checked. This amounts to an error rate below \(10^{-5}\) in relation to all deployments.

7.4 Extracting event signatures

On the source code level, so-called events are used to signal state changes to the world outside the blockchain. On the machine level, an event is implemented as the instruction LOG with the unabridged Keccak-256 hash of the event header as identifier. We currently lack a tool that extracts event signatures as efficiently and reliably as function signatures. The LOG instructions and their arguments are harder to detect, as they are distributed throughout the code. We can check, however, whether known event signatures occur in the code section of the bytecode, as the 32-byte sequences are virtually unique. This heuristic fails in cases where the event is actually implemented in another contract that is called via DELEGATECALL, where the signature is kept in the data section, or where the signature is missing in our collection of 58 k event signatures. In spite of that, event signature extraction performs reasonably well: Evaluating the method with the most frequent Transfer event and the 81 k source codes from Etherscan yields less than 0.2 k mismatches.
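The check itself is a simple containment test; the sketch below uses the Transfer event hash computed in Sect. 6.1.2 as the default.

```python
# Sketch of the event heuristic: a known 32-byte event signature that occurs
# literally in the code section is taken as evidence that the contract is
# able to emit the corresponding event.
TRANSFER_EVENT = bytes.fromhex(
    "ddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef")

def may_emit(code_section: bytes, event_hash: bytes = TRANSFER_EVENT) -> bool:
    return event_hash in code_section
```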

7.5 Header restoration

To understand the purpose of a contract, we try to recover the original function and event headers from the signatures. This reverse translation of the (partial) hashes is accomplished with a database of headers (plain text) with corresponding signatures (hash). We use our own collection of signatures, which extends those of the verified contracts on the main chain with signatures from test nets as well as from 600 projects we found on GitHub.

For event signatures, we always succeed, as the method in the last section detects the signatures of only those 58 k event headers that we have collected from various repositories. There are no ambiguities, since the signature is a 256-bit cryptographic hash of the header.

Our method for extracting function signatures, on the other hand, will detect any signature. Up to block 10.5 M, a total of 312 k different signatures was in use. Over time we have collected 402 k function headers with their signatures. By using this dictionary in reverse, we are able to restore 60% of the extracted signatures. Taking the deployment frequency into account—some signatures are used more frequently than others—the ratio rises to 90%. In contrast to the event signatures, we may encounter collisions for the 32-bit signatures of function headers. These are rare, however: Of the 402 k signatures, only 27 occur with a second header in the dictionary.

For example, when extracting the function signatures by applying Algorithm 1 to the bytecode of the address 0x776f55fa27644705156a46e8c1b2dc28ca122832 created in block 268036, we obtain the signatures 0x41c0e1b5, 0x6b590248 and 0xecfc0073. Our dictionary translates the first two signatures to the headers kill() and getDigit(), whereas the header for 0xecfc0073 remains unknown.
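A tiny illustration of the reverse lookup, using the signatures from this example (and pycryptodome for the hash); the real dictionary holds roughly 402 k headers.

```python
# Header restoration by reverse lookup in a (here: tiny) dictionary of headers.
from Crypto.Hash import keccak

def sig_of(header: str) -> str:
    return "0x" + keccak.new(data=header.encode(), digest_bits=256).hexdigest()[:8]

COLLECTED_HEADERS = ["kill()", "getDigit()", "transfer(address,uint256)"]
SIG_TO_HEADER = {sig_of(h): h for h in COLLECTED_HEADERS}

for sig in ("0x41c0e1b5", "0x6b590248", "0xecfc0073"):
    print(sig, "->", SIG_TO_HEADER.get(sig, "<header unknown>"))
# 0x41c0e1b5 -> kill(), 0x6b590248 -> getDigit(), 0xecfc0073 -> <header unknown>
```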

8 ERC-tokens over time

In this section, we show the results of extracting interfaces from all bytecodes deployed on Ethereum in order to identify ERC-compliant token contracts.

A contract is called fully ERC-compliant if it provides at least one of the standard interfaces mentioned in Sect. 4. Table 3 lists the number of compliant token contracts, including the numbers of unique bytecodes and skeletons. With over 214 k deployments (97%), ERC-20 is by far the most commonly used standard. The remaining 3% are almost exclusively contracts adhering to the ERC-721 standard for non-fungible tokens. The other standards are deployed in small numbers. A few token contracts comply with more than one standard and are counted more than once. In total, we count 221 k fully ERC-compliant token contracts.

Table 3 Full compliance of deployed token contracts
Fig. 2 Creation of ERC-compliant token contracts other than ERC-20. The lower horizontal axis indicates the Ethereum blocks, while the upper axis shows the corresponding dates. Each bar represents a bin of 100,000 blocks (corresponding roughly to 2 weeks)

Figure 4 in Sect. 11 shows the deployment of ERC-20 compliant tokens on a time line, while the contracts complying with other standards are depicted in Fig. 2. The mass deployment of ERC-20 tokens started in the middle of 2017, peaked in the first half of 2018 and later stabilized at about 1000 deployments per week. The deployment of ERC-721 compliant contracts started in 2018, with the numbers rising steadily to 150 deployments per week at the beginning of 2020. Since then, the numbers have fallen to 60 deployments per week. The other standards start to appear in small numbers at the beginning of 2019.

9 Identification of non-compliant tokens

In this section, we focus on methods for identifying non-compliant token contracts. In order to be able to evaluate these methods, we first need to clarify the notion of tokens and token contracts. Based on our definition of tokens, we compile a list of contracts that we manually classify as tokens or non-tokens, serving as a ground truth for the evaluation. Then, we discuss four indicators regarding their potential effectiveness in detecting token contracts. The indicators rely on the bytecode techniques described in Sect. 7, on message statistics as well as on some additional techniques described with the indicators below.

9.1 When is a contract a token contract?

Related work. Oliveira et al. [30] introduce several token archetypes that go beyond the common distinction into security, utility, and payment token. The semantic characterization of the numerous types demonstrates that understanding the purpose of a token involves more factors than just the code. Moreover, the level of code analysis required for most distinctions is not readily automatable.

Chen et al. present the tool TokenScope [6] that monitors transfer events, transfer calls, and changes to the token balances in storage. Whenever any two of them differ, the contract is flagged as behaving inconsistently with regard to the ERC-20 standard. Their concept of a token contract is thus closely tied to the standard.

Lambert et al. [26] put a focus on security token offerings (STOs) as opposed to initial coin offerings (ICOs) and clarify how security tokens differ from both utility and payment tokens. According to them, ‘a security token is a digital representation of an investment product, recorded on a distributed ledger, subject to regulation under security laws.’

Darisi et al. [9] propose mechanisms for the exchange of tokens within and between blockchains. They characterize tokens by the basic parameters name, symbol, initial supply, decimals, and fungibility.

Our Definition of token contracts. Our aim is to develop criteria that enable us to determine whether a contract can be considered a token or not. The criteria should neither be too abstract as we need to apply them to code, nor should they refer to particular standards.

The main functionality of token contracts comprises the maintenance of a ledger that records token holdings and the ability to change token ownership by modifying the ledger. The change of ownership may take different forms, including simple transfers initiated by the owner, safe transfers where the recipient has to claim the approved tokens, the distribution of tokens via airdrops, and the trading of tokens for other crypto-assets. Additionally, token contracts may implement features like administrative roles, token locking, contract halting, and getters/setters for state variables.

As a minimum requirement, a token contract has to satisfy the following criteria:

  • Bookkeeping: The token contract maintains a ledger that maps the id of token holders (e.g., their addresses) to the tokens they own.

  • Token flow: The token contract provides functionality to transfer tokens between holders, to trade tokens for crypto-assets, and/or to consume tokens.

Whether and in which way the values in a ledger represent tokens depends on the code semantics. In our manual assessment of contracts, we encountered only a small number of borderline cases.

9.2 A ground truth for token contracts

To evaluate the indicators below, we compile a collection of contracts, or rather bytecodes, for which we determine whether they are non-compliant (not fully ERC-compliant) token contracts or non-tokens. Table 4 provides an overview of this collection.

Table 4 Contracts for the evaluation of the indicators (ground truth)

Etherscan offers a list of several thousand contracts labeled as tokens, of which we selected the non-compliant ones. In prior work [12], we identified numerous wallet contracts. These are interesting, as they interact with token contracts and sometimes include functions similar to token contracts. For the manually classified samples, we selected the bytecodes of frequently deployed or called contracts as well as a random assortment of less active contracts that share some function with the ERC-20 standard.

Limitations. It remains unclear how Etherscan identifies token contracts. We asked the maintainers about the criteria, but have not yet received an answer. Among the manually classified contracts, there are some tokens that fit our definition but with hardly any similarities to standard tokens. For example, in the lottery game ‘Fomo3DSoon’, a player buys ‘keys’ that later can be exchanged for the reward. We therefore expect that none of our simple indicators will be able to recognize such tokens.

9.3 Indicator \(I_1\): single ERC-20 signatures

As interfaces are collections of signatures, we evaluate the predictive power of single ERC-20 signatures. Table 5 lists the frequencies of the most common ones. The first nine signatures are precisely the mandatory and optional functions of the ERC-20 standard. The next two signatures are used in all sorts of contracts (including tokens) to manage ownership, while the last three are again related to tokens.

Limitations. This approach has the same issues as the interface method in Sect. 7.

Table 5 Top signatures ranked by the number of bytecodes they appear in. The six functions mandated by ERC-20 are set in bold, the three optional ones in italics

9.4 Indicator \(I_2\): multiple ERC-20 signatures

Non-compliant token contracts often implement at least some of the mandatory functions. Hence, we investigate subsets of the ERC-20 interface and attempt to find a threshold.

Clearly, transfer functions play a central role, as is documented by both Tables 5 and 6. Table 6 lists the most frequently called signatures, with the function transfer being by far the most common. A variant of the transfer function is listed as the third most common.

Table 6 Top signatures ranked by their frequency in calls

Even though the function transfer is essential, it is not specific to token contracts. Therefore, we investigate its interplay with the other ERC-20 functions. Table 7 lists the number of contracts that provide a subset of the six mandatory ERC-20 signatures. We differentiate the numbers according to the presence or absence of the function transfer and indicate the actual deployments on-chain, the corresponding unique bytecodes, and the respective skeletons.

Table 7 Implemented ERC-20 functions with and without transfer

Table 7 shows that 224.5 k deployed contracts implement the mandatory ERC-20 interface. Moreover, there are 211.2 k contracts that provide only the function transfer but none of the other mandatory ERC-20 functions. These contracts are remarkably uniform, as the small set of just 660 bytecodes and 579 skeletons shows, corresponding to a code reuse factor of several hundred. This hints at factory-produced non-token contracts (e.g., wallets) implementing a function transfer.

The numbers in Table 7 do not suggest an obvious threshold for the number of functions required to detect tokens. At the same time, the number of contracts implementing two to five mandatory functions is non-negligible. For this indicator, we therefore compare different thresholds for the number of ERC-20 signatures, with and without the transfer function.

Limitations. This approach has the same issues as the interface method in Sect. 7. Additionally, it hinges on the threshold.

9.5 Indicator \(I_3\): contract name

For some deployed contracts, the source code has been uploaded to Etherscan (cf. Sect. 6). In these cases, the name of the contract assigned by the developers may reveal its purpose. This indicator considers all bytecodes for which any corresponding source code has a contract name that ends with ‘token’ or ‘coin’ (case insensitive).

Table 8 lists the number of deployed contracts and respective bytecodes that we can associate with a name from Etherscan. In the first line, the high number of over 3 M deployments with associated names mainly results from wallets, since a high factor between bytecodes and deployments is atypical for tokens, but typical for wallets [12].

The last line in Table 8 shows that there is a substantial number of deployments (and bytecodes) that are not ERC-compliant but where the contract name ends with ‘token’ or ‘coin.’ Therefore, this indicator seems worth investigating.

Table 8 Contracts with associated names

Limitations. Even though token contracts are more likely to have their source code on Etherscan (as a means of building trust), the source and thus the name of many contracts is not available. Moreover, this approach misses tokens named differently or may yield false positives.

9.6 Indicator \(I_4\): transfer events

Token standards usually require compliant contracts to emit an event when transferring tokens. It indicates the affected token contract as well as the sender and receiver. Thus, events may help to identify token contracts.

We use two approaches to associate tokens with events. First, we search the bytecode for the signature of relevant events (see Sect. 7.4 for details). Second, we search the log entries for events that actually happened. Both methods complement each other, as events overlooked by static extraction (e.g., because of proxying) show up as log entries at their first use, whereas extraction detects events even if they have not been emitted so far.

Indicator \(I_4\) considers bytecodes that contain the signature of the event Transfer(address,address,uint256) or whose deployments have actually emitted this event. This particular event accounts for 61% of the 710 M log entries and signals that the number of tokens given as the third argument has been transferred from the first to the second address. All standards in Sect. 4 require this event, except for ERC-1155, which replaces it by TransferSingle and TransferBatch.

Limitations. This approach misses contracts if they do not implement the event, or if the signature cannot be detected in the code and the event is never emitted because the contract remains unused. In rare cases, non-token contracts use this event for other purposes.

According to the token standards, both addresses are indexed, whereas the token amount is added as further data. Some token contracts choose other indexing schemes, leading to ambiguities in the interpretation of log entries. Moreover, a few contracts do not use regular Ethereum addresses but idiosyncratic addresses. Neither situation occurs with fully compliant tokens.

10 Comparison of indicators for non-compliant tokens

In this section, we first compare the effectiveness of the indicators for non-compliant token contracts on our token ground truth (TGT), measured by precision and recall. Then, we discuss combinations of indicators.

Let \({t\!p}\) (true-positive) denote the number of positive TGT instances classified correctly as a token, let \({f\!p}\) (false-positive) be the number of negative TGT instances classified wrongly as a token, and let \({f\!n}\) (false-negative) be the number of positive TGT instances classified wrongly as a non-token. Precision is computed as the quotient \({t\!p}/({t\!p}+{f\!p})\). It is the ratio of token contracts correctly identified to all contracts identified as token. A precision value close to one means that the number of non-tokens mistaken as tokens is small; if the indicator classifies a bytecode as a token, then it most likely is one. Recall is computed as the quotient \({t\!p}/({t\!p}+{f\!n})\). It is the ratio of token contracts correctly identified to all token contracts. A recall value close to one means that the number of positive instances not recognized as tokens is small; if the indicator is applied to a token contract, then it is most likely classified as such.
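For reference, evaluating a boolean indicator against the ground truth then amounts to the following sketch, where ground_truth maps a bytecode to its manual token/non-token label and indicator is the predicate under test (the names are our own).

```python
# Sketch: computing precision and recall of an indicator on the token ground truth.
def evaluate(indicator, ground_truth):
    tp = sum(1 for k, is_token in ground_truth.items() if is_token and indicator(k))
    fp = sum(1 for k, is_token in ground_truth.items() if not is_token and indicator(k))
    fn = sum(1 for k, is_token in ground_truth.items() if is_token and not indicator(k))
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall":    tp / (tp + fn) if tp + fn else 0.0,
    }
```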

10.1 Indicator \(I_1\): single ERC-20 signatures

In Table 9, we list precision and recall for the indicator that tests for the presence of a specific ERC-20 signature in the interface of a bytecode.

Table 9 Indicator \(I_1\): single ERC-20 signatures

Only four ERC-20 signatures with values of about 100% are sufficiently precise to serve as indicators. Of these, only the function totalSupply is able to detect the majority (68%) of non-compliant tokens in the ground truth. Thus, the indicator ‘implements totalSupply()’ is distinctive, even though the function is not present in about a third of the token contracts in our sample.

The low precision of the three optional functions name, symbol, and decimals stems from the fact that they also appear in wallets and thus have a low specificity.

10.2 Indicator \(I_2\): multiple ERC-20 signatures

In Table 10, we list precision and recall for indicators that test whether the number of signatures that an interface shares with the ERC-20 standard is above a given threshold. Due to the significance of the transfer function, we also consider the variant where the transfer function has to be among the shared ones, as well as the variant where it is omitted when counting the overlap.

Table 10 Indicator \(I_2\): multiple ERC-20 signatures, in three varieties: unrestricted selection, always including and always excluding the simple transfer function. ‘6 of 6’ does not apply to non-compliant contracts. None of the samples in the ground truth satisfies ‘5 of 6 signatures excl. transfer’

Two indicators stand out, the one testing for the presence of at least three out of six ERC-20 functions and the one with a threshold of two out of five functions (with transfer excluded). Both have a precision close to 100% and a recall of almost 70%. The other indicators are either far worse in precision or in recall.

The reason for the slightly better recall in the absence of the function transfer may lie in its low specificity.

10.3 Indicator \(I_3\): contract name

In Table 11, we list precision and recall for the indicator that tests for specific endings in the contract names in the source code.

Table 11 Indicator \(I_3\): Contract name

All variants show a high precision: A contract called coin or token is indeed a token. Recall is poor, as we do not have associated source code for most bytecodes. Moreover, even a token contract with available source code may have a non-descriptive name.

Because of its high precision, however, this indicator may still be helpful when combined with others.

10.4 Indicator \(I_4\): transfer events

In Table 12, we list precision and recall for the indicator that considers the event Transfer(address,address,uint256) in the bytecode or among the log entries.

Table 12 Indicator \(I_4\): Transfer events
Table 13 Combination of indicators for non-compliant tokens. Legend: ‘coin/token’ stands for ‘name ends with coin or token’; ‘\(\ge 3\) sigs’ for ‘at least three ERC-20 signatures’; ‘totalSupply’ and ‘approve’ for the occurrence of the respective function in the interface; ‘\(\ge 2\) sigs excl transfer’ for ‘at least two ERC-20 signatures, but not transfer(address,uint256)’; and ‘ETransfer’ for a transfer event in the bytecode or the log entries

The high precision shows that this event is indeed typical of tokens. The two ways of detecting the event apparently complement each other, as their combination shows a significantly better recall.

10.5 Combination of indicators

After having analyzed the four indicators individually, we look for combinations that reduce the number of false negatives and false positives even further. Table 13 ranks the best individual indicators from above as well as the best combinations that we found.

As demonstrated above, the best single indicator is not related to function signatures, but to events. It can be improved by a few percent when using it in conjunction with one of the other top indicators, like a test for the function totalSupply. Adding even more indicators may increase the recall at the price of lowering precision.

One may wonder about the remaining 16% of token contracts that go undetected. A few of them are the manually selected samples that conceptually are tokens but bear no resemblance to the ERC standards. To detect such contracts, we need more sophisticated methods that analyze the code. The majority of undetected ‘tokens’ are contracts labeled as such by Etherscan. Closer inspection of random samples reveals that these contracts are in fact not tokens. As future work, we will have to clean the data to get a better picture.

10.6 Non-compliant tokens

Based on the evaluation of indicators above, we regard a contract as a non-compliant token if it complies with none of the ERC standards, but has a transfer event in the bytecode or the log entries or shows at least two of five ERC-20 signatures (ignoring transfer(address,uint256)) in its interface.
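The combined rule can be sketched as a predicate over the statically and dynamically gathered features of a bytecode; the ERC-20 selectors below are the well-known signatures of the six mandatory functions, and the function name is our own.

```python
# Sketch of the combined rule for non-compliant tokens, as stated above.
ERC20_SIGS = {
    "a9059cbb",  # transfer(address,uint256)
    "23b872dd",  # transferFrom(address,address,uint256)
    "095ea7b3",  # approve(address,uint256)
    "70a08231",  # balanceOf(address)
    "18160ddd",  # totalSupply()
    "dd62ed3e",  # allowance(address,address)
}
TRANSFER_SIG = "a9059cbb"

def is_noncompliant_token(interface: set, has_transfer_event: bool,
                          fully_compliant: bool) -> bool:
    if fully_compliant:          # already covered by the methods of Sect. 7
        return False
    overlap = (interface & ERC20_SIGS) - {TRANSFER_SIG}
    return has_transfer_event or len(overlap) >= 2
```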

Fig. 3 Deployment of all token contracts, differentiated into ERC-compliant ones in green and non-compliant ones in blue. The lower horizontal axis indicates the Ethereum blocks, while the upper axis shows the corresponding dates. Each bar represents a bin of 100,000 blocks corresponding roughly to 2 weeks (color figure online)

Figure 3 depicts the creation of 221 k (81%) compliant and 51 k (19%) non-compliant tokens over time. The activity of both groups is in proportion to their numbers: 648 M (82%) of the messages are related to compliant tokens, and 141 M (18%) to non-compliant ones. Together, tokens are responsible for 40% of the total message volume on Ethereum.

The high number of non-compliant tokens may come as a surprise. While non-compliance was to be expected in the early days, before ERC-20 was finalized and adopted, we still see many newly deployed non-compliant tokens. It should be noted that not all features of a standard are needed for a token to be usable (e.g., approve, transferFrom, allowance).

11 Purity of token contracts

In this section, we focus on the distinction between security and utility tokens. As laid out in Sect. 3, a utility token should provide some service or product for the token holder. Our aim is to detect the absence of such a service or product in the code of a token contract as an indicator for a potential security token.

We approach the task by assessing whether a token contract implements functionality beyond token and user management. Our heuristic method uses a set of pattern-based rules to partition the signatures of an interface into the three groups ‘token-related’, ‘neutral’ and ‘other’ (see below for details).

Definition of Purity. We call a token contract pure if its interface consists exclusively of functions that our algorithm classifies as ‘token-related’ or ‘neutral.’ For a pure token contract, our method finds no evidence that it offers a genuine service or product on-chain; it thus may be a security token. Non-pure tokens, on the other hand, are more likely to be utility tokens, implementing a service by means of the ‘other’ functions.

Limitations. This approach considers contracts in isolation, disregarding companion contracts and off-chain components. A pure token might in fact be part of a decentralized application that, as a whole, offers a service. A comprehensive assessment requires a manual analysis of the context in which the token contract operates.

11.1 Grouping function headers

A precise classification of function headers in large quantities would require an automated code analysis that checks for semantic properties, which is a difficult problem and not yet adequately solved. Instead, we present a heuristic test that is based on the observation that the functions of token interfaces can be categorized into the following three groups.

The token-related group comprises the core functions mandated by the standards, as well as related functions to create, destroy, and distribute tokens. The second group contains the neutral functions that can appear in any type of contract, like getters and setters for public variables or role management. The third group consists of the remaining functions, which may rely on tokens but are not necessary for operating them.

Algorithm 2 classifies functions according to their name. For a given header, it repeatedly applies rules like those in Table 14. Each rule consists of an inclusion pattern, an exclusion pattern, and a label. If the inclusion but not the exclusion pattern matches the function header, then the header is tagged with the label.

For our proof of concept, we specified 22 rules that divide headers into 17 categories. The categories token, distribution, auction, minting, approval, kyc, ico, transfer, crowdsale, airdrop, and burning determine the group of ‘token-related’ headers, whereas the categories control, math, getter, setter, trading, and roles constitute the ‘neutral’ group. Function headers without a tag as well as signatures without restored header form the ‘other’ group.
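
The sketch below illustrates the rule application and the grouping. The regular expressions are hypothetical examples in the spirit of Table 14, not our actual rule set, and, for simplicity, the sketch stops at the first matching rule.

import re
from typing import Optional

# Three hypothetical example rules (inclusion pattern, exclusion pattern,
# category); the actual rule set consists of 22 such rules.
RULES = [
    (r"[Mm]int",          r"[Mm]inting[Ff]inished", "minting"),
    (r"[Oo]wner|[Rr]ole", None,                     "roles"),
    (r"[Tt]ransfer",      None,                     "transfer"),
]

TOKEN_RELATED = {"token", "distribution", "auction", "minting", "approval",
                 "kyc", "ico", "transfer", "crowdsale", "airdrop", "burning"}
NEUTRAL = {"control", "math", "getter", "setter", "trading", "roles"}

def tag_header(header: str) -> Optional[str]:
    """Tag a function header with the category of the first rule whose
    inclusion pattern matches while its exclusion pattern does not."""
    name = header.split("(")[0]
    for include, exclude, category in RULES:
        if re.search(include, name) and not (exclude and re.search(exclude, name)):
            return category
    return None  # untagged

def group_of(header: Optional[str]) -> str:
    """Map a (possibly missing) restored header to one of the three groups."""
    if header is None:            # signature could not be restored
        return "other"
    category = tag_header(header)
    if category in TOKEN_RELATED:
        return "token-related"
    if category in NEUTRAL:
        return "neutral"
    return "other"                # untagged headers

Combined with the definition of purity in Sect. 11, a contract is pure precisely if none of its interface signatures is mapped to the ‘other’ group.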

In general, we cannot expect to understand the purpose of a contract by just looking at the names of its functions. However, the names in the first and second group are quite uniform and stereotypical as the functions perform standardized tasks. Therefore, this heuristic seems worthwhile in the context of token contracts.

Table 14 Three of 22 classification rules

Limitations. The effectiveness of this method hinges on the careful choice of the rules. Moreover, for the first and second group, the method assumes that the names of functions indicate the implemented functionality. Finally, 6% of the ERC-20-compliant tokens delegate some function calls to another contract (like a library), such that the signatures extracted from the contract represent only part of the interface. To simplify our analysis, we neglect such contracts.

11.2 Share of pure token contracts

When applying the semantic classification of the function signatures as described above, we arrive at the share of pure contracts listed in Table 15 and depicted in Fig. 4.

Regarding the function signatures in the 215 k deployed ERC-20 tokens, we find 59 k distinct signatures, of which we can decode 45 k to function headers. Our algorithm classifies 37 k headers as ‘token-related’ or ‘neutral’, while 8 k remain untagged. The latter form the ‘other’ group, together with the 14 k signatures that we cannot decode.

Based on this classification of signatures, we distinguish ERC-20 token contracts with respect to their purity and list in Table 15 the respective numbers of bytecodes, deployments and received calls. Interestingly, 85 k distinct bytecodes (corresponding to 159 k deployments) implement only token-related and neutral functions. According to our definition, they are pure tokens that do not implement a recognizable service or product and therefore could be security tokens.

Table 15 Purity of ERC-20 compliant token contracts

The remaining 28 k bytecodes (55 k deployments) also implement functions from the ‘other’ group. For the 22 k signatures in this group, it is not apparent how to decide automatically whether the corresponding code offers a genuine service or product.

Fig. 4 Deployment of ERC-20 token contracts, divided into pure (peach) and non-pure (green) tokens. The lower horizontal axis indicates the Ethereum blocks, while the upper axis shows the corresponding dates. Each bar represents a bin of 100,000 blocks corresponding roughly to 2 weeks (color figure online)

For a temporal perspective, Fig. 4 shows the deployment of pure and non-pure tokens over time. For most of the period, the vast majority of deployed tokens is pure; only since the end of 2019 have non-pure tokens begun to dominate.

12 Purity for sample tokens

In this section, we look at several examples of tokens to evaluate the purity approach qualitatively. We searched for complaints, litigation, and press releases from the US Securities and Exchange Commission (SEC) that concern Ethereum tokens. The summary of the settlements underlines the importance of clarifying the type of a token, for which we have presented first steps.

12.1 Ethereum tokens and the SEC

The SEC considered the tokens in Table 16 as securities violating the Securities Act, with the exception of TKJT and Q2, for which it issued a no-action letter.

Table 16 Ethereum tokens assessed by SEC

Table 16 shows the number of token-related, neutral and other functions for these tokens. From Etherscan, we included the number of token holders, token transfers, and the fully diluted market cap on October 17, 2020. For many security tokens, the number of ‘other’ functions is zero, as we would expect.

For tokens with ‘other’ functions, Table 17 lists the restored headers. For CTR, all function names refer to cards, which fits the token’s purpose as a financial service. For BOON, ICOS, PRG, and XD, the headers indicate token-related or administrative (neutral) functionality. A refined set of rules might classify these cases correctly. The remaining three cases are inconclusive, either because the function header cannot readily be interpreted or because the signatures cannot be restored.

Regarding the non-security tokens TKJT and Q2, we expect the number of other functions to be greater than zero. This is only the case with Q2, which is linked to playing video games. TKJT (linked to air charter services) does not provide a service or product on the chain. Even though we have no information on the originally planned off-chain services around TKJT, the lack of ‘other’ functions shows that the smart contract does not provide any support in this respect.

Table 17 Security tokens and the functions classified as ‘other’

12.2 Settlements and orders

The information in this section is a terse summary taken from the collected SEC documents and offers a glimpse into the downside of the token world.

Argylecoin (AGL) continued a Ponzi scheme run by Natural Diamonds in 2017 and was charged with fraud in 2019.

At the end of 2018, AirToken (AIR) settled with the SEC over its 2017 ICO by returning funds and paying USD 250 k penalty.

In August 2020, the SEC ordered Boon Tech (BOON) to pay USD 150 k penalty and disgorgement of USD 5 M plus prejudgment interest of USD 600 k for the ICO in 2017/2018.

Bitqy token (BQ) refers to a marketplace. In 2019, the SEC settled with Bitqyck after an alleged fraud.

In May 2020, Bitclave (CAT) was ordered to pay disgorgement of USD 25.5 M, prejudgment interest of USD 3.4 M, and a civil penalty of USD 400 k.

Centra token (CTR) refers to financial services. All three co-founders were indicted for fraud.

In 2018, the company behind EOS launched its own mainnet and was ordered to pay USD 24 M penalty for its ICOs from mid-2016 to mid-2017.

Gladius token (GLA) is linked to a DDoS protection service. The company reported itself and refunded the proceeds from the ICO in 2017 to avoid a fine from the SEC.

In 2019, SimplyVital (HLTH) settled with the SEC by returning all proceeds from the pre-sale to the investors and by not generating any tokens (as can be seen in Table 16).

In 2019, the SEC ordered ICOBox (ICOS) to pay disgorgement and prejudgment interest totaling over USD 16 M.

Kin token (KIN) is linked to a social media platform. In mid-2019, the token migrated to its own mainnet, while the SEC charged KiK Interactive with conducting a USD 100 M unregistered ICO in 2017.

In 2017, the SEC filed charges against PlexCorps (PLX) to halt an alleged ICO fraud that had raised up to USD 15 M.

In 2020, the SEC ordered Paragon (PRG) to pay a penalty of USD 250 k.

In a settlement in 2020, UnitedData (SHOP) was ordered to pay USD 450 k for its fraudulent ICO.

In June 2020, the SEC announced that Telegram (TON) had to return USD 1200 M to investors and pay USD 18.5 M penalty to settle SEC charges.

In 2020, Unikrn (UKG) settled with the SEC and was ordered to pay USD 6.1 M penalty.

In 2019, Veritaseum (VERI) was charged over a fraudulent ICO and was ordered to pay nearly USD 9.5 M.

The company SoluTech (XD) is insolvent and has ceased business operations; it settled with the SEC paying USD 25 k penalty.

LongFin (ZWC) offered the SEC a settlement for a fraud and violation charge.

For the other security tokens, the SEC filed a complaint.

12.3 Services and games

As a further small ground truth, we analyzed several highly active token contracts, for which we present the results in Table 18. The upper four examples provide diverse services that are not reflected in the respective token contracts (as the number of other functions is 0 to 1). In contrast, the lower five examples are games that implement at least part of the application in the corresponding token contract (6 to 47 other functions). Games are a typical application category with a genuine service or product.

Table 18 Assessment of sample tokens (services and games)

It should be noted that some applications implement their features in several interacting contracts: these include a (plain) token contract, while the logic of the application resides in separate contracts. Such tokens would show 0 ‘other’ functions as well.

In summary, our approach is able to detect whether a token contract implements nothing but token and account management. If such a token is part of an application that claims to provide a product or service, that product or service has to be found elsewhere, either in another contract or off-chain.

13 Conclusions

The overarching theme of this work is the identification and classification of token contracts, based on the publicly available transaction data of the Ethereum main chain. Unlike other work, we do not stop at ERC-20 compliance, but consider also other ERC standards as well as non-compliant tokens. We focus on the contract code rather than its activity, meaning that our analysis encompasses unused and top tokens alike. As ‘token’ is a fuzzy, semantic concept, our methods approximate it by characteristics that are readily accessible, like the signatures of the interface or log data. We even explore the extent to which this approach can be used to assess the type of a token.

Compliant Tokens. The basic standard for fungible tokens, ERC-20, is implemented by 97% of the tokens that comply with any of the ERC standards. The standard ERC-721 for non-fungible tokens accounts for the remaining 3%, except for a few hundred contracts (0.2%) that implement one of the other standards. Contracts complying with standards for security tokens are virtually non-existent, which may be due to unclear legal regulations. Interestingly, new token contracts keep being deployed at a rate of 200 per day, which leaves us wondering at their purpose.

Non-compliant Tokens. We developed feasible indicators to identify a great number of non-compliant tokens that add another 25% to the number of token contracts. They show a similar level of activity as the compliant ones, as judged by the number of messages. The total number of token contracts on Ethereum up to block 10.5 M thus amounts to 272 k.

Token Type. The distinction between security, utility, and payment tokens is essential because of its legal consequences. As our compilation of rulings concerning security tokens shows, the number of SEC complaints is rising, with settlements involving fines and refunds of millions of US dollars. To support the assessment of token types at large, we propose to divide the functions of contract interfaces into those implementing token-related, neutral, and other functionality. We define a token to be pure if it offers exclusively token-related and neutral functions. Pure tokens amount to 70% of the compliant tokens. According to our hypothesis, pure tokens are more likely to be security tokens, whereas non-pure tokens tend to be utility tokens. As qualitative evidence, we found the hypothesis to be in line with the SEC filings.

Future Work. To improve the assessment of tokens, we need a better understanding of individual contracts and of the context in which they are embedded. On the one hand, we need tools for analyzing the bytecode of contracts semantically, e.g., to detect data structures and code related to token functionality. On the other hand, we need methods that detect groups of on- and off-chain components that form an application and have to be considered as interrelated entities.