1 Introduction

EVM and memory model. Ethereum [27] is considered the world-leading programmable blockchain today. It provides a virtual machine, named EVM (Ethereum Virtual Machine) [21], to execute the programs that run on the blockchain. Such programs, known as Ethereum “smart contracts”, can be written in high-level programming languages such as Solidity [6], Vyper [4], Serpent [3] or Bamboo [1], and are then compiled to EVM bytecode. The EVM bytecode is the code finally deployed on the blockchain, and it has become a uniform format for developing analysis and optimization tools. The memory model of EVM programs has been described in previous work [17, 19, 26, 27]. Mainly, there are three regions in which data can be stored and accessed: (1) The EVM is a stack-based virtual machine, meaning that most instructions perform computations using the topmost elements of a machine stack. This memory region can only hold a limited number of values, up to 1024 256-bit words. (2) EVM programs store data persistently using a memory region named storage, which consists of a mapping of 256-bit addresses to 256-bit words and whose contents persist between external function calls. (3) The third memory region is a local volatile memory area that we will refer to as EVM memory, and which is the focus of our work. This memory area behaves as a simple word-addressed array of bytes that can be accessed byte by byte or in one-word (32-byte) groups. The EVM memory can be used to allocate dynamic local data (such as arrays or structs) and is also required by specific EVM bytecode instructions which have been designed to take lengthy operands from local memory. This is the case for the instructions that compute cryptographic hashes, or that pass arguments to and return data from external function calls. Compilers use the stack and volatile memory regions in different ways.
The most-used Solidity compiler, solc, generates EVM code that uses the stack for storing value-type local variables, as well as intermediate values of complex computations and jump addresses, whereas reference-type local variables, such as array types and user-defined struct types, are located in memory. For instance, when a Solidity function returns a struct variable, the memory required for the struct is allocated and initialized at the beginning of the function execution. However, the allocated memory is not always accessed, as we illustrate in the following function (which belongs to the contract in Fig. 1):

figure a

Although the execution of \(\_\mathtt {ownershipAt}\) allocates memory for the return value declared in the function definition, the execution of the function reserves a different memory space for the actual returned struct obtained from \(\texttt{unpackedOwnership}\) and, thus, the first reservation and its initialization are needless. The focus of our work is on detecting such needless memory write accesses in the code generated by solc. Nevertheless, as the analysis works at the EVM level, it could easily be adapted to EVM code generated by any other compiler.

Optimization. Optimization of Ethereum smart contracts is a hot research topic, see e.g. [9, 10, 12,13,14, 22, 24] and their references. This is because the reduction of their costs is relevant for three reasons: (1) Deployment fees. When the contract is deployed on the blockchain, the owner pays a fee related to the size in bytes of the bytecode. Hence, a clear optimization criterion is the bytes-size of the program. The Solidity compiler solc [6] targets precisely this bytes-size reduction. (2) Gas-metered execution. There is a fee to be paid by each client to execute a transaction on the blockchain. This fee is a fixed amount per transaction plus the cost of executing all bytecode instructions within the function being invoked in the transaction. This cost is measured in “gas” (which is then priced in the corresponding cryptocurrency), and this is why the execution is said to be gas-metered. The EVM specification ([27] and more recent updates) provides a precise gas consumption for each bytecode instruction in the language. The goal of most EVM bytecode optimization tools [9, 10, 12,13,14, 22] is to reduce such gas consumption, as this translates into reducing the price of all transactions on the smart contract. (3) Enlarging Ethereum’s capability. Due to the huge volume of transactions being demanded, there is great interest in enlarging the capability of the Ethereum network to increase the number of transactions that can be handled. Optimization of EVM bytecode in general –and of its memory usage in particular– is an important step in this direction.

Challenges and contributions. Optimizing memory usage is considered a challenging problem that requires a precise inference of the memory locations being accessed, and one that usually varies according to the memory model of the language being analyzed and to the compiler that generates the code to be executed. In the case of Ethereum smart contracts generated by the solc compiler, the memory model is rather unconventional and its low-level memory usage patterns challenge automated reasoning. On the one hand, instead of having an instruction to allocate memory, the allocation is performed by a sequence of instructions that use the value stored at address 0x40 as the free memory pointer, i.e., a pointer to the first memory address available for allocating new memory. In the general case, the memory is structured as a sequence of slots: a slot is composed of several consecutive memory locations that are accessed in the bytecode from the same initial memory location plus a corresponding offset. A slot might just hold a data structure created in the smart contract but, when nested data structures are used, a slot may also contain pointers to other memory slots for the nested components. Finally, there is another type of transient slot that holds temporary data and that needs to be captured by a precise memory analysis as well.
These features pose the main challenges to infer needless write accesses and, to handle them accurately, we make the following main contributions: (1) we present a slot analysis to (over-)approximate the slots created along the execution and the program points at which they are allocated; (2) we then introduce a slot usage analysis which infers the accesses to the different slots from the bytecode instructions; (3) we finally infer needless write accesses, i.e., program points where the memory is written but never read by any subsequent instruction of the program; and (4) we implement the approach and perform a thorough experimental evaluation on real smart contracts, detecting needless write accesses that belong to highly optimizable memory usage patterns generated by solc. Finally, it is worth mentioning that the applications of the memory analysis (points 1 and 2) go beyond the detection of needless write accesses: a precise model of the EVM memory is crucial to enhance the accuracy of any subsequent analysis (see, e.g., [19] for other concrete applications of a memory analysis).

2 Memory Layout and Motivating Examples

Memory Opcodes. The EVM instruction set contains the usual instructions to access memory: the most basic instructions that operate on memory are \(\texttt{MLOAD}\) and \(\texttt{MSTORE}\), which respectively load and store a 32-byte word from and to memory. The solc compiler generates code that handles memory with a cumulative model in which memory is allocated along the execution of the program and is never released. In contrast to other bytecode virtual machines, like the Java Virtual Machine, the EVM does not have a dedicated instruction to allocate memory. The allocation is performed by a sequence of instructions that use the value stored at address 0x40 as the free memory pointer, i.e., a pointer to the first memory address available for allocating new memory. In what follows, we use \( mem\langle \text {x}\rangle \) to refer to the content stored in memory at location x.
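For intuition, the behavior of \(\texttt{MLOAD}\) and \(\texttt{MSTORE}\) over this byte-addressed, word-accessed memory area can be sketched as follows (a minimal Python model for illustration only; the class name is ours and we ignore the gas charged for memory expansion):

```python
# Minimal sketch of the EVM volatile memory: a zero-initialized byte
# array, addressed by byte and read/written in 32-byte words.

class EVMMemory:
    def __init__(self):
        self.data = bytearray()

    def _grow(self, end):
        # Memory is conceptually unbounded and zero-initialized.
        if end > len(self.data):
            self.data.extend(b"\x00" * (end - len(self.data)))

    def mstore(self, addr, word):
        # Store a 256-bit word as 32 big-endian bytes at `addr`.
        self._grow(addr + 32)
        self.data[addr:addr + 32] = word.to_bytes(32, "big")

    def mload(self, addr):
        # Load the 32-byte word starting at `addr`.
        self._grow(addr + 32)
        return int.from_bytes(self.data[addr:addr + 32], "big")
```

With this model, reading \( mem\langle \text {0x40}\rangle \) amounts to `mem.mload(0x40)`, which is exactly how the free memory pointer is consulted in the allocation pattern described next.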

Memory Slots. In the general case, memory is structured as a sequence of slots. A slot is composed of consecutive memory locations that are accessed by using its initial memory location, which we call the base reference (\(\textit{baseref}\) for short) of the slot, plus the corresponding offset needed to access a specific location within the slot. Slots usually store (part of) some data structure created in the Solidity program (e.g., an array or a struct) and whose length can be known.

Fig. 1.
figure 1

Excerpt of smart contract ERC721A.

Example 1 (slots)

Fig. 1 shows an excerpt of the smart contract ERC721A [2], which contains two different contracts \(\texttt{Running1}\) and \(\texttt{Running2}\). We have omitted non-relevant instructions such as those that appear at lines 15-17 (L15-L17 for short). The contract \(\texttt{Running1}\), to the left of Fig. 1, contains the public function \(\texttt{unpackedOwnership}\) that returns a struct of type \(\texttt{TokenOwnership}\) defined at L4-L8. The contract \(\texttt{Running2}\), shown to the right, contains the public function \(\texttt{explicitOwnershipOf}\) that returns, depending on a non-relevant condition, an empty struct of type \(\texttt{TokenOwnership}\) (L28) or the \(\texttt{TokenOwnership}\) received from a call to function \(\texttt{unpackedOwnership}\) of contract \(\texttt{Running1}\) (L23), which is done in the private function \(\mathtt {\_ownershipAt}\). The execution of function \(\texttt{unpackedOwnership}\) in \(\texttt{Running1}\) allocates two different memory slots at L14: \(s_{1}\), for the returned variable ownership, and \(s_{2}\), which is used to actually return the contents of ownership from the function:

figure b

The execution of this function might create up to six different slots. At L26 and L27, it creates two slots, one for the struct declared in the returns part of the function header (\(s_{3}\)) and one for the local variable \(\texttt{ownership}\) (\(s_{4}\)). Depending on the evaluation of the condition in the if statement, it might create the slots needed to perform the call to \(\mathtt {\_ownershipAt}\) and, consequently, the external call to \(\mathtt {Running1.unpackedOwnership}\). The invocation of the private function involves three slots: one for the struct declared in the returns part of \(\mathtt {\_ownershipAt}\) at L29 (\(s_{6}\)), one slot to manage the external call data at L23 (\(s_{7}\)), and one slot for storing the results of the private function \(\mathtt {\_ownershipAt}\) at L29 (\(s_{8}\)). Finally, a new slot (\(s_{5}\)) is created for returning the results of \(\texttt{explicitOwnershipOf}\). This new slot might contain the contents of \(s_{4}\) or \(s_{8}\), depending on the evaluation of the if condition.

When an amount of memory t is to be allocated, the slot reservation is made by reading the free memory pointer (\( mem\langle \text {0x40}\rangle \)) and incrementing it by t positions. From this update on, the base reference of the slot just allocated is available, and subsequent accesses to the slot are performed by means of this baseref, possibly incremented by an offset.

Example 2 (memory slot reservation)

The following excerpt of EVM code allocates a slot of type \(\texttt{TokenOwnership}\). The EVM bytecode performs three steps:

(i) load the current value of the free memory pointer \( mem\langle \text {0x40}\rangle \), which will be used as the \(\textit{baseref}\) of the new slot; (ii) compute the new free memory address by adding t to the \(\textit{baseref}\); and (iii) store the new free memory pointer in \( mem\langle \text {0x40}\rangle \). Additionally, in the same block of the CFG, the slot reservation is followed by the slot initialization at \(\texttt{0x19A}\), \(\texttt{0x1AB}\) and \(\texttt{0x1B4}\).

figure c

Solidity reference-type values such as arrays, struct-typed variables and strings are stored in memory using this general pattern, with some minor differences. However, there are cases in which the steps detailed above vary because the size of the slot is not known in advance, and thus the free memory pointer cannot be updated at that point. For instance, when data is returned by an external call, its length is unknown beforehand and hence the free memory pointer is updated only after the memory pointed to has been written. In other cases, the free memory is used as a temporary region with a short lifetime, as in the case of parameter passing to external calls, and the free memory pointer is not updated at all. These variants of the general scheme must be detected by a precise memory analysis. To this end, we say that a slot is in a transient state when its baseref has been read from \( mem\langle \text {0x40}\rangle \) but the free memory pointer has not been updated yet, and that it is in a permanent state when the free memory pointer has been pushed forward.
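The allocation steps and the transient/permanent distinction can be sketched as follows (illustrative Python, modeling memory as a map from addresses to values; the function names are ours, not solc's):

```python
# Sketch of the solc allocation pattern (steps (i)-(iii) of Ex. 2)
# and of transient slots, driven by the free memory pointer at 0x40.

FREE_PTR = 0x40

def allocate(mem, size):
    base = mem[FREE_PTR]          # (i) read the free memory pointer
    mem[FREE_PTR] = base + size   # (ii)+(iii) bump it and store it back
    return base                   # the slot is permanent from here on

def begin_transient(mem):
    # Transient slot: the baseref is read but the pointer is not
    # updated (e.g. before a STATICCALL writes its arguments).
    return mem[FREE_PTR]

def make_permanent(mem, size):
    # Push the pointer forward once the size is known (e.g. after a
    # call, once RETURNDATASIZE is available).
    mem[FREE_PTR] = mem[FREE_PTR] + size
```

For instance, starting with the pointer at 0x80, `allocate(mem, 64)` returns baseref 0x80 and leaves the pointer at 0xC0; a subsequent `begin_transient(mem)` yields 0xC0 for a slot that stays transient until `make_permanent` is called.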

Example 3 (transient slot)

Now we focus on the external call at L23 of \(\texttt{Running2}\), which performs a \(\texttt{STATICCALL}\), reading from the stack (see [27] for details) the memory location of the input arguments and the location where the results of the call will be saved. Interestingly, both locations reuse the same slot (it corresponds to \(s_7\)), as can be seen in the following EVM bytecode from \(\mathtt {\_ownershipAt}\):

figure d

The call starts by reading the free memory pointer (at \(\texttt{0x114}\)) and storing the arguments’ data at that address (including the function selector as the first argument). Importantly, the pointer is not pushed forward when the input arguments are written, and thus the slot remains in a transient state. Once the call at \(\texttt{0x139}\) is executed, the result is written to memory from the baseref on (overwriting the locations used for the input arguments) and the slot is finally made permanent by reading the free memory pointer again (\(\texttt{0x151}\)) and updating it (\(\texttt{0x160}\)) by adding the actual return data size (\(\texttt{RETURNDATASIZE}\)).

Transient slots are also used when returning data from a public function to an external caller. In that case, the EVM code of the public function halts its execution using a RETURN instruction, which reads from the stack the memory location where the length and the data to be returned are located. However, it does not change \( mem\langle \text {0x40}\rangle \), because the function code halts its execution at this point, as we can see in the EVM code of \(\texttt{explicitOwnershipOf}\) (it corresponds to slot \(s_5\)):

figure e

The baseref of the return slot is read (at \(\texttt{0x4D}\)) and used as a transient slot to write the struct contents to be returned, adding the corresponding offset for each field contained in the struct (instructions in the left column). The code on the left ends with the baseref plus the size of the stored data on top of the stack. After that, the baseref is read again (top of the right column) and the length of the returned data is computed (by subtracting the baseref from the baseref plus the size of the stored data) before executing the RETURN instruction.

3 Inference of Needless Write Accesses

This section presents our static inference of needless write accesses. We first provide some background in Sec. 3.1 on the type of control-flow graph (CFG) and static analysis we rely upon. Then, the analysis is divided into three consecutive steps: (1) the slot analysis, introduced in Sec. 3.2, identifies the slots created along the execution and the program points at which they are allocated; (2) the slot usage analysis, presented in Sec. 3.3, computes the read and write accesses to the different slots identified in the previous step; and (3) the detection of needless write accesses, given in Sec. 3.4, finds those program points where there is a write access to a slot that has no read access later on.

3.1 Context-Sensitive CFG and Flow-Sensitive Static Analysis

The construction of the CFG of Ethereum smart contracts is a key part of any decompiler and static analysis tool and has been the subject of previous research [15, 16, 25]. The more precise the CFG is, the more accurate our analysis results will be. In particular, context-sensitivity [16] in the CFG construction is vital to achieve precise results. Our implementation of context-sensitivity is realized by cloning the blocks that are reached from different contexts.

Example 4 (context-sensitive CFG)

The EVM code of \(\texttt{Running2}\) creates multiple slots for handling structs of type \(\texttt{TokenOwnership}\). Interestingly, all these slots are created by means of the same EVM code shown in Ex. 2, which corresponds to the CFG block that starts at program point \(\texttt{0x175}\). As this block is reached from different contexts, the context-sensitive CFG contains three clones of this block: \(\texttt{0x175}\), which creates \(s_{3}\) at L26; \(\mathtt {0x175\_0}\), which creates \(s_{4}\) used at L27; and \(\mathtt {0x175\_1}\), which reserves \(s_{6}\), created at L22. Block cloning means that program points are cloned as well, and we adopt the same subindex notation to refer to the program points included in the cloned blocks: e.g., program point \(\texttt{0x178}\) contains the \(\texttt{MLOAD 0x40}\) that gets the baseref of the slot reserved at block \(\texttt{0x175}\), \(\mathtt {0x178\_0}\) refers to the same \(\texttt{MLOAD}\) but in the clone \(\mathtt {0x175\_0}\), etc.

In what follows, we assume that cloning has been performed and that the memory analysis using the resulting CFG (with clones) is thus context-sensitive as well, without requiring additional extensions. As usual in standard analyses [23], one has to define the notion of abstract state, which captures the abstract information gathered by the analysis, and the transfer function, which models the analysis output for each possible input. Besides context-sensitivity, the two analyses that we will present in the next two sections are flow-sensitive, i.e., they make a flow-sensitive traversal of the CFG of the program, using as input for analyzing each block of the CFG the information inferred for its callers. When the analysis reaches a CFG block with new information, we use the operation \(\sqcup \) to join the two abstract states, and the operator \(\sqsubseteq \) to detect that a fixpoint has been reached and, thus, that the analysis terminates. The operations \(\sqcup \) and \(\sqsubseteq \), the abstract state, and the transfer function will be defined for each particular analysis.
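Such a flow-sensitive traversal can be sketched generically as a worklist fixpoint computation, parameterized by the abstract domain (\(\sqcup \), \(\sqsubseteq \)) and the block-level transfer function (an illustrative Python sketch; all names are ours, not the tool's API):

```python
# Generic flow-sensitive fixpoint over a CFG: blocks are re-analyzed
# only when new abstract information reaches them.

from collections import deque

def fixpoint(blocks, succs, entry, bottom, join, leq, transfer):
    # `succs[b]` lists the successor blocks of b; `transfer(b, state)`
    # returns the abstract output of block b for abstract input `state`.
    inputs = {b: bottom for b in blocks}
    worklist = deque([entry])
    while worklist:
        b = worklist.popleft()
        out = transfer(b, inputs[b])
        for s in succs[b]:
            joined = join(inputs[s], out)
            if not leq(joined, inputs[s]):   # new information reached s
                inputs[s] = joined
                worklist.append(s)
    return inputs
```

Termination follows from the finiteness of the abstract domain: each `inputs[s]` can only grow, so every block is re-enqueued finitely often.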

3.2 Slot Analysis

The slot analysis aims at inferring the abstract slots, which are an abstraction of all memory allocations that will be made along the program execution. The slots inferred are abstract because over-approximation is made at the level of the program points at which slots are allocated. Therefore, an abstract slot might represent multiple (not necessarily consecutive) real memory slots, e.g., when memory is allocated within a loop. The slot analysis looks for those program points at which the value stored in \( mem\langle \text {0x40}\rangle \) is read for reserving memory space. These program points are relevant for two reasons: firstly, to obtain the baseref of the memory slot, and, secondly, because from this point on the memory reservation of the corresponding slot has started and is pending to become permanent at some subsequent program point. The output of the slot analysis is a set containing the allocated abstract slots, named \(\mathcal {S}_{all}\) in Def. 2 below. Each allocated abstract slot (i.e., each element in \(\mathcal {S}_{all}\)) is in turn a set of program points, as the same abstract slot might have several program points where \( mem\langle \text {0x40}\rangle \) is read before its reservation becomes permanent. In order to obtain \(\mathcal {S}_{all}\), the memory analysis makes a flow-sensitive traversal of the (context-sensitive) CFG of the program that keeps at every program point the set of transient slots (i.e., those whose baseref has been read but which have not yet been made permanent) and applies the transfer function in Def. 1 to each bytecode instruction within the blocks until a fixpoint is reached. An abstract state of the analysis is a set \(\mathcal {S}\subseteq \wp (\mathcal {P}_{R})\), where \(\mathcal {P}_{R}\) is the set of all program points at which \( mem\langle \text {0x40}\rangle \) is read.
The analysis of the program starts with \(\mathcal {S}= \{\emptyset \}\) at all program points and takes \(\sqcup \) and \(\sqsubseteq \) as the set union and inclusion operations. Termination is trivially guaranteed as the number of program points is finite and so is \(\wp (\mathcal {P}_{R})\). In what follows, \(\textit{Ins}\) is the set of EVM instructions and, for simplicity, we consider MLOAD 0x40 and MSTORE 0x40 as single instructions in \(\textit{Ins}\).

Definition 1 (slot analysis transfer function)

Given a program point pp with an instruction \(I\in \textit{Ins}\), an abstract state \(\mathcal {S}\), and \(\mathcal {K}= \{{\small {\texttt {MSTORE 0x40}}}, {\small {\texttt {RETURN}}}, {\small {\texttt {REVERT}}},\) \({\small {\texttt {STOP}}}, {\small {\texttt {SELFDESTRUCT}}} \}\), the slot analysis transfer function \(\nu \) is defined as a mapping \(\nu : \textit{Ins} \times \wp (\wp (\mathcal {P}_{R})) \mapsto \wp (\wp (\mathcal {P}_{R}))\) computed according to the following table:

figure f

Let us explain intuitively how the above transfer function works. As we have seen in Sec. 2, in an EVM program all memory reservations start by reading \( mem\langle \text {0x40}\rangle \) by means of an \(\texttt{MLOAD}\) instruction preceded by a \(\texttt{PUSH 0x40}\) instruction (case 1 in Def. 1). In this case, the transfer function adds the current program point to all sets in \(\mathcal {S}\), since this is, in principle, an access to the same slots that were already open at this program point and are not permanent yet. To properly identify the slots, our analysis also searches for those program points at which slot reservations are made permanent (case 2 in Def. 1), i.e., those program points with instructions \(I \in \mathcal {K}\). The most frequently used instruction to make a slot reservation permanent is a write access to \( mem\langle \text {0x40}\rangle \) using \(\texttt{MSTORE}\), which pushes forward the free memory pointer such that any subsequent read access to \( mem\langle \text {0x40}\rangle \) will allocate a different slot. The rest of the instructions in \(\mathcal {K}\) finalize the execution in different forms (a normal return, a forced stop, a reverted execution, etc.). In all such cases, the slots need to be considered permanent so that we can later reason about potential needless write accesses involving them. The abstract state is \(\{\emptyset \}\) after these instructions since all transient (abstract) slots have been made permanent. We use the notation \(\mathcal {S}_{pp}\) to refer to the abstract state computed at program point pp.
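The behavior of \(\nu \), together with the collection of \(\mathcal {S}_{all}\) from Def. 2, can be sketched as follows (illustrative Python, representing abstract slots as frozensets of program points; all names are ours):

```python
# Sketch of the slot-analysis transfer function (Def. 1) and of the
# accumulation of S_all (Def. 2). K is the set of "closing" instructions.

K = {"MSTORE 0x40", "RETURN", "REVERT", "STOP", "SELFDESTRUCT"}

def nu(instr, pp, S, S_all):
    if instr == "MLOAD 0x40":
        # Case 1: pp reads the free memory pointer, so it may belong
        # to every slot that is still transient at this point.
        return {frozenset(a) | {pp} for a in S}
    if instr in K:
        # Case 2: all transient slots become permanent; record them
        # in S_all and restart from the empty abstract slot.
        S_all |= {frozenset(a) for a in S if a}
        return {frozenset()}
    # Any other instruction leaves the abstract state unchanged.
    return S
```

Replaying the pattern of Ex. 5 on this sketch, an `MLOAD 0x40` at `0x178` followed by an `MSTORE 0x40` at `0x17F` records the abstract slot `{"0x178"}` in `S_all` and resets the state to `{frozenset()}`.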

Example 5 (slot analysis)

The slot analysis of \(\texttt{Running2}\) starts with \(\mathcal {S}_{pp} {=} \{\emptyset \}\) at all program points. When it reaches the block that starts at \(\texttt{0x175}\) (see Ex. 2) \(\mathcal {S}_{{\texttt {\tiny {0x175}}}}\) is \(\{\emptyset \}\) and it remains empty until \(\texttt{0x178}\), where the baseref of \(s_{3}\) is read and hence \(\mathcal {S}_{{\texttt {\tiny {0x178}}}} {=} \{\{{\small {\texttt {0x178}}}\}\}\). This slot is made permanent when the free memory pointer is updated at \(\texttt{0x17F}\), thus having \(\mathcal {S}_{{\texttt {\tiny {0x17D}}}} {=} \{\{{\small {\texttt {0x178}}}\}\}\) and \(\mathcal {S}_{{\texttt {\tiny {0x17F}}}} {=} \{\emptyset \}\). Following the same pattern, \(s_{4}\) and \(s_{6}\) are resp. reserved at instructions \(\mathtt {0x178\_0}\) and \(\mathtt {0x178\_1}\) and closed at \(\mathtt {0x17F\_0}\) and \(\mathtt {0x17F\_1}\) (at the cloned blocks). On the other hand, the baseref of \(s_{5}\) is read at two consecutive program points (\(\texttt{0x4D}\) and \(\texttt{0x5A}\)) and updated at \(\texttt{0x5F}\), and thus, we have \(\mathcal {S}_{{\texttt {\tiny {0x4D}}}} {=} \{\{{\small {\texttt {0x4D}}}\}\}\) and the same until \(\mathcal {S}_{{\texttt {\tiny {0x5A}}}} {=} \{\{{\small {\texttt {0x4D}}},{\small {\texttt {0x5A}}}\}\}\) and again the same until \(\mathcal {S}_{{\texttt {\tiny {0x5F}}}} {=} \{\emptyset \}\). Finally, after the execution of \(\texttt{STATICCALL}\) (see Ex. 3) we have three consecutive reads of \( mem\langle \text {0x40}\rangle \) at 0x114, 0x132 and 0x151 that refer to the same slot \(s_{7}\), which is made permanent at 0x160. Therefore, we have \(\mathcal {S}_{{\texttt {\tiny {0x151}}}} {=} \{\{{\small {\texttt {0x114}}}, {\small {\texttt {0x132}}},{\small {\texttt {0x151}}}\}\}\) and \(\mathcal {S}_{{\texttt {\tiny {0x160}}}} = \{\emptyset \}\).

Using the transfer function, as mentioned in Sec. 3.1, our analysis makes a flow-sensitive traversal of the (context-sensitive) CFG of the program that uses as input for analyzing each block the information inferred for its callers. When a fixpoint is reached, we have an abstract state for each program point that we use to compute the set of abstract slots allocated in the program, named \(\mathcal {S}_{all}\).

Definition 2

The set of allocated abstract slots \(\mathcal {S}_{all}\) is defined as

\(\mathcal {S}_{all}= \bigcup _{pp \in \mathcal {P}_{W}} \mathcal {S}_{pp-1}\), where \(\mathcal {P}_{W}\) is the set of all program points pp : I where \(I{\in } \mathcal {K}\).

Example 6

(\(\mathcal {S}_{all}\) computation). With the values of \(\mathcal {S}_{{\texttt {\tiny {0x17F-1}}}}\), \(\mathcal {S}_{{\texttt {\tiny {0x17F\_0-1}}}}\), \(\mathcal {S}_{{\texttt {\tiny {0x17F\_1-1}}}}\), \(\mathcal {S}_{{\texttt {\tiny {0x160-1}}}}\) and \(\mathcal {S}_{{\texttt {\tiny {0x5F-1}}}}\) from Ex. 5, at the end of the slot analysis of \(\texttt{Running2}\), we have:

\(\mathcal {S}_{all}{=} \{ \underbrace{\{{\small {\texttt {0x178}}}\}}_{s_{3}}, \underbrace{\{{\small {\texttt {0x178\_0}}}\}}_{s_{4}}, \underbrace{\{{\small {\texttt {0x178\_1}}}\}}_{s_{6}}, \underbrace{\{{\small {\texttt {0x114}}}, {\small {\texttt {0x132}}},{\small {\texttt {0x151}}}\}}_{s_{7}}, \underbrace{\{{\small {\texttt {0x5A}}}, {\small {\texttt {0x4D}}}\}}_{s_{5}}, \dots \}. \)

Note that the cloning of block \(\texttt{0x175}\) allows our analysis to detect three different slots, \(s_{3}\), \(s_{4}\) and \(s_{6}\), for the same program point, \(\texttt{0x178}\), in the original EVM code.

The next example shows the behavior of the analysis when the program contains loops, and an abstraction is needed for approximating the slots.

Fig. 2.
figure 2

Solidity code of contract \(\texttt{Caller}\).

Example 7 (loops)

Fig. 2 shows the contract \(\texttt{Running3}\), which includes the function \(\texttt{explicitOwnershipsOf}\) from the smart contract at [2]. This function receives an array of token identifiers as argument and returns an array of \(\texttt{TokenOwnership}\) structs that is populated by invoking the function \(\texttt{explicitOwnershipOf}\) from \(\texttt{Running2}\) (through a \(\texttt{STATICCALL}\)) inside a loop. The slots identified by the analysis for contract \(\texttt{Running3}\) shown in Fig. 2 are: \(s_{9}\), which is created for making a copy of parameter \(\texttt{tokenIds}\) in memory; \(s_{10}\), which is created for the local array \(\texttt{ownerships}\) (L44) and contains the array length and pointers to the structs identified initially by \(s_{11}\) (and later on by \(s_{13}\)); \(s_{12}\), for the \(\texttt{STATICCALL}\) input arguments and return data (L46); \(s_{13}\), which abstracts the structs for storing the \(\texttt{STATICCALL}\) output results (L46); and \(s_{14}\), which includes the length of ownerships and a copy of \(s_{13}\) for returning the results (L48). The important point is that the local array declaration at L44 produces a loop that allocates as many structs as elements are contained in the array. For this reason, \(s_{11}\) is an abstract slot that represents all \(\texttt{TokenOwnership}\)’s initially added to the array. Similarly, \(s_{12}\) and \(s_{13}\) are created inside the for loop, and each abstract slot represents as many concrete slots as iterations are performed by the loop. Note that each iteration of the loop creates one instance of \(s_{12}\) for getting the results from the call, which is later copied to \(s_{13}\) and pointed to by \(\texttt{ownerships}\) (\(s_{10}\)).

As notation, we will use a unique numeric identifier (1, 2, \(\ldots \)) to refer to each abstract slot (represented in \(\mathcal {S}_{all}\) as a set) and retrieve it by means of function \(get\_id(a), a\in \mathcal {S}_{all}\). We use \(\mathcal {A}\) to refer to the set of all such identifiers in the program. Also, given a program point pp with an instruction \(\texttt{MLOAD 0x40}\), we define the function \(\textit{get\_slots}(pp)\) to retrieve the identifiers of the elements of \(\mathcal {S}_{all}\) that might be referenced at pp as follows: \(\textit{get\_slots}(pp) = \{id ~|~ a \in \mathcal {S}_{all}\wedge pp \in a \wedge id = get\_id(a) \}.\)
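These identifier conventions can be sketched as follows (illustrative Python, where the numbering of abstract slots is deterministic but arbitrary; the names are ours):

```python
# Sketch of slot identifiers: each abstract slot in S_all (a frozenset
# of program points) gets a numeric id, and get_slots(pp) returns the
# ids of the slots whose baseref is read at pp.

def make_ids(S_all):
    # Deterministic but arbitrary numbering of the abstract slots.
    return {a: i + 1 for i, a in enumerate(sorted(S_all, key=sorted))}

def get_slots(pp, ids):
    return {i for a, i in ids.items() if pp in a}
```

For instance, with the abstract slots of Ex. 6, `get_slots("0x132", ids)` returns the singleton containing the identifier of the slot `{0x114, 0x132, 0x151}`, since `0x132` occurs only in that slot.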

3.3 Slot Access Analysis

While Sec. 3.2 looked for allocations, the next step of the analysis is the inference of the program points at which the inferred abstract slots might be accessed. To do so, our slot access analysis needs to propagate the references to the abstract slots that are saved at the different positions of the execution stack. Importantly, in order to abstract complex data structures stored in memory (e.g., arrays of structs), we keep track not only of the stack positions, but also of the abstract slots that could be saved at memory locations. As seen in Ex. 7, a memory location within a slot might contain a pointer to a memory location of another slot, as happens when nested data structures are used. Thus, an abstract state is a mapping in which we store the potential slots saved at stack positions or at memory locations within other slots.

Definition 3 (memory analysis abstract state)

A memory analysis abstract state is a mapping \(\pi \) of the form \(\mathcal {T}\cup \mathcal {A}\mapsto \wp (\mathcal {A})\).

\(\mathcal {T}\) is the set containing all stack positions, which we represent by natural numbers from 0 (bottom of the stack) on, and \(\mathcal {A}\) is the set of abstract slot identifiers computed in Sec. 3.2. We refer to the set of all memory analysis abstract states as AS. Note that we keep a set of potential slots for each stack position because a block might be reached from several blocks with different execution stacks, e.g., in loops or if-then-else structures. In what follows, we assume that, given a value k, the map \(\pi \) returns the empty set when \(k \not \in dom(\pi )\). The inference is performed by a flow-sensitive analysis (as described in Sec. 3.1) that keeps track of the information about the abstract slots used at any program point by means of the following transfer function.

Definition 4 (memory analysis transfer function)

Given an instruction I with n input operands at program point pp and an abstract state \(\pi \), the memory analysis transfer function \(\tau \) is defined as a mapping \(\tau {:}{} \textit{Ins} \times AS \mapsto AS\) of the form:

figure g

\(t{=}top(pp)\) is the numerical position of the top of the stack before executing I.

Let us explain the above definition. The transfer function distinguishes between two different types of \(\texttt{MLOAD}\): (1) accesses to location \( mem\langle \text {0x40}\rangle \), which return the baseref of the slots that might be used, taking them from the previous analysis through \(\textit{get\_slots}(pp)\); and (2) other \(\texttt{MLOAD}\) instructions, which could potentially return slot baserefs stored at memory locations. Here we have to consider two possibilities: if we are reading a memory location that holds a generic value (e.g., a number), then \(\pi (t) = \emptyset \); if we are reading a memory location that might store an abstract slot, then \(\pi (t)\) contains all abstract slots that might be stored at that memory location. Regarding (3), \(\texttt{MSTORE}\) has two operands: the operand at t is the memory address that will be modified by \(\texttt{MSTORE}\), and the operand at \(t-1\) is the value to be stored at that address. For each element s in \(\pi (t)\), the analysis adds to \(\pi (s)\) the abstract slots that are in \(\pi (t{-}1)\). Other instructions that are also treated by the analysis are \(\mathtt {SWAP*}\) and \(\mathtt {DUP*}\), shown in (4-5), which exchange or copy the elements of the stack that take part in the operation. Finally, all other operations delete the entries for the stack elements that are no longer used, based on the number of elements taken from and written to the stack (case 6).
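The cases of \(\tau \) can be sketched as follows (a simplified, illustrative Python version in which we assume t is the stack position that receives or holds the topmost operand and instructions are encoded as tuples; all names and the encoding are ours):

```python
# Sketch of the slot-access transfer function (Def. 4). An abstract
# state pi maps stack positions and slot ids to sets of slot ids.

def tau(instr, pp, t, pi, get_slots):
    # instr is ("MLOAD 0x40",), ("MLOAD",), ("MSTORE",), ("SWAP", k),
    # ("DUP", k) or ("OTHER", n) with n consumed stack positions.
    pi = {k: set(v) for k, v in pi.items()}
    op = instr[0]
    if op == "MLOAD 0x40":                       # case (1): known baserefs
        pi[t] = set(get_slots(pp))
    elif op == "MLOAD":                          # case (2): load from memory
        pi[t] = set().union(*(pi.get(s, set()) for s in pi.get(t, set())))
    elif op == "MSTORE":                         # case (3): value at t-1 may
        for s in pi.get(t, set()):               # flow into slots pointed
            pi[s] = pi.get(s, set()) | pi.get(t - 1, set())  # from t
        pi.pop(t, None)
        pi.pop(t - 1, None)
    elif op == "SWAP":                           # case (4): exchange
        k = instr[1]
        pi[t], pi[t - k] = pi.get(t - k, set()), pi.get(t, set())
    elif op == "DUP":                            # case (5): copy
        pi[t] = pi.get(t - instr[1], set())
    else:                                        # case (6): drop consumed
        for i in range(instr[1]):                # stack positions
            pi.pop(t - i, None)
    return {k: v for k, v in pi.items() if v}    # empty entries = absent
```

Storing a baseref (say, of slot 8) through an address known to point into slot 7 produces the memory entry `7 -> {8}`, which is how pointers between nested slots (as in Ex. 7) are tracked.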

Example 8 (transfer)

Now we focus on the analysis of block \(\texttt{0x175}\), shown in Fig. 3. As already explained, this block is responsible for allocating the memory needed to work with several structs of type \(\texttt{TokenOwnership}\), and it is thus cloned in the CFG. In particular, we focus on the clone \(\mathtt {0x175\_1}\). The analysis of the block starts with a stack of size 7 that includes, at positions 3 and 4, the abstract slots \(s_{3}\) and \(s_{4}\), which were created at L25 and L26 of Fig. 1. At \(\mathtt {0x178\_1}\), \( mem\langle \text {0x40}\rangle \) is read and, by means of \(get\_slots({\small {\texttt {0x178\_1}}})\) and considering that \(top({\small {\texttt {0x178\_1}}}) {=} 8\), we add to \(\pi \) a new entry \(8 \mapsto s_6\). At \(\mathtt {0x179\_1}\), \(\mathtt {0x180\_1}\), \(\mathtt {0x1AA\_1}\) and \(\mathtt {0x1B3\_1}\), the transfer function duplicates a slot identifier stored in the stack. The \(\texttt{MSTORE}\) and \(\texttt{POP}\) instructions of the example remove a slot identifier from the stack.

Fig. 3. Block of the CFG that reserves a memory slot for a struct

As the analysis is flow-sensitive, the analysis of each block of the CFG takes as input the join \(\sqcup \) of the abstract states computed with the transfer function for the blocks that jump to it, and keeps applying the memory analysis transfer function until a fixpoint is reached. The operation \(A \sqcup B\) joins, by means of \(\cup \), the entries of maps A and B. The order \(\sqsubseteq \) is defined as expected: \(A \sqsubseteq B\) holds when every entry \(v \in dom(A)\) satisfies \(A(v) \subseteq B(v)\) (recall that a map returns \(\emptyset \) for keys outside its domain). Termination of the computation is guaranteed because the domain is finite.
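A minimal worklist formulation of this fixpoint computation, with abstract states encoded as dictionaries of slot sets, could look as follows. The `cfg`/`transfer_block` interface is our own simplification; `transfer_block` stands for applying the transfer function of Def. 4 to every instruction of a block:

```python
def join(a, b):
    """A ⊔ B: union the slot sets of both maps, entry by entry."""
    out = {k: set(v) for k, v in a.items()}
    for k, v in b.items():
        out[k] = out.get(k, set()) | v
    return out

def leq(a, b):
    """A ⊑ B: every entry of A is subsumed by the same entry of B."""
    return all(v <= b.get(k, set()) for k, v in a.items())

def analyze(cfg, entry, transfer_block):
    """cfg: block id -> successor ids. Standard worklist iteration; it
    terminates because the domain (finitely many keys mapped to subsets
    of finitely many abstract slots) is finite and states only grow."""
    states = {entry: {}}
    work = [entry]
    while work:
        blk = work.pop()
        out = transfer_block(blk, states.get(blk, {}))
        for succ in cfg.get(blk, []):
            joined = join(states.get(succ, {}), out)
            if succ not in states or not leq(joined, states[succ]):
                states[succ] = joined          # the state grew: re-process succ
                work.append(succ)
    return states
```

On a diamond-shaped CFG whose two branches produce \(\{3 \mapsto \{s_8\}\}\) and \(\{3 \mapsto \{s_4\}\}\), the state reaching the join block becomes \(\{3 \mapsto \{s_4, s_8\}\}\), as in Ex. 9 below.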

Example 9 (joining abstract states)

The EVM code of \(\texttt{explicitOwnershipOf}\) of Fig. 1 uses \(s_5\) in both return sentences at L28 and L31 (see Ex. 1). This EVM code has a single return block which is reachable along two different paths from the if statement, each arriving with a different abstract state: the path that corresponds to L28 comes with \(\pi {=} \{3 \mapsto \{s_8\}\}\), and the other path (L31) with \(\pi {=} \{3 \mapsto \{s_4\}\}\). Our analysis joins both abstract states, resulting in \(\pi {=} \{3 \mapsto \{s_4,s_8\}\}\). Because of this join, we get that the RETURN instruction reached from lines L28 and L31 might return the content of slot \(s_4\) or \(s_8\).

When the fixpoint is reached, the analysis has computed an abstract state for each program point pp, denoted by \(\pi _{pp}\) in what follows.

Example 10 (complex data structures)

The analysis of the code in Fig. 2 shows how it deals with data structures that might contain pointers to other structures, e.g. \(\texttt{ownerships}\). The abstract slot that represents variable \(\texttt{ownerships}\) is \(s_{10}\), which is written by means of \(\texttt{MSTORE}\) at two program points, say \(pp_1\) and \(pp_2\), which come, resp., from L44 and L46 of the Solidity code. The input abstract state that reaches \(pp_1\) is \(\{2 \mapsto s_{9}, 6 \mapsto s_{10}, 8 \mapsto s_{10}, 9 \mapsto s_{11}, 10 \mapsto s_{10}\}\), and the transfer function of \(\texttt{MSTORE}\) leaves the abstract state as \(\pi _{pp_1} = \{2 \mapsto s_{9}, 6 \mapsto s_{10}, 8 \mapsto s_{10}, s_{10} \mapsto s_{11}\}\).

At this point, we can see that variable \(\texttt{ownerships}\) is initialized with empty structs; to represent this, our analysis includes in \(\pi \) the entry \(s_{10} \mapsto s_{11}\), as described for the \(\texttt{MSTORE}\) case of the transfer function in Def. 4. The second write to \(s_{10}\) is performed by another \(\texttt{MSTORE}\) instruction at \(pp_2\). The input abstract state for \(pp_2\) is \(\{{2} \mapsto s_{9}, 5 \mapsto s_{10}, 7 \mapsto s_{13}, 8 \mapsto s_{13}, 9 \mapsto s_{10}, s_{10} \mapsto s_{11}\}\), and thus we get \(\pi _{pp_2} = \{{2} \mapsto s_{9}, 5 \mapsto s_{10}, 7 \mapsto s_{13}, s_{10} \mapsto \{s_{11}, s_{13}\}\}\).

Interestingly, at \(pp_2\), we detect that \(s_{11}\) might also store the structs returned by the call to \(\mathtt {c.explicitOwnershipOf(tokenIds[i])}\), identified by \(s_{13}\), which is added to \(s_{10} \mapsto \{s_{11}, s_{13}\}\). Finally, \(s_{10}\) is read at the end of the method, returning the set \(\{s_{11}, s_{13}\}\), to copy the content of \(\texttt{ownerships}\) to \(s_{14}\), the slot used in the return.

3.4 Inference of Needless Write Memory Accesses

With the results of the previous analysis, we can compute the maps \({\small \mathcal {R}}\) and \({\small \mathcal {W}}\), which are of the form \(pp \mapsto \wp (\mathcal {A})\) and capture the slots that might be read or written, resp., at the different program points. As multiple EVM instructions (e.g. RETURN, \(\texttt{CALL}\), \(\texttt{LOG}\), \(\texttt{CREATE}\), ...) read or write memory locations whose concrete address is taken from the stack, we define functions \(\textit{mr}(I)\) and \(\textit{mw}(I)\) that, given an EVM instruction I, return the position in the stack of the address read or written by I, resp. If the instruction does not read (resp. write) any memory position, then \(\textit{mr}(I) = \bot \) (resp. \(\textit{mw}(I) = \bot \)). For example, \(\textit{mr}(\text {\texttt{MLOAD}}) = 0\), as it reads the address at the top of the stack, and \(\textit{mw}(\text {\texttt{MLOAD}}) = \bot \), while \(\textit{mr}(\text {\texttt{STATICCALL}}) = 2\) and \(\textit{mw}(\text {\texttt{STATICCALL}}) = 4\). Now, we define the read/write maps \({\small \mathcal {R}}\)/\({\small \mathcal {W}}\):

Definition 5 (memory read/write accesses map)

Given an EVM program P, a program point \(pp \equiv I \in P\), and \(t {=} top(pp)\), we define maps \({\small \mathcal {R}}\) and \({\small \mathcal {W}}\) as follows:

[figure: definition of the maps \({\small \mathcal {R}}\) and \({\small \mathcal {W}}\)]

Example 11

(\({\small \mathcal {R}}\)/\({\small \mathcal {W}}\) maps). Let us illustrate the computation of \({\small \mathcal {R}}({\small {\texttt {0x139}}})\) and \({\small \mathcal {W}}({\small {\texttt {0x139}}})\), where \({\small {\texttt {0x139}}}\) contains the \(\texttt{STATICCALL}\) of \(\texttt{Running2}\). From the analysis information we have that \(top({\small {\texttt {0x139}}}) = 16\) and \(\pi _{{\small {\texttt {0x138}}}} = \{3 \mapsto s_{3}, 4 \mapsto s_{4}, 7 \mapsto s_{6}, 10 \mapsto s_{7}, 12 \mapsto s_7, 14 \mapsto s_7\}\); thus we get \({\small \mathcal {R}}({\small {\texttt {0x139}}}) = \{s_{7}\}\) and \({\small \mathcal {W}}({\small {\texttt {0x139}}}) = \{s_{7}\}\), i.e., the slot used for managing the input and the output of the external call. Analogously, we get \({\small \mathcal {R}}({\small {\texttt {0x178}}}) = \{s_3\}\) and \({\small \mathcal {W}}({\small {\texttt {0x178}}}) = \emptyset \).
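Under these definitions, the computation of \({\small \mathcal {R}}\) and \({\small \mathcal {W}}\) from the per-program-point abstract states can be sketched as follows. The mr/mw tables below cover only the opcodes mentioned in the text and the encoding is our own; since mr(I)/mw(I) are offsets from the stack top, the address operand of I sits at stack position \(top(pp) - mr(I)\):

```python
# Partial mr/mw tables: offset from the stack top of the memory address
# read/written by each opcode; None encodes ⊥ (no such memory access).
MR = {"MLOAD": 0, "RETURN": 0, "STATICCALL": 2, "MSTORE": None}
MW = {"MSTORE": 0, "STATICCALL": 4, "MLOAD": None, "RETURN": None}

def access_maps(program, pi, top):
    """program: pp -> opcode; pi: pp -> abstract state before the opcode;
    top: pp -> stack-top position. Returns the maps (R, W) of Def. 5."""
    R, W = {}, {}
    for pp, ins in program.items():
        state = pi.get(pp, {})
        if MR.get(ins) is not None:        # slots possibly read by ins
            R[pp] = set(state.get(top(pp) - MR[ins], set()))
        if MW.get(ins) is not None:        # slots possibly written by ins
            W[pp] = set(state.get(top(pp) - MW[ins], set()))
    return R, W
```

With the state of Ex. 11 (top = 16), the \(\texttt{STATICCALL}\) reads the address at position 14 (16 - 2) and writes at position 12 (16 - 4), both of which map to \(s_7\).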

The last step of our analysis consists in searching for write accesses to slots that will never be read later. To do so, we use the information computed in \({\small \mathcal {R}}\) and \({\small \mathcal {W}}\). Given the CFG of the program and two program points p and p2, function \(reachable(p,p2)\) returns true when there exists a path in the CFG from p to p2. We define the set of write leaks \({\small \mathcal {N}}\) as follows:

Definition 6

Given an EVM program and its \({\small \mathcal {W}}\) and \({\small \mathcal {R}}\), we define \({\small \mathcal {N}}\) as

\({\small \mathcal {N}}= \{pw{:}s ~|~ pw \in P \wedge s \in {\small \mathcal {W}}(pw) \wedge \lnot exists\_read(pw,s)\}\)

where \(exists\_read(pw,s) \equiv \exists ~ pr \in dom({\small \mathcal {R}}) ~|~ s \in {\small \mathcal {R}}(pr) \wedge reachable(pw,pr)\).

Intuitively, the set \({\small \mathcal {N}}\) contains those write accesses, taken from \({\small \mathcal {W}}\), whose written slots are never read in the subsequent blocks of the CFG. As function reachable and the sets \({\small \mathcal {W}}\) and \({\small \mathcal {R}}\) are all over-approximations, \(exists\_read\) over-approximates the reads that may follow a write; hence the computation of \({\small \mathcal {N}}\) yields write accesses that can be safely removed, as the next example shows.
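Def. 6 amounts to one reachability check per write access. A direct (if naive) rendering, with reachability computed by a DFS over CFG successors, might be:

```python
def reachable(cfg, src, dst):
    """True iff dst can be reached from src following CFG successors
    (a path of length 0, i.e. src == dst, counts as reachable here)."""
    seen, work = set(), [src]
    while work:
        p = work.pop()
        if p == dst:
            return True
        if p not in seen:
            seen.add(p)
            work.extend(cfg.get(p, []))
    return False

def write_leaks(cfg, W, R):
    """The set N of Def. 6: writes pw:s with no reachable read of s."""
    return {(pw, s)
            for pw, slots in W.items()
            for s in slots
            if not any(s in R[pr] and reachable(cfg, pw, pr) for pr in R)}
```

For a linear CFG p1 -> p2 -> p3 where p1 writes \(s_3\), p2 reads \(s_3\) and p3 writes \(s_4\), only the write at p3 is reported as a leak.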

Example 12

Our analysis detects that at program points \(\texttt{0x19A}\), \(\texttt{0x1AB}\) and \(\texttt{0x1B4}\) there are \(\texttt{MSTORE}\) operations whose written slots are never read in the subsequent blocks of the CFG. These operations correspond to the memory initialization of \(s_{3}\), performed at L26 of the code of Fig. 1 (see Ex. 2). Given that these write accesses are the only use of the slot, the whole reservation can be safely removed. Moreover, the analysis detects that the writes at program points \(\mathtt {0x19A\_1}\), \(\mathtt {0x1AB\_1}\) and \(\mathtt {0x1B4\_1}\), which correspond to the reservation of \(s_{6}\) performed at L22, are needless as well. In essence, this means that \(s_{3}\) and \(s_{6}\) are allocated and initialized but never used in the program. Note that all these program points belong to two clones of the same block (\(\texttt{0x175}\) and \(\mathtt {0x175\_1}\)). However, the three \(\texttt{MSTORE}\) operations of the remaining clone (\(\mathtt {0x175\_0}\)), which correspond to the allocation at L27, are not identified as non-read, as they might be used in the return of the function. The precision of the context-sensitive CFG is thus necessary to identify the needless \(\texttt{MSTORE}\) operations without flagging the needed ones. As a result, we cannot eliminate the block, because it is needed by one of the clones, but we can still achieve an important optimization on the EVM code by removing the unconditional jumps to this block in the other two cases, which completely avoids the execution of all these instructions (and their corresponding gas consumption [27]).

The soundness of the slot and slot-access analyses states that, for each concrete slot, there exists an abstract slot in \(\mathcal {S}_{all}\) that represents it, and that any access to memory is approximated by an inferred abstract slot. Technical details can be found in an extended report [8].

4 Experimental Evaluation

This section reports on the results of the experimental evaluation of our approach, as described in Sec. 3. All components of the analysis are implemented in Python, are open-source, and can be downloaded from github, where detailed instructions for installation and usage are provided. We use external components to build the CFGs (as this is not a contribution of our work). Our analysis tool accepts smart contracts written in versions of Solidity up to 0.8.17 and bytecode for the Ethereum Virtual Machine v1.10.25. The experiments have been performed on an AMD Ryzen Threadripper PRO 3995WX 64-core machine with 512GB of memory, running Debian 5.10.70. In order to experimentally evaluate the analysis, we pulled from etherscan.io [5] the Ethereum contracts bound to the last 5,000 open-source verified addresses whose source code was available on July 14, 2022. The code of 2.18% of those addresses raises a compilation error from \(\textsf {solc}\). For the code bound to the 4,891 remaining addresses, the generation of the CFG (which is not a contribution of this work) times out after 120s on 626 of them. Removing such failing cases, we have finally analyzed 19,199 smart contracts, as each address and each Solidity file may contain several contracts. Note that 84.86% of the contracts are compiled with \(\textsf {solc}\) version 0.8, presumably with the most advanced compilation techniques. The whole dataset used can be found at the above github link.

In order to be in a worst-case scenario for us, we run the memory analysis after executing the solc optimizer, i.e., we analyze bytecode whose memory usage may have already been optimized by the optimizer available in solc. This also allows us to see whether we can achieve further optimization with our approach. Unfortunately, we have not been able to apply our tool after running the super-optimizer GASOL [9], because it does not generate the optimized bytecode but only reports on the gas and/or size gains for each of the blocks. Nevertheless, a detailed comparison between the techniques that GASOL applies and ours is given in Sec. 5, where we justify that GASOL would not find any of our needless accesses. From the 19,199 analyzed contracts, the analysis infers 679,517 abstract memory slots and detects 6,242 needless write memory accesses in 12,803s. These needless accesses occur within the code bound to 780 different addresses, i.e., 15.95% of the analyzed ones.

We have computed the number of needless accesses identified by our analysis grouped by function, together with the number of different contracts that contain these functions. Some of them, such as \(\textsf {transferFrom}\) (1736 accesses in 439 contracts), \(\textsf {transfer}\) (1745 accesses in 441 contracts), \(\textsf {reflectionFromToken}\) (105 accesses in 6 contracts) or \(\textsf {withdraw}\) (54 accesses in 32 contracts), are functions widely used in the implementation of contracts based on ERC tokens. A manual inspection of the 10 most common public functions with inferred needless accesses has revealed two different sources for them: some of the needless accesses are due to inefficient programming practices, while others are generated by the compiler and could be improved. As regards compiler inefficiencies, we detected bytecode that allocates memory slots that are inaccessible and cannot be used because the baseref to access them is not kept on the stack. For example, when a struct is returned by a function, memory is always allocated for this data. However, if the return variable is not named in the header of the function, the compiler allocates memory for this data although it will never be accessed. If programmers are aware of this behavior they can avoid generating such useless memory but, even better, these memory usage patterns can be changed in the compiler. For instance, this is reflected in L22 and L26 of Fig. 1, where the functions do not name the return variable; hence, the compiler allocates memory for these anonymous data structures, which are never used. Similarly, there are various situations involving external calls in which the compiler creates memory that is never used. When there is an external call that does not retrieve any result, the compiler creates two memory slots: one for retrieving the result from the call, and another one for copying a potential result to a memory variable that is never used.
Finally, the compiler also creates memory that is never used for low-level plain calls for currency transfer: even though the contract code does not use the second result returned by the low-level call, the compiler generates code for retrieving it. All these potential optimizations have been detected by means of our inference of needless write accesses and will be communicated to the solc developers.

5 Conclusions and Related Work

We have proposed a novel memory analysis for Ethereum smart contracts and have applied it to infer needless write memory accesses. The application of our implementation to more than 19,000 real smart contracts has detected some compilation patterns that introduce needless write accesses and that can be easily changed in the compiler to generate more efficient code. Let us discuss related work along two directions: (1) memory analysis and (2) memory optimization. Regarding (1), advanced points-to analyses have been developed for Java-like languages [7, 11, 18, 20]. Focusing on EVM, the static modeling of the EVM memory in [16] has some similarities with the memory analysis presented in Secs. 3.2 and 3.3, since both works seek to model the memory, although with different applications in mind. On the one hand, the two approaches differ in the type of static analysis used: [16] is based on a Datalog analysis, while we have defined a standard transfer function used within a flow-sensitive analysis. More importantly, the two analyses differ in precision: we can accurately model the memory allocated by nested data structures in which the memory contains pointers to other memory slots, while [16] does not capture such accesses. This is fundamental for memory optimization since, as shown in the running examples of the paper, it allows detecting needless write accesses that would otherwise be missed. Finally, the application of the memory analysis to optimization is not studied in [16], while it is the main focus of our work.

As regards (2), optimizing memory usage is a challenging research problem that requires precisely inferring the memory positions being accessed. Such positions are sometimes statically known (e.g., when accessing the EVM free memory pointer) but, as we have seen, a precise and complex inference is often required to figure out the slot being accessed at each memory-access bytecode. Recent work within the super-optimizer GASOL [9] is able to perform some memory optimizations at the level of each block of the CFG (i.e., intra-block). There are fundamental differences between our work and GASOL: First, GASOL can only apply the optimizations when the memory locations being addressed refer to the same constant address. In other words, there is no real memory analysis (namely Secs. 3.2 and 3.3). Second, the optimizations are applied only at an intra-block level and hence many optimization opportunities are missed. These two points make a fundamental difference with our approach, since the detected optimizable patterns (see Sec. 4) require inter-block analysis and a precise slot access analysis, and hence cannot be detected by GASOL.

Finally, as mentioned in Sec. 1, in addition to dynamic memory, smart contracts also use a persistent memory called storage. Regarding the application of our approach to infer needless accesses in storage, there are two main points. First, there is no need to develop a static analysis to detect the slots in storage, as they are statically known (hence our inference in Secs. 3.2 and 3.3 is not needed), i.e., the read and write sets of Def. 6 can be easily defined for storage. Second, as storage is persistent memory, a write storage access is not removable even if there is no further read access within the smart contract, as the value needs to be kept for future transactions. The only removable write storage accesses are those that are overwritten without being read between the two writes. Including this in our implementation is straightforward. However, this situation is rather unusual, and we believe that very few cases would be found, and hence little optimization could be achieved.