Symbolic Predictive Cache Analysis for Out-of-Order Execution

Abstract. We propose a trace-based symbolic method for analyzing cache side channels of a program under a CPU-level optimization called out-of-order execution (OOE). The method is predictive in that it takes the in-order execution trace as input and then analyzes all possible out-of-order executions of the same set of instructions to check if any of them leaks sensitive information of the program. The method has two important properties. The first one is accurately analyzing cache behaviors of the program execution under OOE, which is largely overlooked by existing methods for side-channel verification. The second one is efficiently analyzing the cache behaviors using an SMT solver based symbolic technique, to avoid explicitly enumerating a large number of out-of-order executions. Our experimental evaluation on C programs that implement cryptographic algorithms shows that the symbolic method is effective in detecting OOE-related leaks and, at the same time, is significantly more scalable than explicit enumeration.


Introduction
There has been growing interest in recent years in detecting side-channel leaks in software using automated program analysis and verification techniques, due to the increased awareness of the threat of real-world side-channel attacks [4,15,18]. These attacks exploit dependencies between sensitive information of the program and non-functional properties of the computing platform, including cache-related timing variations caused by CPU-level optimizations such as pipelining and branch prediction. While there are existing methods for detecting these side channels based on static analysis [6,28,31] and symbolic execution [3,10,11,12,29], they do not accurately model an important CPU-level optimization called out-of-order execution (OOE).
Out-of-order execution is widely adopted by modern CPUs. It is possible for a program to be free of side-channel leaks when instructions are executed in the program order but have leaks when they are executed out of order. Here, the program order refers to the order in which instructions appear in the program.

Fig. 1. Spreca - symbolic predictive analysis for out-of-order execution.

However, modeling out-of-order execution during program analysis is a
challenging task due to the inherently large number of possible scenarios that must be considered. Generally speaking, instructions within a fixed window (an imaginary window used to model the effect of hardware features including the reorder buffer, issue queue, and load-store queue) may be executed in any order as long as the order respects the semantics of the program. Thus, given N instructions, the number of possible execution orders can be as large as O(N!). Since it is practically intractable to examine these execution orders individually, existing methods had to choose between two undesirable outcomes: if they over-approximate, they may report bogus leaks, since some infeasible execution orders will be included; if they under-approximate, they may miss real leaks, since some feasible execution orders will be excluded.
To solve the aforementioned problem, we propose a trace-based symbolic predictive analysis to accurately and efficiently analyze the OOE-related cache behaviors. Here, accurately means that our method does not over- or under-approximate the OOE behaviors but precisely encodes them as a set of logical constraints; efficiently means that our method avoids enumerating the out-of-order executions explicitly, and with them the exponential blowup; instead, it leverages an off-the-shelf SMT solver to conduct a symbolic analysis of the logical constraints. Our method is predictive in that, given an in-order execution trace of the program, it analyzes the cache behaviors of all out-of-order executions of the instructions that appeared in the in-order execution, instead of executing them. Fig. 1 shows the overall flow of our method, named Spreca, which takes an annotated C program as input; the annotation marks program inputs as either public or private (secret). Internally, our method has three steps. In the first step, it utilizes the LLVM compiler to parse the C program, compute the program dependencies, and use this information to instrument the LLVM bit-code. The instrumented program, at run time, generates the in-order execution trace. In the second step, our method encodes the set of all possible OOE-related cache behaviors as a set of logical constraints, to be solved by an off-the-shelf SMT solver. In the third step, our method checks if there are secret-dependent divergent cache behaviors, e.g., an out-of-order execution causing a cache hit for one value of the secret variable but a cache miss for another value.
The main contribution of our work is symbolically modeling the OOE-related cache behaviors accurately and efficiently. We design the SMT encoding (to be presented in Section 5) carefully to make it compact. For example, a straightforward encoding of all possible permutations of N instructions would lead to an SMT formula of size O(N³), since any instruction may have any other instruction as its predecessor and, as a result, the update function must be encoded for each predecessor's cache state and the current cache state. Our method, in contrast, avoids most of these update functions by leveraging the program dependency relations recorded in the in-order execution trace to prune away the infeasible permutations.
Our method differs significantly from the method of Guo et al. [10,11] based on symbolic execution. While their method also uses symbolic analysis, they only make the program input symbolic, whereas the out-of-order executions are still enumerated explicitly (this is evident from their use of partial order reduction, a technique designed to speed up explicit enumeration). In other words, for each out-of-order execution, they have to generate an SMT formula to check if it has divergent cache behaviors; as a result, they do not avoid the exponential blowup. In contrast, our method generates a single SMT formula to encode all possible out-of-order executions associated with the in-order execution. In addition to being more efficient, our single-formula encoding can be more easily adapted to model other CPU-level optimizations by slightly modifying how dependencies are encoded as logical constraints.
We have implemented our method in a software tool by leveraging the open-source LLVM compiler [17] and the Z3 SMT solver [19]. Specifically, we use LLVM to parse the C program, compute the program dependencies, and instrument the bit-code, to generate the in-order execution trace at run time. We use Z3 to implement the symbolic analysis of the out-of-order executions. We evaluated our method on a set of C programs from OpenSSL that implement well-known block ciphers and cryptographic hash functions. The experimental results show that our method, by accurately modeling the OOE-related cache behaviors, can detect OOE-related side-channel leaks that otherwise would have been missed. The results also show that our SMT solver based symbolic analysis is significantly more scalable than explicit enumeration.
To summarize, this paper makes the following contributions:
- We propose a trace-based symbolic predictive analysis for detecting OOE-related cache side-channel leaks.
- We rely on an off-the-shelf SMT solver to accurately and efficiently analyze the out-of-order executions associated with an in-order execution trace.
- We demonstrate the effectiveness of our method on C programs from an open-source library that implements well-known cryptographic algorithms.
The remainder of this paper is organized as follows. First, we motivate our work using examples in Section 2. Then, we provide the technical background in Section 3. Next, we present our method in Sections 4 and 5, followed by the experimental results in Section 6. We review the related work in Section 7. Finally, we give our conclusions in Section 8.

Motivation
In this section, we use examples to illustrate the cache behaviors of the in-order execution and an out-of-order execution. We also explain the high-level idea of our trace-based symbolic analysis. Fig. 2 shows the code snippet which, for ease of presentation, is written in a mixture of C and simplified assembly language. Here, assume i ∈ {0, 1, 2} is a secret variable and each array element A[i] occupies 4 bytes in memory. Furthermore, while our method handles realistic cache sizes and configurations, in this motivating example we assume the cache has only one set, consisting of 3 cache lines, with each cache line holding only 4 bytes. We assume the cache is fully associative and uses the LRU (least recently used) replacement policy. Under these assumptions, each array element A[i] occupies an entire cache line.

The Execution Order
The order in which instructions are written in a program is called the program order. During the in-order execution, instructions are executed according to their program order. Without loss of generality, we assume that there are two types of instructions: memory-related instructions, such as Load and Store, and non-memory-related instructions, such as ALU and branch instructions. As far as this work is concerned, our focus is on memory-related instructions because non-memory instructions do not affect cache behavior. Fig. 3 compares the in-order execution on the left with a possible out-of-order execution on the right. The out-of-order execution is a permutation of the instructions of the in-order execution that, at the same time, must respect the semantics of the original program. In both of these two execution traces, each row represents an instruction and its associated memory address. Note that while a program may have if-else statements and thus multiple paths, an execution trace corresponds to only one program path.
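To make the size of the permutation space concrete, the following Python sketch (illustrative only, not part of Spreca) enumerates the execution orders of a small instruction window that respect a toy must-precede relation:

```python
from itertools import permutations

def feasible_orders(n_instrs, dep):
    """Enumerate permutations of instruction indices 0..n_instrs-1 that
    respect dep: a set of pairs (i, j) meaning i must execute before j."""
    orders = []
    for perm in permutations(range(n_instrs)):
        pos = {instr: t for t, instr in enumerate(perm)}
        if all(pos[i] < pos[j] for (i, j) in dep):
            orders.append(perm)
    return orders

# Hypothetical window of 4 memory instructions where only I0 -> I3 conflict
# (e.g., both touch the same memory block and one of them is a store).
print(len(feasible_orders(4, {(0, 3)})))  # 12 of the 4! = 24 orders survive
```

Even a single dependency halves the search space here; with no dependencies the count degenerates to N!, which is why explicit enumeration does not scale.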

The Cache State
Given a program execution, regardless of whether it is the in-order execution or one of the out-of-order executions, it is straightforward to compute the changes of the cache state at each step. The cache state of our running example can be defined as a tuple S = ⟨Age(A[0]), Age(A[1]), Age(A[2]), Age(B)⟩, consisting of the ages of the cache lines associated with the four program variables. Since we assume that the cache holds at most 3 variables (lines) at any moment, if a variable is inside the cache, its age must be 0, 1, or 2; and if it is evicted from the cache, its age must be 3. Initially, the cache state is S_0 = ⟨−1, −1, −1, −1⟩, where −1 is a special symbol meaning the variable has not been loaded into the cache yet.
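The age-based cache state can be simulated directly. The Python sketch below hard-codes the motivating example's assumptions (3 cache lines, fully associative, LRU); the access sequence is hypothetical:

```python
def access(state, var):
    """LRU age update for a 3-line fully associative cache.
    Ages 0-2 mean 'in cache', 3 means evicted, -1 means never loaded."""
    old = state[var]
    new = dict(state)
    for u, age in state.items():
        # every cached line younger than var (or every cached line, when
        # var misses) grows one step older; older lines keep their age
        if 0 <= age < 3 and (old < 0 or old >= 3 or age < old):
            new[u] = age + 1
    new[var] = 0  # var now occupies the youngest line
    return new

S = {'A[0]': -1, 'A[1]': -1, 'A[2]': -1, 'B': -1}
for v in ['A[0]', 'A[1]', 'A[2]', 'B']:
    S = access(S, v)
print(S)  # {'A[0]': 3, 'A[1]': 2, 'A[2]': 1, 'B': 0} -- A[0] was evicted
```

After four distinct accesses, the first variable has aged out to 3 (evicted), matching the intuition that a 3-line cache cannot hold four blocks.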
In-Order Cache Behavior As shown in Fig. 4, the cache behavior of the in-order execution does not depend on the value of the secret variable i.

Out-of-Order Cache Behavior There can be many out-of-order executions, or permutations of instructions, corresponding to an in-order execution. While they must preserve the semantics of the in-order execution, they do not have to preserve its cache behavior. Thus, even if the in-order execution does not have divergent cache behaviors (with respect to a secret variable), one of the out-of-order executions may have divergent cache behaviors. As shown in Fig. 5, for the particular out-of-order execution that reorders store A[i] and load B, when i = 0, accessing A[i] results in a cache hit, but when i ≠ 0, it results in a cache miss.

Fig. 5. Cache behavior of the out-of-order execution depends on the secret value i; that is, accessing A[i] results in a cache hit when i = 0 but a cache miss when i ≠ 0.

The Side-channel Leak
Whenever the cache behavior of an execution (regardless of whether it is the in-order execution or an out-of-order execution) depends on the value of a secret variable, we call it a side-channel leak. This is a security risk because, in modern CPUs, a cache hit only takes 1-3 CPU cycles whereas a cache miss may take up to a hundred CPU cycles. By observing the difference in the execution time of a victim program, the attacker may be able to deduce a certain amount of information about the secret. In our running example, since store A[i] is dependent on the value of the secret variable i, we need to check if executing store A[i] leads to divergent cache behaviors. During the in-order execution, the answer is no, since it results in a cache hit for all i = 0, 1, and 2. Thus, the in-order execution has no side-channel leak. During one of the out-of-order executions, however, the answer is yes, since it results in a cache hit for some value of i but a cache miss for some other value of i. Thus, that out-of-order execution has a leak.
Generally speaking, there are two types of side-channel analysis techniques: approximate and accurate. While over- or under-approximation may be fast, it leads to poor results, i.e., reporting bogus leaks or missing real leaks. Thus, we are only concerned with accurate analysis techniques. In this context, while it is possible to examine each individual out-of-order execution, doing so leads to exponential blowup. Our method, in contrast, encodes the cache behaviors of all out-of-order executions in a single logical formula. The formula is then solved using an efficient, off-the-shelf SMT solver, thereby avoiding the exponential blowup.

Fig. 6. The instruction window and the different execution orders.

The Execution Model
Recall that modern CPUs may execute instructions of a program in any order as long as the end result remains the same. The default order is the program order, i.e., the order in which instructions appear in the program. For performance reasons, however, the CPU does not always follow the program order, because some instructions may be significantly slower than others and, instead of waiting for the slower instructions to complete, the CPU may choose to execute some subsequent instructions as long as the program semantics is preserved.
Instruction Window As shown in Fig. 6, we use an imaginary instruction window to abstract the behavior of the various hardware components inside the CPU that support out-of-order execution. The size of this instruction window depends on the CPU, in particular on the sizes of its reorder buffer, issue queue, and load-store queue. For this work, however, there is no need to delve into the hardware details. Instead, it suffices to assume that, within this imaginary window of N instructions, the CPU may choose any execution order as long as the end result remains the same.
Data Hazards To make sure that the end result remains the same, only the out-of-order executions that respect the data dependencies of the original program are allowed. In the computer architecture literature, violations of such dependencies are called hazards. Specifically, there are three types of hazards, named RAW (read after write), WAR (write after read), and WAW (write after write), respectively. It is worth noting that RAR (read after read) is not a hazard.
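A dependency checker only needs the access kinds of two same-address instructions to classify the hazard. A minimal sketch (illustrative, not the paper's implementation):

```python
def hazard(first, second):
    """Classify the data hazard between two instructions that touch the
    same memory block, given their kinds ('load' or 'store'); 'first'
    precedes 'second' in program order. Returns None when reordering is
    safe (the RAR case)."""
    if first == 'store' and second == 'load':
        return 'RAW'   # read after write
    if first == 'load' and second == 'store':
        return 'WAR'   # write after read
    if first == 'store' and second == 'store':
        return 'WAW'   # write after write
    return None        # read after read: not a hazard

print(hazard('store', 'load'))  # RAW
print(hazard('load', 'load'))   # None
```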

The Cache Model
Without loss of generality, we assume the cache has K cache lines in total and each cache line has 64 bytes. The cache lines are further divided into M sets, which means each set has (K/M ) cache lines. The memory is also divided into 64-byte blocks, each of which is mapped to a unique set. Within the same set, however, the 64-byte block may occupy any of the cache lines. Thus, within the set, it is called fully associative; overall, the entire cache is called set associative.
In this context, a fully associative cache is a special case (K-way set associative), while a direct mapped cache is another special case (1-way set associative).
The Cache State The cache state is a tuple S_I = ⟨Age(v_1), . . . , Age(v_n)⟩, where each v_i ∈ Vars is a variable in the program and Age(v_i) is the age of the cache line associated with v_i; Vars is the set of all variables. Here, we use the subscript in S_I to indicate that it is the cache state resulting from executing the instruction I. Assume that K is the number of cache lines in a set. The domain of Age(v_i) is {−1, 0, 1, . . . , K}, where an age from 0 to K − 1 means the variable is inside the cache, K means the variable has been evicted from the cache, and −1 means it has never been loaded into the cache. We assume that the cache uses the LRU (least recently used) replacement policy. Given a cache state S_I and an instruction I', the new cache state S_I' is computed by the Update(S_I, I') function. Assuming that v ∈ Vars is the variable used by the instruction I', u_1 ∈ Vars is any variable whose age was younger than that of v in S_I, and u_2 ∈ Vars is any variable whose age was older than that of v in S_I, we compute the new cache state S_I' = ⟨Age'(v_1), . . . , Age'(v_n)⟩ as follows:

Age'(v) = 0, Age'(u_1) = Age(u_1) + 1, Age'(u_2) = Age(u_2).

That is, the most recently used variable (v) occupies the youngest cache line, any variable (u_1) whose age was younger than that of v in S_I increases its age by 1, and any variable (u_2) whose age was older than that of v in S_I keeps its age unchanged.
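A direct reading of the three rules gives the following Python sketch of the Update function for a single K-way set (a sketch under the paper's age conventions, not the tool's code):

```python
def update(state, addr, K):
    """Update(S, I'): LRU age update for one cache set with K lines.
    Ages 0..K-1 mean 'in cache', K means evicted, -1 means never loaded."""
    old = state.get(addr, -1)
    new = {}
    for a, age in state.items():
        if 0 <= age < K and (old < 0 or old >= K or age < old):
            new[a] = age + 1   # u1: younger than addr -> ages by one
        else:
            new[a] = age       # u2: older than addr -> age unchanged
    new[addr] = 0              # v: addr becomes the most recently used
    return new

S = {}
for a in ['a', 'b', 'c']:      # three distinct blocks, K = 2 lines
    S = update(S, a, K=2)
print(S)  # {'a': 2, 'b': 1, 'c': 0} -- 'a' aged out to K and is evicted
```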

The Side-channel Leak Condition
Whenever there is a dependency between the secret and some divergent cache behaviors of an execution, there is a side-channel leak. Thus, there are two requirements. First, there must be divergent cache behaviors, i.e., a memory-related instruction causes a cache miss for some input value but a cache hit for some other input value. Second, the input value causing the divergent cache behaviors must be a secret, e.g., a password, security token, or cryptographic key.
Thus, the side-channel leak condition can be defined as follows:

∃ I ∈ E. ∃ v_1, v_2. CacheStatus(E, I, v_1) ≠ CacheStatus(E, I, v_2)

Here, E denotes an execution, and I ∈ E is an instruction in E; v_1 and v_2 are two values of a secret variable v_s ∈ Vars; and CacheStatus(E, I, v_s) is a function that returns the cache status (hit or miss) when instruction I is executed in E using the value v_s.
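Operationally, the condition quantifies over instructions and secret values. A Python sketch, where cache_status stands in for CacheStatus (the status function used below is hypothetical):

```python
def has_leak(cache_status, execution, secret_values):
    """Side-channel leak condition: some instruction I in execution E
    shows both 'hit' and 'miss' across the candidate secret values."""
    for instr in execution:
        statuses = {cache_status(execution, instr, v) for v in secret_values}
        if len(statuses) > 1:   # divergent cache behavior observed
            return True
    return False

# Toy status function: I4 misses only when the secret i is 0.
status = lambda E, I, v: 'miss' if (I == 'I4' and v == 0) else 'hit'
print(has_leak(status, ['I1', 'I2', 'I4'], [0, 1, 2]))  # True
```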

Analyzing the In-Order Execution
In this section, we present our method for generating, and then analyzing the in-order execution trace. There are two tasks. The first one is to compute the dependencies of memory-related instructions. The second one is to compute the default cache states. Both the dependencies and the default cache states will be used during our symbolic analysis of the out-of-order executions.

Computing the Dependencies
There are two types of dependencies associated with the in-order execution of a program: explicit dependencies and implicit dependencies.
Explicit Dependencies Explicit dependencies refer to data conflicts that can be directly observed during the execution, by looking at the actual addresses of the memory blocks used by the instructions at run time. Consider the in-order execution example in Fig. 3 (left). Since both instructions I4 and I1 access the memory block at the address 0x77ef5bd0, and at least one of them is a store operation, these two instructions have an explicit dependency; that is, they cannot be reordered during out-of-order execution.

Implicit Dependencies Implicit dependencies, on the other hand, refer to data conflicts that cannot be directly observed during the in-order execution. Fig. 7 shows an example. The code snippet shows that store A[1] is dependent on load A[0], through the def-use chain of the (register) variables r1-r3. Since non-memory instructions (mul, add, mov in this example) do not show up in the logged execution trace, their constraints on the memory instructions would be lost if we did not compute and record them explicitly in the execution trace.
In our method, we compute the implicit dependencies by statically analyzing the LLVM bit-code of the program before instrumenting the bit-code to add self-logging capabilities. Then, we execute the instrumented code to obtain the trace. As a result, the implicit dependencies are captured in the execution trace as a special relation (DEP_sta). Static program analysis has a global view of the program and thus is well suited for computing the implicit dependencies. Inside LLVM, the bit-code is represented in Static Single Assignment (SSA) form, meaning each variable is defined only once, which makes it possible to compute the implicit dependencies efficiently [20].
In addition to the implicit dependencies (DEP_sta) computed by static analysis, we also compute the explicit dependencies (DEP_dyn) based on the actual addresses that appear in the execution trace: for each memory address, the instructions that use the address are checked for data hazards (RAW, WAR, or WAW). For instructions that have data hazards, their relative execution order during the in-order execution cannot be violated; otherwise, the original program semantics may be changed.
Given both the statically computed DEP_sta and the dynamically computed DEP_dyn, we compute their transitive closure to obtain DEP = (DEP_sta ∪ DEP_dyn)*, which represents the complete set of dependency constraints that must be respected at all times, to ensure that the out-of-order executions examined by our symbolic analysis are feasible.
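The closure computation itself is standard; a small Python sketch over instruction-index pairs (the relation below is a toy example, not the tool's data structures):

```python
def transitive_closure(pairs):
    """Compute DEP = (DEP_sta U DEP_dyn)* over pairs (i, j), where (i, j)
    means instruction i must execute before instruction j."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))   # chain a -> b and b -> d
                    changed = True
    return closure

dep_sta, dep_dyn = {(1, 2)}, {(2, 4)}
print(sorted(transitive_closure(dep_sta | dep_dyn)))  # [(1, 2), (1, 4), (2, 4)]
```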
The conservative nature of static analysis does not affect the correctness of our subsequent symbolic analysis. Since not all memory-addressing instructions can be statically resolved, as shown by the example instruction store A[i] in Fig. 2, static analysis may soundly over-approximate the possible dependencies of memory-related instructions. This is not a problem because it guarantees that, whenever two instructions are marked as independent, it is always safe to reorder them during out-of-order execution. This is crucial for ensuring that leaks detected by our method are feasible.

Computing the Default Cache States
Given the in-order execution trace, we perform an in-order simulation to compute the default cache states, which will be used during our symbolic analysis of the out-of-order executions.
We regard the in-order execution trace as a sequence of instructions T_ino = {I_1, . . . , I_n}. The type of each instruction may be Load, Store, Symbolic Load, or Symbolic Store. Each Load/Store instruction is associated with an actual memory address. Each Symbolic Load/Store instruction is associated with a range of addresses that it may use.
Starting with an initial cache state S_0, we compute the sequence of cache states T_cache = {S_0, S_{I_1}, . . . , S_{I_n}} using the update function defined in Section 3.2. While the update function in Section 3.2 uses the LRU replacement policy, other cache replacement policies can also be implemented easily.
The result of the in-order simulation is given to our symbolic analysis, which examines the set of all possible out-of-order executions. Here, an out-of-order execution, denoted T_ooe = {I'_1, . . . , I'_n}, is a permutation of the instructions of the in-order execution. That is, for every instruction I_i ∈ T_ino (1 ≤ i ≤ n), there exists 1 ≤ j ≤ n such that I'_j = I_i, and vice versa.
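A candidate out-of-order trace can thus be validated in two steps: it must be a permutation, and it must respect DEP. A Python sketch, where indices refer to positions in T_ino and the dependency set is hypothetical:

```python
def is_feasible_ooe(n, order, dep):
    """Check that 'order' (a list of indices into an in-order trace of
    length n) is a permutation that respects every pair (i, j) in dep,
    i.e., instruction i still executes before instruction j."""
    if sorted(order) != list(range(n)):
        return False  # not a permutation of the in-order instructions
    pos = {instr: t for t, instr in enumerate(order)}
    return all(pos[i] < pos[j] for (i, j) in dep)

dep = {(0, 3)}  # e.g., I1 and I4 touch the same block and one is a store
print(is_feasible_ooe(4, [0, 2, 1, 3], dep))  # True
print(is_feasible_ooe(4, [3, 1, 2, 0], dep))  # False: I4 runs before I1
```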

Analyzing the Out-of-Order Executions
In this section, we present our method for symbolically analyzing the out-of-order executions.

Symbolic Encoding
Our method uses a single logical formula (Φ) to encode the behaviors of all out-of-order executions of the instructions within a sliding window of size N, together with the condition under which an out-of-order execution has secret-dependent, divergent cache behaviors. It guarantees that Φ is satisfiable if and only if there exists such a side-channel leak in the sliding window of size N. Thus, when setting the value of N, there is a trade-off between coverage and scalability.
Before explaining how Φ is constructed from the in-order execution trace, however, we need to define the notations used in the symbolic encoding.
- Sliding Window: We focus on a sliding window of N instructions appearing in the in-order execution trace. Within this window, instructions may be executed in any order as long as they respect the DEP relation; outside of this window, instructions are executed in order.
- Program Counter: We use (N + 1) variables PC_{I_0}, PC_{I_1}, . . . , PC_{I_N} to represent the times at which the N instructions I_1, . . . , I_N are executed. The special variable PC_{I_0} represents the start time, and each PC_{I_i} (where 1 ≤ i ≤ N) represents the time immediately after I_i is executed.

With these notations, we define the formula Φ as a conjunction of the following subformulas:

Φ = Φ_pc ∧ Φ_cs ∧ Φ_ics ∧ Φ_rep ∧ Φ_dep ∧ Φ_divc

where Φ_pc is the program counter constraint, Φ_cs is the cache state constraint, Φ_ics is the initial cache state constraint, Φ_rep is the cache replacement constraint, Φ_dep is the dependency constraint, and Φ_divc is the divergence condition constraint.
Program Counter Constraint (Φ_pc) To obtain a total order of the N instructions, we require that, for all 0 ≤ i ≤ N, the value of PC_{I_i} is unique; furthermore, we require 0 ≤ PC_{I_i} ≤ N. Thus, the constraint is defined as

Φ_pc = ⋀_{0≤i<j≤N} (PC_{I_i} ≠ PC_{I_j}) ∧ ⋀_{0≤i≤N} (0 ≤ PC_{I_i} ≤ N)

Initial Cache State Constraint (Φ_ics) Before the first instruction is executed, the cache must be set to a proper initial state. In other words, the variables Age^{addr_1}_{I_0}, . . . , Age^{addr_M}_{I_0} must be initialized based on the default cache states computed by in-order simulation (Section 4.2). Thus, the constraint is defined as

Φ_ics = ⋀_{0≤k≤M} (Age^{addr_k}_{I_0} = init_age^{addr_k})

Replacement Constraint (Φ_rep) Assuming that instruction I_j is executed immediately before I_i during an out-of-order execution, we define the cache line ages after executing I_i based on their ages after executing the predecessor instruction I_j. Let addr_k be the address used by I_i, addr_{k1} be any address whose age was younger than that of addr_k immediately before executing I_i, and addr_{k2} be any address whose age was older than that of addr_k. According to the update function defined in Section 3.2, we set Age^{addr_k}_{I_i} to 0, set Age^{addr_{k1}}_{I_i} to (Age^{addr_{k1}}_{I_j} + 1), and set Age^{addr_{k2}}_{I_i} to Age^{addr_{k2}}_{I_j}. Let the relation UpdateRel(I_i, I_j) be the conjunction of the constraints defined above. If a symbolic (secret-dependent) address is used by I_i, we encode it into the update relation as follows: for each concrete address that may be instantiated from the symbolic address, we construct an update relation UpdateRel() under the assumption that it is the actual address used by I_i.
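To illustrate the shape of the encoding, the sketch below emits SMT-LIB text for Φ_pc and Φ_ics; the variable names (pc_i, age_<addr>_0) are illustrative, not Spreca's actual encoding:

```python
def encode_pc_and_ics(n, init_ages):
    """Emit SMT-LIB text for the program counter constraint (distinct,
    bounded PCs) and the initial cache state constraint (fixed ages)."""
    lines = []
    pcs = ["pc_%d" % i for i in range(n + 1)]
    for pc in pcs:
        lines.append("(declare-const %s Int)" % pc)
        lines.append("(assert (and (<= 0 %s) (<= %s %d)))" % (pc, pc, n))
    lines.append("(assert (distinct %s))" % " ".join(pcs))       # Phi_pc
    for addr, age in init_ages.items():                          # Phi_ics
        lit = str(age) if age >= 0 else "(- %d)" % -age  # SMT-LIB negatives
        lines.append("(declare-const age_%s_0 Int)" % addr)
        lines.append("(assert (= age_%s_0 %s))" % (addr, lit))
    return "\n".join(lines)

print(encode_pc_and_ics(2, {"b1": 0, "b2": -1}))
```

Feeding this text, conjoined with the remaining subformulas, to an SMT-LIB solver such as Z3 is the essence of the single-formula approach.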
Overall, the cache replacement constraint is defined as

Φ_rep = ⋀_{1≤i≤N} ⋀_{0≤j≤N, j≠i} ((PC_{I_j} + 1 = PC_{I_i}) → UpdateRel(I_i, I_j))

Dependency Constraint (Φ_dep) To ensure that out-of-order executions are feasible, we enforce the relative order of any two instructions that have dependencies according to the DEP relation. Thus, the constraint is defined as

Φ_dep = ⋀_{0≤i,j≤N, i≠j, DEP(I_i,I_j)} (PC_{I_i} < PC_{I_j})

That is, if I_j depends on I_i, then I_i must be executed before I_j.
Divergent Cache Constraint (Φ_divc) Let v_s be a symbolic (secret) variable whose values include v_1, v_2, . . ., and let I_i be a symbolic instruction whose actual addresses include addr_{v_1}, addr_{v_2}, . . .; here, the value v_1 corresponds to addr_{v_1} and the value v_2 corresponds to addr_{v_2}. If accessing the memory block at addr_{v_1} leads to a cache hit and accessing addr_{v_2} leads to a cache miss (or vice versa), the target instruction I_i has divergent cache behaviors. Thus, the constraint is defined as

Φ_divc = (Hit(I_i, addr_{v_1}) ∧ ¬Hit(I_i, addr_{v_2})) ∨ (¬Hit(I_i, addr_{v_1}) ∧ Hit(I_i, addr_{v_2}))

where Hit(I_i, addr) holds if and only if the age of addr immediately before executing I_i is between 0 and K − 1. Conjoining all of the subformulas defined above, we obtain the entire formula Φ, which is satisfiable (SAT) if and only if there is a side-channel leak during one of the out-of-order executions.

The Overall Algorithm
The overall algorithm for predictive cache analysis is shown in Algorithm 1, which takes the in-order execution trace T_ino = {I_1, . . . , I_n}, the in-order cache state trace T_cache = {S_0, . . . , S_n}, and the sliding window size N as input. Internally, it uses a sliding window of N instructions, T_window, to generate the SMT formula Φ. For this window, S_init is the initial cache state as computed by in-order simulation, and I_target is the target instruction. The formula Φ is satisfiable if and only if an out-of-order execution of the instructions within the window leads to divergent cache behaviors at the instruction I_target.
Running Example We use the example code snippet in Fig. 2 to illustrate the symbolic encoding presented in this section. For this example, the in-order execution trace generated by our method is shown in the top half of Fig. 8. Note that A is marked as symbolic since A[i] is affected by the unknown variable i. The logical constraints are shown in the bottom half. Assume that the target instruction is I4, meaning that we want to construct a formula Φ to check if I4 has divergent cache behaviors. The program counter and cache state constraints are shown in Lines 10-12; recall that each program counter variable must have a unique value. The dependency constraints are shown in Line 13. Then, in Line 14, we show the two symbolic variables used to check divergent cache behaviors; their values are in the range of the symbolic store in Line 5.

Optimizations of the Symbolic Encoding
Without optimization, the size of the formula Φ may be as large as O(N² · M) in the worst case, where N is the number of instructions in the sliding window and M is the number of memory addresses used inside the window. In practice, however, many of the logical constraints can be skipped. Here, we propose two optimization techniques.
Skipping the Infeasible Cache Update Relations While constructing the constraints that update the cache states of the instructions, the default approach is to assume that, for any instruction I_i, any other instruction I_j in the same window may be executed immediately before I_i. This means N² update relations must be constructed. However, due to the dependencies among instructions captured by the DEP relation, there may be many instruction pairs (I_j, I_i) such that I_j is not allowed to execute before I_i. By leveraging this information, we can skip many of these update relations.
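The saving can be estimated directly from DEP: a pair (I_j, I_i) needs no update relation if DEP forces I_i to execute before I_j. A Python sketch over instruction indices (the relation below is a toy example):

```python
def surviving_update_pairs(n, dep):
    """Return the ordered pairs (j, i), meaning I_j immediately before
    I_i, for which an update relation must still be encoded: those not
    contradicted by a dependency (i, j) in dep forcing i before j."""
    return [(j, i) for i in range(n) for j in range(n)
            if i != j and (i, j) not in dep]

dep = {(0, 1), (0, 2), (1, 2)}  # a fully ordered 3-instruction chain
print(len(surviving_update_pairs(3, dep)))  # 3 of the 6 ordered pairs remain
```

In the fully ordered chain above, half of the candidate predecessor pairs can be pruned before any SMT encoding takes place.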
Skipping the Unnecessary Φ_divc Constraints In many cases, by checking the initial cache state with respect to the sliding window of N instructions, we may be able to determine that divergent cache behaviors are impossible during any of the out-of-order executions. In other words, Φ_divc is guaranteed to be unsatisfiable (UNSAT), and we can avoid generating Φ. Toward this end, we check the following two conditions, each of which is sufficient for Φ_divc to be UNSAT:
- All ages are too young: Inside the initial cache state (with respect to the window), if all cache line ages are less than (MAX − M), where M is the number of unique addresses used in this window, we skip checking any of the instructions in this window for divergent cache behaviors. This is because the cache is large enough that, regardless of the execution order, none of the cache lines will be evicted.
- The age of addr accessed by the target instruction is too young: Inside the initial cache state, if the age of addr is less than (MAX − M), we skip checking this particular target instruction for divergent cache behaviors. This is because, regardless of the value of the secret variable, this particular cache line will never be evicted from the cache.
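Both checks reduce to simple age comparisons against the window's address count. A Python sketch (max_age plays the role of the MAX bound above, and the concrete numbers are illustrative):

```python
def can_skip_window(init_ages, window_addrs, target_addr, max_age):
    """Two sufficient conditions for Phi_divc to be UNSAT, allowing the
    SMT call to be skipped. m is the number of unique addresses the
    window touches; a line with age < max_age - m can never be evicted
    by this window, whatever the execution order."""
    m = len(set(window_addrs))
    cached = [init_ages[a] for a in window_addrs
              if init_ages.get(a, -1) >= 0]
    if cached and all(age < max_age - m for age in cached):
        return True              # condition 1: every cached line too young
    t = init_ages.get(target_addr, -1)
    return 0 <= t < max_age - m  # condition 2: the target's line too young

init = {'b1': 0, 'b2': 1, 'b3': 2}
print(can_skip_window(init, ['b1', 'b2', 'b3'], 'b1', max_age=128))  # True
```

With a 128-way bound and only 3 addresses in the window, no eviction is possible, so the solver call is avoided entirely.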

Experiments
We have implemented our method in a tool named Spreca, which builds upon the LLVM compiler [17] and the Z3 SMT solver [19]. Specifically, it uses LLVM to implement the static analysis component, which takes a C program as input and computes the dependencies of memory-related instructions before instrumenting the LLVM bit-code; the instrumented bit-code, after compilation, is used to generate the execution trace at run time. We use Z3 to implement our symbolic analysis component, which takes the logged execution trace as input and generates SMT formulas of the cache states for leakage detection. Overall, our implementation includes 3.6K lines of C++ code inside LLVM for trace generation, SMT encoding and leakage detection, as well as 0.5K lines of Python/Bash script code for processing the trace files and automation. The archive is available at: https://doi.org/10.5281/zenodo.6117196.

Benchmarks
The benchmarks used to evaluate our tool are a set of C programs from OpenSSL 1.1.1k that implement well-known block ciphers such as AES and DES and cryptographic hashing functions such as SHA256 and Whirlpool. The statistics of these benchmark programs are shown in Table 1, including the name of the program, a short description, the number of lines of C code, and statistics of the logged execution trace, which serves as input of our symbolic analysis method. For each execution trace, we show the trace length, the number of Store (ST) operations, the number of Load (LD) operations, the number of distinct memory locations touched by the execution, and the number of corresponding cache lines. Our experiments were designed to answer the following questions:
- Is our method effective in detecting OOE-related cache side-channel leaks?
- Is our method, based on symbolic analysis, more scalable than explicit analysis?
Toward this end, for each benchmark program, we applied our symbolic analysis method to check if it can find OOE-related cache side-channel leaks, i.e., leaks that would not show up unless out-of-order execution is considered.
To evaluate the scalability of our method, we also compared it with a baseline explicit analysis method. Due to space limits, we omit the detailed algorithm of the explicit analysis method, which systematically enumerates the same set of out-of-order executions of instructions considered by our symbolic analysis method. Thus, both our symbolic method and the explicit method examine the same type of secret-dependent divergent cache behaviors; they differ only in efficiency and scalability.

Table 2 shows the results of our symbolic analysis method. These results were obtained using the following parameters: the cache has a total of 8K bytes, divided into 128 cache lines, with 64 bytes per cache line. The cache is fully associative, with the LRU replacement policy. The OOE window size is set to 10, meaning the number of Load/Store instructions that can be executed out of order is bounded by 10. Recall that inside the reorder buffer, there can be many non-memory instructions (e.g., arithmetic operations); thus, setting the window size to 10 is a reasonable choice. In this table, Columns 1-2 show the program name and the trace length. Columns 3-5 show the number of SMT solver calls, the number of satisfiable (SAT) instances, and the number of unsatisfiable (UNSAT) instances. Column 6 shows the number of leaking sites detected by our method and Column 7 shows the total analysis time in seconds. Note that the number of SMT solver calls may be smaller than the number of instructions in the trace and, in many cases, is 0, because of the optimizations implemented during our symbolic encoding: for any instruction, if our simple checks reveal that no OOE-related divergent cache behavior is possible, we skip the more time-consuming SMT solver call.
Also note that the number of leaking sites in Column 6, which are locations in the original C program, may be smaller than the number of SAT instances in Column 4; this is because multiple SAT results may be mapped to the same source code location.

Leakage Detection Results
To confirm that the leaking sites reported in Table 2 (5 for SEED and 6 for Camellia) are indeed feasible, we manually inspected the source code and the LLVM bit-code of both SEED and Camellia. Our inspection shows that the reordered sequences provided by the SMT solver are feasible when checked against the source code. We also find that the divergent cache behaviors are real, in that the two concrete values computed for each symbolic (sensitive) variable indeed lead to a cache hit in one case and a cache miss in the other.
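This kind of hit/miss divergence is easy to reproduce concretely. The sketch below replays the same access pattern under two secret values on a tiny 2-line fully associative LRU cache (a stand-in for the real 128-line configuration); the line numbers and access sequence are made up for illustration.

```python
from collections import OrderedDict

def run(secret, capacity=2):
    cache = OrderedDict()  # key order tracks recency: first key = LRU

    def access(line):
        hit = line in cache
        if hit:
            cache.move_to_end(line)          # refresh on hit
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)    # evict least recently used
            cache[line] = True
        return hit

    access(0)                # victim line, re-loaded at the end
    access(1 + secret % 2)   # secret-dependent: line 1 (even) or line 2 (odd)
    access(1)                # for an odd secret, this evicts line 0
    return access(0)         # final load: hit or miss depends on the secret

print(run(0), run(1))  # -> True False: hit for secret 0, miss for secret 1
```

The final load hits when the secret is even but misses when it is odd, which is exactly the observable difference a cache side-channel attacker exploits.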

Scalability Results
To evaluate the scalability of our symbolic analysis method, we compared its analysis time to that of the baseline explicit enumeration method. This experiment was conducted on SEED, with the OOE window size set to 2, 4, 6, 8, and 10; we vary the window size because the computational complexity of the problem increases exponentially as the window size increases. The results are shown in Fig. 9, where the x-axis is the OOE window size and the y-axis is the analysis time in seconds. The blue line represents our symbolic method while the red line represents the explicit method.
The results in Fig. 9 show that our symbolic method has a higher fixed cost (associated with generating SMT formulas, calling the Z3 solver, and interpreting the results) and is therefore slower than the explicit method when the OOE window size is small, but it becomes significantly more efficient as the window size grows. The figure also shows that, as expected, the explicit method suffers a blowup that is in fact worse than exponential (its analysis time is factorial in the window size), whereas the scalability of our symbolic method is significantly better.
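A back-of-the-envelope calculation illustrates the blowup: a window of N memory instructions admits up to N! orderings that an explicit method must replay, while the symbolic method folds them into a single solver query per check.

```python
from math import factorial

# Number of orderings an explicit enumerator faces per window,
# for the window sizes used in the scalability experiment.
for n in (2, 4, 6, 8, 10):
    print(n, factorial(n))  # 10 -> 3,628,800 orderings per window
```

In practice not all N! orderings are feasible (data dependencies prune some), but the trend is still factorial, which matches the measured curve of the explicit method.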

Related Work
As we have mentioned earlier, the most closely related work is that of Guo et al. [10,11], which relies on KLEE to detect cache side channels. However, their method only treats program input as symbolic, while still explicitly enumerating the out-of-order executions. Unlike their method, we analyze the set of all possible out-of-order executions symbolically by encoding them in a single logical formula, thus avoiding the exponential blowup. In this sense, our method is the only predictive analysis method that can symbolically analyze the cache behaviors of out-of-order executions.
Besides our method and the method of Guo et al. [10,11], there are many other techniques for analyzing cache side channels. Some of them also use symbolic execution, e.g., to detect concurrency-related leaks [12] and leaks in sequential programs [3,21,29,32]. Others use static analysis techniques, including those based on abstract interpretation [6,28,30,31]. In addition to leakage detection, there are techniques for leakage quantification [1,2,5,7,16] as well. However, none of these prior works considers out-of-order execution.
Beyond side-channel leakage detection and leakage quantification, cache analysis has been used in other applications such as estimating the worst-case execution time (WCET) of real-time software [9,13,25]. Beyond cache analysis, the idea of trace-based predictive analysis has been applied to multithreaded programs to detect concurrency bugs [8,14,22-24,26,27]. However, a crucial difference is that while concurrency bugs are violations of functional properties of a program, our method for side-channel analysis focuses exclusively on non-functional properties.

Conclusions
We have presented a symbolic method for analyzing the cache behaviors of out-of-order executions associated with an in-order execution trace. The method uses static analysis to compute dependencies before instrumenting the program to generate the in-order execution trace. Then, it uses SMT-solver-based symbolic analysis to analyze the cache behaviors of all out-of-order executions. Our experiments on cryptographic software show that the symbolic analysis method is effective in detecting OOE-related cache side-channel leaks and is significantly more scalable than explicit analysis. For future work, we plan to extend our method to detect side-channel leaks caused by other CPU-level optimizations.