1 Introduction

Power side-channel attacks [42] can infer secret data by statistically analyzing the power consumption during the execution of cryptographic programs. The victims include implementations of almost all major cryptographic algorithms, e.g., DES [41], AES [54], RSA [33], elliptic-curve cryptography [46, 52] and post-quantum cryptography [56, 59]. To mitigate the threat, cryptographic algorithms are often implemented via masking [37], which divides each secret value into \((d+1)\) shares by randomization, where d is a given masking order. However, it is error-prone to produce secure and correct masked implementations of non-linear functions (e.g., finite-field multiplication, modular addition and S-Box), which are prevalent in cryptography. Indeed, published implementations of the AES S-Box that had been proved secure via paper-and-pencil arguments [19, 40, 58] were later shown to be vulnerable to power side-channels when d is no less than 4 [24].

While numerous formal verification techniques have been proposed to prove the resistance of masked cryptographic programs against power side-channel attacks (e.g., [7, 13, 26, 29,30,31,32, 64]), one fundamental question which is largely left open is the (functional) correctness of the masked cryptographic programs, i.e., whether a masked program and the original (unmasked) cryptographic algorithm are actually functionally equivalent. It is conceivable to apply general-purpose program verifiers to masked cryptographic programs. Constraint-solving based approaches are available; for instance, Boogie [6] generates constraints via weakest-precondition reasoning and discharges them with SMT solvers, while SeaHorn [36] and CPAChecker [12] adopt model checking utilizing SMT or CHC solvers. More recent work (e.g., CryptoLine [28, 45, 53, 62]) resorts to computer algebra, e.g., to reduce the problem to the ideal membership problem. The main challenge of applying these techniques to masked cryptographic programs lies in the presence of finite-field multiplication, affine transformations and bitwise exclusive-OR (XOR). For instance, finite-field multiplication is not natively supported by current SMT or CHC solvers, and the large number of bitwise XOR operations causes the infamous state-explosion problem. Moreover, to the best of our knowledge, current computer algebra systems do not provide the full support required for the verification of masked cryptographic programs.

Contributions. We propose a novel, term-rewriting based approach to efficiently check whether a masked program and the original (unmasked) cryptographic algorithm (over Galois fields of characteristic 2) are functionally equivalent. Namely, we provide a term rewriting system (TRS) which can handle affine transformations, bitwise XOR, and finite-field multiplication. The verification problem is reduced to checking whether a term can be rewritten to the normal form 0. This approach is sound, i.e., once we obtain 0, we can claim functional equivalence. In case the TRS reduces the term to a normal form different from 0, the two are most likely not functionally equivalent, but a false positive is possible. We then further resort to random testing and SMT solving by directly analyzing the obtained normal form. As a result, the overall approach is complete if no uninterpreted functions are involved in the normal form.

We implement our approach as a new tool FISCHER (FunctionalIty of maSked CryptograpHic program verifiER), based on the LLVM framework [43]. We conduct extensive experiments on various masked cryptographic program benchmarks. The results show that our term rewriting system alone is able to prove almost all the benchmarks. FISCHER is also considerably more efficient than the general-purpose verifiers SMACK [55], SeaHorn, CPAChecker, and Symbiotic [22], the cryptography-specific verifier CryptoLine, as well as a straightforward approach that directly reduces the verification task to SMT solving. For instance, our approach is able to handle masked implementations of finite-field multiplication with masking orders up to 100 in less than 153 s, while none of the compared approaches can handle masking order 3 within 20 min.

In particular, for the first time we detect a flaw in a masked implementation of finite-field multiplication published in EUROCRYPT 2017 [8]. The flaw is subtle, as it only occurs for masking orders \(d\equiv 1 \mod 4\). This finding highlights the importance of the correctness verification of masked programs, which has been largely overlooked, and for which our work provides an effective solution.

Our main contributions can be summarized as follows.

  • We propose a term rewriting system for automatically proving the functional correctness of masked cryptographic programs;

  • We implement a tool FISCHER by synergistically integrating the term rewriting based approach, random testing and SMT solving;

  • We conduct extensive experiments, confirming the effectiveness, efficiency, scalability and applicability of our approach.

Related Work. Program verification has been extensively studied for decades. Here we mainly focus on its application to cryptographic programs, for which some general-purpose program verifiers have been adopted. Early work [3] uses Boogie [6]. HACL* [65] uses F* [2], which verifies programs by a combination of SMT solving and interactive proof assistants. Vale [15] uses F* and Dafny [44], where Dafny harnesses Boogie for verification. Cryptol [61] checks equivalence between machine-readable cryptographic specifications and real-world implementations via SMT solving. As mentioned before, computer algebra systems (CAS) have also been used for verifying cryptographic programs and arithmetic circuits, by reducing to the ideal membership problem together with SAT/SMT solving. Typical work includes CryptoLine and AMulet [38, 39]. However, as shown in Sect. 7.2, neither general-purpose verifiers (SMACK with Boogie and Corral, SeaHorn, CPAChecker and Symbiotic) nor the CAS-based verifier CryptoLine is sufficiently powerful to verify masked cryptographic programs. Interactive proof assistants (possibly coupled with SMT solvers) have also been used to verify unmasked cryptographic programs (e.g., [1, 4, 9, 23, 27, 48, 49]). Compared to them, our approach is highly automated, which makes it more accessible and easier to use for general software developers.

Outline. Section 2 recaps preliminaries. Section 3 presents the core language, based on which the verification problem is formalized. Section 4 gives an example and an overview of our approach. Section 5 and Sect. 6 introduce the term rewriting system and the verification algorithms. Section 7 reports experimental results. We conclude in Sect. 8. The source code of our tool and the benchmarks are available at https://github.com/S3L-official/FISCHER.

2 Preliminaries

For two integers l and u with \(l\le u\), \([l,u]\) denotes the set of integers \(\{l,l+1, \cdots ,u\}\).

Galois Field. A Galois field \(\mathbb{G}\mathbb{F}(p^n)\) comprises polynomials \(a_{n-1} X^{n-1}+\cdots + a_1 X^1+ a_0 \) over \(\mathbb {Z}_p=[0,p-1]\), where p is a prime number, n is a positive integer, and \(a_i\in \mathbb {Z}_p\). (Here p is the characteristic of the field, and \(p^n\) is the order of the field.) Symmetric cryptography (e.g., DES [50], AES [25], SKINNY [10], PRESENT [14]) and bitsliced implementations of asymmetric cryptography (e.g., [17]) intensively use \(\mathbb{G}\mathbb{F}(2^n)\). Throughout the paper, \(\mathbb {F}\) denotes the Galois field \(\mathbb{G}\mathbb{F}(2^n)\) for a fixed n, and \(\oplus \) and \(\otimes \) denote the addition and multiplication on \(\mathbb {F}\), respectively. Recall that \(\mathbb{G}\mathbb{F}(2^n)\) can be constructed as the quotient ring of the polynomial ring \(\mathbb{G}\mathbb{F}(2)[X]\) with respect to the ideal generated by an irreducible polynomial P of degree n. Hence, multiplication is the product of two polynomials modulo P in \(\mathbb{G}\mathbb{F}(2)[X]\) and addition is bitwise exclusive-OR (XOR) over the binary representations of polynomials. For example, AES uses \(\mathbb{G}\mathbb{F}(256)=\mathbb{G}\mathbb{F}(2)[X]/(X^8+X^4+X^3+X+1)\), i.e., \(n=8\) and \(P=X^8+X^4+X^3+X+1\).
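
To make this construction concrete, the following sketch (ours, not part of the original development; function names such as gf_mul are ours for illustration) implements \(\oplus\) as bitwise XOR and \(\otimes\) as carry-less polynomial multiplication reduced modulo the AES polynomial P. The assertions reproduce the standard AES examples \(\{53\}\otimes\{CA\}=\{01\}\) and \(\{57\}\oplus\{83\}=\{D4\}\).

```python
# A minimal sketch of arithmetic in GF(2^8) = GF(2)[X]/(X^8 + X^4 + X^3 + X + 1),
# the field used by AES. Elements are 8-bit integers whose bits are the
# polynomial coefficients. Function names are ours, for illustration only.

AES_POLY = 0x11B  # X^8 + X^4 + X^3 + X + 1

def gf_add(a: int, b: int) -> int:
    # Addition in GF(2^n) is bitwise XOR of the coefficient vectors.
    return a ^ b

def gf_mul(a: int, b: int, poly: int = AES_POLY, n: int = 8) -> int:
    # Carry-less multiplication followed by reduction modulo the irreducible polynomial.
    prod = 0
    for i in range(n):
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(2 * n - 2, n - 1, -1):   # reduce degrees 2n-2 .. n
        if (prod >> i) & 1:
            prod ^= poly << (i - n)
    return prod

if __name__ == "__main__":
    assert gf_mul(0x53, 0xCA) == 0x01       # 0x53 and 0xCA are inverses in the AES field
    assert gf_add(0x57, 0x83) == 0xD4
```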

Higher-Order Masking. To achieve order-d security against power side-channel attacks under certain leakage models, masking is usually used [37, 60]. Essentially, masking partitions each secret value into (usually \(d+1\)) shares so that knowing at most d shares does not reveal any information about the secret value; this is called order-d masking. In Boolean masking, a value \(a\in \mathbb {F}\) is divided into shares \(a_0, a_1, \ldots , a_d\in \mathbb {F}\) such that \(a_0 \oplus a_1 \oplus \ldots \oplus a_d = a\). Typically, \(a_1, \ldots , a_d\) are random values and \(a_0= a\oplus a_1 \oplus \ldots \oplus a_d\). The tuple \((a_0, a_1, \ldots , a_d)\), denoted by \(\textbf{a}\), is called an encoding of a. We write \(\bigoplus _{i \in [0,d]} {\textbf{a}}_i\) (or simply \(\bigoplus \textbf{a}\)) for \(a_0 \oplus a_1 \oplus \ldots \oplus a_d\). Additive masking can be defined similarly to Boolean masking, with \(\oplus \) replaced by the modular arithmetic addition operator. In this work, we focus on Boolean masking, as the XOR operation is more efficient to implement.
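
As a minimal illustration of Boolean encodings (a sketch with our own names, not code from the paper), an order-d encoding can be produced and recombined as follows.

```python
import secrets

def mask(a: int, d: int, nbits: int = 8):
    """Order-d Boolean masking: a = a_0 XOR a_1 XOR ... XOR a_d."""
    rnd = [secrets.randbits(nbits) for _ in range(d)]   # a_1, ..., a_d uniformly random
    a0 = a
    for s in rnd:
        a0 ^= s
    return [a0] + rnd

def unmask(shares):
    """Recombine an encoding by XOR-ing all shares."""
    a = 0
    for s in shares:
        a ^= s
    return a

if __name__ == "__main__":
    enc = mask(0xAB, d=3)
    assert len(enc) == 4 and unmask(enc) == 0xAB
```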

To implement a masked program, for each operation in the cryptographic algorithm, a corresponding operation on shares is required. As we will see later, when the operation is affine (i.e. the operation f satisfies \(f(x \oplus y) = f(x) \oplus f(y)\oplus c\) for some constant c), the corresponding operation is simply to apply the original operation on each share \(a_i\) in the encoding \((a_0, a_1, \ldots , a_d)\). However, for non-affine operations (e.g., multiplication and addition), it is a very difficult task and error-prone [24]. Ishai et al. [37] proposed the first masked implementation of multiplication, but limited to the domain \(\mathbb{G}\mathbb{F}(2)\) only. The number of the required random values and operations is not optimal and is known to be vulnerable in the presence of glitches because the electric signals propagate at different speeds in the combinatorial paths of hardware circuits. Thus, various follow-up papers proposed ways to implement higher-order masking for the domain \(\mathbb{G}\mathbb{F}(2^n)\) and/or optimizing the computational complexity, e.g., [8, 11, 21, 34, 58], all of which are referred to as ISW scheme in this paper. In another research direction, new glitch-resistant Boolean masking schemes have been proposed, e.g., Hardware Private Circuits (HPC1 & HPC2) [20], Domain-oriented Masking (DOM) [35] and Consolidating Masking Schemes (CMS) [57]. In this work, we are interested in automatically proving the correctness of the masked programs.
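
For reference, the following is a sketch of an ISW-style multiplication gadget over \(\mathbb{G}\mathbb{F}(2)\) (bits with AND/XOR), written in our own notation; it only illustrates the functional behaviour of the construction and makes no claim about side-channel or glitch resistance.

```python
import secrets

def isw_and(a_shares, b_shares):
    """ISW-style masked AND over GF(2): given Boolean sharings of bits a and b,
    return a sharing of a AND b. Functional sketch only; names are ours."""
    n = len(a_shares)                                   # n = d + 1 shares
    r = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = secrets.randbits(1)
            # r[j][i] = (r[i][j] XOR a_i*b_j) XOR a_j*b_i, computed in this order
            r[j][i] = (r[i][j] ^ (a_shares[i] & b_shares[j])) ^ (a_shares[j] & b_shares[i])
    out = []
    for i in range(n):
        ci = a_shares[i] & b_shares[i]
        for j in range(n):
            if j != i:
                ci ^= r[i][j]
        out.append(ci)
    return out

if __name__ == "__main__":
    a, b = [1, 0, 0], [1, 1, 1]                         # second-order sharings of a=1, b=1
    c = isw_and(a, b)
    assert (c[0] ^ c[1] ^ c[2]) == ((a[0] ^ a[1] ^ a[2]) & (b[0] ^ b[1] ^ b[2]))
```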

3 The Core Language

In this section, we first present the core language MSL, given in Fig. 1, based on which the verification problem is formalized.

Fig. 1. Syntax of MSL in Backus-Naur form

A program \(\mathcal {P}\) in MSL is given by a sequence of procedure definitions and affine transformation definitions/declarations. A procedure definition starts with a dedicated keyword, followed by a procedure name, a list of input parameters, an output and its body. The procedure body consists of two blocks of statements, separated by a special statement that fixes the number of shares \(d+1\), where d is the masking order. The first block \(\langle \)stmts\(\rangle _\text {origin}\), called the original block, implements the original functionality on the input parameters without masking. The second block \(\langle \)stmts\(\rangle _\text {masked}\), called the masked block, is a masked implementation of the original block over the input encodings \(\textbf{x}\) of the input parameters x. The input parameters and the output x, declared using dedicated keywords, are scalar variables in the original block, but are treated as the corresponding encodings (i.e., tuples) \(\textbf{x}\) in the masked block. For example, declaring x as an input declares the scalar variable x as the input of the original block, while implicitly declaring an encoding \(\textbf{x}=(x_0, x_1, \ldots , x_d)\) with \(d+1\) shares as the input of the masked block.

We distinguish affine transformation definitions from declarations. The former starts with a dedicated keyword, followed by a name f, an input, an output and its body. It is expected that the affine property \(\forall x, y\in \mathbb {F}. f(x \oplus y) = f(x) \oplus f(y) \oplus c\) holds for some affine constant \(c\in \mathbb {F}\). (Note that the constant c is not explicitly provided in the program, but can be derived, cf. Sect. 6.2.) The transformation f is linear if its affine constant c is 0. In contrast, an affine transformation declaration simply declares a transformation f without a body. As a result, it can only be used to declare a linear transformation (i.e., c must be 0), which is treated as an uninterpreted function. Note that the effect of a non-linear affine transformation declaration can still be achieved by composing a linear affine transformation declaration with an affine transformation definition. An affine transformation here serves as an abstraction that captures complicated operations (e.g., shift, rotation and bitwise Boolean operations) and can accelerate verification by expressing such operations as uninterpreted functions. In practice, a majority of cryptographic algorithms (in symmetric cryptography) can be represented as a composition of S-box, XOR and linear transformations only.

An affine transformation can be masked simply by applying it share-wise to an input encoding; namely, the masked version of the affine transformation f(a) is

$$\begin{aligned} f(a_0 \oplus a_1 \oplus \ldots \oplus a_d)= \left\{ \begin{array}{ll} f(a_0) \oplus f(a_1) \oplus \ldots \oplus f(a_d), &{} \hbox {if}\, d\, \hbox {is even;} \\ f(a_0) \oplus f(a_1) \oplus \ldots \oplus f(a_d)\oplus c, &{} \hbox {if}\, d\, \hbox {is odd.} \end{array} \right. \end{aligned}$$

This share-wise masking is the default, so an affine transformation definition only contains an original block and no masked block.
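
The following sketch (ours; the toy transformation f with affine constant 0x63 is an assumption for illustration) shows this default share-wise masking, folding the affine constant into one share when the number of shares is even (i.e., d is odd), so that the output encoding recombines to f(a).

```python
def rotl1(x: int, n: int = 8) -> int:
    return ((x << 1) | (x >> (n - 1))) & ((1 << n) - 1)

def f(x: int) -> int:
    # Toy affine transformation (ours): linear part rotl1 plus constant 0x63,
    # so f(x ^ y) = f(x) ^ f(y) ^ 0x63.
    return rotl1(x) ^ 0x63

def masked_f(shares, c: int = 0x63):
    """Share-wise masking of an affine transformation: apply f to every share and,
    when the number of shares d+1 is even (i.e., d is odd), cancel the surplus
    copies of the affine constant by XOR-ing c into one share."""
    out = [f(s) for s in shares]
    if len(shares) % 2 == 0:     # d odd
        out[0] ^= c
    return out

if __name__ == "__main__":
    from functools import reduce
    from operator import xor
    shares = [0x12, 0x34, 0x56, 0x78]                 # d = 3 (odd), 4 shares
    assert f(reduce(xor, shares)) == reduce(xor, masked_f(shares))
```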

A statement is either an assignment or a function call. MSL features two types of assignments: the form \(x \leftarrow e\), defined as usual, and a random assignment, which assigns a value uniformly sampled from the domain \(\mathbb {F}\) to the variable r. As a result, r should be read as a random variable. We assume that each random variable is defined only once. We note that the actual parameters and the output are scalars if the procedure is invoked in an original block, while they are the corresponding encodings if it is invoked in a masked block.

MSL is the core language of our tool. In practice, to be more user-friendly, our tool also accepts C programs with conditional branches and loops, both of which must be statically determinizable (e.g., loops are bounded and can be unrolled; the branching of conditionals can also be fixed after loop unrolling). Furthermore, we assume there is no recursion and no dynamic memory allocation. These restrictions are sufficient for most symmetric cryptography and bitsliced implementations of public-key cryptography, which mostly have simple control-flow graphs and memory aliasing.

Problem Formalization. Fix a program \(\mathcal {P}\) with all the procedures using order-d masking. We denote by \(\mathcal {P}_{\mathfrak o}\) (resp. \(\mathcal {P}_{\mathfrak m}\)) the program \(\mathcal {P}\) where all the masked (resp. original) blocks are omitted. For each procedure f, the procedures \(f_{\mathfrak o}\) and \(f_{\mathfrak m}\) are defined accordingly.

Definition 1

Given a procedure f of \(\mathcal {P}\) with m input parameters, \(f_{\mathfrak m}\) and \(f_{\mathfrak o}\) are functional equivalent, denoted by \(f_{\mathfrak m}\cong f_{\mathfrak o}\), if the following statement holds:

$$\begin{aligned} \forall a^1,&\cdots , a^m, r_1,\cdots ,r_h \in \mathbb {F}, \forall \textbf{a}^1, \cdots , \textbf{a}^m \in \mathbb {F}^{d+1}.\\&\big ( \bigwedge \nolimits _{i \in [1,m]}\ a^i = \bigoplus \nolimits _{j \in [0,d]} {\textbf{a}}^i_j \big ) \rightarrow \big (f_{\mathfrak o}(a^1,\cdots , a^m) = \bigoplus \nolimits _{i \in [0,d]} f_{\mathfrak m}(\textbf{a}^1, \cdots , \textbf{a}^m)_i\big ) \end{aligned}$$

where \(r_1,\cdots ,r_h\) are all the random variables used in \(f_{\mathfrak m}\).

Note that although the procedure \(f_{\mathfrak m}\) is randomized (i.e., the output encoding \(f_{\mathfrak m}(\textbf{a}^1, \cdots , \textbf{a}^m)\) is technically a random variable), for functional equivalence we consider a stronger notion, viz., we require that \(f_{\mathfrak m}\) and \(f_{\mathfrak o}\) be equivalent under any values in the support of the random variables \(r_1,\cdots ,r_h\). Thus, \(r_1,\cdots ,r_h\) are universally quantified in Definition 1.

The verification problem is to check whether \(f_{\mathfrak m}\cong f_{\mathfrak o}\) for a given procedure f, where \(\bigwedge _{i \in [1,m]}\ a^i = \bigoplus _{j \in [0,d]} {\textbf{a}}^i_j\) and \(f_{\mathfrak o}(a^1,\cdots , a^m) = \bigoplus _{i \in [0,d]} f_{\mathfrak m}(\textbf{a}^1, \cdots , \textbf{a}^m)_i\) are regarded as the pre- and post-conditions, respectively. We thus assume that the unmasked procedures themselves are correct (which can be verified by, e.g., CryptoLine); our focus is on whether the masked counterparts are functionally equivalent to them.

Fig. 2. Motivating example, where \(\textbf{x}\) denotes \((x_0,x_1)\).

4 Overview of the Approach

In this section, we first present a motivating example given in Fig. 2, which computes the multiplicative inverse in \(\mathbb{G}\mathbb{F}(2^8)\) for the AES S-Box [58] using first-order Boolean masking. It consists of three affine transformation definitions and two procedure definitions. For a given input x, exp2(x) outputs \(x^2\), exp4(x) outputs \(x^4\) and exp16(x) outputs \(x^{16}\). Obviously, these three affine transformations are indeed linear.

Procedure sec_mult\(_{\mathfrak o}(a,b)\) outputs \(a\otimes b\). Its masked version sec_mult\(_{\mathfrak m}(\textbf{a},\textbf{b})\) computes the encoding \(\textbf{c}=(c_0,c_1)\) over the encodings \(\textbf{a}=(a_0,a_1)\) and \(\textbf{b}=(b_0,b_1)\). Clearly, it is desired that \(c_0\oplus c_1=a\otimes b \) if \(a_0\oplus a_1=a\) and \(b_0\oplus b_1=b\). Procedure refresh_masks\(_{\mathfrak o}(x)\) is the identity function while its masked version refresh_masks\(_{\mathfrak m}(\textbf{x})\) re-masks the encoding \(\textbf{x}\) using a random variable \(r_0\). Thus, it is desired that \(y_0\oplus y_1=x\) if \(x=x_0\oplus x_1\). Procedure sec_exp254\(_{\mathfrak o}(x)\) computes the multiplicative inverse \(x^{254}\) of x in \(\mathbb{G}\mathbb{F}(2^8)\). Its masked version sec_exp254\(_{\mathfrak m}(\textbf{x})\) computes the encoding \(\textbf{y}=(y_0,y_1)\) where refresh_masks\(_{\mathfrak m}\) is invoked to avoid power side-channel leakage. Thus, it is desired that \(y_0\oplus y_1=x^{254}\) if \(x_0\oplus x_1=x\). In summary, it is required to prove \(\texttt {sec\_mult}_{\mathfrak m}\cong \texttt {sec\_mult}_{\mathfrak o}\), \(\texttt {refresh\_masks}_{\mathfrak m}\cong \texttt {refresh\_masks}_{\mathfrak o}\) and \(\texttt {sec\_exp254}_{\mathfrak m}\cong \texttt {sec\_exp254}_{\mathfrak o}\).

4.1 Our Approach

An overview of FISCHER is shown in Fig. 3. The input program is expected to follow the syntax of MSL but written in C. Moreover, the pre- and post-conditions of the verification problem are expressed by assumption and assertion statements in the masked procedure, respectively. Recall that the input program may contain conditional branches and loops as long as they are statically determinized. Furthermore, affine transformations can use other common operations (e.g., shift, rotation and bitwise Boolean operations) besides the addition \(\oplus \) and multiplication \(\otimes \) on the underlying field \(\mathbb {F}\). FISCHER leverages the LLVM framework to obtain the LLVM intermediate representation (IR) and the call graph, where all procedure calls are inlined. It then invokes Affine Constant Computing to iteratively compute the affine constants of affine transformations according to the call graph, and Functional Equivalence Checking to check functional equivalence, both of which rely on the underpinning engines, viz., Symbolic Execution (which in this work refers to symbolic computation without path-constraint solving), Term Rewriting and SMT-based Solving.

Fig. 3. Overview of FISCHER.

We apply intra-procedural symbolic execution to compute the symbolic outputs of the procedures and transformations, i.e., expressions in terms of inputs, random variables and affine transformations. The symbolic outputs are treated as terms, based on which both functional equivalence checking and affine constant computing are solved by rewriting the terms to their normal forms (i.e., sums of monomials w.r.t. a total order). The analysis result is often conclusive from the normal forms. In case it is inconclusive, we iteratively inline affine transformations whose definitions are available until either the analysis result is conclusive or no more affine transformations can be inlined. If the analysis result is still inconclusive, to reduce false positives, we apply random testing and accurate (but computationally expensive) SMT solving to the normal forms instead of the original terms. We remark that the term rewriting system alone can prove almost all the benchmarks in our experiments.

Consider the motivating example. To find the constant \(c\in \mathbb {F}\) of exp2 such that the property \(\forall x, y\in \mathbb {F}. \texttt{exp2}(x \oplus y) = \texttt{exp2}(x) \oplus \texttt{exp2}(y) \oplus c\) holds, by applying symbolic execution, \(\texttt{exp2}(x)\) is expressed as the term \(x\otimes x\). Thus, the property is reformulated as \((x \oplus y)\otimes (x \oplus y) = (x\otimes x) \oplus (y\otimes y) \oplus c\), from which we can deduce that the desired affine constant c is equivalent to the term \(((x \oplus y)\otimes (x \oplus y)) \oplus (x\otimes x) \oplus (y\otimes y)\). Our TRS will reduce the term as follows:

$$\begin{aligned} {\begin{array}{lr} \quad \underline{((x \oplus y)\otimes (x \oplus y))} \oplus (x\otimes x) \oplus (y\otimes y) &{} \quad \text {Distributive Law}\\ =\underline{(x\otimes (x \oplus y))} \oplus \underline{(y\otimes (x \oplus y))} \oplus (x\otimes x) \oplus (y\otimes y) &{} \quad \text {Distributive Law}\\ =(x\otimes x)\oplus (x\otimes y) \oplus \underline{(y\otimes x)} \oplus (y \otimes y) \oplus (x\otimes x) \oplus (y\otimes y) &{} \quad \text {Commutative Law}\\ =(x\otimes x)\oplus (x\otimes y) \oplus (x\otimes y) \oplus (y \otimes y) \oplus \underline{(x\otimes x)} \oplus (y\otimes y) &{} \quad \text {Commutative Law}\\ =\underline{(x\otimes x)\oplus (x\otimes x)}\oplus \underline{(x\otimes y) \oplus (x\otimes y)} \oplus \underline{(y \otimes y) \oplus (y\otimes y)}=0 &{} \quad \text {Zero Law of XOR}\\ \end{array}} \end{aligned}$$

For the transformation exp4(x), by applying symbolic execution, it can be expressed as the term \(\texttt{exp2}(\texttt{exp2}(x))\). To find the constant \(c\in \mathbb {F}\) to satisfy \(\forall x, y\in \mathbb {F}. \texttt{exp4}(x \oplus y) = \texttt{exp4}(x) \oplus \texttt{exp4}(y) \oplus c\), we compute the term \(\texttt{exp2}(\texttt{exp2}(x \oplus y))\oplus \texttt{exp2}(\texttt{exp2}(x))\oplus \texttt{exp2}(\texttt{exp2}(y))\). By applying our TRS, we have:

$$\begin{aligned} {\begin{array}{l} \quad \underline{\texttt{exp2}(\texttt{exp2}(x \oplus y))}\oplus \texttt{exp2}(\texttt{exp2}(x))\oplus \texttt{exp2}(\texttt{exp2}(y)) \\ = \underline{\texttt{exp2}(\texttt{exp2}(x) \oplus \texttt{exp2}(y))}\oplus \texttt{exp2}(\texttt{exp2}(x))\oplus \texttt{exp2}(\texttt{exp2}(y))\\ = \texttt{exp2}(\texttt{exp2}(x)) \oplus \texttt{exp2}(\texttt{exp2}(y))\oplus \underline{\texttt{exp2}(\texttt{exp2}(x))}\oplus \texttt{exp2}(\texttt{exp2}(y))\\ = \underline{\texttt{exp2}(\texttt{exp2}(x)) \oplus \texttt{exp2}(\texttt{exp2}(x))}\oplus \underline{\texttt{exp2}(\texttt{exp2}(y))\oplus \texttt{exp2}(\texttt{exp2}(y))}=0\\ \end{array}} \end{aligned}$$

Clearly, the affine constant of \(\texttt{exp4}\) is 0. Similarly, we can deduce that the affine constant of the transformation exp16 is 0 as well.

To prove \(\texttt {sec\_mult}_{\mathfrak o}\cong \texttt {sec\_mult}_{\mathfrak m}\), by applying symbolic execution, we have that \(\texttt {sec\_mult}_{\mathfrak o}(a,b)=a\otimes b\) and \(\texttt {sec\_mult}_{\mathfrak m}(\textbf{a},\textbf{b})=\textbf{c}=(c_0,c_1)\), where \(c_0= (a_0\otimes b_0)\oplus r_0\) and \(c_1=(a_1\otimes b_1)\oplus (r_0\oplus (a_0\otimes b_1)\oplus (a_1\otimes b_0))\). Then, by Definition 1, it suffices to check

$$\begin{aligned}&\forall a,b,a_0,a_1,b_0,b_1,r_0 \in \mathbb {F}. \big ( a = a_0\oplus a_1 \wedge b = b_0\oplus b_1 \big )\rightarrow \\&\qquad \qquad \,\, \big (a\otimes b = ((a_0\otimes b_0)\oplus r_0)\oplus \big ((a_1\otimes b_1)\oplus (r_0\oplus (a_0\otimes b_1)\oplus (a_1\otimes b_0))\big ) \big ). \end{aligned}$$

Thus, we check the term \(\big ((a_0\oplus a_1) \otimes (b_0\oplus b_1)\big )\oplus ((a_0\otimes b_0)\oplus r_0)\oplus ((a_1\otimes b_1)\oplus (r_0\oplus (a_0\otimes b_1)\oplus (a_1\otimes b_0)))\) which is equivalent to 0 iff \(\texttt {sec\_mult}_{\mathfrak o}\cong \texttt {sec\_mult}_{\mathfrak m}\). Our TRS is able to reduce the term to 0. Similarly, we represent the outputs of \(\texttt {sec\_exp254}_{\mathfrak o}\) and \(\texttt {sec\_exp254}_{\mathfrak m}\) as terms via symbolic execution, from which the statement \(\texttt {sec\_exp254}_{\mathfrak o}\cong \texttt {sec\_exp254}_{\mathfrak m}\) is also encoded as a term, which can be reduced to 0 via our TRS without inlining any transformations.
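
The same equivalence term can also be evaluated by random testing over concrete field elements, which is what FISCHER's testing fallback does on normal forms. The sketch below (our own code, reusing a GF(2^8) multiplication as in the sketch of Sect. 2) evaluates the term for the first-order sec_mult of Fig. 2 and expects 0 for every assignment.

```python
import secrets

AES_POLY = 0x11B

def gf_mul(a, b, poly=AES_POLY, n=8):
    prod = 0
    for i in range(n):
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(2 * n - 2, n - 1, -1):
        if (prod >> i) & 1:
            prod ^= poly << (i - n)
    return prod

def equiv_term(a0, a1, b0, b1, r0):
    # ((a0 ^ a1) * (b0 ^ b1)) ^ c0 ^ c1 for the first-order sec_mult of Fig. 2,
    # where c0 = a0*b0 ^ r0 and c1 = a1*b1 ^ (r0 ^ a0*b1 ^ a1*b0).
    c0 = gf_mul(a0, b0) ^ r0
    c1 = gf_mul(a1, b1) ^ (r0 ^ gf_mul(a0, b1) ^ gf_mul(a1, b0))
    return gf_mul(a0 ^ a1, b0 ^ b1) ^ c0 ^ c1

if __name__ == "__main__":
    # Random testing: the term must evaluate to 0 for every assignment.
    for _ in range(10_000):
        args = [secrets.randbelow(256) for _ in range(5)]
        assert equiv_term(*args) == 0
```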

5 Term Rewriting System

In this section, we first introduce some basic notations and then present our term rewriting system.

Definition 2

Given a program \(\mathcal {P}\) over \(\mathbb {F}\), a signature \(\varSigma _\mathcal {P}\) of \(\mathcal {P}\) is a set of symbols \(\mathbb {F}\cup \{\oplus , \otimes , f_1, \ldots , f_t\}\), where \(s\in \mathbb {F}\) with arity 0 are all the constants in \(\mathbb {F}\), \(\oplus \) and \(\otimes \) with arity 2 are addition and multiplication operators on \(\mathbb {F}\), and \(f_1,\cdots , f_t\) with arity 1 are affine transformations defined/declared in \(\mathcal {P}\).

For example, the signature of the motivating example is \(\mathbb {F}\cup \{\oplus , \otimes , \texttt{exp2},\texttt{exp4}, \texttt{exp16}\}\). When it is clear from the context, the subscript \(\mathcal {P}\) is dropped from \(\varSigma _\mathcal {P}\).

Definition 3

Let V be a set of variables (assuming \(\varSigma \cap V=\emptyset \)), the set \(T[\varSigma ,V]\) of \(\varSigma \)-terms over V is inductively defined as follows:

  • \(\mathbb {F}\subseteq T[\varSigma ,V]\) and \(V\subseteq T[\varSigma ,V]\) (i.e., every variable/constant is a \(\varSigma \)-term);

  • \(\tau \oplus \tau '\in T[\varSigma ,V]\) and \(\tau \otimes \tau '\in T[\varSigma ,V]\) if \(\tau ,\tau '\in T[\varSigma ,V]\) (i.e., application of addition and multiplication operators to \(\varSigma \)-terms yield \(\varSigma \)-terms);

  • \(f_j(\tau )\in T[\varSigma ,V]\) if \(\tau \in T[\varSigma ,V]\) and \(j\in [1,t]\) (i.e., application of affine transformations to \(\varSigma \)-terms yield \(\varSigma \)-terms).

We denote by \(T_{\backslash \oplus }(\varSigma ,V)\) the set of \(\varSigma \)-terms that do not use the operator \(\oplus \).

A \(\varSigma \)-term \(\tau \in T[\varSigma ,V]\) is called a factor if \(\tau \in \mathbb {F}\cup V\) or \(\tau =f_i(\tau ')\) for some \(i\in [1,t]\) such that \(\tau '\in T_{\backslash \oplus }(\varSigma ,V)\). A monomial is a product \(\alpha _1\otimes \cdots \otimes \alpha _k\) of non-zero factors for \(k\ge 1\). We denote by \(M[\varSigma ,V]\) the set of monomials. For instance, consider variables \(x,y\in V\) and affine transformations \(f_1,f_2\in \varSigma \). All of \(f_1(f_2(x))\otimes f_1(y)\), \(f_1(2\otimes f_2(4\otimes x))\), \(f_1(x\oplus y)\) and \(f_1(f_2(x)) \oplus f_1(x)\) are \(\varSigma \)-terms; both \(f_1(f_2(x))\otimes f_1(y)\) and \(f_1(2\otimes f_2(4\otimes x))\) are monomials, while neither \(f_1(x\oplus y)\) nor \(f_1(f_2(x)) \oplus f_1(x)\) is a monomial. For the sake of presentation, \(\varSigma \)-terms will simply be written as terms, and the operator \(\otimes \) may be omitted, e.g., \(\tau _1\tau _2\) denotes \(\tau _1\otimes \tau _2\), and \(\tau ^2\) denotes \(\tau \otimes \tau \).

Definition 4

A polynomial is a sum \(\bigoplus _{i\in [1,t]} m_i\) of monomials \(m_1 \ldots m_t\in M[\varSigma , V]\). We use \(P[\varSigma ,V]\) to denote the set of polynomials.

To simplify and normalize polynomials, we impose a total order on monomials and their factors.

Definition 5

Fix an arbitrary total order \(\ge _s\) on \(V\uplus \varSigma \).

For two factors \(\alpha \) and \(\alpha '\), the factor order \(\ge _l\) is defined such that \(\alpha \ge _l \alpha '\) if one of the following conditions holds:

  • \(\alpha ,\alpha '\in \mathbb {F}\cup V\) and \(\alpha \ge _s \alpha '\);

  • \(\alpha =f(\tau )\) and \(\alpha '=f'(\tau ')\) such that \(f \ge _s f'\) or (\(f = f'\) and \(\tau \ge _p \tau '\));

  • \(\alpha =f(\tau )\) such that \(f \ge _s \alpha '\) or \(\alpha '=f(\tau )\) such that \(\alpha \ge _s f\).

Given a monomial \(m=\alpha _1\cdots \alpha _k\), we write \(\textsf{sort}_{\ge _l}(\alpha _1,\cdots , \alpha _k)\) for the monomial which includes \(\alpha _1, \cdots , \alpha _k\) as factors, but sorts them in descending order.

Given two monomials \(m=\alpha _1\cdots \alpha _k\) and \(m'=\alpha _1'\cdots \alpha _{k'}'\), the monomial order \(\ge _p\) is defined as the lexicographical order between \(\textsf{sort}_{\ge _l}(\alpha _1,\cdots , \alpha _k)\) and \(\textsf{sort}_{\ge _l}(\alpha _1',\cdots , \alpha _{k'}')\).

Intuitively, the factor order \(\ge _l\) follows the given order \(\ge _s\) on \(V\uplus \varSigma \), where the factor order between two factors with the same affine transformation f is determined by their parameters. We note that if \(\textsf{sort}_{\ge _l}(\alpha _1',\cdots , \alpha _{k'}')\) is a prefix of \(\textsf{sort}_{\ge _l}(\alpha _1,\cdots , \alpha _k)\), we have: \(\alpha _1\cdots \alpha _k \ge _p \alpha _1'\cdots \alpha _{k'}'\). Furthermore, if \(\alpha _1\cdots \alpha _k \ge _p \alpha _1'\cdots \alpha _{k'}'\) and \(\alpha _1'\cdots \alpha _{k'}' \ge _p \alpha _1\cdots \alpha _k\), then \(\textsf{sort}_{\ge _l}(\alpha _1',\cdots , \alpha _{k'}')=\textsf{sort}_{\ge _l}(\alpha _1,\cdots , \alpha _k)\). We denote by \(\alpha _1\cdots \alpha _k >_p \alpha _1'\cdots \alpha _{k'}'\) if \(\alpha _1\cdots \alpha _k \ge _p \alpha _1'\cdots \alpha _{k'}'\) but \(\textsf{sort}_{\ge _l}(\alpha _1',\cdots , \alpha _{k'}')\ne \textsf{sort}_{\ge _l}(\alpha _1,\cdots , \alpha _k)\).
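
As a toy illustration of Definition 5 (our own simplification, which ignores affine-transformation factors and uses plain string names with Python's built-in order as \(\ge_s\)), monomials can be compared by the lexicographic order of their descending-sorted factor tuples.

```python
# Factors are plain variable names, a monomial is a tuple of factors, and monomials
# are compared by the lexicographic order of their descending-sorted factor tuples.

def sort_factors(monomial):
    return tuple(sorted(monomial, reverse=True))   # descending w.r.t. the base order >=_s

def ge_p(m1, m2):
    """m1 >=_p m2: lexicographic comparison of the sorted factor tuples."""
    return sort_factors(m1) >= sort_factors(m2)

if __name__ == "__main__":
    x_y = ("x", "y")          # the monomial x*y
    y_x = ("y", "x")          # the same monomial written as y*x
    y_y = ("y", "y")          # the monomial y^2
    assert ge_p(x_y, y_x) and ge_p(y_x, x_y)   # equal after sorting
    assert ge_p(y_y, x_y)                      # y*y >=_p x*y since ('y','y') >= ('y','x')
```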

Proposition 1

The monomial order \(\ge _p\) is a total order on monomials.

Definition 6

Given a program \(\mathcal {P}\), we define the corresponding term rewriting system (TRS) \(\mathcal {R}\) as a tuple \((\varSigma , V, \ge _s, \varDelta )\), where \(\varSigma \) is a signature of \(\mathcal {P}\), V is a set of variables of \(\mathcal {P}\) (assuming \(\varSigma \cap V=\emptyset \)), \(\ge _s\) is a total order on \(V\uplus \varSigma \), and \(\varDelta \) is the set of term rewriting rules given below:

[Rewriting rules R1–R13 of the TRS]

where \(m_1,m_1',\cdots ,m_k,m_k'\in M[\varSigma , V]\), \(\alpha _1,\alpha _2,\alpha _3\) are factors, \(\tau ,\tau _1,\tau _2\in T[\varSigma ,V]\) are terms, \(f\in \varSigma \) is an affine transformation with affine constant c.

Intuitively, rules R1 and R2 specify the commutativity of \(\oplus \) and \(\otimes \), respectively, by which monomials and factors are sorted according to the orders \(\ge _{p}\) and \(\ge _{l}\), respectively. Rule R3 specifies that \(\oplus \) is essentially bitwise XOR. Rules R4 and R5 specify that 0 is the multiplicative zero. Rules R6 and R7 (resp. R8 and R9) specify that 0 (resp. 1) is additive (resp. multiplicative) identity. Rules R10 and R11 express the distributivity of \(\otimes \) over \(\oplus \). Rule R12 expresses the affine property of an affine transformation while rule R13 is an instance of rule R12 via rules R3 and R5.

Given a TRS \(\mathcal {R}=(\varSigma , V, \ge _s, \varDelta )\) for a given program \(\mathcal {P}\), a term \(\tau \in T[\varSigma ,V]\) can be rewritten to a term \(\tau '\), denoted by \(\tau \Rightarrow \tau '\), if there is a rewriting rule \(\tau _1\mapsto \tau _2\) such that \(\tau '\) is obtained from \(\tau \) by replacing an occurrence of the sub-term \(\tau _1\) with the sub-term \(\tau _2\). A term is in normal form if no rewriting rule can be applied to it. A TRS is terminating if every term can be rewritten to a normal form after finitely many rewriting steps. We write \(\tau \Rrightarrow \tau '\) when \(\tau '\) is a normal form of \(\tau \).

We show that any TRS \(\mathcal {R}\) associated with a program \(\mathcal {P}\) is terminating, and that any term will be rewritten to a normal form that is a polynomial, independent of the order in which the rules are applied.

Lemma 1

For every normal form \(\tau \in T[\varSigma ,V]\) of the TRS \(\mathcal {R}\), the term \(\tau \) must be a polynomial \(m_1\oplus \cdots \oplus m_k\) such that (1) \(\forall i\in [1,k-1]\), \(m_i>_p m_{i+1}\), and (2) for every monomial \(m_i=\alpha _1 \cdots \alpha _h\) and \(\forall j\in [1,h-1]\), \(\alpha _j \ge _l \alpha _{j+1}\).

Proof

Consider a normal form \(\tau \in T[\varSigma ,V]\). If \(\tau \) is not a polynomial, then there must exist some monomial \(m_i\) in which the addition operator \(\oplus \) is used. This means that either rule R10 or R11 is applicable to the term \(\tau \), which contradicts the fact that \(\tau \) is a normal form.

Suppose \(\tau \) is the polynomial \(m_1\oplus \cdots \oplus m_k\).

  • If there exists \(i:1\le i< k\) such that \(m_i>_p m_{i+1}\) does not hold, then either \(m_i=m_{i+1}\) or \(m_{i+1}>_p m_i\). If \(m_i= m_{i+1}\), then rule R3 is applicable to the term \(\tau \). If \(m_{i+1}>_p m_i\), then rule R1 is applicable to the term \(\tau \). Thus, for every \(1\le i< k\), \(m_i>_p m_{i+1}\).

  • If there exist a monomial \(m_i=\alpha _1 \cdots \alpha _h\) and \(j:1\le j< h\) such that \(\alpha _j \ge _l \alpha _{j+1}\) does not hold, then \(\alpha _{j+1} >_l \alpha _j\). This means that rule R2 is applicable to the term \(\tau \). Thus, for every monomial \(m_i=\alpha _1 \cdots \alpha _h\) and every \(j:1\le j< h\), \(\alpha _j \ge _l \alpha _{j+1}\).   \(\square \)

Lemma 2

The TRS \(\mathcal {R}=(\varSigma , V, \ge _s, \varDelta )\) of a given program \(\mathcal {P}\) is terminating.

Proof

Consider a term \(\tau \in T[\varSigma ,V]\). Let \(\pi =\tau _1\Rightarrow \tau _2\Rightarrow \tau _3\Rightarrow \cdots \Rightarrow \tau _i\Rightarrow \cdots \) be a reduction of the term \(\tau \) by applying rewriting rules, i.e., \(\tau =\tau _1\). We prove that the reduction \(\pi \) is finite by showing that each rewriting rule can only be applied finitely often.

First, since rules R1 and R2 only sort the monomials and factors, respectively, and sorting always terminates with any classic sorting algorithm (e.g., quicksort), rules R1 and R2 can only be applied consecutively finitely often for each term \(\tau _i\), due to the premises \(\texttt{sort}_{\ge _p} (m_1, \cdots , m_k)\ne (m_1, \cdots , m_k)\) and \(\texttt{sort}_{\ge _l} (\alpha _1, \cdots , \alpha _k)\ne (\alpha _1, \cdots , \alpha _k)\) in rules R1 and R2, respectively.

Second, rules R10, R11 and R12 can only be applied finitely often in the reduction \(\pi \), as each application of one of them to a term \(\tau _i\) pushes the addition operator \(\oplus \) toward the root of the syntax tree, while the other rules either eliminate or reorder the addition operator \(\oplus \).


Lastly, rules R3–R9 and R13 can only be applied finitely often in the reduction \(\pi \), as each application of one of them to a term \(\tau _i\) strictly reduces the size of the term, while the rules R10–R12 that increase the size of the term can only be applied finitely often.

Hence, the reduction \(\pi \) is finite indicating that the TRS \(\mathcal {R}\) is terminating.    \(\square \)

By Lemmas 1 and 2, any term \(\tau \in T[\varSigma ,V]\) can be rewritten to a normal form that must be a polynomial.

Theorem 1

Let \(\mathcal {R}=(\varSigma , V, \ge _s, \varDelta )\) be the TRS of a program \(\mathcal {P}\). For any term \(\tau \in T[\varSigma ,V]\), a polynomial \(\tau '\in T[\varSigma ,V]\) can be computed such that \(\tau \Rrightarrow \tau '\).

Remark 1

Besides termination, confluence is another important property of a TRS: a TRS is confluent if, whenever a term \(\tau \in T[\varSigma ,V]\) can be rewritten to two distinct terms \(\tau _1\) and \(\tau _2\), the terms \(\tau _1\) and \(\tau _2\) can be further rewritten to a common term. While we conjecture that the TRS \(\mathcal {R}\) associated with a given program is indeed confluent, which could be shown via local confluence [51], we do not strive to prove confluence here, as it is irrelevant to the problem considered in this work.

6 Algorithmic Verification

In this section, we first present an algorithm for computing normal forms, then show how to compute the affine constant for an affine transformation, and finally propose an algorithm for solving the verification problem.

6.1 Term Normalization Algorithm

We provide the function TermNorm (cf. Algorithm 1) which applies the rewriting rules in a particular order aiming for better efficiency. Fix a TRS \(\mathcal {R}=(\varSigma , V, \ge _s, \varDelta )\), a term \(\tau \in T[\varSigma ,V]\) and a mapping \(\lambda \) that provides required affine constants \(\lambda (f)\). TermNorm\((\mathcal {R},\tau ,\lambda )\) returns a normal form \(\tau '\) of \(\tau \), i.e., \(\tau \Rrightarrow \tau '\).

[Algorithm 1: TermNorm]

TermNorm first applies rules R3–R13 to rewrite the term \(\tau \) (line 2), resulting in a polynomial which has neither 0 as a factor or monomial (due to rules R4–R7), nor 1 as a factor of a monomial unless the monomial itself is 1 (due to rules R8 and R9). Next, it recursively sorts all the factors and monomials involved in the polynomial from the innermost sub-terms outward (lines 3 and 4). Sorting factors and monomials places identical monomials at adjacent positions. Finally, rules R3 and R6–R7 are applied to further simplify the polynomial (line 5), where consecutive syntactically equivalent monomials are rewritten to 0 by rule R3, which may in turn enable rules R6–R7. Obviously, the final term \(\tau '\) is a normal form of the input \(\tau \), although its size may be exponential in that of \(\tau \).
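
The final cancellation steps of TermNorm can be pictured with the following much-simplified sketch (ours): after distribution, a polynomial is a list of monomials, each monomial is canonicalized by sorting its factors, and monomials occurring an even number of times cancel. Applied to the distributed form of the exp2 example from Sect. 4, it yields the normal form 0.

```python
from collections import Counter

def normalize(monomials):
    """Simplified sketch of the final steps of TermNorm: canonicalize each monomial by
    sorting its factors (rules R1/R2), then cancel monomials occurring an even number
    of times, since x XOR x = 0 (rule R3)."""
    canon = [tuple(sorted(m, reverse=True)) for m in monomials]
    counts = Counter(canon)
    survivors = [m for m, k in counts.items() if k % 2 == 1]
    return sorted(survivors, reverse=True)    # normal form: monomials in descending order

if __name__ == "__main__":
    # ((x + y) * (x + y)) + x*x + y*y, already distributed into monomials:
    poly = [("x", "x"), ("x", "y"), ("y", "x"), ("y", "y"), ("x", "x"), ("y", "y")]
    assert normalize(poly) == []              # the empty polynomial, i.e., the normal form 0
```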

Lemma 3

TermNorm\((\mathcal {R},\tau ,\lambda )\) returns a normal form \(\tau '\) of \(\tau \).   \(\square \)

6.2 Computing Affine Constants

The function AffConst in Algorithm 2 computes the associated affine constant for each affine transformation f. It first sorts all affine transformations in a topological order based on the call graph G and processes them in that order (lines 2–21). If f is only declared in \(\mathcal {P}\), as mentioned previously, it is assumed to be linear, so 0 is assigned to \(\lambda (f)\) (line 4). Otherwise, AffConst extracts the input x of f and computes its output \(\xi (x)\) via symbolic execution (line 7), where \(\xi (x)\) is treated as f(x). We remark that during symbolic execution we adopt a lazy strategy for inlining the affine transformations invoked in f, in order to reduce the size of \(\xi (x)\); thus \(\xi (x)\) may still contain affine transformations.

Recall that c is the affine constant of f iff \(\forall x, y\in \mathbb {F}. f(x \oplus y) = f(x) \oplus f(y) \oplus c\) holds. Thus, we create the term \(\tau =\xi (x)[x\mapsto x\oplus y]\oplus \xi (x) \oplus \xi (x)[x\mapsto y]\) (line 7), where \(e[a\mapsto b]\) denotes the substitution of a with b in e. Obviously, the term \(\tau \) is equivalent to some constant c iff c is the affine constant of f.

The while-loop (lines 9–21) evaluates \(\tau \). First, it rewrites \(\tau \) to a normal form (line 10) by invoking TermNorm from Algorithm 1. If the normal form is some constant c, then c is the affine constant of f. Otherwise, AffConst repeatedly inlines each affine transformation g that is defined in \(\mathcal {P}\) but has not yet been inlined in \(\tau \) (lines 13 and 14) and rewrites the term \(\tau \) to a normal form, until either the normal form is some constant c or no affine transformation can be inlined. If the normal form is still not a constant, \(\tau \) is evaluated using random input values. Clearly, if \(\tau \) evaluates to two distinct values (line 18), f is not affine. Otherwise, we check the satisfiability of the constraint \(\forall x,y.\ \tau =c\) via an SMT solver over the bitvector theory (line 19), where declared but undefined affine transformations are treated as uninterpreted functions endowed with their affine properties. If \(\forall x, y.\ \tau =c\) is satisfiable, we extract the affine constant c from its model (line 20). Otherwise, we emit an error and abort (line 21), indicating that the affine constant of f cannot be computed. Since the satisfiability problem modulo the bitvector theory is decidable, we can conclude that f is not affine if \(\forall x.\forall y.\ \tau =c\) is unsatisfiable and no uninterpreted function is involved in \(\tau \).
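
The random-testing step (lines 17–18 of Algorithm 2) can be sketched as follows; this is our own illustration with hypothetical names, using a toy affine map rather than the operations of \(\mathbb {F}\), and a candidate constant found this way would still have to be confirmed by term rewriting or SMT solving.

```python
import secrets

def affine_constant_by_testing(f, trials=1000, nbits=8):
    """Probe tau = f(x ^ y) ^ f(x) ^ f(y) on random inputs. If two evaluations
    disagree, f is certainly not affine; otherwise the common value is only a
    candidate affine constant."""
    c = None
    for _ in range(trials):
        x, y = secrets.randbits(nbits), secrets.randbits(nbits)
        tau = f(x ^ y) ^ f(x) ^ f(y)
        if c is None:
            c = tau
        elif tau != c:
            return None          # witness of non-affineness found
    return c

if __name__ == "__main__":
    rotl1 = lambda x: ((x << 1) | (x >> 7)) & 0xFF      # linear w.r.t. XOR
    af = lambda x: rotl1(x) ^ 0x63                      # toy affine map, constant 0x63
    assert affine_constant_by_testing(af) == 0x63
```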

Lemma 4

Assume an affine transformation f in \(\mathcal {P}\). If AffConst\((\mathcal {P},\mathcal {R},G)\) in Algorithm 2 returns a mapping \(\lambda \), then \(\lambda (f)\) is the affine constant of f.   \(\square \)

6.3 Verification Algorithm

The verification problem is solved by the function Verifier\((\mathcal {P})\) in Algorithm 3, which checks whether \(f_{\mathfrak m}\cong f_{\mathfrak o}\) for each procedure f defined in \(\mathcal {P}\). It first preprocesses the given program \(\mathcal {P}\) by inlining all the procedures, unrolling all the loops and eliminating all the branches (line 2). Then, it computes the corresponding TRS \(\mathcal {R}\), the call graph G and the affine constants as the mapping \(\lambda \) (line 3). Next, it iteratively checks whether \(f_{\mathfrak m}\cong f_{\mathfrak o}\) for each procedure f defined in \(\mathcal {P}\) (lines 4–23).

For each procedure f, it first extracts the inputs \(a^1, \cdots , a^m\) of \(f_{\mathfrak o}\) that are scalar variables (line 5) and input encodings \(\textbf{a}^1, \cdots , \textbf{a}^m\) of \(f_{\mathfrak m}\) that are vectors of variables (line 6). Then, it computes the output \(\xi (a^1, \cdots , a^m)\) of \(f_{\mathfrak o}\) via symbolic execution, which yields an expression in terms of \(a^1, \cdots , a^m\) and affine transformations (line 7). Similarly, it computes the output \(\mathbf{\xi '}(\textbf{a}^1, \cdots , \textbf{a}^m)\) of \(f_{\mathfrak m}\) via symbolic execution, i.e., a tuple of expressions in terms of the entries of the input encodings \(\textbf{a}^1, \cdots , \textbf{a}^m\), random variables and affine transformations (line 8).

Recall that \(f_{\mathfrak m}\cong f_{\mathfrak o}\) iff for all \(a^1, \cdots , a^m, r_1,\cdots ,r_h \in \mathbb {F}\) and for all \(\textbf{a}^1, \cdots , \textbf{a}^m \in \mathbb {F}^{d+1}\), the following constraint holds (cf. Definition 1):

$$\begin{aligned} \big ( \bigwedge \nolimits _{i \in [1,m]}\ a^i = \bigoplus \nolimits _{j \in [0,d]} {\textbf{a}}^i_j \big ) \rightarrow \big (f_{\mathfrak o}(a^1,\cdots , a^m) = \bigoplus \nolimits _{i \in [0,d]} f_{\mathfrak m}(\textbf{a}^1, \cdots , \textbf{a}^m)_i\big ) \end{aligned}$$

where \(r_1,\cdots ,r_h\) are all the random variables used in \(f_{\mathfrak m}\). Thus, it creates the term \(\tau =\xi (a^1, \cdots , a^m)[a^1\mapsto \bigoplus \textbf{a}^1,\cdots , a^m\mapsto \bigoplus \textbf{a}^m]\oplus \bigoplus \mathbf{\xi '}(\textbf{a}^1, \cdots , \textbf{a}^m)\) (line 9), where \(a^i\mapsto \bigoplus \textbf{a}^i\) is the substitution of \(a^i\) with the term \(\bigoplus \textbf{a}^i\) in the expression \(\xi (a^1, \cdots , a^m)\). Obviously, \(\tau \) is equivalent to 0 iff \(f_{\mathfrak m}\cong f_{\mathfrak o}\).

[Algorithm 3: Verifier]

To check whether \(\tau \) is equivalent to 0, similarly to the computation of affine constants in Algorithm 2, the algorithm repeatedly rewrites the term \(\tau \) to a normal form by invoking TermNorm from Algorithm 1, until either a conclusion is drawn or no affine transformation can be inlined (lines 10–23). We declare that f is correct if the normal form is 0 (line 13) and incorrect if it is a non-zero constant (line 14). If the normal form is not a constant, we repeatedly inline each affine transformation g defined in \(\mathcal {P}\) that has not yet been inlined in \(\tau \) and re-check the term \(\tau \).

If there is no definite answer after inlining all the affine transformations, \(\tau \) is evaluated using random input values. f is incorrect if \(\tau \) evaluates to a non-zero value (line 20). Otherwise, we check the satisfiability of the constraint \(\tau \ne 0\) via an SMT solver over the bitvector theory (line 21). If \(\tau \ne 0\) is unsatisfiable, then f is correct. Otherwise, we can conclude that f is incorrect if no uninterpreted function is involved in \(\tau \); in the remaining cases the result is inconclusive.

Theorem 2

Assume a procedure f in \(\mathcal {P}\). If Verifier\((\mathcal {P})\) emits “f is correct”, then \(f_{\mathfrak m}\cong f_{\mathfrak o}\); if Verifier\((\mathcal {P})\) emits “f is incorrect” or “f may be incorrect” with no uninterpreted function involved in its final term \(\tau \), then \(f_{\mathfrak m}\not \cong f_{\mathfrak o}\).    \(\square \)

6.4 Implementation Remarks

To implement the algorithms, we use the total order \(\ge _s\) on \(V\uplus \varSigma \) in which all constants are smaller than all variables, which are in turn smaller than all affine transformations. The order on constants is the standard order on integers, and the orders on variables and on affine transformations are the lexicographic order on their names.

In terms of data structures, each term is primarily stored as a directed acyclic graph, allowing us to represent and rewrite common sub-terms in an optimised way. Once a (sub-)term becomes a polynomial during term rewriting, it is stored as a sorted nested list w.r.t. the monomial order \(\ge _p\), where each monomial is itself stored as a sorted list w.r.t. the factor order \(\ge _l\). Moreover, a factor of the form \(\alpha ^k\) in a monomial is stored as a pair \((\alpha ,k)\).

We also adopt two strategies: (i) By Fermat's little theorem [63], \(x^{2^n - 1} = 1\) for any non-zero \(x\in \mathbb{G}\mathbb{F}(2^n)\). Hence each exponent k in a pair \((\alpha ,k)\) can be reduced to \(k \mod (2^n-1)\). (ii) By rule R12, a term \(f(\tau _1\oplus \cdots \oplus \tau _k)\) can be directly rewritten to \(f(\tau _1)\oplus \cdots \oplus f(\tau _k)\) if k is odd, and to \(f(\tau _1)\oplus \cdots \oplus f(\tau _k)\oplus c\) if k is even, where c is the affine constant associated with the affine transformation f.
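
A quick sanity check of strategy (i) over \(\mathbb{G}\mathbb{F}(2^8)\) (our own sketch; gf_mul is the multiplication routine assumed in the earlier sketch of Sect. 2) verifies the exponent-reduction identity on the multiplicative group.

```python
# Exponent reduction in GF(2^8): x^255 = 1 for every non-zero x, so an exponent k
# in a stored factor (alpha, k) can be reduced modulo 255 (keeping k >= 1).
AES_POLY = 0x11B

def gf_mul(a, b, poly=AES_POLY, n=8):
    prod = 0
    for i in range(n):
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(2 * n - 2, n - 1, -1):
        if (prod >> i) & 1:
            prod ^= poly << (i - n)
    return prod

def gf_pow(x, k):
    acc = 1
    for _ in range(k):
        acc = gf_mul(acc, x)
    return acc

if __name__ == "__main__":
    assert all(gf_pow(x, 255) == 1 for x in range(1, 256))
    assert all(gf_pow(x, 300) == gf_pow(x, 300 % 255) for x in range(1, 256))
```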

7 Evaluation

We implement our approach as a tool FISCHER for verifying masked programs in LLVM IR, based on the LLVM framework. We first evaluate FISCHER for computing affine constants (i.e., Algorithm 2), correctness verification, and scalability w.r.t. the masking order (i.e., Algorithm 3) on benchmarks using the ISW scheme. To show the generality of our approach, FISCHER is then used to verify benchmarks using glitch-resistant Boolean masking schemes and lattice-based public-key cryptography. All experiments are conducted on a machine with Linux kernel 5.10, Intel i7 10700 CPU (4.8 GHz, 8 cores, 16 threads) and 40 GB memory. Milliseconds (ms) and seconds (s) are used as the time units in our experiments.

7.1 Evaluation for Computing Affine Constants

To evaluate Algorithm 2, we compare it with a pure SMT-based approach which directly checks \(\exists c.\forall x, y\in \mathbb {F}.\ f(x \oplus y) = f(x) \oplus f(y) \oplus c\) using Z3 [47], CVC5 [5] and Boolector [18], implementing \(\oplus \) and \(\otimes \) in bit-vector theory, where \(\otimes \) is realized via the Russian peasant method [16]. Technically, SMT solvers only deal with satisfiability, but they can usually eliminate the universal quantifiers in this case, as x and y range over a finite field. In particular, in our experiments Z3 is configured with default (i.e., (check-sat)), simplify (i.e., (check-sat-using (then simplify smt))) and bit-blast (i.e., (check-sat-using (then bit-blast smt))), denoted by Z3-d, Z3-s and Z3-b, respectively. We focus on the following functions: \(\texttt {exp}i(x)=x^i\) for \(i\in \{2,4,8,16\}\); \(\texttt {rotl}i(x)\) for \(i\in \{1,2,3,4\}\), which left-rotates x by i bits; \(\texttt {af}(x) = \texttt {rotl1}(x) \oplus \texttt {rotl2}(x) \oplus \texttt {rotl3}(x) \oplus \texttt {rotl4}(x) \oplus 99\) used in the AES S-Box; \(\texttt {L1}(x)=7x^2\oplus 14x^4\oplus 7x^8\), \(\texttt {L3}(x)=7x\oplus 12x^2\oplus 12x^4\oplus 9x^8\), \(\texttt {L5}(x)=10x\oplus 9x^2\) and \(\texttt {L7}(x)=4x\oplus 13x^2\oplus 13x^4\oplus 14x^8\) used in the PRESENT S-Box over \(\mathbb{G}\mathbb{F}(16)=\mathbb{G}\mathbb{F}(2)[X]/(X^4+X+1)\) [14, 19]; and \(\texttt {f1}(x)=x^3\), \(\texttt {f2}(x)=x^2\oplus x\oplus 1\), \(\texttt {f3}(x)=x\oplus x^5\) and \(\texttt {f4}(x)=\texttt {af}(\texttt {exp2}(x))\) over \(\mathbb{G}\mathbb{F}(2^8)\).

Table 1. Results of computing affine constants, where \(\dag \) means that Algorithm 2 needs SMT solving, \(\ddag \) means that affineness is disproved via testing, a dedicated entry marks nonaffineness, and Algorithm 2+B means Algorithm 2 combined with Boolector.

The results are reported in Table 1, where the 2nd–8th rows show the execution time and the last row shows the affine constants when they exist; otherwise the entry indicates that the transformation is not affine. We observe that Algorithm 2 significantly outperforms the SMT-based approach in most cases for all the SMT solvers, except for rotl i and af (which is not surprising, as they use operations other than \(\oplus \) and \(\otimes \), so SMT solving is required). The term rewriting system alone is often able to compute affine constants (e.g., for \(\texttt {exp}i\) and \(\texttt {L}i\)), and SMT solving is required only for computing the affine constants of \(\texttt {rotl}i\). By comparing the results of Algorithm 2+Z3-b vs. Z3-b and Algorithm 2+B vs. Boolector on af, we observe that term rewriting is essential, as checking the normal form instead of the original constraint reduces the cost of SMT solving.

7.2 Evaluation for Correctness Verification

To evaluate Algorithm 3, we compare it with a pure SMT-based approach using the SMT solvers Z3, CVC5 and Boolector. We also consider several promising general-purpose software verifiers, namely SMACK (with the Boogie and Corral engines), SeaHorn, CPAChecker and Symbiotic, and one cryptography-specific verifier, CryptoLine (with SMT and CAS solvers), where the verification problem is expressed using assumption and assertion statements. Those verifiers are configured in two ways: (1) with the settings recommended in their manuals/papers or used in competitions, and (2) by trying different configurations and selecting the best one. Specifically:

  • CryptoLine (commit 7e237a9). Both solvers SMT and CAS are used;

  • SMACK v2.8.0. integer-encoding: bit-vector, verifier: corral/boogie (both used), solver: Z3/CVC4 (Z3 used), static-unroll: on, unroll: 99;

  • SeaHorn v0.1.0 RC3 (commit e712712). pipeline: bpf, arch: m64, inline: on, track: mem, bmc: none/mono/path (mono used), crab: on/off (off used);

  • CPAChecker v2.1.1. default.properties with cbmc: on/off (on used);

  • Symbiotic v8.0.0. officially-provided SV-COMP configuration with exit-on-error: on.

The benchmark comprises five different masked programs sec_mult for finite-field multiplication over \(\mathbb{G}\mathbb{F}(2^8)\) with varying masking order \(d=0,1,2,3\), where \(d=0\) means the program is unmasked. We note that sec_mult from [8] is only available for masking orders \(d\ge 2\).

Table 2. Results on various sec_mult, where T.O. means time out (20 min), N/A means an UNKNOWN result, and \(\natural \) means that the verification result is incorrect.

The results are shown in Table 2. We can observe that FISCHER is significantly more efficient than the others and is able to prove all the cases using our term rewriting system alone (i.e., without random testing or SMT solving). With the increase of the masking order d, almost all the other tools fail. Both CryptoLine (with the CAS solver) and CPAChecker fail to verify any of the cases due to the non-linear operations involved in sec_mult. SMACK with the Corral engine produces two false positives (marked by \(\natural \) in Table 2). These results suggest that dedicated verification approaches are required for proving the correctness of masked programs.

7.3 Scalability of FISCHER

To evaluate the scalability of FISCHER, we verify different versions of sec_mult and masked procedures sec_aes_sbox (resp. sec_present_sbox) of S-Boxes used in AES [58] (resp. PRESENT [19]) with varying masking order d. Since it is known that refresh_masks in [58] is vulnerable when \(d\ge 4\) [24], a fixed version RefreshM [7] is used in all the S-Boxes (except that when sec_mult is taken from [8] its own version is used). We note that sec_present_sbox uses the affine transformations L1, L3, L5, L7, exp2 and exp4, while sec_aes_sbox uses the affine transformations af, exp2, exp4 and exp16.

The results are reported in Table 3. All these benchmarks are proved using our term rewriting system alone, except for the three incorrect ones marked by \(\natural \). FISCHER scales up to masking order 100 or even 200 for sec_mult, which is remarkable. FISCHER also scales up to masking order 30 or even 40 for sec_present_sbox. However, it is less scalable on sec_aes_sbox, as that procedure computes the multiplicative inverse \(x^{254}\) on shares, so the size of the term encoding the equivalence problem explodes as the masking order increases. Furthermore, to better demonstrate the effectiveness of our term rewriting system in dealing with complicated procedures, we first use Algorithm 2 to derive the affine constants of sec_aes_sbox with ISW [58] and then directly apply SMT solvers to the correctness constraints obtained at line 9 of Algorithm 3. It takes about 1 s to obtain the result for first-order masking, while it fails to obtain a result within 20 min for second-order masking.

Table 3. Results on sec_mult and S-Boxes, where T.O. means time out (20 min), and \(\natural \) means that the program is incorrect.

A highlight of our findings is that FISCHER reports that sec_mult from [8], and the S-Boxes based on this version, are incorrect when \(d=5\). After a careful analysis, we found that it is indeed incorrect for any \(d \equiv 1 \mod 4\) (i.e., 5, 9, 13, etc.). This is because [8] parallelizes the multiplication over the entire encodings (i.e., tuples of shares) while the parallelized computation depends on the value of \(d \mod 4\); when the remainder is 1, the error occurs.

7.4 Evaluation for More Boolean Masking Schemes

To demonstrate the applicability of FISCHER on a wider range of Boolean masking schemes, we further consider glitch-resistant Boolean masking schemes: HPC1, HPC2 [20], DOM [35] and CMS [57]. We implement the finite-field multiplication sec_mult using those masking schemes, as well as masked versions of AES S-box and PRESENT S-box. We note that our implementation of DOM sec_mult is derived from [20], and we only implement the 2nd-order CMS sec_mult due to the difficulty of implementation. All other experimental settings are the same as in Sect. 7.3.

Table 4. Results on sec_mult and S-Boxes for HPC, DOM and CMS.

The results are shown in Table 4. Our term rewriting system alone is able to efficiently prove the correctness of the finite-field multiplication sec_mult and the masked versions of the AES S-box and PRESENT S-box using the glitch-resistant Boolean masking schemes HPC1, HPC2, DOM and CMS. The verification cost of these benchmarks is similar to that of the benchmarks using the ISW scheme, demonstrating the applicability of FISCHER to various Boolean masking schemes.

Table 5. Results on sec_add, sec_add_modp and sec_a2b [17], where T.O. means time out (20 min).

7.5 Evaluation for Arithmetic/Boolean Masking Conversions

To demonstrate a wider applicability of FISCHER beyond masked implementations of symmetric cryptography, we further evaluate FISCHER on three key non-linear building blocks for bitsliced, masked implementations of lattice-based post-quantum key encapsulation mechanisms (KEMs) [17]. Note that KEMs are a class of encryption techniques designed to secure symmetric cryptographic key material for transmission using asymmetric (public-key) cryptography. We implement the Boolean masked addition modulo \(2^k\) (sec_add), the Boolean masked addition modulo p (sec_add_modp) and the arithmetic-to-Boolean masking conversion modulo \(2^k\) (sec_a2b) for various bit-widths k and masking orders d, where p is the largest prime number less than \(2^k\). Note that in our implementations some bitwise operations (e.g., circular shift) are expressed by affine transformations, and the modular addition is implemented by the simulation algorithm of [17].
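
To make the functional specification of these gadgets concrete, the sketch below (ours; not the masked algorithms of [17]) shows the two kinds of encodings and an insecure reference model of what sec_a2b must compute, namely converting an arithmetic sharing modulo \(2^k\) into a Boolean sharing of the same secret.

```python
import secrets

def boolean_mask(a, d, k):
    """Boolean sharing: a = b_0 XOR ... XOR b_d (all values are k-bit)."""
    shares = [secrets.randbits(k) for _ in range(d)]
    b0 = a
    for s in shares:
        b0 ^= s
    return [b0] + shares

def arithmetic_mask(a, d, k):
    """Arithmetic sharing: a = (a_0 + ... + a_d) mod 2^k."""
    mod = 1 << k
    shares = [secrets.randbelow(mod) for _ in range(d)]
    a0 = (a - sum(shares)) % mod
    return [a0] + shares

def a2b_reference(a_shares, d, k):
    """Functional specification of sec_a2b only: recombine the arithmetic sharing and
    re-mask it as a Boolean sharing. This is what a masked conversion must compute,
    but it is NOT side-channel secure, since it reconstructs the secret."""
    a = sum(a_shares) % (1 << k)
    return boolean_mask(a, d, k)

if __name__ == "__main__":
    k, d, secret = 13, 2, 0x11F5
    arith = arithmetic_mask(secret, d, k)
    boolean = a2b_reference(arith, d, k)
    xor = 0
    for s in boolean:
        xor ^= s
    assert xor == secret and sum(arith) % (1 << k) == secret
```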

The results are reported in Table 5. FISCHER is able to efficiently prove the correctness of these functions for various masking orders (d) and bit-widths (k), using the term rewriting system alone. With the increase of the bit-width k (resp. masking order d), the verification cost increases more quickly for sec_add_modp (resp. sec_a2b) than for sec_add. This is because sec_add_modp with bit-width k invokes sec_add three times, two of which have bit-width \(k+1\), and because the number of calls to sec_add in sec_a2b increases with the masking order d even though the bit-width stays the same. These results demonstrate the applicability of FISCHER to asymmetric cryptography.

8 Conclusion

We have proposed a term rewriting based approach to proving functional equivalence between masked cryptographic programs and their original unmasked algorithms over \(\mathbb{G}\mathbb{F}(2^n)\). Based on this approach, we have developed a tool FISCHER and carried out extensive experiments on various benchmarks. Our evaluation confirms the effectiveness, efficiency and applicability of our approach.

For future work, it would be interesting to further investigate the theoretical properties of the term rewriting system. Moreover, we believe the term rewriting approach extended with more operations may have a greater potential in verifying more general cryptographic programs, e.g., those from the standard software library such as OpenSSL.