1 Introduction

Side-channel attacks aim to infer secret data (e.g. cryptographic keys) by exploiting statistical dependence between secret data and non-functional observations such as execution time [33], power consumption [34], and electromagnetic radiation [46]. They have become a serious threat in application domains such as cyber-physical systems. As a typical example, the power consumption of a device executing the instruction \(c=p\oplus k\) usually depends on the secret k, and this can be exploited via differential power analysis (DPA) [37] to deduce k.

Masking is one of the most widely used and effective countermeasures to thwart side-channel attacks. Masking is essentially a randomization technique for reducing the statistical dependence between secret data and side-channel information (e.g. power consumption). For example, using a Boolean masking scheme, one can mask the secret data k by applying the exclusive-or (\(\oplus \)) operation with a random variable r, yielding the masked secret data \(k\oplus r\). It can be readily verified that the distribution of \(k\oplus r\) is independent of the value of k when r is uniformly distributed. Besides Boolean masking schemes, there are other masking schemes such as additive masking schemes (e.g. \((k+r)\bmod n\)) and multiplicative masking schemes (e.g. \((k\times r)\bmod n\)). A variety of masked implementations, e.g. of AES and its non-linear components (S-boxes), have been published over the years. However, designing effective and efficient masking schemes is still a notoriously difficult task, especially for non-linear functions. This has motivated a large amount of work on verifying whether masked implementations, as either (hardware) circuits or (software) programs, are statistically independent of secret inputs. Typically, masked hardware implementations are modeled as (probabilistic) Boolean programs where all variables range over the Boolean domain (i.e. \(\mathbb {GF}(2)\)), while masked software implementations, featuring a richer set of operations, need to be modeled as (probabilistic) arithmetic programs.
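The independence claim for Boolean masking can be checked exhaustively on a small word size. The following is a minimal sketch; the 4-bit domain and the helper name `masked_distribution` are illustrative assumptions and do not come from the original text.

```python
from collections import Counter
from fractions import Fraction

N_BITS = 4                      # assumed word size, chosen only to keep the check small
DOMAIN = range(2 ** N_BITS)

def masked_distribution(k):
    """Distribution of k XOR r when r is drawn uniformly from DOMAIN."""
    counts = Counter(k ^ r for r in DOMAIN)
    return {v: Fraction(counts[v], len(DOMAIN)) for v in DOMAIN}

# Every secret k yields the same distribution, and that distribution is uniform.
reference = masked_distribution(0)
assert all(masked_distribution(k) == reference for k in DOMAIN)
assert set(reference.values()) == {Fraction(1, len(DOMAIN))}
```

Since the distribution does not change with k, an observer of \(k\oplus r\) alone learns nothing about k.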

Verification techniques for masking schemes can be roughly classified into type system based approaches [3, 4, 5, 14, 16, 19, 38] and model-counting based approaches [24, 25, 50]. The basic idea of type system based approaches is to infer a distribution type for observable variables in the program that are potentially exposed to attackers. From the type information one may be able to show that the program is secure. This class of approaches is generally very efficient mainly because of their static analysis nature. However, they may give inconclusive answers as most existing type systems do not provide completeness guarantees.

Model-counting based approaches, unsurprisingly, encode the verification problem as a series of model-counting problems, and typically leverage SAT/SMT solvers. The main advantage of this approach is its completeness guarantees. However, the size of the model-counting constraint is usually exponential in the number of (bits of) random variables used in masking, which poses great challenges to scalability. We mention that, within this category, some work further exploits Fourier analysis [11, 15], which considers the Fourier expansion of the Boolean functions. The verification problem can then be reduced to checking whether certain coefficients of the Fourier expansion are zero or not. Although there is no hurdle in principle, to the best of our knowledge, current model-counting based approaches are limited to Boolean programs only.

While verification of masking for Boolean programs is well-studied [24, 50], generalizing it to arithmetic programs brings additional challenges. First of all, arithmetic programs admit more operations which are absent from Boolean programs. A typical example is field multiplication. In the Boolean domain, it is nothing more than \(\wedge \), which is a bit operation. However, for \(\mathbb {GF}(2^n)\) (typically \(n=8\) in cryptographic algorithm implementations), the operation is nontrivial, which prohibits many optimizations that would otherwise be useful for Boolean domains. Second, verification of arithmetic programs often suffers from serious scalability issues, especially when model-counting based approaches are applied. We note that transforming arithmetic programs into equivalent Boolean versions is theoretically possible, but suffers from several deficiencies: (1) one has to encode complicated arithmetic operations (e.g. finite field multiplication) as bitwise operations; (2) the resulting Boolean program needs to be checked against higher-order attacks, in which the adversary is supposed to make multiple observations simultaneously. This is a far more difficult problem. Because of this, we believe such an approach is, if not infeasible, practically unfavourable.

Perfect masking is ideal but does not necessarily hold when there are flaws or when only a limited number of random variables are allowed for efficiency reasons. In case the program is not perfectly masked (i.e., a potential side channel does exist), one naturally wants to tell how severe the leak is. For instance, one possible measure is the resource the attacker needs to invest in order to infer the secret from the side channel. For this purpose, we adapt the notion of Quantitative Masking Strength, for which a correlation with the number of power traces needed to successfully infer the secret has been established empirically [26, 27].

Main Contributions. We mainly focus on the verification of masked arithmetic programs. We advocate a hybrid verification method combining type system based and model-counting based approaches, and provide additional quantitative analysis. We summarize the main contributions as follows.

  • We provide a hybrid approach which integrates type system based and model-counting based approaches into one framework, and supports sound and complete reasoning about masked arithmetic programs.

  • We provide quantitative analysis for the case where the masking is not effective, calculating a quantitative measure of the information leakage.

  • We provide various heuristics and optimized algorithms to significantly improve the scalability of previous approaches.

  • We implement our approaches in a software tool and provide thorough evaluations. Our experiments show orders of magnitude of improvement with respect to previous verification methods on common benchmarks.

We find, perhaps surprisingly, that for model-counting, the widely adopted approaches based on SMT solvers (e.g. [24, 25, 50]) may not be the best choice: our experiments suggest that an alternative brute-force approach is comparable for Boolean programs and significantly outperforms them for arithmetic programs.

Related Work. The d-threshold probing model is the de facto standard leakage model for order-d power side-channel attacks [32]. This paper focuses on the case that \(d=1\). Other models like the noise leakage model [17, 45], the bounded moment model [6], and the threshold probing model with transitions/glitches [15, 20] could be reduced to the threshold probing model, at the cost of introducing higher orders [3]. Other work on side channels such as execution time, faults, and cache does exist ([1, 2, 7, 8, 12, 28, 31, 33] to cite a few), but is orthogonal to our work.

Type systems have been widely used in the verification of side channel attacks with early work [9, 38], where masking compilers are provided which can transform an input program into a functionally equivalent program that is resistant to first-order DPA. However, these systems either are limited to certain operations (i.e., \(\oplus \) and table look-up), or suffer from unsoundness and incompleteness under the threshold probing model. To support verification of higher-order masking, Barthe et al. introduced the notion of noninterference (NI, [3]), and strong t-noninterference (SNI, [4]), which were extended to give a unified framework for both software and hardware implementations in maskVerif [5]. Further work along this line includes improvements for efficiency [14, 19], generalization for assembly-level code [15], and extensions with glitches for hardware programs [29]. As mentioned earlier, these approaches are incomplete, i.e., secure programs may fail to pass their verification.

[24, 25] proposed a model-counting based approach for Boolean programs by leveraging SMT solvers, which is complete but limited in scalability. To improve efficiency, a hybrid approach integrating type-based and model-counting based approaches [24, 25] was proposed in [50], which is similar to the current work in spirit. However, it is limited to Boolean programs and qualitative analysis only. [26, 27] extended the approach of [24, 25] to quantitative analysis, but are limited to Boolean programs. The current work not only extends the applicability but also achieves significant improvement in efficiency even for Boolean programs (cf. Sect. 5). We also find that solving model-counting via SMT solvers [24, 50] may not be the best approach, in particular for arithmetic programs.

Our work is related to quantitative information flow (QIF) [13, 35, 43, 44, 49], which leverages notions from information theory (typically Shannon entropy and mutual information) to measure the flow of information in programs. The QIF framework has also been specialized to side-channel analysis [36, 41, 42]. The main differences are as follows. First, QIF targets fully-fledged programs (including branching and loops), so program analysis techniques (e.g. symbolic execution) are needed, while we deal with more specialized (transformed) masked programs in straight-line form. Second, to measure the information leakage quantitatively, our measure is based on the notion of QMS, which is correlated with the number of power traces needed to successfully infer the secret, while QIF is based on a more general information-theoretic view. Third, for calculating such a measure, both works rely on model-counting. In QIF, the constraints over the input are usually linear, but the constraints in our setting involve arithmetic operations in rings and fields. Randomized approximation schemes can be exploited in QIF [13], but they are not suitable in our setting. Moreover, we mention that in QIF, input variables should in principle be partitioned into public and private variables, the former of which need to be existentially quantified. This was briefly mentioned in, e.g., [36], but without implementation.

2 Preliminaries

Let us fix a bounded integer domain \(\mathbb {D}=\{0,\cdots ,2^n-1\}\), where n is a fixed positive integer. Bit-wise operations are defined over \(\mathbb {D}\), but we shall also consider arithmetic operations over \(\mathbb {D}\), which include \(+, -, \times \) modulo \(2^n\), for which \(\mathbb {D}\) is considered to be a ring, and the Galois field multiplication \(\odot \), where \(\mathbb {D}\) is isomorphic to \(\mathbb {GF}(2)[x]/(p(x))\) (or simply \(\mathbb {GF}(2^n)\)) for some irreducible polynomial p of degree n. For instance, in AES one normally uses \(\mathbb {GF}(2^8)\) and \(p(x)=x^8 + x^4 +x^3 + x + 1\).
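To make the two multiplications concrete, the sketch below contrasts ring multiplication modulo \(2^n\) with field multiplication \(\odot \) for \(n=8\). It assumes the standard AES reduction polynomial \(x^8+x^4+x^3+x+1\) (0x11B, as specified in FIPS 197); the function names `ring_mul` and `gf_mul` are hypothetical.

```python
AES_POLY = 0x11B   # x^8 + x^4 + x^3 + x + 1 (assumed: the FIPS 197 polynomial)

def ring_mul(a, b, n=8):
    """Multiplication in the ring of integers modulo 2^n."""
    return (a * b) % (1 << n)

def gf_mul(a, b, n=8, poly=AES_POLY):
    """Multiplication in GF(2^n) = GF(2)[x]/(poly), by shift-and-add."""
    result = 0
    for _ in range(n):
        if b & 1:               # current bit of b set: add (XOR) the shifted a
            result ^= a
        b >>= 1
        a <<= 1                 # multiply a by x
        if a >> n:              # degree reached n: reduce modulo poly
            a ^= poly
    return result

# Worked example from FIPS 197: {57} * {83} = {c1} in GF(2^8).
assert gf_mul(0x57, 0x83) == 0xC1
# The same operands give a different result in the ring modulo 2^8.
assert ring_mul(0x57, 0x83) == (0x57 * 0x83) % 256
```

The two operations agree on trivial inputs (e.g. multiplication by 0 or 1) but differ in general, which is why \(\times \) and \(\odot \) are kept as distinct operators.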

2.1 Cryptographic Programs

We focus on programs written in C-like code that implement cryptographic algorithms such as AES, as opposed to arbitrary software programs. To analyze such programs, it is common to assume that they are given in straight-line forms (i.e., branching-free) over \(\mathbb {D}\) [3, 24]. The syntax of the program under consideration is given as follows, where \(c\in \mathbb {D}\).

[Figure a: syntax of programs]

A program P consists of a sequence of assignments followed by a return statement. An assignment \(x\leftarrow e\) assigns the value of the expression e to the variable x, where e is built up from a set of variables and constants using (1) bit-wise operations: negation (\(\lnot \)), and (\(\wedge \)), or (\(\vee \)), exclusive-or (\(\oplus \)), left shift (\(\ll \)) and right shift (\(\gg \)); (2) modulo \(2^n\) arithmetic operations: addition (\(+\)), subtraction (−), multiplication (\(\times \)); and (3) finite-field multiplication (\(\odot \)) (over \(\mathbb {GF}(2^n)\)). We denote by \(\mathcal {O}^*\) the extended set \(\mathcal {O}\cup \{\ll ,\gg \}\) of operations.

Given a program P, let \(X=X_p \uplus X_k\uplus X_i\uplus X_r\) denote the set of variables used in P, where \(X_p\), \(X_k\) and \(X_i\) respectively denote the sets of public input, private input and internal variables, and \(X_r\) denotes the set of (uniformly distributed) random variables for masking private variables. We assume that the program is given in static single assignment (SSA) form (i.e., each variable is defined exactly once) and each expression uses at most one operator. (One can easily transform an arbitrary straight-line program into an equivalent one satisfying these conditions.) For each assignment \(x\leftarrow e\) in P, the computation \(\mathcal {E}(x)\) of x is an expression obtained from e by iteratively replacing all the occurrences of the internal variables in e by their defining expressions in P. SSA form guarantees that \(\mathcal {E}(x)\) is well-defined.
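The unfolding that defines \(\mathcal {E}(x)\) can be sketched as a short recursion over an SSA program. The dictionary encoding of programs, the operator names and the function name `computation` below are illustrative assumptions; the two sample assignments \(x=k\oplus r_0\) and \(x_0=x\odot x\) are taken from Example 1.

```python
def computation(program, target):
    """Unfold internal variables into their defining expressions.

    `program` maps each internal variable to a tuple (op, operand, ...).
    Operands that are not keys of `program` (inputs, random variables,
    constants) are left untouched. SSA form guarantees termination.
    """
    if target not in program:
        return target
    op, *operands = program[target]
    return (op, *[computation(program, o) for o in operands])

# First two assignments of the running example: x = k XOR r0, x0 = x (.) x.
prog = {
    "x":  ("xor", "k", "r0"),
    "x0": ("gfmul", "x", "x"),
}
print(computation(prog, "x0"))
```

The result for `x0` is an expression over \(X_k\cup X_r\) only, here with \(k\oplus r_0\) appearing twice, which is exactly the form on which dominant variables and distributions are later evaluated.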

Semantics. A valuation is a function \(\sigma :X_p\cup X_k\rightarrow \mathbb {D}\) assigning to each variable \(x\in X_p\cup X_k\) a value \(c\in \mathbb {D}\). Let \(\varTheta \) denote the set of all valuations. Two valuations \(\sigma _1,\sigma _2\in \varTheta \) are Y-equivalent, denoted by \(\sigma _1\approx _{Y}\sigma _2\), if \(\sigma _1(x)=\sigma _2(x)\) for all \(x\in Y\).

Given an expression e in terms of \(X_p\cup X_k\cup X_r\) and a valuation \(\sigma \in \varTheta \), we denote by \(e(\sigma )\) the expression obtained from e by replacing all the occurrences of variables \(x\in X_p\cup X_k\) by their values \(\sigma (x)\), and denote by \(\llbracket e\rrbracket _\sigma \) the distribution of e (with respect to the uniform distribution of the random variables that \(e(\sigma )\) may contain). Concretely, \(\llbracket e\rrbracket _\sigma (v)\) is the probability of the expression \(e(\sigma )\) being evaluated to v for each \(v\in \mathbb {D}\). For each variable \(x\in X\) and valuation \(\sigma \in \varTheta \), we denote by \(\llbracket x\rrbracket _\sigma \) the distribution \(\llbracket \mathcal {E}(x)\rrbracket _\sigma \). The semantics of the program P is defined as a (partial) function \(\llbracket P\rrbracket \) which takes a valuation \(\sigma \in \varTheta \) and an internal variable \(x\in X_i\) as inputs, and returns the distribution \(\llbracket x\rrbracket _\sigma \) of x.

Threat Models and Security Notions. We assume that the adversary has access to public input \(X_p\), but not to private input \(X_k\) or random variables \(X_r\), of a program P. However, the adversary may have access to an internal variable \(x\in X_i\) via side-channels. Under these assumptions, the goal of the adversary is to deduce the information of \(X_k\).

Definition 1

Let P be a program. For every internal variable \(x\in X_i\),

  • x is uniform in P, denoted by x-\(\mathbf{UF}\), if the distribution \(\llbracket x\rrbracket _\sigma \) of x under \(\sigma \) is uniform for all \(\sigma \in \varTheta \).

  • x is statistically independent in P, denoted by x-\(\mathbf{SI}\), if \(\llbracket x\rrbracket _{\sigma _1}=\llbracket x\rrbracket _{\sigma _2}\) for all \((\sigma _1,\sigma _2)\in \varTheta ^2_{X_p}\), where \(\varTheta ^2_{X_p}:=\{(\sigma _1,\sigma _2)\in \varTheta \times \varTheta \mid \sigma _1\approx _{X_p}\sigma _2\}\).

Proposition 1

If the program P is x-\(\mathbf{UF}\), then P is x-\(\mathbf{SI}\).

Definition 2

For a program P, a variable x is perfectly masked (a.k.a. secure under 1-threshold probing model [32]) in P if it is x-\(\mathbf{SI}\), otherwise x is leaky.

P is perfectly masked if all internal variables in P are perfectly masked.
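For small domains, the two notions of Definition 1 can be checked directly by enumeration. The sketch below is an illustration under stated assumptions: expressions are modelled as Python functions over an environment dictionary, the word size is fixed to 2 bits, and all function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N_BITS = 2                      # tiny word size so enumeration stays cheap
DOMAIN = range(2 ** N_BITS)

def valuations(names):
    """All assignments of DOMAIN values to the given variable names."""
    return [dict(zip(names, vals)) for vals in product(DOMAIN, repeat=len(names))]

def distribution(expr, sigma, rnd):
    """Distribution of expr under valuation sigma, random variables uniform."""
    counts = Counter(expr({**sigma, **rho}) for rho in valuations(rnd))
    total = len(DOMAIN) ** len(rnd)
    return tuple(Fraction(counts[v], total) for v in DOMAIN)

def is_uniform(expr, pub, key, rnd):
    """x-UF: the distribution is uniform for every valuation."""
    uniform = (Fraction(1, len(DOMAIN)),) * len(DOMAIN)
    return all(distribution(expr, s, rnd) == uniform for s in valuations(pub + key))

def is_statistically_independent(expr, pub, key, rnd):
    """x-SI: valuations that agree on public inputs give equal distributions."""
    for p in valuations(pub):
        dists = {distribution(expr, {**p, **k}, rnd) for k in valuations(key)}
        if len(dists) > 1:
            return False
    return True

masked = lambda env: env["k"] ^ env["r"]      # uniform, hence independent
product_rr = lambda env: env["r"] & env["r2"] # independent but not uniform
leaky = lambda env: env["k"] & env["r"]       # depends on the secret k
```

The second expression illustrates that the converse of Proposition 1 fails: an expression without any private input is trivially statistically independent, yet its distribution need not be uniform.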

2.2 Quantitative Masking Strength

When a program is not perfectly masked, it is important to quantify how secure it is. For this purpose, we adapt the notion of Quantitative Masking Strength (QMS) from [26, 27] to quantify the strength of masking countermeasures.

Definition 3

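The quantitative masking strength \(\mathtt{QMS}_{x}\) of a variable \(x\in X\) is defined as:

$$\begin{aligned} \mathtt{QMS}_{x} := 1-\max _{(\sigma _1,\sigma _2)\in \varTheta ^2_{X_p},\ c\in \mathbb {D}}\Big (\llbracket x\rrbracket _{\sigma _1}(c)-\llbracket x\rrbracket _{\sigma _2}(c)\Big ), \end{aligned}$$

where \(\llbracket x\rrbracket _{\sigma }(c)\) is the probability that x evaluates to c under the valuation \(\sigma \).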

Accordingly, the quantitative masking strength of the program P is defined by \(\mathtt{QMS}_P:=\min _{x\in X_i}{\mathtt{QMS}_{x}}\).

The notion of QMS generalizes that of perfect masking, i.e., P is x-\(\mathbf{SI}\) iff \(\mathtt{QMS}_{x}=1\). The importance of QMS has been highlighted in [26, 27] where it is empirically shown that, for Boolean programs the number of power traces needed to determine the secret key is exponential in the QMS value. This study suggests that computing an accurate QMS value for leaky variables is highly desirable.
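A brute-force computation of QMS on a tiny domain may help fix intuition. The sketch assumes the reading \(\mathtt{QMS}_x = 1-\max (\Pr _{\sigma _1}[x=c]-\Pr _{\sigma _2}[x=c])\), taken over pairs of valuations that agree on public inputs and over values c; this reading is consistent with \(\varDelta _x^q=(1-q)\times 2^m\) in Sect. 3.2. The 2-bit domain and all function names are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N_BITS = 2
DOMAIN = range(2 ** N_BITS)

def valuations(names):
    return [dict(zip(names, vals)) for vals in product(DOMAIN, repeat=len(names))]

def distribution(expr, sigma, rnd):
    """Distribution of expr under sigma, with uniformly random `rnd` variables."""
    counts = Counter(expr({**sigma, **rho}) for rho in valuations(rnd))
    total = len(DOMAIN) ** len(rnd)
    return tuple(Fraction(counts[v], total) for v in DOMAIN)

def qms(expr, pub, key, rnd):
    """1 minus the largest gap between two distributions at a single value c."""
    worst = Fraction(0)
    for p in valuations(pub):
        dists = [distribution(expr, {**p, **k}, rnd) for k in valuations(key)]
        for d1 in dists:
            for d2 in dists:
                worst = max(worst, max(a - b for a, b in zip(d1, d2)))
    return 1 - worst

print(qms(lambda e: e["k"] ^ e["r"], [], ["k"], ["r"]))   # perfectly masked
print(qms(lambda e: e["k"] & e["r"], [], ["k"], ["r"]))   # partially leaky
print(qms(lambda e: e["k"], [], ["k"], []))               # unmasked secret
```

On this domain the three expressions evaluate to 1, 1/4 and 0 respectively: for \(k\wedge r\) the value 0 has probability 1 when \(k=0\) and 1/4 when \(k=3\), so the largest gap is 3/4.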

Fig. 1. A buggy version of Cube from [47]

Example 1

Let us consider the program in Fig. 1, which implements a buggy Cube in \(\mathbb {GF}(2^8)\) from [47]. Given a secret key k, to avoid first-order side-channel attacks, k is masked by a random variable \(r_0\), leading to two shares \(x= k\oplus r_0\) and \(r_0\). Cube\((k,r_0,r_1)\) returns two shares \(x_7\) and \(x_9\) such that \(x_7\oplus x_9=k^3:=k\odot k\odot k\), where \(r_1\) is another random variable.

Cube computes \(k\odot k\) by \(x_0=x\odot x\) and \(x_1=r_0\odot r_0\) (Lines 3–4), as \(k\odot k=x_0\oplus x_1\). Then, it computes \(k^3\) by a secure multiplication of two pairs of shares \((x_0,x_1)\) and \((x,r_0)\) using the random variable \(r_1\) (Lines 5–12). However, this program is vulnerable to first-order side-channel attacks, as it is neither \(x_2\)-\(\mathbf{SI}\) nor \(x_3\)-\(\mathbf{SI}\). As shown in [47], we shall refresh \((x_0,x_1)\) before computing \(k^2\odot k\) by inserting \(x_0=x_0\oplus r_2\) and \(x_1=x_1\oplus r_2\) after Line 4, where \(r_2\) is a random variable. We use this buggy version as a running example to illustrate our techniques.
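The behaviour described above can be reproduced with a small executable model. The assignments below are a reconstruction chosen to be consistent with the description in this example (the shares, the cross terms, and the use of \(r_1\)); they are an assumption and not necessarily the authors' exact listing. The field is assumed to be \(\mathbb {GF}(2^8)\) with the AES polynomial 0x11B, and the function names are hypothetical.

```python
from collections import Counter

def gf_mul(a, b, poly=0x11B):
    """Multiplication in GF(2^8), assuming the AES reduction polynomial."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> 8:
            a ^= poly
    return result

def cube(k, r0, r1):
    """Assumed reconstruction of the buggy Cube (no refresh of x0, x1)."""
    x = k ^ r0
    x0 = gf_mul(x, x)
    x1 = gf_mul(r0, r0)
    x2 = gf_mul(x0, r0)     # cross term: mixes x0 with the same mask r0
    x3 = gf_mul(x1, x)      # cross term
    x4 = r1 ^ x2
    x5 = x4 ^ x3
    x6 = gf_mul(x0, x)
    x7 = x6 ^ r1
    x8 = gf_mul(x1, r0)
    x9 = x8 ^ x5
    return {"x2": x2, "x3": x3, "x7": x7, "x9": x9}

def x2_counts(k):
    """How often each value of x2 occurs when r0 ranges over the field."""
    return Counter(cube(k, r0, 0)["x2"] for r0 in range(256))

# Functional correctness on a sample of masks: the shares recombine to k^3.
for k in range(256):
    for r0 in (0, 1, 0x53, 0xFF):
        out = cube(k, r0, 0xCA)
        assert out["x7"] ^ out["x9"] == gf_mul(gf_mul(k, k), k)

# The distribution of x2 depends on k, so x2 is not statistically independent.
assert x2_counts(0) != x2_counts(1)
```

In this model \(x_2=(k\oplus r_0)^2\odot r_0\) is zero for one value of \(r_0\) when \(k=0\) but for two values of \(r_0\) when \(k\ne 0\), which already distinguishes the two secrets.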

As setup for further use, we have: \(X_p=\emptyset \), \(X_k=\{k\}\), \(X_r=\{r_0,r_1\}\) and \(X_i=\{x,x_0,\cdots ,x_9\}\). The computations \(\mathcal {E}(\cdot )\) of internal variables are:

3 Three Key Techniques

In this section, we introduce three key techniques: type system, model-counting based reasoning and reduction techniques, which will be used in our algorithm.

3.1 Type System

We present a type system for formally inferring distribution types of internal variables, inspired by prior work [3, 14, 40, 50]. We start with some basic notations.

Definition 4

(Dominant variables). Given an expression e, a random variable r is called a dominant variable of e if the following two conditions hold: (i) r occurs in e exactly once, and (ii) each operator on the path between the leaf r and the root in the abstract syntax tree of e is from either \(\{\oplus ,\lnot ,+,-\}\) or \(\{\odot \}\) such that one of the children of the operator is a non-zero constant.

Note that in Definition 4, for efficiency reasons, we take a purely syntactic approach, meaning that we do not simplify e when checking condition (i) that r occurs once. For instance, x is not a dominant variable in \(((x\oplus y)\oplus x)\oplus x\), although intuitively this expression is equivalent to \(y\oplus x\).

Given an expression e, let \(\mathsf{Var}(e)\) be the set of variables occurring in e, and \(\mathsf{RVar}(e):=\mathsf{Var}(e)\cap X_r\). We denote by \(\mathsf{Dom}(e)\subseteq \mathsf{RVar}(e)\) the set of all dominant random variables of e, which can be computed in linear time in the size of e.
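A linear-time computation of \(\mathsf{Dom}(e)\) can be sketched with two traversals of the syntax tree: one counting occurrences, one collecting the leaves reachable through admissible operators. The tuple encoding of expressions (variables as strings, constants as integers) and the operator names are assumptions made for this illustration.

```python
LINEAR_OPS = {"xor", "not", "add", "sub"}   # the operators {XOR, NOT, +, -}

def count_vars(e, counts):
    """Count syntactic occurrences of every variable (strings) in e."""
    if isinstance(e, str):
        counts[e] = counts.get(e, 0) + 1
    elif isinstance(e, tuple):
        for child in e[1:]:         # e[0] is the operator name
            count_vars(child, counts)
    return counts

def admissible_leaves(e, leaves):
    """Collect variables whose path to the root uses admissible operators only."""
    if isinstance(e, str):
        leaves.add(e)
    elif isinstance(e, tuple):
        op, *children = e
        nonzero_const = any(isinstance(c, int) and c != 0 for c in children)
        # Field multiplication is admissible only next to a non-zero constant.
        if op in LINEAR_OPS or (op == "gfmul" and nonzero_const):
            for child in children:
                admissible_leaves(child, leaves)
    return leaves

def dom(e, random_vars):
    """Dominant random variables of e, following Definition 4 syntactically."""
    counts = count_vars(e, {})
    return {v for v in admissible_leaves(e, set())
            if v in random_vars and counts[v] == 1}

print(dom(("xor", "k", "r0"), {"r0", "r1"}))
print(dom(("gfmul", ("xor", "k", "r0"), ("xor", "k", "r0")), {"r0"}))
```

The first call reports \(r_0\) as dominant, while the second reports no dominant variable because \(r_0\) occurs twice and sits below a field multiplication without a constant operand.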

Proposition 2

Given a program P with \(\mathcal {E}(x)\) defined for each variable x of P, if \(\mathsf{Dom}(\mathcal {E}(x))\ne \emptyset \), then P is x-\(\mathbf{UF}\).

Definition 5

(Distribution Types). Let \(\mathcal {T}=\{\mathsf{RUD},\mathsf{SID},\mathsf{SDD},\mathsf{UKD}\}\) be the set of distribution types, where for each variable \(x\in X\),

  • \(\mathcal {E}(x):\mathsf{RUD}\) meaning that the program is x-\(\mathbf{UF}\);

  • \(\mathcal {E}(x):\mathsf{SID}\) meaning that the program is x-\(\mathbf{SI}\);

  • \(\mathcal {E}(x):\mathsf{SDD}\) meaning that the program is not x-\(\mathbf{SI}\);

  • \(\mathcal {E}(x):\mathsf{UKD}\) meaning that the distribution type of x is unknown.

where \(\mathsf{RUD}\) is a subtype of \(\mathsf{SID}\) (cf. Proposition 1).

Fig. 2. Type inference rules, where \(\star \in \mathcal {O}, \ \circ \in \{\wedge ,\vee ,\odot ,\times \}\), \(\bullet \in \mathcal {O}^*\), \(\bowtie \in \{\wedge ,\vee \}\) and \(\diamond \in \{\oplus ,-\}\).

Type judgements, as usual, are defined in the form of \(\vdash e:\tau ,\) where e is an expression in terms of \(X_r\cup X_k\cup X_p\), and \(\tau \in \mathcal {T}\) denotes the distribution type of e. A type judgement \(\vdash e:\mathsf{RUD}\) (resp. \(\vdash e:\mathsf{SID}\) and \(\vdash e:\mathsf{SDD}\)) is valid iff P is x-\(\mathbf{UF}\) (resp. x-\(\mathbf{SI}\) and not x-\(\mathbf{SI}\)) for all variables x such that \(\mathcal {E}(x)=e\). A sound proof system for deriving valid type judgements is given in Fig. 2.

Rule (Dom) states that e containing some dominant variable has type \(\mathsf{RUD}\) (cf. Proposition 2). Rule (Com) captures the commutative law of operators \(\star \in \mathcal {O}\). Rules (Ide\(_i\)) for \(i=1,2,3,4\) are straightforward. Rule (NoKey) states that e has type \(\mathsf{SID}\) if e does not use any private input. Rule (Key) states that each private input has type \(\mathsf{SDD}\). Rule (Sid\(_{1}\)) states that \(e_1\circ e_2\) for \(\circ \in \{\wedge ,\vee ,\odot ,\times \}\) has type \(\mathsf{SID}\), if both \(e_1\) and \(e_2\) have type \(\mathsf{RUD}\), and \(e_1\) has a dominant variable r which is not used by \(e_2\). Indeed, \(e_1\circ e_2\) can be seen as \(r\circ e_2\), then for each valuation \(\eta \in \varTheta \), the distributions of r and \(e_2(\eta )\) are independent. Rule (Sid\(_{2}\)) states that \(e_1\bullet e_2\) for \(\bullet \in \mathcal {O}^*\) has type \(\mathsf{SID}\), if both \(e_1\) and \(e_2\) have type \(\mathsf{SID}\) (as well as its subtype \(\mathsf{RUD}\)), and the sets of random variables used by \(e_1\) and \(e_2\) are disjoint. Likewise, for each valuation \(\eta \in \varTheta \), the distributions on \(e_1(\eta )\) and \(e_2(\eta )\) are independent. Rule (Sdd) states that \(e_1\circ e_2\) for \(\circ \in \{\wedge ,\vee ,\odot ,\times \}\) has type \(\mathsf{SDD}\), if \(e_1\) has type \(\mathsf{SDD}\), \(e_2\) has type \(\mathsf{RUD}\), and \(e_2\) has a dominant variable r which is not used by \(e_1\). Intuitively, \(e_1\circ e_2\) can be safely seen as \(e_1\circ r\).

Finally, if no rule is applicable to an expression e, then e has unknown distribution type. Such a type is needed because our type system is—by design—incomplete. However, we expect—and demonstrate empirically—that for cryptographic programs, most internal variables have a definitive type other than \(\mathsf{UKD}\). As we will show later, to resolve \(\mathsf{UKD}\)-typed variables, one can resort to model-counting (cf. Sect. 3.2).

Theorem 1

If \(\vdash \mathcal {E}(x):\mathsf{RUD}\) (resp. \(\vdash \mathcal {E}(x):\mathsf{SID}\) and \(\vdash \mathcal {E}(x):\mathsf{SDD}\)) is valid, then P is x-\(\mathbf{UF}\) (resp. x-\(\mathbf{SI}\) and not x-\(\mathbf{SI}\)).

Example 2

Consider the program in Fig. 1, we have:

$$\begin{array}{llll} \vdash \mathcal {E}(x):\mathsf{RUD}; &{} \ \vdash \mathcal {E}(x_0):\mathsf{SID}; &{} \ \vdash \mathcal {E}(x_1):\mathsf{SID}; &{} \ \vdash \mathcal {E}(x_2):\mathsf{UKD}; \\ \vdash \mathcal {E}(x_3):\mathsf{UKD}; &{} \ \vdash \mathcal {E}(x_4):\mathsf{RUD}; &{} \ \vdash \mathcal {E}(x_5):\mathsf{RUD}; &{} \ \vdash \mathcal {E}(x_6):\mathsf{UKD}; \\ \vdash \mathcal {E}(x_7):\mathsf{RUD}; &{} \ \vdash \mathcal {E}(x_8):\mathsf{SID}; &{} \ \vdash \mathcal {E}(x_9):\mathsf{RUD}. \\ \end{array}$$

3.2 Model-Counting Based Reasoning

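Recall that for \(x\in X_i\), \(\mathtt{QMS}_{x} = 1-\max _{(\sigma _1,\sigma _2)\in \varTheta ^2_{X_p},\ c\in \mathbb {D}}\big (\llbracket x\rrbracket _{\sigma _1}(c)-\llbracket x\rrbracket _{\sigma _2}(c)\big )\), where \(\llbracket x\rrbracket _{\sigma }(c)\) is the probability that x evaluates to c under \(\sigma \).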

To compute \(\mathtt{QMS}_{x}\), one naive approach is to use brute-force to enumerate all possible valuations \(\sigma \) and then to compute distributions again by enumerating the assignments of random variables. This approach is exponential in the number of (bits of) variables in \(\mathcal {E}(x)\).

Another approach is to lift the SMT-based approach [26, 27] from the Boolean setting to the arithmetic one. We first consider a “decision” version of the problem, i.e., checking whether \(\mathtt{QMS}_x\ge q\) for a given rational number \(q\in [0, 1]\). It is not difficult to observe that this can be reduced to checking the satisfiability of the following logic formula:

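$$\begin{aligned} \exists (\sigma _1,\sigma _2)\in \varTheta ^2_{X_p}.\ \exists c\in \mathbb {D}.\ \Big (\sharp (c=\mathcal {E}(x)(\sigma _1))-\sharp (c=\mathcal {E}(x)(\sigma _2))\Big )>\varDelta _x^q \qquad (1) \end{aligned}$$

where \(\sharp (c=\mathcal {E}(x)(\sigma _1))\) and \(\sharp (c=\mathcal {E}(x)(\sigma _2))\) respectively denote the number of satisfying assignments (of the random variables) of \(c=\mathcal {E}(x)(\sigma _1)\) and \(c=\mathcal {E}(x)(\sigma _2)\), \(\varDelta _x^q=(1-q)\times 2^m\), and m is the number of bits of random variables in \(\mathcal {E}(x)\).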

We further encode (1) as a (quantifier-free) first-order formula \(\varPsi _x^q\) to be solved by an off-the-shelf SMT solver (e.g. Z3 [23]):

$$\begin{aligned} \varPsi _x^q := (\bigwedge \nolimits _{f:\mathsf{RVar}(\mathcal {E}(x))\rightarrow \mathbb {D}}(\varTheta _f\wedge \varTheta _f'))\wedge \varTheta _\mathtt{b2i}\wedge \varTheta _\mathtt{b2i}'\wedge \varTheta _\mathtt{diff}^q \end{aligned}$$

where

  • Program logic (\(\varTheta _f\) and \(\varTheta _f'\)): for every \(f:\mathsf{RVar}(\mathcal {E}(x))\rightarrow \mathbb {D}\), \(\varTheta _f\) encodes \(c_f=\mathcal {E}(x)\) into a logical formula with each occurrence of a random variable \(r\in \mathsf{RVar}(\mathcal {E}(x))\) being replaced by its value f(r), where \(c_f\) is a fresh variable. There are \(|\mathbb {D}|^{|\mathsf{RVar}(\mathcal {E}(x))|}\) distinct copies, which share the same \(X_p\) and \(X_k\). \(\varTheta _f'\) is similar to \(\varTheta _f\) except that all variables \(k\in X_k\) and \(c_f\) are replaced by fresh variables \(k'\) and \(c_f'\) respectively.

  • Boolean to integer (\(\varTheta _\mathtt{b2i}\) and \(\varTheta _\mathtt{b2i}'\)): \(\varTheta _\mathtt{b2i}:=\bigwedge _{f:\mathsf{RVar}(\mathcal {E}(x))\rightarrow \mathbb {D}} I_f= (c=c_f) \ ? \ 1: 0\). It asserts that for each \(f:\mathsf{RVar}(\mathcal {E}(x))\rightarrow \mathbb {D}\), a fresh integer variable \(I_f\) is 1 if \(c=c_f\), otherwise 0. \(\varTheta _\mathtt{b2i}'\) is similar to \(\varTheta _\mathtt{b2i}\) except that \(I_f\) and \(c_f\) are replaced by \(I_f'\) and \(c_f'\) respectively.

  • Different sums (\(\varTheta _\mathtt{diff}^q)\): \(\sum _{f:\mathsf{RVar}(\mathcal {E}(x))\rightarrow \mathbb {D}} I_f-\sum _{f:\mathsf{RVar}(\mathcal {E}(x))\rightarrow \mathbb {D}}I_f' >\varDelta _x^q\).

Theorem 2

\(\varPsi _x^q\) is unsatisfiable iff \(\mathtt{QMS}_x\ge q\), and the size of \(\varPsi _x^q\) is polynomial in |P| and exponential in \(|\mathsf{RVar}(\mathcal {E}(x))|\) and \(|\mathbb {D}|\).

Based on Theorem 2, we present an algorithm for computing \(\mathtt{QMS}_{x}\) in Sect. 4.2.

Note that the qualitative variant of \(\varPsi _x^q\) (i.e. \(q=1\)) can be used to decide whether x is statistically independent by checking whether \(\mathtt{QMS}_x=1\) holds. This will be used in Algorithm 1.

Example 3

By applying the model-counting based reasoning to the program in Fig. 1, we can conclude that \(x_6\) is perfectly masked, while \(x_2\) and \(x_3\) are leaky. This cannot be done by our type system or the ones in [3, 4]. To give a sample encoding, consider the variable \(x_3\) for \(q=\frac{1}{2}\) and \(\mathbb {D}=\{0,1,2,3\}\). We have that \(\varPsi _{x_3}^{\frac{1}{2}}\) is

[Figure b: the encoding \(\varPsi _{x_3}^{\frac{1}{2}}\)]

3.3 Reduction Heuristics

In this section, we provide various heuristics to reduce the size of formulae. These can be applied to both type inference and model-counting based reasoning.

Ineffective Variable Elimination. A variable x is ineffective in an expression e if for all functions \(\sigma _1,\sigma _2:\mathsf{Var}(e)\rightarrow \mathbb {D}\) that agree on the variables \(\mathsf{Var}(e)\setminus \{x\}\), e has the same value under \(\sigma _1\) and \(\sigma _2\). Otherwise, we say x is effective in e. Clearly, if x is ineffective in e, then e and e[c/x] are equivalent for any \(c\in \mathbb {D}\), while e[c/x] contains fewer variables, where e[c/x] is obtained from e by replacing all occurrences of x with c. Checking whether x is effective in e can be performed by checking the satisfiability of the logical formula \(e[c/x]\ne e[c'/x]\). Obviously, \(e[c/x]\ne e[c'/x]\) is satisfiable iff x is effective in e.
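On a small domain, the effectiveness check can be illustrated by enumeration instead of a satisfiability query. The sketch below is a stand-in for the solver-based check, assuming a 2-bit domain and expressions given as Python functions; the name `is_effective` is hypothetical.

```python
from itertools import product

N_BITS = 2
DOMAIN = range(2 ** N_BITS)

def is_effective(expr, var, variables):
    """True iff two values of `var` give different results for some
    assignment of the remaining variables (e[c/x] != e[c'/x] is satisfiable)."""
    others = [v for v in variables if v != var]
    for vals in product(DOMAIN, repeat=len(others)):
        env = dict(zip(others, vals))
        results = {expr({**env, var: c}) for c in DOMAIN}
        if len(results) > 1:
            return True
    return False

# In (x XOR y) XOR x the two occurrences of x cancel, so x is ineffective.
e = lambda env: (env["x"] ^ env["y"]) ^ env["x"]
print(is_effective(e, "x", ["x", "y"]), is_effective(e, "y", ["x", "y"]))
```

Once x is known to be ineffective, it can be instantiated by any constant, typically 0, which in turn enables the algebraic simplifications described next.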

Algebraic Laws. For every sub-expression \(e'\) of the form \(e_1\oplus e_1, e_2-e_2,e\circ 0\) or \(0\circ e\) with \(\circ \in \{\times ,\odot ,\wedge \}\) in the expression e, it is safe to replace \(e'\) by 0, namely, e and \(e[0/e']\) are equivalent. Note that the constant 0 is usually introduced by instantiating ineffective variables by 0 when eliminating ineffective variables.
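These rewriting laws amount to a single bottom-up pass over the expression tree. The following sketch assumes a tuple encoding of expressions (operator name first, variables as strings, constants as integers) and hypothetical operator names; it implements only the laws listed above.

```python
ANNIHILATING = {"mul", "gfmul", "and"}   # operators with e o 0 = 0 o e = 0
SELF_CANCELLING = {"xor", "sub"}         # operators with e o e = 0

def is_zero(e):
    return isinstance(e, int) and e == 0

def simplify(e):
    """Apply the algebraic laws bottom-up; leaves are returned unchanged."""
    if not isinstance(e, tuple):
        return e
    op, *args = e
    args = [simplify(a) for a in args]   # simplify children first
    if op in SELF_CANCELLING and len(args) == 2 and args[0] == args[1]:
        return 0
    if op in ANNIHILATING and any(is_zero(a) for a in args):
        return 0
    return (op, *args)

# (a XOR a) (.) k collapses to 0 because the left operand becomes 0.
print(simplify(("gfmul", ("xor", "a", "a"), "k")))
```

Identity laws such as \(e\oplus 0=e\) are deliberately not applied here, since they are not among the laws stated in this paragraph.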

Dominated Subexpression Elimination. Given an expression e, if \(e'\) is an r-dominated sub-expression in e (i.e., r is a dominant variable of \(e'\)) and r does not occur in e elsewhere, then it is safe to replace each occurrence of \(e'\) in e by the random variable r. Intuitively, \(e'\) as a whole can be seen as a random variable when evaluating e. Besides this elimination, we also allow adding meta-theorems specifying forms of sub-expressions \(e'\) that can be replaced by a fresh variable. For instance, \(r\oplus ((2\times r)\wedge e'')\) in e, when the random variable r does not appear elsewhere, can be replaced by the random variable r.

Let \(\widehat{e}\) denote the expression obtained by applying the above heuristics on the expression e.

Transformation Oracle. We suppose there is an oracle \(\varOmega \) which, whenever possible, transforms an expression e into an equivalent expression \(\varOmega (e)\) such that the type inference can give a non-\(\mathsf{UKD}\) type to \(\varOmega (e)\).

Lemma 1

\(\mathcal {E}(x)(\sigma )\) and \(\widehat{\mathcal {E}(x)}(\sigma )\) have the same distribution for any \(\sigma \in \varTheta \).

Example 4

Consider the variable \(x_6\) in the program in Fig. 1. Since \((k\oplus r_0)\) is an \(r_0\)-dominated sub-expression in \(\mathcal {E}(x_6)=((k\oplus r_0)\odot (k\oplus r_0))\odot (k\oplus r_0)\), we can simplify \(\mathcal {E}(x_6)\) into \(\widehat{\mathcal {E}(x_6)}=r_0\odot r_0 \odot r_0\). Therefore, we can deduce that \(\vdash \mathcal {E}(x_6):\mathsf{SID}\) by applying the NoKey rule on \(\widehat{\mathcal {E}(x_6)}\).
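The claim that the simplification preserves the distribution (Lemma 1) can be checked exhaustively for this example. The sketch assumes \(\mathbb {GF}(2^8)\) with the AES polynomial 0x11B; the function names are hypothetical.

```python
from collections import Counter

def gf_mul(a, b, poly=0x11B):
    """Multiplication in GF(2^8), assuming the AES reduction polynomial."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> 8:
            a ^= poly
    return result

def gf_cube(v):
    return gf_mul(gf_mul(v, v), v)

def counts_original(k):
    """Value counts of ((k^r0) (.) (k^r0)) (.) (k^r0) over all r0."""
    return Counter(gf_cube(k ^ r0) for r0 in range(256))

def counts_simplified():
    """Value counts of r0 (.) r0 (.) r0 over all r0."""
    return Counter(gf_cube(r0) for r0 in range(256))

# For every secret k, the original and simplified expressions are
# identically distributed, because r0 -> k XOR r0 is a bijection.
assert all(counts_original(k) == counts_simplified() for k in range(256))
```

Note that the common distribution is not uniform, since cubing is not a bijection on \(\mathbb {GF}(2^8)\); this matches the inferred type \(\mathsf{SID}\) rather than \(\mathsf{RUD}\).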

4 Overall Algorithms

In this section, we present algorithms to check perfect masking and to compute the QMS values.

[Algorithm 1: PMChecking]

4.1 Perfect Masking Verification

Given a program P with the sets of public (\(X_p\)), secret (\(X_k\)), random (\(X_r\)) and internal \((X_i)\) variables, PMChecking, given in Algorithm 1, checks whether P is perfectly masked or not. It iteratively traverses all the internal variables. For each variable \(x\in X_i\), it first applies the type system to infer its distribution type. If \(\vdash \mathcal {E}(x):\tau \) for \(\tau \ne \mathsf{UKD}\) is valid, then the result is conclusive. Otherwise, we will simplify the expression \(\mathcal {E}(x)\) and apply the type inference to \(\widehat{\mathcal {E}(x)}\).

If it fails to resolve the type of x and \(\varOmega (\widehat{\mathcal {E}(x)})\) does not exist, we apply the model-counting based (SMT-based or brute-force) approach outlined in Sect. 3.2, in particular, to check the expression \(\widehat{\mathcal {E}(x)}\). There are two possible outcomes: either \(\widehat{\mathcal {E}(x)}\) is \(\mathsf{SID}\) or \(\mathsf{SDD}\). We enforce \(\mathcal {E}(x)\) to have the same distributional type as \(\widehat{\mathcal {E}(x)}\) which might facilitate the inference for other expressions.

Theorem 3

P is perfectly masked iff \(\vdash \mathcal {E}(x):\mathsf{SDD}\) is not valid for any \(x\in X_i\), when Algorithm 1 terminates.

We remark that, if model-counting is disabled in Algorithm 1 and \(\mathsf{UKD}\)-typed variables are interpreted as potentially leaky, Algorithm 1 degenerates to a type inference procedure that is fast and potentially more accurate than the one in [3], owing to the optimizations introduced in Sect. 3.3.

4.2 QMS Computing

After applying Algorithm 1, each internal variable \(x\in X_i\) is endowed with a distributional type of either \(\mathsf{SID}\) (or \(\mathsf{RUD}\), which implies \(\mathsf{SID}\)) or \(\mathsf{SDD}\). In the former case, x is perfectly masked, meaning that side-channel attackers gain nothing from observing x. In the latter case, however, x becomes a side channel, and it is natural to ask how many power traces are required to infer the secret from x, for which QMS provides a measure.

Algorithm 2. QMSComputing.

QMSComputing, given in Algorithm 2, computes \(\mathtt{QMS}_x\) for each \(x\in X_i\). It first invokes the function PMChecking for perfect masking verification. For each \(\mathsf{SID}\)-typed variable \(x\in X_i\), we can directly infer that \(\mathtt{QMS}_x\) is 1. For each leaky variable \(x\in X_i\), we first check whether \(\widehat{\mathcal {E}(x)}\) uses any random variables. If it does not, we directly deduce that \(\mathtt{QMS}_x\) is 0. Otherwise, we use either brute-force enumeration or an SMT-based binary search to compute \(\mathtt{QMS}_x\). The former is straightforward and hence not presented in Algorithm 2. The latter is based on the fact that \(\mathtt{QMS}_x=\frac{i}{2^{n\cdot |\mathsf{RVar}(\widehat{\mathcal {E}(x)})|}}\) for some integer \(0\le i\le 2^{n\cdot |\mathsf{RVar}(\widehat{\mathcal {E}(x)})|}\). Hence the while-loop in Algorithm 2 executes at most \(\mathbf{O}(n\cdot |\mathsf{RVar}(\widehat{\mathcal {E}(x)})|)\) times for each x.
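To make the integer-numerator observation concrete, the following minimal sketch computes a QMS value both by enumeration and by binary search over the numerator. It rests on several assumptions that are not taken from Algorithm 2: QMS is taken to be one minus the largest gap, over pairs of secret values and output values, between the probabilities of observing that output; there is a single n-bit secret and no public input; and a brute-force predicate stands in for the SMT query. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def max_gap(expr, n, num_rand):
    """Largest difference, over secrets k1, k2 and values c, between the number
    of random assignments under which expr evaluates to c for k1 and for k2."""
    dom = range(2 ** n)
    hists = []
    for k in dom:  # one n-bit secret; public inputs are omitted in this sketch
        hist = {}
        for rands in product(dom, repeat=num_rand):
            c = expr(k, *rands)
            hist[c] = hist.get(c, 0) + 1
        hists.append(hist)
    values = set().union(*hists)
    return max(
        max(h.get(c, 0) for h in hists) - min(h.get(c, 0) for h in hists)
        for c in values
    )


def qms_brute_force(expr, n, num_rand):
    """QMS as an exact fraction with denominator 2^(n * num_rand)."""
    total = 2 ** (n * num_rand)
    return Fraction(total - max_gap(expr, n, num_rand), total)


def qms_binary_search(gap_at_least, m):
    """Find the largest i in [0, 2^m] for which gap_at_least(i) holds.
    gap_at_least is a monotone predicate standing in for the SMT query."""
    lo, hi, iterations = 0, 2 ** m, 0
    while lo < hi:
        mid = (lo + hi + 1) // 2
        iterations += 1
        if gap_at_least(mid):
            lo = mid
        else:
            hi = mid - 1
    return Fraction(2 ** m - lo, 2 ** m), iterations


n = 2
leaky = lambda k, r: k & r          # a toy leaky expression
gap = max_gap(leaky, n, 1)
qms, iters = qms_binary_search(lambda i: gap >= i, n * 1)
```

In this sketch the search interval has \(2^{m}+1\) candidates with \(m=n\cdot |\mathsf{RVar}|\), so the loop runs at most \(m+1\) times and returns an exact fraction rather than an approximation.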

Our SMT-based binary search for computing QMS values is different from the one proposed by Eldib et al. [26, 27]. Their algorithm considers Boolean programs only and computes QMS values by directly binary searching the QMS value q between 0 and 1 with a pre-defined step size \(\epsilon \) (\(\epsilon =0.01\) in [26, 27]). Hence, it only approximates the actual QMS value, and the binary search iterates \(\mathbf{O}(\log (\frac{1}{\epsilon }))\) times for each internal variable. Our approach works for more general arithmetic programs and computes the exact QMS value.

5 Practical Evaluation

We have implemented our methods in a tool named QMVerif, which uses Z3 [23] as the underlying SMT solver (fixed-size bit-vector theory). We conducted experiments on perfect masking verification and QMS computing for both Boolean and arithmetic programs. Our experiments were conducted on a server with 64-bit Ubuntu 16.04.4 LTS, Intel Xeon CPU E5-2690 v4, and 256 GB RAM.

5.1 Experimental Results on Boolean Programs

We use the benchmarks from the publicly available cryptographic software implementations of [25], which consist of 17 Boolean programs (P1-P17). We conducted experiments on P12-P17, which are the regenerations of MAC-Keccak reference code submitted to the SHA-3 competition held by NIST. (We skipped the tiny examples P1-P11, which can be verified in less than 1 second.) P12-P17 are transformed into programs in straight-line form.

Perfect Masking Verification. Table 1 shows the results of perfect masking verification on P12-P17, where Columns 2–4 show basic statistics, in particular, they respectively give the number of internal variables, leaky internal variables, and internal variables which required model-counting based reasoning. Columns 5–6 respectively show the total time of our tool QMVerif using SMT-based and brute-force methods. Column 7 shows the total time of the tool SCInfer [50].

Table 1. Results on masked Boolean programs for perfect masking verification.

We can observe that: (1) our reduction heuristics significantly improve performance compared with SCInfer [50] (generally 22–104 times faster for imperfectly masked programs; note that SCInfer is based on SMT model-counting), and (2) the performance of the SMT-based and brute-force counting methods for verifying perfect masking of Boolean programs is largely comparable.

Computing QMS. For comparison purposes, we implemented the algorithm of [24, 25] for computing QMS values of leaky internal variables. Table 2 shows the results of computing QMS values on P13-P17 (P12 is excluded because it does not contain any leaky internal variable). Column 2 shows the number of leaky internal variables, and Columns 3–7 show the total number of iterations in the binary search (cf. Sect. 4.2), the time, and the minimal, maximal and average QMS values obtained using the algorithm from [24, 25]. Similarly, Columns 8–13 show the statistics of our tool QMVerif; in particular, Column 9 (resp. Column 10) shows the time of the SMT-based (resp. brute-force) method. The time reported in Table 2 excludes the time used for perfect masking checking.

Table 2. Results of masked Boolean programs for computing QMS Values.
Table 3. Results of masked arithmetic programs, where P.M.V. denotes perfect masking verification, B.F. denotes brute-force, and 12 S.F. denotes that Z3 emitted a segmentation fault after verifying 12 internal variables.

We can observe that (1) the brute-force method outperforms the SMT-based method for computing QMS values, and (2) our tool QMVerif using SMT-based methods takes significantly fewer iterations and less time, as our binary search step depends on the number of bits of random variables rather than on a pre-defined value (e.g. 0.01) as used in [24, 25]. In particular, the QMS values of leaky variables whose expressions contain no random variables, e.g., in P13 and P17, do not need binary search.

5.2 Experimental Results on Arithmetic Programs

We collect arithmetic programs which represent non-linear functions of masked cryptographic software implementations from the literature. In Table 3, Column 1 lists the name of the functions under consideration, where \(k^{3},\ldots , k^{254}\) are buggy fragments of first-order secure exponentiation [47] without the first RefreshMask function; A2B and B2A are shorthand for ArithmeticToBoolean and BooleanToArithmetic, respectively. Columns 2–4 show basic statistics. For all the experiments, we set \(\mathbb {D}=\{0,\cdots ,2^8-1\}\).

Perfect Masking Verification. Columns 5–6 in Table 3 show the results of perfect masking verification on these programs using SMT-based and brute-force methods respectively.

We observe that (1) some \(\mathsf{UKD}\)-typed variables (e.g., in B2A [30], B2A [18] and Sbox [48], meaning that the type inference is inconclusive in these cases) are resolved (as \(\mathsf{SID}\)-type) by model-counting, and (2) on the programs (except B2A [18]) where model-counting based reasoning is required (i.e., \(\sharp \)Count is non-zero), the brute-force method is significantly faster than the SMT-based method. In particular, for programs \(k^{15},\ldots , k^{254}\), Z3 crashed with a segmentation fault after verifying 12 internal variables in 93 min, while the brute-force method comfortably returned the result. To further explain the performance of these two classes of methods, we manually examined these programs and found that the expressions of the variable typed \(\mathsf{UKD}\) by type inference in B2A [18] (where the SMT-based method is faster) only use exclusive-or (\(\oplus \)) operations and one subtraction (−) operation, while the expressions of the other \(\mathsf{UKD}\)-typed variables (where the brute-force method is faster) involve the finite field multiplication (\(\odot \)).

We remark that the transformation oracle and meta-theorems (cf. Sect. 3.3) are only used for A2B [30], by manually utilizing the equations of Theorem 3 in [30]. We have verified the correctness of those equations using SMT solvers. In theory, model-counting based reasoning could verify A2B [30]. However, in our experiments both the SMT-based and brute-force methods failed to terminate within 3 days, though the brute-force method had verified more internal variables. For instance, on the expression \(((2\times r_1) \oplus (x - r) \oplus r_1) \wedge r\), where x is a private input and \(r,r_1\) are random variables, Z3 did not terminate within 2 days, while the brute-force method verified it in a few minutes. We also tested the SMT solver Boolector [39] (the winner of SMT-COMP 2018 on QF-BV, Main Track), which crashed after running out of memory. Undoubtedly more systematic experiments are required in the future, but our results suggest that, contrary to common belief, SMT-based approaches are currently not promising, which calls for more scalable techniques.
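The brute-force check on this expression can be illustrated with a small enumeration. The sketch below is not the QMVerif implementation: the helper `is_sid` is hypothetical, public inputs are ignored, and the bit width is reduced to 4 bits (an assumption made purely to keep the enumeration small, whereas the experiments use \(\mathbb {D}=\{0,\cdots ,2^8-1\}\)).

```python
from itertools import product

N_BITS = 4                 # assumed width for illustration; the experiments use 8
MASK = 2 ** N_BITS - 1


def is_sid(expr, n, num_secret, num_rand):
    """Return True iff the distribution of expr over uniform random variables
    is identical for every assignment of the secret variables."""
    dom = range(2 ** n)
    reference = None
    for secrets in product(dom, repeat=num_secret):
        # Count how often each output value occurs over all random assignments.
        hist = [0] * (2 ** n)
        for rands in product(dom, repeat=num_rand):
            hist[expr(*secrets, *rands)] += 1
        if reference is None:
            reference = hist
        elif hist != reference:
            return False   # two secrets yield different distributions
    return True


def example(x, r, r1):
    # ((2 * r1) xor (x - r) xor r1) and r, with arithmetic modulo 2^N_BITS
    return (((2 * r1) & MASK) ^ ((x - r) & MASK) ^ r1) & r


result = is_sid(example, N_BITS, num_secret=1, num_rand=2)
```

The enumeration visits every combination of secret and random values, so its cost grows as \(2^{n\cdot (|X_k|+|X_r|)}\) per expression; at 4 bits this is 4096 evaluations for the expression above.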

Computing QMS. Columns 7–9 in Table 3 show the results of computing QMS values, where Column 7 (resp. Column 8) shows the time of the SMT-based (resp. brute-force) method for computing QMS values (excluding the time for perfect masking verification) and Column 9 shows QMS values of all leaky variables (note that duplicated values are omitted).

6 Conclusion

We have proposed a hybrid approach combining type inference and model-counting to verify masked arithmetic programs against first-order side-channel attacks. Type inference provides an efficient, lightweight procedure that resolves most observable variables, whereas model-counting provides completeness, bringing the best of both worlds. We also provided model-counting based methods to quantify the amount of information leakage via side channels. We have presented the tool QMVerif, which has been evaluated on standard cryptographic benchmarks. The experimental results showed that our method significantly outperformed state-of-the-art techniques in terms of both accuracy and scalability.

Future work includes further improving SMT-based model counting techniques, which currently provide no better, if not worse, performance than the naïve brute-force approach. Furthermore, generalizing the work in the current paper to the verification of higher-order masking schemes remains a very challenging task.