Quantitative Verification of Masked Arithmetic Programs against Side-Channel Attacks

Power side-channel attacks, which can deduce secret data via statistical analysis, have become a serious threat. Masking is an effective countermeasure for reducing the statistical dependence between secret data and side-channel information. However, designing masking algorithms is an error-prone process. In this paper, we propose a hybrid approach combining type inference and model-counting to verify masked arithmetic programs against side-channel attacks. The type inference provides an efficient, lightweight procedure to resolve most observable variables, whereas model-counting accounts for completeness. When the program is not perfectly masked, we also provide a method to quantify the security level of the program. We implement our methods in a tool, QMVerif, and evaluate it on cryptographic benchmarks. The experimental results show the effectiveness and efficiency of our approach.


Introduction
Side-channel attacks aim to infer secret data (e.g. cryptographic keys) by exploiting statistical dependence between secret data and non-functional properties such as execution time [33], power consumption [34], and electromagnetic radiation [46]. They have become a serious threat in application domains such as cyber-physical systems. As a typical example, the power consumption of a device executing the instruction c = p ⊕ k usually depends on the secret k, and this can be exploited via differential power analysis (DPA) [37] to deduce k.
Masking is one of the most widely used and effective countermeasures to thwart side-channel attacks. Masking is essentially a randomization technique for reducing the statistical dependence between secret data and side-channel information (e.g. power consumption). For example, using a Boolean masking scheme, one can mask the secret data k by applying the exclusive-or (⊕) operation with a random variable r, yielding the masked secret data k ⊕ r. It can be readily verified that the distribution of k ⊕ r is independent of the value of k when r is uniformly distributed. Besides Boolean masking schemes, there are other masking schemes such as additive masking schemes (e.g. (k + r) mod n) and multiplicative masking schemes (e.g. (k × r) mod n). A variety of masked implementations, such as AES and its non-linear components (S-boxes), have been published over the years. However, designing effective and efficient masking schemes is still a notoriously difficult task, especially for non-linear functions. This has motivated a large amount of work on verifying whether masked implementations, as either (hardware) circuits or (software) programs, are statistically independent of secret inputs. Typically, masked hardware implementations are modeled as (probabilistic) Boolean programs where all variables range over the Boolean domain (i.e. GF(2)), while masked software implementations, featuring a richer set of operations, need to be modeled as (probabilistic) arithmetic programs.
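The independence claim for Boolean masking can be checked by exhaustive enumeration on a small domain. The following is a minimal sketch, assuming a 4-bit word size and hypothetical helper names; it contrasts k ⊕ r with a deliberately flawed "mask" k ∧ r, whose distribution still depends on k.

```python
from collections import Counter

N_BITS = 4                      # small illustrative word size (assumption)
DOMAIN = range(2 ** N_BITS)

def distribution(op, k):
    """Distribution of op(k, r) when r is uniform over DOMAIN."""
    counts = Counter(op(k, r) for r in DOMAIN)
    return {v: counts[v] / len(DOMAIN) for v in DOMAIN}

# Boolean masking: k XOR r
xor_dists = [distribution(lambda k, r: k ^ r, k) for k in DOMAIN]
# A flawed candidate "mask": k AND r
and_dists = [distribution(lambda k, r: k & r, k) for k in DOMAIN]

# The XOR distributions coincide for all secrets; the AND ones do not.
all_xor_equal = all(d == xor_dists[0] for d in xor_dists)
all_and_equal = all(d == and_dists[0] for d in and_dists)
```

For the XOR case every value of the domain is hit by exactly one r, so each distribution is uniform regardless of k.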
Verification techniques for masking schemes can be roughly classified into type system based approaches [38,3,4,5,16,14,19] and model-counting based approaches [25,24,50]. The basic idea of type system based approaches is to infer a distribution type for observable variables in the program that are potentially exposed to attackers. From the type information one may be able to show that the program is secure. This class of approaches is generally very efficient mainly because of their static analysis nature. However, they may give inconclusive answers as most existing type systems do not provide completeness guarantees.
Model-counting based approaches, unsurprisingly, encode the verification problem as a series of model-counting problems, and typically leverage SAT/SMT solvers. The main advantage of this approach is its completeness guarantee. However, the size of the SMT formula is exponential in the number of (bits of) random variables used in masking, which poses a great challenge to scalability. We mention that, within this category, some work further exploits Fourier analysis [11,15], which considers the Fourier expansion of the Boolean functions. The verification problem can then be reduced to checking whether certain coefficients of the Fourier expansion are zero or not. Although there is no hurdle in principle, to the best of our knowledge, model-counting based approaches are currently limited to Boolean programs only.
While verification of masking for Boolean programs is well-studied [25,24,50], generalizing it to arithmetic programs brings additional challenges. First of all, arithmetic programs admit more operations which are absent from Boolean programs. A typical example is field multiplication. In the Boolean domain, it is nothing more than the logical AND operator. However, for GF($2^n$) (typically n = 8 in cryptographic algorithm implementations), the operation is nontrivial, which rules out many optimizations that would otherwise be useful for Boolean domains. Second, verification of arithmetic programs often suffers from serious scalability issues, especially when model-counting based approaches are applied. We note that transforming arithmetic programs into equivalent Boolean versions is theoretically possible, but suffers from several deficiencies: (1) one has to encode complicated arithmetic operations (e.g. finite field multiplication) as bitwise operations; (2) the resulting Boolean program needs to be checked against high-order attacks, in which the attacker is assumed to make multiple observations simultaneously. This is a far more difficult problem. Because of this, we believe such an approach is practically unfavourable, if not infeasible.
Perfect masking is ideal, but it does not necessarily hold when there are flaws or when only a limited number of random variables is allowed for efficiency reasons. When the program is not perfectly masked (i.e., a potential side channel does exist), one naturally wants to tell how severe the leak is. For instance, one possible measure is the resource the attacker needs to invest in order to infer the secret from the side channel. For this purpose, we adapt the notion of Quantitative Masking Strength, for which a correlation with the number of power traces needed to successfully infer secret data has been established empirically [26,27].
Main contributions. We mainly focus on the verification of masked arithmetic programs. We advocate a hybrid verification method combining type system based and model-counting based approaches, and provide additional quantitative analysis. We summarize the main contributions as follows.
- We provide a hybrid approach which integrates type system based and model-counting based approaches into one framework, and supports sound and complete reasoning about masked arithmetic programs.
- We provide a quantitative analysis for the case where the masking is not effective, to calculate a quantitative measure of the information leakage.
- We provide various heuristics and optimized algorithms to significantly improve the scalability of previous approaches.
- We implement our approaches in a software tool and provide thorough evaluations. Our experiments show orders of magnitude of improvement with respect to previous verification methods on common benchmarks.
One of the advantages of our approaches is their simplicity, which renders them amenable to implementation and easily extensible to other settings. We also find, perhaps surprisingly, that for model-counting, the widely adopted approaches based on SMT solvers (e.g. [25,24,50]) may not be the best choice, as our experiments suggest that an alternative brute-force approach is comparable for Boolean programs, and significantly outperforms them for arithmetic programs.
Related work. The d-threshold probing model is the de facto standard leakage model for formal verification of masked programs against order-d power side-channel attacks [32]. This paper focuses on the case d = 1. Other models like the noise leakage model [17,45], the bounded moment model [6], and the threshold probing model with transitions/glitches [20,15] could be reduced to the threshold probing model, at the cost of introducing higher orders [3]. Other work on side channels such as execution time, faults, and cache does exist ([33,1,2,12,28,7,8,31] to cite a few), but is orthogonal to our work.
Type systems have been widely used in the verification of side-channel attacks, with early work [38,9] providing masking compilers that transform an input program into a functionally equivalent program that is resistant to first-order DPA. However, these systems either are limited to certain operations (i.e., ⊕ and table look-up), or suffer from unsoundness and incompleteness under the threshold probing model. To support verification of high-order masking, Barthe et al. introduced the notions of noninterference (NI, [3]) and strong t-noninterference (SNI, [4]), which were extended to give a unified framework for both software and hardware implementations in maskVerif [5]. Further work along this line includes improvements for efficiency [14,19], generalization for assembly-level code [15], and extensions with glitches for hardware programs [29]. As mentioned earlier, these approaches are incomplete, i.e., secure programs may fail to pass their verification.
[25,24] proposed a model-counting based approach for Boolean programs by leveraging SMT solvers, which is complete but limited in scalability. To improve efficiency, a hybrid approach integrating type-based and model-counting based approaches [25,24] was proposed in [50], which is similar to the current work in spirit. However, it is limited to Boolean programs and qualitative analysis only. [26,27] extended the approach of [25,24] to quantitative analysis, but this extension is also limited to Boolean programs. The current work not only extends the applicability but also achieves significant improvement in efficiency even for Boolean programs (cf. Section 5). We also find that solving model-counting via SMT solvers [24,50] may not be the best approach, in particular for arithmetic programs.
Furthermore, we mention that masking synthesis has recently been proposed [28,4] to transform an input program into a functionally equivalent, perfectly masked one. This technique is based on perfect masking verification [25,24,3].
Our work is also related to quantitative information flow (QIF) [35,44,49,43,13], which leverages notions from information theory (typically Shannon entropy and mutual information) to measure the flow of information in programs. The QIF framework has also been specialized to side-channel analysis [42,41,36]. The main differences are as follows. First, QIF targets fully-fledged programs (including branching and loops), so program analysis techniques (e.g. symbolic execution) are needed, while we deal with more specialized (transformed) masked programs in straight-line form. Second, to measure the information leakage quantitatively, our measure is based on the notion of QMS, which is correlated with the number of power traces needed to successfully infer the secret, while QIF is based on a more general, information-theoretic notion. Third, for calculating such a measure, both lines of work rely on model-counting. In QIF, the constraints over the input are usually linear, but the constraints in our setting involve arithmetic operations in rings and fields. Randomized approximation schemes can be exploited in QIF [36,13], but they are not suitable in our setting. Moreover, we mention that in QIF, input variables should in principle be partitioned into public and private variables, and the former need to be existentially quantified. This was briefly mentioned in, e.g., [36], but without implementation.

Preliminaries
Let us fix a bounded integer domain $D = \{0, \dots, 2^n - 1\}$, where $n$ is a fixed positive integer. Bit-wise operations are defined over $D$, but we shall also consider arithmetic operations over $D$: the operations $+$, $-$, $\times$ modulo $2^n$, for which $D$ is considered to be a ring, and the Galois field multiplication $\odot$, for which $D$ is isomorphic to $\mathrm{GF}(2)[x]/(p(x))$ (or simply $\mathrm{GF}(2^n)$) for some irreducible polynomial $p$. For instance, in AES one normally uses $\mathrm{GF}(2^8)$ and $p(x) = x^8 + x^4 + x^3 + x + 1$.
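To make the two multiplications over $D$ concrete, here is a minimal sketch of ring multiplication modulo $2^n$ and of Galois field multiplication by shift-and-reduce. The function names are hypothetical, and the reduction polynomial is a parameter; its default `0x11B` encodes $x^8 + x^4 + x^3 + x + 1$, the polynomial fixed by the AES specification (FIPS 197).

```python
def ring_mul(a, b, n=8):
    """Multiplication in the ring of integers modulo 2^n."""
    return (a * b) % (1 << n)

def gf_mul(a, b, n=8, poly=0x11B):
    """Multiplication in GF(2^n) = GF(2)[x]/(p(x)).

    Elements are integers whose bit i is the coefficient of x^i;
    `poly` encodes p(x) in the same way (including the x^n term).
    """
    result = 0
    for _ in range(n):
        if b & 1:               # add a * x^i when bit i of b is set
            result ^= a
        b >>= 1
        a <<= 1                 # multiply a by x
        if a & (1 << n):        # reduce modulo p(x) when degree reaches n
            a ^= poly
    return result
```

The two operations generally disagree on the same operands, which is one reason arithmetic programs cannot reuse optimizations designed for the Boolean AND.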

Cryptographic Programs
We focus on programs written in C-like code that implement cryptographic algorithms such as AES, as opposed to arbitrary software programs. To analyze such programs, it is common to assume that they are given in straight-line form (i.e., branching-free) over $D$ [24,3]. The syntax of the programs under consideration is given as follows, where $c \in D$.

Statement: $\mathit{stmt} ::= x \leftarrow e \mid \mathit{stmt}; \mathit{stmt}$
Program: $P(X_p, X_k, X_r) ::= \mathit{stmt};\ \mathbf{return}\ x_1, \dots, x_m;$

A program $P$ consists of a sequence of assignments followed by a return statement. An assignment $x \leftarrow e$ assigns the value of the expression $e$ to the variable $x$, where $e$ is built up from a set of variables and constants using (1) the bit-wise operations negation (¬), and (∧), or (∨), exclusive-or (⊕), left shift (≪) and right shift (≫); (2) the modulo $2^n$ arithmetic operations addition (+), subtraction (−) and multiplication (×); and (3) finite-field multiplication (⊙) over $\mathrm{GF}(2^n)$. We denote by $O^*$ the extended set $O \cup \{\ll, \gg\}$ of operations.

Given a program $P$, let $X = X_p \uplus X_k \uplus X_i \uplus X_r$ denote the set of variables used in $P$, where $X_p$, $X_k$ and $X_i$ respectively denote the sets of public input, private input and internal variables, and $X_r$ denotes the set of (uniformly distributed) random variables for masking private variables. We assume that the program is given in static single assignment (SSA) form and that each expression uses at most one operator. (One can easily transform an arbitrary straight-line program into an equivalent one satisfying these conditions.) For each assignment $x \leftarrow e$ in $P$, the computation $E(x)$ of $x$ is an expression obtained from $e$ by iteratively replacing all the occurrences of the internal variables in $e$ by their defining expressions in $P$. SSA form guarantees that $E(x)$ is well-defined.
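The construction of $E(x)$ by substitution can be sketched as follows. This is an illustrative encoding, not the tool's data structure: expressions are assumed to be integers (constants), strings (variables) or tuples `(op, arg, ...)`, and the three-line program is a hypothetical example rather than the program of Fig. 1.

```python
def computations(program, inputs):
    """Map every assigned variable x to its computation E(x).

    `program` is a list of SSA assignments (target, expr) in program order;
    `inputs` is the set of public, private and random variable names.
    """
    E = {}

    def expand(e):
        if isinstance(e, int):          # constant
            return e
        if isinstance(e, str):          # input variable or internal variable
            return e if e in inputs else E[e]
        op, *args = e                   # operator applied to sub-expressions
        return (op, *[expand(a) for a in args])

    for target, expr in program:
        E[target] = expand(expr)        # SSA: every internal use is already defined
    return E

# Hypothetical straight-line program: x = k ^ r0; y = x (.) x; z = y ^ r1
example_program = [
    ("x", ("xor", "k", "r0")),
    ("y", ("gfmul", "x", "x")),
    ("z", ("xor", "y", "r1")),
]
example_E = computations(example_program, {"k", "r0", "r1"})
```

Because each internal variable is defined before it is used, a single forward pass suffices.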

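Semantics. A valuation is a function $\sigma : X_p \cup X_k \to D$ assigning a value to each public and private input variable. We write $\Theta$ for the set of all valuations.
Given an expression $e$ in terms of $X_p \cup X_k \cup X_r$ and a valuation $\sigma \in \Theta$, we denote by $e(\sigma)$ the expression obtained from $e$ by replacing all the occurrences of variables $x \in X_p \cup X_k$ by their values $\sigma(x)$, and denote by $\llbracket e \rrbracket_\sigma$ the distribution of $e$ (with respect to the uniform distribution of the random variables that $e(\sigma)$ may contain). Concretely, $\llbracket e \rrbracket_\sigma(v)$ is the probability of the expression $e(\sigma)$ being evaluated to $v$, for each $v \in D$. For each variable $x \in X$ and valuation $\sigma \in \Theta$, we denote by $\llbracket x \rrbracket_\sigma$ the distribution $\llbracket E(x) \rrbracket_\sigma$. The semantics of the program $P$ is defined as a (partial) function $\llbracket P \rrbracket$ which takes a valuation $\sigma \in \Theta$ and an internal variable $x \in X_i$ as inputs, and returns the distribution $\llbracket x \rrbracket_\sigma$ of $x$.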

Threat Models and Security Notions
We assume that the adversary has access to the public input $X_p$, but not to the private input $X_k$ or the random variables $X_r$, of a program $P$. However, the adversary may have access to an internal variable $x \in X_i$ via side-channel information. Under these assumptions, the goal of the adversary is to deduce information about $X_k$.
The following property is straightforward. Note that its converse does not hold in general.
Proposition 1. If the program $P$ is $x$-UF, then $P$ is $x$-SI.
Fig. 1. A buggy version of the cubing algorithm from [47].
Definition 2. For a program $P$, a variable $x$ is perfectly masked (a.k.a. secure under the 1-threshold probing model [32]) if $P$ is $x$-SI. $P$ is perfectly masked if all internal variables in $P$ are perfectly masked.

Quantitative Masking Strength
When a program is not perfectly masked, it is important to quantify how secure it is. For this purpose, we adapt the notion of Quantitative Masking Strength (QMS) from [26,27] to quantify the strength of masking countermeasures.
Definition 3. The quantitative masking strength of a variable $x \in X_i$ is $\mathrm{QMS}_x := 1 - \max_{(\sigma_1,\sigma_2) \in \Theta^2_{=X_p},\, c \in D} \big( \llbracket x \rrbracket_{\sigma_1}(c) - \llbracket x \rrbracket_{\sigma_2}(c) \big)$. The quantitative masking strength of the program $P$ is defined accordingly.
The notion of QMS generalizes that of perfect masking, i.e., $P$ is $x$-SI iff $\mathrm{QMS}_x = 1$. The importance of QMS has been highlighted in [26,27], where it is empirically shown that, for Boolean programs, the number of power traces needed to determine the secret key is exponential in the QMS value. This study suggests that computing accurate QMS values for leaky variables is highly desirable.
Example 1. Let us consider the program in Fig. 1, which implements a buggy cubing algorithm in $\mathrm{GF}(2^8)$ from [47]. Given a secret key $k$, to avoid first-order side-channel attacks, $k$ is masked by a random variable $r_0$, leading to two shares $x = k \oplus r_0$ and $r_0$. Cube($k, r_0, r_1$) returns two shares $x_7$ and $x_9$ such that $x_7 \oplus x_9 = k^3$. After obtaining the shares $(x_0, x_1)$ of $k^2$, it computes $k^3 = k^2 \odot k$ by a secure multiplication of the two pairs of shares $(x_0, x_1)$ and $(x, r_0)$ using the random variable $r_1$ (Lines 5-12). However, this program is vulnerable to first-order side-channel attacks. As shown in [47], one should refresh $(x_0, x_1)$ before computing $k^2 \odot k$ by inserting $x_0 = x_0 \oplus r_2$ and $x_1 = x_1 \oplus r_2$ after Line 4, where $r_2$ is a random variable. We use this buggy version as a running example to illustrate our techniques.
As setup, we have: The computations E(·) of internal variables are:

Three Key Techniques
In this section, we introduce three key techniques: type system, model-counting based reasoning and reduction techniques, which will be used in our algorithm.

Type System
We present a type system for formally inferring distribution types of internal variables, inspired by prior work [40,3,14,50]. We start with some basic notations.

Definition 4 (Dominant variables).
Given an expression e, a random variable r is called a dominant variable of e if the following two conditions hold: (i) r occurs in e exactly once, and (ii) each operator on the path between the leaf r and the root in the abstract syntax tree of e satisfies that it is either from {×, ⊙} and one of its children is a non-zero constant or from {⊕, ¬, +, −}.
Remark that in Definition 4, for efficiency consideration, we take a purely syntactic approach meaning that we do not simplify e when checking the condition (i) that r occurs exactly once. For instance, x is not a dominant variable in ((x ⊕ y) ⊕ x) ⊕ x, although intuitively e is equivalent to y ⊕ x.
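The purely syntactic check of Definition 4 can be sketched as below. The encoding is an assumption made for illustration: integers are constants, strings are variables, and tuples `(op, arg, ...)` are operator applications with hypothetical operator names.

```python
LINEAR_OPS = {"xor", "not", "add", "sub"}   # the set {xor, neg, +, -} of Definition 4
MUL_OPS = {"mul", "gfmul"}                  # the set {x, (.)} of Definition 4

def occurrences(e, r):
    """Number of syntactic occurrences of variable r in expression e."""
    if isinstance(e, str):
        return int(e == r)
    if isinstance(e, int):
        return 0
    return sum(occurrences(a, r) for a in e[1:])

def path_is_allowed(e, r):
    """Check condition (ii) along the path from the root of e to the leaf r."""
    if e == r:
        return True
    if not isinstance(e, tuple):
        return False
    op, *args = e
    holders = [a for a in args if occurrences(a, r) > 0]
    if len(holders) != 1:
        return False
    if op in LINEAR_OPS:
        allowed = True
    elif op in MUL_OPS:
        # Literal reading of the definition: some child is a non-zero constant.
        allowed = any(isinstance(a, int) and a != 0 for a in args)
    else:
        allowed = False
    return allowed and path_is_allowed(holders[0], r)

def is_dominant(e, r):
    """r is dominant in e: it occurs exactly once and its path is allowed."""
    return occurrences(e, r) == 1 and path_is_allowed(e, r)

# The expression ((x ^ y) ^ x) ^ x from the remark: x occurs three times.
remark_expr = ("xor", ("xor", ("xor", "x", "y"), "x"), "x")
```

No algebraic simplification is attempted, so `x` is rejected in `remark_expr` even though the expression is equivalent to y ⊕ x.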
The set of distribution types is $T = \{\mathrm{RUD}, \mathrm{SID}, \mathrm{SDD}, \mathrm{UKD}\}$, where RUD is a subtype of SID (cf. Proposition 1).
Type judgements, as usual, are defined in the form of ⊢ e : τ, where e is an expression in terms of $X_r \cup X_k \cup X_p$, and τ ∈ T denotes the distribution type of e. A type judgement ⊢ e : RUD (resp. ⊢ e : SID and ⊢ e : SDD) is valid iff P is x-UF (resp. x-SI and not x-SI) for all variables x such that E(x) = e. A sound proof system for deriving valid type judgements for expressions is given in Fig. 2. Rule (Dom) states that an expression e containing some dominant variable has type RUD (cf. Proposition 2). Rule (Com) captures the commutative law of operators ⋆ ∈ O. Rules ($\mathrm{Ide}_i$) for i = 1, 2, 3, 4 are straightforward.
Rule (NoKey) states that expression e has type SID if e does not use any private input. Rule (Key) states that each private input has type SDD.
Rule ($\mathrm{Sid}_1$) states that an expression $e_1 \bullet e_2$ for $\bullet \in \{\wedge, \vee, \odot, \times\}$ has type SID if both $e_1$ and $e_2$ have type RUD, and $e_1$ has a dominant variable $r$ which is not used by $e_2$. Indeed, $e_1 \bullet e_2$ can be seen as $r \bullet e_2$; then, for each valuation $\eta \in \Theta$, the distributions of $r$ and $e_2(\eta)$ are independent. Rule ($\mathrm{Sid}_2$) states that an expression $e_1 \bullet e_2$ for $\bullet \in O^*$ has type SID if both $e_1$ and $e_2$ have type SID (or its subtype RUD), and the sets of random variables used by $e_1$ and $e_2$ are disjoint. Likewise, for each valuation $\eta \in \Theta$, the distributions of $e_1(\eta)$ and $e_2(\eta)$ are independent.
Rule (Sdd) states that an expression $e_1 \bullet e_2$ for $\bullet \in \{\wedge, \vee, \odot, \times\}$ has type SDD if $e_1$ has type SDD, $e_2$ has type RUD, and $e_2$ has a dominant variable $r$ which is not used by $e_1$. Intuitively, $e_1 \bullet e_2$ can be safely seen as $e_1 \bullet r$.
Finally, if no rule is applicable to an expression e, then e has the unknown distribution type UKD. Such a type is needed because our type system is, by design, incomplete. However, we expect, and demonstrate empirically, that for cryptographic programs most internal variables have a definitive type other than UKD. As we will show later, to resolve UKD-typed variables, one can resort to model-counting (cf. Section 3.2).

Model-Counting based Reasoning
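Recall that for $x \in X_i$, $\mathrm{QMS}_x := 1 - \max_{(\sigma_1,\sigma_2) \in \Theta^2_{=X_p},\, c \in D} \big( \llbracket x \rrbracket_{\sigma_1}(c) - \llbracket x \rrbracket_{\sigma_2}(c) \big)$, where $\Theta^2_{=X_p}$ denotes the pairs of valuations that agree on $X_p$. To compute $\mathrm{QMS}_x$, one naïve approach is to use brute force to enumerate all possible valuations $\sigma$ and then to compute the distributions $\llbracket x \rrbracket_\sigma$, again by enumerating the assignments of random variables. This approach is exponential in the number of (bits of) variables in $E(x)$.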
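The naïve enumeration can be sketched as follows. This is an illustration under stated assumptions, not QMVerif's implementation: the computation $E(x)$ is passed as a Python callable over an environment dictionary, the function name is hypothetical, and the examples use a 2-bit domain to keep the enumeration tiny.

```python
from fractions import Fraction
from itertools import product

def qms_brute_force(expr, public, private, randoms, domain):
    """QMS of an expression by exhaustive enumeration.

    Computes 1 - max over pairs of valuations agreeing on `public`, and over
    values c, of the difference of the probabilities of observing c.
    """
    n_rand = len(domain) ** len(randoms)
    worst = Fraction(0)                       # the pair (s, s) always gives 0
    for pub in product(domain, repeat=len(public)):
        dists = []                            # one histogram per private valuation
        for key in product(domain, repeat=len(private)):
            env = dict(zip(public, pub))
            env.update(zip(private, key))
            counts = {}
            for rnd in product(domain, repeat=len(randoms)):
                env.update(zip(randoms, rnd))
                v = expr(env)
                counts[v] = counts.get(v, 0) + 1
            dists.append(counts)
        for d1 in dists:
            for d2 in dists:
                for c in d1:                  # only values hit by d1 can be positive
                    worst = max(worst, Fraction(d1[c] - d2.get(c, 0), n_rand))
    return 1 - worst

D2 = range(4)                                 # 2-bit domain (assumption)
qms_masked = qms_brute_force(lambda e: e["k"] ^ e["r"], [], ["k"], ["r"], D2)
qms_leaky = qms_brute_force(lambda e: e["k"] & e["r"], [], ["k"], ["r"], D2)
qms_plain = qms_brute_force(lambda e: e["k"], [], ["k"], [], D2)
```

Exact rationals (`Fraction`) are used so that a result of 1 can be compared without floating-point tolerance.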
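Another approach is to lift the SMT-based approach [26,27] from the Boolean setting to the arithmetic one. We first consider a "decision" version of the problem, i.e., checking whether $\mathrm{QMS}_x \geq q$ for a given rational number $q \in [0, 1]$. It is not difficult to observe that $\mathrm{QMS}_x \geq q$ holds iff the following logic formula is unsatisfiable:

$\exists (\sigma_1, \sigma_2) \in \Theta^2_{=X_p}.\ \exists c \in D.\ \sharp(c = \llbracket x \rrbracket_{\sigma_1}) - \sharp(c = \llbracket x \rrbracket_{\sigma_2}) > \Delta^q_x \qquad (1)$

where $\sharp(c = \llbracket x \rrbracket_{\sigma_1})$ and $\sharp(c = \llbracket x \rrbracket_{\sigma_2})$ respectively denote the number of satisfying assignments of $c = \llbracket x \rrbracket_{\sigma_1}$ and $c = \llbracket x \rrbracket_{\sigma_2}$, $\Delta^q_x = (1 - q) \times 2^m$, and $m$ is the number of bits of random variables in $E(x)$.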
We further encode (1) as a (quantifier-free) first-order formula $\Psi^q_x$ to be solved by an off-the-shelf SMT solver (e.g. Z3 [23]). In this encoding, for each assignment $f : \mathrm{RVar}(E(x)) \to D$ of the random variables, $\Theta_f$ encodes the computation $E(x)$ into a logical formula with each occurrence of a random variable $r \in \mathrm{RVar}(E(x))$ being replaced by its value $f(r)$, where $c_f$ is a fresh variable. There are $|D|^{|\mathrm{RVar}(E(x))|}$ distinct copies, but they share the same $X_p$ and $X_k$. $\Theta'_f$ is similar to $\Theta_f$ except that all variables $k \in X_k$ and $c_f$ are replaced by fresh variables $k'$ and $c'_f$ respectively.
Theorem 2. $\Psi^q_x$ is unsatisfiable iff $\mathrm{QMS}_x \geq q$, and the size of $\Psi^q_x$ is polynomial in $|P|$ and exponential in $|\mathrm{RVar}(E(x))|$ and $|D|$.
Based on Theorem 2, we present an algorithm for computing $\mathrm{QMS}_x$ in Section 4.2. Note that the qualitative variant of $\Psi^q_x$ (i.e. $q = 1$) can be used to decide whether $x$ is statistically independent by checking whether $\mathrm{QMS}_x = 1$ holds. This will be used in Algorithm 1.
Example 3. By applying the model-counting based reasoning to the program in Fig. 1, we can conclude that $x_6$ is perfectly masked, while $x_2$ and $x_3$ are leaky. This cannot be done by our type system or the ones in [3,4].

Reduction Heuristics
In this section, we provide various heuristics to reduce the size of formulae. These can be applied both to type inference and to model-counting based reasoning. Given an expression $e$, if $e'$ is an $r$-dominated sub-expression of $e$ and $r$ does not occur elsewhere in $e$, then it is safe to replace each occurrence of $e'$ in $e$ by the random variable $r$. Intuitively, $e'$ as a whole can be seen as a random variable when evaluating $e$. Besides this elimination, we also allow adding meta-theorems specifying forms of sub-expressions $e'$ that can be replaced by a fresh variable. For instance, $r \oplus ((2 \times r) \wedge e'')$ in $e$, when the random variable $r$ does not appear elsewhere, can be replaced by the random variable $r$.
Transformation Oracle. We suppose there is an oracle $\Omega$ which, whenever possible, transforms an expression $e$ into an equivalent expression $\Omega(e)$ such that type inference (possibly with the above heuristics) can give a non-UKD type to $\Omega(e)$. Such a transformation is required for only one program in our experiments.
Let $\widehat{e}$ denote the expression obtained by applying the above heuristics (excluding the transformation oracle) to the expression $e$.
Lemma 1. $E(x)(\sigma)$ and $\widehat{E(x)}(\sigma)$ have the same distribution for any $\sigma \in \Theta$.
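The meta-theorem example of this section can be sanity-checked by enumeration: for every fixed value of $e''$, the map $r \mapsto r \oplus ((2 \times r) \wedge e'')$ is a bijection on $D$, so the sub-expression is uniform whenever $r$ is. The sketch below assumes a 4-bit domain and treats $e''$ as a constant `c`, which is legitimate only because $r$ does not occur in it; all names are illustrative.

```python
N_BITS = 4                      # illustrative word size (assumption)
SIZE = 2 ** N_BITS

def is_bijection(f):
    """True if f maps {0, ..., SIZE-1} one-to-one onto itself."""
    return sorted(f(r) for r in range(SIZE)) == list(range(SIZE))

def meta_example(r, c):
    """r XOR ((2*r mod 2^n) AND c), with c standing for the value of e''."""
    return r ^ (((2 * r) % SIZE) & c)

# Holds for every value of e'': the sub-expression behaves like a fresh r.
meta_ok = all(is_bijection(lambda r, c=c: meta_example(r, c)) for c in range(SIZE))

# Contrast: r AND c is not a bijection for most c, so no such replacement applies.
and_ok = all(is_bijection(lambda r, c=c: r & c) for c in range(SIZE))
```

Bit 0 of the result equals bit 0 of `r`, and each higher bit depends only on the same and lower bits of `r`, which is why the map can always be inverted.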

Perfect Masking Verification
Given a program $P$ with the sets of public ($X_p$), secret ($X_k$), random ($X_r$) and internal ($X_i$) variables, PMChecking, given in Algorithm 1, checks whether $P$ is perfectly masked or not. It iteratively traverses all the internal variables. For each variable $x \in X_i$, it first applies the type system to infer its distribution type. If $\vdash E(x) : \tau$ is valid for some $\tau \neq \mathrm{UKD}$, then the result is conclusive. Otherwise, we simplify the expression $E(x)$ and apply the type inference to $\widehat{E(x)}$.
If this fails to resolve the type of $x$ and $\Omega(\widehat{E(x)})$ does not exist, we apply the model-counting based (SMT-based or brute-force) method outlined in Section 3.2 to check the expression $\widehat{E(x)}$. There are two possible outcomes: $\widehat{E(x)}$ is either SID or SDD. We enforce $E(x)$ to have the same distribution type as $\widehat{E(x)}$, which might facilitate the inference for other expressions.
Theorem 3. $P$ is perfectly masked iff $\vdash E(x) : \mathrm{SDD}$ is not valid for any $x \in X_i$, when Algorithm 1 terminates.
We remark that, if the model-counting is disabled in Algorithm 1 where UKD-typed variables are interpreted as potentially leaky, Algorithm 1 would degenerate to a sound type inference procedure that is fast and potentially more accurate than the one in [3], owing to the optimization introduced in Section 3.3.
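The control flow described in this section can be sketched as a small driver. This is not Algorithm 1 itself: the type inference, the reduction heuristics and the model-counting check are passed in as hypothetical callbacks, the transformation oracle is omitted, and the stub callbacks at the end exist only to exercise the flow.

```python
def pm_checking(computations, infer_type, simplify, is_independent):
    """Hybrid check: type inference first, model counting only as a fallback.

    computations   : dict mapping each internal variable x to E(x)
    infer_type     : expression -> "RUD" | "SID" | "SDD" | "UKD"
    simplify       : expression -> reduced expression (reduction heuristics)
    is_independent : expression -> bool, decided by model counting
    """
    types = {}
    for x, e in computations.items():
        t = infer_type(e)
        if t == "UKD":                       # inconclusive: simplify and retry
            e_hat = simplify(e)
            t = infer_type(e_hat)
            if t == "UKD":                   # still inconclusive: count models
                t = "SID" if is_independent(e_hat) else "SDD"
        types[x] = t
    perfectly_masked = all(t != "SDD" for t in types.values())
    return perfectly_masked, types

# Stub callbacks over opaque expression labels (purely illustrative).
stub_E = {"x1": "typed", "x2": "needs_count_leaky", "x3": "needs_count_safe"}
stub_masked, stub_types = pm_checking(
    stub_E,
    infer_type=lambda e: "RUD" if e == "typed" else "UKD",
    simplify=lambda e: e,
    is_independent=lambda e: e == "needs_count_safe",
)
```

Disabling the fallback and reporting UKD as potentially leaky would give the purely type-based variant mentioned in the remark above.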

QMS Computing
After applying Algorithm 1, each internal variable $x \in X_i$ is endowed with a distribution type of either SID (or RUD, which implies SID) or SDD. In the former case, $x$ is perfectly masked, meaning that a side-channel attacker gains nothing by observing $x$. In the latter case, however, $x$ becomes a side channel, and it is natural to ask how many power traces are required to infer secret data from $x$, for which we have provided a measure formalized via QMS.
QMSComputing, given in Algorithm 2, computes $\mathrm{QMS}_x$ for each $x \in X_i$. It first invokes the function PMChecking for perfect masking verification. For each SID-typed variable $x \in X_i$, we can directly infer that $\mathrm{QMS}_x$ is 1. For each leaky variable $x \in X_i$, we first check whether $\widehat{E(x)}$ uses any random variables or not. If it does not use any random variables, we directly deduce that $\mathrm{QMS}_x$ is 0. Otherwise, we use either the brute-force enumeration or an SMT-based binary search to compute $\mathrm{QMS}_x$. The former is trivial, hence not presented in Algorithm 2. The latter is based on the fact that $\mathrm{QMS}_x = i / 2^{n \times |\mathrm{RVar}(\widehat{E(x)})|}$ for some integer $0 \leq i \leq 2^{n \times |\mathrm{RVar}(\widehat{E(x)})|}$. Hence the while-loop in Algorithm 2 executes at most $O(n \times |\mathrm{RVar}(\widehat{E(x)})|)$ times for each $x$.
Our SMT-based binary search for computing QMS values is different from the one proposed by Eldib et al. [26,27]. Their algorithm considers Boolean programs only and computes QMS values by directly binary searching the QMS value $q$ between 0 and 1 with a pre-defined step size $\epsilon$ ($\epsilon = 0.01$ in [26,27]). Hence, it only approximates the actual QMS value, and the binary search iterates $O(\log(1/\epsilon))$ times for each internal variable. Our approach works for more general arithmetic programs and computes the exact QMS value.
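The exact binary search can be sketched as a search over the integer numerator $i$ rather than over real-valued thresholds. The following is an illustration in the spirit of the description above, not Algorithm 2 itself: the decision oracle `qms_at_least` is a hypothetical stand-in for the SMT query of Section 3.2 (it should return True iff $\mathrm{QMS}_x \geq q$).

```python
from fractions import Fraction

def qms_binary_search(qms_at_least, n_bits, n_randoms):
    """Find the exact QMS value i / 2^(n_bits * n_randoms).

    Returns the value together with the number of oracle calls made.
    """
    denom = 2 ** (n_bits * n_randoms)
    low, high = 0, denom          # invariant: low/denom <= QMS <= high/denom
    calls = 0
    while low < high:
        mid = (low + high + 1) // 2
        calls += 1
        if qms_at_least(Fraction(mid, denom)):
            low = mid             # QMS >= mid/denom
        else:
            high = mid - 1        # QMS <  mid/denom
    return Fraction(low, denom), calls

# Illustrative oracle for a variable whose true QMS is 1/4
# (2-bit domain, one random variable, so the denominator is 4).
true_qms = Fraction(1, 4)
found, n_calls = qms_binary_search(lambda q: q <= true_qms, n_bits=2, n_randoms=1)
```

Since the candidate set has $2^{n \times |\mathrm{RVar}|} + 1$ elements, the loop runs at most $n \times |\mathrm{RVar}| + 1$ times and never needs a step size.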

Practical Evaluation
We have implemented our methods in a tool named QMVerif, which uses Z3 [23] as the underlying SMT solver (fixed size bit-vector theory). We conduct experiments of perfect masking verification and QMS computing on both Boolean and arithmetic programs. Our experiments are conducted on a server with 64-bit Ubuntu 16.04.4 LTS, Intel Xeon CPU E5-2690 v4, and 256GB RAM.

Experimental Results on Boolean Programs
We use the benchmarks from the publicly available cryptographic software implementations [25], which consist of 17 Boolean programs (P1-P17). We conducted experiments on P12-P17, which are the regenerations of MAC-Keccak reference
Perfect masking verification. Table 1 shows the results of the perfect masking verification on P12-P17, where Columns 2-4 show basic statistics: the number of internal variables, of leaky internal variables, and of internal variables which require model-counting based reasoning, respectively. Columns 5-6 respectively show the total time of our tool QMVerif using the SMT-based and brute-force methods. Column 7 shows the total time of the state-of-the-art tool SCInfer [50]. We observe that: (1) our reduction heuristics significantly improve the performance compared with SCInfer [50] (generally 16-69 times faster for imperfectly masked programs; note that SCInfer is based on SMT model-counting), and (2) the SMT-based and brute-force methods in QMVerif perform largely on par when verifying perfect masking of Boolean programs.
Computing QMS. For comparison purposes, we implemented the algorithm of [26,27] for computing QMS values of leaky internal variables. Table 2 shows the results of computing QMS values on P13-P17 (P12 is excluded because it does not contain any leaky internal variable), where Column 2 shows the number of leaky internal variables, and Columns 3-7 show the total number of iterations in the binary search (cf. Section 4.2), the time, and the minimal, maximal and average QMS values using the algorithm from [26,27]. Similarly, Columns 8-13 show the statistics of our tool QMVerif; in particular, Column 9 (resp. Column 10) shows the time of the SMT-based (resp. brute-force) method. Note that all the times reported in Table 2 exclude the time used for perfect masking checking.
We observe that (1) the brute-force method outperforms the SMT-based one significantly, and (2) our tool QMVerif using the SMT-based method takes significantly fewer iterations and less time, as our binary search step depends on the number of bits of random variables, not on a pre-defined value (e.g. 0.01) as used in [26,27]. In particular, the QMS values of leaky variables whose expressions contain no random variables (e.g. in P13 and P17) do not need the binary search.

Experimental Results on Arithmetic Programs
We collect arithmetic programs which represent non-linear functions of masked cryptographic software implementations from the literature (see Table 3).
Perfect masking verification. Columns 5-6 in Table 3 show the results of the perfect masking verification on the programs using the SMT-based and brute-force methods respectively. We observe that: (1) some UKD-typed variables (e.g. in B2A [30], B2A [18] and Sbox [48], meaning that the type inference is inconclusive in these cases) can be resolved by model-counting (resulting in SID-type), and (2) on the programs (except B2A [18]) where the model-counting based reasoning is required (i.e., ♯Count is non-zero), the brute-force method is significantly faster than the SMT-based one. In particular, for programs $k^{15}, \dots, k^{254}$, Z3 crashed with a segmentation fault after verifying 12 internal variables in 93m, while the brute-force method comfortably returns the result. To further explain the performance of these two classes of methods, we manually examined these programs and found that the expressions of the UKD-typed variable in B2A [18] (where the SMT-based method is faster) only use exclusive-or (⊕) operations and one subtraction (−) operation, while the expressions of the other UKD-typed variables (where the brute-force method is faster) involve finite field multiplication (⊙).
We remark that the transformation oracle and meta-theorems are only used for A2B [30]. Theoretically, model-counting based reasoning could verify A2B [30]. However, in our experiments both the SMT-based and brute-force methods failed to terminate in 3 days, though the brute-force method had verified more internal variables. For instance, on the expression $((2 \times r_1) \oplus (x - r) \oplus r_1) \wedge r$, where $x$ is a private input and $r, r_1$ are random variables, Z3 could not terminate in 2 days, while the brute-force method verified it in a few minutes. We also tested the SMT solver Boolector [39] (the winner of SMT-COMP 2018 on QF-BV, Main Track), which failed to terminate in 3 days. Undoubtedly more systematic experiments are required in the future, but our results suggest that, contrary to the common belief, current SMT-based approaches are not promising here, which calls for more scalable techniques.
Computing QMS. Columns 7-9 in Table 3 show the results of computing QMS values of leaky variables, where Column 7 (resp. Column 8) shows the time of the SMT-based (resp. brute-force) method for computing QMS values (excluding the time for perfect masking checking) and Column 9 shows the QMS values of all leaky variables (note that duplicated values are omitted).
We observe that: (1) the brute-force method can quickly compute the QMS values of the leaky variables in Sbox [47], $k^3$ and $k^{12}$, but takes roughly 64 hours on the other programs, and (2) surprisingly, the SMT-based method is only able to compute the QMS value of the leaky variable in Sbox [47], and fails on the others after 4 days. Indeed, Z3 cannot even finish the first iteration of the binary search on the smallest formula in 4 days. This, again, indicates the ineffectiveness of current SMT-based approaches. We manually examined the programs $k^3, \dots, k^{254}$ and found that (1) the variables used in the computations $E(x)$ of leaky variables $x$ are the same, and (2) the computations that can be quickly verified contain at most 4 operations, while the others contain at least 19 operations.

Conclusion
We have proposed a hybrid approach combining type inference and model-counting to verify masked arithmetic programs against first-order side-channel attacks. The type inference provides an efficient, lightweight procedure to resolve most observable variables, whereas model-counting accounts for completeness, bringing together the best of both worlds. We also provided model-counting based methods to quantify the amount of information leakage via side channels. We have presented the supporting tool QMVerif, which has been evaluated on standard cryptographic benchmarks. The experimental results showed that our method significantly outperformed state-of-the-art techniques in terms of both accuracy and scalability.
Future work includes further improving SMT-based model-counting techniques, which currently provide no better, if not worse, performance than the naïve brute-force method. Furthermore, generalizing the work in the current paper to the verification of higher-order masking schemes remains a very challenging task.