1 Introduction

Side-channel attacks were first introduced by Kocher et al. [10] as a way to attack implementations of cryptosystems. They exploit the relation between the data being processed and several physical emanations, for instance the time taken or the power consumed to perform computations [11]. Since its first appearance, side-channel analysis has grown quickly, with newly developed attacks as well as countermeasures, which try to prevent any sensitive information from being leaked. For instance, sharing schemes randomise intermediate values in such a way that the leaked information no longer depends on any sensitive data [13]. However, the efficiency of countermeasures is deeply linked to the physical characteristics of the device on which they are implemented: in 2005, Mangard et al. [14] predicted the criticality of glitches for hardware implementations, which was then demonstrated in the same year [15]. They showed how the propagation of signals in the combinatorial logic implementing an apparently secure SBox might result in critical leakages, rendering the protection ineffective.

Overall, there is a gap in the capability to quantify the criticality of glitches in a hardware implementation. This gap is not trivial to close, as glitches in combinatorial logic are functions of the final layout of the circuit and the environmental conditions, and might change during the life of the device. In practice, two identical devices might exhibit different behaviour in terms of glitches.

Our aim is to provide a formal framework for evaluating the presence of glitches under worst-case conditions without the need for a detailed characterisation of the combinatorial logic, i.e. remaining at the gate-level description. In order to achieve this result, we start from the mathematical structure created by Brzozowski and Ésik [7], which simulates the propagation of electric signals inside a circuit, and we build a method to relate a modelled power consumption with the sensitive variables that have caused it. Our analysis is done in a worst-case scenario where all possible glitches are taken into account, so as to achieve the maximum possible generality. Our main result is an assessment tool which is able to formally describe what kind of information could be leaked and to give a heuristic estimate of the security of sharing schemes implemented in hardware.

Related Work

To solve the problem of glitches, Nikova et al. [18, 19] suggested the use of threshold implementations, which make it possible to tackle glitches at the root by developing maps that do not handle all the shares in the same combinatorial circuit. Such maps obviously come at the cost of a significant overhead compared to the unprotected version. Implementations and practical discussions can be found in the work of Moradi et al. [17] and of Bilgin et al. [6]. As for higher-order security, the issue of glitches has been addressed with a generalisation of threshold implementations [5, 23], and independently by Prouff and Roche [20]. Specifically on the effects of glitches on the AES SBox, Mangard and Schramm [16] have reported a deep and complete analysis. From a design perspective, instead, some tools that attempt to identify leakages in masked circuits, including those caused by glitches, already exist. Reparaz [22] described a methodology based on the t-test which, although similar in its goal, differs from ours in that it inherits the heuristic nature of the exploited statistical tools. Moreover, we require neither the collection nor the simulation of power traces. Also, the idea proposed by Leiserson et al. [12] is based on a heuristic method that allows the analysis of values flowing in several parts of a circuit through the so-called activity images. The main focus of their technique is on circuit modifications to thwart the threat of glitches, while we propose an evaluation of already existing circuits without altering their structure. Finally, our approach is similar to the work of Tiwari et al. [24], but they only focus on how untrusted inputs propagate through a circuit to the output, while we mainly care about leaked intermediates. For this reason, an important role in our work is played by input transitions, while their focus is more on fixed inputs.

Organisation of the Paper

Section 2 provides the abstract framework underlying our tool, with a particular emphasis on how circuits, signals propagating inside them, power consumption and adversaries are modelled. Section 3 describes parts of the work of Brzozowski and Ésik [7] which are also used by our construction. In Section 4, we present our main contribution: we expand the functionalities of the previously discussed mathematical model with the notion of leakage and we show how such an improved framework can be used to analyse cryptographic circuits. The approach taken in this work is heuristic-oriented, for the sake of focusing on practical experiments with the model. For a more detailed and rigorous description of the latter two sections, we refer the reader to our previous work [4]. In Section 5, we test our tool and the underlying model with the sponge function Keccak. We discuss the soundness of our approach and several practical aspects in Section 6, and we conclude our work in Section 7.

2 Preliminaries

Our work targets hardware implementations of cryptographic schemes. Since the meaning of such a term can be quite broad, the present section aims at specifying our environment, as well as at setting the notation we adopt. In fact, our mathematical model applies only to an abstraction of real-world circuits: we refer only to logic netlists, i.e. circuits formed only of logic gates and the connections among them. Our tool therefore achieves a good level of generality, since it does not require any knowledge of implementation details apart from the circuit scheme itself, which means that it is general enough to include all the previously mentioned sources of glitches (final layout, environmental conditions…). In particular, we focus on asynchronous feedback-free circuits. We claim this is not too restrictive, because of the following argument. Circuits can be divided into two parts: the combinatorial logic and the state-storing part. The combinatorial logic is indeed asynchronous; it is the part in charge of implementing the logic functionality and the one where glitches might propagate. The state-storing part, implemented via registers or memory cells, is clocked and provides the synchronisation between different sections of the circuit. Since we apply our model to logic circuits performing sensitive computations, the most natural choice is to focus on the asynchronous part only. We do not consider the presence of feedbacks in the combinatorial part for the sake of simplicity and because they are not a common construction in this field anyway.

We adopt a high-level abstraction of signals. Since we are only interested in the Boolean value they represent, it is convenient to think of them as square waveforms which can assume the values 0 or 1. To push the abstraction further, we define the following mathematical object.

Definition 1

A transient is a bit-string with no repetitions. More formally, a bit-string \(t=a_{1}{\dots } a_{x} \in \mathbb {Z}_{2}^{x}\) is a transient if \(a_{i} \neq a_{i+1}\) for all \(1 \leq i \leq x-1\). Notice that bit concatenation is denoted by simply writing one bit after the other. Moreover, we denote by T the set of all possible finite-length transients.

Informally, transients can only be of the form 1010 … or 0101 … for an arbitrary finite length x ≥ 1 (note that bits 0 and 1 can be considered as transients when x = 1). The rationale behind transients is the following. Contracting bit strings is equivalent to neglecting time periods during which a signal assumes constant values 1 or 0. This results in transients being exclusively designed to represent which changes occur, but not when they occur. The order of switches can then be freely tuned, in such a way that the worst glitch behaviour is always shown at the output of a gate. That is to say, if two transients modelling two changing signals are given as inputs to a gate, then the output will be a transient modelling the signal showing the highest possible number of changes. Section 3 specifies how to combine transients so as to emulate gates’ logic and to achieve such a functionality.
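To make the contraction concrete, here is a minimal Python sketch (the helper name contract is ours) that collapses a sampled square waveform into the transient describing it; later sketches in this paper reuse it.

def contract(bits):
    """Collapse runs of repeated bits, e.g. '00111010' -> '01010'.

    The result is a transient in the sense of Definition 1: it records
    which switches a signal performs, but not when they happen.
    """
    out = [bits[0]]
    for b in bits[1:]:
        if b != out[-1]:
            out.append(b)
    return ''.join(out)

# contract('00111010') == '01010'  (only the switches survive)
# contract('1111')     == '1'      (a constant signal is a length-one transient)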

Further Notation

We denote the power set (i.e. the set of all subsets) of a set S by \(\mathcal {P}(S)\). Vectors are denoted by underlined letters while boldface is reserved for signals seen as transients (cf. Definition 1 and Example 1).

2.1 Power Consumption Model

If we consider global synchronous circuits, the power consumption can be divided into three components: the static leakage, the switching of registers and the switching of combinatorial logic. The static leakage is the amount of power needed by the circuit to maintain the current state when no switch is present. The switching of registers is the power consumed by the circuit to update the state and is easily approximated by the Hamming distance of the state in two consecutive clock cycles. The value of the registers can be easily protected by masking schemes. The last contribution is the most interesting for us and is related to the consumption of the combinatorial logic. From a temporal point of view, the switching of registers usually happens at the rising edge of the clock cycle while the static leakage happens in its last part. By contrast, the consumption of combinatorial logic spans, in most cases, the entire duration of the clock cycle [21].

Hamming Distance Model

Consistently with the choice of addressing only the asynchronous part of a circuit, our power consumption model includes only the contribution of the combinatorial logic. As mentioned above, such a part of a circuit is in charge of actually implementing the functionality of the circuit and it is the one where all the dynamic changes in the values carried by wires happen. Moreover, when dealing with glitches, the main focus should be on how many times and at which moments a signal changes, in order to even recognise a glitch in the first place. For these reasons, a natural choice for our setting is the Hamming distance model, in which changes represent the most important metric. When modelling a gate’s power consumption, it is therefore appropriate to consider the signal it outputs or, equivalently, the corresponding bit string. If the output signal changes, equivalently the corresponding output bit string switches, the gate consumes. In these terms, the Hamming distance model we assume in the present work is described by the following three assumptions:

  1. A gate consumes power if and only if its output bit-string switches.

  2. A zero-to-one switch consumes the same amount of power as a one-to-zero switch.

As already stated, we neglect static leakage by means of the first assumption. The second assumption is made for the sake of simplicity and it can be dropped in favour of a more realistic model built on top of a specific technology library. Nevertheless, such an assumption is often used in the literature.

The above two statements merely refer to what is considered to be the power consumption from a mathematical point of view. They summarise what appears in the literature as the Hamming distance model. For our purposes, however, a further assumption is needed to relate how such consumption affects what we will define as leakage.

  3. Every time some power is consumed, an attacker can measure and exploit it. Hence, we assume that a potential leakage exists as long as a switch occurs.

We will discuss in Section 2.2 more details on what type of leakage an adversary can retrieve, and in Section 4 how we model leakage. In practice, the third assumption ensures the highest possible generality: we consider as leaked any variable that has a chance to be leaked.
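Under the first two assumptions, the modelled consumption of a gate during a transition reduces to counting the switches of its output transient; the following one-line Python sketch (helper name ours) makes this explicit.

def switches(transient):
    """Number of switches of an output transient: by assumptions 1 and 2 this
    is (proportional to) the modelled power consumption of the gate, and by
    assumption 3 any positive count marks a potential leakage."""
    return len(transient) - 1

# switches('010') == 2: the gate switches twice and an adversary may observe it
# switches('1')   == 0: no switch, no dynamic consumption, no leakage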

2.2 d-Probing Model

Designing schemes that are provably side-channel resistant comes with the intrinsic problem of mathematically formalising the environment in which a side-channel attack usually takes place. Many models have been proposed, each capturing certain aspects of practical attacks: e.g. whether the adversary has the ability to inject faults, to learn only a small amount of the information being processed or to learn a more complete but noisy view of the internal state. In this respect, one of the most influential and seminal works was done by Ishai, Sahai and Wagner [9]. Among other notable achievements, they presented a mathematical framework in which it is possible to prove cryptosystems secure against adversaries who can probe a certain set of d wires in the implementation and learn the values they carry. The proof of security is based on the argument that, up to d probes, the view of the adversary is independent of any sensitive variable.

There are several reasons why the d-probing model has been so widely adopted. First of all, Ishai, Sahai and Wagner [9] built a generic compiler that can turn any cryptographic algorithm into a version secure in the d-probing model. The protection obviously comes with an overhead which is at least quadratic in the number of probes the adversary is allowed to use. Nevertheless, such an overhead is unavoidable if one aims for provable security. A further reason to prefer the d-probing model over more sophisticated ones is its good adherence to what happens in practice: although it might seem unreasonable that a real adversary learns the exact value on certain wires, Duc et al. [8] have shown this is equivalent to security in the noisy leakage model, where the adversary is only given access to a noisy internal state. The latter interpretation of the d-probing model is much more realistic in that noise is an essential component of hardware implementations.

From the perspective of this work, there are also several reasons why the d-probing model seems a natural choice when trying to model glitch propagation. As we will show in Sections 3 and 4, our analysis is very much focused on relating the switching activity of single gates to the input variables that have caused it in the first place. According to how a gate changes its output, which takes into account glitches too, we will say that some variables are considered to be leaked, under certain circumstances. Thus, it is natural to think of such variables as being learned by an adversary who probes the output of that gate, in the spirit of the d-probing model.

3 Simulation of Signal Propagation

The choice of transients as a formalisation of signals relies on the operations that it is possible to define among them. Since the circuits we study are only formed of logic gates, we want those operations to preserve gates’ functionalities. Therefore, we aim at building a function \(\hat {f}:T^{n} \rightarrow T\) associated to a Boolean function \(f: \mathbb {Z}_{2}^{n} \rightarrow \mathbb {Z}_{2}\) whose inputs are n transients, namely \(\underline {t}=(t_{1},{\dots } ,t_{n}) \in T^{n}\).

Example 1

Let us suppose that two signals s 1 and s 2 are given as input to a gate implementing a Boolean function \(f:\mathbb {Z}_{2}^{2}\rightarrow \mathbb {Z}_{2}\). Firstly, they are fixed at constant values \(b_{1}\in \mathbb {Z}_{2}\) and \(b_{2}\in \mathbb {Z}_{2}\), respectively. Suddenly, s 1 changes from b 1 to \(c\in \mathbb {Z}_{2}\), with cb 1. This is represented by the transient s 1 = b 1 c which can be either 01 or 10. Then, the idea behind the function \(\hat {f}\) is to emulate the behaviour of the function f , but taking as inputs the two transients s 1 = b 1 c and s 2 = b 2 (seen as a length-one transient) and producing a transient with the highest number of switches, i.e. as if the highest number of glitches occurred. Note that we write a variable in boldface if it is seen as a transient and that bit concatenation is denoted by simply writing one bit after the other.

In the present work, we simply assume that the functionality discussed in Example 1 can be achieved. The idea is that, given two input transients \(t_{1}=a_{1}{\dots } a_{d_{1}}\) and \(t_{2}=b_{1}{\dots } b_{d_{2}}\), the first bit the gate computes is f(a 1, b 1). This will also be called the initial stable state. Then the two inputs change to a 2 and b 2, respectively, and we are free to decide which of them affects the gate first, so that another change in the output (if any) is triggered. For a rigorous definition of the above, we refer the reader to our original paper [4] where all the steps are presented and proven.
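The rigorous construction of \(\hat {f}\) is given in [4]; as an illustration only, the naive Python sketch below (function names ours) obtains the same worst-case output by brute force, enumerating every order in which the input changes can reach the gate. It is exponential in the number of changes and meant only for small examples.

def interleavings(queues):
    """All orders in which the per-input change queues can be consumed,
    preserving the order of changes within each input."""
    if all(not q for q in queues):
        yield []
        return
    for i, q in enumerate(queues):
        if q:
            rest = [list(x) for x in queues]
            head = rest[i].pop(0)
            for tail in interleavings(rest):
                yield [(i, head)] + tail

def f_hat(f, transients):
    """Worst-case extension of a Boolean function f to transients.

    f takes a tuple of 0/1 values; transients are strings such as '10' or '010'.
    Returns the output transient with the maximum number of switches, i.e. the
    behaviour of a gate under the worst possible combination of glitches.
    """
    queues = [[int(b) for b in t[1:]] for t in transients]
    start = tuple(int(t[0]) for t in transients)
    best = str(f(start))
    for order in interleavings(queues):
        vals, out = list(start), [f(start)]
        for idx, new in order:
            vals[idx] = new
            y = f(tuple(vals))
            if y != out[-1]:
                out.append(y)
        cand = ''.join(map(str, out))
        if len(cand) > len(best):
            best = cand
    return best

# f_hat(lambda v: v[0] & v[1], ['10', '01']) == '010': if the AND gate sees the
# rising input before the falling one, its output glitches.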

Theorem 1

Let \(f:\mathbb {Z}_{2}^{n}\rightarrow \mathbb {Z}_{2}\) be a Boolean function. There always exists a function \(\hat {f}:T^{n}\rightarrow T\) such that the output transient models the maximum number of switches that a gate implementing f might show. Moreover, \(\hat {f}\) is well defined for any given input \(\underline {t}=(t_{1},{\dots } ,t_{n})\in T^{n}\).

3.1 Glitch-Counting Algorithm

The glitch-counting algorithm simulates the propagation of signals inside a circuit in terms of transients. First of all, a change in one or more inputs is assumed and represented as a transient. The glitch-counting algorithm assigns a transient to each gate as soon as the change reaches it. If the gate implements a Boolean function f, then the result is computed according to \(\hat {f}\).

Given a circuit with m inputs and k gates, we denote by \(\underline {X} = (X_{1}, {\dots } , X_{m})\) the vector of input variables and by \(\underline {s} = (s_{1}, {\dots } , s_{k})\) the vector of state variables, which are the gates’ outputs. We use boldface to distinguish when variables are used as transients, as in Example 1. Initially, suppose that the input \(\underline {X}\) assumes the value \(\underline {X} = \underline {a}' = (a^{\prime }_{1}, {\dots } , a^{\prime }_{m})\in \mathbb {Z}_{2}^{m}\), and that the state has the value \(\underline {s} = \underline {b} = (b_{1}, \dots , b_{k})\in \mathbb {Z}_{2}^{k}\). We assume that the input changes to \(\underline {a} = (a_{1}, {\dots } , a_{m})\in \mathbb {Z}_{2}^{m}\). We call this a transition and we denote it by \(a^{\prime }_{1} {\dots } a^{\prime }_{m} \rightarrow a_{1} {\dots } a_{m}\). The goal is to study how glitches might propagate as a consequence of such a change.

The glitch-counting algorithm starts with the circuit in the initial stable state \((\underline {a}^{\prime }, \underline {b})\). The left-hand side is then set to the transient \(\underline {\mathbf {a}} = (a^{\prime }_{1}a_{1},\dots ,a^{\prime }_{m}a_{m})\) (note that in case \(a^{\prime }_{i} = a_{i}\) for some \(i \leq m\), the transient obtained by concatenating them is only one bit) and is kept constant at that value for the duration of the algorithm. This stores how the input has changed. The right-hand side, instead, is stored in a vector of transients \(\underline {\mathbf {s}}\) and is constantly updated throughout the duration of the algorithm: its value is modified in each position s j according to the function \(\hat {f}\), where f is the functionality of the gate computing s j . We refer the reader to our original paper [4] and to the work of Brzozowski and Ésik [7] for the pseudo-code of the algorithm and further details. In this work, we opt for describing the algorithm by means of an example.
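A compact Python sketch of this iteration follows (naming is ours and the pseudo-code in [4, 7] differs in presentation; it reuses contract and f_hat from the earlier sketches). Gates are assumed to be listed in topological order, which is always possible for feedback-free circuits; on the netlist of Fig. 1 (an AND, an OR and a XOR) and the transition 100 → 010 it should reproduce the transients used in the examples below, e.g. s 1 = 010.

def glitch_count(gates, a_prev, a_new):
    """Glitch-counting algorithm for a feedback-free netlist (sketch).

    gates: list of (f, fan_in) pairs in topological order; fan_in entries are
           ('X', i) for circuit input X_{i+1} or ('s', j) for gate j's output.
    a_prev, a_new: tuples of 0/1 values describing the input transition.
    Returns one output transient per gate.
    """
    X = [contract(str(p) + str(n)) for p, n in zip(a_prev, a_new)]
    # Initial stable state: evaluate every gate on the old input values.
    s = []
    for f, fan_in in gates:
        vals = tuple(a_prev[idx] if kind == 'X' else int(s[idx][-1])
                     for kind, idx in fan_in)
        s.append(str(f(vals)))
    # Iterate until two consecutive state vectors coincide, which is the
    # termination condition of the algorithm.
    changed = True
    while changed:
        changed = False
        for g, (f, fan_in) in enumerate(gates):
            ins = [X[idx] if kind == 'X' else s[idx] for kind, idx in fan_in]
            new = f_hat(f, ins)
            if new != s[g]:
                s[g], changed = new, True
    return s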

Example 2

Suppose that, in the circuit depicted by Fig. 1, the input changes from \(\underline {a}^{\prime }=(1,0,0)\) to \(\underline {a}=(0,1,0)\); hence, the transition 100 → 010 occurs. The execution of the algorithm is summarised in Table 1, where each row represents one iteration of the cycle and each column refers to one variable (both input and state) of the circuit. The last two rows are identical, which is the termination condition of the algorithm. At each step, the algorithm computes the functions \(\hat {f}\) of each gate for which previous transients are known. It follows the behaviour of real-world signal propagation; hence, earlier gates (i.e. closer to circuit inputs) are affected first. Indeed, the first row just represents the initial state (when only inputs have changed), the second one depicts a change in the first line of gates while in the third row, signals propagate till the last XOR. Figure 1 is a graphical representation of the final situation, which is the output of the algorithm without intermediate steps. Note that the final logic situation can be retrieved from Table 1 by extracting the last bit of each state variable.

Fig. 1 Example of a glitch-counting algorithm’s execution

Table 1 Example of a glitch-counting algorithm’s execution

We conclude the present section with a theorem stating the asymptotic running time of the glitch-counting algorithm. The proof is extensively discussed by Brzozowski and Ésik [7] and is then omitted here.

Theorem 2

(Section 8 of [7]) Given a feedback-free circuit and a transition of its inputs, the glitch-counting algorithm always terminates. Moreover, it runs in \(O(m + k^{2})\) time where m is the number of inputs and k the number of gates.

4 LP Model

The glitch-counting algorithm was developed in the first place to prevent unnecessary power consumption by discarding netlists being particularly exposed to glitch propagation [7]. Our main contribution is the LP (leakage path) model, which is a mathematical abstraction that expands the functionalities of the glitch-counting algorithm and relates its simulations to the notion of leakage. Our result leads to a tool that makes it possible to evaluate whether a circuit has a critical leakage from the security point of view. The remainder of this section explains the structure of the LP model, which is formed of the following mathematical entities:

Input variables :

can trigger a signal propagation. If no input variable changes, no signal propagates and no power is consumed, therefore no leakage exists according to our power model.

Literals :

are sets of input variables. For each gate reached by a signal’s change, a literal contains which variables have caused the change and could then be leaked.

Literifiers :

are the link between transients and leakage. Essentially, they relate the input and output transients of a gate to the appropriate literal.

The general idea behind the above three objects is the following. The process begins with a change in the input variables, which generates a signal propagation inside the circuit and affects some gates. The gates are then supposed to produce a new output based on the new inputs, and their final result depends on which variables have changed and how. In this framework, literifiers are responsible for retrieving the variables involved and representing them via literals. Finally, Section 4.2 develops an argument according to which the above concepts are applied to a whole circuit, and not just to a single gate, so as to relate them to the d-probing model.

4.1 Structure of LP Model

We now describe in detail each part of the LP model with respect to a single gate. This means that when we talk of input variables, we mean the variables that are directly given as inputs to it. The next subsection will provide a broader view, showing how to apply the notions for single gates to a whole circuit. Following the same notation as the input variables of a circuit, we denote such variables by X j and by X j if they are seen as transients; we assume that \(f:\mathbb {Z}_{2}^{n}\rightarrow \mathbb {Z}_{2}\) is the Boolean function implemented by the gate and we denote by \(\hat {f}:T^{n}\rightarrow T\) the corresponding function among transients.

As stated in the introduction of this section, input variables are of great importance for both the glitch-counting algorithm, since nothing could be simulated without a change of theirs, and the LP model. In essence, they are the objects our study targets as we aim at following their propagation along the circuit.

Definition 2

Given a gate with n inputs, namely \(X_{1},{\dots } ,X_{n}\), we call literal any subset of \(\{1,{\dots } ,n\}\). The set of literals is denoted by \(I = \mathcal {P}(\{1,{\dots } ,n\})\).

Literals are finite sets of input variables. In a sense, they are the result we are looking for: the analysis of a circuit by means of the LP model consists in assigning a literal to each gate. Their utility stems from the fact that they list which input variables are responsible for the power consumption and could then be leaked according to our power model. This is strictly connected with the rationale behind transients. In both cases, we assume the worst possible scenario: transients are supposed to switch as if the worst possible combination of glitches occurred in the same way as literals list all variables being leaked in the worst possible case. It is clear from the above discussion that the core of the LP model is the way we assign literals to gates.

Literifiers are functions establishing which input variables are leaked by a gate, i.e. the ones having caused a change in its output. They depend on how the gate’s inputs change, i.e. which transients enter it, and on the implemented logic. First of all, we represent the input of a gate as the following vector of couples:

$$((t_{1},l_{1}),{\dots} ,(t_{n},l_{n}))\in(T\times I)^{n}. $$

We call it transient-variable representation: the first component of each couple is a transient modelling how that input signal changes, while the second one is a literal listing the input variables responsible for that change.

Example 3

Recalling Fig. 1, the gate computing s 1 = 010 has the following input according to the transient-variable representation.

$$((10,\{1\}),(01,\{2\})) $$

In Example 3, we have assumed that the literal of a circuit’s input is just the singleton containing its index. For now, the transient-variable representation is directly possible only for gates at height 1, i.e. whose inputs are inputs of the circuit itself. In that case, each literal is simply the singleton of a variable. In the next subsection, we will show a procedure similar to the glitch-counting algorithm to meaningfully apply literifiers also to gates whose inputs have already been processed. Such gates are said to have height greater than 1. Informally speaking, the height of a gate is inductively defined to be 1 if all its inputs are circuit inputs, and to be the maximum height of its inputs plus one otherwise. We intentionally omit any further formalisation to avoid heavy notation. As an example, in the circuit in Fig. 1, the AND and OR gates are at height 1 and the XOR is at height 2.

We refer the reader to our original paper [4] for a detailed and general description of how literifiers can be built for an arbitrary Boolean function among transients \(\hat {f}:T^{n} \rightarrow T\). For the sake of simplicity, we limit the following discussion to the specific case of the gates AND, NOT, OR and XOR since a compact definition exists.

Definition 3

The literifier associated to a gate implementing the Boolean function \(\mathtt {AND}:\mathbb {Z}_{2}^{n}\rightarrow \mathbb {Z}_{2}\) is defined as:

$$L_{\mathtt{AND}}((t_{1},l_{1}),{\dots} ,(t_{n},l_{n})) \,=\, \left\{\begin{array}{lllllll} \emptyset &\text{if } \exists j \in \{1,{\dots} ,n\} \text{ such that } t_{j}\,=\,0 \\ \bigcup\limits_{j \in J} l_{j} &\text{otherwise } \end{array}\right. $$

where \(J = \{j \in \{1,{\dots } ,n\} \mid \ell (t_{j}) > 1\}\).

Intuitively, the upper branch in Definition 3 states that if there exists one input which is the fixed 0, then the output will be the fixed 0 no matter how the other inputs change. Since the output is fixed, no power is consumed and the set of leaked variables is empty. Otherwise, the union of all literals corresponding to non-constant transients is returned. Since we are in the second branch, there is no constant 0 transient, which results in the rule excluding only the literals of inputs equal to the constant 1, as they do not contribute to the switching activity of an AND gate.

Example 4

Following Example 2, let us compute the literifier L AND ((10, {1}), (01, {2})) associated to the gate computing s 1. A straightforward application of Definition 3 yields

$$L_{\mathtt{AND}}((10,\{1\}),(01,\{2\})) = \{2\} \cup \{1\} = \{1,2\}. $$

For the OR gate, the argument is perfectly dual to the AND gate’s, and the associated literifier follows.

Definition 4

The literifier associated to a gate implementing the Boolean function \(\mathtt {OR}:\mathbb {Z}_{2}^{n}\rightarrow \mathbb {Z}_{2}\) is defined as:

$$L_{\mathtt{OR}}((t_{1},l_{1}),{\dots} ,(t_{n},l_{n})) \,=\, \left\{\begin{array}{lllllll} \emptyset &\text{if } \exists j \in \{1,{\dots} ,n\} \text{ such that } t_{j}\,=\,1 \\ \bigcup\limits_{j \in J} l_{j} &\text{otherwise } \end{array}\right. $$

where \(J = \{j \in \{1,{\dots } ,n\} \mid \ell (t_{j}) > 1\}\).

The NOT gate is clearly the easiest: if the input transient does not switch, neither does the output, and then no power is consumed. Otherwise, the only possible literal is returned.

Definition 5

The literifier associated to a gate implementing the Boolean function \(\mathtt {NOT}:\mathbb {Z}_{2}\rightarrow \mathbb {Z}_{2}\) is defined as:

$$L_{\mathtt{NOT}}(t,l) = \left\{\begin{array}{lllllll} \emptyset &\text{if } \ell(t)=1 \\ l &\text{otherwise } \end{array}\right. $$

Finally, the XOR is slightly different from the AND and OR, since such a gate switches whenever at least one input switches. This restricts the cases in which L XOR returns the empty set.

Definition 6

The literifier associated to a gate implementing the Boolean function \(\mathtt {XOR}:\mathbb {Z}_{2}^{n}\rightarrow \mathbb {Z}_{2}\) is defined as:

$$L_{\mathtt{XOR}}((t_{1},l_{1}),{\dots} ,(t_{n},l_{n})) = \left\{\begin{array}{lllllll} \emptyset &\text{if } \forall j \in \{1,{\dots} ,n\} \text{, } \ell(t_{j})=1 \\ \bigcup\limits_{j \in J} l_{j} &\text{otherwise } \end{array}\right. $$

where \(J = \{j \in \{1,{\dots } ,n\} \mid t_{j} \neq 0\}\).
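The four literifiers above translate directly into code; the following minimal Python sketch (function names ours, literals as Python sets and transients as strings) is used by the circuit-level driver sketched in the next subsection.

def _union(literals):
    out = set()
    for l in literals:
        out |= l
    return out

def L_AND(inputs):
    """Definition 3; inputs is a list of (transient, literal) pairs."""
    if any(t == '0' for t, _ in inputs):      # a constant 0 freezes the output
        return set()
    return _union(l for t, l in inputs if len(t) > 1)

def L_OR(inputs):
    """Definition 4, dual of L_AND."""
    if any(t == '1' for t, _ in inputs):      # a constant 1 freezes the output
        return set()
    return _union(l for t, l in inputs if len(t) > 1)

def L_NOT(inputs):
    """Definition 5; inputs is a one-element list [(t, l)]."""
    (t, l), = inputs
    return set() if len(t) == 1 else set(l)

def L_XOR(inputs):
    """Definition 6."""
    if all(len(t) == 1 for t, _ in inputs):   # no input switches at all
        return set()
    return _union(l for t, l in inputs if t != '0')

# L_AND([('10', {1}), ('01', {2})]) == {1, 2}, as computed in Example 4.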

4.2 Application to Circuits

We conclude this section by showing how to apply the LP model to a given circuit with m inputs and k gates. For instance in Fig. 1, on the one hand, it is immediate that the transient-variable representation of the gate computing s 1 is the one shown in Example 3, but on the other, it is less clear what it should be for gates whose inputs are not the inputs of the circuit, e.g. for the one computing s 3. We recall that we denote by \(\underline {X} = (X_{1},{\dots } ,X_{m})\) the input variables and by \(\underline {s} = (s_{1},{\dots } ,s_{k})\) the state variables of a circuit.

The idea is simply to proceed by height: the only gates we can directly compute literifiers for are those at height 1, since their input literals are just singletons of input variables. Once all literifiers at height 1 have been computed, we can apply those at height 2: their input literals can be either singletons of input variables or the literals computed for gates at height 1. This procedure always terminates as there are finitely many gates, and it is well defined as there are no feedbacks.

Example 5

We conclude what Example 4 has begun by computing all literifiers of Example 2. The only other gate at height 1 is the one computing s 2, for which we have the following.

$$L_{\mathtt{OR}}((01,\{2\}),(0,\{3\})) = \{2\} $$

We now have all the information to compute the literifier for the last gate.

$$L_{\mathtt{XOR}}((010,\{1,2\}),(01,\{2\})) = \{1,2\} \cup \{2\} = \{1,2\} $$

Figure 2 depicts the final outcome of the LP model applied to the circuit in Fig. 1 during transition 100 → 010. Essentially, the LP model adds one literal per gate to the output of the glitch-counting algorithm. Literals describe which input variables cause a particular gate to switch and whose values could then be leaked through the power consumption. Collecting such information for all transitions gives the designer a powerful tool to predict possible flaws. In the next section, we deepen this discussion while providing a real-world use case. Note that there is a straightforward interpretation of literals in terms of the d-probing model: they are the sets of variables (i.e. their values) that an adversary can learn by placing a probe on the output of the gate which produced them.

Fig. 2 Application of literifiers to a circuit
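Putting the pieces together, the height-by-height procedure of this subsection can be sketched as a small driver on top of the earlier glitch_count and literifier sketches (naming ours; literifier_of maps each gate’s Boolean function to the appropriate literifier). On the netlist of Fig. 1 and the transition 100 → 010, the returned literals should match those of Example 5 and Fig. 2, e.g. {1, 2} for the final XOR.

def lp_model(gates, literifier_of, a_prev, a_new):
    """Run the LP model on a feedback-free netlist for one input transition.

    gates: as in glitch_count(), listed in topological order (i.e. by height).
    literifier_of: dict mapping a gate's Boolean function to its literifier.
    Returns (transients, literals), one entry of each per gate.
    """
    X = [contract(str(p) + str(n)) for p, n in zip(a_prev, a_new)]
    s = glitch_count(gates, a_prev, a_new)
    literals = []
    for f, fan_in in gates:
        # Transient-variable representation of this gate's inputs: circuit
        # inputs carry singleton literals, earlier gates carry their own literals.
        tv = [(X[idx], {idx + 1}) if kind == 'X' else (s[idx], literals[idx])
              for kind, idx in fan_in]
        literals.append(literifier_of[f](tv))
    return s, literals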

Final Remarks

In the present subsection, we have shown how to practically apply the LP model to the netlist of a circuit. Although the example we have considered was trivial, the LP model is a formal tool to analyse netlists with an arbitrary number of inputs and gates in the d-probing model, where an ad hoc analysis would require much more effort. Once a netlist and an input transition are fixed, the LP model provides a list of variables based on which a risk assessment in the context of side-channel analysis is facilitated. As the next section will suggest, a full analysis would require the LP model to run over every non-trivial input transition, hence \(2^{2m} - 2^{m}\) times where m is the number of inputs and where we have subtracted transitions from an input to itself as they clearly do not produce any consumption in our power model. Such an exponential requirement is a drawback of our approach: a deeper insight will be given in Section 6. Finally, for a fixed transition, the overall complexity is asymptotically bounded by the running time of the glitch-counting algorithm, described in Theorem 2.
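Enumerating the non-trivial transitions an exhaustive analysis has to cover is straightforward; the small helper below (name ours) is reused in the sketches of the next section and its count matches the \(2^{2m} - 2^{m}\) figure above.

from itertools import product

def nontrivial_transitions(m):
    """All input transitions of an m-input circuit except the 2^m trivial
    ones from an input to itself, i.e. 2^(2m) - 2^m pairs (a_prev, a_new)."""
    for a_prev in product((0, 1), repeat=m):
        for a_new in product((0, 1), repeat=m):
            if a_new != a_prev:
                yield a_prev, a_new

# sum(1 for _ in nontrivial_transitions(3)) == 56, the figure used for chi_i below.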

5 Case Study: Keccak

The present section provides an application of the LP model to Keccak. We show, thanks to our tool, how an unprotected implementation of Keccak’s non-linear layer is proved to be weak against side-channel attacks, and how glitches might also compromise security in the masked scheme. The reason why we chose Keccak as our case study mainly lies in its being deployed in real-world applications while still having a reasonably simple structure. It is then the ideal candidate for being a test bench.

Keccak is a family of sponge functions that uses a permutation from a set of seven possible ones as a building block [3]. The permutations are defined over a state \(s\in \mathbb {Z}_{2}^{b}\) where \(b = 25 \times 2^{\ell }\) is called the width of the permutation and \(\ell \in \{0, {\dots } , 6\}\). Each round is formed of five maps: three linear maps aiming at diffusion and dispersion, one non-linear map aiming at confusion and one addition with round constants. When it comes to implementing sharing schemes, linear maps can be directly applied to each share separately. By contrast, non-linear maps need to handle every share to preserve correctness. Therefore, we focus on the only non-linear map of Keccak, namely \(\chi : \mathbb {Z}_{2}^{5} \rightarrow \mathbb {Z}_{2}^{5}\) acting on groups of five bits of the state called rows. For a complete description of Keccak, we invite the reader to refer to the work of Bertoni et al. [3].

The map χ can be seen as the parallel application of five identical maps each defined on three consecutive bits (modulo 5) of a row. Formally:

$$ \chi_{i} : r_{i} \leftarrow r_{i} \oplus \overline{r}_{i+1}r_{i+2} $$
(1)

where \(r\in \mathbb {Z}_{2}^{5}\) denotes a row of the Keccak state and the index i is computed modulo 5. The bits r i are called native values. For our analysis, it is important to note that the five instances of the map \(\chi _{i} : \mathbb {Z}_{2}^{3} \rightarrow \mathbb {Z}_{2}\) are completely independent; they do not share gates in their computation. As a result, we can focus on a specific χ i without loss of generality.
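As a reference for the netlists analysed below, the row-level map of Eq. 1 can be written in a few lines of Python (variable names ours):

def chi(row):
    """Apply chi to one 5-bit row given as a list of 0/1 values (Eq. 1)."""
    return [row[i] ^ ((1 ^ row[(i + 1) % 5]) & row[(i + 2) % 5])
            for i in range(5)]

# Each output bit is chi_i(r) = r_i XOR (NOT r_{i+1}) AND r_{i+2}, indices mod 5.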

5.1 Unshared χ i

The first case that we study is the unshared χ i , i.e. the Boolean function in Eq. 1. As both the glitch-counting algorithm and the LP model work with netlists, the first step in the analysis of Eq. 1 is to produce one. Assuming the naming convention at the beginning of Section 3.1, the input vector X = (X 1, X 2, X 3) corresponds to (r i , r i+1, r i+2), while the components of the state vector s = (s 1, s 2, s 3) correspond, respectively, to the NOT, AND and XOR gates. See Fig. 3 for a graphical representation of the latter.

Fig. 3 χ i circuit after LP model, transition 100 → 011

It is trivial to see that any first order leakage at gate level is a leakage of a sensitive variable, and hence a critical leakage. Since χ i has three input bits, the number of non-trivial transitions is \(2^{6} - 2^{3} = 56\). We analyse two of them as an example, namely 100 → 011 and 110 → 111. The execution of the glitch-counting algorithm for transition 100 → 011 is reported in Table 2 (left).

Table 2 Glitch-counting algorithm’s execution for the χ i circuit, when the transition is 100 → 011 (left) and 110 → 111 (right)

The LP model is then used: at first, the literifier corresponding to s 1 is applied, since it is the only gate at height 1:

$$L_{\mathtt{NOT}}(01,\{2\})=\{2\} $$

Moving further to the gates at height more than 1, we compute L AND for s 2 and L XOR for s 3.

$$\begin{array}{@{}rcl@{}} L_{\mathtt{AND}}((10,\{2\}),(01,\{3\}))&=&\{2\} \cup \{3\} = \{2,3\} \\ L_{\mathtt{XOR}}((10,\{1\}),(010,\{2,3\})) &=& \{1\} \cup \{2,3\} = \{1,2,3\} \end{array} $$

In Fig. 3, the execution of both the glitch-counting algorithm and the LP model is depicted, in the case of the transition 100 → 011. Instead, in the case in which the input transition is 110 → 111, the glitch-counting algorithm shows that no glitch happens, as summarised in Table 2 (right).

The above two examples are meant to show two very different situations: in the first one, in Table 2 (left), it is evident how, as soon as any gate switches, the subsequent power consumption will leak a sensitive variable. Indeed, an adversary could learn one by simply placing one probe (hence for d = 1) anywhere in the circuit. Table 2 (right), instead, interestingly shows how not all changes in inputs trigger some power consumption, although this is restricted to very few corner cases. Similarly to the latter scenario, there are three other non-trivial input transitions that imply no leakage in the considered circuit, i.e. 111 → 110, 011 → 010 and 010 → 011. Hence, summarising, there are 56 − 4 = 52 transitions that have some first order leakage, namely roughly 81% of all transitions.

Although no more than an exercise, the above discussion stresses that the glitch-counting algorithm together with the LP model can be really fine grained in their analysis: they are able to precisely state the extent of the leakage even in the unprotected case which, surprisingly, does not happen as a result of every possible input transition. Nevertheless, there is a high possibility of such a critical leakage in this case, which is the reason why threshold implementations are adopted as countermeasures on χ i [18].
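This exhaustive first-order analysis can be scripted on top of the earlier sketches. The netlist below is our reading of Fig. 3 (a NOT, an AND and a XOR); with it, the enumeration should reproduce the count reported above (52 leaking transitions out of 56), although the figures in this section were of course obtained with the authors’ own tool.

NOT = lambda v: 1 ^ v[0]
AND = lambda v: v[0] & v[1]
XOR = lambda v: v[0] ^ v[1]
LITS = {NOT: L_NOT, AND: L_AND, XOR: L_XOR}

# s1 = NOT(X2), s2 = AND(s1, X3), s3 = XOR(X1, s2): the unshared chi_i of Eq. 1.
CHI_I = [(NOT, [('X', 1)]),
         (AND, [('s', 0), ('X', 2)]),
         (XOR, [('X', 0), ('s', 1)])]

leaky = 0
for a_prev, a_new in nontrivial_transitions(3):
    _, literals = lp_model(CHI_I, LITS, a_prev, a_new)
    if any(literals):                 # any non-empty literal is a first-order leak
        leaky += 1
print(leaky, "of 56 non-trivial transitions show a first-order leakage")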

5.2 χ with Two Shares

The first sharing scheme we adopt in our analysis is a two-share Boolean scheme, i.e. each row is split into two shares \(a,b\in \mathbb {Z}_{2}^{5}\) such that r = a ⊕ b [2]. Our results can be easily generalised to more shares. In this setting, Eq. 1 can be masked as follows:

$$ \begin{array}{lllllll} a_{i} & \leftarrow a_{i} \oplus \overline{a}_{i+1}a_{i+2} \oplus a_{i+1}b_{i+2} \\ b_{i} & \leftarrow b_{i} \oplus \overline{b}_{i+1}b_{i+2} \oplus b_{i+1}a_{i+2} \end{array} $$
(2)

where a straightforward computation shows that the equations in (2) are correct, as Eq. 1 is simply retrieved by XORing them. If the order of operations were kept fixed from left to right, then the above sharing scheme would be secure at the first order. However, if Eq. 2 were implemented in hardware, such a condition could not be guaranteed, for instance because of glitches. This results in possible vulnerabilities when the values a i+2 and b i+2 are involved in the computation of the three-input XOR at the same time.
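Since only six bits are involved, the correctness claim is easy to check exhaustively; a short Python sketch (variable names ours: a0, a1, a2 stand for a i , a i+1 , a i+2 , and likewise for b):

from itertools import product

ok = True
for a0, a1, a2, b0, b1, b2 in product((0, 1), repeat=6):
    shared_a = a0 ^ ((1 ^ a1) & a2) ^ (a1 & b2)     # first branch of Eq. 2
    shared_b = b0 ^ ((1 ^ b1) & b2) ^ (b1 & a2)     # second branch of Eq. 2
    r0, r1, r2 = a0 ^ b0, a1 ^ b1, a2 ^ b2          # recombine the shares
    ok &= (shared_a ^ shared_b) == (r0 ^ ((1 ^ r1) & r2))   # Eq. 1 on r
assert ok   # XORing the two branches recomputes the unshared chi_i on every input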

It can be easily seen from Eq. 2 that the two equations are symmetric; hence, the two netlists are identical. We will refer to them as being two branches of the implementation of Eq. 2. This also implies that we can focus only on the first branch without loss of generality, i.e. the one computing a i . Similarly to what was discussed for the unshared χ i , the input vector X = (X 1, X 2, X 3, X 4) corresponds to (a i , b i+2, a i+1, a i+2), while the components of the state vector s = (s 1, s 2, s 3, s 4) correspond, respectively, to the NOT, upper AND, lower AND and XOR. See Fig. 4 for a graphical representation of the latter.

Fig. 4 One branch of two-shared χ i circuit after LP model

First of all, an input transition is fixed among all the \(2^{8} - 2^{4} = 240\) non-trivial possible ones. Then, the glitch-counting algorithm is applied as shown in Section 3.1 and all the transients are computed, one per gate. Table 3 reports the execution of the glitch-counting algorithm for the input transition 0110 → 0001.

Table 3 Glitch-counting algorithm’s execution for the shared χ i circuit

At this point, suitable literifiers can be applied as described in Section 4.2, hence starting from gates at height 1. In our example, this means computing the literifiers corresponding to s 1 and s 2 first, respectively an AND and a NOT literifier.

$$\begin{array}{@{}rcl@{}} L_{\mathtt{AND}}((10,\{2\}),(10,\{3\})) & =& \{2\} \cup \{3\} = \{2,3\} \\ L_{\mathtt{NOT}}(10,\{3\}) & =& \{3\} \end{array} $$

There are two gates at height higher than 1: first we compute L AND for the gate computing s 3 and finally L XOR is applied.

$$\begin{array}{@{}rcl@{}} L_{\mathtt{AND}}((01,\{3\}),(01,\{4\})) &\,=\,& \{3\} \cup \{4\} \,=\, \{3,4\} \\ L_{\mathtt{XOR}}((0,\{1\}),(10,\{2,3\}),(01,\{3,4\})) & \,=\,& \{2,3\} \cup \{3,4\} \,=\, \\ & \,=\,& \{2,3,4\} \end{array} $$

Figure 4 summarises the execution of both the glitch-counting algorithm and of the LP model for the transition 0110 → 0001.

To get the most out of the proposed method, a vulnerability definition based on critical combinations of variables needs to be formulated. This is checked among all the literals produced by the model, which has been run over all possible non-trivial input transitions. Notice, however, that the sharing scheme outlined above is secure in the one-probing model if the order of operations is enforced, because at each step of the algorithm, each internal variable is independent of any sensitive one. Unfortunately, glitches invalidate such an argument.

A vulnerability of the circuit in Fig. 4 arises when the two variables a i+2 and b i+2 are processed in the same moment by the last XOR gate, as this could leak the value a i+2 ⊕ b i+2 = r i+2 which is unshared. As mentioned above, this would not be possible without glitches: they make an attack feasible with a single probe at the output of the XOR. In our model, this translates to the existence of {2} and {4} in the same literal corresponding to the XOR gate, since X 2 and X 4 are the input variables corresponding to b i+2 and a i+2. By running the model for all the \(2^{8} - 2^{4}\) non-trivial possible input transitions, we have found that 32 out of 240 match our vulnerability definition and could then lead to a critical first order leakage. At this point, the designer possesses valuable information to base security improvements on. In particular, leaving our gate-level abstraction, the designer can carefully tune place-and-route paths in order to minimise the occurrence and impact of those critical transitions. If such an operation is not feasible, the designer still has a valid and sound criterion for switching to a higher number of shares (three in the case of Keccak, since χ has degree 2).
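With the sketches above, this vulnerability check becomes a short script. The branch netlist below is our reading of Fig. 4 and Eq. 2 (gate indices may differ from the figure); a transition is flagged when the literal of the final XOR contains both input 2 (b i+2 ) and input 4 (a i+2 ). Running it should recover the 32-out-of-240 figure reported above, which was originally obtained with the authors’ tool.

NOT1 = lambda v: 1 ^ v[0]
AND2 = lambda v: v[0] & v[1]
XOR3 = lambda v: v[0] ^ v[1] ^ v[2]
LITS2 = {NOT1: L_NOT, AND2: L_AND, XOR3: L_XOR}

# Inputs: X1 = a_i, X2 = b_{i+2}, X3 = a_{i+1}, X4 = a_{i+2} (first branch of Eq. 2).
# Gates: AND(X3, X2), NOT(X3), AND(NOT(X3), X4) and the final three-input XOR.
BRANCH_A = [(AND2, [('X', 2), ('X', 1)]),
            (NOT1, [('X', 2)]),
            (AND2, [('s', 1), ('X', 3)]),
            (XOR3, [('X', 0), ('s', 0), ('s', 2)])]

critical = 0
for a_prev, a_new in nontrivial_transitions(4):
    _, literals = lp_model(BRANCH_A, LITS2, a_prev, a_new)
    if {2, 4} <= literals[-1]:        # both shares of r_{i+2} reach the last XOR
        critical += 1
print(critical, "critical transitions out of 240")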

The sharing scheme we have analysed [2] has not gained much popularity due to its weakness in the presence of glitches. However, our analysis is able to capture more details: we can quantify and list all those transitions threatening the security of unshared values. In this case, a designer could just patch them while being sure that all the others will never show a critical leakage of the first order even in the presence of glitches. We note that the possibilities for such a patch already exist in the literature. For instance, the work by Leiserson et al. [12] presented masked gates resilient to glitch propagation: it could be the case that a clever combination of our approaches might lead to beneficial results, for example by masking only those gates being critical under a certain vulnerability definition and not others, so as to spare resources. Another possible heuristic approach would be to link the results shown by our model to practical considerations on actual vulnerability. This means that it might be possible to bound the SNR and other attack success metrics given the simulation provided by our tool, to infer the practical (in)feasibility of attacks. However, since our aim was just to exemplify the potentiality of our model, we consider the latter modifications as being out of scope for the present work, but an interesting future direction towards sound and lightweight countermeasures.

5.3 χ with Three Shares

To overcome the presence of glitches causing a leakage of the first order, the three-share Boolean scheme is generally adopted. Each Keccak row r is split into three shares \(a,b,c \in \mathbb {Z}_{2}^{5}\), such that r = a ⊕ b ⊕ c [2]. Now Eq. 1 can be masked in the following way:

$$ \begin{array}{lllllll} a_{i} & \longleftarrow b_{i} + \overline{b}_{i+1}b_{i+2} + b_{i+1}c_{i+2} + c_{i+1}b_{i+2} \\ b_{i} & \longleftarrow c_{i} + \overline{c}_{i+1}c_{i+2} + c_{i+1}a_{i+2} + a_{i+1}c_{i+2} \\ c_{i} & \longleftarrow a_{i} + \overline{a}_{i+1}a_{i+2} + a_{i+1}b_{i+2} + b_{i+1}a_{i+2} \end{array} $$
(3)

XORing the equations in Eq. 3 allows one to retrieve the χ i function in Eq. 1. Moreover, each equation in Eq. 3 never processes all the shares of a native value. For example, the map producing a i operates on one share of the native variable r i (b i ), two shares of r i+1 (b i+1 and c i+1) and two of r i+2 (b i+2 and c i+2). Note that the missing share is exactly the one being output, coherently with the definition of threshold implementations [18, 19]. Since all the branches are symmetric, their netlists are the same. The one producing a i is depicted in Fig. 5, where the input vector X = (X 1, X 2, X 3, X 4, X 5) refers to (b i , c i+2, b i+1, b i+2, c i+1).

Fig. 5 Netlist of χ i for one share, in the case of three shares

Considering only one share function, no leakage of a native value can appear, since its shares are never all involved in the computation of the four-input XOR by construction. As we have just mentioned, the three branches are symmetric; hence, we can focus our analysis on the two branches that produce a i and b i without loss of generality. If we monitor such circuits in at least two different points, it should be possible to observe some leakages that can give information on a native variable if combined together. In the literature, this is usually referred to as high-order leakage. Firstly, we jointly consider the two circuits, i.e. such that the input vector is X = (X 1, X 2, X 3, X 4, X 5, X 6, X 7, X 8), which corresponds to the vector of shares (b i , c i+2, b i+1, b i+2, c i+1, c i , a i+2, a i+1).

Since the number of inputs is 8, the number of transitions is \(2^{16}\), and among them \(2^{16} - 2^{8} = 65280\) are non-trivial. As an example, we choose to show the behaviour of our tool on the transition 00111111 → 01101000: the execution of the glitch-counting algorithm is reported in Table 4, where we have adopted some simplifications to make the table smaller. The variables X 1, X 2, X 3, X 4, X 5, X 6, X 7 and X 8 are set, respectively, to 0, 01, 1, 10, 1, 10, 10 and 10; hence they are not reported.

Table 4 Glitch-counting algorithm’s execution for the two branches of the three-share χ i

The next step is to apply the literifiers, starting from gates at height 1, i.e. gates producing s 1, s 2, s 4 for the first circuit (equations on the left) and gates producing s 6, s 7, s 9 for the second one (equations on the right).

Then, literals for gates at height more than one are computed too, i.e. for gates producing s 3, s 5 for the first circuit (upper equations) and s 8, s 10 for the other (lower equations).

$$\begin{array}{@{}rcl@{}} L_{\mathtt{AND}}((0,\emptyset),(10,\{4\})) &\,=\,& \emptyset \\ L_{\mathtt{XOR}}((0,\{1\}),(01,\{2\}),(0,\emptyset),(10,\{4\})) &\,=\, &\{2,4\} \\ L_{\mathtt{AND}}((0,\emptyset),(01,\{2\})) &\,=\, &\emptyset \\ L_{\mathtt{XOR}}((10,\{6\}),(10,\{7\}),(0,\emptyset),(010,\{2,8\})) &\,=\,& \{2,6,7,8\} \end{array} $$

Figure 6 summarises the execution of both the glitch-counting algorithm and the LP model for the transition described in the example.

Fig. 6 Branches of the three-shared χ i circuit after LP model; on the left, the first circuit producing a i and on the right, the second one producing b i

The first circuit can have a leakage of input variables X 2 and X 4 that refer to shares c i+2 and b i+2, while in the second circuit, there can be a leakage of X 2, X 6, X 7 and X 8 corresponding to shares c i+2, c i , a i+2 and a i+1. Then, with just two probes, an adversary could learn information on shares a i+2, b i+2 and c i+2, making it possible to recover the native value r i+2. Obviously, this is no longer a first order leakage, since an attacker has to use at least two probes to implement the attack.

Considering only two branches of the whole sharing scheme, there are some transitions that match the above vulnerability definition and could then lead to a critical high-order leakage of the two native values r i+1 and r i+2. In particular, there are 6016 transitions producing a leakage on r i+1, namely roughly 9.18% of all transitions, while there are 6144 transitions producing a leakage on r i+2, roughly 9.38%. Finally, 1024 of these transitions lead to leakage on both r i+1 and r i+2, i.e. 1.56% of all transitions.

In conclusion, analysing the three-sharing scheme for χ i with the LP model, we have deduced that no critical first order leakage can happen, since by placing only one probe, it is not possible to recover information about all the shares of a native value. Instead, studying the propagation of glitches in two circuits, we have noticed that there can be a critical high-order leakage of some native variable. In particular, an adversary being able to place two probes can retrieve a variable which is correlated to a sensitive one, giving rise to an attack of the second order. This is a theoretical result shown by our model; hence, practical experiments to verify the existence of the above would be a very valuable future direction, also considering that the previous best attack against the three-sharing scheme is a third order attack by Bertoni et al. [1]. Furthermore, we notice that all the above discussions remain true whichever two branches are chosen to run the LP model on, not just the ones returning a i and b i . The latter fact reveals how such an attack can work equally well against all three pairs of branches.

6 Computational Effort and Multi-output Circuits

Since our aim is not to find a specific method for Keccak but rather a generic methodology, there are two further topics that need to be addressed: the computational complexity for a generic circuit and the applicability of the method to multi-output combinatorial circuits.

The former topic has been partially addressed in Section 3.1 for the glitch-counting algorithm (Theorem 2) and in Section 4.2 for the LP model. If we refer to Keccak as a practical example and we think of an implementation performing one round in one clock cycle, the target combinatorial circuit is the concatenation of θ, one of the linear maps, and χ [3]. This combinatorial circuit can be seen as a circuit with 33 input bits and 1 output bit in the unprotected version, while the protected version using two shares is a 44-input circuit [2]. As described in Section 4.2, this would turn into computing the propagation of glitches through k gates for each of the \(2^{2m} - 2^{m}\) non-trivial input transitions. Considering that the computation can be parallelised and the evaluation of the glitch-counting algorithm is not a very complex computation, we claim that the method could be applicable to a circuit with 44 inputs but would require a well-optimised implementation.

Multi-output circuits are also a very interesting target. In such circuits, there are gates contributing to the computation of different output bits. One approach for tackling these circuits is to divide the circuit into N independent circuits with a single output, where N is the number of outputs of the initial combinatorial logic. Nothing prevents the model from being applied as it is to each of those separately, but meaningful vulnerability definitions would be required to correctly interpret the results.

7 Conclusions

In their work, Brzozowski and Ésik [7] have developed a mathematical structure to estimate the potential waste of power of a circuit due to glitches. Our first contribution is the expansion of such a framework to include a formal definition of leakage. We have then defined a formal procedure to analyse circuits in the d-probing model which takes into account the effect of glitches on the order of operations. Our work analyses only the combinatorial logic and hence achieves a good level of generality since it is not affected by real-world constraints. As a consequence, the LP model allows one to assess how much a given protection scheme can be weakened by glitches, thus enabling a deep analysis. Using the proposed methodology, a designer might explore alternative workflows for solving local problems of glitches instead of adopting more costly solutions.