1 Introduction

With the rise of the Internet of Things (IoT), devices implementing authenticated encryption schemes will become ubiquitous, a trend that NIST plans to address with standardization efforts in the area of lightweight authenticated encryption schemes [21, 24]. As a consequence, authenticated encryption schemes will increasingly be deployed on devices in environments where physical access by malicious entities is unavoidable. Hence, implementation attacks such as side-channel attacks and fault attacks are a major concern for such devices, as demonstrated, e.g., by Ronen, Shamir, Weingarten, and O’Flynn [28] in their attack on smart lamps. To identify and protect against the potential threats raised by implementation attacks, research into the practicability and applicability of implementation attacks on authenticated encryption schemes is needed.

As observed in several publications [29,30,31], the uniqueness of the nonce in authenticated encryption schemes prohibits the straightforward application of prominent fault attacks like differential fault analysis (DFA) [10] to authenticated encryption. In the case of authenticated decryption, the built-in validation of the authenticity of the processed data often provides an implicit detection of induced faults. Therefore, many of the attacks published so far assume scenarios where the uniqueness of the nonce is not ensured [31] or unverified plaintext is released [29], or even require a precise induction of faults at multiple locations during one execution of the authenticated encryption scheme [30]. Recently, statistical fault attacks (SFA) that are applicable to a wide range of AES-based authenticated encryption schemes, including popular schemes like GCM, CCM and OCB, have been published [15]. However, the presented attacks face some limitations. In particular, they are only applicable to schemes where the secret key is processed right before the data is output. Thus, they are typically not applicable to sponge or stream cipher-based constructions. Moreover, they only work in the case of authenticated encryption, leaving fault attacks targeting authenticated decryption (assuming that the unverified plaintext is not released) as an open problem.

Our Contribution. In this work, we close the aforementioned gaps. We present what are, to the best of our knowledge, the first fault attacks targeting authenticated decryption/verification that are applicable to a broad range of nonce-based authenticated encryption schemes. In particular, the presented attacks are applicable whenever the nonce is mixed with the secret key during the initialization, as is the case in a wide range of authenticated encryption schemes. This includes sponge and stream cipher-based authenticated encryption schemes, for which most of the existing fault attacks are not applicable.

We focus our analysis on Keyak and Ketje designed by Bertoni, Daemen, Peeters, Van Assche, and Van Keer [6, 7]. Both designs are based on the Keccak permutation [4], which also underlies Keccak/SHA-3 [23]. Please note that the presented attacks do not exploit a weakness inherent in the design of Keyak and Ketje; these two primitives just serve as an example to show the applicability of fault attacks on sponge and stream cipher-based authenticated encryption schemes.

Our attacks are based on statistical ineffective fault attacks [14, 16] and do not require extensive profiling or characterization of the attacked device. Additionally, they are resilient against “errors” induced by mislocated faults or, more generally, by fault inductions that do not behave as intended. As a consequence, they can be easily applied in practice, as demonstrated by our attack targeting 8-bit software implementations of Keyak and Ketje running on an AVR Xmega 128D4. After inducing faults during authenticated decryptions and filtering for the inputs of 24 unaffected computations, we can recover large parts of the secret keys. The remaining unknown key bits can then either be brute-forced or further reduced by repeating the attack and inducing the fault at a different point in time.

Outline. In Sect. 2, we cover the required background of our attack. After describing the state of the art of fault attacks, we give a short overview of authenticated encryption schemes. We provide a more detailed description of Keyak and Ketje, the two authenticated encryption schemes that are the main target of our practical attack evaluation, in Sect. 3. In Sect. 4, we discuss the idea and working principle of the attack. Section 5 describes the practical evaluation of our fault attack on a real microprocessor. We conclude the paper in Sect. 6.

2 Background

In this section, we give a brief introduction to fault attacks in general and state the idea behind Statistical Ineffective Fault Attacks (SIFA), recently proposed by Dobraunig, Eichlseder, Korak, Mangard, Mendel, and Primas [16], in more detail. We then recall the concept of nonce-based authenticated encryption with associated data.

2.1 Fault Attacks

The threat of fault attacks was demonstrated by Boneh, DeMillo, and Lipton [11] in 1997 when they showed the vulnerability of several asymmetric primitives like RSA to erroneous computations. Since then, fault attacks have been demonstrated targeting many other cryptographic schemes [9], including symmetric ones [10, 25].

The ways in which faults can be induced into a cryptographic computation are manifold. Originally, the most popular fault attacks were based on clock glitches or variations of the supply voltage. Over time, however, more and more sophisticated fault induction methods were presented, such as attacks based on lasers [32], EM pulses [19], or even X-rays [1].

While the induction of (more or less) precise faults into a cryptographic computation is an essential prerequisite for the attack, the exploitation of the observed erroneous behavior is equally important. Biham and Shamir [10] proposed Differential Fault Analysis (DFA) as an effective key recovery method for DES. DFA requires the collection of pairs of valid and faulty ciphertexts where a fault was induced in the last few rounds of the encryption. The difference between valid and faulty ciphertexts, together with knowledge about the faulted operation, can then be used to recover the secret key. Later, it was shown that DFA is not limited to DES and can be applied to a broad range of block ciphers.

One immediate consequence of fault attacks was the evaluation of possible countermeasures that can prevent such attacks. One commonly used countermeasure is the detection of the induced fault by means of redundancy like double-execution [2]. Here, the cryptographic computation is performed twice and the output is only released, if the results of both computations match up. While double-execution does prevent the attacks presented so far, a more powerful attacker can still succeed by either inducing a fault that skips the final comparison or by inducing a fault with equivalent effect during both computations. On top of that, Safe Error Attacks (SEA) [34] or Ineffective Fault Analysis (IFA) [13] solely rely on valid outputs of faulted cryptographic computations and hence are unaffected by double-execution.

So far, most fault attacks require the attacker to send specific inputs multiple times to the attacked cryptographic implementation. This raises the question whether or not such attacks also apply to nonce-based authenticated encryption schemes where unique nonces prevent attackers from doing so. Indeed, the feasibility of fault attacks has been shown by Dobraunig, Eichlseder, Korak, Lomné, and Mendel [15] for various block cipher-based authenticated encryption schemes by using Statistical Fault Attacks (SFA) [18]. However, their attacks face some limitations. For instance, they are not applicable in a straight-forward manner to most sponge-based and stream cipher-based authenticated encryption schemes. In our attack, we make use of Statistical Ineffective Fault Attacks (SIFA) [16] that build upon the concepts of both SFA [18] and IFA [13].

2.2 Statistical Ineffective Fault Attacks

The Statistical Ineffective Fault Attack (SIFA) [16] is a technique that exploits distributions of faults that have been induced, but do not affect the outcome of a computation (ineffective faults). Concretely, the effect of an induced fault depends on the values that are currently processed by a device. As a consequence, the distribution of the values where an induced fault does not change the processed value is often biased in practice. This distribution can then be exploited in attacks, which cannot be precluded by popular detection/infection countermeasures [16]. As shown in [14], even additional masking does not preclude such attacks.

To discuss the basic working principles of SIFA, let us consider an encryption where an attacker is able to force (using fault inductions) one specific intermediate value to follow an unknown but non-uniform distribution during the computation. Such fault inductions are rather easy to achieve in practice as it has been shown, e.g. in [16] by using clock glitches for various microprocessors, or in [15] by using lasers on a hardware AES co-processor. If we continuously perform such faulted encryptions we will probably observe plain- or ciphertexts where the fault was ineffective. In those cases, the distribution of the targeted value, where the fault has been ineffective, might also follow a biased/non-uniform distribution.

Once the attacker has collected a sufficiently large set of unaffected plain- or ciphertexts, key recovery can be performed as follows. First, the attacker needs to identify all key bits that are involved in the calculation of the targeted value. Clearly, the time frame of the fault induction has to be either towards the beginning or the end of the encryption such that the targeted value only depends on parts of the key. Hence, when attacking sponge or stream cipher-based authenticated encryption schemes, the usual location for fault inductions is the initialization phase. Next, she calculates the targeted value for each collected unaffected plain- or ciphertext and every possible key candidate. The targeted value should, when calculated using the correct key candidate, follow a non-uniform distribution (which is usually not known to the attacker). In contrast, the calculated distribution for a wrong key guess is typically unrelated to the event that there has been an ineffective fault and hence, is expected to be closer to uniform. As a consequence, we are able to distinguish wrong key guesses from a right key guess. For a detailed description of the working principles of the attack including statistical background and on the effects of faults we refer to [14, 16].
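The key-ranking step described above can be illustrated with a deliberately tiny model. The following Python sketch is an assumption-laden toy and not the attack code used in this work: the 3-bit state, the function names, the idealized stuck-at-0 fault, and the secret key value are all hypothetical. It scores every key guess by the bias of a single targeted bit over the nonces whose fault was ineffective.

```python
from itertools import product

def target_bit(nonce, key):
    """Toy intermediate: chi-like bit a0 ^ (~a1 & a2) with a = nonce XOR key."""
    a0, a1, a2 = (n ^ k for n, k in zip(nonce, key))
    return a0 ^ ((a1 ^ 1) & a2)

def fault_is_ineffective(nonce, key):
    """Idealized stuck-at-0 fault on the target bit: harmless iff the bit is already 0."""
    return target_bit(nonce, key) == 0

def rank_key_guesses(filtered_nonces):
    """Score each 3-bit key guess by |Pr[target = 1] - 1/2| over the filtered nonces."""
    scores = {}
    for guess in product((0, 1), repeat=3):
        ones = sum(target_bit(n, guess) for n in filtered_nonces)
        scores[guess] = abs(ones / len(filtered_nonces) - 0.5)
    return scores

secret_key = (1, 0, 1)  # hypothetical key, unknown to the attacker

# Every nonce is used exactly once; only nonces with an ineffective fault are kept.
filtered = [n for n in product((0, 1), repeat=3)
            if fault_is_ineffective(n, secret_key)]

scores = rank_key_guesses(filtered)
top = max(scores.values())
best = sorted(k for k, s in scores.items() if s == top)
print(best)  # [(0, 0, 1), (1, 0, 1)]
```

In this toy, two guesses tie for the highest score: the correct key and the guess that differs only in the first bit, which enters the targeted bit linearly and therefore only flips the sign of the bias. All guesses that are wrong in one of the other two bits yield a bias of zero here.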

2.3 Authenticated Encryption

An authenticated encryption scheme provides confidentiality and authenticity for a given plaintext. It is usually modeled as a function of four input parameters: a secret key K, unique nonce N, associated data A and plaintext P [26]. The output of authenticated encryption is a tuple that consists of a ciphertext C and tag T:

$$\begin{aligned} \mathcal {E}(K,N,A,P) = (C,T) \end{aligned}$$

The corresponding authenticated decryption takes the following five inputs: a secret key K, unique nonce N, associated data A, ciphertext C and tag T. During decryption T is used to verify the authenticity of A and C. If they are not authentic the original plaintext P is not released and the special error symbol \(\bot \) is returned instead:

$$\begin{aligned} \mathcal {D}(K,N,A,C,T) \in \{P,\bot \} \end{aligned}$$
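To make the two interfaces concrete, the following Python sketch implements a toy nonce-based scheme with the signatures of \(\mathcal {E}\) and \(\mathcal {D}\). It is only an illustration under stated assumptions: the construction (SHAKE128 from Python's `hashlib` as a stand-in for a keyed sponge), the helper `_xof`, and the 16-byte tag length are our own choices, it is neither Keyak nor Ketje, and it is not meant to be secure. `None` plays the role of \(\bot \).

```python
import hashlib
import hmac

def _xof(label, *fields, n):
    """Hash length-prefixed fields with SHAKE128 and return n output bytes."""
    h = hashlib.shake_128()
    h.update(label)
    for field in fields:
        h.update(len(field).to_bytes(8, "big") + field)
    return h.digest(n)

def encrypt(key, nonce, ad, plaintext):
    """E(K, N, A, P) = (C, T) for the toy scheme."""
    stream = _xof(b"stream", key, nonce, n=len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = _xof(b"tag", key, nonce, ad, ciphertext, n=16)
    return ciphertext, tag

def decrypt(key, nonce, ad, ciphertext, tag):
    """D(K, N, A, C, T): return P if the tag verifies, otherwise None (bottom)."""
    expected = _xof(b"tag", key, nonce, ad, ciphertext, n=16)
    if not hmac.compare_digest(expected, tag):
        return None  # unverified plaintext is never released
    stream = _xof(b"stream", key, nonce, n=len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

An observer of `decrypt` who never sees the plaintext still learns one bit per call, namely whether verification succeeded.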

The concrete implementation of authenticated encryption schemes can differ significantly. Currently, many of the popular schemes like GCM [20], CCM [33], EAX [3], and OCB [27] are all based on block ciphers like AES. However, since the announcement of CAESAR [12], we can also see an increasing number of stream cipher-based and sponge-based authenticated encryption schemes. In the next section, we will present two such sponge-based designs: Keyak and Ketje, in more detail, since we will use them to describe the attack and for the practical evaluation.

3 Keyak and Ketje

Keyak [7] and Ketje [6] are sponge-based authenticated encryption schemes. Their design is heavily inspired by the hash function Keccak  [4], the winner of the SHA-3 competition. While both schemes make use of variants of the permutation in Keccak, their modes of operation are slightly different. At first, we give a short description of Keccak and its underlying permutation. We then describe how Keyak and Ketje make use of the Keccak permutation in order to build an authenticated encryption scheme.

3.1 Keccak

Keccak is a sponge-based hash function that was selected as the winner of the SHA-3 competition. It is parameterized by the permutation Keccak-f, rate r, and capacity c.

Keccak-f, more precisely denoted by Keccak-f[b], is an iterated permutation that operates on a b-bit state that is organized in \(5 \times 5\) lanes of \(2^l\) bits where l ranges from 0 to 6. The number of rounds \(n_r\) is determined by the width of the permutation and is equal to \(12+2l\). Keccak-f consists of the 5 operations: \(\theta ,\rho ,\pi ,\chi ,\iota \) that are applied to the state in the presented order in every round. From these 5 operations \(\chi \) is the only non-linear transformation. The purpose of \(\theta , \pi \) and \(\rho \) is to cause diffusion while \(\iota \) breaks any symmetries.

In the case of Keccak, the lane size l equals 6, thus the state has a size of \(5 \times 5 \times 64 = 1600\) bits and the number of rounds \(n_r\) is \(12+2 \times 6=24\). Depending on the desired security, c is chosen as twice the desired preimage resistance in bits and \(r = 1600-c\). Following the sponge construction design principle, Keccak can be divided into two phases: an initial absorbing phase and a subsequent squeezing phase. During the absorbing phase input chunks of r bits are repeatedly XOR-ed into the state and subsequently processed by Keccak-f. Once all input chunks have been absorbed, a chunk of the desired hash bit-size can be extracted from the state (squeezing phase).
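The relation between the lane-size parameter l, the state width b, and the number of rounds can be checked with a few lines of Python. This is a minimal sketch; the function name `keccak_f_parameters` is our own.

```python
def keccak_f_parameters(l):
    """Return (state width b in bits, number of rounds n_r) of Keccak-f for 0 <= l <= 6."""
    if not 0 <= l <= 6:
        raise ValueError("l must be between 0 and 6")
    lane_bits = 2 ** l           # each of the 5 x 5 lanes has 2^l bits
    width = 5 * 5 * lane_bits    # b = 25 * 2^l
    rounds = 12 + 2 * l          # n_r = 12 + 2l
    return width, rounds

print(keccak_f_parameters(6))  # (1600, 24)
print(keccak_f_parameters(3))  # (200, 18)
```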

Besides Keccak-f, a variety of similar permutations were proposed by the Keccak designers. In contrast to Keccak-f, in Keccak-p the number of rounds \(n_r\) does not depend on the state size b anymore and can be set to any positive integer. This allows for more flexibility in the design of Keccak-based cryptographic primitives. The state size b is however still restricted to the same values. Next, we give basic descriptions of the authenticated encryption schemes Keyak and Ketje.

3.2 Keyak

Keyak is an authenticated encryption scheme that uses the Motorist mode of operation and is based on the Keccak-p permutation. Even though Keyak supports a parameterized degree of parallelism we limit our description to the (recommended) Lake Keyak variant that does not support parallelization and thus can be used even on constrained devices. Lake Keyak utilizes a 1600-bit state, uses the 12-round Keccak-p[1600,12] permutation and performs authenticated encryption with 128 to 256 bits of secret key, up to 150 bytes of nonce and 128-bit tags. In the following, we describe the Motorist mode of operation, as used in Keyak. Whenever we refer to Keyak we mean Lake Keyak.

Motorist Mode. The Motorist mode defines how incoming messages are processed together with key, nonce, associated data and tag in Keyak. It is closely related to the duplex construction [5], with the main difference being the size of the input blocks. While the original duplex construction only allows input blocks as large as the outer part (rate r) of the underlying permutation, Motorist uses full-state keyed duplexes [22] that can make use of the full width of the permutation and thus allow higher throughput as shown in Fig. 1.

Fig. 1. Lake Keyak. \(f_{12}\) denotes the Keccak-p[1600,12] permutation, \(\sigma \) denotes the input string, and Z denotes the key stream.

3.3 Ketje

Ketje is an authenticated encryption scheme that consists of 2 parts: the mode of operation MonkeyWrap and the permutation Keccak-\(p^*\). While 4 different versions of Ketje have been proposed by the designers for the 4 different permutation sizes of 200, 400, 800 and 1600 bits, our practical evaluation is performed on Ketje Jr. The main use case of Ketje Jr is lightweight authenticated encryption for constrained devices. Hence, the permutation is based on Keccak-\(p[200,n_r]\), meaning that only a rather small 200-bit state is used and the number of permutation rounds \(n_r\) is variable. Ketje Jr performs authenticated encryption with a 96-bit secret key and up to 86 bits of nonce. In contrast to Keccak and Keyak, in Ketje every call of the permutation is slightly twisted. The twisted permutation Keccak-\(p^*\) is an extended version of the standard permutation Keccak-p. It always starts with an additional call of \(\pi ^{-1}\) and ends with an additional call to \(\pi \). In the following, we describe the MonkeyWrap mode of operation, as used in Ketje. Whenever we refer to Ketje, we mean Ketje Jr.

MonkeyWrap. The MonkeyWrap mode defines how incoming messages are processed together with key, nonce, associated data and tag in Ketje.

The initialization of MonkeyWrap is called Start which is similar to the Motorist mode. First, key K and nonce N are XOR-ed into the zero-initialized state. Then 12 rounds of twisted Keccak-\(p^*\) permutation are performed.

The key stream generation Step is accomplished by performing duplexing calls, yet this time not the full width of the permutation is utilized, as illustrated in Fig. 2. Since the rate r of the permutation in Ketje is very small only a 1-round twisted Keccak-\(p^*\) permutation is needed in between Step calls.

Before the extraction of the tag starts, a 6-round twisted Keccak-\(p^*\) permutation is performed.

Fig. 2. Ketje Jr. \(f_{n_r}\) denotes the application of a \(n_r\)-round twisted Keccak-\(p^*[200]\) permutation, \(\sigma \) denotes the input string, and Z denotes the key stream.

4 Attack Strategy

In our attack, we target the decryption/verification of Lake Keyak and Ketje Jr (\(\mathcal {D} (K,N,A,C,T)\)). To be precise, we observe the behavior of the authenticated decryption of valid messages (N, A, C, T) in the presence of faults that are induced during the initialization phase. For both schemes, the initialization is the application of variants of Keccak-p to a state that is composed of the secret key K and a publicly known nonce N. If the fault induction changes the outcome of this computation, the tag \(T'\) computed afterwards will differ from the transmitted tag T and thus the verification will fail. If the induced fault does not change the outcome of the initialization, the verification will succeed and the authenticated decryption will return a plaintext. Please note that the actual plaintext is not needed for the attack; we solely assume that the attacker is able to distinguish a failed verification from a successful one.

As shown in [16], inducing faults in multiple runs of the same computation with differing inputs, followed by a subsequent filtering for unaffected computations, most likely leads to a biased distribution in the targeted intermediate value. In our case, unaffected computations (and thus ineffective faults) can be deduced from the condition that the verification succeeds. Hence, we assume that the attacker is able to affect one or multiple bits (\(A_{\chi _2}[x,y,z]\)) of the internal state before the application of \(\chi \) in the \({2}^\mathrm{nd}\) round of the initialization, so that the distribution of these bits is non-uniform for the filtered inputs (N, A, C, T). More concretely, we assume that the attacker is able to collect several nonces N, which lead to one or multiple biased bits before the \({2}^\mathrm{nd}\) round \(\chi \)-layer of the initialization.

Out of this knowledge, the attacker is able to extract information about the secret key. In the following section, we give a detailed description of how key recovery is achieved for Keyak. A very similar approach can then be used to perform key recovery for Ketje Jr. The major difference is the fact that a 200-bit permutation is used and hence bits of the equivalent key directly before the application of the \({1}^\mathrm{st}\) round \(\chi \)-layer are guessed in the attack.

4.1 Involved Bits in Keyak

Information about the secret key can be deduced by identifying the key bits that are involved in the calculation of \(A_{\chi _2}[x,y,z]\) and evaluating the value of \(A_{\chi _2}[x,y,z]\) under every possible assignment of these key bits for every previously collected value of the nonce N. For the right key guess, we expect to observe the highest bias in the values of \(A_{\chi _2}[x,y,z]\). First, however, we have to identify the involved bits.

First, we need to determine the bits at the input of the linear layer of the \({2}^\mathrm{nd}\) round, which are involved in the calculation of \(A_{\chi _2}[x,y,z]\). The linear layer of one round of Keccak-p[1600, 12] consists of the application of the single round functions \(\theta \), \(\rho \), and \(\pi \). The function \(\pi \) just swaps the words, so that

$$\begin{aligned} A_{\chi _2}[x,y,z] = A_{\pi _2}[(x+3y)\ \mathrm {mod}\ 5,x,z] \ . \end{aligned}$$

The function \(\rho \) rotates each lane by a different offset R[x, y]. Hence,

$$\begin{aligned} A_{\chi _2}[x,y,z] = A_{\rho _2}[(x+3y) \ \mathrm {mod}\ 5,x,(z-R[(x+3y) \ \mathrm {mod}\ 5,x])\ \mathrm {mod}\ 64] \ . \end{aligned}$$

Finally, \(\theta \) computes its output by XOR-ing each bit with the parity of two columns in the array, thus, one bit \(A_{\chi _2}[x,y,z]\) is the sum of 11 input bits to \(\theta \).

$$\begin{aligned} A_{\chi _2}[x,y,z] =&A_{\theta _2}[(x+3y) \ \mathrm {mod}\ 5,x,(z-R[(x+3y) \ \mathrm {mod}\ 5,x])\ \mathrm {mod}\ 64] \\ \oplus \bigoplus _{y'=0}^4&A_{\theta _2}[(x+3y-1) \ \mathrm {mod}\ 5,y',(z-R[(x+3y) \ \mathrm {mod}\ 5,x])\ \mathrm {mod}\ 64] \\ \oplus \bigoplus _{y'=0}^4&A_{\theta _2}[(x+3y+1) \ \mathrm {mod}\ 5,y',(z-R[(x+3y) \ \mathrm {mod}\ 5,x]\!-\!1)\ \mathrm {mod}\ 64] \end{aligned}$$

Each of the 11 bits \(A_{\theta _2}[x_i, y_i, z_i]\) can be calculated using three input bits to \(\chi \). Therefore,

$$\begin{aligned} A_{\theta _2}[x_i, y_i, z_i] =&A_{\chi _1}[x_i, y_i, z_i] \oplus \\&((A_{\chi _1}[(x_i+1) \text { mod } 5, y_i, z_i] \oplus 1) \cdot A_{\chi _1}[(x_i+2) \ \mathrm {mod}\ 5, y_i, z_i]) \, . \end{aligned}$$
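The \(\chi \) relation above acts on one row of 5 bits at a time (fixed y and z). As a small illustration, the following Python sketch applies it to a single row; the function name `chi_row` and the list-of-bits representation are our own choices.

```python
from itertools import product

def chi_row(row):
    """Apply chi to one 5-bit row: out[x] = in[x] ^ (~in[x+1] & in[x+2]), indices mod 5."""
    return [row[x] ^ ((row[(x + 1) % 5] ^ 1) & row[(x + 2) % 5]) for x in range(5)]

print(chi_row([0, 0, 1, 0, 0]))  # [1, 0, 1, 0, 0]

# chi is a permutation of the 32 possible rows, i.e., it acts as a 5-bit S-box.
images = {tuple(chi_row(list(r))) for r in product((0, 1), repeat=5)}
print(len(images))  # 32
```

Each output bit depends on exactly three input bits of the same row, which is why every bit at the input of \(\theta \) in the \({2}^\mathrm{nd}\) round contributes three bits at the input of \(\chi \) in the \({1}^\mathrm{st}\) round.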

Note that two bits at the input of \(\theta \) in the \({2}^\mathrm{nd}\) round needed in the calculation of \(A_{\chi _2}[x,y,z]\) are adjacent bits of the same S-box, namely

$$\begin{aligned}&A_{\theta _2}[(x+3y) \ \mathrm {mod}\ 5,x,(z-R[(x+3y) \ \mathrm {mod}\ 5,x])\ \mathrm {mod}\ 64] \quad \text {and} \\&A_{\theta _2}[(x+3y-1) \ \mathrm {mod}\ 5,x,(z-R[(x+3y) \ \mathrm {mod}\ 5,x])\ \mathrm {mod}\ 64] \ . \end{aligned}$$

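These two bits share two of their three input bits to \(\chi \). As a consequence, instead of \(11 \cdot 3 = 33\) bits, only 31 bits of \(A_{\chi _1}[x_j, y_j, z_j]\) are involved in the calculation of \(A_{\chi _2}[x,y,z]\).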
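The index bookkeeping of this subsection can be reproduced mechanically. The following Python sketch traces one bit \(A_{\chi _2}[x,y,z]\) back through \(\pi \), \(\rho \), \(\theta \) and the first-round \(\chi \) and counts the bits involved. It is a sketch and not the search tool [17]: the function names are our own, the rotation offsets are generated with the standard Keccak rule, and \(\iota \) is ignored because adding a round constant does not change which bits are involved.

```python
def rho_offsets(w=64):
    """Rotation offsets R[x][y] of Keccak's rho step for lane size w."""
    R = [[0] * 5 for _ in range(5)]
    x, y = 1, 0
    for t in range(24):
        R[x][y] = ((t + 1) * (t + 2) // 2) % w
        x, y = y, (2 * x + 3 * y) % 5
    return R

def theta2_inputs(x, y, z, w=64):
    """Positions at the input of theta (round 2) that determine A_chi2[x, y, z]."""
    R = rho_offsets(w)
    cx, cy = (x + 3 * y) % 5, x      # undo pi
    cz = (z - R[cx][cy]) % w         # undo rho
    bits = {(cx, cy, cz)}            # the bit itself
    bits |= {((cx - 1) % 5, yy, cz) for yy in range(5)}            # column x-1
    bits |= {((cx + 1) % 5, yy, (cz - 1) % w) for yy in range(5)}  # column x+1, slice z-1
    return bits

def chi1_inputs(x, y, z, w=64):
    """Positions at the input of chi (round 1) that determine A_chi2[x, y, z]."""
    bits = set()
    for xi, yi, zi in theta2_inputs(x, y, z, w):
        # each chi output bit depends on inputs x, x+1, x+2 of the same row
        bits |= {((xi + d) % 5, yi, zi) for d in range(3)}
    return bits

print(len(theta2_inputs(0, 0, 0)))  # 11
print(len(chi1_inputs(0, 0, 0)))    # 31
```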

The bits at the input to the \({1}^\mathrm{st}\) round that are needed to compute the 31 bits \(A_{\chi _1}[x_j, y_j, z_j]\) can be determined in a similar manner as done for the second round. However, doing so for general values of x and y gets a bit clumsy, hence, we focus on the restricted case of calculating \(A_{\chi _2}[0,0,0]\). The necessary equations are given in Appendix B.

Determining the necessary bits to calculate \(A_{\chi _2}[x,y,z]\) by hand is quite time consuming and also error prone. Thus, we have used a search tool [17], which has been developed to search for linear characteristics to identify the bits at the input of the permutation that are involved in the calculation of a certain \(A_{\chi _2}[x,y,z]\). In Fig. 3, we give the involved bits for calculating \(A_{\chi _2}[0,0,0]\). The figure represents one lane as hexadecimal value, where bits that are set to 1 are needed in the calculation of \(A_{\chi _2}[0,0,0]\). A corresponding figure for Ketje Jr is given in Appendix A.

Fig. 3. Bits involved in the calculation of \(A_{\chi _2}[0,0,0]\). The position of the 128-bit key is highlighted in gray. Zeros are replaced by - to improve readability.

4.2 Recovered Bits

In this section, we will discuss how much information on the key bits can be recovered by exploiting a bias in \(A_{\chi _2}[x,y,z]\). For the sake of simplicity, we will stick to the example of \(A_{\chi _2} [0,0,0]\). Bits having a gray background in Fig. 3 are bits that represent the 128 key bits. Hence, to compute \(A_{\chi _2}[0,0,0]\), 25 bits of the key have to be guessed. However, from the equation given in Appendix B, we can see that only the 17 bits:

$$\begin{aligned}&A_{\theta _1}[0,0,0]\,\,\,, A_{\theta _1}[0,0,18], A_{\theta _1}[0,0,20], A_{\theta _1}[0,0,23], A_{\theta _1}[0,0,36], A_{\theta _1}[0,0,43],\\&A_{\theta _1}[0,0,53], A_{\theta _1}[0,0,54], A_{\theta _1}[1,0,2]\,\,\,, A_{\theta _1}[1,0,20], A_{\theta _1}[1,0,21], A_{\theta _1}[1,0,27],\\&A_{\theta _1}[1,0,48], A_{\theta _1}[1,0,58], A_{\theta _1}[1,0,59], A_{\theta _1}[1,0,63], A_{\theta _1}[2,0,62] \end{aligned}$$

can influence \(A_{\chi _2}[0,0,0]\) in a non-linear manner, while the 8 bits:

$$\begin{aligned}&A_{\theta _1}[0,0,19], A_{\theta _1}[0,0,42], A_{\theta _1}[0,0,49], A_{\theta _1}[1,0,3]\,\,\,, A_{\theta _1}[1,0,26], A_{\theta _1}[1,0,45],\\&A_{\theta _1}[1,0,57], A_{\theta _1}[2,0,61] \end{aligned}$$

only have a linear influence.

As a consequence, we can at most uniquely identify the 17 bits that influence \(A_{\chi _2}[0,0,0]\) in a non-linear way. For the 8 bits that influence \(A_{\chi _2}[0,0,0]\) in a linear way, only their XOR-sum (parity) affects the value of \(A_{\chi _2}[0,0,0]\). Since for 8 bits, half of the possible assignments have parity 0 and the other half have parity 1, we get at least \(2^7\) key candidates that always lead to the same result. Please note that this is a rather simplistic evaluation that considers neither the dependencies between the non-linear bits nor the bits that are used as nonce and constants. In fact, the key recovery depends on the values of these bits, since an unfortunate choice of the nonce can, for instance, lead to situations where some S-boxes are linearized for some key bits, or where some key bits are always blocked so that they do not influence \(A_{\chi _2}[0,0,0]\). For instance, let us have a look at the results of one of our concrete experiments given in Sect. 5. Instead of recovering 17 of the 25 bits uniquely from the \(2^7\) best-scoring key candidates, we are able to recover 15 of the 25 bits uniquely out of the \(2^9\) key candidates that score best.
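The counting argument for the linearly involved bits can be verified by brute force. The following Python sketch enumerates all assignments of 8 bits and groups them by parity; the helper name `assignments_by_parity` is hypothetical.

```python
from itertools import product

def assignments_by_parity(n_bits):
    """Group all assignments of n_bits bits by their XOR-sum (parity)."""
    groups = {0: [], 1: []}
    for bits in product((0, 1), repeat=n_bits):
        groups[sum(bits) % 2].append(bits)
    return groups

groups = assignments_by_parity(8)
print(len(groups[0]), len(groups[1]))  # 128 128
```

Both parity classes contain \(2^7 = 128\) assignments, so fixing only the parity of the 8 linear bits leaves \(2^7\) indistinguishable candidates.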

5 Practical Evaluation

We now describe the practical evaluation of our attack on a microprocessor implementation. Although we have performed attacks on both Lake Keyak and Ketje Jr, we limit our description to Lake Keyak, since the attack procedure is similar for both schemes. We do, however, state the results for both schemes at the end of this section. We start this section by giving a quick overview of the attack procedure in Sect. 5.1. We then describe the hardware/software that we have used to perform our attack evaluation in Sect. 5.2. After that, we state requirements on a fault setup more generally in Sect. 5.3. Finally, we present the results of our fault attacks on Lake Keyak and Ketje Jr in Sect. 5.4.

5.1 Attack Procedure

As described in Sect. 4, our key recovery exploits the input of specific Keyak decryptions. We are interested in decryptions that have a bias in one or multiple bits of the Keccak state before \(\chi \) in the \(2^\text {nd}\) round. To achieve the required filtering of inputs we use statistical ineffective fault attacks (SIFA), as proposed in [16].

Before the attack we set the secret key of the microprocessor Keyak implementation to a constant and unknown value. During the attack we send inputs, consisting of random nonce and tag, to the microprocessor, induce a clock glitch with constant offset during the computation and observe the behavior. The tag verification is used to detect whether or not an induced fault was ineffective.
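The collection phase can be sketched as a simple loop. The following Python code replaces the real hardware by a simulated device and is purely illustrative: the class `SimulatedDevice`, the toy function `target_bit`, the stuck-at-0 fault model, and the `hit_rate` parameter are assumptions of this sketch and do not describe the actual glitch behavior. Sending a valid (N, A, C, T) is abstracted to passing a fresh nonce.

```python
import random

def target_bit(key, nonce):
    """Toy stand-in for one state bit before chi in round 2 (not the real Keccak-p)."""
    a = key ^ nonce
    return (a & 1) ^ ((((a >> 1) & 1) ^ 1) & ((a >> 2) & 1))

class SimulatedDevice:
    """Hypothetical model of a glitched decryption; only success/failure is visible."""

    def __init__(self, key, hit_rate, rng):
        self._key, self._hit_rate, self._rng = key, hit_rate, rng

    def verify_with_glitch(self, nonce):
        """Return True iff tag verification succeeds despite the glitch."""
        if self._rng.random() >= self._hit_rate:
            return True                                 # glitch missed, computation unaffected
        return target_bit(self._key, nonce) == 0        # stuck-at-0 is harmless only for a 0 bit

def collect_ineffective(device, wanted, rng):
    """Send fresh nonces until `wanted` verifications have succeeded."""
    seen, kept, sent = set(), [], 0
    while len(kept) < wanted:
        nonce = rng.getrandbits(64)
        if nonce in seen:
            continue                                    # nonces must never repeat
        seen.add(nonce)
        sent += 1
        if device.verify_with_glitch(nonce):
            kept.append(nonce)
    return kept, sent

rng = random.Random(1)
device = SimulatedDevice(key=0b101, hit_rate=1.0, rng=rng)
kept, sent = collect_ineffective(device, wanted=24, rng=rng)
```

With `hit_rate` below 1, some of the kept nonces stem from glitches that simply missed; such inputs add noise to the later key ranking but do not prevent it.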

5.2 Attack Setup

The practical evaluation of our fault attack was done on an 8-bit Xmega 128D4 microprocessor. The attacked software implementation of Lake Keyak consists of two parts. The first part is a C implementation of the Motorist mode of operation. The second part is a fast 8-bit AVR optimized assembler implementation of the Keccak permutation. Both implementations are taken from the Keccak Code Package [8] and therefore represent a good target software implementation for our practical evaluation. The clock signal of the microprocessor is generated by a Spartan-6 FPGA running at 12 MHz. We additionally use this FPGA for the insertion of glitches onto the clock signal. The insertion of clock glitches is achieved by XOR-ing an additional fast clock edge onto the original clock signal at a specified time. By doing so, we can violate the critical path to force undefined behavior of the microprocessor.

In our practical evaluation we can force strong biases in virtually every state bit that is affected by \(\chi \), however only in blocks of 8 bits at a time (which is not surprising on an 8-bit architecture). We suspect that our glitch skips one of the XOR instructions in the bit-sliced \(\chi \) implementation, but we cannot say for sure.

5.3 Attack Setup - Requirements

As we use SIFA [16], the requirements we have on the locality and especially the effect of the fault are quite relaxed. Basically, we only need some sort of bias in any bit at the input of \(\chi \) in the \(2^\text {nd}\) round. This can be achieved by e.g. faulting instructions in \(\chi \), slightly before \(\chi \), or by directly faulting registers using lasers. In the case of AES, such fault inductions have already been demonstrated for multiple microprocessors and even for hardware co-processors [15, 16]. One way to find a suitable glitch location in practice would be to estimate the clock cycles until the targeted operation is executed. Hence, in our scenario, one can estimate the time frame of the \({2}^\mathrm{nd}\) round and try to induce a glitch in several different clock cycles towards the end of that round.

5.4 Results

Keyak. As already mentioned in Sect. 4.2, when getting a bias in the bit \(A_{\chi _2}[0,0,0]\) located at the input of the \({2}^\mathrm{nd}\) round \(\chi \)-layer, 25 bits of the key are involved in its calculation. In our attack, we guess these 25 bits and evaluate the bias in \(A_{\chi _2}[0,0,0]\) for each key guess. Since some of the guessed key bits only influence \(A_{\chi _2}[0,0,0]\) in a linear manner, we get several equivalent key candidates having the same bias. As a consequence, Fig. 4 shows the advantage in bits that the attacker gains over randomly guessing the key by testing only those key candidates whose bias is at least as high as that of the correct key guess. This advantage is \(\log _2(\#\text {total keys})-\log _2(\#\text {candidate keys})\).

Fig. 4. Attack on Keyak. Advantage in bits when targeting \(A_{\chi _2}[0,0,0]\) and guessing the associated 25 bits of the 128-bit key.

As shown in Fig. 4, 24 inputs of such unaffected decryptions are necessary to get a maximum advantage of 16 bits. In our case, we get \(2^9\) keys ranked top that have the same bias (not considering its sign). From those \(2^9\) keys, the values of 15 key bits can be uniquely determined. Due to the architecture of the implementation, we get a bias not only in one bit but in a whole byte. By combining this information, we can uniquely determine 82 bits of the key.
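The maximum advantage of 16 bits follows directly from the advantage formula \(\log _2(\#\text {total keys})-\log _2(\#\text {candidate keys})\) with \(2^{25}\) key guesses and \(2^9\) top-ranked candidates. A minimal Python check, with a function name of our own choosing:

```python
from math import log2

def advantage_bits(total_keys, candidate_keys):
    """Advantage over random guessing: log2(#total keys) - log2(#candidate keys)."""
    return log2(total_keys) - log2(candidate_keys)

print(advantage_bits(2 ** 25, 2 ** 9))  # 16.0
```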

In our attack setup, we are able to perform about 20 faulted decryptions per second. According to the practical evaluation, in about 1 out of 250 decryptions the induced fault is ineffective. The total time it took us to gather the required amount of inputs is roughly 5 min.
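The stated collection time is consistent with the other figures in this paragraph: 24 ineffective faults at a rate of about 1 in 250 require roughly 6000 faulted decryptions, which take about 300 s at 20 decryptions per second. The following Python sketch reproduces this estimate; the function and parameter names are our own, and the result is an expected value rather than a measured one.

```python
def expected_collection_seconds(needed_inputs, one_ineffective_in, decryptions_per_second):
    """Expected time to gather `needed_inputs` ineffective faults."""
    expected_decryptions = needed_inputs * one_ineffective_in
    return expected_decryptions / decryptions_per_second

seconds = expected_collection_seconds(24, 250, 20)
print(seconds, seconds / 60)  # 300.0 5.0
```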

Ketje. In the attack on Ketje Jr we use the same fault location as in the attack on Lake Keyak. This is however not strictly necessary. Even though both schemes use variants of Keccak-p during initialization, the influence of key bits on one of our biased bits before \(\chi \) in the \({2}^\mathrm{nd}\) round is quite different, mainly due to the fact that the lane sizes are different (see Fig. 6). In contrast to Lake Keyak, in Ketje Jr nearly all key bits influence each of our biased bits, most of the time in a linear way. Hence, for Ketje Jr we instead guess the 200-bit equivalent key before \(\chi \) in the \({1}^\mathrm{st}\) round (i.e. after the first linear layer). By doing so, we can reduce the dependency on the equivalent key to 31 bits, and guessing becomes feasible in practice.

In our attack setup we can recover about 19 bits of the equivalent key that correspond to one biased bit in about 10 h using a single thread on an Intel Xeon CPU. Note that this time can be significantly improved, since we used for our evaluation purposes just the unoptimized reference implementation. Furthermore, the task of key guessing can be parallelized trivially. If we parallelize the computations for e.g. the 8 bits that were affected by our fault induction we can recover 152 bits of the equivalent key in the same amount of time. The remaining bits can be determined either by brute-force or repeating key recovery for a different fault location.

In total, again 24 inputs of unaffected decryptions are necessary for key recovery as shown in Fig. 5. The total time it took us to gather the required amount of inputs is below 5 min. Hence, the time complexity of the entire attack is dominated by the key guessing, which was performed in about 10 h.

Fig. 5. Attack on Ketje. Advantage in bits when targeting \(A_{\chi _2}[0,0,0]\) and guessing the associated 31 bits of the 200-bit equivalent key.

6 Conclusion

In this work, we present the first fault attacks targeting a broad range of nonce-based authenticated encryption schemes. While fault attacks on authenticated encryption have already been shown at Asiacrypt 2016 [15], this attack is mostly limited to schemes that additionally feature a final key addition and thus is not directly applicable to most sponge-based or stream cipher-based constructions. We close this gap and show attacks based on SIFA [16], which are in principle applicable to most nonce-based authenticated encryption schemes that perform some sort of initialization where the nonce (or another publicly known input) is mixed with the secret key. Since we only need to know whether a fault induction was ineffective or not, attacking the decryption function of authenticated encryption schemes gives us a perfect oracle. Our attack evaluation is focused on Keyak and Ketje; however, we conjecture that our attack can also be adapted to other schemes like the CAESAR finalists ACORN, AEGIS, Ascon, MORUS, etc. in a rather straightforward way.

SIFA is resistant to popular fault countermeasures like double-execution and infection-based countermeasures as shown in [16]. Even additional masking does not preclude this attack vector [14]. The key recovery is capable of dealing with an arbitrary amount of noise (however requiring more faulted decryptions) that might arise due to possibly imperfect fault inductions. The effort required to perform our attack is rather low. We neither require perfectly timed faults nor precise knowledge about the effect of the induced fault. In our fault setup we are able to collect enough material for key recovery within 5 min. The actual key recovery for Keyak and Ketje is easily parallelizable and takes about 30 min and 10 h, respectively. The hardware cost of the attack setup does not exceed 300$.