Error control scheme for malicious and natural faults in cryptographic modules

Today’s electronic systems must simultaneously fulfill strict requirements on security and reliability. In particular, their cryptographic modules are exposed to faults, which can be due to natural failures (e.g., radiation or electromagnetic noise) or malicious fault-injection attacks. We present an architecture based on a new class of error-detecting codes that combine robustness properties with a guaranteed minimum distance. The new architecture guarantees, with a quantifiable probability, the detection of faults injected by an intelligent and strategic adversary who can precisely control the disturbance. At the same time, it supports automatic correction of low-multiplicity faults. To this end, we discuss an efficient technique to correct single nibble/byte errors while avoiding full syndrome analysis. We also examine a system-level fault manager that uses this code as an inner code and a Compact Protection Code (CPC) as an outer code. We report experimental results obtained by physical fault injection on the SAKURA-G FPGA board. The experimental results reconfirm the assumption that faults may cause an arbitrary number of bit flips. They indicate that a combined inner–outer coding scheme can significantly reduce the number of fault events that go undetected due to erroneous corrections of the inner code.


Introduction
With the transition to the cyberphysical system (CPS) paradigm, digital circuits are increasingly used for functions that are safety- and security-critical at the same time.
(A preliminary version of this paper was presented at the 7th International Workshop on Security Proofs for Embedded Systems (PROOFS) [12].) For example, emerging car electronics will have to support conventional safety features (like the anti-lock braking system or airbag control) and advanced electronic drive-assist functions, which are safety-relevant and must be realized in a failure-proof manner. However, the same electronics will provide the customer with access to social networks and payment functions over the Internet, which makes it a target of deliberate security attacks. Moreover, emerging electronic systems are designed to operate in harsh environments (including temperature extremes, vibration, humidity), increasing the chance of failures due to natural causes such as noise and ageing. At the same time, their components often lack the "protective perimeter" known from conventional servers located in an access-controlled building and operated by authorized personnel. Cyberphysical infrastructures, vehicles and production systems have parts designed to be placed in public spaces and accessible by anybody, including potential attackers. Therefore, malicious attacks on hardware components must be expected and counteracted.
A variety of defences on different abstraction levels have been suggested against natural failures and malicious attacks alike [18]. Here, we relate only to failures and attacks that create a tangible and observable change in the input-output behaviour of a circuit. In the context of natural failures, we do not consider mechanisms with purely parametric effects (e.g., ones which increase the circuit's power consumption but have no pronounced implications on the logic level). In the malicious case, we restrict ourselves to attacks that actively manipulate the operation of a circuit; purely passive analysis [20] is not in the scope of this paper. Such fault-injection attacks [25] can aim at disrupting the application's control flow (e.g., jumping over password checks [34]) or, in the case of cryptographic circuitry, at extracting secret keys via differential fault analysis [3] or fault-sensitivity analysis [19]. We refer by the term "fault" to any logical effect due to either a natural failure or a malicious (active) attack.
Out of the various countermeasures against natural and malicious faults, approaches based on error-detecting codes (EDCs) stand out. They can be applied to protect memories, communication channels and combinational circuitry. In the case of natural failures or poorly-controlled malicious fault injections, EDCs compete with space- and time-redundancy techniques, including duplication, modular redundancy, and commit-rollback [18]. However, an intelligent attacker with high-precision fault-injection equipment can circumvent this protection by injecting multiple faults into redundant copies such that they cancel each other out. Recent developments such as dual-beam laser fault injectors make this threat practical [29]. To counter such attackers, special security-oriented EDCs have been proposed [15,30]. They are designed to withstand a strategic attacker who knows the defences and aims at circumventing them. It can be shown that all linear codes offer limited protection under this assumption, as there are faults that are never detected. Therefore, the usual EDCs like parity or Hamming codes are inadequate in this case, and dedicated nonlinear security-oriented codes are required.
In this paper, we consider and optimize architectures that are designed to handle natural and malicious faults. The architectures are based on a recent code construction, the Rabii-Keren (RK) codes [26] in their generalized form [27].
RK codes are defined over a code alphabet of size q, which is a power of 2. For example, q = 2^4 (q = 2^8) is a natural choice for a circuit with a state organized in 4-bit nibbles (8-bit bytes): a fault that affects a nibbles (bytes) directly corresponds to an error of multiplicity a, which can be detected and/or corrected. RK codes combine three properties: a user-defined distance (and therefore the possibility of error correction), a low masking probability (a metric for resilience against malicious attacks), and a high rate (ratio between data and check bits). The code-based architecture is summarized in Fig. 1; notice that the syndrome of the code is intended for processing at the system level, which can also take into account further sources of information to decide whether the detected fault was malicious or not and whether it should be corrected or an alarm should be raised.
The feature of reliably correcting errors up to a certain multiplicity is useful for both natural and malicious faults. If a fault can be corrected, the system can proceed with its regular operation, while error detection without correction requires some handling, e.g., re-execution of the affected computation. Therefore, error correction is attractive in particular for safety-critical systems with real-time requirements, like aircraft or chemical plants, which cannot simply stop operation and wait for fault handling. Note that even if an error can be corrected, the system may still have to record the fault event. Moreover, a system may monitor the faults to decide whether they are due to natural causes or to malicious tampering. The observed fault effect (fault rate and multiplicity) can be an input to such a monitoring procedure, but in general, further inputs are needed to reliably distinguish between natural and malicious faults. For instance, a system which operates in a high-radiation environment may be equipped with a radiation sensor; if it reports high radiation, then the fault is likely natural.
The contributions of this paper are the following:
- An efficient correction procedure for single nibble/byte errors using RK codes. The procedure, based on a compact Error Coefficient and Location Table (ECLT), is a substantial improvement over regular syndrome analysis.
- An inner-outer code-based architecture. A robust inner code is employed to detect and correct errors, while an outer robust code detects the critical events of undetected and miscorrected errors.
- Induced error statistics. We report experiments using a clock-glitch-based fault injector on cryptographic circuits (full- and small-scale AES [5], LED, PRESENT) on the SAKURA FPGA board. The experimental results confirm the long-standing assumption that an injected error can be modeled as an additive symmetric error.
- Confirmation of the effectiveness of the RK code in detecting faults that cause an arbitrary number of bit flips. The experiments show that the proposed architectures are capable of detecting errors of arbitrary multiplicity and correcting single errors, reliably recognizing erroneous corrections. The architectures are especially effective when they combine codes with a distance larger than 3 and an additional system-level validation by an outer code.
- Experimental evidence that a system-level fault manager employing a Compact Protection Code (CPC) [28] as an outer code is able to detect erroneous RK corrections with high probability. It can decrease the probability that a malicious fault goes undetected (in the first cycle in which it affects the computation) exponentially in the number of the CPC's redundant bits.
The remainder of the paper is organized as follows. Section 2 gives background on natural and malicious faults and briefly introduces the concept of security-oriented error-detecting and error-correcting codes. Section 3 outlines the detection and correction architectures for codes of distance 3 and of distance larger than 3. Section 4 presents an (inner) RK code-based detection and correction architecture combined with an (outer) CPC-based validator which further reduces miscorrection events. Experimental results are reported in Sect. 5. Section 6 concludes the paper.

Natural and malicious faults
Both natural and malicious faults can result in correctable or uncorrectable errors. If a natural fault stems from a minor disturbance (e.g., a low-energy particle discharge), it will likely affect only a few bits, and the resulting error can be corrected with no further action needed. Moreover, the effect of natural faults is usually local; for example, a natural fault in a circuit of an 8-bit SBox can flip at most eight bits. When q-ary error-detecting/correcting codes are implemented, the number of errors induced by a fault is counted in terms of the number of erroneous q-ary symbols. That is, if q = 2^8, such a natural fault causes a single error, and if q = 2^4, at most two errors may occur.
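The counting rule above can be made concrete with a short sketch (a minimal illustration; the function name is ours):

```python
def symbol_error_multiplicity(error_bits: int, n_bits: int, m: int) -> int:
    """Count erroneous q-ary symbols (q = 2^m) in a bit-level error pattern.

    error_bits is the XOR of the fault-free and the distorted outputs;
    consecutive groups of m bits form one q-ary symbol.
    """
    count = 0
    mask = (1 << m) - 1
    for i in range(0, n_bits, m):
        if (error_bits >> i) & mask:
            count += 1
    return count

# A fault flipping bits 3 and 4 of an 8-bit SBox output:
pattern = 0b00011000
assert symbol_error_multiplicity(pattern, 8, 8) == 1  # one erroneous byte
assert symbol_error_multiplicity(pattern, 8, 4) == 2  # two erroneous nibbles
```

The same two-bit fault thus counts as one error for q = 2^8 but as two errors for q = 2^4, exactly as stated above.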
Malicious attacks often have quite strong effects [25]; e.g., if the circuit's clock is glitched or the voltage is lowered [2], many outputs will typically be affected. When the effect of faults on the number of bit flips at the circuit output (or on their pattern) cannot be characterized, we refer to such faults as ones that cause arbitrary errors. Security-oriented codes are codes designed to detect arbitrary errors with high probability.
Errors stemming from such malicious attacks should be detected, but it is unrealistic to reliably correct them. Even highly targeted attacks, like laser and EM injections which aim at flipping one particular logic gate output or memory cell, typically start with a tuning phase during which (detectable but uncorrectable) multi-bit errors are produced [25].
If an attack is run with a restricted fault model, like single-byte faults in Tunstall's attack on AES [31], the error can be corrected by an RK code with single-error correction capability (distance 3 or larger). Since the correction happens within the circuit, the attacker would observe no fault-affected ciphertext and therefore would not be able to mount the attack. Another class of attacks where correction is very useful from the defender's point of view is statistical ineffective fault analysis (SIFA) [7]. In SIFA, the secret information is leaked by observing whether an injected fault had an effect or not; this leakage is prevented by correcting the resulting error.
In this paper, we employ q-ary RK codes which can correct a single erroneous symbol (nibble/byte) and detect, with high probability, any number of bit flips. Therefore, the proposed architecture can protect the circuit against any fault injection technique.

Security-oriented codes
Given the vector space F_q^n of dimension n over F_q = GF(q), a code C is a subset of F_q^n of size |C|. A code C is said to be systematic if every codeword is of the form c = (x, w(x)), where x ∈ F_q^k is the information portion and w(x) ∈ F_q^r is the redundancy portion.
Let c ∈ C be the correct codeword and denote by ĉ the distorted word. It is convenient to model a fault that distorts a symbols as an additive error e = ĉ − c of Hamming weight a; a is called the error multiplicity. In this paper, an error is represented as a q-ary vector e = (e_x, e_w) ∈ F_q^n, where e_x is the error in the information portion and e_w is the error in the redundancy portion. In addition, the multiplicity of a malicious error is considered arbitrary, i.e., 1 ≤ a ≤ n.
The effectiveness of a reliability-oriented code is usually measured in terms of its decoding error, that is, the probability that the decoder fails to correctly decode a tampered word. Since the most probable error has the lowest Hamming weight, these codes are evaluated using the minimum distance d, which is the minimal Hamming distance between any two distinct codewords. A security-oriented code is evaluated using the maximal error masking probability, Q, which is the maximum over all nonzero errors e of the probability that e maps a codeword to another codeword in C.
In this sense, when codes are analyzed for reliability, the average case is considered, whereas the analysis for security is based on the worst case scenario.
The upper bound on the minimum distance d of a code is linearly dependent on the number of redundancy symbols, d ≤ r + 1, whereas the lower bound on Q is exponentially dependent on the number of redundancy symbols.
A security-oriented code can have a deterministic encoder [1,13,14,22] or incorporate randomness [6,23,33]; the latter includes the non-malleable codes [8]. The error detection capabilities of codes with random-encoding depend on the entropy of the random number generator (RNG). However, the hardware implementation of a true RNG is expensive and difficult, and the RNG must be shielded from fault injection attacks which could neutralize it. For this reason, codes with deterministic encoding are an attractive alternative. In fact, when properly designed, such codes can be more effective than random codes of the same rate [16]. This work deals with robust codes, which are codes with deterministic encoding.
Notice that an additive error e is masked by a codeword c ∈ C if c ⊕ e ∈ C. Similarly, an error e is detected by a codeword c ∈ C if c ⊕ e ∉ C. This leads to the following definition of the error masking probability:

Definition 1
The error-masking probability of an error e, Q(e), is the probability that the error e will be masked by the codewords of C. That is,

Q(e) = Σ_{c ∈ C} Pr(c) · δ_C(c ⊕ e),

where Pr(c) is the probability of the codeword c and δ_C is the characteristic function of C (δ_C(v) = 1 if v ∈ C and 0 otherwise). In the case of uniformly distributed codewords, it is convenient to represent the error masking probability in terms of the autocorrelation function of the code:

Q(e) = R(e)/|C|, where R(e) = |{c ∈ C : c ⊕ e ∈ C}|.

For some codes, the set of codewords that mask an error forms a linear subspace; thus, a good code will be a union of small disjoint subspaces. On the relationship between the autocorrelation function and the representation of a code as a union of disjoint subspaces, see [17].
The detection kernel of a code, denoted K d , contains all the error vectors that are never detected by the codewords of C, i.e., all the errors that are masked with probability Q(e) = 1.
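As a toy illustration of the masking probability and the detection kernel (assuming uniformly distributed codewords; the even-parity example is ours, not one from the paper):

```python
def masking_prob(code, e):
    """Q(e) = |{c in C : c XOR e in C}| / |C|, assuming uniform codewords."""
    codeset = set(code)
    return sum((c ^ e) in codeset for c in code) / len(code)

# 4-bit even-parity code: 3 data bits (positions 1..3) plus a parity bit (position 0).
parity_code = [(x << 1) | (bin(x).count("1") & 1) for x in range(8)]

# Every even-weight error preserves the overall parity, so it lies in the
# detection kernel K_d and is masked with probability 1 ...
assert masking_prob(parity_code, 0b0011) == 1.0
# ... while any odd-weight error is always detected (Q(e) = 0).
assert masking_prob(parity_code, 0b0001) == 0.0
```

The nontrivial kernel of this linear code is exactly why, as discussed below, linear codes cannot be robust.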

Definition 2 (Robust codes)
A code C is called robust if any nonzero error can be detected with some probability greater than zero. That is, Q(e) < 1 for every nonzero error e or, equivalently, K_d = {0}.

Definition 3 (Partially Robust codes)
A code C is called partially robust if its detection kernel is a nontrivial proper subset of the code, i.e., {0} ⊊ K_d ⊊ C; errors outside the kernel are detected with nonzero probability. In contrast, linear codes have a detection kernel K_d = C, and therefore linear codes are not robust and cannot be used for security.
There are two known basic high-rate binary systematic robust codes: the Quadratic Systematic (QS) code [13] and the Punctured Cubic (PC) code [1,22]; all other systematic robust codes use these codes as base codes. The QS code is an optimum robust code when k = 2sr and q is any power of a prime number, whereas the PC code is a close-to-optimum robust code for any 1 < r ≤ k when q is a power of two. Another high-rate robust code is the Compact Protection Code (CPC), which exists for any set of parameters and has low implementation cost [28]. However, none of these codes has correction capabilities. Some minimum-distance partially robust codes exist, for example the Vasil'ev code [32], the Phelps code [24], the one-switching code, and the generalized cubic code [9,21]. While these codes provide the wanted correction capabilities, they are not robust.
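The robustness of the Punctured Cubic code can be checked by brute force on a small instance. The sketch below (k = 4 information bits, r = 2 punctured redundant bits over GF(16); all parameters and helper names are ours) verifies that no nonzero error is masked with probability 1, and that the worst case stays within the 2^(1−r) masking bound used for the PC ground code in Sect. 4:

```python
POLY = 0b10011  # x^4 + x + 1, a primitive polynomial for GF(16)

def mul(a, b):
    """Shift-and-add multiplication in GF(16)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 16:
            a ^= POLY
    return r

def w(x, r_bits=2):
    """Punctured Cubic redundancy: the r_bits lowest bits of x^3 in GF(16)."""
    return mul(mul(x, x), x) & ((1 << r_bits) - 1)

def Q(e_x, e_w):
    """Masking probability of the error (e_x, e_w), uniform codewords."""
    return sum(w(x ^ e_x) == (w(x) ^ e_w) for x in range(16)) / 16

worst = max(Q(ex, ew) for ex in range(16) for ew in range(4) if (ex, ew) != (0, 0))
assert 0 < worst <= 0.5   # robust, and within 2^(1 - r) for r = 2
```

Exhausting all 16 codewords and 63 nonzero errors is feasible only at toy scale, but it shows concretely what "robust" means: every error is detected by at least half of the codewords.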
In a recent paper, Rabii and Keren introduced a construction for a new class of nonlinear robust q-ary codes with q = 2^m and error correction capability [26]. The code is built upon systematic linear [n, k, d]_q codes, where the n − k redundant symbols, originally allocated to increase the minimum distance of the code, are modified to provide both correction capability and robustness. The following (generalized) definition of the RK code is taken from [27].
To simplify the writing, when it is clear from the context, for a vector v = (v_1, ..., v_k) ∈ F_q^k we define f(v) = (f(v_1), ..., f(v_k)). The robustness and the effectiveness of the RK code are due to the high nonlinearity of f. For odd values of m, the best function to use is the cubic function, f(x_i) = x_i^3, which is an invertible APN function of (relatively) small implementation cost [27]. However, as shown in [27], it is possible to use other functions, for example f(x_i) = x_i^{-1} with even values of m, at the cost of a higher error masking probability Q(e). Namely, the error masking probability of the codes is Q(e) ≤ 2/q for odd values of m and Q(e) ≤ 4/q for even m.
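For odd m, the cubic map is invertible because gcd(3, 2^m − 1) = 1; this can be verified directly on GF(8) (a sketch; the field construction via x^3 + x + 1 is a standard choice, not taken from the paper):

```python
def gf_mul(a, b, poly=0b1011, m=3):
    """Shift-and-add multiplication in GF(2^m), reduced by poly (x^3 + x + 1 for GF(8))."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= poly
    return r

# x -> x^3 on GF(8) is a bijection, since gcd(3, 2^3 - 1) = gcd(3, 7) = 1:
cube = {x: gf_mul(gf_mul(x, x), x) for x in range(8)}
assert sorted(cube.values()) == list(range(8))
```

For even m, gcd(3, 2^m − 1) = 3, so cubing is no longer a bijection, which is why a different function such as the inverse map is used in that case.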
Rabii and Keren did not present encoding, decoding, and error correction algorithms, and their code was not implemented and tested in a realistic environment. This paper aims to close this gap.

Detection and correction architecture for Rabii-Keren codes
In this section, we present a novel low-cost implementation of error detection and correction architectures based on RK codes. We start with codes of distance d = 3 and then generalize the decoder to codes with d > 3. We demonstrate the effectiveness of these codes in correcting a single erroneous SBox output and detecting multiple erroneous SBox outputs in Sect. 5.

Construction algorithm of systematic Rabii-Keren codes
In order to use the RK construction, one has to construct a systematic generator matrix for the corresponding linear code. Algorithm 1 constructs a generator matrix based on the check matrix of a shortened BCH code over an alphabet of size q. Note that q = 16 for SBoxes that work on 4-bit nibbles, and q = 256 for bytes. Note also that the matrix H_orig,d defined in Step 8 of Algorithm 1 has elements from F_{q^m}; since F_{q^m} and F_q^m are isomorphic, each element can be represented as a vector of m symbols over F_q. As an example, consider the (19, 2^64, 3)_16 Rabii-Keren code for protecting 16 4-bit SBoxes (q = 16) by using 12 redundant bits, i.e., r = 3. The code is based on the [19, 16, 3]_16 shortened BCH code with distance d = 3. The check matrix of the shortened BCH code, H_orig,3, has two rows; the first corresponds to the first 19 powers of α^0, and the second to the first 19 powers of α^1, where α is a root of the primitive polynomial π(x) = x^2 + 11x + 5.

Detection and correction architecture for RK codes of distance d = 3
The general process of error correction is the same for a linear code and a nonlinear code. (The linear version can be obtained by skipping the nonlinear operations, in this case the inversion.) Let c = (x, w) be the correct codeword, and let z = (z_x, z_w) = (x + e_x, w + e_w) be the word received by the checker. The simplest RK decoder works in multiple steps; see Fig. 3 (which can be considered a more detailed version of the "protected subsystem" in Fig. 1). First, it prepares the received word for the BCH decoder by applying the inverse function to the elements of z_x; namely, it generates the vector y = (y_x, y_w) = (f^{-1}(z_x), z_w). Then, it uses y to compute the syndrome s = H_d · y^T. Note that, unlike in linear codes, the syndrome depends on both the correct codeword and the error vector. Recall that an error is detected if the syndrome s is not zero.
The simplest decoding process is to re-encode the information portion x̂ = x + e_x into ŵ and compare it with the redundant portion of the output vector z; i.e., compare the q-ary vectors ŵ and w + e_w. Each syndrome is associated with an error vector ê = (ê_x, ê_w), and the decoded word is ẑ = z − ê. If the distance between the received word z and the correct word c is less than or equal to the correction capability of the code, the obtained ẑ is in fact the desired codeword c. In other words, if the number of distorted symbols is at most the correction capability of the code, the decoder works as desired.
A more efficient error location and correction procedure is presented in Algorithm 2. Here too, the decoding starts by computing the syndrome associated with the intermediate word y (Algorithm 2, line 1). Next, the decoder checks whether the error is correctable (Algorithm 2, lines 2-4). Denote by s_j the first nonzero entry of s and by g_i the first nonzero entry of the i-th column h_i of the check matrix, and calculate ŝ = s/s_j and ĥ_i = h_i/g_i. For a single error of value e_i in position i we have s = e_i · h_i; hence ŝ = ĥ_i and s_j = e_i · g_i. Overall, we can recover the error value e_i = s_j/g_i and correct position i. The required lookups are stored in the Error Coefficient and Location Table (ECLT), in which each row contains a different ĥ_i of length r, the column indicator i, and the factor g_i. Finally, the decoder verifies that the error has been corrected (lines 5-6). For example, the table for the (19, 2^64, 3)_16 Rabii-Keren code (described by the above matrix A_3) is shown in Table 1.
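The ECLT idea can be sketched for a toy distance-3 code over GF(16). The check matrix H and all helper names below are ours (not the paper's A_3), and only the linear case is shown, i.e., the nonlinear mapping f is skipped:

```python
POLY = 0b10011  # x^4 + x + 1, primitive for GF(16)

def mul(a, b):
    """Shift-and-add multiplication in GF(16)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 16:
            a ^= POLY
    return r

def inv(a):
    """Multiplicative inverse in GF(16) (brute force)."""
    return next(b for b in range(1, 16) if mul(a, b) == 1)

# Toy check matrix: columns are pairwise linearly independent -> distance 3.
H = [[1, 1, 1, 1],
     [1, 2, 4, 8]]
n, r = 4, 2

def syndrome(y):
    s = [0] * r
    for row in range(r):
        for i in range(n):
            s[row] ^= mul(H[row][i], y[i])
    return s

# ECLT: normalized column h_i/g_i -> (position i, leading coefficient g_i).
ECLT = {}
for i in range(n):
    col = [H[row][i] for row in range(r)]
    g = next(v for v in col if v)
    ECLT[tuple(mul(v, inv(g)) for v in col)] = (i, g)

def correct_single(y):
    """Return the corrected word, y itself (zero syndrome), or None (suspicious)."""
    s = syndrome(y)
    if not any(s):
        return y
    s_j = next(v for v in s if v)              # first nonzero syndrome entry
    key = tuple(mul(v, inv(s_j)) for v in s)   # normalized syndrome s/s_j
    if key not in ECLT:
        return None                            # no fitting ECLT entry
    i, g = ECLT[key]
    fixed = list(y)
    fixed[i] ^= mul(s_j, inv(g))               # error value e_i = s_j / g_i
    return fixed if not any(syndrome(fixed)) else None

codeword = [2, 3, 1, 0]                           # satisfies H * c^T = 0
assert correct_single([2, 3, 6, 0]) == codeword   # single error at position 2
assert correct_single([3, 2, 1, 0]) is None       # double error: suspicious
```

The table holds one row per column of H rather than one row per (value, position) pair, which is the source of the size reduction discussed next.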
Note that the original decoder (based on the original BCH check matrix H_orig,3 (Fig. 2)) requires a table of (q − 1) · n = 285 entries, each holding a 12-bit syndrome, a 5-bit error location, and a 4-bit error value; this amounts to 15 · 19 · 21 bits. In contrast, the proposed architecture in Fig. 4 employs a table with only 19 entries of 18 bits each. Clearly, when protecting a full-scale cipher with an 8-bit SBox, the number of table entries can be reduced from 255n to n.

Detection and correction architecture for RK codes of distance d > 3
Small-scale natural faults or some precise faults injected by a sophisticated attacker manifest themselves as a single erroneous symbol. However, there is an advantage in protecting the system with a code of distance d > 3. A code of distance d > 3 allows the correction of a single erroneous symbol (i.e., a single SBox output) while avoiding miscorrection of up to d − 2 erroneous symbols.
In this section, we show how the idea presented in the previous section can be generalized to codes with d > 3. We show that instead of using a table with (q − 1)·n entries, each of (1 + (d − 2)m)·log2(q) bits for the syndrome plus ⌈log2(n)⌉ bits for the error location and log2(q) bits for the error value, one can use a table with n entries, each of (m + 2)·log2(q) + ⌈log2(n)⌉ bits.
Let us start with an example before proving the correctness of this statement; the example uses the corresponding matrix A_5. Since e_1 and e_2 consist of elements of F_q, and F_q ⊂ F_{q^m}, the syndrome equality can be written over F_{q^m}. The columns in the first 1 + m = 3 rows of A_5 are all distinct; moreover, no column is a scalar multiple of another. Hence, the location of a single erroneous nibble is uniquely defined by ŝ[0:2]. Table 2 contains the location and error-coefficient value for the correctable syndromes.

Inner-outer code-based architecture
In this section, we introduce four classes of fault events and present an inner-outer code-based architecture that can reduce the probability that a critical fault event occurs. We consider architectures working on ciphers whose information is organized in 4-bit nibbles (like LED or PRESENT) or in 8-bit bytes (like AES). In the former case, one nibble is naturally mapped to a symbol over F_16, as used in the Rabii-Keren code construction. For 8-bit ciphers, we consider two architectures. In the first architecture, each byte corresponds to two 4-bit symbols (32 symbols for the complete 128-bit state in the case of AES); note that an error in a single byte may affect two (neighbouring) symbols. The second architecture uses two decoders, one over 16 "upper" symbols, where each symbol stands for the four most significant bits of a state byte, and one over the 16 remaining "lower" symbols.
To assess the capability of an architecture to detect and correct faults, we introduce the following classification of fault events. A fault event is one circuit operation (encryption) with a specific input (plaintext) whose output (ciphertext) deviated from the fault-free value. We deliberately avoid the terms "fault" and "error" here to prevent confusion with scenarios in which multiple inputs are used and a fault detected under one of the inputs is counted as detected; that view is appropriate for permanent faults, whereas the fault events considered here are transient. Each fault event is attributed to one of the following classes (Fig. 7).
- Class C1: Undetected by the RK code. Faults which were undetected, i.e., resulted in the all-zero syndrome. Fault events of class C1 occur when an error maps a codeword onto another codeword.
- Class C2: Single errors. Faults which affected only one symbol and could be corrected to the original codeword. Note that our experimental setup keeps the fault-unaffected ciphertext for reference and thus can attribute the fault precisely; an actual device under attack would not know the multiplicity of the injected fault. Fault events of class C2 occur when the error shifts a codeword to a word within a Hamming ball of radius t around it, where t is the maximal number of erroneous symbols that the decoder is allowed to change, t ≤ (d − 1)/2.
- Class C3: Recognized as suspicious. Faults which resulted in multi-symbol errors and where the correction procedure stopped because it did not find a fitting entry in the ECLT.
- Class C4: Erroneous correction. Faults which were corrected, but into a different codeword than the original one. This can happen if, e.g., for a distance-3 code, an error of multiplicity 2 transforms a codeword into a non-codeword at distance 1 from a different codeword. Fault events of class C4 happen when the error shifts a codeword into a Hamming ball whose center is a different codeword.
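The C1-C4 classification can be expressed as a small routine. To keep the sketch self-contained, we pair it with a toy length-4 repetition decoder (our stand-in, not the RK decoder), whose majority tie plays the role of a missing ECLT entry:

```python
def rep4_decode(w):
    """Majority decoder for the 4-bit repetition code; a 2-2 tie is 'suspicious'."""
    ones = sum(w)
    if ones == 2:
        return None
    return (1 if ones > 2 else 0,) * 4

def classify(original, received, decode):
    """Attribute one fault event (received != original) to class C1..C4."""
    corrected = decode(received)
    if corrected == received:
        return "C1"   # all-zero syndrome on a distorted word: error masked
    if corrected is None:
        return "C3"   # decoder stopped: recognized as suspicious
    return "C2" if corrected == original else "C4"

c = (0, 0, 0, 0)
assert classify(c, (1, 1, 1, 1), rep4_decode) == "C1"  # codeword-to-codeword
assert classify(c, (1, 0, 0, 0), rep4_decode) == "C2"  # correctable single error
assert classify(c, (1, 1, 0, 0), rep4_decode) == "C3"  # tie: suspicious
assert classify(c, (1, 1, 1, 0), rep4_decode) == "C4"  # erroneous correction
```

The same four outcomes arise for the RK decoder; only the underlying code and decoding rule differ.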
Fault events from classes C1 and C4 are potentially critical, as they are associated with errors not properly handled by the RK code. The probability of a critical event is about V_t · q^{-r}, where V_t is the size of a q-ary Hamming ball of radius t. Thus, it is possible to decrease the probability of critical events by using a code whose correction ability t is smaller than (d − 1)/2 (as we did in Sect. 3.3). Although this solution costs in code rate, and hence in additional area overhead, the probability that a critical event will occur is significantly reduced; this can be seen by comparing the percentages of unrecognized fault events for distances 3 and 5 in Table 4.
Another option is to use a system-level fault manager that employs an error-detecting robust code to validate the decoders' decisions. In our case, we use the CPC as the outer robust code and the RK code as the inner code (see Fig. 6). The outer CPC predictor adds r_o redundant bits to the k = k_q · m output bits of the original component. Since a CPC code exists for every word length and security parameter Q [28], r_o can take any value; it is not restricted to multiples of m. For simplicity of computation, we used as a ground code a binary Punctured Cubic code, whose error masking probability equals Q_pc = 2^{1−r_o}. The RK-based predictor generates r_q redundant symbols which, together with the k_q + r_o/log2(q) symbols, form an RK codeword.
In a configuration in which two decoders work in parallel on two disjoint subsets of the output bits, the probability of the four fault events depends on the outcome of both decoders. Note that there may be a correlation between the decoders due to the nonlinearity of the SBox function. In what follows, we take this correlation into account by classifying a fault event according to Table 3. For example, if one decoder masks the error (C1) and the second corrects it (C2), then overall the error is miscorrected (C4).
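The combination rule might be sketched as follows. This is a hypothetical reconstruction (only the C1 + C2 -> C4 case is stated explicitly in the text; the full Table 3 is not reproduced here), under the assumption that a suspicious verdict dominates and that any wrongly "fixed" half of the state yields an overall miscorrection:

```python
def combine(a, b):
    """Hypothetical two-decoder combination rule (our reconstruction, not Table 3)."""
    pair = {a, b}
    if "C3" in pair:
        return "C3"   # either decoder already flags the event as suspicious
    if pair == {"C1"}:
        return "C1"   # both halves masked: overall undetected
    if "C4" in pair or pair == {"C1", "C2"}:
        return "C4"   # part of the state is wrongly corrected
    return "C2"       # both halves validly corrected

assert combine("C1", "C2") == "C4"   # the example given in the text
```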
Recall that fault events from class C2 are valid corrections which need no further handling, and fault events from class C3 are already recognized as erroneous before the system-level fault manager is invoked. Thus, the role of the CPC-based checker is to detect erroneous corrections and errors undetected by the RK code. The fault events from classes C1 and C4 are therefore subdivided into classes S1 and S2 based on the outcome of this validation:
- Class S1: Recognized by the outer code. Seemingly successful but erroneous corrections (or masked errors) which created an inconsistency when recalculating the outer code.
- Class S2: Unrecognized by the outer code. Fault events from classes C1 and C4 that also pass the outer-code check and thus remain undetected.
Figure 7 visualizes the four classes C1-C4 and their relationship to the system-level classes S1 and S2.
In the next section, we discuss the fault-injection experiments into actual cryptographic circuits protected by the inner-outer code-based architecture from this section and the distribution of the observed effects among classes C1, C2, C3, C4, S1 and S2.

Experimental results
We consider error detection and correction architectures for four block ciphers: small-scale AES (with a state consisting of 4 × 4 four-bit nibbles instead of bytes); regular AES; LED-64; and PRESENT. The state of AES is organized in bytes and it incorporates 8-bit SBoxes, whereas all other considered ciphers have nibble-based states and 4-bit SBoxes. We implemented Rabii-Keren (RK) codes of distances 3 and 5 over F_16; that is, one symbol corresponds to four bits. Hence, for all ciphers except AES, one symbol corresponds to one nibble of the state. For AES, we consider a one-decoder and a two-decoder architecture, as explained in the beginning of Sect. 4.
The first seven columns of Table 4 summarize the considered architectures. The first three columns show the base circuit, the distance d of the RK code, and whether one or two decoders are used (the latter applies only to full-scale AES, the only byte-oriented cipher). The subsequent four columns show the numbers of information and redundant data of the RK code, first expressed in (4-bit) symbols and then in bits. Note that the numbers for the two-decoder architecture are twice those for a single decoder. The check bits of the outer code are not included in the table.

Fault injection methodology
For the sake of clarity, we distinguish between faults and errors: faults are injected into the circuitry, whereas errors are defined on the outputs of the circuit (or of its protected part). Therefore, the timing-based fault injections used here can result in errors of different multiplicity, determined by two factors. First, the manipulated (faster-than-nominal) clock runs in parallel through the entire fault-injection campaign, resulting in different and unpredictable deviations between the nominal and the manipulated clock which accumulate over time. When the clock is switched from nominal to manipulated, the next clock edge can occur very quickly, resulting in a large number of failing paths within the circuit and therefore high-multiplicity errors.
We ran fault-injection experiments on the mentioned architectures synthesized on a Spartan-6 LX75 FPGA on the SAKURA-G board. We created a faster-than-nominal clock using the FPGA-level digital clock manager and switched to that clock during a specific cycle of encryption. This resulted in a wide distribution of errors of different multiplicity and is a good model of a malicious attack using rather imprecise equipment. For each architecture, we collected and characterized 1,000,000 fault events.

Induced error statistics
Fault events cause erroneous outputs. In Sect. 2.2, we modeled the corrupted output as the sum of a fault-free circuit output and a binary error vector; namely, we assumed that each output bit can be represented as a correct bit that passed through a binary symmetric channel with crossover probability p = p_{0→1} = p_{1→0}. The experimental results reported in Fig. 8 confirm this assumption. Moreover, they indicate that a fault in the circuit can be modeled as an additive error vector over the alphabet of the code, F_q. The figure shows the distribution of the p_{0→1} and p_{1→0} crossover probabilities for the 64- and 128-bit ciphers; it is clear from the figure that p_{0→1} ≈ p_{1→0}.
In all tested ciphers, the majority of the bits have a crossover probability p ≈ 0.25. Therefore, we expect on average k · p = 16 or 32 bit flips in the information part (for k = 64 and k = 128 bits, respectively). Recall that in a q-ary code the number of errors is counted in terms of erroneous q-ary symbols (and not in terms of bit errors). Figure 9 shows the probability that the j-th q-ary symbol is erroneous, and Fig. 10 shows the distribution of the error multiplicity. It is clear from the figures that some symbols are more vulnerable than others; this depends on the architecture of the circuit, on the delays of specific structures on the FPGA, and on the particular mapping chosen by the FPGA synthesis tool. The average error multiplicity is 9.14 for small-scale AES, 13.32 for full-scale AES, 13.60 for LED-64 and 11.25 for PRESENT. These numbers significantly exceed the error correction capability of the code, and therefore the histogram validates our assumption that the multiplicity of the injected error can be considered arbitrary.
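The symbol-level consequence of the bit-level channel model can be checked with a short Monte Carlo sketch. For illustration it assumes a uniform crossover probability p = 0.25 on all k = 128 bits, which overstates the multiplicity seen on real hardware, where many bits flip less often.

```python
import random

def symbol_error_multiplicity(error_bits, symbol_width=4):
    """Count erroneous q-ary symbols: a 4-bit nibble is in error as soon
    as any of its bits has flipped."""
    return sum(any(error_bits[i:i + symbol_width])
               for i in range(0, len(error_bits), symbol_width))

rng = random.Random(42)
k, p, trials = 128, 0.25, 10_000
avg = sum(
    symbol_error_multiplicity([rng.random() < p for _ in range(k)])
    for _ in range(trials)
) / trials

# A nibble survives only if all 4 of its bits survive, so it is erroneous
# with probability 1 - (1 - p)**4 ~= 0.684; with 32 nibbles this idealized
# model predicts ~21.9 erroneous symbols -- far beyond single-symbol
# correction, which is why the multiplicity must be treated as arbitrary.
print(round(avg, 1))
```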

Classification of fault events
The distribution of the 1,000,000 fault events over the four classes C1–C4 is shown in the next four columns of Table 4. The final columns of Table 4 present the fault events from classes S1 and S2. The events are classified with respect to the indications provided by a fault manager based on a relatively weak outer code: a CPC with r_o = 4. Note that the numbers of S1 and S2 events sum up to the sum of classes C1 and C4, and that the percentages relate to this sum. For example, the number of fault events for the distance-3 architecture for small-scale AES that need system-level handling is 179 (C1) + 66,383 (C4) = 66,562; out of these, 62,463 or 93.8% are detected by the outer code (S1) and the remaining 4,099 or 6.2% are not (S2). From the application point of view, the results indicate the suitability of RK-based architectures for mixed detection-correction schemes. In particular, using a code with some "reserve" in terms of detection capability (here: the distance-5 code) results in no undetected fault events and no unrecognized erroneous corrections. This means that the architecture can correct low-magnitude disturbances, i.e., single-symbol errors, without much risk of missing attempted attacks.
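The bookkeeping behind these percentages is straightforward; a minimal check of the quoted numbers:

```python
# Distance-3, small-scale AES case from Table 4: events that need
# system-level handling are the union of classes C1 and C4, and the
# outer CPC (r_o = 4) splits them into detected (S1) and missed (S2).
c1, c4 = 179, 66_383
s1 = 62_463
handled = c1 + c4          # 66,562 events reach the fault manager
s2 = handled - s1          # 4,099 events the outer code misses
print(handled, s2,
      round(100 * s1 / handled, 1),   # detected share in percent
      round(100 * s2 / handled, 1))   # undetected share in percent
```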

Critical fault events
The decisions of the RK decoder are validated by a system-level fault manager. Table 5 shows the performance of a system-level fault manager that employs a CPC as an outer code (and does not use additional system-level information such as consistency checks or sensor indications). Four CPCs with r_o = 4, 8, 12 and 16 redundant bits are considered. From Table 5 it can be seen that the vast majority of fault events in the potentially critical classes C1 and C4 are handled successfully on the system level and are included in class S1. If distance-3 inner RK codes are used, less than 1% of fault events go undetected (class S2) after adding just 4 redundant CPC bits. Figure 11 shows that this number decreases exponentially with r_o. In our experiments, S2 events never occurred for distance-5 inner RK codes. Even for distance-3 codes, one can assume that, prior to an unnoticed fault, the adversary will have to inject a large number of detected faults, so that the circuit can go into a state of alert and, e.g., replace the secret key. The rather low number of successful corrections (single errors in Table 4) simply reflects the number of single-symbol errors in the fault-injection experiment. The code guarantees that every single-symbol error that shows up will be successfully corrected; this also eliminates the threat of precise single-nibble or single-byte fault injections [11,31]. The majority of uncorrectable faults are reliably recognized either by the RK code directly or by the outer code. For the distance-5 code, the number of erroneous corrections is extremely small (between 0 and 11 out of 1,000,000), and all of them are identified by the outer code. Table 6 compares the sizes of our architecture, a purely linear BCH architecture and a triple modular redundancy (TMR) architecture in numbers of required FPGA configurable logic blocks (CLBs).
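The exponential trend of Fig. 11 is what one would expect if an event that is miscorrected or missed by the inner code looks like a random error to the outer check: a uniformly random error passes r_o independent linear parity checks with probability 2^(-r_o). A back-of-the-envelope sketch of this idealized model:

```python
# Expected fraction of residual events missed by the outer CPC, under
# the idealized assumption that such events look random to its checks.
for r_o in (4, 8, 12, 16):
    print(r_o, 2.0 ** -r_o)
```

Each 4 additional redundant bits thus buy roughly a 16-fold reduction of the undetected fraction, consistent with the observed decrease.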
It can be noticed that the robustness, and thus the increase in security, of the RK architecture comes at a low cost compared with the linear (and therefore non-robust) BCH implementation (which also uses the ECLT-based approach). The highest increase due to the inversions introduced by the RK code is 24%; in one case there is even a small decrease due to optimizations during FPGA synthesis. The cost of our architectures exceeds that of TMR for small basic ciphers, as some of the required circuitry is cipher-independent. Note, however, that TMR can be interpreted as a repetition code and is not robust (the attacker can simply apply the same error to all copies), and therefore its security is inherently worse than that of a robust RK code. In the case of AES, the numbers of CLBs are similar, which further encourages the use of our architecture.
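The non-robustness of TMR is easy to demonstrate in a toy model: majority voting masks a single corrupted copy, but an identical error injected into all three copies passes the vote unchallenged.

```python
def tmr_vote(a, b, c):
    """Bitwise 2-out-of-3 majority vote over three redundant copies."""
    return (a & b) | (a & c) | (b & c)

correct = 0b10110100
e = 0b00011000                      # attacker-chosen error pattern

# A natural, independent fault in one copy is outvoted and masked.
assert tmr_vote(correct ^ e, correct, correct) == correct

# The same error applied to all copies -- the strategic attack -- is
# accepted: TMR is just a repetition code and offers no robustness.
assert tmr_vote(correct ^ e, correct ^ e, correct ^ e) == correct ^ e
print("ok")
```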

Implementation cost
While the detection and correction performance of the architecture is extremely attractive, the hardware cost of solutions based on advanced nonlinear codes is a major limiting factor. For this reason, the ECLT-based approach presented here is an important step towards making these architectures practical. Finally, it is important to note that the q-ary codes used here demand more complex operations (multiplications and inversions) than binary codes. However, it turns out that binary codes with comparable detection and correction properties need considerably more redundancy bits. For example, our distance-5 code over F_16 requires r = 56 redundancy bits for k = 128 data bits, whereas a binary BCH code with the same correction capability (d ≥ 2 · 8 + 1 = 17) necessitates r = 112 redundancy bits for the same k. Moreover, its decoding is more complex, since the ECLT technique from this paper is not applicable and the Berlekamp-Massey algorithm must be used instead. Note that this algorithm cannot be performed in a single cycle, so our higher expenditure in hardware complexity is offset by savings in execution time.
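The arithmetic behind this comparison, spelled out:

```python
# A distance-5 code over F_16 corrects t = 2 symbol errors, and a single
# 4-bit symbol can hide up to 4 bit flips. A binary code with the same
# worst-case correction capability must therefore correct 2 * 4 = 8 bit
# errors, i.e. have minimum distance d >= 2*8 + 1 = 17.
t_symbols, bits_per_symbol = 2, 4
t_bits = t_symbols * bits_per_symbol
d_binary = 2 * t_bits + 1
print(t_bits, d_binary)             # 8 17

# Redundancy for k = 128 data bits, as reported above: the binary BCH
# code needs twice the redundancy of the q-ary code.
r_qary, r_binary = 56, 112
print(r_binary / r_qary)            # 2.0
```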

Comparison with conventional codes
A comparison between the performance of the RK code and the corresponding linear BCH code with the same parameters is given in Table 7. The table gives the number of faults (out of 1,000,000 randomly generated faults) that were not detected. Both codes have excellent detection performance and, consistent with [4], there are only minimal fluctuations for distance-3 codes. Note that this finding applies to the simulated average-case scenario, namely a large number of random faults of arbitrary multiplicity. In contrast, the advantage of a robust code refers to the worst-case scenario, where the attacker can strategically shape the injected error so as to stay undetected. In the setting of Table 7, the RK code can withstand attacks in this worst-case scenario while the linear code cannot, whereas the average-case performance of the two codes is quite similar.
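The worst-case distinction can be made concrete with a toy example (not the actual RK construction): for any linear code, an additive error that is itself a nonzero codeword is undetected for every message, whereas with a nonlinear check the same additive error slips through only for a handful of messages.

```python
P = 257                               # toy prime field

def undetected_linear(e1, e2):
    """Repetition-style linear code (chk = msg): count messages for which
    the additive error (e1, e2) goes undetected."""
    return sum((m + e1) % P == (m + e2) % P for m in range(P))

def undetected_cubic(e1, e2):
    """Nonlinear check chk = msg**3 mod P (a stand-in for a robust code)."""
    return sum(pow((m + e1) % P, 3, P) == (pow(m, 3, P) + e2) % P
               for m in range(P))

# The codeword-shaped error (a, a) fools the linear code for ALL messages.
worst_linear = max(undetected_linear(a, a) for a in range(1, P))

# For the cubic check, (m+e1)^3 - m^3 - e2 is a degree-2 polynomial in m
# whenever e1 != 0, so at most 2 of the 257 messages mask any fixed error.
worst_cubic = max(undetected_cubic(a, b)
                  for a in range(1, 5) for b in range(5))
print(worst_linear, worst_cubic)
```

Bounding the worst-case undetected-error probability over all error patterns is exactly the robustness property that the linear code lacks.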