1 Introduction

Hummingbird-2 is a lightweight authenticated encryption primitive designed by a team led by Eric Smith of Revere Security and presented at RFIDSec ’11 [1]. Hummingbird-2 has been proposed for standardization in RFID use within ISO [2].

Hummingbird-2 was created largely in response to an effective FSE ’11 attack by Saarinen [3] against the original Hummingbird algorithm [4, 5, 6]. Saarinen’s single-key attack broke the 256-bit Hummingbird-1 with \(2^{64}\) effort.

Some independent analysis on Hummingbird-2 has been published. In [7] a “differential sequence attack” is described, but the total complexity of the attack is higher than exhaustive search and therefore it is “of theoretical interest only”. The same is said of the side channel cube attack presented in [8]. An even more far-fetched attack is described in [9], requiring \(2^{240}\) memory.

IACR ePrint [10] described an attack simultaneously using dozens of related keys. Unfortunately the attack, as described, had some errors and the authors subsequently withdrew the paper. However, some observations contained in it inspired our research that led to the discovery of high-probability correlated keys described in Sect. 2.1.

The structure of this paper is as follows. In Sect. 2 we describe the relevant components of the Hummingbird-2 algorithm and make a number of observations about its various features. In Sect. 3 we describe an effective key-recovery attack that uses a single key relation. We discuss enabling factors of the attack in Sect. 3.7, followed by conclusions in Sect. 4.

Appendix A contains a full specification for a new variant which is resistant to these attacks and is based on novel \(\chi \) functions (rather than traditional S-boxes).

2 Examining the Hummingbird-2 Algorithm

Hummingbird-2 is neither a block cipher nor a stream cipher in the traditional sense but combines some of the features of both. In this it resembles other integrated authenticated encryption proposals such as Helix [11] and Phelix [12].

The “Hummingbird structure” uses 16-bit data paths throughout as it was originally targeted towards low-end microcontrollers such as the TI MSP430 family. Data is always encrypted or decrypted in 16-bit increments. The cipher accepts a 64-bit initialization vector \({IV}\), a 128-bit secret key \(K\), and maintains a 128-bit state in registers \(R\). A method for deriving message authentication tags from the internal state is also given in the specification [1].

We use the following symbols and notation:

figure a

In the following sections, we will describe the various algorithm components and present observations that will be used in the final overall attack. These cryptanalytic observations may also be useful in attacks of other types than the one described in this work. For a complete specification of Hummingbird-2, we refer the reader to [1].

2.1 WD16 (and High-Correlation Related Keys)

Hummingbird-2 draws almost all of its nonlinearity from the WD16 function. WD16 uses four keying words (total 64 bits) which define a permutation on a 16-bit input value. One may see WD16 as a 16-bit block cipher with a 64-bit key.

WD16 is a four-round substitution-permutation network. In each round, a 16-bit subkey is XORed to the state, four \(4 \times 4\)-bit S-boxes are applied in parallel, followed by a linear mixing step. The structure is shown in Fig. 1.

We use \(S(x)\) to denote the parallel application of the 4-bit S-boxes \(S_1\), \(S_2\), \(S_3\), \(S_4\) on the 16-bit word \(x\). The linear operation is \(L(x) = x \oplus (x \lll 6) \oplus (x \ggg 6)\). If we shorten their compound operation to \({LS}(x) = L(S(x))\) then \(\mathrm {WD16}\) can be written as:

$$\begin{aligned} \mathrm {WD16}(x, k_1, k_2, k_3, k_4) = \mathrm {LS}(\mathrm {LS}(\mathrm {LS}(\mathrm {LS}(x \oplus k_1) \oplus k_2) \oplus k_3) \oplus k_4). \end{aligned}$$
(1)
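Equation 1 can be sketched in a few lines of Python. This is only an illustration: `PLACEHOLDER_SBOX` is an arbitrary 4-bit permutation used in all four nibble positions and is not one of the Hummingbird-2 S-boxes \(S_1, \ldots , S_4\), which are given in [1]. The function names are ours. The sketch relies on the fact that \(x \ggg 6\) equals \(x \lll 10\) on 16-bit words, and that applying \(L\) four times gives the identity, so \(L^{-1} = L \circ L \circ L\).

```python
# Sketch of WD16 (Eq. 1) and its inverse on 16-bit words.
# ASSUMPTION: PLACEHOLDER_SBOX is an arbitrary 4-bit permutation, NOT one of
# the real Hummingbird-2 S-boxes S_1..S_4 (see [1] for those).
PLACEHOLDER_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                    0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SBOXES = [PLACEHOLDER_SBOX] * 4            # index 0 acts on the high nibble
INV_SBOXES = [[s.index(v) for v in range(16)] for s in SBOXES]
MASK = 0xFFFF


def rotl16(x, r):
    """Rotate a 16-bit word left by r bits."""
    return ((x << r) | (x >> (16 - r))) & MASK


def L(x):
    """Linear step x ^ (x <<< 6) ^ (x >>> 6); x >>> 6 is x <<< 10."""
    return x ^ rotl16(x, 6) ^ rotl16(x, 10)


def L_inv(x):
    """L has order 4 on 16-bit words, hence L^-1 = L o L o L."""
    return L(L(L(x)))


def S(x, boxes=SBOXES):
    """Apply four 4-bit S-boxes in parallel, boxes[0] on the high nibble."""
    out = 0
    for n in range(4):
        shift = 12 - 4 * n
        out |= boxes[n][(x >> shift) & 0xF] << shift
    return out


def LS(x):
    return L(S(x))


def LS_inv(x):
    return S(L_inv(x), INV_SBOXES)


def wd16(x, k1, k2, k3, k4):
    """Four rounds of key XOR followed by LS, as in Eq. 1."""
    for k in (k1, k2, k3, k4):
        x = LS(x ^ k)
    return x


def wd16_inv(y, k1, k2, k3, k4):
    """Undo the rounds in reverse key order."""
    for k in (k4, k3, k2, k1):
        y = LS_inv(y) ^ k
    return y
```

Note that `L(0x6000)` evaluates to `0x6198`, which is the \(\varDelta \) value paired with \(\delta = \mathtt{F000}\) later in the paper.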

We occasionally also use \(\mathrm {LS}^{-1}\) and \(\mathrm {WD16}^{-1}\) to denote the inverses of the respective functions. We first observe that WD16 can produce closely correlated output with some distinct but related keys.

Observation 1

Consider two 64-bit WD16 keys \((k_1, k_2, k_3, k_4)\) and \((k'_1, k'_2, k'_3, k'_4)\) that for some \(i \in \{1, 2, 3\}\) are related by \(\delta = k_i \oplus k'_i\) and \(\varDelta = k_{i+1} \oplus k'_{i+1}\), with the other two key words equal. There exist such pairs that yield equivalent WD16 encryption and decryption for approximately \(1/4\) of input and output values.

In a differential attack we only want to have a single active S-box to maximize the probability. As with any \(4 \times 4\) S-box, each one of \(S_1\), \(S_2\), \(S_3\) and \(S_4\) must have differentials that work for at least four of the 16 input values, leading to the given probability \(1/4\).

Fig. 1. The “WD16” mixing function is a 16-bit substitution-permutation network with four rounds and a 64-bit subkey \((k_1, k_2, k_3, k_4)\). It is used in both initialization and encryption phases.

Looking at Fig. 1 we can see how, after the \(\delta = k_i \oplus k'_i\) difference is introduced at position \(i\), it is subjected to an S-box substitution and a linear transformation before the \(\varDelta = k_{i+1} \oplus k'_{i+1}\) key difference cancels it out at \(i+1\) with the given probability 1/4.

Table 1 gives a list of all such pairs that have the optimum probability of exactly 1/4. This table was created via an exhaustive search.
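A search of this kind can be sketched as follows. This is our illustration, not necessarily the procedure the table was produced with. The values in `S1_ASSUMED` are the first Hummingbird-2 S-box as we recall it from the specification [1] and should be checked against [1] before relying on the output; the function name `high_prob_pairs` is hypothetical. For each nonzero single-nibble input difference, the sketch counts how many of the 16 inputs lead to each output difference and keeps those that hold for at least four inputs.

```python
# ASSUMPTION: S1_ASSUMED is S_1 as recalled from the specification [1];
# verify the values against [1] before use.
S1_ASSUMED = [7, 12, 14, 9, 2, 1, 5, 15, 11, 6, 13, 0, 4, 8, 10, 3]
MASK = 0xFFFF


def rotl16(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK


def L(x):
    # x ^ (x <<< 6) ^ (x >>> 6), with x >>> 6 written as x <<< 10
    return x ^ rotl16(x, 6) ^ rotl16(x, 10)


def high_prob_pairs(sbox, nibble, threshold=4):
    """Return (delta, Delta) word pairs for one S-box position.

    nibble 0 is the high nibble. A pair is kept when the single-nibble
    differential holds for at least `threshold` of the 16 S-box inputs.
    """
    shift = 12 - 4 * nibble
    pairs = []
    for d_in in range(1, 16):
        counts = [0] * 16
        for x in range(16):
            counts[sbox[x] ^ sbox[x ^ d_in]] += 1
        for d_out, count in enumerate(counts):
            if count >= threshold:
                # Delta must cancel the difference after the linear step
                pairs.append((d_in << shift, L(d_out << shift)))
    return pairs


for delta, big_delta in high_prob_pairs(S1_ASSUMED, 0):
    print(f"{delta:04X} -> {big_delta:04X}")
```

Under the stated S-box assumption, the relation \(\mathtt{F000} \rightarrow \mathtt{6198}\) appears among the printed pairs.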

Table 1. All \(4 \times 18 = 72\) high-probability related key word pairs where \(\delta = k_i \oplus {k'}_i\) is canceled by \(\varDelta = k_{i+1} \oplus {k'}_{i+1}\) in the WD16 nonlinear function with probability 1/4.

We give some examples of WD16 key pairs for which \({WD16}_{A}(x) = {WD16}_{B}(x)\) with probability 1/4:

figure b

The last two examples use the \(\mathtt{F000} \rightarrow \mathtt{6198}\) relation which was (randomly) chosen for the main attack described in Sect. 3 of this paper. There is a wide spectrum of variations of a more general attack methodology that is represented by that specific case; picking some other relation leads to a different attack.

2.2 Initialization and State Collisions

The initialization phase of Hummingbird-2 creates a 128-bit initial state from the 64-bit IV using the secret key and the WD16 function.

Fig. 2. Initialization round. There are four initialization rounds with a counter stepping through \(i = 0, 1, 2, 3\).

Initialization is a four-round process. Figure 2 shows a single initialization round. The state is first set as \(R = {IV}~|~{IV}\). In each round, there are four invocations of WD16 together with some \(\mathrm{mod}~ 2^{16}\) additive mixing, followed by cyclic rotations of the first four registers and linear exclusive-or “accumulation” mixing of the first four registers with the last four. The round counter \(i = 0, 1, 2, 3\) is also used in the mix at the very beginning. The input keys to WD16 alternate between the two halves of the master key, \((K_1\), \(K_2\), \(K_3\), \(K_4)\) and \((K_5\), \(K_6\), \(K_7\), \(K_8)\).

Observation 2

For each key \(K\), there is a family of 432 related keys \(K'\) that yield the same state \(R\) after four initialization rounds with probability \(P=2^{-16}\) over all \({IV}\) values.

There are six possible positions \(i\) for \(\delta = K_i \oplus K'_i\) and \(\varDelta = K_{i+1} \oplus K'_{i+1}\) that maximize the probability; \(i \in \{1, 2 ,3, 5, 6, 7\}\). Since there are two S-box activations in each round and four initialization rounds, the total probability of arriving at the same initial state for two such related keys is \((1/4)^{2 \times 4} = 2^{-16}\). As there are 72 suitable \((\delta , \varDelta )\) pairs (see Table 1), for each 128-bit key \(K\) there are at least \(6 \times 72 = 432\) related keys that will give the same initial state with the given \(2^{-16}\) probability. This observation has been experimentally verified.

2.3 Encryption

Hummingbird-2 encrypts and decrypts data in 16-bit increments, as shown in Fig. 3. The 128-bit state \(R^i\) and key \(K\) define a permutation from the plaintext word \(P^i\) to the ciphertext word \(C^i\) or vice versa. To encrypt plaintext word \(P^i\) into a ciphertext word \(C^i\), the following steps are taken:

$$\begin{aligned} t^i_0 =&~ P^i \boxplus R^{i}_1 \\ t^i_1 =&~ \mathrm {WD16}(t^i_0, K_1, K_2, K_3, K_4) \\ t^i_2 =&~ \mathrm {WD16}(t^i_1 \boxplus R^{i}_2, K_5 \oplus R^{i}_5, K_6 \oplus R^{i}_6, K_7 \oplus R^{i}_7, K_8 \oplus R^{i}_8) \\ t^i_3 =&~ \mathrm {WD16}(t^i_2 \boxplus R^{i}_3, K_1 \oplus R^{i}_5, K_2 \oplus R^{i}_6, K_3 \oplus R^{i}_7, K_4 \oplus R^{i}_8) \\ t^i_4 =&~ \mathrm {WD16}(t^i_3 \boxplus R^{i}_4, K_5, K_6, K_7, K_8) \\ C^i =&~ t^i_4 \boxplus R^{i}_1. \end{aligned}$$

After each encrypted word is processed, the state is updated:

$$\begin{aligned} R^{i+1}_1 =&~ R^{i}_1 \boxplus t^i_3 \\ R^{i+1}_2 =&~ R^{i}_2 \boxplus t^i_1 \\ R^{i+1}_3 =&~ R^{i}_3 \boxplus t^i_2 \\ R^{i+1}_4 =&~ R^{i}_4 \boxplus R^{i}_1 \boxplus t^i_3 \boxplus t^i_1 \\ R^{i+1}_5 =&~ R^{i}_5 \oplus (R^{i}_1 \boxplus t^i_3) \\ R^{i+1}_6 =&~ R^{i}_6 \oplus (R^{i}_2 \boxplus t^i_1) \\ R^{i+1}_7 =&~ R^{i}_7 \oplus (R^{i}_3 \boxplus t^i_2) \\ R^{i+1}_8 =&~ R^{i}_8 \oplus (R^{i}_4 \boxplus R^{i}_1 \boxplus t^i_3 \boxplus t^i_1). \end{aligned}$$

Fig. 3. Encryption of plaintext word \(P^i\) to ciphertext word \(C^i\) and update of state \(R\). The “temporary” variables \(t_0 \cdots t_4\) are used in the description of the attack.

For decryption, the inverse of the WD16 function is required and the \(t\) quantities are computed in reverse order. The update function remains the same.
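The encryption, decryption and state update equations above translate directly into code. The following is a minimal sketch under stated assumptions: the keyed permutation is passed in as a parameter, and `toy_wd16` / `toy_wd16_inv` are hypothetical stand-ins (not WD16) that only serve to make the example runnable. The state `R` and key `K` are lists of eight 16-bit words, indexed from 0 rather than 1.

```python
MASK = 0xFFFF


def add(*words):
    """Addition modulo 2^16."""
    return sum(words) & MASK


def sub(a, b):
    """Subtraction modulo 2^16."""
    return (a - b) & MASK


def update_state(R, t1, t2, t3):
    """State update shared by encryption and decryption."""
    r1, r2, r3, r4, r5, r6, r7, r8 = R
    n1 = add(r1, t3)
    n2 = add(r2, t1)
    n3 = add(r3, t2)
    n4 = add(r4, r1, t3, t1)
    return [n1, n2, n3, n4, r5 ^ n1, r6 ^ n2, r7 ^ n3, r8 ^ n4]


def encrypt_word(P, R, K, wd16):
    k1, k2, k3, k4, k5, k6, k7, k8 = K
    r1, r2, r3, r4, r5, r6, r7, r8 = R
    t0 = add(P, r1)
    t1 = wd16(t0, k1, k2, k3, k4)
    t2 = wd16(add(t1, r2), k5 ^ r5, k6 ^ r6, k7 ^ r7, k8 ^ r8)
    t3 = wd16(add(t2, r3), k1 ^ r5, k2 ^ r6, k3 ^ r7, k4 ^ r8)
    t4 = wd16(add(t3, r4), k5, k6, k7, k8)
    return add(t4, r1), update_state(R, t1, t2, t3)


def decrypt_word(C, R, K, wd16_inv):
    k1, k2, k3, k4, k5, k6, k7, k8 = K
    r1, r2, r3, r4, r5, r6, r7, r8 = R
    # The t quantities are recovered in reverse order.
    t4 = sub(C, r1)
    t3 = sub(wd16_inv(t4, k5, k6, k7, k8), r4)
    t2 = sub(wd16_inv(t3, k1 ^ r5, k2 ^ r6, k3 ^ r7, k4 ^ r8), r3)
    t1 = sub(wd16_inv(t2, k5 ^ r5, k6 ^ r6, k7 ^ r7, k8 ^ r8), r2)
    t0 = wd16_inv(t1, k1, k2, k3, k4)
    return sub(t0, r1), update_state(R, t1, t2, t3)


# HYPOTHETICAL stand-in keyed permutation, NOT the real WD16.
def toy_wd16(x, k1, k2, k3, k4):
    return add(x ^ k1 ^ k3, k2 ^ k4)


def toy_wd16_inv(y, k1, k2, k3, k4):
    return sub(y, k2 ^ k4) ^ k1 ^ k3
```

Because both directions call the same `update_state` with the same \(t_1, t_2, t_3\), an encrypting and a decrypting party starting from the same state stay synchronized.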

2.4 Related-Key Progression in Encryption

We see that there are four invocations of WD16 in each encryption operation and that key halves \(K_1 .. K_4\) and \(K_5 .. K_8\) are used twice each. In the middle two WD16 rounds the key is XORed with four of the higher “accumulator” state registers, but that has no effect on the differential. Since the differential is activated twice, there is a \((1/4)^2 = 1/16\) probability of matching ciphertexts.

Observation 3

There is a 1/16 probability that for a matching state \(R\) the related keys \(K\) and \(K'\) (as defined in Sect. 2.1) will encrypt the same plaintext word to the equivalent ciphertext word.

Note that if the key difference is in \(K_5 .. K_8\), there is a 1/4 probability of equivalent state update as the last WD16 invocation only affects ciphertext output, not the state. Conversely, if the key difference is in \(K_1 .. K_4\), the state update will be equivalent in decryption with 1/4 probability. Furthermore, if the \((\delta ,\varDelta )\) difference is in \((K_1, K_2)\), the first WD16 does not affect the state in decryption, and at least 12 bits of the plaintext will be equivalent as there is only one active S-box.

3 Crafting an Attack

There are many ways that one can use the high-probability correlated keys in an attack. We will describe the one that we implemented, which uses only a single related key pair described in Sect. 3.1.

The attack proceeds in a number of distinct stages. We first find a suitable IV value for the attack (Sect. 3.2), then proceed to solve various internal quantities (Sects. 3.3 and 3.4), and finally recover parts of the secret key (Sects. 3.5 and 3.6).

3.1 Attack Model

We assume that the attacker has access to two “black box” oracles whose keys are related by

$$\begin{aligned} K \oplus K' = ( \mathtt {F000} ~ \mathtt {6198} ~ \mathtt {0000} ~ \mathtt {0000} ~ \mathtt {0000} ~ \mathtt {0000} ~ \mathtt {0000} ~ \mathtt {0000}). \end{aligned}$$
(2)

The choice of this particular key relation is almost arbitrary in the set of admissible key differences. Many of the differentials in Table 1 could be used as well.

In our model the attacking algorithm may perform chosen-IV initializations and query encryptions and decryptions from the oracles. For an ideal cipher the most effective way to recover the secret key \(K\) (and \(K'\)) would be through brute force with an expected complexity of \(2^{128}\) trials. Therefore we will use the estimated time required for a single trial, consisting of initialization and encryption/decryption of a single word, as the “unit complexity” \(c = 2^0\).

We note that in a brute force attack eight words need to be encrypted in order to be reasonably sure that the correct key has been found, but with the probability 65535/65536 the incorrect ones can be rejected after encryption of a single word. Hence we use this as the unit complexity.

3.2 Finding a State Collision

The first stage of the attack is to find an \({IV}\) value that produces a matching state \(R\) after the four-round initialization procedure for both \(K\) and \(K'\). As indicated by Observation 2 in Sect. 2.2, one expects to find such a collision after searching through \(2^{16}\) different \({IV}\) values. Detection of a collision can be made by trial decryptions. If we decrypt a word \(x\) immediately after initialization, then there is a 1/4 probability that 12 bits of the corresponding plaintext words will match as discussed in Sect. 2.4. The overall complexity of this step is no more than \(2^{20}\) to find an \({IV}\) collision that holds with overwhelming probability.
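The detection step can be sketched as follows. The oracle interface is an assumption made for illustration: `dec_a` and `dec_b` are hypothetical callables that initialize the respective black box with the given IV and decrypt a single word. The number of trials and the acceptance threshold are illustrative choices of ours, not values from the attack implementation. For a colliding IV about a quarter of the trials should agree in the low 12 bits, whereas for a non-colliding IV such agreement is expected only by chance.

```python
import random


def looks_like_collision(iv, dec_a, dec_b, trials=64, min_hits=8, rng=None):
    """Heuristic test for an initial state collision at this IV.

    dec_a, dec_b: HYPOTHETICAL oracle callables (iv, ciphertext_word) ->
    plaintext_word, each initializing with iv and decrypting one word.
    """
    rng = rng or random.Random()
    hits = 0
    for _ in range(trials):
        c = rng.randrange(1 << 16)
        # Count trials where the low 12 plaintext bits agree.
        if (dec_a(iv, c) ^ dec_b(iv, c)) & 0x0FFF == 0:
            hits += 1
    return hits >= min_hits


def find_colliding_iv(ivs, dec_a, dec_b, **kwargs):
    """Return the first IV that passes the collision test, or None."""
    for iv in ivs:
        if looks_like_collision(iv, dec_a, dec_b, **kwargs):
            return iv
    return None
```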

Note that subsequent collisions may be found faster (for this \(K_1, K_2\) relation) if we first search using words \(({IV}_1\), \({IV}_2\), \({IV}_3)\) and for consecutive searches keep those words constant and loop through values of \({IV}_4\). The two initial round collisions are therefore guaranteed and consecutive collisions can be found with probability \(2^{-12}\).

Our attack requires only a single initialization state collision, henceforth denoted simply as \({IV}\).

3.3 Attacking \(R^i_1\) with Carry Bits

It is important to note that in HB2 encryption we can also have state and ciphertext word collisions when the plaintext words \(P\) (for \(K\) instance) and \(P'\) (for \(K'\) instance) are not equal.

The next stage involves the recovery of \(R^i_1\). We can generate full codebooks \(P^i \leftrightarrow C^i\) and \(P'^i \leftrightarrow C'^i\) that depend on the \({IV}\) and previous \(P^j\), \(j < i\) values with roughly \(2^{17}\) effort if \(i\) is small. We fix \(C^j = C'^j\) for \(j < i\) so that the states \(R^i\) do not diverge. Looking at Figs. 1 and 3 we note the following.

Observation 4

The first \((\delta , \varDelta )\) collision in the encryption operation works when

$$\begin{aligned} S( (P^i \boxplus R^i_1) \oplus K_1 ) \oplus S( (P'^i \boxplus R^i_1) \oplus K'_1 ) = L^{-1}(\varDelta ). \end{aligned}$$
(3)

Here we use \(S\) to denote the four parallel S-box lookups and \(L^{-1}\) to denote the inverse of the shift/XOR linear step in WD16, as in Eq. 1.

The \(\delta \) and \(\varDelta \) values dictate which values the input differential \(P^i \oplus P'^i\) can take. Since the input differential \(\delta = K_1 \oplus K'_1 = \mathtt {F000}\) is in the high nibble, only the high nibbles \(N = ((P^i \boxplus R^i_1) \oplus K_1) >> 12\) and \(N' = ((P'^i \boxplus R^i_1) \oplus K'_1) >> 12\) really matter. We can tabulate successful pairs; see Table 2.

Table 2. High nibbles of intermediate values \(N = ((P^i \boxplus R^i_1) \oplus K_1) >> 12\) and \(N' = ((P'^i \boxplus R^i_1) \oplus K'_1) >> 12\) in WD16 that will provide a collision. These are the pairs for which \(S_1(N) \oplus S_1(N' \oplus \mathtt {0xF}) = \mathtt {0x6}\). Note that there are four entries on the diagonal, as expected; if \(N = N'\) there is a 1/4 probability of a collision.

We see that Table 2 has exactly one entry in each row and each column; \(N'\) can be given as a function of \(N\) and vice versa. If the \(N\) and \(N'\) entries are shifted by one position, the collision at that point becomes impossible.
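The mapping from \(N\) to \(N'\) follows directly from the condition in the caption of Table 2. The sketch below computes it under the assumption that `S1_ASSUMED` holds the values of \(S_1\); these are recalled from the specification [1] and should be verified against it. The function name `collision_partner` is hypothetical.

```python
# ASSUMPTION: S1_ASSUMED is S_1 as recalled from the specification [1];
# verify the values against [1] before use.
S1_ASSUMED = [7, 12, 14, 9, 2, 1, 5, 15, 11, 6, 13, 0, 4, 8, 10, 3]


def collision_partner(sbox, delta_nibble=0xF, out_diff=0x6):
    """For each N, return the N' with sbox[N] ^ sbox[N' ^ delta] == out_diff."""
    inverse = [sbox.index(v) for v in range(16)]
    return [inverse[sbox[n] ^ out_diff] ^ delta_nibble for n in range(16)]


partner = collision_partner(S1_ASSUMED)
for n, n_prime in enumerate(partner):
    marker = "  (diagonal)" if n == n_prime else ""
    print(f"N = {n:X}  ->  N' = {n_prime:X}{marker}")
```

Since the map is a composition of bijections, every \(N\) has exactly one partner. Under the stated S-box assumption, four of the sixteen entries satisfy \(N = N'\), in line with the 1/4 probability.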

As we only want to have a single active S-box, we may choose the high nibbles of \(P^i\) and \(P'^i\) arbitrarily, but we have to keep the low 12 bits the same.

Observation 5

The probability of the carry shift depends solely on the value of plaintext low bits and the low bits of \(R_1^i\). The shift will occur only when

$$\begin{aligned} (P^i \wedge \mathtt {0FFF}) + (R_1^i \wedge \mathtt {0FFF}) \ge \mathtt {1000}. \end{aligned}$$
(4)

Since we have created a codebook of \(P^i \leftrightarrow C^i\), we may effectively loop through the low 12 bits \(p = P^i \wedge \mathtt {0FFF} = P'^i \wedge \mathtt {0FFF}\) until the carry-over “shift” occurs and the pattern changes from that observed at \(p = \mathtt {0000}\). This will give us the low bits of \(R_1^i\). This process is not entirely foolproof as there is a second collision that is required in the encryption process, but due to the abundance of trials we may accurately pinpoint the \(p\) carry transition point with a good probability.

For each \(p\) value we may test \(16 \times 16 = 256\) high nibble pairs for a matching ciphertext collision. Those collisions must occur at the points with an entry in Table 2. We may loop from low values of \(p\) towards higher values and find the lowest \(p\) value which starts to give a different “grid”. The algorithm we use is therefore essentially based on elimination of impossible combinations.
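The carry-transition idea of Observation 5 can be illustrated in an idealized, noise-free setting. Here `carry_seen` is a hypothetical predicate that reports whether the collision grid observed for the low bits \(p\) is the shifted one. In the real attack this decision is statistical and also depends on the guessed high bit of \(K_1\), which the sketch ignores.

```python
def recover_low12(carry_seen):
    """Recover the low 12 bits of R_1 from the first p that causes a carry.

    carry_seen(p) -> bool is a HYPOTHETICAL, noise-free predicate that is
    True exactly when p + (R_1 & 0x0FFF) >= 0x1000 (Eq. 4).
    """
    for p in range(0x1000):
        if carry_seen(p):
            # Smallest p with a carry satisfies p + r == 0x1000.
            return 0x1000 - p
    # No carry for any p means the low 12 bits of R_1 are zero.
    return 0


# Example with a simulated register value.
secret_low = 0x7AB
print(hex(recover_low12(lambda p: p + secret_low >= 0x1000)))
```

A linear scan mirrors the description in the text; since the carry condition is monotone in \(p\), a binary search would also work in the noise-free case.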

Note that the \(K_1\) keying XOR in Eq. 3 also affects this step and the actual shift that occurs. However, we have found that if we guess the highest bit of \(K_1\) (and hence \(K'_1\) which has the inverse high bit), we can actually determine all 16 bits of \(R^i_1\) with high probability with roughly \(2^{17}\) total complexity and one guessed bit.

3.4 Deriving Additional Quantities for an Attack

From Sect. 2.3 we see that \(R_1\) is updated as \(R^{i+1}_1 = R^{i}_1 \boxplus t^i_3\). If we have derived two consecutive \(R_1\) values using the technique outlined in Sect. 3.3, we obtain the value of \(t_3\) at round \(i\):

$$\begin{aligned} t^i_3 = R^{i+1}_1 \boxminus R^{i}_1. \end{aligned}$$
(5)

Furthermore, since \(C^i = t^i_4 \boxplus R^{i}_1\), we obtain

$$\begin{aligned} t^i_4 = C^i \boxminus R^{i}_1. \end{aligned}$$
(6)

This stage proceeds by attempting to create a sequence where \(t^i_4 = t^{i+1}_4\) holds with a high probability. To do this, for \(i = 1, 2, 3, \ldots , 2^7\) we process each full 16-bit codebook as discussed in Sect. 3.3 and choose \(C^i\) to be the smallest value after \(R^i_1\) such that the corresponding \(P^i\) and \(P'^i\) form a state collision.

For those pairs where \(t^i_4 = t^{i+1}_4\), the following relation holds since WD16 is a permutation and matching output words imply matching input words:

$$\begin{aligned} t^i_3 \boxplus R^i_4 = t^{i+1}_3 \boxplus R^{i+1}_4. \end{aligned}$$
(7)

We manipulate Eq. 7 into \(t^i_3 = t^{i+1}_3 \boxplus R^{i+1}_4 \boxminus R^i_4\) and substitute that into the \(R_4\) update function

$$\begin{aligned} R^{i+1}_4 = R^{i}_4 \boxplus R^{i}_1 \boxplus t^i_3 \boxplus t^i_1 \end{aligned}$$
(8)

to obtain

$$\begin{aligned} t^i_1 = \boxminus R^{i}_1 \boxminus t^{i+1}_3. \end{aligned}$$
(9)
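Spelled out, the substitution behind Eq. 9 reads as follows, with all arithmetic taken modulo \(2^{16}\). Inserting the rearranged Eq. 7 into Eq. 8 makes \(R^i_4\) and \(R^{i+1}_4\) cancel on both sides:

$$\begin{aligned} R^{i+1}_4 =&~ R^{i}_4 \boxplus R^{i}_1 \boxplus (t^{i+1}_3 \boxplus R^{i+1}_4 \boxminus R^i_4) \boxplus t^i_1 \\ 0 =&~ R^{i}_1 \boxplus t^{i+1}_3 \boxplus t^i_1 \\ t^i_1 =&~ \boxminus R^{i}_1 \boxminus t^{i+1}_3. \end{aligned}$$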

Since \(R^{i}_1\) and \(t^{i+1}_3\) are known quantities, as is \(t^i_0 = P^i \boxplus R^{i}_1\), we now can attack the first half of the keywords:

$$\begin{aligned} t^i_1 = \mathrm {WD16}(t^i_0, K_1, K_2, K_3, K_4). \end{aligned}$$
(10)
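Equations 5, 6 and 9 amount to a few modular subtractions. The sketch below, with hypothetical function names, derives a candidate input/output pair \((t^i_0, t^i_1)\) for Eq. 10 from three consecutive recovered \(R_1\) values. It assumes that \(t^i_4 = t^{i+1}_4\) holds for the round in question; if it does not, the returned pair is a false candidate.

```python
MASK = 0xFFFF


def t4_of(C_i, R1_i):
    """Eq. 6: t_4 at round i from the ciphertext word and R_1."""
    return (C_i - R1_i) & MASK


def candidate_pair(P_i, R1_i, R1_next, R1_next2):
    """Return (t0, t1) for round i, ASSUMING t4 at rounds i and i+1 match.

    R1_i, R1_next, R1_next2 are the recovered R_1 values at rounds
    i, i+1 and i+2.
    """
    t3_next = (R1_next2 - R1_next) & MASK   # Eq. 5 applied at round i+1
    t0 = (P_i + R1_i) & MASK                # input of the first WD16
    t1 = (-R1_i - t3_next) & MASK           # Eq. 9
    return t0, t1
```

Each returned pair is one \((x_i, y_i)\) candidate for the key search of Sect. 3.5.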

Note that due to the probabilistic nature of our \(R_1\) derivation method, not all of these candidate pairs are valid. However, we have experimentally verified that in practice a sufficient number is valid and the key search algorithm (described in Sect. 3.5) is designed in a way that accounts for false pairs.

3.5 A Time-Memory Trade-off for \(K_1 \cdots K_4\) Search

The information obtained in Sects. 3.3 and 3.4, especially Eq. 10, already allows the keyspace of Hummingbird-2 to be split in half, so that a \(2^{64}\) attack can be mounted via exhaustive search. We will describe a simple time-memory trade-off attack that allows a further square root reduction for the first half of the key words.

In this step, we are given \(n\) values \((x_i, y_i)\), \(1 \le i \le n\), that satisfy

$$\begin{aligned} \mathrm {WD16}(x_i, K_1, K_2, K_3, K_4) = y_i \end{aligned}$$
(11)

with a reasonable probability (see Eq. 10).

We have experimentally discovered that if we perform the search for matching consecutive \(t_4\) pairs discussed in Sect. 3.4 up to a limit of \(2^7\) plaintext / ciphertext words, we are typically left with \(n=2^4\) candidates. Out of these, about \(2^3\) will be “right pairs” that actually satisfy Eq. 11 for the correct subkeys. This is a sufficient fraction for a time-memory trade-off technique.

To eliminate one of the keys, we pair the values and investigate \((x_i,y_i)\) and \((x_j,y_j)\), \(1 \le i < j \le n\). There are \(n(n-1)/2\) pairs, about a quarter of which will consist of two right pairs. This will help to cancel out \(K_3\) in the computation.

Table Generation. For each \(i, j\) pair, we first construct a lookup table for subkey \(K_4\). For each guessed \(0 \le K_4 < 2^{16}\) we compute the middle value \(h\) and build a table \(T()\):

$$\begin{aligned} h=&~ \mathrm {LS}^{-1}(\mathrm {LS}^{-1}(y_i) \oplus K_4) \oplus \mathrm {LS}^{-1}(\mathrm {LS}^{-1}(y_j) \oplus K_4) \\ T(h) =&~ K_4. \end{aligned}$$

Here a candidate for \(K_4\) can be obtained from the \(h\) value by building an appropriate data structure that takes care of collisions.

Key Search. Approaching the WD16 from the other direction, we then loop through the \(2^{32}\) values of \(K_1\) and \(K_2\) and look for a match in

$$\begin{aligned} h' = \mathrm {LS}(\mathrm {LS}(x_i \oplus K_1) \oplus K_2) \oplus \mathrm {LS}(\mathrm {LS}(x_j \oplus K_1) \oplus K_2) \end{aligned}$$
(12)

Here \(T(h')\) gives a candidate for \(K_4\) with \(O(1)\) effort. Then we check for all \(1 \le k \le n\) pairs \((x_k,y_k)\) how many of those yield the same \(K_3\) value

$$\begin{aligned} K^{?}_3 = \mathrm {LS}(\mathrm {LS}(x_k \oplus K_1) \oplus K_2) \oplus \mathrm {LS}^{-1}(\mathrm {LS}^{-1}(y_k) \oplus T(h')). \end{aligned}$$
(13)

If five or six of those \(K_3\) values agree, then there is a significant probability that we have found the correct 64-bit quartet \((K_1, K_2, K_3, K_4)\) of the secret key words.
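The table generation and key search steps can be sketched as below. This is an illustration under stated assumptions rather than the attack code: `SBOX` is a placeholder 4-bit permutation used on all four nibbles (not the Hummingbird-2 S-boxes from [1]), the function names are ours, and the \(K_1\), \(K_2\) candidate lists are passed in so that the example can run on a small subset instead of the full \(2^{32}\) space.

```python
from collections import defaultdict

# ASSUMPTION: placeholder 4-bit permutation on all four nibbles, NOT the
# Hummingbird-2 S-boxes from [1]; only the search structure matters here.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
MASK = 0xFFFF


def rotl16(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK


def _ls(x):
    s = 0
    for shift in (12, 8, 4, 0):
        s |= SBOX[(x >> shift) & 0xF] << shift
    return s ^ rotl16(s, 6) ^ rotl16(s, 10)


LS = [_ls(x) for x in range(1 << 16)]   # LS as a lookup table
LS_INV = [0] * (1 << 16)                # inverse table, LS is a bijection
for _x, _y in enumerate(LS):
    LS_INV[_y] = _x


def wd16(x, k1, k2, k3, k4):
    for k in (k1, k2, k3, k4):
        x = LS[x ^ k]
    return x


def fwd(x, k1, k2):
    """Middle value reached from the input side: LS(LS(x ^ K1) ^ K2)."""
    return LS[LS[x ^ k1] ^ k2]


def bwd(y, k4):
    """Middle value XOR K3 reached from the output side."""
    return LS_INV[LS_INV[y] ^ k4]


def build_table(yi, yj):
    """T(h) -> list of K4 candidates; K3 cancels in the XOR of two pairs."""
    table = defaultdict(list)
    for k4 in range(1 << 16):
        table[bwd(yi, k4) ^ bwd(yj, k4)].append(k4)
    return table


def search(pairs, i, j, k1_candidates, k2_candidates, min_agree=5):
    (xi, yi), (xj, yj) = pairs[i], pairs[j]
    table = build_table(yi, yj)
    hits = []
    for k1 in k1_candidates:
        for k2 in k2_candidates:
            h = fwd(xi, k1, k2) ^ fwd(xj, k1, k2)        # Eq. 12
            for k4 in table.get(h, ()):
                votes = defaultdict(int)
                for xk, yk in pairs:                      # Eq. 13
                    votes[fwd(xk, k1, k2) ^ bwd(yk, k4)] += 1
                k3, agree = max(votes.items(), key=lambda kv: kv[1])
                if agree >= min_agree:
                    hits.append((k1, k2, k3, k4))
    return hits
```

In the full search both candidate lists would range over all \(2^{16}\) values, and the call would be repeated for different \((i, j)\) choices until both selected pairs are right pairs. Storing a list per table entry is one simple way to handle colliding \(h\) values.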

Complexity. Since about \(2^4\) lookup key searches of \(2^{32}\) primitive operations each (and a total of \(2^{16}\) memory) are required, we estimate that the total complexity of this step is less than \(2^{36}\) when adjusted to the scale of the complexity of brute force key search as discussed in Sect. 3.1.

3.6 Finding the Rest: \(K_5 \cdots K_8\) Search

After the first half of the keying material has been discovered, it is a simple matter to brute force the rest. We have not found a time-memory tradeoff or other simple shortcut for the recovery of this part. Hence the total complexity is dominated by the second half, giving the total complexity of \(2^{64}\) processing and about \(2^{16}\) data.

It is quite easy to see that the last WD16 instance could be used to speed up key recovery if the difference between the two keys were in the second half of the key. However, in Sect. 3.1 we chose a specific difference which lies in the first words. If we adopt the nonstandard setting of [10] where more than two “black boxes” with specific key relations can be accessed, then the overall complexity of key recovery can be pushed down to the \(2^{36}\) range. However, this attack model is rather unrealistic.

3.7 Discussion

Our attacks are specific to the Hummingbird structure as they do not purely follow any clear classical attack path such as linear or differential cryptanalysis. One may create a number of different attacks based on the same observations.

We developed the attack described in this paper while we were implementing it. One discovery led to the next. Our attack implementation used clear black box insulation and therefore we have a high degree of confidence that it works. We have tested it with various subsets of key space.

Design Issues. The attacks are made possible by a combination of factors. Lessons were perhaps not fully learned from the attacks of [3] which exploited the simplistic key schedule and algebraic properties of the Hummingbird structure. However, a simple and fast key schedule is partly dictated by the timing constraints of the RFID environment and protocols for which Hummingbird was designed. It can also be argued that having 16-bit datapaths with additive mixing has certain advantages when a cipher is specifically to be used with a 16-bit embedded CPU, even though the particular structure of Hummingbird may not fully utilize the potential.

Fixing WD16: Hummingbird-2\(\nu \). The main enabler of the attacks is the WD16 function and the way it is keyed. Furthermore, WD16 has a linear mixing stage \(L(x)\) that has suboptimal diffusion and does not allow effective use of lookup tables to speed up decryption of data like the MDS [13] matrices of SHARK [14] and AES [15] do.

To mitigate both security and efficiency issues, we propose an alternative where \(\mathrm {WD16}(x, k_1, k_2, k_3, k_4)\) has been replaced with “S-boxless” \(\chi _\nu (x, k_1, k_2, k_3, k_4)\) to produce a variant called Hummingbird-2\(\nu \). Hummingbird-2\(\nu \) is described in more detail in Appendix A. This variant is geared towards hardware implementation. We note that the estimated implementation footprint for a 32-cycle version of HB2 is only 500 GE and an implementation that can perform both encryption and decryption is around 700 GE. More accurate implementation results will be reported separately.

4 Conclusions

We have discovered and demonstrated large related key classes which produce closely correlated output for any given input. The weak key classes penetrate both the initialization and actual ciphering stages of Hummingbird-2.

We have developed a full key recovery related-key attack algorithm which effectively halves the cipher’s key size. This attack allows the secret key to be recovered with only \(2^{64}\) time and \(2^{16}\) data in a two-key setting. The attack has been implemented and verified to work. Furthermore, the first half of the key can be recovered with only \(2^{36}\) effort. Other types of attacks may be derived from the same observations.

Even though it may be tempting to derive multiple keys from a single one (e.g. one for each communication direction or medium), Hummingbird-2 should only be used with strictly random keys. This approach is taken in the ISO protocol proposal [2]. System designs where the secret keys of tags are related or shortened should be avoided. Key bits must never be used to denote access / product categories or other information.