Full Plaintext Recovery Attack on Broadcast RC4

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8424)

Abstract

This paper investigates the practical security of RC4 in the broadcast setting, where the same plaintext is encrypted with different user keys. We introduce several new biases in the initial (1st to 257th) bytes of the RC4 keystream, which are substantially stronger than known biases. Combining the new biases with the known ones, a cumulative list of strong biases in the first 257 bytes of the RC4 keystream is constructed. We demonstrate a plaintext recovery attack using our strong bias set of initial bytes by means of a computer experiment. Almost all of the first 257 bytes of the plaintext can be recovered, with probability more than 0.8, using only \(2^{32}\) ciphertexts encrypted by randomly-chosen keys. We also propose an efficient method to extract later bytes of the plaintext, after the 258th byte. The proposed method exploits our bias set of the first 257 bytes in conjunction with the digraph repetition bias proposed by Mantin in EUROCRYPT 2005, and sequentially recovers the later bytes of the plaintext after recovering the first 257 bytes. Once the possible candidates for the first 257 bytes are obtained by our bias set, the later bytes can be recovered from about \(2^{34}\) ciphertexts with probability close to 1.

Keywords

RC4 · Broadcast setting · Plaintext recovery attack · Bias · Experimentally-verified attack · SSL/TLS · Multi-session setting

1 Introduction

RC4, designed by Rivest in 1987, is one of the most widely used stream ciphers in the world. It is adopted in many software applications and standard protocols such as SSL/TLS, WEP, Microsoft Lotus and Oracle Secure SQL. RC4 consists of a key scheduling algorithm (KSA) and a pseudo-random generation algorithm (PRGA). The KSA converts a user-provided variable-length key (typically, 5–32 bytes) into an initial state \(S\) consisting of a permutation of \(\{0, 1, 2, \ldots , N - 1\}\), where \(N\) is typically 256. The PRGA generates a keystream \(Z_1\), \(Z_2\), \(\ldots \), \(Z_r\), \(\ldots \) from \(S\), where \(r\) is the round number of the PRGA. \(Z_r\) is XOR-ed with the \(r\)-th plaintext byte \(P_r\) to obtain the ciphertext byte \(C_r\). The algorithm of RC4 is shown in Algorithm 1, where \(+\) denotes arithmetic addition modulo \(N\), \(\ell \) is the key length, and \(i\) and \(j\) are pointers to locations of \(S\). Then, \(S[x]\) denotes the value of \(S\) at index \(x\).
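The KSA and PRGA described above can be sketched in Python (a minimal reference implementation for illustration; Algorithm 1 in the paper is the authoritative description):

```python
def ksa(key):
    """Key scheduling algorithm: builds the initial permutation S_0."""
    N = 256
    S = list(range(N))
    j = 0
    for i in range(N):
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    return S

def prga(S, n):
    """Pseudo-random generation algorithm: returns the first n keystream bytes."""
    N = 256
    S = S[:]                       # work on a copy of S_0
    i = j = 0
    out = []
    for _ in range(n):
        i = (i + 1) % N
        j = (j + S[i]) % N
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % N])
    return out

def rc4_keystream(key, n):
    return prga(ksa(key), n)

# Well-known test vector: key "Key" yields keystream eb 9f 77 81 b7 ...
print(bytes(rc4_keystream(b"Key", 5)).hex())  # -> eb9f7781b7
```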

After the disclosure of its algorithm in 1994, RC4 has attracted intensive cryptanalytic efforts over the past 20 years. Distinguishing attacks, which attempt to distinguish an RC4 keystream from a random stream, were proposed in [3, 4, 8, 10, 11, 14, 16]. A state recovery attack, which recovers the full state instead of the user-provided key, was shown by Knudsen et al. [7], and was improved by Maximov and Khovratovich [13]. Other types of attacks have also been proposed, e.g., a key collision attack [12], a keystream predictive attack [10] and key recovery attacks from a state [1, 15].

In FSE 2001, Mantin and Shamir presented an attack on RC4 in the broadcast setting where the same plaintext is encrypted with different user keys [11]. The Mantin-Shamir attack can extract the second byte of the plaintext from only \(\varOmega (N)\) ciphertexts encrypted with randomly-chosen different keys by exploiting a bias of \(Z_2\). Specifically, the event \(Z_2 = 0\) occurs with twice the expected probability of a random one. In FSE 2011, Maitra, Paul and Sen Gupta showed that \(Z_3, Z_4, \ldots , Z_{255}\) are also biased to 0 [8]. Then the bytes 3 to 255 can also be recovered in the broadcast setting, from \(\varOmega (N^3)\) ciphertexts.

Although the broadcast attacks were theoretically estimated, we find that three questions remain open regarding the practical security of broadcast RC4.
  1. Are the biases exploited in the previous attacks the strongest biases for the initial bytes 1 to 255?
  2. While the previous results [8, 11] estimate only lower bounds (\(\varOmega \)), how many ciphertexts encrypted with different keys are actually required for a practical attack on broadcast RC4?
  3. Is it possible to efficiently recover the later bytes of the plaintext, after byte 256?

1.1 Our Contribution

In this paper, we provide answers to all the aforesaid questions. To begin with, we introduce a new bias regarding \(Z_1\), a conditional bias such that \(Z_1\) is biased to 0 when \(Z_2\) is \(0\). Using this bias in conjunction with the bias of \(Z_2 = 0\) [11], the first byte of a plaintext is extracted from \(\varOmega (N^{2})\) ciphertexts encrypted with different keys. Although a strong bias of the first byte, namely a negative bias towards zero, has already been pointed out in [6, 14], it requires \(\varOmega ({N}^3)\) ciphertexts to extract the first byte of the plaintext. Thus, the new conditional bias observed by us is very useful, because the number of ciphertexts required to recover the first byte is reduced by a factor of \(N/2\) compared to the straightforward method. Besides, we introduce new strong biases, i.e., \(Z_3 = 131\), \(Z_r = r\) for \(3 \le r \le 255\), and extended keylength-dependent biases such that \(Z_{x \cdot \ell } = -x \cdot \ell \) for \(x = 2, 3, \ldots , 7\) and \(\ell = 16\), which extend the keylength-dependent biases previously considered only for \(x = 1\) [5]. These new biases are substantially stronger than the known biases of \(Z_r = 0\) for certain bytes among \(Z_3, Z_4, \ldots , Z_{255}\). After providing theoretical considerations for these biases, we experimentally confirm their validity. Combining the new biases with known ones, we construct a cumulative list of the strongest known biases in \(Z_1, Z_2, \ldots , Z_{255}\). At the same time, we experimentally show two new biases of \(Z_{256}\) and \(Z_{257}\), and add these to our bias set. Note that the biases of \(Z_2, Z_3, \ldots , Z_{257}\) included in our bias set are the strongest biases amongst all single positive and negative biases of each byte when a 16-byte (128-bit) key is used.

We demonstrate a plaintext recovery attack using our bias set by a computer experiment, and estimate the number of required ciphertexts and the success probability when \(N = 256\). Almost all of the first 257 bytes, \(P_1, P_2, \ldots , P_{257}\), can be extracted with probability more than 0.8 from \(2^{32}\) ciphertexts encrypted by randomly-chosen keys. Given \(2^{34}\) ciphertexts, all bytes of \(P_1, P_2, \ldots , P_{257}\) can be narrowed down to two candidates each with probability one. This is the first practical security evaluation of broadcast RC4 using all known biases of the cipher, together with some new ones that we observe.

Finally, an efficient method to extract later bytes of the plaintext, namely bytes from \(P_{258}\) onwards, is given. It exploits our bias set of \(Z_{1}, Z_{2}, \ldots , Z_{257}\) in conjunction with the digraph repetition bias proposed by Mantin [10], and then sequentially recovers later bytes of the plaintext. Once the possible candidates for \(P_1, P_2, \ldots , P_{257}\) are obtained by our bias set, \(P_r\) \((r \ge 258)\) is recovered from about \(2^{34}\) ciphertexts with probability one. Since the digraph repetition bias is a long-term bias, which occurs in any keystream byte, our sequential method is expected to recover any plaintext byte using only ciphertexts produced by different randomly-chosen keys. We show that the first \(2^{50}\) bytes (\(\approx \) 1000 terabytes) of the plaintext can be recovered from \(2^{34}\) ciphertexts with probability \(0.97170\).

The broadcast setting also converts to the multi-session setting of SSL/TLS, where the target plaintext block is repeatedly sent at the same position in the plaintexts of multiple sessions.

2 Known Attacks on Broadcast RC4

This section briefly reviews known attacks on RC4 in the broadcast setting where the same plaintext is encrypted with different randomly-chosen keys.

2.1 Mantin-Shamir (MS) Attack

Mantin and Shamir first presented a broadcast RC4 attack exploiting a bias of \(Z_2\) [11].

Theorem 1

[11]. Assume that the initial permutation \(S\) is randomly chosen from the set of all the possible permutations of \(\{0, 1, 2, \ldots , N-1\}\). Then the probability that the second output byte of RC4 is 0 is approximately \(\frac{2}{N}\).

This probability is estimated as \(\frac{2}{256}\) when \(N = 256\). Based on this bias, the broadcast RC4 attack is demonstrated by Theorems 2 and 3.

Theorem 2

[11]. Let \(X\) and \(Y\) be two distributions, and suppose that the event \(e\) happens in \(X\) with probability \(p\) and in \(Y\) with probability \(p \cdot (1+q)\). Then for small \(p\) and \(q\), \(\mathrm {O}(\frac{1}{p \cdot q^2})\) samples suffice to distinguish \(X\) from \(Y\) with a constant probability of success.

In this case, \(p\) and \(q\) are given as \(p = 1/N\) and \(q = 1\). The number of samples is about \(N\).

Theorem 3

[11]. Let \(P\) be a plaintext, and let \(C^{(1)}, C^{(2)}, \ldots , C^{(k)}\) be the RC4 encryptions of \(P\) under \(k\) uniformly distributed keys. Then, if \(k = \varOmega (N)\), the second byte of \(P\) can be reliably extracted from \(C^{(1)}, C^{(2)}, \ldots , C^{(k)}\).

According to the relation \(C^{(i)}_2= P^{(i)}_2 \oplus Z^{(i)}_2\), if \(Z^{(i)}_2 = 0\) holds, then \(C^{(i)}_2\) equals \(P^{(i)}_2\). From Theorem 1, \(Z_2 = 0\) occurs with twice the expected probability of a random stream. Thus, the most frequent byte amongst \(C_2^{(1)}, C_2^{(2)}, \ldots , C_2^{(k)}\) is likely to be \(P_2\) itself. When \(N = 256\), this requires more than \(2^8\) ciphertexts encrypted with randomly-chosen keys.
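The MS attack is easy to simulate; the sketch below (illustrative parameters, far smaller than the paper's experiments) broadcasts one plaintext byte under random 16-byte keys and recovers it by majority vote on \(C_2\):

```python
import random
from collections import Counter

def rc4_z2(key):
    """Return the second keystream byte Z_2 for the given key."""
    S = list(range(256))
    j = 0
    for i in range(256):                      # KSA
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    z = 0
    for _ in range(2):                        # first two PRGA rounds
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        z = S[(S[i] + S[j]) % 256]
    return z

random.seed(2013)                             # deterministic demo
p2 = 0x42                                     # broadcast second plaintext byte
counts = Counter()
for _ in range(1 << 15):                      # 2^15 random 16-byte keys
    key = [random.randrange(256) for _ in range(16)]
    counts[p2 ^ rc4_z2(key)] += 1             # collect C_2 = P_2 xor Z_2
recovered_p2 = counts.most_common(1)[0][0]
print(hex(recovered_p2))
```

With \(2^{15}\) samples the expected count for the correct byte (\(\approx 256\)) is many standard deviations above the wrong candidates (\(\approx 128\)), so the vote is reliable.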

2.2 Maitra, Paul and Sen Gupta (MPS) Attack

Maitra, Paul and Sen Gupta showed that \(Z_3, Z_4, \ldots , Z_{255}\) are also biased to 0 [6, 8]. While the MS attack assumes that the initial permutation \(S\) is random, the MPS attack exploits biases of \(S\) after the KSA [9]. Let \(S_r[x]\) be the value of \(S\) at index \(x\) after round \(r\), where \(S_0\) is the initial state of RC4 after the KSA. Biases of the initial state of the PRGA are given as follows.

Proposition 1

[9]. After the end of KSA, for \(0 \le u \le N - 1, 0 \le v \le N - 1\),
$$\begin{aligned} \mathrm{Pr}(S_0[u] = v) = \left\{ \begin{array}{@{}ll} \frac{1}{N} \cdot \left( (\frac{N-1}{N})^v + (1 - (\frac{N-1}{N})^v ) \cdot (\frac{N-1}{N})^{N - u - 1} \right) &{} (v \le u), \\ \frac{1}{N} \cdot \left( (\frac{N-1}{N})^{N - u - 1} + (\frac{N-1}{N})^v \right) &{} (v > u). \end{array} \right. \nonumber \end{aligned}$$
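Proposition 1 is easy to evaluate numerically; the sketch below (not the authors' code) reproduces the state probabilities used later in Sects. 3.1 and 3.2:

```python
def p_s0(u, v, N=256):
    """Pr(S_0[u] = v) after the KSA, per Proposition 1."""
    a = (N - 1) / N
    if v <= u:
        return (a ** v + (1 - a ** v) * a ** (N - u - 1)) / N
    return (a ** (N - u - 1) + a ** v) / N

# Values quoted later in the paper (N = 256):
print(p_s0(1, 1))    # ≈ 0.0038966
print(p_s0(1, 131))  # ≈ 0.0037848
print(p_s0(2, 128))  # ≈ 0.0038181
```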

The probability of \(S_{r-1}[r]\) in the PRGA is given as follows.

Theorem 4

[6]. For \(3 \le r \le N-1\), the probability \(\mathrm{Pr}(S_{r-1}[r] = v)\) is approximately
$$\begin{aligned} \mathrm{Pr}(S_{1}[r] = v) \cdot \left( 1 - \frac{1}{N} \right) ^{r - 2} + \sum _{t = 2}^{r - 1}\sum _{w = 0}^{r - t} \frac{\mathrm{Pr}(S_{1}[t] = v) }{w! \cdot N} \cdot \left( \frac{r - t - 1}{N} \right) ^{w} \cdot \left( 1 - \frac{1}{N} \right) ^{r - 3 - w}, \nonumber \end{aligned}$$
where \(\mathrm{Pr}(S_{1}[t] = v)\) is given as
$$\begin{aligned} \mathrm{Pr}(S_1[t] = v) = \left\{ \begin{array}{ll} \mathrm{Pr}(S_0[1] = 1) + \sum \nolimits _{X \ne 1} \mathrm{Pr}(S_0[1] = X \wedge S_0[X] = 1) &{} (t = 1, v = 1),\\ \sum \nolimits _{X \ne 1,v} \mathrm{Pr}(S_0[1] = X \wedge S_0[X] = v) &{} (t = 1, v \ne 1), \\ \mathrm{Pr}(S_0[1] = t) + \sum \nolimits _{X \ne t} \mathrm{Pr}(S_0[1] = X \wedge S_0[t] = t) &{} (t \ne 1, v = t),\\ \sum \nolimits _{X \ne t,v} \mathrm{Pr}(S_0[1] = X \wedge S_0[t] = v) &{} (t \ne 1, v \ne t). \end{array} \right. \nonumber \end{aligned}$$

Then, the bias of \(\mathrm{Pr}(Z_r = 0)\) is estimated as follows.

Theorem 5

[6]. For \(3 \le r \le N-1\), \(\mathrm{Pr}(Z_r = 0)\) is approximately
$$ \mathrm{Pr}(Z_r = 0) \approx \frac{1}{N} + \frac{c_r}{N^2}, $$
where \(c_r\) is given as
$$ c_r = \left\{ \begin{array}{ll} \frac{N}{N - 1} \cdot (N \cdot \mathrm{Pr}(S_{r-1}[r] =r) - 1) - \frac{N-2}{N-1} &{} (r = 3),\\ \frac{N}{N - 1} \cdot (N \cdot \mathrm{Pr}(S_{r-1}[r] =r) - 1) &{} (r \ne 3). \end{array} \right. $$

Since the parameters \(p\) and \(q\) are given as \(p = 1/N\) and \(q = c_r/N\), the number of ciphertexts with different keys required for the extraction of \(P_3, P_4, \ldots , P_{255}\) is roughly estimated as \(\varOmega (N^3)\).
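The sample counts from Theorem 2 can be checked numerically; taking \(c_r = 1\) is an illustrative assumption here, since the actual \(c_r\) values come from Theorem 5:

```python
def samples_to_distinguish(p, q):
    # Theorem 2: O(1/(p*q^2)) samples suffice to distinguish the distributions
    return 1.0 / (p * q * q)

N = 256
print(samples_to_distinguish(1 / N, 1.0))      # MS attack on Z_2: N = 256 samples
print(samples_to_distinguish(1 / N, 1.0 / N))  # MPS attack with c_r ~ 1: N^3 samples
```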

3 New Biases : Theory and Experiment

This section introduces four new biases in the keystream of RC4. To begin with, we prove a conditional bias of \(Z_1\) towards \(0\) when \(Z_2 = 0\). After that, we present new biases in the events, \(Z_3 = 131\), \(Z_r = r\), and extended keylength-dependent biases, which are substantially stronger than the known biases such as \(Z_r = 0\). Then, we construct a cumulative list of strong biases in \(Z_1, Z_2, \ldots , Z_{257}\) to mount an efficient plaintext recovery attack on broadcast RC4.

3.1 Bias of \(Z_1 = 0 | Z_2 = 0\)

A new conditional bias such that \(Z_1\) is biased to 0 when \(Z_2 = 0\) is given as Theorem 6.

Theorem 6

\(\mathrm{Pr}(Z_1 = 0 | Z_2 = 0)\) is approximately
$$ \mathrm{Pr}(Z_1 = 0 | Z_2 = 0) \approx \frac{1}{2} \cdot \Bigl (\mathrm{Pr}(S_0[1] = 1) + (1 - \mathrm{Pr}(S_0[1] = 1)) \cdot \frac{1}{N} \Bigr ) + \frac{1}{2} \cdot \frac{1}{N}. $$

Proof

Two cases of \(S_0[2] = 0\) and \(S_0[2] \ne 0\) are considered. As mentioned in [11], when \(Z_2\) is \(0\), \(S_0[2]\) is also 0 with probability of \(\frac{1}{2}\).
  • \(S_0[2] = 0\)

    For \(i = 1\), if \(S_0[1] \) is \(1\), the index \(j\) is updated as \(j = S_0[i] = S_0[1] = 1\). Then the first output byte \(Z_1\) is expressed as follows (see Fig. 1),
    $$ Z_1 = S_1[S_1[i] + S_1[j]] = S_1[S_1[1] + S_1[1]] = S_1[2] = S_0[2] = 0. $$
    Assuming that \(Z_1 = 0\) holds with probability of \(\frac{1}{N}\) when \(S_0[1] \ne 1\), the probability of \(\mathrm{Pr}(Z_1 = 0|S_0[2] = 0)\) is estimated as
    $$ \mathrm{Pr}(Z_1 = 0|S_0[2] = 0) = \mathrm{Pr}(S_0[1] = 1) + (1 - \mathrm{Pr}(S_0[1] = 1)) \cdot \frac{1}{N}. $$
  • \(S_0[2] \ne 0\)

Suppose that the event of \(Z_1 = 0\) occurs with probability of \(\frac{1}{N}\). Then \(\mathrm{Pr}(Z_1 = 0|S_0[2] \ne 0)\) is estimated as
    $$ \mathrm{Pr}(Z_1 = 0|S_0[2] \ne 0) = \frac{1}{N}. $$
Therefore \(\mathrm{Pr}(Z_1 = 0 | Z_2 = 0)\) is approximately
$$\begin{aligned} \mathrm{Pr}(Z_1 = 0 | Z_2 = 0)&= \mathrm{Pr}(Z_1 = 0| S_0[2]=0) \cdot \mathrm{Pr}(S_0[2] = 0|Z_2 = 0) \nonumber \\&\quad + \mathrm{Pr}(Z_1 = 0| S_0[2] \ne 0) \cdot \mathrm{Pr}(S_0[2] \ne 0|Z_2 = 0) \nonumber \\&\approx \frac{1}{2} \cdot \Bigl ( \mathrm{Pr}(S_0[1] = 1) + (1 - \mathrm{Pr}(S_0[1] = 1)) \cdot \frac{1}{N}\Bigr ) + \frac{1}{2} \cdot \frac{1}{N}. \nonumber \end{aligned}$$
\(\square \)
When \(N = 256\), \(\mathrm{Pr}(S_0[1] = 1)\) is obtained by Proposition 1.
$$ \mathrm{Pr}(S_0[1] = 1) = \frac{1}{256} \cdot \left( \left( \frac{255}{256}\right) + \left( 1 - \frac{255}{256}\right) \cdot \left( \frac{255}{256}\right) ^{254} \right) = 0.0038966. $$
Then, \(\mathrm{Pr}(Z_1 = 0 | Z_2 = 0)\) is computed as
$$\begin{aligned} \mathrm{Pr}(Z_1 = 0 | Z_2 = 0)&= \frac{1}{2} \cdot \left( \mathrm{Pr}(S_0[1] = 1) + (1 - \mathrm{Pr}(S_0[1] = 1)) \cdot \frac{1}{256}\right) + \frac{1}{2} \cdot \frac{1}{256} \nonumber \\&= 0.0058470 = 2^{-7.418} = 2^{-8}\cdot (1 + 2^{-1.009}).\nonumber \end{aligned}$$
Since the experimental value of \(\mathrm{Pr}(Z_1 = 0 | Z_2 = 0)\) for \(2^{40}\) randomly-chosen keys is obtained as \(0.0058109 = 2^{-8}\cdot (1 + 2^{-1.036})\), the theoretical value is correctly approximated.
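The theoretical value above can be reproduced directly from Theorem 6 (a quick numeric check, plugging in the value of \(\mathrm{Pr}(S_0[1] = 1)\) from Proposition 1):

```python
import math

N = 256
p = 0.0038966                      # Pr(S_0[1] = 1), from Proposition 1
pr = 0.5 * (p + (1 - p) / N) + 0.5 / N
print(pr)                          # ≈ 0.0058470
print(-math.log2(pr))              # ≈ 7.418, i.e. 2^{-8} * (1 + 2^{-1.009})
```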
From this bias, \(\mathrm{Pr}(Z_1 = 0 \wedge Z_2 = 0)\) can also be estimated, as follows.
$$ \mathrm{Pr}(Z_1 = 0 \wedge Z_2 = 0) = \mathrm{Pr}(Z_2 = 0) \cdot \mathrm{Pr}(Z_1 = 0|Z_2 = 0). $$
When \(N =256\), it is estimated as
$$ \mathrm{Pr}(Z_1 = 0 \wedge Z_2 = 0) = \frac{2}{256} \cdot 2^{-7.418} = 2^{-14.418} = 2^{-16} \cdot (1 + 2^{0.996}). $$
This type of bias, called a digraph bias, was proved to be a long-term bias by Fluhrer and McGrew [3]. However, such a strong bias in the initial bytes had not been reported. Specifically, the probability of the general long-term digraph bias is estimated as \(2^{-16} \cdot (1 + 2^{-8})\) in [3] when \( N =256\), while that of our bias is \(2^{-16} \cdot (1 + 2^{0.996})\). Thus our result reveals that the digraph bias in the initial bytes is much stronger than what is estimated in [3].

Note that we also searched for similar conditional biases in the first 256 bytes of the RC4 keystream. In particular, we checked the patterns \((Z_{r-a} = X | Z_r = Y)\) for \(0 \le X, Y \le 255\), \(2 \le r \le 256\), \(1 \le a \le 8\). No comparably strong bias was found in our experiment, although this search does not cover all conditional biases.

Application to Broadcast RC4 Attack. Using this new conditional bias of \(Z_1 = 0 | Z_2 = 0\) in conjunction with the bias of \(Z_2 = 0\) [11], the first byte of the plaintext can be efficiently extracted when \(N = 256\). After \(2^{17}\) ciphertexts with randomly-chosen keys are collected, the following procedure is performed.
Step 1.

Extract the second byte of the target plaintext, \(P_2\), from \(2^{8}\) ciphertexts [11].

Step 2.

Find the ciphertexts in which \(Z_2 = 0\) was XOR-ed, by computing \(C_2 \oplus P_2\). About \(2^{10} = 2^{17} \cdot 2/256\) ciphertexts matching this criterion are expected to be obtained.

Step 3.

Regard the most frequent value of the first byte \(C_1\) among these \(2^{10}\) matching ciphertexts as \(P_1\).

In Step 3, using the bias of \(\mathrm{Pr}(Z_1 = 0 | Z_2 = 0) = 2^{-8} \cdot (1 + 2^{-1.009})\), \(P_1\) is extracted from remaining \(2^{10} (\sim \frac{1}{2^{-8} \cdot (2^{-1.009})^2} )\) ciphertexts by Theorems 2 and 3, assuming the relation of \(C_1 = P_1 \oplus Z_1 = P_1\) holds. Although the bias of the first byte has already been pointed out in [6, 14], it requires \(2^{24}\) ciphertexts to extract the first byte using the known biases, because the probability of the strongest bias, which is a negative bias of \(Z_1\) towards 0, is estimated as about \(2^{-8} \cdot (1 - 2^{-8})\) [6]. Thus, the new conditional bias identified by us is very efficient, because the number of required ciphertexts reduces by a factor close to \(N/2\) compared to that of the straightforward method.
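The three steps can be sketched as follows; the ciphertext list here is synthetic demo data (not RC4 output), chosen so that the majority votes are visible:

```python
from collections import Counter

def recover_p2(ciphertexts):
    """Step 1: Z_2 is biased to 0, so the most frequent C_2 is P_2."""
    return Counter(c[1] for c in ciphertexts).most_common(1)[0][0]

def recover_p1(ciphertexts, p2):
    """Steps 2-3: keep ciphertexts with C_2 = P_2 (so Z_2 = 0 is likely);
    among those, Z_1 is biased to 0, so the most frequent C_1 is P_1."""
    kept = [c for c in ciphertexts if c[1] == p2]
    return Counter(c[0] for c in kept).most_common(1)[0][0]

# Synthetic demo: 0x41 dominates position 2; among those, 0x50 dominates position 1.
demo = [bytes([0x50, 0x41])] * 40 + [bytes([0x07, 0x41])] * 20 \
     + [bytes([0x50, 0x99])] * 15 + [bytes([0x23, 0x17])] * 25
p2 = recover_p2(demo)
p1 = recover_p1(demo, p2)
print(hex(p1), hex(p2))  # -> 0x50 0x41
```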

3.2 Bias of \(Z_3 = 131\)

A new bias of \(Z_3 = 131\), which is stronger than \(Z_3 = 0\) [6, 8], is given as Theorem 7.

Theorem 7

\(\mathrm{Pr}(Z_3 = 131)\) is approximately
$$\begin{aligned} \mathrm{Pr}(Z_3 =131)&\approx \mathrm{Pr}(S_0[1] = 131) \cdot \mathrm{Pr}(S_0[2] = 128) + \nonumber \\&(1 - \mathrm{Pr}(S_0[1] = 131) \cdot \mathrm{Pr}(S_0[2] = 128)) \cdot 1/N. \nonumber \end{aligned}$$

Proof

Suppose the events \(S_0[1] = 131\) and \(S_0[2] = 128\) occur after the KSA. For \(i = 1\), \(j\) is updated as \(j = S_0[1] = 131\). After \(S_0[1]\) and \(S_0[131]\) are swapped, \(S_1[131]\) becomes \(131\). For \(i = 2\), \(j\) is updated as \(131 + S_1[2] = 131 + S_0[2] = 131 + 128 = 3\), and \(S_1[2]\) and \(S_1[3]\) are swapped. Then \(S_2[3] = 128\) is obtained. Finally, for \(i = 3\), \(j\) is updated as \(3 + S_2[3] = 3 + 128 = 131\). After \(S_2[3]\) and \(S_2[131]\) are swapped, \(S_3[3] = 131\) and \(S_3[131] = 128\) hold. Then, the third output byte is \(Z_3 = S_3[S_3[3] + S_3[131]] = S_3[131 + 128] = S_3[3] = 131\). Thus, when \(S_0[1] = 131\) and \(S_0[2] = 128\) hold, \(Z_3 = 131\) holds with probability one. Figure 2 depicts this event.
Fig. 1.

Event for bias of \(Z_1 = 0 | Z_2 = 0\)

Fig. 2.

Event for bias of \(Z_3 = 131\)

Assuming that in other cases, that is when \(S_0[1] \ne 131\) or \(S_0[2] \ne 128\), the event \(Z_3 = 131\) holds with probability of \(1/N\), the probability of \(\mathrm{Pr}(Z_3 =131)\) is estimated as
$$\begin{aligned} \mathrm{Pr}(Z_3 =131)&\approx \mathrm{Pr}(S_0[1] = 131) \cdot \mathrm{Pr}(S_0[2] = 128) + \nonumber \\&(1 - \mathrm{Pr}(S_0[1] =131)\cdot \mathrm{Pr}(S_0[2] = 128)) \cdot 1/N. \nonumber \end{aligned}$$
\(\square \)
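The deterministic event in the proof can be verified by running three PRGA rounds on any permutation with \(S_0[1] = 131\) and \(S_0[2] = 128\) (the permutation below is an arbitrary choice satisfying those two conditions):

```python
def prga_3(S):
    """Run three PRGA rounds on a copy of S_0 and return Z_3."""
    S = S[:]
    i = j = 0
    z = None
    for _ in range(3):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        z = S[(S[i] + S[j]) % 256]
    return z

# Identity permutation with the two required entries swapped into place,
# so it remains a valid permutation of {0, ..., 255}.
S0 = list(range(256))
S0[1], S0[131] = 131, 1
S0[2], S0[128] = 128, 2
print(prga_3(S0))  # -> 131
```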
When \(N = 256\), by Proposition 1, \(\mathrm{Pr}(S_0[1] = 131)\) and \(\mathrm{Pr}(S_0[2] = 128)\) are estimated as
$$ \mathrm{Pr}(S_0[1] = 131) = \frac{1}{256} \cdot \left( \left( \frac{255}{256}\right) ^{256 - 1 - 1} + \left( \frac{255}{256}\right) ^{131} \right) = 0.0037848, $$
$$ \mathrm{Pr}(S_0[2] = 128) = \frac{1}{256} \cdot \left( \left( \frac{255}{256}\right) ^{256 - 2 - 1} + \left( \frac{255}{256}\right) ^{128} \right) = 0.0038181. $$
Thus, \(\mathrm{Pr}(Z_3 =131)\) is computed as
$$ \mathrm{Pr}(Z_3 =131) \approx 0.0039206 = 2^{-8} \cdot (1 + 2^{-8.089}). $$
Since the experimental value of this bias for \(2^{40}\) randomly-chosen keys is obtained as \(0.0039204 = 2^{-8} \cdot (1 + 2^{-8.109})\), the theoretical value is correctly approximated.
Let us compare it to the bias of \(Z_3 = 0\) of the MPS attack [6, 8]. The experimental value for \(2^{40}\) randomly-chosen keys is obtained as
$$ \mathrm{Pr}(Z_3 = 0) = 0.0039116 = 2^{-8} \cdot (1 + 2^{-9.512}). $$
Thus, the bias of \(Z_3 = 131\) is stronger than that of \(Z_3 = 0\).

We should utilize \(Z_3 = 131\) instead of \(Z_3 = 0\) for an efficient plaintext recovery attack. If \(Z_3 = 131\) and \(Z_3 = 0\) were used jointly, two candidates for \(P_3\) would remain. Thus, in order to detect the one correct value of \(P_3\), using only \(Z_3 = 131\) is more efficient.
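In general, if \(Z_r\) is biased towards a value \(b\), the most frequent ciphertext byte at position \(r\) is \(P_r \oplus b\), so \(P_r\) is recovered by XOR-ing the majority value with \(b\). A small sketch with synthetic counts:

```python
from collections import Counter

def recover_byte(cipher_bytes, bias_value):
    """If Z_r is biased to bias_value, the most frequent C_r equals
    P_r xor bias_value, so P_r = most_frequent xor bias_value."""
    most_frequent = Counter(cipher_bytes).most_common(1)[0][0]
    return most_frequent ^ bias_value

# Synthetic demo: P_3 = 0x20 and Z_3 biased to 131 makes 0x20 ^ 131 dominant.
c3 = [0x20 ^ 131] * 30 + [0x20 ^ 5] * 10 + [0x20 ^ 77] * 12
print(hex(recover_byte(c3, 131)))  # -> 0x20
```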

3.3 Bias of \(Z_r = r\) for \(3 \le r \le N - 1\)

We also present a new bias in the event \(Z_r = r\) for \(3 \le r \le N - 1\), whose probabilities are very close to those of \(Z_r = 0\) [8], and the new biases are stronger than those of \(Z_r = 0\) in some rounds. Thus, for an efficient attack, we need to carefully consider which biases are stronger in each round. The probability of \(Z_r = r\) is given as Theorem 8.

Theorem 8

\(\mathrm{Pr}(Z_r = r)\) for \(3 \le r \le N - 1\) is approximately
$$\begin{aligned} \mathrm{Pr}(Z_r = r)&\approx p_{r-1, 0} \cdot \frac{1}{N} + p_{r-1, r} \cdot \frac{1}{N} \cdot \frac{N-2}{N} + \nonumber \\&(1 - p_{r-1, 0} \cdot \frac{1}{N} - p_{r-1, r} \cdot \frac{1}{N} - (1 - p_{r-1, 0} ) \cdot \frac{1}{N} \cdot 2) \cdot \frac{1}{N}, \nonumber \end{aligned}$$
where \(p_{r-1, 0} = \mathrm{Pr}(S_{r - 1}[r] = 0 )\) and \(p_{r-1, r} = \mathrm{Pr}(S_{r - 1}[r] = r )\).

Proof

Let \(i_r\) and \(j_r\) denote the values of \(i\) and \(j\) in round \(r\), respectively. For \(i_r = r\), an output \(Z_r\) is expressed as
$$\begin{aligned} Z_r&= S_r[S_r[i_r] + S_r[j_r] ] = S_r[S_r[r] + S_{r-1}[r] ]. \nonumber \end{aligned}$$
Then, let us consider four independent cases.
Case 1 :

\(S_{r-1}[r] = 0 \wedge S_r[r] = r\)

Case 2 :

\(S_{r-1}[r] = r \wedge S_r[r] = j_r - r \wedge j_r \ne r, r + r\)

Case 3 :

\(S_{r-1}[r] \ne 0 \wedge S_r[r] = r - S_{r-1}[r] \)

Case 4 :

\(S_{r-1}[r] \ne 0 \wedge S_r[r] = r \)

In Case 1 and Case 2, the output is always \(Z_r = r\). On the other hand, in Case 3 and Case 4, the output is not \(Z_r = r\).

Case 1 :\(S_{r-1}[r] = 0 \wedge S_r[r] = r\)

The output is expressed as \(Z_r = S_r[S_r[r] + S_{r-1}[r] ] = S_r[r + 0] =S_r[r] = r\) (see Fig. 3). Then, the probability of \(Z_r = r\) is one. Here \(S_r[r]\) is chosen by the pointer \(j\). Since \(j_r\) for \(r \ge 3\) behaves randomly [8], \(S_r[r]\) is assumed to be uniformly random. Thus, it is estimated as
$$ \mathrm{Pr}(S_{r-1}[r] = 0 \wedge S_r[r] = r ) = p_{r-1, 0} \cdot \frac{1}{N}. $$
Case 2 :\(S_{r-1}[r] = r \wedge S_r[r] = j_r - r \wedge j_r \ne r, r + r\)

The output is expressed as \(Z_r = S_r[S_r[r] + S_{r-1}[r] ] = S_r[j_r - r + r ] =S_r[j_r] = S_{r-1}[r] = r\) (see Fig. 4). Then, the probability of \(Z_r = r \) is one. Similar to Case 1, \(S_r[r]\) is assumed to be uniformly random.

When \(j_r = r\), the probability of \(Z_r = r\) is zero because of the relation of \(Z_r = S_r[S_r[r] + S_{r-1}[r] ] = S_r[0 + r] =S_r[r] = 0\). Also, when \(j_r = r + r\), since \(S_r[r] = r\) and \(Z_r = S_r[S_r[r] + S_{r-1}[r] ] = S_r[r + r] \ne r\), the probability of \(Z_r = r\) is zero. Thus, the conditions of \(j_r \ne r, r + r\) are necessary for \(Z_r = r\). Then, it is estimated as
$$ \mathrm{Pr}(S_{r-1}[r] = r \wedge S_r[r] = j_r - r \wedge j_r \ne r, r + r) = p_{r-1, r}\cdot \frac{1}{N} \cdot \frac{N-2}{N}. $$
Case 3 :\(S_{r-1}[r] \ne 0 \wedge S_r[r] = r - S_{r-1}[r]\)
The equation of \(Z_r = S_r[r - S_{r-1}[r] + S_{r-1}[r] ] = S_r[r]\) holds. Then, \(S_r[r] = r - S_{r-1}[r]\) is not \(r\), because \(S_{r-1}[r]\) is not \(0\). Thus, it is estimated as
$$ \mathrm{Pr}(S_{r-1}[r] \ne 0 \wedge S_r[r] = r - S_{r-1}[r] ) = (1 - p_{r-1, 0} ) \cdot \frac{1}{N}. $$
Case 4 :\(S_{r-1}[r] \ne 0 \wedge S_r[r] = r \)
The output is expressed as \(Z_r = S_r[r + S_{r-1}[r] ]\). Since \(S_{r-1}[r] \ne 0\), the index \(r + S_{r-1}[r]\) differs from \(r\); as \(S_r\) is a permutation with \(S_r[r] = r\), the value \(r\) cannot appear at any other index, so the probability of \(Z_r = r\) is zero. Thus, it is estimated as
$$ \mathrm{Pr}(S_{r-1}[r] \ne 0 \wedge S_r[r] = r ) = (1 - p_{r-1, 0}) \cdot \frac{1}{N}. $$
Assuming that in other cases, \(Z_r = r\) holds with probability of \(1/N\), the probability of \(\mathrm{Pr}(Z_r =r)\) is estimated as
$$\begin{aligned} \mathrm{Pr}(Z_r = r)&\approx p_{r-1, 0} \cdot \frac{1}{N} + p_{r-1, r} \cdot \frac{1}{N} \cdot \frac{N-2}{N} + \nonumber \\&(1 - p_{r-1, 0} \cdot \frac{1}{N} - p_{r-1, r} \cdot \frac{1}{N} - (1 - p_{r-1, 0} ) \cdot \frac{1}{N} \cdot 2) \cdot \frac{1}{N}. \nonumber \end{aligned}$$
\(\square \)
Fig. 3.

Event (Case 1) for bias of \(Z_r = r\)

Fig. 4.

Event (Case 2) for bias of \(Z_r = r\)

Fig. 5.

Theoretical values and experimental values of \(Z_r = r\)

Here, \(p_{r-1, r}\) and \(p_{r-1, 0}\) are obtained from Theorem 4. Figure 5 compares the theoretical and experimental values of \(Z_r = r\) for \(2^{40}\) randomly-chosen keys when \(N = 256\). Since the theoretical values do not exactly coincide with the experimental values, we do not claim that Theorem 8 completely proves this bias; we suspect that several minor events are not covered by our approach. However, the order of the bias matches well, so at least the main event causing this bias has been identified.

3.4 Extended Keylength-Dependent Biases

Extended keylength-dependent biases, which are extensions of the keylength-dependent biases [5, 17], are biases of the form \(Z_\ell = -\ell \) when the key length is \(\ell \) bytes. For example, when using a 128-bit (16-byte) key, \(Z_{16}\) is biased to \(-16\) \((= 240)\). In addition, we show that when the key length is \(\ell \) bytes, \(Z_{x \cdot \ell }\) is also biased to \(-x\cdot \ell \) \((x = 2, 3, 4, 5, 6, 7)\), e.g., \(Z_r = -r \) for \(r = 32\), \(48\), \(64\), \(80\), \(96\), \(112\), assuming \(\ell = 16\). Importantly, the extended keylength-dependent biases are much stronger than the other known biases such as \(Z_r=0\) and \(Z_r=r\). Table 1 shows experimental values of the extended keylength-dependent bias \(Z_r=-r\), together with \(Z_r=0\) and \(Z_r=r\), for \(2^{40}\) randomly-chosen keys, when \(r\) is a multiple of the key length (\(\ell = 16\) in this case).
Table 1.

Experimental values of \(Z_r=-r\), \(Z_r=0\) and \(Z_r=r\)

  \(r\) | \(\mathrm{Pr}(Z_r=-r)\) | \(\mathrm{Pr}(Z_r=0)\) | \(\mathrm{Pr}(Z_r=r)\)
  16 | \(2^{-8} \cdot (1 + 2^{-4.811})\) | \(2^{-8} \cdot (1 + 2^{-7.714})\) | \(2^{-8} \cdot (1 + 2^{-7.762})\)
  32 | \(2^{-8} \cdot (1 + 2^{-5.383})\) | \(2^{-8} \cdot (1 + 2^{-7.880})\) | \(2^{-8} \cdot (1 + 2^{-7.991})\)
  48 | \(2^{-8} \cdot (1 + 2^{-5.938})\) | \(2^{-8} \cdot (1 + 2^{-8.043})\) | \(2^{-8} \cdot (1 + 2^{-8.350})\)
  64 | \(2^{-8} \cdot (1 + 2^{-6.496})\) | \(2^{-8} \cdot (1 + 2^{-8.244})\) | \(2^{-8} \cdot (1 + 2^{-8.664})\)
  80 | \(2^{-8} \cdot (1 + 2^{-7.224})\) | \(2^{-8} \cdot (1 + 2^{-8.407})\) | \(2^{-8} \cdot (1 + 2^{-9.052})\)
  96 | \(2^{-8} \cdot (1 + 2^{-7.911})\) | \(2^{-8} \cdot (1 + 2^{-8.577})\) | \(2^{-8} \cdot (1 + 2^{-9.351})\)
  112 | \(2^{-8} \cdot (1 + 2^{-8.666})\) | \(2^{-8} \cdot (1 + 2^{-8.747})\) | \(2^{-8} \cdot (1 + 2^{-9.732})\)

The probability of these biases is given as Theorem 9 (the proof is in Appendix A).

Theorem 9

When \(r=x \cdot \ell \)\((x=1, 2, \ldots , 7)\), the probability of \(\mathrm{Pr}(Z_r=-r)\) is approximately
$$\begin{aligned} \mathrm{Pr}(Z_r=-r) \approx \frac{1}{N^2} + \left( 1- \frac{1}{N^2} \right) \cdot \gamma _r + (1- \delta _r) \cdot \frac{1}{N}, \nonumber \end{aligned}$$
where
$$\begin{aligned} \gamma _r&= \frac{1}{N^2} \cdot \left( 1- \frac{r+1}{N} \right) \nonumber \\&\quad \cdot \sum ^{N-1}_{y=r+1} \left( 1-\frac{1}{N} \right) ^{y} \cdot \left( 1-\frac{2}{N} \right) ^{y-r} \cdot \left( 1-\frac{3}{N} \right) ^{N-y+2r-4}, \nonumber \end{aligned}$$
and \(\delta _r=\mathrm{Pr}(S_r[j_r]=0)=\mathrm{Pr}(S_{r-1}[r]=0)\).
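Under the simplifying assumption \(\delta_r \approx 1/N\) (our shortcut for this check; the paper obtains \(\delta_r\) from Theorem 4), Theorem 9 already reproduces the theoretical value for \(r = 16\):

```python
import math

def gamma(r, N=256):
    # gamma_r from Theorem 9
    s = sum((1 - 1 / N) ** y * (1 - 2 / N) ** (y - r) * (1 - 3 / N) ** (N - y + 2 * r - 4)
            for y in range(r + 1, N))
    return (1 / N ** 2) * (1 - (r + 1) / N) * s

def pr_z_minus_r(r, N=256):
    delta = 1 / N                  # assumption: delta_r ~ 1/N
    return 1 / N ** 2 + (1 - 1 / N ** 2) * gamma(r, N) + (1 - delta) / N

pr = pr_z_minus_r(16)
print(pr)                          # close to 2^{-8} * (1 + 2^{-4.841})
print(math.log2(pr * 256 - 1))     # close to -4.84
```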
Figure 6 shows our experimental values for \(2^{40}\) randomly-chosen keys and the theoretical values of these extended keylength-dependent biases. Since the theoretical and experimental values almost coincide, the theoretical values are correctly approximated.
Fig. 6.

Experimental values and theoretical values of \(Z_r=-r\) when \(\ell =16\) for \(r=16, 32, 48, 64, 80, 96, 112\)

Fig. 7.

Comparison between \(Z_r = 0\) and \(Z_r = r\) for \(3 \le r \le 255\)

3.5 Cumulative Bias Set of First 257 Bytes

When \(N = 256\), a set of strong biases in \(Z_1, Z_2, \ldots , Z_{255}\) is given in Table 2. Our new biases, namely the ones involving \(Z_1\), \(Z_{3}\), \(Z_{32}\), \(Z_{48}\), \(Z_{64}\), \(Z_{80}\), \(Z_{96}\), \(Z_{112}\), are included. Here, let us compare the biases of \(Z_r = 0\) [6, 8] and \(Z_r = r\), whose probabilities are of the same order and very close in the range \(3 \le r \le 255\). According to our experiments with \(2^{40}\) randomly-chosen keys (see Fig. 7), \(Z_r = r\) is stronger than \(Z_r = 0\) for \(Z_{5}, Z_{6}, \ldots , Z_{31}\). Thus we choose the bias \(Z_r = r\) for \(Z_{5}, Z_{6}, \ldots , Z_{31}\) and the bias \(Z_r = 0\) in the other cases as the strongest bias, except for the cases involving \(Z_{3}\), \(Z_{16}\), \(Z_{32}\), \(Z_{48}\), \(Z_{64}\), \(Z_{80}\), \(Z_{96}\), \(Z_{112}\). Besides, we experimentally found two new biases for the events \(Z_{256} \ne 0\) and \(Z_{257} = 0\), and added these to our bias set, although we could not provide theoretical proofs for them. Note that it is experimentally confirmed that the biases of \(Z_2, Z_{3}, \ldots , Z_{257}\) included in our bias set are the strongest known biases amongst all the positive and negative biases that have been discovered for these bytes.
Table 2.

Cumulative bias set of first 257 bytes

| \(r\) | Strongest known bias of \(Z_{r}\) | Prob. (theoretical) | Prob. (experimental) |
| --- | --- | --- | --- |
| 1 | \(Z_1 = 0 \mid Z_2 = 0\) (Our) | \(2^{-8} \cdot (1 + 2^{-1.009})\) | \(2^{-8} \cdot (1 + 2^{-1.036})\) |
| 2 | \(Z_2 = 0\) [11] | \(2^{-8} \cdot (1 + 2^{0})\) | \(2^{-8} \cdot (1 + 2^{0.002})\) |
| 3 | \(Z_3 = 131\) (Our) | \(2^{-8} \cdot (1 + 2^{-8.089})\) | \(2^{-8} \cdot (1 + 2^{-8.109})\) |
| 4 | \(Z_4 = 0\) [8] | \(2^{-8} \cdot (1 + 2^{-7.581})\) | \(2^{-8} \cdot (1 + 2^{-7.611})\) |
| 5–15 | \(Z_r = r\) (Our) | max: \(2^{-8} \cdot (1 + 2^{-7.627})\); min: \(2^{-8} \cdot (1 + 2^{-7.737})\) | max: \(2^{-8} \cdot (1 + 2^{-7.335})\); min: \(2^{-8} \cdot (1 + 2^{-7.535})\) |
| 16 | \(Z_{16} = 240\) [5] | \(2^{-8} \cdot (1 + 2^{-4.841})\) | \(2^{-8} \cdot (1 + 2^{-4.811})\) |
| 17–31 | \(Z_r = r\) (Our) | max: \(2^{-8} \cdot (1 + 2^{-7.759})\); min: \(2^{-8} \cdot (1 + 2^{-7.912})\) | max: \(2^{-8} \cdot (1 + 2^{-7.576})\); min: \(2^{-8} \cdot (1 + 2^{-7.839})\) |
| 32 | \(Z_{32} = 224\) (Our) | \(2^{-8} \cdot (1 + 2^{-5.404})\) | \(2^{-8} \cdot (1 + 2^{-5.383})\) |
| 33–47 | \(Z_r = 0\) [8] | max: \(2^{-8} \cdot (1 + 2^{-7.897})\); min: \(2^{-8} \cdot (1 + 2^{-8.050})\) | max: \(2^{-8} \cdot (1 + 2^{-7.868})\); min: \(2^{-8} \cdot (1 + 2^{-8.039})\) |
| 48 | \(Z_{48} = 208\) (Our) | \(2^{-8} \cdot (1 + 2^{-5.981})\) | \(2^{-8} \cdot (1 + 2^{-5.938})\) |
| 49–63 | \(Z_r = 0\) [8] | max: \(2^{-8} \cdot (1 + 2^{-8.072})\); min: \(2^{-8} \cdot (1 + 2^{-8.224})\) | max: \(2^{-8} \cdot (1 + 2^{-8.046})\); min: \(2^{-8} \cdot (1 + 2^{-8.238})\) |
| 64 | \(Z_{64} = 192\) (Our) | \(2^{-8} \cdot (1 + 2^{-6.576})\) | \(2^{-8} \cdot (1 + 2^{-6.496})\) |
| 65–79 | \(Z_r = 0\) [8] | max: \(2^{-8} \cdot (1 + 2^{-8.246})\); min: \(2^{-8} \cdot (1 + 2^{-8.398})\) | max: \(2^{-8} \cdot (1 + 2^{-8.223})\); min: \(2^{-8} \cdot (1 + 2^{-8.376})\) |
| 80 | \(Z_{80} = 176\) (Our) | \(2^{-8} \cdot (1 + 2^{-7.192})\) | \(2^{-8} \cdot (1 + 2^{-7.224})\) |
| 81–95 | \(Z_r = 0\) [8] | max: \(2^{-8} \cdot (1 + 2^{-8.420})\); min: \(2^{-8} \cdot (1 + 2^{-8.571})\) | max: \(2^{-8} \cdot (1 + 2^{-8.398})\); min: \(2^{-8} \cdot (1 + 2^{-8.565})\) |
| 96 | \(Z_{96} = 160\) (Our) | \(2^{-8} \cdot (1 + 2^{-7.831})\) | \(2^{-8} \cdot (1 + 2^{-7.911})\) |
| 97–111 | \(Z_r = 0\) [8] | max: \(2^{-8} \cdot (1 + 2^{-8.592})\); min: \(2^{-8} \cdot (1 + 2^{-8.741})\) | max: \(2^{-8} \cdot (1 + 2^{-8.570})\); min: \(2^{-8} \cdot (1 + 2^{-8.722})\) |
| 112 | \(Z_{112} = 144\) (Our) | \(2^{-8} \cdot (1 + 2^{-8.500})\) | \(2^{-8} \cdot (1 + 2^{-8.666})\) |
| 113–255 | \(Z_r = 0\) [8] | max: \(2^{-8} \cdot (1 + 2^{-8.763})\); min: \(2^{-8} \cdot (1 + 2^{-10.052})\) | max: \(2^{-8} \cdot (1 + 2^{-8.760})\); min: \(2^{-8} \cdot (1 + 2^{-10.041})\) |
| 256 | \(Z_{256} = 0\) (negative bias) (Our) | N/A | \(2^{-8} \cdot (1 - 2^{-9.407})\) |
| 257 | \(Z_{257} = 0\) (Our) | N/A | \(2^{-8} \cdot (1 + 2^{-9.531})\) |
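The single-byte keystream biases above are easy to check empirically. The following sketch implements textbook RC4 (KSA and PRGA; the code is our illustration, not taken from the paper) and measures \(\mathrm{Pr}[Z_2 = 0]\) over randomly-chosen 16-byte keys; the observed frequency approaches \(2^{-7}\), i.e., twice the uniform \(2^{-8}\), matching the \(r = 2\) row of Table 2:

```python
import random

def rc4_keystream(key, length):
    """Textbook RC4: KSA followed by PRGA, returning `length` keystream bytes."""
    S = list(range(256))
    j = 0
    for i in range(256):                       # KSA
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = [], 0, 0
    for _ in range(length):                    # PRGA
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return out

def measure_z2_zero(trials=200_000, key_len=16):
    """Fraction of random keys for which Z_2 = 0 (the Mantin-Shamir bias)."""
    hits = 0
    for _ in range(trials):
        key = [random.randrange(256) for _ in range(key_len)]
        if rc4_keystream(key, 2)[1] == 0:
            hits += 1
    return hits / trials
```

The same loop reproduces the weaker biases in the table, such as \(Z_{16} = 240\) or \(Z_r = r\), but those have relative strength around \(2^{-5}\) to \(2^{-8}\) and need substantially more keys to stand out from sampling noise.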

For the first time, we propose a cumulative list of strongest known biases in the initial bytes of RC4 that can be exploited in a practical attack against the broadcast mode of the cipher.

4 Experimental Results of Plaintext Recovery Attack

We demonstrate a plaintext recovery attack using our cumulative bias set of first 257 bytes by a computer experiment, when \(N=256\), and estimate the number of required ciphertexts and the probability of success for our attack. The details of our experiment are as follows.
Step 1.

Randomly generate a target plaintext \(P\).

Step 2.

Encrypt \(P\) with \(2^x\) randomly-chosen keys, and obtain \(2^x\) ciphertexts \(C\).

Step 3.

Find the most frequent value of each ciphertext byte position, and extract \(P_r\) assuming \(P_r = C_r \oplus Z_r\), where \(Z_r\) is the most biased value of the keystream byte from our bias set.

In the case of \(P_1\), the method mentioned in Sect. 3.1 is used for efficient extraction of \(P_1\). Specifically, after \(P_2\) is recovered, we extract \(P_1\) by using the conditional bias such that \(Z_1 = 0\) when \(Z_2 = 0\).
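Steps 1–3 can be sketched in code. The sketch below is an illustration under simplifying assumptions: it uses a textbook RC4 implementation, recovers only the second byte \(P_2\) via the strong bias \(Z_2 = 0\) [11], and encrypts under \(2^{15}\) random 16-byte keys; the helper names are ours, not the paper's.

```python
import random
from collections import Counter

def rc4_keystream(key, length):
    # Textbook RC4 (KSA + PRGA); an assumption of this sketch.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = [], 0, 0
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return out

def recover_p2(plaintext, n_ciphertexts=1 << 15):
    """Broadcast attack on P_2: encrypt the same plaintext under many random
    keys, take the most frequent value of C_2, and XOR it with the most
    biased keystream value Z_2 = 0."""
    counts = Counter()
    for _ in range(n_ciphertexts):
        key = [random.randrange(256) for _ in range(16)]
        z = rc4_keystream(key, 2)
        counts[plaintext[1] ^ z[1]] += 1   # observed C_2 = P_2 xor Z_2
    c2_most_frequent, _ = counts.most_common(1)[0]
    return c2_most_frequent ^ 0            # P_2 = C_2 xor Z_2, with Z_2 = 0
```

Because \(\mathrm{Pr}[Z_2 = 0] \approx 2^{-7}\) while every other value occurs with probability slightly below \(2^{-8}\), the most frequent ciphertext value at position 2 is \(P_2 \oplus 0\) with overwhelming probability once \(2^{12}\) or more samples are available.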
We perform the above experiment for 256 different plaintexts in the cases where \(2^{6}, 2^{7}, \ldots , 2^{35}\) ciphertexts with randomly-chosen keys are given. Figure 8 shows the probability of successfully recovering the values of \(P_1, P_2, P_3\), \(P_{5}\), and \(P_{16}\) for each number of ciphertexts. Here, the success probability is estimated from the number of correctly-extracted plaintexts for each byte. For example, if the target byte of only 100 plaintexts out of 256 can be correctly recovered, the probability is estimated as \(0.39\)\((=100/256)\). The second byte of the plaintext, \(P_2\), can be extracted from \(2^{12}\) ciphertexts with probability one. In previous attacks such as the MS attack [11] and the MPS attack [8], the number of required ciphertexts is theoretically estimated only in terms of the lower bound \(\varOmega \). Our results are the first to reveal the concrete number of ciphertexts and the corresponding success probability.
Fig. 8.

Relation of the number of ciphertexts and success probability of recovering \(P_1, P_2, P_3\), \(P_{5}\), and \(P_{16}\)

Fig. 9.

Success probability of extracting \(P_r\)\((1 \le r \le 257)\) with different number of samples (one candidate)

Fig. 10.

Success probability of extracting \(P_r\)\((1 \le r \le 257)\) with different number of samples (two candidates)

Fig. 11.

The number of plaintext bytes that are extracted with probability five times higher than that of a random guess

Figure 9 shows the success probability of extracting each byte \(P_r\)\((1 \le r \le 257)\) when \(2^{24}\), \(2^{28}\), \(2^{32}\), and \(2^{35}\) ciphertexts are given. Note that the probability of a random guess is \(1/256 = 0.00390625\). Given \(2^{32}\) ciphertexts, all bytes of \(P_1, P_2, \ldots , P_{257}\) can be extracted with probability more than 0.5, and most bytes can be extracted with probability more than 0.8. Also, the bytes having stronger biases, such as \(P_1\), \(P_2\), \(P_{16}\), \(P_{32}\), \(P_{48}\), and \(P_{64}\), are extracted from only \(2^{24}\) ciphertexts with high probability. However, even if \(2^{35}\) ciphertexts are given, the probability does not reach one for some bytes. We conjecture that for these bytes, the difference between the probability of the strongest known bias (as in our cumulative bias set) and that of the second strongest is very small, so more ciphertexts are required for an attack with probability one.

We additionally utilize the second most frequent byte in the ciphertexts for extracting plaintext bytes. In other words, two candidates are obtained by using the relation \(P_r = C_r \oplus Z_r\), where \(C_r\) is the most or second most frequent ciphertext byte and \(Z_r\) is chosen from our bias set. This result is shown in Fig. 10, and its success probability is estimated as the probability that the guess for the correct plaintext byte is narrowed down to two possible candidates. Note that the probability of a random guess in this scenario is \(2/256 = 0.0078125\). Given \(2^{34}\) ciphertexts, each byte of \(P_1, P_2, \ldots , P_{257}\) can be extracted with probability one. In this case, although we cannot pin down the correct byte of the plaintext, it is narrowed down to only two candidates. The experiments of Figs. 9 and 10 require about one day on a single CPU core (Intel(R) Core(TM) i7 CPU 920 @ 2.67 GHz) to obtain the result for one plaintext, and 256 plaintexts are used.

Figure 11 shows the number of plaintext bytes that are extracted with five times higher probability than that of a random guess, i.e., where the success probability is more than \(\frac{5}{256}\). Given \(2^{29}\) ciphertexts, all the plaintext bytes \(P_1, P_2, \ldots , P_{257}\) are guessed with much higher probability than random guesses.

5 How to Recover Bytes of the Plaintext After \(P_{258}\)

In this section, we propose an efficient method to recover later bytes of the plaintext, namely the bytes after \(P_{258}\). The method using our biases in the initial bytes cannot be directly applied to extract these bytes, because it exploits biases existing only in the initial keystream. For the extraction of the later bytes, a long-term bias, which occurs in any keystream byte, is utilized. In particular, the digraph repetition bias (also called the \(ABSAB\) bias) proposed by Mantin [10], which is the strongest known long-term bias, is used. Combining it with our cumulative bias set of \(Z_{1}, Z_{2}, \ldots , Z_{257}\), we can sequentially recover the bytes of a plaintext, even after \(P_{258}\), given only the ciphertexts.

5.1 Best Known Long-Term Bias (\(ABSAB\) bias)

The \(ABSAB\) bias is a statistical bias of the digraph distribution in the RC4 keystream [10]. Specifically, digraphs \(AB\) tend to repeat with short gaps \(S\) between them, e.g., \(ABAB\), \(ABCAB\), and \(ABCDAB\), where the gap \(S\) is zero, \(C\), and \(CD\), respectively. The \(ABSAB\) bias is expressed as follows,
$$\begin{aligned} Z_r ~||~ Z_{r+1} = Z_{r+2+G} ~||~ Z_{r+3+G}~~\mathrm{for}~G \ge 0, \end{aligned}$$
(1)
where \(||\) denotes concatenation. The probability that Eq. (1) holds is given by Theorem 10.

Theorem 10

[10]. For small values of \(G\), the probability of the pattern \(ABSAB\) in the RC4 keystream, where \(S\) is a \(G\)-byte string, is \((1 + e^{(-4-8G)/N}/N) \cdot 1/N^2\).

To enhance these biases, the combined use of \(ABSAB\) biases with different values of \(G\) is considered, using the following lemma on the discrimination of distributions.

Lemma 1

[10]. Let X and Y be two distributions and suppose that the independent events {\(E_i\): \(1 \le i \le k\) } occur with probabilities \(p_X(E_i) = p_i\) in \(X\) and \(p_Y(E_i)=(1+b_i) \cdot p_i\) in Y. Then the discrimination \(D\) of the distributions is \(\sum _i p_i \cdot b_i^{2}\).

The number of required samples for distinguishing the biased distribution from the random distribution with probability of \(1-\alpha \) is given as the following lemma.

Lemma 2

[10]. The number of samples that is required for distinguishing two distributions that have discrimination \(D\) with success rate \(1-\alpha \) (for both directions) is \((1/D) \cdot (1- 2\alpha ) \cdot \log _2 \frac{1-\alpha }{\alpha }\).

This lemma shows that in the broadcast RC4 attack, given \(D\) and the number of samples \(N_{ciphertext}\), the success probability of distinguishing the distribution of the correct candidate plaintext byte (the biased distribution) from the distribution of a wrong candidate (a random distribution) is a constant, which we denote by \(\mathrm{Pr}_{distinguish}\).
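Theorem 10 and Lemmas 1–2 yield concrete numbers. The following sketch computes the discrimination \(D\) of the combined \(ABSAB\) events for \(G = 0, 1, \dots , 63\) (with \(p_i = 1/N^2\) and \(b_G = e^{(-4-8G)/N}/N\)) and, via Lemma 2, the number of samples needed for \(\alpha = 10^{-19}\); both agree with the values used later in this section (\(D \approx 2^{-28.0}\), about \(2^{34}\) samples):

```python
import math

N = 256
G_MAX = 63

# Lemma 1: D = sum_i p_i * b_i^2 over the combined ABSAB events,
# with p_i = 1/N^2 and, from Theorem 10, b_G = e^((-4-8G)/N) / N.
D = sum((1 / N**2) * (math.exp((-4 - 8 * G) / N) / N) ** 2
        for G in range(G_MAX + 1))

# Lemma 2: samples needed to distinguish with success rate 1 - alpha.
alpha = 1e-19
samples = (1 / D) * (1 - 2 * alpha) * math.log2((1 - alpha) / alpha)
```

Running this gives \(\log_2 D \approx -28.0\) and \(\log_2(\text{samples}) \approx 34\), i.e., about \(2^{34}\) ciphertexts suffice to distinguish with probability \(1 - 10^{-19}\).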

5.2 Plaintext Recovery Method Using \(ABSAB\) Bias and Our Bias Set

The following equation allows us to efficiently use \(ABSAB\) bias in the broadcast RC4 attack.
$$\begin{aligned}&(C_r ~||~ C_{r+1}) \oplus (C_{r+2+G} ~||~ C_{r+3+G}) \nonumber \\&= (P_r \oplus Z_r~||~ P_{r+1} \oplus Z_{r+1}) \oplus ( P_{r+2+G} \oplus Z_{r+2+G} ~||~ P_{r+3+G} \oplus Z_{r+3+G}) \nonumber \\&= (P_r \oplus P_{r+2+G} \oplus Z_r \oplus Z_{r+2+G}~||~ P_{r+1} \oplus P_{r+3+G} \oplus Z_{r+1} \oplus Z_{r+3+G}). \end{aligned}$$
(2)
Assuming that Eq. (1) (the event of the \(ABSAB\) bias) holds, a relation between plaintexts and ciphertexts that does not involve the keystream is obtained, i.e., \((C_r ~||~ C_{r+1})\)\(\oplus \)\(( C_{r+2+G} ~||~ C_{r+3+G})\)\(= (P_r \oplus P_{r+2+G}~||~P_{r+1} \oplus P_{r+3+G}) = (P_{r} ~||~ P_{r+1}) \oplus ( P_{r+2+G} ~||~ P_{r+3+G})\).
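The cancellation in Eq. (2) is a pure XOR identity, so it can be sanity-checked directly. A small sketch with an arbitrary gap \(G\), arbitrary plaintext bytes, and a keystream in which the \(ABSAB\) event is forced to hold (indices \(0, \dots, 3+G\) here stand for positions \(r, \dots, r+3+G\)):

```python
import random

random.seed(0)
G = 3
# Arbitrary plaintext bytes P_r .. P_{r+3+G}.
P = [random.randrange(256) for _ in range(4 + G)]
# Keystream with the ABSAB event forced: Z_r||Z_{r+1} = Z_{r+2+G}||Z_{r+3+G}.
Z = [random.randrange(256) for _ in range(4 + G)]
Z[2 + G], Z[3 + G] = Z[0], Z[1]
C = [p ^ z for p, z in zip(P, Z)]

# Left-hand side of Eq. (2): XOR of the two ciphertext digraphs.
lhs = (C[0] ^ C[2 + G], C[1] ^ C[3 + G])
# When the ABSAB event holds, the keystream cancels out entirely:
rhs = (P[0] ^ P[2 + G], P[1] ^ P[3 + G])
assert lhs == rhs
```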

However, these relations with different \(G\) cannot be combined in a straightforward way to enhance the biases, as is done in the distinguishing attack setting: when the value of \(G\) differs, the above equation involves different plaintext bytes even if \(r\) is chosen appropriately. For example, for (\(r\), \(G=1\)) and (\(r+1\), \(G=0\)), the right-hand sides are \((P_{r} ~||~ P_{r+1}) \oplus ( P_{r+3} ~||~ P_{r+4})\) and \((P_{r+1} ~||~ P_{r+2}) \oplus ( P_{r+3} ~||~ P_{r+4})\), respectively. Thus, since these equations with different \(G\) must be used independently, we cannot efficiently make use of the \(ABSAB\) bias in the broadcast setting.

To overcome this problem, we give a method that sequentially recovers the plaintext after \(P_{258}\) using the knowledge of previously guessed plaintext bytes. For example, for (\(r\), \(G=1\)) and (\(r+1\), \(G=0\)), if \(P_{r}\), \(P_{r+1}\), and \(P_{r+2}\) are already known, two equations with respect to \((P_{r+3}~||~P_{r+4})\) are obtained by moving \(P_{r}\), \(P_{r+1}\), and \(P_{r+2}\) to the left-hand side. Then, these equations with different \(G\) can be merged.

Suppose that \(P_1, P_2, \dots , P_{257}\) are guessed by our cumulative bias set of the initial bytes, where the success probability of finding these bytes is evaluated in Sect. 4. Then we aim to sequentially find \(P_r\) for \(r = 258, 259, \dots , P_{MAX}\) by using \(ABSAB\) biases with \(G = 0, 1, \dots , G_{MAX}\). The detailed procedure is given as follows.

 
Step 1.

Obtain \(C_{258-3-G_{MAX}},C_{258-2-G_{MAX}},\dots ,C_{P_{MAX}}\) in each ciphertext, and make frequency tables \(T_{count}[r][G]\) of \((C_{r-3-G} ~||~ C_{r-2-G}) \oplus ( C_{r-1} ~||~ C_{r})\) for all \(r=258, 259, \dots , P_{MAX}\) and \(G = 0, 1, \dots , G_{MAX}\), where \((C_{r-3-G} ~||~ C_{r-2-G})\)\(\oplus \)\(( C_{r-1} ~||~ C_{r}) = (P_{r-3-G} ~||~ P_{r-2-G}) \oplus ( P_{r-1} ~||~ P_{r})\) only if Eq. (1) holds.

Step 2.

Set \(r=258\).

Step 3.

Guess the value of \(P_{r}\).

Step 3.1.

For \(G = 0, 1, \dots , G_{MAX}\), convert \(T_{count}[r][G]\) into a frequency table \(T_{merge}[r]\) of \((P_{r-1} ~||~P_{r})\) by using the pre-guessed values of \(P_{r-3-G_{MAX}}\), \(\dots \), \(P_{r-2}\), and merge the counter values of all tables.

Step 3.2.

Make a frequency table \(T_{guess}[r]\) indexed only by \(P_{r}\) from \(T_{merge}[r]\), using the knowledge of \(P_{r-1}\). More precisely, using the pre-guessed value of \(P_{r-1}\), only the entries of \(T_{merge}[r]\) corresponding to that value of \(P_{r-1}\) are taken into consideration. Finally, regard the most frequent entry in table \(T_{guess}[r]\) as the correct \(P_{r}\).

Step 4.

Increment \(r\). If \(r=P_{MAX} + 1\), terminate this algorithm. Otherwise, go to Step 3.

The bytes of the plaintext are correctly extracted from \(T_{merge}[r]\) only if the correct candidate distribution is distinguished from the other \(N^2 - 1\) wrong candidate distributions. Assuming that wrong candidates are randomly distributed, the probability of correct extraction from \(T_{merge}[r]\) is estimated as \((\mathrm{Pr}_{distinguish})^{N^2 -1}\). In Step 3.2, our method converts \(T_{merge}[r]\) into \(T_{guess}[r]\) using the knowledge of \(P_{r-1}\), where \(T_{guess}[r]\) has only \(N-1\) wrong candidates. This reduces the number of wrong candidates from \(N^2 -1\) to \(N-1\). Then, the probability of correct extraction from \(T_{guess}[r]\) is estimated as \((\mathrm{Pr}_{distinguish})^{N-1}\), which is \(1/(\mathrm{Pr}_{distinguish})^{N^2-N}\) times higher than that from \(T_{merge}[r]\). Therefore, the table reduction technique of Step 3.2 enables us to further optimize the attack.
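Steps 1–3.2 can be illustrated on synthetic data. Reproducing the real discrimination \(D = 2^{-28.0}\) would require about \(2^{34}\) samples, so the sketch below artificially boosts the probability of the \(ABSAB\) event; the frequency tables mirror those of the algorithm, but the code itself is our illustration under these stated assumptions, not the authors' implementation.

```python
import random
from collections import Counter

random.seed(0)
N = 256
G_MAX = 3
BIAS = 0.02          # artificially boosted ABSAB probability (illustration only)
SAMPLES = 50_000     # samples per gap G

# Known prefix P_{r-3-G_MAX} .. P_{r-1}, and the unknown target byte P_r.
prefix = [random.randrange(256) for _ in range(3 + G_MAX)]   # ends at P_{r-1}
target = 0x7B                                                # true P_r

# Step 1: frequency tables T_count[G] of the ciphertext-digraph XORs.
# With probability BIAS the keystream cancels (ABSAB event), so the sample
# equals (P_{r-3-G}||P_{r-2-G}) xor (P_{r-1}||P_r); otherwise it is random.
T_count = {G: Counter() for G in range(G_MAX + 1)}
for G in range(G_MAX + 1):
    left = (prefix[-3 - G] << 8) | prefix[-2 - G]
    right = (prefix[-1] << 8) | target
    for _ in range(SAMPLES):
        v = left ^ right if random.random() < BIAS else random.randrange(1 << 16)
        T_count[G][v] += 1

# Step 3.1: merge all gaps into one table over (P_{r-1} || P_r), using the
# pre-guessed bytes to move the known digraph to the left-hand side.
T_merge = Counter()
for G in range(G_MAX + 1):
    left = (prefix[-3 - G] << 8) | prefix[-2 - G]
    for v, c in T_count[G].items():
        T_merge[v ^ left] += c

# Step 3.2: keep only entries consistent with the known P_{r-1},
# leaving a table indexed by P_r alone.
T_guess = Counter({pair & 0xFF: c for pair, c in T_merge.items()
                   if (pair >> 8) == prefix[-1]})
recovered = T_guess.most_common(1)[0][0]
```

Because the counts for the correct \((P_{r-1} ~||~ P_r)\) pair accumulate across all gaps \(G\) while the noise stays flat, the most frequent entry of \(T_{guess}\) is the true \(P_r\).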

Experimental Results. We perform practical experiments using our algorithm to find \(P_{258}\), \(P_{259}\), \(P_{260}\), and \(P_{261}\) (\(P_{MAX}=261\)). As the parameter of the \(ABSAB\) bias, \(G_{MAX} = 63\) is chosen, because the increase of \(D\) converges around \(G_{MAX} = 63\); then \(D\) is estimated as \(D=2^{-28.0}\). The success probability of our algorithm for recovering \(P_r\) (\(r \ge 258\)) when \(2^{30}\) to \(2^{34}\) ciphertexts are given is shown in Table 3, where the number of tests is 256. Note that \(P_{1}, P_{2}, \dots , P_{257}\) are obtained by using our bias set (one candidate) with the success probability shown in Fig. 9. This experiment requires about one week on a single CPU core (Intel(R) Core(TM) i7 CPU 920 @ 2.67 GHz) to obtain the result for one plaintext, and 256 plaintexts are used.

Interestingly, given \(2^{34}\) ciphertexts, \(P_{258}\), \(P_{259}\), \(P_{260}\), and \(P_{261}\) can be recovered with probability one, while the success probability for some bytes in \(P_{1}, P_2, \dots , P_{257}\) is not one. Combining multiple biases allows us to absorb the negative effects of a few incorrectly recovered values among \(P_{1}, P_2, \dots , P_{257}\). Although our experiment is performed only up to \(P_{261}\), the success probability is expected to remain the same for later bytes, because the \(ABSAB\) bias is a long-term bias.

Let us discuss the success probability of extracting bytes after \(P_{262}\) when \(2^{34}\) ciphertexts are given. According to Lemma 2 and \(D=2^{-28.0}\), \(2^{34}\) ciphertexts allow us to distinguish an RC4 keystream from a random stream with probability \(\mathrm{Pr}_{distinguish} = 1 - 10^{-19}\). Then, assuming that wrong candidates are randomly distributed, the probability of correctly extracting the candidate from the \(N-1\) wrong candidates is estimated as \((\mathrm{Pr}_{distinguish})^{N - 1}\). Therefore, our method enables us to extract consecutive (\(257 + X\)) bytes of a plaintext with probability \(((\mathrm{Pr}_{distinguish})^{N - 1})^{X} = (\mathrm{Pr}_{distinguish})^{(N - 1) \cdot X}\). For instance, when \(X = 2^{40}\) and \(X = 2^{50}\), the success probabilities are estimated as \(0.99997\) and \(0.97170\), respectively.
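The arithmetic behind these estimates can be checked directly (a sketch; \(\mathrm{Pr}_{distinguish} = 1 - 10^{-19}\) is the value quoted above):

```python
import math

N = 256
ALPHA = 1e-19   # Pr_distinguish = 1 - ALPHA, as quoted in the text

def success_prob(X):
    # (1 - ALPHA)^((N-1) * X), computed via log1p because in double
    # precision 1 - 1e-19 would round to exactly 1.0.
    return math.exp((N - 1) * X * math.log1p(-ALPHA))

p40 = success_prob(2 ** 40)   # sequential recovery of 2^40 later bytes
p50 = success_prob(2 ** 50)   # sequential recovery of 2^50 later bytes
```

This reproduces the quoted values: \(p_{40} \approx 0.99997\) and \(p_{50} \approx 0.97170\).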

As a result, by using our sequential method, a large number of plaintext bytes, e.g., the first \(2^{50}\) bytes \(\approx \) 1000 terabytes, can be recovered from \(2^{34}\) ciphertexts with probability of almost one. Therefore, it can be said that our attack is a full plaintext recovery attack on broadcast RC4, the first of its kind proposed in the literature.
Table 3.

Success probability of our algorithm for recovering \(P_r\) (\(r \ge 258\)).

| # of ciphertexts | \(P_{258}\) | \(P_{259}\) | \(P_{260}\) | \(P_{261}\) |
| --- | --- | --- | --- | --- |
| \(2^{30}\) | 0.003906 | 0.003906 | 0.000000 | 0.000000 |
| \(2^{31}\) | 0.039062 | 0.007812 | 0.003906 | 0.007812 |
| \(2^{32}\) | 0.386719 | 0.152344 | 0.070312 | 0.027344 |
| \(2^{33}\) | 0.964844 | 0.941406 | 0.921875 | 0.902344 |
| \(2^{34}\) | 1.000000 | 1.000000 | 1.000000 | 1.000000 |

6 Conclusion

In this paper, we have evaluated the practical security of RC4 in the broadcast setting. After introducing four new biases of the RC4 keystream, i.e., the conditional bias of \(Z_1\), the biases \(Z_3 = 131\) and \(Z_r = r\) for \(3 \le r \le 255\), and the extended keylength-dependent biases, we constructed a cumulative list of the strongest known biases in \(Z_1, Z_2, \ldots , Z_{257}\). We then demonstrated a practical plaintext recovery attack using our bias set by a computer experiment. As a result, most bytes of \(P_1, P_2, \ldots , P_{257}\) could be extracted with probability more than \(0.8\) using \(2^{32}\) ciphertexts encrypted by randomly-chosen keys. Finally, we have proposed an efficient method to extract the bytes of plaintexts after \(P_{258}\). Our attack is able to recover any plaintext byte from only the ciphertexts generated using different keys. For example, the first \(2^{50}\) bytes of the plaintext are expected to be recovered from \(2^{34}\) ciphertexts with high probability.

Note that our attack on broadcast RC4, as proposed in this paper, relies on the sequential recovery of plaintext bytes. If the initial 256/512/768 bytes of the keystream are suppressed in the protocol, as recommended for RC4 usage [14], our attack no longer works. However, widely-used protocols such as SSL/TLS do use the initial bytes of the keystream. For SSL/TLS, the broadcast setting is converted into the multi-session setting, where the target plaintext block is repeatedly sent at the same position in the plaintexts across multiple SSL/TLS sessions [2].

Our evaluation reveals that broadcast RC4 is practically vulnerable to plaintext recovery attacks, since a moderate amount of ciphertexts, i.e., \(2^{24}\) to \(2^{34}\) ciphertexts generated by different keys, leaks considerable information about the plaintext. Thus, RC4 is not recommended for encryption in the typical broadcast setting and the multi-session setting of SSL/TLS.

Footnotes

  1. The theorems with respect to \(Z_r = 0\) in [8] and [6] are slightly different. This paper uses the results from the full version [6].


Acknowledgments

We would like to thank Sourav Sen Gupta and the anonymous referees for their fruitful comments and suggestions. We also thank Tubasa Tsukaune and Atsushi Nagao for insightful discussions. This work was supported in part by a Grant-in-Aid for Scientific Research (C) (KAKENHI 23560455) from the Japan Society for the Promotion of Science and by the Cryptography Research and Evaluation Committee (CRYPTREC).

References

  1. Biham, E., Carmeli, Y.: Efficient reconstruction of RC4 keys from internal states. In: Nyberg, K. (ed.) FSE 2008. LNCS, vol. 5086, pp. 270–288. Springer, Heidelberg (2008)
  2. Canvel, B., Hiltgen, A.P., Vaudenay, S., Vuagnoux, M.: Password interception in a SSL/TLS channel. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 583–599. Springer, Heidelberg (2003)
  3. Fluhrer, S.R., McGrew, D.A.: Statistical analysis of the alleged RC4 keystream generator. In: Schneier, B. (ed.) FSE 2000. LNCS, vol. 1978, p. 19. Springer, Heidelberg (2001)
  4. Golić, J.D.: Linear statistical weakness of alleged RC4 keystream generator. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 226–238. Springer, Heidelberg (1997)
  5. Sen Gupta, S., Maitra, S., Paul, G., Sarkar, S.: Proof of empirical RC4 biases and new key correlations. In: Miri, A., Vaudenay, S. (eds.) SAC 2011. LNCS, vol. 7118, pp. 151–168. Springer, Heidelberg (2012)
  6. Sen Gupta, S., Maitra, S., Paul, G., Sarkar, S.: (Non-)random sequences from (non-)random permutations - analysis of RC4 stream cipher. J. Cryptol. 27(1), 67–108 (2014)
  7. Knudsen, L.R., Meier, W., Preneel, B., Rijmen, V., Verdoolaege, S.: Analysis methods for (alleged) RC4. In: Ohta, K., Pei, D. (eds.) ASIACRYPT 1998. LNCS, vol. 1514, pp. 327–341. Springer, Heidelberg (1998)
  8. Maitra, S., Paul, G., Sen Gupta, S.: Attack on broadcast RC4 revisited. In: Joux, A. (ed.) FSE 2011. LNCS, vol. 6733, pp. 199–217. Springer, Heidelberg (2011)
  9. Mantin, I.: Analysis of the stream cipher RC4. Master's Thesis, The Weizmann Institute of Science, Israel (2001). http://www.wisdom.weizmann.ac.il/itsik/RC4/rc4.html
  10. Mantin, I.: Predicting and distinguishing attacks on RC4 keystream generator. In: Cramer, R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494, pp. 491–506. Springer, Heidelberg (2005)
  11. Mantin, I., Shamir, A.: A practical attack on broadcast RC4. In: Matsui, M. (ed.) FSE 2001. LNCS, vol. 2355, p. 152. Springer, Heidelberg (2002)
  12. Matsui, M.: Key collisions of the RC4 stream cipher. In: Dunkelman, O. (ed.) FSE 2009. LNCS, vol. 5665, pp. 38–50. Springer, Heidelberg (2009)
  13. Maximov, A., Khovratovich, D.: New state recovery attack on RC4. In: Wagner, D. (ed.) CRYPTO 2008. LNCS, vol. 5157, pp. 297–316. Springer, Heidelberg (2008)
  14. Mironov, I.: (Not so) random shuffles of RC4. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 304–319. Springer, Heidelberg (2002)
  15. Paul, G., Maitra, S.: Permutation after RC4 key scheduling reveals the secret key. In: Adams, C., Miri, A., Wiener, M. (eds.) SAC 2007. LNCS, vol. 4876, pp. 360–377. Springer, Heidelberg (2007)
  16. Paul, S., Preneel, B.: A new weakness in the RC4 keystream generator and an approach to improve the security of the cipher. In: Roy, B., Meier, W. (eds.) FSE 2004. LNCS, vol. 3017, pp. 245–259. Springer, Heidelberg (2004)
  17. Sepehrdad, P., Vaudenay, S., Vuagnoux, M.: Discovery and exploitation of new biases in RC4. In: Biryukov, A., Gong, G., Stinson, D.R. (eds.) SAC 2010. LNCS, vol. 6544, pp. 74–91. Springer, Heidelberg (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Kobe University, Nada-ku, Kobe, Japan
  2. Hiroshima University, Higashi-Hiroshima, Hiroshima, Japan
