4.1 The Attack of AlFardan et al.
The idea behind the single-byte bias attack of AlFardan et al. [3] is to first obtain a detailed picture of the distributions of RC4 keystream bytes \(Z_r\), for all positions \(r\) of interest, by gathering statistics from keystreams generated using a large number of independent keys. That is, for all \(r\), we (empirically) estimate
$$ p_{r,k}:= \Pr (Z_r = k), \quad k = \mathtt{0x00},\ldots ,\mathtt{0xFF}, $$
where the probability is taken over a random choice of the RC4 encryption key. In [3], these keys were taken to be random 128-bit values, reflecting how session keys are set in TLS; for TKIP, these keys should be generated according to the procedure described in Sect. 3.1.
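As a rough illustration of this estimation step, the following Python sketch generates RC4 keystreams under random 128-bit keys and tabulates the empirical frequencies. The function names `rc4_keystream` and `estimate_distributions` are hypothetical, and the key count is far smaller than would be needed to resolve the biases; for TKIP, the key sampling would have to follow Sect. 3.1 rather than `os.urandom`.

```python
import os

def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n RC4 keystream bytes Z_1..Z_n for the given key."""
    S = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm (KSA)
        j = (j + S[i] + key[i % len(key)]) & 0xFF
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):  # pseudo-random generation algorithm (PRGA)
        i = (i + 1) & 0xFF
        j = (j + S[i]) & 0xFF
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) & 0xFF])
    return bytes(out)

def estimate_distributions(num_keys: int, positions: int, key_len: int = 16):
    """Empirical estimates of p_{r,k}; entry [r-1][k] approximates Pr(Z_r = k)."""
    counts = [[0] * 256 for _ in range(positions)]
    for _ in range(num_keys):
        z = rc4_keystream(os.urandom(key_len), positions)  # fresh random key
        for idx, byte in enumerate(z):
            counts[idx][byte] += 1
    return [[c / num_keys for c in row] for row in counts]
```

Note that the list is zero-indexed, so the estimates for position \(r\) are stored at index \(r-1\).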
The second step in the approach of [3] is to use the \(p_{r,k}\) estimates to recover plaintext using a maximum-likelihood approach, as follows. Suppose we have \(S\) ciphertexts \(C_1,\ldots ,C_S\) available for our attack, each an encryption of the same unknown plaintext under an independently chosen key (for the \(r\)-th byte of ciphertext \(C_j\) we write \(C_{j,r}\)). For any fixed position \(r\) and any candidate plaintext byte \(\mu \) for that position, the vector \((N^{(\mu )}_{\mathtt{0x00}},\ldots ,N^{(\mu )}_{\mathtt{0xFF}})\) with
$$ N^{(\mu )}_k = |\{j \; | \; C_{j,r} = k\oplus \mu \}_{1\le j \le S }|\qquad (\mathtt{0x00}\le k\le \mathtt{0xFF}) $$
represents the distribution on \(Z_r\) required to obtain the observed ciphertext bytes \(\{C_{j,r}\}_{1\le j\le S}\) by encrypting \(\mu \). We compare these induced distributions (one for each possible \(\mu \)) with the accurate distribution \(p_{r,\mathtt{0x00}},\ldots ,p_{r,\mathtt{0xFF}}\) and interpret a close match as an indication that the corresponding plaintext candidate \(\mu \) is the correct one, i.e., \(P_r=\mu \). More formally, we observe that the probability \(\lambda _\mu \) that plaintext byte \(\mu \) is encrypted to ciphertext bytes \(\{C_{j,r}\}_{1\le j\le S}\) follows a multinomial distribution:
$$\begin{aligned} \lambda _\mu = \frac{S!}{ N^{(\mu )}_{\mathtt{0x00}}! \cdots N^{(\mu )}_{\mathtt{0xFF}}!} \prod _{k \in \{\mathtt{0x00},\ldots ,\mathtt{0xFF}\}} p_{r,k}^{N^{(\mu )}_{k}}. \end{aligned} \tag{2}$$
The approach of [3] then determines the (optimal) maximum-likelihood plaintext byte value \(\mu \) by computing \(\lambda _\mu \) for all \(\mathtt{0x00}\le \mu \le \mathtt{0xFF}\) and identifying \(\mu \) such that \(\lambda _\mu \) is largest. Algorithm 3 more formally specifies the described attack, incorporating some optimizations discussed in [3] (in particular, as the fraction in Eq. (2) is independent of \(\mu \), we compute the \(\lambda _\mu \) values only up to that constant; in fact, we actually compute and compare \(\log \lambda _\mu \), rather than \(\lambda _\mu \)).
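The maximum-likelihood step for a single position can be sketched in Python as follows. This is only an illustration under stated assumptions, not a transcription of Algorithm 3: the name `recover_byte` is hypothetical, all \(p_{r,k}\) are assumed to be strictly positive, and the multinomial coefficient is dropped because it does not depend on \(\mu \).

```python
import math
from collections import Counter

def recover_byte(cipher_bytes, p_r):
    """Return the mu in 0x00..0xFF maximising log(lambda_mu) up to a constant.

    cipher_bytes: the observed bytes C_{1,r}, ..., C_{S,r} at one position r.
    p_r:          list of 256 probabilities p_{r,k}, all assumed > 0.
    """
    hist = Counter(cipher_bytes)          # hist[c] = #{j : C_{j,r} = c}
    log_p = [math.log(q) for q in p_r]
    best_mu, best_score = 0, -math.inf
    for mu in range(256):
        # N^{(mu)}_k = hist[k ^ mu]; substituting c = k ^ mu gives
        # sum_k N^{(mu)}_k * log p_{r,k} = sum_c hist[c] * log p_{r, c ^ mu}.
        score = sum(n * log_p[c ^ mu] for c, n in hist.items())
        if score > best_score:
            best_mu, best_score = mu, score
    return best_mu
```

Iterating over the observed ciphertext values rather than over all \(k\) skips zero counts, which contribute nothing to the sum.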
4.2 Attack Based on \((\mathtt{TSC}_0,\mathtt{TSC}_1)\) Pair Binning
We next discuss our extension of the attack in Algorithm 3 that uses the single-byte RC4 biases, along with their strengths, on a per \((\mathtt{TSC}_0,\mathtt{TSC}_1)\) pair basis. For ease of notation, we let \(\mathtt{\overline{TSC}}\) denote the pair \((\mathtt{TSC}_0, \mathtt{TSC}_1)\) in mathematical expressions.
The idea is to first obtain a detailed picture of the distributions of RC4 keystream bytes \(Z_r\), for all positions \(r\) in some range, on a per \((\mathtt{TSC}_0, \mathtt{TSC}_1)\) pair basis, by gathering statistics from keystreams generated using a large number of keys (\(2^{24}\) per \((\mathtt{TSC}_0, \mathtt{TSC}_1)\) pair in our case). That is, for all \(r\) in our selected range, we now estimate
$$ p_{\mathtt{\overline{TSC}},r,k}:= \Pr (Z_r = k),\, \mathtt{\overline{TSC}} = (\mathtt{0x00},\mathtt{0x00}), \ldots , (\mathtt{0xFF},\mathtt{0xFF}),\, k = \mathtt{0x00},\ldots ,\mathtt{0xFF} $$
where the probability is taken over the random choice of the RC4 encryption key \(\mathtt{K}\), subject to the structure on \(\mathtt{K}_0\), \(\mathtt{K}_1\), \(\mathtt{K}_2\) induced by \(\mathtt{\overline{TSC}} = (\mathtt{TSC}_0,\mathtt{TSC}_1)\).
Using these biases \(p_{\mathtt{\overline{TSC}},r,k}\), in a second step, plaintext can be recovered using a variation of the preceding maximum-likelihood approach, as follows.
Suppose we have \(S\) ciphertexts \(C_1,\ldots ,C_S\) available for our attack. We partition these into \(2^{16}\) groups according to the value of the \((\mathtt{TSC}_0,\mathtt{TSC}_1)\) pair; for convenience, we assume the resulting bins of ciphertexts are all of equal size \(T=S/2^{16}\), but this need not be the case. Let the bin of ciphertexts associated with a particular \(\mathtt{\overline{TSC}} = (\mathtt{TSC}_0, \mathtt{TSC}_1)\) pair be denoted \(\mathcal{S}_{\mathtt{\overline{TSC}}}\) and have members \(C_{\mathtt{\overline{TSC}},j}\) for \(j=1, \ldots , T\); we denote the byte at position \(r\) of \(C_{\mathtt{\overline{TSC}},j}\) by \(C_{\mathtt{\overline{TSC}},j,r}\). For any fixed position \(r\) and any candidate plaintext byte \(\mu \) for that position, the vector \((N^{(\mu )}_{\mathtt{\overline{TSC}},\mathtt{0x00}},\ldots ,N^{(\mu )}_{\mathtt{\overline{TSC}},\mathtt{0xFF}})\) with
$$ N^{(\mu )}_{\mathtt{\overline{TSC}},k} = |\{j \; | \; C_{\mathtt{\overline{TSC}},j,r} = k\oplus \mu \}_{1\le j \le T}|\qquad (\mathtt{0x00}\le k\le \mathtt{0xFF}) $$
represents the distribution on \(Z_r\) required to obtain the observed ciphertext bytes \(\{C_{\mathtt{\overline{TSC}},j,r}\}_{1\le j\le T}\) for bin \(\mathcal{S}_{\mathtt{\overline{TSC}}}\) by encrypting \(\mu \). We compare these induced distributions (one for each possible \(\mu \) and for each possible \((\mathtt{TSC}_0,\mathtt{TSC}_1)\) pair) with the accurate distribution \(p_{\mathtt{\overline{TSC}},r,\mathtt{0x00}},\ldots ,p_{\mathtt{\overline{TSC}},r,\mathtt{0xFF}}\) and interpret a close match as an indication that the corresponding plaintext candidate \(\mu \) is the correct one, i.e., \(P_r=\mu \), in bin \(\mathcal{S}_{\mathtt{\overline{TSC}}}\). The probability \(\lambda _{\mathtt{\overline{TSC}},\mu }\) that plaintext byte \(\mu \) is encrypted to ciphertext bytes \(\{C_{\mathtt{\overline{TSC}},j,r}\}_{1\le j\le T}\) in bin \(\mathcal{S}_{\mathtt{\overline{TSC}}}\) now follows a multinomial distribution:
$$\begin{aligned} \lambda _{\mathtt{\overline{TSC}},\mu } = \frac{T!}{ N^{(\mu )}_{\mathtt{\overline{TSC}},\mathtt{0x00}}! \cdots N^{(\mu )}_{\mathtt{\overline{TSC}},\mathtt{0xFF}}!} \prod _{k \in \{\mathtt{0x00},\ldots ,\mathtt{0xFF} \}} p_{\mathtt{\overline{TSC}},r,k}^{N^{(\mu )}_{\mathtt{\overline{TSC}},k}}\,. \end{aligned} \tag{3}$$
The probability that plaintext byte \(\mu \) is encrypted to ciphertext bytes \(\{C_{\mathtt{\overline{TSC}},j,r}\}_{1\le j\le T}\) across all bins \(\mathcal{S}_{\mathtt{\overline{TSC}}}\) can then be precisely calculated as
$$\begin{aligned} \lambda _{\mu } = \prod _{(\mathtt{0x00},\mathtt{0x00}) \le \mathtt{\overline{TSC}} \le (\mathtt{0xFF},\mathtt{0xFF})} \lambda _{\mathtt{\overline{TSC}},\mu }\,. \end{aligned}$$
By computing \(\lambda _{\mu }\) for all \(\mathtt{0x00}\le \mu \le \mathtt{0xFF}\), and identifying \(\mu \) such that \(\lambda _\mu \) is largest, we determine the (optimal) maximum-likelihood plaintext byte value. This informal description, together with some optimizations that we describe next, is specified in algorithmic form in Algorithm 4.
Observe that, for each fixed position \(r\) and set of ciphertexts \(\{C_{\mathtt{\overline{TSC}},j,r}\}_{1\le j\le T}\), the values \(N^{(\mu )}_{\mathtt{\overline{TSC}},k}\) can be computed from the values \(N^{(\mu ')}_{\mathtt{\overline{TSC}},k}\) via the equation \(N^{(\mu )}_{\mathtt{\overline{TSC}},k} = N^{(\mu ')}_{\mathtt{\overline{TSC}},k \oplus \mu ' \oplus \mu }\), for all \(k\). In other words, for a fixed \((\mathtt{TSC}_0,\mathtt{TSC}_1)\) pair, the vectors \((N^{(\mu )}_{\mathtt{\overline{TSC}},\mathtt{0x00}},\ldots ,N^{(\mu )}_{\mathtt{\overline{TSC}},\mathtt{0xFF}})\) and \((N^{(\mu ')}_{\mathtt{\overline{TSC}},\mathtt{0x00}},\ldots ,N^{(\mu ')}_{\mathtt{\overline{TSC}},\mathtt{0xFF}})\) are permutations of each other; consequently, the term \(T! / (N^{(\mu )}_{\mathtt{\overline{TSC}},\mathtt{0x00}}! \cdots N^{(\mu )}_{\mathtt{\overline{TSC}},\mathtt{0xFF}}!)\) in Eq. (3) takes the same value for every choice of \(\mu \) (but is not necessarily constant across different values of the \((\mathtt{TSC}_0,\mathtt{TSC}_1)\) pair). If \(T\) is fixed (as we assume it to be), then the \(T!\) terms can be omitted from all calculations. Furthermore, computing and comparing \(\log (\lambda _{\mathtt{\overline{TSC}},\mu })\) and \(\log (\lambda _\mu )\) instead of \(\lambda _{\mathtt{\overline{TSC}},\mu }\) and \(\lambda _\mu \) makes the computation more efficient and makes it easier to maintain numerical accuracy.
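The permutation relation between count vectors can be checked with a few lines of Python. The helper names `induced_counts` and `shifted` are hypothetical and the ciphertext bytes in the usage example are made up purely for illustration.

```python
from collections import Counter

def induced_counts(cipher_bytes, mu):
    """Return [N^{(mu)}_k for k = 0x00..0xFF] for one bin at one position r."""
    hist = Counter(cipher_bytes)          # hist[c] = number of times byte c occurs
    return [hist[k ^ mu] for k in range(256)]

def shifted(counts, mu_from, mu_to):
    """Derive the count vector for mu_to from the one for mu_from,
    using N^{(mu_to)}_k = N^{(mu_from)}_{k ^ mu_from ^ mu_to}."""
    return [counts[k ^ mu_from ^ mu_to] for k in range(256)]

# Usage: the two vectors hold the same multiset of counts, so the product
# of factorials in the multinomial coefficient is identical for both.
sample = [0x12, 0x12, 0x7F, 0x00, 0xFF, 0x7F, 0x12]
n_a = induced_counts(sample, 0x41)
n_b = induced_counts(sample, 0xA5)
assert shifted(n_b, 0xA5, 0x41) == n_a
assert sorted(n_a) == sorted(n_b)
```

In an implementation this means a single histogram per bin suffices; the count vector for each candidate \(\mu \) is obtained by re-indexing rather than by recounting.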
Comparing with Algorithm 3, we see that our new Algorithm 4, at its core, runs Algorithm 3 once for each \((\mathtt{TSC}_0,\mathtt{TSC}_1)\) pair, and then combines the resulting likelihood estimates \(\lambda _{\mathtt{\overline{TSC}},\mu }\) to obtain the final estimate \(\lambda _{\mu }\) for plaintext candidate \(\mu \). Some care is needed, however, to use the correct scaling factors (\({T!}\) and \(N^{(\mu )}_{\mathtt{\overline{TSC}},\mathtt{0x00}}! \cdots N^{(\mu )}_{\mathtt{\overline{TSC}},\mathtt{0xFF}}!\)) for each \((\mathtt{TSC}_0,\mathtt{TSC}_1)\) pair.
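A minimal Python sketch of this per-bin combination is given below. It is not a transcription of Algorithm 4: the name `recover_byte_binned` and the dictionary-based data layout are assumptions, all probabilities are assumed strictly positive, and the multinomial coefficients are dropped entirely, which is sufficient for ranking candidates because within each bin the coefficient does not depend on \(\mu \).

```python
import math
from collections import Counter

def recover_byte_binned(bins, p):
    """Combine per-bin log-likelihoods and return the best candidate mu.

    bins: dict mapping (tsc0, tsc1) -> list of ciphertext bytes at position r.
    p:    dict mapping (tsc0, tsc1) -> list of 256 probabilities (all > 0)
          for the same position r.
    """
    log_lambda = [0.0] * 256
    for tsc, cipher_bytes in bins.items():
        hist = Counter(cipher_bytes)
        log_p = [math.log(q) for q in p[tsc]]
        for mu in range(256):
            # Per-bin term: sum_k N^{(mu)}_{TSC,k} * log p_{TSC,r,k},
            # evaluated over the ciphertext values actually observed.
            log_lambda[mu] += sum(n * log_p[c ^ mu] for c, n in hist.items())
    # The product over bins becomes a sum of logarithms.
    return max(range(256), key=log_lambda.__getitem__)
```

Because the bins are processed independently and only summed at the end, the sketch works unchanged when the bins have different sizes.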
4.3 Attack Based on Aggregation Over \(\mathtt{TSC}_0\) Values
As mentioned in the introduction, one method of coping with noisy estimates for the probabilities \(p_{\mathtt{\overline{TSC}},r,k}\) is to consider aggregation of biases over \(\mathtt{TSC}_0\). This is supported by the experiments reported in Sect. 3.2, where we saw that there is broad agreement between the \(\mathtt{TSC}_0\)-aggregated data and the data for individual \((\mathtt{TSC}_0,\mathtt{TSC}_1)\) pairs.
It is not difficult to see how to modify Algorithm 4 to work with \(2^8\) bins, one for each value of \(\mathtt{TSC}_1\), instead of \(2^{16}\) bins. In practice, the modified algorithm also runs faster, since each estimate for a plaintext byte \(\mu \) now only involves calculation of \(\lambda _{\mathtt{\overline{TSC}},\mu }\) over \(2^8\) \(\mathtt{TSC}_1\) values instead of \(2^{16}\) \((\mathtt{TSC}_0,\mathtt{TSC}_1)\) pair values.
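One way the aggregation could be realised is sketched below in Python. The helper names are hypothetical, and the unweighted mean over \(\mathtt{TSC}_0\) is an assumption that matches pooling keystream statistics only when every \((\mathtt{TSC}_0,\mathtt{TSC}_1)\) pair was sampled with the same number of keys.

```python
from collections import defaultdict

def aggregate_distributions(p):
    """Average per-pair distributions over TSC_0.

    p: dict mapping (tsc0, tsc1) -> list of 256 probabilities.
    Returns a dict mapping tsc1 -> list of 256 averaged probabilities.
    """
    sums = defaultdict(lambda: [0.0] * 256)
    num = defaultdict(int)
    for (tsc0, tsc1), dist in p.items():
        acc = sums[tsc1]
        for k, q in enumerate(dist):
            acc[k] += q
        num[tsc1] += 1
    return {tsc1: [v / num[tsc1] for v in acc] for tsc1, acc in sums.items()}

def merge_bins(bins):
    """Merge ciphertext bins that share the same TSC_1 value.

    bins: dict mapping (tsc0, tsc1) -> list of ciphertext bytes at position r.
    Returns a dict mapping tsc1 -> concatenated list of ciphertext bytes.
    """
    merged = defaultdict(list)
    for (tsc0, tsc1), cipher_bytes in bins.items():
        merged[tsc1].extend(cipher_bytes)
    return dict(merged)
```

The aggregated distributions and merged bins can then be fed to the same likelihood computation as before, keyed by \(\mathtt{TSC}_1\) alone.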
4.4 Further Optimizations
In specific settings where the attacker has a priori information about the encrypted plaintext, the performance of Algorithms 3 and 4 can be further improved. Here, the considerations are similar to those in [3] and so we omit further discussion.