Abstract
RC4 has been the most popular stream cipher in the history of symmetric key cryptography. Its internal state contains a permutation over all possible bytes from 0 to 255, and it attempts to generate a pseudorandom sequence of bytes (called keystream) by extracting elements of this permutation. Over the last twenty years, numerous cryptanalytic results on the RC4 stream cipher have been published, many of which are based on nonrandom (biased) events involving the secret key, the state variables, and the keystream of the cipher.
Though biases based on the secret key are common in RC4 literature, none of the existing ones depends on the length of the secret key. In the first part of this paper, we investigate the effect of RC4 keylength on its keystream, and report significant biases involving the length of the secret key. In the process, we prove the two known empirical biases that were experimentally reported and used in recent attacks against WEP and WPA by Sepehrdad, Vaudenay and Vuagnoux in EUROCRYPT 2011. After our current work, there remains no bias in the literature of WEP and WPA attacks without a proof.
In the second part of the paper, we present theoretical proofs of some significant initial-round empirical biases observed by Sepehrdad, Vaudenay and Vuagnoux in SAC 2010.
In the third part, we present the derivation of the complete probability distribution of the first byte of RC4 keystream, a problem left open for a decade since the observation by Mironov in CRYPTO 2002. Further, the existence of positive biases towards zero for all the initial bytes 3 to 255 is proved and exploited towards a generalized broadcast attack on RC4. We also investigate long-term nonrandomness in the keystream, and prove a new long-term bias of RC4.
1 Introduction
In the domain of symmetric key cryptology, the stream ciphers are considered to be one of the most important primitives. A stream cipher aims to output a pseudorandom sequence of bits, called the keystream, and encryption is done by masking the plaintext (considered as a sequence of bits) by the keystream. The masking operation is just a simple XOR in general, and so the ciphertext is also a sequence of bits of the same length as that of the plaintext. For ideal information-theoretic ‘perfect secrecy’ of the scheme, it is desired that the masking is done using a one-time pad, where a unique sequence of bits is used as a mask for each plaintext. In reality, however, a one-time pad is not practically feasible, as it requires a key as large as the length of the plaintext. Instead, a computational notion of secrecy is ensured by the pseudorandom nature of the output sequence (keystream) generated by a stream cipher. Any nonrandom event in the internal state or the keystream of a stream cipher is not desired from a cryptographic point of view, and rigorous analysis is performed to identify the presence of any such nonrandomness in its design.
The most important and cryptographically significant goal of a stream cipher is to produce a pseudorandom sequence of bits or words using a fixed-length secret key (or a secret key paired with an initialization vector). Over the last three decades of research and development in stream ciphers, a number of designs have been proposed and analyzed by the cryptology community. One of the main ideas for building a stream cipher relies on constructing a pseudorandom permutation and thereafter extracting a pseudorandom sequence from this permutation. Interestingly, even if the underlying permutation is pseudorandom, if the method of extracting the words from the permutation is not carefully designed, then it may be possible to identify certain biased events in the final keystream of the cipher.
To date, the most popular stream cipher has been RC4, which follows the design principle of extracting pseudorandom bytes from pseudorandom permutations. The cipher owes its popularity to its intriguing simplicity, which has made it widely accepted for numerous software and web applications. In this paper, we study and analyze some important nonrandom events of the RC4 stream cipher, thereby illustrating some key design vulnerabilities in the shuffle-exchange paradigm.
1.1 RC4 Stream Cipher
RC4 is the most widely deployed commercial stream cipher, having applications in network protocols such as SSL, WEP, WPA and in Microsoft Windows, Apple OCE, Secure SQL, etc. It was designed in 1987 by Ron Rivest for RSA Data Security (now RSA Security). The design remained a trade secret until it was anonymously posted on the web in 1994. Later, the public description was verified by comparing the outputs of the posted design with those of the licensed systems using proprietary versions of the original cipher, although the public design has never been officially approved or claimed by RSA Security to be the original cipher.
The cipher consists of two major components, the Key Scheduling Algorithm (KSA) and the Pseudo-Random Generation Algorithm (PRGA). The internal state of RC4 contains a permutation of all 8-bit words, i.e., a permutation of N=256 bytes, and the KSA produces the initial pseudorandom permutation of RC4 by scrambling an identity permutation using the secret key k. The secret key k of RC4 is typically 5 to 32 bytes long, and it generates the expanded key K of length N=256 bytes by simple repetition. If the length of the secret key k is l bytes (typically 5≤l≤32), then the expanded key K is constructed as K[i]=k[i mod l] for 0≤i≤N−1. The initial permutation S produced by the KSA acts as an input to the PRGA that generates the keystream. The RC4 algorithms KSA and PRGA are as shown in Fig. 1.
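The two components described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function names `ksa`, `prga` and `rc4_keystream` are our own, and the secret key is assumed to be a `bytes` object of length l.

```python
def ksa(key, n=256):
    """Key Scheduling Algorithm: scramble the identity permutation with the key."""
    s = list(range(n))          # identity permutation
    j = 0
    for i in range(n):
        # key[i % len(key)] is the expanded key byte K[i] = k[i mod l]
        j = (j + s[i] + key[i % len(key)]) % n
        s[i], s[j] = s[j], s[i]
    return s


def prga(s, count, n=256):
    """Pseudo-Random Generation Algorithm: emit `count` keystream bytes."""
    s = list(s)                 # work on a copy of the initial permutation
    i = j = 0
    out = []
    for _ in range(count):
        i = (i + 1) % n
        j = (j + s[i]) % n
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % n])   # Z_r = S_r[S_r[i_r] + S_r[j_r]]
    return out


def rc4_keystream(key, count):
    """Run KSA followed by PRGA and return the keystream as bytes."""
    return bytes(prga(ksa(key), count))
```

For instance, `rc4_keystream(b"Key", 10).hex()` gives the widely published test keystream `eb9f7781b734ca72a719`.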
Notation
For round r=1,2,… of RC4 PRGA, we denote the indices by i _{ r },j _{ r }, the permutations before and after the swap by S _{ r−1} and S _{ r } respectively, the output byte-extraction index as t _{ r }=S _{ r }[i _{ r }]+S _{ r }[j _{ r }], and the keystream output byte by Z _{ r }=S _{ r }[t _{ r }]. After r rounds of KSA, we denote the state variables by adding a superscript K to each variable. By \(S^{K}_{0}\) and S _{0} we denote the initial permutations before KSA and PRGA, respectively. Note that \(S^{K}_{0}\) is the identity permutation and \(S_{0} = S^{K}_{N}\) is the permutation obtained right after the completion of KSA. We denote the length of the secret key k as l. In this paper, all arithmetic operations in the context of RC4 are to be considered modulo N, unless specified otherwise.
1.2 An Overview of RC4 Cryptanalysis
The goal of RC4, like all stream ciphers, is to produce a pseudorandom sequence of bits from the internal permutation. Hence, one of the main ideas for RC4 cryptanalysis is to investigate for biases, that is, statistical weaknesses that can be exploited to computationally distinguish the keystream of RC4 from a truly random sequence of bytes with a considerable probability of success.
The target of an attack may be to exploit the nonrandomness in the internal state of RC4, or the nonrandomness of byte-extraction from the internal permutation. Both ideas have been put to practice in various ways in the literature, and the main theme of attacks on RC4 can be categorized in four major directions, as follows.

1.
Weak keys and Key recovery from state: Weaknesses of RC4 keys and KSA have attracted quite a lot of attention from the community. In particular, Roos [27] and Wagner [37] showed that for specific properties of a ‘weak’ secret key, certain undesirable biases occur in the internal state and in the keystream. Grosul and Wallach [11] demonstrated that certain related-key pairs generate similar output bytes in RC4. Later, Matsui [21] reported colliding key pairs for RC4 for the first time and then stronger key collisions were found by Chen and Miyaji [5].
A direct approach for key recovery from the internal permutation of RC4 was first proposed by Paul and Maitra [26], and was later studied by Biham and Carmeli [4], Khazaei and Meier [13], Akgün, Kavak and Demirci [1], and Basu, Maitra, Paul and Talukdar [3].

2.
Key recovery from the keystream: Key recovery from the keystream primarily exploits the use of RC4 in WEP and WPA. The analyses by Fluhrer, Mantin and Shamir [7] and Mantin [20] are applicable towards RC4 in WEP mode, and there are quite a few practical attacks [14, 29–31, 35, 36] on the WEP protocol as well. After a practical breach of WEP by Tews, Weinmann and Pyshkin [34] in 2007, the new variant WPA came into the picture. This too used RC4 as a backbone, and the most recent result published by Sepehrdad, Vaudenay and Vuagnoux [31] mounts a distinguishing attack as well as a key recovery attack on RC4 in WPA mode. Sepehrdad’s Ph.D. thesis [29] presents a thorough and revised analysis of the most recent WEP and WPA attacks published in Refs. [30, 31].

3.
State recovery attacks: The huge state-space of RC4 (256!×256^{2}≈2^{1700} for N=256) makes a state-recovery attack quite challenging for this cipher. The first important state recovery attack was due to Knudsen, Meier and Preneel [15], with complexity 2^{779}. After a series of improvements by Mister and Tavares [24], Golic [9], Shiraishi, Ohigashi and Morii [32], and Tomasevic, Bojanic and Nieto-Taladriz [33], the best attack with complexity 2^{241} was published by Maximov and Khovratovich [22]. Due to this, a secret key of length beyond 30 bytes is not practically meaningful. A contemporary result by Golic and Morgari [10] claims to improve the attack of Ref. [22] even further by iterative probabilistic reconstruction of the RC4 internal states.

4.
Biases and Distinguishers: Most of the results in this category are targeted towards specific short-term (involving only the initial few bytes of the output) biases and correlations [8, 12, 16, 18, 23, 25, 27, 30], while there exist only a few important results for long-term (prominent even after discarding an arbitrary number of initial bytes of the output) biases [2, 6, 8, 19].
Figure 2 gives a chronological summary of the important cryptanalytic results on RC4 to date.
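The size of the state space quoted in item 3 above can be checked with a short calculation. The sketch below (the helper name `rc4_state_bits` is ours) counts the N! possible permutations together with the N² possible values of the index pair (i, j), and reports the result in bits.

```python
import math


def rc4_state_bits(n=256):
    """log2 of n! * n^2: permutations of n symbols times the (i, j) index pairs."""
    return math.log2(math.factorial(n)) + 2 * math.log2(n)


state_bits = rc4_state_bits()   # very close to 1700 for n = 256
```

For N=256 the value is within a fraction of a bit of 1700, in line with the estimate 2^{1700}.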
Before summarizing our contributions, let us now present a brief outline explaining how many keystream bytes are required to identify a bias with a good success probability. For a stream cipher, if there is an event such that the probability of occurrence of the event is different from that in case of a uniformly random sequence of bits, the event is said to be biased. If there exists a biased event based only on the bits of the keystream, then such an event gives rise to a distinguisher for the cipher that can computationally differentiate between the keystream of the cipher and a random sequence of bits. The efficiency of the distinguishers is mostly judged by the number of samples required to identify the bias.
Let E be an event based on some key bits or state bits or keystream bits or a combination of them in a stream cipher. Suppose Pr(E)=p for a uniformly random sequence of bits, and Pr(E)=p(1+q) for the keystream of the stream cipher under consideration. The cryptanalytic motivation of studying a stream cipher is to distinguish these two sequences in terms of the difference in the above probabilities when p is small and q≠0. According to Ref. [18], approximately 1/(pq^{2}) samples are required to identify the bias with a success probability of 0.78, which is reasonably higher than one half.
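As a quick illustration of this rule of thumb, the sketch below evaluates 1/(pq^{2}) for an event with base probability p and relative bias q. The function name `samples_needed` is ours, and the two calls are worked examples only: an event with probability 2/N (p=1/N, q=1), such as the bias of Z _{2} towards zero from Ref. [18], and a much weaker event with probability 1/N+1/N ^{2} (p=1/N, q=1/N).

```python
def samples_needed(p, q):
    """Approximate number of samples, 1/(p*q^2), needed to detect Pr(E) = p(1+q)."""
    return 1.0 / (p * q * q)


N = 256
strong = samples_needed(1 / N, 1.0)     # Pr(E) = 2/N       -> about N samples
weak = samples_needed(1 / N, 1 / N)     # Pr(E) = 1/N+1/N^2 -> about N^3 samples
```

The contrast between N and N^{3} samples shows how quickly the cost of a distinguisher grows as the relative bias q shrinks.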
1.3 Our Contributions
In this paper, we extend and supplement the literature of RC4 cryptanalysis by introducing the concept of keylength-dependent biases, identifying new short-term biases, and investigating new long-term biases in RC4. Sections 2, 3 and 4 contain the technical results of this paper.
Section 2:

In SAC 2010, Sepehrdad, Vaudenay and Vuagnoux [30] reported the empirical bias Pr(S _{16}[j _{16}]=0∣Z _{16}=−16)=0.038488 and mentioned that no explanation of this bias could be found. A related bias of the same order involving the event \((S^{K}_{17}[16] = 0 \mid Z_{16} = -16)\) has been empirically reported in Ref. [29, Sect. 6.1], and this has been used to mount WEP and WPA attacks on RC4. Our detailed experimentation suggests that the number 16 in both the events comes from the keylength of 16 bytes with which the experiments were performed in Refs. [29, 30] and similar biases hold for any length of the secret key. For the first time, we present a proof of these keylength-dependent conditional biases in RC4.
Along the same line of investigation, we establish some new keylengthdependent conditional biases. These include a strong correlation between the length l of the secret key and the lth byte in the keystream (typically, for 5≤l≤32), and thus we propose a method to predict the keylength of the cipher by observing the keystream.
Section 3:

In this section, we provide theoretical proofs for some significant empirical biases of RC4 involving the state variables in the initial rounds, that were reported by Sepehrdad, Vaudenay and Vuagnoux [30] in SAC 2010. In addition, we rigorously study the nonrandomness of index j to find a strong bias of j _{2} towards 4. We further use this bias to establish a correlation between the state variable S _{2}[2] and the output keystream byte Z _{2}.
Section 4:

In this section, we investigate and discuss biases related to the RC4 keystream.

4.1
In CRYPTO 2002, Mironov [23] observed that the first byte Z _{1} of RC4 keystream has a negative bias towards zero, and also found an interesting nonuniform probability distribution (similar to a sine curve) for all other values of this byte. However, the theoretical proof remained open for almost a decade. In Sect. 4.1, for the first time we derive the complete theoretical distribution of Z _{1}.

4.2
In FSE 2001, Mantin and Shamir [18] proved the bias \(\Pr(Z_{2} = 0) \approx\frac{2}{N}\), and claimed that no such bias exists in any subsequent byte in the keystream. Contrary to this claim, we prove in Sect. 4.2 that all the bytes 3 to 255 of RC4 initial keystream are biased towards zero.

4.3
Biases in initial rounds of RC4 have no effect if one throws away some initial bytes from the keystream of RC4. This naturally motivates a quest for long-term biases in the RC4 output, if any exists. In Sect. 4.3, we observe and prove a new long-term bias in RC4 keystream.

2 Biases Based on the Length of the Secret Key
In this section, we present a family of biases in RC4 that are dependent on the length of the secret key, and thereby try to predict the keylength of RC4. Our motivation arises from the conditional bias Pr(S _{16}[j _{16}]=0∣Z _{16}=−16)≈0.038488 observed by Sepehrdad, Vaudenay and Vuagnoux [30]. They also mentioned in Ref. [30, Sect. 3] that no explanation for this bias could be found. For direct exploitation in WEP and WPA attacks, a related KSA version of this bias (of the same order) was reported in Ref. [29, Sect. 6.1] for the event \((S^{K}_{17}[16] = 0 \mid Z_{16} = -16)\).
While exploring these conditional biases in RC4 PRGA, we ran extensive experiments (1 billion runs of RC4 with randomly chosen keys in each case) with N=256 and keylength 5≤l≤32. We could observe that the biases actually correspond to the keylength l:
where each of \(\eta^{(1A)}_{l}\) and \(\eta^{(1B)}_{l}\) decreases from 12 to 7 (approx.) as l increases from 5 to 32. In this section, we present proofs of these two biases for the first time.
We also observe and prove a family of new conditional biases. Experimenting with 1 billion runs of RC4 in each case, we observed that:
where \(\eta^{(2)}_{l}\) decreases from 12 to 7 (approx.), each of \(\eta^{(3)}_{l}\) and \(\eta^{(4)}_{l}\) decreases from 34 to 22 (approx.), and \(\eta^{(5)}_{l}\) decreases from 30 to 20 (approx.) as l increases from 5 to 32.
We also find a keylength distinguisher for RC4, based on the following event.
2.1 Technical Results Required to Prove the Biases
For the proofs of the biases in this section we need some additional technical results that we present here. Some of these results would also be referred for our results in subsequent sections. We start with Ref. [17, Theorem 6.2.1], restated as Proposition 1 below.
Proposition 1
At the end of RC4 KSA, for 0≤u≤N−1, 0≤v≤N−1,
Now, we extend the above result to the end of the first round of the PRGA. Since the KSA ends with i ^{K}=N−1 and the PRGA begins with i=1, skipping the index 0 of RC4 permutation, this extension is nontrivial, as would be clear from the proof of Lemma 1. Note that this is a revised version of Ref. [28, Lemma 1].
Lemma 1
After the first round of RC4 PRGA, the probability Pr(S _{1}[u]=v) is:
Proof
First, let us represent the probability as \(\Pr(S_{1}[u] = v) = \sum_{X=0}^{N-1} \Pr(S_{0}[1] = X \wedge S_{1}[u] = v)\). The goal is to reduce all probabilities in terms of expressions over S _{0}. After the first round of RC4 PRGA, all positions of S _{0}, except for i _{1}=1 and j _{1}=S _{0}[1]=X, remain fixed in S _{1}. So, we need to be careful about the cases where X=1,u,v. Let us separate these cases and write
Now, depending on the values of u,v, we get a few special cases. In the first PRGA round,
This indicates that one needs to consider two special cases, u=1 and u=v, separately. However, there is an overlap within these two cases at the point (u=1,v=1), which in turn, should be considered on its own. In total, we have four cases to consider for (4), as shown in Fig. 3.
Common point u=1,v=1:

In this case, S _{0}[1]=X=1 implies no swap, resulting in S _{1}[u]=S _{1}[1]=S _{0}[1]. If X≠1, we have S _{1}[u]=S _{1}[1]=S _{0}[X]. Thus, (4) reduces to
Special case u=1,v≠1:

In this case, S _{0}[1]=X=1 implies S _{1}[u]=S _{1}[1]=S _{0}[1], as before, and S _{0}[1]=X=v implies S _{1}[u]=S _{1}[1]=S _{0}[v]. If X≠1,v, we have S _{1}[u]=S _{1}[1]=S _{0}[X]. Thus,
Special case u≠1,v=u:

In this case, S _{0}[1]=X=1 implies no swap, resulting in S _{1}[u]=S _{0}[u]. Again, S _{0}[1]=X=u implies S _{1}[u]=S _{0}[1], and if X≠1,u, we have S _{1}[u]=S _{0}[u]. Thus,
General case u≠1,v≠u:

In this case, S _{0}[1]=X=1 implies no swap, resulting in S _{1}[u]=S _{0}[u]. Again, S _{0}[1]=X=u implies S _{1}[u]=S _{0}[1], and if X≠1,u, we have S _{1}[u]=S _{0}[u]. Thus,
Combining all the above cases together, we obtain the desired result.
□
The probabilities depending on S _{0} can be derived from Proposition 1. The estimation of the joint probabilities Pr(S _{0}[u]=v∧S _{0}[u′]=v′) is also required for our next result, i.e., Theorem 1, as well as for our results in Sect. 4.1. This estimation is explained in detail in Sect. 4.1.3.
In Theorem 1, we find the probability distribution of S _{ u−1}[u]=v, just before index i touches the position u during PRGA. This is a generalization of Ref. [28, Theorem 4].
Theorem 1
In RC4 PRGA, for 3≤u≤N−1,
Proof
From Lemma 1, we know that the event (S _{1}[u]=v) is positively biased for all u. Hence the natural path for investigation is as follows:
Case (S _{1}[u]=v): Index i varies from 2 to (u−1) during the evolution of S _{1} to S _{ u−1}, and hence never touches the uth index. Thus, the index u will retain its value S _{1}[u] if index j does not touch it. The probability of this event is (1−1/N)^{u−2} over all the intermediate rounds. Hence we get:
Case (S _{1}[u]≠v): Suppose that S _{1}[t]=v for some t≠u. In such a case, only a swap between the positions u and t during rounds 2 to (u−1) of PRGA can result in (S _{ u−1}[u]=v). If index i does not touch the tth location, then the value at S _{1}[t] can only go to some position behind i≤u−1, and can never reach S _{ u−1}[u]. Thus we must have i touching the tth position, i.e., 2≤t≤u−1.
Now suppose that it requires (w+1) hops for v to reach from S _{1}[t] to S _{ u−1}[u]. The transfer will never happen if the position t swaps with any index which is not touched by i later. This fraction of favorable positions starts from (u−t−1)/N for the first hop and decreases approximately to (u−t−1)/(lN) at the lth hop. It is also required that j does not touch the position u for the remaining (u−3−w) rounds. Thus, the second part of the probability for a specific position t is:
Finally, the number of hops is bounded as 1≤w+1≤u−t+1 (here w+1=1 or w=0 denotes a single-hop transfer), depending on the initial gap between t and u positions. Summing over all t,w with their respective bounds, we get the desired expression for Pr(S _{ u−1}[u]=v). □
2.2 Proofs of the Keylength-Dependent Biases in (2)
Observation of the biases (2) was first reported in Ref. [28, Sect. 3], but without any proof. In this section, we present complete proofs of all these biases. Although the biases are all conditional in nature, for ease of understanding we first compute the associated joint probabilities and then discuss how the conditional probabilities can be computed. All the biases that we are interested in are related to \((S^{K}_{l+1}[l-1] = -l \wedge S^{K}_{l+1}[l] = 0)\). So we first derive the probability for this event.
Lemma 2
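Suppose that l is the length of the secret key of RC4. Then we have
\[\Pr\bigl(S^{K}_{l+1}[l-1] = -l \wedge S^{K}_{l+1}[l] = 0\bigr) \approx \frac{1}{N^{2}} + \Bigl(1-\frac{1}{N^{2}}\Bigr)\alpha_{l},\]
where \(\alpha_{l} = \frac{1}{N} (1-\frac{3}{N} )^{l-2} (1-\frac{l+1}{N} )\).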
Proof
The major path that leads to the target event is as follows.

In the first round of the KSA, when \(i^{K}_{1} = 0\) and \(j^{K}_{1} = K[0]\), the value 0 is swapped into the index S ^{K}[K[0]] with probability 1.

The index \(j^{K}_{1} = K[0] \notin\{l-1, l, -l\}\), so that the values l−1,l,−l at these indices respectively are not swapped out in the first round of the KSA. We also require K[0]∉{1,…,l−2}, so that the value 0 at index K[0] is not touched by these values of i ^{K} during the next l−2 rounds of the KSA. This happens with probability \((1-\frac{l+1}{N} )\).

From round 2 to l−1 (i.e., for i ^{K}=1 to l−2) of the KSA, none of \(j^{K}_{2}, \ldots, j^{K}_{l-1}\) touches the three indices {l,−l,K[0]}. This happens with probability \((1-\frac{3}{N} )^{l-2}\).

In round l of the KSA, when \(i^{K}_{l} = l-1\), \(j^{K}_{l}\) becomes −l with probability \(\frac{1}{N}\), thereby moving −l into index l−1.

In round l+1 of the KSA, when \(i^{K}_{l+1} = l\), \(j^{K}_{l+1}\) becomes \(j^{K}_{l} + S^{K}_{l}[l] + K[l] = -l + l + K[0] = K[0]\), and as discussed above, this index contains the value 0. Hence, after the swap, \(S^{K}_{l+1}[l] = 0\). Since K[0]≠l−1, we have \(S^{K}_{l+1}[l-1] = -l\).
Considering the above events to be independent, the probability that all of the above occur together is given by \(\alpha_{l} = \frac{1}{N} (1-\frac{3}{N} )^{l-2} (1-\frac{l+1}{N} )\). If the above path does not occur, then the target event happens due to random association, with probability \(\frac{1}{N^{2}}\), thus contributing a probability of \((1-\alpha_{l})\frac{1}{N^{2}}\). Adding the two contributions, the result follows. □
Now we may derive the joint probabilities associated with the conditional events of (2), as follows.
Theorem 2
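Suppose that l is the length of the secret key of RC4. Then we have
\[\Pr\bigl(S_{l}[l] = -l \wedge S_{l}[j_{l}] = 0\bigr) = \Pr\bigl(t_{l} = -l \wedge S_{l}[j_{l}] = 0\bigr) \approx \frac{1}{N^{2}} + \Bigl(1-\frac{1}{N^{2}}\Bigr)\beta_{l},\]
where \(\beta_{l} = \frac{1}{N} (1-\frac{1}{N} ) (1-\frac{2}{N} )^{N-3} (1-\frac{3}{N} )^{l-2} (1-\frac{l+1}{N} )\).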
Proof
From the proof of Lemma 2, consider the major path with probability α _{ l } for the event \((S^{K}_{l+1}[l-1] = -l \wedge S^{K}_{l+1}[l] = 0)\). For the remaining N−l−1 rounds of the KSA and for the first l−2 rounds of the PRGA (i.e., for a total of N−3 rounds), none of the values of j ^{K} (corresponding to the KSA rounds) or j (corresponding to the PRGA rounds) should touch the indices {l−1,l}. This happens with a probability of \((1-\frac{2}{N} )^{N-3}\).
Now, in round l−1 of PRGA, i _{ l−1}=l−1, from where the value −l moves to index j _{ l−1} due to the swap. In the next round, i _{ l }=l and j _{ l }=j _{ l−1}+S _{ l−1}[l]=j _{ l−1}, provided the value 0 at index l had not been swapped out by j _{ l−1}, the probability of which is \(1-\frac{1}{N}\). So during the next swap, the value −l moves from index j _{ l } to index l and the value 0 moves from index l to j _{ l }. The probability of the above major path leading to the event (S _{ l }[l]=−l∧S _{ l }[j _{ l }]=0) is given by \(\beta_{l} = \alpha_{l} (1-\frac{2}{N} )^{N-3} (1-\frac{1}{N} )\). If this path does not occur, then there is always a chance of \(\frac{1}{N^{2}}\) for the target event to happen due to random association. Adding the two contributions and substituting the value of α _{ l } from Lemma 2, the result follows.
Further, as t _{ l }=S _{ l }[l]+S _{ l }[j _{ l }], the event (S _{ l }[l]=−l∧S _{ l }[j _{ l }]=0) is equivalent to the event (t _{ l }=−l∧S _{ l }[j _{ l }]=0), and hence the result. □
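To get a feel for the size of the quantities appearing in Lemma 2 and Theorem 2, the following sketch evaluates the path probabilities α _{ l } and β _{ l } numerically from the expressions used in the two proofs. The function names are ours, and `joint_probability` simply adds the 1/N ^{2} random-association term for the case where the major path fails, as done at the end of both proofs.

```python
def alpha(l, n=256):
    """Probability of the major KSA path in the proof of Lemma 2."""
    return (1 / n) * (1 - 3 / n) ** (l - 2) * (1 - (l + 1) / n)


def beta(l, n=256):
    """Probability of the extended KSA+PRGA path in the proof of Theorem 2."""
    return alpha(l, n) * (1 - 2 / n) ** (n - 3) * (1 - 1 / n)


def joint_probability(path_probability, n=256):
    """Path probability plus the 1/N^2 random-association term when the path fails."""
    return path_probability + (1 - path_probability) / n**2


# beta_l expressed as a multiple of 1/N^2 for a few keylengths
scaled = {l: beta(l) * 256**2 for l in (5, 16, 32)}
```

With N=256, β _{ l } is a few tens of multiples of 1/N ^{2} and shrinks as l grows, so the joint events of Theorem 2 are far more likely than the 1/N ^{2} expected from random association.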
Theorem 3
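Suppose that l is the length of the secret key of RC4. Then we have
\[\Pr\bigl(Z_{l} = -l \wedge S_{l}[j_{l}] = 0\bigr) \approx \frac{1}{N^{2}} + \Bigl(1-\frac{1}{N^{2}}\Bigr)\gamma_{l},\]
where \(\gamma_{l} = \frac{1}{N^{2}} (1-\frac{l+1}{N} ) \sum_{x=l+1}^{N-1} (1-\frac{1}{N} )^{x} (1-\frac{2}{N} )^{x-l} (1-\frac{3}{N} )^{N-x+2l-4}\).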
Proof
From the PRGA update rule, we have j _{ l }=j _{ l−1}+S _{ l−1}[l]. Hence, S _{ l }[j _{ l }]=S _{ l−1}[l]=0 implies j _{ l }=j _{ l−1} as well as Z _{ l }=S _{ l }[S _{ l }[l]+S _{ l }[j _{ l }]]=S _{ l }[S _{ l−1}[j _{ l }]+0]=S _{ l }[S _{ l−1}[j _{ l−1}]]=S _{ l }[S _{ l−2}[l−1]]. Thus, the event (Z _{ l }=−l∧S _{ l }[j _{ l }]=0) is equivalent to the event (S _{ l }[S _{ l−2}[l−1]]=−l∧S _{ l−1}[l]=0).
From the proof of Lemma 2, consider the major path with probability α _{ l } for the joint event \((S^{K}_{l+1}[l-1] = -l \wedge S^{K}_{l+1}[l] = 0)\). This constitutes the first part of our main path leading to the target event. The second part, having probability \(\alpha'_{l}\), can be constructed as follows.

For an index x∈[l+1,N−1], we have \(S^{K}_{x}[x] = x\). This happens with probability \((1-\frac{1}{N} )^{x}\).

For the KSA rounds l+2 to x, the j ^{K} values do not touch the indices l−1 and l. This happens with probability \((1-\frac{2}{N} )^{x-l-1}\).

In round x+1 of KSA, when \(i^{K}_{x+1} = x\), \(j^{K}_{x+1}\) becomes l−1 with probability \(\frac{1}{N}\). Due to the swap, the value x moves to \(S^{K}_{x+1}[l-1]\) and the value −l moves to \(S^{K}_{x+1}[x] = S^{K}_{x+1}[S^{K}_{x+1}[l-1]]\).

For the remaining N−x−1 rounds of the KSA and for the first l−1 rounds of the PRGA, none of the j ^{K} or j values should touch the indices {l−1,S[l−1],l}. This happens with a probability of \((1-\frac{3}{N} )^{N-x+l-2}\).

So far, we have (S _{ l−1}[S _{ l−2}[l−1]]=−l∧S _{ l−1}[l]=0). Now, we should also have j _{ l }∉{l−1,S[l−1]} for S _{ l }[S _{ l−2}[l−1]]=S _{ l−1}[S _{ l−2}[l−1]]=−l. The probability of this condition is \((1-\frac{2}{N} )\).
Assuming all the individual events in the above path to be mutually independent, we get \(\alpha'_{l} = \frac{1}{N}\sum_{x=l+1}^{N-1} (1-\frac{1}{N} )^{x} (1-\frac{2}{N} )^{x-l} (1-\frac{3}{N} )^{N-x+l-2}\). Thus, the probability of the entire path is given by \(\gamma_{l} = \alpha_{l} \cdot\alpha'_{l} = \frac{1}{N^{2}} (1-\frac{l+1}{N} ) \sum_{x=l+1}^{N-1} (1-\frac{1}{N} )^{x} (1-\frac{2}{N} )^{x-l} (1-\frac{3}{N} )^{N-x+2l-4}\).
If this path does not occur, then there is always a chance of \(\frac {1}{N^{2}}\) for the target event to happen due to random association. Adding the two contributions, we get the result. □
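The composition of the path probability in this proof can be traced numerically. The sketch below, with function names of our own choosing, evaluates the first part α _{ l } (from the proof of Lemma 2) and the second part α′ _{ l } (the sum over x built up in the steps above), and multiplies them to obtain γ _{ l }.

```python
def alpha(l, n=256):
    """First part of the path: the major KSA path from the proof of Lemma 2."""
    return (1 / n) * (1 - 3 / n) ** (l - 2) * (1 - (l + 1) / n)


def alpha_prime(l, n=256):
    """Second part of the path: sum over the index x in [l+1, N-1]."""
    total = sum(
        (1 - 1 / n) ** x                     # S^K_x[x] = x
        * (1 - 2 / n) ** (x - l)             # indices l-1 and l left untouched
        * (1 - 3 / n) ** (n - x + l - 2)     # three indices untouched afterwards
        for x in range(l + 1, n)
    )
    return total / n                         # j^K_{x+1} = l-1 with probability 1/N


def gamma(l, n=256):
    """Probability of the entire path, gamma_l = alpha_l * alpha'_l."""
    return alpha(l, n) * alpha_prime(l, n)


gamma_16_scaled = gamma(16) * 256**2         # gamma_16 as a multiple of 1/N^2
```

For l=16 and N=256 this gives γ _{ l } close to 9/N ^{2}, i.e., the path contributes roughly nine times the probability of random association.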
In order to calculate the conditional probabilities of (2), we need to compute the marginals δ _{ l }=Pr(S _{ l }[j _{ l }]=0) and τ _{ l }=Pr(t _{ l }=−l). Our experimental observations reveal that in 5≤l≤32, δ _{ l } does not change much with l, and has a slightly negative bias: δ _{ l }≈1/N−1/N ^{2}. On the other hand, as l varies from 5 to 32, τ _{ l } changes approximately from 1.13/N to 1.08/N. We can derive the exact expression for δ _{ l } as a corollary to Theorem 1, and an expression for τ _{ l } using δ _{ l }.
Corollary 1
For any keylength l, with 3≤l≤N−1, the probability Pr(S _{ l }[j _{ l }]=0) is given by
Proof
Note that S _{ l }[j _{ l }] is assigned the value of S _{ l−1}[l] due to the swap in round l. Hence, by substituting u=l and v=0 in Theorem 1, we get the result. □
Theorem 4
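Suppose that l is the length of the secret key of RC4. Then we have
\[\Pr(t_{l} = -l) \approx \frac{1}{N^{2}} + \Bigl(1-\frac{1}{N^{2}}\Bigr)\beta_{l} + (1-\delta_{l})\frac{1}{N},\]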
where β _{ l } is given in Theorem 2 and δ _{ l } is given in Corollary 1.
Proof
We can write Pr(t _{ l }=−l)=Pr(t _{ l }=−l∧S _{ l }[j _{ l }]=0)+Pr(t _{ l }=−l∧S _{ l }[j _{ l }]≠0), where the first term is given by Theorem 2. When S _{ l }[j _{ l }]≠0, the event (t _{ l }=−l) can be assumed to occur due to random association. Hence the second term can be computed as \(\Pr(S_{l}[j_{l}] \neq0) \cdot\Pr(t_{l} = -l \mid S_{l}[j_{l}] \neq0) \approx(1-\delta_{l})\frac{1}{N}\). Adding the two terms, we get the result. □
Theoretical values for both δ _{ l } and τ _{ l } match closely with the experimental ones for all values of l.
Computing the Conditional Biases in ( 2 )
When we divide the joint probabilities Pr(S _{ l }[l]=−l∧S _{ l }[j _{ l }]=0) and Pr(t _{ l }=−l∧S _{ l }[j _{ l }]=0) of Theorem 2, and Pr(Z _{ l }=−l∧S _{ l }[j _{ l }]=0) of Theorem 3 by the appropriate marginals δ _{ l }=Pr(S _{ l }[j _{ l }]=0) of Corollary 1 and τ _{ l }=Pr(t _{ l }=−l) of Theorem 4, we get theoretical values for all the biases in (2). The theoretical values closely match with the experimental observations reported in the beginning of Sect. 2.
2.3 Bias in (Z _{ l }=−l) and Keylength Prediction from Keystream
First, we prove the bias in (3) and thereby show how to predict the length l of RC4 secret key. Next, we use the marginal probability Pr(Z _{ l }=−l) to derive the conditional probabilities of (1).
Theorem 5
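Suppose that l is the length of the secret key of RC4. Then we have
\[\Pr(Z_{l} = -l) \approx \frac{1}{N^{2}} + \Bigl(1-\frac{1}{N^{2}}\Bigr)\gamma_{l} + (1-\delta_{l})\frac{1}{N},\]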
where γ _{ l } is given in Theorem 3 and δ _{ l } is given in Corollary 1.
Proof
We can write Pr(Z _{ l }=−l)=Pr(Z _{ l }=−l∧S _{ l }[j _{ l }]=0)+Pr(Z _{ l }=−l∧S _{ l }[j _{ l }]≠0), where the first term is given by Theorem 3. When S _{ l }[j _{ l }]≠0, the event (Z _{ l }=−l) can be assumed to occur due to random association. Hence the second term can be computed as \(\Pr(S_{l}[j_{l}] \neq0) \cdot\Pr(Z_{l} = -l \mid S_{l}[j_{l}] \neq0) \approx(1-\delta_{l})\frac{1}{N}\). Adding the two terms, we get the result. □
It is important to note that the estimate of Pr(Z _{ l }=−l) is always greater than 1/N+1/N ^{2}≈0.003922 for N=256 and 5≤l≤32. In Fig. 4, we plot the theoretical as well as the experimental values of Pr(Z _{ l }=−l) against l for 5≤l≤32, where the experiments have been run over 1 billion trials of RC4 PRGA, with randomly generated keys.
Keylength Distinguisher
From this estimate, we immediately get a distinguisher of RC4 that can effectively distinguish the output keystream of the cipher from a random sequence of bytes. For the event E:(Z _{ l }=−l), the bias proved in Theorem 5 can be written as p(1+q), where p=1/N and q>1/N for 5≤l≤32 and N=256. Thus, the number of samples required to distinguish RC4 from a random sequence of bits with a constant probability of success is approximately \(\frac{1}{pq^{2}}\), which is at most \(N^{3}\). Using this distinguisher, one may predict the length l of RC4 secret key from the output keystream.
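The sample requirement of this distinguisher can be estimated numerically. The sketch below is a rough illustration under two assumptions of ours: Pr(Z _{ l }=−l) is assembled from the two terms in the proof of Theorem 5, and δ _{ l } is replaced by the experimentally observed value 1/N−1/N ^{2} quoted earlier instead of the exact expression of Corollary 1. The function names are hypothetical.

```python
def gamma(l, n=256):
    """Path probability gamma_l, using the closed form from Theorem 3."""
    total = sum(
        (1 - 1 / n) ** x * (1 - 2 / n) ** (x - l) * (1 - 3 / n) ** (n - x + 2 * l - 4)
        for x in range(l + 1, n)
    )
    return (1 / n**2) * (1 - (l + 1) / n) * total


def prob_z_l(l, n=256):
    """Estimate of Pr(Z_l = -l), following the two terms in the proof of Theorem 5."""
    g = gamma(l, n)
    joint = g + (1 - g) / n**2        # Pr(Z_l = -l and S_l[j_l] = 0)
    delta = 1 / n - 1 / n**2          # assumption: experimental value of delta_l
    return joint + (1 - delta) / n


def samples_to_distinguish(l, n=256):
    """Approximate sample count 1/(p*q^2) with p = 1/N and Pr(Z_l = -l) = p(1+q)."""
    p = 1 / n
    q = prob_z_l(l, n) / p - 1
    return 1 / (p * q * q)


estimates = {l: samples_to_distinguish(l) for l in (5, 16, 32)}
```

Under these assumptions the estimate stays well below N ^{3} for every keylength in the range 5≤l≤32, since q is noticeably larger than 1/N there.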
Proofs of the Keylength-Dependent Biases in ( 1 )
To prove the conditional biases in (1), we first compute the associated joint probabilities Pr(S _{ l }[j _{ l }]=0∧Z _{ l }=−l) and \(\Pr(S^{K}_{l+1}[l] = 0 \wedge Z_{l} = -l )\), and then use the marginal Pr(Z _{ l }=−l) to obtain the final results. The first joint probability is already computed in Theorem 3, and the second one is computed as follows.
Theorem 6
Suppose that l is the length of the secret key of RC4. Then we have
where α _{ l } is given in Lemma 2 and \(\alpha'_{l}\) is given in Theorem 3.
Proof
We consider the main path in this case to be the event \((S^{K}_{l+1}[l-1] = -l \wedge S^{K}_{l+1}[l] = 0)\), which occurs with probability \(\frac{1}{N^{2}} + (1 - \frac{1}{N^{2}} )\alpha_{l}\), as in Lemma 2. We also need to compute \(\Pr(S^{K}_{l+1}[l-1] = -l)\). Since i ^{K} in round l+1 has touched the index l, the value at this position can be assumed to be random. Thus, we may assume \(\Pr(S^{K}_{l+1}[l] = 0) \approx\frac {1}{N}\), and hence
Now, we may compute the main probability \(\Pr(Z_{l} = -l \wedge S^{K}_{l+1}[l] = 0)\), as follows:
From Lemma 2 and proof of Theorem 3, the first part is approximated by \((\frac{1}{N^{2}} + (1 - \frac{1}{N^{2}} )\alpha_{l} ) \cdot\alpha'_{l}\). In the second part, we assume that when \(S^{K}_{l+1}[l-1] \neq -l\), with probability \(1 - \frac{1}{N} - (1 - \frac {1}{N^{2}} )\alpha_{l}\), then the event \((Z_{l} = -l \wedge S^{K}_{l+1}[l] = 0)\) happens due to random association, with probability \(\frac{1}{N^{2}}\). Adding the contributions from the two parts as above, we obtain the result. □
If we divide Pr(S _{ l }[j _{ l }]=0∧Z _{ l }=−l) of Theorem 3 and \(\Pr(S^{K}_{l+1}[l] = 0 \wedge Z_{l} = -l )\) of Theorem 6 by Pr(Z _{ l }=−l) of Theorem 5, we get the desired conditional probabilities of (S _{ l }[j _{ l }]=0∣Z _{ l }=−l) and \((S^{K}_{l+1}[l] = 0 \mid Z_{l} = -l )\) respectively. These theoretical estimates closely match with our experimental observations. For example, in case of l=16, from simulations with 1 billion randomly generated secret keys, we obtained the experimental values of the above probabilities as 9.7/256 and 9.5/256 (approx.) respectively, whereas the theoretical values are close to 9.6/256 for both cases.
3 Biases Involving State Variables in Initial Rounds of RC4 PRGA
In this section, we discuss and prove some empirically observed biases that involve the state variables i,j and S along with the output keystream Z. We investigate some significant empirical biases discovered and reported by Sepehrdad, Vaudenay and Vuagnoux [30]. We provide theoretical justification only for the biases which are of the approximate order of 2/N or more, as in Table 1.
3.1 Bias at Specific Initial Rounds
In this section, we first prove the bias labeled “New_noz_014” in Ref. [30, Figs. 3 and 4] and Table 1.
Theorem 7
After the first round (r=1) of RC4 PRGA,
Proof
We have j _{1}+S _{1}[i _{1}]=S _{0}[1]+S _{0}[j _{1}]=S _{0}[1]+S _{0}[S _{0}[1]]. We compute the desired probability using the following two conditional paths depending on the value of j _{1}=S _{0}[1]:
□
If we consider the RC4 permutation after the KSA, the probabilities involving S _{0} in the expression for Pr(j _{1}+S _{1}[i _{1}]=2) should be evaluated using Proposition 1 and the joint probability should be estimated in the same manner as in Sect. 4.1.3, giving a total probability of approximately 1.937/N for N=256. This closely matches the observed value 1.94/N. If we assume that RC4 PRGA starts with a random initial permutation S _{0}, the probability turns out to be approximately 2/N−1/N ^{2}≈1.996/N for N=256, i.e., almost twice that of a random occurrence.
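For the random-permutation case, the event j _{1}+S _{1}[i _{1}]=2 can be checked by brute force on a toy state size. The sketch below is illustrative only: it assumes a uniformly random S _{0}, uses N=6 instead of 256 so that all permutations can be enumerated, and the function name `pr_theorem7_event` is hypothetical. It prints the exact enumerated probability next to the approximation 2/N−1/N ^{2} quoted above.

```python
from fractions import Fraction
from itertools import permutations

def pr_theorem7_event(N):
    """Exact Pr(j_1 + S_1[i_1] = 2 mod N) over all permutations S_0 of {0..N-1}."""
    hits = total = 0
    for S0 in permutations(range(N)):
        j1 = S0[1]                      # j_1 = S_0[1], since i_1 = 1 and j_0 = 0
        # After the swap of round 1, S_1[i_1] equals S_0[j_1].
        if (j1 + S0[j1]) % N == 2:
            hits += 1
        total += 1
    return Fraction(hits, total)

N = 6  # toy size; N = 256 is far too large for exhaustive enumeration
exact = pr_theorem7_event(N)
approx = Fraction(2, N) - Fraction(1, N * N)
print(exact, float(exact), float(approx))
```

At such a small N the two values agree only roughly; the gap shrinks as N grows.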
Next, we prove the biases “New_noz_007,” “New_noz_009” and “New_004,” as in Ref. [30] and Table 1.
Theorem 8
After the second round (r=2) of RC4 PRGA, the following probability relations hold between the index j _{2} and the state variables S _{2}[i _{2}],S _{2}[j _{2}]:
Proof
We have j _{2}+S _{2}[j _{2}]=(j _{1}+S _{1}[i _{2}])+S _{1}[i _{2}]=S _{0}[1]+2⋅S _{1}[2] in RC4 PRGA. Now for (5), we have the following paths depending on the value of j _{1}=S _{0}[1]:
We explore the conditional events in each of the above paths as follows:
To satisfy X+2⋅S _{1}[2]=6 in the second path, the value of X must be even and for each such value of X, the variable S _{1}[2] can take two different values, namely (3+N/2−X/2) and (3+N−X/2) modulo N. Thus, we have the following:
In case of (6), we have the following conditional paths depending on the value of S _{1}[2]:
In the first case, the condition holds with probability 1, since
For all other cases in the second path, with S _{1}[2]=X≠0, we can assume the condition to hold with probability approximately 1/N. Thus, we have:
For (7), the condition is almost identical to the condition of (6) apart from the inclusion of Z _{2}. However, our first path S _{1}[2]=0 gives Pr(Z _{2}=0∣S _{1}[2]=0)=1 (as in [18]), which implies the following:
In all other cases with S _{1}[2]≠0, we assume the conditions to match uniformly at random. Therefore:
□
In case of (5), if we assume S _{0} to be the initial state for RC4 PRGA, and substitute all probabilities involving S _{0} using Proposition 1, we get the total probability equal to 2.36/N for N=256. This value closely matches with the observed probability 2.37/N. If we assume S _{0} to be a random permutation in (5), we get probability 2/N−2/N ^{2}≈1.992/N for N=256. The theoretical results are summarized in Table 2 along with the experimentally observed probabilities from Ref. [30].
3.2 Round-Independent Biases at All Initial Rounds
In this section, we turn our attention to the biases labeled “New_noz_001” and “New_noz_002.” In Ref. [30] it was observed that both of these biases exist for all initial rounds (1≤r≤N−1) of RC4 PRGA. In Theorem 9 below, we prove a more general result. We show that actually these biases do not change with r and they continue to persist at the same order of 2/N at any arbitrary round of PRGA. Thus, the probabilities for “New_noz_001” and “New_noz_002” from Ref. [30] turn out to be special cases (for 1≤r≤N−1) of Theorem 9.
Theorem 9
At any round r≥1 of RC4 PRGA, the following two relations hold between the indices i _{ r },j _{ r } and the state variables S _{ r }[i _{ r }],S _{ r }[j _{ r }]:
Proof
We denote the events as E _{1}:(j _{ r }+S _{ r }[j _{ r }]=i _{ r }+S _{ r }[i _{ r }]) and E _{2}:(j _{ r }+S _{ r }[i _{ r }]=i _{ r }+S _{ r }[j _{ r }]). For both the events, we shall take the conditional paths as follows for computing the probabilities:
We have Pr(i _{ r }=j _{ r })≈1/N and Pr(E _{1}∣i _{ r }=j _{ r })=Pr(E _{2}∣i _{ r }=j _{ r })=1. In the case where i _{ r }≠j _{ r }, we have S _{ r }[j _{ r }]≠S _{ r }[i _{ r }], as S _{ r } is a permutation. Thus in case i _{ r }≠j _{ r }, the values of S _{ r }[i _{ r }] and S _{ r }[j _{ r }] can be chosen in N(N−1) ways (drawing from a permutation without replacement) to satisfy the events E _{1},E _{2}. This gives the total probability for each event E _{1},E _{2} approximately as:
□
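The counting argument in the proof of Theorem 9 can be reproduced mechanically. The following sketch works under the same idealization as the proof (i _{ r } and j _{ r } treated as uniform and independent, and the pair S _{ r }[i _{ r }], S _{ r }[j _{ r }] drawn without replacement when i _{ r }≠j _{ r }); the helper names are hypothetical and a small N is used to keep the enumeration cheap.

```python
from fractions import Fraction

def e1(i, j, a, b, N):
    """E1: j + S[j] = i + S[i], with a = S[i] and b = S[j]."""
    return (j + b) % N == (i + a) % N

def e2(i, j, a, b, N):
    """E2: j + S[i] = i + S[j], with a = S[i] and b = S[j]."""
    return (j + a) % N == (i + b) % N

def pr_event(N, event):
    """Idealised probability of the event over uniform (i, j) and distinct (S[i], S[j])."""
    pairs = [(a, b) for a in range(N) for b in range(N) if a != b]  # N(N-1) choices
    total = Fraction(0)
    for i in range(N):
        for j in range(N):
            if i == j:
                cond = Fraction(1)       # both events hold trivially when i = j
            else:
                good = sum(1 for a, b in pairs if event(i, j, a, b, N))
                cond = Fraction(good, len(pairs))
            total += cond * Fraction(1, N * N)
    return total

N = 8
print(pr_event(N, e1), pr_event(N, e2), Fraction(2, N))
```

Under these assumptions both events come out at exactly 2/N, independent of the round number, which is the order of bias stated in the theorem.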
Our theoretical results match the probabilities reported in Ref. [30, Fig. 2] for the initial rounds 1≤r≤N−1. One may note that the biases in Theorem 9 look somewhat similar to Jenkins’ correlations [12]:
However, the biases proved in Theorem 9 do not contain the keystream byte Z _{ r }, and one may check that the results do not follow directly from Jenkins’ correlations [12] either.
3.3 Round-Dependent Biases at All Initial Rounds
Next, we consider the biases that are labeled as “New_000,” “New_noz_004” and “New_noz_006” [30, Fig. 2]. We prove the biases for rounds 3 to 255 in RC4 PRGA, and we show that all of these decrease in magnitude with increase in r, as observed experimentally in Ref. [30].
The bias labeled “New_noz_006” in Ref. [30] can be derived as a corollary to Theorem 1 as follows.
Corollary 2
For PRGA rounds 3≤r≤N−1,
Proof
S _{ r }[j _{ r }] is assigned the value at S _{ r−1}[r] due to the swap in round r. Hence substituting u=r and v=i _{ r }=r in Theorem 1, we get the result. □
In Fig. 5, we illustrate the experimental observations (each data point represents the average obtained from over 100 million experimental runs with 16-byte key in each case) and the theoretical values for the distribution of Pr(S _{ r }[j _{ r }]=i _{ r }) over the initial rounds 3≤r≤255 of RC4 PRGA. It is evident that our theoretical formula, as derived in Corollary 2, matches the experimental observations.
Next we take a look at the other two round-dependent biases of RC4, observed in Ref. [30]. We state the related result in Theorem 10, corresponding to observations “New_noz_004” and “New_000.”
Theorem 10
For PRGA rounds 3≤r≤N−1,
Proof
We can write the two events under consideration as E _{3}:(S _{ r−1}[j _{ r }]=j _{ r }) and E _{4}:(S _{ r }[t _{ r }]=t _{ r }), where j _{ r } and t _{ r } can be considered as pseudorandom variables for all 3≤r≤N−1. We consider the following conditional paths for the first event E _{3}, depending on the range of values j _{ r } may take:
Case I: In this case, we assume that j _{ r } takes a value X between 0 and r−1. Each position in this range is touched by index i, and may also be touched by index j. Thus, irrespective of any initial condition, we may assume that Pr(E _{3}∣j _{ r }=X)≈1/N in this case. Hence, this part contributes:
$$\sum_{X = 0}^{r-1} \Pr(E_3 \mid j_r = X) \cdot \Pr(j_r = X) \approx \sum_{X=0}^{r-1} \frac{1}{N} \cdot \frac{1}{N} = \frac{r}{N^2}. $$
Case II: Here we suppose that j _{ r } assumes a value r≤X≤N−1. In this case, the probability calculation can be split into two paths, as follows:
If S _{1}[X]=X, similarly to the logic in Theorem 1, we get the following:
$$\Pr\bigl(E_3 \mid j_r = X \wedge S_1[X] = X \bigr) \cdot \Pr\bigl(S_1[X] = X\bigr) \approx \Pr\bigl(S_1[X] = X\bigr) \biggl(1 - \frac{1}{N} \biggr)^{r-2}. $$
If we suppose that S _{1}[u]=X for some u≠X, then one may note the following two subcases:
Subcase 2≤u≤r−1: The probability for this path is similar to that in the proof of Theorem 1:
$$\sum_{u = 2}^{r-1} \sum_{w = 0}^{r-u} \frac{\Pr(S_1[u] = r)}{w! \cdot N} \biggl( \frac{r-u-1}{N} \biggr)^{w} \biggl( 1 - \frac{1}{N} \biggr)^{r-3-w}. $$
Subcase r≤u≤N−1: In this case the value X will always be behind the position of i _{ r }=r, whereas X>r as per assumption, i.e., the value X can never reach index position X from initial position u. Thus the probability is 0 in this case.
Assuming Pr(j _{ r }=X)=1/N for all X, and combining all contributions from the above-mentioned cases, we get the value of Pr(S _{ r−1}[j _{ r }]=j _{ r })=Pr(S _{ r }[i _{ r }]=j _{ r }), as desired.
In case of Pr(S _{ r }[t _{ r }]=t _{ r }), t _{ r } is a random variable just like j _{ r }, and may take all values from 0 to N−1 with approximately the same probability 1/N. Thus we can approximate Pr(S _{ r }[t _{ r }]=t _{ r })≈Pr(S _{ r−1}[j _{ r }]=j _{ r }) to obtain the desired expression. □
Remark 1
The approximation Pr(S _{ r }[t _{ r }]=t _{ r })≈Pr(S _{ r−1}[j _{ r }]=j _{ r }), as in Theorem 10, is particularly close for higher values of r because the effect of a single state change from S _{ r−1} to S _{ r } is low in such a case. For smaller values of r, it is more accurate to approximate Pr(S _{ r−1}[t _{ r }]=t _{ r })≈Pr(S _{ r−1}[j _{ r }]=j _{ r }) and critically analyze the effect of the rth round of PRGA thereafter.
In Fig. 6, we show the experimental observations (averages taken over 100 million runs with 16-byte key) and the theoretical values for the distributions of Pr(S _{ r }[i _{ r }]=j _{ r }) and Pr(S _{ r }[t _{ r }]=t _{ r }) over the initial rounds 3≤r≤255 of RC4 PRGA. It is evident that our theoretical formulae closely match the experimental observations in both cases.
3.4 (Non)Randomness of j in the Initial Rounds
Two indices, i and j, are used in RC4 PRGA—the first is deterministic and the second one is pseudorandom. Index j depends on the values of i and S[i] simultaneously, and the pseudorandomness of the permutation S causes the pseudorandomness in j. In this section, we attempt to analyze the pseudorandom behavior of j more clearly.
In RC4 PRGA, we know that for r≥1, i _{ r }=r mod N and j _{ r }=j _{ r−1}+S _{ r−1}[i _{ r }], starting with j _{0}=0. Thus, we can recursively write the values of j at different rounds 1≤r≤N−1:
Nonrandomness of j _{1}
In the first round of PRGA, j _{1}=S _{0}[1] follows a probability distribution which is determined by S _{0}. According to Proposition 1, we have:
This clearly tells us that j _{1} is not random. This is also portrayed in Fig. 7.
Nonrandomness of j _{2}
In the second round of PRGA, however, we have j _{2}=S _{0}[1]+S _{1}[2], which demonstrates better randomness, as per the following discussion. We have:
The following cases may arise with respect to (10).

Case I: Suppose that j _{1}=S _{0}[1]=w=2. Then, S _{1}[i _{2}]=S _{1}[2]=S _{1}[j _{1}]=S _{0}[i _{1}]=S _{0}[1]=2. In this case, we have:
$$ \Pr(j_2 = v) = \left\{ \begin{array}{l@{\quad}l} \Pr(S_0[1] = 2), & \mbox{if}\ v = 4; \\ 0, & \mbox{otherwise.} \end{array} \right. $$ 
Case II: Suppose that j _{1}=S _{0}[1]=w≠2. Then S _{0}[2] will not get swapped in the first round, and hence S _{1}[2]=S _{0}[2]. In this case, Pr(S _{0}[1]=w∧S _{1}[2]=v−w)=Pr(S _{0}[1]=w∧S _{0}[2]=v−w).
We substitute the results obtained from these cases into (10) to obtain:
Equation (11) completely specifies the exact probability distribution of j _{2}, where the exact values of the probabilities Pr(S _{0}[x]=y) can be substituted from Proposition 1 with the adjustment as in Sect. 4.1.3 for estimating the joint probabilities. However, the expression suffices to exhibit the nonrandomness of j _{2} in the RC4 PRGA, having a large bias for v=4. We found that the theoretical probabilities from (11) match almost exactly with the experimental data plotted in Fig. 7. For the sake of clarity, we do not show the theoretical curve in Fig. 7.
Randomness of j _{ r } for r≥3
It is possible to compute the explicit probability distributions of \(j_{r} = \sum_{x=1}^{r} S_{x-1}[x]\) for 3≤r≤255 as well. We do not present the complicated expressions for Pr(j _{ r }=v) for r≥3 here, but it turns out that j _{ r } becomes closer to random as r increases.
The probability distributions of j _{1},j _{2} and j _{3} are shown in Fig. 7, where the experiments have been run over 1 billion trials of RC4 PRGA, with randomly generated keys of size 16 bytes. One may note that the randomness in j _{2} is more than that of j _{1} (apart from the case v=4), and j _{3} is almost uniformly random. This trend continues for the later rounds of PRGA as well. However, we do not plot the graphs for the probability distributions of j _{ r } with r≥4, as these distributions are almost identical to that of j _{3}, i.e., almost uniformly random in behavior.
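The index recursion j _{ r }=j _{ r−1}+S _{ r−1}[i _{ r }] analyzed in this section can be traced directly with a small RC4 implementation. The sketch below is a plain textbook KSA and PRGA for N=256, not the experimental code behind Fig. 7; the function name `rc4_trace` and the sample key are illustrative choices. Collecting the values of j _{1}, j _{2}, j _{3} from such traces over many random 16-byte keys is one way to reproduce the kind of histogram discussed above.

```python
def rc4_trace(key, rounds):
    """Run RC4 KSA followed by `rounds` PRGA steps; return a list of (i_r, j_r, Z_r)."""
    N = 256
    S = list(range(N))
    j = 0
    for i in range(N):                       # key scheduling (KSA)
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    i = j = 0
    trace = []
    for _ in range(rounds):                  # keystream generation (PRGA)
        i = (i + 1) % N                      # i_r = r mod N
        j = (j + S[i]) % N                   # j_r = j_{r-1} + S_{r-1}[i_r]
        S[i], S[j] = S[j], S[i]              # swap gives S_r
        z = S[(S[i] + S[j]) % N]             # Z_r = S_r[S_r[i_r] + S_r[j_r]]
        trace.append((i, j, z))
    return trace

trace = rc4_trace(b"Key", 10)
print([j for _, j, _ in trace[:3]])                 # j_1, j_2, j_3 for this key
print(bytes(z for _, _, z in trace).hex())          # first ten keystream bytes
```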
3.5 Correlation Between Z _{2} and S _{2}[2]
We now explore the bias in (j _{2}=4) more deeply and establish a correlation between the state S _{2} and the keystream. Let us first evaluate Pr(j _{2}=4):
Following Proposition 1 and the estimation of joint probabilities as in Sect. 4.1.3, the sum in the above expression evaluates approximately to 0.965268/N for N=256. Thus, we get:
This closely matches with our experimental observation, as depicted in Fig. 7. To exploit this bias in (j _{2}=4), we focus on the event (S _{2}[i _{2}]=4−Z _{2}) or (S _{2}[2]=4−Z _{2}), and prove the following.
Theorem 11
After completion of the second round of RC4 PRGA with N=256,
Proof
We can write Z _{2} in terms of the state variables as follows:
Thus, we can write the probability of the target event (S _{2}[2]=4−Z _{2}) as follows:
Computing the First Term
The probability for the first event can be calculated as follows:
In the last expression, the values taken from S _{1} are independent of the value of j _{2}, and thus the events (S _{1}[4]+S _{2}[y]=4) and (S _{1}[4]+S _{1}[2]=y) are both independent of the event (j _{2}=4). Also, if y=4, we obtain S _{1}[4]+S _{2}[y]=S _{1}[4]+S _{2}[4]=S _{1}[4]+S _{2}[j _{2}]=S _{1}[4]+S _{1}[i _{2}]=S _{1}[4]+S _{1}[2], which results in the events (S _{1}[4]+S _{2}[y]=4) and (S _{1}[4]+S _{1}[2]=y) being identical. In all other cases, we have S _{1}[4]+S _{2}[y]≠S _{1}[4]+S _{1}[2] and thus the values are chosen distinctly independent at random. Hence, we obtain:
Thus, the probability Pr(S _{1}[j _{2}]+S _{2}[S _{1}[j _{2}]+S _{1}[2]]=4∧j _{2}=4) for the first event turns out to be:
Computing the Second Term
The probability calculation follows a similar path:
The case y=x poses an interesting situation. On the one hand, we obtain S _{1}[x]+S _{2}[y]=S _{1}[x]+S _{2}[x]=S _{1}[x]+S _{2}[j _{2}]=S _{1}[x]+S _{1}[i _{2}]=S _{1}[x]+S _{1}[2]=4, while on the other hand, we get S _{1}[x]+S _{1}[2]=x≠4. We rule out this case to get Pr(S _{1}[j _{2}]+S _{2}[S _{1}[j _{2}]+S _{1}[2]]=4∧j _{2}≠4):
As before, the values taken from S _{1} are independent of the value of j _{2}, and thus the events (S _{1}[x]+S _{2}[y]=4) and (S _{1}[x]+S _{1}[2]=y) are both independent of the event (j _{2}=x).
If y=4, we have S _{1}[x]+S _{2}[4]=4, while S _{1}[x]+S _{1}[2]=4. One may note that S _{1}[4] does not get swapped to obtain S _{2}, as i _{2}=2 and j _{2}=x≠4. Thus, S _{2}[4]=S _{1}[4] and we get S _{1}[x]+S _{1}[4]=4 and S _{1}[x]+S _{1}[2]=4. This indicates S _{1}[4]=S _{1}[2], which is impossible as S _{1} is a permutation. All other cases (y≠4) deal with two distinct locations of the permutation S _{1}. Therefore, we obtain:
Thus, the probability Pr(S _{1}[j _{2}]+S _{2}[S _{1}[j _{2}]+S _{1}[2]]=4∧j _{2}≠4) of the second event turns out to be:
Calculation for Pr(S _{2}[2]=4−Z _{2})
Combining the probabilities for the first and second events, we get the following:
This establishes a correlation between the state byte S _{2}[2] and the keystream byte Z _{2}. For N=256, the result matches with our experimental data generated from 1 billion runs of RC4 with randomly selected 16-byte keys.
4 Biases in Keystream Bytes of RC4 PRGA
In the previous section, we discussed some biases involving the RC4 state variables S, i, j, during RC4 PRGA. A few of those biases involved the keystream bytes also. In this section, we concentrate on biases exhibited by RC4 keystream bytes towards constant values in {0,…,255}.
4.1 Probability Distribution of Z _{1}
Here we derive the complete probability distribution of the first RC4 keystream byte Z _{1}, as observed by Mironov [23, Fig. 6] in CRYPTO 2002. Before proceeding to prove the general result, we start with a specific case, namely, the negative bias of Z _{1} towards 0.
4.1.1 Negative Bias in Z _{1} Towards Zero
The special case of Z _{1}’s negative bias towards 0 is contained in the complete probability distribution of Z _{1} to be proved shortly. However, we present a separate proof for this special case because, unlike the proof for the complete case, this special case has a much simpler proof which reveals a different relationship of the RC4 state variables. This is elaborated further in Remark 2 later.
Theorem 12
Assume that the initial permutation S _{0} of RC4 PRGA is randomly chosen from the set of all permutations of {0,1,…,N−1}. Then the probability that the first output byte of RC4 keystream is 0 is approximately 1/N−1/N ^{2}.
Proof
We explore the probability Pr(Z _{1}=0) using the following conditional paths:
Case I: S _{0}[j _{1}]=0. Suppose that j _{1}=S _{0}[1]=X≠1 and S _{0}[j _{1}]=S _{0}[S _{0}[1]]=0. Then we have
as S _{0} is a permutation, where X and 0 belong to two different indices 1 and X. Thus, in this case we have Pr(Z _{1}=0∣S _{0}[j _{1}]=0)≈0.
Case II: S _{0}[j _{1}]≠0. In this case, output byte Z _{1} can be considered uniformly random, and thus
Combining the two cases, the total probability that the first output byte is 0 is given by
□
From Theorem 12, we immediately get a distinguisher of RC4 that can effectively distinguish the output keystream of the cipher from a random sequence of bytes. For the event E:(Z _{1}=0), the bias proved above can be written as p(1+q), where p=1/N and q=−1/N. The number of samples required to distinguish RC4 from a random sequence of bytes with a constant probability of success in this case is approximately N ^{3}.
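The negative bias of Theorem 12 can be observed exactly on a toy state size. The following sketch assumes a uniformly random initial permutation S _{0}, as in the theorem, and enumerates every permutation for N=6 (exhaustive enumeration is not feasible for N=256); the function name `pr_z1` is hypothetical. It prints the exact value of Pr(Z _{1}=0) together with the approximation 1/N−1/N ^{2} and the uniform value 1/N.

```python
from fractions import Fraction
from itertools import permutations

def pr_z1(N, v):
    """Exact Pr(Z_1 = v) when S_0 is a uniformly random permutation of {0..N-1}."""
    hits = total = 0
    for perm in permutations(range(N)):
        S = list(perm)
        i = 1
        j = S[i]                         # j_1 = S_0[1]
        S[i], S[j] = S[j], S[i]          # state after the first PRGA swap
        z = S[(S[i] + S[j]) % N]         # first keystream byte Z_1
        hits += (z == v)
        total += 1
    return Fraction(hits, total)

N = 6
exact = pr_z1(N, 0)
print(float(exact), float(Fraction(1, N) - Fraction(1, N * N)), float(Fraction(1, N)))
```

Even at this small size the exact probability falls below 1/N, in line with the direction of the bias.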
4.1.2 Complete Distribution of Z _{1}
In this section, we turn our attention to the complete probability distribution of the first byte Z _{1}. In Ref. [23, Fig. 6], the empirical plot of Z _{1} has a peculiar sine-curve-like pattern which is not observed for any other variables or events related to RC4. In Theorem 13, we theoretically derive this interesting distribution.
Theorem 13
The probability distribution of the first output byte of RC4 keystream is as follows, where v∈{0,…,N−1}, \(\mathcal{L}_{v} = \{ 0, 1, \ldots, N-1\} \setminus\{1, v\}\) and \(\mathcal{T}_{v, X} = \{0, 1, \ldots, N-1\} \setminus\{0, X, 1-X, v\}\).
Proof
The first output byte Z _{1} can be explicitly written as
where we denote j _{1}=S _{0}[1] by X and S _{0}[S _{0}[1]]=S _{0}[X] by Y. Thus, we have
Special Cases Depending on X,Y
Our goal is to write all probability expressions in terms of S _{0}. To express S _{1}[X+Y] in terms of S _{0}, we observe that the state S _{1} is different from S _{0} in at most two places, i _{1}=1 and j _{1}=X. Thus, we need to treat specially the case X+Y=1, which holds if and only if Y=1−X, and X+Y=X, which holds if and only if Y=0. Another special case to consider is X=1, which holds if and only if Y=X, where no swap occurs from S _{0} to S _{1}. These special cases result in the following values of Z _{1}:
In all other circumstances, we would have Z _{1}=S _{1}[X+Y]=S _{0}[X+Y]. Considering all the special cases as discussed above, we obtain Pr(Z _{1}=v) in terms of S _{0} as follows:
The first sum refers to the special case Y=1−X and the second one refers to Y=0. The special case X=1, which holds if and only if Y=X, merges to produce the third term, corresponding to the common point (X=1,Y=1). All other points on X=1 and Y=X are discarded. The last double summation term denotes all other general cases. One may refer to Fig. 8 to obtain a clearer exposition of the ranges of sums.
Special Cases Depending on v
The first summation term reduces to a single point (X=1−v,Y=v), as we fix 1−X=v and Y=1−X. The second summation, similarly, reduces to the point (X=v,Y=0). Furthermore, we have two impossible cases in the double summation:
Hence, the most general form for the probability Pr(Z _{1}=v) can be written as follows:
where Q _{ v }=Pr(S _{0}[1]=1−v∧S _{0}[1−v]=v)+Pr(S _{0}[1]=v∧S _{0}[v]=0)+Pr(S _{0}[1]=1∧S _{0}[2]=v).
Value of Q _{ v }
State S _{0} being a permutation, some of the probability terms in Q _{ v } are 0 when v takes particular values. We have the following three cases in this regard.

Case v=0: We have Q _{0}=Pr(S _{0}[1]=1∧S _{0}[1]=0)+Pr(S _{0}[1]=0∧S _{0}[0]=0)+Pr(S _{0}[1]=1∧S _{0}[2]=0)=Pr(S _{0}[1]=1∧S _{0}[2]=0), as S _{0} is a permutation.

Case v=1: We have Q _{1}=Pr(S _{0}[1]=0∧S _{0}[0]=1)+Pr(S _{0}[1]=1∧S _{0}[1]=0)+Pr(S _{0}[1]=1∧S _{0}[2]=1)=Pr(S _{0}[1]=0∧S _{0}[0]=1), as S _{0} is a permutation.

Case v≠0,1: Here we have no conflicts or special conditions as in the previous cases, and hence the general form of Q _{ v } holds.
Combining the general formula for Pr(Z _{1}=v) and all three cases for Q _{ v }, we obtain the desired theoretical probability distribution for the first output byte Z _{1}. □
4.1.3 Estimation of the Joint Probabilities and Numeric Values
We consider two special cases while computing the numeric values of Pr(Z _{1}=v). First, we investigate RC4 PRGA where S _{0} is fed from the output of RC4 KSA, as in practice. Next, we probe into the scenario when the initial permutation S _{0} is random.
Assume that the initial permutation S _{0} of RC4 PRGA is constructed from the regular KSA, i.e., the probabilities Pr(S _{0}[u]=v) follow the distribution mentioned in Proposition 1. However, we require the joint probabilities like Pr(S _{0}[1]=X∧S _{0}[X]=Y∧S _{0}[X+Y]=v) in our formula derived in Theorem 13, and we devise the following estimates for these joint probabilities.

Consider the joint probability Pr(S _{0}[u]=v∧S _{0}[u′]=v′) where u≠u′ and v≠v′. We can represent this by Pr(S _{0}[u]=v∧S _{0}[u′]=v′)=Pr(S _{0}[u]=v)⋅Pr(S _{0}[u′]=v′∣S _{0}[u]=v). The first term is estimated directly from Proposition 1. For the second term, S _{0}[u]=v ⇒ S _{0}[u′]≠v. Thus we normalize Pr(S _{0}[u′]=v) and estimate the second term as
$$\Pr\bigl(S_0\bigl[u'\bigr] = v' \mid S_0[u] = v\bigr) \approx\Pr\bigl(S_0 \bigl[u'\bigr] = v'\bigr) + \frac{\Pr(S_0[u'] = v)}{N-1}. $$
For the joint probability Pr(S _{0}[u]=v∧S _{0}[u′]=v′∧S _{0}[u″]=v″), we can represent it by Pr(S _{0}[u]=v)⋅Pr(S _{0}[u′]=v′∣S _{0}[u]=v)⋅Pr(S _{0}[u″]=v″∣S _{0}[u′]=v′∧S _{0}[u]=v). The first term comes from Proposition 1 and the second term as above. The third term is estimated as
$$\begin{aligned} &\Pr\bigl(S_0\bigl[u''\bigr] = v'' \mid S_0\bigl[u'\bigr] = v' \wedge S_0[u] = v\bigr) \\ &\quad \approx\Pr \bigl(S_0\bigl[u''\bigr] = v''\bigr) + \frac{\Pr(S_0[u''] = v')}{N-2} + \frac{\Pr(S_0[u''] = v)}{N-2}. \end{aligned}$$
We compute the theoretical values of Pr(Z _{1}=v) using Theorem 13 and Proposition 1, along with the estimations for joint probabilities discussed above. Figure 9 shows the theoretical and experimental probability distributions of Z _{1}, where the experimental data is generated over 100 million runs of RC4 PRGA using 16-byte secret keys. The figure clearly shows that our theoretical justification closely matches the experimental data, and justifies the observation by Mironov [23].
As an alternative to the additive correction described above for estimating the conditionals, one may consider multiplicative correction by normalizing the probabilities as follows:

Estimate Pr(S _{0}[u′]=v′∣S _{0}[u]=v) as \(\frac{\Pr(S_{0}[u'] = v')}{1 - \Pr(S_{0}[u'] = v)}\).

Estimate Pr(S _{0}[u″]=v″∣S _{0}[u′]=v′∧S _{0}[u]=v) as \(\frac{\Pr(S_{0}[u''] = v'')}{1 - \Pr(S_{0}[u''] = v') - \Pr(S_{0}[u''] = v)}\).
We found that the numeric values of Pr(Z _{1}=v) estimated using the two different models (additive and multiplicative) almost coincide and the graphs fall right on top of one another.
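The two correction models for the first conditional probability can be written down compactly. The sketch below is illustrative: `P` stands for a table of marginals with `P[u][v]`=Pr(S _{0}[u]=v), which in practice would be filled from Proposition 1 (not reproduced here), and the function names are hypothetical. As a sanity check it uses uniform marginals 1/N, for which both models should return the exact without-replacement value 1/(N−1).

```python
from fractions import Fraction

def cond_additive(P, u2, v2, v, N):
    """Additive estimate of Pr(S0[u2] = v2 | S0[u] = v)."""
    return P[u2][v2] + P[u2][v] / (N - 1)

def cond_multiplicative(P, u2, v2, v):
    """Multiplicative (renormalised) estimate of Pr(S0[u2] = v2 | S0[u] = v)."""
    return P[u2][v2] / (1 - P[u2][v])

N = 8
# Assumed uniform marginals, standing in for the Proposition 1 values.
P = [[Fraction(1, N) for _ in range(N)] for _ in range(N)]

print(cond_additive(P, 3, 5, 2, N), cond_multiplicative(P, 3, 5, 2))
```

With nonuniform marginals the two estimates differ slightly, which is consistent with the remark above that the resulting curves almost coincide.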
If the initial permutation S _{0} of RC4 PRGA is considered to be random, then we would have Pr(S _{0}[u]=v)≈1/N for all u,v, and the joint probabilities can be computed directly (samples drawn without replacement). Substituting all the relevant probability values, we get
which is almost a uniform distribution for 2≤v≤255. The dashed line in Fig. 9 shows the graph for this theoretical distribution, and it closely matches our experimental data as well (we omit the experimental curve for random S _{0} as it coincides with the theoretical one).
Remark 2
Theorem 12 is the special case v=0 of Theorem 13 and hence may seem redundant. However, we would like to point out that the former has a simple and straightforward proof assuming S _{0} to be random and the latter has a rigorous general proof without any assumption on S _{0}. The result of Theorem 12 signifies that this negative bias is not an artifact of nonrandom S _{0} produced by RC4 KSA; rather, it would be present even if one starts PRGA with a uniform random permutation.
4.2 Biases of Keystream Bytes 3 to 255 Towards Zero
In FSE 2001, Mantin and Shamir [18] proved the famous 2/N bias towards the value 0 for the second byte of RC4 keystream. In addition, they made the following claims:

MSClaim1: \(\Pr(Z_{r} = 0) = \frac{1}{N}\) at PRGA rounds 3≤r≤255.

MSClaim2: \(\Pr(Z_{r} = 0 \mid j_{r} = 0) > \frac{1}{N}\) and \(\Pr(Z_{r} = 0 \mid j_{r} \neq0) < \frac{1}{N}\) for 3≤r≤255.
It is reasoned in Ref. [18] that the two biases in MSClaim2 cancel each other to produce no bias in the event (Z _{ r }=0) in rounds 3 to 255, thereby justifying MSClaim1. In this section, contrary to MSClaim1, we show (in Theorem 14) that \(\Pr(Z_{r} = 0) > \frac{1}{N}\) for all rounds r from 3 to 255.
To prove the main result, we will require Corollary 2. For ease of reference, we restate another version of this corollary below.
Corollary 2
For PRGA rounds 3≤r≤N−1,
Theorem 14
For PRGA rounds 3≤r≤N−1, the probability that Z _{ r }=0 is given by
Proof
The expression for c _{ r } has an extra term \(( \frac{N-2}{N-1} )\) in the case r=3, and everything else is the same as in the general formula for 4≤r≤N−1. We shall first prove the general formula for 4≤r≤N−1, and then justify the extra term for the special case r=3. We may write:
We will use Z _{ r }=S _{ r }[S _{ r }[i _{ r }]+S _{ r }[j _{ r }]]=S _{ r }[S _{ r }[r]+S _{ r−1}[i _{ r }]]=S _{ r }[S _{ r }[r]+S _{ r−1}[r]].
Calculation of Pr(Z _{ r }=0∧S _{ r−1}[r]=r)
In this case, Z _{ r }=0 ⇒ S _{ r }[S _{ r }[r]+r]=0, and thus:
Now the events (S _{ r }[x+r]=0) and (S _{ r }[r]=x) are both independent of (S _{ r−1}[r]=r), as a state update has occurred in the process, and S _{ r−1}[r]=r is one of the values that got swapped. Hence,
We note that if there exists any bias in the event (S _{ r }[x+r]=0), then it must propagate from a similar bias in (S _{0}[x+r]=0), as was the case for (S _{ r−1}[r]=r) in Corollary 2. However, Pr(S _{0}[x+r]=0)=1/N by Proposition 1, and thus we assume Pr(S _{ r }[x+r]=0)≈1/N as well. For Pr(S _{ r }[r]=x∣S _{ r }[x+r]=0), we have the following two cases:
and
Moreover, in the second case, the value of S _{ r }[r] is independent of S _{ r−1}[r] because the position r=i _{ r } got swapped to generate S _{ r } from S _{ r−1}. Thus we have:
Combining all the above probability values, we get
Calculation of Pr(Z _{ r }=0∧S _{ r−1}[r]≠r)
Similarly to the previous case, we can derive
In the above expression, we have
which is a contradiction. Moreover, the events (S _{ r }[x+y]=0) and (S _{ r }[r]=x) are both independent of (S _{ r−1}[r]=y), as S _{ r−1}[r] got swapped in the state update. Thus we get:
Similarly to the derivation of (13), we obtain:
The only difference occurs in the case x=0. Here we get
which is a contradiction as y≠r are distinct locations in the permutation S _{ r }. In all other cases (x≠0), the argument is same as before. Combining the above probabilities, we get:
Calculation for Pr(Z _{ r }=0)
Combining (12), (14) and (16), we obtain
where \(c_{r} = \frac{N}{N-1} ( N \cdot\Pr(S_{r-1}[r] = r) - 1 )\), as required in the general case.
Special Case for r=3
The expression for Pr(Z _{ r }=0∧S _{ r−1}[r]=r) is identical to that in the general case, that is, the same as in (14). However, for Pr(Z _{ r }=0∧S _{ r−1}[r]≠r) we have a special case. For r=3, if S _{ r−1}[r]=S _{2}[3]=0, we have j _{3}=j _{2}+S _{2}[3]=j _{2}, and thus
This poses a contradiction, as S _{0}[1]=S _{1}[0]=0 can only produce S _{2}[i _{2}]=S _{2}[2]=0 in the case j _{2}=0, and may never result in S _{2}[3]=0. Thus, for r=3, (16) changes as follows:
This gives rise to the special expression of \(c_{r} = \frac{N}{N-1} ( N \cdot\Pr(S_{r-1}[r] = r) - 1 ) - \frac{N-2}{N-1}\).
The extra term does not appear in the general case 4≤r≤N−1, because we have
which does not pose any contradiction for r>3, as we can assume j _{ r−2} to be random and independent of the condition S _{ r−1}[r]=y=0 in these cases. □
Corollary 3
For N=256 and 3≤r≤255, the probability Pr(Z _{ r }=0) is bounded as follows:
Numerical calculation of c _{ r } for N=256 and 3≤r≤255 gives that c _{ r } decreases for 4≤r≤255 (as in Fig. 10). Thus, c _{4}=1.337057≥c _{ r }≥0.242811=c _{255} for 4≤r≤255, and the special case c _{3}=0.351089 for r=3 also falls within the same bounds. Hence the bounds on Pr(Z _{ r }=0).
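The numeric bounds can be recomputed from the two extreme values of c _{ r } quoted above. This is a small sketch under one assumption: that the probability in Theorem 14 has the form 1/N+c _{ r }/N ^{2}, which is how the bias is parameterized (p=1/N, q=c _{ r }/N) in the distinguisher discussion of this section. The constant and function names are hypothetical.

```python
N = 256
C_MAX = 1.337057   # c_4, as quoted in the text
C_MIN = 0.242811   # c_255, as quoted in the text

def pr_zero_from_c(c, n=N):
    """Pr(Z_r = 0) under the assumed form 1/N + c_r/N^2."""
    return 1.0 / n + c / (n * n)

lower = pr_zero_from_c(C_MIN)
upper = pr_zero_from_c(C_MAX)
print(f"{lower:.10f} <= Pr(Z_r = 0) <= {upper:.10f}  (uniform: {1.0 / N:.10f})")
```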
Figure 11 depicts a comparison between the theoretical and experimental values of Pr(Z _{ r }=0) plotted against r, where N=256 and 3≤r≤255, and the experimentation is performed over 1 billion runs of RC4, each with a randomly generated 16-byte key.
Let E _{ r } denote the event (Z _{ r }=0) for some 3≤r≤255. If we write p=1/N and q=c _{ r }/N, then to distinguish RC4 keystream from a random sequence based on event E _{ r }, one would need a number of samples of the order of (1/N)^{−1}⋅(c _{ r }/N)^{−2}∼N ^{3}. It will be interesting to see if one can combine the effect of all these distinguishers to have a stronger one.
In this section, we have contradicted MSClaim1 by proving the biases in Pr(Z _{ r }=0) for all 3≤r≤255. If the supporting statement MSClaim2 was correct, then one would have a positive bias \(\Pr(Z_{r} = 0 \mid j_{r} = 0) > \frac{1}{N}\). However, we have run extensive experiments to confirm that \(\Pr(Z_{r} = 0 \mid j_{r} = 0) \approx\frac{1}{N}\), thereby contradicting MSClaim2 as well.
4.2.1 Guessing State Information Using the Bias in Z _{ r }
Mantin and Shamir [18] used the bias of the second byte of RC4 keystream to guess some information regarding S _{0}[2], based on the following:
Note that in the above expression, no randomness assumption is required to obtain Pr(S _{0}[2]=0)=1/N. This probability is exact and can be derived by substituting u=2,v=0 in Proposition 1. Hence, on every occasion we obtain Z _{2}=0 in the keystream, we can guess S _{0}[2] with probability 1/2, and this is significantly more than a random guess with probability 1/N.
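The guess probability of 1/2 follows from a one-line Bayes inversion, which can be spelled out as follows. The sketch assumes the values used in this discussion: Pr(Z _{2}=0∣S _{0}[2]=0)≈1, Pr(S _{0}[2]=0)=1/N, and Pr(Z _{2}=0)≈2/N from the second-byte bias of Ref. [18]; the function name `posterior` is hypothetical.

```python
from fractions import Fraction

def posterior(likelihood, prior, evidence):
    """Bayes' rule: Pr(A | B) = Pr(B | A) * Pr(A) / Pr(B)."""
    return likelihood * prior / evidence

N = 256
guess = posterior(
    likelihood=Fraction(1),      # Pr(Z_2 = 0 | S_0[2] = 0), taken as approximately 1
    prior=Fraction(1, N),        # Pr(S_0[2] = 0)
    evidence=Fraction(2, N),     # Pr(Z_2 = 0), the 2/N bias of the second byte
)
print(guess, "versus a random guess of", Fraction(1, N))
```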
In this section, we use the biases in bytes 3 to 255 (observed in Theorem 14) to extract similar information about the state array S _{ r−1} using the RC4 keystream byte Z _{ r }. In particular, we try to explore the conditional probability Pr(S _{ r−1}[r]=r∣Z _{ r }=0) for 3≤r≤255, as follows:
In the above expression, c _{ r } is as in Theorem 14, and one may write:
In Fig. 12, we plot the theoretical values of Pr(S _{ r−1}[r]=r∣Z _{ r }=0) for 3≤r≤255 and N=256, and the corresponding experimental values over 1 billion runs of RC4 with random 16-byte keys. It clearly shows that all values of Pr(S _{ r−1}[r]=r∣Z _{ r }=0) for N=256 and 3≤r≤255 (both theoretical and experimental) are greater than 2/N. Thus, one can guess S _{ r−1}[r] with probability more than twice that of a random guess, every time we obtain Z _{ r }=0 in the keystream.
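An experiment of the kind described above can be sketched as follows. This is a minimal illustration and not the code used for Fig. 12: the helper names (`ksa`, `state_and_output`, `estimate_conditional`) are hypothetical, keys are drawn with `os.urandom`, and a meaningful estimate for 3≤r≤255 needs far more trials than is practical in pure Python.

```python
import os


def ksa(key, n=256):
    """RC4 key scheduling: returns the permutation S_0."""
    s = list(range(n))
    j = 0
    for i in range(n):
        j = (j + s[i] + key[i % len(key)]) % n
        s[i], s[j] = s[j], s[i]
    return s


def state_and_output(key, r, n=256):
    """Return (S_{r-1}, Z_r) for r >= 1: the state before round r and the r-th output byte."""
    s = ksa(key, n)
    i = j = 0
    for _ in range(r - 1):            # PRGA rounds 1 .. r-1
        i = (i + 1) % n
        j = (j + s[i]) % n
        s[i], s[j] = s[j], s[i]
    before = list(s)                  # snapshot of S_{r-1}
    i = (i + 1) % n                   # PRGA round r
    j = (j + s[i]) % n
    s[i], s[j] = s[j], s[i]
    return before, s[(s[i] + s[j]) % n]


def estimate_conditional(r, value, trials, keylen=16, n=256):
    """Empirical estimate of Pr(S_{r-1}[r] = value | Z_r = 0) over random keys."""
    hits = zero_outputs = 0
    for _ in range(trials):
        state, z = state_and_output(os.urandom(keylen), r, n)
        if z == 0:
            zero_outputs += 1
            hits += (state[r % n] == value)
    return hits / zero_outputs if zero_outputs else float("nan")
```

Calling `estimate_conditional(2, 0, trials)` targets the Mantin and Shamir case Pr(S _{1}[2]=0∣Z _{2}=0), where the state before round 2 is S _{1}, while `estimate_conditional(r, r, trials)` targets the quantity plotted in Fig. 12.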
Remark 3
In proving Corollary 2, we use the initial condition S _{1}[r]=r to branch out the probability paths, and not S _{0}[r]=r as in Ref. [16, Lemma 1]. This is because the probability of S[r]=r takes a leap from around 1/N in S _{0} to about 2/N in S _{1}, and this turns out to be the actual cause behind the bias in S _{ r−1}[r]=r. Consideration of this issue eventually corrects the mismatches observed in the graphs of Ref. [16, Figs. 2 and 3]. Note that Theorem 14, Fig. 11 and Fig. 12 are, respectively, the corrected versions of Theorem 1, Fig. 2 and Fig. 3 in Ref. [16].
4.2.2 Attacking the RC4 Broadcast Scheme
We revisit the famous attack of Mantin and Shamir [18] on broadcast RC4, where the same plaintext is encrypted using multiple secret keys, and then the ciphertexts are broadcast to a group of recipients. In Ref. [18], the authors propose a practical attack against an RC4 implementation of the broadcast scheme, based on the bias observed in the second keystream byte. They prove that an attacker who collects Ω(N) ciphertexts corresponding to the same plaintext M can easily deduce the second byte of M by exploiting the bias in Z _{2}.
Along similar lines, we may exploit the bias observed in bytes 3 to 255 of the RC4 keystream to mount a similar attack on the RC4 broadcast scheme. Notice that we obtain a bias of the order of 1/N ^{2} in each of the bytes Z _{ r } where 3≤r≤255. Thus, roughly speaking, if the attacker obtains about N ^{3} ciphertexts corresponding to the same plaintext M (from the broadcast scheme), then he can check the frequency of occurrence of bytes to deduce the rth (3≤r≤255) byte of M. We can formally state our result (analogous to Ref. [18, Theorem 3]) as follows.
Theorem 15
Let M be a plaintext, and let C _{1},C _{2},…,C _{ w } be the RC4 encryptions of M under w uniformly distributed keys. Then if w=Ω(N ^{3}), the bytes 3 to 255 of M can be reliably extracted from C _{1},C _{2},…,C _{ w }.
Proof
Recall from Theorem 14 that Pr(Z _{ r }=0)≈1/N+c _{ r }/N ^{2} for all 3≤r≤255 in RC4. Thus, for each encryption key chosen during broadcast, the rth plaintext byte M[r] is XORed with 0 with probability 1/N+c _{ r }/N ^{2}. Due to this bias, a (1/N+c _{ r }/N ^{2}) fraction of the rth ciphertext bytes will have the same value as the rth plaintext byte. When w=Ω(N ^{3}), the attacker can identify the most frequent byte in C _{1}[r],C _{2}[r],…,C _{ w }[r] as M[r] with constant probability of success. □
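The frequency-counting step in the proof above can be sketched as follows. This is a minimal illustration under our own naming (`rc4_keystream`, `rc4_encrypt` and `recover_byte` are hypothetical helpers, not part of any library), with byte positions 1-indexed to match Z _{ r }. It shows the mechanics only and does not reproduce the success probability.

```python
from collections import Counter


def rc4_keystream(key, length, n=256):
    """Standard RC4 (KSA followed by PRGA); returns the bytes Z_1, ..., Z_length."""
    s = list(range(n))
    j = 0
    for i in range(n):
        j = (j + s[i] + key[i % len(key)]) % n
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = []
    for _ in range(length):
        i = (i + 1) % n
        j = (j + s[i]) % n
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % n])
    return bytes(out)


def rc4_encrypt(key, plaintext):
    """XOR the plaintext with the RC4 keystream generated from key."""
    ks = rc4_keystream(key, len(plaintext))
    return bytes(p ^ z for p, z in zip(plaintext, ks))


def recover_byte(ciphertexts, r):
    """Guess M[r] as the most frequent value of C_1[r], ..., C_w[r] (r is 1-indexed)."""
    counts = Counter(c[r - 1] for c in ciphertexts)
    return counts.most_common(1)[0][0]
```

For r=2 the guess becomes reliable with a number of ciphertexts linear in N, as in Ref. [18]. For 3≤r≤255 the same function applies unchanged, but the number of ciphertexts required grows to the order of N ^{3}.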
The attack on broadcast RC4 is applicable to many modern Internet protocols (such as group emails encrypted under different keys, groupware multiuser synchronization, etc.). Note that Mantin and Shamir’s attack [18] works at the byte level. It can recover only the second byte of the plaintext under some assumptions. On the other hand, our attack can recover an additional 253 bytes (namely, bytes 3 to 255) of the plaintext as well.
4.3 A New Long-Term Bias in RC4 Keystream
The biases discussed so far are prevalent in the initial bytes of the RC4 keystream, and are generally referred to as the short-term biases. It is a common practice to discard a few hundred initial bytes of the keystream to avoid these biases, and this motivates the search for long-term biases in RC4 that are present even after discarding an arbitrary number of initial bytes.
The first result in this direction was observed in 1997 by Golic [8], where a certain correlation was found between the least significant bits of the two non-consecutive output bytes Z _{ r } and Z _{ r+2}, for all rounds r of RC4. In 2000, a set of results was proposed by Fluhrer and McGrew [6], where the biases depend upon the frequency of occurrence of certain digraphs in the RC4 keystream. Later in 2005, Mantin [19] improved these to obtain the \(AB\mathcal{S}AB\) distinguisher, which depends on the repetition of digraph AB in the keystream after a gap of string \(\mathcal{S}\) having G bytes. This is the best long-term distinguisher of RC4 to date. In 2008, Basu et al. [2] identified another conditional long-term bias, depending on the relationship between two consecutive bytes in the keystream.
In this section, we prove that the event (Z _{ wN+2}=0∧Z _{ wN }=0) is positively biased for all w≥1. After the first long-term bias observed by Golic [8] in 1997, this is the only one that involves non-consecutive bytes of RC4 keystream. Golic [8] proved a strong bitwise correlation between the least significant bits of Z _{ wN } and Z _{ wN+2}, while we prove a bytewise correlation between Z _{ wN } and Z _{ wN+2}, as follows.
Theorem 16
For any integer w≥1, assume that the permutation S _{ wN } is randomly chosen from the set of all possible permutations of {0,…,N−1}. Then \(\Pr(Z_{wN+2} = 0 \wedge Z_{wN} = 0) \approx \frac{1}{N^{2}} + \frac{1}{N^{3}}\).
Proof
The positive bias in Z _{2}, proved in Ref. [18], propagates to round (wN+2) if j _{ wN }=0. Mantin and Shamir’s observation [18, Theorem 1] implies
\(\Pr(Z_{wN+2} = 0 \mid j_{wN} = 0) \approx \frac{2}{N}.\) (18)
If j _{ wN }≠0, we observe that Z _{ wN+2} does not take the value 0 by uniform random association. In particular, we get the following:
For Z _{ wN }, we have i _{ wN }=0, and when j _{ wN }=0 (this happens with probability 1/N), no swap takes place and the output is Z _{ wN }=S _{ wN }[2⋅S _{ wN }[0]]. Two cases may arise from here. If S _{ wN }[0]=0, then Z _{ wN }=S _{ wN }[0]=0 for sure. Otherwise if S _{ wN }[0]≠0, the output Z _{ wN } takes the value 0 only due to random association. Combining the cases,
\(\Pr(Z_{wN} = 0 \mid j_{wN} = 0) \approx \frac{1}{N} \cdot 1 + \left(1 - \frac{1}{N}\right) \cdot \frac{1}{N} = \frac{2}{N} - \frac{1}{N^{2}}.\) (20)
Similarly to Pr(Z _{ wN+2}=0∣j _{ wN }≠0), it is easy to show that
Now, we may compute the joint probability Pr(Z _{ wN+2}=0∧Z _{ wN }=0), which is equal to
Given j _{ wN }=0, the random variables Z _{ wN+2} and Z _{ wN } can be considered independent. Using (18) and (20), we get Pr(Z _{ wN+2}=0∧Z _{ wN }=0∧j _{ wN }=0) as
Using (19) and (21), one has Pr(Z _{ wN+2}=0∧Z _{ wN }=0∧j _{ wN }≠0) as
Adding the two expressions, we have Pr(Z _{ wN+2}=0∧Z _{ wN }=0)≈1/N ^{2}+1/N ^{3}. □
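The no-swap case used in the proof (i _{ wN }=0 and j _{ wN }=0, giving Z _{ wN }=S _{ wN }[2⋅S _{ wN }[0]]) and the event counted by Theorem 16 can be illustrated with a short sketch. The helper names `prga_round` and `count_pair_event` are hypothetical, and the toy permutation on N=8 is chosen only to keep the example readable. Verifying the bias itself would require a very large number of keystream bytes.

```python
def prga_round(state, i_prev, j_prev):
    """One RC4 PRGA round on a copy of state; returns (new_state, i, j, z)."""
    n = len(state)
    s = list(state)
    i = (i_prev + 1) % n
    j = (j_prev + s[i]) % n
    s[i], s[j] = s[j], s[i]
    return s, i, j, s[(s[i] + s[j]) % n]


def count_pair_event(keystream, n):
    """Count w >= 1 with Z_{wN} = 0 and Z_{wN+2} = 0, where keystream[k] holds Z_{k+1}.

    Returns (hits, trials), so hits / trials estimates the joint probability.
    """
    hits = trials = 0
    w = 1
    while w * n + 1 < len(keystream):
        trials += 1
        if keystream[w * n - 1] == 0 and keystream[w * n + 1] == 0:
            hits += 1
        w += 1
    return hits, trials


# Toy state on N = 8 with indices chosen so that the round ends with i = j = 0.
toy = [3, 1, 4, 0, 5, 2, 7, 6]
new_state, i, j, z = prga_round(toy, i_prev=7, j_prev=(0 - toy[0]) % 8)
print(i, j, new_state == toy, z == toy[(2 * toy[0]) % 8])
```

In the toy round, i and j both become 0, the state is unchanged, and the output equals the entry at index 2⋅S[0] mod N, as used in the proof.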
This is the first long-term bytewise correlation to be observed between two non-consecutive bytes (Z _{ wN },Z _{ wN+2}). The gap between the related bytes in this case is 1, and we could not find any other significant long-term bias with this gap. An interesting direction for experimentation and analysis would be to look for similar long-term biases with larger gaps between the related bytes.
5 Conclusion
In this paper, we have explored several classes of non-random events in RC4, ranging from key correlations to keystream-based distinguishers, and from short-term biases to long-term non-randomness.
Keylength-Dependent Non-Randomness [Sect. 2]
In practice, RC4 uses a small secret key of length l that is typically much less than the permutation size N, and this is the source of several key-correlations and biases in the keystream. However, no biases that depend on the length l of the secret key had been reported in the literature. In this paper, we demonstrate the first keylength-dependent biases in the RC4 literature. In the process, we prove all the empirical biases used to mount the WEP and WPA attacks [29, 31], whose proofs had been left open so far. Thus, our current theoretical work complements the practical WEP attacks and completes the picture.
Short-Term and Long-Term Non-Randomness [Sects. 3 and 4]
The permutation after the RC4 KSA is non-random, and this is the source of many biases in the initial keystream bytes, including the observations in Refs. [18, 23, 30]. We prove all significant empirical biases observed in Ref. [30] and also provide theoretical justification for the sine-curve distribution of the first byte observed in Ref. [23]. We also extend the observation of the second-byte bias in Ref. [18] to all initial bytes 3 to 255 in the RC4 keystream, and hence generalize the attack on the broadcast RC4 protocol. Finally, we discover a new long-term bias in the RC4 keystream.
Future Direction
In the search for non-random events in RC4, or other stream ciphers in general, our results open up the following interesting directions of research.
• What are the implications of using a secret key with length relatively small compared to the internal secret state of the cipher? How is the keylength related to the biases?
• Is there a general pattern in the non-random events generated from the initial non-random state produced by the KSA? Can we find more short-term biases in this direction?
• How does one generalize the concept of digraph biases to related bytes with arbitrary gaps in between? Are there more long-term biases of this kind in the RC4 keystream?
References
M. Akgün, P. Kavak, H. Demirci, New results on the key scheduling algorithm of RC4, in INDOCRYPT’08. Lecture Notes in Computer Science, vol. 5365 (2008), pp. 40–52
R. Basu, S. Ganguly, S. Maitra, G. Paul, A complete characterization of the evolution of RC4 pseudo random generation algorithm. J. Math. Cryptol. 2(3), 257–289 (2008)
R. Basu, S. Maitra, G. Paul, T. Talukdar, On some sequences of the secret pseudorandom index j in RC4 key scheduling, in AAECC’09. Lecture Notes in Computer Science, vol. 5527 (2009), pp. 137–148
E. Biham, Y. Carmeli, Efficient reconstruction of RC4 keys from internal states, in FSE’08. Lecture Notes in Computer Science, vol. 5086 (2008), pp. 270–288
J. Chen, A. Miyaji, How to find short RC4 colliding key pairs, in ISC’11. Lecture Notes in Computer Science, vol. 7001 (2011), pp. 32–46
S.R. Fluhrer, D.A. McGrew, Statistical analysis of the alleged RC4 keystream generator, in FSE’00. Lecture Notes in Computer Science, vol. 1978 (2000), pp. 19–30
S.R. Fluhrer, I. Mantin, A. Shamir, Weaknesses in the key scheduling algorithm of RC4, in SAC’01. Lecture Notes in Computer Science, vol. 2259 (2001), pp. 1–24
J.D. Golic, Linear statistical weakness of alleged RC4 keystream generator, in EUROCRYPT’97. Lecture Notes in Computer Science, vol. 1233 (1997), pp. 226–238
J.D. Golic, Iterative probabilistic cryptanalysis of RC4 keystream generator, in ACISP’00. Lecture Notes in Computer Science, vol. 1841 (2000), pp. 220–233
J.D. Golic, G. Morgari, Iterative probabilistic reconstruction of RC4 internal states. IACR Cryptology ePrint Archive, Report 2008/348 (2008). Available at http://eprint.iacr.org/2008/348
A.L. Grosul, D.S. Wallach, A related-key cryptanalysis of RC4. Technical Report TR-00-358, Department of Computer Science, Rice University (2000)
R.J. Jenkins, ISAAC and RC4 (1996). Published on the Internet at http://burtleburtle.net/bob/rand/isaac.html
S. Khazaei, W. Meier, On reconstruction of RC4 keys from internal states, in MMICS’08. Lecture Notes in Computer Science, vol. 5393 (2008), pp. 179–189
A. Klein, Attacks on the RC4 stream cipher. Des. Codes Cryptogr. 48(3), 269–286 (2008)
L.R. Knudsen, W. Meier, B. Preneel, V. Rijmen, S. Verdoolaege, Analysis methods for (alleged) RC4, in ASIACRYPT’98. Lecture Notes in Computer Science, vol. 1514 (1998), pp. 327–341
S. Maitra, G. Paul, S. Sen Gupta, Attack on broadcast RC4 revisited, in FSE’11. Lecture Notes in Computer Science, vol. 6733 (2011), pp. 199–217
I. Mantin, Analysis of the stream cipher RC4. Master’s Thesis, The Weizmann Institute of Science, Israel (2001). Available at http://www.wisdom.weizmann.ac.il/~itsik/RC4/rc4.html
I. Mantin, A. Shamir, A practical attack on broadcast RC4, in FSE’01. Lecture Notes in Computer Science, vol. 2355 (2002), pp. 152–164
I. Mantin, Predicting and distinguishing attacks on RC4 keystream generator, in EUROCRYPT’05. Lecture Notes in Computer Science, vol. 3494 (2005), pp. 491–506
I. Mantin, A practical attack on the fixed RC4 in the WEP mode, in ASIACRYPT’05. Lecture Notes in Computer Science, vol. 3788 (2005), pp. 395–411
M. Matsui, Key collisions of the RC4 stream cipher, in FSE’09. Lecture Notes in Computer Science, vol. 5665 (2009), pp. 38–50
A. Maximov, D. Khovratovich, New state recovery attack on RC4, in CRYPTO’08. Lecture Notes in Computer Science, vol. 5157 (2008), pp. 297–316
I. Mironov, (Not so) random shuffles of RC4, in CRYPTO’02. Lecture Notes in Computer Science, vol. 2442 (2002), pp. 304–319
S. Mister, S.E. Tavares, Cryptanalysis of RC4-like ciphers, in SAC’98. Lecture Notes in Computer Science, vol. 1999 (1998), pp. 131–143
S. Paul, B. Preneel, Analysis of non-fortuitous predictive states of the RC4 keystream generator, in INDOCRYPT’03. Lecture Notes in Computer Science, vol. 2904 (2003), pp. 52–67
G. Paul, S. Maitra, Permutation after RC4 key scheduling reveals the secret key, in SAC’07. Lecture Notes in Computer Science, vol. 4876 (2007), pp. 360–377
A. Roos, A class of weak keys in the RC4 stream cipher. Two posts in sci.crypt, message-id 43u1eh$1j3@hermes.is.co.za and 44ebge$llf@hermes.is.co.za (1995). Available at http://www.impic.org/papers/WeakKeysreport.pdf
S. Sen Gupta, S. Maitra, G. Paul, S. Sarkar, Proof of empirical RC4 biases and new key correlations, in SAC’11. Lecture Notes in Computer Science, vol. 7118 (2011), pp. 151–168
P. Sepehrdad, Statistical and algebraic cryptanalysis of lightweight and ultra-lightweight symmetric primitives. Ph.D. Thesis, No. 5415, École Polytechnique Fédérale de Lausanne (EPFL) (2012). Available at http://lasecwww.epfl.ch/~sepehrdad/Pouyan_Sepehrdad_PhD_Thesis.pdf
P. Sepehrdad, S. Vaudenay, M. Vuagnoux, Discovery and exploitation of new biases in RC4, in SAC’10. Lecture Notes in Computer Science, vol. 6544 (2011), pp. 74–91
P. Sepehrdad, S. Vaudenay, M. Vuagnoux, Statistical attack on RC4—distinguishing WPA, in EUROCRYPT’11. Lecture Notes in Computer Science, vol. 6632 (2011), pp. 343–363
Y. Shiraishi, T. Ohigashi, M. Morii, An improved internal-state reconstruction method of a stream cipher RC4, in Communication, Network, and Information Security. Track 440088, New York, USA, December 10–12 (2003)
V. Tomasevic, S. Bojanic, O. Nieto-Taladriz, Finding an internal state of RC4 stream cipher. Inf. Sci. 177, 1715–1727 (2007)
E. Tews, R.P. Weinmann, A. Pyshkin, Breaking 104 bit WEP in less than 60 seconds, in WISA’07. Lecture Notes in Computer Science, vol. 4867 (2007), pp. 188–202
E. Tews, M. Beck, Practical attacks against WEP and WPA, in WISEC’09 (ACM, New York, 2009), pp. 79–86
S. Vaudenay, M. Vuagnoux, Passive-only key recovery attacks on RC4, in SAC’07. Lecture Notes in Computer Science, vol. 4876 (2007), pp. 344–359
D.A. Wagner, My RC4 weak keys (1995). http://www.cs.berkeley.edu/~daw/myposts/myrc4weakkeys
Acknowledgements
We are sincerely thankful to the anonymous reviewers for their detailed review reports containing invaluable feedback and kind suggestions. These reports helped in substantially improving the technical quality as well as the editorial aspects of our paper. We are also grateful to the Centre of Excellence in Cryptology (CoEC), Indian Statistical Institute, Kolkata, funded by the Government of India, for partial support towards this project.
Additional information
Communicated by Willi Meier.
This is a substantially revised and extended version of the papers [16] of FSE 2011 and [28] of SAC 2011. Sects. 2 and 3 are based on Ref. [28], with major revision in Lemma 1 and a generalization in Theorem 1, along with substantial new contributions in Sect. 2. Section 4.2 is based on Ref. [16] with major revision in the proof of Theorem 14. Section 2.2, Theorem 6 of Sect. 2.3, and Sects. 4.1 and 4.3 are completely new technical contributions in this paper.
Cite this article
Sen Gupta, S., Maitra, S., Paul, G. et al. (Non-)Random Sequences from (Non-)Random Permutations—Analysis of RC4 Stream Cipher. J Cryptol 27, 67–108 (2014). https://doi.org/10.1007/s00145-012-9138-1