Predicting and Distinguishing Attacks on RC4 Keystream Generator

Abstract

In this paper we analyze the statistical distribution of the keystream generator used by the stream ciphers RC4 and RC4A. Our first result is the discovery of statistical biases of the digraphs distribution of RC4/RC4A generated streams, where digraphs tend to repeat with short gaps between them. We show how an attacker can use these biased patterns to distinguish RC4 keystreams of 226 bytes and RC4A keystreams of 226.5 bytes from randomness with success rate of more than 2/3. Our second result is the discovery of a family of patterns in RC4 keystreams whose probabilities in RC4 keystreams are several times their probabilities in random streams. These patterns can be used to predict bits and words of RC4 with arbitrary advantage, e.g., after 245 output words a single bit can be predicted with probability of 85%, and after 250 output words a single byte can be predicted with probability of 82%, contradicting the unpredictability property of PRNGs.