1 Introduction

KISS (‘Keep it Simple Stupid’) is an efficient pseudo-random number generator specified by G. Marsaglia and A. Zaman in 1993 [1]. G. Marsaglia in 1998 posted a C version to various USENET newsgroups, including sci.crypt, culminating in a final version [3]. While Marsaglia himself has never claimed cryptographic security for the KISS generator, he does hint about it, and others have made the intellectual leap and claimed that it is of cryptographic quality. In this paper we show a number of reasons why the generator does not meet some of the KISS authors’ claims, why it is not suitable for use as a stream cipher, and that it is not cryptographically secure as a pseudo-random number generator. Indeed, this paper is written as a cautionary tale; stream ciphers and cryptographically secure random number generators are hard to get right, even for some of the world’s preeminent experts on the subject of randomness. We present a number of standard attacks that damage KISS to some extent, and our best attack requires only 70 words of generated output and on average 229.1 operations of a simple algorithm to recover the initial state. We attempt to present the reasoning that leads to the attacks.

Since 1998 Marsaglia has posted a number of variants of KISS (without version numbers), including [4] in January 2011. In the sequel, we refer to this as KISS11 for disambiguation. These variants have similar design elements but ever increasing state sizes, making them impractical for most cryptographic uses anyway. We present a straightforward divide-and-conquer attack on this (currently) latest version of KISS, requiring 241 operations equivalent to key initializations.

Marsaglia’s generators are primarily intended for scientific applications such as Monte Carlo simulations. There are many other generators in use, often with (ridiculously) long periods, for example the Mersenne Twister [6], and later versions of KISS. We choose to examine the 1999 version of KISS rather than some of these later proposals, because:

  • It is proposed by well known and respected authors

  • It has a reasonably long but not excessive (claimed) period

  • It has a compact state

  • The output ‘looks random’ immediately after initialization.

Other proposals often involve large state variables, often calling on simpler RNGs to initialize these large states. For cryptographic purposes, the lack of a well-specified initialization or keying phase, the large state, and the time required for initialization are all undesirable. It is our opinion that most of these other proposals are also cryptographically weak, but that doesn’t matter because they are also impractical in the context of a stream cipher. In any case, one recent proposal is examined below.

In Section 2 we describe KISS and the components that make it up. Section 4 discusses some simple structural attacks. Mathematical properties of the components are described in Section 3, and in Section 5 these properties are combined to form a cryptanalytic key recovery attack in the known-keystream model. A variation of the attack would even apply to unknown plaintext, if the input language is sufficiently biased and with a significant slowdown. Finally Section 6 examines the 2011 version of KISS (KISS11) and presents a simple attack.

Quotations in this document are from Marsaglia’s USENET posting [3] unless mentioned otherwise.

2 Description of KISS

KISS consists of a combination of four sub-generators each with 32 bits of state, of three kinds:

  • one linear congruential generator modulo 232

  • one general binary linear generator over the vector space GF(2)32

  • two multiply-with-carry generators modulo 216, with different parameters

The four generators are updated independently, and their states are combined to form a stream of 32-bit output words. The four state variables are treated as unsigned 32-bit words. Figure 1 shows C code from [3], while Fig. 2 shows the structure pictorially. We refer to the state registers using the names of the C macros.

figure a
Fig. 1
figure 1

C code of KISS

Fig. 2
figure 2

Diagram of KISS

Note that only half of the state of znew contributes directly to the output of KISS. Its placement in the diagram is intended to show this, and is not an error.

3 Properties of components

The four registers, of three kinds, that make up KISS are independently updated, then combined using shifts, adds, and one XOR operation. As is often the case, the registers that use arithmetic operations are combined with XOR, while the results that are produced bitwise are combined with addition modulo 232.

3.1 Multiply with carry registers

Two of the registers, wnew and znew, are Multiply With Carry generators. At each step, the low-order 16 bits is multiplied with a constant, and the high order 16 bits from the previous step are added. The low order 16 bits are random looking, and with a period determined by the entire construct. The most significant bits of the register are not very random; in particular the MSB will be almost always zero. But these most significant bits contribute to the long period. The low 16 bits of znew are shifted left before being added to the whole register wnew, presumably to cover up this nonrandomness.

With a multiplier C, this generator calculates x n = C x n−1 mod216 C − 1. With the stated constants, the modulus is prime, so the generator is readily inverted.The word ‘concatenates’ is somewhat misleading, as the high order bits of wnew contribute to the output, even though Marsaglia appears to consider it to be a 16-bit generator, but this does not affect either the period of the MWC construction, or our later cryptanalysis.

figure b

For cryptographic purposes, initialization of these registers is somewhat important. It is easy to see that the value zero will repeat forever. Large values will, at the first update, become smaller values that collide with the results of updating some smaller value, and the generators will then follow the same path as if they were initialized with those smaller values. For each register, there are two distinct cycles of equal prime period. (We were not aware of any analytical result, but the small state space allowed us to enumerate the cycles in a few minutes of CPU time.) There is also another fixed point (besides zero) for each register. Table 1 summarizes these details. In the table, a cycle length of ‘n/a’ denotes a value that collides and enters an existing cycle.

Table 1 Properties of the Multiply-With-Carry registers

The values shown in the table are the least values that occur in the cycle, or the values that lead into the existing cycles.

For cryptographic purposes there are thus two bad values for each register, and approximately half of random values enter the same cycles as other values, which might be seen as nearly equivalent keys: only the first word generated would be different, but the same thereafter.

Since the periods of the two registers are prime, if the initial values are not bad, the period of MWC (that is, the combination of the two subgenerators) will be about 259.3. The period of the least significant 16 bits will only be that of wnew, which is about 229.1.

Two consecutive 16-bit outputs from either generator suffice to recover the unknown upper 16 bit carry value, once the generator is in a cycle. It is only slightly more complicated to go backwards in the cycle. Given the least significant 16 bits of an initial (out of cycle) value, and the next (in cycle) state, it is easy to recover the most significant 16 bits of the initial value.

3.2 Properties of the linear congruential generator

The linear congruential generator CONG is well studied in the literature. With the specified parameters it has period 232.Observe that if the value is odd, the next value will be even, and vice versa. That is, the least significant bit toggles ...010101.... Other bits exhibit patterns with longer periods, but the attack we describe only needs the least significant bit.

figure c


For each n < 32, the least significant n bits of CONG form a smaller Linear Congruential Generator modulo 2n, with recurrence

$$x_{i} = (69069\ \text{mod}\ 2^{n})x_{i-1} + (1234567\ \text{mod}\ 2^{n}) (\text{modulo} 2^{n})$$

where mod represents a remainder operation as in C. The behaviour of the least significant bit is just the simplest application of this observation.

Plumstead [7] gives a method of inferring the state of a linear congruential generator from information about its most significant bits, which appear more random, but the above observation implies that information from the least significant bits cannot be extended upward; consider the analogy with knowing the least significant bits of a simple counter.

3.3 The 3-shift register generator

The generator SHR3 is updated using shifts and XOR operations that are efficient. However for analysis purposes this is most easily understood as multiplication modulo 2 of a 32 by 32 bit update matrix with the 32-bit register state. The system is by definition linear and amenable to standard techniques.The sequence of values generated by SHR3 is not maximal length. Appendix lists the possible cycles. The bit sequence generated by each bit position of the register is the output of some Linear Feedback Shift Register of degree D ≤ 32. There are 16 different polynomials, with 64 disjoint cycles, with lengths ranging from 1 to 306,706,140 (approximately 228.2). The expected cycle length (although no such cycle exists) is approximately 227.7; more than half of initial values lie on one of the longest cycles. The values 0 and 0xaea21b8f are self-perpetuating.

figure d

McQueen [5] pointed out that later versions of KISS exchange the values 13 and 17 in the formula, and do indeed lead to a single cycle with maximum length. In particular KISS11, analyzed below, uses the correct version of this generator.

Any sequence of 32 output bits (in particular, we use the least significant bit of consecutive output words) suffices to recover the initial state of the SHR3 register, via a simple matrix multiplication.

4 Cryptographic deficiencies and simple structural attacks


Since the authors of KISS did not envision the generator being used for cryptographic purposes, they did not think about issues such as setting up ‘keys’, or reinitializing with nonces. Since the state of KISS is only 128 bits, it makes sense to simply use a 128-bit key by dividing it into 4 32-bit words, and defining that words (both key and output) should be treated as Little- or Big- Endian. None of the components show obvious output relationships with related keys after a few words of output; if a nonce or initialization vector is to be used, it could be simply XORed or added to the key, with a few rounds of output being discarded to allow any similarities to dissipate. We equate key recovery with recovering the exact initial state.

Time-Memory Tradeoff

Since the keyspace is the same as the state space, standard time-memory tradeoff attacks could be easily applied. See [8] for an overview of the past and current application of such attacks. We do not explore them further here.

Weak keys

If the first output word is ignored, only about 2126 output sequences can be generated, due to the colliding behaviour of the Multiply-With-Carry generators. CONG is the only generator for which zero is not a bad value. Any key in which one of the words initializing another register is zero will effectively take that register out of the combined generator, leading to a shortened cycle and possibly exposing the generator to simpler attacks. In particular, the all-zero key results in the generator being exactly the output from CONG, a simple and easily recognized linear congruential generator.

Cycle length

The maximum cycle length is about 232+59.3+28.2 = 2119.5. This contradicts the KISS authors’ claim of ‘about 2123’. We hypothesize that they expected SHR3 to generate a maximum length sequence, as in the quotation above. Significantly shorter cycles are possible but correspondingly less likely to happen. The shortest cycle is 232 (from CONG) occurring with probability 2−93, since there are two minimum-cycle-length values for each of the other three generators.

Divide and Conquer

KISS updates the four sub-generators independently, and only combines them for the output. This structure invites attacks where one or more of the components can somehow be ignored or cancelled out to enable an attack on the remainder. Indeed the attack below uses this technique to some extent. Another way to apply divide-and-conquer attacks would be to take output words from a long keystream, and combine words separated by the period of one of the subgenerators, in such a way as to cancel it out. For example, if the output of SHR3 really had period 232 − 1, words that far apart in the keystream could be subtracted to remove the effect of that sub-generator. This technique was used successfully in the cryptanalysis of the SSC2 stream cipher [9].

5 Key recovery attack in the known keystream model

Marsaglia, in a different article [2], writes:

figure e

It is a common theme in both amateur and professional cryptography to start with a design, and as defects are discovered, to try to cover them with another feature or another generator. In the case of KISS, this design is apparent with the use of simple generators with known flaws to cover each others’ deficiencies. The job of the cryptanalyst, conversely, is to find ways to expose the flaws and defeat them in detail. In this section we not only show a key recovery attack against KISS, but we attempt to reveal the thought process, and experiments conducted, to arrive at this attack. This paper hopes to teach a little bit of cryptanalysis technique.

Our goal is to take some amount of known keystream (a standard assumption for cryptanalysis of stream ciphers) and recover the initial state of the four registers.

We first began by analyzing the components, to better understand them, as Marsaglia’s USENET posting was short on details. We did not try to obtain the original technical report [1], which probably contains more analysis, in favor of developing our own understanding. CONG was already well understood. The Multiply With Carry generators wnew and znew were new to us, so we wrote code to enumerate the cycle structure, and looked at how easy it was to run the generators backwards.

When it came time to analyze the SHR3 generator, at first we accepted the implication that it had the maximum possible period of 232 − 1, and wanted to find the equivalent LFSR, as we were more familiar with cryptanalyzing such structures. Using the default initial value from the KISS paper, we generated 128 output words, and selected the least significant bits for input to the Berlekamp-Massey algorithm. As expected, a degree-32 polynomial was the output, which did not contradict our belief that it might have a maximum length cycle. We continued on that assumption.

Sorting out the combination of four generators seems already to be difficult, but one of the generators, znew, only contributes to the upper half of the output words. Concentrating on the lower half reduces the number of generators to three. This is already an application of the Divide and Conquer strategy. The generators are combined with both integer addition and XOR. Either the carries from the addition must be accomodated, or we could focus on the least significant bit, in which addition and XOR are equivalent since there is no carry in. This approach looked promising, as the behaviour of the least significant bits of two of the generators were easily analyzed.

The least significant bit of wnew did not exhibit any linearity or other non-random properties we could detect, as was to be expected from Marsaglia’s description. Further, it depended on the entire 32-bit state of the register. But only the very first output depended on the full state; the second and all subsequent outputs were in one of the known cycles. By skipping the first output word, at least for now, only 1,179,648,000 values (about 230.1) for the state needed to be considered.

This suggests the first part of the attack. Assume that the cryptanalyst knows some amount of output keystream words Z i ,0 ≤ iN for some as yet undetermined number N. We want to apply some sort of Guess and Determine attack. In such an attack, we guess a value for some unknown quantity, determine from the known and guessed values some other values, and then attempt to verify whether or not the guess was correct, that is, consistent with known values or properties. If the verification fails, we try another guess, thus eventually enumerating some subset of the unknown state.

Step 1. Ignore Z 0 . Ignore all but the least significant bits of Z i . We will worry about the actual initial state later; for now we try to recover the state after the first output word. We now have a sequence of N bits of known keystream.

Step 2. Guess w n e w 1 (that is, the state of the register wnew immediately after the first output). We only have to enumerate the values in cycles. From this guess, generate the stream of least significant bits w i = w n e w i &1,1 ≤ iN.

Step 3. Guess the LSB of C O N G 1 . As discussed above, the stream of LSBs is alternating zeros and ones, but we don’t know which it starts with. We need to test both. Let c i = 010101010…. If this guess doesn’t work out, invert it; in fact, we can just invert the entire sequence used in steps 4 and 5.

Step 4. Determine the possible LSBs of the SHR3 output. Set s i = Z i w i c i ,1 ≤ iN.

Step 5. Verify that s i is the output of an LFSR. When we first implemented this step, we used the discovered polynomial 1 + x 3 + x 5 + x 8 + x 18 + x 22 + x 30 + x 32 to check the parity of the corresponding output bits, with N = 100. If our guesses for the wnew and CONG registers were correct, the sums of the bits should all be zero, otherwise with overwhelming probability (1 − 2−67) at least one of the parity equations would be one. This worked fine for the set of initial values used by default in KISS, but our second trial did not find a solution! After a bit of thought, the only reason we could think of was that there was not a single LFSR that described the system. We already had a very simple program to enumerate cycles in a 32-bit function, so with one minute of editing and three minutes of computation on an Apple laptop, we were able to find the details in Appendix A. We could have proceeded with the verification by checking all 16 of the distinct sets of LFSR taps, but instead chose the simpler approach of just using the Berlekamp-Massey algorithm again. 64 bits of input are necessary to discover the longest LFSR, and only surprisingly few more bits are needed to detect if the bit stream is not linear. With high probability, an extra 5 bits will show that. It is faster to let a small number of false positive results through this stage and eliminate them later than to run extra bits through the Berlekamp-Massey algorithm, which has O(n 2) behaviour in the number of bits. This turns out to be the limiting step in the attack, and could perhaps be optimized further.

Step 6. Recover S H R 3 1 . A simple matrix multiplication turns the bits s i ,1 ≤ i ≤ 32 into the initial state S H R31. At this point, with high probability, we know all 32 bits of SHR3.

Step 7. Recover the least 16 bits of C O N G 1 . Given the known output, and full (assumed correct for now) values for SHR3 and wnew, the least significant 16 bits of CONG are recovered. At first we assumed that the well known result [7] would reveal the full initial state, but we had misremembered the result. At this point we know the full state of two of the registers, half the state of CONG, and nothing about znew.

Step 8. Guess the high 16 bits of C O N G 1 . We have to apply the Guess and Determine attack again, guessing one of the registers and determining the other. It is obviously most efficient to guess the remaining contents of the register that is partially known, rather than the whole 32 bits of the other unknown register. We then generate 3 output words from the three known or guessed registers.

Step 9. Determine the least 16 bits of znew. From the known keystream and the other registers, we recover the low 16 bits of z n e w i ,1 ≤ i ≤ 3.

Step 10. Verify consistency of the guess of C O N G 1 . From the low 16 bits of the first two values of znew, we derive the full state of z n e w 1. From this we generate z n e w 3 and verify that the least 16 bits agree with those generated in Step 9. If not, we continue enumerating the high order bits of CONG.

Step 11. Verify the entire state. We now have a candidate for the state of all four registers after the first output. We generate 8 words of output from KISS with that state, and check that they agree with Z i ,1 ≤ i ≤ 8. Checking 8 values is almost certainly more than necessary.

Step 12. Recover the true initial state. Remember that we skipped the first output Z 0, because the multiply-with-carry registers might not have started with values in their corresponding cycles. For each of the registers there are two (or for wnew sometimes three) possible initial states. We simply need to check which of the (up to 6) possibilities are consistent with the known output Z 0.

The algorithm above is completely dominated by the time taken in Step 5; at most a handful of candidates get through to steps 7-12. This step runs the Berlekamp-Massey algorithm on 69 bits; in the worst possible case it will be run on about 231.1 inputs (the possible cycles of the wnew register, times two for the LSB of the CONG register). It takes on average about 2 hours on a single CPU of an Apple Macbook Pro (2.66 GHz Intel Core i7), with negligible memory usage. We did not bother optimizing or parallelizing the code. By the nature of the Guess and Determine steps, there are no obstacles to parallelization. Possible optimizations of this step include performing the parity checks instead; we think this would result in a significant speed increase but we didn’t actually try it.

We note in passing that Step 5 could be done by applying a Fast Correlation Attack to unknown plaintext, if the least significant bits of the plaintext language are sufficiently biased and enough output is available. Of course this would take significantly longer to run the attack.

6 KISS11

In January 2011, just after we completed the above analysis, Marsaglia posted an article to sci.crypt and other Usenet newsgroups with a new version of KISS, which we refer to as KISS11. In this article he presented two generators, one based on 32-bit integer variables, and the other on 64-bit integer variables. Here we examine only the 32-bit version, but the attack applies equally to the larger version with correspondingly larger complexity.

KISS11 (shown in Fig. 3, with one declaration moved for readability) has three components. The congruential generator and the 3-shift generator (CNG and XS respectively) have new names but are otherwise the same as described above (with XS corrected). The real change is that the two small Multiply With Carry generators have been replaced with a single, huge Complementary Multiply With Carry generator, implemented by the function b32MWC. This generator has 222 + 1 words of state (the array Q and integer carry above). We don’t try to explain how it works; for that see [4]. Marsaglia does mention its cryptographic weakness, and attempting to hide it:Really, the only cryptographic strength of this subgenerator is its huge (just over 227 bits) state. For the first 16 megabytes of output, it is behaving like a Vernam cipher. For simplicity in the discussion below, we refer to R = 222 as the number of outputs in a “round”, that is, a full pass through the array Q.

Fig. 3
figure 3

C code of KISS11

figure f

We assume that the contents of Q, cng and xs are initialized completely randomly and independently of each other. Note that the initial value of carry is explicitly set to zero. We assume that the output stream from KISS is known to the attacker (a standard cryptographic assumption); call this output stream Z i ,i = 1...

Some observations:

1) at every stage, carry will usually be only 28 bits in length. The exception is when the high order 28 bits of the input Q value are all zero, carry will (almost never, but depending on the input carry value) become 0xFFFFFFFF. This is rare enough not to worry about in the attack below.

1a) if you know the input value Q[t], the output carry value is almost entirely defined by the most significant 28 bits. 15/16ths of the time, whether the optional subtraction of 1 happens is determined by comparing the least significant and most significant 4 bits of the input Q[t], independent of the input carry.

2) at every stage t, the output of b32MWC will be the input value of Q[t mod R] to the calculation of b32MWC R steps later.

3) CNG and XS are easily invertible (see above).

4) b32MWC is also invertible; if you know both the input Q[j] and the output Q[j] not only can you derive the output carry, you can also derive the input carry.

5) given (3) and (4), you can run the generator backwards if you need to.

The simple attack we present is a trivial application of the divide-and-conquer method, splitting the generator into two parts.

1. Start by guessing (enumerating) the 264 possible pairs (x s 0 and c n g 0).

2. Run the subgenerators CNG and XS forward R+3 rounds, and calculate M i−1 = Z i c n g i x s i ,(1 <= i <= R + 3). These are possible candidate values for the Q[i − 1 mod R] values.

3. If we guessed the initial values correctly in the first step, then M 0 should be the output Q[0] (hence also the input to the calculation of Q[R]), and M R should be the output Q[0] after R steps. From these, we can calculate c a r r y R , and from c a r r y R and M R+1 (which we hope equals output Q[1]) we can check whether, in fact, the computation given input and output Q[1], and what we just calculated, agree with each other. If they don’t, go back to step 1. If they do (which might happen randomly), do the same for R + 2, and R + 3. At this point, if all checks worked, with high probability we must have guessed CNG and XS correctly.

4. Run b32MWC backwards to recover the initial settings of Q[i].

The expected work to do this is about 285 times the work to generate output words, which is a pretty normal measure. That is 264/2 (since you expect to search half the values) times about 222 words generated (because of the huge state space). The attack requires a little more than 222 words of known keystream, again because of the size of the state. Another way to look at this is that it’s about 263 times the cost of doing a key setup (independent of how you actually set up that huge key!).

However this attack can be significantly optimized. First we note that most of the generated values of from CNG and XS are not used in the important validation step; of the R + 3 pairs of values generated, only the first three and last three are used. Secondly, we note that both generators can be jumped forward a known amount (in our case R steps) with very much less complexity than simply evaluating the functions R times. In the case of XS, we can precompute the update matrix, and compute x s R by multiplying x s 0 by it. Similarly, CNG can be jumped forward using a modular “square and multiply” exponentiation calculation.

When enumerating the possible initial values of XS and CNG, there is no reason to enumerate them as counters. Since the values all fall on cycles (a single cycle in both cases, given the corrected values in XS) the values can be enumerated in the sequence produced by the cycle. For example, we start with c n g 0 = 0, and quickly calculate c n g R from that. By iterating CNG we easily calculate c n g 1..c n g 3 and c n g R+1..c n g R+3 as required above. After testing this value (and presumably failing), we can test the next value by discarding c n g 0, using the old c n g 1 as the new c n g 0, similarly rolling over c n g R and so on. The cycle of XS can be handled in the same way. Only once the correct values for c n g 0 and x s 0 have been identified is it necessary to run the generators R steps to recover the initial values of the Q array. Parallelization of this attack is a bit more tricky, but still easy. With this optimization, about 263 operations on average can be expected to recover the key. Since the Q array is so large, this corresponds to about 241 key setup operations.

7 Conclusion

We conclude that the KISS generator, unsurprisingly, should not be used in any context where cryptographic security is important. Its period is more than sufficient for tasks like simulations, so long as it is correctly initialized, but care should be taken with this initialization step. Its period (maximum, expected, and minimum) is not as long as its authors claim.

As a design rule, updating the component generators independently leads to easy analysis, which is both a good and a bad thing. Exchanging values between the various state registers might defeat the kind of attack we have applied, but it would also make any kind of rigorous analysis impossible, and might make the generator weaker in ways that are difficult to deduce but damaging in practice.

We would like to thank our colleague David Jacobson for help with the linear algebra, and the anonymous reviewers who provided suggestions for improvement.