PudgyTurtle: Using Keystream to Encode and Encrypt

Stream cipher encryption works by modulo-2 adding plaintext bits to keystream bits, which are in turn produced by successively updating a finite-state machine initialized to a secret starting state. PudgyTurtle is a way to encode the plaintext in a keystream-dependent manner before encryption. Since it can use keystream from any stream cipher, PudgyTurtle functions somewhat like an encryption mode. The process begins by generating successive 4-bit groups of keystream (‘nibbles’) until one of them matches the current plaintext nibble to within one bit. The number of keystream nibbles required, as well as the nearness of this match, is then encoded into a variable-length codeword. Finally, this codeword is encrypted by modulo-2 addition to an equal amount of keystream. Compared to normal binary-additive stream ciphers, this process is less efficient (i.e., more time is required to generate extra keystream nibbles, and more space is needed for the codewords than for the plaintext). However, with this cost comes a benefit: PudgyTurtle resists time-memory tradeoff attacks better than standard stream encryption.


Introduction
Binary-additive stream ciphers (BASC) encrypt by modulo-2 adding each plaintext bit ($x_i$) to a keystream bit ($k_i$), thus producing ciphertext $y_i = x_i \oplus k_i$, where $\oplus$ denotes the XOR operation. Because modulo-2 addition is self-inverting, decryption is accomplished in the same manner: $x_i = y_i \oplus k_i$. The sequence of bits $K = k_1, k_2, \ldots$ is called the keystream. $K$ is produced by the keystream generator (KSG), a finite-state machine operating on an $n$-bit state, $S$. Since details of the KSG are assumed to be public, its starting-state ($S_0$) must include a secret key. The starting-state may also incorporate an initialization vector (IV), which need not be secret (e.g., it can be a publicly shared random 'nonce', or can be generated by each communicating party from a counter). Unique IVs allow the same secret key to be used for more than one message.
The KSG works by applying an update function $\phi: \{0,1\}^n \rightarrow \{0,1\}^n$ and an output function $o: \{0,1\}^n \rightarrow \{0,1\}$ to the current $n$-bit state. Thus, $k_i = o(S_i) = o(\phi(S_{i-1}))$. Functions $o$ and $\phi$ are designed to make $K$ appear random and unpredictable, which makes it hard to reconstruct any previous state from any sub-sequence of keystream. PudgyTurtle uses the keystream to encode the plaintext into a sequence of variable-length codewords and then encrypts these codewords. This process is cipher-agnostic, in that no constraints are placed on the KSG. Plaintext $X$ is first separated into 4-bit groups ('nibbles'), $X_1, X_2, \ldots$, where $X_i = (x_{4i-3} \| x_{4i-2} \| x_{4i-1} \| x_{4i})$ and $\|$ stands for concatenation. For each $X_i$, new keystream nibbles are generated until one matches $X_i$ to within a 1-bit tolerance.
A codeword ($C_i$) is then created from the number of failed matching attempts, plus some error-correcting information. This codeword is encrypted with a mask, $M_i$ (a possibly non-contiguous sequence of keystream nibbles of the same length as $C_i$), thus producing ciphertext $Y_i = C_i \oplus M_i$.
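To make the binary-additive notation concrete, here is a minimal Python sketch of a stream cipher built on a toy KSG. The 16-bit state, the LCG-style update function (called `phi` here), and the top-bit output function `o` are hypothetical choices made only for illustration; they are not a real cipher and offer no security. The sketch shows that $k_i = o(S_i)$ with $S_i$ obtained by updating $S_{i-1}$, and that the same XOR both encrypts and decrypts.

```python
from typing import List

N_BITS = 16   # toy state size n (real KSGs use far larger states)


def phi(state: int) -> int:
    """Hypothetical update function: a full-period 16-bit LCG (not secure)."""
    return (state * 25173 + 13849) & 0xFFFF


def o(state: int) -> int:
    """Hypothetical output function: the most-significant state bit."""
    return state >> (N_BITS - 1)


def keystream(s0: int, length: int) -> List[int]:
    """k_i = o(S_i) with S_i = phi(S_{i-1}), for i = 1..length."""
    bits, s = [], s0
    for _ in range(length):
        s = phi(s)
        bits.append(o(s))
    return bits


def basc(bits: List[int], s0: int) -> List[int]:
    """XOR each bit with keystream from S_0; the same call encrypts and decrypts."""
    return [b ^ k for b, k in zip(bits, keystream(s0, len(bits)))]


x = [1, 0, 1, 1, 0, 0, 1, 0]
y = basc(x, s0=0x1234)
print(y, basc(y, s0=0x1234) == x)
```

Because the starting state fully determines the keystream, anyone who recovers any KSG state can regenerate the keystream from that point on, which is what the attacks discussed later aim for.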

Stream-Cipher Cryptanalysis
A major goal of stream-cipher cryptanalysis is to reconstruct any KSG state used during encryption. Once discovered, the generator can then be run forward (and, if the update function is reversible, backwards) from that state to recreate the keystream and decrypt the ciphertext.
Some stream ciphers are susceptible to direct attacks, which exploit algebraic, key-scheduling, statistical or other structural weaknesses particular to individual crypto-algorithms [11,21,36,38,39]. If no direct attack is known, then an exhaustive key-search may be attempted. It might seem that, for any KSG with a sufficiently large state-size, this so-called brute-force cryptanalysis would be infeasible. However, this is not always true: cryptanalytic time-memory-data tradeoffs (TMDTOs), also called collision attacks, are probabilistic methods which take advantage of the Birthday Paradox to speed up an exhaustive search to practical levels (e.g., only $2^{n/2}$ operations instead of $2^n$) [16,26,29]. These attacks, described more fully in Sect. 5, have proven successful against a variety of stream ciphers [6,13,48].
TMDTO attacks against stream ciphers require a sample of known keystream, $K'$, obtained by XOR'ing some known plaintext, $X'$, with the intercepted ciphertext at the correct offset. PudgyTurtle, however, changes the plaintext-ciphertext relationship into something more complex than a simple XOR, thus making it harder to obtain $K'$ from $X'$. On the one hand, this property makes PudgyTurtle less efficient than standard binary-additive stream ciphers: its codewords take up more space than the original plaintext ('pudgy'), and its encoding procedure needs extra time to match each plaintext nibble to the keystream ('turtle'). On the other hand, these features also increase the difficulty of a collision attack.

Coding and Cryptography
Coding theory has been part of modern cryptology since the work of Shannon in the late 1940s [50]. Hellman extended these ideas to cryptosystem design, commenting on the 'duality' between ciphers and error-correcting codes (ECCs) [25]. Other cryptographers have also investigated various aspects of coding. For example, Bellare and Rogaway explored semantic security and authentication in cryptosystems that include a 'key-less encoding' step (e.g., prepending a counter or IV, and appending a checksum to the message) [7].
Another cryptographic application of coding is 'randomized encryption' [47], in which one ciphertext is chosen randomly from among a family of possibilities. An early instantiation of this idea for asymmetric cryptosystems was by McEliece, who proposed adding random noise after encoding the plaintext with a binary Goppa-type ECC [40].
Here, the private key ($G$) is the ECC's generator matrix and the public key ($G'$) is obtained by multiplying $G$ by permutation and scrambling matrices. Security arises from the fact that a fast method exists for decoding and error-correction given $G$, but the generalized linear decoding problem given $G'$ is NP-complete [9]. More recently, similar ideas have appeared in the realm of quantum-resistant cryptography (e.g., lattice-based systems relying on the difficulty of finding the closest vector to one that has been perturbed by noise [45,46]).
With regard to stream ciphers, several approaches exist for randomized encryption. Kara and Erguler [32,33] proposed using an ECC to encode the plaintext, which is then encrypted with a 'noisy keystream' (i.e., the modulo-2 sum of the true keystream and random binary noise). The receiver decrypts the ciphertext into a noisy state, then recovers the plaintext by using the ECC. Importantly, the additive noise does not need to be reconstructed by the receiver, unlike a cryptographic key. Another ECC-based approach relies on the 'learning parity with noise' (LPN) problem, analyzed by Fossorier [20]. For example, Imai and Mihaljević [42] proposed an LPN-based system in which external randomness is used not just for additive noise, but also as a basis for homophonic coding (i.e., each plaintext group can be encoded in multiple ways).
PudgyTurtle is similar to these approaches in some ways: its ciphertext is longer than its plaintext; and its encoding changes the plaintext-ciphertext relationship from a simple XOR into something more complex. However, PudgyTurtle also has several important differences. First, unlike some 'encode-then-encipher' ideas [7], PudgyTurtle's encoding is key-dependent, not key-less. Second, PudgyTurtle is deterministic: it is not randomized encryption and does not require an external noise source [32,33,40]. Third, PudgyTurtle operates on small (4-bit) plaintext groups: message padding is not required, nor are complex vector and matrix operations. Fourth, PudgyTurtle's codewords do not contain any raw plaintext, unlike other ECC-based systems. Rather, information about the relative position of various keystream nibbles is what actually gets encoded. Finally, PudgyTurtle does not suffer from decoding failures, as may (in rare cases) occur with some ECC-based systems [41].

Stream-Cipher Modes
PudgyTurtle is not a cipher itself, but operates alongside existing stream ciphers. In this respect, it resembles an 'encryption mode'. Some stream-cipher modes are intended to add advanced features, like authenticated encryption, by using a BASC as the crypto-primitive [1,10,49]. Other stream-cipher modes are designed to resist TMDTO attacks, such as by continuously incorporating the secret key/IV into the KSG's state-update function [2,3,43]. The 'FP(1)-mode', used by the LIZARD cipher for instance [24], describes how to re-synchronize the KSG state in a particular key- and IV-dependent manner [23]. In theory, systems like these, which can be analyzed as pseudorandom functions [8], could make TMDTO attacks harder (e.g., by breaking the state-space into a set of mutually exclusive 'keystream equivalence classes', each requiring a separate attack). In practice, security concerns may still exist, due to weak keys, key-scheduling or other issues [15,18,19,24].
Like these stream-cipher modes, PudgyTurtle also attempts to make TMDTO attacks harder. However, rather than altering initialization or re-synchronization procedures, PudgyTurtle instead takes the keystream and uses it in two different ways:
• A binary-additive (codeword ⊕ mask) process;
• A non-binary-additive (plaintext-keystream matching) process.
PudgyTurtle can still be compatible with other stream-cipher modes. Whether KSG initialization and re-synchronization happens 'simply' (as in Trivium or A5/1) or by a more complex protocol (as in LIZARD), the keystream produced works equally well for PudgyTurtle.

Outline
After Sect. 2 presents some notation, Sects. 3 and 4 describe the PudgyTurtle algorithm and its output in detail. Section 5 introduces TMDTO attacks, and Sect. 6 discusses some general obstacles to performing such attacks against PudgyTurtle. Section 7 proposes two new TMDTO attacks against PudgyTurtle, and Sect. 8 describes how to quantify their performance. These new attacks are then implemented in Sect. 9, using a toy cipher as the KSG. We show that both attacks are less efficient than two well-known tradeoffs from the literature. Finally, Sect. 10 suggests some potential limitations of PudgyTurtle.

Notation
Hexadecimal values are prefixed by '0x' and binary values have '2' as a subscript. For example, 242 could be written as 0xF2 or $11110010_2$.
Hamming weight h(a) is the number of 1's in binary vector a.
Single vertical bars denote the number of elements in a set (e.g., |K| is the size of the key-space).
Throughout, $X$ refers to plaintext, $K$ to keystream, and $Y$ to ciphertext, with the following embellishments:
• Lowercase letters with subscripts ($x_i$, $k_i$, $y_i$) represent individual bits.
• Uppercase letters with subscripts denote groups of bits: $X_i$ and $K_i$ are (4-bit) nibbles; $Y_i$ is one or more (8-bit) bytes.
• Uppercase letters with a prime ($'$) are assumed to be known to the attacker (e.g., $X'$ is the 'known plaintext' and $K'$ the 'known keystream').
• Uppercase letters with double subscripts denote individual bytes within a multi-byte symbol. For example, if $Y_i$ is two bytes long, then its first and second bytes are $Y_{i,1}$ and $Y_{i,2}$.
• Uppercase letters with parentheses denote an $n$-bit segment of a longer sequence, where $n$ is the size of the KSG-state. Depending on context, these segments may be called 'windows', 'prefixes', or 'fragments'. For example, $K(a) = \{k_a, k_{a+1}, k_{a+2}, \ldots, k_{a+n-1}\}$ means $n$ bits of keystream starting at bit $a$.
• $K_{t(i)}$ stands for the keystream nibble that matches plaintext nibble $X_i$ to within 1 bit.
• $N_X$, $N_K$, and $N_Y$ represent the number of symbols of plaintext, keystream, and ciphertext, whereas $L_X$, $L_K$, and $L_Y$ are their actual lengths in bits.

PudgyTurtle
This section provides details of PudgyTurtle encryption and decryption. A descriptive overview is offered, followed by an algorithmic explanation. Table 1 can also be consulted as a visual aid.

Overview
The first task is to encode each plaintext nibble $X_i$ into a variable-length codeword, $C_i$, written as
$$C_i = O_i \,\|\, (F_i \bmod 32) \,\|\, D_i,$$
where
• $O_i$ is a variable-length overflow indicator, which contains zero or more copies of the special byte 0xFF (see details below);
• $F_i = 0, 1, 2, \ldots$ is the failure-counter, which counts the number of unsuccessful attempts to match $X_i$ to a keystream nibble. Since $F_i$ is zero-indexed, $F_i = 0$ means that only one keystream nibble had to be generated in order to match $X_i$, and so on;
• $D_i$ is the discrepancy-code (see details below), which describes the mismatch pattern between $X_i$ and $K_{t(i)}$, thereby allowing single-bit error correction by the receiver.
The PudgyTurtle process begins by saving the first two keystream nibbles ($K_1 \| K_2$) as mask $M_1$ and setting the first failure-counter $F_1$ to 0. Then, each new keystream nibble is compared to $X_1$, starting from $K_3$. If it differs from $X_1$ by more than one bit, then $F_1$ is incremented, a new keystream nibble is generated, and the search continues (i.e., $X_1$ is then compared to $K_4$ and so on) until a match is found. Once the keystream nibble $K_{t(1)}$ that matches $X_1$ is discovered, the nearness of this match is captured by the 3-bit discrepancy-code $D_1$, which records which bit (if any) of $K_{t(1)}$ differs from $X_1$. The 8-bit codeword $C_1$ is then constructed by concatenating $F_1 \bmod 32$ (5 bits) with $D_1$ (3 bits). Finally, the ciphertext is produced by encrypting the codeword with the mask: $Y_1 = C_1 \oplus M_1$. This process then repeats for the next plaintext nibble, $X_2$, starting with $F_2 = 0$ and using keystream beginning at $K_{t(1)+1}$. Table 1 provides a visual example of how a short message is encoded and encrypted via PudgyTurtle.
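The matching loop and codeword construction can be sketched in Python for the no-overflow case. The mapping from a single differing bit to a discrepancy-code is an assumption here: only $000_2$ (exact match) and $100_2$ (most-significant bit differs) are pinned down by the worked example, so the sketch assumes that code $j$ flags bit $j-1$, counting from the least-significant bit. With the keystream nibbles 5, 3, B, 1, 4, 2, 3, 0 from Table 1, it reproduces the codewords 0x10 and 0x04 and the ciphertext bytes 0x43 and 0x27. The function names are illustrative.

```python
from typing import Iterator, Tuple


def discrepancy_code(x: int, k: int) -> int:
    """D_i for nibbles that differ in at most one bit.

    ASSUMED mapping: 0 for an exact match, otherwise 1 + index of the
    differing bit (LSB = index 0), so an MSB mismatch gives 4 = 100_2.
    """
    diff = x ^ k
    if bin(diff).count("1") > 1:
        raise ValueError("nibbles differ in more than one bit")
    return diff.bit_length()


def encrypt_nibble(x: int, ks: Iterator[int]) -> Tuple[int, int]:
    """Encode and encrypt one plaintext nibble; returns (C_i, Y_i).

    Sketch only: overflow events (F_i >= 32) are not handled.
    """
    mask = (next(ks) << 4) | next(ks)      # M_i: the next two keystream nibbles
    f = 0                                  # failure-counter F_i
    k = next(ks)
    while bin(x ^ k).count("1") > 1:       # no match to within one bit
        f += 1
        if f >= 32:
            raise NotImplementedError("overflow event not handled here")
        k = next(ks)
    codeword = (f << 3) | discrepancy_code(x, k)   # 5-bit F_i || 3-bit D_i
    return codeword, codeword ^ mask


ks = iter([0x5, 0x3, 0xB, 0x1, 0x4, 0x2, 0x3, 0x0])
print([hex(v) for v in encrypt_nibble(0x4, ks)])   # ['0x10', '0x43']
print([hex(v) for v in encrypt_nibble(0x8, ks)])   # ['0x4', '0x27']
```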

Overflow Events
This match-encode-encrypt cycle has one caveat: if a failure-counter (say $F_1$) is $\ge 32$, it can no longer be represented by 5 bits, and this overflow event triggers a special encoding process. First, an 0xFF byte is prepended to $C_1$; second, mask $M_1$ is expanded to include the next two available keystream nibbles, which would be $K_{35}$ and $K_{36}$ in this case. That is, $M_1 = K_1 \| K_2 \| K_{35} \| K_{36}$. Attempts to match $X_1$ then continue, starting from keystream nibble $K_{37}$ and $F_1 = 32$. When a match is found, its codeword will be two bytes instead of one: $C_1 = \mathtt{0xFF} \| (F_1 \bmod 32) \| D_1$. In the unlikely event that no match occurs even within the next 32 keystream nibbles, this overflow process can be repeated (i.e., both the codeword $C_1 = \mathtt{0xFF} \| \mathtt{0xFF} \| (F_1 \bmod 32) \| D_1$ and the mask $M_1 = K_1 \| K_2 \| K_{35} \| K_{36} \| K_{69} \| K_{70}$ would become three bytes long).
The overflow byte 0xFF is made by concatenating the 5-bit failure-counter $31 = 11111_2$ with the 3-bit symbol $111_2$. There is no theoretical reason to choose $111_2$: any 3-bit discrepancy-code not already in use could also serve (i.e., either $101_2$ or $110_2$). Practically, however, using $111_2$ allows for easy specification of the overflow indicator: if $n_O = \lfloor F_i/32 \rfloor$ is the number of overflow events that occur while encoding $X_i$, then
$$O_i = \begin{cases} \varnothing & \text{if } n_O = 0, \\ \underbrace{\mathtt{0xFF} \,\|\, \cdots \,\|\, \mathtt{0xFF}}_{n_O \text{ bytes}} & \text{if } n_O > 0. \end{cases}$$
In software, $\varnothing$ is implemented as an 'empty string'. For example, two overflow events means $O_i = \mathtt{0xFFFF}$ (i.e., sixteen 1 bits). Because of overflow events, each mask ($M_i$), codeword ($C_i$) and ciphertext symbol ($Y_i$) can be one or more bytes. Most of the time, however, there are no overflows, so $O_i$ is the empty string and each of these symbols is just one byte long.
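The overflow bookkeeping can be sketched as a few small Python helpers. Here `mask_positions` returns the 1-based indices of the keystream nibbles that form a mask starting at nibble `start`: each overflow event skips 32 failed nibbles and then claims two more, which reproduces $K_1, K_2, K_{35}, K_{36}, K_{69}, K_{70}$ for two overflows. The helper names are hypothetical, and the 3-bit discrepancy-code is simply taken as an input.

```python
from typing import List


def overflow_indicator(f: int) -> bytes:
    """O_i: one 0xFF byte per overflow event, n_O = floor(F_i / 32); empty if none."""
    return b"\xff" * (f // 32)


def codeword(f: int, d: int) -> bytes:
    """C_i = O_i || (F_i mod 32) || D_i, with a 5-bit counter and a 3-bit code."""
    if not 0 <= d <= 4:
        # only codes 000_2..100_2 are used; 111_2 with F mod 32 = 31 would equal 0xFF
        raise ValueError("discrepancy-code must be in 000_2 .. 100_2")
    return overflow_indicator(f) + bytes([((f % 32) << 3) | d])


def mask_positions(start: int, f: int) -> List[int]:
    """1-based keystream-nibble indices that make up mask M_i."""
    positions = [start, start + 1]
    for j in range(1, f // 32 + 1):
        # overflow j: skip 32 failed nibbles, then take the next two
        positions += [start + 34 * j, start + 34 * j + 1]
    return positions


print(mask_positions(1, 70))        # [1, 2, 35, 36, 69, 70]
print(codeword(70, 0).hex())        # ffff30
```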

Algorithmic Description
The PudgyTurtle encoding/encryption process can also be conceptualized as an algorithm. For each plaintext nibble $X_i$, once $F_i$ and the matching nibble $K_{t(i)}$ are known:
• Make the overflow indicator: $O_i$ consists of $\lfloor F_i/32 \rfloor$ copies of 0xFF ($\varnothing$ if $F_i < 32$);
• Make the discrepancy-code: $D_i$ is the 3-bit code for the mismatch pattern $X_i \oplus K_{t(i)}$;
• Make the codeword: $C_i = O_i \| (F_i \bmod 32) \| D_i$,
where $O_i$ is either not present or is a multiple of 8 bits, $F_i \bmod 32$ is 5 bits, and $D_i$ is 3 bits.

Table 1. Each column illustrates the encryption of one nibble of the plaintext message "Hi" (Row 1, ASCII characters 0x48 and 0x69). Row 2 shows the keystream nibbles. Row 3 depicts the two keystream nibbles set aside for each mask. The failure-counter (Row 4) increments from zero until a keystream nibble matches the plaintext nibble to within a 1-bit tolerance, as quantified by the Hamming distance in Row 5 (e.g., the first Hamming distance, between $K_3$ = 0xB = $1011_2$ and $X_1 = 4 = 0100_2$, equals 4). When this Hamming distance first becomes $\le 1$, a match occurs, the nearness of which is captured by the discrepancy-code (Row 6). For example, the notation "8 versus 0" above "100" means that $X_2 = 8 = 1000_2$ differs from its matching keystream nibble $K_8 = 0 = 0000_2$ in the most-significant bit, so the discrepancy-code is $100_2$. Row 7 shows how each codeword is built by concatenating the 5-bit failure-counter (normal font) and the 3-bit discrepancy-code (boldface). Finally, encryption is accomplished by XOR'ing the mask (shown again in Row 8, as binary) and the codeword, thus producing the ciphertext in Row 9.
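Putting the steps together, the following is a minimal Python sketch of PudgyTurtle encryption, including overflow events. It takes keystream as an iterable of 4-bit values (how a KSG produces them is left abstract), processes the high nibble of each byte first, and assumes a discrepancy-code of 0 for an exact match and $j$ for a mismatch in bit $j-1$ (only $000_2$ and $100_2$ are fixed by Table 1). The function name `pudgy_encrypt` is illustrative.

```python
from typing import Iterable, Iterator


def _discrepancy(x: int, k: int) -> int:
    # ASSUMED mapping: 0 = exact match, otherwise 1 + index of the differing bit
    return (x ^ k).bit_length()


def pudgy_encrypt(plaintext: bytes, keystream_nibbles: Iterable[int]) -> bytes:
    ks: Iterator[int] = iter(keystream_nibbles)
    out = bytearray()
    for byte in plaintext:
        for x in (byte >> 4, byte & 0xF):          # X_i, high nibble first
            mask = [next(ks), next(ks)]            # M_i starts as two nibbles
            f = 0                                  # failure-counter F_i
            k = next(ks)
            while bin(x ^ k).count("1") > 1:       # not yet matched within 1 bit
                f += 1
                if f % 32 == 0:                    # overflow event
                    mask += [next(ks), next(ks)]   # extend M_i by two nibbles
                k = next(ks)
            # C_i = O_i || (F_i mod 32) || D_i
            cw = b"\xff" * (f // 32) + bytes([((f % 32) << 3) | _discrepancy(x, k)])
            m = bytes((mask[j] << 4) | mask[j + 1] for j in range(0, len(mask), 2))
            out += bytes(c ^ b for c, b in zip(cw, m))   # Y_i = C_i XOR M_i
    return bytes(out)


print(pudgy_encrypt(b"H", [0x5, 0x3, 0xB, 0x1, 0x4, 0x2, 0x3, 0x0]).hex())  # 4327
```

Encrypting the single byte 0x00 against the keystream 0, 0, then thirty-two 3's, then six 0's exercises one overflow event and yields the three ciphertext bytes 0xFF, 0x00, 0x00.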

Decryption
One difference between PudgyTurtle decryption and encryption is that, because of overflow events, ciphertext symbols $Y_1, Y_2, \ldots$ may not all be the same length. Put another way, $N_Y$ always equals $N_X$, but ciphertext length $L_Y$ does not always equal $N_Y$ bytes. Thus, decryption requires a separate 'unmasking' of individual bytes within each ciphertext symbol.
The first byte of $Y$ is unmasked by XOR'ing it with $M_1 = K_1 \| K_2$, thereby producing the first byte of the first codeword. If this byte is not equal to 0xFF, then it is split into its first 5 bits (failure-counter $F_1 \bmod 32$) and its last 3 bits (discrepancy-code $D_1$). Next, $F_1 + 1$ new keystream nibbles are generated. The final one of these, $K_{t(1)}$, is the one that matches the original plaintext nibble to within one bit. The plaintext nibble is then recovered by inverting the discrepancy-code: writing $\Delta(D_1)$ for the 4-bit mismatch pattern that $D_1$ encodes (0000 for an exact match), $X_1 = K_{t(1)} \oplus \Delta(D_1)$, or more generally, $X_i = K_{t(i)} \oplus \Delta(D_i)$. If, however, unmasking the first byte of $Y$ produces 0xFF, then an overflow event has occurred (i.e., $Y_1$ is more than one byte long). In this case, 32 keystream nibbles must be generated and discarded, after which the next two keystream nibbles ($K_{35} \| K_{36}$) are used to unmask the second byte of $Y$ (i.e., $Y_{1,2}$), which is then split into $F_1 \bmod 32$ and $D_1$ as described above. (In the rare case that $Y_{1,2}$ also unmasks to 0xFF, this overflow process can be repeated.) $Y_2$ is decrypted in a similar manner, starting one nibble beyond the current keystream position. That is, the first byte of $Y_2$ is unmasked by XOR'ing it with $K_{t(1)+1} \| K_{t(1)+2}$, and, depending upon whether or not the result is 0xFF, analogous steps are followed. This byte-by-byte unmasking-decoding cycle continues for each of the $N_Y$ ciphertext symbols.
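A matching Python sketch of decryption follows the byte-by-byte unmasking described above. It assumes the same discrepancy-code convention as an assumed encoder would use (code 0 means an exact match; code $j$ flips bit $j-1$); the helper `_pattern` plays the role of inverting the discrepancy-code, and both it and `pudgy_decrypt` are illustrative names rather than a reference implementation.

```python
from typing import Iterable, Iterator, List


def _pattern(d: int) -> int:
    # ASSUMED inverse of the discrepancy-code: 0 -> 0000, j -> a single 1 at bit j-1
    return 0 if d == 0 else 1 << (d - 1)


def pudgy_decrypt(ciphertext: bytes, keystream_nibbles: Iterable[int]) -> bytes:
    ks: Iterator[int] = iter(keystream_nibbles)

    def unmask(i: int) -> int:
        # XOR ciphertext byte i with the next two keystream nibbles
        return ciphertext[i] ^ ((next(ks) << 4) | next(ks))

    nibbles: List[int] = []
    i = 0
    while i < len(ciphertext):
        byte = unmask(i)
        i += 1
        while byte == 0xFF:                 # overflow: Y_i has another byte
            for _ in range(32):
                next(ks)                    # discard 32 failed nibbles
            byte = unmask(i)
            i += 1
        for _ in range(byte >> 3):          # discard (F_i mod 32) failed nibbles
            next(ks)
        k = next(ks)                        # K_t(i)
        nibbles.append(k ^ _pattern(byte & 0b111))   # X_i = K_t(i) XOR pattern(D_i)
    if len(nibbles) % 2:
        raise ValueError("odd number of plaintext nibbles")
    return bytes((nibbles[j] << 4) | nibbles[j + 1] for j in range(0, len(nibbles), 2))


print(pudgy_decrypt(bytes([0x43, 0x27]), [0x5, 0x3, 0xB, 0x1, 0x4, 0x2, 0x3, 0x0]))  # b'H'
```

Note that the receiver never searches for matches: the failure-counter tells it exactly how many keystream nibbles to skip.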

Packet Systems
Some stream ciphers, like E0 for Bluetooth and A5/1 for mobile telephony, operate in packet mode: the keystream is generated in short segments, and re-synchronized with a new IV or counter after each such packet [23]. For example, A5/1 produces 228 bits of keystream at a time, after which its IV ('frame number') needs to be incremented. With a little extra book-keeping, PudgyTurtle can work with such systems. All that is required is to keep track of the number of available keystream nibbles remaining in the current packet, and then to re-synchronize whenever needed, whether during mask generation, plaintext-keystream matching, or an overflow event. The only constraint is that, since PudgyTurtle operates on nibbles, the packet size (in bits) must be a multiple of four.
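One way to hide the packet book-keeping is inside the keystream source itself. In the Python sketch below, `packet_keystream(frame)` is a hypothetical callback standing in for a KSG that has been re-synchronized with frame number `frame` and returns exactly `packet_bits` keystream bits. The generator re-synchronizes whenever a packet is exhausted, so mask generation, matching and overflow handling never need to know where packet boundaries fall.

```python
from itertools import islice
from typing import Callable, Iterator, List


def packet_nibbles(packet_bits: int,
                   packet_keystream: Callable[[int], List[int]]) -> Iterator[int]:
    """Yield keystream nibbles, re-synchronizing after each packet."""
    if packet_bits % 4 != 0:
        raise ValueError("packet size must be a multiple of four bits")
    frame = 0
    while True:
        bits = packet_keystream(frame)           # hypothetical re-synchronized KSG output
        if len(bits) != packet_bits:
            raise ValueError("unexpected packet length")
        for i in range(0, packet_bits, 4):
            nibble = 0
            for b in bits[i:i + 4]:
                nibble = (nibble << 1) | b       # first bit becomes the MSB
            yield nibble
        frame += 1                               # next IV / frame number


# Toy callback: every packet is 8 bits, all equal to the frame number's parity.
print(list(islice(packet_nibbles(8, lambda frame: [frame & 1] * 8), 6)))  # [0, 0, 15, 15, 0, 0]
```

For A5/1-style packets one would pass `packet_bits=228`, which yields 57 nibbles per frame.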

Statistics
PudgyTurtle's encoding procedure depends upon a random process with an underlying geometric distribution: each uniformly distributed keystream nibble either 'succeeds' in matching the current plaintext nibble or 'fails' to match. One success after $F$ failures occurs with probability $g(F, p) = (1-p)^F p$, where $p = 5/16$ describes the five ways a match can happen between two 4-bit symbols (i.e., one exact match plus four 1-bit mismatches).
The mean of this distribution (the expected number of failures) is $(1-p)/p = 2.2$. Counting the final, successful nibble as well, $1/p = (5/16)^{-1} = 3.2$ keystream nibbles (12.8 keystream bits) will be required, on average, to match each plaintext nibble.
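These figures are easy to check with exact rational arithmetic. The Python sketch below evaluates $g(F, p)$, the mean number of failures $(1-p)/p$, the mean number of nibbles per match $1/p$, and, as an extra illustration, the probability $(1-p)^{32}$ that a single plaintext nibble triggers at least one overflow event. It assumes keystream nibbles are independent and uniformly distributed.

```python
from fractions import Fraction

p = Fraction(5, 16)   # 1 exact match + 4 single-bit mismatches out of 16 nibble values


def g(f: int) -> Fraction:
    """Geometric probability of exactly f failures before the first match."""
    return (1 - p) ** f * p


mean_failures = (1 - p) / p               # E[F] = 11/5 = 2.2
mean_nibbles = mean_failures + 1          # E[F + 1] = 1/p = 16/5 = 3.2
p_overflow = (1 - p) ** 32                # P(F >= 32): at least one overflow

print(float(mean_failures), float(mean_nibbles), float(4 * mean_nibbles))
print(f"{float(p_overflow):.2e}")
```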

Ciphertext Length
Because of the probabilistic nature of the plaintext/keystream matching process, the ciphertext length is not known exactly until after encryption. For the plaintext, $L_X = 4N_X$ bits. For the ciphertext, however, $L_Y = 8(N_X + N_O)$ bits, where $N_O$ is the total number of overflow events. Thus, the ciphertext includes $(L_Y/8) - N_X = N_O$ 'extra' bytes due to the need to encode, on average, $N_O \approx N_X \cdot p_O$ overflow events.

Expansion Factors
The ciphertext expansion factor (CEF) can be written $\mathrm{CEF} = L_Y / L_X = 8(N_X + N_O)/(4N_X) = 2(1 + N_O/N_X) \approx 2$. The keystream expansion factor (KEF) is the amount of required keystream as a multiple of the plaintext length. For normal stream-cipher operation, KEF = 1. For PudgyTurtle, KEF ≈ 5.2. This value is obtained by adding 3.2 (the average number of keystream nibbles required to match each plaintext nibble) to 2 (the average number of nibbles used by each mask).
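A short Python sketch can turn these expectations into numbers, including the small correction from overflow events that the rounded values ignore. It assumes keystream nibbles are independent and uniform, so $F_i$ is geometric; the expected number of overflow events per plaintext nibble is then $\sum_{j \ge 1} (1-p)^{32j}$.

```python
from fractions import Fraction

p = Fraction(5, 16)
q = (1 - p) ** 32                        # P(F_i >= 32)
overflows_per_nibble = q / (1 - q)       # E[floor(F_i / 32)] = sum_{j>=1} q**j

# Each plaintext nibble (4 bits) becomes 1 + n_O ciphertext bytes (8 bits each).
cef = Fraction(8, 4) * (1 + overflows_per_nibble)

# Keystream nibbles per plaintext nibble: (F_i + 1) for matching, 2 + 2 n_O for the mask.
kef = 1 / p + 2 * (1 + overflows_per_nibble)

print(float(cef), float(kef))
```

The overflow correction is of order $10^{-5}$, which is why CEF ≈ 2 and KEF ≈ 5.2 are accurate in practice.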

Testing
These predictions were tested using three different plaintext sources: an English-language ASCII document ('English'); a JPEG-formatted digital photograph ('Image'); and a file entirely composed of 0x00 bytes ('Zeros'). A 1280000-byte sample of each plaintext was encrypted using Trivium [14] as the PudgyTurtle KSG, with session key 0x0123456789ABCDEF1234 and initial value 0x6666699999aaaaa55555. Results are shown in the left half of Table 2.
As expected, CEF ≈ 2 and KEF ≈ 5.2. PudgyTurtle ciphertext should appear random and uniformly distributed regardless of the underlying statistical structure of the plaintext. This was confirmed using two-sample Kolmogorov-Smirnov tests (right half of Table 2). Specifically, single-byte frequencies were compared between each pair of ciphertexts and also between each ciphertext and a collection of 2560012 uniformly distributed random bytes ('Random'). The non-significant p-values (Column 7) suggest that the ciphertexts are statistically indistinguishable from one another, and also from random data. At this level of scrutiny, PudgyTurtle does not appear to leak information about its underlying plaintext statistics.

Time Memory Tradeoffs
Here we introduce TMDTO attacks-especially those that target stream-ciphers, and review two in detail. Readers already familiar with these ideas may wish to skip to Sect. 6.

Background
A time-memory tradeoff is a general-purpose probabilistic method for solving certain problems in cryptology and computer science, like inverting a one-way function [16]. Consider $Y = E(X, key)$, which uses one of $N$ possible keys to encrypt $X$. If no direct way to compute $E^{-1}$ is known, then the cryptanalyst can instead attempt a brute-force solution.
In one such approach, the adversary chooses a likely string (e.g., $X'$ = "Dear Sir or Madam:") and encrypts this string under every possible key in advance. The resulting $N$ pairs $\{key_i, E(X', key_i)\}$ are stored in a large table. Upon intercepting the ciphertext, the actual key can then quickly be found by searching $Y$ for any substring that matches an $E(X', key_i)$ in the table. An alternative approach assumes that the attacker knows some of the plaintext. The corresponding portion of $Y$ is then decrypted under every possible key until the result matches this known plaintext. Either way, it would seem that brute-forcing the key requires $N$ memory units (to store the table) or $N$ time units (to perform the decryptions), implying that such an attack could be foiled by choosing a large-enough $N$. However, this need not be the case: TMDTO attacks can efficiently cover enough of the search space that the probability of success becomes $\gg 0$ while the complexity remains $\ll N$ [29].

TMDTO Attacks
TMDTO attacks against stream ciphers proceed in two phases: a precomputation phase, during which one or more tables are constructed from a set of randomly chosen KSG-states; and a realtime phase, during which the table(s) are searched for fragments of known keystream. Tradeoff curves involve several parameters [27]:
• $N$ is the size of the key-space. For stream ciphers, $N = |S|$, the number of possible KSG-states;
• $P$ is the time required for the precomputation phase;
• $M$ is the amount of memory required to store the precomputed table(s);
• $T$ is the time required to complete the realtime phase;
• $D$ is the amount of plaintext known to (or chosen by) the attacker.
For PudgyTurtle, one more parameter is also useful:
• $D'$ is the amount of realtime data.
For binary-additive stream ciphers, $D' = D$. For PudgyTurtle, however, $D'$ and $D$ generally differ. Distributed computing can improve the efficiency of many tradeoffs. These effects can be described with another parameter, $W$, representing the number of parallel processors [27]. For block ciphers, Hellman proposed a tradeoff of $TM^2 = N^2$, using one chosen plaintext ($D = 1$) and a very long precomputation phase ($P = N$) [26]. One reasonable point on this curve is $M = T = N^{2/3}$. Oechslin's 'rainbow table' method [44] somewhat improves Hellman's tradeoff and reduces its need for time-consuming disk-access operations [37].
For stream ciphers, Babbage [5] and Golić [22] independently developed the 'BG-attack', whose tradeoff of $TM = N$ (with $P = M$ and $T \le D$) arises from the Birthday Paradox. Here, the point $M = T = N^{1/2}$ appears more efficient than Hellman's $N^{2/3}$. However, direct comparisons can be misleading: one attack targets block ciphers, and the other stream ciphers; one uses a single chosen plaintext/ciphertext pair, and the other exploits more realtime data; one uses multiple tables, and the other just one; and so on. Biryukov and Shamir adapted some of Hellman's methods to create another TMDTO attack against stream ciphers, which accounts for $D$ in detail [12]. The tradeoff of this 'BS-attack' is $TM^2D^2 = N^2$, with $P = N/D$ and $D^2 \le T \le N$. For example, one point on this curve, assuming $N \approx 2^{100}$, is $P = T = N^{2/3} \approx 2^{66}$ and $M = D = N^{1/3} \approx 2^{33}$. The BS-attack uses many tables, all related through a simple function like bit permutation.

Table 2. PudgyTurtle encryptions of three different 1280000-byte plaintexts: an ASCII-formatted English-language book, a JPEG image, and a file containing only 0x00 bytes. Shown here are the ciphertext length (Column 2), ciphertext expansion factor (CEF, Column 3), and keystream expansion factor (KEF, Column 4). As expected, CEF ≈ 2 and KEF ≈ 5.2. The right half of this table (Columns 5-7) reports two-sample Kolmogorov-Smirnov tests comparing the byte-distribution among the different ciphertexts, and also between each ciphertext and a file of 2560012 uniformly-distributed random bytes ('Random'). The non-significant p-values (Column 7) suggest that the ciphertexts do not statistically differ from one another, nor from random bytes.
The goal of both the BG- and BS-attacks is to recover a KSG state. However, TMDTO attacks designed to recover the secret key/IV combination have been proposed by Hong and Sarkar [30] and discussed by Dunkelman and Keller [18]. Tradeoffs in these approaches take a general form given in [27].
Another improvement in TMDTO attacks is sampling [13], whose main idea is to limit the attack to a smaller space of special points (e.g., KSG states that begin with a certain number of 0's in a row) [4]. Limiting the precomputed table to these points speeds up the realtime phase, since a table-search is only required when the known keystream fragment also happens to start with this string. This method offers different advantages against different ciphers: the tradeoff curve itself may change; its range of parameters may expand; and/or practical speedups (e.g., fewer disk-access operations) may become possible [31,51].

The BG-Attack
Here we describe in detail the original BG-attack [5,22]. During the precomputation phase, $M$ unique $n$-bit starting states $S_i$ are chosen. Each of these is used to initialize the KSG, after which its prefix, $e(S_i)$ (i.e., the first $n$ bits of keystream), is computed. The $\{S_i, e(S_i)\}$ pairs are then stored in an $M \times 2$ table, sorted by prefix.
During the realtime phase, it is assumed that the adversary possesses $D + n - 1$ bits of known plaintext, $X'$. From these data, the known keystream, $K'$, is obtained by XOR'ing $X'$ and the ciphertext at the appropriate position. Then, starting at bit-offset $a = 1$, an $n$-bit sliding window is applied to $K'$ to produce a known keystream fragment, $K'(a) = \{k_a, k_{a+1}, k_{a+2}, \ldots, k_{a+n-1}\}$. The table is searched for any prefixes that match this fragment. If none are found, the sliding window is advanced by one position, and the table is searched for $K'(a+1)$, a process which may be repeated up to $D$ times. If a matching prefix $e(S')$ is found, then its paired state, $S'$, likely reflects the KSG at some point during encryption. If keystream obtained by seeding the KSG with $S'$ correctly decrypts the relevant portion of $Y$ into $X'$, then the attack succeeds.
The tradeoff curve $TM = N$ suggests that the original search space can be covered more efficiently than by exhaustive search. For example, time and memory resources can be balanced by choosing $T = M = \sqrt{N}$. More generally, letting $N = 2^n$, $M = 2^m$ and $T = 2^t$, other tradeoffs can also be made, subject to $m + t \approx n$.
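The following Python sketch runs the BG-attack end to end against a deliberately tiny, hypothetical 16-bit KSG (an LCG update and a top-bit output, chosen only because $N = 2^{16}$ keeps everything fast). A dictionary stands in for the sorted table, and every table hit is checked against the rest of the known keystream to reject false alarms. The parameters $M = D = 2^{12}$ are far more generous than the balanced point $M = T = 2^8$: with $MD/N = 256$, an overlap between table states and realtime states is all but certain, which keeps the demonstration deterministic in practice.

```python
import random
from typing import Dict, List, Optional, Tuple

N_BITS = 16                                   # toy state size n; N = 2**16


def phi(s: int) -> int:
    """Hypothetical toy update function (full-period 16-bit LCG, not secure)."""
    return (s * 25173 + 13849) & 0xFFFF


def keystream(s: int, length: int) -> List[int]:
    """k_1, k_2, ... from state s: update, then output the top state bit."""
    bits = []
    for _ in range(length):
        s = phi(s)
        bits.append(s >> (N_BITS - 1))
    return bits


def to_int(bits: List[int]) -> int:
    v = 0
    for b in bits:
        v = (v << 1) | b
    return v


def prefix(s: int) -> int:
    """e(S): the first n keystream bits from state S, packed into an integer."""
    return to_int(keystream(s, N_BITS))


def precompute(m: int, rng: random.Random) -> Dict[int, List[int]]:
    """Store {prefix: [states]} for M distinct random states."""
    table: Dict[int, List[int]] = {}
    for s in rng.sample(range(1 << N_BITS), m):
        table.setdefault(prefix(s), []).append(s)
    return table


def bg_attack(table: Dict[int, List[int]],
              known: List[int]) -> Optional[Tuple[int, int]]:
    """Slide an n-bit window over known keystream; return (offset, state)."""
    for a in range(len(known) - N_BITS + 1):
        for s in table.get(to_int(known[a:a + N_BITS]), []):
            if keystream(s, len(known) - a) == known[a:]:   # reject false alarms
                return a, s
    return None


rng = random.Random(2020)
table = precompute(1 << 12, rng)                     # M = 2**12
secret = rng.randrange(1 << N_BITS)                  # unknown S_0
known = keystream(secret, (1 << 12) + N_BITS - 1)    # D + n - 1 known bits
print(bg_attack(table, known))
```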

The BS-Attack
Biryukov and Shamir's method (the so-called 'BS-attack') expands Hellman's time-memory tradeoff for block ciphers into the realm of stream ciphers [12]. Unlike Hellman's original idea, however, which assumed a single block of chosen plaintext ( D = 1 ), the BS-attack allows attackers to take full advantage of D bits of known plaintext.
Since $D$ may be constrained by factors external to the cryptosystem itself, it is taken as a predetermined 'given' from which the other parameters are calculated. After specifying $D$, the cryptanalyst next chooses $m$ and $t$ (explained below), which satisfy Hellman's 'matrix-stopping rule': $mt^2 = N$. During the precomputation phase, $t/D$ tables are constructed, each of dimension $m \times 2$. The first column's entries, called 'start points' ($SP_i$), are $m$ unique, randomly selected $n$-bit KSG-states. The second column's entries, called 'end points' ($EP_i$), are obtained by applying a function $f$ $t$ times to each corresponding start point: $EP_i = f^t(SP_i) = f(f(\cdots f(SP_i) \cdots))$, where $f: \{0,1\}^n \rightarrow \{0,1\}^n$ is explained below. The intermediate results of this composition of functions are called a Hellman chain. To save memory, only the first and last links of each chain need to be stored, but, if required, any link can be regenerated from the start point. Each row thus 'covers' $t$ keys, and an $m$-row table covers $mt$ keys while only requiring $m \cdot 2n$ bits of storage.
The function $f(S)$ is itself composed of two other functions, $e$ and $r$, where $e(S)$ is the first $n$ bits of keystream (the 'prefix') produced by the KSG from state $S$, and $r$ changes this prefix in some simple way, like permuting its bits or XOR'ing it with a constant. Each table has a unique version of $r$, so the $z$-th table uses $f_z(S) = r_z(e(S))$. During the realtime phase of the BS-attack, known keystream $K'$ is split into successive $n$-bit fragments, $K'(a)$, where $a = 1, 2, \ldots, D$. For each fragment, the search begins by checking whether or not $r_z(K'(a))$ matches an end point of any table. If no matches are found, then the adversary modifies the search target by one application of $f$, and now searches the end points for $f_z(r_z(K'(a)))$. The attacker may repeat this, searching through a so-called realtime Hellman chain by applying $f$ up to $t$ times. If still no match has been discovered, then the next known-keystream fragment, $K'(a+1)$, is processed the same way, until all $D$ fragments have been tried.
When a match is found, the attacker wishes to find the KSG-state, $S'$, for which $K'(a)$ is the prefix: $K'(a) = e(S')$. The first step is to regenerate the appropriate precomputed chain. For example, suppose that the $\ell$-th realtime application of $f$ matches the $i$-th end point of the $z$-th precomputed table: $f_z^{\ell}(r_z(K'(a))) = EP_i$. The adversary then reconstructs the $i$-th precomputed chain (by $t - \ell$ applications of $f_z$ to $SP_i$) to 'meet' the beginning of their realtime chain, $f_z^{t-\ell}(SP_i) = r_z(K'(a))$. Finally, the desired result is the precomputed-chain state immediately preceding this one: $S' = f_z^{t-\ell-1}(SP_i)$. The cryptanalyst knows that the first $n$ bits of keystream generated by KSG-state $S'$ will equal $K'(a)$, since the attack has been set up so that $f_z(S') = r_z(e(S')) = r_z(K'(a))$, and therefore (because $r_z$ is invertible) $e(S') = K'(a)$. If this new keystream correctly decrypts the message, the attack succeeds. Otherwise, a false alarm has been discovered, and the attack continues.
The BS time-memory-data tradeoff can now be appreciated in more detail. Taking each table-search as one 'time-operation', the BS-attack requires searching t/D different tables, for each of D known keystream fragments, and repeating each search for t applications of f, so that $T = (t/D) \cdot D \cdot t = t^2$. For t/D tables containing m rows each, the memory requirement is $M = mt/D$. Thus, from Hellman's matrix-stopping rule $N = mt^2$, the BS-tradeoff is $TM^2D^2 = N^2$. It is important to note that Biryukov and Shamir's approach does not require multiple tables. Rather, the number of tables (t/D) simply factors into the tradeoff: using one table means performing the attack with a relatively bigger table and a relatively shorter realtime phase, for a given D. The toy cipher used in Sect. 9 has low enough computational and memory requirements that a 'one-table' tradeoff (e.g.,
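The bookkeeping behind this tradeoff can be checked numerically. The sketch below assumes the one-table setting m = t = D = 256 (so $N = 2^{24}$), an illustrative choice rather than a parameter set prescribed here; exact fractions keep t/D from being rounded.

```python
from fractions import Fraction

def bs_costs(m: int, t: int, D: int):
    """Realtime time T and memory M for Biryukov-Shamir with t/D tables of m rows."""
    n_tables = Fraction(t, D)
    T = n_tables * D * t          # tables x fragments x chain steps = t^2
    M = n_tables * m              # = m t / D
    N = m * t * t                 # Hellman's matrix-stopping rule N = m t^2
    return N, T, M

D = 256
N, T, M = bs_costs(m=256, t=256, D=D)   # one table: t / D = 1
print(N, T, M)                          # 16777216 65536 256
assert T * M**2 * D**2 == N**2          # the BS tradeoff T M^2 D^2 = N^2
```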

PudgyTurtle and Collision Attacks
This section describes some of the challenges associated with TMDTO attacks against PudgyTurtle. As we have seen, the adversary has known plaintext X′, but what the attacker actually needs is known keystream, K′. This observation reveals two (sometimes unstated) assumptions behind TMDTO attacks:
• Known plaintext equals known keystream. With binary-additive stream ciphers, K′ can be easily obtained by XOR'ing the intercepted ciphertext with X′.
• Known keystream is contiguous, or at least predictably spaced [27]. TMDTO attacks involve successively applying a sliding window to K′, thereby obtaining targets to search for within the precomputed table(s). It is assumed that an n-bit window produces n bits of useful data.
With PudgyTurtle, neither assumption holds. First, because the plaintext-keystream interaction during encoding is probabilistic, a single known plaintext-ciphertext pair is consistent with many different keystreams. Second, because some keystream nibbles are skipped during encryption, each keystream fragment contains irregularly spaced gaps of data which remain unknown to the attacker. By making it harder to obtain K′ from X′, PudgyTurtle makes the realtime phase of TMDTO attacks more difficult. Central to this discussion is the idea that a particular (X, Y) pair can be consistent with many different keystreams. To see how this is possible, recall from Table 1 that plaintext 0x48 ("H" in ASCII) produced codewords {0x10, 0x04} and ciphertext 0x4327 under keystream 0x53B14230.
Yet this same ciphertext could also have resulted from different encodings of plaintext 0x48 under different keystreams. For example, the keystream $\{K_1, K_2, X_1, K_4, K_5, X_2\}$ exactly matches each plaintext nibble on the first attempt (i.e., no failures), producing two 0x00 codewords. Therefore, if the masks $(K_1\|K_2)$ and $(K_4\|K_5)$ were chosen to be the same as their corresponding nibbles in the original ciphertext (i.e., keystream {4,3,4,2,7,8}), then Y will also be 0x4327: $Y = (\mathtt{0x00} \oplus \mathtt{0x43}) \,\|\, (\mathtt{0x00} \oplus \mathtt{0x27}) = \mathtt{0x4327}$. Similarly, if a nibble within one mask happened to be one bit off, then the same ciphertext would still result if its corresponding discrepancy code were also one bit off. This might happen, for instance, if the second mask $(K_4\|K_5)$ were 0x26 instead of 0x27, and the second discrepancy code were $001_2$ instead of $000_2$, meaning that $K_6$ would be $X_2 \oplus 0001_2 = 1000_2 \oplus 0001_2 = 1001_2 = \mathtt{0x9}$ instead of 0x8. Thus, keystream {4,3,4,2,6,9} would also produce the same ciphertext: $Y_2$ would remain 0x27, but would be calculated as $Y_2 = \mathtt{0x01} \oplus \mathtt{0x26} = \mathtt{0x27}$. Many other keystreams would also produce this same ciphertext.
A few examples are given in Table 3.
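The collision argument can be checked with a small encoder. This is a sketch under assumptions inferred from the worked examples, not a reference implementation: each codeword is a 5-bit failure counter followed by a 3-bit discrepancy code (so (2‖0) = 0x10 and (0‖4) = 0x04), a discrepancy code d ≥ 1 means the keystream nibble differs from the plaintext nibble in bit d − 1 ($0001_2$ → 1, $1000_2$ → 4), the two mask nibbles precede the search, and overflow events are not modelled.

```python
from typing import List, Optional

def discrepancy_code(x: int, k: int) -> Optional[int]:
    """0 for an exact match, 1-4 if k differs from x in exactly one bit, else None."""
    diff = x ^ k
    if diff == 0:
        return 0
    if diff & (diff - 1) == 0:         # exactly one bit set
        return diff.bit_length()       # 0001 -> 1, 0010 -> 2, 0100 -> 3, 1000 -> 4
    return None

def pudgyturtle_encrypt(plain: List[int], ks: List[int]) -> List[int]:
    """Encode and encrypt plaintext nibbles; returns one ciphertext byte per nibble.
    Overflow events (32 or more failures) are not modelled in this sketch."""
    out, pos = [], 0
    for x in plain:
        mask = (ks[pos] << 4) | ks[pos + 1]   # the two mask nibbles come first
        pos += 2
        failures = 0
        while (d := discrepancy_code(x, ks[pos])) is None:
            failures += 1                     # this keystream nibble is skipped
            pos += 1
            if failures > 31:
                raise ValueError("overflow event: not handled in this sketch")
        pos += 1                              # consume the matching nibble
        codeword = (failures << 3) | d        # 5-bit failure counter || 3-bit code
        out.append(codeword ^ mask)
    return out

for ks in ([0x5, 0x3, 0xB, 0x1, 0x4, 0x2, 0x3, 0x0], [4, 3, 4, 2, 7, 8], [4, 3, 4, 2, 6, 9]):
    print(bytes(pudgyturtle_encrypt([0x4, 0x8], ks)).hex())   # 4327 each time
```

All three keystreams discussed above (the original 0x53B14230 and the two alternatives) encrypt plaintext 0x48 to the same ciphertext 0x4327.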

Tentative Keystream
TMDTO attacks against PudgyTurtle may have to contend with many possible keystreams, rather than just the single 'known keystream' required to attack a BASC. Here, we describe how the adversary builds a set of tentative keystreams from the intercepted ciphertext, the known plaintext, a hypothesized encoding, and something called the 'verified sequence'. These tentative keystreams become the input to our new TMDTO attacks against PudgyTurtle. The Model. Tentative keystreams are based on different models of how X′ is encoded. Ignoring overflow events for now, each model, $C^j \subset C$, is a collection of codewords, one failure-counter and discrepancy code for each nibble of known plaintext: $C^j = \{(F^j_1\|D^j_1), (F^j_2\|D^j_2), \ldots\}$, where j = 1, 2, …, |C|. We emphasize that $F^j_i$ and $D^j_i$ are just guesses, not necessarily the actual failure-counter ($F_i$) and discrepancy code ($D_i$) that produced $Y_i$ from $X'_i$. To specify a particular model, $C^*$, the components of each codeword, $F^*_i \in \{0, 1, 2, \ldots, 31\}$ and $D^*_i \in \{0, 1, 2, 3, 4\}$, can either be chosen randomly or taken from a list, perhaps ordered by probability of occurrence. Since discrepancy codes are equiprobable, models can be ranked according to the product of the probabilities of their failure-counters: $\Pr(C^*) \propto \prod_i g(F^*_i, p)$, where g(F, p) is the geometric distribution with F failures and p = 5/16. As a specific example, assuming that X′ is 4 nibbles long, we use a randomly chosen model $C^*$ whose first codeword is $(F^*_1\|D^*_1) = (1\|3)$, and so on. Verified sequence. Given our model $C^*$ from above, the next step is to build its verified sequence, $V^*$. This sequence marks which nibbles of the tentative keystream can be predicted by the model, and which ones remain unknown (i.e., keystream nibbles that would have been skipped over and discarded during encoding because they failed to match a plaintext nibble).
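The model-ranking rule described above can be sketched with exact fractions. The sketch assumes $g(F, p) = (1-p)^F p$ with p = 5/16, the geometric distribution used elsewhere in the text, and ignores overflow (F ≤ 31); the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product
from typing import List, Sequence, Tuple

P_MATCH = Fraction(5, 16)   # chance a keystream nibble lies within one bit of the plaintext nibble

def g(F: int, p: Fraction = P_MATCH) -> Fraction:
    """Geometric probability of exactly F failures before the first near-match."""
    return (1 - p) ** F * p

def model_weight(failures: Sequence[int]) -> Fraction:
    """Relative probability of a model: product of its failure-counter probabilities.
    Discrepancy codes are equiprobable, so they only contribute a constant factor."""
    w = Fraction(1)
    for F in failures:
        w *= g(F)
    return w

def ranked_failure_patterns(n_nibbles: int, max_F: int) -> List[Tuple[int, ...]]:
    """All failure-counter patterns with each F <= max_F, most probable first."""
    patterns = product(range(max_F + 1), repeat=n_nibbles)
    return sorted(patterns, key=model_weight, reverse=True)

print(model_weight([0, 0]))                 # 25/256
print(ranked_failure_patterns(2, 1))        # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Because the product depends only on the total number of failures, this ordering amounts to listing models by increasing $\sum_i F^*_i$; ties keep their enumeration order.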
Specifically, the verified sequence is defined nibble by nibble: $V^*_i = \mathtt{0xF} = 1111_2$ if the i-th tentative keystream nibble can be predicted by the model, and $V^*_i = \mathtt{0x0} = 0000_2$ otherwise. This means that each nibble of X′ adds three 0xF nibbles to $V^*$: two coinciding with the mask, and one positioned where the keystream nibble would have matched the plaintext nibble. The number of intervening 0-nibbles corresponds to the failure-counter. For example, if some failure-counter in the model were 3, its corresponding representation in $V^*$ would be …FF000F…. Similarly, a failure-counter of 0 would correspond to …FFF… in $V^*$, and so on. Thus, the Hamming weight of the verified sequence is $h(V^*) = 3L_{X'}$, where $L_{X'}$ is the length of the known plaintext in bits. In the verified sequence for our model $C^*$, #'s mark the verified-sequence nibbles associated with each mask, $F^*$ shows the progression of each failure-counter, and *'s mark the verified-sequence nibble associated with each plaintext-keystream match:
[Diagram: verified sequence $V^*$ for model $C^*$]
Filling in. Finally, the tentative keystream $K'^*$ is created by filling in the non-zero elements of $V^*$. To illustrate this process, assume that the known plaintext X′ is "Hi" (ASCII 0x4869) with corresponding ciphertext 0xEE7D22C3. Since the first ciphertext byte $Y_1 = \mathtt{0xEE}$ is made by XOR'ing the first codeword $(F^*_1\|D^*_1)$ with the first mask $(K'^*_1\|K'^*_2)$, the attacker knows that $(K'^*_1\|K'^*_2) = Y_1 \oplus (F^*_1\|D^*_1) = \mathtt{0xEE} \oplus \mathtt{0x0B} = \mathtt{0xE5}$. Because $F^*_1 = 1$, the cryptanalyst deduces that one tentative keystream nibble ($K'^*_3$) was skipped because it failed to match $X'_1$, and that the next tentative keystream nibble ($K'^*_4$) matched $X'_1$ to within one bit. Its value can therefore be filled in by inverting discrepancy code $D^*_1 = 3$: $K'^*_4 = X'_1 \oplus 0100_2 = 0100_2 \oplus 0100_2 = \mathtt{0x0}$. At this point, the first codeword has been used to fill in the first 4 nibbles of tentative keystream $K'^* = \mathtt{E}, \mathtt{5}, ?, \mathtt{0}, \ldots$, where ? represents the 'unknown' nibble corresponding to $V^*_3 = 0$. The complete tentative keystream can be constructed by continuing this pattern, as shown in the final row of the accompanying diagram:
[Diagram: tentative keystream $K'^*$ for model $C^*$, nibble positions 1–16]
To summarize, we have described how the adversary builds a tentative keystream from one particular 4-codeword model, in conjunction with a 4-nibble known plaintext and its corresponding 4-byte ciphertext. This particular tentative keystream contains 16 nibbles, of which 12 can be predicted ('verified') and 4 remain unknown. Different models would produce other tentative keystreams, a collection of which become inputs for the realtime phase of our TMDTO attacks against PudgyTurtle.

TMDTOs and PudgyTurtle
After clarifying some terminology, we describe two new TMDTO attacks against PudgyTurtle (i.e., 'modified' versions of the BG- and BS-attacks), and also discuss how these new attacks differ from their original counterparts.

Terminology
During the realtime phase of our attacks, a hit refers to any instance in which an n-bit fragment of realtime data (tentative keystream) matches an entry in the second column of the precomputed table. Every hit falls into one of two categories: high-quality and spurious. High-quality hits are cryptographically significant events and must therefore be investigated further via a test-decryption. Spurious hits, on the other hand, occur by chance and may therefore simply be ignored. Each high-quality hit has one of two outcomes: a valid hit leads to a correct decryption of some portion of Y into X′; a false alarm, on the other hand, means that the test-decryption was not correct.
If any table contains one or more valid hits, the attack is deemed a success; otherwise it is a failure. Failures happen either when there are no high-quality hits (i.e., all hits are spurious) or when all high-quality hits are false alarms.

Hamming-Weight Threshold
During the normal BG- and BS-attacks, all hits are assumed to be high-quality; none are ignored. During our new TMDTO attacks, however, spurious hits abound. The reason for this is that an 'unknown' nibble from the tentative keystream fragment (marked as '?' in the illustrations in Sect. 6.1) could theoretically match any similarly located nibble in the precomputed table.
Fortunately, a simple technique allows the attacker to distinguish high-quality hits from spurious ones. A Hamming-weight threshold, τ ≤ n, is applied to each n-bit fragment of the verified sequence. If the Hamming weight of a verified-sequence fragment is ≥ τ (i.e., if it has at most n − τ 'unknown' bits), then any hits discovered while investigating its corresponding n bits of tentative keystream are defined as high-quality. Using τ avoids many unnecessary test-decryptions, thereby speeding up our TMDTO attacks considerably.
The threshold τ is chosen to reduce the total number of hits to a more reasonable number of high-quality hits, subject to constraints imposed by the attacker's processing power. In practice, mid-range values of τ (e.g., 14 ≤ τ ≤ 16 for n = 24) appear to generate an adequate number of high-quality hits while also allowing reasonably fast computation. In the numerical experiments below (Sect. 9), choosing τ in this range produced roughly 200–400 high-quality hits per model, with successful attacks taking ∼1 day to 1 week on off-the-shelf laptops with Intel® i3/i5-generation processors. We emphasize that the exact value of this parameter is not crucial: τ could always be chosen as n/2, for instance, if the attacker is willing to wait somewhat longer for their results.
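The threshold test presupposes a verified sequence and a tentative keystream built from a model, so the sketch below does both. `fill_in` follows the filling-in procedure for the "Hi" example (only the first codeword, (1‖3), comes from the text; the other three are hypothetical, chosen so that 4 of the 16 nibbles are unknown), and `high_quality_offsets` flags the n-bit windows whose verified-sequence weight reaches τ. It assumes the codeword layout and discrepancy-code convention inferred from the worked examples (5-bit failure counter, 3-bit code, code d flipping bit d − 1).

```python
from typing import List, Optional, Sequence, Tuple

def fill_in(model: Sequence[Tuple[int, int]], plain: Sequence[int],
            cipher: Sequence[int]) -> Tuple[List[Optional[int]], List[int]]:
    """Build tentative keystream K' (None = unknown nibble) and verified sequence V."""
    K: List[Optional[int]] = []
    V: List[int] = []
    for (F, D), x, y in zip(model, plain, cipher):
        mask = y ^ ((F << 3) | D)              # mask byte = Y_i XOR (F || D)
        K += [mask >> 4, mask & 0xF]
        V += [0xF, 0xF]
        K += [None] * F                        # skipped nibbles stay unknown
        V += [0x0] * F
        K.append(x ^ (0 if D == 0 else 1 << (D - 1)))   # invert the discrepancy code
        V.append(0xF)
    return K, V

def nibbles_to_bits(nibbles: Sequence[Optional[int]]) -> List[int]:
    """MSB-first bit expansion; unknown nibbles become 0 bits (they are masked anyway)."""
    return [((v or 0) >> s) & 1 for v in nibbles for s in (3, 2, 1, 0)]

def high_quality_offsets(V: Sequence[int], n: int, tau: int) -> List[int]:
    """0-based offsets a whose n-bit verified-sequence window has Hamming weight >= tau."""
    bits = nibbles_to_bits(V)
    return [a for a in range(len(bits) - n + 1) if sum(bits[a:a + n]) >= tau]

model = [(1, 3), (1, 0), (2, 1), (0, 4)]   # first codeword from the text; rest hypothetical
K, V = fill_in(model, [0x4, 0x8, 0x6, 0x9], [0xEE, 0x7D, 0x22, 0xC3])
print(K[:4])                                # [14, 5, None, 0]
```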

First TMDTO Attack
Our first attack is inspired by the Babbage-Golić method, but differs substantially in its use of multiple keystream models containing partly unknown data.
(a) Set a, the offset of an n-bit sliding window, to a = 1;
(b) Apply the sliding window to $V^j$, producing the n-bit verified-sequence fragment $V^j(a)$;
(c) If $h(V^j(a)) < \tau$, then assume that any hits will be spurious. Instead of searching the table, set a ← a + 1 and return to 4(b);
(d) Assuming $h(V^j(a)) \geq \tau$, apply the sliding window to the tentative keystream to produce the n-bit fragment $K'^j(a)$;
(e) Bit-by-bit multiply $K'^j(a)$ by $V^j(a)$. We denote this by $K'^j(a) \otimes V^j(a)$ and refer to the result as the verified tentative keystream fragment. This ensures that any bits left 'undefined' in software implementations are actually set to 0;
(f) Adjust each prefix in the precomputed table the same way, thus creating M verified prefixes $e(S_i) \otimes V^j(a)$;
(g) Search the table for any verified prefixes that match the verified tentative keystream fragment. For simplicity, we imagine a sequential (row-by-row) search;
(h) If a matching verified prefix ('high-quality hit') is found, then use its paired state (S′) to perform a test-decryption:
• Load the KSG with S′;
• Generate enough keystream to decrypt Y into as much of X′ as possible. For standard stream-cipher operation, decryption is as simple as XOR'ing this newly generated keystream with Y (starting at offset a). For PudgyTurtle, however, some computational effort is required to determine the 'first decryptable' byte of Y, as described in Sect. 7.3.1.
• If the test-decryption matches X′, then a valid hit has been discovered: label the attack a success and STOP.
• If the test-decryption does not match X′, then a false alarm has occurred: return to Step 4(g) to continue searching for the same verified tentative keystream fragment, starting from the next row of the table;
(i) If no high-quality hits were discovered, or if they were all false alarms, then try again with the next tentative keystream fragment: set a ← a + 1 and go to 4(b). This can be repeated up to D′ times, where D′ = KEF × D ≈ 5.2D.

5. If the entire tentative keystream has been searched without finding any valid hits, then repeat the whole process with a new model: set j ← j + 1 and return to Realtime phase, Step 1;
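Steps 4(a) through (i) for a single model can be sketched as follows. The sketch assumes a table stored as a list of (state, prefix) rows, bit lists standing in for $V^j$ and $K'^j$ (unknown keystream bits stored as 0), and 0-based offsets; `test_decrypt` is a hypothetical callback standing in for the PudgyTurtle test-decryption, including the state adjustment.

```python
from typing import Callable, Optional, Sequence, Tuple

def bits_to_int(bits: Sequence[int]) -> int:
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def modified_bg_search(table: Sequence[Tuple[int, int]], K_bits: Sequence[int],
                       V_bits: Sequence[int], n: int, tau: int,
                       test_decrypt: Callable[[int, int], bool]) -> Optional[Tuple[int, int]]:
    """One model's realtime loop; returns (state, offset) of a valid hit, or None."""
    for a in range(len(K_bits) - n + 1):                 # steps (a), (b), (i)
        v = bits_to_int(V_bits[a:a + n])
        if bin(v).count("1") < tau:                      # step (c): hits would be spurious
            continue
        target = bits_to_int(K_bits[a:a + n]) & v        # steps (d), (e)
        for state, prefix in table:                      # step (g): sequential search
            if prefix & v == target and test_decrypt(state, a):   # steps (f), (h)
                return state, a                          # valid hit: STOP
    return None                                          # step 5: move to another model

table = [(3, 0x96), (7, 0xB6)]               # (state, prefix) rows of a toy table
K_bits = [1, 0, 0, 1, 0, 1, 1, 0]            # third bit unknown, stored as 0
V_bits = [1, 1, 0, 1, 1, 1, 1, 1]
print(modified_bg_search(table, K_bits, V_bits, 8, 6, lambda s, a: s == 7))   # (7, 0)
```

In the usage lines, the first row's verified prefix also matches but fails the (toy) test-decryption, so the search continues to the next row, as in step (h).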

Adjusting the State
The output of this attack consists of a putative KSG-state S′ and the bit-offset, a, of its corresponding tentative keystream fragment, K′(a). A test-decryption must then be performed to determine whether keystream produced from S′ yields a valid hit or a false alarm. For the standard attacks against a BASC, test-decryptions are simple: generate keystream starting from S′, XOR this new keystream to the ciphertext at bit a, and then compare the result to the known plaintext, also starting from a. With PudgyTurtle, however, test-decryptions are more complicated. Since PudgyTurtle operates on nibbles (not bits) and 'skips' some of the keystream, the cryptanalyst cannot simply decrypt starting from bit a, but must instead determine A, the offset of the 'first decryptable' ciphertext byte.
To simplify notation, we drop the superscript j and just consider the model currently being analyzed. Define Z(i) as the number of keystream nibbles required to encode/encrypt the plaintext $\{X_1, X_2, \ldots, X_i\}$ up to its i-th nibble, using the model $\{C_1, C_2, \ldots\} = \{(F_1\|D_1), (F_2\|D_2), \ldots\}$: $Z(i) = \sum_{k=1}^{i} (F_k + 3)$. The '3' accounts for the extra 0xF's in each verified-sequence fragment (e.g., $F_1 = 2$ would correspond to verified-sequence fragment 0xFF00F, and therefore Z(1) would equal $F_1 + 3 = 5$, not just 2). Thus, encoding/encrypting the current plaintext nibble, $X_i$, uses the keystream fragment $K_{Z(i-1)+1}, \ldots, K_{Z(i)}$, and encrypting the next plaintext nibble, $X_{i+1}$, uses keystream starting from nibble $K_{Z(i)+1}$.
Since bit a corresponds to nibble ⌊a/4⌋, the new index, A, is found by calculating the smallest A such that Z(A) ≥ ⌊a/4⌋. Practically, it is convenient to decrypt the ciphertext into 'full' plaintext bytes only. For example, the second plaintext byte $(X_3\|X_4)$ is produced by decrypting the ciphertext from byte $Y_3$. Decryption starting at $Y_4$, however, would produce only half of this plaintext byte, which may cause practical difficulties when comparing files. For this reason, we only allow test-decryptions to start from odd-numbered ciphertext bytes: if A turns out to be even, increase it by one.
After this adjustment process, ciphertext starting from byte $Y_A$ can be decrypted into plaintext starting at nibble $X_A$ (i.e., plaintext byte (A + 1)/2), using keystream $K_{Z(A)+1}, K_{Z(A)+2}, \ldots$. This keystream is produced by loading the KSG with S′ and then updating the state $4(Z(A)+1) - a$ times before generating any keystream.
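The index arithmetic above can be sketched directly. The sketch mirrors the formulas as written: Z(i) as the running sum of $F_k + 3$, the smallest A with Z(A) ≥ ⌊a/4⌋, bumped to the next odd value, and $4(Z(A)+1) - a$ state updates. The function names are illustrative, and the sketch simply raises an exception if the offset falls past the end of the model.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

def Z_values(failures: Sequence[int]) -> List[int]:
    """Z(i) = sum_{k<=i} (F_k + 3) for i = 1..len(failures); index 0 holds Z(0) = 0."""
    return [0] + list(accumulate(F + 3 for F in failures))

def adjust_state(failures: Sequence[int], a: int) -> Tuple[int, int]:
    """Return (A, number of KSG state updates before generating keystream)."""
    Z = Z_values(failures)
    target = a // 4                                   # bit a lies in nibble floor(a/4)
    A = next(i for i in range(1, len(Z)) if Z[i] >= target)   # smallest such A
    if A % 2 == 0:                                    # start only at odd-numbered bytes
        A += 1
    return A, 4 * (Z[A] + 1) - a

print(Z_values([1, 0, 2, 0]))          # [0, 4, 7, 12, 15]
print(adjust_state([1, 0, 2, 0], 20))  # (3, 32)
```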

Comparison to Traditional BG-Attack
Although this new attack broadly resembles the method of Babbage and Golić, there are several important differences. First, our new attack requires multiple 'tentative keystreams' rather than a single known keystream. Second, our attack allows for unknown bits in each table-search, something not necessary for the traditional BG-attack. Third, our attack includes a new parameter (the Hamming-weight threshold τ) to reduce spurious hits, which are not a significant problem for the original BG-attack. Fourth, once a hit is found, our method requires further adjusting the KSG-state before each test-decryption, which is also not needed during the standard BG-attack. Fifth, sorting takes longer in our attack. Even though there is only one table, each prefix in its second column must be bitwise multiplied by $V^j(a)$ ('verified') before comparison with the current tentative keystream fragment. This operation changes the values, and therefore the sorted order, of these prefixes each time. Thus, if sorting is used, the table must be re-sorted with each new application of the sliding window. (Alternatively, it can be left unsorted and searched row-by-row.) Finally, our attack's table-search procedure is more involved. Each prefix in the table is unique, but each verified prefix need not be. A binary search returns the index of some matching element, but not necessarily a particular one among several repeated matching elements. For example, suppose that KSG-state $S_w$ is correct, and that the table contains three distinct prefixes $e(S_u)$, $e(S_v)$, and $e(S_w)$ which become identical once they are verified: $e(S_u) \otimes V^j(a) = e(S_v) \otimes V^j(a) = e(S_w) \otimes V^j(a) = P$. A binary search for P might return $e(S_u)$ or $e(S_v)$ instead of $e(S_w)$, incorrectly leading the cryptanalyst to dismiss the hit as a false alarm after a failed test-decryption using states $S_u$ or $S_v$.
The cryptanalyst must therefore check whether or not the prefix associated with each high-quality hit is unique and, if not, also perform test-decryptions using the KSG-states associated with any other identical verified prefixes. (Alternatively, the attacker can do a simple sequential search through the whole table for each K′(a) fragment, as mentioned above.)
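If the table is sorted rather than scanned, this check amounts to returning the whole run of equal verified prefixes instead of a single index. A minimal sketch using Python's `bisect` module, assuming (state, prefix) rows and re-sorting for every verification mask v, as the text requires:

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple

def all_verified_matches(table: Sequence[Tuple[int, int]], v: int, fragment: int) -> List[int]:
    """Return every state whose verified prefix (prefix & v) equals fragment & v."""
    # Masking changes the sort order, so the verified prefixes are re-sorted for each v.
    verified = sorted((prefix & v, state) for state, prefix in table)
    keys = [vp for vp, _ in verified]
    target = fragment & v
    lo, hi = bisect_left(keys, target), bisect_right(keys, target)
    return [state for _, state in verified[lo:hi]]   # the whole run, not one index

# Two distinct prefixes collapse to the same verified prefix 0xA0 under mask 0xE0.
print(all_verified_matches([(1, 0xA0), (2, 0xB0), (3, 0x10)], v=0xE0, fragment=0xAF))  # [1, 2]
```

Each returned state then gets its own test-decryption, so a correct state cannot be hidden behind an identical verified prefix.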

TMDTO Attack #2
Our second TMDTO-attack is inspired by the method of Biryukov and Shamir [12,26]. However, building Hellman chains from an initially uncertain state produces various difficulties not seen in the classical BS-attack, as discussed below.

Variant Keystream Fragments and Tentative Hellman Chains
One way to modify the BS-attack to deal with 'unknown bits' is to use many realtime Hellman chains instead of just one. These chains are constructed from variants of each tentative keystream fragment. If fragment K′(a) has u unknown bits, then it will have $2^u$ variants, denoted $KV_0(a), KV_1(a), \ldots, KV_{2^u-1}(a)$. During the realtime phase of our new attack, each variant initiates its own Hellman chain.
The i-th variant of K′(a) is constructed by replacing each of its unknown bits with one bit from the binary expansion of i. During the realtime phase of our attack, these variant keystream fragments are then used to create the initial links of $2^u$ tentative Hellman chains, as illustrated in Fig. 1. From each variant, a (t + 1)-link chain is made by defining the 0-th link of the i-th chain as $H_i[0] = r(KV_i(a))$ and each subsequent link as $H_i[\ell] = f(H_i[\ell-1])$, for ℓ = 1, 2, …, t.
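Variant generation and the tentative chains can be sketched as below. Assigning bit j of i to the j-th unknown position (counting from the least-significant bit) is one concrete choice; the text only says that the unknown bits are filled from the binary expansion of i. The functions r and f are passed in as parameters rather than fixed.

```python
from typing import Callable, List

def variants(fragment: int, verified: int, n: int) -> List[int]:
    """All 2^u variants of an n-bit fragment; its unknown bits are the 0-bits of `verified`."""
    unknown = [b for b in range(n) if not (verified >> b) & 1]   # unknown bit positions
    known = fragment & verified
    out = []
    for i in range(1 << len(unknown)):
        kv = known
        for j, pos in enumerate(unknown):     # bit j of i fills the j-th unknown bit
            if (i >> j) & 1:
                kv |= 1 << pos
        out.append(kv)
    return out

def tentative_chains(fragment: int, verified: int, n: int, t: int,
                     r: Callable[[int], int], f: Callable[[int], int]) -> List[List[int]]:
    """One (t+1)-link chain per variant: H_i[0] = r(KV_i), H_i[l] = f(H_i[l-1])."""
    chains = []
    for kv in variants(fragment, verified, n):
        chain = [r(kv)]
        for _ in range(t):
            chain.append(f(chain[-1]))
        chains.append(chain)
    return chains

print(variants(0b1010, 0b1010, 4))   # [10, 11, 14, 15]: two unknown bits -> 4 variants
```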

Modified BS Attack
Here we describe our second TMDTO attack (a modified version of the Biryukov-Shamir attack), which uses variant keystream fragments and tentative Hellman chains.

PARAMETER SELECTION
Given $N = 2^n$ (the state-space size) and D (the quantity of known plaintext), choose m and t such that $N = mt^2$, and choose a Hamming-weight threshold τ ≤ n;
Note: We drop the v (table) subscript below for convenience, but emphasize that all steps occur for each table;
REALTIME PHASE
1. As in the modified BG-attack, choose a model $C^j$, and determine its verified sequence $V^j$ and tentative keystream $K'^j$.
2. For each bit-offset a = 1, 2, …, D′, extract an n-bit fragment of both the verified sequence and the tentative keystream, denoted $V^j(a)$ and $K'^j(a)$ respectively. As before,
8. If a match IS found (a 'high-quality hit'), determine the desired KSG-state, S′, as follows. For concreteness, assume that the 42nd endpoint (row) of the
⇒ If this decryption matches the known plaintext starting at $X_A$ ('valid hit'), then the attack succeeds. If not, return to Realtime Step 7 and continue searching.

Comparison to Traditional BS-Attack
Compared to the original BS attack, our modified attack employs multiple realtime Hellman-chains instead of just one. Otherwise, this modified attack differs from the standard BS-attack in mostly the same ways that the modified-BG attack differs from its original counterpart: (1) it uses models to generate tentative keystreams containing 'unknown' bits, leading to many spurious hits, which in turn must be rejected by the inclusion of a Hamming-weight threshold parameter-none of which apply to the standard BS-attack; and (2) the KSG-state, once discovered, must be adjusted before attempting a test-decryption, unlike the original BS-attack. One similarity with its original counterpart, however, is that sorting the precomputed table(s) helps. In our modified attack, end points ( EP i ) do not need to be 'verified' (i.e., bitwise multiplied by the verified sequence fragment), as they do in the modified BG-attack. In essence, tentative Hellman chains fix the problem of unknown bits. Thus, quick-sort and binary-search techniques will speed up this attack just as they would the original BS-attack, and more dramatically than the modified BG-attack.

Quantifying the New TMDTO Attacks
How do our new TMDTO attacks compare to the standard BG- and BS-tradeoffs? Since the precomputation phase of these attacks is similar to that of their original counterparts, we neglect this phase and instead focus on the realtime duration of each attack. $\hat{T}$ stands for the number of realtime operations, where a 'time operation' is defined as either one table-search or one test-decryption. Normally, test-decryptions are ignored, since only one is required (or perhaps just a few) [28]. With PudgyTurtle, however, the abundance of 'unknown' bits means that most test-decryptions produce false alarms, and therefore should be counted. This parameter can be expressed as $\hat{T} = (N_{searches} + N_{decrypts})/P_{valid}$, where
• $N_{searches}$ is the number of table-searches performed per model;
• $N_{decrypts}$ is the number of test-decryptions per model;
• $P_{valid}$ is the probability that a model yields a valid hit (i.e., a successful test-decryption).
Specifically, $P_{valid} = N_{valid}/N_{trials}$, where $N_{valid}$ is the observed number of valid hits and $N_{trials}$ equals the number of tables used multiplied by the number of models tested per table. Normally, $N_{valid} = 1$, since an attack stops once success is achieved. In these experiments, however, some attacks are allowed to run through a predetermined number of trials, possibly producing more than one valid hit.
The standard BG and BS tradeoffs have time parameters $T_{BG} = N/M$ and $T_{BS} = N^2/(M^2D^2)$. We compare these benchmarks to $\hat{T}_{BG}$ and $\hat{T}_{BS}$, which represent the number of realtime operations actually observed during the numerical experiments below.
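The observed time parameter and the two benchmarks can be computed as follows. The counts in the usage lines are hypothetical, chosen only to exercise the formulas; exact fractions avoid rounding in $P_{valid}$. An attack with no valid hit would divide by zero here, which is why such cases are reported as lower bounds instead.

```python
from fractions import Fraction

def realtime_ops(n_searches: int, n_decrypts: int, n_valid: int, n_trials: int) -> Fraction:
    """T_hat = (N_searches + N_decrypts) / P_valid, with P_valid = N_valid / N_trials."""
    p_valid = Fraction(n_valid, n_trials)
    return (n_searches + n_decrypts) / p_valid

def benchmarks(N: int, M: int, D: int):
    """Standard tradeoff predictions T_BG = N / M and T_BS = N^2 / (M^2 D^2)."""
    return Fraction(N, M), Fraction(N * N, M * M * D * D)

T_bg, T_bs = benchmarks(N=2**24, M=2**12, D=2**8)
print(int(T_bg), int(T_bs))                                   # 4096 256
print(realtime_ops(n_searches=100, n_decrypts=50, n_valid=1, n_trials=10))   # 1500
```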

Implementing the TMDTO Attacks
Here, we launch two new TMDTO attacks against PudgyTurtle and discuss their performance in a variety of situations. For the first several attacks, the KSG will be a 24-bit maximal-period nonlinear feedback shift register (NLFSR) [17], with the following specifications:
• Initial state $S_0$ is 0xAAAAAA = 1010…
We emphasize that this is not intended to be a secure KSG, but only a 'toy' cipher for illustrative purposes. Its small key-space of $N = 2^{24} = 16777216$ makes for efficient computations (e.g., using standard tradeoff parameters like $\sqrt{N} = 4096$ or $N^{1/3} = 256$). For simulations requiring larger KSGs, we use linear feedback shift registers (LFSRs) instead of nonlinear ones. The reason for this is pragmatic: maximal-period NLFSRs are difficult to find, and Dubrova's well-known source only goes up to n = 25 [17]. Obviously, there are easier ways to break LFSR-based ciphers than a TMDTO attack, but again these examples are for explanatory purposes only.

Modified BG Attack
Below are results of the first new TMDTO attack against PudgyTurtle.

Experiment 1: Contrived Plaintexts
This experiment is designed to confirm the general feasibility of our approach. Modified BG-attacks are performed against two 'contrived' plaintexts, which have been specifically tailored to bias the results toward success by limiting the number of unknown bits:
• The "EVERY-3" plaintext is constructed by taking every third nibble of the actual keystream. This forces each codeword to be 0x00 and every nibble of the verified sequence to be 0xF;
• The "EVERY-3-OR-4" plaintext forces every codeword to be either 0x00 (i.e., an exact match on the first attempt) or 0x08 (i.e., one failure followed by an exact match). This is accomplished by comparing the two keystream nibbles after each mask. If they differ by more than 1 bit, the second one is taken as the plaintext nibble, producing codeword 0x08 and adding 0xFF0F to the verified sequence. If they are within 1 bit of each other, then the first one is taken instead, producing codeword 0x00 and adding 0xFFF to the verified sequence. The net result is a verified sequence with mostly 0xF's and some 0x0's.
Table 4: Modified BG-attacks against the contrived plaintexts, assuming that the attacker has √N bits of known plaintext (Rows 1 and 3); or √N bits of verified tentative keystream (Rows 2 and 4); or √N bits of total tentative keystream (Rows 2 and 5). Note that for EVERY-3, there is no difference between the second and third assumption. Each attack used a single tentative keystream model and 1000 precomputed tables. In these contrived scenarios, success was common (> 600/1000 tables), and successful tables contained > 1 valid hit. Unsurprisingly, the attacker enjoyed more success when granted more realtime data (Rows 1 & 3 vs. Rows 2, 4, & 5). False alarms occurred in both scenarios, but became noticeably more frequent when the known keystream contained unknown bits (rightmost column, lower vs. upper section of the table).
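Both contrived plaintexts can be generated from a keystream as sketched below. The keystream is a list of nibbles, the returned verified sequence follows the 0xFFF / 0xFF0F pattern described above, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def every3(ks: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Every third keystream nibble: each codeword is 0x00, verified sequence all 0xF."""
    plain = list(ks[2::3])
    return plain, [0xF] * (3 * len(plain))

def every3or4(ks: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Codewords restricted to 0x00 (exact first match) or 0x08 (one failure, then exact)."""
    plain: List[int] = []
    V: List[int] = []
    pos = 0
    while pos + 3 < len(ks):
        first, second = ks[pos + 2], ks[pos + 3]   # the two nibbles after the mask
        if hamming(first, second) <= 1:
            plain.append(first)                     # codeword 0x00
            V += [0xF, 0xF, 0xF]
            pos += 3
        else:
            plain.append(second)                    # first nibble fails -> codeword 0x08
            V += [0xF, 0xF, 0x0, 0xF]
            pos += 4
    return plain, V

print(every3or4([1, 1, 5, 4, 1, 0x0, 0xF]))   # ([5, 15], [15, 15, 15, 15, 15, 0, 15])
```

In the else branch, the first nibble differs from the chosen plaintext nibble by more than one bit, so it must count as a failure; this is what guarantees codeword 0x08.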
This experiment also examines the question, "how much realtime data is there?" The usual TMDTO attack against a BASC grants the attacker D bits of known plaintext, which is assumed to also mean $D = L_{X'}$ bits of known keystream. Since PudgyTurtle is not a BASC, however, its $L_{X'}$ bits of known plaintext become $L_{K'}$ bits of tentative keystream, of which only h(V) are known (i.e., 'verified' as corresponding to a 1-bit in V), such that $L_{X'} \leq h(V) \leq L_{K'}$. So, does "D′ bits of realtime data" mean that the adversary has $D' = D = L_{X'}$ bits of known plaintext, $D' = h(V)$ bits of verified sequence, or $D' = L_{K'}$ bits of tentative keystream? Although the answer is open to interpretation, attacks are performed under each of these assumptions. Table 4 shows modified BG-attacks against both contrived plaintexts, with different values fixed at ≈ √N bits (for technical reasons, this may be 4096 or 4104). The value that is fixed is
• $L_{X'} \approx 4096$, in Rows 1 and 3;
• $h(V) \approx 4096$, in Rows 2 and 4;
• $L_{K'} \approx 4096$, in Rows 2 and 5.
(Note: For EVERY-3, $h(V) = L_{K'}$, so Row 2 works for both assumptions.) Each row shows the result of a modified BG-attack using one model and 1000 different precomputed tables. Columns 1–3 show the relative sizes of X′, h(V), and K′, with the fixed value in boldface. Column 4 shows the number of successes. The probability of success increases with more realtime data, being highest for Rows 1 and 3 (i.e., when D = 4096 and $D' = L_{K'} = 12{,}288$). Column 5, the average number of valid hits per success, illustrates that a single table may contain multiple valid hits. False alarms (Column 6) occur occasionally even when the verified sequence is all 1's (EVERY-3), but become much more likely when the verified sequence contains even a minimal number of unknown nibbles (EVERY-3-OR-4).
For all subsequent experiments, we assume that the attacker has D bits of known plaintext and D′ ≈ 5.2D bits of realtime data (tentative keystream), a conservative assumption most advantageous to the adversary.

Experiment 2: Hamming-Weight Threshold
Since the previous experiment used contrived plaintexts which exactly (or nearly) matched the original keystream, all hits were taken to be high-quality rather than spurious. When the plaintext and model are unrestricted, however, spurious hits become more likely. This experiment shows how different values of the Hamming-weight threshold τ reduce the total number of hits to a reasonable number of 'high-quality' hits.
Table 5: The Hamming-weight threshold. The modified BG-attack was performed using a range of thresholds (τ) for distinguishing high-quality hits from spurious ones. Each attack used the same precomputed table, the same 250 models, a 24-stage NLFSR as the KSG, and assumed that the attacker knows 4096 bits of 'English' plaintext. For each threshold in Column 1, the corresponding number of total (Column 2), high-quality (Column 3, averaged over 250 models), and valid (Column 4) hits are shown. Lower threshold values (τ = 10–12) do not improve efficiency much: thousands of test-decryptions are still required for each model. Mid-range values (τ = 14–16) improve efficiency by reducing the number of high-quality hits (and test-decryptions) while still achieving success. Higher values (τ ≥ 18) reduce success: so few high-quality hits are obtained overall that finding any valid hits among them becomes unlikely. In practice, choosing τ so as to produce several hundred high-quality hits afforded a reasonable balance between an attack's computational cost and its likelihood of success.
Modified BG-attacks were performed against encrypted English using the same precomputed table and the same 250 randomly chosen models, but a different τ each time. As shown in Table 5, many values of τ can still produce successful attacks, even with substantially fewer high-quality than total hits (Column 3 vs. Column 2). Making τ too small slows down the attack (i.e., more high-quality hits occur than are needed for success), but making τ too big risks missing a valid hit (e.g., when τ ≥ 18, there are too few hits overall for success). Attackers choose τ pragmatically, balancing computational resources against the number of high-quality hits (Sect. 7.2). In our experiments, for example, τ = 15 works well for n = 24. Table 6 shows the modified BG-attack carried out against each of the three plaintexts from earlier (English, Image, and Zeros), with one table, 1000 models, n = 24, D = 4096, and τ = 15. The success rate, $P_{valid}$, ranged from 0.2 to 1.6%. No successful attack produced more than 1 valid hit, but all had ≈ 270 high-quality hits, nearly all of them false alarms.

Experiment 3. How Successful is TMDTO Attack #1?
Dividing by the probability of success, we estimate the number of realtime operations, $\hat{T}_{BG} = (N_{searches} + N_{decrypts})/P_{valid}$, to lie in the range $528875 \leq \hat{T}_{BG} \leq 4231000$, which exceeds $T_{BG} = 4096$ by more than 100-fold.

Experiment 4. Scaling the Modified BG-Attack
Experiment 3 suggests that our modified BG-attack requires more time than predicted by the original BG-attack. Is this result simply a fluke for n = 24, or does it apply to other state sizes? To address this issue, we repeated the attack for several different values of n, using LFSRs for n > 25 as mentioned earlier. Also in this experiment, $N_{searches}$ and $N_{decrypts}$ were counted rather than estimated.
The upper section of Table 7 shows the results.
The r-function used for Hellman chains was simply the KSG-state XOR'd with the least-significant n bits of a constant, $R_0$ = 0x5075646779547572: sixty-four bits representing the letters "PudgyTur" in ASCII.
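The constant and the r-function are small enough to check directly; `r_function` is a hypothetical name for the XOR described above.

```python
R0 = 0x5075646779547572            # "PudgyTur" in ASCII

def r_function(state: int, n: int) -> int:
    """XOR an n-bit value with the least-significant n bits of R0 (self-inverse)."""
    return state ^ (R0 & ((1 << n) - 1))

print(R0.to_bytes(8, "big"))       # b'PudgyTur'
print(hex(r_function(0, 24)))      # 0x547572
```

Because XOR with a fixed constant is its own inverse, applying `r_function` twice returns the original value, which is what lets a chain match be traced back to a unique prefix.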
For convenience, each attack was carried out using only one precomputed table at a time (i.e., t/D = 1), so that $D = t \approx m \approx 2^{n/3}$. However, an attack could be repeated several times with new tables, as summarized by the $N_{trials}$ parameter. Results are shown in the lower section of Table 7. Again, as in Experiment 4, we observed that $\hat{T}_{BS}/T_{BS} > 1$ (Column 7).
Note that the n = 28 attack did not succeed within the prespecified number of trials. In this case, we reported $N_{valid}$ as < 1 and provided a lower bound on $\hat{T}_{BS}$ (i.e., if the attack had continued until getting a valid hit, the success probability would be smaller and $\hat{T}_{BS}$ would be higher). These findings appear robust to variations in $P_{valid}$: even if this probability were ∼ tenfold higher than observed, the ratios in Column 7 would still exceed 1.

Limitations of PudgyTurtle
Despite its improved resistance against TMDTO attacks, PudgyTurtle also has some drawbacks related to short plaintexts, side-channel attacks, and variable time and space requirements.

Plaintext-Ciphertext Mismatch
Length differences between a very short plaintext and its ciphertext could potentially leak one byte of keystream.
Consider a one-nibble (4-bit) plaintext $X = X_1$. If the ciphertext is observed to be 2 bytes instead of just one (i.e., $Y = Y_{1,1} \| Y_{1,2}$), then the adversary will know that one overflow event has occurred, and hence the structure of the keystream. In particular, the attacker can recover the first keystream byte by computing $(K_1 \| K_2) = Y_{1,1} \oplus \mathtt{0xFF}$. If, in addition to knowing the plaintext length, the attacker also knows the value of $X_1$, then more can be inferred. Specifically, the Hamming distance between $X_1$ and $K_j$ (for $2 < j < t(1)$) must be > 1, except for $K_{35}$ and $K_{36}$ (which are unrestricted), while $K_{t(1)}$ must be within Hamming distance 1 of $X_1$. This reduces the number of possible keystreams in an exhaustive search from $16^{t(1)-2}$ down to $11^{t(1)-5} \cdot 16^2 \cdot 5 \approx 0.008 \cdot 2^{3.46\,t(1)}$. On a practical note, however, even if $t(1)$ takes its smallest possible value of 37, this still leaves about $2^{121}$ possibilities.
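The counting argument can be reproduced numerically. This sketch only re-derives the constants in the text (5 nibbles within Hamming distance 1 of $X_1$, 11 beyond it, the leading constant ≈ 0.008, and the $2^{121}$ figure for $t(1) = 37$); the function names are illustrative.

```python
import math


def nibbles_within(x: int, max_dist: int) -> int:
    """Count 4-bit values whose Hamming distance from nibble x is <= max_dist."""
    return sum(1 for k in range(16) if bin(x ^ k).count("1") <= max_dist)


NEAR = nibbles_within(0b1010, 1)   # the nibble itself plus its 4 one-bit flips
FAR = 16 - NEAR                    # nibbles at Hamming distance > 1
CONST = NEAR * 16**2 / FAR**5      # leading constant in 0.008 * 2^(3.46 t(1))


def log2_candidates(t1: int) -> float:
    """log2 of FAR^(t1-5) * 16^2 * NEAR possible keystreams for K_3..K_t(1)."""
    return (t1 - 5) * math.log2(FAR) + 2 * math.log2(16) + math.log2(NEAR)


print(NEAR, FAR, round(CONST, 4))     # 5 11 0.0079
print(round(log2_candidates(37), 1))  # 121.0 (versus 16^35 = 2^140 unrestricted)
```

The value of `NEAR` does not depend on which nibble is chosen: every 4-bit value has exactly 5 neighbors within distance 1 (itself included).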
What about slightly longer plaintexts? Consider a 4-nibble (2-byte) plaintext that produces the 5-byte ciphertext 0xAABBCCDDEE in Table 8. Realizing that an overflow event has occurred, the adversary's goal is to determine the identity and position of one byte of keystream, $KB_a = (K_a \| K_{a+1})$.
There are four equiprobable ways (lower section of Table 8) for five ciphertext bytes to represent a 4-codeword message containing one overflow event. As shorthand, $c_i$ denotes the final byte of the codeword with the overflow event. In Case #1 (i.e., when $F_1 \ge 32$), the attacker knows that $KB_1 = \mathtt{0xAA} \oplus \mathtt{0xFF}$. In Case #2 (i.e., when $F_2 \ge 32$), the attacker could surmise that $KB_a = \mathtt{0xBB} \oplus \mathtt{0xFF}$ occurs at one of 32 geometrically-distributed locations in the keystream, since $C_1$ could encode any of 32 possible failure counters. Similarly, for Cases #3 and #4, $KB_a$ could be in any of 64 or 96 different positions, respectively. The probability of guessing $KB_a$ declines as the message gets longer and as $a$ moves farther from the beginning of the keystream. For example, assuming that the third codeword, $C_3 = (\mathtt{0xFF} \| c_3)$, encodes the overflow, the probability of correctly guessing that $KB_a$ equals $\mathtt{0xCC} \oplus \mathtt{0xFF}$ is $g(F_1, p) \cdot g(F_2, p)$, where, as before, $g(F, p)$ is the geometric distribution with F failures and p = 5/16, and $a = 1 + (F_1 + 3) + (F_2 + 3)$.
For our short message, this probability ranges from as little as $\Pr(KB_{69} = \mathtt{0xCC} \oplus \mathtt{0xFF}) \approx 8 \times 10^{-12}$ (when $F_1 = F_2 = 31$) to as much as $\Pr(KB_7 = \mathtt{0xCC} \oplus \mathtt{0xFF}) \approx 0.0976$ (when $F_1 = F_2 = 0$). Importantly, this seemingly high value (0.0976) is actually a conditional probability: the probability that $KB_7 = \mathtt{0xCC} \oplus \mathtt{0xFF}$, given that [$C_3$ contains the overflow] AND [one overflow occurs in 4 encodings]. The true probability, which must also include the a priori chance of both conditions (and hence the probability of a 'no overflow' encoding for each of the other three codewords), is much smaller.
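The two extreme values quoted above can be recomputed under the assumption, consistent with both endpoints, that the conditional guessing probability is $g(F_1, p)\, g(F_2, p)$ with $g(F, p) = (1-p)^F p$. The names `g`, `overflow_position`, and `guess_probability` are illustrative, and the a priori overflow probabilities are deliberately left out, as in the conditional figure.

```python
P = 5 / 16   # probability that a keystream nibble is within distance 1 of X_i


def g(F: int, p: float = P) -> float:
    """Geometric probability of exactly F failures before the first match."""
    return (1 - p) ** F * p


def overflow_position(F1: int, F2: int) -> int:
    """Keystream-byte index a when C_3 carries the overflow."""
    return 1 + (F1 + 3) + (F2 + 3)


def guess_probability(F1: int, F2: int) -> float:
    """Assumed conditional Pr(KB_a = 0xCC ^ 0xFF | C_3 holds the single overflow)."""
    return g(F1) * g(F2)


for F1, F2 in ((0, 0), (31, 31)):
    a = overflow_position(F1, F2)
    print(f"F1=F2={F1}: a={a}, Pr={guess_probability(F1, F2):.3g}")
```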
For the general case of an $N_X$-nibble plaintext producing an $(N_X + 1)$-byte ciphertext, the probability of guessing $KB_a$'s identity and location can be computed in the same way. We performed this calculation for various $N_X$. In the resulting table, the middle column is an unlikely 'best-case' scenario in which every failure counter happens to be 0; the rightmost column represents the 'typical' scenario, obtained from 1000 simulations with randomly-chosen, geometrically-distributed failure counters. As can be seen, the probability diminishes as messages get longer and when failure counters are chosen realistically.
To summarize, we have quantified the probability that an attacker could guess a single keystream byte given the knowledge that an $N_X$-nibble plaintext has been encrypted into an $(N_X + 1)$-byte ciphertext. In practice, though, this particular information (i.e., 8 bits of a long keystream) may be of little use in breaking any stream cipher with adequate state size currently in widespread use.

Side-Channel Attacks
If implemented straightforwardly, PudgyTurtle encryption exhibits data-dependent execution times, which could expose it to a timing-based side-channel attack [34]. By comparing timing differences during the encryption of each plaintext nibble, it might be possible to determine $F_i$, the failure counter that encodes plaintext nibble $X_i$. Knowing $F_i$, the codeword $C_i$ has only 5 possibilities instead of $32 \times 5 = 160$. In this case, the maximum number of models to test during a collision attack (i.e., $|C| = 5^{N_X}$) might become small enough to enumerate fully. Even so, significant practical difficulties remain: a 16-byte (32-nibble) plaintext would still produce $|C| = 5^{32} > 2^{64}$ possibilities.
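A quick check of these counts, assuming each codeword combines one of 32 failure counters with one of 5 discrepancy codes:

```python
import math

N_X = 32                       # a 16-byte plaintext has 32 nibbles
all_codewords = 160 ** N_X     # 32 failure counters x 5 discrepancy codes each
timing_leak = 5 ** N_X         # only the discrepancy code remains once F_i is known

print(f"log2 without leak: {math.log2(all_codewords):.1f}")   # 234.3
print(f"log2 with leak:    {math.log2(timing_leak):.1f}")     # 74.3
```

A timing leak would remove about 160 bits of uncertainty here, yet the remaining $2^{74.3}$ candidates are still well beyond $2^{64}$.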
Standard countermeasures against timing attacks include constant-time execution, blinding, and chunking. Constant-time execution is difficult to achieve in practice, difficult to maintain (e.g., unpredictable changes may occur with CPU firmware updates), and difficult to implement without a performance penalty. Blinding incorporates a random element into encryption so that the execution time becomes uncorrelated with the plaintext or key, but it also adds complexity to the algorithm. Chunking (also called bucketing) breaks a large, variable-length computation into fixed-length pieces, which are then returned at predetermined points in the execution cycle [35].
Chunking may be the most appropriate way to harden PudgyTurtle against timing attacks. One idea, for example, would be to always generate the same-sized 'chunk' of keystream for each plaintext nibble, thus making execution time independent of the failure counter: while $X_i$ is being encrypted, a fixed number of keystream nibbles would be generated and compared against it, regardless of where the first match occurs.
Timing attacks are exquisitely dependent on the specific hardware and software used to implement the cryptosystem. Carrying out such an attack is beyond the scope of this paper, and results would in any case be limited to one particular set of implementation choices. We suggest this as an important topic for future research.
The 'chunking' approach described here would keep KEF and CEF the same, but would take longer to encrypt each message. Specifically, let $t_g$ be the time needed to generate one keystream nibble, $t_h$ the time required to calculate the Hamming distance between two 4-bit numbers, and $t_d$ the time it takes to calculate a discrepancy code. On average, PudgyTurtle needs time $PT = 5.2t_g + 3.2t_h + t_d$ to encrypt each plaintext nibble (an average of $(1-p)/p = 2.2$ failures plus one match gives 3.2 keystream nibbles and Hamming-distance calculations, followed by 2 more keystream nibbles to encrypt the 8-bit codeword). With chunking, each encryption cycle would need 34 keystream nibbles, 32 Hamming-distance calculations, and 32 discrepancy-code calculations, requiring time $PT_{chunk} = 34t_g + 32t_h + 32t_d$.
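The chunked cycle can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the bit layout of the 8-bit codeword (failure counter in the top five bits, discrepancy code in the low three), the numbering of discrepancy codes, and the order in which the two encrypting nibbles are combined are all assumptions; overflow handling is omitted; and Python itself gives no constant-time guarantees. What it does show is the fixed work per nibble: 34 keystream nibbles, 32 Hamming-distance calculations, and 32 discrepancy-code calculations, whatever the failure counter turns out to be.

```python
import random
from typing import Callable, Optional

CHUNK = 32     # searches cover failure counters 0..31
NO_MATCH = 7   # hypothetical marker: Hamming distance > 1


def discrepancy_code(x: int, k: int) -> int:
    """Hypothetical discrepancy code: 0 for an exact match, 1..4 for a
    single-bit mismatch (1 + index of the differing bit), else NO_MATCH."""
    diff = x ^ k
    if diff == 0:
        return 0
    if diff & (diff - 1) == 0:      # exactly one bit differs
        return diff.bit_length()    # 1, 2, 3 or 4
    return NO_MATCH


def encrypt_nibble_chunked(x: int, next_nibble: Callable[[], int]) -> Optional[int]:
    """Encrypt plaintext nibble x using a fixed chunk of 34 keystream nibbles.

    Returns the encrypted 8-bit codeword, or None if no nibble in the chunk
    is within Hamming distance 1 of x (the overflow case, not handled here).
    """
    chunk = [next_nibble() for _ in range(CHUNK)]        # 32 keystream nibbles
    dists = [bin(x ^ k).count("1") for k in chunk]       # 32 Hamming distances
    codes = [discrepancy_code(x, k) for k in chunk]      # 32 discrepancy codes
    pad = (next_nibble() << 4) | next_nibble()           # 2 more nibbles: 34 in all

    # The failure counter F is the index of the first match in the chunk.
    F = next((i for i, d in enumerate(dists) if d <= 1), None)
    if F is None:
        return None
    codeword = (F << 3) | codes[F]                       # assumed layout: FFFFFddd
    return codeword ^ pad


# Usage with a toy, insecure keystream stand-in (NOT a real KSG):
rng = random.Random(2024)
ciphertext = encrypt_nibble_chunked(0b1010, lambda: rng.randrange(16))
print(ciphertext)
```

Because unused nibbles in each chunk are discarded, the decrypter would have to consume keystream in the same fixed-size chunks to stay synchronized.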
Keystream generation takes longer than calculating Hamming distances or discrepancy codes, since the latter two operations could be accomplished by small table lookups. Thus, we assume that $t_h = t_d = x$ and $t_g = \alpha x$, where $\alpha > 1$. From this, it follows that
$$\frac{PT_{chunk}}{PT} = \frac{34\alpha + 64}{5.2\alpha + 4.2}.$$
Although chunking introduces an execution-time penalty, the relative penalty actually shrinks as the gap widens between $t_g$ and $t_h$ (or $t_d$). For example, PudgyTurtle with chunking would run about 9.0 times slower when $\alpha = 2$, about 7.2 times slower when $\alpha = 10$, and about 6.7 times slower when $\alpha = 50$, approaching the limiting value of $PT_{chunk} \approx 34/5.2 \times PT \approx 6.54 \times PT$. The exact value of $\alpha$, of course, depends on hardware and software implementation details.
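Assuming $t_h = t_d = x$ and $t_g = \alpha x$, the slowdown from chunking can be evaluated directly from the expressions for $PT$ and $PT_{chunk}$; x cancels, so only $\alpha$ matters. The function name is illustrative.

```python
def chunk_slowdown(alpha: float) -> float:
    """PT_chunk / PT with t_h = t_d = x and t_g = alpha * x (x cancels out).

    PT       = 5.2 t_g + 3.2 t_h + t_d     -> (5.2 alpha + 4.2) x
    PT_chunk = 34 t_g + 32 t_h + 32 t_d    -> (34 alpha + 64) x
    """
    return (34 * alpha + 64) / (5.2 * alpha + 4.2)


for alpha in (2, 10, 50, 1e6):
    print(alpha, round(chunk_slowdown(alpha), 2))
```

The ratio decreases monotonically in $\alpha$ toward $34/5.2 \approx 6.54$, since keystream generation then dominates both costs.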

Variability
Overflow events make it impossible to know the exact ciphertext length until after encryption. For situations involving fixed-length message fields, PudgyTurtle's variable output size may be problematic. One solution would be to reserve space within a fixed-length string for either overflow events or padding. For example, if L bits of plaintext produce 2L + b bits of PudgyTurtle ciphertext on average, users could agree on a fixed data-block size of, perhaps, 2L + 2b bits. In the extremely rare case that an L-bit message required more than 2b overflow bits, it would have to be rejected and re-encrypted with a different key; otherwise, any of the 2b bits not used to encode overflows would become padding (either a predetermined pattern or random bits).
Not only is the ciphertext length variable, but so is the total time needed for encryption. For many encryption applications (e.g., email, file storage), this may not be problematic. For high-throughput, low-latency applications, however, small fluctuations in the duration of the 'crypto' component could potentially degrade overall system performance.
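The fixed-size framing described above for variable-length ciphertexts might be realized as sketched below. This is a hypothetical helper, not part of PudgyTurtle as specified: it assumes 2L + 2b is a multiple of 8 so the block is a whole number of bytes, and that the receiver knows L and simply stops decoding after recovering L plaintext bits, so trailing padding is ignored. It uses random padding drawn from Python's `secrets` module rather than a predetermined pattern.

```python
import secrets
from typing import Optional


def frame_ciphertext(ct: bytes, L_bits: int, b_bits: int) -> Optional[bytes]:
    """Fit a PudgyTurtle ciphertext into a fixed block of 2L + 2b bits.

    Returns the padded block, or None when the ciphertext used more than the
    2b-bit overflow allowance and must be rejected and re-encrypted.
    """
    block_bits = 2 * L_bits + 2 * b_bits
    if block_bits % 8:
        raise ValueError("2L + 2b must be a whole number of bytes")
    block_len = block_bits // 8
    if len(ct) > block_len:
        return None                      # too many overflow bytes: reject
    # Unused overflow space becomes random padding.
    return ct + secrets.token_bytes(block_len - len(ct))


# Example: a 64-bit message (16 codeword bytes) with b = 8 gives an 18-byte block.
block = frame_ciphertext(bytes(17), L_bits=64, b_bits=8)
print(len(block))   # 18
```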

Conclusions
PudgyTurtle is a way to implement keystream-dependent, variable-length encoding of plaintext before stream encryption. In some ways, it resembles an encryption mode for stream ciphers, in that its goal is to work along with existing systems. PudgyTurtle is less efficient than normal stream-cipher operation: it produces about twice as much ciphertext and requires about five times as much keystream. However, it is also more robust against TMDTO attacks.
The cryptographic literature contains other approaches aimed at making TMDTO attacks against stream ciphers harder, such as using error-correcting codes [32,33,40,42] and certain encryption modes [23,24]. PudgyTurtle differs from ECC-based systems in that it is not a randomized-encryption protocol and does not require an external noise source. It differs from other stream cipher modes by focusing on how keystream is used, rather than on the state-update, re-synchronization, and initialization procedures.
Modified versions of the well-known Babbage-Golić and Biryukov-Shamir time-memory-data tradeoff attacks are proposed and tested against PudgyTurtle, and the extra work required to cope with multiple 'tentative keystreams' and to reject false alarms is quantified. For toy-cipher KSGs with inner states of up to 32 bits, our experiments suggest that the number of real-time operations required for TMDTOs against PudgyTurtle exceeds that predicted by the standard BG- and BS-tradeoffs.
Of the two TMDTO attacks against PudgyTurtle, the modified BG-attack did 'better' than the modified BS-attack (i.e., its ratio of observed to predicted real-time cost exceeded 1 by less than the corresponding ratio for the modified BS-attack). This is of interest since the BS-attack is considered to be somewhat more complex. While both new TMDTO attacks require more work than their traditional counterparts, the modified BS-attack also includes an 'extra' work factor (multiple tentative Hellman chains) not present in the modified BG-attack. This scales up the number of table searches from $t^2$ to $t^2 \times 2^u$, where $2^u$ is the average number of tentative Hellman chains per model. This translates to factors of 112.6 (n = 20, threshold 12), 111.7 (n = 24, threshold 15), and 125.5 (n = 28, threshold 19), explaining at least some of the relative inefficiency of this attack.
Stream cipher security depends largely upon the details and state size of the underlying KSG. Since PudgyTurtle works alongside existing ciphers, it is cipher-agnostic: we do not recommend any particular KSG (cipher) over any other. If PudgyTurtle makes TMDTO attacks harder, it becomes tempting to consider reducing KSG state sizes. However, we suggest that this is premature. TMDTOs are just one cryptanalytic attack among many, and security against this approach does not imply security against all others. PudgyTurtle itself, or a cipher with which it is used, may still be susceptible to other (non-TMDTO) methods of cryptanalysis. Therefore, we suggest a conservative approach until more research into breaking PudgyTurtle exists: maintain the state sizes currently specified for existing KSGs, even when using PudgyTurtle.