# Open Problems on With-Carry Sequence Generators

## Abstract

Pseudorandom sequences are used in a wide range of applications in computing and communications, including cryptography. It is common to use linear feedback shift registers (LFSRs) to generate such sequences, either directly or as components in more complex structures. Much of the analysis of such sequences is done using the algebra of polynomials and power series over finite fields. The subjects of this chapter are feedback with carry shift registers (FCSRs) and algebraic feedback shift registers (AFSRs, generalizations of both LFSRs and FCSRs), sequence generators that are analogous to LFSRs, but whose state update involves arithmetic with a carry. Their analysis is based on algebraic structures with carry, such as the integers and the *N*-adic numbers. After a brief review of the basics on LFSRs, FCSRs, and AFSRs, we describe several open problems. These include: given part of a sequence, how to find an optimal generator of the sequence; how to construct sequences that cannot be generated by short LFSRs, FCSRs, or AFSRs; and the analysis of various statistical properties related to these generators.

## 1 Introduction

Pseudorandom sequences have many applications, including the following.

1. Cryptography: stream ciphers scramble messages by combining them with sequences that are unpredictable from short prefixes.
2. CDMA: large families of uncorrelated sequences minimize interference and allow a collection of channels to be shared by users (see Sect. 5.2).
3. Radar ranging and GPS: peaks in autocorrelations of a sequence allow delay to be measured.
4. Quasi-Monte Carlo: integrals are approximated by sampling integrands at points determined by pseudorandom sequences.
5. Built-in self-test: test patterns are determined by pseudorandom sequences.
6. Wear leveling of storage media: pseudorandom sequences are used to remap the memory locations in a way that distributes the wear evenly across the whole disk.

For some 60 years linear feedback shift registers (LFSRs) (described in Sect. 3) have been used as generators (or components of generators) of pseudorandom sequences for these and other applications. In the form of linear equations modulo *N*, they have been studied by mathematicians since at least the 1920s [4]. The primary mathematical tools for analyzing these sequences are finite fields and particularly polynomials and power series over finite fields. A great deal is known about these sequences, but there is still much that is unknown.

More recently (since 1993 [6, 20]), researchers have been studying feedback with carry shift registers (FCSRs), a “with-carry” analog of LFSRs (described in Sect. 3). So far they have found a smaller number of applications—cryptanalysis of the summation combiner, quasi-Monte Carlo integration, and the F-FCSR stream cipher. One advantage they have is that the state change is nonlinear, which makes stream ciphers based on them resistant to algebraic attacks.

Much less is known about sequences generated by FCSRs (and algebraic feedback shift registers (AFSRs), a generalization). The purpose of this chapter is to describe some of the open problems in this area. The main focus is on properties of sequences that are of interest cryptographically.

Throughout this chapter, the book by Goresky and the author [10] serves as a reference.

## 2 Stream Ciphers

In this section we discuss one important application of pseudorandom sequences. The main problem of practical cryptography is how to send a message securely in real time. The common techniques of public key cryptography are too slow for large transmissions (such as video on demand). For example, RSA encrypts by computing \(E(m) = m^{e}\mod pq\), where *p* and *q* are perhaps 500 bit primes. This is much too slow to encrypt large data sets in real time.

Practical systems therefore encrypt bulk data with *block or stream ciphers*. The trade-off between these two approaches is that the fastest stream ciphers are somewhat faster than the fastest block ciphers, but stream ciphers seem to be more vulnerable to attack. In this section we are interested in stream ciphers. In their simplest form, a sender and receiver agree on a pseudorandom sequence generator (PSG) *G* (publicly) and a small shared seed *s* (privately, perhaps by a slow key agreement protocol). *G*, initialized with *s*, generates a pseudorandom sequence \(G(s) = \mathbf{a} = a_{0},a_{1},\ldots \in \{ 0,1\}^{\infty }\). A message \(m = m_{0},m_{1},\ldots \in \{ 0,1\}^{\infty }\) is encrypted by computing \(c_{i} = m_{i} \oplus a_{i}\). The receiver regenerates **a** from *s* and recovers \(m_{i} = c_{i} \oplus a_{i}\). See Fig. 1.
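The encryption rule \(c_{i} = m_{i} \oplus a_{i}\) can be sketched in a few lines of Python. The generator below is a hypothetical stand-in for *G* (a linear congruential bit source chosen only for brevity, and not secure); the names `toy_generator` and `xor_with_keystream` are illustrative and do not come from the chapter.

```python
def toy_generator(seed):
    """Hypothetical stand-in for the PSG G (NOT cryptographically secure)."""
    state = seed
    while True:
        state = (1103515245 * state + 12345) % 2**31
        yield (state >> 16) & 1  # one keystream bit a_i per step


def xor_with_keystream(bits, seed):
    """Compute c_i = m_i XOR a_i; applying it twice with the same seed decrypts."""
    return [m ^ a for m, a in zip(bits, toy_generator(seed))]


message = [1, 0, 1, 1, 0, 0, 1, 0]
cipher = xor_with_keystream(message, seed=2024)
recovered = xor_with_keystream(cipher, seed=2024)
```

Because XOR is its own inverse, the same function serves for encryption and decryption as long as both parties hold the seed.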

The sequence **a** should satisfy several requirements:

- Large period: A sequence \(\mathbf{a} = a_{0},a_{1},\ldots\) is periodic if \(\forall i: a_{i} = a_{i+p}\) for some *p* > 0. It is eventually periodic if \(\forall i> t: a_{i} = a_{i+p}\) for some *t*. The period, *p*, must be large for use in a stream cipher.
- Balance: In one period the numbers of occurrences of different symbols must be nearly equal.
- Uniform distribution of small subsequences: For any *r*, in one period the numbers of occurrences of different blocks of length *r* must be nearly equal.
- Uncorrelated with shifts: Let **a** be a binary sequence with period *p*. The *autocorrelation* of **a** with shift *t* is
  $$\displaystyle{\mathcal{A}_{\mathbf{a}}(t) =\sum _{ i=0}^{p-1}(-1)^{a_{i}+a_{i+t} }.}$$
  If *t* is not a multiple of *p*, this integer should be close to zero.
- Unpredictable from a short prefix: It should not be possible to determine **a** knowing only \(a_{0},\ldots,a_{k-1}\) for small *k* using any known methods (e.g., using the Berlekamp–Massey algorithm). This is a critical requirement for stream ciphers.
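The balance and autocorrelation requirements are easy to check numerically. The following sketch evaluates the autocorrelation sum directly from its definition on one period of a binary sequence; the period 7 test sequence is an m-sequence (see Sect. 3.1), and the function names are illustrative.

```python
def autocorrelation(a, t):
    """Sum over one period of (-1)^(a_i + a_{i+t}); `a` holds one full period."""
    p = len(a)
    # (-1)^(x + y) depends only on the parity of x + y, i.e. on x XOR y.
    return sum((-1) ** (a[i] ^ a[(i + t) % p]) for i in range(p))


def balance(a):
    """Return (number of ones, number of zeros) in one period."""
    ones = sum(a)
    return ones, len(a) - ones


m_seq = [1, 1, 1, 0, 1, 0, 0]  # one period of a binary m-sequence of period 7
shifted = [autocorrelation(m_seq, t) for t in range(1, 7)]
```

For this sequence every out-of-phase value equals −1 and the in-phase value equals the period, which is the ideal two-level behavior.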

Since we do not know what requirements will arise in the future, it is useful to have a large pool of high-quality pseudorandom sequences available.

Note that the approach to security described here is different from the complexity theory approach. In that approach one defines a *cryptographically strong pseudorandom bit generator* (CSPRBG) to be a sequence generator whose output is indistinguishable from a truly random sequence generator by any polynomial time probabilistic distinguisher. Unfortunately this is a strong constraint, and all known CSPRBGs are unable to approach real-time operation (and in fact the security of known CSPRBGs depends on the assumed intractability of certain computational problems such as quadratic residuosity).

## 3 Sequence Generators

In this section we describe simple, fast devices that satisfy many of the requirements for sequence generators (but not the unpredictability). They are commonly used as building blocks for stream ciphers.

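A pseudorandom sequence generator over an output alphabet *Σ* is a triple *G* = (*S*, *Γ*, *δ*), where the set *S* is the *state space*, *Γ*: *S* → *S* is the state change function, and *δ*: *S* → *Σ* is the output function. Such a PSG generates a pseudorandom sequence from a given initial state *σ* ∈ *S* by iterating the state change forever. That is,

$$\displaystyle{G(\sigma ) =\delta (\sigma ),\delta (\varGamma (\sigma )),\delta (\varGamma ^{2}(\sigma )),\ldots }$$

We usually require that for every initial state *σ*, the set of states {*Γ*^{i}(*σ*): *i* = 0, 1, 2, *…*} be finite. This implies that *G*(*σ*) is eventually periodic.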
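As a minimal sketch of this definition, a PSG can be modeled by two Python callables for *Γ* and *δ*. The toy state change \(s \mapsto s^{2} + 1 \bmod 10\) used here is an assumption made purely for illustration. Since the reachable state set is finite, recording the first visit to each state yields the preperiod and period of the state sequence.

```python
def generate(gamma, delta, sigma, count):
    """First `count` outputs delta(sigma), delta(gamma(sigma)), ..."""
    out, state = [], sigma
    for _ in range(count):
        out.append(delta(state))
        state = gamma(state)
    return out


def preperiod_and_period(gamma, sigma):
    """Preperiod and period of the state sequence (states must be hashable)."""
    first_seen, state, step = {}, sigma, 0
    while state not in first_seen:
        first_seen[state] = step
        state = gamma(state)
        step += 1
    return first_seen[state], step - first_seen[state]


gamma = lambda s: (s * s + 1) % 10  # toy state change on S = {0, ..., 9}
delta = lambda s: s % 2             # toy output function into {0, 1}
```

The period of the output sequence divides the period of the state sequence, so the second value is an upper bound on the output period.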

In what follows, we are concerned with families of PSGs. We may be interested, for example, in finding the most efficient PSG *G* that generates a given sequence **a**, where *G* is in a given family \(\mathcal{G}\) of PSGs. In the next few subsections, we describe some interesting families of PSGs.

### 3.1 LFSRs

An LFSR of length *r* over a field *F* is a finite state PSG whose state set is *F*^{r} and whose state change function is determined by a set of coefficients \(g_{1},\ldots,g_{r} \in F\) [10, p. 23]. If the current state is \((a_{0},a_{1},\ldots,a_{r-1})\), then the next state is \((a_{1},\ldots,a_{r-1},a_{r})\), where \(a_{r} = g_{r}a_{0} + \cdots + g_{1}a_{r-1}\). The output function is \(\delta (a_{0},a_{1},\ldots,a_{r-1}) = a_{0}\). See Fig. 2.
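The state update above translates directly into code. This sketch assumes that *F* is a prime field \(\mathbb{F}_{p}\), so that field arithmetic is integer arithmetic modulo *p*; the function name `lfsr` and its argument order are illustrative.

```python
def lfsr(coeffs, state, count, p=2):
    """Output of an LFSR over the prime field F_p.

    coeffs = [g_1, ..., g_r], state = [a_0, ..., a_{r-1}].
    """
    r = len(coeffs)
    state = list(state)
    out = []
    for _ in range(count):
        out.append(state[0])  # output function: delta(state) = a_0
        # a_r = g_r * a_0 + ... + g_1 * a_{r-1}; coeffs[k] holds g_{k+1}
        new = sum(coeffs[r - 1 - j] * state[j] for j in range(r)) % p
        state = state[1:] + [new]
    return out


# g_1 = 0, g_2 = g_3 = 1, so a_3 = a_0 + a_1 over F_2.
bits = lfsr([0, 1, 1], [1, 1, 1], 14)
```

With these coefficients the register runs through all seven nonzero states, so the output has the maximal period 7 for a binary register of length 3.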

We recall some basic facts about LFSR sequences.

1. The *connection polynomial* is \(g(x) = -1 + g_{1}x + \cdots + g_{r}x^{r}\). The generating function of the output sequence **a** is \(a(x) = a_{0} + a_{1}x + a_{2}x^{2} + \cdots\). There is a polynomial *u*(*x*), uniquely determined by the initial state \((a_{0},\ldots,a_{r-1})\), so that \(a(x) = u(x)/g(x)\).
2. The sequence **a** is eventually periodic. It is periodic if and only if deg(*u*) < deg(*g*). The maximum possible period is |*F*|^{r} − 1. This is achieved when *g*(*x*) is a *primitive polynomial*, meaning that a root of *g* is a primitive element in the degree *r* extension field of *F* (that is, in \(\mathbb{F}_{q^{r}}\) when \(F = \mathbb{F}_{q}\)). In this case **a** is called an *m-sequence* [10, p. 208]. These sequences are the most commonly used LFSR sequences.
3. M-sequences have many good statistical properties. Their shifted autocorrelations are all −1. They are as balanced as possible for their period, and the distribution of subblocks of fixed size is as uniform as possible. They have the *run property* [10, p. 172] and the *shift and add property* [10, p. 191].
4. Let *E* be the unique degree *r* extension field of *F*. Let *Tr* be the trace function from *E* to *F*. If the connection polynomial *g*(*x*) is irreducible, and the sequence **a** is periodic, then it can be expressed as \(a_{i} = Tr(A\alpha ^{i})\) where *α* is a root of *g*(*x*) and *A* ∈ *E* corresponds to the initial state. More generally, if **a** is periodic, then it can be expressed as
   $$\displaystyle{a_{i} = (\mathit{Ax}^{-i}\mod g)\mod x,}$$
   meaning (1) compute the element \(v \equiv \mathit{Ax}^{-i}\mod g\) with deg(*v*) < *r*; and (2) *a*_{i} is the constant term of *v* [10, p. 48].

We can form a family of PSGs by fixing *F* and considering all LFSRs with entries in *F*.

### 3.2 FCSRs

Let *N* ≥ 2 be an integer and \(S =\{ 0,1,\ldots,N - 1\}\). An FCSR of length *r* based on \(N\) is a PSG whose state set is \(S^{r} \times \mathbb{Z}\) and whose state change function is determined by a set of coefficients \(g_{1},\ldots,g_{r} \in \mathbb{Z}\) [10, p. 70]. If the current state is \((a_{0},a_{1},\ldots,a_{r-1};z)\), then the next state is \((a_{1},\ldots,a_{r-1},a_{r};z^{{\prime}})\), where \(a_{r} + Nz^{{\prime}} = g_{r}a_{0} + \cdots + g_{1}a_{r-1} + z\). Here the addition and multiplication are in \(\mathbb{Z}\). The output function is \(\delta (a_{0},a_{1},\ldots,a_{r-1};z) = a_{0}\). See Fig. 3.
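The only change from the LFSR update is that the feedback sum is computed in \(\mathbb{Z}\) and split into a new digit and a new carry. A minimal sketch follows; the function name `fcsr` and the example parameters (coefficients chosen so that \(-1 + g_{1}N + g_{2}N^{2} + g_{3}N^{3} = 11\) with *N* = 2) are illustrative choices.

```python
def fcsr(coeffs, state, carry, count, N=2):
    """Output of an FCSR based on N.

    coeffs = [g_1, ..., g_r], state = [a_0, ..., a_{r-1}], carry = z.
    """
    r = len(coeffs)
    state = list(state)
    out = []
    for _ in range(count):
        out.append(state[0])  # output function: a_0
        # Integer sum g_r * a_0 + ... + g_1 * a_{r-1} + z, no reduction yet.
        total = sum(coeffs[r - 1 - j] * state[j] for j in range(r)) + carry
        new, carry = total % N, total // N  # a_r + N * z' = total
        state = state[1:] + [new]
    return out


# g_1 = 0, g_2 = g_3 = 1: -1 + 0*2 + 1*4 + 1*8 = 11.
bits = fcsr([0, 1, 1], [1, 0, 0], 0, 31)
```

From this initial state the output is periodic after the first symbol, with period 10 and equally many zeros and ones per period.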

FCSRs have many properties that parallel properties of LFSRs. Now, however, the algebra of polynomials and power series is replaced by the algebra of integers and *N -adic numbers*, which we briefly review [10, p. 72].

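An *N*-adic number is an infinite expression

$$\displaystyle{a = a_{0} + a_{1}N + a_{2}N^{2} + \cdots }$$

with *a*_{i} ∈ *S*. Addition of *N*-adic numbers is addition with carry. That is,

$$\displaystyle{\sum _{i=0}^{\infty }a_{i}N^{i} +\sum _{ i=0}^{\infty }b_{i}N^{i} =\sum _{ i=0}^{\infty }c_{i}N^{i}}$$

if there are carries \(d_{0} = 0,d_{1},d_{2},\ldots \in \mathbb{Z}\) so that for all *i* ≥ 0 we have \(a_{i} + b_{i} + d_{i} = c_{i} + \mathit{Nd}_{i+1}\). Similarly, we have

$$\displaystyle{\sum _{i=0}^{\infty }a_{i}N^{i} \cdot \sum _{ i=0}^{\infty }b_{i}N^{i} =\sum _{ i=0}^{\infty }c_{i}N^{i}}$$

if there are carries \(d_{0} = 0,d_{1},d_{2},\ldots \in \mathbb{Z}\) so that for all *i* ≥ 0 we have

$$\displaystyle{\sum _{j=0}^{i}a_{j}b_{i-j} + d_{i} = c_{i} + \mathit{Nd}_{i+1}.}$$

The set of *N*-adic numbers is thus an algebraic ring, denoted by \(\mathbb{Z}_{N}\). Note that \(\mathbb{Z}_{N}\) contains every rational number *u*∕*g* with gcd(*g*, *N*) = 1; that is, each such quotient is an *N*-adic number.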
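To make the last remark concrete, the *N*-adic digits of a rational *u*∕*g* with gcd(*g*, *N*) = 1 can be peeled off one at a time: the lowest digit is \(u g^{-1} \bmod N\), and subtracting it and dividing by *N* exposes the next one. The sketch below is one straightforward way to do this (the function name is illustrative, and `pow(g, -1, N)` requires Python 3.8 or later).

```python
from math import gcd


def n_adic_digits(u, g, N, count):
    """First `count` N-adic digits a_0, a_1, ... of the rational u/g."""
    if gcd(g, N) != 1:
        raise ValueError("g must be relatively prime to N")
    g_inv = pow(g, -1, N)  # inverse of g modulo N (Python 3.8+)
    digits = []
    for _ in range(count):
        d = (u * g_inv) % N  # digit with u ≡ d * g (mod N)
        digits.append(d)
        u = (u - d * g) // N  # exact division: (u/g - d) / N = ((u - d*g)/N) / g
    return digits
```

For example, −1 has all digits equal to *N* − 1, and the 2-adic expansion of 1∕3 is 1, 1, 0, 1, 0, 1, 0, …, since \(1 + 2(1 + 4 + 16 + \cdots ) = 1 + 2/(1 - 4) = 1/3\).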

1. The *connection integer* of an FCSR is \(g = -1 + g_{1}N + \cdots + g_{r}N^{r}\). The associated *N*-adic number of the output sequence **a** is \(a = a_{0} + a_{1}N + a_{2}N^{2} + \cdots\). There is an integer *u* (uniquely determined by the initial state \((a_{0},\ldots,a_{r-1};z)\)) so that \(a = u/g\) [10, p. 80].
2. The sequence **a** is eventually periodic [10, p. 88]. This is equivalent to saying that the carry *z* is bounded in any infinite execution of the FCSR. The sequence **a** is periodic iff −*g* ≤ *u* ≤ 0. The period is at most *g* − 1. The period equals *g* − 1 when *g* is prime and *N* is a *primitive root* modulo *g*, meaning that the multiplicative order of *N* modulo *g* is *g* − 1. In this case **a** is called an *ℓ-sequence* [10, p. 264]. These sequences are the most interesting FCSR sequences. It is unknown whether for a fixed *N*, there are infinitely many primes *g* such that *N* is primitive modulo *g* (Artin’s conjecture). However, Hooley showed that if a certain generalized Riemann hypothesis holds, then for every *N* there are infinitely many primes *g* so that *N* is primitive modulo *g* [14]. Moreover, it is known that there are at most two values of *N* for which Artin’s conjecture fails, although it is unknown what these values are [13].
3. *ℓ*-sequences have many good statistical properties. If *N* = 2, then their shifted *arithmetic* autocorrelations (defined in Sect. 5.2) are all 0 [10, p. 172]. They are as balanced as possible for their period and the distribution of subblocks of fixed size is as uniform as possible. They have the *arithmetic shift and add property* [10, p. 204].
4. If **a** is periodic, then it can be expressed as
   $$\displaystyle{a_{i} = (\mathit{AN}^{-i}\mod g)\mod N,}$$
   for some \(A \in \mathbb{Z}\), meaning (1) compute the element \(v \equiv \mathit{AN}^{-i}\mod g\) with 0 ≤ *v* < *g*; and (2) \(a_{i} = v\mod N \in S\) [10, p. 87].

We can form a family of PSGs by fixing *N* and considering all *N*-ary FCSRs.
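The exponential representation \(a_{i} = (\mathit{AN}^{-i}\mod g)\mod N\) gives a second way to produce periodic FCSR sequences without simulating the register. The sketch below uses *g* = 11 and *N* = 2 as an illustrative example; 2 has multiplicative order 10 modulo 11, so the result is an *ℓ*-sequence. Function names are hypothetical, and `multiplicative_order` assumes gcd(*N*, *g*) = 1.

```python
def exponential_sequence(A, g, N, count):
    """a_i = (A * N^{-i} mod g) mod N for i = 0, ..., count - 1."""
    N_inv = pow(N, -1, g)  # inverse of N modulo g (Python 3.8+)
    out, v = [], A % g
    for _ in range(count):
        out.append(v % N)  # step (2): reduce the residue v modulo N
        v = (v * N_inv) % g  # step (1) for the next index
    return out


def multiplicative_order(N, g):
    """Smallest k >= 1 with N^k ≡ 1 (mod g); assumes gcd(N, g) = 1."""
    k, x = 1, N % g
    while x != 1:
        x = (x * N) % g
        k += 1
    return k


ell_seq = exponential_sequence(1, 11, 2, 20)
```

The period of the output equals the multiplicative order of *N* modulo *g*, here 10 = *g* − 1.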

### 3.3 AFSRs

Let *R* be an algebraic ring. Let *π* ∈ *R* be neither a unit nor a zero divisor, and assume that *R*∕(*π*) is finite. Let *S* ⊆ *R* be a complete set of representatives for *R*∕(*π*). An AFSR of length *r* based on *π* is a PSG whose state set is *S*^{r} × *R* and whose state change function is determined by a set of coefficients \(g_{0},\ldots,g_{r} \in R\) with *g*_{0} invertible modulo *π*. If the current state is \((a_{0},a_{1},\ldots,a_{r-1};z)\), then the next state is \((a_{1},\ldots,a_{r-1},a_{r};z^{{\prime}})\), where \(g_{0}a_{r} + \pi z^{{\prime}} = g_{r}a_{0} + \cdots + g_{1}a_{r-1} + z\). Here the addition and multiplication are in *R*. The output function is \(\delta (a_{0},a_{1},\ldots,a_{r-1};z) = a_{0}\). See Fig. 4.

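The analysis of AFSRs uses the algebra of *π*-adic numbers, which we briefly recall [10, p. 98]. A *π*-adic number is an infinite expression

$$\displaystyle{a = a_{0} + a_{1}\pi + a_{2}\pi ^{2} + \cdots }$$

with *a*_{i} ∈ *S*. Addition of *π*-adic numbers is again addition with carry. That is,

$$\displaystyle{\sum _{i=0}^{\infty }a_{i}\pi ^{i} +\sum _{ i=0}^{\infty }b_{i}\pi ^{i} =\sum _{ i=0}^{\infty }c_{i}\pi ^{i}}$$

if there are carries \(d_{0} = 0,d_{1},d_{2},\ldots \in R\) so that for all *i* ≥ 0 we have \(a_{i} + b_{i} + d_{i} = c_{i} +\pi d_{i+1}\). Multiplication is defined similarly. The set of *π*-adic numbers is thus an algebraic ring, denoted by *R*_{π}.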

In the case when *R* = *F*[*x*], *F* a finite field, *π* = *x*, *S* = *F*, *g*_{0} = 1, and *z* = 0, we obtain LFSRs (the carries are all 0 in this case). In the case when \(R = \mathbb{Z}\), *π* = *N* > 1, \(S =\{ 0,1,\ldots,N - 1\}\), and *g*_{0} = 1, we obtain FCSRs. Other special cases that have been studied include the case when *R* = *F*[*x*] and deg(*π*) > 1 [10, p. 250], and *d*-FCSRs, where \(R = \mathbb{Z}[\pi ]\) and \(\pi = N^{1/d}\) with *N* square free [10, p. 133]. In the latter case, addition in *R* is addition with carry where the carry jumps *d* places ahead.

It is not in general the case that the output from an AFSR is eventually periodic. However, it is known that if *R* is a ring of integers in a number field, then the output is always eventually periodic iff for every embedding of the fraction field of *R* in \(\mathbb{C}\) the complex norm of *π* is greater than 1. This is the case, for example, for *d*-FCSRs.

1. The *connection element* of an AFSR is \(g = -g_{0} + g_{1}\pi + \cdots + g_{r}\pi ^{r}\). The associated *π*-adic number of the output sequence **a** is \(a = a_{0} + a_{1}\pi + a_{2}\pi ^{2} + \cdots\). There is an element *u* ∈ *R* (uniquely determined by the initial state \((a_{0},\ldots,a_{r-1};z)\)) so that \(a = u/g\).
2. The sequence **a** is eventually periodic if *R* is a ring of integers in a number field and the complex norm of *π* is greater than 1 under every embedding of the fraction field of *R* in \(\mathbb{C}\). Otherwise there are AFSRs that do not produce eventually periodic output. There is in general no known condition on the numerator *u* characterizing the periodic output sequences, even in the case when all output sequences are eventually periodic. However, for *d*-FCSRs, we have the following. Let *π*^{d} = 2 and \(R = \mathbb{Z}[\pi ]\). We denote by *P* the parallelepiped spanned by the *d* linearly independent vectors \(-g,-g\pi,\ldots,-g\pi ^{d-1}\),
   $$\displaystyle{P = \left \{\sum _{i=0}^{d-1}v_{ i}g\pi ^{i}\vert \ v_{ i} \in \mathbb{Q}\text{ and } - 1 \leq v_{i} \leq 0\right \} \subset \mathbb{Q}[\pi ].}$$
   Let \(\varDelta = P \cap \mathbb{Z}[\pi ]\) be the set of points of the integer lattice \(\mathbb{Z}[\pi ]\) in *P*.

### Theorem 3.1 ([9])

*Suppose*\(g \in \mathbb{Z}[\pi ]\)*is a unit modulo π. Let***a***be an output sequence from a d-FCSR with connection element g and let a be the π-adic number associated with***a***. Suppose that*\(a = u/g\)*. Then***a***is periodic if and only if u ∈Δ.*

Of particular interest is the case when *π* is a *primitive element* modulo *g*, meaning that the multiplicative order of *π* modulo *g* is \(\vert R/(g)\vert - 1\). In this case **a** is called a *π*-adic *ℓ-sequence*.

3. The statistical properties of *π*-adic *ℓ*-sequences are not well understood, except in some special cases. For example, for *d*-FCSRs with *d* = 2, we have the following. Let \(\mathcal{N}_{\mathbb{Q}}^{F}\) denote the rational norm function on *F*.

### Theorem 3.2 ([16])

*Let* \(\pi ^{2} = N \geq 2 \in \mathbb{Z}\) *with N square free. Let F be the fraction field of* \(R = \mathbb{Z}[\pi ]\)*. Suppose that* \(g = y + z\pi \in R\)*, with* \(y,z \in \mathbb{Z}\)*, is invertible modulo π, that* \(h = \mathcal{N}_{\mathbb{Q}}^{F}(g)\) *is a prime integer, and that π is primitive modulo g. Let* **a** *be an ℓ-sequence defined over* \(\mathbb{Z}[\pi ]\) *with connection element g. If* \(s \in \mathbb{Z}^{+}\) *is even, then the number K of occurrences of any s-tuple in one period of* **a** *satisfies*

*If*\((h/N)^{1/2} \leq \vert z\vert \leq h^{1/2}\)

*, then*

*If N = 2, then*

4. It has only been shown that there is an exponential representation of periodic AFSR sequences under special conditions.

We can form a family of PSGs by fixing *R*, *π*, and *S* and considering all AFSRs based on these ingredients. We also may want to impose constraints on the *g*_{i}s, such as requiring that they be in *S*. Note that if we let the *g*_{i}s be arbitrary elements of *R*, then we can take *r* = 1 and \(g = g_{1}\pi + g_{0}\) with *g*_{0} ∈ *S*. Thus any *π*-adic number *u*∕*g* can be generated by an AFSR of length one.

## 4 Register Synthesis Problem

Let \(\mathcal{G}\) be a family of PSGs. Suppose that given part of a sequence **a** we can find the most efficient (in some sense) \(G = (S,\varGamma,\delta ) \in \mathcal{G}\) and *σ* ∈ *S* so that *G*(*σ*) = **a**. If *G* is efficient enough, then we have cryptanalyzed **a** [10, p. 295]. Let us make this more precise.

A *register synthesis algorithm* for the family \(\mathcal{G}\) is an algorithm *T* that on input \(a_{0},a_{1},\ldots,a_{n-1}\), a prefix of **a**, outputs \(G = (S,\varGamma,\delta ) \in \mathcal{G}\) and initial state *σ* ∈ *S* so that

1. \(G(\sigma ) = a_{0},a_{1},\ldots,a_{n-1},?,?,\cdots\).
2. If *n* is large enough, *G*(*σ*) = **a** (convergence).
3. *T* runs in polynomial time in *n*.

To measure the effectiveness of such an algorithm, we first need a notion of *size* of a sequence generator *G* in a family \(\mathcal{G}\). This should at least approximate the amount of space needed to store the states that occur in an infinite execution of *G*. Then we define the \(\mathcal{G}\)*-complexity*\(\lambda _{\mathcal{G}}(\mathbf{a})\) of a sequence **a** to be the minimum size of a generator in \(\mathcal{G}\) that outputs **a**. We typically measure the effectiveness of a \(\mathcal{G}\)-synthesizing algorithm in terms of \(\lambda _{\mathcal{G}}(\mathbf{a})\): for some slowly growing function *μ*, if the prefix length *n* is at least \(\mu (\lambda _{\mathcal{G}}(\mathbf{a}))\), then *T* outputs *G*, *σ* with *G*(*σ*) = **a**. In all cases we know, *μ*(*λ*) is linear in *λ*.

As a consequence, if \(\lambda _{\mathcal{G}}(\mathbf{a})\) is small and \(\mathcal{G}\) has an effective register synthesis algorithm, then \(\mathbf{a}\) is cryptographically insecure.

We later use the notion of the \(\mathcal{G}\)-complexity of a finite sequence, the minimum size of a generator in \(\mathcal{G}\) that outputs \(a_{0},a_{1},\ldots,a_{n-1}\) as its first *n* output symbols. We denote the \(\mathcal{G}\)-complexity by \(\lambda _{\mathcal{G}}(a_{0},a_{1},\ldots,a_{n-1})\).

### 4.1 LFSR Synthesis

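The LFSR synthesis problem can be posed as a system of linear equations in the unknown coefficients *g*_{i}. There is an efficient algorithm due to Berlekamp and Massey in 1969 [10, p. 296], [22]. This algorithm exploits the special structure of the equations and runs in time *O*(*n*^{2}). Given \(a_{0},a_{1},\ldots,a_{n-1}\), the goal is to find relatively prime polynomials *u*(*x*) and *g*(*x*) so that *g*(*x*) is the connection polynomial of a minimal size LFSR that generates **a**, and *u*(*x*) determines the initial state. The algorithm proceeds iteratively: at the *i*th iteration it finds the minimal degree polynomials *u*_{i}(*x*), *g*_{i}(*x*) so that

$$\displaystyle{u_{i}(x)/g_{i}(x) \equiv a(x)\mod x^{i}.}$$

When this approximation fails to account for the next symbol, a new approximation is formed by combining the current one with an earlier one, \((u_{i},g_{i}) + bx^{i-m}(u_{m},g_{m})\), for a certain earlier index *m* and a certain *b*.

The size of a minimal LFSR that generates **a** is the *linear complexity* of **a**. If \(a(x) = u(x)/g(x)\) and gcd(*u*(*x*), *g*(*x*)) = 1, then

$$\displaystyle{\lambda (\mathbf{a}) =\max (\deg (u) + 1,\deg (g)).}$$

For the Berlekamp–Massey algorithm, *μ*(*λ*) = 2*λ*. That is, if the sequence **a** has linear complexity *λ* and the input to the Berlekamp–Massey algorithm is \(a_{0},\ldots,a_{2\lambda -1}\), then the output is the precise rational representation of the generating function of **a**.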
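For reference, here is a compact sketch of the Berlekamp–Massey algorithm in its common textbook form, restricted to the binary field as a simplifying assumption. It tracks a recurrence polynomial \(c(x) = 1 + c_{1}x + \cdots + c_{L}x^{L}\) with \(s_{i} = c_{1}s_{i-1} + \cdots + c_{L}s_{i-L}\) rather than the pair (*u*, *g*) used in the text, so it should be read as an illustration of the idea, not as the chapter's exact formulation.

```python
def berlekamp_massey_gf2(s):
    """Return (L, c): linear complexity of the bit list s and a recurrence
    polynomial c = [1, c_1, ..., c_L] with s_i = c_1 s_{i-1} ^ ... ^ c_L s_{i-L}."""
    n = len(s)
    c = [1] + [0] * n  # current recurrence polynomial
    b = [1] + [0] * n  # polynomial saved at the last length change
    L, m = 0, -1       # current complexity, index of the last length change
    for i in range(n):
        # Discrepancy between s_i and the prediction of the current recurrence.
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]  # c(x) += x^shift * b(x)
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L, c[: L + 1]


seq = [1, 1, 1, 0, 0, 1, 0] * 2  # satisfies s_i = s_{i-2} ^ s_{i-3}
```

On this sequence the algorithm already settles on the correct length 3 recurrence after 2*λ* = 6 symbols, in line with *μ*(*λ*) = 2*λ*.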

### 4.2 FCSR and AFSR Synthesis

A first attempt at solving the FCSR synthesis problem is to use the Berlekamp–Massey algorithm but using integer linear combinations instead of \(\mathbb{F}_{q}\) linear combinations when finding a new approximation. This doesn’t work—the propagation of carries interferes with convergence.

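For *N* = 2, Goresky and Klapper developed an FCSR synthesis algorithm based on work of Mahler and De Weger. This *Rational Approximation Algorithm* iteratively finds a minimal basis for the *k*th approximation lattice,

$$\displaystyle{L_{k} =\{ (u,g) \in \mathbb{Z} \times \mathbb{Z}: ag - u \equiv 0\mod 2^{k}\},}$$

where *a* is the 2-adic number associated with **a**.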

Subsequently Xu and Klapper solved the problem for any *N* [10, p. 348], [21, 28]. They modified the Berlekamp–Massey algorithm so that when a new rational approximation is needed, one is found that works for the next three symbols of **a**. This means that the effect of the carry is overwhelmed by the growth in the number of terms accounted for. This algorithm also works for some classes of AFSRs: if the base ring *R* is the ring of integers of a number field \(F = \mathbb{Q}[N^{1/d}]\) (*d*-FCSRs) or *R* is the ring of integers of certain quadratic extensions of \(\mathbb{Q}\).

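For any *N*, a modified Euclidean algorithm can also be used. The idea is, given \(a_{0},\ldots,a_{n-1}\), to run the extended Euclidean algorithm on input \(\sum _{i=0}^{n-1}a_{i}N^{i}\) and *N*^{n}, stopping when the remainder drops below *N*^{n∕2}. If *n* is large enough, this is guaranteed to succeed.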
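The Euclidean idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: it runs the extended Euclidean algorithm on \(N^{n}\) and \(\alpha =\sum a_{i}N^{i}\), stops at the first remainder below \(N^{n/2}\), and returns the pair (*u*, *g*) with \(g\alpha \equiv u \pmod{N^{n}}\). The function name and the choice to return `None` when the candidate denominator is not a unit modulo *N* are assumptions of this sketch.

```python
from math import gcd


def euclidean_fcsr_synthesis(prefix, N=2):
    """Return (u, g) with g > 0 and g * alpha ≡ u (mod N^n), or None."""
    n = len(prefix)
    alpha = sum(a * N**i for i, a in enumerate(prefix))
    modulus = N**n
    # Loop invariant: r ≡ t * alpha (mod N^n) for both (r0, t0) and (r1, t1).
    r0, t0 = modulus, 0
    r1, t1 = alpha, 1
    while r1 * r1 >= modulus:  # continue until r1 < N^(n/2)
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    u, g = r1, t1
    if g < 0:
        u, g = -u, -g
    if g == 0 or gcd(g, N) != 1:
        return None  # candidate is not a valid connection integer
    return u, g


# Twelve symbols of the period 10 sequence with 2-adic value -1/11.
prefix = [1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
```

On this prefix the remainders are 4096, 1117, 745, 372, 1 and the last cofactor is −11, which gives the representation −1∕11 after fixing the sign.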

For LFSRs, the size of a minimal generator is determined by the degrees of *u*(*x*) and *g*(*x*), where *u*(*x*)∕*g*(*x*) is the corresponding rational representation of the generating function of **a**. For FCSRs, the “engineering” definition of size would be the number of cells plus the maximum number of *N*-ary digits needed to store the carry in an infinite execution of the FCSR. We call this the *N-adic span*. Unfortunately we know of no reasonable algebraic definition of *N*-adic span. Instead, we define the *N-adic complexity* of **a** to be

$$\displaystyle{\lambda _{N}(\mathbf{a}) =\log _{N}(\max (\vert u\vert,\vert g\vert )),}$$

where the *N*-adic number associated with **a** has rational representation *u*∕*g* and gcd(*u*, *g*) = 1. It can be seen that *N*-adic span and *N*-adic complexity differ only by a small amount.

The situation is more complicated for AFSRs based on a ring *R* and an element *π* ∈ *R*. There may be multiple competing choices for a size function. For example, represent *z* ∈ *R* as \(\sum _{i=0}^{k}z_{i}\pi ^{i}\), and let the size of *z* be *k*. Or let the size of *z* be the log (to an appropriate base) of the rational norm of *z*. The former definition is inadequate in general since not all elements can be represented this way. The latter fails to distinguish the sizes of *z* and *uz* where *u* is a nontrivial unit.

All three algorithms have time complexity *O*(*n*^{2}). For Goresky and Klapper’s rational approximation algorithm, we have \(\mu (\lambda ) = \left \lceil 2\lambda \right \rceil + 2\). For Xu and Klapper’s modified Berlekamp–Massey algorithm applied to \(R = \mathbb{Z}\), we have \(\mu (\lambda ) = \left \lceil 6\lambda \right \rceil + 27\). For the Euclidean approximation algorithm, we have \(\mu (\lambda ) = \left \lceil 2\lambda \right \rceil + 3\).

This leaves several open problems:

1. How can we build efficient generators of sequences that have large \(\lambda _{\mathcal{G}}\) for all “reasonable” \(\mathcal{G}\)?
2. Are there other “interesting” families \(\mathcal{G}\) of PSGs with good register synthesis algorithms?
3. Are there families \(\mathcal{G}\) of PSGs that provably have no register synthesis algorithm? Or even just no algorithm with *μ*(*λ*) linear?
4. Can we find effective register synthesis algorithms for other classes of AFSRs?

### 4.3 Combined and Filtered Generators

In this section we consider two approaches to reducing vulnerability of stream ciphers to synthesis attacks. The general idea is to introduce some nonlinearity to the PSG while maintaining the good statistical properties.

One approach is to take *n* simple PSGs, such as LFSRs and FCSRs, and to combine their outputs with a nonlinear *combiner* function \(H(x_{1},\ldots,x_{n})\). How can we choose *H* to maximize security? In particular, how can we choose *H* to make the linear or *N*-adic complexity large? In the binary case, suppose the underlying PSGs generate sequences \(\mathbf{a}^{1},\ldots,\mathbf{a}^{n}\). Let the overall output sequence be \(\mathbf{b} = b_{0},b_{1},\ldots\), with \(b_{i} = H(a_{i}^{1},\ldots,a_{i}^{n})\). Then Key showed that the linear complexity of **b** satisfies

$$\displaystyle{\lambda (\mathbf{b}) \leq H(\lambda (\mathbf{a}^{1}),\ldots,\lambda (\mathbf{a}^{n})),\qquad (1)}$$

where we treat *H* as a polynomial with integer coefficients that happen to be 0s and 1s [15]. Moreover, Key showed that if the **a**^{j} are m-sequences and their periods are pairwise relatively prime, then we have equality in equation (1).
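The bound on the linear complexity of a combined sequence, obtained by evaluating *H* over the integers at the component complexities, can be checked experimentally. The sketch below combines three binary LFSRs of lengths 3, 4, and 5 with the illustrative combiner \(H(x_{1},x_{2},x_{3}) = x_{1}x_{2} + x_{3}\) and measures complexities with a textbook Berlekamp–Massey routine. The particular registers, the combiner, and all names are assumptions chosen for the example.

```python
def lfsr_bits(coeffs, state, count):
    """Binary LFSR: coeffs = [g_1, ..., g_r], state = [a_0, ..., a_{r-1}]."""
    r, state, out = len(coeffs), list(state), []
    for _ in range(count):
        out.append(state[0])
        new = sum(coeffs[r - 1 - j] * state[j] for j in range(r)) % 2
        state = state[1:] + [new]
    return out


def linear_complexity(s):
    """Berlekamp-Massey over GF(2): linear complexity of the finite list s."""
    n = len(s)
    c, b = [1] + [0] * n, [1] + [0] * n
    L, m = 0, -1
    for i in range(n):
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            for j in range(n + 1 - (i - m)):
                c[j + i - m] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L


def H(x1, x2, x3):
    """Illustrative combiner; evaluated mod 2 on bits, over Z on complexities."""
    return x1 * x2 + x3


length = 200  # comfortably more than twice the bound below
a1 = lfsr_bits([0, 1, 1], [1, 0, 0], length)
a2 = lfsr_bits([0, 0, 1, 1], [1, 0, 0, 0], length)
a3 = lfsr_bits([0, 0, 1, 0, 1], [1, 0, 0, 0, 0], length)
b = [H(x, y, z) % 2 for x, y, z in zip(a1, a2, a3)]
bound = H(linear_complexity(a1), linear_complexity(a2), linear_complexity(a3))
lc_b = linear_complexity(b)
```

Here `bound` evaluates to 3·4 + 5 = 17, and `lc_b` can be compared against it to see how close a given combiner comes to the bound.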

Several questions arise:

1. Can we express or bound the 2-adic complexity of **b** in terms of the 2-adic complexities of the **a**^{i}? Similarly for various *π*-adic complexities.
2. Are there conditions under which both the linear and 2-adic complexities of **b** are large? All *π*-adic complexities?
3. What if we add a small amount of memory to the combiner? Rueppel investigated the *summation combiner*, where *H* is binary addition with carry [26]. He gave a heuristic argument that the linear complexity should be large, but gave no actual proof. To our knowledge, no proof has yet been found. On the other hand, it is known that the 2-adic complexity of **b** is the sum of the 2-adic complexities of the **a**^{i}, so the summation combiner is vulnerable to an FCSR synthesis attack. In fact it was this that motivated the invention of FCSRs.

We must point out that even if we achieve large linear and 2-adic complexities (or even large *π*-adic complexity for all *π*), this does not make these sequences secure. There are other attacks. For example, combiners tend to be vulnerable to *correlation attacks* [23].

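A second approach is to apply a nonlinear *filter* function *F* to the state of a single LFSR of length *r* and to use the result as the output. Key showed that if *d* = deg(*F*), then the linear complexity *λ* of the output satisfies

$$\displaystyle{\lambda \leq \sum _{i=1}^{d}\binom{r}{i}.\qquad (2)}$$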

1. How can *F* be chosen to achieve equality in inequality (2)?
2. What is the *N*-adic complexity of a filtered FCSR?
3. What is the linear complexity of an *ℓ*-sequence or a filtered FCSR?
4. What is the *N*-adic complexity of an m-sequence or a filtered LFSR?

Similar questions can be asked about *π*-adic complexity, where *π* is an element of a ring *R*. More generally, it is an open problem how to efficiently generate sequences that have both large linear complexity and large 2-adic complexity (or *π*-adic complexity for any *π*).

We mention here some additional motivation for studying FCSRs. There is a type of attack on filtered LFSR generators known as *algebraic cryptanalysis* [5]. The basic idea is to treat each monomial in the filter function *F* as a variable. Knowing some ciphertext and plaintext gives the attacker some keystream and thus gives an equation in these metavariables. More known keystream gives more equations using the composition of *F* with iterations of the state change. If the degree of *F* is small (or if there is a low degree multiple of *F*), then the number of metavariables is small, and if there are enough equations, we can solve for the metavariables.

Critical to this attack is the fact that *F* composed with the state change has the same degree as *F*. However, if we replace the LFSR with an FCSR, this is no longer the case and algebraic cryptanalysis no longer works.

## 5 Statistics of Sequences

In this section we consider various open questions on statistical properties of shift register sequences.

### 5.1 Average Behavior

We would like to understand the average behavior of the \(\mathcal{G}\)-complexity of sequences. Deviation from the average can be used as a measure of nonrandomness (the NIST test suite does this with linear complexity [25]). Moreover, if the average is large, then we know that randomly chosen sequences are likely to have large \(\mathcal{G}\)-complexity. This is important because many stream ciphers are designed to be hard to analyze. For such ciphers, it is likely to be impossible to determine the \(\mathcal{G}\)-complexity.

But averaged over what? We can use Haar measure on infinite sequences. However, the eventually periodic sequences are countable, so have measure zero. For most \(\mathcal{G}\) of interest only eventually periodic sequences have finite \(\mathcal{G}\)-complexity, so the average \(\mathcal{G}\)-complexity is infinite. This tells us nothing.

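Instead, there are two natural finite averages to consider:

1. \(E_{n}^{\text{fin},\mathcal{G}} =\) average \(\mathcal{G}\)-complexity over all finite, length *n* sequences.
2. \(E_{n}^{\text{per},\mathcal{G}} =\) average \(\mathcal{G}\)-complexity over all infinite period *n* sequences.

A generator that outputs a period *n* sequence also outputs its first *n* symbols, so the \(\mathcal{G}\)-complexity of a finite, length *n* sequence is at most that of its period *n* extension. Thus

$$\displaystyle{E_{n}^{\text{fin},\mathcal{G}} \leq E_{n}^{\text{per},\mathcal{G}}.}$$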

The following is known:

1. \(E_{n}^{\text{fin,lin}} = n/2 + O(1/q)\) for sequences over \(\mathbb{F}_{q}\).
2. \(E_{n}^{\text{per,lin}} \geq n - m/(q - 1)\) for sequences over \(\mathbb{F}_{q}\), *q* a power of *p*, *n* = *p*^{v}*m*, gcd(*m*, *p*) = 1 (the exact value can be expressed in terms of cyclotomic numbers).
3. \(E_{n}^{\text{per,N-adic}} \in n - O(\log (n))\) (the exact value can be expressed in terms of the prime factorization of *N*^{n} − 1).

This leaves open the determination of \(E_{n}^{\text{fin,N-adic}}\).
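For small *n* the finite averages can simply be computed by brute force, which is a useful sanity check on asymptotic statements. The sketch below does this for the linear complexity of binary sequences, using a textbook Berlekamp–Massey routine; the same enumeration could be paired with an *N*-adic complexity routine to explore the open case numerically. All names are illustrative.

```python
from itertools import product


def linear_complexity(s):
    """Berlekamp-Massey over GF(2): linear complexity of the finite sequence s."""
    n = len(s)
    c, b = [1] + [0] * n, [1] + [0] * n
    L, m = 0, -1
    for i in range(n):
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            for j in range(n + 1 - (i - m)):
                c[j + i - m] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L


def average_linear_complexity(n):
    """Average linear complexity over all 2^n binary sequences of length n."""
    total = sum(linear_complexity(s) for s in product((0, 1), repeat=n))
    return total / 2**n
```

For *n* = 8 the average is 1078∕256 ≈ 4.21, close to *n*∕2 as expected.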

If *R* is a UFD with \(\pi ^{2} = -N <0\) or *π*^{d} = *N* > 0, and *n* is a multiple of 4 in the former case and is arbitrary in the latter case, then \(E_{n}^{\text{per,}\mathcal{G}}\in n - O(\log (n))\) [19]. The average finite *π*-complexity is unknown, as are both averages for any other *R*. Note that in these cases, there are reasonable definitions of the size of an AFSR. For example, if \(\pi ^{2} = -N\) and *F* is the field of fractions of *R*, then we define the size of an element *u* + *vπ* ∈ *R*, \(u,v \in \mathbb{Z}\), to be

*g* and outputs a sequence whose associated *π*-adic number has rational representation *f*∕*g*. If *π*^{d} = *N* > 0, then we let the size of an element be

*N*-ary digits needed to represent the state. It seems that the first step in extending these results to more general *R* is to find a suitable notion of size.

### 5.2 Correlations

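Let **a** and **b** be binary sequences of period *T*. The classical notion of the cross-correlation of **a** and **b** with shift *t* is

$$\displaystyle{\mathcal{C}_{\mathbf{a},\mathbf{b}}(t) =\sum _{ i=0}^{T-1}(-1)^{a_{i}+b_{i+t} }.}$$

If **a** = **b**, then the cross-correlation is called the *autocorrelation* of **a**, denoted \(\mathcal{A}_{\mathbf{a}}(t)\).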

The cross-correlation is used in code division multiple access (CDMA) communications. Each user has a sequence **a** that determines how the user’s signal is distributed across a set of *T* channels. Typically it is necessary that the sequences used by two users have low cross-correlation to prevent interference. Thus the capacity of the system is limited by the size of a family of sequences with low pairwise cross-correlations.

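The size of such families is limited by the *Welch bound* [27]. Let *S* be a set of *n* binary sequences of period *T*. Let \(\mathcal{C}_{\max }\) be the maximum cross-correlation between distinct sequences in *S* (including shifts of sequences and including shifted autocorrelations). Then

$$\displaystyle{\mathcal{C}_{\max } \geq T\sqrt{\frac{n - 1} {nT - 1}}.}$$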

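Let \(-_{2}\) denote subtraction with borrow of binary sequences. That is, to compute \(\mathbf{a} -_{2}\mathbf{b}\), find the associated 2-adic numbers *a* and *b*, subtract them in \(\mathbb{Z}_{2}\), and extract the sequence of coefficients of the result. Note that if **a** and **b** have period *T*, then \(\mathbf{a} -_{2}\mathbf{b}\) is only eventually periodic (in fact it is periodic from the *T*th term on). We define the *arithmetic cross-correlation* to be

$$\displaystyle{\mathcal{C}_{\mathbf{a},\mathbf{b}}^{A}(t) =\sum _{ i=T}^{2T-1}(-1)^{c_{i} },}$$

where \(\mathbf{c} = c_{0},c_{1},\ldots = \mathbf{a} -_{2}\mathbf{b}_{t}\) and \(\mathbf{b}_{t} = b_{t},b_{t+1},\ldots\) is the shift of **b** by *t*.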

We next define a set of sequences whose pairwise arithmetic cross-correlations are identically zero. The *d-fold decimation* of a sequence \(\mathbf{a} = a_{0},a_{1},a_{2},\ldots\) is the sequence \(\mathbf{a}^{d} = a_{0},a_{d},a_{2d},\ldots\).

### Theorem 5.1

*Suppose 2 is a primitive root modulo the prime number g. Let* **a** *be an ℓ-sequence with connection integer g. Suppose that* \(\gcd (g - 1,d) =\gcd (g - 1,e) = 1\) *and that* **a**^{d} *is not a shift of* **a**^{e}*. Then for all t*, \(\mathcal{C}_{\mathbf{a}^{d},\mathbf{a}^{e}}^{A}(t) = 0\)*.*

It follows that the set \(S_{g} =\{ \mathbf{a}^{d}:\gcd (g - 1,d) = 1\}\) is a set of sequences with identically zero arithmetic cross-correlations. This is in stark contrast to the classical setting. Two questions remain. First, is there an application of this remarkable fact? Second, how large is *S*_{g}? That is, how many shift distinct decimations of an *ℓ*-sequence are there?

### Conjecture 5.2

If \(d\not\equiv e \pmod{g - 1}\) and *g* > 13, then **a**^{d} is not a shift of **a**^{e}.

If true, this would give us sets *S*_{g} of period *g* − 1 sequences with zero arithmetic cross-correlations and \(\vert S_{g}\vert =\phi (g - 1)\). Note that *ϕ*(*g* − 1) can be as large as \((g - 3)/2\).

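Bourgain et al. [2, 3] have shown that the conjecture holds whenever the connection integer is sufficiently large. Their result is based on recent deep results on bounds for certain exponential sums. Earlier partial results, also obtained with exponential sum techniques, appear in [11, 12].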
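The definitions of this subsection can be exercised on a small case. The sketch below is illustrative only and rests on several assumptions that are ours:

- An *ℓ*-sequence with connection integer *g* is taken to be \(a_{i} = (2^{-i}\bmod g)\bmod 2\).
- The arithmetic cross-correlation with shift *t* is taken to be the number of zeros minus the number of ones in one period of the periodic part of \(\mathbf{a} -_{2}\mathbf{b}_{t}\), the convention of [8].
- The decimation condition is read as \(\gcd (d,g - 1) = 1\).

All function names are hypothetical.

```python
from math import gcd

def l_sequence(g):
    """One period of a_i = (2^{-i} mod g) mod 2 (assumes g is an odd prime, 2 primitive mod g)."""
    inv2 = (g + 1) // 2  # multiplicative inverse of 2 modulo an odd g
    return [pow(inv2, i, g) % 2 for i in range(g - 1)]

def decimate(a, d):
    """d-fold decimation a_0, a_d, a_2d, ... of a periodic sequence given by one period."""
    T = len(a)
    return [a[(d * i) % T] for i in range(T)]

def shift(b, t):
    """Cyclic shift of one period: position i holds b_{i+t}."""
    T = len(b)
    return [b[(i + t) % T] for i in range(T)]

def classical_cc(a, b, t):
    """Classical cross-correlation: agreements minus disagreements of a and the t-shift of b."""
    T = len(a)
    return sum(1 - 2 * (a[i] ^ b[(i + t) % T]) for i in range(T))

def arithmetic_cc(a, b, t):
    """Zeros minus ones over terms T..2T-1 of a minus (b shifted by t), subtracting with borrow."""
    T = len(a)
    bt = shift(b, t)
    borrow, total = 0, 0
    for i in range(2 * T):
        diff = a[i % T] - bt[i % T] - borrow
        bit = diff % 2                    # coefficient of 2^i in the 2-adic difference
        borrow = 1 if diff < 0 else 0     # borrow propagates to the next position
        if i >= T:                        # the difference is periodic from term T on
            total += 1 - 2 * bit
    return total

if __name__ == "__main__":
    g = 19                                # 2 is a primitive root modulo 19
    a = l_sequence(g)
    T = g - 1
    ds = [d for d in range(1, T) if gcd(d, T) == 1]
    for d in ds:
        for e in ds:
            if d < e:
                x, y = decimate(a, d), decimate(a, e)
                distinct = all(x != shift(y, t) for t in range(T))
                arith = sorted({arithmetic_cc(x, y, t) for t in range(T)})
                classical = max(abs(classical_cc(x, y, t)) for t in range(T))
                print(d, e, distinct, arith, classical)
```

The script prints, for each pair of decimations, whether the two sequences are shift distinct, the set of arithmetic cross-correlation values over all shifts, and the largest classical cross-correlation magnitude. Under the assumptions above, Theorem 5.1 predicts that the arithmetic values are all zero whenever the pair is shift distinct.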

### 5.3 Asymptotic Complexity

Let \(\mathcal{G}\) be a family of PSGs. Typically a sequence **a** is eventually periodic if and only if it can be generated by some \(G \in \mathcal{G}\). Let us call such a \(\mathcal{G}\) *periodic*. The “if” part certainly holds if for any *G* = (*S*, *Γ*, *δ*) and *σ* ∈ *S*, the set of states \(\{\varGamma ^{i}(\sigma ): i \in \mathbb{N}\}\) is finite. The “if and only if” holds for LFSRs, FCSRs, and AFSRs based on a ring \(R = \mathbb{Z}[\pi ]\) and *π* if | *π* | > 1 for every embedding of the fraction field of \(R\) in \(\mathbb{C}\).
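The “if” direction is easy to see experimentally: once the orbit of a state under *Γ* is finite, it must enter a cycle, and the output inherits that eventual periodicity. The sketch below detects the preperiod and period of a state orbit for a binary FCSR. It assumes the usual update rule, which may differ in notation from Sect. 3: with taps \(g_{1},\ldots ,g_{r}\) and connection integer \(g = -1 +\sum g_{i}2^{i}\), compute \(\sigma =\sum g_{i}a_{n-i} + z\), feed in \(\sigma \bmod 2\), and keep \(\lfloor \sigma /2\rfloor\) as the new carry. All names are hypothetical.

```python
def fcsr_step(state, taps):
    """One FCSR state update.

    state = (bits, carry) with bits = (a_{n-r}, ..., a_{n-1});
    taps  = (g_1, ..., g_r), assumed to encode g = -1 + sum(g_i * 2**i).
    """
    bits, carry = state
    r = len(taps)
    # sigma = g_1*a_{n-1} + ... + g_r*a_{n-r} + carry, computed over the integers
    sigma = carry + sum(taps[i] * bits[r - 1 - i] for i in range(r))
    return (bits[1:] + (sigma % 2,), sigma // 2)

def output(state):
    """Output function: the oldest bit in the register."""
    return state[0][0]

def eventual_period(step, state):
    """Return (preperiod, period) of the orbit state, step(state), step(step(state)), ...

    Terminates exactly when the orbit is finite.
    """
    seen = {}
    n = 0
    while state not in seen:
        seen[state] = n
        state = step(state)
        n += 1
    return seen[state], n - seen[state]

if __name__ == "__main__":
    taps = (0, 1, 0, 1)                   # g = -1 + 2**2 + 2**4 = 19
    start = ((1, 0, 0, 0), 0)
    print(eventual_period(lambda s: fcsr_step(s, taps), start))
    # Pure cycling register: g_1 = g_2 = g_3 = 0, g_4 = 1
    print(eventual_period(lambda s: fcsr_step(s, (0, 0, 0, 1)), start))
```

The same `eventual_period` helper works for any generator whose states are hashable, so an LFSR step function could be substituted without other changes.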

Suppose **a** is not eventually periodic. How can we understand the \(\mathcal{G}\)-complexity of **a**? One way is to consider finite prefixes of **a** and study the growth in their \(\mathcal{G}\)-complexities as the length increases.

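The \(\mathcal{G}\)*-complexity profile* of **a** is the sequence \(\varLambda _{\mathcal{G}}(\mathbf{a}) = (\lambda _{\mathcal{G},1}(\mathbf{a}),\lambda _{\mathcal{G},2}(\mathbf{a}),\ldots )\), where \(\lambda _{\mathcal{G},n}(\mathbf{a})\) is the \(\mathcal{G}\)-complexity of the prefix of **a** of length *n*. Assume for the remainder of this subsection that \(\mathcal{G}\) is a family of periodic PSGs. Then

**a**, we have \(\lambda _{\mathcal{G},n}(\mathbf{a}) \leq n + o(n)\). This is the case for LFSRs, FCSRs, and AFSRs since these families contain pure cycling registers (\(g_{1} = \cdots = g_{r-1} = 0,g_{r} = 1\)). In this case we can normalize by defining

\[\delta _{\mathcal{G},n}(\mathbf{a}) = \frac{\lambda _{\mathcal{G},n}(\mathbf{a})}{n}\]

and considering the sequence \(\delta _{\mathcal{G},1}(\mathbf{a}),\delta _{\mathcal{G},2}(\mathbf{a}),\ldots\) as *n* tends to infinity.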

However, it is not in general the case that this sequence has a single limit point. Rather, it has a set of accumulation points *T*(**a**) ⊆ [0, 1]. It is this set we want to study. When does there exist a single limit point of the \(\delta _{\mathcal{G},n}(\mathbf{a})\)? In general what is the structure of *T*(**a**)?

The first question was answered by Niederreiter for linear complexity [24]. The answer is that generically a single limit exists and that limit is 1∕2. More precisely, recall that there is a natural measure on the set *L* = { 0, 1}^{∞} of infinite binary sequences, called Haar measure. This is simply the infinite product of the uniform measure on {0, 1}. This is very nearly the uniform measure on the real unit interval [0, 1]. Niederreiter showed that there is a set \(U \subseteq L\) with measure one such that if **a** ∈ *U* then \(T(\mathbf{a}) = [1/2,1/2]\).

It is an open problem to prove this for any other family of PSGs.
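For the family of LFSRs, the complexity profile of a prefix can be computed with the Berlekamp–Massey algorithm [22]. The following sketch does this over GF(2) and normalizes by the prefix length, assuming the normalization \(\delta _{n} =\lambda _{n}/n\). It covers only binary linear complexity, since FCSRs and AFSRs require the synthesis algorithms of [1, 21, 28], and the function names are ours.

```python
import random

def linear_complexity_profile(bits):
    """Return [L_1, ..., L_n]: the linear complexity of every prefix of a 0/1 list.

    Berlekamp-Massey over GF(2), processing one symbol at a time.
    """
    c = [1]   # current connection polynomial C(x), c[0] = 1
    b = [1]   # connection polynomial before the last length change
    L = 0     # current linear complexity
    m = 1     # number of steps since the last length change
    profile = []
    for n, s_n in enumerate(bits):
        # discrepancy: does the current LFSR predict s_n correctly?
        d = s_n
        for i in range(1, L + 1):
            if i < len(c):
                d ^= c[i] & bits[n - i]
        if d == 0:
            m += 1
        else:
            previous = c[:]
            needed = len(b) + m
            if len(c) < needed:
                c += [0] * (needed - len(c))
            for i, b_i in enumerate(b):   # C(x) <- C(x) + x^m * B(x)
                c[i + m] ^= b_i
            if 2 * L <= n:                # the register must grow
                L = n + 1 - L
                b = previous
                m = 1
            else:
                m += 1
        profile.append(L)
    return profile

def normalized_profile(bits):
    """Return [L_1/1, L_2/2, ..., L_n/n]."""
    return [L / n for n, L in enumerate(linear_complexity_profile(bits), start=1)]

if __name__ == "__main__":
    rng = random.Random(0)                # fixed seed, for repeatability only
    bits = [rng.randrange(2) for _ in range(2000)]
    delta = normalized_profile(bits)
    print(min(delta[1000:]), max(delta[1000:]))
```

For a pseudorandom input such as the one above, one would expect the tail of the normalized profile to stay close to 1∕2, in line with Niederreiter's result. This is an experiment, though, not evidence about any particular infinite sequence.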

Next we mention a theorem that partially answers the second question.

### Theorem 5.3 ([17])

*If* \(\lambda _{\mathcal{G},n} \leq \lambda _{\mathcal{G},n+1}\)*, then* \(T(\mathbf{a}) = [B,C] \subseteq [0,1]\)*.*

But what are the possible values of *B* and *C*?

### Conjecture 5.4

For all **a**, we have \(T(\mathbf{a}) = [B,1 - B]\). For every *B* ∈ [0, 1∕2] there are uncountably many sequences **a** for which \(T(\mathbf{a}) = [B,1 - B]\).

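The interval form \([B,1 - B]\) has been established for linear complexity [7], for 2-adic complexity [17], and, in some cases, for *π*-adic complexity with \(\pi ^{2} = -2\) [18]. All other cases are open.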

## Notes

### Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. CCF-0514660. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

### References

- 1. F. Arnault, T. Berger, A. Necer, Feedback with carry shift registers synthesis with the Euclidean algorithm. IEEE Trans. Inf. Theory **50**, 910–917 (2004). This paper modifies the extended Euclidean algorithm to find a minimal FCSR generating a sequence given a sufficiently long prefix of the sequence.
- 2. J. Bourgain, T. Cochrane, J. Paulhus, C. Pinner, Decimations of *ℓ*-sequences and permutations of even residues mod *p*. SIAM J. Discrete Math. **23**, 842–857 (2009).
- 3. J. Bourgain, T. Cochrane, J. Paulhus, C. Pinner, On the parity of *k*-th powers mod *p*, a generalization of a problem of Lehmer. Acta Arith. **147**, 173–203 (2011). These two papers show that if *p* is large enough and 2 is primitive modulo *p*, then all decimations of an *ℓ*-sequence with connection integer *p* are cyclically distinct. It is conjectured that this is true for all primes *p* > 13.
- 4. R. Carmichael, Sequences of integers defined by recurrence relations. Q. J. Pure Appl. Math. **48**, 343–372 (1920). This is one of the first papers to study integer linear recurrences modulo an integer.
- 5. N. Courtois, Fast algebraic attacks on stream ciphers with linear feedback, in *Advances in Cryptology: Crypto 2003*, ed. by D. Boneh, Lecture Notes in Computer Science, vol 2729 (Springer, Berlin, 2003), pp. 177–194. Courtois’ seminal paper describes an attack on stream ciphers based on finding low degree multiples of the polynomials that express the output from a keystream generator in terms of the state bits. If such multiples can be found, then the problem of recovering the state from the output can be solved by solving a system of linear equations in the monomials of low degree.
- 6. R. Couture, P. L’Écuyer, Distribution properties of multiply-with-carry random number generators. Math. Comput. **66**, 591–607 (1997). Couture and L’Écuyer invented multiply with carry sequences, generated by linear recurrences with carry. These are equivalent to FCSR sequences, which were invented independently at about the same time by Goresky and Klapper.
- 7. Z. Dai, S. Jiang, K. Imamura, G. Gong, Asymptotic behavior of normalized linear complexity of ultimately non-periodic sequences. IEEE Trans. Inf. Theory **50**, 2911–2915 (2004). Let **a** be an infinite, binary, eventually aperiodic sequence. The authors show that the set of accumulation points of the normalized linear complexities of prefixes of **a** is an interval of the form [*B*, 1 − *B*].
- 8. M. Goresky, A. Klapper, Arithmetic cross-correlations of FCSR sequences. IEEE Trans. Inf. Theory **43**, 1342–1346 (1997). It is shown that the arithmetic cross-correlations of cyclically distinct binary *ℓ*-sequences are identically zero.
- 9. M. Goresky, A. Klapper, Periodicity, correlation, and distribution properties of *d*-FCSR sequences. Des. Codes Cryptogr. **33**, 123–148 (2004). *d*-FCSRs are a variant of FCSRs based on the algebra of \(\mathbb{Z}[2^{1/d}]\). In this paper various statistical properties of maximum period *d*-FCSR sequences are considered.
- 10. M. Goresky, A. Klapper, *Algebraic Shift Register Sequences* (Cambridge University Press, Cambridge, 2012), http://www.cs.uky.edu/~klapper/algebraic.html. This is an extensive monograph on sequence generators based on abstract algebra. Topics studied include statistical analysis, maximum period sequences, and the register synthesis problem: the problem of finding a minimal generator of a particular type for a sequence given a short prefix.
- 11. M. Goresky, A. Klapper, R. Murty, On the distinctness of decimations of *ℓ*-sequences, in *Sequences and Their Applications—SETA ’01*, eds. by T. Helleseth, P.V. Kumar, K. Yang (Springer, Berlin, 2002), pp. 197–208.
- 12. M. Goresky, A. Klapper, R. Murty, I. Shparlinski, On decimations of *ℓ*-sequences. SIAM J. Discrete Math. **18**, 130–140 (2004). These two papers give a partial solution to the conjecture that all decimations of an *ℓ*-sequence are distinct if the connection integer is greater than 13. Exponential sum techniques are used.
- 13. D. Heath-Brown, Artin’s conjecture for primitive roots. Q. J. Math. Oxford Ser. **37**(1), 27–38 (1986).
- 14. C. Hooley, On Artin’s conjecture. J. Reine Angew. Math. **22**, 209–220 (1967). These two papers give a partial solution to Artin’s conjecture (that for any integer *N* there are infinitely many primes for which *N* is primitive), assuming a generalized Riemann hypothesis.
- 15. E. Key, An analysis of the structure and complexity of nonlinear binary sequence generators. IEEE Trans. Inf. Theory **22**(1), 732–736 (1976). Key analyzed the linear complexities of sequences generated by nonlinear combiners and LFSRs with nonlinear output functions.
- 16. A. Klapper, Distributional properties of *d*-FCSR sequences. J. Complexity **20**, 305–317 (2004). Let **a** be a maximum period sequence generated by a length *m* *p*-ary *d*-FCSR (an AFSR based on a ring \(\mathbb{Z}[p^{1/d}]\)). We study the variation in the number of occurrences of blocks of length *s* ≤ *m*. If *d* = 2, we see that the variation is bounded by a constant times the square root of the average number of occurrences of blocks of length *s*.
- 17. A. Klapper, The asymptotic behavior of 2-adic complexity. Adv. Math. Commun. **1**, 307–319 (2007). Let **a** be an infinite, binary, eventually aperiodic sequence. We show that the set of accumulation points of the normalized 2-adic complexities of prefixes of **a** is an interval of the form [*B*, 1 − *B*].
- 18. A. Klapper, The asymptotic behavior of *π*-adic complexity with \(\pi ^{2} = -2\), in *Sequences, Subsequences, and Consequences*, eds. by S. Golomb, G. Gong, T. Helleseth, H.-Y. Song, Lecture Notes in Computer Science, vol 4893 (Springer, Berlin, 2007), pp. 134–147. Let **a** be an infinite, binary, eventually aperiodic sequence. We show that the set of accumulation points of the normalized *π*-adic complexities of prefixes of **a** is an interval of the form [*B*, 1 − *B*] in some cases.
- 19. A. Klapper, Expected *π*-adic complexity of sequences. IEEE Trans. Inf. Theory **56**, 2486–2501 (2010). This paper computes the average *π*-adic complexity of sequences of fixed period.
- 20. A. Klapper, M. Goresky, Feedback shift registers, 2-adic span, and combiners with memory. J. Cryptol. **10**, 111–147 (1997). In this paper FCSRs were introduced and many basic properties were worked out. Parts of this analysis were based on the algebra of 2-adic numbers.
- 21. J. Xu, A. Klapper, Feedback with carry shift registers over \(\mathbb{Z}/(n)\), in *Proceedings of International Conference on Sequences and their Application (SETA), Singapore, December 1998*, eds. by C. Ding, T. Helleseth, H. Niederreiter (Springer, Berlin, 1999), pp. 379–392. This paper generalizes Xu and Klapper’s algorithm [28] to AFSRs over certain number fields (including *d*-FCSRs).
- 22. J.L. Massey, Shift register synthesis and BCH decoding. IEEE Trans. Inf. Theory **15**(1), 122–127 (1969). Jim Massey showed here how Berlekamp’s decoding algorithm could be used as an efficient solution to the LFSR synthesis problem. The idea is to process one symbol at a time. When the rational approximation needs updating (i.e., a discrepancy occurs), a new approximation that is correct for the new symbol is found as a linear combination of two earlier approximations.
- 23. W. Meier, O. Staffelbach, Correlation properties of combiners with memory in stream ciphers. J. Cryptol. **5**, 67–86 (1992). This paper describes an effective attack on stream ciphers that combine several m-sequences with a nonlinear combiner, endowed with a small amount of extra memory. The basis is the analysis of the combining function to find a correlation between state bits and output bits.
- 24. H. Niederreiter, The probabilistic theory of linear complexity, in *Advances in Cryptology—Eurocrypt 88*, ed. by C. Günther, Lecture Notes in Computer Science, vol 330 (Springer, Berlin, 1988), pp. 191–209. Niederreiter showed that with probability 1 the limit of the normalized linear complexity of a sequence exists and equals 1∕2. The proof of this fact uses a relationship between continued fractions and linear complexity and uses the theory of dynamical systems.
- 25. NIST, Statistical test suite for random and pseudorandom number generators for cryptographic applications, http://csrc.nist.gov/groups/ST/toolkit/rng/index.html. This is a resource with useful tools for measuring statistical randomness of sequences.
- 26. R. Rueppel, *Analysis and Design of Stream Ciphers* (Springer, Berlin, 1986). This book studies several aspects of stream ciphers and statistical properties of sequences, including nonlinear combiners with memory such as the summation combiner. It was largely based on Rueppel’s Ph.D. dissertation.
- 27. L.R. Welch, Lower bounds on the maximum correlation of signals. IEEE Trans. Inf. Theory **20**(1), 397–399 (1974). Here Welch derived a fundamental constraint on the size of sequence families with low pairwise correlations.
- 28. A. Klapper, J. Xu, Register synthesis for algebraic feedback shift registers based on non-primes. Des. Codes Cryptogr. **31**, 227–250 (2004). This paper presents a solution to the FCSR synthesis problem for *n*-ary FCSRs with *n* arbitrary. This algorithm modifies the Berlekamp–Massey algorithm: when a discrepancy is found, the rational approximation is amended to account for several new sequence symbols instead of just one.