Abstract
We introduce rational complexity, a new complexity measure for binary sequences. The sequence s ∈ B^{ω} is considered as the binary expansion of a real fraction \(s \equiv {\sum }_{k\in \mathbb {N}}s_{k}2^{-k}\in [0,1] \subset \mathbb {R}\). We compute its continued fraction expansion (CFE) by the Binary CFE Algorithm, a bitwise approximation of s by binary search in the encoding space of partial denominators, obtaining rational approximations r of s with r → s. We introduce Feedback in \(\mathbb {Q}\) Shift Registers (F\(\mathbb {Q}\)SRs) as the analogue of Linear Feedback Shift Registers (LFSRs) for the linear complexity L, and of Feedback with Carry Shift Registers (FCSRs) for the 2-adic complexity A. We show that there is a substantial subset of prefixes with “typical” linear and 2-adic complexities, around n/2, but low rational complexity. Thus the three complexities sort out different sequences as non-random.
1 Introduction and motivation
Stream ciphers employ pseudorandom sequences as a replacement for one-time pads. Such a cipher is deterministic, but should be indistinguishable from true randomness. A means to assess the quality of such a pseudorandom symbol stream s ∈ B^{ω} is the linear complexity profile (LCP), \((L(s,n), n\in \mathbb {N}_{0})\), among other tests. The expected value is L(s, n) ≈ n/2, and while deviations of \(\pm \log _{2}(n)\) are typical, values with \(|L(s,n) - n/2| > 2\cdot \log _{2}(n)\) indicate atypical “non-random” behaviour, see [15].
Klapper and Goresky introduced the 2-adic complexity with profile \((A(s,n), n\in \mathbb {N}_{0})\), \(A = 1 + \lfloor \log _{2} ({\Phi })\rfloor \in \mathbb {N}\) (see [8, p. 128] for Φ). They demand [6, p. 1]: “In general, one must consider this ‘2-adic span’ as a measure of security along with ordinary linear span.”
Linear complexity uses Linear Feedback Shift Registers (LFSRs) and approximates \(G(s) = {\sum }_{k\in \mathbb {N}}s_{k}x^{-k}\in \mathbb {F}_{2}[[x^{-1}]]\) by rational functions from \(\mathbb {F}_{2}(x)\), while 2-adic complexity employs Feedback with Carry Shift Registers (FCSRs) and approximates \({\sum }_{k\in \mathbb {N}}s_{k}2^{k-1}\in \mathbb {Z}_{2}\) by rational numbers \(p/q\in \mathbb {Q}\) with odd denominator.
The new measure of rational complexity, the corresponding register type Feedback in \(\mathbb {Q}\) Shift Register (F\(\mathbb {Q}\)SR), and thus this paper grew out of 3 observations:

Binary sequences can be interpreted as real numbers from [0,1] and approximated by rationals from \(\mathbb {Q}_{1}\). No sequence measure seems to treat this point of view yet.

Approximations of \(s\in \mathbb {R}_{1}\) by \(P/Q\in \mathbb {Q}_{1}\) are notoriously difficult numerically due to the Archimedean, non-ultrametric nature of \(\mathbb {R}\). Only recently, we developed a pseudo-ultrametric continued fraction expansion (CFE) algorithm which produces about one bit of CFE encoding for each additional bit of the sequence prefix, thus mimicking the Berlekamp-Massey algorithm (BMA) for L in this respect, see [22].

Comparing linear, 2-adic and rational complexities L(s, n), A(s, n) and R(s, n) of finite prefixes of a certain length n, it turns out that the measures assign low complexities to very different parts of the prefix set:
Motivating Example
Consider the three binary prefixes of length n = 24 given in Table 1. The columns L, A, R give the linear, 2-adic, and rational complexities, respectively.
As can be seen, in each case two complexities lie around n/2 = 12, which is near the expected average behaviour (see [15] for L, [16] for A: we only know (n − 1)/2 ≤ E(A) < 0.772 ⋅ n, and Section 5 for R), and thus the sequence is unremarkable from that point of view. Each sequence, however, has a markedly low value in one of the complexities, (1) for L, (2) for A, and (3) for R. Hence, from a cryptographic point of view, all three sequences should be discarded as non-pseudorandom. We need, though, all three tests L, A, and R to detect all three of them as atypical.
As Klapper stated for the 2-adic complexity, we therefore should also consider the rational complexity R, defined in Section 5, as a means to assess randomness.
The general idea is to represent a given binary sequence as a real number
and to obtain its CFE, and thereby good rational approximations. By reuse of notation, we will write s both for the binary sequence and the resulting real number.
Going in the direction from s to its CFE is a very tedious process over \(\mathbb {R}\). It is not suited to bitwise, incremental computation with growing prefix length, since it needs the “full” precision (whatever that will be) right from the start. Instead, we define a sequence of CFEs (of rational numbers) that converge to the CFE of s, by doing a binary search in the space of all CFEs. Due to the results from [22], this is possible and efficient.
We obtain an encoding of the CFEs which is both monotonic
(in \(\mathbb {R}\) and in lexicographical order, respectively) and at the same time pseudo-ultrametric: If two sequences (or binary representations of numbers) s and \(s^{\prime }\) differ in bit k, but not before, their encodings differ—in the limit for large k, for numbers \(s,s^{\prime }\) satisfying the so-called Gauß-Kuz’min measure (Section 3)—for the first time around bit ⌊1.024 ⋅ k⌋, not much earlier and not much later: The information content provided by \(s_{1},{\dots } , s_{k}\) is roughly preserved and translated into ≈ 1.024 ⋅ k bits of information about the CFE of s.
The actual trial values \(r\in \mathbb {Q}_{1}\) with r → s are taken from the so-called V_{10} tree [22]. Let v be the address of r within this infinite binary tree (v = ε at the root, each move down appends a bit to v, 0 to the left, 1 to the right). Then the CFE encoding of r will be v10^{ω} ∈ B^{ω}.
Since we only ever move further down the V_{10} tree, every address is a prefix of the CFE encoding of s: the prefix grows when moving down, and previously defined bits are not altered afterwards.
This paper is divided into three parts.
First, we treat continued fractions (Section 2), encodings (Section 3), the Binary CFE algorithm (Section 4), define Rational Complexity (Section 5), and give an example.
The second part treats properties of rational complexity such as connections with 2-adic complexity for periodic sequences as well as impulse response sequences (Section 7). A comparison of the three complexity measures R, L, and A on 30-bit prefixes (a generalisation of the “Motivating Example”) is given in Section 8. We show that encodings and prefix lengths do not differ too much (the “pseudo-ultrametric” property) in Section 9. In Section 10, we introduce the Q Tree, providing an ultrametric ordering of \(\mathbb {Q}_{1}\).
The final part covers implementation aspects: In Section 11, we show how to efficiently update the convergents P/Q, when moving down the V_{10} tree, by a Finite State Machine with Counter. Section 12 covers Feedback in \(\mathbb {Q}\) Shift Registers (F\(\mathbb {Q}\)SRs), which allow us to efficiently compute the infinite rational sequence r—to be compared to the input s—from the two numbers P and Q. We finish with Open Problems.
2 Continued fraction expansions in \(\mathbb {R}\) and \(\mathbb {F}_{2}[[x^{-1}]]\)
Case \(\mathbb {R}\)
The usual way to compute the CFE of a number \(s := {\sum }_{k\in \mathbb {N}} s_{k}2^{-k}\in \mathbb {R}_{1}\) is
giving \(\displaystyle s = 0 +\cfrac {1}{b_{1}+\cfrac {1}{b_{2}+\cdots }}\), where the resulting b_{i} are the partial denominators (PDs).
We write \(s = [b_{1},b_{2},\dots ]\) for the CFE.
The convergents P_{i}/Q_{i} to s are obtained via Perron’s schema, Table 2 (see [14], we use s = π − 3 as example). The initial values are P_{− 2} = Q_{− 1} = 0, Q_{− 2} = P_{− 1} = 1, and then P_{i} := b_{i} ⋅ P_{i− 1} + P_{i− 2}, Q_{i} := b_{i} ⋅ Q_{i− 1} + Q_{i− 2} with \(\left | s - P_{i}/Q_{i}\right | < Q_{i}^{-2}\) for \(s\not \in \mathbb {Q}\), [14, Satz 2.10].
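Perron’s recurrence can be checked numerically. A minimal sketch (the PDs 7, 15, 1, 292 of π − 3 are taken from Example 11; function names are ours):

```python
import math
from fractions import Fraction

def convergents(pds):
    # Perron's schema, after the leading a_0 = 0 step:
    # P_i = b_i*P_{i-1} + P_{i-2}, Q_i = b_i*Q_{i-1} + Q_{i-2}
    p, pp, q, qq = 0, 1, 1, 0
    out = []
    for b in pds:
        p, pp = b * p + pp, p
        q, qq = b * q + qq, q
        out.append(Fraction(p, q))
    return out

cs = convergents([7, 15, 1, 292])       # pi - 3 = [7, 15, 1, 292, ...]
assert cs == [Fraction(1, 7), Fraction(15, 106), Fraction(16, 113),
              Fraction(4687, 33102)]
for c in cs:                            # |s - P_i/Q_i| < Q_i^{-2}
    assert abs(math.pi - 3 - c) < 1 / c.denominator ** 2
```

The third convergent, 16/113, is the fractional part of the famous approximation 355/113 to π.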
Case \(\mathbb {F}_{2}[[x^{-1}]]\)
In the case of formal power series from \(\mathbb {F}_{2}[[x^{-1}]]\), one computes the linear complexity via the Berlekamp-Massey Algorithm (BMA) [1, 11]. The BMA takes the sequence s of coefficients and computes the CFE
with polynomials \(p_{i}(x) \in \mathbb {F}_{2}[x]\) as partial denominators. With the adaptations by Dornstetter [3] and Vielhaber [19], the BMA computes an isometry K on \(\mathbb {F}_{2}^{\omega }\) such that the modified discrepancy sequence d := K(s) (see [19])
is at the same time an encoding of the PDs, by
The degrees \(g_{i} = \deg (p_{i})\) of the PDs determine the linear complexity profile of s (see [19]). For fast implementations see [2, 13].
Our approach is inspired by this ultrametric, bitwise model and will avoid the costly inversions s^{(i)} := 1/{s^{(i− 1)}} (which would have to be executed each time with the maximum prefix length corresponding to the eventually desired precision).
3 Binary CFE I: Gauß-Kuz’min measure, codes, monotonicity
3.1 Gauß-Kuz’min measure
To define an encoding for PDs, we start with the relevant facts obtained by Gauß, Kuz’min, Khinchin, and Lévy about continued fractions for \(r\in \mathbb {R}\backslash \mathbb {Q}\) and their PDs b_{i}. The probability for a PD value of b is \(\mu _{GK}(b) := \lim _{n\to \infty }\#\{i \mid 1\leq i\leq n,\ b_{i}=b\}/{n}\).
Theorem 1
Gauß-Kuz’min-Khinchin-Lévy [5, 9, 10]
For almost all values \(r\in \mathbb {R}\) (exceptions are rational, quadratic-algebraic and some more numbers, including \(e^{2/k},k\in \mathbb {N}\)), we have:

(i)
The probability for a partial denominator \(b\in \mathbb {N}\), its GaußKuz’min measure, is
$$\mu_{GK}(b) = -\log_2\left( 1-\frac{1}{(b+1)^2}\right) = \log_2\left( 1+\frac{1}{b(b+2)}\right).$$ 
(ii)
The geometric average of the partial denominators is Khinchin’s constant
$$K := \lim_{n\to\infty}\sqrt[n]{b_1\cdot b_2{\cdots} b_n} \doteq 2.68545.$$ 
(iii)
The average gain in precision, per partial denominator in bits, is
$$\displaystyle \frac{\pi^{2}}{6\ln(2)^{2}} \doteq 3.42371, \textit{with}~2^{3.42371/2}\doteq 3.27582~\textit{being L\'{e}vy's constant.} $$
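Theorem 1(i) can be sanity-checked in floating point; the sketch below verifies that the two closed forms agree and that μ_{GK} indeed sums to 1 (the sum telescopes):

```python
import math

def mu_gk(b):
    # Gauss-Kuz'min probability of the partial denominator value b
    return math.log2(1 + 1 / (b * (b + 2)))

# the two closed forms of Theorem 1(i) coincide
for b in range(1, 1000):
    assert abs(mu_gk(b) + math.log2(1 - 1 / (b + 1) ** 2)) < 1e-12

# mu_GK is a probability measure: partial sums telescope towards 1
assert abs(sum(mu_gk(b) for b in range(1, 10 ** 5)) - 1) < 1e-4

# small PDs dominate: b = 1 alone has probability log2(4/3), about 0.415
assert abs(mu_gk(1) - math.log2(4 / 3)) < 1e-12
```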
3.2 Encodings C_{I}, C_{II}, and C
The encoding \(C\colon \mathbb {R}_{1}\to B^{\omega }\backslash B^{*}1^{\omega }\) for CFEs, i.e. for the sequence of PDs from \(\mathbb {N}\), shall satisfy the following 3 desiderata:
(GK) C is reasonably close to the GaußKuz’min measure.
We want C to be almost isometric, that is, the encoding length should advance in parallel with the sequence prefix, bit by bit (preservation of information content). The measure μ_{GK}(b) for the expected fraction of the PD value \(b\in \mathbb {N}\) thus requires that an encoding for b has length around \(-\log _{2}(\mu _{GK}(b))\).
(MON) C is monotonically increasing: \(r < r^{\prime } \Longleftrightarrow C(r) < C(r^{\prime })\) in lexicographical order.
This allows us to avoid the costly CFE computation for the real s altogether. Instead, we do a binary search in the encoding space \(C(\mathbb {R}_{1})\subset B^{\omega }\) and compare the resulting (rational!) number r with s, where (MON) tells us which of the two subintervals of the binary search remains valid, and (GK) ensures efficiency for this process.
(INC) C is incremental: Given the last two convergents P/Q and \(P^{\prime }/Q^{\prime }\), from r = P/Q = C^{− 1}(v10^{ω}) we obtain
for both values a ∈ B with some \(\delta _{a}\in \mathbb {N}_{0}\). This yields an efficient implementation, where the multiplication P_{i} := b_{i} ⋅ P_{i− 1} + P_{i− 2} is replaced by iterated shift-and-add. This requirement (INC) is only relevant for fast implementations, see Section 11.
Definition 2
Encodings C_{I}, C_{II}, and C (Table 3)
We encode a PD \(b\in \mathbb {N}\), \(b = 2^{n} +{\sum }_{k=0}^{n-1} a_{k}2^{k}\) with \(n := \lfloor \log _{2}(b)\rfloor \) as follows:

(i)
For PDs b_{i} with odd index i, let
$$\text{C}_{\mathrm{I}}\colon \mathbb{N}\to B^{*},\ b \mapsto \text{C}_{\text{I}}(b) = 0^n1\overline a_{n-1}{\dots} \overline a_1\overline a_0.$$In particular, 1↦1;2↦011;3↦010. We also use the “pseudo-PD” ℵ_{0} (Aleph-Zero), the first, countable infinity, with 1/ℵ_{0} := 0, and set C_{I}: ℵ_{0}↦0^{ω}.

(ii)
For PDs b_{i} with even index i, we complement the codewords to obtain code C_{II}:
$$\text{C}_{\text{II}}\colon \mathbb{N}\to B^{*},\ b \mapsto \text{C}_{\text{II}}(b) = \overline{\text{C}_{\text{I}}(b)} = 1^n0a_{n-1}{\dots} a_1a_0.$$In particular, 1↦0;2↦100;3↦101;4↦11000, and C_{II}(ℵ_{0}) := 1^{ω}.
Observe that both \(\text {C}_{\mathbb {F}_{2}[\mathrm {x}]}\) and C_{I},C_{II} are composed of a first half, encoding the size (degree / \(\lfloor \log _{2}(b)\rfloor \)), and a second half, describing the details from “coarse” (a_{d− 1}x^{d− 1} / a_{n− 1}2^{n− 1}) to “fine” (a_{0}x^{0} / a_{0}2^{0}). C_{I} and C_{II} are prefix-free and complete prefix codes.
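A minimal sketch of the two codes (the helper names c_one, c_two are ours) reproduces the codewords listed in Definition 2 and the code length l_{I,II}(b) = 1 + 2⌊log₂ b⌋:

```python
import math

def c_one(b):
    # C_I(b) = 0^n 1, followed by the n complemented low bits of b
    bits = bin(b)[2:]                     # b = 1 a_{n-1} ... a_0 in binary
    n = len(bits) - 1
    return '0' * n + '1' + ''.join('1' if a == '0' else '0' for a in bits[1:])

def c_two(b):
    # C_II(b) = bitwise complement of C_I(b) = 1^n 0 a_{n-1} ... a_0
    return ''.join('1' if a == '0' else '0' for a in c_one(b))

assert [c_one(b) for b in (1, 2, 3)] == ['1', '011', '010']
assert [c_two(b) for b in (1, 2, 3, 4)] == ['0', '100', '101', '11000']

for b in range(1, 129):                   # (GK): length 1 + 2*floor(log2 b)
    assert len(c_one(b)) == len(c_two(b)) == 1 + 2 * int(math.log2(b))

# completeness: the Kraft sum approaches 1 (it equals 1 - 2^-12 here)
assert sum(2.0 ** -len(c_one(b)) for b in range(1, 2 ** 12)) > 1 - 2 ** -11
```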

(iii)
We concatenate encodings (rational finite CFE and irrational infinite CFE) as
$${}C\colon\mathbb{Q}_1\to B^{*}0^{\omega}, r =[b_1,b_2,\dots,b_{2l}] \mapsto C(r):= \text{C}_{\mathrm{I}}(b_1)\text{C}_{\text{II}}(b_{2})\dots\text{C}_{\text{II}}(b_{2l})\text{C}_{\mathrm{I}}(\aleph_0).$$and
$$C\colon\mathbb{R}_1\backslash \mathbb{Q}_1\to B^{\omega} \backslash(B^{*}0^{\omega} \cup B^{*}1^{\omega}), r =[b_1,b_2,b_3,\dots] \mapsto C(r):=\text{C}_{\text{I}}(b_1)\text{C}_{\text{II}}(b_{2})\text{C}_{\text{I}}(b_{3})\dots{}. $$
About C for \(\mathbb {Q}_{1}\): A finite CFE can be represented in two ways, as \([\dots , b_{n},1] = [\dots , b_{n}+1]\), leading to the same value. We can resolve this ambiguity in 4 ways:

(i)
Let the last PD always be greater than 1, or

(ii)
always equal to 1, or

(iii)
always have an even, or (iv) always an odd number of PDs.
We use convention (iii), thus C always terminates in some C_{II}(b_{2l}), followed by the “pseudo-PD” ℵ_{0} with C_{I}(ℵ_{0}) = 0^{ω}. We therefore have \(B^{*}0^{\omega } = C(\mathbb {Q}_{1})\) as encodings of finite CFEs.
(GK) Columns l_{I,II}, with \(l_{\text {I,II}}(b) = 1 + 2\lfloor \log _{2}(b)\rfloor \), and l_{GK}, with \(l_{GK}(b) = -\log _{2}(\mu _{GK}(b))\), of Table 3 give the coding length for our code and an “ideal code” according to the Gauß-Kuz’min measure (which would lead to non-integral word lengths), respectively. As can be seen, the two columns have reasonably similar values for each b. The codes are from [22], but C_{II} already appears in Vallée [17], who mentions Elias (reference [12] in [17]) as inventor. The complemented code C_{I} and the interleaving as C are, though, apparently new.
Comparing with L, the analogous measure for a polynomial PD \(p(x) = x^{m}+{\sum }_{k=0}^{m-1}a_{k}x^{k}\) \(\in \mathbb {F}_{2}[x]\) is μ(p) = 2^{− 2m}, and the encoding K (see [19]) is an isometry, with \(|\text {C}_{\mathbb {F}_{2}[\mathrm {x}]}(p)| = 2m = -\log _{2}(\mu (p))\). In a sense, this paper tries to achieve the same in the setting of \(\mathbb {R}\).
3.3 Monotonicity
We next show the monotonicity of \(C\colon \mathbb {R}\to B^{\omega }\), where < denotes both the usual order relation in \(\mathbb {R}\) and the lexicographical order in B^{ω}. From continued fraction theory, we know the impact on r of changing a PD b_{i}:
Proposition 3
Let two real numbers \( r^{\prime } = [b_{1},\dots ,b_{i-1},b^{\prime }_{i},\dots ]\), \(r^{\prime \prime }= [b_{1},\dots ,b_{i-1},b^{\prime \prime }_{i},\dots ] \) with \(b_{i}^{\prime } < b_{i}^{\prime \prime }\) be given. Then \(r^{\prime } < r^{\prime \prime }\) for even index i, and \(r^{\prime } > r^{\prime \prime }\) for odd i, respectively.
Proof
See Satz 2.8 in Perron [14]. □
Theorem 4
(MON) Monotonicity of C(r)
Let \(r^{\prime },r^{\prime \prime }\in [0,1)\subset \mathbb {R}\) with \(r^{\prime } < r^{\prime \prime }\), and \(C(r^{\prime }),C(r^{\prime \prime })\) their encodings.
Then \(C(r^{\prime }) < C(r^{\prime \prime })\) in lexicographical order.
Proof
Follows with Proposition 3. As mentioned (see [14, Satz 2.8]), for odd index 2i + 1 (b_{j}, j < 2i + 1, fixed), \(b^{\prime }_{2i+1} > b^{\prime \prime }_{2i+1}\) implies \(r^{\prime } = [b_{1},\dots , b_{2i},b^{\prime }_{2i+1}] < r^{\prime \prime } = [b_{1},\dots , b_{2i}, b^{\prime \prime }_{2i+1}]\). Also, we then have \(\text {C}_{\mathrm {I}}(b^{\prime }_{2i+1}) < \text {C}_{\mathrm {I}}(b^{\prime \prime }_{2i+1})\) and thus \(C(r^{\prime }) < C(r^{\prime \prime })\) lexicographically.
For even index 2i + 2 (b_{j}, j < 2i + 2, fixed), we have \(b^{\prime }_{2i+2} < b^{\prime \prime }_{2i+2}\Leftrightarrow r^{\prime } < r^{\prime \prime }\Leftrightarrow \text {C}_{\text {II}}(b^{\prime }_{2i+2}) < \text {C}_{\text {II}}(b^{\prime \prime }_{2i+2})\Leftrightarrow C(r^{\prime }) < C(r^{\prime \prime })\). □
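Theorem 4 can be tested exhaustively for small denominators. A sketch, assuming the codes of Definition 2 and convention (iii) for finite CFEs (helper names are ours):

```python
from fractions import Fraction

def c_one(b):
    bits = bin(b)[2:]
    return ('0' * (len(bits) - 1) + '1'
            + ''.join('1' if a == '0' else '0' for a in bits[1:]))

def c_two(b):
    return ''.join('1' if a == '0' else '0' for a in c_one(b))

def cfe_even(r):
    # CFE of r in (0,1), normalised to an even number of PDs (convention (iii))
    pds = []
    while r:
        b = int(1 / r)
        pds.append(b)
        r = 1 / r - b
    if len(pds) % 2:                      # [..., b] = [..., b-1, 1]; b >= 2 here
        pds[-1] -= 1
        pds.append(1)
    return pds

def code(r, width):
    # finite-support prefix of C(r) = C_I(b1) C_II(b2) ..., padded by 0^omega
    w = ''.join((c_one if i % 2 == 0 else c_two)(b)
                for i, b in enumerate(cfe_even(r)))
    return (w + '0' * width)[:width]

fracs = sorted({Fraction(p, q) for q in range(2, 30) for p in range(1, q)})
for r1, r2 in zip(fracs, fracs[1:]):      # (MON): r1 < r2  =>  C(r1) < C(r2)
    assert code(r1, 64) < code(r2, 64)
```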
4 Binary CFE II: Algorithm
The Binary Continued Fraction Expansion Algorithm (Table 4, left) builds upon the monotonicity of \(C\colon \mathbb {R}_{1}\to B^{\omega }\) from sequences to encodings. We use the computationally easier inverse to obtain a trial value \(r \in \mathbb {Q}_{1}\) from an encoding with finite support in B^{∗}10^{ω}, r := C^{− 1}(v10^{ω}),v ∈ B^{∗}. We then compare s to r with s < r ⇔ C(s) < C(r), Theorem 4.
The function C^{− 1} restricted to B^{∗}10^{ω} is represented in the V_{10} Tree defined in [22], an infinite binary tree with labels, the binary analogue to the Stern-Brocot Tree. Table 4 (right) (see also Table 9) gives the first 5 levels of the V_{10} Tree, and therefore the first 5 trial values (with |v| ≤ 4) for any sequence s, with C^{− 1}(v10^{ω}) = [PDs] \(=P/Q=r\in \mathbb {Q}_{1}\).
The “|” separates the encodings C_{I},C_{II}, and the “.” stands after the current prefix v.
The V_{10} Tree arranges the \(r\in \mathbb {Q}_{1}\) according to their CFE support length. The r with CFE v10^{ω} appears as label at address v on level |v| + 1, level 1 being the root node with address ε. The node at v has child nodes with addresses v0 (left) and v1 (right child).
Moving down the V_{10} Tree with growing v, we successively shrink the set vB^{ω} of permissible encodings, whose center is the trial encoding v10^{ω}. Analogously to the BMA for L, we compute a discrepancy sequence D(s) = (d_{k}), where d_{k} = 0, if the approximation of s by r up to bit k − 1 also satisfies s_{k} = r_{k}; d_{k} = 1 otherwise, for s_{k}≠r_{k}. By definition, D is again an isometry on B^{ω}.
The bits within the sequences, input s, trial value r, discrepancies d, are indexed by k and the index is advanced in the FOREVER loop as long as s_{k} = r_{k} with d_{k} = D(s)_{k} := 0.
Whenever a discrepancy r_{k}≠s_{k}, d_{k} = D(s)_{k} := 1 occurs, we enter the WHILE loop to adjust the encoding, the CFE, and thereby the trial value r:
Either we have \((s_{1}{\dots } s_{k}) < (r_{1}{\dots } r_{k})\) lexicographically, and r is too large, has to decrease, Case ⊖: We set v := v0, leading to v0B^{ω} as new cylinder set of valid encodings, with the previous trial encoding v10^{ω} as upper bound. Thus v010^{ω} is the new trial encoding—the new center value.
Or, to the contrary, we had \((s_{1}{\dots } s_{k}) > (r_{1}{\dots } r_{k})\), Case ⊕: Then we adjust to v := v1, with v110^{ω} the center of v1B^{ω}. This interval halving or binary search within \(B^{\omega } = C(\mathbb {R}_{1})\) allows us to avoid the costly inversions s^{(i)} := 1/{s^{(i− 1)}} (see Section 2, CFE in \(\mathbb {R}\)), which in principle would have to be carried out with the full precision of s.
Given the new v (one new bit appended to the PD encoding), we generate the new convergent r = P/Q incrementally. We have to take three actions:

I
\(v\mapsto v10^{\omega }\stackrel {C^{-1}}{\longmapsto } [b_{1},b_{2},\dots , b_{2l},\aleph _{0}]\), obtain the new sequence of PDs.

II
\([b_{1},b_{2},\dots , b_{2l}]\mapsto P/Q\), evaluate the encoded PDs by Perron’s schema.
For now, cut v10^{ω} into the pieces according to C_{I},C_{II} in I, and then use Table 2 for II. Section 11 will show how the new convergents P/Q are obtained almost effortlessly from the two previous ones, combining I and II, by property (INC).

III
P/Q↦r ∈ B^{ω}, expand in binary. Again, for now think “paper-and-pencil long division in binary / in base 2”. Section 12 will introduce F\(\mathbb {Q}\)SRs to do this efficiently.
If now s_{i} = r_{i},1 ≤ i ≤ k, we leave the WHILE loop and return into the FOREVER loop, otherwise we refine the PD encoding by again extending v as long as necessary.
For sequences satisfying Theorem 1 (Gauß-Kuz’min), we enter the WHILE loop on average every second cycle of the FOREVER loop (for s_{k}≠r_{k}, i.e. d_{k} = 1), and we show in Section 9 that \(\lim _{k\to \infty } |v|/k \doteq 1.024\) in this case. Therefore, we stay within the WHILE loop for an average of 2.048 cycles for such sequences.
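The complete loop—decode v10^{ω} into PDs, evaluate Perron’s recurrence, and halve the interval on each discrepancy—can be sketched as follows. This is a plain, unoptimised reference in the spirit of Table 4 (not the FSM+C of Section 11); the sample input 1011(1100)^{ω} with value 59/80 anticipates Example 14:

```python
import math
from fractions import Fraction

def decode_pds(v):
    # C^{-1}(v 1 0^omega): parse codewords, alternating C_I / C_II
    s = v + '1' + '0' * (len(v) + 8)           # enough of the 0^omega tail
    pds, pos, odd = [], 0, True
    while True:
        if odd:                                # C_I = 0^n 1 (n complemented bits)
            if '1' not in s[pos:]:             # remaining 0^omega = C_I(aleph_0)
                break
            n = s.index('1', pos) - pos
            word = s[pos + n + 1: pos + 2 * n + 1]
            b = (1 << n) + int(''.join('1' if a == '0' else '0'
                                       for a in word) or '0', 2)
        else:                                  # C_II = 1^n 0 (n plain bits)
            n = 0
            while s[pos + n] == '1':
                n += 1
            word = s[pos + n + 1: pos + 2 * n + 1]
            b = (1 << n) + int(word or '0', 2)
        pds.append(b)
        pos += 2 * n + 1
        odd = not odd
    return pds

def convergent(pds):                           # Perron's recurrence
    p, pp, q, qq = 0, 1, 1, 0
    for b in pds:
        p, pp = b * p + pp, p
        q, qq = b * q + qq, q
    return Fraction(p, q)

def binary_cfe(sbits):
    # binary search in code space; returns (v, r) with r matching sbits
    v, r = '', convergent(decode_pds(''))      # root of the V_10 tree: r = 1/2
    for k in range(1, len(sbits) + 1):
        sk = int(sbits[:k], 2)                 # k-bit prefix of s, as integer
        while math.floor(r * 2 ** k) != sk:    # discrepancy d_k = 1
            v += '0' if sk < math.floor(r * 2 ** k) else '1'   # case (-)/(+)
            r = convergent(decode_pds(v))
    return v, r

v, r = binary_cfe('10111100' + '1100' * 8)     # 40 bits of 1011(1100)^omega
assert r == Fraction(59, 80)
assert (r.denominator - 1).bit_length() == 7   # R = ceil(log2 80) = 7
```

Running it on the first 40 bits indeed ends with r = 59/80, so no further discrepancy can ever occur for this input.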
5 Rational complexity of a binary sequence
Before defining rational complexities, a comment is in order: The reals \(\mathbb {R}_{1}\), and B^{ω} as their dyadic representations, are not topologically equivalent. Rather, B^{ω} is isomorphic to the Cantor set (see Vepstas [18]). As a consequence, a dyadic fraction \(r \in \mathbb {D}_{1}\subset \mathbb {Q}_{1}\) corresponds to the two sequences w10^{ω} and w01^{ω} for some w ∈ B^{∗}. We set the complexity for w01^{ω} to at most |w| + 2, including the first 1 of the “stuck-at” part in the length.
For 2^{−k} = 0^{k− 1}10^{ω} = 0^{k}1^{ω}, this leads to R = k and k + 1, respectively. Apparently, for the latter, a k-bit register would be initialized with 0^{k} and could not generate 0^{k}1^{ω}.
We now define the rational complexity of a sequence in analogy to L and A.
Definition 5
Rational Complexity R(s, n)

(i)
Let R(s,0) := 0,∀s ∈ B^{∗}∪ B^{ω}. For all-zero sequences, let \(R(0^{\omega },n) := 0, n\in \mathbb {N}\) and R(0^{k}, n) := 0,1 ≤ n ≤ k.

(ii)
For sequences s ∈ B^{∗}∖({0}^{∗}∪ B^{∗}11), i.e. neither all-zero nor ending in 11, and for s ∈ B^{ω}∖({0^{ω}}∪ B^{∗}1^{ω}), the rational complexity of s is
$$R(s,n) := \lceil \log_2(Q_i)\rceil,$$where P_{i}/Q_{i} = r is the first convergent of the Binary CFE matching r_{k} = s_{k},1 ≤ k ≤ n. In view of Section 12, R(s, n) is the length of an F\(\mathbb {Q}\)SR producing the nbit prefix of s.

(iii)
For sequences s = v01^{k}, k ≥ 2 or s = v01^{ω} with some v ∈ B^{∗}, let \(R(s,n) = \min \limits (|v|+2, \lceil \log _{2}(Q_{i})\rceil )\), |v| + 2 ≤ n ≤ |s|, with Q_{i} as in (ii). Also, \(R(1^{\omega },n) := 1,\forall n\in \mathbb {N}\).

(iv)
Let the rational complexity profile for a sequence s ∈ B^{∗} be (R(s, n), 0 ≤ n ≤ |s|). For an s ∈ B^{ω}, the profile is \((R(s,n), n\in \mathbb {N}_{0})\).

(v)
For ultimately periodic sequences s = x(y)^{ω} with x ∈ B^{∗} and y ∈ B^{+}, let \(R(s) := \lim _{n\to \infty }R(s,n)\). Then R(s) is the length of the shortest F\(\mathbb {Q}\)SR producing s.

(vi)
The number of sequences of length n with rational complexity \(c\in \mathbb {N}_{0}\) is denoted by N_{R}(n, c), and we set \(N_{R}(c) := \lim _{n\to \infty } N_{R}(n,c)\).
Let \(E(R(n)) := 2^{-n}\cdot {\sum }_{s\in B^{n}} R(s,n) = 2^{-n}\cdot {\sum }_{k=0}^{n} k\cdot N_{R}(n,k)\) be the average rational complexity for words of length n, and Var(R(n)) the corresponding variance. The distribution of rational complexities for words of length n ≤ 12, as well as E = E(R(n)) and Var = Var(R(n)) for n ≤ 24, are given in Table 5.
Theorem 6
Rational Complexity: Counts
The number N_{R}(c) is N_{R}(0) = 1 (for s = 0^{ω}), N_{R}(1) = 2 (for s = 10^{ω},1^{ω}), and for c ≥ 2
$$N_R(c) = 2^{c-2} + \sum\limits_{Q=2^{c-1}+1}^{2^{c}} \varphi(Q),$$
where φ is Euler’s totient function.
Proof
The sum is the number of fractions P/Q with 2^{c− 1} < Q ≤ 2^{c},1 ≤ P < Q and P, Q coprime. The additional 2^{c− 2} sequences correspond to the special cases v01^{ω} with v ∈ B^{c− 2}, the set of words over B of length c − 2, for dyadic fractions (Def. 5 (iii)). □
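The count for small c can be cross-checked by brute force (the closed form below is reconstructed from this counting argument):

```python
from math import gcd

def phi(n):                                # Euler's totient, naively
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def n_r(c):
    # N_R(c) = 2^{c-2} + sum of phi(Q) over 2^{c-1} < Q <= 2^c, for c >= 2
    return 2 ** (c - 2) + sum(phi(Q)
                              for Q in range(2 ** (c - 1) + 1, 2 ** c + 1))

for c in range(2, 8):
    # fractions P/Q with 2^{c-1} < Q <= 2^c, 1 <= P < Q, P and Q coprime
    coprime = sum(1 for Q in range(2 ** (c - 1) + 1, 2 ** c + 1)
                  for P in range(1, Q) if gcd(P, Q) == 1)
    assert n_r(c) == 2 ** (c - 2) + coprime

assert n_r(2) == 1 + phi(3) + phi(4)       # = 1 + 2 + 2 = 5
```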
Conjecture 7
Rational Complexity: Average and Variance
The following limits exist and yield four small constants \({\nu _{E}^{0}},{\nu _{E}^{1}},{\nu _{V}^{0}},{\nu _{V}^{1}}\):
The approximation of each PD may be alternating from above and below. Thus the rational complexity is not necessarily monotonic (the sequence of encodings v10^{ω} though has monotonically increasing support length v + 1). However, the error in monotonicity is at most one:
Proposition 8
For all sequences s ∈ B^{ω} and all \(k_{1} < n < k_{2}\in \mathbb {N}\), we have
Proof
A reduction of the absolute value of Q, and thereby possibly of R(s, n), may only occur when the current PD b is decreased.
The PDs first increase through the values \(2^{k}, k=1,2,3,\dots \) with either C_{I}(2^{k} − 1) = 0^{k− 1}10^{k− 1}, C_{II}(1) = 0 or C_{II}(2^{k-1}) = 1^{k− 1}0^{k} as last encodings before C_{I}(ℵ_{0}) = 0^{ω}, see Table 3 (also Section 11, Table 11). They then settle within the interval [2^{n},2^{n+ 1}] for some n — only this may decrease R.
The claim follows with \(Q+Q^{\prime }\leq Q+2^{n}Q^{\prime } \leq Q+bQ^{\prime } \leq Q+2^{n+1}Q^{\prime } < 2(Q+2^{n} Q^{\prime })\). □
Remark 9
Being monotonic with growing prefix length is of course a desirable property of a complexity measure. We therefore define a monotonic version of R in the next definition. Note, however, that this approach is no longer the Binary CFE with incremental bitwise consideration of both the sequence and the encoding.
Definition 10
Monotonic Rational Complexity \(\overline R\)
Let \(\overline R(s,k) := R(s,k)\) for all sequences s and positions k ≤s, with two exceptions (following the algorithm from Table 4, left):

(i)
If R(s, k) is the result of the WHILE loop with Case ⊖ in an encoding C_{I} or a case (C_{II},⊕), then R(s, k) was computed from an upper bound for b. In this case, advance the encoding (append to v) until a situation (C_{I},⊕) or (C_{II},⊖), a lower bound for b, occurs within the current encoding, or until reaching the encoding of the next PD, irrespective of ⊕,⊖. If hereby the rational complexity decreases to R(s, k) − 1, take this value for \(\overline R(s,k) := R(s,k)-1\), otherwise, keep \(\overline R(s,k) = R(s,k)\).

(ii)
If, in a Case (√), we previously had \(\overline R(s,k-1) = R(s,k-1) - 1\), keep that value, i.e. \(\overline R(s,k) := \overline R(s,k-1) = R(s,k-1)-1 = R(s,k)-1\).
6 Example π
Example 11
The well-known CFE of \(\pi \doteq \texttt {0x3.243F6A8885A308D}\) is \([3;7,15,1,292,1,\dots ]\). The Binary CFE Algorithm proceeds on π − 3 as in Table 6. We omit the final code 0^{ω} and PD ℵ_{0}. \(\texttt {S}\in \{\texttt {A},\dots , \overline {\texttt {D}}\}\) is the state of the FSM+C (Section 11). \(B^{\omega }\ni r\equiv p/q \in [0,1)\subset \mathbb {R}\) is given as prefix in hexadecimal. k gives the first place with s_{k}≠r_{k}. ⊖ indicates s < r, while ⊕ indicates s > r. R = R(k − 1); values (R) appear inside the WHILE loop.
Given that the sequence π − 3 = .243F6A88\(\dots \) fails to meet the suggested approximations in places \(s_{1},s_{2},s_{6},s_{9},s_{22-24}, s_{26-30}, s_{32}, s_{35},\dots \), we obtain the discrepancy sequence D(0010.0100.0011.1111.0110.1010.1000.1000) = 1100.0100.1000.0000.0000.0111.0111.1101 = (d_{k} ) = d. The long 0-run between d_{9} and d_{22} is of course the result of the large PD 292 and the excellent approximation 16/113 (cf. 355/113 for π). Also, the encoding C(s) = v (compare with the ⊖ / ⊕ sequence) is \(C(s) = 00100.1110111.1.11111111000100100.1\dots \).
7 Connections between rational and 2-adic complexities, and impulse response sequences
In the context of linear complexity, a periodic sequence \(s = (s_{1},s_{2},\dots ,s_{T})^{\omega }\) of period T has linear complexity \(T - \deg (\gcd (x^{T}-1, S))\), where \(S := \sum s_{i}x^{i-1}\in \mathbb {F}_{2}[x]\) assembles the digits of the period. For rational complexity, we similarly have:
Proposition 12 (Rational complexity of periodic sequences)
Let \(s=(s_{1},\dots ,s_{T})^{\omega }\) be a periodic sequence. Let \(S := {\sum }_{i=1}^{T} s_{i}2^{Ti}\in \mathbb {N}\), and let g := gcd(2^{T} − 1,S). Then s has rational complexity \(R(s) = T  \lfloor \log _{2}(g)\rfloor \).
Proof
s can be generated as P/Q with P := S and Q := 2^{T} − 1. Set \(\tilde P := S/g\) and \(\tilde Q :=(2^{T}-1)/g\). Then still \( s = \tilde P / \tilde Q = (S/g) / ((2^{T}-1)/g) = S / (2^{T}-1)\), which shows the upper bound \(R(s) \leq \lceil \log _{2}((2^{T}-1)/g)\rceil \). As \(\mathbb {Z}\) is a unique factorization domain, there is no simpler way to represent \(\tilde P\) and \(\tilde Q\). Since \(2\nmid g\), \(\log _{2}(g)\) is not integral for g > 1, and thus \(R(s) = \lceil \log _{2}((2^{T}-1)/g)\rceil = T - \lfloor \log _{2}(g)\rfloor \). □
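Proposition 12 agrees with the reduced fraction S/(2^T − 1) for every short period; a brute-force sketch (function names are ours; the all-zero and all-one periods are the special cases of Definition 5 and are skipped):

```python
from fractions import Fraction
from math import gcd

def r_formula(period):
    # Proposition 12: R((period)^omega) = T - floor(log2 gcd(2^T - 1, S))
    T, S = len(period), int(period, 2)
    g = gcd(2 ** T - 1, S)
    return T - (g.bit_length() - 1)        # bit_length - 1 = floor(log2 g)

def r_reduced(period):
    # via the reduced denominator: R = ceil(log2 Q') with Q' = (2^T - 1)/g
    q = Fraction(int(period, 2), 2 ** len(period) - 1).denominator
    return (q - 1).bit_length()            # = ceil(log2 q)

for T in range(2, 8):
    for S in range(1, 2 ** T - 1):         # skip 0^T and 1^T
        period = format(S, '0%db' % T)
        assert r_formula(period) == r_reduced(period)

assert r_formula('1100') == 3              # (1100)^omega = 12/15 = 4/5, R = 3
```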
In the following, \(\hat s\in \mathbb {Q}\subset \mathbb {Z}_{2}\) refers to the 2-adic value of the ultimately periodic sequence s ∈ B^{ω}. Note that \(\hat s\neq s \in \mathbb {Q}\subset \mathbb {R}\) (reuse of the symbol s in \(\mathbb {R}_{1}\) and B^{ω}) in general.
Theorem 13
Connection between Rational and 2-adic Complexity

(i)
Let s = y^{ω} be a purely periodic sequence. Then R(s) equals the 2adic complexity of the sequence with reversed period, \(R(y^{\omega }) = A((\stackrel {\leftarrow }{y})^{\omega })\).
The symmetric 2-adic complexity [4], \(A_{\text {\scriptsize {\sc sym}}}(y^{\omega }) := \min \limits (A(\stackrel {}{y}^{\omega }), A((\stackrel {\leftarrow }{y})^{\omega }))\), is equal to \(\min \limits (A(\stackrel { }{y}^{\omega }), R(\stackrel { }{y}^{\omega })) =\min \limits (R(\stackrel { }{y}^{\omega }), R((\stackrel {\leftarrow }{y})^{\omega }))\).

(ii)
Let s = xy^{ω}, x, y ∈ B^{+}, be an ultimately periodic sequence. Then R(s) and the 2-adic complexity \(A((\stackrel {\leftarrow }{x})(\stackrel {\leftarrow }{y})^{\omega })\) are both obtained from the numbers \((x)_{2},(y)_{2},2^{|x|},2^{|y|}-1\).
Proof

(i)
For purely periodic sequences s = y^{ω}, let \( \stackrel {\leftarrow }{s}:= (\stackrel {\leftarrow }{y})^{\omega }\). Then we have
$$ s = \frac{(y)_2}{2^{|y|}-1}\in\mathbb{R},\quad \hat s = -\frac{(\stackrel{\leftarrow}{y})_2}{2^{|y|}-1}\in\mathbb{Z}_2,\quad \stackrel{\leftarrow}{s} = \frac{(\stackrel{\leftarrow}{y})_2}{2^{|y|}-1}\in\mathbb{R},\quad \widehat{\stackrel{\leftarrow}{s}} = -\frac{(y)_2}{2^{|y|}-1}\in\mathbb{Z}_2 .$$In this case, the reduced fractions \(s=p/q, \stackrel {\leftarrow }{s}=m/n\in \mathbb {R}\), \(\hat s = -m/n,\widehat {\stackrel {\leftarrow }{s}} = -p/q\in \mathbb {Z}_{2}\) satisfy 1 ≤ p < q and 1 ≤ m < n, q, n odd, see also [7, Cor. 2.1]. Hence, the rational complexity—of purely periodic sequences—can indeed be computed via the FCSR apparatus from 2-adic complexity:
\(R(y^{\omega }) = A((\stackrel {\leftarrow }{y})^{\omega }) = \lceil \log _{2}(q)\rceil = 1+\lfloor \log _{2}(q)\rfloor \), if T > 1. The complexities also coincide for the special cases R(0^{ω}) = A(0^{ω}) = 0 and R(1^{ω}) = A(1^{ω}) = 1 (from \(-1\in \mathbb {Z}_{2}\)).
The claim about the symmetric 2adic complexity follows immediately.

(ii)
For ultimately periodic sequences s = xy^{ω} with \(s^{\prime } := \stackrel {\leftarrow }{x}(\stackrel {\leftarrow }{y})^{\omega }\), we similarly have
$$ s = 2^{-|x|}\cdot\left( (x)_2 + \frac{(y)_2}{2^{|y|}-1}\right)\in\mathbb{R},\ \hat s = (\stackrel{\leftarrow}{x})_2 - 2^{|x|}\cdot\frac{(\stackrel{\leftarrow}{y})_2}{2^{|y|}-1},\ \widehat{s}^{\prime} = (x)_2 - 2^{|x|}\cdot\frac{(y)_2}{2^{|y|}-1}\in\mathbb{Z}_2, $$where the first term of the sum corresponds to the preperiod, the second to the period, in all formulae. s and \(\widehat {s}^{\prime }\) are built up from the same 4 parts, thus so are R(s) and \(A(s^{\prime })\), but (of course) s and \(\widehat {s^{\prime }}\) are not necessarily equal, neither are R(s) and \(A(s^{\prime })\).
□
Example 14
Let s = 1011(1100)^{ω} and t = 1101(0011)^{ω}, with \((1011)_{2}=11, (\overleftarrow {1011})_{2} = (1101)_{2} = 13, (1100)_{2}=12, (\overleftarrow {1100})_{2} = (0011)_{2} = 3\).
Then \(s = \frac {11}{16} + \frac {1}{16}\cdot \frac {12}{15} = \frac {177}{240} = \frac {59}{80}\in \mathbb {R}\) and \(\hat s = 13 - 16 \cdot \frac {3}{15} = \frac {195-48}{15} = \frac {147}{15} = \frac {49}{5}\in \mathbb {Z}_{2}\).
We have \(R(s) = \lceil \log _{2}(80)\rceil = 7\), and \(A(s) = 1+\lfloor \log _{2}(49)\rfloor = 6\).
Also, \(t = \frac {13}{16} + \frac {1}{16}\cdot \frac {3}{15} = \frac {195+3}{240} = \frac {33}{40}\in \mathbb {R}\) and \(\hat t = 11 - 16 \cdot \frac {12}{15} = \frac {165-192}{15} = -\frac {27}{15} = -\frac {9}{5}\in \mathbb {Z}_{2}\).
We have \(R(t) = \lceil \log _{2}(40)\rceil = 6\), and \(A(t) = 1+\lfloor \log _{2}(9)\rfloor = 4\).
R(s) and A(t) differ, although both are obtained using the numbers 11, 12, 2^{4}, 2^{4} − 1, while R(t) = A(s), both using the numbers 13, 3, 2^{4}, 2^{4} − 1. The equality in the latter case is pure coincidence: R(t) stems from the denominator, A(s) from the numerator.
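The values in Example 14 can be reproduced with exact arithmetic; the following sketch (helper names are ours) implements the two formulae for ultimately periodic sequences:

```python
from fractions import Fraction
from math import ceil, log2

def real_value(x: str, y: str) -> Fraction:
    """s = 2^{-|x|} * ((x)_2 + (y)_2 / (2^{|y|} - 1)) for s = x y^omega in [0,1)."""
    return (int(x, 2) + Fraction(int(y, 2), 2 ** len(y) - 1)) / 2 ** len(x)

def two_adic_value(x: str, y: str) -> Fraction:
    """hat(s) = (x reversed)_2 - 2^{|x|} * (y reversed)_2 / (2^{|y|} - 1) in Z_2."""
    return int(x[::-1], 2) - 2 ** len(x) * Fraction(int(y[::-1], 2), 2 ** len(y) - 1)

s, s_hat = real_value("1011", "1100"), two_adic_value("1011", "1100")
# s = 59/80 gives R(s) = ceil(log2(80)) = 7; s_hat = 49/5 gives A(s) = 6
```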
Conjecture 15
Simulating R by A in O(n^{4})
There is an O(n^{4}) algorithm to compute the rational complexity without using the Binary CFE, applying the apparatus from 2-adic complexity—and vice versa.
Proof
(Idea:) We can assume the separation into x and y after any position 0 ≤ k ≤ n within \(s_{1}{\dots } s_{n}\), and then apply Theorem 13(ii) to each case. This incurs another factor n + 1 in complexity. We thus obtain n + 1 rational approximations to s that are all valid on the given nbit prefix. Among these approximations, the one with smallest Q defines the rational complexity.
Let the current approximation to s = v... be \(s \doteq (va{\dots } z)^{\omega }\).
We efficiently obtain z from v by first generating an FCSR for v, and then running the FCSR backwards. Thus, we need O(|v|) steps instead of the full period, which might be considerably larger.
Incrementally computing the best rational approximation to a sequence \(s = v\overline {a}\dots \) from the approximation to the shorter prefix \(s = v\dots \) via 2-adic complexity / FCSRs is not straightforward:
If the next bit after v in s happens to be a, the discrepancy is zero and the approximation is maintained. If, however, s continues with \(\overline {a} = 1-a\), then \(s = v\overline {a}\dots \) will have a best approximation \(s \doteq v_{1}(v_{2}\overline {a}{\dots } \tilde z)^{\omega }\), where firstly v possibly will be split into a preperiod v_{1}≠ε and the start v_{2} of the new period, and secondly the end of the new period, \(\tilde z\), has no direct relation with z. Also, the period length may grow or shrink considerably.
If the best approximation to v was already ultimately periodic with preperiod, things are similar. The length of the preperiod may change: \(s=uv\dots \doteq uv^{\omega }\) may lead to \(s=uvuv{\dots } \doteq (uv)^{\omega }\) for one extension, and to \(s=uv10^{|uv|+1}\dots \doteq uv10^{\omega }\) in another case. In other words, the new preperiod may be empty or include the whole previous prefix, or anything in between.
Since the new periods terminate in different words, \(\tilde z\neq z\), for each successive approximation— unless the previous value remains the same—we have to restart the FCSR from time step 1, a further factor n. Hence, we get a new approximation for a given separation position in O(n^{3}). All in all, we can simulate rational approximation by 2adic approximation in O(n^{4}).
The four factors n correspond to (i) the operand lengths, (ii) the assumed length of the preperiod, i.e. the position of the separation, (iii) the restart time for the FCSRs with the reversed end of the new period, \(\stackrel {\leftarrow }{\tilde z}\), and (iv) the growing prefix size of s.
As far as we can see, speedup via FFT techniques is not feasible due to (iii), since every two steps on average (for i.i.d. sequences) we have to restart the FCSRs with the new value \(\stackrel {\leftarrow }{\tilde z}\) instead of the previous \(\stackrel {\leftarrow }{z}\) (Open Problem 7). □
Theorem 16
Periods of impulse response sequences

(i)
Given an odd \(q\in \mathbb {N}\), let its impulse response sequence s ∈ B^{ω} be the binary expansion of 1/q. Then the period of s divides φ(q).

(ii)
The period of a/q is at most that of the impulse response sequence 1/q, for any a.

(iii)
For q = 2^{n} − 1, the period of the associated sequence 1/q divides n. Hence, unlike the LFSR case, there are no sequences with period 2^{n} − 1 and rational complexity n.

(iv)
Let p be a prime and 2 a generator of the multiplicative group \(\mathbb {F}_{p}^{*}\). Then a/p has period p − 1 for all a≠ 0.

(v)
For \(n\in \mathbb {N}\), N := 2^{n}, we can expect a prime p with \(\langle 2 \rangle = \mathbb {F}_{p}^{*}\) within the range [N − 2n, N − 1] (where N − 1 is excluded by (iii)).
Proof
Let ord_{q}(g) be the order of the (multiplicative) cyclic subgroup \(\langle g\rangle \subset (\mathbb {Z}/q\mathbb {Z})^{*}\). For prime q = p we also write \(\langle g\rangle \subset \mathbb {F}_{p}^{*}\). Our case is g = 2 for binary shifts.

(i)
The residues mod q of 1/q are \(\{2^{k} \bmod q \mid k=1,2,\dots ,\operatorname {ord}_{q}(2)\} = \langle 2\rangle \subset (\mathbb {Z}/q\mathbb {Z})^{*}\), the group of units mod q. Since \(|(\mathbb {Z}/q\mathbb {Z})^{*}| = \varphi (q)\), the claim now follows from Lagrange’s (subgroup index) theorem.

(ii)
The residues a ⋅ 2^{k} (mod q) for a/q differ from those for 1/q by a factor a. In particular they repeat after the same period, possibly having a shorter least period.

(iii)
\(2^{n}\equiv 1 \ (\text {mod}~ q)\Longleftrightarrow \operatorname {ord}_{q}(2) \mid n\).

(iv)
Since 2 generates \(\mathbb {F}_{p}^{*}\), i.e. 2^{k} passes through all nonzero residues mod p, the first p − 1 shifts of 1/p pass through all a/p with a≠ 0. Thus they all have the period p − 1.

(v)
By the Prime Number Theorem, the probability of being prime for some p near N is \(1/\log _{e}(N)\doteq 1/ (0.69\cdot n)\). Assuming Artin’s conjecture (on primitive roots) [12] for prime p, 2 generates \(\mathbb {F}_{p}^{*}\) with probability \(C_{Artin} ={\prod }_{p} \left (1-\frac {1}{p(p-1)}\right ) \doteq 0.373955\). The joint probability C_{Artin} ⋅ 1/(0.69 ⋅ n) > 1/(2n) yields the result.
□
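Parts (i) to (iii) are easy to check numerically for small q; a short sketch (helper names are ours):

```python
from math import gcd

def mult_order(g: int, q: int) -> int:
    """Multiplicative order of g modulo q (gcd(g, q) = 1); for g = 2 this is
    the least period of the binary expansion of 1/q."""
    k, x = 1, g % q
    while x != 1:
        x, k = x * g % q, k + 1
    return k

def phi(q: int) -> int:
    """Euler's totient, the order of (Z/qZ)^*."""
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)

# ord_15(2) = 4 divides n = 4 for q = 2^4 - 1; ord_11(2) = 10 = phi(11),
# so 2 generates F_11^* and every a/11 with a != 0 has period 10
```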
8 Comparison of R with A and L
Rational complexity R, linear complexity L, and 2-adic complexity A all assign a finite complexity to ultimately periodic sequences. However, the three complexities assign low values to quite different sequences / prefixes [20].
In Table 7, we give the number of 30bit prefixes which have an unremarkable value in one of the complexities (L ≥ 14, A ≥ 14, and R ≥ 15, respectively), being higher than or at most one smaller than the expected value. We then classify these prefixes according to the remaining two complexities into 3 classes: expected value or higher (third column/row), off by at most \(5\approx \log _{2}(30)\) (middle), and the definitely atypical case, off from the expected value by more than a logarithmic difference (first column/row).
In particular, we give in boldface the numbers of sequences which are only recognized as low-complexity by rational complexity, but by neither linear nor 2-adic complexity.
Over the set of prefixes of length 30, the three complexities L, A, and R are essentially uncorrelated, both in pairs and all three combined. Very low (0 or 1) and very high (30 = n) complexities correspond to the same sequences for L, A, and R, namely \(0^{30}\) (complexity 0), \(1^{30}\) and \(10^{29}\) (complexity 1), and \(0^{29}1\) (complexity 30). For most sequence prefixes, however, the three complexities behave in a statistically unrelated manner. We examine prefixes of length n = 30 in the following lemma.
Lemma 17
Low Correlation Lemma
Let #(L = l, A = a) = |{s ∈ B^{30}: L(s,30) = l ∧ A(s,30) = a}| etc.
Define the “fraction of non-overlapping of distributions” as
Then \( \rho _{LAR}\doteq 0.00938, \rho _{LA} \doteq 0.00494, \rho _{AR} \doteq 0.00720, \rho _{LR} \doteq 0.00528. \)
Proof
By numerical evaluation. More than 99% of all sequence prefixes behave according to the distribution expected for completely uncorrelated complexity measures, for all 4 correlations. □
9 Deviation from isometric behaviour
For the linear complexity, the encoding of PDs and the induced isometry, the discrepancy sequence, coincide. Both are given by K(s). In our case, the encoding C(s) and the discrepancy operator D(s), like K an isometry on B^{ω}, are different. This section investigates by how much.
Lemma 18
Information Content of Prefixes from s, d = D(s), and C(s)
Let \(s\in \mathbb {R}_{1}\) satisfy the Gauß-Kuz’min measure μ_{GK}. For a given prefix length \(n\in \mathbb {N}\), let \(r= P_{2i}/Q_{2i} = [b_{1},\dots ,b_{2i}]\) be the first approximation that matches the first n bits of s, i.e. r_{k} = s_{k},1 ≤ k ≤ n.
Then asymptotically, encoding length and precision satisfy \(\lim _{n\to \infty } \frac {{\sum }_{j=1}^{2i}l_{\text {I,II}}(b_{j})}{n} \doteq 1.024\).
Proof
From \({\sum }_{b\in \mathbb {N}}l_{\text {I,II}}(b)\cdot \mu _{GK}(b) \doteq 3.50698\), Theorem 1(iii), and \(\frac {3.50698}{3.42371}\) \(\doteq 1.024\). □
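The two constants of Lemma 18 can be re-derived numerically. A sketch assuming the Gauß-Kuz’min probabilities \(\mu_{GK}(b) = \log_2(1+\frac{1}{b(b+2)})\) and the code lengths \(l_{\text{I,II}}(b) = 1+2\lfloor\log_2(b)\rfloor\) quoted later in this section (helper names are ours):

```python
from math import floor, log, log2, pi

def l_I_II(b: int) -> int:            # code length of PD value b in C_I / C_II
    return 1 + 2 * floor(log2(b))

def mu_GK(b: int) -> float:           # Gauss-Kuz'min probability of PD value b
    return log2(1 + 1 / (b * (b + 2)))

code_bits = sum(l_I_II(b) * mu_GK(b) for b in range(1, 10**6))  # ~ 3.50698
precision_bits = pi**2 / (6 * log(2)**2)                        # ~ 3.42371
ratio = code_bits / precision_bits                              # ~ 1.024
```

The closed form pi^2/(6 ln(2)^2) is the classical Lévy/Khinchin value of 3.42371 bits of precision per partial denominator; the quotient reproduces the 2.4% overhead of the lemma.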
Lemma 18 shows that for numbers satisfying the Gauß-Kuz’min measure, the code length C(s) grows faster than the considered prefix length of the sequences s and d = D(s) by a factor of (only) 1.024. One could say that we miss our goal to mimic the isometric behaviour of K by just 2.4% (on average).
For numbers not satisfying the Gauß-Kuz’min measure, e.g. rationals or quadratic algebraic numbers, the code may grow faster or slower. We compare the coding length l for \({\Phi }_{b} := [b,b,b,b,\dots ] = (\sqrt {b^{2}+4}-b)/2\) with its gain in precision. In this case, the convergents’ denominators (and numerators) asymptotically increase by a factor \(\phi _{b} := (b+\sqrt {b^{2}+4})/2\) (= b + Φ_{b}), from \( \begin {array}{ccc} &&b_{i}=b\\ 1&\phi _{b}&b\cdot \phi _{b}+1 \end {array}\) (Perron’s schema) with \( b\cdot \phi _{b}+1\stackrel {!}{=}{\phi _{b}^{2}}\Longleftrightarrow {\phi _{b}^{2}}-b\phi _{b}-1=0\).
As \(|r - P_{i}/Q_{i}|\) decreases like \({\Theta }(Q_{i}^{-2}) = {\Theta }(\phi _{b}^{-2i})\) (since \(r\not \in \mathbb {Q}\)), see [14, Satz 2.10], each additional PD b asymptotically yields \(L := \log _{2}({\phi _{b}^{2}})\) more bits with r_{k} = s_{k}.
Since b is encoded by \(l = l_{\text {I,II}}(b) = 1+2\lfloor \log _{2}(b)\rfloor \) bits, we obtain the length extension for the encoding as \(\gamma _{b} := l/L = \frac{1+2\lfloor \log _{2}(b)\rfloor}{\log _{2}({\phi _{b}^{2}})}\).
The values γ_{b} decrease monotonically with b, from b = 2^{t} to b = 2^{t+1} − 1 with fixed l = 2t + 1, then jump to a local maximum at 2^{t+1}, etc. From Table 8, we can infer that γ_{1} = 0.72 is the minimum, a saving of 28% in coding size: 1000 bits of precision require only 720 bits of C(s). In contrast, γ_{4} = 1.20 is the maximum value, a length extension of 20%: the 1000 bits \(s_{1},{\dots } s_{1000}\) require about 1200 bits of C(s). Hence, ϕ_{1} has the shortest, ϕ_{4} the longest code, compared with the attained precision. Comparing with Lemma 18, the value 1.024 has to be replaced by the range [0.72, 1.20] to cover all \(s\in \mathbb {R}_{1}\backslash \mathbb {Q}_{1}\).
For \(s\in \mathbb {Q}_{1}\), we have l/L → 0 asymptotically, a finite CFE yields infinite precision.
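Taking γ_b to be the quotient of code length l = l_{I,II}(b) and precision gain L = log2(ϕ_b²) per partial denominator (our reading of the construction above), the boundary values of Table 8 are quick to confirm:

```python
from math import floor, log2, sqrt

def gamma(b: int) -> float:
    """Length extension for Phi_b = [b, b, b, ...]: each PD costs
    1 + 2*floor(log2(b)) code bits and yields log2(phi_b^2) matched bits,
    with phi_b = (b + sqrt(b^2 + 4))/2."""
    phi_b = (b + sqrt(b * b + 4)) / 2
    return (1 + 2 * floor(log2(b))) / (2 * log2(phi_b))

# gamma(1) ~ 0.72 (golden ratio: shortest code), gamma(4) ~ 1.20 (longest)
```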
10 The Q Tree: an ultrametric ordering of \(\mathbb {Q}_{1} = \mathbb {Q}\cap [0,1)\)
In this section, we generate the Q Tree (Table 9), an isometric version of the V_{10} Tree. Its upper k levels, plus the allzero word, include all prefixes from B^{k} exactly once. We show how to use the Q Tree to accelerate the Binary CFE.
First define an infinite binary tree with labels, the Expanded Q Tree: We read out the V_{10} Tree [22] in breadthfirst order, prepending 0/1, and put each value P/Q as label on every node whose address is a prefix of s = P/Q and that has not yet been labeled.
Example: The label s = 0/1 from 0^{ω} is put on nodes ε (the root), 0, 00, 000 etc. Then s = 1/2 = 10^{ω} labels 1, 10, 100, etc., while ε is already taken. s = 1/4 = 010^{ω} skips ε, 0 and labels from 01 onward. s = 2/3 = (10)^{ω} skips ε, 1, 10 and labels 101, 1010 and so on.
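The label rule ("the node address is a prefix of the label's binary expansion") is easy to test with exact arithmetic (a sketch; the helper name is ours):

```python
from fractions import Fraction

def binary_prefix(s: Fraction, k: int) -> str:
    """First k bits s_1 ... s_k of the binary expansion of s in [0, 1)."""
    bits = []
    for _ in range(k):
        s *= 2
        bits.append("1" if s >= 1 else "0")
        s -= int(s)
    return "".join(bits)

# the nodes labelled in the example really carry matching prefixes:
# 1/2 = 10^omega labels 1, 10, 100; 2/3 = (10)^omega labels 101, 1010
```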
Theorem 19
Nodes and labels of the Expanded Q Tree

(i)
The labels P/Q = s of the 2^{k} nodes on level k yield each prefix v ∈ B^{k} exactly once.

(ii)
Exactly one of the two child nodes has the same label as the parent node.
Proof

(i)
By construction: Any prefix v ∈ B^{∗} is met just at the node with address v.

(ii)
Once a node receives a label s, the same s moves further down the addresses and labels a (yet unlabeled) child node of each visited node.
□
To generate the Q Tree from the Expanded Q Tree, for each node replace its label by the label of that one of its two child nodes which differs (by Theorem 19(ii), exactly one does). This in particular removes all labels 0/1.
Theorem 20
Equidistribution of Prefixes in the Q Tree

(i)
The binary expansions of the labels of the 2^{k} − 1 nodes on levels \(1{\dots } k1\) of the Q Tree, plus the special value 0^{k}, include every prefix from B^{k} exactly once.

(ii)
In the Q Tree, each word v ∈ B^{k}∖{0^{k}} appears as prefix of the binary expansion of the label P/Q, at some point on the path from the root to the node with address \(v_{1}{\dots } v_{|v|-1}\).
Proof

(i)
By construction and Theorem 19. All entries on level k from the Expanded Q Tree are preserved somewhere further up—except 0^{ω}.

(ii)
The prefix was present at address v in the Expanded Q Tree and has moved up by construction. However, all such places where it can move are just those on the path from the root to the parent of the node with address v, which has address \(v_{1}{\dots } v_{|v|-1}\).
□
Corollary 21
Writing out the Q Tree, preceded by the value 0, yields an ultrametric ordering of \(\mathbb {Q}_{1}\) (where the first 2^{k} entries of the sequence include all prefixes of length k exactly once, for all \(k\in \mathbb {N}):\) \(0/1,1/2,1/4,4/5,1/8,2/5\dots \).
The procedure of Table 4 (left part) is simplified considerably if the Q Tree is known, see Table 10. Note that now the WHILE loop runs through r, s, and d. The original WHILE loop has been replaced by the single access to Q[v], whenever the current approximation no longer holds, and at the exact address v for the new approximation P/Q. However, the Q Tree is as yet only a byproduct after setting up the V_{10} Tree, see Open Problem 6.
11 Binary CFE III: states, incremental change of b _{i}, P _{i}, Q _{i}
Recalling Section 4, we start with the full cylinder set B^{ω} or \(\mathbb {R}_{1}\) as possible encodings. Using the known correct prefix v (initially we know nothing, v = ε), we successively cut the cylinder set into half by probing with the trial value \(v10^{\omega}\), i.e. \(r := C^{-1}(v10^{\omega})\), which sits at the center of the current cylinder set (we could as well have used v01^{ω}, yielding the same r, with convention (iv) for the ambiguity). This binary search leads to a new approximation.
In order to obtain an efficient implementation of this part, we maintain the state of a Finite State Machine plus Counter (FSM+C) and the current as well as an auxiliary convergent, P/Q and \(P^{\prime }/Q^{\prime }\), with initial values 1/2 and 0/1, respectively. The state set is \(\{\texttt { A...D},\overline {\texttt {A}}...\overline {\texttt {D}}\}\), the counter appears as index j for states \(\texttt {B}, \texttt {C}, \overline {\texttt {B}}, \overline {\texttt {C}}\). We start in state A.
The states A...D handle PDs with odd index 2i + 1, while the states \(\overline {\texttt {A}}...\overline {\texttt {D}}\) for PDs with even index operate exactly 0-1-inverted to A...D. Since we always maintain an even number of PDs, for an odd current index the PD value b_{2l+1} is distributed as \([\dots , b_{2l+1}-1,1]\). This is the only difference between the two groups of states, apart from the bit complement. The updates of the state and of the two convergents are shown in Tables 11 and 12 (for \(Q,Q^{\prime }\) analogously). “<<” is the left shift. The first 2i PDs are fixed and omitted.
Observe that no multiplication (as in P_{i} := b_{i} ⋅ P_{i− 1} + P_{i− 2}, Q_{i} := b_{i} ⋅ Q_{i− 1} + Q_{i− 2}) is necessary. It is replaced by an incremental shiftandadd. We thus obtain property (INC)—everything is done bitwise, incrementally, and only additions and shifts are used (the shifts by arbitrarily large j can be avoided by shifting \(P^{\prime }, Q^{\prime }\) in each state \(\texttt {B}_j, \texttt {C}_j, \overline {\texttt {B}_{j}}, \overline {\texttt {C}_{j}}\)).
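Outside the FSM framing of Tables 11 and 12 (which spreads this work over the incoming bits), the replacement of the multiplication by shift-and-add can be sketched eagerly as follows (function and parameter names are ours):

```python
def next_convergent(b: int, P_prev: int, P_prev2: int) -> int:
    """P_i = b * P_{i-1} + P_{i-2} using only shifts and additions."""
    acc, shift = P_prev2, 0
    while b:
        if b & 1:
            acc += P_prev << shift   # add P_{i-1} * 2^shift for each set bit of b
        b >>= 1
        shift += 1
    return acc

# e.g. next_convergent(5, 12, 7) == 5*12 + 7 == 67
```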
12 Feedback in \(\mathbb {Q}\) shift registers F\(\mathbb {Q}\)SRs: ReduceByFeedback
We have to obtain the rational sequence r = P/Q to be compared with s. Paper-and-pencil long division in base 2 of P by Q (here P < Q is assumed) proceeds as in Table 13 (left).
An F\(\mathbb {Q}\)SR is thus just a standard long accumulator X for values from \(\mathbb {N}_0\) with a shift-and-add unit. It requires a full length compare in each cycle. The next part shows how to replace that time-consuming compare by a 2-bit compare and a different way of keeping X (almost) within the range [0,Q).
Remark 22
In analogy with the LFSR case, we can consider FCSRs as “Fibonacci style” and F\(\mathbb {Q}\)SRs as “Galois style” implementation of feedback with carry.
If implementing in software, skip directly to Theorem 24.
An alternative implementation of division in \(\mathbb {Q}\) and thus F\(\mathbb {Q}\)SRs, by ReduceByFeedback (see [21]), is shown in Table 13, right hand side.
Let \(n := \lceil \log_2(Q)\rceil , N := 2^n\); hence N/2 < Q ≤ N. We use an (n + 2)-bit register for X, initially filled with P. Setting K ≡ N (mod Q), that is, K := N − Q, we may replace X := X − Q by X := X − N + K.
Let m denote the two leftmost bits of the F\(\mathbb {Q}\)SR, of content 0, 1, 2, or 3. We replace the decision (X ≥ Q?), a long compare, by the two-bit check (X ≥ N?), that is, (m ≥ 1?).
If necessary, for m≠ 0, we subtract m ⋅ N (by setting m := 0) and compensate by adding the constant m ⋅ K. In this way, the contents of X is always correct mod Q. It may, however, exceed Q − 1, lying in the range Q ≤ X < N.
We now show that m ≤ 2 is always guaranteed.
Proposition 23
Overflowfreeness of the range m ∈{0,1,2}
Let 0 ≤ m ≤ 2. Then after one clock cycle we still have 0 ≤ m^{+} = ⌊(2X + mK)/N⌋≤ 2.
Proof
We have K = N − Q and N/2 < Q ≤ N. Therefore, 0 ≤ K < N/2 and 0 ≤ 2X + 0 ⋅ K ≤ 2X + m ⋅ K ≤ 2X + 2 ⋅ K < 2 ⋅ N + 2 ⋅ N/2 = 3N, which assures that the range {0,1,2} for m is also satisfied in the next step. □
We want the sequence r = (r_{k}) ∈{0,1}^{ω}, but instead we obtain (m_{l}) ∈{0,1,2}^{ω}. While we have \({\sum }_{k\in \mathbb {N}}r_k2^{-k} = {\sum }_{l\in \mathbb {N}}m_l2^{-l}\) by design, we must convert (m_{l}) into (r_{k}):
Essentially, a run of ones with leading zero is saved for later and output when m = 0. With m = 2, a carry ripples through the 1-run, converting it to 10^{ctr}. We start anew with a leading zero, not yet output. The value r_{0} will always be zero and should be thrown away (since P < Q, the digit stream never starts with \(m= 1^k2\dots \)).
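The whole cycle, under one possible reading of the register update (shift, read off the two top bits m, clear them, compensate by m·K), can be simulated and checked against the exact value P/Q via the identity above:

```python
from fractions import Fraction

def reduce_by_feedback(P: int, Q: int, steps: int):
    """Digit stream (m_l) in {0,1,2} with sum_l m_l 2^{-l} -> P/Q, for P < Q.
    Each cycle removes m*N and adds m*K = m*(N - Q), i.e. subtracts m*Q."""
    n = (Q - 1).bit_length()          # n = ceil(log2(Q)) for Q >= 2
    N = 1 << n
    K = N - Q
    X, digits = P, []
    for _ in range(steps):
        X <<= 1                       # shift to the next digit position
        m = X >> n                    # the two leftmost bits, 0..2 by Prop. 23
        X = (X & (N - 1)) + m * K     # clear the top bits, compensate with m*K
        digits.append(m)
    return digits, X

digits, X = reduce_by_feedback(1, 5, 12)
# exact invariant after any number of steps:
#   P/Q = sum_l m_l 2^{-l}  +  2^{-steps} * X/Q
```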
More on the implementation of F\(\mathbb {Q}\)SRs in hardware can be found in [21], including the use of Brickell’s delayed-carry-adder technique: Paper-and-pencil long division is a subset of RSA via modular exponentiation \(\supseteq \) multiplication mod Q \(\supseteq \) shift-and-add mod Q \(\supseteq \) shift (no add) mod Q = long division.
Theorem 24
Fast Reinitialisation of F\(\mathbb {Q}\)SRs
We denote values at timestep \(k\in \mathbb {N}\) by the superscript ^{(k)}.
Consider 3 F\(\mathbb {Q}\)SRs with initial contents \(P^{(1)},P^{\prime (1)},\) and \(P^{\prime \prime (1)} := P^{(1)}+bP^{\prime (1)}\) for some \(b\in \mathbb {N}\), with feedback values \(Q, Q^{\prime }\), and \(Q^{\prime \prime } := Q+bQ^{\prime }\), and where \(N, N^{\prime },N^{\prime \prime }\) are the respective powers of 2 corresponding to the sizes of \(Q, Q^{\prime }, Q^{\prime \prime }\).
Let \(m^{(k)} := \lfloor 2\cdot P^{(k)}/Q\rfloor \in \{0,1\}\), and similarly \(m^{\prime (k)}, m^{\prime \prime (k)}\) be the respective feedback bits, and let \(P^{(k+1)} := 2P^{(k)}-m^{(k)}Q,\ \ P^{\prime (k+1)} := 2P^{\prime (k)}-m^{\prime (k)}Q^{\prime },\ \ P^{\prime \prime (k+1)} := 2P^{\prime \prime (k)}-m^{\prime \prime (k)}Q^{\prime \prime }\) be the register contents in the next timestep. Then:
(i) If \(m^{(k)} = m^{\prime (k)}\) and the weighted sum property \( P^{\prime \prime (k)} = P^{(k)} + b P^{\prime (k)}\) holds, then also \(m^{\prime \prime (k)} = m^{(k)} (= m^{\prime (k)})\) and the property remains valid for k + 1, \( P^{\prime \prime (k+1)} = P^{(k+1)} + b P^{\prime (k+1)}\).
(ii) Let r := P^{(1)}/Q^{(1)} and \(r^{\prime } := P^{\prime (1)}/Q^{\prime (1)}\) coincide for the first n bits, \(r_k=r^{\prime }_k, 1\leq k\leq n\). Then \(r^{\prime \prime } := P^{\prime \prime (1)}/Q^{\prime \prime (1)}\) also satisfies \(r^{\prime \prime }_k = r_k, 1\leq k\leq n\), and the contents of the 3 F\(\mathbb {Q}\)SRs generating \(r,r^{\prime }\), and \(r^{\prime \prime }\) satisfy the weighted sum property \(P^{\prime \prime (k)} = P^{(k)} + b\cdot P^{\prime (k)}\) for all timesteps 1 ≤ k ≤ n.
Proof

(i)
First assume \(m=m^{\prime }=0\). Then \(2P<Q, 2P^{\prime } < Q^{\prime }\) and therefore \(2P^{\prime \prime } = 2P+b\cdot 2P^{\prime } < Q+bQ^{\prime }=Q^{\prime \prime }\), hence \(m^{\prime \prime } = 0\). For \(m=m^{\prime }=1\), analogously \(2P\geq Q, 2P^{\prime } \geq Q^{\prime }\) and therefore \(2P^{\prime \prime } \geq Q+bQ^{\prime }=Q^{\prime \prime }\), hence \(m^{\prime \prime } = 1\).
Also, \( P^{\prime \prime (k+1)} = 2P^{\prime \prime (k)} - m^{\prime \prime (k)}Q^{\prime \prime } = 2(P^{(k)}+bP^{\prime (k)}) - (m^{(k)}Q+m^{\prime (k)}bQ^{\prime }) = (2P^{(k)}-m^{(k)}Q) + b(2P^{\prime (k)}-m^{\prime (k)}Q^{\prime }) = P^{(k+1)} + b\cdot P^{\prime (k+1)}\).

(ii)
By induction on k with (i), using \(r_k = m^{(k)},r^{\prime }_k = m^{\prime (k)}\), and \(r^{\prime \prime }_k = m^{\prime \prime (k)}\).
□
Remark 25
By Theorem 24, we may maintain two registers for the current and the previous convergent, P/Q and \(P^{\prime }/Q^{\prime }\). Whenever a new F\(\mathbb {Q}\)SR with P^{+}, Q^{+} is required, we set its current contents from \(P, P^{\prime }\) by the weighted sum property, and may then drop \(P^{\prime },Q^{\prime }\). We only have to repeat the (few) clock cycles starting from the last mismatch. Overall, we advance every contents P/Q, \(P^{\prime }/Q^{\prime }\), or P^{+}/Q^{+} from one mismatch to the one after the next, a total of 2n clock cycles or O(n).
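A one-example check of the weighted sum property (values are our own choice: b = 1, so P'' = P + P' and Q'' = Q + Q'; r = 1/3 and r' = 2/5 share their first two bits):

```python
def fqsr_step(P: int, Q: int):
    """One clock cycle: output bit m = floor(2P / Q), new content 2P - m*Q."""
    m = (2 * P) // Q
    return m, 2 * P - m * Q

P, Q = 1, 3                 # r  = 1/3 = .0101...
Pp, Qp = 2, 5               # r' = 2/5 = .0110...
Ppp, Qpp = P + Pp, Q + Qp   # b = 1: r'' = 3/8 = .011..., cf. Theorem 24(ii)
for _ in range(2):          # the first n = 2 bits coincide
    (m, P), (mp, Pp), (mpp, Ppp) = fqsr_step(P, Q), fqsr_step(Pp, Qp), fqsr_step(Ppp, Qpp)
    assert m == mp == mpp   # equal output bits while the prefixes agree
    assert Ppp == P + Pp    # weighted sum property persists, Theorem 24(i)
```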
Theorem 26
Bit Complexity
The bit complexity of computing the current best approximations P/Q, the discrepancy sequence D(s), the coding sequence C(s), and the rational complexity profile corresponding to the prefix \(s_1,\dots , s_n\) of length n is O(n^{2}).
Proof
Calculating P/Q requires a factor n for the growing wordlength and an amortized number of 2n cycles according to the previous theorem and remark, a total of O(n^{2}) bit operations. D(s) = (d_{k}) and C(s) are obtained “on the fly” in the algorithm from Table 4 (left), in O(1) operations per cycle or a total of O(n). Observe that the comparison \((s_1{\dots } s_k) \lesseqqgtr (r_1{\dots } r_k)\) can effectively be done on a suffix, whose start index is maintained and updated separately for ⊖ and ⊕: We always have “r[⊖] < s < r[⊕]”, the r values from Cases ⊖ are strictly increasing towards s from below, the values for Cases ⊕ strictly decrease from above, and thus once a bit position matches the corresponding s_{k}, this fact will be maintained. Hence, for any index k, the corresponding s_{k} is matched exactly twice, once in a Case ⊖, once for a Case ⊕, a total of O(n). The rational complexity profile is obtained by comparing the length of the current Q against R(s, k − 1). Since R(s, n) ≤ n, the comparison is O(n) per step, O(n^{2}) in total. □
Remark 27
This coincides with the complexity of computing L or A without using FFT techniques, O(n^{2}). However, there are FFT implementations both for linear (Blackburn [2]) and 2-adic complexity (Klapper [8, p. 137]) that need only \(O(n\log (n)^2\log \log (n))\) bit operations. See also Niederreiter and Vielhaber [13] for an algorithm that computes all discrepancy sequences of all shifted sequences \((\textbf {K}(s_k,\dots , s_n)_i, 1\leq i\leq n-k+1), 1\leq k\leq n\) in overall time O(n^{2}), i.e. in a constant number of bit operations per output bit.
13 Open problems

1.
When the approximation is very good, e.g. after n consecutive discrepancy bits d_{k} = 0, can we extend v directly by, say, 01^{⌊0.72n⌋} or 10^{⌊0.72n⌋}, using the results in Section 9?

2.
How, if at all, can the binary procedure be adapted to sequences over \(\mathbb {F}_p\), p > 2?

3.
Can we use FFT techniques to achieve an overall bit complexity of \(O(n\log (n)^2\log \log (n))\) for the rational complexity, like in the case of L and A?

4.
Give closed formulae for the rational complexity values N_{R}(n, c), E(R(n)) and Var(R(n)), for even and odd n. Describe the numbers \(\nu _{E}^a\approx 0.06\) and \(\nu _{V}^a\) from Conjecture 7.

5.
How does the rational complexity behave for sequences with perfect autocorrelation, in particular compared with linear complexity and 2-adic complexity?

6.
Can we construct the Q Tree without previously constructing the V_{10} Tree via the FSM+C of Section 11? (this would also solve Problem 1)

7.
Prove Conjecture 15. Can we speed up the simulation of rational complexity by 2-adic complexity (or vice versa) below the suggested O(n^{4}) complexity?

8.
R is a CFE on the Archimedean \(\mathbb {R}\), while A is an isometry on \(\mathbb {Z}_2\) (ultrametric, no CFE).
Can we reconcile R with A beyond the case of ultimately periodic sequences?
14 Conclusion
We have introduced Rational Complexity R as a novel complexity measure for binary sequences, a third way to assess the pseudorandomness of binary sequences by comparison with ultimately periodic sequences, besides linear and 2-adic complexity, L and A.
We presented the Binary CFE Algorithm, a pseudo-ultrametric, bit-by-bit version of the standard continued fraction expansion in \(\mathbb {R}\), making use of Feedback in \(\mathbb {Q}\) Shift Registers. The algorithm avoids the costly inversion s^{(i)} := {s^{(i− 1)}}^{− 1} in \(\mathbb {R}\).
We also defined two prefix-free complete codes, C_{I} and C_{II}, mapping the partial denominators from \(\mathbb {N}\) to {0,1}^{∗} and following the Gauß-Kuz’min measure. C_{I} and C_{II} yield a monotonic function \(C\colon \mathbb {R}_1 \to B^{\omega }\), mapping reals to CFE encodings. We have seen that R, L and A differ pairwise on considerable parts of the prefix space B^{30} and thus all three measures should be used to examine a given sequence for pseudorandomness.
Abbreviations
 \(\mathbb {N}\) :

\(= \{1,2,3,\dots \}, \mathbb {N}_{0} = \{0,1,2,3,\dots \}\)
 \(\mathbb {D}\) :

\(:= \{a / 2^{k}\colon a\in \mathbb {Z} \text { odd}, k\in \mathbb {N} \text {\ or\ } a\in \mathbb {Z}, k=0\}\), dyadic fractions
 \(\mathbb {R}_{1},\mathbb {Q}_{1},\mathbb {D}_{1}\colon \) :

For \(X\subset \mathbb {R}\), X_{1} := X ∩ [0,1]
 B :

= {0,1} is the binary alphabet, with complement \(\overline a = 1-a, a\in B\), e.g. \(\overline {10011} = 01100\)
 \(B^{*}=\{\varepsilon ,0,1,00,01,10,11,000,\dots \}\) :

and B^{ω}, are the finite, resp. infinite, words over B
 B ^{+} :

= B^{∗}∖{ε}. For \(v= v_{1}v_{2}{\dots } v_{|v|}\in B^{*}\): \((v)_{2} = {\sum }_{k=0}^{|v|-1} v_{|v|-k} 2^{k}\in \mathbb {N}_{0}\) in binary, (ε)_{2} = 0
References
Berlekamp, E.: Nonbinary BCH decoding. TR North Carolina State University Department of Statistics (1966)
Blackburn, S.: Fast rational interpolation, ReedSolomon decoding, and the linear complexity profiles of sequences. IEEE Trans. IT 43(2), 537–548 (1997)
Dornstetter, J.L.: On the equivalence between Berlekamp’s and Euclid’s algorithm. IEEE Trans. IT 33(3), 428–431 (1987)
Hu, H., Feng, D.: On the 2-adic complexity and the k-error 2-adic complexity of periodic binary sequences. In: SETA 2004 LNCS, vol. 3486, pp 185–197 (2005)
Khintchine, A.: Metrische Kettenbruchprobleme. Compos. Math. 1, 361–382 (1935)
Klapper, A., Goresky, M.: 2-adic shift registers. In: FSE 1993 LNCS, vol. 809, pp 174–178 (1994)
Klapper, A., Goresky, M.: Cryptanalysis based on 2-adic rational approximation. In: Crypto ’95, LNCS, vol. 963, pp 262–273 (1995)
Klapper, A., Goresky, M.: Feedback shift registers, 2adic span, and combiners with memory. J. Crypt. 10, 111–147 (1997)
Kuzmin, R. O.: Sur un problème de Gauss. Atti Congr. Int. Bologna 6, 83–89 (1928)
Lévy, P.: Sur les lois de probabilité dont dépendent les quotients complets et incomplets d’une fraction continue. Bull. S.M.F. 57, 178–194 (1929)
Massey, J.: Shiftregister synthesis and BCH decoding. IEEE Trans. IT 15(1), 122–127 (1969)
Moree, P.: Artin’s primitive root conjecture—a survey. arXiv:math/0412262 (2012)
Niederreiter, H., Vielhaber, M.: Simultaneous shifted continued fraction expansions in quadratic time. AAECC 9(2), 125–138 (1998)
Perron, O.: Die Lehre von den Kettenbrüchen, Bd. I. Teubner, Stuttgart 1954/1977
Rueppel, R.: Analysis and Design of Stream Ciphers. Springer, Berlin (1986)
Tian, T., Qi, W.F.: Expected values for the rational complexity of finite binary sequences. Des. Codes Crypt. 55(1), 65–79 (2010)
Vallée, B.: Digits and continuants in euclidean algorithms Ergodic versus tauberian theorems. J. Théor. Nombres Bordeaux 12(2), 531–570 (2000)
Vepstas, L.: The Minkowski Question Mark, PSL(2,Z) and the Modular Group. http://ns.linas.org/math/chapminkowski.pdf (2020)
Vielhaber, M.: Continued fraction expansion as isometry  the law of the iterated logarithm for linear, jump, and 2adic complexity. IEEE Trans. IT 53(11), 4383–4391 (2007). Preprint: https://arxiv.org/pdf/cs/0511089.pdf
Vielhaber, M.: A unified view on sequence complexity measures as isometries. In: SETA 2004 LNCS, vol. 3486, pp 143–153 (2005)
Vielhaber, M.: Reducebyfeedback: timing resistant and DPAaware modular multiplication, plus: how to break RSA by DPA. In: CHES 2012. Springer (2012)
Vielhaber, M.: V Tree—continued fraction expansion, Stern-Brocot tree, Minkowski’s ?(x) function in binary: exponentially faster. arXiv:2008.08020 (2020)
Acknowledgements
We sincerely thank the anonymous reviewers, both at SETA 2020 and Cryptography and Communications for their valuable and detailed suggestions.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Cite this article
Vielhaber, M., Canales Chacón, M.d.P. & Jara Ceballos, S. Rational complexity of binary sequences, F\(\mathbb {Q}\)SRs, and pseudoultrametric continued fractions in \(\mathbb {R}\). Cryptogr. Commun. 14, 433–457 (2022). https://doi.org/10.1007/s12095021005392
Keywords
 Rational complexity
 Feedback in \(\mathbb {Q}\) Shift Register
 F\(\mathbb {Q}\)SR
 Binary Continued Fraction Expansion Algorithm
 Linear complexity
 2-adic complexity