1 Introduction

For a positive integer t, we use \({{\mathbb {Z}}}_t\) to denote the residue ring modulo t, which we always assume to be represented by the set \(\{0, \ldots , t-1\}\).

We fix an r-dimensional integer vector

$$\begin{aligned} \textbf{z}= (z_1, \ldots , z_r) \in {{\mathbb {Z}}}_t^r \end{aligned}$$
(1.1)

and define the function \(S_{r,t,\textbf{z}}:~{{\mathbb {Z}}}_t \rightarrow {{\mathbb {Z}}}_t\) as follows. Given \(w \in {{\mathbb {Z}}}_t\) (which following our convention we interpret as an integer from the set \(\{0, \ldots , t-1\}\)) we expand w in binary \(w = \overline{u_{s}\dots u_{1}}\), where \(u_{i}\) represents the i-th least significant bit of w, that is, the i-th bit from the right (if \(r > s\) we pad w with \(r-s\) leading zeroes) and then set

$$\begin{aligned} S_{r,t,\textbf{z}}(w) = \sum _{i = 1}^{r}u_{i}z_{i} \in {{\mathbb {Z}}}_t. \end{aligned}$$

Furthermore, for a fixed vector \(\textbf{z}\) and a given initial value \(w_0\in {{\mathbb {Z}}}_t\) we define the sequence

$$\begin{aligned} v(0) = w_0, \quad v (n+1) = S_{r,t,\textbf{z}}\left( v(n)\right) , \qquad n =0, 1, \ldots . \end{aligned}$$
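To make the definition concrete, the following is a minimal sketch of the map \(S_{r,t,\textbf{z}}\) and of the sequence \(v(n)\). The function names `subset_sum_map` and `orbit`, and the sample parameters \(t = 11\), \(\textbf{z}= (3,5,7)\), are our own illustrative choices and are not taken from the cited literature.

```python
def subset_sum_map(w, z, t):
    """Return S_{r,t,z}(w): the sum of z_i over the set bits u_i of w, i <= r, modulo t."""
    r = len(z)
    total = 0
    for i in range(r):
        if (w >> i) & 1:       # bit u_{i+1} of w (least significant bit first)
            total += z[i]      # z[i] stores z_{i+1}
    return total % t


def orbit(w0, z, t, steps):
    """Return [v(0), ..., v(steps)] with v(0) = w0 and v(n+1) = S(v(n))."""
    seq = [w0]
    for _ in range(steps):
        seq.append(subset_sum_map(seq[-1], z, t))
    return seq


# Hypothetical toy parameters: t = 11, r = 3, z = (3, 5, 7), w0 = 6.
print(orbit(6, (3, 5, 7), 11, 4))  # [6, 1, 3, 8, 0]
```

In this toy run the sequence reaches \(w = 0\), which is a fixed point for every choice of \(\textbf{z}\).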

This construction, introduced by Rueppel [23, Chapter 7] (see also [24, 25]), is known as the subset sum pseudorandom number generator. The efficiency of the generator and its cryptographic properties have been studied by Impagliazzo and Naor [18]. This generator is believed to be cryptographically secure since it relies on a combinatorial rather than an algebraic structure, which prevents mounting attacks similar to those designed in [2, 3, 4, 12, 13, 14, 15, 19, 21]; see also the references therein.

We note that one of the parameters characterising the pseudorandom properties of any map is the number of its fixed points since it reflects the mixing properties of this map. For example, the statistics of fixed points has been investigated for such classical cryptographic maps as the RSA encryption function [5] and the discrete logarithm [6, 7, 16, 17]. Several other examples of such results can be found in [1, 8, 9, 11, 20, 22]. A survey of such results, and of related results on short cycles in these maps, can be found in [27].

Here we consider this question for the map \(w \mapsto S_{r,t,\textbf{z}}(w)\). That is, we define and study

$$\begin{aligned} F_{r,t}(\textbf{z}) = \#\{w \in {{\mathbb {Z}}}_t:~ w = S_{r,t,\textbf{z}}(w)\}. \end{aligned}$$

More precisely, we are interested in the power moments of this quantity over all \(t^r\) possible choices of the vectors (1.1):

$$\begin{aligned} M_\nu (r,t) = \frac{1}{t^r} \sum _{\textbf{z}\in {{\mathbb {Z}}}_t^r} F_{r,t}(\textbf{z})^\nu , \qquad \nu =1,2, \ldots . \end{aligned}$$

In particular, for the first moment, that is, for the average value of \(F_{r,t}(\textbf{z})\), we simplify the notation as

$$\begin{aligned} A(r,t) = M_1(r,t). \end{aligned}$$
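For very small parameters, the quantities \(F_{r,t}(\textbf{z})\) and \(M_\nu (r,t)\) can be computed directly from their definitions. The sketch below does this by exhaustive enumeration with exact rational arithmetic; the helper names are ours, and the cost \(t^{r+1}\) makes it usable only as a sanity check.

```python
from fractions import Fraction
from itertools import product


def subset_sum_map(w, z, t):
    """S_{r,t,z}(w) for z given as a tuple (z_1, ..., z_r)."""
    return sum(zi for i, zi in enumerate(z) if (w >> i) & 1) % t


def fixed_point_count(z, t):
    """F_{r,t}(z): the number of w in {0, ..., t-1} with S(w) = w."""
    return sum(1 for w in range(t) if subset_sum_map(w, z, t) == w)


def moment(nu, r, t):
    """M_nu(r, t) as an exact fraction, enumerating all t**r vectors z."""
    total = sum(fixed_point_count(z, t) ** nu
                for z in product(range(t), repeat=r))
    return Fraction(total, t ** r)


print(moment(1, 2, 5), moment(2, 2, 5))
```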

We recall that it has been shown in [26, Theorem 31.2] that for \(t \geqslant 2^r\) the bound

$$\begin{aligned} A(r,t) \leqslant (2t)^{1/2} + 2 \end{aligned}$$
(1.2)

holds.

Here we improve this bound and also obtain a new bound for higher moments.

We note that the subset sum pseudorandom number generator is very fast, as no modular multiplication is needed, and no weaknesses have been discovered so far. However, very few theoretical results are known. Thus, besides giving some concrete theoretical results, we also hope to attract more attention to this generator.

2 Evaluation of the average value of the number of fixed points

We start with a significant improvement of (1.2) and in fact we evaluate \(A(r,t)\) explicitly.

Theorem 2.1

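For \(t \geqslant 2^r\), we have

$$\begin{aligned} A(r,t) = 2 - \frac{1}{t} \left\lceil \frac{t}{2^r} \right\rceil . \end{aligned}$$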

Proof

Let

$$\begin{aligned} m = \left\lfloor \log _2 t \right\rfloor + 1 \end{aligned}$$
(2.1)

be the length of the binary expansion of t. Hence we write binary representations of \(w \in {{\mathbb {Z}}}_t\) as binary strings of length exactly m (possibly with some zeros on the left, that is, in the most significant positions).

Note that by our assumption \(t \geqslant 2^r\) we have \(m > r\).

Changing the order of summation we write

$$\begin{aligned} A(r,t) = \frac{1}{t^r} \sum _{\textbf{z}\in {{\mathbb {Z}}}_t^r} \sum _{\begin{array}{c} w \in {{\mathbb {Z}}}_t\\ w = S_{r,t,\textbf{z}}(w) \end{array}}1 = \frac{1}{t^r} \sum _{w \in {{\mathbb {Z}}}_t} \sum _{\begin{array}{c} \textbf{z}\in {{\mathbb {Z}}}_t^r\\ w = S_{r,t,\textbf{z}}(w) \end{array}}1. \end{aligned}$$

For

$$\begin{aligned} w = \overline{u_{m}\ldots u_{r+1} \underbrace{0 \ldots 0}_{\text {{ r} zeros}}} \in {{\mathbb {Z}}}_t \end{aligned}$$
(2.2)

whose binary expansion ends with a string of r zeros, we obviously obtain \(S_{r,t,\textbf{z}}(w)=0\). This leaves only one possible value for \(w \in {{\mathbb {Z}}}_t\) with \(S_{r,t,\textbf{z}}(w) = w\), namely, \(w = 0\), in which case the inner sum is equal to \(t^r\).

The condition (2.2) on w means that \(2^r \mid w\) and thus this happens for \(\left\lceil t/2^r \right\rceil \) elements \(w \in {{\mathbb {Z}}}_t\).

For the remaining choices of \(w = \overline{u_{m} \ldots u_{1}}\) with

$$\begin{aligned} \left( u_r, \ldots , u_{1}\right) \ne \left( 0, \ldots , 0\right) , \end{aligned}$$

there is at least one non-zero entry among the first r least significant bits in its binary representation, whose index we define as i. Then the component \(z_{i}\) of \(\textbf{z}\) as in (1.1) is uniquely defined from the equation

$$\begin{aligned} w = S_{r,t,\textbf{z}}(w) = \sum _{i = 1}^{r}u_{i}z_{i} \end{aligned}$$

by the other components of \(\textbf{z}\), hence there are exactly \(t^{r-1}\) such choices for \(\textbf{z}\).

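Therefore, counting \(t^r\) vectors \(\textbf{z}\) for \(w = 0\) and \(t^{r-1}\) vectors for each of the \(t - \left\lceil t/2^r \right\rceil \) values of w not divisible by \(2^r\), we obtain

$$\begin{aligned} A(r,t) = \frac{1}{t^r} \left( t^r + \left( t - \left\lceil \frac{t}{2^r} \right\rceil \right) t^{r-1} \right) = 2 - \frac{1}{t} \left\lceil \frac{t}{2^r} \right\rceil , \end{aligned}$$

which concludes the proof. \(\square \)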

In particular, we see from Theorem 2.1 that we can improve (1.2) as

$$\begin{aligned} A(r,t) < 2. \end{aligned}$$
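The counting argument in the proof can be checked numerically for small parameters. In the sketch below, `counting_formula` encodes the value \(2 - \lceil t/2^r \rceil /t\) that this argument yields (one contribution \(t^r\) from \(w = 0\) and \(t^{r-1}\) from each w not divisible by \(2^r\)), while `average_fixed_points` enumerates all pairs \((\textbf{z}, w)\). Both names and the list of test parameters are our own choices.

```python
from fractions import Fraction
from itertools import product


def average_fixed_points(r, t):
    """A(r, t) by brute force over all z in Z_t^r and all w in Z_t."""
    total = 0
    for z in product(range(t), repeat=r):
        for w in range(t):
            s = sum(zi for i, zi in enumerate(z) if (w >> i) & 1) % t
            total += (s == w)
    return Fraction(total, t ** r)


def counting_formula(r, t):
    """2 - ceil(t / 2^r) / t, as obtained from the counting argument."""
    multiples = -(-t // 2 ** r)  # ceil(t / 2^r): elements of Z_t divisible by 2^r
    return 2 - Fraction(multiples, t)


# Small hypothetical test cases, all with t >= 2^r.
for r, t in [(1, 2), (2, 4), (2, 5), (2, 7), (3, 8), (3, 11)]:
    assert average_fixed_points(r, t) == counting_formula(r, t)
    print(r, t, counting_formula(r, t))
```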

3 Bounding higher moments of the number of fixed points

We recall that the notations \(U = O(V)\), \(U \ll V\) and \( V\gg U\) are each equivalent to \(|U|\leqslant c V\) for some positive constant c, which throughout the paper may depend on the order of the moment \(\nu \).

Here we always assume that t is a prime number, so that \({\mathbb {Z}}_t = {\mathbb {F}}_t\) is a finite field of t elements and we can use linear algebra over \({\mathbb {F}}_t\).

Theorem 3.1

For a prime \(t> 2^r\) and any fixed integer \(\nu \geqslant 1\), we have

$$\begin{aligned} M_{\nu }(r,t) \ll \left( t /2^{r}\right) ^{\nu -1}. \end{aligned}$$

Proof

Let m be defined by (2.1), that is, m is the length of the binary expansion of t. In particular, by our assumption \(t > 2^r\) we have \(m \geqslant r\).

We start with an observation that the value of \(F_{r,t}({\textbf {z}})^{\nu }\) is equal to the number of solutions to the system of \(\nu \) equations in \(m\nu \) variables \(u_{i,j} \in \{0,1\}\), \(i=1,\ldots , m\), \(j =1, \ldots , \nu \):

$$\begin{aligned} \sum _{i = 1}^{m}u_{i,j} \cdot 2^{i- 1} \equiv \sum _{i = 1}^{r}u_{i,j} \cdot z_{i} \pmod {t}, \qquad j \in \{1,\ldots ,\nu \}. \end{aligned}$$
(3.1)

Note that the variables \(u_{i,j} \in \{0,1\}\), \(i=1,\ldots , m\), \(j =1, \ldots , \nu \), in (3.1) correspond to \(\nu \) vectors \((\textbf{u}_1, \ldots , \textbf{u}_\nu )\) coming from binary expansions of solutions \(w_1, \ldots , w_\nu \in {{\mathbb {F}}}_t\) to \(\nu \) independent equations \(w_j= S_{r,t,\textbf{z}}(w_j)\), \(j =1, \ldots , \nu \).

We define \(U_{\nu , r}(s)\) to be the set of \(\nu \)-tuples of binary vectors \((\textbf{u}_1, \ldots , \textbf{u}_\nu )\) for which the first r components form a matrix of rank s over \( {\mathbb {F}}_t\), that is,

$$\begin{aligned} \textrm{rank}_{{{\mathbb {F}}}_t}\begin{pmatrix} u_{1,1} &{} \ldots &{} u_{r,1} \\ \ldots &{} \ldots &{} \ldots \\ u_{1,\nu } &{} \ldots &{} u_{r,\nu } \end{pmatrix} = s. \end{aligned}$$
(3.2)

Clearly for every \(\nu \)-tuple \((\textbf{u}_1, \ldots , \textbf{u}_\nu ) \in U_{\nu , r}(s)\) of vectors, the system of congruences (3.1) has at most \(t^{r-s}\) solutions in \({\textbf {z}} \in {\mathbb {Z}}_{t}^{r}\).

We now switch the roles of the binary variables \(u_{i,j} \in \{0,1\}\), \(i=1,\ldots , m\), \(j =1, \ldots , \nu \), and the vectors \({\textbf {z}} \in {\mathbb {Z}}_{t}^{r}\). That is, for each choice of \(u_{i,j} \in \{0,1\}\), \(i=1,\ldots , m\), \(j =1, \ldots , \nu \), we count the number of vectors \({\textbf {z}} \in {\mathbb {Z}}_{t}^{r}\) satisfying (3.1).

We can then bound our summation in terms of \(\# U_{\nu , r}(s)\):

$$\begin{aligned} \sum _{{\textbf {z}} \in {\mathbb {Z}}_{t}^{r}} F_{r,t}(\textbf{z})^{\nu } \leqslant \sum _{s = 0}^{\nu }\#U_{\nu , r}(s)t^{r - s}. \end{aligned}$$
(3.3)

First we note that \(\#U_{\nu , r}(0) = 1 \) as this corresponds to the zero matrix in (3.2) and thus (3.1) implies that the remaining \(m-r\) components of each of the binary vectors \((\textbf{u}_1, \ldots , \textbf{u}_\nu )\) also vanish. Then we have \(t^r\) choices for \(\textbf{z}\). Hence such vectors contribute in total \(t^{r}\) to the case \(s =0\).

To estimate \(\#U_{\nu , r}(s)\) with \(s \geqslant 1\), we note that if we fix s linearly independent vectors

$$\begin{aligned} (\textbf{u}_{j_1}, \ldots , \textbf{u}_{j_s}), \qquad 1\leqslant j_1< \ldots < j_s\leqslant \nu , \end{aligned}$$

in a family of vectors \((\textbf{u}_1, \ldots , \textbf{u}_\nu ) \in U_{\nu , r}(s)\), then any other vector \(\textbf{u}_j\) belongs to the linear span of \(\textbf{u}_{j_1}, \ldots , \textbf{u}_{j_s}\) over \( {\mathbb {F}}_{t}\). That is,

$$\begin{aligned} \textbf{u}_j = a_1 \textbf{u}_{j_1}+ \ldots + a_s \textbf{u}_{j_s} \end{aligned}$$
(3.4)

for some \(a_1, \ldots , a_s\in {\mathbb {F}}_t\). By the Cramer rule we have

$$\begin{aligned} a_j \equiv \frac{\Delta _j}{\Delta } \pmod t, \qquad j =1, \ldots , s, \end{aligned}$$
(3.5)

for some determinants \(\Delta , \Delta _1, \ldots , \Delta _s\) over \({\mathbb {F}}_t\) formed by the components of the vectors \(\textbf{u}_1, \ldots , \textbf{u}_\nu \) and with \(\Delta \not \equiv 0 \pmod t\). Since all vectors \(\textbf{u}_1, \ldots , \textbf{u}_\nu \) are binary, we easily infer that

$$\begin{aligned} |\Delta |, |\Delta _j| \leqslant 2^{-s} \left( s+1\right) ^{(s+1)/2}, \end{aligned}$$
(3.6)

see, for example, [10, Problem 523]. Thus, adjusting the signs we see from (3.5) and (3.6) that, regardless of the choice of \(\textbf{u}_{j_1}, \ldots , \textbf{u}_{j_s}\), each vector \((a_1, \ldots , a_s)\) satisfies

$$\begin{aligned} (a_1, \ldots , a_s) \equiv \left( D_1D^{-1}, \ldots , D_sD^{-1}\right) \pmod t \end{aligned}$$
(3.7)

(where \(D^{-1}\) is computed modulo t) with some integers

$$\begin{aligned} D \in \left[ 1, 2^{-s} \left( s+1\right) ^{(s+1)/2}\right] \end{aligned}$$

and

$$\begin{aligned} D_j \in \left[ - 2^{-s} \left( s+1\right) ^{(s+1)/2}, 2^{-s} \left( s+1\right) ^{(s+1)/2}\right] , \qquad j =1, \ldots , s, \end{aligned}$$

and hence there are at most

$$\begin{aligned} A_s = 2^{-s} \left( s+1\right) ^{(s+1)/2} \left( 1+2^{-s+1} \left( s+1\right) ^{(s+1)/2}\right) ^s \end{aligned}$$
(3.8)

choices for the vector of the coefficients \((a_1, \ldots , a_s)\) in (3.4).
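As a sanity check on (3.6) and to get a feeling for the size of \(A_s\) in (3.8), the following sketch computes the largest absolute determinant of an \(s \times s\) binary matrix by brute force for \(s \leqslant 3\) and compares it with \(2^{-s}(s+1)^{(s+1)/2}\). The function names are ours, and the enumeration over all \(2^{s^2}\) matrices is feasible only for very small s.

```python
from itertools import permutations, product


def det(mat):
    """Integer determinant via the Leibniz formula (adequate for tiny matrices)."""
    n = len(mat)
    total = 0
    for perm in permutations(range(n)):
        # Sign of the permutation from the parity of its inversion count.
        inversions = sum(perm[a] > perm[b]
                         for a in range(n) for b in range(a + 1, n))
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(perm):
            term *= mat[row][col]
        total += term
    return total


def max_abs_binary_det(s):
    """Largest |det| over all s x s matrices with entries in {0, 1}."""
    best = 0
    for bits in product((0, 1), repeat=s * s):
        mat = [bits[k * s:(k + 1) * s] for k in range(s)]
        best = max(best, abs(det(mat)))
    return best


def det_bound(s):
    """Right-hand side of (3.6): 2^(-s) (s+1)^((s+1)/2)."""
    return 2.0 ** (-s) * (s + 1) ** ((s + 1) / 2)


def coefficient_count_bound(s):
    """A_s from (3.8), evaluated as a real number."""
    b = det_bound(s)
    return b * (1 + 2 * b) ** s


for s in (1, 2, 3):
    print(s, max_abs_binary_det(s), det_bound(s), coefficient_count_bound(s))
```

For \(s = 1\) and \(s = 3\) the brute-force maximum coincides with the right-hand side of (3.6), which equals 1 and 2, respectively.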

We emphasise that the meaning of the bound (3.8) is that even if the number of possible vectors \((\textbf{u}_1, \ldots , \textbf{u}_\nu ) \in U_{\nu , r}(s)\), and thus the number of systems of relations (3.4), grows rapidly with r and t, the number of possible choices for the coefficients \((a_1, \ldots , a_s)\) can be bounded only in terms of s (and thus of \(\nu \)) and therefore independently of r and t.

This implies that when \(\textbf{u}_{j_1}, \ldots , \textbf{u}_{j_s}\) are fixed to satisfy (3.2), there are at most \(A_s\) possibilities to form the first r coordinates of each of the other vectors to form a \(\nu \)-tuple \((\textbf{u}_1, \ldots , \textbf{u}_\nu ) \in U_{\nu , r}(s)\), and thus at most \(A_s 2^{m-r}\) possibilities for the whole vector. Since there are at most

$$\begin{aligned} \left( {\begin{array}{c}\nu \\ s\end{array}}\right) (2^m)^{s} = \left( {\begin{array}{c}\nu \\ s\end{array}}\right) 2^{ms} \end{aligned}$$

choices for \(\textbf{u}_{j_1}, \ldots , \textbf{u}_{j_s}\) we obtain

$$\begin{aligned} \#U_{\nu , r}(s) \leqslant \left( {\begin{array}{c}\nu \\ s\end{array}}\right) 2^{ms} \left( A_s 2^{m-r} \right) ^{\nu - s}. \end{aligned}$$

Since we assume that \(\nu \) is fixed and \(s \leqslant \nu \), this simplifies as

$$\begin{aligned} \#U_{\nu , r}(s) \ll 2^{ms+ (m-r)(\nu - s)} = 2^{m\nu -r(\nu - s)} \ll t^{\nu } 2^{-r(\nu - s)}. \end{aligned}$$

We can now substitute the above bound for \(\#U_{\nu , r}(s)\) in (3.3), getting

$$\begin{aligned} \sum _{{\textbf {z}} \in {\mathbb {Z}}_{t}^{r}} F_{r,t}(\textbf{z})^{\nu } \ll t^r + \sum _{s = 1}^{\nu } t^{\nu +r-s}2^{-r(\nu -s)} = t^r + t^{r} \sum _{s=1}^{\nu } \left( t \cdot 2^{-r}\right) ^{\nu - s}. \end{aligned}$$

Note that we have assumed that \(t >2^{r}\), which implies \(t \cdot 2^{-r} > 1\), so

$$\begin{aligned} \sum _{{\textbf {z}} \in {\mathbb {Z}}_{t}^{r}} F_{r,t}(\textbf{z})^{\nu } \ll t^{r} + t^{r} \sum _{s=1}^{\nu } \left( t \cdot 2^{-r}\right) ^{\nu - s} \ll t^{r} \left( t \cdot 2^{-r}\right) ^{\nu -1}, \end{aligned}$$

which concludes the proof. \(\square \)
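To illustrate the shape of the bound of Theorem 3.1, one can tabulate the ratio \(M_\nu (r,t) / (t/2^r)^{\nu -1}\) for a few small primes \(t > 2^r\). The sketch below does this by exhaustive enumeration; the helper names and the parameter list are our own choices, and such small cases of course say nothing about the implied constant in general.

```python
from fractions import Fraction
from itertools import product


def moment(nu, r, t):
    """M_nu(r, t) by exhaustive enumeration of all z in Z_t^r."""
    total = 0
    for z in product(range(t), repeat=r):
        fixed = sum(
            1 for w in range(t)
            if sum(zi for i, zi in enumerate(z) if (w >> i) & 1) % t == w
        )
        total += fixed ** nu
    return Fraction(total, t ** r)


def normalised_moment(nu, r, t):
    """M_nu(r, t) divided by (t / 2^r)^(nu - 1)."""
    return moment(nu, r, t) / Fraction(t, 2 ** r) ** (nu - 1)


# Hypothetical small cases with t prime and t > 2^r.
for r, t in [(1, 3), (2, 5), (2, 7), (3, 11), (3, 13)]:
    print(r, t, [float(normalised_moment(nu, r, t)) for nu in (1, 2, 3)])
```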

4 Comments

In Theorem 3.1, we have suppressed the dependence on the order of the moments \(\nu \). There are two reasons for this.

First, we do not consider \(\nu \) to be an important parameter. For example, the choice of \(\nu = 2\) already gives us important information, and the extra technical calculations needed to trace the dependence on \(\nu \) do not seem to be justified by its importance. However, we provide all the estimates necessary for this, should one decide to trace the dependence on \(\nu \). For example, we note that (3.6) is slightly stronger than the classical Hadamard inequality, which is still sufficient for our purposes, since we do not compute the explicit dependence on \(\nu \). Besides the potential contribution to computing explicit dependence on \(\nu \), we also present (3.6) because we believe it deserves to be known more broadly.

The second reason is that before computing the explicit dependence on \(\nu \), one has to attempt to improve the bound (3.8) on the number of distinct vectors which can be solutions to all non-singular systems of s linear congruences modulo t with binary coefficients. This question seems to be of independent interest and certainly deserves further investigation. Certainly one can improve (3.8) by an absolute constant, taking into account that in (3.7) we need only count \(D, D_1, \ldots , D_s\) with

$$\begin{aligned} \gcd \left( D, D_1, \ldots , D_s\right) = 1. \end{aligned}$$

However we are interested in more substantial improvements.

We would also like to note that our approach does not extend to bounding the number of short cycles. For example, we do not have any nontrivial estimate on the number of 2-cycles

$$\begin{aligned} \#\{w \in {{\mathbb {F}}}_t:~ w = S_{r,t,\textbf{z}}\left( S_{r,t,\textbf{z}}(w)\right) \} \end{aligned}$$

on average over \(\textbf{z}\in {{\mathbb {F}}}_t^r\), which is another interesting open question.
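For numerical experiments with this open question, the quantity above can be averaged over \(\textbf{z}\) by brute force for small primes t. The sketch below is only such an experiment under our own naming; note that the count includes the fixed points of \(S_{r,t,\textbf{z}}\), exactly as in the displayed set.

```python
from fractions import Fraction
from itertools import product


def subset_sum_map(w, z, t):
    """S_{r,t,z}(w) for z given as a tuple (z_1, ..., z_r)."""
    return sum(zi for i, zi in enumerate(z) if (w >> i) & 1) % t


def average_period_two_points(r, t):
    """Average over z of #{w : S(S(w)) = w}; fixed points are included."""
    total = 0
    for z in product(range(t), repeat=r):
        total += sum(
            1 for w in range(t)
            if subset_sum_map(subset_sum_map(w, z, t), z, t) == w
        )
    return Fraction(total, t ** r)


# Hypothetical small cases with t prime and t > 2^r.
for r, t in [(1, 3), (2, 5), (2, 7), (3, 11)]:
    print(r, t, float(average_period_two_points(r, t)))
```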