1 Introduction

Distinguishing a randomly chosen permutation from a random function is a combinatorial problem which is fundamental in cryptology. A few examples where this problem plays an important role are the security analysis of block ciphers, hash, and MAC schemes.

One formulation of this problem is the following. An oracle chooses a function \(F: \{0, 1\}^n \rightarrow \{0, 1\}^n\), which is either a randomly (uniformly) chosen permutation of \(\{0, 1\}^n\) or a randomly (uniformly) chosen function from \(\{0, 1\}^n\) to \(\{0, 1\}^n\). An adversary selects a “querying and guessing” algorithm. He first uses it to submit q (adaptive) queries to the oracle, and the oracle responds with F(w) to the query \(w \in \{0, 1\}^n\). After collecting the q responses, the adversary uses his algorithm to guess whether or not F is a permutation. The quality of such an algorithm (in the cryptographic context) is the ability to distinguish between the two cases (rather than successfully guessing which one it is). It is measured by the difference between the probability that the algorithm outputs a certain answer, given that the oracle chose a permutation, and the probability that the algorithm outputs the same answer, given that the oracle chose a function. This difference is called the “advantage” of the algorithm. We are interested in estimating Adv, which is the maximal advantage of the adversary, over all possible algorithms, as a function of a budget of q queries.

The well-known (folklore) answer to this problem is based on the simple “collision test” and the Birthday Problem:

$$\begin{aligned} \hbox {Adv}=1-\left( 1-\frac{1}{2^n}\right) \left( 1-\frac{2}{2^n}\right) \ldots \left( 1-\frac{q-1}{2^n}\right) . \end{aligned}$$

Since for every \(1\le k\le q-1\)

$$\begin{aligned} 1-\frac{q}{2^n}\le \left( 1-\frac{k}{2^n}\right) \left( 1-\frac{q-k}{2^n}\right) \le \left( 1-\frac{q}{2^{n+1}}\right) ^2, \end{aligned}$$

we get, for \(q\le 2^n\), that

$$\begin{aligned} 1-e^{-\frac{q(q-1)}{2^{n+1}}}\le 1-\left( 1-\frac{q}{2^{n+1}}\right) ^{q-1}\le \hbox {Adv} \le 1-\left( 1-\frac{q}{2^n}\right) ^{\frac{q-1}{2}}\le \frac{q(q-1)}{2^{n+1}}. \end{aligned}$$
(1)

This result implies that the number of queries required to distinguish a random permutation from a random function, with success probability significantly larger than, say, \(\frac{1}{2}\), is \(\varTheta (2^{\frac{n}{2}})\). We now consider the following generalization of this problem:
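The chain (1) can be checked numerically for small parameters. The following is a minimal sketch, not part of the original analysis; the function names are illustrative. It evaluates the five quantities in (1) from left to right in floating point. The loop in the usage part starts at \(q=3\) because the last step of the chain is Bernoulli's inequality, which needs the exponent \(\frac{q-1}{2}\) to be at least 1.

```python
import math


def birthday_advantage(n: int, q: int) -> float:
    """Advantage of the collision test: 1 - prod_{k=1}^{q-1} (1 - k / 2^n)."""
    size = 2 ** n
    no_collision = 1.0
    for k in range(1, q):
        no_collision *= 1.0 - k / size
    return 1.0 - no_collision


def chain_terms(n: int, q: int) -> list:
    """The five quantities appearing in (1), listed left to right."""
    size = 2 ** n
    return [
        1.0 - math.exp(-q * (q - 1) / (2 * size)),
        1.0 - (1.0 - q / (2 * size)) ** (q - 1),
        birthday_advantage(n, q),
        1.0 - (1.0 - q / size) ** ((q - 1) / 2),
        q * (q - 1) / (2 * size),
    ]


if __name__ == "__main__":
    n = 16  # toy block size, chosen only for illustration
    for q in (3, 64, 256, 1024):
        print(q, [round(t, 6) for t in chain_terms(n, q)])
```

For \(n=16\) the advantage becomes noticeable around \(q=2^{8}\), in line with the \(\varTheta (2^{\frac{n}{2}})\) estimate.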

Problem 1

(Distinguishing a truncated permutation) Let \(0 \le m<n\) be integers. An oracle chooses \(c \in \{0, 1\}\). If \(c = 1\), it picks a permutation p of \(\{0, 1\}^n\) uniformly at random, and if \(c = 0\), it picks a function \(f: \{0, 1\}^n \rightarrow \{0, 1\}^n\) uniformly at random. An adversary is allowed to submit queries \(w \in \{0, 1\}^n\) to the oracle. The oracle computes \(\alpha = p(w)\) (if \(c=1\)) or \(\alpha = f(w)\) (if \(c=0\)), truncates (with no loss of generality) the last m bits from \(\alpha \), and replies with the remaining \((n-m)\) bits. The adversary has a budget of q (adaptive) queries, and after exhausting this budget, is expected to guess c. How many queries does the adversary need in order to gain non-negligible advantage?

Specifically, we seek \(q_{\frac{1}{2}}(n,m)=\min \{q\mid \mathbf{Adv}_{n,m}(q)\ge \frac{1}{2}\}\) as a function of m and n.
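To make the setting of Problem 1 concrete, here is a toy sketch of the oracle. It is only feasible for small n, since it tabulates the whole function, and it uses Python's pseudo-random generator in place of true uniform sampling. The class name `TruncatedOracle` and its interface are assumptions made for illustration.

```python
import random


class TruncatedOracle:
    """Toy oracle for Problem 1; tabulates F, so use small n only."""

    def __init__(self, n, m, c, seed=None):
        assert 0 <= m < n and c in (0, 1)
        rng = random.Random(seed)
        size = 2 ** n
        if c == 1:
            # c = 1: a (pseudo-)random permutation of {0,1}^n.
            self.table = list(range(size))
            rng.shuffle(self.table)
        else:
            # c = 0: a (pseudo-)random function {0,1}^n -> {0,1}^n.
            self.table = [rng.randrange(size) for _ in range(size)]
        self.m = m

    def query(self, w):
        # Drop the last m bits of F(w); the reply has n - m bits.
        return self.table[w] >> self.m


if __name__ == "__main__":
    oracle = TruncatedOracle(n=8, m=4, c=1, seed=0)
    print([oracle.query(w) for w in range(10)])
```

With \(c=1\), every \((n-m)\)-bit reply value has exactly \(2^m\) preimages, which is the only structural difference an adversary can exploit.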

2 So, How Many Queries are Really Needed?

The Birthday bound (folklore) We start by remarking that the classical “Birthday” bound \(q_{\frac{1}{2}}(n,m)=\varOmega (2^{n/2})\) remains valid for Problem 1. Indeed, any algorithm that the adversary can use with the truncated replies of \((n-m)\) bits from f(w) can also be used by an adversary who sees the full f(w) (he can ignore the last m bits and apply the same algorithm).

Of course, we are looking for a better upper bound that would reflect the fact that the adversary receives less information when f(w) is truncated. We have the following bounds for Problem 1.

Hall et al. [5] Problem 1 was studied by Hall et al. [5]. The authors showed an algorithm that gives a non-negligible distinguishing advantage using \(q = O (2^{(n+m)/2})\) queries (for any m). They also proved the following upper bound:

$$\begin{aligned} \mathbf{Adv}_{n,m}(q) \le 5\left( \frac{q}{2^{\frac{n+m}{2}}}\right) ^{\frac{2}{3}}+\frac{1}{2}\left( \frac{q}{2^{\frac{n+m}{2}}}\right) ^3\frac{1}{ 2^{\frac{n-7m}{2}}}. \end{aligned}$$
(2)

For \(m\le n/7\) the bound in (2) implies that \(q_{\frac{1}{2}}(n,m)=\varOmega (2^{\frac{m+n}{2}})\). However, for larger values of m, the bound on \(q_{\frac{1}{2}}(n,m)\) that is offered by (2) deteriorates, and becomes (already for \(m>n/4\)) worse than the trivial “Birthday” bound \(q_{\frac{1}{2}}(n,m)=\varOmega (2^{n/2})\).
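The thresholds \(n/7\) and \(n/4\) can be read off from the exponents in (2). The second term of (2) equals \(\frac{1}{2}q^3/2^{2(n-m)}\), so, ignoring constant factors, (2) stays below a constant only while \(\log _2 q\le \min \{\frac{n+m}{2},\frac{2(n-m)}{3}\}\). The sketch below evaluates this exponent exactly; the function name is illustrative and the constant factors of (2) are deliberately ignored.

```python
from fractions import Fraction


def hall_exponent(n, m):
    """log2 of the query count (up to constants) at which (2) stops being useful."""
    # First term of (2) is O(1) as long as q = O(2^{(n+m)/2}).
    first = Fraction(n + m, 2)
    # Second term of (2) simplifies to q^3 / 2^{2(n-m)+1},
    # which is O(1) as long as q = O(2^{2(n-m)/3}).
    second = Fraction(2 * (n - m), 3)
    return min(first, second)


if __name__ == "__main__":
    n = 56  # divisible by 4 and by 7, chosen only for illustration
    for m in (0, 7, 8, 14, 15, 28):
        print(m, hall_exponent(n, m), Fraction(n + m, 2), Fraction(n, 2))
```

The two exponents coincide at \(m=n/7\), and \(\frac{2(n-m)}{3}\) drops below \(\frac{n}{2}\) exactly when \(m>n/4\).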

Hall et al. [5] conjectured that \(\varOmega (2^{\frac{m+n}{2}})\) queries are needed in order to get a non-negligible advantage, in the general case.

Bellare and Impagliazzo [1] Theorem 4.2 in [1] states that

$$\begin{aligned} \mathbf{Adv}_{n,m}(q)=O(n)\frac{q}{2^{\frac{n+m}{2}}} \end{aligned}$$
(3)

whenever \(2^{n-m}<q<2^{\frac{n+m}{2}}\).

This implies that \(q_{\frac{1}{2}}(n,m)=\varOmega (\frac{1}{n}2^{\frac{m+n}{2}})\) for \(m>\frac{1}{3}n+\frac{2}{3}\log _2 n+\varOmega (1)\). We point out that it is hard to extract an upper bound for \(\mathbf{Adv}_{n,m}\) from [1], in a form that can be directly compared to the other approximations that are discussed here.

Gilboa and Gueron [4] The method used to show (2) can be pushed to prove the conjecture in [5] for (almost) every m. In particular, it can be shown that if \(m\le n/3\) then

$$\begin{aligned} \mathbf{Adv}_{n,m}(q)\le 2\root 3 \of {2}\left( \frac{q}{2^{\frac{n+m}{2}}}\right) ^{\frac{2}{3}}+\frac{2\sqrt{2}}{\sqrt{3}}\left( \frac{q}{2^{\frac{n+m}{2}}}\right) ^{\frac{3}{2}}+\left( \frac{q}{2^{\frac{n+m}{2}}}\right) ^2, \end{aligned}$$
(4)

and if \(\frac{n}{3}<m\le n-4-\log _2 n\) then

$$\begin{aligned} \mathbf{Adv}_{n,m}(q)\le 3\left( \frac{q}{2^{\frac{n+m}{2}}}\right) ^{\frac{2}{3}}+2\left( \frac{q}{2^{\frac{n+m}{2}}}\right) +5\left( \frac{q}{2^{\frac{n+m}{2}}}\right) ^2+\frac{1}{2}\left( \frac{2q}{2^{\frac{n+m}{2}}}\right) ^{\frac{n}{n-m}}. \end{aligned}$$
(5)

This implies that \(q_{\frac{1}{2}}(n,m)=\varOmega (2^{\frac{m+n}{2}})\) for any \(0 \le m \le n-4-\log _2 (n)\).

Stam [9] Surprisingly, it turns out that Problem 1 was solved 20 years before Hall et al. [5], in a different context. The bound

$$\begin{aligned} \mathbf{Adv}_{n,m}(q)\le \frac{1}{2}\sqrt{\frac{(2^{n-m}-1)q(q-1)}{(2^n-1)(2^n-(q-1))}}\le \frac{1}{2\sqrt{1-\frac{q-1}{2^n}}}\cdot \frac{q}{2^{\frac{n+m}{2}}}, \end{aligned}$$
(6)

which is valid for all \(0\le m<n\), follows directly from a result of Stam [9, Theorem 2.3]. Note that if \(q \le \frac{3}{4} 2^{n}\) then (6) can be simplified to the very handy form

$$\begin{aligned} \mathbf{Adv}_{n,m}(q) \le \frac{q}{ 2^{ \frac{m+n}{2} } }. \end{aligned}$$
(7)

This implies that \(q_{\frac{1}{2}}(n,m)=\varOmega (2^{\frac{m+n}{2}})\) for any \(0 \le m < n\), confirming the conjecture of [5] in all generality (20 years before the conjecture was raised).
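The three expressions in (6) and (7) are easy to evaluate. The following sketch (function names are illustrative) computes them in floating point, together with the query lower bound that (7) yields for advantage \(\frac{1}{2}\), namely \(q\ge 2^{\frac{n+m}{2}-1}\).

```python
import math


def stam_bound(n, m, q):
    """Left-hand bound in (6)."""
    size = 2 ** n
    num = (2 ** (n - m) - 1) * q * (q - 1)
    den = (size - 1) * (size - (q - 1))
    return 0.5 * math.sqrt(num / den)


def stam_middle(n, m, q):
    """Right-hand expression in (6)."""
    size = 2 ** n
    return q / (2 * math.sqrt(1 - (q - 1) / size) * 2 ** ((n + m) / 2))


def stam_simple(n, m, q):
    """Simplified bound (7), stated in the text for q <= (3/4) * 2^n."""
    return q / 2 ** ((n + m) / 2)


def half_advantage_lower_bound(n, m):
    """Smallest integer q for which (7) allows advantage 1/2."""
    return math.ceil(2 ** ((n + m) / 2 - 1))


if __name__ == "__main__":
    n, m = 16, 8  # toy parameters, for illustration only
    for q in (2, 100, 2048, 3 * 2 ** 14):
        print(q, stam_bound(n, m, q), stam_middle(n, m, q), stam_simple(n, m, q))
    print(half_advantage_lower_bound(n, m))
```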

Remark 1

The bound (6) is tighter than all the bounds mentioned above, with one exception: the elementary upper bound (1) is better than (6) for \(q\le 2^{\frac{n-m}{2}}\).

3 Different Methods Give Different Bounds

It is interesting to see how different approaches yield different bounds for Problem 1. To this end, we first introduce some notation.

For fixed \(m<n\) and \(q\le 2^n\) denote \(\varOmega :=\left( \{0,1\}^{n-m}\right) ^q\). We view \(\varOmega \) as the set of all possible sequences of replies that can be given by the oracle (to the adversary’s \(q\) queries).

For any \(j\ge 2\) and \(\omega \in \varOmega \), let

$$\begin{aligned} \mathrm {col}_j(\omega )=\#\{1\le i_1<i_2<\ldots <i_j\le q\mid \omega _{i_1}=\omega _{i_2}=\ldots =\omega _{i_j}\} \end{aligned}$$

For \(\omega =(w_1,w_2,\ldots ,w_q)\in \varOmega \) and \(1\le r\le q\), let

$$\begin{aligned} V_r(\omega ):=\{(x_1,x_2,\ldots ,x_q)\in \varOmega \mid \forall 1\le i\le r:\,x_i=w_i\} \end{aligned}$$

be the set of sequences of replies that are the same as \(\omega \) up to the r-th query. We also set \(V_0(\omega ):=\varOmega \).

For \(\omega \in \varOmega \) let \({\Pr }_{\mathrm {perm}}(\omega )\) and \({\Pr }_{\mathrm {func}}(\omega )\) be the probabilities that \(\omega \) is the actual sequence of replies that the oracle gives to the adversary’s \(q\) queries, in the case the oracle chose a random permutation or a random function, respectively.

For \(1\le r\le q\), let

$$\begin{aligned} Q^{(r)}_{\mathrm {perm}}(\omega )=\frac{{\Pr }_{\mathrm {perm}}(V_r(\omega ))}{{\Pr }_{\mathrm {perm}}(V_{r-1}(\omega ))},\quad Q^{(r)}_{\mathrm {func}}(\omega )=\frac{{\Pr }_{\mathrm {func}}(V_r(\omega ))}{{\Pr }_{\mathrm {func}}(V_{r-1}(\omega ))}. \end{aligned}$$

Note that

$$\begin{aligned} {\Pr }_{\mathrm {perm}}(\omega )=\prod _{r=1}^q Q^{(r)}_{\mathrm {perm}}(\omega ),\quad {\Pr }_{\mathrm {func}}(\omega )=\prod _{r=1}^q Q^{(r)}_{\mathrm {func}}(\omega ). \end{aligned}$$
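For small parameters these quantities can be computed exactly. The sketch below assumes that the adversary never repeats a query (a repeated query returns a reply that is already known). Under that assumption the r-th factor in the permutation case is the number of still unused preimages of the r-th reply divided by the number of still unused outputs. All function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def col(omega, j):
    """col_j(omega): number of j-subsets of positions that carry equal replies."""
    return sum(comb(count, j) for count in Counter(omega).values())


def q_perm(omega, r, n, m):
    """Q^(r)_perm(omega) for 1 <= r <= q, assuming distinct queries."""
    earlier = omega[: r - 1].count(omega[r - 1])  # preimages already used up
    return Fraction(max(2 ** m - earlier, 0), 2 ** n - (r - 1))


def q_func(omega, r, n, m):
    """Q^(r)_func(omega): each reply is uniform on {0,1}^(n-m)."""
    return Fraction(1, 2 ** (n - m))


def pr_perm(omega, n, m):
    """Pr_perm(omega) as the product of the Q^(r)_perm factors."""
    p = Fraction(1)
    for r in range(1, len(omega) + 1):
        p *= q_perm(omega, r, n, m)
    return p


def pr_func(omega, n, m):
    """Pr_func(omega) as the product of the Q^(r)_func factors."""
    p = Fraction(1)
    for r in range(1, len(omega) + 1):
        p *= q_func(omega, r, n, m)
    return p


if __name__ == "__main__":
    n, m, q = 3, 1, 3  # toy parameters, for illustration only
    all_replies = list(product(range(2 ** (n - m)), repeat=q))
    print(sum(pr_perm(w, n, m) for w in all_replies))
    print(col((0, 0, 0, 1), 2), col((0, 0, 0, 1), 3))
```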

3.1 The Proof Method of Hall et al

The proof of (2) uses the general bound

$$\begin{aligned} \mathbf{Adv}_{n,m}(q)\le & {} \max _{\omega \in S}\left|\frac{{\Pr }_{\mathrm {perm}}(\{\omega \})}{{\Pr }_{\mathrm {func}}(\{\omega \})}-1\right|+ \max \left\{ {\Pr }_{\mathrm {func}}(\overline{S}), {\Pr }_{\mathrm {perm}}(\overline{S}) \right\} \le \nonumber \\\le & {} 2\max _{\omega \in S}\left|\frac{{\Pr }_{\mathrm {perm}}(\{\omega \})}{{\Pr }_{\mathrm {func}}(\{\omega \})}-1\right|+{\Pr }_{\mathrm {func}}(\overline{S}) \end{aligned}$$
(8)

that holds for any \(S\subseteq \varOmega \). It is applied to the set

$$\begin{aligned} S:= \left\{ \omega \in \varOmega :\left|\mathrm {col}_2(\omega )-\left( {\begin{array}{c}q\\ 2\end{array}}\right) \frac{1}{2^{n-m}}\right|\le c_1\frac{q}{2^{\frac{n-m}{2}}}\; ,\; \mathrm {col}_3(\omega )=0\right\} . \end{aligned}$$

The expression \(\max _{\omega \in S}\left|\frac{{\Pr }_{\mathrm {perm}}(\{\omega \})}{{\Pr }_{\mathrm {func}}(\{\omega \})}-1\right|\) is bounded by direct computations. The expression \({\Pr }_{\mathrm {func}}(\overline{S})\) is bounded by combining the Union Bound and the Chebyshev inequality. Finally, \(c_1\) is chosen to minimize the resulting bounds.

3.2 The Proof Method of Gilboa and Gueron

To get (4) (for \(m\le n/3\)), we can apply the slightly better (than (8)) bound

$$\begin{aligned} \mathbf{Adv}_{n,m}(q)\le & {} \frac{1}{2}\max _{\omega \in S}\left|\frac{{\Pr }_{\mathrm {perm}}(\{\omega \})}{{\Pr }_{\mathrm {func}}(\{\omega \})}-1\right|+\frac{1}{2}\left( {\Pr }_{\mathrm {func}}(\overline{S})+{\Pr }_{\mathrm {perm}}(\overline{S}) \right) \le \nonumber \\\le & {} \max _{\omega \in S}\left|\frac{{\Pr }_{\mathrm {perm}}(\{\omega \})}{{\Pr }_{\mathrm {func}}(\{\omega \})}-1\right|+ \min \left\{ {\Pr }_{\mathrm {func}}(\overline{S}), {\Pr }_{\mathrm {perm}}(\overline{S}) \right\} \end{aligned}$$
(9)

to the set

$$\begin{aligned} S:= \left\{ \omega \in \varOmega :\left|\mathrm {col}_2(\omega )-\left( {\begin{array}{c}q\\ 2\end{array}}\right) \frac{1}{2^{n-m}}\right|\le c_2\frac{q^{2/3}2^{2m/3}}{2^{n/3}}\, ,\, \mathrm {col}_3(\omega )\le c_3\frac{q^{3/2}}{2^n}\right\} \end{aligned}$$

Here, \(c_2,c_3\) are chosen to minimize the bound. Again, \(\max _{\omega \in S}\left|\frac{{\Pr }_{\mathrm {perm}}(\{\omega \})}{{\Pr }_{\mathrm {func}}(\{\omega \})}-1\right|\) is bounded by direct (elaborate) computation, and \({\Pr }_{\mathrm {func}}(\overline{S})\) is bounded by combining (via the Union Bound) the Chebyshev inequality and the Markov inequality.

The bound (5) (for \(n/3<m\le n-4-\log _2 n\)) follows similarly by examining the set

$$\begin{aligned} S:= \left\{ \omega \in \varOmega :\left|\mathrm {col}_{j+1}(\omega )-\left( {\begin{array}{c}q\\ j+1\end{array}}\right) \frac{1}{2^{j(n-m)}}\right|\le \alpha _j\;\forall 1\le j\le t-1\, ,\, \mathrm {col}_{t+1}(\omega )\le \beta \right\} \end{aligned}$$

for \(t:=\left\lceil \frac{n+m}{n-m}\right\rceil \) and \(\alpha _1,\ldots ,\alpha _{t-1},\beta \) which are chosen to optimize the bound.

3.3 The Proof Method of Bellare and Impagliazzo

Bellare and Impagliazzo also used (9), for the set S of all \(\omega \in \varOmega \) satisfying (for suitable \(\delta \) and \(\lambda \)):

  1. For any \(1\le r\le q\),

    $$\begin{aligned} \left| \log \frac{Q^{(r)}_{\mathrm {perm}}(\omega )}{Q^{(r)}_{\mathrm {func}}(\omega )}\right| \le \frac{3\delta }{2} \end{aligned}$$

  2. For any \(1\le r\le q\),

    $$\begin{aligned} \left| \sum _{x\in V_{r-1}(\omega )}\frac{{{\Pr }_{\mathrm {func}}}(x)}{{{\Pr }_{\mathrm {func}}}(V_{r-1}(\omega ))}\log \frac{Q^{(r)}_{\mathrm {perm}}(x)}{Q^{(r)}_{\mathrm {func}}(x)}\right| \le \frac{\delta ^2}{2}, \end{aligned}$$

  3.

    $$\begin{aligned} \left| \log \frac{{{\Pr }_{\mathrm {perm}}}(\omega )}{{{\Pr }_{\mathrm {func}}}(\omega )}-\sum _{r=1}^q\sum _{x\in V_{r-1}(\omega )}\frac{{{\Pr }_{\mathrm {func}}}(x)}{{{\Pr }_{\mathrm {func}}}(V_{r-1}(\omega ))}\log \frac{Q^{(r)}_{\mathrm {perm}}(x)}{Q^{(r)}_{\mathrm {func}}(x)}\right| \le \frac{\delta (\delta +3)\lambda \sqrt{q}}{2}. \end{aligned}$$

The expression \({\Pr }_{\mathrm {func}}(\overline{S})\) is bounded by combining the Azuma inequality and the observation that for any \(1\le r\le q\),

$$\begin{aligned} 0\ge \sum _{\omega \in \varOmega }Q^{(r)}_{\mathrm {func}}(\omega )\log \frac{Q^{(r)}_{\mathrm {perm}}(\omega )}{Q^{(r)}_{\mathrm {func}}(\omega )}\ge -\frac{1}{2} \left( \max _{\omega \in S}\left|\frac{Q^{(r)}_{\mathrm {perm}}(\omega )}{Q^{(r)}_{\mathrm {func}}(\omega )}-1\right|\right) ^2. \end{aligned}$$

3.4 The Proof Method of Stam

Stam’s approach observes that by Pinsker’s inequality [8] we have

$$\begin{aligned} \mathbf{Adv}_{n,m}(q)\le & {} \frac{1}{2}\sum _{\omega \in \varOmega }\left| {\Pr }_{\mathrm {perm}}(\omega )-{\Pr }_{\mathrm {func}}(\omega )\right| \le \nonumber \\\le & {} \sqrt{\frac{1}{2}\sum _{\omega \in \varOmega }{\Pr }_{\mathrm {perm}}(\omega )\log \frac{{\Pr }_{\mathrm {perm}}(\omega )}{{\Pr }_{\mathrm {func}}(\omega )}}. \end{aligned}$$
(10)

He then uses the decomposition

$$\begin{aligned} \sum _{\omega \in \varOmega }{\Pr }_{\mathrm {perm}}(\omega )\log \frac{{\Pr }_{\mathrm {perm}}(\omega )}{{\Pr }_{\mathrm {func}}(\omega )}=\sum _{r=1}^q\sum _{\omega \in \varOmega }{\Pr }_{\mathrm {perm}}(V_{r-1}(\omega ))Q^{(r)}_{\mathrm {perm}}(\omega )\log \frac{Q^{(r)}_{\mathrm {perm}}(\omega )}{Q^{(r)}_{\mathrm {func}}(\omega )}, \end{aligned}$$

direct (exact) computations, and the concavity of the \(\log \) function.
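For tiny parameters, both sums in (10) can be evaluated by brute-force enumeration and compared with (6). This is a sketch under the assumption that the adversary makes q distinct queries, in which case the probability of a reply sequence in the permutation case does not depend on which queries were asked. The logarithm is the natural one, and the function names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product


def pr_perm(omega, n, m):
    """Probability of the reply sequence omega under a truncated random permutation."""
    p = Fraction(1)
    for r, reply in enumerate(omega):
        used = omega[:r].count(reply)  # preimages of this reply already consumed
        p *= Fraction(max(2 ** m - used, 0), 2 ** n - r)
    return p


def distances(n, m, q):
    """Return (half the L1 distance, Kullback-Leibler divergence) as in (10)."""
    pf = Fraction(1, 2 ** ((n - m) * q))  # every sequence is equally likely
    l1 = Fraction(0)
    kl = 0.0
    for omega in product(range(2 ** (n - m)), repeat=q):
        pp = pr_perm(omega, n, m)
        l1 += abs(pp - pf)
        if pp > 0:
            kl += float(pp) * math.log(float(pp / pf))
    return float(l1 / 2), kl


def stam_bound(n, m, q):
    """Left-hand bound in (6)."""
    size = 2 ** n
    num = (2 ** (n - m) - 1) * q * (q - 1)
    return 0.5 * math.sqrt(num / ((size - 1) * (size - (q - 1))))


if __name__ == "__main__":
    n, m, q = 4, 2, 4  # toy parameters: 4^4 = 256 reply sequences
    tv, kl = distances(n, m, q)
    print(tv, math.sqrt(kl / 2), stam_bound(n, m, q))
```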

4 Stam’s Bound is Sometimes Sharp

In the case \(m=n-1\) (i.e., the oracle returns only 1 bit), (6) gives

$$\begin{aligned} \mathbf{Adv}_{n,n-1}(q)\le \frac{1}{2}\sqrt{\frac{q(q-1)}{(2^n-1)(2^n-(q-1))}}\le \frac{1}{\sqrt{2-\frac{q-1}{2^{n-1}}}}\cdot \frac{q}{2^n}. \end{aligned}$$

In this section, we show that this bound is essentially sharp.

With no loss of generality, we may assume q is even and \(q\le \frac{1}{2}2^n\). We define the following adversarial algorithm.

Algorithm 1

Collect the answers (which are, in this case, just bits) of q arbitrary queries.

Compute the absolute difference \(\Delta =|k-(q-k)|\) between the number k of 0’s and the number \(q-k\) of 1’s.

If \(\Delta <\sqrt{q}/2\), guess that the oracle was using a truncated random permutation. Otherwise, guess that the oracle was using a random function.

The advantage of Algorithm 1 is

$$\begin{aligned} \sum _{|k-(q-k)|<\sqrt{q}/2}\left( {\begin{array}{c}q\\ k\end{array}}\right) \left( \frac{\prod _{i=1}^k(2^{n-1}-(i-1))\cdot \prod _{i=1}^{q-k}(2^{n-1}-(i-1))}{\prod _{i=1}^q(2^n-(i-1))}-\frac{1}{2^q}\right) =\\ =\sum _{|k-(q-k)|<\sqrt{q}/2}\left( {\begin{array}{c}q\\ k\end{array}}\right) \frac{1}{2^q}\left( \frac{\prod _{i=1}^k(2^n-2(i-1))\cdot \prod _{i=1}^{q-k}(2^n-2(i-1))}{\prod _{i=1}^q(2^n-(i-1))}-1\right) \end{aligned}$$
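The first of these two sums can be evaluated exactly with rational arithmetic for moderate n and q. The following sketch does so and, for comparison, also evaluates the upper bound (6) specialized to \(m=n-1\). The function names are illustrative, and the condition \(|k-(q-k)|<\sqrt{q}/2\) is tested in the equivalent integer form \(4(2k-q)^2<q\).

```python
import math
from fractions import Fraction


def falling(x, k):
    """Falling factorial x (x - 1) ... (x - k + 1)."""
    out = 1
    for i in range(k):
        out *= x - i
    return out


def algorithm1_advantage(n, q):
    """Exact value of the sum over k with |k - (q - k)| < sqrt(q) / 2."""
    half = 2 ** (n - 1)  # number of n-bit strings with a given first bit
    total = Fraction(0)
    for k in range(q + 1):
        if 4 * (2 * k - q) ** 2 < q:
            perm = Fraction(
                math.comb(q, k) * falling(half, k) * falling(half, q - k),
                falling(2 ** n, q),
            )
            func = Fraction(math.comb(q, k), 2 ** q)
            total += perm - func
    return total


def stam_one_bit(n, q):
    """Upper bound (6) for m = n - 1."""
    size = 2 ** n
    return 0.5 * math.sqrt(q * (q - 1) / ((size - 1) * (size - (q - 1))))


if __name__ == "__main__":
    n = 8  # toy block size, for illustration only
    for q in (16, 36, 64, 100):
        print(q, float(algorithm1_advantage(n, q)), stam_one_bit(n, q))
```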

We show that

$$\begin{aligned}&\left( {\begin{array}{c}q\\ k\end{array}}\right) \frac{1}{2^q}\ge \frac{1}{2\sqrt{q}}, \end{aligned}$$
(11)
$$\begin{aligned}&p_k:=\frac{\prod _{i=1}^k(2^n-2(i-1))\cdot \prod _{i=1}^{q-k}(2^n-2(i-1))}{\prod _{i=1}^q(2^n-(i-1))}>1+\frac{q/2}{2^n} \end{aligned}$$
(12)

for any k such that \(|k-(q-k)|<\sqrt{q}/2\). From this, we can conclude that

$$\begin{aligned} \mathbf{Adv}_{n,n-1}(q)>\sqrt{q}\frac{1}{2\sqrt{q}}\frac{q/2}{2^n}=\frac{q/4}{2^n}. \end{aligned}$$

First, note that for \(k=q/2\),

$$\begin{aligned} \left( {\begin{array}{c}q\\ q/2\end{array}}\right) \frac{1}{2^q}= & {} \frac{1}{2}\prod _{i=2}^{q/2}\frac{2i-1}{2i}\ge \frac{1}{2}\prod _{i=2}^{q/2}\frac{\sqrt{i-1}}{\sqrt{i}}=\frac{1}{\sqrt{2q}}, \end{aligned}$$
(13)
$$\begin{aligned} p_{q/2}= & {} \prod _{i=1}^{q/2}\left( 1+\frac{1}{2^n-(2i-1)}\right) \ge \left( 1+\frac{1}{\frac{1}{2}2^n}\right) ^{q/2}\ge 1+\frac{q}{2^n}. \end{aligned}$$
(14)

Since for any \(0\le j<q/2\)

$$\begin{aligned} \frac{\left( {\begin{array}{c}q\\ j\end{array}}\right) }{\left( {\begin{array}{c}q\\ j+1\end{array}}\right) }= & {} 1-\frac{q-2j-1}{q-j}>1-\frac{2(q-2j-1)}{q},\\ \frac{p_j}{p_{j+1}}= & {} 1-\frac{2(q-2j-1)}{2^n-2j}\ge 1-\frac{4(q-2j-1)}{2^n}, \end{aligned}$$

we get that for any \(\frac{q}{2}-\frac{\sqrt{q}}{4}\le k<\frac{q}{2}\)

$$\begin{aligned} \frac{\left( {\begin{array}{c}q\\ k\end{array}}\right) }{\left( {\begin{array}{c}q\\ q/2\end{array}}\right) }= & {} \prod _{j=k}^{\frac{q}{2}-1}\frac{\left( {\begin{array}{c}q\\ j\end{array}}\right) }{\left( {\begin{array}{c}q\\ j+1\end{array}}\right) }\ge \prod _{j=k}^{\frac{q}{2}-1}\left( 1-\frac{2(q-2j-1)}{q}\right) \\\ge & {} 1-\frac{2\sum _{j=k}^{\frac{q}{2}-1}(q-2j-1)}{q}=\\= & {} 1-\frac{(q-2k)^2}{2q}\ge \frac{7}{8},\\ \frac{p_k}{p_{q/2}}= & {} \prod _{j=k}^{\frac{q}{2}-1}\frac{p_j}{p_{j+1}}\ge \prod _{j=k}^{\frac{q}{2}-1}\left( 1-\frac{4(q-2j-1)}{2^n}\right) \\\ge & {} 1-\frac{4\sum _{j=k}^{\frac{q}{2}-1}(q-2j-1)}{2^n}=\\= & {} 1-\frac{(q-2k)^2}{2^n}\ge 1-\frac{q/4}{2^n}. \end{aligned}$$

Now, using (13) and (14) we get

$$\begin{aligned} \left( {\begin{array}{c}q\\ k\end{array}}\right) \frac{1}{2^q}= & {} \frac{\left( {\begin{array}{c}q\\ k\end{array}}\right) }{\left( {\begin{array}{c}q\\ q/2\end{array}}\right) }\left( {\begin{array}{c}q\\ q/2\end{array}}\right) \frac{1}{2^q}\ge \frac{7}{8}\cdot \frac{1}{\sqrt{2q}}>\frac{1}{2\sqrt{q}},\\ p_k= & {} \frac{p_k}{p_{q/2}}p_{q/2}\ge \left( 1-\frac{q/4}{2^n}\right) \left( 1+\frac{q}{2^n}\right) >1+\frac{q/2}{2^n}. \end{aligned}$$

The proof of (11) and (12) for \(\frac{q}{2}< k\le \frac{q}{2}+\frac{\sqrt{q}}{4}\) is similar.

5 An Open Problem

By combining (1), (6), and the trivial bound 1, we can conclude that the best known bound for Problem 1 is

$$\begin{aligned} \mathbf{Adv}_{n,m}(q)\le {\left\{ \begin{array}{ll}\frac{q(q-1)}{2^{n+1}}\quad &{} q< (1 + o(1)) ~ 2^{\frac{n-m}{2}}\\ \frac{1}{2}\sqrt{\frac{(2^{n-m}-1)q(q-1)}{(2^n-1)(2^n-(q-1))}}\quad &{} (1 + o(1)) ~ 2^{\frac{n-m}{2}}\le q\le (2 + o(1)) ~ 2^{\frac{n+m}{2}}\\ 1 \quad &{} (2 + o(1)) ~ 2^{\frac{n+m}{2}}< q,\end{array}\right. } \end{aligned}$$
(15)

and in a simpler form:

$$\begin{aligned} \mathbf{Adv}_{n,m}(q)=O\left( \min \left\{ \frac{q^2}{2^n},\,\frac{q}{2^{\frac{n+m}{2}}},\,1\right\} \right) . \end{aligned}$$
(16)

Figure 1 shows the graphs of the base 2 logarithm of \(\frac{q^2}{2^n}\) and \(\frac{q}{2^{\frac{n+m}{2}}}\) as a function of q, for different ranges of q, illustrating the crossover point at \(q= 2^{\frac{n-m}{2}}\).

Fig. 1

Base 2 logarithm of \(\frac{q^2}{2^n}\) (red line) and \(\frac{q}{2^{\frac{n+m}{2}}}\) (blue lines), which appear in the upper bound (16). Here, \(n=128\), \(m=n/2=64\), and the functions are plotted for low (left) and high (right) ranges of q (the scale of the horizontal axis is logarithmic). The crossover point is \(q=2^{32}\): beyond it, the “linear” term (blue line) provides a better upper bound than the “quadratic” term (red line). Note that the latter term becomes worse than the trivial bound (\(\log _2 (1) = 0\)) at \(q=2^{64}\) (Color figure online)
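The crossover can be reproduced with exact arithmetic on the exponents. The sketch below works with \(x=\log _2 q\) and returns the base 2 logarithm of each term in (16), ignoring the constant hidden in the O notation; the function names are illustrative.

```python
from fractions import Fraction


def log2_quadratic(n, x):
    """log2 of q^2 / 2^n at q = 2^x."""
    return 2 * x - n


def log2_linear(n, m, x):
    """log2 of q / 2^((n+m)/2) at q = 2^x."""
    return x - Fraction(n + m, 2)


def log2_combined(n, m, x):
    """log2 of min{q^2 / 2^n, q / 2^((n+m)/2), 1} at q = 2^x."""
    return min(log2_quadratic(n, x), log2_linear(n, m, x), 0)


def crossover_exponent(n, m):
    """The x at which the quadratic and linear terms coincide: (n - m) / 2."""
    return Fraction(n - m, 2)


if __name__ == "__main__":
    n, m = 128, 64  # the parameters used in Fig. 1
    print(crossover_exponent(n, m))
    for x in (16, 32, 48, 64, 96):
        print(x, log2_quadratic(n, x), log2_linear(n, m, x), log2_combined(n, m, x))
```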

By the lower bound in (1), we know that the bound in (16) is essentially sharp for \(m=0\). By our proof in Sect. 4, we know that the bound in (16) is essentially sharp for \(m=n-1\). The natural question that remains open is whether the bound (16) is essentially sharp for all \(0\le m<n\).

Added in proof Since the paper was submitted, the first two authors managed to resolve the above question, and to prove that the bound (16) is essentially tight. See Ref. [2].