Improving bounds on probabilistic affine tests to estimate the nonlinearity of Boolean functions

In this paper we want to estimate the nonlinearity of Boolean functions by probabilistic methods when it is computationally very expensive, or not feasible, to compute the full Walsh transform (which is the case for almost all functions in a large number of variables, say more than 30). Firstly, we significantly improve upon the bounds of Zhang and Zheng (1999) on the probabilities of failure of affinity tests based on nonhomomorphicity; in particular, we prove a new lower bound that we had previously conjectured. This new lower bound generalizes the one of Bellare et al. (IEEE Trans. Inf. Theory 42(6), 1781–1795, 1996) to nonhomomorphicity tests of arbitrary order. Secondly, we prove bounds on the probability of failure of a proposed affinity test that uses the BLR linearity test. All these bounds are expressed in terms of the function's nonlinearity, and we exploit this to provide probabilistic methods for estimating the nonlinearity based upon these affinity tests. We analyze our estimates and conclude that they have reasonably good accuracy, particularly when the nonlinearity is low.


Introduction and motivation
Boolean functions are defined on a vector space over the binary finite field $\mathbb{F}_2$ with output in $\mathbb{F}_2$. For many cryptographic applications it is important that functions are not affine, and not even close (with respect to the Hamming distance, defined in (3)) to being affine. The nonlinearity of a function $f$, denoted $d_A(f)$ and defined as the minimum Hamming distance to any affine function, is therefore an important cryptographic property. This indicator can be computed by using the Walsh transform (also called the Walsh-Hadamard or discrete Fourier transform). The Walsh transform of a function $f$ in $n$ variables can be computed from its truth table by an algorithm similar to the fast Fourier transform in time $O(n 2^n)$. Computing the Walsh transform is not feasible in practice when the number of variables is large and the function is given as a "black box" (or given by an algorithm or formula which is not amenable to simple manipulation for the purpose of computing the Walsh transform). For example, it is not feasible for functions in 80 variables, and functions which model an output of a stream or block cipher as a function of the key have a number of variables equal to the length of the key, i.e. at least 80.
The motivation of this paper is to probabilistically estimate the nonlinearity of $f$ to a reasonable degree of accuracy. The main idea is as follows. Consider a probabilistic test (we will see some examples shortly) which has a success/fail outcome based on the values of $f$ at some fixed number $k$ of points in $\mathbb{F}_2^n$ (so $f$ can be given as a "black box" function). Denote by $T(f)$ the probability of failing the test, with the probability taken over all possible choices of the $k$ inputs in $\mathbb{F}_2^n$. We assume that $T(f)$ is positively correlated, to some extent, with the nonlinearity $d_A(f)$, and that it can be bounded by some functions of $d_A(f)$, say $\mathrm{Lower}(d_A(f)) \le T(f) \le \mathrm{Upper}(d_A(f))$. If we can obtain $T(f)$ with reasonable accuracy by practical statistical testing (e.g. a binomial proportion confidence interval), we can then estimate the nonlinearity as
$$d_A(f) \in \bigl[\min \mathrm{Upper}^{-1}(T(f)),\ \max \mathrm{Lower}^{-1}(T(f))\bigr], \tag{1}$$
where we use $F^{-1}(x)$ to denote the preimage of $x$ under $F$, or, if each preimage has only one element,
$$d_A(f) \in \bigl[\mathrm{Upper}^{-1}(T(f)),\ \mathrm{Lower}^{-1}(T(f))\bigr]. \tag{2}$$
To obtain an accurate estimate, it is important that $T(f)$ depend strongly on $d_A(f)$ and that the bounds be tight. We will examine several probabilistic tests, improve some of the existing bounds, and analyze the accuracy of the resulting estimation.
The most commonly used linearity test is based on the textbook definition of a linear function, namely $f(u + v) = f(u) + f(v)$ (it is often called the BLR test, after [3]): we pick $u, v \in \mathbb{F}_2^n$ uniformly at random, compute $u + v$, query the black box to obtain $f(u)$, $f(v)$, $f(u + v)$, and check whether this condition holds. If $f$ passes this test for many pairs $(u, v)$, then $f$ is probably linear. If $f$ fails the test for at least one pair, then $f$ is certainly not linear. We denote by $P_2(f)$ the probability of $f$ failing the test (with the probability taken over all pairs $(u, v) \in \mathbb{F}_2^n \times \mathbb{F}_2^n$) and by $d_L(f)$ the normalized Hamming distance of $f$ to the closest linear function. Several authors have determined upper and lower bounds for $P_2(f)$ as a function of $d_L(f)$ (see [1, 9] and the references therein).
For cryptographic applications we are not so much interested in whether the function is linear, but rather whether it is affine. For example, such tests play a crucial role in the cube and AIDA attacks (see [6, 12]), which are refined high-order differential attacks targeted at primitives in stream and block ciphers based on low-degree components. The probabilistic test used in [6] for deciding whether a function $f$ is affine is to check whether $f(u + w) + f(u) + f(w) + f(0) = 0$ holds (for $u, w$ chosen uniformly at random), which can be viewed as using the BLR test to check whether $f(u) - f(0)$ is linear. The functions of interest $f$ are functions in many variables (typically at least 80), obtained as higher-order derivatives of a function $g$ which describes, for example, the first output bit of the stream cipher as a function of the key and initialisation vector. Although explicit algorithms are available for computing $g$ and $f$ (in the case of the Trivium cipher, the algorithm for computing $g$ starts with some relatively simple functions of algebraic degree two, which are iteratively composed 1152 times for the full cipher, or about 700 times for reduced versions of the cipher), it is not feasible in practice to compute their algebraic normal form, truth table, nonlinearity or Walsh transform. Instead, $g$ is treated as a "black box" function, and $f$ can be evaluated at any given input using several calls to $g$.
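Both the BLR test and the affinity check of [6] need only oracle access to $f$. As an illustration (our own sketch in Python, not code from the paper or from [6]), the functions below estimate the two failure rates by sampling. They assume that vectors of $\mathbb{F}_2^n$ are encoded as $n$-bit integers, so that addition in $\mathbb{F}_2^n$ is bitwise XOR; all names are hypothetical.

```python
import random
from typing import Callable

# Hypothetical black-box interface: a vector of F_2^n is encoded as an n-bit
# integer (so vector addition is XOR) and f returns 0 or 1.
BlackBox = Callable[[int], int]


def blr_failure_rate(f: BlackBox, n: int, trials: int, rng: random.Random) -> float:
    """Estimate P_2(f): the fraction of random pairs (u, v) with f(u + v) != f(u) + f(v)."""
    failures = 0
    for _ in range(trials):
        u, v = rng.getrandbits(n), rng.getrandbits(n)
        if f(u ^ v) != f(u) ^ f(v):  # addition in F_2 is XOR as well
            failures += 1
    return failures / trials


def affine_check_failure_rate(f: BlackBox, n: int, trials: int, rng: random.Random) -> float:
    """Estimate how often f(u + w) + f(u) + f(w) + f(0) = 0 fails, i.e. the BLR
    test applied to u -> f(u) - f(0)."""
    f0 = f(0)
    failures = 0
    for _ in range(trials):
        u, w = rng.getrandbits(n), rng.getrandbits(n)
        if f(u ^ w) ^ f(u) ^ f(w) ^ f0:
            failures += 1
    return failures / trials


def parity(x: int) -> int:
    """Parity of the bits of x, so that a . u = parity(a & u)."""
    return bin(x).count("1") & 1


if __name__ == "__main__":
    rng = random.Random(2024)
    n = 80
    a = rng.getrandbits(n)

    def linear(u: int) -> int:
        return parity(a & u)

    def affine(u: int) -> int:
        return parity(a & u) ^ 1

    print(blr_failure_rate(linear, n, 1000, rng))           # linear functions never fail
    print(blr_failure_rate(affine, n, 1000, rng))           # linear + 1 always fails BLR
    print(affine_check_failure_rate(affine, n, 1000, rng))  # but passes the affinity check
```

The three printed rates are 0.0, 1.0 and 0.0: a linear function plus 1 always fails the BLR test yet always passes the check of [6], which is exactly why an affinity test has to account for the constant term.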
Another test used in the literature for deciding whether a Boolean function is affine is to check whether the equation $f(u + v + w) = f(u) + f(v) + f(w)$ holds for $u, v, w \in \mathbb{F}_2^n$ chosen uniformly at random. As in the case of the linearity test, if $f$ passes the test for many triples $(u, v, w)$, then $f$ is probably affine. We denote by $P_3(f)$ the probability of $f$ failing this affinity test (with the probability taken over all triples $(u, v, w) \in \mathbb{F}_2^{3n}$). As for the linearity tests, a natural question is whether $P_3(f)$ is related to $d_A(f)$, the distance to the closest affine function (note that this is the nonlinearity of $f$). A lower bound for $P_3(f)$ in terms of $d_A(f)$ was given by Bellare et al. [1].
A generalization of the tests above was proposed by Zhang and Zheng in [14], where the authors defined the $(k + 1)$-st order nonhomomorphicity of a function $f$ as the probability $P_k(f)$ of failing the test $f(u_1 + \dots + u_k) = f(u_1) + \dots + f(u_k)$, with the probability taken over all tuples $(u_1, \dots, u_k) \in (\mathbb{F}_2^n)^k$ (see Definition 1). It was shown that for $k$ odd, $f$ is affine if and only if $P_k(f) = 0$; for $k$ even, $f$ is linear if and only if $P_k(f) = 0$, and $f$ is affine if and only if $P_k(f) \in \{0, 1\}$. Furthermore, some bounds on $P_k(f)$ in terms of $d_A(f)$, for $k$ odd, were given in [14].
In this paper, we firstly improve both the upper and lower bounds presented by Zhang and Zheng in [14] for P k (f ) with k odd (see Sections 3 and 4).Our lower bound holds for arbitrary k and generalizes the lower bound proven in [1] for k = 2, 3.The proofs use the techniques employed in [1] as well as additional combinatorial manipulation.We also prove the lower bound we conjectured in [11].
Secondly, we consider the following probabilistic test for affine functions. We can use any probabilistic linearity test to test whether $f$ is linear or $f + 1$ is linear; if either of these holds, then $f$ is affine. For the nonhomomorphicity test with $k$ even, this is equivalent to testing whether $P_k(f) \in \{0, 1\}$. The fact that $f$ is affine if and only if $P_k(f) \in \{0, 1\}$ was proven in [14]. However, when $f$ is not affine, no results were given regarding how the probability of failing this test depends on the nonlinearity of $f$. In Section 5 we show that upper and lower bounds can be obtained for the value of $\min(P_k(f), P_k(f + 1))$ in the case $k = 2$ (i.e. the BLR test). Namely, using the bounds from [1] on failing the BLR linearity test, which depend on the distance to the closest linear function, we show that similar bounds hold for $\min(P_2(f), P_2(f + 1))$, but this time the bounds depend on the distance to the closest affine function. We also show that the refinements of the bounds from [1] given in [9] apply to our bounds too.
The nonlinearity of a function $f$ can be estimated by first using any of the above tests together with a practical statistical method to estimate the probability of failing that test (as demonstrated in [14]). Then, using (1) or (2), we obtain an estimate for the nonlinearity of $f$. In Section 6 we analyze the accuracy of the estimation. There are functions $f, g$ such that $f$ has a higher probability of failing the test than $g$, even though $f$ has lower nonlinearity than $g$. This was shown in [2] for the BLR test and in [14] for the tests based on the $(k + 1)$-st order nonhomomorphicity with $k$ odd. However, the estimates get more accurate as $k$ increases. For example, for $k = 7$, for any given value of $P_7(f)$ we can estimate the nonlinearity as being within an interval of length 0.011 or less if $P_7(f) \le 0.49$, and of length 0.053 or less if $P_7(f) > 0.49$.
Other nonlinearity tests have been proposed to reduce the number of evaluations of the black box function, such as [7, 13] (the latter also being useful to estimate the algebraic degree of $f$). The $(k + 1)$-st order nonhomomorphicity for $k = 3$ was used for attacks on actual ciphers in [10]. We intend to push further the connection between the probability of failing these tests and the nonlinearity, as well as to look at estimating the nonlinearity of functions of cryptographic interest.

Preliminaries
We recall definitions and known results needed for the rest of the paper.
Throughout, $n$ will denote a positive integer. Boolean functions in $n$ variables are functions $f: \mathbb{F}_2^n \to \mathbb{F}_2$, where $\mathbb{F}_2$ is the binary field and $\mathbb{F}_2^n$ is the $n$-dimensional vector space over $\mathbb{F}_2$. It is well known that any such function can be uniquely represented in its ANF (Algebraic Normal Form), i.e. as a polynomial in $\mathbb{F}_2[x_1, \dots, x_n]$ of degree at most 1 in each variable. The total degree of the ANF representation is called the algebraic degree of $f$. Functions of algebraic degree at most one are called affine; affine functions with zero constant term are called linear. We will denote by $A$ the set of affine functions and by $L$ the set of linear functions in $n$ variables over $\mathbb{F}_2$, when the dimension is understood from the context.
In this paper, as in [1], it will be convenient to use the normalized versions of the Hamming distance and weight. More precisely, we define the (normalized) Hamming weight and Hamming distance for vectors $a = (a_1, \dots, a_t)$ and $b = (b_1, \dots, b_t)$ in $\mathbb{F}_2^t$, as well as the distance of a vector $a \in \mathbb{F}_2^t$ to a set of vectors $S \subseteq \mathbb{F}_2^t$, as:
$$w(a) = \frac{|\{i : a_i \ne 0\}|}{t}, \qquad d(a, b) = \frac{|\{i : a_i \ne b_i\}|}{t}, \qquad d(a, S) = \min_{s \in S} d(a, s). \tag{3}$$
In the literature, the Hamming weight and distance are more often used without normalization (i.e. in the definitions above, one does not divide by the length of the vector), but we will explain shortly why normalization is useful for our purpose. The truth table of a function $f$ is the vector $TT(f) = (f(v_1), \dots, f(v_{2^n}))$, where $v_1, \dots, v_{2^n}$ are all the elements of $\mathbb{F}_2^n$ in some fixed order, e.g. lexicographical order. The (normalized) Hamming weight of a Boolean function $f$, denoted by $w(f)$, is $w(TT(f))$, and the distance between two Boolean functions $f, g$, denoted by $d(f, g)$, is $d(TT(f), TT(g))$.
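The normalized quantities in (3) are straightforward to compute on truth tables. The following Python sketch (helper names are ours) lists inputs by their integer encoding and computes the distance to the set of affine functions by brute force over all $2^{n+1}$ of them, which is only practical for small $n$.

```python
from itertools import product


def weight(a):
    """Normalized Hamming weight w(a) = |{i : a_i != 0}| / t."""
    return sum(1 for bit in a if bit) / len(a)


def distance(a, b):
    """Normalized Hamming distance d(a, b) = |{i : a_i != b_i}| / t."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def distance_to_set(a, vectors):
    """d(a, S) = min over s in S of d(a, s)."""
    return min(distance(a, s) for s in vectors)


def truth_table(f, n):
    """TT(f), with the inputs v in F_2^n listed as the integers 0, 1, ..., 2^n - 1."""
    return [f(v) for v in range(2 ** n)]


def affine_truth_tables(n):
    """Truth tables of all 2^(n+1) affine functions u -> a . u + c."""
    return [truth_table(lambda u, a=a, c=c: (bin(a & u).count("1") + c) % 2, n)
            for a, c in product(range(2 ** n), (0, 1))]


if __name__ == "__main__":
    # x1 x2 viewed as a function of three variables (bit i of v holds x_{i+1})
    tt = truth_table(lambda v: (v & 1) & ((v >> 1) & 1), 3)
    print(weight(tt), distance_to_set(tt, affine_truth_tables(3)))
```

For $x_1x_2$ in three variables both printed values are 0.25: its weight, and its nonlinearity $d(f, A)$.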
Of particular importance will be the distance of a function $f$ to the set of affine or of linear functions. The minimum distance to any affine function, $d_A(f) = d(f, A)$, is called the (normalized) nonlinearity of $f$ and is a very important cryptographic indicator; similarly, we write $d_L(f) = d(f, L)$. It is easy to see that $d_A(f) = \min(d_L(f), d_L(f + 1))$. Our motivation for using the normalized version of nonlinearity (based on the normalized Hamming distance) is that it allows a meaningful comparison of the nonlinearity of two functions which might not have the same number of variables.
The Fourier-Hadamard transform of a function $f: \mathbb{F}_2^n \to \mathbb{R}$ (the 0/1 values of a Boolean function are viewed as real numbers for this purpose) is the function $W(f): \mathbb{F}_2^n \to \mathbb{R}$ defined by
$$W(f)(v) = \frac{1}{2^n} \sum_{u \in \mathbb{F}_2^n} f(u)\, (-1)^{u \cdot v},$$
where the dot product is defined as $u \cdot v = \sum_{i=1}^n u_i v_i$. Note that we use a normalized version of the transform here. If $f$ is replaced by its sign function $\hat f$, defined by $\hat f(u) = (-1)^{f(u)}$, then $W(\hat f)$ is customarily referred to as the Walsh (or Walsh-Hadamard) transform of $f$, and the values $W(\hat f)(v)$ for $v \in \mathbb{F}_2^n$ are called the Walsh coefficients. We will refer to the sequence of output values of the Walsh transform (when the input is ordered lexicographically) as the Walsh spectrum.
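We will later use Parseval's identity (see [5] for example):
$$\sum_{v \in \mathbb{F}_2^n} W(\hat f)(v)^2 = 1, \tag{4}$$
which holds for any Boolean function $f$. It is well known [5], and easy to see, that the Walsh transform of a Boolean function $f$ expresses its distance to the set of linear functions, and consequently the distance of $f$ to the set of affine functions. Denoting $\ell_a(u) = a \cdot u$, we have $W(\hat f)(a) = 1 - 2d(f, \ell_a)$, so the nonlinearity of $f$ is related to the Walsh transform as follows:
$$d_L(f) = \frac{1 - \max_{a} W(\hat f)(a)}{2}, \qquad d_A(f) = \frac{1 - \max_{a} |W(\hat f)(a)|}{2}. \tag{5}$$
We call a function $f: \mathbb{F}_2^n \to \mathbb{F}_2$ bent if its nonlinearity attains the maximum possible value $d_A(f) = \frac12 - 2^{-n/2-1}$ (bent functions exist only for even integers $n$). It is known [5] that $f$ is bent if and only if the absolute values of all of its Walsh coefficients satisfy $|W(\hat f)(v)| = 2^{-n/2}$.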
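For small $n$, (5) gives the nonlinearity directly from the Walsh spectrum. A minimal Python sketch, assuming the truth table is indexed by the integer encoding of the input (bit $i$ of the index holding $x_{i+1}$), uses the usual in-place butterfly of the $O(n2^n)$ fast Walsh-Hadamard transform and applies the normalization by $2^{-n}$ used here; the function names are illustrative.

```python
def walsh_spectrum(tt):
    """Normalized Walsh transform W(f^)(v) = 2^-n * sum_u (-1)^(f(u) + u.v).

    tt is the truth table of f, indexed by the integer encoding of u.
    """
    size = len(tt)
    if size == 0 or size & (size - 1):
        raise ValueError("truth table length must be a power of two")
    w = [1 - 2 * bit for bit in tt]  # sign function f^(u) = (-1)^f(u)
    h = 1
    while h < size:  # one butterfly stage per variable: n stages in total
        for start in range(0, size, 2 * h):
            for i in range(start, start + h):
                x, y = w[i], w[i + h]
                w[i], w[i + h] = x + y, x - y
        h *= 2
    return [value / size for value in w]


def nonlinearity(tt):
    """d_A(f) = (1 - max_v |W(f^)(v)|) / 2, following (5)."""
    return (1 - max(abs(c) for c in walsh_spectrum(tt))) / 2


def distance_to_linear(tt):
    """d_L(f) = (1 - max_v W(f^)(v)) / 2, following (5)."""
    return (1 - max(walsh_spectrum(tt))) / 2


if __name__ == "__main__":
    # bent function x1x2 + x3x4 in n = 4 variables
    bent = [((v & 1) & ((v >> 1) & 1)) ^ (((v >> 2) & 1) & ((v >> 3) & 1)) for v in range(16)]
    print(nonlinearity(bent))
```

For the bent function $x_1x_2 + x_3x_4$ every coefficient has absolute value $2^{-2}$ and the sketch prints 0.375, which is $\frac12 - 2^{-n/2-1}$ for $n = 4$.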

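Definition 1 ([14]) Let $f: \mathbb{F}_2^n \to \mathbb{F}_2$ be a Boolean function in $n$ variables and let $k \ge 2$ be an integer. The $(k + 1)$-st order nonhomomorphicity of $f$, denoted $P_k(f)$, is defined as the probability that the equation
$$f(u_1 + \dots + u_k) = f(u_1) + \dots + f(u_k)$$
does not hold, with the probability taken over all tuples $(u_1, \dots, u_k) \in \mathbb{F}_2^{kn}$, i.e.
$$P_k(f) = \frac{\bigl|\{(u_1, \dots, u_k) \in \mathbb{F}_2^{kn} : f(u_1 + \dots + u_k) \ne f(u_1) + \dots + f(u_k)\}\bigr|}{2^{kn}}.$$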
In other words, $P_k(f)$ is the normalized Hamming weight of the function $(u_1, \dots, u_k) \mapsto f(u_1 + \dots + u_k) + f(u_1) + \dots + f(u_k)$ in $kn$ variables. Note that the BLR test corresponds to the particular case $k = 2$.
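Definition 1 can be evaluated exactly for tiny parameters, or estimated from black-box queries with $k + 1$ evaluations of $f$ per trial. The Python sketch below (names are ours; inputs are encoded as integers so that addition in $\mathbb{F}_2^n$ is XOR) does both.

```python
import random
from itertools import product


def nonhomomorphicity_exact(tt, n, k):
    """P_k(f) as in Definition 1, enumerating all 2^(kn) tuples (tiny n and k only).

    tt is the truth table of f, indexed by the integer encoding of u in F_2^n.
    """
    failures = 0
    for us in product(range(2 ** n), repeat=k):
        total, rhs = 0, 0
        for u in us:
            total ^= u      # u_1 + ... + u_k in F_2^n
            rhs ^= tt[u]    # f(u_1) + ... + f(u_k) in F_2
        if tt[total] != rhs:
            failures += 1
    return failures / 2 ** (k * n)


def nonhomomorphicity_estimate(f, n, k, trials, rng):
    """Monte Carlo estimate of P_k(f) using only black-box queries to f."""
    failures = 0
    for _ in range(trials):
        total, rhs = 0, 0
        for _ in range(k):
            u = rng.getrandbits(n)
            total ^= u
            rhs ^= f(u)
        if f(total) != rhs:
            failures += 1
    return failures / trials
```

With the truth table `[0, 0, 0, 1]` of $x_1x_2$, `nonhomomorphicity_exact` returns 0.375 for both $k = 2$ and $k = 3$; for the constant function 1 it returns 1 when $k = 2$ and 0 when $k = 3$, in line with the characterizations of affine functions recalled in the introduction.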

Improved bounds on the probability of failure of existing affinity tests
We consider the test of whether a function is affine by checking whether $f(u_1 + \dots + u_k) = f(u_1) + \dots + f(u_k)$ for some fixed odd integer $k$. We examine the relationship between the probability $P_k(f)$ of failing this test and $d_A(f)$, the nonlinearity of $f$. It is well known, and easy to prove, that $f$ is affine if and only if $P_k(f) = 0$.
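A lower bound (6) for $P_3(f)$ was proven in [1, Lemma 5.1], with $x = d_A(f)$. Lower and upper bounds (7) were given in [14] for $k$ odd (we reformulated them to use the normalized version). We improve on the bounds (7) as follows.
Theorem 2 Let $f: \mathbb{F}_2^n \to \mathbb{F}_2$ and let $k \ge 2$ be an integer. Then
$$P_k(f) \ge H_k(d_A(f)), \quad \text{where } H_k(x) = \frac12\bigl(1 - (1 - 2x)^{k-1}\bigr). \tag{8}$$
For $k$ odd we have the upper bound
$$P_k(f) \le \mathrm{Upper}_k(d_A(f)), \quad \text{where } \mathrm{Upper}_k(x) = \frac12\bigl(1 - (1 - 2x)^{k+1}\bigr). \tag{9}$$
If we allow the bound to also depend on $n$, we have the improved bound
$$P_k(f) \le \mathrm{Upper}_{n,k}(d_A(f)), \quad \text{where } \mathrm{Upper}_{n,k}(x) = \frac12\left(1 - (1 - 2x)^{k+1} - \frac{\bigl(1 - (1 - 2x)^2\bigr)^{(k+1)/2}}{(2^n - 1)^{(k-1)/2}}\right). \tag{10}$$
Proof In [8, Theorem 3.1], [14, Theorem 2] (and, for $k \le 3$, in [1]) the following expression for $P_k$ is obtained (we reformulate it for the normalized versions of the Walsh transform and nonhomomorphicity):
$$P_k(f) = \frac12\Bigl(1 - \sum_{v \in \mathbb{F}_2^n} W(\hat f)(v)^{k+1}\Bigr), \tag{11}$$
where $\hat f(x) = (-1)^{f(x)}$ and $W(\hat f)$ is the Walsh transform of $f$.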
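In order to obtain the lower bound in the statement, we need an upper bound on the sum $\sum_{v} W(\hat f)(v)^{k+1}$, which we obtain by a technique similar to the one of [1]:
$$\sum_{v} W(\hat f)(v)^{k+1} \le \max_{v} |W(\hat f)(v)|^{k-1} \sum_{v} W(\hat f)(v)^2 = \max_{v} |W(\hat f)(v)|^{k-1},$$
where the last equality uses Parseval's identity (4). Using (5), we obtain (8) as follows:
$$P_k(f) \ge \frac12\Bigl(1 - \max_{v} |W(\hat f)(v)|^{k-1}\Bigr) = \frac12\bigl(1 - (1 - 2d_A(f))^{k-1}\bigr).$$
When $k$ is odd (so the exponent $k + 1$ is even), all the terms in the sum in (11) are non-negative, so a simple lower bound for this sum is its largest term, $\max_v W(\hat f)(v)^{k+1} = (1 - 2d_A(f))^{k+1}$, which gives the upper bound (9). For the bound (10), let $v_0$ be such that $|W(\hat f)(v_0)| = 1 - 2d_A(f)$ is maximal. By Parseval's identity, the remaining $2^n - 1$ coefficients satisfy $\sum_{v \ne v_0} W(\hat f)(v)^2 = 1 - (1 - 2d_A(f))^2$. Recall that the power means inequality states that for any integers $m \ge 1$, $j \ge 2$ and any non-negative real numbers $a_1, \dots, a_m$ we have $\frac1m \sum_{i=1}^m a_i^j \ge \bigl(\frac1m \sum_{i=1}^m a_i\bigr)^j$ (see for example [4, Chapter III]). Applying it with $m = 2^n - 1$, $j = (k+1)/2$ and the values $a_i = W(\hat f)(v)^2$ for $v \ne v_0$, we obtain
$$\sum_{v} W(\hat f)(v)^{k+1} \ge (1 - 2d_A(f))^{k+1} + \frac{\bigl(1 - (1 - 2d_A(f))^2\bigr)^{(k+1)/2}}{(2^n - 1)^{(k-1)/2}}.$$
Substituting this in (11), we obtain (10).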
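Formula (11) and the bounds of Theorem 2 can be checked numerically on small random functions. The sketch below (our own, in Python) compares $P_k(f) = \frac12(1 - \sum_v W(\hat f)(v)^{k+1})$ with direct enumeration, and tests $H_k(d_A(f)) \le P_k(f) \le \mathrm{Upper}_{n,k}(d_A(f)) \le \mathrm{Upper}_k(d_A(f))$ for odd $k$, with $H_k(x) = \frac12(1 - (1-2x)^{k-1})$, $\mathrm{Upper}_k(x) = \frac12(1 - (1-2x)^{k+1})$ and the power-means refinement $\mathrm{Upper}_{n,k}$. It is a sanity check, not a proof.

```python
import random
from itertools import product


def walsh_spectrum(tt):
    """Normalized Walsh transform W(f^)(v) via the fast Walsh-Hadamard butterfly."""
    w = [1 - 2 * b for b in tt]
    h = 1
    while h < len(w):
        for start in range(0, len(w), 2 * h):
            for i in range(start, start + h):
                w[i], w[i + h] = w[i] + w[i + h], w[i] - w[i + h]
        h *= 2
    return [c / len(w) for c in w]


def nonlinearity(tt):
    return (1 - max(abs(c) for c in walsh_spectrum(tt))) / 2


def p_k_walsh(tt, k):
    """P_k(f) via (11)."""
    return (1 - sum(c ** (k + 1) for c in walsh_spectrum(tt))) / 2


def p_k_brute(tt, n, k):
    """P_k(f) straight from Definition 1 (enumerates 2^(kn) tuples)."""
    fails = 0
    for us in product(range(2 ** n), repeat=k):
        s = r = 0
        for u in us:
            s ^= u
            r ^= tt[u]
        fails += tt[s] != r
    return fails / 2 ** (k * n)


def H(x, k):
    """Lower bound (8)."""
    return (1 - (1 - 2 * x) ** (k - 1)) / 2


def upper(x, k):
    """Upper bound (9), k odd."""
    return (1 - (1 - 2 * x) ** (k + 1)) / 2


def upper_n(x, n, k):
    """Upper bound (10), k odd."""
    rest = 1 - (1 - 2 * x) ** 2  # sum of squares of the non-maximal coefficients
    return (1 - (1 - 2 * x) ** (k + 1)
            - rest ** ((k + 1) // 2) / (2 ** n - 1) ** ((k - 1) // 2)) / 2


if __name__ == "__main__":
    rng = random.Random(0)
    tt = [rng.randint(0, 1) for _ in range(2 ** 5)]
    d = nonlinearity(tt)
    print(d, H(d, 3), p_k_walsh(tt, 3), upper_n(d, 5, 3), upper(d, 3))
```

The printed values should appear in non-decreasing order after the first one, as Theorem 2 predicts.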
Note that for k ≥ 3 odd, the bounds in the theorem above are better than the bounds (7).
The lower bound in (7) is not positive for part of the range of $d_A(f)$, so it does not provide any useful information in that range. When it is positive, it is still always smaller than the lower bound in (8), only reaching equality when $d_A(f)$ attains its maximum value, namely $\frac12\bigl(1 - 2^{-n/2}\bigr)$. The upper bound in (7) does not depend on $d_A(f)$, whereas the one in (10) increases continuously from 0 to $\frac12\bigl(1 - 2^{-n(k-1)/2}\bigr)$.
We examine the tightness of the bounds in Theorem 2. The upper bound (9) cannot be reached (except in the trivial case $d_A(f) = 0$ and $P_k(f) = 0$) because $\mathrm{Upper}_{n,k}(x) < \mathrm{Upper}_k(x)$ for $0 < x < 0.5$. Note, however, that $\mathrm{Upper}_k(x)$ is the limit of $\mathrm{Upper}_{n,k}(x)$ as $n$ approaches infinity. We found experimentally functions $f$ for which $P_k(f)$ is very close to the upper bound $\mathrm{Upper}_k(d_A(f))$, with $d_A(f)$ covering many values throughout the interval $(0, 0.5)$; see the last graph in the Appendix. We therefore suspect that this upper bound cannot be improved much (as a bound which is independent of $n$).
The examples below present functions for which the upper bound (10) as well as the lower bound in Theorem 2 are attained.
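Example 3 For $n$ even and $k$ odd, consider a bent function in $n$ variables, for example $f(x_1, \dots, x_n) = x_1x_2 + x_3x_4 + \dots + x_{n-1}x_n$. Its nonlinearity achieves the maximum possible value for a function in $n$ variables, namely $d_A(f) = \frac12 - 2^{-n/2-1}$, and all its Walsh coefficients satisfy $|W(\hat f)(v)| = 2^{-n/2}$. Using (11) we can compute
$$P_k(f) = \frac12\bigl(1 - 2^n \cdot 2^{-n(k+1)/2}\bigr) = \frac12\bigl(1 - 2^{-n(k-1)/2}\bigr).$$
Note that in this case $H_k(d_A(f)) = \mathrm{Upper}_{n,k}(d_A(f)) = \frac12\bigl(1 - 2^{-n(k-1)/2}\bigr)$, so both the lower bound (8) and the upper bound (10) are attained.
Example 4 Consider the function $f(x_1, \dots, x_n) = x_1 x_2 \cdots x_n$. The Walsh coefficients can be easily computed directly from the definition: $W(\hat f)(0) = 1 - \frac{1}{2^{n-1}}$, $W(\hat f)(v) = \frac{1}{2^{n-1}}$ for each of the $2^{n-1}$ vectors $v$ of odd weight, and the $2^{n-1} - 1$ nonzero vectors $v$ of even weight give the value of the opposite sign, $-\frac{1}{2^{n-1}}$. The nonlinearity is $d_A(f) = \frac{1}{2^n}$. Using (11) we have, for $k$ odd,
$$P_k(f) = \frac12\Bigl(1 - \bigl(1 - \tfrac{1}{2^{n-1}}\bigr)^{k+1} - \frac{2^n - 1}{2^{(n-1)(k+1)}}\Bigr).$$
One can verify that in this case $P_k(f) = \mathrm{Upper}_{n,k}(d_A(f))$, so the upper bound (10) is attained.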
Example 5 Consider an arbitrary function in $m$ variables, $f(x_1, \dots, x_m)$. We can view it as a function $\tilde f$ in a larger number $n$ of variables, for any $n \ge m$, by defining $\tilde f(x_1, \dots, x_n) = f(x_1, \dots, x_m)$. We show that $f$ and $\tilde f$ have the same nonlinearity and that $P_k(\tilde f) = P_k(f)$. To this end, we examine the Walsh transform. Denoting $x' = (x_1, \dots, x_m)$ and $x'' = (x_{m+1}, \dots, x_n)$, as well as $y' = (y_1, \dots, y_m)$ and $y'' = (y_{m+1}, \dots, y_n)$, we have
$$W(\hat{\tilde f})(y', y'') = \begin{cases} W(\hat f)(y') & \text{if } y'' = 0,\\ 0 & \text{otherwise,}\end{cases}$$
by using [5, Lemma 2.9]. Therefore we conclude that $d_A(\tilde f) = d_A(f)$ using (5); also $P_k(\tilde f) = P_k(f)$ using (11).
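We now consider the function $f(x_1, \dots, x_n) = x_1x_2 + x_3x_4 + \dots + x_{m-1}x_m$ with $m$ even and $m \le n$, and let $k$ be odd. Using the argument above and the computation in Example 3, we know that $d_A(f) = \frac12 - 2^{-m/2-1}$ and $P_k(f) = \frac12\bigl(1 - 2^{-m(k-1)/2}\bigr) = H_k(d_A(f))$, so this function reaches the lower bound in Theorem 2 as well.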
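The examples can be reproduced numerically. The following Python sketch (names are ours) builds the bent function $x_1x_2 + x_3x_4$ and the monomial $x_1x_2x_3x_4$ in four variables and compares $P_k(f)$, computed from (11), with $H_k(x) = \frac12(1-(1-2x)^{k-1})$ and with the $n$-dependent upper bound $\mathrm{Upper}_{n,k}$; it also allows checking the invariance of Example 5 by viewing a function of few variables as a function of more variables.

```python
def walsh_spectrum(tt):
    """Normalized Walsh transform W(f^)(v) via the fast Walsh-Hadamard butterfly."""
    w = [1 - 2 * b for b in tt]
    h = 1
    while h < len(w):
        for start in range(0, len(w), 2 * h):
            for i in range(start, start + h):
                w[i], w[i + h] = w[i] + w[i + h], w[i] - w[i + h]
        h *= 2
    return [c / len(w) for c in w]


def nonlinearity(tt):
    return (1 - max(abs(c) for c in walsh_spectrum(tt))) / 2


def p_k(tt, k):
    """P_k(f) via (11)."""
    return (1 - sum(c ** (k + 1) for c in walsh_spectrum(tt))) / 2


def H(x, k):
    """Lower bound (8)."""
    return (1 - (1 - 2 * x) ** (k - 1)) / 2


def upper_n(x, n, k):
    """Upper bound (10) for k odd."""
    rest = 1 - (1 - 2 * x) ** 2
    return (1 - (1 - 2 * x) ** (k + 1)
            - rest ** ((k + 1) // 2) / (2 ** n - 1) ** ((k - 1) // 2)) / 2


def bent_sum(n):
    """x1x2 + x3x4 + ... + x_{n-1}x_n for even n; bit i of the index v holds x_{i+1}."""
    return [sum((v >> i) & (v >> (i + 1)) & 1 for i in range(0, n, 2)) % 2
            for v in range(2 ** n)]


def monomial(m, n):
    """x1 x2 ... xm, viewed as a function of n >= m variables (as in Example 5)."""
    mask = (1 << m) - 1
    return [1 if (v & mask) == mask else 0 for v in range(2 ** n)]


if __name__ == "__main__":
    for k in (3, 5):
        bent = bent_sum(4)
        d = nonlinearity(bent)
        print(k, d, p_k(bent, k), H(d, k), upper_n(d, 4, k))  # last three agree
        mono = monomial(4, 4)
        d = nonlinearity(mono)
        print(k, d, p_k(mono, k), upper_n(d, 4, k))  # last two agree
```

Up to floating-point rounding, the bent function meets both $H_k$ and $\mathrm{Upper}_{n,k}$, and the monomial meets $\mathrm{Upper}_{n,k}$ at $d_A(f) = \frac{1}{16}$.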
Summarising the examples above, for each fixed number of variables $n$ and each odd $k$, the upper bound $\mathrm{Upper}_{n,k}$ in Theorem 2 is reached at nonlinearity $\frac{1}{2^n}$ (which is the lowest possible non-zero nonlinearity) and, if $n$ is even, also at $\frac12 - 2^{-n/2-1}$, while the lower bound is reached at the nonlinearities $\frac12 - 2^{-m/2-1}$ for $m$ even, $m \le n$, i.e. $\frac14, \frac38, \frac{7}{16}, \dots$ for $m = 2, 4, 6, \dots$, respectively. In between these values, the lower bound might not be tight. Indeed, for $d_A(f) < \frac14$, (6) provides a better lower bound for $k = 3$. We conjectured in [11] that the lower bound (6) can be generalized to arbitrary odd $k \ge 3$ (Conjecture 6). In the next section we will prove this conjecture.

Reformulated conjecture and its proof
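Theorem 7 Let $f$ be a Boolean function in $n$ variables, $k \ge 2$ an integer, and $\ell$ a linear function in $n$ variables if $k$ is even, or an affine function if $k$ is odd. Then, denoting $x = d(f, \ell)$,
$$P_k(f) = \frac12\bigl(1 - (1 - 2x)^{k+1}\bigr) + (-2)^k\bigl(\mathrm{sl}(f, \ell) - x^{k+1}\bigr), \tag{14}$$
where for any Boolean function $h$ in $n$ variables, $\mathrm{sl}(f, h)$ (called the "slack" in [1]) is defined as
$$\mathrm{sl}(f, h) = \Pr\bigl[f(u_1) \ne h(u_1), \dots, f(u_k) \ne h(u_k),\ f(u_1 + \dots + u_k) \ne h(u_1 + \dots + u_k)\bigr],$$
with the probability taken over all $u_1, \dots, u_k \in \mathbb{F}_2^n$.
Proof The first part of the proof follows the lines of [1]. Denote $u_{k+1} = u_1 + \dots + u_k$. Since $\ell$ always passes the test, $f$ fails the test exactly for those values $u_1, \dots, u_k$ for which an odd number of the values $f(u_1) - \ell(u_1), \dots, f(u_{k+1}) - \ell(u_{k+1})$ are equal to 1. Denote by $A_j$ the probability that the first $j$ of these $k + 1$ values are equal to 1 and the rest are equal to 0, i.e.
$$A_j = \Pr\bigl[g(u_1) = \dots = g(u_j) = 1,\ g(u_{j+1}) = \dots = g(u_{k+1}) = 0\bigr],$$
where, for ease of notation, we denote $g = f - \ell$, and the probability is taken over all the $2^{kn}$ elements of the set $V = \{(u_1, \dots, u_{k+1}) \in (\mathbb{F}_2^n)^{k+1} : \sum_{i=1}^{k+1} u_i = 0\}$. For any subset $I \subseteq \{1, \dots, k+1\}$ of cardinality $j$, $A_j$ also equals the probability (again over all $(u_1, \dots, u_{k+1}) \in V$) that $g(u_i) = 1$ for all $i \in I$ and $g(u_i) = 0$ for all $i \in \{1, \dots, k+1\} \setminus I$. Therefore, taking into account that there are $\binom{k+1}{j}$ subsets of cardinality $j$, we obtain
$$P_k(f) = \sum_{\substack{1 \le j \le k+1\\ j \text{ odd}}} \binom{k+1}{j} A_j.$$
For each fixed $i$, the probability $P(f(u_i) - \ell(u_i) = 1)$ over all $u_i \in \mathbb{F}_2^n$ equals $x = d(f, \ell)$. Moreover, any $k$ of the vectors $u_1, \dots, u_{k+1}$ are independent and uniformly distributed, so by inclusion-exclusion, for any $j$ with $0 \le j \le k+1$ we have (the sum being empty when $j = k+1$)
$$A_j = \sum_{i=j}^{k} (-1)^{i-j} \binom{k+1-j}{i-j} x^i + (-1)^{k+1-j} A_{k+1}.$$
We obtain
$$P_k(f) = \sum_{\substack{1 \le j \le k+1\\ j \text{ odd}}} \binom{k+1}{j}\left(\sum_{i=j}^{k} (-1)^{i-j} \binom{k+1-j}{i-j} x^i + (-1)^{k+1-j} A_{k+1}\right). \tag{15}$$
In the second part of the proof we obtain a closed form for the formula above. Firstly, let us process the inner sum in (15); we replace the index of summation by $u = i - j$ and then use the well-known identity $\sum_{u=0}^{N} (-1)^u \binom{N}{u} x^u = (1 - x)^N$ with $N = k + 1 - j$:
$$\sum_{i=j}^{k} (-1)^{i-j} \binom{k+1-j}{i-j} x^i = x^j \sum_{u=0}^{k-j} (-1)^u \binom{k+1-j}{u} x^u = x^j (1-x)^{k+1-j} - x^j (-x)^{k+1-j}.$$
Substituting this in (15), and since $(-1)^{k+1-j} = -(-1)^{k+1}$ when $j$ is odd, we obtain
$$P_k(f) = \sum_{\substack{0 \le j \le k+1\\ j \text{ odd}}} \binom{k+1}{j} x^j (1-x)^{k+1-j} - (-1)^{k+1}\bigl(A_{k+1} - x^{k+1}\bigr) \sum_{\substack{0 \le j \le k+1\\ j \text{ odd}}} \binom{k+1}{j}. \tag{16}$$
We note that the first sum consists of alternating terms of a binomial expansion. The following result is therefore useful:
$$\sum_{\substack{0 \le j \le m\\ j \text{ odd}}} \binom{m}{j} a^j b^{m-j} = \frac{(a+b)^m - (-a+b)^m}{2}, \tag{17}$$
where $m \ge 1$ is an integer and $a, b$ are indeterminates. This is a known result, but for a quick proof, we denote $A = \sum_{j \text{ odd}} \binom{m}{j} a^j b^{m-j}$ and $B = \sum_{j \text{ even}} \binom{m}{j} a^j b^{m-j}$, and use the fact that $A + B = (a + b)^m$ and $-A + B = (-a + b)^m$. For $a = b = 1$, (17) becomes
$$\sum_{\substack{0 \le j \le m\\ j \text{ odd}}} \binom{m}{j} = 2^{m-1}. \tag{18}$$
Using Equations (17) (with $a = x$, $b = 1 - x$ and $m = k + 1$), (18) and $A_{k+1} = \mathrm{sl}(f, \ell)$ in (16), we obtain (14):
$$P_k(f) = \frac12\bigl(1 - (1 - 2x)^{k+1}\bigr) + (-2)^k\bigl(\mathrm{sl}(f, \ell) - x^{k+1}\bigr).$$
This concludes the proof.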
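Identity (14), i.e. $P_k(f) = \frac12\bigl(1 - (1-2x)^{k+1}\bigr) + (-2)^k\bigl(\mathrm{sl}(f,\ell) - x^{k+1}\bigr)$ with $x = d(f,\ell)$, can be checked exactly on small examples. The Python sketch below (our own; names are hypothetical) enumerates all $k$-tuples, computes $P_k(f)$ and the slack with exact rational arithmetic, and returns both sides. For $k = 2$ the right-hand side reduces to the familiar BLR expression $3x - 6x^2 + 4\,\mathrm{sl}(f, \ell)$.

```python
import random
from fractions import Fraction
from itertools import product


def parity(x):
    return bin(x).count("1") & 1


def theorem7_sides(tt, ell, n, k):
    """Return (P_k(f), right-hand side of (14)) as exact fractions, by enumeration.

    ell must be linear (k even) or affine (k odd); both truth tables are indexed
    by the integer encoding of the input.
    """
    size = 2 ** n
    x = Fraction(sum(tt[u] != ell[u] for u in range(size)), size)  # d(f, ell)
    fails = slack = 0
    for us in product(range(size), repeat=k):
        s = rhs = 0
        for u in us:
            s ^= u
            rhs ^= tt[u]
        fails += tt[s] != rhs
        # slack: f and ell disagree at u_1, ..., u_k and at u_1 + ... + u_k
        slack += all(tt[u] != ell[u] for u in (*us, s))
    total = size ** k
    p = Fraction(fails, total)
    sl = Fraction(slack, total)
    rhs14 = Fraction(1, 2) * (1 - (1 - 2 * x) ** (k + 1)) + (-2) ** k * (sl - x ** (k + 1))
    return p, rhs14


if __name__ == "__main__":
    rng = random.Random(1)
    n, k = 3, 3
    tt = [rng.randint(0, 1) for _ in range(2 ** n)]
    a = rng.randrange(2 ** n)
    ell = [parity(a & u) ^ 1 for u in range(2 ** n)]  # affine, constant term 1
    print(theorem7_sides(tt, ell, n, k))  # the two fractions coincide
```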
We are now ready to prove Conjecture 6; we will, in fact, prove a more general result that also includes the case of k even.

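Corollary 8 For odd $k$ we have $P_k(f) \ge \mathrm{Lower}_k(d_A(f))$ and for even $k$ we have $P_k(f) \ge \mathrm{Lower}_k(d_L(f))$, where $\mathrm{Lower}_k(x) = \max(G_k(x), H_k(x))$, $H_k$ is as in (8), and
$$G_k(x) = \begin{cases} \frac12\bigl(1 - (1-2x)^{k+1}\bigr) - 2^k x^k (1 - x) & \text{if } k \text{ is odd},\\ \frac12\bigl(1 - (1-2x)^{k+1}\bigr) - 2^k x^{k+1} & \text{if } k \text{ is even}.\end{cases}$$
In more detail, for $k$ odd we have
$$\mathrm{Lower}_k(x) = \begin{cases} G_k(x) & \text{if } 0 \le x \le \frac14,\\ H_k(x) & \text{if } \frac14 \le x \le \frac12.\end{cases} \tag{19}$$
Proof The $H_k(x)$ component of the lower bound was proven in Theorem 2, so we concentrate on the $G_k(x)$ component. For $k$ even, Theorem 7 gives
$$P_k(f) = \frac12\bigl(1 - (1 - 2d(f, \ell))^{k+1}\bigr) + 2^k\bigl(\mathrm{sl}(f, \ell) - d(f, \ell)^{k+1}\bigr)$$
for any linear function $\ell$. Using the fact that $\mathrm{sl}(f, \ell) \ge 0$ and choosing $\ell$ to be a linear function whose distance to $f$ is minimal, we obtain $P_k(f) \ge G_k(d_L(f))$. For any function $h$ (affine or not) we have
$$\mathrm{sl}(f, h) \le \Pr\bigl[f(u_1) \ne h(u_1), \dots, f(u_k) \ne h(u_k)\bigr] = d(f, h)^k,$$
with the probability taken over all tuples $(u_1, \dots, u_k) \in (\mathbb{F}_2^n)^k$. Combining this inequality with Theorem 7 for $k$ odd gives $P_k(f) \ge G_k(d(f, \ell))$ for any affine function $\ell$. Choosing $\ell = \ell_0$ to be an affine function whose distance to $f$ is minimal, we obtain $P_k(f) \ge G_k(d_A(f))$.
Naturally, we can ask whether another affine (for $k$ odd) or linear (for $k$ even) function $\ell_1$ which is further from $f$, i.e. $d(f, \ell_1) > d(f, \ell_0)$, could yield a better lower bound, i.e. $G_k(d(f, \ell_1)) > G_k(d(f, \ell_0))$. This is not the case, and we give a sketch of the proof for $k$ odd. Firstly, the reader can verify that $G_k(x) \ge 0$ on $[0, 0.5]$, $G_k(x) \le 0$ on $[0.5, 1]$, and that on the interval $[0.25, 0.5]$ the function $G_k$ is monotonically decreasing. Therefore, when $d(f, \ell_0) \ge 0.25$, keeping in mind that $0 \le d(f, \ell_0) < 0.5$ and [...] which is greater than or equal to zero on $[0, 0.25]$.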
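Finally, the more explicit expression (19) for $k$ odd is obtained by verifying that $G_k(x) \ge H_k(x)$ for $0 \le x \le \frac14$ and $G_k(x) \le H_k(x)$ for $\frac14 \le x \le \frac12$, the two functions meeting at $x = \frac14$, where both equal $\frac12 - 2^{-k}$. A similar situation happens for $k$ even, but the intersection of the two functions does not occur at $\frac14$; it occurs at a point whose value depends on $k$ and lies in the interval $[\frac14, \frac13]$.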
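The two components of $\mathrm{Lower}_k$ are easy to tabulate. In the Python sketch below (our own), $G_k(x) = \frac12(1-(1-2x)^{k+1}) - 2^k x^k(1-x)$ for $k$ odd is what (14) gives with $\mathrm{sl}(f,\ell) \le x^k$, $G_k(x) = \frac12(1-(1-2x)^{k+1}) - 2^k x^{k+1}$ for $k$ even is what it gives with $\mathrm{sl}(f,\ell) \ge 0$, and $H_k(x) = \frac12(1-(1-2x)^{k-1})$ is the bound (8). The helper `matches_19` (a hypothetical name) checks the crossover behaviour behind (19) on a grid.

```python
def G(x, k):
    """G_k: (14) combined with sl <= x^k (k odd) or sl >= 0 (k even)."""
    base = (1 - (1 - 2 * x) ** (k + 1)) / 2
    if k % 2 == 1:
        return base - 2 ** k * x ** k * (1 - x)
    return base - 2 ** k * x ** (k + 1)


def H(x, k):
    """H_k from (8)."""
    return (1 - (1 - 2 * x) ** (k - 1)) / 2


def lower(x, k):
    """Lower_k(x) = max(G_k(x), H_k(x))."""
    return max(G(x, k), H(x, k))


def matches_19(k, steps=1000):
    """Grid check for odd k: G_k >= H_k on [0, 1/4] and G_k <= H_k on [1/4, 1/2]."""
    tol = 1e-12
    for i in range(steps + 1):
        x = 0.5 * i / steps
        if x <= 0.25 and G(x, k) < H(x, k) - tol:
            return False
        if x >= 0.25 and G(x, k) > H(x, k) + tol:
            return False
    return True


if __name__ == "__main__":
    for k in (3, 5, 7):
        print(k, matches_19(k), lower(0.25, k), 0.5 - 2 ** -k)
```

For $k = 3$ the odd-case formula simplifies to $G_3(x) = 4x(1-x)(1-2x)$, and $G_3(\frac18) = \frac{84}{256}$ is exactly $P_3(x_1x_2x_3)$.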
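The new lower bound in Corollary 8 is attained, for $k$ odd, by some functions with nonlinearity in the range $0 < d_A(f) < \frac14$ and, for $k$ even, by some functions with $0 < d_L(f) < \frac14$:
Example 9 Consider the function $f(x_1, \dots, x_n) = x_1 x_2 \cdots x_m$ with $3 \le m \le n$. Using the computations in Example 4 and the same arguments as in Example 5, we see that $d_A(f) = d_L(f) = \frac{1}{2^m}$ and that the Walsh spectrum consists of one element equal to $1 - \frac{1}{2^{m-1}}$, $2^{m-1}$ elements equal to $\frac{1}{2^{m-1}}$ and $2^{m-1} - 1$ elements equal to $-\frac{1}{2^{m-1}}$, the remaining elements being zero. For $k$ odd, as in Example 4, we compute
$$P_k(f) = \frac12\Bigl(1 - \bigl(1 - \tfrac{1}{2^{m-1}}\bigr)^{k+1} - \frac{2^m - 1}{2^{(m-1)(k+1)}}\Bigr).$$
On the other hand, computing $\mathrm{Lower}_k(x)$ defined in Corollary 8 for $x = \frac{1}{2^m} \le \frac18$ gives $\mathrm{Lower}_k(\frac{1}{2^m}) = G_k(\frac{1}{2^m})$, which equals the value above, so the lower bound is attained.
For $k$ even we compute, using (11),
$$P_k(f) = \frac12\Bigl(1 - \bigl(1 - \tfrac{1}{2^{m-1}}\bigr)^{k+1} - \frac{1}{2^{(m-1)(k+1)}}\Bigr) = G_k\bigl(\tfrac{1}{2^m}\bigr),$$
so again the lower bound is attained. This shows that for each fixed $n$ our lower bound is attained at $d_A(f)$ (for $k$ odd) or $d_L(f)$ (for $k$ even) equal to $\frac18, \frac{1}{16}, \frac{1}{32}, \dots, \frac{1}{2^n}$, but in between these values the bound might not be tight.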
Although this is not the purpose of this paper, we obtain an interesting consequence of the previous results, namely an upper bound for the moments of the Walsh coefficients.

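Corollary 10 For any integer $k \ge 2$, the $(k + 1)$-st moments of the Walsh transform satisfy
$$\sum_{v \in \mathbb{F}_2^n} W(\hat f)(v)^{k+1} \le 1 - 2\,\mathrm{Lower}_k(x),$$
where $x = \frac12\bigl(1 - \max_v |W(\hat f)(v)|\bigr) = d_A(f)$ if $k$ is odd and $x = \frac12\bigl(1 - \max_v W(\hat f)(v)\bigr) = d_L(f)$ if $k$ is even.
Proof The claim follows by using (5), (11) and the previous corollary.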

Affinity tests using linearity tests and bounds on the probability of failure
In this section we focus on the test $f(u_1 + \dots + u_k) = f(u_1) + \dots + f(u_k)$ for $k$ even, and on the probability $P_k(f)$ of failing this test. It is shown in [14] that $f$ is linear if and only if $P_k(f) = 0$; moreover, $f$ is affine if and only if $P_k(f) \in \{0, 1\}$. The test can therefore be used as a probabilistic affinity test as follows: run the test several times on $f$, and if $f$ always passes the test (suggesting a probability of failure $P_k(f) = 0$), or $f$ always fails the test (suggesting that $P_k(f) = 1$), then declare $f$ to be affine. Note that $f + 1$ passes the linearity test above for some given tuple if and only if $f$ fails the test for that same tuple; therefore $P_k(f + 1) = 1 - P_k(f)$. Another way of looking at this affinity test is that we are testing both $f$ and $f + 1$ for linearity, and if one of them passes all the tests and is declared linear, then $f$ can be declared affine. Any other linearity test could be used this way as an affinity test. When $f$ is not affine, however (and therefore neither $f$ nor $f + 1$ is linear), there are, to our knowledge, no results regarding the relationship between the probability $P_k(f)$ of failing the test (for $k$ even) and the nonlinearity $d_A(f)$ of $f$; the existing lower and upper bounds on $P_k(f)$ depend on $d_L(f)$, the distance of $f$ to the set of linear functions. Since this affinity test is equivalent to testing both $f$ and $f + 1$ for linearity, it seems natural to consider both $P_k(f)$ and $P_k(f + 1)$ when examining a connection to $d_A(f)$. We define
$$\tilde P_k(f) = \min(P_k(f), P_k(f + 1))$$
and study its relation to $d_A(f)$. Further motivation for this choice is given in Remark 12. For $k = 2$, we will prove lower and upper bounds for $\tilde P_2(f)$ in terms of the nonlinearity of $f$.
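In Bellare et al. [1], lower and upper bounds were given for $P_2(f)$ in terms of $d_L(f)$. Namely, it was proven that
$$\mathrm{Lower}_2(d_L(f)) \le P_2(f) \le \mathrm{Upper}_2(d_L(f)), \tag{21}$$
where $\mathrm{Lower}_2, \mathrm{Upper}_2: [0, \frac12] \to \mathbb{R}$. The function $\mathrm{Lower}_2(x)$ is defined as
$$\mathrm{Lower}_2(x) = \begin{cases} 3x - 6x^2 & \text{if } 0 \le x \le \frac{5}{16},\\ \frac{45}{128} & \text{if } \frac{5}{16} \le x \le \frac{45}{128},\\ x & \text{if } \frac{45}{128} \le x \le \frac12,\end{cases} \tag{22}$$
and the function $\mathrm{Upper}_2(x)$ is defined by $\mathrm{Upper}_2(0) = 0$ and, for $x > 0$, by [...] (23).
We now prove bounds for $\min(P_2(f), P_2(f + 1))$ in terms of $d_A(f)$, the distance to the closest affine function, which is the natural parameter to consider when testing whether a function is affine. Note that in the following theorem, although the bounds look similar to the bounds in (21) above, there is a subtle and important difference: the bounds are now a function of $d_A(f)$, the distance to the closest affine function, whereas in (21) the bounds are expressed in terms of $d_L(f)$, the distance to the closest linear function.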
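Theorem 11 Let $f: \mathbb{F}_2^n \to \mathbb{F}_2$ and let $\tilde P_2(f) = \min(P_2(f), P_2(f + 1))$, where $P_2(f)$ is the probability of failure of the BLR test. We have
$$\mathrm{Lower}_2(d_A(f)) \le \tilde P_2(f) \le \mathrm{Upper}_2(d_A(f)),$$
where $d_A(f)$ is the nonlinearity of $f$ and $\mathrm{Lower}_2(x)$, $\mathrm{Upper}_2(x)$ are as defined above in (22) and (23), respectively.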
Proof We know that $d_A(f) = \min(d_L(f), d_L(f + 1))$. We can assume, without loss of generality, that $d_L(f) \le d_L(f + 1)$ (otherwise, we can just replace $f$ by $f + 1$, which changes neither $\tilde P_2(f)$ nor $d_A(f)$), so that $d_A(f) = d_L(f)$.
First, let us examine the function $\mathrm{Upper}_2(x)$ more closely. If $\frac14 \le x < \frac12$ then $\lfloor \log_2 x \rfloor = -2$, so a simple computation shows that $\mathrm{Upper}_2(x) = 6x^2 - 3x + 1$ in this case. If $\frac18 \le x < \frac14$ then $\lfloor \log_2 x \rfloor = -3$, so $\mathrm{Upper}_2(x) = 6x^2 + \frac14$ in this case. One can check that the function $\mathrm{Upper}_2(x)$ is monotonically increasing on the domain $[0, \frac12]$. (It is continuous, and the derivative exists at all points except those of the form $x = \frac{1}{2^m}$ for some integer $m \ge 1$; the derivative is positive at all points where it exists.) The equation $\mathrm{Upper}_2(x) = \frac12$ has only one solution in the interval $[0, \frac12]$, namely $x = \frac{1}{2\sqrt6}$ (see the first graph in the Appendix).
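For the upper bound, from (21) we have $P_2(f) \le \mathrm{Upper}_2(d_L(f))$ and $P_2(f + 1) \le \mathrm{Upper}_2(d_L(f + 1))$. Therefore, using the fact that $\mathrm{Upper}_2$ is monotonic and the assumption $d_L(f) \le d_L(f + 1)$, we obtain
$$\tilde P_2(f) \le \min\bigl(\mathrm{Upper}_2(d_L(f)), \mathrm{Upper}_2(d_L(f + 1))\bigr) = \mathrm{Upper}_2(d_L(f)) = \mathrm{Upper}_2(d_A(f)).$$
For the lower bound, if $P_2(f) \le P_2(f + 1)$, then (21) gives $\tilde P_2(f) = P_2(f) \ge \mathrm{Lower}_2(d_L(f)) = \mathrm{Lower}_2(d_A(f))$. It remains to consider the case $P_2(f) > P_2(f + 1) = 1 - P_2(f)$, i.e. $P_2(f) > \frac12$, which requires $\mathrm{Upper}_2(d_L(f)) > \frac12$. From the behaviour of $\mathrm{Upper}_2(x)$ discussed above, we see that this can only happen when $d_L(f) > \frac{1}{2\sqrt6}$. We have to prove that in this case $\tilde P_2(f) = 1 - P_2(f) \ge \mathrm{Lower}_2(d_A(f))$. Let us first consider the case $\frac{1}{2\sqrt6} < d_A(f) \le \frac14$. We have
$$1 - P_2(f) \ge 1 - \mathrm{Upper}_2(d_L(f)) = \frac34 - 6d_L(f)^2 \ge 3d_L(f) - 6d_L(f)^2 = \mathrm{Lower}_2(d_A(f)),$$
where the last inequality uses the fact that $d_L(f) \le \frac14$. Next assume that $\frac14 < d_A(f)$. Consider first the subcase $\frac14 < d_A(f) < \frac{5}{16}$. Using the fact that $\mathrm{Upper}_2(x) = 6x^2 - 3x + 1$ when $x \ge \frac14$, we have
$$1 - P_2(f) \ge 1 - \mathrm{Upper}_2(d_L(f)) = 3d_L(f) - 6d_L(f)^2 = \mathrm{Lower}_2(d_A(f)).$$
Finally, let us consider the subcase $d_A(f) \ge \frac{5}{16}$. We have
$$P_2(f + 1) \ge \mathrm{Lower}_2(d_L(f + 1)) \ge \mathrm{Lower}_2(d_L(f)) = \mathrm{Lower}_2(d_A(f)),$$
with the last inequality based on the fact that $\frac{5}{16} \le d_L(f) \le d_L(f + 1)$ and that $\mathrm{Lower}_2(x)$ is monotonically increasing when the argument is above $\frac{5}{16}$.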
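Theorem 11 can also be probed numerically. The Python sketch below (ours) computes $P_2(f)$ from (11) with $k = 2$, uses $P_2(f + 1) = 1 - P_2(f)$, and transcribes $\mathrm{Lower}_2$ from (22) as the piecewise function $3x - 6x^2$, $\frac{45}{128}$, $x$ of Bellare et al.; it can be run on random small functions to observe $\min(P_2(f), P_2(f+1)) \ge \mathrm{Lower}_2(d_A(f))$. The upper bound $\mathrm{Upper}_2$ is omitted from the sketch.

```python
import random


def walsh_spectrum(tt):
    """Normalized Walsh transform W(f^)(v) via the fast Walsh-Hadamard butterfly."""
    w = [1 - 2 * b for b in tt]
    h = 1
    while h < len(w):
        for start in range(0, len(w), 2 * h):
            for i in range(start, start + h):
                w[i], w[i + h] = w[i] + w[i + h], w[i] - w[i + h]
        h *= 2
    return [c / len(w) for c in w]


def nonlinearity(tt):
    return (1 - max(abs(c) for c in walsh_spectrum(tt))) / 2


def p2(tt):
    """P_2(f) via (11) with k = 2."""
    return (1 - sum(c ** 3 for c in walsh_spectrum(tt))) / 2


def p2_tilde(tt):
    """min(P_2(f), P_2(f + 1))."""
    return min(p2(tt), p2([1 - b for b in tt]))


def lower2(x):
    """Lower_2 from (22), the lower bound of Bellare et al."""
    if x <= 5 / 16:
        return 3 * x - 6 * x * x
    if x <= 45 / 128:
        return 45 / 128
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        tt = [rng.randint(0, 1) for _ in range(64)]
        d = nonlinearity(tt)
        print(round(d, 4), round(lower2(d), 4), round(p2_tilde(tt), 4))
```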
Remark 12 One might wonder whether the situation where $d_L(f) \le d_L(f + 1)$ and $P_2(f) > P_2(f + 1)$, which is the non-straightforward case in the proof of Theorem 11, occurs at all in practice. Experimentally, we did find such functions, but they seemed to be relatively rare. For example, for $n = 6$ and $n = 7$, we generated several random functions for each possible nonlinearity and observed that behaviour in a proportion of less than 0.06 of them. It is therefore a reasonable heuristic, but only a heuristic, to assume that whichever of the functions $f$ and $f + 1$ achieves $\min(P_2(f), P_2(f + 1))$ also achieves $\min(d_L(f), d_L(f + 1))$. This also justifies our choice to examine $\min(P_2(f), P_2(f + 1))$ for its correlation to $d_A(f)$.
As a byproduct of the proof of Theorem 11, we can also bound how large the difference $P_2(f) - P_2(f + 1)$ can be when $d_L(f) \le d_L(f + 1)$. Namely, denoting $x = d_L(f)$, we have the following cases. When $x \in$ [...] $\approx 0.296$. By contrast, for arbitrary functions $f, g$ such that $d_L(f) < d_L(g)$ but $P_2(f) > P_2(g)$, the difference $P_2(f) - P_2(g)$ can be larger, approaching 0.5. For example, for any integer $t \ge 2$ consider the functions [...]. Using the calculations in Examples 3, 4, 5 and 9, we have that [...]
An improvement of the lower bound (21) for the BLR linearity test is given in [9]. Namely, it is shown that $P_2(f) \ge H(d_L(f))$, where [...] (28), and where, for any constant $0 < c \le \frac12$, $g_1, g_2$ are defined as [...]. Note that this is indeed an improved lower bound, as $\mathrm{Lower}_2(x) \le H(x)$ and the inequality is strict on the interval $(\frac{45}{128}, \frac12)$. The analogue of Theorem 11 holds for this improved bound as well.
Proposition 13 With the notations in Theorem 11, we have $\tilde P_2(f) \ge H(d_A(f))$, where $H(x)$ is as defined above in (28).
Proof Examining the proof of Theorem 11, we see that for the lower bound in the interval $[\frac{5}{16}, \frac12]$ the only property used is that it is monotonically increasing. It therefore suffices to show that $\min(g_1, g_2)$ is monotonically increasing. We compute $g_1'$ and $g_2'$, the derivatives of $g_1$ and $g_2$, and show that they are positive on the specified domain. Namely, [...] This concludes the proof.

Estimating nonlinearity
The above affinity tests can be used to estimate the nonlinearity of a Boolean function.
The probability of failing a test can be estimated by running the test several times and using statistical methods such as a binomial proportion confidence interval (see [14]). The bounds then allow us to give an interval for the value of the nonlinearity, as per (1) and (2).
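The text leaves the choice of confidence interval open. As an assumption made only for illustration, the Python sketch below uses the Wilson score interval for a binomial proportion, with $z = 1.96$ for roughly 95% confidence; [14] may use a different interval. Its endpoints bound the failure probability and can then be propagated through (1) or (2).

```python
import math


def wilson_interval(failures, trials, z=1.96):
    """Wilson score interval for a binomial proportion (here: a failure probability)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = failures / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    print(wilson_interval(375, 1000))
```

For example, 375 failures in 1000 trials give an interval of roughly $[0.346, 0.405]$. Unlike the plain normal approximation, the Wilson interval stays inside $[0, 1]$ and does not collapse to a point when no failures are observed, which matters when testing nearly affine functions.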
For simplicity, we will assume that we have obtained an exact value for P k (f ) (in practice we will actually obtain a confidence interval).We will examine each test in turn.The graphs in the Appendix will aid the discussion.
We first look at the affinity test based on the BLR test, as described in Section 5. The first graph in the Appendix displays the lower and upper bounds described in Theorem 11. Thus, for values of $0 \le P_2(f) <$ [...] $(P_2(f)), P_2(f)\}$, where $0 < \alpha^{(2)}(y) \le \beta^{(2)}(y) < \frac{5}{16}$ are the two roots of the equation $3x - 6x^2 = y$ in this domain. We obtain two disjoint intervals where $d_A(f)$ might lie: [...] Finally, for $P_2(f) \ge$ [...]
For the tests based on the $(k + 1)$-st order nonhomomorphicity with $k$ odd, we use the upper bound $\mathrm{Upper}_k$ described in Theorem 2 and the lower bound $\mathrm{Lower}_k$ described in Corollary 8, illustrated for $k = 3, 5$ in the second and third graphs in the Appendix.
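As $x$ increases in the interval $[0, 0.5]$, $\mathrm{Upper}_k(x)$ increases, whereas $\mathrm{Lower}_k(x)$ first increases from 0 to a local maximum $y_2^{(k)}$, then decreases to the value $y_1^{(k)} = \mathrm{Lower}_k(\frac14) = \frac12 - 2^{-k}$, and increases again to 0.5. For each $0 \le y \le 0.5$ we denote by $0 < \alpha^{(k)}(y) \le \beta^{(k)}(y) \le \frac14$ the roots of the equation $G_k(x) = y$ in $(0, \frac14]$, when they exist. Consequently, we have three cases. When $0 \le P_k(f) \le y_1^{(k)}$, we obtain $d_A(f) \in [\mathrm{Upper}_k^{-1}(P_k(f)), \alpha^{(k)}(P_k(f))]$. The length of this interval increases as $P_k(f)$ increases from 0 to $y_1^{(k)}$ (for illustration, it increases to a value of 0.028, 0.016 and 0.011 for $k = 3$, 5 and 7, respectively). When $y_1^{(k)} < P_k(f) \le y_2^{(k)}$, we obtain two disjoint intervals, $d_A(f) \in [\mathrm{Upper}_k^{-1}(P_k(f)), \alpha^{(k)}(P_k(f))] \cup [\beta^{(k)}(P_k(f)), H_k^{-1}(P_k(f))]$. Finally, for $y_2^{(k)} < P_k(f) \le 0.5$, we obtain
$$d_A(f) \in \Bigl[\tfrac12\bigl(1 - (1 - 2P_k(f))^{1/(k+1)}\bigr),\ \tfrac12\bigl(1 - (1 - 2P_k(f))^{1/(k-1)}\bigr)\Bigr]. \tag{29}$$
Note that the less tight bounds (7) from [14] would give a considerably less accurate estimate. The length of the interval produced by our estimate (29) is
$$\tfrac12\Bigl((1 - 2P_k(f))^{1/(k+1)} - (1 - 2P_k(f))^{1/(k-1)}\Bigr).$$
This quantity has a unimodal behavior: as a function of $P_k(f)$, the length increases, peaking at the value $\frac12\bigl(\bigl(\frac{k-1}{k+1}\bigr)^{(k-1)/2} - \bigl(\frac{k-1}{k+1}\bigr)^{(k+1)/2}\bigr)$, achieved when $P_k(f) = \frac12\bigl(1 - \bigl(\frac{k-1}{k+1}\bigr)^{(k^2-1)/2}\bigr)$, and then decreases to 0 when $P_k(f)$ reaches 0.5. For example, for $k = 3, 5, 7$ the length of the interval peaks at 0.125, 0.0741 and 0.05273, respectively (achieved when $P_k(f) = 0.469$, 0.496 and 0.4995, respectively). The maximum length of the interval is thus achieved when $P_k(f)$ is quite close to 0.5; the larger the value of $k$, the smaller the maximum length of the interval, that is, the more precisely we can estimate the nonlinearity. We summarize these results in Table 1, which contains, for different values of $k$, the maximum length of the interval obtained when estimating the nonlinearity. The length is displayed firstly for low values of $P_k(f)$, namely the values in the interval $[0, y_1^{(k)}]$ discussed above. Secondly, the last column displays the maximum length of the interval for the remaining (higher) values of $P_k(f)$.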
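The three cases can be reproduced numerically. As a sketch (ours, not the paper's procedure), the Python code below approximates the feasible set $\{x \in [0, \frac12] : \mathrm{Lower}_k(x) \le P_k(f) \le \mathrm{Upper}_k(x)\}$ for odd $k$ by scanning a uniform grid, with $G_k(x) = \frac12(1-(1-2x)^{k+1}) - 2^k x^k(1-x)$, $H_k(x) = \frac12(1-(1-2x)^{k-1})$ and $\mathrm{Upper}_k(x) = \frac12(1-(1-2x)^{k+1})$. It also evaluates the closed-form interval $[\frac12(1 - (1-2P_k(f))^{1/(k+1)}), \frac12(1 - (1-2P_k(f))^{1/(k-1)})]$ that applies for large $P_k(f)$. The grid step limits the accuracy of the endpoints.

```python
def G(x, k):
    """G_k for k odd."""
    return (1 - (1 - 2 * x) ** (k + 1)) / 2 - 2 ** k * x ** k * (1 - x)


def H(x, k):
    """H_k from (8)."""
    return (1 - (1 - 2 * x) ** (k - 1)) / 2


def lower(x, k):
    return max(G(x, k), H(x, k))


def upper(x, k):
    """Upper_k from (9)."""
    return (1 - (1 - 2 * x) ** (k + 1)) / 2


def feasible_nonlinearity(p, k, steps=100_000):
    """Approximate {x in [0, 1/2] : Lower_k(x) <= p <= Upper_k(x)} (k odd) as a list
    of (start, end) intervals, scanning a uniform grid of step 1 / (2 * steps)."""
    intervals, start, prev = [], None, None
    for i in range(steps + 1):
        x = 0.5 * i / steps
        if lower(x, k) <= p <= upper(x, k):
            if start is None:
                start = x
        elif start is not None:
            intervals.append((start, prev))
            start = None
        prev = x
    if start is not None:
        intervals.append((start, prev))
    return intervals


def closed_form_interval(p, k):
    """Closed-form interval for d_A(f), valid when p exceeds the local maximum y_2^(k)."""
    t = 1 - 2 * p
    return (1 - t ** (1 / (k + 1))) / 2, (1 - t ** (1 / (k - 1))) / 2


if __name__ == "__main__":
    print(feasible_nonlinearity(15 / 32, 3), closed_form_interval(15 / 32, 3))
    print(feasible_nonlinearity(0.38, 3))
```

For $k = 3$ and $P_3(f) = 15/32 \approx 0.469$ the scan returns a single interval close to $[0.25, 0.375]$, of length 0.125, while for $P_3(f) = 0.38$ it returns two disjoint intervals, roughly $[0.150, 0.185]$ and $[0.238, 0.255]$.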
We also present in Table 2 the estimate of the nonlinearity $d_A$ that would be obtained by this method for a few examples of functions, and compare it with the true value of the nonlinearity. The examples in this table are the functions from Example 5, $f(x_1, \dots, x_n) = x_1x_2 + \dots + x_{m-1}x_m$ with $m = 2, 4, 6, 8$ and $n \ge m$, and the functions in Example 9, $f(x_1, \dots, x_n) = x_1 x_2 \cdots x_m$ with $m = 3, 4, 5, 6$ and $n \ge m$. We observe that for all these functions, the true value of the nonlinearity is at the top end of the estimated interval.
We also examined experimentally random functions in up to 9 variables (see the fourth figure in the Appendix), plotting the probability of failure $P_3(f)$ as a function of the nonlinearity $d_A(f)$. To obtain data for each possible value of the nonlinearity, we started by randomly generating several functions for each possible weight lower than 0.5. We then computed their nonlinearity (for weights lower than 0.25 it is equal to the weight of the function, as the function is closer to the all-zero function than to any other affine function; for higher weights, the nonlinearity can be different from the weight, but many functions will have a nonlinearity close to their weight). We noticed that for functions in 7 or more variables, most of the functions in our data have a probability $P_3$ of failing the test close to the upper bound for $P_3$. This translates into the true value of the nonlinearity being at the low end of the estimated interval. We observed a similar situation for $k = 5, 7$.
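The experiment can be sketched as follows (our reconstruction of the procedure described above, with illustrative names). Here $P_3(f)$ is computed exactly from the Walsh spectrum via (11) rather than estimated by sampling, and each data point can be compared with $\mathrm{Lower}_3(x) = \max(4x(1-x)(1-2x), \frac12(1-(1-2x)^2))$ and $\mathrm{Upper}_3(x) = \frac12(1-(1-2x)^4)$.

```python
import random


def walsh_spectrum(tt):
    """Normalized Walsh transform W(f^)(v) via the fast Walsh-Hadamard butterfly."""
    w = [1 - 2 * b for b in tt]
    h = 1
    while h < len(w):
        for start in range(0, len(w), 2 * h):
            for i in range(start, start + h):
                w[i], w[i + h] = w[i] + w[i + h], w[i] - w[i + h]
        h *= 2
    return [c / len(w) for c in w]


def lower3(x):
    """Lower_3 = max(G_3, H_3) with G_3(x) = 4x(1 - x)(1 - 2x)."""
    return max(4 * x * (1 - x) * (1 - 2 * x), (1 - (1 - 2 * x) ** 2) / 2)


def upper3(x):
    """Upper bound (9) for k = 3."""
    return (1 - (1 - 2 * x) ** 4) / 2


def random_function_of_weight(n, ones, rng):
    """Random truth table in n variables with exactly `ones` entries equal to 1."""
    tt = [0] * 2 ** n
    for i in rng.sample(range(2 ** n), ones):
        tt[i] = 1
    return tt


def sample_point(n, ones, rng):
    """(d_A(f), P_3(f)) for a random f of the given weight, via (5) and (11)."""
    w = walsh_spectrum(random_function_of_weight(n, ones, rng))
    d = (1 - max(abs(c) for c in w)) / 2
    p = (1 - sum(c ** 4 for c in w)) / 2
    return d, p


if __name__ == "__main__":
    rng = random.Random(0)
    for ones in (8, 24, 40, 56):
        d, p = sample_point(7, ones, rng)
        print(ones / 128, d, lower3(d), p, upper3(d))
```

Printing $P_3(f)$ next to both bounds shows where each random function falls inside the admissible band.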
To conclude this section, we note that each test we considered is quite accurate in estimating nonlinearity when the probability of failing the test is small (and consequently the nonlinearity of the function is small), but the accuracy decreases as the probability of failing the test increases.If we were to apply different tests to the same function, we note that the estimated interval for the nonlinearity is least accurate when using the affinity test based on the BLR test.The tests based on (k + 1)-st order nonhomomorphicity with k odd have better accuracy, and this accuracy improves as k increases.


Table 1
Precision of estimating the nonlinearity
$k$ | Length of interval for $d_A$ when $P_k(f) \in [0, y_1^{(k)}]$ | Length of interval for $d_A$ for the remaining values of $P_k(f)$

Table 2
Examples of estimating the nonlinearity
Function | $d_A$ | $k = 3$, estimated $d_A$ | $k = 5$, estimated $d_A$ | $k = 7$, estimated $d_A$