Probabilistic estimation of the algebraic degree of Boolean functions

The algebraic degree is an important parameter of Boolean functions used in cryptography. When a function in a large number of variables is not given explicitly in algebraic normal form, it is usually not feasible to compute its degree, so we need to estimate it. We propose a probabilistic test for deciding whether the algebraic degree of a Boolean function f is below a certain value k. If the degree is indeed below k, then f will always pass the test; otherwise f will fail each instance of the test with a probability $\mathrm{dt}_k(f)$, which is closely related to the average number of monomials of degree k of the polynomials which are affine equivalent to f. The test has good accuracy only if this probability $\mathrm{dt}_k(f)$ of failing the test is not too small. We initiate the study of $\mathrm{dt}_k(f)$ by showing that in the particular case when the degree of f is actually equal to k, the probability will be in the interval (0.288788, 0.5], and therefore a small number of runs of the test will be sufficient to give, with very high probability, the correct answer.
Exact values of $\mathrm{dt}_k(f)$ for all the polynomials in 8 variables were computed using the representatives listed by Hou and by Langevin and Leander.


Introduction and motivation
The algebraic degree is an important parameter of Boolean functions used in cryptography. A Boolean function f in n variables can be uniquely represented in ANF (algebraic normal form), i.e. as a polynomial in n variables over $\mathbb{F}_2$ (the binary field) of degree at most one in each variable. The degree of this polynomial is called the algebraic degree of f. Ciphers which can be represented (or approximated) as functions of low degree are vulnerable to attacks such as higher order differential attacks.
When the ANF of f is not given explicitly (e.g. f is a composition of functions, or is given as a "black box") and depends on a large number of variables, it may not be feasible to compute its degree. Instead, we aim to estimate the degree using probabilistic tests.
The coefficient of a particular monomial $x_{i_1} \cdots x_{i_k}$ of degree k in the ANF of f can be computed by summing the values of f over the vector space generated by the k vectors $e_{i_1}, \ldots, e_{i_k}$ in the canonical basis. This method, sometimes called the Moebius transform, has many applications in cryptography and coding theory; in cryptanalysis it was used to detect and exploit non-randomness features of the number of monomials of a given degree (see [5,10,3,12]).
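The coefficient computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: vectors of $\mathbb{F}_2^n$ are encoded as integers (bit i stands for $x_{i+1}$), and the names `anf_coefficient` and `toy_f` are hypothetical, not taken from the paper.

```python
def anf_coefficient(f, positions):
    """Coefficient of the monomial prod_{i in positions} x_i in the ANF of f.

    f maps an integer-encoded vector of F_2^n to 0 or 1; positions are
    0-based variable indices. The coefficient is the XOR of the values of f
    over the linear span of the canonical vectors e_i, i in positions.
    """
    k = len(positions)
    total = 0
    for c in range(1 << k):          # c enumerates the 2^k points of the span
        x = 0
        for j, pos in enumerate(positions):
            if (c >> j) & 1:
                x |= 1 << pos        # add e_pos to the current point
        total ^= f(x)
    return total


def toy_f(x):
    """Toy example (an assumption for illustration): f = x1*x2 + x3."""
    x1, x2, x3 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return (x1 & x2) ^ x3
```

For `toy_f`, the call `anf_coefficient(toy_f, [0, 1])` returns 1 (the monomial $x_1x_2$ is present) while `anf_coefficient(toy_f, [0, 2])` returns 0. Each call costs $2^k$ evaluations of f, which is why only one monomial at a time is tested when f is a black box.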
One could use the Moebius transform to estimate the degree of a function as follows: pick a monomial of degree k, compute its coefficient in f and test whether it is zero. If it is not, then we know that $\deg(f) \ge k$. If we run this test for several monomials of degree k and all the computed coefficients are zero, then we conclude that f probably has degree strictly less than k. The probability of finding a monomial of degree k (and correctly concluding $\deg(f) \ge k$) in one run of the test is equal to the proportion of monomials of degree k that have non-zero coefficients in f. Therefore, this method has the shortcoming that if f has degree at least k but only has a very small number of monomials of degree k, then one might incorrectly classify f as having degree less than k, as illustrated in the following example.
Example 1 The function $f(x_1, \ldots, x_9) = x_1x_2x_3 + x_4x_5x_6 + x_7x_8x_9$ has only 3 out of the $\binom{9}{3} = 84$ possible monomials of degree 3 in 9 variables. Assume we run the test based on the Moebius transform to search for monomials of degree 3. Each run of the test has a probability of only $\frac{3}{84} \approx 0.0357$ of detecting a monomial. After running the test 9 times, for example, we still have a rather high probability of $\left(1 - \frac{3}{84}\right)^9 \approx 0.72$ that no monomial of degree 3 has been detected yet by the test, and therefore we might wrongly conclude that $\deg(f) < 3$.
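The arithmetic of Example 1 can be checked directly. The short sketch below assumes, as the example implicitly does, that each run picks a degree-3 monomial uniformly at random and independently of previous runs; the variable names are illustrative only.

```python
from math import comb

total_monomials = comb(9, 3)        # number of degree-3 monomials in 9 variables
p_detect = 3 / total_monomials      # chance that one run hits a present monomial
p_miss = (1 - p_detect) ** 9        # chance that 9 independent runs all miss

print(total_monomials)              # 84
print(round(p_detect, 4))           # 0.0357
print(round(p_miss, 2))             # 0.72
```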
In this paper we generalise this idea. The intuition behind our proposed method to test whether $\deg(f) < k$ is that even if a function f has a very small number of monomials of degree k, after applying a random affine invertible change of variables to f (the degree of f being invariant to such changes of variables), the number of monomials of degree k is likely to be high and therefore it will be easier to probabilistically detect their existence. The aim is that the test should perform reasonably well for all functions.
We will call our proposed test the $\deg(f) < k$ test and define it as follows. Pick $u_0, u_1, \ldots, u_k \in \mathbb{F}_2^n$ and check whether the sum of the values of f over the affine space $u_0 \oplus \langle u_1, \ldots, u_k \rangle$ is zero. Again, given a function f we run the test several times. If it passes all tests we conclude that $\deg(f) < k$, otherwise we conclude $\deg(f) \ge k$. A function f of degree less than k will always pass this test (there are no false negatives). A function of degree k or more will sometimes fail and sometimes pass the test, depending on the chosen vectors. We denote by $\mathrm{dt}_k(f)$ the probability of failing this test, taken over all values $u_0, u_1, \ldots, u_k \in \mathbb{F}_2^n$. This probability determines the probability $(1 - \mathrm{dt}_k(f))^t$ of wrongly concluding, after t tests, that $\deg(f) < k$ when in fact $\deg(f) \ge k$ (false positive). Ideally, $\mathrm{dt}_k(f)$ should not be very low. A very small value of $\mathrm{dt}_k(f)$ would mean that we would need to run the test a very large number of times to obtain a reasonable accuracy.
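A minimal sketch of the test just described might look as follows. It assumes that f is available as a Python callable on integer-encoded vectors of $\mathbb{F}_2^n$ and that the sum is taken over all $2^k$ coefficient tuples $(c_1, \ldots, c_k)$, so that linearly dependent vectors always give a pass; the function names are hypothetical.

```python
import random


def degree_test_passes(f, n, k, rng):
    """One run of the deg(f) < k test; returns True when the sum is zero."""
    u0 = rng.getrandbits(n)
    us = [rng.getrandbits(n) for _ in range(k)]
    total = 0
    for c in range(1 << k):              # all combinations c_1 u_1 + ... + c_k u_k
        x = u0
        for j in range(k):
            if (c >> j) & 1:
                x ^= us[j]
        total ^= f(x)
    return total == 0


def probably_degree_below(f, n, k, runs, seed=0):
    """Run the test `runs` times; True means every run passed."""
    rng = random.Random(seed)
    return all(degree_test_passes(f, n, k, rng) for _ in range(runs))


def quadratic(x):                        # x1*x2 + x3, degree 2
    return ((x & 1) & ((x >> 1) & 1)) ^ ((x >> 2) & 1)


def cubic(x):                            # x1*x2*x3, degree 3
    return (x & 1) & ((x >> 1) & 1) & ((x >> 2) & 1)
```

With these toy functions, `probably_degree_below(quadratic, 3, 3, 50)` is always `True`, whereas `probably_degree_below(cubic, 3, 3, 200)` returns `False` except with negligible probability. Each run costs $2^k$ evaluations of f, independently of n.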
We initiate the study of the probability $\mathrm{dt}_k(f)$ of failing the $\deg(f) < k$ test. We consider the case when the degree of f is in fact equal to k (although we would not know this beforehand; if we knew, we would not need to run any test). We prove in Theorem 14 and Corollary 16 that the probability $\mathrm{dt}_k(f)$ satisfies an upper bound of 0.5 and a lower bound of 0.288788... (q-Pochhammer symbol at (0.5, 0.5, ∞)). This means there are no functions with very low probability $\mathrm{dt}_k(f)$, and therefore a small number of runs of the test is sufficient to give, with very high probability, the correct answer. For example, to obtain a probability of less than 0.05 that a polynomial of degree k has been incorrectly classified as being of degree less than k, we would need to run the test 9 times. Contrast this with the situation illustrated in Example 1.
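The figure of 9 runs can be reproduced numerically. The sketch below approximates the q-Pochhammer value by a truncated product $\prod_{i \ge 1}(1 - 2^{-i})$ and then searches for the smallest t with $(1 - p)^t < 0.05$; the helper names and the truncation at 200 factors are choices made here for illustration.

```python
def pochhammer_half(terms=200):
    """Truncated product prod_{i=1..terms} (1 - 2^-i), about 0.288788."""
    p = 1.0
    for i in range(1, terms + 1):
        p *= 1 - 2.0 ** (-i)
    return p


def runs_needed(p_fail, error):
    """Smallest t such that (1 - p_fail)^t < error."""
    t = 1
    while (1 - p_fail) ** t >= error:
        t += 1
    return t


worst_case = pochhammer_half()
print(round(worst_case, 6))              # 0.288788
print(runs_needed(worst_case, 0.05))     # 9
```

Because the lower bound is the worst case over all functions of degree k, 9 runs suffice for every such function; functions with larger $\mathrm{dt}_k(f)$ reach the same error level with fewer runs.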
We compute and analyse the values of the probability of failing the deg(f ) < k test for all functions in 8 variables of degree k, using the representatives listed by Hou [6] and by Langevin and Leander [8] (see Section 5).
The study of the probability of failing the deg(f ) < k test for polynomials of degree strictly higher than k will be the subject of future work.
The probability $\mathrm{dt}_k(f)$ is connected to other existing notions as follows. For k = 2, i.e. the $\deg(f) < 2$ test, if we restrict to linear rather than affine spaces, we obtain the usual textbook linearity test $f(u_1 \oplus u_2) = f(u_1) \oplus f(u_2)$, often called the BLR test. The probability of failing the BLR test was studied in several papers, see for example [1]. In [13], in the context of the cube/AIDA attack, we proposed a linearity test similar to the $\deg(f) < k$ test above, but fixing a linear space of dimension k and running the $\deg(f) < m$ test on all its subspaces of dimension m, for all $2 \le m \le k$.
We show in Theorem 8 that the probability of failing the $\deg(f) < k$ test, when restricted to affine spaces of dimension exactly k, is equal to the average density of monomials of degree k (i.e. the proportion of degree-k monomials that are present) over all the polynomials in the affine equivalence class of f. We propose this average density of monomials of degree k as a new parameter of Boolean functions (see Definition 4). It is somewhat similar to, but distinct from, the notion of algebraic thickness defined in [2], see Remark 6.

Definitions
We denote by $\mathbb{F}_2$ the finite field with 2 elements, and by $\mathbb{F}_2^n$ the n-dimensional vector space over $\mathbb{F}_2$. Addition in $\mathbb{F}_2$ and in $\mathbb{F}_2^n$ will be denoted by ⊕. Any function $f : \mathbb{F}_2^n \to \mathbb{F}_2$ can be represented in its algebraic normal form (ANF), i.e. as a polynomial function given by a polynomial of degree at most 1 in each variable:
$$f(x_1, \ldots, x_n) = \bigoplus_{(a_1, \ldots, a_n) \in \mathbb{F}_2^n} b_{(a_1, \ldots, a_n)}\, x_1^{a_1} \cdots x_n^{a_n}$$
with $b_{(a_1, \ldots, a_n)} \in \mathbb{F}_2$. The degree of this polynomial is called the algebraic degree of f; here we will call it simply the degree of f and denote it by $\deg(f)$.
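Recall that the number of subspaces of dimension k of $\mathbb{F}_2^n$ equals the Gaussian binomial coefficient
$$\begin{bmatrix} n \\ k \end{bmatrix}_2 = \prod_{i=0}^{k-1} \frac{2^n - 2^i}{2^k - 2^i}.$$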
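For concreteness, the Gaussian binomial coefficient can be evaluated with exact integer arithmetic using the standard product formula (number of ordered bases of k independent vectors in the space, divided by the number of ordered bases of a fixed k-dimensional subspace). The function name is illustrative.

```python
def gaussian_binomial(n, k, q=2):
    """Number of k-dimensional subspaces of an n-dimensional space over F_q."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q ** n - q ** i      # choices for the (i+1)-th independent vector
        den *= q ** k - q ** i      # same count inside a fixed k-dim subspace
    return num // den


print(gaussian_binomial(8, 3))      # 97155
print(gaussian_binomial(8, 4))      # 200787
```

These two values are the numbers of 3- and 4-dimensional subspaces of $\mathbb{F}_2^8$, which give a sense of the size of an exhaustive computation over subspaces in 8 variables.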
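Consider the general linear group $GL(n, \mathbb{F}_2)$, consisting of the invertible $n \times n$ matrices over $\mathbb{F}_2$. For any matrix $M \in GL(n, \mathbb{F}_2)$ and any $v \in \mathbb{F}_2^n$ we will denote by $\varphi_M$ and $\varphi_{M,v}$ the invertible linear, respectively affine, transformation of $\mathbb{F}_2^n$ defined as $\varphi_M(x) = Mx$ and $\varphi_{M,v}(x) = Mx \oplus v$ respectively. There are $\prod_{i=0}^{n-1}(2^n - 2^i)$ invertible matrices and therefore $2^n \prod_{i=0}^{n-1}(2^n - 2^i)$ invertible affine transformations. Two functions $f, g : \mathbb{F}_2^n \to \mathbb{F}_2$ are called affine equivalent, written $f \sim g$, if $g = f \circ \varphi_{M,v}$ for some $M \in GL(n, \mathbb{F}_2)$ and $v \in \mathbb{F}_2^n$.
Later in the paper there will be situations where only the monomials of degree k or more of a polynomial are relevant, and any monomials of lower degree can be ignored. Combining that with affine equivalence, we also define the equivalence $f \sim_{k-1} g$ if there is a function h such that $f \sim h$ and $\deg(g \oplus h) \le k - 1$ (i.e. g and h coincide if we ignore any monomials of degree less than k).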

Degree testing and the degree density
We will define two notions: the "degree less than k" probabilistic test and the "average degree-k monomial density" of a function.We will then examine the relations between them.
The probability of f failing this test, taken over all $u_0, u_1, \ldots, u_k \in \mathbb{F}_2^n$, will be denoted $\mathrm{dt}_k(f)$. In other words,
$$\mathrm{dt}_k(f) = \frac{\left|\left\{(u_0, u_1, \ldots, u_k) \in (\mathbb{F}_2^n)^{k+1} : \bigoplus_{(c_1, \ldots, c_k) \in \mathbb{F}_2^k} f(u_0 \oplus c_1 u_1 \oplus \cdots \oplus c_k u_k) = 1\right\}\right|}{2^{n(k+1)}}. \qquad (4)$$
Remark 3 It is not hard to verify that if $u_1, u_2, \ldots, u_k$ are linearly dependent then any function f passes that particular $\deg(f) < k$ test. Therefore, in practice there is no need to run the test when they are linearly dependent. We could therefore define the test either with or without the requirement that $u_1, u_2, \ldots, u_k$ are linearly independent. We decided that both probabilities of failure, with or without the requirement, are of interest. There are at least two reasons why the probability of failure without the requirement that $u_1, u_2, \ldots, u_k$ are linearly independent, which we denoted $\mathrm{dt}_k(f)$, is of interest. Firstly, the case k = 2 and $u_0 = 0$ corresponds to what is usually called the BLR test; the probability of failing the BLR test is defined without the requirement that $u_1, u_2$ should be linearly independent (see for example [1]). Secondly, as we shall see in Proposition 10(i), the value of $\mathrm{dt}_k(f)$ does not change if the function f in n variables is viewed as a function in more than n variables. By contrast, this value would change if the definition required linearly independent vectors. The probability of failing the test when we require that $u_1, u_2, \ldots, u_k$ are linearly independent equals the quantity $\mathrm{add}_k(f)$ in Definition 4, see Theorem 8.
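Definition 4 Let $0 \le k \le n$ be integers and let $f : \mathbb{F}_2^n \to \mathbb{F}_2$ be a function. The degree-k monomial density of f, denoted $\mathrm{dd}_k(f)$, is defined as the number of monomials of degree k in the ANF of f, divided by $\binom{n}{k}$ (the total number of monomials of degree k in n variables). In other words, if $f = \bigoplus_t b_t t$ with t ranging over all monomials in n variables and $b_t \in \mathbb{F}_2$, then
$$\mathrm{dd}_k(f) = \frac{|\{t : \deg(t) = k,\ b_t = 1\}|}{\binom{n}{k}}.$$
The average degree-k monomial density of f, denoted by $\mathrm{add}_k(f)$, is the average (arithmetic mean) of $\mathrm{dd}_k(g)$ over all the functions g such that $f \sim g$, i.e.
$$\mathrm{add}_k(f) = \frac{1}{|\{g : f \sim g\}|} \sum_{g \sim f} \mathrm{dd}_k(g) = \frac{1}{2^n \prod_{i=0}^{n-1}(2^n - 2^i)} \sum_{M \in GL(n, \mathbb{F}_2)} \sum_{v \in \mathbb{F}_2^n} \mathrm{dd}_k(f \circ \varphi_{M,v}). \qquad (5)$$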
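As an illustration of the first part of Definition 4, the degree-k monomial density of a function given by its truth table can be computed with the standard in-place binary Moebius transform. This is a sketch for small n only; the truth-table encoding (index = integer-encoded input, bit i standing for $x_{i+1}$) and the function names are assumptions made here.

```python
from math import comb


def anf_coefficients(truth_table, n):
    """Binary Moebius transform: truth table of f -> list of ANF coefficients.

    Entry m of the result is the coefficient of the monomial whose variables
    are the set bits of m.
    """
    a = list(truth_table)
    for i in range(n):
        step = 1 << i
        for x in range(1 << n):
            if x & step:
                a[x] ^= a[x ^ step]
    return a


def monomial_density(truth_table, n, k):
    """dd_k(f): degree-k monomials present in f, divided by C(n, k)."""
    coeffs = anf_coefficients(truth_table, n)
    present = sum(1 for m in range(1 << n)
                  if coeffs[m] and bin(m).count("1") == k)
    return present / comb(n, k)


# Toy example: f = x1*x2 + x3 in 3 variables.
table = [0, 0, 0, 1, 1, 1, 1, 0]
print(anf_coefficients(table, 3))      # [0, 0, 0, 1, 1, 0, 0, 0]
print(monomial_density(table, 3, 2))   # 0.3333333333333333
```

Averaging `monomial_density` over all $f \circ \varphi_{M,v}$ would give $\mathrm{add}_k(f)$, but the number of affine transformations grows so quickly with n that this direct route is only practical for very small n.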
Remark 5 The two ways of defining $\mathrm{add}_k(f)$ in equation (5) are indeed equal. Namely, denote by A the cardinality of the stabilizer of f under the action of invertible affine transformations of $\mathbb{F}_2^n$. Each function g in the equivalence class of f is then obtained as $f \circ \varphi_{M,v}$ for exactly A pairs (M, v), so both the sum and the number of terms in the second expression are A times those in the first.
Remark 6 The notion of average degree-k monomial density has some similarity to, but is different from, the algebraic thickness of a function f, defined in [2]. The algebraic thickness is defined as the minimum number of monomials among all the functions g such that $f \sim g$. Both notions look at the number of monomials, but the average degree density looks at monomials of a given degree while the algebraic thickness looks at monomials of all degrees. Also, while both notions look at the whole equivalence class of f, the average degree density computes the average while the algebraic thickness computes the minimum.
The average degree-k monomial density is closely connected to the probability of failing the deg(f ) < k test.After a preliminary lemma, we will give the exact relationship in the next theorem.
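Lemma 7 Let $M \in GL(n, \mathbb{F}_2)$ be an invertible matrix and let $u_1, \ldots, u_k$ be the columns of M in positions $1 \le i_1 < \cdots < i_k \le n$. For any $u_0 \in \mathbb{F}_2^n$, the function f fails the $\deg(f) < k$ test at $(u_0, u_1, \ldots, u_k)$ if and only if the monomial $x_{i_1} \cdots x_{i_k}$ appears in the ANF of $f \circ \varphi_{M,u_0}$.
Proof. Using (2) and the fact that $M e_{i_j} = u_j$, we see that the coefficient b of $x_{i_1} \cdots x_{i_k}$ in $f \circ \varphi_{M,u_0}$ is
$$b = \bigoplus_{(c_1, \ldots, c_k) \in \mathbb{F}_2^k} (f \circ \varphi_{M,u_0})(c_1 e_{i_1} \oplus \cdots \oplus c_k e_{i_k}) = \bigoplus_{(c_1, \ldots, c_k) \in \mathbb{F}_2^k} f(u_0 \oplus c_1 u_1 \oplus \cdots \oplus c_k u_k),$$
which is exactly the sum computed by the test.
Theorem 8 The average degree-k monomial density of a function f equals the probability of failing the test for which the vectors $(u_1, u_2, \ldots, u_k)$ are linearly independent. In other words:
$$\mathrm{add}_k(f) = \frac{\mathrm{dt}_k(f)}{\prod_{i=0}^{k-1}\left(1 - 2^{i-n}\right)}.$$
Proof. We aim to count all the monomials of degree k in $f \circ \varphi_{M,v}$ for all $M \in GL(n, \mathbb{F}_2)$ and $v \in \mathbb{F}_2^n$; let us call this number $N_1$. In other words, using (5), $N_1$ is such that
$$\mathrm{add}_k(f) = \frac{N_1}{\binom{n}{k}\, 2^n \prod_{i=0}^{n-1}(2^n - 2^i)}. \qquad (7)$$
Then we compare this number with the number of tuples of k + 1 vectors $(u_0, u_1, u_2, \ldots, u_k) \in (\mathbb{F}_2^n)^{k+1}$ for which the test fails; let us call this number $N_2$. Using (4) we have
$$\mathrm{dt}_k(f) = \frac{N_2}{2^{n(k+1)}}. \qquad (8)$$
Note that the fact that the test for $(u_0, u_1, u_2, \ldots, u_k)$ fails implies that $u_1, u_2, \ldots, u_k$ are linearly independent (see Remark 3).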
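For each fixed $(u_0, u_1, u_2, \ldots, u_k) \in (\mathbb{F}_2^n)^{k+1}$, we know from Lemma 7 that the test fails if and only if the monomial $x_{i_1} \cdots x_{i_k}$ appears in $f \circ \varphi_{M,u_0}$ for some invertible matrix M which has the columns $u_1, u_2, \ldots, u_k$ in some positions $1 \le i_1 < \cdots < i_k \le n$. The number of invertible matrices M that have the vectors $u_1, u_2, \ldots, u_k$ appearing (in this order) as some of their columns is $\binom{n}{k} \prod_{i=k}^{n-1}(2^n - 2^i)$ (there are $\binom{n}{k}$ ways of choosing the positions where these k columns appear, and for each of them we can choose incrementally the remaining n − k columns such that each newly chosen column is not in the vector space generated by the previously chosen columns, to ensure that the final matrix is invertible). Therefore each fixed $(u_0, u_1, u_2, \ldots, u_k)$ for which the test fails corresponds to $\binom{n}{k} \prod_{i=k}^{n-1}(2^n - 2^i)$ monomials in polynomials of the form $f \circ \varphi_{M,u_0}$, and thus
$$N_1 = \binom{n}{k} \prod_{i=k}^{n-1}(2^n - 2^i)\, N_2.$$
Combining this relation with (7) and (8) yields the desired result. Note that the probability of an arbitrary k-tuple of vectors $u_1, u_2, \ldots, u_k$ to be linearly independent is $\prod_{i=0}^{k-1}\left(1 - 2^{i-n}\right)$.
From Lemma 7 and Theorem 8 above and the fact that the degree is invariant to invertible affine transformations, we can deduce:
Corollary 9 If $\deg(f) < k$ then $\mathrm{add}_k(f) = \mathrm{dt}_k(f) = 0$. Moreover, $\mathrm{add}_k$ and $\mathrm{dt}_k$ are invariant under the equivalence $\sim_{k-1}$.
We prove some useful properties of $\mathrm{dt}_k(f)$:
Proposition 10 (i) Let $f : \mathbb{F}_2^n \to \mathbb{F}_2$ and let $g(x_1, \ldots, x_{n+1}) = f(x_1, \ldots, x_n)$. Then $\mathrm{dt}_k(g) = \mathrm{dt}_k(f)$.
(ii) Let $g = g_1 \oplus g_2$ where $g_1$ and $g_2$ depend on disjoint sets of variables. Then $\mathrm{dt}_k(g) = \mathrm{dt}_k(g_1) + \mathrm{dt}_k(g_2) - 2\, \mathrm{dt}_k(g_1)\, \mathrm{dt}_k(g_2)$.
Proof. (i) Whether the $\deg(g) < k$ test passes or fails at $v_0, v_1, \ldots, v_k$ only depends on the projection of the vectors on the first n coordinates, as f only depends on those coordinates. Let us denote by $u_0, u_1, \ldots, u_k$, respectively, the projections of $v_0, v_1, \ldots, v_k$ on the first n coordinates. For each $u_0, u_1, \ldots, u_k$, there are $2^{k+1}$ possible values for $v_0, v_1, \ldots, v_k$ which have the same projections $u_0, u_1, \ldots, u_k$. So the number of $v_0, v_1, \ldots, v_k$ vectors on which the $\deg(g) < k$ test fails, let us call it $N_1$, equals $2^{k+1} N_2$, where we denoted by $N_2$ the number of $u_0, u_1, \ldots, u_k$ vectors on which the $\deg(f) < k$ test fails. Therefore:
$$\mathrm{dt}_k(g) = \frac{N_1}{2^{(n+1)(k+1)}} = \frac{2^{k+1} N_2}{2^{(n+1)(k+1)}} = \frac{N_2}{2^{n(k+1)}} = \mathrm{dt}_k(f).$$
(ii) Since the sets of variables of the two functions $g_1$ and $g_2$ are disjoint, their values can be viewed as independent events. The function g fails the test if and only if exactly one of the functions $g_1$ or $g_2$ fails the test, so $\mathrm{dt}_k(g) = \mathrm{dt}_k(g_1)(1 - \mathrm{dt}_k(g_2)) + (1 - \mathrm{dt}_k(g_1))\, \mathrm{dt}_k(g_2)$.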

Probability of failing the deg(f ) < k test when f has degree k
The $\deg(f) < k$ test would be used as follows: we run the $\deg(f) < k$ test a number of times, say t times, for different choices of vectors $(u_0, \ldots, u_k) \in (\mathbb{F}_2^n)^{k+1}$ (chosen independently, with uniform distribution). If f passes all the t tests, we conclude that we probably have $\deg(f) < k$; if f fails at least one of the tests we conclude that $\deg(f) \ge k$.
When $\deg(f)$ is truly below k, the function f will always pass the $\deg(f) < k$ test as, by Corollary 9, we have $\mathrm{dt}_k(f) = 0$. In other words, there are no false negatives. However, when the true degree of f is at least k, it is possible to wrongly conclude that $\deg(f) < k$ (false positive). The probability of that happening is $(1 - \mathrm{dt}_k(f))^t$. It is therefore important to determine $\mathrm{dt}_k(f)$ for polynomials of degree k or more. In this paper we commence the study of $\mathrm{dt}_k(f)$ by proving results for the case when $\deg(f) = k$; the case $\deg(f) > k$ will be the subject of further work.
Throughout this section we assume that $\deg(f) = k$. Note that for affine invertible transformations $\varphi_{M,u_0}(x) = Mx \oplus u_0$, the monomials of maximum degree in $f \circ \varphi_{M,u_0}$ are the same regardless of the value of $u_0$, and therefore the same as the ones in $f \circ \varphi_M$, where $\varphi_M(x) = Mx$ is a linear invertible transformation. Therefore, when we study only the monomials of maximum degree, it is sufficient to look at linear, rather than affine transformations, so when $\deg(f) = k$ the equation (5) becomes:
$$\mathrm{add}_k(f) = \frac{1}{\prod_{i=0}^{n-1}(2^n - 2^i)} \sum_{M \in GL(n, \mathbb{F}_2)} \mathrm{dd}_k(f \circ \varphi_M).$$
Recall from Corollary 9 that $\mathrm{add}_k(f)$ and $\mathrm{dt}_k(f)$ are invariant under the equivalence $\sim_{k-1}$, i.e. $f \sim_{k-1} g$ implies $\mathrm{add}_k(f) = \mathrm{add}_k(g)$ and $\mathrm{dt}_k(f) = \mathrm{dt}_k(g)$. When considering polynomials of degree k under $\sim_{k-1}$ equivalence, it suffices to consider representatives which only contain monomials of degree k, i.e. are homogeneous. In this context, the following construction was used extensively in the classification in [6] and [8]. Assume f is homogeneous of degree k and write its algebraic normal form as $f(x_1, \ldots, x_n) = \bigoplus_t b_t t$, with t ranging over all monomials of degree k in the variables $x_1, \ldots, x_n$. Define
$$f^c(x_1, \ldots, x_n) = \bigoplus_t b_t\, \frac{x_1 x_2 \cdots x_n}{t},$$
so that each monomial t is replaced by the product of the variables that do not appear in t (this is sometimes called the complement of f, but it should not be confused with the Boolean complement, which is $f \oplus 1$).
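We prove some additional useful properties of $\mathrm{dt}_k(f)$:
Proposition 11 Let $f : \mathbb{F}_2^n \to \mathbb{F}_2$ be a function of degree k.
(i) If $g(x_1, \ldots, x_{n+1}) = x_{n+1} f(x_1, \ldots, x_n)$, then $\mathrm{dt}_{k+1}(g) = \left(1 - \frac{1}{2^{k+1}}\right) \mathrm{dt}_k(f)$.
(ii) If f is homogeneous, then $\mathrm{dt}_{n-k}(f^c) \prod_{i=0}^{k-1}\left(1 - 2^{i-n}\right) = \mathrm{dt}_k(f) \prod_{i=0}^{n-k-1}\left(1 - 2^{i-n}\right)$.
Proof. (i) Consider k + 1 vectors $(u_0, u_1, \ldots, u_k) \in (\mathbb{F}_2^n)^{k+1}$ such that $u_1, u_2, \ldots, u_k$ are linearly independent. Running the $\deg(f) < k$ test on these vectors involves adding the values of f on all vectors in the affine space $U = u_0 \oplus \langle u_1, u_2, \ldots, u_k \rangle$. The result will be the same for all the other (k + 1)-tuples which generate the same affine space U; there are $2^k \prod_{i=0}^{k-1}(2^k - 2^i)$ such tuples. Let us denote by $L_{k+1}(g)$ the set of vector spaces of dimension k + 1 on which the $\deg(g) < k + 1$ test fails, and denote by $A_k(f)$ the set of affine spaces of dimension k on which the $\deg(f) < k$ test fails. Since $\deg(g) = k + 1$, the outcome of the $\deg(g) < k + 1$ test on an affine space $v_0 \oplus V$ does not depend on $v_0$. We have:
$$\mathrm{dt}_k(f) = \frac{|A_k(f)|\, 2^k \prod_{i=0}^{k-1}(2^k - 2^i)}{2^{n(k+1)}}, \qquad \mathrm{dt}_{k+1}(g) = \frac{|L_{k+1}(g)|\, 2^{n+1} \prod_{i=0}^{k}(2^{k+1} - 2^i)}{2^{(n+1)(k+2)}}.$$
All we have to do now is to show that the sets $L_{k+1}(g)$ and $A_k(f)$ have the same cardinality.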
Let $V \in L_{k+1}(g)$. Consider the space $V_0 = \{v \in V : \psi(v) = 0\}$, where $\psi(v)$ denotes the last coordinate of $v \in \mathbb{F}_2^{n+1}$ and $\pi(v)$ its projection on the first n coordinates. Note that $g(v) = 0$ for all $v \in V_0$, as $g(x_1, \ldots, x_n, x_{n+1}) = 0$ whenever $x_{n+1} = 0$. Since the test fails on V, g must be non-zero on at least some elements of V. Therefore $V_0$ must be a proper subspace of V, and has dimension $\dim(V) - 1 = k$. Moreover, we can uniquely partition V into the linear space $V_0$ and the affine space $V \setminus V_0 = w \oplus V_0$, where w is any vector $w \in V \setminus V_0$. We have
$$\bigoplus_{v \in V} g(v) = \bigoplus_{v \in V \setminus V_0} g(v) = \bigoplus_{u \in \pi(V \setminus V_0)} f(u). \qquad (10)$$
We have proved therefore that if g fails on V then f fails on $U = \pi(V \setminus V_0)$.
Conversely, for any affine space $U \in A_k(f)$, we construct the space $V \in L_{k+1}(g)$ as follows. For any affine space U of dimension k there is a unique linear space $U_0$ of dimension k such that U can be written as $U = u_0 \oplus U_0$ for some $u_0 \in U$ (in fact any $u_0 \in U$ will work). We then define $V_0$ as the k-dimensional linear space $\{v \in \mathbb{F}_2^{n+1} : \pi(v) \in U_0,\ \psi(v) = 0\}$. We pick one vector $u_0 \in U$ and define w as the vector satisfying $\pi(w) = u_0$, $\psi(w) = 1$. Finally we define $V = V_0 \cup (w \oplus V_0)$, which is a linear space of dimension k + 1. Note that V is the same regardless of which $u_0 \in U$ was chosen. As in (10) above, one can see that if f fails the test on U then g fails the test on V.
(ii) In [6, page 110] it was proven that the orbit of $f^c$ under the equivalence $\sim_{n-k-1}$ has the same cardinality as the orbit of f under $\sim_{k-1}$, and moreover, the orbit of $f^c$ is $\{h^c : h \sim_{k-1} f\}$. Since h and $h^c$ have the same number of monomials, from Definition 4 we have that $\mathrm{add}_k(f) = \mathrm{add}_{n-k}(f^c)$. We then apply Theorem 8 to obtain the required result.
Propositions 10 and 11 above allow us to compute the values of $\mathrm{dt}_k(f)$ for some simple functions f.
Example 13 Using Corollary 12 above we compute the exact values of $\mathrm{dt}_k(f)$ for some functions f. For example, for the function f of Example 1 we have $\mathrm{dt}_3(f) \approx 0.47969$. This means that after running the $\deg(f) < 3$ test 9 times on this f we have only a $(1 - 0.47969)^9 \approx 0.0028$ probability of incorrectly deciding that $\deg(f) < 3$; compare that with a probability of 0.72 for the original test, as explained in Example 1.
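The value 0.47969 can be reproduced from two facts used in the text: a sum of two functions on disjoint variables fails the test exactly when one of them fails (independently), and a single monomial of degree 3 has $\mathrm{dt}_3 = 0.328125$. The sketch below assumes the closed form $\prod_{i=1}^{k}(1 - 2^{-i})$ for a monomial of degree k, which agrees with the values 0.328125 and 0.307617 reported for monomials in this paper; the helper names are illustrative.

```python
def monomial_dt(k):
    """Assumed closed form for dt_k of a single degree-k monomial."""
    p = 1.0
    for i in range(1, k + 1):
        p *= 1 - 2.0 ** (-i)
    return p


def combine_disjoint(p, q):
    """dt_k of g1 + g2 on disjoint variables: exactly one of the two fails."""
    return p * (1 - q) + (1 - p) * q


p_monomial = monomial_dt(3)                                   # 0.328125
dt_example = combine_disjoint(combine_disjoint(p_monomial, p_monomial),
                              p_monomial)                     # three monomials
p_wrong = (1 - dt_example) ** 9                               # 9 runs all pass

print(p_monomial)                 # 0.328125
print(round(dt_example, 5))       # 0.47969
print(round(p_wrong, 4))          # 0.0028
```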
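We will now prove lower and upper bounds for $\mathrm{dt}_k(f)$:
Theorem 14 Let $f : \mathbb{F}_2^n \to \mathbb{F}_2$ be a function of degree $k \ge 1$. Then
$$\prod_{i=1}^{k}\left(1 - \frac{1}{2^i}\right) \le \mathrm{dt}_k(f) \le \frac{1}{2}\left(1 - \frac{1}{2^n}\right)^{k-1}.$$
The lower bound is achieved if and only if $f \sim_{k-1} x_1 x_2 \cdots x_k$.
Proof. The proof will be by induction on k. We consider first the case when k = 1.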
Recall that the normalised Hamming weight of f is defined as the proportion of its inputs that produce non-zero outputs, i.e.:
$$\mathrm{wt}(f) = \frac{|\{x \in \mathbb{F}_2^n : f(x) \neq 0\}|}{2^n}.$$
The probability of failing this test, over all $u \in \mathbb{F}_2^n$, is equal to $\mathrm{wt}(f)$ if $f(0) = 0$ and it is equal to $1 - \mathrm{wt}(f)$ if $f(0) = 1$.
Since f has degree 1, i.e. it is an affine non-constant function, its weight is $\frac{1}{2}$. So $\mathrm{dt}_1(f) = \frac{1}{2}$.
Now consider an arbitrary degree k and assume the statement holds for degrees less than k. Recall that the discrete derivative of f in a direction $a \in \mathbb{F}_2^n$ is defined as $D_a f(x) = f(x \oplus a) \oplus f(x)$ (usually the case a = 0 is excluded, but here we will allow it, and obviously the derivative is identically zero when a = 0).
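The $\deg(f) < k$ test for $(u_0, u_1, \ldots, u_k)$ gives the same result as the $\deg(D_{u_k} f) < k - 1$ test for $(u_0, u_1, \ldots, u_{k-1})$, so
$$\mathrm{dt}_k(f) = \frac{1}{2^n} \sum_{u_k \in \mathbb{F}_2^n} \mathrm{dt}_{k-1}(D_{u_k} f). \qquad (11)$$
Recall that a fast point of f is a non-zero vector u such that $\deg(D_u f) < \deg(f) - 1$. In [11] it was shown that the number of fast points for a function f of degree k in n variables can vary from zero to at most $2^{n-k} - 1$, the latter being achieved if and only if $f \sim_{k-1} x_1 x_2 \cdots x_k$. Let us denote by S(f) the vectors in $\mathbb{F}_2^n \setminus \{0\}$ that are not fast points for f. We have therefore
$$2^n - 2^{n-k} \le |S(f)| \le 2^n - 1. \qquad (12)$$
Back to the equation (11), using the fact that whenever $u_k$ is a fast point for f we have $\deg(D_{u_k} f) < k - 1$ and therefore $\mathrm{dt}_{k-1}(D_{u_k} f) = 0$, we have that
$$\mathrm{dt}_k(f) = \frac{1}{2^n} \sum_{u_k \in S(f)} \mathrm{dt}_{k-1}(D_{u_k} f). \qquad (13)$$
Since $\deg(D_{u_k} f) = k - 1$ for all $u_k \in S(f)$, the induction hypothesis gives
$$\prod_{i=1}^{k-1}\left(1 - \frac{1}{2^i}\right) \le \mathrm{dt}_{k-1}(D_{u_k} f) \le \frac{1}{2}\left(1 - \frac{1}{2^n}\right)^{k-2}.$$
Combining this with equation (13) we obtain
$$\frac{|S(f)|}{2^n} \prod_{i=1}^{k-1}\left(1 - \frac{1}{2^i}\right) \le \mathrm{dt}_k(f) \le \frac{|S(f)|}{2^n} \cdot \frac{1}{2}\left(1 - \frac{1}{2^n}\right)^{k-2}$$
and finally using (12) we obtain the bounds in the theorem's statement. Note that the lower bound can only be achieved when $|S(f)| = 2^n - 2^{n-k}$, i.e. when f has the maximum number of fast points.
Example 15 For functions f in 8 variables, Theorem 14 above shows that when f has degree 4 we have $0.307617 \le \mathrm{dt}_4(f) \le 0.4941635$; when f has degree 3 we have $0.328125 \le \mathrm{dt}_3(f) \le 0.496101379$.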
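The figures in Example 15 can be checked numerically. The sketch below assumes that the bounds have the closed forms $\prod_{i=1}^{k}(1 - 2^{-i})$ (lower) and $\frac{1}{2}(1 - 2^{-n})^{k-1}$ (upper); these expressions are an assumption made here, chosen because they reproduce all four numbers quoted in the example and give $\frac{1}{2}$ on both sides for k = 1. The function name is illustrative.

```python
def degree_k_bounds(n, k):
    """Assumed lower and upper bounds on dt_k(f) for deg(f) = k in n variables."""
    lower = 1.0
    for i in range(1, k + 1):
        lower *= 1 - 2.0 ** (-i)
    upper = 0.5 * (1 - 2.0 ** (-n)) ** (k - 1)
    return lower, upper


for k in (3, 4):
    lo, hi = degree_k_bounds(8, k)
    print(k, round(lo, 6), round(hi, 9))
# 3 0.328125 0.496101379
# 4 0.307617 0.494163483
```

Under these assumed forms the lower bound does not depend on n and decreases with k towards about 0.288788, while the upper bound tends to 0.5 as n grows.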
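If we are interested in lower and upper bounds for $\mathrm{dt}_k(f)$ which do not depend on either k or the number of variables n, Theorem 14 implies:
Corollary 16 Let f be a function of degree $k \ge 1$. Then
$$\prod_{i=1}^{\infty}\left(1 - \frac{1}{2^i}\right) < \mathrm{dt}_k(f) \le \frac{1}{2},$$
where the infinite product is approximately 0.288788.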

Numerical results
We computed the values of dt k (f ) and add k (f ) for all functions of degree k ≥ 1 in 8 variables.Due to the invariance to ∼ k−1 , it suffices to compute these values for one representative from each class.
For degree k = 3 we used the 31 representatives of equivalence classes of polynomials in 8 variables listed in [6].For degree k = 4 we used the 998 equivalence classes of polynomials of degree k = 4 in 8 variables listed in [8].
For each function f of degree k, we computed $\mathrm{add}_k(f)$ by picking one basis $u_1, \ldots, u_k$ for each of the $\begin{bmatrix} 8 \\ k \end{bmatrix}_2$ vector spaces of $\mathbb{F}_2^8$ of dimension k, and then running the $\deg(f) < k$ test for each such basis. We then computed $\mathrm{dt}_k(f)$ using Theorem 8.
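For very small n, $\mathrm{dt}_k(f)$ can also be obtained by brute force directly from its definition, which is useful as a sanity check. Note that this is not the subspace-based method described above: the sketch simply enumerates all $2^{n(k+1)}$ tuples $(u_0, \ldots, u_k)$, so it is far too slow for n = 8. The function names and the integer encoding of vectors are assumptions made here.

```python
from itertools import product


def exact_dt(f, n, k):
    """Exact dt_k(f) by enumerating every tuple (u_0, ..., u_k) in (F_2^n)^(k+1)."""
    size = 1 << n
    fails = 0
    for tup in product(range(size), repeat=k + 1):
        u0, us = tup[0], tup[1:]
        total = 0
        for c in range(1 << k):          # sum f over all combinations of u_1..u_k
            x = u0
            for j in range(k):
                if (c >> j) & 1:
                    x ^= us[j]
            total ^= f(x)
        fails += total                   # total is 1 exactly when the test fails
    return fails / size ** (k + 1)


def monomial3(x):                        # x1*x2*x3 in 3 variables
    return (x & 1) & ((x >> 1) & 1) & ((x >> 2) & 1)


def two_products(x):                     # x1*x2 + x3*x4 in 4 variables
    return ((x & 1) & ((x >> 1) & 1)) ^ (((x >> 2) & 1) & ((x >> 3) & 1))


print(exact_dt(monomial3, 3, 3))         # 0.328125
print(exact_dt(two_products, 4, 2))      # 0.46875
```

The first value matches the smallest $\mathrm{dt}_3$ reported for 8 variables, as expected since $\mathrm{dt}_k(f)$ does not change when f is viewed as a function in more variables.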
Table 1 lists the values of add 3 (f ) and dt 3 (f ) for each of the 31 non-zero representatives of degree 3 in 8 variables; they are listed in increasing order of dt 3 (f ).The values of dt 3 (f ) range from 0.328125 to 0.489626, with all but one polynomial (namely the one that consists of a single monomial) having dt 3 (f ) in the interval [0.4,0.5].As expected from Theorem 14, there are no values over 0.5.We note that there are only 16 different values; some classes do have the same value of dt 3 (f ).
For the 998 polynomials of degree 4 in 8 variables, there are 54 different values of $\mathrm{dt}_4(f)$, ranging from 0.307617 to 0.480051, with all polynomials having $\mathrm{dt}_4(f)$ in the interval [0.4, 0.5] except for two polynomial classes, namely the class of a monomial, e.g. $x_1x_2x_3x_4$, and the class of $x_1x_2(x_3x_4 \oplus x_5x_6)$, which have the values 0.307617 and 0.384521, as expected from Corollary 12. As expected from Theorem 14, there are no values over 0.5.
Histograms are given in Figures 1 and 2 in the Appendix. The first histogram shows, for equally sized intervals, the number of polynomials that have $\mathrm{dt}_k(f)$ in that interval. We notice that the vast majority of classes have $\mathrm{dt}_4(f)$ between 0.462808 and 0.480051. The second histogram shows, for each possible value of $\mathrm{dt}_4(f)$, the number of classes that have that particular value. This allows us to see whether $\mathrm{dt}_k(f)$ could be used to distinguish between equivalence classes (recall that $\mathrm{dt}_k(f)$ is invariant to the equivalence $\sim_{k-1}$). Given f, g, if $\mathrm{dt}_k(f) \neq \mathrm{dt}_k(g)$, we know that f and g are inequivalent. However, if $\mathrm{dt}_k(f) = \mathrm{dt}_k(g)$ we are unable to use this invariant to decide whether f and g are equivalent or not. Invariants where not many classes have the same value of the invariant are therefore preferable for this purpose (several invariants have been used in the classification of [6] and [8]). Unfortunately, as seen from Figure 2, $\mathrm{dt}_k(f)$ is not particularly suited for distinguishing classes, as there are many classes with the same value.
For degree k = 5 there are again 31 representatives, which are obtained as $f^c$ with f running through the 31 degree 3 representatives mentioned above. Using Proposition 11(ii), one can compute $\mathrm{dt}_5(f^c) = \mathrm{dt}_3(f)\left(1 - \frac{1}{2^4}\right)\left(1 - \frac{1}{2^5}\right) = 0.908203125\, \mathrm{dt}_3(f)$. Similarly, for degree k = 6 there are 4 representatives, which are obtained from the degree 2 representatives by $\mathrm{dt}_6(f^c) = \mathrm{dt}_2(f) \prod_{i=3}^{6}\left(1 - \frac{1}{2^i}\right)$.
Let $\{i_1, i_2, \ldots, i_k\}$ be the support of a, i.e. $a_i = 1$ if and only if $i \in \{i_1, i_2, \ldots, i_k\}$. In other words $b_{(a_1, \ldots, a_n)}$ is the coefficient of $x_{i_1} x_{i_2} \cdots x_{i_k}$. Denote by $e_1, \ldots, e_n$ the canonical basis of $\mathbb{F}_2^n$, i.e. $e_i$ has a 1 in position i and zeroes elsewhere. Then
$$b_{(a_1, \ldots, a_n)} = \bigoplus_{x \in \langle e_{i_1}, \ldots, e_{i_k} \rangle} f(x). \qquad (2)$$