Higher order differentiation over finite fields with applications to generalising the cube attack

Higher order differentiation was introduced in a cryptographic context by Lai. Several attacks can be viewed in the context of higher order differentiations, amongst them the cube attack of Dinur and Shamir and the AIDA attack of Vielhaber. All of the above have been developed for the binary case. We examine differentiation in larger fields, starting with the field GF(p) of integers modulo a prime p, and apply these techniques to generalising the cube attack to GF(p). The crucial difference is that now the degree in each variable can be higher than one, and our proposed attack will differentiate several times with respect to each variable (unlike the classical cube attack and its larger field version described by Dinur and Shamir, both of which differentiate at most once with respect to each variable). Connections to the Moebius/Reed Muller Transform over GF(p) are also examined.
Finally we describe differentiation over finite fields $\mathrm{GF}(p^s)$ with $p^s$ elements and show that it can be reduced to differentiation over GF(p), so a cube attack over $\mathrm{GF}(p^s)$ would be equivalent to cube attacks over GF(p).


Introduction
The original motivation for this work was to generalise the cube attack from the binary field to arbitrary finite fields. While doing so, we developed a number of tools and results for differentiation over finite fields which could have a broader applicability in cryptography.
Higher order differentiation was introduced in a cryptographic context by Lai in [14] (called there higher order derivative). This notion had already been used for a very long time, under the name of finite difference, in other areas of mathematics (notably for the numerical approximation of the derivative).
The finite difference for a function f is defined as the function $(\Delta_a f)(x) = f(x + a) - f(x)$, for a fixed difference a (the domain and codomain of f are commutative groups in additive notation). Usually f is a function of n variables, so $x = (x_1, \ldots, x_n)$ and $a = (a_1, \ldots, a_n)$. An important particular case is the finite difference with respect to one variable, namely $a = h e_i$, where the difference step h is a scalar constant (equal to 1 by default) and $e_i$ are the elementary vectors having a 1 in position i and zeroes elsewhere. Higher order differentiation means repeated application of the finite difference operator.
The functions we use here are functions in several variables over a finite field. Any such function can be represented as a polynomial function, and after a sufficiently high number of applications of the finite difference operator the result is the identically zero function. However, for certain choices of differences a this can happen prematurely; for example, over the binary field GF(2), differentiating twice using the same difference a will always result in the zero function, regardless of the original function f. For our applications, we need to ensure that this does not happen prematurely.
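The premature collapse described above can be observed numerically. The following is a minimal sketch, assuming functions are given as Python callables on tuples of residues; the helper name `finite_difference` and the two sample polynomials are illustrative choices, not taken from the paper.

```python
def finite_difference(f, a, p):
    """Return the function x -> f(x + a) - f(x), with arithmetic modulo p."""
    def delta_f(x):
        shifted = tuple((xi + ai) % p for xi, ai in zip(x, a))
        return (f(shifted) - f(x)) % p
    return delta_f

# Hypothetical binary example: f(x1, x2) = x1*x2 + x1 over GF(2).
def f2(x):
    return (x[0] * x[1] + x[0]) % 2

a = (1, 1)
twice = finite_difference(finite_difference(f2, a, 2), a, 2)
# Differentiating twice with the same difference a collapses to zero over GF(2).
collapsed = [twice((u, v)) for u in range(2) for v in range(2)]

# Hypothetical example over GF(5): f(x) = x^3, differentiated twice with step 1.
def f5(x):
    return (x[0] ** 3) % 5

e1 = (1,)
second = finite_difference(finite_difference(f5, e1, 5), e1, 5)
# Over the integers the second difference of x^3 is 6x + 6, i.e. x + 1 modulo 5.
values_mod5 = [second((x,)) for x in range(5)]
```

In this sketch the binary case yields the zero function, whereas over GF(5) the same repeated difference still has degree one.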
A number of cryptographic attacks can be reformulated using higher order differentiation. Differential cryptanalysis (introduced by Biham and Shamir [3]) has been thus reformulated by Lai in [14]; the cube attack of Dinur and Shamir [5] and the related AIDA attack of Vielhaber [17] have been reformulated in Knellwolf and Meier [11], Duan and Lai [7].
Our main motivation came from the cube attack. In both the cube attack and the AIDA attack we have a "black box" function f in several public and secret variables and we select a set of indices of public variables $I = \{i_1, \ldots, i_k\}$. Then f is evaluated at each point of a "cube" consisting of the vectors that have all the possible combinations of 0/1 values for the variables with index in I, whereas the remaining variables are left indeterminate; the resulting values are summed and the sum will be denoted $f_I$. The attacks hope that for suitable choices of subsets I of public variables, the resulting $f_I$ is linear in the secret variables, for the cube attack (or equals one secret variable or the product of several secret variables, for the AIDA attack). This situation is particularly likely when the cardinality of I is just marginally lower than the total degree of the function. Such subsets I are found in a preprocessing phase, where the values of the keys can be chosen freely. In the online phase the key variables are unknown, and by computing the $f_I$ for the sets I identified in the preprocessing phase, one obtains a system of linear equations in the key variables.
It was shown (see Knellwolf and Meier [11], Duan and Lai [7]) that choosing the variable indices $I = \{i_1, \ldots, i_k\}$ and computing $f_I$ (as described above) is equivalent to computing the k-order finite difference of f with respect to the elementary vectors $e_{i_1}, \ldots, e_{i_k}$, i.e. by differentiating once with respect to $x_{i_1}$, then w.r.t. $x_{i_2}$ and so on, finally differentiating w.r.t. $x_{i_k}$.
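The equivalence between the cube sum and the k-order finite difference can be checked on a toy example. The sketch below assumes a hypothetical binary polynomial in four variables (indexed from 0 in the code) and illustrative helper names; it is not the implementation used in the cited papers.

```python
from itertools import product

def cube_sum(f, I, x):
    """XOR of f over the cube C_I; variables outside I are fixed as in x."""
    total = 0
    for bits in product((0, 1), repeat=len(I)):
        point = list(x)
        for idx, b in zip(I, bits):
            point[idx] = b
        total ^= f(tuple(point))
    return total

def diff_binary(f, i):
    """Finite difference of f w.r.t. the variable with index i, step 1, over GF(2)."""
    def g(x):
        y = list(x)
        y[i] ^= 1
        return f(tuple(y)) ^ f(x)
    return g

# Hypothetical polynomial: f = x0*x1*(x2 + x3) + x1*x3 + x2 over GF(2).
def f(x):
    return (x[0] & x[1] & x[2]) ^ (x[0] & x[1] & x[3]) ^ (x[1] & x[3]) ^ x[2]

I = [0, 1]
second_difference = diff_binary(diff_binary(f, 0), 1)
agree = all(cube_sum(f, I, x) == second_difference(x)
            for x in product((0, 1), repeat=4))
```

For this choice of f and I both computations return x2 + x3, which is linear in the remaining variables.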
All the attacks above, as well as the higher order differentiation used in cryptography, are over the binary field. While all cryptographic functions can be viewed as binary functions, there are a number of ciphers which make significant use of operations modulo a prime p > 2 in their internal processing, for example ZUC [9], IDEA [15,16], MMB [4]. It may therefore be advantageous for such ciphers to also be viewed and analysed as functions over GF(p), the field of integers modulo p. Unlike the binary case, a polynomial function can now have degree more than one in each variable; in fact it can have degree up to p − 1 in each variable. There are yet other ciphers which use operations over Galois fields of the form $\mathrm{GF}(p^m)$, for example SNOW [8], and in such fields the degree of the polynomial functions can be up to $p^m - 1$ in each variable.
A first generalisation of the cube attack to $\mathrm{GF}(p^m)$ was sketched by Dinur and Shamir in [5, page 284] and also developed more explicitly by Agnese and Pedicini [1]. We show that their approach can again be viewed as k-order differentiation, where we differentiate once with respect to each of the variables $x_{i_1}, \ldots, x_{i_k}$. However, we argue that their generalisation, while correct, has very low chances of leading to a successful attack because the function is not differentiated sufficiently many times. Namely, on one hand, as in the binary case, the best chances of success are when the function is differentiated a number of times just marginally lower than its total degree; on the other hand, in their proposed scheme the number of times that the function is differentiated is upper bounded by the number of variables, which (unlike the binary case) can be significantly lower than the degree of the function (see Remark 12).
Our proposed generalisation of the cube attack to GF(p) improves the chances of success by differentiating several times with respect to each of the chosen variables. Thus there is no intrinsic limit on the number of differentiations and therefore this number can be as close as we want to the degree of the polynomial (only limited by the computing power available).
We first examine higher order differentiation in GF(p) (Section 4.1). We show that for repeated differentiation with respect to the same variable, we can use any non-zero difference steps and the degree will decrease by exactly one for each differentiation. Choosing all the steps equal to one gives a compact and efficient formula for evaluating the higher order differentiation for a "black box" function.
We then show, in Section 4.2, that the main result of the classical cube attack, [5, Theorem 1], no longer holds when we differentiate repeatedly with respect to the same variable in GF(p); Example 20 gives a counterexample. However, we show that a similar result does hold, see Theorem 21. Also, just like in the binary case, if the "black box" function has total degree d, differentiating d − 1 times with respect to public variables always results in a function which is either constant or linear in the secret variables. The resulting algorithm is sketched in Section 4.3. Now we not only choose variables for the "cube" but we also choose the number of times we are going to differentiate with respect to each variable. For computational efficiency, choosing only one variable (or a small number of variables) and differentiating a large number of times with respect to that variable is preferable. In GF(p) probabilistic linearity testing has a smaller expected number of tests than in GF(2), see [10].
While this paper concentrates on generalising the cube attack, other attacks that use differentiation could also be generalised to GF(p) using our technique, for example cube testers (see [2]) or differential cryptanalysis.
Finally, for completeness, we deal with generalisations to finite fields of the form $\mathrm{GF}(p^m)$ in Section 5.2. Here, for functions such as $x^d$ with $p \mid d$, differentiation with respect to x decreases the degree by more than one regardless of the difference step. We give a more precise expression of the decrease in degree for higher order differentiation depending on the representation of the degree in base p. Any function can be differentiated at most m(p − 1) times before it becomes identically zero. Moreover, in order to avoid the result becoming identically zero even earlier, the difference steps will be chosen as follows: p − 1 steps equal to $b_0$, p − 1 steps equal to $b_1$ and so on, where $b_0, \ldots, b_{m-1}$ is a basis of $\mathrm{GF}(p^m)$ when viewed as a vector space over GF(p). We can thus differentiate m(p − 1) times. Due to the fact that differentiation only uses the additive group of $\mathrm{GF}(p^m)$, which is isomorphic to $\mathrm{GF}(p)^m$, differentiation over $\mathrm{GF}(p^m)$ can in fact be reduced to differentiation over GF(p) in each component of the projection of the function f. Therefore, we feel that developing a cube attack in $\mathrm{GF}(p^m)$, while possible, does not bring any additional advantages compared to a cube attack in GF(p).

Preliminaries
Throughout this paper R denotes an arbitrary commutative ring with identity and $\mathrm{GF}(p^m)$ denotes the finite field with $p^m$ elements, where p is prime. We denote by $e_i = (0, \ldots, 0, 1, 0, \ldots, 0) \in R^n$ the vector which has a 1 in position i and zeroes elsewhere, i.e. $e_1, \ldots, e_n$ is the canonical basis of the vector space $R^n$.
We recall the definition of differentiation, which was introduced in the cryptographic context by Lai in [14]. This notion was used long before, under the name finite difference, in other areas of mathematics, notably for approximating the derivative.
Definition 1. Let $f : R^n \to R^s$ be a function in n variables $x_1, \ldots, x_n$. Let $a = (a_1, \ldots, a_n) \in R^n \setminus \{0\}$. The finite difference operator (or differentiation operator) with respect to a associates to each function f the function $\Delta_a f : R^n \to R^s$ defined as $(\Delta_a f)(x) = f(x + a) - f(x)$. For the particular case of $a = h e_i$ for some $1 \le i \le n$ and $h \in R \setminus \{0\}$, we will call $\Delta_{h e_i}$ the finite difference operator (or differentiation) with respect to the variable $x_i$ with step h, or simply the finite difference operator with respect to the variable $x_i$ if h = 1 or if h is clear from the context. We will use the abbreviation "w.r.t. $x_i$" for "with respect to $x_i$".
Remark 2. Note that in the cryptographic literature this operator (and the resulting function) is usually called the derivative or differential (see [14,12]). We will avoid the term derivative because of the risk of confusion with the well established mathematical notion of formal derivative of a polynomial. For a polynomial $f = \sum_j c_j x_i^j$, where the coefficients $c_j$ are polynomials in the remaining variables, the formal derivative w.r.t. $x_i$ is $\sum_j j c_j x_i^{j-1}$. It can easily be seen that the formal derivative operator w.r.t. $x_i$ coincides with the finite difference operator w.r.t. $x_i$ only for polynomials which have degree at most one in $x_i$. Polynomial functions over GF(2) have degree at most one in each variable, so in this case these notions coincide. Hence the use of the term "derivative" for $\Delta_{h e_i} f(x_1, \ldots, x_n)$ is justified for polynomials over GF(2), but not for polynomials over other rings/fields.
Remark 3. For defining the finite difference operator, we do not actually need to work over a ring R; a commutative group (using additive notation for convenience) is sufficient. Here we used a ring due to our application to finite fields, and also due to some of the techniques involving polynomials.
The finite difference operator is a linear operator; it is commutative and associative. Repeated application of the operator (also called higher order differentiation or higher order derivative in [14]) will be denoted by $\Delta^{(k)}_{a_1, \ldots, a_k} f = \Delta_{a_1} \Delta_{a_2} \cdots \Delta_{a_k} f$, where $a_1, \ldots, a_k \in R^n$ are not necessarily distinct. An explicit formula can be obtained easily from Definition 1 by induction: $\Delta^{(k)}_{a_1, \ldots, a_k} f(x) = \sum_{j=0}^{k} (-1)^{k-j} \sum_{1 \le i_1 < \cdots < i_j \le k} f(x + a_{i_1} + \cdots + a_{i_j})$. Depending on the values of $a_1, \ldots, a_k$ and the characteristic of the ring, $\Delta^{(k)}_{a_1, \ldots, a_k} f$ could collapse, becoming the identically zero function regardless of the function f. (This happens, for example, if the ring is GF(2) and $a_1, \ldots, a_k$ are not linearly independent.) When differentiating w.r.t. one variable we need to choose the difference steps so that this does not happen. Details will be given in Section 4.1 for finite fields of the form GF(p) and in Section 5.2 for finite fields of the form $\mathrm{GF}(p^m)$.
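As a sanity check, the iterated operator can be compared with the usual expansion over subsets of the differences, in which each subset S contributes $(-1)^{k-|S|} f(x + \sum_{i \in S} a_i)$. The sketch below assumes this standard form of the explicit formula; the prime 7, the random function table and the list of differences are arbitrary illustrative choices.

```python
from itertools import combinations, product
import random

P = 7  # hypothetical prime modulus

def shift(x, a):
    return tuple((xi + ai) % P for xi, ai in zip(x, a))

def difference(f, a):
    return lambda x: (f(shift(x, a)) - f(x)) % P

def iterated(f, steps):
    """Apply the finite difference operator once for each difference in steps."""
    for a in steps:
        f = difference(f, a)
    return f

def expanded(f, steps):
    """Same operator, written as a signed sum over all subsets of the differences."""
    k = len(steps)
    def g(x):
        total = 0
        for size in range(k + 1):
            for subset in combinations(range(k), size):
                y = x
                for i in subset:
                    y = shift(y, steps[i])
                total += (-1) ** (k - size) * f(y)
        return total % P
    return g

random.seed(0)
# An arbitrary function GF(7)^2 -> GF(7), given by a table of values.
table = {x: random.randrange(P) for x in product(range(P), repeat=2)}
f = table.__getitem__
steps = [(1, 0), (1, 0), (2, 3)]  # differences need not be distinct
formulas_agree = all(iterated(f, steps)(x) == expanded(f, steps)(x) for x in table)
```

The expanded form needs $2^k$ evaluations of f in general; when several differences coincide, equal shifts can be grouped, which is what leads to binomial coefficients in the single-variable case.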
While the finite difference operator can be defined for any function, in the sequel we will concentrate on polynomial functions. We will denote by $\deg_{x_i}(f)$ the degree of f in the variable $x_i$. The total degree will be denoted deg(f). The following three results are well known and straightforward, but will be needed later. The first result states that differentiating with respect to one variable decreases the degree in that variable by at least one. The other propositions deal with the results of the differentiation in a few simple cases.
One combinatorial interpretation is the number of ways that we can distribute n objects into s (labeled) boxes, so that the first box has $k_1$ elements, the second $k_2$ elements, etc. Multinomials are generalisations of the usual binomial coefficients, with $\binom{n}{k} = \binom{n}{k, n-k}$. Next we examine the effect of higher order differentiation on univariate monomials; the general formula for univariate polynomials can be obtained using the linearity of the ∆ operator.
Now let us assume the statement holds for a given k and we prove it for k + 1.
The last line uses the identity:
Recall that in a finite field $\mathrm{GF}(p^m)$ we have $a^{p^m} = a$ for all elements a. Hence, while (formal) polynomials over $\mathrm{GF}(p^m)$ could have any degree in each variable, when we talk about the associated polynomial function, there will always be a unique polynomial of degree at most $p^m - 1$ in each variable which defines the same polynomial function. In other words we are working in the quotient ring $\mathrm{GF}(p^m)[x_1, \ldots, x_n] / \langle x_1^{p^m} - x_1, \ldots, x_n^{p^m} - x_n \rangle$, and we use as representative of each class the unique polynomial which has degree at most $p^m - 1$ in each variable.
Moreover, all functions in n variables over a finite field can be written as polynomial functions of n variables. The polynomial can be obtained by interpolation from the values of the function at each point in its (finite) domain. (This is obviously not the case for infinite fields.) To summarise, each function in n variables over $\mathrm{GF}(p^m)$ can be uniquely expressed as a polynomial function defined by a polynomial in $\mathrm{GF}(p^m)[x_1, \ldots, x_n]$ of degree at most $p^m - 1$ in each variable.
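The interpolation mentioned above can be illustrated in the simplest setting, a univariate function over a prime field GF(p). The sketch assumes the standard indicator $1 - (x - a)^{p-1}$, which equals 1 at x = a and 0 elsewhere by Fermat's little theorem; the function names and the sample table of values are hypothetical.

```python
def interpolate(values, p):
    """Coefficients c_0..c_{p-1} of the unique polynomial of degree <= p-1
    taking the value values[a] at each a in GF(p)."""
    coeffs = [0] * p
    for a, fa in enumerate(values):
        # f(a) * (1 - (x - a)^(p-1)), using (x - a)^(p-1) = sum_j a^(p-1-j) x^j mod p
        coeffs[0] += fa
        for j in range(p):
            coeffs[j] -= fa * pow(a, p - 1 - j, p)
    return [c % p for c in coeffs]

def evaluate(coeffs, x, p):
    return sum(c * pow(x, j, p) for j, c in enumerate(coeffs)) % p

p = 5
values = [3, 1, 4, 1, 0]          # an arbitrary function GF(5) -> GF(5)
coeffs = interpolate(values, p)
reproduced = [evaluate(coeffs, x, p) for x in range(p)]
# x^p and x define the same function on GF(p), so degrees above p-1 are never needed.
same_function = all(pow(x, p, p) == x % p for x in range(p))
```

The multivariate case works in the same way, with one indicator factor per variable.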

Classical cube attack and differentiation
In this section we first recall the classical cube attack from [5], and its interpretation in the framework of higher order differentials (see [11,7]). We then recall a first generalisation to higher fields sketched in [5] (see also [1]).
In the cube attack ([5]), one has a "black box" polynomial function $f : \mathrm{GF}(2)^n \to \mathrm{GF}(2)$ in n variables $x_1, \ldots, x_n$. Recall that polynomial functions over GF(2) have degree at most one in each variable. (Note that the function is named p in the cube attack papers, but we had to rename it f as later we will work in fields of characteristic other than 2, and we felt p was a well-established notation for the characteristic.) The next definitions are taken from [5]: "Any subset I of size k defines a k dimensional Boolean cube $C_I$ of $2^k$ vectors in which we assign all the possible combinations of 0/1 values to variables in I and leave all the other variables undetermined. Any vector $v \in C_I$ defines a new derived polynomial $f|_v$ with n − k variables (whose degree may be the same or lower than the degree of the original polynomial). Summing these derived polynomials over all the $2^k$ possible vectors in $C_I$ we end up with a new polynomial which is denoted by $f_I = \sum_{v \in C_I} f|_v$." Note that the computation of $f_I$ requires $2^k$ calls to the "black box" function f. On the other hand, denoting by $t_I$ the product of the variables with indices in I, we can factor the common subterm $t_I$ out of some of the terms in f and write f as $f = t_I f_{S(I)} + r(x_1, \ldots, x_n)$, where each of the terms of $r(x_1, \ldots, x_n)$ misses at least one of the variables with index in I. Note that $f_{S(I)}$ is a polynomial in the variables with indices in $\{1, 2, \ldots, n\} \setminus I$.
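The cube attack is based on the following main result:
Theorem 9 ([5, Theorem 1]). For any polynomial f and subset of variable indices I, we have $f_I = f_{S(I)}$.
This result was reformulated using higher order differentials by several authors ([11,7]). We present such a reformulation using our notations:
Theorem 10. For any polynomial f and subset of variable indices $I = \{i_1, \ldots, i_k\}$ we have $f_I = \Delta^{(k)}_{e_{i_1}, \ldots, e_{i_k}} f$.
Proof. By the explicit formula for higher order differences (Proposition 4), and since signs play no role over GF(2), $(\Delta^{(k)}_{e_{i_1}, \ldots, e_{i_k}} f)(x) = \sum_{(b_1, \ldots, b_k) \in \{0,1\}^k} f(x + b_1 e_{i_1} + \cdots + b_k e_{i_k})$. Note that evaluating the expression above for any fixed constant values of $x_{i_1}, \ldots, x_{i_k}$ yields $f_I$. Hence $\Delta^{(k)}_{e_{i_1}, \ldots, e_{i_k}} f$ does not depend on $x_{i_1}, \ldots, x_{i_k}$ and is equal to $f_I$. By Theorem 9, $f_I = f_{S(I)}$.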
For the cube attack we are particularly interested in the situation when $f_{S(I)}$ (and therefore $f_I$) has degree exactly one, i.e. it is linear but not constant (the corresponding term $t_I$ is then called maxterm in [5]). Let d be the total degree of f. Then I having d − 1 elements is a sufficient (but not necessary) condition for $f_{S(I)}$ to have degree at most one, i.e. to be linear or constant.
Generalising the cube attack from the binary field to $\mathrm{GF}(p^m)$ was sketched in [5]: "Over a general field $\mathrm{GF}(p^m)$ with p > 2, the correct way to apply cube attacks is to alternately add and subtract the outputs of the master polynomial with public inputs that range only over the two values 0 and 1 (and not over all their possible values of 0, 1, ..., p), where the sign is determined by the sum (modulo 2) of the vector of assigned values." We make this idea more precise; this was also done in [1] but we will follow a simpler approach for the proof of the main result. Let f be again a function of n variables $x_1, \ldots, x_n$, but this time over an arbitrary finite field $\mathrm{GF}(p^m)$. Note that now f can have degree up to $p^m - 1$ in each variable.
As before, we select a subset of k indices $I = \{i_1, \ldots, i_k\} \subseteq \{1, 2, \ldots, n\}$ and consider a "cube" $C_I$ consisting of the n-tuples which have all combinations of the values 0/1 for the variables with indices in I, while the other variables remain indeterminate. The function f is evaluated at the points in the cube and these values are summed with alternating + and − signs, obtaining a value $f_I = \sum_{v \in C_I} (-1)^{k - \mathrm{wt}(v)} f|_v$, where wt(v) denotes the number of variables with index in I that are assigned the value 1 in v. On the other hand, denoting by $t_I$ the product of the variables with indices in I, we can factor the common subterm $t_I$ out of some of the terms in f and write f as $f = t_I f_{S(I)} + r(x_1, \ldots, x_n)$, where each of the terms of $r(x_1, \ldots, x_n)$ misses at least one of the variables with index in I. Note that, unlike the binary case, now $f_{S(I)}$ can contain variables with indices in I.
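The alternating sum over the 0/1 cube can be sketched as follows for a prime field. The sign convention $(-1)^{k - \mathrm{wt}(v)}$ is an assumption chosen to agree with k successive finite differences with step one, and the polynomial over GF(7) is a hypothetical example (variables are indexed from 0 in the code).

```python
from itertools import product

def alternating_cube_sum(f, I, x, p):
    """Signed sum of f over the 0/1 cube on the variables indexed by I (mod p).
    Variables outside I keep the values given in x."""
    k = len(I)
    total = 0
    for bits in product((0, 1), repeat=k):
        point = list(x)
        for idx, b in zip(I, bits):
            point[idx] = b
        total += (-1) ** (k - sum(bits)) * f(tuple(point))
    return total % p

P = 7
# Hypothetical polynomial: f = x0^2*x1*x2 + 3*x0*x2^3 + x1 over GF(7).
def f(x):
    return (x[0] ** 2 * x[1] * x[2] + 3 * x[0] * x[2] ** 3 + x[1]) % P

I = [0, 1]
# Here t_I = x0*x1 and f_S(I) = x0*x2, which still contains a variable from I.
sums = [alternating_cube_sum(f, I, (0, 0, c), P) for c in range(P)]
```

In this example the signed sum equals x2, i.e. $f_{S(I)}$ with the variables indexed by I set to 1; only one differentiation was applied to x0 although f has degree two in x0.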
Now we can prove an analogue of Theorems 9 and 10. (A similar theorem appears in [1, Theorem 6], but both the statement and the proof are more complicated, involving a term and consequently when factoring out t and writing $f = t f_{S(t)} + q$, having to treat separately the terms of q which contain some variables with indices in I, and the terms of q which do not.)
Proof. The fact that $f_I = (\Delta^{(k)}_{e_{i_1}, \ldots, e_{i_k}} f)(u)$ follows from Proposition 4 in the same way as in the proof of Theorem 10.
It suffices to show $(\Delta^{(k)}_{e_{i_1}, \ldots, e_{i_k}} f)(u) = f_{S(I)}(v)$ for the case when f is a monomial (here u and v denote the n-tuples having, respectively, the value 0 and the value 1 in the positions $i_1, \ldots, i_k$ and indeterminates elsewhere). The rest follows from the linearity of the operators, as $(f + g)_{S(I)} = f_{S(I)} + g_{S(I)}$ and ∆ is a linear operator.
If f is a monomial not divisible by $t_I$, then both $f_{S(I)}$ and $\Delta^{(k)}_{e_{i_1}, \ldots, e_{i_k}} f$ are identically zero, the latter using Proposition 5. Now assume $f = t_I f_{S(I)}$ for some monomial $f_{S(I)}$. As in the proof of [5, Theorem 1], we note that in the sum in the definition of $f_I$ (or, equivalently, in the sum given by Proposition 4 for $(\Delta^{(k)}_{e_{i_1}, \ldots, e_{i_k}} f)(u)$) only one term is non-zero; namely $t_I$ evaluates to a non-zero value iff $x_{i_1} = \ldots$
Remark 12. A cube attack based on Theorem 11 above would again search for sets I for which $f_I$ is linear in the variables whose indices are not in I. If the total degree of f is d, the total degree of $f_I$ can be, in the worst case, d − k. If k = d − 1 we can guarantee that $f_I$ is linear or a constant. More generally, the closer k gets to d − 1 (while still having k ≤ d − 1), the higher the chances of linearity.
However, unlike the binary case where d ≤ n, now d can have any value up to $(p^m - 1)n$. Hence the (unknown) degree d of f could well be considerably higher than the number of variables n. In such a case, k ≤ n − 1 is considerably lower than d − 1, and the chances of linearity are very small. In other words, since we differentiate at most once w.r.t. each variable, so a total of at most n − 1 times, the degree decreases by around n − 1 in general, and the resulting function can still have quite a high degree. Therefore, while a cube attack based on this result would be correct, it would have extremely low chances of success.
Our proposed generalisation of this attack would increase these chances by differentiating several times with respect to each variable. This will result in a greater decrease of the degree, thus improving the chances of reaching a linear result.

Differentiation in GF(p)
Differentiation with respect to a variable decreases the degree in that variable by at least one. In the binary case, the degree of a polynomial function in each variable is at most one, so we can only differentiate once w.r.t. each variable; a second differentiation will trivially produce the zero function. In GF(p) the degree in each variable is up to p − 1. We can therefore consider differentiating several times (and possibly using different difference steps) with respect to each variable. We first show that a monomial of degree $d_i$ in a variable $x_i$ can be differentiated $m_i$ times w.r.t. $x_i$, for any $m_i \le d_i$ (and using any collection of non-zero difference steps) and the degree decreases by exactly $m_i$. Hence we can differentiate $d_i$ times without the result becoming identically zero.
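The claim that each differentiation lowers the degree by exactly one, for any non-zero steps, can be observed on a univariate monomial. The sketch below uses the hypothetical choice p = 7, d = 4 and arbitrary non-zero steps; it relies on the elementary fact that differentiating $x^d$ with steps $h_1, \ldots, h_d$ leaves the constant $d!\, h_1 \cdots h_d$.

```python
def difference(f, h, p):
    """Finite difference with step h for a univariate function over GF(p)."""
    return lambda x: (f((x + h) % p) - f(x)) % p

def differentiate(f, steps, p):
    for h in steps:
        f = difference(f, h, p)
    return f

P, D = 7, 4
monomial = lambda x: pow(x, D, P)
steps = [2, 3, 5, 6]  # arbitrary non-zero difference steps (illustrative)

after_three = [differentiate(monomial, steps[:3], P)(x) for x in range(P)]
after_four = [differentiate(monomial, steps, P)(x) for x in range(P)]
after_five = [differentiate(monomial, steps + [1], P)(x) for x in range(P)]
# after_three takes 7 distinct values (degree exactly one),
# after_four is the non-zero constant 4! * 2*3*5*6 mod 7,
# after_five is identically zero.
```

Since d < p, the factor d! is invertible modulo p, which is why the result does not vanish before the d-th differentiation.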
For the remainder of this section, for simplicity we will always choose all the difference steps $h_i$ equal to one. The case of arbitrary $h_i$ can be treated similarly, but the formulae become more cumbersome. For convenience we will introduce some more notation. We pick a subset of k variable indices $I = \{i_1, \ldots, i_k\}$ and we also pick multiplicities for each variable, $m_1, \ldots, m_k$. Denote by t the term $t = x_{i_1}^{m_1} \cdots x_{i_k}^{m_k}$. We will apply the finite difference operator $m_1$ times w.r.t. the variable $x_{i_1}$, $m_2$ times w.r.t. the variable $x_{i_2}$, etc., always with difference step equal to one. More precisely we define: $f_t = \Delta^{(m_1 + \cdots + m_k)}_{e_{i_1}, \ldots, e_{i_1}, \ldots, e_{i_k}, \ldots, e_{i_k}} f$, where each $e_{i_\ell}$ appears $m_\ell$ times.
We have where for any $1 \le m \le j \le d$ we define = 0 and the free term is equal to For the particular case of $m_1 = d_1, \ldots, m_k = d_k$, we have
Proof. Induction on k, applying Theorem 13.
When all the difference steps $h_i$ are equal to one, the evaluation of the finite difference for a "black box" function f using Proposition 4 becomes simpler. We treat first the case when we differentiate w.r.t. a single variable: $f_{x_1^{m_1}}(x_1, \ldots, x_n) = \sum_{i=0}^{m_1} (-1)^{m_1 - i} \binom{m_1}{i} f(x_1 + i, x_2, \ldots, x_n)$. If R has characteristic p and $m_1 < p$ then all the coefficients $\binom{m_1}{i}$ in the sum above are non-zero. If f is a "black box" function, evaluating $f_{x_1^{m_1}}$ at one point in its domain requires $m_1 + 1$ evaluations of f.
Proof. The formula follows from Proposition 4. Since $0 \le i \le m_1 < p$ and p is prime, $\binom{m_1}{i}$ cannot be divisible by p, so it is non-zero in a field of characteristic p.
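The single-variable case can be sketched as a signed binomial sum, assuming that with all steps equal to one the m-fold difference is $\sum_{j=0}^{m} (-1)^{m-j} \binom{m}{j} f(\ldots, x_i + j, \ldots)$. The function name, the prime 7 and the sample polynomial are illustrative assumptions; variables are indexed from 0 in the code.

```python
from math import comb

def diff_m_times(f, i, m, x, p):
    """m-fold finite difference of f w.r.t. the variable with index i (step 1),
    evaluated at the point x, using m + 1 evaluations of f."""
    total = 0
    for j in range(m + 1):
        y = list(x)
        y[i] = (y[i] + j) % p
        total += (-1) ** (m - j) * comb(m, j) * f(tuple(y))
    return total % p

P = 7
calls = []  # records every evaluation of the "black box"

# Hypothetical black box: f = x0^4 * x1 + 2 * x0^2 over GF(7).
def f(x):
    calls.append(x)
    return (x[0] ** 4 * x[1] + 2 * x[0] ** 2) % P

# Differentiating 4 times w.r.t. x0 leaves 4! * x1 = 3 * x1 (mod 7).
result = diff_m_times(f, 0, 4, (0, 5), P)
evaluations_used = len(calls)
```

For this example the four-fold difference is linear in x1, and a single value of it costs five evaluations of f.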
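We now look at the situation where we differentiate w.r.t. several variables $x_{i_1}, \ldots, x_{i_k}$.
Proposition 17. Let $f : R^n \to R$ and $t = \prod_{\ell=1}^{k} x_{i_\ell}^{m_\ell}$. Then $f_t(x) = \sum_{j_1=0}^{m_1} \cdots \sum_{j_k=0}^{m_k} (-1)^{\sum_{\ell=1}^{k} (m_\ell - j_\ell)} \binom{m_1}{j_1} \cdots \binom{m_k}{j_k} f(x + j_1 e_{i_1} + \cdots + j_k e_{i_k})$. If R has characteristic p and all $m_\ell < p$, then all the coefficients $\binom{m_1}{j_1} \cdots \binom{m_k}{j_k}$ in the sum above are non-zero. If f is a "black box" function, one evaluation of $f_t$ needs $\prod_{\ell=1}^{k} (m_\ell + 1)$ evaluations of f. In particular, evaluating $f_t$ for $x_{i_\ell} = 0$, $\ell = 1, \ldots, k$, we obtain: $f_t(u) = \sum_{j_1=0}^{m_1} \cdots \sum_{j_k=0}^{m_k} (-1)^{\sum_{\ell=1}^{k} (m_\ell - j_\ell)} \binom{m_1}{j_1} \cdots \binom{m_k}{j_k} f(\ldots, j_1, \ldots, j_k, \ldots)$, with $j_1, \ldots, j_k$ in positions $i_1, \ldots, i_k$ respectively.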
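The multi-variable evaluation can be sketched as a signed sum over a grid, with a product of binomial coefficients as weights. The code assumes this product form of the sum (all steps equal to one); the name `grid_difference`, the prime 7 and the polynomial are hypothetical, and variables are indexed from 0.

```python
from itertools import product
from math import comb

def grid_difference(f, x, I, mults, p):
    """Differentiate f mults[l] times w.r.t. the variable with index I[l]
    (step 1) and evaluate at x, summing over a grid of prod(m + 1) points."""
    total = 0
    for js in product(*(range(m + 1) for m in mults)):
        y = list(x)
        coeff, sign_exp = 1, 0
        for idx, m, j in zip(I, mults, js):
            y[idx] = (y[idx] + j) % p
            coeff *= comb(m, j)
            sign_exp += m - j
        total += (-1) ** sign_exp * coeff * f(tuple(y))
    return total % p

P = 7
# Hypothetical black box: f = x0^3*x1^2*x2 + x0^3*x1 + 5*x2 over GF(7).
def f(x):
    return (x[0] ** 3 * x[1] ** 2 * x[2] + x[0] ** 3 * x[1] + 5 * x[2]) % P

# t = x0^3 * x1^2: the result is 3! * 2! * x2 = 5 * x2 (mod 7).
outputs = [grid_difference(f, (0, 0, c), [0, 1], [3, 2], P) for c in range(P)]
```

Each value in `outputs` costs (3 + 1)(2 + 1) = 12 evaluations of f in this example.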
We note that in terms of the time complexity of evaluating $f_t$, it is now not only the total degree of t that matters (as in the binary case), but also the exponents of each variable. For a given number m of differentiations (i.e. t of total degree m), the smallest time complexity is achieved when t contains only one variable, i.e. $t = x_{i_1}^m$. Among all t of total degree m that contain k variables, the best time complexity is achieved when t has degree one in each but one of its variables, e.g. $t = x_{i_1}^{m-k+1} x_{i_2} \cdots x_{i_k}$.
Remark 18. We saw that in the binary case, differentiating once w.r.t. each of the variables $x_{i_1}, \ldots, x_{i_k}$ is equivalent to summing f evaluated over a "cube" consisting of all the 0/1 combinations for the variables $x_{i_1}, \ldots, x_{i_k}$. According to Proposition 17, the analogue of the "cube" will now be a k-dimensional grid/mesh with sides of "length" $m_1, m_2, \ldots, m_k$. Namely, each variable $x_{i_\ell}$ w.r.t. which we differentiate will have increments of $0, 1, \ldots, m_\ell$, and each term in the sum has alternating signs as well as being multiplied by binomial coefficients.
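The effect of the exponents on the number of evaluations can be made concrete with a small calculation, assuming the cost of one evaluation of $f_t$ is the product of the factors $m_\ell + 1$ as in Proposition 17. The total of six differentiations is an arbitrary illustrative choice.

```python
from math import prod

def evaluations_needed(mults):
    """Number of black-box evaluations for one value of f_t: the product of (m + 1)."""
    return prod(m + 1 for m in mults)

# Six differentiations in total, distributed in different ways.
single_variable = evaluations_needed([6])        # t = x^6
skewed = evaluations_needed([5, 1])              # t = x^5 * y
balanced = evaluations_needed([3, 3])            # t = x^3 * y^3
once_each = evaluations_needed([1] * 6)          # binary-style cube on six variables
```

For the same total degree the grid ranges from 7 points (one variable) to 64 points (six variables differentiated once each).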

Fundamental theorem of the cube attack generalised to GF(p)
We will use the notation as in the previous section. Factoring out t, we can write f as $f = t f_{S(t)} + r$, where $f_{S(t)}$ and r are uniquely determined such that none of the terms in r is divisible by t.
We can already give a bound on the degree of $f_t$:
Denote by v the n-tuple having values of 1 in the positions $i_1, \ldots, i_k$ and indeterminates elsewhere, and by u the n-tuple having values of 0 in the positions $i_1, \ldots, i_k$ and indeterminates elsewhere.
Write $f_{S(t)} = t_1 g_1 + \ldots + t_u g_u$, where $g_i$ are polynomials that do not depend on any of the variables $x_{i_1}, \ldots, x_{i_k}$ and $t_1, \ldots, t_u$ are all the distinct terms in the variables $x_{i_1}, \ldots, x_{i_k}$ that appear in $f_{S(t)}$.
Then there are constants $c_1, \ldots, c_u \in \mathrm{GF}(p)$ such that $f_t(u)$ equals $c_1 g_1 + \ldots + c_u g_u$ (which can also be viewed as $c_1 t_1 g_1 + \ldots + c_u t_u g_u$ evaluated at v). The exact values for the constants $c_i$ can be determined as follows: if where D() is as defined in Theorem 15.
In particular if $f_{S(t)}$ does not depend on any of the variables $x_{i_1}, \ldots, x_{i_k}$ then
Proof. We first use Theorem 15 for individual monomials and then the linearity of the ∆ operator.
Again, for the cube attack we are interested in the cases where $f_t(u)$ is linear: The latter is also equal to the total degree of $f_{S(t)}(x)$ in the variables $\{x_1, \ldots,$

Proposed Algorithm for the cube attack in GF(p)
In this section we give more details of the algorithm, drawing on the results from previous sections. The main idea of our proposed attack is that when the degree in one variable is higher than one, we can differentiate w.r.t. that variable repeatedly, unlike the cube attacks described in Section 3, which use differentiation at most once for each variable. We are given a cryptographic "black box" function $f(v_1, \ldots, v_m, x_1, \ldots, x_n)$ with $v_i$ being public variables and $x_i$ being secret variables.
Preprocessing phase
Using formula (4) in Proposition 17 we evaluate $f_t(0, x)$ for several choices of the secret variables x, in order to decide whether, with reasonably high probability, the total degree of $f_t(0, x)$ in x equals one. (For this one can use the textbook definition of linearity; namely, for various values of $a, b \in \mathrm{GF}(p)$ and $y, z \in \mathrm{GF}(p)^n$ test whether $a(f_t(0, y) - f_t(0, 0)) + b(f_t(0, z) - f_t(0, 0)) = f_t(0, ay + bz) - f_t(0, 0)$; in GF(p) with p large, we will in general need far fewer linearity tests than in the binary case, see [10]; one can at the same time check whether $f_t(0, x)$ is non-constant.)
3. If the decision above is "yes", we determine $f_t(0, x)$ explicitly, as $f_t(0, x) = c_0 + \sum_{i=1}^{n} c_i x_i$ where $c_0 = f_t(0, 0)$ and $c_i = f_t(0, e_i) - c_0$; we store $(t, c_0, c_1, \ldots, c_n)$.
For the heuristic of choosing t one could take into account the computational cost for a term t, see Proposition 17 and the comment following it. However a full heuristic is beyond the scope of this paper. A number of optimisations have been proposed for the binary cube attack; many of them can be transferred to the modulo p case, but again, this is beyond the scope of this paper.
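The linearity test and the coefficient extraction of the preprocessing phase can be sketched as follows. The sketch assumes that a callable `g` stands for the map $x \mapsto f_t(0, x)$ (in an attack it would be computed from the black box via the grid sum); the function names, the number of trials and the sample functions over GF(11) are hypothetical.

```python
import random

def passes_one_test(g, a, b, y, z, p):
    """Check a*(g(y)-g(0)) + b*(g(z)-g(0)) == g(a*y + b*z) - g(0) modulo p."""
    g0 = g((0,) * len(y))
    combo = tuple((a * yi + b * zi) % p for yi, zi in zip(y, z))
    lhs = (a * (g(y) - g0) + b * (g(z) - g0)) % p
    return lhs == (g(combo) - g0) % p

def looks_linear(g, n, p, trials=20, seed=0):
    """Probabilistic test: g passes if the identity holds for all random trials."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.randrange(p), rng.randrange(p)
        y = tuple(rng.randrange(p) for _ in range(n))
        z = tuple(rng.randrange(p) for _ in range(n))
        if not passes_one_test(g, a, b, y, z, p):
            return False
    return True

def extract_coefficients(g, n, p):
    """c_0 = g(0) and c_i = g(e_i) - c_0, assuming g has degree at most one."""
    c0 = g((0,) * n) % p
    cs = [(g(tuple(int(j == i) for j in range(n))) - c0) % p for i in range(n)]
    return c0, cs

P = 11
linear = lambda x: (4 + 3 * x[0] + 7 * x[2]) % P     # hypothetical f_t(0, x)
quadratic = lambda x: (x[0] * x[1] + x[2]) % P       # hypothetical non-linear case

accepted = looks_linear(linear, 3, P)
coefficients = extract_coefficients(linear, 3, P)
rejected_once = not passes_one_test(quadratic, 1, 1, (1, 0, 0), (0, 1, 0), P)
```

A function of degree at most one passes every trial; a non-linear function can pass individual trials by chance, which is why several trials are run.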
Online phase
1. For each $(t, c_0, c_1, \ldots, c_n)$ stored in the preprocessing phase, compute $f_t(0, x)$ (with x being now unknown) using formula (4) in Proposition 17. Form the linear equation: $\sum_{i=1}^{n} c_i x_i = f_t(0, x) - c_0$.
2. Solve the system of linear equations thus obtained, determining the secret variables $x_1, \ldots, x_n$. If the preprocessing phase only produced s < n equations, then not all the secret variables can be determined; we would need to do an exhaustive search for n − s of them.
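Solving the resulting system is ordinary linear algebra modulo p. The following is a minimal sketch using Gauss-Jordan elimination over GF(p), assuming a square system with as many independent equations as secret variables; the coefficient matrix, right-hand side and secret are hypothetical values chosen for illustration.

```python
def solve_mod_p(A, b, p):
    """Solve A x = b over GF(p) for a square matrix A; return None if A is singular."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] % p), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)                # modular inverse of the pivot
        M[col] = [(v * inv) % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] % p:
                factor = M[r][col]
                M[r] = [(vr - factor * vc) % p for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

P = 7
# Rows hold the stored coefficients (c_1, ..., c_n) of three hypothetical terms t;
# the right-hand sides are the online values f_t(0, x) - c_0.
A = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
rhs = [6, 1, 6]
recovered = solve_mod_p(A, rhs, P)   # consistent with the secret (3, 5, 1)
```

When fewer than n independent equations are available, the same elimination determines only some combinations of the variables and the rest has to be searched exhaustively, as noted above.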
Remark 23. Let ℓ be the length of the binary representation of p. We can view each bit of an element in GF(p) as one binary variable. If f is a function of n variables over GF(p), we can also view it as ℓ binary functions in ℓn binary variables. We could therefore apply the classical (binary) cube attack on these functions. A rough estimate suggests that the running time for corresponding cubes will be approximately the same. (Differentiating p − 1 times with respect to one variable $x_i$ in GF(p) takes p evaluations of f; differentiating once w.r.t. each of the binary variables that are components of $x_i$ will take $2^\ell$ evaluations of f; we have $p \approx 2^\ell$.) The chances of success on a particular cube bear no easy relationship between the two approaches, because the degree of f and the degrees of the ℓ binary functions are not related in a simple way.
Hence we would argue that in general one cannot tell which of the attacks will work better, so one should try both. If the cipher has a structure that would suggest that the degree as polynomial over GF(p) is relatively low, then a cube attack over GF(p) should certainly be an approach to consider.

Generalisations to $\mathrm{GF}(p^m)$
In this section we take our generalisation further, to arbitrary finite fields $\mathrm{GF}(p^m)$. An important particular case would be $\mathrm{GF}(2^m)$, as many cryptographic algorithms include operations over a field of this type.

Preliminaries
We need some known results regarding the values of binomial coefficients and multinomial coefficients in fields of finite characteristic.
Theorem 24. (Kummer's Theorem, [13, p. 115]) Let $n \ge k \ge 0$ be integers and $p$ a prime. Let $j$ be the highest exponent for which $\binom{n}{k}$ is divisible by $p^j$. Then $j$ equals the sum of carries when adding $k$ and $n - k$ as numbers written in base $p$.
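Kummer's theorem is easy to check numerically for small parameters. The following sketch compares the exponent of $p$ in $\binom{n}{k}$ with the number of carries when adding $k$ and $n - k$ in base $p$; the helper names are chosen for this illustration only.

```python
from math import comb

def carries_when_adding(a, b, p):
    """Number of carries produced when adding a and b written in base p."""
    carry = count = 0
    while a or b or carry:
        s = a % p + b % p + carry
        carry = 1 if s >= p else 0
        count += carry
        a //= p
        b //= p
    return count

def p_adic_valuation(x, p):
    """Highest j such that p**j divides the positive integer x."""
    v = 0
    while x % p == 0:
        x //= p
        v += 1
    return v

# Example: binom(6, 3) = 20 = 2^2 * 5, and 3 + 3 in binary produces two carries.
print(p_adic_valuation(comb(6, 3), 2), carries_when_adding(3, 3, 2))  # 2 2
```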
Kummer's theorem has been generalised to multinomials by various authors (see for example [6] and citations therein).
Theorem 25. Let $d, k_1, k_2, \ldots, k_s$ be integers such that $\sum_{i=1}^{s} k_i = d$ and $k_i \ge 0$, and let $p$ be a prime. Let $j$ be the highest exponent for which $\binom{d}{k_1, k_2, \ldots, k_s}$ is divisible by $p^j$. Then $j$ equals the sum of all the carries when adding all of $k_1, k_2, \ldots, k_s$ as numbers written in base $p$.
We will be interested in the situations where the multinomial coefficients are not zero modulo p.
Corollary 26. Let $p$ be a prime. The following are equivalent:
(i) The multinomial coefficient $\binom{d}{k_1, k_2, \ldots, k_s}$ is not zero modulo $p$.
(ii) There are no carries when adding $k_1, k_2, \ldots, k_s$ as numbers written in base $p$.
(iii) In base $p$, each digit of $d$ equals the sum of the digits of $k_1, k_2, \ldots, k_s$ in the corresponding position.
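The equivalence of (i) and (iii) in Corollary 26 can be checked by brute force for small values. The sketch below is an illustration only; the function names are assumptions, and condition (iii) is tested digit by digit against $d = k_1 + \cdots + k_s$.

```python
from math import factorial

def multinomial(ks):
    """Multinomial coefficient d! / (k_1! ... k_s!) with d = sum(ks)."""
    result = factorial(sum(ks))
    for k in ks:
        result //= factorial(k)  # each intermediate quotient is an integer
    return result

def base_p_digits(a, p, width):
    """The `width` least significant base-p digits of a, lowest first."""
    return [(a // p**i) % p for i in range(width)]

def digits_add_without_carry(ks, p):
    """Condition (iii): every digit of d is the sum of the matching digits of the k_i."""
    d = sum(ks)
    width = 1
    while p**width <= d:
        width += 1
    d_digits = base_p_digits(d, p, width)
    k_digits = [base_p_digits(k, p, width) for k in ks]
    return all(sum(kd[pos] for kd in k_digits) == d_digits[pos]
               for pos in range(width))

# 13 = 111 in base 3 splits digit-wise as 1 + 3 + 9, so the coefficient is non-zero mod 3.
print(multinomial((1, 3, 9)) % 3 != 0, digits_add_without_carry((1, 3, 9), 3))  # True True
```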

Differentiation in $\mathrm{GF}(p^m)$
When moving from $\mathrm{GF}(p)$ to $\mathrm{GF}(p^m)$ several things work differently. For a start, differentiating once w.r.t. a variable $x$ may decrease the degree in $x$ by more than one, regardless of the difference step. For example, let us differentiate $x^d$ once. In $(x + h)^d - x^d$ the coefficient of $x^{d-1}$ is $dh$, so when $d$ is a multiple of $p$ the degree is strictly less than $d - 1$. To examine the general case we will use Theorem 8, so we introduce for convenience the following notation: Note that Corollary 26 gives a useful characterisation of this set. We have: ; finally define $d'$ as the number written in base $p$ as $d' = d_u d_{u-1} \ldots d_{i+2} d'_{i+1} 0 \ldots 0$ (with $i + 1$ zeroes at the end). In particular, for $p = 2$, the binary representation of the degree $d'$ is obtained from the binary representation of $d$ by replacing $k$ of its ones by zeroes, starting from the least significant digit.
Proof. By Theorem 8 the degree $d'$ will be less than or equal to $d - j$, where $j$ is minimal such that $C_p(d, j, k) \neq \emptyset$. Using Corollary 26(iii) we see that the minimum value of $j$ for given $d$ and $k$ is achieved by choosing $i_1, \ldots, i_k \ge 1$ as small as possible while keeping $\binom{d}{i_1, \ldots, i_k, d-j}$ non-zero modulo $p$. This is achieved by choosing $i_1, \ldots, i_k$ as follows: $d_0$ of them will be equal to 1, $d_1$ will be equal to $p$ (i.e. 10 in base $p$), $d_2$ will be equal to $p^2$ (i.e. 100 in base $p$), ..., $d_i$ of them will be equal to $p^i$, and finally $k - (d_0 + d_1 + \cdots + d_i)$ will be equal to $p^{i+1}$. It can be verified that $d' = d - (i_1 + \cdots + i_k)$ will then have the form described in the theorem statement.
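The simplest case, a single differentiation of $x^d$, can be checked directly. For $j < d$ the coefficient of $x^j$ in $(x + h)^d - x^d$ is $\binom{d}{j} h^{d-j}$, and since $h \neq 0$ it vanishes exactly when $p$ divides $\binom{d}{j}$. The sketch below compares the resulting degree with the value obtained by lowering the least significant non-zero base-$p$ digit of $d$ by one. That closed form is our reading of the case $k = 1$ and should be treated as an assumption; the function names are illustrative.

```python
from math import comb

def degree_after_one_difference(d, p):
    """Degree in x of (x + h)^d - x^d in characteristic p, for any step h != 0."""
    for j in range(d - 1, -1, -1):
        if comb(d, j) % p != 0:
            return j
    return None  # d == 0: the difference is identically zero

def lower_lowest_nonzero_digit(d, p):
    """d (>= 1) with its least significant non-zero base-p digit reduced by one."""
    place = 1
    while (d // place) % p == 0:
        place *= p
    return d - place

# d = 6 is 20 in base 3, so one differentiation in characteristic 3 drops the degree to 10_3 = 3.
print(degree_after_one_difference(6, 3), lower_lowest_nonzero_digit(6, 3))  # 3 3
```

For $p = 2$ and $d = 6$ (binary 110) the same computation gives degree 4 (binary 100), in line with the remark that a one is replaced by a zero starting from the least significant digit.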
Note that the sum of the digits of a number in base $p$ plays an important role here. For any nonnegative integer $a$ we introduce the notation $S_p(a)$ for the sum of the digits in the base $p$ representation of $a$. We define the digit-sum degree of a univariate polynomial $f$ in a variable $x_i$ as $\max\{S_p(j) \mid c_j \neq 0\}$, where $f = \sum_{j=0}^{d} c_j x_i^j$ with the $c_j$ polynomials in the remaining variables. The previous theorem implies:
Corollary 28. Let $f$ be a polynomial function and let $s$ be the digit-sum degree of $f$ in $x_i$. Then differentiating $f$ a total of $s$ times w.r.t. $x_i$ will always produce a polynomial function which does not depend on $x_i$ (possibly the identically zero function).
Corollary 29. Any function $f : \mathrm{GF}(p^m) \to \mathrm{GF}(p^m)$ can be differentiated at most $m(p - 1)$ times w.r.t. a given variable before the result becomes the identically zero function.
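The digit sum $S_p$ and the digit-sum degree are simple to compute. The sketch below also checks the fact behind the bound $m(p - 1)$: among the exponents $0, \ldots, p^m - 1$ that can occur in a reduced polynomial over $\mathrm{GF}(p^m)$, the largest digit sum is $m(p - 1)$. The function names and the representation of a polynomial by its list of exponents with non-zero coefficient are assumptions made for the example.

```python
def digit_sum(a, p):
    """S_p(a): sum of the base-p digits of the nonnegative integer a."""
    s = 0
    while a:
        a, r = divmod(a, p)
        s += r
    return s

def digit_sum_degree(exponents, p):
    """Digit-sum degree, given the exponents j whose coefficient c_j is non-zero."""
    return max(digit_sum(j, p) for j in exponents)

# x^9 + x^4 + x^2 in characteristic 3 has degree 9 but digit-sum degree 2.
print(digit_sum_degree([9, 4, 2], 3))  # 2

# Largest digit sum of an exponent below p^m, for p = 3 and m = 2.
print(max(digit_sum(a, 3) for a in range(3**2)))  # 4
```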
Is the bound in Corollary 28 tight, in the sense that there are functions which are non-zero after a number of differentiations equal to their digit-sum degree? In particular, are there functions which are still non-zero after $m(p - 1)$ differentiations? We will show that this is indeed the case, but only if we choose the $h_i$ carefully. First let us illustrate a choice of the steps $h_i$ which we need to avoid. By Proposition 16, if we differentiate $p$ times with all steps equal to 1 the result is identically zero regardless of the original function $f$: As in Section 4, we pick a set of variables and their multiplicities, defining the term $t = x_{i_1}^{m_1} \cdots x_{i_k}^{m_k}$. For a polynomial function $f$ in $n$ variables, we now define: where the sequence $h_1, \ldots, h_{m(p-1)}$ has been fixed as above. We will concentrate on differentiating several times w.r.t. one variable, $x_1$.
All the coefficients in the sum above are non-zero. If $f$ is a "black box" function, one evaluation of $f_t$ needs $p^q (r + 1)$ evaluations of $f$.
Proof. Similar to the proof of Proposition 17.
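The effect of the choice of steps can be seen in the smallest non-prime field, $\mathrm{GF}(4)$, where $p = 2$ and $m(p - 1) = 2$. The sketch below is an illustration under stated assumptions: elements are encoded as the integers 0 to 3 with $\alpha = 2$ a root of $x^2 + x + 1$, and the steps $(1, \alpha)$ are our own choice of two $\mathrm{GF}(2)$-linearly independent elements, not necessarily the sequence $h_i$ fixed in the paper. Differentiating twice with step 1 annihilates $x^3$, while the steps $(1, \alpha)$ leave a non-zero constant.

```python
def gf4_mul(a, b):
    """Multiply in GF(4) = GF(2)[x] / (x^2 + x + 1); elements are 2-bit integers."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i  # carry-less (polynomial) multiplication
    if r & 0b100:
        r ^= 0b111  # reduce modulo x^2 + x + 1
    return r

def cube(x):
    """f(x) = x^3, whose exponent 3 = 11 in binary has digit sum 2."""
    return gf4_mul(x, gf4_mul(x, x))

def differentiate(f, h):
    """Return x -> f(x + h) - f(x); in characteristic 2 both + and - are XOR."""
    return lambda x: f(x ^ h) ^ f(x)

ALPHA = 2  # assumed encoding of a primitive element of GF(4)

bad = differentiate(differentiate(cube, 1), 1)       # both steps equal to 1
good = differentiate(differentiate(cube, 1), ALPHA)  # steps 1 and alpha

print([bad(x) for x in range(4)])   # [0, 0, 0, 0]
print([good(x) for x in range(4)])  # [1, 1, 1, 1]
```

With steps 1 and $\alpha$ the second-order derivative is the sum of $f$ over all four field elements, which for $x^3$ equals 1 because every non-zero element of $\mathrm{GF}(4)$ cubes to 1.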
We show next that our choice of $h_i$ is indeed a valid choice, in the sense that there are functions which can be differentiated $m(p - 1)$ times w.r.t. the same variable without becoming zero.
Proposition 31. For each $t = x_1^{m_1}$ with $0 \le m_1 \le m(p - 1)$ there is at least one function $f : \mathrm{GF}(p^m)^n \to \mathrm{GF}(p^m)$ such that $f_t(x)$ is not the identically zero function. Moreover, there is at least one polynomial function $f$ with digit-sum degree in $x_1$ equal to $m_1$ such that $f_t(x)$ is a non-zero constant function.
Proof. We will construct a polynomial function $f$ in one variable, $x_1$. Write $m_1 = q(p - 1) + r$ with $0 < r \le p - 1$ as in Proposition 30. In the formula in Proposition 30 all the terms in the sum have

Conclusion
We examined higher order differentiation over integers modulo a prime $p$, as well as over general finite fields of $p^m$ elements, proving a number of results applicable to cryptographic attacks, and in particular generalising the fundamental theorem on which the cube attack is based.
Using these results we proposed a generalisation of the cube attack to functions over the integers modulo $p$; the main difference to the binary case is that we can differentiate several times with respect to the same variable. Such an attack would be particularly suited to ciphers that use operations modulo $p$ in their internal structure.
We also showed that a further generalisation to general finite fields $\mathrm{GF}(p^m)$ is possible, but not as promising as the generalisation to $\mathrm{GF}(p)$, because differentiation in $\mathrm{GF}(p^m)$ can be reduced to differentiation in $\mathrm{GF}(p)$.
where $w(v)$ denotes the Hamming weight of $v$, ignoring the variables that have remained indeterminate.

Theorem 11.
Let $f : \mathrm{GF}(p^m)^n \to \mathrm{GF}(p^m)$ be a polynomial function and $I$ a subset of variable indices. Denote by $v$ the $n$-tuple having values of 1 in the positions with indices in $I$ and indeterminates in the other positions, and by $u$ the $n$-tuple having values of 0 in the positions in $I$ and indeterminates in the other positions. Then:

4. Repeat the steps above for different values of $t$ until one obtains $n$ linearly independent stored tuples $(c_1, \ldots, c_n)$, or until one runs out of time/memory.

1) and let $h_1, \ldots, h_k \in \mathrm{GF}(p^m) \setminus \{0\}$. The degree of $\Delta^{(k)}_{h_1 e_1, \ldots, h_k e_1} x^d$ is less than or equal to the integer $d'$ computed as follows: write $d$ in base $p$ as $d = d_u d_{u-1} \ldots d_1 d_0$; let $i$ be the highest integer for which $d_0 + d_1$