On the Largest Product-free Subsets of the Alternating Groups

A subset $A$ of a group $G$ is called product-free if there is no solution to $a=bc$ with $a,b,c$ all in $A$. It is easy to see that the largest product-free subset of the symmetric group $S_n$ is obtained by taking the set of all odd permutations, i.e. $S_n \setminus A_n$, where $A_n$ is the alternating group. By contrast, it is a long-standing open problem to find the largest product-free subset of $A_n$. We solve this problem for large $n$, showing that the maximum size is achieved by the previously conjectured extremal examples, namely families of the form $\{\pi~|~\pi(x)\in I, \pi(I)\cap I=\emptyset\}$ and their inverses. Moreover, we show that the maximum size is only achieved by these extremal examples, and we have stability: any product-free subset of $A_n$ of nearly maximum size is structurally close to an extremal example. Our proof uses a combination of tools from Combinatorics and Non-abelian Fourier Analysis, including a crucial new ingredient exploiting some recent theory developed by Filmus, Kindler, Lifshitz and Minzer for global hypercontractivity on the symmetric group.


Introduction
The long-standing problem of determining the largest product-free set in the alternating group $A_n$ has recently been highlighted by Ben Green [8], who credits Edward Crane for the conjectured extremal examples, which are families of the form $F_x^I := \{\pi : \pi(x) \in I,\ \pi(I) \cap I = \emptyset\}$ and their inverses. Writing $\mu$ for the uniform measure on $A_n$ and $|I| = t\sqrt{n}$, one can calculate $\mu(F_x^I) \approx t e^{-t^2} n^{-1/2}$, which suggests the conjecture that the maximum measure should be $\Theta(n^{-1/2})$, and more precisely that it should be $\sim 1/\sqrt{2en}$ (the maximum of $t e^{-t^2}$ is attained at $t = 1/\sqrt{2}$). Improving earlier bounds of Kedlaya [9] and Gowers [7], the conjecture was proved up to logarithmic factors by Eberhard [3], who showed that any product-free $A \subseteq A_n$ has $\mu(A) = O(n^{-1/2} \log^{7/2} n)$. Our main result here answers the question completely, as follows.
Theorem 1.1. Suppose $n$ is sufficiently large and $A \subseteq A_n$ is a product-free subset of maximum size. Then $A$ or $A^{-1}$ is some $F_x^I$.
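The extremal families can be explored by brute force for small $n$. The following is a minimal sketch, not part of the paper's argument: it enumerates $F_x^I$ inside $A_6$ and checks that it is product-free. The function names, the choice $n = 6$, $x = 0$, $I = \{1, 2\}$, the 0-indexing of points and the composition convention $(bc)(i) = b(c(i))$ are all assumptions made for illustration.

```python
from itertools import permutations

def is_even(p):
    """True if the permutation p (tuple with p[i] = image of i) is even."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0

def compose(b, c):
    """Product bc under the (assumed) convention (bc)(i) = b(c(i))."""
    return tuple(b[c[i]] for i in range(len(c)))

def extremal_family(n, x, I):
    """Even permutations pi of {0..n-1} with pi(x) in I and pi(I) disjoint from I."""
    I = set(I)
    return [p for p in permutations(range(n))
            if is_even(p) and p[x] in I and all(p[i] not in I for i in I)]

def is_product_free(family):
    """True if there is no solution to a = bc with a, b, c all in the family."""
    members = set(family)
    return not any(compose(b, c) in members for b in family for c in family)

n, x, I = 6, 0, {1, 2}
F = extremal_family(n, x, I)
alternating_order = sum(1 for p in permutations(range(n)) if is_even(p))
print("size of family:", len(F))
print("measure in A_n:", len(F) / alternating_order)
print("product-free:", is_product_free(F))
```

The check succeeds for a simple reason: if $a = bc$ with all three in the family, then $c(x) \in I$, so $a(x) = b(c(x)) \in b(I)$, which is disjoint from $I$, contradicting $a(x) \in I$.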

99% stability
We also obtain the following '99% stability' result, showing that any large product-free subset of $A_n$ is essentially contained in an $F_x^I$ or its inverse. Our stability result holds even for sets whose measure is much smaller than that of the extremal family.
Theorem 1.2. Suppose $n$ is sufficiently large and $A \subseteq A_n$ is a product-free set with $\mu(A) \ge n^{-0.66}$. Then there is some $F_x^I$ such that $\mu(A \setminus F_x^I) < n^{-0.66}$ or $\mu(A^{-1} \setminus F_x^I) < n^{-0.66}$.
Moreover, if $A \subseteq A_n$ is a product-free set with size very close to the maximum, we show that $A$ or $A^{-1}$ is contained in some $F_x^I$.
Theorem 1.3. There exists an absolute constant $c$ such that if $n$ is sufficiently large and $A \subseteq A_n$ is a product-free set with $\mu(A) > \max_{I,x} \mu(F_x^I) - \frac{c}{n}$ then there is some $F_x^I$ such that $A \subseteq F_x^I$ or $A^{-1} \subseteq F_x^I$.
Our 1% stability result (Theorem 1.5) will be deduced from a combination of the trace method, a recent level-$d$ inequality due to Filmus, Kindler, Lifshitz and Minzer [5], and novel upper bounds on eigenvalues of Cayley graphs over the symmetric group.

Globalness
Theorem 1.5 naturally leads us to define the notion of 'globalness', which plays a crucial role throughout the paper. This is intuitively a pseudorandomness property stating that membership is not determined by local information, which one can think of as the polar opposite of dictators.
More precisely, we make the following definition, saying that small restrictions do not have large measure. We include two versions of the same concept with different parameterisations, as we need the first version when referring to [5], but the second version is more natural for the applications in our paper.
Definition 1.6. We say $A \subseteq A_n$ is $(t, \epsilon)$-global if the density of $A$ inside each $t$-umvirate is $< \epsilon^2$. We say that $A$ is relatively $(t, K)$-global if the density of $A$ inside each $t$-umvirate is at most $K\mu(A)$.

Product-free triples
We also consider the following cross version of the question. Given $A, B, C \subseteq A_n$, we say that the triple $(A, B, C)$ is product-free if there is no solution of $ab = c$ with $a \in A$, $b \in B$, $c \in C$. In particular, $(A, A, A)$ is product-free if and only if $A$ is product-free. We consider the following problem.
Problem 1.7. What are the possible sizes of product-free triples $(A, B, C)$?
Problem 1.7 incorporates the well-studied problem of upper bounding sizes of independent sets inside Cayley graphs (see e.g. Ellis, Filmus and Pilpel [4]), which concerns the special case $A = A^{-1}$ and $B = C$.
For simplicity we restrict ourselves to the question of maximising $\min(\mu(A), \mu(B), \mu(C))$ when $(A, B, C)$ is product-free. Gowers [7] established an upper bound of $(n-1)^{-1/3}$ by showing that if $(A, B, C)$ is product-free then $\mu(A)\mu(B)\mu(C) \le \frac{1}{n-1}$. Eberhard [3] improved this to $O(n^{-1/2} \log^{3.5} n)$ by showing that one of $\mu(A)\mu(B)$, $\mu(A)\mu(C)$, $\mu(B)\mu(C)$ is $O(n^{-1} \log^{7} n)$. We obtain the following bound, which gives the correct order of magnitude.
In fact, we show corresponding stability results that are a bit harder to state (see Theorem 6.4). Roughly speaking, they say that any sufficiently dense product-free triple has the approximate form $(1_{I \to J}, 1_{x \to I}, 1_{x \to J})$. Moreover, we also prove a version of Theorem 1.5 that finds structure in product-free triples that are only polynomially dense.
As discussed later in the introduction, our approach can be viewed as a non-abelian analogue of Roth's bound for sets of integers with no three-term arithmetic progression, whereby we improve the earlier approaches of Gowers and Eberhard by establishing a form of the 'Structure versus Randomness' dichotomy. We achieve this by exploiting some recent theory developed by Filmus, Kindler, Lifshitz and Minzer [5] for global hypercontractivity on the symmetric group, and by also developing some further theory of the 'Cayley operators' associated to global subsets.

Techniques
We call a set of the form $1_{x \to I} = \{\sigma : \sigma(x) \in I\}$ a star. We call the set $\{\sigma : \sigma^{-1}(x) \in I\} = 1^{-1}_{x \to I}$ an inverse star. The main steps in the proof of our main theorem (Theorem 1.1) are as follows.
1. Achieving dictatorial structure: We show that $A$ has large density inside many dictators. In fact, we show that in some sense, the product-freeness of $A$ is completely explained by its densities inside dictators.
2. Achieving star structure: We then upgrade our dictatorial structure into a tighter star structure. We find some $S$ that is either a star or an inverse star such that $|A \setminus S|$ is small and $A$ has significant density in each restriction defined by $S$.
3. Bootstrapping: Using the approximate star structure, we deduce our exact results from further stability analysis showing that any small deviation from the structure leads to a suboptimal configuration.

Gowers' approach: the second eigenvalue
We start out along the path established by Gowers [7]. His idea was to express the number of products in $A \subseteq A_n$ of density $\alpha$ as the sum of a 'main term' and an 'error term', where the error term is smaller in magnitude than the main term when $\alpha$ is large, so the number of products cannot be zero. The main term $\alpha^3 |A_n|^2$ is the expected number of products in a random set of density $\mu(A)$, whereas his bound for the error term has order $(n-1)^{-1/2} \alpha^{3/2} |A_n|^2$, so when $A$ is product-free he obtained the bound $\alpha \le (n-1)^{-1/3}$.
We will now outline his argument. Let $f = 1_A$ be the indicator function of $A \subseteq A_n$ and $T$ be the linear operator on $L^2(A_n)$ defined by
$$Tg(\sigma) = E_{\tau \sim A_n}[f(\tau) g(\tau\sigma)].$$
Then $A$ contains a product if and only if $\langle f, Tf \rangle > 0$. Let $V'$ be the space of functions of expectation 0, and write $f' = f - \alpha \in V'$. Then we have
$$\langle f, Tf \rangle = \alpha^3 + \langle f', Tf' \rangle.$$
Writing $T^*$ for the adjoint of $T$, some Representation Theory of $S_n$ (discussed in more detail below) tells us that the self-adjoint operator $T^*T$ acts on $V'$ with all eigenvalues bounded absolutely by $\frac{\alpha}{n-1}$. We deduce
$$\langle f, Tf \rangle \ge \alpha^3 - \|f'\|_2 \|Tf'\|_2 \ge \alpha^3 - (n-1)^{-1/2} \alpha^{3/2}.$$
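The step from the two terms to the density bound can be spelled out as a short worked inequality. This is a sketch that treats the stated order of the error term as an exact bound, i.e. it suppresses any implied constant. If $A$ is product-free, the number of products is zero, so the main term cannot exceed the magnitude of the error term:

```latex
\[
  \alpha^{3}\,|A_n|^{2} \;\le\; (n-1)^{-1/2}\,\alpha^{3/2}\,|A_n|^{2}
  \quad\Longrightarrow\quad
  \alpha^{3/2} \;\le\; (n-1)^{-1/2}
  \quad\Longrightarrow\quad
  \alpha \;\le\; (n-1)^{-1/3}.
\]
```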

Beyond spectral gap: the degree decomposition
This path was continued by Eberhard [3], whose improved bound is based on a refined analysis of the main contribution to the error term (he credits Ellis and Green for suggesting this approach), replacing the basic decomposition $f = \alpha + f'$ by a refined 'level' decomposition $f = \sum_{d=0}^{n-1} f^{=d}$. Instead of working with functions on $A_n$, henceforth it will be more convenient to work with functions on $S_n$ that are supported on $A_n$, so we will now let $\alpha = |A|/n!$ denote the density of $A$ inside $S_n$. This allows us to import machinery developed for $S_n$ without reworking it for $A_n$, although it introduces some slight inconvenience in keeping track of factors of 2 in the calculations.
For $d \le n$, let $W_d$ be the linear subspace of $L^2(S_n)$ generated by indicators of $d$-umvirates. The degree of a nonzero function $f$ is the minimal $d$ such that $f \in W_d$. We define $V_{=d} := W_d \cap W_{d-1}^{\perp}$ and note that the spaces $V_{=d}$ form an orthogonal decomposition of $L^2(S_n)$, known as the degree decomposition of $S_n$. For a function $f$ on $S_n$ we write $f^{=d}$ for the projection of $f$ onto $V_{=d}$.
To set up Eberhard's refined analysis, we write Again, some Representation Theory shows that $T^*T$ acts on $V''$ with all eigenvalues $O(\alpha/n^2)$. Similarly to the above argument this implies Thus it suffices to control the 'linear term' $\langle f^{=1}, T f^{=1} \rangle$, which turns out to be equal to However, it is not generally true that the linear term is small compared with the main term. Indeed, this would imply 'product mixing', i.e. that the number of products is close to that in a random set of the same density, but Eberhard [3] constructed examples with significantly more products when $\alpha = o(n^{-1/3})$. The key to his approach is that the above counting argument only needs a lower bound on the linear term, and that this exhibits much nicer concentration properties than the upper bound. Nevertheless, even these better estimates break down for densities within a polylogarithmic factor of the optimum bound.

Our approach: structure versus randomness
Our approach to stability can be viewed as a further refinement of this method that is close in spirit to Roth's theorem, in that it is analogous to the 'structure versus randomness' dichotomy: for Roth's theorem, if the error term is large then A correlates with an arithmetic progression, whereas in our setting, if the linear term cancels the main term then A correlates with dictators.
Our starting point for this strategy is the formula where $x_{i \to j}(\pi) = 1_{\pi(i)=j}$, and each measures the correlation of $f$ with a dictator. A large value of $a_{i,j}$ corresponds to a large density inside a dictator. On the other hand, having all $a_{i,j}$ of the same order of magnitude as $\mu(A)$ can be interpreted as pseudorandomness. Some calculations reveal that and that $a_{ij} a_{jk} a_{ik}$.
These formulae alone do not suggest any structural properties for a contribution of $-\alpha^3$ to the left hand side; in principle, $\Theta(n/\alpha)$ values of $a_{i,j} = -\alpha$ could contribute $\frac{1}{n^2} \Theta(n/\alpha)^{3/2} (-\alpha)^3$, so when $\alpha = o(n^{-1/3})$ we seem to have no structure.
Such extreme situations can be ruled out by concentration of measure, which is a key tool in Eberhard's approach, but seems doomed to give up logarithmic factors.However, much stronger structure can be extracted from a recent hypercontractive inequality of Filmus, Kindler, Lifshitz, and Minzer [5].
It will be convenient for us to put the coefficients $a_{ij}$ inside a matrix $A = (a_{ij})$ and to equip real-valued $n \times n$ matrices with the inner product Our idea is to decompose $A = A_{\mathrm{rand}} + A_{\mathrm{struc}} + A_{-}$ as follows. We let the matrix $A_{-}$ consist of the negative coefficients $a_{ij}$, where the other coefficients are replaced by 0. We then set the matrix $A_{\mathrm{struc}}$ to consist of the 'large' values $a_{ij} \ge \epsilon$, for some carefully chosen $\epsilon > 0$. Finally, we let $A_{\mathrm{rand}}$ consist of the 'small' positive coefficients $a_{ij} \in (0, \epsilon)$.
We then expose the dictatorial structure of $A$ by showing that This could be understood intuitively as follows. Writing $AB = \{\sigma\tau : \sigma \in A, \tau \in B\}$, we see that the dictators satisfy $D_{j \to k} \cdot D_{i \to j} = D_{i \to k}$. When expanding (1) in terms of the coefficients $a_{ij}$, it shows that the only significant negative contribution to comes from triples $a_{jk} a_{ij} a_{ik}$ corresponding to dictators $D_{j \to k} \cdot D_{i \to j} = D_{i \to k}$, with the property that $A$ has a large density in two of the dictators and a small density in the remaining one.
To establish (1) we will expand the inner product $\langle A^2, A \rangle$ in terms of the matrices $A_{-}$, $A_{\mathrm{struc}}$, $A_{\mathrm{rand}}$ and use the inequality to upper bound the undesirable terms. Our proof will thus crucially rely on upper bounds for $\|A_{\mathrm{rand}}\|_2$ and $\|A_{-}\|_2$, which we will establish via a hypercontractive inequality for global functions, as discussed in the next subsection.
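The three-way splitting of the coefficient matrix is easy to express in code. The following is a small sketch under stated assumptions: the function name `decompose`, the list-of-rows matrix representation and the sample values are hypothetical, chosen only to illustrate the thresholds described above (negative entries, entries at least $\epsilon$, and entries in $(0, \epsilon)$).

```python
def decompose(A, eps):
    """Split a coefficient matrix A (list of rows) as A = A_rand + A_struc + A_minus.

    A_minus keeps the negative entries, A_struc the entries >= eps,
    and A_rand the small positive entries in (0, eps); all other
    positions are replaced by 0.
    """
    A_minus = [[a if a < 0 else 0.0 for a in row] for row in A]
    A_struc = [[a if a >= eps else 0.0 for a in row] for row in A]
    A_rand = [[a if 0 < a < eps else 0.0 for a in row] for row in A]
    return A_rand, A_struc, A_minus

# Hypothetical 3x3 example with threshold eps = 0.1.
A = [[-0.2, 0.05, 0.15],
     [0.3, -0.1, -0.2],
     [-0.1, 0.05, 0.05]]
A_rand, A_struc, A_minus = decompose(A, eps=0.1)
print(A_rand)
print(A_struc)
print(A_minus)
```

Each entry of $A$ lands in exactly one of the three parts (zero entries land in none), so the parts always sum back to $A$.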
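The product rule for dictators can be checked exhaustively for small $n$. This is an illustrative sketch only: it assumes $D_{i \to j}$ denotes the set of permutations sending $i$ to $j$, uses 0-indexed points, and takes the composition convention $(\sigma\tau)(x) = \sigma(\tau(x))$; the helper names are hypothetical.

```python
from itertools import permutations

def dictator(n, i, j):
    """D_{i->j}: all permutations of {0..n-1} sending i to j."""
    return {p for p in permutations(range(n)) if p[i] == j}

def product_set(A, B):
    """AB = {sigma tau : sigma in A, tau in B}, with (sigma tau)(x) = sigma(tau(x))."""
    return {tuple(s[t[x]] for x in range(len(t))) for s in A for t in B}

n = 4
identity_holds = all(
    product_set(dictator(n, j, k), dictator(n, i, j)) == dictator(n, i, k)
    for i in range(n) for j in range(n) for k in range(n)
)
print(identity_holds)
```

The inclusion of the product set in $D_{i \to k}$ is immediate from $(\sigma\tau)(i) = \sigma(j) = k$; conversely any $\pi \in D_{i \to k}$ factors as $(\pi\tau^{-1})\tau$ for any $\tau \in D_{i \to j}$.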

Level-1 inequalities
The relationship between hypercontractive inequalities and inequalities that upper bound the 'level-1 weight' $\|f^{=1}\|_2^2$ is well-known in the context of Boolean functions $f : \{0,1\}^n \to \{0,1\}$. There an inequality of the form is true for all Boolean functions and is known as the level-1 inequality (see O'Donnell [11, Chapter 5]). A similar level-1 inequality for the symmetric group would be most desirable, as it would imply that the linear term is negligible compared to $\alpha^3$. However, such an inequality is not true in general, as it fails for dictators, and more generally for $t$-umvirates with small $t$.
The local nature of the obstructions suggests that the approach could be rescued by proving a level-1 inequality for global functions. Indeed, this was achieved in the analogous setting of general product spaces by Keevash, Lifshitz, Long and Minzer [10], who developed a hypercontractivity theory for global functions that has recently been a fruitful source of many applications. The corresponding results in the setting of the symmetric group have recently been established by Filmus, Kindler, Lifshitz and Minzer [5]. Their level-1 inequality shows that if $f :$ This inequality cannot be applied directly to our setting, as we cannot guarantee that $f$ has small density inside each 2-umvirate. However, we are able to extend their approach to obtain the upper bounds The key idea is to apply the hypercontractive result of [5] to and to These inequalities will establish (1).
As an expository simple case of our argument, in the next section we will show that this can be used to reprove Eberhard's result, by setting $\epsilon = 1$, so that $A_{\mathrm{struc}} = 0$. Extracting the star structure for smaller $\epsilon$ is considerably more complicated, so we defer an overview of this part of the argument to subsection 5.3.

From star structure to extremal families
Once we know that $A$ is almost contained in a star or inverse star $S$, say $S = 1_{x \to I}$, then it is not too hard to show that it is in fact almost contained in $F_x^I$. In other words, we wish to show for each $i, i' \in I$ that $A$ has small density inside the dictator $D_{i \to i'}$. We accomplish this by inspecting the triple $(A_{i \to i'}, A_{x \to i}, A_{x \to i'})$. Such triples should be intuitively considered as product-free after factoring out the corresponding dictators with To formalise this, we would like a variant of Eberhard's bound that holds not only for product-free triples $A', B', C' \subseteq A_n$, but also for product-free sets $A', B', C'$ living inside compatible dictators. This can be achieved by the following transformation. We set $A' = (i'\, n) A (n\, i)$, $B' = (i\, n) A (n\, x)$, and $C' = (i'\, n) A (x\, n)$. The transformation from $(A, B, C)$ to $(A', B', C')$ preserves products. Moreover, the restrictions correspond to the original restrictions We may then view the triple $(A'_{n \to n}, B'_{n \to n}, C'_{n \to n})$ as subsets of $A_{n-1}$ and translate Eberhard's bounds inside $A_{n-1}$ to the densities of $A_{i \to i'}$, $A_{x \to i}$ and $A_{x \to i'}$.
Similar considerations combined with some more involved structural arguments will show that $\mu(A \setminus F_x^I)$ is much smaller than $\mu(F_x^I \setminus A)$, thereby showing that $F_x^I$ is extremal.

Eigenvalues of global Cayley graphs
The main tool for proving our 1% stability result (Theorem 1.5) is an upper bound on the eigenvalues of Cayley graphs $\mathrm{Cay}(S_n, A)$ that correspond to global sets $A$. We believe that the result has independent interest and will be beneficial in various other applications; indeed, the study of Cayley graphs on $S_n$ and their eigenvalues is the basis of an entire field of research (see e.g. Diaconis [2]). Let $A = A^{-1}$ and let $T_A$ be the operator corresponding to the random walk on $\mathrm{Cay}(S_n, A)$, given by $T_A f(x) := E_{a \sim A}[f(ax)]$. As mentioned above, the operator $T_A$ preserves each of the spaces $V_{=d}$. We define or equivalently $r_d(T_A)^2$ is the largest eigenvalue of the self-adjoint operator $T_A^* T_A$ acting on $V_{=d}$. A useful fact from representation theory gives lower bounds on the dimension of each eigenspace of $T_A$ using the observation that each is invariant under the action of $S_n$ from the right. Indeed, this was a crucial ingredient in the seminal work of Ellis, Filmus and Pilpel [4], who derived bounds on the dimensions from well-known properties of invariant spaces; in particular, the dimensions are $\Theta(n^d)$ when $d = O(1)$.
The trace method provides a fundamental way to exploit this lower bound. Specifically, writing $m_\lambda$ for the multiplicity of an eigenvalue $\lambda$ of $T_A$, we have Furthermore, if $A$ is closed under conjugation then the operator $T_A$ commutes with the action of $S_n$ from both sides, in which case we have We show that similar statements hold for global sets, even when they are quite sparse.
Furthermore, if $A$ is also closed under conjugation then Theorem 1.9 is the crucial new ingredient for proving our 1% stability result for product-free sets of density $\ge n^{-C}$ (Theorem 1.5).

Organisation of the paper
We start in the next section with some technical preliminaries on the representation theory of $S_n$ and also the proofs of the results discussed immediately above, i.e. new eigenvalue estimates for global sets and their application to the proof of our 1% stability result. Section 3 considers the analysis of linear functions on the symmetric group. The main result of this section will be a level-1 inequality for the pseudorandom part of a function, which in itself will suffice to reprove Eberhard's result. We then move into more refined arguments that extract structural properties of product-free sets, first exposing the dictatorial structure in Section 4 and then the precise star structure in Section 5. In Section 6 we implement our bootstrapping arguments that refine the approximate structure to deduce our main results, giving the exact extremal result and strong stability results for product-free sets in $A_n$. The final section contains some concluding remarks.

1% stability
This section contains some background on the representation theory of S n , our new eigenvalue estimates for global sets and their application to the proof of our 1% stability result.

Notation
We write $X = O(Y)$ to say that there exists an absolute constant $C > 0$ such that $X \le C \cdot Y$, and similarly $X = \Omega(Y)$ to say that there exists an absolute constant $c > 0$ such that $X \ge c \cdot Y$. We write $X = \Theta(Y)$ to say that $X = O(Y)$ and also $X = \Omega(Y)$. We also write $X \le Y \log^{O(1)} n$ to say that there exists an absolute constant $C$ such that $X \le Y \log^{C} n$.
We discuss the space of real-valued functions on S n equipped with expectation inner product and L p -norms.We write We write σ ∼ S n to denote that σ is uniformly distributed inside S n and σ, τ ∼ S n to denote that they are chosen independently out of S n .For a function f : although we caution the reader that this usage will depend on context, as when we define E [f I→J ] below the expectation will be conditioned on the given restriction.

Restrictions
We define restrictions of functions in a manner that naturally generalizes the notion of restrictions of subsets of S n used in the introduction.
Definition 2.1. Let $t \le n$ and let $I = (i_1, \ldots, i_t)$, $J = (j_1, \ldots, j_t) \subseteq [n]$ be ordered sets of size $t$. We denote by $U_{I \to J}$ the $t$-umvirate of permutations sending each $i_l$ to $j_l$. We denote by $f_{I \to J} : U_{I \to J} \to \mathbb{R}$ the restriction of $f$ to $U_{I \to J}$. We write
We may identify $U_{I \to J}$ with $S_{n-t}$ by choosing a permutation $\sigma$ sending $n - t + l$ to $i_l$ for each $l \in [t]$ and $\pi$ sending $j_l$ to $n - t + l$. Then $\pi U_{I \to J} \sigma$ is the set of permutations in $S_n$ fixing each element of $\{n - t + 1, \ldots, n\}$, which can be identified with $S_{n-t}$. We will use this identification to import results on the symmetric group $S_{n-t}$ to the $t$-umvirate $U_{I \to J}$.
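The identification of a $t$-umvirate with $S_{n-t}$ can be verified directly on a small case. The sketch below is illustrative only: it uses 0-indexed points (so the last $t$ points are $n-t, \ldots, n-1$), the convention $(pq)(i) = p(q(i))$, and a hypothetical helper `extend_to_permutation` that completes the prescribed values of $\sigma$ and $\pi$ arbitrarily. The choice $n = 5$, $I = (0, 3)$, $J = (2, 4)$ is an arbitrary example.

```python
from itertools import permutations

def compose(p, q):
    """(pq)(i) = p(q(i)) for permutations given as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def extend_to_permutation(partial, n):
    """Extend an injective partial map (dict) on {0..n-1} to a permutation, arbitrarily."""
    used_outputs = set(partial.values())
    free_inputs = [i for i in range(n) if i not in partial]
    free_outputs = [j for j in range(n) if j not in used_outputs]
    full = dict(partial)
    full.update(zip(free_inputs, free_outputs))
    return tuple(full[i] for i in range(n))

n, I, J = 5, (0, 3), (2, 4)
t = len(I)

# U_{I->J}: permutations sending each I[l] to J[l].
U = [p for p in permutations(range(n)) if all(p[i] == j for i, j in zip(I, J))]

# sigma sends the l-th of the last t points to I[l]; pi sends J[l] back to it.
sigma = extend_to_permutation({n - t + l: I[l] for l in range(t)}, n)
pi = extend_to_permutation({J[l]: n - t + l for l in range(t)}, n)

image = {compose(pi, compose(u, sigma)) for u in U}
fixing_last = {p for p in permutations(range(n))
               if all(p[m] == m for m in range(n - t, n))}
print(image == fixing_last)
```

The equality holds for any completion of $\sigma$ and $\pi$, since $(\pi u \sigma)(n - t + l) = \pi(u(i_l)) = \pi(j_l) = n - t + l$ and the map $u \mapsto \pi u \sigma$ is injective.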

Orthogonal decompositions
Our proof will use spectral analysis over S n , so we need to recall its level decomposition and the more refined representation theoretical decomposition into isotypical components.

The level decomposition
We start by decomposing according to degree, as discussed in the introduction.
Definition 2.2. The space $W_d$ is the linear span of the indicators of the $d$-umvirates. We say that a real-valued function on $S_n$ has degree at most $d$ if it belongs to $W_d$.
By construction, $W_{d-1} \subseteq W_d$ for all $d \ge 1$. We define the space of functions of pure degree $d$ as $V_{=d} := W_d \cap W_{d-1}^{\perp}$, and so we can decompose each real-valued function $f : S_n \to \mathbb{R}$ as $f = \sum_{d=0}^{n-1} f^{=d}$ with $f^{=d} \in V_{=d}$. We often refer to this decomposition as the level decomposition of $f$. Parts of our argument require the following finer decomposition that further decomposes $f^{=i}$ into more structured pieces.

The representation theoretic decomposition
Here we will list the properties we require of the finer decomposition of functions on $S_n$ into isotypical components; these can be found e.g. in [5, Section 7.2]. We adopt the following standard notation. A partition $\lambda$ uniquely corresponds to a Young diagram. Its transpose $\lambda^t$ is obtained by swapping the roles of the rows and the columns of the Young diagram. We write $\lambda \vdash n$ to denote that $\lambda$ is a partition of $n$. We let $S_n$ act on $L^2(S_n)$ from the left and from the right by setting $(\tau g)(\pi) = g(\tau\pi)$ and $(g\tau)(\pi) = g(\pi\tau^{-1})$.

Lemma 2.3. There exists an orthogonal decomposition $L^2(S_n) = \bigoplus_{\lambda \vdash n} V_{=\lambda}$ with the following properties:
1. For some numbers $\dim(\lambda)$, each $V_{=\lambda}$ is the direct sum of $\dim(\lambda)$ irreducible representations of dimension $\dim(\lambda)$.
We write $f^{=\lambda}$ for the projection of $f$ onto $V_{=\lambda}$. We identify functions $f : A_n \to \{0, 1\}$ with functions on $S_n$ whose value is 0 on the odd permutations. Such functions satisfy $\mathrm{sign} \cdot f = f$. This gives rise to two decompositions of $f$ as a sum of elements in the spaces $V_{=\lambda}$, which therefore must be equal. The first is $f = \sum_{\lambda \vdash n} f^{=\lambda}$ and the second is $\sum_{\lambda \vdash n} \mathrm{sign} \cdot f^{=\lambda^t}$. This shows that $\mathrm{sign} \cdot f^{=\lambda} = f^{=\lambda^t}$.

Operators from functions
We write When f = 1A µ(A) , the operator L f is the operator corresponding to the random walk sending σ to aσ for a random a ∈ A. Similarly, R f corresponds to the random walk that sends σ to σa.The operators L f and R f commute with the actions of S n from one side.Indeed, for any g ∈ L 2 (S n ) and σ, τ ∈ S n we have Similarly, we have R h (τ g) = τ (R h g) .When f is a class function (meaning that f (σ) depends only on the conjugacy class of σ) we have L f = R f .Indeed, when f is a class function we have As mentioned, the trace method plays a crucial role in our work, and computing the trace of the operators L * f L f will allow us to upper bound its eigenvalues.
Proof.We only consider T = L f , as the proof for R f is similar.We have

Hypercontractivity of global functions
In this subsection we state two inequalities from [5].To do so, we need the following natural extension of the definition of globalness from sets to functions.
Definition 2.5.We say f : for each such I, J.The hypercontractivity inequality of [5] takes the following form.
Theorem 2.6.There exists an absolute constant C such that the following holds.Let d, q ∈ N with q ≥ 2 and We also require the level-d inequality of [5], which is a consequence (not immediate) of their hypercontractive inequality, showing that global functions have low weight on the first d levels.

Eigenvalues of global Cayley graphs
We show the following stronger version of Theorem 1.9. Let $T : L^2(S_n) \to L^2(S_n)$ be an operator that commutes with the action of $S_n$ either from the left or from the right. Then we write When $T$ is self-adjoint, $r_d(T)$ is the largest eigenvalue of $T$ inside $V_{=d}$; in general, $r_d(T)$ is the square root of the largest eigenvalue of $T^*T$. Theorem 1.9 is immediate from the following result applied with $f = \frac{1_A}{E[1_A]}$, as $\|f\|_2 = \mu(A)^{-1/2}$, $\epsilon = K\mu(A)$, and $L_f = T_A$ is the random walk operator corresponding to $A$.

Theorem 2.8. There is an absolute constant
If moreover f is a class function then To prove Theorem 2.8 we rely on the following lemma.
Lemma 2.9.For each d and λ, the operator L f agrees with L f =d on V =d and with L f =λ on V =λ .
Proof.Let R σ be the operator sending g to gσ.Then R σ commutes with the action of S n from the left.By Lemma 2.3 it therefore preserves the spaces V =d and V =λ .Let g ∈ V =d and let σ ∈ S n .Then as R σ g is in V =d and as f =d is the projection of f onto V =d we have The proof that L f agrees with L f =λ on V =λ is similar.
The trace of the operator L * f =d L f =d is f =d 2 2 by Lemma 2.4.By Lemma 2.9, and standard linear algebra we have On the other hand, the trace of a self-adjoint operator is the sum of its eigenvalues.Applying Lemma 2.3 gives Putting everything together, plugging in Theorem 2.7 and Lemma 2.3 we obtain for some absolute constant C.This implies that the theorem holds with 2C replacing C. When f is a class function the same proof works with dim (λ) replaced by dim (λ) 2 to give

The trace bound in high dimensions
The above upper bound on $r_d(L_f)$ is complemented by the following simpler bound that is more effective when $d_\lambda$ is large.
Lemma 2.10.Let d < n 10 and suppose that dλ On the other hand, by Lemma 2.3

Proof of our 1% stability results
Here we prove a version of Theorem 1.5 that is stronger in two ways: we consider any triple of global sets with density ≥ n −O (1) and we establish the product mixing phenomenon.For a function f : S n → R we write f for f • sign.Theorem 1.5 (in contrapositive form) is immediate from the following by setting Theorem 2.11.Fix C > 0 and suppose that n is sufficiently large.Let f, g, h : The first step of the proof is to separate the left hand side E σ,τ ∼Sn [f (σ) g (τ ) h (στ )] into low degree terms and high degree ones, as in the following lemma.
Proof.As L f commutes with the action of S n from the right it preserves each V =λ .We may therefore use orthogonality to obtain the following expansion into isotypical parts:

Regrouping terms of small degree
As $d < \frac{n}{2} - 1$, at most one of $d_{\lambda^t}$, $d_\lambda$ can be at most $d$, so

Regrouping terms of small 'dual' degree
On the other hand, by Lemma 2.3 we have g =λ t = g=λ .Hence, We now prove a lemma showing that the high degree terms are negligible.
We now prove the theorem by combining the bound on f ≤d 2 2 from the level-d inequality with the bounds from Lemma 2.13 on the eigenvalues of L f that correspond to large degrees.

The main terms are
we bound the high-degree error terms as By Theorems 2.7 (for g and h) and 2.8 (for f ) we bound the low-degree error terms as The same bounds hold replacing f, g, h on the left-hand side by f , g, h (which have the same globalness properties) so the theorem follows.

The linear terms dominate
For future reference, we conclude this section by noting that the above arguments show that when comes from the linear terms.The following is immediate from Lemmas 2.12 and 2.13 with d = 1.
Proposition 2.14. Let $f, g, h :$

Analysis of linear functions over the symmetric group
The main result of this section is our level-1 inequality for the pseudorandom part of a function. This in itself will suffice to reprove Eberhard's result (for expository purposes we will give the argument at the end of the section). We will start by describing a canonical way to represent $f^{=1}$ as a linear combination of the dictators $x_{i \to j}$. This canonical representation $f^{=1} = \sum a_{ij} x_{i \to j}$ will naturally lead to a decomposition of $f^{=1}$ as a sum of its random part $f_{\mathrm{rand}} := \sum_{|a_{ij}| < \epsilon} a_{ij} \left( x_{i \to j} - \frac{1}{n} \right)$ and its structural part $f_{\mathrm{struc}}$. Our level-1 inequality will bound $\|f_{\mathrm{rand}}\|_2$ by $\epsilon \|f\|_2$ up to logarithmic factors.

The normalized form of linear functions
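We say that a linear function $\sum_{i,j} a_{ij} x_{i \to j}$ is in normalized form if for each $i$ we have $\sum_{j=1}^{n} a_{ij} = 0$ and for each $j$ we have $\sum_{i=1}^{n} a_{ij} = 0$. Every linear function in normalized form has zero expectation, i.e. is in $V_{=1}$. We will soon show the converse, i.e. that every $f \in V_{=1}$ has a normalized form. First we give a simple formula for the inner product between two linear functions, which holds when at least one of them is in normalized form.
Lemma 3.1. Let $f = \sum_{i,j} a_{ij} x_{i \to j}$ and $g = \sum_{i,j} b_{ij} x_{i \to j}$ be linear functions, where $f$ is in normalized form. Then $\langle f, g \rangle = \frac{1}{n-1} \sum_{i,j} a_{ij} b_{ij}$.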
Proof. Consider the linear functionals $\varphi, \psi : \mathbb{R}^{n \times n} \to \mathbb{R}$ given by As both $\varphi, \psi$ are linear it is enough to show that $\varphi = \psi$ on a basis. Hence, it is sufficient to prove the lemma when $g = x_{i \to j}$. There we may use the fact that $f$ is in normalized form to deduce that This completes the proof.
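The two basic facts about normalized forms, zero expectation and the Parseval identity $\|f\|_2^2 = (n-1)^{-1} \sum_{i,j} a_{ij}^2$, can be sanity-checked numerically over all of $S_4$. The following is a sketch under assumptions: the coefficient matrix is random, the helper `double_center` (which forces all row and column sums to zero) and the other names are hypothetical, and points are 0-indexed.

```python
from itertools import permutations
import random

def double_center(M):
    """Subtract row and column means so that every row and column sums to zero."""
    n = len(M)
    row = [sum(M[i]) / n for i in range(n)]
    col = [sum(M[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[M[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]

def linear_function(a, p):
    """Evaluate sum_{i,j} a[i][j] * x_{i->j}(p), which equals sum_i a[i][p(i)]."""
    return sum(a[i][p[i]] for i in range(len(p)))

n = 4
random.seed(0)
a = double_center([[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)])

perms = list(permutations(range(n)))
values = [linear_function(a, p) for p in perms]

mean = sum(values) / len(perms)                      # expectation of f
second_moment = sum(v * v for v in values) / len(perms)  # squared 2-norm of f
parseval = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / (n - 1)

print(abs(mean) < 1e-9)
print(abs(second_moment - parseval) < 1e-9)
```

The factor $\frac{1}{n-1}$ arises because $E[x_{i \to j} x_{k \to l}]$ equals $\frac{1}{n}$ when $(i,j) = (k,l)$ and $\frac{1}{n(n-1)}$ when $i \ne k$ and $j \ne l$, and the zero row and column sums collapse the off-diagonal contribution to a multiple of $a_{kl}$.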
We are now ready to show that every function in $V_{=1}$ has a normalized form. In fact, we give an explicit formula for the coefficients of each
Proof. First we note that $\sum_j E[f_{i \to j}] = n E[f]$ for each $i$, so $\sum_{i,j} a_{ij} x_{i \to j}$ is a linear function in normalized form. As $f^{=1}$ is the projection of $f$ onto the space $V_{=1}$ of linear functions with expectation 0, to prove the lemma it suffices to show that for each linear function $g$ with $E[g] = 0$ we have $\langle f, g \rangle = \langle \sum_{i,j} a_{ij} x_{i \to j}, g \rangle$. By linearity, it is enough to show this when $g = x_{i \to j} - \frac{1}{n}$, as such functions $g$ span $V_{=1}$. For such $g$ we have On the other hand, by Lemma 3.1 we have
For $f = \sum_{i,j} a_{ij} x_{i \to j}$ in normalized form, Lemma 3.1 gives the Parseval formula $\|f\|_2^2 = (n-1)^{-1} \sum_{i,j} a_{ij}^2$. For any linear function, not necessarily in normalized form, we still have the following upper bound, which has the same form up to a constant factor.
Proof.Computing, we get .
We bound each term on the right hand side separately. Using $2|ab| \le a^2 + b^2$, each of the sums on the right hand side is $\le \frac{2}{n} \sum_{i,j} a_{i,j}^2$, and so
Using Lemma 3.1 we can now derive a useful formula for the linear term in the count for the number of products in terms of the coefficient matrices of the normalised forms.
Lemma 3.4. Let $f = \sum_{i,j} a_{ij} x_{i \to j}$, $g = \sum_{i,j} b_{ij} x_{i \to j}$, $h = \sum_{i,j} c_{ij} x_{i \to j}$ all be linear functions in normalized form. Let their coefficient matrices be defined by $M_f = (a_{ij})_{i,j}$, $M_g = (b_{ij})_{i,j}$, and
Proof. We have where $d_{kj}(\tau) = \sum_i c_{ij} x_{i \to k}(\tau)$. By Lemma 3.1 we therefore have where $e_{ij} = \sum_k a_{jk} c_{ik}$. We deduce that

Global hypercontractivity and the level-1 inequality
The following lemma shows that Theorem 2.6 may be applied to linear functions with small coefficients.The lemma is applicable for f rand defined above.
Proof.We need to show g I→J 2 ≤ ǫ ′ for any restriction with |I| = |J| ≤ 2. By averaging, it suffices to consider |I| = |J| = 2. Take distinct i, j, k, ℓ, and consider the restriction i → j, k → ℓ, corresponding to the duumvirate U i→j,k→l .We apply the bound where we define g : U i→j,k→l → R by and let By the triangle inequality we have |L| ≤ 9ǫ.For the second term above, we may use the aforementioned identification between U i→j,k→l and S n−2 to apply Lemma 3.3, deducing that gi→j,k→ℓ Thus we obtain the required bound g i→j,k→ℓ 2 ≤ ǫ ′ .

A level-1 inequality for the pseudorandom part
We now show the desired upper bound on the L 2 -norm f rand 2 of the pseudorandom part of f =1 ; in fact, we bound the right hand side of the bound f rand 2 2 ≤ 8X 2 from Lemma 3.3, where X 2 is as in the following statement.Lemma 3.6.Let ǫ ∈ 0, 1  2 and f : Proof.We note that f rand ∈ V =1 and by Lemma 3.1 we have Let q = 10 log (1/ǫ ′′ ) and ǫ ′ = 9ǫ + 3 f rand 2 , so that f rand is (2, ǫ ′ )-global by Lemma 3.5.Applying Theorem 2.6 we obtain where we used the fact that f is {0, 1}-valued.Using f rand 2 2 ≤ 8X 2 by Lemma 3.3 and rearranging we get We next consider two cases according to which of ǫ and X is larger.

Recovering Eberhard's result
For expository purposes, we will now cash in on our level-1 inequalities and reprove Eberhard's result (up to the polylog factor). We will repeatedly use the following upper bound on $|\langle MN, S \rangle|$ for three matrices $M, N, S$.
Lemma 3.7.Let M, N, S ∈ R n×n .Then we have . We use analogous notation for g = 1 B and h = 1 C , with g = b ij x i→j and h = c ij x i→j .
Then by Proposition 2.14 By Lemma 3.4 we have By our level-1 inequality (Lemma 3.6) we have with analogous statements for $B^-$ and $C^-$. By our Parseval lemma (Lemma 3.1) we have and similarly for $B$ and $C$.
We may now expand $\langle BA, C\rangle$ by writing $A = A^+ + A^-$ and similarly for $B$ and $C$. After discarding the terms with a positive contribution to $\langle BA, C\rangle$ we are left with the four terms The lemma now follows from Lemma 3.7.
When $(A, B, C)$ is product-free the above immediately implies the following bounds on their densities, since if $\min(\alpha\beta, \beta\gamma, \gamma\alpha) \ge \frac{\log^R n}{n}$ for sufficiently large $R$ then all terms bar $2\alpha\beta\gamma$ in the right hand side of (2) are $o(\alpha\beta\gamma)$. For future reference, we also note the following slightly stronger bound for the regime $\beta, \gamma \ge \epsilon > n^{-o(1)}$, where we can replace the factor $\log^{O(1)} n$ by $\log^{O(1)} \frac{1}{\epsilon}$.

Dictatorial structure
In this section we will expose the dictatorial structure of product-free sets and triples that are not too sparse. Throughout the remainder of the paper we adopt the following notation. We let We write $A = (a_{ij})$, $B = (b_{ij})$ and $C = (c_{ij})$. Proposition 2.14 shows that in order to understand it is sufficient to understand the linear part for which Lemma 3.4 gives the formula We will decompose the matrix $A$ (and similarly $B$, $C$) into three parts:
1. The matrix $A^-$ contains the negative coefficients of $A$, and so represents the negative correlations that $A$ has with dictators.
2. The matrix $A^{\mathrm{rand}}$ contains the small positive coefficients of $A$, and so represents the pseudorandom part of $f^{=1}$.
3. The matrix $A^{\mathrm{struc}}$ contains the large coefficients of $A$, and so corresponds to the dictators with which $A$ is heavily correlated.
We expand $\langle BA, C\rangle$ according to this decomposition and show that most of the negative contributions come from the terms namely those compatible triples of dictators for which two of the matrices have a strong positive correlation and the third has a negative correlation.

Parameters
The following parameters will be used throughout the remainder of the paper. We let We fix $R$ much larger than all the absolute constants implicitly appearing in our $O(1)$ notation and suppose that $n$ is sufficiently large with respect to $R$. We let Note that in our exact result, which is the case of most interest, we have

Our decomposition
For a matrix $M = (m_{ij})$ and an interval $I \subseteq \mathbb{R}$ we write $M_I = (m_{ij} 1_{m_{ij}\in I})$. As mentioned above, our idea is to decompose our matrix $A$ as the sum $A = A^- + A^{\mathrm{rand}} + A^{\mathrm{struc}}$, where $A^- = A_{(-\infty,0)}$, $A^{\mathrm{rand}} = A_{(0,\epsilon_A)}$ and $A^{\mathrm{struc}} = A_{[\epsilon_A,\infty)}$. We decompose $B$ and $C$ similarly with $\epsilon_B$ and $\epsilon_C$ replacing $\epsilon_A$.
As mentioned in the introduction, the key to our approach is to combine Lemma 3.7 with the following upper bounds on $\|A^{\mathrm{rand}}\|_2^2$ and $\|A^-\|_2^2$, which will follow easily from our level-1 inequalities in the previous section.
Lemma 4.1. Let $A \subseteq S_n$ and let $A^-$, $A^{\mathrm{struc}}$, $A^{\mathrm{rand}}$ be as above. Then
We are now ready to show that the only significant negative contributions to $\langle BA, C\rangle$ come from two structure matrices and one negative coefficient matrix.
Lemma 4.2. We have
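The threshold decomposition just described can be illustrated with a minimal sketch. The matrix entries and the threshold `eps_a` below are made-up illustration values, not the paper's parameters, and the helper names `restrict`, `decompose` and `add` are hypothetical.

```python
# Sketch of splitting a coefficient matrix by the intervals
# (-inf, 0), (0, eps) and [eps, inf). All values here are illustrative.

def restrict(matrix, keep):
    """Return a copy of the matrix with every entry failing `keep` set to 0."""
    return [[entry if keep(entry) else 0.0 for entry in row] for row in matrix]

def decompose(matrix, eps):
    """Split a matrix into negative, small positive and large parts (eps > 0)."""
    negative = restrict(matrix, lambda v: v < 0)
    pseudorandom = restrict(matrix, lambda v: 0 < v < eps)
    structured = restrict(matrix, lambda v: v >= eps)
    return negative, pseudorandom, structured

def add(*matrices):
    """Entrywise sum of equally sized matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]

A = [[-0.25, 0.125, 0.5], [0.0, 0.75, -0.5], [0.25, 0.0, 0.125]]
eps_a = 0.25  # hypothetical threshold
A_minus, A_rand, A_struc = decompose(A, eps_a)

# The three parts have disjoint supports and sum back to A
# (zero entries fall in none of the intervals and contribute nothing).
assert add(A_minus, A_rand, A_struc) == A
```

Because the three intervals are disjoint, every nonzero entry lands in exactly one part, which is what allows an inner product to be expanded term by term over the parts.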

A struc
Proof. We expand the left hand side according to the decomposition $A = A^- + A^{\mathrm{struc}} + A^{\mathrm{rand}}$ and similarly for $B$, $C$. For a lower bound we can discard all the terms that involve an even number of $A^-$, $B^-$, $C^-$, as those have a non-negative contribution to $\langle BA, C\rangle$. For the remaining terms not listed on the right hand side above we may apply Lemmas 3.7 and 4.1 to deduce that they have absolute value at most Here, the first three lines correspond to terms such as $\langle B^{\mathrm{rand}} A^-, C^{\mathrm{struc}}\rangle$ and $\langle B^{\mathrm{rand}} A^-, C^{\mathrm{rand}}\rangle$, while the fourth line corresponds to the term $\langle B^- A^-, C^-\rangle$. The second inequality follows by plugging in the values of $\epsilon_A$, $\epsilon_B$, $\epsilon_C$.

Star structure
In this section we will refine the dictatorial structure established in the previous section to extract a strong star structure that explains how some set or triple in $A_n$ can be quite dense yet product-free. These stability results will then be refined by bootstrapping arguments in the next section to deduce our exact and strong stability results.

Equivalence and inversion
The equation $ab = c$ can be written in 6 equivalent ways, e.g. we may write ca Thus if the triple $(A, B, C)$ is product-free then we have 6 equivalent product-free triples such as $(C, A^{-1}, B)$ and $(B^{-1}, A^{-1}, C^{-1})$. The structure explaining this product-freeness may appear in any of 6 different forms, so to avoid cumbersome statements, we will say that a certain structural statement for $(A, B, C)$ holds up to equivalence if it holds when $(A, B, C)$ is replaced by one of its 6 equivalent triples. Similarly, for a single product-free set $A$, the structural statement for $A$ may apply to $A$ or $A^{-1}$, so we will say that it holds up to inversion.
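The six rearrangements can be checked by brute force on a small group. The sketch below assumes the convention that a triple $(X, Y, Z)$ is product-free when no $x \in X$, $y \in Y$ satisfy $xy \in Z$; the order of the entries in each equivalent triple depends on this convention, so the list produced here need not match the paper's examples entry for entry. All function names are hypothetical.

```python
from itertools import permutations
import random

def compose(p, q):
    """Product pq of permutations stored as tuples, acting as i -> p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def inv_set(s):
    return {inverse(p) for p in s}

def is_product_free(a, b, c):
    """True if no x in a, y in b satisfy xy in c (assumed convention)."""
    return all(compose(x, y) not in c for x in a for y in b)

def equivalent_triples(a, b, c):
    """Six triples obtained by rearranging xy = z under the assumed convention."""
    ai, bi, ci = inv_set(a), inv_set(b), inv_set(c)
    return [(a, b, c), (bi, ai, ci), (c, bi, a),
            (b, ci, ai), (ai, c, b), (ci, a, bi)]

group = list(permutations(range(3)))
rng = random.Random(0)
for _ in range(200):
    # Random subsets of S_3; the six triples must agree on product-freeness.
    a, b, c = ({g for g in group if rng.random() < 0.4} for _ in range(3))
    verdicts = {is_product_free(*t) for t in equivalent_triples(a, b, c)}
    assert len(verdicts) == 1
```

Taking `a == b == c` in this sketch recovers the single-set case, where the only nontrivial symmetry left is replacing the set by its inverse.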

Goals of this section
Our first main result of this section will show that any product-free set has a strong star structure under a fairly mild assumption on its density (recall that $\delta^{-1} = \log^R n$).
Proposition 5.1. Suppose that $A$ is product-free with $\mu(A) \ge \delta^{-2} n^{-2/3}$. Then up to inversion there exist $x \in [n]$ and Moreover, for each $i \in I$ we have µ
Our second main result of the section describes the star structure for product-free triples under mild density assumptions: up to equivalence $B$ and $C$ must be strongly correlated with stars at some common vertex $x$.
Proposition 5.2. Suppose that $(A, B, C)$ is product-free with Then up to equivalence there exist $x \in [n]$ and $I, J \subseteq [n]$ such that the following hold:

Associated stars
We introduce some further notation that will be used throughout the remainder of the paper to describe the stars associated to the structured parts of $A$, $B$ and $C$. For each $i \in [n]$, we define the associated star for $(A, i)$ by (Recall that $\epsilon_A$, $\epsilon_B$, $\epsilon_C$ were defined in the previous section.) Similarly, we define the associated inverse star for $(A, i)$ by We may interpret $s_A(i)$ and $s'_A(i)$ combinatorially as the correlation between $A$ and the corresponding associated (inverse) star by noting that and similarly for $s'_A(i)$. We define the corresponding notions for $B$ and $C$ similarly. We say that an associated star $S_A(i)$ is small if $s_A(i) \le \delta\mu(A)$, and otherwise we say that it is large. We use this terminology for associated stars and associated inverse stars for each of $A$, $B$, $C$.

Overview of proof
We will now give an overview of the arguments used to extract star structure from the inequalities discussed above. For simplicity in the overview we will concentrate on the case of a single product-free set $A$, which is analysed by applying the inequalities with $A = B = C$.
For terms such as $\langle A^{\mathrm{struc}} A^-, A^{\mathrm{struc}}\rangle$ we use the fact that each coefficient of $A^-$ is at least $-\alpha$, which provides the following lower bound in terms of the associated stars: Summing over all the similar terms, we thus reduce our goal to a lower bound where $X$ dominates the error terms unless $A$ has strong star structure.
There will be two main ingredients in this bound. The first ingredient concerns getting rid of the small associated stars: we use the level-1 inequality (Lemma 3.6) to show that the $i$th contribution to (4) is negligible unless either
The second ingredient provides a combinatorial analysis of large associated (inverse) stars, which is motivated by the heuristic that such stars should be essentially disjoint. Writing $s_A := \sum_i s_A(i)$ and $s'_A := \sum_i s'_A(i)$, we should therefore expect $s_A + s'_A$ to be essentially bounded by $\alpha$. In terms of the rescaled star sizes and moreover that if the sum is close to 1 then some $v_i$ or $v'_i$ is close to 1, which is equivalent to the required approximation of $A$ by an associated star or inverse star.
We will use the dyadic expansion Each of the resulting terms will be bounded using the following claim.
Claim 5.5. Let $M'$, $N'$ be matrices such that each row of $M'$ has at most $m'$ nonzero coefficients and each row of $N'$ has at most $n'$ nonzero coefficients. Then
Proof. This follows from Cauchy-Schwarz as
For each row $v$ of $M_{(2^{-i},2^{1-i}]}$, all nonzero coefficients are $\ge 2^{-i}$ and $\langle v, 1\rangle \le \delta\eta_1$, so $v$ has $\le 2^{i}\delta\eta_1$ nonzero coefficients. Similarly each row of $\tilde N_{(2^{-j},2^{1-j}]}$ has $\le 2^{j}\delta\eta_2$ nonzero coefficients. Applying the claim, we obtain where the final inequality uses the assumption of the lemma.

Combinatorial analysis of the star structure matrices
In the next section we will use Lemma 5.4 to complete our transition from dictatorial structure matrices $A^{\mathrm{struc}}$ to the star-structure matrices $A^{\star}$. To achieve this, we first need to control the error terms that will arise from applying Lemma 5.4, namely the $L^1$-norms of the star structure matrices.
As discussed in the overview, this comes down to a combinatorial argument showing that the corresponding large associated stars and inverse stars are essentially disjoint, provided that $A$, $B$, $C$ (and so the parameters $\epsilon_A$, $\epsilon_B$, $\epsilon_C$) are sufficiently large. This is captured by saying for each $E \in \{A, B, C\}$ that $\sum_S \mu(E \cap S) \approx \mu(E)$, where the sum is over all large associated stars and inverse stars of $E$.
Lemma 5.6. Let $\epsilon > 0$, $\delta > 0$ and $E \subseteq S_n$. Let $\mathcal{S}$ be a collection of stars and inverse stars such that for each $S \in \mathcal{S}$ we have $\delta\mu(E) \le \mu(E \cap S)$ and ǫ 2 µ (S) ≤ µ (E ∩ S). Suppose also that 100
Proof. Fix $\mathcal{S}' \subseteq \mathcal{S}$ with $|\mathcal{S}'| = \min\{\frac{2}{\delta}, |\mathcal{S}|\}$. By Inclusion-Exclusion we have For any distinct $S_1, S_2 \in \mathcal{S}$ we have Rearranging, we obtain which also yields To complete the proof we show that $\mathcal{S}' = \mathcal{S}$. This will follow once we show that $|\mathcal{S}'| < \frac{2}{\delta}$. In fact we have and hence $|\mathcal{S}'| < \frac{2}{\delta}$.
Lemma 5.6 can be restated in terms of the star-structure matrices. It translates to the following upper bound on their 1-norms.
Then we have Analogous statements hold for B and C.
Proof. Let $\mathcal{S}_A$ be the set of large associated stars for $A$. For each $S \in \mathcal{S}_A$ we have $\mu(A \cap S) \ge \delta\mu(A)$ by definition. On the other hand, for each dictator $1_{i\to j}$ contained in $S$ we have which gives $\mu(A_{i\to j}) \ge \epsilon_A$, and so $\mu(A \cap S) \ge \epsilon_A \mu(S)$. Hence we can apply Lemma 5.6 with $\epsilon = \epsilon_A$, which gives Substituting $\epsilon_A = n\delta\alpha \min(\beta, \gamma)$ and using $\frac{1000}{\delta^5 n^2} \le \alpha \min(\beta, \gamma)^2$ gives $\epsilon_A^2/\alpha \ge 1000\delta^{-3}$, so the lemma follows.

Reducing to large associated stars
The following lemma combines everything we proved so far. It reduces us to upper bounding the star structure inner products and let $\zeta = \min(\zeta', 1/2)$. Applying Lemma 5.8 with $A = B = C$, then Lemma 5.10, and then Lemma 5.7, we obtain ))αn, so by definition of $\zeta'$ we have This completes the proof.

A dense product-free set is close to a single star
Now we will refine the star structure obtained in the previous subsection to prove Proposition 5.1, which shows that any moderately dense product-free set is closely approximated by a single star. Our idea is to use the (inverse) star $S$ provided above together with the fact that $(A, A, A \setminus S)$ is a product-free triple to which we can apply Proposition 5.2 to deduce that $A \setminus S$ is small.
We assert that $\mu(C) \le \delta^{-2} n^{-2/3}$. Suppose otherwise. Then the triple $(A, A, C)$ is product-free, and we can apply Proposition 5.2 to deduce that its conclusion holds for some triple equivalent to $(A, A, C)$. In principle, the possibilities are for some $i$ that However, $\mu_A(S_A(1))$ is so large that it precludes the existence of any other associated star or inverse star $S$ with $\mu_A(S) > 1/100$, so the only possibility is that (3) holds with $i = 1$. However, by definition of $C$ we have $\mu(C_{1\to j}) \le n^{-1/3}$ for all $j$, but as $S_C(1) \ne \emptyset$ we can choose $j$ with This contradiction completes the proof.
The following lemma shows that if $(A, B, C)$ is product-free and $B$, $C$ are dense in stars $1_{x\to I}$, $1_{x\to J}$, then $A$ is sparse outside of $1_{I\to J} := \{\sigma : \sigma(I) \cap J = \emptyset\}$.
Proof. By a union bound and Lemma 6.1 we have To complete the proof we show that $|I||J| \le 10n \log n$. Suppose otherwise. Then by removing elements from either $I$ or $J$ we may assume that $|I||J| \in (10n \log n, 20n \log n)$. In this case we have Together with (6) this yields which contradicts the hypothesis $\alpha \ge \frac{1}{\delta\epsilon n}$. This shows that $|I||J| \le 10n \log n$.
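To see why a product $|I||J|$ much larger than $n \log n$ is so restrictive, it may help to compute the measure of $\{\sigma : \sigma(I) \cap J = \emptyset\}$ directly. The sketch below is a standard counting estimate, not a reproduction of the display omitted above; it works in $S_n$ for simplicity, and the function names and sample parameters are hypothetical. The exact measure is $\prod_{k<|I|} \frac{n-|J|-k}{n-k}$, which is at most $\exp(-|I||J|/n)$, so $|I||J| > 10n\log n$ forces it below $n^{-10}$.

```python
from fractions import Fraction
from itertools import permutations
from math import exp

def avoid_measure(n, size_i, size_j):
    """Exact uniform measure in S_n of {sigma : sigma(I) is disjoint from J}."""
    measure = Fraction(1)
    for k in range(size_i):
        # The k-th element of I must avoid J and the k images already used.
        measure *= Fraction(max(n - size_j - k, 0), n - k)
    return measure

def avoid_measure_brute(n, set_i, set_j):
    """Same quantity by enumerating S_n (only feasible for tiny n)."""
    perms = list(permutations(range(n)))
    good = sum(1 for s in perms if all(s[i] not in set_j for i in set_i))
    return Fraction(good, len(perms))

# Sanity check of the closed form against enumeration.
assert avoid_measure(6, 2, 3) == avoid_measure_brute(6, {0, 1}, {3, 4, 5})

# Compare the exact measure with the exponential upper bound.
for n, a, b in [(100, 10, 10), (1000, 100, 100), (10000, 400, 400)]:
    exact = float(avoid_measure(n, a, b))
    bound = exp(-a * b / n)
    assert exact <= bound
    print(n, a * b / n, exact, bound)
```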

Stability result for product-free sets
We now prove the following stronger version of Theorem 1.2.
Without loss of generality we assume this for $A$ rather than $A^{-1}$. By Lemma 6.2 with $A = B = C$ and $\epsilon = n^{-1/3}$ we have $|I|^2 \le 10n \log n$ and

Bootstrapping triples
With slightly more work we are also able to prove the following stability result for product-free triples that are somewhat sparse (we replace a log factor by a log log factor).
In particular, we have
The idea of the proof is to start with the weaker star structure guaranteed by Proposition 5.2. Lemma 6.2 will then easily imply a weaker variant of the theorem with $1 - O(\delta^{1/4})$ replaced by $\frac{1}{100}$. This will allow us to show that actually $A$ has no large associated stars and inverse stars, so instead of Proposition 5.2 we may apply the more suitable Lemma 5.13.
Proof. We claim that $\alpha \ge \frac{1}{\delta^{100} n}$. Indeed, otherwise our assumption implies $\epsilon := \min(\beta, \gamma) > (\log \log n)^K \delta^{100}$, and then Corollary 3.10 gives $\alpha \le \frac{\log^{O(1)}(1/\epsilon)}{\epsilon n}$, so recalling $\delta^{-1} = \log^R n$ we have min(αβ, αγ) < (log log n) We now assert that $A$ has no large associated star or inverse star $S$. Indeed, for such $S$, as $\mu(A \cap S) \ge \delta\mu(A)$ we would have On the other hand, This contradicts the bound $\mu(A) = \alpha \ge \frac{1}{\delta^{100} n}$, so such $S$ cannot exist. Thus we can apply Lemma 5.13 to obtain the required stronger approximation of $B$ and $C$ by associated stars, i.e. $\mu_B(S_B(x))$ and $\mu_C(S_C(x))$ are both $1 - O(\delta^{1/4})$.

Further bootstrapping of product-free sets
Recall from Theorem 6.3 that if $A$ is a dense product-free set then $\mu(A \setminus F^x_I)$ is small for some $x$ and $I$. If $A$ is extremal this implies $\mu_{F^x_I}(A) \approx 1$. Our goal in this subsection is to show for such $A$ that and therefore any extremal product-free $A$ must be of the form $F^x_I$. Our proof will also work for sets that are sufficiently close to extremal.
We require various lemmas that will be applied to certain restrictions of $A$. The first considers a product-free triple $(A, B, C)$ (which will be restrictions of the original $A$) and shows that if two sets are dense in sets of the form $1_{I\to I}$ then the third must be empty.
Proof. We only prove the first statement, as the proofs for permutations of $ABC$ are similar. Suppose on the contrary that there exists $\tau \in A$. Let $\sigma \sim S_n$ be a random permutation. We e 8000 √ n then $A_{x\to j} = \emptyset$. We suppose for a contradiction that this fails.
We will apply Lemma 6.6 to the product-free triple $(A_{i\to j'}, A_{x\to i}, A_{x\to j'})$ for each $i \in I_2$ and $j' \in J'$. By our above assumption on $C := A_{x\to j}$ the conclusion of this lemma does not hold, so one of the hypotheses does not hold. By definition of $I_2$, the hypothesis for $B := A_{x\to i}$ is satisfied with $2\zeta$ in place of $\zeta$. Thus the hypothesis for $A_{i\to j'}$ does not hold, so µ
We conclude this section with the proof of Theorem 1.3, which implies Theorem 1.1 (our main theorem).
Proof of Theorem 1.3. We introduce the absolute constant $c = e^{-9000}$. We need to show that if $n$ is sufficiently large and $A \subseteq A_n$ is a product-free set with $\mu(A) > \max_{I,x} \mu(F^x_I) - \frac{c}{n}$ then up to inversion $A$ is contained in some $F^x_I$. By Theorem 1.2, there exists some $F^x_I$ such that $\mu(A \setminus F^x_I) \le n^{-0.66}$, where we may assume that this holds for $A$ rather than $A^{-1}$. We note that µ(F By choice of $c$ this implies $\zeta \le \frac{1}{e^{8000}\sqrt{n}}$, so we can apply Lemma 6.7 to obtain $A \subseteq F^x_I$.
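The extremal families $F^x_I = \{\pi : \pi(x) \in I,\ \pi(I) \cap I = \emptyset\}$ can be explored numerically. The following is a minimal sketch that counts in $S_n$ rather than $A_n$ for simplicity; the closed form $t \cdot \frac{(n-t)!}{(n-2t)!} \cdot (n-t-1)!$ for $|I| = t$ and $x \notin I$ is an elementary count supplied here as an illustration (choose $\pi(x)$, then the images of $I$, then everything else), and all function names are hypothetical. It checks the count and product-freeness by brute force for small $n$, then compares the best density with $1/\sqrt{2en}$.

```python
from itertools import permutations
from math import e, factorial, sqrt

def extremal_family(n, x, subset):
    """All pi in S_n with pi(x) in I and pi(I) disjoint from I (brute force)."""
    return [p for p in permutations(range(n))
            if p[x] in subset and all(p[i] not in subset for i in subset)]

def extremal_count(n, t):
    """Size of the family in S_n when |I| = t and x lies outside I."""
    if t == 0 or 2 * t > n:
        return 0
    return t * (factorial(n - t) // factorial(n - 2 * t)) * factorial(n - t - 1)

def is_product_free(family):
    """True if no product bc of two members lies in the family."""
    members = set(family)
    return all(tuple(b[i] for i in c) not in members
               for b in family for c in family)

def measure(n, t):
    """Density of the family in S_n."""
    return extremal_count(n, t) / factorial(n)

family = extremal_family(5, 0, {1, 2})
assert len(family) == extremal_count(5, 2)
assert is_product_free(family)

# Optimise over |I| and compare with the conjectured asymptotic 1/sqrt(2en).
n = 2500
best_t = max(range(1, 200), key=lambda t: measure(n, t))
print(best_t, sqrt(n / 2), measure(n, best_t) * sqrt(2 * e * n))
```

The optimal $|I|$ found this way sits near $\sqrt{n/2}$, in line with the heuristic $\mu(F^x_I) \approx t e^{-t^2} n^{-1/2}$ for $|I| = t\sqrt{n}$ from the introduction.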

Concluding remarks
Our methods for bounding product-free subsets of the alternating group have justified the intuitive idea that globalness can be regarded as a pseudorandomness notion, by showing estimates on the eigenvalues of Cayley operators for global sets that correspond to the intuition for random sets. Given the large literature on random Cayley graphs inspired by the seminal paper of Alon and Roichman [1], one natural direction for further research is whether analogous results hold for Cayley graphs with respect to global sets.
We also propose the study of extremal problems for word maps in general groups (see the survey of Shalev [12] for background). Fix any group $G$. Any word $w = w(x_1, \ldots, x_d)$ in the free group $F_d$ on $d$ generators naturally defines a word map $w : G^d \to G$. For example, if $d = 3$, $w = x_1 x_2 x_3^{-1}$, and $A, B, C \subseteq G$ then $A$ is product-free iff $A^3 \cap \ker w = \emptyset$. Our main result therefore describes the largest $A \subseteq A_n$ such that $A^3 \cap \ker w = \emptyset$, and so suggests the following more general problem.

Problem 7.1. For any finite group $G$ and word $w \in F_d$, what is the largest $A \subseteq G$ with $A^d \cap \ker w = \emptyset$?
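For small groups, the condition $A^d \cap \ker w = \emptyset$ can be tested exhaustively. The sketch below encodes a word as a list of (variable index, exponent) pairs, which is a representation chosen here for illustration, and checks the product-free word $x_1 x_2 x_3^{-1}$ on $S_3$. All names are hypothetical.

```python
from itertools import permutations, product

def compose(p, q):
    """Product pq of permutations stored as tuples, acting as i -> p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def word_value(word, args):
    """Evaluate a word given as (variable index, exponent) pairs, exponent +1 or -1."""
    value = tuple(range(len(args[0])))
    for index, exponent in word:
        factor = args[index] if exponent == 1 else inverse(args[index])
        value = compose(value, factor)
    return value

def avoids_kernel(subset, word, d):
    """True if the word map sends no d-tuple from `subset` to the identity."""
    subset = list(subset)
    if not subset:
        return True
    identity = tuple(range(len(subset[0])))
    return all(word_value(word, args) != identity
               for args in product(subset, repeat=d))

def is_odd(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i) if p[j] > p[i])
    return inversions % 2 == 1

PRODUCT_WORD = [(0, 1), (1, 1), (2, -1)]  # x1 * x2 * x3^{-1}
group = list(permutations(range(3)))
odd = [p for p in group if is_odd(p)]

assert avoids_kernel(odd, PRODUCT_WORD, 3)        # odd permutations are product-free
assert not avoids_kernel(group, PRODUCT_WORD, 3)  # the whole group is not
```

The search is exponential in $d$ and in the size of the subset, so this is only a way to experiment with tiny cases of the problem, not a method for attacking it.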
Theorem 6.4. There is an absolute constant $K$ such that if $n$ is sufficiently large and $(A, B, C)$ is a product-free triple in $A_n$ with $\min(\alpha\beta, \beta\gamma, \gamma\alpha) \ge \frac{(\log \log n)^K}{n}$ then up to equivalence there exist $I, J \subseteq [n]$, $x \in [n]$ with $|I||J| \le 10n \log n$ such that
O(1)n , which contradicts our assumption for $K$ large enough. Thus we have the claimed lower bound on $\alpha$. By Proposition 5.2, up to equivalence of the triple $(A, B, C)$ there exist associated stars $S_B(x) = 1_{x\to I}$, $S_C(x) = 1_{x\to J}$, such that $\mu_B(S_B(x)) > \frac{1}{100}$ and $\mu_C(S_C(x)) \ge \frac{1}{100}$. Our assumptions imply $\min(\epsilon_B, \epsilon_C) \ge \delta$, so by Lemma 6.2 we have $|I||J| \le 10n \log n$ and µ