Partial associativity and rough approximate groups

Suppose that a binary operation $\circ$ on a finite set $X$ is injective in each variable separately and also associative. It is easy to prove that $(X,\circ)$ must be a group. In this paper we examine what happens if one knows only that a positive proportion of the triples $(x,y,z)\in X^3$ satisfy the equation $x\circ(y\circ z)=(x\circ y)\circ z$. Other results in additive combinatorics would lead one to expect that there must be an underlying "group-like" structure that is responsible for the large number of associative triples. We prove that this is indeed the case: there must be a proportional-sized subset of the multiplication table that approximately agrees with part of the multiplication table of a structure introduced by Tao which is a metric-entropy analogue of an approximate group, and that the converse is also true. We also present an example that suggests that our result cannot be strengthened to yield a dense subset that agrees with part of the multiplication table of a group.


Introduction
A Latin square is an n × n grid with each point in the grid given a label from a set of size n in such a way that no label occurs more than once in any row or column. This condition is loose enough for there to be a large number of nonisomorphic Latin squares (where we regard two Latin squares as isomorphic if one can be turned into the other by permuting rows, columns and labels) but tight enough for the restriction to be an interesting one. One way of constructing Latin squares is to take the multiplication table of a finite group. However, this source of examples is very special: the associativity property is a significant extra constraint.
It is trivial that if • is a binary operation on a set X of size n with the property that for each a the maps b → a • b and b → b • a are bijections, then the multiplication table of • is a Latin square. It is also a straightforward exercise to prove that if • is associative as well, then (X, •) is a group. Once one has made this observation, an obvious stability question arises: what if one knows only that a • (b • c) = (a • b) • c for almost all triples $(a, b, c) \in X^3$ rather than all triples? Must the multiplication table of (X, •) agree almost everywhere with a genuine group multiplication table?
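As a small illustration of the two notions in play here, the following Python sketch checks the Latin square property of a multiplication table and measures the proportion of associative triples. The function names and the two example tables (addition and subtraction modulo 5) are our own choices for illustration and are not taken from the paper.

```python
from fractions import Fraction
from itertools import product

def is_latin_square(table):
    """True if every row and every column of the n x n table is a permutation of range(n)."""
    n = len(table)
    full = set(range(n))
    rows_ok = all(set(row) == full for row in table)
    cols_ok = all({table[i][j] for i in range(n)} == full for j in range(n))
    return rows_ok and cols_ok

def associative_fraction(table):
    """Exact proportion of triples (a, b, c) with a*(b*c) == (a*b)*c, where a*b = table[a][b]."""
    n = len(table)
    good = sum(
        1
        for a, b, c in product(range(n), repeat=3)
        if table[a][table[b][c]] == table[table[a][b]][c]
    )
    return Fraction(good, n ** 3)

n = 5
addition = [[(a + b) % n for b in range(n)] for a in range(n)]     # the group Z_5
subtraction = [[(a - b) % n for b in range(n)] for a in range(n)]  # a Latin square, not associative

print(is_latin_square(addition), associative_fraction(addition))
print(is_latin_square(subtraction), associative_fraction(subtraction))
```

For subtraction modulo 5 a triple (a, b, c) is associative exactly when 2c = 0, that is, when c = 0, so one fifth of the triples are associative even though the operation is not a group operation.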
In his master's thesis, Elad Levi [6] gave a positive answer to this question. He proved that for every $\epsilon > 0$ there exists $\delta > 0$ such that if the multiplication table of (X, •) is a Latin square and if a • (b • c) = (a • b) • c for at least $(1-\delta)|X|^3$ of the triples $(a, b, c) \in X^3$, then there is a group $G = (Y, *)$ such that $|Y \triangle X| \le \epsilon|X|$ and $a * b$ is defined and equal to a • b for at least $(1-\epsilon)|X|^2$ pairs $(a, b) \in X^2$.
In this paper we shall look at the "one percent version" of this question (as opposed to the "99 percent version" just discussed). Here the hypothesis is weakened considerably, so that instead of almost all the triples (a, b, c) satisfying associativity, we just assume that a positive fraction of them satisfy it. Some of the central results in additive combinatorics, such as Freiman's theorem and the inverse theorems for the $U^k$ norms, are of this flavour, and have many applications.
Hrushovski has asked what can be said if X is a finite set with a binary operation • that makes X into a Latin square and is such that the equation a • (b • c) = (a • b) • c is satisfied for at least $\epsilon|X|^3$ triples $(a, b, c) \in X^3$. One tempting conjecture is that there is a subset $A \subset X^2$ of density at least δ, a group G, and an injection f : X → G such that f(x • y) = f(x)f(y) for every (x, y) ∈ A. Loosely speaking, this would say that $X^2$ contains a large set such that the restriction of the binary operation • gives rise to a labelling that is isomorphic to part of a group multiplication table.
In this paper, we prove a slightly weaker statement, as well as giving an example that suggests that the stronger statement above is false. Very roughly, the positive result asserts that it is possible to find a large subset $A \subset X^2$ such that there is "no short proof" that A does not embed into a group multiplication table (we will elaborate further on this statement later in this introduction). This turns out to imply that A is closely related (in a manner that we will make precise later) to a type of structure called a (K, r)-approximate subgroup introduced in a blog post of Tao [7] about metric-entropy analogues of sumset theory, which we informally call a rough approximate group. (This notion is also discussed in a preprint of Hrushovski [5].) Using some lemmas from [7] and appropriate modifications of arguments of Tao from [8] that are in a very similar spirit, it is possible to show that much of the basic theory of sets of small doubling and tripling carries over in suitably modified form to the context of rough approximate groups. In particular, we give a metric-entropy analogue of Theorem 4.6 of [8], which shows that if X and Y are two subsets of a group with a small product set XY, then there is an approximate group H that is not too big such that X can be covered by a bounded number of left translates of H and Y by a bounded number of right translates. For the convenience of the reader, we give the details in an appendix.

Linear hypergraphs and the cuboctahedral norm
Before we state our main results more precisely, let us mention that we arrived at the problem from a different direction from Hrushovski, and the statement we shall prove is not quite the one we have just discussed, but a different statement that turns out to imply it. So now let us start again and introduce the problem in the way we arrived at it.
A tripartite linear 3-uniform hypergraph with vertex sets X, Y, Z is a subset H of X × Y × Z with the property that no two distinct elements of H share more than one vertex. So, for instance, if x ∈ X and y ∈ Y, then there is at most one z ∈ Z such that (x, y, z) ∈ H. We shall abbreviate the phrase "tripartite linear 3-uniform hypergraph" to "linear hypergraph" in this paper. We shall also use the word "faces" to refer to the elements of H and "vertices" to refer to the elements of X ∪ Y ∪ Z.
For dense 3-uniform hypergraphs (that is, 3-uniform hypergraphs with n vertices and $\theta n^3$ faces for some positive constant θ), and more generally for bounded functions $f : X \times Y \times Z \to \mathbb{C}$, there is a useful norm that measures quasirandomness, given by the formula
$$\|f\|_{\square^3}^8 = \mathbb{E}_{x_0,x_1\in X}\,\mathbb{E}_{y_0,y_1\in Y}\,\mathbb{E}_{z_0,z_1\in Z}\prod_{\epsilon\in\{0,1\}^3} C^{|\epsilon|}f(x_{\epsilon_1},y_{\epsilon_2},z_{\epsilon_3}),$$
where C is the operation of complex conjugation and $|\epsilon|$ is shorthand for $\epsilon_1+\epsilon_2+\epsilon_3$. One of the important properties of this norm is that for any three functions $u : X \times Y \to \mathbb{C}$, $v : Y \times Z \to \mathbb{C}$ and $w : X \times Z \to \mathbb{C}$ we have the inequality
$$|\mathbb{E}_{x,y,z} f(x,y,z)u(x,y)v(y,z)w(x,z)| \le \|f\|_{\square^3}\|u\|_\infty\|v\|_\infty\|w\|_\infty,$$
which tells us that if $\|f\|_{\square^3}$ is small and u, v, w are bounded functions that each depend on only two variables, then f cannot have any significant correlation with uvw.
It is natural to ask whether there might be a norm that plays the same role for linear hypergraphs that this norm plays for dense hypergraphs. When f is the characteristic function of a set A ⊂ X × Y × Z, the quantity $\|f\|_{\square^3}^8$ counts the number of octahedra (with labelled vertices and including degenerate ones) in A. For linear hypergraphs, one would like to count some other subhypergraph, which should itself be linear. The natural choice turns out to be the cuboctahedron, which can be described in many ways.
A combinatorial definition is that it is a hypergraph given by a list of faces of the following form: $x_1y_1z_1$, $x_1y_2z_2$, $x_2y_1z_3$, $x_2y_2z_4$, $x_3y_3z_1$, $x_3y_4z_2$, $x_4y_3z_3$, $x_4y_4z_4$, which is perhaps easier to take in if one just gives the pattern of the indices: 111, 122, 213, 224, 331, 342, 433, 444. Thus, it has twelve vertices and eight faces, as pictured in Figure 1.
It can also be viewed geometrically, as its name suggests. Given a cube, one can use it to define a linear hypergraph H as follows: the vertices of H are the midpoints of the edges of the cube, and for each vertex v of the cube, we obtain a face of H by taking the triangle whose vertices are the midpoints of the three edges of the cube that are incident to v. This gives us eight triangles, and each vertex of each triangle is a vertex of exactly one other triangle. These eight triangles are the triangular faces of the polyhedron obtained by taking the convex hull of the midpoints of the edges of the cube, which is a cuboctahedron.
The number of labelled cuboctahedra contained in a linear hypergraph turns out to give a useful notion of quasirandomness and can be used to define a "cuboctahedral norm": we shall not give the formula here, but it is the obvious modification of the "octahedral norm" above. Writing $\|f\|_{\mathrm{cuo}}$ for the cuboctahedral norm of f, one has the useful property that if $u : X \to \mathbb{C}$, $v : Y \to \mathbb{C}$ and $w : Z \to \mathbb{C}$, then
$$|\mathbb{E}_{x,y,z} f(x,y,z)u(x)v(y)w(z)| \le \|f\|_{\mathrm{cuo}}\|u\|_2\|v\|_2\|w\|_2.$$
Thus, a function with small cuboctahedral norm does not correlate with products of bounded functions of one variable. This is a special case of a result of Conlon, Hàn, Person and Schacht [3], who worked out a general theory of weak quasirandomness for k-uniform hypergraphs. (The word "weak" is used because knowing about correlations with products of one-variable functions is considerably weaker than knowing about correlations with products of (k − 1)-variable functions.) However, the above inequality concerns weak quasirandomness for dense 3-uniform hypergraphs, whereas we wanted a quasirandomness definition for linear hypergraphs, which are very far from dense. And interesting difficulties arise when one tries to use the cuboctahedral norm for linear hypergraphs.
Consider first what we might expect of a random linear hypergraph of near-maximal density. (This part of the discussion will not be rigorous, because it is not easy to provide and then analyse a suitable probabilistic model.) Assuming that the vertex sets all have size n, the trivial upper bound for the number of faces is $n^2$, since each pair (x, y) ∈ X × Y is contained in at most one face, so the density condition is that there are $cn^2$ faces for some positive constant c. How many cuboctahedra would we expect such a hypergraph to contain? There are $n^{12}$ choices for the vertices, and each choice defines for us eight faces, which each have a probability of order $cn^{-1}$ of belonging to the hypergraph, so we would expect the number of cuboctahedra to be about $c^8n^4$.
For a theory of quasirandomness to get off the ground, one usually needs a norm whose value for random dense structures is within a constant of a trivial maximum value. So one hopes at this point that a linear hypergraph cannot contain more than $n^4$ cuboctahedra.
However, this hope is extremely far from being realized: the trivial maximum is not $n^4$ but $n^5$. To see this, let G be any finite group and define a linear hypergraph H by taking all triples $(a, b, c) \in G^3$ such that abc = e. Now let us calculate the number of solutions to the equations
$$a_1b_1c_1 = a_1b_2c_2 = a_2b_1c_3 = a_2b_2c_4 = a_3b_3c_1 = a_3b_4c_2 = a_4b_3c_3 = a_4b_4c_4 = e.$$
Once we have chosen the $a_i$ and the $b_i$, the $c_i$ are determined, so the above task is equivalent to that of counting solutions to the four equations
$$a_1b_1 = a_3b_3,\quad a_1b_2 = a_3b_4,\quad a_2b_1 = a_4b_3,\quad a_2b_2 = a_4b_4,$$
which we can rewrite as
$$a_3^{-1}a_1 = b_3b_1^{-1},\quad a_3^{-1}a_1 = b_4b_2^{-1},\quad a_4^{-1}a_2 = b_3b_1^{-1},\quad a_4^{-1}a_2 = b_4b_2^{-1}.$$
By transitivity of equality, the last equation here is redundant, and the first three are easily seen to have $n^5$ solutions. Indeed, we can choose $a_2$, $a_4$ and $b_1$ as we like. They determine $b_3$. Then we can choose $a_1$, which determines $a_3$, and finally we can choose $b_2$, which determines $b_4$. The total number of free choices was 5, hence the answer of $n^5$.
It is also easy to see that this is the maximum possible: given a linear hypergraph H, if we imagine trying to build a cuboctahedron face by face, we have at most $n^2$ choices for the first face, at most n choices for each of its three neighbouring faces (since we have chosen one vertex for each), at most one choice for each of the faces at distance 2 (since after the choices of the neighbouring faces we have chosen two vertices for each one) and at most one choice for the face opposite. Note that for the opposite face we have chosen all three vertices, so we depend on a "miracle" for it to belong to H. It is precisely this miracle that is guaranteed to occur when H is constructed in the group-theoretic way just described. (It manifested itself in the redundant equation that appeared when we counted the solutions to the equations above.)
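The count of $n^5$ can be checked by brute force for small groups. The sketch below is only an illustration: the helper names `compose` and `count_group_cuboctahedra` are hypothetical, and the groups tested (the symmetric group on three letters and the integers modulo 4) are our own choice. It counts the tuples $(a_1,\dots,a_4,b_1,\dots,b_4)$ satisfying the four equations $a_1b_1=a_3b_3$, $a_1b_2=a_3b_4$, $a_2b_1=a_4b_3$, $a_2b_2=a_4b_4$.

```python
from itertools import permutations, product

def compose(p, q):
    """Product pq of two permutations stored as tuples (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def count_group_cuboctahedra(elements, mul):
    """Count solutions of a1 b1 = a3 b3, a1 b2 = a3 b4, a2 b1 = a4 b3, a2 b2 = a4 b4."""
    total = 0
    for a1, a2, a3, a4 in product(elements, repeat=4):
        # The pairs (b1, b3) and (b2, b4) are subject to the same two constraints,
        # so the number of completions is the square of the number of valid pairs.
        pairs = sum(
            1
            for b, b_prime in product(elements, repeat=2)
            if mul(a1, b) == mul(a3, b_prime) and mul(a2, b) == mul(a4, b_prime)
        )
        total += pairs * pairs
    return total

s3 = list(permutations(range(3)))  # a non-Abelian group of order 6
z4 = list(range(4))                # the cyclic group of order 4

print(count_group_cuboctahedra(s3, compose), 6 ** 5)
print(count_group_cuboctahedra(z4, lambda a, b: (a + b) % 4), 4 ** 5)
```

In both cases the two printed numbers agree, as the argument above predicts.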

Latin squares and the quadrangle condition
It turns out that the converse is true as well. This is a well-known result (originally due to Brandt [1]) in the theory of Latin squares (see, for example, [4]). Since the proof is short and easy, we shall give it in full. But first let us make explicit a connection between linear hypergraphs and labelled subsets of grids that we shall use throughout this paper. The connection is simply that if we have a linear hypergraph with vertex sets X, Y and Z, we can regard it as a labelling of some of the grid X × Y with points of Z: for each (x, y) ∈ X × Y, if there exists z with (x, y, z) ∈ H (which will necessarily be unique), then we label the point (x, y) with z, and otherwise we leave it unlabelled. If we label every point of the grid and X, Y and Z have the same size n, then we obtain an n × n Latin square, since if the same label z appeared more than once in a column, that would imply that the pair (y, z) was part of more than one face of H, and similarly for rows. If we do not label every point of the grid (because the hypergraph has fewer than $n^2$ faces) we call the resulting partial labelling a partial Latin square. Going in the other direction, if we have a labelling of part of X × Y with elements of a set Z, and if no label appears more than once in any row or column, then we can construct a linear hypergraph by taking its faces to be all triples (x, y, z) ∈ X × Y × Z such that (x, y) is labelled with z.
Let us call the labelled grid corresponding to a linear hypergraph its grid representation. The faces of the hypergraph are represented by labelled points of the grid, the three vertex classes are represented by the sets of rows, columns, and labels, and two faces are joined by a vertex if they share a row, column, or label. Let us call four points in the grid a rectangle if they are of the form $(x, y)$, $(x, y')$, $(x', y)$ and $(x', y')$. Then a cuboctahedron in the grid representation is a pair of rectangles with the same labelling. Here is an example.

[Diagram omitted: two rectangles in the grid carrying the same four labels in corresponding positions.]
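The description of a cuboctahedron as a pair of identically labelled rectangles suggests a direct way of counting cuboctahedra in a partially labelled grid. The following Python sketch is one way to do it, under the assumption that the labelling is stored as a dictionary from grid points to labels; the function name `count_cuboctahedra` is ours, and rectangles are taken to be ordered and possibly degenerate, as in the labelled counts discussed earlier.

```python
from collections import Counter
from itertools import product

def count_cuboctahedra(labelling):
    """Count labelled cuboctahedra (degenerate ones included) in a partial labelling.

    `labelling` maps grid points (x, y) to labels. A rectangle is an ordered
    quadruple (x, x2, y, y2) whose four corners are all labelled, and a
    cuboctahedron is an ordered pair of rectangles with equal labels at
    corresponding corners.
    """
    rows = sorted({x for x, _ in labelling})
    cols = sorted({y for _, y in labelling})
    patterns = Counter()
    for x, x2, y, y2 in product(rows, rows, cols, cols):
        corners = [(x, y), (x, y2), (x2, y), (x2, y2)]
        if all(p in labelling for p in corners):
            patterns[tuple(labelling[p] for p in corners)] += 1
    # Each label pattern occurring m times gives m * m ordered pairs of rectangles.
    return sum(m * m for m in patterns.values())

def cyclic_table(n):
    """Multiplication table of Z_n as a labelling of the full n x n grid."""
    return {(a, b): (a + b) % n for a in range(n) for b in range(n)}

print(count_cuboctahedra(cyclic_table(5)), 5 ** 5)
```

Grouping rectangles by their label pattern avoids comparing all pairs of rectangles, so the cost is of order $n^4$ rather than $n^8$ for a full grid.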
A Latin square is said to satisfy the quadrangle condition if whenever it contains seven faces of a cuboctahedron it automatically contains the eighth (the "miracle" referred to earlier). That is, if it ever contains two rectangles whose labels agree at three corresponding corners, then the labels at the fourth corners agree as well.

[Diagram omitted: two rectangles whose labellings agree at three corners.]

We are now ready to give the proof of the converse to the earlier observation about cuboctahedra.

Proposition 1.1. Every Latin square that satisfies the quadrangle condition is, up to isomorphism, the multiplication table of a group.

Proof. Choose an arbitrary row R and column C and define a binary operation • on the set of labels as follows. Given labels a and b, find where a appears in row R and where b appears in column C, and then let a • b = c, where c is the label of the point in the same column as a and the same row as b. The label of the point where R and C intersect is then an identity for •, and the Latin square condition implies that every element has both a left and a right inverse. It remains to check associativity. To do this, consider the following picture, which is of a portion of the Latin square, in which f and h denote the two bracketings of the product of a, b and c, and e denotes the identity.

[Diagram omitted: a portion of the Latin square showing a, b, c, their products, and the identity e.]

For associativity we need f to equal h. But this follows from the quadrangle condition, applied to two rectangles contained in the above diagram whose labels agree at three corners and whose fourth corners are labelled f and h. It is well known and easy to show that if a set has an associative binary operation with an identity such that every element has a left and a right inverse, then it is a group, so we are done.
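The construction in the proof is easy to carry out mechanically. The sketch below is an illustration only: the function names are hypothetical, labels are assumed to be the integers 0, ..., n − 1, and the two test squares are our own. It builds the operation • from a Latin square and a chosen row R and column C, and then checks the group axioms directly.

```python
from itertools import product

def operation_from_latin_square(square, R=0, C=0):
    """Return (op, identity), where op[a, b] is the label in the column where a
    appears in row R and in the row where b appears in column C."""
    n = len(square)
    col_of = {square[R][j]: j for j in range(n)}  # column of each label in row R
    row_of = {square[i][C]: i for i in range(n)}  # row of each label in column C
    op = {(a, b): square[row_of[b]][col_of[a]]
          for a, b in product(range(n), repeat=2)}
    return op, square[R][C]

def is_group(op, identity, n):
    """Check identity, associativity and two-sided inverses for op on range(n)."""
    elems = range(n)
    has_identity = all(op[identity, a] == a == op[a, identity] for a in elems)
    associative = all(op[a, op[b, c]] == op[op[a, b], c]
                      for a, b, c in product(elems, repeat=3))
    has_inverses = all(
        any(op[a, b] == identity for b in elems)
        and any(op[b, a] == identity for b in elems)
        for a in elems
    )
    return has_identity and associative and has_inverses

# A relabelled copy of the table of Z_5, which satisfies the quadrangle condition.
group_like = [[(2 * i + 3 * j + 1) % 5 for j in range(5)] for i in range(5)]

# A Latin square of order 5 containing a 2 x 2 subsquare, so not a group table.
non_group = [
    [0, 1, 2, 3, 4],
    [1, 0, 3, 4, 2],
    [2, 3, 4, 0, 1],
    [3, 4, 1, 2, 0],
    [4, 2, 0, 1, 3],
]

op, e = operation_from_latin_square(group_like, R=1, C=2)
print(is_group(op, e, 5))
op2, e2 = operation_from_latin_square(non_group)
print(is_group(op2, e2, 5))
```

For the second square the derived operation still has an identity and inverses, but associativity fails, in line with the role the quadrangle condition plays in the proof.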
A notable feature of the above proof is the arbitrary choice of an identity. As the proof shows, one cannot tell just from the multiplication table of a group what the identity element of the group is. If one wishes to avoid a noncanonical choice, then one needs to change category to one where the objects are "groups that do not know which element is the identity". One can do this by defining a ternary operation, which we can write as $(a, b, c) \mapsto ab^{-1}c$, which satisfies the axioms $ab^{-1}b = a$, $aa^{-1}b = b$, and $ab^{-1}(cd^{-1}e) = (ab^{-1}c)d^{-1}e$. The resulting algebraic structure is known as a torsor. From a torsor one can create a group by choosing an arbitrary element x and defining a binary operation $a * b = ax^{-1}b$. Then x is the identity, associativity follows from the torsor associativity, and inverses are guaranteed by the Latin square condition. The relationship between groups and torsors is closely analogous to the relationship between vector spaces and affine spaces, and the ternary map is also closely analogous to the (partially defined) map $(a, b, c) \mapsto a - b + c$ that often appears in additive combinatorics when one has a set A with additive structure that is not "centred on zero".
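The torsor axioms and the passage from a torsor back to a group can be checked on a small example. The sketch below uses the symmetric group on three letters, with permutations stored as tuples; the helper names `compose`, `inverse`, `ternary` and `star` are hypothetical and chosen only for this illustration.

```python
from itertools import permutations, product

def compose(p, q):
    """Product pq of two permutations stored as tuples (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Inverse of a permutation stored as a tuple."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def ternary(a, b, c):
    """The torsor operation (a, b, c) -> a b^{-1} c."""
    return compose(compose(a, inverse(b)), c)

G = list(permutations(range(3)))

# The three torsor axioms from the text.
axioms_hold = (
    all(ternary(a, b, b) == a and ternary(a, a, b) == b
        for a, b in product(G, repeat=2))
    and all(ternary(a, b, ternary(c, d, e)) == ternary(ternary(a, b, c), d, e)
            for a, b, c, d, e in product(G, repeat=5))
)

def star(x):
    """Binary operation a * b = a x^{-1} b obtained by choosing the base point x."""
    return lambda a, b: ternary(a, x, b)

def base_point_gives_group(x):
    """Check that x is an identity for star(x) and that star(x) is associative."""
    op = star(x)
    identity_ok = all(op(x, a) == a == op(a, x) for a in G)
    associative = all(op(a, op(b, c)) == op(op(a, b), c)
                      for a, b, c in product(G, repeat=3))
    return identity_ok and associative

print(axioms_hold, all(base_point_gives_group(x) for x in G))
```

Every choice of base point x yields a group structure with identity x, which is the sense in which the torsor "does not know" which element is the identity.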
In the light of these observations, it is natural to wonder what happens if we look at linear hypergraphs where the number of cuboctahedra is close to the maximum of $n^5$ but not necessarily equal to it. Must such hypergraphs be "group-like" (or "torsor-like") in some way?
This question is very closely related to the question about binary operations with many associative triples. One can show that the multiplication table of such a binary operation contains many cuboctahedra, so any positive result one can prove about Latin squares with many cuboctahedra implies a corresponding result about binary operations with many associative triples. By the same token, any negative result about binary operations with many associative triples implies a corresponding negative result about Latin squares with many cuboctahedra.
In hypergraph terms, the condition that there are many cuboctahedra says the following. Let us define a potential cuboctahedron to be a linear hypergraph with eight faces and thirteen vertices $v_1, \dots, v_{12}, v'_{12}$ that are all distinct except that $v_{12}$ may possibly equal $v'_{12}$, such that if the vertices $v_{12}$ and $v'_{12}$ are identified then one obtains a cuboctahedron. If $v_{12} = v'_{12}$, then we call the potential cuboctahedron actual, and otherwise we call it flappy. In Figure 2 we show a flappy cuboctahedron next to an actual cuboctahedron. Geometrically, a flappy cuboctahedron is obtained from a cuboctahedron by pulling apart two faces that are joined at a vertex, so that those two faces become "flaps". In hypergraph terms, the quadrangle condition is the condition that a linear hypergraph does not contain any flappy cuboctahedra.
Let us say that a labelling of some subset A of the grid $[n]^2$ satisfies the label quadrangle condition if whenever A contains two rectangles that are labelled in the same way for three of their four corners, then they are labelled in the same way for the fourth as well. That is, it cannot contain two rectangles with labels a, b, c, d and a, b, c, d′ at corresponding corners, where d ≠ d′. The row and column quadrangle conditions are defined in the same way, with the roles of labels and rows (respectively, labels and columns) interchanged. For instance, the row quadrangle condition says that A cannot contain a configuration of the form

[Diagram omitted.]

with the lower c and d in different rows.
If A satisfies the label, column and row quadrangle conditions, then we simply say that A satisfies the quadrangle condition. Note that in the case of Latin squares, where $A = [n]^2$ and the set of labels is [n], the three quadrangle conditions are equivalent, but they become distinct when A is a proper subset of $[n]^2$ and/or the number of labels is greater than n.
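The label quadrangle condition can be tested mechanically by recording, for each triple of labels seen at the first three corners of a rectangle, the label found at the fourth corner. The following Python sketch does this for a labelling stored as a dictionary from grid points to labels. The function name and the two test squares are our own, and only the label version of the condition is checked.

```python
from itertools import product

def find_label_quadrangle_violation(labelling):
    """Return a pair of rectangles (x, x2, y, y2) whose labels agree at three
    corresponding corners but differ at the fourth, or None if there is none."""
    rows = sorted({x for x, _ in labelling})
    cols = sorted({y for _, y in labelling})
    seen = {}  # labels at the first three corners -> (rectangle, fourth label)
    for x, x2, y, y2 in product(rows, rows, cols, cols):
        corners = [(x, y), (x, y2), (x2, y), (x2, y2)]
        if not all(p in labelling for p in corners):
            continue
        *three, last = (labelling[p] for p in corners)
        key = tuple(three)
        if key in seen and seen[key][1] != last:
            return seen[key][0], (x, x2, y, y2)
        seen.setdefault(key, ((x, x2, y, y2), last))
    return None

def as_labelling(square):
    """Convert a list-of-lists square into a dictionary labelling of the grid."""
    return {(i, j): square[i][j]
            for i in range(len(square)) for j in range(len(square[i]))}

z5 = [[(i + j) % 5 for j in range(5)] for i in range(5)]
non_group = [
    [0, 1, 2, 3, 4],
    [1, 0, 3, 4, 2],
    [2, 3, 4, 0, 1],
    [3, 4, 1, 2, 0],
    [4, 2, 0, 1, 3],
]

print(find_label_quadrangle_violation(as_labelling(z5)))
print(find_label_quadrangle_violation(as_labelling(non_group)))
```

The first call finds no violation, while the second returns two rectangles that certify a failure of the condition.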
Recall that an n × n partial Latin square is a labelling of a subset of an n × n grid with labels from [n] such that no label is repeated in any row or column. We will often conflate the partial Latin square with the corresponding linear hypergraph when the meaning is clear (so, for instance, we may say that a partial Latin square contains a cuboctahedron, even though the latter is strictly speaking a hypergraph).
A special case of one of our main theorems is the following statement.
Theorem 1.2. There exists an absolute constant C such that for every $\epsilon > 0$, if A is an n × n partial Latin square containing at least $\epsilon n^5$ cuboctahedra, then A contains a subset $A'$ of density at least $\epsilon^C$ such that the restriction of the labelling to $A'$ satisfies the label quadrangle condition.
By interchanging the roles of labels, rows and columns, one can show easily that if one replaces the label quadrangle condition with the row or column quadrangle condition in the theorem above, then one obtains an equivalent theorem. Therefore, applying the above theorem three times, once each for labels, rows and columns, we deduce the following corollary, which is still only a special case of our main theorem.
Corollary 1.3. There exists an absolute constant C such that for every $\epsilon > 0$, if A is an n × n partial Latin square containing at least $\epsilon n^5$ cuboctahedra, then A contains a subset $A'$ of density at least $\epsilon^C$ such that the restriction of the labelling to $A'$ satisfies the quadrangle condition.
An equivalent statement, expressed in terms of linear hypergraphs, is the following.
Corollary 1.4. There exists an absolute constant C such that for every $\epsilon > 0$ the following statement holds. Let X, Y and Z be three sets of size n and let H be a linear hypergraph with vertex sets X, Y and Z. Suppose that H contains at least $\epsilon n^5$ cuboctahedra. Then there is a subhypergraph of H with at least $\epsilon^C n^2$ faces such that every potential cuboctahedron is actual (that is, the subhypergraph contains no flappy cuboctahedra).

The relationship between cuboctahedra and partial associativity
In this short section, we show that the multiplication table of a partially defined binary operation with many associative triples contains many cuboctahedra.
Suppose, then, that we have a partial binary operation • on a set X of size n such that the maps b → a • b and b → b • a are injections for every a ∈ X and such that a • (b • c) = (a • b) • c for at least $\epsilon n^3$ triples $(a, b, c) \in X^3$. For each b ∈ X, let $W_b$ be the set of pairs $(a, c) \in X^2$ such that the triple (a, b, c) is associative. Then the average size of $W_b$ is at least $\epsilon n^2$. Writing $\epsilon_b$ for the density of $W_b$ in $X^2$, the box-norm inequality tells us that $W_b$ contains at least $\epsilon_b^4 n^4$ quadruples $(a_0, a_1, c_0, c_1)$ such that all four points $(a_i, c_j)$ belong to $W_b$. Therefore, by Jensen's inequality, the average number of such quadruples in $W_b$ is at least $\epsilon^4 n^4$. Each such quadruple yields a diagram of the following form,

[Diagram omitted: a portion of the multiplication table containing the products formed from $a_0$, $a_1$, b, $c_0$ and $c_1$.]

where the left column and bottom row say which elements are being multiplied together. The associativity of the triples $(a_i, b, c_j)$ implies that $a_i \bullet (b \bullet c_j) = d_i \bullet c_j$, where $d_i = a_i \bullet b$, and the result is that each quadruple of triples gives us a (grid representation of a) cuboctahedron. Note that from the cuboctahedron we can reconstruct the pairs $(a_0, d_0)$ and $(a_1, d_1)$ by looking at which columns are used, and since $b = a_0^{-1}d_0 = a_1^{-1}d_1$, it follows that we can reconstruct b. Therefore, distinct b give rise to distinct cuboctahedra, and putting all this together implies that there are at least $\epsilon^4 n^5$ cuboctahedra.
This observation justifies the formulation of our positive results (such as Corollary 1.4 and the theorems in the next sections), and from now on we shall talk almost entirely about cuboctahedra rather than about associative triples.

Short proofs of inconsistency
In this section we shall provide some more details about the precise statement of our main result, and the connection with a formulation in terms of linear hypergraphs.
Even though Proposition 1.1 shows that the quadrangle condition implies the associativity axiom in a Latin square, giving us a group, the statement cannot be strengthened to partial Latin squares. If a partial Latin square A satisfies the quadrangle condition, it is not necessarily the case that A embeds into a group multiplication table -we shall see examples later that demonstrate this fact.
Conversely, of course, the existence of a single flappy cuboctahedron will prove that A cannot embed into a group multiplication table. One way to think about this is to explicitly convert the flappy cuboctahedron into a proof of inconsistency in the following way. Suppose that the flappy cuboctahedron violates the label quadrangle condition and therefore consists (in the grid representation) of two rectangles, the first occupying rows $x_1$ and $x_2$ and columns $y_1$ and $y_2$ with labels $z_1$, $z_2$, $z_3$ and $z_4$, and the second occupying rows $x_3$ and $x_4$ and columns $y_3$ and $y_4$ with labels $z_1$, $z_2$, $z_3$ and $z_5$. The presence of these labels in these positions gives us a collection of relations
$$x_1y_1z_1^{-1},\quad x_1y_2z_2^{-1},\quad x_2y_1z_3^{-1},\quad x_2y_2z_4^{-1},\quad x_3y_3z_1^{-1},\quad x_3y_4z_2^{-1},\quad x_4y_3z_3^{-1},\quad x_4y_4z_5^{-1}.$$
We can combine these to form the following relations
$$z_3z_1^{-1}z_2z_4^{-1},\qquad z_3z_1^{-1}z_2z_5^{-1},$$
which in turn combine to give the relation
$$z_5z_2^{-1}z_1z_3^{-1}z_3z_1^{-1}z_2z_4^{-1},$$
which reduces (by cancelling inverse pairs) to the relation $z_5z_4^{-1}$. But this proves that the two distinct elements $z_4$ and $z_5$ are in fact equal, which is a contradiction.
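The cancellations in this argument can be verified symbolically. The sketch below is a minimal illustration, not a general tool: words in the free group are stored as lists of (generator, exponent) pairs with exponents ±1, and the function names are hypothetical. It assumes the labels sit in the positions $x_1y_1 \to z_1$, $x_1y_2 \to z_2$, $x_2y_1 \to z_3$, $x_2y_2 \to z_4$ for the first rectangle, and similarly for the second with $z_5$ in place of $z_4$.

```python
def reduce_word(word):
    """Freely reduce a word by cancelling adjacent inverse pairs."""
    stack = []
    for gen, exp in word:
        if stack and stack[-1] == (gen, -exp):
            stack.pop()
        else:
            stack.append((gen, exp))
    return stack

def inverse(word):
    """Inverse of a word: reverse the letters and negate the exponents."""
    return [(g, -e) for g, e in reversed(word)]

def substitute(word, values):
    """Replace each generator g in `values` by the word values[g] (or its inverse)."""
    out = []
    for g, e in word:
        if g in values:
            out.extend(values[g] if e == 1 else inverse(values[g]))
        else:
            out.append((g, e))
    return out

def w(*letters):
    """Build a word from strings such as 'z3' or 'z1-' (trailing '-' means inverse)."""
    return [(s[:-1], -1) if s.endswith('-') else (s, 1) for s in letters]

# Labels of the two rectangles, expressed through the relations x_i y_j = z_k.
first = {'z1': w('x1', 'y1'), 'z2': w('x1', 'y2'), 'z3': w('x2', 'y1'), 'z4': w('x2', 'y2')}
second = {'z1': w('x3', 'y3'), 'z2': w('x3', 'y4'), 'z3': w('x4', 'y3'), 'z5': w('x4', 'y4')}

w1 = w('z3', 'z1-', 'z2', 'z4-')  # consequence of the first rectangle
w2 = w('z3', 'z1-', 'z2', 'z5-')  # consequence of the second rectangle

print(reduce_word(substitute(w1, first)))   # trivial word
print(reduce_word(substitute(w2, second)))  # trivial word
print(reduce_word(inverse(w2) + w1))        # the word z5 z4^{-1}
```

The first two reductions confirm that each four-letter word is a consequence of the four relations coming from one rectangle, and the third carries out the final cancellation.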
In general, the idea is as follows. We give names $x_i$ to the rows of the grid representation and names $y_i$ to the columns. We then consider the group with generators given by the set of rows $x_1, \dots, x_{n_1}$ combined with the set of columns $y_1, \dots, y_{n_2}$ and the set of labels $z_1, \dots, z_{n_3}$. For each element of A, corresponding, say, to label $z_k$ in row $x_i$ and column $y_j$, we have a relation $x_iy_jz_k^{-1}$. Denote the group with this presentation by $G_A$. We can embed A into the multiplication table of a group if and only if we can embed it into the multiplication table of $G_A$, which in turn is possible if and only if there is no pair of distinct generators that can be proved to be equal using the relations in $G_A$. We call such a proof, if it does exist, a proof of inconsistency. Since a flappy cuboctahedron provides a proof of inconsistency, the presence of a flappy cuboctahedron is a sufficient condition for showing that A cannot be embedded into a group multiplication table.
However, it is not necessary, and in general a proof that two generators are equal can be much more involved. Let us define the length of a proof of inconsistency to be the number of relations of $G_A$ used in the proof (with multiplicity). Thus the length of the proof of inconsistency provided by a flappy cuboctahedron is 8. As we shall see, there are natural examples of partially labelled grids that cannot be embedded into group multiplication tables, for which the length k of the shortest proof of inconsistency is arbitrarily large (where the density of the labelled points depends only on k).
At this point we note that there is a well-established framework for discussing proofs of equality in a group with given presentation, namely van Kampen diagrams. We shall discuss these at some length in Section 2.2, so we do not include the details here. It suffices at this point to say that such diagrams come with a notion of boundary word, which, in a proof of inconsistency above, will be a word of the form $uv^{-1}$ for distinct generators u and v. They also come with a notion of area, which corresponds precisely to our definition of the length of the inconsistency proof.
Later we shall see other examples of proofs of inconsistency. Given such a proof we may form the hypergraph corresponding to the proof (in the same way that the flappy cuboctahedron corresponded to the proof above) by translating the relations used in the proof into faces of the hypergraph representation of A. We may form the corresponding van Kampen diagrams also; these will correspond to a certain kind of dual of the hypergraphs. We will see that the hypergraphs given by proofs of inconsistency have a very particular structure, which we shall call flappy spherical hypergraphs -see Section 2.2 for a continuation of this discussion.
Since the number of relations used in a proof determines its length, the number of faces of the corresponding hypergraph (assuming non-degeneracy) also determines the length. This means that for A to admit no short proof of inconsistency, it is sufficient for A to contain no small flappy spherical hypergraphs.
We may thus frame our main result as a generalization of Corollary 1.4. This corollary allowed us to pass to a dense subgraph in order to eliminate all flappy cuboctahedra, while our main positive result states that we may pass to a dense subgraph in order to eliminate all flappy spherical hypergraphs below a given, bounded size.
Theorem 1.5. Let b be a fixed positive integer. Then there exists C = C(b) such that for every $\epsilon > 0$ the following statement holds. Let X be a set of size n, let $A \subset X^2$, and let φ : A → X be a function that is injective in each variable separately, such that the graph of φ, when viewed as a partial labelling of $X^2$, contains at least $\epsilon n^5$ cuboctahedra. Then A has a subset $A'$ of size at least $\epsilon^C n^2$ such that the restriction of the labelling to $A'$ admits no proof of inconsistency of length less than or equal to b.

Rough approximate groups
In this section we describe a class of examples that satisfy the conclusion of Theorem 1.5. As a warm-up, we present the example that has many associative triples, but that we conjecture has no dense subset that is isomorphic to part of the multiplication table of a group. Let Γ be a maximal δ-separated subset of SO(3) (in some sensible translation-invariant metric; it is not too important which), let θ > 0 be a small real number, and define a partial multiplication • : Γ × Γ → Γ by setting x • y = z if d(xy, z) ≤ θδ. It can be shown that this operation is defined a positive proportion of the time; we give the details in Appendix C. To see that this operation is associative when all relevant products are defined, note that
$$d(x(yz), x\bullet(y\bullet z)) \le d(x(yz), x(y\bullet z)) + d(x(y\bullet z), x\bullet(y\bullet z)) \le 2\theta\delta,$$
and a similar argument shows that $d((xy)z, (x\bullet y)\bullet z) \le 2\theta\delta$. Hence $d(x\bullet(y\bullet z), (x\bullet y)\bullet z) \le 4\theta\delta$, and since Γ is δ-separated the two products are equal as long as θ < 1/4. It is straightforward to generalize this argument to show that if θ is sufficiently small, then the operation • is associative for longer products, again when all subproducts are defined with all possible bracketings. We shall prove a more general fact later in the section.
Conjecture 1.6. No dense subset of the multiplication table of the partial binary operation just defined can be embedded into the multiplication table of a group.
It is not too hard to prove that the full multiplication table cannot be embedded into the multiplication table of a group (though this is not a trivial statement) but proving it for all dense subsets seems to be rather more difficult. However, we are fairly confident that the conjecture is true. As we shall soon see, this would imply that our main result is in a certain sense the best one can hope for. First, we remark that our choice of the group SO(3) in the above example is not essential: what matters is that the group should be non-Abelian and not too high-dimensional. (The second condition is needed to ensure that balls of radius θδ are not too much smaller than balls of radius δ. Without that, there is no easy way to ensure that the partial binary operation is densely defined.)
Let G be a metric group, that is, a group with a translation-invariant metric. For later convenience, we allow our "metrics" to take the value ∞. We call a subset H ⊂ G a (K, ε)-rough approximate subgroup if $H = H^{-1}$ and there is a set E of size at most K such that $E = E^{-1}$ and $HH \subset (EH)_\epsilon$. Here, $A_\epsilon$ denotes the ε-expansion of A, meaning the set of all points at distance at most ε from some point of A. Note that the words "rough" and "approximate" refer to two different senses in which H is not exactly closed under multiplication: "rough" means that the closure property holds only after a small perturbation, and "approximate" means that $H^2$ is contained in the union of a small number of translates of H rather than in H itself. A (K, 0)-approximate subgroup is just an approximate subgroup in the usual sense. By a rough approximate group we mean a rough approximate subgroup of some metric group. (It is possible to define approximate groups intrinsically [2], and the same is doubtless true for rough approximate groups, but this turns out not to be needed for our purposes.) Now let us give a general recipe for constructing binary operations that yield no short proofs of inconsistency.
Proposition 1.7. Let G be a metric group and let X, Y be 1-separated subsets of G. Let 0 < ε < 1/12 and suppose that there exists a (K, ε)-rough approximate subgroup H of G of size at most $K|X|^{1/2}|Y|^{1/2}$ and sets U, V ⊂ G of size at most K such that $X \subset (UH)_\epsilon$ and $Y \subset (HV)_\epsilon$. Then there is a subset Z ⊂ G of size at most $K^4|X|^{1/2}|Y|^{1/2}$ such that if we define a partial binary operation • : X × Y → Z by setting x • y to be z if and only if d(xy, z) ≤ 3ε, then x • y is defined for at least $K^{-8}|X||Y|/8$ pairs (x, y) ∈ X × Y, and the shortest proof that the multiplication table of • does not embed into a group multiplication table has length at least $\epsilon^{-1}/12$.
Proof. Observe first that if B is a ball of diameter less than 1, then for each x ∈ X there is at most one y ∈ Y such that xy ∈ B. It follows that there are at most min{|X|, |Y|} pairs (x, y) such that xy ∈ B.
Let W be a set of size at most K such that H^2 ⊂ (WH)_ε. Then
XY ⊂ (UH)_ε (HV)_ε ⊂ (UH^2V)_{2ε} ⊂ (UWHV)_{3ε},
and UWHV has cardinality at most K^3|H| ≤ K^4|X|^{1/2}|Y|^{1/2}. Let the cardinality of UWHV be C|X|^{1/2}|Y|^{1/2}. For each z ∈ UWHV, let f(z) be the number of pairs (x, y) ∈ X × Y with d(xy, z) ≤ 3ε. Then the average value of f(z) is at least C^{-1}|X|^{1/2}|Y|^{1/2}. From the observation in the first paragraph and the assumption that ε < 1/6, we also have that the maximum value is at most min{|X|, |Y|} ≤ |X|^{1/2}|Y|^{1/2}. Therefore, there are at least |X|^{1/2}|Y|^{1/2}/2 values of z for which f(z) ≥ C^{-1}|X|^{1/2}|Y|^{1/2}/2 ≥ K^{-4}|X|^{1/2}|Y|^{1/2}/2.
Let S be the set of all z ∈ UWHV such that f(z) ≥ K^{-4}|X|^{1/2}|Y|^{1/2}/2 and let Z be a maximal 1/4-separated subset of S. If |Z| < K^{-4}|S|/2, then it follows that there exists z ∈ S such that the ball of radius 1/4 about z contains more than 2K^4 elements of S, and from this it follows that there are more than |X|^{1/2}|Y|^{1/2} ≥ min{|X|, |Y|} pairs (x, y) such that the product xy belongs to the ball of radius 1/4 + 3ε about z. Since ε < 1/12, this contradicts the observation in the first paragraph.
As in the statement of the proposition, we now define a partial binary operation • : X × Y → Z by setting x • y = z if d(xy, z) ≤ 3ε. Note that this operation is indeed defined for at least K^{-8}|X||Y|/8 pairs (x, y) ∈ X × Y. It remains to show it does not give rise to any proof of inconsistency of length less than ε^{-1}/12. This is almost immediate. Suppose, for instance, that there is a proof of length k that x_1 = x_2, where x_1 and x_2 are distinct elements of X. Such a proof is a sequence of words w_0, ..., w_k where w_0 = x_1, w_k = x_2, and each word w_i is obtained from w_{i−1} by inserting or removing an inverse pair or a relation of the form xyz^{-1}, where x • y = z. Each time an inverse pair is inserted or removed, the corresponding product in G is unaltered, but each time a relation xyz^{-1} is inserted or removed, the corresponding product in G can move by up to 3ε. Therefore, since X is a 1-separated set, the proof cannot have length less than ε^{-1}/3. The same applies to two elements of Y. Since Z is only 1/4-separated rather than 1-separated, the bound for Z is ε^{-1}/12.
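The operation in Proposition 1.7 can be tried out on a toy case. The sketch below is only an illustration and not the SO(3) example: it works in the Abelian metric group (Q, +) with d(a, b) = |a − b|, and the sets X, Y, Z, the perturbation pattern and the helper name rough_product_table are all assumptions made for the illustration.

```python
from fractions import Fraction


def rough_product_table(X, Y, Z, eps):
    """Partial operation on the additive group of rationals.

    x . y = z exactly when |x + y - z| <= 3 * eps. If Z is separated by
    more than 6 * eps, at most one z can qualify for each pair (x, y).
    """
    table = {}
    for x in X:
        for y in Y:
            hits = [z for z in Z if abs(x + y - z) <= 3 * eps]
            if len(hits) == 1:
                table[(x, y)] = hits[0]
    return table


eps = Fraction(1, 20)  # any 0 < eps < 1/12 would do here
# Hypothetical data: points of the subgroup 2Z, each moved by -eps, 0 or +eps.
X = [Fraction(2 * i) + eps * (i % 3 - 1) for i in range(6)]
Y = [Fraction(2 * j) + eps * ((j + 1) % 3 - 1) for j in range(6)]
Z = [Fraction(2 * m) for m in range(11)]

table = rough_product_table(X, Y, Z, eps)
```

In this toy case every pair is defined, because 2Z is an exact subgroup and each point is perturbed by at most ε; in the setting of the proposition one should only expect a positive proportion of pairs to be defined.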
In the first two appendixes to this paper, we prove a converse to this result. Given a metric group G and a partial Latin square A, we say that A is δ-approximately isomorphic to a subset of the multiplication table of G if there are embeddings φ_1, φ_2, φ_3 from the rows, columns and labels of A into G such that for any point (x, y, z) ∈ A we have d(φ_1(x)φ_2(y), φ_3(z)) < δ.
Loosely speaking, Theorem B.10 shows that given a dense partial Latin square A with no short proofs of inconsistency, there exists a rough approximate subgroup H of a metric group G and a dense subset B ⊂ A such that B is approximately isomorphic to the multiplication table of H (inside the multiplication table of G).
This converse shows that all examples of partial binary operations with no short proof of inconsistency are small perturbations of the kind of example given by Proposition 1.7. Since those examples remain examples after a small perturbation, we have a characterization of partial binary operations with no short proof of inconsistency, and hence a characterization of partial binary operations that satisfy the one percent quadrangle condition.

The main result: preliminaries
Across the next three sections we shall give a proof of Theorem 1.5. We begin with a discussion of some terminology and definitions.

Hypergraphs and labelled grids
In much of what follows, we shall be considering a dense subset A of an n × n Latin square. As has already been stated, this can also be viewed as a tripartite, linear, 3-uniform hypergraph with αn^2 faces (where n is the size of each vertex class). We shall make use of both ways of thinking about A, and sometimes switch between the two. The terminology used will make clear at each stage which setting we are using: in the hypergraph context we will refer to faces (or edges) of A, which means the same as points of A in the labelled grid setting.
It is, however, important to note that there are several different labelled grid representations of the same linear hypergraph: we are free to choose any of the three vertex classes to correspond to 'labels' in the grid formulation, with the other two classes corresponding to rows and columns. Thus the hypergraph is a more natural and symmetric way of thinking about A, but some of the technical details are easier to handle using the labelled-grid framework.

Cycles and 2k-PFs
An object that will be extremely important to us is one that naturally extends the notion of a rectangle in the grid (as seen, for instance, in the quadrangle condition). A rectangle consists of four points in a row-column cycle: in other words arranged so that the first pair share a row, the next share a column, etc., with no restriction on the four labels. With that in mind, we make the following definition.
Definition 2.1. A 2r-cycle in a partially labelled grid consists of 2r points forming a row-column cycle. In other words, we have 2r points that alternate between sharing rows and columns, with no restriction on the labels.
If we disregard the labelling and think of the resulting subset of the grid as a bipartite adjacency matrix, then the above definition is just the usual definition of a 2r-cycle in the corresponding bipartite graph.
If we look at 2r-cycles in the setting of tripartite 3-uniform hypergraphs, we are taking two of the vertex sets, X and Y , say, and forming 2r-cycles of faces, where two faces are joined if they share a vertex either in X or in Y . We shall give a separate name to these hypergraphs.

Definition 2.2. Let H be a tripartite 3-uniform hypergraph. A 2r-petalled flower, or 2r-PF, is a cycle of 2r faces such that each face shares one vertex with the next face and a different vertex with the previous face, and such that the third vertex always comes from the same vertex class. We refer to the 2r vertices of degree 2 in the 2r-PF as the inner vertices and to the 2r vertices of degree 1 as petals. We shall sometimes refer to PFs when the number of faces is not to be specified.
Observe that a 2r-cycle in the labelled grid corresponds to a 2r-PF in which the petals come from the class corresponding to the label coordinate.

Spherical hypergraphs and van Kampen Diagrams
It will turn out that a certain dual picture of our 3-uniform linear hypergraphs will often give us a more natural picture to consider, and will allow us to use standard notions from geometric group theory to describe proofs of inconsistency (as discussed in Section 1.4).
We begin with the definition of a van Kampen diagram and associated terminology.
Definition 2.3. Given a group G with generators from a set U and relations from a set R, the corresponding van Kampen diagram consists of (a drawing of) a planar, directed graph in which each edge is labelled with an element of U and in which for each face the word obtained by traversing the edges of the face clockwise or anticlockwise in a single complete cycle (writing the inverse if the edge is traversed in the opposite direction to its orientation) corresponds to a relation from R. The van Kampen diagram also has an identified vertex, the base vertex, on the topological boundary of the graph. We associate with the van Kampen diagram the word w that is obtained by traversing the topological boundary cycle of the diagram clockwise, starting and finishing at the base vertex; this word is called the boundary word of the van Kampen diagram. The area of the van Kampen diagram is the number of faces (not including the external face outside the boundary).
The following result is an abridged form of the van Kampen lemma. Lemma 2.4 (van Kampen). Let D be a van Kampen diagram for the group G with boundary word w. Then w = 1 in G. Moreover, if w is a word such that w = 1 in G then there exists a van Kampen diagram D whose boundary word is equal to w.

Figure 3: A van Kampen diagram corresponding to a label flappy cuboctahedron with flaps labelled z_4 and z_5. The boundary word of the diagram is z_4 z_5^{-1}. The labels x_i correspond to rows in the grid representation, and the labels y_i correspond to columns.
As discussed in Section 1.4 we may, given a partial Latin square A, form the group G_A obtained by taking the set of rows, columns and labels to be the generators and taking a relation for each point of A. In this group, the relations all have the form xyz^{-1}. Therefore any van Kampen diagram obtained from G_A is a triangulation of a region of the plane.
If we take each face of such a van Kampen diagram, its edges and vertices form a triangle in the graph-theoretic sense. If we replace each of these triangles by its dual, again in the graph-theoretic sense, then we can interpret the new triangles as the faces of a linear 3-uniform hypergraph, which is tripartite, since each face of the van Kampen diagram had a row-edge, a column-edge and a label-edge, and therefore the vertices of the hypergraph are partitioned in a corresponding way. For example, if we apply this process to an octahedral van Kampen diagram (which has no boundary) then we obtain a cuboctahedron.
A flappy cuboctahedron, on the other hand, corresponds to a van Kampen diagram that looks like an octahedron with a 'slit': a single edge of an octahedron has been split into two edges with the same endpoints. By pulling this slit open so that it bounds a van Kampen diagram that is drawn in the plane, we obtain the diagram shown in Figure 3.
Note that if, as we are assuming, all the relations are of the form x_i y_j z_k^{-1}, and if we simplify the labelling of a van Kampen diagram by replacing each x_i by x, each y_j by y, and each z_k by z, then the labellings and orientations of the edges of two triangles that share an edge are reflections of each other. This implies, for example, that each vertex sees edge labels of exactly two kinds. Let us call a van Kampen diagram of this special kind kaleidoscopic.
In view of the correspondence just described, we make the following definitions.
Definition 2.5. A spherical tripartite hypergraph is a hypergraph that corresponds to a kaleidoscopic triangulation of a sphere. A flappy spherical tripartite hypergraph is a hypergraph that corresponds to a kaleidoscopic van Kampen diagram with a boundary of length 2.
In particular, a flappy cuboctahedron is a flappy spherical hypergraph, and its corresponding proof of inconsistency can be visualized as the van Kampen diagram with boundary of length 2 displayed in Figure 3. Moreover, all inconsistency proofs can be visualized this way.
Given a subset A of a Latin square (meaning a subset of the grid together with its labelling), we can map the rows, columns and labels of the Latin square to the corresponding generators of the group G_A. This map φ is a homomorphism in the sense that if label z appears at the point (x, y), then φ(x)φ(y) = φ(z) in G_A. However, this does not give us an embedding of A into the multiplication table of G_A because we have no guarantee that φ is injective: indeed G_A might be trivial. The injectivity of φ is a necessary and sufficient condition for A to embed into a group multiplication table. For φ to be injective, we need only that G_A admits no proof that a generator u equals a different generator v (in the sense that the two generators can be shown to be equal using the relations obtained from A). Lemma 2.4 gives us that such a proof consists precisely of a van Kampen diagram with boundary of length 2 (such as the example shown in Figure 3).
Specifically, we have shown:
Lemma 2.6. A partially labelled grid A embeds into a group multiplication table if and only if G_A admits no van Kampen diagram with a boundary word of length 2 that is not an inverse pair.
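The smallest obstruction of this kind can be searched for directly. The sketch below assumes our reading that a label flappy cuboctahedron with distinct flaps is, in grid language, a pair of rectangles whose labels agree at three corners but differ at the fourth; the function name and the second example square are hypothetical and chosen only for illustration.

```python
from itertools import product


def quadrangle_violations(L):
    """Count triples of labels (a, b, c) whose fourth corner is ambiguous.

    For every rectangle with rows i1, i2 and columns j1, j2 we record the
    labels at (i1, j1), (i1, j2), (i2, j2) and collect the labels seen at
    the fourth corner (i2, j1). A triple with two or more fourth labels is
    a pair of rectangles agreeing at three corners but not at the fourth.
    """
    n = len(L)
    fourth = {}
    for i1, i2, j1, j2 in product(range(n), repeat=4):
        key = (L[i1][j1], L[i1][j2], L[i2][j2])
        fourth.setdefault(key, set()).add(L[i2][j1])
    return sum(1 for labels in fourth.values() if len(labels) > 1)


# Multiplication table of the cyclic group Z_5.
cyclic = [[(i + j) % 5 for j in range(5)] for i in range(5)]

# A hypothetical Latin square of order 5 containing a 2 x 2 subsquare
# (rows 0, 1 and columns 0, 1), which the table of Z_5 cannot contain.
other = [
    [0, 1, 2, 3, 4],
    [1, 0, 3, 4, 2],
    [2, 3, 4, 0, 1],
    [3, 4, 1, 2, 0],
    [4, 2, 0, 1, 3],
]
```

Calling quadrangle_violations(cyclic) finds no ambiguous triple, whereas the second square has at least one, for instance the triple (0, 1, 0), whose fourth corner is 1 in one rectangle and 2 in another. This only detects the shortest inconsistency proofs; longer flappy spherical hypergraphs are not examined.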
Thus, the problem of finding a subset of a partially labelled grid that is isomorphic to part of the multiplication table of a group is equivalent to the problem of finding a subset that contains no flappy spherical hypergraphs. As we have mentioned, we do not know how to do this, and do not believe that it is possible in general. Instead, we shall show how to pass to a dense subset in which there are no flappy spherical hypergraphs with fewer than k faces, for a fixed (but arbitrary) k.
The van Kampen lemma shows that the elimination of bounded size, flappy, spherical hypergraphs corresponds to the elimination of van Kampen diagrams with a boundary word of length 2 and bounded area. We refer to the corresponding inconsistency proofs as short since a van Kampen diagram can be converted into a proof by converting the faces back into relations, and the length of such a proof is equal to the area of the diagram.

Van Kampen complexes
In some later parts of the argument, it will be most natural to abandon the grid and hypergraph representations for our Latin square and work entirely within the dual setting. For this purpose we make the following definition.
Definition 2.7. Given a partial Latin square A, we may build a simplicial complex K_A by including a triangular 2-simplex (face) for each relation of G_A as described above. We identify the 1-simplices (edges) corresponding to the same row, column or label of A. We identify 0-simplices (vertices) only when forced to by our identification of edges. We call the resulting simplicial complex a van Kampen complex.
Our policy of avoiding identifying vertices unnecessarily in the van Kampen complex is not important. We could in fact identify all vertices, but it often helps to picture the van Kampen complex or parts of it (as in Figure 3) if we do not do this.

2k-PFs in terms of van Kampen diagrams
In Section 2.1.1 we introduced the notion of a 2k-PF in the hypergraph representation. This has a natural interpretation in the dual setting, which will be important when we focus our attention on van Kampen diagrams later on. The internal vertices of the 2k-PF become edges that are shared by adjacent faces, and the petals become edges that have labels from one class only (row, column or label) and form a 2k-gon, triangulated with 2k triangles that radiate from a single point in the middle. The van Kampen diagram in Figure 4 is a 4-PF, for example. Its boundary word is
Given a kaleidoscopic van Kampen diagram, we may isolate a 2k-PF by picking an internal vertex and taking the collection of triangles incident to it. In the van Kampen representation of the flappy cuboctahedron shown in Figure 3, for instance, we find four different 4-PFs since each internal vertex is contained in four triangular faces. In a full van Kampen complex, we may still isolate individual 2k-PFs, but a collection of triangles incident to a vertex may give a union of many 2k-PFs. This is because, unlike with van Kampen diagrams, there is no requirement for the full complex to be a triangulated planar surface.

Statements and overview of the proof
In this section we shall give a very high level overview of the argument, since the technical details are quite extensive. For ease of comprehensibility, we shall not yet give fully precise statements of our results. The precise versions, and their proofs, will appear in the next sections.
As we have explained, eliminating short inconsistency proofs is equivalent to eliminating flappy spherical hypergraphs with at most b faces for some b. It will be simplest, for the purposes of the discussion in this section, to consider only the removal of flappy cuboctahedra, since this case contains much of the difficulty of the problem and will also be used as a running example in the technical details that follow.
The theorem is similar in spirit to a useful result from additive combinatorics that states that if A is a dense subset of a finite Abelian group G, H is another Abelian group, and φ : G → H is a map such that φ(x) + φ(y) = φ(z) + φ(w) for at least a proportion δ of the quadruples (x, y, z, w) ∈ A^4 such that x + y = z + w, then A has a dense subset B with the property that the restriction of φ to B is a Freiman homomorphism. As with that result, one of the main tools we shall use is a dependent random selection.
We can also think of the theorem (as well as the additive combinatorics result just mentioned) as a kind of removal lemma. We are given a linear hypergraph A with many cuboctahedra, and thus not too many flappy cuboctahedra, and we wish to discard a certain proportion of the faces of A to leave a hypergraph containing no flappy cuboctahedra. However, our result is also significantly different from traditional removal lemmas, since we start with a weaker assumption (we do not ask for the number of flappy cuboctahedra to be small, but just not quite as big as it might be) and a weaker conclusion: instead of removing a small fraction of the faces, we are allowed to remove all but a positive proportion of them. Nevertheless, the basic challenge we face is similar to the one that arises in the triangle removal lemma: that the number of flappy cuboctahedra is in general far more than the number of faces, and so we cannot simply discard a face from every flappy cuboctahedron.
Note that we would get round this problem if we could pass to a subset of A in which for each vertex u there are at most a bounded number of other vertices v for which there is a flappy cuboctahedron with its flaps on the vertices u and v. That is, it would be good for us if, after forming a graph by joining the label u to the label v when such a flappy cuboctahedron exists, we could pass to a dense subset of A so that this graph is of bounded degree. If we could do this, then all we would have to do is find a dense independent set of vertices in the graph (which, as is well known, can be done by simply choosing the vertices greedily), and the corresponding hypergraph is free of flappy cuboctahedra. So an intermediate goal is to pass to such a subset of A. (We also have to ensure that the vertices we keep span a positive proportion of the faces of the hypergraph, but that also turns out to be easy.) Let us call a 4-PF F in A popular if A contains linearly many 4-PFs all having the same petal vertices as F (a more precise definition will follow in the next section). Observe that since A contains many cuboctahedra, and cuboctahedra can be thought of as pairs of 4-PFs that share their petals, A contains many popular 4-PFs. As usual, "many" means within a constant factor of the trivial maximum.
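The greedy step mentioned above is standard, and can be sketched as follows. The graph representation (a dictionary of neighbour sets), the function name and the example graph are assumptions made for this illustration.

```python
def greedy_independent_set(adj):
    """Greedily pick vertices, discarding the neighbours of each pick.

    adj maps each vertex to the set of its neighbours. If every degree is
    at most D, each pick discards at most D + 1 vertices, so the result
    has size at least len(adj) / (D + 1).
    """
    remaining = set(adj)
    chosen = []
    for v in sorted(adj):
        if v in remaining:
            chosen.append(v)
            remaining -= adj[v] | {v}
    return chosen


# Hypothetical example: a cycle on 10 vertices, maximum degree 2.
cycle = {v: {(v - 1) % 10, (v + 1) % 10} for v in range(10)}
independent = greedy_independent_set(cycle)
```

In the application sketched in the text, the vertices would be the labels and two labels would be adjacent when they form the flaps of some flappy cuboctahedron, so a bounded degree gives an independent set containing a positive proportion of the labels.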
Suppose that it were the case that every 4-PF in A were popular. Then we would have our desired intermediate statement. Very roughly, the argument is as follows. Suppose that we can find a vertex u for which there are K different vertices v_i such that u and v_i form the flaps of a flappy cuboctahedron. For each of these v_i, we can build a whole collection of flappy cuboctahedra with flap vertices u and v_i by using the popularity of 4-PFs to replace the 4-PFs in the corresponding flappy cuboctahedron by many different possibilities. Since there are four distinct non-degenerate 4-PFs in a flappy cuboctahedron as shown in Figure 5, it turns out that the popularity of 4-PFs allows us to find O(n^4) flappy cuboctahedra with flap vertices u and v_i. This gives us KO(n^4) flappy cuboctahedra with a flap vertex at u, and taking the union over all choices of u we get KO(n^5) flappy cuboctahedra in total (where the O factor is independent of K and n). However, the maximum possible number of flappy cuboctahedra in a linear hypergraph is n^5, so for K sufficiently large we get a contradiction.
It may seem counterintuitive that we obtained too many flappy cuboctahedra in the argument above while considering only one vertex u and a handful of vertices v_i for the possible flaps. The reason it works is related to the fact that the trivial maximum for the number of flappy cuboctahedra, n^5, is the same as the trivial maximum for the number of cuboctahedra, despite the fact that flappy cuboctahedra appear to have an extra degree of freedom.
Unfortunately for this approach, we cannot say that every 4-PF in A is popular, and, worse, we cannot even pass to a dense subset of A in which this is the case. However, it turns out that we can pass to a dense subset of A in which all 4-PFs, and indeed all 2r-PFs for bounded r, can be decomposed into popular PFs in many ways, and this is good enough for our purposes.
The precise definitions of the decompositions we use will follow in the next sections, but one may draw an analogy with the proof of the Balog-Szemerédi-Gowers theorem. In this proof, a key step is to take a dense graph G and pass to a dense subgraph H in which any pair of vertices is joined by many paths of length 4. This step is justified by first passing to a dense subgraph in which almost any pair of vertices is joined by many paths of length 2. In order to eliminate the almost, it is necessary to increase the complexity of the involved substructure: by aiming for paths of length 4 (rather than length 2) we obtain the required statement. Our situation is very similar. Using a simple point decomposition to be defined later, we may pass to a dense subset in which almost all 4-PFs can be decomposed into popular 4-PFs in many ways. But in order to eliminate the almost, we must instead deal with a considerably more complex decomposition, the full decomposition, which we shall also describe later.
With a complicated decomposition into popular PFs, the way to use popularity becomes much less clear, but the underlying idea is nevertheless similar. By starting with a collection of flappy structures, we repeatedly use the fact that constituent 2r-PFs can be decomposed into popular PFs to build up a large family of flappy structures whose size ultimately violates a trivial upper bound on the maximum possible number of such structures in a linear hypergraph. For this we need a somewhat abstract argument, but in Section 5.3 we present the cuboctahedron case separately, where it is possible to present a more explicit strategy.
Before we can give any details of this 'popular replacement' argument, we must first prove the following theorem (in a more precise form), which will form the basis of our repeated uses of popular decompositions. We are deliberately avoiding giving the precise meaning of popularly decomposable at this stage, since it is quite involved.
Theorem 2.8. Starting with a complete n × n Latin square A containing at least εn^5 cuboctahedra, we may find a sequence
The proof of this theorem will be the ultimate goal of the next two sections. The first of these sections, Section 3, will provide many of the technical foundations, but the decompositions used will ultimately be too complex for the latter parts of the arguments. In Section 4 we will be able to use results obtained in Section 3 to improve and simplify the decompositions used, giving a proof of Theorem 2.8. We will then progress to the popular replacement stage, in which we will flesh out the ideas sketched above.
We are now ready to begin the technical details.

Popularity and decompositions
In this section, it will be easiest to work entirely within the frameworks of grid representations and hypergraphs. Once we have a precise statement for Theorem 2.8, we will then consider the van Kampen representation, which will turn out to be a much more appropriate setting for our applications of Theorem 2.8, but there are some steps in the proof of this theorem which cannot be conveniently interpreted in the dual framework. We begin with a well-known bound for the number of 2r-cycles in a bipartite graph, which will underlie many of the calculations throughout this section.
Lemma 3.1. Let A be a subset of the n × n grid of density α. Then A contains at least α^{2r} n^{2r} and at most α^r n^{2r} distinct labelled 2r-cycles.
Proof. We may view A as a bipartite graph with vertex sets X and Y of size n and αn^2 edges. Let λ_1, ..., λ_n be the singular values of the adjacency matrix of this graph. Then the number of 2r-cycles is equal to Σ_i λ_i^{2r}. But the largest singular value is at least αn, so this sum is at least α^{2r} n^{2r}.
For the upper bound we observe that the number of 2r-cycles can be counted by summing, over all (ordered) r-tuples (x_1, ..., x_r) ∈ A^r, the indicator that there is a 2r-cycle x_1 y_1 ... x_r y_r. This sum is clearly at most |A|^r = α^r n^{2r}, since that is the number of ways of choosing (x_1, ..., x_r).
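Both bounds in Lemma 3.1 are easy to test numerically. The sketch below is an illustration under the assumption that "labelled 2r-cycles" are counted with degenerate cycles included, which is the count given by the singular-value identity, namely the trace of (MM^T)^r for the 0/1 matrix M of A. The function name and the random test matrix are hypothetical.

```python
import random


def count_closed_walks(M, r):
    """Trace of (M M^T)^r: labelled 2r-cycles, degenerate ones included."""
    n = len(M)
    # Gram matrix P = M M^T; P[a][b] counts common neighbours of rows a, b.
    P = [[sum(M[a][y] * M[b][y] for y in range(n)) for b in range(n)]
         for a in range(n)]
    Q = [[int(a == b) for b in range(n)] for a in range(n)]  # identity
    for _ in range(r):
        Q = [[sum(Q[a][c] * P[c][b] for c in range(n)) for b in range(n)]
             for a in range(n)]
    return sum(Q[a][a] for a in range(n))


random.seed(0)
n, r = 8, 3
M = [[int(random.random() < 0.4) for _ in range(n)] for _ in range(n)]
size = sum(map(sum, M))            # |A| = alpha * n^2
cycles = count_closed_walks(M, r)
```

With |A| = αn^2, the lemma says that cycles lies between |A|^{2r}/n^{2r} and |A|^r. The complete 2 × 2 grid with r = 2 attains both bounds at once, with 16 closed walks.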
Our starting point is a partial Latin square A containing at least εn^5 cuboctahedra.
The lower bound on the cuboctahedron count in A requires that the labelling of a random rectangle is repeated, on average, many times. This motivates the following definition.
Definition 3.2. We say that a rectangle with labelling (a, b, c, d) in the square is θ-popular in A if the labelling (a, b, c, d) occurs at least θn times in A. More generally, we call a 2r-cycle θ-popular in A if the labelling of the cycle occurs at least θn times in A.
Note that the trivial maximum for the number of occurrences of a given labelling is n, since once one has chosen which of at most n points to choose with the first label, the condition that no label is repeated in any row or column implies that the rest of the 2r-cycle is determined by the labelling.
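Definition 3.2 and the trivial maximum just noted can be checked on a small group table. The sketch below is an illustration only: it assumes that rectangles are counted as ordered choices of two rows and two columns, degenerate choices included, and that a cuboctahedron corresponds to an ordered pair of rectangles with the same labelling. The function name is hypothetical.

```python
from collections import Counter
from itertools import product


def labelling_counts(L):
    """Count how often each rectangle labelling occurs in the square L.

    A rectangle is given by rows i1, i2 and columns j1, j2 (degenerate
    choices included), and its labelling is read around the four corners.
    """
    n = len(L)
    counts = Counter()
    for i1, i2, j1, j2 in product(range(n), repeat=4):
        counts[(L[i1][j1], L[i1][j2], L[i2][j2], L[i2][j1])] += 1
    return counts


n = 5
cyclic = [[(i + j) % n for j in range(n)] for i in range(n)]
counts = labelling_counts(cyclic)
# Ordered pairs of rectangles with the same labelling.
pairs_with_same_labelling = sum(c * c for c in counts.values())
```

For the cyclic group table every labelling occurs exactly n times, so every rectangle is 1-popular and the number of pairs is n^5, matching the trivial maximum for cuboctahedra under the counting convention assumed here.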
The first step towards obtaining the decompositions we need is a dependent random selection that ensures that almost all 2r-cycles can be decomposed into popular rectangles in many ways. The decomposition we use at this stage will be referred to as the point decomposition.
Definition 3.3. Given a 2r-cycle C = x_1 y_1 ... x_r y_r in A, a point decomposition of C in A is a collection of 2r rectangles, all belonging to A and all sharing a point u, with the corners opposite to u being the x_i and y_i. We call the point decomposition ε-popular if each of the 2r rectangles is ε-popular in A.
Point decompositions for a rectangle and a 6-cycle are shown in Figure 6.
Lemma 3.4. Let ε, δ > 0 and let k > 1 be a fixed integer. Given a partial n × n Latin square A containing at least εn^5 cuboctahedra, we can find a subset
Proof. We define a graph G with vertex set given by [n]^2 corresponding to the cells of the n × n grid, and edges given by joining x to y if the rectangle with opposite corners x and y has all its vertices in A and is ε/2-popular. Let X be the number of edges in G and Y be the number of non-edges. An edge in G can be associated to a set of at least εn/2 (and at most n) cuboctahedra, by combining the rectangle corresponding to the edge with one of the other rectangles with the same labelling. Similarly, a non-edge in G can be associated to a set of fewer than εn/2 cuboctahedra. In such a way, all cuboctahedra of A are accounted for. Therefore, since Y ≤ n^4, Xn + Yεn/2 ≥ εn^5 ⇒ Xn + εn^5/2 ≥ εn^5, so G has average degree at least εn^2.
A 2r-cycle has at least ηn^2 different ε/2-popular point decompositions in A if the common neighbourhood (in G) of the 2r corner vertices has size at least ηn^2.
We choose a vertex v in G uniformly at random, and let N (v) be the neighbourhood of v in G. This is our dependent random selection. It remains to prove that it works with positive probability.
Let C = x_1 y_1 ... x_r y_r be a given 2r-cycle in A. Let N(C) be the set of vertices in G that are joined to all of x_1, ..., y_r. We shall say that C is bad if |N(C)| < ηn^2. Let Z_r count the number of bad 2r-cycles in N(v). We have EZ_r ≤ ηn^{2r}.
Our lower bound on the average degree of G also gives us that
In particular, we have
Therefore, letting B_1 = N(v) for this choice of v, we have β_1 n^2 = |N(v)| ≥ εn^2/2 and the proportion of 2r-cycles in N(v) which are bad is at most
Using Lemma 3.4 we may pass to a dense subset B_1 of A such that almost all 2r-cycles have many (within a constant factor of the trivial maximum) popular point decompositions in A. However, for our purposes the 'almost all' is not sufficient, and we need to use a more complicated decomposition to boost Lemma 3.4 into an 'all' statement.
The following definition introduces these more complex decompositions.
Definition 3.5. Let X be a fixed partial Latin square. Given a 2r-cycle
If C′ and all the rectangles are ε-popular, we call the collection of all the rectangles together with C′ an ε-popular ring decomposition of C. An ε-popular full decomposition of C is a 2r-cycle C′ together with ε-popular point decompositions of C′ and the 2r rectangles just defined. A ring decomposition of a 4-cycle is shown in Figure 7 and a full decomposition is shown in Figure 8.
Remark 3.6. It will be important to keep track of the order (in n) of the trivial maxima for the number of ring decompositions and full decompositions of a 2r-cycle in a dense subset of an n × n Latin square. The number of ring decompositions is at most n^{2r}, since a ring decomposition of a 2r-cycle C is uniquely defined by a 2r-cycle C′. In a full decomposition, C′ and all the rectangles in the ring decomposition are given point decompositions, each of which can be chosen in at most n^2 ways. So the number of full decompositions is at most n^{2r+2(2r+1)} = n^{6r+2}.
Our next step is to pass to a subset B_2 of B_1 such that all 2r-cycles in B_2 have within a constant factor of the trivial maximum number of ring decompositions. Since almost all 2r-cycles in B_1 have popular point decompositions, we will then be able to pass to a further subset B_3 of B_2 so that all 2r-cycles in B_3 have within a constant factor of the trivial maximum number of ε-popular full decompositions.
We need a technical lemma to achieve the first step of this process.
Lemma 3.7. Let k be a fixed integer greater than 1. Let G be a bipartite graph with vertex classes X, Y of size n and edge density δ. Then we can pass to subsets X′ ⊂ X and Y′ ⊂ Y, each of size at least δ^2 n/16, such that the edge density in G′ = G|_{X′×Y′} is at least δ/4 and for any 2 ≤ r ≤ k and any choice of r vertices x_1, ..., x_r ∈ X′ and y_1, ..., y_r ∈ Y′ we have at least (δ/2)^{20k^2} n^{2r} choices of 2r-cycle u_1 v_1 ... u_r v_r in G with x_i u_i ∈ E(G) and y_i v_i ∈ E(G) for each i = 1, ..., r.
Proof. Let us begin by discarding all vertices from X of degree smaller than δn/2. This leaves a set X_1 ⊂ X of size at least δn/2. Let η = δ^k/(2^{3k+2} k) and ν = δ^{3k+2}/(2^{5k+5} k). We now use a dependent random selection argument that allows us to pass to a subset X_2 ⊂ X_1 of size at least (δ^2/8)n with the property that for a (1 − η) proportion of choices (x_1, ..., x_{k+1}) from X_2 we have at least νn vertices in the shared neighbourhood Γ(x_1, ..., x_{k+1}) ⊂ Y.
Since each vertex in X_3 has at least δn/2 neighbours in Y, the number of edges from Y to X_3 is at least δn|X_3|/2. We now pass to the subset Y_1 ⊂ Y that consists of all vertices with at least δ|X_3|/4 edges into X_3. We note that |Y_1| ≥ δn/4. Now let x_1, ..., x_k be chosen from X_3 and y_1, ..., y_k from Y_1. Let A_1, ..., A_k be the neighbourhoods of the y_i in X_3; note that |A_i| ≥ δ|X_3|/4. Let T = A_1 × ··· × A_k and note that it has cardinality at least (δ|X_3|/4)^k ≥ (δ|X_2|/8)^k.
By the choice of X_3, we know that the number of choices of u_1, ..., u_k ∈ X_2 such that |Γ(x_i, u_1, ..., u_k)| < νn is at most 2η|X_2|^k for each i = 1, ..., k. Since 2ηk = (δ/8)^k/2 and |T| ≥ (δ|X_2|/8)^k, there must be at least (δ|X_2|/8)^k/2 choices of (a_1, ..., a_k) ∈ T such that |Γ(x_i, a_1, ..., a_k)| ≥ νn for each i = 1, ..., k. Observe that for any such choice of (a_1, ..., a_k) and for any choice of b_i ∈ Γ(x_i, a_1, ..., a_k) we get a complete bipartite graph between the a_i and the b_i as well as the edges x_i b_i and y_i a_i for each i.
The number of choices of the a_i and b_i from the above paragraph is at least
Observe that the subgraph induced by the x_i, y_j, a_k and b_l contains a 2r-cycle a_1 b_1 ... a_r b_r as well as the edges x_i b_i and y_i a_i for each i. Moreover, the edge density in X_3 × Y_1 is at least δ/4, so taking X′ = X_3 and Y′ = Y_1, the result follows.
Remark 3.8. It is well known that given a dense bipartite graph G, we may pass to a dense subgraph H such that any two vertices of H are joined by many P 3 s in G. The proof of Lemma 3.7 shows that a considerable generalization of this statement is available for relatively little extra effort: given any fixed bipartite graph H ′ with t special vertices v 1 , . . . , v t such that the shortest path from any v i to any v j has length at least 3, we may pass to a dense subgraph H of G such that for any u 1 , . . . , u t the number of isomorphic copies φ(H ′ ) of H ′ in H with φ(v i ) = u i for all i is within a constant of the trivial maximum. The P 3 statement is the special case where H ′ is a path of length 3 and v 1 and v 2 are its endpoints. (A similar observation was made in a blog post of Tao [9], but he was content to discuss just the special case he needed, and he left the proof as an exercise for the reader.) When viewed as a statement about subsets of the grid, Lemma 3.7 states that we may pass to a dense subset B 2 ⊂ B 1 such that all 2r-cycles in B 2 have many ring decompositions in B 1 . We must now pass to a further subset in which all 2r-cycles have many popular full decompositions.
In the statement of the following lemma we shall assume that we are given some property of cycles, and cycles that have that property will be called 'good'. The reason we work at this level of abstraction is partly that we can, and partly that we shall apply the lemma twice, with different definitions of 'good' each time. Lemma 3.9. Let β, δ, γ > 0. Let B be a subset of an n × n grid of density at least β with the property that for each 2 ≤ r ≤ k at least a proportion 1 − δ of 2r-cycles in B are good. Let ι = (β/2) 20k 2 /3. If δ ≤ (β/2) 30k 2 then we can find a subset B ′ of B with density β ′ ≥ β 5 2 −11 with the property that any 2r-cycle in B ′ has at least ιn 2r different ring decompositions into good cycles in B.
Proof. Recall that a ring decomposition of a cycle C involves a paired cycle C ′ , which we shall refer to as the back face, and 2r rectangles between these cycles, which we shall refer to as the side faces. We shall call a ring decomposition of a cycle C good if the cycle C ′ making up the back face and all the rectangles involved in the side faces are good.
We shall call a 2r-cycle C indecomposable if it has fewer than ιn 2r good ring decompositions. We shall say that an indecomposable 2r-cycle is bad on the back face if at least one third of its ring decompositions have a bad (i.e. not good) cycle on the back face, and bad on the side faces if at least one third of its decompositions have a bad rectangle on a side face.
In parallel with the subset B of the Latin square, we shall also consider the corresponding bipartite graph G in which the rows and columns form the vertex sets and the points of B form the edges.
We begin by applying Lemma 3.7. This allows us to pass to a subset B * of B of density at least (β 2 /16) 2 (β/4) = β 5 /2 10 with the property that each 2r-cycle in B * has at least (β/2) 20k 2 n 2r ring decompositions in B.
Consider a given 2r-cycle C = x 1 y 1 . . . x r y r in B * . Suppose that C is bad on the back face. Then there are at least (β/2) 20k 2 n 2r /3 bad 2r-cycles in B. But only a proportion δ of all 2r-cycles in B are bad, and the maximum possible number of 2r-cycles in B is β r n 2r . So if δ < (β/2) 20k 2 /3 then we have a contradiction.
Therefore no 2r-cycles are bad on the back face (for any 2 ≤ r ≤ k), and so all indecomposable 2r-cycles are bad on a side face. If for each r there are no more than β 5 2 −10 n 2 /4k 2 vertex disjoint indecomposable 2r-cycles, then discarding all points from a maximal vertex-disjoint set of indecomposable cycles we discard at most β 5 2 −10 n 2 /2 points, leaving a set of density at least β 5 2 −10 /2 with no indecomposable cycles (and so we are done).
Thus, for some r it must be possible to find at least β 5 2 −10 n 2 /4k 2 vertex disjoint indecomposable 2r-cycles. Since there are no cycles bad on the back face, all these cycles are bad on a side face. This means that each of these 2r-cycles has at least (β/2) 20k 2 n 2r /3 ring decompositions involving a bad rectangle as a side face. Each bad rectangle can belong to at most n 2r−2 ring decompositions, so we get at least (β/2) 20k 2 n 2 /3 bad rectangles sharing a vertex with each of these indecomposable cycles. This gives us at least ((β/2) 20k 2 n 2 /3) · (β 5 2 −10 n 2 /4k 2 ) > (β/2) 30k 2 n 4 bad rectangles in B.
But the number of bad rectangles is at most δβ 2 n 4 , so if δ ≤ (β/2) 30k 2 then we have a contradiction.
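The final inequality in this proof is easy to sanity-check with exact rational arithmetic. The following is a minimal sketch rather than part of the argument: the function names are hypothetical, and the two quantities are the coefficients of n 4 in the lower bound on bad rectangles and in the threshold (β/2) 30k 2 n 4 .

```python
from fractions import Fraction


def bad_rectangle_lower_bound(beta, k):
    """Coefficient of n^4 in the lower bound on the number of bad rectangles.

    Product of the two factors used in the proof:
    (beta/2)^(20k^2) / 3 bad rectangles per indecomposable cycle, and
    beta^5 * 2^(-10) / (4k^2) vertex-disjoint indecomposable cycles.
    """
    per_cycle = (beta / 2) ** (20 * k * k) / 3
    disjoint_cycles = beta ** 5 / (2 ** 10 * 4 * k * k)
    return per_cycle * disjoint_cycles


def contradiction_threshold(beta, k):
    """Coefficient of n^4 in the threshold (beta/2)^(30k^2) n^4."""
    return (beta / 2) ** (30 * k * k)


# Exact check for a few sample densities and cycle-length bounds.
for beta in (Fraction(1), Fraction(1, 2), Fraction(1, 100)):
    for k in (2, 3):
        assert bad_rectangle_lower_bound(beta, k) > contradiction_threshold(beta, k)
```

The check passes because the ratio of the two sides is (β/2) raised to the power 5 − 10k 2 , divided by 384k 2 , which exceeds 1 whenever β ≤ 1 and k ≥ 2.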
By applying Lemmas 3.4, 3.7 and 3.9 we will be able to pass to a dense subset B of A in which all 2r-cycles have many popular full decompositions. This will still not be sufficient for our later purposes, which will require obtaining ǫ-popular ring decompositions. So before we fill in the details, we shall give more technical lemmas that will help us with this objective. Lemma 3.10. Let A be a partial Latin square and let B be a subset of A. Suppose that every 2r-cycle in B has at least γn 6r+2 different ǫ-popular full decompositions in A. Then for every (a 1 , . . . , a 2r−1 ) the number of a 2r such that (a 1 , . . . , a 2r ) is a labelling of some 2r-cycle in B is at most ǫ −10r γ −1 .
Proof. Suppose that we have a tuple (a 1 , . . . , a 2r−1 ) such that the set {x i } of possible labelling completions has size at least K. For each completion we can find γn 6r+2 ǫ-popular full decompositions.
Let us think about a typical one of these decompositions as follows. (For the discussion that follows, it may well help to look back at Figure 8.) We begin with a 2r-cycle C with points x 1 , y 1 , . . . , x r , y r , where x i has label a 2i−1 when 1 ≤ i ≤ r, y i has label a 2i when 1 ≤ i ≤ r − 1, and we do not know about the label attached to the point y r . (It is important to be clear that the x i and y i are elements of [n] 2 and not of [n] in this discussion.) Next, we have another 2r-cycle C ′ with points x ′ 1 , y ′ 1 , . . . , x ′ r , y ′ r . However, it is 'reflected', in the sense that whereas x i and y i are in the same row, x ′ i and y ′ i are in the same column, and whereas y i and x i+1 are in the same column, y ′ i and x ′ i+1 are in the same row. Now we complete the cycles C and C ′ to a ring decomposition by adding in 2r points u 1 , v 1 , . . . , u r , v r , where u i is in the row that contains x i and y i and the column that contains x ′ i and y ′ i , and v i is in the column that contains y i and x i+1 and the row that contains y ′ i and x ′ i+1 . (The points u i and v i do not form a 2r-cycle.) The rectangles of this ring decomposition are given by $R_i = x_i u_i x'_i v_{i-1}$ and $S_i = y_i v_i y'_i u_i$ for $i = 1, \dots, r$, with indices taken modulo $r$.
To form a point decomposition, we now add points p i and q i , and form the four rectangles that have a vertex in R i and the opposite vertex at p i , and the four rectangles that have a vertex in S i and the opposite vertex at q i . As well as the point p i , we have to add four more points to R i in order to complete the decomposition into four rectangles. Of these four points, let r i and s i be the ones in the same row and the same column as x i ; we shall not bother giving names to the other two. Similarly, let w i and z i be the points in the same column and row as y i that are part of the decomposition of S i into four rectangles. Now let us consider a certain subset of the (variable set of) points of the full decomposition. 
We shall take the points u i and v i , the points r i and s i , and the points w i and z i with 1 ≤ i ≤ r − 1. We shall also take the two points from the point decomposition of C ′ that are in the same row and column as x ′ 1 , and the two points from the decomposition of the rectangle S r that are in the same row and column as y ′ r . This makes a total of 6r + 2 points, so by the pigeonhole principle we can find some choice of labellings of these 6r + 2 points that occurs at least Kγ times amongst the set of ǫ-popular full decompositions of 2r-cycles C for which the points x 1 , y 1 , . . . , x r are labelled a 1 , . . . , a 2r−1 .
Observe that a full decomposition of a given cycle is uniquely determined by the way it is labelled, since once a point has been specified, any other point in the same row or column is then determined by its label. Observe also that since each rectangle in an ǫ-popular full decomposition is ǫ-popular, given three labels of any rectangle there are at most 1/ǫ different choices of label for the fourth, since otherwise there would be more than n rectangles that shared three labels, which is impossible.
Our aim now is to use this observation to show that once the labellings of the 6r + 2 points specified earlier are given, the number of possible labellings of the remaining points is at most ǫ −10r . Since we know that it is also at least Kγ, this will give us our desired upper bound on K.
To do this, we consider the natural closure operation, where three points of a rectangle generate the fourth. The observation implies that if we know the labels at some set of points that generates the entire decomposition, and if there are t other points, then the number of possible ways of completing the labelling is at most ǫ −t . We apply this to the set of 6r + 2 points we have chosen.
Note first that the side faces of the full decomposition, apart from the rectangle containing the unfixed point of C, each contain five points from the set in their point decompositions, and furthermore these five generate the other four. Therefore the closure of the set contains all the points in all the point decompositions of the rectangles R 1 , . . . , R r and S 1 , . . . , S r−1 . These include the points x ′ 1 , . . . , x ′ r and y ′ 1 , . . . , y ′ r−1 . Since we also have the points in the same row and column as x ′ 1 , we obtain the central point of the back face of the decomposition, and using this we can work round the cycle and obtain all the points in its point decomposition. And now we have five points of the rectangle S r that generate the others (since they lie along two edges), which shows that the 6r + 2 points we choose generate all the points of the full decomposition. It is not hard to check that a full decomposition contains 18r + 1 points, so exactly 10r points remain once we remove the 6r + 2 chosen points and the 2r − 1 labelled points of C. We find, as promised, that the number of labellings given the labels at the 6r + 2 points and 2r − 1 of the points of C is at most ǫ −10r . Since this number is also at least Kγ, this proves that K ≤ ǫ −10r γ −1 .
By combining these lemmas we are now at a stage where we can pass to a subset B of A in which for each 2 ≤ r ≤ k the number of different ways of completing the labelling of a 2r-cycle in B given 2r − 1 of its labels is bounded. In order to state this concisely, we introduce the following definition.
Definition 3.11. Let B be a subset of an n × n Latin square. Suppose that for any sequence of 2r −1 labels, the number of different labellings of a 2r-cycle in B with its first 2r − 1 points labelled using that sequence is always at most C. Then we say that the 2r-cycle completion operation in B is C-well-defined.
In particular, if for any three labels a, b, c the number of labels d for which there is a rectangle (thought of as an ordered quadruple of points) in B labelled a, b, c, d is at most C, then the 4-cycle completion operation in B is C-well-defined.
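To make the 4-cycle case of Definition 3.11 concrete, here is a minimal Python sketch that computes the smallest C for which rectangle completion in a given partially labelled grid is C-well-defined. The function and variable names are hypothetical, and the sketch assumes that only non-degenerate rectangles (two distinct rows and two distinct columns) are counted.

```python
from collections import defaultdict
from itertools import product


def rectangle_completion_constant(grid):
    """Smallest C such that 4-cycle completion in `grid` is C-well-defined.

    `grid` maps (row, column) -> label for the filled points of B.
    Assumption of this sketch: only non-degenerate rectangles are considered.
    """
    columns_in_row = defaultdict(list)
    for (row, col) in grid:
        columns_in_row[row].append(col)

    completions = defaultdict(set)  # (a, b, c) -> set of possible fourth labels d
    for r1, r2 in product(list(columns_in_row), repeat=2):
        if r1 == r2:
            continue
        for c1, c2 in product(columns_in_row[r1], repeat=2):
            if c1 == c2 or (r2, c1) not in grid or (r2, c2) not in grid:
                continue
            # Ordered quadruple of points (r1,c1), (r1,c2), (r2,c2), (r2,c1).
            a, b = grid[r1, c1], grid[r1, c2]
            c, d = grid[r2, c2], grid[r2, c1]
            completions[a, b, c].add(d)
    return max((len(ds) for ds in completions.values()), default=0)


def cyclic_table(n):
    """Multiplication table of the cyclic group of order n."""
    return {(x, y): (x + y) % n for x in range(n) for y in range(n)}


# A Latin square of order 5 that is not a group table (1 * 1 = 0 with identity 0).
LOOP_ROWS = [
    [0, 1, 2, 3, 4],
    [1, 0, 3, 4, 2],
    [2, 3, 4, 0, 1],
    [3, 4, 1, 2, 0],
    [4, 2, 0, 1, 3],
]
LOOP_TABLE = {(x, y): LOOP_ROWS[x][y] for x in range(5) for y in range(5)}

print(rectangle_completion_constant(cyclic_table(5)))
print(rectangle_completion_constant(LOOP_TABLE))
```

For the cyclic table the fourth label is always a − b + c, so the constant is 1. For the second table it is at least 2: the rectangles on rows 0, 1 and columns 0, 1, and on rows 2, 3 and columns 3, 4, share their first three labels 0, 1, 0 but have fourth labels 1 and 2 respectively.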
With this definition, we can describe our progress so far as follows.
Theorem 3.12. Let ǫ > 0 and let k ≥ 2 be a fixed positive integer. Let A be a partial Latin square containing at least ǫn 5 cuboctahedra. Then we can find a subset B ⊂ A of density β ≥ ǫ 5 2 −16 with the property that for each 2 ≤ r ≤ k the 2r-cycle completion operation in B is (ǫ/4) −94k 3 -well-defined.
Proof. We now apply Lemma 3.4 with δ = (ǫ/4) 30k 2 . This allows us to pass to a subset B 1 ⊂ A of density β 1 ≥ ǫ/2 such that for each 2 ≤ r ≤ k a proportion at least 1 − δ of 2r-cycles in B 1 have at least (ǫ/4) 30k 2 (ǫ/2) 2k+1 n 2 /k ≥ (ǫ/4) 32k 2 n 2 different ǫ/2-popular point decompositions. From here we apply Lemma 3.9, where we take the property 'good' for a 2r-cycle to mean that the cycle has at least (ǫ/4) 32k 2 n 2 different ǫ/2-popular point decompositions. We can do this since B 1 has density β 1 ≥ ǫ/2, so δ ≤ (β 1 /2) 30k 2 . The lemma gives us a subset B 2 of B 1 of density (in the original n × n grid) β 2 ≥ β 5 1 2 −11 ≥ ǫ 5 2 −16 in which every 2r-cycle in B 2 has at least $\bigl((\epsilon/4)^{20k^2} n^{2r}/3\bigr)\bigl((\epsilon/4)^{32k^2} n^{2}\bigr)^{2r+1} \ge (\epsilon/4)^{91k^3} n^{6r+2}$ different ǫ/2-popular full decompositions. (The first bracket on the left is a lower bound for the number of good ring decompositions, and the second is a lower bound for the number of ways of converting each one into an ǫ/2-popular full decomposition.) This allows us to apply Lemma 3.10 with γ = (ǫ/4) 91k 3 , which implies the result (using the fact that 10r ≤ 3k 3 ) with B = B 2 .
We draw attention once again to the analogy with the result from additive combinatorics mentioned earlier. Given a map φ : G → H between abelian groups such that φ respects a positive proportion of additive quadruples in a subset A of G, we can pass to a subset B of A such that the restriction of φ to B is a Freiman homomorphism. One way of proving this result begins by showing that it is possible to pass to a set B ′ such that for each w, the number of values that φ(x) + φ(y) − φ(z) can take when x + y − z = w is bounded independently of the size of G. This first step mirrors what we have achieved thus far.
To complete the proof, it is necessary to reduce C to 1. At this point the analogy breaks down somewhat, since in the additive problem, Plünnecke's inequality is used, but our setting does not involve an ambient group so we do not appear to have an analogous tool.

Next steps
Theorem 3.12 constitutes significant progress towards our positive result, but reducing C to 1 requires a number of further steps.
Perhaps surprisingly, the first step will involve abandoning full decompositions. While full decompositions are easy to understand in the grid setting, they are more difficult to visualize in the hypergraph setting, because of the presence of vertices that are contained in more than two faces, which also means that they are not surfaces in the van Kampen representation, but complexes in which four or more faces can share an edge. For these reasons, they are not a natural tool for what is to come. Instead, we shall use the C-well-defined property to start again, reapplying Lemma 3.9 with the added information. This will allow us to find ring decompositions into popular rectangles (rather than into rectangles with many popular point decompositions), which will greatly simplify the structures we have to consider.
Following this, we will introduce a notion of shattered ring decomposition. These objects will have natural van Kampen representations, facilitating a move into the dual setting which will provide the most natural backdrop for the final part of the argument. (The word 'shattered' here has nothing to do with VC-dimension.) In the final step we shall describe a 'popular replacement' argument, which will complete the proof of the positive result by reducing C to 1. We shall present this step first for the cuboctahedron, and then give the argument in the general case.

Simplifying the decompositions
In this section, we shall use Theorem 3.12 as a tool in a 'second pass' through the arguments in Section 3. Our first lemma for this section shows that the property of C-well-definedness is sufficient to ensure that almost all of the cycles in B are popular (for a lower threshold of popularity). This is significant because it enables us to repeat the above process but eliminates the need for Lemma 3.4 and point decompositions. We will simply be able to reapply Lemma 3.9 to the subset B with a different meaning for the property 'good': now it will mean 'θ-popular', for some appropriate θ, rather than 'having many popular point decompositions'. Lemma 4.1. Let B be an n×n partial Latin square of density β. Suppose that the 2r-cycle completion operation in B is C-well-defined. Let δ, θ be such that β 2r δθ −1 > C. Then the proportion of 2r-cycles in B that are not θ-popular is at most δ.
Proof. By Lemma 3.1, the number of 2r-cycles in B is at least β 2r n 2r . Therefore, given a tuple (a 1 , . . . , a 2r−1 ) of labels, the number of 2r-cycles with first 2r − 1 labels (a 1 , . . . , a 2r−1 ) is on average at least β 2r n. However, since the 2r-cycle completion operation is C-well-defined we have further that the number of different a 2r completing a 2r-cycle labelling (a 1 , . . . , a 2r ) in B is at most C.
If a proportion greater than δ of 2r-cycles are not θ-popular, then by averaging there must be some (a 1 , . . . , a 2r−1 ) such that a proportion greater than δ of 2r-cycles starting with these labels are not θ-popular. But that means that there must be more than β 2r δθ −1 > C completions which is a contradiction to the assumption that the 2r-cycle completion operation in B is C-well-defined.
We are now ready to put together our technical lemmas to prove the following proposition, which will be the main tool in the proof of Theorem 2.8.
Proof. We begin by applying Theorem 3.12. This allows us to pass to a subset B 1 ⊂ A of density β 1 ≥ ǫ 5 2 −16 with the property that for each 2 ≤ r ≤ k the 2r-cycle completion operation in B 1 is C-well-defined, where C = (ǫ/4) −94k 3 . By Lemma 4.1 we see that a proportion greater than 1 − δ of 2r-cycles (for each 2 ≤ r ≤ k) in B 1 are θ-popular for any choice of θ < β 2k 1 δ/C. We now apply Lemma 3.9 again, but taking the property 'good' for a 2r-cycle to mean that the cycle is θ-popular. To do this, we take δ = (β 1 /2) 30k 2 . It is not hard to check that with this value of δ, we may take some θ ≥ (ǫ/8) 172k 3 .
Since B 2 is a subset of B 1 , the rectangle completion operation in B 2 is still C-well-defined. By Lemma 3.1 the number of rectangles in B 2 is at least β 4 2 n 4 , and since cuboctahedra are counted by pairs of rectangles with the same labelling, the cuboctahedron count is minimized when the number of rectangles with each labelling is as balanced as possible (by convexity). For each triple of labels (a, b, c) the number of possible completions d is at most C, so the number of cuboctahedra is at least (β 4 2 n/C) 2 n 3 = (β 8 2 /C 2 )n 5 ≥ (ǫ/4) 250k 3 n 5 as required.
We now observe that a cuboctahedron, which consists of two identically labelled rectangles, still corresponds to a cuboctahedron if we permute the coordinates of the points. Viewing A as a 3-uniform, linear hypergraph we may associate it with a partial Latin square by designating any particular coordinate to represent the 'label coordinate' and the other two to represent the row and column coordinates. The cuboctahedron count in A does not depend on which coordinate we choose.
This observation allows us to deduce Theorem 2.8 straightforwardly from Proposition 4.2. In Section 2.3 we did not state Theorem 2.8 precisely, so we do so here. Theorem 4.3. Fix ǫ ≤ 1/4 and k ≥ 2. Let A be a 3-uniform, linear hypergraph that contains at least ǫn 5 cuboctahedra. Then there exists a sequence A = A 0 ⊃ A 1 ⊃ . . . such that each A i has density at least α i (ǫ, k) and A i contains at least ǫ i (ǫ, k)n 5 cuboctahedra, and for each r = 2, . . . , k, every 2r-PF in A i is θ i (ǫ, k)-popularly decomposable in A i−1 in at least γ i (ǫ, k)n 2r different ways. Each of the parameters α i , ǫ i , θ i , γ i may be chosen to be at least ǫ 2 25i k 9i .
Proof. Given a 3-uniform, linear hypergraph B we define three different partially labelled n × n grids, B (1) , B (2) and B (3) . If (x, y, z) is a face of B, then we put the label z in position (x, y) of B (1) , y in position (z, x) of B (2) , and x in position (y, z) of B (3) .
Once we have chosen A i , we first consider A (1) i . We apply Proposition 4.2 to pass to a dense subset B (1) i in which all 2r-cycles have at least γ i n 2r different θ i -popular ring decompositions in A (1) i for 2 ≤ r ≤ k. We then 'rotate coordinates' to obtain the partially labelled grid B (2) i . Since rotation does not change the number of cuboctahedra, we are still in a position to apply Proposition 4.2 (albeit with different parameters) to obtain a subset C (2) i in which all 2r-cycles have at least γ ′ i n 2r different θ ′ i -popular ring decompositions in A (2) i for 2 ≤ r ≤ k. Finally we rotate again to obtain a set C (3) i , to which we apply Proposition 4.2 again to obtain a subset D (3) i in which all 2r-cycles have at least γ ′′ i n 2r different θ ′′ i -popular ring decompositions in A (3) i for 2 ≤ r ≤ k.
If the density of A i is α i and the number of cuboctahedra is ǫ i n 5 , then the density of B i is at least (ǫ i /8) 31 . Moreover, the cuboctahedron count of B i is at least (ǫ i /4) 250k 3 n 5 . Therefore, the density of C i is at least $\bigl((\epsilon_i/4)^{250k^3}/8\bigr)^{31}$ and the cuboctahedron count of C i is at least $\bigl((\epsilon_i/4)^{250k^3}/4\bigr)^{250k^3} n^5 \ge (\epsilon_i/4)^{2^{16}k^6} n^5$. This implies that the density of D i is at least ((ǫ i /4) 2 16 k 6 /8) 31 ≥ (ǫ i /4) 2 21 k 6 and the cuboctahedron count is at least ((ǫ i /4) 2 16 k 6 /4) 250k 3 n 5 ≥ (ǫ i /4) 2 24 k 9 n 5 .
Lastly, the parameters obtained in the third application of Proposition 4.2 are the smallest of the three, since the cuboctahedron counts of A i and B i are larger than that of C i . This gives us a subgraph D i of A i which is still dense, and has the property that any 2r-PF (for 2 ≤ r ≤ k) in D i is popularly decomposable in A i . We thus let A i+1 = D i .
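The coordinate rotation used in this proof can be illustrated numerically. The sketch below builds the three grids B (1) , B (2) , B (3) from the faces of a small linear hypergraph and checks that the number of ordered pairs of identically labelled rectangles is the same in each. The function names and the sample hypergraph are hypothetical, and the sketch assumes a particular counting convention: degenerate rectangles, and pairs consisting of the same rectangle twice, are included.

```python
from collections import Counter
from itertools import product


def grids_from_faces(faces):
    """Build the grids B(1), B(2), B(3) from faces (x, y, z) of a linear hypergraph."""
    b1 = {(x, y): z for x, y, z in faces}  # label z in position (x, y)
    b2 = {(z, x): y for x, y, z in faces}  # label y in position (z, x)
    b3 = {(y, z): x for x, y, z in faces}  # label x in position (y, z)
    return b1, b2, b3


def cuboctahedron_count(grid):
    """Count ordered pairs of identically labelled rectangles in `grid`.

    Assumption of this sketch: degenerate rectangles (repeated row or column)
    and pairs made of the same rectangle twice are included in the count.
    """
    rows = sorted({r for r, _ in grid})
    cols = sorted({c for _, c in grid})
    labellings = Counter()
    for r1, r2, c1, c2 in product(rows, rows, cols, cols):
        corners = [(r1, c1), (r1, c2), (r2, c2), (r2, c1)]
        if all(p in grid for p in corners):
            labellings[tuple(grid[p] for p in corners)] += 1
    return sum(m * m for m in labellings.values())


# Hypothetical example: a subset of the faces (x, y, x + 2y mod 7).
faces = [
    (x, y, (x + 2 * y) % 7)
    for x in range(7)
    for y in range(7)
    if (x * y + x) % 3 != 1
]
b1, b2, b3 = grids_from_faces(faces)
counts = [cuboctahedron_count(g) for g in (b1, b2, b3)]
assert counts[0] == counts[1] == counts[2]
```

With this convention the count is the number of assignments of twelve vertices to the cuboctahedron pattern such that all eight triples are faces, which makes the symmetry between the three coordinates explicit.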
To close this section, we shall briefly discuss what it means for a 2r-PF to be θ i (ǫ, k)-popularly decomposable in at least γ i (ǫ, k)n 2r different ways, and how this is going to be used in later sections. For this purpose, we need another definition.
Definition 4.4. Let C be a 2r-cycle x 1 y 1 . . . x r y r with x 1 and y 1 sharing a row. A shattered ring decomposition of C consists of a 2r-cycle x ′ 1 y ′ 1 . . . x ′ r y ′ r with x ′ 1 and y ′ 1 sharing a column, together with rectangles $x''_i u_i x'''_i v_i$ and $y''_i w_i y'''_i z_i$ for $i = 1, \dots, r$ (where u i shares a row with x ′′ i and w i shares a column with y ′′ i ) such that for each i, x i and x ′′ i have the same label, x ′ i and x ′′′ i have the same label, y i and y ′′ i have the same label, y ′ i and y ′′′ i have the same label, u i and z i have the same label, and w i and v i+1 have the same label.
The reason for this terminology is that one obtains a shattered ring decomposition if one begins with a ring decomposition and then replaces the back face and all the side faces by other cycles that have the same labellings. The conditions above are precisely the ones that will hold when we do this: a point in one cycle has to have the same label as a point in another cycle if before the 'shattering' they were the same point. Note that to say that a ring decomposition is popular is precisely to say that one can obtain many shattered ring decompositions from it in this way.
Although we have formulated this definition in grid terms, referring to cycles and labels, it has a natural description in hypergraph terms.
Definition 4.5. Let F be a 2r-PF. A shattered ring decomposition of F consists of a second 2r-PF F ′ with petals in the same vertex class, together with 2r 4-PFs, each of which has a petal equal to a petal of F and its opposite petal equal to the corresponding petal of F ′ , and each of which shares a petal with its predecessor and a petal with its successor, in such a way that the assignment of vertex classes to the inner vertices of each 4-PF is the reflection of the assignment of classes to its predecessor.
The hypergraph forms of shattered ring decompositions of a 4-PF, 6-PF and 8-PF are shown in Figure 9 (with the 4-PF, 6-PF and 8-PF not drawn: their petals will coincide with the degree-1 vertices in the diagrams).
If a 2r-PF F is θ i (ǫ, k)-popularly decomposable in at least γ i (ǫ, k)n 2r different ways, this means that there are at least γ i (ǫ, k)n 2r different ring decompositions of F into PFs that are θ i (ǫ, k)-popular. If a 2s-PF F ′ is θ i (ǫ, k)-popular, this means that there are at least θ i (ǫ, k)n different 2s-PFs that share all their petals with F ′ . This gives us the following lemma. Lemma 4.6. Let A be a tripartite, linear hypergraph with n vertices in each class. Let F be a 2r-PF which is θ i (ǫ, k)-popularly decomposable in A in at least γ i (ǫ, k)n 2r different ways. Then F has at least γ i θ 2r+1 i n 4r+1 different shattered ring decompositions.
Proof. As discussed above, there are at least γ i n 2r different ring decompositions of F into PFs which are θ i -popular. Each of these popular PFs can be replaced with one of θ i n different PFs sharing petals with the original, giving a total of (θ i n) 2r+1 further choices, from which the result follows.
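Written out, the count in this proof is a single product. The back face and the 2r side faces are the 2r + 1 popular PFs that can each be replaced independently:

```latex
\[
  \underbrace{\gamma_i\, n^{2r}}_{\text{popular ring decompositions of } F}
  \cdot
  \underbrace{(\theta_i\, n)^{2r+1}}_{\text{replacements of the back face and the } 2r \text{ side faces}}
  \;=\; \gamma_i\, \theta_i^{\,2r+1}\, n^{4r+1}.
\]
```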
Broadly speaking, the arguments in the next section will involve starting with a particular hypergraph H and repeatedly replacing 2r-PFs in H with shattered ring decompositions. Keeping track of the number of ways these replacements are possible will be achieved using Lemma 4.6.

The popular replacement argument
Let A be a partially labelled grid with many cuboctahedra. The next part of the argument describes how we pass to a dense subset of A in which there are no small, flappy structures. Since the details will get somewhat involved, it will be instructive to begin with the case of flappy cuboctahedra, which will be enough to make the general strategy clear.
For this stage of the argument, it is most natural and convenient to use the van Kampen representation. By interpreting Theorem 4.3 in this framework, we will be able to view our popular replacements as a kind of 'unfixing' process: we start with a fixed triangulated surface, and little by little we 'unfix' vertices in order to convert it into a variable surface, at each stage ensuring that the number of possibilities for the variable surface is within a constant of the trivial maximum, given the points that are still fixed. This idea will be explained in more detail in the next section.

Overview
Recall from Section 2.2.2 that we may regard a partial Latin square as a van Kampen complex: a simplicial complex built from oriented triangles that correspond to the filled points of the grid.
In this van Kampen complex, the 2r-PFs are 2r-gons triangulated using 2r triangles that each contain a single internal vertex. Given such a collection C of triangular faces, let F C be the corresponding hypergraph 2r-PF. If F C is θ-popularly decomposable in γn 2r different ways then Lemma 4.6 gives us γθ 2r+1 n 4r+1 different shattered ring decompositions of F C . Each of these shattered ring decompositions corresponds to a certain triangulated surface whose boundary coincides with the boundary of C. The boundary of C is a 2r-gon, and the patch of surface corresponding to a shattered ring decomposition of a 2r-PF consists of an inner 2r-gon connected to the outer 2r-gon with 2r edges between corresponding vertices, with the whole picture then triangulated by adding a new vertex to the center of each face -this is shown in Figure 10 for a 4-PF. (The 4-PF is not shown, apart from its boundary, which consists of the outer four edges in the diagram.)
The overall structure of our argument will be as follows. We start with a dense partial Latin square A, represented as a hypergraph. After applying Theorem 4.3 to create our sequence A = A 0 ⊃ A 1 ⊃ . . . , we shall fix some s and pick a particular small flappy structure H 0 (such as the flappy cuboctahedron) and consider the auxiliary graph on the faces of A s formed by joining two faces if they form the flaps of a copy of H 0 . If the maximum degree of this auxiliary graph is bounded then we may pass to a dense independent set, which corresponds to a dense subhypergraph that avoids any copies of the chosen flappy structure.
Otherwise we may find a vertex of large degree in the auxiliary graph, which corresponds to a face of A s that is contained in many different copies of H 0 , each with a different 'opposite flap'. Each of these copies corresponds to a copy of a certain van Kampen diagram K 0 with boundary of size 2 in the van Kampen complex of A s . Given one of these diagrams, we perform our unfixing process. Initially, we say that all edges are fixed, meaning that we have specified precisely one copy of K 0 . We then find a 2r-PF in this copy (which in this representation is better described as a triangulated 2r-gon) and use the popular decomposability obtained from Theorem 4.3 to replace it with a new triangulated surface, resulting in a larger van Kampen diagram K 1 that still has a boundary of length 2. The copies of K 1 obtained lie in A s−1 ⊃ A s . We obtain Ω(n 4r+1 ) such copies (the trivial maximum being n 4r+1 ). We describe the situation with this set of copies of K 1 that results from replacing our chosen 2r-PF in our chosen copy of K 0 by saying that the internal edges in the chosen 2r-PF are now unfixed, since they may differ across the collection of copies. Note that the number of fixed edges has decreased.
We may continue this process, choosing at each step a 2r-PF with some fixed internal edges from K i and using the popular decomposability to generate a larger collection of copies of a van Kampen diagram K i+1 that lies in A s−i−1 , with more edges unfixed. If s is chosen sufficiently large relative to the area of K 0 then we may proceed until we obtain a collection C of copies of some diagram K t in which the two boundary edges are fixed but every edge incident to a vertex on the inside of the diagram is unfixed. One of the boundary edges corresponds to our initial vertex of high degree in the auxiliary graph. By repeating this process for each choice of neighbour of our chosen vertex from that auxiliary graph, we obtain many different collections of copies of K t , each of which share one of the two boundary edges. By taking the union of all of these collections, we end up violating the trivial upper bound on the maximum possible number of copies of K t in a van Kampen complex, giving us our contradiction.
The next sections will expand on the details required for this argument. As promised earlier, we will give a detailed account of the argument when H 0 is taken to be the flappy cuboctahedron before tackling the necessary generalizations, but before we embark on this it will be necessary to work out the trivial maximum for the number of copies of a given van Kampen diagram with a given set of fixed edges in a dense van Kampen complex. The main task of this section will then be to verify that during the unfixing process, the number of copies we obtain is always within a constant of the appropriate trivial maximum, so that in particular this is the case when we reach the unfixed van Kampen diagram K t , which allows us to obtain our desired contradiction.

The maximum number of copies of a partially fixed van Kampen diagram
Define a partially fixed van Kampen diagram to be the object obtained by taking a van Kampen diagram (of the special kind considered in this paper), forgetting the labels on the edges to obtain the underlying triangulated surface, then colouring some edges red. The red edges will be called unfixed. We call a face unfixed if it contains at least one unfixed edge. Edges that are not unfixed are fixed.
If (e 1 , . . . , e r ) is an ordering of the fixed edges of a van Kampen diagram K, L is a van Kampen complex, and (a 1 , . . . , a r ) is a sequence of labels of edges in L (which represent rows, columns, or labels in the partial Latin square from which L is built) then a copy of K in L is a homomorphism φ : K → L that takes each e i to an edge of L that is labelled a i . Less formally, it is a copy of K inside L with given assignments for the fixed edges.
By the trivial maximum number of copies of a partially fixed van Kampen diagram K we mean the maximum possible number of copies of K in a van Kampen complex associated with a dense partial Latin square A with some given choice of labels for the fixed edges. Lemma 5.1. Let K be a partially fixed van Kampen diagram in which the faces form a simply connected set, and the boundary edges are fixed. Then the trivial maximum number of copies of K is at most n V I where V I is the number of internal vertices -that is, vertices that do not lie on the boundary.
Proof. We are free to assume that all edges which are not boundary edges are unfixed, since the fewer edges are fixed the fewer restrictions we have on our copies of K.
The proof is by induction on the number of faces of K. The result is trivial when K is a single face with all three edges fixed. Now let K be a partially fixed van Kampen diagram with at least two faces, in which all edges are unfixed except the boundary edges. Suppose first that there is a face that has two boundary edges. Then the third edge must be internal. The label of this edge is determined by the labels on the two boundary edges. If we remove this face and fix its internal edge, then we obtain a van Kampen diagram that still has $V_I$ internal vertices, and hence at most $n^{V_I}$ copies, so we are done.
If K does not have such a face, then we split into two further cases. Suppose first that K has an internal vertex: that is, a vertex that does not lie on the boundary. Then there must be an internal vertex that is joined by an edge to a boundary vertex w. The neighbours of w form a path from its predecessor along the boundary to its successor. Let v be the first internal vertex along this path. Then v is joined to w and to its predecessor, which gives us a face that has one boundary edge and two internal edges. We can choose the label for one of the internal edges in at most n ways, and that determines the label for the other. Having done so, if we remove the face and fix the two internal edges, we obtain a simply connected van Kampen diagram $K'$ with one less internal vertex. For each of the at most n choices of labelling for the newly fixed edges we get at most $n^{V_I - 1}$ copies of $K'$, by the inductive hypothesis, so the number of copies of K is at most $n^{V_I}$, as required.
The final case is where K does not have any internal vertices or any faces with two boundary edges. This case cannot in fact occur. Indeed, if it did, then note that the number of vertices would equal the number of boundary edges, and the number of faces would be at most the number of internal edges (since each face would contain at least two internal edges and each internal edge would be contained in two faces). It would follow that V − E + F ≤ 0, contradicting Euler's formula (which would give V − E + F = 1, since we are not counting the external face as a face).
A simple example that is important for us is a 2r-PF. As we have mentioned, this becomes a triangulated 2r-gon with a single internal vertex in the middle, so if the boundary is fixed, then we are left with at most n possibilities. In the grid picture, this corresponds to the fact that if we know the labels of a 2r-cycle, then the first point of the cycle (which can be chosen in at most n ways) determines the rest of the cycle if it exists.
An even more important example is where K is taken to be the van Kampen diagram of a shattered ring decomposition of a 2r-PF, again with the boundary cycle fixed. This bounds the maximum possible number of van Kampen diagrams of shattered ring decompositions of a given 2r-cycle, since all such diagrams share the boundary of the original 2r-PF. The number of internal vertices is 4r + 1, since the opposite 2r-cycle contributes 2r vertices, its central vertex contributes 1 vertex, and each of the 2r triangulated rectangles has a further internal vertex in the middle. Thus, Lemma 5.1 gives an upper bound of $n^{4r+1}$ shattered ring decompositions of a given 2r-PF. But Lemma 4.6 gives us $\Omega(n^{4r+1})$ such decompositions, so we see again that our machinery from the previous section provides within a constant factor of the maximum number of such objects.

The flappy cuboctahedron case
The aim of this section is to apply the results of the previous sections in order to pass to a dense subset of our given partial Latin square A in which there are no flappy cuboctahedra.
Recall that if B is a partial Latin square, we may define an auxiliary graph G(B) on the set of rows, columns and labels of B by joining vertices x and y if there is a flappy cuboctahedron in B with its petals at the vertices x and y. In van Kampen terms this is telling us that there is an octahedron with one of its edges 'slit' into two, with those boundary edges corresponding to x and y. In grid terms, what an edge looks like depends on the types of the vertices x and y, but if, for example, they are label vertices corresponding to the labels d and d ′ , then there will be an edge between them if there is a rectangle with labels a, b, c, d and another rectangle with labels a, b, c, d ′ . As we have already discussed, if we can prove that this auxiliary graph is of bounded degree, then we will be able to pass to a dense independent subset of the vertices and thereby eliminate all flappy cuboctahedra.
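To make the grid description of the auxiliary graph concrete, the following Python sketch computes only the label-vertex edges: it joins d and d′ whenever two rectangles share the labels a, b, c on three corners but differ on the fourth. The function name, the dictionary encoding of the square, the order in which the corners are read, and the two 5 × 5 test squares are our own illustrative assumptions and are not taken from the text.

```python
from collections import defaultdict
from itertools import combinations, permutations


def label_auxiliary_edges(square):
    """Return the set of label pairs (d, d') joined in the auxiliary graph.

    `square` maps (row, column) -> label and may be partial.  Assumption:
    a rectangle is read as (top-left, top-right, bottom-left, bottom-right),
    giving labels (a, b, c, d).
    """
    rows = sorted({r for r, _ in square})
    cols = sorted({c for _, c in square})
    completions = defaultdict(set)  # (a, b, c) -> all fourth labels d seen
    for r1, r2 in permutations(rows, 2):
        for c1, c2 in permutations(cols, 2):
            corners = [(r1, c1), (r1, c2), (r2, c1), (r2, c2)]
            if all(p in square for p in corners):
                a, b, c, d = (square[p] for p in corners)
                completions[(a, b, c)].add(d)
    edges = set()
    for ds in completions.values():
        edges.update(combinations(sorted(ds), 2))
    return edges


# Addition table of Z_5: the fourth corner is forced, so no edges appear.
cyclic = {(r, c): (r + c) % 5 for r in range(5) for c in range(5)}

# A hypothetical Latin square of order 5 containing a 2 x 2 subsquare.
twisted_rows = [
    [0, 1, 2, 3, 4],
    [1, 0, 3, 4, 2],
    [2, 3, 4, 0, 1],
    [3, 4, 1, 2, 0],
    [4, 2, 0, 1, 3],
]
twisted = {(r, c): twisted_rows[r][c] for r in range(5) for c in range(5)}
```

In a group table the fourth corner of a rectangle is determined by the other three, so the label part of the graph is empty; in the second square the rectangles on rows 0, 1 and columns 0, 1 and on rows 2, 4 and columns 3, 4 both begin with the labels 0, 1, 1 but end in 0 and 3 respectively.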
Our aim is to achieve this by taking B to be the set $A_s$ (for appropriately chosen s) given to us by Theorem 4.3. The rough idea is that if we fix a row, column or label x, then each edge $xy_i$ in the auxiliary graph gives rise to a large number of flappy structures, or equivalently van Kampen diagrams with boundary word $xy^{-1}$, which we build from the initial slit-octahedron van Kampen diagram by unfixing all the interior vertices using popular replacements that are guaranteed by the theorem. If there are too many edges $xy_i$, this ends up contradicting Lemma 5.1. We now give the details.
Lemma 5.2. Let A be an n × n partial Latin square with at least $\epsilon n^5$ cuboctahedra. Then A has a subset B of density at least $\epsilon^{2^{172}}$ such that the maximum degree in the graph G(B) is at most $\epsilon^{-2^{176}}$.
Proof. We begin with a specific flappy cuboctahedron, corresponding to a specific van Kampen diagram (as pictured in Figure 11). From this specific diagram we shall build a large collection of different copies of a more complicated van Kampen diagram that we obtain by performing popular replacements. Note that there are four internal vertices in the van Kampen diagram. Each time we perform a popular replacement, we shall choose an internal vertex and replace the 2r-gon that contains it by a more complicated surface that has the same boundary (corresponding to a shattered ring decomposition) and has all its internal edges unfixed. Once we have done this, we will obtain a van Kampen diagram that has the same boundary as the original one and no fixed internal edges. Moreover, the number of ways we can do this will be within a constant factor of the trivial maximum number of such diagrams.
First we apply Theorem 4.3 to obtain a sequence $A = A_0 \supset A_1 \supset \dots \supset A_4$ with the property that $A_i$ has density $\alpha_i(\epsilon)$, and for each r = 2, ..., 4 we have that every 2r-PF in $A_i$ is $\theta_i(\epsilon)$-popularly decomposable in $A_{i-1}$ in at least $\gamma_i(\epsilon) n^{2r}$ different ways. The parameters $\alpha_i(\epsilon)$, $\theta_i(\epsilon)$ and $\gamma_i(\epsilon)$ are all at least $\epsilon^{2^{25i} 4^{9i}} = \epsilon^{2^{43i}}$.
Now suppose that the auxiliary graph $G(A_4)$ of $A_4$ has a vertex d of degree at least M. Without loss of generality, let us assume that the vertex class that contains this vertex corresponds to the set of labels in the grid representation. This means that we can find a set $\{d_1, \dots, d_M\}$ of distinct labels such that for each j there exists a label-flappy cuboctahedron where the flaps have label vertices d and $d_j$.
We think of each of these specific label-flappy cuboctahedra as having all of their vertices fixed, since they are entirely specified. We will now use the popular decompositions to unfix the vertices, giving a growing collection of copies of a certain van Kampen diagram, that all share their fixed edges but may differ on the unfixed edges.
Let us take a specific label-flappy cuboctahedron such as the one shown in Figure 11. We now select a 4-PF by choosing some internal vertex. For instance, we may select the bottom internal vertex, which is incident to the edges labelled $x_3$, $y_3$, $x_4$ and $y_4$. This 4-PF is represented by the four faces in the bottom half of the diagram, and is $\theta_4$-popularly decomposable in $A_3$ in at least $\gamma_4(\epsilon) n^4$ different ways. This means that there are at least $\gamma_4 \theta_4^5 n^9$ shattered ring decompositions of this 4-PF that live inside the set $A_3$.
Figure 11: A van Kampen diagram corresponding to a label-flappy cuboctahedron with flaps labelled d and $d_j$. The boundary word of the diagram is $d d_j^{-1}$. The labels $x_i$ correspond to rows in the grid representation, and the labels $y_i$ correspond to columns.
Each of these shattered ring decompositions has the same boundary as the original 4-PF they replace. Therefore, by removing the original 4-PF from the chosen van Kampen diagram and replacing it with a choice of shattered ring decomposition, we obtain a new van Kampen diagram $K_1$, pictured in Figure 12. For each choice of shattered ring decomposition, we have a copy of $K_1$ in the van Kampen complex given by $A_3$, and we call the collection of these copies $\mathcal{K}_1$. Since these van Kampen diagrams differ (for different choices of shattering) only in the edges which are coloured red, we say that the red edges are unfixed and the others are fixed.
At this point we note that $\mathcal{K}_1$ consists of a collection of $\gamma_4 \theta_4^5 n^9$ copies of $K_1$ that all share the eight fixed edges shown in black in Figure 12. Observe that there are nine internal red vertices, so Lemma 5.1 tells us that this is within a constant factor $\gamma_4 \theta_4^5$ of the trivial maximum possible size of such a collection. The next step is to select another 2r-PF by choosing another internal vertex, this time of $K_1$. Since we aim to unfix all but the boundary edges, we must select a 2r-PF that contains unfixed edges on the inside. We can do this by picking all the faces of $K_1$ that contain some given internal vertex that is incident to at least one unfixed edge. For instance, we might take the leftmost internal black vertex in Figure 12. This gives us a 6-PF F, since this vertex is contained in six faces of $K_1$. Let $K_2$ be the diagram obtained by replacing F with a shattered ring decomposition, and let $\mathcal{K}_2$ be the collection of copies of $K_2$ obtained by taking the union over all copies in $\mathcal{K}_1$ of the collections obtained by replacing F with one of the shattered ring decompositions.
Figure 12: The van Kampen diagram $K_1$ obtained after the first popular replacement in a flappy cuboctahedron. The shattered ring decomposition is represented with the red part of the diagram. All labels have been omitted for simplicity.
It is already challenging to draw K 2 in detail, and we shall see shortly that it is not important to track the precise structure of the van Kampen diagrams that we obtain at each step. Nevertheless, we include an illustration of K 2 in Figure 13 to help clarify the process.
Since any given 6-PF is $\theta_3$-popularly ring decomposable in $A_2$ in $\gamma_3 n^6$ different ways, we may replace F with one of $\gamma_3 \theta_3^7 n^{13}$ different shattered ring decompositions. We now claim that $|\mathcal{K}_2| = \gamma_3 \theta_3^7 n^{13} |\mathcal{K}_1|$, but to verify this we must ensure that each copy of $K_2$ in $\mathcal{K}_2$ is counted at most once.
Suppose that K, K′ are copies of $K_1$ in $\mathcal{K}_1$ that, following replacements of their respective copies of F, both give the same element of $\mathcal{K}_2$. Then K and K′ must agree on all but the internal edges of F. However, we chose F in such a way that one of the internal edges of F is fixed and thus shared between K and K′. But since all the edges of the van Kampen diagram for a 2r-PF are fixed once the boundary edges and a single internal edge are fixed (by the linearity of the underlying hypergraph), we see that K = K′.
Figure 13: The van Kampen diagram $K_2$ obtained after the second replacement. The fixed part in which all members of $\mathcal{K}_2$ agree is shown in black, and the unfixed part in red. All labels and directions have been omitted for simplicity.
Therefore we do not overcount, and the size of $\mathcal{K}_2$ is given simply by multiplying the size of $\mathcal{K}_1$ by the number of shattered ring decompositions for F. Therefore $|\mathcal{K}_2| = \gamma_3 \theta_3^7 n^{13} |\mathcal{K}_1| = \gamma_3 \gamma_4 \theta_3^7 \theta_4^5 n^{22}$. Again, it is easy to see that this is within a constant factor of the trivial maximum, since a shattered ring decomposition of a 6-PF has thirteen internal vertices, so the number of internal red vertices after the second unfixing is 22 (as the sceptical reader can verify from Figure 13).
The remaining two steps are similar. At the next step, we can replace the 8-PF around the rightmost internal black vertex in Figure 13 to create a collection $\mathcal{K}_3$ of copies of $K_3$, the van Kampen diagram obtained when the chosen 8-PF is replaced with a shattered ring decomposition with all internal edges unfixed. By Lemma 4.6, we will introduce a new factor of $\gamma_2 \theta_2^9 n^{17}$ into the size of the collection, so that $|\mathcal{K}_3| = \gamma_2 \theta_2^9 n^{17} |\mathcal{K}_2|$. But we will also have added 8 + 8 + 1 = 17 new internal red vertices, so the trivial maximum increases by a factor of $n^{17}$. Therefore, $|\mathcal{K}_3|$ is within a factor $\gamma_2 \gamma_3 \gamma_4 \theta_2^9 \theta_3^7 \theta_4^5$ of the maximum possible.
In $K_3$ there is one remaining internal vertex that is incident to fixed edges. This vertex is the internal vertex of an 8-PF in $K_3$, so we may finish by replacing this 8-PF with a shattered ring decomposition to obtain a collection $\mathcal{K}_4$ of copies of $K_4$. In $K_4$, only the two boundary edges are fixed. As before, Lemma 4.6 gives us that $|\mathcal{K}_4| = \gamma_1 \theta_1^9 n^{17} |\mathcal{K}_3|$, and therefore Lemma 5.1 tells us that $|\mathcal{K}_4|$ is within the constant factor $\gamma_1 \gamma_2 \gamma_3 \gamma_4 \theta_1^9 \theta_2^9 \theta_3^7 \theta_4^5$ of the maximum possible.
Drawings of the full structure of K 3 and K 4 would be too complicated to be illuminating, but we include Figure 14, which gives a global view of the replacement sequence we have performed. In this figure we show K 1 , K 2 , K 3 and K 4 but instead of drawing all the unfixed edges, we simply indicate where they are with red hashing.
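The bookkeeping across the four replacements can be tallied mechanically. The short Python sketch below (the function names are ours, introduced only for illustration) records, for a sequence of replaced 2r-PFs, the cumulative number of internal red vertices, which is also the exponent of n in the size of the collection at that stage.

```python
def shattered_internal_vertices(r):
    """Internal vertices of a shattered ring decomposition of a 2r-PF:
    2r on the opposite cycle, 1 central vertex, 2r rectangle centres."""
    return 2 * r + 1 + 2 * r


def replacement_ledger(rs):
    """Cumulative count of internal red vertices after each replacement."""
    ledger, total = [], 0
    for r in rs:
        total += shattered_internal_vertices(r)
        ledger.append(total)
    return ledger


# The cuboctahedron case replaces a 4-PF, a 6-PF and then two 8-PFs.
cuboctahedron_ledger = replacement_ledger([2, 3, 4, 4])
```

For the sequence r = 2, 3, 4, 4 the ledger reads 9, 22, 39, 56, in agreement with the counts of nine and twenty-two internal red vertices quoted above and with the two further factors of $n^{17}$.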
Recall that the family $\mathcal{K}_4$ was obtained by starting with a given label-flappy cuboctahedron, which yielded a van Kampen diagram with boundary labelled d and $d_j$. By performing this sequence of popular replacements for each choice of j ∈ {1, ..., M} we obtain M different collections of copies of the same van Kampen diagram $K_4$. Each of these collections has a fixed boundary, but one of the two fixed boundary edges differs from collection to collection. By taking the union over all these collections, we obtain a final collection $\mathcal{K}$ of copies of $K_4$ in which only the label on one of the two boundary edges is fixed.
Now we need an upper bound for the maximum number of copies of the partially fixed van Kampen diagram $K_4'$, which is the same as $K_4$ except that only one of the two boundary edges is fixed. We cannot immediately apply Lemma 5.1, since the boundary is not entirely fixed. But we can modify $K_4'$ by attaching one new triangular face onto the unfixed boundary edge and fixing the other two edges of this face. We thus obtain a new partially fixed van Kampen diagram $K_4''$ with a boundary consisting of three fixed edges, and every internal edge is unfixed. The maximum number of copies of $K_4''$ is at most the maximum number of copies of $K_4'$, since adding extra fixed edges cannot increase the number. We can now apply Lemma 5.1 to $K_4''$, which has the same number of internal vertices as $K_4$. Therefore the maximum number of copies of $K_4''$ is the same as that of $K_4$, and hence the maximum number of copies of $K_4'$ is at most that of $K_4$.
But the size of the collection $\mathcal{K}$ is $M|\mathcal{K}_4|$, and $|\mathcal{K}_4|$ is within a constant factor $\gamma_1 \gamma_2 \gamma_3 \gamma_4 \theta_1^9 \theta_2^9 \theta_3^7 \theta_4^5$ of the maximum possible. Therefore if
$$M \gamma_1 \gamma_2 \gamma_3 \gamma_4 \theta_1^9 \theta_2^9 \theta_3^7 \theta_4^5 \ge M \epsilon^{2^{43}} \epsilon^{2^{86}} \epsilon^{2^{129}} \epsilon^{2^{172}} \epsilon^{9 \cdot 2^{43}} \epsilon^{9 \cdot 2^{86}} \epsilon^{7 \cdot 2^{129}} \epsilon^{5 \cdot 2^{172}} \ge M \epsilon^{2^{176}} > 1$$
then we have our contradiction. Therefore we may take $B = A_4$, which has density at least $\alpha_4 \ge \epsilon^{2^{172}}$.
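The final chain of inequalities reduces to a statement about exponents of ǫ, which can be checked with exact integer arithmetic. The Python sketch below assumes only the lower bound $\epsilon^{2^{25i} 4^{9i}}$ on $\gamma_i$ and $\theta_i$ and the powers of $\theta_i$ used in the four replacements; the variable names are ours.

```python
def parameter_exponent(i):
    """Exponent e_i such that gamma_i and theta_i are at least eps ** e_i."""
    return 2 ** (25 * i) * 4 ** (9 * i)


# Power of theta_i appearing in the final constant factor, indexed by i.
theta_powers = {1: 9, 2: 9, 3: 7, 4: 5}

# gamma_i contributes one copy of e_i and theta_i contributes theta_powers[i].
total_exponent = sum(
    (1 + theta_powers[i]) * parameter_exponent(i) for i in range(1, 5)
)

# Since 0 < eps <= 1, eps ** total_exponent >= eps ** (2 ** 176)
# exactly when total_exponent <= 2 ** 176.
fits_under_bound = total_exponent <= 2 ** 176
```

The dominant term is $6 \cdot 2^{172}$, coming from $\gamma_4 \theta_4^5$, so the total exponent sits comfortably below $2^{176}$.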
Our 'removal lemma' for flappy cuboctahedra follows from this lemma.
Figure 14: The sequence of four popular replacements from the proof of Lemma 5.2. Starting with the van Kampen diagram of a specific flappy cuboctahedron (with all black edges labelled), we progressively unfix edges. Our unfixing process modifies the triangulation, and we represent the modified part with the red hatching (for example, the top figure represents $K_1$, shown in full detail in Figure 12). All edges in the triangulation represented by the red hatching are unfixed.
Theorem 5.3. If A is an n × n partial Latin square containing at least $\epsilon n^5$ cuboctahedra, then there exists a subset B of A of density at least $\epsilon^{2^{179}}$ that contains no flappy cuboctahedra.
Proof. We apply Lemma 5.2. This gives us a subset A′ of A of density at least $\epsilon^{2^{172}}$ such that G(A′) has maximum degree at most $\epsilon^{-2^{176}}$. We now pick a maximal independent set I of label vertices from G(A′) as follows. We first pick the vertex v corresponding to the label that appears most frequently in A′ and add it to I. Then we discard all vertices in the neighbourhood of v in G(A′) and repeat, picking at each stage the remaining vertex corresponding to the most popular label in A′. Since the maximum degree of G(A′) is at most $\epsilon^{-2^{176}}$, we end up picking at least $\epsilon^{2^{176}} n$ vertices from G(A′), corresponding to labels accounting for at least a fraction $\epsilon^{2^{176}}$ of A′. Let $A_1$ be the subset of A′ consisting of points with label in I. Then $A_1$ has density at least $\epsilon^{2^{177}}$, and inside $A_1$ there is no violation of the label quadrangle condition.
Since $G(A_1)$ also has maximum degree at most $\epsilon^{-2^{176}}$, we may similarly choose an independent set of at least $\epsilon^{2^{176}} n$ row vertices from $G(A_1)$ accounting for at least a fraction $\epsilon^{2^{176}}$ of $A_1$. This gives us a set $A_2$ of density at least $\epsilon^{2^{178}}$ with no violation of either the label quadrangle condition or the row quadrangle condition.
Finally, we choose an independent set of at least $\epsilon^{2^{176}} n$ column vertices from $G(A_2)$ accounting for the greatest fraction of points of $A_2$. This gives us a set $A_3 = B$ of density at least $\epsilon^{2^{179}}$ with no violations of row, column or label quadrangle conditions. In other words, B contains no flappy cuboctahedra.

The general case
Almost all of the complexity of the general case is contained in the detailed account given for the flappy cuboctahedron in the previous section. What remains is to describe how the replacement steps work in general, so that we can see that the argument for the cuboctahedron generalizes straightforwardly to arbitrary flappy structures.
The outline of the approach is as above. Given a van Kampen diagram K with boundary of length 2 and a subset A of a Latin square, we shall define the auxiliary graph G(A, K) on the vertex set of A (corresponding to rows/columns/labels), with an edge between d and d′ if, when the vertex class containing both d and d′ is chosen as the label coordinate, we have a flappy copy of K with boundary word $d d'^{-1}$.
The main lemma will show that we may pass to a dense subset B of A such that G(B, K) has bounded degree for each K of bounded size. If this is the case, then the elimination of flappy structures is straightforward: as in the proof of Theorem 5.3, we will simply pass down to independent sets in the graphs G(B, K) in such a way that we avoid discarding too much of B.
The proof of the main lemma is similar to that of Lemma 5.2. Given M specific copies of the van Kampen diagram K with boundary edges labelled d and $d_j$ (for j = 1, ..., M), we shall unfix the edges by using popular decompositions of constituent 2r-gons that surround internal vertices. At each stage we have a collection of partially fixed van Kampen diagrams, and as we unfix more edges the size of our collection grows. We aim to show that once all edges incident to internal vertices are unfixed, we will have more than the trivial maximum number of copies of a certain partially fixed van Kampen diagram in the van Kampen complex of B unless M is bounded above by some constant that is independent of n (which will have a power dependence on $\epsilon$, with the exponent depending on the number of faces of K).
Lemma 5.4. Let A be an n × n partial Latin square with at least $\epsilon n^5$ cuboctahedra. Let b ≥ 8 be fixed. Then we can pass to a subset B of density at least $\epsilon^{2^{34b} b^{9b}}$ such that for each van Kampen diagram K with at most b faces and a boundary of length 2, the maximum degree in the graph G(B, K) is at most $\epsilon^{-b^{15b}}$.
Proof. We begin the proof, as in the proof of Lemma 5.2, by applying Theorem 4.3, which we do with k = 2b. We obtain a sequence $A = A_0 \supset A_1 \supset \dots$ with the property that $A_i$ is $\alpha_i(\epsilon, 2b)$-dense and for each r = 2, ..., k we have that every 2r-PF in $A_i$ is $\theta_i(\epsilon, 2b)$-popularly decomposable in $A_{i-1}$ in at least $\gamma_i(\epsilon, 2b) n^{2r}$ different ways, where each of $\alpha_i$, $\gamma_i$ and $\theta_i$ is at least $\epsilon^{2^{25i} (2b)^{9i}} = \epsilon^{2^{34i} b^{9i}}$. Our set B will be $A_b$, which has density at least $\epsilon^{2^{34b} b^{9b}}$. Note that the number of internal vertices of any van Kampen diagram with at most b faces is at most 3b/4 < b, since each internal vertex is contained in at least four faces and each face contains at most three internal vertices. Now let K be a van Kampen diagram with at most b faces. Initially, we may view all the labels on the edges as fixed.
Our goal is to unfix all edges except the boundary edges. As before, our unfixing steps involve picking vertices from the diagram, removing all of their incident faces, and re-triangulating the resulting 2r-gonal hole using the van Kampen diagram of a shattered ring decomposition of the 2r-gon in which all internal edges are unfixed. Starting with $K = K_0$, this process will lead us to construct a sequence $K = K_0, K_1, K_2, \dots$ of partially fixed van Kampen diagrams and associated collections $\mathcal{K}_i$ of copies of these diagrams, where the copies in the family $\mathcal{K}_i$ live in the set $A_{s-i}$.
In the previous section, we performed the replacements one by one and ensured at each stage that the size of $\mathcal{K}_i$ is within a constant factor of the maximum possible. For the general case, it will be simplest to perform the latter check at the end, once all replacements have been made and we have reached a partially fixed van Kampen diagram $K_s$ in which all edges incident to internal vertices are unfixed.
At each stage, we pick any vertex v inside $K_i$ (not on the boundary) such that v is incident to fixed edges. We then consider the faces containing v: there are $2r_i$ of them, giving a $2r_i$-PF (the number must be even because the van Kampen diagrams are built from oriented triangles). We use popular decomposability to replace this $2r_i$-PF with a shattered ring decomposition with unfixed internal edges, giving us $K_{i+1}$. As before, $\mathcal{K}_{i+1}$ is obtained from $\mathcal{K}_i$ by choosing each possible replacement for each member of $\mathcal{K}_i$. As in the cuboctahedron case, the size of $\mathcal{K}_{i+1}$ will be at least the size of $\mathcal{K}_i$ times the minimum number of different ring decompositions of the $2r_i$-PF in the set $A_{s-i-1}$. We do not overcount, since if two copies of $K_i$ agree on all edges apart from those incident to v then, since v is also incident to a fixed edge, they must agree everywhere.
At each stage we reduce the number of internal vertices incident to fixed edges by exactly one, so the number of unfixing steps that we need to perform is equal to the number of internal vertices of the van Kampen diagram K, which is at most 3b/4. Moreover, the maximum degree of a vertex in K is bounded above by b and this increases by at most two with each popular replacement. Thus, the maximum value of r for which we ever perform a popular replacement of a 2r-PF is bounded above by (b + 2(3b/4))/2 ≤ 2b = k.
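The arithmetic behind the choice k = 2b is elementary, and the following Python sketch verifies it with exact rationals for a range of values of b. The function name is ours, and the check covers only the inequalities stated in this paragraph and the identity $2^{25i}(2b)^{9i} = 2^{34i} b^{9i}$ used for the parameters.

```python
from fractions import Fraction


def general_case_bounds(b):
    """Return (k, bound on the number of unfixing steps, bound on r)
    for van Kampen diagrams with at most b faces."""
    k = 2 * b
    max_steps = Fraction(3 * b, 4)       # at most 3b/4 internal vertices
    max_r = (b + 2 * max_steps) / 2      # degree grows by at most 2 per step
    return k, max_steps, max_r


def exponent_identity_holds(i, b):
    """Check that 2^(25i) * (2b)^(9i) equals 2^(34i) * b^(9i)."""
    return 2 ** (25 * i) * (2 * b) ** (9 * i) == 2 ** (34 * i) * b ** (9 * i)
```

For b = 8 this gives at most 6 unfixing steps and r at most 10, well within k = 16.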
We now consider the van Kampen diagram $K_s$ that we get at the end of this process. Each time we do a popular replacement of a $2r_i$-PF, we increase the size of the family by a factor $\gamma_{k+1-i} \theta_{k+1-i}^{2r_i+1} n^{4r_i+1}$, by Lemma 4.6. So at the end of the process, the size of the collection $\mathcal{K}_s$ is at least
$$\prod_{i=1}^{s} \gamma_{k+1-i} \theta_{k+1-i}^{2r_i+1} n^{4r_i+1}.$$
The number of internal vertices of $K_s$ is $\sum_{i=1}^{s} (4r_i + 1)$, since at each step of the unfixing process we replace one internal vertex by the $4r_i + 1$ internal vertices of a shattered ring decomposition. So, by Lemma 5.1, the maximum possible size of a collection of copies of $K_s$ that agree on the boundary edges is $\prod_{i=1}^{s} n^{4r_i+1}$. Therefore $|\mathcal{K}_s|$ is within a constant factor of the maximum possible. Indeed, the constant factor $\eta$ is bounded below by the product $\prod_{i=1}^{s} \gamma_{k+1-i} \theta_{k+1-i}^{2r_i+1}$ of the factors introduced by the replacements.
As before, we may repeat the same unfixing process (in the same order) for each different choice of label $d_i$ (i = 1, ..., M). Each different choice gives us a collection of van Kampen diagrams with fixed boundary labels. The union of these collections is $\mathcal{K}$, a collection of copies of the partially fixed van Kampen diagram $K'$ obtained by unfixing the appropriate boundary edge of $K_s$. By the same trick as in the previous section, we can apply Lemma 5.1 to deduce that the maximum possible number of copies of $K'$ is in fact the same as the maximum possible number of copies of $K_s$, and therefore we obtain a contradiction if $M\eta > 1$. Therefore $M \le \epsilon^{-b^{15b}}$, which proves the lemma.
We are finally ready to prove the main theorem, which we restate here using precise language.
Theorem 5.5. Let A be a tripartite, linear hypergraph with vertex classes of size n containing at least $\epsilon n^5$ cuboctahedra. Then for any positive integer b there exists a subset B of A with at least $\epsilon^{b^{17b}} n^2$ faces containing no flappy spherical hypergraphs with fewer than b faces.
Proof. We apply Lemma 5.4 to obtain a subset A′ such that the graph G(A′, H) has maximum degree at most $\epsilon^{-b^{15b}}$ for any flappy spherical hypergraph H with fewer than b faces. The goal is now to pass to subsets $V_i$ of each vertex class with the property that $G_i(A_V, H)$ contains no edges for any choice of a flappy, spherical hypergraph H with fewer than b faces. In order to do this, we introduce the graph G(A′, b), which is the union of all graphs $G_i(A', H)$ where H is a flappy, spherical hypergraph with at most b faces. Since a flappy, spherical hypergraph has 3b/2 + 1 vertices, the number of different flappy, spherical hypergraphs with at most b faces is at most $(3b/2 + 1)^{b+1}$, so G(A′, b) has maximum degree at most $(3b/2 + 1)^{b+1} \epsilon^{-b^{15b}}$. Now, as in the proof of Theorem 5.3, we select our subsets $V_i$ by passing to independent sets in G(A′, b) in such a way that the number of faces in the induced subgraph $A_V$ is maximised. Doing this gives us a subgraph B which is guaranteed to have at least $\epsilon^{b^{17b}} n^2$ faces, and which contains no flappy spherical hypergraphs with fewer than b faces.
Remark 5.6. Of course, Theorem 5.5 implies a version of Theorem 5.3, although the bound is somewhat worse because Theorem 5.5 uses crude estimates for the number of replacements required (whereas in the proof of Theorem 5.3 we determine an exact sequence of four replacements for the flappy cuboctahedron, and determine each r i required).

Conclusion and further questions
We have shown that a Latin square (or more generally a partially labelled grid such that no label occurs more than once in the same row or column) that has many cuboctahedra, or equivalently that satisfies the 1% quadrangle condition, can be restricted to a dense subset of the grid that contains no flappy spherical hypergraphs of size less than k. By the correspondence between proofs of inconsistency and van Kampen diagrams, we can reframe our main result as follows.
Corollary 6.1. Let A be an n × n partial Latin square that contains at least $\epsilon n^5$ pairs of equally labelled rectangles. Then for any positive integer b there exists a subset B of A of density at least $\epsilon^{b^{17b}}$ such that there is no proof of length less than b that B does not embed into the multiplication table of a group.
In the first two appendixes, we show that any such binary operation can be restricted to a further dense subset that comes from a rough approximate group in the manner explained in Section 1.5. This can be used to deduce the following further corollary (proved in Appendix B), which is a more algebraic formulation of our main theorem.
Corollary 6.2. Let A be an n × n partial Latin square that contains at least $\epsilon n^5$ pairs of equally labelled rectangles. Then for any δ > 0 there exist a real number $K = K(\delta, \epsilon) = \epsilon^{-\delta^{-O(\delta^{-1})}}$, a metric group G, two 1-separated subsets X, Y of G of size at most n, a (K, δ)-rough approximate subgroup H of G of size at most Kn, and subsets U, V ⊂ G of size at most K such that $X \subset (UH)_\delta$, $Y \subset (HV)_\delta$, and a subset B ⊂ A of density at least $\beta = \beta(\delta, \epsilon) = \epsilon^{\delta^{-O(\delta^{-1})}}$ such that B is δ-approximately isomorphic to a subset of the restriction of the multiplication table of G to X × Y.
Thus, as we claimed in the introduction, we have a complete characterization, up to dense subsets, of labellings with many cuboctahedra. However, it is natural to ask for a bit more than this. In particular, thanks to the work of Breuillard, Green and Tao [2] we have a complete description of approximate groups. Is there a corresponding description of rough approximate groups? We have not formulated a precise conjecture, but the SO(3) example that was also presented in Section 1.5 suggests that one ought to be able to say quite a lot more about the metric group that contains a rough approximate group: it seems likely that Lie groups of bounded rank enter the picture, for example, but since SO(3) is not nilpotent, it also seems that nilpotency plays a less important role than it does in the theory of approximate groups.
The other question we would like to see answered is whether Conjecture 1.6 is true, since that would demonstrate that rough approximate groups really are more general objects than approximate groups.

A Basic results about rough approximate groups
In the first two sections of this appendix we show that much of the basic theory about subsets of groups with small doubling and tripling carries over to 'rough' statements. Our main arguments, given in the next section, will either come from a blog post of Tao [7] or be closely modelled on the arguments in a paper of Tao [8], both of which we mentioned earlier. In the paper, he developed non-Abelian product-set theory, and in the blog post he proved 'metric-entropy versions' of some of his statements from the paper. Here we shall carry out this process for a few more such statements.
In this section, we give some definitions and make a few basic observations. Given a subset X of a metric space, and another subset ∆, we say that ∆ is an $\epsilon$-net of X if for every x ∈ X there exists y ∈ ∆ such that $d(x, y) < \epsilon$. An $\epsilon$-separated subset of X is a subset Γ such that $d(x, x') \ge \epsilon$ for every pair of distinct elements x, x′ ∈ Γ. Write $\nu_\epsilon(X)$ for the smallest size of an $\epsilon$-net of X, and $\sigma_\epsilon(X)$ for the largest size of an $\epsilon$-separated subset. We begin with three very basic lemmas.
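Lemma A.1. Let X be a subset of a metric space and let $\epsilon > 0$. Then $\nu_\epsilon(X) \le \sigma_\epsilon(X) \le \nu_{\epsilon/2}(X)$.
Proof. Let Γ be an $\epsilon$-separated set of maximal size. Then in particular it is maximal with respect to inclusion, so every point of X lies at distance less than $\epsilon$ from some point of Γ. It follows that Γ is an $\epsilon$-net. This proves the first inequality. Now let ∆ be an $(\epsilon/2)$-net. Then the balls of radius $\epsilon/2$ about the points of ∆ cover X, and no $\epsilon$-separated set can contain more than one element in any of these balls. This proves the second inequality.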
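The first step of this proof, that a maximal separated set is automatically a net, is easy to see in action. The Python sketch below builds a maximal (though not necessarily maximum-size) $\epsilon$-separated subset greedily and checks the net property; the function names and the sample points on the integer line are illustrative assumptions only.

```python
def maximal_separated_subset(points, eps, dist):
    """Greedily build a maximal eps-separated subset of `points`."""
    chosen = []
    for p in points:
        # keep p only if it is at distance >= eps from everything kept so far
        if all(dist(p, q) >= eps for q in chosen):
            chosen.append(p)
    return chosen


def is_net(net, points, eps, dist):
    """True if every point lies at distance < eps from some element of `net`."""
    return all(any(dist(p, q) < eps for q in net) for p in points)


def line_distance(x, y):
    return abs(x - y)


sample = [0, 3, 9, 10, 17, 25]
separated = maximal_separated_subset(sample, 10, line_distance)
```

Every point that the greedy pass rejects was rejected because it lies within distance less than $\epsilon$ of a point already kept, which is exactly the net property.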
Lemma A.2. Let X and Y be subsets of metric spaces and let d be the metric on X × Y defined by d ((x, y),
Proof. By Lemma A.1, we have that
Lemma A.3. Let X be a subset of a metric group and let $\epsilon > 0$. Then $\nu_\epsilon(X) = \nu_\epsilon(X^{-1})$ and $\sigma_\epsilon(X) = \sigma_\epsilon(X^{-1})$.
Proof. This is an immediate consequence of the fact that $d(x^{-1}, y^{-1}) = d(x, y)$ for any two elements x, y of a metric group.
We shall now make some definitions that will allow us to state our further results more concisely. Let X, Y, Z be disjoint finite sets and let φ : X ×Y → Z. We say that the quadruple (X, Y, Z, φ) is a partial k-torsor of density δ if the following conditions hold.
(iii) In the free group with generators X ∪ Y ∪ Z and relations $xy\phi(x, y)^{-1}$ there is no van Kampen diagram of area less than or equal to k and boundary of the form $x_1 x_2^{-1}$, $y_1 y_2^{-1}$ or $z_1 z_2^{-1}$.
Note that if k ≥ 2, then the above conditions imply that no label from Z appears more than once in any row or column of X × Y. For instance, if $\phi(x_1, y) = \phi(x_2, y) = z$, then we have the relations $x_1 y z^{-1}$ and $x_2 y z^{-1}$, from which we deduce that $x_1 x_2^{-1} = (x_1 y z^{-1})(z y^{-1} x_2^{-1}) = e$, and this deduction corresponds to a van Kampen diagram of area 2 with boundary $x_1 x_2^{-1}$. From this it follows that |Z| ≥ δ max{|X|, |Y|}, which in turn implies (using property (ii)) that $\delta^4 |X| \le |Y| \le \delta^{-4} |X|$. Thus, the sizes of X, Y and Z are comparable.
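The area-2 deduction in the example above amounts to a free cancellation, which the following Python sketch carries out explicitly. Words are encoded as lists of (generator, exponent) pairs with exponent ±1; this encoding and the function names are our own choices for illustration.

```python
def inverse(word):
    """Formal inverse of a word: reverse it and negate every exponent."""
    return [(g, -e) for g, e in reversed(word)]


def free_reduce(word):
    """Cancel adjacent pairs g g^{-1} and g^{-1} g using a stack."""
    out = []
    for g, e in word:
        if out and out[-1] == (g, -e):
            out.pop()
        else:
            out.append((g, e))
    return out


# Relators x1 y z^{-1} and x2 y z^{-1}, i.e. phi(x1, y) = phi(x2, y) = z.
relator_1 = [("x1", 1), ("y", 1), ("z", -1)]
relator_2 = [("x2", 1), ("y", 1), ("z", -1)]

# (x1 y z^{-1}) (z y^{-1} x2^{-1}) freely reduces to x1 x2^{-1}.
boundary = free_reduce(relator_1 + inverse(relator_2))
```

The product of one relator with the inverse of the other uses two faces and leaves the boundary word $x_1 x_2^{-1}$, which is the diagram excluded by condition (iii) when k ≥ 2.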
One way of stating our main result is to say that if X and Y are sets of size n and φ is a partially defined binary operation that gives rise to at least $cn^5$ cuboctahedra, then for every k the multiplication table of φ can be restricted to a partial k-torsor of density δ(c, k), where δ has a power dependence on c for each fixed k.
Given a partial k-torsor, define the corresponding van Kampen metric on the free group with generators X ∪ Y ∪ Z by taking $d(w_1, w_2)$ to be the area of the smallest van Kampen diagram with the relations $xy\phi(x, y)^{-1}$ and boundary $w_1w_2^{-1}$. Here $w_1$ and $w_2$ are words in X ∪ Y ∪ Z. If there is no van Kampen diagram with boundary $w_1w_2^{-1}$ we say that $d(w_1, w_2) = \infty$ (so strictly speaking the metric is not a metric but a generalization of a metric where infinite distances are allowed). It is a simple exercise to prove that the distance between two generators is infinite unless they belong to the same set out of X, Y and Z, and more generally that there is a homomorphism from the group with these relations to the group with presentation $\langle a, b, c \mid ab = c\rangle$, which takes all of X to a, all of Y to b and all of Z to c.
We stress once again that the group with generators X ∪ Y ∪ Z and relations $xy\phi(x, y)^{-1}$ does not have to be very interesting: for example, it might just be the group $\langle a, b, c \mid ab = c\rangle$ (which is of course isomorphic to the free group on a and b). But we are looking in much more detail at the metric structure given to the free group on X ∪ Y ∪ Z by the relations $x_iy_j = \phi(x_i, y_j)$, rather than merely distinguishing between finite and infinite distances.
From now on, when φ is understood, we shall write xy for φ(x, y) and AB for φ(A, B) = {φ(x, y) : x ∈ A, y ∈ B}.
The observation that allows us to say something about the structure of partial k-torsors is the following. We shall write $\nu_\epsilon(X)$ for the size of the smallest non-strict ε-net of X, that is, of the smallest set ∆ such that for every x ∈ X there exists y ∈ ∆ with d(x, y) ≤ ε.
proportion Ω(δ) of the time, where $\tilde X'$ and $\tilde Y'$ are the embeddings of X′ and Y′ into the free group with the van Kampen metric defined above.
Proof. Form a bipartite graph G with vertex sets X, Y by joining x to y if and only if φ(x, y) is defined. Then by hypothesis G has density δ. By Lemma 3.7 we can find X′ ⊂ X and Y′ ⊂ Y with $|X'| = \delta^{O(1)}|X|$ and $|Y'| = \delta^{O(1)}|Y|$ such that between any x ∈ X′ and y ∈ Y′ there are $\delta^{O(1)}|X||Y|$ paths of length 3 (with the two intermediate vertices not required to lie in X′ and Y′) and such that the graph $G|_{X' \times Y'}$ has density Ω(δ).
For each x ∈ X′ and y ∈ Y′, let T(x, y) be the set of triples $(z_1, z_2, z_3) \in Z^3$ such that there exist $x_1 \in X$ and $y_1 \in Y$ with $\phi(x, y_1) = z_1$, $\phi(x_1, y_1) = z_2$ and $\phi(x_1, y) = z_3$. Since no z ∈ Z appears more than once in any row or column, there is a bijection between triples in T(x, y) and paths of length 3 from x to y in the graph, so each set T(x, y) has size $\delta^{O(1)}|X'||Y'|$.
Suppose now that $(z_1, z_2, z_3)$ belongs to T(x, y) and $x_1, y_1$ are as above. Then $z_1z_2^{-1}z_3 = (xy_1)(x_1y_1)^{-1}(x_1y) = xy$, and since we used three relations in the proof, we have that $d(z_1z_2^{-1}z_3, xy) \le 3$. Now let $\Gamma = \{x_1y_1, \dots, x_my_m\}$ be a 6-separated subset of X′Y′. Then the balls of radius 3 about the $x_iy_i$ are disjoint, from which it follows that the sets $T(x_i, y_i)$ are disjoint. But each one has size $\delta^{O(1)}|X'||Y'|$ and their union, being a subset of $Z^3$, has size at most $|Z|^3$, so $m \le \delta^{-O(1)}|Z|^3/(|X'||Y'|)$. This bound holds for all 6-separated subsets, so the result now follows from Lemma A.1.
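The identity $z_1z_2^{-1}z_3 = xy$ used in this proof, and the fact that distinct paths of length 3 give distinct triples, can be illustrated in the simplest possible setting. The sketch below makes a strong simplifying assumption that is not in the text: the operation is the full multiplication table of the symmetric group $S_3$ (so nothing is partial and all distances are zero). The function names are hypothetical.

```python
from itertools import permutations


def mul(p, q):
    """Compose permutations stored as tuples: (p*q)(j) = p[q[j]]."""
    return tuple(p[i] for i in q)


def inv(p):
    """Inverse permutation of p."""
    r = [0] * len(p)
    for i, v in enumerate(p):
        r[v] = i
    return tuple(r)


# Assumption for illustration: X = Y = Z = S_3 with phi(x, y) = x*y.
G = list(permutations(range(3)))


def triples(x, y):
    """Collect (z1, z2, z3) = (x*y1, x1*y1, x1*y) over all x1, y1 in G."""
    found = set()
    for x1 in G:
        for y1 in G:
            z1, z2, z3 = mul(x, y1), mul(x1, y1), mul(x1, y)
            # The three relations combine to give z1 * z2^{-1} * z3 = x * y.
            assert mul(mul(z1, inv(z2)), z3) == mul(x, y)
            found.add((z1, z2, z3))
    return found


x, y = G[1], G[4]
T = triples(x, y)
```

Since $z_1$ determines $y_1$ and $z_3$ determines $x_1$, the set `T` has exactly one triple for each pair $(x_1, y_1)$, which is the bijection between triples and paths used in the proof.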
We remark that X and Y are k-separated sets, so we could if we wanted replace the cardinalities |X′| and |Y′| in the statement above by the quantities $\sigma_k(X')$ and $\sigma_k(Y')$.
One of the main results of [8] is that if X, Y are finite subsets of a group and $|XY| \le C|X|^{1/2}|Y|^{1/2}$, then there exists an approximate group H and sets K, L of bounded size such that X ⊂ KH and Y ⊂ HL. (One can of course take K and L to be the same by taking their union.) In the next section, we shall prove an analogous statement for our metric-entropy context.
B Products with small metric entropy come from rough approximate groups
The main theorem we prove in this section is the following metric-entropy variant of Theorem 4.6 of [8].
We begin with an analogue of the Ruzsa triangle inequality (which can also be found in [7]).
Proof. By Lemma B.2, Lemma A.3 and our hypothesis, we have that , so the result follows.
Our next lemma is a version of the Ruzsa covering lemma.
Lemma B.4. Let ε > 0 and let A, B be subsets of a metric group such that $\nu_\epsilon(AB) \le C\sigma_{2\epsilon}(B)$. Then there exists a set K of size at most C such that $KBB^{-1}$ is a 2ε-net of A.
Proof. Let K ⊂ A be maximal such that for any two distinct elements x, x′ ∈ K the distance between the sets xB and x′B is at least 2ε. Then if y ∈ A there must be some x ∈ K such that d(xB, yB) < 2ε, by maximality, from which it follows that $d(y, xBB^{-1}) < 2\epsilon$. Therefore, $KBB^{-1}$ is a 2ε-net of A. Now let Γ be a 2ε-separated subset of B. Then KΓ is a 2ε-separated subset of KB, which is contained in AB. It follows that $|K|\sigma_{2\epsilon}(B) \le \sigma_{2\epsilon}(AB)$, which by Lemma A.1 is at most $\nu_\epsilon(AB)$. By hypothesis this is at most $C\sigma_{2\epsilon}(B)$ and the result follows.
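The covering step of this proof can be sketched in the simplest metric group, the integers under addition with the metric |x − y| (so that $KBB^{-1}$ becomes K + B − B). The sets A and B, the value of ε and the function names below are illustrative assumptions; the greedy choice of K gives a set that is maximal under inclusion, which is all the argument needs.

```python
def set_dist(S, T):
    """Distance between two finite sets of integers."""
    return min(abs(s - t) for s in S for t in T)


def greedy_cover(A, B, eps):
    """Choose K inside A so that the translates k + B are pairwise >= 2*eps apart."""
    K = []
    for a in A:
        translate = [a + b for b in B]
        if all(set_dist(translate, [k + b for b in B]) >= 2 * eps for k in K):
            K.append(a)
    return K


def is_net(net, points, r):
    """Check that every point lies at distance < r from some element of net."""
    return all(any(abs(p - q) < r for q in net) for p in points)


# Hypothetical example data.
A = list(range(0, 60, 3))
B = [0, 1, 2, 10]
eps = 1
K = greedy_cover(A, B, eps)
cover = {k + b1 - b2 for k in K for b1 in B for b2 in B}
covered = is_net(cover, A, 2 * eps)
```

Any a ∈ A that was rejected has a + b within 2ε of some k + b′, so a is within 2ε of k + b′ − b, which is why `covered` holds for every choice of the input sets.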
Next we need a notion of "popular differences" that will be suitable for this metric-entropy context.
Definition B.5. Let A be a subset of a metric group. We say that an element $d \in A^2$ is (ε, δ, m)-popular if there are m pairs $(x_i, y_i) \in A^2$ such that the sets $\{x_1, \dots, x_m\}$ and $\{y_1, \dots, y_m\}$ are δ-separated and $d(y_i^{-1}x_i, d) < \epsilon$ for every i, …
It follows that $\nu_{1024\epsilon}(Y^{-1}S^2) = \nu_{1024\epsilon}(S^2Y) \le 128C^{32}\sigma_\beta(Y)$. Therefore, by Lemma B.9 again it follows that there is a set L of size at most $2048C^{48}$ such that $LS^2$ is a 2304ε-net of $Y^{-1}$, which implies that $S^2L^{-1}$ is a 2304ε-net of Y.
Combining Theorem B.1 with Lemma A.4 gives the promised converse to Proposition 1.7. It follows from the definition of a partial k-torsor that $\tilde X'$ and $\tilde Y'$ are k-separated sets in the van Kampen metric, so we may apply Theorem B.1 with β = k and ε = 6 to the sets obtained from Lemma A.4. (Of course, this is an interesting statement only when k is sufficiently large.)
Theorem B.10. Let T = (X, Y, Z, •) be a partial k-torsor of density at least α such that |X| = |Y| = |Z| = n. Then there exist $K = \alpha^{-O(1)}$ and $\delta = O(k^{-1})$ such that we can find subsets X′, Y′, Z′ of X, Y, Z of size at least $\alpha^{O(1)}n$, a metric group G, and maps $\phi_1 : X' \to G$, $\phi_2 : Y' \to G$ and $\phi_3 : Z' \to G$ such that the images of X′, Y′, Z′ are 1-separated, $d(\phi_1(x)\phi_2(y), \phi_3(z)) < \delta$ whenever x • y = z (which happens for at least $\alpha^{O(1)}n^2$ pairs (x, y) ∈ X′ × Y′), and there is a (K, δ)-approximate subgroup H of G, and sets U, V of size $\alpha^{-O(1)}$ such that $X' \subset (UH)_\delta$ and $Y' \subset (HV)_\delta$.
Proof. Let $G^*$ be the free group with 3n generators given by X, Y and Z, and let d be the van Kampen metric on $G^*$ induced by •. Let G be the resulting metric group $(G^*, d)$. Let $\phi_1$, $\phi_2$ and $\phi_3$ be the embeddings from X, Y and Z into G, respectively.
Note also that if we pass to a further dense subset, then we can describe the structure of our multiplication in a slightly more economical way. Indeed, if X is approximately covered by a bounded number of left translates of H and Y by a bounded number of right translates, then we can find a left translate aH and a right translate Hb such that the binary operation is defined for a large number of pairs (x, y) that all belong to (X × Y) ∩ (aH × Hb). Since $ah_1h_2b$ is close to $ah_3h_4b$ if and only if $h_1h_2$ is close to $h_3h_4$, this shows that a dense part of the multiplication table is approximately isomorphic to a dense part of the multiplication table of a rough approximate group, as we claimed earlier in the paper.
We finish by spelling out the proof of Corollary 6.2.
Proof of Corollary 6.2. We begin by applying Corollary 6.1 with $b = O(\delta^{-1})$. This allows us to find inside A a partial b-torsor T of density at least $\epsilon^{b^{O(b)}} = \epsilon^{\delta^{-O(\delta^{-1})}}$. We then apply Theorem B.10 to the partial b-torsor T.
C A Bogolyubov-type lemma for SO(3)
In this section we look at properties of product sets of dense subsets of SO(3). The main result we shall prove is the following lemma.
It follows straightforwardly that $AA^{-1}AA^{-1}$ contains the whole of a ball of some radius η(θ). Thus, this result can be thought of as a Bogolyubov lemma for SO(3), with balls about the identity playing the role of Bohr sets. Note that unlike in the Abelian case, there is an extra uniformity here: the ball we obtain depends only on the measure of A and not on A itself. This fact, which can be thought of as saying that the only departure from quasirandomness of the group SO(3) is the obvious one that a product of two small balls is contained in a small ball, will be essential to our argument. A corollary of this result will be a statement that we claimed earlier in the paper: that if Γ is a maximal δ-separated subset of SO(3) and • : Γ × Γ → Γ is a partially defined operation where x • y = z if and only if xy is close to z, then • is defined for a dense set of pairs. (We give a precise formulation later.)
To prove Lemma C.1 we shall begin, as one might expect, by imitating the proof of Bogolyubov's lemma, using non-Abelian Fourier analysis. This will show that the structure of the convolution $\mathbb{1}_A * \mathbb{1}_{A^{-1}}$ essentially depends on the large Fourier coefficients of $\mathbb{1}_A$. As is well known, these all come from the low-dimensional representations of SO(3): the further uniformity mentioned above comes from the fact that the number of low-dimensional representations is bounded. To prove the assertion about the ball, we use the fact that the low-dimensional representations can be described explicitly in terms of spherical harmonics.
Let us briefly recall the basic facts about non-Abelian Fourier analysis that we shall need. Given a measurable function f : SO(3) → C and an irreducible representation ρ of SO(3), we define the Fourier coefficient $\hat f(\rho)$ by the formula …
where we are writing $\mathbb{E}_x$ for the average with respect to Haar measure on SO(3). Note that if ρ is a k-dimensional representation, then $\hat f(\rho)$ is a k × k matrix.
The non-Abelian versions of Parseval's identity, the convolution identity, and the inversion formula are as follows. Parseval's identity states that for any two measurable functions f, g : SO(3) → C, $\mathbb{E}_x f(x)\overline{g(x)} = \sum_\rho n_\rho\,\mathrm{tr}(\hat f(\rho)\hat g(\rho)^*)$, where the sum is over all irreducible representations and $n_\rho$ is the dimension of ρ. The left-hand side is the obvious definition of the inner product of f and g. As for the right-hand side, the matrix inner product $\langle A, B\rangle$ of two k × k matrices A and B is $\mathrm{tr}(AB^*) = \sum_{ij} A_{ij}\overline{B_{ij}}$, so we can rewrite it as $\sum_\rho n_\rho\langle\hat f(\rho), \hat g(\rho)\rangle$, which is a natural way of defining the inner product $\langle\hat f, \hat g\rangle$. So, suitably interpreted, Parseval's identity is just the usual identity $\langle f, g\rangle = \langle\hat f, \hat g\rangle$.
The convolution identity is also the same as it is in the Abelian case: $\widehat{f * g}(\rho) = \hat f(\rho)\hat g(\rho)$. Of course, here the product $\hat f(\rho)\hat g(\rho)$ is a matrix product.
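For comparison, the Abelian form of the convolution identity is easy to verify numerically. The sketch below works in the cyclic group $\mathbb{Z}_n$ rather than SO(3), with averages in place of sums; the normalizations, the sample functions and the function names are assumptions made for illustration only, and the Fourier coefficients here are scalars rather than matrices.

```python
import cmath


def fourier(f, n):
    """Normalized Fourier transform on Z_n: average of f(x) * e(-r*x/n)."""
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * r * x / n) for x in range(n)) / n
        for r in range(n)
    ]


def convolve(f, g, n):
    """Normalized convolution on Z_n: average over y of f(y) * g(x - y)."""
    return [sum(f[y] * g[(x - y) % n] for y in range(n)) / n for x in range(n)]


# Hypothetical sample functions on Z_6.
n = 6
f = [1, 0, 2, 0, 0, 3]
g = [0, 1, 1, 0, 2, 0]

lhs = fourier(convolve(f, g, n), n)
rhs = [a * b for a, b in zip(fourier(f, n), fourier(g, n))]
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Up to floating-point error, `max_error` is zero, which is the scalar analogue of the matrix identity above.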
Finally, the inversion formula is …
where the equality is valid almost everywhere. The Hilbert-Schmidt norm of a complex matrix A is defined by the formula $\|A\|_{HS}^2 = \mathrm{tr}(AA^*) = \sum_{ij}|A_{ij}|^2$.
It follows that if C is large, then the function f * g is well approximated in $L^2(SO(3))$ by the function u defined above, which was the part that comes from the representations of dimension at most C. Now let B be a ball of radius η, where η > 0 is a constant to be chosen later, and let $\mu_B$ be the characteristic measure of B. That is, if B has Haar measure β, then $\mu_B(x) = \beta^{-1}$ for x ∈ B and $\mu_B(x) = 0$ otherwise. We shall show that if η is sufficiently small, then $\|u - u * \mu_B\|_\infty$, and hence $\|u - u * \mu_B\|_2$, is small.
In order to bound the size of the right-hand side, we shall show that if η is small enough, then $\hat\mu_B(\rho)$ is close to the identity (on $\mathbb{C}^{n_\rho}$) for all irreducible representations ρ of dimension at most C. By definition, …
Now every irreducible representation of SO(3) has odd dimension, and a representation of dimension 2d + 1 can be realized by taking the action of SO(3) on the space of spherical harmonics of degree d: that is, the harmonic polynomials that are homogeneous of degree d.
One can show easily that the spherical harmonics of degree d are equicontinuous: for instance, it follows from the fact that they are all of the form $\sum_i a_ip_i$, where the $p_i$ are an orthonormal basis and $\sum_i a_i^2 = 1$. (If one wants, one can obtain estimates for the equicontinuity by using explicit formulae for the spherical harmonics, but we shall content ourselves with a qualitative statement here.) Therefore, for every η > 0 and every irreducible representation ρ of dimension $n_\rho = 2d + 1$, if x is sufficiently close to the identity in SO(3), then $\langle\rho(x)p, p\rangle \ge 1 - \eta$ for every spherical harmonic p of degree d. It follows by averaging over all p that $n_\rho^{-1}\mathrm{tr}\,\rho(x) \ge 1 - \eta$, which implies that $\|\rho(x) - I_{n_\rho}\|_{HS}^2 \le 2\eta n_\rho$, and therefore that $\|\rho(x) - I_{n_\rho}\|_{op}^2 \le 2\eta n_\rho$ as well.
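The passage from the trace bound to the Hilbert-Schmidt bound rests on the identity $\|\rho(x) - I\|_{HS}^2 = 2n_\rho - 2\,\mathrm{Re}\,\mathrm{tr}\,\rho(x)$, valid for any unitary matrix. As a sanity check, the sketch below verifies it for the three-dimensional defining representation of SO(3) (the case d = 1), using rotations about a single axis; the choice of axis, the sample angles and the function names are assumptions made for illustration.

```python
import math


def rot_z(t):
    """Rotation by angle t about the z-axis, as a 3x3 matrix."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def hs_norm_sq_from_identity(R):
    """Squared Hilbert-Schmidt norm of R - I for a real 3x3 matrix R."""
    return sum(
        (R[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(3)
        for j in range(3)
    )


def trace(R):
    """Trace of a 3x3 matrix."""
    return sum(R[i][i] for i in range(3))


# Compare both sides of the identity for a few hypothetical angles.
errors = []
for t in (0.0, 0.1, 0.5, 2.0):
    R = rot_z(t)
    errors.append(abs(hs_norm_sq_from_identity(R) - (2 * 3 - 2 * trace(R))))
max_error = max(errors)
```

In this case both sides equal 4 − 4 cos t, so a rotation with normalized trace at least 1 − η is within $\sqrt{6\eta}$ of the identity in Hilbert-Schmidt norm.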
It follows that for every δ > 0 and every C we may choose η > 0 such that $\|\rho(x) - I_{n_\rho}\|_{op} \le \delta$ for every x ∈ B and every irreducible representation ρ of SO(3) of dimension at most C. This in turn implies by averaging that $\|\hat\mu_B(\rho) - I_{n_\rho}\|_{op} \le \delta$ for every such ρ, where B is the ball of radius η about the identity. But then in the right-hand side of (2) we are taking the trace of a product of four matrices of which three have operator norm at most 1 and one has operator norm at most δ. It follows that the trace is at most $\delta n_\rho$, and therefore that the right-hand side is in total at most $\delta\sum_{\rho : n_\rho \le C} n_\rho^2$. Choosing δ in such a way that this sum is at most ε/2 (which we can do with δ depending on ε and C only) we obtain that $|u(x) - u * \mu_B(x)| \le \epsilon/2$ for every x ∈ SO(3), which implies that $\|u - u * \mu_B\|_2 \le \epsilon/2$.
Let us state this result formally for later reference. When we say "the ball of radius η" this is to be understood to be the ball with respect to any reasonable metric, such as the one coming from the operator norm or the Hilbert-Schmidt norm; the result is true for all of them.
Lemma C.2. For every ε > 0 there exists η > 0 such that the following statement holds. Let f and g be two bounded measurable complex-valued functions defined on SO(3), let B be the ball of radius η around the identity in SO(3) and let $\mu_B$ be the characteristic measure of B. Then $\|f * g - f * g * \mu_B\|_2 \le \epsilon$.
We can interpret this result as a kind of partial quasirandomness property of SO(3). For a fully quasirandom group, we can replace $\mu_B$ by the constant function that takes value 1 everywhere and the lemma above holds. Thus, …
… of Γ. Since all balls of radius θδ have the same measure, it follows that the proportion of (x′, y′) ∈ Γ² such that x′y′ is within 3θδ of some point z′ ∈ Γ is at least $\theta^9/16$. Replacing θ by θ/3 gives the lemma as stated.
We make another observation that uses part of the proof above.
Proof. Choose η, and therefore B, as in the proof of the previous lemma. Then the measure of the set of x such that $|f * f(x) - f * f * \mu_B(x)| < \theta^6/8$ is at least $1 - 64\gamma^2\theta^{-12}$. For each x in this set, we have $f * f(x) \ge \theta^6/8$ (since $f * f * \mu_B$ is always at least $\theta^6/4$). Let us denote this set of 'popular products' by W.
Note that the power $\theta^6$ makes sense above. Since SO(3) is three-dimensional, the probability that two random points will be within θ of points in Γ should be around $\theta^3 \cdot \theta^3$, and we have shown that most of the time we are within a constant of what this random model would predict.
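The measure estimate in this proof is an instance of Chebyshev's inequality: if a function h has mean square at most γ², then the proportion of points where |h| ≥ θ⁶/8 is at most γ²/(θ⁶/8)² = 64γ²θ⁻¹². The sketch below checks this arithmetic on a finite sample; the sample values, the uniform measure on a finite set standing in for Haar measure, the parameter `theta6` (standing for θ⁶) and the function names are all illustrative assumptions.

```python
from fractions import Fraction


def proportion_large(values, t):
    """Proportion of entries whose absolute value is at least t."""
    return Fraction(sum(1 for v in values if abs(v) >= t), len(values))


def mean_square(values):
    """Average of the squares of the entries."""
    return sum(Fraction(v) ** 2 for v in values) / len(values)


# Hypothetical pointwise errors h(x) on a 100-point sample space.
h = [Fraction(0)] * 99 + [Fraction(1, 4)]
theta6 = Fraction(1)          # stands for theta^6 in the text
threshold = theta6 / 8        # theta^6 / 8

gamma_sq = mean_square(h)                 # plays the role of gamma^2
bound = 64 * gamma_sq / theta6 ** 2       # 64 * gamma^2 * theta^{-12}
exceptional = proportion_large(h, threshold)
```

In this sample `exceptional` is 1/100 and `bound` is 1/25, so the exceptional set is indeed no larger than the Chebyshev bound allows.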