Joints of Varieties

We generalize the Guth–Katz joints theorem from lines to varieties. A special case says that N planes (2-flats) in 6 dimensions (over any field) have O(N^{3/2}) joints, where a joint is a point contained in a triple of these planes not all lying in some hyperplane. More generally, we prove the same bound when the set of N planes is replaced by a set of 2-dimensional algebraic varieties of total degree N, and a joint is a point that is regular for three varieties whose tangent planes at that point are not all contained in some hyperplane. Our most general result gives upper bounds, tight up to constant factors, for joints with multiplicities for several sets of varieties of arbitrary dimensions (known as Carbery's conjecture). Our main innovation is a new way to extend the polynomial method to higher dimensional objects, relating the degree of a polynomial and its orders of vanishing on a given set of points on a variety.


Introduction
Guth and Katz [GK10] proved the following "joints theorem": N lines in R^3 have O(N^{3/2}) joints, where a joint is a point contained in three of the lines that do not all lie on some plane. This bound is tight up to a constant factor due to the following example: consider k generic planes; their pairwise intersections give \binom{k}{2} lines and their triplewise intersections give \binom{k}{3} joints. The joints problem was first studied in Chazelle et al. [CEGPSSS92]. Besides being an interesting problem in incidence geometry, it also caught the attention of harmonic analysts due to connections to the Kakeya problem as observed by Wolff [Wol99]. This connection was further elucidated by Bennett, Carbery and Tao [BCT06] in their work on the multilinear Kakeya problem, which in turn allowed them to improve bounds on the joints problem (prior to the Guth–Katz solution). Guth [Gut10] later adapted techniques from the solution of the joints theorem to prove the so-called endpoint case of the Bennett–Carbery–Tao multilinear Kakeya conjecture, which can be viewed as a joints theorem for tubes (also see the exposition in [Gut16, Section 15.8]). Guth's multilinear Kakeya result was later generalized by Zhang [Zha18] to slabs and neighborhoods of varieties (though the latter does not translate back to the joints problem for flats).
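The generic-planes construction is easy to check numerically. The sketch below (our own illustration, not from the paper, with an arbitrary choice k = 8) intersects k random planes in R^3 and counts the pairwise lines and triplewise joints.

```python
# Sketch: k generic planes in R^3 give C(k,2) lines and C(k,3) joints,
# so N = C(k,2) lines yield on the order of N^{3/2} joints.
from itertools import combinations
from math import comb

import numpy as np

rng = np.random.default_rng(0)
k = 8
# plane i is {x in R^3 : A[i] @ x = b[i]}; random data is generic
A = rng.standard_normal((k, 3))
b = rng.standard_normal(k)

# every pair of generic planes meets in a line
num_lines = comb(k, 2)

# every triple of generic planes meets in a single point: a joint
joints = []
for i, j, l in combinations(range(k), 3):
    M = A[[i, j, l]]
    if abs(np.linalg.det(M)) > 1e-9:  # generic triples are invertible
        joints.append(np.linalg.solve(M, b[[i, j, l]]))

print(num_lines, len(joints))  # 28 56
```

With N = \binom{k}{2} ≈ k^2/2 lines, the \binom{k}{3} ≈ k^3/6 joints are indeed Θ(N^{3/2}).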

The Guth–Katz solution of the joints problem highlights the importance of the polynomial method. Their joints theorem was also a precursor to their subsequent breakthrough on the Erdős distinct distances problem [GK15], which introduced a polynomial partitioning method that has found many subsequent applications. One of the key steps in [GK15] dealt with a point-line incidence problem in R^3 with additional constraints on the configuration of lines. These developments were partly inspired by Dvir's [Dvi09] stunningly short and elegant solution to the finite field Kakeya problem. Guth has also successfully applied the polynomial method developed in this line of work to restriction problems related to Kakeya [Gut16, Gut18].
Since Guth and Katz's original work, there has been significant effort in extending the joints theorem [CI14, CI20, CV14, EKS11, Ili13b, Ili15a, Ili15b, KSS10, Qui09, YZ22, Zha20]. Kaplan, Sharir, and Shustin [KSS10] and Quilodrán [Qui09] independently extended the joints theorem from R^3 to R^d, and these techniques and results extend to arbitrary fields as stated below (also see [CI14, Dvi10, Tao14]). Given a set of lines in F^d, a joint is a point contained in d lines with independent and spanning directions. Throughout the paper, F stands for an arbitrary field, and our constants do not depend on F.

Theorem 1.1 A set of N lines in F^d has at most C_d N^{d/(d−1)} joints, for some constant C_d.
Recently Yu and Zhao [YZ22] proved that N lines in F^d have at most ((d−1)!)^{1/(d−1)} N^{d/(d−1)}/d joints. This leading constant is optimal, matching the construction described above up to a (1 + o(1))-factor.
We generalize the joints theorem from lines to varieties, overcoming a fundamental difficulty with the polynomial method that one quickly runs into; we will elaborate more on this later. A representative case of our result says the following. Here a joint is a point contained in a triple of planes not all lying in some hyperplane. All our bounds on joints in this paper are tight up to a constant factor (depending on the dimension) due to a straightforward generalization of the example in the first paragraph.
Theorem 1.2 A set of N planes in F^6 has O(N^{3/2}) joints.
In his PhD thesis, Ben Yang [Yan16, Yan17] proved partial results giving an upper bound N^{3/2+o(1)} when F = R (and also more generally for bounded degree varieties in R^d; in contrast, our results on joints of varieties do not require any bounded degree hypotheses). Yang's results have two fundamental limitations: (1) an error term in the exponent and (2) the methods only work over the reals. He used a variant of the polynomial partitioning method [GK15], which requires real topology. More specifically, Yang applied polynomial partitioning for varieties (due to Guth [Gut15] and extended by Blagojević, Blagojević, and Ziegler [BBZ17]) using bounded degree polynomials (due to Solymosi and Tao [ST12]), with the latter requiring an error term in the exponent. We introduce a novel approach that avoids both limitations.

Theorem 1.4 (Multijoints of lines). Given d sets of lines L_1, …, L_d in F^d, the number of joints formed by taking one line from each L_i is at most C_d(|L_1| ⋯ |L_d|)^{1/(d−1)} for some constant C_d.
We extend the multijoints theorem from lines to flats. Here a point is a joint formed by several flats if these flats contain this point and have spanning and independent directions.

Varieties.
We extend the joints theorem from flats to varieties. Generalizing earlier notions, a point p is a joint formed by several varieties V_1, …, V_r if p is a regular point for each V_i and their tangent spaces at p have independent and spanning directions. (Recall that a point p is a regular point of a variety V if the Zariski tangent space T_pV has the same dimension as V.) The proof of the joints theorem can be easily adapted from lines to algebraic curves (e.g., see [KSS10, Qui09]). Here we extend the joints theorem to higher dimensional varieties. Given a set V of varieties, let deg V denote the sum of the degrees of the elements of V.
Remark. In this paper, all varieties are assumed to be irreducible. We do not lose any generality for the joints problem with this assumption as one can always replace any algebraic set by its irreducible components.
As earlier, we prove the result more generally for multiple sets of varieties.

Theorem 1.7 (Multijoints of varieties). Given V_1, …, V_r, where each V_i is a set of k_i-dimensional varieties in F^d and d = k_1 + ⋯ + k_r, the number of joints formed by taking one variety from each V_i is at most C_{k_1,…,k_r}(deg V_1 ⋯ deg V_r)^{1/(r−1)} for some constant C_{k_1,…,k_r}.
Previously, Iliopoulou [Ili15b] proved the multijoints theorem for algebraic curves of bounded degree in R^d (here by bounded degree we mean that the leading constant C depends on the maximum degree of the curves), but it was unknown how to generalize from R^d to F^d, despite knowledge of the joints theorem for a single set of curves. This is because Zhang's proof [Zha20] of the multijoints theorem for lines (Theorem 1.4) does not easily adapt to curves.
In the setting of real varieties, Yang [Yan16] proved an upper bound of the form C_ε(|V_1| ⋯ |V_r|)^{1/(r−1)+ε} for every ε > 0, where C_ε also depends on the maximum degree of the varieties.

Joints with multiplicities.
In the above formulations of joints and multijoints theorems, each point is counted as a joint at most once. Motivated by Kakeya problems, Carbery suggested a generalization where joints contained in many lines are counted with multiplicity. The following theorem about joints of lines with multiplicities was conjectured by Carbery, proved in R^3 by Iliopoulou, and settled in general by Zhang [Zha20].

Theorem 1.8 (Joints of lines with multiplicities). Given d sets of lines L_1, …, L_d in F^d, let J denote the set of joints formed by taking one line from each L_i, and for each p ∈ J let M(p) denote the number of d-tuples of lines, one from each L_i, that form a joint at p. Then

∑_{p ∈ J} M(p)^{1/(d−1)} ≤ C_d (|L_1| ⋯ |L_d|)^{1/(d−1)},

where C_d is some constant.
Theorem 1.8 strengthens Theorem 1.4 (multijoints of lines). The exponent in M(p)^{1/(d−1)} on the left-hand side is optimal, as can easily be seen by duplicating every element in each set of lines m times for some large m.
Yang [Yan16] studied a generalization of Theorem 1.8 to joints of varieties with multiplicities, but as earlier, his upper bound only holds in R^d, carries a +o(1) error term in the exponent, and has a leading constant that depends on the maximum degree of the varieties.
Our main result, below, generalizes the above to joints of varieties counted with multiplicities. It generalizes all previously stated results.

Theorem 1.9 (Joints of varieties with multiplicities). Given V_1, …, V_r as in Theorem 1.7, let J denote the set of joints and, for each p ∈ J, let M(p) denote the number of r-tuples of varieties, one from each V_i, that form a joint at p. Then

∑_{p ∈ J} M(p)^{1/(r−1)} ≤ C_{k_1,…,k_r} (deg V_1 ⋯ deg V_r)^{1/(r−1)}.
Our proof of Theorem 1.9 even in the case of lines is different from that of Zhang [Zha20]. By our method, there is no significant difference between the proofs of Theorem 1.7 (without multiplicities) and Theorem 1.9 (with multiplicities).

1.5 Constants. We restate Theorems 1.7 and 1.9 in the following equivalent form with explicit constants. This superficially more general formulation (formulated in [YZ22] for flats) exposes a difficulty hierarchy of the problem. It also allows us to discuss the leading constants. While the constants below are optimal for (r, k_1, m_1) = (1, 1, d), they are likely not tight in all other cases.
Let us explain how various specializations of Theorem 1.10 correspond to earlier results.
While we know the optimal constant for joints of lines, our proof does not seem to give the optimal constant for flats or varieties. For r = 1 we conjecture that the optimal bound in Theorem 1.10 is (m!/m^m)^{1/(m−1)} N^{m/(m−1)} for all k and m, agreeing with joints of lines (k = 1). The first open case (k, m) = (2, 3) is stated below.
1.6 Outline. We begin by motivating and describing, in Sect. 2, the key new ideas in our method. We then give, in Sect. 3, the proof in the special case of joints of planes in R^6, which is representative of the general result. To obtain the result in full generality, we use higher order directional derivatives with respect to local coordinates along a variety, as well as Hasse derivatives to deal with arbitrary fields; both are discussed in Sect. 4. The complete proof of the main theorem then appears in Sect. 5.

To set the stage, let us first recall the proof of the joints theorem for lines. Given a set of N lines forming J joints in R^3, let g be a nonzero polynomial of minimum degree that vanishes at all J joints. By a parameter counting argument, we have deg g ≤ CJ^{1/3} for some constant C > 0.

Key Ideas
The following elementary fact is key to the polynomial method.
Lemma 2.2 (Vanishing lemma). If a degree n polynomial vanishes at more than n points on a line, then it vanishes on the whole line.
We claim that some line contains at most CJ^{1/3} joints. Suppose, for contradiction, that every line contains more than CJ^{1/3} joints. Since deg g ≤ CJ^{1/3}, the vanishing lemma implies that g vanishes on each of the N lines. Since each joint is contained in three lines in spanning directions, the gradient ∇g vanishes at every joint. Thus ∂g/∂x, ∂g/∂y, ∂g/∂z all vanish at every joint. At least one of these partial derivatives is a nonzero polynomial of degree smaller than that of g, thereby contradicting the minimal degree assumption on g.
Thus some line contains at most CJ^{1/3} joints. We can then remove this line and all its joints, and repeat the argument to find another line with at most CJ^{1/3} joints. After we have removed all the lines, we have removed at most CJ^{1/3}N joints, so J ≤ CJ^{1/3}N, and hence J = O(N^{3/2}). This completes the proof in the case of R^3. This proof also extends to F^d.
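The parameter counting step can be made concrete. The sketch below (ours, not from the paper, with an arbitrary sample of J = 30 points) finds a nonzero polynomial vanishing at all the points via a nullspace computation, using the smallest D with \binom{D+3}{3} > J.

```python
# Sketch: a nonzero polynomial of degree D in R[x,y,z] has C(D+3,3)
# coefficients, so whenever C(D+3,3) > J some nonzero polynomial of
# degree <= D vanishes at all J given points; the least such D is O(J^{1/3}).
from itertools import combinations_with_replacement
from math import comb

import numpy as np

rng = np.random.default_rng(1)
points = rng.standard_normal((30, 3))  # J = 30 sample "joints"
J = len(points)

# smallest D whose coefficient count C(D+3,3) exceeds the point count
D = 0
while comb(D + 3, 3) <= J:
    D += 1

# exponent vectors of all monomials x^a y^b z^c with a + b + c <= D
monos = [[m.count(t) for t in range(3)]
         for d in range(D + 1)
         for m in combinations_with_replacement(range(3), d)]

# evaluation matrix: entry (p, m) is the monomial m evaluated at point p
M = np.array([[np.prod(p ** np.array(m)) for m in monos] for p in points])

# any nullspace vector gives coefficients of a nonzero g vanishing at
# all J points; one exists since M has more columns than rows
g = np.linalg.svd(M)[2][-1]
residual = float(np.max(np.abs(M @ g)))
print(D, residual)
```

Here D = 4 since \binom{7}{3} = 35 > 30 ≥ \binom{6}{3} = 20, and the residual is zero up to floating-point error.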

Vanishing on planes.
How can we try to adapt the above proof to show that N planes in R^6 form O(N^{3/2}) joints? The main obstacle is to generalize the vanishing lemma from lines to planes. The above proof would extend verbatim to joints of planes if the answer to the following question were yes.

Attempt I. Given distinct points p_1, …, p_{\binom{n+2}{2}} in the plane, if g ∈ R[x, y]_{≤n} satisfies the vanishing conditions g(p_1) = 0, …, g(p_{\binom{n+2}{2}}) = 0, does this imply that g is identically zero?

Of course, the answer to this question is no, since the vanishing locus of the polynomial on a plane could be a curve. Clearly it is impossible to force a two-variable polynomial to vanish by forcing it to vanish at any finite number of points. Instead of asking for polynomials to vanish at the joints, we can ask them to vanish to high multiplicity at the joints. This idea, known as the "method of multiplicities" [DKSS13], has been fruitful in the study of the joints problem [Zha20, YZ22], and it was also used to improve bounds on the finite field Kakeya problem [Dvi09, BC21].

Attempt II. Given a point p_1 in the plane, if g ∈ R[x, y]_{≤n} vanishes to order more than n at p_1; equivalently, if g satisfies the vanishing conditions ∂^{i+j}g/∂x^i∂y^j(p_1) = 0 for all 0 ≤ i + j ≤ n, does this imply that g is identically zero?

The answer to this one is yes, and it shows how using derivatives creates a correct vanishing lemma. However, this vanishing lemma is completely useless for our application: we want to use a vanishing lemma to bound the number of joints lying on a plane, and this one ignores all but one of the joints on each plane. Perhaps we can create a correct and useful vanishing lemma by combining the ideas of Attempts I and II.

Attempt III. Given distinct points p_1, …, p_m in the plane with m ∼ n^2/s^2, if g ∈ R[x, y]_{≤n} vanishes to order at least s at each point, does this imply that g is identically zero?

Unfortunately the answer is no again. Indeed, g(x, y) = y^s vanishes to order s at every point of the x-axis.

We have dim R[x, y]_{≤n} = \binom{n+2}{2}, which is less than the number of linear constraints on the coefficients of g imposed by asking g to vanish to order at least s at Θ(n^2/s^2) given points (each such point gives \binom{s+1}{2} constraints). The counterexample must therefore imply that some of these linear constraints are linearly dependent. Our proof strategy is to build a vanishing lemma using a linearly independent set of such constraints on the coefficients of g.
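This dependency can be verified by direct computation. The sketch below (our own, with arbitrary small parameters n = 6, s = 3) builds the constraint matrix for vanishing to order ≥ s at six collinear points and checks that g = y^s satisfies all of the constraints even though their number exceeds dim R[x, y]_{≤n}.

```python
# Sketch: order-s vanishing conditions at collinear points are linearly
# dependent: g = y^s survives all of them.
from math import comb, prod

import numpy as np

n, s = 6, 3
points = [(float(t), 0.0) for t in range(6)]  # six points on the x-axis
monos = [(i, j) for i in range(n + 1) for j in range(n + 1 - i)]

def row(a, b, p):
    """The functional g -> d^{a+b} g / dx^a dy^b (p) in the monomial basis."""
    px, py = p
    return [0.0 if i < a or j < b else
            prod(range(i - a + 1, i + 1)) * prod(range(j - b + 1, j + 1))
            * px ** (i - a) * py ** (j - b)
            for (i, j) in monos]

# one row per vanishing condition: all derivatives of order < s at each point
M = np.array([row(a, b, p) for p in points
              for a in range(s) for b in range(s - a)])
g = np.array([1.0 if m == (0, s) else 0.0 for m in monos])  # g = y^s

print(M.shape[0], comb(n + 2, 2))      # 36 conditions, but dim is only 28
print(float(np.max(np.abs(M @ g))))    # 0.0: y^s satisfies every condition
```

With generic (non-collinear) points, the same 36 conditions would have full rank 28, forcing g = 0; collinearity is what creates the dependencies.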
Remark. Another very natural strategy for extending the proof of joints of lines to planes is to consider, instead of a single polynomial that vanishes on all the joints, a pair of polynomials that vanish on all the joints. For this approach to be useful, one would like the pair of polynomials, when restricted to each plane, to either be coprime or have one of them vanish. This seems like a difficult condition to satisfy, and we suspect that it is not possible, at least if one wants the degrees of the polynomials to be small. This problem appears to be related to the inverse Bézout problem: given a set of N points in R^2, can one always find a pair of coprime polynomials P, Q both vanishing on all N points with (deg P)(deg Q) = O(N)? The answer is no, by putting half of the N points on a √(N/2) × √(N/2) grid and the other half on a line (this grid-and-line example shows up again in our discussion below). A partial converse to Bézout's theorem is known in two dimensions but open in higher dimensions (see Tao [Tao12]).

Key idea I: collecting linearly independent vanishing conditions.
We define a vanishing condition to be a single homogeneous linear constraint on the coefficients of a polynomial g ∈ R[x, y]_{≤n} that arises from requiring some particular higher order directional derivative to vanish at some point. For example, for a two-variable polynomial g, some examples of vanishing conditions are (a) g(2, 4) = 0, (b) (∂g/∂x)(2, 1) = 0, and (c) (∂^2g/∂x^2 − ∂^2g/∂x∂y)(−1, 2) = 0. For a positive integer r, an r-th order vanishing condition on g at p is a vanishing condition of the form Dg(p) = 0 where D is an (r − 1)-th order derivative operator, i.e., a linear combination of the operators ∂^{r−1}/∂x_1^{r_1} ⋯ ∂x_d^{r_d} with r_1 + ⋯ + r_d = r − 1. (We will not need mixed order vanishing conditions for joints of flats, but they will be needed for joints of varieties.) For now, let us focus on a single plane and study vanishing conditions on g ∈ R[x, y]_{≤n}. Vanishing conditions can be viewed as linear functionals on the vector space R[x, y]_{≤n}, though it will be helpful later to also keep track of the (derivative operator, point) pair (D, p) that generates the vanishing condition Dg(p) = 0.
We now devise a procedure for selecting a basis of linear functionals on R[x, y]_{≤n}. As a first attempt, we fix an arbitrary order on P, say p_1, …, p_r, and cycle through the points repeatedly (the vertical bars are a visual aid separating the epochs):

p_1, …, p_r | p_1, …, p_r | p_1, …, p_r | ⋯

We cycle through the points in the above sequence and maintain a linearly independent set of vanishing conditions on R[x, y]_{≤n}, starting from an empty set of vanishing conditions. The r-th time (r = 1, 2, …) that we see a point p, we add to our existing collection a maximal subset of r-th order vanishing conditions at p so that our collection of vanishing conditions always remains linearly independent as a set of linear functionals on R[x, y]_{≤n}. Eventually, the process terminates once we have collected a basis of the \binom{n+2}{2}-dimensional space of linear functionals on R[x, y]_{≤n}.

Although there is some choice in the above process in deciding which vanishing conditions to add to our collection at each step, the number of vanishing conditions added at each step does not depend on this choice. We would like to understand and control the number of vanishing conditions attached to each point as we run through the process. However, this does not seem easy: we do not know how to compute these numbers (for large n) even for an explicitly given set of points.
More importantly, the process does not always evenly assign the vanishing conditions across all the points. For example, suppose we have |P| = 2t^2, with half of the points in P forming a t × t grid (a high-degree part), and the other t^2 points all lying on a single generic line (a low-degree part). As we run through the above process, we encounter significantly more linear dependencies among vanishing conditions at points on the line than at points on the grid. For large n, at the end of the process, each point on the grid receives on the order of t times as many vanishing conditions as each point on the line. This is an undesirable situation, since the process leads to an unequal distribution of vanishing conditions, effectively "ignoring" the points on the low-degree algebraic structure.
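The collection process itself is elementary linear algebra. The sketch below (ours, with an arbitrary small point set and degree n = 4) greedily keeps each offered condition exactly when it raises the rank of the collection; running it on a grid-and-line configuration such as the one above lets one observe the uneven distribution of conditions.

```python
# Sketch of the collection process: cycle through the points; on the
# r-th visit to p, offer the order-(r-1) derivative conditions at p and
# keep those that stay linearly independent of everything collected.
from math import comb, prod

import numpy as np

n = 4
dim = comb(n + 2, 2)  # 15
monos = [(i, j) for i in range(n + 1) for j in range(n + 1 - i)]

def func(a, b, p):
    """Row vector of the functional g -> d^{a+b} g / dx^a dy^b (p)."""
    px, py = p
    return np.array([0.0 if i < a or j < b else
                     prod(range(i - a + 1, i + 1))
                     * prod(range(j - b + 1, j + 1))
                     * px ** (i - a) * py ** (j - b)
                     for (i, j) in monos])

P = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-1.0, 0.5)]
rows, counts, r = [], {p: 0 for p in P}, 0
while len(rows) < dim:
    r += 1  # round r offers the order-(r-1) derivatives at each point
    for p in P:
        for a in range(r):
            cand = func(a, r - 1 - a, p)
            # keep the condition only if it is new, i.e. raises the rank
            if np.linalg.matrix_rank(np.array(rows + [cand])) > len(rows):
                rows.append(cand)
                counts[p] += 1

print(len(rows), dim)  # the process always terminates with a full basis
print(sorted(counts.values()))
```

Dependencies do occur even here: the square of the conic through the five points vanishes to order 2 at all of them, so not all order-2 conditions can be kept.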

Key idea II: handicaps and priority order.
To address the uneven distribution of vanishing conditions across points, we give the "disadvantaged" points a head start and cycle just among them many times before we begin cycling through the entire set of points. For example, in the earlier grid-and-line example, if p_1, …, p_{r/2} are the points on the line and p_{r/2+1}, …, p_r are the points on the grid, then we give the points on the line a head start, e.g.,

p_1, …, p_{r/2} | p_1, …, p_{r/2} | ⋯ | p_1, …, p_{r/2} | p_1, …, p_r | p_1, …, p_r | ⋯

More generally, we give each point p a handicap α_p ∈ Z corresponding to the number of rounds of head start.
For example, suppose there are five points labeled a, b, c, d, e that we would cycle through in this order. Now we assign handicaps 0, 1, 3, 0, −1 to a, b, c, d, e respectively.
Then, for instance, c starts in round −3 and b starts in round −1, so we process the points in the following priority order (the bars separate rounds):

c | c | b, c | a, b, c, d | a, b, c, d, e | a, b, c, d, e | ⋯

We now run the same vanishing condition collection process as earlier with this sequence of points. The r-th time (r = 1, 2, …) that we see a point p, we append to our existing collection a maximal non-redundant set of r-th order vanishing conditions at p.
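The priority order on a, b, c, d, e can be generated mechanically; a small sketch (ours, not from the paper):

```python
# Sketch: point p with handicap alpha_p makes its r-th appearance in
# "round" r - alpha_p, so it starts in round -alpha_p; ties within a
# round follow the preassigned order.
points = ["a", "b", "c", "d", "e"]          # preassigned order
alpha = {"a": 0, "b": 1, "c": 3, "d": 0, "e": -1}

def schedule(first_round, last_round):
    order = []
    for rnd in range(first_round, last_round + 1):
        for p in points:  # ties inside a round follow the preassigned order
            if rnd >= -alpha[p]:  # p starts in round -alpha_p
                order.append(p)
    return order

print(schedule(-3, 1))
# rounds -3..1:  c | c | b c | a b c d | a b c d e
```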
We would like to assign handicaps in a way so that all joints are treated equitably in the distribution of vanishing conditions (what this means precisely will be explained later). However, it appears to be a very difficult problem to determine how exactly the distribution of vanishing conditions depends on the handicaps. Intuitively, as in the grid-and-line example, we want to assign more handicap to points that are part of low-degree algebraic substructures, but it is far from obvious how to make this notion precise and useful.
2.5 Key idea III: existence of a good handicap via compactness/smoothing. Instead of explicitly assigning handicaps, we shall indirectly prove the existence of a good choice of handicaps via a compactness/smoothing argument. (Strictly speaking, we do not actually invoke compactness here since all our domains are finite, but we believe that compactness offers a helpful perspective, as the argument here is a significant generalization of the earlier compactness argument giving tight bounds for joints of lines [YZ22].) Fix a joints configuration. Let n be large and consider the function

(2.1)  handicap α ↦ the final number of vanishing conditions assigned to each point,

which records how the collected vanishing conditions are distributed over the points. While it appears to be difficult to compute this function explicitly, we can show that it has the following three properties.

Bounded domain. If one point has a much bigger handicap than another point, then the latter point gets assigned no vanishing conditions, since the process would have finished before the first appearance of the latter point. Such a situation will never be desirable, so we only need to consider cases where the handicaps are all bounded (as a function of n).
Monotonicity. Suppose we increase the handicap by one at a subset of points while holding others fixed. Then the number of vanishing conditions assigned to this subset of points cannot decrease, and the number of vanishing conditions assigned to the other points cannot increase. Indeed, the points with the increased handicap now appear earlier in the priority order, and thus cannot receive fewer vanishing conditions than before the change.
Lipschitz continuity. A small change in the handicap assignments can only induce a small change in the number of vanishing conditions at each point. This property is intuitively reasonable, but it requires a proof.
With these three properties, we can iteratively increase the handicaps at points that end up with too few constraints, so that we eventually balance out the distribution of constraints across all joints.
The eventual implicit assignment of handicaps across joints appears to somehow identify the "algebraicity" of each point in the configuration by assigning higher handicaps to points lying in lower-degree algebraic substructures. However, we do not know how to make this algebraicity intuition precise.
Remark. This idea of implicitly assigning handicaps came up in a simpler form previously in the work of Yu and Zhao [YZ22] in determining the tight constant for the joints theorem of lines. There one does not have to consider any priority order or iterative process of adding constraints as we do here, though one does end up proving, via compactness, the existence of a handicap (though not called by that name) along with other parameters for controlling the order of vanishing at each joint.

2.6 Putting everything together: a new vanishing lemma. Suppose we have a set F of planes in R^6 forming joints J. For a choice of handicaps α ∈ Z^J and a large integer n, we can run the above vanishing condition collection procedure separately on each plane (using the handicap α restricted to the points on the plane). On each plane F ∈ F, and at each joint p on the plane F, the procedure attaches a set D_{p,F} = D_{p,F}(α, n) of derivative operators. Combining these vanishing conditions over all joints on F then gives a basis of linear functionals on the space of polynomials on F of degree at most n, where each basis element is a vanishing condition of the form Dg(p) = 0 with p ∈ F and D ∈ D_{p,F} a linear combination of higher order directional derivatives along F. With this data, we can now state our new vanishing lemma for joints of planes.

Vanishing lemma for joints of planes (Lemma 3.9). With the above setup, if g ∈ R[x_1, …, x_6]_{≤n} satisfies D_1D_2D_3g(p) = 0 whenever D_i ∈ D_{p,F_i} (i = 1, 2, 3) are derivative operators attached to three planes F_1, F_2, F_3 forming a joint p ∈ J, then g = 0.
Note that we are choosing a minimal set of derivative operators on each plane (as we chose a basis of linear functionals). The vanishing lemma would be trivial if each D_{p,F_i} were the full set of directional derivative operators at p along F_i. Also, our proof of the vanishing lemma only works if we build the vanishing conditions following the priority order; we would not be able to say much if the joints were processed in some other arbitrary manner.
By parameter counting, this new vanishing lemma implies the following inequality, where the sum runs over joints p formed by triples of planes F_1, F_2, F_3:

∑_{p ∈ J} |D_{p,F_1}| |D_{p,F_2}| |D_{p,F_3}| ≥ dim R[x_1, …, x_6]_{≤n} = \binom{n+6}{6}.

The left-hand side is the number of linear constraints on g of the form D_1D_2D_3g(p) = 0 in the vanishing lemma. Indeed, if this inequality were not satisfied, by parameter counting there would be a nonzero polynomial g of degree at most n satisfying these vanishing conditions. However, the vanishing lemma implies that such a g is identically zero, a contradiction.

Recall that all these quantities |D_{p,F}| depend on n as well as the handicap α. We can now apply a compactness/smoothing argument to choose a handicap α that minimizes the difference

max_{p ∈ J} |D_{p,F_1}| |D_{p,F_2}| |D_{p,F_3}| − min_{p ∈ J} |D_{p,F_1}| |D_{p,F_2}| |D_{p,F_3}|.

Using the three properties (bounded domain, monotonicity, Lipschitz continuity) of (2.1), we can deduce that the above difference must be negligible, i.e., o(n^6), since otherwise we could significantly reduce it by increasing the handicap by 1 at a subset of points p with small |D_{p,F_1}| |D_{p,F_2}| |D_{p,F_3}|.
It follows that we can choose handicaps so that the product |D_{p,F_1}| |D_{p,F_2}| |D_{p,F_3}| is roughly constant across all (p, F_1, F_2, F_3). We also know that, for each plane F,

∑_{p ∈ J ∩ F} |D_{p,F}| = \binom{n+2}{2},

since we have a basis of linear functionals on the space of polynomials on F with degree at most n. The conclusion |J| = O(N^{3/2}) then follows from a short calculation using the AM-GM inequality (see the end of Sect. 3).
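The closing calculation can be sketched as follows (our own paraphrase, with constant factors suppressed and c denoting the roughly common value of the products |D_{p,F_1}||D_{p,F_2}||D_{p,F_3}|):

```latex
% Rough version of the final count (a sketch; constants suppressed).
\begin{align*}
  |\mathcal{J}|\, c &\gtrsim \sum_{p \in \mathcal{J}}
      |D_{p,F_1}|\,|D_{p,F_2}|\,|D_{p,F_3}|
      \;\ge\; \binom{n+6}{6} \approx n^6,\\
  |\mathcal{J}|\, c^{1/3} &\le \sum_{p \in \mathcal{J}}
      \frac{|D_{p,F_1}| + |D_{p,F_2}| + |D_{p,F_3}|}{3}
      \qquad \text{(AM--GM)}\\
  &\le \sum_{F \in \mathcal{F}} \sum_{p \in \mathcal{J} \cap F} |D_{p,F}|
      \;=\; N \binom{n+2}{2} \approx N n^2.
\end{align*}
% Combining: |J| (n^6/|J|)^{1/3} \lesssim N n^2, so |J|^{2/3} \lesssim N,
% i.e. |J| = O(N^{3/2}).
```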
In Sect. 3, we flesh out these ideas to give a complete proof of joints of planes in R 6 . In Sect. 4 we discuss two further modifications to the above proof technique. To deal with varieties, we modify our notion of higher order directional derivatives. Geometrically we are taking derivatives with respect to local coordinates on the varieties. To deal with general fields other than the reals, we use Hasse derivatives.

Joints of Planes in R^6
The purpose of this section is to prove that N planes in R^6 have O(N^{3/2}) joints. This special case contains many of the key ideas that we introduce in this paper towards the full theorem.
Let (J, F) be a joints configuration of planes in R^6, where F is a finite set of planes and J is the set of joints formed by any three planes in F. We abuse notation slightly to handle the case when more than three planes pass through p ∈ J: in this case we arbitrarily choose three planes forming a joint at p, and only write "p ∈ F" (and say that "F contains p", etc.) if F is among the triple of planes chosen at p.

Priority order and handicaps.
First, assign an arbitrary but fixed order (referred to as the preassigned order ) to the joints J .
A handicap α = (α_p)_{p∈J} ∈ Z^J assigns an integer to each joint. Given a handicap, the associated priority order is a linear order on J × Z_{≥0} defined by setting (p, r) ≺ (p′, r′)
- if r − α_p < r′ − α_{p′}, or
- if r − α_p = r′ − α_{p′} and p comes before p′ in the preassigned order on J.

The priority order corresponds to the description in the previous section. Note that in particular (p, 0) ≺ (p, 1) ≺ (p, 2) ≺ ⋯. We write ≺ for the strict ordering, and ⪯ to allow equality.

Derivatives and evaluations.
Let R[x_1, …, x_k]_{≤n} denote the space of polynomials of degree at most n in k variables. Given a plane F and a joint p ∈ F, let D^r_{p,F} denote the space of all r-th order derivative operators in directions along F, i.e., every element D ∈ D^r_{p,F} gives a linear map g ↦ Dg from R[x_1, …, x_6] to R[x_1, …, x_6], and D is a linear combination of compositions of r directional derivative operators along F. For example, if F is the plane spanned by the first two coordinate directions, then D^r_{p,F} is the space spanned by the operators ∂^r/∂x_1^i∂x_2^j ranging over all i + j = r. (The space D^r_{p,F} here does not actually depend on p, but we include p in the notation with a view towards the generalization from flats to varieties.) Let B^r_{p,F}(n) denote the space of all linear functionals on R[x_1, …, x_6]_{≤n} of the form g ↦ Dg(p) for some D ∈ D^r_{p,F} (i.e., an r-th order derivative along F evaluated at p). Then, for a fixed p ∈ J ∩ F, a polynomial g ∈ R[x_1, …, x_6]_{≤n} lies in the common kernel of B^0_{p,F}(n) + B^1_{p,F}(n) + ⋯ + B^{r−1}_{p,F}(n) if and only if the restriction of g to the plane F vanishes to order at least r at p. (By common kernel we mean the intersection of the kernels of all linear functionals in this space.)

To emphasize the difference between B and D: the elements of D^r_{p,F} are derivative operators sending polynomials to polynomials, whereas the elements of B^r_{p,F}(n) are linear functionals sending polynomials of degree up to n to scalars. Perhaps a helpful mnemonic is that D stands for "differentiation" while B stands for "basis" (we will soon use a basis of the space of linear functionals on polynomials of degree up to n).
For a fixed F ∈ F, let us describe a process where we go through the pairs (p, r) ∈ (J ∩ F) × Z_{≥0} according to the priority order, and at each step we choose a set B^r_{p,F}(α, n) ⊂ B^r_{p,F}(n).
We will drop the dependencies on α, n, and F when there is no confusion, i.e., we write B^r_p ⊂ B^r_p(n) for the above inclusion. In addition, all unions and direct sums in the following paragraph are taken over pairs (p′, r′) ∈ (J ∩ F) × Z_{≥0}.
Suppose we are at the start of step (p, r). At this point, we have already chosen some B^{r′}_{p′} ⊂ B^{r′}_{p′}(n) for each (p′, r′) ≺ (p, r) so that the disjoint union ⋃_{(p′,r′)≺(p,r)} B^{r′}_{p′} is a basis for ∑_{(p′,r′)≺(p,r)} B^{r′}_{p′}(n). Now consider expanding this space to ∑_{(p′,r′)⪯(p,r)} B^{r′}_{p′}(n) by adding in all the r-th order derivative evaluations at p along F. We desire to expand the basis accordingly. As such, we choose a set B^r_p ⊂ B^r_p(n) so that the disjoint union ⋃_{(p′,r′)⪯(p,r)} B^{r′}_{p′} becomes a basis of ∑_{(p′,r′)⪯(p,r)} B^{r′}_{p′}(n). Note that while we have some choice about which elements of B^r_p(n) to include as new basis elements, the size of B^r_p does not depend on this choice, and is only a function of n and the priority order. We will provide a more direct formula for |B^r_p| shortly.
Since each element of B^r_{p,F}(n) can be written as g ↦ Dg(p) for some D ∈ D^r_{p,F}, we can choose a set D^r_{p,F}(α, n) ⊂ D^r_{p,F} with the same size as B^r_{p,F}(α, n) so that B^r_{p,F}(α, n) = {g ↦ Dg(p) : D ∈ D^r_{p,F}(α, n)}. We write D_{p,F}(α, n) = ⋃_{r≥0} D^r_{p,F}(α, n) and B_{p,F}(α, n) = ⋃_{r≥0} B^r_{p,F}(α, n). As we range over all joints p on F, the sets B_{p,F}(α, n) combine to form a basis of the space of linear functionals on polynomials of degree at most n on F. Thus

∑_{p ∈ J ∩ F} |B_{p,F}(α, n)| = \binom{n+2}{2}.

We may omit the parenthetical α and n in our notation when these parameters do not change and the context is clear. Some of the arguments below will involve comparing different values of α and n, in which case we will state the dependencies explicitly. We may also omit F when we are not considering other planes.

Polynomials with given vanishing orders.
In this and the next subsection, we focus our attention on a single fixed plane F ≅ R^2. Fix a finite set of points P ⊂ F (which we will later take to be the joints on F). Given a vector v = (v_p)_{p∈P} ∈ Z^P_{≥0}, let

T(v, n) = {g ∈ R[x, y]_{≤n} : g vanishes to order ≥ v_p at each p ∈ P}

(i.e., the partial derivatives satisfy ∂^{i+j}g/∂x^i∂y^j(p) = 0 for all i + j < v_p). We would like to understand how the dimension of T(v, n) changes with v and n. We are particularly interested in the following quantity, which we will shortly relate below in (3.4) to |B^r_{p,F}(α, n)|: for p ∈ P, set

b_p(v, n) = codim_{T(v,n)} T(v + e_p, n).

Here, given a pair of subspaces W ≤ U, we write codim_U W for the relative codimension of W in U. Also e_p ∈ Z^P is the vector with 1 at p and 0 elsewhere. Note that, for each p ∈ P, the space T(v + e_p, n) is the nullspace of the map on T(v, n) that sends every polynomial g to all its v_p-th order derivatives evaluated at p, and thus b_p(v, n) is the rank of this map. The following basic fact will be useful.

Lemma 3.1 (Bounded domain). If v ∈ Z^P_{≥0} has v_p > n for some p ∈ P, then dim T(v, n) = 0.
Proof. This is the statement that no nonzero polynomial of degree at most $n$ can vanish to order more than $n$ at some point.

Lemma 3.2 (Monotonicity). Suppose $v^{(1)}, v^{(2)} \in \mathbb{Z}^P_{\ge 0}$ satisfy $v^{(1)} \le v^{(2)}$ coordinatewise, with equality at $p$. Then $b_p(v^{(1)}, n) \ge b_p(v^{(2)}, n)$.

Proof. Earlier we saw that for each $i = 1, 2$, $b_p(v^{(i)}, n)$ is the rank of the map on $T(v^{(i)}, n)$ that sends each polynomial to all its $v_p$-th order derivative evaluations at $p$. Since $T(v^{(2)}, n) \subseteq T(v^{(1)}, n)$, the claim follows from the bound (3.2) on the rank of a map when restricted to a subspace.
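Since $T(v,n)$ is cut out by finitely many linear conditions on the coefficients of $g$ (one derivative evaluation per vanishing condition), both $\dim T(v,n)$ and $b_p(v,n)$ can be computed by exact linear algebra. The following Python sketch is our own illustration, not part of the paper's argument; all function names are ours, and it works over the rationals on a single plane $F \cong \mathbb{R}^2$.

```python
from fractions import Fraction
from math import comb

def monomials(n):
    """Exponents (i, j) of the monomial basis of R[x, y]_{<= n}."""
    return [(i, j) for i in range(n + 1) for j in range(n + 1 - i)]

def rank(rows):
    """Rank of a matrix with Fraction entries, by Gaussian elimination."""
    rows = [r[:] for r in rows]
    rk, col = 0, 0
    ncols = len(rows[0]) if rows else 0
    while rk < len(rows) and col < ncols:
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            col += 1
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [a - f * c for a, c in zip(rows[i], rows[rk])]
        rk, col = rk + 1, col + 1
    return rk

def dim_T(points, v, n):
    """dim T(v, n): polynomials of degree <= n vanishing to order >= v_p at each p.
    Each vanishing condition is a derivative evaluation, i.e. a linear functional
    applied to the coefficient vector of g."""
    mons = monomials(n)
    rows = []
    for (px, py), vp in zip(points, v):
        for a in range(vp):
            for bb in range(vp - a):
                rows.append([Fraction(comb(i, a) * comb(j, bb)) *
                             Fraction(px) ** (i - a) * Fraction(py) ** (j - bb)
                             if i >= a and j >= bb else Fraction(0)
                             for (i, j) in mons])
    return len(mons) - (rank(rows) if rows else 0)

def b(points, v, n, idx):
    """b_p(v, n) = dim T(v, n) - dim T(v + e_p, n), the relative codimension."""
    v2 = list(v)
    v2[idx] += 1
    return dim_T(points, v, n) - dim_T(points, v2, n)

# Lemma 3.1: vanishing to order 4 at a point kills all of R[x, y]_{<= 3}
assert dim_T([(0, 0)], [4], 3) == 0
assert dim_T([(0, 0), (1, 0)], [1, 1], 3) == 8   # two simple vanishing conditions
assert b([(0, 0), (1, 0)], [1, 1], 3, 0) == 2
```

Here $b = 2$ matches the description of $b_p(v,n)$ as the rank of the map sending $g \in T(v,n)$ to its first-order derivatives at $(0,0)$.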
The next two lemmas together will lead to the Lipschitz continuity property of b p ( v, n) as a function of v.
Lemma 3.3. Let $p, q \in P$ be distinct points. Then for every $v \in \mathbb{Z}^P_{\ge 0}$ and nonnegative integer $n$, one has $b_p(v + e_q, n) \ge b_p(v, n-1)$.

Proof. Let $f$ be an arbitrary linear polynomial that vanishes at $q$ but at no other point of $P$ (such $f$ clearly exists if the underlying field $\mathbb{F}$ is large enough; if not, we replace $\mathbb{F}$ by a field extension, which would not affect $b_p(v,n)$ as it is a rank-type quantity). We have
$$b_p(v + e_q, n) = \operatorname{codim}_{T(v + e_q, n)} T(v + e_p + e_q, n) \ge \operatorname{codim}_{f \cdot T(v, n-1)} f \cdot T(v + e_p, n-1) = b_p(v, n-1).$$
The inequality step follows from (3.2), observing that restricting $T(v + e_q, n)$ and $T(v + e_p + e_q, n)$ to polynomials divisible by $f$ yields $f \cdot T(v, n-1)$ and $f \cdot T(v + e_p, n-1)$ respectively.

3.4 How the number of vanishing conditions varies with the handicap.
As in the previous subsection, let us continue to focus our attention on a set of points $P$ on a fixed plane $F \cong \mathbb{R}^2$ (which we will drop from our notation temporarily). Given a handicap $\alpha \in \mathbb{Z}^P$ (restricted to this plane), we define the vector $v_{p,r}(\alpha) \in \mathbb{Z}^P_{\ge 0}$ as follows. It assigns to each coordinate $p' \in P$ the smallest nonnegative integer $r'$ such that $(p,r) \preceq (p',r')$. Equivalently, the value of $v_{p,r}(\alpha)$ at $p'$ is given by
$$v_{p,r}(\alpha)_{p'} = \begin{cases} \max\{r - \alpha_p + \alpha_{p'} + 1,\, 0\} & \text{if } p' \text{ comes strictly before } p \text{ in the preassigned order,} \\ \max\{r - \alpha_p + \alpha_{p'},\, 0\} & \text{otherwise.} \end{cases} \tag{3.3}$$
In other words, $v_{p,r}(\alpha)$ collects the desired vanishing orders at each joint on $F$ at the stage right before we hit $(p,r)$ in the priority order. Define $B^r_p(\alpha,n)$ and $B_p(\alpha,n)$ as in Sect. 3.2 restricted to this plane. Recall that for every $(p,r) \in P \times \mathbb{Z}_{\ge 0}$, the disjoint union $\bigsqcup_{(p',r') \prec (p,r)} B^{r'}_{p'}(\alpha,n)$ is a basis of $\operatorname{span}\bigcup_{(p',r') \prec (p,r)} B^{r'}_{p'}(n)$. Then a polynomial $g \in \mathbb{R}[x_1,\dots,x_6]_{\le n}$ lies in the common kernel of $\bigcup_{(p',r') \prec (p,r)} B^{r'}_{p'}(\alpha,n)$ if and only if the restriction of $g$ to the plane $F$ vanishes to order at least $v_{p,r}(\alpha)_q$ at every $q \in P$. Since adding $B^r_p(\alpha,n)$ makes this set a basis for $\operatorname{span}\bigcup_{(p',r') \preceq (p,r)} B^{r'}_{p'}(n)$, its size is the number of non-redundant constraints that we need to add to increase the order of vanishing at $p$ by $1$. Thus
$$|B^r_p(\alpha,n)| = b_p(v_{p,r}(\alpha), n). \tag{3.4}$$
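The closed formula (3.3) can be sanity-checked against the definition of $v_{p,r}(\alpha)$ by brute force. In the sketch below (our own illustration, not from the paper), the priority order is encoded as the lexicographic order on keys $(r - \alpha_p, \text{position of } p)$, which is one concrete way to realize the tie-breaking by the preassigned order of points.

```python
def v_vector(points, alpha, p, r):
    """v_{p,r}(alpha) by definition: for each p', the smallest r' >= 0 with
    (p, r) <= (p', r'), where (p, r) < (p', r') iff r - alpha_p < r' - alpha_{p'},
    ties broken by the preassigned order (the index in `points`)."""
    pos = {q: i for i, q in enumerate(points)}
    def key(q, s):
        return (s - alpha[q], pos[q])
    out = {}
    for q in points:
        rp = 0
        while key(q, rp) < key(p, r):
            rp += 1
        out[q] = rp
    return out

def v_vector_formula(points, alpha, p, r):
    """The same vector via the closed formula (3.3)."""
    pos = {q: i for i, q in enumerate(points)}
    return {q: max(r - alpha[p] + alpha[q] + (1 if pos[q] < pos[p] else 0), 0)
            for q in points}

points = ["a", "b", "c"]             # preassigned order = list order
alpha = {"a": 0, "b": 2, "c": -1}    # a hypothetical handicap
for p in points:
    for r in range(5):
        assert v_vector(points, alpha, p, r) == v_vector_formula(points, alpha, p, r)
```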

The observations in the previous subsection then imply the following.

Lemma 3.6. Suppose $p, q \in P$ satisfy $\alpha_q - \alpha_p > n$. Then $|B_p(\alpha,n)| = 0$.

Proof. For each $r \ge 0$, the value of $v = v_{p,r}(\alpha)$ at $q$ is greater than $n$, so $\dim T(v,n) = 0$ by Lemma 3.1.
Lemma 3.8 (Lipschitz continuity). Let $p \in P$ and $\alpha^{(1)}, \alpha^{(2)} \in \mathbb{Z}^P$. Then
$$\big|\, |B_p(\alpha^{(1)},n)| - |B_p(\alpha^{(2)},n)| \,\big| \le (n+1) \sum_{q \ne p} |\alpha^{(1)}_q - \alpha^{(2)}_q|.$$
Proof. Shifting all handicaps by the same constant does not change the priority order and thus also does not change $|B_p|$. Since the right-hand side of the above inequality is also invariant under translation, we may assume that $\alpha^{(1)}_p = \alpha^{(2)}_p = 0$. Starting with $\alpha = \alpha^{(1)}$, we can perform a sequence of changes where at each step we change the value of the handicap $\alpha$ at some $p' \ne p$ by exactly $1$, so that the vector $(\alpha_{p'})_{p' \in P}$ ends up being equal to $(\alpha^{(2)}_{p'})_{p' \in P}$ after $\sum_{q \ne p} |\alpha^{(1)}_q - \alpha^{(2)}_q|$ moves. So it suffices to prove the inequality for each step in the process, i.e., showing that for every $\alpha \in \mathbb{Z}^P$ and $q \ne p$,
$$|B_p(\alpha,n)| \ge |B_p(\alpha + e_q, n)| \ge |B_p(\alpha,n)| - (n+1).$$
The first inequality follows from Lemma 3.7. For the second inequality, by $|B_p(\alpha,n)| = \sum_{r \ge 0} |B^r_p(\alpha,n)|$ and (3.4), it suffices to prove
$$\sum_{r \ge 0} b_p(v_{p,r}(\alpha + e_q), n) \ge \sum_{r \ge 0} b_p(v_{p,r}(\alpha), n) - (n+1).$$
From (3.3), we see that there is some $r_0$ so that $v_{p,r}(\alpha + e_q) = v_{p,r}(\alpha)$ for all $r < r_0$ and $v_{p,r}(\alpha + e_q) = v_{p,r}(\alpha) + e_q$ for all $r \ge r_0$. Restricting the sum to $r \ge r_0$ (the earlier terms cancel), we obtain the desired inequality by Lemma 3.5.
3.5 Vanishing lemma. Now we start considering the interactions between different planes at the joints. The next statement is a vanishing lemma that is tailored to this joints problem. We omit the dependence on the handicap α and the degree n from the notation since we are keeping them fixed in this subsection. Recall from the beginning of the section that, at each joint, we arbitrarily chose three planes that form this joint. Note that this vanishing lemma is the only place in the proof where we use the hypothesis that the three planes that form a joint do not all lie in some hyperplane.
Lemma 3.9. Let $(J, \mathcal{F})$ be a joints configuration of planes in $\mathbb{R}^6$. Given a handicap $\alpha \in \mathbb{Z}^J$ and its associated priority order, and a positive integer $n$, choose $D_{p,F}(\alpha,n)$ as earlier. Then for every nonzero polynomial $g \in \mathbb{R}[x_1,\dots,x_6]$ of degree at most $n$, one has
$$D_1 D_2 D_3\, g(p) \ne 0$$
for some joint $p \in J$ formed by $F_1, F_2, F_3 \in \mathcal{F}$, and some $D_i \in D_{p,F_i}(\alpha,n)$ for each $i = 1, 2, 3$.
Proof. Suppose, on the contrary, that there were some nonzero $g \in \mathbb{R}[x_1,\dots,x_6]_{\le n}$ such that $D_1 D_2 D_3\, g(p) = 0$ for every $p \in J$, with $F_1, F_2, F_3 \in \mathcal{F}$ being the three planes passing through $p$, and every $D_i \in D_{p,F_i}(\alpha,n)$ for each $i = 1, 2, 3$.
Choose p ∈ J to minimize (p, v p (g)) under ≺, where v p (g) is the order of vanishing of g at p.
Recall that $D^r_{p,F}$ is the space of $r$-th order derivative operators at $p$ along $F$. Since $g$ vanishes to order exactly $v_p(g)$ at $p$ and the planes $F_1, F_2, F_3$ do not all lie in one hyperplane, there exist $D_1 \in D^{r_1}_{p,F_1}$, $D_2 \in D^{r_2}_{p,F_2}$, $D_3 \in D^{r_3}_{p,F_3}$ with $D_1 D_2 D_3\, g(p) \ne 0$ and $r_1 + r_2 + r_3 = v_p(g)$. Among all choices of $D_1, D_2, D_3$ (including choices of $r_1, r_2, r_3$), choose ones so that $|\{i \in [3] : D_i \in D_{p,F_i}(\alpha,n)\}|$ is maximized. By the assumption at the beginning of the proof, one must have $D_i \notin D_{p,F_i}(\alpha,n)$ for some $i$. Relabeling if necessary, assume that $D_1 \notin D_{p,F_1}(\alpha,n)$.

Suppose $p' \in F_1 \cap J$ and $r' \in \mathbb{Z}_{\ge 0}$ satisfy $(p',r') \prec (p,r_1)$. We get $(p', r' + r_2 + r_3) \prec (p, r_1 + r_2 + r_3) = (p, v_p(g))$. By the choice of $p$, we have $(p, v_p(g)) \preceq (p', v_{p'}(g))$. Thus $(p', r' + r_2 + r_3) \prec (p', v_{p'}(g))$, and hence $r' + r_2 + r_3 < v_{p'}(g)$. It follows that $D D_2 D_3\, g(p') = 0$ for all $D \in D^{r'}_{p',F_1}$ by the definition of vanishing order.

From the above paragraph we deduce that $D_2 D_3\, g$ lies in the common kernel of $B^{r'}_{p',F_1}(\alpha,n)$ ranging over all $(p', r') \in (F_1 \cap J) \times \mathbb{Z}_{\ge 0}$ with $(p',r') \prec (p, r_1)$. Since $D_1 D_2 D_3\, g(p) \ne 0$, we deduce that $D_2 D_3\, g$ does not lie in the common kernel of $B^{r_1}_{p,F_1}(\alpha,n)$, i.e., there is some $D \in D^{r_1}_{p,F_1}(\alpha,n)$ with $D D_2 D_3\, g(p) \ne 0$. But this $D$ contradicts the earlier assumption that the choice of $(D_1, D_2, D_3)$ maximizes $|\{i : D_i \in D_{p,F_i}(\alpha,n)\}|$.
The next inequality uses parameter counting.

Lemma 3.10. Assume the same setup as Lemma 3.9. We have
$$\sum_{p \in J} \prod_{F \ni p} |D_{p,F}(\alpha,n)| \ge \binom{n+6}{6},$$
where the product is over the three planes chosen at the joint $p$.
Proof. Denote the left-hand side by $A$ and the right-hand side by $B$. Consider the constraints on $g \in \mathbb{R}[x_1,\dots,x_6]_{\le n}$ where for each $p \in J$ formed by the planes $F_1, F_2, F_3 \in \mathcal{F}$, we require $D_1 D_2 D_3\, g(p) = 0$ for all $D_i \in D_{p,F_i}(\alpha,n)$, $i = 1, 2, 3$. This requirement asks $A$ linear functionals on $\mathbb{R}[x_1,\dots,x_6]_{\le n}$, which has dimension $B$, to vanish at $g$. Hence, if $A < B$, then there exists a nonzero polynomial $g$ in $\mathbb{R}[x_1,\dots,x_6]_{\le n}$ that satisfies all the conditions, which would contradict Lemma 3.9.
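The counting principle used here — fewer linear functionals than the dimension of the space forces a nonzero common solution — is easy to demonstrate concretely. Below is a toy instance of our own in two variables rather than six, with point evaluations as the functionals (the code and names are ours, not the paper's).

```python
from fractions import Fraction
from math import comb

def kernel_vector(rows, ncols):
    """A nonzero vector in the common kernel of the functionals given as rows;
    one must exist whenever len(rows) < ncols (fewer constraints than dimensions)."""
    rows = [[Fraction(x) for x in r] for r in rows]
    pivots, rk = [], 0
    for col in range(ncols):
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        rows[rk] = [x / rows[rk][col] for x in rows[rk]]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [a - f * c for a, c in zip(rows[i], rows[rk])]
        pivots.append(col)
        rk += 1
    free = next(c for c in range(ncols) if c not in pivots)
    vec = [Fraction(0)] * ncols
    vec[free] = Fraction(1)
    for i, col in enumerate(pivots):
        vec[col] = -rows[i][free]
    return vec

# the dimension B of the proof: degree-<= n polynomials in 6 variables
assert comb(4 + 6, 6) == 210   # n = 4

# toy instance: 5 evaluation functionals on the 6-dimensional space of
# degree-<= 2 polynomials in x, y, so some nonzero polynomial vanishes at all 5 points
pts = [(0, 0), (1, 0), (0, 1), (2, 3), (1, 1)]
mons = [(i, j) for i in range(3) for j in range(3 - i)]
rows = [[Fraction(px) ** i * Fraction(py) ** j for (i, j) in mons] for px, py in pts]
g = kernel_vector(rows, len(mons))
assert any(c != 0 for c in g)
assert all(sum(c * a for c, a in zip(g, row)) == 0 for row in rows)
```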
3.6 Choosing the handicaps. We say that a joints configuration $(J, \mathcal{F})$ is connected if the following graph is connected: the vertex set is $J$, with two joints adjacent if there is some plane in $\mathcal{F}$ containing both joints.
Lemma 3.11. Let $(J, \mathcal{F})$ be any connected joints configuration, and let $n$ be some positive integer. Then there exists a choice of handicap $\alpha \in \mathbb{Z}^J$ such that the normalized products $W_p(\alpha) := \binom{n+2}{2}^{-3} \prod_{F \ni p} |D_{p,F}(\alpha,n)|$ (the product over the three planes forming $p$) satisfy
$$|W_p(\alpha) - W_q(\alpha)| \le C/n \quad \text{for all } p, q \in J,$$
for some constant $C$ that only depends on $(J, \mathcal{F})$ but not $n$.
Proof. Fix $n$ throughout the proof, and consider $W_p(\alpha)$, as defined in the statement, for all $p \in J$. The $\alpha_p$ are arbitrary integers. However, note that shifting all $\alpha_p$ by the same constant does not affect the priority order and thus does not affect $W_p(\alpha)$. Furthermore, by Lemma 3.6, if the handicap differs by more than $n$ at two points on the same plane, then $W_p(\alpha) = 0$ for the point $p$ with the smaller handicap. Therefore, there are only finitely many possibilities for the vector $(W_p(\alpha) : p \in J)$. Among those possibilities, choose one so that after sorting the values $W_p(\alpha)$ in descending order, the resulting vector is least in lexicographical order over all such possible vectors. Suppose that the sorted result is
$$W_{p_1}(\alpha) \ge W_{p_2}(\alpha) \ge \cdots \ge W_{p_{|J|}}(\alpha).$$
We will show that $W_{p_i}(\alpha) - W_{p_{i+1}}(\alpha) \le C'/n$ for each $i$, for some constant $C'$ to be determined. This will imply the desired statement with $C = |J| C'$.
Suppose for the sake of contradiction that the above claim does not hold. Let $t$ be the least positive integer such that $W_{p_t}(\alpha) - W_{p_{t+1}}(\alpha) > C'/n$. Then let $v = e_{p_1} + \cdots + e_{p_t}$ and let $\alpha' = \alpha - v$ be a new handicap. We will consider the difference between $W_p(\alpha)$ and $W_p(\alpha')$. By Lemma 3.8,
$$\big|\, |B_{p,F}(\alpha,n)| - |B_{p,F}(\alpha',n)| \,\big| \le (n+1)|J|$$
for each joint $p$ on each plane $F$. We have $|D_{p,F}(\alpha,n)| \le \binom{n+2}{2}$ by (3.1). We use the following telescoping inequality: for $x_1, x_2, x_3, y_1, y_2, y_3 \in [0,1]$,
$$|x_1 x_2 x_3 - y_1 y_2 y_3| \le |x_1 - y_1| + |x_2 - y_2| + |x_3 - y_3|.$$
Applying it with $x_i = |D_{p,F_i}(\alpha,n)|/\binom{n+2}{2}$ and $y_i = |D_{p,F_i}(\alpha',n)|/\binom{n+2}{2}$ yields
$$|W_p(\alpha) - W_p(\alpha')| \le 3 \cdot \frac{(n+1)|J|}{\binom{n+2}{2}} = \frac{6|J|}{n+2} \le \frac{C'}{2n},$$
where we choose $C' = 12|J|$.
By the monotonicity established in Lemma 3.7, we know that $W_{p_i}(\alpha') \le W_{p_i}(\alpha)$ for each $i \le t$, while $W_p(\alpha') \ge W_p(\alpha)$ for every $p \notin \{p_1, \dots, p_t\}$. However, since the difference between $W_p(\alpha)$ and $W_p(\alpha')$ is at most $C'/2n$ for every $p$, and $W_{p_t}(\alpha) - W_{p_{t+1}}(\alpha) > C'/n$, we know that $W_{p_1}(\alpha'), \dots, W_{p_t}(\alpha')$ are still the $t$ largest values among $(W_p(\alpha'))_{p \in J}$. If any of these $t$ values strictly decreased, then $\alpha'$ would give a strictly lower lexicographical order of the sorted vector, which is a contradiction. Hence $W_{p_i}(\alpha) = W_{p_i}(\alpha')$ for each $i \le t$. By the same argument, we know that $W_{p_i}(\alpha) = W_{p_i}(\alpha') = W_{p_i}(\alpha - c v)$ holds for each $i \le t$ and any positive integer $c$. By connectedness, we can find some $i \le t < j$ such that $p_i$ and $p_j$ are on the same plane. As a consequence, if $c$ is chosen sufficiently large such that $\alpha_{p_i} - c < \alpha_{p_j} - n$, this implies that $W_{p_i}(\alpha - c v) = 0$, and hence $W_{p_i}(\alpha) = 0$. By our ordering this implies that $W_{p_{i'}}(\alpha) = 0$ for all $i' \ge i$. In particular, $W_{p_t}(\alpha) = W_{p_{t+1}}(\alpha) = 0$, contradicting our earlier assumption that $W_{p_t}(\alpha) - W_{p_{t+1}}(\alpha) > C'/n$.
We are now ready to prove the joints theorem for a set of planes in R 6 .
Proof (Proof that $N$ planes in $\mathbb{R}^6$ have at most $\sqrt{10/3}\, N^{3/2}$ joints). Assume first that the joints configuration is connected. Let $n$ be some large positive integer. In this proof we will use $O$-notation to suppress constants that can depend on $(J, \mathcal{F})$ arbitrarily as long as they are independent of $n$. Choose $\alpha$ according to Lemma 3.11. Then there exists $W$ such that
$$|W_p(\alpha) - W| = O(1/n)$$

for all $p \in J$. By Lemma 3.10, we have
$$|J| \cdot W \cdot \binom{n+2}{2}^3 \ge \binom{n+6}{6} - O(n^5).$$
Therefore
$$W \ge \frac{1}{|J|} \cdot \frac{\binom{n+6}{6} - O(n^5)}{\binom{n+2}{2}^3} = \frac{1}{90|J|}\big(1 - O(1/n)\big).$$
So there is some constant $c > 0$ (depending on $J$ but not on $n$) so that $W \in [c, 1]$ for all sufficiently large $n$. For each $p \in J$, by a Taylor series approximation,
$$\sum_{F \ni p} |D_{p,F}(\alpha,n)| \ge 3 \Big( \prod_{F \ni p} |D_{p,F}(\alpha,n)| \Big)^{1/3} = 3\, W_p(\alpha)^{1/3} \binom{n+2}{2} \ge 3 \big(W^{1/3} - O(1/n)\big) \binom{n+2}{2}.$$
Hence (in the summations, $p$ ranges over joints and $F$ ranges over planes in $\mathcal{F}$),
$$N \binom{n+2}{2} = \sum_{F} \sum_{p \in F \cap J} |B_{p,F}(\alpha,n)| \ge \sum_{p \in J} \sum_{F \ni p} |D_{p,F}(\alpha,n)| \ge 3 |J| \big(W^{1/3} - O(1/n)\big) \binom{n+2}{2}.$$
By comparing the leading term in the upper bound and the lower bound of $W$, i.e., letting $n$ go to infinity, we get that
$$N \ge 3 |J| \Big( \frac{1}{90|J|} \Big)^{1/3},$$
and by rearranging we get that
$$|J| \le \sqrt{10/3}\, N^{3/2}.$$
The above argument proves the result for connected joints configurations. In general, decompose the joints configuration $(J, \mathcal{F})$ into connected components (in the sense of the associated graph) $(J_1, \mathcal{F}_1), \dots, (J_k, \mathcal{F}_k)$. Denote $N_i = |\mathcal{F}_i|$. Then
$$|J| = \sum_{i=1}^k |J_i| \le \sqrt{10/3} \sum_{i=1}^k N_i^{3/2} \le \sqrt{10/3} \Big( \sum_{i=1}^k N_i \Big)^{3/2} \le \sqrt{10/3}\, N^{3/2}.$$
Remark. The arguments here generalize straightforwardly to joints of flats in arbitrary dimensions.
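The constants driving this computation can be sanity-checked numerically: the ratio $\binom{n+6}{6}/\binom{n+2}{2}^3$ of the two sides of the parameter count tends to $(1/720)/(1/8) = 1/90$, and the rearrangement $N^3 \ge (27/90)|J|^2$ uses $90/27 = 10/3$. A quick check (our own illustration, not from the paper):

```python
from fractions import Fraction
from math import comb

def ratio(n):
    """binom(n+6,6) / binom(n+2,2)^3; the leading terms n^6/720 and
    (n^2/2)^3 = n^6/8 give the limit 8/720 = 1/90."""
    return Fraction(comb(n + 6, 6), comb(n + 2, 2) ** 3)

assert abs(float(ratio(10**4)) * 90 - 1) < 0.01    # ratio -> 1/90
# N^3 >= 27|J|^3/(90|J|) = (27/90)|J|^2 rearranges, via 90/27 = 10/3,
# to |J| <= sqrt(10/3) * N^(3/2)
assert Fraction(90, 27) == Fraction(10, 3)
```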

Derivatives along Varieties
In this section we discuss how to generalize the argument in Sect. 3 to varieties in $\mathbb{F}^d$. There are two issues that we need to address. The first is to define appropriate higher-order directional derivatives along varieties; as we explain below, it does not suffice to simply take derivatives along the tangent plane, as those miss the higher-order data of the variety. The second is to generalize derivatives from the reals to arbitrary fields. Since we are working with polynomials, differentiation can be viewed as a formal algebraic operation; to handle fields of positive characteristic, we use Hasse derivatives.
Let $R_V := \mathbb{F}[x_1,\dots,x_d]/I(V)$ denote the coordinate ring of a variety $V$. The elements of $R_V$ are called regular functions on $V$. Let $p$ be a regular point on the $k$-dimensional variety $V$, that is, a point where the Zariski tangent space of $V$ at $p$ is also $k$-dimensional. Given a nonnegative integer $r$, we would like to write down derivative operators $D$ on $\mathbb{F}[x_1,\dots,x_d]$ so that $Dg(p)$ is well defined not just when $g \in \mathbb{F}[x_1,\dots,x_d]$, but also when $g$ is a regular function on $V$. The point here is that regular functions on $V$ may be represented as polynomials in $\mathbb{F}[x_1,\dots,x_d]$ in non-unique ways (by adding a polynomial that vanishes on $V$), but we should study derivative operators $D$ whose evaluation $Dg(p)$ does not depend on this representation of $g$.

4.1 An explicit example.

We consider the explicit example of the circle $V$ in $\mathbb{R}^2$ centered at $(0, 1/2)$ of radius $1/2$. In particular, $V$ is defined by the equation $y = x^2 + y^2$. Let $p = (0,0)$ be the origin. How should we define a second-order derivative at $p$ along $V$? Naively one might take $\partial^2/\partial x^2$ since the tangent at $p$ is the $x$-coordinate direction. However, consider evaluating this derivative at $p$ on the two sides of $y = x^2 + y^2$ (an identity of regular functions on $V$): the left-hand side gives $0$ while the right-hand side gives $2$. So $\partial^2/\partial x^2$ does not induce a linear functional on the space of regular functions on $V$.
To fix this issue, we can rewrite all regular functions on $V$ as power series centered at $p$ using the local coordinate $x$ of $V$. Indeed, by repeatedly substituting $y \leftarrow x^2 + y^2$, we can write $y$ as a power series in $x$:
$$y = x^2 + x^4 + 2x^6 + \cdots.$$
We would like a derivative operator $D$ on $\mathbb{R}[x,y]$ so that $Dg(0,0)$ equals the coefficient of $x^2$ in $g(x, x^2 + x^4 + 2x^6 + \cdots)$, which in turn equals the coefficient of $x^2$ plus the coefficient of $y$ in $g(x,y)$. It is not hard to see that the only such choice is
$$D = \frac{1}{2}\frac{\partial^2}{\partial x^2} + \frac{\partial}{\partial y}.$$
Conversely, it is not hard to check that $Dg(0,0) = 0$ for every $g \in \mathbb{R}[x,y]$ that vanishes identically on $V$.
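The repeated substitution $y \leftarrow x^2 + y^2$ and the operator $D$ can be carried out mechanically. The following Python sketch (our own illustration; all names are ours) computes the truncated power series — whose coefficients $1, 1, 2, 5, \dots$ are the Catalan numbers — and checks that $D$ annihilates the defining relation $y - x^2 - y^2$ at the origin.

```python
def mul(a, b, m):
    """Product of two truncated power series (coefficient lists), kept to degree <= m."""
    out = [0] * (m + 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j <= m:
                    out[i + j] += ai * bj
    return out

def circle_series(m):
    """Iterate y <- x^2 + y^2 to solve for y = h(x) on the circle, mod x^(m+1)."""
    h = [0] * (m + 1)
    for _ in range(m + 1):  # each pass stabilizes further coefficients
        h2 = mul(h, h, m)
        h = [(1 if i == 2 else 0) + h2[i] for i in range(m + 1)]
    return h

def D_origin(g):
    """Dg(0,0) for D = (1/2) d^2/dx^2 + d/dy, i.e. [x^2]g + [y]g;
    g is a dict mapping exponent pairs (i, j) to coefficients."""
    return g.get((2, 0), 0) + g.get((0, 1), 0)

h = circle_series(8)
assert h[2::2] == [1, 1, 2, 5]          # Catalan numbers, as in the series above
assert h[1::2] == [0, 0, 0, 0]          # only even powers of x appear
# D kills the defining relation y - x^2 - y^2 at the origin:
assert D_origin({(0, 1): 1, (2, 0): -1, (0, 2): -1}) == 0
```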
Elaborating on this example further, for each nonnegative integer $r$, we will define $D^r_{p,V}$ to be a one-dimensional space spanned by a derivative operator $D$ on $\mathbb{R}[x,y]$ such that $Dg(0,0)$ equals the coefficient of $x^r$ in $g(x, x^2 + x^4 + 2x^6 + \cdots)$. Thus (here $\langle \cdot \rangle$ denotes the span)
$$D^0_{p,V} = \langle 1 \rangle, \quad D^1_{p,V} = \Big\langle \frac{\partial}{\partial x} \Big\rangle, \quad D^2_{p,V} = \Big\langle \frac{1}{2}\frac{\partial^2}{\partial x^2} + \frac{\partial}{\partial y} \Big\rangle, \quad \dots$$
The computation in the above example can be extended to any variety over any field, as we explain below.

4.2 Local coordinates.
Given a regular point $p$ on a $k$-dimensional variety $V$, after a translation and a linear change of coordinates, suppose that $p$ is at the origin and the first $k$ coordinate vectors are tangent to $V$. Then by assumption, there are polynomials $f_{k+1}, \dots, f_d$ without any constant or linear terms so that on $V$, we have $x_i = f_i(x_1, \dots, x_d)$ for each $i = k+1, \dots, d$. For each $i = k+1, \dots, d$, by repeated substitutions using the defining equations, as functions on $V$, we can write each $x_i$ as a formal power series $h_i(x_1, \dots, x_k)$ in the local coordinates $x_1, \dots, x_k$ for $V$ at $p$.
The procedure of taking a power series described earlier can be described in algebraic geometry as a completion. We give a quick summary here and refer the reader to a standard algebraic geometry textbook, e.g., [Eis95, Chapter 7] or [Vak17, Chapter 29]. Let $p$ be a regular point on a $k$-dimensional variety $V$ in $\mathbb{F}^d$. Let $\mathfrak{m}_p \subset R_V$ be the maximal ideal of regular functions that vanish at $p$. Then the completion $\hat{R}_{p,V}$ of $R_V$ at $p$ is the inverse limit $\varprojlim R_V/\mathfrak{m}_p^m$. The family of projection maps $R_V \to R_V/\mathfrak{m}_p^m$ induces a map $\iota_{p,V} : R_V \to \hat{R}_{p,V}$. The completion should be thought of as the ring of formal power series around $p$. For example, when $R_V = \mathbb{F}[x]$ and $\mathfrak{m}_p = (x)$, the completion is the ring of formal power series $\mathbb{F}\llbracket x \rrbracket$. More generally, for a regular point $p$ on $V$, assuming that $p$ is the origin and $x_1, \dots, x_k \in \mathfrak{m}_p$ span the Zariski cotangent space $\mathfrak{m}_p/\mathfrak{m}_p^2$, the map $\mathbb{F}\llbracket x_1, \dots, x_k \rrbracket \to \hat{R}_{p,V}$ sending $x_i$ to $\iota_{p,V}(x_i)$ is an isomorphism (say, by the Cohen structure theorem). In other words, there is a local coordinate system at $p$ so that every regular function on $V$ can be written as a formal power series around $p$.
It will be useful to know that the formal power series expansion of a regular function is zero if and only if the regular function is zero, i.e., the completion map $R_V \to \hat{R}_{p,V}$ is injective. This fact follows from the Krull intersection theorem (recall that our varieties are always irreducible).

4.3 Hasse derivatives.
In the explicit example earlier, the main goal of taking derivatives is to extract coefficients. This is a formal algebraic procedure that does not rely on real analysis. To allow for arbitrary fields, including those of positive characteristic, we use an algebraic variant known as Hasse derivatives, whose definition and basic properties we summarize below. For proofs of these basic properties of Hasse derivatives, we refer the reader to [DKSS13], where Hasse derivatives were used to study the finite field Kakeya problem. For $d$-tuples $\omega = (\omega_1, \dots, \omega_d)$ and $\delta = (\delta_1, \dots, \delta_d)$ of nonnegative integers (writing $x^\delta := x_1^{\delta_1} \cdots x_d^{\delta_d}$ and $\binom{\delta}{\omega} := \binom{\delta_1}{\omega_1} \cdots \binom{\delta_d}{\omega_d}$), the Hasse derivative $H^\omega$ is the linear operator defined by
$$H^\omega x^\delta = \binom{\delta}{\omega} x^{\delta - \omega}$$
for every $d$-tuple $\delta = (\delta_1, \dots, \delta_d)$ of nonnegative integers.
In particular, $H^\omega x^\delta = 0$ unless $\delta \ge \omega$ coordinatewise. Over the reals, it is not hard to see that the two notions of derivatives are related by a constant factor:
$$H^\omega = \frac{1}{\omega_1! \cdots \omega_d!} \cdot \frac{\partial^{\omega_1 + \cdots + \omega_d}}{\partial x_1^{\omega_1} \cdots \partial x_d^{\omega_d}}.$$
Like usual derivatives, Hasse derivatives commute:
$$H^{\omega} H^{\omega'} = \binom{\omega + \omega'}{\omega} H^{\omega + \omega'} = H^{\omega'} H^{\omega}.$$
Hasse derivatives form an algebraic generalization of the usual derivatives when acting on polynomials or formal power series. The evaluation of a Hasse derivative corresponds to coefficient extraction (without the factorial factors that might be troublesome in fields of positive characteristic). Indeed, we have the following "Taylor's theorem": given formal variables $x_1, \dots, x_d, y_1, \dots, y_d$ and a polynomial (or formal power series) $g$,
$$g(x_1 + y_1, \dots, x_d + y_d) = \sum_{\omega} (H^\omega g)(x_1, \dots, x_d)\, y^\omega.$$

By an affine change of coordinates, assume that $p$ is at the origin, and the tangent space of $V$ at $p$ is spanned by the first $k$ coordinate directions. For each $i = k+1, \dots, d$, write each $x_i$ as a formal power series $h_i(x_1, \dots, x_k)$ in the "local coordinates" $x_1, \dots, x_k$ for $V$ at $p$. Equivalently, $h_i(x_1, \dots, x_k)$ is the image of $x_i$ under the completion map $R_V \to \hat{R}_{p,V} \cong \mathbb{F}\llbracket x_1, \dots, x_k \rrbracket$.
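Hasse derivatives are straightforward to implement directly from the defining formula $H^\omega x^\delta = \binom{\delta}{\omega} x^{\delta-\omega}$. The sketch below (our own illustration; polynomials are dictionaries from exponent tuples to coefficients) verifies the composition rule — hence commutativity — and shows why Hasse derivatives stay informative in positive characteristic, where the usual derivative of $x^p$ vanishes.

```python
from math import comb

def hasse(omega, poly):
    """Hasse derivative H^omega: H^omega x^delta = binom(delta, omega) x^(delta - omega),
    with the binomial coefficient taken coordinatewise; `poly` maps exponent
    tuples to coefficients."""
    out = {}
    for delta, c in poly.items():
        if all(d >= w for d, w in zip(delta, omega)):
            coeff = c
            for d, w in zip(delta, omega):
                coeff *= comb(d, w)
            key = tuple(d - w for d, w in zip(delta, omega))
            out[key] = out.get(key, 0) + coeff
    return {k: v for k, v in out.items() if v}

p = {(3, 2): 1, (1, 1): 4}   # x^3 y^2 + 4xy
# composition rule H^w H^w' = binom(w + w', w) H^(w + w'); in particular they commute
assert hasse((1, 0), hasse((1, 1), p)) == hasse((1, 1), hasse((1, 0), p))
assert hasse((1, 0), hasse((1, 1), p)) == {k: comb(2, 1) * v for k, v in hasse((2, 1), p).items()}
# over F_5 the usual derivative of x^5 has coefficient 5 = 0, yet H^(5)
# still extracts the x^5 coefficient (no factorial factors appear)
assert hasse((5,), {(5,): 1}) == {(0,): 1}
assert (5 * 1) % 5 == 0
```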
Define $B^r_{p,V}(\alpha,n) := \{g \mapsto Dg(p) : D \in D^r_{p,V}(\alpha,n)\}$. Finally, write $B_{p,V}(\alpha,n) := \bigsqcup_{r \ge 0} B^r_{p,V}(\alpha,n)$ and $D_{p,V}(\alpha,n) := \bigsqcup_{r \ge 0} D^r_{p,V}(\alpha,n)$. From the Krull intersection theorem, it follows that for every $p \in V$, the functionals $g \mapsto Dg(p)$ with $D \in \bigcup_{r \ge 0} D^r_{p,V}$ span the dual space of $R_{V,\le n}$. Hence the disjoint union $\bigsqcup_p B_{p,V}(\alpha,n)$ is a basis of the space of linear forms on $R_{V,\le n}$. Thus
$$\sum_p |B_{p,V}(\alpha,n)| = \dim R_{V,\le n}.$$
Furthermore there is some $n_0(V)$ so that $\dim R_{V,\le n}$ is a polynomial in $n$ for all $n \ge n_0(V)$. This is a standard fact about the Hilbert series for a variety (see, e.g., [Vak17, Chapter 18.6]).

Regular functions with given vanishing orders.
This subsection parallels Sect. 3.3. Here we fix a $k$-dimensional variety $V$ and a finite set of points $P \subset V$. Given a vector $v \in \mathbb{Z}^P_{\ge 0}$, define
$$T(v,n) = \{g \in R_{V,\le n} : g \text{ vanishes to order} \ge v_p \text{ at each } p \in P\}.$$
Set $b_p(v,n) := \operatorname{codim}_{T(v,n)} T(v + e_p, n)$. We omit the proofs of the next two lemmas, which mirror those of Sect. 3.3, except to note that the last line of the proof of Lemma 3.4 should be adapted as
$$\operatorname{codim}_{T(\vec{0},n)} T(\vec{0}, n-1) = \dim R_{V,\le n} - \dim R_{V,\le n-1} = \frac{\deg V}{(k-1)!}\, n^{k-1} + O(n^{k-2}).$$
To see this, we use the fact that $\dim R_{V,\le n}$, for sufficiently large $n$, equals a polynomial (the Hilbert polynomial) whose leading term is given in (5.1). The right-hand side is the finite difference of this polynomial, which can readily be seen to have the above form.
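The finite-difference step can be illustrated on the simplest case $V = \mathbb{F}^k$ (degree $1$), where $\dim R_{V,\le n} = \binom{n+k}{k}$ exactly: the difference $\binom{n+k}{k} - \binom{n+k-1}{k} = \binom{n+k-1}{k-1}$ has leading term $n^{k-1}/(k-1)!$. A quick numerical check of our own:

```python
from math import comb, factorial

k, degV = 3, 1   # V = affine 3-space, a degree-1 "variety"

def dim_R(n):
    """dim R_{V,<=n} for V = F^k: binom(n+k, k) = n^k/k! + lower-order terms."""
    return comb(n + k, k)

n = 10**6
diff = dim_R(n) - dim_R(n - 1)
assert diff == comb(n + k - 1, k - 1)   # Pascal's rule
# leading term (deg V) * n^(k-1)/(k-1)! of the finite difference:
assert abs(diff / (degV * n ** (k - 1) / factorial(k - 1)) - 1) < 1e-5
```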