Joints of varieties

We generalize the Guth--Katz joints theorem from lines to varieties. A special case says that $N$ planes (2-flats) in 6 dimensions (over any field) have $O(N^{3/2})$ joints, where a joint is a point contained in a triple of these planes not all lying in some hyperplane. More generally, we prove the same bound when the set of $N$ planes is replaced by a set of 2-dimensional algebraic varieties of total degree $N$, and a joint is a point that is regular for three varieties whose tangent planes at that point are not all contained in some hyperplane. Our most general result gives upper bounds, tight up to constant factors, for joints with multiplicities for several sets of varieties of arbitrary dimensions (known as Carbery's conjecture). Our main innovation is a new way to extend the polynomial method to higher dimensional objects, relating the degree of a polynomial and its orders of vanishing on a given set of points on a variety.


Introduction
Guth and Katz [18] proved the following "joints theorem": $N$ lines in $\mathbb{R}^3$ have $O(N^{3/2})$ joints, where a joint is a point contained in three of the lines that do not all lie on some plane. This bound is tight up to a constant factor due to the following example: consider $k$ generic planes; their pairwise intersections give $\binom{k}{2}$ lines, and their triplewise intersections give $\binom{k}{3}$ joints. The joints problem was first studied in Chazelle et al. [7]. Besides being an interesting problem in incidence geometry, it also caught the attention of harmonic analysts due to connections to the Kakeya problem, as observed by Wolff [34]. This connection was further elucidated by Bennett, Carbery, and Tao [1] in their work on the multilinear Kakeya problem, which in turn allowed them to improve bounds on the joints problem (prior to the Guth--Katz solution). Guth [13] later adapted techniques from the solution of the joints theorem to prove the so-called endpoint case of the Bennett--Carbery--Tao multilinear Kakeya conjecture, which can be viewed as a joints theorem for tubes (also see the exposition in [15, Section 15.8]). Guth's multilinear Kakeya result was later generalized by Zhang [38] to slabs and neighborhoods of varieties (though the latter does not translate back to the joints problem for flats).
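The arithmetic behind this tightness example is worth recording (a routine computation, spelled out here for convenience):

```latex
N = \binom{k}{2} \sim \frac{k^2}{2}, \qquad
\#\{\text{joints}\} = \binom{k}{3} \sim \frac{k^3}{6}
  = \frac{(2N)^{3/2}}{6}\bigl(1 + o(1)\bigr) = \Theta\bigl(N^{3/2}\bigr).
```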
The Guth-Katz solution of the joints problem highlights the importance of the polynomial method. Their joints theorem was also a precursor to their subsequent breakthrough on the Erdős distinct distances problem [19], which introduced a polynomial partitioning method that has found many subsequent applications. One of the key steps in [19] dealt with a point-line incidence problem in R 3 with additional constraints on the configuration of lines. These developments were partly inspired by Dvir's [8] stunningly short and elegant solution to the finite field Kakeya problem. Guth has also successfully applied the polynomial method developed in this line of work to restriction problems related to Kakeya [16,17].
Since Guth and Katz's original work, there has been significant effort in extending the joints theorem [4,5,6,12,20,21,22,23,24,25,26,37,39]. Kaplan, Sharir, and Shustin [25] and Quilodrán [26] independently extended the joints theorem from $\mathbb{R}^3$ to $\mathbb{R}^d$, and these techniques and results extend to arbitrary fields as stated below (also see [4,9,29]). Given a set of lines in $\mathbb{F}^d$, a joint is a point contained in $d$ lines with independent and spanning directions. Throughout the paper, $\mathbb{F}$ stands for an arbitrary field, and our constants do not depend on $\mathbb{F}$.

Theorem 1.1. A set of $N$ lines in $\mathbb{F}^d$ has at most $C_d N^{d/(d-1)}$ joints, for some constant $C_d$.

Recently, Yu and Zhao [37] proved that $N$ lines in $\mathbb{F}^d$ have at most $\frac{((d-1)!)^{1/(d-1)}}{d} N^{d/(d-1)}$ joints. This leading constant is optimal, matching the above construction up to a $(1+o(1))$-factor.
We generalize the joints theorem from lines to varieties, overcoming a fundamental difficulty with the polynomial method that one quickly runs into; we will elaborate more on this later. A representative case of our result says the following. Here a joint is a point contained in a triple of planes not all lying in some hyperplane. All our bounds on joints in this paper are tight up to a constant factor (depending on the dimension) due to a straightforward generalization of the example in the first paragraph.

Theorem 1.2. A set of $N$ planes in $\mathbb{F}^6$ has $O(N^{3/2})$ joints.
In his PhD thesis, Ben Yang [35,36] proved partial results giving an upper bound of $N^{3/2+o(1)}$ when $\mathbb{F} = \mathbb{R}$ (and also more generally for bounded degree varieties in $\mathbb{R}^d$; in contrast, our results on joints of varieties do not require any bounded degree hypotheses). Yang's results have two fundamental limitations: (1) an error term in the exponent and (2) the methods only work over the reals. He used a variant of the polynomial partitioning method [19], which requires real topology. More specifically, Yang applied polynomial partitioning for varieties (due to Guth [14] and extended by Blagojević, Blagojević, and Ziegler [2]) using bounded degree polynomials (due to Solymosi and Tao [27]), with the latter requiring an error term in the exponent. We introduce a novel approach that avoids both limitations.
The only other prior result on joints of higher dimensional objects says that, as a representative example, a set of $L$ lines and $F$ planes in $\mathbb{F}^4$ has $O(LF^{1/2})$ joints, where now a joint is defined to be a point contained in two lines and one plane, not all lying in a hyperplane (this result was recently independently proved by Yu and Zhao [37] and Carbery and Iliopoulou [5]; Yang mentioned at the end of his thesis [36] that he could also obtain this claim, though without details). Even the "next" case of "line-plane-plane" joints was open before this work.
Incidence geometry and the polynomial method concerning higher dimensional objects often tend to be substantially more intricate compared to problems that only involve lines and points. Our work introduces a new way to tackle such problems. Let us highlight some other representative works on higher dimensional incidence problems. Solymosi and Tao [27] introduced a bounded degree variation of the polynomial partitioning method, used in Yang's proof mentioned earlier, to give a nearly tight (up to a $+o(1)$ error term in the exponent) bound for incidences between points and $k$-dimensional varieties of bounded degree in $\mathbb{R}^d$, in the spirit of the Szemerédi--Trotter theorem [28] for point-line incidences in the plane. Using different methods, Walsh [32,33] recently developed powerful techniques for understanding incidences between sets of $m$-dimensional and $(m+1)$-dimensional varieties, thereby unifying a large body of incidence geometry results in the literature. However, we do not see how to apply Walsh's techniques to extend the joints theorem. The above approaches use different forms of "partitioning" and involve iteratively restricting the ambient space to a codimension-1 subvariety, which usually involves an increment in the degree of the ambient variety. By contrast, our strategy does not use any form of partitioning.
The main innovation of our work is a new method of relating degrees and orders of vanishing for multivariate polynomials. Earlier approaches, e.g., [27,32,33,35,38], consider multiple polynomials, and are related to understanding Bézout's theorem and possible inverses (see Tao's blog post [30] on inverse Bézout). Our approach instead only considers a single polynomial via parameter counting but we have to be extremely delicate in choosing vanishing conditions. We motivate and explain these ideas in Section 2. The polynomial method is already a powerful technique in discrete geometry, analysis, number theory, and theoretical computer science, and we hope that our method for handling higher dimensional objects will find additional applications.
The most general version of our result is Theorem 1.9 below, and it implies all the other statements. Next we gradually introduce the various generalizations and explain the history. The reader who is only interested in the proof of Theorem 1.2 can safely skip the rest of this section and proceed to Section 2 and Section 3 for the key ideas and the proof of Theorem 1.2.
1.1. Joints of flats. We extend Theorem 1.2 to flats of arbitrary dimensions. Given a collection of $k$-flats (i.e., $k$-dimensional flats) in $\mathbb{F}^{mk}$, a joint is defined to be a point contained in $m$ of these $k$-flats and not all contained in a single hyperplane.

Theorem 1.3. A set of $N$ $k$-flats in $\mathbb{F}^{mk}$ has at most $C_{m,k} N^{m/(m-1)}$ joints, for some constant $C_{m,k}$.

1.2. Multijoints. In the joints problem, instead of a single set of lines in $\mathbb{F}^d$, we can consider $d$ sets of lines $L_1, \ldots, L_d$ in $\mathbb{F}^d$ and consider joints formed by taking one line from each $L_i$ (each point is counted as a joint at most once, for now). This variation, known as "multijoints", can be viewed as a discrete analogue of the endpoint multilinear Kakeya problem. The following bound on multijoints was conjectured by Carbery, proved in $\mathbb{F}^3$ and $\mathbb{R}^d$ by Iliopoulou [24], and proved in general $\mathbb{F}^d$ by Zhang [39].

Theorem 1.4 (Multijoints of lines). Given sets $L_1, \ldots, L_d$ of lines in $\mathbb{F}^d$, the number of joints formed by taking one line from each $L_i$ is at most $C_d (|L_1| \cdots |L_d|)^{1/(d-1)}$ for some constant $C_d$.

Note that the multijoints theorem is equivalent to the joints theorem if the $|L_i|$ are all within a constant factor of each other.
We extend the multijoints theorem from lines to flats. Here a point is a joint formed by several flats if these flats contain this point and have spanning and independent directions.

Theorem 1.5 (Multijoints of flats). Given $\mathcal{F}_1, \ldots, \mathcal{F}_r$, where $\mathcal{F}_i$ is a set of $k_i$-flats in $\mathbb{F}^d$, with $d = k_1 + \cdots + k_r$, the number of joints formed by taking one flat from each $\mathcal{F}_i$ is at most $C_{k_1,\ldots,k_r} (|\mathcal{F}_1| \cdots |\mathcal{F}_r|)^{1/(r-1)}$ for some constant $C_{k_1,\ldots,k_r}$.

1.3. Varieties. We extend the joints theorem from flats to varieties. Generalizing earlier notions, a point $p$ is a joint formed by several varieties $V_1, \ldots, V_r$ if $p$ is a regular point of each $V_i$ and their tangent spaces at $p$ have independent and spanning directions. (Recall that a point $p$ is a regular point of a variety $V$ if the Zariski tangent space $T_p V$ has the same dimension as $V$.) The proof of the joints theorem can be easily adapted from lines to algebraic curves (e.g., see [25,26]). Here we extend the joints theorem to higher dimensional varieties. Given a set $\mathcal{V}$ of varieties, let $\deg \mathcal{V}$ denote the sum of the degrees of the elements of $\mathcal{V}$.

Theorem 1.6 (Joints of varieties). A set $\mathcal{V}$ of $k$-dimensional varieties in $\mathbb{F}^{mk}$ has at most $C_{m,k} (\deg \mathcal{V})^{m/(m-1)}$ joints for some constant $C_{m,k}$.
Remark. In this paper, all varieties are assumed to be irreducible. We do not lose any generality for the joints problem with this assumption as one can always replace any algebraic set by its irreducible components.
Like earlier, we prove the result more generally for multiple sets of varieties.

Theorem 1.7 (Multijoints of varieties). Given $\mathcal{V}_1, \ldots, \mathcal{V}_r$, where $\mathcal{V}_i$ is a set of $k_i$-dimensional varieties in $\mathbb{F}^d$, with $d = k_1 + \cdots + k_r$, the number of joints formed by taking one variety from each $\mathcal{V}_i$ is at most $C_{k_1,\ldots,k_r} (\deg \mathcal{V}_1 \cdots \deg \mathcal{V}_r)^{1/(r-1)}$ for some constant $C_{k_1,\ldots,k_r}$.
Previously, Iliopoulou [24] proved the multijoints theorem for algebraic curves of bounded degree in $\mathbb{R}^d$ (here by bounded degree we mean that the leading constant $C$ depends on the maximum degree of the curves), but it was unknown how to generalize from $\mathbb{R}^d$ to $\mathbb{F}^d$, despite knowledge of the joints theorem for a single set of curves. This is because Zhang's proof [39] of the multijoints theorem for lines (Theorem 1.4) does not easily adapt to curves.
In the setting of real varieties, Yang [35] proved an upper bound of the form $C_\epsilon (|\mathcal{V}_1| \cdots |\mathcal{V}_r|)^{1/(r-1)+\epsilon}$ for all $\epsilon > 0$, where $C_\epsilon$ also depends on the maximum degree of the varieties.
1.4. Joints with multiplicities. In the above formulations of the joints and multijoints theorems, each point is counted as a joint at most once. Motivated by Kakeya problems, Carbery suggested a generalization where joints contained in many lines are counted with multiplicity. The following theorem about joints of lines with multiplicities was conjectured by Carbery, proved in $\mathbb{R}^3$ by Iliopoulou [21], and settled in general by Zhang [39].

Theorem 1.8 (Joints of lines with multiplicities). Let $L_1, \ldots, L_d$ be multisets of lines in $\mathbb{F}^d$. Let $M(p)$ denote the number of tuples of lines $(\ell_1, \ldots, \ell_d) \in L_1 \times \cdots \times L_d$ that form a joint at $p$. Summing over all such joints $p$, we have
\[ \sum_p M(p)^{1/(d-1)} \le C_d (|L_1| \cdots |L_d|)^{1/(d-1)}, \]
where $C_d$ is some constant.

Theorem 1.8 strengthens Theorem 1.4 (multijoints of lines). The exponent $1/(d-1)$ in $M(p)^{1/(d-1)}$ on the left-hand side is optimal, as can easily be seen by duplicating every element in each set of lines $m$ times for some large $m$.
Yang [35] studied a generalization of Theorem 1.8 to joints of varieties with multiplicities, but as earlier, his upper bound only holds in $\mathbb{R}^d$, carries a $+o(1)$ error term in the exponent, and has a leading constant depending on the maximum degree of the varieties.
Our main result, below, generalizes the above to joints of varieties counted with multiplicities. It generalizes all previously stated results.

Theorem 1.9 (Joints of varieties with multiplicities). For each $i = 1, \ldots, r$, let $\mathcal{V}_i$ be a multiset of $k_i$-dimensional varieties in $\mathbb{F}^d$, where $d = k_1 + \cdots + k_r$. Let $M(p)$ denote the number of tuples of varieties $(V_1, \ldots, V_r) \in \mathcal{V}_1 \times \cdots \times \mathcal{V}_r$ that form a joint at $p$. Summing over all such joints $p$, we have
\[ \sum_p M(p)^{1/(r-1)} \le C_{k_1,\ldots,k_r} (\deg \mathcal{V}_1 \cdots \deg \mathcal{V}_r)^{1/(r-1)}, \]
where $C_{k_1,\ldots,k_r}$ is some constant.
Our proof of Theorem 1.9 even in the case of lines is different from that of Zhang [39]. By our method, there is no significant difference between the proofs of Theorem 1.7 (without multiplicities) and Theorem 1.9 (with multiplicities).
1.5. Constants. We restate Theorems 1.7 and 1.9 in the following equivalent form with explicit constants. This superficially more general formulation (formulated in [37] for flats) exposes a difficulty hierarchy of the problem. It also allows us to discuss the leading constants. While the constants below are optimal for $(r, k_1, m_1) = (1, 1, d)$, they are likely not tight in all other cases.

Theorem 1.10 (Main theorem). Let $k_1, \ldots, k_r, m_1, \ldots, m_r$ be positive integers. For each $i = 1, \ldots, r$, let $\mathcal{V}_i$ be a finite multiset of $k_i$-dimensional varieties in $\mathbb{F}^d$, where $d = m_1 k_1 + \cdots + m_r k_r$. We only consider joints $p$ formed by choosing $m_i$ unordered elements from $\mathcal{V}_i$ for each $i = 1, \ldots, r$, and we write $M(p)$ for the number of such choices.
(a) (without multiplicities) The number of joints is at most $C \prod_{i=1}^r (\deg \mathcal{V}_i)^{m_i/(m-1)}$, where $m = m_1 + \cdots + m_r$ and $C = C_{k_1,\ldots,k_r; m_1,\ldots,m_r}$ is some constant.
(b) (with multiplicities) Summing over all joints $p$, we have $\sum_p M(p)^{1/(m-1)} \le C \prod_{i=1}^r (\deg \mathcal{V}_i)^{m_i/(m-1)}$.
Let us explain how various specializations of Theorem 1.10 correspond to earlier results.
Previously, the only other known case is $(r; k_1, k_2; m_1, m_2) = (2; k, 1; 1, d-k)$, i.e., a set of $k$-flats and a set of lines, where each joint is formed by one $k$-flat and $d-k$ lines, as proved independently by [5] and [37] (and stated without proof in [35]). Even the "next" case of $(r; k_1, k_2; m_1, m_2) = (2; 2, 1; 2, 1)$ was previously unknown, corresponding to a joint being formed by two planes and one line. Likewise, the case $(r; k_1, k_2, k_3; m_1, m_2, m_3) = (3; 2, 1, 1; 1, 1, 1)$, allowing one set of planes and two different sets of lines, was also previously unsolved.

(5) (Varieties) Theorems 1.6 and 1.7 relax the degree 1 assumption, generalizing from flats to varieties. The only previously known case was for a single set of curves [25,26], namely $r = 1$ and $k_1 = 1$, as well as multiple sets of bounded degree curves in $\mathbb{R}^n$ [24]. Theorem 1.7 is equivalent to Theorem 1.10(a) (other than constants).

(6) (Multiplicities) Finally, adding in considerations of joint multiplicities, Theorem 1.8 is equivalent to Theorem 1.10(b) for lines, while Theorem 1.9 is equivalent to Theorem 1.10(b) in general (other than constants).

For a single set of lines, i.e., $(r, k_1, m_1) = (1, 1, d)$, our result gives $C_{1;d} = 1$. While we know the optimal constant for joints of lines, our proof does not seem to give the optimal constant for flats or varieties. For $r = 1$ we conjecture that the optimal bound in Theorem 1.10 is $(m!/m^m)^{1/(m-1)} N^{m/(m-1)}$ for all $k$ and $m$, agreeing with joints of lines ($k = 1$). The first open case $(k, m) = (2, 3)$ is stated below.

Conjecture 1.11. A set of $N$ planes in $\mathbb{F}^6$ has at most $(\sqrt{2}/3 + o(1)) N^{3/2}$ joints.
1.6. Outline. We begin by motivating and describing, in Section 2, the key new ideas in our method. We then give, in Section 3, the proof in the special case of joints of planes in $\mathbb{R}^6$, which is representative of the general result. To obtain the result in full generality, we use higher order directional derivatives with respect to local coordinates along a variety, as well as Hasse derivatives to deal with arbitrary fields; both are discussed in Section 4. The complete proof of the main theorem then appears in Section 5.

Key ideas
2.1. Joints of lines. We begin by recalling the proof of Theorem 1.1 on joints of lines in $\mathbb{R}^3$, following [25,26] (also see Guth's book [15, Section 2.5] for a nice exposition). The proof exposes two tools that are essential in nearly all applications of the polynomial method: parameter counting and the vanishing lemma.
Let $\mathbb{R}[x_1, \ldots, x_d]_{\le n}$ denote the space of polynomials of degree at most $n$. Using that its dimension is $\binom{n+d}{d}$, we have the following simple yet extremely useful linear algebraic consequence.

Lemma 2.1 (Parameter counting). Given a set of fewer than $\binom{n+3}{3}$ points in $\mathbb{R}^3$, there exists a nonzero polynomial of degree at most $n$ that vanishes on all these points.
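To make the parameter count concrete, here is a minimal numerical sketch (function names are our own, not from the paper): the evaluation matrix of all monomials of degree at most $n$ at the given points has more columns than rows, so it has a nontrivial null space, and any null vector gives the coefficients of a vanishing polynomial.

```python
import numpy as np
from itertools import combinations_with_replacement
from math import comb

def vanishing_poly(points, n):
    """Nonzero polynomial of degree <= n in three variables vanishing at all
    the given points; requires len(points) < comb(n + 3, 3)."""
    # encode a monomial as a multiset of variable indices: (0, 0, 2) = x^2 z
    monos = [m for d in range(n + 1)
             for m in combinations_with_replacement(range(3), d)]
    assert len(points) < len(monos) == comb(n + 3, 3)
    # evaluation matrix: one row per point, one column per monomial
    A = np.array([[np.prod([p[i] for i in m]) for m in monos]
                  for p in points])
    # with fewer rows than columns the null space is nontrivial, and any
    # unit vector in it is a valid coefficient vector
    coeffs = np.linalg.svd(A)[2][-1]
    return monos, coeffs

def evaluate(monos, coeffs, p):
    return sum(c * np.prod([p[i] for i in m]) for m, c in zip(monos, coeffs))
```

For example, any 9 points with $n = 2$ give a $9 \times 10$ linear system, so a nonzero quadratic through all 9 points always exists.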
Given a set of $N$ lines forming $J$ joints in $\mathbb{R}^3$, let $g$ be a nonzero polynomial of minimum degree that vanishes at all $J$ joints. By the parameter counting lemma, we have $\deg g \le CJ^{1/3}$ for some constant $C > 0$.
The following elementary fact is key to the polynomial method.
Lemma 2.2 (Vanishing lemma). If a degree-$n$ polynomial vanishes at more than $n$ points on a line, then it vanishes on the whole line.
We claim that some line contains at most $CJ^{1/3}$ joints. Suppose, for contradiction, that every line contains more than $CJ^{1/3}$ joints. Since $\deg g \le CJ^{1/3}$, the vanishing lemma implies that $g$ vanishes on each of the $N$ lines. Since each joint is contained in three lines in spanning directions, the gradient $\nabla g$ vanishes at every joint. Thus $\partial g/\partial x$, $\partial g/\partial y$, $\partial g/\partial z$ all vanish at every joint. At least one of these partial derivatives is a nonzero polynomial of degree smaller than that of $g$, thereby contradicting the minimal degree assumption on $g$.
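The step from vanishing on lines to a vanishing gradient can be illustrated concretely (a toy example of our own choosing, not from the paper):

```python
def g(x, y, z):
    """A toy polynomial vanishing on all three coordinate axes, which pass
    through the origin with spanning directions (1,0,0), (0,1,0), (0,0,1)."""
    return x*y + y*z + z*x + x*y*z

# g restricted to each of the three axes is identically zero
assert all(g(t, 0, 0) == 0 and g(0, t, 0) == 0 and g(0, 0, t) == 0
           for t in (-2, 0, 1, 7))

# hence every directional derivative along an axis direction vanishes at the
# origin, and since the three directions span R^3, the whole gradient
# vanishes there; compare with the hand-computed gradient of g:
def grad_g(x, y, z):
    return (y + z + y*z, x + z + x*z, x + y + x*y)

print(grad_g(0, 0, 0))  # → (0, 0, 0)
```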
Thus some line contains at most $CJ^{1/3}$ joints. We can then remove this line and all its joints, and repeat the argument to find another line with at most $CJ^{1/3}$ joints. After we have removed all the lines, we have removed at most $CJ^{1/3} N$ joints, so $J \le CJ^{1/3} N$, and hence $J = O(N^{3/2})$. This completes the proof in the case of $\mathbb{R}^3$. This proof also extends to $\mathbb{F}^d$.

2.2. Vanishing on planes. How can we try to adapt the above proof to show that $N$ planes in $\mathbb{R}^6$ form $O(N^{3/2})$ joints? The main obstacle is to generalize the vanishing lemma from lines to planes. The above proof would extend verbatim to joints of planes if the answer to the following question were yes.
Attempt I. Given distinct points $p_1, \ldots, p_{\binom{n+2}{2}}$ in the plane, if $g \in \mathbb{R}[x, y]_{\le n}$ satisfies the vanishing conditions $g(p_1) = \cdots = g(p_{\binom{n+2}{2}}) = 0$, does this imply that $g$ is identically zero?

Of course, the answer to this question is no, since the vanishing locus of the polynomial on a plane could be a curve. Clearly it is impossible to force a two-variable polynomial to vanish by forcing it to vanish at any finite number of points. Instead of asking for polynomials to vanish at the joints, we can ask them to vanish to high multiplicity at the joints. This idea, known as the "method of multiplicities" [10], has been fruitful in the study of the joints problem [39,37], and it was also used to improve bounds on the finite field Kakeya problem [8,3].
Attempt II. Given a point $p_1$ in the plane, if $g \in \mathbb{R}[x, y]_{\le n}$ vanishes to order more than $n$ at $p_1$; equivalently, if $g$ satisfies the vanishing conditions $\frac{\partial^{i+j} g}{\partial x^i \partial y^j}(p_1) = 0$ for all $0 \le i + j \le n$, does this imply that $g$ is identically zero?
The answer to this one is yes, and it shows how using derivatives creates a correct vanishing lemma. However, this vanishing lemma is completely useless for our application since we want to use this vanishing lemma somehow to bound the number of joints lying on a plane and this method ignores all of the joints but one on each plane. Perhaps we can create a correct and useful vanishing lemma by combining the ideas of Attempts I and II.
Attempt III. Given distinct points $p_1, \ldots, p_m$ in the plane with $m \sim n^2/s^2$, if $g \in \mathbb{R}[x, y]_{\le n}$ vanishes to order at least $s$ at each point, does this imply that $g$ is identically zero?

Unfortunately the answer is no again. Indeed, $g(x, y) = y^s$ vanishes to order $s$ on the entire $x$-axis.
We have $\dim \mathbb{R}[x, y]_{\le n} = \binom{n+2}{2}$, which is less than the number of linear constraints on the coefficients of $g$ imposed by asking $g$ to vanish to order at least $s$ at $\Theta(n^2/s^2)$ given points (each such point gives $\binom{s+1}{2}$ constraints). So the counterexample implies that some of these linear constraints are linearly dependent. Our proof strategy is to build a vanishing lemma using a linearly independent set of such constraints on the coefficients of $g$.
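A quick computation makes the dependency phenomenon explicit (a sketch; the parameters $n = 12$, $s = 4$ are our own choices):

```python
from math import comb, factorial

n, s = 12, 4  # sample parameters

# g = y^s is nonzero of degree s <= n; its partial derivative
# d^{i+j} g / dx^i dy^j equals 0 when i > 0, and s!/(s-j)! * y^(s-j) when i = 0
def deriv_of_y_to_s_at(i, j, px, py):
    if i > 0:
        return 0
    return factorial(s) // factorial(s - j) * py ** (s - j)

points = [(k, 0) for k in range(2 * n * n // (s * s))]  # all on the x-axis

# every derivative of order < s vanishes at every point, i.e. y^s vanishes
# to order at least s at each of the Theta(n^2/s^2) points
assert all(deriv_of_y_to_s_at(i, j, *p) == 0
           for p in points for i in range(s) for j in range(s - i))

# yet the naive constraint count exceeds dim R[x,y]_{<=n}, so the imposed
# linear constraints cannot all be independent
constraints = len(points) * comb(s + 1, 2)  # comb(s+1, 2) per point
dimension = comb(n + 2, 2)
print(constraints, dimension)  # → 180 91
```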
Remark. Another very natural strategy for extending the proof of joints of lines to planes is to consider, instead of a single polynomial that vanishes on all the joints, a pair of polynomials that vanish on all the joints. For this approach to be useful, one would like the pair of polynomials, when restricted to each plane, to either be coprime or have one of them vanish. This seems like a difficult condition to satisfy, and we suspect that it is not possible, at least if one wants the degrees of the polynomials to be small. This problem appears to be related to the inverse Bézout problem: given a set of $N$ points in $\mathbb{R}^2$, can one always find a pair of coprime polynomials $P, Q$, both vanishing on all $N$ points, with $\deg P \cdot \deg Q = O(N)$? The answer is no, by putting half of the $N$ points on a $\sqrt{N/2} \times \sqrt{N/2}$ grid and the other half on a line (this grid-and-line example shows up again in our discussion below). A partial converse to Bézout's theorem is known in two dimensions but open in higher dimensions (see Tao [30]).

2.3. Key idea I: collecting linearly independent vanishing conditions. We define a vanishing condition to be a single homogeneous linear constraint on the coefficients of a polynomial $g \in \mathbb{R}[x, y]_{\le n}$ that arises from requiring some particular higher order directional derivative to vanish at some point. For example, for a two-variable polynomial $g$, requiring $g(p) = 0$ or $\frac{\partial g}{\partial x}(p) = 0$ at a given point $p$ are vanishing conditions. For a positive integer $r$, an $r$-th order vanishing condition on $g$ at $p$ is a vanishing condition of the form $Dg(p) = 0$, where $D$ is an $(r-1)$-th order derivative operator, i.e., a linear combination of the operators $\partial^{r-1}/\partial x_1^{r_1} \cdots \partial x_d^{r_d}$ with $r_1 + \cdots + r_d = r - 1$. (We will not need mixed order vanishing conditions for joints of flats, but they will be needed for joints of varieties.) For now, let us focus on a single plane and study vanishing conditions on $g \in \mathbb{R}[x, y]_{\le n}$. Vanishing conditions can be viewed as linear functionals on the vector space $\mathbb{R}[x, y]_{\le n}$, though it will be helpful later to also keep track of the (derivative operator, point) pair $(D, p)$ that generates the vanishing condition $Dg(p) = 0$.
We now devise a procedure for selecting a basis of linear functionals on $\mathbb{R}[x, y]_{\le n}$. As a first attempt, we fix an arbitrary order on $P$, say $p_1, \ldots, p_r$, and cycle through the points repeatedly (the vertical bars are a visual aid separating the epochs):
\[ p_1, \ldots, p_r \mid p_1, \ldots, p_r \mid p_1, \ldots, p_r \mid \cdots \]
We cycle through the points in the above sequence and maintain a linearly independent set of vanishing conditions on $\mathbb{R}[x, y]_{\le n}$, starting from an empty set of vanishing conditions. The $r$-th time ($r = 1, 2, \ldots$) that we see a point $p$, we add to our existing collection a maximal subset of $r$-th order vanishing conditions so that our collection of vanishing conditions always remains linearly independent as a set of linear functionals on $\mathbb{R}[x, y]_{\le n}$. Eventually, the process terminates once we have collected a basis of $\binom{n+2}{2}$ linear functionals on $\mathbb{R}[x, y]_{\le n}$. Although there is some choice in the above process in deciding which vanishing conditions to add to our collection at each step, the number of vanishing conditions added at each step does not depend on this choice. We would like to understand and control the number of vanishing conditions attached to each point as we run through the process. However, this does not seem easy. We do not know how to compute these numbers (for large $n$) even for an explicitly given set of points.
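The collection process on a single plane is easy to prototype; the following sketch (our own naming conventions) uses pure partial derivatives as candidate operators and a rank check in place of "maximal linearly independent subset":

```python
import numpy as np
from math import comb, factorial

def functional(i, j, p, n):
    """Row vector of the functional g -> (d^{i+j} g / dx^i dy^j)(p) in the
    monomial basis {x^a y^b : a + b <= n} of R[x,y]_{<=n}."""
    px, py = p
    row = []
    for a in range(n + 1):
        for b in range(n + 1 - a):
            if a >= i and b >= j:
                c = (factorial(a) // factorial(a - i)) * \
                    (factorial(b) // factorial(b - j))
                row.append(c * px ** (a - i) * py ** (b - j))
            else:
                row.append(0.0)
    return np.array(row)

def collect(points, n):
    """Cycle through the points; the r-th time a point p is visited, add a
    maximal subset of r-th order vanishing conditions at p (derivative
    operators of order r - 1) keeping the collection linearly independent.
    Returns the number of conditions each point ends up with."""
    dim = comb(n + 2, 2)
    basis, counts = [], {p: 0 for p in points}
    r = 1
    while len(basis) < dim:
        for p in points:
            for i in range(r):  # pure partials d^(r-1)/dx^i dy^(r-1-i)
                cand = functional(i, r - 1 - i, p, n)
                if np.linalg.matrix_rank(np.vstack(basis + [cand])) > len(basis):
                    basis.append(cand)
                    counts[p] += 1
                    if len(basis) == dim:
                        return counts
        r += 1
    return counts
```

On the four points of a $2 \times 2$ grid with $n = 2$, the first round assigns one evaluation condition to each point, and the basis of $\binom{4}{2} = 6$ functionals is then completed by two first-order conditions at the first point.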
More importantly, the process does not always evenly assign the vanishing conditions across all the points. For example, suppose we have $|P| = 2t^2$, with half of the points in $P$ forming a $t \times t$ grid (a high-degree part), and the other $t^2$ points all lying on a single generic line (a low-degree part). As we run through the above process, we encounter significantly more linear dependencies among vanishing conditions at points on the line than on the grid. For large $n$, at the end of the process, each point on the grid receives on the order of $t$ times as many vanishing conditions as each point on the line. This is an undesirable situation, since the process leads to an unequal distribution of vanishing conditions, effectively "ignoring" the points on the low-degree algebraic structure.
2.4. Key idea II: handicaps and priority order. To address the uneven distribution of vanishing conditions across points, we give the "disadvantaged" points a head start and cycle just among them many times before we cycle through the entire set of points. For example, in the earlier grid-and-line example, if $p_1, \ldots, p_{r/2}$ are points on the line and $p_{r/2+1}, \ldots, p_r$ are points on the grid, then we give the points on the line a head start by cycling through $p_1, \ldots, p_{r/2}$ alone for a number of initial epochs before the full cycles begin. More generally, we give each point $p$ a handicap $\alpha_p \in \mathbb{Z}$ corresponding to the number of rounds of head start.
For example, suppose there are five points labeled $a, b, c, d, e$ that we would cycle through in this order, and we assign handicaps $0, 1, 3, 0, -1$ to $a, b, c, d, e$ respectively. Then, for instance, $c$ starts in round $-3$ and $b$ starts in round $-1$. So we process the points in the following priority order:
\[ c \mid c \mid b, c \mid a, b, c, d \mid a, b, c, d, e \mid a, b, c, d, e \mid \cdots \]
We now run the same vanishing condition collection process as earlier with this sequence of points. The $r$-th time ($r = 1, 2, \ldots$) that we see a point $p$, we append to our existing collection a maximal non-redundant set of $r$-th order vanishing conditions at $p$.
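The priority order itself is straightforward to generate; this small sketch encodes the convention, consistent with the example in the text, that a point with handicap $\alpha$ first appears in round $-\alpha$:

```python
def priority_order(points, handicap, last_round):
    """Visit sequence under handicaps: a point p with handicap h first
    appears in round -h (so the handicap is the number of rounds of head
    start); within each round the points keep their preassigned order."""
    first = min(-handicap[p] for p in points)
    seq = []
    for t in range(first, last_round + 1):
        seq.extend(p for p in points if -handicap[p] <= t)
    return seq

# the five-point example from the text, with handicaps 0, 1, 3, 0, -1
seq = priority_order('abcde', {'a': 0, 'b': 1, 'c': 3, 'd': 0, 'e': -1}, 1)
print(seq)  # → ['c', 'c', 'b', 'c', 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'e']
```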
We would like to assign handicaps in a way so that all joints are treated equitably in the distribution of vanishing conditions (what this means precisely will be explained later). However, it appears to be a very difficult problem to determine how exactly the distribution of vanishing conditions depends on the handicaps. Intuitively, as in the grid-and-line example, we want to assign more handicap to points that are part of low-degree algebraic substructures, but it is far from obvious how to make this notion precise and useful.
2.5. Key idea III: existence of a good handicap via compactness/smoothing. Instead of explicitly assigning handicaps, we shall indirectly prove the existence of a good choice of handicaps via a compactness/smoothing argument. (Strictly speaking, we do not actually invoke compactness here since all our domains are finite, but we believe that compactness offers a helpful perspective, as the argument here is a significant generalization of the earlier compactness argument giving tight bounds for joints of lines [37].) Fix a joints configuration. Let $n$ be large and consider the function (2.1) that sends a handicap assignment $\alpha$ to the final number of vanishing conditions assigned to each point by the above process. While it appears to be difficult to compute this function explicitly, we can show that it has the following three properties.
Bounded domain. If one point has a much bigger handicap than another point, then the latter point gets assigned no vanishing conditions since the process would have finished before the first appearance of the latter point. Such a situation will never be desirable, so we only need to consider cases where the handicaps are all bounded (as a function of n).
Monotonicity. Suppose we increase the handicap by one at a subset of points while holding others fixed. Then the number of vanishing conditions assigned to this subset of points cannot decrease, and the number of vanishing conditions assigned to the other points cannot increase. Indeed, the points with the increased handicap now appear earlier in the priority order, and thus cannot receive fewer vanishing conditions than before the change.
Lipschitz continuity. A small change in the handicap assignments can only induce a small change in the number of vanishing conditions at each point. This property is intuitively reasonable, but it requires a proof.
With these three properties, we can iteratively increase the handicaps at points that end up with too few constraints, so that we eventually balance out the distribution of constraints across all joints.
The eventual implicit assignment of handicaps across joints appears to somehow identify the "algebraicity" of each point in the configuration by assigning higher handicaps to points lying in lower-degree algebraic substructures. However, we do not know how to make this algebraicity intuition precise.
Remark. This idea of implicitly assigning handicaps came up in a simpler form previously in the work of Yu and Zhao [37] in determining the tight constant for the joints theorem of lines. There one does not have to consider any priority order or iterative process of adding constraints as we do here, though one does end up proving, via compactness, the existence of a handicap (though not called by that name) along with other parameters for controlling the order of vanishing at each joint.
2.6. Putting everything together: a new vanishing lemma. Suppose we have a set $\mathcal{F}$ of planes in $\mathbb{R}^6$ forming joints $\mathcal{J}$. For a choice of handicaps $\alpha \in \mathbb{Z}^{\mathcal{J}}$ and a large integer $n$, we can run the above vanishing condition collection procedure separately on each plane (using the handicaps $\alpha$ restricted to points on the plane). On each plane $F \in \mathcal{F}$, and at each joint $p$ on the plane $F$, the procedure attaches a set $\mathcal{D}_{p,F} = \mathcal{D}_{p,F}(\alpha, n)$ of derivative operators. Combining these vanishing conditions over all joints on $F$ then gives a basis of linear functionals on the space of polynomials $g$ on $F$ of degree at most $n$, where each basis element is a vanishing condition of the form $Dg(p) = 0$ with $p \in F$ and $D \in \mathcal{D}_{p,F}$ being a linear combination of higher order directional derivatives along $F$. With this data, we can now state our new vanishing lemma for joints of planes.
Vanishing lemma for joints of planes (Lemma 3.9). With the above setup, if $g \in \mathbb{R}[x_1, \ldots, x_6]_{\le n}$ satisfies $D_1 D_2 D_3 g(p) = 0$ whenever $D_i \in \mathcal{D}_{p,F_i}$ are three derivative operators attached to three planes $F_1, F_2, F_3$ forming a joint $p \in \mathcal{J}$, then $g = 0$.
Note that we are choosing a minimal set of derivative operators on each plane (as we chose a basis of linear functionals). The vanishing lemma would be trivial if each $\mathcal{D}_{p,F_i}$ were the full set of directional derivative operators at $p$ along $F_i$. Also, our proof of the vanishing lemma only works if we build the vanishing conditions following the priority order; we would not be able to say much if the joints were processed in some other arbitrary manner.
By parameter counting, this new vanishing lemma implies the following inequality. Summing over joints $p$ formed by a triple of planes $F_1, F_2, F_3$, we have
\[ \sum_{p \in \mathcal{J}} |\mathcal{D}_{p,F_1}| \, |\mathcal{D}_{p,F_2}| \, |\mathcal{D}_{p,F_3}| \ge \dim \mathbb{R}[x_1, \ldots, x_6]_{\le n} = \binom{n+6}{6}. \]
The left-hand side is the number of linear constraints on $g$ of the form $D_1 D_2 D_3 g(p) = 0$ in the vanishing lemma. Indeed, if this inequality were not satisfied, by parameter counting there would be a nonzero polynomial $g$ of degree at most $n$ satisfying these vanishing conditions. However, the vanishing lemma implies that such a $g$ is identically zero, a contradiction.
Recall that all the quantities $|D_{p,F}|$ depend on $n$ as well as on the handicap $\vec\alpha$. We can now apply a compactness/smoothing argument to choose a handicap $\vec\alpha$ that essentially equalizes the products $|D_{p,F_1}| \, |D_{p,F_2}| \, |D_{p,F_3}|$ over all joints. Using the three properties (bounded domain, monotonicity, Lipschitz continuity) of (2.1), we can deduce that the difference between the largest and smallest of these products must be negligible, i.e., $o(n^6)$, since otherwise we could significantly reduce this difference by increasing the handicap by 1 at a suitable subset of joints $p$ where the product is small. It follows that we can choose handicaps so that the product $|D_{p,F_1}| \, |D_{p,F_2}| \, |D_{p,F_3}|$ is roughly constant across all $(p, F_1, F_2, F_3)$. We also know that for each plane $F$, $\sum_{p \in F} |D_{p,F}| = \dim \mathbb{R}[x, y]_{\le n} = \binom{n+2}{2}$, since we have a basis of linear functionals on the space of polynomials on $F$ of degree at most $n$. The conclusion $|\mathcal{J}| = O(N^{3/2})$ then follows from a short calculation using the AM-GM inequality (see the end of Section 3).
In Section 3, we flesh out these ideas to give a complete proof of the joints theorem for planes in $\mathbb{R}^6$. In Section 4 we discuss two further modifications to the above proof technique. To deal with varieties, we modify our notion of higher order directional derivatives; geometrically, we take derivatives with respect to local coordinates on the varieties. To deal with general fields other than the reals, we use Hasse derivatives.

3. Joints of planes in $\mathbb{R}^6$
The purpose of this section is to prove that $N$ planes in $\mathbb{R}^6$ have $O(N^{3/2})$ joints. This special case contains many of the key ideas that we introduce in this paper towards the full theorem.
Let (J , F) be a joints configuration of planes in R 6 , where F is a finite set of planes and J is the set of joints formed by any three planes in F. We abuse notation slightly to handle the case when more than three planes pass through p ∈ J : in this case we arbitrarily choose three planes forming a joint at p, and only write "p ∈ F " (and say that "F contains p", etc.) if F is among the triple of planes chosen at p.
3.1. Priority order and handicaps. First, assign an arbitrary but fixed order (referred to as the preassigned order ) to the joints J .
A handicap $\vec\alpha = (\alpha_p)_{p \in \mathcal{J}} \in \mathbb{Z}^{\mathcal{J}}$ assigns an integer to each joint. Given a handicap, the associated priority order is a linear order on $\mathcal{J} \times \mathbb{Z}_{\ge 0}$ defined by setting $(p, r) \prec (p', r')$
• if $r - \alpha_p < r' - \alpha_{p'}$, or
• if $r - \alpha_p = r' - \alpha_{p'}$ and $p$ comes before $p'$ in the preassigned order on $\mathcal{J}$.
This priority order corresponds to the description in the previous section. Note that in particular $(p, 0) \prec (p, 1) \prec (p, 2) \prec \cdots$. We write $\prec$ for the strict order, and $\preceq$ to allow equality.
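As a concrete illustration (our own sketch, not from the paper), the priority order is simply a lexicographic comparison of keys $(r - \alpha_p, p)$, with ties in $r - \alpha_p$ broken by the preassigned order; below, joints are represented by integers whose natural order serves as the preassigned order:

```python
def precedes(a, b, alpha):
    """Strict priority order (p, r) < (q, s) given handicaps alpha[p]."""
    (p, r), (q, s) = a, b
    # compare handicap-shifted derivative orders first, then fall back
    # to the preassigned order on the joints themselves
    return (r - alpha[p], p) < (s - alpha[q], q)
```

For instance, with `alpha = {0: 0, 1: 2}` the pair `(1, 0)` comes before `(0, 0)`: the larger handicap at joint 1 lets its low-order derivative conditions be processed earlier.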

3.2. Derivatives and evaluations. Let $\mathbb{R}[x_1, \dots, x_k]_{\le n}$ denote the space of polynomials of degree at most $n$ in $k$ variables.
Given a plane $F$ and a joint $p \in F$, let $\mathcal{D}^r_{p,F}$ denote the space of all $r$-th order derivative operators in directions along $F$: every element $D \in \mathcal{D}^r_{p,F}$ gives a linear map $g \mapsto Dg$ sending $\mathbb{R}[x_1, \dots, x_6] \to \mathbb{R}[x_1, \dots, x_6]$, and $D$ is a linear combination of compositions of $r$ directional derivative operators along $F$. For example, if $F$ is the plane spanned by the first two coordinate directions, then $\mathcal{D}^r_{p,F}$ is the space spanned by the operators $\partial^{i+j}/\partial x_1^i \partial x_2^j$ ranging over all $i + j = r$. (The space $\mathcal{D}^r_{p,F}$ here does not actually depend on $p$, but we include $p$ in the notation with a view towards the generalization from flats to varieties.) Let $\mathcal{B}^r_{p,F}(n)$ denote the space of all linear functionals on $\mathbb{R}[x_1, \dots, x_6]_{\le n}$ of the form $g \mapsto Dg(p)$ for some $D \in \mathcal{D}^r_{p,F}$ (i.e., an $r$-th order derivative along $F$ evaluated at $p$). Then, for a fixed $p \in \mathcal{J} \cap F$, a polynomial $g \in \mathbb{R}[x_1, \dots, x_6]_{\le n}$ lies in the common kernel of $\mathcal{B}^0_{p,F}(n) + \mathcal{B}^1_{p,F}(n) + \cdots + \mathcal{B}^{r-1}_{p,F}(n)$ if and only if the restriction of $g$ to the plane $F$ vanishes to order at least $r$ at $p$. (By common kernel we mean the intersection of the kernels of all linear functionals in this space.) To emphasize the difference between $\mathcal{B}$ and $\mathcal{D}$: the elements of $\mathcal{D}^r_{p,F}$ are derivative operators sending polynomials to polynomials, whereas the elements of $\mathcal{B}^r_{p,F}(n)$ are linear functionals sending polynomials of degree at most $n$ to scalars. Perhaps a helpful mnemonic is that $\mathcal{D}$ stands for "differentiation" while $\mathcal{B}$ stands for "basis" (we will soon use a basis of the space of linear functionals on polynomials of degree at most $n$).
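To make these objects concrete, here is a small Python sketch (an illustration we add, with $F$ the coordinate plane spanned by the first two coordinates, relabeled $x, y$): polynomials are dictionaries from exponent pairs to coefficients, the elements of $\mathcal{B}^r_{p,F}(n)$ are the evaluations $g \mapsto \partial^{i+j} g/\partial x^i \partial y^j(p)$ with $i + j = r$, and membership in the common kernel of $\mathcal{B}^0 + \cdots + \mathcal{B}^{r-1}$ detects vanishing to order $r$:

```python
from math import factorial

def partial_eval(g, i, j, p):
    """Evaluate d^{i+j} g / (dx^i dy^j) at the point p = (x, y),
    for g = {(a, b): coefficient of x^a y^b}."""
    x, y = p
    total = 0.0
    for (a, b), c in g.items():
        if a >= i and b >= j:
            coeff = c * (factorial(a) // factorial(a - i)) \
                      * (factorial(b) // factorial(b - j))
            total += coeff * x ** (a - i) * y ** (b - j)
    return total

def vanishes_to_order(g, p, r):
    """True iff every derivative evaluation of order < r vanishes at p,
    i.e. g lies in the common kernel described above."""
    return all(partial_eval(g, i, s - i, p) == 0
               for s in range(r) for i in range(s + 1))
```

For example, $x^2 y$ vanishes to order 3 at the origin but not to order 4, and $(x-1)^2$ vanishes to order 2 at $(1, 0)$.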
For a fixed $F \in \mathcal{F}$, let us describe a process where we go through the pairs $(p, r) \in (\mathcal{J} \cap F) \times \mathbb{Z}_{\ge 0}$ according to the priority order, and at each step choose a subset $B^r_{p,F}(\vec\alpha, n) \subset \mathcal{B}^r_{p,F}(n)$. We drop the dependencies on $\vec\alpha$, $n$, and $F$ when there is no risk of confusion, i.e., we write $B^r_p \subset \mathcal{B}^r_p$ for the above inclusion. In addition, all unions and direct sums in the following paragraph are taken over $(p', r') \in (\mathcal{J} \cap F) \times \mathbb{Z}_{\ge 0}$.
Suppose we are at the start of step $(p, r)$. At this point, we have already chosen some $B^{r'}_{p'} \subset \mathcal{B}^{r'}_{p'}$ for each $(p', r') \prec (p, r)$ so that the disjoint union $\bigsqcup_{(p', r') \prec (p, r)} B^{r'}_{p'}$ is a basis of $\sum_{(p', r') \prec (p, r)} \mathcal{B}^{r'}_{p'}$. Now consider expanding this space to $\sum_{(p', r') \preceq (p, r)} \mathcal{B}^{r'}_{p'}$ by adding in all the $r$-th order derivative evaluations at $p$ along $F$. We wish to expand the basis accordingly: choose a set $B^r_p \subset \mathcal{B}^r_p$ so that the disjoint union $\bigsqcup_{(p', r') \preceq (p, r)} B^{r'}_{p'}$ becomes a basis of $\sum_{(p', r') \preceq (p, r)} \mathcal{B}^{r'}_{p'}$. Note that while we have some choice about which elements of $\mathcal{B}^r_p$ to include as new basis elements, the size of $B^r_p$ does not depend on any choice, and is only a function of $n$ and the priority order. We will provide a more direct formula for $|B^r_p|$ shortly. Since each element of $\mathcal{B}^r_{p,F}(n)$ can be written as $g \mapsto Dg(p)$ for some $D \in \mathcal{D}^r_{p,F}$, we can choose $D^r_{p,F}(\vec\alpha, n) \subset \mathcal{D}^r_{p,F}$ with $|D^r_{p,F}(\vec\alpha, n)| = |B^r_{p,F}(\vec\alpha, n)|$ such that $B^r_{p,F}(\vec\alpha, n) = \{g \mapsto Dg(p) : D \in D^r_{p,F}(\vec\alpha, n)\}$; write $B_{p,F}(\vec\alpha, n) := \bigsqcup_{r \ge 0} B^r_{p,F}(\vec\alpha, n)$ and $D_{p,F}(\vec\alpha, n) := \bigsqcup_{r \ge 0} D^r_{p,F}(\vec\alpha, n)$. As we range over all joints $p$ on $F$, the sets $B_{p,F}(\vec\alpha, n)$ combine to form a basis of the space of linear functionals on polynomials of degree at most $n$ on $F$. Thus
$$\sum_{p \in \mathcal{J} \cap F} |D_{p,F}(\vec\alpha, n)| = \dim \mathbb{R}[x, y]_{\le n} = \binom{n+2}{2}. \tag{3.1}$$
We may omit the parenthetical $\vec\alpha$ and $n$ from our notation when these parameters do not change and the context is clear. Some of the arguments below will involve comparing different values of $\vec\alpha$ and $n$, in which case we will state the dependencies explicitly. We may also omit $F$ when we are not considering other planes.
3.3. Polynomials with given vanishing orders. In this and the next subsection, we focus our attention on a single fixed plane $F \cong \mathbb{R}^2$. Fix a finite set of points $\mathcal{P} \subset F$ (which we will later take to be the joints on $F$). Given a vector $\vec{v} = (v_p)_{p \in \mathcal{P}} \in \mathbb{Z}^{\mathcal{P}}_{\ge 0}$, let
$$T(\vec{v}, n) = \{g \in \mathbb{R}[x, y]_{\le n} : g \text{ vanishes to order} \ge v_p \text{ at each } p \in \mathcal{P}\}$$
(i.e., the partial derivatives satisfy $\frac{\partial^{i+j} g}{\partial x^i \partial y^j}(p) = 0$ for all $i + j < v_p$). We would like to understand how the dimension of $T(\vec{v}, n)$ changes with $\vec{v}$ and $n$. We are particularly interested in the following quantity, which we will shortly relate below in (3.4) to $|B^r_{p,F}(\vec\alpha, n)|$:
$$b_p(\vec{v}, n) := \operatorname{codim}_{T(\vec{v}, n)} T(\vec{v} + \vec{e}_p, n).$$
Here, given a pair of subspaces $W \le U$, we write $\operatorname{codim}_U W$ for the relative codimension of $W$ in $U$. Also $\vec{e}_p \in \mathbb{Z}^{\mathcal{P}}$ is the vector with 1 at $p$ and 0 elsewhere. Note that, for each $p \in \mathcal{P}$, the space $T(\vec{v} + \vec{e}_p, n)$ is the kernel of the map on $T(\vec{v}, n)$ that sends every polynomial $g$ to the tuple of its $v_p$-th order derivatives evaluated at $p$, and thus $b_p(\vec{v}, n)$ is the rank of this map.
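The quantities $\dim T(\vec{v}, n)$ and $b_p(\vec{v}, n)$ can be computed directly by exact linear algebra. The sketch below (our illustration, not the paper's code) builds one row per vanishing condition and takes ranks over the rationals:

```python
from fractions import Fraction
from math import factorial

def monomials(n):
    """Exponents (a, b) with a + b <= n, indexing R[x, y]_{<= n}."""
    return [(a, b) for a in range(n + 1) for b in range(n + 1 - a)]

def deriv_row(n, i, j, p):
    """Row of the functional g -> d^{i+j} g / (dx^i dy^j) (p)."""
    x, y = Fraction(p[0]), Fraction(p[1])
    row = []
    for a, b in monomials(n):
        if a >= i and b >= j:
            c = (factorial(a) // factorial(a - i)) * (factorial(b) // factorial(b - j))
            row.append(c * x ** (a - i) * y ** (b - j))
        else:
            row.append(Fraction(0))
    return row

def rank(rows):
    """Row rank via Gaussian elimination over the rationals."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                f = rows[r][col] / rows[rk][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def dim_T(points, v, n):
    """dim T(v, n): degree-<=n polynomials vanishing to order >= v[p] at each point."""
    rows = [deriv_row(n, i, s - i, p)
            for p, vp in zip(points, v)
            for s in range(vp) for i in range(s + 1)]
    return len(monomials(n)) - (rank(rows) if rows else 0)

def b_p(points, v, n, idx):
    """b_p(v, n) = codim of T(v + e_p, n) inside T(v, n)."""
    v2 = list(v)
    v2[idx] += 1
    return dim_T(points, v, n) - dim_T(points, v2, n)
```

For a single point, $\dim T(v, n) = \binom{n+2}{2} - \binom{v+1}{2}$ as long as $v \le n + 1$, and Lemma 3.1 below appears as $\dim T(\vec{v}, n) = 0$ once $v_p > n$.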
The following basic fact will be useful.

Lemma 3.1. If $\vec{v} \in \mathbb{Z}^{\mathcal{P}}_{\ge 0}$ has $v_p > n$ for some $p \in \mathcal{P}$, then $\dim T(\vec{v}, n) = 0$.

Proof. This is the statement that no nonzero polynomial of degree at most $n$ can vanish to order more than $n$ at some point.
Lemma 3.2 (Monotonicity). Let $p \in \mathcal{P}$, and suppose $\vec{v}^{(1)}, \vec{v}^{(2)} \in \mathbb{Z}^{\mathcal{P}}_{\ge 0}$ satisfy $\vec{v}^{(1)} \le \vec{v}^{(2)}$ coordinatewise with $v^{(1)}_p = v^{(2)}_p$. Then
$$b_p(\vec{v}^{(1)}, n) \ge b_p(\vec{v}^{(2)}, n). \tag{3.2}$$

Proof. Earlier we saw that for each $i = 1, 2$, $b_p(\vec{v}^{(i)}, n)$ is the rank of the map on $T(\vec{v}^{(i)}, n)$ that sends each polynomial to the tuple of its $v^{(i)}_p$-th order derivatives evaluated at $p$. Since $T(\vec{v}^{(2)}, n)$ is a subspace of $T(\vec{v}^{(1)}, n)$ and $v^{(1)}_p = v^{(2)}_p$, the claim follows because the rank of a linear map cannot increase when the map is restricted to a subspace.

The next two lemmas together will lead to the Lipschitz continuity property of $b_p(\vec{v}, n)$ as a function of $\vec{v}$.

Lemma 3.3. Let $p, q \in \mathcal{P}$ be distinct points. Then for every $\vec{v} \in \mathbb{Z}^{\mathcal{P}}_{\ge 0}$ and nonnegative integer $n$, one has $b_p(\vec{v} + \vec{e}_q, n) \ge b_p(\vec{v}, n - 1)$.
Proof. Let $f$ be an arbitrary linear polynomial that vanishes at $q$ but at no other point of $\mathcal{P}$ (such $f$ clearly exists if the underlying field $\mathbb{F}$ is large enough; if not, we may replace $\mathbb{F}$ by a field extension, which does not affect $b_p(\vec{v}, n)$ as it is a rank-type quantity). We have
$$b_p(\vec{v} + \vec{e}_q, n) = \operatorname{codim}_{T(\vec{v} + \vec{e}_q, n)} T(\vec{v} + \vec{e}_p + \vec{e}_q, n) \ge \operatorname{codim}_{f \cdot T(\vec{v}, n-1)} f \cdot T(\vec{v} + \vec{e}_p, n-1) = b_p(\vec{v}, n - 1).$$
The inequality step follows from (3.2), observing that restricting $T(\vec{v} + \vec{e}_q, n)$ and $T(\vec{v} + \vec{e}_p + \vec{e}_q, n)$ to polynomials divisible by $f$ yields $f \cdot T(\vec{v}, n - 1)$ and $f \cdot T(\vec{v} + \vec{e}_p, n - 1)$ respectively.

Lemma 3.4. Let $p \in \mathcal{P}$. Suppose $\vec{v}^{(0)}, \vec{v}^{(1)}, \dots \in \mathbb{Z}^{\mathcal{P}}_{\ge 0}$ are such that $\vec{v}^{(0)} \le \vec{v}^{(1)} \le \cdots$ coordinatewise and strictly increasing in the coordinate indexed by $p$. Then
$$\sum_{i \ge 0} \left( b_p(\vec{v}^{(i)}, n) - b_p(\vec{v}^{(i)}, n - 1) \right) \le \operatorname{codim}_{T(\vec{0}, n)} T(\vec{0}, n - 1) = n + 1.$$

Combining Lemmas 3.3 and 3.4 yields the following.

Lemma 3.5. Let $p, q \in \mathcal{P}$ be distinct, and let $\vec{v}^{(0)} \le \vec{v}^{(1)} \le \cdots$ be as in Lemma 3.4. Then
$$\sum_{i \ge 0} \left( b_p(\vec{v}^{(i)}, n) - b_p(\vec{v}^{(i)} + \vec{e}_q, n) \right) \le n + 1.$$

3.4. How the number of vanishing conditions varies with the handicap. As in the previous subsection, let us continue to focus our attention on a set of points $\mathcal{P}$ on a fixed plane $F \cong \mathbb{R}^2$ (which we drop from the notation temporarily). Given a handicap $\vec\alpha \in \mathbb{Z}^{\mathcal{P}}$ (restricted to this plane), we define the vector $\vec{v}_{p,r}(\vec\alpha) \in \mathbb{Z}^{\mathcal{P}}_{\ge 0}$ as follows: it assigns to each coordinate $p' \in \mathcal{P}$ the smallest nonnegative integer $r'$ such that $(p, r) \preceq (p', r')$. Equivalently, the value of $\vec{v}_{p,r}(\vec\alpha)$ at $p'$ is given by
$$\vec{v}_{p,r}(\vec\alpha)_{p'} = \begin{cases} \max\{r - \alpha_p + \alpha_{p'} + 1, \, 0\} & \text{if } p' \text{ comes strictly before } p \text{ in the preassigned order,} \\ \max\{r - \alpha_p + \alpha_{p'}, \, 0\} & \text{otherwise.} \end{cases} \tag{3.3}$$
In other words, $\vec{v}_{p,r}(\vec\alpha)$ collects the desired vanishing orders at each joint on $F$ at the stage right before we reach $(p, r)$ in the priority order.
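The case distinction of (3.3) can be checked against the definition by brute force. In the sketch below (our illustration, with joints again indexed by integers in preassigned order), `v_entry_bruteforce` finds the smallest $r'$ with $(p, r) \preceq (p', r')$ and `v_entry_formula` implements the closed form:

```python
def key(p, r, alpha):
    """Priority key of (p, r): smaller keys come earlier in the order."""
    return (r - alpha[p], p)

def v_entry_bruteforce(p, r, q, alpha):
    """Smallest r' >= 0 with (p, r) <= (q, r') in the priority order."""
    rp = 0
    while key(q, rp, alpha) < key(p, r, alpha):
        rp += 1
    return rp

def v_entry_formula(p, r, q, alpha):
    """The case distinction of (3.3)."""
    bump = 1 if q < p else 0   # q strictly earlier in the preassigned order
    return max(r - alpha[p] + alpha[q] + bump, 0)
```

The two functions agree for all small inputs, including the boundary case $q = p$, where both return $r$ itself.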
Define $B^r_p(\vec\alpha, n)$ and $B_p(\vec\alpha, n)$ as in Section 3.2, restricted to this plane. Recall that for every $(p, r) \in \mathcal{P} \times \mathbb{Z}_{\ge 0}$, the disjoint union $\bigsqcup_{(p', r') \prec (p, r)} B^{r'}_{p'}$ is a basis of $\sum_{(p', r') \prec (p, r)} \mathcal{B}^{r'}_{p'}$. Then a polynomial $g \in \mathbb{R}[x_1, \dots, x_6]_{\le n}$ lies in the common kernel of $\bigsqcup_{(p', r') \prec (p, r)} B^{r'}_{p'}$ if and only if the restriction of $g$ to the plane $F$ vanishes to order at least $\vec{v}_{p,r}(\vec\alpha)_q$ at every $q \in \mathcal{P}$. Since adding $B^r_p$ makes this set a basis for $\sum_{(p', r') \preceq (p, r)} \mathcal{B}^{r'}_{p'}$, its size is the number of non-redundant constraints that we need to add to increase the order of vanishing at $p$ by 1. Thus
$$|B^r_p(\vec\alpha, n)| = b_p(\vec{v}_{p,r}(\vec\alpha), n). \tag{3.4}$$
The observations in the previous section then imply the following.
Lemma 3.6. Let $p, q \in \mathcal{P}$ be distinct and suppose $\alpha_q - \alpha_p > n$. Then $|B_p(\vec\alpha, n)| = 0$.

Proof. For each $r \ge 0$, the value of $\vec{v} = \vec{v}_{p,r}(\vec\alpha)$ at $q$ is greater than $n$, so $\dim T(\vec{v}, n) = 0$ by Lemma 3.1. Hence $|B^r_p(\vec\alpha, n)| = b_p(\vec{v}, n) = 0$.

Lemma 3.7 (Monotonicity). Let $p \in \mathcal{P}$ and $q \ne p$. Then for every $\vec\alpha \in \mathbb{Z}^{\mathcal{P}}$, one has $|B_p(\vec\alpha + \vec{e}_q, n)| \le |B_p(\vec\alpha, n)|$.

Proof. By (3.3), $\vec{v}_{p,r}(\vec\alpha + \vec{e}_q) \ge \vec{v}_{p,r}(\vec\alpha)$ coordinatewise, with equality in the coordinate indexed by $p$. So Lemma 3.2 gives $b_p(\vec{v}_{p,r}(\vec\alpha + \vec{e}_q), n) \le b_p(\vec{v}_{p,r}(\vec\alpha), n)$, and (3.4) gives the claim.
Lemma 3.8 (Lipschitz continuity). Let $p \in \mathcal{P}$ and $\vec\alpha^{(1)}, \vec\alpha^{(2)} \in \mathbb{Z}^{\mathcal{P}}$. Then
$$\left| |B_p(\vec\alpha^{(1)}, n)| - |B_p(\vec\alpha^{(2)}, n)| \right| \le (n+1) \sum_{p' \in \mathcal{P}} \left| (\alpha^{(1)}_{p'} - \alpha^{(1)}_p) - (\alpha^{(2)}_{p'} - \alpha^{(2)}_p) \right|.$$

Proof. Shifting all handicaps by the same constant changes neither the priority order nor $|B_p|$. Since the right-hand side of the above inequality is also invariant under such shifts, we may assume that $\alpha^{(1)}_p = \alpha^{(2)}_p = 0$. Starting with $\vec\alpha = \vec\alpha^{(1)}$, we can perform a sequence of changes where at each step we change the value of the handicap $\vec\alpha$ at some $p' \ne p$ by exactly 1, so that the vector $(\alpha_{p'})_{p' \in \mathcal{P}}$ ends up equal to $(\alpha^{(2)}_{p'})_{p' \in \mathcal{P}}$ after exactly $\sum_{p' \in \mathcal{P}} |\alpha^{(1)}_{p'} - \alpha^{(2)}_{p'}|$ moves. So it suffices to prove the inequality for a single step of this process, i.e., to show that for every $\vec\alpha \in \mathbb{Z}^{\mathcal{P}}$ and $q \ne p$,
$$0 \le |B_p(\vec\alpha, n)| - |B_p(\vec\alpha + \vec{e}_q, n)| \le n + 1.$$
The first inequality follows from Lemma 3.7. For the second inequality, by $|B_p(\vec\alpha, n)| = \sum_{r \ge 0} |B^r_p(\vec\alpha, n)|$ and (3.4), it suffices to prove
$$\sum_{r \ge 0} \left( b_p(\vec{v}_{p,r}(\vec\alpha), n) - b_p(\vec{v}_{p,r}(\vec\alpha + \vec{e}_q), n) \right) \le n + 1.$$
From (3.3), we see that there is some $r_0$ so that $\vec{v}_{p,r}(\vec\alpha + \vec{e}_q) = \vec{v}_{p,r}(\vec\alpha)$ for all $r < r_0$ and $\vec{v}_{p,r}(\vec\alpha + \vec{e}_q) = \vec{v}_{p,r}(\vec\alpha) + \vec{e}_q$ for all $r \ge r_0$. Restricting the sum to $r \ge r_0$ (the earlier terms cancel), we obtain the desired inequality by Lemma 3.5.

3.5. Vanishing lemma. Now we start considering the interactions between different planes at the joints. The next statement is a vanishing lemma tailored to this joints problem. We omit the dependence on the handicap $\vec\alpha$ and the degree $n$ from the notation since we keep them fixed in this subsection. Recall from the beginning of the section that, at each joint, we arbitrarily chose three planes that form this joint. Note that this vanishing lemma is the only place in the proof where we use the hypothesis that the three planes forming a joint do not all lie in some hyperplane.

Lemma 3.9. Let $(\mathcal{J}, \mathcal{F})$ be a joints configuration of planes in $\mathbb{R}^6$. Given a handicap $\vec\alpha \in \mathbb{Z}^{\mathcal{J}}$ and its associated priority order, and a positive integer $n$, choose $D_{p,F}$ as earlier.
Then for every nonzero polynomial $g \in \mathbb{R}[x_1, \dots, x_6]$ of degree at most $n$, one has $D_1 D_2 D_3 g(p) \ne 0$ for some joint $p \in \mathcal{J}$ formed by $F_1, F_2, F_3 \in \mathcal{F}$, and some $D_i \in D_{p,F_i}$ for each $i = 1, 2, 3$.
Proof. Suppose, on the contrary, that there were some nonzero g ∈ R[x 1 , . . . , x 6 ] ≤n such that D 1 D 2 D 3 g(p) = 0 for every p ∈ J , with F 1 , F 2 , F 3 ∈ F being the three planes passing through p, and every D i ∈ D p,F i for each i = 1, 2, 3.
Choose p ∈ J to minimize (p, v p (g)) under ≺, where v p (g) is the order of vanishing of g at p.
Recall that $\mathcal{D}^r_{p,F}$ is the space of $r$-th order derivative operators at $p$ along $F$. Since $g$ vanishes to order exactly $v_p(g)$ at $p$ and the planes $F_1, F_2, F_3$ do not all lie in one hyperplane, there exist $D_1 \in \mathcal{D}^{r_1}_{p,F_1}$, $D_2 \in \mathcal{D}^{r_2}_{p,F_2}$, $D_3 \in \mathcal{D}^{r_3}_{p,F_3}$ with $D_1 D_2 D_3 g(p) \ne 0$ and $r_1 + r_2 + r_3 = v_p(g)$. Among all such choices of $D_1, D_2, D_3$ (including the choices of $r_1, r_2, r_3$), pick one so that $|\{i \in [3] : D_i \in D_{p,F_i}\}|$ is maximized. By the assumption at the beginning of the proof, one must have $D_i \notin D_{p,F_i}$ for some $i$. Relabeling if necessary, assume that $D_1 \notin D_{p,F_1}$. Suppose $p' \in F_1 \cap \mathcal{J}$ and $r' \in \mathbb{Z}_{\ge 0}$ satisfy $(p', r') \prec (p, r_1)$. Then $(p', r' + r_2 + r_3) \prec (p, r_1 + r_2 + r_3) = (p, v_p(g))$. By the choice of $p$, we have $(p, v_p(g)) \preceq (p', v_{p'}(g))$. Thus $(p', r' + r_2 + r_3) \prec (p', v_{p'}(g))$, and hence $r' + r_2 + r_3 < v_{p'}(g)$. It follows that $D D_2 D_3 g(p') = 0$ for all $D \in \mathcal{D}^{r'}_{p',F_1}$ by the definition of vanishing order.
From the above paragraph we deduce that $D_2 D_3 g$ lies in the common kernel of $B^{r'}_{p',F_1}$ ranging over all $(p', r') \in (F_1 \cap \mathcal{J}) \times \mathbb{Z}_{\ge 0}$ with $(p', r') \prec (p, r_1)$. Since $D_1 D_2 D_3 g(p) \ne 0$, we deduce that $D_2 D_3 g$ does not lie in the common kernel of $\mathcal{B}^{r_1}_{p,F_1}$, i.e., there is some $D \in D^{r_1}_{p,F_1}$ with $D D_2 D_3 g(p) \ne 0$. But this $D$ contradicts the earlier assumption that the choice of $(D_1, D_2, D_3)$ maximizes $|\{i : D_i \in D_{p,F_i}\}|$.
The next inequality follows by parameter counting.

Lemma 3.10. With the setup above, writing $F_1(p), F_2(p), F_3(p)$ for the three planes chosen at each joint $p$,
$$\sum_{p \in \mathcal{J}} |D_{p,F_1(p)}| \, |D_{p,F_2(p)}| \, |D_{p,F_3(p)}| \ge \dim \mathbb{R}[x_1, \dots, x_6]_{\le n} = \binom{n+6}{6}.$$
Proof. Denote the left-hand side by $A$ and the right-hand side by $B$. Consider the constraints on $g \in \mathbb{R}[x_1, \dots, x_6]_{\le n}$ requiring, for each $p \in \mathcal{J}$ formed by the chosen planes $F_1, F_2, F_3 \in \mathcal{F}$, that $D_1 D_2 D_3 g(p) = 0$ for all $D_1 \in D_{p,F_1}$, $D_2 \in D_{p,F_2}$, $D_3 \in D_{p,F_3}$. This requirement asks $A$ linear functionals on $\mathbb{R}[x_1, \dots, x_6]_{\le n}$, a space of dimension $B$, to vanish at $g$. Hence, if $A < B$, then there exists a nonzero polynomial $g \in \mathbb{R}[x_1, \dots, x_6]_{\le n}$ that satisfies all the conditions, which would contradict Lemma 3.9.
3.6. Choosing the handicaps. We say that a joints configuration (J , F) is connected if the following graph is connected: the vertex set is J , with two joints adjacent if there is some plane in F containing both joints.
Lemma 3.11. Let $(\mathcal{J}, \mathcal{F})$ be any connected joints configuration, and let $n$ be a positive integer. For each joint $p \in \mathcal{J}$, with $F_1, F_2, F_3$ the three planes chosen at $p$, write
$$W_p(\vec\alpha) := \binom{n+2}{2}^{-3} |D_{p,F_1}(\vec\alpha, n)| \, |D_{p,F_2}(\vec\alpha, n)| \, |D_{p,F_3}(\vec\alpha, n)|.$$
Then there exists a choice of handicap $\vec\alpha \in \mathbb{Z}^{\mathcal{J}}$ such that
$$\max_{p \in \mathcal{J}} W_p(\vec\alpha) - \min_{p \in \mathcal{J}} W_p(\vec\alpha) \le \frac{C}{n}$$
for some constant $C$ that depends only on $(\mathcal{J}, \mathcal{F})$ but not on $n$.

Proof. Fix $n$ throughout the proof. The $\alpha_p$ are arbitrary integers; however, note that shifting all $\alpha_p$ by the same constant does not affect the priority order, and thus does not affect $W_p(\vec\alpha)$. Furthermore, by Lemma 3.6, if two handicaps differ by more than $n$ at two points on the same plane, then $W_p(\vec\alpha) = 0$ for one of them. Therefore, there are only finitely many possibilities for the vector $(W_p(\vec\alpha) : p \in \mathcal{J})$. Among those possibilities, choose one so that, after sorting the values $W_p(\vec\alpha)$ in descending order, the resulting vector is least in lexicographic order. Suppose that the sorted result is
$$W_{p_1}(\vec\alpha) \ge W_{p_2}(\vec\alpha) \ge \cdots \ge W_{p_{|\mathcal{J}|}}(\vec\alpha).$$
We will show that $W_{p_i}(\vec\alpha) - W_{p_{i+1}}(\vec\alpha) \le C'/n$ for all $i$, for some constant $C'$ to be determined; this implies the desired statement with $C = (|\mathcal{J}| - 1) C'$. Suppose for the sake of contradiction that the claim does not hold. Let $t$ be the least positive integer such that $W_{p_t}(\vec\alpha) - W_{p_{t+1}}(\vec\alpha) > C'/n$. Then let $\vec{v} = \vec{e}_{p_1} + \cdots + \vec{e}_{p_t}$ and let $\vec\alpha\,' = \vec\alpha - \vec{v}$ be a new handicap. We will consider the difference between $W_p(\vec\alpha)$ and $W_p(\vec\alpha\,')$. By Lemma 3.8,
$$\left| |D_{p,F}(\vec\alpha, n)| - |D_{p,F}(\vec\alpha\,', n)| \right| \le (n+1) |\mathcal{J}|$$
for each joint $p$ on each plane $F$. We also have $|D_{p,F}(\vec\alpha, n)| \le \binom{n+2}{2}$ by (3.1). We use the following telescoping inequality: for $x_1, x_2, x_3, y_1, y_2, y_3 \in [0, 1]$,
$$|x_1 x_2 x_3 - y_1 y_2 y_3| \le |x_1 - y_1| + |x_2 - y_2| + |x_3 - y_3|.$$
Combining the last three displays shows that $|W_p(\vec\alpha) - W_p(\vec\alpha\,')| \le C'/(2n)$ for every $p \in \mathcal{J}$, once $C'$ is chosen large enough in terms of $|\mathcal{J}|$. By the monotonicity established in Lemma 3.7, we know that $W_{p_i}(\vec\alpha\,') \le W_{p_i}(\vec\alpha)$ for $i \le t$, and $W_{p_i}(\vec\alpha\,') \ge W_{p_i}(\vec\alpha)$ for $i > t$. By (3.1), we know that if $W_p(\vec\alpha\,') \ne W_p(\vec\alpha)$ for some $p$, then there exists $i \le t$ such that $W_{p_i}(\vec\alpha\,') < W_{p_i}(\vec\alpha)$. However, since the difference between $W_p(\vec\alpha)$ and $W_p(\vec\alpha\,')$ is at most $C'/(2n)$, and $W_{p_t}(\vec\alpha) - W_{p_{t+1}}(\vec\alpha) > C'/n$, we know that $W_{p_1}(\vec\alpha\,'), \dots, W_{p_t}(\vec\alpha\,')$ are still the $t$ largest values among $(W_p(\vec\alpha\,'))_{p \in \mathcal{J}}$. This shows that $\vec\alpha\,'$ gives a strictly lower lexicographic order of the sorted vector, which is a contradiction.
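The telescoping inequality used above follows from the identity $x_1 x_2 x_3 - y_1 y_2 y_3 = (x_1 - y_1) x_2 x_3 + y_1 (x_2 - y_2) x_3 + y_1 y_2 (x_3 - y_3)$, in which every factor multiplying a difference lies in $[0, 1]$. A quick numeric sanity check (our own illustration):

```python
def telescoped(x, y):
    """Check |x1 x2 x3 - y1 y2 y3| <= |x1-y1| + |x2-y2| + |x3-y3|
    for triples x, y with entries in [0, 1]."""
    lhs = abs(x[0] * x[1] * x[2] - y[0] * y[1] * y[2])
    rhs = sum(abs(a - b) for a, b in zip(x, y))
    return lhs <= rhs + 1e-12   # tiny tolerance for float rounding
```

In the proof, the three factors are the quantities $|D_{p,F_i}|$ normalized by $\binom{n+2}{2}$, which indeed lie in $[0, 1]$ by (3.1).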
Hence $W_p(\vec\alpha) = W_p(\vec\alpha\,') = W_p(\vec\alpha - \vec{v})$ for all $p \in \mathcal{J}$. By the same argument, we know that $W_p(\vec\alpha) = W_p(\vec\alpha - c\vec{v})$ holds for all $p \in \mathcal{J}$ and any positive integer $c$. By connectedness, we can find some $i \le t < j$ such that $p_i$ and $p_j$ lie on a common plane. As a consequence, if $c$ is chosen sufficiently large so that $\alpha_{p_i} - c < \alpha_{p_j} - n$, this implies that $W_{p_i}(\vec\alpha - c\vec{v}) = 0$. By our ordering, this implies that $W_{p_{i'}}(\vec\alpha - c\vec{v}) = 0$ for all $i' \ge i$. In particular, $W_{p_t}(\vec\alpha) = W_{p_{t+1}}(\vec\alpha) = 0$, contradicting our earlier assumption that $W_{p_t}(\vec\alpha) - W_{p_{t+1}}(\vec\alpha) > C'/n$.
We are now ready to prove the joints theorem for a set of planes in R 6 .
Proof that $N$ planes in $\mathbb{R}^6$ have at most $\sqrt{10/3} \, N^{3/2}$ joints. Assume first that the joints configuration is connected. Let $n$ be a large positive integer. In this proof we use $O$-notation to suppress constants that may depend on $(\mathcal{J}, \mathcal{F})$ arbitrarily, as long as they are independent of $n$. Choose $\vec\alpha$ according to Lemma 3.11. Then there exists $W$ such that $W_p(\vec\alpha) = W + O(1/n)$ for all $p \in \mathcal{J}$. By Lemma 3.10, we have
$$|\mathcal{J}| \left( W + O(1/n) \right) \ge \sum_{p \in \mathcal{J}} W_p(\vec\alpha) \ge \binom{n+6}{6} \binom{n+2}{2}^{-3} \ge \frac{8}{6!} - O(1/n).$$
So there is some constant $c > 0$ (depending on $\mathcal{J}$ but not on $n$) such that $W \in [c, 1]$ for all sufficiently large $n$. For each $p \in \mathcal{J}$, by the AM-GM inequality and a Taylor series approximation (using $W \ge c$),
$$|D_{p,F_1(p)}| + |D_{p,F_2(p)}| + |D_{p,F_3(p)}| \ge 3 \, W_p(\vec\alpha)^{1/3} \binom{n+2}{2} \ge \left( 3 W^{1/3} - O(1/n) \right) \binom{n+2}{2}.$$
Hence (in the summations, $p$ ranges over joints and $F$ ranges over planes in $\mathcal{F}$),
$$\left( 3 W^{1/3} - O(1/n) \right) |\mathcal{J}| \binom{n+2}{2} \le \sum_{p \in \mathcal{J}} \sum_{i=1}^{3} |D_{p,F_i(p)}| \le \sum_{F \in \mathcal{F}} \sum_{p \in \mathcal{J} \cap F} |D_{p,F}| = N \binom{n+2}{2}.$$
By comparing the leading terms in the upper bound and the lower bound on $W$, i.e., letting $n$ go to infinity, we get that
$$\frac{8}{6! \, |\mathcal{J}|} \le W \le \frac{N^3}{27 |\mathcal{J}|^3},$$
and by rearranging we get that $|\mathcal{J}|^2 \le \frac{10}{3} N^3$, i.e., $|\mathcal{J}| \le \sqrt{10/3} \, N^{3/2}$. The above argument proves the result for connected joints configurations. In general, decompose the joints configuration $(\mathcal{J}, \mathcal{F})$ into connected components (in the sense of the associated graph): if the $i$-th component involves $N_i$ planes and $|\mathcal{J}_i|$ joints, then $\sum_i N_i \le N$, so $|\mathcal{J}| = \sum_i |\mathcal{J}_i| \le \sqrt{10/3} \sum_i N_i^{3/2} \le \sqrt{10/3} \, N^{3/2}$.

Remark. The arguments here generalize straightforwardly to joints of flats in arbitrary dimensions.

4. Derivatives along varieties
In this section we discuss how to generalize the argument in Section 3 to varieties in $\mathbb{F}^d$. There are two issues that we need to address. The first is to define appropriate higher order directional derivatives along varieties: as we explain below, it does not suffice to simply take derivatives along the tangent plane, as those miss the higher order data of the variety. The second is to generalize derivatives from the reals to general fields. Since we are working with polynomials, differentiation can be viewed as a formal algebraic operation; to handle fields of positive characteristic, we use Hasse derivatives.
Let $V$ be a $k$-dimensional variety in $\mathbb{F}^d$. Let $I(V)$ be the ideal of polynomials in $\mathbb{F}[x_1, \dots, x_d]$ that vanish on $V$. Define $R_V = \mathbb{F}[x_1, \dots, x_d]/I(V)$. The elements of $R_V$ are called regular functions on $V$. Let $p$ be a regular point on $V$, that is, a point where the Zariski tangent space of $V$ at $p$ is also $k$-dimensional. Given a nonnegative integer $r$, we would like to write down derivative operators $D$ on $\mathbb{F}[x_1, \dots, x_d]$ so that $Dg(p)$ is well defined not just when $g \in \mathbb{F}[x_1, \dots, x_d]$, but also when $g$ is a regular function on $V$. The point here is that a regular function on $V$ may be represented by a polynomial in $\mathbb{F}[x_1, \dots, x_d]$ in non-unique ways (one can add any polynomial that vanishes on $V$), so we should study derivative operators $D$ whose evaluation $Dg(p)$ does not depend on this choice of representative.

4.1. An explicit example. We consider the explicit example of the circle $V$ in $\mathbb{R}^2$ centered at $(0, 1/2)$ of radius $1/2$. In particular, $V$ is defined by the equation $y = x^2 + y^2$. Let $p = (0, 0)$ be the origin. How should we define a second-order derivative at $p$ along $V$? Naively one might take $\partial^2/\partial x^2$, since the tangent line at $p$ points in the $x$-coordinate direction. However, consider applying this derivative evaluation at $p$ to the two sides of $y = x^2 + y^2$ (an identity of regular functions on $V$): the left-hand side gives 0 while the right-hand side gives 2. So $\partial^2/\partial x^2$ does not induce a linear functional on the space of regular functions on $V$.
To fix this issue, we can rewrite all regular functions on $V$ as power series centered at $p$ in the local coordinate $x$ of $V$. Indeed, by repeatedly substituting $y \leftarrow x^2 + y^2$, we can write $y$ as a power series in $x$:
$$y = x^2 + x^4 + 2x^6 + \cdots.$$
We would like a derivative operator $D$ on $\mathbb{R}[x, y]$ so that $Dg(0, 0)$ equals the coefficient of $x^2$ in $g(x, x^2 + x^4 + 2x^6 + \cdots)$, which in turn equals the coefficient of $x^2$ plus the coefficient of $y$ in $g(x, y)$. It is not hard to see that the only such choice is
$$D = \frac{1}{2} \frac{\partial^2}{\partial x^2} + \frac{\partial}{\partial y}.$$
Conversely, it is not hard to check that $Dg(0, 0) = 0$ for every $g \in \mathbb{R}[x, y]$ that vanishes identically on $V$.
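The repeated substitution is easy to carry out on truncated power series. The sketch below (our illustration, with series stored as coefficient lists) recovers the expansion of $y$, whose coefficients turn out to be the Catalan numbers $1, 1, 2, 5, 14, \dots$, and checks that $D = \frac{1}{2}\partial_x^2 + \partial_y$ (equivalently, $g \mapsto c_{2,0} + c_{0,1}$ on coefficients) extracts the coefficient of $x^2$ of $g$ restricted to $V$:

```python
def mul(a, b, trunc):
    """Product of truncated power series given as coefficient lists."""
    out = [0] * trunc
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x and y and i + j < trunc:
                out[i + j] += x * y
    return out

def solve_circle(trunc):
    """Iterate y <- x^2 + y^2 until the truncated series stabilizes."""
    series = [0] * trunc
    while True:
        nxt = mul(series, series, trunc)   # y^2
        nxt[2] += 1                        # + x^2
        if nxt == series:
            return series
        series = nxt

def restrict(g, h, trunc):
    """Series of g(x, h(x)) for g = {(i, j): coeff of x^i y^j}."""
    out = [0] * trunc
    for (i, j), c in g.items():
        term = [0] * trunc
        term[0] = 1
        for _ in range(j):             # multiply by h(x)^j
            term = mul(term, h, trunc)
        for k in range(trunc - i):     # shift by x^i
            out[k + i] += c * term[k]
    return out

def D(g):
    """(1/2) d^2/dx^2 + d/dy evaluated at the origin: c_{2,0} + c_{0,1}."""
    return g.get((2, 0), 0) + g.get((0, 1), 0)
```

Here `solve_circle(10)` returns `[0, 0, 1, 0, 1, 0, 2, 0, 5, 0]`, and for any polynomial dictionary `g`, the value `D(g)` agrees with the coefficient `restrict(g, h, trunc)[2]`.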
Elaborating on this example further, for each nonnegative integer $r$, we define $\mathcal{D}^r_{p,V}$ to be the one-dimensional space spanned by a derivative operator $D$ on $\mathbb{R}[x, y]$ such that $Dg(0, 0)$ equals the coefficient of $x^r$ in $g(x, x^2 + x^4 + 2x^6 + \cdots)$. Thus (here $\langle \cdot \rangle$ denotes the span)
$$\mathcal{D}^0_{p,V} = \langle 1 \rangle, \qquad \mathcal{D}^1_{p,V} = \left\langle \frac{\partial}{\partial x} \right\rangle, \qquad \mathcal{D}^2_{p,V} = \left\langle \frac{1}{2} \frac{\partial^2}{\partial x^2} + \frac{\partial}{\partial y} \right\rangle, \qquad \dots$$
Then, for each $D \in \mathcal{D}^r_{p,V}$, the map sending $g \in \mathbb{R}[x, y]$ to $Dg(0, 0)$ descends to a linear functional on the space $R_V = \mathbb{R}[x, y]/I(V)$ of regular functions on $V$.
The computation in the above example can be extended to any variety over any field, as we explain below.

4.2. Local coordinates. Given a regular point $p$ on a $k$-dimensional variety $V$, after a translation and a linear change of coordinates, suppose that $p$ is at the origin and the first $k$ coordinate vectors are tangent to $V$. Then, by assumption, there are polynomials $f_{k+1}, \dots, f_d$ without any constant or linear terms so that on $V$ we have $x_{k+1} = f_{k+1}(x_1, \dots, x_k), \dots, x_d = f_d(x_1, \dots, x_k)$. For each $i = k+1, \dots, d$, by repeated substitution using these defining equations, we can write $x_i$, as a function on $V$, as a formal power series $h_i(x_1, \dots, x_k)$ in the local coordinates $x_1, \dots, x_k$ for $V$ at $p$.
The procedure of taking a power series described earlier can be phrased algebro-geometrically as a completion. We give a quick summary here and refer the reader to a standard algebraic geometry textbook, e.g., [11, Chapter 7] or [31, Chapter 29]. Let $p$ be a regular point on a $k$-dimensional variety $V$ in $\mathbb{F}^d$. Let $\mathfrak{m}_p \subset R_V$ be the maximal ideal of regular functions that vanish at $p$. Then the completion $\widehat{R}_{p,V}$ of $R_V$ at $p$ is the inverse limit $\varprojlim_m R_V/\mathfrak{m}_p^m$. The family of projection maps $R_V \to R_V/\mathfrak{m}_p^m$ induces a map $\iota_{p,V} \colon R_V \to \widehat{R}_{p,V}$. The completion should be thought of as the ring of formal power series around $p$. For example, when $R_V = \mathbb{F}[x]$ and $\mathfrak{m}_p = (x)$, the completion is the ring of formal power series $\mathbb{F}[[x]]$. More generally, for a regular point $p$ on $V$, assuming that $p$ is the origin and $x_1, \dots, x_k \in \mathfrak{m}_p$ span the Zariski cotangent space $\mathfrak{m}_p/\mathfrak{m}_p^2$, the map $\mathbb{F}[[x_1, \dots, x_k]] \to \widehat{R}_{p,V}$ sending $x_i$ to $\iota_{p,V}(x_i)$ is an isomorphism (say, by the Cohen structure theorem). In other words, there is a local coordinate system at $p$ in which every regular function on $V$ can be written as a formal power series around $p$.
It will be useful to know that the formal power series expansion of a regular function is zero if and only if the regular function is zero, i.e., the completion map R V → R p,V is injective. This fact follows from the Krull intersection theorem below (recall that our varieties are always irreducible).

4.3. Hasse derivatives. In the explicit example earlier, the main goal of taking derivatives was to extract coefficients. This is a formal algebraic procedure that does not rely on real analysis. To allow for arbitrary fields, including those of positive characteristic, we use an algebraic variant known as Hasse derivatives, whose definition and basic properties we summarize below. For proofs of these basic properties of Hasse derivatives, we refer the reader to [10], where Hasse derivatives were used to study the finite field Kakeya problem.
For $\vec\omega \in \mathbb{Z}^d_{\ge 0}$, the Hasse derivative $H^{\vec\omega}$ is the linear operator on $\mathbb{F}[x_1, \dots, x_d]$ defined on monomials by $H^{\vec\omega} \vec{x}^{\vec\delta} = \binom{\vec\delta}{\vec\omega} \vec{x}^{\vec\delta - \vec\omega}$, where $\binom{\vec\delta}{\vec\omega} = \prod_i \binom{\delta_i}{\omega_i}$. In particular, $H^{\vec\omega} \vec{x}^{\vec\delta} = 0$ unless $\vec\delta \ge \vec\omega$ coordinatewise. Over the reals, it is not hard to see that the two notions of derivatives are related by a constant factor: $\partial^{\vec\omega} = \vec\omega! \, H^{\vec\omega}$. Like usual derivatives, Hasse derivatives commute: $H^{\vec\omega} H^{\vec\nu} = \binom{\vec\omega + \vec\nu}{\vec\nu} H^{\vec\omega + \vec\nu} = H^{\vec\nu} H^{\vec\omega}$. Hasse derivatives form an algebraic generalization of the usual derivatives when acting on polynomials or formal power series. The evaluation of a Hasse derivative corresponds to coefficient extraction (without the factorial factors that might be troublesome in fields of positive characteristic). Indeed, we have the following "Taylor's theorem": given formal variables $x_1, \dots, x_d, y_1, \dots, y_d$, we have
$$g(\vec{x} + \vec{y}) = \sum_{\vec\omega \in \mathbb{Z}^d_{\ge 0}} (H^{\vec\omega} g)(\vec{x}) \, \vec{y}^{\vec\omega} \tag{4.1}$$
for any $g \in \mathbb{F}[x_1, \dots, x_d]$. This identity can be easily checked for each monomial $g(\vec{x}) = \vec{x}^{\vec\delta}$. From this characterization, we see that Hasse derivatives behave well under affine coordinate transformations (as we would expect of derivatives). For example, it makes sense to talk about directional Hasse derivatives without specifying a choice of a coordinate system.
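A direct implementation of Hasse derivatives on polynomials-as-dictionaries (our illustration) lets one verify both the composition rule and the "Taylor's theorem" identity over the integers:

```python
from math import comb

def hasse(g, omega):
    """Hasse derivative H^omega of g = {(d1, d2): coeff}: the monomial
    x^delta maps to binom(delta, omega) * x^(delta - omega)."""
    out = {}
    for delta, c in g.items():
        if all(d >= w for d, w in zip(delta, omega)):
            coeff = c
            for d, w in zip(delta, omega):
                coeff *= comb(d, w)
            new = tuple(d - w for d, w in zip(delta, omega))
            out[new] = out.get(new, 0) + coeff
    return {e: c for e, c in out.items() if c}

def evaluate(g, pt):
    """Evaluate a two-variable polynomial dictionary at a point."""
    return sum(c * pt[0] ** a * pt[1] ** b for (a, b), c in g.items())
```

Since the binomial coefficients are integers, these operators remain meaningful after reduction modulo a prime $p$, even when the factorial in $\partial^{\vec\omega} = \vec\omega! \, H^{\vec\omega}$ vanishes in characteristic $p$.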

4.4. Higher order directional derivatives. Now that we have the tools of completion and Hasse derivatives, we are ready to define higher order directional derivatives at a regular point $p$ along a $k$-dimensional variety $V$ in $\mathbb{F}^d$, generalizing the notion for flats from Section 3. By an affine change of coordinates, assume that $p$ is at the origin, and that the tangent space of $V$ at $p$ is spanned by the first $k$ coordinate directions. For each $i = k+1, \dots, d$, write $x_i$ as a formal power series $h_i(x_1, \dots, x_k)$ in the "local coordinates" $x_1, \dots, x_k$ for $V$ at $p$; equivalently, $\iota_{p,V}(x_i) = h_i(x_1, \dots, x_k)$ under the isomorphism $\widehat{R}_{p,V} \cong \mathbb{F}[[x_1, \dots, x_k]]$. We define $\mathcal{D}^r_{p,V}$ to be the space of all linear combinations $D$ of Hasse derivative operators on $\mathbb{F}[x_1, \dots, x_d]$ such that the map $\mathbb{F}[x_1, \dots, x_d] \to \mathbb{F}$ defined by $g \mapsto Dg(p)$ equals a linear form on the coefficients of the homogeneous degree-$r$ part of
$$\tilde{g}(x_1, \dots, x_k) := g(x_1, x_2, \dots, x_k, h_{k+1}(x_1, \dots, x_k), \dots, h_d(x_1, \dots, x_k)),$$
which is the power series representation of $g$, as a regular function on $V$, in the local coordinates at $p$. Let us also write out this definition more explicitly. Given $\vec\gamma = (\gamma_1, \dots, \gamma_k) \in \mathbb{Z}^k_{\ge 0}$, define $D^{\vec\gamma}_{p,V}$ to be the linear combination of Hasse derivative operators such that $D^{\vec\gamma}_{p,V} g(p)$ equals the coefficient of $x_1^{\gamma_1} \cdots x_k^{\gamma_k}$ in $\tilde{g}$; then $\mathcal{D}^r_{p,V}$ is spanned by the $D^{\vec\gamma}_{p,V}$ with $\gamma_1 + \cdots + \gamma_k = r$. Note that the operators $D^{\vec\gamma}_{p,V}$ are linearly independent. To see this, first note that because no $h_i$ has constant or linear terms, one has
$$D^{\vec\gamma}_{p,V} \in H^{(\vec\gamma, 0, \dots, 0)} + \operatorname{span}\left\{ H^{\vec\omega} : \omega_1 + \cdots + \omega_d < \gamma_1 + \cdots + \gamma_k \right\}. \tag{4.2}$$
The Hasse derivative operators $H^{\vec\omega}$ are linearly independent as $\vec\omega$ ranges over $\mathbb{Z}^d_{\ge 0}$. Since the top-weight component of $D^{\vec\gamma}_{p,V}$ is $H^{(\vec\gamma, 0, \dots, 0)}$, we see that the $D^{\vec\gamma}_{p,V}$'s are linearly independent as $\vec\gamma$ ranges over $\mathbb{Z}^k_{\ge 0}$. The key property, as well as the motivation for the above definition, is that for every $D \in \mathcal{D}^r_{p,V}$, there is a well defined map $R_V \to \mathbb{F}$ given by $g \mapsto Dg(p)$. To define this derivative evaluation, we replace $g \in R_V$ by a representative in $\mathbb{F}[x_1, \dots, x_d]$, and we need to check that $Dg(p)$ does not depend on the choice of the representative.
Indeed, if $g$ is identically zero on $V$, then $\tilde{g} = 0$ (by the injectivity of the completion map noted above), and hence $Dg(p) = 0$.
The above explicit formula defines D r p,V assuming that p is at the origin and the tangent space of V at p is spanned by the first k coordinate directions. By an affine transformation (using (4.1) to determine the behavior of Hasse derivatives under affine transformations), we can define the space D r p,V of r-th order directional derivatives at any regular point p on a variety V .
Having defined D r p,V , we now can proceed nearly identically as in Section 3 to prove the joints theorem for varieties. Details are given in the next section.

5. Proof of the main theorem

5.1. Priority order, handicaps, and a choice of basis. Given a set of joints $\mathcal{J}$ with a fixed preassigned order, and a handicap $\vec\alpha \in \mathbb{Z}^{\mathcal{J}}$, we define the priority order $\prec$ on $\mathcal{J} \times \mathbb{Z}_{\ge 0}$ as before.
Let $n$ be a positive integer. Let $R_{V, \le n}$ denote the space of regular functions on $V$ that can be represented by a polynomial of degree at most $n$ in $x_1, \dots, x_d$. In other words, $R_{V, \le n}$ is the image of $\mathbb{F}[x_1, \dots, x_d]_{\le n}$ under the quotient map $\mathbb{F}[x_1, \dots, x_d] \to R_V$. Define $\mathcal{B}^r_{p,V}(n)$ to be the space of linear functionals on $R_{V, \le n}$ of the form $g \mapsto Dg(p)$ for some $D \in \mathcal{D}^r_{p,V}$ (this is a well defined linear functional, as explained earlier). Note that $g \in R_{V, \le n}$ vanishes under $\mathcal{B}^0_{p,V}(n) + \cdots + \mathcal{B}^{r-1}_{p,V}(n)$ if and only if $g$ vanishes to order at least $r$ at $p$. Here a regular function $g$ on $V$ vanishes to order at least $r$ at $p$ if $g \in \mathfrak{m}^r_{p,V}$, where $\mathfrak{m}_{p,V}$ is the maximal ideal of $R_V$ corresponding to $p$; equivalently, the power series representation of $g$ in local coordinates at $p$ has no terms of degree lower than $r$. Now, exactly as in Section 3.2, we go through all pairs $(p, r) \in (\mathcal{J} \cap V) \times \mathbb{Z}_{\ge 0}$ according to the priority order and choose sets $B^r_{p,V}(\vec\alpha, n) \subset \mathcal{B}^r_{p,V}(n)$ as earlier, so that the disjoint union $\bigsqcup_{(p', r') \preceq (p, r)} B^{r'}_{p',V}(\vec\alpha, n)$ is a basis of $\sum_{(p', r') \preceq (p, r)} \mathcal{B}^{r'}_{p',V}(n)$. Choose $D^r_{p,V}(\vec\alpha, n) \subset \mathcal{D}^r_{p,V}$ of the same size as $B^r_{p,V}(\vec\alpha, n)$ so that $B^r_{p,V}(\vec\alpha, n) = \{g \mapsto Dg(p) : D \in D^r_{p,V}(\vec\alpha, n)\}$. Finally, write $B_{p,V}(\vec\alpha, n) := \bigsqcup_{r \ge 0} B^r_{p,V}(\vec\alpha, n)$ and $D_{p,V}(\vec\alpha, n) := \bigsqcup_{r \ge 0} D^r_{p,V}(\vec\alpha, n)$.
From the Krull intersection theorem, it follows that for every $p \in V$, $\sum_{r \ge 0} \mathcal{B}^r_{p,V}(n)$ spans the dual space of $R_{V, \le n}$. Hence the disjoint union $\bigsqcup_{p \in \mathcal{J} \cap V} B_{p,V}(\vec\alpha, n)$ is a basis of the space of linear functionals on $R_{V, \le n}$. Thus
$$\sum_{p \in \mathcal{J} \cap V} |D_{p,V}(\vec\alpha, n)| = \dim R_{V, \le n}.$$
Furthermore, there is some $n_0(V)$ so that $\dim R_{V, \le n}$ is a polynomial in $n$ for all $n \ge n_0(V)$, with leading term
$$\dim R_{V, \le n} = \frac{\deg V}{k!} n^k + O(n^{k-1}). \tag{5.1}$$
This is a standard fact about the Hilbert series of a variety (see, e.g., [31, Chapter 18.6]).
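For the conic $V = \{y = x^2 + y^2\}$ from Section 4.1, this dimension can be written down exactly (our illustration; here $I(V)$ is generated by $x^2 + y^2 - y$, so the degree-$\le n$ part of the ideal is $(x^2 + y^2 - y) \cdot \mathbb{F}[x, y]_{\le n-2}$):

```python
from math import comb

def dim_RV_circle(n):
    """dim R_{V,<=n} for the conic y = x^2 + y^2: polynomials of degree
    <= n modulo the multiples of x^2 + y^2 - y of degree <= n."""
    ideal_part = comb(n, 2) if n >= 2 else 0   # (x^2+y^2-y) * F[x,y]_{<= n-2}
    return comb(n + 2, 2) - ideal_part
```

One gets $\dim R_{V, \le n} = 2n + 1$ for all $n \ge 1$: a polynomial in $n$ whose leading term $2n$ matches $\deg V \cdot n^k / k!$ with $\deg V = 2$ and $k = 1$, as in (5.1).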

5.2. Regular functions with given vanishing orders. This subsection parallels Section 3.3. Here we fix a $k$-dimensional variety $V$ and a finite set of points $\mathcal{P} \subset V$. Given a vector $\vec{v} \in \mathbb{Z}^{\mathcal{P}}_{\ge 0}$, define
$$T(\vec{v}, n) = \{g \in R_{V, \le n} : g \text{ vanishes to order} \ge v_p \text{ at each } p \in \mathcal{P}\}$$
and $b_p(\vec{v}, n) := \operatorname{codim}_{T(\vec{v}, n)} T(\vec{v} + \vec{e}_p, n)$ as in Section 3.3. We omit the proofs of the next two lemmas, which mirror those of Section 3.3, except to note that the last line of the proof of Lemma 3.4 should be adapted as
$$\operatorname{codim}_{T(\vec{0}, n)} T(\vec{0}, n - 1) = \dim R_{V, \le n} - \dim R_{V, \le n - 1} = \frac{\deg V}{(k-1)!} n^{k-1} + O(n^{k-2}).$$
To see this, we use the fact that $\dim R_{V, \le n}$, for sufficiently large $n$, equals a polynomial (the Hilbert polynomial) whose leading term is given in (5.1); the right-hand side is the finite difference of this polynomial, which can readily be seen to have the above form.

5.3. How the number of vanishing conditions varies with the handicap. The lemmas in Section 3.4 can now be easily adapted to varieties. As in the previous subsection, we continue to focus our attention on a set of points $\mathcal{P}$ on a variety $V$. Given a handicap $\vec\alpha \in \mathbb{Z}^{\mathcal{P}}$ (restricted to $V$), we define the vector $\vec{v}_{p,r}(\vec\alpha)$ exactly as in Section 3.4; as there, we have $|B^r_{p,V}(\vec\alpha, n)| = b_p(\vec{v}_{p,r}(\vec\alpha), n)$. We omit the proofs of the following lemmas, which mirror those of Section 3.4 but now use the lemmas from the previous subsection.

5.4. Joints configuration. We are now ready to discuss joints of varieties; here we set some notation and definitions. By a $(k_1, \dots, k_r; m_1, \dots, m_r)$-joints configuration (or just a joints configuration for short) we mean a tuple $(\mathcal{J}, \mathcal{V}_1, \dots, \mathcal{V}_r)$ as in Theorem 1.10, namely: each $\mathcal{V}_i$ is a finite multiset of $k_i$-dimensional varieties in $\mathbb{F}^d$, where $d = m_1 k_1 + \cdots + m_r k_r$, and $\mathcal{J}$ is the set of joints formed by choosing $m_i$ elements from $\mathcal{V}_i$ for each $i = 1, \dots, r$. We write $\mathcal{M}(p)$ for the multiset of $r$-tuples $(S_1, \dots, S_r)$, where each $S_i$ is an unordered $m_i$-tuple of elements of $\mathcal{V}_i$, such that together these $s = m_1 + \cdots + m_r$ varieties form a joint at $p$. The quantity $M(p)$ from Theorem 1.10 is then the cardinality of $\mathcal{M}(p)$. We have $M(p) > 0$ at each $p \in \mathcal{J}$.

5.5. Vanishing lemma. Before stating the analog of Lemma 3.9, let us first note the following observation about how higher order directional derivatives of several varieties interact at a joint.
Lemma 5.7. Let $V_1, \dots, V_s$ be varieties forming a joint at $p$ (in particular, $p$ is a regular point of each $V_i$, and the tangent spaces of $V_1, \dots, V_s$ at $p$ together span $\mathbb{F}^d$). If a polynomial $g$ vanishes to order exactly $v$ at $p$, then there exist nonnegative integers $r_1 + \cdots + r_s = v$ and operators $D_i \in \mathcal{D}^{r_i}_{p,V_i}$ for $i = 1, \dots, s$ such that $D_1 D_2 \cdots D_s g(p) \ne 0$.

Proof idea. After an affine change of coordinates, assume $p = 0$ and split the coordinates into blocks of local coordinates for $V_1, \dots, V_s$. Choosing the $D_i$ to extract a suitable monomial coefficient of the lowest-degree homogeneous part of $g$ in these local coordinates, one computes that $D_1 D_2 \cdots D_s g = c + \text{higher order terms}$, which evaluates to $c \ne 0$ at $p = 0$.
The next statement is analogous to the vanishing lemma for planes in Lemma 3.9. The proof is analogous, but we write it out explicitly here since it is a critical step of the argument.
Lemma 5.8. Let $(J, \mathcal{V}_1, \dots, \mathcal{V}_r)$ be a $(k_1, \dots, k_r; m_1, \dots, m_r)$-joints configuration. Let $s = m_1 + \cdots + m_r$ and $d = m_1 k_1 + \cdots + m_r k_r$. Fix a handicap $\vec\alpha$ and its associated priority order. Fix a positive integer $n$. Choose $D_{p,V}$ as earlier. For each $p \in J$, fix a choice $V_1(p), V_2(p), \dots, V_s(p)$ of varieties that form a joint at $p$, of which exactly $m_i$ come from $\mathcal{V}_i$ for each $i = 1, \dots, r$.
Then for every nonzero polynomial $g \in F[x_1, \dots, x_d]$ of degree at most $n$, one has $D_1 \cdots D_s g(p) \ne 0$ for some joint $p \in J$ and some $D_1 \in D_{p,V_1(p)}, \dots, D_s \in D_{p,V_s(p)}$.
Proof. Suppose, on the contrary, that there were some nonzero polynomial $g \in F[x_1, \dots, x_d]$ of degree at most $n$ such that $D_1 \cdots D_s g(p) = 0$ for every joint $p \in J$ and all $D_1 \in D_{p,V_1}, \dots, D_s \in D_{p,V_s}$, where $V_1, V_2, \dots, V_s$ are any varieties that form a joint at $p$ with exactly $m_i$ of them coming from $\mathcal{V}_i$ for each $i = 1, \dots, r$. Choose $p \in J$ to minimize $(p, v_p(g))$ under $\prec$, where $v_p(g)$ is the order of vanishing of $g$ at $p$.
Since $g$ vanishes to order exactly $v_p(g)$ at $p$, by Lemma 5.7 there exist $D_1 \in D^{r_1}_{p,V_1(p)}, \dots, D_s \in D^{r_s}_{p,V_s(p)}$ with $D_1 D_2 \cdots D_s g(p) \ne 0$ and $r_1 + \cdots + r_s = v_p(g)$. Among all such choices of $D_1, \dots, D_s$ (including the choices of $r_1, \dots, r_s$), pick one maximizing $|\{i \in [s] : D_i \in D_{p,V_i(p)}\}|$. By the assumption at the beginning of the proof, one must have $D_i \notin D_{p,V_i(p)}$ for some $i \in [s]$. Relabeling if necessary, assume that $D_1 \notin D_{p,V_1(p)}$. (Here we are using that derivatives commute.)

Suppose $p' \in V_1(p) \cap J$ and $r' \in \mathbb{Z}_{\ge 0}$ satisfy $(p', r') \prec (p, r_1)$. We get $(p', r' + r_2 + \cdots + r_s) \prec (p, r_1 + r_2 + \cdots + r_s) = (p, v_p(g))$. By the choice of $p$, we have $(p, v_p(g)) \preceq (p', v_{p'}(g))$. Thus $(p', r' + r_2 + \cdots + r_s) \prec (p', v_{p'}(g))$, and hence $r' + r_2 + \cdots + r_s < v_{p'}(g)$. It follows that $D D_2 \cdots D_s g(p') = 0$ for all $D \in D^{r'}_{p',V_1(p)}$ by the definition of vanishing order.

From the above paragraph we deduce that $D_2 \cdots D_s g$ lies in the common kernel of $B^{r'}_{p',V_1(p)}$ ranging over all $(p', r') \in (V_1(p) \cap J) \times \mathbb{Z}_{\ge 0}$ with $(p', r') \prec (p, r_1)$. Since $D_1 D_2 \cdots D_s g(p) \ne 0$, we deduce that $D_2 \cdots D_s g$ does not lie in the common kernel of $B^{r_1}_{p,V_1(p)}$, i.e., there is some $D \in D_{p,V_1(p)}$ of order $r_1$ with $D D_2 \cdots D_s g(p) \ne 0$. But this $D$ contradicts the earlier assumption that the choice of $D_1, \dots, D_s$ maximizes $|\{i \in [s] : D_i \in D_{p,V_i(p)}\}|$.
The next lemma is a consequence of parameter counting. Its proof is identical to that of Lemma 3.10 except that we now apply Lemma 5.8.

Lemma 5.9. Assume the same setup as Lemma 5.8. We have
\[
  \sum_{p \in J} \prod_{i=1}^{s} \bigl| D_{p,V_i(p)}(\vec\alpha, n) \bigr| \;\ge\; \binom{n+d}{d}.
\]
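The parameter count behind Lemma 5.9 can be sketched as follows (our paraphrase of the standard argument, matching the shape of Lemma 3.10):

```latex
% The space of polynomials of degree at most n in d variables satisfies
\[
  \dim \bigl\{ g \in F[x_1,\dots,x_d] : \deg g \le n \bigr\} \;=\; \binom{n+d}{d}.
\]
% Each tuple (p, D_1, \dots, D_s) with D_i \in D_{p,V_i(p)}(\vec\alpha, n)
% imposes one linear condition g \mapsto D_1 \cdots D_s\, g(p) on this space.
% If the total number of conditions were smaller than the dimension, i.e.
\[
  \sum_{p \in J} \prod_{i=1}^{s} \bigl|D_{p,V_i(p)}(\vec\alpha, n)\bigr|
    \;<\; \binom{n+d}{d},
\]
% then some nonzero g of degree at most n would satisfy all of them,
% contradicting Lemma 5.8.
```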
5.6. Choosing the handicaps. We say that a joints configuration $(J, \mathcal{V}_1, \dots, \mathcal{V}_r)$ is connected if the following graph is connected: the vertex set is $J$, with $p, p' \in J$ adjacent if there is some $V \in \mathcal{V}_1 \cup \cdots \cup \mathcal{V}_r$ containing both $p$ and $p'$. Lemma 5.10 asserts that, for a suitable choice of handicap, the relevant quantity lies in some common interval of length $o_{J,\mathcal{V}_1,\dots,\mathcal{V}_r,\omega;\,n\to\infty}(1)$ as we range over $p \in J$. Here the notation means that the length of the interval tends to zero as $n$ goes to infinity, but the rate may depend on the joints configuration and on $\omega$.
Proof. The proof is analogous to that of Lemma 3.11, with appropriate modifications. In this proof, we use $o(1)$ to denote $o_{J,\mathcal{V}_1,\dots,\mathcal{V}_r,\omega;\,n\to\infty}(1)$. Let $(\delta_n)_{n\in\mathbb{N}}$ be a sequence tending to $0$ sufficiently slowly as $n$ tends to infinity. Denote by $W_p(\vec\alpha)$ the quantity

(the dependence of $W_p(\vec\alpha)$ on $n$ is suppressed in the notation).
We begin by noticing that, by Lemma 5.4, there exists some $c$ depending on $n$ and the joints configuration such that if $\alpha_{p'} < \alpha_p - c$ for two joints $p, p'$ on the same variety $V$, then $|D_{p',V}(\vec\alpha, n)| = 0$, which shows that $W_{p'}(\vec\alpha) = 0$. Therefore, although there are infinitely many choices for $\vec\alpha \in \mathbb{Z}^{J}$, they produce only finitely many possible values of $(W_p)_{p\in J}$ for a given $n$. Choose $\vec\alpha$ so that, after sorting the $W_p(\vec\alpha)$ in descending order, the resulting sequence has the least lexicographical order. Suppose that the sorted sequence is $W_{p_1}(\vec\alpha) \ge \cdots \ge W_{p_{|J|}}(\vec\alpha)$.
Suppose for the sake of contradiction that the claim fails for some $i$. Let $t$ be the least positive integer such that $W_{p_t}(\vec\alpha) - W_{p_{t+1}}(\vec\alpha) > \delta_n$. Let $\vec v = \vec e_{p_1} + \cdots + \vec e_{p_t}$ and $\vec\alpha' = \vec\alpha - \vec v$. Take a constant $C$ larger than all the degrees of the varieties in the joints configuration. As in the proof of Lemma 3.11, we can apply Lemma 5.6 to show that $\bigl| |D_{p,V}(\vec\alpha, n)| - |D_{p,V}(\vec\alpha', n)| \bigr| / n^{\dim V} = o(1)$. Together with the fact that $|D_{p,V}(\vec\alpha, n)| / n^{\dim V} \le C + o(1)$ (guaranteed by (5.1)), a similar telescoping inequality shows that $W_p(\vec\alpha)$ and $W_p(\vec\alpha')$ differ by at most $o(1)$; in particular, the difference is bounded by $\delta_n/2$ as long as $\delta_n$ tends to $0$ slowly enough.

Now, by the new monotonicity established in Lemma 5.5, if $W_p(\vec\alpha) \ne W_p(\vec\alpha')$ for some $p \in J$, then by (5.1) there exist $i \le t$ and a variety $V$ containing $p_i$ such that $|D_{p_i,V}(\vec\alpha', n)| < |D_{p_i,V}(\vec\alpha, n)|$, resulting in $W_{p_i}(\vec\alpha') < W_{p_i}(\vec\alpha)$. Since $|W_p(\vec\alpha') - W_p(\vec\alpha)| \le \delta_n/2$ for all $p \in J$ and $W_{p_t}(\vec\alpha) - W_{p_{t+1}}(\vec\alpha) > \delta_n$, the values $W_{p_1}(\vec\alpha'), \dots, W_{p_t}(\vec\alpha')$ are still the $t$ largest among $(W_p(\vec\alpha'))_{p\in J}$. Hence $W_{p_i}(\vec\alpha') < W_{p_i}(\vec\alpha)$ contradicts the minimality under the lexicographical order.
The previous paragraph shows that $W_p(\vec\alpha) = W_p(\vec\alpha')$ for every $p \in J$. As a consequence, $W_p(\vec\alpha) = W_p(\vec\alpha - m\vec v)$ for all positive integers $m$ and all $p \in J$. Since the joints configuration is connected, we can find $i \le t < j$ such that $p_i$ and $p_j$ lie on the same variety. When $m$ is sufficiently large, we have $\alpha_{p_i} - m < \alpha_{p_j} - c$, which forces $W_{p_i}(\vec\alpha - m\vec v)$ to be $0$. By the ordering, this shows that $W_{p_{i'}}(\vec\alpha - m\vec v) = 0$ for all $i' \ge i$. In particular, $W_{p_t}(\vec\alpha) = W_{p_{t+1}}(\vec\alpha) = 0$, contradicting $W_{p_t}(\vec\alpha) - W_{p_{t+1}}(\vec\alpha) > \delta_n$.
Proof of Theorem 1.10(b). In this proof, $o(1)$ denotes a quantity which goes to zero as $n$ goes to infinity but may depend arbitrarily on the joints configuration. As in the proof of Theorem 1.2, it suffices to consider the case where the joints configuration is connected. Set $s = m_1 + \cdots + m_r$, and $J_\omega = \sum_{p\in J} \omega(p)$ where $\omega(p) = M(p)^{1/(s-1)}$.
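For orientation (our gloss, specializing the weight just defined; the multiplicity statement for lines is the one alluded to in the abstract's discussion of Carbery's conjecture): in the case of $N$ lines in $F^3$ one has $r = 1$, $k_1 = 1$, $m_1 = 3$, hence $s = 3$, and

```latex
% Specialization of the weighted count (our gloss, not in the text here):
% r = 1, k_1 = 1, m_1 = 3, so s = 3 and \omega(p) = M(p)^{1/2}, giving
\[
  J_\omega \;=\; \sum_{p \in J} M(p)^{1/(s-1)} \;=\; \sum_{p \in J} M(p)^{1/2},
\]
% the joints-with-multiplicities count for lines; since M(p) \ge 1 at every
% joint, a bound on J_\omega is at least as strong as one on |J|.
```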
Choose $\vec\alpha$ according to Lemma 5.10. Then we can choose $W$ so that

which, after rearrangement, shows that

Let $\mathcal{V}_{p,i}$ be the set of varieties in $\mathcal{V}_i$ that contain $p$. Then we have, for any joint $p \in J$, $M(p)\,\omega(p)\,W \le \cdots + o(1)$.
By comparing the lower and upper bounds on $W$, and letting $n \to \infty$ so that the $o(1)$ term vanishes, we have

Rearranging gives the desired conclusion.
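As a consistency check against the headline special case quoted in the abstract (our computation, under the assumption that the conclusion takes the shape of a bound on $J_\omega$):

```latex
% Consistency check (our computation): for 2-flats in 6 dimensions,
% r = 1, k_1 = 2, m_1 = 3, hence s = 3 and d = 6. With N varieties
% (or total degree N), the conclusion specializes to
\[
  |J| \;\le\; \sum_{p \in J} M(p)^{1/(s-1)}
      \;=\; O\!\left(N^{s/(s-1)}\right) \;=\; O\!\left(N^{3/2}\right),
\]
% using M(p) \ge 1 for every joint p, matching the bound for planes
% in 6 dimensions stated in the abstract.
```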
Proof of Theorem 1.10(a). As earlier, we may assume that the joints configuration is connected. Set $s = m_1 + \cdots + m_r$ throughout the proof. Choose $\vec\alpha$ according to Lemma 5.10 with $\omega(p) = 1$ for all $p \in J$. Then we can choose $W$ so that