Complete positivity and distance-avoiding sets

We introduce the cone of completely positive functions, a subset of the cone of positive-type functions, and use it to fully characterize maximum-density distance-avoiding sets as the optimal solutions of a convex optimization problem. As a consequence of this characterization, it is possible to reprove and improve many results concerning distance-avoiding sets on the sphere and in Euclidean space.


Introduction
The two prototypical geometrical problems considered in this paper are: (P1) What is the maximum surface measure m 0 (S n−1 ) that a subset of the unit sphere S n−1 = { x ∈ R n : ‖x‖ = 1 } can have if it does not contain pairs of orthogonal vectors? (P2) What is the maximum density m 1 (R n ) that a subset of R n can have if it does not contain pairs of points at distance 1?
Problem (P1) was posed by Witsenhausen [48]. Two antipodal open spherical caps of radius π/4 form a subset of S n−1 with no pairs of orthogonal vectors, and Kalai [20,Conjecture 2.8] conjectured that this construction is optimal, that is, that it attains m 0 (S n−1 ); this conjecture remains open for all n ≥ 2. Problem (P1) will be considered in depth in Sect. 8, where many upper bounds for m 0 (S n−1 ) will be improved.
Problem (P2) figures in Moser's collection of problems [32] and was popularized by Erdős, who conjectured that m 1 (R 2 ) < 1/4 (cf. Székely [45]); this conjecture is still open. A long-standing conjecture of L. Moser (cf. Conjecture 1 in Larman and Rogers [26]), related to Erdős's conjecture, would imply that m 1 (R n ) ≤ 1/2^n for all n ≥ 2. Moser's conjecture asserts that the maximum measure of a subset of the unit ball having no pairs of points at distance 1 is at most 1/2^n times the measure of the unit ball; it has recently been shown to be false [34]: the behavior of subsets of the unit ball that avoid distance 1 resembles that of Kalai's double-cap conjecture. Problem (P2) will be considered in detail in Sect. 9, where upper bounds for m 1 (R n ) will be improved.
Bachoc et al. [1] proposed an upper bound for m 0 (S n−1 ) similar to the linear programming bound of Delsarte et al. [10] for the maximum cardinality of spherical codes. Recall that a continuous function f : [−1, 1] → R is of positive type for S n−1 if for every finite set U ⊆ S n−1 the matrix ( f (x · y)) x,y∈U is positive semidefinite. Bachoc, Nebe, Oliveira, and Vallentin showed that the optimal value of the infinite-dimensional optimization problem

maximize ∫ S n−1 ∫ S n−1 f (x · y) dω(y)dω(x)
subject to f (1) = ω(S n−1 )^{−1} ,
f (0) = 0,
f : [−1, 1] → R is continuous and of positive type for S n−1   (1)

is an upper bound for m 0 (S n−1 ). Here, ω is the surface measure on S n−1 .
Later, Oliveira and Vallentin [36] proposed an upper bound for m 1 (R n ) similar to the linear programming bound of Cohn and Elkies [7] for the maximum density of a sphere packing in R n ; the Cohn-Elkies bound has recently been used to solve the sphere-packing problem in dimensions 8 and 24 [8,46]. Recall that a continuous function f : R n → R is of positive type if for every finite set U ⊆ R n the matrix ( f (x − y)) x,y∈U is positive semidefinite. Oliveira and Vallentin showed that the optimal value of the infinite-dimensional optimization problem

maximize M( f )
subject to f (0) = 1,
f (x) = 0 whenever ‖x‖ = 1,
f : R n → R is continuous and of positive type   (2)

is an upper bound for m 1 (R n ).
Here, M( f ) is the mean value of f , defined as

M( f ) = lim T →∞ vol([−T , T ]^n )^{−1} ∫ [−T ,T ]^n f (x) dx.

An explicit characterization of functions of positive type for S n−1 is given by Schoenberg's theorem [40]. Likewise, functions of positive type on R n are characterized by Bochner's theorem [38, Theorem IX.9]. Using these characterizations, it is possible to rewrite and simplify problems (1) and (2), which become infinite-dimensional linear programs. It then becomes possible to solve these problems by computer or even analytically; in this way, one obtains upper bounds for the geometrical parameters m 0 (S n−1 ) and m 1 (R n ). Both optimization problems above can also be strengthened by the addition of extra constraints. The best bounds for both geometrical parameters, in several dimensions, were obtained through strengthenings of the optimization problems above; see Sects. 8 and 9.

A symmetric matrix A ∈ R n×n is completely positive if it is a conic combination of rank-one, symmetric, and nonnegative matrices, that is, if there are nonnegative vectors f 1 , …, f k ∈ R n such that

A = f 1 ⊗ f 1 * + · · · + f k ⊗ f k *.

The set of all completely positive matrices is a closed and convex cone of symmetric matrices that is strictly contained in the cone of positive-semidefinite matrices. Completely positive matrices are the main object of study in this paper.
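The defining decomposition can be made concrete in a few lines of code; the sketch below (with arbitrarily chosen nonnegative vectors, purely for illustration) assembles A = f 1 ⊗ f 1 * + · · · + f k ⊗ f k * and checks two necessary consequences: A is entrywise nonnegative and positive semidefinite.

```python
def outer(f):
    # Rank-one matrix f f^T.
    return [[fx * fy for fy in f] for fx in f]

def completely_positive(vectors):
    """Sum of rank-one matrices f f^T over nonnegative vectors f."""
    n = len(vectors[0])
    A = [[0.0] * n for _ in range(n)]
    for f in vectors:
        assert all(x >= 0 for x in f), "factors must be nonnegative"
        A = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, outer(f))]
    return A

def is_psd(A, tol=1e-9):
    """Positive-semidefiniteness check for a symmetric matrix via Gaussian
    elimination: all pivots must be nonnegative, and a zero pivot forces
    a zero column below it."""
    M = [row[:] for row in A]
    n = len(M)
    for k in range(n):
        if M[k][k] < -tol:
            return False
        if abs(M[k][k]) <= tol:
            if any(abs(M[i][k]) > tol for i in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            r = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= r * M[k][j]
    return True
```

For instance, the vectors (1, 2, 0) and (0, 1, 3) produce a 3 × 3 completely positive matrix, which is then entrywise nonnegative and positive semidefinite, while the nonnegative matrix with zero diagonal and ones off the diagonal fails the semidefiniteness check.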
A continuous function f : [−1, 1] → R is of completely positive type for S n−1 if for every finite set U ⊆ S n−1 the matrix f (x · y) x,y∈U is completely positive. Analogously, a continuous function f : R n → R is of completely positive type if for every U ⊆ R n the matrix f (x − y) x,y∈U is completely positive. Notice that functions of completely positive type are functions of positive type, but not every function of positive type is of completely positive type.
The central result of this paper is that, by considering functions of completely positive type instead of functions of positive type, one fully characterizes the geometrical parameters in (P1) and (P2).

Table 1 New upper bounds for the independence ratio of G(S n−1 , {π/2}). Next to each bound is the number of BQP(U )-constraints used to obtain it. The lower bounds come from two opposite spherical caps. The bound for n = 3 improves on a previous bound of 0.308 by Zhao (personal communication); the bounds for n ≥ 4 improve on Witsenhausen's bound [48].

Table 2 The bounds for n = 3 are due to Oliveira and Vallentin [36]; all other bounds are due to Bachoc et al. [2]. The graphs used for the subgraph constraints are indicated in the last column; they are the same ones used by Bachoc, Passuello, and Thiery (ibid.).

Theorem 1.1 If in (1) we require f to be of completely positive type, then the optimal value of the problem is exactly m 0 (S n−1 ). Similarly, if in (2) we require f to be of completely positive type, then the optimal value is exactly m 1 (R n ).
The significance of this result is twofold. First, it gives us a source of constraints that can be added to (1) or (2) and asserts that this source is complete, that is, that the constraints are sufficient for us to obtain the exact parameters. Namely, for every finite set U ⊆ S n−1 we can add to (1) the constraint that f (x · y) x,y∈U has to be completely positive, and similarly for (2). All strengthenings of problems (1) and (2) considered so far in the literature have used such constraints. In this paper, by systematically using them, we are able to improve many of the known upper bounds for m 0 (S n−1 ) and m 1 (R n ); see Table 1 in Sect. 8 and Table 2 in Sect. 9.
Second, the characterizations of m 0 (S n−1 ) and m 1 (R n ) in terms of convex optimization problems, even computationally difficult ones, are good enough to allow us to derive some interesting theoretical results through analytical methods. For instance, denote by m d 1 ,...,d N (R n ) the maximum density that a Lebesgue-measurable set I ⊆ R n can have if ‖x − y‖ ∉ {d 1 , . . . , d N } for all distinct x, y ∈ I . Bukh [6] showed, unifying results by Furstenberg et al. [17], Bourgain [5], Falconer [14], and Falconer and Marstrand [13], that, as the distances d 1 , …, d N space out, m d 1 ,...,d N (R n ) approaches (m 1 (R n )) N . This asymptotic result can be recovered from (2) by using functions of completely positive type in a systematic way that yields precise analytic results. Another result of Bukh (ibid.) that can be proved using this approach is the Turing-machine computability of m 1 (R n ). Using our convex formulation one can in principle extend this computability result to distance-avoiding sets in other geometric spaces.

Outline of the paper
The main theorem proved in this paper is Theorem 5.1, from which Theorem 1.1 follows. Theorem 5.1 is stated in terms of graphs on topological spaces and is much more general than Theorem 1.1. It has a rather technical statement, but it is in fact a natural extension of a well-known result in combinatorial optimization, namely that the independence number of a graph is the optimal value of a convex optimization problem over the cone of completely positive matrices. This connection is the main thread of this paper; it will be clarified in Sect. 3.
In Sect. 2 we will see how geometrical parameters such as m 0 (S n−1 ) and m 1 (R n ) can be modeled as the independence number of certain graphs defined over topological spaces such as the sphere. In Sect. 3 this will allow us to extend the completely positive formulation for the independence number from finite graphs to these topological graphs; this extension will rely on the introduction of the cone of completely positive operators on a Hilbert space. A study of these operators, carried out in Sect. 4, will then allow us to prove Theorem 5.1 in Sect. 5 and extend it from compact spaces to R n in Sect. 6. In Sects. 7, 8, and 9 we will see how to use Theorem 5.1 to obtain better bounds for m 0 (S n−1 ) and m 1 (R n ); these sections will be focused on computational techniques. We close in Sect. 10 by seeing how Theorem 5.1 can be used to prove Bukh's results [6] concerning sets avoiding many distances and the computability of m 1 (R n ).

Notation
All graphs considered have no loops or parallel edges. Often, the edge set of a graph G = (V , E) is also seen as a symmetric subset of V × V . In this case, x, y ∈ V are adjacent if and only if (x, y), (y, x) ∈ E. A graph G = (V , E) is a topological graph if V is a topological space; topological properties of E (e.g., closedness, compactness) always refer to E as a subset of V × V .
If V is a metric space with metric d, then for x ∈ V and δ > 0 we denote by B(x, δ) = { y ∈ V : d(x, y) < δ } the open ball with center x and radius δ. The topological closure of a set X is denoted by cl X . The term "neighborhood" always means "open neighborhood", though the distinction is never really relevant.
The Euclidean inner product on R n is denoted by x · y = x 1 y 1 + · · · + x n y n for x, y ∈ R n . The (n − 1)-dimensional unit sphere is S n−1 = { x ∈ R n : ‖x‖ = 1 }.
All functions considered are real valued unless otherwise noted. If V is a measure space with measure ω, then the inner product of f , g ∈ L 2 (V ) is

( f , g) = ∫ V f (x)g(x) dω(x).

The inner product of kernels A, B ∈ L 2 (V × V ) is

⟨A, B⟩ = ∫ V ∫ V A(x, y)B(x, y) dω(y)dω(x).

When V is finite and ω is the counting measure, then ⟨A, B⟩ is the trace inner product.
Denote by L 2 sym (V × V ) the space of all kernels that are symmetric, that is, self-adjoint as operators. Note that A ∈ L 2 sym (V × V ) if and only if A ∈ L 2 (V × V ) and A(x, y) = A(y, x) almost everywhere. A symmetric kernel A is positive if for all f ∈ L 2 (V ) we have

∫ V ∫ V A(x, y) f (x) f (y) dω(y)dω(x) ≥ 0.

Locally independent graphs
Let G = (V , E) be a graph (without loops and parallel edges). A set I ⊆ V is independent if it does not contain pairs of adjacent vertices, that is, if for all x, y ∈ I we have (x, y) ∉ E. The independence number of G, denoted by α(G), is the maximum cardinality of an independent set in G. The problem of computing the independence number of a finite graph figures, as the complementary maximum-clique problem, in Karp's original list of 21 NP-complete problems [21].
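For finite graphs the definition can be made concrete by brute force; the following sketch (exponential-time, intended only for tiny illustrative graphs) computes α(G) directly from the definition.

```python
from itertools import combinations

def independence_number(n, edges):
    """Maximum size of a subset of {0, ..., n-1} containing no pair of
    adjacent vertices.  Brute force over all subsets, largest first."""
    E = {frozenset(e) for e in edges}
    for k in range(n, 0, -1):
        for S in combinations(range(n), k):
            if not any(frozenset(p) in E for p in combinations(S, 2)):
                return k
    return 0
```

For example, the 5-cycle C5 has independence number 2: two non-adjacent vertices form an independent set, and any three vertices of C5 span an edge.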
To model the geometrical parameters m 0 (S n−1 ) and m 1 (R n ) as the independence number of some graph, we will have to extend the concept of independence number from finite to infinite graphs. Then the nature of both the vertex and edge sets plays a role; this is best seen by considering a few examples.
Let V be a metric space with metric d and take D ⊆ (0, ∞). The D-distance graph on V is the graph G(V , D) whose vertex set is V and in which vertices x, y are adjacent if d(x, y) ∈ D. Independent sets in G(V , D) are sometimes called D-avoiding sets. Let us consider a few concrete choices for V and D, corresponding to central problems in discrete geometry.
(i) The kissing number problem: V = S n−1 and D = (0, π/3). Here we consider the metric d(x, y) = arccos x · y. In this case, all independent sets in G(V , D) are finite; even more, the independence number is finite. The independent sets in G(V , D) are exactly the contact points of kissing configurations in R n , so α(G(V , D)) is the kissing number of R n .
(ii) Witsenhausen's problem (P1): V = S n−1 and D = {π/2}. Again we consider the metric d(x, y) = arccos x · y. An independent set in G(V , D) is a set without pairs of orthogonal vectors. These sets can be infinite and even have positive surface measure, so α(G(V , D)) = ∞. The right concept in this case is the measurable independence number

α ω (G(V , D)) = sup{ ω(I ) : I ⊆ S n−1 is measurable and independent },

where ω is the surface measure on the sphere. Then α ω (G(V , D)) = m 0 (S n−1 ).

(iii) The sphere-packing problem: V = R n and D = (0, 1). Here we consider the Euclidean metric. The independent sets in G(V , D) are the sets of centers of spheres in a packing of spheres of radius 1/2 in R n . So independent sets in G(V , D) can be infinite but are always discrete, hence α(G(V , D)) = ∞ while independent sets always have Lebesgue measure 0. A better definition of independence number in this case would be the center density of the corresponding packing, that is, the average number of points per unit volume.

(iv) Sets avoiding one distance, problem (P2): V = R n and D = {1}. The graph G(R n , {1}) is called the unit-distance graph of R n . Independent sets in this graph can be infinite and even have infinite Lebesgue measure, hence α(G(V , D)) = ∞. So the right notion of independence number is the density of a set, informally the fraction of space it covers; we will formally define this independence density in Sect. 6.

In the first two examples above, the vertex set is compact. In (i), there is δ > 0 such that (0, δ) ⊆ D. Then every point has a neighborhood that is a clique (that is, a set of pairwise adjacent vertices), and this implies that all independent sets are discrete and hence finite, given the compactness of V . In (ii), 0 is isolated from D. Then every point has an independent neighborhood and there are independent sets of positive measure. In the last two examples, the vertex set is not compact. In (iii), again there is δ > 0 such that (0, δ) ⊆ D, and this implies that all independent sets are discrete, though since V is not compact they can be infinite. In (iv), 0 is again isolated from D, hence there are independent sets of positive measure and even infinite measure, given that V is not compact.
We have therefore two things at play. First, compactness of the vertex set. Second, the nature of the edge set, which in the examples above depends on 0 being isolated from D or not.
In this paper, the focus rests on graphs with compact vertex sets, though the noncompact case of R n can be handled by seeing R n as a limit of tori (see Sect. 6 below). As for the edge set, we consider graphs like the ones in examples (ii) and (iv).
The graphs in examples (i) and (iii) are topological packing graphs, a concept introduced by de Laat and Vallentin [25]. These are topological graphs in which every finite clique is a subset of an open clique. In particular, every vertex has a neighborhood that is a clique. Here and in the remainder of the paper we consider locally independent graphs, which are in a sense the complements of topological packing graphs.

Definition 2.1 A topological graph is locally independent if every compact independent set in it is a subset of an open independent set.
In particular, every vertex of a locally independent graph has an independent neighborhood. The graphs in examples (ii) and (iv) are locally independent, as the following theorem shows.

Theorem 2.2 If G = (V , E) is a topological graph, if V is metrizable, and if E is closed, then G is locally independent.
Proof Let d be a metric that induces the topology on V . On V × V we consider the metric

d((x, y), (x′, y′)) = max{d(x, x′), d(y, y′)},

and we denote by d E (x, y) the distance from (x, y) to the set E; since E is closed, d E (x, y) > 0 if and only if (x, y) ∉ E, and d E is a continuous function. Let I ⊆ V be a nonempty and compact independent set. Since I × I is compact, the function d E has a minimum δ over I × I , and δ > 0 since I is independent.

Next take the set

S = ⋃ x∈I B(x, δ).

This is an open set that contains I ; it is moreover independent. Indeed, suppose x′, y′ ∈ S are adjacent. Take x, y ∈ I such that x′ ∈ B(x, δ) and y′ ∈ B(y, δ). Then d((x, y), (x′, y′)) < δ and, since (x′, y′) ∈ E, also d E (x, y) < δ, a contradiction.

Let G = (V , E) be a topological graph and ω be a Borel measure on V . The independence number of G with respect to the measure ω is

α ω (G) = sup{ ω(I ) : I ⊆ V is measurable and independent };

when speaking of the independence number of a graph, the measure considered will always be clear from the context. The following theorem is a converse of sorts to Theorem 2.2.
Theorem 2.3 Let G = (V , E) be a locally independent topological graph and let G′ = (V , cl E). Then G′ is locally independent; moreover, if ω is an inner-regular Borel measure on V , then α ω (G′) = α ω (G).

Proof Let I ⊆ V be a compact independent set in G′. Then I is also an independent set in G and, since G is locally independent, there is an open independent set S in G that contains I . Since S is independent, E ∩ (S × S) = ∅, and hence (V × V )\(S × S) is a closed set containing E, so cl E ⊆ (V × V )\(S × S), whence S is also an independent set in G′, finishing the proof that G′ is locally independent.
As for the second part of the statement, clearly α ω (G′) ≤ α ω (G), so we prove the reverse inequality. Since ω is inner regular, we can restrict ourselves to compact sets, writing

α ω (G) = sup{ ω(I ) : I ⊆ V is compact and independent }.

So, to prove the reverse inequality, it suffices to show that a compact independent set in G is also independent in G′. Let I be a compact independent set in G and let S be an open independent set in G that contains I , which exists since G is locally independent. As in the first part of the proof, S is independent in G′ as well, and hence so is I .

A conic programming formulation for the independence number
One of the best polynomial-time-computable upper bounds for the independence number of a finite graph is the theta number, a graph parameter introduced by Lovász [27]. Let G = (V , E) be a finite graph. The theta number and its variants can be defined in terms of the following conic program, in which a linear function is maximized over the intersection of a convex cone with an affine subspace:

maximize ⟨J , A⟩
subject to tr A = 1,
A(x, y) = 0 if (x, y) ∈ E,
A ∈ K(V ).   (3)

Here, J is the all-ones matrix and K(V ) ⊆ R V ×V is a convex cone of symmetric matrices. Both the optimal value of the problem above and the problem itself are denoted by ϑ(G, K(V )).
The theta number of G, denoted by ϑ(G), is simply ϑ(G, PSD(V )), where PSD(V ) is the cone of positive-semidefinite matrices. In this case our problem becomes a semidefinite program, whose optimal value can be approximated in polynomial time to within any desired precision using the ellipsoid method [19] or interior-point methods [24]. We have moreover ϑ(G) ≥ α(G): if I ⊆ V is a nonempty independent set and χ I : V → {0, 1} is its characteristic function, then A = |I |^{−1} χ I ⊗ χ I *, that is, the matrix such that A(x, y) = |I |^{−1} χ I (x)χ I (y), is a feasible solution of ϑ(G, PSD(V )); moreover ⟨J , A⟩ = |I |, and hence ϑ(G) ≥ |I |. Since I is any nonempty independent set, ϑ(G) ≥ α(G) follows.
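The feasibility argument just given is easy to check numerically; in the sketch below (pure Python, with the 5-cycle as an arbitrary example graph) the matrix A = |I |^{−1} χ I ⊗ χ I * is assembled and the constraints of the conic program, together with ⟨J , A⟩ = |I |, are verified.

```python
def chi_solution(n, edges, I):
    """Build A = |I|^{-1} chi_I chi_I^T for an independent set I and report
    (A, tr A, max |A(x,y)| over edges, <J, A>) so feasibility for the conic
    program can be inspected directly."""
    chi = [1.0 if v in I else 0.0 for v in range(n)]
    A = [[chi[x] * chi[y] / len(I) for y in range(n)] for x in range(n)]
    tr = sum(A[x][x] for x in range(n))
    edge_max = max((abs(A[x][y]) for x, y in edges), default=0.0)
    JA = sum(sum(row) for row in A)
    return A, tr, edge_max, JA

# Independent set {0, 2} in the 5-cycle: tr A = 1, A vanishes on all edges,
# and <J, A> = |I| = 2, certifying theta(C5) >= 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
A, tr, edge_max, JA = chi_solution(5, edges, {0, 2})
assert tr == 1.0 and edge_max == 0.0 and JA == 2.0
```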
A strengthening of the Lovász theta number is the parameter ϑ′(G) introduced independently by McEliece et al. [30] and Schrijver [41], obtained by taking K(V ) = PSD(V ) ∩ N (V ), where N (V ) is the cone of matrices with nonnegative entries.
Another choice for K(V ) is the cone C(V ) of completely positive matrices. The proof above that ϑ(G) ≥ α(G) works just as well when K(V ) = C(V ), and hence

ϑ(G) ≥ ϑ′(G) ≥ ϑ(G, C(V )) ≥ α(G).   (4)

De Klerk and Pasechnik [23] observed that a theorem of Motzkin and Straus [33] implies that the last inequality in (4) is actually tight; a streamlined proof of this fact goes as follows. If A is a feasible solution of ϑ(G, C(V )), then, after suitable normalization,

A = α 1 f 1 ⊗ f 1 * + · · · + α n f n ⊗ f n *,   (5)

where α i > 0, f i ≥ 0, and ‖ f i ‖ = 1 for all i. Since ‖ f i ‖ = 1, we have tr f i ⊗ f i * = 1, and then since tr A = 1 we must have α 1 + · · · + α n = 1. It follows that for some i we have ⟨J , f i ⊗ f i *⟩ ≥ ⟨J , A⟩; assume then that this is the case for i = 1. Next, observe that since A(x, y) = 0 for all (x, y) ∈ E and each f i is nonnegative, we must have f 1 (x) f 1 (y) = 0 for all (x, y) ∈ E. This implies that I , the support of f 1 , is an independent set. Denoting by ( f , g) = Σ x∈V f (x)g(x) the Euclidean inner product in R V , we then have

⟨J , A⟩ ≤ ⟨J , f 1 ⊗ f 1 *⟩ = ( f 1 , χ I )^2 ≤ ( f 1 , f 1 )(χ I , χ I ) = |I | ≤ α(G)

by the Cauchy-Schwarz inequality, and, since A is any feasible solution, we get ϑ(G, C(V )) ≤ α(G).

Problem (3) can be naturally extended to infinite topological graphs, as we will see now. Let G = (V , E) be a topological graph where V is compact, ω be a Borel measure on V , J ∈ L 2 (V × V ) be the constant 1 kernel, and K(V ) ⊆ L 2 sym (V × V ) be a convex cone of symmetric kernels. When V is finite with the discrete topology and ω is the counting measure, the following optimization problem is exactly (3):

maximize ⟨J , A⟩
subject to ∫ V A(x, x) dω(x) = 1,
A(x, y) = 0 if (x, y) ∈ E,
A is continuous and A ∈ K(V ).   (6)

As before, we will denote both the optimal value (that is, the supremum of the objective function) of this problem and the problem itself by ϑ(G, K(V )).
The problem above is a straightforward extension of (3), except that instead of the trace of the operator A we take the integral over the diagonal. Not every Hilbert-Schmidt operator has a trace, so if we were to insist on using the trace instead of the integral, we would have to require that A be trace class. Recall that A is trace class and has trace τ if for every complete orthonormal system (φ i ) of L 2 (V ) the sum Σ i (Aφ i , φ i ) converges absolutely to τ . Mercer's theorem says that a continuous and positive kernel A has a spectral decomposition in terms of continuous eigenfunctions that moreover converges absolutely and uniformly. This implies in particular that A is trace class and that its trace is the integral over the diagonal. So, as long as K(V ) is a subset of the cone of positive kernels, taking the integral over the diagonal or the trace is the same.
As before, there are at least two cones that can be put in place of K(V ). One is the cone PSD(V ) of positive kernels. The other is the cone of completely positive kernels on V , namely

C(V ) = cl cone{ f ⊗ f * : f ∈ L 2 (V ) and f ≥ 0 },   (7)

with the closure taken in the norm topology on L 2 (V × V ), and where f ≥ 0 means that f is nonnegative almost everywhere. Note that C(V ) ⊆ PSD(V ), and hence ϑ(G, PSD(V )) ≥ ϑ(G, C(V )).

Theorem 3.1 If G = (V , E) is a locally independent graph, if V is a compact
Hausdorff space, and if ω is an inner-regular Borel measure on V such that 0 < α ω (G) < ∞, then ϑ(G, C(V )) ≥ α ω (G).
Theorem 5.1 in Sect. 5 states that, under some extra assumptions on G and ω, one has ϑ(G, C(V )) = α ω (G), as in the finite case. The proof of this theorem is fundamentally the same as in the finite case; here is an intuitive description.
There are two key steps in the proof for finite graphs as given above. First, the matrix A is a convex combination of rank-one nonnegative matrices, as in (5). Second, this together with the constraints of our problem implies that the support of each f i in (5) is an independent set. Then the support of one of the f i s will give us a large independent set.
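For a finite graph the two steps can be traced explicitly; the sketch below (with an illustrative decomposition for the 5-cycle, not data from the paper) picks the factor f i maximizing ⟨J , f i ⊗ f i *⟩ and checks that its support is an independent set of size at least ⟨J , A⟩.

```python
import math

def support_bound(alphas, fs, edges):
    """Given A = sum_i alpha_i f_i f_i^T with alpha_i > 0, f_i >= 0 and
    ||f_i|| = 1, return the support of an f_i maximizing <J, f_i f_i^T>.
    By the argument in the text it spans no edge and has size >= <J, A>."""
    J = lambda f: sum(f) ** 2              # <J, f f^T> = (sum_x f(x))^2
    JA = sum(a * J(f) for a, f in zip(alphas, fs))
    f = max(fs, key=J)
    I = {x for x, v in enumerate(f) if v > 0}
    assert all(x not in I or y not in I for x, y in edges)  # independent
    assert len(I) >= JA - 1e-9                              # |I| >= <J, A>
    return I

# Two independent sets of the 5-cycle encoded as normalized nonnegative
# vectors; the returned support is one of them.
s = 1 / math.sqrt(2)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
I = support_bound([0.5, 0.5], [[s, 0, s, 0, 0], [0, s, 0, 0, s]], edges)
assert I in ({0, 2}, {1, 4})
```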
In the proof that ϑ(G, C(V )) = α ω (G) for an infinite topological graph we will have to repeat the two steps above. Now A will be a kernel, so it will not be in general a convex combination of finitely many rank-one kernels as in (5); Choquet's theorem [43,Theorem 10.7] will allow us to express A as a sort of convex combination of infinitely many rank-one kernels. Next, it will not be the case that the support of any function appearing in the decomposition of A will be independent, but depending on some properties of G and ω we will be able to fix this by removing from the support the measure-zero set consisting of all points that are not density points.
To be able to apply Choquet's theorem, we first need to better understand the cone C(V ); this we do next.

The completely positive and the copositive cones on compact spaces
Throughout this section, V will be a compact Hausdorff space and ω will be a finite Borel measure on V such that every open set has positive measure and ω(V ) = 1; the normalization of ω is made for convenience only.
There are two useful topologies to consider on the L 2 spaces we deal with: the norm topology and the weak topology. We begin with a short discussion about them, based on Chapter 5 of Simon [43]. Statements will be given in terms of L 2 (V ), but they also hold for L 2 (V × V ) and L 2 sym (V × V ). The norm topology on L 2 (V ) coincides with the Mackey topology, the strongest topology for which the continuous linear functionals are exactly the functionals f → ( f , g) for g ∈ L 2 (V ).
The weak topology on L 2 (V ) is the weakest topology for which all linear functionals f → ( f , g) for g ∈ L 2 (V ) are continuous. A net ( f α ) converges in the weak topology if and only if (( f α , g)) converges for all g ∈ L 2 (V ).
The weak and norm topologies are dual topologies, that is, the topological dual of L 2 (V ) is the same for both topologies, and hence it is isomorphic to L 2 (V ). Theorem 5.2 (iv) (ibid.) says that if X ⊆ L 2 (V ) is a convex set, then cl X is the same whether it is taken in the weak or norm topology. Since the set cone{ f ⊗ f * : f ∈ L 2 (V ) and f ≥ 0 } is convex, it follows that if we take the closure in (7) in the weak topology we also obtain C(V ).
The dual cone of C(V ) is

C*(V ) = { Z ∈ L 2 sym (V × V ) : ⟨Z , A⟩ ≥ 0 for all A ∈ C(V ) };

it is the cone of copositive kernels on V . This is a convex cone and, since it is closed in the weak topology on L 2 sym (V × V ), it is also closed in the norm topology. Moreover, the dual of C*(V ), namely the cone of all A ∈ L 2 sym (V × V ) such that ⟨Z , A⟩ ≥ 0 for all Z ∈ C*(V ), is C(V ) itself, since C(V ) is a closed convex cone. Two simple facts will be useful: a kernel Z ∈ L 2 sym (V × V ) is copositive if and only if ⟨Z , f ⊗ f *⟩ ≥ 0 for every nonnegative f ∈ L 2 (V ); and if Z is copositive and g ∈ L ∞ (V ) is nonnegative, then the kernel (x, y) → g(x)Z (x, y)g(y) is also copositive.

Proof The first statement is immediate, so let us prove the second. If f ∈ L 2 (V ) is nonnegative, then f g ≥ 0, and so, writing Z g for the kernel (x, y) → g(x)Z (x, y)g(y),

⟨Z g , f ⊗ f *⟩ = ⟨Z , ( f g) ⊗ ( f g)*⟩ ≥ 0,

whence Z g is copositive.

Partitions and averaging
An ω-partition of V is a partition of V into finitely many measurable sets each of positive measure. Given a function f ∈ L 2 (V ) and an ω-partition P of V , the averaging of f on P is the function f * P : V → R such that

( f * P)(x) = ω(X )^{−1} ∫ X f (y) dω(y)

for all X ∈ P and x ∈ X . It is immediate that f * P ∈ L 2 (V ). We also see f * P as a function with domain P, writing ( f * P)(X ) for the common value of f * P in X ∈ P.
Similarly, the averaging of a kernel A ∈ L 2 (V × V ) on P is the kernel A * P : V × V → R such that

(A * P)(x, y) = (ω(X )ω(Y ))^{−1} ∫ X ∫ Y A(x′, y′) dω(y′)dω(x′)

for all X , Y ∈ P and x ∈ X , y ∈ Y . Again, A * P ∈ L 2 (V × V ); moreover, if A is symmetric, then so is A * P. The kernel A * P can also be seen as a function with domain P × P (that is, as a matrix), so (A * P)(X , Y ) is the common value of A * P in X × Y for X , Y ∈ P. Seeing A * P as a matrix shows that, as a kernel, A * P has finite rank. The averaging operation preserves step functions and step kernels on the partition P. In particular, it is idempotent: if f ∈ L 2 (V ), then ( f * P) * P = f * P, and similarly for kernels.
The averaging operation is moreover compatible with the inner product: for all A, B ∈ L 2 (V × V ),

⟨A * P, B⟩ = ⟨A * P, B * P⟩ = ⟨A, B * P⟩.

For a proof, simply expand all the inner products. On the one hand,

⟨A * P, B⟩ = Σ X ,Y ∈P (A * P)(X , Y ) ∫ X ∫ Y B(x, y) dω(y)dω(x) = ⟨A * P, B * P⟩.

On the other hand, the same computation with the roles of A and B exchanged gives ⟨A, B * P⟩ = ⟨A * P, B * P⟩.
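On a finite measure space the averaging operation and the compatibility with the inner product can be checked directly; the sketch below uses a three-point space with arbitrarily chosen weights and kernels.

```python
def avg(A, parts, w):
    """A * P on a finite measure space with point weights w: the averaged
    kernel is constant on each block X x Y, equal to the weighted mean of A
    there (the discrete analogue of the integral formula)."""
    n = len(w)
    B = [[0.0] * n for _ in range(n)]
    for X in parts:
        for Y in parts:
            wXY = sum(w[x] for x in X) * sum(w[y] for y in Y)
            m = sum(w[x] * w[y] * A[x][y] for x in X for y in Y) / wXY
            for x in X:
                for y in Y:
                    B[x][y] = m
    return B

def inner(A, B, w):
    """<A, B> with respect to the product measure w x w."""
    n = len(w)
    return sum(w[x] * w[y] * A[x][y] * B[x][y]
               for x in range(n) for y in range(n))

w, parts = [0.2, 0.3, 0.5], [[0, 1], [2]]
A = [[1.0, 2.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
B = [[0.0, 1.0, 2.0], [1.0, 1.0, 0.0], [2.0, 0.0, 5.0]]

# Idempotence (up to rounding) and compatibility with the inner product.
A2 = avg(A, parts, w)
A3 = avg(A2, parts, w)
assert all(abs(A2[i][j] - A3[i][j]) < 1e-12 for i in range(3) for j in range(3))
assert abs(inner(A2, B, w) - inner(A, avg(B, parts, w), w)) < 1e-12
assert abs(inner(A2, B, w) - inner(A2, avg(B, parts, w), w)) < 1e-12
```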

Theorem 4.2
Let P be an ω-partition. If A ∈ C(V ), then A * P ∈ C(V ) and A * P ∈ C(P), where on P we consider the discrete topology and the counting measure. Similarly, if Z ∈ C * (V ), then Z * P ∈ C * (V ) and Z * P ∈ C * (P).
Proof Let us prove the second statement first. Take Z ∈ C*(V ) and f ∈ L 2 (V ) with f ≥ 0. Then f * P ≥ 0 and

⟨Z * P, f ⊗ f *⟩ = ⟨Z , ( f ⊗ f *) * P⟩ = ⟨Z , ( f * P) ⊗ ( f * P)*⟩ ≥ 0,

whence Z * P ∈ C*(V ). To see that Z * P ∈ C*(P), take a function φ : P → R with φ ≥ 0 and let f ∈ L 2 (V ) be given by f (x) = ω(X )^{−1} φ(X ) for X ∈ P and x ∈ X . Then f ≥ 0 and

Σ X ,Y ∈P (Z * P)(X , Y )φ(X )φ(Y ) = ⟨Z * P, f ⊗ f *⟩ ≥ 0,

so Z * P, seen as a matrix, is copositive.

Now take A ∈ C(V ). Averaging is a norm-continuous linear operation that maps each generator f ⊗ f * with f ≥ 0 to ( f * P) ⊗ ( f * P)* ∈ C(V ), and hence A * P ∈ C(V ). Seeing that A * P ∈ C(P) is only slightly more complicated. Given Z ∈ C*(P), the step kernel z with z(x, y) = (ω(X )ω(Y ))^{−1} Z (X , Y ) for x ∈ X and y ∈ Y is copositive: for nonnegative f ∈ L 2 (V ), the vector of averages (ω(X )^{−1} ∫ X f dω) X ∈P is nonnegative, and ⟨z, f ⊗ f *⟩ is exactly Z applied to this vector. Hence

Σ X ,Y ∈P (A * P)(X , Y )Z (X , Y ) = ⟨A * P, z⟩ = ⟨A, z * P⟩ = ⟨A, z⟩ ≥ 0,

and A * P ∈ C(P).

Corollary 4.3
If P is an ω-partition and if A ∈ C(V ), then there are nonnegative and nonzero functions f 1 , …, f n ∈ L 2 (V ), each one constant in each X ∈ P, such that

A * P = f 1 ⊗ f 1 * + · · · + f n ⊗ f n *.

Proof From Theorem 4.2 we know that A * P ∈ C(P). So there are nonnegative and nonzero functions φ 1 , …, φ n with domain P such that

A * P = φ 1 ⊗ φ 1 * + · · · + φ n ⊗ φ n *,

where A * P is seen as a function on P × P. The result now follows by taking f i (x) = φ i (X ) for X ∈ P and x ∈ X .

Approximation of continuous kernels
The main use of averaging is in approximating continuous kernels by finite-rank ones. We say that a continuous kernel A : V × V → R varies (strictly) less than ε over an ω-partition P if the variation of A in each X × Y for X , Y ∈ P is less than ε. We say that a partition P of V separates U ⊆ V if |U ∩ X | ≤ 1 for all X ∈ P. The main tool we need is the following result.

Theorem 4.4 If A : V × V → R is a continuous kernel, U ⊆ V is a finite set, and ε > 0, then there is an ω-partition P of V that separates U and over which A varies less than ε.

Proof Since V is a Hausdorff space and U is finite, every x ∈ V has a neighborhood N x such that every y ∈ U \{x} is in the exterior of N x . Since A is continuous and V × V is compact, there is a finite cover B of V × V by open sets of the form S × T over which A varies less than ε/2; refining the factors, we may assume that each factor of a set in B is contained in N x for some x ∈ V and that each x ∈ U is either contained in or exterior to each factor. Let C be the collection of all factors of sets in B. Note C is an open cover of V . Moreover, by construction, |U ∩ S| ≤ 1 for all S ∈ C and, if x ∈ U is such that x ∉ S for some S ∈ C, then x is in the exterior of S. Let us turn this open cover C into the desired ω-partition P.
For S ⊆ C, consider the set

E S = ⋂ S∈S S \ ⋃ S∈C\S S,

consisting of the points of V that belong to every set in S and to no set in C\S, and let R be the collection of all nonempty sets of this form. Then R is a partition of V that, by construction, separates U . Moreover, if X , Y ∈ R, then the variation of A in X × Y is less than ε/2. Indeed, note that if S ⊆ C and S ∈ C are such that E S ∩ S ≠ ∅, then E S ⊆ S. Since B is a cover of V × V , given X , Y ∈ R there must be S × T ∈ B such that (X × Y ) ∩ (S × T ) ≠ ∅, implying that X ∩ S ≠ ∅ and Y ∩ T ≠ ∅, whence X ⊆ S and Y ⊆ T . But then X × Y ⊆ S × T , and we know that the variation of A in S × T is less than ε/2.

Now R may not be an ω-partition: though the sets in R are measurable, some may have measure 0. This does not happen, however, for sets in R that contain some point in U . Indeed, if for S ⊆ C and x ∈ U we have x ∈ E S , then x ∈ ⋂ S∈S S, which is an open set. Moreover, x ∉ S for all S ∈ C\S, and hence x is in the exterior of each S ∈ C\S. But then x is in the interior of E S , and so E S has nonempty interior and hence positive measure.
Let us fix R by getting rid of sets with measure 0. Let W be the union of all sets in R with measure 0. Note cl(V \W ) = V . For if not, then there would be x ∈ W and a neighborhood N of x such that N ∩ cl(V \W ) = ∅. But then N ⊆ V \ cl(V \W ) ⊆ W , and hence ω(W ) > 0, a contradiction.
Let X 1 , …, X n be the sets of positive measure in R. Set

X i := X i ∪ (W ∩ cl X i \(cl X 1 ∪ · · · ∪ cl X i−1 ))   for i = 1, …, n.

Since V = cl(V \W ) = cl X 1 ∪ · · · ∪ cl X n , P = {X 1 , . . . , X n } is an ω-partition of V ; moreover, since U ∩ W = ∅, P separates U . Now X i ⊆ cl X i , and so the variation of A in X × Y for X , Y ∈ P is at most ε/2, and hence less than ε.
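The combinatorial core of the construction, passing from a finite open cover C to the cells E S, can be illustrated on a finite ground set; the sketch below (toy cover, no topology or measure involved) checks that the nonempty cells partition the space and that each cell is contained in every cover set it meets.

```python
def cells_from_cover(cover):
    """For each subfamily S of the cover, form
    E_S = (intersection of S) minus (union of the rest);
    the nonempty E_S are pairwise disjoint and cover the ground set."""
    sets = [set(c) for c in cover]
    cells = []
    for mask in range(1, 1 << len(sets)):
        chosen = [s for i, s in enumerate(sets) if mask >> i & 1]
        rest = [s for i, s in enumerate(sets) if not mask >> i & 1]
        E = set.intersection(*chosen) - set().union(*rest)
        if E:
            cells.append(E)
    return cells

universe = {1, 2, 3, 4, 5, 6}
cover = [{1, 2, 3}, {3, 4}, {4, 5, 6}]
cells = cells_from_cover(cover)
# The cells partition the universe ...
assert set().union(*cells) == universe
assert sum(len(E) for E in cells) == len(universe)
# ... and a cell meeting a cover set is contained in it.
assert all(E <= S for E in cells for S in map(set, cover) if E & S)
```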
The existence of ω-partitions over which A has small variation allows us to approximate a continuous kernel by its averages.

Theorem 4.5 If a continuous kernel A : V × V → R varies less than ε over an ω-partition P, then |A(x, y) − (A * P)(x, y)| < ε for all x, y ∈ V .

Proof Take x, y ∈ V and say x ∈ X , y ∈ Y for some X , Y ∈ P. Then

(A * P)(x, y) = (ω(X )ω(Y ))^{−1} ∫ X ∫ Y A(x′, y′) dω(y′)dω(x′) < A(x, y) + ε,

since A varies less than ε over X × Y . Similarly, (A * P)(x, y) > A(x, y) − ε, and the theorem follows.

Corollary 4.6 If a continuous kernel A : V × V → R varies less than ε over an ω-partition P, then ‖A − A * P‖ < ε.

Proof Using Theorem 4.5 we get

‖A − A * P‖^2 = ∫ V ∫ V (A(x, y) − (A * P)(x, y))^2 dω(y)dω(x) < ε^2 ω(V )^2 = ε^2 ,

as desired.
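The approximation guarantee can be illustrated numerically; the sketch below (on V = [0, 1] with a smooth kernel chosen arbitrarily, sampling on a grid) confirms that averaging over a partition into k equal intervals changes the sampled values by less than a per-cell variation bound of 2/k.

```python
import math

def avg_on_grid(A, k, m):
    """Sample the kernel A on an (m*k) x (m*k) grid over [0,1]^2 and average
    it over the partition of [0,1] into k equal intervals (m sample points
    per interval).  Returns the largest deviation |A - A*P| over the grid."""
    N = m * k
    pts = [(i + 0.5) / N for i in range(N)]
    vals = [[A(x, y) for y in pts] for x in pts]
    err = 0.0
    for I in range(k):
        for J in range(k):
            cell = [vals[i][j]
                    for i in range(I * m, (I + 1) * m)
                    for j in range(J * m, (J + 1) * m)]
            mean = sum(cell) / len(cell)
            err = max(err, max(abs(v - mean) for v in cell))
    return err

# A is 1-Lipschitz in x + y, so its variation over each cell is below 2/k,
# and Theorem 4.5 predicts |A - A*P| < 2/k everywhere.
A = lambda x, y: math.cos(x + y)
assert avg_on_grid(A, 20, 5) < 2.0 / 20
```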
Similarly, if a positive and continuous kernel A varies less than ε over an ω-partition P, then |tr A − tr A * P| < ε. Indeed, since A is positive and continuous, Mercer's theorem implies that the trace of A is the integral over the diagonal. Since A * P is a finite-rank step kernel, its trace is also the integral over the diagonal. Then, using Theorem 4.5,

|tr A − tr A * P| ≤ ∫ V |A(x, x) − (A * P)(x, x)| dω(x) < ε.

A characterization of membership through finite submatrices, well known for completely positive and copositive matrices (cf. [4]), holds for C(V ) and its dual as well; see also Lemma 2.1 of Dobre et al. [12].
Theorem 4.7 A continuous kernel A : V × V → R belongs to C(V ) if and only if for every finite set U ⊆ V the matrix (A(x, y)) x,y∈U belongs to C(U ), where we consider for U the discrete topology and the counting measure. Likewise, a continuous Z : V × V → R belongs to C*(V ) if and only if for every finite set U ⊆ V the matrix (Z (x, y)) x,y∈U belongs to C*(U ).

Proof Take A ∈ C(V ) and let U ⊆ V be finite. For n ≥ 1, let P n be an ω-partition that separates U and over which A varies less than 1/n, as given by Theorem 4.4. Since A * P n ∈ C(P n ) and P n separates U , the matrix ((A * P n )(x, y)) x,y∈U is a principal submatrix of a completely positive matrix, and hence belongs to C(U ). By Theorem 4.5 this matrix converges to (A(x, y)) x,y∈U as n → ∞, and since C(U ) is closed, (A(x, y)) x,y∈U ∈ C(U ).
For the converse, suppose A ∉ C(V ); we find a finite U ⊆ V such that (A(x, y)) x,y∈U ∉ C(U ). If A is not symmetric, we are done, since then some 2 × 2 submatrix of A is not symmetric, and hence not completely positive. So assume A is symmetric and let Z ∈ C*(V ) be such that ⟨A, Z ⟩ = δ < 0. Corollary 4.6 together with the Cauchy-Schwarz inequality implies that, if A varies less than ε over an ω-partition P, then |⟨A, Z ⟩ − ⟨A * P, Z ⟩| < ε‖Z ‖. So, for all small enough ε, if A varies less than ε over the ω-partition P, then

⟨A * P, Z * P⟩ = ⟨A * P, Z ⟩ ≤ δ/2 < 0.   (9)

Now notice that, if P is an ω-partition, then ‖Z * P‖ 1 ≤ ‖Z ‖ 1 . So, letting U be a set of representatives of the parts of P and writing X x for the part containing x, Theorem 4.5 gives

|⟨A * P, Z * P⟩ − Σ x,y∈U A(x, y) ω(X x )ω(X y )(Z * P)(X x , X y )| ≤ ε‖Z * P‖ 1 ≤ ε‖Z ‖ 1 .

Together with (9) this gives, for all small enough ε,

Σ x,y∈U A(x, y) ω(X x )ω(X y )(Z * P)(X x , X y ) < 0.

Since U is a set of representatives of the parts of P, the matrix (ω(X x )ω(X y )(Z * P)(X x , X y )) x,y∈U is copositive, and hence (A(x, y)) x,y∈U ∉ C(U ), as we wanted. The analogous result for C*(V ) can be similarly proved.
Using Theorem 4.7, we can rewrite problem ϑ(G, C(V )) (see (6)) by replacing the constraint "A ∈ C(V )" by infinitely many constraints on finite subkernels of A.

The tip of the cone of completely positive kernels
A base of a cone K is a set B ⊆ K that does not contain the origin and is such that for every nonzero x ∈ K there is a unique α > 0 for which α −1 x ∈ B. Cones with compact and convex bases have many pleasant properties that are particularly useful to the theory of conic programming [3, Chapter IV].
It is not in general clear whether C(V ) has a compact and convex base; the following subset of C(V ) - its tip - will however be just as useful in the coming developments:

T (V ) = cch{ f ⊗ f * : f ∈ L 2 (V ), f ≥ 0, and ‖ f ‖ ≤ 1 },

where cch X is the closure of the convex hull of X . Notice the closure is the same whether taken in the norm or the weak topology.
The set T (V ) is a closed and convex subset of the closed unit ball in L 2 (V × V ), and hence by Alaoglu's theorem [16, Theorem 5.18] it is weakly compact. If L 2 (V × V ) is separable, then the weak topology on the closed unit ball of L 2 (V × V ), and hence the weak topology on T (V ), is metrizable [16, p. 171, Exercise 50].
The tip displays a key property of a base, at least for continuous kernels.

Theorem 4.8 If A ∈ C(V ) is nonzero and continuous, then (tr A) −1 A ∈ T (V ).
Proof For n ≥ 1, let P n be an ω-partition over which A varies less than 1/n. For each n ≥ 1, use Corollary 4.3 to write A ∗ P n as a sum of kernels α mn f mn ⊗ f mn * for m = 1, …, r n , where α mn ≥ 0, f mn ≥ 0, and ‖ f mn ‖ = 1. The kernel A is in C(V ) and hence positive, so using Corollary 4.6 we have A ∗ P n → A in the norm topology. Now tr A ∗ P n = Σ m α mn > 0 for all large enough n, and then (tr A ∗ P n ) −1 A ∗ P n ∈ T (V ) for all large enough n, proving the theorem.
Finally, we also know what the extreme points of T (V ) look like.
Proof Since T (V ) is weakly compact and convex and since the weak topology is locally convex, it will follow from Milman's theorem [43, Theorem 9.4] that all extreme points of T (V ) are contained in B, provided B is weakly closed. So let ( f α ⊗ f α * ) be a net in B converging weakly to some A. The net ( f α ) lies in the closed unit ball, which is weakly compact, and hence it has a weakly converging subnet. So we may assume that the net ( f α ) is itself weakly converging; let f be its limit. Immediately f is nonnegative with ‖ f ‖ ≤ 1. Let S be a complete orthonormal system of L 2 (V ) and, for ε > 0, let G ε be a finite-rank approximation within ε for every g, h ∈ L 2 (V ) with ‖g‖ = ‖h‖ ≤ 1. Since f is the weak limit of ( f α ), for g, h ∈ L 2 (V ) we have ⟨g ⊗ h * , f α ⊗ f α * ⟩ → ⟨g ⊗ h * , f ⊗ f * ⟩. Now, G ε has finite rank for every ε > 0, so we must have A = f ⊗ f * and, together with (11), it follows that B is weakly closed. Now we only have to argue that every point of B is an extreme point of T (V ). First, 0 is clearly not a convex combination of nonzero points, and hence it is an extreme point. Moreover, if ‖ f ‖ = 1, then ‖ f ⊗ f * ‖ = 1. Now, by the Cauchy–Schwarz inequality, it is impossible for a vector of norm 1 in L 2 to be a nontrivial convex combination of other vectors of norm 1, so f ⊗ f * is an extreme point.

When is the completely positive formulation exact?
Throughout this section, the Haar measure on a compact group will always be normalized so the group has total measure 1.
When is ϑ(G, C(V )) = α ω (G)? When G is a finite graph and ω is the counting measure, equality holds, as we saw in the introduction. In the finite case, actually, equality holds irrespective of the measure. In this section, we will see some sufficient conditions on G and ω under which ϑ(G, C(V )) = α ω (G); these conditions will be satisfied by the main examples of infinite graphs considered here.
Let G = (V , E) be a topological graph. An automorphism of G is a homeomorphism σ : V → V such that (x, y) ∈ E if and only if (σ x, σ y) ∈ E. Denote by Aut(G) the set of all automorphisms of G, which is a group under function composition.
Say V is a set and Γ a group that acts on V . We say that Γ acts continuously on V if (i) for every σ ∈ Γ, the map x → σ x from V to V is continuous and (ii) for every x ∈ V , the map σ → σ x from Γ to V is continuous.

We say that Γ acts transitively on V if for every x, y ∈ V there is σ ∈ Γ such that σ x = y.
Assume that Γ is compact and that it acts continuously and transitively on V and let μ be its Haar measure. Fix x ∈ V and consider the function p : Γ → V such that p(σ ) = σ x. The pushforward of μ along p is a Borel measure; moreover, since Γ acts transitively and since μ is invariant, it is independent of the choice of x. The pushforward is also invariant under the action of Γ, that is, if X ⊆ V is measurable and σ ∈ Γ, then X and σ X have the same measure.

Let V be a metric space with metric d and ω be a Borel measure on V such that every open set has positive measure. A point x in a measurable set S ⊆ V is a density point of S if ω(S ∩ B(x, δ))/ω(B(x, δ)) → 1 as δ → 0, where B(x, δ) is the ball of radius δ centered at x. We say that the metric d is a density metric for ω if for every measurable set S ⊆ V the set of all density points of S has the same measure as S, that is, almost all points of S are density points. For example, Lebesgue's density theorem states that the Euclidean metric on R n is a density metric for the Lebesgue measure.
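As a small numerical illustration of the notion of density point (not taken from the paper; the set S = [0, 1] ⊆ R and the sampling radius are assumptions chosen for this sketch), one can approximate the ratio ω(S ∩ B(x, δ))/ω(B(x, δ)) for the Lebesgue measure on the line:

```python
# Toy illustration of density points for S = [0, 1] ⊆ R with Lebesgue measure.
# (The interval and the radius 1e-6 are assumptions for this sketch.)

def density(x, delta, a=0.0, b=1.0):
    """Fraction of the ball B(x, delta) = [x - delta, x + delta] covered by [a, b]."""
    lo, hi = x - delta, x + delta
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (hi - lo)

# An interior point of [0, 1] is a density point: the ratio tends to 1.
assert abs(density(0.5, 1e-6) - 1.0) < 1e-9
# The boundary point 0 is not: the ratio tends to 1/2.
assert abs(density(0.0, 1e-6) - 0.5) < 1e-9
# A point outside [0, 1] has ratio tending to 0.
assert abs(density(2.0, 1e-6)) < 1e-9
```

Only the boundary points 0 and 1 fail to be density points here, a set of measure zero, in agreement with Lebesgue's density theorem.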
We now come to the main theorem of the paper.
Theorem 5.1 Let G = (V , E) be a locally independent graph where V is a compact Hausdorff space, Γ ⊆ Aut(G) be a compact group that acts continuously and transitively on V , and ω be a multiple of the pushforward of the Haar measure on Γ. If Γ is metrizable via a bi-invariant density metric for the Haar measure, then ϑ(G, C(V )) = α ω (G).

Here, a bi-invariant metric on Γ is a metric d such that d(σ τ, σρ) = d(τ, ρ) = d(τ σ, ρσ ) for all σ , τ , ρ ∈ Γ.
for every angle θ > 0. Indeed, G(S n−1 , {θ }) is a locally independent graph. For Γ we take the orthogonal group O(n); this group acts continuously and transitively on S n−1 and the surface measure on the sphere is a multiple of the pushforward of the Haar measure [29, Theorem 3.7]. The metric on O(n) ⊆ R n×n inherited from the Euclidean metric is bi-invariant and is moreover a density metric since O(n) is a Riemannian manifold [15]. More generally, any compact Lie group is metrizable via a bi-invariant metric [31, Corollary 1.4].
In the proof of the theorem, the symmetry provided by the group Γ is used to reduce the problem to an equivalent problem on a graph over Γ, a Cayley graph.

Cayley graphs
Let Γ be a topological group with identity 1 and Σ ⊆ Γ be such that 1 ∉ Σ and Σ −1 = { σ −1 : σ ∈ Σ } = Σ. Consider the graph whose vertex set is Γ and in which σ , τ ∈ Γ are adjacent if and only if σ −1 τ ∈ Σ (which happens, since Σ −1 = Σ, if and only if τ −1 σ ∈ Σ). This is the Cayley graph over Γ with connection set Σ; it is denoted by Cayley(Γ, Σ). Note that Γ acts on itself continuously and transitively and that left multiplication by an element of Γ is an automorphism of the Cayley graph.
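A minimal finite sketch of this definition may help fix ideas (the cyclic group Z 12 , written additively, standing in for Γ, and the connection set {±1} are assumptions chosen for illustration):

```python
# Cayley graph sketch over the cyclic group Z_12 (written additively),
# with connection set Sigma = {1, 11} = {±1 mod 12}. Both choices are
# illustrative assumptions, not taken from the paper.

n = 12
sigma_set = {1, n - 1}
assert 0 not in sigma_set                              # 1 ∉ Σ (no loops)
assert {(-s) % n for s in sigma_set} == sigma_set      # Σ^{-1} = Σ

def adjacent(s, t):
    # s, t adjacent iff s^{-1} t (here: t - s) lies in Σ
    return (t - s) % n in sigma_set

# Adjacency is symmetric precisely because Σ^{-1} = Σ.
assert all(adjacent(s, t) == adjacent(t, s)
           for s in range(n) for t in range(n))

# Left multiplication (here: translation) by any group element is an automorphism.
g = 5
assert all(adjacent(s, t) == adjacent((g + s) % n, (g + t) % n)
           for s in range(n) for t in range(n))
```

The resulting graph is the 12-cycle; the same recipe works for any finite group once a symmetric, identity-free connection set is fixed.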
We will use the following construction to relate a vertex-transitive graph to a Cayley graph over any transitive subgroup of its automorphism group. Let G = (V , E) be a topological graph and Γ ⊆ Aut(G) be a group that acts transitively on V . Fix x 0 ∈ V and set Σ G,x 0 = { σ ∈ Γ : x 0 and σ x 0 are adjacent in G }.

Lemma 5.2 If G = (V , E) is a locally independent graph and if Γ ⊆ Aut(G) is a topological group that acts continuously and transitively on V , then Cayley(Γ, Σ G,x 0 ) is locally independent for all x 0 ∈ V . If moreover ω is a multiple of the pushforward of the Haar measure μ on Γ, then for every M ≥ 0 the graph G has a measurable independent set of measure at least M if and only if Cayley(Γ, Σ G,x 0 ) has a measurable independent set of measure at least M/ω(V ); in particular, α μ (Cayley(Γ, Σ G,x 0 )) = α ω (G)/ω(V ).

Proof Let us first prove the second statement of the lemma. By normalizing ω if necessary, we may assume that ω(V ) = 1. Then ω is the pushforward of μ, and (i) implies directly that if I ⊆ V is a measurable independent set, then p −1 (I ) ⊆ Γ is a measurable independent set with μ( p −1 (I )) = ω(I ).
Now suppose I ⊆ Γ is a measurable independent set. The Haar measure is inner regular, meaning that we can take a sequence C 1 , C 2 , … of compact subsets of I such that μ(I \ C n ) < 1/n. Let C be the union of all C n . Since C ⊆ I , we have that C, and hence p(C), are both independent sets. Since C n is compact, p(C n ) is also compact and hence measurable. But then p(C) is measurable and ω( p(C)) = μ( p −1 ( p(C))) ≥ μ(C) = μ(I ), since μ(I \ C n ) < 1/n for all n. As for the first statement of the lemma, suppose G is locally independent and let I ⊆ Γ be a compact independent set. The function p is continuous and hence p(I ) ⊆ V is compact. Since G is locally independent and p(I ) is independent, there is an open independent set S in G that contains p(I ). But then p −1 (S) is an open independent set in Cayley(Γ, Σ G,x 0 ) that contains I , and thus the Cayley graph is locally independent.
The theta parameters of G and any corresponding Cayley graph are also related: if Γ ⊆ Aut(G) is a compact group that acts continuously and transitively on V , and if ω is a multiple of the pushforward of the Haar measure μ on Γ, then ϑ(G, C(V )) = ω(V ) ϑ(Cayley(Γ, Σ G,x 0 ), C(Γ)) for all x 0 ∈ V .
In fact, there is nothing special about the cone C(V ) in the above statement; the statement holds for any cone invariant under the action of Γ, for example the cone of positive kernels.
Proof We may assume that ω(V ) = 1. Fix x 0 ∈ V and let Φ be the map that sends a kernel A on V to the kernel Φ(A) on Γ given by Φ(A)(σ, τ ) = A(σ x 0 , τ x 0 ). Now, the right-hand side above is independent of x 0 . For if x 0 ′ is another choice of base point, then since Γ acts transitively on V there is τ ∈ Γ such that x 0 ′ = τ x 0 , and then the right invariance of the Haar measure gives the same value. The measure ω is the pushforward of μ, so it is invariant under the action of Γ and ω(V ) = 1. Continuing (12) we get the desired identity. Note A is the limit, in the norm topology, of a sequence (A n ), where each A n is a finite sum of kernels of the form f ⊗ f * with f ∈ L 2 (V ) nonnegative. Since Φ is linear and continuous, Φ(A) is the limit of (Φ(A n )), and hence Φ(A) ∈ C(Γ), proving the claim.
Finally, ⟨J, Φ(A)⟩ = ⟨Φ(J ), Φ(A)⟩ = ⟨J, A⟩, and since A is any feasible solution of ϑ(G, C(V )), the theorem follows.

The Reynolds operator
Let V be a compact Hausdorff space, let Γ be a compact group that acts continuously and transitively on V , and consider on V a multiple of the pushforward of the Haar measure μ on Γ. An important tool in the proof of Theorem 5.1 will be the Reynolds operator R : L 2 (V × V ) → L 2 (V × V ), given by R(A)(x, y) = ∫ Γ A(σ x, σ y) dμ(σ ). The operator is defined given a group that acts on V ; the group and its action will always be clear from context. Since Γ is compact and therefore the Haar measure is both left and right invariant, the Reynolds operator is self-adjoint, that is, ⟨R(A), B⟩ = ⟨A, R(B)⟩ for all A, B ∈ L 2 (V × V ).
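A finite analogue of the Reynolds operator may help: taking Γ = Z n acting on V = Z n by translation and the uniform measure in place of the Haar measure (assumptions for this sketch), the operator becomes the average R(A)(x, y) = |Γ| −1 Σ σ A(σ + x, σ + y), and its key properties (invariance of the output, idempotence, self-adjointness) can be checked exactly:

```python
# Finite-group sketch of the Reynolds operator: Gamma = Z_n acting on V = Z_n
# by translation, with uniform measure. The group, the kernels A and B, and
# n = 4 are illustrative assumptions. Fractions keep the checks exact.

from fractions import Fraction

n = 4
A = [[Fraction(3 * i + j, 2) for j in range(n)] for i in range(n)]
B = [[Fraction(i * j, 3) for j in range(n)] for i in range(n)]

def reynolds(M):
    # R(M)(x, y) = (1/|Gamma|) * sum over shifts s of M(s + x, s + y)
    return [[sum(M[(s + x) % n][(s + y) % n] for s in range(n)) / n
             for y in range(n)] for x in range(n)]

def inner(M, N):
    return sum(M[i][j] * N[i][j] for i in range(n) for j in range(n))

RA = reynolds(A)
# The averaged kernel is invariant under the diagonal action of Gamma ...
assert all(RA[x][y] == RA[(x + 1) % n][(y + 1) % n]
           for x in range(n) for y in range(n))
# ... averaging an already-invariant kernel changes nothing (R is a projection) ...
assert reynolds(RA) == RA
# ... and R is self-adjoint: <R(A), B> = <A, R(B)>.
assert inner(reynolds(A), B) == inner(A, reynolds(B))
```

The self-adjointness check mirrors the bi-invariance argument in the text: averaging over σ is the same as averaging over σ −1 .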

Lemma 5.4 If V is a compact space, if Γ is a compact group that acts continuously and transitively on V , and if V is metrizable via a Γ-invariant metric, then R(A) is continuous for every continuous kernel A : V × V → R.
Proof Equip V × V with a metric inducing the product topology, say the sum of the distances between corresponding coordinates. Now A is continuous, and hence uniformly continuous on the compact metric space V × V . So, given ε > 0, if δ > 0 is such that (13) holds, then d((x, y), (x ′ , y ′ )) < δ implies that |R(A)(x, y) − R(A)(x ′ , y ′ )| ≤ ε, proving that R(A) is continuous.
Proof By normalizing ω if necessary, we may assume that ω(V ) = 1, so that (14) holds, where (·, ·) denotes the usual L 2 inner product in the respective spaces; this implies in particular that φ, ψ ∈ L 2 (Γ). To see (14) note that, since Γ acts transitively, for every x, x ′ ∈ V there is τ ∈ Γ such that x ′ = τ x; then the invariance of the Haar measure together with the invariance of ω under the action of Γ gives (14), as we wanted. Assume without loss of generality that ‖ f ‖ ≤ 1. Continuous functions are dense in L 2 (V ), so given ε > 0 there is a continuous function g such that ‖ f − g‖ < ε.
Since ‖ f ‖ ≤ 1, and hence ‖g‖ ≤ 1 + ε, the Cauchy–Schwarz inequality together with (14) implies that the right-hand side above is less than a constant multiple of ε for all x, y ∈ V . Now g ⊗ g * is continuous, so Lemma 5.4 says that R(g ⊗ g * ) is continuous. With the above inequality, this implies that R( f ⊗ f * ) is the uniform limit of continuous functions, and hence continuous.

Proof of Theorem 5.1
Under the hypotheses of Theorem 5.1, we must establish the identity ϑ(G, C(V )) = α ω (G). The '≥' inequality follows from Theorem 3.1; for the reverse inequality we use the following lemma. So fix a connection set Σ ⊆ Γ and suppose Cayley(Γ, Σ) is locally independent. Throughout the rest of the proof, E will be the edge set of Cayley(Γ, Σ). It is immediate that considering the closure of the edge set does not change the optimal value; together with Theorem 2.3, this implies that we may assume that E is closed.
Notice that Γ is a Hausdorff space (topological groups are Hausdorff spaces by definition) and that μ is an inner-regular Borel measure (because it is a Haar measure) that is positive on open sets (indeed, if S ⊆ Γ is open, then { σ S : σ ∈ Γ } is an open cover of Γ; since Γ is compact, there is a finite subcover, hence μ(S) > 0, or else we would have μ(Γ) = 0). So we can use the results of Sect. 4.
There is a countable set E ′ ⊆ E such that cl E ′ = E. Indeed, since E is closed and hence compact, for every n ≥ 1 we can cover E with finitely many open balls of radius 1/n; now choose one point of E in each such ball and let E ′ be the set of all points chosen for n = 1, 2, ….
Let (σ 1 , τ 1 ), (σ 2 , τ 2 ), … be an enumeration of E ′ . For n ≥ 1 consider the kernel T n . This is indeed a kernel: the norm of each summand is 2 −i times a constant that depends only on n, so T n is square integrable.
If A : Γ × Γ → R is continuous, and hence uniformly continuous, then for every ε > 0 there is n 0 such that (15) holds for all n ≥ n 0 . Let A be a feasible solution of ϑ(Cayley(Γ, Σ), C(Γ)). Since tr A = 1, Theorem 4.8 tells us that A ∈ T (Γ), where T (Γ) is the tip of C(Γ); see Sect. 4.3. Also from Sect. 4.3 we know that T (Γ) is weakly compact, that it is a subset of L 2 (Γ × Γ), whose weak topology is locally convex, and that the weak topology on T (Γ) is metrizable. So we can apply Choquet's theorem [43, Theorem 10.7] to get a probability measure ν on T (Γ) with barycenter A and ν(X ) = 1, where X is the set of extreme points of T (Γ). From Theorem 4.9 we know that any element of X is of the form f ⊗ f * for some nonnegative f ∈ L 2 (Γ) that is either 0 or such that ‖ f ‖ = 1. So A being the barycenter of ν means that for every K ∈ L 2 sym (Γ × Γ) the pairing ⟨K , A⟩ is the ν-average of ⟨K , f ⊗ f * ⟩; this is (16). Since A is feasible, its symmetrization R(A) is also feasible, and in particular R(A)(σ, τ ) = 0 for all (σ, τ ) ∈ E. (Note that here we need to use Lemma 5.4, and for that we need the left invariance of the metric on Γ.) This, together with (15), (16), and the self-adjointness of the Reynolds operator, controls the ν-average of the pairings with the T n . Fatou's lemma now says that we can exchange the integral with the limit (that becomes a lim inf). So, since T n and all the functions f above are nonnegative, the set of f ⊗ f * with lim inf n ⟨T n , f ⊗ f * ⟩ > 0 has measure 0 with respect to ν.
Taking K = J in (16), we see that we can choose an extreme point f ⊗ f * with ⟨J, f ⊗ f * ⟩ ≥ ⟨J, A⟩ and lim inf n ⟨T n , f ⊗ f * ⟩ = 0; moreover R( f ⊗ f * ) is continuous, and hence from (15) we see that f vanishes, up to measure zero, on pairs of adjacent points. We are now almost done. Let I be the set of density points in the support of f (note that f ∈ L 2 (Γ), so its support is not clearly defined; here it suffices to take, however, an arbitrary representative of the equivalence class of f and then its support). Claim: I is independent. Indeed, suppose σ , τ ∈ I are adjacent. Since σ , τ ∈ I are density points, there is δ > 0 such that (17) holds. For ζ ∈ Γ, write N ζ = { γ ∈ Γ : γ ζ ∈ I }; note that I = N ζ ζ . The right invariance of the metric on Γ implies that B(ζ, δ) = B(1, δ)ζ for all ζ ∈ Γ and δ > 0. Then, using (17) and the invariance of μ, both N σ ∩ B(1, δ) and N τ ∩ B(1, δ) have measure greater than μ(B(1, δ))/2, and hence they intersect, contradicting the vanishing of f on pairs of adjacent points,
proving the claim.
So I is independent; it remains to estimate its measure. Recall I has the same measure as the support of f . Since ‖ f ‖ = 1, if χ is the constant 1 function, then the Cauchy–Schwarz inequality gives ⟨J, f ⊗ f * ⟩ = (χ , f ) 2 ≤ μ(supp f )‖ f ‖ 2 = μ(I ), proving the lemma.
Notice that, if ϑ(G, C(V )) has an optimal solution, then Lemma 5.6 implies that the measurable independence number is attained, that is, there is a measurable independent set I with ω(I ) = α ω (G). This is the case, for instance, of the distance graph G = G(S n−1 , {θ }) for n ≥ 3. In this case, a convergence argument, akin to the one we will use in Sect. 10.2, can be used to show that ϑ(G, C(V )) has an optimal solution. This provides another proof of a result of DeCorte and Pikhurko [9].

Distance graphs on the Euclidean space
Theorem 5.1 applies only to graphs on compact spaces, but thanks to a limit argument it can be extended to some graphs on R n ; we will see now how to make this extension for distance graphs.
Let D ⊆ (0, ∞) be a set of forbidden distances and consider the D-distance graph G(R n , D), in which two vertices x, y ∈ R n are adjacent if ‖x − y‖ ∈ D. To measure the size of an independent set in G(R n , D) we use the upper density. Given a Lebesgue-measurable set X ⊆ R n , its upper density is δ̄(X ) = lim sup T →∞ vol(X ∩ [−T, T ] n )/vol [−T, T ] n , where vol is the Lebesgue measure. The independence density of G(R n , D) is αδ(G(R n , D)) = sup{ δ̄(I ) : I ⊆ R n is Lebesgue-measurable and independent }.
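To make the notion of upper density concrete, here is a one-dimensional numerical sketch (the set I = ⋃ k [k, k + 1/3], periodic with periodicity lattice Z, is an assumption chosen for illustration): the ratio vol(I ∩ [−T, T ])/vol [−T, T ] approaches the density 1/3 as T grows.

```python
# Upper-density sketch for the periodic set I = union of [k, k + 1/3], k in Z.
# (The set and window sizes are illustrative assumptions.)

import math

def vol_in_window(T, c=1/3):
    """Lebesgue measure of I ∩ [−T, T] for I = ⋃_k [k, k + c]."""
    total = 0.0
    k = math.floor(-T)
    while k <= T:
        lo, hi = max(k, -T), min(k + c, T)
        total += max(0.0, hi - lo)
        k += 1
    return total

for T in (10, 100, 1000):
    ratio = vol_in_window(T) / (2 * T)
    assert abs(ratio - 1/3) < 1 / T   # the windowed ratio converges to 1/3
```

For periodic sets the lim sup is an honest limit; for general measurable sets only the lim sup need exist, which is why the definition uses it.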

Periodic sets and limits of tori
The key idea is to consider independent sets that are periodic. A set X ⊆ R n is periodic if there is a lattice Λ ⊆ R n whose action leaves X invariant, that is, X + v = X for all v ∈ Λ; in this case we say that Λ is a periodicity lattice of X . Given a lattice Λ ⊆ R n spanned by vectors u 1 , …, u n , its (strict) fundamental domain with respect to u 1 , …, u n is the set F = { α 1 u 1 + · · · + α n u n : α i ∈ [−1/2, 1/2) for all i }.
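The identification of points with representatives in the fundamental domain can be sketched in a few lines (the lattice LZ n with L = 2 and the sample points are assumptions for this illustration); for LZ n the representative is computed coordinatewise:

```python
# Mapping x in R^n to its unique representative in the fundamental domain
# F = [−L/2, L/2)^n of the lattice L·Z^n. (L = 2 is an illustrative assumption.)

L = 2.0

def representative(x):
    # shift each coordinate by a lattice vector into [−L/2, L/2)
    return [((xi + L / 2) % L) - L / 2 for xi in x]

r = representative([1.7, -2.3])
assert all(-L / 2 <= ri < L / 2 for ri in r)                      # lands in F
assert all(abs(a - b) < 1e-9 for a, b in zip(r, [-0.3, -0.3]))    # the representative
for xi, ri in zip([1.7, -2.3], r):
    k = (xi - ri) / L
    assert abs(k - round(k)) < 1e-9   # x − representative(x) is a lattice vector
```

For a general lattice one would first express x in the basis u 1 , …, u n and then reduce each coefficient into [−1/2, 1/2).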
A periodic set with periodicity lattice Λ repeats itself in copies of F translated by vectors in Λ. We identify the torus R n /Λ with the fundamental domain F of Λ, identifying a coset S with the unique x ∈ F such that S = x + Λ. When speaking of an element x ∈ R n /Λ, it is always implicit that x is the unique representative of x + Λ that lies in the fundamental domain. Given a lattice Λ ⊆ R n , consider the graph G(R n /Λ, D) whose vertex set is the torus R n /Λ and in which vertices x, y ∈ R n /Λ are adjacent if there is v ∈ Λ such that ‖x − y + v‖ ∈ D. Independent sets in G(R n /Λ, D) correspond to periodic independent sets in G(R n , D) with periodicity lattice Λ and vice versa. The hypothesis that D is bounded is essential: for instance, if D = (1, ∞), then for every L > 0, any x ∈ R n /LZ n would be adjacent to itself. When D is unbounded, however, a theorem of Furstenberg et al. [17] implies that αδ(G(R n , D)) = 0, so this case is not really interesting.
Though the lemma is stated in terms of the lattice LZ n , a similar statement holds for any lattice Λ, as long as its shortest nonzero vectors have length greater than 2 sup D. The lattice LZ n is chosen here for concreteness and also because it is the lattice that will be used later on.
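The adjacency condition on the torus can be sketched numerically in dimension one (the values L = 3 and D = {1.0}, with L > 2 sup D, are assumptions chosen for illustration); since only finitely many lattice shifts matter, the shifts −L, 0, L suffice for points in the fundamental domain:

```python
# Adjacency on the torus R/(L·Z): x, y adjacent iff |x − y + v| lies in D
# for some lattice vector v. L = 3 and D = {1.0} are illustrative assumptions.

L, D = 3.0, {1.0}

def torus_dist(x, y):
    # for points of the fundamental domain [−L/2, L/2), shifts −L, 0, L suffice
    return min(abs(x - y + v) for v in (-L, 0.0, L))

def adjacent(x, y):
    return any(abs(x - y + v) in D for v in (-L, 0.0, L))

assert adjacent(0.0, 1.0)              # distance 1 inside the fundamental domain
assert adjacent(-1.25, 0.75)           # wraps around: -1.25 - 0.75 + 3 = 1
assert not adjacent(0.0, 0.5)          # distance 1/2 is not forbidden
assert torus_dist(-1.25, 0.75) == 1.0  # adjacency agrees with the torus metric
```

Because L > 2 sup D, the graph is loopless and adjacency coincides with the torus metric taking a value in D, as the lemma asserts.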

Proof
The torus R n /LZ n is a metric space, for instance with the metric d(x, y) = inf{ ‖x − y + v‖ : v ∈ LZ n } (18) for x, y ∈ R n /LZ n . If x, y lie in the fundamental domain with respect to the canonical basis vectors, then ‖x − y‖ ∞ < L and ‖x − y‖ < Ln 1/2 . So if ‖v‖ ∞ ≥ L + Ln 1/2 , then ‖x − y + v‖ ≥ ‖x − y + v‖ ∞ ≥ ‖v‖ ∞ − ‖x − y‖ ∞ > Ln 1/2 > ‖x − y‖, and such a v cannot attain the infimum. This shows that the infimum above is attained by one of the finitely many vectors v ∈ LZ n with ‖v‖ ∞ < L + Ln 1/2 . Let L > 2 sup D. Since any nonzero v ∈ LZ n is such that ‖v‖ ≥ L, the graph G = G(R n /LZ n , D) is loopless. We show that x, y ∈ R n /LZ n are adjacent in G if and only if d(x, y) ∈ D, so G is a distance graph. Since D is closed, this will moreover imply that the edge set of G is closed and then, since the torus is metrizable, from Theorem 2.2 it will follow that G is locally independent.
If d(x, y) ∈ D, then immediately we have that x, y are adjacent. So suppose that x, y are adjacent, that is, that there is v ∈ LZ n such that ‖x − y + v‖ ∈ D. Then ‖x − y + v‖ ≤ sup D < L/2, while any other w ∈ LZ n satisfies ‖x − y + w‖ ≥ ‖w − v‖ − ‖x − y + v‖ > L − L/2 = L/2, so v attains the infimum in (18) and d(x, y) = ‖x − y + v‖ ∈ D, proving the claim.
The independence numbers of the graphs G(R n /LZ n , D) are also related to the independence density of G(R n , D): if D is bounded, then αδ(G(R n , D)) = lim sup L→∞ α vol (G(R n /LZ n , D))/vol(R n /LZ n ), where vol denotes the Lebesgue measure.
It is well known that the densities of periodic sphere packings approximate the sphere-packing density arbitrarily well [7, Appendix A]. The proof of the lemma above is very similar to the proof of this fact.
Proof Any independent set in G(R n /LZ n , D) gives rise to a periodic independent set in G(R n , D), so the '≤' inequality is immediate. Let us then prove the reverse inequality.
If D = ∅, the statement is trivial. So assume D ≠ ∅, write r = sup D, and let I ⊆ R n be a measurable independent set. From the definition of upper density, for every ε > 0 there is a point p ∈ R n such that (19) holds for every large enough L. Now take L > 2r satisfying (19) and write X = I ∩ ( p + [−L/2 + r, L/2 − r ] n ); in words, X is obtained from I ∩ ( p + [−L/2, L/2] n ) by erasing a border of width r around the facets of the hypercube. Then consider the set I ′ = X + LZ n . The set I ′ is, by construction, periodic with periodicity lattice LZ n , measurable, and independent. If moreover we take L large enough compared to r , then the volume of the border that was erased is negligible compared to the volume of the hypercube, and so using (19) we can make sure that |δ̄(I ′ ) − δ̄(I )| < ε. Since I is an arbitrary measurable independent set, we just proved that for any ε > 0 and any L 0 ≥ 0 there is L ≥ L 0 such that α vol (G(R n /LZ n , D))/vol(R n /LZ n ) ≥ αδ(G(R n , D)) − ε, establishing the reverse inequality.

Some harmonic analysis
This is a good place to gather some notation and basic facts about harmonic analysis, which will be used next to extend Theorem 5.1 to G(R n , D); harmonic analysis will again be used in Sects. 9 and 10. For background, see e.g. the book by Reed and Simon [38]. In this section, functions are complex-valued unless stated otherwise.
A measurable and bounded function f : R n → C is of positive type if f (−x) = f (x)* for all x ∈ R n and if for every ρ ∈ L 1 (R n ) we have ∫ R n ∫ R n f (x − y)ρ(x)ρ(y)* dx dy ≥ 0. A continuous function f : R n → C is of positive type if and only if for every finite U ⊆ R n the matrix ( f (x − y)) x,y∈U is (Hermitian) positive semidefinite. This characterization shows that if f is a continuous function of positive type, then ‖ f ‖ ∞ = f (0), since for every x ∈ R n the matrix ( f (u − v)) u,v∈{0,x} is positive semidefinite and hence | f (x)| ≤ f (0). The set of all functions of positive type is a closed and convex cone, which we denote by PSD(R n ).
Bochner's theorem says that functions of positive type are exactly the Fourier transforms of finite measures: a continuous function f : R n → C is of positive type if and only if f (x) = ∫ R n e iu·x dν(u) (20) for some finite (positive) Borel measure ν, with the integral converging uniformly over R n . A continuous function of positive type f : R n → C has a well-defined mean value M( f ) = lim T →∞ vol([−T, T ] n ) −1 ∫ [−T,T ] n f (x) dx, and if ν is the measure in (20), then M( f ) = ν({0}). To see this last identity, for T > 0 and u ∈ R n , write g T (u) = vol([−T, T ] n ) −1 ∫ [−T,T ] n e iu·x dx. Let g : R n → R be the function such that g(0) = 1 and g(u) = 0 for all nonzero u ∈ R n . Then g is the pointwise limit of g T as T → ∞. Moreover, |g T (u)| ≤ 1 for all u, and the constant one function is integrable with respect to the measure ν, since ν is finite. So we may use Lebesgue's dominated convergence theorem, and together with (20) we get M( f ) = lim T →∞ ∫ R n g T (u) dν(u) = ν({0}).
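As a sanity check on M( f ) = ν({0}), consider the toy function f (x) = 2/5 + (3/5) cos x on R (an assumption for this sketch), the Fourier transform of the finite measure with atoms 2/5 at 0 and 3/10 at ±1; its averages over [−T, T ] converge to the atom at the origin:

```python
# Mean value of f(x) = 2/5 + (3/5)cos x, the transform of the measure
# (2/5)δ_0 + (3/10)δ_1 + (3/10)δ_{−1}. The function is an illustrative assumption.

import math

def average(T):
    # (1/(2T)) ∫_{−T}^{T} (2/5 + (3/5)·cos x) dx, in closed form: 2/5 + (3/5)·sin(T)/T
    return 0.4 + 0.6 * math.sin(T) / T

for T in (10.0, 100.0, 1000.0):
    assert abs(average(T) - 0.4) <= 0.6 / T   # the error decays like 1/T
```

The oscillatory part, coming from the atoms away from the origin, averages out at rate 1/T, leaving exactly ν({0}) = 2/5 in the limit.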
for all x ∈ R n and v ∈ Λ; in this case we say that Λ is a periodicity lattice of f . If f is periodic with periodicity lattice Λ, then its integrals over any two fundamental domains of Λ coincide. So we may equip L 2 (R n /Λ) with the inner product given by integration over the fundamental domain, normalized by its volume.
The functions x → e iu·x , where u ∈ 2π Λ* and Λ* = { u ∈ R n : u · v ∈ Z for all v ∈ Λ } is the dual lattice of Λ, form a complete orthogonal system of L 2 (R n /Λ). Given f ∈ L 2 (R n /Λ) and u ∈ 2π Λ*, the Fourier coefficient of f at u is f̂ (u) = vol(F) −1 ∫ F f (x)e −iu·x dx, where F is a fundamental domain of Λ. We then have f (x) = Σ u∈2π Λ* f̂ (u)e iu·x with convergence in L 2 norm, and from this follows Parseval's identity: if f , g ∈ L 2 (R n /Λ), then ( f, g) = Σ u∈2π Λ* f̂ (u) ĝ(u)*.
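These formulas can be checked numerically in dimension one (the lattice Λ = 2Z and the trigonometric polynomial below are assumptions for the sketch): equally spaced quadrature over one period recovers the Fourier coefficients of f (x) = 1 + cos(2πx/L) essentially exactly, and Parseval's identity gives ( f, f ) = 1 + 1/4 + 1/4.

```python
# Fourier coefficients on the torus R/(L·Z) for f(x) = 1 + cos(2πx/L).
# L = 2 and the test function are illustrative assumptions.

import cmath, math

L = 2.0

def f(x):
    return 1.0 + math.cos(2 * math.pi * x / L)

def fourier_coeff(m, steps=1024):
    # \hat f(u) = vol(F)^{-1} ∫_F f(x) e^{-iux} dx at u = 2πm/L,
    # midpoint rule on the fundamental domain F = [−L/2, L/2)
    h = L / steps
    total = 0j
    for k in range(steps):
        x = -L / 2 + (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * (2 * math.pi * m / L) * x)
    return total * h / L

assert abs(fourier_coeff(0) - 1.0) < 1e-9   # mean of f
assert abs(fourier_coeff(1) - 0.5) < 1e-9   # cos contributes 1/2 at m = ±1
assert abs(fourier_coeff(5)) < 1e-9         # no other frequencies present

# Parseval: (f, f) = Σ_u |\hat f(u)|^2 = 1 + 1/4 + 1/4
norm_sq = sum(abs(fourier_coeff(m)) ** 2 for m in range(-3, 4))
assert abs(norm_sq - 1.5) < 1e-9
```

Equally spaced quadrature is exact for trigonometric polynomials of degree below the number of nodes, which is why the tolerances can be taken this small.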

An exact completely positive formulation
Let D ⊆ (0, ∞) be a set of forbidden distances and K(R n ) ⊆ PSD(R n ) be a convex cone; consider the optimization problem of maximizing M( f ) over all f ∈ K(R n ) with f (0) = 1 and f (x) = 0 whenever ‖x‖ ∈ D. We denote both the problem above and its optimal value by ϑ(G(R n , D), K(R n )). Notice that, since K(R n ) ⊆ PSD(R n ), every f ∈ K(R n ) has a mean value, so the objective function is well defined.
Again, there are at least two cones that can be put in place of K(R n ). One is the cone PSD(R n ) of functions of positive type. The other is the cone C(R n ) of real-valued completely positive functions on R n , where the closure in its definition is taken in the L ∞ norm; note that C(R n ) is a cone contained in PSD(R n ).

Theorem 6.3 If D ⊆ (0, ∞) is closed, then ϑ(G(R n , D), C(R n )) = αδ(G(R n , D)).
Proof Write G = G(R n , D) for short. Since D is closed and does not contain 0, Theorem 2.2 implies that G is locally independent. Recall that, if D is unbounded, then a theorem of Furstenberg et al. [17] implies that αδ(G) = 0. In this case, one can show that ϑ(G, C(R n )) = 0; actually, ϑ(G, PSD(R n )) = 0, as shown by Oliveira and Vallentin [36, Theorem 5.1] (see also Sect. 10 below).
To prove the theorem we may therefore assume that D is bounded and nonempty. Write r = sup D, and for L > 2r write V L = R n /LZ n ; note V L is a compact Abelian group. Lemma 6.1 says that G L = G(V L , D) is locally independent. Since V L is metrizable via the bi-invariant metric (18), by taking V = = V L and letting ω be the Lebesgue measure on V L , the graph G L satisfies the hypotheses of Theorem 5.1, and so ϑ(G L , C(V L )) = α vol (G L ).

Lemma 6.2 then implies that lim sup L→∞ ϑ(G L , C(V L ))/vol V L = αδ(G). (22)
So to prove Theorem 6.3 it suffices to show that the limit above is equal to ϑ(G, C(R n )). The proof of this fact is a bit technical, but the main idea is simple; we prove the following two assertions: (A1) every feasible solution of ϑ(G L , C(V L )) gives a feasible solution of ϑ(G, C(R n )) with the same normalized objective value; (A2) every feasible solution of ϑ(G, C(R n )) gives, for all large L, feasible solutions of ϑ(G L , C(V L )) whose normalized objective values converge to its mean value. The first assertion establishes that the limit in (22) is ≤ ϑ(G, C(R n )); the second assertion establishes the reverse inequality.
To prove (A1), fix L > 2r and let A be a feasible solution of ϑ(G L , C(V L )). By applying the Reynolds operator to A if necessary, we may assume that A is invariant under the action of V L , that is, A(x + z, y + z) = A(x, y) for all x, y, z ∈ V L . Indeed, if A is feasible, then R(A) is also feasible, and to see this it suffices to show that R(A) is continuous, since the other constraints are easily seen to be satisfied. But the continuity of R(A) follows from Lemma 5.4, since V L is metrizable via the invariant metric (18).
Since A is invariant, there is a function g : V L → R such that A(x, y) = g(x − y) for all x, y ∈ V L . Then: (i) g is continuous; (ii) since L > 2r , if x ∈ R n is such that ‖x‖ ∈ D, then x lies in the fundamental domain of LZ n with respect to the canonical basis vectors, and so g(x) = A(0, x) = 0 since 0 and x are adjacent in G L ; (iii) since A ∈ C(V L ), using Theorem 4.7 we see that g ∈ C(R n ); (iv) since A is invariant, its diagonal is constant, and then since tr A = 1 we get g(0) = (vol V L ) −1 . This all implies that f = (vol V L )g is a feasible solution of ϑ(G, C(R n )); all that is left to do is to compute M( f ). Since g is periodic, its mean value is the integral of g on the fundamental domain F of the periodicity lattice divided by the volume of F, hence M( f ) = ⟨J, A⟩/vol V L .

To prove (A2), let f be a feasible solution of ϑ(G, C(R n )) and fix L > 2r . Let W L = [−L/2, L/2] n and consider the kernel H : W L × W L → R such that H (x, y) = f (x − y). Note H is continuous and, since f ∈ C(R n ), using Theorem 4.7 we see that H ∈ C(W L ).
Let W L ′ = [−L/2 + r, L/2 − r ] n and consider the kernel F : V L × V L → R that agrees with H on W L ′ × W L ′ and is 0 elsewhere. If x, y ∈ W L ′ are adjacent, then, since L > 2r , we must have ‖x − y‖ ∈ D, and so F(x, y) = H (x, y) = f (x − y) = 0. Now F is not continuous, but R(F) is; here is a proof. Since H is continuous and positive (recall H ∈ C(W L )), Mercer's theorem says that there are continuous functions φ i : W L → R with ‖φ i ‖ = 1 and numbers λ i ≥ 0 for i = 1, 2, … such that H (x, y) = Σ i λ i φ i (x)φ i (y), with absolute and uniform convergence over W L × W L . For i = 1, 2, … define the function ψ i : V L → R by setting ψ i equal to φ i on W L ′ and to 0 elsewhere. We show now that the series Σ i λ i R(ψ i ⊗ ψ i * ) converges absolutely and uniformly over V L × V L and, since R(ψ i ⊗ ψ i * ) is continuous by Lemma 5.5, this will imply that R(F) is continuous.
For u ∈ V L and ψ : V L → R, write ψ u for the function such that ψ u (x) = ψ(x + u). Then a direct computation establishes absolute convergence. For uniform convergence, note that, given ε > 0, the tail of the series can be bounded uniformly by ε, establishing uniform convergence and thus finishing the proof that R(F) is continuous. Now that we know that R(F) is continuous, we can show that R(F) ∈ C(V L ).

Indeed, since H is continuous and belongs to C(W L ), using Theorem 4.7 it is straightforward to show that, if U ⊆ V L is finite, then F[U ] ∈ C(U ) and hence also R(F)[U ] ∈ C(U ). But then, since R(F) is continuous, Theorem 4.7 implies that R(F) ∈ C(V L ).
So far we can conclude that A L = (tr R(F)) −1 R(F) is a feasible solution of ϑ(G L , C(V L )). To estimate ⟨J, A L ⟩ we use the following fact.

Lemma 6.4 If f : R n → C is continuous and of positive type, then lim L→∞ (vol W L ) −2 ∫ W L ∫ W L f (x − y) dx dy = M( f ). (23)

Proof The function g : R n × R n → C such that g(x, y) = f (x − y) is continuous and of positive type. Indeed, let ν be the measure given by Bochner's theorem such that (20) holds and consider the Borel measure μ on R n × R n obtained by pushing ν forward along the map u → (u, −u), so that μ(X ) = ν({ u : (u, −u) ∈ X }) for all measurable X ⊆ R n × R n . Then μ is a finite measure and so μ is the measure representing g. But then the left-hand side of (23) is the mean value of g, which equals μ({(0, 0)}) = ν({0}) = M( f ). Since r is fixed, vol W L ′ /vol V L → 1 as L → ∞. So using the lemma above we get lim L→∞ ⟨J, A L ⟩/vol V L = M( f ), finishing the proof of (A2). Here, the second identity follows from the definition of A L and the self-adjointness of the Reynolds operator.

The Boolean-quadratic cone and polytope
As was said in Sect. 1, one can use valid inequalities for C(V ) to strengthen the upper bound provided by ϑ(G, PSD(V )). This is one of our goals: to obtain better upper bounds in some particular cases of interest, like the unit-distance graph on Euclidean space or distance graphs on the sphere. From a practical standpoint, and for reasons that will become clear soon, instead of using valid inequalities for the completely positive cone, it is more convenient to use valid inequalities for the Boolean-quadratic cone. Given a nonempty finite set V , the Boolean-quadratic cone on V is BQC(V ) = cone{ χ S ⊗ χ S * : S ⊆ V }. If V is finite and ω is the counting measure, then recalling the proof of the inequality ϑ(G, C(V )) ≥ α ω (G) given in Sect. 3 we immediately get ϑ(G, BQC(V )) ≥ α ω (G). (24) If V is infinite, it is not clear that (24) holds; at least the proof of Theorem 3.1 does not go through anymore: if f : V → R is the continuous function approximating the characteristic function of the independent set, then in general it is not true that ‖ f ‖ −2 f ⊗ f * ∈ BQC(V ). If G and ω satisfy the hypotheses of Theorem 5.1, however, then (24) holds and we have:

Theorem 7.1 Let G = (V , E) be a locally independent graph where V is a compact Hausdorff space, Γ ⊆ Aut(G) be a compact group that acts continuously and transitively on V , and ω be a multiple of the pushforward of the Haar measure on Γ. If Γ is metrizable via a bi-invariant density metric for the Haar measure, then ϑ(G, BQC(V )) = α ω (G).
The proof requires the use of the Reynolds operator on V , namely of Lemma 5.5. For this we need a Γ-invariant metric on V , whose existence is implied by the metrizability of Γ via a bi-invariant metric, as shown by the following lemma.

Lemma 7.2 Let V be a compact Hausdorff space and Γ be a compact group that acts continuously and transitively on V . If Γ is metrizable via a bi-invariant metric, then V is metrizable via a Γ-invariant metric.
Proof For x ∈ V , consider the map p x : Γ → V such that p x (σ ) = σ x; the continuous action of Γ implies that p x is continuous for every x ∈ V . Since Γ is compact and Hausdorff and V is Hausdorff, p x is a closed and proper map: images of closed sets are closed and preimages of compact sets are compact.
Let d be a bi-invariant metric that induces the topology on Γ and for σ ∈ Γ and δ ≥ 0 let B(σ, δ) be the closed ball in Γ with center σ and radius δ. For x, y ∈ V , let d V (x, y) = min{ δ ≥ 0 : y ∈ p x (B(1, δ)) }. It is easy to show that d V is a Γ-invariant metric; we show now that it induces the topology on V .
To this end, for x ∈ V consider the closed ball with center x and radius δ ≥ 0, namely p x (B(1, δ)).
Notice that this ball is closed since B(1, δ) is closed and p x is a closed map. We show now that the collection of finite unions of such balls is a base of closed sets of the topology on V , and it will follow that the metric d V induces the topology on V . Let X ⊆ V be a closed set and take x ∉ X . Note p x −1 (X ) and p x −1 ({x}) are compact and disjoint, so the distance between them is some δ > 0. Since p x −1 (X ) is compact, it can be covered by finitely many closed balls of radius δ/2, say B(σ i , δ/2) with σ i ∈ p x −1 (X ) for i = 1, …, N ; moreover, by the definition of δ, we have that p x −1 ({x}) is disjoint from each such ball. But then X ⊆ p x (B(σ 1 , δ/2)) ∪ · · · ∪ p x (B(σ N , δ/2)), a finite union of closed d V -balls that does not contain x. We have shown that, given any closed set X ⊆ V and any x ∉ X , there is a finite union of d V -balls that contains X but not x, that is, finite unions of d V -balls form a base of closed sets of the topology on V .
Proof of Theorem 7.1 Since BQC(V ) ⊆ C(V ), from Theorem 5.1 it suffices to show that (24) holds. So let I ⊆ V be a measurable independent set with ω(I ) > 0 (such a set exists since G is locally independent and ω is positive on open sets) and consider the kernel A = ω(I ) −1 R(χ I ⊗ χ I * ). Using Lemma 7.2 we know that V is metrizable via a Γ-invariant metric, and then using Lemma 5.5 we see that A is continuous; it is also immediate that tr A = 1 and A(x, y) = 0 if x, y ∈ V are adjacent. Let us then show that A ∈ BQC(V ).
Indeed, given a finite U ⊆ V , note that for any Z ∈ BQC * (U ), if μ is the Haar measure on Γ, then ⟨A[U ], Z ⟩ = ω(I ) −1 ∫ Γ ⟨(χ σ −1 I ⊗ χ σ −1 I * )[U ], Z ⟩ dμ(σ ) ≥ 0, since each integrand kernel restricted to U lies in BQC(U ). This shows A ∈ BQC(V ), proving (24).
A corresponding result holds for the bound for distance graphs on R n , presented in Sect. 6, by considering the cone BQC(R n ) = cl{ f ∈ L ∞ (R n ) : f is real valued and continuous and ( f (x − y)) x,y∈U ∈ BQC(U ) for every finite U ⊆ R n }, with the closure taken in the L ∞ norm. Note that BQC(R n ) ⊆ C(R n ).

Theorem 7.3 If D ⊆ (0, ∞) is closed, then ϑ(G(R n , D), BQC(R n )) = αδ(G(R n , D)).
Proof Recall from Sect. 6.3 that we may assume D is bounded. In view of Theorem 6.3, it then suffices to show that ϑ (G(R n , D), BQC(R n )) ≥ αδ (G(R n , D)).
Let I ⊆ R n be a measurable and periodic independent set with δ̄(I ) > 0 (which exists since D is bounded) and consider the function f : R n → R given by f (x) = δ̄(I ) −1 lim T →∞ vol([−T, T ] n ) −1 ∫ [−T,T ] n χ I (z)χ I (x + z) dz (notice the limit above exists since I is periodic). This function is continuous and satisfies f (0) = 1 and f (x) = 0 if ‖x‖ ∈ D, since if ‖x‖ ∈ D then for all z we cannot have both z ∈ I and x + z ∈ I . Moreover, f ∈ BQC(R n ): if U ⊆ R n is finite and Z ∈ BQC * (U ), then the pairing of ( f (x − y)) x,y∈U with Z is nonnegative, whence f is a feasible solution of ϑ(G(R n , D), BQC(R n )). We also have M( f ) = δ̄(I ). Indeed, the characteristic function χ I of I is periodic, say with periodicity lattice Λ. For x ∈ R n , consider the function (χ I ) x such that (χ I ) x (z) = χ I (x + z). Then it is easy to check that the Fourier coefficient of (χ I ) x at u equals e iu·x χ̂ I (u), and thus Parseval's identity gives us f (x) = δ̄(I ) −1 Σ u∈2π Λ* |χ̂ I (u)| 2 e iu·x .

From this it is clear that M( f ) = δ̄(I ).
To finish, note that I is any measurable and periodic independent set, so using Lemma 6.2 the theorem follows.

Theorem 7.1 tells us that any number of constraints of the form ⟨A[U ], Z ⟩ ≥ 0, for finite U ⊆ V and Z ∈ BQC * (U ), can be added to ϑ(G, PSD(V )), and that the resulting problem still provides an upper bound for the independence number. Moreover, if all such constraints are added, then we obtain the independence number. Theorem 7.3 says the same for the independence density of G(R n , D).
The main advantage of using BQC(U ) instead of C(U ) is that the Boolean-quadratic cone in finite dimension is a polyhedral cone, so for finite U one is able to compute all (or at least some of) the facets of BQC(U ), though the amount of work gets prohibitively large already for |U | = 7 [11, §30.6]. The better upper bounds described in Sects. 8 and 9 were obtained by the use of constraints based on such facets.

Subgraph constraints
Constraints from subgraphs of G(R n , {1}) played a central role in the computation of the best upper bounds for the independence density of the unit-distance graph [2,22,36].
Such subgraph constraints are as follows. Let G = (V, E) be a locally independent graph and ω be a Borel measure on V, and assume G and ω satisfy the hypotheses of Theorem 5.1. Let U ⊆ V be finite and for every x_0 ∈ V consider inequality (25). After adding any number of such constraints to ϑ(G, PSD(V)) we still get an upper bound for α_ω(G). Indeed, if I ⊆ V is a measurable independent set of positive measure, then A = ω(I)^{-1} R(χ_I ⊗ χ*_I) is continuous, positive, and such that tr A = 1, A(x, y) = 0 if x, y ∈ V are adjacent, and ⟨J, A⟩ = ω(I) (recall the proof of Theorem 7.1). Moreover, since A(x, x) = ω(V)^{-1} for all x ∈ V, and since for every σ in the given subgroup of Aut(G) the set σ^{-1} I is independent, we get that A satisfies (25). Notice these constraints do not come directly from C(V) or BQC(V), since they rely on the edge set of the graph. Theorem 5.1 says that they must be somehow implied by the constraints coming from C(V) together with the other constraints of problem ϑ(G, C(V)), but the way in which this implication is carried out is not necessarily simple: it could be that the implication is obtained only by adding many constraints from the completely positive cone for sets other than U.
The situation is clearer when one considers instead the Boolean-quadratic cone. In this case, a subgraph constraint for a given finite U ⊆ V and a given x 0 ∈ V is implied by a single constraint from BQC(U ∪ {x 0 }) together with the constraints A(x, y) = 0 for adjacent x and y.
To see this, assume for the sake of simplicity that x_0 ∉ U and write U′ = U ∪ {x_0} (if x_0 ∈ U, a simple modification of the argument below works). Let C : U′ × U′ → R be the matrix for which ⟨C, A⟩ ≥ 0, with A restricted to U′, expresses the subgraph constraint (25). We now show that there are matrices Z ∈ BQC*(U′) and B : U′ × U′ → R with B(x, y) = 0 if x, y ∈ U′ are not adjacent such that C = Z + B; it then follows that, if A is feasible for ϑ(G, PSD(V)) and Σ_{x,y∈U′} Z(x, y) A(x, y) ≥ 0, then A satisfies (25). For Z, consider the matrix given in (26); a direct computation shows that Z ∈ BQC*(U′). Finally, subgraph constraints can also be used for distance graphs on R^n: given a set D ⊆ (0, ∞) of forbidden distances, one can add to ϑ(G(R^n, D), PSD(R^n)) any number of constraints of the form (25), where U ⊆ R^n is finite and x_0 ∈ R^n is fixed. Such constraints have been used by Oliveira and Vallentin [36] to get improved upper bounds for the independence density of the unit-distance graph on R^n in several dimensions; the sets U used were always vertex sets of regular simplices in R^n. Keleti et al. [22] used the points of the Moser spindle to get improved bounds for the independence density of G(R^2, {1}); Bachoc et al. [2] used several different graphs to get better bounds for the independence density of G(R^n, {1}) for n = 4, …, 24 and a better asymptotic bound.

A new class of graphical facets of the Boolean-quadratic cone
The matrix Z defined in (26) sometimes spans an extreme ray of BQC*(U), that is, ⟨Z, A⟩ ≥ 0 induces a facet of BQC(U). In fact, matrices like Z comprise a whole class of facets of the Boolean-quadratic cone that generalizes the class of clique inequalities introduced by Padberg [37].
Let G = (V, E) be a finite graph with at least two vertices. We say that G is α-critical if α(G − e) > α(G) for all e ∈ E; α-critical graphs have been extensively studied in the context of combinatorial optimization [42, §68.5]. With W = V ∪ {∅} and Q_G the matrix associated with G as in (26), the result is that ⟨Q_G, A⟩ ≥ 0 induces a facet of BQC(W) if and only if G is connected and α-critical.

Proof The argument given in the previous section shows that ⟨Q_G, A⟩ ≥ 0 is valid for BQC(W); let us then establish the necessary and sufficient conditions for it to be facet defining.
As a subset of the space of symmetric matrices indexed by W × W, the cone BQC(W) is full dimensional. Indeed, it suffices to notice that the 1 + |W|(|W| + 1)/2 matrices χ_U ⊗ χ*_U for U ⊆ W with |U| ≤ 2 are affinely independent.

We first show necessity. If G = G_1 + G_2, where G_1, G_2 have disjoint vertex sets and G_1 is a connected component of G, then Q_G = Q_{G̃_1} + P, where G̃_1 = (V, E(G_1)) and P : W × W → R is such that P(∅, ∅) = α(G_2), P(x, y) = P(y, x) = 1/2 if (x, y) ∈ E(G_2), and all other entries of P are zero. Now ⟨Q_{G̃_1}, A⟩ ≥ 0 is valid for BQC(W) and, since P ≥ 0, so is ⟨P, A⟩ ≥ 0. Since α(G) = α(G_1) + α(G_2) and since BQC(W) is full dimensional, we see that ⟨Q_G, A⟩ ≥ 0 does not induce a facet.
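The full-dimensionality claim above is easy to verify by computer for a small W: the matrices χ_U ⊗ χ*_U with |U| ≤ 2, viewed as vectors with a constant coordinate appended, should have rank 1 + |W|(|W| + 1)/2. A minimal sketch (ours, with |W| = 5; all names are illustrative):

```python
import itertools
import numpy as np

m = 5                                  # |W|
mats = []
for size in range(3):                  # subsets U of W with |U| <= 2, incl. the empty set
    for U in itertools.combinations(range(m), size):
        chi = np.zeros(m)
        chi[list(U)] = 1.0
        mats.append(np.outer(chi, chi).ravel())

# Affine independence of the matrices is equivalent to linear independence of
# the lifted vectors (1, vec(chi_U (x) chi_U*)); there are 1 + m(m+1)/2 of them.
stacked = np.hstack([np.ones((len(mats), 1)), np.array(mats)])
assert len(mats) == 1 + m * (m + 1) // 2
assert np.linalg.matrix_rank(stacked) == len(mats)
```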
Similarly, if α(G − e) = α(G) for some e = (x, y) ∈ E, then Q_G = Q_{G−e} + P, where P(x, y) = P(y, x) = 1/2 and all other entries of P are zero, and we see that ⟨Q_G, A⟩ ≥ 0 does not induce a facet.
To see sufficiency, assume G is connected and α-critical. Suppose Z : W × W → R is such that ⟨Z, A⟩ ≥ 0 induces a facet of BQC(W) and ⟨Z, A⟩ = 0 for every A ∈ BQC(W) with ⟨Q_G, A⟩ = 0. To show that ⟨Q_G, A⟩ ≥ 0 induces a facet it suffices to show that Z is a nonnegative multiple of Q_G.
To this end, notice first that if x ∈ V, then ⟨Q_G, χ_{x} ⊗ χ*_{x}⟩ = 0, so Z(x, x) = 0. Next, let x, y ∈ V with (x, y) ∉ E; then {x, y} is independent in G, so ⟨Q_G, χ_{x,y} ⊗ χ*_{x,y}⟩ = 0, whence 2Z(x, y) + Z(x, x) + Z(y, y) = 0 and therefore Z(x, y) = 0. Take now (x, y) ∈ E. Let I ⊆ V be a maximum independent set in G − (x, y); then |I| = α(G) + 1 and hence we must have x, y ∈ I. Write S = I ∪ {∅}; both S and S \ {x} give tight points (the set I \ {x} is a maximum independent set of G), so subtracting the identities ⟨Z, χ_S ⊗ χ*_S⟩ = 0 and ⟨Z, χ_{S\{x}} ⊗ χ*_{S\{x}}⟩ = 0 yields 2Z(∅, x) = −2Z(x, y).
Since x and y are interchangeable in the above argument, we see immediately that Z(∅, x) = −Z(x, y) = Z(∅, y). Now G is connected, and so it follows that there is a number a such that Z(∅, x) = −a for all x ∈ V and Z(x, y) = a for all (x, y) ∈ E. Tightness on a maximum independent set of G together with ∅ then determines Z(∅, ∅), and it follows that Z is a nonnegative multiple of Q_G, as we wanted.

An alternative normalization and polytope constraints
The constraint "tr A = 1" in (6) is there to prevent the problem from being unbounded: it is a normalization constraint. There is another kind of normalization constraint that can be used to replace the trace constraint; by doing so we obtain an equivalent problem and also gain the ability to add to our problem constraints from the Boolean-quadratic polytope, which given a nonempty finite set V is defined as Such constraints are also implied by constraints from the Boolean-quadratic cone, but in practice, given our limited computational power, they are useful. For instance, the inclusion-exclusion inequalities used by Keleti et al. [22] to get better upper bounds for G(R 2 , {1}) come from facets of BQP(V ), as we will soon see. Let G = (V , E) be a topological graph where V is a compact Hausdorff space, ω be a finite Borel measure on V , and K(V ) ⊆ PSD(V ) be a convex cone. Since K(V ) is a subset of the cone of positive kernels, Mercer's theorem implies that any continuous kernel in K(V ) is trace class and that the trace is the integral over the diagonal. The alternative version of (6) is: Ais continuous and A ∈ K(V ).
If A is a feasible solution of the above problem, then A′ = (tr A)^{-1} A is feasible for ϑ(G, K(V)). Moreover, the positive semidefiniteness of the 2 × 2 matrix in (27) implies that (tr A)² ≤ ⟨J, A⟩, whence ⟨J, A′⟩ ≥ tr A and the optimal value of (6) is at least that of (27). The reverse inequality is also true: if A is a feasible solution of (6), then one easily checks that A′ = ⟨J, A⟩ A is a feasible solution of (27) and that tr A′ = ⟨J, A⟩. So problems (6) and (27) are actually equivalent.
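The rescaling step above rests only on the determinant of a 2 × 2 positive semidefinite matrix. Assuming the matrix in (27) has the form below (an assumption on our part, consistent with the inequality (tr A)² ≤ ⟨J, A⟩ stated in the text), the computation is:

```latex
\begin{pmatrix} 1 & \operatorname{tr} A \\ \operatorname{tr} A & \langle J, A\rangle \end{pmatrix} \succeq 0
\;\Longrightarrow\;
\langle J, A\rangle - (\operatorname{tr} A)^2 \ge 0
\;\Longrightarrow\;
\Bigl\langle J, \tfrac{A}{\operatorname{tr} A}\Bigr\rangle
= \frac{\langle J, A\rangle}{\operatorname{tr} A} \ge \operatorname{tr} A .
```

Thus the objective value of the rescaled solution A′ = (tr A)^{-1} A in (6) is at least the value tr A of A in (27).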
Fix a finite set U ⊆ V and let Z : U × U → R be a symmetric matrix and β be a real number such that ⟨Z, A⟩ ≥ β is a valid inequality for BQP(U), that is, ⟨Z, A⟩ ≥ β for all A ∈ BQP(U).
If G and ω satisfy the hypotheses of Theorem 5.1, then any number of constraints of the form (28), namely

Σ_{x,y∈U} Z(x, y) A(x, y) ≥ β,

can be added to (27) with K(V) = PSD(V) and we still get an upper bound for α_ω(G). Indeed, if I is a measurable independent set of positive measure, then A = R(χ_I ⊗ χ*_I) is easily checked to be a feasible solution of (27) with K(V) = PSD(V) that moreover satisfies (28), and tr A = ω(I). The alternative normalization is essential for this approach to work: if we try to add constraint (28) to (6), then for β ≠ 0 we get a nonlinear constraint because of the different normalization, making it more difficult to deal with the resulting problem in practice.

The same ideas can be applied to problem (21). First, given a closed set D ⊆ (0, ∞) of forbidden distances, we consider an alternative normalization of (21) that gives rise to an equivalent problem, in which again a 2 × 2 matrix is required to be positive semidefinite. Then, we observe that we can add to this problem, with K(R^n) = PSD(R^n), any number of constraints of the form (30), for finite U ⊆ R^n and Z, β such that ⟨Z, A⟩ ≥ β is valid for BQP(U), and still prove that the optimal value provides an upper bound for the independence density of G(R^n, D).
Given points x_1, …, x_N ∈ R^n, the inclusion-exclusion inequality used by Keleti, Matolcsi, Oliveira, and Ruzsa is

Σ_{i=1}^N A(x_i, x_i) ≤ 1 + Σ_{1≤i<j≤N} A(x_i, x_j).

This constraint is just (30) with β = −1 and Z such that Z(x_i, x_i) = −1 for all i and Z(x_i, x_j) = 1/2 for all i ≠ j. It can be easily checked that ⟨Z, A⟩ ≥ −1 is a valid inequality for BQP({x_1, …, x_N}); one can even verify that it gives a facet of the polytope, simply by finding enough affinely independent points in the polytope for which the inequality is tight.
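Both claims about the inclusion-exclusion inequality can be checked by brute force over the 2^N vertices χ_S ⊗ χ*_S of BQP([N]). The sketch below (ours, for N = 5) verifies validity and counts enough affinely independent tight vertices to confirm that the inequality induces a facet:

```python
import itertools
import numpy as np

N = 5
# Inclusion-exclusion matrix: Z(i, i) = -1 for all i, Z(i, j) = 1/2 for i != j.
Z = 0.5 * (np.ones((N, N)) - np.eye(N)) - np.eye(N)

tight = []
for bits in itertools.product([0, 1], repeat=N):
    chi = np.array(bits, dtype=float)
    A = np.outer(chi, chi)              # a vertex chi_S (x) chi_S* of BQP([N])
    val = float(np.sum(Z * A))          # <Z, A> = s(s - 3)/2 for s = |S|
    assert val >= -1.0 - 1e-12          # validity of <Z, A> >= -1
    if abs(val + 1.0) < 1e-12:          # tight exactly when |S| is 1 or 2
        tight.append(np.append(A[np.triu_indices(N)], 1.0))

# Facet check: the lifted tight vertices span an affine space of the maximum
# possible dimension, i.e. they have rank N(N+1)/2 = dim BQP([N]).
assert len(tight) == N + N * (N - 1) // 2
assert np.linalg.matrix_rank(np.array(tight)) == N * (N + 1) // 2
```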
Constraints from BQP(U ) for a finite U ⊆ R n are implied by constraints from BQC(U ∪ {∅}) together with the other constraints from (6) or (21). It is still useful to consider constraints from BQP(U ) mainly since U ∪ {∅} is a larger set than U , and therefore computing the facets of BQC(U ∪ {∅}) can be much harder than computing the facets of BQC(U ), as is the case already when |U | = 6. For instance, Deza and Laurent [11, §30.6] survey some numbers for the cut polytope, which is equivalent to the Boolean-quadratic polytope under a linear transformation. For 6 points, the total number of facets is 116,764, distributed among 11 equivalence classes. The approach we use to find violated constraints cannot, however, exploit the full symmetry of the polytope, so we end up using a list of 428 facets. For 7 points, the total number of facets is 217,093,472, distributed among 147 classes. Taking into account the smaller symmetry group we use, the total list of facets needed for our procedure would have more than ten thousand entries.

Better upper bounds for the independence number of graphs on the sphere
By adding BQP(U )-constraints to ϑ(G(S n−1 , {π/2}), PSD(S n−1 )) using the approach described in Sect. 7.2, one is able to improve on the best upper bounds for α ω (G(S n−1 , {π/2})) = m 0 (S n−1 ). Table 1 shows bounds thus obtained for the independence ratio, namely α ω (G(S n−1 , {π/2}))/ω n , for n = 3, …, 8. The rest of this section is devoted to an explanation of how these bounds were computed. The bounds have also been checked to be correct; the verification procedure is explained in detail in a document available with the arXiv version of this paper. The programs used for verification can also be found with the arXiv version.

Invariant kernels on the sphere
Let O(n) be the orthogonal group on R^n, that is, the group of n × n orthogonal matrices. The orthogonal group acts on a kernel A : S^{n−1} × S^{n−1} → R by

(T · A)(x, y) = A(T^{-1}x, T^{-1}y) for T ∈ O(n);

we say that A is invariant if T · A = A for all T ∈ O(n). An invariant kernel is thus a real-valued function with domain [−1, 1], since if x · y = x′ · y′, then A(x′, y′) = A(x, y). Let D ⊆ (0, π] be a set of forbidden distances. If the cone K(S^{n−1}) is invariant under the action of the orthogonal group, then one can add to the problem ϑ(G(S^{n−1}, D), K(S^{n−1})) the restriction that A has to be invariant without changing the optimal value of the resulting problem. Indeed, if A is a feasible solution, then so is T · A for all T ∈ O(n), and hence its symmetrization

Ā(x, y) = ∫_{O(n)} (T · A)(x, y) dμ(T),

where μ is the Haar measure on O(n), is also feasible and has the same objective value as A.
The advantage of requiring A to be invariant is that invariant and positive kernels can be easily parameterized. Indeed, let P_k^n denote the Jacobi polynomial of degree k and parameters (α, α), where α = (n − 3)/2, normalized so that P_k^n(1) = 1 (for background on Jacobi polynomials, see the book by Szegő [44]). A theorem of Schoenberg [40] says that A : S^{n−1} × S^{n−1} → R is continuous, invariant, and positive if and only if there are nonnegative numbers a(0), a(1), … with Σ_{k=0}^∞ a(k) < ∞ such that

A(x, y) = Σ_{k=0}^∞ a(k) P_k^n(x · y)

for all x, y ∈ S^{n−1}; in particular, the sum above converges absolutely and uniformly on S^{n−1} × S^{n−1}.
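Schoenberg's characterization can be tested numerically. The sketch below (ours) evaluates the normalized polynomials P_k^n through the Gegenbauer three-term recurrence, which for parameter λ = (n − 2)/2 agrees with the Jacobi polynomials of parameters ((n − 3)/2, (n − 3)/2) up to normalization, and checks that a kernel of the form (31) with nonnegative coefficients is positive semidefinite on random points of S³:

```python
import numpy as np

def gegenbauer(k, lam, t):
    """Gegenbauer polynomial C_k^lam(t), via the three-term recurrence
    j*C_j = 2*(j + lam - 1)*t*C_{j-1} - (j + 2*lam - 2)*C_{j-2}."""
    c_prev, c = np.ones_like(t), 2.0 * lam * t
    if k == 0:
        return c_prev
    for j in range(2, k + 1):
        c_prev, c = c, (2.0 * (j + lam - 1) * t * c - (j + 2.0 * lam - 2) * c_prev) / j
    return c

def P(k, n, t):
    """P_k^n, normalized so that P_k^n(1) = 1."""
    lam = (n - 2) / 2.0
    return gegenbauer(k, lam, t) / gegenbauer(k, lam, np.ones_like(t))

# A kernel sum_k a(k) P_k^n(x . y) with a(k) >= 0 must be positive semidefinite
# on any finite subset of the sphere S^{n-1}.
rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((30, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # 30 random points on S^3

a = np.array([1.0, 0.5, 0.25, 0.125, 0.0625])   # nonnegative coefficients a(k)
G = X @ X.T                                      # inner products x . y
A = sum(a[k] * P(k, n, G) for k in range(len(a)))

assert abs(float(P(3, n, np.array(1.0))) - 1.0) < 1e-12
assert np.min(np.linalg.eigvalsh(A)) > -1e-8
```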

Primal and dual formulations
When a continuous, invariant, and positive kernel A is represented as in (31), constraint (28) becomes

Σ_{k=0}^∞ a(k) r(k) ≥ β,

where r : N → R is the function such that

r(k) = Σ_{x,y∈U} Z(x, y) P_k^n(x · y).

Let R be a finite collection of BQP(U)-constraints represented as pairs (r, β), where r is given by the above expression for a valid inequality ⟨Z, A⟩ ≥ β for BQP(U) for some finite U ⊆ S^{n−1}. If a continuous, invariant, and positive kernel A is given by expression (31), then ⟨J, A⟩ = ω_n² a(0). Moreover, all diagonal entries of A are the same, and hence tr A = ω_n Σ_{k=0}^∞ a(k). Using the alternative normalization of Sect. 7.2, problem ϑ(G(S^{n−1}, {θ}), PSD(S^{n−1})), strengthened with the BQP(U)-constraints in R, can be equivalently written as problem (32), whose variables are the coefficients a(k) ≥ 0 and in which a 2 × 2 matrix is required to be positive semidefinite. Notice that the objective function was scaled so that the optimal value is a bound for the independence ratio α_ω(G(S^{n−1}, {θ}))/ω_n.
A dual for this problem is the following optimization problem in the variables λ, y(r, β) for (r, β) ∈ R, and z_1, z_2, z_3:

minimize  z_1 + Σ_{(r,β)∈R} y(r, β) β
subject to  λ + Σ_{(r,β)∈R} y(r, β) r(0) + z_2 ω_n + z_3 ω_n² ≥ 1,
  λ P_k^n(cos θ) + Σ_{(r,β)∈R} y(r, β) r(k) + z_2 ω_n ≥ 1 for k ≥ 1,

together with the requirement that a 2 × 2 matrix in the variables z_1, z_2, z_3 be positive semidefinite; this is problem (33). In practice, this is the problem that we solve to obtain an upper bound; there are two main reasons for this. The first one comes from weak duality: the objective value of any feasible solution of this problem is an upper bound for the independence ratio. Indeed, let λ, y, z_1, z_2, z_3 be a feasible solution of (33) and a be a feasible solution of (32); then the objective value of (32) at a is at most that of (33), as we wanted, where for the last inequality in the chain one uses the positive semidefiniteness of the 2 × 2 matrices in (32) and (33). The second reason is that the dual is a semidefinite program with finitely many variables, though infinitely many constraints: one constraint for each k ≥ 0. In practice, we choose d > 0 and disregard all constraints for k > d. Then we solve a finite semidefinite program, and later on we prove that a suitable modification of the solution found is indeed feasible for the infinite problem, as we will see now.

Finding feasible dual solutions and checking them
To find good feasible solutions of (33), we start by taking R = ∅. Then we turn our problem into a finite one: we choose d > 0 and disregard all constraints for k > d. We have then a finite semidefinite program, which we solve using standard semidefinite programming solvers. The idea is that, if d is large enough, then the solution found will be close enough to being feasible, and so by slightly changing z 1 , z 2 , and z 3 we will be able to find a feasible solution.
By solving the finite problem we obtain at the same time an optimal solution of the corresponding finite primal problem, in which a(k) = 0 if k > d (notice this is likely not an optimal solution of the original primal problem). We use this primal solution to perform a separation round, that is, to look for violated polytope constraints that we can add to the problem. One way to do this is as follows.
Say a is the primal solution and let A be the kernel it determines via (31) (with a(k) = 0 for k > d). Fix an integer N ≥ 2, write [N] = {1, …, N}, and let Z ∈ R^{N×N}, β ∈ R be such that ⟨Z, X⟩ ≥ β is valid for BQP([N]). Then we try to find points x_1, …, x_N ∈ S^{n−1} that maximize the violation (34) of the polytope inequality. If we find points such that the violation is positive, then we have a violated constraint which can be added to R; the whole procedure can then be repeated: the dual problem is solved again and a new separation round is performed.
To find violated constraints we need to know valid inequalities, or better yet facets, of BQP([N]). Up to N = 6 it is possible to work with a full list of facets; for N = 7, only with a partial list. To find points x_1, …, x_N ∈ S^{n−1} maximizing (34), we represent the points on the sphere by stereographic projection onto the plane x_n = −1 and use some method for unconstrained optimization that converges to a local optimum.
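The stereographic parameterization can be sketched as follows; the formula below is the inverse projection from the north pole onto the plane x_n = −1, and all names are our own illustrative choices:

```python
import numpy as np

def to_sphere(p):
    """Inverse stereographic projection: maps p in R^{n-1}, identified with the
    point (p, -1) on the plane x_n = -1, to the unit sphere S^{n-1}, projecting
    from the north pole e_n = (0, ..., 0, 1)."""
    p = np.asarray(p, dtype=float)
    s = float(np.dot(p, p))
    return np.append(4.0 * p, s - 4.0) / (s + 4.0)

# Unconstrained coordinates p parameterize S^{n-1} minus the north pole, so the
# violation (34), as a function of (p_1, ..., p_N), can be fed directly to any
# local unconstrained-optimization routine.
x = to_sphere([0.3, -1.2, 0.5])
assert abs(float(np.dot(x, x)) - 1.0) < 1e-12
assert abs(to_sphere([0.0, 0.0, 0.0])[-1] + 1.0) < 1e-12   # origin -> south pole
```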
After a few optimization/separation rounds, one starts to notice only minor improvements to the bound. Then it is time to check how far from feasible the dual solution is and to fix it in order to get a truly feasible solution and therefore an upper bound. A detailed description of the verification procedure, together with a program to check the dual solutions used for the results in this section, can be found together with the arXiv version of this paper.

Better upper bounds for the independence density of unit-distance graphs
Just like in the case of graphs on the sphere, we can add BQP(U)-constraints to ϑ(G(R^n, {1}), PSD(R^n)) and so obtain improved upper bounds for αδ(G(R^n, {1})) for n = 3, …, 8. These improved upper bounds then provide new lower bounds for the measurable chromatic number χ_m(G(R^n, {1})) of the unit-distance graph, which is the minimum number of measurable independent sets needed to partition R^n, for n = 4, …, 8. Table 2 shows these new bounds compared to the previously best ones. To obtain the bounds for n = 4, …, 8, subgraph constraints (see Sect. 7.1) have also been used. In the remainder of this section we will see how these bounds have been computed; they have also been checked to be correct, and the verification procedure is explained in detail in a document available with the arXiv version of this paper. The programs used for the verification can also be found with the arXiv version.

Radial functions
The orthogonal group O(n) acts on a function f : R^n → C by (T · f)(x) = f(T^{-1}x) for T ∈ O(n); we say that f is radial if it is invariant under this action, that is, if T · f = f for all T ∈ O(n). A radial function f is thus a function of one real variable, since if ‖x‖ = ‖y‖, then f(x) = f(y).
Let D ⊆ (0, ∞) be a set of forbidden distances. If the cone K(R^n) ⊆ L∞(R^n) is invariant under the action of the orthogonal group, then one can add to the problem ϑ(G(R^n, D), K(R^n)) the restriction that f has to be radial without changing the optimal value of the resulting problem. Indeed, if f is a feasible solution, then so is T · f for all T ∈ O(n), and hence its radialization

f̄(x) = ∫_{O(n)} (T · f)(x) dμ(T),

where μ is the Haar measure on O(n), is also feasible and has the same objective value as f.
The advantage of requiring f to be radial is that radial functions of positive type can be easily parameterized. Indeed, if f ∈ PSD(R^n) is continuous, then Bochner's theorem says that there is a finite Borel measure ν on R^n such that

f(x) = ∫_{R^n} e^{iu·x} dν(u).

But then we obtain the following expression, due to Schoenberg [39], for the radialization of f:

f̄(x) = ∫_0^∞ Ω_n(t‖x‖) dα(t),

where Ω_n(‖u‖) = ω_n^{-1} ∫_{S^{n−1}} e^{iu·ξ} dω(ξ) for u ∈ R^n and α is the Borel measure on [0, ∞) such that α(X) = ν({ λξ : λ ∈ X and ξ ∈ S^{n−1} }) for every measurable set X. The function Ω_n has a simple expression in terms of Bessel functions, namely

Ω_n(t) = Γ(n/2) (2/t)^{(n−2)/2} J_{(n−2)/2}(t)

for t > 0 and Ω_n(0) = 1, where J_α denotes the Bessel function of the first kind of order α (for background, see the book by Watson [47]).
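For n = 3 the Bessel expression reduces to Ω₃(t) = sin(t)/t, which can be checked against the defining spherical average. A small numerical sketch (ours):

```python
import numpy as np

def omega3(t):
    """Omega_3(t) = Gamma(3/2) (2/t)^{1/2} J_{1/2}(t) = sin(t)/t, with Omega_3(0) = 1."""
    return np.sinc(np.asarray(t, dtype=float) / np.pi)   # sinc(x) = sin(pi x)/(pi x)

# Defining property: Omega_n(|u|) is the average of exp(i u . xi) over xi on the
# sphere; for n = 3 this is (1/2) * int_0^pi exp(i t cos(theta)) sin(theta) dtheta.
M = 200000
theta = (np.arange(M) + 0.5) * np.pi / M                 # midpoint rule on [0, pi]
for t in [0.5, 1.0, 4.0, 10.0]:
    avg = (np.exp(1j * t * np.cos(theta)) * np.sin(theta)).mean() * np.pi / 2.0
    assert abs(avg - omega3(t)) < 1e-6

assert abs(float(omega3(0.0)) - 1.0) < 1e-15
assert abs(float(omega3(1000.0))) < 1e-2                 # Omega_3 vanishes at infinity
```

The decay of Ω_n at infinity checked in the last line is exactly the property exploited in the asymptotic arguments of Sect. 10.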

Primal and dual formulations
When a continuous radial function f of positive type is represented as in (35), constraint (30) becomes

∫_0^∞ r(t) dα(t) ≥ β,

where r : [0, ∞) → R is the continuous function such that

r(t) = Σ_{x,y∈U} Z(x, y) Ω_n(t‖x − y‖).

As shown in Sect. 7.1, a subgraph constraint is implied by one BQP(U)-constraint together with the other constraints of ϑ(G(R^n, {1}), PSD(R^n)), so in the discussion below we treat them as BQP(U)-constraints. Let R be a finite collection of BQP(U)-constraints represented as pairs (r, β), where r is given by the above expression for a valid inequality ⟨Z, A⟩ ≥ β for BQP(U) for some finite U ⊆ R^n. Using the alternative normalization of Sect. 7.2, problem ϑ(G(R^n, {1}), PSD(R^n)), strengthened with the BQP(U)-constraints in R, can be equivalently written as problem (38), in which the variable α is a finite Borel measure on [0, ∞) and a 2 × 2 matrix is required to be positive semidefinite.
A dual for this problem is problem (39), an optimization problem in the variables λ, y(r, β) for (r, β) ∈ R, and z_1, z_2, z_3. Again, this is the problem that we solve to obtain an upper bound, and the two reasons for this are the same as before. The first one comes from weak duality: the objective value of any feasible solution of this problem is an upper bound for the independence density. Indeed, let λ, y, z_1, z_2, z_3 be a feasible solution of (39) and α be a feasible solution of (38); then the objective value of (38) at α is at most that of (39), as we wanted. The second reason is that the dual is a semidefinite program with finitely many variables, though infinitely many constraints: one constraint for each t > 0. In practice, we discretize the set of constraints and solve a finite semidefinite program, later on proving that a suitable modification of the solution found is indeed feasible for the infinite problem, as we discuss now.

Finding feasible dual solutions and checking them
To find good feasible solutions of (39), we start by taking R = ∅. Then we discretize the constraint set: we choose a finite sample S ⊆ (0, ∞) and instead of all constraints for t > 0 we only consider constraints for t ∈ S. Then we have a semidefinite program, which we solve using standard semidefinite programming solvers. The idea is that, if the sample S is fine enough, then the solution found will be close enough to being feasible, and so by slightly increasing z 1 and z 2 we will be able to find a feasible solution.
By solving the discretized dual problem we obtain at the same time an optimal solution of the discretized primal problem, in which α is a sum of Dirac δ measures supported on S ∪ {0} (notice this is likely not an optimal solution of the original primal problem, but of the discretized one). We use this primal solution to perform a separation round, that is, to look for violated BQP(U )-constraints that we can add to the problem. One way to do this is as follows.
Say that α is the primal solution and let f be the radial function it determines via (35). Fix an integer N ≥ 2, write [N] = {1, …, N}, and let Z ∈ R^{N×N}, β ∈ R be such that ⟨Z, A⟩ ≥ β is valid for BQP([N]). Then we try to find points x_1, …, x_N ∈ R^n that maximize the violation (40) of this BQP-constraint. If we find points for which the violation is positive, then we have a violated constraint which can be added to R; the whole procedure can then be repeated: the dual problem is solved again and a new separation round is performed.
To find violated constraints we work with a list of facets of BQP([N ]), as in Sect. 8.3. To find points x 1 , …, x N ∈ R n maximizing (40) we simply use some method for unconstrained optimization. After a few optimization/separation rounds, one starts to notice only minor improvements to the bound. Then it is time to check how far from feasible the dual solution is and to fix it in order to get a truly feasible solution and therefore an upper bound. The verification procedure for the dual solution has already been outlined by Keleti et al. [22] and will be omitted here; the dual solutions that give the bounds in Table 2 and a program to verify them can be found together with the arXiv version of this paper.

Sets avoiding many distances in R n and the computability of the independence density
Reassuring though Theorem 5.1 may be, the computational results of Sects. 8 and 9 do not use it, or rather use only the easy direction of the statement. In this section we will see how the full power of Theorem 5.1 can be used to recover results about densities of sets avoiding several distances in Euclidean space. Furstenberg et al. [17] showed that, if n ≥ 2, then any subset of R^n with positive upper density realizes all arbitrarily large distances. More precisely, if I ⊆ R^n has positive upper density, then there is d_0 > 0 such that for all d > d_0 there are x, y ∈ I with ‖x − y‖ = d. This fails for n = 1: the set ∪_{k∈Z} (2k, 2k + 1) has density 1/2 but does not realize any odd distance.
Falconer [14] proved the following related theorem: if (d_m) is a sequence of positive numbers that converges to 0, then for all n ≥ 2 the maximum density of a subset of R^n avoiding all the distances d_1, …, d_m tends to 0 as m → ∞. This theorem also fails when n = 1, as can be seen from an adaptation of the previous example.
Bukh [6] proved a theorem that implies both theorems above; namely, he showed that, as the ratios d_2/d_1, …, d_m/d_{m−1} of consecutive forbidden distances all tend to infinity, the independence density of the corresponding distance graph tends to m_1(R^n)^m; this is (41). Oliveira and Vallentin [36] showed that the limit above decreases exponentially fast as m increases, using in the proof only a few properties of the Bessel function. In this section, we will see how Bukh's result (41) can be obtained in a similar fashion using Theorem 5.1. This illustrates how the completely positive formulation provides a good enough characterization of the independence density to allow us to prove such precise asymptotic results.
Bukh derives his asymptotic result from an algorithm to compute the independence density to any desired precision. As a by-product of the approach of this section we also obtain such an algorithm based on solving a sequence of stronger and stronger convex optimization problems.
Finally, similar decay results can be proved for distance graphs on other metric spaces, such as the sphere or the real or complex projective space [35]. The methods of this section can in principle be applied to any metric space, as long as the harmonic analysis can be tackled successfully.

Thick constraints
The better bounds for the independence density described in Sect. 9 were obtained by adding to the initial problem ϑ(G(R n , {1}), PSD(R n )) a few BQP(U )-constraints for finite sets U . Our approach in this section is similar: we wish to add more and more constraints to the initial problem in a way that is guaranteed to give us closer and closer approximations of the independence density. The constraints used in Sect. 9 are easy to deal with in computations, but it is not clear (and we do not know) whether by adding a finite number of them to the initial problem we can get arbitrarily close to the independence density. A slight modification of these constraints, however, displays this property, even though such modified constraints are much harder to deal with in practice.
For a finite set U ⊆ R^n write m(U) for the minimum distance between pairs of distinct points in U. The following lemma provides an alternative characterization of C(R^n).

Lemma 10.1 A continuous and real-valued function f ∈ L∞(R^n) belongs to C(R^n) if and only if

Σ_{x,y∈U} Z(x, y) ∫_{B(x,δ)} ∫_{B(y,δ)} f(x′ − y′) dx′ dy′ ≥ 0    (42)

for all finite U ⊆ R^n, Z ∈ C*(U), and 0 < δ ≤ m(U)/2.
Compare this lemma to the definition of C(R^n) from Sect. 6.3. A constraint (42) is obtained from a constraint of the form Σ_{x,y∈U} Z(x, y) f(x − y) ≥ 0 by considering an open ball of radius δ around each point in U; since δ ≤ m(U)/2, balls around different points do not intersect. So we are "thickening" each point in U.
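The quantities involved in the thickening are straightforward to compute. A minimal sketch (ours) computes m(U) for a small planar configuration and checks that the balls of radius δ = m(U)/2 are pairwise disjoint:

```python
import numpy as np

def min_distance(U):
    """m(U): the minimum distance between pairs of distinct points of U."""
    U = np.asarray(U, dtype=float)
    dists = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    return dists[np.triu_indices(len(U), k=1)].min()

U = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 2.0]])
delta = min_distance(U) / 2.0            # the largest radius allowed in (42)
assert delta == 0.5

# With this delta the open balls B(x, delta), x in U, are pairwise disjoint,
# so constraint (42) indeed "thickens" each point of U into its own small ball.
for i in range(len(U)):
    for j in range(i + 1, len(U)):
        assert np.linalg.norm(U[i] - U[j]) >= 2.0 * delta
```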
Proof Let f ∈ L∞(R^n) be a continuous and real-valued function and suppose there are a finite U ⊆ R^n and Z ∈ C*(U) such that Σ_{x,y∈U} Z(x, y) f(x − y) < 0. Since f is continuous, for every ε > 0 there is δ > 0 such that for all x, y ∈ U we have |f(x − y) − f(x′ − y′)| < ε for all x′ ∈ B(x, δ) and y′ ∈ B(y, δ). It follows that, by taking ε small enough, the left-hand side of (42) for the corresponding δ will be negative.
For the other direction, we approximate integrals of f by finite sums. If f is such that the left-hand side of (42) is negative for some U, Z, and δ, then take for U′ the set consisting of a fine sample of points inside each B(x, δ) for x ∈ U. In this way one approximates by summation the double integrals in (42), showing that Σ_{x,y∈U′} Z′(x, y) f(x − y) < 0, where Z′ : U′ × U′ → R is the copositive matrix derived from Z by duplication of rows and columns.
Recall from Sect. 9.1 that a continuous radial function f ∈ L∞(R^n) of positive type can be represented by a finite Borel measure α on [0, ∞) via (35). Using this expression, a constraint like (42) becomes ∫_0^∞ r(t) dα(t) ≥ 0, where r : [0, ∞) → R is the function such that

r(t) = Σ_{x,y∈U} Z(x, y) ∫_{B(x,δ)} ∫_{B(y,δ)} Ω_n(t‖x′ − y′‖) dx′ dy′;    (43)

note that r is continuous. The following lemma establishes two key properties of such a function r.

Lemma 10.2 If r is given as in (43), then r vanishes at infinity. If moreover n ≥ 2 and tr Z ≠ 0, then r(t) ≥ 0 for all large enough t.
Proof Let B be an open ball centered at the origin and fix z ∈ R^n. Let μ be the Haar measure on the orthogonal group O(n) ⊆ R^{n×n}, normalized so that the total measure is 1. Averaging over O(n) gives an expression for the double integrals appearing in (43) in terms of the Fourier transform of χ_{B×(z+B)}; the lemma will follow from this relation. First, it is immediate from this relation that r vanishes at infinity. Indeed, the Riemann–Lebesgue lemma [38, Theorem IX.7] says that the Fourier transform of the characteristic function vanishes at infinity (that is, as ‖u‖ → ∞) and so, since Z is a fixed matrix, r must vanish at infinity.
To see that r is nonnegative at infinity is only slightly more complicated. Since B is centered at the origin, χ̂_{B×B}(Tu, −Tu) = χ̂_{B×B}(u, −u) for all T ∈ O(n), so averaging over O(n) leaves the diagonal terms of (43) (those with x = y, for which z = 0) unchanged. Recall that Ω_n(0) = 1. Since n ≥ 2, the function Ω_n vanishes at infinity. Then, since tr Z ≠ 0, and hence tr Z > 0 as Z is copositive, using (44) it follows that for all large t the diagonal summands in (43) together dominate the off-diagonal ones. Now χ̂_{B×B}(u, −u) ≥ 0, as follows from the definition of the Fourier transform. So, since tr Z > 0, it follows that for all large enough t we have r(t) ≥ 0.
Say now R is any finite collection of functions r, each one defined in terms of a thick constraint as in (43), and let d_1, …, d_m be m distinct positive numbers. Consider the optimization problem (45). This problem is comparable to (38), but instead of using the alternative normalization of Sect. 7.2, the standard normalization is used, and instead of considering only distance 1 as a forbidden distance, the distances d_1, …, d_m are forbidden; this way we get an infinite-dimensional linear program instead of a semidefinite program. By construction, the optimal value of (45) is an upper bound for αδ(G(R^n, {d_1, …, d_m})).
A dual problem for (45) is problem (46) (cf. problem (39)). (Recall that Ω_n(0) = 1; hence the coefficient of z_i in the first constraint is 1.) Weak duality holds between (45) and (46): if λ, z, and y is any feasible solution of the dual problem and α is any feasible solution of the primal problem, then α({0}) ≤ λ; the proof of this fact is analogous to the proof of the weak duality relation between problems (38) and (39), given in Sect. 9.2. So any feasible solution λ, z, y of the dual provides an upper bound for the independence density, namely αδ(G(R^n, {d_1, …, d_m})) ≤ λ.

A sequence of primal problems
For each finite nonempty set U, the set T*(U), the tip of C*(U), is a compact convex set, and every copositive matrix is a nonnegative multiple of a matrix in the tip. There is then a countable dense subset T*_{ℵ0}(U) of T*(U), and we may assume that every Z ∈ T*_{ℵ0}(U) satisfies tr Z > 0 and ⟨J, Z⟩ > 0. If U ⊆ R^n is finite, then the set of constraints of the form (42) with Z ∈ T*_{ℵ0}(U) and δ = m(U)/(2k) for integer k ≥ 1 is countable. If we consider all finite subsets U of Q^n and all corresponding constraints, then the set of all constraints thus obtained is also countable. The corresponding functions (43) can be enumerated as r_1, r_2, ….
We use this enumeration to define a sequence of optimization problems, the N th one being problem (47), which is just problem (45) with R = {r_1, …, r_N}, m = 1, and d_1 = 1. Let ϑ_N denote both the N th optimization problem above and its optimal value, and denote by ϑ_∞ the optimization problem in which the constraints for all k ≥ 1 are added, as well as the optimal value of this problem. We know that ϑ_N ≥ αδ(G(R^n, {1})) for all N ≥ 1. By the construction of the functions r_k, using Lemma 10.1 and Theorem 6.3, we also know that ϑ_∞ = αδ(G(R^n, {1})).
Theorem 10.3 We have lim_{N→∞} ϑ_N = ϑ_∞.

Proof Since ϑ_N ≥ ϑ_{N+1} and ϑ_N ≥ ϑ_∞ for all N ≥ 1, the limit exists and is at least ϑ_∞; we now show the reverse inequality.
So let (α N ) be a sequence of measures such that α N is a feasible solution of ϑ N and α N ({0}) ≥ L for all N ≥ 1 and some L > 0. Each α N is a finite Radon measure (since [0, ∞) is a complete separable metric space), being therefore an element of the space M([0, ∞)) of signed Radon measures of bounded total variation. By the Riesz Representation Theorem [16,Theorem 7.17], the space M([0, ∞)) is the dual space of C 0 ([0, ∞)), which is the space of continuous functions vanishing at infinity equipped with the supremum norm.
To see (i), note first that α must be nonnegative. For suppose that α(X) < 0 for some Borel set X. Since α is Radon, it is inner regular on σ-finite sets [16, Proposition 7.5], so there is a compact set C ⊆ X such that α(C) < 0. For k ≥ 1, let U_k be the set of all points at distance less than 1/k from C; note that U_k is open and that C is the intersection of the U_k for k ≥ 1.
For every k ≥ 1, Urysohn's lemma says that there is a continuous function f_k: [0, ∞) → [0, 1] that is 1 on C and 0 outside of U_k, and since U_k is bounded this function vanishes at infinity. Now [f_k, α] → α(C) < 0 as k → ∞, so [f_k, α] < 0 for some k; since α is a weak* accumulation point of the sequence (α_N), for some N we must have [f_k, α_N] < 0, a contradiction since f_k ≥ 0 and α_N is nonnegative.
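On [0, ∞) such a function can even be written down explicitly, without appealing to Urysohn's lemma; the sketch below is our own illustration, with the arbitrary choices C = [2, 3] and k = 10, using f_k(t) = max(0, 1 − k · dist(t, C)), which is continuous, equals 1 on C, vanishes outside U_k, and (having compact support) vanishes at infinity.

```python
import numpy as np

# Explicit Urysohn-type function for C = [2, 3] inside [0, infinity):
# f_k(t) = max(0, 1 - k * dist(t, C)) is continuous, equals 1 on C, and
# is 0 outside U_k = {t : dist(t, C) < 1/k}; here k = 10, so 1/k = 0.1.
def dist_to_C(t, a=2.0, b=3.0):
    """Distance from t to the interval [a, b]."""
    return np.maximum(0.0, np.maximum(a - t, t - b))

def f(t, k=10):
    return np.maximum(0.0, 1.0 - k * dist_to_C(t))

t = np.linspace(0, 5, 501)
assert np.all(f(t[(t >= 2) & (t <= 3)]) == 1.0)   # equals 1 on C
assert np.all(f(t[dist_to_C(t) >= 0.1]) == 0.0)   # vanishes outside U_k
print("Urysohn-type function checks passed")
```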

A sequence of dual problems
Following (46), here is a dual problem for ϑ_N, namely problem (48). Weak duality holds between (48) and ϑ_N, but in this case we know even more: there is no duality gap between the primal and dual problems. Theorem 10.4 If n ≥ 2, then the optimal value of (48) is ϑ_N.
In Sect. 9.3 we saw how problem (39), which is similar to (48), is solved: we disregard all constraints for t > L for some L > 0, take a finite sample S of points in [0, L], and consider only constraints for t ∈ S. We then have a finite linear program, which can be solved by computer. Most likely, an optimal solution of this problem will be (slightly) infeasible for the original, infinite problem. However, the hope is that, if L is large enough and the sample S is fine enough, then the solution obtained from the discretized problem can be fixed to become a feasible solution of the original problem.
The proof of the above theorem follows the same strategy, but while in Sect. 9.3 we did not have to argue that this solution strategy always works (since we were only interested in having it work for the cases considered), here we have to. For that we need two lemmas, the first of which helps us find the number L. Lemma 10.5 If n ≥ 2 and t_0 > 0 is such that Ω_n(t_0) < 0 and r_k(t_0) ≥ 0 for k = 1, …, N, then the polyhedron in R^{N+2} consisting of all vectors (λ, z, y_1, …, y_N) satisfying the corresponding system of inequalities is bounded.

Proof The polyhedron given by the inequalities is bounded if and only if its recession cone is trivial, that is, if and only if the cone K generated by the normal vectors of the constraints is all of R^{N+2}.
Finally, for each k = 1, …, N, add to s_1 nonnegative multiples of l_2, −w_1, and e_i for i ≠ k, and rescale the result to see that −e_k ∈ K, finishing the proof that K = R^{N+2}.
The second lemma provides some crude bounds on the derivatives of the functions Ω_n and r_k, and will be used to decide how fine the sample S has to be. Compare this with the expression for Ω_{n+2} to get Ω_n′(t) = −(t/n) Ω_{n+2}(t).
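Numerically, the identity Ω_n′(t) = −(t/n) Ω_{n+2}(t) is easy to sanity-check; the sketch below assumes the standard normalization Ω_n(t) = Γ(n/2) (2/t)^{(n−2)/2} J_{(n−2)/2}(t) (so that Ω_n(0) = 1) and compares a central finite difference with the right-hand side.

```python
import numpy as np
from scipy.special import jv, gamma

def omega(n, t):
    """Omega_n(t) = Gamma(n/2) (2/t)^((n-2)/2) J_{(n-2)/2}(t), normalized so Omega_n(0) = 1."""
    nu = (n - 2) / 2
    return gamma(n / 2) * (2 / t) ** nu * jv(nu, t)

t = 2.345
# Sanity check: for n = 3 the formula collapses to sin(t)/t.
assert abs(omega(3, t) - np.sin(t) / t) < 1e-10

# Check Omega_n'(t) = -(t/n) Omega_{n+2}(t) via a central finite difference.
h = 1e-6
for n in (2, 3, 5, 8):
    fd = (omega(n, t + h) - omega(n, t - h)) / (2 * h)
    assert abs(fd + (t / n) * omega(n + 2, t)) < 1e-7
print("derivative identity verified")
```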
We now have everything needed to prove that there is no duality gap.
Let S ⊆ [0, L] be a finite set of points with the property that for every t ∈ [0, L] there is s ∈ S with |t − s| ≤ ε/(M D), and make sure that both t_0 and L are in S. Now consider the optimization problem

(52)  minimize λ
      subject to λ + z + Σ_{k=1}^N y_k r_k(0) ≥ 1,
                 λ + z Ω_n(t) + Σ_{k=1}^N y_k r_k(t) ≥ 0 for all t ∈ S,
                 −1 ≤ λ ≤ 2, y ≤ 0,

which is a finite linear program. Let λ, z, and y be an optimal solution of this problem and write g(t) = z Ω_n(t) + Σ_{k=1}^N y_k r_k(t).
Since λ + g(s) ≥ 0, we then have λ + g(t) ≥ −ε. The estimates (53) and (54) together show that λ + ε, z, and y is a feasible solution of (48). We now find a solution of ϑ_N, defined in (47), of value close to λ.
To do so, notice that if ε is small enough, then (53) implies in particular that λ > −1, for otherwise we would have λ + g(L) < 0, a contradiction. Since our solution is optimal, we must also have λ < 2 (notice that λ = 1, z = 0, and y = 0 is a feasible solution of our problem). Now problem (52) is a finite linear program, and we can apply the strong duality theorem. Its dual looks very much like problem ϑ_N, except that the measure α is now a discrete measure supported on S ∪ {0} and there are two extra variables corresponding to the constraints λ ≥ −1 and λ ≤ 2. Since our optimal solution of (52) satisfies −1 < λ < 2, complementary slackness implies that these two extra variables will be 0 in an optimal solution of the dual of (52). So if α is an optimal solution of the dual of (52), then it is also a feasible (though likely not optimal) solution of ϑ_N.
We then have a solution of ϑ_N of value λ and a feasible solution of (48) of value λ + ε. Letting ε approach 0, we obtain the theorem.
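The discretize-and-solve procedure behind this proof can be sketched in a few lines. The toy version below is our own illustration, not the paper's computation: it takes n = 2 (so Ω_2(t) = J_0(t), the Bessel function), drops all r_k constraints (that is, N = 0), and uses the arbitrary choices L = 20 and a uniform grid of 2001 points; the resulting stripped-down version of (52) recovers the well-known upper bound ≈ 0.287 for the independence density of the unit-distance graph in the plane.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.special import jv  # Bessel function of the first kind J_nu

# Toy discretization of (52) for n = 2 with no r_k constraints (N = 0):
#   minimize lambda
#   s.t. lambda + z >= 1                     (constraint at t = 0, Omega_2(0) = 1)
#        lambda + z * J_0(t) >= 0            for all t in the finite grid S
#        -1 <= lambda <= 2                   (z is left free, as in the sketch)
L = 20.0
S = np.linspace(0.0, L, 2001)

# Variables x = (lambda, z); linprog solves min c.x subject to A_ub @ x <= b_ub.
c = [1.0, 0.0]
A_ub = [[-1.0, -1.0]]                 # -(lambda + z) <= -1
b_ub = [-1.0]
for t in S:
    A_ub.append([-1.0, -jv(0, t)])   # -(lambda + z * J_0(t)) <= 0
    b_ub.append(0.0)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(-1, 2), (None, None)])
# Analytically the optimum is -min J_0 / (1 - min J_0), roughly 0.287.
print(res.x[0])
```

Any feasible solution of the discretized problem must still be repaired into a feasible solution of the infinite problem, exactly as in the proof above; the sketch only illustrates the finite LP step.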

Asymptotics for many distances
The theorem below implies the '≤' direction of Bukh's result (41). The reverse inequality is much simpler to prove; the reader is referred to Bukh's paper [6]. Proof All the ideas required for the proof can be presented more clearly when only two distances are considered; for larger values of m one only has to use induction.

Computability of the independence density
The sequence of dual problems of Sect. 10.3 can be used to construct a Turing machine that computes the independence density of the unit-distance graph up to any prescribed precision. Here is a brief sketch of the idea.
First we describe a Turing machine that computes an increasing sequence of lower bounds for the independence density that come arbitrarily close to it.
Given T > 0 and an integer N ≥ 1, let P_{T,N} be the partition of [−T, T)^n consisting of all half-open cubes C_1 × · · · × C_n whose sides are intervals of the form [−T + 2T a/N, −T + 2T (a + 1)/N) for integers 0 ≤ a < N. For each such partition let G_{T,N} be the graph whose vertex set is P_{T,N} and in which two vertices X, Y are adjacent if and only if there are x ∈ X and y ∈ Y such that ‖x − y‖ = 1. Given T and N, the finite graph G_{T,N} can be computed by a Turing machine.
By construction, if I is an independent set of G_{T,N}, then the union of all X ∈ I is an independent set of the unit-distance graph with measure |I| vol([0, 2T/N]^n); this gives the lower bound (55) for the independence density. We know from Sect. 6.1 that periodic independent sets can come arbitrarily close to the independence density. It is then not hard to show that, by taking larger and larger T and larger and larger N, the above construction generates lower bounds for the independence density that come arbitrarily close to it. So our Turing machine simply fixes an enumeration (T_1, N_1), (T_2, N_2), … of (N \ {0})^2, computes the independence number of G_{T_i,N_i} for all i, uses (55) to get a lower bound, and outputs at each step the best lower bound found so far.
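For intuition, the construction of G_{T,N} can be carried out by hand in dimension one; the sketch below is our own toy illustration (n = 1, T = 1, N = 8, purely illustrative since the interesting cases start at n = 2). Two half-open cells of length h = 2T/N contain points at distance exactly 1 precisely when the difference d of their left endpoints satisfies |d − 1| < h or |d + 1| < h, which is the adjacency test used in the code.

```python
from itertools import combinations

def grid_graph_1d(T, N):
    """Vertices: the N cells [-T + k*h, -T + (k+1)*h), h = 2T/N, partitioning [-T, T).
    Cells i < j are adjacent iff they contain points x, y with |x - y| = 1,
    i.e. iff 1 lies in the open interval (d - h, d + h), d = (j - i) * h."""
    h = 2 * T / N
    edges = set()
    for i, j in combinations(range(N), 2):
        d = (j - i) * h
        if abs(d - 1) < h or abs(d + 1) < h:
            edges.add((i, j))
    return edges

def independence_number(N, edges):
    """Brute-force maximum independent set (fine for tiny N)."""
    best = 0
    for mask in range(1 << N):
        cells = [v for v in range(N) if mask >> v & 1]
        if all((i, j) not in edges for i, j in combinations(cells, 2)):
            best = max(best, len(cells))
    return best

edges = grid_graph_1d(T=1.0, N=8)     # 8 cells of length 1/4 covering [-1, 1)
alpha = independence_number(8, edges)  # here the edges form a perfect matching
print(alpha, alpha / 8)                # 4 independent cells: fraction 1/2 of [-1, 1)
```

The bound (55), not reproduced here, is what converts such an independent set of cells into a lower bound on the independence density of the unit-distance graph.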
Let us now see how to construct a Turing machine that computes a decreasing sequence of upper bounds for the independence density that come arbitrarily close to it.
The idea is to find, at the N th step, a feasible solution of the dual (48) of ϑ_N with value at most ϑ_N + 1/N. This we do by mimicking the proof of Theorem 10.4: we disregard the constraints for t ≥ L for some large L and we discretize the interval [0, L]. Following the proof of the theorem, one sees that it is possible to estimate algorithmically how large L has to be and how fine the discretization has to be so that we obtain a feasible solution of value at most ϑ_N + 1/N.
One problem now is that we have to work with rational numbers instead of real numbers. The Bessel function and all integrals involved have to be approximated by rationals, which can be done to any desired precision algorithmically. In the end, however, we are not solving the original dual problem, but an approximated version of it. Why, then, is the solution of this approximated version close to the solution of the original version, provided the approximation is good enough? Such a result, related to what is known in linear programming as sensitivity analysis, follows from Lemma 10.5: we work with problems having a bounded feasible region, so there is a universal upper bound on the magnitude of any number appearing in any feasible solution, and it is possible to show that if the input data approximates the real data well enough, then the solutions will be very close together; moreover, it is possible to estimate how good the approximation has to be.
Another problem is to show that the set {r_1, r_2, …} can be enumerated by a Turing machine. The only difficulty here is how to enumerate the set T*_{ℵ0}(U) for a finite set U. One way to do it is as follows. First, note that T*(U) is a subset of the L1 unit ball in R^{U×U}. Given ε > 0, consider a finite ε-net N for this unit ball. Let now N′ be a finite set containing, for each A ∈ N, a matrix B ∈ T*(U) with ‖B‖_1 ≤ 1 such that ‖A − B‖_1 ≤ ε, if such a matrix exists. Then, since N is an ε-net, for every Z ∈ T*(U) there is B ∈ N′ such that ‖Z − B‖_1 ≤ 2ε. So we may take for T*_{ℵ0}(U) the union of the sets N′ obtained for ε = 1/k, k ≥ 1.
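The ε-net itself is elementary to construct; the sketch below is our own illustration for |U| = 2 (so matrices live in R^{2×2}): rounding each entry to the nearest multiple of δ = ε/|U|² moves any matrix in the L1 unit ball by at most ε/2 in the L1 norm, so the finitely many grid matrices of norm at most 1 + ε/2 form an ε-net for the ball.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1
d = 4               # number of entries of a matrix in R^{U x U} with |U| = 2
delta = eps / d     # grid spacing: entries are integer multiples of delta

# Any Z in the L1 unit ball is within eps/2 (in L1) of its entrywise rounding:
# each entry moves by at most delta/2, so the total L1 error is at most
# d * delta / 2 = eps / 2.  The finitely many grid matrices of L1 norm at
# most 1 + eps/2 therefore form a finite eps-net.
for _ in range(1000):
    Z = rng.standard_normal((2, 2))
    Z /= max(1.0, np.abs(Z).sum())       # project into the L1 unit ball
    A = np.round(Z / delta) * delta      # nearest grid matrix
    assert np.abs(Z - A).sum() <= d * delta / 2 + 1e-12
print("rounding error bounded by eps/2 =", d * delta / 2)
```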
It only remains to show how, given a matrix A in the ε-net, a matrix B as above can be computed. We want to solve the following finite-dimensional optimization problem: minimize ‖A − B‖_1 subject to B ∈ T*(U) and ‖B‖_1 ≤ 1. The L1 norms can be equivalently rewritten using linear constraints, so the above problem is a conic program that can be solved with the ellipsoid method (the separation problem for the copositive cone is NP-hard, as follows from the equivalence between separation and optimization [19], but in this case we do not care about efficiency: it is enough to have a separation algorithm for the copositive cone, and we do [18]). By solving this problem repeatedly one can construct the required finite sets.

So we have two Turing machines: one finds better and better lower bounds, the other better and better upper bounds. Running the two alternately, one constructs a third Turing machine that, given ε > 0, stops when the best lower bound found is ε-close to the best upper bound found.