Asymptotic optimality of the triangular lattice for a class of optimal location problems

We prove an asymptotic crystallization result in two dimensions for a class of nonlocal particle systems. To be precise, we consider the best approximation with respect to the 2-Wasserstein metric of a given absolutely continuous probability measure $f \mathrm{d}x$ by a discrete probability measure $\sum_i m_i \delta_{z_i}$, subject to a constraint on the particle sizes $m_i$. The locations $z_i$ of the particles, their sizes $m_i$, and the number of particles are all unknowns of the problem. We study a one-parameter family of constraints. This is an example of an optimal location problem (or an optimal sampling or quantization problem) and it has applications in economics, signal compression, and numerical integration. We establish the asymptotic minimum value of the (rescaled) approximation error as the number of particles goes to infinity. In particular, we show that for the constrained best approximation of the Lebesgue measure by a discrete measure, the discrete measure whose support is a triangular lattice is asymptotically optimal. In addition, we prove an analogous result for a problem where the constraint is replaced by a penalization. These results can also be viewed as the asymptotic optimality of the hexagonal tiling for an optimal partitioning problem. They generalise the crystallization result of Bourne, Peletier and Theil (Communications in Mathematical Physics, 2014) from a single particle system to a class of particle systems, and prove a case of a conjecture by Bouchitt\'{e}, Jimenez and Mahadevan (Journal de Math\'ematiques Pures et Appliqu\'ees, 2011). Finally, we prove a crystallization result which states that optimal configurations with energy close to that of a triangular lattice are geometrically close to a triangular lattice.


Introduction
Consider the problem of approximating an absolutely continuous probability measure by a discrete probability measure. To quantify the quality of the approximation, we measure the approximation error in the 2-Wasserstein metric. Let Ω ⊂ R^d be the closure of an open and bounded set, and let f : Ω → [0, ∞) be the density of the absolutely continuous probability measure. We approximate f dx by a discrete measure from the set

P_d(Ω) := { Σ_{i=1}^N m_i δ_{z_i} : N ∈ N, m_i > 0, Σ_{i=1}^N m_i = 1, z_i ∈ Ω }.

For µ ∈ P_d(Ω), we define N_µ := #supp(µ), which is not fixed a priori. For p ≥ 1, the p-Wasserstein distance (see [65,71]) between f dx and µ ∈ P_d(Ω) is

W_p(f, µ) := ( min { ∫_Ω |x − T(x)|^p f(x) dx : T : Ω → Ω is Borel, T_#(f dx) = µ } )^{1/p}.   (2)

Observe that inf{W_p(f, µ) : µ ∈ P_d(Ω)} = 0, since there exists a sequence of discrete measures µ_n converging weakly* to f dx, with N_{µ_n} → ∞ as n → ∞. On the other hand, for each N ∈ N, inf{W_p(f, µ) : µ ∈ P_d(Ω), N_µ ≤ N} > 0. Therefore the problem inf{W_p(f, µ) : µ ∈ P_d(Ω)} has no solution. To obtain a minimizer we must constrain the number of atoms N_µ, either explicitly (with a constraint) or implicitly (with a penalization). Given an entropy H : P_d(Ω) → [0, ∞] (defined below), we consider the constrained optimal location problem

inf { W_p^p(f, µ) : µ ∈ P_d(Ω), H(µ) ≤ L },   (3)

where L > 0, and the penalized optimal location problem

inf { W_p^p(f, µ) + δ H(µ) : µ ∈ P_d(Ω) },   (4)

where δ > 0. If H satisfies H(µ) → ∞ as N_µ → ∞, then minimising sequences for problems (3) and (4) have a uniformly bounded number of atoms. If in addition H is lower semi-continuous with respect to the weak* convergence of measures, then problems (3) and (4) admit a solution.
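These definitions can be explored numerically. The following sketch (ours, not from the paper; the function name, the quadrature grid and the point configurations are illustrative assumptions) uses the standard fact that, for fixed locations z_i and freely chosen masses m_i, the optimal transport map sends each x to its nearest atom, so the best achievable W_2² error is the mean squared distance from x to the support of µ.

```python
import numpy as np

def quantization_error(points, n_grid=400):
    """Best W_2^2 error for approximating the uniform density on [0,1]^2
    by a discrete measure supported on `points`, when the masses m_i are
    chosen optimally: the optimal transport map then sends each x to its
    nearest atom, so the error is the mean squared distance to the support."""
    xs = (np.arange(n_grid) + 0.5) / n_grid          # midpoint quadrature nodes
    X, Y = np.meshgrid(xs, xs)
    grid = np.stack([X.ravel(), Y.ravel()], axis=1)
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()                     # each node has weight 1/n_grid^2

one_atom = quantization_error(np.array([[0.5, 0.5]]))   # exact value is 1/6
four_atoms = quantization_error(
    np.array([[0.25, 0.25], [0.25, 0.75], [0.75, 0.25], [0.75, 0.75]]))
assert four_atoms < one_atom   # more atoms give a better approximation
```

For a single atom at the centre of the unit square the error is ∫_Ω |x − z|² dx = 1/6, and adding atoms drives the error towards 0, in line with inf{W_p(f, µ) : µ ∈ P_d(Ω)} = 0 above.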
When L or δ are fixed, the geometry of the set Ω has a strong effect on optimal particle arrangements, and it is very difficult to characterise minimising configurations. As L increases, or δ decreases, the optimal number of particles N µ increases, and it is believed that optimal configurations locally form regular, periodic patterns; see the numerical evidence in Figures 1  and 2. This phenomenon is known as crystallization (see Section 1.3 for more on this). The specific geometry of these patterns depends on the choice of p in the Wasserstein distance, the choice of H, and the dimension d. In this paper we will study the crystallization problem by taking the limits L → ∞ and δ → 0.
For the entropy

H(µ) = #supp(µ),   (5)

Zador's Theorem for the asymptotic quantization error states that

lim_{N→∞} N^{p/d} inf { W_p^p(f, µ) : µ ∈ P_d(Ω), N_µ ≤ N } = C_{p,d} ( ∫_Ω f^{d/(d+p)} dx )^{(d+p)/d}   (6)

for some positive constant C_{p,d} that is independent of the density f. See for example [20,41,45,74], and see [44,51,53] for the more general case where Ω is a Riemannian manifold. The constant C_{p,d} is known in two dimensions:

C_{p,2} = ∫_{P_6} |z|^p dz,   (7)

where P_6 is a regular hexagon of unit area centred at the origin. This follows from Fejes Tóth's Theorem on Sums of Moments (see [37,43]), which has also been proved in various levels of generality by several other authors including [13,36,59,61].

Figure 1. Numerical minimizers of the penalized problem (4). The polygons are the sets T^{−1}({z_i}), where T is the optimal transport map in (2). The particles z_i are located at the centroids of the polygons. The masses m_i are the areas of the polygons. The colours correspond to the number of sides: squares are yellow, pentagons are orange, hexagons are blue, and heptagons are red. For each value of α, a hexagonal tiling (with defects) starts to emerge as δ is decreased. This figure, Figure 2 and Table 1 were made by Steven Roper using the generalized Lloyd algorithm from [19]. To search for a global minimizer in the highly non-convex energy landscape, the algorithm was run many times using different, randomly generated initial conditions. The values of δ were chosen by first choosing a target value of N_µ and then using the heuristic (16) to generate the corresponding δ. Better results, without defects, can be achieved by taking the initial particle locations to be a perturbation of a triangular lattice; see Figure 2.

Figure 2. Numerical minimizers for the same parameter values as in Figure 2 (middle row, middle and right columns). See the caption to Figure 1 for a description of the polygons and the colour scheme. These configurations have lower energy (W_2²(f, µ) + δH_α(µ)) than the corresponding configurations shown in Figure 1, and they do not have defects. This figure was generated by Steven Roper using the generalized Lloyd algorithm from [19] and by taking the initial conditions to be perturbations of a triangular lattice. In Figure 1 (middle row, right column) there are N_µ = 200 particles, whereas in this figure (right) there are N_µ = 202 particles; the algorithm [19] attempts to find the optimum number of particles.
The geometric interpretation of (6) and (7) is the following: In two dimensions it is asymptotically optimal to arrange the atoms of the discrete measure at the centres of regular hexagons, i.e., on a regular triangular lattice, where the areas of the hexagons depend on the density f. Locally, where f is approximately constant, these hexagons form a regular honeycomb. By the regular triangular lattice we mean the set Z(1, 0) ⊕ Z(1/2, √3/2) up to dilation and isometry. See Remark 1.3 below for more on this geometric interpretation.
Bouchitté, Jimenez and Mahadevan [14] proved an analogue (9) of Zador's Theorem for the constrained problem with the entropy H_α(µ) := Σ_i m_i^α, α ∈ (−∞, 1), involving a constant C_{p,d}(α), and conjectured its value for p = d = 2. In particular, the conjecture for the case p = 2, d = 2 is

C_{2,2}(α) = c_6 := ∫_{P_6} |z|² dz for all α ∈ (−∞, 1),   (10)

where P_6 is a regular hexagon of unit area centred at the origin. It is known that C_{2,2}(α) = c_6 for all α ∈ (−∞, 0] (see [14, Section 3.6]), and so it remains to establish the conjecture for the case α ∈ (0, 1). The conjecture would mean that in two dimensions a discrete measure supported on a regular triangular lattice gives asymptotically the best constrained approximation of the Lebesgue measure (again, see Remark 1.3 below for this geometric interpretation).
1.1. Main results. In this paper we prove conjecture (10) for all α ∈ (−∞, ᾱ], where ᾱ = 0.583; see Theorem 1.2. The conjecture for α ∈ (ᾱ, 1) remains open, although we suggest a direction for proving it in Theorem 6.1, where we prove it under an additional assumption. In Theorem 1.1 we prove an analogous asymptotic quantization formula for the penalized optimal location problem (4) for all α ∈ (−∞, ᾱ]. This generalises the crystallization result of [18], where Theorem 1.1 was proved for the special case α = 0.5, f = 1. Moreover, for the case f = 1, we prove that minimal configurations are 'asymptotically approximately' a triangular lattice; see Theorem 1.4. To be more precise, we prove that, as δ → 0, rescaled minimal configurations for the penalized quantization problem are quantitatively close to a triangular lattice. This result will be proved for the case α = ᾱ. The proof can be easily modified for any α ≤ ᾱ. Define the constrained optimal quantization error by

m_c(α, L) := inf { W_2²(f, µ) : µ ∈ P_d(Ω), Σ_i m_i^α ≤ L },   (11)

and the penalized optimal quantization error by

m_p(α, δ) := inf { W_2²(f, µ) + δ Σ_i m_i^α : µ ∈ P_d(Ω) }.   (12)

Since the Wasserstein distance on the compact set Ω metrizes the tight convergence of probability measures, and the map µ → Σ_i m_i^α is lower semi-continuous with respect to this convergence [65, Lemma 7.11], both infima above are attained. Our main results are the following.

Table 1. The ratio of δ^{−1/(2−α)} m_p(α, δ) (which appears on the left-hand side of (13)) to the constant on the right-hand side of (13), which involves (2−α)/(1−α) and c_6. For three representative values of α, we see that the ratio tends to 1 as δ tends to 0, which supports Theorems 1.1 and 6.1. The numerical approximation of the minimum energy and the choice of δ is described in Figures 1 and 2.
By comparing equation (9) to equation (14) with p = d = 2, we read off that C_{2,2}(α) = c_6 for all α ∈ (−∞, ᾱ], which proves conjecture (10) for this range of α. We believe that Theorem 1.1 and Theorem 1.2 hold for all α ∈ (−∞, 1), not just for α ∈ (−∞, ᾱ], but we are only able to prove them for the whole range of α if we make an ansatz about minimal configurations; see Theorem 6.1.

Remark 1.3 (Heuristics behind the rescalings). Let f = |Ω|^{−1} ½_Ω be the uniform probability distribution on Ω. Here ½_Ω denotes the characteristic function of the set Ω. By definition of c_6 (equation (10)) and a change of variables, if Ω is tiled by N regular hexagons H_i of area 1/N centred at z_i, then ∫_{H_i} |x − z_i|² dx = c_6/N² for all i. Therefore the penalized quantization error of approximating f dx by µ = Σ_{i=1}^N N^{−1} δ_{z_i} is approximately (taking |Ω| = 1 for simplicity)

W_2²(f, µ) + δ Σ_i m_i^α ≈ c_6 N^{−1} + δ N^{1−α}.   (15)

The right-hand side of (15) is minimized when

N = ( c_6 / ((1 − α)δ) )^{1/(2−α)}.   (16)

Substituting this value of N into (15) (assuming for a moment that it is an integer) gives

m_p(α, δ) ⪅ ((2 − α)/(1 − α)) c_6 N^{−1} = (2 − α)(1 − α)^{(α−1)/(2−α)} c_6^{(1−α)/(2−α)} δ^{1/(2−α)},

which motivates the rescaling used in (13). This heuristic computation suggests an upper bound for the left-hand side of (13), for the case where f is the uniform distribution. Theorem 1.1 says that this upper bound is in fact asymptotically optimal. In this sense we can say that the honeycomb structure gives asymptotically the best approximation of the uniform distribution. The rescaling used in (14) can be derived in a similar way. Indeed, fix L > 0 and consider the constraint

Σ_i m_i^α ≤ L.

If all the masses are the same, m_i = 1/N_µ for all i, then the biggest number N_µ for which this constraint is satisfied is N_µ = L^{1/(1−α)}. Assuming that N_µ is an integer, take µ as above, supported at the centres of N_µ regular hexagons of area 1/N_µ, so that

W_2²(f, µ) ≈ c_6 N_µ^{−1} = c_6 L^{−1/(1−α)},

which motivates the rescaling used in (14). Combining this formal calculation with Theorem 1.2 again suggests the asymptotic optimality of the honeycomb. Theorem 1.1 gives the asymptotic minimum value of the penalized quantization error but says nothing about the configuration of the particles; it says that the triangular lattice is asymptotically optimal, but it does not say that asymptotically optimal configurations are close to a triangular lattice. We prove this in the following theorem.
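The heuristic minimization in Remark 1.3 is elementary and easy to verify numerically. The sketch below is ours, not the paper's: the values of α and δ are illustrative, and c_6 = 5/(18√3) is the second moment of a regular hexagon of unit area. It checks that the closed-form critical point (16) minimizes the right-hand side of (15) and that the minimum value equals ((2 − α)/(1 − α)) c_6/N.

```python
import numpy as np

c6 = 5 / (18 * np.sqrt(3))     # second moment of a regular hexagon of unit area
alpha, delta = 0.583, 1e-8     # illustrative parameter values

def E(N):
    """Right-hand side of (15): transport cost of N hexagonal cells of area
    1/N plus the entropy penalty delta * N^(1 - alpha)."""
    return c6 / N + delta * N ** (1 - alpha)

# Closed-form critical point, as in (16):
N_star = (c6 / ((1 - alpha) * delta)) ** (1 / (2 - alpha))

# It is a minimum ...
assert E(N_star) <= E(0.9 * N_star) and E(N_star) <= E(1.1 * N_star)
# ... and the minimum value has the predicted form ((2-alpha)/(1-alpha)) c6/N:
predicted = (2 - alpha) / (1 - alpha) * c6 / N_star
assert abs(E(N_star) - predicted) <= 1e-12 * predicted
```

The same two lines of calculus give the L-scaling for the constrained problem, since there the entropy constraint simply fixes the largest admissible number of equal-mass particles.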
Theorem 1.4 (Asymptotically optimal configurations are close to a regular triangular lattice).
Let Ω ⊂ R² be a convex polygon with at most six sides, |Ω| = 1, f = ½_Ω, and α = ᾱ. There exist constants ε_0, c, β_1, β_2 > 0 with the following property. Let δ > 0 and let µ_δ = Σ_{i=1}^{N_δ} m_i δ_{z_i} ∈ P_d(Ω) be a solution of the penalized quantization problem defining m_p(ᾱ, δ). Define the defect d(µ_δ) of µ_δ to be the difference between the rescaled energy of µ_δ and its asymptotic minimum value. Note that lim_{δ→0} d(µ_δ) = 0 by Theorem 1.1. Let V_{δ,ᾱ} denote the rescaling factor introduced in Section 4.1, and let V_i denote the Voronoi cell of the rescaled particle V_{δ,ᾱ}^{1/2} z_i in the Voronoi tessellation of V_{δ,ᾱ}^{1/2} Ω generated by the rescaled particles.

(a) The optimal number of particles N_δ is asymptotically equal to V_{δ,ᾱ}:

lim_{δ→0} N_δ / V_{δ,ᾱ} = 1.

(b) If δ > 0 is sufficiently small, and if ε ∈ (0, ε_0) and µ_δ satisfy d(µ_δ) ≤ ε, then, with the possible exception of at most N_δ c ε^{1/3} indices i ∈ {1, . . . , N_δ}, the following hold: (i) V_i is a hexagon; (ii) the distance between z_i and each vertex of V_i lies between two explicit bounds (involving β_1 and ε) that converge to the circumradius of a regular hexagon of unit area; (iii) the distance between z_i and each edge of V_i lies between two explicit bounds (involving β_2 and ε) that converge to the apothem of a regular hexagon of unit area.

Even though Theorem 1.4 is stated only for the case α = ᾱ, the same proof holds for any α ≤ ᾱ, up to proving the convexity inequality (25) for that specific value of α (by using the same strategy we used for the case α = ᾱ). A similar result can be proved for the constrained quantization problem.

Remark 1.5 (Geometric interpretation of Theorem 1.4). Note that the term (17) converges to 0 as δ → 0. Theorem 1.4 essentially states that if the defect d(µ_δ) is small, then the support of µ_δ is close to a regular triangular lattice, and it quantifies how close. Note that the Voronoi tessellation generated by the regular triangular lattice is a regular hexagonal tessellation. The theorem states that the Voronoi tessellation of V_{δ,ᾱ}^{1/2} Ω generated by the rescaled particles z_i is close to a regular hexagonal tessellation in the sense that, except for at most N_δ c ε^{1/3} Voronoi cells, the Voronoi cells are hexagons, and it quantifies how far the hexagons are from being regular.
For a regular hexagon of area V_{δ,ᾱ}/N_δ, the distance between the centre of the hexagon and each vertex is (2V_{δ,ᾱ}/(3√3 N_δ))^{1/2}, and the distance between the centre of the hexagon and each edge is (V_{δ,ᾱ}/(2√3 N_δ))^{1/2}. Since V_{δ,ᾱ}/N_δ → 1 as δ → 0, 'most' of the rescaled Voronoi cells V_i are 'close' to a regular hexagon of area 1.

Remark 1.6 (Locality and weaker assumptions on f). Theorems 1.1 and 1.2 say that the quantization problems are essentially independent of f, in the sense that the optimal constants ((2−α)/(1−α)) c_6 and c_6 are independent of f and are determined by the corresponding quantization problems with f = 1; see Remarks 3.5 and 3.11. The locality of the quantization problems is independent of the crystallization and is easier to prove. The locality for the constrained problem was proved in [14], and the locality for the penalised problem follows easily from this, as we shall see in Section 3.2. Locality results for the classical quantization problem were proved, among others, by [20,45,53,74]. We believe that the assumption of lower semi-continuity on f in Theorems 1.1 and 1.2 could be relaxed by using the approach in [60], where a locality result is proved for the related irrigation problem, which concerns the best approximation of an absolutely continuous probability measure by a one-dimensional Hausdorff measure supported on a curve.

Remark 1.7 (α ≥ 1). For α ≥ 1, the constrained and penalized quantization problems m_c(α, L) and m_p(α, δ) do not have a minimizer: the Wasserstein distance can be sent to zero by sending the number of particles to infinity without increasing the entropy (for α = 1 the entropy is constant, and for α > 1 it too tends to zero). In [14] the authors therefore considered a modified constraint; see [14, Proposition 3.11(iii), Remark 3.13(iii)]. For α > 2, C_{2,2}(α) = 0. For α ∈ (1, 2), C_{2,2}(α) is not known, but it satisfies explicit upper and lower bounds involving the ball B of unit area centred at the origin [14, Lemma 3.10].
Remark 1.8 (Choice of entropy). (i) The constrained and penalized problems admit a solution for any entropy of the form H(µ) = Σ_i h(m_i), where h is lower semi-continuous, subadditive and lim_{t→0+} h(t)/t = +∞; see [65, Lemma 7.11]. This includes our entropy h(m) = m^α. There is evidence, however, that crystallization does not hold for all entropies in this class, or at least that optimal configurations consist of particles of different sizes; see [14, Section 3.4]. In this paper we have found a subclass for which crystallization holds. It is an open problem to find the largest class of such entropies.
(ii) Functionals of the form µ → W_2²(f, µ) + δ Σ_i h(m_i) arise in models of economic planning; see [24]. For example, consider the problem of the optimal location of warehouses in a county Ω with population density f. The measure µ = Σ_i m_i δ_{z_i} represents the locations z_i and sizes m_i of the warehouses. The Wasserstein term in the functional above penalizes the average distance between the population and the warehouses, and the entropy term penalizes the building or running costs of the warehouses. The subadditive nature of the entropy H_α corresponds to an economy of scale, where it is cheaper to build one warehouse of size m than two of size m/2. (iii) The special case α = 0.5 arises in a simplified model of a two-phase fluid, namely a diblock copolymer melt, in two dimensions; see [17]. Here the entropy √m corresponds to the interfacial length between a droplet of one phase of area m and the surrounding, dominant phase. (iv) Finally, from a mathematical perspective, we were inspired to study the entropy H_α by the conjecture of Bouchitté et al. [14, Section 3.6 (ii)].

1.2. Sketch of the proofs of Theorems 1.1 & 1.2. We briefly present the main ideas of the paper. We will see that Theorem 1.2 is an easy consequence of Theorem 1.1 (see Section 5), and so here we just focus on the ideas behind the proof of Theorem 1.1. The strategy for proving Theorem 1.4 is discussed in Section 7.
Next we prove a monotonicity result (Lemma 3.12), analogous to a monotonicity result proved in [14] for the constrained quantization problem, which asserts that if Theorem 1.1 holds for some α ∈ (−∞, 1), then it holds for all α' ∈ (−∞, α]. Therefore we only need to prove Theorem 1.1 for the single value α = ᾱ = 0.583, and for the rest of the paper we can take Ω = Q, f = ½_Ω, α = ᾱ without loss of generality. From the definition of the Wasserstein distance, equation (2),

W_2²(½_Q, µ) = Σ_{i=1}^{N_µ} ∫_{T^{−1}({z_i})} |x − z_i|² dx,   (20)

where T is the optimal transport map. Since Q is a polygonal set, it is well known (see Lemma 2.1) that the sets T^{−1}({z_i}) are convex polygons, called Laguerre cells.
A classical result by Fejes Tóth (see Lemma 2.3) states that the second moment of a polygon about any point in the plane is greater than or equal to the second moment of a regular polygon (with the same area and same number of edges) about its centre of mass:

∫_{P(m,n)} |x − z|² dx ≥ ∫_{R(m,n)} |x|² dx = c_n m²,   (21)

where P(m, n) is a polygon with area m and n edges, R(m, n) is a regular polygon centred at the origin with area m and n edges, and z ∈ R². Combining (20) and (21) gives

W_2²(½_Ω, µ) ≥ Σ_{i=1}^{N_µ} c_{n_i} m_i²   (22)

for all µ ∈ P_d(Ω), where n_i denotes the number of edges of the polygon T^{−1}({z_i}) and m_i denotes its area. Our proofs are limited to the p-Wasserstein metric with p = 2 since, for p ≠ 2, the transport regions T^{−1}({z_i}) are not convex polygons. Moreover, our proofs are limited to two dimensions since there is no equivalent statement of Fejes Tóth's Moment Lemma in higher dimensions (due to the lack of regular polytopes in higher dimensions). Next we recall the proof of Theorem 1.2 due to Gruber [43] for the case α = 0, which we will adapt to prove Theorem 1.1 (and consequently Theorem 1.2) for general α ∈ (−∞, ᾱ]. It can be shown that the function (m, n) → g(m, n) := m² c_n is convex. (Note that n → c_n can be extended from a function on N ∩ [3, ∞) to a function on [3, ∞); see Lemma 2.3.) If µ is a minimizer of W_2²(½_Ω, ·) subject to the constraint N_µ ≤ L, then clearly N_µ = L (assuming that L is an integer) since we get the best constrained approximation of ½_Ω by taking as many Dirac masses as possible. By convexity,

g(m, n) ≥ g(1/L, 6) + ∂_m g(1/L, 6)(m − 1/L) + (κ/L²)(n − 6),   (23)

where κ := ∂_n g(1, 6) = ∂_n c_n |_{n=6} < 0. Combining equations (22) and (23) gives

W_2²(½_Ω, µ) ≥ Σ_{i=1}^L g(m_i, n_i) ≥ c_6/L + (κ/L²) Σ_{i=1}^L (n_i − 6),   (24)

since Σ_i m_i = |Ω| = 1. Euler's formula for planar graphs implies that the average number of edges in any partition of the unit square Ω by convex polygons is less than or equal to 6: (1/L) Σ_i n_i ≤ 6; see Lemma 2.6. Therefore, by equation (24) and since κ < 0,

W_2²(½_Ω, µ) ≥ c_6/L.

This is the lower bound in Theorem 1.2 for the case α = 0.
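Gruber's argument rests on two computable ingredients: the explicit formula for c_n (Lemma 2.3) and the convexity of (m, n) → m² c_n, together with κ = ∂_n c_n|_{n=6} < 0. Both can be checked numerically; in the sketch below (ours; the sample grid and the finite-difference step are arbitrary choices) we test positive semi-definiteness of the finite-difference Hessian of g at a grid of points.

```python
import numpy as np

def c(n):
    """c_n = (1/(2n)) (cot(pi/n) + tan(pi/n)/3): minimal second moment of a
    convex n-gon of unit area (Lemma 2.3), extended to real n >= 3."""
    t = np.tan(np.pi / n)
    return (1.0 / t + t / 3.0) / (2.0 * n)

def g(m, n):
    """The function g(m, n) = m^2 c_n appearing in Gruber's argument."""
    return m * m * c(n)

def hessian(m, n, h=1e-3):
    """Finite-difference Hessian of g at (m, n)."""
    gmm = (g(m + h, n) - 2 * g(m, n) + g(m - h, n)) / h ** 2
    gnn = (g(m, n + h) - 2 * g(m, n) + g(m, n - h)) / h ** 2
    gmn = (g(m + h, n + h) - g(m + h, n - h)
           - g(m - h, n + h) + g(m - h, n - h)) / (4 * h ** 2)
    return gmm, gmn, gnn

kappa = (c(6 + 1e-6) - c(6 - 1e-6)) / 2e-6   # kappa = d c_n / dn at n = 6
assert kappa < 0                             # c_n is decreasing at n = 6

for m in [0.5, 1.0, 2.0, 5.0]:
    for n in [3.5, 4.0, 6.0, 9.0, 20.0]:
        gmm, gmn, gnn = hessian(m, n)
        # a symmetric 2x2 matrix is PSD iff gmm >= 0 and its determinant >= 0
        assert gmm > 0 and gmm * gnn - gmn ** 2 > -1e-9
```

As sanity checks of the formula itself, c_4 = 1/6 (the unit square) and c_6 = 5/(18√3) ≈ 0.1604, the constant in (10).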
A matching upper bound can be obtained in the limit L → ∞ by taking µ to be supported on (a suitable truncation of) a regular triangular lattice, with equal masses 1/L. In [17] Gruber's strategy was generalized to prove Theorem 1.1 for the case α = 0.5 and f = 1. Thanks to the results of [14] and our results in Section 3.2, it follows that Theorem 1.1 holds for all α ∈ (−∞, 0.5] and all lower semi-continuous f satisfying (1). In this paper we extend these ideas further to prove Theorem 1.1 for the case α = ᾱ = 0.583, and hence for all α ∈ (−∞, 0.583]. First of all, we rescale the square Q as follows (see Remark 1.3): Q_{δ,α} := V_{δ,α}^{1/2} Q. The rescaling factor V_{δ,α} is chosen in such a way that a discrete measure supported at the centres of regular hexagons of unit area is asymptotically optimal. Up to a multiplicative factor, the rescaled energy is

F_{δ,α}(µ) := W_2²(½_{Q_{δ,α}}, µ) + (c_6/(1 − α)) Σ_i m_i^α,

where ½_{Q_{δ,α}} denotes the characteristic function of the square Q_{δ,α}. Here µ is a discrete Borel measure on Q_{δ,α}, no longer a probability measure. Unfortunately, for α ∈ (0, 1), the corresponding integrand g_α (defined in Section 4.1) is not convex. Our first main technical result is to show that for α = ᾱ there exists m_0 > 0 such that the following 'convexity inequality' holds for all m ≥ m_0 and all n ∈ [3, ∞):

g_α(m, n) ≥ g_α(1, 6) + ∂_m g_α(1, 6)(m − 1) + κ(n − 6), where κ := ∂_n c_n|_{n=6}.   (25)

Our second main technical result, inequality (26), is a matching lower bound on the masses of minimizers: every mass m_i of a minimizer satisfies m_i ≥ m_0. Therefore minimizers satisfy the convexity inequality (25), and the proof of Theorem 1.1 now follows using Gruber's strategy.
To be precise, we are only able to prove the inequality (26) for particles z i that are not too close to the boundary (Lemma 4.15(i)). Nevertheless, we are able to prove a worse lower bound on the mass m i of particles near the boundary (Lemma 4.15(ii)), which is still sufficient to show that the number of particles near the boundary is asymptotically negligible. This fixes what appears to be a gap in the proof in [18], where it was tacitly assumed that all of the particles were sufficiently far from the boundary of the rescaled domain (at least distance 3.2143; see the proof of [18,Lemma 7]).
The idea of the proof of (26) is to compare the energy of a minimizer µ with that of a competitor µ̃ obtained by gluing the smallest particle of µ with one of its neighbours. The proofs of (25) and (26) require some delicate positivity estimates. As in [18], we also use computer evaluation at several points in the proof to check the sign of some explicit numerical constants (that are much larger than machine precision).
1.3. Literature on crystallization, optimal partitions and quantization. Our work belongs to the very active research programme of establishing crystallization results for nonlocal interacting particle systems. This problem is known as the crystallization conjecture [12]. Despite experimental evidence that many particle systems, such as atoms in metals, have periodic ground states, until recently there were few rigorous mathematical results. Results in one dimension include [11,39] and results in two dimensions include [3,7,8,9,10,18,30,35,50,63,64,69]. Let us recall that a central open problem in mathematical physics is to establish the optimality of the Abrikosov (triangular) lattice for the Ginzburg-Landau energy [68]. In three dimensions there are few rigorous results. Even establishing the optimal configuration of just five charges on a sphere was only achieved in 2013 via a computer-assisted proof [66]. The Kepler conjecture about optimal sphere packing was also computer-assisted [48,49], while the optimal sphere covering remains to this day unknown. In even higher dimensions (in particular 8 and 24), there start to be more rigorous results again, e.g., [26,27,70]. For a thorough survey of recent crystallization results for nonlocal particle systems see [12] and [67].
Our result also falls into the field of optimal partitions (see Remark 4.10). The optimality of hexagonal tilings, or Honeycomb conjectures, has been proved for example by [21,22,23,47]. Kelvin's problem of finding the optimal foam in 3D (the 'three-dimensional Honeycomb conjecture') remains to this day unsolved; for over 100 years it was believed that truncated octahedra gave the optimal tessellation, until the remarkable discovery of a better tessellation by Weaire and Phelan [72].
Finally, our result also belongs to the field of optimal quantization or optimal sampling [41,45], [46, Section 33], which concerns the best approximation of a probability measure by a discrete probability measure. The most commonly used notion of best approximation is the Wasserstein distance. This problem has been studied by a wide range of communities including applied analysis [14,24,52], computational geometry [31], discrete geometry [28,45], and probability [41]. Applications include optimal location of resources [13], signal and image compression [32,42], numerical integration [62, Section 2.3], mesh generation [33,57], finance [62], materials science [16, Section 3.2], and particle methods for PDEs (sampling the initial distribution) [15]. It is well known that if µ = Σ_{i=1}^N m_i δ_{z_i} is a minimizer of W_2(f dx, ·), then the particles z_i generate a centroidal Voronoi tessellation (CVT) [31,55], which means that the particles z_i lie at the centroids of their Voronoi cells. Numerical methods for computing CVTs include Lloyd's algorithm [31] and quasi-Newton methods [55]. More generally, minimizers of the penalized energy µ → δ Σ_i m_i^α + W_2²(f dx, µ) generate centroidal Laguerre tessellations (see Remark 4.6). Numerical methods for solving the constrained and penalized quantization problems include [19] (which was used to produce Figures 1 and 2) and [73].
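To illustrate the CVT characterization, here is a minimal version of Lloyd's algorithm for the uniform density on the unit square (a simplified sketch of ours, not the generalized algorithm of [19]: the density is discretized on a quadrature grid and the number of particles is fixed). Each iteration assigns quadrature nodes to their nearest generator and then moves each generator to the centroid of its cell; both half-steps decrease the quantization energy, so the energy is nonincreasing along the iteration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Midpoint quadrature grid for the uniform density on the unit square.
n_grid = 120
xs = (np.arange(n_grid) + 0.5) / n_grid
X, Y = np.meshgrid(xs, xs)
grid = np.stack([X.ravel(), Y.ravel()], axis=1)

def lloyd_step(z):
    """One Lloyd iteration: assign each quadrature node to its nearest
    generator (its Voronoi cell), record the quantization energy, then
    move every generator to the centroid of its cell."""
    d2 = ((grid[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    energy = d2.min(axis=1).mean()        # ~ W_2^2 with optimally chosen masses
    z_new = z.copy()
    for i in range(len(z)):
        cell = grid[labels == i]
        if len(cell) > 0:
            z_new[i] = cell.mean(axis=0)  # centroid of the i-th Voronoi cell
    return z_new, energy

z = rng.random((12, 2))                   # random initial generators
energies = []
for _ in range(30):
    z, e = lloyd_step(z)
    energies.append(e)

# both half-steps are energy-decreasing, so the sequence is nonincreasing
assert all(b <= a + 1e-12 for a, b in zip(energies, energies[1:]))
```

At a fixed point the generators sit at the centroids of their Voronoi cells, i.e. the configuration is a CVT. Convergence to a global minimizer is not guaranteed, which is why the computations behind Figures 1 and 2 restart from many random initial conditions.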
There is a large literature on optimal CVTs of N points (global minima of µ → W_2(½_Ω, µ) subject to #supp(µ) = N). According to Gersho's conjecture (see [40]), minimizers correspond to regular tessellations consisting of the repetition of a single polytope whose shape depends only on the spatial dimension. In two dimensions the polytope is a hexagon [13,36,37,43,59,61], and moreover the result holds for any p-Wasserstein metric, p ∈ [1, ∞). Gersho's conjecture is open in three dimensions, although it is believed that the optimal CVT is a tessellation by truncated octahedra, which is generated by the body-centred cubic (BCC) lattice. Some numerical evidence for this is given in [34], and in [6] it was proved that the BCC lattice is optimal among lattices (but we do not know whether the optimal configuration is in fact a lattice). Geometric properties of optimal CVTs in 3D were recently proved in [25], where a strategy for a computer-assisted proof of Gersho's conjecture was also suggested.
1.4. Organization of the paper. In Section 2 we recall some basic notions from optimal transport theory and convex geometry. In Section 3.1 we recall from [14] the result (9) for the case d = p = 2, namely the scaling of the minimum value of the energy for the constrained problem (11). In Section 3.2 we derive the scaling of the minimum value of the energy for the penalized problem (12). These results give the optimal scaling of the minimum values of the constrained and penalized energies, but they do not give the optimal constants. In Section 4 we identify the optimal constant for the penalized problem (which proves Theorem 1.1) and in Section 5 we identify the optimal constant for the constrained problem (which proves Theorem 1.2). In Section 6 we prove the asymptotic crystallization result for all α ∈ (−∞, 1) under an additional assumption. Finally, Section 7 is devoted to the proof of Theorem 1.4.

Main assumptions.
We assume that Ω ⊂ R² is the closure of an open and bounded set, and f ∈ L¹(Ω) is a lower semi-continuous function satisfying

f ≥ c > 0 and ∫_Ω f dx = 1.   (1)

For a Lebesgue-measurable set A ⊂ R², we denote by |A| its area and by ½_A its characteristic function. We let M(X) denote the set of non-negative finite Borel measures on X ⊂ R^d and P(X) ⊂ M(X) denote the set of probability measures on X. Moreover, we let M_d(X) ⊂ M(X) be the following set of discrete measures:

M_d(X) := { Σ_{i=1}^N m_i δ_{z_i} : N ∈ N, m_i > 0, z_i ∈ X }.

Recall that P_d(X) denotes the set of discrete probability measures, P_d(X) = M_d(X) ∩ P(X), and that, for µ ∈ P_d(Ω), N_µ := #supp(µ). For brevity, in an abuse of notation, we denote the preimage of a singleton set {z} ⊂ X under a map T : X → X by T^{−1}(z) instead of T^{−1}({z}).

2.3. Facts from optimal transport theory and convex geometry. We start by recalling the characterization of solutions of the semi-discrete transport problem (2) for the case p = 2. The following result goes back to [4] and is now well-known in the optimal transport community; see for example [54,56,58].
Lemma 2.1 (Semi-discrete optimal transport). Let µ = Σ_{i=1}^{N_µ} m_i δ_{z_i} ∈ P_d(Ω) and consider the minimization problem (2) defining W_2(f, µ). Then the infimum is attained and the minimizer T : Ω → {z_1, . . . , z_{N_µ}} is unique (up to a set of measure zero). Moreover, by possibly modifying T on a set of measure zero, there exists (w_1, . . . , w_{N_µ}) ∈ R^{N_µ} such that, for all i,

T^{−1}(z_i) = { x ∈ Ω : |x − z_i|² − w_i ≤ |x − z_j|² − w_j for all j ∈ {1, . . . , N_µ} }.

Remark 2.2 (Laguerre cells). The previous lemma implies that the partition {T^{−1}(z_i)}_{i=1}^{N_µ} of Ω is a Laguerre tessellation (also known as a power diagram); see [5,58]. The sets T^{−1}(z_i) are convex polygons, known as Laguerre cells or power cells.
We now recall a classical result by L. Fejes Tóth (see [37, p. 198]), which says that the minimal second moment of an n-gon is greater than or equal to the minimal second moment of a regular n-gon of the same area.

Lemma 2.3 (Fejes Tóth's moment lemma). Let n ≥ 3 and consider

inf { min_{z ∈ R²} ∫_P |x − z|² dx : P ⊂ R² is a convex polygon with at most n edges, |P| = 1 } =: c_n.

Then the infimum is attained by a regular n-gon. Consequently a direct calculation gives

c_n = (1/(2n)) ( cot(π/n) + (1/3) tan(π/n) ).   (27)

Remark 2.4. Note that a change of variables gives

inf { min_{z ∈ R²} ∫_P |x − z|² dx : P ⊂ R² is a convex polygon with at most n edges, |P| = m } = c_n m².

We extend the definition of c_n to all n ∈ [3, ∞) using equation (27). Its main properties are stated in the next result, whose proof is a direct computation (see [43]). Finally, we recall one more result from convex geometry, which follows from Euler's polytope formula. It is proved for example in [18, Lemma 4] or [59, Lemma 3.3].
Lemma 2.6 (Partitions by convex polygons). Let U ⊂ R 2 be a convex polygon with at most 6 sides. In any partition of U by convex polygons, the average number of edges per polygon is less than or equal to 6.
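Fejes Tóth's lemma and formula (27) can be sanity-checked by exact polygon integration. In the sketch below (ours; `squash` is an arbitrary area-preserving linear perturbation) the second moment ∫_P |x|² dx is computed by triangulating P from the origin, using the exact formula (area/6)(a·a + b·b + a·b) for a triangle with vertices 0, a, b. The regular hexagon attains c_6 = 5/(18√3), and squashing it at fixed area strictly increases the moment.

```python
import numpy as np

def regular_ngon(n, area=1.0, squash=1.0):
    """Vertices (counterclockwise) of a regular n-gon of the given area,
    optionally deformed by the area-preserving map diag(squash, 1/squash)."""
    th = 2 * np.pi * np.arange(n) / n
    v = np.stack([np.cos(th), np.sin(th)], axis=1) * np.array([squash, 1 / squash])
    x, y = v[:, 0], v[:, 1]
    shoelace = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return v * np.sqrt(area / shoelace)     # rescale to the requested area

def second_moment(v):
    """Exact value of the integral of |x|^2 over a polygon with vertices v
    (star-shaped about 0), by triangulation from the origin: for the
    triangle (0, a, b) the integral is (signed area / 6)(a.a + b.b + a.b)."""
    w = np.roll(v, -1, axis=0)
    tri = 0.5 * (v[:, 0] * w[:, 1] - v[:, 1] * w[:, 0])   # signed areas
    return float(np.sum(tri / 6 * ((v * v).sum(1) + (w * w).sum(1) + (v * w).sum(1))))

c6 = 5 / (18 * np.sqrt(3))
assert abs(second_moment(regular_ngon(6)) - c6) < 1e-12
# an area-preserving squashing strictly increases the second moment
assert second_moment(regular_ngon(6, squash=1.2)) > c6
```

The same routine reproduces c_4 = 1/6 for the square and the monotonicity c_3 > c_4 > c_6 used together with Lemma 2.6 in the lower bound.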
3. Scaling of the asymptotic quantization error

3.1. The constrained optimal location problem. We report here a result about the asymptotic quantization error from [14].
In order to define the cell formula for the asymptotic quantization error, we need to introduce the following metric on the space of probability measures. Given ρ_1, ρ_2 ∈ P(R_+), we define

d_BL(ρ_1, ρ_2) := sup { ∫_{R_+} φ d(ρ_1 − ρ_2) : φ ∈ Lip(R_+), ‖φ‖_∞ + Lip(φ) ≤ 1 },

where Lip(R_+) is the space of Lipschitz continuous functions on R_+ and Lip(φ) denotes the Lipschitz constant of φ. It is well known that d_BL metrizes tight convergence (see [29, Theorem 11.3.3]).
The energy density of the asymptotic quantization error is introduced as follows.
Note that the result proved in [14] holds more generally: in any dimension, for any p-Wasserstein metric, and for more general entropies.
3.2. The penalized optimal location problem. Here we prove analogous results to those presented in the previous section.
We are thus left with proving that the minimum of the cell formula over M(Ω × R_+) is given in terms of the constant C_p(α) defined in (33).
Step 2. For each x ∈ Ω, define G_α^x : P(R_+) → [0, ∞] to be the integrand of the cell formula at the point x. For each x ∈ Ω, G_α^x is lower semi-continuous, since G is lower semi-continuous, and a coercivity estimate involving the constant γ_{2,2} := ∫_{B_1(0)} |x|² dx shows that, for each x ∈ Ω, minimising sequences for G_α^x are tight. Consequently G_α^x has at least one minimizer.
We claim that there exists a Borel measurable function x → ρ_x ∈ P(R_+), x ∈ Ω, such that

G_α^x(ρ_x) = min_{ρ ∈ P(R_+)} G_α^x(ρ) for all x ∈ Ω.   (36)

This will follow from Aumann's Selection Theorem (see [38, Theorem 6.10]) once we prove that the graph of the multifunction Γ : Ω → 2^{P(R_+)} \ ∅, defined by Γ(x) := argmin G_α^x, belongs to B(Ω) ⊗ B(P(R_+)), the product σ-algebra of the Borel sets of Ω and the Borel sets of P(R_+). To prove this, we define the function Ψ : R_+ × P(R_+) → R. In the following, the target space R will always be equipped with the Borel σ-algebra. For each ρ ∈ P(R_+), the function s → Ψ(s, ρ) is continuous. For each s ∈ R_+, the function ρ → Ψ(s, ρ) is lower semi-continuous and hence B(P(R_+))-measurable. Therefore Ψ is a Carathéodory function and hence B(R_+) ⊗ B(P(R_+))-measurable (see, e.g., [1, Lemma 4.51]). Define the composite function Φ : Ω × P(R_+) → R, Φ(x, ρ) := Ψ(f(x), ρ). This is B(Ω) ⊗ B(P(R_+))-measurable since f and Ψ are Borel measurable.
We claim that the map (x, ρ) → min_{ρ' ∈ P(R_+)} Φ(x, ρ') is B(Ω) ⊗ B(P(R_+))-measurable (it is constant in its second argument). The required B(Ω) ⊗ B(P(R_+))-measurability of the graph of the multifunction Γ then follows by noticing that

graph(Γ) = { (x, ρ) ∈ Ω × P(R_+) : Φ(x, ρ) − min_{ρ' ∈ P(R_+)} Φ(x, ρ') = 0 },

the zero set of a product-measurable function. To show that x → min_{ρ ∈ P(R_+)} Φ(x, ρ) is B(Ω)-measurable, we write it as the composite function x → f(x) → min_{ρ ∈ P(R_+)} Ψ(f(x), ρ). This is B(Ω)-measurable since x → f(x) is B(Ω)-measurable and s → min_{ρ ∈ P(R_+)} Ψ(s, ρ) is the pointwise infimum of a family of continuous functions, hence upper semi-continuous and B(R_+)-measurable. This completes the proof that there exists a Borel measurable function x → ρ_x ∈ P(R_+) satisfying (36).
Remark 3.11. Let Q ⊂ R² be a unit square. Taking Ω = Q and f = ½_Q in Corollary 3.10 shows that the optimal constant is determined by this special case. By Corollary 3.10, in order to prove Theorem 1.1 it is sufficient to prove equation (45) below for all α ∈ (−∞, ᾱ]. The next result, which is analogous to the monotonicity of the map α → C_c(α), means that in order to prove Theorem 1.1 for all α ∈ (−∞, ᾱ], it is sufficient to prove it for the single value α = ᾱ.
The matching lower bound requires much more work. Owing to Corollary 3.10 and Remark 3.11 we can assume without loss of generality that Ω = Q, f = ½_Q and α = ᾱ. We will do this throughout the rest of the paper.

4.1.
Rescaling of the energy and the energy of a partition. To prove (45) it is convenient to rescale the domain Q. As δ → 0, the optimal masses m_i in (42) tend to 0. Following [18], instead of keeping the domain Q fixed as δ → 0, we blow up Q in such a way that the optimal masses m_i tend to 1. The following definition is motivated by the heuristic calculation given in Remark 1.3: we rescale Q to the square Q_{δ,α} := V_{δ,α}^{1/2} Q. Moreover, we define the set A_{δ,α} of admissible discrete measures. Therefore, by (42) and (47), we obtain (48). We now state two first-order necessary conditions for minimizers of F_{δ,α}. For a proof see, for instance, [17, Theorem 4.5].
(i) For all i ∈ {1, . . . , N_µ}, the point z_i is the centroid of the Laguerre cell T^{-1}(z_i). In particular, z_i ∈ T^{-1}(z_i).
In the following it will also be convenient to reason from a geometrical point of view. Each µ ∈ A_{δ,α} induces a partition of Q_{δ,α} by the Laguerre cells T^{-1}(z_i), where T is the optimal transport map defining W₂(½_{Q_{δ,α}}, µ). We define a wider class of partitions as follows: Definition 4.7 (Admissible partitions). Let S_{δ,α} denote the family of partitions of Q_{δ,α} of the form C = (C_1, . . . , C_k), where k ∈ N, each C_i ⊂ Q_{δ,α} is measurable, and Σ_{i=1}^k ½_{C_i} = 1 a.e. in Q_{δ,α}.
The advantage of working with partitions instead of measures is that it allows us to localise the nonlocal energy F δ,α .
for all i ∈ {1, . . . , k}. We say that C ∈ S_{δ,α} is an optimal partition if it minimizes F.
To each µ ∈ A_{δ,α} it is possible to associate an element of S_{δ,α} as follows: Definition 4.9 (Partition associated to a discrete measure). Let µ ∈ A_{δ,α}, and let T be the optimal transport map defining W₂(½_{Q_{δ,α}}, µ). Let µ = Σ_{i=1}^{N_µ} m_i δ_{z_i} be a minimizer of F_{δ,α}. For i ∈ {1, . . . , N_µ}, let n_i denote the number of edges of C_i^µ. Then we can bound the energy from below as follows; written in this notation, the lower bound involves the function g_α, which we study in the following section.

4.2.

The convexity inequality. We start by proving a technical result that plays the role of a convexity inequality for g_α. We want to show that, for large enough values of m, the convexity inequality (50) holds. Writing this out explicitly leads to the following definition. For α ∈ (0, 1), define the function h_α : [0, ∞) × [3, ∞) → R to be the difference between g_α and its tangent plane approximation at (1, 6): h_α(m, n) = (c₆/(1−α)) m^α + c_n m² − (c₆(2−α)/(1−α)) m − κ(n − 6). Note that in this section we restrict our attention to α > 0; this is without loss of generality since, by the monotonicity result (Lemma 3.12), in the end we only need to consider α = ᾱ = 0.583. The typical behaviour of the function m → h_α(m, n) is depicted in Figure 3. Our aim is to prove that h_α(m, n) is non-negative for all integers n ≥ 3 and all large enough values of m, as suggested by the figure. Before proving this, we prove the following easy but important corollary, which allows us to reduce the proof of the convexity inequality h_α(·, n) ≥ 0 for all integers n ≥ 3 to the finite number of cases n ∈ {3, 4, 5}: Proof. Observe that h_α(0, n) = −κ(n − 6) ≥ 0 for all n ≥ 6. Therefore the result follows immediately from Lemma 4.11.
On the other hand, h α (0, n) < 0 for n ∈ {3, 4, 5}. This is why in the next section we will need to prove a lower bound on the masses m i of minimizers of F δ,α to ensure the validity of the convexity inequality (50).
Proof of Lemma 4.11.
Step 1. First we study the shape of the function m → h_α(m, n); in particular, we show that it has exactly one local minimum point. Its derivative is ∂_m h_α(m, n) = (c₆α/(1−α)) m^{α−1} + 2c_n m − c₆(2−α)/(1−α). It is easy to see that ∂_m h_α(m, n) is strictly convex in m (since α > 0). Therefore m → ∂²_{mm} h_α(m, n) is increasing, and so m → ∂_m h_α(m, n) has at most one critical point. On the other hand, lim_{m→0} ∂_m h_α(m, n) = +∞ and lim_{m→∞} ∂_m h_α(m, n) = +∞. Therefore m → ∂_m h_α(m, n) has exactly one critical point, and m → h_α(m, n) has at most two critical points.
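The claims of Step 1 can be sanity-checked numerically. The sketch below uses the explicit formula for h_α recalled later in the proof of Lemma 7.2; the values of c₆ and c_n are placeholders (only their positivity matters for the shape of the derivative), so this is an illustration, not part of the proof.

```python
import math

# Illustrative check of Step 1 for h_alpha(m, n) =
#   c6/(1-a) m^a + c_n m^2 - c6 (2-a)/(1-a) m - kappa (n-6).
# c6 and cn below are placeholder values; only positivity is used here.
a = 0.583                       # alpha-bar, the critical exponent from the paper
c6 = 5 / (18 * math.sqrt(3))    # assumed value of the hexagon constant c_6
cn = 0.17                       # placeholder for c_n > 0

def dh(m):
    """First derivative of m -> h_alpha(m, n); the kappa term drops out."""
    return c6 * a / (1 - a) * m ** (a - 1) + 2 * cn * m - c6 * (2 - a) / (1 - a)

def d3h(m):
    """Third derivative: c6*a*(2-a)*m^(a-3) > 0, so dh is strictly convex."""
    return c6 * a * (2 - a) * m ** (a - 3)

ms = [10.0 ** k for k in range(-6, 4)]
assert all(d3h(m) > 0 for m in ms)    # dh is strictly convex
assert dh(1e-9) > 0 and dh(1e9) > 0   # dh -> +infinity at both ends
# Hence dh has exactly one critical point, so h_alpha(., n) has at most
# two critical points (a local max followed by a local min).
```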
To prove (57) we reason as follows. Using (51) and the fact that n → c_n is decreasing (Lemma 2.5), we deduce that n → ∂_m h_α(m, n) is decreasing for all m, and hence n → m(α, n) is increasing. Therefore m(α, n) ≤ m(α, ∞), where m(α, ∞) is defined to be the largest root of the equation obtained by replacing c_n with c_∞, where c_∞ was defined in Lemma 2.5. We want to show that m(α, ∞) ≤ 3/2. Evaluating at m = 1 and m = 3/2 gives opposite signs; therefore m(α, ∞) ∈ (1, 3/2) by the Intermediate Value Theorem. This proves (57) and completes the proof.
Remark 4.13. Despite the fact that we used the specific value of ᾱ in several places in the proof of Lemma 4.11, we expect Lemma 4.11 to hold for all α ∈ (0, 1).

4.3.
Lower bound on the area of optimal cells. Let µ = Σ_{i=1}^{N_µ} m_i δ_{z_i} ∈ A_{δ,α} be a minimizer of F_{δ,α}. We will prove the convexity inequality h_α(m_i, n) ≥ 0 for all i ∈ {1, . . . , N_µ} and n ∈ N ∩ [3, ∞). The idea is to prove a lower bound m_i ≥ m such that h_α(m, n) ≥ 0 for all n ∈ N ∩ [3, ∞). Then the convexity inequality follows from Lemma 4.11.
We prove the lower bound m_i ≥ m following the strategy of the proof of [18, Lemma 7], which was developed for the case α = 0.5. The main differences are that we have to deal with the more difficult case α = ᾱ > 0.5 and that we optimise some of the estimates. Our proof can be used to give a lower bound m on the areas m_i for all α ∈ (0, 1), but this lower bound does not satisfy the convexity inequality h_α(m, n) ≥ 0 if α > ᾱ. Our lower bound on the area of the cells holds for cells with arbitrarily many sides, although by Corollary 4.12 we only need it for cells with 3, 4 or 5 sides. We saw no advantage in restricting the number of sides in the proof.
The following result gives the difference between the energy of a partition and that of the partition obtained by merging two of its cells. Recall that S_{δ,α} and F were defined in Definitions 4.7 and 4.8.
We now prove a lower bound on the area of optimal cells, as well as an upper bound on the diameter of the cells and on the maximum distance between the centroids. The latter two estimates will be used later to deal with the fact that the lower bound on the area of cells close to the boundary of Q_{δ,α} is not good enough to ensure the validity of the convexity inequality (50). (iii) Let B ⊂ Q_{δ,α} be a ball of radius R. If B ∩ supp(µ) = ∅, then R < R₀ := 3.3644.
Step 1: Upper bound on the distance between Dirac masses. Let z̄ ∈ supp(µ) satisfy dist(z̄, ∂Q_{δ,α}) ≥ 4. Take R ∈ (0, 4) such that B_R(z̄) ∩ supp(µ) = {z̄}. We want to obtain an upper bound on R. Let S : B_R(z̄) → supp(µ) be any Borel map. It is now convenient to use the partition energy F. Let C^µ be the partition associated to the minimizer µ (see Definition 4.9). Consider the partition C ∈ S_{δ,α} obtained by modifying C^µ in the ball B_R(z̄) as follows. Write C^µ = (C_1, . . . , C_{N_µ}). Let {H_j} be a tiling of the plane by regular hexagons of area A, where A > 0 will be chosen below, and let H̃_j = H_j ∩ B_R(z̄). Define C to be the partition consisting of the modified sets C̃_i for all i ∈ {1, . . . , N_µ} and the sets H̃_j for all j such that H̃_j ≠ ∅; see Figure 4.
Let D_A be the diameter of a regular hexagon of area A. The number N_A of hexagons needed to cover a ball of radius R is bounded above accordingly. If H ∈ C is a (whole) regular hexagon of area A with centroid ξ_H, then its contribution to the energy can be computed explicitly. Since µ is a minimizer of F_{δ,α}, C^µ is a minimizer of F. Therefore we can compare F(C^µ) with F(C), where in the final inequality we use the property of the centroid that ∫_C |x − z|² dx is minimised over z at the centroid of C. Combining estimates (63) and (60) gives (64). Define the quadratic polynomial p(R, A) accordingly. Then (64) implies that p(R, A) ≤ 0. The quadratic polynomial R → p(R, A) has one positive root and one negative root. Let R(A) denote the positive root. Since p(R, A) ≤ 0, we have R ∈ [0, R(A)] for all A, and the final inequality was obtained by numerically evaluating R(0.52). The choice A = 0.52 was motivated by numerically minimising A → R(A).
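The centroid property invoked above is the parallel axis identity: ∫_C |x − z|² dx = ∫_C |x − ξ_C|² dx + |C| |z − ξ_C|², so the second moment is minimised exactly at the centroid ξ_C. A minimal numerical sketch of this identity on the unit square (not part of the proof):

```python
# Sanity check of the centroid property: for C = [0,1]^2,
# int_C |x - z|^2 dx = (z1^2 - z1 + 1/3) + (z2^2 - z2 + 1/3),
# which equals |z - centroid|^2 + 1/6, minimised at the centroid (1/2, 1/2).
def second_moment_unit_square(z1, z2):
    return (z1 ** 2 - z1 + 1 / 3) + (z2 ** 2 - z2 + 1 / 3)

centroid_value = second_moment_unit_square(0.5, 0.5)
assert abs(centroid_value - 1 / 6) < 1e-12
for z in [(0.0, 0.0), (0.3, 0.9), (0.5, 0.1), (1.0, 1.0)]:
    # parallel axis identity: moment(z) = moment(centroid) + |z - centroid|^2
    dist2 = (z[0] - 0.5) ** 2 + (z[1] - 0.5) ** 2
    assert abs(second_moment_unit_square(*z) - (centroid_value + dist2)) < 1e-12
```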
Step 2: Proof of (i). Let z̄ ∈ supp(µ) satisfy dist(z̄, ∂Q_{δ,α}) ≥ 4. Let R₀ = 3.3644. By Step 1, there exists at least one point z ∈ B_{R₀}(z̄) ∩ supp(µ) with z ≠ z̄; in particular, |z − z̄| < R₀. Let m = µ({z̄}) and M = µ({z}). We can assume without loss of generality that m ≤ M (otherwise simply interchange the roles of z̄ and z). Let C^µ be the partition associated to the minimizer µ. We can define a new partition D by replacing the cells T^{-1}(z̄) and T^{-1}(z) with their union. Then Lemma 4.14 applies. Define λ := m/M ∈ (0, 1]. Dividing the resulting inequality by M^α and rearranging yields the function Θ_α, defined for α ∈ (0, 1). We can restrict our attention to α > 0 since eventually we will apply this result with α = ᾱ > 0. We want to bound Θ_α from below in the case α = ᾱ. The idea is the following: in Step 2a we prove that the function λ → Θ_α(λ) has one minimum point for all α ∈ (0, 1); in Step 2b we estimate this minimum point in the case α = ᾱ.
Step 3: Proof of (iii). This is very similar to Step 1. Let B = B_R(x₀) ⊂ Q_{δ,α} satisfy B ∩ supp(µ) = ∅. Let S : B → supp(µ) be any Borel map. The second inequality is clearly suboptimal, but it is sufficient for our purposes. Repeating exactly the argument used in Step 1 (with B_R(z̄) replaced by B) gives R < 3.3644, as required.
Step 4: Proof of (ii). Let z_i ∈ supp(µ) satisfy dist(z_i, ∂Q_{δ,α}) < 4. Let U be the square U = {x ∈ Q_{δ,α} : dist(x, ∂Q_{δ,α}) ≥ 4}. Take x ∈ ∂U satisfying |x − z_i| = dist(z_i, ∂U). Let K be a closed square of side-length 2R₀ such that x ∈ ∂K and K ⊂ U (such a square exists if δ is sufficiently small). By Step 3, there exists z_j ∈ supp(µ) ∩ K with z_j ≠ z_i. Without loss of generality we can assume that m_i < m (otherwise m_i ≥ m > m_b and there is nothing to prove). Let T be the optimal transport map defining W₂(½_{Q_{δ,α}}, µ), and let z ∈ T^{-1}(z_i). By Lemma 2.1 and Lemma 4.5(i), we obtain (73). By Lemma 4.5(ii), z_i ∈ T^{-1}(z_i). Taking z = z_i in (73), and using (72) and Lemma 4.15(i), gives the required bound.
Step 5: Proof of (iv). Take any x ∈ T −1 (z i ). Let K be a closed square of side-length 2R 0 such that x ∈ K and K ⊂ Q δ,α (such a square exists if δ is sufficiently small). By Step 3, there exists at least one point z j ∈ supp(µ) ∩ K. Therefore |x − z j | 2 ≤ diam(K) 2 = 8R 2 0 . By Lemma 2.1 and Lemma 4.5(i), as required.
The lower bound we obtained in Lemma 4.15(i) is good enough to ensure the validity of the convexity inequality (50): Corollary 4.16 (Convexity inequality). Let µ = Σ_{i=1}^{N_µ} m_i δ_{z_i} ∈ A_{δ,α} be a minimizer of F_{δ,α}, and suppose that δ > 0 is sufficiently small and dist(z_i, ∂Q_{δ,α}) ≥ 4. Proof. Recall that m = 2.0620 × 10⁻⁴. By a direct computation, (74) holds.

Theorem 4.17 (Lower bound on C_p(ᾱ)). Combining Theorem 4.17 with Lemma 4.1, Lemma 3.12 and Corollary 3.10 completes the proof of Theorem 1.1.
Proof of Theorem 4.17. By (48) it is sufficient to prove (75). Let µ = Σ_{i=1}^{N_µ} m_i δ_{z_i} ∈ A_{δ,α} be a minimizer of F_{δ,α}. Let U be the tubular neighbourhood of ∂Q_{δ,α} of width 4 + D₀, where D₀ was defined in Lemma 4.15(v). Let T be the optimal transport map defining W₂(½_{Q_{δ,α}}, µ). Recall that Q_{δ,α} is a square of side-length V_{δ,α}^{1/2}. By Lemma 4.15(iv), |T^{-1}(z_j)| > m_b for all j ∈ J. Therefore, for δ sufficiently small, the contribution of the cells meeting U is negligible. Using this and (78) we conclude that (79) holds. For i ∈ {1, . . . , N_µ}, let n_i be the number of edges of the convex polygon T^{-1}(z_i), and recall the bound on the n_i from Lemma 2.6. Finally, we put everything together, using (49), to complete the proof: taking the limit δ → 0 and using (79) and (80) proves (75), as required.
5. The constrained optimal location problem: Proof of Theorem 1.2

This section is devoted to the proof of Theorem 1.2. The idea is to use Theorem 1.1 together with the following relation between C_c(α) and C_p(α): Lemma 5.1 (Relation between the constrained and penalized problems), valid for all α ∈ (−∞, 1). Proof. This follows from Remark 3.5 and Remark 3.11; in the penultimate equality we used the change of variables δ = c₆^{1−α} L^{α−2}. Proof of Theorem 1.2. Let α ∈ (−∞, ᾱ]. By Corollary 3.4, in order to prove Theorem 1.2 it is sufficient to prove that C_c(α) = c₆, which follows as required.
6. Paving the way towards α = 1

In this section we prove an asymptotic crystallization result for the full range of α ∈ (−∞, 1) under the ansatz (which we are not able to prove) that the neighbouring cells of the smallest ones are not too large.
Assume also that, for all δ sufficiently small, z i * is not too close to the boundary: Then the asymptotic crystallization results (13) and (14) hold.
Step 2. Next we study the shape of the function m → h₁(m, n). Its derivative ∂_m h₁(·, n) is strictly convex in m, with lim_{m→0} ∂_m h₁(m, n) = +∞ and lim_{m→∞} ∂_m h₁(m, n) = +∞. Therefore m → ∂_m h₁(m, n) has exactly one critical point, which is m = c₆/(2c_n). Since n → c_n is decreasing, the value of ∂_m h₁(·, n) at this critical point is negative. Therefore m → h₁(m, n) has exactly two critical points; the smallest critical point is a local maximum point, and the largest critical point is a local minimum point. Let m̄(n) denote the local minimum point. Define ϕ(n) to be the value of h₁(·, n) at its local minimum point: ϕ(n) := h₁(m̄(n), n). Observe that ∂_m h₁(1, 6) = 0. Therefore, by (84), the local minimum point of h₁(·, 6) is m̄(6) = 1 and ϕ(6) = 0.
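Here h₁ is presumably the pointwise limit of h_α as α → 1. Starting from the explicit formula for h_α recalled in the proof of Lemma 7.2, only the terms (c₆/(1−α))m^α − (c₆(2−α)/(1−α))m depend on α, and since (m^α − m)/(1−α) → −m ln m as α → 1, the singular part converges to −c₆(m ln m + m). A quick numerical check of this limit (the value of c₆ is a placeholder; only the limit matters):

```python
import math

# Numerical check that (c6/(1-a)) * (m**a - (2-a)*m) -> -c6*(m*log(m) + m)
# as a -> 1. This is the alpha-dependent part of h_alpha; the c_n m^2 and
# kappa (n-6) terms do not depend on alpha. c6 is a placeholder constant.
c6 = 5 / (18 * math.sqrt(3))

def singular_part(m, a):
    return c6 * (m ** a - (2 - a) * m) / (1 - a)

def limit_part(m):
    return -c6 * (m * math.log(m) + m)

for m in [0.1, 0.5, 1.0, 2.5, 10.0]:
    approx = singular_part(m, 1 - 1e-7)
    assert abs(approx - limit_part(m)) < 1e-4
```

Note that the resulting candidate ∂_m h₁(m, n) = −c₆ ln m − 2c₆ + 2c_n m is consistent with Step 2: its second derivative −c₆/m + 2c_n vanishes exactly at m = c₆/(2c_n), and ∂_m h₁(1, 6) = −2c₆ + 2c₆ = 0.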
Step 3. We claim that there exists a constant c > 0 such that ϕ(n) > c for all integers n ≥ 3, n ≠ 6. First we show that it is sufficient to prove this for the finite number of cases n ∈ {3, 4, 5, 7}.
Step 4. By Step 1 of the proof of Lemma 4.11, the function m → h_α(m, n) has at most two critical points for all α ∈ (0, 1) and all n ≥ 3. If it has fewer than two critical points, then Lemma 6.2 follows immediately, so we need only consider the case where it has exactly two critical points. By Step 1 of the proof of Lemma 4.11, the smallest critical point is a local maximum point and the largest critical point is a local minimum point.
This follows from (84) and (85). Let n be an integer with n ≥ 3 and n ≠ 6. By Step 3, h₁(m̄(n), n) = ϕ(n) > c > 0. Therefore, for α sufficiently close to 1, the value of m → h_α(m, n) at its local minimum point is also positive, by the uniform convergence. This proves Lemma 6.2 in the case n ≥ 3, n ≠ 6.
Finally, the case n = 6 follows immediately from Step 2 of the proof of Lemma 4.11, where it was shown that the value of h α (m, 6) at its local minimum point is 0 for all α ∈ (0, 1).
Proof. Observe that h_α(0, n) = −κ(n − 6) ≥ 0 for all n ≥ 6. Therefore the result follows immediately from Lemma 6.2.

Now we are in a position to prove the main theorem of the section.
Let x belong to the edge e_{zz̄} := T^{-1}(z) ∩ T^{-1}(z̄) and let x satisfy |z − x| = d, which means that x is the closest point on the boundary of the Laguerre cell T^{-1}(z) to z. Define M = µ({z̄}). By assumption, M ≤ (19/4) m.
Step 6: Convexity inequality for h_α for α sufficiently close to 1. Recall (88). Note that the positivity of inf_{[4/19, 1]} m_α for α sufficiently close to 1 follows from the positivity of inf_{[4/19, 1]} m₁ and the uniform convergence of m_α to m₁. By the uniform convergence of h_α to h₁ (Step 1 of the proof of Lemma 6.2), the uniform convergence of m_α to m₁ (which implies that η_α converges to inf_{[4/19, 1]} m₁), and the continuity of h₁, we find, using (89), that the corresponding limit is positive for all n ∈ {3, 4, 5} by Step 4. By (96), (97), Lemma 6.2, and Corollary 6.3, we conclude that h_α(µ({z}), n) ≥ 0 for all z ∈ supp(µ) and all integers n ≥ 3, provided that α is sufficiently close to 1.
Step 7: Conclusion. By using the convexity inequality h α (µ({z}), n) ≥ 0 from Step 6, we can conclude the proof using the same argument that we used to prove Theorem 4.17.

Proof of Theorem 1.4
The proof is similar in spirit to the analogous result for the case α = 0.5 from [18, Theorem 3], although we have to do some extra work to take care of particles near the boundary. The main ingredient is the following stability result due to G. Fejes Tóth [36], which roughly states that if µ is a discrete measure on a convex n-gon Ω, with n ≤ 6, such that the rescaled quantization error (N_µ/|Ω|²) W₂²(½_Ω, µ) is close to the asymptotically optimal value c₆, then the support of µ is close to a triangular lattice.
There exist ε₀ > 0 and c > 0 such that the following holds. If ε ∈ (0, ε₀) and the rescaled quantization error exceeds c₆ by at most ε, then, with the possible exception of at most cNε^{1/3} indices i ∈ {1, . . . , N}, the following hold: (i) V_i is a hexagon; (ii) the distance between z_i and each vertex of V_i is between (1 ± ε^{1/3}) (2|Ω|/(3√3 N))^{1/2}; (iii) the distance between z_i and each edge of V_i is between (1 ± ε^{1/3}) (|Ω|/(2√3 N))^{1/2}. To appreciate the geometric significance of this result, note that for a regular hexagon of area |Ω|/N, the distance between the centre of the hexagon and each vertex is (2|Ω|/(3√3 N))^{1/2}, and the distance between the centre of the hexagon and each edge is (|Ω|/(2√3 N))^{1/2}.
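The hexagon geometry quoted above is elementary and can be verified directly: a regular hexagon of area a has circumradius (centre-to-vertex distance) (2a/(3√3))^{1/2} and apothem (centre-to-edge distance) (a/(2√3))^{1/2}. A minimal check:

```python
import math

# Check of the hexagon geometry quoted above: for a regular hexagon of area a,
# circumradius = sqrt(2a/(3*sqrt(3))) and apothem = sqrt(a/(2*sqrt(3))).
def circumradius(a):
    return math.sqrt(2 * a / (3 * math.sqrt(3)))

def apothem(a):
    return math.sqrt(a / (2 * math.sqrt(3)))

a = 1.0
r, rho = circumradius(a), apothem(a)
# A regular hexagon of circumradius r has area (3*sqrt(3)/2) r^2,
# and its apothem equals r*sqrt(3)/2; both identities recover the formulas.
assert abs((3 * math.sqrt(3) / 2) * r ** 2 - a) < 1e-12
assert abs(rho - r * math.sqrt(3) / 2) < 1e-12
```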
The other key ingredient is an improved version of the convexity inequality (50) for sufficiently large masses. Proof. The constant ξ > 0 will be chosen as the minimum of several quantities that we now introduce. Recall that h_α(m, n) = (c₆/(1−α)) m^α + c_n m² − (c₆(2−α)/(1−α)) m − κ(n − 6).
Recall also that n → c_n is decreasing with lim_{n→∞} c_n = c_∞ > 0 (see Lemma 2.5), and that κ < 0.

Proof of Theorem 1.4. Step 1. In this step we rescale the energy. Let µ_δ = Σ_{i=1}^{N_δ} m_i δ_{z_i} ∈ P_d(Ω) satisfy the hypotheses of Theorem 1.4. In analogy with (46), define the rescaled quantities; as in Remark 4.4, it is easy to check that the defect d(µ_δ) can be rewritten accordingly.

Step 2. In this step we estimate the number of small particles, #{i ∈ {1, . . . , N_δ} : m_i < m}. Although Lemma 4.15 was proved in the case where Ω is a square, the same argument applies when Ω is any convex polygon with at most six sides (indeed, any Lipschitz domain). The only changes are to the constant D₀ and to the lower bound m_b on the mass of particles close to the boundary (to be precise, those particles within distance 4 of the boundary of the rescaled domain). The lower bound m on the mass of particles far from the boundary remains the same; this is the only constant whose specific value matters for us. Therefore we will denote the other two constants by D₀ and m_b also in this case. Fix any η > 0. If δ is sufficiently small, then, similarly to the proof of (78), the area of the boundary layer is at most V_{δ,α}^{1/2} (4 + D₀)(H¹(∂Ω) + η), since Ω is a convex polygon and the Minkowski content of ∂Ω equals H¹(∂Ω) [2, Theorem 2.106]. Therefore we obtain the desired bound on #K.

Step 3. Let ξ > 0 be the constant given by Lemma 7.2. In this step we prove the lower bound (104) on the defect. We first estimate, using the fact that κ < 0 together with Lemma 2.6. Combining this estimate and (102), and using in the last step the fact that m_i ∈ (0, m] for each i ∈ K, proves (104).
Step 4. In this step we prove Theorem 1.4(a). In the penultimate step of the estimate we use Jensen's inequality, and in the last step we use (104). Therefore lim_{δ→0} V_{δ,α}/N_δ = 1 by (103), since lim_{δ→0} d(µ_δ) = 0 by Theorem 1.1. This proves Theorem 1.4(a). For future use, we record that if δ is sufficiently small, then we can read off from (105) that N_δ/V_{δ,α} ≤ 3.
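The Jensen step presumably uses the concavity of t → t^α for α ∈ (0, 1): for positive masses m_i, Σᵢ m_i^α ≤ N^{1−α} (Σᵢ m_i)^α, with equality exactly when all the masses coincide. A quick numerical illustration (the data are arbitrary):

```python
# Illustration of the Jensen inequality presumably used in Step 4: for the
# concave power t -> t^a with a in (0,1), (1/N) sum m_i^a <= ((1/N) sum m_i)^a,
# i.e. sum m_i^a <= N^(1-a) (sum m_i)^a, with equality iff all m_i coincide.
a = 0.583  # alpha-bar from the paper; any a in (0, 1) works
masses = [0.2, 1.0, 1.3, 0.9, 2.6]  # arbitrary positive test data
N = len(masses)
lhs = sum(m ** a for m in masses)
rhs = N ** (1 - a) * sum(masses) ** a
assert lhs <= rhs
# equality case: all masses equal
equal = [1.2] * N
assert abs(sum(m ** a for m in equal) - N ** (1 - a) * sum(equal) ** a) < 1e-9
```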