1 Introduction

Consider the problem of approximating an absolutely continuous probability measure by a discrete probability measure. To quantify the quality of the approximation, we measure the approximation error in the 2-Wasserstein metric. Let \(\Omega \subset {\mathbb {R}}^d\) be the closure of an open and bounded set, and let

$$\begin{aligned} f\in L^1(\Omega ), \quad \quad f\ge c > 0, \quad \quad \int _\Omega f(x) \, \mathrm {d}x=1, \end{aligned}$$
(1)

be the density of the absolutely continuous probability measure. We approximate \(f \mathrm {d}x\) by a discrete measure from the set

$$\begin{aligned} {\mathcal {P}}_{\mathrm {d}}(\Omega ):= \left\{ \mu = \sum _{i=1}^{N_\mu } m_i \delta _{z_i} : N_\mu \in {\mathbb {N}}, \, m_i > 0, \, \sum _{i=1}^{N_\mu } m_i = 1, \, z_i \in \Omega , \, z_i \ne z_j \text { if } i \ne j \right\} . \end{aligned}$$

For \(\mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega )\), we define \(N_\mu := \# \mathrm {supp}(\mu )\), which is not fixed a priori. For \(p\ge 1\), the p-Wasserstein distance (see [65, 71]) between \(f \mathrm {d}x\) and \(\mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega )\) is

$$\begin{aligned}&W_p(f,\mu ) \nonumber \\&\quad :=\inf \left\{ \int _\Omega |x-T(x)|^p f(x)\,\mathrm {d}x \,:\, T:\Omega \rightarrow \{z_i\}_{i=1}^{N_\mu } \text { is Borel},\, \int _{T^{-1}(\{z_i\}) }f(x) \,\mathrm {d}x = m_i \,\, \forall \, i \right\} ^{\frac{1}{p}}. \end{aligned}$$
(2)

Observe that

$$\begin{aligned} \inf \{ W_p(f,\mu ): \mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega )\}=0 \end{aligned}$$

since there exists a sequence of discrete measures \(\mu _n\) converging weakly\(^*\) to \(f \mathrm {d}x\), with \(N_{\mu _n} \rightarrow \infty \) as \(n \rightarrow \infty \). On the other hand, for each \(N\in {\mathbb {N}}\), \(\inf \{ W_p(f,\mu ): \mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega ), \, N_\mu \le N \} > 0\). Therefore the problem \(\inf \{ W_p(f,\mu ): \mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega )\}\) has no solution. To obtain a minimizer we must constrain the number of atoms \(N_\mu \), either explicitly (with a constraint) or implicitly (with a penalization). Given an entropy \(H:{\mathcal {P}}_{\mathrm {d}}(\Omega )\rightarrow [0,\infty ]\) (defined below) we consider the constrained optimal location problem

$$\begin{aligned} \inf \left\{ W^p_p(f,\mu ) : \mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega ),\,H(\mu )\le L \right\} =: {\mathcal {E}}^{p,d}_H(L), \end{aligned}$$
(3)

where \(L>0\), and the penalized optimal location problem

$$\begin{aligned} \inf \left\{ W^p_p(f,\mu ) + \delta H(\mu ) : \mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega )\right\} =: {\mathcal {F}}^{p,d}_H(\delta ), \end{aligned}$$
(4)

where \(\delta >0\). If H satisfies \(H(\mu )\rightarrow \infty \) as \(N_\mu \rightarrow \infty \), then minimising sequences for problems (3) and (4) have a uniformly bounded number of atoms. If in addition H is lower semi-continuous with respect to the weak\(^*\) convergence of measures, then problems (3) and (4) admit a solution.

When L or \(\delta \) is fixed, the geometry of the set \(\Omega \) has a strong effect on optimal particle arrangements, and it is very difficult to characterise minimising configurations. As L increases, or \(\delta \) decreases, the optimal number of particles \(N_\mu \) increases, and it is believed that optimal configurations locally form regular, periodic patterns; see the numerical evidence in Figs. 1 and 2. This phenomenon is known as crystallization (see Sect. 1.3 for more on this). The specific geometry of these patterns depends on the choice of p in the Wasserstein distance, the choice of H, and the dimension d. In this paper we will study the crystallization problem by taking the limits \(L\rightarrow \infty \) and \(\delta \rightarrow 0\).

Fig. 1
figure 1

Approximate local minimizers for the penalized problem (4) for the case \(p=d=2\), \(\Omega =[0,1]^2\), \(f={\mathbbm {1}}_\Omega \), \(H_\alpha (\mu )=\sum _i m_i^\alpha \), for several values of \(\alpha \) and \(\delta \). The value of \(\alpha \) is constant in each row, and the value of \(\delta \) decreases from left to right in each row. The black dots are the particles \(z_i\), where \(\mu = \sum _{i=1}^{N_\mu } m_i \delta _{z_i}\) is an approximate local minimizer of (4). The polygons are the sets \(T^{-1}(\{z_i\})\), where T is the optimal transport map in (2). The particles \(z_i\) are located at the centroids of the polygons. The masses \(m_i\) are the areas of the polygons. The colours correspond to the number of sides: squares are yellow, pentagons are orange, hexagons are blue, and heptagons are red. For each value of \(\alpha \), a hexagonal tiling (with defects) starts to emerge as \(\delta \) is decreased. This figure, Fig. 2 and Table 1 were made by Steven Roper using the generalized Lloyd algorithm from [16]. To search for a global minimizer in the highly non-convex energy landscape, the algorithm was run many times using different, randomly generated initial conditions. The values of \(\delta \) were chosen by first fixing a target value of \(N_\mu \) and then using the heuristic (16) to generate the corresponding \(\delta \). Better results, without defects, can be achieved by taking the initial particle locations to be a perturbation of a triangular lattice; see Fig. 2

Fig. 2
figure 2

Approximate global minimizers for the penalized problem (4) for the case \(p=d=2\), \(\Omega =[0,1]^2\), \(f={\mathbbm {1}}_\Omega \), \(H_\alpha (\mu )=\sum _i m_i^\alpha \), \(\alpha =0.583\), for the values of \(\delta \) used in Fig. 1 (middle row, middle and right columns). See the caption to Fig. 1 for a description of the polygons and the colour scheme. These configurations have lower energy (\(W_2^2(f,\mu )+\delta H_\alpha (\mu )\)) than the corresponding configurations shown in Fig. 1, and they do not have defects. This figure was generated by Steven Roper using the generalized Lloyd algorithm from [16] and by taking the initial conditions to be perturbations of a triangular lattice. In Fig. 1 (middle row, right column) there are \(N_\mu =200\) particles, whereas in this figure (right) there are \(N_\mu =202\) particles; the algorithm from [16] attempts to find the optimum number of particles
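As a hedged illustration of the type of iteration used in these experiments, the following sketch implements the classical Lloyd algorithm for a fixed number of particles and uniform density on the unit square (the transport map is approximated by nearest-particle assignment on a quadrature grid). Unlike the generalized algorithm of [16], it does not optimize the number of particles or use true Laguerre cells; the grid resolution, particle count, and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadrature points discretizing the uniform density f = 1 on the unit square.
g = np.linspace(0.0, 1.0, 120)
X = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)

def lloyd_step(z):
    """One Lloyd iteration: assign each quadrature point to its nearest
    particle, record the discretized energy W_2^2(f, mu), then move each
    particle to the centroid of its assigned region."""
    d2 = ((X[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    energy = d2[np.arange(len(X)), idx].mean()
    z_new = np.array([X[idx == i].mean(axis=0) if np.any(idx == i) else z[i]
                      for i in range(len(z))])
    return z_new, energy

z = rng.random((50, 2))  # 50 random initial particles
energies = []
for _ in range(30):
    z, e = lloyd_step(z)
    energies.append(e)

# The quantization energy is non-increasing along Lloyd iterations.
assert all(d <= 1e-12 for d in np.diff(energies))
```

As in the figures, different random initial conditions typically converge to different local minimizers, which is why the experiments above restart the algorithm many times.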

For the entropy

$$\begin{aligned} H(\mu )=\#\mathrm {supp}(\mu ) \end{aligned}$$
(5)

Zador’s Theorem for the asymptotic quantization error states that

$$\begin{aligned} \lim _{L\rightarrow \infty } \left[ L^{\frac{p}{d}} {\mathcal {E}}^{p,d}_H(L) \right] = C_{p,d} \left( \int _\Omega f(x)^{\frac{d}{d+p}} \,\mathrm {d}x \right) ^{\frac{d+p}{d}} \end{aligned}$$
(6)

for some positive constant \(C_{p,d}\) that is independent of the density f. See for example [20, 41, 45, 74] and see [44, 52, 53] for the more general case where \(\Omega \) is a Riemannian manifold. The constant \(C_{p,d}\) is known in two dimensions:

$$\begin{aligned} C_{p,2} = \int _{P_6} |x|^p \,\mathrm {d}x, \end{aligned}$$
(7)

where \(P_6\) is a regular hexagon of unit area centred at the origin. This follows from Fejes Tóth’s Theorem on Sums of Moments (see [36, 43]), which has also been proved in various levels of generality by several other authors including [13, 37, 59, 61].

The geometric interpretation of (6) and (7) is the following: In two dimensions it is asymptotically optimal to arrange the atoms of the discrete measure at the centres of regular hexagons, i.e., on a regular triangular lattice, where the areas of the hexagons depend on the density f. Locally, where f is approximately constant, these hexagons form a regular honeycomb. By the regular triangular lattice we mean the set \({\mathbb {Z}}(1,0)\oplus {\mathbb {Z}}(1/2,\sqrt{3}/2)\) up to dilation and isometry. See Remark 1.3 below for more on this geometric interpretation.

Formula (6) was extended to more general entropies by Bouchitté, Jimenez and Mahadevan in [14]. Their class of entropies includes the case

$$\begin{aligned} H_\alpha (\mu )=\sum _{i=1}^{N_\mu } m_i^\alpha , \end{aligned}$$
(8)

where \(\alpha \in (-\infty ,1)\). This reduces to the entropy (5) when \(\alpha =0\). Bouchitté, Jimenez and Mahadevan [14, Proposition 3.11(i)] proved that

$$\begin{aligned} \lim _{L\rightarrow \infty } \left[ L^{\frac{p}{d(1-\alpha )}} {\mathcal {E}}^{p,d}_{H_\alpha }(L) \right] = C_{p,d}(\alpha ) \left( \int _\Omega f(x)^{\frac{d(1-\alpha )+\alpha p}{d(1-\alpha )+p}} \,\mathrm {d}x \right) ^{1+\frac{p}{d(1-\alpha )}} \end{aligned}$$
(9)

for some positive constant \(C_{p,d}(\alpha )\). Moreover, they conjectured [14, Sect. 3.6 (ii)] that

$$\begin{aligned} C_{p,d}(\alpha ) \text { is independent of } \alpha . \end{aligned}$$

If this conjecture is true, then by (7)

$$\begin{aligned} C_{p,2}(\alpha ) = C_{p,2}(0) = \int _{P_6} |x|^p \,\mathrm {d}x. \end{aligned}$$

In particular, the conjecture for the case \(p=2\), \(d=2\) is

$$\begin{aligned} C_{2,2}(\alpha )=\int _{P_6} |x|^2 \,\mathrm {d}x=\frac{5}{18\sqrt{3}} =: c_6 \end{aligned}$$
(10)

for all \(\alpha \in (-\infty ,1)\). It is known that \(C_{2,2}(\alpha )=c_6\) for all \(\alpha \in (-\infty ,0]\) (see [14, Sect. 3.6]) and so it remains to establish the conjecture for the case \(\alpha \in (0,1)\). The conjecture would mean that in two dimensions a discrete measure supported on a regular triangular lattice gives asymptotically the best constrained approximation of the Lebesgue measure (again, see Remark 1.3 below for this geometric interpretation).
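The value of \(c_6\) in (10) can be checked numerically. The sketch below uses the closed form for the second moment of a regular unit-area n-gon about its centre, obtained by triangulating the polygon from the centre and summing exact triangle moments; this closed form is our derivation, not a formula quoted from the paper.

```python
import numpy as np

def c_n(n):
    """Second moment about the centre of a regular n-gon of unit area.

    Fanning the polygon into n isoceles triangles from its centre and
    summing the exact triangle moments gives
        c_n = (2 + cos(2*pi/n)) / (3*n*sin(2*pi/n)).
    """
    t = 2 * np.pi / n
    return (2 + np.cos(t)) / (3 * n * np.sin(t))

# Unit square: second moment about the centre is 2 * (1/12) = 1/6.
assert abs(c_n(4) - 1 / 6) < 1e-14

# Regular hexagon: the constant c_6 = 5/(18*sqrt(3)) from (10).
assert abs(c_n(6) - 5 / (18 * np.sqrt(3))) < 1e-14

# As n -> infinity, c_n decreases to the value 1/(2*pi) for the unit-area disc.
assert abs(c_n(10000) - 1 / (2 * np.pi)) < 1e-7
```

The limiting disc value \(1/(2\pi ) \approx 0.1592\) is strictly smaller than \(c_6 \approx 0.1604\), reflecting that hexagons are the best space-filling cells among the shapes available.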

1.1 Main results

In this paper we prove conjecture (10) for all \(\alpha \in (-\infty ,{\overline{\alpha }}]\), where \({\overline{\alpha }}=0.583\); see Theorem 1.2. The conjecture for \(\alpha \in ({\overline{\alpha }},1)\) remains open, although we suggest a direction for proving it in Theorem 6.1, where we prove it under an additional assumption. In Theorem 1.1 we prove an analogous asymptotic quantization formula for the penalized optimal location problem (4) for all \(\alpha \in (-\infty ,{\overline{\alpha }}]\). This generalises the crystallization result of [18], where Theorem 1.1 was proved for the special case \(\alpha =0.5\), \(f=1\). Moreover, for the case \(f= 1\), we prove that minimal configurations are ‘asymptotically approximately’ a triangular lattice; see Theorem 1.4. To be more precise, we prove that, as \(\delta \rightarrow 0\), rescaled minimal configurations for the penalized quantization problem are quantitatively close to a triangular lattice. This result will be proved for the case \(\alpha ={\overline{\alpha }}\). The proof can be easily modified for any \(\alpha \le {\overline{\alpha }}\).

Define the constrained optimal quantization error by

$$\begin{aligned} \mathrm {m_c}(\alpha ,L):= {\mathcal {E}}^{2,2}_{H_\alpha }(L) = \inf \left\{ W^2_2(f,\mu ) : \mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega ),\,\, \sum _{i=1}^{N_\mu } m_i^\alpha \le L \right\} , \end{aligned}$$
(11)

and the penalized optimal quantization error by

$$\begin{aligned} \mathrm {m_p}(\alpha ,\delta ):= {\mathcal {F}}^{2,2}_{H_\alpha }(\delta ) = \inf \left\{ W^2_2\left( \,f,\mu \,\right) + \delta \sum _{i=1}^{N_\mu } m_i^\alpha : \mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega )\right\} . \end{aligned}$$
(12)

Since the Wasserstein distance on the compact set \(\Omega \) metrizes the tight convergence of probability measures, and the map \(\mu \mapsto \sum _i m_i^\alpha \) is lower semi-continuous with respect to this convergence [65, Lemma 7.11], both infima above are attained. Our main results are the following.

Theorem 1.1

(Asymptotic crystallization for the penalized optimal location problem). Let \(\alpha \in (-\infty ,{\overline{\alpha }}]\), where \({\overline{\alpha }}:=0.583\). Let \(\Omega \subset {\mathbb {R}}^2\) be the closure of an open and bounded set. Assume that \(f:\Omega \rightarrow [0,\infty )\) is lower semi-continuous with \(f\ge c>0\) and \(\int _\Omega f \, \mathrm {d}x = 1\). Then

$$\begin{aligned} \lim _{\delta \rightarrow 0}\left[ \left( \frac{c_6}{\delta (1-\alpha )}\right) ^{\frac{1}{2-\alpha }} \mathrm {m_p}(\alpha ,\delta ) \right] =\frac{2-\alpha }{1-\alpha } \, c_6\int _\Omega f(x)^{\frac{1}{2-\alpha }} \,\mathrm {d}x. \end{aligned}$$
(13)

Taking the special case \(f=1\), \(|\Omega |=1\), \(\alpha =0.5\) in Theorem 1.1 gives [18, Theorem 2]. We illustrate Theorem 1.1 in Table 1 and Figs. 1 and 2.

Theorem 1.2

(Asymptotic crystallization for the constrained optimal location problem). Let \(\alpha \in (-\infty ,{\overline{\alpha }}]\), where \({\overline{\alpha }}:=0.583\). Let \(\Omega \subset {\mathbb {R}}^2\) be the closure of an open and bounded set. Assume that \(f:\Omega \rightarrow [0,\infty )\) is lower semi-continuous with \(f\ge c>0\) and \(\int _\Omega f \, \mathrm {d}x = 1\). Then

$$\begin{aligned} \lim _{L\rightarrow \infty }\left[ L^{\frac{1}{1-\alpha }} \, \mathrm {m_c}(\alpha ,L) \right] = c_6\left( \int _\Omega f(x)^{\frac{1}{2-\alpha }} \,\mathrm {d}x \right) ^{\frac{2-\alpha }{1-\alpha }}. \end{aligned}$$
(14)

By comparing equation (9) to equation (14) with \(p=d=2\), we read off that \(C_{2,2}(\alpha )=c_6\) for all \(\alpha \in (-\infty ,{\overline{\alpha }}]\), which proves conjecture (10) for this range of \(\alpha \). We believe that Theorem 1.1 and Theorem 1.2 hold for all \(\alpha \in (-\infty ,1)\), not just for \(\alpha \in (-\infty ,{\overline{\alpha }}]\), but we are only able to prove them for the whole range of \(\alpha \) if we make an ansatz about minimal configurations; see Theorem 6.1.

Table 1 Illustration of Theorem 1.1 for the case \(\Omega =[0,1]^2\), \(f={\mathbbm {1}}_\Omega \)

Remark 1.3

(Energy scaling and the geometric interpretation of Theorems 1.1 and 1.2). To motivate the rescaling on the left-hand side of (13) we reason as follows. Let

$$\begin{aligned} \Omega = \bigcup _{i=1}^N H_i \end{aligned}$$

be the union of N disjoint regular hexagons of equal area \(|\Omega |/N\). Let \(z_i\) be the centroid of \(H_i\) and let \(f = \tfrac{1}{|\Omega |}{\mathbbm {1}}_\Omega \) be the uniform probability distribution on \(\Omega \). Here \({\mathbbm {1}}_\Omega \) denotes the characteristic function of the set \(\Omega \). By definition of \(c_6\) (equation (10)) and a change of variables,

$$\begin{aligned} \int _{H_i} |x-z_i|^2 \, \mathrm {d}x = c_6 |H_i|^2 = c_6 \left( \frac{|\Omega |}{N}\right) ^2 \end{aligned}$$

for all i. Therefore the penalized quantization error of approximating \(f \mathrm {d}x\) by \(\mu =\sum _{i=1}^N \tfrac{1}{N} \delta _{z_i}\) is

$$\begin{aligned} W_2^2(f,\mu )+\delta H_\alpha (\mu ) = \sum _{i=1}^N \int _{H_i} |x-z_i|^2 \frac{1}{|\Omega |} \, \mathrm {d}x + \delta N \left( \frac{1}{N} \right) ^{\alpha } = c_6\frac{|\Omega |}{N} + \delta N^{1-\alpha }.\nonumber \\ \end{aligned}$$
(15)

The right-hand side of (15) is minimized when

$$\begin{aligned} N = \left( \frac{c_6 |\Omega |}{\delta (1-\alpha )}\right) ^{\frac{1}{2-\alpha }}. \end{aligned}$$
(16)

Substituting this value of N into (15) (assuming for a moment that it is an integer) gives

$$\begin{aligned} \left( \frac{c_6}{\delta (1-\alpha )}\right) ^{\frac{1}{2-\alpha }} \left( W_2^2(f,\mu )+\delta H_\alpha (\mu ) \right) \!=\! \frac{2-\alpha }{1-\alpha } \, c_6 \, |\Omega |^{\frac{1-\alpha }{2-\alpha }} = \frac{2-\alpha }{1-\alpha } \, c_6\int _\Omega f(x)^{\frac{1}{2-\alpha }} \,\mathrm {d}x, \end{aligned}$$

which motivates the rescaling used in (13). This heuristic computation suggests an upper bound for the left-hand side of (13), for the case where f is the uniform distribution. Theorem 1.1 says that this upper bound is in fact asymptotically optimal. In this sense we can say that the honeycomb structure gives asymptotically the best approximation of the uniform distribution.
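The heuristic above can be checked numerically. In the following sketch the values of \(\alpha \) and \(\delta \) are illustrative choices; a grid search confirms the closed-form minimizer (16), and the rescaled minimal energy matches the right-hand side of (13) for the uniform density.

```python
import numpy as np

c6 = 5 / (18 * np.sqrt(3))
alpha, delta, area = 0.5, 1e-4, 1.0  # illustrative parameters (alpha < 1)

def energy(N):
    # Right-hand side of (15): c6*|Omega|/N + delta*N^(1-alpha)
    return c6 * area / N + delta * N ** (1 - alpha)

# Closed-form minimizer (16)
N_star = (c6 * area / (delta * (1 - alpha))) ** (1 / (2 - alpha))

# A fine grid search around N_star confirms the closed form.
Ns = np.linspace(0.5 * N_star, 2 * N_star, 200001)
N_grid = Ns[np.argmin(energy(Ns))]
assert abs(N_grid - N_star) / N_star < 1e-4

# Rescaled minimal energy equals (2-alpha)/(1-alpha) * c6 * |Omega|^((1-alpha)/(2-alpha)).
lhs = (c6 / (delta * (1 - alpha))) ** (1 / (2 - alpha)) * energy(N_star)
rhs = (2 - alpha) / (1 - alpha) * c6 * area ** ((1 - alpha) / (2 - alpha))
assert abs(lhs - rhs) < 1e-12
```

The same computation with other values of \(\alpha \in (-\infty ,1)\) and \(\delta \) leaves both assertions intact, since the identity behind (13) is exact for this ansatz.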

The rescaling used in (14) can be derived in a similar way. Indeed, fix \(L>0\) and consider the constraint

$$\begin{aligned} H_\alpha (\mu )=\sum _{i=1}^{N_\mu } m_i^\alpha \le L. \end{aligned}$$

If all the masses are the same, \(m_i = 1/N_\mu \) for all i, then the biggest number \(N_\mu \) for which this constraint is satisfied is

$$\begin{aligned} N_\mu =L^{\frac{1}{1-\alpha }}. \end{aligned}$$

Assuming that \(N_\mu \) is an integer, take as above

$$\begin{aligned} \Omega = \bigcup _{i=1}^{N_\mu } H_i, \quad \quad f=\frac{1}{|\Omega |} {\mathbbm {1}}_\Omega , \quad \quad \mu =\sum _{i=1}^{N_\mu } \frac{1}{N_\mu } \delta _{z_i}. \end{aligned}$$

Then

$$\begin{aligned} L^{\frac{1}{1-\alpha }} \, W_2^2(f,\mu ) = c_6 \, |\Omega | = c_6\left( \, \int _\Omega f(x)^{\frac{1}{2-\alpha }} \,\mathrm {d}x\,\right) ^{\frac{2-\alpha }{1-\alpha }}, \end{aligned}$$

which motivates the rescaling used in (14). Combining this formal calculation with Theorem 1.2 again suggests the asymptotic optimality of the honeycomb.

Theorem 1.1 gives the asymptotic minimum value of the penalized quantization error but says nothing about the configuration of the particles; it says that the triangular lattice is asymptotically optimal, but it does not say that asymptotically optimal configurations are close to a triangular lattice. We prove this in the following theorem.

Theorem 1.4

(Asymptotically optimal configurations are close to a regular triangular lattice). Let \(\Omega \subset {\mathbb {R}}^2\) be a convex polygon with at most six sides, \(|\Omega |=1\), \(f={\mathbbm {1}}_\Omega \), and \(\alpha = {\overline{\alpha }}\). There exist constants \(\varepsilon _0, c, \beta _1, \beta _2 >0\) with the following property. Let \(\delta >0\) and \(\mu _\delta =\sum _{i=1}^{N_\delta } {\widetilde{m}}_i\delta _{{\widetilde{z}}_i}\in {\mathcal {P}}_{\mathrm {d}}(\Omega )\) be a solution of the penalized quantization problem defining \(\mathrm {m}_{\mathrm {p}}(\alpha ,\delta )\). Define the defect of \(\mu _\delta \) by

$$\begin{aligned} \mathrm {d}(\mu _\delta ):= \left( \frac{c_6}{\delta (1-\alpha )}\right) ^{\frac{1}{2-\alpha }}\mathrm {m_p}(\alpha ,\delta ) - \frac{2-\alpha }{1-\alpha } \, c_6. \end{aligned}$$

Note that \(\lim _{\delta \rightarrow 0} \mathrm {d}(\mu _\delta ) = 0\) by Theorem 1.1. Define

$$\begin{aligned} V_{\delta ,\alpha }= \left( \frac{c_6}{\delta (1-\alpha )} \right) ^{\frac{1}{2-\alpha }}. \end{aligned}$$

Define rescaled particle positions \(z_i = V_{\delta ,\alpha }^{1/2} {\widetilde{z}}_i\), \(i \in \{1,\ldots ,N_\delta \}\). Let \(\{ V_i \}_{i=1}^{N_\delta }\) be the Voronoi tessellation of \(V_{\delta ,\alpha }^{1/2}\Omega \) generated by \(\{z_i\}_{i=1}^{N_\delta }\), i.e.,

$$\begin{aligned} V_i = \{ z \in V_{\delta ,\alpha }^{1/2} \Omega : |z-z_i| \le |z-z_j| \; \forall \, j \in \{1,\ldots ,N_\delta \}\}. \end{aligned}$$
  1. (a)

    The optimal number of particles \(N_\delta \) is asymptotically equal to \(V_{\delta ,\alpha }\):

    $$\begin{aligned} \lim _{\delta \rightarrow 0} \frac{V_{\delta ,\alpha }}{N_\delta }=1. \end{aligned}$$
  2. (b)

    If \(\delta >0\) is sufficiently small, and if \(\varepsilon \in (0,\varepsilon _0)\) and \(\mu _\delta \) satisfy

    $$\begin{aligned} \beta _1 \mathrm {d}(\mu _\delta ) + \beta _2 V_{\delta ,\alpha }^{-1/2} \le \varepsilon , \end{aligned}$$
    (17)

    then, with the possible exception of at most \(N_\delta c \varepsilon ^{1/3}\) indices \(i\in \{1,\dots ,N_\delta \}\), the following hold:

    1. (i)

      \(V_i\) is a hexagon;

    2. (ii)

      the distance between \(z_i\) and each vertex of \(V_i\) lies between \((1 - \varepsilon ^{1/3}) \sqrt{\frac{V_{\delta ,\alpha }}{N_\delta }} \sqrt{\frac{2}{3 \sqrt{3}}}\) and \((1 + \varepsilon ^{1/3}) \sqrt{\frac{V_{\delta ,\alpha }}{N_\delta }} \sqrt{\frac{2}{3 \sqrt{3}}}\);

    3. (iii)

      the distance between \(z_i\) and each edge of \(V_i\) lies between \((1 - \varepsilon ^{1/3}) \sqrt{\frac{V_{\delta ,\alpha }}{N_\delta }} \sqrt{\frac{1}{2 \sqrt{3}}}\) and \((1 + \varepsilon ^{1/3}) \sqrt{\frac{V_{\delta ,\alpha }}{N_\delta }} \sqrt{\frac{1}{2 \sqrt{3}}}\).

Even though Theorem 1.4 is stated only for the case \(\alpha ={\overline{\alpha }}\), the same proof holds for any \(\alpha \le {\overline{\alpha }}\), up to proving the convexity inequality (25) for that specific value of \(\alpha \) (by using the same strategy we used for the case \(\alpha ={\overline{\alpha }}\)). A similar result can be proved for the constrained quantization problem.

Remark 1.5

(Geometric interpretation of Theorem 1.4). Note that the term \( \beta _2 V_{\delta ,\alpha }^{-1/2}\) in (17) converges to 0 as \(\delta \rightarrow 0\). Theorem 1.4 essentially states that if the defect \(\mathrm {d}(\mu _\delta )\) is small, then the support of \(\mu _\delta \) is close to a regular triangular lattice, and it quantifies how close. Note that the Voronoi tessellation generated by the regular triangular lattice is a regular hexagonal tessellation. The theorem states that the Voronoi tessellation of \( V_{\delta ,\alpha }^{1/2} \Omega \) generated by the rescaled particles \(z_i\) is close to a regular hexagonal tessellation in the sense that, except for at most \(N_\delta c \varepsilon ^{1/3}\) Voronoi cells, the Voronoi cells are hexagons, and it quantifies how far these hexagons are from being regular. For a regular hexagon of area \(\frac{V_{\delta ,\alpha }}{N_\delta }\), the distance between the centre of the hexagon and each vertex is \(\sqrt{\frac{V_{\delta ,\alpha }}{N_\delta }} \sqrt{\frac{2}{3 \sqrt{3}}}\), and the distance between the centre of the hexagon and each edge is \(\sqrt{\frac{V_{\delta ,\alpha }}{N_\delta }} \sqrt{\frac{1}{2 \sqrt{3}}}\). Since \(V_{\delta ,\alpha } /N_\delta \rightarrow 1\) as \(\delta \rightarrow 0\), ‘most’ of the rescaled Voronoi cells \(V_i\) are ‘close’ to a regular hexagon of area 1.
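The two distances quoted in this remark follow from elementary hexagon geometry. As a quick numerical check (the circumradius below is an arbitrary choice), we build a regular hexagon explicitly, compute its area by the shoelace formula, and verify both formulas:

```python
import numpy as np

# Regular hexagon with circumradius R, centred at the origin.
R = 0.7  # arbitrary
verts = np.array([[R * np.cos(k * np.pi / 3), R * np.sin(k * np.pi / 3)]
                  for k in range(6)])

# Shoelace formula for the area.
x, y = verts[:, 0], verts[:, 1]
A = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Centre-to-vertex distance: sqrt(A) * sqrt(2/(3*sqrt(3))).
assert abs(np.sqrt(A) * np.sqrt(2 / (3 * np.sqrt(3))) - R) < 1e-12

# Centre-to-edge distance (apothem): sqrt(A) * sqrt(1/(2*sqrt(3))).
apothem = np.linalg.norm((verts[0] + verts[1]) / 2)  # midpoint of an edge
assert abs(np.sqrt(A) * np.sqrt(1 / (2 * np.sqrt(3))) - apothem) < 1e-12
```

Equivalently, a regular hexagon of unit area has circumradius \(\sqrt{2/(3\sqrt{3})} \approx 0.6204\) and apothem \(\sqrt{1/(2\sqrt{3})} \approx 0.5373\).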

Remark 1.6

(Locality and weaker assumptions on f). Theorems 1.1 and 1.2 say that the quantization problems are essentially independent of f, in the sense that the optimal constants \(\frac{2-\alpha }{1-\alpha } c_6\) and \(c_6\) are independent of f and are determined by the corresponding quantization problems with \(f=1\); see Remarks 3.5 and 3.11. The locality of the quantization problems is independent of the crystallization and is easier to prove. The locality for the constrained problem was proved in [14], and the locality for the penalized problem follows easily from this, as we shall see in Sect. 3.2. Locality results for the classical quantization problem were proved by, among others, [20, 45, 53, 74]. We believe that the assumption of lower semi-continuity on f in Theorems 1.1 and 1.2 could be relaxed by using the approach in [60], where a locality result is proved for the related irrigation problem, which concerns the best approximation of an absolutely continuous probability measure by a one-dimensional Hausdorff measure supported on a curve.

Remark 1.7

(\(\alpha \ge 1\)). For \(\alpha \ge 1\), the constrained and penalized quantization problems \(\mathrm {m_c}(\alpha ,L)\) and \(\mathrm {m_p}(\alpha ,\delta )\) do not have a minimizer. The infimum is zero since both the Wasserstein distance and the entropy can be sent to zero by sending the number of particles to infinity. In [14] the authors considered the constraint

$$\begin{aligned} \sum _{i=1}^{N_\mu }m_i^\alpha \ge \frac{1}{L} \end{aligned}$$

for \(\alpha > 1\). For \(f \in L^\infty (\Omega )\), \(\alpha \in (1,2) \cup (2,\infty )\) they proved that there exists a constant \(C_{2,2}(\alpha )\) such that

$$\begin{aligned} \lim _{L \rightarrow \infty } L^{\frac{1}{\alpha - 1}} \inf \left\{ W^2_2(f,\mu ) : \mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega ),\, \sum _{i=1}^{N_\mu }m_i^\alpha \ge \frac{1}{L} \right\} = C_{2,2}(\alpha ) \left( \int _\Omega f(x)^{\frac{1}{2-\alpha }} \, \mathrm {d}x \right) ^{\frac{2-\alpha }{1-\alpha }}. \end{aligned}$$

See [14, Proposition 3.11(iii), Remark 3.13(iii)]. For \(\alpha >2\), \(C_{2,2}(\alpha )=0\). For \(\alpha \in (1,2)\), \(C_{2,2}(\alpha )\) is not known, but it satisfies the bounds

$$\begin{aligned} \int _B |x|^2 \, \mathrm {d}x \le C_{2,2}(\alpha ) \le c_6 \end{aligned}$$

where B is the ball of unit area centred at the origin [14, Lemma 3.10].

Remark 1.8

(Motivation for the choice of entropy \(H_\alpha \)). There are several reasons, both mathematical and modelling-related, why we chose to study the entropy \(H_\alpha (\mu )=\sum _i m_i^\alpha \).

  1. (i)

    The functional \(\mu \mapsto W_2^2(f,\mu ) + \delta \sum _i h(m_i)\) is lower semi-continuous if \(h(0)=0\), \(h(t) \ge 0\), h is lower semi-continuous, subadditive and \(\lim _{t\rightarrow 0+}h(t)/t=+\infty \); see [65, Lemma 7.11]. This includes our entropy \(h(m)=m^\alpha \). There is evidence, however, that crystallization does not hold for all entropies in this class, or at least that optimal configurations consist of particles of different sizes; see [14, Sect. 3.4]. In this paper we have found a subclass for which crystallization holds. It is an open problem to find the largest class of such entropies.

  2. (ii)

    Functionals of the form \(\mu \mapsto W_2^2(f,\mu ) + \delta \sum _i h(m_i)\) arise in models of economic planning; see [24]. For example, consider the problem of the optimal location of warehouses in a county \(\Omega \) with population density f. The measure \(\mu =\sum _i m_i \delta _{z_i}\) represents the locations \(z_i\) and sizes \(m_i\) of the warehouses. The Wasserstein term in the functional above penalizes the average distance between the population and the warehouses, and the entropy term penalizes the building or running costs of the warehouses. The subadditive nature of the entropy \(H_\alpha \) corresponds to an economy of scale, where it is cheaper to build one warehouse of size m than two of size m/2.

  3. (iii)

    The special case \(\alpha =0.5\) arises in a simplified model of a two-phase fluid, namely a diblock copolymer melt, in two dimensions; see [17]. Here the entropy \(\sqrt{m}\) corresponds to the interfacial length between a droplet of one phase of area m and the surrounding, dominant phase.

  4. (iv)

    Finally, from a mathematical perspective, we were inspired to study the entropy \(H_\alpha \) by the conjecture of Bouchitté et al. [14, Sect. 3.6 (ii)].

1.2 Sketch of the proofs of Theorems 1.1 and 1.2

We briefly present the main ideas of the paper. We will see that Theorem 1.2 is an easy consequence of Theorem 1.1 (see Sect. 5), and so here we just focus on the ideas behind the proof of Theorem 1.1. The strategy for proving Theorem 1.4 is discussed in Sect. 7.

First we identify the scaling of the penalized quantization error \(\mathrm {m_p}(\alpha ,\delta )\) as \(\delta \rightarrow 0\) using the \(\Gamma \)-convergence result of [14]. This gives

$$\begin{aligned} \lim _{\delta \rightarrow 0}\left[ \left( \frac{c_6}{\delta (1-\alpha )}\right) ^{\frac{1}{2-\alpha }} \mathrm {m_p}(\alpha ,\delta )\right] =C_{\mathrm {p}}(\alpha ) \int _\Omega f(x)^{\frac{1}{2-\alpha }} \,\mathrm {d}x \end{aligned}$$
(18)

where

$$\begin{aligned} C_{\mathrm {p}}(\alpha )=\ \lim _{\delta \rightarrow 0}\left[ \left( \frac{c_6}{\delta (1-\alpha )}\right) ^{\frac{1}{2-\alpha }} \min \left\{ \delta \sum _{i=1}^{N_\mu } m_i^\alpha + W^2_2({\mathbbm {1}}_Q,\mu ) \,:\, \mu \in {\mathcal {P}}_{\mathrm {d}}(Q) \right\} \right] \nonumber \\ \end{aligned}$$
(19)

and \(Q=[-1/2,1/2]^2\); see Corollary 3.10 and Remark 3.11. The main challenge in this paper is to show that the optimal constant is \(C_{\mathrm {p}}(\alpha )= c_6 (2-\alpha )/(1-\alpha )\) for all \(\alpha \in (-\infty ,{\overline{\alpha }}]\). Thanks to equations (18) and (19), to prove Theorem 1.1 it is sufficient to prove it for the case where \(\Omega = Q\) and \(f={\mathbbm {1}}_\Omega \).

Next we prove a monotonicity result (Lemma 3.12), which is analogous to one proved in [14] for the constrained quantization problem: it asserts that if Theorem 1.1 holds for some \({\widetilde{\alpha }}\in (-\infty ,1)\), then it holds for all \(\alpha \in (-\infty ,{\widetilde{\alpha }}]\). Therefore we only need to prove Theorem 1.1 for the single value \(\alpha ={\overline{\alpha }}=0.583\). Hence for the rest of the paper we can take \(\Omega =Q\), \(f={\mathbbm {1}}_\Omega \), \(\alpha ={\overline{\alpha }}\) without loss of generality.

From the definition of the Wasserstein distance, equation (2), if \(\mu =\sum _{i=1}^{N_\mu } m_i \delta _{z_i}\), then

$$\begin{aligned} W_2^2({\mathbbm {1}}_Q,\mu ) = \sum _{i=1}^{N_\mu } \int _{T^{-1}(\{z_i\})} |x-z_i|^2 \,\mathrm {d}x, \end{aligned}$$
(20)

where T is the optimal transport map. Since Q is a polygonal set, it is well known (see Lemma 2.1) that the sets \(T^{-1}(\{z_i\})\) are convex polygons, called Laguerre cells.

A classical result by Fejes Tóth (see Lemma 2.3) states that the second moment of a polygon about any point in the plane is greater than or equal to the second moment of a regular polygon (with the same area and same number of edges) about its centre of mass:

$$\begin{aligned} \int _{P(m,n)} |x-z|^2 \,\mathrm {d}x \ge \int _{R(m,n)} |x|^2 \, \mathrm {d}x = m^2 \int _{R(1,n)} |y|^2 \, \mathrm {d}y =: m^2 c_n \end{aligned}$$
(21)

where \(P(m,n)\) is a polygon with area m and n edges, \(R(m,n)\) is a regular polygon centred at the origin with area m and n edges, and \(z \in {\mathbb {R}}^2\). Combining (20) and (21) gives

$$\begin{aligned} W_2^2({\mathbbm {1}}_\Omega ,\mu ) \ge \sum _{i=1}^{N_\mu } m_i^2 c_{n_i}, \end{aligned}$$
(22)

for all \(\mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega )\), where \(n_i\) denotes the number of edges of the polygon \(T^{-1}(\{z_i\})\) and \(m_i\) denotes its area. Our proofs are limited to the p-Wasserstein metric with \(p=2\) since, for \(p \ne 2\), the transport regions \(T^{-1}(\{ z_i \})\) are not convex polygons. Moreover, our proofs are limited to two dimensions since there is no equivalent statement of Fejes Tóth’s Moment Lemma in higher dimensions (due to the scarcity of suitable regular polytopes in dimensions \(d \ge 3\)).
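Fejes Tóth's inequality (21) can be tested numerically. The sketch below computes polygon second moments exactly by fanning the polygon into signed triangles from the reference point, verifies equality for a regular pentagon, and then checks (21) for random convex polygons inscribed in the unit circle; the sampling scheme is just one convenient way of generating convex polygons and is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def polygon_area(verts):
    x, y = verts[:, 0], verts[:, 1]
    return 0.5 * (np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def second_moment(verts, z):
    """Exact second moment int_P |x - z|^2 dx of a polygon P (counter-
    clockwise vertices), by fanning into signed triangles from z and using
    int_T |x|^2 dx = area(T) * (|a|^2 + |b|^2 + a.b)/6 for T = conv{0, a, b}."""
    v = verts - z
    total = 0.0
    for a, b in zip(v, np.roll(v, -1, axis=0)):
        signed_area = 0.5 * (a[0] * b[1] - a[1] * b[0])
        total += signed_area * (a @ a + b @ b + a @ b) / 6
    return total

def c_n(n):  # second moment of the regular unit-area n-gon about its centre
    t = 2 * np.pi / n
    return (2 + np.cos(t)) / (3 * n * np.sin(t))

# Sanity check: equality for a regular pentagon of unit area.
R = np.sqrt(2 / (5 * np.sin(2 * np.pi / 5)))
reg = np.stack([[R * np.cos(2 * np.pi * k / 5), R * np.sin(2 * np.pi * k / 5)]
                for k in range(5)])
assert abs(polygon_area(reg) - 1) < 1e-12
assert abs(second_moment(reg, np.zeros(2)) - c_n(5)) < 1e-12

# Inequality (21) for random convex polygons (vertices in angular order on
# the unit circle) about random points z.
for _ in range(200):
    n = int(rng.integers(3, 9))
    angles = np.sort(rng.uniform(0, 2 * np.pi, n))
    verts = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    m = polygon_area(verts)
    z = rng.uniform(-1, 1, 2)
    assert second_moment(verts, z) >= m**2 * c_n(n) - 1e-12
```

The fan decomposition is exact for any position of z (inside or outside the polygon), because the signed triangle contributions cancel outside P.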

Next we recall the proof of Theorem 1.2 due to Gruber [43] for the case \(\alpha =0\), which we will adapt to prove Theorem 1.1 (and consequently Theorem 1.2) for general \(\alpha \in (-\infty ,{\overline{\alpha }}]\). It can be shown that the function

$$\begin{aligned} (m,n)\mapsto g(m,n):=m^2 c_n \end{aligned}$$

is convex. (Note that \(n \mapsto c_n\) can be extended from a function on \({\mathbb {N}} \cap [3,\infty )\) to a function on \([3,\infty )\); see Lemma 2.3.) If \(\mu \) is a minimizer of \(W_2^2({\mathbbm {1}}_\Omega ,\cdot )\) subject to the constraint \(N_\mu \le L\), then clearly \(N_\mu =L\) (assuming that L is an integer) since we get the best constrained approximation of \({\mathbbm {1}}_\Omega \) by taking as many Dirac masses as possible. By convexity,

$$\begin{aligned} m^2 c_n = g(m,n)&\ge g\left( \tfrac{1}{L},6\right) + \nabla g\left( \tfrac{1}{L},6\right) \cdot \left( m-\tfrac{1}{L},n-6 \right) \nonumber \\&=\frac{c_6}{L^2} + \frac{2c_6}{L} \left( m - \tfrac{1}{L} \right) + \frac{\kappa }{L^2} (n-6) \end{aligned}$$
(23)

where \(\kappa :=g_n(1,6)=\partial _n c_n |_{n=6}<0\). Combining equations (22) and (23) gives

$$\begin{aligned} W_2^2({\mathbbm {1}}_\Omega ,\mu ) \ge \sum _{i=1}^{L} \left( \frac{c_6}{L^2} + \frac{2c_6}{L} \left( m_i - \tfrac{1}{L} \right) + \frac{\kappa }{L^2} (n_i-6) \right) = \frac{c_6}{L} + \frac{\kappa }{L^2}\sum _{i=1}^{L}(n_i-6)\nonumber \\ \end{aligned}$$
(24)

since \(\sum _i m_i = |\Omega | = 1\). Euler’s formula for planar graphs implies that the average number of edges in any partition of the unit square \(\Omega \) by convex polygons is less than or equal to 6: \(\frac{1}{L} \sum _{i} n_i \le 6\); see Lemma 2.6. Therefore, by equation (24) and since \(\kappa < 0\),

$$\begin{aligned} L \, \mathrm {m_c}(0,L) \ge c_6. \end{aligned}$$

This is the lower bound in Theorem 1.2 for the case \(\alpha =0\). A matching upper bound can be obtained in the limit \(L \rightarrow \infty \) by taking \(\mu =\sum _{i=1}^L \tfrac{1}{L} \delta _{z_i}\) where \(z_i\) lie on a regular triangular lattice.
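The two ingredients of Gruber's argument, \(\kappa < 0\) and the tangent-plane inequality (23), can be checked numerically. The sketch below assumes the standard analytic extension of \(n \mapsto c_n\) obtained from the regular-polygon moment formula (we assume this agrees with the extension in Lemma 2.3) and computes \(\kappa \) by a central finite difference.

```python
import numpy as np

def c_n(n):  # second moment of the regular unit-area n-gon about its centre
    t = 2 * np.pi / n
    return (2 + np.cos(t)) / (3 * n * np.sin(t))

# kappa = d/dn c_n at n = 6, via central finite difference.
h = 1e-6
kappa = (c_n(6 + h) - c_n(6 - h)) / (2 * h)
assert kappa < 0  # numerically, kappa is approximately -8.7e-4

# Tangent-plane inequality (23) for g(m,n) = m^2 * c_n at (1/L, 6), L = 100,
# checked on a grid of masses m and (real) edge numbers n.
L = 100
c6 = c_n(6)
for m in np.linspace(1e-4, 0.05, 50):
    for n in np.linspace(3, 12, 50):
        lhs = m**2 * c_n(n)
        rhs = c6 / L**2 + (2 * c6 / L) * (m - 1 / L) + (kappa / L**2) * (n - 6)
        assert lhs >= rhs - 1e-12
```

The grid check is of course not a proof of the convexity of g; it merely illustrates the inequality that convexity delivers in Gruber's argument.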

In [17] Gruber’s strategy was generalized to prove Theorem 1.1 for the case \(\alpha =0.5\) and \(f=1\). Thanks to the results of [14] and our results in Sect. 3.2, it follows that Theorem 1.1 holds for all \(\alpha \in (-\infty ,0.5]\) and all lower semi-continuous f satisfying (1). In this paper we extend these ideas further to prove Theorem 1.1 for the case \(\alpha =0.583\), and hence for all \(\alpha \in (-\infty ,0.583]\). First of all, we rescale the square Q as follows (see Remark 1.3):

$$\begin{aligned} Q_{\delta ,\alpha }:= V_{\delta ,\alpha }^{\frac{1}{2}}Q, \qquad V_{\delta ,\alpha }= \left( \frac{c_6}{\delta (1-\alpha )} \right) ^{\frac{1}{2-\alpha }}. \end{aligned}$$

The rescaling factor is chosen in such a way that a discrete measure supported at the centres of regular hexagons of unit area is asymptotically optimal. Up to a multiplicative factor, the rescaled energy is

$$\begin{aligned} {\mathcal {F}}_{\delta ,\alpha }(\mu ):= \frac{c_6}{1-\alpha } \sum _{i=1}^{N_\mu } m_i^\alpha + W^2_2({\mathbbm {1}}_{Q_{\delta ,\alpha }},\mu ), \end{aligned}$$

where \({\mathbbm {1}}_{Q_{\delta ,\alpha }}\) denotes the characteristic function of the square \(Q_{\delta ,\alpha }\). Here \(\mu \) is a Borel measure on \(Q_{\delta ,\alpha }\) of the form \(\mu =\sum _{i=1}^{N_\mu } m_i \delta _{z_i}\) with \(\sum _{i=1}^{N_\mu } m_i = V_{\delta ,\alpha }\). By (22) we have

$$\begin{aligned} {\mathcal {F}}_{\delta ,\alpha }(\mu ) \ge \sum _{i=1}^{N_\mu } \left[ \frac{c_6}{1-\alpha } m_i^\alpha + m_i^2 c_{n_i} \right] = \sum _{i=1}^{N_\mu } g_\alpha (m_i,n_i), \end{aligned}$$

where

$$\begin{aligned} g_\alpha (m,n):=\frac{c_6}{1-\alpha } m^\alpha + m^2 c_n. \end{aligned}$$

Unfortunately, for \(\alpha \in (0,1)\), \(g_\alpha \) is not convex. Our first main technical result is to show that for \(\alpha ={\overline{\alpha }}:=0.583\) there exists \(m_0>0\) such that the following ‘convexity inequality’ holds for all \(m\ge m_0\), \(n \in {\mathbb {N}} \cap [3,\infty )\):

$$\begin{aligned} g_{{\overline{\alpha }}}(m,n)\ge g_{{\overline{\alpha }}}(1,6) + \nabla g_{{\overline{\alpha }}}(1,6)\cdot (m-1,n-6). \end{aligned}$$
(25)

See Lemma 4.11, Corollaries 4.12 and 4.16. Our second main technical result (Lemma 4.15) is to show that if \(\mu =\sum _{i=1}^{N_\mu } m_i \delta _{z_i}\) minimizes \({\mathcal {F}}_{\delta ,\alpha }\), then

$$\begin{aligned} m_i> 2.0620 \times 10^{-4} > m_0. \end{aligned}$$
(26)

Therefore minimizers satisfy the convexity inequality (25), and the proof of Theorem 1.1 now follows using Gruber’s strategy.
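For \({\overline{\alpha }}=0.583\) the ‘convexity inequality’ can also be sampled numerically. The Python sketch below (a sanity check only, with the tangent plane taken at the reference point \((1,6)\); it does not replace the estimates of Sect. 4) tests masses above the threshold appearing in (26):

```python
import math

ALPHA = 0.583  # the exponent alpha-bar treated in this paper

def c(n):
    """c_n from (27), extended to real n >= 3."""
    t = math.pi / n
    return (math.tan(t) / 3 + 1 / math.tan(t)) / (2 * n)

C6 = c(6)

def g_alpha(m, n):
    """g_alpha(m, n) = c_6 / (1 - alpha) * m^alpha + m^2 c_n."""
    return C6 / (1 - ALPHA) * m**ALPHA + m * m * c(n)

# Gradient of g_alpha at the reference point (1, 6): the m-derivative is exact,
# the n-derivative (kappa) is a central difference of c_n.
H = 1e-6
g_m = ALPHA * C6 / (1 - ALPHA) + 2 * C6
kappa = (c(6 + H) - c(6 - H)) / (2 * H)

def tangent(m, n):
    """Tangent plane of g_alpha at (1, 6)."""
    return g_alpha(1, 6) + g_m * (m - 1) + kappa * (n - 6)

# Sample masses above the threshold in (26); the paper proves the inequality
# for all m >= m_0 with m_0 below that threshold.
masses = (5e-4, 1e-3, 1e-2, 0.1, 0.5, 0.9, 1.0, 1.1, 2.0, 5.0)
inequality_ok = all(
    g_alpha(m, n) >= tangent(m, n) - 1e-12 for m in masses for n in range(3, 31)
)
```

Note the coincidence \(\partial _m g_{{\overline{\alpha }}}(1,6) = g_{{\overline{\alpha }}}(1,6) = c_6(2-{\overline{\alpha }})/(1-{\overline{\alpha }})\), which the test below also checks.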

To be precise, we are only able to prove the inequality (26) for particles \(z_i\) that are not too close to the boundary (Lemma 4.15(i)). Nevertheless, we are able to prove a worse lower bound on the mass \(m_i\) of particles near the boundary (Lemma 4.15(ii)), which is still sufficient to show that the number of particles near the boundary is asymptotically negligible. This fixes what appears to be a gap in the proof in [18], where it was tacitly assumed that all of the particles were sufficiently far from the boundary of the rescaled domain (at least distance 3.2143; see the proof of [18, Lemma 7]).

The idea of the proof of (26) is to compare the energy of a minimizer \(\mu \) with that of a competitor \({\widetilde{\mu }}\) that is obtained by gluing the smallest particle of \(\mu \) to one of its neighbours. The proofs of (25) and (26) require some delicate positivity estimates. As in [18], we also use computer evaluation at several points in the proof to check the sign of some explicit numerical constants (all of which are much larger in magnitude than machine precision).

1.3 Literature on crystallization, optimal partitions and quantization

Our work belongs to the very active research programme of establishing crystallization results for nonlocal interacting particle systems. This problem is known as the crystallization conjecture [12]. Despite experimental evidence that many particle systems, such as atoms in metals, have periodic ground states, until recently there were few rigorous mathematical results. Results in one dimension include [11, 39] and results in two dimensions include [3, 7,8,9,10, 18, 30, 35, 50, 63, 64, 69]. Let us recall that a central open problem in mathematical physics is to establish the optimality of the Abrikosov (triangular) lattice for the Ginzburg-Landau energy [68]. In three dimensions there are few rigorous results. Even establishing the optimal configuration of just five charges on a sphere was only achieved in 2013 via a computer-assisted proof [66]. The proof of the Kepler conjecture about optimal sphere packing was also computer-assisted [48, 49], while the optimal sphere covering remains unknown to this day. In even higher dimensions (in particular 8 and 24), more rigorous results become available again, e.g., [26, 27, 70]. For a thorough survey of recent crystallization results for nonlocal particle systems see [12] and [67].

Our result also falls into the field of optimal partitions (see Remark 4.10). The optimality of hexagonal tilings, in various forms of the Honeycomb conjecture, has been proved for example in [21,22,23, 47]. Kelvin’s problem of finding the optimal foam in 3D (the ‘three-dimensional Honeycomb conjecture’) remains unsolved to this day; for over 100 years it was believed that truncated octahedra gave the optimal tessellation, until the remarkable discovery of a better tessellation by Weaire and Phelan [72].

Finally, our result also belongs to the field of optimal quantization or optimal sampling [41, 45, 46, Sect. 33], which concerns the best approximation of a probability measure by a discrete probability measure. The most commonly used notion of best approximation is the Wasserstein distance. This problem has been studied by a wide range of communities including applied analysis [14, 24, 51], computational geometry [33], discrete geometry [28, 45], and probability [41]. Applications include optimal location of resources [13], signal and image compression [34, 42], numerical integration [62, Sect. 2.3], mesh generation [32, 58], finance [62], materials science [19, Sect. 3.2], and particle methods for PDEs (sampling the initial distribution) [15, Example 7.1].

It is well known that if \(\mu =\sum _{i=1}^N m_i \delta _{z_i}\) is a minimizer of \(W_2(f \mathrm {d}x,\cdot )\), then the particles \(z_i\) generate a centroidal Voronoi tessellation (CVT) [33, 55], which means that the particles \(z_i\) lie at the centroids of their Voronoi cells. Numerical methods for computing CVTs include Lloyd’s algorithm [33] and quasi-Newton methods [55]. More generally, minimizers of the penalized energy \(\mu \mapsto \delta \sum _i m_i^\alpha + W_2^2(f \mathrm {d}x,\mu )\) generate centroidal Laguerre tessellations (see Remark 4.6). Numerical methods for solving the constrained and penalized quantization problems include [16] (which was used to produce Figs. 1 and 2) and [73].

There is a large literature on optimal CVTs of N points (global minima of \(\mu \mapsto W_2({\mathbbm {1}}_\Omega ,\mu )\) subject to \(\# \mathrm {supp}(\mu )=N\)). According to Gersho’s conjecture (see [40]), minimizers correspond to regular tessellations consisting of the repetition of a single polytope whose shape depends only on the spatial dimension. In two dimensions the polytope is a hexagon [13, 36, 37, 43, 59, 61] and moreover the result holds for any p-Wasserstein metric, \(p \in [1,\infty )\). Gersho’s conjecture is open in three dimensions, although it is believed that the optimal CVT is a tessellation by truncated octahedra, which is generated by the body-centred cubic (BCC) lattice. Some numerical evidence for this is given in [31], and in [6] it was proved that the BCC lattice is optimal among lattices (but we do not know whether the optimal configuration is in fact a lattice). Geometric properties of optimal CVTs in 3D were recently proved in [25], who also suggested a strategy for a computer-assisted proof of Gersho’s conjecture.

1.4 Organization of the paper

In Sect. 2 we recall some basic notions from optimal transport theory and convex geometry. In Sect. 3.1 we recall from [14] the result (9) for the case \(d=p=2\), namely the scaling of the minimum value of the energy for the constrained problem (11). In Sect. 3.2 we derive the scaling of the minimum value of the energy for the penalized problem (12). These results give the optimal scaling of the minimum values of the constrained and penalized energies, but they do not give the optimal constants. In Sect. 4 we identify the optimal constant for the penalized problem (which proves Theorem 1.1) and in Sect. 5 we identify the optimal constant for the constrained problem (which proves Theorem 1.2). In Sect. 6 we prove the asymptotic crystallization result for all \(\alpha \in (-\infty ,1)\) under an additional assumption. Finally, Sect. 7 is devoted to the proof of Theorem 1.4.

2 Preliminaries

2.1 Main assumptions

We assume that \(\Omega \subset {\mathbb {R}}^2\) is the closure of an open and bounded set, and \(f\in L^1(\Omega )\) is a lower semi-continuous function satisfying \(f\ge c>0\) and

$$\begin{aligned} \int _\Omega f(x) \,\mathrm {d}x =1. \end{aligned}$$

2.2 Notation

Define \({\mathbb {R}}^+:=(0,\infty )\). For a Lebesgue-measurable set \(A\subset {\mathbb {R}}^2\), we denote by |A| its area and by \({\mathbbm {1}}_A\) its characteristic function. We let \({\mathcal {M}}(X)\) denote the set of non-negative finite Borel measures on \(X\subset {\mathbb {R}}^d\) and \({\mathcal {P}}(X)\subset {\mathcal {M}}(X)\) denote the set of probability measures on X. Moreover, we let \({\mathcal {M}}_{\mathrm {d}}(X)\subset {\mathcal {M}}(X)\) be the following set of discrete measures:

$$\begin{aligned} {\mathcal {M}}_{\mathrm {d}}(X):=\left\{ \sum _{i=1}^{N} m_i \delta _{z_i} : N \in {\mathbb {N}}, \, m_i > 0, \, z_i \in X, \, z_i \ne z_j \text { if } i \ne j \right\} . \end{aligned}$$

Recall that \({\mathcal {P}}_{\mathrm {d}}(X)\) denotes the set of discrete probability measures, \({\mathcal {P}}_{\mathrm {d}}(X)={\mathcal {M}}_{\mathrm {d}}(X)\cap {\mathcal {P}}(X)\), and that, for \(\mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega )\), \(N_\mu := \# \mathrm {supp}(\mu )\). For brevity, in an abuse of notation, we denote the preimage of a singleton set \(\{ z\} \subset X\) under a map \(T:X \rightarrow X\) by \(T^{-1}(z)\) instead of \(T^{-1}(\{z\})\).

2.3 Facts from optimal transport theory and convex geometry

We start by recalling the characterization of solutions of the semi-discrete transport problem (2) for the case \(p=2\). The following result goes back to [4] and is now well-known in the optimal transport community; see for example [54, 56, 57].

Lemma 2.1

(Characterization of the optimal transport map). Let \(U\subset {\mathbb {R}}^2\) be a convex polygon, \(\mu = \sum _{i=1}^{N_\mu } m_i \delta _{z_i} \in {\mathcal {P}}_{\mathrm {d}}(U)\), \(g \in L^1(U;{\mathbb {R}}^+)\), \(\int _U g \, \mathrm {d}x =1\), and \(W_2(g,\mu )\) be the Wasserstein metric

$$\begin{aligned} W_2(g,\mu ) =\inf \left\{ \int _U |x-T(x)|^2 g(x)\,\mathrm {d}x \, : \, T : U\rightarrow \{z_i\}_{i=1}^{N_\mu } \text { is Borel}, \, \int _{T^{-1}(z_i) }g(x) \,\mathrm {d}x = m_i \, \forall \, i \right\} ^{\frac{1}{2}}. \end{aligned}$$

Then the infimum is attained and the minimizer \(T:U\rightarrow \{z_i\}_{i=1}^{N_\mu }\) is unique (up to a set of measure zero). Moreover, by possibly modifying T on a set of measure zero, there exists \((w_1,\dots , w_{N_\mu })\in {\mathbb {R}}^{N_\mu }\) such that

$$\begin{aligned} \overline{T^{-1}(z_i)}=\left\{ z\in U : |z-z_i|^2 - w_i \le |z-z_j|^2 - w_j \text { for all } j=1,\dots , N_\mu \right\} . \end{aligned}$$

Remark 2.2

(Laguerre cells). The previous lemma implies that the partition \(\{\overline{T^{-1}(z_i)} \}_{i=1}^{N_\mu }\) is the Laguerre tessellation or power diagram generated by the weighted points \(\{ (z_i, w_i) \}_{i=1}^{N_\mu }\); see [5, 56]. The sets \(\overline{T^{-1}(z_i)}\) are convex polygons, known as Laguerre cells or power cells.

We now recall a classical result by L. Fejes Tóth (see [36, p. 198]), which says that among all n-gons of a given area, the regular n-gon minimizes the second moment about the centroid:

Lemma 2.3

(Fejes Tóth’s Moment Lemma). For \(n\in {\mathbb {N}}\), \(n\ge 3\), define

$$\begin{aligned} c_n:=\inf \left\{ \min _{\xi \in {\mathbb {R}}^2} \int _P |x-\xi |^2 \mathrm {d}x : P \text { is an { n}-gon},\, |P|=1 \right\} . \end{aligned}$$

Then the infimum is attained by a regular n-gon. Consequently a direct calculation gives

$$\begin{aligned} c_n=\frac{1}{2n}\left( \frac{1}{3}\tan \frac{\pi }{n}+\cot \frac{\pi }{n} \right) . \end{aligned}$$
(27)

Remark 2.4

Note that a change of variables gives

$$\begin{aligned} \inf \left\{ \min _{\xi \in {\mathbb {R}}^2} \int _P |x-\xi |^2 \mathrm {d}x : P \text { is an { n}-gon},\, |P|=m \right\} =c_n m^2 \end{aligned}$$

for all \(m>0\).

We extend the definition of \(c_n\) to all \(n\in [3,\infty )\) using equation (27). Its main properties are stated in the next result, whose proof is a direct computation (see [43]).

Lemma 2.5

(Properties of \(c_n\)). The function \(n \mapsto c_n\), \(n \in [3,\infty )\), is convex and decreasing. Moreover

$$\begin{aligned} \lim _{n\rightarrow \infty }c_n=c_\infty :=\frac{1}{2\pi }. \end{aligned}$$
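These properties, together with the value \(c_4=1/6\) (the second moment of the unit square about its centre) and the limit \(c_\infty = 1/(2\pi )\) (the second moment of the unit-area disc about its centre), can be verified numerically from (27). The following Python sketch is a sanity check only:

```python
import math

def c(n):
    """c_n from (27): the minimal second moment of a unit-area n-gon."""
    t = math.pi / n
    return (math.tan(t) / 3 + 1 / math.tan(t)) / (2 * n)

# n = 4 agrees with the direct computation for the unit square:
# int_{[-1/2,1/2]^2} |x|^2 dx = 1/6.
square_moment = 1 / 6

# Monotonicity and convexity on a sample of real n in [3, 20] (Lemma 2.5),
# via first and second differences of the sampled values.
ns = [3 + 0.1 * k for k in range(171)]
vals = [c(n) for n in ns]
decreasing = all(a > b for a, b in zip(vals, vals[1:]))
convex = all(
    vals[i - 1] + vals[i + 1] - 2 * vals[i] >= -1e-12
    for i in range(1, len(vals) - 1)
)
```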

Finally, we recall one more result from convex geometry, which follows from Euler’s polytope formula. It is proved for example in [18, Lemma 4] or [59, Lemma 3.3].

Lemma 2.6

(Partitions by convex polygons). Let \(U \subset {\mathbb {R}}^2\) be a convex polygon with at most 6 sides. In any partition of U by convex polygons, the average number of edges per polygon is less than or equal to 6.

3 Scaling of the Asymptotic Quantization Error

3.1 The constrained optimal location problem

We report here a result about the asymptotic quantization error from [14].

Definition 3.1

(Young measures). Given \(\varepsilon >0\) and a measure \(\mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega )\) of the form

$$\begin{aligned} \mu =\sum _{i=1}^{N_\mu } m_i {\delta _{z_i}}, \end{aligned}$$

define the measures \(\rho (\mu )\in {\mathcal {P}}_{\mathrm {d}}({\mathbb {R}}^+)\) and \(\lambda _\varepsilon (\mu )\in {\mathcal {M}}_{\mathrm {d}}(\Omega \times {\mathbb {R}}^+)\) by

$$\begin{aligned} \rho (\mu ):=\sum _{i=1}^{N_\mu } m_i \delta _{m_i}, \quad \quad \lambda _\varepsilon (\mu ):=\sum _{i=1}^{N_\mu } m_i\delta _{\left( z_i,\frac{m_i}{\varepsilon ^2} \right) }. \end{aligned}$$

Observe that the first marginal of \(\lambda _\varepsilon (\mu )\) is \(\mu \) and that the second marginal of \(\lambda _1(\mu )\) is \(\rho (\mu )\).
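These marginal identities can be illustrated with a toy measure (the masses and points below are arbitrary choices, for illustration only):

```python
from collections import defaultdict

# A toy discrete probability measure mu = sum_i m_i delta_{z_i},
# stored as a list of (mass, point) pairs.
mu = [(0.2, (0.1, 0.3)), (0.5, (0.7, 0.2)), (0.3, (0.4, 0.9))]
eps = 0.05

# rho(mu) = sum_i m_i delta_{m_i}, a measure on the mass axis R^+
rho = defaultdict(float)
for m, _z in mu:
    rho[m] += m

# lambda_eps(mu) = sum_i m_i delta_{(z_i, m_i / eps^2)}, a measure on Omega x R^+
lam_eps = defaultdict(float)
for m, z in mu:
    lam_eps[(z, m / eps**2)] += m

# First marginal of lambda_eps(mu): push forward by (z, t) -> z; recovers mu.
first_marginal = defaultdict(float)
for (z, _t), m in lam_eps.items():
    first_marginal[z] += m

# Second marginal of lambda_1(mu) (i.e. eps = 1): push forward by
# (z, t) -> t; recovers rho(mu).
lam_1 = defaultdict(float)
for m, z in mu:
    lam_1[(z, m)] += m
second_marginal = defaultdict(float)
for (_z, t), m in lam_1.items():
    second_marginal[t] += m
```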

In order to define the cell formula for the asymptotic quantization error, we need to introduce the following metric on the space of probability measures. Given \(\rho _1, \rho _2\in {\mathcal {P}}({\mathbb {R}}^+)\), we define

$$\begin{aligned} \mathrm {d}_{\mathrm {BL}} (\rho _1, \rho _2):= \sup \left\{ \int _{{\mathbb {R}}^+} \varphi \, d(\rho _1-\rho _2) : \varphi \in \mathrm {Lip}({\mathbb {R}}^+), \, |\varphi |_{\infty } + \mathrm {Lip}(\varphi ) \le 1 \right\} , \end{aligned}$$

where \(\mathrm {Lip}({\mathbb {R}}^+)\) is the space of Lipschitz continuous functions on \({\mathbb {R}}^+\) and \( \mathrm {Lip}(\varphi )\) denotes the Lipschitz constant of \(\varphi \). It is well known that \(\mathrm {d}_{\mathrm {BL}}\) metrizes tight convergence (see [29, Theorem 11.3.3]).
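As a concrete illustration, for two Dirac masses \(\delta _a, \delta _b\) the supremum can be evaluated over the family of clipped linear test functions; optimizing the slope gives the closed form \(2d/(d+2)\) with \(d=|a-b|\). (This closed form is our own side computation, not a statement from the text.) A minimal Python sketch:

```python
# For phi(x) = clamp(s * (x - (a + b) / 2), -(1 - s), 1 - s) with 0 <= s <= 1,
# we have |phi|_inf + Lip(phi) <= 1 and phi(a) - phi(b) = min(s * d, 2 * (1 - s)).

def dbl_two_diracs(a, b, steps=200000):
    """Approximate d_BL(delta_a, delta_b) by optimizing over the family above."""
    d = abs(a - b)
    return max(
        min(s * d, 2 * (1 - s)) for s in (k / steps for k in range(steps + 1))
    )

d = 1.5
approx = dbl_two_diracs(0.5, 2.0)
closed_form = 2 * d / (d + 2)  # optimal slope s = 2 / (d + 2)
```

For small d the distance behaves like d itself, and it saturates below 2 as d grows, consistent with the constraint \(|\varphi |_{\infty } + \mathrm {Lip}(\varphi ) \le 1\).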

The energy density of the asymptotic quantization error is introduced as follows.

Definition 3.2

(Cell formula). Given \(t>0\) and \(\rho \in {\mathcal {P}}({\mathbb {R}}^+)\), define

$$\begin{aligned} G_t(\rho ):=\inf _{k>0}\frac{S_t(\rho , Q_k)}{k^2} \end{aligned}$$

where \(Q_k:=[-k/2,k/2]^2\subset {\mathbb {R}}^2\) and

$$\begin{aligned} S_t(\rho , Q_k):=\inf \left\{ W^2_2({\mathbbm {1}}_{Q_k}, \mu ) +\frac{k^2}{t^2}\mathrm {d}_{\mathrm {BL}}(\rho ,\rho (\mu )) : \mu \in {\mathcal {P}}_{\mathrm {d}}(Q_k) \right\} . \end{aligned}$$

Define \(G:{\mathcal {P}}({\mathbb {R}}^+)\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} G(\rho ):=\sup _{t>0} G_t(\rho ) = \lim _{t \rightarrow 0} G_t(\rho ). \end{aligned}$$

Given \(\lambda \in {\mathcal {M}}(\Omega \times {\mathbb {R}}^+)\), let \(\pi _1 \# \lambda \) denote its first marginal, where \(\pi _1 : \Omega \times {\mathbb {R}}^+ \rightarrow \Omega \) is the projection \(\pi _1(x,t)=x\). One of the main results of Bouchitté, Jimenez and Mahadevan [14, Theorem 3.1] is the following:

Theorem 3.3

(Gamma-limit of the quantization error). For \(\varepsilon >0\), let

$$\begin{aligned} {\mathcal {E}}_{\varepsilon }(\lambda ):= {\left\{ \begin{array}{ll} \frac{1}{\varepsilon ^2} W^2_2(f, \mu ) &{} \text {if } \lambda =\lambda _\varepsilon (\mu ), \, \mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega ),\\ + \infty &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(28)

Then \({\mathcal {E}}_{\varepsilon }{\mathop {\rightarrow }\limits ^{\Gamma }}{\mathcal {E}}_{0}\) with respect to tight convergence on \({\mathcal {M}}(\Omega \times {\mathbb {R}}^+)\) where

$$\begin{aligned} {\mathcal {E}}_{0}(\lambda ):= {\left\{ \begin{array}{ll} \displaystyle \int _\Omega G(\lambda ^x) \, \mathrm {d}x&{} \text {if } \lambda =f \mathrm {d}x\otimes \lambda ^x, \\ + \infty &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$

where \(f \mathrm {d}x\otimes \lambda ^x\) denotes the disintegration of \(\lambda \) with respect to \(f \mathrm {d}x\); see [2, Theorem 2.28].

Bouchitté, Jimenez and Mahadevan used Theorem 3.3 to prove the following result about the scaling of the asymptotic quantization error for the constrained optimal location problem; see [14, Lemma 3.10, Proposition 3.11(i)].

Corollary 3.4

(Asymptotic quantization error for the constrained problem). For all \(\alpha \in (-\infty ,1)\),

$$\begin{aligned} \lim _{L\rightarrow \infty }\left[ L^{\frac{1}{1-\alpha }}\,\mathrm {m_c}(\alpha ,L)\right] = C_{\mathrm {c}}(\alpha )\left( \int _\Omega f(x)^{\frac{1}{2-\alpha }} \, \mathrm {d}x \right) ^{\frac{2-\alpha }{1-\alpha }} \end{aligned}$$

where

$$\begin{aligned} C_{\mathrm {c}}(\alpha ) := \min \left\{ G(\rho ) : \rho \in {\mathcal {P}}({\mathbb {R}}^+),\, \int _0^\infty t^{\alpha -1} \,\mathrm {d}\rho (t)\le 1 \right\} . \end{aligned}$$
(29)

Moreover, the function \(\alpha \mapsto C_{\mathrm {c}}(\alpha )\) is non-increasing.

Note that the result proved in [14] holds more generally: in any dimension, for any p-Wasserstein metric, and for more general entropies.

Remark 3.5

Let \(Q \subset {\mathbb {R}}^2\) be a unit square. Taking \(\Omega =Q\) and \(f={\mathbbm {1}}_Q\) in Corollary 3.4 yields

$$\begin{aligned} C_{\mathrm {c}}(\alpha )=\lim _{L\rightarrow \infty } L^{\frac{1}{1-\alpha }} \, \inf \left\{ W^2_2({\mathbbm {1}}_Q,\mu ) : \mu \in {\mathcal {P}}_{\mathrm {d}}(Q),\,\, \sum _{i=1}^{N_\mu } m_i^\alpha \le L \right\} . \end{aligned}$$
(30)

Remark 3.6

(Optimal constant). The constant \(C_{\mathrm {c}}(\alpha )\) in Corollary 3.4 was known explicitly for the case \(\alpha \in (-\infty ,0]\), where \(C_{\mathrm {c}}(\alpha ) = C_{\mathrm {c}}(0)=G(\delta _1)=c_6\) for all \(\alpha \le 0\). We briefly recall the proof: the measure \(\delta _1\) is admissible in (29) since \(\int _0^\infty t^{-1} \,\mathrm {d}\delta _1(t) = 1\). Hence

$$\begin{aligned} c_6 \le C_{\mathrm {c}}(0) {\mathop {\le }\limits ^{(29)}} G(\delta _1) \le c_6 \end{aligned}$$

where the first inequality is Fejes Tóth’s Theorem on Sums of Moments [43] and the final inequality follows from [14, Prop. 3.2(iv)]. Therefore \(C_{\mathrm {c}}(0) =c_6\). In addition, \( C_{\mathrm {c}}(\alpha ) \ge C_{\mathrm {c}}(0) = c_6\) for all \(\alpha \le 0\) by the monotonicity of the map \(\alpha \mapsto C_{\mathrm {c}}(\alpha )\) (see Corollary 3.4). On the other hand, \( C_{\mathrm {c}}(\alpha ) \le c_6\) by [14, Prop. 3.2(iv)]. We conclude that \( C_{\mathrm {c}}(\alpha ) = c_6\) for all \(\alpha \le 0\), as required. One of our contributions is to prove that \(C_{\mathrm {c}}(\alpha )=c_6\) for all \(\alpha \in (-\infty ,0.583]\); see Sect. 5.

3.2 The penalized optimal location problem

Here we prove analogous results to those presented in the previous section.

Definition 3.7

(Penalized energy). Let \(\delta >0\) and \(\alpha \in (-\infty ,1)\). Define \({\mathcal {E}}_{\delta ,\alpha } : {\mathcal {P}}_{\mathrm {d}}(\Omega ) \rightarrow [0,\infty )\) by

$$\begin{aligned} {\mathcal {E}}_{\delta ,\alpha }(\mu ):=\delta \sum _{i=1}^{N_\mu } m_i^\alpha + W^2_2(f,\mu ) \end{aligned}$$

where \(\mu = \sum _{i=1}^{N_\mu } m_i \delta _{z_i}\).

Proposition 3.8

(Gamma-limit of the penalized energy). Let \(\delta >0\), \(\alpha \in (-\infty ,1)\) and

$$\begin{aligned} \varepsilon _{\delta ,\alpha }:=\left( \frac{\delta (1-\alpha )}{c_6}\right) ^{\frac{1}{2(2-\alpha )}}. \end{aligned}$$

Define the rescaled penalized energy \({\widetilde{{\mathcal {E}}}}_{\delta ,\alpha }:{\mathcal {M}}(\Omega \times {\mathbb {R}}^+)\rightarrow [0,\infty ]\) by

$$\begin{aligned} {\widetilde{{\mathcal {E}}}}_{\delta ,\alpha }(\lambda ):= {\left\{ \begin{array}{ll} \varepsilon _{\delta ,\alpha }^{-2} \, {\mathcal {E}}_{\delta ,\alpha }(\mu ) &{} \text {if } \lambda =\lambda _{\varepsilon _{\delta ,\alpha }}(\mu ), \, \mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega ),\\ + \infty &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Then \({\widetilde{{\mathcal {E}}}}_{\delta ,\alpha }{\mathop {\rightarrow }\limits ^{\Gamma }}{\mathcal {G}}_\alpha \) as \(\delta \rightarrow 0\) with respect to tight convergence on \({\mathcal {M}}(\Omega \times {\mathbb {R}}^+)\) where

$$\begin{aligned} {\mathcal {G}}_\alpha (\lambda ):= {\left\{ \begin{array}{ll} \displaystyle \int _\Omega \left[ G(\lambda ^x) + f(x)\frac{c_6}{1-\alpha }\int _0^{\infty } t^{\alpha -1} \,\mathrm {d}\lambda ^x(t) \right] \, \mathrm {d}x &{} \text {if } \lambda =f \mathrm {d}x\otimes \lambda ^x, \\ +\infty &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
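Note that \(\varepsilon _{\delta ,\alpha }^{-2} = V_{\delta ,\alpha }\), where \(V_{\delta ,\alpha }\) is the rescaling factor from Sect. 1.2, so that \(Q_{\delta ,\alpha } = \varepsilon _{\delta ,\alpha }^{-1} Q\). This identity can be checked numerically (a sanity check only):

```python
import math

C6 = (math.tan(math.pi / 6) / 3 + 1 / math.tan(math.pi / 6)) / 12  # c_6 from (27)

def eps(delta, alpha):
    """eps_{delta, alpha} as in Proposition 3.8."""
    return (delta * (1 - alpha) / C6) ** (1 / (2 * (2 - alpha)))

def V(delta, alpha):
    """V_{delta, alpha} as in the rescaling of Sect. 1.2."""
    return (C6 / (delta * (1 - alpha))) ** (1 / (2 - alpha))

# V = eps^{-2} for any delta > 0 and alpha < 1; sample a few parameter pairs.
samples = [(0.01, 0.583), (0.1, -1.0), (1e-4, 0.5)]
identity_ok = all(
    abs(V(d, a) - eps(d, a) ** -2) <= 1e-9 * V(d, a) for d, a in samples
)
```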

To prove Proposition 3.8 we need the following technical result from [14, Lemma 6.3], which says that we can modify \(\mu \) to remove asymptotically small Dirac masses (as \(\delta \rightarrow 0\)) without increasing the energy \({\widetilde{{\mathcal {E}}}}_{\delta ,\alpha }(\mu )\) too much.

Lemma 3.9

Let \(\lambda =f \mathrm {d}x \otimes \lambda ^x \in {\mathcal {M}}(\Omega \times {\mathbb {R}}^+)\) satisfy \( {\mathcal {E}}_0(\lambda )<\infty \). Then, for every \(\gamma >1\), there exists a decreasing sequence \((t_k)_{k\in {\mathbb {N}}} \subset (0,\infty )\), \(t_k \rightarrow 0\), and a doubly-indexed sequence \(( \lambda _ \varepsilon ^k )_{\varepsilon >0, k\in {\mathbb {N}}}\subset {\mathcal {M}}(\Omega \times {\mathbb {R}}^+)\) satisfying the following:

  1. (i)

    \(\lambda _\varepsilon ^k\) is supported in \(\Omega \times [t_k,\infty )\);

  2. (ii)

    \(\limsup _{k \rightarrow \infty } \limsup _{\varepsilon \rightarrow 0} \Vert \lambda ^k_ \varepsilon - \lambda \Vert = 0\), where \(\Vert \cdot \Vert \) denotes the total variation norm on the space of signed measures on \(\Omega \times {\mathbb {R}}^+\);

  3. (iii)

    for all \(\alpha \in (-\infty ,1)\), \(k \in {\mathbb {N}}\),

    $$\begin{aligned} \limsup _{\varepsilon \rightarrow 0} \int _{\Omega \times (0,\infty )} t^{\alpha -1} \, \mathrm {d}\lambda _\varepsilon ^k(x,t) \le \int _{\Omega \times (0,\infty )} t^{\alpha -1} \, \mathrm {d}\lambda (x,t); \end{aligned}$$
  4. (iv)

    there exists \(\mu _\varepsilon ^k \in {\mathcal {P}}_{\mathrm {d}}(\Omega )\) such that \(\lambda _\varepsilon ^k=\lambda _\varepsilon (\mu _\varepsilon ^k)\) and

    $$\begin{aligned} \limsup _{k\rightarrow \infty } \, \limsup _{\varepsilon \rightarrow 0} \, \varepsilon ^{-2} \, W_2^2(f, \mu ^k_\varepsilon ) \le \gamma \int _\Omega G(\lambda ^x) \, \mathrm {d}x. \end{aligned}$$

Proof of Proposition 3.8

For \(\mu \in {\mathcal {P}}_{\mathrm {d}}(\Omega )\), \(\lambda =\lambda _{\varepsilon _{\delta ,\alpha }}(\mu )\), we can write

$$\begin{aligned} {\widetilde{{\mathcal {E}}}}_{\delta ,\alpha }(\lambda )&=\frac{c_6}{1-\alpha }\varepsilon _{\delta ,\alpha }^{2(1-\alpha )}\sum _{i=1}^{N_\mu }m_i^\alpha + \frac{1}{\varepsilon _{\delta ,\alpha }^2}W^2_2(f,\mu ) \\&=\frac{c_6}{1-\alpha }\int _{\Omega \times (0,\infty )} t^{\alpha -1}\, \mathrm {d}\lambda _{\varepsilon _{\delta ,\alpha }}(\mu )(x,t) + \frac{1}{\varepsilon _{\delta ,\alpha }^2}W^2_2(f,\mu ). \end{aligned}$$
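The change of variables in the first term, \(\sum _i m_i^\alpha \, \varepsilon ^{2(1-\alpha )} = \int t^{\alpha -1} \, \mathrm {d}\lambda _\varepsilon (\mu )\), can be checked on a toy measure (arbitrary masses, for illustration only):

```python
# lambda_eps(mu) puts mass m_i at height t_i = m_i / eps^2, and
# m_i * t_i^(alpha - 1) = m_i^alpha * eps^(2 * (1 - alpha)).
alpha = 0.583
eps = 0.05
masses = [0.2, 0.5, 0.3]  # toy masses; the points z_i play no role here

lhs = eps ** (2 * (1 - alpha)) * sum(m**alpha for m in masses)
rhs = sum(m * (m / eps**2) ** (alpha - 1) for m in masses)
```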

Since the function \(t\mapsto t^{\alpha -1}\) is unbounded, and thus the first term of \({\widetilde{{\mathcal {E}}}}_{\delta ,\alpha }(\lambda )\) is not continuous in \(\lambda \), the \(\Gamma \)-convergence result does not follow directly from Theorem 3.3 and the stability of \(\Gamma \)-limits under continuous perturbations. We therefore reason as follows.

Step 1: liminf inequality. Fix \((\delta _n)_{n\in {\mathbb {N}}}\) with \(\delta _n\rightarrow 0\) as \(n\rightarrow \infty \). Let \(\lambda \in {\mathcal {M}}(\Omega \times {\mathbb {R}}^+)\) and \((\lambda _n)_{n\in {\mathbb {N}}}\subset {\mathcal {M}}(\Omega \times {\mathbb {R}}^+)\) satisfy \(\lambda _n\rightarrow \lambda \) tightly. Without loss of generality we can assume that

$$\begin{aligned} \liminf _{n\rightarrow \infty } {\widetilde{{\mathcal {E}}}}_{\delta _n,\alpha }(\lambda _n)<\infty . \end{aligned}$$
(31)

Therefore there exists \((\mu _n)_{n\in {\mathbb {N}}}\subset {\mathcal {P}}_{\mathrm {d}}(\Omega )\) such that \(\lambda _n=\lambda _{\varepsilon _{\delta _n,\alpha }}(\mu _n)\). Observe that \(\pi _1 \# \lambda _n = \mu _n\). By (31), and since \(W_2\) metrizes weak convergence of measures [65, Theorem 5.9], we have \(\mu _n \rightarrow f \mathrm {d}x\) as \(n \rightarrow \infty \). Therefore \(\pi _1 \# \lambda = \lim _{n \rightarrow \infty } \pi _1 \# \lambda _n = f \mathrm {d}x\). By the Disintegration Theorem [2, Theorem 2.28] there exists \(\lambda ^x \in {\mathcal {M}}({\mathbb {R}}^+)\) satisfying \(\lambda =f \mathrm {d}x\otimes \lambda ^x\).

For \(M>0\) define the continuous bounded function \(g_M:(0,\infty ) \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} g_M(t):=\min \{ t^{\alpha -1}, M\}. \end{aligned}$$

Then, by using the liminf inequality of Theorem 3.3, we get

$$\begin{aligned} \liminf _{n\rightarrow \infty } {\widetilde{{\mathcal {E}}}}_{\delta _n,\alpha }(\lambda _n)&\ge \liminf _{n\rightarrow \infty }\left( \frac{c_6}{1-\alpha }\int _{\Omega \times (0,\infty )} g_M(t)\, \mathrm {d}\lambda _{\varepsilon _{\delta _n,\alpha }}(\mu _n)(x,t) + \frac{1}{\varepsilon _{\delta _n,\alpha }^2}W^2_2(f,\mu _n)\right) \\&\ge \frac{c_6}{1-\alpha } \int _\Omega \left( \int _0^{\infty } g_M(t)\,\mathrm {d}\lambda ^x(t) \right) f(x) \, \mathrm {d}x + \int _\Omega G(\lambda ^x) \,\mathrm {d}x. \end{aligned}$$

Since the function \(g_M\) is non-negative and pointwise non-decreasing in M, we obtain the liminf inequality by passing to the limit \(M \rightarrow \infty \) using the Monotone Convergence Theorem.

Step 2: limsup inequality. Let \(\lambda =f \mathrm {d}x \otimes \lambda ^x \in {\mathcal {M}}(\Omega \times {\mathbb {R}}^+)\) satisfy \({\mathcal {G}}_\alpha (\lambda ) < \infty \), which implies that \({\mathcal {E}}_0(\lambda )<\infty \). Let \(\gamma >1\). By Lemma 3.9(iii),(iv), there exists a decreasing sequence \((t_k)_{k\in {\mathbb {N}}} \subset (0,\infty )\), a sequence \(\delta _n \rightarrow 0\), and a doubly-indexed sequence \(( \lambda _{\varepsilon _{\delta _n,\alpha }}^k )_{n, k\in {\mathbb {N}}}\subset {\mathcal {M}}(\Omega \times {\mathbb {R}}^+)\) such that

$$\begin{aligned} \limsup _{k\rightarrow \infty } \, \limsup _{n \rightarrow \infty } \, {\widetilde{{\mathcal {E}}}}_{\delta _n,\alpha } \left( \lambda _{\varepsilon _{\delta _n,\alpha }}^k \right) \le \frac{c_6}{1-\alpha } \int _{\Omega \times (0,\infty )} t^{\alpha -1} \, \mathrm {d}\lambda (x,t) + \gamma \int _\Omega G(\lambda ^x) \, \mathrm {d}x\le \gamma {\mathcal {G}}_\alpha (\lambda ). \end{aligned}$$

By a diagonalization argument and Lemma 3.9(ii), we can find a subsequence \(\delta _n\) (not relabelled) and indices \(k_n \rightarrow \infty \) such that \(\lambda _{\varepsilon _{\delta _{n},\alpha }}^{k_n} \rightarrow \lambda \) tightly as \(n \rightarrow \infty \) and

$$\begin{aligned} \limsup _{n \rightarrow \infty } \, {\widetilde{{\mathcal {E}}}}_{\delta _n,\alpha } \left( \lambda _{\varepsilon _{\delta _{n},\alpha }}^{k_n} \right) \le \gamma {\mathcal {G}}_\alpha (\lambda ). \end{aligned}$$

Since \(\gamma >1\) is arbitrary, the limsup inequality follows. \(\square \)

Corollary 3.10

(Asymptotic quantization error for the penalized problem). For all \(\alpha \in (-\infty ,1)\),

$$\begin{aligned} \lim _{\delta \rightarrow 0}\left[ \left( \frac{c_6}{\delta (1-\alpha )}\right) ^{\frac{1}{2-\alpha }} \mathrm {m_p}(\alpha ,\delta )\right] = C_{\mathrm {p}}(\alpha ) \int _\Omega f(x)^{\frac{1}{2-\alpha }} \,\mathrm {d}x \end{aligned}$$
(32)

where

$$\begin{aligned} C_{\mathrm {p}}(\alpha ):=\min \left\{ G(\rho )+\frac{c_6}{1-\alpha }\int _0^{\infty } t^{\alpha -1} \,\mathrm {d}\rho (t) : \rho \in {\mathcal {P}}({\mathbb {R}}^+)\right\} . \end{aligned}$$
(33)

Proof

Step 1. The functional \({\widetilde{{\mathcal {E}}}}_{\delta ,\alpha }\) has at least one minimizer (by [24, Theorem 2.1]), sequences \((\lambda _\delta )\) with bounded energy have tightly convergent subsequences (by [14, Theorem 3.1(i)]), and \({\widetilde{{\mathcal {E}}}}_{\delta ,\alpha }\) \(\Gamma \)-converges to \({\mathcal {G}}_\alpha \) (by Proposition 3.8). Therefore a standard result in the theory of \(\Gamma \)-convergence implies that the minimum value of \({\widetilde{{\mathcal {E}}}}_{\delta ,\alpha }\) converges to the minimum value of \({\mathcal {G}}_\alpha \):

$$\begin{aligned} \lim _{\delta \rightarrow 0}\left[ \left( \frac{c_6}{\delta (1-\alpha )}\right) ^{\frac{1}{2-\alpha }} \mathrm {m_p}(\alpha ,\delta )\right] =\min _{{\mathcal {M}}(\Omega \times {\mathbb {R}}^+)} {\mathcal {G}}_\alpha . \end{aligned}$$

We are thus left with proving that

$$\begin{aligned} \min _{{\mathcal {M}}(\Omega \times {\mathbb {R}}^+)} {\mathcal {G}}_\alpha = C_{\mathrm {p}}(\alpha ) \int _\Omega f(x)^{\frac{1}{2-\alpha }} \,\mathrm {d}x, \end{aligned}$$
(34)

where \(C_{\mathrm {p}}(\alpha )\) is defined in (33).

Step 2. For each \(x \in \Omega \), define \({\mathcal {G}}_\alpha ^x:{\mathcal {P}}(\mathbb {R^+}) \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} {\mathcal {G}}^x_\alpha (\rho ) := G(\rho )+f(x)\frac{c_6}{1-\alpha }\int _0^{\infty } t^{\alpha -1} \, \mathrm {d}\rho (t). \end{aligned}$$

By definition, if \(\lambda =f \mathrm {d}x\otimes \lambda ^x\),

$$\begin{aligned} {\mathcal {G}}_\alpha (\lambda )=\int _\Omega {\mathcal {G}}^x_\alpha (\lambda ^x) \, \mathrm {d}x. \end{aligned}$$
(35)

For each \(x \in \Omega \), \({\mathcal {G}}^x_\alpha \) is lower semi-continuous since G is lower semi-continuous [14, Prop. 3.2(i)] and since \(\rho \mapsto \int _0^\infty t^{\alpha -1} \, \mathrm {d}\rho (t)\) is lower semi-continuous [65, Lemma 1.6]. By [14, Prop. 3.2(iv)],

$$\begin{aligned} \gamma _{2,2} \int _0^\infty t \, \mathrm {d}\rho (t) \le G(\rho ) \end{aligned}$$

where \(\gamma _{2,2} = \int _{B_1(0)} |x|^2 \, \mathrm {d}x\). Therefore, for each \(x \in \Omega \), minimizing sequences for \({\mathcal {G}}^x_\alpha \) are tight. Consequently \({\mathcal {G}}^x_\alpha \) has at least one minimizer.

We claim that there exists a Borel measurable function \(x \mapsto \rho ^x \in {\mathcal {P}}({\mathbb {R}}^+)\), \(x \in \Omega \), such that

$$\begin{aligned} {\mathcal {G}}^x_\alpha (\rho ^x) = \min _{{\mathcal {P}}({\mathbb {R}}^+)} {\mathcal {G}}^x_\alpha . \end{aligned}$$
(36)

This will follow from Aumann’s Selection Theorem (see [38, Theorem 6.10]) once we prove that the graph of the multifunction \(\Gamma : \Omega \rightarrow 2^{{\mathcal {P}}({\mathbb {R}}^+)} \setminus \{ \emptyset \}\), defined by \(\Gamma (x):=\mathrm {argmin} \, {\mathcal {G}}^x_\alpha \), belongs to \({\mathcal {B}}(\Omega )\otimes {\mathcal {B}}({\mathcal {P}}({\mathbb {R}}^+))\), the product \(\sigma \)-algebra of the Borel sets of \(\Omega \) and the Borel sets of \({\mathcal {P}}({\mathbb {R}}^+)\). To prove this, we define the function \(\Psi :{\mathbb {R}}^+ \times {\mathcal {P}}({\mathbb {R}}^+) \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} \Psi (s,\rho ):= G(\rho )+s \frac{c_6}{1-\alpha }\int _0^{\infty } t^{\alpha -1} \, \mathrm {d}\rho (t). \end{aligned}$$

In the following, the target space \({\mathbb {R}}\) will always be equipped with the Borel \(\sigma \)-algebra. For each \(\rho \in {\mathcal {P}}({\mathbb {R}}^+)\), the function \(s \mapsto \Psi (s,\rho )\) is continuous. For each \(s \in {\mathbb {R}}^+\), the function \(\rho \mapsto \Psi (s,\rho )\) is lower semi-continuous and hence \({\mathcal {B}}(\mathcal {P({\mathbb {R}}^+)})\)-measurable. Therefore \(\Psi \) is a Carathéodory function and hence \({\mathcal {B}}({\mathbb {R}}^+) \otimes {\mathcal {B}}(\mathcal {P({\mathbb {R}}^+)})\)-measurable (see, e.g., [1, Lemma 4.51]). Define the composite function \(\Phi :\Omega \times {\mathcal {P}}({\mathbb {R}}^+) \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} \Phi (x,\rho ):=\Psi (f(x),\rho ) = {\mathcal {G}}^x_\alpha (\rho ). \end{aligned}$$

This is \({\mathcal {B}}(\Omega ) \otimes {\mathcal {B}}(\mathcal {P({\mathbb {R}}^+)})\)-measurable since f and \(\Psi \) are Borel measurable.

We claim that the map \(x \mapsto \min _{\rho \in {\mathcal {P}}({\mathbb {R}}^+)}\Phi (x,\rho )\) is \({\mathcal {B}}(\Omega )\)-measurable. Granting this claim for the moment, the function \({\overline{\Phi }}:\Omega \times {\mathcal {P}}({\mathbb {R}}^+) \rightarrow {\mathbb {R}}\) defined by \({\overline{\Phi }}(x,\nu ):=\min _{\rho \in {\mathcal {P}}({\mathbb {R}}^+)}\Phi (x,\rho )\) is \({\mathcal {B}}(\Omega )\otimes {\mathcal {B}}(\mathcal {P({\mathbb {R}}^+)})\)-measurable (since \({\overline{\Phi }}\) is constant in its second argument). The required \({\mathcal {B}}(\Omega )\otimes {\mathcal {B}}(\mathcal {P({\mathbb {R}}^+)})\)-measurability of the graph of the multifunction \(\Gamma \) then follows by noticing that

$$\begin{aligned} \mathrm {graph}(\Gamma ) = (\Phi - {\overline{\Phi }})^{-1}(\{0\}). \end{aligned}$$

To show that \(x \mapsto \min _{\rho \in {\mathcal {P}}({\mathbb {R}}^+)}\Phi (x,\rho )\) is \({\mathcal {B}}(\Omega )\)-measurable, we write it as the composite function \(x \mapsto f(x) \mapsto \min _{\rho \in {\mathcal {P}}({\mathbb {R}}^+)}\Psi (f(x),\rho )\). This is \({\mathcal {B}}(\Omega )\)-measurable since \(x \mapsto f(x)\) is \({\mathcal {B}}(\Omega )\)-measurable and \(s \mapsto \min _{\rho \in {\mathcal {P}}({\mathbb {R}}^+)}\Psi (s,\rho )\) is the pointwise infimum of a family of continuous functions, hence upper semi-continuous and \({\mathcal {B}}({\mathbb {R}}^+)\)-measurable. This completes the proof that there exists a Borel measurable function \(x \mapsto \rho ^x \in {\mathcal {P}}({\mathbb {R}}^+)\) satisfying (36).

Step 3. Define \(\lambda :=f\mathrm {d}x \otimes \rho ^x \in {\mathcal {M}}(\Omega \times {\mathbb {R}}^+)\), where \(\rho ^x\) is the minimizer of \({\mathcal {G}}^x_\alpha \) constructed in Step 2 (note that \(\lambda \) is well-defined by [2, Definition 2.27] since \(x \mapsto \rho ^x\) is Borel measurable and hence Lebesgue measurable). By equations (35), (36), \(\lambda \) is a minimizer of \({\mathcal {G}}_\alpha \) and

$$\begin{aligned} \min _{{\mathcal {M}}(\Omega \times {\mathbb {R}}^+)} {\mathcal {G}}_\alpha = \int _\Omega \min \left\{ {\mathcal {G}}^x_\alpha (\rho ) : \rho \in {\mathcal {P}}({\mathbb {R}}^+)\right\} \, \mathrm {d}x. \end{aligned}$$
(37)

We now rewrite

$$\begin{aligned} \min \left\{ {\mathcal {G}}^x_\alpha (\rho ) : \rho \in {\mathcal {P}}({\mathbb {R}}^+)\right\} \end{aligned}$$

as follows. For \(a>0\), define the dilation \(L^a:{\mathbb {R}}^+\rightarrow {\mathbb {R}}^+\) by \(L^a(t):=at\). Let \(\rho \in {\mathcal {P}}({\mathbb {R}}^+)\) and consider the push-forward \(\rho _a:=L^a\#\rho \in {\mathcal {P}}({\mathbb {R}}^+)\). It was proved in [14, Prop. 3.2(ii)] that

$$\begin{aligned} G(\rho _a)=a G(\rho ). \end{aligned}$$
(38)

Note that

$$\begin{aligned} \int _0^{\infty } t^{\alpha -1} \,\mathrm {d}\rho _a(t) = a^{\alpha -1} \int _0^{\infty } t^{\alpha -1} \,\mathrm {d}\rho (t). \end{aligned}$$
(39)
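The dilation identity (39) is a direct change of variables. As a quick numerical sanity check (not part of the proof), the following Python snippet verifies it for an arbitrary discrete measure \(\rho \) and illustrative values of a and \(\alpha \):

```python
import math

# Illustrative data (not from the paper): a discrete rho on R^+, a
# dilation factor a, and an exponent alpha < 1.
alpha = 0.583
a = 2.5
atoms = [0.4, 1.3, 2.7]        # support points t_j of rho
weights = [0.2, 0.5, 0.3]      # masses m_j, summing to 1

# Moment of the push-forward rho_a = L^a # rho: its atoms sit at a*t_j
# with unchanged weights.
lhs = sum(w * (a * t) ** (alpha - 1.0) for t, w in zip(atoms, weights))
# Right-hand side of (39).
rhs = a ** (alpha - 1.0) * sum(w * t ** (alpha - 1.0) for t, w in zip(atoms, weights))
assert math.isclose(lhs, rhs, rel_tol=1e-12)
```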

Fix \(x\in \Omega \) and let \(a:=f(x)^{-\frac{1}{2-\alpha }}\). By (38) and (39) we can write

$$\begin{aligned} {\mathcal {G}}^x_\alpha (\rho ) = G(\rho )+f(x)\frac{c_6}{1-\alpha }\int _0^{\infty } t^{\alpha -1} \, \mathrm {d}\rho (t) =f(x)^{\frac{1}{2-\alpha }} \left[ G(\rho _a)+\frac{c_6}{1-\alpha }\int _0^{\infty } t^{\alpha -1} \, \mathrm {d}\rho _a(t) \right] . \end{aligned}$$
(40)

Therefore, by using (40) and the definition of \(C_{\mathrm {p}}(\alpha )\) (see (33)), we have that

$$\begin{aligned} \min \left\{ {\mathcal {G}}^x_\alpha (\rho ) : \rho \in {\mathcal {P}}({\mathbb {R}}^+)\right\}&= \min \left\{ G(\rho )+f(x)\frac{c_6}{1-\alpha }\int _0^{\infty } t^{\alpha -1} \, \mathrm {d}\rho (t) : \rho \in {\mathcal {P}}({\mathbb {R}}^+)\right\} \nonumber \\&= f(x)^{\frac{1}{2-\alpha }} C_{\mathrm {p}}(\alpha ) \end{aligned}$$
(41)

for all \(x\in \Omega \). By combining (37) and (41) we prove (34) and conclude the proof. \(\square \)

Remark 3.11

Let \(Q \subset {\mathbb {R}}^2\) be a unit square. Taking \(\Omega =Q\) and \(f={\mathbbm {1}}_Q\) in Corollary 3.10 yields

$$\begin{aligned} C_{\mathrm {p}}(\alpha )=\ \lim _{\delta \rightarrow 0}\left( \frac{c_6}{\delta (1-\alpha )}\right) ^{\frac{1}{2-\alpha }} \min \left\{ \delta \sum _{i=1}^{N_\mu } m_i^\alpha + W^2_2({\mathbbm {1}}_Q,\mu ) : \mu \in {\mathcal {P}}_{\mathrm {d}}(Q) \right\} . \end{aligned}$$
(42)

By Corollary 3.10, in order to prove Theorem 1.1 it is sufficient to prove that

$$\begin{aligned} C_{\mathrm {p}}(\alpha ) = \frac{2-\alpha }{1-\alpha } \, c_6 \end{aligned}$$

for all \(\alpha \in (-\infty ,{\overline{\alpha }}]\). The next result, which is analogous to the monotonicity of the map \(\alpha \mapsto C_{\mathrm {c}}(\alpha )\), implies that in order to prove Theorem 1.1 for all \(\alpha \in (-\infty ,{\overline{\alpha }}]\), it is sufficient to prove it for the single value \(\alpha ={\overline{\alpha }}\).

Lemma 3.12

(Monotonicity of the constant \(C_{\mathrm {p}}\)). Assume that for some \({\tilde{\alpha }}\in (-\infty ,1)\),

$$\begin{aligned} C_{\mathrm {p}}({\tilde{\alpha }}) = \frac{2-{\tilde{\alpha }}}{1-{\tilde{\alpha }}} \, c_6. \end{aligned}$$

Then

$$\begin{aligned} C_{\mathrm {p}}(\alpha ) = \frac{2-\alpha }{1-\alpha } \, c_6 \end{aligned}$$

for every \(\alpha <{\tilde{\alpha }}\).

Proof

Recall from Remark 3.6 that \(c_6 = G(\delta _1)\). By (33), for all \(\alpha \in (-\infty ,1)\),

$$\begin{aligned} C_{\mathrm {p}}(\alpha ) \le G(\delta _1) + \frac{c_6}{1-\alpha } \int _0^\infty t^{\alpha -1} \, \mathrm {d}\delta _1(t) = c_6 + \frac{c_6}{1-\alpha } = \frac{2-\alpha }{1-\alpha } \, c_6. \end{aligned}$$
(43)

Write \(C_{\mathrm {p}}(\alpha )=\min \{ F(\alpha ,\rho ) : \rho \in {\mathcal {P}}({\mathbb {R}}^+)\}\), where

$$\begin{aligned} F(\alpha ,\rho ) = G(\rho )+\frac{c_6}{1-\alpha }\int _0^{\infty } t^{\alpha -1} \, \mathrm {d}\rho (t). \end{aligned}$$

For all \( \rho \in {\mathcal {P}}({\mathbb {R}}^+)\), \(\alpha \in (-\infty ,1)\), we have

$$\begin{aligned} F(\alpha ,\rho ) - {F(\alpha ,\delta _1)} = G(\rho ) - G(\delta _1) + c_6 \int _0^\infty \frac{t^{\alpha -1}-1}{1-\alpha } \, \mathrm {d}\rho (t). \end{aligned}$$

Let \(\phi (t,\alpha )=(t^{\alpha -1}-1)/(1-\alpha )\) denote the integrand on the right-hand side. Then

$$\begin{aligned} (1-\alpha )^2 \partial _\alpha \phi (t,\alpha ) = t^{\alpha -1}( 1+(1-\alpha )\ln t )-1 =: \psi (t,\alpha ). \end{aligned}$$

Since \(\partial _\alpha \psi (t,\alpha )=(1-\alpha )(\ln t)^2 t^{\alpha -1}\ge 0\) and \(\psi (t,1)=0\), we obtain that \(\psi (t,\alpha )\le 0\) for all \(t\in (0,\infty )\), \(\alpha \in (-\infty ,1)\). Therefore \(\partial _\alpha \phi \le 0\) and \(\phi \) is non-increasing in \(\alpha \). Consequently the map \(\alpha \mapsto F(\alpha ,\rho ) - {F(\alpha ,\delta _1)}\) is non-increasing.
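The monotonicity of \(\phi \) in \(\alpha \) can also be checked numerically. The following sketch (illustrative grid values only) samples \(\phi (t,\alpha )\) and confirms that it is non-increasing in \(\alpha \):

```python
import math

def phi(t, alpha):
    """phi(t, alpha) = (t^(alpha - 1) - 1) / (1 - alpha), for alpha < 1."""
    return (t ** (alpha - 1.0) - 1.0) / (1.0 - alpha)

# Check on an illustrative grid that alpha -> phi(t, alpha) is
# non-increasing, as proved above via the sign of psi.
ts = [0.1, 0.5, 1.0, 2.0, 10.0]
alphas = [-5.0 + 0.05 * k for k in range(117)]  # grid in [-5, 0.8]
for t in ts:
    vals = [phi(t, a) for a in alphas]
    assert all(x >= y - 1e-12 for x, y in zip(vals, vals[1:]))
```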

Let \({\tilde{\alpha }}\in (-\infty ,1)\) be such that

$$\begin{aligned} C_{\mathrm {p}}({\tilde{\alpha }}) = \frac{2-{\tilde{\alpha }}}{1-{\tilde{\alpha }}} \, c_6 = {F({\tilde{\alpha }},\delta _1)}. \end{aligned}$$

For all \(\rho \in {\mathcal {P}}({\mathbb {R}}^+)\) and all \(\alpha \in (-\infty ,{\tilde{\alpha }}]\),

$$\begin{aligned} F(\alpha ,\rho ) - {F(\alpha ,\delta _1)} \ge F({\tilde{\alpha }},\rho ) - {F({\tilde{\alpha }},\delta _1)} \ge C_{\mathrm {p}}({\tilde{\alpha }}) - {F({\tilde{\alpha }},\delta _1)} = 0. \end{aligned}$$

Taking the infimum over \(\rho \) gives

$$\begin{aligned} C_{\mathrm {p}}(\alpha ) \ge {F(\alpha ,\delta _1)} = \frac{2-\alpha }{1-\alpha } \, c_6. \end{aligned}$$
(44)

Combining (43) and (44) completes the proof. \(\square \)

4 The penalized optimal location problem: Proof of Theorem 1.1

This section is devoted to the proof of Theorem 1.1. In particular, we prove that

$$\begin{aligned} C_{\mathrm {p}}({\overline{\alpha }}) = \frac{2 - {\overline{\alpha }}}{1-{\overline{\alpha }}} \, c_6. \end{aligned}$$

The upper bound is easy to prove:

Lemma 4.1

(Upper bound on \(C_{\mathrm {p}}(\alpha )\)). For all \(\alpha \in (-\infty ,1)\),

$$\begin{aligned} C_{\mathrm {p}}(\alpha ) \le \frac{2 - \alpha }{1-\alpha } \, c_6. \end{aligned}$$

Proof

Recall from Remark 3.6 that \(G(\delta _1)=c_6\). Therefore

$$\begin{aligned} C_{\mathrm {p}}(\alpha ) {\mathop {\le }\limits ^{(33)}} G(\delta _1) + \frac{c_6}{1-\alpha } \int _{0}^{\infty } t^{\alpha -1} \, \mathrm {d}\delta _1(t) = c_6 + \frac{c_6}{1-\alpha } = \frac{2 - \alpha }{1-\alpha } \, c_6. \end{aligned}$$

\(\square \)

Remark 4.2

(Direct proof of the upper bound). Lemma 4.1 can also be proved without using the result from [14] that \(G(\delta _1)=c_6\). Instead we can start from equation (42) and directly build a sequence of asymptotically optimal competitors \(\mu _\delta \) supported on a subset of a triangular lattice. This is done by covering the square Q with regular hexagons of a suitable size and making the heuristic calculation from Remark 1.3 rigorous; cf. [18, Lemma 8].

The matching lower bound

$$\begin{aligned} C_{\mathrm {p}}({\overline{\alpha }}) \ge \frac{2 - {\overline{\alpha }}}{1-{\overline{\alpha }}} \, c_6 \end{aligned}$$
(45)

requires much more work. Owing to Corollary 3.10 and Remark 3.11 we can assume without loss of generality that

$$\begin{aligned} \Omega = Q=[-1/2,1/2]^2, \qquad f = {\mathbbm {1}}_Q. \end{aligned}$$

We will do this throughout the rest of the paper.

4.1 Rescaling of the energy and the energy of a partition

To prove (45) it is convenient to rescale the domain Q. As \(\delta \rightarrow 0\), the optimal masses \(m_i\) in (42) go to 0. Following [18], instead of keeping the domain Q fixed as \(\delta \rightarrow 0\), we blow up Q in such a way that the optimal masses \(m_i\) tend to 1. The following definition is motivated by the heuristic calculation given in Remark 1.3.

Definition 4.3

(Rescaled domain and energy). For \(\alpha \in (-\infty ,1)\) and \(\delta >0\), define

$$\begin{aligned} V_{\delta ,\alpha }:=\left( \frac{c_6}{\delta (1-\alpha )} \right) ^{\frac{1}{2-\alpha }} \end{aligned}$$

and define the rescaled square domain \(Q_{\delta ,\alpha }\) by

$$\begin{aligned} Q_{\delta ,\alpha }:= V_{\delta ,\alpha }^{\frac{1}{2}} Q. \end{aligned}$$

Moreover, define the set of admissible discrete measures \({\mathcal {A}}_{\delta ,\alpha }\) by

$$\begin{aligned} {\mathcal {A}}_{\delta ,\alpha }:= \left\{ \mu = \sum _{i=1}^{N_\mu } m_i \delta _{z_i} : N_\mu \in {\mathbb {N}}, \, m_i > 0, \, \sum _{i=1}^{N_\mu } m_i = V_{\delta ,\alpha }, \, z_i \in Q_{\delta ,\alpha }, \, z_i \ne z_j \text { if } i \ne j \right\} \end{aligned}$$

and define the rescaled energy \({\mathcal {F}}_{\delta ,\alpha }:{\mathcal {A}}_{\delta ,\alpha } \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} {\mathcal {F}}_{\delta ,\alpha }(\mu ):= \frac{c_6}{1-\alpha } \sum _{i=1}^{N_\mu } m_i^\alpha + W_2^2({\mathbbm {1}}_{Q_{\delta ,\alpha }},\mu ). \end{aligned}$$
(46)

Remark 4.4

(Restating \(C_{\mathrm {p}}(\alpha )\) in terms of \({\mathcal {F}}_{\delta ,\alpha }\)). Let \(\mu = \sum _i m_i \delta _{z_i}\in {\mathcal {P}}_{\mathrm {d}}(Q)\). For \(\alpha \in (-\infty ,1)\) and \(\delta >0\), define \({\widetilde{z}}_i:=V_{\delta ,\alpha }^{1/2} z_i\), \({\widetilde{m}}_i:=V_{\delta ,\alpha } m_i\), and \({\widetilde{\mu }}_{\delta ,\alpha }:=\sum _i {\widetilde{m}}_i \delta _{{\widetilde{z}}_i} \in {\mathcal {A}}_{\delta ,\alpha }\). Then

$$\begin{aligned} V^{-2}_{\delta ,\alpha }{\mathcal {F}}_{\delta ,\alpha }({\widetilde{\mu }}_{\delta ,\alpha }) = \delta \sum _i m_i^\alpha + W^2_2({\mathbbm {1}}_{Q},\mu ). \end{aligned}$$
(47)

Therefore, by (42) and (47),

$$\begin{aligned} C_{\mathrm {p}}(\alpha ) = \lim _{\delta \rightarrow 0} V^{-1}_{\delta ,\alpha } \min \left\{ {\mathcal {F}}_{\delta ,\alpha }(\mu ) : \mu \in {\mathcal {A}}_{\delta ,\alpha } \right\} . \end{aligned}$$
(48)
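The algebra behind (47)–(48) rests on the fact that \(V_{\delta ,\alpha }^{2-\alpha } = c_6/(\delta (1-\alpha ))\), so that \(V_{\delta ,\alpha }^{-2}\,\frac{c_6}{1-\alpha }\,{\widetilde{m}}_i^{\,\alpha } = \delta \, m_i^\alpha \). A numerical sanity check of this scaling of the entropy term (Python; the value \(c_6=5\sqrt{3}/54\) used below is the closed-form hexagon constant assumed here, and \(\delta \) and the masses are illustrative):

```python
import math

# Illustrative parameters: alpha < 1, c6 the closed-form hexagon
# constant (an assumption here), delta small, and masses m_i summing to 1.
alpha = 0.583
c6 = 5.0 * math.sqrt(3.0) / 54.0
delta = 1e-3
V = (c6 / (delta * (1.0 - alpha))) ** (1.0 / (2.0 - alpha))  # V_{delta,alpha}

m = [0.2, 0.3, 0.5]
m_tilde = [V * mi for mi in m]     # rescaled masses, summing to V

# V^{-2} times the entropy term of (46) equals the penalization in (42).
lhs = V ** (-2) * (c6 / (1.0 - alpha)) * sum(mi ** alpha for mi in m_tilde)
rhs = delta * sum(mi ** alpha for mi in m)
assert math.isclose(lhs, rhs, rel_tol=1e-12)
```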

We now state two first-order necessary conditions for minimizers of \({\mathcal {F}}_{\delta ,\alpha }\). For a proof see, for instance, [17, Theorem 4.5].

Lemma 4.5

(Properties of minimizers). Let \(\mu = \sum _{i=1}^{N_\mu } m_i \delta _{z_i}\in {\mathcal {A}}_{\delta ,\alpha }\) be a minimizer of \({\mathcal {F}}_{\delta ,\alpha }\). Let T be the optimal transport map defining \(W_2({\mathbbm {1}}_{Q_{\delta ,\alpha }},\mu )\) and let \((w_1,\ldots ,w_{N_\mu })\) be the weights of the corresponding Laguerre tessellation (see Lemma 2.1).

  1. (i)

    For all \(i \in \{1,\dots ,N_\mu \}\), we have

    $$\begin{aligned} w_i=-\frac{\alpha }{1-\alpha } c_6 m_i^{\alpha -1}. \end{aligned}$$
  2. (ii)

    The point \(z_i\) is the centroid of the Laguerre cell \(\overline{T^{-1}(z_i)}\), namely

    $$\begin{aligned} z_i = \frac{1}{m_i}\int _{T^{-1}(z_i)} x\,\mathrm {d}x. \end{aligned}$$

    In particular, \(z_i\in T^{-1}(z_i)\).

Remark 4.6

(Centroidal Laguerre tessellations). Lemma 4.5(ii) implies that minimizers of \({\mathcal {F}}_{\delta ,\alpha }\) generate centroidal Laguerre tessellations, which means that the particles \(z_i\) lie at the centroids of their Laguerre cells \(T^{-1}(z_i)\) [16, 73].

In the following it will also be convenient to reason from a geometrical point of view. Each \(\mu \in {\mathcal {A}}_{\delta ,\alpha }\) induces a partition of \(Q_{\delta ,\alpha }\) by the Laguerre cells \(\overline{T^{-1}(z_i)}\), where T is the optimal transport map defining \(W_2({\mathbbm {1}}_{Q_{\delta ,\alpha }},\mu )\). We define a wider class of partitions as follows:

Definition 4.7

(Admissible partitions). Let \({\mathcal {S}}_{\delta ,\alpha }\) denote the family of partitions of \(Q_{\delta ,\alpha }\) of the form \({\mathcal {C}}=(C_1,\dots , C_k)\) where \(k \in {\mathbb {N}}\), \(C_i \subset Q_{\delta ,\alpha }\) is measurable, and \(\sum _{i=1}^k {\mathbbm {1}}_{C_i} = 1\) a.e. in \(Q_{\delta ,\alpha }\).

The advantage of working with partitions instead of measures is that it allows us to localise the nonlocal energy \({\mathcal {F}}_{\delta ,\alpha }\).

Definition 4.8

(Optimal partitions). Define the partition energy \(F:{\mathcal {S}}_{\delta ,\alpha } \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} F({\mathcal {C}}):= \sum _{i=1}^k \left( \frac{c_6}{1-\alpha } |C_i|^\alpha + \int _{C_i} |x-\xi _{C_i}|^2 \, \mathrm {d}x \right) , \end{aligned}$$

where \(\xi _{C_i}:=\frac{1}{|C_i|} \int _{C_i} x\, \mathrm {d}x\) is the centroid of \(C_i\), \(i\in \{1,\dots ,k\}\). We say that \({\mathcal {C}}\in {\mathcal {S}}_{\delta ,\alpha }\) is an optimal partition if it minimizes F.

To each \(\mu \in {\mathcal {A}}_{\delta ,\alpha }\) it is possible to associate an element of \({\mathcal {S}}_{\delta ,\alpha }\) as follows:

Definition 4.9

(Partition associated to a discrete measure). Let \(\mu \in {\mathcal {A}}_{\delta ,\alpha }\) be of the form \(\mu =\sum _{i=1}^{N_\mu } m_i \delta _{z_i}\). Define \({\mathcal {C}}^\mu = (C^\mu _1,\dots ,C^\mu _{N_\mu })\in {\mathcal {S}}_{\delta ,\alpha }\) by

$$\begin{aligned} C^\mu _i := \overline{T^{-1}(z_i)} \end{aligned}$$

for all \(i\in \{1,\dots ,N_\mu \}\), where T is the optimal transport map defining \(W_2({\mathbbm {1}}_{Q_{\delta ,\alpha }},\mu )\).

Remark 4.10

(Equivalence of the partition formulation). It was proved in [18, p. 125] that

$$\begin{aligned} \min \{ {\mathcal {F}}_{\delta ,\alpha }(\mu ) : \mu \in {\mathcal {A}}_{\delta ,\alpha } \} = \min \{ F({\mathcal {C}}) : {\mathcal {C}}\in {\mathcal {S}}_{\delta ,\alpha } \}. \end{aligned}$$

Let \(\mu =\sum _{i=1}^{N_\mu } m_i \delta _{z_i}\) be a minimizer of \({\mathcal {F}}_{\delta ,\alpha }\). For \(i\in \{1,\dots ,N_\mu \}\), let \(n_i\) denote the number of edges of \(C_i^\mu \). Then we can bound the energy from below as follows:

$$\begin{aligned} {\mathcal {F}}_{\delta ,\alpha }(\mu ) = F({\mathcal {C}}^\mu ) =\sum _{i=1}^{N_\mu } \left( \frac{c_6}{1-\alpha }|C^\mu _i|^\alpha + \int _{C^\mu _i} |x-z_i|^2 \, \mathrm {d}x \right) \ge \sum _{i=1}^{N_\mu } \left( \frac{c_6}{1-\alpha } |C^\mu _i|^\alpha + c_{n_i} |C^\mu _i|^2 \right) \end{aligned}$$

by Lemma 2.3 and Remark 2.4. For \(\alpha \in (-\infty ,1)\), define \(g_\alpha :[0,\infty ) \times [3,\infty ) \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} g_\alpha (m,n):=\frac{c_6}{1-\alpha } m^\alpha +c_n m^2. \end{aligned}$$

In this notation the lower bound above becomes

$$\begin{aligned} {\mathcal {F}}_{\delta ,\alpha }(\mu ) = F({\mathcal {C}}^\mu ) \ge \sum _{i=1}^{N_\mu } g_\alpha (|C_i^\mu |, n_i). \end{aligned}$$
(49)

In the following section we study the function \(g_\alpha \).

4.2 The convexity inequality

We start by proving a technical result that plays the role of a convexity inequality for \(g_\alpha \). We want to show that, for large enough values of m,

$$\begin{aligned} g_\alpha (m,n)\ge g_\alpha (1,6) + \nabla g_\alpha (1,6)\cdot (m-1,n-6). \end{aligned}$$

Writing this out explicitly gives

$$\begin{aligned} \frac{c_6}{1-\alpha }m^\alpha + c_n m^2 \ge c_6\left( \frac{2-\alpha }{1-\alpha }\right) m+\kappa (n-6), \end{aligned}$$
(50)

where

$$\begin{aligned} \kappa :=\partial _n c_n |_{n=6}=\frac{2\pi }{243}-\frac{5\sqrt{3}}{324}<0. \end{aligned}$$
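As a sanity check, \(\kappa \) can be evaluated and compared with a finite-difference approximation of \(\partial _n c_n\) at \(n=6\). Here we assume the closed form \(c_n = \frac{1}{2n}\big (\cot \frac{\pi }{n} + \frac{1}{3}\tan \frac{\pi }{n}\big )\) for the polar second moment of the regular n-gon of unit area (consistent with Lemma 2.5, which is not reproduced here):

```python
import math

# Closed form of c_n assumed here: polar second moment of the regular
# n-gon of unit area (cf. Lemma 2.5).
def c(n):
    return (1.0 / math.tan(math.pi / n) + math.tan(math.pi / n) / 3.0) / (2.0 * n)

kappa = 2.0 * math.pi / 243.0 - 5.0 * math.sqrt(3.0) / 324.0
assert kappa < 0.0                 # kappa is (slightly) negative

# Compare with a central finite difference of c_n at n = 6.
h = 1e-6
fd = (c(6.0 + h) - c(6.0 - h)) / (2.0 * h)
assert abs(fd - kappa) < 1e-8
```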

For \(\alpha \in (0,1)\), define the function \(h_\alpha :[0,\infty )\times [3,\infty )\rightarrow {\mathbb {R}}\) to be the difference between \(g_\alpha \) and its tangent plane approximation at (1, 6):

$$\begin{aligned} h_\alpha (m,n) :=&g_\alpha (m,n) - \left( g_\alpha (1,6) + \nabla g_\alpha (1,6)\cdot (m-1,n-6) \right) \\ =&\frac{c_6}{1-\alpha }m^\alpha + c_n m^2 -c_6\left( \frac{2-\alpha }{1-\alpha }\right) m-\kappa (n-6). \end{aligned}$$

Note that in this section we restrict our attention to \(\alpha >0\). This is no loss of generality: by the monotonicity result (Lemma 3.12), in the end we only need to consider the single value \(\alpha ={\overline{\alpha }}=0.583\). The typical behaviour of the function \(m\mapsto h_\alpha (m,n)\) is depicted in Fig. 3. Our aim is to prove that \(h_\alpha (m,n)\) is non-negative for all integers \(n \ge 3\), provided m is large enough, as suggested by the figure.

Fig. 3

Behaviour of the map \(m\mapsto h_\alpha (m,n)\) for different values of n and for \(\alpha =0.583\). Note that \(h_\alpha (0,n)<0\) for \(n \in \{3,4,5\}\), even though this is not evident from the figure

Lemma 4.11

(Positivity of \(h_{{\overline{\alpha }}}\)). Let \(n\ge 3\) be an integer. If \(h_{{\overline{\alpha }}}(m_1,n)\ge 0\) for some \(m_1\ge 0\), then \(h_{{\overline{\alpha }}}(m,n)\ge 0\) for all \(m\ge m_1\).

Before proving this, we prove the following easy but important corollary, which allows us to reduce the proof of the convexity inequality \(h_{{\overline{\alpha }}}(\cdot ,n)\ge 0\) for all integers \(n \ge 3\) to the finite number of cases \(n \in \{3,4,5\}\):

Corollary 4.12

(Reduction to \(n \in \{3,4,5\}\)). Let \(n \ge 6\) be an integer. Then \(h_{{\overline{\alpha }}}(m,n)\ge 0\) for all \(m \ge 0\).

Proof

Observe that \(h_{{\overline{\alpha }}}(0,n)=-\kappa (n-6) \ge 0\) for all \(n \ge 6\). Therefore the result follows immediately from Lemma 4.11. \(\square \)

On the other hand, \(h_{{\overline{\alpha }}}(0,n)<0\) for \(n \in \{3,4,5\}\). This is why in the next section we will need to prove a lower bound on the masses \(m_i\) of minimizers of \({\mathcal {F}}_{\delta ,{\overline{\alpha }}}\) to ensure the validity of the convexity inequality (50).

Proof of Lemma 4.11

Step 1. First we study the shape of the function \(m\mapsto h_\alpha (m,n)\). In particular, we show that it has exactly one local minimum point. Its derivative is

$$\begin{aligned} \partial _m h_\alpha (m,n) = \frac{\alpha }{1-\alpha } c_6 m^{\alpha -1} + 2 c_n m - c_6\frac{2-\alpha }{1-\alpha }. \end{aligned}$$
(51)

It is easy to see that \(\partial _m h_\alpha (m,n)\) is strictly convex in m (since \(\alpha > 0\)). Therefore \(m \mapsto \partial ^2_{mm} h_\alpha (m,n)\) is increasing and so \(m\mapsto \partial _m h_\alpha (m,n)\) has at most one critical point. On the other hand, \(\lim _{m \rightarrow 0}\partial _m h_\alpha (m,n)=+\infty \), \(\lim _{m \rightarrow \infty }\partial _m h_\alpha (m,n)=+\infty \). Therefore \(m\mapsto \partial _m h_\alpha (m,n)\) has exactly one critical point and \(m\mapsto h_\alpha (m,n)\) has at most two critical points.

Next we prove that \(m\mapsto h_\alpha (m,n)\) has exactly two critical points. It is sufficient to prove that

$$\begin{aligned} \partial _m h_\alpha (1/2,n) < 0 \end{aligned}$$

for all \(n \ge 3\) (since \(\lim _{m \rightarrow 0}\partial _m h_\alpha (m,n) >0\) and \(\lim _{m \rightarrow \infty }\partial _m h_\alpha (m,n) > 0\)). We have

$$\begin{aligned} \partial _m h_\alpha (1/2,n) = c_6 \frac{\alpha 2^{1-\alpha }-2+\alpha }{1-\alpha } + c_n. \end{aligned}$$
(52)

It is straightforward to check that

$$\begin{aligned} \frac{d}{d \alpha } \frac{\alpha 2^{1-\alpha }-2+\alpha }{1-\alpha } = \frac{\psi (\alpha )}{(1-\alpha )^2} \end{aligned}$$
(53)

with

$$\begin{aligned} \psi (\alpha ) = 2^{1-\alpha }(1-\alpha \ln 2 + \alpha ^2 \ln 2) - 1. \end{aligned}$$

Differentiating again gives

$$\begin{aligned} \psi '(\alpha ) = q(\alpha ) 2^{1-\alpha } \ln 2, \qquad q(\alpha ):=- \alpha ^2 \ln 2 + (2+\ln 2)\alpha -2. \end{aligned}$$

The concave quadratic polynomial q has roots \(\alpha =1\) and \(\alpha = 2/\ln 2 > 2\). Therefore, for all \(\alpha \in (0,1)\), \(q(\alpha )<0\), \(\psi '(\alpha )<0\), and

$$\begin{aligned} \psi (\alpha ) \ge \psi (1) = 0. \end{aligned}$$
(54)

From equations (52)–(54) we conclude that \(\alpha \mapsto \partial _m h_\alpha (1/2,n)\) is increasing. Therefore

$$\begin{aligned} \partial _m h_\alpha (1/2,n) \le \lim _{\alpha \rightarrow 1} \partial _m h_\alpha (1/2,n) = \lim _{\alpha \rightarrow 1} c_6 \frac{\alpha 2^{1-\alpha }-2+\alpha }{1-\alpha } + c_n = (\ln 2 - 2)c_6 + c_n. \end{aligned}$$

Since \(n \mapsto c_n\) is decreasing (see Lemma 2.5),

$$\begin{aligned} \partial _m h_\alpha (1/2,n) \le (\ln 2 - 2)c_6 + c_3< -0.017 < 0 \end{aligned}$$

as required.
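The two numerical facts used above, namely \(\psi \ge 0\) on (0, 1) and \((\ln 2 - 2)c_6 + c_3 < -0.017\), can be verified with a few lines of Python (again assuming the closed form of \(c_n\) as the polar second moment of the regular n-gon of unit area; this is a check, not part of the proof):

```python
import math

def c(n):
    # Closed form assumed for c_n (consistent with Lemma 2.5).
    return (1.0 / math.tan(math.pi / n) + math.tan(math.pi / n) / 3.0) / (2.0 * n)

def psi(a):
    """The function psi from the proof: 2^(1-a)(1 - a ln2 + a^2 ln2) - 1."""
    return 2.0 ** (1.0 - a) * (1.0 - a * math.log(2.0) + a * a * math.log(2.0)) - 1.0

# psi >= 0 on (0, 1) (checked on a grid) and psi(1) = 0.
assert all(psi(0.001 * k) >= 0.0 for k in range(1, 1000))
assert abs(psi(1.0)) < 1e-12

# The final bound: (ln 2 - 2) c_6 + c_3 < -0.017.
assert (math.log(2.0) - 2.0) * c(6) + c(3) < -0.017
```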

We have shown that \(m\mapsto h_\alpha (m,n)\) has exactly two critical points. The smallest critical point is a local maximum point and the largest critical point is a local minimum point (since \(m\mapsto \partial ^2_{mm} h_\alpha (m,n)\) is increasing). Let \({\widetilde{m}}(\alpha ,n)\) denote the local minimum point. To prove the lemma it is sufficient to prove that

$$\begin{aligned} \varphi (\alpha ,n) := h_{\alpha }({\widetilde{m}}(\alpha ,n), n) \ge 0 \end{aligned}$$
(55)

for \(\alpha ={\overline{\alpha }}\) and for all \(n \in {\mathbb {N}} \cap [3,\infty )\).

Step 2. Next we prove (55) for the case \(n=6\). A direct computation shows that 1 is a local minimum point of \(m\mapsto h_\alpha (m,6)\) for all \(\alpha \in (0,1)\). Therefore \({\widetilde{m}}(\alpha ,6)=1\) and \(\varphi (\alpha ,6)= h_{\alpha }(1,6) =0\).

Step 3. Next we prove (55) for the case \(n \in \{3,4,5,7\}\). Let

$$\begin{aligned} m_1(n):= {\left\{ \begin{array}{ll} 0.764 &{} n=3, \\ 0.946 &{} n=4, \\ 0.98705 &{} n=5, \\ 1.00516 &{} n=7, \end{array}\right. } \quad \quad m_2(n):= {\left\{ \begin{array}{ll} 0.765 &{} n=3, \\ 0.947 &{} n=4, \\ 0.9871 &{} n=5, \\ 1.00518 &{} n=7. \end{array}\right. } \end{aligned}$$

Then numerically evaluating \(\partial _m h_{{\overline{\alpha }}}\) gives

$$\begin{aligned} \partial _m h_{{\overline{\alpha }}}(m_1(n),n)\le {\left\{ \begin{array}{ll} -8\times 10^{-6} &{} n=3,\\ -2\times 10^{-5} &{} n=4,\\ -1\times 10^{-6} &{} n=5,\\ -4\times 10^{-7} &{} n=7, \end{array}\right. } \quad \quad \partial _m h_{{\overline{\alpha }}}(m_2(n),n)\ge {\left\{ \begin{array}{ll} 3\times 10^{-5} &{} n=3,\\ 8\times 10^{-6} &{} n=4,\\ 4\times 10^{-7} &{} n=5,\\ 1\times 10^{-7} &{} n=7. \end{array}\right. } \end{aligned}$$

Let \(n \in \{3,4,5,7\}\). By the Intermediate Value Theorem, the map \(m \mapsto \partial _m h_{{\overline{\alpha }}}(m,n)\) has a root between \(m_1(n)\) and \(m_2(n)\). Moreover, since \(\partial _m h_{{\overline{\alpha }}}(m_1(n),n)< 0 < \partial _m h_{{\overline{\alpha }}}(m_2(n),n)\), this is the root at which \(\partial _m h_{{\overline{\alpha }}}\) changes sign from negative to positive, namely the largest root. Hence we have bracketed \({\widetilde{m}}({\overline{\alpha }},n)\): \(m_1(n)<{\widetilde{m}}({\overline{\alpha }},n)<m_2(n)\). Therefore

$$\begin{aligned} \varphi ({\overline{\alpha }},n)\ge \frac{c_6}{1-{\overline{\alpha }}}m_1(n)^{{\overline{\alpha }}} + c_n m_1(n)^2 - c_6\left( \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}\right) m_2(n) - \kappa (n-6) \ge {\left\{ \begin{array}{ll} 2\times 10^{-2} &{} n=3,\\ 3\times 10^{-3} &{} n=4,\\ 5\times 10^{-4} &{} n=5,\\ 2\times 10^{-4} &{} n=7, \end{array}\right. } \end{aligned}$$

which proves (55) for the case \(n \in \{3,4,5,7\}\).
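The sign pattern of \(\partial _m h_{{\overline{\alpha }}}\) at \(m_1(n)\) and \(m_2(n)\) can be reproduced numerically. Since the exact magnitudes depend on the precise values of \({\overline{\alpha }}\) and \(c_n\) (we use \({\overline{\alpha }}=0.583\) and an assumed closed form for \(c_n\)), the sketch below asserts only the signs, which is all the bracketing argument needs:

```python
import math

alpha = 0.583                                  # alpha_bar, as in the text

def c(n):
    # Closed form assumed for c_n (consistent with Lemma 2.5): polar
    # second moment of the regular n-gon of unit area.
    return (1.0 / math.tan(math.pi / n) + math.tan(math.pi / n) / 3.0) / (2.0 * n)

c6 = c(6)

def dm_h(m, n):
    """The derivative (51): partial_m h_alpha(m, n)."""
    return (alpha / (1.0 - alpha)) * c6 * m ** (alpha - 1.0) \
        + 2.0 * c(n) * m - c6 * (2.0 - alpha) / (1.0 - alpha)

m1 = {3: 0.764, 4: 0.946, 5: 0.98705, 7: 1.00516}
m2 = {3: 0.765, 4: 0.947, 5: 0.9871, 7: 1.00518}

for n in (3, 4, 5, 7):
    # Negative at m1(n), positive at m2(n): brackets the largest root.
    assert dm_h(m1[n], n) < 0.0 < dm_h(m2[n], n)
```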

Step 4. Finally, we prove (55) for the case \(n \in {\mathbb {N}} \cap [8,\infty )\). To do this we prove that \(\partial _n \varphi ({\overline{\alpha }},n)>0\) for \(n\ge 7\). Then the result follows from the case \(n=7\) proved in Step 3. By definition of \({\widetilde{m}}(\alpha ,n)\), for all \(\alpha \in (0,1)\), we have \(\partial _m h_\alpha ({\widetilde{m}}(\alpha ,n),n)=0\). Therefore

$$\begin{aligned} \partial _n \varphi (\alpha ,n) = \partial _n h_\alpha ({\widetilde{m}}(\alpha ,n),n) = {\widetilde{m}}^2(\alpha ,n) \, \partial _n c_n - \kappa . \end{aligned}$$

Since \(n\mapsto c_n\) is convex (Lemma 2.5), \(\partial _n c_n\) is increasing, and we obtain the lower bound

$$\begin{aligned} \partial _n \varphi (\alpha ,n) \ge {\widetilde{m}}^2(\alpha ,n) \, \partial _n c_n|_{n=7} - \kappa \end{aligned}$$
(56)

for all \(n \ge 7\). We will prove below that

$$\begin{aligned} {\widetilde{m}}({\overline{\alpha }},n)\le \frac{3}{2}. \end{aligned}$$
(57)

Observe that

$$\begin{aligned} \partial _n c_n = - \frac{1}{n} c_n + \frac{1}{2n} \left( - \frac{\pi }{3n^2} \sec ^2 \left( \frac{\pi }{n} \right) + \frac{\pi }{n^2} \csc ^2 \left( \frac{\pi }{n} \right) \right) . \end{aligned}$$
(58)

From (56), (57), (58) we obtain that

$$\begin{aligned} \partial _n \varphi ({\overline{\alpha }},n) \ge {\widetilde{m}}^2({\overline{\alpha }},n) \, \partial _n c_n |_{n=7} -\kappa \ge \left( \frac{3}{2} \right) ^2 \, (\partial _n c_n)|_{n=7} - \kappa> 1.5 \times 10^{-5} >0, \end{aligned}$$

as required.

To prove (57) we reason as follows. Using (51) and the fact that \(n\mapsto c_n\) is decreasing (Lemma 2.5), we deduce that \(n \mapsto \partial _m h_\alpha (m,n)\) is decreasing for all m, and hence \(n\mapsto {\widetilde{m}}(\alpha ,n)\) is increasing. Therefore \({\widetilde{m}}(\alpha ,n)\le {\widetilde{m}}(\alpha ,\infty )\), where \({\widetilde{m}}(\alpha ,\infty )\) is defined to be the largest root of

$$\begin{aligned} \partial _m h_\alpha (m,\infty ) := \frac{\alpha }{1-\alpha } c_6 m^{\alpha -1} + 2 c_\infty m - c_6\frac{2-\alpha }{1-\alpha }, \end{aligned}$$

where \(c_\infty \) was defined in Lemma 2.5. We want to show that \({\widetilde{m}}({\overline{\alpha }},\infty )\le 3/2\). We have

$$\begin{aligned} \partial _m h_{{\overline{\alpha }}} \left( 1, \infty \right)&= \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 + 2 c_\infty - c_6\frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}< -0.0024 < 0, \\ \partial _m h_{{\overline{\alpha }}} \left( 3/2,\infty \right)&= \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 \left( \frac{3}{2}\right) ^{{\overline{\alpha }}-1} + 2 c_\infty \frac{3}{2} - c_6\frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}> 0.12 > 0. \end{aligned}$$

Therefore \({\widetilde{m}}({\overline{\alpha }},\infty ) \in (1,3/2)\) by the Intermediate Value Theorem. This proves (57) and completes the proof. \(\square \)
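The three numerical claims of Step 4, the positivity of \((3/2)^2\,\partial _n c_n|_{n=7}-\kappa \) and the two bounds bracketing \({\widetilde{m}}({\overline{\alpha }},\infty )\), can be checked as follows (Python; \({\overline{\alpha }}=0.583\), \(c_\infty = 1/(2\pi )\) and the closed form of \(c_n\) are the assumptions here):

```python
import math

alpha = 0.583                      # alpha_bar
c6 = 5.0 * math.sqrt(3.0) / 54.0   # closed-form hexagon constant (assumed)
c_inf = 1.0 / (2.0 * math.pi)      # value of c_infinity assumed for Lemma 2.5
kappa = 2.0 * math.pi / 243.0 - 5.0 * math.sqrt(3.0) / 324.0

def c(n):
    # closed form assumed for c_n, consistent with the derivative (58)
    return (1.0 / math.tan(math.pi / n) + math.tan(math.pi / n) / 3.0) / (2.0 * n)

def dn_c(n):
    # the derivative formula (58)
    return -c(n) / n + (1.0 / (2.0 * n)) * (
        -math.pi / (3.0 * n ** 2) / math.cos(math.pi / n) ** 2
        + math.pi / n ** 2 / math.sin(math.pi / n) ** 2)

# Positivity of (3/2)^2 * (d c_n / dn)|_{n=7} - kappa, used for (56).
assert 1.5 ** 2 * dn_c(7) - kappa > 0.0

def dm_h_inf(m):
    # derivative of the limiting profile h_alpha(., infinity)
    return (alpha / (1.0 - alpha)) * c6 * m ** (alpha - 1.0) \
        + 2.0 * c_inf * m - c6 * (2.0 - alpha) / (1.0 - alpha)

# Bracketing of the largest root m_tilde(alpha_bar, infinity) in (1, 3/2).
assert dm_h_inf(1.0) < -0.0024
assert dm_h_inf(1.5) > 0.12
```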

Remark 4.13

Despite the fact that we used the specific value of \({{\overline{\alpha }}}\) in several places in the proof of Lemma 4.11, we expect Lemma 4.11 to hold for all \(\alpha \in (0,1)\).

4.3 Lower bound on the area of optimal cells

Let \(\mu =\sum _{i=1}^{N_\mu }m_i\delta _{z_i} \in {\mathcal {A}}_{\delta ,{\overline{\alpha }}}\) be a minimizer of \({\mathcal {F}}_{\delta ,{\overline{\alpha }}}\). We will prove the convexity inequality \(h_{{\overline{\alpha }}}(m_i,n)\ge 0\) for all \(i \in \{1,\ldots ,N_\mu \}\), \(n \in {\mathbb {N}} \cap [3,\infty )\). The idea is to prove a lower bound \(m_i\ge {\overline{m}}\) such that \(h_{{\overline{\alpha }}}({\overline{m}},n)\ge 0\) for all \(n \in {\mathbb {N}} \cap [3,\infty )\). Then the convexity inequality follows from Lemma 4.11.

We prove the lower bound \(m_i\ge {\overline{m}}\) following the strategy of the proof of [18, Lemma 7], which was developed for the case \(\alpha =0.5\). The main differences are that we have to deal with the more difficult case \(\alpha ={\overline{\alpha }}>0.5\) and that we optimise some of the estimates. Our proof can be used to give a lower bound \({\overline{m}}\) on the areas \(m_i\) for all \(\alpha \in (0,1)\), but this lower bound does not satisfy the convexity inequality \(h_{\alpha }({\overline{m}},n)\ge 0\) if \(\alpha > {\overline{\alpha }}\). Our lower bound on the area of the cells holds for cells with arbitrarily many sides, although by Corollary 4.12 we only need the lower bound for cells with 3, 4 or 5 sides. We saw no advantage in restricting the number of sides in the proof.

The following result gives the difference between the energy of a partition and the energy of the partition obtained by merging two of its cells. Recall that \({\mathcal {S}}_{\delta ,\alpha }\) and F were defined in Definitions 4.7 and 4.8.

Lemma 4.14

(Merging). Let \({\mathcal {C}}=(C_1,\dots ,C_k)\in {\mathcal {S}}_{\delta ,\alpha }\), \(k\ge 2\). For \(i\in \{1,\dots ,k\}\) let \(m_i=|C_i|\) and let \(z_i\in C_i\) be the centroid of \(C_i\). Define \({\mathcal {D}}\in {\mathcal {S}}_{\delta ,\alpha }\) by \({\mathcal {D}}:=(C_1\cup C_2, C_3,\dots , C_k)\). For all \(\alpha \in (-\infty ,1)\),

$$\begin{aligned} F({\mathcal {D}}) - F({\mathcal {C}}) = \frac{c_6}{1-\alpha }\left( (m_1+m_2)^\alpha -m_1^\alpha -m_2^\alpha \right) + |z_2-z_{1}|^2\frac{m_1m_2}{m_1+m_2}. \end{aligned}$$
(59)

Proof

By definition

$$\begin{aligned} z_1 = \frac{1}{m_1}\int _{C_1} x\,\mathrm {d}x, \quad \quad z_2 = \frac{1}{m_2}\int _{C_2} x\,\mathrm {d}x. \end{aligned}$$

Let \({\overline{z}}\in C_1\cup C_2\) be the centroid of \(C_1\cup C_2\):

$$\begin{aligned} {\bar{z}} = \frac{m_1}{m_1+m_2}z_{1} + \frac{m_2}{m_1+m_2}z_2. \end{aligned}$$

A direct computation gives

$$\begin{aligned}&\int _{C_1\cup C_2} |x-{\bar{z}}|^2 \,\mathrm {d}x - \int _{C_1} |x-z_{1}|^2 \,\mathrm {d}x - \int _{C_2} |x-z_2|^2 \,\mathrm {d}x \\&\quad = \int _{C_1} \left( |{\bar{z}}|^2 - 2 x\cdot {\bar{z}} - |z_{1}|^2 + 2 x\cdot z_{1} \right) \,\mathrm {d}x + \int _{C_2} \left( |{\bar{z}}|^2 - 2 x\cdot {\bar{z}} - |z_2|^2 + 2 x\cdot z_2 \right) \, \mathrm {d}x \\&\quad = m_1|{\bar{z}}|^2 - 2m_1 z_{1}\cdot {\bar{z}} +m_1|z_{1}|^2 + m_2|{\bar{z}}|^2 - 2m_2 z_2\cdot {\bar{z}} + m_2|z_2|^2 \\&\quad = m_1 |{\bar{z}}-z_1|^2 + m_2 |{\bar{z}}-z_2|^2 \\&\quad = m_1 \left| \frac{m_2}{m_1+m_2}(z_2-z_1)\right| ^2 + m_2 \left| \frac{m_1}{m_1+m_2}(z_1-z_2)\right| ^2 \\&\quad = \frac{m_1 m_2}{m_1+m_2}|z_2-z_{1}|^2. \end{aligned}$$

The result now follows immediately from the definition of F. \(\square \)
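The second-moment computation in the proof is the parallel axis theorem and uses only the centroid property, so it holds verbatim for discrete mass distributions. A quick Python check on two illustrative point clouds with unit weights:

```python
import random

# Two "cells" as point clouds with unit weights; z1, z2 their centroids.
random.seed(0)
C1 = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(40)]
C2 = [(random.uniform(2, 3), random.uniform(0, 2)) for _ in range(70)]

def centroid(C):
    n = len(C)
    return (sum(p[0] for p in C) / n, sum(p[1] for p in C) / n)

def moment(C, z):
    # second moment of the cloud about the point z
    return sum((p[0] - z[0]) ** 2 + (p[1] - z[1]) ** 2 for p in C)

z1, z2 = centroid(C1), centroid(C2)
zbar = centroid(C1 + C2)
m1, m2 = len(C1), len(C2)

# The identity established in the proof of Lemma 4.14.
lhs = moment(C1 + C2, zbar) - moment(C1, z1) - moment(C2, z2)
rhs = (m1 * m2 / (m1 + m2)) * ((z2[0] - z1[0]) ** 2 + (z2[1] - z1[1]) ** 2)
assert abs(lhs - rhs) < 1e-9
```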

We now prove a lower bound on the area of optimal cells, as well as an upper bound on the diameter of the cells and the maximum distance between the centroids. The latter two estimates will be used later to deal with the fact that the lower bound on the area of cells close to the boundary of \(Q_{\delta ,{\overline{\alpha }}}\) is not good enough to ensure the validity of the convexity inequality (50).

Lemma 4.15

(Lower bound on the area of optimal cells). Let \(\mu =\sum _{i=1}^{N_\mu }m_i\delta _{z_i} \in {\mathcal {A}}_{\delta ,{\overline{\alpha }}}\) be a minimizer of \({\mathcal {F}}_{\delta ,{\overline{\alpha }}}\). If \(\delta >0\) is sufficiently small, then the following hold:

  1. (i)

    If \(\mathrm {dist}(z_i,\partial Q_{\delta ,{\overline{\alpha }}}) \ge 4\), then

    $$\begin{aligned} m_i > {\overline{m}} := 2.0620 \times 10^{-4}. \end{aligned}$$
  2. (ii)

    If \(\mathrm {dist}(z_i,\partial Q_{\delta ,{\overline{\alpha }}}) < 4\), then

    $$\begin{aligned} m_i > m_{\mathrm {b}} := 1.5212 \times 10^{-5}. \end{aligned}$$
  3. (iii)

    Let \(B \subset Q_{\delta ,{\overline{\alpha }}}\) be a ball of radius R. If \(B \cap \mathrm {supp}(\mu ) = \emptyset \), then \(R<R_0 := 3.3644\).

  4. (iv)

    Let T be the optimal transport map defining \(W_2({\mathbbm {1}}_{Q_{\delta ,{\overline{\alpha }}}},\mu )\). For all \(i \in \{1,\ldots ,N_\mu \}\),

    $$\begin{aligned} \mathrm {diam}(T^{-1}(z_i)) \le D_0 := 2 \left( 8 R_0^2 + \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 m_{\mathrm {b}}^{{\overline{\alpha }}-1} \right) ^{\frac{1}{2}}. \end{aligned}$$
Fig. 4: The construction inside the ball \(B_R({\bar{z}})\) (see the proof of Lemma 4.15)

Proof

Step 1: Upper bound on the distance between Dirac masses. Let \({\bar{z}}\in \mathrm{supp}(\mu )\) satisfy \(\mathrm {dist}({\bar{z}},\partial Q_{\delta ,{\overline{\alpha }}}) \ge 4\). Take \(R \in (0,4)\) such that \(B_{R}({\bar{z}})\cap \mathrm{supp}(\mu )=\{{\bar{z}}\}\). We want to find an upper bound on R. Let \(S:B_{R}({\bar{z}}) \rightarrow \mathrm{supp}(\mu )\) be any Borel map; since \(B_{R}({\bar{z}})\cap \mathrm{supp}(\mu )=\{{\bar{z}}\}\), the point \(S(x)\) is either \({\bar{z}}\) itself or lies outside \(B_{R}({\bar{z}})\). Then

$$\begin{aligned} \int _{B_{R}({\bar{z}})} |x-S(x)|^2 \, \mathrm {d}x&= \int _{B_{R/2}({\bar{z}})} |x-S(x)|^2 \, \mathrm {d}x + \int _{B_{R}({\bar{z}})\setminus B_{R/2}({\bar{z}})} |x-S(x)|^2 \, \mathrm {d}x\nonumber \\&\ge \int _{B_{R/2}({\bar{z}})} |x-{\bar{z}}|^2 \, \mathrm {d}x + \int _{B_{R}({\bar{z}})\setminus B_{R/2}({\bar{z}})} \Big | x-{\bar{z}}-R\frac{x-{\bar{z}}}{|x-{\bar{z}}|} \Big |^2 \, \mathrm {d}x \nonumber \\&= \frac{\pi }{12}R^4. \end{aligned}$$
(60)
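In polar coordinates centred at \({\bar{z}}\), the lower bound above is the integral of \(\min (r,R-r)^2\) (the squared distance to \(\{{\bar{z}}\}\cup \partial B_R({\bar{z}})\)), so the value \(\frac{\pi }{12}R^4\) can be checked by one-dimensional quadrature. A minimal sketch using only the Python standard library:

```python
import math

# Check that \int_{B_R} min(r, R - r)^2 dx = (pi/12) R^4, where r = |x - zbar|.
# In polar coordinates the integral is 2*pi * \int_0^R min(r, R - r)^2 r dr;
# we evaluate it with a composite midpoint rule.
R = 2.0
n = 200_000
h = R / n
total = 2.0 * math.pi * sum(min(r, R - r) ** 2 * r
                            for r in (h * (k + 0.5) for k in range(n))) * h
exact = math.pi / 12.0 * R ** 4
print(total, exact)
```

The agreement confirms the splitting \(\pi R^4/32\) (inner disc) \(+\,5\pi R^4/96\) (annulus) \(=\pi R^4/12\).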

It is now convenient to use the partition energy F. Let \({\mathcal {C}}^\mu \) be the partition associated to the minimizer \(\mu \) (see Definition 4.9). Consider the partition \(\widetilde{{\mathcal {C}}} \in {\mathcal {S}}_{\delta ,{\overline{\alpha }}}\) obtained by modifying \({\mathcal {C}}^\mu \) in the ball \(B_R({\bar{z}})\) as follows. Write \({\mathcal {C}}^\mu =(C_1,\ldots ,C_{N_\mu })\) and define \({\widetilde{C}}_i = C_i \setminus B_R({\bar{z}})\) for \(i \in \{ 1 ,\ldots , N_\mu \}\). Let \(\{H_j\}\) be a tiling of the plane by regular hexagons of area A, where \(A>0\) will be chosen below, and let \({\widetilde{H}}_j=H_j \cap B_R({\bar{z}})\). Define \(\widetilde{{\mathcal {C}}}\) to be the partition consisting of the sets \({\widetilde{C}}_i\) for all \(i \in \{ 1 ,\ldots , N_\mu \}\) and the sets \({\widetilde{H}}_j\) for all j such that \({\widetilde{H}}_j \ne \emptyset \); see Fig. 4.

Let \(d_A:=2^{\frac{3}{2}}3^{-\frac{3}{4}}A^{\frac{1}{2}}\) be the diameter of a regular hexagon of area A. The number of hexagons \(N_A\) needed to cover a ball of radius R is bounded above by

$$\begin{aligned} N_A\le \frac{\pi (R+d_A)^2}{A}. \end{aligned}$$
(61)

If \(H \in \widetilde{{\mathcal {C}}}\) is a (whole) regular hexagon of area A with centroid \(\xi _H\), then

$$\begin{aligned} \frac{c_6}{1-{\overline{\alpha }}} |H|^{{\overline{\alpha }}} + \int _H |x-\xi _H|^2 \, \mathrm {d}x = \frac{c_6}{1-{\overline{\alpha }}} A^{{\overline{\alpha }}} + c_6 A^2. \end{aligned}$$
(62)

Since \(\mu \) is a minimizer of \({\mathcal {F}}_{\delta ,{\overline{\alpha }}}\), the partition \({\mathcal {C}}^\mu \) is a minimizer of F. Therefore

$$\begin{aligned} 0&\ge F({\mathcal {C}}^\mu ) - F(\widetilde{{\mathcal {C}}}) \nonumber \\&\ge \sum _{\begin{array}{c} i=1\\ C_i \cap B_R({\bar{z}}) \ne \emptyset \end{array}}^{N_\mu } \left( \frac{c_6}{1-{\overline{\alpha }}} |C_i|^{{\overline{\alpha }}} + \int _{C_i \setminus B_R({\bar{z}})} |x-\xi _{C_i}|^2 \, \mathrm {d}x + \int _{C_i \cap B_R({\bar{z}})} |x-\xi _{C_i}|^2 \, \mathrm {d}x \right) \nonumber \\&\quad - \sum _{\begin{array}{c} i=1\\ C_i \cap B_R({\bar{z}}) \ne \emptyset \end{array}}^{N_\mu } \left( \frac{c_6}{1-{\overline{\alpha }}} |C_i \setminus B_R({\bar{z}})|^{{\overline{\alpha }}} + \int _{C_i \setminus B_R({\bar{z}})} |x-\xi _{C_i \setminus B_R({\bar{z}})}|^2 \, \mathrm {d}x \right) \nonumber \\&\quad - N_A \left( \frac{c_6}{1-{\overline{\alpha }}} |H|^{{\overline{\alpha }}} + \int _H |x-\xi _H|^2 \, \mathrm {d}x\right) \nonumber \\&{\mathop {\ge }\limits ^{(62)}} \sum _{i=1}^{N_\mu } \int _{C_i \cap B_R({\bar{z}})} |x-\xi _{C_i}|^2 \, \mathrm {d}x - N_A\left( \frac{c_6}{1-{\overline{\alpha }}}A^{{\overline{\alpha }}} + c_6 A^2 \right) \nonumber \\&\quad + \sum _{\begin{array}{c} i=1\\ C_i \cap B_R({\bar{z}}) \ne \emptyset \end{array}}^{N_\mu } \left( \int _{C_i \setminus B_R({\bar{z}})} |x-\xi _{C_i}|^2 \, \mathrm {d}x - \int _{C_i \setminus B_R({\bar{z}})} |x-\xi _{C_i \setminus B_R({\bar{z}})}|^2 \, \mathrm {d}x \right) \nonumber \\&\ge \int _{B_R({\bar{z}})} |x-S(x)|^2 \mathrm {d}x - N_A\left( \frac{c_6}{1-{\overline{\alpha }}}A^{{\overline{\alpha }}} + c_6 A^2 \right) , \end{aligned}$$
(63)

where \(S:B_R({\bar{z}}) \rightarrow \mathrm {supp}(\mu )\) is defined by \(S(x)=\xi _{C_i}\) if \(x \in C_i\), and where in the final inequality we used the property of the centroid that

$$\begin{aligned} \int _{C_i \setminus B_R({\bar{z}})} |x-\xi _{C_i \setminus B_R({\bar{z}})}|^2 \, \mathrm {d}x = \inf _{y \in {\mathbb {R}}^2} \int _{C_i \setminus B_R({\bar{z}})} |x-y|^2 \, \mathrm {d}x. \end{aligned}$$

Combining estimates (63) and (60) gives

$$\begin{aligned} \frac{\pi }{12}R^4 \le N_A\left( \frac{c_6}{1-{\overline{\alpha }}}A^{{\overline{\alpha }}} + c_6 A^2 \right) {\mathop {\le }\limits ^{(61)}} \frac{\pi (R+d_A)^2}{A}\left( \frac{c_6}{1-{\overline{\alpha }}}A^{{\overline{\alpha }}} + c_6 A^2 \right) . \end{aligned}$$
(64)

Define

$$\begin{aligned} p(R,A):=R^2 - (R+d_A)\left( \frac{12 c_6}{1-{\overline{\alpha }}}A^{{\overline{\alpha }}-1} + 12 c_6 A \right) ^{\frac{1}{2}}. \end{aligned}$$

Then (64) implies that \(p(R,A)\le 0\). The quadratic polynomial \(R \mapsto p(R,A)\) has one positive root and one negative root. Let \({\widetilde{R}}(A)\) denote the positive root:

$$\begin{aligned} {\widetilde{R}}(A):= \frac{1}{2} \left[ 12c_6\left( \frac{A^{{\overline{\alpha }}-1}}{1-{\overline{\alpha }}} + A \right) \right] ^{\frac{1}{2}} + \frac{1}{2} \sqrt{12c_6\left( \frac{A^{{\overline{\alpha }}-1}}{1-{\overline{\alpha }}} + A \right) + 4\left[ 12c_6\left( \frac{A^{{\overline{\alpha }}-1}}{1-{\overline{\alpha }}} + A \right) \right] ^{\frac{1}{2}}d_A}. \end{aligned}$$

Since \(p(R,A)\le 0\), we have \(R \in [0,{\widetilde{R}}(A)]\) for all A, and so

$$\begin{aligned} R\le \min _{A>0} {\widetilde{R}}(A) \le {\widetilde{R}}(0.52) < 3.3644, \end{aligned}$$
(65)

where the final inequality was obtained by numerically evaluating \({\widetilde{R}}(0.52)\). The choice \(A=0.52\) was motivated by numerically minimising \({\widetilde{R}}(A)\).
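The numerical evaluation of \({\widetilde{R}}(A)\) is easy to reproduce. Here \(c_6=\tfrac{5\sqrt{3}}{54}\) is the normalized second moment of a regular hexagon, consistent with (62); since \({\overline{\alpha }}\) is fixed elsewhere in the paper, the sketch below uses an illustrative exponent \(\alpha =0.58\), which is an assumption of this example only. It checks the root property \(p({\widetilde{R}}(A),A)=0\) at \(A=0.52\):

```python
import math

c6 = 5 * math.sqrt(3) / 54      # normalized 2nd moment of a regular hexagon
alpha = 0.58                    # illustrative value; the paper works with alpha = alpha-bar

def d_A(A):                     # diameter of a regular hexagon of area A
    return 2 ** 1.5 * 3 ** -0.75 * math.sqrt(A)

def b(A):                       # b = sqrt(12 c6 (A^{alpha-1}/(1-alpha) + A))
    return math.sqrt(12 * c6 * (A ** (alpha - 1) / (1 - alpha) + A))

def R_tilde(A):                 # positive root of p(R, A) = R^2 - b R - b d_A
    return 0.5 * b(A) + 0.5 * math.sqrt(b(A) ** 2 + 4 * b(A) * d_A(A))

def p(R, A):
    return R ** 2 - (R + d_A(A)) * b(A)

# R_tilde(A) is a root of the quadratic p(., A), and p changes sign there
# (it is the larger of the two roots, so p < 0 just below and p > 0 above):
A = 0.52
root = R_tilde(A)
print(root, p(root, A))
print(p(root - 0.1, A) < 0 < p(root + 0.1, A))
```

Scanning \(A\) over a grid and taking the smallest \({\widetilde{R}}(A)\) reproduces the choice \(A \approx 0.52\) for exponents in this range.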

Step 2: Proof of (i). Let \({\bar{z}}\in \mathrm {supp}(\mu )\) satisfy \(\mathrm {dist}({\bar{z}},\partial Q_{\delta ,{\overline{\alpha }}}) \ge 4\). Let \(R_0=3.3644\). By Step 1, there exists at least one point \(z\in B_{R_0}({\bar{z}}) \cap \mathrm {supp}(\mu )\) with \(z \ne {\bar{z}}\). In particular,

$$\begin{aligned} |{\bar{z}}-z| \le 3.3644. \end{aligned}$$
(66)

Let \(m=\mu (\{{\bar{z}}\})\), \(M=\mu (\{z\})\). We can assume without loss of generality that \(m \le M\) (otherwise simply interchange the roles of \({\bar{z}}\) and z).

Let \({\mathcal {C}}^\mu \) be the partition associated to the minimizer \(\mu \). We can define a new partition \({\mathcal {D}}\) by replacing the cells \(\overline{T^{-1}({\bar{z}})}\) and \(\overline{T^{-1}(z)}\) with their union. Then Lemma 4.14 yields

$$\begin{aligned} 0 \le F({\mathcal {D}}) - F({\mathcal {C}}^\mu ) \le \frac{c_6}{1-{\overline{\alpha }}}\left( (m+M)^{{\overline{\alpha }}}-m^{{\overline{\alpha }}}-M^{{\overline{\alpha }}} \right) + |{\bar{z}}-z|^2 \frac{mM}{m+M}. \end{aligned}$$

Define \(\lambda :=\frac{m}{M} \in (0,1]\). By dividing the previous inequality by \(M^{{\overline{\alpha }}}\) and rearranging we obtain

$$\begin{aligned} m \ge \left[ \frac{1}{|{\bar{z}}-z|^2}\frac{c_6}{1-{\overline{\alpha }}} \inf _{\lambda \in (0,1]}\left( (1+\lambda ) \frac{1+\lambda ^{{\overline{\alpha }}}-(1+\lambda )^{{\overline{\alpha }}}}{\lambda ^{{\overline{\alpha }}}} \right) \right] ^{\frac{1}{1-{\overline{\alpha }}}}. \end{aligned}$$
(67)

For \(\alpha \in (0,1)\), let

$$\begin{aligned} \Theta _\alpha (\lambda ):=(1+\lambda )\frac{1+\lambda ^{\alpha }-(1+\lambda )^{\alpha }}{\lambda ^{\alpha }}. \end{aligned}$$

We can restrict our attention to \(\alpha >0\) since eventually we will apply this result to \(\alpha ={\overline{\alpha }}>0\). We want to bound \(\Theta _{\alpha }\) from below for the case \(\alpha ={\overline{\alpha }}\). The idea is the following: We first prove in Step 2a that the function \(\lambda \mapsto \Theta _{\alpha }(\lambda )\) has exactly one minimum point for all \(\alpha \in (0,1)\). In Step 2b we estimate this minimum point for the case \(\alpha ={\overline{\alpha }}\).

Step 2a. In this substep we prove that, for all \(\alpha \in (0,1)\), the function \(\lambda \mapsto \Theta '_\alpha (\lambda )\) vanishes at only one point \(\lambda \in (0,1)\). Fix \(\alpha \in (0,1)\). We have

$$\begin{aligned} \Theta '_\alpha (\lambda ) = \frac{1}{\lambda ^{1+\alpha }}\left[ (1-{\alpha })\lambda + \lambda ^{1+\alpha } -(\alpha +1)(1+\lambda )^{\alpha }\lambda + \alpha (1+\lambda )^{1+\alpha } -\alpha \right] =:\frac{1}{\lambda ^{1+\alpha }} \Lambda (\lambda ). \end{aligned}$$

The strategy we use to prove that there exists a unique \(\lambda \in (0,1)\) such that \(\Lambda (\lambda )=0\) is the following: Using the fact that

$$\begin{aligned} \Lambda (0)=0,\quad \quad \Lambda (1)=(1-\alpha )(2-2^\alpha )>0, \quad \quad \Lambda '(0) = -\alpha + \alpha ^2 < 0, \end{aligned}$$

the desired result is proved once we show that \(\Lambda '\) vanishes at only one point \(\lambda \in (0,1)\): then \(\Lambda \) is strictly negative in a right-neighbourhood of 0, decreases until \(\Lambda '\) changes sign, and then increases to the positive value \(\Lambda (1)\), so it vanishes exactly once in (0, 1). A direct computation gives

$$\begin{aligned} \Lambda '(\lambda )&= \frac{1}{(1+\lambda )^{1-\alpha }}\left[ (1-\alpha )(1+\lambda )^{1-\alpha } + (1+\alpha )\lambda ^\alpha (1+\lambda )^{1-\alpha } - 1 +\alpha ^2 - (1+\alpha )\lambda \right] \\&=: \frac{1}{(1+\lambda )^{1-\alpha }} \Phi (\lambda ) \end{aligned}$$

where

$$\begin{aligned} \Phi (\lambda ) = (1-\alpha )(1+\lambda )^{1-\alpha } + (1+\alpha )\lambda ^\alpha (1+\lambda )^{1-\alpha } -1+\alpha ^2 - (1+\alpha )\lambda . \end{aligned}$$

We claim that

$$\begin{aligned} \Phi (0)<0, \quad \quad \Phi (1)>0, \end{aligned}$$
(68)

and that

$$\begin{aligned} \Phi '(\lambda )>0 \end{aligned}$$
(69)

for all \(\lambda \in [0,1)\). This will show that \(\Phi \) vanishes at only one point \(\lambda \in (0,1)\) and, in turn, that the same holds for \(\Lambda '\).

We start by proving (68). Note that \(\Phi (0)=-\alpha + \alpha ^2 < 0\) since \(\alpha \in (0,1)\). Let

$$\begin{aligned} \psi (\alpha ):=\Phi (1)=2^{2-\alpha } - 2 - \alpha + \alpha ^2. \end{aligned}$$

We want to prove that \(\psi (\alpha )>0\). Since \(\psi (1)=0\), it is sufficient to prove that \(\psi '(\alpha )<0\). Note that \(\psi '(\alpha ) = -2^{2-\alpha }\ln 2 - 1 +2\alpha \), \(\psi ''(\alpha )=2^{2-\alpha }(\ln 2)^2 + 2 >0\), and \(\psi '(1)=-2\ln 2 + 1 <0\). Therefore \(\psi (\alpha )>0\) for all \(\alpha \in (0,1)\), which completes the proof of (68).
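The claim \(\psi (\alpha )>0\) on \((0,1)\) can be spot-checked numerically; this sketch (an illustration, not a proof) confirms it on a fine grid together with the endpoint facts \(\psi (1)=0\) and \(\psi '(1)<0\) used above:

```python
import math

def psi(a):       # psi(alpha) = 2^{2-alpha} - 2 - alpha + alpha^2
    return 2 ** (2 - a) - 2 - a + a * a

def dpsi(a):      # psi'(alpha) = -2^{2-alpha} ln 2 - 1 + 2 alpha
    return -(2 ** (2 - a)) * math.log(2) - 1 + 2 * a

# psi(1) = 0, psi'(1) = 1 - 2 ln 2 < 0, and psi > 0 on a grid in (0, 1):
print(psi(1.0), dpsi(1.0))
assert all(psi(k / 1000) > 0 for k in range(1, 1000))
```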

Finally, we prove (69). We have

$$\begin{aligned} \Phi '(\lambda )&= (1-\alpha )^2(1+\lambda )^{-\alpha } + \alpha (1+\alpha )\lambda ^{\alpha -1}(1+\lambda )^{1-\alpha } +(1-\alpha ^2)\lambda ^\alpha (1+\lambda )^{-\alpha } - (1+\alpha ),\\ \Phi ''(\lambda )&= -\alpha (1-\alpha )^2 (1+\lambda )^{-1-\alpha } - \alpha (1-\alpha ^2) \lambda ^{\alpha -2} (1+\lambda )^{-\alpha }(1-\lambda ) - \alpha (1-\alpha ^2) \lambda ^\alpha (1+\lambda )^{-1-\alpha }. \end{aligned}$$

Since \(\Phi ''(\lambda )<0\) for all \(\lambda \in [0,1)\), we have that

$$\begin{aligned} \Phi '(\lambda )\ge \Phi '(1) = 2^{1-\alpha }(1+\alpha ^2) - (1+\alpha ) =: \varphi (\alpha ). \end{aligned}$$

We have

$$\begin{aligned} \varphi '(\alpha ) = -\ln 2 (1+\alpha ^2)2^{1-\alpha } + 2\alpha 2^{1-\alpha } -1, \quad \varphi ''(\alpha ) = 2^{1-\alpha }\left[ (\ln 2)^2 \alpha ^2 -4\alpha \ln 2 + 2 +(\ln 2)^2 \right] . \end{aligned}$$

Therefore \(\varphi ''(\alpha )=0\) if and only if \(\alpha \in \{\alpha _-,\alpha _+\}\), where

$$\begin{aligned} \alpha _{\pm } = \frac{2\pm \sqrt{2-(\ln 2)^2}}{\ln 2}. \end{aligned}$$

Since \(1<\alpha _-<\alpha _+\), \(\varphi \) is strictly convex in \((0,\alpha _-)\). Therefore, for all \(\alpha \in (0,1)\),

$$\begin{aligned} \varphi (\alpha )> \varphi (1) + \varphi '(1)(\alpha - 1) = (1-2\ln 2)(\alpha -1) > 0. \end{aligned}$$

Therefore \(\Phi '(\lambda ) \ge \varphi (\alpha )>0\), which completes the proof of (69).
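The tangent-line bound \(\varphi (\alpha ) > (1-2\ln 2)(\alpha -1) > 0\) can likewise be spot-checked. This sketch verifies \(\varphi (1)=0\) and that \(\varphi \) stays strictly above its tangent at \(\alpha =1\) on a grid in \((0,1)\), as strict convexity on \((0,\alpha _-)\) guarantees:

```python
import math

def phi(a):       # phi(alpha) = 2^{1-alpha}(1 + alpha^2) - (1 + alpha)
    return 2 ** (1 - a) * (1 + a * a) - (1 + a)

def tangent(a):   # tangent line of phi at alpha = 1: (1 - 2 ln 2)(alpha - 1)
    return (1 - 2 * math.log(2)) * (a - 1)

# phi(1) = 0, and convexity gives phi(alpha) > tangent(alpha) > 0 on (0, 1):
print(phi(1.0))
assert all(phi(k / 1000) > tangent(k / 1000) > 0 for k in range(1, 1000))
```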

Step 2b. Let \({\bar{\lambda }} \in (0,1)\) denote the unique root of \(\Theta '_{{\overline{\alpha }}}\). We now estimate \(\Theta _{{\overline{\alpha }}}({\bar{\lambda }})=\inf _{\lambda \in (0,1]}\Theta _{{\overline{\alpha }}}(\lambda )\). Let \(\lambda _1 = 0.160764\) and \(\lambda _2 = 0.160767\). Numerically we see that

$$\begin{aligned} \Theta '_{{\overline{\alpha }}}(\lambda _1)< -2\times 10^{-6}<0, \quad \quad \Theta '_{{\overline{\alpha }}}(\lambda _2)> 2\times 10^{-6}>0. \end{aligned}$$

Therefore \({\bar{\lambda }}\in (\lambda _1,\lambda _2)\) by the Intermediate Value Theorem. Recall that

$$\begin{aligned} \Theta '_{{\overline{\alpha }}}(\lambda ) = \lambda ^{-1-{\overline{\alpha }}} [(1-{\overline{\alpha }})\lambda + \lambda ^{1+{\overline{\alpha }}} + (1+\lambda )^{{\overline{\alpha }}} ({\overline{\alpha }}-\lambda ) - {\overline{\alpha }}]. \end{aligned}$$

Therefore, for all \(\lambda \in (\lambda _1,\lambda _2)\),

$$\begin{aligned} \Theta '_{{\overline{\alpha }}}(\lambda )&\le \lambda _1^{-1-{\overline{\alpha }}} [ (1-{\overline{\alpha }})\lambda _2 +\lambda _2^{1+{\overline{\alpha }}} + (1+\lambda _2)^{{\overline{\alpha }}} ({\overline{\alpha }} - \lambda _1) - {\overline{\alpha }}] < 6.2 \times 10^{-5}\\ \Theta '_{{\overline{\alpha }}}(\lambda )&\ge \lambda _1^{-1-{\overline{\alpha }}} [ (1-{\overline{\alpha }})\lambda _1 +\lambda _1^{1+{\overline{\alpha }}} + (1+\lambda _1)^{{\overline{\alpha }}} ({\overline{\alpha }} - \lambda _2) - {\overline{\alpha }}] > - 6.3 \times 10^{-5}. \end{aligned}$$

Therefore \(|\Theta '_{{\overline{\alpha }}}(\lambda )|\le 6.3 \times 10^{-5}\) for all \(\lambda \in (\lambda _1,\lambda _2)\). It follows that

$$\begin{aligned} \Theta _{{\overline{\alpha }}}({\bar{\lambda }})\ge \Theta _{{\overline{\alpha }}}(\lambda _1) - (\lambda _2-\lambda _1)\Vert \Theta '_{{\overline{\alpha }}} \Vert _{L^\infty ((\lambda _1,\lambda _2))} \ge 0.85482. \end{aligned}$$
(70)
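The bracketing argument above is ordinary interval bisection on \(\Theta '_{\alpha }\). Since \({\overline{\alpha }}\) is fixed earlier in the paper, the sketch below uses an illustrative exponent \(\alpha =0.58\) (an assumption of this example, not the paper's value) to locate the unique interior root of \(\Theta '_\alpha \) and confirm that it is the minimizer of \(\Theta _\alpha \):

```python
import math

alpha = 0.58  # illustrative exponent in (0, 1); the paper works with alpha = alpha-bar

def theta(lam):    # Theta_alpha(lam) = (1+lam)(1 + lam^a - (1+lam)^a)/lam^a
    return (1 + lam) * (1 + lam ** alpha - (1 + lam) ** alpha) / lam ** alpha

def dtheta(lam):   # Theta'_alpha(lam) = lam^{-1-a}[(1-a)lam + lam^{1+a} + (1+lam)^a(a-lam) - a]
    return lam ** (-1 - alpha) * ((1 - alpha) * lam + lam ** (1 + alpha)
                                  + (1 + lam) ** alpha * (alpha - lam) - alpha)

# Theta'_alpha is negative near 0 and positive at 1, with a unique sign change:
lo, hi = 1e-6, 1.0
assert dtheta(lo) < 0 < dtheta(hi)
for _ in range(60):                      # bisection, as in the bracketing above
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if dtheta(mid) < 0 else (lo, mid)
lam_bar = 0.5 * (lo + hi)
print(lam_bar, theta(lam_bar))           # interior minimizer and minimum value
```

With the paper's \({\overline{\alpha }}\) in place of the sample value, the same loop reproduces the bracket \((\lambda _1,\lambda _2)\) used in (70).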

From (66), (67) and (70) we conclude that

$$\begin{aligned} m> \left( \frac{1}{(3.3644)^2} \cdot \frac{c_6}{1-{\overline{\alpha }}}\cdot 0.85482 \right) ^{\frac{1}{1-{\overline{\alpha }}}} > 2.0620 \times 10^{-4} = {\overline{m}}, \end{aligned}$$
(71)

which proves (i).

Step 3: Proof of (iii). This is very similar to Step 1. Let \(B=B_R(x_0) \subset Q_{\delta ,{\overline{\alpha }}}\) satisfy \(B \cap \mathrm {supp}(\mu ) = \emptyset \). Let \(S:B \rightarrow \mathrm{supp}(\mu )\) be any Borel map. We have

$$\begin{aligned} \int _{B} |x-S(x)|^2 \, \mathrm {d}x \ge \int _{B} \mathrm {dist}(x,\partial B)^2 \, \mathrm {d}x \ge \int _{B} \mathrm {dist}(x,\{x_0\} \cup \partial B)^2 \, \mathrm {d}x {\mathop {=}\limits ^{(60)}} \frac{\pi }{12}R^4. \end{aligned}$$

The second inequality is clearly suboptimal, but it is sufficient for our purposes. Repeating exactly the same argument used in Step 1 (with \(B_R({\bar{z}})\) replaced by B) gives \(R<3.3644\), as required.

Step 4: Proof of (ii). Let \(z_i \in \mathrm {supp}(\mu )\) satisfy \(\mathrm {dist}(z_i,\partial Q_{\delta ,{\overline{\alpha }}}) < 4\). Let U be the square \(U = \{ x \in Q_{\delta ,{\overline{\alpha }}} : \mathrm {dist}(x,\partial Q_{\delta ,{\overline{\alpha }}}) \ge 4 \}\). Take \(x \in \partial U\) satisfying \(|x-z_i| = \mathrm {dist}(z_i,\partial U)\); since \(z_i\) lies within distance 4 of \(\partial Q_{\delta ,{\overline{\alpha }}}\), we have \(|x-z_i| \le \sqrt{32}\) (the worst case being \(z_i\) near a corner of \(Q_{\delta ,{\overline{\alpha }}}\)). Let K be a closed square of side-length \(2R_0\) such that \(x \in \partial K\) and \(K \subset U\) (such a square exists if \(\delta \) is sufficiently small). By Step 3, there exists \(z_j \in \mathrm {supp}(\mu ) \cap K\), \(z_j \ne z_i\). Therefore

$$\begin{aligned} |z_i - z_j| \le |z_i - x| + |x-z_j| \le \sqrt{32}+\mathrm {diam}(K) = \sqrt{32} + \sqrt{8} R_0. \end{aligned}$$
(72)

Without loss of generality we can assume that \(m_i < {\overline{m}}\) (otherwise \(m_i \ge {\overline{m}} > m_{\mathrm {b}}\) and there is nothing to prove). Let T be the optimal transport map defining \(W_2({\mathbbm {1}}_{Q_{\delta ,{\overline{\alpha }}}},\mu )\). Let \(z \in T^{-1}(z_i)\). By Lemmas 2.1 and 4.5(i),

$$\begin{aligned} |z-z_i| + \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 m_i^{{\overline{\alpha }}-1} \le |z-z_j| + \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 m_j^{{\overline{\alpha }}-1}. \end{aligned}$$
(73)

By Lemma 4.5(ii), \(z_i \in T^{-1}(z_i)\). Taking \(z=z_i\) in (73) gives

$$\begin{aligned} \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 m_i^{{\overline{\alpha }}-1} \le |z_i-z_j| + \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 m_j^{{\overline{\alpha }}-1} \le \sqrt{32} + \sqrt{8} R_0 + \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 {\overline{m}}^{{\overline{\alpha }}-1} \end{aligned}$$

by (72) and Lemma 4.15(i). Therefore

$$\begin{aligned} m_i \ge \left[ \frac{1-{\overline{\alpha }}}{c_6 {\overline{\alpha }}} \left( \sqrt{32} + \sqrt{8} R_0 + \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 {\overline{m}}^{{\overline{\alpha }}-1}\right) \right] ^{\frac{1}{{\overline{\alpha }}-1}} > 1.5212 \times 10^{-5} \end{aligned}$$

as required.

Step 5: Proof of (iv). Take any \(x \in T^{-1}(z_i)\). Let K be a closed square of side-length \(2R_0\) such that \(x \in K\) and \(K \subset Q_{\delta ,{\overline{\alpha }}}\) (such a square exists if \(\delta \) is sufficiently small). By Step 3, there exists at least one point \(z_j \in \mathrm {supp}(\mu ) \cap K\). Therefore \(|x-z_j|^2 \le \mathrm {diam}(K)^2 = 8R_0^2\). By Lemmas 2.1 and 4.5(i),

$$\begin{aligned} |x-z_i|^2 \le |x-z_j|^2 + \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 m_j^{{\overline{\alpha }}-1} - \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 m_i^{{\overline{\alpha }}-1} \le 8R_0^2 + \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 m_{\mathrm {b}}^{{\overline{\alpha }}-1}. \end{aligned}$$

By Lemma 4.5(ii), \(z_i \in T^{-1}(z_i)\). Therefore

$$\begin{aligned} \mathrm {diam}(T^{-1}(z_i)) \le 2 \max _{x \in T^{-1}(z_i)} |x-z_i| \le 2 \left( 8R_0^2 + \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 m_{\mathrm {b}}^{{\overline{\alpha }}-1} \right) ^{\frac{1}{2}} \end{aligned}$$

as required. \(\square \)

The lower bound we obtained in Lemma 4.15(i) is good enough to ensure the validity of the convexity inequality (50):

Corollary 4.16

(Convexity inequality). Let \(\mu =\sum _{i=1}^{N_\mu }m_i\delta _{z_i} \in {\mathcal {A}}_{\delta ,{\overline{\alpha }}}\) be a minimizer of \({\mathcal {F}}_{\delta ,{\overline{\alpha }}}\). If \(\delta >0\) is sufficiently small and \(\mathrm {dist}(z_i,\partial Q_{\delta ,{\overline{\alpha }}}) \ge 4\), then

$$\begin{aligned} h_{{\overline{\alpha }}}(m_i,n) \ge 0 \quad \forall \; n \in {\mathbb {N}} \cap [3,\infty ). \end{aligned}$$
(74)

Proof

Recall that \({\overline{m}} = 2.0620 \times 10^{-4}\). By a direct computation we see that

$$\begin{aligned} h_{{\overline{\alpha }}}({\overline{m}}, 3)> 7\times 10^{-7}>0, \quad h_{{\overline{\alpha }}}({\overline{m}}, 4)> 8 \times 10^{-4}> 0, \quad h_{{\overline{\alpha }}}({\overline{m}}, 5)> 1 \times 10^{-3} > 0. \end{aligned}$$

Therefore (74) follows from Lemma 4.15(i), Lemma 4.11 and Corollary 4.12. \(\square \)

4.4 Proof of Theorem 1.1

We are now in position to prove one of our main results.

Theorem 4.17

(Lower bound on \(C_{\mathrm {p}}({\overline{\alpha }})\)). We have

$$\begin{aligned} C_{\mathrm {p}}({\overline{\alpha }}) \ge \frac{2 - {\overline{\alpha }}}{1-{\overline{\alpha }}} \, c_6. \end{aligned}$$

Combining Theorem 4.17 with Lemmas 4.1, 3.12 and Corollary 3.10 completes the proof of Theorem 1.1.

Proof of Theorem 4.17

By (48) it is sufficient to prove that

$$\begin{aligned} \lim _{\delta \rightarrow 0} V^{-1}_{\delta ,{\overline{\alpha }}} \min \left\{ {\mathcal {F}}_{\delta ,{\overline{\alpha }}}(\mu ) : \mu \in {\mathcal {A}}_{\delta ,{\overline{\alpha }}} \right\} \ge \frac{2 - {\overline{\alpha }}}{1-{\overline{\alpha }}} \, c_6. \end{aligned}$$
(75)

Let \(\mu =\sum _{i=1}^{N_\mu }m_i\delta _{z_i} \in {\mathcal {A}}_{\delta ,{\overline{\alpha }}}\) be a minimizer of \({\mathcal {F}}_{\delta ,{\overline{\alpha }}}\) and let

$$\begin{aligned} \begin{aligned} {\mathcal {I}}&= \{ i \in \{1,\ldots ,N_\mu \} : \mathrm {dist}(z_i,\partial Q_{\delta ,{\overline{\alpha }}}) \ge 4 \},\\ {\mathcal {J}}&= \{ j \in \{1,\ldots ,N_\mu \} : \mathrm {dist}(z_j,\partial Q_{\delta ,{\overline{\alpha }}}) < 4 \}. \end{aligned} \end{aligned}$$
(76)

Let U be the tubular neighbourhood of \(\partial Q_{\delta ,{\overline{\alpha }}}\) of width \(4+D_0\), where \(D_0\) was defined in Lemma 4.15(iv):

$$\begin{aligned} U = \left\{ x + y : x \in \partial Q_{\delta ,{\overline{\alpha }}}, \; y \in B_{4+D_0}(0) \right\} \cap Q_{\delta ,{\overline{\alpha }}}. \end{aligned}$$

Let T be the optimal transport map defining \(W_2({\mathbbm {1}}_{Q_{\delta ,{\overline{\alpha }}}},\mu )\). By Lemma 4.15(iv),

$$\begin{aligned} \bigcup _{j \in {\mathcal {J}}} T^{-1}(z_j) \subset U. \end{aligned}$$
(77)

Recall that \(Q_{\delta ,{\overline{\alpha }}}\) is a square of side-length \(V_{\delta ,{\overline{\alpha }}}^{1/2}\). By Lemma 4.15(ii), \(|T^{-1}(z_j)| = m_j > m_{\mathrm {b}}\) for all \(j \in {\mathcal {J}}\). Therefore, for \(\delta \) sufficiently small so that \(V_{\delta ,{\overline{\alpha }}}^{1/2} > 2(4+D_0)\),

$$\begin{aligned} \# {\mathcal {J}} < \frac{|U|}{m_{\mathrm {b}}} = \frac{V_{\delta ,{\overline{\alpha }}}-(V_{\delta ,{\overline{\alpha }}}^{1/2}-2(4+D_0))^2}{m_{\mathrm {b}}}. \end{aligned}$$
(78)

Hence

$$\begin{aligned} \lim _{\delta \rightarrow 0} \frac{\# {\mathcal {J}}}{V_{\delta ,{\overline{\alpha }}}} = 0. \end{aligned}$$
(79)

Since \(\sum _{i=1}^{N_{\mu }} m_i=V_{\delta ,{{\overline{\alpha }}}}\), we have

$$\begin{aligned} 1 \ge V_{\delta ,{{\overline{\alpha }}}}^{-1} \sum _{i \in {\mathcal {I}}} m_i = V_{\delta ,{{\overline{\alpha }}}}^{-1} \sum _{i=1}^{N_\mu } m_i - V_{\delta ,{{\overline{\alpha }}}}^{-1} \sum _{j \in {\mathcal {J}}} m_j {\mathop {\ge }\limits ^{(77)}} 1 - V_{\delta ,{{\overline{\alpha }}}}^{-1} |U|. \end{aligned}$$

Using this and (78) we conclude that

$$\begin{aligned} \lim _{\delta \rightarrow 0} V_{\delta ,{{\overline{\alpha }}}}^{-1} \sum _{i \in {\mathcal {I}}} m_i =1. \end{aligned}$$
(80)

For \(i \in \{ 1 ,\ldots , N_\mu \}\), let \(n_i\) be the number of edges of the convex polygon \(T^{-1}(z_i)\). Recall from Lemma 2.6 that

$$\begin{aligned} \sum _{i=1}^{N_\mu } (n_i - 6) \le 0. \end{aligned}$$
(81)

Finally, we put everything together to complete the proof:

$$\begin{aligned} V_{\delta ,{\overline{\alpha }}}^{-1} {\mathcal {F}}_{\delta ,{\overline{\alpha }}}(\mu )&\ge V_{\delta ,{\overline{\alpha }}}^{-1} \sum _{i=1}^{N_\mu } g_{{\overline{\alpha }}}(m_i, n_i)&(\text {equation } (49)) \\&\ge V_{\delta ,{\overline{\alpha }}}^{-1} \sum _{i \in {\mathcal {I}}} g_{{\overline{\alpha }}}(m_i, n_i) \\&\ge V_{\delta ,{\overline{\alpha }}}^{-1}\sum _{i \in {\mathcal {I}}} \left( g_{{\overline{\alpha }}}(1,6) + \nabla g_{{\overline{\alpha }}}(1,6)\cdot (m_i-1,n_i-6) \right)&(\text {Corollary }4.16) \\&= V_{\delta ,{\overline{\alpha }}}^{-1} \sum _{i \in {\mathcal {I}}} \left( \frac{2-{{\overline{\alpha }}}}{1-{{\overline{\alpha }}}} \, c_6 \, m_i+\kappa (n_i-6) \right)&(\text {equation } (50)) \\&= \frac{2-{{\overline{\alpha }}}}{1-{{\overline{\alpha }}}} \, c_6 \, V_{\delta ,{\overline{\alpha }}}^{-1} \sum _{i \in {\mathcal {I}}} m_i + V_{\delta ,{\overline{\alpha }}}^{-1} \kappa \underbrace{\sum _{i=1}^{N_\mu } (n_i-6)}_{\le 0 \text { by } (81)} - V_{\delta ,{\overline{\alpha }}}^{-1} \sum _{j \in {\mathcal {J}}} \kappa (n_j-6) \\&\ge \frac{2-{{\overline{\alpha }}}}{1-{{\overline{\alpha }}}} \, c_6 \, V_{\delta ,{\overline{\alpha }}}^{-1} \sum _{i \in {\mathcal {I}}} m_i + 3 \kappa \, V_{\delta ,{\overline{\alpha }}}^{-1} \# {\mathcal {J}}.&(\kappa <0, \, n_j \ge 3) \end{aligned}$$

Taking the limit \(\delta \rightarrow 0\) and using (79) and (80) proves (75), as required. \(\square \)

5 The Constrained Optimal Location Problem: Proof of Theorem 1.2

This section is devoted to the proof of Theorem 1.2. The idea is to use Theorem 1.1 together with the following relation between \(C_{\mathrm {c}}(\alpha )\) and \(C_{\mathrm {p}}(\alpha )\):

Lemma 5.1

(Relation between the constrained and penalized problems). For all \(\alpha \in (-\infty ,1)\),

$$\begin{aligned} C_{\mathrm {c}}(\alpha )\ge C_{\mathrm {p}}(\alpha )- \frac{c_6}{1-\alpha }. \end{aligned}$$
(82)

Proof

By Remark 3.5,

$$\begin{aligned} C_{\mathrm {c}}(\alpha )&= \lim _{L\rightarrow \infty } L^{\frac{1}{1-\alpha }}\inf \left\{ W^2_2({\mathbbm {1}}_Q,\mu ) : \mu \in {\mathcal {P}}_{\mathrm {d}}(Q), \; \sum _{i=1}^{N_\mu } m_i^\alpha \le L \right\} \\&=\lim _{L\rightarrow \infty } L^{\frac{1}{1-\alpha }}\inf \left\{ W^2_2({\mathbbm {1}}_Q,\mu ) + \frac{c_6}{1-\alpha }L^{\frac{\alpha -2}{1-\alpha }}\sum _{i=1}^{N_\mu } m_i^\alpha - \frac{c_6}{1-\alpha }L^{\frac{\alpha -2}{1-\alpha }}\sum _{i=1}^{N_\mu } m_i^\alpha \right. \\&\quad \quad : \left. \mu \in {\mathcal {P}}_{\mathrm {d}}(Q), \; \sum _{i=1}^{N_\mu } m_i^\alpha \le L \right\} \\&\quad \ge \lim _{L\rightarrow \infty } L^{\frac{1}{1-\alpha }}\inf \left\{ W^2_2({\mathbbm {1}}_Q,\mu ) + \frac{c_6}{1-\alpha }L^{\frac{\alpha -2}{1-\alpha }}\sum _{i=1}^{N_\mu } m_i^\alpha : \mu \in {\mathcal {P}}_{\mathrm {d}}(Q) \right\} - \frac{c_6}{1-\alpha } L^{\frac{1}{1-\alpha }} L^{\frac{\alpha -2}{1-\alpha }} L\\&\quad =\lim _{\delta \rightarrow 0} \left( \frac{c_6}{\delta (1-\alpha )}\right) ^{\frac{1}{2-\alpha }} \inf \left\{ W^2_2({\mathbbm {1}}_Q,\mu )+ \delta \sum _{i=1}^{N_\mu } m_i^\alpha : \mu \in {\mathcal {P}}_{\mathrm {d}}(Q) \right\} - \frac{c_6}{1-\alpha }\\&\quad = C_{\mathrm {p}}(\alpha )- \frac{c_6}{1-\alpha } \end{aligned}$$

by Remark 3.11. In the penultimate equality we used the change of variables \(\delta =\frac{c_6}{1-\alpha }L^{\frac{\alpha -2}{1-\alpha }}\). \(\square \)
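The change of variables in the last step can be double-checked directly: with \(\delta =\frac{c_6}{1-\alpha }L^{\frac{\alpha -2}{1-\alpha }}\) one has \(L^{\frac{1}{1-\alpha }}=\left( \frac{c_6}{\delta (1-\alpha )}\right) ^{\frac{1}{2-\alpha }}\). A quick numerical sketch, valid for any \(\alpha <1\) and \(L>0\):

```python
import math

c6 = 5 * math.sqrt(3) / 54     # normalized 2nd moment of a regular hexagon

def check(alpha, L):
    """Return both sides of the prefactor identity under the substitution."""
    delta = c6 / (1 - alpha) * L ** ((alpha - 2) / (1 - alpha))
    lhs = L ** (1 / (1 - alpha))
    rhs = (c6 / (delta * (1 - alpha))) ** (1 / (2 - alpha))
    return lhs, rhs

for alpha in (-1.0, 0.0, 0.5, 0.9):
    lhs, rhs = check(alpha, 1000.0)
    assert math.isclose(lhs, rhs, rel_tol=1e-12)
print("change of variables verified")
```

Indeed \(\frac{c_6}{\delta (1-\alpha )}=L^{\frac{2-\alpha }{1-\alpha }}\), so raising to the power \(\frac{1}{2-\alpha }\) gives exactly \(L^{\frac{1}{1-\alpha }}\).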

Proof of Theorem 1.2

Let \(\alpha \in (-\infty ,{\overline{\alpha }}]\). By Corollary 3.4, in order to prove Theorem 1.2 it is sufficient to prove that \( C_{\mathrm {c}}(\alpha ) = c_6\). We have

$$\begin{aligned} c_6&= C_{\mathrm {c}}(0)&(\text {Remark }~3.6) \\&\ge C_{\mathrm {c}}(\alpha )&(C_{\mathrm {c}}(\alpha ) \text { is non-increasing}) \\&\ge C_{\mathrm {p}}(\alpha )- \frac{c_6}{1-\alpha }&(\text {Lemma }~5.1) \\&= c_6&(\text {Theorem }1.1) \end{aligned}$$

as required. \(\square \)

6 Paving the Way Towards \(\alpha = 1\)

In this section we prove an asymptotic crystallization result for the full range of \(\alpha \in (-\infty ,1)\) under the ansatz (which we are not able to prove) that the neighbouring cells of the smallest ones are not too large.

Theorem 6.1

(Asymptotic crystallization for all \({\alpha \in (-\infty ,1)}\)). Let \(\alpha \in (-\infty ,1)\) and \(\delta >0\). Let \(\mu =\sum _{i=1}^{N_{\mu }} m_i\delta _{z_i} \in {\mathcal {A}}_{\delta ,\alpha }\) be a minimizer of \({\mathcal {F}}_{\delta ,\alpha }\). Let \(i_* \in \{1,\ldots ,N_{\mu } \}\) be the index of the smallest cell: \(m_{i_*}=\min _i m_i\) (\(i_*\) need not be unique). Let T be the optimal transport map defining \(W_2({\mathbbm {1}}_{Q_{\delta ,\alpha }},\mu )\). Assume for all \(\delta \) sufficiently small that the area of the ‘closest neighbour’ to the smallest cell is no more than \(\tfrac{19}{4}\) times the area of the smallest cell, i.e., \(m_j \le \frac{19}{4} m_{i_*}\) for all j such that

$$\begin{aligned} \mathrm {dist}(z_{i_*},\partial T^{-1}(z_{i_*})) = \mathrm {dist}(z_{i_*},\partial T^{-1}(z_{j})) =: d_{i_* j}. \end{aligned}$$

Assume also that, for all \(\delta \) sufficiently small, \(z_{i_*}\) is not too close to the boundary:

$$\begin{aligned} B_{d_{i_* j}}(z_{i_*}) \subset Q_{\delta ,\alpha }. \end{aligned}$$

Then the asymptotic crystallization results (13) and (14) hold.

First we prove an analogue of Lemma 4.11 for \(\alpha \) close to 1.

Lemma 6.2

(Positivity of \(h_{\alpha }\) for \(\alpha \) near 1). There exists \(\varepsilon >0\) such that, for all \(\alpha \in (1-\varepsilon ,1)\) and all integers \(n\ge 3\), the following holds: if \(h_\alpha (m_1,n) \ge 0\) for some \(m_1\ge 0\), then \(h_\alpha (m,n)\ge 0\) for all \(m\ge m_1\).

Proof

Define \(h_1:(0,\infty )\times [3,\infty )\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} h_1(m,n):= \lim _{\alpha \rightarrow 1} h_\alpha (m,n) = c_n m^2-c_6m -c_6 m \ln m -\kappa (n-6). \end{aligned}$$

Step 1. We claim that \(h_\alpha \) converges to \(h_1\) locally uniformly in m and uniformly in n as \(\alpha \rightarrow 1^-\). Let \(m \in (0,\infty )\), \(n \ge 3\), \(\alpha \in (0,1)\). Then

$$\begin{aligned} \frac{1}{c_6} (h_\alpha (m,n)-h_1(m,n)) = \frac{m^{\alpha }-m+(1-\alpha )m \ln m}{1-\alpha } =: \frac{\theta ({\alpha })}{1-\alpha }. \end{aligned}$$

Observe that \(\theta (1)=\theta '(1)=0\) and \(\theta ''(\alpha )=(\ln m)^2 m^\alpha \). By Taylor’s Theorem, there exists \(\beta (\alpha ) \in (\alpha ,1)\) such that

$$\begin{aligned} \frac{1}{c_6} |h_\alpha (m,n)-h_1(m,n)| = \frac{1}{2} (1-\alpha ) |\theta ''(\beta (\alpha ))| = \frac{1}{2} (1-\alpha ) (\ln m)^2 m^{\beta (\alpha )}. \end{aligned}$$

Fix \(m_1, m_2 \in (0,\infty )\). Then

$$\begin{aligned} \lim _{\alpha \rightarrow 1^-}\sup _{\begin{array}{c} m \in [m_1,m_2]\\ n \in [3,\infty ) \end{array}} |h_\alpha (m,n)-h_1(m,n)| \le \lim _{\alpha \rightarrow 1^-} \frac{c_6}{2} (1-\alpha ) \sup _{m \in [m_1,m_2]} (\ln m)^2 m^{\beta (\alpha )} = 0 \end{aligned}$$

as required.

Step 2. Next we study the shape of the function \(m\mapsto h_1(m,n)\). Its derivative is

$$\begin{aligned} \partial _m h_1(m,n) = 2 c_n m-2c_6 -c_6\ln m, \end{aligned}$$
(83)

which is strictly convex in m with \(\lim _{m \rightarrow 0}\partial _m h_1(m,n)=+\infty \), \(\lim _{m \rightarrow \infty }\partial _m h_1(m,n)=+\infty \). Therefore \(m\mapsto \partial _m h_1(m,n)\) has exactly one critical point, which is \(m=\frac{c_6}{2 c_n}\). Since \(n \mapsto c_n\) is decreasing,

$$\begin{aligned} \partial _m h_1 \left( \frac{c_6}{2 c_n},n \right) = -c_6 - c_6 \ln \left( \frac{c_6}{2 c_n} \right) \le -c_6 - c_6 \ln \left( \frac{c_6}{2 c_3} \right)< -0.019 < 0. \end{aligned}$$

Therefore \(m\mapsto h_1(m,n)\) has exactly two critical points, the smallest critical point is a local maximum point, and the largest critical point is a local minimum point. Let \({\widetilde{m}}(n)\) denote the local minimum point. By the calculation above,

$$\begin{aligned} {\widetilde{m}}(n)>\frac{c_6}{2c_n} > \frac{c_6}{2c_3}. \end{aligned}$$
(84)
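The evaluation \(\partial _m h_1\left( \frac{c_6}{2c_n},n\right) < -0.019\) involves only the constants \(c_n\). Assuming the standard closed form \(c_n = \frac{1}{2n}\cot \frac{\pi }{n} + \frac{1}{6n}\tan \frac{\pi }{n}\) for the normalized second moment of a regular n-gon (which gives \(c_6=\frac{5\sqrt{3}}{54}\), consistent with the paper), the bound is reproduced directly:

```python
import math

def c(n):   # normalized 2nd moment of a regular n-gon of area 1 about its centroid
    t = math.tan(math.pi / n)
    return 1 / (2 * n * t) + t / (6 * n)

c6 = c(6)   # = 5*sqrt(3)/54

def dm_h1(m, n):   # d/dm h_1(m, n) = 2 c_n m - 2 c_6 - c_6 ln m  (kappa drops out)
    return 2 * c(n) * m - 2 * c6 - c6 * math.log(m)

# The unique critical point of m -> d/dm h_1(m, n) is m = c_6/(2 c_n); since c_n
# is decreasing in n, the value there is largest for n = 3 and is still negative:
print(dm_h1(c6 / (2 * c(3)), 3))   # < -0.019
```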

Define \(\varphi (n)\) to be the value of \(h_1(\cdot ,n)\) at its local minimum point:

$$\begin{aligned} \varphi (n) := h_1({\widetilde{m}}(n),n). \end{aligned}$$

Observe that \(\partial _m h_1(1,6)=0\). Therefore, by (84), the local minimum point of \(h_1(\cdot ,6)\) is \({\widetilde{m}}(6)=1\) and \(\varphi (6)=0\).

Step 3. We claim that there exists a constant \(c>0\) such that \(\varphi (n)>c\) for all integers \(n \ge 3\), \(n\ne 6\). First we show that it is sufficient to prove this for the finite number of cases \(n \in \{3,4,5,7\}\).

Step 3a: Reduction to the case \(n \in \{3,4,5,7\}\). By equation (83), for all \(n \ge 3\),

$$\begin{aligned} \partial _m h_1(1.05,n) = 2 c_n \cdot 1.05 -2c_6 -c_6\ln (1.05) \ge 2 c_\infty \cdot 1.05-2c_6 -c_6\ln (1.05)> 0.005 > 0. \end{aligned}$$

Therefore

$$\begin{aligned} {\widetilde{m}}(n) < 1.05. \end{aligned}$$
(85)
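The evaluation behind (85) uses only \(c_6\) and \(c_\infty =\frac{1}{2\pi }\) (the normalized second moment of a disc, the limit of \(c_n\) as \(n\rightarrow \infty \)). A one-line check, under the same assumed closed form \(c_6=\frac{5\sqrt{3}}{54}\):

```python
import math

c6 = 5 * math.sqrt(3) / 54
c_inf = 1 / (2 * math.pi)   # limit of c_n: normalized 2nd moment of a disc

# d/dm h_1(1.05, n) >= 2 c_inf * 1.05 - 2 c_6 - c_6 ln(1.05), uniformly in n >= 3:
bound = 2 * c_inf * 1.05 - 2 * c6 - c6 * math.log(1.05)
print(bound)   # approximately 0.00565 > 0.005
```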

For all \(n \ge 7\),

$$\begin{aligned} \partial _n \varphi (n)&= \frac{\partial h_1}{\partial n} ({\widetilde{m}}(n),n)&(\text {since }\partial _m h_1({\widetilde{m}}(n),n)=0)\\&= {\widetilde{m}}^2(n) \, \partial _n c_n - \kappa \\&\ge {\widetilde{m}}^2(n) \, \partial _n c_n|_{n=7} - \kappa&(\text {by Lemma }~2.5)\\&\ge 1.05^2 \, \partial _n c_n|_{n=7} - \kappa&\text {(by }(85))\\&>4 \times 10^{-4}. \end{aligned}$$

Therefore \(\varphi \) is increasing for \(n \ge 7\). Consequently, for all integers \(n \ge 3\), \(n\ne 6\),

$$\begin{aligned} \varphi (n) \ge \min \{ \varphi (3), \varphi (4), \varphi (5), \varphi (7)\}. \end{aligned}$$
(86)

Step 3b: Proof for the case \(n \in \{3,4,5,7\}\). We show that the right-hand side of (86) is positive. Let

$$\begin{aligned} m_1(n):= {\left\{ \begin{array}{ll} 0.66 &{} n=3, \\ 0.92 &{} n=4, \\ 0.981 &{} n=5, \\ 1.007 &{} n=7, \end{array}\right. } \quad \quad m_2(n):= {\left\{ \begin{array}{ll} 0.661 &{} n=3, \\ 0.93 &{} n=4, \\ 0.982 &{} n=5, \\ 1.0075 &{} n=7. \end{array}\right. } \end{aligned}$$

Then evaluating \(\partial _m h_1\) gives

$$\begin{aligned} \partial _m h_1(m_1(n),n)\le {\left\{ \begin{array}{ll} -7 \times 10^{-5} &{} n=3,\\ -7 \times 10^{-4} &{} n=4,\\ -1 \times 10^{-4} &{} n=5,\\ -5 \times 10^{-5} &{} n=7, \end{array}\right. } \quad \quad \partial _m h_1(m_2(n),n)\ge {\left\{ \begin{array}{ll} 6 \times 10^{-5} &{} n=3,\\ 8 \times 10^{-4} &{} n=4,\\ 4 \times 10^{-5} &{} n=5,\\ 2 \times 10^{-5} &{} n=7. \end{array}\right. } \end{aligned}$$

Let \(n \in \{3,4,5,7\}\). By the Intermediate Value Theorem, the map \(m \mapsto \partial _m h_1(m,n)\) has a root between \(m_1(n)\) and \(m_2(n)\). Moreover, since \(\partial _m h_1(m_1(n),n) < 0\), the point \(m_1(n)\) lies between the two roots of the strictly convex function \(m \mapsto \partial _m h_1(m,n)\), and so we have bracketed the largest root \({\widetilde{m}}(n)\): \(m_1(n)<{\widetilde{m}}(n)<m_2(n)\). Therefore, using that \(m \mapsto m \ln m\) is increasing for \(m > 1/e\) (note that \(m_1(n) \ge 0.66 > 1/e\)),

$$\begin{aligned} \varphi (n)\ge c_n m_1(n)^2 - c_6 m_2(n) - c_6 m_2(n) \ln m_2(n) - \kappa (n-6) \ge {\left\{ \begin{array}{ll} 0.01 &{} n=3,\\ 9\times 10^{-4} &{} n=4,\\ 2\times 10^{-4} &{} n=5,\\ 1\times 10^{-4} &{} n=7. \end{array}\right. } \end{aligned}$$

Combining this with (86) proves that \(\varphi (n)>c >0\) for all integers \(n \ge 3\), \(n\ne 6\).
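The sign table above is straightforward to verify numerically. The sketch below again assumes the standard regular-polygon formula for \(c_n\) (defined in the earlier sections); note that \(\partial _m h_1\) does not involve \(\kappa \), so no further constants are needed.

```python
import math

def c(n):
    # Assumed formula for c_n (second moment of the unit-area regular n-gon).
    t = math.tan(math.pi / n)
    return (1.0 / t + t / 3.0) / (2.0 * n)

c6 = c(6)

def dm_h1(m, n):
    # partial_m h_1(m, n) = 2 c_n m - 2 c_6 - c_6 ln(m)  (independent of kappa)
    return 2.0 * c(n) * m - 2.0 * c6 - c6 * math.log(m)

m1 = {3: 0.66, 4: 0.92, 5: 0.981, 7: 1.007}
m2 = {3: 0.661, 4: 0.93, 5: 0.982, 7: 1.0075}
signs = {n: (dm_h1(m1[n], n), dm_h1(m2[n], n)) for n in (3, 4, 5, 7)}
for n, (lo, hi) in signs.items():
    print(n, lo, hi)   # lo < 0 < hi, hence m1(n) < m_tilde(n) < m2(n)
```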

Step 4. By Step 1 of the proof of Lemma 4.11, the function \(m\mapsto h_\alpha (m,n)\) has exactly two critical points for all \(\alpha \in (0,1)\), \(n \ge 3\). The smallest critical point is a local maximum point and the largest critical point is a local minimum point.

By Step 1, \(h_\alpha \) converges to \(h_1\) uniformly on the set \([c_6/(2c_3),1.05] \times [3,\infty )\). Observe that the local minimum point \({\widetilde{m}}(n)\) of \(m \mapsto h_1(m,n)\) lies in the interval \([c_6/(2c_3),1.05]\) by (84) and (85). Let n be an integer, \(n \ge 3\), \(n \ne 6\). By Step 3, \(h_1({\widetilde{m}}(n),n)>c>0\). Therefore, for \(\alpha \) sufficiently close to 1, the value of \(m \mapsto h_\alpha (m,n)\) at its local minimum point is also positive by the uniform convergence. This proves Lemma 6.2 for the case \(n \ge 3\), \(n \ne 6\).

Finally, the case \(n=6\) follows immediately from Step 2 of the proof of Lemma 4.11, where it was shown that the value of \(h_\alpha (m,6)\) at its local minimum point is 0 for all \(\alpha \in (0,1)\). \(\square \)

An immediate consequence of Lemma 6.2 is the following convexity inequality:

Corollary 6.3

(Reduction to \(n \in \{3,4,5\}\)). Let \(n \ge 6\) be an integer. If \(\alpha \in (0,1)\) is sufficiently close to 1, then \(h_{\alpha }(m,n)\ge 0\) for all \(m \ge 0\).

Proof

Observe that \(h_{\alpha }(0,n)=-\kappa (n-6) \ge 0\) for all \(n \ge 6\). Therefore the result follows immediately from Lemma 6.2. \(\square \)

Now we are in a position to prove the main theorem of the section.

Proof of Theorem 6.1

By the monotonicity result (Lemma 3.12) we can assume that \(\alpha \in (0,1)\). Let \(\mu \) and \(z_{i_*}\) be as in the statement of Theorem 6.1. To simplify the notation, let \({\overline{z}} = z_{i_*}\), \(m=\mu (\{{\overline{z}}\})=\min \{\mu (\{z\}) : z \in \mathrm {supp}(\mu ) \}\). Let \(z \in \mathrm {supp}(\mu )\) satisfy

$$\begin{aligned} \mathrm {dist}({\overline{z}},\partial T^{-1}({\overline{z}})) = \mathrm {dist}({\overline{z}},\partial T^{-1}(z)) =: d. \end{aligned}$$

Let x belong to the edge \(e_{{\overline{z}}z} := \overline{T^{-1}({\overline{z}})} \cap \overline{T^{-1}(z)}\) and satisfy \(|{\overline{z}}-x|=d\); in other words, x is the closest point of the boundary of the Laguerre cell \(\overline{T^{-1}({\overline{z}})}\) to \({\overline{z}}\). Define \(M=\mu (\{z\})\). By assumption, \(M \le \frac{19}{4} m\).

Step 1: Upper bound on \(|{\overline{z}}-z|\). The point x lies in the edge \(e_{{\overline{z}}z}\). Therefore by Lemma 2.1 and Lemma 4.5(i) we have

$$\begin{aligned} |x-{\overline{z}}|^2 + \frac{\alpha }{1-\alpha } c_6 m^{\alpha -1} = |x-z|^2 + \frac{\alpha }{1-\alpha } c_6 M^{\alpha -1}. \end{aligned}$$

Moreover, \(x = {\overline{z}} + t (z-{\overline{z}})\) for some \(t \in (0,1)\) (since x is the closest point to \({\overline{z}}\) in the edge \(e_{{\overline{z}}z}\), which has normal \(z-{\overline{z}}\)). Solving for t gives

$$\begin{aligned} x - {\overline{z}} = \left[ \frac{1}{2} + \frac{c_6}{2} \frac{\alpha }{1-\alpha } \frac{M^{\alpha -1}-m^{\alpha -1}}{|{\overline{z}}-z|^2} \right] (z-{\overline{z}}). \end{aligned}$$

The term in square brackets is positive since \({\overline{z}}\) lies in its Laguerre cell \(\overline{T^{-1}({\overline{z}})}\) (Lemma 4.5(ii)). Moreover, the ball \(B_d({\overline{z}})\) is contained in the Laguerre cell \(\overline{T^{-1}({\overline{z}})}\) (since \(e_{{\overline{z}}z}\) is the closest edge to \({\overline{z}}\)). Therefore

$$\begin{aligned} \pi d^2 \le m \quad \Longleftrightarrow \quad \pi \left[ \frac{1}{2} + \frac{c_6}{2} \frac{\alpha }{1-\alpha } \frac{M^{\alpha -1}-m^{\alpha -1}}{|{\overline{z}}-z|^2} \right] ^2 |z-{\overline{z}}|^2 \le m{.} \end{aligned}$$

Let \(R=|z-{\overline{z}}|\). Then we can rewrite the inequality above as

$$\begin{aligned} R^2 - 2\left( \frac{m}{\pi } \right) ^{\frac{1}{2}} R + c_6 \frac{\alpha }{1-\alpha } (M^{\alpha -1}-m^{\alpha -1}) \le 0. \end{aligned}$$

Thus we get the upper bound

$$\begin{aligned} R \le \left( \frac{m}{\pi } \right) ^{\frac{1}{2}} + \left( \frac{m}{\pi } - c_6 \frac{\alpha }{1-\alpha } (M^{\alpha -1}-m^{\alpha -1}) \right) ^{\frac{1}{2}}. \end{aligned}$$
(87)

Step 2: Lower bound on m. Let \({\mathcal {C}}^\mu \) be the partition associated to the minimizer \(\mu \). Define a new partition \({\mathcal {D}}\) by replacing the cells \(\overline{T^{-1}({\bar{z}})}\) and \(\overline{T^{-1}(z)}\) with their union. By Lemma 4.14

$$\begin{aligned} 0&\le F({\mathcal {D}}) - F({\mathcal {C}}^\mu ) \le \frac{c_6}{1-\alpha }\left( (m+M)^{\alpha }-m^{\alpha }-M^{\alpha } \right) + \frac{mM}{m+M} |{\bar{z}}-z|^2. \end{aligned}$$

By Young’s inequality \((a+b)^2 \le 2 (a^2+b^2)\) and (87) we obtain

$$\begin{aligned} 0&\le \frac{c_6}{1-\alpha }\left( (m+M)^{\alpha }-m^{\alpha }-M^{\alpha } \right) + 2 \frac{mM}{m+M} \left[ \frac{2}{\pi } m - c_6 \frac{\alpha }{1-\alpha } (M^{\alpha -1}-m^{\alpha -1}) \right] \\&= \frac{c_6}{1-\alpha } M^\alpha \left( \left( \frac{m}{M} + 1\right) ^{\alpha }-\left( \frac{m}{M} \right) ^{\alpha }-1 \right) + 2 \frac{1}{\frac{m}{M} + 1} m^\alpha \left[ \frac{2}{\pi } m^{2-\alpha }- c_6 \frac{\alpha }{1-\alpha } \left( \left( \frac{M}{m}\right) ^{\alpha -1}-1 \right) \right] . \end{aligned}$$

Define \(\lambda :=\frac{m}{M}\). By assumption, \(\lambda \in [\frac{4}{19},1]\). Then rearranging the previous inequality gives the lower bound

$$\begin{aligned} m ^{2-\alpha } \ge {\overline{m}}_\alpha (\lambda ) := \frac{\pi c_6}{2} \left[ \frac{\alpha }{1-\alpha } (\lambda ^{1-\alpha }-1) - \frac{(\lambda +1)((\lambda +1)^\alpha -\lambda ^\alpha -1)}{2 (1-\alpha ) \lambda ^\alpha } \right] . \end{aligned}$$
(88)

Note that (88) gives a non-trivial lower bound on m only when \({\overline{m}}_\alpha (\lambda )>0\). For each \(\lambda \in [\frac{4}{19},1]\) we have that

$$\begin{aligned} \lim _{\alpha \rightarrow 1}{\overline{m}}_\alpha (\lambda ) = {\overline{m}}_1(\lambda ) :=\frac{\pi c_6}{2} \left( \frac{1-\lambda }{2} \ln \lambda + \frac{(\lambda +1)^2}{2\lambda } \ln (\lambda + 1) \right) . \end{aligned}$$

Step 3: Lower bound on \({\overline{m}}_1(\lambda )\). We claim that \({\overline{m}}_1\) is increasing. We have

$$\begin{aligned} {\overline{m}}_1'(\lambda )&= \frac{\pi c_6}{2} \left( - \frac{\ln \lambda }{2} + \frac{1-\lambda }{2\lambda } + \frac{(\lambda ^2-1)}{2 \lambda ^2} \ln (\lambda +1) + \frac{\lambda +1}{2\lambda } \right) ,\\ {\overline{m}}_1''(\lambda )&= \frac{\pi c_6}{2} \lambda ^{-2} \left( \frac{\ln (\lambda +1)}{\lambda } - \frac{3}{2} \right) =: \frac{\pi c_6}{2} \lambda ^{-2} \phi (\lambda ). \end{aligned}$$

Then

$$\begin{aligned} \phi '(\lambda )&= \frac{\lambda - (\lambda +1)\ln (\lambda +1)}{\lambda ^2 (\lambda +1)} =: \frac{\psi (\lambda )}{\lambda ^2 (\lambda +1)}, \\ \psi '(\lambda )&= - \ln (\lambda +1) < 0. \end{aligned}$$

Therefore, for all \(\lambda \in [\frac{4}{19},1]\), \(\psi (\lambda ) \le \psi (\frac{4}{19})< -0.02 <0\). Hence \(\phi '(\lambda )<0\) and so

$$\begin{aligned} \phi (\lambda ) \le \phi (\tfrac{4}{19})< -0.59< 0, \qquad {\overline{m}}_1''(\lambda ) = \frac{\pi c_6}{2} \lambda ^{-2} \phi (\lambda ) < 0. \end{aligned}$$

We conclude that, for all \(\lambda \in [\frac{4}{19},1]\), \({\overline{m}}_1'(\lambda ) \ge {\overline{m}}_1'(1)=\frac{\pi c_6}{2}>0\), and so \({\overline{m}}_1\) is increasing, as claimed. Therefore

$$\begin{aligned} \inf _{\lambda \in [\frac{4}{19},1]} {\overline{m}}_1(\lambda ) = {\overline{m}}_1(\tfrac{4}{19}) > 0.0125. \end{aligned}$$
(89)

This is essentially a lower bound on m for \(\alpha \) near 1 (we make this statement precise below), and it is much better than the lower bound one obtains using the method of Lemma 4.15 (although in Lemma 4.15 we did not make the ansatz that \(\lambda \in [\frac{4}{19},1]\)).
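The value in (89) and the monotonicity of \({\overline{m}}_1\) can be confirmed numerically. The sketch below hard-codes \(c_6 = \frac{5}{18\sqrt{3}}\), the second moment of the unit-area regular hexagon, which is an assumption imported from the earlier sections.

```python
import math

c6 = 5.0 / (18.0 * math.sqrt(3.0))   # assumed value of c_6

def mbar1(lam):
    # mbar_1(lambda) = (pi c_6/2) [ (1-lam)/2 ln(lam) + (lam+1)^2/(2 lam) ln(lam+1) ]
    return (math.pi * c6 / 2.0) * ((1.0 - lam) / 2.0 * math.log(lam)
                                   + (lam + 1.0) ** 2 / (2.0 * lam) * math.log(lam + 1.0))

# Grid on [4/19, 1]: mbar_1 is increasing and its infimum is attained at 4/19.
lams = [4.0 / 19.0 + k * (1.0 - 4.0 / 19.0) / 1000.0 for k in range(1001)]
vals = [mbar1(l) for l in lams]
print(vals[0])                                           # > 0.0125, as in (89)
print(all(vals[k] < vals[k + 1] for k in range(1000)))   # monotone on the grid
```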

Step 4: Convexity inequality for \(h_1\). Numerical evaluation gives

$$\begin{aligned} h_1({\overline{m}}_1(\tfrac{4}{19}),3)&> 4.2 \times 10^{-3}> 0, \\ h_1({\overline{m}}_1(\tfrac{4}{19}),4)&> 5.0 \times 10^{-3}>0, \\ h_1({\overline{m}}_1(\tfrac{4}{19}),5)&> 5.9 \times 10^{-3} >0. \end{aligned}$$

Step 5: Uniform convergence of \({\overline{m}}_\alpha \) to \({\overline{m}}_1\). Set

$$\begin{aligned} \varphi _\alpha (\lambda )&:= \frac{\alpha }{1-\alpha } (\lambda ^{1-\alpha }-1) - \frac{(\lambda +1)((\lambda +1)^\alpha -\lambda ^\alpha -1)}{2 (1-\alpha ) \lambda ^\alpha },\\ \varphi _1(\lambda )&:= \frac{1-\lambda }{2} \ln \lambda + \frac{(\lambda +1)^2}{2\lambda } \ln (\lambda + 1). \end{aligned}$$

By Taylor’s Theorem, for all \(x \in {\mathbb {R}}\),

$$\begin{aligned} |e^x-1-x| =\frac{1}{2} x^2 e^\xi \end{aligned}$$

for some \(\xi (x)\) between 0 and x. Since \(x \mapsto e^x\) is increasing we conclude that

$$\begin{aligned} |e^x - 1 - x| \le \frac{1}{2} x^2 \max \{ 1,e^x \} \end{aligned}$$
(90)

for all \(x \in {\mathbb {R}}\). We estimate

$$\begin{aligned} |{\overline{m}}_\alpha (\lambda ) - {\overline{m}}_1(\lambda )|&= \frac{\pi c_6}{2} |\varphi _\alpha (\lambda ) - \varphi _1(\lambda )| \nonumber \\&\le \frac{\pi c_6}{2} \left| \frac{(\lambda +1)}{2 \lambda ^\alpha } \frac{(\lambda +1)^\alpha -\lambda ^\alpha -1}{1-\alpha } + \frac{(\lambda +1)^2}{2\lambda }\ln (\lambda +1) - \frac{1+\lambda }{2}\ln \lambda \right| \nonumber \\&\quad + \frac{\pi c_6}{2} \left| \frac{\alpha }{1-\alpha } (\lambda ^{1-\alpha }-1) - \ln \lambda \right| . \end{aligned}$$
(91)

For all \(\lambda \in [\frac{4}{19},1]\), we estimate the second term on the right-hand side of (91) as follows:

$$\begin{aligned} \left| \frac{\alpha }{1-\alpha } (\lambda ^{1-\alpha }-1) - \ln \lambda \right|&\le \alpha \left| \frac{1}{1-\alpha } (\lambda ^{1-\alpha }-1) - \ln \lambda \right| + (1-\alpha )|\ln \lambda | \nonumber \\&= \alpha \left| \frac{e^{(1-\alpha )\ln \lambda } - 1 - (1-\alpha )\ln \lambda }{1-\alpha } \right| + (1-\alpha )|\ln \lambda | \nonumber \\&{\mathop {\le }\limits ^{(90)}} \frac{1}{2} \alpha (1-\alpha ) |\ln \lambda |^2 + (1-\alpha )|\ln \lambda | \nonumber \\&\le C(1-\alpha ), \end{aligned}$$
(92)

where \(C>0\) is a constant independent of \(\alpha \) and \(\lambda \). The existence of C follows from the fact that \(\lambda \in [\frac{4}{19},1]\), \(\alpha \in (0,1)\). We estimate the first term on the right-hand side of (91) by

$$\begin{aligned}&\left| \frac{(\lambda +1)}{2 \lambda ^\alpha } \frac{(\lambda +1)^\alpha -\lambda ^\alpha -1}{1-\alpha } + \frac{(\lambda +1)^2}{2\lambda }\ln (\lambda +1) - \frac{1+\lambda }{2}\ln \lambda \right| \nonumber \\&\quad \le \frac{\lambda +1}{2\lambda } \left| \frac{(\lambda +1)^\alpha -\lambda ^\alpha -1}{1-\alpha } + (\lambda +1)\ln (\lambda +1) - \lambda \ln \lambda \right| \nonumber \\&\qquad + \frac{\lambda +1}{2} \left| \frac{1}{\lambda ^\alpha } - \frac{1}{\lambda }\right| \, \left| \frac{(\lambda +1)^\alpha -\lambda ^\alpha -1}{1-\alpha } \right| {.} \end{aligned}$$
(93)

Now we bound the first term on the right-hand side of (93):

$$\begin{aligned}&\left| \frac{(\lambda +1)^\alpha -\lambda ^\alpha -1}{1-\alpha } + (\lambda +1)\ln (\lambda +1) - \lambda \ln \lambda \right| \nonumber \\&\quad \le (\lambda +1)\left| \frac{(\lambda +1)^{\alpha -1} - 1 - (\alpha -1)\ln (\lambda +1)}{1-\alpha } \right| + \lambda \left| \frac{\lambda ^{\alpha -1}-1-(\alpha -1)\ln \lambda }{1-\alpha } \right| \nonumber \\&\quad {\mathop {\le }\limits ^{(90)}} \frac{1}{2} (\lambda +1) (1-\alpha )|\ln (\lambda +1)|^2 +\frac{1}{2} \lambda ^\alpha (1-\alpha )|\ln \lambda |^2 \nonumber \\&\quad \le C(1-\alpha ) \end{aligned}$$
(94)

for some constant \(C>0\) since \(\lambda \in [\frac{4}{19},1]\), \(\alpha \in (0,1)\). Next we estimate the second term on the right-hand side of (93). By Taylor’s Theorem there exist \(\xi _1,\xi _2,\xi _3 \in (\alpha ,1)\) such that

$$\begin{aligned}&\left| \frac{1}{\lambda ^\alpha } - \frac{1}{\lambda }\right| \, \left| \frac{(\lambda +1)^\alpha -\lambda ^\alpha -1}{1-\alpha } \right| \nonumber \\&\quad = \frac{|\ln \lambda | \lambda ^{\xi _1} (1-\alpha ) }{\lambda ^\alpha \lambda } \, \frac{|\lambda + 1 + \ln (\lambda +1)(\lambda +1)^{\xi _2}(\alpha -1) - (\lambda + \ln (\lambda ) \lambda ^{\xi _3} (\alpha -1)) - 1|}{1-\alpha }\nonumber \\&\quad \le C(1-\alpha ) \end{aligned}$$
(95)

for all \(\lambda \in [\frac{4}{19},1]\), \(\alpha \in (0,1)\), where \(C>0\) is a constant independent of \(\alpha \) and \(\lambda \). By using (91)–(95) we get the uniform convergence of \({\overline{m}}_\alpha \) to \({\overline{m}}_1\) on the interval \([\frac{4}{19},1]\).
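The uniform convergence can also be observed numerically: evaluating \({\overline{m}}_\alpha \) and \({\overline{m}}_1\) on a grid in \([\frac{4}{19},1]\) shows the sup-distance decaying as \(\alpha \rightarrow 1\). The sketch below again hard-codes \(c_6 = \frac{5}{18\sqrt{3}}\) (an assumption from the earlier sections).

```python
import math

c6 = 5.0 / (18.0 * math.sqrt(3.0))   # assumed value of c_6

def mbar(alpha, lam):
    # mbar_alpha(lambda), cf. (88)
    a, l = alpha, lam
    return (math.pi * c6 / 2.0) * (a / (1.0 - a) * (l ** (1.0 - a) - 1.0)
            - (l + 1.0) * ((l + 1.0) ** a - l ** a - 1.0) / (2.0 * (1.0 - a) * l ** a))

def mbar1(lam):
    # The pointwise limit as alpha -> 1
    l = lam
    return (math.pi * c6 / 2.0) * ((1.0 - l) / 2.0 * math.log(l)
                                   + (l + 1.0) ** 2 / (2.0 * l) * math.log(l + 1.0))

grid = [4.0 / 19.0 + k * (1.0 - 4.0 / 19.0) / 500.0 for k in range(501)]
errs = {}
for alpha in (0.9, 0.99, 0.999):
    errs[alpha] = max(abs(mbar(alpha, l) - mbar1(l)) for l in grid)
    print(alpha, errs[alpha])   # sup-distance on the grid shrinks as alpha -> 1
```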

Step 6: Convexity inequality for \(h_\alpha \) for \(\alpha \) sufficiently close to 1. Recall from (88) that

$$\begin{aligned} \min \{\mu (\{z\}) : z \in \mathrm {supp}(\mu ) \} = m \ge \left( \inf _{[\frac{4}{19},1]}{\overline{m}}_\alpha \right) ^{\frac{1}{2-\alpha }} =: \eta _\alpha . \end{aligned}$$
(96)

Note that the positivity of \(\inf _{[\frac{4}{19},1]}{\overline{m}}_\alpha \) for \(\alpha \) sufficiently close to 1 follows from the positivity of \(\inf _{[\frac{4}{19},1]}{\overline{m}}_1\) and the uniform convergence of \({\overline{m}}_\alpha \) to \({\overline{m}}_1\). By equation (89) we have

$$\begin{aligned} h_\alpha (\eta _\alpha ,n) - h_1({\overline{m}}_1(\tfrac{4}{19}),n) = h_\alpha (\eta _\alpha ,n) - h_1(\eta _\alpha ,n) + h_1(\eta _\alpha ,n) - h_1(\textstyle \inf _{[\tfrac{4}{19},1]}{\overline{m}}_1,n). \end{aligned}$$

By the uniform convergence of \(h_\alpha \) to \(h_1\) (Step 1 of the proof of Lemma 6.2), the uniform convergence of \({\overline{m}}_\alpha \) to \({\overline{m}}_1\) (which implies that \(\eta _\alpha \) converges to \(\inf _{[\tfrac{4}{19},1]}{\overline{m}}_1\)), and the continuity of \(h_1\), we find that

$$\begin{aligned} \lim _{\alpha \rightarrow 1} h_\alpha (\eta _\alpha ,n) = h_1({\overline{m}}_1(\tfrac{4}{19}),n) > 0 \end{aligned}$$
(97)

for all \(n \in \{3,4,5\}\) by Step 4. By (96), (97), Lemma 6.2, and Corollary 6.3 we conclude that \(h_\alpha (\mu (\{z\}),n) \ge 0\) for all \(z \in \mathrm {supp}(\mu )\) and all integers \(n \ge 3\), provided that \(\alpha \) is sufficiently close to 1.

Step 7: Conclusion. By using the convexity inequality \(h_\alpha (\mu (\{z\}),n) \ge 0\) from Step 6, we can conclude the proof using the same argument that we used to prove Theorem 4.17. \(\square \)

7 Proof of Theorem 1.4

The proof is similar in spirit to the analogous result for the case \(\alpha =0.5\) from [18, Theorem 3], although we have to do some extra work to take care of particles near the boundary. The main ingredient is the following stability result due to G. Fejes Tóth [37], which roughly states that if \(\mu \) is a discrete measure on a convex n-gon \(\Omega \), with \(n \le 6\), such that the rescaled quantization error \(\frac{N_\mu }{|\Omega |^2} W_2^2({\mathbbm {1}}_\Omega ,\mu )\) is close to the asymptotically optimal value of \(c_6\), then the support of \(\mu \) is close to a triangular lattice.

Theorem 7.1

(Stability Theorem of G. Fejes Tóth). Let \(\Omega \subset {\mathbb {R}}^2\) be a convex polygon with at most six sides. Let \(\{z_i\}_{i=1}^N\) be a set of N distinct points in \(\Omega \) and let \(\{ V_i \}_{i=1}^N\) be the Voronoi tessellation of \(\Omega \) generated by \(\{z_i\}_{i=1}^N\), i.e.,

$$\begin{aligned} V_i = \{ z \in \Omega : |z-z_i| \le |z-z_j| \; \forall j \in \{1,\ldots ,N\}\}. \end{aligned}$$

Define the defect of the configuration \(\{z_i\}_{i=1}^N\) by

$$\begin{aligned} {\hat{\varepsilon }}(\{z_i\}_{i=1}^N):= \frac{N}{|\Omega |^2} \sum _{i=1}^{N} \int _{V_i} |z-z_i|^2\, \mathrm {d}z - c_6 . \end{aligned}$$

There exist \(\varepsilon _0 >0\) and \(c>0\) such that the following hold. If \(\varepsilon \in (0,\varepsilon _0)\) and \(\{z_i\}_{i=1}^N\) satisfy

$$\begin{aligned} {\hat{\varepsilon }}(\{z_i\}_{i=1}^N)\le \varepsilon , \end{aligned}$$

then, with the possible exception of at most \(N c \varepsilon ^{1/3}\) indices \(i\in \{1,\dots ,N\}\), the following hold:

  1. (i)

    \(V_i\) is a hexagon;

  2. (ii)

the distance between \(z_i\) and each vertex of \(V_i\) lies between \((1 - \varepsilon ^{1/3}) \sqrt{\frac{|\Omega |}{N}} \sqrt{\frac{2}{3 \sqrt{3}}}\) and \((1 + \varepsilon ^{1/3}) \sqrt{\frac{|\Omega |}{N}} \sqrt{\frac{2}{3 \sqrt{3}}}\);

  3. (iii)

the distance between \(z_i\) and each edge of \(V_i\) lies between \((1 - \varepsilon ^{1/3}) \sqrt{\frac{|\Omega |}{N}} \sqrt{\frac{1}{2 \sqrt{3}}}\) and \((1 + \varepsilon ^{1/3}) \sqrt{\frac{|\Omega |}{N}} \sqrt{\frac{1}{2 \sqrt{3}}}\).

To appreciate the geometric significance of this result, note that for a regular hexagon of area \(|\Omega |/N\), the distance between the centre of the hexagon and each vertex is \(\sqrt{\frac{|\Omega |}{N}} \sqrt{\frac{2}{3 \sqrt{3}}}\), and the distance between the centre of the hexagon and each edge is \(\sqrt{\frac{|\Omega |}{N}} \sqrt{\frac{1}{2 \sqrt{3}}}\).
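These two constants can be checked against the elementary formulas for a regular hexagon with side length \(s\) (area \(\frac{3\sqrt{3}}{2} s^2\), circumradius \(s\), apothem \(\frac{\sqrt{3}}{2} s\)). The short sketch below verifies the algebra for an arbitrary area.

```python
import math

A = 2.7                                            # an arbitrary area, standing in for |Omega|/N
s = math.sqrt(2.0 * A / (3.0 * math.sqrt(3.0)))    # side length of the regular hexagon of area A
R = s                                              # centre-to-vertex distance (circumradius)
r = math.sqrt(3.0) / 2.0 * s                       # centre-to-edge distance (apothem)

# The values quoted in the text:
R_claim = math.sqrt(A) * math.sqrt(2.0 / (3.0 * math.sqrt(3.0)))
r_claim = math.sqrt(A) * math.sqrt(1.0 / (2.0 * math.sqrt(3.0)))
print(abs(R - R_claim), abs(r - r_claim))          # both ~ 0
```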

Proof

This was proved in [37] in a much more general setting. A similar result was proved by Gruber in [44]. We make some quick remarks about how the version stated here can be read off from [37]. In the notation of [37, p. 123], we have \(f(t)=t^2\) and

$$\begin{aligned}&r(\mu ,N)=(1+\mu ) r(H_N), \qquad r(H_N) = \sqrt{\frac{|\Omega |}{N}} r(H_1), \qquad r(H_1)=\sqrt{\frac{1}{2 \sqrt{3}}}, \\&R(\mu ,N)=(1-\mu ) R(H_N), \qquad R(H_N) = \sqrt{\frac{|\Omega |}{N}} R(H_1), \qquad R(H_1)=\sqrt{\frac{2}{3 \sqrt{3}}}, \\&h(\mu ,N) = |r(\mu ,N)^2 - R(\mu ,N)^2| = b(\mu ) \frac{|\Omega |}{N}, \qquad b(\mu )=\frac{1}{\sqrt{3}} \left( \frac{1}{6} - \frac{7}{3} \mu + \frac{1}{6} \mu ^2 \right) . \end{aligned}$$

Since f is strictly increasing (or by a direct computation), the condition \(h(\mu ,N) \ne 0\) stated in [37, equation (3)] holds for all \(\mu \in (0,(2-\sqrt{3})^2)\). Fix any \(\mu \in (0,(2-\sqrt{3})^2)\). Then by [37, Theorem p. 213], there exist \(c=c(\mu )>0\) and \(\varepsilon (\mu )>0\) such that if \(0< {\tilde{\varepsilon }} < \varepsilon (\mu )\) and

$$\begin{aligned} \sum _{i=1}^{N} \int _{V_i} |z-z_i|^2\, \mathrm {d}z - N \left( \frac{|\Omega |}{N} \right) ^2 c_6 \le {\tilde{\varepsilon }} \, b(\mu ) \frac{|\Omega |^2}{N}, \end{aligned}$$

then statements (i)-(iii) of Theorem 7.1 hold. Defining \(\varepsilon ={\tilde{\varepsilon }} \, b(\mu )\) and \(\varepsilon _0 = \varepsilon (\mu ) b(\mu )\) completes the proof. \(\square \)
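The bookkeeping in this proof can be double-checked numerically: with \(r(H_1)^2 = \frac{1}{2\sqrt{3}}\) and \(R(H_1)^2 = \frac{2}{3\sqrt{3}}\), the quantity \(|(1+\mu )^2 r(H_1)^2 - (1-\mu )^2 R(H_1)^2|\) equals \(b(\mu )\) on \((0,(2-\sqrt{3})^2)\), and \(b\) vanishes exactly at \(\mu = (2-\sqrt{3})^2 = 7 - 4\sqrt{3}\) (the smaller root of \(\mu ^2 - 14 \mu + 1 = 0\)). A sketch:

```python
import math

r2 = 1.0 / (2.0 * math.sqrt(3.0))   # r(H_1)^2, inradius squared of the unit-area hexagon
R2 = 2.0 / (3.0 * math.sqrt(3.0))   # R(H_1)^2, circumradius squared

def b(mu):
    # b(mu) = (1/sqrt(3)) (1/6 - (7/3) mu + mu^2/6), as in the proof
    return (1.0 / math.sqrt(3.0)) * (1.0 / 6.0 - 7.0 / 3.0 * mu + mu ** 2 / 6.0)

mu_max = (2.0 - math.sqrt(3.0)) ** 2     # = 7 - 4 sqrt(3) ~ 0.0718
for mu in (0.01, 0.03, 0.05, 0.07):
    # |(1+mu)^2 r^2 - (1-mu)^2 R^2| agrees with b(mu) on (0, mu_max):
    print(abs(abs((1 + mu) ** 2 * r2 - (1 - mu) ** 2 * R2) - b(mu)))
print(abs(b(mu_max)))                    # ~ 0: b vanishes at (2 - sqrt(3))^2
```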

The other key ingredient is an improved version of the convexity inequality (50) for sufficiently large masses.

Lemma 7.2

(Improved convexity inequality). There exists a constant \(\xi >0\) such that

$$\begin{aligned} \frac{c_6}{1-{\overline{\alpha }}}m^{{\overline{\alpha }}} + c_n m^2 -c_6\left( \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}\right) m-\kappa (n-6) \ge \xi (m-1)^2 \end{aligned}$$

for every integer \(n\ge 3\) and every \( m\ge {\overline{m}}\), where \({\overline{m}}>0\) is given in Lemma 4.15.

Proof

The constant \(\xi >0\) will be chosen as the minimum of several quantities that we are now going to introduce. Recall that

$$\begin{aligned} h_{{\overline{\alpha }}}(m,n) = \frac{c_6}{1-{\overline{\alpha }}}m^{{\overline{\alpha }}} + c_n m^2 -c_6\left( \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}\right) m-\kappa (n-6). \end{aligned}$$

Recall also that \(n\mapsto c_n\) is decreasing with \(\lim _{n\rightarrow \infty }c_n=c_\infty >0\) (see Lemma 2.5) and \(\kappa < 0\). Therefore, for all \(n \ge 3\),

$$\begin{aligned} h_{{\overline{\alpha }}}(m,n)&\ge \frac{c_6}{1-{\overline{\alpha }}}m^{{\overline{\alpha }}} + c_\infty m^2 -c_6\left( \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}\right) m + 3 \kappa \\&= \frac{c_\infty }{2} (m-1)^2 \!+\! \left[ \frac{c_\infty }{2} m^2 \!+\! \left( c_\infty \!-\! c_6\left( \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}\right) \right) m \!+\! \frac{c_6}{1-{\overline{\alpha }}}m^{{\overline{\alpha }}} \!+\! 3 \kappa \!-\! \frac{c_\infty }{2} \right] . \end{aligned}$$

The expression in the square brackets is positive for m sufficiently large. Therefore there exists a constant \(M>0\) (independent of n) such that

$$\begin{aligned} h_{{\overline{\alpha }}}(m,n) \ge \frac{c_\infty }{2} (m-1)^2 \end{aligned}$$
(98)

for all \(n \ge 3\) and all \(m \ge M\). Without loss of generality we can take \(M>2\).

Next we treat the case \(m \in [{\overline{m}},M]\), \(n \ne 6\). Recall from the proof of Lemma 4.11, Step 1, that the map \(m\mapsto h_{{\overline{\alpha }}}(m,n)\) has exactly two critical points. The smallest critical point is a local maximum point and the largest critical point is a local minimum point, denoted by \({\widetilde{m}}({\overline{\alpha }},n)\). Note that \({\widetilde{m}}({\overline{\alpha }},n) \le 3/2 < M\) by Steps 3 and 4 of the proof of Lemma 4.11. Therefore the global minimum of \(m \mapsto h_{{\overline{\alpha }}}(m,n)\) in the interval \([{\overline{m}},M]\) occurs at either \({\widetilde{m}}({\overline{\alpha }},n)\) or \({\overline{m}}\). Define

$$\begin{aligned} p_1 := \min _{\begin{array}{c} n\ge 3\\ n\ne 6 \end{array}} \, h_{{\overline{\alpha }}}({\widetilde{m}}({\overline{\alpha }},n),n), \qquad \qquad p_2 := \min _{\begin{array}{c} n\ge 3\\ n\ne 6 \end{array}} \, h_{{\overline{\alpha }}}({\overline{m}},n). \end{aligned}$$

Observe that \(p_1>0\) by Steps 3 and 4 of the proof of Lemma 4.11, and \(p_2>0\) by Corollary 4.12 and the proof of Corollary 4.16. Putting everything together gives the following for all \(m \in [{\overline{m}},M]\), \(n \ge 3\), \(n \ne 6\):

$$\begin{aligned} h_{{\overline{\alpha }}}(m,n) \ge \min \left\{ p_1, p_2 \right\} \ge \frac{\min \left\{ p_1, p_2 \right\} }{(M-1)^2} (m-1)^2 \end{aligned}$$
(99)

since \(m \in [{\overline{m}},M]\) and \((M-1)^2> 1 > ({\overline{m}}-1)^2\).

Next we treat the case \(n = 6\), \(m \in [{\overline{m}},M]\). Since \(m\mapsto h_{{\overline{\alpha }}}(m,6)\) is of class \(C^2\), \(h_{{\overline{\alpha }}}(1,6)=\partial _m h_{{\overline{\alpha }}}(1,6)=0\), and \(\partial ^2_{mm}h_{{\overline{\alpha }}}(1,6)>0\), there exist \(r \in (0,1/2)\) and \(l>0\) such that

$$\begin{aligned} h_{{\overline{\alpha }}}(m,6)\ge l(m-1)^2 \end{aligned}$$
(100)

for all \(m\in [1-r,1+r]\).

Finally, we treat the case \(n = 6\), \(m \in [{\overline{m}},1-r] \cup [1+r,M]\). The function \(m \mapsto h_{{\overline{\alpha }}}(m,6)\) has no local minima in this interval (by Lemma 4.11, Step 1). Therefore

$$\begin{aligned} h_{{\overline{\alpha }}}(m,6)\ge \min \left\{ h_{{\overline{\alpha }}}({\overline{m}},6), h_{{\overline{\alpha }}}(1-r,6), h_{{\overline{\alpha }}}(1+r,6), h_{{\overline{\alpha }}}(M,6) \right\} =:p > 0. \end{aligned}$$

Hence, for all \(m \in [{\overline{m}},1-r] \cup [1+r,M]\),

$$\begin{aligned} h_{{\overline{\alpha }}}(m,6)\ge \frac{p}{(M-1)^2} (m-1)^2. \end{aligned}$$
(101)

Define

$$\begin{aligned} \xi&:=\min \left\{ \frac{c_\infty }{2}, \frac{\min \left\{ p_1, p_2 \right\} }{(M-1)^2} , l, \frac{p}{(M-1)^2} \right\} . \end{aligned}$$

From (98), (99), (100), (101) we conclude that

$$\begin{aligned} h_{{\overline{\alpha }}}(m,n) \ge \xi (m-1)^2 \end{aligned}$$

for all \(m\ge {\overline{m}}\) and all \(n\ge 3\), as desired. \(\square \)

We are now in a position to prove Theorem 1.4. The idea is essentially to bound the defect \({\hat{\varepsilon }}\) from Theorem 7.1 by the defect \(\mathrm {d}\) from Theorem 1.4.

Proof of Theorem 1.4

Step 1. In this step we rescale the energy. Let \(\mu _\delta =\sum _{i=1}^{N_\delta } {\widetilde{m}}_i\delta _{{\widetilde{z}}_i}\in {\mathcal {P}}_{\mathrm {d}}(\Omega )\) satisfy the hypotheses of Theorem 1.4. Define

$$\begin{aligned} m_i := V_{\delta ,{\overline{\alpha }}}{\widetilde{m}}_i, \qquad z_i := V_{\delta ,{\overline{\alpha }}}^{1/2} {\widetilde{z}}_i, \qquad \Omega _{\delta ,{\overline{\alpha }}} := V_{\delta ,{\overline{\alpha }}}^{1/2} \Omega , \qquad \mu := \sum _{i=1}^{N_\delta } m_i \delta _{z_i}. \end{aligned}$$

In analogy with (46), define

$$\begin{aligned} {\mathcal {F}}_{\delta ,{\overline{\alpha }}}(\mu ):= \frac{c_6}{1-{\overline{\alpha }}} \sum _{i=1}^{N_\delta } m_i^{{\overline{\alpha }}} + W_2^2({\mathbbm {1}}_{\Omega _{\delta ,{\overline{\alpha }}}},\mu ). \end{aligned}$$

As in Remark 4.4, it is easy to check that the defect \(\mathrm {d}(\mu _\delta )\) can be rewritten as

$$\begin{aligned} \mathrm {d}(\mu _\delta ) = V_{\delta ,{\overline{\alpha }}}^{-1} {\mathcal {F}}_{\delta ,{\overline{\alpha }}}(\mu ) - \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}} \, c_6. \end{aligned}$$
(102)

Step 2. In this step we estimate the number of small particles, \(\# \{i \in \{1,\ldots ,N_\delta \} : m_i < {\overline{m}} \}\). Although Lemma 4.15 was proved for the case where \(\Omega \) is a square, the same argument applies when \(\Omega \) is any convex polygon with at most six sides (indeed, any Lipschitz domain). The only changes are to the constant \(D_0\) and to the lower bound \(m_{\mathrm {b}}\) on the mass of particles close to the boundary (to be precise, those particles within distance 4 of the boundary of the rescaled domain). The lower bound \({\overline{m}}\) on the mass of particles far from the boundary remains the same; this is the only constant whose specific value matters for us. Therefore we denote the other two constants by \(D_0\) and \(m_{\mathrm {b}}\) also in this case. Define

$$\begin{aligned} {\mathcal {K}} := \{ i \in \{1,\ldots ,N_\delta \} : m_i < {\overline{m}} \}, \qquad {\mathcal {K}}^c := \{ i \in \{1,\ldots ,N_\delta \} : m_i\ge {\overline{m}} \}. \end{aligned}$$

By Lemma 4.15(i), \({\mathcal {K}} \subseteq {\mathcal {J}}\), where \({\mathcal {J}}\) was defined in equation (76). Define

$$\begin{aligned} U = \left\{ x + y : x \in \partial \Omega _{\delta ,{\overline{\alpha }}}, \; y \in B_{4+D_0}(0) \right\} \cap \Omega _{\delta ,{\overline{\alpha }}}. \end{aligned}$$

Similarly to the proof of (78),

$$\begin{aligned} \# {\mathcal {K}} \le \# {\mathcal {J}} < \frac{|U|}{m_{\mathrm {b}}}. \end{aligned}$$

Fix any \(\eta > 0\). If \(\delta \) is sufficiently small, then

$$\begin{aligned} |U| \le V_{\delta ,{\overline{\alpha }}}^{1/2} (4+D_0) ({\mathcal {H}}^1(\partial \Omega ) + \eta ) \end{aligned}$$

since \(\Omega \) is a convex polygon and the Minkowski content of \(\partial \Omega \) equals \({\mathcal {H}}^1(\partial \Omega )\) [2, Theorem 2.106]. Therefore

$$\begin{aligned} \frac{\# {\mathcal {K}}}{V_{\delta ,{\overline{\alpha }}}} \le V_{\delta ,{\overline{\alpha }}}^{-1/2} \frac{(4+D_0) ({\mathcal {H}}^1(\partial \Omega ) + \eta )}{m_{\mathrm {b}}}. \end{aligned}$$
(103)

Step 3. Let \(\xi >0\) be the constant given by Lemma 7.2. Define

$$\begin{aligned} {\widetilde{\beta }}_2 := \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}} c_6 {\overline{m}} - 3\kappa + \xi > 0. \end{aligned}$$

In this step we prove the following lower bound on the defect:

$$\begin{aligned} \mathrm {d}(\mu _\delta )\ge \frac{\xi }{V_{\delta ,{\overline{\alpha }}}} \sum _{i=1}^{N_\delta } (m_i-1)^2 -{\widetilde{\beta }}_2 \frac{\#{\mathcal {K}}}{V_{\delta ,{\overline{\alpha }}}}. \end{aligned}$$
(104)

We estimate

$$\begin{aligned} {\mathcal {F}}_{\delta ,{\overline{\alpha }}}(\mu )&\ge \sum _{i=1}^{N_\delta } g_{{\overline{\alpha }}}(m_i,n_i)&(\text {Lemma }2.3) \\&\ge \sum _{i\in {\mathcal {K}}^c} g_{{\overline{\alpha }}}(m_i,n_i)&(g_{{\overline{\alpha }}} \ge 0)\\&\ge \sum _{i\in {\mathcal {K}}^c} \left[ c_6\left( \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}} \right) m_i +\kappa (n_i-6) +\xi (m_i-1)^2 \right]&(\text {Lemma }~7.2)\\&= \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}c_6 \sum _{i=1}^{N_\delta } m_i + \kappa \sum _{i=1}^{N_\delta }(n_i-6) +\xi \sum _{i=1}^{N_\delta } (m_i-1)^2 \\&\quad -\frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}c_6 \sum _{i\in {\mathcal {K}}} m_i - \kappa \sum _{i\in {\mathcal {K}}}(n_i-6) -\xi \sum _{i\in {\mathcal {K}}} (m_i-1)^2 \\&\ge \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}c_6 V_{\delta ,{\overline{\alpha }}} +\xi \sum _{i=1}^{N_\delta } (m_i-1)^2 \\&\quad -\frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}c_6 \sum _{i\in {\mathcal {K}}} m_i - \kappa \sum _{i\in {\mathcal {K}}}(n_i-6) -\xi \sum _{i\in {\mathcal {K}}} (m_i-1)^2 , \end{aligned}$$

where in the last step we used the fact that \(\kappa <0\) together with Lemma 2.6. Combining this estimate and (102) gives

$$\begin{aligned} \frac{\xi }{V_{\delta ,{\overline{\alpha }}}} \sum _{i=1}^{N_\delta } (m_i-1)^2&\le \mathrm {d}(\mu _\delta ) +\frac{1}{V_{\delta ,{\overline{\alpha }}}} \left[ \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}c_6 \sum _{i\in {\mathcal {K}}} m_i + \kappa \sum _{i\in {\mathcal {K}}}(n_i-6) +\xi \sum _{i\in {\mathcal {K}}} (m_i-1)^2 \right] \\&\le \mathrm {d}(\mu _\delta ) + \frac{\#{\mathcal {K}}}{V_{\delta ,{\overline{\alpha }}}} \left[ \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}c_6 {\overline{m}} - 3\kappa + \xi \right] , \end{aligned}$$

where in the last step we used the fact that \(m_i \in (0,{\overline{m}})\) for each \(i\in {\mathcal {K}}\). This proves (104).

Step 4. In this step we prove Theorem 1.4(a). We have

$$\begin{aligned} \left| \frac{V_{\delta ,{\overline{\alpha }}}}{N_\delta } + \frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}} - 2 \right|&=\frac{V_{\delta ,{\overline{\alpha }}}}{N_\delta }\left( \frac{V_{\delta ,{\overline{\alpha }}} - N_\delta }{V_{\delta ,{\overline{\alpha }}}} \right) ^2 =\frac{V_{\delta ,{\overline{\alpha }}}}{N_\delta }\left( \frac{1}{V_{\delta ,{\overline{\alpha }}}} \sum _{i=1}^{N_\delta } (m_i-1)\right) ^2 \nonumber \\&\le \frac{1}{V_{\delta ,{\overline{\alpha }}}}\sum _{i=1}^{N_\delta } (m_i-1)^2 \nonumber \\&\le \frac{1}{\xi } \left( \mathrm {d}(\mu _\delta ) + {\widetilde{\beta }}_2 \frac{\#{\mathcal {K}}}{V_{\delta ,{\overline{\alpha }}}} \right) , \end{aligned}$$
(105)

where in the penultimate step we used Jensen’s inequality and in the last step we used (104). Therefore \(\lim _{\delta \rightarrow 0} V_{\delta ,{\overline{\alpha }}} / N_\delta = 1\) by (103) and since \(\lim _{\delta \rightarrow 0} \mathrm {d}(\mu _\delta )=0\) by Theorem 1.1. This proves Theorem 1.4(a). For later use, we record that if \(\delta \) is sufficiently small, then we can read off from (105) that

$$\begin{aligned} \frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}}\le 3. \end{aligned}$$
(106)

Step 5. In this step we prove the technical estimate

$$\begin{aligned} \sum _{i=1}^{N_\delta } m_i^{{\overline{\alpha }}} \ge N_\delta (1-{\overline{\alpha }}) + {\overline{\alpha }} V_{\delta ,{\overline{\alpha }}} - (1-{\overline{\alpha }})\frac{V_{\delta ,{\overline{\alpha }}}}{\xi }\mathrm {d}(\mu _\delta ) - (1-{\overline{\alpha }}) {\widetilde{\beta }}_2 \frac{\#{\mathcal {K}}}{\xi }. \end{aligned}$$
(107)

Define \(\phi (x)=x^{{\overline{\alpha }}}\), \(x>0\). Let q be the unique quadratic polynomial such that \(q(0)=\phi (0)\), \(q(1)=\phi (1)\), \(q'(1)=\phi '(1)\):

$$\begin{aligned} q(x):= 1+ {\overline{\alpha }}(x-1) + ({\overline{\alpha }}-1)(x-1)^2. \end{aligned}$$

Let \(\psi = \phi - q\). It is easy to check that \(\psi '''(x)>0\) for all \(x>0\), and hence \(\psi '\) is convex, with \(\psi '(1)=0\), \(\psi ''(1) = {\overline{\alpha }}^2 - 3 {\overline{\alpha }} + 2 = ({\overline{\alpha }}-1)({\overline{\alpha }}-2) > 0\) since \({\overline{\alpha }} \in (0,1)\), \(\lim _{x \rightarrow 0+} \psi '(x)=+\infty \), \(\lim _{x \rightarrow + \infty } \psi '(x)=+\infty \). Therefore \(\psi '\) vanishes at exactly two points \(x>0\), one of which is \(x=1\) and the other is \(x^* \in (0,1)\). Moreover, \(\psi (0)=\psi (1)=0\), \(\lim _{x \rightarrow \infty }\psi (x)=+\infty \), and \(\psi '(x)<0\) for \(x \in (x^*,1)\). Therefore \(\psi (x)\ge 0 \) for all \(x \ge 0\). Consequently

$$\begin{aligned} \sum _{i=1}^{N_\delta } m_i^{{\overline{\alpha }}} \ge \sum _{i=1}^{N_\delta } q(m_i) = N_\delta + {\overline{\alpha }}(V_{\delta ,{\overline{\alpha }}} - N_\delta ) - (1-{\overline{\alpha }})\sum _{i=1}^{N_\delta } (m_i-1)^2. \end{aligned}$$

The desired inequality (107) follows by (104).
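The interpolation conditions on \(q\) and the sign of \(\psi \) established above can also be sanity-checked numerically, outside the proof; the sketch below uses the hypothetical sample value \({\overline{\alpha }} = 1/2\).

```python
# Sanity check (not part of the proof): psi = phi - q is nonnegative, where
# phi(x) = x^a and q(x) = 1 + a*(x-1) + (a-1)*(x-1)^2, for a sample exponent.
a = 0.5                                   # hypothetical choice of alpha-bar

def q(x):
    return 1 + a * (x - 1) + (a - 1) * (x - 1) ** 2

def psi(x):
    return x ** a - q(x)

# Interpolation conditions: q(0) = phi(0) = 0, q(1) = phi(1) = 1, q'(1) = a.
assert abs(q(0.0)) < 1e-12
assert abs(q(1.0) - 1.0) < 1e-12
h = 1e-6
assert abs((q(1 + h) - q(1 - h)) / (2 * h) - a) < 1e-6   # central difference

# psi >= 0 on a grid covering (0, 10].
assert all(psi(0.001 * k) >= -1e-12 for k in range(1, 10001))
```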

Step 6. Finally, we bound the defect \({\hat{\varepsilon }}(\{ z_i \}_{i=1}^{N_\delta })\) from Theorem 7.1. Let T be the optimal transport map defining \(W_2({\mathbbm {1}}_{\Omega _{\delta ,{\overline{\alpha }}}},\mu )\). For \(\delta >0\) sufficiently small we have

$$\begin{aligned} {\hat{\varepsilon }}(\{ z_i \}_{i=1}^{N_\delta })&=\frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}^2} \sum _{i=1}^{N_\delta } \int _{V_i} |z-z_i|^2 \, \mathrm {d} z - c_6 = \frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}^2} \int _{\Omega _{\delta ,{\overline{\alpha }}}} \min _i |z-z_i|^2 \, \mathrm {d} z - c_6 \nonumber \\&\le \frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}^2} \sum _{i=1}^{N_\delta } \int _{T^{-1}(z_i)} |z-z_i|^2 \, \mathrm {d} z - c_6 = \frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}^2} W_2^2({\mathbbm {1}}_{\Omega _{\delta ,{\overline{\alpha }}}},\mu ) - c_6 \nonumber \\&= \frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}} \mathrm {d}(\mu _\delta ) - \frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}^2} \frac{c_6}{1-{\overline{\alpha }}}\sum _{i=1}^{N_\delta } m_i^{{\overline{\alpha }}} +\frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}}\frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}}c_6 - c_6 \nonumber \\&{\mathop {\le }\limits ^{(107)}} \frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}}\left( 1 + \frac{c_6}{\xi } \right) \mathrm {d}(\mu _\delta ) + \frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}^2} \frac{c_6 {\widetilde{\beta }}_2}{\xi } \#{\mathcal {K}} - c_6 \underbrace{\left[ \frac{N_\delta ^2}{V_{\delta ,{\overline{\alpha }}}^2} + \frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}} \frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} - \frac{N_\delta }{V_{\delta ,{\overline{\alpha }}}} \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}} + 1\right] }_{= \left( \frac{V_{\delta ,{\overline{\alpha }}}-N_\delta }{V_{\delta ,{\overline{\alpha }}}} \right) ^2}\nonumber \\&{\mathop {\le }\limits ^{(106)}} 3\left( 1 + \frac{c_6}{\xi } \right) \mathrm {d}(\mu _\delta ) + 3\frac{c_6 {\widetilde{\beta }}_2}{\xi } \frac{\#{\mathcal {K}}}{V_{\delta ,{\overline{\alpha }}}} \nonumber \\&{\mathop {\le }\limits ^{(103)}} 3\left( 1 + \frac{c_6}{\xi } \right) \mathrm {d}(\mu _\delta ) + 3\frac{c_6 {\widetilde{\beta }}_2}{\xi } 
V_{\delta ,{\overline{\alpha }}}^{-1/2} \frac{(4+D_0) ({\mathcal {H}}^1(\partial \Omega ) + \eta )}{m_{\mathrm {b}}}. \end{aligned}$$
(108)
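The underbraced identity in (108) reduces, with \(r = N_\delta /V_{\delta ,{\overline{\alpha }}}\), to \(r^2 - 2r + 1 = (1-r)^2\), since \(\frac{{\overline{\alpha }}}{1-{\overline{\alpha }}} - \frac{2-{\overline{\alpha }}}{1-{\overline{\alpha }}} = -2\). A quick numerical check, outside the proof and with hypothetical sample values:

```python
# Sanity check of the underbraced identity in (108): for r = N/V and any
# a = alpha-bar in (0,1),
#   r^2 + r*a/(1-a) - r*(2-a)/(1-a) + 1 = (1 - r)^2,
# because a/(1-a) - (2-a)/(1-a) = -2. Sample values below are hypothetical.
for a in (0.25, 0.5, 0.9):
    for r in (0.5, 0.97, 1.0, 1.4):
        lhs = r ** 2 + r * a / (1 - a) - r * (2 - a) / (1 - a) + 1
        assert abs(lhs - (1 - r) ** 2) < 1e-9
```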

Let \(\varepsilon _0>0\) be the constant given by Theorem 7.1. Define

$$\begin{aligned} \beta _1 = 3\left( 1 + \frac{c_6}{\xi } \right) , \qquad \beta _2 = 3\frac{c_6 {\widetilde{\beta }}_2}{\xi } \frac{(4+D_0) ({\mathcal {H}}^1(\partial \Omega ) + \eta )}{m_{\mathrm {b}}}. \end{aligned}$$

By (108), if \(\delta >0\) is sufficiently small, and if \(\varepsilon \in (0,\varepsilon _0)\) and \(\mu _\delta \) satisfy

$$\begin{aligned} \beta _1 \mathrm {d}(\mu _\delta ) + \beta _2 V_{\delta ,{\overline{\alpha }}}^{-1/2} \le \varepsilon , \end{aligned}$$

then

$$\begin{aligned} {\hat{\varepsilon }}(\{ z_i \}_{i=1}^{N_\delta }) \le \varepsilon . \end{aligned}$$

Applying Theorem 7.1 completes the proof. \(\square \)