Greedy energy minimization can count in binary: point charges and the van der Corput sequence

This paper establishes a connection between a problem in Potential Theory and Mathematical Physics, arranging points so as to minimize an energy functional, and a problem in Combinatorics and Number Theory, constructing ’well-distributed’ sequences of points on [0, 1). Let f:[0,1]→R\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f:[0,1] \rightarrow {\mathbb {R}}$$\end{document} be (1) symmetric f(x)=f(1-x)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f(x) = f(1-x)$$\end{document}, (2) twice differentiable on (0, 1), and (3) such that f′′(x)>0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f''(x)>0$$\end{document} for all x∈(0,1)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x \in (0,1)$$\end{document}. We study the greedy dynamical system, where, given an initial set {x0,…,xN-1}⊂[0,1)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{x_0, \ldots , x_{N-1}\} \subset [0,1)$$\end{document}, the point xN\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_N$$\end{document} is obtained as xN=argminx∈[0,1)∑k=0N-1f(|x-xk|).\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} x_{N} = \arg \min _{x \in [0,1)} \sum _{k=0}^{N-1}{f(|x-x_k|)}. \end{aligned}$$\end{document}We prove that if we start this construction with the single element x0=0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_0=0$$\end{document}, then all arising constructions are permutations of the van der Corput sequence (counting in binary and reflected about the comma): greedy energy minimization recovers the way we count in binary. This gives a new construction of the classical van der Corput sequence. The special case f(x)=1-log(2sin(πx))\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f(x) = 1-\log (2 \sin (\pi x))$$\end{document} answers a question of Steinerberger. Interestingly, the point sets we derive are also known in a different context as Leja sequences on the unit disk.


A problem in mathematical physics
A classical question in Mathematical Physics, sometimes known as Thomson's problem [28], is the following: suppose you have N electrons on 2 interacting via a Coulomb potential, what are the stable equilibria? We can associate to any set of N points a notion of energy 1 3 and the main questions are then: (1) what is the minimal energy? (2) what are configurations of points attaining minimal energy? and (3) what do these configurations look like? These old questions are far from answered: the minimal energy is known to have an asymptotic expansion, the first few terms are known; see, e.g. [3-5, 13, 14]. Relatively little is known about minimal energy configurations, they seem to arrange themselves in a hexagonal pattern which is one instance of the crystallization conjecture [2,22,27]. The problem is notoriously difficult: the special case of N = 5 points was only very recently solved by Schwartz [24]. For some special values of N, there are constructions that are known; we refer to the seminal work of Cohn and Kumar [9]. Whenever there are topological obstructions to triangular lattice, the defects seem to localize in scars [8]. Needless to say, many of these questions remain interesting and many of the results carry over to the case of a more general domain instead of 2 and a more general energy functional where f ∶ ℝ → ℝ . The choice f (x) = x −s , Riesz potentials, is among the most popular. The only case where the problem is known to have been widely solved is the case of the one-dimensional torus: for a rather large class of functions, the optimal arrangement is known to be equispaced points [6].
Given the difficulty of these questions, one would assume that it is much more difficult to study the dynamical problem, where one starts with a given set of points {x 0 , … , x N−1 } ⊂ [0, 1) (or possibly just a single point) and then defines the next element in the sequence in a greedy fashion via i.e. adding the point in the location of the minimum of the energy (with the caveat that should the minimum not be unique, then any choice is admissible). One of the main contributions of our paper is that for a wide class of functions f, this dynamical system can indeed be studied for one-periodic functions, i.e. on the one-dimensional torus identified with the unit interval. Moreover, it gives rise to a surprisingly rigid dynamical structure.

A problem in combinatorics and number theory
Suppose we want to distribute a sequence of points X = (x n ) ∞ n=1 evenly over [0, 1) in the most regular fashion-how would one do it? If we knew in advance that we want to place N points, then we would presumably place them at equidistant intervals. However, what if we wanted the sequence to be distributed regularly at all times (i.e. also at all intermediate stages) ? We now make this notion precise and define the extreme discrepancy of the first N points of X as It is easy to see that D N ≥ 1∕N . How small can it be? In particular, is there a sequence X such that D N (X) ≤ cN −1 for all N ∈ ℕ ? This question, originally due to van der Corput, was answered by van Aardenne-Ehrenfest who showed that no such result exists; see [15] for details. The problem was finally solved by Schmidt [23] who proved that D N ≥ cN −1 log N for infinitely many N ∈ ℕ ; see [12,15,20,23] and references therein as well as [16,17] for the currently best constants. Sequences with D N (X) ≤ cN −1 log N for a constant c > 0 and all N are called low discrepancy sequences. In particular, this bound matches classical constructions of sequences, one of the most famous of which is the van der Corput sequence [29,30]: The original definition of the van der Corput sequence is based on writing integers in binary expansion: the n− th element is given by (1) writing the integer in binary, i.e. 22 = 10110 2 (2) inverting the order of the digits 10110 → 01101 (3) writing a comma in front of it and interpreting it as a real number in [0, 1] The van der Corput sequence and its various generalisations are known to be very close to optimal with regards to discrepancy; see [11,12,20].

A possible connection
There are very few known constructions of low discrepancy sequences, most of them are based on structures from Number Theory or Combinatorics. Low discrepancy sequences are important in numerical integration since they ensure smallest possible approximation errors; see [10]. While the theory is very well understood in one dimension, there are to-date two different conjectures of the sharp lower bound on the discrepancy of arbitrary sequences in more dimensions [10]. In particular, it is not clear whether there exist sequences more regular than anything we can construct so far.
Motivated by this, Steinerberger [25,26] recently proposed to study whether regular sequences could be constructed via dynamical systems of the type outlined in (1). More precisely, suppose we are given {x 0 , … , x N−1 } ⊂ [0, 1) , then he proposed to construct x N in a greedy manner as and if the minimum is not unique, any choice is admissible. Steinerberger proves that, independently of the initial conditions, such sequences satisfy D N ≤ cN −1∕2 log N . Moreover, dropping the technical gap condition in (2), he conjectures: (2) x N = arg min We give affirmative answers to these conjectures in Theorem 2.1 in the special case when we are given only one initial point x 0 . It was conjectured that if we start with x 0 = 0 , then the van der Corput sequence is an admissible choice for the greedy selection rule; see Fig. 1. The main purpose of our paper is to prove that this is indeed the case and that it holds at a much larger level of generality for dynamical systems of this type; for a precise statement of this result see Theorem 2.1. Interestingly, we can derive a general discrepancy bound for a subset of the functions considered in Theorem 2.1 and arbitrary initial choices which is slightly worse than Steinerberger's bound for the special case of f (x) = 1 − log(2 sin( |x|)) ; see Sect. 2.6 as well as the extended version of this manuscript on arXiv [21].

Outline
We state our main result Theorem 2.1 in Sect. 2; in particular this section also contains in Sect. 2.3 the definition of the sequences used to characterise the output of the greedy algorithm of Steinerberger as well as an outline of our proof strategy. In Sect. 3, we review properties of the van der Corput sequence. We also recall important generalisations and the method of Faure to calculate the discrepancy of permuted van der Corput sequences. In Sect. 4, we study the discrepancy of the particular generalised van der Corput sequences used to characterise the output of the algorithm. As a main result, we show in Theorem 4.5 resp. in Corollary 1 that all such sequences have the same D N ≤ cN −1 log N; Fig. 1 The set M 10 of the first 10 points (black) of the classical van der Corput sequence and the function ∑ 9 k=0 1 − log(2 sin( �x − x k �)) . The red points are the candidates for the next point as suggested by the greedy algorithm; note that the x 10 = .0101 2 = 5∕16 (colour figure online) discrepancy. In Sect. 5, we relate our results to the greedy algorithm. We show in Theorem 5.5 that the classical van der Corput sequence is an admissible output of the algorithm and we finally prove Theorem 2.1.

Preliminaries
Let b be the set of all permutations of {0, 1, … , b − 1} , i.e. all permutations of the first b non-negative integers. Throughout this paper, we sometimes write a concrete permutation as tuple, e.g. = (0, 2, 4, 1, 3) meaning that 0 is mapped to 0, 1 is mapped to 2, 2 is mapped to 4 and so on; i.e. the number at the i-th position of the tuple is the image of i under . Moreover, for tuples of numbers 1 , 2 of length b 1 , b 2 we write ( 1 , 2 ) for the concatenation of the two tuples; i.e. ( 1 , 2 ) is a tuple of numbers of length b 1 + b 2 . Finally, for permutations (or tuples) of fixed length b, we are sometimes only interested in its initial segment of the first k numbers. In this case we may write = ( k , * ) meaning that the first k elements of the permutation are determined by k and the rest of the permutation can be any arrangement of the remaining b − k numbers.

Permuted van der Corput sequences
Let a j (n) denote the jth coefficient in the binary representation of an integer n, in which a j (n) ∈ {0, 1} and if 0 ≤ n < 2 m , then a j (n) = 0 for all j ≥ m . The binary radical inverse function is defined as Then the classical van der Corput sequence is defined as S 2 = (S 2 (n)) n≥0 .
Faure [11] generalised the definition of the classical van der Corput sequence in two ways. First, he replaced the binary representation of an integer by its general b-adic representation for a fixed integer base b ≥ 2 . This allows for the definition of the b-adic radical inverse function S b , which in turn can be used to define van der Corput sequences in general bases; i.e. S b = (S b (n)) n≥0 . Furthermore, for ∈ b Faure defines the generalised (or permuted) van der Corput sequence S b = (S b (n)) n≥0 for a fixed base b ≥ 0 via the permuted b-adic radical inverse function Every such sequence is uniformly distributed modulo 1; [11, Propriété 3.1.1].

Family of permutations
We inductively define a set of permutations P m ⊂ 2 m in each basis b m = 2 m . We start with b 1 = 2 and P 1 = {(0, 1)} and we obtain the set P m+1 from P m in the following way: We first multiply each permutation ∈ P m with 2 and denote the resulting tuple of numbers as 2 and the set of all such tuples as 2P m . Next, each 2 ∈ 2P m gives rise to 2 m new tuples: For each odd a with 1 ≤ a ≤ 2 m+1 we form a new tuple 2 ⊕ a by adding a to 2 (addition modulo 2 m+1 ). The set of all such tuples is denoted by 2P m ⊕ a . Finally, the set P m+1 is defined as the set of all permutations (2 , 2 � ⊕ a) for , � ∈ P m and odd a with As examples we construct P 2 and P 3 . First, we have that , (0, 2, 3, 1)} . Next, we have and from which we can build P 3 .

Main result and open questions
We can now state our main result.

Theorem 2.1 (Counting in Binary)
for N > 0 and where every global minimum is admissible if it happens not to be unique. Then for every fixed N > 0 there exist an m > 0 with N ≤ 2 m and a ∈ P m such that Remark 1 Note that our assumptions on f imply that f is a strictly convex function on [0, 1], but not every strictly convex function necessarily satisfies f �� (x) > 0 for all x ∈ (0, 1) . However, every uniformly convex function, i.e. f such that f �� (x) ≥ > 0 , satisfies our assumptions.

Remark 2
The point sets we obtain are closely related to Leja sequences on the unit disk.
Let K be a nonempty compact subset of the complex plane. A sequence (a n ) n≥0 of points in K is a Leja sequence for K if the following extremal metric property holds for N ≥ 1, Thus, the (N + 1) st term a N of a Leja sequence must maximise the product of distances to the N previous ones; see [1,18]. There are very few classes of compact sets for which Leja sequences are explicitly known. However, if K is the unit disk then the structure of the corresponding Leja sequences is known and described in [1, Theorem 5 and Corollary 2]. Importantly, the maximum principle implies that these points lie on the boundary of K; i.e. on the unit circle if K is the unit disk. Note that Leja sequences are defined via an explicit functional which is studied on different compact sets. In our setting, we fix the domain and vary the functionals. It is interesting that our minimising sequences have the same structure as the particular Leja sequences on the unit disk.

Remark 3 We believe Theorem 2.1 is a strong indicator that the dynamical version
of the static equilibrium problem in mathematical physics might give rise to interesting structures. In the one-dimensional case, it certainly connects in a very substantial way to structures in number theory. To emphasize this, we explicitly state that: Greedy energy minimization on the one-dimensional torus automatically recovers the way we count in binary.

Remark 4
The result has nontrivial implications for the study of uniform distribution. It starts by providing a novel definition of the van der Corput sequence, i.e. start the greedy algorithm (1) with {0} and always pick the smallest of the suggested minima. The second statement in the main theorem, i.e. discrepancy being preserved over all possible choices, shows that potential theoretic approaches along the lines of what was proposed by Steinerberger [25,26] might indeed have intimate ties to discrepancy. Theorem 2.1 tells us that we can use symmetric and uniformly convex functions to construct the van der Corput sequence via greedy energy minimisation.
We suspect that the classical van der Corput sequence and its permutations studied in this paper form in a way a unique (and natural) link between sequences constructed via greedy energy minimisation and sequences obtained from traditional methods in number theory.

Outline of proof
We prove Theorem 2.1 in 2 steps. In the first step (i.e. in Sect. 4), we calculate the discrepancy of permuted van der Corput sequences S 2 m , ∈ P m . We conclude in Corollary 1 that all such sequences have the same extreme discrepancy as the classical van der Corput sequence S 2 . Our argument uses the general machinery of Faure for the calculation of the discrepancy of permuted van der Corput sequences which we recall in Sect. 3 and is based on various symmetries exhibited by the permutations in P m .
In the second step (i.e. in Sect. 5), we relate our particular family of permuted van der Corput sequences to the greedy algorithm of Steinerberger. We show that every such sequence can be obtained from the algorithm (Lemma 5.8) and that every output of the algorithm can be described by such a sequence (Lemma 5.7). Thus, we obtain a full characterisation of the possible outputs of the greedy algorithm and by the results of the first step we also know the discrepancy of any such output.
This concludes the proof of Theorem 2.1.

General discrepancy bound
Finally, we mention a general discrepancy bound for sequences generated from the greedy algorithm and an arbitrary initial set of values. Let f ∶ [0, 1] → ℝ be as in Theorem 2.1, and in addition assume that f is bounded and wlog normalised to have mean value 0. We can derive a general discrepancy bound for functions f that are bounded and whose Fourier coefficients for a constant c > 0 and all k ≠ 0 . In particular, for functions f satisfying the above, define the sequence x N = arg min  (3) of an infinite sequence is usually quantified with one of several different notions of discrepancy. We put and obtain for the extreme discrepancy, D N (X) , of the first N points of X Note that Faure does not divide the discrepancy by N; i.e. his formulas for the discrepancy (e.g. in [11]) give ND N (X) in our notation.

Self-similarity of van der Corput sequences
The classical van der Corput sequence and its permuted generalisations are self-similar as shown in the following two lemmas. Let denote the set of the first N points of the van der Corput sequence and let for N 2 > N 1 denote a segment of the van der Corput sequence of length N 2 − N 1 .
We can repeat the calculation from the proof of Lemma 3.1 to obtain the following general version for

Discrepancy of permuted van der Corput sequences
In the following, we introduce Faure's system of basic functions which can be used to calculate the discrepancy of generalised van der Corput sequences. The theory of Faure was applied to the study of various point sets and sequences over the last 40 years and is very well explained and illustrated in the recent survey [12]. The exact formulas for the discrepancy of generalised van der Corput sequences are based on a set of elementary functions which are defined for any permutation ∈ b . Let The functions b,h are continuous on [0, 1[, piecewise affine and are extended to the real numbers by periodicity. Moreover, we define For an infinite, one-dimensional sequence X, we set In [11, Théorème 6] the asymptotic constants for van der Corput sequences that are generated from the identity permutation in base b are calculated: Consequently, the only sequence generated from an identity permutation that improves the result of the classical sequence in base 2 is S 3 ; for more information, see the recent survey [20].

Symmetries
The Symmetry Lemma as well as the Swapping Lemma facilitate the analysis of structurally similar permutations and state that a shift, reflection or reversal of a permutation does not change the discrepancy of the generated sequence. This is not surprising since we interpret [0, 1] as a circle.

being the swapping permutation, then we have that
This is shown in [11, Lemme 4.4.1].

Intrication
Faure defined an operation [11, Section 3.4.3] which takes two arbitrary permutations , in bases b and c and outputs a new permutation, ⋅ in base b ⋅ c . The motivation for this definition comes from the following property which was first noted in [11,Proposition 3.4.3].

Lemma 3.5 (Intrication)
For ∈ b and ∈ c define ⋅ ∈ bc as for 0 ≤ k ′ < b and 0 ≤ k ′′ < c . Then, such that Remark 5 If we set = , then the intrication ⋅ gives a permutation in base b 2 whose -function is the function F 2 defined in (5). In this special case, the new permutation generates the same sequence as the original permutation.
In particular, we see that repeated intrications of the permutation (0, 1) generates permutations in bases b m = 2 m of a particular form: In the notation of Lemma 3.5, let b = c = 2 and let = = (0, 1) . Then we get If we take now the resulting permutation (0, 2, 1, 3) in base b 2 = 4 and form the intrication with (0, 1) then we see that the new permutation in base 8 is obtained from two copies of the permutation in base 4; i.e. setting 2 ∶= ⋅ we can write the new permutation as Using Lemma 3.5 it is easy to see that this holds in general:
Hence, every permutation m is contained in the set P m which we defined in Sect. 2.3.

A family of permutations
In this section, we study the discrepancy of sequences generated from permutations in P m ⊂ 2 m . The main result of this section is the observation that all the functions 2 m are identical for ∈ P m . Thus, using Lemma 3.6, we see that all permutations in P m generate permuted van der Corput sequences with the same asymptotic discrepancy constant (in fact, with the same discrepancy) as the classical van der Corput sequence. We summarise the main observations of this section in Corollary 1.
We start with an important technical observation. To illustrate this definition, observe that the two sets {0, 1, 2, 4, 6} and {3, 5, 7} are 3-inverse. Being m-inverse implies an important equality. We formulate the next lemma in the specialised way in which we will use it later. To prepare for the proof of the main theorem of this section, we make one more technical observation, which we state again in the special form in which we will use it below.

Lemma 4.4 Let
∈ P m and let � ∶= • 2 m , in which 2 m is the swapping permutation. Let k denote the first k elements of 2 ⊕ a and � 2 m −k denote the first 2 m − k elements of 2 ′ ⊕ a for an odd a with 1 ≤ a ≤ 2 m+1 and addition modulo 2 m+1 . Then Proof Note that the swapping permutation simply reverses the elements of ; i.e.
for all 0 ≤ i ≤ 2 m − 1 . Hence, the last 2 m − k elements of are exactly the first 2 m − k elements of ′ . ◻  Proof We prove the theorem by induction on m. The theorem is trivially true for m = 1 . Now let m = 2 . It is easy to see that both permutations give rise to the same -function. To turn to the induction step, we assume that the assertion is true up to m. In particular this means that all permutations in the set P m generate permuted van der Corput sequences with identical -function.
By Since we picked an arbitrary permutation in P m+1 , we conclude that every permutation in P m+1 has the same -function.

3 5 Relation to algorithm and proof of Theorem 2.1
In the following, we relate the results of Sects. 3 and 4 to the greedy algorithm and prove Theorem 2.1.  The functional studied by Steinerberger [25,26] is an example of an admissible function; for x ∈ (0, 1) see Fig. 2 for an illustration.  Proof We write N = N 1 = ∑ k j=1 2 m j in its binary representation such that m k > m k−1 > … > m 1 is a decreasing sequence of nonzero integers. According to this representation, we split the set M N into disjoint subsets such that there is one subset M j containing 2 m j points for every m j ; in particular we writẽ  Finally, we observe that every sequence generated by a permutation in P m can be realised by the algorithm (when starting with {0} ) and that for a fixed 2 m−1 < N ≤ 2 m every output of the algorithm can be found as initial segment of a sequence S 2 m with ∈ P m .
Now we prove the lemma by induction on m ≥ 2 . We assume the assertion is true for all N ≤ 2 m ; i.e. for every admissible set {x 0 , x 1 , … , x N−1 } with N ≤ 2 m there exists a ∈ P m with x i = S 2 m (i) for all 0 ≤ i ≤ N − 1.
In particular, this means that P m contains a permutation encoding every possible sequence of decisions we take when distributing the first 2 m points using the greedy algorithm and starting with x 0 = 0 . For N = 2 m we always get exactly one admissible set; i.e.
To make the induction step from m to m + 1 , we first use Lemma 5.4 to see that the algorithm suggests 2 m minima as candidates for x 2 m , i.e. the (N + 1) st point, which are of the form i∕2 m + 1∕2 m+1 for 0 ≤ i ≤ 2 m − 1 . Picking any of these 2 m minima, the important observation is that the function H 2 m +1 = ∑ 2 m h=0 f h and H 2 m +1 − H 2 m =H 1 = f 2 m attain their minima at the same set of arguments. This can be seen as in the proof of Theorem 5.5: We look at the intersection of a nested sequence of sets (the set of arguments of minima for H 2 m contains the set of arguments of minima of H 1 ) and this intersection does not change, i.e. is still the set of arguments of minima of H 1 , if we discard the outermost set. We can repeat this argument for any of the following Ns with 2 m + 2 ≤ N ≤ 2 m+1 . In particular, we see that once we have picked the point for N = 2 m + 1 we encounter the same tree of decisions as for the first 2 m points, just with a different start value. Now it is easy to see that we have indeed a permutation (2 , 2 � ⊕ a) in P m+1 for every path through the decision tree of the algorithm. Looking at the construction of P m+1 we see that by our assumption there is a permutation, , in P m for every initial choice of first 2 m points. Then we can accommodate for any choice of (N + 1) st point via the addition of an odd a modulo 2 m+1 . And finally, once we picked a, i.e. the new starting value a∕2 m + 1∕2 m+1 for 0 ≤ a ≤ 2 m − 1 , we can realise our second (shifted by the constant a) run through the decision tree to determine the remaining 2 m − 1 points via an appropriate permutation ′ in P m .
Thus, if the assertion holds for m, then it also holds for m + 1 . Together with Corollary 1 this completes the proof of Theorem 2.1.