Abstract
We verify two related divide-and-conquer algorithms solving one of the fundamental problems in Computational Geometry, the Closest Pair of Points problem. Using the interactive theorem prover Isabelle/HOL, we prove functional correctness and the optimal running time of \(\mathcal {O}(n \log n)\) of the algorithms. We generate executable code which is empirically competitive with handwritten reference implementations.
1 Introduction
The Closest Pair of Points or Closest Pair problem is one of the fundamental problems in Computational Geometry: Given a set P of n points in \(\mathbb {R}^d\), find the closest pair of P, i.e. two points \(p_0 \in P\) and \(p_1 \in P\) (\(p_0 \ne p_1\)) such that the distance between \(p_0\) and \(p_1\) is less than or equal to the distance of any distinct pair of points of P.
Shamos and Hoey [25] are among the first to mention this problem and introduce an algorithm based on Voronoi diagrams for the planar case, improving the running time of the best known algorithms at the time from \(\mathcal {O}(n^2)\) to \(\mathcal {O}(n \log n)\). They also prove that this running time is optimal for a deterministic computation model. One year later, in 1976, Bentley and Shamos [2] publish an (also optimal) divide-and-conquer algorithm to solve the Closest Pair problem that can be non-trivially extended to work in arbitrary dimensions. Since then the problem has been the focus of extensive research and a multitude of optimal algorithms have been published. Smid [24] provides a comprehensive overview of the available algorithms, including randomized approaches which improve the running time even further to \(\mathcal {O}(n)\).
The main contribution of this paper is the first verification of two related functional implementations of the divide-and-conquer algorithm solving the Closest Pair problem for the two-dimensional Euclidean plane with the optimal running time of \(\mathcal {O}(n \log n)\). We use the interactive theorem prover Isabelle/HOL [17, 18] to prove functional correctness as well as the running time of the algorithms. In contrast to many publications and implementations we do not assume all points of P to have unique x-coordinates, which causes some tricky complications. Empirical testing shows that our verified algorithms are competitive with handwritten reference implementations. Our formalizations are available online [23] in the Archive of Formal Proofs.
This paper is structured as follows: Sect. 2 familiarizes the reader with the algorithm by presenting a high-level description that covers both implementations. Section 3 presents the first implementation and its functional correctness proof. Section 4 proves the running time of \(\mathcal {O}(n \log n)\) of the implementation of the previous section. Section 5 describes our second implementation and illustrates how the proofs of Sects. 3 and 4 need to be adjusted. We also give an overview of further implementation approaches. Section 6 describes final adjustments to obtain executable versions of the algorithms in target languages such as OCaml and SML and evaluates them against handwritten imperative and functional implementations. Section 7 concludes.
1.1 Related Verification Work
Computational geometry is a vast area but only a few algorithms and theorems seem to have been verified formally. We are aware of a number of verifications of convex hull algorithms [4, 14, 20] (and a similar algorithm for the intersection of zonotopes [12]) and algorithms for triangulation [3, 7]. Geometric models based on maps and hypermaps [6, 22] are frequently used.
Work on theorem proving in geometry (see [15] for an overview) is also related but considers fixed geometric constructions rather than algorithms.
1.2 Isabelle/HOL and Notation
The notation \(t :: \tau\) means that term t has type \(\tau\). Basic types include bool, nat, int and real; type variables are written 'a, 'b etc; the function space arrow is \(\Rightarrow\). Functions fst and snd return the first and second component of a pair.
We suppress numeric conversion functions, e.g. real :: nat \(\Rightarrow\) real, except where that would result in ambiguities for the reader.
Most type constructors are written postfix, e.g. 'a set and 'a list. Sets follow standard mathematical notation. Lists are constructed from the empty list [] via the infix cons-operator #. Functions hd and tl return head and tail, function set converts a list into a set.
2 Closest Pair Algorithm
In this section we provide a high-level overview of the Closest Pair algorithm and give the reader a first intuition without delving into implementation details, functional correctness or running time proofs.
Let P denote a set of \(n \ge 2\) points. If \(n \le 3\) we solve the problem naively using the brute force approach of examining all \(\binom{n}{2}\) possible closest pair combinations. Otherwise we apply the divide-and-conquer tactic.
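The brute-force base case can be sketched in a few lines of Python. This is an illustrative paraphrase, not the verified Isabelle code; the names `dist` and `closest_pair_bf` are ours.

```python
import math
from itertools import combinations

def dist(p, q):
    # Euclidean distance between two points given as (x, y) pairs
    return math.hypot(p[0] - q[0], p[1] - q[1])

def closest_pair_bf(ps):
    # examine all C(n, 2) candidate pairs; intended for n <= 3,
    # where this is constant work
    return min(combinations(ps, 2), key=lambda pq: dist(*pq))
```

For three points this inspects exactly three pairs, so the base case costs constant time.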
We divide P into two sets \(P_L\) and \(P_R\) along a vertical line l such that the sizes of \(P_L\) and \(P_R\) differ by at most 1 and the x-coordinate of all points \(p_L \in P_L\,(p_R \in P_R)\) is less than or equal to (greater than or equal to) the x-coordinate of l.
We then conquer the left and right subproblems by applying the algorithm recursively, obtaining \((p_{L0},\;p_{L1})\) and \((p_{R0},\;p_{R1})\), the respective closest pairs of \(P_L\) and \(P_R\). Let \(\delta _L\) and \(\delta _R\) denote the distance of the left and right closest pairs and let \(\delta = min \;\delta _L\;\delta _R\). At this point the closest pair of P is either \((p_{L0},\; p_{L1})\), \((p_{R0},\,p_{R1})\) or a pair of points \(p_0 \in P_L\) and \(p_1 \in P_R\) with a distance strictly less than \(\delta \). In the first two cases we have already found our closest pair.
Now we assume the third case and have reached the most interesting part of divide-and-conquer algorithms, the combine step. It is not hard to see that both points of the closest pair must be within a \(2\delta \) wide vertical strip centered around l. Let \( ps \) be a list of all points in P that are contained within this \(2\delta \) wide strip, sorted in ascending order by y-coordinate. We can find our closest pair by iterating over \( ps \) and computing for each point its closest neighbor. But in the worst case \( ps \) contains all points of P, and we might think our only option is to again check all \(\binom{n}{2}\) point combinations. This is not the case. Let p denote an arbitrary point of \( ps \), depicted as the square point in Fig. 1. If p is one of the points of the closest pair, then the distance to its closest neighbor is strictly less than \(\delta \) and we only have to check all points \(q \in set\ ps \) that are contained within the \(2\delta \) wide horizontal strip centered around the y-coordinate of p.
In Sect. 4 we prove that, for each \(p \in set\ ps \), it suffices to check only a constant number of closest point candidates. This fact allows for an implementation of the combine step that runs in linear time and ultimately lets us achieve the familiar recurrence of \(T(n) = T(\lceil n/2 \rceil ) + T(\lfloor n/2 \rfloor ) + \mathcal {O}(n)\), which results in the running time of \(\mathcal {O}(n \log n)\).
We glossed over some implementation details to achieve this time complexity. In Sect. 3 we refine the algorithm and in Sect. 4 we prove the \(\mathcal {O}(n \log n)\) running time.
3 Implementation and Functional Correctness Proof
We present the implementation of the divide-and-conquer algorithm and the corresponding correctness proofs using a bottom-up approach, starting with the combine step. The basis for both implementation and proof is the version presented by Cormen et al. [5]. But first we need to define the closest pair problem formally.
A point in the two-dimensional Euclidean plane is represented as a pair of (unbounded) integers (Footnote 1). The library HOL-Analysis provides a generic distance function dist applicable to our point definition. The definition of this specific dist instance corresponds to the familiar Euclidean distance measure.
The closest pair problem can be stated formally as follows: A set of points P is \(\mathbf {\delta }\)-sparse iff \(\delta \) is a lower bound for the distance of all distinct pairs of points of P.

We can now state easily when two points \(p_0\) and \(p_1\) are a closest pair of P: \(p_0 \in P\), \(p_1 \in P\), \(p_0 \ne p_1\) and (most importantly) P is (dist \(p_0\) \(p_1\))-sparse.
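Stated operationally, the specification reads as follows. This Python paraphrase of the HOL definitions is ours (the predicate names `sparse` and `is_closest_pair` are chosen for illustration):

```python
import math
from itertools import combinations

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def sparse(delta, P):
    # delta is a lower bound for the distance of all distinct pairs of P
    return all(delta <= dist(p0, p1) for p0, p1 in combinations(P, 2))

def is_closest_pair(p0, p1, P):
    # membership, distinctness and (most importantly) sparsity
    return p0 in P and p1 in P and p0 != p1 and sparse(dist(p0, p1), P)
```

Note that sparsity quantifies over all distinct pairs, so a closest pair witnesses the largest \(\delta\) for which the whole set is \(\delta\)-sparse.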
In the following we focus on outlining the proof of the sparsity property of our implementation, without going into the details. The additional set membership and distinctness properties of a closest pair can be proved relatively straightforwardly by adhering to a similar proof structure.
3.1 The Combine Step
The essence of the combine step deals with the following scenario: We are given an initial pair of points with a distance of \(\delta \) and a list \( ps \) of points, sorted in ascending order by y-coordinate, that are contained in the \(2\delta \) wide vertical strip centered around l (see Fig. 1). Our task is to efficiently compute a pair of points with a distance \(\delta ' \le \delta \) such that \( ps \) is \(\delta '\)-sparse. The recursive function find_closest_pair achieves this by iterating over \( ps \), computing for each point its closest neighbor by calling the recursive function find_closest that considers only the points within the shaded square of Fig. 1, and updating the current pair of closest points if the newly found pair is closer together. We omit the implementation of the trivial base cases.


There are several noteworthy aspects of this implementation. The recursive search for the closest neighbor of a given point p of \( ps \) starts at the first point spatially above p, continues upwards and is stopped early at the first point whose vertical distance to p is equal to or exceeds the given \(\delta \). Thus the function considers, in contrast to Fig. 1, only the upper half of the shaded square during this search. This is sufficient for the computation of a closest pair because for each possible point q preceding p in \( ps \) we already considered the pair \((q,\,p)\), if needed, and do not have to re-check \((p,\,q)\) due to the commutative property of our closest pair definition. Note also that \(\delta \) is updated, if possible, during the computation and consequently the search space for each point is limited to a \(2\delta \times \delta '\) rectangle with \(\delta ' \le \delta \). Lastly we intentionally do not minimize the number of distance computations. In Sect. 6 we make this optimization for the final executable code.
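The two functions just described can be sketched as follows. This is our own Python rendering of the recursion structure (not the verified HOL text): `find_closest` stops early once the vertical distance reaches the current \(\delta\), and `find_closest_pair` threads the best pair through the scan.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def find_closest(p, delta, ps):
    # closest neighbor of p among ps (sorted by y, points above p),
    # terminating at the first point whose vertical distance >= delta
    if len(ps) <= 1:
        return ps[0]
    c = ps[0]
    if delta <= c[1] - p[1]:
        return c                       # early termination
    c2 = find_closest(p, min(delta, dist(p, c)), ps[1:])
    return c if dist(p, c) <= dist(p, c2) else c2

def find_closest_pair(c0, c1, ps):
    # iterate over ps, updating the current closest pair (c0, c1)
    if len(ps) <= 1:
        return c0, c1
    p, rest = ps[0], ps[1:]
    q = find_closest(p, dist(c0, c1), rest)
    if dist(c0, c1) <= dist(p, q):
        return find_closest_pair(c0, c1, rest)
    return find_closest_pair(p, q, rest)
```

Passing the shrinking `delta` into the recursive call is what limits the search space to the \(2\delta \times \delta'\) rectangle mentioned above.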
The following lemma captures the desired sparsity property of our implementation of the combine step so far. It is proved by induction on the computation.
Lemma 1

where sorted_snd \( ps \) means that \( ps \) is sorted in ascending order by y-coordinate.
We wrap up the combine step by limiting our search for the closest pair to only the points contained within the \(2\delta \) wide vertical strip and choosing as argument for the initial pair of points of find_closest_pair the closest pair of the two recursive invocations of our divide-and-conquer algorithm with the smaller distance \(\delta \).

Lemma 2 shows that if there exists a pair of distinct points with a distance less than \(\delta \), then both its points are contained in the mentioned vertical strip; otherwise we have already found our closest pair, and the pair returned by the combine step is its initial argument.
Lemma 2

We can then prove, additionally using Lemma 1, that the combine step indeed computes a pair of points \((p_0,\,p_1)\) such that our given list of points \( ps \) is (dist \(p_0\) \(p_1\))-sparse.
Lemma 3

One can also show that \(p_0\) and \(p_1\) are in \( ps \) and distinct (and thus a closest pair of \( ps \)), if the points of the left (right) closest pair are in \( ps_L \) (\( ps_R \)) and distinct.
3.2 The Divide & Conquer Algorithm
In Sect. 2 we glossed over some implementation details which are necessary to achieve the running time of \(\mathcal {O}(n \log n)\). In particular we need to partition the given list (Footnote 2) of points \( ps \) along a vertical line l into two lists of nearly equal length during the divide step and obtain a list \( ys \) of the same points, sorted in ascending order by y-coordinate, for the combine step in linear time at each level of recursion.
Cormen et al. propose the following top-down approach: Their algorithm takes three arguments: the set of points P and lists \( xs \) and \( ys \) which contain the same set of points as P but are respectively sorted by x and y-coordinate. The algorithm first splits \( xs \) at \( length\; xs \;div\; 2\) into two still sorted lists \( xs_L \) and \( xs_R \) and chooses l as either the x-coordinate of the last element of \( xs_L \) or the x-coordinate of the first element of \( xs_R \). It then constructs the sets \(P_L\) and \(P_R\) respectively consisting of the points of \( xs_L \) and \( xs_R \). For the recursive invocations it needs to obtain in addition lists \( ys_L \) and \( ys_R \) that are still sorted by y-coordinate and again respectively refer to the same points as \( xs_L \) and \( xs_R \). It achieves this by iterating once through \( ys \) and checking for each point if it is contained in \(P_L\) or not, constructing \( ys_L \) and \( ys_R \) along the way.
But this approach requires an implementation of sets. In fact, if we want to achieve the overall worst case running time of \(\mathcal {O}(n \log n)\) it requires an implementation of sets with linear time construction and constant time membership test, which is nontrivial, in particular in a functional setting. To avoid sets many publications and implementations either assume all points have unique x-coordinates or preprocess the points, for example by applying a rotation, such that the input fulfills this condition. For distinct x-coordinates one can then compute \( ys_L \) and \( ys_R \) by simply filtering \( ys \) depending on the x-coordinate of the points relative to l and eliminate the usage of sets entirely.
But there exists a third option which we have found only in Cormen et al., where it is merely hinted at in an exercise left to the reader. The approach is the following. Looking at the overall structure of the closest pair algorithm we recognize that it closely resembles the structure of a standard mergesort implementation and that we only need \( ys \) for the combine step after the two recursive invocations of the algorithm. Thus we can obtain \( ys \) by merging ‘along the way’ using a bottom-up approach. This is the actual code:


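The bottom-up scheme can be paraphrased in Python as follows. This is an illustrative sketch in our own naming, not the verified Isabelle code: the recursion returns both the closest pair and the input points re-sorted by y-coordinate, merging the two recursive results exactly as mergesort would.

```python
import math
from itertools import combinations

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def merge_by_y(ls, rs):
    # standard linear mergesort merge on the y-coordinate
    out, i, j = [], 0, 0
    while i < len(ls) and j < len(rs):
        if ls[i][1] <= rs[j][1]:
            out.append(ls[i]); i += 1
        else:
            out.append(rs[j]); j += 1
    return out + ls[i:] + rs[j:]

def closest_pair_rec(xs):
    # xs must be sorted by x-coordinate; returns (ys, (c0, c1)) where ys
    # holds the points of xs re-sorted by y-coordinate 'along the way'
    if len(xs) <= 3:
        pair = min(combinations(xs, 2), key=lambda pq: dist(*pq))
        return sorted(xs, key=lambda p: p[1]), pair
    mid = len(xs) // 2
    l = xs[mid][0]                       # x-coordinate of dividing line
    ys_l, pair_l = closest_pair_rec(xs[:mid])
    ys_r, pair_r = closest_pair_rec(xs[mid:])
    ys = merge_by_y(ys_l, ys_r)          # linear merge, as in mergesort
    c0, c1 = min(pair_l, pair_r, key=lambda pq: dist(*pq))
    delta = dist(c0, c1)
    strip = [p for p in ys if abs(p[0] - l) < delta]
    for i, p in enumerate(strip):        # combine step (simplified)
        for q in strip[i + 1:]:
            if q[1] - p[1] >= delta:
                break                    # early termination on y-distance
            if dist(p, q) < delta:
                c0, c1, delta = p, q, dist(p, q)
    return ys, (c0, c1)
```

Because `ys` is produced by merging the recursive results rather than by sorting, each level of the recursion does only linear work outside the recursive calls.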
Function closest_pair_rec expects a list of points \( xs \) sorted by x-coordinate. The construction of \( xs_L \), \( xs_R \) and l is analogous to Cormen et al. In the base case we then sort \( xs \) by y-coordinate and compute the closest pair using the brute-force approach (closest_pair_bf). The recursive call of the algorithm on \( xs_L \) returns, in addition to the closest pair of \( xs_L \), the list \( ys_L \), containing all points of \( xs_L \) but now sorted by y-coordinate. Analogously for \( xs_R \) and \( ys_R \). Furthermore, we reuse the function merge from our mergesort implementation, which we utilize to presort the points by x-coordinate, to obtain \( ys \) from \( ys_L \) and \( ys_R \) in linear time at each level of recursion.
Splitting of \( xs \) is performed by the function split_at via a simple linear pass over \( xs \). Our implementation of mergesort sorts a list of points depending on a given projection function, fst for ‘by x-coordinate’ and snd for ‘by y-coordinate’.
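The helpers just mentioned can be sketched as Python stand-ins for the Isabelle functions (names mirror the text; this is not the formalized code):

```python
def split_at(n, xs):
    # split xs into its first n elements and the rest (single linear pass)
    return xs[:n], xs[n:]

def fst(p): return p[0]
def snd(p): return p[1]

def msort(key, xs):
    # mergesort parameterized by a projection function: msort(fst, xs)
    # sorts by x-coordinate, msort(snd, xs) by y-coordinate
    if len(xs) <= 1:
        return list(xs)
    ls, rs = split_at(len(xs) // 2, xs)
    ls, rs = msort(key, ls), msort(key, rs)
    out, i, j = [], 0, 0
    while i < len(ls) and j < len(rs):   # linear merge of sorted halves
        if key(ls[i]) <= key(rs[j]):
            out.append(ls[i]); i += 1
        else:
            out.append(rs[j]); j += 1
    return out + ls[i:] + rs[j:]
```

The same merge routine is the one reused by the recursion to combine the y-sorted halves.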
Using Lemma 3, the functional correctness proofs of our mergesort implementation and several auxiliary lemmas proving that closest_pair_rec also sorts the points by y-coordinate, we arrive at the correctness proof of the desired sparsity property of the algorithm.
Theorem 1

Corollary 1 together with Theorems 2 and 3 then shows that the pair \((p_0,\,p_1)\) is indeed a closest pair of \( ps \).
Corollary 1

Theorem 2

Theorem 3

4 Time Complexity Proof
To formally verify the running time we follow the approach in [16]. For each function f we define a function t_f that takes the same arguments as f but computes the number of function calls the computation of f needs, the ‘time’. Function t_f follows the same recursion structure as f and can be seen as an abstract interpretation of f. To ensure the absence of errors we derive f and t_f from a monadic function that computes both the value and the time, but for simplicity of presentation we present only f and t_f. We also simplify matters a bit: we count only expensive operations where the running time increases with the size of the input; in particular we assume constant time arithmetic and ignore small additive constants. Due to reasons of space we only show one example of such a ‘timing’ function, t_find_closest, which is crucial to our time complexity proof.

We set the time to execute dist computations to 0 since dist is a combination of cheap operations. For the base cases of recursive functions we fix the computation time to be equivalent to the size of the input. This choice of constants is arbitrary and has no impact on the overall running time analysis but leads in general to ‘cleaner’ arithmetic bounds.
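The pairing of a function with its timing function can be illustrated in monadic style. This Python analogue (our construction, counting only recursive calls, i.e. examined points) computes the value and the ‘time’ together, as the derivation described above does:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def find_closest_tm(p, delta, ps):
    # monadic style: return (value, time), where time counts the
    # number of recursive calls / examined points of ps
    if len(ps) <= 1:
        return ps[0], 1
    c = ps[0]
    if delta <= c[1] - p[1]:
        return c, 1                    # early termination
    c2, t = find_closest_tm(p, min(delta, dist(p, c)), ps[1:])
    best = c if dist(p, c) <= dist(p, c2) else c2
    return best, 1 + t
```

Projecting out the first component recovers the value function and projecting out the second recovers its timing function, which is exactly how the two are kept in sync.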
4.1 Time Analysis of the Combine Step
In Sect. 2 we claimed that the running time of the algorithm is captured by the recurrence \(T(n) = T(\lceil n/2 \rceil ) + T(\lfloor n/2 \rfloor ) + \mathcal {O}(n)\), where n is the length of the given list of points. This claim implies an at most linear overhead at each level of recursion. Splitting of the list \( xs \), merging \( ys_L \) and \( ys_R \) and the filtering operation of the combine step run in linear time. But it is non-trivial that the function find_closest_pair, central to the combine step, also exhibits a linear time complexity. It is applied to an argument list of, in the worst case, length n, iterates once through the list and calls find_closest for each element. Consequently our proof obligation is the constant running time of find_closest or, considering our timing function, that there exists some constant c such that the result of t_find_closest is bounded by c in the context of the combine step.
Looking at the definition of find_closest we see that the function terminates as soon as it encounters the first point within the given list \( ps \) that does not fulfill the filter predicate on the vertical distance to p, the point p being an argument to find_closest, or if \( ps \) is a list of length \(\le 1\). The corresponding timing function t_find_closest computes the number of recursive function calls, which is, in this case, synonymous with the number of examined points. For our time complexity proof it suffices to show the following bound on the result of t_find_closest. The proof is by induction on the computation of find_closest. The function count f is an abbreviation for length \(\circ\) filter f.
Lemma 4

Therefore we need to prove that the term count (\(\lambda q.\ snd\;q - snd\;p \le \delta\)) \( ps \) does not depend on the length of \( ps \). Looking back at Fig. 1, the square point representing p, we can assume that the list \( ps \) is distinct and sorted in ascending order by y-coordinate. From the precomputing effort of the combine step we know that its points are contained within the \(2\delta\) wide vertical strip centered around l and can be split into two sets \(P_L\) (\(P_R\)) consisting of all points which lie to the left (right) of or on the line l. Due to the two recursive invocations of the algorithm during the conquer step we can additionally assume that both \(P_L\) and \(P_R\) are \(\delta\)-sparse, suggesting the following lemma which implies the desired constant bound and thus the constant running time of find_closest.
Lemma 5

Proof
The library HOL-Analysis defines some basic geometric building blocks needed for the proof. A closed box describes points contained within rectangular shapes in Euclidean space. For our purposes the planar definition is sufficient.

The box is ‘closed’ since it includes points located on the border of the box. We then introduce some useful abbreviations:
- The rectangle R is the upper half of the shaded square of Fig. 1.
- The set \(R_{ ps }\) consists of all points of \( ps \) that are encompassed by R.
- The squares \(S_L\) and \(S_R\) denote the left and right halves of R.
- The set \(S_{P_L}\) holds all points of \(P_L\) that are contained within the square \(S_L\). The definition of \(S_{P_R}\) is analogous.
Let additionally LHS abbreviate the left-hand side of the claimed bound. First we prove LHS \(\le |R_{ ps }|\): Let q denote an arbitrary point counted by the filter predicate. We know \(snd\;p \le snd\;q\), because the list \( ps \) and therefore the filtered list is sorted in ascending order by y-coordinate, and \(snd\;q - snd\;p \le \delta\), due to the filter predicate. Using the additional fact that the x-coordinate of q lies within \(\delta\) of l (derived from our assumption that all points of \( ps \) are contained within the \(2\delta\) strip) and the definitions of R, \(S_L\) and \(S_R\) we know \(q \in R\) and thus \(q \in R_{ ps }\). Since the filtered list maintains the distinctness property of \( ps \), the counted points are additionally pairwise distinct. Our intermediate goal immediately follows because the length of a distinct list is bounded by the cardinality of the finite set \(R_{ ps }\).
But how many points can there be in \(R_{ ps }\)? Let’s first try to determine an upper bound for the number of points of \(S_{P_L}\). All its points are contained within the square \(S_L\) whose side length is \(\delta\). Moreover, since \(P_L\) is \(\delta\)-sparse and \(S_{P_L} \subseteq P_L\), \(S_{P_L}\) is also \(\delta\)-sparse, i.e. the distance between each distinct pair of points of \(S_{P_L}\) is at least \(\delta\). Therefore the cardinality of \(S_{P_L}\) is bounded by the number of points we can maximally fit into \(S_L\), maintaining a pairwise minimum distance of \(\delta\). As the left-hand side of Fig. 2 depicts, we can arrange at most four points adhering to these restrictions, and consequently have \(|S_{P_L}| \le 4\). An analogous argument holds for the number of points of \(S_{P_R}\). Furthermore we know \(|R_{ ps }| \le |S_{P_L}| + |S_{P_R}|\) due to the fact \(R = S_L \cup S_R\) and can conclude \(|R_{ ps }| \le 8\). Our original statement then follows from LHS \(\le |R_{ ps }|\). \(\square \)
Note that the intermediate proof for the bound on \(|S_{P_L}|\) relies on basic human geometric intuition. Indeed Cormen et al. [5] and most of the proofs in the literature do. But for a formal proof we have to be rigorous. First we show two auxiliary lemmas: The maximum distance between two points in a square S with side length \(\delta\) is less than or equal to \(\sqrt{2}\delta \).
Lemma 6

The proof is straightforward. Both points are contained within the square S, the difference between their x and y coordinates is hence bounded by \(\delta\), and we finish the proof using the definition of the Euclidean distance. Below we employ the following variation of the pigeonhole principle:
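The \(\sqrt{2}\delta\) bound can be checked numerically as a sanity test (an illustration only, of course, not a substitute for the formal proof): for any two points in an axis-aligned \(\delta\)-square both coordinate differences are at most \(\delta\), so the Euclidean distance is at most \(\sqrt{2}\delta\).

```python
import math, random

def max_dist_in_square_bound(delta, trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        # two random points inside the closed delta-square [0, delta]^2
        p = (rng.uniform(0, delta), rng.uniform(0, delta))
        q = (rng.uniform(0, delta), rng.uniform(0, delta))
        d = math.hypot(p[0] - q[0], p[1] - q[1])
        assert d <= math.sqrt(2) * delta + 1e-12   # Lemma 6, numerically
    return True
```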
Lemma 7

Finally we replace human intuition with formal proof:
Lemma 8

Proof
Let S denote the square with a side length of \(\delta\) and suppose, for the sake of contradiction, that S contains more than four points of the given \(\delta\)-sparse set P. Then S can be split into the four congruent squares \(S_1,\,S_2,\,S_3,\,S_4\) along their common center point as depicted by the right-hand side of Fig. 2. Since all points of P are contained within S and \(S = S_1 \cup S_2 \cup S_3 \cup S_4\) we have \(P \subseteq S_1 \cup S_2 \cup S_3 \cup S_4\). Using Lemma 7 and our assumption \(|P| > 4\) we know there exists a square \(S_i\) containing a pair of distinct points \(p_0\) and \(p_1\). Lemma 6 and the fact that all four sub-squares have the same side length \(\delta / 2\) show that the distance between \(p_0\) and \(p_1\) must be less than or equal to \(\frac{\sqrt{2}}{2}\delta\) and hence strictly less than \(\delta\). But we also know that \(\delta\) is a lower bound for this distance because \(p_0 \in P\), \(p_1 \in P\), \(p_0 \ne p_1\) and our premise that P is \(\delta\)-sparse, a contradiction. \(\square \)
4.2 Time Analysis of the Divide & Conquer Algorithm
In summary, the time to evaluate find_closest is constant in the context of the combine step, and thus evaluating find_closest_pair as well as the combine step takes time linear in the length of \( ps \).
Next we turn our attention to the timing of closest_pair_rec and derive (but do not show) the corresponding function t_closest_pair_rec. At this point we could prove a concrete bound on t_closest_pair_rec. But since we are dealing with a divide-and-conquer algorithm we should, in theory, be able to determine its running time using the ‘master theorem’ [5]. This is, in practice, also possible in Isabelle/HOL. Eberl [8] has formalized the Akra-Bazzi theorem [1], a generalization of the master theorem. Using this formalization we can derive the running time of our divide-and-conquer algorithm without a direct proof for t_closest_pair_rec. First we capture the essence of t_closest_pair_rec as a recurrence on natural numbers representing the length of the list argument of closest_pair_rec:

The time complexity of this recurrence is proved completely automatically:
Lemma 9

Next we need to connect this bound with our timing function. Lemma 10 below expresses a procedure for deriving complexity properties of the form

where t is a timing function on the data domain, in our case lists. The function m is a measure on that data domain, f is a recurrence or any other function of type \(nat \Rightarrow real\) and A is the set of valid inputs. The governing filter should be read as ‘if the measured size of valid inputs is sufficiently large’ and utilizes Eberl’s formalization of Landau notation [9] and the “filter” machinery of asymptotics in Isabelle/HOL [11]. For readability we omit stating the filter and m explicitly in the following and just state the conditions required of the input A. The measure m always corresponds to the length function.
Lemma 10

Lemma 11

Using Lemmas 9, 10 and 11 we arrive at Theorem 4, expressing our main claim: the running time of the divide-and-conquer algorithm.
Theorem 4

Since the function closest_pair only presorts the given list of points using our mergesort implementation and then calls closest_pair_rec, we obtain Corollary 2 and finish the time complexity proof.
Corollary 2
For distinct inputs:
5 Alternative Implementations
In the literature there exist various other algorithmic approaches to solve the closest pair problem. Most of them are closely related to our implementation of Sect. 3, deviating primarily in two aspects: the exact implementation of the combine step and the approach to sorting the points by y-coordinate which we already discussed in Subsect. 3.2. We present a short overview, concentrating on the combine step and the second implementation we verified.
5.1 A Second Verified Implementation
Although the algorithm described by Cormen et al. is the basis for our implementation of Sect. 3, we took the liberty to optimize it. During execution of find_closest our algorithm searches for the closest neighbor of p within the rectangle R, the upper half of the shaded square of Fig. 1, and terminates the search if it examines points on or beyond the upper border of R. Cormen et al. originally follow a slightly different approach. They search for a closest neighbor of p by examining a constant number of points of \( ps \), the first 7 to be exact. This is valid because there are at most 7 points within R, not counting p, and the 8th point of \( ps \) would again lie on or beyond the upper border of R. This slightly easier implementation comes at the cost of being less efficient in practice. Cormen et al. always assume the worst case by checking all 7 points following p. But it is unlikely that the algorithm needs to examine even close to 7 points, except for specifically constructed inputs. They furthermore state that the bound of 7 is an over-approximation and dare the reader to lower it to 5 as an exercise. We refrain from doing so since a bound of 7 suffices for the time complexity proof of our, inherently faster, implementation. At this point we should also mention that the specific optimization of Sect. 3 is not our idea but rather an algorithmic detail which is unfortunately rarely mentioned in the literature.
Nonetheless we can adapt the implementation of Sect. 3 and the proofs of Sect. 4 to verify the original implementation of Cormen et al. as follows: We replace each call of find_closest by a call to a function find_closest_bf which iterates in brute-force fashion through its constant-length argument list to find the closest neighbor of p. To verify this implementation we then reuse most of the elementary lemmas and proof structure of Sects. 3 and 4; only a slightly adapted version of Lemma 5 is necessary. Note that this lemma was previously solely required for the time complexity proof of the algorithm. Now it is already necessary during the functional correctness proof since we need to argue that examining only a constant number of points of \( ps \) is sufficient. The time analysis is overall greatly simplified: a call of find_closest_bf on a list of at most 7 points runs in constant time and we again are able to reuse the remaining time analysis proof structure of Sect. 4. For the exact differences between both formalizations we encourage the reader to consult our entry in the Archive of Formal Proofs [23].
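The replacement search is simple enough to sketch directly. In this Python paraphrase (our naming, not the formalized text) the brute-force neighbor search is applied to a constant-length prefix of the candidate list:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def find_closest_bf(p, ps):
    # brute force: closest neighbor of p among all points of ps;
    # the caller passes only a constant prefix, e.g. ps[:7]
    return min(ps, key=lambda q: dist(p, q))
```

Since the argument list has length at most 7, each call performs a bounded number of distance computations, which is what makes the time analysis of this variant so much simpler.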
5.2 Related Work
Over the years a considerable amount of effort has been made to further optimize the combine step. Central to these improvements is the ‘complexity of computing distances’, abbreviated CCP in the following, a term introduced by Zhou et al. [26] which measures the number of Euclidean distances computed by a closest pair algorithm. The core idea: since computing the Euclidean distance is more expensive than other primitive operations, it might be possible to improve overall algorithmic running time by reducing this complexity measure. In the same paper they introduce an optimized version of the closest pair algorithm with a CCP of \(2n \log n\), in contrast to \(7n \log n\), which will be the worst case CCP of the algorithm of Sect. 3 after we minimize the number of distance computations in Sect. 6. They improve upon the algorithm presented by Preparata and Shamos [21] which achieves a CCP of \(3n \log n\). Ge et al. [10] base their, quite sophisticated, algorithm on the version of Zhou et al. and prove an even lower CCP of \(\frac{3}{2}n \log n\) for their implementation. The race for the lowest number of distance computations culminates so far with the work of Jiang and Gillespie [13] who present their algorithms ‘Basic-2’ (Footnote 3) and ‘2-Pass’ with a respective CCP of \(2n \log n\) and (for the first time linear) \(\frac{7}{2}n\).
6 Executable Code
Before we explore how our algorithm stacks up against Basic-2 (which is surprisingly the fastest of the CCP minimizing algorithms according to Jiang and Gillespie) we have to make some final adjustments to generate executable code from our formalization.
In Sect. 3 we fixed the data representation of a point to be a pair of mathematical ints rather than mathematical reals. During code export Isabelle then maps, correctly and automatically, its abstract data type int to a suitable concrete implementation of (arbitrary-sized) integers; for our target language OCaml using the library ‘zarith’. For the data type real this is not possible since we cannot implement mathematical reals. We would instead have to resort to an approximation (e.g. floats), losing all proved guarantees in the process. But currently our algorithm still uses the standard Euclidean distance and hence mathematical reals due to the dist function. For the executable code we have to replace this distance measure by the squared Euclidean distance. To prove that we preserve the correctness of our implementation several small variations of the following lemma suffice:

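The underlying observation is that distance comparisons are preserved under squaring, since both sides are non-negative; lemmas of the shape \(dist\;p_0\;p_1 \le dist\;p_2\;p_3 \longleftrightarrow (dist\;p_0\;p_1)^2 \le (dist\;p_2\;p_3)^2\) capture this. In executable terms (a Python stand-in; the exported code analogously stays within arbitrary-size integers):

```python
def dist_sq(p, q):
    # squared Euclidean distance: stays within (arbitrary-size) integers,
    # so no real numbers or floating point approximation is needed
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

# comparisons agree with the real Euclidean distance:
# dist(p0, p1) <= dist(p2, p3)  iff  dist_sq(p0, p1) <= dist_sq(p2, p3)
```

Since the algorithm only ever compares distances and never uses their numeric values, this substitution changes no control-flow decision.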
We apply two further code transformations. To minimize the number of distance computations we introduce auxiliary variables which capture and then replace repeated computations. For all of the shown functions that return a point or a pair of points this entails returning the corresponding computed distance as well. Furthermore we replace the recursive auxiliary functions by corresponding tail-recursive implementations to allow the OCaml compiler to optimize the generated code and prevent stack overflows. To make sure these transformations are correct we prove lemmas expressing the equivalence of old and new implementations for each function. Isabelle’s code export machinery can then apply these transformations automatically.
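The tail-recursion transformation can be illustrated on a toy helper: a structurally recursive function, its accumulator-based equivalent, and the equivalence one would prove before code export. (A Python illustration with names of our choosing; Python itself does not optimize tail calls, but the OCaml compiler does.)

```python
def length_rec(xs):
    # structurally recursive version: builds up deferred additions
    if not xs:
        return 0
    return 1 + length_rec(xs[1:])

def length_tr(xs, acc=0):
    # tail-recursive version: the accumulator makes the recursive
    # call the last operation, so it compiles to a loop in OCaml
    if not xs:
        return acc
    return length_tr(xs[1:], acc + 1)

# equivalence lemma proved before code export (illustrative shape):
#   length_tr xs acc = length_rec xs + acc
```

The equivalence lemma is what lets the code generator swap one implementation for the other without invalidating any of the verified properties.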
Now it is time to evaluate the performance of our verified code. Figure 3 depicts the running time ratios of several implementations of the algorithm of Sect. 3 (called Basic-\(\delta\)) and Basic-7 (the original approach of Cormen et al.) over Basic-2. Basic-\(\delta\) is tested in three variations: the exported (purely functional) Isabelle code and equivalent handwritten functional and imperative implementations to gauge the overhead of the machine generated code. All algorithms are implemented in OCaml, use our bottom-up approach to sorting of Subsect. 3.2 (imperative implementations sort in place) and for each input of uniformly distributed points 50 independent executions were performed. Remarkably the exported code is only about 2.28 (Footnote 4) times slower than Basic-2, and furthermore most of the difference is caused by the inefficiencies inherent to machine generated code, since its equivalent functional implementation is only 11% slower than Basic-2. Basic-7 is 2.26 times slower than the imperative Basic-\(\delta\), which demonstrates the huge impact the small optimization of Subsect. 5.1 can have in practice.
7 Conclusion
We have presented the first verification (both functional correctness and running time) of two related closest pair of points algorithms in the plane, without assuming the x-coordinates of all points to be distinct. The executable code generated from the formalization is competitive with existing reference implementations. A challenging and rewarding next step would be to formalize and verify a closest pair of points algorithm in arbitrary dimensions. This case is treated rather sketchily in the literature.
Notes
- 1.
We choose integers over reals because we cannot implement mathematical reals. See Sect. 6. Alternatively we could have chosen rationals.
- 2.
Our implementation deals with concrete lists in contrast to the abstract sets used in Sect. 2.
- 3.
Pereira and Lobo [19] later independently developed the same algorithm and additionally present extensive functional correctness proofs for all Minkowski distances.
- 4.
We measure differences between running times as the average over all data points weighted by the size of the input.
References
Akra, M., Bazzi, L.: On the solution of linear recurrence equations. Comput. Optim. Appl. 10(2), 195–210 (1998). https://doi.org/10.1023/A:1018373005182
Bentley, J.L., Shamos, M.I.: Divide-and-Conquer in multidimensional space. In: Proceedings of Eighth Annual ACM Symposium on Theory of Computing, STOC 1976, pp. 220–230. ACM (1976). https://doi.org/10.1145/800113.803652
Bertot, Y.: Formal verification of a geometry algorithm: a quest for abstract views and symmetry in Coq proofs. In: Fischer, B., Uustalu, T. (eds.) ICTAC 2018. LNCS, vol. 11187, pp. 3–10. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02508-3_1
Brun, C., Dufourd, J., Magaud, N.: Designing and proving correct a convex hull algorithm with hypermaps in Coq. Comput. Geom. 45(8), 436–457 (2012). https://doi.org/10.1016/j.comgeo.2010.06.006
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)
Dufourd, J.: An intuitionistic proof of a discrete form of the Jordan Curve Theorem formalized in Coq with combinatorial hypermaps. J. Autom. Reasoning 43(1), 19–51 (2009). https://doi.org/10.1007/s10817-009-9117-x
Dufourd, J.-F., Bertot, Y.: Formal study of plane delaunay triangulation. In: Kaufmann, M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 211–226. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14052-5_16
Eberl, M.: Proving Divide and Conquer complexities in Isabelle/HOL. J. Autom. Reasoning 58(4), 483–508 (2016). https://doi.org/10.1007/s10817-016-9378-0
Eberl, M.: Verified real asymptotics in Isabelle/HOL. In: Proceedings of the International Symposium on Symbolic and Algebraic Computation, ISSAC 2019. ACM, New York (2019). https://doi.org/10.1145/3326229.3326240
Ge, Q., Wang, H.T., Zhu, H.: An improved algorithm for finding the closest pair of points. J. Comput. Sci. Technol. 21(1), 27–31 (2006). https://doi.org/10.1007/s11390-006-0027-7
Hölzl, J., Immler, F., Huffman, B.: Type classes and filters for mathematical analysis in Isabelle/HOL. In: Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP 2013. LNCS, vol. 7998, pp. 279–294. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39634-2_21
Immler, F.: A verified algorithm for geometric zonotope/hyperplane intersection. In: Certified Programs and Proofs, CPP 2015, pp. 129–136. ACM (2015). https://doi.org/10.1145/2676724.2693164
Jiang, M., Gillespie, J.: Engineering the Divide-and-Conquer closest pair algorithm. J. Comput. Sci. Technol. 22(4), 532–540 (2007)
Meikle, L.I., Fleuriot, J.D.: Mechanical theorem proving in computational geometry. In: Hong, H., Wang, D. (eds.) ADG 2004. LNCS (LNAI), vol. 3763, pp. 1–18. Springer, Heidelberg (2006). https://doi.org/10.1007/11615798_1
Narboux, J., Janicic, P., Fleuriot, J.: Computer-assisted theorem proving in synthetic geometry. In: Sitharam, M., John, A.S., Sidman, J. (eds.) Handbook of Geometric Constraint Systems Principles. Discrete Mathematics and Its Applications, Chapman and Hall/CRC (2018)
Nipkow, T.: Verified root-balanced trees. In: Chang, B.-Y.E. (ed.) APLAS 2017. LNCS, vol. 10695, pp. 255–272. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71237-6_13
Nipkow, T., Klein, G.: Concrete Semantics with Isabelle/HOL. Springer, Heidelberg (2014). http://concrete-semantics.org
Nipkow, T., Paulson, L., Wenzel, M.: Isabelle/HOL – A Proof Assistant for Higher-Order Logic. LNCS, vol. 2283. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45949-9
Pereira, J.C., Lobo, F.G.: An optimized Divide-and-Conquer algorithm for the closest-pair problem in the planar case. J. Comput. Sci. Technol. 27(4), 891–896 (2012)
Pichardie, D., Bertot, Y.: Formalizing convex hull algorithms. In: Boulton, R.J., Jackson, P.B. (eds.) TPHOLs 2001. LNCS, vol. 2152, pp. 346–361. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44755-5_24
Preparata, F.P., Shamos, M.I.: Computational Geometry: An Introduction. Springer, Heidelberg (1985). https://doi.org/10.1007/978-1-4612-1098-6
Puitg, F., Dufourd, J.: Formalizing mathematics in higher-order logic: a case study in geometric modelling. Theor. Comput. Sci. 234(1–2), 1–57 (2000). https://doi.org/10.1016/S0304-3975(98)00228-X
Rau, M., Nipkow, T.: Closest pair of points algorithms. Archive of Formal Proofs, Formal proof development, January 2020. http://isa-afp.org/entries/Closest_Pair_Points.html
Sack, J.R., Urrutia, J. (eds.): Handbook of Computational Geometry. North-Holland Publishing Co., Amsterdam (2000)
Shamos, M.I., Hoey, D.: Closest-point problems. In: 16th Annual Symposium on Foundations of Computer Science (SFCS 1975), pp. 151–162, October 1975. https://doi.org/10.1109/SFCS.1975.8
Zhou, Y., Xiong, P., Zhu, H.: An improved algorithm about the closest pair of points on plane set. Comput. Res. Dev. 35(10), 957–960 (1998)
Acknowledgements
Research supported by DFG grants NI 491/16-1 and 18-1.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Rau, M., Nipkow, T. (2020). Verification of Closest Pair of Points Algorithms. In: Peltier, N., Sofronie-Stokkermans, V. (eds.) Automated Reasoning. IJCAR 2020. Lecture Notes in Computer Science, vol. 12167. Springer, Cham. https://doi.org/10.1007/978-3-030-51054-1_20
Print ISBN: 978-3-030-51053-4
Online ISBN: 978-3-030-51054-1