Topology and Local Optima in Computer Vision

We present an application of persistent homology to the image correspondence problem, also known as image registration, which is used to produce 3D reconstructions of scenery from two or more cameras. We present a novel filtered complex in the sense of persistent homology, and show that nontrivial homology groups in its persistence diagrams correspond to recognizable anomalies in image pairs, such as repeated patterns, which contribute to non-convexity of the relevant cost function. We present examples with actual image pairs, and prove a basic result that the corresponding homology classes are invariant under certain continuous deformations.


Introduction
Suppose we are given a pair of pictures of the same scene from different angles, represented by grayscale images called the source and target, respectively. We will denote them as functions I_j : D_j → ℝ for j = 1, 2, where both domains are the unit square D_1 = D_2 = [0, 1] × [0, 1], and the value I_j(p) = I_j(x, y) is the intensity. The goal of image correspondence is to find a suitable transformation T : D_1 → D_2 with the property that E(T) is as small as possible, where E is some cost function measuring the difference between I_1(x, y) and the composition I_2(T(x, y)). In typical applications, the correspondence T might be considered suitable if it is continuous, differentiable, of bounded variation, or satisfies some other niceness criterion. If the source and target come from cameras at the same height (e.g. a person's left and right eyes), it is natural to require that T satisfies the "epipolar" condition that corresponding pairs p, T(p) have the same y-coordinate. A basic example of a cost function is the L^2-measure

E(T) = \int_{D_1} |I_1(p) - I_2(T(p))|^2 \, dp,    (1)

which might also contain a term penalizing high variation in T. Another possibility if I_1, I_2 are continuous is to replace the integral in (1) with the supremum over D_1.
A common problem with the optimization approach is that E(T) may have local minimizers. Some are simply due to noise in the images, which can be resolved by a smoothing of the objective function, for instance by blurring the images. More serious ones occur when the source image matches with more than one region in the target, or when there is occlusion by foreground objects, so that regions in the source do not have a unique match in the target. One way to deal with this is to only study key points of the two images, as is the case for the highly successful SIFT and SURF methods [1,6].
In this paper, rather than smooth away local basins, we propose a novel construction which uses persistent homology to classify them, and to identify regions in which they are present. We present a new filtered simplicial complex associated to a pair of images, with the property that its persistent homology groups encode robust categories of correspondences, which respect continuous deformation in a particular sense defined in "A Deformation Invariance Result". In "Practical Examples", we present practical examples in which confounding properties of image pairs such as repeated patterns can be identified with long bars in the persistence diagram of the corresponding complex.
There are several ways such classifiers could be used to address the original correspondence problem. First, if persistent homology detects the presence of one or more correspondences in a particular domain, one could devise algorithms for producing one or more correspondences that represent a given homology class, which is part of the general problem of producing cycle representatives of homology classes when they exist. Such representatives may then serve as highly informed initial guesses that avoid obstructions due to confounding features of the image detected by topology. Conversely, if there are no persistent homology classes, that represents a certificate that there are no viable correspondences in a particular domain, as described in Corollary 1 below. The solver would therefore be well served to move on to another region of the source images, rely on other cameras, or move on to a different formulation of the correspondence problem. This leads to the possibility of searching for subregions of the source image of maximal size which satisfy the property that, topologically, they are expected to contain a unique correspondence.

Definitions
In this section, we recall some relevant concepts from persistent homology, and formulate the correspondence problem.

Persistent Homology
In this paper, a simplicial complex X will mean an abstract simplicial complex with no particular embedding in space. In other words, X is a collection of subsets Δ of some index set S such that if Δ ∈ X and Δ′ ⊂ Δ, then Δ′ ∈ X. The subsets Δ′ ⊂ Δ are called the faces of Δ. The geometric realization of X will be denoted |X|. For any i, let X_i denote the set of i-dimensional simplices. Fix a field 𝔽, and denote the i-chains, i-cycles, i-boundaries, and the boundary operator over 𝔽 by C_i(X), Z_i(X), B_i(X), and ∂, respectively. In applications, it is most efficient to let 𝔽 be a finite field. If A ⊂ X is a subcomplex, then we have the long exact sequence in (reduced) relative homology,

\cdots \to H_i(A) \to H_i(X) \to H_i(X, A) \xrightarrow{\ \delta\ } H_{i-1}(A) \to \cdots,

where H_i(X, A) is the relative homology group, and δ is the connecting homomorphism.
If X is induced from a triangulation of a closed region with boundary in ℝ^n, then its boundary is the image of a subcomplex ∂X ⊂ X. The orientation on ℝ^n determines a well-defined fundamental class

[X] = \sum_{\Delta \in X_n} \pm \Delta \in H_n(X, \partial X),

where there is a minus sign if the ordering of the indices on Δ is the reverse of the orientation on ℝ^n.

A filtration function on X is a function f : X → ℝ ∪ {∞} such that whenever Δ′ is a face of Δ ∈ X, we have f(Δ′) ≤ f(Δ). For any a, the set

X_a = \{ \Delta \in X : f(\Delta) \le a \}

is a subcomplex, with an inclusion map ι_{a,b} : X_a → X_b for a ≤ b. A complex together with a filtration function is called a filtered complex. We will denote C_i^a(X) = C_i(X_a), and similarly for the cycles, boundaries, and homology groups. Then ι_{a,b} induces an inclusion map C_i^a(X) ↪ C_i^b(X) that commutes with the boundary operator, which in turn induces a map H_i^a(X) → H_i^b(X) that need not be injective or surjective. For a ≤ b, let H_i^{a,b}(X) denote the persistent homology group, which is the image of H_i^a(X) in H_i^b(X). For any numbers a ≤ b, we have a nonnegative integer

\beta_i^{a,b} = \dim H_i^{a,b}(X).

The ranks are encoded in the barcode diagram [2,4,9], which is the unordered collection of intervals in ℝ_+ ∪ {∞} with the property that β_i^{a,b} is the number of intervals containing both a and b. It is constructed by assuming that a, b take a discrete set of values in ℤ·ε ⊂ ℚ, and then considering all the homology groups at once as a graded module

M = \bigoplus_{k \ge 0} H_i^{k\varepsilon}(X)

over the polynomial ring 𝔽[t], where t acts by the maps induced by the inclusions. The barcode is then determined by decomposing M as a module over a principal ideal domain. For an explanation of how these barcodes can be generated, we refer to the JavaPlex tutorial [8].
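To make the barcode concrete, the following is a minimal sketch of the standard column-reduction algorithm over the field with two elements, applied to a small filtered complex. It is not the JavaPlex implementation; the function name `barcode` and the dictionary input format are assumptions made for this illustration.

```python
from itertools import combinations

def barcode(filtered_complex):
    """Return persistence intervals (dimension, birth, death) over Z/2.

    filtered_complex maps each simplex (a tuple of vertex labels) to its
    filtration value; every face of every simplex must also be present.
    """
    # Order simplices by filtration value, breaking ties by dimension, so
    # that each face appears before the simplices containing it.
    order = sorted(filtered_complex, key=lambda s: (filtered_complex[s], len(s)))
    position = {frozenset(s): k for k, s in enumerate(order)}
    # Column k holds the positions of the codimension-one faces of simplex k.
    columns = [
        {position[frozenset(face)] for face in combinations(s, len(s) - 1)}
        if len(s) > 1 else set()
        for s in order
    ]
    owner = {}  # lowest nonzero row -> column that owns it after reduction
    bars = []
    for k, column in enumerate(columns):
        while column and max(column) in owner:
            column ^= columns[owner[max(column)]]  # add columns mod 2
        if column:
            low = max(column)
            owner[low] = k
            birth = filtered_complex[order[low]]
            death = filtered_complex[order[k]]
            if birth < death:  # skip zero-length bars
                bars.append((len(order[low]) - 1, birth, death))
    # Simplices that neither create a paired class nor destroy one give
    # intervals that never die.
    killers = set(owner.values())
    for k, s in enumerate(order):
        if k not in owner and k not in killers:
            bars.append((len(s) - 1, filtered_complex[s], float("inf")))
    return sorted(bars)

# A triangle whose boundary appears at value 1 and is filled in at value 3.
hollow_then_filled = {
    ("a",): 0, ("b",): 0, ("c",): 0,
    ("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1,
    ("a", "b", "c"): 3,
}
bars = barcode(hollow_then_filled)
```

In this toy example the loop around the triangle is born at value 1 and dies at value 3, which is the kind of long bar in dimension one that the later sections interpret.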

Image Correspondences
Denote a source and target image I 1 , I 2 respectively by functions from the unit square into some set of possible colors C, for instance [0, 1] 3 for color images or [0, 1] for grayscale. We suppose that both images are pictures of the same scene, taken from different predetermined locations and angles. For each p ∈ D 1 belonging to the domain of the source image, we choose a parametrization of the epipolar line of the form

SN Computer Science
for some numbers c, d determined by the relative placement of the two cameras, and by lower and upper bounds on the x-coordinate of corresponding points. In other words, the line parametrizes the points in the target which could be in correspondence with points in the source, assuming the cameras are parallel to the ground, at the same height, and pointed in the same direction. Image pairs of this form are called rectified. Our examples are all rectified, but in general the parametrization could be more complicated. Now select a triangulation of D_1 represented by an inclusion of a pure 2-dimensional simplicial complex p : |K| → D_1. Suppose we are also given a continuous distance function d(T, T′) for every pair of triangles T ⊂ D_1 and T′ ⊂ D_2, measuring the distance between the restriction of I_1 to T and the restriction of I_2 to T′. The basic function we will use in the examples section is to affinely map T and T′ to the unit right triangle Δ_0 whose vertices are (0, 0), (1, 0), and (0, 1), and evaluate the L^2 metric between them. That is,

d(T, T') = \left( \int_{\Delta_0} |I_1(A_T(u)) - I_2(A_{T'}(u))|^2 \, du \right)^{1/2},    (4)

where A_T and A_{T′} denote the affine maps from Δ_0 onto T and T′. Another extension we will use involves a penalty term when T′ is highly warped, meaning it is very far from being equilateral, or has the opposite orientation to T.
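Define a simplicial correspondence to be an element of the set Γ(K) of functions φ : K_0 → D_2 on the vertex set that send each vertex i to a point on the epipolar line of p(i), representing a piecewise linear function on |K| = D_1. The cost function E : Γ(K) → ℝ_+ is given by the worst-case triangle,

E(\varphi) = \max_{(i_0, i_1, i_2) \in K_2} d\big( (p_0, p_1, p_2), (\varphi(i_0), \varphi(i_1), \varphi(i_2)) \big),

for p_a = p(i_a).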
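As an illustration of the worst-case cost, here is a minimal sketch that approximates the L^2 distance between two triangles by sampling points of the unit right triangle, and then takes the maximum over the triangles of a small mesh. The images are assumed to be given as Python functions of (x, y). The names `affine_point`, `triangle_distance`, and `cost`, and the crude sampling rule, are assumptions of this sketch rather than part of the paper.

```python
import math

def affine_point(triangle, u, v):
    """Image of (u, v) in the unit right triangle under the affine map onto triangle."""
    (x0, y0), (x1, y1), (x2, y2) = triangle
    return (x0 + u * (x1 - x0) + v * (x2 - x0),
            y0 + u * (y1 - y0) + v * (y2 - y0))

def triangle_distance(I1, I2, T, T_prime, n=8):
    """Approximate L^2 distance between I1 on T and I2 on T_prime."""
    total, count = 0.0, 0
    for a in range(n):
        for b in range(n - a):
            # Sample points strictly inside the unit right triangle.
            u, v = (a + 1 / 3) / n, (b + 1 / 3) / n
            diff = I1(*affine_point(T, u, v)) - I2(*affine_point(T_prime, u, v))
            total += diff * diff
            count += 1
    # The unit right triangle has area 1/2.
    return math.sqrt(0.5 * total / count)

def cost(I1, I2, points, triangles, phi):
    """Worst-case triangle distance for the vertex assignment phi."""
    return max(
        triangle_distance(I1, I2,
                          [points[i] for i in tri],
                          [phi[i] for i in tri])
        for tri in triangles
    )

# Toy data: the target is the source shifted to the right by 0.1.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
triangles = [(0, 1, 2), (1, 3, 2)]
I1 = lambda x, y: x
I2 = lambda x, y: x - 0.1
shifted = [(x + 0.1, y) for x, y in points]
cost_shifted = cost(I1, I2, points, triangles, shifted)
cost_identity = cost(I1, I2, points, triangles, points)
```

The shifted assignment preserves y-coordinates, so it satisfies the epipolar condition, and it has essentially zero cost, whereas the identity assignment does not.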

Main Construction
In this section, we present the main object of the paper, define the complex described in the introduction in the case of two-dimensional images, and present the general pipeline to be followed in the examples of "Practical Examples".

Illustration in the Continuous Case
We describe the idea in the case of one-dimensional images.
Consider a correspondence to be a continuous increasing function T : D_1 → D_2, where D_1 = D_2 = [0, 1], with the property that

d(x, T(x)) \le a \quad \text{for all } x \in D_1,

for some upper bound a > 0 on the dissimilarity d between I_1 and I_2. For the functions in Fig. 1 and a > 0, there will be infinitely many correspondences. However, one can see that for a relatively small there are essentially three groups of them up to continuous deformation, corresponding to the upward and rightward moving paths from the lower boundary of Fig. 1c to the upper one, avoiding the shaded regions which have high dissimilarity (the fact that the three paths only move up and to the right is equivalent to the requirement that T has to be increasing).

Fig. 1 In (c), the source is on the vertical axis and the target is on the horizontal axis. A correspondence is an increasing continuous function. For a relatively small, there are essentially three correspondences for these two signals, corresponding to three up and right moving paths from the bottom to the top of (c) avoiding the shaded regions

Let X = D_1 × D_2 be the set of (x, y) pairs in Fig. 1c, and let A ⊂ X be the preimage of the boundary ∂D_1 under the projection π(x, y) = x. We have the persistent homology group

H_1^{a,b}(X, A)    (7)

for the sublevel set filtration X_a = {(x, y) ∈ X : d(x, y) ≤ a}, in which the graph of a correspondence T determines a class [T]_{a,b}. This is the one-dimensional version of the classifiers referred to in the introduction.
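The one-dimensional picture can be explored numerically. The sketch below is a discretized illustration only: it assumes the pointwise dissimilarity d(x, y) = |I_1(x) − I_2(y)| on sampled signals, and checks whether some path moving up, right, or diagonally crosses the sublevel set from the bottom row to the top row. It tests existence of a correspondence at level a and does not compute homology classes. The function name `has_monotone_path` is hypothetical.

```python
def has_monotone_path(source, target, a):
    """Check for an up/right path through cells with dissimilarity <= a.

    Rows index samples of the source signal and columns index samples of
    the target signal, matching the axes described for Fig. 1c.
    """
    rows, cols = len(source), len(target)
    allowed = [[abs(s - t) <= a for t in target] for s in source]
    reach = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if not allowed[i][j]:
                continue
            if i == 0:
                # A path may start anywhere on the lower boundary.
                reach[i][j] = True
                continue
            up = reach[i - 1][j]
            right = j > 0 and reach[i][j - 1]
            diagonal = j > 0 and reach[i - 1][j - 1]
            reach[i][j] = up or right or diagonal
    return any(reach[rows - 1])

matching = has_monotone_path([0, 1, 0, 1], [0, 1, 0, 1], 0)
blocked = has_monotone_path([0, 1], [0, 0], 0.5)
```

Raising a enlarges the sublevel set, so a pair of signals that is blocked at a small threshold may admit a path at a larger one, mirroring the filtration by a.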
In the case of two-dimensional image correspondence, there are a few differences. First, the domain of an image is now the unit square, D_1 = D_2 = [0, 1] × [0, 1]. We would then expect the space X from "Illustration in the Continuous Case" to have dimension 4. But by the epipolar condition that corresponding points have the same y-coordinate, we instead take

X = \{ (p, q) \in D_1 \times D_2 : p \text{ and } q \text{ have the same } y\text{-coordinate} \},

which is three-dimensional. The preimage of the boundary ∂D_1 is now a cylinder, A = π^{-1}(∂D_1) = S^1 × [0, 1], where π(p, q) = p, and S^1 is identified with the boundary of the square. The dimensions of the persistent homology groups are moved up by one,

[T]_{a,b} \in H_2^{a,b}(X, A), \qquad \delta[T]_{a,b} \in H_1^{a,b}(A).    (8)

We have found that the second class δ[T]_{a,b} is favorable in practice, for one thing because homology in dimension one is a smaller computation. In examples of correspondence, we will be interested in the persistence diagram of the filtered vector space containing this class, which is

\delta[T] \in \ker\big( H_1^{a}(A) \to H_1^{a}(X) \big).    (9)

Its elements may be thought of as classes of correspondences on the boundary ∂D_1 which may be extended to the interior, and therefore map to zero in H_1(X). The setup of each of our examples in "Practical Examples" is to present a correspondence problem, exhibit the persistence diagram associated to the filtered vector space on the right side of (9), and show that features of the diagram, i.e. long bars, correspond to pertinent features of the image pair.

Definition of the Filtered Complex
The preceding section shows how persistent homology classes can in principle be used to classify image correspondences up to continuous deformation, provided the relative persistent homology groups H_i^{a,b}(X, A) can be computed in practice. In this section, we construct a simplicial complex, again denoted X, representing the space X of the preceding section, together with a filtration function f : X → ℝ_{≥0} in place of d. We also define a simplicial complex K associated to a triangulation of the base D_1, and a subcomplex A ⊂ X representing the subspace A of the preceding section, which is also filtered by f.
Choosing a simplicial complex to represent a space is an interesting problem in general. The motivation for the construction of this paper is that the complex is fibered over the base, with a map of complexes π : X → K representing the projection of the space X onto D_1, so that a correspondence T determines a section s_T : K → X. It is therefore possible to define classes analogous to (7) and (8). A byproduct of doing this is that the complex X does not come from a triangulation of the space X, nor is it embeddable in ℝ^3.
To do this, begin with a triangulation p : |K| → D_1, and let ∂K ⊂ K denote the boundary, as in "Image Correspondences". The underlying complex of X is given as follows:
1. Choose some natural numbers N_i for every vertex i ∈ K_0, and define the set of 0-simplices as X_0 = {(i, j) : i ∈ K_0, 1 ≤ j ≤ N_i}.
2. For every 2-simplex (i_0, i_1, i_2) of K and every choice of j-values j_0, j_1, j_2, add the 2-simplex with vertices (i_0, j_0), (i_1, j_1), (i_2, j_2). These will be called horizontal faces.
3. Add a 3-simplex whenever it includes two horizontal faces differing only in one coordinate by a j-value of one.
4. Include all the faces of every simplex added thus far, making X a legitimate simplicial complex. The 2-simplices that have been added as a result will only contain two distinct i-values, and will be called vertical faces.
The effect of adding the 3-simplices is to "fill in" the space between correspondences. Notice that there is an obvious surjective map π : X → K of complexes that forgets the j-values. Let A = π^{-1}(∂K) be the subcomplex of X whose i-values lie in the boundary ∂K ⊂ K. We next define a filtration on this complex. For every horizontal 2-simplex Δ, define

f(\Delta) = \min d\big( (p_0, p_1, p_2), (q_0, q_1, q_2) \big),    (10)

where d is the choice of distance measure from "Image Correspondences", p_0, p_1, p_2 are the source points of the underlying triangle of K, and the minimum is taken over the target points q_0, q_1, q_2 represented by the three vertices of Δ. On every 3-simplex, we define the value of f to be the maximum of the values on the two horizontal 2-simplices that are its faces. For every remaining simplex, the value is inductively defined as the minimum over all simplices of which it is a face, so that f(Δ′) ≤ f(Δ) whenever Δ′ is a face of Δ. We also obtain a filtration on A by restriction. As described in "Illustration in the Continuous Case", the persistent homology groups we are interested in are the persistence diagrams for the kernel of the map induced by the inclusion,

H_1^{a}(A) \to H_1^{a}(X).    (11)

To generate a persistence diagram in JavaPlex, we will make use of a workaround introduced in [3], which was used to study the image of (11) in the case where A is the ideal Klein bottle, and X is a filtered complex representing a dataset of natural images. In this setup, we select a parameter t_0 > 0, and define X_{t_0} to have the same underlying complex as X, but where the persistence values of all interior simplices Δ ∈ X − A (which includes all horizontal simplices) are shifted so that they begin at t_0. This encodes the map A → X into a single complex. We have found this approach to be sufficient for our purposes, though in future applications we expect to study the kernel in (9) directly, for instance using persistence for kernels and images in Dionysus [7].
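The combinatorics of the construction can be sketched for a very small base complex. The code below is an illustration under stated assumptions: the j-values at vertex i are taken to run from 1 to N_i, one horizontal 2-simplex is created for each triangle of K and each choice of j-values, and each 3-simplex is given the larger of the values of its two horizontal faces so that the filtration is monotone under taking faces. The names `build_complex` and `f_horizontal` are hypothetical.

```python
from itertools import product

def build_complex(triangles, N, f_horizontal):
    """Enumerate vertices, horizontal 2-simplices, and 3-simplices.

    triangles: list of vertex triples (i0, i1, i2) of the base complex K.
    N: dict mapping each base vertex i to its number of j-values N_i.
    f_horizontal: function assigning a value to a horizontal 2-simplex.
    """
    vertices = [(i, j) for i in sorted(N) for j in range(1, N[i] + 1)]
    horizontal = {}
    for tri in triangles:
        for js in product(*(range(1, N[i] + 1) for i in tri)):
            simplex = tuple(zip(tri, js))
            horizontal[simplex] = f_horizontal(simplex)
    solids = {}
    for simplex, value in horizontal.items():
        for a, (i, j) in enumerate(simplex):
            if j + 1 <= N[i]:
                # Neighbouring horizontal face: same triangle, with one
                # j-value raised by one.
                upper = simplex[:a] + ((i, j + 1),) + simplex[a + 1:]
                solid = simplex + ((i, j + 1),)
                solids[solid] = max(value, horizontal[upper])
    return vertices, horizontal, solids

# One base triangle with two candidate positions per vertex; the toy
# filtration value is the sum of the j-values.
vertices, horizontal, solids = build_complex(
    [(0, 1, 2)], {0: 2, 1: 2, 2: 2},
    lambda simplex: sum(j for _, j in simplex),
)
```

For this single triangle the sketch produces 6 vertices, 8 horizontal faces, and 12 three-dimensional simplices, which shows how quickly the complex grows with the number of candidate positions.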
We now describe how to interpret the persistence diagram of X_{t_0} in terms of image correspondences. For each image pair, the persistence diagram will show the following types of bars:
1. Multiple short bars: these may be disregarded as noise.
2. Multiple long bars which begin to the right of the chosen offset parameter t_0: these represent partial solutions in some subregion in the interior that do not extend to the boundary values.
3. Long bars with left endpoint slightly greater than zero and right endpoint slightly greater than t_0: these represent elements of the kernel of (11), which come from correspondences of the entire source image. In other words, they represent true solutions to the correspondence problem.
4. Even longer bars whose right endpoint is significantly greater than t_0: these represent correspondences near the periphery of the source image that do not extend to the entire image, and so are not in the kernel of (11). In other words, they are partial solutions which solve the correspondence problem near the boundary of the image, but which do not match on some regions in the interior.
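The four cases above can be turned into a simple rule of thumb. The following sketch is only one possible reading: the thresholds `min_length` (what counts as a short bar) and `slack` (what counts as ending slightly after t_0) are hypothetical parameters that the text does not specify, and they would need tuning for each image pair.

```python
def classify_bar(birth, death, t0, min_length, slack):
    """Assign a bar of the diagram of X_{t0} to one of the four cases."""
    if death - birth < min_length:
        return "noise"
    if birth >= t0:
        # Born only after the interior simplices appear.
        return "interior partial solution"
    if death < t0:
        # Not covered by the four cases in the text.
        return "unclassified"
    if death <= t0 + slack:
        # Killed shortly after the interior is added: in the kernel.
        return "global correspondence"
    return "boundary-only correspondence"

labels = [
    classify_bar(birth, death, t0=100, min_length=10, slack=5)
    for birth, death in [(3, 5), (120, 180), (1, 103), (1, float("inf")), (2, 50)]
]
```

An automatic rule like this is only a convenience for reading the diagram; the interpretation in the examples still relies on inspecting which bars are long relative to the noise.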

Pipeline
Here we give an explicit description of the complex X_{t_0} that determines classes of correspondence between two images. We assume that we are given two images I_1 and I_2, as indicated in Fig. 3a. The construction is as follows:
1. Let L be a collection of evenly-distributed landmark points in D_1, such as a hexagonal lattice (Fig. 3b).
2. Associated with each landmark point p ∈ L, we have a collection Q(p) of possible images of that landmark point in the domain of I_2, which are the possible q-values from the last section. The set Q(p) could be infinite or finite, and is determined by some prior knowledge about the camera placement or other initial pre-processing. In our example, we have two cameras that are horizontally aligned, so we restrict Q(p) to a horizontal interval, which we discretize to obtain a finite complex, as shown in Fig. 3c. The set Q(p) partially characterizes the "niceness" of the correspondence map that we seek, by restricting the plausible locations where p could land under a "nice" mapping T. The 0-simplices of the complex X are the union ⋃_{p∈L} Q(p).
3. Build a Delaunay triangulation of L. For each triangle T = (x, y, z) in the triangulation, and for every triangle T′ of the form (u, v, w), with u ∈ Q(x), v ∈ Q(y), and w ∈ Q(z), we do the following: if the shape of T′ is "similar" to that of T, then add a 2-simplex to X (by "similar", we mean for example that the lengths of the perimeters of T and T′ are not too different, and that their orientations are the same). Its persistence value is d(T, T′), where d(⋅, ⋅) is the distance measure in Eq. (4), or any similar counterpart, such as earth mover's distance. We have done this instead of the minimum taken in Eq. (10) only in the interest of speed, and because in our examples the values do not vary much in that domain, making this an acceptable approximation. In more sensitive applications we would expect to produce each value by solving an actual optimization problem in a highly localized domain. This also adds the component 1- and 0-simplices, whose persistence values are not defined yet. This process is shown in Fig. 3d.
4. Set the persistence values of the lower-dimensional simplices in the natural way: each 1-simplex has a persistence value equal to the minimum persistence value among all 2-simplices containing it, and each 0-simplex has a persistence value equal to the minimum persistence value among all 1-simplices containing it.
5. Choose a value t_0 > 0 which is somewhere between the expected left and right endpoints of the important bars in the persistence diagram. For each simplex Δ (of any dimension), if Δ contains an interior vertex, add t_0 to its persistence value. Call the complex with the new persistence values X_{t_0}.

Fig. 3 The construction of our proposed complex. The source and target images are slightly skewed from one another, to suggest different camera angles of the same scene. The middle of the five triangles in the target image of (d) will have the lowest value of d(T, T′)
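Steps 4 and 5 of the pipeline can be sketched in a few lines. The code below assumes the 2-simplices and their persistence values have already been computed and are passed in as a dictionary. The function name `assign_values` and the encoding of vertices as (landmark, candidate index) pairs are assumptions made for this illustration.

```python
from itertools import combinations

def assign_values(triangle_values, interior_vertices, t0):
    """Propagate values to faces, then shift simplices touching the interior.

    triangle_values maps each 2-simplex (a triple of vertices) to its value.
    interior_vertices is the set of vertices lying over interior landmarks.
    """
    values = {}
    for triangle, value in triangle_values.items():
        triangle = tuple(sorted(triangle))
        # Each face takes the minimum over the 2-simplices containing it.
        # For vertices this equals the minimum over the containing edges,
        # since every edge here is a face of some 2-simplex.
        for size in (3, 2, 1):
            for face in combinations(triangle, size):
                values[face] = min(value, values.get(face, float("inf")))
    return {
        simplex: (value + t0
                  if any(v in interior_vertices for v in simplex)
                  else value)
        for simplex, value in values.items()
    }

# Two triangles sharing the edge between ("y", 0) and ("z", 0); the
# vertex ("w", 0) is taken to lie over an interior landmark.
shifted = assign_values(
    {
        (("x", 0), ("y", 0), ("z", 0)): 5.0,
        (("y", 0), ("z", 0), ("w", 0)): 2.0,
    },
    interior_vertices={("w", 0)},
    t0=10.0,
)
```

Because every simplex containing a shifted face also contains the same interior vertex, adding t_0 in this way keeps the values monotone under taking faces.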

Practical Examples
In this section we present three example applications, following the description of the pipeline from "Pipeline". Each example contains an image pair and a description of the correspondence problem, technical information such as the choice of the dissimilarity function and the value of the t_0 parameter, and the persistence diagram for the complex X_{t_0} described at the end of "Definition of the Filtered Complex".

Example: Identical Black Discs
Our first example is primarily a conceptual warmup which illustrates some of the interesting features in Example 4.3. We choose our source and target to be the two identical opaque black discs shown in Fig. 4. The source image is triangulated by a complex K with 100 equilateral triangles. Our distance function is the L^2-distance from (4). We remove all 2-simplices of X which are either orientation reversed, or are sheared to a width of more than double that of the base triangle, by setting the persistence score to infinity. The t_0 parameter is chosen to be 4.6 × 10^4. In place of the desired minimum in Eq. (5), we take a rough approximation of the minimum in (10) by sampling only a single value, for speed, as described in "Pipeline".
The persistence diagram is shown in Fig. 5. The important information is that there are three long bars, shown bolder in the picture, beginning before t_0. Two of these continue well beyond t_0, while the lower one stops almost immediately after it. The bar that stops near t_0 represents correspondences which correctly match points across the entire image. The two longer bars correspond to two types of correspondence of the boundary which cannot be extended to the entire picture, represented by those which carry the boundary of the source entirely into the white space to the left of the black disc in the target, and those that are entirely on the right.

Example: Dot Mesh
The second correspondence problem is to find a mapping that preserves vertical coordinates between the point clouds shown in Fig. 6a, b. Although not remotely apparent to the naked eye, there are actually two correspondences between the two, as illustrated in Fig. 6c-e. The two triangulations show that there are essentially two different ways to map the source points into the target.
Our distance measure is defined in terms of a maximum; for speed, we have simply replaced the maximum by the value at a single sample, as we did for the circle example. The 2-simplices in X whose width is more than double that of the associated triangles in the domain D_1 of the source image are effectively removed by setting their persistence score to infinity, as are those triangles in which the orientation has been reversed. The t_0 parameter is set to 180. Figure 7 shows the complexes X_a for a few choices of a. Notice that the interior triangles begin appearing in Fig. 7c, once the t_0 parameter is passed. The persistence diagram in Fig. 8 shows two long bars, which detect the two possible types of correspondence shown in Figs. 6d, e.

Fig. 5 The JavaPlex barcode diagram for the images of black discs in Fig. 4

Example: Pennies on a Tablecloth
Next consider two real images of a penny sitting on a tablecloth with an interesting pattern shown in Fig. 9. We began by quantizing color values using a filter, and then used a Wasserstein distance function. We eliminated highly skewed triangles using the same criteria from Examples 4.1 and 4.2. The parameter is t 0 = 2.6 × 10 4 .
The persistence diagram is shown in Fig. 10. We have removed all bars of length < 30 for visual purposes. The first Betti number output appears similar to the first example, which is to be expected. However, beyond the extra noise due to the fact that we have used a picture from real life, this image has additional features due to the repeating pattern of the tablecloth:
1. We now have interesting Betti zero features (connected components of X) corresponding to the different translations of the tablecloth pattern. They represent the fact that the boundary of the source image can be matched to three different positions on the target.
2. The Betti one features are the same as in Example 4.1, but for a different reason. This time every correspondence of the boundary encircles the penny, but there are three different classes of them due to the pattern. If the left and right borders of the frame in the source are moved in even slightly, so that the border is in the black stripe region at the same height as the source penny, the two additional Betti one bars will disappear. This is because the penny in the target will raise the persistence value even for boundary correspondences.

A Deformation Invariance Result
We now define the classes associated to a correspondence described in "Illustration in the Continuous Case", and prove in Theorem 1 that they are invariant under certain smooth deformations. The existence of a theorem such as this one is the reason the discussion in each example in "Practical Examples" is valid, as the examples implicitly describe classes of correspondences which are equivalent up to homotopy. Let us call a simplicial correspondence φ ∈ Γ(K) an a-correspondence if E(φ) < a, and denote the set of these elements by Γ(K, a). Let us also define X_{<a} in the same way as X_a but with strict inequality, and the same for A_{<a}. Assume the dissimilarity function d is continuous as a function of the six vertices of a pair of triangles, so that we may regard Γ(K, a) as an open subset U ⊂ ℝ^N, where N is the number of landmark points, i.e. zero-simplices in K_0.
We will say that φ, φ′ are b-equivalent, and write φ ∼_b φ′, if there exists a continuous function h : [0, 1] → Γ(K, b) satisfying h(0) = φ and h(1) = φ′, so that E(h(s)) < b for all s ∈ [0, 1]. In other words, they are in the same connected component of Γ(K, b) ⊂ ℝ^N, since path connected and connected are the same for open subsets. We will be interested in studying the set Γ(K, a)/∼_b of a-correspondences up to b-equivalence for a ≤ b, the set being empty otherwise. For instance, Γ(K, a)/∼_a is just the discrete set of connected components of the open set U, whereas choosing b > a results in collapsing more of these components. If φ ∈ Γ(K, a), the equivalence class of φ will be written φ̃_{a,b} ∈ Γ(K, a)/∼_b.
For any correspondence φ ∈ Γ(K, a), we have an injective map of complexes s_φ : K → X determined by φ. It is a section of π, meaning π ∘ s_φ = Id, which in particular implies that s_φ(∂K) ⊂ A. By (10), we can see that if φ ∈ Γ(K, a), then s_φ(K) ⊂ X_{<a}.