G2MF-WA: Geometric multi-model fitting with weakly annotated data

In this paper we address the problem of geometric multi-model fitting using a few weakly annotated data points, which has been little studied so far. In weak annotation (WA), most manual annotations are supposed to be correct yet are inevitably mixed with incorrect ones. Such WA data can naturally arise through interaction in various tasks. For example, in the case of homography estimation, one can easily annotate points on the same plane or object with a single label by observing the image. Motivated by this, we propose a novel method that makes full use of WA data to boost multi-model fitting performance. Specifically, a graph for model proposal sampling is first constructed using the WA data, given the prior that WA data annotated with the same weak label has a high probability of belonging to the same model. By incorporating this prior knowledge into the calculation of edge probabilities, vertices (i.e., data points) lying on or near the latent model are likely to be associated and further form a subset or cluster for effective proposal generation. Having generated proposals, α-expansion is used for labeling, and our method in return updates the proposals. This procedure works in an iterative way. Extensive experiments validate our method and show that it produces noticeably better results than state-of-the-art techniques in most cases.


Introduction
Geometric model fitting aims to fit a model to data that contains both inliers and outliers. A well-known example is RANSAC [10], the main idea of which is to generate a number of random model proposals and select the one with the largest inlier set under a given inlier threshold. The geometric multi-model fitting task further assumes that multiple models are embedded in the input data. Multi-model fitting algorithms have to optimize a global solution, rather than greedily maximizing the inliers of a single model as RANSAC does. To evaluate the numerous possible solutions, one common way is to design an energy function [2,3,8,12], such that an approximate solution can be obtained via energy minimization/maximization by balancing the geometric errors (data fidelity) and the regularity of inlier clusters (e.g., smoothness, complexity). Although finding the optimal solution is NP-hard [12], α-expansion [3] provides a powerful alternative that finds solutions with guaranteed approximation bounds over a given set of model proposals. However, the quality of the solution and the convergence largely depend on the quality of the proposals, which greatly influences the overall efficiency and effectiveness.
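To make the propose-and-verify idea of RANSAC concrete, the following is a minimal sketch for fitting a single 2D line, assuming the point-to-line distance as the residual. The function names `fit_line` and `ransac_line` and all parameters are illustrative choices, not part of [10] or of the method proposed here.

```python
import random


def fit_line(p, q):
    """Line a*x + b*y + c = 0 through p and q, scaled so that a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate sample: the two points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points, threshold, num_proposals, seed=0):
    """Keep the random proposal with the largest inlier set (illustrative only)."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(num_proposals):
        p, q = rng.sample(points, 2)  # minimal subset for a line
        model = fit_line(p, q)
        if model is None:
            continue
        a, b, c = model
        # Verification: a point is an inlier if its distance is within the threshold.
        inliers = [i for i, (x, y) in enumerate(points)
                   if abs(a * x + b * y + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

This greedy selection handles one model at a time, which is exactly the limitation that the global energy formulation for multiple models is meant to address.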
Most methods attempt to improve the quality of model proposals by sampling "clean" subsets of data points from the input data. This paper, however, claims that weakly annotated data (WA data), which has received little attention so far in multi-model fitting tasks, can be exploited to improve the quality of the proposals and, in turn, the fitting performance. In Fig. 1, we show an example of weak annotations in the scenario of motion segmentation. One can observe from a pair of two-view images that the four objects in the image move independently (i.e., a two-view motion segmentation problem; only one view is shown here). Due to camera shake, the movement of feature points on each object may also involve camera motion, so the inliers are biased and outlier points on objects are hardly distinguishable from inliers by observation. Nevertheless, the annotator can at least tell that points on the same object either belong to [...]
As the main technical contribution of this paper, we propose to independently construct a proposal sampling graph with only WA data, separate from the adjacency graph used by α-expansion [3]. Inspired by the random cluster model [21], the sampling graph probabilistically forms subsets/clusters, controlled by edge probabilities, from which model proposals are generated. By incorporating the prior assumptions mentioned above into the update procedure of the edge probabilities, high-quality proposals can be generated, leading to appealing fitting performance. Extensive experiments validate our approach and show that it mostly outperforms state-of-the-art methods in terms of accuracy and runtime.

Related Work
In this section, we first review two popular categories of techniques for multi-model fitting in Sec. 2.1 and 2.2, respectively. Then, we focus on efficient proposal generators, which are important to both of the above fitting techniques and closely related to our study.
RANSAC [10] and its variants [4,6,19] belong to a category which aims to estimate the parameters of one model with the greedily largest number of inliers (i.e., maximum consensus). The main philosophy is to iterate the following two steps: (1) generating "good" proposals based on proposal-verification, and (2) refining the proposals by maximum consensus. Since RANSAC is efficient for fitting a single model, many researchers have worked on extending it to multi-model cases [26,27,30]. In [26,27], the standard RANSAC is run sequentially, and the model with maximum consensus in the current round is removed in the next round. On the other hand, the authors of [30] claimed that a parallel fashion is more stable than the sequential fashion in dealing with multiple models. Other common greedy methods are [15,16,24]. The authors of [16] tried to solve the multi-model fitting problem in terms of set coverage. In [15,24], data points lying on/near the same model are considered to share similar preferences (a preference being a vector consisting of proposals sorted according to the residual). This is an important property for grouping points into the same model, which has also been taken into account for edge probability calculation in our work.

Energy-based Methods
In recent years, it has become predominant to solve the fitting problem under optimization frameworks. Energy-minimization-based methods [2,8,12,29] design a global energy function (i.e., the objective function) to evaluate solutions, and the optimal solution is supposed to be the one with the minimum energy value. The energy function can be composed of different terms such as the data fidelity term [12], smoothness term [12], and label term [8]. In [29], the multi-model fitting of geometric structures is formulated as a quadratic program, in which the data fidelity and the similarity between associated data are balanced. Most energy-based methods follow a two-stage strategy: (1) generating a large number of proposals with random subsets of data, and (2) evaluating the quality of each proposal by certain likelihood functions [13]. The proposals with large likelihood values are sampled and used for labeling. For more multi-model fitting methods, interested readers can refer to a survey paper [18].

Proposal Generation
Both of the above categories of methods demand high-quality proposals to decrease the fitting error or speed up convergence. In particular, for a large data set, it is computationally impractical to exhaustively evaluate each possible model proposal with the full data. Importantly, the number of ground-truth models is usually unknown in real-world tasks. Such challenges motivate fitting algorithms to discretize the sampling space using subsets of the data and to generate proposals by fitting each subset. Although the proposals can be updated iteratively in a propose-and-refine fashion, different initializations of the proposals can lead to different convergence results. The generation of high-quality proposals in the early stage, an important problem in computer vision from the general perspective of robust fitting [17], is crucial for improving the final labeling results of both greedy and energy-based methods.
Instead of fully random initialization, many works improve the quality of proposals by utilizing information from inliers [7], certain meta-information (e.g., keypoint matching score) [5,6,25], or a sparsity prior [9]. The main factors that affect the quality of proposals are: (1) the inlier rate of a subset, and (2) the size of a subset. In other words, a large subset with a high inlier rate can lead to a high-quality proposal. The contradiction here is that minimal subsets with high inlier rates may amplify the noise [17], while a large subset with a low inlier rate decreases the efficiency of proposal sampling [17] and leads to an exponential growth of computational cost. Pham et al. [21] alleviated this contradiction by using the Swendsen-Wang method [23] to improve the efficiency of proposal sampling.
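The exponential growth mentioned above can be illustrated with the standard sample-count argument for uniform random sampling: if a fraction ε of the points are inliers, a subset of size s is outlier-free with probability ε^s. The sketch below computes how many random subsets are needed to draw at least one outlier-free subset with a given confidence. The function name and default confidence are illustrative assumptions and are not taken from the cited works.

```python
import math


def required_samples(inlier_rate, subset_size, confidence=0.99):
    """Number of uniform random subsets needed so that, with the given
    confidence, at least one subset contains only inliers."""
    p_clean = inlier_rate ** subset_size  # chance that one subset is outlier-free
    if p_clean <= 0.0:
        raise ValueError("an outlier-free subset is impossible at this inlier rate")
    if p_clean >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))
```

For example, at an inlier rate of 0.5, doubling the subset size from 4 to 8 raises the required number of samples from 72 to more than a thousand, which is why guiding the sampling toward clean subsets matters.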
In this paper, we propose to generate high-quality proposals with WA data, which is a new perspective. Note that a proposal generated from an outlier-free sample is not guaranteed to be consistent with all inliers in practice, which makes the problem challenging even with the aid of weakly annotated data. We elaborate on how to handle this in the next section.

Our Approach
To produce decent multi-model fitting results, our motivation is to effectively generate subsets that yield model proposals with high inlier rates, under the guidance of partially and inaccurately labeled data (i.e., weak annotations). The multi-model fitting problem can be formulated mathematically. Specifically, given an input data set $X = \{x_i\}_{i=1}^{N}$, which contains outliers and weakly annotated data (WA data) $\hat{X} = \{\hat{x}_u\}_{u=1}^{Z}$, multiple unknown models $M = \{m_k\}_{k=1}^{K}$ are embedded and need to be estimated ($m_1$ is the outlier model; $K$ is also unknown). Each $x_i$ is assigned to a certain $m_k$; the assignment procedure is referred to as labeling, and its result is denoted by $L = \{l_i\}_{i=1}^{N}$, where each $l_i$ indicates the model in $M$ to which $x_i$ is assigned. From the perspective of energy minimization, this can be generally solved by minimizing a global energy function $E(X, M, L)$ (Eq. 1), which is composed of a data term $D$, a smoothness term $S$, and a complexity term $O$.
The data term $D$ is usually a distance or error metric that evaluates the data fidelity under the labeling result. In this paper, residuals in the form of the Sampson distance [11] are used. A larger $D$ indicates a larger error of the assigned labels. The smoothness term $S$ is based on the prior assumption that spatially close neighbors have the same label with a high probability. The neighbors are defined by a neighborhood system (e.g., Delaunay triangulation), with weights on edges indicating how likely two data points are to come from the same model. A larger $S$ indicates worse local smoothness. The complexity term $O$ penalizes the complexity (e.g., number of models) of the whole optimization task. Exploring solutions by minimizing $E$ is effective and has been validated in many works [12,21]. We aim to explore the solution more effectively and efficiently with the help of the WA data, which can be obtained interactively with manual operations, based on the natural fact that feature points belonging to different models are mostly visually distinguishable (e.g., points on images belonging to different objects, structures, etc.).
Fig. 2 The proposal sampling graph. WA data points $\hat{x}_i$ are the vertices and are divided into two independent subsets (blue or yellow here), according to the connectivity of the dotted edges $d_{ij}$. $\{d_{ij}\}$ are the binary "bonds" of the random cluster model [21]. Here, $d_{25}$, $d_{45}$, and $d_{47}$ are all equal to zero and the others are one. $w_{ij}$ denotes the edge probability between the $i$-th and $j$-th vertices, which probabilistically determines the values of $\{d_{ij}\}$. Clusters induced by $\{d_{ij}\}$ form the subsets for model proposals.
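As an illustration of how the three terms of the energy interact, the following sketch evaluates a generic energy of this kind for a given labeling. It assumes a Potts-style smoothness term with a constant per-edge cost and a complexity term that counts distinct labels; the weights `smooth_cost` and `label_cost`, the residual table layout, and the function name are assumptions for illustration, not the exact form of Eq. 1.

```python
def energy(residuals, labels, edges, smooth_cost=1.0, label_cost=1.0):
    """Generic data + smoothness + complexity energy (illustrative form).

    residuals[i][k]: residual of point i with respect to model k
    labels[i]:       index of the model assigned to point i
    edges:           pairs (i, j) of neighboring points
    """
    # Data term D: residual of every point under its assigned model.
    data = sum(residuals[i][labels[i]] for i in range(len(labels)))
    # Smoothness term S: constant penalty for each edge whose endpoints disagree.
    smooth = smooth_cost * sum(1 for i, j in edges if labels[i] != labels[j])
    # Complexity term O: penalty proportional to the number of models in use.
    complexity = label_cost * len(set(labels))
    return data + smooth + complexity
```

A labeling that explains the data with fewer models and fewer label changes between neighbors receives a lower energy, which is the trade-off the minimization balances.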

Proposal Sampling Graph with WA Data
The solution quality of $\min_{M, L} E(X, M, L)$ is closely related to the quality of the proposals generated from data subsets sampled from $X$. We build a sampling graph $\hat{G} = (v, e)$ from $\hat{X}$, separate from the adjacency graph $G$ built from $X$. For clarity, we illustrate $\hat{G} = (v, e)$ in Fig. 2 under a specific neighbor system. In our implementation, Delaunay triangulation is adopted for constructing the neighbor system, as suggested by [8]. Each $d_{ij}$ can be treated as a "switch" that turns the connection between two vertices on or off, with the probability determined by the corresponding $w_{ij}$. A given sample of $\{d_{ij}\}$ corresponds to a clustering of $\hat{X}$, and each subset is used to calculate a model proposal $\theta_g$ depending on the task setting. For example, in the multi-homography detection task, a homography proposal can be computed by the direct linear transformation (DLT) method [11] as long as the number of points in a subset is four or more.
$w_{ij}$ indicates how likely a pair of points $(\hat{x}_i, \hat{x}_j)$ is to belong to the same model. In the unsupervised situation, a common idea is to assume that the preferences of inliers from the same model over a set of so-far-generated proposals are correlated [5,21]. Specifically, let $H = \{\theta_g\}_{g=1}^{G}$ be the set of proposals generated during the iterations. The residuals of $\hat{x}_u \in \hat{X}$ with respect to each proposal $\theta_g$ in $H$ form a vector $r^{\hat{x}_u} = [r^{\hat{x}_u}_1, \dots, r^{\hat{x}_u}_G]$ (Eq. 2), which can be viewed as a preference vector quantified by residuals. By sorting the elements $r^{\hat{x}_u}_i$ in ascending order and leaving out the elements after the $h$-th place, we obtain the preference permutation $p^{\hat{x}_u}$ (Eq. 3), where each element of $p^{\hat{x}_u}$ is a proposal in $H$. Then the edge probability between two vertices $\hat{x}_u$ and $\hat{x}_v$ can be updated online by the correlation [21] between $p^{\hat{x}_u}$ and $p^{\hat{x}_v}$ (Eq. 4).
The main drawback of Eq. 4 is that $w_{ij}$ becomes reliable only as $G$ grows. At the beginning of any iterative algorithm, $w_{ij}$ can have low confidence, which hinders the whole algorithm from converging to the correct solution. We propose to utilize the prior knowledge brought by the WA data to make $w_{ij}$ more confident. That is, WA data points with the same weak label have a high probability of being assigned to the same model, and vice versa. By incorporating this property, Eq. 4 can be reformulated as a weighted function (Eq. 5), in which the weight is given by a prior on whether the two points share a weak label. The prior distribution could be further learned by more complex distribution models such as a Gaussian mixture model [22]. We found that an empirically predetermined Bernoulli distribution works well in our experiments.
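The preference-based edge probability and its combination with the weak-label prior can be sketched as follows. The sketch assumes that the correlation is measured as the fraction of proposals shared by two top-h preference lists, and it blends this value with a two-valued prior through a convex combination. The blending rule and the values `p_same`, `p_diff`, and `mix` are hypothetical placeholders for illustration; they are not the weighting defined in Eq. 5.

```python
def top_h_preference(residuals, h):
    """Indices of the h proposals with the smallest residuals, best first."""
    order = sorted(range(len(residuals)), key=lambda g: residuals[g])
    return order[:h]


def preference_correlation(pref_u, pref_v):
    """Fraction of proposals shared by two top-h preference lists."""
    return len(set(pref_u) & set(pref_v)) / len(pref_u)


def weighted_edge_probability(corr, same_weak_label,
                              p_same=0.9, p_diff=0.1, mix=0.5):
    """Blend the data-driven correlation with a weak-label prior.

    Hypothetical rule: points sharing a weak label get prior p_same,
    others get p_diff; mix controls how much the prior is trusted.
    """
    prior = p_same if same_weak_label else p_diff
    return (1.0 - mix) * corr + mix * prior
```

With few proposals the correlation is noisy, so a prior of this kind keeps edges between same-label points likely to be switched on from the first iterations.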

Proposal Sampling and Labeling with WA Data
With the proposal sampling graph introduced, the update of proposals and the update of labeling results can be realized by alternately sampling $d = \{d_{ij}\}$ and optimizing $L$ under the random cluster framework [21], which solves $\min_{d, L} E$ instead of $\min_{M, L} E$. Notice that in our method the sampling of $d$ and the optimization of $L$ are conducted on two different graphical models, $\hat{G}$ and $G$, respectively. $G$ is built from $X$ for α-expansion [3], as illustrated in Fig. 3. The two steps can be summarized as follows:
Step (1) Sampling $d$ given the current labels $\hat{l}$ of the WA data:
• $P(d_{ij} = 1 \mid \hat{l}_i = \hat{l}_j) := w_{ij}$
• $P(d_{ij} = 1 \mid \hat{l}_i \neq \hat{l}_j) := 0$
Step (2) $P(L \mid d)$. Optimizing $L$ by minimizing Eq. 1 with the current $d$:
• A proposal is generated according to the sampled $d$ on $\hat{G}$.
• $L$ is updated via α-expansion by taking the new proposal into account.
The complexity term $O$ in Eq. 1 is not involved in α-expansion, as our algorithm does not follow the two-stage strategy [12,14,29] of generating a huge number of random proposals and then conducting labeling based on them. In our method, one proposal is generated at a time and probabilistically included or excluded under the framework of simulated annealing, which does not suffer from the complexity problem. The smoothness term $S$ in Eq. 1 follows the Potts model [3] and is defined as $S = \sum_{(i,j) \in G} c_{ij} s_{ij}$, where $s_{ij} = 1$ if $l_i \neq l_j$ and $s_{ij} = 0$ otherwise. The smoothness prior $c_{ij}$ can be defined with a spatial prior, since closer points in the neighbor system are more likely to fit the same model. For simplicity, we set $c_{ij}$ to a fixed constant that only penalizes the discontinuity on each edge.
Fig. 3 An example of constructing a neighbor system with Delaunay triangulation. Each point $x_i$ is in $X$. In the case of high-dimensional data (e.g., a feature-point pair is 4D in homography detection), the concatenated data is projected onto the first two principal axes extracted by PCA, and the neighbor system is constructed in this 2D plane. Distant edges are removed.
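Step (1) above can be sketched as sampling the binary bonds on the edges of the sampling graph and then reading off the connected components as clusters. The data layout (an edge list, a dictionary of edge probabilities, and a list of current labels) and the function names are assumptions made for this illustration.

```python
import random


def sample_bonds(edges, weights, current_labels, rng):
    """Sample d_ij: switched on with probability w_ij when both endpoints
    currently share a label, and always off otherwise."""
    bonds = {}
    for i, j in edges:
        same = current_labels[i] == current_labels[j]
        bonds[(i, j)] = 1 if same and rng.random() < weights[(i, j)] else 0
    return bonds


def clusters_from_bonds(num_vertices, bonds):
    """Connected components induced by the switched-on bonds (union-find)."""
    parent = list(range(num_vertices))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for (i, j), on in bonds.items():
        if on:
            parent[find(i)] = find(j)
    groups = {}
    for v in range(num_vertices):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())
```

Each returned cluster that is large enough for the task (for example, four or more matches for a homography) can then be used to fit one model proposal.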
Step (1) generates clusters of WA data, with each cluster indicating a model proposal. Step (2) uses the proposals to perform labeling, and the labeling result in return encourages Step (1) to connect the WA points that hold the same label. Obviously, this is a chicken-and-egg problem, as the calculation in each of the two steps depends on the result of the other: good labeling improves clustering and vice versa. An iterative algorithm is a realistic way out of this situation. We modify the simulated annealing [21] to further involve an optional subjective prior limitation by introducing a variable $n_{flag} \in \{0, 1\}$ that indicates whether the following assumption holds true ($n_{flag} = 1$) or not ($n_{flag} = 0$): the number of weak labels equals the number of embedded models (outlier model excluded). The whole procedure is listed in Alg. 1.
Alg. 1 (excerpt) [...] Add $\theta_g$ to $H$. $T := 0.99T$ and repeat from step 2 until $T \approx 0$. [...]
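The iterative scheme can be pictured with a generic simulated annealing skeleton. Only the geometric cooling factor of 0.99 is taken from the Alg. 1 excerpt; the Metropolis acceptance rule, the stopping temperature `t_min`, and the callback interface (`propose`, `energy`) are assumptions of this sketch and may differ from the actual procedure.

```python
import math
import random


def anneal(initial_state, propose, energy,
           t_init=1.0, t_min=1e-3, cooling=0.99, seed=0):
    """Generic simulated annealing loop with geometric cooling."""
    rng = random.Random(seed)
    state, e = initial_state, energy(initial_state)
    best_state, best_e = state, e
    t = t_init
    while t > t_min:
        candidate = propose(state, rng)
        e_new = energy(candidate)
        # Metropolis rule (assumed): always accept improvements, and accept
        # a worse candidate with a probability that shrinks as t decreases.
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            state, e = candidate, e_new
            if e < best_e:
                best_state, best_e = state, e
        t *= cooling  # T := 0.99 T
    return best_state, best_e
```

In the setting of this paper, a state would hold the current proposal set and labeling, `propose` would sample bonds and fit a new proposal, and `energy` would evaluate the labeling after α-expansion.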
To justify the improvement in convergence, we compare our method with SA-RCM [21] in Fig. 4. As simulated annealing is a meta-heuristic, we run each method 100 times with different random seeds. We observe from Fig. 4 that G2MF-WA (our method) converges faster and with lower segmentation errors than SA-RCM in most of the trials. G2MF-WA generally converges in less than 0.5 seconds with the aid of WA data, while SA-RCM has still not converged within 1.5 seconds in some trials.
Fig. 4 Convergence analysis on the dataset hartley from [28]: (a) SA-RCM [21]; (b) G2MF-WA. Dotted lines denote the convergence curves with respect to different random seeds.

Experimental Results
We first explain the experimental setup, including the compared techniques and the involved parameters. Then we introduce three applications of our approach: multi-homography detection, two-view motion segmentation, and planar augmented reality.

Experimental setup
We compared our method (denoted as G2MF-WA) with two state-of-the-art methods, PEARL [12] and SA-RCM [21]. Note that the comparisons are not under the same problem setting, as we utilize additional weakly annotated data, which can be easily obtained. The purpose of the comparisons is to demonstrate that the weak annotations can help achieve better fitting results. Older methods [14,20,29], which have been shown to be less accurate in [21], are not included. The parameters of all methods are carefully tuned based on the authors' implementations for best performance. The settings of each method are explained as follows.
PEARL [8]. As a typical two-stage method, PEARL generates a large proposal set $\Theta$ at once, followed by an energy (Eq. 1) minimization procedure that includes the complexity term. This is unlike SA-RCM and G2MF-WA, which expand the proposal set sequentially from an empty set. The complexity term is formulated using label costs: it counts the number of unique labels in $L$ and penalizes complex solutions. Minimization is realized by running α-expansion iteratively, and the optimal labeling after each iteration corresponds to an optimal subset of the proposals in $\Theta$. This subset is then refined with the labeling result, and $\Theta$ is replaced with the refined subset. Iterations are repeated until convergence. The number and quality of proposals in the initial $\Theta$ significantly affect the final result. We set $|\Theta| = 1000$ to ensure accuracy. The minimum number of iterations is set to 10 and the maximum to 20, as PEARL often converges within a few iterations.
SA-RCM [21]. To control the convergence procedure in a practical way, the minimum number of iterations is set to 500 and the maximum to 5000. The iteration is terminated when the energy changes only within a small range over iterations. Unlike G2MF-WA, SA-RCM conducts proposal sampling and α-expansion on the same adjacency graph over all data points, which induces more computational cost with Eq. 4 as the sizes of $X$ and $G$ grow.
G2MF-WA has a simulated annealing framework similar to that of SA-RCM. The hyperparameters are shown in Alg. 1, and the number of iterations is set to be the same as for SA-RCM. We generate simulated weak annotations instead of real manual annotations to keep the annotations controllable and to enable large-scale comparisons. The simulated weak annotations consist of two types of data: (1) $N_g$ points are randomly selected from each ground-truth label (except the outlier label); (2) $N_o$ points are selected from the ground-truth outliers and randomly assigned other ground-truth labels. For clarity, we define the following terms based on different weak annotation settings.
• G2MF-WA-A: [...]
The segmentation error [21] based on the ground-truth labeling is used as the evaluation criterion, which can be calculated for all the methods. All the experiments are conducted on an off-the-shelf PC with an Intel i7 CPU (3.6 GHz) and 32 GB RAM.
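The simulation of weak annotations described above can be sketched as follows. The sketch assumes that $N_o$ is the total number of mislabeled outliers (rather than a per-label count) and that the outlier label is 0; both are assumptions, as are the function and parameter names.

```python
import random


def simulate_weak_annotations(gt_labels, n_g, n_o, outlier_label=0, seed=0):
    """Return {point index: weak label} built from ground-truth labels."""
    rng = random.Random(seed)
    model_labels = sorted(set(gt_labels) - {outlier_label})
    weak = {}
    # (1) Correct annotations: n_g random points from each non-outlier label.
    for lab in model_labels:
        idx = [i for i, l in enumerate(gt_labels) if l == lab]
        for i in rng.sample(idx, min(n_g, len(idx))):
            weak[i] = lab
    # (2) Incorrect annotations: n_o outliers given a random non-outlier label.
    outliers = [i for i, l in enumerate(gt_labels) if l == outlier_label]
    if model_labels:
        for i in rng.sample(outliers, min(n_o, len(outliers))):
            weak[i] = rng.choice(model_labels)
    return weak
```

Varying `n_g` and `n_o` gives direct control over how many annotations there are and how noisy they are, which is what makes large-scale comparisons feasible without manual labeling.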

Application 1: multi-homography detection
Given two views of a scene, a number of feature points can be extracted from the two images and matched by feature matching techniques. Matched points can be further related by a 3 × 3 homography matrix if the points lie on the same planar structure. The goal of multi-homography detection is to recover such homography matrices from a set of matches. One homography corresponds to one model, and incorrect matches correspond to the outlier model in our fitting algorithm. The DLT algorithm [1], which requires at least four matches, is employed for model generation, and the residual error is calculated by the Sampson distance. The full H part of the AdelaideRMF dataset [28] is used in this experiment for fairness. Examples and statistical results are shown in Fig. 5 and Tab. 1, respectively. In Tab. 1, colored cells mark the top-3 cells in each row (same dataset, different methods) in terms of median error and average processing time. According to the total number of colored cells in each column, which is summarized in the last row, G2MF-WA-A converges the fastest, while G2MF-WA-B and G2MF-WA-C achieve the lowest segmentation errors. The processing time includes both sampling and optimization time. In Fig. 5, the segmentation errors in parentheses show that the WA data clearly contributes to improving the performance, whether the optional assumption holds true (1st row) or not (2nd row).

Application 2: two-view motion segmentation
Given two views of a scene and feature point matches, the goal of two-view motion segmentation is to simultaneously estimate the motion models, represented by 3 × 3 fundamental matrices, and the labeling. Points in a match are supposed to undergo the same motion (usually on the same object or background). Outliers correspond to incorrect matches. The full F part of the AdelaideRMF dataset [28] is employed for fairness. Examples and statistical results are shown in Fig. 6 and Tab. 2, respectively. The DLT algorithm, which requires at least eight matches, is adopted for model generation, and the residual error is calculated by the Sampson distance. For G2MF-WA-A in Tab. 2, results are unavailable, as it is impossible to sample correct proposals when $N_g = 5$, which is smaller than 8. We can clearly observe from Tab. 2 that G2MF-WA-B achieves the lowest error yet is the least efficient, while G2MF-WA-C is the fastest and is also competitive in achieving low error.
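For reference, the Sampson distance of a match with respect to a fundamental matrix can be computed as in the sketch below, which uses the standard first-order approximation of the geometric error. The plain nested-list matrix representation and the function name are illustrative choices, not details of our implementation.

```python
def sampson_distance(F, x1, x2):
    """Sampson distance of the match (x1, x2) for fundamental matrix F.

    F is a 3x3 nested list; x1 and x2 are (x, y) coordinates in view 1
    and view 2. The epipolar constraint is x2^T F x1 = 0.
    """
    p1 = (x1[0], x1[1], 1.0)
    p2 = (x2[0], x2[1], 1.0)
    # F x1 is the epipolar line in view 2; F^T x2 is the one in view 1.
    Fp1 = [sum(F[r][c] * p1[c] for c in range(3)) for r in range(3)]
    Ftp2 = [sum(F[r][c] * p2[r] for r in range(3)) for c in range(3)]
    algebraic = sum(p2[r] * Fp1[r] for r in range(3))
    denom = Fp1[0] ** 2 + Fp1[1] ** 2 + Ftp2[0] ** 2 + Ftp2[1] ** 2
    return algebraic ** 2 / denom if denom > 0.0 else float("inf")
```

As a usage example, for a pure horizontal translation with F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]], matches that keep the same y coordinate have zero distance, and a vertical offset of 0.2 yields a distance of 0.02.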
Unlike [21], which only used part of the benchmark data, we evaluate on the full benchmarks in both applications. Although the difference between the number of ground-truth labels and the number of WA labels is expected to affect the final labeling result, our method can still detect the homography/fundamental matrices accurately (e.g., second row in Fig. 5). One potential reason could be the robustness of the α-expansion algorithm to the initial estimate [12]. Also, although the WA data imposes priors on the edge probabilities between vertices in the sampling graph, which allows the algorithm to generate proposals close to the intent of the annotator, the randomness of proposal generation is still retained to keep the proposals diverse. This could be another factor that contributes to the above finding.

Application 3: planar augmented reality application
We show a real-world application of planar augmented reality in Fig. 7, which requires inserting multiple prepared images into the planar structures in the scene. Here, the visual quality closely relates to the detection accuracy of the planar surfaces in the scene. Our algorithm is designed to improve the accuracy with the help of additional weak annotations. The WA data is obtained from the users in an interactive manner.
Tab. 1 Median results over 100 trials on the multi-homography detection task with the full AdelaideRMF dataset (H part). Darker colors represent lower errors (%) and runtime (in seconds), denoted by fuchsia and cyan, respectively. Top-3 cells in each row are colored.

Conclusion
In this paper, we have presented a multi-model fitting method with the assistance of weakly annotated data. The main contribution is to take advantage of the prior knowledge brought by the weakly annotated data and incorporate it into the calculation of edge probabilities in the proposal sampling graph for effective model proposal generation and subsequent labeling. Extensive experiments demonstrate that our method mostly outperforms the state-of-the-art methods in terms of both accuracy and runtime.
Despite the effectiveness of our method, it still has a few limitations. Since the model proposals are explored heuristically, the fitting performance is likely to depend on random seeding, and the segmentation error might grow when the algorithm gets stuck in a local optimum. One potential way to mitigate these issues is to increase the number of iterations of the simulated annealing algorithm. As future work, we would like to design an interactive annotation interface and embed it within the proposed framework. We also plan to improve the usability of our algorithm by further reducing the user effort.
Tab. 2 Median results over 100 trials on the two-view motion segmentation task with the full AdelaideRMF dataset (F part). Darker colors represent lower errors (%) and runtime (in seconds), denoted by fuchsia and cyan, respectively. Top-3 cells in each row are colored.

Fig. 1 Example of weak annotations in the two-view motion segmentation task (only one view is shown here). (a) Ground-truth labeling. Four objects move independently, and the movement of the camera induces the outliers. (b) Detected feature points, represented by circles, including weakly annotated ones (bi-colored) and unlabeled ones (white). The annotator annotates points on an object with a corresponding bi-colored weak label, since they cannot distinguish them from outliers. Note that the number of weak labels in (b) is not necessarily equal to the number of ground-truth labels in (a). Best viewed in color.
Step (1) $P(d \mid \hat{L})$. Sampling $d$ with the current labeling result of the WA data ($\hat{L} \in L$ on $\hat{G}$):

Fig. 5 Examples of multi-homography detection. The first row (elderhallb) shows the situation when the number of weak labels in (d) equals the number of ground-truth labels (1st column) without considering the outlier label (red circle in (a)), i.e., 3 labels in (d) and 4 labels in (a) including the outlier. The second row (barrsmith) shows the situation when the number of weak labels in (i) differs from the ground truth. In (d) and (i), WA data points are shown. Best viewed in color.