Automatically Selecting Inference Algorithms for Discrete Energy Minimisation
Abstract
Minimisation of discrete energies defined over factors is an important problem in computer vision, and a vast number of MAP inference algorithms have been proposed. Different inference algorithms perform better on factor graph models (GMs) from different underlying problem classes, and in general it is difficult to know which algorithm will yield the lowest energy for a given GM. To mitigate this difficulty, survey papers [1, 2, 3] advise the practitioner on what algorithms perform well on what classes of models. We take the next step forward, and present a technique to automatically select the best inference algorithm for an input GM. We validate our method experimentally on an extended version of the OpenGM2 benchmark [3], containing a diverse set of vision problems. On average, our method selects an inference algorithm yielding labellings with 96 % of variables the same as the best available algorithm.
Keywords
Problem instance · Problem class · Inference algorithm · Stereo matching · Semantic segmentation
1 Introduction
Minimisation of discrete energies defined over factors is an important problem in computer vision and other fields such as bioinformatics, with many algorithms proposed in the literature to solve such problems [3]. These models arise from many different underlying problem classes; in vision, typical examples are stereo matching, semantic segmentation, and texture reconstruction, each of which yields models with very different characteristics, making different choices of minimisation algorithm preferable.
The space of published inference algorithms is vast, with methods ranging from highly specialised to very general. For example, message passing [4] is widely applicable, but takes exponential time for large cliques, and may not converge. Dual-space variants such as TRWS [5] do guarantee convergence, but not necessarily to a global optimum. \(\alpha\)-expansion [6] and graph-cuts [7] are better suited to models with dense connectivity, but require factors to take certain restricted forms, while QPBO [8] only works for binary models and may leave some variables unlabelled. Algorithms solving the Wolfe dual [9, 10, 11] such as [12, 13] are applicable to models with arbitrary factors and labels, but existing implementations for this generic setting tend to run more slowly.
Thus, when developing a new model, it may be difficult to decide what algorithm to use for inference. Selecting a good algorithm for a given model requires extensive expertise about the landscape of existing algorithms and typically involves understanding the operational details of many of them. Moreover, even for an expert who can choose which algorithm is best overall on a particular problem class, it may not be clear which is best for a particular instance—certain problem classes are heterogeneous enough that different instances within them may be best solved by different algorithms (Sect. 3). An alternative solution would be to run many algorithms on each input model and see which one performs best empirically. However, this would be computationally very expensive.
Recently, several studies have appeared that evaluate a number of algorithms on various problems, comparing their performance [1, 2, 3, 14, 15]. These are intended to provide a ‘field guide’ for the practitioner, suggesting which techniques are suited for which models. In this paper, we take the next step forward and propose a technique to automatically select which inference algorithm to run on an input problem instance (Sect. 4). We do so without requiring the user to have any knowledge of the applicability of different inference methods, and without the computational expense of running many algorithms. Thus, our method is particularly suited to the vision practitioner with limited knowledge of inference, but who wishes to apply it to real-world problems.
Our method uses features extracted from the problem instance itself, to select inference algorithms according to two criteria relevant for the practitioner: (1) the fastest algorithm reaching the lowest energy for that instance; or (2) the fastest algorithm delivering a very similar labelling to the lowest energy one (Fig. 1). The features are designed to capture characteristics of the instance that affect algorithm applicability or performance, such as the clique sizes and connectivity structure (Sect. 4.1). We train our selection models without human supervision, based on the results of running many algorithms over a large dataset of training problem instances.
We perform experiments (Sect. 5) on an extended version of the OpenGM2 benchmark [3], containing 344 problem instances drawn from 32 diverse classes (Sect. 2), and consider a pool of 15 inference algorithms drawn from the most prominent approaches (Sect. 3). The results show that on 69 % of problem instances our method selects the best algorithm. On average, the labels of 96 % of variables match that returned by the algorithm achieving the lowest energy. Our automatic selector achieves these results over \(88\times \) faster than the obvious alternative of running all algorithms and retaining the best solution.
1.1 Related Work
MAP inference. MAP inference algorithms can be split into several broad categories. Graph-cuts [7] is very efficient, but restricted to pairwise binary GMs with submodular factors. It can be extended to more general models, such as by the move-making methods \(\alpha\)-expansion and \(\alpha\beta\)-swap [6], wherein a subset of variables change label at each iteration, or by transformations introducing auxiliary variables [16, 17, 18]. Alternatively, inference is naturally formulated as an integer linear program, which can be solved directly and optimally using off-the-shelf polyhedral solvers for small problems [3]. It can also be relaxed to a non-integer linear program (LP), which can be solved faster; however, this requires rounding the solution, which does not always yield the global optimum of the original problem. Message-passing algorithms [4, 19] have each variable/factor iteratively send its neighbours messages encoding its current belief about each neighbour’s min-marginals. Tree-reweighted methods [5, 20] use a message-passing formulation, but actually solve a Lagrangian dual of the LP, and can provide a certificate of optimality where relevant. Other dual-decomposition methods [12, 13, 21] directly solve the Wolfe dual [10, 11] to the LP, but by iteratively finding the MAP state of each clique (or other tractable subgraph) instead of passing messages. Our focus in this paper is not to introduce another inference algorithm, but to consider the meta-problem of learning to select which existing inference algorithm to apply to an input model; as such, we use many of the above algorithms in our framework (Sect. 3).
Inferning. Our work is a form of inferning [22], as it considers interactions between inference and learning. A few such methods use learning to guide the inference process. Unlike the hard-wired algorithms mentioned above, these approaches learn to adapt to the characteristics of a particular problem class. Some operate by pruning the model during inference, by learning classifiers to remove labels from some variables [23, 24], or to remove certain factors from the model [25, 26]. Others learn an optimal sequence of operations to perform during message-passing inference [27]. Our work operates at a higher level than these approaches. Instead of incorporating learning into an algorithm to allow adaptation to a problem class, we instead learn to predict which of a fixed set of hard-wired algorithms is best to apply to a given problem instance.
Surveys on inference. The survey papers [1, 2, 3, 14, 15] evaluate a number of algorithms on various problems, comparing their performance. [1] focuses on stereo matching and considers highly-connected grid models defined on pixels with unary and pairwise factors only. It evaluates three inference algorithms (graph-cuts, TRWS, and belief propagation). [2] considers a wider selection of problems—stereo matching, image reconstruction, photo-montaging, and binary segmentation—but with 4-connectivity only, and applies a wider range of algorithms, adding ICM and \(\alpha\)-expansion to the above. Recently, [3, 14] substantially widened the scope of such analysis, by also considering models with higher-order potentials, regular graphs with denser connectivity, models based on superpixels with a smaller number of variables, and partitioning problems without unary terms. They compare the performance of many different types of algorithms on these models, including some specialised to particular problem classes. These surveys help to understand the space of existing algorithms and provide a guide to which algorithms are suited for which models. Our work takes a natural step forward, with a technique to automatically select the best algorithm to run on an input problem instance.
Automatic algorithm selection. Automatic algorithm selection was pioneered by [28], which considered algorithms for quadrature and process scheduling. More recently, machine learning techniques have been used to select algorithms for constraint-satisfaction [29], and other combinatorial search problems [30]. However, none of these works consider selecting MAP inference algorithms.
2 Dataset of Models
OpenGM2 [3]. The OpenGM2 dataset contains GMs drawn from 28 problem classes, including pairwise and higherorder models from computer vision and bioinformatics; it is the largest dataset of GMs currently available. We briefly summarize here the main kinds of problems and refer to [3] for details.

low-level vision problems such as stereo matching [2], inpainting [31, 32], and montaging [2]. These are all locally-connected graphs with variables corresponding to pixels, and with pairwise factors only; label counts vary widely between classes, from 2 to 256.

small semantic segmentation problems with up to eight classes, with labels corresponding to surface types [33] and geometric descriptions [34]. These are irregular, sparse graphs over superpixels; [33] uses pairwise factors only, while [34] has general third-order terms.

partitioning (unsupervised segmentation by clustering) based on patch similarity, operating on superpixels and with as many labels as variables, in both 2D [35, 36, 37] and 3D [38]. Potts or generalised Potts factors are used in all cases; [35] has very large cliques with up to 300 variables, while the other classes are pairwise or third-order, with just one class having dense connectivity.

two problem classes from bioinformatics: protein side-chain prediction [39], and protein folding [40]; both are defined over irregular graphs, with [39] having only two labels but general third-order factors, while [40] has up to 503 labels and dense pairwise connectivity.
Semantic segmentation with context [23]. Semantic segmentation on the MSRC-21 dataset [41] with relative position factors. Variables correspond to superpixels and labels to 21 object/background classes (e.g. car, road, sky). Unary factors are given by appearance classifiers on features of a superpixel, while pairwise factors encode relative location in the image, to favour labellings showing classes in the expected spatial relation to one another (e.g. sky above road). The model is fully connected, i.e. there is a pairwise factor between every two superpixels in the image.
Joint localisation [23]. Joint object localisation across images on the PASCAL VOC 2007 dataset [42]. The set of images containing a certain object class form a problem instance. Variables correspond to images and labels to object proposals [43] in the images. Unary factors are given by the objectness probability of a proposal [43], while pairwise factors measure the appearance similarity between two proposals in different images. Inference on this model will select one proposal per image, so that they are likely to contain objects and to be visually similar over the images.
Line fitting [44]. Fitting of multiple lines to a set of points in \(\mathbb {R}^2\). This is an alternative to RANSAC [45] for fitting an unknown number of geometric models to a dataset. Variables correspond to points and labels to candidate lines from a fixed pool (sampled from the point set in a preprocessing stage). Unary factors favour labelling a point with a nearby line, while pairwise factors promote local smoothness of the labelling (i.e. nearby points should take the same label).
Texture restoration [46]. Binary texture restoration with pattern potentials. Given a binary image corrupted by noise, the task is to reconstruct the original noise-free image, while preserving the underlying texture regularity. Variables correspond to pixels and labels to ‘on’ or ‘off’. Unary factors penalise deviations from the input noisy image, while pairwise factors prefer pixels at certain offsets taking certain pairs of values (learned on a training image showing a noise-free texture). Higher-order factors reward image patches for taking joint labellings which occur frequently in the training image (patterns). The pairwise and higher-order factors capture low- and high-order texture properties, respectively.
Data diversity. From each problem class we take all instances, up to a maximum of 20. This results in a diverse dataset of 344 problem instances drawn from the 32 classes; 224 of these instances are pairwise and 120 higher-order. 21 of the problem classes have small label spaces (<20 labels), while the remainder vary greatly, up to a maximum of 17074. Variable counts similarly cover a wide range, from 19 to 2356620, with a median of 10148. Amongst the higher-order problems, 58 % of instances have arbitrary dense factor tables, while the remainder have Potts potentials [6] or generalised versions thereof [47, 48]. The problem classes also differ greatly in the degree of homogeneity of their instances. For example, instances in the line-fitting class vary by an order of magnitude in variable and label counts, whereas all instances in the inclusion class have identical characteristics but for the factor energies themselves.
3 Inference Algorithms and Performance
Algorithms used in this study, including the GM orders they are applicable to (pw = pairwise), number of parameter settings included if more than one (#p), full name or description, and reference to the original work
alias  order  #p  name / description  ref. 

A*  all    implicitly convert to shortest-path problem and apply A*  [49]
AD\(^{3}\)  all    alternating directions dual decomposition with branch and bound  [13]
\(\alpha \)exp  pw    alpha-expansion  [6]
BPS  all  4  sequential loopy belief propagation, implementation of [3]  [4]
DDS  all  2  dual decomposition with subgradient descent  [12]
FPD  pw  3  fast primal/dual (FastPD)  [50]
ICM  all    iterated conditional modes  [51]
ILP  all    solve as integer programming problem with Gurobi  [3]
KL  pw    Kernighan-Lin method for \(2^\mathrm{{nd}}\)-order partitioning problems  [52]
LBP  all  4  parallel loopy belief propagation, implementation of [3]  [4]
LP  all    solve linear programming relaxation with Gurobi  [3]
MPLP  all    max-product linear programming with cutting-plane relaxation tightening
QPBO  pw    quadratic pseudo-boolean optimisation  [8]
TRWS  pw  3  sequential tree-reweighted message-passing  [5]
UM  all    take lowest-energy label according to unary factors only
We also include a simple method, dubbed unary-modes (UM), which labels each variable by minimising its unary factors only; this should perform poorly on genuinely hard structured prediction problems, where the non-unary factors have a decisive impact on the MAP labelling.
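As a sketch of this baseline (assuming, purely for illustration, a GM represented as a list of per-variable unary cost arrays; real GM libraries store factors differently), unary-modes reduces to an independent arg-min per variable:

```python
import numpy as np

def unary_modes(unaries):
    """Label each variable with the lowest-energy label of its unary factor.

    `unaries` is a list of 1-D cost arrays, one per variable (a hypothetical
    flat representation). All non-unary factors are simply ignored.
    """
    return [int(np.argmin(u)) for u in unaries]

# Two variables, with 3 and 2 labels respectively.
labelling = unary_modes([np.array([4.0, 1.0, 2.5]), np.array([0.3, 0.9])])
# labelling is [1, 0]: label 1 costs 1.0 for the first variable,
# label 0 costs 0.3 for the second.
```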
Protocol for inference. We used the original authors’ implementation of each algorithm where available, and the implementations in [3] otherwise. Every algorithm was run on every problem instance in our dataset, with limits of 60 min CPU time and 4 GB RAM imposed for inference on one instance. For each successful run, we recorded the MAP labelling and time taken.
Many of the algorithms have free parameters that must be defined by the user. While it was not practical to evaluate every possible combination of parameters, for several of the algorithms we included multiple parameterisations where this affects their results significantly. For example, we ran four versions of loopy belief propagation, with damping set to 0.0 and 0.75, and maximum iteration counts of 50 and 250. In such cases, the different parameterisations are combined to create a meta-algorithm, which simulates the user running every parameterisation, then taking the results from the one yielding the lowest energy on the problem instance.
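This combination step can be sketched as follows; `run_fn` and `energy_fn` are placeholders standing in for a real inference backend and full-model energy evaluation, not an actual OpenGM2 API:

```python
def run_meta_algorithm(instance, parameterisations, run_fn, energy_fn):
    """Simulate a user trying every parameterisation and keeping the best.

    run_fn(instance, params) -> labelling, or None on abort/timeout;
    energy_fn(instance, labelling) -> energy of the labelling on the model.
    """
    best_labelling, best_energy = None, float("inf")
    for params in parameterisations:
        labelling = run_fn(instance, params)
        if labelling is None:  # this parameterisation failed on the instance
            continue
        energy = energy_fn(instance, labelling)
        if energy < best_energy:  # keep the lowest-energy result so far
            best_labelling, best_energy = labelling, energy
    return best_labelling, best_energy

# Toy usage: one parameterisation fails, the other returns a labelling.
params = [{"damping": 0.0}, {"damping": 0.75}]
run_fn = lambda inst, p: None if p["damping"] == 0.0 else [0, 1]
energy_fn = lambda inst, lab: float(sum(lab))
lab, e = run_meta_algorithm("gm", params, run_fn, energy_fn)
```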

higher-order factors are omitted when passing GMs to pairwise algorithms. However, when evaluating the algorithm’s performance, the energy of the output labelling is still computed on the full model, including all factors.

non-metric pairwise factors passed to \(\alpha \)-expansion are handled as if they were metric, sacrificing the usual correctness and optimality guarantees [6].

QPBO aborts when presented with a GM having nonbinary variables.

FastPD aborts when presented with a GM whose pairwise factors are not all proportional to some uniform distance function on labels.

Kernighan-Lin aborts when presented with a GM having factors that are not pairwise Potts.
Aggregate performance of each inference algorithm on our dataset; mean time is over instances for which the algorithm successfully returns a result
% instances for which...  mean time /s  

completes  best&fastest  good&fastest  
A*  4  0  0  0.1 
AD\(^{3}\)  52  7  1  390.2 
\(\alpha \)exp  98  5  7  23.4 
BPS  72  4  2  158.3 
DDS  80  0  0  296.6 
FPD  31  9  22  7.2 
ILP  48  1  0  96.3 
LP  52  2  1  76.8 
ICM  100  30  31  60.7 
KL  12  10  10  142.2 
LBP  73  6  4  193.5 
MPLP  56  1  1  1116.3 
QPBO  12  0  2  0.1 
TRWS  94  19  10  236.4 
UM  100  0  3  0.1 

completes: whether the algorithm runs to completion, i.e. returns a solution within 60 min, regardless of the energy of that solution.

best-and-fastest: whether the algorithm reaches the lowest energy among all algorithms, faster than any other one that does so. This is relevant for a user requiring the solution with lowest possible energy, even at high computational cost.

good-and-fastest: whether the algorithm is the fastest to reach a solution with at least 98 % of variables matching the lowest-energy labelling. This is highly relevant in practice, as minor deviations from that labelling may not matter to the user, while achieving it exactly might require a significantly slower algorithm.
Table 2 shows the performance of the algorithms with respect to these measures.
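These per-instance labels can be computed mechanically from the recorded runs. A minimal sketch, where the dict-of-tuples format for `runs` is an assumption for illustration:

```python
def label_runs(runs, match_threshold=0.98):
    """Derive the best-and-fastest and good-and-fastest algorithms.

    `runs` maps algorithm name -> (energy, time, labelling), with all
    labellings equal-length sequences over the same variables.
    """
    # Lowest energy reached by any algorithm on this instance.
    best_energy = min(e for e, _, _ in runs.values())
    # Fastest among those reaching the lowest energy.
    _, best_fast = min((t, name) for name, (e, t, _) in runs.items()
                       if e == best_energy)
    best_labelling = runs[best_fast][2]

    def match(labelling):  # fraction of variables agreeing with the best
        agree = sum(a == b for a, b in zip(labelling, best_labelling))
        return agree / len(best_labelling)

    # Fastest whose labelling matches at least 98 % of the best labelling.
    _, good_fast = min((t, name) for name, (_, t, lab) in runs.items()
                       if match(lab) >= match_threshold)
    return best_fast, good_fast

# Toy instance: TRWS reaches the lowest energy, but FPD matches its
# labelling exactly while running much faster; ICM matches only 75 %.
runs = {"TRWS": (10.0, 100.0, (0, 1, 1, 0)),
        "FPD": (10.5, 5.0, (0, 1, 1, 0)),
        "ICM": (12.0, 1.0, (0, 1, 0, 0))}
labels = label_runs(runs)  # ("TRWS", "FPD")
```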
Algorithm diversity. We see that the distributions of both best-and-fastest and good-and-fastest algorithms over instances have high entropy—many different algorithms are best-and-fastest or good-and-fastest for a significant fraction of GMs. 11 of the 15 algorithms are able to return a solution for at least one instance on more than half of the problem classes; the other four are particularly restricted, such as QPBO (which only operates on binary problems). All the algorithms other than A* and DDS are the best-and-fastest for at least one problem instance. TRWS and FastPD perform particularly well on pairwise problems, with TRWS generally reaching slightly lower energies, but FastPD being much quicker. Kernighan-Lin outperforms all other algorithms on pairwise partitioning problems. AD\(^{3}\) gives low energies for higher-order problems, but often takes longer than other algorithms. Only ICM and unary-modes are able to return a solution for all problem instances. Although they are fast and widely applicable, these naïve methods are unable to return the best solution in the majority of cases. All these observations show that our goal of learning to select the best inference algorithm is much harder than simply picking any algorithm that runs to completion.
4 Learning to Select an Algorithm
We now consider how to automatically select the best MAP inference algorithm for an input problem instance. This is the main contribution of this paper. We define two tasks: (1) predicting the best-and-fastest algorithm; and (2) predicting the good-and-fastest algorithm. To address these tasks, we design selection models that take a GM as input, and select an algorithm as output (Sect. 4.2). The selection models operate on features extracted from the GMs themselves (Sect. 4.1). This is different from the typical approach in computer vision of extracting features from images and using these to build a GM.
4.1 GM Features
We extract the following three groups of features from each problem instance (Fig. 3).
Instance size. The number of variables, V, and of factors, F, are used to indicate the overall size of the problem instance, hence whether slower algorithms are likely to be applicable. We also include the minimum, maximum and mean label count over all variables. See Fig. 3b.
Secondly, we measure factor densities—for each factor order \(M \ge 2\), the number of factors of order M divided by the binomial coefficient \(\left( {\begin{array}{c}V\\ M\end{array}}\right) = \frac{V!}{M!\,(V-M)!}\). Intuitively, this is the fraction of possible M-cliques that actually have an associated factor. In Fig. 3, this is 1 for third order, as there is only one possible triplet, but 2/3 for second order, as only two of the three possible pairs of variables have a pairwise factor: (x, y) and (x, z), but not (z, y).

for each factor \(f \in F\), let \(\mu _f\) be the mean and \(\sigma _f\) the standard deviation of the unique values taken by \(f\);

then, for each factor order \(M \ge 2\), compute for the factors \(F_M\) of that order the ratio of each of the following to the same quantity for \(M = 1\): (i) \(\sum _{f \in F_M}\mu _f\); (ii) \(\sum _{f \in F_M}\mu _f / |F_M|\); (iii) \(\sum _{f \in F_M}\sigma _f / |F_M|\).
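To make these features concrete, here is a small sketch computing the factor densities and ratio (i); the flat dict mapping variable-index tuples to value tables is an assumed illustrative representation, not the OpenGM2 API, and ratios (ii)–(iii) follow the same pattern:

```python
from math import comb
import numpy as np

def gm_features(num_vars, factors):
    """Structural and energy-statistic features in the spirit of Sect. 4.1."""
    by_order = {}
    for scope, table in factors.items():
        by_order.setdefault(len(scope), []).append(np.asarray(table, float))
    feats = {"V": num_vars, "F": len(factors)}
    # Per-order sums of means over the *unique* values of each factor table.
    mu_sum = {M: sum(np.unique(t).mean() for t in ts)
              for M, ts in by_order.items()}
    for M, tables in sorted(by_order.items()):
        if M < 2:
            continue
        # Density: number of order-M factors per possible M-clique.
        feats[f"density_{M}"] = len(tables) / comb(num_vars, M)
        if mu_sum.get(1):
            # Feature (i): ratio of summed factor means to the unary sum.
            feats[f"sum_mu_ratio_{M}"] = mu_sum[M] / mu_sum[1]
    return feats

# The worked example above: variables x, y, z with unaries, pairwise
# factors on (x, y) and (x, z), and one third-order factor on (x, y, z).
factors = {(0,): [1.0, 2.0], (1,): [1.0, 3.0], (2,): [2.0, 2.0],
           (0, 1): [[0.0, 1.0], [1.0, 0.0]],
           (0, 2): [[0.0, 2.0], [2.0, 0.0]],
           (0, 1, 2): np.zeros((2, 2, 2))}
feats = gm_features(3, factors)  # density_2 == 2/3, density_3 == 1.0
```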
4.2 Algorithm Selection Models and Their Training
Selection models. We propose two algorithm selection models. Each is a 1-of-N classifier implemented as a random forest [54], taking the features described in Sect. 4.1 as input. Model BF is trained to predict the best-and-fastest algorithm; model GF is trained to predict the good-and-fastest algorithm. The random forests are trained recursively by selecting the best split from a randomly-generated pool at each step, using information gain (i.e. entropy decrease) as the criterion, and with outputs modelled by categorical distributions [54].
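A minimal sketch of such a selection model using scikit-learn's random forest (an assumption for illustration; the paper builds on the forests of [54], and the three-dimensional toy feature vectors below are hypothetical):

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data: one feature vector per problem instance
# (e.g. variable count, max label count, pairwise density), labelled with
# the algorithm that was empirically good-and-fastest on it (Sect. 3).
X_train = [[1000, 2, 0.5], [900, 2, 0.4], [50, 256, 0.01],
           [60, 300, 0.02], [20000, 2, 0.9], [25000, 2, 0.8]]
y_train = ["FPD", "FPD", "TRWS", "TRWS", "ICM", "ICM"]

# 1-of-N classifier over the algorithm pool; each split is scored by
# information gain, matching the entropy-decrease criterion of Sect. 4.2.
model_gf = RandomForestClassifier(n_estimators=100, criterion="entropy",
                                  random_state=0).fit(X_train, y_train)

# Select an algorithm for an unseen instance's feature vector.
chosen = model_gf.predict([[1100, 2, 0.45]])[0]
```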
Data. We train the selection models on a subset of our dataset (Sect. 2). A training sample consists of features extracted from a problem instance and a target output label denoting which algorithm works best on it. It is important to note that these training labels are automatically generated by running all algorithms on the training instances, as in Sect. 3. No human annotation is required. At test time, we run the selection models on a separate subset of the dataset. The evaluation compares the algorithm selected by our model to the one known to perform best (Sect. 5). Again, this test label is produced automatically.
5 Experiments
Strong baseline. We group the problem classes into three superclasses:
 1.
pairwise—many algorithms are designed for pairwise problems only;
 2.
higher-order—there exist algorithms designed explicitly to handle higher-order factors, but which may be slow for pairwise instances;
 3.
partitioning—these are a special class which is hard for general algorithms (due to having a large label space, and being invariant to label permutations) but certain methods can exploit this structure to solve them efficiently; most partitioning problems in our dataset are pairwise, but some are third-order.
Then, at test time, each problem instance is assigned the algorithm that is most often best for training problems of its superclass. This strong baseline mimics the behaviour of a user with good working knowledge of inference—enough to recognise how her problem fits in these superclasses, and to know which algorithm will be best for each.
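A sketch of this baseline; the training pairs below are purely illustrative, and the flat list-of-pairs representation is an assumption:

```python
from collections import Counter

def strong_baseline(train, test_superclass):
    """Select the algorithm most often best on training instances of the
    test instance's superclass.

    `train` is a list of (superclass, best_algorithm) pairs derived from
    the exhaustive runs of Sect. 3.
    """
    votes = Counter(alg for sc, alg in train if sc == test_superclass)
    return votes.most_common(1)[0][0]

# Illustrative training outcomes.
train = [("pairwise", "TRWS"), ("pairwise", "TRWS"), ("pairwise", "FPD"),
         ("partitioning", "KL"), ("partitioning", "KL"),
         ("higher-order", "AD3")]
```

For example, `strong_baseline(train, "pairwise")` returns `"TRWS"`, the majority winner within that superclass.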
Experimental setup. We select half the problem instances at random to train on, and the remainder are used for testing. As discussed in Sect. 3, the ground-truth labels marking which inference algorithms perform best on a problem instance are obtained automatically by running all algorithms on all instances. No human annotation is necessary for training or testing.
Performance of our model GF and baselines NB and SB for selecting the good-and-fastest algorithm (first three columns), and performance of our model BF and baselines for selecting the best-and-fastest algorithm (last three columns).
algorithm selected by...  good-and-fastest  best-and-fastest  

GF  NB  SB  BF  NB  SB  
% instances correctly classified  69  31  28  62  30  36 
mean % matching variables  96.4  75.3  87.5  97.1  75.4  95.6 

percentage of instances with the correct algorithm (best-and-fastest or good-and-fastest) selected. This is the measure for which we trained our selection models.

mean (over instances) of the fraction of variables matching the labelling returned by the best-and-fastest algorithm. This is particularly relevant in practice, as users typically care about the quality of the labelling output by an algorithm, especially in terms of how close it is to the lowest-energy labelling that could have been returned.
6 Analysis
Predicting the good-and-fastest algorithm. Model GF correctly chooses the good-and-fastest algorithm for 69 % of instances, with on average 96.4 % of variables taking the same label as in the true best labelling. This compares favourably to the naïve baseline NB, which selects correctly on only 31 % of the instances and returns labellings that are considerably worse (75.3 % correctly-labelled variables on average). Indeed, our model also substantially outperforms the strong baseline, which achieves an average of only 87.5 % correctly-labelled variables.
These results show that our selection model successfully generalises to new problem instances not seen during training. It is able to select an algorithm much better than even the strong baseline of a user who knows which algorithm performs best for similar problems in the training set.
Mean times and speedups from using our method, versus exhaustively applying all algorithms. Matching var’s is the fraction of variables whose labels match the true best result; speedup is the ratio of time to that for exhaustive testing.
mean...  time /s  speedup  matching var’s 

exhaustive  13046.8  1.0\(\times \)  100 % 
good-and-fastest  221.3  88.1\(\times \)  96.4 % 
best-and-fastest  312.5  46.8\(\times \)  97.1 % 
Confusion matrix showing true (rows) and predicted (columns) good-and-fastest algorithms for pairwise problems. The table only includes those algorithms that are the true good-and-fastest for at least one problem instance.
Algorithms selected by the strong baseline. As described in Sect. 5, our strong baseline chooses the algorithm that is most often best-and-fastest or good-and-fastest over the training set, for problems in the same superclass as the test instance. For predicting the best-and-fastest algorithm, Kernighan-Lin is selected for partitioning problems, TRWS for other pairwise instances, and AD\(^{3}\) for the remaining (higher-order) instances. However, for the good-and-fastest algorithm, FastPD is selected instead of TRWS for pairwise instances, and ICM for higher-order instances, indicating that these often label 98 % or more of variables correctly, while being faster to run.
Algorithms selected by our method. At a coarse level, for the task of selecting the best-and-fastest algorithm, we find that our model BF most often chooses pairwise-specific algorithms for pairwise problems, and AD\(^{3}\) for higher-order problems. This agrees with intuition—pairwise algorithms are specifically designed to be faster on pairwise instances, while AD\(^{3}\) is a good general-purpose algorithm for higher-order instances. Interestingly, for the good-and-fastest task, model GF correctly learns to choose ICM or a good pairwise method in place of AD\(^{3}\) for higher-order problems—for many instances, these provide solutions close in labelling to the lowest-energy one, and do so much faster than AD\(^{3}\).
Performance of algorithm selection methods selecting the good-and-fastest and best-and-fastest algorithms in the LOCO regime; see Table 3 for details.
algorithm selected by...  good-and-fastest  best-and-fastest  

GF  NB  SB  BF  NB  SB  
% instances correctly classified  40  26  11  28  25  23 
mean % matching variables  89.8  73.3  71.7  85.5  73.3  86.2 
LOCO regime. We also tested our models and baselines in an even harder ‘leave one class out’ (LOCO) regime, where for each problem class C in turn, we train on all instances from classes other than C, and test on those in C; the final performance is given by a weighted mean over classes. This tests generalisation to classes absent from the training set, which is relevant when the user does not wish to train our model on her classes. The results are presented in Table 6.
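The LOCO protocol can be sketched as follows; `fit_and_score` is a placeholder for training a selection model on the retained classes and measuring its per-instance accuracy on the held-out class:

```python
def loco_evaluate(instances, fit_and_score):
    """Leave-one-class-out evaluation.

    `instances` is a list of (problem_class, features, best_algorithm)
    triples (an assumed flat representation). For each class C, train on
    every other class and test on C; return the mean over classes
    weighted by the number of test instances.
    """
    classes = sorted({c for c, _, _ in instances})
    total, weighted = 0, 0.0
    for held_out in classes:
        train = [i for i in instances if i[0] != held_out]
        test = [i for i in instances if i[0] == held_out]
        weighted += fit_and_score(train, test) * len(test)
        total += len(test)
    return weighted / total

# Toy dataset with two classes of different sizes.
insts = [("stereo", None, "TRWS")] * 3 + [("partitioning", None, "KL")]
score = loco_evaluate(insts, lambda train, test: 1.0)  # perfect selector
```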
For selecting the good-and-fastest algorithm, model GF still performs well in the LOCO regime, selecting algorithms labelling 89.8 % of variables correctly, and exceeding both the naïve and strong baselines by over 15 %. Moreover, we correctly choose the good-and-fastest algorithm 14 % more often than the baselines.
For the best-and-fastest task, model BF results in 85.5 % of variables being correctly labelled, significantly exceeding the naïve baseline at 73.3 % and comparable with the strong baseline at 86.2 %. These results demonstrate that our selection models generalise to the held-out problem classes, rather than merely recalling distinguishing features of the classes seen during training.
7 Conclusions
We have presented a method to automatically choose the best inference algorithm to apply to an input problem instance. It selects an inference algorithm that labels 96 % of variables the same as the best available algorithm for that instance. Our method is over 88\(\times \) faster than exhaustively trying all algorithms. The experiments show that our automatic selection models successfully generalise across problem instances and, importantly, even across problem classes.
References
 1.Kolmogorov, V., Rother, C.: Comparison of energy minimization algorithms for highly connected graphs. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 1–15. Springer, Heidelberg (2006)CrossRefGoogle Scholar
 2.Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., Rother, C.: A comparative study of energy minimization methods for markov random fields with smoothnessbased priors. IEEE Trans. on PAMI 30(6), 1068–1080 (2008)CrossRefGoogle Scholar
 3.Kappes, J., Andres, B., Hamprecht, F., Schnörr, C., Nowozin, S., Batra, D., Kim, S., Kausler, B., Kröger, T., Lellmann, J., Komodakis, N., Savchynskyy, B., Rother, C.: A comparative study of modern inference techniques for structured discrete energy minimization problems. IJCV 115, 1–30 (2015)MathSciNetCrossRefGoogle Scholar
 4.Bishop, C.: Pattern Recognition and Machine Learning. Springer, New York (2006)zbMATHGoogle Scholar
 5.Kolmogorov, V.: Convergent treereweighted message passing for energy minimization. IEEE Trans. PAMI 28(10), 1568–1583 (2006)CrossRefGoogle Scholar
 6.Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. IEEE Trans. PAMI 23(11), 1222–1239 (2001)CrossRefGoogle Scholar
 7.Greig, D.M., Porteous, B.T., Seheult, A.H.: Exact maximum a posteriori estimation for binary images. J. Roy. Stat. Soc. 51(2), 271–279 (1989)Google Scholar
 8.Rother, C., Kolmogorov, V., Lempitsky, V., Szummer, M.: Optimizing binary MRFs via extended roof duality. In: CVPR (2007)Google Scholar
9. Storvik, G., Dahl, G.: Lagrangian-based methods for finding MAP solutions for MRF models. IEEE Trans. Image Process. 9, 469–479 (2000)
10. Guignard, M., Kim, S.: Lagrangean decomposition: a model yielding stronger Lagrangean bounds. Math. Prog. 39, 215–228 (1987)
11. Komodakis, N., Paragios, N., Tziritas, G.: MRF optimization via dual decomposition: message-passing revisited. In: ICCV, pp. 1–8 (2007)
12. Kappes, J., Savchynskyy, B., Schnörr, C.: A bundle approach to efficient MAP-inference by Lagrangian relaxation. In: CVPR (2012)
13. Martins, A.F.T., Figueiredo, M.A.T., Aguiar, P.M.Q., Smith, N.A., Xing, E.P.: AD3: alternating directions dual decomposition for MAP inference in graphical models. JMLR 16, 495–545 (2015)
14. Andres, B., Kappes, J.H., Köthe, U., Schnörr, C., Hamprecht, F.A.: An empirical comparison of inference algorithms for graphical models with higher order factors using OpenGM. In: Goesele, M., Roth, S., Kuijper, A., Schiele, B., Schindler, K. (eds.) Pattern Recognition. LNCS, vol. 6376, pp. 353–362. Springer, Heidelberg (2010)
15. Alahari, K., Kohli, P., Torr, P.: Dynamic hybrid algorithms for discrete MAP MRF inference. IEEE Trans. PAMI 32(10), 1846–1857 (2010)
16. Ishikawa, H.: Higher-order clique reduction in binary graph cut. In: CVPR, pp. 2993–3000 (2009)
17. Ishikawa, H.: Transformation of general binary MRF minimization to the first order case. IEEE Trans. PAMI 33(6), 1234–1249 (2011)
18. Fix, A., Gruber, A., Boros, E., Zabih, R.: A graph cut algorithm for higher-order Markov random fields. In: ICCV (2011)
19. Kschischang, F.R., Frey, B.J., Loeliger, H.-A.: Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theor. 47(2), 498–519 (2001)
20. Wainwright, M.J., Jaakkola, T.S., Willsky, A.S.: MAP estimation via agreement on (hyper)trees: message-passing and linear-programming approaches. IEEE Trans. Inf. Theor. 51(11), 3697–3717 (2005)
21. Sontag, D., Meltzer, T., Globerson, A., Weiss, Y., Jaakkola, T.: Tightening LP relaxations for MAP using message-passing. In: Proceedings of UAI, pp. 503–510 (2008)
22. Doppa, J.R., Kumar, P., Wick, M., Singh, S., Salakhutdinov, R.: ICML 2013 workshop on inferning (2013). http://inferning.cs.umass.edu/
23. Guillaumin, M., Van Gool, L., Ferrari, V.: Fast energy minimization using learned state filters. In: CVPR (2013)
24. Conejo, B., Komodakis, N., Leprince, S., Avouac, J.P.: Inference by learning: speeding-up graphical model optimization via a coarse-to-fine cascade of pruning classifiers. In: NIPS, pp. 1–9 (2014)
25. Stoyanov, V., Eisner, J.: Fast and accurate prediction via evidence-specific MRF structure. In: ICML Workshop on Inferning (2012)
26. Roig, G., Boix, X., De Nijs, R., Ramos, S., Kuhnlenz, K., Van Gool, L.: Active MAP inference in CRFs for efficient semantic segmentation. In: ICCV, pp. 2312–2319 (2013)
27. Jiang, J., Moon, T., Daumé III, H., Eisner, J.: Prioritized asynchronous belief propagation. In: ICML Workshop on Inferning (2013)
28. Rice, J.R.: The algorithm selection problem. Adv. Comps. 15, 65–118 (1976)
29. Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: SATzilla: portfolio-based algorithm selection for SAT. J. Artif. Intel. Res. 32, 565–606 (2008)
30. Kotthoff, L., Gent, I.P., Miguel, I.: A preliminary evaluation of machine learning in algorithm selection for search problems. In: Symposium on Combinatorial Search (2011)
31. Lellmann, J., Schnörr, C.: Continuous multiclass labeling approaches and algorithms. SIAM J. Im. Sci. 4(4), 1049–1096 (2011)
32. Nowozin, S., Rother, C., Bagon, S., Sharp, T., Yao, B., Kohli, P.: Decision tree fields. In: ICCV (2011)
33. Gould, S., Fulton, R., Koller, D.: Decomposing a scene into geometric and semantically consistent regions. In: ICCV (2009)
34. Hoiem, D., Efros, A.A., Hebert, M.: Recovering occlusion boundaries from an image. IJCV 91(3), 328–346 (2011)
35. Kim, S., Nowozin, S., Kohli, P., Yoo, C.D.: Higher-order correlation clustering for image segmentation. In: NIPS (2011)
36. Andres, B., Kappes, J.H., Beier, T., Köthe, U., Hamprecht, F.A.: Probabilistic image segmentation with closedness constraints. In: ICCV (2011)
37. Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer, M., Nikoloski, Z., Wagner, D.: On modularity clustering. IEEE Trans. KDE 20(2), 172–188 (2008)
38. Andres, B., Kroeger, T., Briggman, K.L., Denk, W., Korogod, N., Knott, G., Koethe, U., Hamprecht, F.A.: Globally optimal closed-surface segmentation for connectomics. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 778–791. Springer, Heidelberg (2012)
39. Jaimovich, A., Elidan, G., Margalit, H., Friedman, N.: Towards an integrated protein-protein interaction network: a relational Markov network approach. J. Comp. Biol. 13(2), 145–164 (2006)
40. Yanover, C., Schueler-Furman, O., Weiss, Y.: Minimizing and learning energy functions for side-chain prediction. J. Comp. Biol. 15(7), 899–911 (2008)
41. Shotton, J., Winn, J., Rother, C., Criminisi, A.: TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling appearance, shape and context. IJCV 81(1), 2–23 (2009)
42. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC 2007) Results (2007). http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
43. Alexe, B., Deselaers, T., Ferrari, V.: What is an object? In: CVPR (2010)
44. Isack, H., Boykov, Y.: Energy-based geometric multi-model fitting. IJCV 97(2), 123–147 (2012)
45. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Comm. ACM 24(6), 381–395 (1981)
46. Rother, C., Kohli, P., Feng, W., Jia, J.: Minimizing sparse higher order energy functions of discrete variables. In: CVPR (2009)
47. Kohli, P., Kumar, M., Torr, P.: P3 & beyond: solving energies with higher order cliques. In: CVPR (2007)
48. Kohli, P., Ladicky, L., Torr, P.: Robust higher order potentials for enforcing label consistency. In: CVPR (2008)
49. Bergtholdt, M., Kappes, J.H., Schnörr, C.: Learning of graphical models and efficient inference for object class recognition. In: Franke, K., Müller, K.R., Nickolay, B., Schäfer, R. (eds.) DAGM 2006. LNCS, vol. 4174, pp. 273–283. Springer, Heidelberg (2006). doi: 10.1007/11861898_28
50. Komodakis, N., Tziritas, G., Paragios, N.: Performance vs computational efficiency for optimizing single and dynamic MRFs: setting the state of the art with primal-dual strategies. CVIU 112(1), 14–29 (2008)
51. Besag, J.: On the statistical analysis of dirty pictures. J. Roy. Stat. Soc. 48(3), 259–302 (1986)
52. Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Sys. Tech. J. 49(2), 291–307 (1970)
53. Sontag, D., Choe, D.K., Li, Y.: Efficiently searching for frustrated cycles in MAP inference. In: Proceedings of UAI, pp. 795–804 (2012)
54. Criminisi, A., Shotton, J., Konukoglu, E.: Decision forests for classification, regression, density estimation, manifold learning and semi-supervised learning. Microsoft Research Cambridge, Technical report MSR-TR-2011-114 (2011)