## Abstract

Recently, a number of cross bilateral filtering methods have been proposed for solving multi-label problems in computer vision, such as stereo, optical flow and object class segmentation, and these methods are an order of magnitude faster than previous approaches. They have achieved good results despite using models with only unary and/or pairwise terms. However, previous work has shown the value of models with higher-order terms, e.g. to represent label consistency over large regions or global co-occurrence relations. We show how these higher-order terms can be formulated such that filter-based inference remains possible. We demonstrate our techniques on joint stereo and object labelling problems, as well as on object class segmentation, and show for joint object-stereo labelling how our method provides an efficient approach to inference in product label-spaces. We show that we are able to speed up inference in these models by around 10–30 times with respect to competing graph-cut/move-making methods, while maintaining or improving accuracy in all cases. We show results on PascalVOC-10 for object class segmentation and on Leuven for joint object-stereo labelling.
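
The filter-based mean-field inference summarised above can be illustrated with a minimal sketch. It assumes a fully connected CRF with a Potts compatibility and a single Gaussian kernel, in the style of Krähenbühl and Koltun (2011), and uses parallel updates in which every site is recomputed from the previous \(Q\). All function names, `w` and `theta` are hypothetical. The dense \(O(N^2)\) sum stands in for the permutohedral-lattice filter of Adams et al. (2010), which is what makes the real method fast. Higher-order and product label-space terms are omitted. `decode` takes the argmax of each approximate marginal, which approximates the maximum posterior marginal (MPM) labelling.

```python
import math


def gaussian_kernel(fi, fj, theta):
    """k(f_i, f_j) = exp(-||f_i - f_j||^2 / (2 * theta^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(fi, fj))
    return math.exp(-d2 / (2.0 * theta ** 2))


def normalise_exp(neg_energy):
    """Numerically stable softmax: Q(l) proportional to exp(-E(l))."""
    m = max(neg_energy)
    exps = [math.exp(e - m) for e in neg_energy]
    z = sum(exps)
    return [e / z for e in exps]


def mean_field(unary, feats, w=1.0, theta=1.0, iters=5):
    """Parallel mean-field updates for a fully connected Potts CRF (sketch).

    unary[i][l]: unary energy of label l at pixel i.
    feats[i]:    feature vector of pixel i (e.g. position and colour).
    w, theta:    hypothetical Potts weight and kernel bandwidth.
    """
    n, num_labels = len(unary), len(unary[0])
    # Dense kernel matrix: O(n^2) time and memory. This brute-force
    # "filter" is the step the permutohedral lattice approximates quickly.
    kernel = [[gaussian_kernel(feats[i], feats[j], theta) if i != j else 0.0
               for j in range(n)] for i in range(n)]
    # Initialise Q from the unaries alone.
    q = [normalise_exp([-u for u in unary[i]]) for i in range(n)]
    for _ in range(iters):
        # Message passing: filtered[i][l] = sum_{j != i} k(f_i, f_j) Q_j(l).
        filtered = [[sum(kernel[i][j] * q[j][l] for j in range(n))
                     for l in range(num_labels)] for i in range(n)]
        new_q = []
        for i in range(n):
            total = sum(filtered[i])
            # Potts compatibility: pay w for neighbour mass on every label != l.
            neg_energy = [-unary[i][l] - w * (total - filtered[i][l])
                          for l in range(num_labels)]
            new_q.append(normalise_exp(neg_energy))
        q = new_q  # every site is updated from the previous Q (parallel form)
    return q


def decode(q):
    """Label each pixel with the argmax of its approximate marginal."""
    return [max(range(len(qi)), key=qi.__getitem__) for qi in q]


if __name__ == "__main__":
    unary = [[0.0, 2.0], [0.6, 0.4], [2.0, 0.0]]
    feats = [(0.0,), (1.0,), (10.0,)]
    print(decode(mean_field(unary, feats, w=2.0, iters=0)))  # [0, 1, 1]
    print(decode(mean_field(unary, feats, w=2.0, iters=5)))  # [0, 0, 1]
```

In this toy example, the unary term alone weakly favours label 1 for pixel 1. After a few updates, its close neighbour (pixel 0) pulls it to label 0. The distant pixel 2 is essentially unaffected.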

## Notes

1. For exact maximum posterior marginal (MPM) inference, the solution satisfies \(x^{\mathrm{MPM}}_i \in \operatorname{argmax}_l \sum_{\{\mathbf{x} \mid x_i = l\}} P(\mathbf{x} \mid I)\).

2. Although the updates are conceptually parallel in form, the permutohedral lattice convolution is implemented sequentially.

3. The class of such sparse higher-order potentials is also considered in Rother et al. (2009).

4. Equation 9 requires evaluating, for each of the \(|\mathcal{P}_c|\) patterns, the joint probability of the \(|c|-1\) other variable assignments in the clique, so a single evaluation costs \(O(|\mathcal{P}_c||c|)\). If \(Q\) is prevented from taking the values \(0\) and \(1\), the joint pattern probabilities \(\prod_{j\in c}Q_j(x_j=p_j)\) can be calculated once for each clique. The conditional forms \(\prod_{j\in c,\, j\ne i}Q_j(x_j=p_j)\) needed for the parallel updates can then be obtained by dividing by \(Q_i(x_i=p_i)\). This gives an overall complexity of \(O(\max_c(|\mathcal{P}_c||c|)\,|\mathcal{C}^{\mathrm{pat}}|)\).

5. In fact, we use slightly different co-occurrence potentials for graph-cuts and mean-field: \(\psi^{\mathrm{cooc}}\) for graph-cuts and \(\psi^{\mathrm{cooc}\text{-}2}\) for mean-field, although we set the costs \(C(\Lambda)\) identically. We view the latter as an approximation of the former, and hence as a slight handicap for mean-field inference. However, further experiments would be needed to determine whether the two forms of this potential lead to better or worse models.
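
The division trick in Note 4 can be sketched as follows. The sketch assumes that \(Q\) is stored as a list of per-variable label distributions, a clique as a list of variable indices, and each pattern as a tuple of labels aligned with the clique order. The function names and the clamping constant `eps` are hypothetical illustrations, not values from the paper. The direct version is included only as a reference for checking the faster one.

```python
import math


def conditional_products_direct(q, clique, patterns):
    """Reference version: O(|P_c| |c|^2) per clique.

    Returns {(k, i): prod_{j in c, j != i} Q_j(p_j)} for pattern k, variable i.
    """
    out = {}
    for k, p in enumerate(patterns):
        for pos, i in enumerate(clique):
            prod = 1.0
            for pos_j, j in enumerate(clique):
                if pos_j != pos:
                    prod *= q[j][p[pos_j]]
            out[(k, i)] = prod
    return out


def conditional_products_divide(q, clique, patterns, eps=1e-6):
    """Division trick: O(|P_c| |c|) per clique.

    The joint product over the whole clique is computed once per pattern,
    and each leave-one-out product is obtained by dividing by Q_i(p_i).
    Factors are clamped to [eps, 1 - eps] (Q kept away from 0 and 1)
    so that the division is always defined.
    """
    out = {}
    for k, p in enumerate(patterns):
        factors = [min(max(q[i][p[pos]], eps), 1.0 - eps)
                   for pos, i in enumerate(clique)]
        joint = math.prod(factors)
        for pos, i in enumerate(clique):
            out[(k, i)] = joint / factors[pos]
    return out


if __name__ == "__main__":
    q = [[0.7, 0.3], [0.4, 0.6], [0.9, 0.1]]  # Q_i(l) for 3 binary variables
    clique = [0, 1, 2]
    patterns = [(0, 0, 0), (1, 1, 1)]  # labels p_j aligned with clique order
    fast = conditional_products_divide(q, clique, patterns)
    slow = conditional_products_direct(q, clique, patterns)
    print(all(math.isclose(fast[key], slow[key]) for key in slow))  # True
```

When no factor needs clamping, both versions agree. The saving comes from computing the joint product once per pattern rather than recomputing a product of \(|c|-1\) terms for every variable in the clique.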

## References

Adams, A., Baek, J., & Davis, M. A. (2010). Fast high-dimensional filtering using the permutohedral lattice. *Computer Graphics Forum*, *29*(2), 753–762.

Bai, X., & Sapiro, G. (2007). A geodesic framework for fast interactive image and video segmentation and matting. In *ICCV*.

Bleyer, M., Rhemann, C., & Rother, C. (2012). Extracting 3D scene-consistent object proposals and depth from stereo images. In *ECCV* (pp. 467–481).

Bleyer, M., Rother, C., Kohli, P., Scharstein, D., & Sinha, S. (2011). Object stereo: Joint stereo matching and object segmentation. In *CVPR* (pp. 3081–3088).

Borenstein, E., & Malik, J. (2006). Shape guided object segmentation. In *CVPR* (pp. 969–976).

Boykov, Y., & Jolly, M. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In *ICCV* (pp. 105–112).

Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. *IEEE PAMI*, *23*(11), 1222–1239.

Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. *IEEE PAMI*, *24*(5), 603–619.

Criminisi, A., Sharp, T., & Blake, A. (2008). GeoS: Geodesic image segmentation. In *ECCV* (pp. 99–112).

Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2011). The PASCAL visual object classes challenge (VOC2011).

Galleguillos, C., Rabinovich, A., & Belongie, S. (2008). Object categorization using co-occurrence, location and appearance. In *CVPR*.

Gastal, E. S. L., & Oliveira, M. M. (2011). Domain transform for edge-aware image and video processing. *ACM Transactions on Graphics*, *30*(4), 69.

Goldlücke, B., & Cremers, D. (2010). Convex relaxation for multilabel problems with product label spaces. In *ECCV* (pp. 225–238).

Gonfaus, J. M., Boix, X., Van De Weijer, J., Bagdanov, A. D., Serrat, J., & Gonzalez, J. (2010). Harmony potentials for joint classification and segmentation. In *CVPR*.

Grady, L. (2006). Random walks for image segmentation. *IEEE PAMI*, *28*, 1768–1783.

Kohli, P., Kumar, M. P., & Torr, P. H. S. (2007). P3 & beyond: Solving energies with higher order cliques. In *CVPR*.

Koller, D., & Friedman, N. (2009). *Probabilistic graphical models*. London: MIT Press.

Kolmogorov, V. (2006). Convergent tree-reweighted message passing for energy minimization. *IEEE PAMI*, *28*(10), 1568–1583.

Komodakis, N., & Paragios, N. (2009). Beyond pairwise energies: Efficient optimization for higher-order MRFs. In *CVPR* (pp. 2985–2992).

Komodakis, N., Paragios, N., & Tziritas, G. (2011). MRF energy minimization and beyond via dual decomposition. *IEEE PAMI*, *33*(3), 531–552.

Kornprobst, P., Tumblin, J., & Durand, F. (2009). Bilateral filtering: Theory and applications. *Foundations and Trends in Computer Graphics and Vision*, *4*(1), 1–74.

Krähenbühl, P., & Koltun, V. (2011). Efficient inference in fully connected CRFs with Gaussian edge potentials. In *NIPS* (pp. 109–117).

Kumar, M., Torr, P., & Zisserman, A. (2005). OBJ CUT. In *CVPR* (pp. 18–25).

Kumar, M. P., Veksler, O., & Torr, P. H. S. (2011). Improved moves for truncated convex models. *JMLR*, *12*, 31–67.

Ladický, L., Russell, C., Kohli, P., & Torr, P. H. S. (2009). Associative hierarchical CRFs for object class image segmentation. In *ICCV* (pp. 739–746).

Ladický, L., Russell, C., Kohli, P., & Torr, P. H. S. (2010). Graph cut based inference with co-occurrence statistics. In *ECCV* (pp. 239–253).

Ladický, L., Sturgess, P., Alahari, K., Russell, C., & Torr, P. H. S. (2010). What, where and how many? Combining object detectors and CRFs. In *ECCV*.

Ladický, L., Sturgess, P., Russell, C., Sengupta, S., Bastanlar, Y., Clocksin, W. F., & Torr, P. H. S. (2010). Joint optimisation for object class segmentation and dense stereo reconstruction. In *BMVC* (pp. 1–11).

Lan, X., Roth, S., Huttenlocher, D., & Black, M. (2009). Efficient belief propagation with learned higher-order Markov random fields. In *ECCV* (pp. 269–283).

Liu, C., Yuen, J., Torralba, A., Sivic, J., & Freeman, W. T. (2008). SIFT flow: Dense correspondence across different scenes. In *ECCV*.

Liu, C., Yuen, J., & Torralba, A. (2009). Nonparametric scene parsing: Label transfer via dense scene alignment. In *CVPR*.

Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. *IJCV*, *42*, 145–175.

Pawan Kumar, M., & Torr, P. H. S. (2008). Improved moves for truncated convex models. In *NIPS* (pp. 889–896).

Payet, N., & Todorovic, S. (2010). \((\mathrm{RF})^2\): Random forest random field. In *NIPS*.

Potetz, B., & Lee, T. S. (2008). Efficient belief propagation for higher-order cliques using linear constraint nodes. *CVIU*, *112*, 39–54.

Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E., & Belongie, S. (2007). Objects in context. In *ICCV*.

Rhemann, C., Hosni, A., Bleyer, M., Rother, C., & Gelautz, M. (2011). Fast cost-volume filtering for visual correspondence and beyond. In *CVPR* (pp. 3017–3024).

Rother, C., Kohli, P., Feng, W., & Jia, J. (2009). Minimizing sparse higher order energy functions of discrete variables. In *CVPR* (pp. 1382–1389).

Rother, C., Kolmogorov, V., & Blake, A. (2004). GrabCut: Interactive foreground extraction using iterated graph cuts. *ACM TOG*, *23*, 309–314.

Shotton, J., Winn, J. M., Rother, C., & Criminisi, A. (2009). TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. *IJCV*, *81*(1), 2–23.

Singaraju, D., Grady, L., & Vidal, R. (2008). P-Brush: Continuous valued MRFs with normed pairwise distributions for image segmentation. In *CVPR*.

Torralba, A., Murphy, K. P., & Freeman, W. T. (2007). Sharing visual features for multiclass and multiview object detection. *IEEE PAMI*, *29*, 854–869.

Toyoda, T., & Hasegawa, O. (2008). Random field model for integration of local information and global information. *IEEE PAMI*, *30*, 1483–1489.

Turner, R. E., & Sahani, M. (2011). Two problems with variational expectation maximisation for time-series models. In *Bayesian time series models* (pp. 109–130).

Veksler, O. (2007). Graph cut based optimization for MRFs with truncated convex priors. In *CVPR*.

Weiss, Y. (2001). Comparing the mean field method and belief propagation for approximate inference in MRFs. In *Advanced mean field methods: Theory and practice*. Cambridge, MA: MIT Press.

Woodford, O., Torr, P. H. S., Reid, I., & Fitzgibbon, A. (2009). Global stereo reconstruction under second-order smoothness priors. *IEEE PAMI*, *31*(12), 2115–2128.

## Acknowledgments

We thank Paul Sturgess for discussions on SIFT-flow-based initialization. The work was supported by the EPSRC and the IST programme of the European Community, under the PASCAL2 Network of Excellence. Professor Philip H.S. Torr is in receipt of a Royal Society Wolfson Research Merit Award.

## Additional information

Vibhav Vineet and Jonathan Warrell contributed equally to this work as joint first authors.

Communicated by Carlo Colombo.

## About this article

### Cite this article

Vineet, V., Warrell, J. & Torr, P.H.S. Filter-Based Mean-Field Inference for Random Fields with Higher-Order Terms and Product Label-Spaces.
*Int J Comput Vis* **110**, 290–307 (2014). https://doi.org/10.1007/s11263-014-0708-6

### Keywords

- Object class segmentation
- Dense stereo reconstruction
- Mean-field methods
- Higher order potentials
- Bilateral filters
- CRF