
Detecting parametric objects in large scenes by Monte Carlo sampling

International Journal of Computer Vision

Abstract

Point processes constitute a natural extension of Markov random fields (MRFs), designed to handle parametric objects. They have proven efficient and competitive for tackling object extraction problems in vision. Simulating these stochastic models is, however, a difficult task. The performance of existing samplers is limited in terms of computation time and convergence stability, especially on large scenes. We propose a new sampling procedure based on a Monte Carlo formalism. Our algorithm exploits the Markovian property of point processes to perform the sampling in parallel. This procedure is embedded into a data-driven mechanism so that the points are distributed in the scene according to spatial information extracted from the input data. The performance of the sampler is analyzed through a set of experiments on various object detection problems in large scenes, including comparisons with existing algorithms. The sampler is also tested as an optimization algorithm for MRF-based labeling problems.

Notes

  1. GCO C++ library (http://vision.csd.uwo.ca/code/).

References

  • Baddeley, A. J., & Lieshout, M. V. (1993). Stochastic geometry models in high-level vision. Journal of Applied Statistics, 20(5–6), 231–256.

  • Benchmark, (2013). Datasets, results and evaluation tools. http://www-sop.inria.fr/members/Florent.Lafarge/benchmark/evaluation.html.

  • Besag, J. E. (1986). On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, 48(3), 259–302.

  • Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.

  • Byrd, J., Jarvis, S., & Bhalerao, A. (2010). On the parallelisation of MCMC-based image processing. IEEE International Symposium on Parallel and Distributed Processing. Atlanta, US.

  • Chai, D., Forstner, W., & Lafarge, F. (2013). Recovering line-networks in images by junction-point processes. Computer Vision and Pattern Recognition, Portland.

  • Chai, D., Forstner, W., & Yang, M. Y. (2012). Combine Markov random fields and marked point processes to extract buildings from remotely sensed images. International Society for Photogrammetry and Remote Sensing Congress. Melbourne, Australia.

  • Descombes, X. (2011). Stochastic geometry for image analysis. Oxford: Wiley.

  • Descombes, X., Minlos, R., & Zhizhina, E. (2009). Object extraction using a stochastic birth-and-death dynamics in continuum. Journal of Mathematical Imaging and Vision, 33(3), 347–359.

  • Earl, D., & Deem, M. (2005). Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics, 7(23), 3910–3916.

  • Ge, W., & Collins, R. (2009). Marked point processes for crowd counting. Computer Vision and Pattern Recognition. Miami.

  • Gonzalez, J., Low, Y., Gretton, A., & Guestrin, C. (2011). Parallel Gibbs sampling: From colored fields to thin junction trees. Journal of Machine Learning Research, 15, 324–332.

  • Green, P. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4), 711–732.

  • Grenander, U., & Miller, M. (1994). Representations of knowledge in complex systems. Journal of the Royal Statistical Society, 56(4), 549–603.

  • Han, F., Tu, Z. W., & Zhu, S. (2004). Range image segmentation by an effective jump-diffusion method. Pattern Analysis and Machine Intelligence, 26(9), 1138–1153.

  • Harkness, M., & Green, P. (2000). Parallel chains, delayed rejection and reversible jump MCMC for object recognition. British Machine Vision Conference. Bristol, United Kingdom.

  • Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109.

  • Lacoste, C., Descombes, X., & Zerubia, J. (2005). Point processes for unsupervised line network extraction in remote sensing. Pattern Analysis and Machine Intelligence, 27(10), 1568–1579.

  • Lafarge, F., Gimel’farb, G., & Descombes, X. (2010). Geometric feature extraction by a multi-marked point process. Pattern Analysis and Machine Intelligence, 32(9), 1597–1609.

  • Lafarge, F., & Mallet, C. (2012). Creating large-scale city models from 3d-point clouds: A robust approach with hybrid representation. International Journal of Computer Vision, 99(1), 69–85.

  • Lehmussola, A., Ruusuvuori, P., Selinummi, J., Huttunen, H., & Yli-Harja, O. (2007). Computational framework for simulating fluorescence microscope images with cell populations. IEEE Transactions on Medical Imaging, 26(7), 1010–1016.

  • Lempitsky, V., & Zisserman, A. (2010). Learning to count objects in images. Conference on Neural Information Processing Systems. Vancouver, Canada.

  • Li, S. (2001). Markov random field modeling in image analysis. Berlin: Springer.

  • Lieshout, M. V. (2008). Depth map calculation for a variable number of moving objects using Markov sequential object processes. Pattern Analysis and Machine Intelligence, 30(7), 1308–1312.

  • Liu, J. (2001). Monte Carlo strategies in scientific computing. New York: Springer.

  • Mallet, C., Lafarge, F., Roux, M., Soergel, U., Bretar, F., & Heipke, C. (2010). A marked point process for modeling lidar waveforms. IEEE Transactions on Image Processing, 19(12), 3204–3221.

  • Nguyen, H.-G., Fablet, R., & Bouchet, J. (2010). Spatial statistics of visual keypoints for texture recognition. European Conference on Computer Vision. Heraklion, Greece.

  • Ortner, M., Descombes, X., & Zerubia, J. (2008). A marked point process of rectangles and segments for automatic analysis of digital elevation models. Pattern Analysis and Machine Intelligence, 30(1), 105–119.

  • Rochery, M., Jermyn, I., & Zerubia, J. (2006). Higher order active contours. International Journal of Computer Vision, 69(3), 335–351.

  • Salamon, P., Sibani, P., & Frost, R. (2002). Facts, Conjectures, and Improvements for Simulated Annealing. Philadelphia: SIAM Monographs on Mathematical Modeling and Computation.

  • Srivastava, A., Grenander, U., Jensen, G., & Miller, M. (2002). Jump-Diffusion Markov processes on orthogonal groups for object pose estimation. Journal of Statistical Planning and Inference, 103(1–2), 15–27.

  • Stoica, R. S., Martinez, V., & Saar, E. (2007). A three dimensional object point process for detection of cosmic filaments. Journal of the Royal Statistical Society, 56(4), 459.

  • Sun, K., Sang, N., & Zhang, T. (2007). Marked point process for vascular tree extraction on angiogram. Energy Minimization Methods in Computer Vision and Pattern Recognition. Ezhou, China.

  • Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., et al. (2008). Comparative study of energy minimization methods for Markov random fields with smoothness-based priors. Pattern Analysis and Machine Intelligence, 30(6), 1068.

  • Tu, Z., & Zhu, S. (2002). Image segmentation by data-driven Markov chain Monte Carlo. Pattern Analysis and Machine Intelligence, 24(5), 657–673.

  • Utasi, A., & Benedek, C. (2011). A 3-D marked point process model for multi-view people detection. Conference on Computer Vision and Pattern Recognition. Colorado Springs, US.

  • Verdie, Y., & Lafarge, F. (2012). Efficient Monte Carlo sampler for detecting parametric objects in large scenes. European Conference on Computer Vision. Firenze, Italy.

  • Weiss, Y., & Freeman, W. (2001). On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs. IEEE Transactions on Information Theory, 47(2), 736–744.

  • Zhu, S., Guo, C., Wang, Y., & Xu, Z. (2005). What are textons? International Journal of Computer Vision, 62(1–2), 121–143.

Acknowledgments

This work was partially funded by the European Research Council (ERC Starting Grant “Robust Geometry Processing”, Grant agreement 257474). The authors thank A. Lehmussola, V. Lempitsky, H. Bischof, R. Ehrich, the French Mapping Agency (IGN), the Tour du Valat, and the BRGM for providing the datasets, as well as the reviewers for their valuable comments.

Author information

Correspondence to Florent Lafarge.

Appendices

Appendix 1: Population counting model

Let \(x\) denote a configuration of ellipses whose centers of mass are contained in the compact set \(K\) supporting the input image (see Fig. 19). The energy follows the form specified by Eq. 5. The unitary data term \(D(x_i)\) and the pairwise potential \(V(x_i,x_j)\) are given by:

$$D(x_i) = \begin{cases} 1 - \dfrac{d(x_i)}{d_0} & \text{if } d(x_i) < d_0\\ \exp\left(\dfrac{d_0 - d(x_i)}{d_0}\right) - 1 & \text{otherwise} \end{cases}$$
(20)
$$V(x_i,x_j) = \beta \, \frac{A(x_i \cap x_j)}{\min\left(A(x_i),A(x_j)\right)}$$
(21)

where

  • \(d(x_i)\) represents the Bhattacharyya distance between the radiometry inside and outside the object \(x_i\):

    $$d(x_i) = \frac{(m_{in}-m_{out})^2}{4(\sigma_{in}^2+\sigma_{out}^2)} - \frac{1}{2} \ln\left( \frac{2\,\sigma_{in}\sigma_{out}}{\sigma_{in}^2+\sigma_{out}^2}\right)$$
    (22)

    where \(m_{in}\) and \(\sigma _{in}\) (respectively \(m_{out}\) and \(\sigma _{out}\)) are the intensity mean and standard deviation in \(S_{in}\) (respectively in \(S_{out}\)).

  • \(d_0\) is a coefficient fixing the sensitivity of the object fitting. The higher the value of \(d_0\), the more selective the object fitting. In particular, \(d_0\) has to be high when the input images are corrupted by a significant amount of noise.

  • \(A(x_i)\) is the area of object \(x_i\).

  • \(\beta \) is a coefficient weighting the non-overlapping constraint with respect to the data term.

Note that in practice a basic morphological dilation is used to roughly extract the class of interest from the image of birds when creating the space-partitioning tree.
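
For concreteness, here is a minimal C++ sketch of the terms above (Eqs. 20–22). It assumes the intensity statistics over \(S_{in}\) and \(S_{out}\), as well as the ellipse areas and their intersection area, are computed elsewhere by the geometry layer; the function and parameter names are illustrative and not taken from the authors' implementation.

```cpp
#include <algorithm>
#include <cmath>

// Bhattacharyya distance between the radiometry inside and outside an
// object (Eq. 22), from precomputed intensity means and standard deviations.
double bhattacharyya(double m_in, double s_in, double m_out, double s_out) {
    const double v = s_in * s_in + s_out * s_out;
    return (m_in - m_out) * (m_in - m_out) / (4.0 * v)
         - 0.5 * std::log(2.0 * s_in * s_out / v);
}

// Unitary data term D(x_i) of Eq. 20, controlled by the sensitivity d0.
double data_term(double d, double d0) {
    return (d < d0) ? 1.0 - d / d0
                    : std::exp((d0 - d) / d0) - 1.0;
}

// Pairwise overlap potential of Eq. 21; the intersection area is assumed
// to be computed elsewhere (e.g. by rasterizing the two ellipses).
double overlap_potential(double area_i, double area_j,
                         double area_intersection, double beta) {
    return beta * area_intersection / std::min(area_i, area_j);
}
```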

Fig. 19 Objects and their parameters for the various presented models. (Left) Ellipses and (middle) line-segments are defined by a 2D point \(p \in K\) (the center of mass of the object) and some marks. These additional parameters are the semi-major axis \(b\), the semi-minor axis \(a\), and the angle \(\theta \) for an ellipse, and the semi-length \(b\), the semi-width \(a\), the orientation \(\theta \), and the anchor length \(c\) for a line-segment. The inside (respectively bordering) volume of the object is denoted by \(S_{in}\) (respectively \(S_{out}\)). The anchors are denoted by \(A_1\) and \(A_2\). (Right) 3D trees are defined by a 3D point \(p \in K\) (the center of mass of the object), a type \(t \in \{\)conoidal, ellipsoidal, semi-ellipsoidal\(\}\) illustrated in Fig. 20, and three additional parameters: the canopy height \(a\), the trunk height \(b\), and the canopy diameter \(c\). The cylindrical volume \(\mathcal C x_i\) represents the attraction space of object \(x_i\), in which the input points are used to measure the quality of this object.

Appendix 2: Line-network extraction model

A line-segment is defined by five parameters, including the 2D point corresponding to the center of mass of the object (Fig. 19). Similarly to the population counting model detailed in Appendix 1, the fitting quality with respect to the data is based on the Bhattacharyya distance: the unitary data term \(D(x_i)\) of the energy is given by Eq. 20. The potential \(V(x_i,x_j)\) penalizes strong object overlaps (see Eq. 21), but also takes into account a connection interaction in order to favor the linking of the line-segments. The potential term is thus given by:

$$V(x_i,x_j) = \beta_1 \frac{A(x_i \cap x_j)}{\min\left(A(x_i),A(x_j)\right)} + \mathbf{1}_{x_i \sim_{nc} x_j}\, \beta_2\, f(x_i,x_j)$$
(23)

where

  • \(\beta _1\) and \(\beta _2\) are two coefficients weighting respectively the non-overlapping and connection constraints with respect to the data term.

  • \(\sim _{nc}\) is the non-connection relationship between two objects. \(x_i \sim _{nc} x_j\) if the anchor areas of \(x_i\) and \(x_j\) (see Fig. 19) do not overlap.

  • \(\mathbf 1 _{condition}\) is the indicator function, returning one when the condition holds and zero otherwise.

  • \(f(x_i,x_j)\) is a symmetric function weighting the penalization of two non-connected objects \(x_i\) and \(x_j\) with respect to their average fitting quality. The function \(f\) is introduced to slightly relax the connection constraint when the two objects are of very good quality.

As for the bird counting problem, a basic morphological dilation is used to roughly extract the class of interest from the aerial image shown in Fig. 14: the pixels corresponding to the road class are relatively bright compared to the background. The segmentation is obviously not optimal, but it is sufficient to create an efficient space-partitioning tree.
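
As an illustration, the following C++ sketch evaluates the pairwise potential of Eq. 23, assuming the intersection area, the anchor-overlap test, and the per-segment fitting qualities are provided by the rest of the pipeline. The concrete form of \(f\) below is a hypothetical choice, not the exact function used in the paper.

```cpp
#include <algorithm>

// Pairwise potential of Eq. 23 for two line-segments x_i and x_j.
// area_i, area_j, area_intersection and the anchor-overlap test are assumed
// to be computed by the geometry layer; q_i and q_j stand for the fitting
// quality of each segment (e.g. derived from the data term of Eq. 20).
double pairwise_potential(double area_i, double area_j,
                          double area_intersection,
                          bool anchors_overlap,  // do the anchor areas meet?
                          double q_i, double q_j,
                          double beta1, double beta2) {
    // Non-overlapping constraint, identical in form to Eq. 21.
    double v = beta1 * area_intersection / std::min(area_i, area_j);

    // Connection term: only active when x_i ~_nc x_j, i.e. when the anchor
    // areas do not overlap. The symmetric relaxation f(x_i, x_j) used here
    // is a hypothetical choice that softens the penalty for well-fitted
    // segments (qualities assumed in [0, 1]).
    if (!anchors_overlap) {
        const double f = 1.0 - 0.5 * (q_i + q_j);
        v += beta2 * f;
    }
    return v;
}
```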

Appendix 3: Tree recognition model formulation

Let \(x\) represent a configuration of 3D tree models from the template library described in Fig. 20. The center of mass \(p\) of a tree is contained in the compact set \(K\) supporting the 3D bounding box of the input point cloud (Fig. 19). We denote by \(\partial x_i\) the surface of the object \(x_i\), and by \(\mathcal C x_i\) the cylindrical volume with a vertical axis passing through the center of mass of \(x_i\), in which the input points are considered to measure the quality of \(x_i\). The unitary data term \(D(x_i)\) and the pairwise potential \(V(x_i,x_j)\) are given by:

$$D(x_i) = \frac{1}{|\mathcal C x_i|} \prod_{p_c \in \mathcal C x_i} \gamma\left( d(p_c, \partial x_i)\right)$$
(24)
$$V(x_i,x_j) = \beta_1 V_{overlap}(x_i,x_j) + \beta_2 V_{competition}(x_i,x_j)$$
(25)

where

  • \(|\mathcal C x_i|\) is a coefficient normalizing the unitary data term with respect to the number of input points contained in \(\mathcal C x_i\).

  • \(d(p_c, \partial x_i)\) is a distance measuring the coherence of the point \(p_c\) with respect to the object surface \(\partial x_i\). \(d\) is not the traditional orthogonal point-to-surface distance because, as real trees do not describe ellipsoidal/conoidal shapes, the input points are not homogeneously distributed on the object surface. Here, \(d\) combines the planimetric distance, i.e. the Euclidean distance measured after projection onto the plane \(z=0\), with the altimetric variation, such that points outside the object are penalized more strongly than points inside. Note that \(d\) is invariant under rotation about the Z-axis.

  • \(\gamma (\cdot) \in [-1,1]\) is a strictly increasing quality function.

  • \(V_{overlap}\) is the pairwise potential penalizing strong overlapping between two objects, and given by:

    $$\begin{aligned} V_{overlap}(x_i,x_j) = \frac{A(x_i \cap x_j)}{\min (A(x_i),A(x_j))} \end{aligned}$$
    (26)

    where \(A(x_i)\) is the area of the object \(x_i\) projected onto the plane of equation \(z=0\).

  • \(V_{competition}\) is the pairwise potential favoring a similar tree type \(t\) in a local neighborhood:

    $$\begin{aligned} V_{competition}(x_i,x_j) = \mathbf 1 _{t_i \ne t_j} \end{aligned}$$
    (27)

    where \(\mathbf 1 _{(\cdot)}\) is the indicator function.

  • \(\beta _1\) and \(\beta _2\) are two coefficients weighting respectively the non-overlapping constraint and the competition term with respect to the data term.

In order to roughly extract the class of interest from the point clouds, the scatter descriptor proposed by Lafarge and Mallet (2012) is used to identify the points which potentially correspond to trees.
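
The pairwise terms of Eqs. 25–27 are simple enough to sketch directly. The C++ fragment below assumes the projected areas and their intersection are precomputed; the type enumeration mirrors the template library of Fig. 20, and the names are illustrative only.

```cpp
#include <algorithm>

// Canopy types from the template library of Fig. 20.
enum class TreeType { Conoidal, Ellipsoidal, SemiEllipsoidal };

// Overlap potential of Eq. 26; the areas are those of the two tree models
// projected onto the plane z = 0, assumed precomputed.
double v_overlap(double area_i, double area_j, double area_intersection) {
    return area_intersection / std::min(area_i, area_j);
}

// Competition potential of Eq. 27: penalizes neighboring trees of
// different types so as to favor locally homogeneous species.
double v_competition(TreeType t_i, TreeType t_j) {
    return (t_i != t_j) ? 1.0 : 0.0;
}

// Full pairwise potential of Eq. 25.
double pairwise_potential(double area_i, double area_j,
                          double area_intersection,
                          TreeType t_i, TreeType t_j,
                          double beta1, double beta2) {
    return beta1 * v_overlap(area_i, area_j, area_intersection)
         + beta2 * v_competition(t_i, t_j);
}
```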

Fig. 20 Library of tree models: the objects are specified by a 3D point (center of mass, illustrated by a red dot) and additional parameters (blue arrows), including the canopy type, whose shape can be conoidal (e.g. pine or fir), ellipsoidal (e.g. poplar or tilia), or semi-ellipsoidal (e.g. oak or maple).

Cite this article

Verdié, Y., Lafarge, F. Detecting parametric objects in large scenes by Monte Carlo sampling. Int J Comput Vis 106, 57–75 (2014). https://doi.org/10.1007/s11263-013-0641-0
