
The Shape Boltzmann Machine: A Strong Model of Object Shape

International Journal of Computer Vision

Abstract

A good model of object shape is essential in applications such as segmentation, detection, inpainting and graphics. For example, when performing segmentation, local constraints on the shapes can help where object boundaries are noisy or unclear, and global constraints can resolve ambiguities where background clutter looks similar to parts of the objects. In general, the stronger the model of shape, the greater the improvement in performance. In this paper, we use a type of deep Boltzmann machine (Salakhutdinov and Hinton, International Conference on Artificial Intelligence and Statistics, 2009) that we call a Shape Boltzmann Machine (SBM) for the task of modeling foreground/background (binary) and parts-based (categorical) shape images. We show that the SBM constitutes a strong model of shape, in that samples from the model look realistic and it can generalize to generate samples that differ from training examples. We find that the SBM learns distributions that are qualitatively and quantitatively better than those of existing models for this task.
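
The abstract does not spell out the SBM's architecture, so the sketch below is only a minimal illustration of the kind of model involved: block Gibbs sampling in a generic two-layer binary Boltzmann machine over a flattened shape image. All names and sizes here (W1, W2, b_v, b_h1, b_h2, the 32x32 image, the layer widths) are hypothetical assumptions, the randomly initialised weights merely stand in for parameters that would be learned from shape data, and the model described in the paper uses structured connectivity rather than the dense weights assumed below.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p):
    # Draw independent binary samples with probabilities p.
    return (rng.random(p.shape) < p).astype(np.float64)

# Hypothetical sizes: a 32x32 binary shape image and two hidden layers.
D, H1, H2 = 32 * 32, 500, 100

# Randomly initialised parameters stand in for weights learned from shape data.
W1 = 0.01 * rng.standard_normal((D, H1))   # visible  <-> hidden layer 1
W2 = 0.01 * rng.standard_normal((H1, H2))  # hidden 1 <-> hidden layer 2
b_v = np.zeros(D)
b_h1 = np.zeros(H1)
b_h2 = np.zeros(H2)

def gibbs_sample(n_steps=200):
    """Block Gibbs sampling in a two-layer binary Boltzmann machine.

    Given h1, the layers v and h2 are conditionally independent, so each
    sweep alternates between the two blocks {v, h2} and {h1}.
    """
    v = sample_bernoulli(np.full(D, 0.5))
    h2 = sample_bernoulli(np.full(H2, 0.5))
    for _ in range(n_steps):
        # p(h1 | v, h2): input from both neighbouring layers.
        h1 = sample_bernoulli(sigmoid(v @ W1 + W2 @ h2 + b_h1))
        # p(v | h1) and p(h2 | h1): sampled together, given h1.
        v = sample_bernoulli(sigmoid(W1 @ h1 + b_v))
        h2 = sample_bernoulli(sigmoid(h1 @ W2 + b_h2))
    return v.reshape(32, 32)  # a binary "shape" sample from this (untrained) toy model

if __name__ == "__main__":
    shape_sample = gibbs_sample()
    print(shape_sample.shape, shape_sample.mean())

With trained weights, the same alternating conditional updates would be used to draw the shape samples and shape completions discussed in the paper; here the output is simply noise because the parameters are random.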

Notes

  1. http://msri.org/people/members/eranb.

  2. http://vision.caltech.edu/Image_Datasets/Caltech101.

  3. We set \(S=10,000\) in our experiments.

References

  • Ackley, D., Hinton, G., & Sejnowski, T. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1), 147–169.

  • Alexe, B., Deselaers, T., & Ferrari, V. (2010a). ClassCut for unsupervised class segmentation. In European Conference on Computer Vision (pp. 380–393).

  • Alexe, B., Deselaers, T., & Ferrari, V. (2010b). What is an object? In IEEE Conference on Computer Vision and Pattern Recognition (pp. 73–80).

  • Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., & Davis, J. (2005). SCAPE: Shape completion and animation of people. ACM Transactions on Graphics (SIGGRAPH), 24(3), 408–416.

  • Bertozzi, A., Esedoglu, S., & Gillette, A. (2007). Inpainting of binary images using the Cahn–Hilliard equation. IEEE Transactions on Image Processing, 16(1), 285–291.

  • Bo, Y., & Fowlkes, C. (2011). Shape-based pedestrian parsing. In IEEE Conference on Computer Vision and Pattern Recognition 2011.

  • Borenstein, E., Sharon, E., & Ullman, S. (2004). Combining top-down and bottom-up segmentation. In CVPR Workshop on Perceptual Organization in Computer Vision.

  • Boykov, Y., & Jolly, M. P. (2001). Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In International Conference on Computer Vision 2001 (pp. 105–112).

  • Bridle, J. S. (1990). Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters. In Advances in Neural Information Processing Systems (Vol. 2, pp. 211–217).

  • Cemgil, T., Zajdel, W., & Krose, B. (2005). A hybrid graphical model for robust feature extraction from video. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1158–1165).

  • Chan, T. F., & Shen, J. (2001). Nontexture inpainting by curvature-driven diffusions. Journal of Visual Communication and Image Representation, 12(4), 436–449.

  • Chen, F., Yu, H., Hu, R., & Zeng, X. (2013). Deep learning shape priors for object segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1870–1877).

  • Cootes, T., Taylor, C., Cooper, D. H., & Graham, J. (1995). Active shape models—Their training and application. Computer Vision and Image Understanding, 61, 38–59.

  • Desjardins, G., & Bengio, Y. (2008). Empirical evaluation of convolutional RBMs for vision. Tech. Rep. 1327, Département d’Informatique et de Recherche Opérationnelle, Université de Montréal.

  • Eslami, S. M. A., & Williams, C. K. I. (2011). Factored shapes and appearances for parts-based object understanding. In British Machine Vision Conference 2011 (pp. 18.1–18.12).

  • Eslami, S. M. A., & Williams, C. K. I. (2012). A generative model for parts-based object segmentation. In P. Bartlett, F. Pereira, C. Burges, L. Bottou, & K. Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 25, pp. 100–107). Red Hook, NY: Curran Associates, Inc.

  • Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In IEEE Conference on Computer Vision and Pattern Recognition 2004, Workshop on Generative-Model Based Vision.

  • Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2009). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99, 1–19.

  • Freund, Y., & Haussler, D. (1994). Unsupervised learning of distributions on binary vectors using two layer networks, Tech. Rep. UCSC-CRL-94-25. Santa Cruz: University of California.

  • Frey, B., Jojic, N., & Kannan, A. (2003). Learning appearance and transparency manifolds of occluded objects in layers. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 45–52).

  • Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.

  • Gavrila, D. M. (2007). A Bayesian, exemplar-based approach to hierarchical shape matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 1408–1421.

  • Harzallah, H., Jurie, F., & Schmid, C. (2009). Combining efficient object localization and image classification. In International Conference on Computer Vision.

  • Heess, N., Roux, N. L., & Winn, J. M. (2011). Weakly supervised learning of foreground-background segmentation using masked RBMs. In International Conference on Artificial Neural Networks (Vol. 2, pp. 9–16).

  • Hinton, G. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771–1800.

  • Jojic, N., & Caspi, Y. (2004). Capturing image structure with probabilistic index maps. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 212–219).

  • Jojic, N., Perina, A., Cristani, M., Murino, V., & Frey, B. (2009). Stel component analysis: Modeling spatial correlations in image class structure. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2044–2051).

  • Kapoor, A., & Winn, J. (2006). Located hidden random fields: Learning discriminative parts for object detection. In European Conference on Computer Vision (pp. 302–315).

  • Kohli, P., Kumar, M. P., & Torr, P. H. S. (2007). P3 & beyond: Solving energies with higher order cliques. In IEEE Conference on Computer Vision and Pattern Recognition.

  • Kohli, P., Ladicky, L., & Torr, P. H. S. (2009). Robust higher order potentials for enforcing label consistency. International Journal of Computer Vision, 82(3), 302–324.

  • Komodakis, N., & Paragios, N. (2009). Beyond pairwise energies: Efficient optimization for higher-order MRFs. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2985–2992).

  • Kumar, P., Torr, P., & Zisserman, A. (2005). OBJ CUT. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 18–25).

  • Lampert, C. H., Blaschko, M., & Hofmann, T. (2008). Beyond sliding windows: Object localization by efficient subwindow search. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–8).

  • Le Roux, N., & Bengio, Y. (2008). Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation, 20(6), 1631–1649.

  • Le Roux, N., Heess, N., Shotton, J., & Winn, J. (2011). Learning a generative model of images by factoring appearance and shape. Neural Computation, 23(3), 593–650.

  • Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In International Conference on Machine Learning (pp. 609–616).

  • Morris, R. D., Descombes, X., & Zerubia, J. (1996). The Ising/Potts model is not well suited to segmentation tasks. In Proceedings of the IEEE Digital Signal Processing Workshop.

  • Murray, I., & Salakhutdinov, R. (2009). Evaluating probabilities under high-dimensional latent variable models. In Advances in Neural Information Processing Systems (Vol. 21).

  • Neal, R. M. (1992). Connectionist learning of belief networks. Artificial Intelligence, 56, 71–113.

  • Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing, 11(2), 125–139.

  • Norouzi, M., Ranjbar, M., & Mori, G. (2009). Stacks of convolutional restricted Boltzmann machines for shift-invariant feature learning. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2735–2742).

  • Nowozin, S., & Lampert, C. H. (2009). Global connectivity potentials for random field models. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 818–825).

  • Raina, R., Madhavan, A., & Ng, A. Y. (2009). Large-scale deep unsupervised learning using graphics processors. In International Conference on Machine Learning (pp. 873–880).

  • Ranzato, M., Mnih, V., & Hinton, G. E. (2010). How to generate realistic images using gated MRFs. In J. Lafferty, C. K. I. Williams, R. Zemel, J. Shawe-Taylor, & A. Culotta (Eds.), Advances in Neural Information Processing Systems (Vol. 23). Cambridge: MIT Press.

  • Ranzato, M., Susskind, J., Mnih, V., & Hinton, G. E. (2011). On deep generative models with applications to recognition. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2857–2864).

  • Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400–407.

  • Roth, S., & Black, M. J. (2005). Fields of experts: A framework for learning image priors. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 860–867).

  • Rother, C., Kolmogorov, V., & Blake, A. (2004). “GrabCut”: Interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics (SIGGRAPH), 23, 309–314.

  • Rother, C., Kohli, P., Feng, W., & Jia, J. (2009). Minimizing sparse higher order energy functions of discrete variables. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1382–1389).

  • Rowley, H., Baluja, S., & Kanade, T. (1998). Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1), 23–38.

  • Russell, B., Torralba, A., Murphy, K., & Freeman, W. (2008). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77, 157–173.

  • Salakhutdinov, R., & Hinton, G. (2009). Deep Boltzmann machines. In International Conference on Artificial Intelligence and Statistics (Vol. 5, pp. 448–455).

  • Salakhutdinov, R., & Murray, I. (2008). On the quantitative analysis of deep belief networks. In International Conference on Machine Learning 2008.

  • Schneiderman, H. (2000). A statistical approach to 3D object detection applied to faces and cars. PhD Thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.

  • Shekhovtsov, A., Kohli, P., & Rother, C. (2012). Curvature prior for MRF-based segmentation and shape inpainting. In DAGM/OAGM Symposium (pp. 41–51).

  • Sigal, L., Balan, A., & Black, M. (2010). HumanEva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. International Journal of Computer Vision, 87(1–2), 4–27.

  • Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., & Gool, L. V. (2009). Using multi-view recognition and meta-data annotation to guide a robot’s attention. International Journal of Robotics Research, 28(8), 976–998.

  • Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient. In International Conference on Machine Learning 2008 (pp. 1064–1071).

  • Tjelmeland, H., & Besag, J. (1998). Markov random fields with higher-order interactions. Scandinavian Journal of Statistics, 25(3), 415–433.

  • Williams, C. K. I., & Titsias, M. (2004). Greedy learning of multiple objects in images using robust statistics and factorial learning. Neural Computation, 16(5), 1039–1062.

  • Winn, J., & Jojic, N. (2005). LOCUS: Learning object classes with unsupervised segmentation. In International Conference on Computer Vision (pp. 756–763).

  • Younes, L. (1999). On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. Stochastics and Stochastics Reports, 65, 177–228.

  • Younes, L. (1989). Parametric inference for imperfectly observed Gibbsian fields. Probability Theory and Related Fields, 82, 625–645.

Acknowledgments

The majority of this work was performed whilst AE and NH were at Microsoft Research in Cambridge. Thanks to Charless Fowlkes and Vittorio Ferrari for access to datasets, and to Pushmeet Kohli for valuable discussions. AE acknowledges funding from the Carnegie Trust, the SORSAS scheme, and the IST Programme of the European Community under the PASCAL2 Network of Excellence (IST-2007-216886). NH acknowledges funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 270327, and from the Gatsby Charitable Foundation. Finally, we thank the anonymous referees for their comments, which helped improve the paper.

Author information

Corresponding author: S. M. Ali Eslami.

Cite this article

Eslami, S.M.A., Heess, N., Williams, C.K.I. et al. The Shape Boltzmann Machine: A Strong Model of Object Shape. Int J Comput Vis 107, 155–176 (2014). https://doi.org/10.1007/s11263-013-0669-1
