International Journal of Computer Vision, Volume 107, Issue 2, pp 155–176

The Shape Boltzmann Machine: A Strong Model of Object Shape

  • S. M. Ali Eslami
  • Nicolas Heess
  • Christopher K. I. Williams
  • John Winn

Abstract

A good model of object shape is essential in applications such as segmentation, detection, inpainting and graphics. For example, when performing segmentation, local constraints on the shapes can help where object boundaries are noisy or unclear, and global constraints can resolve ambiguities where background clutter looks similar to parts of the objects. In general, the stronger the model of shape, the more performance is improved. In this paper, we use a type of deep Boltzmann machine (Salakhutdinov and Hinton, International Conference on Artificial Intelligence and Statistics, 2009) that we call a Shape Boltzmann Machine (SBM) for the task of modeling foreground/background (binary) and parts-based (categorical) shape images. We show that the SBM characterizes a strong model of shape, in that samples from the model look realistic and it can generalize to generate samples that differ from training examples. We find that the SBM learns distributions that are qualitatively and quantitatively better than existing models for this task.
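For readers unfamiliar with deep Boltzmann machines, the following is a minimal sketch of block Gibbs sampling in a generic binary DBM with two hidden layers, in the spirit of the Salakhutdinov and Hinton (2009) model cited above. It is not the authors' implementation: the abstract does not describe the SBM's specific architecture, so the layer sizes, the random weights and all function names (`energy`, `gibbs_step`, `run_chain`) are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_bernoulli(probs, rng):
    """Draw one binary value per probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def energy(v, h1, h2, W1, W2, b, c1, c2):
    """Energy of a joint state (v, h1, h2) in a two-hidden-layer binary DBM."""
    e = -sum(bi * vi for bi, vi in zip(b, v))
    e -= sum(ci * hi for ci, hi in zip(c1, h1))
    e -= sum(ci * hi for ci, hi in zip(c2, h2))
    e -= sum(v[i] * W1[i][j] * h1[j]
             for i in range(len(v)) for j in range(len(h1)))
    e -= sum(h1[j] * W2[j][k] * h2[k]
             for j in range(len(h1)) for k in range(len(h2)))
    return e


def gibbs_step(v, h2, W1, W2, b, c1, c2, rng):
    """One sweep: sample h1 given (v, h2), then v and h2 given h1."""
    n_v, n_h1, n_h2 = len(b), len(c1), len(c2)
    # The middle layer receives input from both neighbouring layers.
    p_h1 = [sigmoid(c1[j]
                    + sum(v[i] * W1[i][j] for i in range(n_v))
                    + sum(W2[j][k] * h2[k] for k in range(n_h2)))
            for j in range(n_h1)]
    h1 = sample_bernoulli(p_h1, rng)
    # Given h1, the visible and top layers are conditionally independent.
    p_v = [sigmoid(b[i] + sum(W1[i][j] * h1[j] for j in range(n_h1)))
           for i in range(n_v)]
    p_h2 = [sigmoid(c2[k] + sum(h1[j] * W2[j][k] for j in range(n_h1)))
            for k in range(n_h2)]
    return sample_bernoulli(p_v, rng), h1, sample_bernoulli(p_h2, rng)


def run_chain(n_v=16, n_h1=8, n_h2=4, n_steps=50, seed=0):
    """Run a Gibbs chain on a DBM with random (untrained) weights."""
    rng = random.Random(seed)
    W1 = [[rng.gauss(0.0, 0.1) for _ in range(n_h1)] for _ in range(n_v)]
    W2 = [[rng.gauss(0.0, 0.1) for _ in range(n_h2)] for _ in range(n_h1)]
    b, c1, c2 = [0.0] * n_v, [0.0] * n_h1, [0.0] * n_h2
    v = sample_bernoulli([0.5] * n_v, rng)
    h2 = sample_bernoulli([0.5] * n_h2, rng)
    for _ in range(n_steps):
        v, h1, h2 = gibbs_step(v, h2, W1, W2, b, c1, c2, rng)
    return v, h1, h2


if __name__ == "__main__":
    v, h1, h2 = run_chain()
    print("visible sample:", v)
```

With trained weights, the visible vector `v` would be reshaped into a binary shape image; here the weights are random, so the output only demonstrates the mechanics of the sampler.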

Keywords

Shape, Generative, Deep Boltzmann machine, Sampling

References

  1. Ackley, D., Hinton, G., & Sejnowski, T. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1), 147–169.
  2. Alexe, B., Deselaers, T., & Ferrari, V. (2010a). ClassCut for unsupervised class segmentation. In European Conference on Computer Vision (pp. 380–393).
  3. Alexe, B., Deselaers, T., & Ferrari, V. (2010b). What is an object? In IEEE Conference on Computer Vision and Pattern Recognition (pp. 73–80).
  4. Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., & Davis, J. (2005). SCAPE: Shape completion and animation of people. ACM Transactions on Graphics (SIGGRAPH), 24(3), 408–416.
  5. Bertozzi, A., Esedoglu, S., & Gillette, A. (2007). Inpainting of binary images using the Cahn–Hilliard equation. IEEE Transactions on Image Processing, 16(1), 285–291.
  6. Bo, Y., & Fowlkes, C. (2011). Shape-based pedestrian parsing. In IEEE Conference on Computer Vision and Pattern Recognition 2011.
  7. Borenstein, E., Sharon, E., & Ullman, S. (2004). Combining top-down and bottom-up segmentation. In CVPR Workshop on Perceptual Organization in Computer Vision.
  8. Boykov, Y., & Jolly, M. P. (2001). Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In International Conference on Computer Vision 2001 (pp. 105–112).
  9. Bridle, J. S. (1990). Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters. In Advances in Neural Information Processing Systems (Vol. 2, pp. 211–217).
  10. Cemgil, T., Zajdel, W., & Krose, B. (2005). A hybrid graphical model for robust feature extraction from video. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1158–1165).
  11. Chan, T. F., & Shen, J. (2001). Nontexture inpainting by curvature-driven diffusions. Journal of Visual Communication and Image Representation, 12(4), 436–449.
  12. Chen, F., Yu, H., Hu, R., & Zeng, X. (2013). Deep learning shape priors for object segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1870–1877).
  13. Cootes, T., Taylor, C., Cooper, D. H., & Graham, J. (1995). Active shape models - their training and application. Computer Vision and Image Understanding, 61, 38–59.
  14. Desjardins, G., & Bengio, Y. (2008). Empirical evaluation of convolutional RBMs for vision. Tech. Rep. 1327, Département d’Informatique et de Recherche Opérationnelle, Université de Montréal.
  15. Eslami, S. M. A., & Williams, C. K. I. (2011). Factored shapes and appearances for parts-based object understanding. In British Machine Vision Conference 2011 (pp. 18.1–18.12).
  16. Eslami, S. M. A., & Williams, C. K. I. (2012). A generative model for parts-based object segmentation. In P. Bartlett, F. Pereira, C. Burges, L. Bottou, & K. Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 25, pp. 100–107). Red Hook, NY: Curran Associates, Inc.
  17. Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In IEEE Conference on Computer Vision and Pattern Recognition 2004, Workshop on Generative-Model Based Vision.
  18. Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2009). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 99, 1–19.
  19. Freund, Y., & Haussler, D. (1994). Unsupervised learning of distributions on binary vectors using two layer networks. Tech. Rep. UCSC-CRL-94-25. Santa Cruz: University of California.
  20. Frey, B., Jojic, N., & Kannan, A. (2003). Learning appearance and transparency manifolds of occluded objects in layers. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 45–52).
  21. Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
  22. Gavrila, D. M. (2007). A Bayesian, exemplar-based approach to hierarchical shape matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 1408–1421.
  23. Harzallah, H., Jurie, F., & Schmid, C. (2009). Combining efficient object localization and image classification. In International Conference on Computer Vision.
  24. Heess, N., Roux, N. L., & Winn, J. M. (2011). Weakly supervised learning of foreground-background segmentation using masked RBMs. In International Conference on Artificial Neural Networks (Vol. 2, pp. 9–16).
  25. Hinton, G. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771–1800.
  26. Jojic, N., & Caspi, Y. (2004). Capturing image structure with probabilistic index maps. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 212–219).
  27. Jojic, N., Perina, A., Cristani, M., Murino, V., & Frey, B. (2009). Stel component analysis: Modeling spatial correlations in image class structure. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2044–2051).
  28. Kapoor, A., & Winn, J. (2006). Located hidden random fields: Learning discriminative parts for object detection. In European Conference on Computer Vision (pp. 302–315).
  29. Kohli, P., Kumar, M. P., & Torr, P. H. S. (2007). P3 & beyond: Solving energies with higher order cliques. In IEEE Conference on Computer Vision and Pattern Recognition.
  30. Kohli, P., Ladicky, L., & Torr, P. H. S. (2009). Robust higher order potentials for enforcing label consistency. International Journal of Computer Vision, 82(3), 302–324.
  31. Komodakis, N., & Paragios, N. (2009). Beyond pairwise energies: Efficient optimization for higher-order MRFs. In IEEE Conference on Computer Vision and Pattern Recognition 2009 (pp. 2985–2992).
  32. Kumar, P., Torr, P., & Zisserman, A. (2005). OBJ CUT. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 18–25).
  33. Lampert, C. H., Blaschko, M., & Hofmann, T. (2008). Beyond sliding windows: Object localization by efficient subwindow search. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–8).
  34. Le Roux, N., & Bengio, Y. (2008). Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation, 20(6), 1631–1649.
  35. Le Roux, N., Heess, N., Shotton, J., & Winn, J. (2011). Learning a generative model of images by factoring appearance and shape. Neural Computation, 23(3), 593–650.
  36. Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In International Conference on Machine Learning (pp. 609–616).
  37. Morris, R. D., Descombes, X., & Zerubia, J. (1996). The Ising/Potts model is not well suited to segmentation tasks. In Proceedings of the IEEE Digital Signal Processing Workshop.
  38. Murray, I., & Salakhutdinov, R. (2009). Evaluating probabilities under high-dimensional latent variable models. In Advances in Neural Information Processing Systems (Vol. 21).
  39. Neal, R. M. (1992). Connectionist learning of belief networks. Artificial Intelligence, 56, 71–113.
  40. Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing, 11(2), 125–139.
  41. Norouzi, M., Ranjbar, M., & Mori, G. (2009). Stacks of convolutional restricted Boltzmann machines for shift-invariant feature learning. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2735–2742).
  42. Nowozin, S., & Lampert, C. H. (2009). Global connectivity potentials for random field models. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 818–825).
  43. Raina, R., Madhavan, A., & Ng, A. Y. (2009). Large-scale deep unsupervised learning using graphics processors. In International Conference on Machine Learning (pp. 873–880).
  44. Ranzato, M., Mnih, V., & Hinton, G. E. (2010). How to generate realistic images using gated MRFs. In J. Lafferty, C. K. I. Williams, R. Zemel, J. Shawe-Taylor, & A. Culotta (Eds.), Advances in Neural Information Processing Systems (Vol. 23). Cambridge: MIT Press.
  45. Ranzato, M., Susskind, J., Mnih, V., & Hinton, G. E. (2011). On deep generative models with applications to recognition. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2857–2864).
  46. Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400–407.
  47. Roth, S., & Black, M. J. (2005). Fields of experts: A framework for learning image priors. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 860–867).
  48. Rother, C., Kolmogorov, V., & Blake, A. (2004). “GrabCut”: Interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics (SIGGRAPH), 23, 309–314.
  49. Rother, C., Kohli, P., Feng, W., & Jia, J. (2009). Minimizing sparse higher order energy functions of discrete variables. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 1382–1389).
  50. Rowley, H., Baluja, S., & Kanade, T. (1998). Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1), 23–38.
  51. Russell, B., Torralba, A., Murphy, K., & Freeman, W. (2008). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77, 157–173.
  52. Salakhutdinov, R., & Hinton, G. (2009). Deep Boltzmann machines. In International Conference on Artificial Intelligence and Statistics 2009 (Vol. 5, pp. 448–455).
  53. Salakhutdinov, R., & Murray, I. (2008). On the quantitative analysis of deep belief networks. In International Conference on Machine Learning 2008.
  54. Schneiderman, H. (2000). A statistical approach to 3D object detection applied to faces and cars. PhD Thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.
  55. Shekhovtsov, A., Kohli, P., & Rother, C. (2012). Curvature prior for MRF-based segmentation and shape inpainting. In DAGM/OAGM Symposium (pp. 41–51).
  56. Sigal, L., Balan, A., & Black, M. (2010). HumanEva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. International Journal of Computer Vision, 87(1–2), 4–27.
  57. Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., & Gool, L. V. (2009). Using multi-view recognition and meta-data annotation to guide a robot’s attention. International Journal of Robotics Research, 28(8), 976–998.
  58. Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient. In International Conference on Machine Learning 2008 (pp. 1064–1071).
  59. Tjelmeland, H., & Besag, J. (1998). Markov random fields with higher-order interactions. Scandinavian Journal of Statistics, 25(3), 415–433.
  60. Williams, C. K. I., & Titsias, M. (2004). Greedy learning of multiple objects in images using robust statistics and factorial learning. Neural Computation, 16(5), 1039–1062.
  61. Winn, J., & Jojic, N. (2005). LOCUS: Learning object classes with unsupervised segmentation. In International Conference on Computer Vision (pp. 756–763).
  62. Younes, L. (1999). On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. In Stochastics and Stochastics Reports (Vol. 65, pp. 177–228).
  63. Younes, L., & Sud, P. (1989). Parametric inference for imperfectly observed Gibbsian fields. Probability Theory and Related Fields, 82, 625–645.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • S. M. Ali Eslami (1)
  • Nicolas Heess (2)
  • Christopher K. I. Williams (1)
  • John Winn (3)

  1. School of Informatics, University of Edinburgh, Edinburgh, UK
  2. Gatsby Computational Neuroscience Unit, University College London, London, UK
  3. Microsoft Research, Cambridge, UK
