Probabilistic models of vision and max-margin methods

  • Alan Yuille
  • Xuming He
Research Article

Abstract

It is attractive to formulate problems in computer vision and related fields in terms of probabilistic estimation where the probability models are defined over graphs, such as grammars. The graphical structures, and the state variables defined over them, give a rich knowledge representation that can describe the complex structures of objects and images. The probability distributions defined over the graphs capture the statistical variability of these structures. These probability models can be learnt from training data with limited amounts of supervision. But learning these models suffers from the difficulty of evaluating the normalization constant, or partition function, of the probability distributions, which can be extremely computationally demanding. This paper shows that by placing bounds on the normalization constant we can obtain computationally tractable approximations. Surprisingly, for certain choices of loss functions, we obtain many of the standard max-margin criteria used in support vector machines (SVMs) and hence reduce the learning to standard machine learning methods. We show that many machine learning methods can be obtained in this way as approximations to probabilistic methods, including multi-class max-margin, ordinal regression, max-margin Markov networks and parsers, multiple-instance learning, and latent SVM. We illustrate this work with computer vision applications including image labeling, object detection and localization, and motion estimation. We speculate that better results can be obtained by using better bounds and approximations.
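
To make the central claim concrete, the following is a minimal sketch of the kind of bound the abstract refers to; the notation (a weight vector $w$, joint features $\phi$, and a task loss $\Delta$) is ours for illustration, not necessarily the paper's. Consider a conditional exponential model over structured outputs $y$ given an input $x$:
\[
P(y \mid x; w) = \frac{\exp\big(w \cdot \phi(x, y)\big)}{Z(x; w)},
\qquad
Z(x; w) = \sum_{y'} \exp\big(w \cdot \phi(x, y')\big).
\]
The negative log-likelihood of a training pair $(x_i, y_i)$ is $-\,w \cdot \phi(x_i, y_i) + \log Z(x_i; w)$, and the sum defining $Z$ ranges over all structures, which is what makes exact learning expensive. Bounding the log partition function by its dominant term, using $\log \sum_{y'} e^{a_{y'}} \ge \max_{y'} a_{y'}$, and augmenting the scores with the loss $\Delta$ yields the tractable surrogate
\[
\max_{y'} \big[\, w \cdot \phi(x_i, y') + \Delta(y', y_i) \,\big] - w \cdot \phi(x_i, y_i),
\]
which is the structured hinge loss minimized by max-margin Markov networks and structural SVMs.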

Keywords

structured prediction · max-margin learning · probabilistic models · loss function

Copyright information

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Department of Statistics, University of California at Los Angeles, Los Angeles, USA
  2. Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea
  3. NICTA Canberra Research Laboratory, Canberra, Australia
