International Journal of Computer Vision, Volume 99, Issue 3, pp 302–318

Fast Structured Prediction Using Large Margin Sigmoid Belief Networks

  • Xu Miao
  • Rajesh P. N. Rao
Article

Abstract

Images usually contain multiple objects that are semantically related to one another. Mapping from low-level visual features to mutually dependent high-level semantics can be formulated as a structured prediction problem. Current statistical models for structured prediction make simplifying assumptions about the underlying output graph structure, such as assuming a low-order Markov chain, because exact inference becomes intractable as the tree-width of the underlying graph increases. Approximate inference algorithms, on the other hand, force one to trade off representational power against computational efficiency. In this paper, we present large margin sigmoid belief networks (LMSBNs) for structured prediction in images. LMSBNs allow a very fast inference algorithm for arbitrary graph structures that runs in polynomial time with high probability. This probability depends on the data distribution and is maximized during learning. The new approach overcomes the representation-efficiency trade-off of previous models and allows fast structured prediction with complicated graph structures. We present results from applying a fully connected model to semantic image annotation, image retrieval and optical character recognition (OCR) problems, and demonstrate that the proposed approach can yield significant performance gains over current state-of-the-art methods.
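A sigmoid belief network factorizes the joint label distribution along a fixed ordering, with each binary label conditioned on its predecessors through a logistic function. The Python sketch below illustrates only that directed factorization, using plain greedy ancestral decoding; it is not the paper's inference algorithm (which runs in polynomial time with high probability), and the function greedy_sbn_decode, the weight layout W and b, and the toy data are assumptions made purely for illustration.

    import numpy as np

    def sigmoid(x):
        # Logistic function used for each conditional in a sigmoid belief net.
        return 1.0 / (1.0 + np.exp(-x))

    def greedy_sbn_decode(W, b):
        # Greedy ancestral decoding in a fully connected sigmoid belief
        # network over binary labels y_0, ..., y_{n-1}.  The parents of y_i
        # are all earlier labels, so only the strictly lower triangle of W
        # is used:
        #   P(y_i = 1 | y_0..y_{i-1}) = sigmoid(b[i] + sum_{j<i} W[i, j] * y[j])
        # Each label is committed to its more probable value in turn.
        n = len(b)
        y = np.zeros(n)
        for i in range(n):
            p = sigmoid(b[i] + W[i, :i] @ y[:i])  # conditional given decided parents
            y[i] = 1.0 if p >= 0.5 else 0.0
        return y.astype(int)

    # Toy usage with 4 labels and random parameters.  In the paper's setting,
    # the biases would be functions of the input image features through
    # learned weights; here they are just random numbers.
    rng = np.random.default_rng(0)
    n = 4
    W = np.tril(rng.normal(size=(n, n)), k=-1)
    b = rng.normal(size=n)
    print(greedy_sbn_decode(W, b))

Because every conditional depends only on labels that have already been decided, one decoding pass over a fully connected graph costs O(n^2) operations regardless of the graph's tree-width; this ordered, directed factorization is the structural property that makes fast inference with rich graph structures plausible.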

Keywords

Structured prediction · Probabilistic graphical models · Exact and approximate inference

References

  1. Abdou, S., & Scordilis, M. S. (2004). Beam search pruning in speech recognition using a posterior probability-based confidence measure. Speech Communication, 42(3–4), 409–428. CrossRefGoogle Scholar
  2. Bartlett, P. L., & Mendelson, S. (2002). Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3, 463–482. MathSciNetGoogle Scholar
  3. Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Advances in neural information processing systems. Cambridge: MIT Press. Google Scholar
  4. Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. In J. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in neural information processing systems (Vol. 20, pp. 161–168). Google Scholar
  5. Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). Learning multi-label scene classification. Pattern Recognition, 37, 1757–1771. CrossRefGoogle Scholar
  6. Carneiro, G., Chan, A., Moreno, P., & Vasconcelos, N. (2007). Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 394–410. CrossRefGoogle Scholar
  7. Collins, M., Globerson, A., Koo, T., Carreras, X., & Bartlett, P. L. (2008). Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks. Journal of Machine Learning Research, 9, 1775–1822. MathSciNetzbMATHGoogle Scholar
  8. Darwiche, A. (2003). A differential approach to inference in Bayesian networks. Journal of the ACM, 50, 123–132. MathSciNetCrossRefGoogle Scholar
  9. Daumé III, H., Langford, J., & Marcu, D. (2009). Search-based structured prediction. Machine Learning Journal, 75(3), 297–325. CrossRefGoogle Scholar
  10. Doucet, A., de Freitas, N., Murphy, K. P., & Russell, S. J. (2000). Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Annual conference on uncertainty in artificial intelligence (pp. 176–183). Google Scholar
  11. Duygulu, P., Barnard, K., de Freitas, J., & Forsyth, D. (2006). Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In A. Heyden, G. Sparr, M. Nielsen, & P. Johansen (Eds.), Lecture notes in computer science: Vol. 2353. European conference on computer vision (pp. 349–354). Berlin: Springer. Google Scholar
  12. Fan, R. E., & Lin, C. J. (2007). A study on threshold selection for multi-label classification (Tech. rep.). National Taiwan University. Google Scholar
  13. Feng, S., Manmatha R., & Lavrenko, V. (2004). Multiple Bernoulli relevance models for image and video annotation. In Computer vision and pattern recognition. Google Scholar
  14. Finley, T., & Joachims, T. (2008). Training structural SVMs when exact inference is intractable. In International conference on machine learning (pp. 304–311). New York: ACM. Google Scholar
  15. Guillaumin, M., Mensink, T., Verbeek, J., & Schmid, C. (2009). Tagprop: discriminative metric learning in nearest neighbor models for image auto-annotation. In International conference on computer vision (pp. 309–316). CrossRefGoogle Scholar
  16. Guo, Y., Wilkinson, D., & Schuurmans, D. (2005). Maximum margin Bayesian networks. In Annual conference on uncertainty in artificial intelligence. Google Scholar
  17. Hinton, G. E., & Sejnowski, T. J. (1983). Optimal perceptual inference. In Computer vision and pattern recognition (pp. 448–453). Google Scholar
  18. Hoefel, G., & Elkan, C. (2008). Learning a two-stage SVM/CRF sequence classifier. In ACM conference on information and knowledge management (pp. 271–278). New York, NY, USA. Google Scholar
  19. Hsieh, C. J., Chang, K. W., Lin, C. J., Keerthi, S. S., & Sundararajan, S. (2008). A dual coordinate descent method for large-scale linear SVM. In International conference on machine learning (pp. 408–415). New York: ACM. Google Scholar
  20. Joachims, T., Finley, T., & Yu, C. N. J. (2009). Cutting-plane training of structural SVMs. Machine Learning, 77(1), 27–59. zbMATHCrossRefGoogle Scholar
  21. Kassel, R. H. (1995). A comparison of approaches to on-line handwritten character recognition. PhD thesis, Cambridge, MA, USA. Google Scholar
  22. Kolmogorov, V., & Zabih, R. (2002). What energy functions can be minimized via graph cuts? In European conference on computer vision (pp. 185–208). Google Scholar
  23. Kulesza, A., & Pereira, F. (2007). Structured learning with approximate inference. In Advances in neural information processing systems. Google Scholar
  24. Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: probabilistic models for segmenting and labeling sequence data. In International conference on machine learning. Google Scholar
  25. Lavrenko, V., Manmatha, R., & Jeon, J. (2003). A model for learning the semantics of pictures. In Advances in neural information processing systems. Cambridge: MIT Press. Google Scholar
  26. Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In Computer vision and pattern recognition. Google Scholar
  27. LeCun, Y., Chopra, S., Hadsell, R., Huang, F. J., Bakir, G., Hofman, T., Schölkopf, B., Smola, A., & Taskar, B. (Eds.) (2006). A tutorial on energy-based learning. In Predicting structured data. Cambridge: MIT Press. Google Scholar
  28. Liu, J., Li, M., Liu, Q., Lu, H., & Ma, S. (2009). Image annotation via graph learning. Pattern Recognition, 42, 218–228. zbMATHCrossRefGoogle Scholar
  29. Lowd, D., & Domingos, P. (2008). Learning arithmetic circuits. In Annual conference on uncertainty in artificial intelligence (pp. 383–392). Corvallis: AUAI Press. Google Scholar
  30. Makadia, A., Pavlovic, V., & Kumar, S. (2008). A new baseline for image annotation. In European conference on computer vision (pp. 316–329). Berlin: Springer. Google Scholar
  31. McCallum, A., Freitag, D., & Pereira, F. (2000). Maximum entropy Markov models for information extraction and segmentation. In Proc. 17th international conf. on machine learning (pp. 591–598). San Francisco: Morgan Kaufmann. Google Scholar
  32. Metzler, D., & Manmatha, R. (2004). An inference network approach to image retrieval. In The international conference on image and video retrieval (pp. 42–50). Berlin: Springer. CrossRefGoogle Scholar
  33. Miao, X., & Rao, R. P. (2009). Large margin Boltzmann machines. In International joint conference on artificial intelligence (pp. 1156–1162). Google Scholar
  34. Neal, R. M. (1992). Connectionist learning of belief networks. Artificial Intelligence, 56(1), 71–113. MathSciNetzbMATHCrossRefGoogle Scholar
  35. Perez-Cruz, F., Ghahramani, Z., & Pontil, M. (2007). Conditional graphical models. Cambridge: MIT Press. Google Scholar
  36. Quattoni, A., Collins, M., & Darrell, T. (2004). Conditional random fields for object recognition. In Advances in neural information processing systems (pp. 1097–1104). Cambridge: MIT Press. Google Scholar
  37. Rosenberg, D., Klein, D., & Taskar, B. (2007). Mixture-of-parents maximum entropy Markov models. In Annual conference on uncertainty in artificial intelligence (pp. 318–325). Corvallis: AUAI Press. Google Scholar
  38. Russell, B., Torralba, A., Murphy, K., & Freeman, W. (2008). Labelme: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1), 157–173. CrossRefGoogle Scholar
  39. Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-gradient solver for SVM. In International conference on machine learning (pp. 807–814). New York: ACM. Google Scholar
  40. Shalev-Shwartz, S., Srebro, N., & Sridharan, K. (2008). Fast rates for regularized objectives. In Advances in neural information processing systems. Google Scholar
  41. Tadepalli, P., & Natarajan, B. K. (1996). A formal framework for speedup learning from problems and solutions. The Journal of Artificial Intelligence Research, 4, 445–475. MathSciNetzbMATHGoogle Scholar
  42. Taskar, B., Guestrin, C., & Koller, D. (2004). Max-margin Markov networks. In Advances in neural information processing systems, Vancouver, Canada. Google Scholar
  43. Tsochantaridis, I., Hofmann, T., Joachims, T., & Altun, Y. (2004). Support vector machine learning for interdependent and structured output spaces. In International conference on machine learning. Google Scholar
  44. Tsoumakas, G., Katakis, I., & Vlahavas, I. P. (2010). Mining multi-label data. In Data mining and knowledge discovery handbook, pp. 667–685. Google Scholar
  45. Wainwright, M. J., Jaakkola, T., & Willsky, A. S. (2003). Tree-reweighted belief propagation algorithms and approximate Ml estimation via pseudo-moment matching. In AISTATS. Google Scholar
  46. Wainwright, M. J., Jaakkola, T., & Willsky, A. S. (2005a). A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 51, 2313–2335. MathSciNetCrossRefGoogle Scholar
  47. Wainwright, M. J., Jaakola, T., & Willsky, A. S. (2005b). MAP estimation via agreement on trees: message passing and linear programming. IEEE Transactions on Information Theory, 51, 3697–3717. CrossRefGoogle Scholar
  48. Yavlinsky, A., Schofield, E., & Rüger, S. (2005). Automated image annotation using global features and robust nonparametric density estimation. In The international conference on image and video retrieval (pp. 507–517). Berlin: Springer. CrossRefGoogle Scholar
  49. Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2005). Constructing free energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51, 2282–2312. MathSciNetCrossRefGoogle Scholar
  50. Zhang, T. (2001). Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32(1), 56–85. CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Computer Science and Engineering, University of Washington, Seattle, USA
