Learning to Recognize Objects with Little Supervision

  • Peter Carbonetto
  • Gyuri Dorkó
  • Cordelia Schmid
  • Hendrik Kück
  • Nando de Freitas

Abstract

This paper shows (i) improvements over state-of-the-art local feature recognition systems, (ii) how to formulate principled models for automatic local feature selection in object class recognition when there is little supervised data, and (iii) how to formulate sensible spatial image context models using a conditional random field for integrating local features and segmentation cues (superpixels). By adopting sparse kernel methods, Bayesian learning techniques and data association with constraints, the proposed model identifies the most relevant sets of local features for recognizing object classes, achieves performance comparable to the fully supervised setting, and obtains excellent results for image classification.

Keywords

Object recognition, Scale-invariant keypoints, Weakly supervised learning, Data association, Bayesian analysis, Markov chain Monte Carlo

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Peter Carbonetto (1)
  • Gyuri Dorkó (2)
  • Cordelia Schmid (2)
  • Hendrik Kück (1)
  • Nando de Freitas (1)

  1. University of British Columbia, Vancouver, Canada
  2. INRIA Rhône-Alpes, Grenoble, France
