International Journal of Computer Vision, Volume 104, Issue 2, pp 154–171

Selective Search for Object Recognition

  • J. R. R. Uijlings
  • K. E. A. van de Sande
  • T. Gevers
  • A. W. M. Smeulders

Abstract

This paper addresses the problem of generating possible object locations for use in object recognition. We introduce selective search, which combines the strengths of both exhaustive search and segmentation. Like segmentation, we use the image structure to guide our sampling process. Like exhaustive search, we aim to capture all possible object locations. Instead of relying on a single technique to generate possible object locations, we diversify our search and use a variety of complementary image partitionings to deal with as many image conditions as possible. Our selective search results in a small set of data-driven, class-independent, high-quality locations, yielding 99% recall and a Mean Average Best Overlap of 0.879 at 10,097 locations. Compared to an exhaustive search, the reduced number of locations enables the use of stronger machine learning techniques and stronger appearance models for object recognition. In this paper we show that our selective search enables the use of the powerful Bag-of-Words model for recognition. The selective search software is publicly available at http://disi.unitn.it/~uijlings/SelectiveSearch.html.
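The abstract reports location quality as recall and Mean Average Best Overlap (MABO). The sketch below is a minimal illustration of how these two numbers can be computed and is not taken from the released software. It rests on several assumptions. Boxes are `(x1, y1, x2, y2)` in continuous coordinates. Overlap is intersection-over-union. An object counts as found when its best overlap is at least 0.5, which is the Pascal VOC threshold, though treating the boundary as inclusive is our assumption. ABO is averaged per class and then across classes to give MABO. The names `iou`, `best_overlap` and `evaluate` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in continuous coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_overlap(gt: Box, locations: Sequence[Box]) -> float:
    """Highest IoU between one ground-truth box and any proposed location (0 if none)."""
    return max((iou(gt, loc) for loc in locations), default=0.0)


def evaluate(
    images: Iterable[Tuple[List[Tuple[str, Box]], Sequence[Box]]],
    threshold: float = 0.5,  # assumed Pascal-style "found" threshold, inclusive
) -> Tuple[float, float]:
    """Return (recall, MABO) over a set of images.

    Each image is a pair (ground_truth, locations): ground_truth is a list of
    (class_name, box) and locations are the proposals generated for that image.
    """
    per_class = defaultdict(list)  # class name -> best overlaps of its objects
    for ground_truth, locations in images:
        for cls, box in ground_truth:
            per_class[cls].append(best_overlap(box, locations))

    overlaps = [o for values in per_class.values() for o in values]
    if not overlaps:
        raise ValueError("no ground-truth objects to evaluate")

    # Recall: fraction of all objects whose best overlap reaches the threshold.
    recall = sum(o >= threshold for o in overlaps) / len(overlaps)
    # Average Best Overlap per class, then MABO as the unweighted mean over classes.
    abo = [sum(values) / len(values) for values in per_class.values()]
    mabo = sum(abo) / len(abo)
    return recall, mabo
```

For example, an image with a cat box matched exactly by one proposal and a dog box `(20, 20, 30, 30)` covered by a proposal `(20, 20, 30, 26)` (IoU 0.6) gives a recall of 1.0 and a MABO of about 0.8. In this sketch, best overlaps are averaged per class before taking the overall mean, so frequent classes do not dominate MABO. Recall, by contrast, is pooled over all objects.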

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • J. R. R. Uijlings (1)
  • K. E. A. van de Sande (2)
  • T. Gevers (2)
  • A. W. M. Smeulders (2)
  1. University of Trento, Trento, Italy
  2. University of Amsterdam, Amsterdam, The Netherlands