Fast PRISM: Branch and Bound Hough Transform for Object Class Detection

Abstract

This paper addresses the task of efficient object class detection by means of the Hough transform. This approach was popularised by the Implicit Shape Model (ISM) and has since been widely adopted. Although ISM exhibits robust detection performance, its probabilistic formulation is unsatisfactory. The PRincipled Implicit Shape Model (PRISM) overcomes these problems by interpreting Hough voting as a dual implementation of linear sliding-window detection. This gives the voting procedure a sound justification while imposing minimal constraints. We demonstrate PRISM’s flexibility with two complementary implementations: a generatively trained Gaussian Mixture Model and a discriminatively trained histogram approach. Both systems achieve state-of-the-art performance. Detections are found by gradient-based or branch and bound search, respectively. The latter greatly benefits from PRISM’s feature-centric view, which avoids the unfavourable memory trade-off and any on-line pre-processing of the original Efficient Subwindow Search (ESS). Moreover, our approach takes the features’ scale values into account, whereas ESS does not. Finally, we show how to avoid soft-matching and spatial pyramid descriptors during detection without losing their positive effect, which makes the algorithms simpler and faster. Both simplifications are possible if the object model is properly regularised, and we discuss a modification of SVMs that allows this.
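
The duality between Hough voting and linear sliding-window detection mentioned in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a 1-D image, a hypothetical codebook of five visual words, random model weights, and no scale dimension. All names (`W`, `features`, `sliding_window_scores`, `hough_scores`) are made up for illustration. It only shows that summing linear weights per hypothesis (object-centric) and casting weighted votes per feature (feature-centric) yield the same score map.

```python
import numpy as np

rng = np.random.default_rng(0)

n_words = 5      # hypothetical codebook size
n_offsets = 7    # relative offsets covered by the model: -3..3
width = 20       # width of the toy 1-D "image"
half = n_offsets // 2

# Assumed linear object model: one weight per (visual word, relative offset).
W = rng.normal(size=(n_words, n_offsets))

# Toy features, each a (position, visual word index) pair.
features = [(int(rng.integers(width)), int(rng.integers(n_words)))
            for _ in range(30)]


def sliding_window_scores(features):
    """Object-centric view: for every hypothesis x, sum the weights
    of all features that fall inside the window centred at x."""
    scores = np.zeros(width)
    for x in range(width):
        for pos, word in features:
            d = pos - x
            if -half <= d <= half:
                scores[x] += W[word, d + half]
    return scores


def hough_scores(features):
    """Feature-centric view: every feature casts weighted votes
    for all hypotheses it is compatible with."""
    scores = np.zeros(width)
    for pos, word in features:
        for d in range(-half, half + 1):
            x = pos - d
            if 0 <= x < width:
                scores[x] += W[word, d + half]
    return scores


sw = sliding_window_scores(features)
hv = hough_scores(features)
print(np.allclose(sw, hv))
```

Both functions evaluate the same double sum and differ only in loop order. In this simplified setting, the feature-centric ordering is what lets a search procedure reason about each feature's contribution to a whole set of hypotheses at once.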

Author information

Correspondence to Alain Lehmann.

About this article

Cite this article

Lehmann, A., Leibe, B. & Van Gool, L. Fast PRISM: Branch and Bound Hough Transform for Object Class Detection. Int J Comput Vis 94, 175–197 (2011). https://doi.org/10.1007/s11263-010-0342-x

Keywords

  • Object detection
  • Hough transform
  • Sliding-window
  • Branch and bound
  • Soft-matching
  • Spatial pyramid histograms