Latent Hough Transform for Object Detection

  • Nima Razavi
  • Juergen Gall
  • Pushmeet Kohli
  • Luc van Gool
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7574)


Hough-transform-based methods for object detection work by allowing image features to vote for the location of the object. While this representation allows parts observed in different training instances to support a single object hypothesis, it also produces false positives by accumulating votes that are consistent in location but inconsistent in other properties such as pose, color, shape, or type. In this work, we propose to augment the Hough transform with latent variables in order to enforce consistency among votes. To this end, only votes that agree on the assignment of the latent variable are allowed to support a single hypothesis. For training a Latent Hough Transform (LHT) model, we propose a learning scheme that exploits the linearity of Hough-transform-based methods. Our experiments on two datasets, including the challenging PASCAL VOC 2007 benchmark, show that our method outperforms traditional Hough-transform-based methods, leading to state-of-the-art performance on some categories.
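The voting idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the vote tuples, and the hand-assigned latent labels are all assumptions made for illustration, and the paper learns the latent assignments rather than fixing them by hand. The sketch only shows how restricting accumulation to votes that share a latent value can suppress a hypothesis supported by mutually inconsistent votes.

```python
import numpy as np

def hough_scores(votes, shape):
    """Standard Hough voting: every vote adds its weight at its location."""
    acc = np.zeros(shape)
    for x, y, w, _z in votes:
        acc[y, x] += w
    return acc

def latent_hough_scores(votes, shape, num_latent):
    """Latent variant (toy): one accumulator per latent value, so only
    votes that agree on the latent assignment add up. The score of a
    location is the best score over latent values."""
    acc = np.zeros((num_latent,) + shape)
    for x, y, w, z in votes:
        acc[z, y, x] += w
    return acc.max(axis=0)

# Hypothetical votes: (x, y, weight, latent assignment).
votes = [
    # Four votes at (2, 2) that agree in location but not in latent value.
    (2, 2, 1.0, 0), (2, 2, 1.0, 1), (2, 2, 1.0, 0), (2, 2, 1.0, 1),
    # Three votes at (5, 5) that agree in both location and latent value.
    (5, 5, 1.0, 0), (5, 5, 1.0, 0), (5, 5, 1.0, 0),
]
shape = (8, 8)  # accumulator indexed as (row=y, col=x)

standard = hough_scores(votes, shape)
latent = latent_hough_scores(votes, shape, num_latent=2)

# Location (row, col) of the strongest hypothesis under each scheme.
best_standard = tuple(int(i) for i in np.unravel_index(standard.argmax(), shape))
best_latent = tuple(int(i) for i in np.unravel_index(latent.argmax(), shape))
print(best_standard, best_latent)
```

In this toy setup the standard accumulator prefers the location with four mixed votes, while the latent variant prefers the location whose three votes all share one latent value.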


Keywords (added by machine, not by the authors): Training Data, Training Image, Object Detection, Training Instance, Latent Variable Model



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Nima Razavi (1)
  • Juergen Gall (2)
  • Pushmeet Kohli (3)
  • Luc van Gool (1, 4)

  1. Computer Vision Laboratory, ETH Zurich, Switzerland
  2. Perceiving Systems Department, MPI for Intelligent Systems, Germany
  3. Microsoft Research Cambridge, UK
  4. IBBT/ESAT-PSI, K.U. Leuven, Belgium
