Integrating Context and Occlusion for Car Detection by Hierarchical And-Or Model

  • Bo Li
  • Tianfu Wu
  • Song-Chun Zhu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8694)


This paper presents a method for learning reconfigurable hierarchical And-Or models that integrate context and occlusion for car detection. The And-Or model represents the regularities of car-to-car context and occlusion patterns at three levels: (i) layouts of N spatially coupled cars, (ii) single cars with different viewpoint-occlusion configurations, and (iii) a small number of parts. Learning proceeds in two stages. In the first stage, we learn the structure of the And-Or model, which has three components: (a) mining N-car contextual patterns from the layouts of annotated single-car bounding boxes, (b) mining occlusion configurations from the overlap statistics between single cars, and (c) learning visible parts, either from car 3D CAD simulation or by heuristically mining latent car parts. The And-Or model is organized as a directed acyclic graph, which allows inference by dynamic programming. In the second stage, we jointly train the model parameters (for appearance, deformation and bias) using Weak-Label Structural SVM. In experiments, we test our model on four car datasets: the KITTI dataset [11], the street parking dataset [19], the PASCAL VOC2007 car dataset [7], and a self-collected parking lot dataset. We compare against state-of-the-art variants of deformable part-based models and other methods. Our model obtains consistent and significant improvements on all four datasets.
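As a rough illustration of why a directed acyclic And-Or graph admits dynamic programming, the sketch below scores a toy graph bottom-up: a terminal node returns a fixed score, an And node sums its children, and an Or node keeps its best-scoring child. The node names, the dictionary layout and the scores are all hypothetical, and the sketch leaves out the position-dependent appearance, deformation and bias terms the abstract mentions, so it should not be read as the authors' implementation.

```python
def infer(graph, root):
    """Return (best score, selected terminal names) for the graph under `root`.

    `graph` maps a node name to (kind, payload), an assumed toy layout:
      - ("terminal", score): a leaf with a fixed, made-up score
      - ("and", [children]): all children are used, scores are summed
      - ("or", [children]): exactly one child is used, the best-scoring one
    """
    memo = {}  # each node is solved once, even if it has several parents

    def visit(name):
        if name in memo:
            return memo[name]
        kind, payload = graph[name]
        if kind == "terminal":
            result = (payload, [name])
        elif kind == "and":
            parts = [visit(child) for child in payload]
            total = sum(score for score, _ in parts)
            chosen = [t for _, terminals in parts for t in terminals]
            result = (total, chosen)
        elif kind == "or":
            result = max((visit(child) for child in payload),
                         key=lambda r: r[0])
        else:
            raise ValueError(f"unknown node kind: {kind!r}")
        memo[name] = result
        return result

    return visit(root)


# Hypothetical toy graph: one car, two alternative configurations that
# share the "windshield" part (so the graph is a DAG, not a tree).
graph = {
    "car": ("or", ["front_view", "occluded_side_view"]),
    "front_view": ("and", ["hood", "windshield"]),
    "occluded_side_view": ("and", ["windshield", "rear_wheel"]),
    "hood": ("terminal", 0.5),
    "windshield": ("terminal", 1.0),
    "rear_wheel": ("terminal", 1.25),
}

score, parts = infer(graph, "car")
```

Because results are memoized per node, a part shared by several configurations is evaluated only once, which is the property that makes the acyclic organization convenient for inference.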


Keywords: Car Detection, Context, Occlusion, And-Or Graph



  1. Azizpour, H., Laptev, I.: Object detection using strongly-supervised deformable part models. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 836–849. Springer, Heidelberg (2012)
  2. Behley, J., Steinhage, V., Cremers, A.: Laser-based segment classification using a mixture of bag-of-words. In: IROS (2013)
  3. Branson, S., Perona, P., Belongie, S.: Strong supervision from weak annotation: Interactive training of deformable part models. In: ICCV (2011)
  4. Chen, G., Ding, Y., Xiao, J., Han, T.X.: Detection evolution with multi-order contextual co-occurrence. In: CVPR (2013)
  5. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
  6. Desai, C., Ramanan, D., Fowlkes, C.: Discriminative models for multi-class object layout. IJCV 95(1), 1–12 (2011)
  7. Everingham, M., Van Gool, L., Williams, C., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. IJCV (2010)
  8. Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. TPAMI (2010)
  9. Felzenszwalb, P., McAllester, D.: Object detection grammars. Tech. rep., University of Chicago, Computer Science TR-2010-02 (2010)
  10. Felzenszwalb, P., Huttenlocher, D.: Distance transforms of sampled functions. Theory of Computing (2012)
  11. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR (2012)
  12. Geiger, A., Wojek, C., Urtasun, R.: Joint 3D estimation of objects and scene layout. In: NIPS (2011)
  13. Girshick, R., Felzenszwalb, P., McAllester, D.: Object detection with grammar models. In: NIPS (2011)
  14. Girshick, R.B., Felzenszwalb, P.F., McAllester, D.: Discriminatively trained deformable part models, release 5,
  15. Hejrati, M., Ramanan, D.: Analyzing 3D objects in cluttered images. In: NIPS (2012)
  16. Hoiem, D., Chodpathumwan, Y., Dai, Q.: Diagnosing error in object detectors. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 340–353. Springer, Heidelberg (2012)
  17. Hoiem, D., Efros, A., Hebert, M.: Putting objects in perspective. IJCV 80(1), 3–15 (2008)
  18. Hu, W., Zhu, S.C.: Learning 3D object templates by quantizing geometry and appearance spaces. TPAMI (to appear, 2014)
  19. Li, B., Hu, W., Wu, T.F., Zhu, S.C.: Modeling occlusion by discriminative and-or structures. In: ICCV (2013)
  20. Li, B., Song, X., Wu, T.F., Hu, W., Pei, M.: Coupling-and-decoupling: A hierarchical model for occlusion-free object detection. PR 47, 3254–3264 (2014)
  21. Mathias, M., Benenson, R., Timofte, R., Van Gool, L.: Handling occlusions with franken-classifiers. In: ICCV (2013)
  22. McAllester, D., Keshet, J.: Generalization bounds and consistency for latent structural probit and ramp loss. In: NIPS (2011)
  23. Ouyang, W., Wang, X.: Single-pedestrian detection aided by multi-pedestrian detection. In: CVPR (2013)
  24. Pepik, B., Stark, M., Gehler, P., Schiele, B.: Teaching 3D geometry to deformable part models. In: CVPR (2012)
  25. Pepik, B., Stark, M., Gehler, P., Schiele, B.: Occlusion patterns for object class detection. In: CVPR (2013)
  26. Sadeghi, M., Farhadi, A.: Recognition using visual phrases. In: CVPR (2011)
  27. Song, X., Wu, T.F., Jia, Y., Zhu, S.C.: Discriminatively trained and-or tree models for object detection. In: CVPR (2013)
  28. Tang, S., Andriluka, M., Schiele, B.: Detection and tracking of occluded people. In: BMVC (2012)
  29. Tu, Z., Bai, X.: Auto-context and its application to high-level vision tasks and 3D brain image segmentation. TPAMI (2010)
  30. Yang, Y., Baker, S., Kannan, A., Ramanan, D.: Recognizing proxemics in personal photos. In: CVPR (2012)
  31. Zhu, L., Chen, Y., Yuille, A., Freeman, W.: Latent hierarchical structural learning for object detection. In: CVPR (2010)
  32. Zhu, S.C., Mumford, D.: A stochastic grammar of images. Found. Trends. Comput. Graph. Vis. (2006)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Bo Li (1, 2)
  • Tianfu Wu (2)
  • Song-Chun Zhu (2)

  1. Beijing Lab of Intelligent Information Technology, Beijing Institute of Technology, China
  2. Department of Statistics, University of California, Los Angeles, USA
