A Coarse to Fine Object Proposal Framework for Autonomous Driving Object Detection Using Binocular Image

  • Xiaolong Liu
  • Wanzeng Cai
  • Zhengfa Liang
  • Yiliu Feng
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 699)


Widely used object proposal methods for object detection commonly achieve satisfactory results on datasets captured in simple scenes, but their performance degrades in complicated real traffic scenes. In this paper, we propose a coarse-to-fine object proposal generation framework for autonomous driving object detection that provides a better object proposal solution in complex circumstances. Using several low-level geometrical features, which can be efficiently computed from binocular images, we recalculate the scores of the candidate bounding boxes generated by coarse region proposal approaches with a Bayesian probability model. Our proposal generation approach is validated on the challenging KITTI benchmark, achieving state-of-the-art object proposal performance for pedestrians, cars and cyclists.
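The rescoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the geometrical features or the form of the probability model, so the feature names (`height_m`, `ground_gap_m`), the Gaussian likelihoods, the conditional-independence assumption, and all parameter values below are hypothetical. The sketch treats the coarse proposal score as a prior and combines it with feature likelihoods through Bayes' rule.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    box: tuple           # (x1, y1, x2, y2) in pixels
    coarse_score: float  # objectness in [0, 1] from the coarse proposal stage
    features: dict       # hypothetical stereo-derived cues, e.g. {"height_m": 1.7}


def log_gaussian(x, mean, std):
    """Log density of a 1-D Gaussian (assumed likelihood form)."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def rescore(cand, object_model, background_model, eps=1e-6):
    """Posterior P(object | features), using the coarse score as the prior.

    Features are assumed conditionally independent given the class.
    Each model maps a feature name to (mean, std).
    """
    prior = min(max(cand.coarse_score, eps), 1.0 - eps)  # avoid log(0)
    log_obj = math.log(prior)
    log_bg = math.log(1.0 - prior)
    for name, value in cand.features.items():
        log_obj += log_gaussian(value, *object_model[name])
        log_bg += log_gaussian(value, *background_model[name])
    # Normalise in log space for numerical stability.
    m = max(log_obj, log_bg)
    p_obj = math.exp(log_obj - m)
    p_bg = math.exp(log_bg - m)
    return p_obj / (p_obj + p_bg)


def rerank(candidates, object_model, background_model):
    """Sort candidates by posterior score, highest first."""
    return sorted(
        candidates,
        key=lambda c: rescore(c, object_model, background_model),
        reverse=True,
    )


# Illustrative (made-up) parameters for a pedestrian-like class.
OBJECT_MODEL = {"height_m": (1.7, 0.3), "ground_gap_m": (0.0, 0.2)}
BACKGROUND_MODEL = {"height_m": (3.0, 3.0), "ground_gap_m": (1.0, 2.0)}

if __name__ == "__main__":
    plausible = Candidate((100, 80, 140, 200), 0.5,
                          {"height_m": 1.75, "ground_gap_m": 0.05})
    implausible = Candidate((300, 10, 420, 60), 0.5,
                            {"height_m": 6.0, "ground_gap_m": 3.0})
    for c in rerank([implausible, plausible], OBJECT_MODEL, BACKGROUND_MODEL):
        print(c.box, round(rescore(c, OBJECT_MODEL, BACKGROUND_MODEL), 3))
```

In this sketch two candidates with the same coarse score are separated only by their geometry: the box whose estimated height and ground contact are consistent with the object model is ranked first.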


Keywords: Object proposal · Object detection · Stereo vision · Bayesian probability model · Coarse to fine framework



Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  • Xiaolong Liu (1)
  • Wanzeng Cai (1)
  • Zhengfa Liang (1)
  • Yiliu Feng (1)
  1. College of Computer, National University of Defense Technology, Changsha, China
