
Semantic Segmentation of Urban Scenes Using Dense Depth Maps

  • Chenxi Zhang
  • Liang Wang
  • Ruigang Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6314)

Abstract

In this paper we present a framework for semantic scene parsing and object recognition based on dense depth maps. Five view-independent 3D features that vary with object class are extracted from dense depth maps at the superpixel level and used to train a classifier with the randomized decision forest technique. Our formulation integrates multiple features in a Markov Random Field (MRF) framework to segment and recognize different object classes in query street-scene images. We evaluate our method both quantitatively and qualitatively on the challenging Cambridge-driving Labeled Video Database (CamVid). The results show that, using dense depth information alone, we achieve more accurate segmentation and recognition than approaches based on sparse 3D features, on appearance, or even on the combination of the two, advancing the state of the art. Furthermore, by aligning dense-depth-based 3D features into a unified coordinate frame, our algorithm can handle the special case of viewpoint changes between training and testing scenarios. A preliminary cross-training-and-testing evaluation shows promising results.
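The pipeline the abstract describes can be sketched as follows. This is a minimal illustration on synthetic stand-in data, not the authors' implementation: per-superpixel depth features feed a randomized decision forest, whose class posteriors become unary costs in an MRF that is smoothed over neighboring superpixels. The feature values, the chain-shaped superpixel neighborhood, the Potts weight, and the use of ICM for inference (in place of graph cuts or belief propagation) are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier  # extremely randomized trees

rng = np.random.default_rng(0)
N_CLASSES, N_FEATURES = 3, 5   # e.g. five view-independent 3D depth features

# Synthetic training data: one feature vector per superpixel, 100 per class.
means = rng.normal(size=(N_CLASSES, N_FEATURES)) * 5.0
y_train = np.repeat(np.arange(N_CLASSES), 100)
X_train = means[y_train] + rng.normal(size=(len(y_train), N_FEATURES))

forest = ExtraTreesClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# "Query image": a chain of superpixels; neighbors of i are i-1 and i+1.
y_true = np.repeat(np.arange(N_CLASSES), 20)
X_test = means[y_true] + rng.normal(size=(len(y_true), N_FEATURES))

# Unary MRF term: negative log class posteriors from the forest.
unary = -np.log(forest.predict_proba(X_test) + 1e-9)

# Pairwise Potts term, minimized with ICM as a simple stand-in for the
# graph-cut / belief-propagation solvers typically used for MRF inference.
LAMBDA = 0.5
labels = unary.argmin(axis=1)
for _ in range(10):
    for i in range(len(labels)):
        costs = unary[i].copy()
        for j in (i - 1, i + 1):
            if 0 <= j < len(labels):
                costs += LAMBDA * (labels[j] != np.arange(N_CLASSES))
        labels[i] = costs.argmin()

acc = float((labels == y_true).mean())
print(f"per-superpixel accuracy on synthetic data: {acc:.2f}")
```

The smoothing step simply penalizes label disagreement between adjacent superpixels, which is the role the MRF pairwise term plays in the paper's formulation.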

Supplementary material

978-3-642-15561-1_51_MOESM1_ESM.avi: Electronic Supplementary Material (13.2 MB)
978-3-642-15561-1_51_MOESM2_ESM.pdf: Electronic Supplementary Material (125 KB)


Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Chenxi Zhang (1)
  • Liang Wang (1)
  • Ruigang Yang (1)
  1. Center for Visualization and Virtual Environments, University of Kentucky, USA
