Efficient Multi-cue Scene Segmentation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8142)


This paper presents a novel multi-cue framework for scene segmentation that combines appearance cues (grayscale images) with depth cues (dense stereo vision). An efficient 3D environment model is used to create a small set of meaningful free-form region hypotheses for object location and extent. These regions are then categorized into several object classes using an extended multi-cue bag-of-features pipeline. To this end, we augment grayscale bag-of-features with bag-of-depth-features computed on dense disparity maps, and add height pooling to incorporate a 3D geometric ordering into our region descriptor.
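To make the descriptor idea concrete, here is a minimal sketch of a multi-cue region descriptor in the spirit of the abstract. It is not the authors' implementation. It assumes that every local feature in a region has already been quantized into a grayscale visual word and a depth visual word, and that its height above the ground (in metres) is known from stereo. "Height pooling" is modelled here, hypothetically, as a pyramid-like split of the region's features into height bands, plus one pool covering the whole region. The names `region_descriptor`, `bow_histogram` and `height_edges` are illustrative only.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Hypothetical local feature: (grayscale word id, depth word id, height above ground in metres).
Feature = Tuple[int, int, float]


def bow_histogram(words: Sequence[int], vocab_size: int) -> List[float]:
    """L1-normalised bag-of-words histogram; all zeros if there are no words."""
    hist = [0.0] * vocab_size
    for w in words:
        hist[w] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def region_descriptor(features: Sequence[Feature],
                      n_gray_words: int,
                      n_depth_words: int,
                      height_edges: Sequence[float]) -> List[float]:
    """Concatenate grayscale and depth BoF histograms for the whole region
    and for each height band (an assumed form of height pooling)."""
    n_bands = len(height_edges) + 1
    # Pool 0 holds the whole region; pools 1..n_bands hold the height bands.
    pools: List[List[Feature]] = [list(features)] + [[] for _ in range(n_bands)]
    for f in features:
        # A height equal to an edge falls into the upper band.
        band = bisect_right(height_edges, f[2])
        pools[1 + band].append(f)

    descriptor: List[float] = []
    for pool in pools:
        descriptor += bow_histogram([f[0] for f in pool], n_gray_words)
        descriptor += bow_histogram([f[1] for f in pool], n_depth_words)
    return descriptor


if __name__ == "__main__":
    feats = [(0, 1, 0.2), (2, 1, 0.4), (1, 0, 1.5)]
    d = region_descriptor(feats, n_gray_words=3, n_depth_words=2, height_edges=[1.0])
    print(len(d))  # (1 whole-region pool + 2 height bands) * (3 + 2 words) = 15
```

With a single edge at 1.0 m, the two low features and the one high feature end up in separate bands, so the descriptor encodes not only which grayscale and depth words occur but also roughly at what height they occur. The resulting vector could then be fed to any histogram classifier; the choice of classifier is left open here.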

In experiments on a large real-world stereo vision dataset, we obtain state-of-the-art segmentation results at a significantly reduced computational cost. We make our dataset publicly available for benchmarking.






Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Environment Perception, Daimler R&D, Sindelfingen, Germany
  2. Department of Computer Science, TU Darmstadt, Darmstadt, Germany
