Segmentation and Recognition Using Structure from Motion Point Clouds

  • Gabriel J. Brostow
  • Jamie Shotton
  • Julien Fauqueur
  • Roberto Cipolla
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5302)


We propose an algorithm for semantic segmentation based on 3D point clouds derived from ego-motion. We motivate five simple cues designed to model specific patterns of motion and 3D world structure that vary with object category. We introduce features that project the 3D cues back to the 2D image plane while modeling spatial layout and context. A randomized decision forest combines many such features to achieve a coherent 2D segmentation and recognize the object categories present. Our main contribution is to show how semantic segmentation is possible based solely on motion-derived 3D world structure. Our method works well on sparse, noisy point clouds and, unlike existing approaches, does not need appearance-based descriptors.
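The abstract describes projecting per-point 3D cues back into the 2D image plane and sampling them with features that capture spatial layout and context. The abstract does not define the five cues or the exact feature form, so the sketch below rests on stated assumptions: a calibrated pinhole camera (as a structure-from-motion solve would provide), one scalar cue per point (here, height relative to the camera), and a hypothetical rectangle-offset feature that averages the cue over projected points in a box displaced from the pixel being classified. The names `Camera`, `project`, `splat_cue`, and `rect_feature` are illustrative and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical calibrated pinhole camera, e.g. one frame of an SfM solve.

    R is a 3x3 world-to-camera rotation (nested lists), t a translation,
    f the focal length in pixels and (cx, cy) the principal point.
    """
    f: float
    cx: float
    cy: float
    R: list
    t: list


def project(cam, X):
    """Project world point X = (x, y, z) to pixel (u, v), or None if behind the camera."""
    Xc = [sum(cam.R[i][j] * X[j] for j in range(3)) + cam.t[i] for i in range(3)]
    if Xc[2] <= 0:
        return None
    return cam.f * Xc[0] / Xc[2] + cam.cx, cam.f * Xc[1] / Xc[2] + cam.cy


def splat_cue(cam, points, cues, width, height):
    """Scatter one scalar cue per 3D point into a sparse image keyed by (row, col)."""
    grid = {}
    for X, value in zip(points, cues):
        uv = project(cam, X)
        if uv is None:
            continue
        # floor (not int) so that slightly negative coordinates fall outside the image
        col, row = math.floor(uv[0]), math.floor(uv[1])
        if 0 <= col < width and 0 <= row < height:
            grid.setdefault((row, col), []).append(value)
    return grid


def rect_feature(grid, row, col, dr0, dc0, dr1, dc1):
    """Mean cue over projected points in the box [row+dr0, row+dr1] x [col+dc0, col+dc1].

    The box offset lets the feature look at context away from the pixel itself.
    Returns None when the box holds no points, which is common for sparse clouds.
    """
    total, count = 0.0, 0
    for r in range(row + dr0, row + dr1 + 1):
        for c in range(col + dc0, col + dc1 + 1):
            values = grid.get((r, c))
            if values:
                total += sum(values)
                count += len(values)
    return total / count if count else None


# Usage: identity pose, camera y axis pointing down, so "height" is -y.
cam = Camera(f=100.0, cx=50.0, cy=50.0,
             R=[[1, 0, 0], [0, 1, 0], [0, 0, 1]], t=[0, 0, 0])
points = [(0.0, 1.0, 10.0), (1.0, 1.0, 10.0), (0.0, -2.0, 5.0), (0.0, 0.0, -1.0)]
heights = [-p[1] for p in points]
grid = splat_cue(cam, points, heights, width=100, height=100)
print(rect_feature(grid, 60, 50, 0, 0, 0, 10))      # -1.0: two points below the camera
print(rect_feature(grid, 60, 50, -50, -5, -50, 5))  # 2.0: one point 50 rows higher up
```

Returning `None` for empty boxes is a design choice for sparse point clouds: a downstream classifier, such as a randomized decision tree thresholding these feature values, can then treat "no 3D evidence here" separately from a genuine cue value of zero.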

Experiments were performed on a challenging new video database containing sequences filmed from a moving car in daylight and at dusk. The results confirm that accurate segmentation and recognition are indeed possible using only motion and 3D world structure. Further, we show that the motion-derived information complements an existing state-of-the-art appearance-based method, improving performance both qualitatively and quantitatively.
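The abstract does not state which quantitative measures are reported. As an assumption, the sketch below computes two summaries that are common in semantic segmentation, global pixel accuracy and the average of per-class accuracies, from a pixel confusion matrix. `segmentation_accuracy` is a hypothetical helper, and the counts in the usage line are illustrative, not results from the paper.

```python
def segmentation_accuracy(confusion):
    """Summarize a pixel confusion matrix.

    confusion[i][j] counts pixels whose true class is i and predicted class is j.
    Returns (global_accuracy, average_class_accuracy). Classes with no
    ground-truth pixels are left out of the class average.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    global_acc = correct / total if total else 0.0
    per_class = [row[i] / sum(row) for i, row in enumerate(confusion) if sum(row) > 0]
    avg_class = sum(per_class) / len(per_class) if per_class else 0.0
    return global_acc, avg_class


# Usage: a frequent class and a rare one (illustrative counts only).
g, a = segmentation_accuracy([[90, 10], [5, 5]])
print(round(g, 3), round(a, 3))  # 0.864 0.7
```

The gap between the two numbers shows why both are worth reporting: global accuracy is dominated by frequent classes, while the class average exposes poor performance on rare ones.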



Supplementary material

978-3-540-88682-2_5_MOESM1_ESM.avi (28.3 MB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Gabriel J. Brostow (1)
  • Jamie Shotton (2)
  • Julien Fauqueur (3)
  • Roberto Cipolla (4)

  1. University College London, UK, and ETH Zurich, Switzerland
  2. Microsoft Research Cambridge, UK
  3. University of Cambridge, UK (now with MirriAd Ltd.)
  4. University of Cambridge, UK
