Scene Understanding through Autonomous Interactive Perception

  • Niklas Bergström
  • Carl Henrik Ek
  • Mårten Björkman
  • Danica Kragic
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6962)


We propose a framework for detecting, extracting and modeling objects in natural scenes from multi-modal data. The framework is iterative and exploits different hypotheses in a complementary manner. We employ it in realistic scenarios, using visual appearance and depth information. A robotic manipulator interacts with the scene: object hypotheses generated from appearance information are confirmed through pushing. Each generated hypothesis feeds into the subsequent one, continuously refining the predictions about the scene. We show results that demonstrate the synergistic effect of applying multiple hypotheses for real-world scene understanding. The method is efficient and runs in real time.


Keywords: Image Space; Object Segmentation; Rigid Object; Dense Label; Motion Space
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Niklas Bergström (1)
  • Carl Henrik Ek (1)
  • Mårten Björkman (1)
  • Danica Kragic (1)
  1. Computer Vision and Active Perception Laboratory, Royal Institute of Technology (KTH), Stockholm, Sweden
