Foreground Segmentation from Occlusions Using Structure and Motion Recovery

  • Kai Cordes
  • Björn Scheuermann
  • Bodo Rosenhahn
  • Jörn Ostermann
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 359)


The segmentation of foreground objects in camera images is a fundamental step in many computer vision applications. For visual effects creation, foreground segmentation is required to integrate virtual objects between scene elements. In addition, camera and scene estimation is needed to integrate the objects into the video with correct perspective.

In this paper, discontinued feature tracks are used to detect occlusions. If these features reappear after their occlusion, they are connected to the correct, previously discontinued trajectory during sequential camera and scene estimation. The combination of optical flow for features in consecutive frames and SIFT matching for the wide-baseline feature connection provides accurate and stable feature tracking. The knowledge of the occluded parts of a connected feature track is used to feed an efficient segmentation algorithm which crops the foreground image regions automatically. The presented graph-cut-based segmentation uses a graph contraction technique to minimize the computational expense.
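The occlusion bookkeeping described above can be illustrated with a minimal sketch. It assumes that each reconnected feature track is stored as a list of the frame indices in which the feature was observed; this representation, and the function names `occluded_spans` and `foreground_seed_frames`, are hypothetical and not taken from the paper. The sketch only finds the frames in which a track is missing between two observations. Locating the occluded feature in the image, which relies on the camera and scene estimation, and the graph cut segmentation itself are not shown.

```python
from typing import Dict, List, Tuple


def occluded_spans(observed_frames: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) frame ranges in which a reconnected
    track has no observation, i.e. the feature is presumably occluded."""
    frames = sorted(set(observed_frames))
    spans = []
    for prev, cur in zip(frames, frames[1:]):
        if cur - prev > 1:  # gap between two observations of the same track
            spans.append((prev + 1, cur - 1))
    return spans


def foreground_seed_frames(tracks: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Map each frame index to the ids of the tracks occluded in that frame.
    Such entries could serve as foreground hints for a segmentation step."""
    seeds: Dict[int, List[int]] = {}
    for track_id, frames in tracks.items():
        for first, last in occluded_spans(frames):
            for frame in range(first, last + 1):
                seeds.setdefault(frame, []).append(track_id)
    return seeds
```

A track observed in frames 0, 1, 2, 6 and 7 would yield one occluded span covering frames 3 to 5; a track without gaps contributes no hints.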

The presented application is the integration of virtual objects into video. For this application, accurate estimation of camera and scene is crucial. The segmentation is used to automatically occlude the integrated objects with foreground scene content. Demonstrations show very realistic results.
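How a foreground segmentation can occlude an integrated virtual object may be sketched as a per-pixel compositing rule. This is an illustrative assumption, not the authors' implementation: images are plain nested lists of RGB tuples, `rendered` holds `None` wherever the virtual object does not cover a pixel, and the function name `composite_with_occlusion` is hypothetical.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]


def composite_with_occlusion(
    frame: List[List[Pixel]],
    rendered: List[List[Optional[Pixel]]],
    foreground: List[List[bool]],
) -> List[List[Pixel]]:
    """Overlay the rendered virtual object on the video frame, but keep the
    original video pixel wherever the segmentation marks foreground."""
    output = []
    for frame_row, render_row, fg_row in zip(frame, rendered, foreground):
        out_row = []
        for video_px, virtual_px, is_foreground in zip(frame_row, render_row, fg_row):
            if virtual_px is None or is_foreground:
                out_row.append(video_px)    # background video or occluding foreground
            else:
                out_row.append(virtual_px)  # visible part of the virtual object
        output.append(out_row)
    return output
```

A binary mask gives hard object boundaries; a production pipeline would likely blend with a soft matte at the edges, which this sketch leaves out.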


Gaussian Mixture Model, Virtual Object, Foreground Object, Bundle Adjustment, Foreground Region
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Kai Cordes (1)
  • Björn Scheuermann (1)
  • Bodo Rosenhahn (1)
  • Jörn Ostermann (1)
  1. Institut für Informationsverarbeitung (TNT), Leibniz Universität Hannover, Hannover, Germany
