PWP3D: Real-Time Segmentation and Tracking of 3D Objects

International Journal of Computer Vision

Abstract

We formulate a probabilistic framework for simultaneous region-based 2D segmentation and 2D to 3D pose tracking, using a known 3D model. Given such a model, we aim to maximise the discrimination between statistical foreground and background appearance models, via direct optimisation of the 3D pose parameters. The foreground region is delineated by the zero-level-set of a signed distance embedding function, and we define an energy over this region and its immediate background surroundings based on pixel-wise posterior membership probabilities (as opposed to likelihoods). We derive the differentials of this energy with respect to the pose parameters of the 3D object, meaning we can conduct a search for the correct pose using standard gradient-based non-linear minimisation techniques. We propose novel enhancements at the pixel level based on temporal consistency and improved online appearance model adaptation. Furthermore, straightforward extensions of our method lead to multi-camera and multi-object tracking as part of the same framework. The parallel nature of much of the processing in our algorithm means it is amenable to GPU acceleration, and we give details of our real-time implementation, which we use to generate experimental results on both real and artificial video sequences, with a number of 3D models. These experiments demonstrate the benefit of using pixel-wise posteriors rather than likelihoods, and showcase the qualities, such as robustness to occlusions and motion blur (and also some failure modes), of our tracker.
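
The abstract describes the energy only in words, so the following is a minimal sketch of one way such a pixel-wise posterior energy could be evaluated for a fixed pose. It is not the authors' implementation. It assumes the level-set value `phi` is positive inside the projected contour, uses an arctangent-based smoothed step, and takes the per-pixel foreground and background appearance likelihoods as given. All function and variable names are hypothetical.

```python
import math


def smoothed_heaviside(phi, sharpness=1.0):
    """Smooth step in (0, 1): close to 1 well inside the contour (phi > 0).

    The arctangent form and the `sharpness` parameter are assumptions
    made for this sketch.
    """
    return 0.5 + math.atan(sharpness * phi) / math.pi


def pixelwise_posteriors(lik_f, lik_b, eta_f, eta_b):
    """Posterior membership terms for one pixel.

    lik_f, lik_b: appearance likelihoods of the pixel under the
    foreground and background models. eta_f, eta_b: (soft) pixel
    counts of the two regions, acting as region priors.
    """
    norm = eta_f * lik_f + eta_b * lik_b
    return lik_f / norm, lik_b / norm


def energy(phi, lik_f, lik_b):
    """Negative log of the per-pixel mixture of posteriors, summed over pixels.

    phi, lik_f, lik_b are equal-length sequences, one entry per pixel.
    A lower value means the contour separates the two models better.
    """
    h = [smoothed_heaviside(p) for p in phi]
    eta_f = sum(h)            # soft foreground area
    eta_b = len(phi) - eta_f  # soft background area
    total = 0.0
    for h_i, lf, lb in zip(h, lik_f, lik_b):
        p_f, p_b = pixelwise_posteriors(lf, lb, eta_f, eta_b)
        total -= math.log(h_i * p_f + (1.0 - h_i) * p_b)
    return total


# Toy example: four pixels, the first two look like foreground.
lik_f = [0.9, 0.9, 0.1, 0.1]
lik_b = [0.1, 0.1, 0.9, 0.9]
phi_aligned = [3.0, 3.0, -3.0, -3.0]     # contour encloses the right pixels
phi_misaligned = [-3.0, -3.0, 3.0, 3.0]  # contour encloses the wrong pixels

e_aligned = energy(phi_aligned, lik_f, lik_b)
e_misaligned = energy(phi_misaligned, lik_f, lik_b)
```

In this toy case the aligned contour yields the lower energy. In a tracker of the kind the abstract describes, `phi` would be recomputed from the projected 3D model at each candidate pose, and a gradient-based optimiser would adjust the pose parameters to reduce this value.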

Author information

Corresponding author

Correspondence to Victor A. Prisacariu.

About this article

Cite this article

Prisacariu, V.A., Reid, I.D. PWP3D: Real-Time Segmentation and Tracking of 3D Objects. Int J Comput Vis 98, 335–354 (2012). https://doi.org/10.1007/s11263-011-0514-3

  • DOI: https://doi.org/10.1007/s11263-011-0514-3
