Abstract
We formulate a probabilistic framework for simultaneous region-based 2D segmentation and 2D-to-3D pose tracking, using a known 3D model. Given such a model, we aim to maximise the discrimination between statistical foreground and background appearance models, via direct optimisation of the 3D pose parameters. The foreground region is delineated by the zero-level-set of a signed distance embedding function, and we define an energy over this region and its immediate background surroundings based on pixel-wise posterior membership probabilities (as opposed to likelihoods). We derive the differentials of this energy with respect to the pose parameters of the 3D object, so that we can search for the correct pose using standard gradient-based non-linear minimisation techniques. We propose novel enhancements at the pixel level based on temporal consistency and improved online appearance model adaptation. Furthermore, straightforward extensions of our method lead to multi-camera and multi-object tracking as part of the same framework. The parallel nature of much of the processing in our algorithm means it is amenable to GPU acceleration, and we give details of our real-time implementation, which we use to generate experimental results on both real and artificial video sequences, with a number of 3D models. These experiments demonstrate the benefit of using pixel-wise posteriors rather than likelihoods, and showcase the qualities of our tracker, such as robustness to occlusions and motion blur, as well as some of its failure modes.
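To make the idea of an energy built from pixel-wise posteriors more concrete, the following is a minimal sketch in plain Python. It is not taken from the abstract: the exact form of the energy, the arctangent-smoothed Heaviside function, the sign convention (positive values of the embedding function mean foreground) and all function and parameter names are assumptions made for illustration, and should be checked against the full paper. The sketch only evaluates the energy for given level-set values; it does not render a 3D model or differentiate with respect to pose.

```python
import math


def smoothed_heaviside(phi, sharpness=1.0):
    """Smooth step in (0, 1). Assumed convention: phi > 0 means foreground."""
    return 0.5 + math.atan(sharpness * phi) / math.pi


def pixelwise_posterior_energy(phi, lik_fg, lik_bg, sharpness=1.0):
    """Negative log of a per-pixel foreground/background mixture.

    phi    : signed-distance values, one per pixel
    lik_fg : P(colour | foreground model), one per pixel
    lik_bg : P(colour | background model), one per pixel
    """
    h = [smoothed_heaviside(p, sharpness) for p in phi]
    eta_f = sum(h)                   # soft foreground area
    eta_b = sum(1.0 - v for v in h)  # soft background area

    energy = 0.0
    for hv, lf, lb in zip(h, lik_fg, lik_bg):
        # Area-normalised posterior terms: both share one denominator,
        # so each pixel is scored by its relative membership rather
        # than by the raw likelihoods.
        norm = eta_f * lf + eta_b * lb
        post_f = lf / norm
        post_b = lb / norm
        energy -= math.log(hv * post_f + (1.0 - hv) * post_b)
    return energy


# Toy 1D "image": the first two pixels look like foreground.
lik_fg = [0.9, 0.9, 0.1, 0.1]
lik_bg = [0.1, 0.1, 0.9, 0.9]
aligned = [2.0, 1.0, -1.0, -2.0]     # contour matches the appearance
misaligned = [-2.0, -1.0, 1.0, 2.0]  # contour is inverted

print(pixelwise_posterior_energy(aligned, lik_fg, lik_bg))
print(pixelwise_posterior_energy(misaligned, lik_fg, lik_bg))
```

In this toy case the aligned contour yields the lower energy. In a tracker of the kind described above, the level-set values would come from projecting the 3D model at a candidate pose, so lowering the energy corresponds to moving the pose towards a better fit.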
Cite this article
Prisacariu, V.A., Reid, I.D. PWP3D: Real-Time Segmentation and Tracking of 3D Objects. Int J Comput Vis 98, 335–354 (2012). https://doi.org/10.1007/s11263-011-0514-3
DOI: https://doi.org/10.1007/s11263-011-0514-3