Multivariate Relevance Vector Machines for Tracking

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3953)


This paper presents a learning-based approach to tracking articulated human body motion from a single camera. To address the problem of pose ambiguity, a one-to-many mapping from image features to state space is learned using a set of relevance vector machines, extended to handle multivariate outputs. The image features are Hausdorff matching scores obtained by matching different shape templates to the image; the multivariate relevance vector machines (MVRVMs) select a sparse subset of these templates. We demonstrate that these Hausdorff features reduce the estimation error in clutter compared to shape-context histograms. The method is applied to pose estimation from a single input frame, and is embedded within a probabilistic tracking framework to include temporal information. We apply the algorithm to 3D hand tracking and full human body tracking.
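As a rough illustration of what a vector of Hausdorff matching scores could look like, the sketch below computes one directed Hausdorff distance per shape template against a set of image edge points. The abstract does not say which Hausdorff variant the paper uses, so the directed form, the 2D point representation, and the names `directed_hausdorff` and `hausdorff_features` are all assumptions made for illustration; this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def directed_hausdorff(template: Sequence[Point], edges: Sequence[Point]) -> float:
    """Largest distance from any template point to its nearest image edge point.

    Assumption: the directed (one-sided) Hausdorff distance is used here;
    the paper may use a different or more robust variant.
    """
    if not template or not edges:
        raise ValueError("both point sets must be non-empty")
    return max(min(math.dist(t, e) for e in edges) for t in template)


def hausdorff_features(
    templates: Sequence[Sequence[Point]], edges: Sequence[Point]
) -> List[float]:
    """One matching score per shape template; together they form a feature vector."""
    return [directed_hausdorff(t, edges) for t in templates]


# Toy data (hypothetical): three edge points on a horizontal line.
image_edges = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
templates = [
    [(0.0, 0.0), (2.0, 0.0)],  # lies exactly on the image edges
    [(0.0, 3.0), (2.0, 4.0)],  # shifted away from the image edges
]

features = hausdorff_features(templates, image_edges)
print(features)
```

A lower score means the template fits the observed edges better. In the setting the abstract describes, a feature vector of this kind would be the input to the learned regressors, which keep only a sparse subset of the templates.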


Keywords: Mapping Function, Relevance Vector Machine, Shape Template, Human Body Motion, Multivariate Output



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  1. University of Cambridge, UK
  2. Toshiba Corporate R&D Center, Kawasaki, Japan
  3. Oxford Brookes University, UK
