International Journal of Computer Vision, Volume 88, Issue 2, pp 214–237

Learning Articulated Structure and Motion

Article

Abstract

Humans demonstrate a remarkable ability to parse complicated motion sequences into their constituent structures and motions. We investigate this problem, attempting to learn the structure of one or more articulated objects, given a time series of two-dimensional feature positions. We model the observed sequence in terms of “stick figure” objects, under the assumption that the relative joint angles between sticks can change over time, but their lengths and connectivities are fixed. The problem is formulated as a single probabilistic model that includes multiple sub-components: associating the features with particular sticks, determining the proper number of sticks, and finding which sticks are physically joined. We test the algorithm on challenging datasets of 2D projections of optical human motion capture and feature trajectories from real videos.
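The modeling assumption above (sticks of fixed length and connectivity whose relative joint angles vary over time) can be illustrated with a minimal sketch. This is not the authors' algorithm: it only generates synthetic 2D feature trajectories for a hypothetical planar two-stick figure, and the names `simulate`, `point_on_stick`, and `distance_range`, as well as all lengths and angle schedules, are assumptions chosen for illustration. It shows the cue that a structure-learning method could exploit: in this planar setting, features on the same stick keep a constant separation, while features on different sticks do not.

```python
import math


def stick_end(origin, length, angle):
    """Return the point reached by travelling `length` from `origin` at `angle`."""
    ox, oy = origin
    return (ox + length * math.cos(angle), oy + length * math.sin(angle))


def point_on_stick(origin, length, angle, fraction):
    """Return a feature located at a fixed fraction of the way along a stick."""
    return stick_end(origin, length * fraction, angle)


def simulate(num_frames=20, upper_len=2.0, lower_len=1.5):
    """Generate 2D trajectories of four features on a planar two-stick figure.

    Features 0 and 1 lie on the first stick, features 2 and 3 on the second.
    Stick lengths are fixed; only the angles change from frame to frame.
    All numeric values here are illustrative assumptions.
    """
    root = (0.0, 0.0)
    frames = []
    for t in range(num_frames):
        upper_angle = 0.3 * math.sin(0.4 * t)        # absolute angle of stick 1
        joint_angle = 0.8 + 0.5 * math.cos(0.3 * t)  # relative angle at the joint
        lower_angle = upper_angle + joint_angle
        joint = stick_end(root, upper_len, upper_angle)
        frames.append([
            point_on_stick(root, upper_len, upper_angle, 0.25),
            point_on_stick(root, upper_len, upper_angle, 0.75),
            point_on_stick(joint, lower_len, lower_angle, 0.25),
            point_on_stick(joint, lower_len, lower_angle, 0.75),
        ])
    return frames


def distance_range(frames, i, j):
    """Return max minus min distance between features i and j over all frames."""
    dists = [math.dist(frame[i], frame[j]) for frame in frames]
    return max(dists) - min(dists)


if __name__ == "__main__":
    frames = simulate()
    print("same stick (0, 1):      ", distance_range(frames, 0, 1))
    print("same stick (2, 3):      ", distance_range(frames, 2, 3))
    print("different sticks (1, 2):", distance_range(frames, 1, 2))
```

In this sketch the first two ranges are zero up to floating-point error, and the third is clearly nonzero. For 2D projections of 3D motion, as in the datasets the abstract mentions, within-stick distances are generally not constant, so this planar example should be read only as intuition for the stick assumption.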

Keywords

Structure from motion; Graphical models; Non-rigid motion

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • David A. Ross (1)
  • Daniel Tarlow (1)
  • Richard S. Zemel (1)
  1. University of Toronto, Toronto, Canada
