Unsupervised Video Adaptation for Parsing Human Motion

  • Haoquan Shen
  • Shoou-I Yu
  • Yi Yang
  • Deyu Meng
  • Alexander Hauptmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8693)


In this paper, we propose a method to parse human motion in unconstrained Internet videos without labeling any videos for training. To avoid the tedium of labeling video streams, we use training samples from a public image pose dataset. This raises two main problems. First, images and videos follow different distributions. Second, the training images carry no temporal information. To bridge the gap between the labeled images and the unlabeled videos, our algorithm iteratively incorporates the pose knowledge harvested from the testing videos into the image pose detector via an adjust-and-refine method. During this process, continuity and tracking constraints are imposed to exploit the spatio-temporal information available only in videos. For our experiments, we collected two datasets from YouTube, and the results show that our method achieves good performance in parsing human motion. Furthermore, we found that using unlabeled video yields better performance than adding more labeled pose images to the training set.
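
As a rough illustration of harvesting pose knowledge from unlabeled video under a continuity constraint, the Python sketch below keeps only those frames whose detected pose is both confident and close to the pose in an adjacent frame. It is not the authors' adjust-and-refine algorithm: the function name `select_pseudo_labels`, the thresholds `min_score` and `max_jump`, and the mean-joint-displacement test are all assumptions made for this example.

```python
from math import hypot


def select_pseudo_labels(poses, scores, min_score=0.5, max_jump=10.0):
    """Return indices of frames that could serve as pseudo-labels.

    poses  : list of frames, each a list of (x, y) joint positions
    scores : list of detector confidences, one per frame

    A frame is kept when its score is at least min_score and its pose
    lies within max_jump (mean joint displacement, in pixels) of the
    pose in at least one adjacent frame. Both thresholds are
    hypothetical values chosen for illustration.
    """
    def mean_displacement(a, b):
        # Average Euclidean distance between corresponding joints.
        total = sum(hypot(xa - xb, ya - yb) for (xa, ya), (xb, yb) in zip(a, b))
        return total / len(a)

    keep = []
    for t, (pose, score) in enumerate(zip(poses, scores)):
        if score < min_score:
            continue  # detector is not confident enough on this frame
        neighbours = [n for n in (t - 1, t + 1) if 0 <= n < len(poses)]
        # Continuity check: the pose must agree with an adjacent frame.
        if any(mean_displacement(pose, poses[n]) <= max_jump for n in neighbours):
            keep.append(t)
    return keep


# Toy example: four frames with two joints each.
poses = [
    [(0, 0), (10, 10)],
    [(1, 0), (11, 10)],
    [(100, 100), (150, 150)],  # confident, but jumps far from its neighbours
    [(2, 0), (12, 10)],        # continuous in shape, but low confidence
]
scores = [0.9, 0.8, 0.95, 0.3]
selected = select_pseudo_labels(poses, scores)
```

In an iterative scheme of the kind the abstract describes, frames selected this way could be added to the training set before re-training the image pose detector; how the paper actually combines continuity and tracking constraints is not specified here.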


Unsupervised Video Pose Estimation; Image to Video Adaptation; Unconstrained Internet Videos


Supplementary material

Electronic Supplementary Material: 978-3-319-10602-1_23_MOESM1_ESM.mp4 (MP4, 18,857 KB)


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Haoquan Shen (1)
  • Shoou-I Yu (2)
  • Yi Yang (3)
  • Deyu Meng (4)
  • Alexander Hauptmann (2)

  1. School of Computer Science, Zhejiang University, China
  2. School of Computer Science, Carnegie Mellon University, USA
  3. ITEE, The University of Queensland, Australia
  4. School of Mathematics and Statistics, Xi’an Jiaotong University, China
