MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

  • Arjun Jain
  • Jonathan Tompson
  • Yann LeCun
  • Christoph Bregler
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9004)


In this work, we propose a novel and efficient method for articulated human pose estimation in videos using a convolutional network architecture that incorporates both color and motion features. We introduce a new human body pose dataset, FLIC-motion, which extends the FLIC dataset [1] with additional motion features; the dataset is available for download. We apply our architecture to this dataset and report significantly better performance than current state-of-the-art pose detection systems.


Keywords: Optical Flow, Motion Feature, Laplacian Pyramid, Convolutional Layer, Convolutional Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



The authors would like to thank Tyler Zhu for his help with the dataset creation. This research was funded in part by the Office of Naval Research (ONR) Award N000141210327.


  1. Sapp, B., Taskar, B.: Modec: multimodal decomposable models for human pose estimation. In: CVPR (2013)
  2. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
  3. Johansson, G.: Visual perception of biological motion and a model for its analysis. Percept. Psychophys. 14, 201–211 (1973)
  4. Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: CVPR (2008)
  5. Weiss, D., Sapp, B., Taskar, B.: Sidestepping intractable inference with structured ensemble cascades. In: NIPS (2010)
  6. Eichner, M., Ferrari, V.: Better appearance models for pictorial structures. In: BMVC (2009)
  7. Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: CVPR (2011)
  8. Sapp, B., Toshev, A., Taskar, B.: Cascaded models for articulated pose estimation. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part II. LNCS, vol. 6312, pp. 406–420. Springer, Heidelberg (2010)
  9. Hogg, D.: Model-based vision: a program to see a walking person. Image Vis. Comput. 1, 5–20 (1983)
  10. Rehg, J.M., Kanade, T.: Model-based tracking of self-occluding articulated objects. In: Computer Vision (1995)
  11. Kakadiaris, I.A., Metaxas, D.: Model-based estimation of 3d human motion with occlusion based on active multi-viewpoint selection. In: CVPR (1996)
  12. Wren, C.R., Azarbayejani, A., Darrell, T., Pentland, A.P.: Pfinder: Real-time tracking of the human body. IEEE Trans. Pattern Anal. Mach. Intell. 19, 780–785 (1997)
  13. Bregler, C., Malik, J.: Tracking people with twists and exponential maps. In: CVPR (1998)
  14. Deutscher, J., Blake, A., Reid, I.: Articulated body motion capture by annealed particle filtering. In: CVPR (2000)
  15. Sidenbladh, H., Black, M.J., Fleet, D.J.: Stochastic tracking of 3D human figures using 2D image motion. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1843, pp. 702–718. Springer, Heidelberg (2000)
  16. Sminchisescu, C., Triggs, B.: Covariance scaled sampling for monocular 3d body tracking. In: CVPR (2001)
  17. Sigal, L., Balan, A., Black, M.J.: HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. Int. J. Comput. Vis. 87, 4–27 (2010)
  18. Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: Scape: shape completion and animation of people. In: TOG (2005)
  19. Poppe, R.: Vision-based human motion analysis: an overview. Comput. Vis. Image Underst. 108, 4–18 (2007)
  20. De Aguiar, E., Stoll, C., Theobalt, C., Ahmed, N., Seidel, H.P., Thrun, S.: Performance capture from sparse multi-view video. ACM Trans. Graph. 27, 1–9 (2008)
  21. Jain, A., Thormählen, T., Seidel, H.P., Theobalt, C.: Moviereshape: tracking and reshaping of humans in videos. In: TOG (2010)
  22. Stoll, C., Hasler, N., Gall, J., Seidel, H., Theobalt, C.: Fast articulated motion tracking using a sums of gaussians body model. In: ICCV (2011)
  23. Freeman, W.T., Roth, M.: Orientation histograms for hand gesture recognition. In: International Workshop on Automatic Face and Gesture Recognition (1995)
  24. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)
  25. Laptev, I.: On space-time interest points. Int. J. Comput. Vis. 64, 107–123 (2005)
  26. Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and appearance. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 428–441. Springer, Heidelberg (2006)
  27. Mori, G., Malik, J.: Estimating human body configurations using shape context matching. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2352, pp. 666–680. Springer, Heidelberg (2002)
  28. Agarwal, A., Triggs, B.: Recovering 3D human pose from monocular images. IEEE Trans. Pattern Anal. Mach. Intell. 28, 44–58 (2006)
  29. Grauman, K., Shakhnarovich, G., Darrell, T.: Inferring 3d structure with a statistical image-based shape model. In: ICCV (2003)
  30. Shakhnarovich, G., Viola, P., Darrell, T.: Fast pose estimation with parameter-sensitive hashing. In: ICCV (2003)
  31. Ramanan, D., Forsyth, D., Zisserman, A.: Strike a pose: Tracking people by finding stylized poses. In: CVPR (2005)
  32. Buehler, P., Zisserman, A., Everingham, M.: Learning sign language by watching TV (using weakly aligned subtitles) (2009)
  33. Fischler, M.A., Elschlager, R.: The representation and matching of pictorial structures. IEEE Trans. Comput. 22, 67–92 (1973)
  34. Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: CVPR (2008)
  35. Andriluka, M., Roth, S., Schiele, B.: Pictorial structures revisited: people detection and articulated pose estimation. In: CVPR (2009)
  36. Dantone, M., Gall, J., Leistner, C., Gool, L.V.: Human pose estimation using body parts dependent joint regressors. In: CVPR (2013)
  37. Johnson, S., Everingham, M.: Learning effective human pose estimation from inaccurate annotation. In: CVPR (2011)
  38. Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.: Poselet conditioned pictorial structures. In: CVPR (2013)
  39. Bourdev, L., Malik, J.: Poselets: body part detectors trained using 3d human pose annotations. In: ICCV (2009)
  40. Gkioxari, G., Arbelaez, P., Bourdev, L., Malik, J.: Articulated pose estimation using discriminative armlet classifiers. In: CVPR (2013)
  41. Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A., Finocchio, M., Blake, A., Cook, M., Moore, R.: Real-time human pose recognition in parts from single depth images. ACM (2013)
  42. Zeiler, M., Fergus, R.: Visualizing and understanding convolutional neural networks. arXiv preprint arXiv:1311.2901 (2013)
  43. Razavian, A.S., Azizpour, H., Sullivan, J., Carlsson, S.: Cnn features off-the-shelf: an astounding baseline for recognition (2014)
  44. Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: Deepface: closing the gap to human-level performance in face verification. In: CVPR (2014)
  45. Deng, L., Abdel-Hamid, O., Yu, D.: A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion. In: ICASSP (2013)
  46. Sermanet, P., Kavukcuoglu, K., Chintala, S., LeCun, Y.: Pedestrian detection with unsupervised multi-stage feature learning. In: CVPR (2013)
  47. Weinzaepfel, P., Revaud, J., Harchaoui, Z., Schmid, C.: Deepflow: large displacement optical flow with deep matching. In: ICCV (2013)
  48. Toshev, A., Szegedy, C.: Deeppose: Human pose estimation via deep neural networks. In: CVPR (2014)
  49. Jain, A., Tompson, J., Andriluka, M., Taylor, G., Bregler, C.: Learning human pose estimation features with convolutional networks. In: ICLR (2014)
  50. Tompson, J., Stein, M., LeCun, Y., Perlin, K.: Real-time continuous pose recovery of human hands using convolutional networks. In: TOG (2014)
  51. Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: BMVC (2010)
  52. Collobert, R., Kavukcuoglu, K., Farabet, C.: Torch7: a matlab-like environment for machine learning. In: BigLearn, NIPS Workshop (2011)
  53. Giusti, A., Ciresan, D.C., Masci, J., Gambardella, L.M., Schmidhuber, J.: Fast image scanning with deep max-pooling convolutional neural networks. In: CoRR (2013)
  54. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: Overfeat: Integrated recognition, localization and detection using convolutional networks. In: ICLR (2014)
  55. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: ICML (2013)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Arjun Jain (1)
  • Jonathan Tompson (1)
  • Yann LeCun (1)
  • Christoph Bregler (1)

  1. New York University, New York, USA
