The Visual Computer, Volume 35, Issue 4, pp 507–520

Action snapshot with single pose and viewpoint

  • Meili Wang
  • Shihui Guo
  • Minghong Liao
  • Dongjian He
  • Jian Chang
  • Jianjun Zhang
Original Article


Many art forms present visual content as a single image captured from a particular viewpoint. Selecting a meaningful, representative moment from an action performance is difficult, even for an experienced artist, yet a well-chosen image can tell a story on its own. This matters in a range of narrative scenarios, such as journalists reporting breaking news, scholars presenting their research, or artists crafting artworks. We address the underlying structures and mechanisms of a pictorial narrative with a new concept, called the action snapshot, which automates the generation of a meaningful snapshot (a single still image) from an input sequence of scenes. The input dynamic scenes may include several interacting characters that are fully animated. We propose a novel method based on information theory to quantitatively evaluate the information contained in a pose. Taking the top-ranked postures as input, a convolutional neural network is constructed and trained with deep reinforcement learning to select a single viewpoint that maximally conveys the information of the sequence. User studies compare the computer-selected poses and viewpoints with those selected by human participants. The results show that the proposed method can effectively assist the selection of the most informative snapshot in animation-intensive scenarios.
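
The abstract does not state the formula used to score a pose, so the following is only an illustrative sketch of one way an information-theoretic pose score could work: it computes the Shannon entropy of a pose's joint angles after binning them, then ranks frames by that score. The joint-angle representation, the bin count, and the names `shannon_entropy` and `rank_poses` are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values, num_bins=8, lo=-math.pi, hi=math.pi):
    """Shannon entropy (in bits) of values binned into equal-width bins.

    Assumption for illustration: a pose is a flat list of joint angles
    in radians, and a pose whose angles spread over more bins is treated
    as carrying more information.
    """
    if not values:
        return 0.0
    width = (hi - lo) / num_bins
    # Clamp each bin index so out-of-range angles fall into an edge bin.
    counts = Counter(
        min(num_bins - 1, max(0, int((v - lo) / width))) for v in values
    )
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_poses(frames, top_k=3):
    """Return indices of the top_k frames by entropy, highest first."""
    scored = [(shannon_entropy(angles), idx) for idx, angles in enumerate(frames)]
    # Sort by descending entropy; ties are broken by earlier frame index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical toy data: three frames with four joint angles each.
    frames = [
        [0.0, 0.0, 0.0, 0.0],
        [-3.0, -1.0, 1.0, 3.0],
        [0.0, 0.0, 1.0, 1.0],
    ]
    print(rank_poses(frames))
```

In a pipeline of the kind the abstract describes, the frames returned by such a ranking would be the candidates passed on to viewpoint selection; the learned viewpoint selector itself is not sketched here.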


Action snapshot · Information entropy · Pose · Viewpoint selection



We are grateful to the reviewers and editors for their valuable comments and constructive suggestions. This work is supported by the National Natural Science Foundation of China (61402374, 61661146002, 61702433), the China Postdoctoral Science Foundation (2014M562457, 2016M600506) and the Fundamental Research Funds for the Central Universities (QN2012033). We also thank the researchers who maintain the CMU motion capture database and the Stanford 3D Scanning Repository.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. College of Information Engineering, Northwest A&F University, Xianyang, China
  2. Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture, Xianyang, China
  3. School of Software, Xiamen University, Xiamen, China
  4. National Centre for Computer Animation, Bournemouth University, Poole, UK
