Signal, Image and Video Processing, Volume 9, Issue 3, pp 705–714

Locating and recognizing multiple human actions by searching for maximum score subsequences

  • Hong-Bo Zhang
  • Shao-Zi Li
  • Shu-Yuan Chen
  • Song-Zhi Su
  • Xian-Ming Lin
  • Dong-Lin Cao
Original Paper


Despite the numerous methods for recognizing human actions in a video, few are designed for videos that contain more than one action over a given time period. Moreover, existing multiple action recognition methods adopt a windowed sequence search strategy, which requires exhaustively trying different window lengths and is therefore computationally intensive. This work presents a frame-based strategy that searches for maximum score subsequences, each of which corresponds to an action. In this way, the start and end times of all actions are located and their categories are identified. In addition, contrast mutual information is proposed as a new score function to increase recognition accuracy. Experimental results indicate that the proposed method accurately locates and recognizes multiple actions in a video and remains accurate on the conventional single-action classification problem.
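To make the idea of a maximum score subsequence concrete, the following is a minimal sketch of the classic linear-time maximum-sum contiguous subsequence search over per-frame scores. It is an illustration under stated assumptions, not the authors' implementation: the function name `max_score_subsequence` and the sample `frame_scores` are hypothetical, and the scores are made-up numbers rather than values of the contrast mutual information score function proposed in the paper.

```python
from typing import List, Tuple


def max_score_subsequence(scores: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, total) for the contiguous run of frames
    with the largest summed score. Both indices are inclusive.

    Hypothetical helper for illustration only; the per-frame scores
    are assumed to be given by some score function.
    """
    if not scores:
        raise ValueError("scores must be non-empty")

    best_start = best_end = cur_start = 0
    best_sum = cur_sum = scores[0]

    for i in range(1, len(scores)):
        if cur_sum < 0:
            # A negative running total can only hurt: restart at frame i.
            cur_sum = scores[i]
            cur_start = i
        else:
            cur_sum += scores[i]
        if cur_sum > best_sum:
            best_sum = cur_sum
            best_start, best_end = cur_start, i

    return best_start, best_end, best_sum


# Made-up per-frame scores: positive values support the action class,
# negative values count against it.
frame_scores = [-1.0, 2.0, 3.0, -4.0, 1.0, 2.0, 2.0, -3.0]
start, end, total = max_score_subsequence(frame_scores)
print(start, end, total)  # prints: 1 6 6.0
```

A single pass over the frames finds the start and end of the best-scoring run without trying any window lengths, which is the contrast with windowed search that the abstract draws.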


  • Multiple action recognition
  • Frame-based strategy
  • Maximum score subsequences
  • Contrast mutual information



The authors would like to thank the anonymous reviewers for their valuable and insightful comments on the earlier version of this manuscript. This work was partially supported by the National Natural Science Foundation of China (No. 61202143), the Natural Science Foundation of Fujian Province (No. 2011J01367), the Xiamen University 985 Project and the National Science Council of Taiwan (NSC-101-2221-E-155-060).



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  1. School of Information Science and Technology, Xiamen University, Xiamen, China
  2. Fujian Key Laboratory of the Brain-like Intelligent Systems, Xiamen University, Xiamen, China
  3. Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
