Efficient Human Action Detection Using a Transferable Distance Function

  • Weilong Yang
  • Yang Wang
  • Greg Mori
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5995)


In this paper, we address the problem of efficient human action detection using only one template video. We adopt the standard sliding-window approach to scan the template video over test videos, and we represent the template video by patch-based motion features. Using generic knowledge learnt from previous training sets, we weight the patches of the template video with a transferable distance function. Based on this patch weighting, we propose a cascade structure that scans the template video over test videos efficiently. Our method is evaluated on a human action dataset with cluttered backgrounds and on a ballet video with complex human actions. The experimental results show that the cascade structure not only achieves very reliable detection but also improves the efficiency of patch-based human action detection by an order of magnitude.
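The combination of weighted patches and a cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several assumptions are made for illustration only. The patch weights are taken as given (in the paper they come from the transferable distance function; here they are hard-coded). The patch distance is squared Euclidean. The scan runs along a single dimension, with each window's patches aligned one-to-one with the template's patches. The stage sizes and rejection thresholds are arbitrary. The names `Stage`, `cascade_distance`, and `detect` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: each patch is a fixed-length motion descriptor.
Descriptor = Sequence[float]


def patch_distance(a: Descriptor, b: Descriptor) -> float:
    """Squared Euclidean distance between two patch descriptors (an assumed choice)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class Stage:
    num_patches: int  # cumulative number of top-weighted patches evaluated by the end of this stage
    threshold: float  # reject the window once the partial distance exceeds this value


def cascade_distance(
    template: Sequence[Descriptor],
    weights: Sequence[float],
    window: Sequence[Descriptor],
    order: Sequence[int],
    stages: Sequence[Stage],
) -> Optional[float]:
    """Weighted patch distance with early rejection; returns None if any stage rejects."""
    total = 0.0
    used = 0
    for stage in stages:
        # Add the weighted distances of the next most heavily weighted patches.
        while used < min(stage.num_patches, len(order)):
            i = order[used]
            total += weights[i] * patch_distance(template[i], window[i])
            used += 1
        if total > stage.threshold:
            return None  # early rejection: the remaining patches are never compared
    return total


def detect(
    template: Sequence[Descriptor],
    weights: Sequence[float],
    test_video: Sequence[Descriptor],
    stages: Sequence[Stage],
) -> List[Tuple[int, float]]:
    """Slide the template along a 1-D sequence of patches; keep windows that pass every stage."""
    # Visit patches in decreasing weight so the most informative ones are compared first.
    order = sorted(range(len(template)), key=lambda i: weights[i], reverse=True)
    n = len(template)
    detections = []
    for start in range(len(test_video) - n + 1):
        window = test_video[start:start + n]
        score = cascade_distance(template, weights, window, order, stages)
        if score is not None:
            detections.append((start, score))
    return detections


if __name__ == "__main__":
    template = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
    weights = [0.2, 1.0, 0.5]  # hard-coded stand-ins for learned patch weights
    test_video = [[5.0, 5.0], [0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [5.0, 5.0]]
    stages = [Stage(num_patches=1, threshold=0.5), Stage(num_patches=3, threshold=1.0)]
    print(detect(template, weights, test_video, stages))  # [(1, 0.0)]
```

In this toy run the window at offset 0 is rejected after comparing a single patch, and the window at offset 2 is rejected at the second stage. Only the exact match at offset 1 survives. Ordering patches by weight is what lets early rejection save work in this sketch: a non-matching window usually exceeds a threshold after a few heavily weighted patches, so the remaining comparisons are skipped.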



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Weilong Yang (1)
  • Yang Wang (1)
  • Greg Mori (1)
  1. School of Computing Science, Simon Fraser University, Burnaby, Canada
