Cross-View Action Recognition from Temporal Self-similarities

  • Imran N. Junejo
  • Emilie Dexter
  • Ivan Laptev
  • Patrick Pérez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5303)


This paper concerns recognition of human actions under view changes. We explore self-similarities of action sequences over time and observe the striking stability of such measures across views. Building upon this key observation, we develop an action descriptor that captures the structure of temporal similarities and dissimilarities within an action sequence. Although this descriptor is not strictly view-invariant, we provide intuition and experimental validation demonstrating the high stability of self-similarities under view changes. Self-similarity descriptors are also shown to be stable under action variations within a class as well as discriminative for action recognition. Interestingly, self-similarities computed from different image features possess similar properties and can be used in a complementary fashion. Our method is simple and requires neither structure recovery nor multi-view correspondence estimation. Instead, it relies on weak geometric properties and combines them with machine learning for efficient cross-view action recognition. The method is validated on three public datasets; it performs similarly to or better than related methods, and it works well even in extreme conditions, such as recognizing actions from top views while training on side views only.
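The central object behind the descriptor is a temporal self-similarity matrix: the pairwise distances between per-frame features of a single sequence. The sketch below is a generic illustration only, assuming Euclidean distances between arbitrary per-frame feature vectors; the function name and the toy circular trajectory are ours, not the authors' pipeline, which builds its descriptor from such matrices computed over image features such as tracked point positions, gradient histograms, or optical flow.

```python
import numpy as np

def self_similarity_matrix(features):
    """Temporal self-similarity matrix: entry (i, j) is the Euclidean
    distance between the feature vectors of frames i and j."""
    features = np.asarray(features, dtype=float)    # shape (T, d)
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))        # shape (T, T)

# Toy example: a point moving on a circle, mimicking a periodic action.
t = np.linspace(0, 4 * np.pi, 60)
ssm = self_similarity_matrix(np.c_[np.cos(t), np.sin(t)])

assert ssm.shape == (60, 60)
assert np.allclose(np.diag(ssm), 0)   # each frame is identical to itself
assert np.allclose(ssm, ssm.T)        # the matrix is symmetric
```

For a periodic motion such as the one above, the matrix shows a characteristic pattern of low-distance diagonals offset by the period, which is the kind of temporal structure the descriptor captures; the matrix itself is independent of any rigid transformation of the input features, which gives an intuition for its stability across views.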


Keywords: Recognition Accuracy · Action Recognition · Human Action Recognition · Gait Recognition · Motion Capture Data


References

  1. Moeslund, T., Hilton, A., Krüger, V.: A survey of advances in vision-based human motion capture and analysis. CVIU 103, 90–126 (2006)
  2. Wang, L., Hu, W., Tan, T.: Recent developments in human motion analysis. Pattern Recognition 36, 585–601 (2003)
  3. Yilmaz, A., Shah, M.: Recognizing human actions in videos acquired by uncalibrated moving cameras. In: Proc. ICCV, pp. I:150–157 (2005)
  4. Syeda-Mahmood, T., Vasilescu, M., Sethi, S.: Recognizing action events from multiple viewpoints. In: Proc. EventVideo, pp. 64–72 (2001)
  5. Parameswaran, V., Chellappa, R.: View invariance for human action recognition. IJCV 66, 83–101 (2006)
  6. Shen, Y., Foroosh, H.: View invariant action recognition using fundamental ratios. In: Proc. CVPR (2008)
  7. Li, R., Tian, T., Sclaroff, S.: Simultaneous learning of nonlinear manifold and dynamical models for high-dimensional time series. In: Proc. ICCV (2007)
  8. Ogale, A., Karapurkar, A., Aloimonos, Y.: View-invariant modeling and recognition of human actions using grammars. In: Proc. W. on Dyn. Vis., pp. 115–126 (2006)
  9. Weinland, D., Boyer, E., Ronfard, R.: Action recognition from arbitrary views using 3D exemplars. In: Proc. ICCV (2007)
  10. Shechtman, E., Irani, M.: Matching local self-similarities across images and videos. In: Proc. CVPR (2007)
  11. Benabdelkader, C., Cutler, R., Davis, L.: Gait recognition using image self-similarity. EURASIP J. Appl. Signal Process 2004, 572–585 (2004)
  12. Cutler, R., Davis, L.: Robust real-time periodic motion detection, analysis, and applications. PAMI 22, 781–796 (2000)
  13. Carlsson, S.: Recognizing walking people. In: Proc. ECCV, pp. I:472–486 (2000)
  14. Lele, S.: Euclidean distance matrix analysis (EDMA): Estimation of mean form and mean form difference. Mathematical Geology 25, 573–602 (1993)
  15. Tenenbaum, J., de Silva, V., Langford, J.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000)
  16. Rao, C., Yilmaz, A., Shah, M.: View-invariant representation and recognition of actions. IJCV 50(2), 203–226 (2002)
  17. Ali, S., Basharat, A., Shah, M.: Chaotic invariants for human action recognition. In: Proc. ICCV (2007)
  18. Gorelick, L., Blank, M., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. PAMI 29, 2247–2253 (2007)
  19. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proc. CVPR, pp. I:886–893 (2005)
  20. Lucas, B., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Image Understanding Workshop, pp. 121–130 (1981)
  21. Laptev, I., Caputo, B., Schüldt, C., Lindeberg, T.: Local velocity-adapted motion events for spatio-temporal recognition. CVIU 108, 207–229 (2007)
  22. Niebles, J., Wang, H., Li, F.: Unsupervised learning of human action categories using spatial-temporal words. In: Proc. BMVC (2006)
  23. Marszałek, M., Schmid, C., Harzallah, H., van de Weijer, J.: Learning object representations for visual object class recognition. In: The PASCAL VOC 2007 Challenge Workshop, in conjunction with ICCV (2007)
  24. Ikizler, N., Duygulu, P.: Human action recognition using distribution of oriented rectangular patches. In: Workshop on Human Motion, pp. 271–284 (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Imran N. Junejo (1)
  • Emilie Dexter (1)
  • Ivan Laptev (1)
  • Patrick Pérez (1)

  1. INRIA Rennes - Bretagne Atlantique, Rennes Cedex, France
