Action Disambiguation Analysis Using Normalized Google-Like Distance Correlogram

  • Qianru Sun
  • Hong Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7726)


Classifying realistic human actions in video remains challenging because of the large intra-class variability and inter-class ambiguity of action categories. Recently, local features based on Spatial-Temporal Interest Points (STIPs) have shown great promise in complex action analysis. However, these methods typically rely on the Bag-of-Words (BoW) model, which can hardly resolve ambiguity between actions because it ignores the spatial-temporal co-occurrence relations of visual words. In this paper, we propose a new model that captures this contextual relationship in terms of the co-occurrence of feature pairs. Because of its effectiveness in semantic correlation analysis, a Normalized Google-Like Distance (NGLD) is proposed to measure this co-occurrence numerically. All pairwise distances compose an NGLD correlogram, whose normalized form is incorporated into the final action representation. Experiments on the WEIZMANN dataset and the more challenging UCF Sports dataset show that this representation is a much richer descriptor that noticeably reduces action ambiguity. The results also demonstrate that the proposed model is more effective and robust than BoW across different setups.
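The abstract does not state the NGLD formula or how co-occurrence is counted. The sketch below is therefore only an illustration under explicit assumptions. It assumes NGLD follows the Normalized Google Distance of Cilibrasi and Vitányi, NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), with a hypothetical mapping from web search to video: each interest point's spatio-temporal neighborhood (a ball of hypothetical radius `radius` in (x, y, t)) plays the role of a "page", f(x) counts neighborhoods containing visual word x, f(x, y) counts neighborhoods containing both x and y, and N is the number of neighborhoods. The helper names `neighborhoods`, `ngld_correlogram` and `action_descriptor`, and the "normalized form" used here (distances mapped to similarities with exp(-d), then L1-normalized and concatenated with the BoW histogram), are all assumptions, not the authors' method.

```python
import math

import numpy as np


def neighborhoods(points, words, radius):
    """Hypothetical helper: for each interest point, return the set of visual
    words found within `radius` of it (Euclidean distance in (x, y, t)).
    The point itself is included. Assumes x, y and t are already scaled to
    comparable units (an assumption, not specified in the abstract)."""
    pts = np.asarray(points, dtype=float)
    labels = np.asarray(words)
    # Pairwise Euclidean distances between all interest points, shape (P, P).
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return [set(labels[dists[i] <= radius].tolist()) for i in range(len(pts))]


def ngld_correlogram(points, words, vocab_size, radius):
    """Hypothetical (K, K) matrix of NGD-style distances between visual words,
    treating each neighborhood as one 'page' in the NGD formula."""
    hoods = neighborhoods(points, words, radius)
    n = len(hoods)
    if n == 0:
        raise ValueError("need at least one interest point")
    f = np.zeros(vocab_size)                   # f[x]: neighborhoods containing x
    fxy = np.zeros((vocab_size, vocab_size))   # fxy[x, y]: neighborhoods with both
    for h in hoods:
        idx = np.fromiter(h, dtype=int)        # unique word ids in this neighborhood
        f[idx] += 1
        fxy[np.ix_(idx, idx)] += 1
    dist = np.full((vocab_size, vocab_size), np.inf)  # never co-occur -> infinite
    log_n = math.log(n)
    for x in range(vocab_size):
        for y in range(vocab_size):
            if fxy[x, y] == 0:
                continue
            lx, ly = math.log(f[x]), math.log(f[y])
            den = log_n - min(lx, ly)
            if den == 0:
                dist[x, y] = 0.0               # both words occur in every neighborhood
            else:
                dist[x, y] = (max(lx, ly) - math.log(fxy[x, y])) / den
    return dist


def action_descriptor(points, words, vocab_size, radius):
    """Hypothetical final representation: L1-normalized BoW histogram
    concatenated with the L1-normalized upper triangle of exp(-NGLD)."""
    bow = np.bincount(words, minlength=vocab_size).astype(float)
    bow /= bow.sum()
    sim = np.exp(-ngld_correlogram(points, words, vocab_size, radius))  # exp(-inf) = 0
    corr = sim[np.triu_indices(vocab_size)]    # the matrix is symmetric
    total = corr.sum()
    if total > 0:
        corr /= total
    return np.concatenate([bow, corr])
```

For example, with four interest points forming two spatially separated clusters, `action_descriptor([(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0)], [0, 1, 0, 2], 3, 1.5)` keeps words 1 and 2 apart (they never share a neighborhood, so their NGD is infinite and their similarity is 0), which a plain BoW histogram cannot express.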


Keywords (added by machine, not by the authors): Local Feature · Visual Word · Action Recognition · Semantic Distance · Human Action Recognition





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Qianru Sun (1)
  • Hong Liu (1)
  1. Engineering Lab on Intelligent Perception for Internet of Things (ELIP), Key Laboratory for Machine Perception, Shenzhen Graduate School, Peking University, China
