Similarity Constrained Latent Support Vector Machine: An Application to Weakly Supervised Action Classification

  • Nataliya Shapovalova
  • Arash Vahdat
  • Kevin Cannons
  • Tian Lan
  • Greg Mori
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7578)


We present a novel algorithm for weakly supervised action classification in videos. We assume we are given training videos annotated only with action class labels. We learn a model that can classify unseen test videos, as well as localize a region of interest in the video that captures the discriminative essence of the action class. A novel Similarity Constrained Latent Support Vector Machine model is developed to operationalize this goal. This model specifies that videos should be classified correctly, and that the latent regions of interest chosen should be coherent over videos of an action class. The resulting learning problem is challenging, and we show how dual decomposition can be employed to render it tractable. Experimental results demonstrate the efficacy of the method.
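The two requirements the abstract places on the model (correct classification through a chosen latent region, and coherence of the chosen regions within a class) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the linear scoring of hand-made region feature vectors, and the squared Euclidean distance used as a stand-in coherence measure are hypothetical and are not taken from the paper, whose actual formulation and dual decomposition solver are not reproduced here.

```python
# Illustrative sketch only: names, features, and the coherence measure
# are hypothetical stand-ins, not the paper's formulation.

def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def best_region(weights, region_features):
    """Latent inference: return (index, score) of the top-scoring candidate region."""
    scores = [dot(weights, f) for f in region_features]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]

def classify(class_weights, region_features):
    """Pick the class whose best latent region scores highest.

    Returns (label, region_index, score), so the prediction also
    localizes a region of interest in the video.
    """
    results = {c: best_region(w, region_features) for c, w in class_weights.items()}
    label = max(results, key=lambda c: results[c][1])
    region, score = results[label]
    return label, region, score

def coherence_penalty(chosen_features):
    """Assumed coherence measure: sum of squared distances between the
    regions chosen for videos of the same class (lower is more coherent)."""
    total = 0.0
    for i in range(len(chosen_features)):
        for j in range(i + 1, len(chosen_features)):
            total += sum((a - b) ** 2
                         for a, b in zip(chosen_features[i], chosen_features[j]))
    return total

# Toy usage with made-up two-dimensional region features.
toy_weights = {"run": [1.0, 0.0], "dive": [0.0, 1.0]}
toy_regions = [[0.2, 0.1], [0.9, 0.3], [0.1, 0.5]]
label, region, score = classify(toy_weights, toy_regions)
```

In this toy setup the classifier returns both a label and the index of the region that produced the winning score, which mirrors the joint classification and localization described above. A training procedure would additionally have to trade off the margin against a coherence term such as the assumed penalty, which is what makes the learning problem hard.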


Latent Variable · Action Recognition · Latent Region · Test Video · Training Video
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Nataliya Shapovalova (1)
  • Arash Vahdat (1)
  • Kevin Cannons (1)
  • Tian Lan (1)
  • Greg Mori (1)
  1. School of Computing Science, Simon Fraser University, Canada
