Improving Human Action Recognition Using Score Distribution and Ranking

Hoai, Minh; Zisserman, Andrew

doi:10.1007/978-3-319-16814-2_1


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9007)

Included in the following conference series:

Asian Conference on Computer Vision


Abstract

We propose two complementary techniques to improve the performance of action recognition systems. The first technique addresses the temporal interval ambiguity of actions by learning a classifier score distribution over video subsequences. A classifier based on this score distribution is shown to be more effective than one using the maximum or average score. The second technique learns a classifier for the relative values of action scores, capturing the correlation and exclusion between action classes. Both techniques are simple and have efficient implementations using a Least-Squares SVM. We demonstrate that, taken together, the techniques exceed the state-of-the-art performance by a wide margin on challenging benchmarks for human actions.
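The abstract gives no implementation details, so the following is only a minimal sketch under stated assumptions, not the authors' code. It illustrates two ideas from the abstract: turning a variable number of subsequence scores into a fixed-length "score distribution" vector, and training a Least-Squares SVM, which reduces to solving one linear system. The function names `score_distribution_feature` and `train_ls_svm`, the choice to represent the distribution by quantiles, the linear kernel, and the parameters `n_quantiles` and `gamma` are all illustrative assumptions.

```python
import numpy as np


def score_distribution_feature(subseq_scores, n_quantiles=10):
    """Summarize a variable number of subsequence scores as a fixed-length vector.

    Assumption (not from the source): the distribution is represented by its
    quantiles, ordered from the maximum (index 0) down to the minimum (index -1).
    """
    qs = np.linspace(1.0, 0.0, n_quantiles)  # 1.0 -> max, 0.0 -> min
    return np.quantile(np.asarray(subseq_scores, dtype=float), qs)


def train_ls_svm(X, y, gamma=1.0):
    """Linear LS-SVM (Suykens & Vandewalle formulation), solved in the dual.

    Minimizes 0.5*||w||^2 + 0.5*gamma*sum(e_i^2)
    subject to y_i = w.x_i + b + e_i, which reduces to one linear system:
        [0   1^T        ] [b    ]   [0]
        [1   K + I/gamma] [alpha] = [y]
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    K = X @ X.T  # linear kernel (assumption)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]
    w = X.T @ alpha  # primal weights recovered from the dual coefficients
    return w, b


if __name__ == "__main__":
    # Synthetic demo (fabricated toy data, for illustration only): in positive
    # videos only a few subsequences contain the action, so their score
    # distribution has a heavy upper tail.
    rng = np.random.default_rng(0)
    feats, labels = [], []
    for i in range(40):
        scores = rng.normal(-1.0, 0.5, size=rng.integers(5, 30))
        label = 1.0 if i % 2 == 0 else -1.0
        if label > 0:
            scores[:3] += 2.5  # the action occupies a short interval
        feats.append(score_distribution_feature(scores))
        labels.append(label)
    F, Y = np.array(feats), np.array(labels)
    w, b = train_ls_svm(F, Y, gamma=10.0)
    acc = np.mean(np.sign(F @ w + b) == Y)
    print(f"training accuracy: {acc:.2f}")
```

A linear weight vector on this feature can represent the maximum score exactly (all weight on index 0) and approximates the average score with uniform weights, which is one way to see how learning over the whole distribution generalizes both fixed pooling rules. For the second technique, the same solver could plausibly be applied, per class, to the vector of all classes' scores for a video, so that the learned weights express correlation or exclusion between classes; the exact definition of "relative values" used in the paper is not given here.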


Similar content being viewed by others

EXMOVES: Mid-level Features for Efficient Action Recognition and Video Analysis

Article 26 April 2016

A statistical framework for few-shot action recognition

Article 05 April 2021

HMDB51: A Large Video Database for Human Motion Recognition

Notes

1. http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/


Acknowledgements

This work was supported by the EPSRC grant EP/I012001/1 and a Royal Society Wolfson Research Merit Award.

Author information

Authors and Affiliations

Visual Geometry Group, Department of Engineering Science, University of Oxford, Oxford, UK
Minh Hoai & Andrew Zisserman
Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
Minh Hoai

Corresponding author

Correspondence to Minh Hoai.

Editor information

Editors and Affiliations

Technische Universität München, Garching, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang


About this paper

Cite this paper

Hoai, M., Zisserman, A. (2015). Improving Human Action Recognition Using Score Distribution and Ranking. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_1


DOI: https://doi.org/10.1007/978-3-319-16814-2_1
Published: 17 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16813-5
Online ISBN: 978-3-319-16814-2
eBook Packages: Computer Science, Computer Science (R0)
