Supervised Learning and Codebook Optimization for Bag-of-Words Models

Jiu, Mingyuan; Wolf, Christian; Garcia, Christophe; Baskurt, Atilla

doi:10.1007/s12559-012-9137-4

Published: 24 April 2012

Volume 4, pages 409–419, (2012)

Cognitive Computation

Abstract

In this paper, we present a novel approach for supervised codebook learning and optimization for bag-of-words models. This type of model is frequently used in visual recognition tasks such as object class recognition or human action recognition. An entity is represented as a histogram of codewords, which are traditionally obtained by clustering with unsupervised methods like k-means or random forests; the histograms are then classified in a supervised way. We propose a new supervised method for joint codebook creation and class learning, which learns the cluster centers of the codebook in a goal-directed way using the class labels of the training set. As a result, the codebook is highly correlated with the recognition problem and therefore more discriminative. We propose two different learning algorithms, one based on error backpropagation and the other based on cluster label reassignment. We apply the proposed method to human action recognition from video sequences and evaluate it on the KTH data set, reporting very promising results. The proposed technique allows us either to improve the discriminative power of a codebook learned in an unsupervised way, or to keep the discriminative power while decreasing the size of the learned codebook, thus reducing the computational cost of the nearest-neighbor search.
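To make the representation concrete, the following is a minimal sketch of the standard bag-of-words encoding step the abstract refers to: each local descriptor is assigned to its nearest codeword, and the entity becomes a normalized histogram of those assignments. It is not the supervised method proposed in the paper. The function names `nearest_codeword` and `bow_histogram`, the toy 2-D codebook, and the choice of squared Euclidean distance are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest to the descriptor.

    Uses squared Euclidean distance (an assumption; the abstract only
    mentions a nearest-neighbor search).
    """
    best_index = 0
    best_dist = math.inf
    for index, center in enumerate(codebook):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, center))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors: Sequence[Vector],
                  codebook: Sequence[Vector]) -> List[float]:
    """Normalized histogram of codeword occurrences for one entity."""
    counts = [0.0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1.0
    total = sum(counts)
    # Leave an all-zero histogram untouched if there are no descriptors.
    return [c / total for c in counts] if total > 0 else counts


# Toy data: a hypothetical 3-word codebook in 2-D and four local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
descriptors = [(0.1, -0.2), (0.9, 1.2), (1.1, 0.8), (3.5, 0.3)]
histogram = bow_histogram(descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

In this brute-force form, each descriptor is compared against every codeword, so the encoding cost grows linearly with the codebook size. This is the cost the abstract has in mind when it mentions decreasing the size of the learned codebook.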

Author information

Authors and Affiliations

Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, Villeurbanne, 69621, France
Mingyuan Jiu, Christian Wolf, Christophe Garcia & Atilla Baskurt

Corresponding author

Correspondence to Mingyuan Jiu.

About this article

Cite this article

Jiu, M., Wolf, C., Garcia, C. et al. Supervised Learning and Codebook Optimization for Bag-of-Words Models. Cogn Comput 4, 409–419 (2012). https://doi.org/10.1007/s12559-012-9137-4

Received: 28 July 2011
Accepted: 03 April 2012
Published: 24 April 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s12559-012-9137-4
