Group sparse representation for image categorization and semantic video retrieval

Liu, YaNan; Wu, Fei; Zhuang, YueTing

doi:10.1007/s11432-011-4344-2

Research Papers
Published: 02 August 2011

Volume 54, pages 2051–2063 (2011)
Science China Information Sciences

YaNan Liu¹,², Fei Wu² & YueTing Zhuang²

Abstract

Multimedia content analysis and management is a promising and challenging research area. In this paper we develop a novel approach to image representation, which we call group sparse representation (GSR), for image classification and video retrieval. The basic idea is to represent a test image as a weighted combination of all the training images. In particular, we introduce two sets of weight coefficients, one for each training image and the other for each class. We formulate the problem as a group nonnegative garrote model. The resulting representations are sparse and well suited to discriminant analysis. Experiments on the Caltech101 and PASCAL VOC2008 image datasets and the TRECVID2005 video corpus show that the proposed approach is efficient and effective.
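
To make the two-level weighting concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the general group nonnegative garrote form, in which an initial per-image estimate is computed first and each class then receives one nonnegative shrinkage factor under an L1-type penalty. The ridge initial estimate, the projected-gradient solver, the function name `group_garrote_sketch`, and the parameters `lam`, `ridge`, and `n_iter` are all illustrative choices, and the data are synthetic.

```python
import numpy as np


def group_garrote_sketch(X, labels, y, lam=0.1, ridge=1e-2, n_iter=500):
    """Illustrative two-stage fit: per-image weights, then per-class weights.

    X      : (n_features, n_train) matrix, one training image per column
    labels : (n_train,) class label of each training image
    y      : (n_features,) feature vector of the test image
    """
    classes = np.unique(labels)
    n_train = X.shape[1]

    # Stage 1 (assumption): ridge-regularised least squares gives one
    # weight per training image.
    beta = np.linalg.solve(X.T @ X + ridge * np.eye(n_train), X.T @ y)

    # Collapse each class into a single column z_c = X_c @ beta_c.
    Z = np.column_stack(
        [X[:, labels == c] @ beta[labels == c] for c in classes]
    )

    # Stage 2: nonnegative garrote over classes,
    #   minimise 0.5 * ||y - Z d||^2 + lam * sum(d)  subject to d >= 0,
    # solved by projected gradient descent (solver choice is an assumption).
    step = 1.0 / max(np.linalg.norm(Z, 2) ** 2, 1e-12)
    d = np.zeros(len(classes))
    for _ in range(n_iter):
        grad = Z.T @ (Z @ d - y) + lam
        d = np.maximum(d - step * grad, 0.0)
    return classes, beta, d


# Synthetic usage: 3 classes x 5 training images, 50-dimensional features.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 5)
X = rng.standard_normal((50, 15))
# The test image is a positive mix of the class-1 training images.
y = X[:, labels == 1] @ rng.uniform(0.5, 1.5, size=5)

classes, beta, d = group_garrote_sketch(X, labels, y)
predicted = classes[np.argmax(d)]  # class with the largest group weight
```

In this toy setting the class with the largest entry of `d` is taken as the predicted label. Classes that contribute little to reconstructing the test image have their weight pushed to zero by the penalty, which is the source of the sparsity at the class level.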

Author information

Authors and Affiliations

School of Information, Zhejiang University of Finance & Economics, Hangzhou, 310018, China
YaNan Liu
College of Computer Science and Technology, Zhejiang University, Hangzhou, 310012, China
YaNan Liu, Fei Wu & YueTing Zhuang

Corresponding author

Correspondence to YaNan Liu.

About this article

Cite this article

Liu, Y., Wu, F. & Zhuang, Y. Group sparse representation for image categorization and semantic video retrieval. Sci. China Inf. Sci. 54, 2051–2063 (2011). https://doi.org/10.1007/s11432-011-4344-2

Received: 21 October 2009
Accepted: 15 February 2011
Published: 02 August 2011
Issue Date: October 2011
DOI: https://doi.org/10.1007/s11432-011-4344-2
