Mining Mid-level Features for Image Classification

Fernando, Basura; Fromont, Elisa; Tuytelaars, Tinne

doi:10.1007/s11263-014-0700-1

Published: 21 February 2014

Volume 108, pages 186–203, (2014)

International Journal of Computer Vision

Basura Fernando¹,
Elisa Fromont² &
Tinne Tuytelaars¹

Abstract

Mid-level or semi-local features learnt using class-level information are potentially more distinctive than the traditional low-level local features constructed in a purely bottom-up fashion. At the same time they preserve some of the robustness properties with respect to occlusions and image clutter. In this paper we propose a new and effective scheme for extracting mid-level features for image classification, based on relevant pattern mining. In particular, we mine relevant patterns of local compositions of densely sampled low-level features. We refer to the new set of obtained patterns as Frequent Local Histograms or FLHs. During this process, we pay special attention to keeping all the local histogram information and to selecting the most relevant reduced set of FLH patterns for classification. The careful choice of the visual primitives and an extension to exploit both local and global spatial information allow us to build powerful bag-of-FLH-based image representations. We show that these bag-of-FLHs are more discriminative than traditional bag-of-words and yield state-of-the-art results on various image classification benchmarks, including Pascal VOC.
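
The mining step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each image has already been reduced to a grid of visual-word indices, encodes every local histogram as a set of (word, count) items so that count information is kept, and uses a brute-force itemset counter as a stand-in for a dedicated frequent pattern miner. The function names, the `radius` and `max_size` parameters, and the item encoding are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def local_histograms(word_grid, radius=1):
    """Yield one transaction per grid cell: the set of (word, count) items
    describing the visual words in its (2*radius+1)^2 neighbourhood.
    Neighbourhoods are clipped at the grid border (an assumption)."""
    rows, cols = len(word_grid), len(word_grid[0])
    for r in range(rows):
        for c in range(cols):
            counts = Counter(
                word_grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            )
            yield frozenset(counts.items())


def mine_frequent_patterns(transactions, min_support, max_size=2):
    """Brute-force frequent itemset mining (illustrative stand-in only):
    count every itemset of up to max_size items and keep those that occur
    in at least min_support transactions."""
    support = Counter()
    for items in transactions:
        ordered = sorted(items)
        for size in range(1, max_size + 1):
            for pattern in combinations(ordered, size):
                support[pattern] += 1
    return {p: s for p, s in support.items() if s >= min_support}


def bag_of_patterns(transactions, patterns):
    """Count, for one image, how many local histograms contain each pattern."""
    return [sum(1 for t in transactions if set(p) <= t) for p in patterns]
```

Calling `local_histograms` on each training image, pooling the transactions, and passing them to `mine_frequent_patterns` gives a candidate pattern set; `bag_of_patterns` then turns one image's transactions into a fixed-length vector over those patterns. The sketch stops at frequency: the selection of the most relevant patterns for classification and the use of spatial information mentioned in the abstract are not modelled here.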

Notes

http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

Acknowledgments

The authors acknowledge the support of the iMinds Impact project Beeldcanon, the FP7 ERC Starting Grant 240530 COGNIMUND and PASCAL 2 Network of Excellence.

Author information

Authors and Affiliations

KU Leuven, ESAT-PSI, iMinds, Heverlee, Belgium
Basura Fernando & Tinne Tuytelaars
Laboratoire Hubert-Curien, UMR CNRS 5516, Université de Lyon, Université de St-Etienne, 42000, Saint-Étienne, France
Elisa Fromont

Corresponding author

Correspondence to Basura Fernando.

Additional information

Communicated by M. Hebert.

About this article

Cite this article

Fernando, B., Fromont, E. & Tuytelaars, T. Mining Mid-level Features for Image Classification. Int J Comput Vis 108, 186–203 (2014). https://doi.org/10.1007/s11263-014-0700-1

Received: 05 December 2012
Accepted: 04 February 2014
Published: 21 February 2014
Issue Date: July 2014
DOI: https://doi.org/10.1007/s11263-014-0700-1
