
Novel Distributional Visual-Feature Representations for image classification

Published in: Multimedia Tools and Applications

Abstract

The Bag-of-Visual-Words (BoVW) representation is a well-known strategy for approaching many computer vision problems. The idea behind BoVW is similar to that of the Bag-of-Words (BoW) representation used in text mining: build word histograms to represent documents. In computer vision, most research has been devoted to obtaining better visual words rather than to improving the final representation. This is somewhat surprising, as the text mining community has developed many alternative ways of improving the BoW representation that can be applied in computer vision as well. This paper evaluates the usefulness of Distributional Term Representations (DTRs) for image classification. DTRs represent instances by exploiting statistics of feature occurrences and co-occurrences across the dataset. We focus on the suitability and effectiveness of well-known DTRs in different image collections. Furthermore, we devise two novel distributional strategies that learn appropriate groups of images from which to compute better-suited distributional features. We report experimental results on several image datasets showing the effectiveness of the proposed DTRs over BoVW and other methods in the literature, including deep-learning-based strategies. In particular, we show the effectiveness of the proposed representations on image collections from narrow domains, where target categories are subclasses of a more general class (e.g., subclasses of birds, aircraft, or dogs).
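As a rough illustration of the two ideas the abstract contrasts, the following is a minimal sketch of (a) building a BoVW histogram by quantizing local descriptors against a codebook, and (b) a toy distributional statistic that counts visual-word co-occurrences across images. All function names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest visual word (codebook
    centre) and return a normalized occurrence histogram, i.e. the BoVW
    vector for one image."""
    # Euclidean distance from every descriptor to every codebook centre
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)                 # visual-word id of each patch
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                 # relative frequencies

def cooccurrence_dtr(word_seqs, n_words):
    """Toy distributional statistic: for every pair of visual words, count
    in how many images both words appear. A DTR would represent each word
    (and, by aggregation, each image) through such dataset-wide counts."""
    C = np.zeros((n_words, n_words))
    for seq in word_seqs:
        present = np.unique(seq)
        for i in present:
            for j in present:
                if i != j:
                    C[i, j] += 1
    return C
```

The key contrast is that the BoVW vector only looks inside one image, while the distributional matrix pools occurrence statistics over the whole collection.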



Notes

  1. Note that αi does not need to be optimized and could be any text mining term-weighting scheme that captures the feature contribution (e.g., frequency, boolean, information gain). In our evaluation we use the simplest one: the visual-word occurrence frequency.

  2. Please note that the process of building the representation is totally unsupervised, although it is used for image classification, which is a supervised problem.

  3. Although there are many alternative ways to extract image patches (e.g., dense or regular grid-based, keypoint-based), in this paper we use regular grid-based extraction to simplify the explanation of the visual features that we use as terms.

  4. The experimental settings for LSA and LDA were determined experimentally, between 60 and 200 concepts for each dataset, in the same way as in [30]. For the SPR, the parameter to build the pyramid representation was fixed to three, as suggested in [30].

  5. This version of visual bigrams as features (López-Monroy et al. [38]) requires a construction strategy of order O(n²) in both time and space.

  6. The proposed DTRs outperformed these baselines in the experiments of Section 8 only for Histopathology and Butterflies classification. We did not include them since some images and classes from the collections in Table 1 were explicitly used in the pretrained CNN models (e.g., mandarin duck from the birds collection).
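Footnote 5 notes that the visual-bigram feature space grows as O(n²) in the number n of visual words. A minimal sketch of why (the grid layout and function name are illustrative assumptions, not the exact construction of [38]): counting horizontally adjacent pairs of visual-word ids yields, at most, n × n distinct bigram features.

```python
from collections import Counter

def visual_bigrams(word_grid):
    """Count horizontally adjacent visual-word pairs in a 2-D grid of
    word ids (one row per grid row of an image). With a vocabulary of
    n visual words there are up to n*n distinct (a, b) keys, which is
    the O(n^2) space cost mentioned in footnote 5."""
    counts = Counter()
    for row in word_grid:
        for a, b in zip(row, row[1:]):   # neighbouring patches in a row
            counts[(a, b)] += 1
    return counts
```

The time cost is linear in the number of patches per image, but materializing the bigram vocabulary (and any dense bigram representation) is what incurs the quadratic blow-up.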

References

  1. Bosch A, Zisserman A, Muñoz X (2006) Scene classification via pLSA. In: Computer vision–ECCV 2006. Springer, pp 517–530

  2. Bruni E, Boleda G, Baroni M, Tran NK (2012) Distributional semantics in technicolor. In: ACL. ACL, pp 136–145

  3. Bruni E, Tran NK, Baroni M (2014) Multimodal distributional semantics. J Artif Intell Res (JAIR) 49:1–47

  4. Chen G, Yang J, Jin H, Shechtman E, Brandt J, Han TX (2015) Selective pooling vector for fine-grained recognition. In: 2015 IEEE winter conference on applications of computer vision. IEEE, pp 860–867

  5. Cruz-Roa A, Caicedo JC, González FA (2011) Visual pattern mining in histology image collections using bag of features. Artif Intell Med 52:91–106

  6. Cruz-Roa A, Díaz G, Romero E, González F (2011) Automatic annotation of histopathological images using a latent topic model based on non-negative matrix factorization. J Pathol Inf 2(2):4

  7. Csurka G, Dance CR, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: International workshop on statistical learning in computer vision, ECCV, vol 1, p 22

  8. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  9. Díaz G, Romero E (2012) Micro-structural tissue analysis for automatic histopathological image annotation. Microsc Res Tech 75:343–358

  10. Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874

  11. Fei-Fei L, Perona P (2005) A bayesian hierarchical model for learning natural scene categories. In: CVPR, vol 2. IEEE, pp 524–531

  12. Fei-Fei L, Fergus R, Perona P (2007) Learning generative visual models from few training examples: an incremental bayesian approach tested on 101 object categories. Comput Vis Image Underst 106(1):59–70

  13. Feng Y, Lapata M (2010) Topic models for image annotation and text illustration. In: NAACL. ACL, pp 831–839

  14. Gavves E, Fernando B, Snoek CG, Smeulders AW, Tuytelaars T (2013) Fine-grained categorization by alignments. In: Proceedings of the IEEE international conference on computer vision, pp 1713–1720

  15. Gavves E, Fernando B, Snoek CG, Smeulders AW, Tuytelaars T (2015) Local alignments for fine-grained categorization. Int J Comput Vis 111(2):191–212

  16. Gosselin PH, Murray N, Jégou H, Perronnin F (2014) Revisiting the fisher vector for fine-grained classification. Pattern Recognit Lett 49:92–98

  17. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I (2009) The WEKA data mining software: an update. SIGKDD Explorations 11

  18. Jamieson M, Fazly A, Dickinson S, Stevenson S, Wachsmuth S (2007) Learning structured appearance models from captioned images of cluttered scenes. In: ICCV, pp 1–8

  19. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093

  20. Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. In: Nédellec C, Rouveirol C (eds) Machine learning: ECML-98, Lecture Notes in Computer Science, vol 1398. Springer, Berlin, pp 137–142

  21. Joachims T (1999) Advances in kernel methods. chap. Making large-scale support vector machine learning practical. MIT Press, Cambridge, pp 169–184

  22. Kanan C (2014) Fine-grained object recognition with gnostic fields. In: IEEE Winter conference on applications of computer vision. IEEE, pp 23–30

  23. Khosla A, Jayadevaprakash N, Yao B, Fei-Fei L (2011) Novel dataset for fine-grained image categorization. In: First workshop on fine-grained visual categorization. IEEE conference on computer vision and pattern recognition. Colorado Springs

  24. Krause J, Stark M, Deng J, Fei-Fei L (2013) 3d object representations for fine-grained categorization. In: Proceedings of the IEEE international conference on computer vision workshops, pp 554–561

  25. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

  26. Kuncheva LI (2004) Combining pattern classifiers: methods and algorithms. Wiley-Interscience

  27. Lavelli A, Sebastiani F, Zanoli R (2004) Distributional term representations: an experimental comparison. In: CIKM. ACM, pp 615–624

  28. Lazebnik S, Schmid C, Ponce J et al (2004) Semi-local affine parts for object recognition. In: BMVC, pp 779–788

  29. Lazebnik S, Schmid C, Ponce J (2005) A maximum entropy framework for part-based texture and object recognition. In: ICCV, vol 1. IEEE, pp 832–838

  30. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: CVPR, vol 2. IEEE, pp 2169–2178

  31. Li Z, Xiong Z, Zhang Y, Liu C, Li K (2011) Fast text categorization using concise semantic analysis. Pattern Recognit Lett 32(3):441–448

  32. Lin TY, RoyChowdhury A, Maji S (2015) Bilinear cnn models for fine-grained visual recognition. In: Proceedings of the IEEE international conference on computer vision, pp 1449–1457

  33. Liu Y, Nie L, Han L, Zhang L, Rosenblum DS (2015) Action2activity: recognizing complex activities from sensor data. In: Proceedings of the 24th international conference on artificial intelligence, IJCAI’15. http://dl.acm.org/citation.cfm?id=2832415.2832474. AAAI Press, pp 1617–1623

  34. Liu Y, Nie L, Liu L, Rosenblum DS (2016) From action to activity: sensor-based activity recognition. Neurocomputing 181:108–115. https://doi.org/10.1016/j.neucom.2015.08.096. http://www.sciencedirect.com/science/article/pii/S0925231215016331. Big Data Driven Intelligent Transportation Systems

  35. Liu Y, Zhang L, Nie L, Yan Y, Rosenblum DS (2016) Fortune teller: predicting your career path. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, AAAI’16. http://dl.acm.org/citation.cfm?id=3015812.3015842. AAAI Press, pp 201–207

  36. Liu Y, Zheng Y, Liang Y, Liu S, Rosenblum DS (2016) Urban water quality prediction based on multi-task multi-view learning. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence, IJCAI’16. http://dl.acm.org/citation.cfm?id=3060832.3060981. AAAI Press, pp 2576–2582

  37. López-Monroy AP, Montes-y Gómez M, Escalante HJ, Cruz-Roa A, González FA (2013) Bag-of-visual-ngrams for histopathology image classification. In: IX international seminar on medical information processing and analysis, vol 8922. SPIE, p 89220P

  38. López-Monroy AP, Montes-y Gómez M, Escalante HJ, Villaseñor-Pineda L, Stamatatos E (2015) Discriminative subprofile-specific representations for author profiling in social media. Knowl-Based Syst 89:134–147

  39. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110

  40. López-Monroy AP, Montes-y Gómez M, Escalante HJ, Cruz-Roa A, González FA (2016) Improving the BoVW via discriminative visual n-grams and MKL strategies. Neurocomputing 175(Part A):768–781

  41. Maji S, Kannala J, Rahtu E, Blaschko M, Vedaldi A (2013) Fine-grained visual classification of aircraft. Tech. rep.

  42. Phan XH, Nguyen LM, Horiguchi S (2008) Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In: WWW, WWW ’08, pp 91–100

  43. Quack T, Ferrari V, Leibe B, Van Gool L (2007) Efficient mining of frequent and distinctive feature configurations. In: ICCV. IEEE, pp 1–8

  44. Sivic J, Zisserman A (2003) Video google: a text retrieval approach to object matching in videos. In: Proceedings of the international conference on computer vision. ICCV

  45. Sriram B, Fuhry D, Demir E, Ferhatosmanoglu H, Demirbas M (2010) Short text classification in twitter to improve information filtering. In: SIGIR, SIGIR ’10, pp 841–842

  46. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  47. Tirilly P, Claveau V, Gros P (2008) Language modeling for bag-of-visual words image categorization. In: CIVR, pp 249–258

  48. Tirilly P, Claveau V, Gros P (2009) A review of weighting schemes for bag of visual words image retrieval. Tech. rep., TEXMEX, INRIA–IRISA

  49. Tommasi T, Orabona F, Caputo B (2007) Image annotation task: an svm-based cue integration approach. In: 2007 CLEF workshop

  50. Wang H, Ullah MM, Klaser A, Laptev I (2009) Evaluation of local spatio-temporal features for action recognition. In: BMVC, pp 1–11

  51. Yang S, Bo L, Wang J, Shapiro LG (2012) Unsupervised template learning for fine-grained object recognition. In: Advances in neural information processing systems, pp 3122–3130

  52. Yuan J, Wu Y, Yang M (2007) Discovery of collocation patterns: from visual words to visual phrases. In: CVPR, pp 1–8

  53. Yuan J, Yang M, Wu Y (2011) Mining discriminative co-occurrence patterns for visual recognition. In: CVPR. IEEE, pp 2777–2784

  54. Zhang J, Marszalek M, Lazebnik S, Schmid C (2007) Local features and kernels for classification of texture and object categories: a comprehensive study. Int J Comput Vis 73:213–238

  55. Zheng QF, Wang W, Gao W (2006) Effective and efficient object-based image retrieval using visual phrases. In: ACMMM, pp 77–80

  56. Zheng YT, Zhao M, Neo SY, Chua TS, Tian Q (2008) Visual synset: towards a higher-level visual representation. In: CVPR. IEEE, pp 1–8


Acknowledgements

This work was supported by CONACyT under grant 241306 and the scholarship 243957. H.J. Escalante was supported by Redes Temáticas CONACyT en Tecnologías del Lenguaje (RedTTL) e Inteligencia Computacional Aplicada (RedICA).

Author information

Correspondence to A. Pastor López-Monroy.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

López-Monroy, A.P., Montes-y-Gómez, M., Escalante, H.J. et al. Novel Distributional Visual-Feature Representations for image classification. Multimed Tools Appl 78, 11313–11336 (2019). https://doi.org/10.1007/s11042-018-6674-1

