Abstract
The Bag-of-Visual-Words (BoVW) representation is a well-known strategy for approaching many computer vision problems. The idea behind BoVW is similar to the Bag-of-Words (BoW) used in text mining tasks: building word histograms to represent documents. In computer vision, most research has been devoted to obtaining better visual words rather than to improving the final representation. This is somewhat surprising, as the text mining community offers many alternative ways of improving the BoW representation that can be applied in computer vision as well. This paper aims at evaluating the usefulness of Distributional Term Representations (DTRs) for image classification. DTRs represent instances by exploiting statistics of feature occurrences and co-occurrences across the dataset. We focus on the suitability and effectiveness of well-known DTRs in different image collections. Furthermore, we devise two novel distributional strategies that learn appropriate groups of images from which to compute better-suited distributional features. We report experimental results on several image datasets showing the effectiveness of the proposed DTRs over BoVW and other methods in the literature, including deep-learning-based strategies. In particular, we show the effectiveness of the proposed representations on image collections from narrow domains, where target categories are subclasses of a more general class (e.g., subclasses of birds, aircraft, or dogs).
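The core distributional idea described above, representing each visual word by its distribution of occurrences over the training images and then describing an image by the weighted combination of its words' distributions, can be illustrated with a minimal sketch. This is an illustrative implementation of a generic Document Occurrence Representation (DOR), not the exact formulation of the paper; the function names and the normalization choices are our own assumptions.

```python
import numpy as np

def dor_representation(word_doc_counts):
    """Describe each visual word by its normalized occurrence
    distribution across training images (rows sum to 1).
    word_doc_counts: (n_words, n_images) occurrence matrix."""
    row_sums = word_doc_counts.sum(axis=1, keepdims=True)
    return word_doc_counts / np.maximum(row_sums, 1)

def image_dtr(word_freqs, word_dtrs):
    """Represent an image as the frequency-weighted (alpha_i) sum of the
    distributional vectors of its visual words, L2-normalized."""
    v = word_freqs @ word_dtrs
    return v / np.maximum(np.linalg.norm(v), 1e-12)

# toy example: 2 visual words observed across 3 training images
counts = np.array([[2, 0, 1],
                   [0, 3, 3]], dtype=float)
D = dor_representation(counts)
img = image_dtr(np.array([2.0, 1.0]), D)  # image with word 0 twice, word 1 once
```

The resulting image vector lives in the space of training images rather than in the raw vocabulary space, which is what lets co-occurrence statistics enrich the plain BoVW histogram.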
Notes
Note that αi does not need to be optimized, and could be any term-weighting scheme from text mining that captures the feature contribution (e.g., frequency, boolean, information gain). In our evaluation we use the simplest one: the visual-word occurrence frequency.
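The frequency weighting described in this note can be sketched in a few lines: each αi is simply the number of times visual word i occurs in the image. The function name and input format are illustrative assumptions.

```python
import numpy as np

def bovw_histogram(patch_assignments, vocab_size):
    """Frequency-weighted BoVW vector: alpha_i is the number of
    occurrences of visual word i among the image's patches."""
    hist = np.zeros(vocab_size)
    for w in patch_assignments:  # each patch quantized to its nearest visual word
        hist[w] += 1
    return hist

# toy example: 6 patches quantized against a vocabulary of 4 visual words
h = bovw_histogram([0, 2, 2, 1, 2, 0], 4)  # word 2 occurs three times, etc.
```

Swapping in boolean or information-gain weights would only change how each `hist[w]` entry is computed, leaving the rest of the pipeline untouched.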
Please note that the process of building the representation is fully unsupervised, although it is used for image classification, which is a supervised problem.
Although there are many alternative ways to extract image patches (e.g., dense/regular-grid-based, keypoint-based), in this paper we use the regular-grid-based approach to simplify the explanation of the visual features that we use as terms.
This version of visual bigrams as features (López-Monroy et al. [38]) requires a building strategy with O(n²) time and space cost.
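The O(n²) cost noted above comes from the fact that, with a vocabulary of n visual words, up to n × n distinct ordered pairs can appear as bigrams. A minimal sketch, assuming bigrams are pairs of horizontally adjacent visual words on a regular grid (the adjacency convention is our own illustrative assumption):

```python
from collections import Counter

def visual_bigrams(word_grid):
    """Count horizontally adjacent visual-word pairs on a regular grid.
    With n visual words there are up to n*n distinct bigram keys, which
    is the source of the quadratic time/space cost."""
    counts = Counter()
    for row in word_grid:
        for a, b in zip(row, row[1:]):  # slide over adjacent cells
            counts[(a, b)] += 1
    return counts

# toy 2x3 grid of quantized patches
grid = [[0, 1, 1],
        [2, 0, 1]]
bigrams = visual_bigrams(grid)  # (0, 1) appears twice
```

Extending the sweep to vertical or diagonal neighbors changes only the inner loop, not the quadratic bound on the bigram vocabulary.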
The proposed DTRs outperformed these baselines in the experiments of Section 8 only for Histopathology and Butterflies classification. We did not include them because some images and classes from the collections in Table 1 were explicitly used in the pretrained CNN models (e.g., mandarin duck from the birds collection).
References
Bosch A, Zisserman A, Muñoz X (2006) Scene classification via plsa. In: Computer vision–ECCV 2006. Springer, pp 517–530
Bruni E, Boleda G, Baroni M, Tran NK (2012) Distributional semantics in technicolor. In: ACL. ACL, pp 136–145
Bruni E, Tran NK, Baroni M (2014) Multimodal distributional semantics. J Artif Intell Res (JAIR) 49:1–47
Chen G, Yang J, Jin H, Shechtman E, Brandt J, Han TX (2015) Selective pooling vector for fine-grained recognition. In: 2015 IEEE winter conference on applications of computer vision. IEEE, pp 860–867
Cruz-Roa A, Caicedo JC, González FA (2011) Visual pattern mining in histology image collections using bag of features. Artif Intell Med 52:91–106
Cruz-Roa A, Díaz G, Romero E, González F (2011) Automatic annotation of histopathological images using a latent topic model based on non-negative matrix factorization. J Pathol Inf 2(2):4
Csurka G, Dance CR, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: International workshop on statistical learning in computer vision, ECCV, vol 1, p 22
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Díaz G, Romero E (2012) Micro-structural tissue analysis for automatic histopathological image annotation. Microsc Res Tech 75:343–358
Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874
Fei-Fei L, Perona P (2005) A bayesian hierarchical model for learning natural scene categories. In: CVPR, vol 2. IEEE, pp 524–531
Fei-Fei L, Fergus R, Perona P (2007) Learning generative visual models from few training examples: an incremental bayesian approach tested on 101 object categories. Comput Vis Image Underst 106(1):59–70
Feng Y, Lapata M (2010) Topic models for image annotation and text illustration. In: NAACL. ACL, pp 831–839
Gavves E, Fernando B, Snoek CG, Smeulders AW, Tuytelaars T (2013) Fine-grained categorization by alignments. In: Proceedings of the IEEE international conference on computer vision, pp 1713–1720
Gavves E, Fernando B, Snoek CG, Smeulders AW, Tuytelaars T (2015) Local alignments for fine-grained categorization. Int J Comput Vis 111(2):191–212
Gosselin PH, Murray N, Jégou H, Perronnin F (2014) Revisiting the fisher vector for fine-grained classification. Pattern Recognit Lett 49:92–98
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1):10–18
Jamieson M, Fazly A, Dickinson S, Stevenson S, Wachsmuth S (2007) Learning structured appearance models from captioned images of cluttered scenes. In: ICCV, pp 1–8
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093
Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. In: Nédellec C, Rouveirol C (eds) Machine learning: ECML-98, Lecture Notes in Computer Science, vol 1398. Springer, Berlin, pp 137–142
Joachims T (1999) Advances in kernel methods. chap. Making large-scale support vector machine learning practical. MIT Press, Cambridge, pp 169–184
Kanan C (2014) Fine-grained object recognition with gnostic fields. In: IEEE Winter conference on applications of computer vision. IEEE, pp 23–30
Khosla A, Jayadevaprakash N, Yao B, Fei-Fei L (2011) Novel dataset for fine-grained image categorization. In: First workshop on fine-grained visual categorization. IEEE conference on computer vision and pattern recognition. Colorado Springs
Krause J, Stark M, Deng J, Fei-Fei L (2013) 3d object representations for fine-grained categorization. In: Proceedings of the IEEE international conference on computer vision workshops, pp 554–561
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Kuncheva LI (2004) Combining pattern classifiers: methods and algorithms. Wiley-Interscience
Lavelli A, Sebastiani F, Zanoli R (2004) Distributional term representations: an experimental comparison. In: CIKM. ACM, pp 615–624
Lazebnik S, Schmid C, Ponce J et al (2004) Semi-local affine parts for object recognition. In: BMVC, pp 779–788
Lazebnik S, Schmid C, Ponce J (2005) A maximum entropy framework for part-based texture and object recognition. In: ICCV, vol 1. IEEE, pp 832–838
Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: CVPR, vol 2. IEEE, pp 2169–2178
Li Z, Xiong Z, Zhang Y, Liu C, Li K (2011) Fast text categorization using concise semantic analysis. Pattern Recognit Lett 32(3):441–448
Lin TY, RoyChowdhury A, Maji S (2015) Bilinear cnn models for fine-grained visual recognition. In: Proceedings of the IEEE international conference on computer vision, pp 1449–1457
Liu Y, Nie L, Han L, Zhang L, Rosenblum DS (2015) Action2activity: recognizing complex activities from sensor data. In: Proceedings of the 24th international conference on artificial intelligence, IJCAI’15. http://dl.acm.org/citation.cfm?id=2832415.2832474. AAAI Press, pp 1617–1623
Liu Y, Nie L, Liu L, Rosenblum DS (2016) From action to activity: sensor-based activity recognition. Neurocomputing 181:108–115. https://doi.org/10.1016/j.neucom.2015.08.096. http://www.sciencedirect.com/science/article/pii/S0925231215016331. Big Data Driven Intelligent Transportation Systems
Liu Y, Zhang L, Nie L, Yan Y, Rosenblum DS (2016) Fortune teller: predicting your career path. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, AAAI’16. http://dl.acm.org/citation.cfm?id=3015812.3015842. AAAI Press, pp 201–207
Liu Y, Zheng Y, Liang Y, Liu S, Rosenblum DS (2016) Urban water quality prediction based on multi-task multi-view learning. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence, IJCAI’16. http://dl.acm.org/citation.cfm?id=3060832.3060981. AAAI Press, pp 2576–2582
López-Monroy AP, Montes-y Gómez M, Escalante HJ, Cruz-Roa A, González FA (2013) Bag-of-visual-ngrams for histopathology image classification. In: IX international seminar on medical information processing and analysis, vol 8922. SPIE, p 89220P
López-Monroy AP, Montes-y Gómez M, Escalante HJ, Villaseñor-Pineda L, Stamatatos E (2015) Discriminative subprofile-specific representations for author profiling in social media. Knowl-Based Syst 89:134–147
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
López-Monroy AP, Montes-y Gómez M, Escalante HJ, Cruz-Roa A, González FA (2016) Improving the BoVW via discriminative visual n-grams and MKL strategies. Neurocomputing 175, Part A:768–781
Maji S, Kannala J, Rahtu E, Blaschko M, Vedaldi A (2013) Fine-grained visual classification of aircraft. Tech. rep.
Phan XH, Nguyen LM, Horiguchi S (2008) Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In: WWW, WWW ’08, pp 91–100
Quack T, Ferrari V, Leibe B, Van Gool L (2007) Efficient mining of frequent and distinctive feature configurations. In: ICCV. IEEE, pp 1–8
Sivic J, Zisserman A (2003) Video google: a text retrieval approach to object matching in videos. In: Proceedings of the international conference on computer vision. ICCV
Sriram B, Fuhry D, Demir E, Ferhatosmanoglu H, Demirbas M (2010) Short text classification in twitter to improve information filtering. In: SIGIR, SIGIR ’10, pp 841–842
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Tirilly P, Claveau V, Gros P (2008) Language modeling for bag-of-visual words image categorization. In: CIVR, pp 249–258
Tirilly P, Claveau V, Gros P (2009) A review of weighting schemes for bag of visual words image retrieval. Tech. rep. TEXMEX - INRIA - IRISA
Tommasi T, Orabona F, Caputo B (2007) Image annotation task: an svm-based cue integration approach. In: 2007 CLEF workshop
Wang H, Ullah MM, Klaser A, Laptev I (2009) Evaluation of local spatio-temporal features for action recognition. In: BMVC, pp 1–11
Yang S, Bo L, Wang J, Shapiro LG (2012) Unsupervised template learning for fine-grained object recognition. In: Advances in neural information processing systems, pp 3122–3130
Yuan J, Wu Y, Yang M (2007) Discovery of collocation patterns: from visual words to visual phrases. In: CVPR, pp 1–8
Yuan J, Yang M, Wu Y (2011) Mining discriminative co-occurrence patterns for visual recognition. In: CVPR. IEEE, pp 2777–2784
Zhang J, Marszalek M, Lazebnik S, Schmid C (2007) Local features and kernels for classification of texture and object categories: a comprehensive study. Int J Comput Vis 73:213–238
Zheng QF, Wang W, Gao W (2006) Effective and efficient object-based image retrieval using visual phrases. In: ACMMM, pp 77–80
Zheng YT, Zhao M, Neo SY, Chua TS, Tian Q (2008) Visual synset: towards a higher-level visual representation. In: CVPR. IEEE, pp 1–8
Acknowledgements
This work was supported by CONACyT under grant 241306 and the scholarship 243957. H.J. Escalante was supported by Redes Temáticas CONACyT en Tecnologías del Lenguaje (RedTTL) e Inteligencia Computacional Aplicada (RedICA).
Cite this article
López-Monroy, A.P., Montes-y-Gómez, M., Escalante, H.J. et al. Novel Distributional Visual-Feature Representations for image classification. Multimed Tools Appl 78, 11313–11336 (2019). https://doi.org/10.1007/s11042-018-6674-1