
Novel Distributional Visual-Feature Representations for image classification

Published in: Multimedia Tools and Applications

Abstract

The Bag-of-Visual-Words (BoVW) representation is a well-known strategy for approaching many computer vision problems. The idea behind BoVW is similar to that of the Bag-of-Words (BoW) representation used in text mining: build word histograms to represent documents. In computer vision, most research has been devoted to obtaining better visual words rather than to improving the final representation. This is somewhat surprising, as the text mining community has developed many alternative ways of improving the BoW representation that can be applied in computer vision as well. This paper evaluates the usefulness of Distributional Term Representations (DTRs) for image classification. DTRs represent instances by exploiting statistics of feature occurrences and co-occurrences across the dataset. We focus on the suitability and effectiveness of well-known DTRs in different image collections. Furthermore, we devise two novel distributional strategies that learn appropriate groups of images from which to compute better-suited distributional features. We report experimental results on several image datasets showing the effectiveness of the proposed DTRs over BoVW and other methods in the literature, including deep-learning-based strategies. In particular, we show the effectiveness of the proposed representations on image collections from narrow domains, where target categories are subclasses of a more general class (e.g., subclasses of birds, aircraft, or dogs).
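As a rough illustration of the two ideas the abstract contrasts, the following is a minimal sketch of (a) building a BoVW histogram by quantizing local descriptors against a codebook, and (b) a toy distributional statistic that counts visual-word co-occurrences across images. All function names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest visual word (codebook
    centre) and return a normalized occurrence histogram, i.e. the BoVW
    vector for one image."""
    # Euclidean distance from every descriptor to every codebook centre
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)                 # visual-word id of each patch
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                 # relative frequencies

def cooccurrence_dtr(word_seqs, n_words):
    """Toy distributional statistic: for every pair of visual words, count
    in how many images both words appear. A DTR would represent each word
    (and, by aggregation, each image) through such dataset-wide counts."""
    C = np.zeros((n_words, n_words))
    for seq in word_seqs:
        present = np.unique(seq)
        for i in present:
            for j in present:
                if i != j:
                    C[i, j] += 1
    return C
```

The key contrast is that the BoVW vector only looks inside one image, while the distributional matrix pools occurrence statistics over the whole collection.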



Notes

  1. Note that αi does not need to be optimized and could be any text mining term-weighting scheme that captures the feature contribution (e.g., frequency, boolean, information gain). In our evaluation we use the simplest one: the visual-word occurrence frequency.

  2. Please note that the process of building the representation is totally unsupervised, although it is used for image classification, which is a supervised problem.

  3. Although there are many alternative ways to extract image patches (e.g., dense or regular grid-based, keypoint-based), in this paper we use regular grid-based extraction to simplify the explanation of the visual features that we use as terms.

  4. The experimental settings for LSA and LDA were determined experimentally, between 60 and 200 concepts for each dataset, in the same way as in [30]. For the SPR, the parameter to build the pyramid representation was fixed to three, as suggested in [30].

  5. This version of visual bigrams as features (López-Monroy et al. [38]) requires a construction strategy of order O(n²) in both time and space.

  6. The proposed DTRs outperformed these baselines in the experiments of Section 8 only for Histopathology and Butterflies classification. We did not include them since some images and classes from the collections in Table 1 were explicitly used in the pretrained CNN models (e.g., mandarin duck from the birds collection).
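Footnote 5 notes that the visual-bigram feature space grows as O(n²) in the number n of visual words. A minimal sketch of why (the grid layout and function name are illustrative assumptions, not the exact construction of [38]): counting horizontally adjacent pairs of visual-word ids yields, at most, n × n distinct bigram features.

```python
from collections import Counter

def visual_bigrams(word_grid):
    """Count horizontally adjacent visual-word pairs in a 2-D grid of
    word ids (one row per grid row of an image). With a vocabulary of
    n visual words there are up to n*n distinct (a, b) keys, which is
    the O(n^2) space cost mentioned in footnote 5."""
    counts = Counter()
    for row in word_grid:
        for a, b in zip(row, row[1:]):   # neighbouring patches in a row
            counts[(a, b)] += 1
    return counts
```

The time cost is linear in the number of patches per image, but materializing the bigram vocabulary (and any dense bigram representation) is what incurs the quadratic blow-up.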

References

  1. Bosch A, Zisserman A, Muñoz X (2006) Scene classification via pLSA. In: Computer vision–ECCV 2006. Springer, pp 517–530

  2. Bruni E, Boleda G, Baroni M, Tran NK (2012) Distributional semantics in technicolor. In: ACL. ACL, pp 136–145

  3. Bruni E, Tran NK, Baroni M (2014) Multimodal distributional semantics. J Artif Intell Res (JAIR) 49:1–47

  4. Chen G, Yang J, Jin H, Shechtman E, Brandt J, Han TX (2015) Selective pooling vector for fine-grained recognition. In: 2015 IEEE winter conference on applications of computer vision. IEEE, pp 860–867

  5. Cruz-Roa A, Caicedo JC, González FA (2011) Visual pattern mining in histology image collections using bag of features. Artif Intell Med 52:91–106

  6. Cruz-Roa A, Díaz G, Romero E, González F (2011) Automatic annotation of histopathological images using a latent topic model based on non-negative matrix factorization. J Pathol Inf 2(2):4

  7. Csurka G, Dance CR, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: International workshop on statistical learning in computer vision, ECCV, vol 1, p 22

  8. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  9. Díaz G, Romero E (2012) Micro-structural tissue analysis for automatic histopathological image annotation. Microsc Res Tech 75:343–358

  10. Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874

  11. Fei-Fei L, Perona P (2005) A bayesian hierarchical model for learning natural scene categories. In: CVPR, vol 2. IEEE, pp 524–531

  12. Fei-Fei L, Fergus R, Perona P (2007) Learning generative visual models from few training examples: an incremental bayesian approach tested on 101 object categories. Comput Vis Image Underst 106(1):59–70

  13. Feng Y, Lapata M (2010) Topic models for image annotation and text illustration. In: NAACL. ACL, pp 831–839

  14. Gavves E, Fernando B, Snoek CG, Smeulders AW, Tuytelaars T (2013) Fine-grained categorization by alignments. In: Proceedings of the IEEE international conference on computer vision, pp 1713–1720

  15. Gavves E, Fernando B, Snoek CG, Smeulders AW, Tuytelaars T (2015) Local alignments for fine-grained categorization. Int J Comput Vis 111(2):191–212

  16. Gosselin PH, Murray N, Jégou H, Perronnin F (2014) Revisiting the fisher vector for fine-grained classification. Pattern Recognit Lett 49:92–98

  17. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I (2009) The WEKA data mining software: an update. SIGKDD Explorations 11

  18. Jamieson M, Fazly A, Dickinson S, Stevenson S, Wachsmuth S (2007) Learning structured appearance models from captioned images of cluttered scenes. In: ICCV, pp 1–8

  19. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093

  20. Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. In: Nédellec C, Rouveirol C (eds) Machine learning: ECML-98, Lecture Notes in Computer Science, vol 1398. Springer, Berlin, pp 137–142

  21. Joachims T (1999) Advances in kernel methods. chap. Making large-scale support vector machine learning practical. MIT Press, Cambridge, pp 169–184

  22. Kanan C (2014) Fine-grained object recognition with gnostic fields. In: IEEE Winter conference on applications of computer vision. IEEE, pp 23–30

  23. Khosla A, Jayadevaprakash N, Yao B, Fei-Fei L (2011) Novel dataset for fine-grained image categorization. In: First workshop on fine-grained visual categorization. IEEE conference on computer vision and pattern recognition. Colorado Springs

  24. Krause J, Stark M, Deng J, Fei-Fei L (2013) 3d object representations for fine-grained categorization. In: Proceedings of the IEEE international conference on computer vision workshops, pp 554–561

  25. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

  26. Kuncheva LI (2004) Combining pattern classifiers: methods and algorithms. Wiley-Interscience

  27. Lavelli A, Sebastiani F, Zanoli R (2004) Distributional term representations: an experimental comparison. In: CIKM. ACM, pp 615–624

  28. Lazebnik S, Schmid C, Ponce J et al (2004) Semi-local affine parts for object recognition. In: BMVC, pp 779–788

  29. Lazebnik S, Schmid C, Ponce J (2005) A maximum entropy framework for part-based texture and object recognition. In: ICCV, vol 1. IEEE, pp 832–838

  30. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: CVPR, vol 2. IEEE, pp 2169–2178

  31. Li Z, Xiong Z, Zhang Y, Liu C, Li K (2011) Fast text categorization using concise semantic analysis. Pattern Recognit Lett 32(3):441–448

  32. Lin TY, RoyChowdhury A, Maji S (2015) Bilinear cnn models for fine-grained visual recognition. In: Proceedings of the IEEE international conference on computer vision, pp 1449–1457

  33. Liu Y, Nie L, Han L, Zhang L, Rosenblum DS (2015) Action2activity: recognizing complex activities from sensor data. In: Proceedings of the 24th international conference on artificial intelligence, IJCAI’15. http://dl.acm.org/citation.cfm?id=2832415.2832474. AAAI Press, pp 1617–1623

  34. Liu Y, Nie L, Liu L, Rosenblum DS (2016) From action to activity: sensor-based activity recognition. Neurocomputing 181:108–115. https://doi.org/10.1016/j.neucom.2015.08.096. http://www.sciencedirect.com/science/article/pii/S0925231215016331. Big Data Driven Intelligent Transportation Systems

  35. Liu Y, Zhang L, Nie L, Yan Y, Rosenblum DS (2016) Fortune teller: predicting your career path. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, AAAI’16. http://dl.acm.org/citation.cfm?id=3015812.3015842. AAAI Press, pp 201–207

  36. Liu Y, Zheng Y, Liang Y, Liu S, Rosenblum DS (2016) Urban water quality prediction based on multi-task multi-view learning. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence, IJCAI’16. http://dl.acm.org/citation.cfm?id=3060832.3060981. AAAI Press, pp 2576–2582

  37. López-Monroy AP, Montes-y Gómez M, Escalante HJ, Cruz-Roa A, González FA (2013) Bag-of-visual-ngrams for histopathology image classification. In: IX international seminar on medical information processing and analysis, vol 8922. SPIE, p 89220P

  38. López-Monroy AP, Montes-y Gómez M, Escalante HJ, Villaseñor-Pineda L, Stamatatos E (2015) Discriminative subprofile-specific representations for author profiling in social media. Knowl-Based Syst 89:134–147

  39. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110

  40. López-Monroy AP, Montes-y Gómez M, Escalante HJ, Cruz-Roa A, González FA (2016) Improving the BoVW via discriminative visual n-grams and MKL strategies. Neurocomputing 175(Part A):768–781

  41. Maji S, Kannala J, Rahtu E, Blaschko M, Vedaldi A (2013) Fine-grained visual classification of aircraft. Tech. rep.

  42. Phan XH, Nguyen LM, Horiguchi S (2008) Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In: WWW, WWW ’08, pp 91–100

  43. Quack T, Ferrari V, Leibe B, Van Gool L (2007) Efficient mining of frequent and distinctive feature configurations. In: ICCV. IEEE, pp 1–8

  44. Sivic J, Zisserman A (2003) Video google: a text retrieval approach to object matching in videos. In: Proceedings of the international conference on computer vision. ICCV

  45. Sriram B, Fuhry D, Demir E, Ferhatosmanoglu H, Demirbas M (2010) Short text classification in twitter to improve information filtering. In: SIGIR, SIGIR ’10, pp 841–842

  46. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  47. Tirilly P, Claveau V, Gros P (2008) Language modeling for bag-of-visual words image categorization. In: CIVR, pp 249–258

  48. Tirilly P, Claveau V, Gros P (2009) A review of weighting schemes for bag of visual words image retrieval. Tech. rep., TEXMEX, INRIA–IRISA

  49. Tommasi T, Orabona F, Caputo B (2007) Image annotation task: an svm-based cue integration approach. In: 2007 CLEF workshop

  50. Wang H, Ullah MM, Klaser A, Laptev I (2009) Evaluation of local spatio-temporal features for action recognition. In: BMVC, pp 1–11

  51. Yang S, Bo L, Wang J, Shapiro LG (2012) Unsupervised template learning for fine-grained object recognition. In: Advances in neural information processing systems, pp 3122–3130

  52. Yuan J, Wu Y, Yang M (2007) Discovery of collocation patterns: from visual words to visual phrases. In: CVPR, pp 1–8

  53. Yuan J, Yang M, Wu Y (2011) Mining discriminative co-occurrence patterns for visual recognition. In: CVPR. IEEE, pp 2777–2784

  54. Zhang J, Marszalek M, Lazebnik S, Schmid C (2007) Local features and kernels for classification of texture and object categories: a comprehensive study. Int J Comput Vis 73:213–238

  55. Zheng QF, Wang W, Gao W (2006) Effective and efficient object-based image retrieval using visual phrases. In: ACMMM, pp 77–80

  56. Zheng YT, Zhao M, Neo SY, Chua TS, Tian Q (2008) Visual synset: towards a higher-level visual representation. In: CVPR. IEEE, pp 1–8


Acknowledgements

This work was supported by CONACyT under grant 241306 and the scholarship 243957. H.J. Escalante was supported by Redes Temáticas CONACyT en Tecnologías del Lenguaje (RedTTL) e Inteligencia Computacional Aplicada (RedICA).

Author information

Correspondence to A. Pastor López-Monroy.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

López-Monroy, A.P., Montes-y-Gómez, M., Escalante, H.J. et al. Novel Distributional Visual-Feature Representations for image classification. Multimed Tools Appl 78, 11313–11336 (2019). https://doi.org/10.1007/s11042-018-6674-1

