Abstract
We investigate visual-semantic representations by combining visual features and semantic attributes into a compact subspace that retains the most relevant properties of each domain. This subspace better represents image features for recognition tasks and allows results to be interpreted in the light of the nature of the semantic attributes, offering a path toward explainable learning. Experiments were performed on four benchmark datasets and compared against state-of-the-art algorithms. The method remains robust under up to 20% degradation of the semantic attributes, opening possibilities for future work on the automatic gathering of semantic data to improve representations for image classification. Additionally, empirical evidence suggests that the high-level concepts add linearity to the feature space, allowing methods such as PCA and SVM to perform well on the combined visual and semantic features. The representations also enable zero-shot learning, demonstrating the viability of merging semantic and visual data at both training and test time to learn aspects that transcend class boundaries and allow the classification of unseen data.
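The general idea the abstract describes can be sketched as a simple early-fusion pipeline: concatenate visual features with semantic attributes, project the fused vectors onto a compact subspace with PCA, and classify with a linear SVM. This is an illustrative sketch on synthetic data, not the paper's actual encoder; the dimensionalities, the random class-dependent features, and the variable names are all assumptions made for the example.

```python
# Illustrative sketch (synthetic data, hypothetical dimensions):
# fuse visual features with semantic attributes, project to a compact
# subspace with PCA, and classify with a linear SVM.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_samples, n_classes = 300, 3
labels = rng.integers(0, n_classes, n_samples)

# Stand-ins for real data: 128-D "visual" features whose mean shifts
# with the class, and 16-D binary "semantic" attributes whose activation
# probability depends on the class.
visual = rng.normal(size=(n_samples, 128)) + labels[:, None]
attributes = (rng.random((n_samples, 16)) < (labels[:, None] + 1) / 4).astype(float)

# Early fusion: concatenate both modalities into one feature vector.
fused = np.hstack([visual, attributes])

X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, random_state=0)

# Compact subspace: keep only the leading principal components.
pca = PCA(n_components=10).fit(X_tr)
clf = LinearSVC(max_iter=5000).fit(pca.transform(X_tr), y_tr)
acc = clf.score(pca.transform(X_te), y_te)
print(f"accuracy on fused features: {acc:.2f}")
```

On data where class structure is (near-)linear in the fused space, as the abstract suggests, such a simple linear pipeline can already perform well; the synthetic data above is constructed to have exactly that property.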
Acknowledgment
This work was supported by FAPESP grants #2018/22482-0 and #2019/07316-0; CNPq fellowship #304266/2020-5.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Code availability
Source code can be found at a public Git repository https://github.com/MAP-VICG/VisualSemanticEncoder.
Cite this article
de Resende, D.C.O., Ponti, M.A. Robust image features for classification and zero-shot tasks by merging visual and semantic attributes. Neural Comput & Applic 34, 4459–4471 (2022). https://doi.org/10.1007/s00521-021-06601-7