Zero-shot Fine-grained Classification by Deep Feature Learning with Semantics

Li, Ao-Xue; Zhang, Ke-Xin; Wang, Li-Wei

doi:10.1007/s11633-019-1177-8

Research Article
Open access
Published: 15 May 2019

Volume 16, pages 563–574 (2019)

International Journal of Automation and Computing

A Correction to this article was published on 17 January 2020

This article has been updated

Abstract

Fine-grained image classification, which aims to distinguish images with subtle distinctions, is a challenging task for two main reasons: the lack of sufficient training data for every class and the difficulty of learning discriminative features for representation. In this paper, to address these two issues, we propose a two-phase framework for recognizing images from unseen fine-grained classes, i.e., zero-shot fine-grained classification. In the first phase, feature learning, we fine-tune deep convolutional neural networks using the hierarchical semantic structure among fine-grained classes to extract discriminative deep visual features. Meanwhile, a domain adaptation structure is introduced into the deep convolutional neural networks to avoid domain shift from training data to test data. In the second phase, label inference, a semantic directed graph is constructed over the attributes of fine-grained classes. Based on this graph, we develop a label propagation algorithm to infer the labels of images in the unseen classes. Experimental results on two benchmark datasets demonstrate that our model outperforms the state-of-the-art zero-shot learning models. In addition, the features obtained by our feature learning model also yield significant gains when they are used by other zero-shot learning models, which shows the flexibility of our model in zero-shot fine-grained classification.
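The abstract does not spell out the propagation rule, so the following is only a generic sketch of label propagation on a directed graph, not the authors' algorithm. It assumes the common iterative update F <- alpha * P * F + (1 - alpha) * Y, where P is a row-normalized transition matrix and Y holds the seed scores. The toy graph, the edge weights, the node roles (two unseen-class nodes and two test images) and all function names are hypothetical.

```python
def row_normalize(weights):
    """Turn a non-negative weight matrix into a row-stochastic transition matrix."""
    normalized = []
    for row in weights:
        total = sum(row)
        # Rows without outgoing edges stay all-zero.
        normalized.append([w / total if total > 0 else 0.0 for w in row])
    return normalized


def propagate_labels(weights, seed_scores, alpha=0.5, iterations=50):
    """Iterate F <- alpha * P * F + (1 - alpha) * Y (assumed generic update rule)."""
    transition = row_normalize(weights)
    n, k = len(seed_scores), len(seed_scores[0])
    scores = [row[:] for row in seed_scores]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            new_row = []
            for c in range(k):
                # Node i collects class-c score from the nodes it points to.
                spread = sum(transition[i][j] * scores[j][c] for j in range(n))
                new_row.append(alpha * spread + (1 - alpha) * seed_scores[i][c])
            updated.append(new_row)
        scores = updated
    return scores


def predict(scores):
    """Pick the highest-scoring class index for every node."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Hypothetical toy graph: nodes 0 and 1 stand for two unseen classes,
# nodes 2 and 3 stand for test images linked to those classes.
weights = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
]
seeds = [
    [1.0, 0.0],  # class node 0 is seeded with label 0
    [0.0, 1.0],  # class node 1 is seeded with label 1
    [0.0, 0.0],  # test images start unlabeled
    [0.0, 0.0],
]
labels = predict(propagate_labels(weights, seeds))
```

In this toy setup each test image ends up with the label of the class node it is most strongly connected to. In the paper the graph is built over attributes of fine-grained classes, which this sketch does not attempt to reproduce.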

Change history

17 January 2020
The article Zero-shot Fine-grained Classification by Deep Feature Learning with Semantics, written by Ao-Xue Li, Ke-Xin Zhang and Li-Wei Wang, was originally published in vol. 16, no. 5 of International Journal of Automation and Computing without Open Access. After publication, the authors decided to opt for Open Choice and to make the article an Open Access publication. Therefore, the copyright of the article has been changed to © The Author(s) 2020 and the article is forthwith distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Acknowledgement

This work was supported by the National Basic Research Program of China (973 Program) (No. 2015CB352502), the National Natural Science Foundation of China (No. 61573026) and the Beijing Natural Science Foundation (No. L172037).

Author information

Authors and Affiliations

The Key Laboratory of Machine Perception (MOE), School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Ao-Xue Li, Ke-Xin Zhang & Li-Wei Wang

Corresponding author

Correspondence to Ao-Xue Li.

Additional information

Recommended by Associate Editor Bin Luo

The original version of this article was revised due to a retrospective Open Access order.

Ao-Xue Li received the B. Sc. degree in electronic science and technology from Beijing Normal University, China in 2015. She is currently a Ph. D. candidate in computer science and technology at Peking University, China.

Her research interests include computer vision and machine learning.

Ke-Xin Zhang received the B. Sc. degree in computer science and technology from Peking University, China in 2018. She is currently a master's student in computer science and technology at Peking University, China.

Her research interests include computer vision and machine learning.

Li-Wei Wang received the B. Sc. and M. Sc. degrees in electronic engineering from the Department of Electronic Engineering, Tsinghua University, China in 1999 and 2002, respectively, and the Ph. D. degree in applied mathematics from the School of Mathematical Sciences, Peking University, China in 2005. He is currently a full professor at the School of Electronics Engineering and Computer Science, Peking University, China. He has published about 100 refereed journal and conference papers. He was named among “AI’s 10 to Watch” in 2010.

His research interest is machine learning, with applications to computer vision.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Li, AX., Zhang, KX. & Wang, LW. Zero-shot Fine-grained Classification by Deep Feature Learning with Semantics. Int. J. Autom. Comput. 16, 563–574 (2019). https://doi.org/10.1007/s11633-019-1177-8

Received: 10 October 2018
Accepted: 08 March 2019
Published: 15 May 2019
Issue Date: October 2019
DOI: https://doi.org/10.1007/s11633-019-1177-8
