Coarse2Fine: a two-stage training method for fine-grained visual classification

Eshratifar, Amir Erfan; Eigen, David; Gormish, Michael; Pedram, Massoud

doi:10.1007/s00138-021-01180-y

Original Paper
Published: 25 February 2021

Volume 32, article number 49, (2021)

Machine Vision and Applications

Amir Erfan Eshratifar ORCID: orcid.org/0000-0002-1339-7671¹,
David Eigen²,
Michael Gormish² &
…
Massoud Pedram¹

512 Accesses
6 Citations

Abstract

Small inter-class and large intra-class variations are the key challenges in fine-grained visual classification. Objects from different classes share visually similar structures, and objects in the same class can have different poses and viewpoints. Therefore, the proper extraction of discriminative local features (e.g., a bird’s beak or a car’s headlight) is crucial. Most recent successes on this problem are based on attention models that can localize and attend to the discriminative local object parts. In this work, we propose a training method for visual attention networks, Coarse2Fine, which creates a differentiable path from the attended feature maps to the input space. Coarse2Fine learns an inverse mapping function from the attended feature maps to the informative regions in the raw image, which guides the attention maps to better attend to the fine-grained features. We also propose an initialization method for the attention weights. Our experiments show that Coarse2Fine reduces the classification error by up to 5.1% on common fine-grained datasets.
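As a rough illustration of what "attended feature maps" means in attention-based classifiers, the sketch below computes attention maps from a feature tensor with a 1x1 convolution and uses them to pool one descriptor per attended part. It is a generic attention-pooling example and not the Coarse2Fine architecture, whose details the abstract does not specify; the function names, the sigmoid activation, and the toy weights are all assumptions chosen for illustration. It is written in dependency-free Python for readability, whereas a practical implementation would use a tensor library.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attention_maps(features, weights):
    """features: C x H x W nested lists; weights: M x C (acts as a 1x1 convolution).
    Returns M x H x W attention maps with values in (0, 1)."""
    C = len(features)
    H, W = len(features[0]), len(features[0][0])
    return [
        [[sigmoid(sum(w_k[c] * features[c][i][j] for c in range(C)))
          for j in range(W)]
         for i in range(H)]
        for w_k in weights
    ]

def attended_part_features(features, maps):
    """Weight every channel by each attention map and average over space.
    Returns M part descriptors, each of length C."""
    H, W = len(features[0]), len(features[0][0])
    return [
        [sum(a[i][j] * f[i][j] for i in range(H) for j in range(W)) / (H * W)
         for f in features]
        for a in maps
    ]

# Toy input (hypothetical): 2 channels on a 2x2 grid, 1 attention map.
features = [[[1.0, 0.0], [0.0, 0.0]],   # channel 0 fires at the top-left
            [[0.0, 0.0], [0.0, 2.0]]]   # channel 1 fires at the bottom-right
weights = [[4.0, -4.0]]  # assumed weights: favor channel 0, suppress channel 1
maps = attention_maps(features, weights)
parts = attended_part_features(features, maps)
```

In this toy case the single attention map is close to 1 where channel 0 is active and close to 0 where channel 1 is active, so the pooled descriptor is dominated by channel 0. A training method like the one the abstract describes would act on how such maps are learned, which this sketch does not attempt to show.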

Acknowledgements

This work was done during Amir Erfan Eshratifar's internship at Clarifai.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, 90089, USA
Amir Erfan Eshratifar & Massoud Pedram
Clarifai, San Francisco, CA, 94105, USA
David Eigen & Michael Gormish

Corresponding author

Correspondence to Amir Erfan Eshratifar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Eshratifar, A.E., Eigen, D., Gormish, M. et al. Coarse2Fine: a two-stage training method for fine-grained visual classification. Machine Vision and Applications 32, 49 (2021). https://doi.org/10.1007/s00138-021-01180-y

Received: 30 March 2020
Revised: 16 December 2020
Accepted: 03 February 2021
Published: 25 February 2021
DOI: https://doi.org/10.1007/s00138-021-01180-y
