
Adversarial erasing attention for fine-grained image classification

Multimedia Tools and Applications

Abstract

Recognizing fine-grained subcategories is a challenging task because fine-grained images show large intra-class diversity and small inter-class variance. The common approach is to find the parts that distinguish similar subcategories. Most previous works rely on manual annotations or attention techniques to localize the discriminative parts, and they have achieved great progress. However, manual annotations are demanding to obtain in practical applications, and attention-based methods have to adopt complicated constraints on the loss functions to localize the discriminative parts for building multi-view feature representations. To handle these challenges, this paper applies the strategy of adversarial erasing to the attention module, which learns to localize different discriminative parts by erasing the most discriminative one from the image. Without the complicated loss functions, the proposed attention module can localize the discriminative parts more efficiently. Different from many part-based methods, the classification network consists of three subnetworks, which are trained on the original image and the two discriminative parts, respectively. Moreover, the features learned by the three subnetworks are fused in a more efficient way to build better feature representations. Four widely used datasets, CUB-200-2011, Stanford Dogs, Stanford Cars and FGVC-Aircraft, are used to evaluate the proposed method, and experimental results show that it can outperform some state-of-the-art methods without using manual annotations.
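The erasing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the square box shape, the `half` box size and the toy attention values are all assumptions made for illustration. The sketch also simplifies one step: it erases the region on a fixed attention map, whereas the paper erases the part from the image, so the attention would be recomputed by the network.

```python
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def peak(attention: Grid) -> Tuple[int, int]:
    """Return (row, col) of the strongest attention response."""
    best = (0, 0)
    for r, row in enumerate(attention):
        for c, value in enumerate(row):
            if value > attention[best[0]][best[1]]:
                best = (r, c)
    return best


def box_around(center: Tuple[int, int], half: int, height: int, width: int) -> Box:
    """Square box of half-size `half` around `center`, clipped to the grid."""
    r, c = center
    return (max(0, r - half), max(0, c - half),
            min(height, r + half + 1), min(width, c + half + 1))


def erase(attention: Grid, box: Box) -> Grid:
    """Return a copy of the map with the boxed region set to zero."""
    top, left, bottom, right = box
    return [[0.0 if top <= r < bottom and left <= c < right else v
             for c, v in enumerate(row)]
            for r, row in enumerate(attention)]


def localize_parts(attention: Grid, num_parts: int = 2, half: int = 1) -> List[Box]:
    """Find `num_parts` regions by repeatedly erasing the strongest one."""
    height, width = len(attention), len(attention[0])
    boxes: List[Box] = []
    current = attention
    for _ in range(num_parts):
        box = box_around(peak(current), half, height, width)
        boxes.append(box)
        current = erase(current, box)  # forces the next peak to lie elsewhere
    return boxes


# Toy 5x5 attention map (hypothetical values) with two separate hot spots.
attention = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.3, 0.0, 0.1],
    [0.1, 0.3, 0.2, 0.1, 0.4],
    [0.0, 0.0, 0.1, 0.5, 0.7],
    [0.0, 0.1, 0.2, 0.6, 0.3],
]
boxes = localize_parts(attention)
print(boxes)
```

In the setup the abstract describes, the two localized parts would be cropped from the image and passed, together with the original image, to the three subnetworks. How their features are fused is not specified in the abstract, so it is left out of the sketch.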



Acknowledgements

The work was partly supported by the National Science Foundation of China (NSFC), under contract No. 61673274.

Author information


Corresponding author

Correspondence to Huilin Xiong.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ji, J., Jiang, L., Zhang, T. et al. Adversarial erasing attention for fine-grained image classification. Multimed Tools Appl 80, 22867–22889 (2021). https://doi.org/10.1007/s11042-020-08666-3


  • DOI: https://doi.org/10.1007/s11042-020-08666-3
