PFNet: a novel part fusion network for fine-grained visual categorization

  • Jingyun Liang
  • Jinlin Guo (corresponding author)
  • Yanming Guo
  • Songyang Lao

Abstract

Existing methods for fine-grained visual categorization focus on integrating multiple deep CNN models or complicated attention mechanisms, resulting in increasingly cumbersome networks. In addition, most methods rely on part annotations, which require expensive expert guidance. In this paper, we propose a novel part fusion network (PFNet) that effectively fuses discriminative image parts for classification without extra annotation. More specifically, PFNet consists of a part feature extractor, which extracts part features, and a two-level classification network, which uses part-level and image-level features simultaneously. Part-level features are trained with a weighted part loss, which embeds a weighting mechanism based on the characteristics of different parts. We define easy parts, hard parts and background parts and treat them differently during classification. Moreover, part-level features are fused to form an image-level feature so as to introduce global supervision and generate final predictions. Experiments on three popular benchmark datasets show that our framework achieves competitive performance compared with the state of the art. Code is available at https://github.com/MichaelLiang12/PFNet-FGVC.
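
The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). The abstract does not say how part weights are computed or how parts are fused, so the sketch assumes the weights are supplied by the caller, that fusion is an element-wise mean, and that the two losses are simply summed. All function names and numeric values below are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one prediction against an integer class label."""
    return -math.log(softmax(logits)[label])


def weighted_part_loss(part_logits, label, part_weights):
    """Weighted mean of per-part cross-entropy losses.

    ASSUMPTION: weights are given by the caller. How PFNet derives them
    for easy, hard and background parts is not stated in the abstract.
    """
    losses = [cross_entropy(logits, label) for logits in part_logits]
    weighted = sum(w * l for w, l in zip(part_weights, losses))
    return weighted / sum(part_weights)


def fuse_parts(part_features):
    """Fuse part-level features into one image-level feature.

    ASSUMPTION: element-wise mean; the actual fusion may differ.
    """
    n = len(part_features)
    return [sum(column) / n for column in zip(*part_features)]


def total_loss(part_logits, image_logits, label, part_weights):
    """ASSUMPTION: part-level and image-level losses are simply summed."""
    part_term = weighted_part_loss(part_logits, label, part_weights)
    image_term = cross_entropy(image_logits, label)
    return part_term + image_term


# Hypothetical example: three parts, two classes, true class 0.
part_logits = [[2.0, 0.5], [0.2, 0.1], [0.0, 0.0]]
part_weights = [1.0, 2.0, 0.0]  # e.g. easy, hard, background (made-up values)
image_logits = fuse_parts(part_logits)  # stand-in for an image-level classifier
print(total_loss(part_logits, image_logits, 0, part_weights))
```

In this sketch a zero weight removes a part from the part-level loss entirely, which is one simple way to keep background parts from influencing training. Whether PFNet handles background parts this way is not specified in the abstract.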

Keywords

Fine-grained visual categorization, Image classification, Convolutional neural network

Notes

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 61571453).

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Jingyun Liang (1)
  • Jinlin Guo (1), corresponding author
  • Yanming Guo (1)
  • Songyang Lao (1)
  1. College of System Engineering, National University of Defense Technology, Changsha, China
