Multimedia Tools and Applications, Volume 77, Issue 23, pp 30291–30310

Exploring part-aware segmentation for fine-grained visual categorization

  • Cheng Pang
  • Hongxun Yao
  • Xiaoshuai Sun
  • Sicheng Zhao
  • Yanhao Zhang
Article

Abstract

Segmenting fine-grained objects is challenging because of appearance variations and cluttered backgrounds. Most existing segmentation methods struggle to separate small parts of an instance from its background with sufficient accuracy. However, such small parts often carry important semantic information that is crucial for fine-grained categorization. Observing that fine-grained objects share almost the same configuration of parts, we present a novel part-aware segmentation method that explicitly detects semantic parts and preserves them during segmentation. We first design a hybrid part localization method that generates accurate part proposals at moderate computational cost. We then iteratively update the segmentation outputs and the part proposals, which yields better foreground segmentation results. Experiments demonstrate the superiority of the proposed method over state-of-the-art segmentation approaches for fine-grained categorization.
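The alternating update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_segment` is a hypothetical intensity-based stand-in for a real segmenter such as GrabCut, the box-tightening rule is an assumed placeholder for the part-proposal update, and all function names and parameters are invented for illustration.

```python
import numpy as np

def toy_segment(image, fg_seed):
    # Toy stand-in for a real segmenter such as GrabCut (assumption):
    # a pixel is foreground if its intensity is nearer the mean of the
    # foreground seeds than the mean of the remaining (background) pixels.
    fg_mean = image[fg_seed].mean()
    bg_mean = image[~fg_seed].mean()
    return np.abs(image - fg_mean) < np.abs(image - bg_mean)

def part_aware_segment(image, init_mask, part_boxes, n_iters=3):
    # part_boxes: list of (row0, col0, row1, col1), end-exclusive (hypothetical format).
    mask = init_mask.copy()
    boxes = list(part_boxes)
    for _ in range(n_iters):
        # 1. Seed the segmenter with the current mask plus all part proposals.
        fg_seed = mask.copy()
        for r0, c0, r1, c1 in boxes:
            fg_seed[r0:r1, c0:c1] = True
        seg = toy_segment(image, fg_seed)

        # 2. Update part proposals (assumed rule): tighten each box to the
        #    foreground found inside it; keep it unchanged if none was found.
        new_boxes = []
        for r0, c0, r1, c1 in boxes:
            rows, cols = np.nonzero(seg[r0:r1, c0:c1])
            if rows.size:
                new_boxes.append((r0 + rows.min(), c0 + cols.min(),
                                  r0 + rows.max() + 1, c0 + cols.max() + 1))
            else:
                new_boxes.append((r0, c0, r1, c1))
        boxes = new_boxes

        # 3. Preserve the parts: pixels inside proposals stay foreground.
        mask = seg
        for r0, c0, r1, c1 in boxes:
            mask[r0:r1, c0:c1] = True
    return mask, boxes

# Synthetic example: a bright body with a small, dim part attached to it.
image = np.zeros((8, 8))
image[2:6, 2:5] = 1.0          # body
image[3:5, 5:7] = 0.3          # small part with weak contrast
init_mask = image > 0.5        # rough initial foreground (body only)
part_boxes = [(3, 5, 5, 7)]    # one part proposal covering the dim part

mask, boxes = part_aware_segment(image, init_mask, part_boxes)
plain = toy_segment(image, init_mask)  # same segmenter, no part constraint
```

In this synthetic case the unconstrained segmenter drops the dim part, while the part-aware loop keeps it in the foreground, which is the behaviour the abstract motivates for small, semantically important parts.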

Keywords

Image segmentation, Fine-grained visual categorization, GrabCut

Notes

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Project No. 61472103, No. 61772158 and No. 61702136.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
