Learning to Navigate for Fine-Grained Classification

  • Ze Yang
  • Tiange Luo
  • Dong Wang
  • Zhiqiang Hu
  • Jun Gao
  • Liwei WangEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11218)


Fine-grained classification is challenging due to the difficulty of finding discriminative features. Finding those subtle traits that fully characterize the object is not straightforward. To handle this circumstance, we propose a novel self-supervision mechanism to effectively localize informative regions without the need of bounding-box/part annotations. Our model, termed NTS-Net for Navigator-Teacher-Scrutinizer Network, consists of a Navigator agent, a Teacher agent and a Scrutinizer agent. In consideration of intrinsic consistency between informativeness of the regions and their probability being ground-truth class, we design a novel training paradigm, which enables Navigator to detect most informative regions under the guidance from Teacher. After that, the Scrutinizer scrutinizes the proposed regions from Navigator and makes predictions. Our model can be viewed as a multi-agent cooperation, wherein agents benefit from each other, and make progress together. NTS-Net can be trained end-to-end, while provides accurate fine-grained classification predictions as well as highly informative regions during inference. We achieve state-of-the-art performance in extensive benchmark datasets.



This work is supported by National Basic Research Program of China (973 Program) (grant no. 2015CB352502), NSFC (61573026) and BJNSF (L172037).


  1. 1.
    Arbelaez, P., Ponttuset, J., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: CVPR, pp. 328–335 (2014)Google Scholar
  2. 2.
    Berg, T., Belhumeur, P.N.: POOF: part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation. In: CVPR (2013)Google Scholar
  3. 3.
    Branson, S., Horn, G.V., Belongie, S., Perona, P.: Bird species categorization using pose normalized deep convolutional nets. In: BMVC (2014)Google Scholar
  4. 4.
    Burges, C., et al.: Learning to rank using gradient descent. In: ICML, pp. 89–96 (2005)Google Scholar
  5. 5.
    Cai, S., Zuo, W., Zhang, L.: Higher-order integration of hierarchical convolutional activations for fine-grained visual categorization. In: ICCV, October 2017Google Scholar
  6. 6.
    Cao, Z., Qin, T., Liu, T.Y., Tsai, M.F., Li, H.: Learning to rank: from pairwise approach to listwise approach. In: ICML, pp. 129–136 (2007)Google Scholar
  7. 7.
    Carreira, J., Sminchisescu, C.: CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts. IEEE Computer Society (2012)Google Scholar
  8. 8.
    Chai, Y., Lempitsky, V., Zisserman, A.: Symbiotic segmentation and part localization for fine-grained categorization. In: ICCV, pp. 321–328 (2013)Google Scholar
  9. 9.
    Cossock, D., Zhang, T.: Statistical analysis of bayes optimal subset ranking. IEEE Trans. Inf. Theory 54(11), 5140–5154 (2008)MathSciNetCrossRefGoogle Scholar
  10. 10.
    Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, pp. 886–893 (2005)Google Scholar
  11. 11.
    Endres, I., Hoiem, D.: Category independent object proposals. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6315, pp. 575–588. Springer, Heidelberg (2010). Scholar
  12. 12.
    Fu, J., Zheng, H., Mei, T.: Look closer to see better: recurrent attention convolutional neural network for fine-grained image recognition. In: CVPRGoogle Scholar
  13. 13.
    Gavves, E., Fernando, B., Snoek, C.G.M., Smeulders, A.W.M., Tuytelaars, T.: Fine-grained categorization by alignments. In: ICCV, pp. 1713–1720 (2014)Google Scholar
  14. 14.
    Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR, pp. 580–587 (2014)Google Scholar
  15. 15.
    Gosselin, P.H., Murray, N., Jgou, H., Perronnin, F.: Revisiting the fisher vector for fine-grained classification. Patt. Recogn. Lett. 49, 92–98 (2014)CrossRefGoogle Scholar
  16. 16.
    He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. TPAMI 37(9), 1904–1916 (2015)CrossRefGoogle Scholar
  17. 17.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)Google Scholar
  18. 18.
    Herbrich, R.: Large margin rank boundaries for ordinal regression. In: Advances in Large Margin Classifiers, vol. 88 (2000)Google Scholar
  19. 19.
    Jaderberg, M., Simonyan, K., Zisserman, A., kavukcuoglu, k.: Spatial transformer networks. In: NIPS, pp. 2017–2025 (2015)Google Scholar
  20. 20.
    Jie, Z., Liang, X., Feng, J., Jin, X., Lu, W., Yan, S.: Tree-structured reinforcement learning for sequential object localization. In: NIPS, pp. 127–135 (2016)Google Scholar
  21. 21.
    Konda, V.R.: Actor-critic algorithms. SIAM J. Control Optim. 42(4), 1143–1166 (2002)MathSciNetCrossRefGoogle Scholar
  22. 22.
    Krause, J., Jin, H., Yang, J., Fei-Fei, L.: Fine-grained recognition without part annotations. In: CVPR, June 2015Google Scholar
  23. 23.
    Krause, J., Stark, M., Jia, D., Li, F.F.: 3D object representations for fine-grained categorization. In: ICCV Workshops, pp. 554–561 (2013)Google Scholar
  24. 24.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)Google Scholar
  25. 25.
    Lam, M., Mahasseni, B., Todorovic, S.: Fine-grained recognition as HSnet search for informative image parts. In: CVPR, July 2017Google Scholar
  26. 26.
    Li, Z., Yang, Y., Liu, X., Zhou, F., Wen, S., Xu, W.: Dynamic computational time for visual attention. In: ICCV, October 2017Google Scholar
  27. 27.
    Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR, July 2017Google Scholar
  28. 28.
    Lin, T.Y., RoyChowdhury, A., Maji, S.: Bilinear CNN models for fine-grained visual recognition. In: ICCV (2015)Google Scholar
  29. 29.
    Liu, J., Kanazawa, A., Jacobs, D., Belhumeur, P.: Dog breed classification using part localization. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7572, pp. 172–185. Springer, Heidelberg (2012). Scholar
  30. 30.
    Liu, T.Y.: Learning to rank for information retrieval. Found. Trends Inf. Retr. 3(3), 225–331 (2009)CrossRefGoogle Scholar
  31. 31.
    Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). Scholar
  32. 32.
    Liu, X., Xia, T., Wang, J., Lin, Y.: Fully convolutional attention localization networks: efficient attention localization for fine-grained recognition. CoRR (2016)Google Scholar
  33. 33.
    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR, November 2015Google Scholar
  34. 34.
    Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV (2004)Google Scholar
  35. 35.
    Maji, S., Kannala, J., Rahtu, E., Blaschko, M., Vedaldi, A.: Fine-grained visual classification of aircraft. Technical report (2013)Google Scholar
  36. 36.
    Moghimi, M., Belongie, S., Saberian, M., Yang, J., Vasconcelos, N., Li, L.J.: Boosted convolutional neural networks. In: BMVC, pp. 24.1–24.13 (2016)Google Scholar
  37. 37.
    Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR, pp. 779–788 (2016)Google Scholar
  38. 38.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)Google Scholar
  39. 39.
    Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. IJCV 115(3), 211–252 (2015)MathSciNetCrossRefGoogle Scholar
  40. 40.
    Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., Lecun, Y.: OverFeat: integrated recognition, localization and detection using convolutional networks. Arxiv (2013)Google Scholar
  41. 41.
    Uijlings, J.R., Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. IJCV 104(2), 154–171 (2013)CrossRefGoogle Scholar
  42. 42.
    Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 Dataset. Technical report (2011)Google Scholar
  43. 43.
    Wang, D., Shen, Z., Shao, J., Zhang, W., Xue, X., Zhang, Z.: Multiple granularity descriptors for fine-grained categorization. In: ICCV, pp. 2399–2406 (2015)Google Scholar
  44. 44.
    Xia, F., Liu, T.Y., Wang, J., Li, H., Li, H.: Listwise approach to learning to rank: theory and algorithm. In: ICML, pp. 1192–1199 (2008)Google Scholar
  45. 45.
    Xie, L., Tian, Q., Hong, R., Yan, S.: Hierarchical part matching for fine-grained visual categorization. In: ICCV, pp. 1641–1648 (2013)Google Scholar
  46. 46.
    Zhang, N., Donahue, J., Girshick, R., Darrell, T.: Part-based R-CNNs for fine-grained category detection. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 834–849. Springer, Cham (2014). Scholar
  47. 47.
    Zhang, X., Xiong, H., Zhou, W., Lin, W., Tian, Q.: Picking deep filter responses for fine-grained image recognition. In: CVPR, June 2016Google Scholar
  48. 48.
    Zhao, B., Wu, X., Feng, J., Peng, Q., Yan, S.: Diversified visual attention networks for fine-grained object classification. Trans. Multi. 19(6), 1245–1256 (2017)CrossRefGoogle Scholar
  49. 49.
    Zheng, H., Fu, J., Mei, T., Luo, J.: Learning multi-attention convolutional neural network for fine-grained image recognition. In: ICCV, October 2017Google Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. 1.Key Laboratory of Machine Perception, MOE, School of EECSPeking UniversityBeijingChina
  2. 2.Center for Data Science, Beijing Institute of Big Data ResearchPeking UniversityBeijingChina

Personalised recommendations