Pattern Recognition and Image Analysis

Volume 28, Issue 4, pp. 658–663

In Defense of Active Part Selection for Fine-Grained Classification

  • D. Korsch
  • J. Denzler
Proceedings of the 6th International Workshop


Fine-grained classification is a recognition task in which subtle differences distinguish the classes. Part-based methods are the most common approach to this problem: they learn to detect parts of the observed object and extract local features for the detected part regions. In this paper we show that not all extracted part features are useful for classification. Furthermore, given a part selection algorithm that actively selects parts for classification, we estimate an upper bound for fine-grained recognition performance. This upper bound lies far above current state-of-the-art recognition rates, which demonstrates the need for such an active part selection method. Although we do not present an active part selection algorithm in this work, we propose a novel method that active part selection requires and that enables sequential part-based classification. The method uses a support vector machine (SVM) ensemble and allows an image to be classified from an arbitrary number of part features. Additionally, the training time of our method does not grow with the number of possible part features, which allows the SVM ensemble to be extended with an active part selection component that operates on a large number of part feature proposals without increasing training time.
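The core idea above, a bagged SVM ensemble whose members are trained on bootstrap samples of part features and whose predictions are averaged over however many part features a test image provides, can be sketched as follows. This is a minimal illustration with a hypothetical toy dataset (the `make_parts` generator, feature dimension, and ensemble size are assumptions, not the paper's actual setup), not the authors' implementation:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy data: each "image" yields several D-dimensional part features.
# Assumed setup: class-0 parts cluster near -1, class-1 parts near +1.
D = 8
def make_parts(label, n_parts):
    center = -1.0 if label == 0 else 1.0
    return center + 0.5 * rng.standard_normal((n_parts, D))

# Training pool: part features paired with their image's label.
X_train = np.vstack([make_parts(y, 20) for y in (0, 1) for _ in range(10)])
y_train = np.repeat([0, 1], 200)

# Bagging: each ensemble member is fit on a bootstrap sample of part
# features, so training cost depends on the sample size, not on how
# many part feature proposals exist per image.
ensemble = []
for _ in range(5):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    ensemble.append(LinearSVC(C=1.0).fit(X_train[idx], y_train[idx]))

def classify_image(part_feats):
    """Classify from an arbitrary number of part features by averaging
    the ensemble's decision values over all supplied parts."""
    score = np.mean([m.decision_function(part_feats) for m in ensemble])
    return int(score > 0)

# The same classifier handles 3 parts or 15 parts per image alike,
# which is the property a sequential, active part selector needs.
print(classify_image(make_parts(1, 3)))
print(classify_image(make_parts(0, 15)))
```

Because a prediction is an average of per-part decision values, parts can be added one at a time, which is exactly the interface a sequential active part selection component would operate on.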


Keywords: fine-grained recognition, SVM ensemble, bagging





Copyright information

© Pleiades Publishing, Ltd. 2018

Authors and Affiliations

  1. Computer Vision Group, Friedrich Schiller University Jena, Jena, Germany
