Which Looks Like Which: Exploring Inter-class Relationships in Fine-Grained Visual Categorization

  • Jian Pu
  • Yu-Gang Jiang
  • Jun Wang
  • Xiangyang Xue
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8691)


Fine-grained visual categorization aims to classify visual data at a subordinate level, e.g., identifying different species of birds. It is a highly challenging topic that has recently received significant research attention. Most existing work has focused on designing more discriminative feature representations to capture the subtle visual differences among categories; very little effort has been spent on designing robust model learning algorithms. In this paper, we treat the training of each category classifier as a single learning task and formulate a generic multiple task learning (MTL) framework to train multiple classifiers simultaneously. Unlike existing MTL methods, the proposed generic MTL algorithm imposes no structural assumptions and is thus more flexible in handling complex inter-class relationships. In particular, it can automatically discover both clusters of similar categories and outliers. We show that the objective of our generic MTL formulation can be solved using an iterative reweighted ℓ2 method. Through extensive experimental validation, we demonstrate that our method outperforms several state-of-the-art approaches.
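
The abstract does not state the generic MTL objective, so the following is only a minimal sketch of how an iterative reweighted ℓ2 scheme works in general, not the formulation used in the paper. It assumes a stand-in problem: a squared loss over all tasks plus an ℓ2,1 penalty on the rows of the weight matrix W (one column per category classifier). The function names, the ridge initialization, and the parameters `lam`, `n_iter`, and `eps` are all hypothetical choices made for illustration.

```python
import numpy as np


def smoothed_objective(X, Y, W, lam, eps):
    """Squared loss plus an eps-smoothed l2,1 penalty on the rows of W."""
    residual = X @ W - Y
    row_norms = np.sqrt(np.sum(W ** 2, axis=1) + eps)
    return float(np.sum(residual ** 2) + lam * np.sum(row_norms))


def reweighted_l2(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """Illustrative iterative reweighted l2 solver (assumed stand-in objective).

    Minimizes ||XW - Y||_F^2 + lam * sum_i sqrt(||w_i||^2 + eps), where w_i is
    the i-th row of W. Each iteration replaces the non-smooth penalty by a
    weighted squared l2 penalty and solves the resulting linear system.
    """
    d = X.shape[1]
    XtX = X.T @ X
    XtY = X.T @ Y
    # Assumption: start from a plain ridge solution.
    W = np.linalg.solve(XtX + lam * np.eye(d), XtY)
    history = [smoothed_objective(X, Y, W, lam, eps)]
    for _ in range(n_iter):
        # Reweighting step: rows with small norm receive a large l2 weight.
        row_norms = np.sqrt(np.sum(W ** 2, axis=1) + eps)
        D = np.diag(1.0 / (2.0 * row_norms))
        # Weighted ridge step: closed-form minimizer for the current weights.
        W = np.linalg.solve(XtX + lam * D, XtY)
        history.append(smoothed_objective(X, Y, W, lam, eps))
    return W, history
```

In this sketch each pass alternates between recomputing the per-row weights and solving a weighted ridge problem, and the smoothed objective recorded in `history` does not increase from one iteration to the next. Rows of W shared by no task are driven toward zero, which is the kind of structure a reweighted ℓ2 scheme can expose; the paper's own regularizer and update rules may differ.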


Keywords: fine-grained visual categorization, inter-class relationship, multiple task learning



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jian Pu (1)
  • Yu-Gang Jiang (1)
  • Jun Wang (2)
  • Xiangyang Xue (1)
  1. School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai, China
  2. IBM T. J. Watson Research Center, Yorktown Heights, USA
