Tri-level Combination for Image Representation

  • Ruiying Li
  • Chunjie Zhang
  • Qingming Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9916)


The context of objects can provide auxiliary discriminative information beyond the objects themselves. However, this information has not been fully explored. In this paper, we propose Tri-level Combination for Image Representation (TriCoIR), which addresses the problem at three levels: object intrinsic, strongly-related context and weakly-related context. The object intrinsic level excludes external disturbances and focuses on the objects themselves. The strongly-related context is cropped from the input image with a looser bound so that it contains the surrounding context. The weakly-related context is recovered from the image regions other than the object and provides global context. First, the strongly- and weakly-related contexts are constructed from the input images. Second, we apply cascaded transformations to obtain more intrinsic object information; these depend on the consistency between the generated global context and the input images in the regions other than the object. Finally, a joint representation is obtained from the features at these three levels. Experiments on two benchmark datasets demonstrate the effectiveness of TriCoIR.
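The three levels described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the object is given as a bounding box, replaces the "recovered" weakly-related context with a simple mean fill of the object region, and uses a toy color-statistics descriptor where the paper would use learned features. The cascaded transformations are not modeled at all. All function names and the `scale` parameter are hypothetical.

```python
import numpy as np

def expand_box(box, scale, height, width):
    """Scale an (x0, y0, x1, y1) box about its center and clip it to the image."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) * scale / 2.0
    half_h = (y1 - y0) * scale / 2.0
    return (max(0, int(round(cx - half_w))),
            max(0, int(round(cy - half_h))),
            min(width, int(round(cx + half_w))),
            min(height, int(round(cy + half_h))))

def tri_level_views(image, box, scale=1.5):
    """Return (intrinsic, strong, weak) views of an H x W x C image.

    Assumption: the box does not cover the whole image, so some
    background pixels remain for the mean fill.
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    # Object intrinsic: tight crop around the object.
    intrinsic = image[y0:y1, x0:x1]
    # Strongly-related context: looser crop that keeps the surroundings.
    lx0, ly0, lx1, ly1 = expand_box(box, scale, h, w)
    strong = image[ly0:ly1, lx0:lx1]
    # Weakly-related context: whole image with the object region replaced
    # by the mean background color (a stand-in for the paper's recovery step).
    background = np.ones((h, w), dtype=bool)
    background[y0:y1, x0:x1] = False
    weak = image.astype(np.float64).copy()
    weak[~background] = weak[background].mean(axis=0)
    return intrinsic, strong, weak

def toy_feature(view):
    """Placeholder descriptor: per-channel mean and standard deviation."""
    pixels = view.reshape(-1, view.shape[-1]).astype(np.float64)
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

def joint_representation(image, box, scale=1.5):
    """Concatenate the features of the three views into one vector."""
    views = tri_level_views(image, box, scale)
    return np.concatenate([toy_feature(v) for v in views])
```

For example, `joint_representation(image, (40, 30, 120, 110))` on an RGB image returns an 18-dimensional vector: six values for each of the three levels. Concatenation is only one way to form a joint representation; the abstract does not specify how the three levels are combined.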


Image representation; Object categorization; Intrinsic and Context



This work is supported by the National Basic Research Program of China (973 Program): 2012CB316400 and 2015CB351802, the National Natural Science Foundation of China: 61303154 and 61332016, and the Open Project of Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
  2. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China
  3. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
