Modulating Shape Features by Color Attention for Object Recognition

International Journal of Computer Vision

Abstract

Bag-of-words based image representation is a successful approach for object recognition. Generally, the successive stages of the process (feature detection, feature description, vocabulary construction and image representation) are performed independently of the object classes to be detected. In such a framework, combining different image cues, such as shape and color, has often been found to yield below-expected results.

This paper presents a novel method for recognizing object categories from multiple cues: the shape and color cues are processed separately and then combined by modulating the shape features with category-specific color attention. Color is used to compute bottom-up and top-down attention maps. Subsequently, these color attention maps are used to modulate the weights of the shape features: in regions with higher attention, shape features are given more weight than in regions with low attention.
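
The weighting idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each local feature has already been quantized to one shape word and one color word, and it takes the attention of a color word for a class from a hypothetical lookup table `p_class_given_color`. Only a top-down, class-specific weight is shown; the bottom-up map is omitted. All function and parameter names are illustrative.

```python
def color_attention_histograms(shape_words, color_words,
                               p_class_given_color,
                               n_shape_words, n_classes):
    """Build one attention-weighted shape histogram per class.

    shape_words[j], color_words[j]: visual-word indices of local feature j.
    p_class_given_color[c][k]: assumed attention weight of color word c
    for class k (a hypothetical table, e.g. estimated from training data).
    """
    hists = [[0.0] * n_shape_words for _ in range(n_classes)]
    for ws, wc in zip(shape_words, color_words):
        for k in range(n_classes):
            # A standard bag-of-words histogram would add 1 here;
            # instead each shape word votes with its color attention.
            hists[k][ws] += p_class_given_color[wc][k]
    return hists


# Toy example: 4 features, 3 shape words, 2 color words, 2 classes.
shape_words = [0, 1, 1, 2]
color_words = [0, 0, 1, 1]
p_class_given_color = [[0.9, 0.1],   # color word 0 favors class 0
                       [0.2, 0.8]]   # color word 1 favors class 1
hists = color_attention_histograms(shape_words, color_words,
                                   p_class_given_color, 3, 2)
```

In this toy setting, the histogram for class 0 puts most of its mass on the shape words that co-occur with color word 0, while the histogram for class 1 emphasizes those that co-occur with color word 1. The shape vocabulary itself is unchanged; only the weights differ per class.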

We compare our approach with existing methods that combine color and shape cues on five data sets in which the relative importance of the two cues varies, namely Soccer (color predominance), Flower (color and shape parity), PASCAL VOC 2007 and 2009 (shape predominance) and Caltech-101 (color co-interference). The experiments demonstrate that on all five data sets the proposed framework significantly outperforms existing methods for combining color and shape information.

Author information

Corresponding author

Correspondence to Fahad Shahbaz Khan.

About this article

Cite this article

Khan, F.S., van de Weijer, J. & Vanrell, M. Modulating Shape Features by Color Attention for Object Recognition. Int J Comput Vis 98, 49–64 (2012). https://doi.org/10.1007/s11263-011-0495-2

  • DOI: https://doi.org/10.1007/s11263-011-0495-2