Modulating Shape Features by Color Attention for Object Recognition

Khan, Fahad Shahbaz; van de Weijer, Joost; Vanrell, Maria

doi:10.1007/s11263-011-0495-2

Published: 29 September 2011

International Journal of Computer Vision, volume 98, pages 49–64 (2012)

Abstract

Bag-of-words based image representation is a successful approach to object recognition. Generally, the successive stages of the process (feature detection, feature description, vocabulary construction and image representation) are performed independently of the object classes to be detected. In such a framework, combining different image cues, such as shape and color, has often been found to yield results below expectations.

This paper presents a novel method for recognizing object categories from multiple cues: the shape and color cues are processed separately and then combined by modulating the shape features with category-specific color attention. Color is used to compute bottom-up and top-down attention maps. These color attention maps are then used to modulate the weights of the shape features: in regions with higher attention, shape features are given more weight than in regions with low attention.
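
The weighting idea in the paragraph above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each local feature carries a shape visual-word index and a color visual-word index, and that top-down attention is available as a table of class probabilities per color word. The function name, argument names and the toy numbers are hypothetical, and bottom-up attention is left out.

```python
import numpy as np

def color_attention_histograms(shape_words, color_words,
                               p_class_given_color, n_shape_words):
    """Build one attention-weighted shape histogram per class.

    shape_words: (n_features,) int array, shape word index of each feature.
    color_words: (n_features,) int array, color word index of each feature.
    p_class_given_color: (n_classes, n_color_words) array; an assumed
        stand-in for the top-down color attention of each class.
    n_shape_words: size of the shape vocabulary.
    """
    shape_words = np.asarray(shape_words)
    color_words = np.asarray(color_words)
    p_class_given_color = np.asarray(p_class_given_color, dtype=float)

    n_classes = p_class_given_color.shape[0]
    hist = np.zeros((n_classes, n_shape_words))
    for c in range(n_classes):
        # Attention of class c at each feature, looked up via its color word.
        weights = p_class_given_color[c, color_words]
        # Each shape word is counted with its attention weight instead of 1.
        hist[c] = np.bincount(shape_words, weights=weights,
                              minlength=n_shape_words)
    return hist

# Toy example (hypothetical values): 4 features, 3 shape words,
# 2 color words, 2 classes.
hist = color_attention_histograms(
    shape_words=[0, 1, 1, 2],
    color_words=[0, 0, 1, 1],
    p_class_given_color=[[0.9, 0.1], [0.1, 0.9]],
    n_shape_words=3,
)
print(hist)
```

In this toy setup, features whose color is typical of a class contribute more to that class's shape histogram, so the same image yields a different shape representation for each category.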

We compare our approach with existing methods that combine color and shape cues on five data sets in which the relative importance of the two cues varies, namely Soccer (color predominance), Flower (color and shape parity), PASCAL VOC 2007 and 2009 (shape predominance) and Caltech-101 (color co-interference). The experiments demonstrate that on all five data sets the proposed framework significantly outperforms existing methods for combining color and shape information.

Author information

Authors and Affiliations

Computer Vision Centre Barcelona, Universitat Autònoma de Barcelona, Barcelona, Spain
Fahad Shahbaz Khan, Joost van de Weijer & Maria Vanrell

Corresponding author

Correspondence to Fahad Shahbaz Khan.

About this article

Cite this article

Khan, F.S., van de Weijer, J. & Vanrell, M. Modulating Shape Features by Color Attention for Object Recognition. Int J Comput Vis 98, 49–64 (2012). https://doi.org/10.1007/s11263-011-0495-2

Received: 25 October 2010
Accepted: 02 September 2011
Published: 29 September 2011
Issue Date: May 2012
DOI: https://doi.org/10.1007/s11263-011-0495-2
