
Modulating Shape Features by Color Attention for Object Recognition

  • Fahad Shahbaz Khan
  • Joost van de Weijer
  • Maria Vanrell

Abstract

Bag-of-words based image representation is a successful approach for object recognition. Generally, the successive stages of the pipeline, namely feature detection, feature description, vocabulary construction, and image representation, are performed independently of the object classes to be detected. In such a framework, it has been found that combining different image cues, such as shape and color, often yields results below expectations.
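
As a rough illustration, the following is a minimal sketch of how such a class-independent bag-of-words histogram is typically built. It is not the authors' implementation; the descriptor dimensionality, vocabulary size, and helper names are illustrative assumptions.

    import numpy as np

    def bow_histogram(descriptors, vocabulary):
        """Assign each local descriptor to its nearest visual word and return
        an L1-normalized word-frequency histogram (class-independent)."""
        # Squared Euclidean distances between descriptors (N, D) and words (K, D).
        d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=-1)
        words = d2.argmin(axis=1)  # nearest visual word per feature
        hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
        return hist / max(hist.sum(), 1e-12)

    # Toy example: 500 hypothetical 128-D descriptors (e.g. SIFT) and a 100-word codebook.
    rng = np.random.default_rng(0)
    descriptors = rng.normal(size=(500, 128))
    vocabulary = rng.normal(size=(100, 128))
    print(bow_histogram(descriptors, vocabulary).shape)  # -> (100,)

Note that nothing in this construction depends on the target categories, which is the class independence referred to above.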

This paper presents a novel method for recognizing object categories from multiple cues: the shape and color cues are processed separately and then combined by modulating the shape features with category-specific color attention. Color is used to compute bottom-up and top-down attention maps. These color attention maps are subsequently used to modulate the weights of the shape features: in regions with high attention, shape features receive more weight than in regions with low attention.
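
A minimal sketch of the modulation idea follows, assuming features have already been quantized into shape and color words and that a top-down attention table (the weight each category assigns to each color word) has been learned from training data; all names and sizes are hypothetical, not the paper's actual implementation.

    import numpy as np

    def class_specific_histograms(shape_words, color_words, color_attention, n_shape_words):
        """Weight each shape feature by per-class color attention and accumulate
        one class-specific shape histogram per object category."""
        n_classes = color_attention.shape[0]
        hists = np.zeros((n_classes, n_shape_words))
        for s, c in zip(shape_words, color_words):
            # The color attention at this feature's location modulates how strongly
            # its shape word contributes: high attention gives a larger weight.
            hists[:, s] += color_attention[:, c]
        # L1-normalize each class-specific histogram.
        hists /= np.maximum(hists.sum(axis=1, keepdims=True), 1e-12)
        return hists

    # Toy example: 500 features quantized into 100 shape words and 25 color words,
    # with a hypothetical attention table for 10 categories.
    rng = np.random.default_rng(0)
    shape_words = rng.integers(0, 100, size=500)
    color_words = rng.integers(0, 25, size=500)
    color_attention = rng.random((10, 25))  # attention of each class for each color word
    H = class_specific_histograms(shape_words, color_words, color_attention, 100)
    print(H.shape)  # -> (10, 100)

Each category is then classified against its own attention-weighted histogram, so color determines where shape information is emphasized rather than simply being concatenated with it.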

We compare our approach with existing methods that combine color and shape cues on five data sets in which the relative importance of the two cues varies: Soccer (color predominance), Flower (color and shape parity), PASCAL VOC 2007 and 2009 (shape predominance), and Caltech-101 (color interference). The experiments clearly demonstrate that, on all five data sets, our proposed framework significantly outperforms existing methods for combining color and shape information.

Keywords

Color features · Image representation · Object recognition

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Fahad Shahbaz Khan (1)
  • Joost van de Weijer (1)
  • Maria Vanrell (1)

  1. Computer Vision Centre Barcelona, Universitat Autonoma de Barcelona, Barcelona, Spain
