International Journal of Computer Vision, Volume 105, Issue 3, pp 205–221

Coloring Action Recognition in Still Images

  • Fahad Shahbaz Khan
  • Rao Muhammad Anwer
  • Joost van de Weijer
  • Andrew D. Bagdanov
  • Antonio M. Lopez
  • Michael Felsberg

Abstract

In this article we investigate the problem of human action recognition in static images. By action recognition we mean a class of problems that includes both action classification and action detection (i.e. simultaneous localization and classification). Bag-of-words image representations yield promising results for action classification, and deformable part models perform very well for object detection. The representations used for action recognition typically rely only on shape cues and ignore color information. Inspired by the recent success of color in image classification and object detection, we investigate the potential of color for action classification and detection in static images. We perform a comprehensive evaluation of color descriptors and fusion approaches for action recognition. Experiments were conducted on the three datasets most used for benchmarking action recognition in still images: Willow, PASCAL VOC 2010 and Stanford-40. Our experiments demonstrate that incorporating color information considerably improves recognition performance, and that a descriptor based on color names outperforms pure color descriptors. They also demonstrate that late fusion of color and shape information outperforms other approaches on action recognition. Finally, we show that the different color–shape fusion approaches provide complementary information, and that combining them yields state-of-the-art performance for action classification.
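As a rough illustration of what late fusion of color and shape can look like in a bag-of-words setting, the sketch below builds one histogram per cue from its own vocabulary and concatenates the two. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the nearest-word assignment, the L1 normalisation and the `color_weight` parameter are all hypothetical choices made for this example.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(vocabulary):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors, vocabulary):
    """L1-normalised bag-of-words histogram of local descriptors."""
    counts = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def late_fusion(shape_descs, color_descs, shape_vocab, color_vocab, color_weight=0.5):
    """Quantise each cue with its own vocabulary, then concatenate the histograms.

    color_weight is a hypothetical knob for this sketch; it balances the two cues.
    """
    shape_hist = bow_histogram(shape_descs, shape_vocab)
    color_hist = bow_histogram(color_descs, color_vocab)
    return ([(1.0 - color_weight) * v for v in shape_hist]
            + [color_weight * v for v in color_hist])


# Toy data: 2-D "shape" descriptors and 1-D "color" descriptors (made up).
shape_vocab = [[0.0, 0.0], [1.0, 1.0]]
color_vocab = [[0.0], [1.0], [2.0]]
shape_descs = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]]
color_descs = [[0.1], [1.9], [2.1], [1.8]]

fused = late_fusion(shape_descs, color_descs, shape_vocab, color_vocab)
print(fused)
```

The fused vector has one block of bins per cue, so a classifier trained on it can weight shape words and color words independently. An early-fusion variant would instead combine color and shape at the local descriptor level before building a single vocabulary.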

Keywords

Color features, Image representation, Action recognition

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Fahad Shahbaz Khan (1)
  • Rao Muhammad Anwer (2)
  • Joost van de Weijer (2)
  • Andrew D. Bagdanov (3)
  • Antonio M. Lopez (2)
  • Michael Felsberg (1)

  1. Computer Vision Laboratory, Linköping University, Linköping, Sweden
  2. Computer Vision Centre Barcelona, Universitat Autonoma de Barcelona, Barcelona, Spain
  3. Media Integration and Communication Center, University of Florence, Florence, Italy
