
Actions in Still Web Images: Visualization, Detection and Retrieval

  • Piji Li
  • Jun Ma
  • Shuai Gao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6897)

Abstract

We describe a framework for retrieving human actions in still web images by verb queries, for instance “phoning”. First, we build a group of visually discriminative instances for each action class, called “Exemplarlets”. We then employ Multiple Kernel Learning (MKL) to learn an optimal combination of histogram intersection kernels, each of which captures a state-of-the-art feature channel. Our features include the distribution of edges, dense visual words and feature descriptors at different levels of a spatial pyramid. For a new image, we detect the hot-region using a sliding-window detector learnt via MKL. The hot-region can imply latent actions in the image. After the hot-region has been detected, we build an inverted index in the visual search path, which we call the Visual Inverted Index (VII). Finally, by fusing the visual search path and the text search path, we obtain accurate results relevant to either the text or the visual information. We show both detection and retrieval results on our newly collected dataset of six actions and demonstrate improved performance over existing methods.
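As a rough illustration of the kernel combination mentioned in the abstract, the sketch below computes a weighted sum of histogram intersection kernels, one per feature channel. It is only a minimal sketch under stated assumptions: the channel histograms, the weights, and the function names are hypothetical, and the weights are fixed by hand here, whereas in the paper they would be learnt by MKL.

```python
def histogram_intersection(x, y):
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))


def combined_kernel(channels_a, channels_b, weights):
    """Weighted sum of per-channel histogram intersection kernels.

    channels_a and channels_b each hold one histogram per feature channel;
    weights holds one non-negative weight per channel (hand-picked here,
    which is an assumption; MKL would learn them from training data).
    """
    return sum(
        w * histogram_intersection(a, b)
        for w, a, b in zip(weights, channels_a, channels_b)
    )


# Hypothetical normalized histograms for two images and two channels.
image_a = [[0.2, 0.5, 0.3], [0.6, 0.4]]
image_b = [[0.1, 0.6, 0.3], [0.5, 0.5]]
weights = [0.7, 0.3]  # illustrative values, not taken from the paper

score = combined_kernel(image_a, image_b, weights)
print(round(score, 6))
```

With normalized histograms and weights that sum to one, the score lies between 0 and 1, and larger values mean the two images overlap more across the feature channels.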

Keywords

Web image retrieval, action detection, multiple kernel learning, visual inverted index, exemplarlet



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Piji Li (1)
  • Jun Ma (1)
  • Shuai Gao (1)
  1. School of Computer Science & Technology, Shandong University, Jinan, China
