
Integrating Randomization and Discrimination for Classifying Human-Object Interaction Activities

Human-Centered Social Media Analytics

Abstract

In this chapter, we study the problem of classifying human–object interaction activities in still images. The goal of our method is to explore fine image statistics and identify the discriminative image patches for recognition. We achieve this goal by combining two ideas: discriminative feature mining and randomization. Discriminative feature mining allows us to model the detailed information that distinguishes different classes of images, while randomization allows us to handle the huge feature space and prevent over-fitting. We propose a random forest with discriminative decision trees, in which every tree node is a discriminative classifier trained by combining the information in this node with that of all upstream nodes. Besides human action recognition in still images, we also evaluate our method on subordinate categorization. Experimental results show that our method identifies semantically meaningful visual information and outperforms state-of-the-art algorithms on various datasets. Using our method, we achieved the best results and won the award in the PASCAL VOC action classification challenges in 2011 and 2012.

An early version of this chapter was presented in Yao et al. [37], and the code is available at http://vision.stanford.edu/discrim_rf/
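
The idea of a tree whose nodes are discriminative classifiers can be illustrated with a small Python sketch. This is not the authors' implementation (their code is at the URL above), and several choices here are assumptions made only for illustration: each image is represented as a list of per-region descriptors, a perceptron stands in for the node classifier, labels are binary (+1 or -1), and "combining the information in this node with that of all upstream nodes" is interpreted as concatenating the descriptors of the regions chosen along the path from the root. All function names are hypothetical.

```python
import random

def score(w, x):
    """Linear score of feature vector x; the last entry of w is the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

def train_perceptron(X, y, epochs=20):
    """Stand-in node classifier (assumption); labels in y are +1 or -1."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * score(w, x) <= 0:  # misclassified: move weights toward t
                for i, xi in enumerate(x):
                    w[i] += t * xi
                w[-1] += t
    return w

def node_feature(sample, region, upstream):
    """Concatenate descriptors of the upstream regions and this node's region."""
    feat = []
    for r in list(upstream) + [region]:
        feat.extend(sample[r])
    return feat

def majority(labels):
    return 1 if sum(labels) >= 0 else -1

def grow(samples, labels, rng, upstream=(), depth=0, max_depth=3, n_candidates=3):
    """Grow one tree; each internal node holds a classifier on a sampled region."""
    if depth == max_depth or len(set(labels)) == 1:
        return {"leaf": majority(labels)}
    n_regions = len(samples[0])
    best = None
    # Randomization: only a random subset of regions is considered at this node.
    for region in rng.sample(range(n_regions), min(n_candidates, n_regions)):
        X = [node_feature(s, region, upstream) for s in samples]
        w = train_perceptron(X, labels)
        errors = sum(1 for x, t in zip(X, labels) if t * score(w, x) <= 0)
        if best is None or errors < best[0]:
            best = (errors, region, w)
    _, region, w = best
    go_right = [score(w, node_feature(s, region, upstream)) > 0 for s in samples]
    if all(go_right) or not any(go_right):  # degenerate split: stop here
        return {"leaf": majority(labels)}
    path = tuple(upstream) + (region,)
    left = [(s, t) for s, t, g in zip(samples, labels, go_right) if not g]
    right = [(s, t) for s, t, g in zip(samples, labels, go_right) if g]
    return {
        "region": region, "upstream": tuple(upstream), "w": w,
        "left": grow([s for s, _ in left], [t for _, t in left],
                     rng, path, depth + 1, max_depth, n_candidates),
        "right": grow([s for s, _ in right], [t for _, t in right],
                      rng, path, depth + 1, max_depth, n_candidates),
    }

def predict_tree(node, sample):
    while "leaf" not in node:
        x = node_feature(sample, node["region"], node["upstream"])
        node = node["right"] if score(node["w"], x) > 0 else node["left"]
    return node["leaf"]

def predict_forest(trees, sample):
    """Majority vote over the trees of the forest."""
    return majority([predict_tree(tree, sample) for tree in trees])
```

A forest in this sketch is simply a list of trees grown with different random generators, for example `[grow(samples, labels, random.Random(seed)) for seed in range(5)]`. Selecting the candidate region by training error at the node is a simplification chosen for brevity.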


Notes

  1. We use the terms “patches” and “regions” interchangeably throughout this chapter.

  2. Dictionary sizes of 1024, 256, and 256 are used for the PASCAL action [11, 12], PPMI [33], and Caltech-UCSD Birds [32] datasets, respectively.

  3. The baseline results are available from the dataset website: http://ai.stanford.edu/~bangpeng/ppmi

  4. A summary of the results of the 2011 PASCAL challenge is available at http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2011/workshop/index.html.

  5. A summary of the results of the 2012 PASCAL challenge is available at http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2012/workshop/index.html.

  6. These approaches were specifically developed for the 2012 PASCAL VOC challenge and have not been tested on other datasets, but we expect similar performance improvements on them.

References

  1. Bernard, S., Heutte, L., Adam, S.: On the selection of decision trees in random forests. In: IEEE International Joint Conference on Neural Networks, IJCNN, pp. 302–307 (2009)

  2. Bosch, A., Zisserman, A., Munoz, X.: Image classification using random forests and ferns. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2007)

  3. Branson, S., Wah, C., Babenko, B., Schroff, F., Welinder, P., Perona, P., Belongie, S.: Visual recognition with humans in the loop. In: Proceedings of the European Conference on Computer Vision (ECCV) (2010)

  4. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)

  5. Collin, C.A., McMullen, P.A.: Subordinate-level categorization relies on high spatial frequencies to a greater degree than basic-level categorization. Percept. Psychophys. 67(2), 354–364 (2005)

  6. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)

  7. Delaitre, V., Laptev, I., Sivic, J.: Recognizing human actions in still images: a study of bag-of-features and part-based representations. In: Proceedings of the British Machine Vision Conference (BMVC) (2010)

  8. Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach. Learn. 40, 139–157 (2000)

  9. Duan, G., Huang, C., Ai, H., Lao, S.: Boosting associated pairing comparison features for pedestrian detection. In: Proceedings of the Workshop on Visual Surveillance (2009)

  10. Everingham, M., Van Gool, L., Williams, C., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results (2010)

  11. Everingham, M., Van Gool, L., Williams, C., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2011 (VOC2011) Results (2011)

  12. Everingham, M., Van Gool, L., Williams, C., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results (2012)

  13. Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)

  14. Fei-Fei, L., Fergus, R., Torralba, A.: Recognizing and learning object categories. Short Course in the IEEE International Conference on Computer Vision (2009)

  15. Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1627–1645 (2010)

  16. Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2003)

  17. Hillel, A.B., Weinshall, D.: Subordinate class recognition using relational object models. In: Proceedings of the Conference on Neural Information Processing Systems (NIPS) (2007)

  18. Johnson, K.E., Eilers, A.T.: Effects of knowledge and development on subordinate level categorization. Cogn. Dev. 13(4), 515–545 (1998)

  19. Khosla, A., Xiao, J., Torralba, A., Oliva, A.: Memorability of image regions. In: Advances in Neural Information Processing Systems (NIPS), Lake Tahoe (2012)

  20. Khosla, A., Yao, B., Jayadevaprakash, N., Fei-Fei, L.: Novel dataset for fine-grained image categorization. In: First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs (2011)

  21. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2006)

  22. Li, L.-J., Su, H., Xing, E., Fei-Fei, L.: Object bank: a high-level image representation for scene classification and semantic feature sparsification. In: Proceedings of the Conference on Neural Information Processing Systems (NIPS) (2010)

  23. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)

  24. Moosmann, F., Triggs, B., Jurie, F.: Fast discriminative visual codebooks using randomized clustering forests. In: Proceedings of the Conference on Neural Information Processing Systems (NIPS) (2007)

  25. Ojala, T., Pietikainen, M., Harwood, D.: Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In: Proceedings of the IEEE International Conference on Pattern Recognition (ICPR) (1994)

  26. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vision 42(3), 145–175 (2001)

  27. Shotton, J., Johnson, M., Cipolla, R.: Semantic texton forests for image categorization and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008)

  28. Tu, Z.: Probabilistic boosting-tree: learning discriminative models for classification, recognition, and clustering. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2005)

  29. van de Sande, K.E.A., Gevers, T., Snoek, C.G.M.: Evaluating color descriptors for object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1582–1596 (2010)

  30. van de Weijer, J., Schmid, C., Verbeek, J., Larlus, D.: Learning color names for real-world applications. IEEE Trans. Image Process. 18(7), 1512–1523 (2009)

  31. Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)

  32. Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD birds 200. Technical Report CNS-TR-201, Caltech (2010)

  33. Yao, B., Fei-Fei, L.: Grouplet: a structured image representation for recognizing human and object interactions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)

  34. Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human-object interaction activities. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)

  35. Yao, B., Jiang, X., Khosla, A., Lin, A.L., Guibas, L.J., Fei-Fei, L.: Human action recognition by learning bases of action attributes and parts. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2011)

  36. Yao, B., Khosla, A., Fei-Fei, L.: Classifying actions and measuring action similarity by modeling the mutual context of objects and human poses. In: Proceedings of the International Conference on Machine Learning (ICML) (2011)

  37. Yao, B., Khosla, A., Fei-Fei, L.: Combining randomization and discrimination for fine-grained image categorization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2011)

  38. Yao, B., Bradski, G., Fei-Fei, L.: A codebook-free and annotation-free approach for fine-grained image categorization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012)

  39. Yao, B., Fei-Fei, L.: Action recognition with exemplar based 2.5D graph matching. In: Proceedings of the European Conference on Computer Vision (ECCV) (2012)


Acknowledgments

L.F-F. is partially supported by an NSF CAREER grant (IIS-0845230), an ONR MURI grant, the DARPA VIRAT program, and the DARPA Mind’s Eye program. B.Y. is partially supported by the SAP Stanford Graduate Fellowship and the Microsoft Research Fellowship. A.K. is supported by the Facebook Fellowship.

Author information


Corresponding author

Correspondence to Aditya Khosla.


Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Khosla, A., Yao, B., Fei-Fei, L. (2014). Integrating Randomization and Discrimination for Classifying Human-Object Interaction Activities. In: Fu, Y. (eds) Human-Centered Social Media Analytics. Springer, Cham. https://doi.org/10.1007/978-3-319-05491-9_5


  • DOI: https://doi.org/10.1007/978-3-319-05491-9_5


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05490-2

  • Online ISBN: 978-3-319-05491-9

  • eBook Packages: Computer Science, Computer Science (R0)
