Part Bricolage: Flow-Assisted Part-Based Graphs for Detecting Activities in Videos

  • Sukrit Shankar
  • Vijay Badrinarayanan
  • Roberto Cipolla
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8694)


Space-time detection of human activities in videos can significantly enhance visual search. Using low-level features alone has proved somewhat insufficient on complex datasets, while the mid-level features (such as body parts) commonly used to address this are typically taken at face value, without robustly accounting for their inaccuracy. Moreover, existing activity detection mechanisms do not make constructive use of the importance and trustworthiness of the features.

This paper addresses these problems and introduces a unified formulation for robustly detecting activities in videos. Our first contribution is to formulate the detection task as an undirected, node- and edge-weighted graph called Part Bricolage (PB). The node weights represent the type of each feature along with its importance, and the edge weights incorporate the probability that the connected features belong to a known activity class while also accounting for the trustworthiness of those features. Solving the Prize-Collecting Steiner Tree (PCST) problem [19] on this graph yields the best connected subgraph comprising the activity of interest. Our second contribution is a novel technique for robust body part estimation, which uses two types of state-of-the-art pose detectors and resolves plausible detection ambiguities with pre-trained classifiers that predict the trustworthiness of each pose detector. Our third contribution is to fuse the low-level descriptors with the mid-level ones while maintaining the spatial structure between the features.
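The PCST step can be illustrated with a small exact solver. This is a minimal sketch under stated assumptions, not the branch-and-cut method of [19] and not the authors' code: it assumes PB's node weights act as non-negative prizes and its edge weights as non-negative costs (the abstract does not give the exact mapping), and it maximises total prize minus total edge cost over connected subtrees, which is equivalent to minimising the cost of chosen edges plus the prizes of excluded nodes. The helper names `mst_cost` and `pcst_bruteforce`, the node names and all numeric values are hypothetical.

```python
from itertools import combinations

# Hypothetical toy graph: prizes and costs are invented for illustration
# and are NOT the node/edge weights defined in the paper.
EXAMPLE_PRIZES = {"head": 3.0, "torso": 4.0, "hand": 2.0, "flow_bg": 0.5}
EXAMPLE_EDGES = {
    ("head", "torso"): 1.0,
    ("torso", "hand"): 1.5,
    ("hand", "flow_bg"): 2.0,
    ("head", "flow_bg"): 3.0,
}


def mst_cost(nodes, edges):
    """Kruskal MST cost of the subgraph induced by `nodes`.

    Returns None when the induced subgraph is disconnected.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    total, used = 0.0, 0
    for (u, v), cost in sorted(edges.items(), key=lambda kv: kv[1]):
        if u in parent and v in parent:  # keep only edges inside `nodes`
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                total += cost
                used += 1
    return total if used == len(nodes) - 1 else None


def pcst_bruteforce(prizes, edges):
    """Exact PCST (net-worth form) by exhaustive search over node subsets.

    Maximises sum(prizes in tree) - sum(edge costs in tree). For a fixed
    node set the cheapest connecting tree is its induced MST, so checking
    every connected subset gives the optimum. Exponential: toy graphs only.
    """
    best_nodes, best_value = frozenset(), float("-inf")
    names = list(prizes)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            cost = mst_cost(set(subset), edges)
            if cost is None:  # subset is not connected
                continue
            value = sum(prizes[v] for v in subset) - cost
            if value > best_value:
                best_nodes, best_value = frozenset(subset), value
    return best_nodes, best_value


if __name__ == "__main__":
    nodes, value = pcst_bruteforce(EXAMPLE_PRIZES, EXAMPLE_EDGES)
    print(sorted(nodes), value)
```

On the toy graph the solver keeps `head`, `torso` and `hand` (net worth 6.5) and drops `flow_bg`, whose small prize does not pay for the edge needed to connect it. Exhaustive enumeration is used only because it is obviously exact; graphs of realistic size need a dedicated solver such as the one described in [19].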

For a quantitative evaluation of the detection power of PB, we run PB on the Hollywood and MSR-Actions datasets, where it outperforms the state of the art by a significant margin across various detection paradigms.


Keywords: Activity Understanding, Pose Estimation, Graph Structures


  1. Black, M.J., Anandan, P.: A framework for the robust estimation of optical flow. In: Proceedings of the Fourth International Conference on Computer Vision, pp. 231–236. IEEE (1993)
  2. Bourdev, L., Malik, J.: Poselets: Body part detectors trained using 3D human pose annotations. In: ICCV (2009)
  3. Burges, C.J.: A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2(2), 121–167 (1998)
  4. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011), software available at
  5. Chen, C.Y., Grauman, K.: Efficient activity detection with max-subgraph search. In: CVPR (2012)
  6. Chen, J., Kim, M., Wang, Y., Ji, Q.: Switching Gaussian process dynamic models for simultaneous composite motion tracking and recognition. In: CVPR (2009)
  7. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
  8. Dittrich, M.T., Klau, G.W., Rosenwald, A., Dandekar, T., Müller, T.: Identifying functional modules in protein–protein interaction networks: an integrated exact approach. Bioinformatics 24(13), i223–i231 (2008)
  9. Efros, A.A., Berg, A.C., Mori, G., Malik, J.: Recognizing action at a distance. In: CVPR (2003)
  10. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC 2007) Results (2007),
  11. Fragkiadaki, K., Hu, H., Shi, J.: Pose from flow and flow from pose. In: CVPR (2013)
  12. Gopalan, R.: Joint sparsity-based representation and analysis of unconstrained activities. In: CVPR (2013)
  13. Jain, A., Gupta, A., Rodriguez, M., Davis, L.S.: Representing videos using mid-level discriminative patches. In: CVPR (2013)
  14. Jain, M., Jégou, H., Bouthemy, P., et al.: Better exploiting motion for better action recognition. In: CVPR (2013)
  15. Kovashka, A., Grauman, K.: Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. In: CVPR (2010)
  16. Laptev, I.: On space-time interest points. IJCV 64(2-3), 107–123 (2005)
  17. Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: CVPR (2008)
  18. Lee, C.S., Elgammal, A.: Coupled visual and kinematic manifold models for tracking. IJCV 87(1-2), 118–139 (2010)
  19. Ljubić, I., Weiskircher, R., Pferschy, U., Klau, G.W., Mutzel, P., Fischetti, M.: An algorithmic framework for the exact solution of the prize-collecting Steiner tree problem. Mathematical Programming 105(2-3), 427–449 (2006)
  20. Ma, S., Zhang, J., Ikizler-Cinbis, N., Sclaroff, S.: Action recognition and localization by hierarchical space-time segments. In: ICCV (2013)
  21. Maji, S., Bourdev, L., Malik, J.: Action recognition from a distributed representation of pose and appearance. In: CVPR (2011)
  22. Malgireddy, M., Inwogu, I., Govindaraju, V.: A temporal Bayesian model for classifying, detecting and localizing activities in video sequences. In: CVPR (2012)
  23. Marszalek, M., Laptev, I., Schmid, C.: Actions in context. In: CVPR (2009)
  24. Ramanan, D., Forsyth, D.A.: Automatic annotation of everyday movements. In: NIPS (2003)
  25. Raptis, M., Sigal, L.: Poselet key-framing: A model for human activity recognition. In: CVPR (2013)
  26. Sadanand, S., Corso, J.J.: Action bank: A high-level representation of activity in video. In: CVPR (2012)
  27. Sapp, B., Weiss, D., Taskar, B.: Parsing human motion with stretchable models. In: CVPR (2011)
  28. Schindler, K., Van Gool, L.: Action snippets: How many frames does human action recognition require? In: CVPR (2008)
  29. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: ICPR (2004)
  30. Shi, F., Petriu, E., Laganiere, R.: Sampling strategies for real-time action recognition. In: CVPR (2013)
  31. Sullivan, M., Shah, M.: Action MACH: Maximum average correlation height filter for action recognition. In: CVPR (2008)
  32. Taylor, G.W., Sigal, L., Fleet, D.J., Hinton, G.E.: Dynamical binary latent variable models for 3D human pose tracking. In: CVPR (2010)
  33. Thurau, C., Hlaváč, V.: Pose primitive based human action recognition in videos or still images. In: CVPR (2008)
  34. Wang, C., Wang, Y., Yuille, A.L.: An approach to pose-based action recognition. In: CVPR (2013)
  35. Wang, H., Klaser, A., Schmid, C., Liu, C.L.: Action recognition by dense trajectories. In: CVPR (2011)
  36. Wang, H., Schmid, C., et al.: Action recognition with improved trajectories. In: ICCV (2013)
  37. Wang, H., Ullah, M.M., Klaser, A., Laptev, I., Schmid, C., et al.: Evaluation of local spatio-temporal features for action recognition. In: BMVC (2009)
  38. Yang, W., Wang, Y., Mori, G.: Recognizing human actions from still images with latent poses. In: CVPR (2010)
  39. Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: CVPR (2011)
  40. Yao, A., Gall, J., Van Gool, L.: Coupled action recognition and pose estimation from multiple views. IJCV 100(1), 16–37 (2012)
  41. Yeffet, L., Wolf, L.: Local trinary patterns for human action recognition. In: ICCV (2009)
  42. Yuan, J., Liu, Z., Wu, Y.: Discriminative subvolume search for efficient action detection. In: CVPR (2009)
  43. Zanfir, M., Leordeanu, M., Sminchisescu, C.: The moving pose: An efficient 3D kinematics descriptor for low-latency action recognition and detection. In: ICCV (2013)
  44. Zhu, J., Wang, B., Yang, X., Zhang, W., Tu, Z.: Action recognition with actons. In: ICCV (2013)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Sukrit Shankar (1)
  • Vijay Badrinarayanan (1)
  • Roberto Cipolla (1)
  1. Machine Intelligence Lab, Division of Information Processing, University of Cambridge, UK
