
Cluster Computing, Volume 21, Issue 1, pp 311–322

Object and motion cues based collaborative approach for human activity localization and recognition in unconstrained videos

  • Javid Ullah
  • Muhammad Arfan Jaffar

Abstract

This paper addresses the problem of activity localization and recognition in large-scale video datasets through the collaborative use of holistic and motion-based information (called motion cues). The concept of salient objects is used to obtain the holistic information, while the motion cues are obtained from an affine motion model and optical flow. The motion cues compensate for camera motion and localize the object of interest within a set of object proposals. The holistic information and motion cues are then fused to obtain a reliable object of interest. In the recognition phase, holistic and motion-based features are extracted from the object of interest for training and testing the classifier. The extreme learning machine is adopted as the classifier to reduce training and testing time and to increase classification accuracy. The effectiveness of the proposed approach is evaluated on the UCF Sports dataset. Detailed experimentation reveals that the proposed approach outperforms state-of-the-art action localization and recognition approaches.
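The abstract's speed claim for the extreme learning machine stems from its closed-form training: input weights and biases are random and fixed, so only the output weights need to be solved, via a pseudo-inverse. A minimal sketch of this idea follows; it is not the authors' implementation, and the feature matrix `X`, one-hot label matrix `Y`, hidden-layer size, and toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, Y, n_hidden=50):
    """Train an ELM: random hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # fixed random input weights
    b = rng.normal(size=n_hidden)                 # fixed random biases
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y                  # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Score each sample against all classes; caller takes argmax."""
    return np.tanh(X @ W + b) @ beta

# Toy usage: two well-separated Gaussian blobs as stand-in features.
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
Y = np.vstack([np.tile([1, 0], (50, 1)), np.tile([0, 1], (50, 1))])
W, b, beta = elm_train(X, Y)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
acc = (pred == Y.argmax(axis=1)).mean()
```

Because training reduces to one matrix pseudo-inverse rather than iterative gradient descent, both training and retraining are fast, which is the property the paper exploits.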

Keywords

Action localization · Action recognition · Extreme learning · Feature learning · Large scale data clustering · Object proposal · Optical flow

Notes

Acknowledgements

This work is supported by Higher Education Commission of Pakistan.


Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. National University of Computer and Emerging Sciences, Islamabad, Pakistan
  2. Al Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
