Dynamic view selection for multi-camera action recognition

  • Original Paper
Machine Vision and Applications

Abstract

For multi-camera human action recognition methods, there is often a trade-off between classification accuracy and computational efficiency. Methods that generate 3D models or query all of the cameras in the network for each target are often computationally expensive. In this paper, we present an action recognition method that operates in a multi-camera environment, but dynamically selects a single camera at a time. We learn the relative utility of a particular viewpoint compared with switching to a different available camera in the network for future classification. We cast this learning problem as a Markov Decision Process, and incorporate reinforcement learning to estimate the value of the possible view-shifts. On two benchmark multi-camera action recognition datasets, our method outperforms approaches that incorporate all available cameras in both speed and classification accuracy.
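The abstract does not say which reinforcement learning algorithm is used or how states and rewards are defined, so the following is only a minimal sketch of the general idea, assuming tabular Q-learning. Here the state is the camera currently in use, an action is the camera to use next, and the reward is a made-up per-view utility minus a small switching cost. The names `utility`, `switch_cost`, `q_learning_view_selection`, and `greedy_policy` are hypothetical and chosen for illustration; none of them come from the paper.

```python
import random


def q_learning_view_selection(utility, episodes=2000, steps=10,
                              alpha=0.1, gamma=0.9, epsilon=0.2,
                              switch_cost=0.05, seed=0):
    """Tabular Q-learning over camera views (illustrative toy setup).

    State:  index of the camera currently in use.
    Action: index of the camera to use next (staying put is allowed).
    Reward: assumed utility of the chosen view, minus a cost if we switch.
    """
    rng = random.Random(seed)
    n = len(utility)
    q = [[0.0] * n for _ in range(n)]  # q[state][action]
    for _ in range(episodes):
        state = rng.randrange(n)  # start each episode at a random camera
        for _ in range(steps):
            # Epsilon-greedy choice of the next camera.
            if rng.random() < epsilon:
                action = rng.randrange(n)
            else:
                action = max(range(n), key=lambda a: q[state][a])
            reward = utility[action]
            if action != state:
                reward -= switch_cost
            next_state = action  # the chosen camera becomes the current one
            # Standard Q-learning update toward the bootstrapped target.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """For each current camera, return the next camera with the highest Q-value."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


# Hypothetical per-camera utilities (e.g. how informative each view is).
utility = [0.2, 0.9, 0.5]
q_table = q_learning_view_selection(utility)
policy = greedy_policy(q_table)
print(policy)
```

With these toy utilities, the learned policy should prefer camera 1 from every starting view, since the one-time switching cost is small relative to the long-run gain from the more informative view. In a real system the reward would have to come from observed classification performance rather than a fixed table.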

Author information

Corresponding author

Correspondence to Scott Spurlock.

About this article

Cite this article

Spurlock, S., Souvenir, R. Dynamic view selection for multi-camera action recognition. Machine Vision and Applications 27, 53–63 (2016). https://doi.org/10.1007/s00138-015-0715-9

  • DOI: https://doi.org/10.1007/s00138-015-0715-9
