Dynamic view selection for multi-camera action recognition

Spurlock, Scott; Souvenir, Richard

doi:10.1007/s00138-015-0715-9

Original Paper
Published: 28 September 2015

Volume 27, pages 53–63, (2016)
Machine Vision and Applications

Scott Spurlock¹ &
Richard Souvenir²

Abstract

For multi-camera human action recognition methods, there is often a trade-off between classification accuracy and computational efficiency. Methods that generate 3D models or query all of the cameras in the network for each target are often computationally expensive. In this paper, we present an action recognition method that operates in a multi-camera environment, but dynamically selects a single camera at a time. We learn the relative utility of a particular viewpoint compared with switching to a different available camera in the network for future classification. We cast this learning problem as a Markov Decision Process, and incorporate reinforcement learning to estimate the value of the possible view-shifts. On two benchmark multi-camera action recognition datasets, our method outperforms approaches that incorporate all available cameras in both speed and classification accuracy.
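The abstract does not give the state, action, or reward definitions, so the following is only a minimal sketch of the general idea of learning view-shift values with tabular Q-learning. It assumes, purely for illustration, that the state is the index of the camera currently in use, the action is the camera to use next, and the reward comes from a hypothetical `toy_reward` function (a made-up per-camera view quality minus a made-up switching cost). None of these names or values come from the paper.

```python
import random


def q_learning_view_selection(reward, n_cameras, episodes=2000, steps=20,
                              alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning over camera indices (illustrative assumption).

    State:  index of the camera currently in use.
    Action: index of the camera to use next (staying put is allowed).
    reward: callable (state, action) -> float, a hypothetical view utility.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_cameras for _ in range(n_cameras)]
    for _ in range(episodes):
        state = rng.randrange(n_cameras)
        for _ in range(steps):
            # Q-learning is off-policy, so a uniformly random behaviour
            # policy is enough to visit every (state, action) pair.
            action = rng.randrange(n_cameras)
            r = reward(state, action)
            next_state = action  # choosing a camera makes it the current one
            target = r + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def toy_reward(state, action):
    # Hypothetical numbers: camera 2 has the most useful view,
    # and switching cameras costs 0.1.
    quality = [0.2, 0.5, 1.0]
    switch_cost = 0.1 if action != state else 0.0
    return quality[action] - switch_cost


q = q_learning_view_selection(toy_reward, n_cameras=3)
# Greedy policy: for each current camera, the camera with the highest value.
policy = [max(range(3), key=lambda a: q[s][a]) for s in range(3)]
print(policy)
```

Under these toy rewards the greedy policy should settle on switching to, and then staying with, the camera that has the highest assumed view quality. A real system would replace `toy_reward` with a signal derived from classification performance.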

Author information

Authors and Affiliations

Department of Computing Sciences, Elon University, Elon, USA
Scott Spurlock
Department of Computer Science, University of North Carolina at Charlotte, Charlotte, USA
Richard Souvenir

Corresponding author

Correspondence to Scott Spurlock.

About this article

Cite this article

Spurlock, S., Souvenir, R. Dynamic view selection for multi-camera action recognition. Machine Vision and Applications 27, 53–63 (2016). https://doi.org/10.1007/s00138-015-0715-9

Received: 14 October 2014
Revised: 21 July 2015
Accepted: 26 August 2015
Published: 28 September 2015
Issue Date: January 2016
DOI: https://doi.org/10.1007/s00138-015-0715-9
