Fully Autonomous UAV-Based Action Recognition System Using Aerial Imagery

Peng, Han; Razi, Abolfazl

doi:10.1007/978-3-030-64556-4_22

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12509)

Included in the following conference series:

International Symposium on Visual Computing

Abstract

Human action recognition is an important topic in artificial intelligence with a wide range of applications, including surveillance systems, search-and-rescue operations, and human-computer interaction. However, most current action recognition systems rely on videos captured by stationary cameras. Another emerging technology is the use of unmanned aerial and ground vehicles (UAVs/UGVs) for tasks such as transportation, traffic control, border patrolling, and wildlife monitoring. This technology has become more popular in recent years due to its affordability, high maneuverability, and limited need for human intervention. Nevertheless, no efficient action recognition algorithm exists for UAV-based monitoring platforms. This paper considers UAV-based video action recognition by addressing the key issues of aerial imaging systems: camera motion and vibration, low resolution, and tiny human size. In particular, we propose an automated deep learning-based action recognition system with three stages: video stabilization using SURF feature selection and the Lucas-Kanade method, human action area detection using faster region-based convolutional neural networks (Faster R-CNN), and action recognition. For action recognition, we propose a novel structure that extends and modifies the InceptionResNet-v2 architecture by combining a 3D CNN architecture with a residual network. We achieve an average accuracy of 85.83% for entire-video-level recognition when applying our algorithm to the popular UCF-ARG aerial imaging dataset. This accuracy improves upon the state-of-the-art accuracy by a margin of 17%.
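The video stabilization stage named in the abstract can be illustrated with a minimal sketch. It assumes that matched point pairs between consecutive frames are already available (in the described system these would come from SURF features tracked with the Lucas-Kanade method), and it uses a translation-only motion model with a moving-average smoother. The function names, the median-based estimator, and the smoothing radius are illustrative assumptions, not the authors' implementation.

```python
from statistics import median


def frame_shift(prev_pts, next_pts):
    """Estimate the camera shift (dx, dy) between two frames.

    prev_pts / next_pts are matched (x, y) points. The median keeps the
    estimate robust to a few points that sit on a moving person.
    """
    dx = median(n[0] - p[0] for p, n in zip(prev_pts, next_pts))
    dy = median(n[1] - p[1] for p, n in zip(prev_pts, next_pts))
    return dx, dy


def camera_trajectory(shifts):
    """Accumulate inter-frame shifts into a camera path starting at (0, 0)."""
    x = y = 0.0
    path = [(x, y)]
    for dx, dy in shifts:
        x += dx
        y += dy
        path.append((x, y))
    return path


def smooth(path, radius=2):
    """Centered moving average; the window is clipped at both ends."""
    out = []
    for i in range(len(path)):
        window = path[max(0, i - radius): i + radius + 1]
        out.append((sum(p[0] for p in window) / len(window),
                    sum(p[1] for p in window) / len(window)))
    return out


def corrections(path, radius=2):
    """Offset per frame that moves it from the raw path onto the smoothed one."""
    smoothed = smooth(path, radius)
    return [(sx - x, sy - y) for (x, y), (sx, sy) in zip(path, smoothed)]


if __name__ == "__main__":
    # Hypothetical shifts: steady rightward flight plus vertical vibration.
    shifts = [(2.0, 1.0), (2.0, -1.0), (2.0, 1.0), (2.0, -1.0)]
    for offset in corrections(camera_trajectory(shifts), radius=1):
        print(offset)
```

Applying each offset to its frame (for example, by translating the image) would suppress the high-frequency vibration while preserving the intended flight path. A fuller treatment would likely estimate an affine or homography transform rather than a pure translation.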

This material is based upon the work supported by the National Science Foundation under Grant No. 1755984. This work is also partially supported by the Arizona Board of Regents (ABOR) under Grant No. 1003329.

Author information

Authors and Affiliations

Northern Arizona University, Flagstaff, AZ, 86011, USA
Han Peng & Abolfazl Razi

Corresponding author

Correspondence to Abolfazl Razi.

Editor information

Editors and Affiliations

University of Nevada Reno, Reno, NV, USA
George Bebis
Stony Brook University, Stony Brook, NY, USA
Zhaozheng Yin
Drexel University, Philadelphia, PA, USA
Edward Kim
RWTH Aachen University, Aachen, Germany
Jan Bender
University of Edinburgh, Edinburgh, UK
Kartic Subr
IBM Research – Cambridge, Cambridge, MA, USA
Bum Chul Kwon
University of Waterloo, Waterloo, ON, Canada
Jian Zhao
Graz University of Technology, Graz, Austria
Denis Kalkofen
The Hong Kong Polytechnic University, Hong Kong, Hong Kong
George Baciu

About this paper

Cite this paper

Peng, H., Razi, A. (2020). Fully Autonomous UAV-Based Action Recognition System Using Aerial Imagery. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2020. Lecture Notes in Computer Science, vol 12509. Springer, Cham. https://doi.org/10.1007/978-3-030-64556-4_22

DOI: https://doi.org/10.1007/978-3-030-64556-4_22
Published: 07 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64555-7
Online ISBN: 978-3-030-64556-4
eBook Packages: Computer Science, Computer Science (R0)
