
UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition

  • Asanka G. Perera
  • Yee Wei Law
  • Javaan Chahl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)

Abstract

Current UAV-recorded datasets are mostly limited to action recognition and object tracking, whereas existing gesture-signal datasets were mostly recorded indoors. There is currently no publicly available outdoor-recorded video dataset of UAV command signals. Gesture signals can be used effectively with UAVs by leveraging a UAV's visual sensors and operational simplicity. To fill this gap and enable research in wider application areas, we present a UAV gesture-signals dataset recorded in an outdoor setting. We selected 13 gestures suitable for basic UAV navigation and command from general aircraft handling and helicopter handling signals. We provide 119 high-definition video clips consisting of 37,151 frames. The overall baseline gesture recognition performance, computed using the Pose-based Convolutional Neural Network (P-CNN), is 91.9%. All frames are annotated with body joints and gesture classes in order to extend the dataset's applicability to a wider research area, including gesture recognition, action recognition, human pose recognition and situation awareness.
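For readers unfamiliar with the baseline, the sketch below illustrates the general P-CNN recipe of Cheron et al. [9] as it could be applied to a dataset of this kind: a patch is cropped around each annotated body joint in every frame, described with a CNN, and the per-frame descriptors are aggregated over time into a fixed-length video descriptor. This is a minimal illustration only, not the authors' implementation; the ResNet-18 backbone, the 64-pixel patch size, and the joint-array format are assumptions, and the original P-CNN additionally uses optical-flow patches and a linear SVM classifier.

    # Minimal sketch of P-CNN-style feature aggregation (assumptions noted above).
    import torch
    import torchvision.models as models
    import torchvision.transforms.functional as TF

    backbone = models.resnet18(weights=None)   # assumption: any image CNN can serve as the part descriptor
    backbone.fc = torch.nn.Identity()          # use pooled features rather than class logits
    backbone.eval()

    def crop_part(frame, joint_xy, size=64):
        """Crop a square patch centred on one annotated joint (x, y in pixels)."""
        x, y = int(joint_xy[0]), int(joint_xy[1])
        top, left = max(y - size // 2, 0), max(x - size // 2, 0)
        patch = TF.crop(frame, top, left, size, size)   # frame: CxHxW float tensor
        return TF.resize(patch, [224, 224])

    @torch.no_grad()
    def pcnn_video_descriptor(frames, joints_per_frame):
        """frames: list of CxHxW tensors; joints_per_frame: list of (J, 2) joint arrays."""
        per_frame = []
        for frame, joints in zip(frames, joints_per_frame):
            patches = torch.stack([crop_part(frame, j) for j in joints])   # (J, 3, 224, 224)
            feats = backbone(patches)                                       # (J, D) part descriptors
            per_frame.append(feats.flatten())                               # concatenate parts
        stacked = torch.stack(per_frame)                                    # (T, J*D)
        # Temporal max/min aggregation, as in P-CNN, yields a fixed-length video descriptor
        return torch.cat([stacked.max(dim=0).values, stacked.min(dim=0).values])

The resulting descriptor would then be fed to a classifier (a linear SVM in the original P-CNN work) to predict one of the 13 gesture classes.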

Keywords

UAV · Gesture dataset · UAV control · Gesture recognition

Notes

Acknowledgement

This project was partly supported by Project Tyche, the Trusted Autonomy Initiative of the Defence Science and Technology Group (grant number myIP6780).

References

  1. Barekatain, M., et al.: Okutama-action: an aerial view video dataset for concurrent human action detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2153–2160, July 2017. https://doi.org/10.1109/CVPRW.2017.267
  2. Bonetto, M., Korshunov, P., Ramponi, G., Ebrahimi, T.: Privacy in mini-drone based video surveillance. In: 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), vol. 04, pp. 1–6, May 2015. https://doi.org/10.1109/FG.2015.7285023
  3. Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3024, pp. 25–36. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24673-2_3
  4. Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: CVPR (2017)
  5. Neidle, C., Thangali, A., Sclaroff, S.: 5th Workshop on the Representation and Processing of Sign Languages: Interactions Between Corpus and Lexicon, May 2012
  6. Chaquet, J.M., Carmona, E.J., Fernández-Caballero, A.: A survey of video datasets for human action and activity recognition. Comput. Vis. Image Underst. 117(6), 633–659 (2013). https://doi.org/10.1016/j.cviu.2013.01.013
  7. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. CoRR abs/1405.3531 (2014). http://arxiv.org/abs/1405.3531
  8. Cherian, A., Mairal, J., Alahari, K., Schmid, C.: Mixing body-part sequences for human pose estimation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014
  9. Cheron, G., Laptev, I., Schmid, C.: P-CNN: pose-based CNN features for action recognition. In: The IEEE International Conference on Computer Vision (ICCV), December 2015
  10. Costante, G., Bellocchio, E., Valigi, P., Ricci, E.: Personalizing vision-based gestural interfaces for HRI with UAVs: a transfer learning approach. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3319–3326, September 2014. https://doi.org/10.1109/IROS.2014.6943024
  11. Girdhar, R., Ramanan, D.: Attentional pooling for action recognition. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 34–45. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/6609-attentional-pooling-for-action-recognition.pdf
  12. Gkioxari, G., Malik, J.: Finding action tubes. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015
  13. Guyon, I., Athitsos, V., Jangyodsuk, P., Escalante, H.J.: The ChaLearn gesture dataset (CGD 2011). Mach. Vis. Appl. 25(8), 1929–1951 (2014)
  14. Jhuang, H., Gall, J., Zuffi, S., Schmid, C., Black, M.J.: Towards understanding action recognition. In: 2013 IEEE International Conference on Computer Vision, pp. 3192–3199, December 2013. https://doi.org/10.1109/ICCV.2013.396
  15. Kang, S., Wildes, R.P.: Review of action recognition and detection methods. CoRR abs/1610.06906 (2016). http://arxiv.org/abs/1610.06906
  16. Lee, J., Tan, H., Crandall, D., Šabanović, S.: Forecasting hand gestures for human-drone interaction. In: Companion of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, HRI 2018, pp. 167–168. ACM, New York (2018). https://doi.org/10.1145/3173386.3176967
  17. Lin, Z., Jiang, Z., Davis, L.S.: Recognizing actions by shape-motion prototype trees. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 444–451, September 2009. https://doi.org/10.1109/ICCV.2009.5459184
  18. Oh, S., et al.: A large-scale benchmark dataset for event recognition in surveillance video. In: CVPR 2011, pp. 3153–3160 (2011). https://doi.org/10.1109/CVPR.2011.5995586
  19. Pfeil, K., Koh, S.L., LaViola, J.: Exploring 3D gesture metaphors for interaction with unmanned aerial vehicles. In: Proceedings of the 2013 International Conference on Intelligent User Interfaces, IUI 2013, pp. 257–266. ACM, New York (2013). https://doi.org/10.1145/2449396.2449429
  20. Piergiovanni, A.J., Ryoo, M.S.: Fine-grained activity recognition in baseball videos. CoRR abs/1804.03247 (2018). http://arxiv.org/abs/1804.03247
  21. Pisharady, P.K., Saerbeck, M.: Recent methods and databases in vision-based hand gesture recognition: a review. Comput. Vis. Image Underst. 141, 152–165 (2015). https://doi.org/10.1016/j.cviu.2015.08.004
  22. Robicquet, A., Sadeghian, A., Alahi, A., Savarese, S.: Learning social etiquette: human trajectory understanding in crowded scenes. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 549–565. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_33
  23. Rohrbach, M., Amin, S., Andriluka, M., Schiele, B.: A database for fine grained activity detection of cooking activities. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1194–1201, June 2012. https://doi.org/10.1109/CVPR.2012.6247801
  24. Ruffieux, S., Lalanne, D., Mugellini, E.: ChAirGest: a challenge for multimodal mid-air gesture recognition for close HCI. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ICMI 2013, pp. 483–488. ACM, New York (2013). https://doi.org/10.1145/2522848.2532590
  25. Ruffieux, S., Lalanne, D., Mugellini, E., Abou Khaled, O.: A survey of datasets for human gesture recognition. In: Kurosu, M. (ed.) HCI 2014. LNCS, vol. 8511, pp. 337–348. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07230-2_33
  26. Ryoo, M.S., Aggarwal, J.K.: Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 1593–1600, September 2009. https://doi.org/10.1109/ICCV.2009.5459361
  27. Shahroudy, A., Liu, J., Ng, T.T., Wang, G.: NTU RGB+D: a large scale dataset for 3D human activity analysis. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016
  28. Song, Y., Demirdjian, D., Davis, R.: Tracking body and hands for gesture recognition: NATOPS aircraft handling signals database. In: Face and Gesture 2011, pp. 500–506, March 2011. https://doi.org/10.1109/FG.2011.5771448
  29. Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. Technical report. UCF Center for Research in Computer Vision (2012)
  30. University of Central Florida: UCF aerial action dataset, November 2011. http://crcv.ucf.edu/data/UCF_Aerial_Action.php
  31. University of Central Florida: UCF-ARG Data Set, November 2011. http://crcv.ucf.edu/data/UCF-ARG.php
  32. U.S. Navy: Aircraft signals NATOPS manual, NAVAIR 00-80T-113 (1997). http://www.navybmr.com/study%20material/NAVAIR_113.pdf
  33. Vondrick, C., Patterson, D., Ramanan, D.: Efficiently scaling up crowdsourced video annotation. Int. J. Comput. Vis. 101(1), 184–204 (2013). https://doi.org/10.1007/s11263-012-0564-1
  34. Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016
  35. Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. CoRR abs/1801.07455 (2018). http://arxiv.org/abs/1801.07455

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Engineering, University of South Australia, Mawson Lakes, Australia
  2. Joint and Operations Analysis Division, Defence Science and Technology Group, Melbourne, Australia
