Human Pose and Path Estimation from Aerial Video Using Dynamic Classifier Selection

Cognitive Computation, 2018

Abstract

We consider the problem of estimating human pose and trajectory, in near real time, from video captured by an aerial robot with a monocular camera. We present a preliminary solution whose distinguishing feature is a dynamic classifier selection architecture. In our solution, each video frame is corrected for perspective using a projective transformation. Then, two alternative feature sets are used: (i) the Histogram of Oriented Gradients (HOG) of the silhouette, and (ii) Convolutional Neural Network (CNN) features of the RGB image. The features (HOG or CNN) are classified using a dynamic classifier. A class is defined as a pose-viewpoint pair, and a total of 64 classes are defined to represent a forward walking and turning gait sequence. Our solution provides three main advantages: (i) Classification is efficient due to dynamic selection (4-class vs. 64-class classification). (ii) Classification errors are confined to neighbors of the true viewpoints. (iii) The robust temporal relationship between poses is used to resolve the left-right ambiguities of human silhouettes. Experiments conducted on both fronto-parallel videos and aerial videos confirm that our solution achieves accurate pose and trajectory estimation in both scenarios. We found that HOG features provide higher accuracy than CNN features. For example, applying the HOG-based variant of our scheme to the “walking on a figure 8-shaped path” dataset (1652 frames) achieved estimation accuracies of 99.6% for viewpoints and 96.2% for poses.
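
The sketch below is a minimal illustration, not the authors' implementation, of two steps named in the abstract: correcting each frame with a projective transformation (homography) and dynamically selecting among pose-viewpoint classes so that only a few temporally plausible candidates are scored per frame. Only the class count (64) comes from the abstract; the transition table, the extract_hog helper, and the per-class scorers are assumed placeholders.

```python
# Illustrative sketch only, not the authors' code. It mirrors two steps from
# the abstract: (1) perspective correction via a projective transformation,
# and (2) dynamic classifier selection, where each class is a pose-viewpoint
# pair and only the plausible successors of the previous estimate are scored
# (roughly 4 classes instead of all 64). N_CLASSES aside, the names below
# (transitions, extract_hog, scorers) are assumptions for illustration.
import cv2
import numpy as np

N_CLASSES = 64  # pose-viewpoint pairs covering the walking-and-turning gait


def correct_perspective(frame, src_quad, dst_quad, out_size):
    """Warp a frame with the homography that maps four known ground-plane
    points (src_quad) to their perspective-corrected positions (dst_quad)."""
    H = cv2.getPerspectiveTransform(np.float32(src_quad), np.float32(dst_quad))
    return cv2.warpPerspective(frame, H, out_size)


def extract_hog(silhouette_u8):
    """HOG descriptor of an 8-bit silhouette patch, using OpenCV defaults."""
    patch = cv2.resize(silhouette_u8, (64, 128))  # default HOG window size
    return cv2.HOGDescriptor().compute(patch).ravel()


def classify_frame(silhouette_u8, scorers, prev_class, transitions):
    """Dynamic selection: score only the classes reachable from the previous
    pose-viewpoint estimate and return the best one.

    scorers     -- per-class scoring functions, e.g. one-vs-rest SVM margins
    transitions -- dict mapping a class index to its plausible successors
    """
    feature = extract_hog(silhouette_u8)
    candidates = transitions[prev_class]           # small local candidate set
    scores = {c: scorers[c](feature) for c in candidates}
    return max(scores, key=scores.get)             # new pose-viewpoint class
```

In use, the transition table would be built offline from the gait model, and classify_frame would be called once per corrected frame with the previous class estimate carried forward.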


Funding

This project was partly supported by Project Tyche, the Trusted Autonomy Initiative of the Defence Science and Technology Group (grant number myIP6780).

Author information

Corresponding author

Correspondence to Asanka G. Perera.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Informed Consent

The data collection was conducted with the approval of the University of South Australia’s Human Research Ethics Committee (protocol no. 0000035185).


About this article


Cite this article

Perera, A.G., Law, Y.W. & Chahl, J. Human Pose and Path Estimation from Aerial Video Using Dynamic Classifier Selection. Cogn Comput 10, 1019–1041 (2018). https://doi.org/10.1007/s12559-018-9577-6

