Human Pose and Path Estimation from Aerial Video Using Dynamic Classifier Selection

Cognitive Computation, 2018

Abstract

We consider the problem of estimating human pose and trajectory, in near real time, from video captured by an aerial robot with a monocular camera. We present a preliminary solution whose distinguishing feature is a dynamic classifier selection architecture. In our solution, each video frame is corrected for perspective using a projective transformation. Then, two alternative feature sets are used: (i) the Histogram of Oriented Gradients (HOG) of the silhouette, and (ii) Convolutional Neural Network (CNN) features of the RGB image. The features (HOG or CNN) are classified using a dynamic classifier. A class is defined as a pose-viewpoint pair, and a total of 64 classes are defined to represent a forward walking and turning gait sequence. Our solution provides three main advantages: (i) Classification is efficient due to dynamic selection (4-class vs. 64-class classification). (ii) Classification errors are confined to neighbors of the true viewpoints. (iii) The robust temporal relationship between poses is used to resolve the left-right ambiguities of human silhouettes. Experiments conducted on both fronto-parallel videos and aerial videos confirm that our solution achieves accurate pose and trajectory estimation in both scenarios. We found that HOG features provide higher accuracy than CNN features. For example, applying the HOG-based variant of our scheme to the “walking on a figure 8-shaped path” dataset (1652 frames) achieved estimation accuracies of 99.6% for viewpoints and 96.2% for poses.
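
The sketch below is a minimal illustration, not the authors' implementation, of two steps named in the abstract: correcting each frame with a projective transformation (homography) and dynamically selecting among pose-viewpoint classes so that only a few temporally plausible candidates are scored per frame. Only the class count (64) comes from the abstract; the transition table, the extract_hog helper, and the per-class scorers are assumed placeholders.

```python
# Illustrative sketch only, not the authors' code. It mirrors two steps from
# the abstract: (1) perspective correction via a projective transformation,
# and (2) dynamic classifier selection, where each class is a pose-viewpoint
# pair and only the plausible successors of the previous estimate are scored
# (roughly 4 classes instead of all 64). N_CLASSES aside, the names below
# (transitions, extract_hog, scorers) are assumptions for illustration.
import cv2
import numpy as np

N_CLASSES = 64  # pose-viewpoint pairs covering the walking-and-turning gait


def correct_perspective(frame, src_quad, dst_quad, out_size):
    """Warp a frame with the homography that maps four known ground-plane
    points (src_quad) to their perspective-corrected positions (dst_quad)."""
    H = cv2.getPerspectiveTransform(np.float32(src_quad), np.float32(dst_quad))
    return cv2.warpPerspective(frame, H, out_size)


def extract_hog(silhouette_u8):
    """HOG descriptor of an 8-bit silhouette patch, using OpenCV defaults."""
    patch = cv2.resize(silhouette_u8, (64, 128))  # default HOG window size
    return cv2.HOGDescriptor().compute(patch).ravel()


def classify_frame(silhouette_u8, scorers, prev_class, transitions):
    """Dynamic selection: score only the classes reachable from the previous
    pose-viewpoint estimate and return the best one.

    scorers     -- per-class scoring functions, e.g. one-vs-rest SVM margins
    transitions -- dict mapping a class index to its plausible successors
    """
    feature = extract_hog(silhouette_u8)
    candidates = transitions[prev_class]           # small local candidate set
    scores = {c: scorers[c](feature) for c in candidates}
    return max(scores, key=scores.get)             # new pose-viewpoint class
```

In use, the transition table would be built offline from the gait model, and classify_frame would be called once per corrected frame with the previous class estimate carried forward.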


Funding

This project was partly supported by Project Tyche, the Trusted Autonomy Initiative of the Defence Science and Technology Group (grant number myIP6780).

Author information

Corresponding author

Correspondence to Asanka G. Perera.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Informed Consent

The data collection was conducted with the approval of the University of South Australia’s Human Research Ethics Committee (protocol no. 0000035185).


About this article


Cite this article

Perera, A.G., Law, Y.W. & Chahl, J. Human Pose and Path Estimation from Aerial Video Using Dynamic Classifier Selection. Cogn Comput 10, 1019–1041 (2018). https://doi.org/10.1007/s12559-018-9577-6

