
3D Human Pose Estimation Based on Multi-Input Multi-Output Convolutional Neural Network and Event Cameras: A Proof of Concept on the DHP19 Dataset

  • Conference paper
Pattern Recognition. ICPR International Workshops and Challenges (ICPR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12661)


Abstract

Nowadays, Human Pose Estimation (HPE) represents one of the main research themes in the field of computer vision. Despite the innovative methods and solutions introduced for frame-processing algorithms, standard frame-based cameras still suffer from several drawbacks, such as data redundancy and a fixed frame rate. Event-based cameras guarantee higher temporal resolution at lower memory and computational cost while preserving the significant information to be processed, and thus represent a new solution for real-time applications. In this paper, the DHP19 dataset was employed: it is the first and, to date, the only HPE dataset recorded with Dynamic Vision Sensor (DVS) event-based cameras. Starting from the baseline single-input single-output (SISO) Convolutional Neural Network (CNN) model proposed in the literature, a novel multi-input multi-output (MIMO) CNN-based architecture was proposed in order to model two single-camera views simultaneously. Experimental results show that the proposed MIMO approach outperforms the standard SISO model in terms of both accuracy and training time.
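
To make the MIMO idea concrete, the sketch below shows one plausible way to wire a two-view, multi-input multi-output CNN in Keras. It is not the authors' implementation (their code is linked in the Notes below): the shared fully convolutional backbone, the layer widths, the 260x344 input resolution, the 13-joint heatmap heads and the MSE loss are all illustrative assumptions; as in the DHP19 baseline, the 3D pose would then be obtained by triangulating the 2D joint positions predicted for the two camera views.

# Minimal sketch (not the authors' code) of a two-view multi-input multi-output
# CNN for event-frame pose estimation. Input size, joint count, backbone depth
# and loss are assumptions made for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, Model

N_JOINTS = 13                 # assumed number of annotated joints (as in DHP19)
INPUT_SHAPE = (260, 344, 1)   # assumed accumulated-event frame from one DVS camera

def build_mimo_model():
    # One input tensor per camera view.
    in_a = layers.Input(INPUT_SHAPE, name="cam_a")
    in_b = layers.Input(INPUT_SHAPE, name="cam_b")

    # Fully convolutional backbone shared by the two views (weight sharing is
    # an assumption, not necessarily the authors' design choice).
    backbone = tf.keras.Sequential(
        [
            layers.Conv2D(16, 3, padding="same", activation="relu"),
            layers.Conv2D(16, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(32, 3, padding="same", activation="relu"),
            layers.Conv2D(32, 3, padding="same", activation="relu"),
            layers.UpSampling2D(),
        ],
        name="shared_backbone",
    )

    feat_a = backbone(in_a)
    feat_b = backbone(in_b)

    # One 2D-heatmap head per view: one output channel per joint.
    heat_a = layers.Conv2D(N_JOINTS, 1, name="heatmaps_a")(feat_a)
    heat_b = layers.Conv2D(N_JOINTS, 1, name="heatmaps_b")(feat_b)

    return Model(inputs=[in_a, in_b], outputs=[heat_a, heat_b], name="mimo_pose_cnn")

model = build_mimo_model()
# Regress Gaussian heatmap targets for both views in a single training step,
# so each gradient update uses evidence from both cameras.
model.compile(optimizer="adam", loss="mse")
model.summary()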

Notes

  1. The code to reproduce all results is available at the following link: https://github.com/AlessandroManilii/3D_HumanPoseEstimation_event-based_dataset.


Author information

Corresponding author

Correspondence to Riccardo Rosati.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Manilii, A., Lucarelli, L., Rosati, R., Romeo, L., Mancini, A., Frontoni, E. (2021). 3D Human Pose Estimation Based on Multi-Input Multi-Output Convolutional Neural Network and Event Cameras: A Proof of Concept on the DHP19 Dataset. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12661. Springer, Cham. https://doi.org/10.1007/978-3-030-68763-2_2

  • DOI: https://doi.org/10.1007/978-3-030-68763-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68762-5

  • Online ISBN: 978-3-030-68763-2

  • eBook Packages: Computer Science, Computer Science (R0)
