Real-Time Human Pose Detection and Recognition Using MediaPipe

  • Conference paper
Soft Computing and Signal Processing (ICSCSP 2021)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1413)

Abstract

The significance of human action recognition has increased manifold owing to its wide-scale application in fields such as public security and gaming, and to the introduction of various new technologies. We propose a framework that detects human action under different conditions and viewing angles, enabling the identification of divergent patterns based on different spatiotemporal trajectories. In this paper, we use MediaPipe Holistic, which provides pose, face, and hand landmark detection models. Frames obtained from a real-time device feed using OpenCV are parsed through the MediaPipe Holistic model, which provides a total of 501 landmarks. These landmarks are exported as coordinates to a CSV file, on which we train a custom multi-class classification model that learns the relationship between class and coordinates in order to classify and detect custom body language poses. The machine learning classification algorithms implemented in this paper are random forest, linear regression, ridge classifier, and gradient boosting classifier.
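The CSV export step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that the 501 landmarks are the 33 pose landmarks plus the 468 face landmarks that MediaPipe reports, and that each landmark contributes four values (x, y, z, visibility) to a row; both are assumptions, as are the `Landmark` stand-in class, the helper names, and the example label `pose_a`. No camera or MediaPipe call is made here, so the landmark values are placeholders.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Sequence

# Assumed split: 33 pose + 468 face landmarks = 501, the total quoted in the abstract.
NUM_POSE = 33
NUM_FACE = 468


@dataclass
class Landmark:
    """Hypothetical stand-in for one detected landmark (normalized coordinates)."""
    x: float
    y: float
    z: float
    visibility: float = 0.0


def make_header(num_landmarks: int) -> List[str]:
    """Build the CSV header: a class column, then x/y/z/v columns per landmark."""
    header = ["class"]
    for i in range(1, num_landmarks + 1):
        header += [f"x{i}", f"y{i}", f"z{i}", f"v{i}"]
    return header


def landmarks_to_row(label: str,
                     pose: Sequence[Landmark],
                     face: Sequence[Landmark]) -> list:
    """Flatten one frame's landmarks into a single labelled CSV row."""
    row: list = [label]
    for lm in list(pose) + list(face):
        row += [lm.x, lm.y, lm.z, lm.visibility]
    return row


def write_rows(stream, rows: Sequence[list]) -> None:
    """Write the header followed by one row per captured frame."""
    writer = csv.writer(stream)
    writer.writerow(make_header(NUM_POSE + NUM_FACE))
    writer.writerows(rows)


# Placeholder landmarks standing in for one frame of detector output.
pose = [Landmark(0.5, 0.5, 0.0, 0.9)] * NUM_POSE
face = [Landmark(0.4, 0.3, -0.1)] * NUM_FACE

buffer = io.StringIO()  # an in-memory file; a real run would open a .csv file
write_rows(buffer, [landmarks_to_row("pose_a", pose, face)])
```

Under these assumptions, each row holds one class label followed by 2004 coordinate values. A table of such rows could then be split into features and labels and passed to multi-class classifiers of the kind the abstract lists, for example scikit-learn's `RandomForestClassifier`, `RidgeClassifier`, and `GradientBoostingClassifier`.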

Author information

Corresponding author

Correspondence to Amritanshu Kumar Singh.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Singh, A.K., Kumbhare, V.A., Arthi, K. (2022). Real-Time Human Pose Detection and Recognition Using MediaPipe. In: Reddy, V.S., Prasad, V.K., Wang, J., Reddy, K. (eds) Soft Computing and Signal Processing. ICSCSP 2021. Advances in Intelligent Systems and Computing, vol 1413. Springer, Singapore. https://doi.org/10.1007/978-981-16-7088-6_12
