Camera localization for a human-pose in 3D space using a single 2D human-pose image with landmarks: a multimedia social network emerging demand

  • Mo’taz Al-Hami
  • Rolf Lakaemper
  • Majdi Rawashdeh
  • M. Shamim Hossain

Abstract

Recovering a 3D human-pose in the form of an abstracted skeleton from a 2D image suffers from the loss of depth information. Assuming the projected human-pose is represented by a set of 2D landmarks capturing the human-pose limbs, recovering the original 3D locations is an ill-posed problem. To recover a 3D configuration, camera localization in 3D space plays a major role: an inaccurate camera localization can mislead the recovery process. In this paper, we propose a 3D camera localization model that uses only the human-pose appearance in a 2D image (i.e., the set of 2D landmarks). We apply supervised multi-class logistic regression to assign the camera location in 3D space. In the learning process, we assume a set of predefined, labeled camera locations. The features we train on consist of the relative lengths of limbs and the 2D shape context. The goal is to build a relation between the projected landmarks and the camera location in 3D space. This kind of analysis allows us to reconstruct 3D human-poses from the 2D projection alone, without any predefined camera parameters. It also makes real-time multimedia exchange more reliable, especially for human-pose related tasks. We test our model on a set of real images showing a variety of camera locations.
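
As an illustration of the idea summarized in the abstract, the following is a minimal sketch of classifying a discrete camera location from 2D landmarks with multi-class (softmax) logistic regression. It is not the authors' implementation. The seven-joint skeleton, the three camera azimuths, the orthographic projection, the noise level, the feature standardization, and all function names are assumptions made for this example. Only the relative limb-length feature is used; the 2D shape context feature is omitted for brevity.

```python
import math
import random

# Hypothetical 3D skeleton (x, y, z): head, neck, left hand, right hand,
# hip, left foot, right foot. Limbs are pairs of joint indices.
JOINTS_3D = [(0.0, 1.7, 0.0), (0.0, 1.5, 0.0), (-0.6, 1.4, 0.2), (0.6, 1.1, 0.1),
             (0.0, 0.9, 0.0), (-0.2, 0.0, 0.3), (0.2, 0.0, -0.1)]
LIMBS = [(0, 1), (1, 2), (1, 3), (1, 4), (4, 5), (4, 6)]
# Assumed predefined camera locations: azimuths (degrees) around the subject.
CAMERA_AZIMUTHS = [0.0, 45.0, 90.0]


def project(joints, azimuth_deg, noise, rng):
    """Rotate about the vertical axis, then project orthographically to 2D."""
    a = math.radians(azimuth_deg)
    return [(x * math.cos(a) + z * math.sin(a) + rng.gauss(0, noise),
             y + rng.gauss(0, noise)) for x, y, z in joints]


def relative_limb_lengths(landmarks):
    """Each projected limb length divided by the sum of all limb lengths."""
    lengths = [math.dist(landmarks[i], landmarks[j]) for i, j in LIMBS]
    total = sum(lengths) or 1.0
    return [length / total for length in lengths]


def make_dataset(n_per_class, rng, noise=0.005):
    """Synthetic labeled samples: one class per assumed camera azimuth."""
    X, y = [], []
    for label, azimuth in enumerate(CAMERA_AZIMUTHS):
        for _ in range(n_per_class):
            X.append(relative_limb_lengths(project(JOINTS_3D, azimuth, noise, rng)))
            y.append(label)
    return X, y


def fit_scaler(X):
    """Per-feature mean and standard deviation, used to standardize inputs."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(zip(*X), means)]
    return means, stds


def scale(x, means, stds):
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def class_scores(W, x):
    xb = x + [1.0]  # append a constant 1 for the bias term
    return [sum(w * v for w, v in zip(row, xb)) for row in W]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def train(X, y, n_classes, lr=0.3, epochs=400):
    """Batch gradient descent on the multinomial logistic loss."""
    d = len(X[0]) + 1
    W = [[0.0] * d for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = x + [1.0]
            probs = softmax(class_scores(W, x))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(d):
                    grad[k][j] += err * xb[j]
        for k in range(n_classes):
            for j in range(d):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict(W, x):
    scores = class_scores(W, x)
    return max(range(len(scores)), key=lambda k: scores[k])


rng = random.Random(0)
X_train, y_train = make_dataset(40, rng)
means, stds = fit_scaler(X_train)
W = train([scale(x, means, stds) for x in X_train], y_train, len(CAMERA_AZIMUTHS))

X_test, y_test = make_dataset(20, rng)
hits = sum(predict(W, scale(x, means, stds)) == t for x, t in zip(X_test, y_test))
accuracy = hits / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Dividing by the total limb length makes the feature invariant to image scale, which is one plausible reason to prefer relative over absolute limb lengths. Real images would additionally require a 2D landmark detector and a richer set of camera locations.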

Keywords

Human-pose, Projection, Camera localization, Multimedia, Logistic regression, 2D shape context, 3D reconstruction, Rotation matrix, Translation, Extrinsic camera, Intrinsic camera, Principal component analysis, Features, Projection error

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018
corrected publication November/2018

Authors and Affiliations

  1. Department of Computer Information System, The Hashemite University, Zarqa, Jordan
  2. Department of Computer & Information Sciences, Temple University, Philadelphia, USA
  3. Department of Business Information Technology, Princess Sumaya University for Technology, Amman, Jordan
  4. Research Chair of Pervasive and Mobile Computing, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
  5. Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia