Abstract
We propose Structural Triangulation, a closed-form solution for the optimal 3D human pose given multi-view 2D pose estimates, calibrated camera parameters, and bone lengths. We first focus on embedding the structural constraints of the human body into 2D-to-3D inference by triangulation. Assuming bone lengths are known a priori, we formulate the inference as a constrained optimization problem and, with a suitable approximation, obtain a closed-form solution. We further generalize the method with the Step Constraint Algorithm, which helps convergence when the 2D estimates contain large errors. We evaluate on public datasets (Human3.6M and Total Capture) and on synthesized data. Our method achieves state-of-the-art results on the Human3.6M dataset when bone lengths are known and competitive results when they are not. We also demonstrate the generality and efficiency of the method.
This work has been funded in part by NSFC grant 62176156 and by the Science and Technology Commission of Shanghai Municipality under Grant 20DZ2220400. The code is available at https://github.com/chzh9311/structural-triangulation.
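To make the setting in the abstract concrete, the following is a minimal sketch of the unconstrained baseline that such a method builds on: standard linear (DLT) triangulation of each joint from calibrated views, followed by a naive rescaling of one bone to a known length. This is not the closed-form solution proposed in the paper, which treats the bone-length constraints jointly. The function names, the toy cameras, and the hip/knee values are all assumptions made for illustration.

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one joint from several calibrated views.

    projections: list of 3x4 camera projection matrices
    points_2d:   list of (u, v) pixel coordinates, one per view
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear equations in the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution is the right singular vector associated
    # with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def enforce_bone_length(parent, child, length):
    """Naive stand-in for a structural constraint (assumption, not the paper's
    method): slide the child joint along the parent-to-child direction until
    the bone has the prescribed length."""
    direction = child - parent
    return parent + length * direction / np.linalg.norm(direction)

def project(P, X):
    """Project a 3D point with a 3x4 projection matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: three cameras sharing intrinsics K, shifted along x.
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
cams = [K @ np.hstack([np.eye(3), np.array([[tx], [0.0], [0.0]])])
        for tx in (0.0, -1.0, 1.0)]

# Hypothetical ground-truth joints (metres) and the known bone length.
hip = np.array([0.1, 0.2, 4.0])
knee = np.array([0.1, 0.65, 4.0])
bone = np.linalg.norm(knee - hip)

# Simulate noisy 2D detections (2 px standard deviation) in every view.
rng = np.random.default_rng(0)
def noisy(X):
    return [project(P, X) + rng.normal(0.0, 2.0, 2) for P in cams]

hip_est = triangulate_dlt(cams, noisy(hip))
knee_est = triangulate_dlt(cams, noisy(knee))
knee_fixed = enforce_bone_length(hip_est, knee_est, bone)

print("bone length before:", np.linalg.norm(knee_est - hip_est))
print("bone length after: ", np.linalg.norm(knee_fixed - hip_est))
```

Rescaling bones one at a time, as above, only repairs the length and ignores the reprojection error it introduces. Solving for all joints under the length constraints at once, as the abstract describes, is what motivates a constrained optimization formulation.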
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Chen, Z., Zhao, X., Wan, X. (2022). Structural Triangulation: A Closed-Form Solution to Constrained 3D Human Pose Estimation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13665. Springer, Cham. https://doi.org/10.1007/978-3-031-20065-6_40
DOI: https://doi.org/10.1007/978-3-031-20065-6_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20064-9
Online ISBN: 978-3-031-20065-6
eBook Packages: Computer Science, Computer Science (R0)