Joint Camera Pose Estimation and 3D Human Pose Estimation in a Multi-camera Setup

  • Jens Puwein (email author)
  • Luca Ballan
  • Remo Ziegler
  • Marc Pollefeys
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9004)

Abstract

In this paper, we propose an approach that jointly performs camera pose estimation and human pose estimation from videos recorded by a set of cameras separated by wide baselines. Multi-camera pose estimation is very challenging with wide baselines and, more generally, whenever patch-based feature correspondences are difficult to establish across images.

For this reason, we propose to exploit the motion of an articulated structure in the scene, such as a human, to relate these cameras. More precisely, we first run a part-based human pose estimation for each camera and each frame independently. Correctly detected joints are then used to compute an initial estimate of the epipolar geometry between pairs of cameras. In a combined optimization over all the recorded sequences, the multi-camera configuration and the 3D motion of the kinematic structure in the scene are inferred. The optimization accounts for time continuity, part-based detection scores, optical flow, and body part visibility.
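The role of joint detections as cross-view correspondences can be illustrated with the epipolar constraint x2^T F x1 = 0, which a correctly detected joint should approximately satisfy in a pair of views. The following is a minimal sketch, not the authors' implementation: the function names, the toy fundamental matrix F_TOY (a pure horizontal translation between two identical cameras), the joint coordinates, and the tolerance are all assumptions made for illustration. It shows the converse of the estimation step, namely checking which joint pairs agree with a given F.

```python
# Minimal sketch (hypothetical names and toy data, not the paper's code):
# test which per-view joint detections agree with a given fundamental matrix.

def epipolar_residual(F, p1, p2):
    """Algebraic epipolar residual x2^T F x1 for pixel points p1 and p2 given as (x, y)."""
    x1 = (p1[0], p1[1], 1.0)  # homogeneous coordinates in view 1
    x2 = (p2[0], p2[1], 1.0)  # homogeneous coordinates in view 2
    Fx1 = [sum(F[r][c] * x1[c] for c in range(3)) for r in range(3)]
    return sum(x2[r] * Fx1[r] for r in range(3))


def consistent_joints(F, joints_cam1, joints_cam2, tol=2.0):
    """Names of joints seen in both views whose absolute residual is at most tol."""
    return [
        name
        for name in joints_cam1
        if name in joints_cam2
        and abs(epipolar_residual(F, joints_cam1[name], joints_cam2[name])) <= tol
    ]


# Assumed toy F: cross-product matrix of t = (1, 0, 0) with identity rotation
# and intrinsics. For this F the residual reduces to y1 - y2 (row difference).
F_TOY = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]

# Made-up 2D joint detections in two views; the wrist in view 2 is a misdetection.
joints_cam1 = {"head": (320.0, 100.0), "left_wrist": (250.0, 260.0), "right_ankle": (340.0, 450.0)}
joints_cam2 = {"head": (290.0, 100.5), "left_wrist": (220.0, 180.0), "right_ankle": (310.0, 449.0)}

kept = consistent_joints(F_TOY, joints_cam1, joints_cam2)
print(kept)
```

In this toy configuration the tolerance is a row difference in pixels; for a general F, a geometric measure such as the point-to-epipolar-line distance would be a more meaningful threshold.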

We evaluated our approach on four publicly available datasets, measuring the accuracy of both the camera poses and the human poses.

Keywords

Optical Flow, Joint Position, Camera Calibration, Final Optimization, Bundle Adjustment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

This project is supported by a grant from CTI Switzerland, the 4DVideo ERC Starting Grant Nr. 210806, and the SNF Recording Studio Grant.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Jens Puwein (1), email author
  • Luca Ballan (1)
  • Remo Ziegler (2)
  • Marc Pollefeys (1)
  1. Department of Computer Science, ETH Zurich, Zürich, Switzerland
  2. Vizrt, Zürich, Switzerland
