Machine Vision and Applications, Volume 27, Issue 7, pp 1035–1046

Parsing human skeletons in an operating room

  • Vasileios Belagiannis
  • Xinchao Wang
  • Horesh Beny Ben Shitrit
  • Kiyoshi Hashimoto
  • Ralf Stauder
  • Yoshimitsu Aoki
  • Michael Kranzfelder
  • Armin Schneider
  • Pascal Fua
  • Slobodan Ilic
  • Hubertus Feussner
  • Nassir Navab
Original Paper

Abstract

Multiple human pose estimation is an important yet challenging problem. In an operating room (OR) environment, the 3D body poses of surgeons and medical staff can provide important clues for surgical workflow analysis. For that purpose, we propose an algorithm for localizing and recovering the body poses of multiple humans in an OR environment under a multi-camera setup. Our model builds on 3D Pictorial Structures and 2D body part localization across all camera views, using convolutional neural networks (ConvNets). To evaluate our algorithm, we introduce a dataset captured in a real OR environment. Our dataset is unique, challenging and publicly available with annotated ground truth. Our proposed algorithm yields promising pose estimation results on this dataset.
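
The approach combines per-view 2D body part detections (from ConvNets) with a 3D Pictorial Structures model across calibrated cameras. As a rough illustration of the multi-view geometry underlying such a pipeline, and not the authors' implementation, the sketch below triangulates a single body joint from 2D detections in several calibrated views using linear (DLT) triangulation; the camera matrices, detections, and function names are hypothetical placeholders.

```python
# Minimal sketch (assumptions, not the paper's code): recover a 3D joint
# position from 2D joint detections in multiple calibrated camera views.
import numpy as np

def triangulate_joint(projections, points_2d):
    """Linear (DLT) triangulation of one 3D joint from >= 2 views.

    projections: list of 3x4 camera projection matrices
    points_2d:   list of (u, v) joint detections, one per view
    Returns the 3D point minimizing the algebraic reprojection error.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous 3D point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]            # null-space solution
    return X[:3] / X[3]   # dehomogenize

if __name__ == "__main__":
    # Two toy cameras observing the point (0.5, 0.2, 3.0).
    K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
    X_true = np.array([0.5, 0.2, 3.0, 1.0])
    detections = []
    for P in (P1, P2):
        x = P @ X_true
        detections.append(x[:2] / x[2])   # simulated 2D joint detections
    print(triangulate_joint([P1, P2], detections))  # ~[0.5, 0.2, 3.0]
```

In the paper's setting, such 3D joint hypotheses would be generated for every body part and every person, and the 3D Pictorial Structures model would then resolve which hypotheses belong to which skeleton.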

Keywords

Human pose estimation · Part-based model · Medical workflow analysis

Acknowledgments

This work was supported in part by the Swiss National Science Foundation and by DFG - Deutsche Forschungsgemeinschaft under the project “Advanced Learning for Tracking and Detection in Medical Workflow Analysis”. The authors would like to thank Iro Laina for helping with the data preparation.

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Vasileios Belagiannis (1, 2)
  • Xinchao Wang (3)
  • Horesh Beny Ben Shitrit (3)
  • Kiyoshi Hashimoto (4)
  • Ralf Stauder (1)
  • Yoshimitsu Aoki (4)
  • Michael Kranzfelder (5)
  • Armin Schneider (5)
  • Pascal Fua (3)
  • Slobodan Ilic (1, 6)
  • Hubertus Feussner (5)
  • Nassir Navab (1, 7)

  1. Computer Aided Medical Procedures, Technische Universität München, Munich, Germany
  2. VGG, University of Oxford, Oxford, UK
  3. CVLAB, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  4. Aoki Media Sensing Lab, Keio University, Tokyo, Japan
  5. MITI, Klinikum rechts der Isar, Technische Universität München, Munich, Germany
  6. Siemens AG, Munich, Germany
  7. Johns Hopkins University, Baltimore, USA