Real-Time Human Body Pose Estimation for In-Car Depth Images
Abstract
Over the next few years, the number of autonomous vehicles is expected to increase. This shift will change the role of the driver inside the car, so continuous monitoring of the driver and passengers becomes essential for safety. Such monitoring can be achieved by detecting the human body pose inside the car to understand the occupants' activity. This paper proposes a method to accurately detect the human body pose in depth images acquired inside a car with a time-of-flight camera. The method is a deep learning strategy built on a convolutional neural network with three branches: the first estimates the confidence maps for each joint position, the second associates different body parts, and the third detects the presence of each joint in the image. The proposed framework was trained on 8820 depth images and tested on 1650. The method proved accurate, achieving an average distance error between the detected joints and the ground truth of 7.6 pixels and an average accuracy, precision, and recall of 95.6%, 96.0%, and 97.8%, respectively. Overall, these results demonstrate the robustness of the method and its potential for in-car body pose monitoring.
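The abstract does not state how the reported metrics are computed. As a rough illustration only, the sketch below assumes that the distance error is the mean Euclidean distance in pixels between each detected joint and its ground-truth position, and that accuracy, precision, and recall refer to the binary per-joint presence decision of the third branch. The function names and the toy values are hypothetical and are not taken from the paper.

```python
import math


def mean_joint_distance(pred, truth):
    """Mean Euclidean distance (pixels) between paired (x, y) joints.

    Assumption: pred[i] and truth[i] refer to the same joint.
    """
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists)


def presence_metrics(pred_present, true_present):
    """Accuracy, precision, and recall for binary joint-presence labels.

    Assumption: one boolean per joint, True meaning "joint is in the image".
    """
    pairs = list(zip(pred_present, true_present))
    tp = sum(1 for p, t in pairs if p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy data (made up for illustration, not from the paper's dataset).
pred_joints = [(10, 10), (20, 24)]
true_joints = [(13, 14), (20, 20)]
error_px = mean_joint_distance(pred_joints, true_joints)

pred_flags = [True, True, False, True]
true_flags = [True, False, False, True]
accuracy, precision, recall = presence_metrics(pred_flags, true_flags)
```

With these toy inputs the two joint distances are 5 and 4 pixels, so the mean error is 4.5 pixels. The presence labels give one false positive and no false negatives.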
Keywords
Autonomous driving, Deep learning, Depth images, Pose estimation
Acknowledgements
This work is supported by European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 002797; Funding Reference: POCI-01-0247-FEDER-002797].