Fine-Grained Head Pose Estimation Based on a 6D Rotation Representation with Multiregression Loss

Chen, Jin; Xu, Huahu; Bian, Minjie; Shi, Jiangang; Huang, Yuzhe; Cheng, Chen

doi:10.1007/978-3-031-24386-8_13

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 461)

Included in the following conference series:

International Conference on Collaborative Computing: Networking, Applications and Worksharing

Abstract

Estimating head pose is vital in action evaluation, with extensive applications such as automobile driver-assistance systems, performance evaluation of athletes, and analysis of customers’ attention in retail stores. However, accurately predicting head orientation from a single RGB image with deep learning remains difficult. We propose 6DHPENet, a fine-grained 6D head pose estimation network that estimates the 3D rotation of the head. First, the model adopts a 6D representation of 3D rotations as its training objective to guarantee effective learning; the 6D rotation representation is a continuous, one-to-one mapping for 3D rotations. Second, because obtaining 3D facial landmarks during real-time activities is time-consuming and limited to frontal views, we dispense with 3D facial landmarks to improve adaptability and generalization across application scenarios. Third, a squeeze-and-excitation module is introduced after the last convolutional feature-extraction layer to combine local spatial and global channel-wise facial feature information by explicitly modeling the interdependencies between feature channels. Finally, a multiregression loss function is presented to improve the accuracy and stability of head pose estimation over the full range of views. In addition, the lightweight CNN backbone makes our method compact and efficient enough for mobile devices. Quantitative results for models trained on the 300W-LP dataset and evaluated on the AFLW2000 and BIWI datasets show the superior performance of our 6D rotation representation-based multiregression fine-grained method.
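Two building blocks named in the abstract are standard enough to sketch. In the 6D rotation representation of Zhou et al. (CVPR 2019), a network outputs two 3D vectors that Gram-Schmidt orthogonalization turns into a valid rotation matrix; the squeeze-and-excitation module of Hu et al. reweights feature channels using globally pooled statistics. The NumPy sketch below is illustrative only and is not the authors' code. Several details are assumptions: the column-wise 6D layout; the geodesic angle as a rotation-matrix error, which is a common choice, whereas the chapter's multiregression loss is not specified in the abstract; and the Z-Y-X Euler decomposition used to report angles, where which axis counts as yaw, pitch, or roll depends on the dataset convention. All function names are hypothetical.

```python
import numpy as np


def rotation_from_6d(x6):
    """Map a 6D vector (two stacked 3D columns) to a 3x3 rotation matrix.

    Gram-Schmidt construction in the style of Zhou et al. (CVPR 2019); an
    illustrative sketch that assumes a1 != 0 and a2 is not parallel to a1.
    """
    x6 = np.asarray(x6, dtype=float)
    a1, a2 = x6[:3], x6[3:6]
    b1 = a1 / np.linalg.norm(a1)          # first column: normalized a1
    u2 = a2 - np.dot(b1, a2) * b1         # remove the b1 component from a2
    b2 = u2 / np.linalg.norm(u2)          # second column
    b3 = np.cross(b1, b2)                 # third column completes a right-handed frame
    return np.stack([b1, b2, b3], axis=1)


def geodesic_distance(r_pred, r_true):
    """Angle in radians of the relative rotation r_pred^T r_true (assumed error metric)."""
    cos = (np.trace(r_pred.T @ r_true) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards against rounding error


def euler_zyx_to_matrix(x, y, z):
    """Build R = Rz(z) @ Ry(y) @ Rx(x) from angles in radians (assumed convention)."""
    cx, sx = np.cos(x), np.sin(x)
    cy, sy = np.cos(y), np.sin(y)
    cz, sz = np.cos(z), np.sin(z)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx


def matrix_to_euler_zyx(r):
    """Recover (x, y, z) from R = Rz(z) @ Ry(y) @ Rx(x).

    Degenerate (gimbal lock) when |y| approaches 90 degrees; not handled here.
    """
    cos_y = np.hypot(r[0, 0], r[1, 0])
    x = np.arctan2(r[2, 1], r[2, 2])
    y = np.arctan2(-r[2, 0], cos_y)
    z = np.arctan2(r[1, 0], r[0, 0])
    return float(x), float(y), float(z)


def squeeze_excitation(feat, w1, b1, w2, b2):
    """Squeeze-and-excitation forward pass on a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r) for a reduction ratio r.
    """
    z = feat.mean(axis=(1, 2))                  # squeeze: global average pooling -> (C,)
    h = np.maximum(0.0, w1 @ z + b1)            # excitation: FC + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))    # FC + sigmoid gates in (0, 1) -> (C,)
    return feat * s[:, None, None]              # rescale each channel by its gate
```

Because any two non-parallel 3D vectors map to an orthonormal matrix with determinant +1, a regression head can emit six unconstrained numbers and always yield a valid rotation. Euler angles are then needed only for reporting per-angle errors on benchmarks such as AFLW2000 and BIWI.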

Author information

Authors and Affiliations

School of Computer Engineering and Science, Shanghai University, Shanghai, China
Jin Chen, Huahu Xu, Minjie Bian, Yuzhe Huang & Chen Cheng
State Grid Shanghai Municipal Electric Power Company, Shanghai, China
Minjie Bian
Shanghai Shangda Hairun Information System Co., Ltd., Shanghai, China
Jiangang Shi

Corresponding author

Correspondence to Huahu Xu.

Editor information

Editors and Affiliations

Shanghai University, Shanghai, China
Honghao Gao
Xi’an Jiaotong-Liverpool University, Suzhou, China
Xinheng Wang
Zhejiang University City College, Hangzhou, China
Wei Wei
London South Bank University, London, UK
Tasos Dagiuklas

About this paper

Cite this paper

Chen, J., Xu, H., Bian, M., Shi, J., Huang, Y., Cheng, C. (2022). Fine-Grained Head Pose Estimation Based on a 6D Rotation Representation with Multiregression Loss. In: Gao, H., Wang, X., Wei, W., Dagiuklas, T. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 461. Springer, Cham. https://doi.org/10.1007/978-3-031-24386-8_13

DOI: https://doi.org/10.1007/978-3-031-24386-8_13
Published: 25 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24385-1
Online ISBN: 978-3-031-24386-8
eBook Packages: Computer Science, Computer Science (R0)
