Fine-Grained Head Pose Estimation Based on a 6D Rotation Representation with Multiregression Loss

  • Conference paper
Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2022)

Abstract

Head pose estimation is vital in action evaluation, with applications in automobile driver-assistance systems, performance evaluation of athletes, and measuring customers' attention in retail stores. Accurately predicting head orientation from an RGB image with deep learning remains difficult. We propose 6DHPENet, a fine-grained 6D head pose estimation network, to estimate the 3D rotation of the head. First, the model adopts a 6D rotation representation of 3D rotations as the training objective to guarantee effective learning; this representation is a continuous, one-to-one mapping for 3D rotations. Second, because obtaining 3D facial landmarks from real-time activity is time-consuming and largely restricted to frontal views, we drop the 3D facial landmarks to improve adaptability and generalization across application scenes. Third, after the last convolutional feature-extraction layer, a squeeze-and-excitation module is introduced to capture both local spatial and global channel-wise facial feature information by explicitly modeling the interdependencies between feature channels. Finally, a multiregression loss function is presented to improve the accuracy and stability of head pose estimation over the full range of views. In addition, the lightweight CNN backbone makes our method compact and efficient enough for mobile devices. Quantitative experiments with models trained on the 300W-LP dataset show the superior performance of our 6D rotation representation-based multiregression fine-grained method on the AFLW2000 and BIWI datasets.
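
To make the 6D rotation representation mentioned in the abstract concrete, the following is a minimal sketch of the Gram-Schmidt construction commonly used for it: two 3D vectors are orthonormalized into the first two columns of a rotation matrix, and the third column is their cross product. The function names (`rotation_from_6d`, `six_d_from_rotation`, `geodesic_angle`) are hypothetical and chosen for illustration; this is not the 6DHPENet implementation, and the network, the squeeze-and-excitation module, and the multiregression loss are not reproduced here.

```python
import math


def _normalize(v):
    """Return v scaled to unit length; reject the zero vector."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / n for c in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


def rotation_from_6d(x):
    """Map a 6D vector (a1, a2) to a 3x3 rotation matrix (row-major lists).

    Assumes a1 is nonzero and a2 is not parallel to a1.
    """
    a1, a2 = list(x[:3]), list(x[3:6])
    b1 = _normalize(a1)                       # first column
    proj = _dot(b1, a2)
    b2 = _normalize([a - proj * b for a, b in zip(a2, b1)])  # second column
    b3 = _cross(b1, b2)                       # third column, right-handed
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


def six_d_from_rotation(R):
    """Inverse direction: keep the first two columns of R."""
    return [R[i][0] for i in range(3)] + [R[i][1] for i in range(3)]


def geodesic_angle(R1, R2):
    """Angle in radians of the relative rotation R1^T R2."""
    trace = sum(R1[i][j] * R2[i][j] for i in range(3) for j in range(3))
    cos = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding
    return math.acos(cos)
```

For instance, `rotation_from_6d([0, 2, 0, -3, 0, 0])` gives a 90-degree rotation about the z-axis even though the inputs are not unit length, because the construction normalizes them. Since any such pair of vectors maps to a valid rotation, a network can regress the six numbers directly without an explicit orthogonality constraint.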

Author information

Corresponding author

Correspondence to Huahu Xu.

Copyright information

© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Chen, J., Xu, H., Bian, M., Shi, J., Huang, Y., Cheng, C. (2022). Fine-Grained Head Pose Estimation Based on a 6D Rotation Representation with Multiregression Loss. In: Gao, H., Wang, X., Wei, W., Dagiuklas, T. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 461. Springer, Cham. https://doi.org/10.1007/978-3-031-24386-8_13

  • DOI: https://doi.org/10.1007/978-3-031-24386-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24385-1

  • Online ISBN: 978-3-031-24386-8

  • eBook Packages: Computer Science (R0)
