Lightweight head pose estimation without keypoints based on multi-scale lightweight neural network

Chen, Xiaolei; Lu, Yubing; Cao, Baoning; Lin, Dongmei; Ahmad, Ishfaq

doi:10.1007/s00371-023-02781-6

Original article
Published: 02 February 2023

Volume 39, pages 2455–2469, (2023)

The Visual Computer

Xiaolei Chen ORCID: orcid.org/0000-0001-9060-5369¹,
Yubing Lu¹,
Baoning Cao¹,
Dongmei Lin¹ &
…
Ishfaq Ahmad²

Abstract

Head pose estimation methods that do not rely on facial keypoints have emerged as a promising research direction. However, several challenges remain unsolved: current methods incur a high computational cost, require large amounts of memory, and are difficult to deploy in practical applications. To overcome these issues, we propose a lightweight, high-precision head pose estimation method based on a dual-stream convolutional neural network. The network comprises a dual-stream lightweight backbone, an external attention module, and a soft stagewise regression (SSR) module. The dual-stream lightweight backbone extracts features from the original image more effectively while keeping computational overhead low. The external attention module enhances the feature maps extracted by the backbone and improves feature attention. The SSR module calculates the probability of the head in each direction and predicts the head pose by regression. Extensive experiments on the Annotated Facial Landmarks in the Wild (AFLW2000) and Biwi Kinect Head Pose Database (BIWI) datasets demonstrate that the proposed model has fewer parameters and lower estimation errors than recent state-of-the-art head pose estimation methods.
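
The way an SSR-style module turns per-bin probabilities into a continuous angle can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified coarse-to-fine scheme in which each stage subdivides the bins of the previous stage, and it omits the learned bin shifts and scale factors that a full SSR module may use. The function names, the number of stages, the bin counts, and the 180-degree range are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Convert raw bin scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_stagewise_regression(stage_logits, value_range):
    """Simplified coarse-to-fine regression over K stages (assumption:
    no learned bin shifts or scale factors).

    stage_logits: list of K lists; stage k holds one score per bin.
    value_range:  width V of the target interval [0, V].
    Returns the sum over stages of the expected bin offset.
    """
    estimate = 0.0
    bin_width = value_range
    for logits in stage_logits:
        probs = softmax(logits)
        # Each stage splits the previous stage's bin into len(logits) bins.
        bin_width /= len(logits)
        # Expected offset contributed by this stage (left bin edges).
        estimate += sum(p * i * bin_width for i, p in enumerate(probs))
    return estimate


if __name__ == "__main__":
    # Hypothetical scores for two stages with three bins each.
    yaw_logits = [[0.2, 2.5, 0.1], [1.8, 0.3, 0.4]]
    # Shift from [0, 180] to a symmetric range around zero (assumed range).
    yaw = soft_stagewise_regression(yaw_logits, value_range=180.0) - 90.0
    print(f"estimated yaw: {yaw:.1f} degrees")
```

Under these assumptions, one such regression would be run per Euler angle (yaw, pitch, roll). Because the output is an expectation over bins, the prediction stays continuous even though each stage only classifies into a few coarse directions.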

Data availability statement

The data that support the findings of this study are openly available in public repositories: [300W-LP: http://www.cbsr.ia.ac.cn/users/xiangyuzhu/projects/3ddfa/main.htm], [AFLW2000: http://www.cbsr.ia.ac.cn/users/xiangyuzhu/projects/3DDFA/main.htm], [BIWI dataset: https://data.vision.ee.ethz.ch/cvl/gfanelli/head_pose/head_forest.html#db].

Acknowledgements

This study was supported by the National Natural Science Foundation of China (Nos. 61967012, 61866022, and 61861027) and the Science and Technology Program of Gansu Province (Grant No. 20JR5RA459).

Author information

Authors and Affiliations

College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou, 730000, China
Xiaolei Chen, Yubing Lu, Baoning Cao & Dongmei Lin
Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, USA
Ishfaq Ahmad

Corresponding author

Correspondence to Xiaolei Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest related to this work and no commercial or associative interest that represents a conflict of interest in connection with the submitted work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, X., Lu, Y., Cao, B. et al. Lightweight head pose estimation without keypoints based on multi-scale lightweight neural network. Vis Comput 39, 2455–2469 (2023). https://doi.org/10.1007/s00371-023-02781-6

Accepted: 11 January 2023
Published: 02 February 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s00371-023-02781-6
