Multiple Teacher Knowledge Distillation for Head Pose Estimation Without Keypoints

Thai, Chien; Nham, Ninh; Tran, Viet; Bui, Minh; Ninh, Huong; Tran, Hai

doi:10.1007/s42979-023-02233-x

Original Research
Published: 29 September 2023

SN Computer Science, Volume 4, article number 758 (2023)

Chien Thai¹,
Ninh Nham¹,
Viet Tran²,
Minh Bui²,
Huong Ninh² &
Hai Tran²

Abstract

In recent years, human head pose estimation has played a significant role in facial analysis, with practical applications such as gaze estimation, virtual reality, and driver assistance. Given its importance, in this paper we propose a lightweight model for the task of head pose estimation. First, the teacher models are trained on the synthetic dataset 300W-LPA to obtain head pose pseudo labels; an architecture with a ResNet18 backbone is then adopted and trained on the ensemble of these pseudo labels via knowledge distillation. The real-world head pose datasets AFLW-2000 and BIWI are used to evaluate the efficacy of our proposed approach. Experimental results show that our approach significantly improves test accuracy compared with other state-of-the-art head pose estimation methods. Furthermore, our model runs in real time at approximately 300 FPS when inferring on a Tesla V100. Source code and pre-trained weights are available at github.com/chientv99/headpose.
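The two-stage idea in the abstract (several teachers produce pseudo labels, and a smaller model is trained on their ensemble) can be illustrated with a minimal sketch. The abstract does not say how the pseudo labels are ensembled or which loss is used, so the per-angle averaging and the mean squared error below are assumptions made for illustration, and the function names `ensemble_pseudo_label` and `distillation_loss` are hypothetical rather than taken from the authors' repository.

```python
from statistics import fmean
from typing import List, Tuple

# A head pose as Euler angles in degrees: (yaw, pitch, roll).
Pose = Tuple[float, float, float]


def ensemble_pseudo_label(teacher_preds: List[Pose]) -> Pose:
    """Combine several teachers' predictions for one image into a pseudo label.

    Assumption: the ensemble is a plain per-angle average.
    """
    if not teacher_preds:
        raise ValueError("need at least one teacher prediction")
    yaws, pitches, rolls = zip(*teacher_preds)
    return (fmean(yaws), fmean(pitches), fmean(rolls))


def distillation_loss(student_pred: Pose, pseudo_label: Pose) -> float:
    """Mean squared error between the student's output and the pseudo label.

    Assumption: MSE stands in for whatever loss the paper actually uses.
    """
    return fmean((s - t) ** 2 for s, t in zip(student_pred, pseudo_label))


# Three hypothetical teachers predict the pose of the same face image.
teachers = [(10.0, -5.0, 2.0), (12.0, -7.0, 4.0), (14.0, -6.0, 3.0)]
label = ensemble_pseudo_label(teachers)

# The student's prediction is penalised by its distance to the ensemble.
loss = distillation_loss((11.0, -6.0, 5.0), label)
```

Plain averaging ignores angle wraparound, so this sketch is only reasonable for poses well away from ±180°. In a real training loop the loss would be computed on batches and backpropagated through the ResNet18-based model.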

Availability of data and materials

The modified head pose dataset will be made available at https://drive.google.com/drive/folders/11N3O-eONLXGRrr_x9PJRjBQVibiK32dO?usp=sharing.

Code availability

All source code for the head pose estimation method is available at https://github.com/chientv99/headpose.

Funding

Not applicable.

Author information

Authors and Affiliations

Hanoi University of Science and Technology, Hanoi, 10000, Vietnam
Chien Thai & Ninh Nham
Viettel Aerospace Institute, Viettel Group, Hanoi, 10000, Vietnam
Viet Tran, Minh Bui, Huong Ninh & Hai Tran

Corresponding author

Correspondence to Chien Thai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances on Pattern Recognition Applications and Methods 2022” guest edited by Ana Fred, Maria De Marsico and Gabriella Sanniti di Baja.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Thai, C., Nham, N., Tran, V. et al. Multiple Teacher Knowledge Distillation for Head Pose Estimation Without Keypoints. SN COMPUT. SCI. 4, 758 (2023). https://doi.org/10.1007/s42979-023-02233-x

Received: 19 May 2022
Accepted: 28 July 2023
Published: 29 September 2023
DOI: https://doi.org/10.1007/s42979-023-02233-x
