
Adaptive occlusion hybrid second-order attention network for head pose estimation

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Head pose estimation (HPE) is a challenging and important research subject with a wide range of applications, including driver monitoring, attention recognition, and human-computer interaction. Two problems remain, however. First, occlusion is very common in real application scenarios and substantially degrades HPE accuracy. Second, most existing work represents the head pose with Euler angles, which may cause problems in neural network optimization. To address these problems, we propose an adaptive occlusion hybrid second-order attention network. First, an occlusion-aware module detects facial landmarks and generates heat maps that reflect the presence or absence of occlusion in specific facial parts, thereby enhancing features in the non-occluded parts of the face and suppressing features in the occluded regions. In addition, we design a novel second-order information attention module that lets spatial and channel information interact through second-order statistics, so that the model learns the feature correlations between different facial parts while paying more attention to important channels and suppressing redundant ones; this further reduces the effect of occlusion and extracts more powerful features. Furthermore, to avoid the ambiguity of common head pose representations, we introduce an exponential map to represent the head pose and design a prediction framework capable of capturing the geometry of the pose space. Experiments show that the proposed model is competitive on the BIWI dataset with methods that use depth information and achieves clear advantages on the challenging AFLW2000 dataset, with more robust performance than other models under large poses and occlusion.
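
The exponential-map representation mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `hat`, `exp_map`, and `log_map` are hypothetical, and the code only assumes the standard Rodrigues formula, which maps a three-dimensional rotation vector (axis times angle) to a rotation matrix.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ v equals np.cross(w, v)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])

def exp_map(w, eps=1e-8):
    """Map a rotation vector w (axis * angle, in radians) to a 3x3 rotation matrix."""
    w = np.asarray(w, dtype=float)
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < eps:
        # First-order expansion near the identity avoids dividing by ~0.
        return np.eye(3) + K
    a = np.sin(theta) / theta
    b = (1.0 - np.cos(theta)) / theta**2
    # Rodrigues formula: R = I + (sin t / t) K + ((1 - cos t) / t^2) K^2
    return np.eye(3) + a * K + b * (K @ K)

def log_map(R, eps=1e-8):
    """Inverse of exp_map for rotation angles in [0, pi); ill-conditioned near pi."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # Vector part of R - R^T, equal to 2 * sin(theta) * axis.
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < eps:
        return 0.5 * v
    return theta / (2.0 * np.sin(theta)) * v

# Usage: a network could regress the 3-vector w and recover the rotation from it.
w = np.array([0.1, -0.4, 0.25])
R = exp_map(w)
w_back = log_map(R)
```

In this sketch a pose is a single unconstrained 3-vector, so a regression target needs no orthogonality or unit-norm constraint; how the paper's prediction framework actually uses the map is not described in the abstract.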


Data availability statement

The original datasets are available from [24] and [54]. Our code will not be made publicly available, because we may use it to build a commercial application.

References

  1. Murphy-Chutorian E, Trivedi MM (2009) Head pose estimation in computer vision: a survey. IEEE Trans Pattern Anal Mach Intell 31(4):607–626. https://doi.org/10.1109/TPAMI.2008.106


  2. Wang K, Zhao R, Ji Q (2018) Human Computer Interaction with Head Pose, Eye Gaze and Body Gestures. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp 789-789. https://doi.org/10.1109/FG.2018.00126

  3. Li Y, Li J, Jiang X et al (2019) A Driving Attention Detection Method Based on Head Pose. In: 2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation(SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp 483-490. https://doi.org/10.1109/SmartWorld-UIC-ATC-SCALCOM-IOP-SCI.2019.00124

  4. Bosch N, Dmello SK (2021) Automatic detection of mind wandering from video in the lab and in the classroom. IEEE Trans Affect Comput 12(4):974–988. https://doi.org/10.1109/TAFFC.2019.2908837


  5. Zhuang Z, Tao H, Chen Y et al (2022) An Optimal Iterative Learning Control Approach for Linear Systems With Nonuniform Trial Lengths Under Input Constraints. IEEE Trans on Syst, Man, and Cybern: Syst 1–13. https://doi.org/10.1109/TSMC.2022.3225381

  6. Zhuang Z, Tao H, Chen Y et al (2022) Iterative learning control for repetitive tasks with randomly varying trial lengths using successive projection. Int J Adapt Control Signal Process 36(5):1196–1215. https://doi.org/10.1002/acs.3396


  7. Stojanovic V, Nedic N (2016) Robust Kalman filtering for nonlinear multivariable stochastic systems in the presence of non-Gaussian noise. Int J of Robust and Nonlinear Control 26(3):445–460. https://doi.org/10.1002/rnc.3319


  8. Banan A, Nasiri A, Taheri-Garavand A (2020) Deep learning-based appearance features extraction for automated carp species identification. Aquac Eng 89:102053. https://doi.org/10.1016/j.aquaeng.2020.102053


  9. Chen C, Zhang Q, Kashani MH et al (2022) Forecast of rainfall distribution based on fixed sliding window long short-term memory. Eng Appl of Comput Fluid Mech 16(1):248–261. https://doi.org/10.1080/19942060.2021.2009374


  10. Afan HA, Ibrahem Ahmed Osman A, Essam Y et al (2021) Modeling the fluctuations of groundwater level by employing ensemble deep learning techniques. Eng Appl of Comput Fluid Mech 15(1):1420–1439. https://doi.org/10.1080/19942060.2021.1974093


  11. Chen W, Sharifrazi D, Liang G et al (2022) Accurate discharge coefficient prediction of streamlined weirs by coupling linear regression and deep convolutional gated recurrent unit. Eng Appl of Comput Fluid Mech 16(1):965–976. https://doi.org/10.1080/19942060.2022.2053786


  12. Wang W, Du Y, Chau K et al (2021) An ensemble hybrid forecasting model for annual runoff based on sample entropy, secondary decomposition, and long short-term memory neural network. Water Resour Manag 35:4695–4726. https://doi.org/10.1007/S11269-021-02920-5


  13. Lepetit V, Fua P (2005) Monocular Model-Based 3D Tracking of Rigid Objects: A Survey. Found Trends Comput Graph Vis 1(1):1–89. https://doi.org/10.1561/0600000001


  14. Gao S, Wang J, Lu H et al (2020) Pose-Guided Visible Part Matching for Occluded Person Reid. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 11741-11749. https://doi.org/10.1109/CVPR42600.2020.01176

  15. Dai T, Cai J, Zhang Y et al (2019) Second-Order Attention Network for Single Image Super-Resolution. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 11057-11066. https://doi.org/10.1109/CVPR.2019.01132

  16. Hall BC (2003) Lie Algebras and the Exponential Mapping. In: Lie Groups, Lie Algebras, and Representations, pp 27-62. https://doi.org/10.1007/978-0-387-21554-9_2

  17. Abate AF, Bisogni C, Castiglione A et al (2022) Head pose estimation: An extensive survey on recent techniques and applications. Pattern Recognit 127:108591. https://doi.org/10.1016/j.patcog.2022.108591


  18. Dong X, Yu S, Weng X et al (2018) Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 360-368. https://doi.org/10.1109/CVPR.2018.00045

  19. Dong X, Yu S, Weng X et al (2021) Supervision by Registration and Triangulation for Landmark Detection. IEEE Trans Pattern Anal Mach Intell 43(10):3681–3694. https://doi.org/10.1109/TPAMI.2020.2983935


  20. Ranjan R, Patel VM, Chellappa R (2019) Hyperface: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition. IEEE Trans Pattern Anal Mach Intell 41(1):121–135. https://doi.org/10.1109/TPAMI.2017.2781233


  21. Kumar A, Alavi A, Chellappa R (2017) Kepler: Keypoint and Pose Estimation of Unconstrained Faces by Learning Efficient H-CNN Regressors. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pp 258-265. https://doi.org/10.1109/FG.2017.149

  22. Bulat A, Tzimiropoulos G (2017) How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks). In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 1021-1030. https://doi.org/10.1109/ICCV.2017.116

  23. Sun Y, Wang X-G, Tang X (2013) Deep Convolutional Network Cascade for Facial Point Detection. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp 3476-3483. https://doi.org/10.1109/CVPR.2013.446

  24. Zhu X, Lei Z, Liu X et al (2016) Face Alignment Across Large Poses: A 3D Solution. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 146-155. https://doi.org/10.1109/CVPR.2016.23

  25. Guo J, Zhu X, Yang Y et al (2020) Towards Fast, Accurate and Stable 3D Dense Face Alignment. In: Vedaldi A, Bischof H, Brox T, Frahm JM (eds) Computer Vision - ECCV 2020, Lecture Notes in Computer Science. Springer, Cham, pp 152-168. https://doi.org/10.1007/978-3-030-58529-7_10

  26. Ruiz N, Chong E, Rehg JM (2018) Fine-Grained Head Pose Estimation Without Keypoints. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 2074-2083. https://doi.org/10.1109/CVPRW.2018.00281

  27. Yang TY, Chen YT, Lin YY et al (2019) FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation From a Single Image. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 1087-1096. https://doi.org/10.1109/CVPR.2019.00118

  28. Zhang H, Wang M, Liu Y et al (2020) FDN: Feature Decoupling Network for Head Pose Estimation. In: Proceedings of the AAAI Conference on Artificial Intelligence 34(07):12789-12796. https://doi.org/10.1609/aaai.v34i07.6974

  29. Dhingra N (2022) LwPosr: Lightweight Efficient Fine Grained Head Pose Estimation. In: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp 1495-1505. https://doi.org/10.1109/WACV51458.2022.00127

  30. Dhingra N (2021) HeadPosr: End-to-end Trainable Head Pose Estimation using Transformer Encoders. In: 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), pp 1-8. https://doi.org/10.1109/FG52635.2021.9667080

  31. Xu Y-Q, Jung C, Chang Y (2021) Head pose estimation using deep neural networks and 3D point clouds. Pattern Recognit 121:108210. https://doi.org/10.1016/j.patcog.2021.108210


  32. Hu Z, Zhang Y, Xing Y et al (2022) Toward Human-Centered Automated Driving: A Novel Spatiotemporal Vision Transformer-Enabled Head Tracker. IEEE Veh Technol Mag 2–9. https://doi.org/10.1109/MVT.2021.3140047

  33. Cao Z, Chu Z, Liu D et al (2021) A Vector-based Representation to Enhance Head Pose Estimation. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp 1188-1197. https://doi.org/10.1109/WACV48630.2021.00123

  34. Liu H, Fang S, Zhang Z et al (2021) MFDNet: Collaborative poses perception and matrix Fisher distribution for head pose estimation. IEEE Trans Multimed 24:2449–2460. https://doi.org/10.1109/TMM.2021.3081873


  35. Hsu H-W, Wu T-Y, Wan S et al (2019) Quatnet: Quaternion-Based Head Pose Estimation with Multiregression Loss. IEEE Trans Multimed 21(4):1035–1046. https://doi.org/10.1109/TMM.2018.2866770


  36. Tay NC, Tee C, Ong TS, Teh PS (2019) Abnormal Behavior Recognition using CNN-LSTM with Attention Mechanism. In: 2019 1st International Conference on Electrical, Control and Instrumentation Engineering (ICECIE), pp 1-5. https://doi.org/10.1109/ICECIE47765.2019.8974824

  37. Wang K, Liu M (2022) YOLOv3-MT: A YOLOv3 using multi-target tracking for vehicle visual detection. Appl Intell 52(2):2070–2091. https://doi.org/10.1007/s10489-021-02491-3


  38. Li YX, Wu XR, Li C (2022) A hierarchical conditional random field-based attention mechanism approach for gastric histopathology image classification. Appl Intell 52(9):9717–9738. https://doi.org/10.1007/s10489-021-02886-2

  39. Ding ZR (2022) GLPose: Global-Local Attention Network with Feature Interpolation Regularization for Head Pose Estimation of People Wearing Facial Masks. In: 33rd British Machine Vision Conference 2022

  40. Zhu X, Yang Q, Zhao L et al (2022) An Improved Tiered Head Pose Estimation Network with Self-Adjust Loss Function. Entropy 24(7):974. https://doi.org/10.3390/e24070974


  41. Li YK, Yu YZ, Liu YL et al (2022) MS-GCN: Multi-Stream Graph Convolution Network for Driver Head Pose Estimation. In: 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), pp 3819-3824. https://doi.org/10.1109/ITSC55140.2022.9922277

  42. Li Y, Zeng JB, Shan SG, Chen XL (2019) Occlusion aware facial expression recognition using cnn with attention mechanism. IEEE Trans Image Process 28:2439-2450. https://doi.org/10.1109/TIP.2018.2886767

  43. Hu J, Shen L, Sun G et al (2020) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell 42(8):2011–2023. https://doi.org/10.1109/TPAMI.2019.2913372


  44. Woo S, Park J, Lee JY et al (2018) Cbam: Convolutional block attention module. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision - ECCV 2018, Lecture Notes in Computer Science. Springer Cham, pp 3-19. https://doi.org/10.1007/978-3-030-01234-2_1

  45. Liu H, Nie H, Zhang Z et al (2021) Anisotropic angle distribution learning for head pose estimation and attention understanding in human-computer interaction. Neurocomputing 433:310–322. https://doi.org/10.1016/j.neucom.2020.09.068


  46. Liu T, Wang J, Yang B et al (2021) NGDNet: Nonuniform Gaussian-label distribution learning for infrared head pose estimation and on-task behavior understanding in the classroom. Neurocomputing 436:210–220. https://doi.org/10.1016/j.neucom.2020.12.090

  47. Xu LH, Chen JY, Gan YL (2019) Head pose estimation with soft labels using regularized convolutional neural network. Neurocomputing 337:339–353. https://doi.org/10.1016/j.neucom.2018.12.074


  48. Lee T (2018) Bayesian attitude estimation with the matrix fisher distribution on SO(3). IEEE Trans Autom Control 63(10):3377–3392. https://doi.org/10.1109/TAC.2018.2797162


  49. He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770-778. https://doi.org/10.1109/CVPR.2016.90

  50. Dong X, Yan Y, Ouyang W et al (2018) Style aggregated network for facial landmark detection. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 379-388. https://doi.org/10.1109/CVPR.2018.00047

  51. Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 13708-13717. https://doi.org/10.1109/CVPR46437.2021.01350

  52. Murray RM, Li Z, Sastry SS. A Mathematical Introduction to Robotic Manipulation. CRC Press, Boca Raton, pp 22-34

  53. MacQueen J (1967) Classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp 281-297

  54. Fanelli G, Dantone M, Gall J et al (2013) Random Forests for Real Time 3D Face Analysis. Int J Comput Vis 101(3):437–458. https://doi.org/10.1007/s11263-012-0549-0


  55. Sagonas C, Tzimiropoulos G, Zafeiriou S et al (2013) 300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge. In: 2013 IEEE International Conference on Computer Vision Workshops, pp 397-403. https://doi.org/10.1109/ICCVW.2013.59

  56. Zhang KP, Zhang ZP, Li ZF et al (2016) Joint Face Detection and Alignment using Multitask Cascaded Convolutional Networks. IEEE Signal Process Lett 23(10):1499–1503. https://doi.org/10.1109/LSP.2016.2603342


  57. Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. In: Bengio Y, LeCun Y (eds) International Conference on Learning Representations, San Diego

  58. Kazemi V, Sullivan J (2014) One millisecond face alignment with an ensemble of regression trees. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp 1867-1874. https://doi.org/10.1109/CVPR.2014.241

  59. Xin M, Mo S, Lin Y (2021) EVA-GCN: Head Pose Estimation Based on Graph Convolutional Networks. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 1462-1471. https://doi.org/10.1109/CVPRW53098.2021.00162

  60. Mukherjee SS, Robertson NM (2015) Deep head pose: Gaze-direction estimation in multimodal video. IEEE Trans Multimed 17(11):2094–2107. https://doi.org/10.1109/TMM.2015.2482819


  61. Gu JW, Yang XD, Mello SD et al (2017) Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1531-1540. https://doi.org/10.1109/CVPR.2017.167

  62. Martin M, Camp FVD, Stiefelhagen R (2014) Real Time Head Model Creation and Head Pose Estimation on Consumer Depth Cameras. In: 2014 2nd International Conference on 3D Vision, pp 641-648. https://doi.org/10.1109/3DV.2014.54

  63. Wang Q, Wu B, Zhu P et al (2020) ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 11531-11539. https://doi.org/10.1109/CVPR42600.2020.01155


Acknowledgements

This research was funded by the National Natural Science Foundation of China (Grant No. 62272485), the Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2020DO1A131), and the Teaching and Research Fund of Yangtze University (Grant No. JY2020101). We gratefully acknowledge all the members who participated in this work.

Author information

Corresponding author

Correspondence to Kai Xie.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

(MP4 16688 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Fu, Q., Xie, K., Wen, C. et al. Adaptive occlusion hybrid second-order attention network for head pose estimation. Int. J. Mach. Learn. & Cyber. 15, 667–683 (2024). https://doi.org/10.1007/s13042-023-01933-3


  • DOI: https://doi.org/10.1007/s13042-023-01933-3
