Abstract
Facial pose estimation refers to the task of predicting face orientation from a single RGB image. It is an important research topic with a wide range of applications in computer vision. Label distribution learning (LDL)-based methods have recently been proposed for facial pose estimation and have achieved promising results. However, existing LDL methods suffer from two major issues. First, the expectations of the label distributions are biased, which leads to biased pose estimates. Second, the same fixed distribution parameters are applied to all training samples, which severely limits model capability. In this paper, we propose an Anisotropic Spherical Gaussian (ASG)-based LDL approach for facial pose estimation. In particular, our approach adopts the spherical Gaussian distribution on a unit sphere, which always yields an unbiased expectation. Meanwhile, we introduce a new loss function that allows the network to flexibly learn the distribution parameters for each training sample. Extensive experimental results show that our method sets new state-of-the-art records on the AFLW2000 and BIWI datasets.
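The abstract's claim about biased versus unbiased expectations can be illustrated with a small numerical toy. This is a sketch, not the authors' implementation, and it rests on two assumptions.

First, it assumes the bias in angle-based LDL comes from truncating a Gaussian label distribution at the ends of a bounded bin range. The 3° bins over [-99°, 99°] and σ = 5° are hypothetical choices.

Second, it uses the unnormalised ASG form of Xu et al. (2013), G(v) = max(v·z, 0)·exp(−λ(v·x)² − μ(v·y)²), evaluated on a Fibonacci lattice. All function names and the values of λ and μ are illustrative. How the network predicts λ and μ, and the loss that trains them, are not modelled here.

```python
import math

# Part 1 (assumed setup): Gaussian label distribution on bounded 1-D angle bins.
def gaussian_ldl_expectation(theta, sigma=5.0, lo=-99.0, hi=99.0, step=3.0):
    """Expectation of a Gaussian label distribution centred at `theta`,
    discretised onto bins lo, lo+step, ..., hi and renormalised."""
    n_bins = int(round((hi - lo) / step)) + 1
    bins = [lo + i * step for i in range(n_bins)]
    weights = [math.exp(-((b - theta) ** 2) / (2.0 * sigma ** 2)) for b in bins]
    total = sum(weights)
    return sum(b * w for b, w in zip(bins, weights)) / total

# Part 2: anisotropic spherical Gaussian (ASG) on the unit sphere.
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def fibonacci_sphere(n):
    """Near-uniform sample points on the unit sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        zc = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - zc * zc))
        phi = golden_angle * i
        points.append((r * math.cos(phi), r * math.sin(phi), zc))
    return points

def asg(v, x, y, z, lam, mu):
    """Unnormalised ASG: max(v.z, 0) * exp(-lam (v.x)^2 - mu (v.y)^2)."""
    return max(dot(v, z), 0.0) * math.exp(-lam * dot(v, x) ** 2 - mu * dot(v, y) ** 2)

def frame_from_yaw_pitch(yaw_deg, pitch_deg):
    """Orthonormal frame (x, y, z) whose lobe axis z points along (yaw, pitch)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    z = (math.sin(yaw) * math.cos(pitch), math.sin(pitch), math.cos(yaw) * math.cos(pitch))
    x = normalize(cross((0.0, 1.0, 0.0), z))  # assumes z is not parallel to the up axis
    y = cross(z, x)  # unit length because z and x are orthonormal
    return x, y, z

def asg_mean_direction(x, y, z, lam, mu, n=20000):
    """Direction of the ASG-weighted mean of lattice points on the sphere."""
    acc = [0.0, 0.0, 0.0]
    for v in fibonacci_sphere(n):
        w = asg(v, x, y, z, lam, mu)
        for k in range(3):
            acc[k] += w * v[k]
    return normalize(tuple(acc))

def angle_deg(a, b):
    """Angle in degrees between two unit vectors."""
    return math.degrees(math.acos(max(-1.0, min(1.0, dot(a, b)))))

if __name__ == "__main__":
    for theta in (0.0, 60.0, 98.0):
        print(f"1-D bins, true {theta:5.1f} -> expectation {gaussian_ldl_expectation(theta):6.2f}")
    x, y, z = frame_from_yaw_pitch(98.0, 20.0)
    mean_dir = asg_mean_direction(x, y, z, lam=10.0, mu=4.0)
    print(f"ASG lobe axis vs. mean direction: {angle_deg(mean_dir, z):.3f} deg")
```

In this toy setting, the truncated 1-D distribution loses mass beyond the last bin, so a true angle near 99° has its expectation pulled toward the centre of the range.

The ASG behaves differently for two reasons. It is symmetric under reflection across the planes orthogonal to x and to y, so its weighted mean direction lies along z for any λ, μ > 0. The sphere also has no boundary at which to truncate. The small residual angle printed for the ASG comes only from the discrete lattice.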
Z. Cao and D. Liu contributed equally.
Q. Wang: The analysis and all work described in this paper were performed by the authors at Purdue and RIT. Qifan Wang served as an advisor to the project.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cao, Z., Liu, D., Wang, Q., Chen, Y. (2022). Towards Unbiased Label Distribution Learning for Facial Pose Estimation Using Anisotropic Spherical Gaussian. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13672. Springer, Cham. https://doi.org/10.1007/978-3-031-19775-8_43
DOI: https://doi.org/10.1007/978-3-031-19775-8_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19774-1
Online ISBN: 978-3-031-19775-8
eBook Packages: Computer Science, Computer Science (R0)