Facial Expression Recognition with Inconsistently Annotated Datasets

  • Jiabei Zeng
  • Shiguang Shan
  • Xilin Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11217)

Abstract

Annotation errors and bias are inevitable across facial expression datasets due to the subjectiveness of annotating facial expressions. Owing to these inconsistent annotations, the performance of existing facial expression recognition (FER) methods does not keep improving when the training set is enlarged by merging multiple datasets. To address this inconsistency, we propose the Inconsistent Pseudo Annotations to Latent Truth (IPA2LT) framework, which trains an FER model from multiple inconsistently labeled datasets together with large-scale unlabeled data. In IPA2LT, we assign each sample more than one label, drawn from human annotations or model predictions. We then propose an end-to-end LTNet with a scheme for discovering the latent truth from the inconsistent pseudo labels and the input face images. To our knowledge, IPA2LT is the first work to address training with inconsistently labeled FER datasets. Experiments on synthetic data validate the effectiveness of the proposed method in learning from inconsistent labels. We also conduct extensive experiments in FER and show that our method outperforms other state-of-the-art and alternative methods under a rigorous evaluation protocol involving seven FER datasets.
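The core idea of latent-truth learning can be sketched with a simplified likelihood: the network predicts a distribution over the latent true expression, and each annotation source (a human-labeled dataset or a model's pseudo labels) is modeled by its own transition matrix mapping the latent truth to that source's observed label. The sketch below is an illustrative assumption about this formulation, not the paper's exact LTNet architecture; the function names, the single-image scope, and the fixed transition matrices are all hypothetical simplifications.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def latent_truth_nll(logits, obs_labels, transitions):
    """Negative log-likelihood of inconsistent pseudo labels for one image.

    logits      : latent-truth logits for one face image (length K)
    obs_labels  : dict mapping annotation source -> observed class index
    transitions : dict mapping source -> K x K row-stochastic matrix, where
                  transitions[a][k][j] = P(source a emits label j | true class k)
    """
    p_latent = softmax(logits)
    K = len(logits)
    nll = 0.0
    for a, j in obs_labels.items():
        # Marginal probability that source a emits label j, summing over
        # all possible latent true classes.
        p_obs = sum(p_latent[k] * transitions[a][k][j] for k in range(K))
        nll -= math.log(p_obs + 1e-12)
    return nll
```

With identity transition matrices every source is treated as perfectly reliable, and the loss reduces to the sum of standard cross-entropy terms over the observed labels; non-identity matrices let each source systematically disagree with the latent truth, which is what absorbs the annotation bias.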

Acknowledgement

We gratefully acknowledge the support of the National Key R&D Program of China (grant 2017YFA0700800), the National Natural Science Foundation of China (grant 61702481), and the External Cooperation Program of CAS (grant GJHZ1843).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
  3. CAS Center for Excellence in Brain Science and Intelligence Technology, Beijing, China