Abstract
First impressions strongly influence social interactions and have a high impact on both personal and professional life. In this paper, we present a deep Classification-Regression Network (CR-Net) for analyzing the Big Five personality traits and further assisting job interview recommendation in a first-impressions setup. The setup is based on the ChaLearn First Impressions dataset, which includes multimodal data with video, audio, and text transcribed from the corresponding audio, where each person talks in front of a camera. To give a comprehensive prediction, we analyze the videos from both the entire scene (including the person's motions and background) and the person's face. Our CR-Net first performs personality trait classification and then applies regression, obtaining accurate predictions for both personality traits and the interview recommendation. Furthermore, we present a new loss function, called Bell Loss, to address the inaccurate predictions caused by the regression-to-the-mean problem. Extensive experiments on the First Impressions dataset show the effectiveness of the proposed network, which outperforms the state of the art.
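The two ideas in the abstract can be sketched concretely. The snippet below is a minimal illustration only: it assumes a plausible exponential "bell-shaped" form for the loss and hypothetical hyper-parameters `alpha` and `beta` (not necessarily the authors' exact definition), and shows how a trait score in [0, 1] can be decomposed into a coarse class plus a fine regression residual.

```python
import math

def bell_loss(pred, target, alpha=1.0, beta=4.0):
    """Illustrative bell-shaped loss (an assumption, not the paper's
    exact formula). For a small error e = pred - target it behaves like
    alpha * beta * e**2 (MSE-like), but it grows sharply for large
    errors, so predictions that collapse toward the dataset mean are
    penalized heavily when the true score is extreme."""
    e2 = (pred - target) ** 2
    return alpha * (math.exp(beta * e2) - 1.0)

def classify_then_regress(score, n_bins=10):
    """Decompose a trait score in [0, 1] into an interval index (the
    classification target) and a residual relative to the interval
    center (the regression target), mirroring a classify-then-regress
    prediction scheme."""
    b = min(int(score * n_bins), n_bins - 1)  # coarse class
    center = (b + 0.5) / n_bins               # center of that interval
    residual = score - center                 # fine-grained offset
    return b, center, residual
```

Summing the interval center and the predicted residual recovers the continuous score, which is how a classification stage can narrow the output range before a regression stage refines it.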
Notes
Note that our aim is to analyze our network and loss proposal in order to enhance first-impressions recognition. We do not argue that the interview recommendation variable has a direct application in real scenarios. Different jobs require different competencies, and studying automatic recommendation of job profiles is beyond the scope of this work.
Images are from Lisa Feldman Barrett's keynote speech "From Essences to Predictions: Understanding the Nature of Emotion" at the 2018 meeting of the European Society for Cognitive and Affective Neuroscience.
Acknowledgements
This work was supported by the National Key R&D Program of China under Grant #2018YFC0807500; the National Natural Science Foundation of China under Grants #61961160704, #61876179, #61772396, #61772392, and #61902296; the Fundamental Research Funds for the Central Universities #JBF180301; the Xi'an Key Laboratory of Big Data and Intelligent Vision #201805053ZD4CG37; the Science and Technology Development Fund of Macau (#0008/2018/A1, #0025/2019/A1, #0010/2019/AFJ, #0025/2019/AKP); Spanish project TIN2016-74946-P (MINECO/FEDER, UE); and the CERCA Programme/Generalitat de Catalunya.
Additional information
Communicated by Wenjun Zeng.
Cite this article
Li, Y., Wan, J., Miao, Q. et al. CR-Net: A Deep Classification-Regression Network for Multimodal Apparent Personality Analysis. Int J Comput Vis 128, 2763–2780 (2020). https://doi.org/10.1007/s11263-020-01309-y