Abstract
This paper explores the use of language models to predict 20 human traits from users’ Facebook status updates. The data was collected by the myPersonality project and includes user statuses along with their personality, gender, political identification, religion, race, satisfaction with life, IQ, self-disclosure, fair-mindedness, and belief in astrology. A single interpretable model matches state-of-the-art results on well-studied tasks such as predicting gender and personality, and sets the standard for other traits such as IQ, sensational interests, political identity, and satisfaction with life. Additionally, highly weighted words are published for each trait. These lists are valuable for generating hypotheses about human behavior, as well as for understanding what information a model is extracting. Using both predictive performance and the extracted features, we analyze models built on social media data. The real-world problems we explore include gendered classification bias and Cambridge Analytica’s use of psychographic models.
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Cutler, A., Kulis, B. (2018). Inferring Human Traits from Facebook Statuses. In: Staab, S., Koltsova, O., Ignatov, D. (eds) Social Informatics. SocInfo 2018. Lecture Notes in Computer Science, vol 11185. Springer, Cham. https://doi.org/10.1007/978-3-030-01129-1_11
DOI: https://doi.org/10.1007/978-3-030-01129-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01128-4
Online ISBN: 978-3-030-01129-1
eBook Packages: Computer Science (R0)